Install prerequisites:

    sudo apt-get install \
        apache2 apache2-threaded-dev ca-certificates cron curl git-core \
        libapache2-mod-php5 libapache2-mod-python libmysqlclient15-dev \
        make mysql-client mysql-server patch \
        php5 php5-curl php5-dev php5-gd php5-mysql php-db \
        python-biopython python-dev python-mysqldb python-pyrex \
        rsync unzip wget zip at --fix-missing
    sudo a2enmod php5
    sudo a2enmod rewrite
    sudo a2enmod expires
    sudo a2enmod negotiation
    sudo /etc/init.d/apache2 restart

Remove www-data from /etc/at.deny so the web server can schedule "at" jobs:

    sudo perl -ni~ -e 'print unless /www-data/' /etc/at.deny

If apt-get cannot find python-biopython, make sure the "universe" component is enabled in your apt sources list (a sketch of one way to enable it follows).

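Most Ubuntu releases ship with universe already enabled; if yours does not, this is one hedged way to add it (the release name "hardy" is only a placeholder for your own release):

    # Append a universe line for your release and refresh the package index
    echo 'deb http://archive.ubuntu.com/ubuntu hardy universe' | sudo tee -a /etc/apt/sources.list
    sudo apt-get update
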
If your Python is older than 2.6, install "multiprocessing" from http://pypi.python.org/pypi/multiprocessing/#downloads:

    wget http://pypi.python.org/packages/source/m/multiprocessing/multiprocessing-2.6.2.1.tar.gz
    tar xzf multiprocessing-2.6.2.1.tar.gz
    cd multiprocessing-2.6.2.1
    sudo python setup.py install

Clone the get-evidence repository:

    git clone git://git.clinicalfuture.com/get-evidence.git

Download and extract php-openid-2.1.3 and textile-2.0.0 and apply the patch(es); "make install" does all of this:

    cd ~/get-evidence
    make install

Set the MySQL server character set to UTF-8:

    sudo perl -pi~ -e '
      s:\n:\ndefault-character-set = utf8\n: if m:\[(client|mysqld)\]:;
    ' /etc/mysql/my.cnf
    sudo /etc/init.d/mysql restart

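To confirm the change took effect, ask the server for its character-set variables:

    # Expect utf8 for character_set_client, character_set_server, etc.
    mysql -u root -p -e "SHOW VARIABLES LIKE 'character_set%';"
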
Create the MySQL database and user (change "shakespeare" to your own password; note that it will be used later in scripts):

    mysql -u root -p
    [type in MySQL root password]
    create database evidence character set = utf8;
    create user evidence@localhost identified by 'shakespeare';
    grant all privileges on evidence.* to evidence@localhost;
    exit

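A quick check that the new account works (uses the example "shakespeare" password from above):

    # Should print an empty table list rather than an access-denied error
    mysql -u evidence -pshakespeare evidence -e 'SHOW TABLES;'
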
Create a directory to hold uploaded genomes and analysis results (its subdirectories are created later, after configure.sh sets the associated environment variables):

    sudo mkdir /home/trait
    sudo chown www-data:www-data /home/trait

Point the Apache DocumentRoot at public_html and turn on .htaccess support. Replace /path/to/get-evidence with the real path to your local git repo, and /path/to/your/trait/data/directory with the directory where uploaded data and analysis data will be stored (/home/trait in the example above):

    DocumentRoot /path/to/get-evidence/public_html
    <Directory /path/to/get-evidence/public_html>
        AllowOverride All
        # Restrict PHP access to the html directory of this user!
        php_admin_value open_basedir "/path/to/get-evidence:/path/to/your/trait/data/directory:/usr/share/php:/tmp:/dev/urandom"
        php_value include_path ".:/path/to/get-evidence/public_html:/usr/share/php"
    </Directory>

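After editing the Apache configuration, check the syntax before restarting:

    sudo apache2ctl configtest           # reports "Syntax OK" or the error location
    sudo /etc/init.d/apache2 restart
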
Recent versions of Ubuntu disable PHP interpretation in home directories. Open /etc/apache2/mods-available/php5.conf and, if you see "To re-enable php in user directories...", comment out the lines it refers to (one way to do this is sketched below).

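On stock Ubuntu those lines sit inside an <IfModule mod_userdir.c> block in php5.conf; assuming your file matches that layout, this sed sketch comments the whole block out (inspect the file afterwards to be sure):

    # Prefix every line of the mod_userdir block with '#', keeping a ~ backup
    sudo sed -i~ '/<IfModule mod_userdir.c>/,/<\/IfModule>/ s/^/#/' /etc/apache2/mods-available/php5.conf
    sudo /etc/init.d/apache2 restart
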
Put the real database password and data directory path in public_html/config.php like this (make sure there is no leading space or anything else before "<?php"):

    <?php
    $gDbPassword = "shakespeare";
    $gBackendBaseDir = "/home/trait"; // (can omit if using default)
    ?>

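Any byte before "<?php" (even a space) will break PHP's session handling, so it is worth confirming the file parses and starts cleanly:

    php -l public_html/config.php        # lint check
    head -c 5 public_html/config.php     # should print exactly '<?php'
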
Visit http://{host}/install.php to create tables.

Download and import GET-Evidence's SQL dump:

    wget http://evidence.personalgenomes.org/get-evidence.sql.gz
    gunzip get-evidence.sql.gz
    mysql -u root -p evidence < get-evidence.sql

NOTE (MPB 2010-09-19): Why aren't dbSNP and GeneTests data in the SQL dump?

Add GeneTests data:

    cd ~/get-evidence
    mkdir tmp
    sudo wget -O/home/trait/data/genetests-data.txt \
        ftp://ftp.ncbi.nih.gov/pub/GeneTests/data_to_build_custom_reports.txt
    sudo chown www-data /home/trait/data/genetests-data.txt
    ./import_genetests_data.php /home/trait/data/genetests-data.txt

Add dbSNP data (newer versions of dbSNP should work just as well):

    wget -Otmp/dbsnp.bcp.gz ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/database/b130_archive/b130_SNPChrPosOnRef_36_3.bcp.gz
    ./import_dbsnp.php tmp/dbsnp.bcp.gz

Add OMIM data:

    cd ~/get-evidence
    make import_omim

Make sure the genome analysis server is executable:

    cd ~/get-evidence/server/
    chmod +x genome_analyzer.py

Modify php.ini settings to enable genome uploads and to extend idle session timeouts beyond the default 24 minutes. On Ubuntu you can either edit /etc/php5/apache2/php.ini or create /etc/php5/conf.d/get-evidence.ini with the following values:

    magic_quotes_gpc = Off
    max_input_time = 600
    post_max_size = 512M
    upload_max_filesize = 512M
    memory_limit = 128M
    session.gc_maxlifetime = 172800

If you use Debian's PHP packaging, make sure your /usr/lib/php5/maxlifetime script checks all *.ini files in the config directories (as PHP itself does), not just php.ini. You might need to make this change (a sed one-liner for it is sketched below):

    -for ini in /etc/php5/*/php.ini; do
    +for ini in /etc/php5/*/*.ini; do

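A sketch of applying that one-line change with sed (this assumes the script still contains the exact loop shown above; the ~ suffix keeps a backup in case it does not):

    # Widen the glob from php.ini to *.ini, keeping a backup of the original script
    sudo sed -i~ 's|/etc/php5/\*/php\.ini|/etc/php5/*/*.ini|' /usr/lib/php5/maxlifetime
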
Alternatively, you can create a php.ini file in a new /etc/php5/get-evidence/ directory and put the session.gc_maxlifetime setting there as well. PHP will ignore it, but the Debian script will honor it even if your modified maxlifetime script gets overwritten by an upgrade:

    sudo mkdir /etc/php5/get-evidence
    echo 'session.gc_maxlifetime = 172800' | sudo tee /etc/php5/get-evidence/php.ini

Populate the upload directory with its initial directory structure:

    cd ~/get-evidence/server/script/
    USER=www-data SOURCE=$HOME/get-evidence CORE=$HOME/get-evidence/server \
      CONFIG=/home/trait/config TMP=/home/trait/tmp \
      DATA=/home/trait/data UPLOAD=/home/trait/upload LOG=/home/trait/log \
      BASE_URL=http://localhost/ ./configure.sh
    source ~/get-evidence/server/script/config-local.sh
    sudo -u $USER mkdir -p $TMP $UPLOAD $LOG $CONFIG $DATA

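To double-check, list the new directories; each should exist and be owned by www-data (this reuses the variables loaded from config-local.sh):

    # Each directory should show www-data as its owner
    ls -ld $TMP $UPLOAD $LOG $CONFIG $DATA
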
Log in as root, load the environment variables, and install the genome analysis server init script:

    cd ~/get-evidence/server/script
    sudo su
    source defaults.sh
    perl -p -e 's/%([A-Z]+)%/$ENV{$1}/g' \
      < $SOURCE/server/script/genome-analyzer.in \
      > /etc/init.d/genome-analyzer.tmp
    chmod 755 /etc/init.d/genome-analyzer.tmp
    chown 0:0 /etc/init.d/genome-analyzer.tmp
    mv /etc/init.d/genome-analyzer.tmp /etc/init.d/genome-analyzer
    update-rc.d genome-analyzer start 20 2 3 4 5 . stop 80 0 1 6 .
    exit

Run install-user.sh as www-data (this includes some file downloads):

    cd ~/get-evidence/server/script/
    source config-local.sh
    sudo -u $USER ./install-user.sh

Build the Python extensions:

    cd ~/get-evidence/server
    python setup.py build_ext --inplace

Start the genome analysis server:

    sudo /etc/init.d/genome-analyzer start

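To confirm the analyzer came up, look for the process and for fresh log output (the log path assumes the LOG=/home/trait/log setting used with configure.sh above):

    pgrep -f genome_analyzer.py          # should print at least one PID
    ls -lt /home/trait/log | head        # recent log files, if any
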
Set up a cron job to run "make daily" periodically:

    echo "12 3 * * * $USER cd $HOME/get-evidence && make daily" | sudo tee /etc/cron.d/get-evidence

Run through the daily make once to set up the flat files, some of which GET-Evidence expects to find:

    cd ~/get-evidence
    make daily

------

The following are old instructions, written before the Trait-o-matic integration. They are kept because some parts may still be informative (for example, as a record of how the tables in the SQL dump were built), but they are *not* required for the current version of GET-Evidence.

- Run "make" to import genomes from Trait-o-matic.
- make
Import dbSNP:

    wget -Otmp/dbsnp.bcp.gz ftp://ftp.ncbi.nih.gov/snp/database/organism_data/human_9606/b130_SNPChrPosOnRef_36_3.bcp.gz
    ./import_dbsnp.php tmp/dbsnp.bcp.gz
    wget -Otmp/snp130.txt.gz http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/snp130.txt.gz
    wget -Otmp/snp130.sql http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/snp130.sql
    mysql -uevidence -p evidence < tmp/snp130.sql
    if [ -e tmp/fifo ]; then rm tmp/fifo; fi
    mkfifo tmp/fifo
    gunzip < tmp/snp130.txt.gz > tmp/fifo &
    echo "load data local infile 'tmp/fifo' into table snp130 fields terminated by '\t' lines terminated by '\n'" | mysql -uevidence -p evidence

Import PharmGKB data (the URL is quoted so the shell does not interpret the "?"):

    wget -Otmp/variantAnnotations.zip 'http://www.pharmgkb.org/commonFileDownload.action?filename=variantAnnotations.zip'
    (cd tmp && unzip variantAnnotations.zip)
    ./import_pharmgkb.php tmp/variantAnnotations.tsv

Import OMIM data using omim.tsv from the Trait-o-matic import process:

    ./import_omim.php omim.tsv

Import gwas data using the spreadsheet downloaded from the web site. First convert it from its proprietary format to comma-separated, optionally doublequoted, fields; a conversion sketch follows.

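The conversion depends on what format the site serves; as one hedged example, if the download is an Excel workbook named gwas.xls and Gnumeric's ssconvert is installed, it can produce the CSV directly (both the tool and the filename are assumptions, not part of GET-Evidence):

    # ssconvert infers the output format from the .csv extension
    ssconvert gwas.xls gwas.csv
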
** IMPORTANT: the ordering of the following import steps is relevant
** (the whole sequence is sketched as a script after this list):
**
** Run import_genomes.php first (see above)
** Then import_variant_locations.php
** Then import_gwas.php
**    (relies on variant_locations to look up chr,pos->AA and add variants)
** Then import_1000genomes.php
**    (discards too many allele freqs if import_gwas hasn't added variants)
** Then update_variant_frequency.php
**    (merges frequencies from hapmap/import_genomes and import_1000genomes)

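A condensed sketch of that ordering as one shell sequence, reusing the example file paths from the surrounding steps (import_genomes.php itself is run via "make" above, so it appears only as a comment):

    # 0. genomes were already imported via "make" (import_genomes.php)
    ./import_variant_locations.php /tmp/gwas.ns.tsv   # 1. from the gene/aa lookup below
    ./import_gwas.php gwas.csv                        # 2. adds variants via chr,pos->AA lookup
    ./import_1000genomes.php /tmp/*.hap.2009_04.gz    # 3. allele frequencies
    ./update_variant_frequency.php                    # 4. merge frequencies across sources
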
Look up gene/aa changes for GWAS SNPs:

    1. Extract the rs numbers from the GWAS spreadsheet:

       perl -ne 'print "$1\n" while /\brs(\d+)\b/g' < gwas.csv \
         | sort -u > /tmp/gwas.rs

    2. On the Trait-o-matic host, using the dbsnp database:

       CREATE TEMPORARY TABLE acgt (allele CHAR(1) PRIMARY KEY);
       INSERT INTO acgt VALUES ('A'),('C'),('G'),('T');
       CREATE TEMPORARY TABLE gwas_rs (gwas_snp_id INT UNSIGNED PRIMARY KEY);
       LOAD DATA LOCAL INFILE '/tmp/gwas.rs' INTO TABLE gwas_rs;
       ALTER TABLE gwas_rs ADD chr CHAR(6), ADD chr_pos INT UNSIGNED;
       UPDATE gwas_rs
         LEFT JOIN SNPChrPosOnRef dbsnp ON snp_id=gwas_snp_id
         SET gwas_rs.chr=dbsnp.chr, gwas_rs.chr_pos=dbsnp.pos+1;
       SELECT * FROM gwas_rs INTO OUTFILE '/tmp/gwas.chr';
       SELECT CONCAT('chr',chr),'gwas','SNP',chr_pos,chr_pos,'.','+','.',
              CONCAT('alleles ',allele,';dbsnp rs',gwas_snp_id)
         FROM gwas_rs
         LEFT JOIN acgt ON 1=1
         WHERE chr IS NOT NULL AND chr NOT LIKE 'Multi%'
         INTO OUTFILE '/tmp/gwas.gff.txt';

    3. Upload /tmp/gwas.gff.txt to Trait-o-matic.
    4. Download nsSNPs from the Trait-o-matic results page and save as /tmp/gwas.ns.gff.
    5. ./gwas_gff2tsv /tmp/gwas.ns.gff > /tmp/gwas.ns.tsv
    6. ./import_variant_locations.php /tmp/gwas.ns.tsv
    7. Copy ns.json from the Trait-o-matic output directory and save as /tmp/gwas.ns.json.
    8. ./import_hapmap_ns_json.php /tmp/gwas.ns.json

Import the gwas comments for "other external references":

    ./import_gwas.php gwas.csv

Download the 1000genomes data (*.hap.*) from http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/2009_04/ and import it:

    ./import_1000genomes.php /tmp/*.hap.2009_04.gz

Import EVS frequency data. See ./evs-import.txt, then:

    ./evs-getev-reformat.pl tmp/ESP5400_getev-aa-changes_allele_freqs.txt | ./import_variant_frequency.php /dev/stdin EVS

Merge variant frequencies from multiple databases:

    ./update_variant_frequency.php

Import the genenames database:

    mkdir tmp
    wget -O./tmp/genenames.txt 'http://www.genenames.org/cgi-bin/hgnc_downloads.cgi?title=HGNC+output+data&hgnc_dbtag=onlevel=pri&=on&order_by=gd_app_sym_sort&limit=&format=text&.cgifields=&.cgifields=level&.cgifields=chr&.cgifields=status&.cgifields=hgnc_dbtag&&where=&status=Approved&status_opt=1&submit=submit&col=gd_hgnc_id&col=gd_app_sym&col=gd_app_name&col=gd_status&col=gd_prev_sym&col=gd_aliases&col=gd_pub_chrom_map&col=gd_pub_acc_ids&col=gd_pub_refseq_ids'
    ./import_genenames.php ./tmp/genenames.txt

Import the genetests database:

    wget -O./tmp/genetests-data.txt \
        ftp://ftp.ncbi.nih.gov/pub/GeneTests/data_to_build_custom_reports.txt
    ./import_genetests_data.php ./tmp/genetests-data.txt