Install prerequisites:

    sudo apt-get install \
        apache2 apache2-threaded-dev ca-certificates cron curl git-core \
        libapache2-mod-php5 libapache2-mod-python libmysqlclient15-dev \
        make mysql-client mysql-server patch \
        php5 php5-curl php5-dev php5-gd php5-mysql php-db \
        python-biopython python-dev python-mysqldb python-pyrex \
        rsync unzip wget zip at --fix-missing
    sudo a2enmod php5
    sudo a2enmod rewrite
    sudo a2enmod expires
    sudo a2enmod negotiation
    sudo /etc/init.d/apache2 restart

Remove www-data from /etc/at.deny so the web server can schedule "at" jobs:

    sudo perl -ni~ -e 'print unless /www-data/' /etc/at.deny

If apt-get cannot find python-biopython, make sure the "universe" component is enabled in your apt sources list (a sketch of one way to enable it follows).

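Most Ubuntu releases ship with universe already enabled; if yours does not, this is one hedged way to add it (the release name "hardy" is only a placeholder for your own release):

    # Append a universe line for your release and refresh the package index
    echo 'deb http://archive.ubuntu.com/ubuntu hardy universe' | sudo tee -a /etc/apt/sources.list
    sudo apt-get update
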
If your Python is older than 2.6, install "multiprocessing" from http://pypi.python.org/pypi/multiprocessing/#downloads:

    wget http://pypi.python.org/packages/source/m/multiprocessing/multiprocessing-2.6.2.1.tar.gz
    tar xzf multiprocessing-2.6.2.1.tar.gz
    cd multiprocessing-2.6.2.1
    sudo python setup.py install

Clone the get-evidence repository:

    git clone git://git.clinicalfuture.com/get-evidence.git

Download and extract php-openid-2.1.3 and textile-2.0.0 and apply the patch(es); "make install" does all of this:

    cd ~/get-evidence
    make install

Set the MySQL server character set to UTF-8:

    sudo perl -pi~ -e '
      s:\n:\ndefault-character-set = utf8\n: if m:\[(client|mysqld)\]:;
    ' /etc/mysql/my.cnf
    sudo /etc/init.d/mysql restart

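To confirm the change took effect, ask the server for its character-set variables:

    # Expect utf8 for character_set_client, character_set_server, etc.
    mysql -u root -p -e "SHOW VARIABLES LIKE 'character_set%';"
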
Create the MySQL database and user (change "shakespeare" to your own password; note that it will be used later in scripts):

    mysql -u root -p
    [type in MySQL root password]
    create database evidence character set = utf8;
    create user evidence@localhost identified by 'shakespeare';
    grant all privileges on evidence.* to evidence@localhost;
    exit

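A quick check that the new account works (uses the example "shakespeare" password from above):

    # Should print an empty table list rather than an access-denied error
    mysql -u evidence -pshakespeare evidence -e 'SHOW TABLES;'
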
Create a directory to hold uploaded genomes and analysis results (its subdirectories are created later, after configure.sh sets the associated environment variables):

    sudo mkdir /home/trait
    sudo chown www-data:www-data /home/trait

Point the Apache DocumentRoot at public_html and turn on .htaccess support. Replace /path/to/get-evidence with the real path to your local git repo, and /path/to/your/trait/data/directory with the directory where uploaded data and analysis data will be stored (/home/trait in the example above):

    DocumentRoot /path/to/get-evidence/public_html
    <Directory /path/to/get-evidence/public_html>
        AllowOverride All
        # Restrict PHP access to the html directory of this user!
        php_admin_value open_basedir "/path/to/get-evidence:/path/to/your/trait/data/directory:/usr/share/php:/tmp:/dev/urandom"
        php_value include_path ".:/path/to/get-evidence/public_html:/usr/share/php"
    </Directory>

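After editing the Apache configuration, check the syntax before restarting:

    sudo apache2ctl configtest           # reports "Syntax OK" or the error location
    sudo /etc/init.d/apache2 restart
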
Recent versions of Ubuntu disable PHP interpretation in home directories. Open /etc/apache2/mods-available/php5.conf and, if you see "To re-enable php in user directories...", comment out the lines it refers to (one way to do this is sketched below).

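On stock Ubuntu those lines sit inside an <IfModule mod_userdir.c> block in php5.conf; assuming your file matches that layout, this sed sketch comments the whole block out (inspect the file afterwards to be sure):

    # Prefix every line of the mod_userdir block with '#', keeping a ~ backup
    sudo sed -i~ '/<IfModule mod_userdir.c>/,/<\/IfModule>/ s/^/#/' /etc/apache2/mods-available/php5.conf
    sudo /etc/init.d/apache2 restart
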
Put the real database password and data directory path in public_html/config.php like this (make sure there is no leading space or anything else before "<?php"):

    <?php
    $gDbPassword = "shakespeare";
    $gBackendBaseDir = "/home/trait"; // (can omit if using default)
    ?>

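Any byte before "<?php" (even a space) will break PHP's session handling, so it is worth confirming the file parses and starts cleanly:

    php -l public_html/config.php        # lint check
    head -c 5 public_html/config.php     # should print exactly '<?php'
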
Visit http://{host}/install.php to create tables.

Download and import GET-Evidence's SQL dump:

    wget http://evidence.personalgenomes.org/get-evidence.sql.gz
    gunzip get-evidence.sql.gz
    mysql -u root -p evidence < get-evidence.sql

NOTE (MPB 2010-09-19): Why aren't dbSNP and GeneTests data in the SQL dump?

Add GeneTests data:

    cd ~/get-evidence
    mkdir tmp
    sudo wget -O/home/trait/data/genetests-data.txt \
        ftp://ftp.ncbi.nih.gov/pub/GeneTests/data_to_build_custom_reports.txt
    sudo chown www-data /home/trait/data/genetests-data.txt
    ./import_genetests_data.php /home/trait/data/genetests-data.txt

Add dbSNP data (newer versions of dbSNP should work just as well):

    wget -Otmp/dbsnp.bcp.gz ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/database/b130_archive/b130_SNPChrPosOnRef_36_3.bcp.gz
    ./import_dbsnp.php tmp/dbsnp.bcp.gz

Add OMIM data:

    cd ~/get-evidence
    make import_omim

Make sure the genome analysis server is executable:

    cd ~/get-evidence/server/
    chmod +x genome_analyzer.py

Modify php.ini settings to enable genome uploads and to extend idle session timeouts beyond the default 24 minutes. On Ubuntu you can either edit /etc/php5/apache2/php.ini or create /etc/php5/conf.d/get-evidence.ini with the following values:

    magic_quotes_gpc = Off
    max_input_time = 600
    post_max_size = 512M
    upload_max_filesize = 512M
    memory_limit = 128M
    session.gc_maxlifetime = 172800

If you use Debian's PHP packaging, make sure your /usr/lib/php5/maxlifetime script checks all *.ini files in the config directories (as PHP itself does), not just php.ini. You might need to make this change (a sed one-liner for it is sketched below):

    -for ini in /etc/php5/*/php.ini; do
    +for ini in /etc/php5/*/*.ini; do

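A sketch of applying that one-line change with sed (this assumes the script still contains the exact loop shown above; the ~ suffix keeps a backup in case it does not):

    # Widen the glob from php.ini to *.ini, keeping a backup of the original script
    sudo sed -i~ 's|/etc/php5/\*/php\.ini|/etc/php5/*/*.ini|' /usr/lib/php5/maxlifetime
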
Alternatively, you can create a php.ini file in a new /etc/php5/get-evidence/ directory and put the session.gc_maxlifetime setting there as well. PHP will ignore it, but the Debian script will honor it even if your modified maxlifetime script gets overwritten by an upgrade:

    sudo mkdir /etc/php5/get-evidence
    echo 'session.gc_maxlifetime = 172800' | sudo tee /etc/php5/get-evidence/php.ini

Populate the upload directory with its initial directory structure:

    cd ~/get-evidence/server/script/
    USER=www-data SOURCE=$HOME/get-evidence CORE=$HOME/get-evidence/server \
      CONFIG=/home/trait/config TMP=/home/trait/tmp \
      DATA=/home/trait/data UPLOAD=/home/trait/upload LOG=/home/trait/log \
      BASE_URL=http://localhost/ ./configure.sh
    source ~/get-evidence/server/script/config-local.sh
    sudo -u $USER mkdir -p $TMP $UPLOAD $LOG $CONFIG $DATA

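To double-check, list the new directories; each should exist and be owned by www-data (this reuses the variables loaded from config-local.sh):

    # Each directory should show www-data as its owner
    ls -ld $TMP $UPLOAD $LOG $CONFIG $DATA
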
Log in as root, load the environment variables, and install the genome analysis server init script:

    cd ~/get-evidence/server/script
    sudo su
    source defaults.sh
    perl -p -e 's/%([A-Z]+)%/$ENV{$1}/g' \
      < $SOURCE/server/script/genome-analyzer.in \
      > /etc/init.d/genome-analyzer.tmp
    chmod 755 /etc/init.d/genome-analyzer.tmp
    chown 0:0 /etc/init.d/genome-analyzer.tmp
    mv /etc/init.d/genome-analyzer.tmp /etc/init.d/genome-analyzer
    update-rc.d genome-analyzer start 20 2 3 4 5 . stop 80 0 1 6 .
    exit

Run install-user.sh as www-data (this includes some file downloads):

    cd ~/get-evidence/server/script/
    source config-local.sh
    sudo -u $USER ./install-user.sh

Build the Python extensions:

    cd ~/get-evidence/server
    python setup.py build_ext --inplace

Start the genome analysis server:

    sudo /etc/init.d/genome-analyzer start

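To confirm the analyzer came up, look for the process and for fresh log output (the log path assumes the LOG=/home/trait/log setting used with configure.sh above):

    pgrep -f genome_analyzer.py          # should print at least one PID
    ls -lt /home/trait/log | head        # recent log files, if any
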
Set up a cron job to run "make daily" periodically:

    echo "12 3 * * * $USER cd $HOME/get-evidence && make daily" | sudo tee /etc/cron.d/get-evidence

Run through the daily make once to set up the flat files, some of which GET-Evidence expects to find:

    cd ~/get-evidence
    make daily

------

The following are old instructions, written before the Trait-o-matic integration. They are kept because some parts may still be informative (for example, as a record of how the tables in the SQL dump were built), but they are *not* required for the current version of GET-Evidence.

- Run "make" to import genomes from Trait-o-matic.
- make
Import dbSNP:

    wget -Otmp/dbsnp.bcp.gz ftp://ftp.ncbi.nih.gov/snp/database/organism_data/human_9606/b130_SNPChrPosOnRef_36_3.bcp.gz
    ./import_dbsnp.php tmp/dbsnp.bcp.gz
    wget -Otmp/snp130.txt.gz http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/snp130.txt.gz
    wget -Otmp/snp130.sql http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/snp130.sql
    mysql -uevidence -p evidence < tmp/snp130.sql
    if [ -e tmp/fifo ]; then rm tmp/fifo; fi
    mkfifo tmp/fifo
    gunzip < tmp/snp130.txt.gz > tmp/fifo &
    echo "load data local infile 'tmp/fifo' into table snp130 fields terminated by '\t' lines terminated by '\n'" | mysql -uevidence -p evidence

Import PharmGKB data (the URL is quoted so the shell does not interpret the "?"):

    wget -Otmp/variantAnnotations.zip 'http://www.pharmgkb.org/commonFileDownload.action?filename=variantAnnotations.zip'
    (cd tmp && unzip variantAnnotations.zip)
    ./import_pharmgkb.php tmp/variantAnnotations.tsv

Import OMIM data using omim.tsv from the Trait-o-matic import process:

    ./import_omim.php omim.tsv

Import gwas data using the spreadsheet downloaded from the web site. First convert it from its proprietary format to comma-separated, optionally doublequoted, fields; a conversion sketch follows.

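The conversion depends on what format the site serves; as one hedged example, if the download is an Excel workbook named gwas.xls and Gnumeric's ssconvert is installed, it can produce the CSV directly (both the tool and the filename are assumptions, not part of GET-Evidence):

    # ssconvert infers the output format from the .csv extension
    ssconvert gwas.xls gwas.csv
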
** IMPORTANT: the ordering of the following import steps is relevant
** (the whole sequence is sketched as a script after this list):
**
** Run import_genomes.php first (see above)
** Then import_variant_locations.php
** Then import_gwas.php
**    (relies on variant_locations to look up chr,pos->AA and add variants)
** Then import_1000genomes.php
**    (discards too many allele freqs if import_gwas hasn't added variants)
** Then update_variant_frequency.php
**    (merges frequencies from hapmap/import_genomes and import_1000genomes)

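A condensed sketch of that ordering as one shell sequence, reusing the example file paths from the surrounding steps (import_genomes.php itself is run via "make" above, so it appears only as a comment):

    # 0. genomes were already imported via "make" (import_genomes.php)
    ./import_variant_locations.php /tmp/gwas.ns.tsv   # 1. from the gene/aa lookup below
    ./import_gwas.php gwas.csv                        # 2. adds variants via chr,pos->AA lookup
    ./import_1000genomes.php /tmp/*.hap.2009_04.gz    # 3. allele frequencies
    ./update_variant_frequency.php                    # 4. merge frequencies across sources
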
Look up gene/aa changes for GWAS SNPs:

    1. Extract the rs numbers from the GWAS spreadsheet:

       perl -ne 'print "$1\n" while /\brs(\d+)\b/g' < gwas.csv \
         | sort -u > /tmp/gwas.rs

    2. On the Trait-o-matic host, using the dbsnp database:

       CREATE TEMPORARY TABLE acgt (allele CHAR(1) PRIMARY KEY);
       INSERT INTO acgt VALUES ('A'),('C'),('G'),('T');
       CREATE TEMPORARY TABLE gwas_rs (gwas_snp_id INT UNSIGNED PRIMARY KEY);
       LOAD DATA LOCAL INFILE '/tmp/gwas.rs' INTO TABLE gwas_rs;
       ALTER TABLE gwas_rs ADD chr CHAR(6), ADD chr_pos INT UNSIGNED;
       UPDATE gwas_rs
         LEFT JOIN SNPChrPosOnRef dbsnp ON snp_id=gwas_snp_id
         SET gwas_rs.chr=dbsnp.chr, gwas_rs.chr_pos=dbsnp.pos+1;
       SELECT * FROM gwas_rs INTO OUTFILE '/tmp/gwas.chr';
       SELECT CONCAT('chr',chr),'gwas','SNP',chr_pos,chr_pos,'.','+','.',
              CONCAT('alleles ',allele,';dbsnp rs',gwas_snp_id)
         FROM gwas_rs
         LEFT JOIN acgt ON 1=1
         WHERE chr IS NOT NULL AND chr NOT LIKE 'Multi%'
         INTO OUTFILE '/tmp/gwas.gff.txt';

    3. Upload /tmp/gwas.gff.txt to Trait-o-matic.
    4. Download nsSNPs from the Trait-o-matic results page and save as /tmp/gwas.ns.gff.
    5. ./gwas_gff2tsv /tmp/gwas.ns.gff > /tmp/gwas.ns.tsv
    6. ./import_variant_locations.php /tmp/gwas.ns.tsv
    7. Copy ns.json from the Trait-o-matic output directory and save as /tmp/gwas.ns.json.
    8. ./import_hapmap_ns_json.php /tmp/gwas.ns.json

Import the gwas comments for "other external references":

    ./import_gwas.php gwas.csv

Download the 1000genomes data (*.hap.*) from http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/2009_04/ and import it:

    ./import_1000genomes.php /tmp/*.hap.2009_04.gz

Import EVS frequency data. See ./evs-import.txt, then:

    ./evs-getev-reformat.pl tmp/ESP5400_getev-aa-changes_allele_freqs.txt | ./import_variant_frequency.php /dev/stdin EVS

Merge variant frequencies from multiple databases:

    ./update_variant_frequency.php

Import the genenames database:

    mkdir tmp
    wget -O./tmp/genenames.txt 'http://www.genenames.org/cgi-bin/hgnc_downloads.cgi?title=HGNC+output+data&hgnc_dbtag=onlevel=pri&=on&order_by=gd_app_sym_sort&limit=&format=text&.cgifields=&.cgifields=level&.cgifields=chr&.cgifields=status&.cgifields=hgnc_dbtag&&where=&status=Approved&status_opt=1&submit=submit&col=gd_hgnc_id&col=gd_app_sym&col=gd_app_name&col=gd_status&col=gd_prev_sym&col=gd_aliases&col=gd_pub_chrom_map&col=gd_pub_acc_ids&col=gd_pub_refseq_ids'
    ./import_genenames.php ./tmp/genenames.txt

Import the genetests database:

    wget -O./tmp/genetests-data.txt \
        ftp://ftp.ncbi.nih.gov/pub/GeneTests/data_to_build_custom_reports.txt
    ./import_genetests_data.php ./tmp/genetests-data.txt