
/INSTALL

https://github.com/tomclegg/get-evidence
Install prerequisites:

  sudo apt-get install \
    apache2 apache2-threaded-dev ca-certificates cron curl git-core \
    libapache2-mod-php5 libapache2-mod-python libmysqlclient15-dev \
    make mysql-client mysql-server patch \
    php5 php5-curl php5-dev php5-gd php5-mysql php-db \
    python-biopython python-dev python-mysqldb python-pyrex \
    rsync unzip wget zip at --fix-missing
  sudo a2enmod php5
  sudo a2enmod rewrite
  sudo a2enmod expires
  sudo a2enmod negotiation
  sudo /etc/init.d/apache2 restart

Remove www-data from /etc/at.deny:

  sudo perl -ni~ -e 'print unless /www-data/' /etc/at.deny
If installing python-biopython fails, make sure the "universe" component
is enabled in your apt sources list.
If your Python is older than 2.6, install "multiprocessing" from
http://pypi.python.org/pypi/multiprocessing/#downloads :

  wget http://pypi.python.org/packages/source/m/multiprocessing/multiprocessing-2.6.2.1.tar.gz
  tar xzf multiprocessing-2.6.2.1.tar.gz
  cd multiprocessing-2.6.2.1
  sudo python setup.py install
Clone get-evidence:

  git clone git://git.clinicalfuture.com/get-evidence.git
Download and extract php-openid-2.1.3 and textile-2.0.0, and apply the
patch(es):

  cd ~/get-evidence
  make install
Set the MySQL server character set:

  sudo perl -pi~ -e '
    s:\n:\ndefault-character-set = utf8\n: if m:\[(client|mysqld)\]:;
  ' /etc/mysql/my.cnf
  sudo /etc/init.d/mysql restart
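If the substitution worked, the [client] and [mysqld] sections of /etc/mysql/my.cnf should each begin like this (the surrounding settings vary by installation, so this shows only the expected shape, not verbatim contents):

```ini
[client]
default-character-set = utf8

[mysqld]
default-character-set = utf8
```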
Create the MySQL database and user (change "shakespeare" to your own
password; note that it will be used later in scripts):

  mysql -u root -p
  [type in MySQL root password]
  create database evidence character set = utf8;
  create user evidence@localhost identified by 'shakespeare';
  grant all privileges on evidence.* to evidence@localhost;
  exit
Create a directory to store uploaded genomes and analysis results (its
subdirectories, and the environment variables pointing at them, are set up
in a later step):

  sudo mkdir /home/trait
  sudo chown www-data:www-data /home/trait
Point the Apache DocumentRoot at public_html and turn on .htaccess support.
Replace /path/to/get-evidence with the real path to your local git repo, and
/path/to/your/trait/data/directory with the path to the directory where you
will store uploaded data and analysis data (/home/trait in the example
above):

  DocumentRoot /path/to/get-evidence/public_html
  <Directory /path/to/get-evidence/public_html>
    AllowOverride All
    # Restrict PHP access to the html directory of this user!
    php_admin_value open_basedir "/path/to/get-evidence:/path/to/your/trait/data/directory:/usr/share/php:/tmp:/dev/urandom"
    php_value include_path ".:/path/to/get-evidence/public_html:/usr/share/php"
  </Directory>
Recent versions of Ubuntu disable PHP interpretation in home directories.
Open /etc/apache2/mods-available/php5.conf and, if you see "To re-enable
php in user directories...", comment out the lines it specifies.
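On affected releases the block in question typically looks like the following. This is an approximation of the stock Ubuntu php5.conf, not its verbatim contents, so follow the comments in your own copy of the file:

```apache
# Comment out (or delete) a block like this so PHP runs in user directories:
#<IfModule mod_userdir.c>
#    <Directory /home/*/public_html>
#        php_admin_value engine Off
#    </Directory>
#</IfModule>
```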
Put the real database password and data directory path in
public_html/config.php, like this (make sure there is nothing, not even a
leading space, before "<?php"):

  <?php
  $gDbPassword = "shakespeare";
  $gBackendBaseDir = "/home/trait"; // (can omit if using default)
  ?>
Visit http://{host}/install.php to create the tables.
Download and import GET-Evidence's SQL dump:

  wget http://evidence.personalgenomes.org/get-evidence.sql.gz
  gunzip get-evidence.sql.gz
  mysql -u root -p evidence < get-evidence.sql

NOTE (MPB 2010-09-19): Why aren't dbSNP and GeneTests data in the SQL dump?
Add GeneTests data:

  cd ~/get-evidence
  mkdir tmp
  sudo wget -O/home/trait/data/genetests-data.txt \
    ftp://ftp.ncbi.nih.gov/pub/GeneTests/data_to_build_custom_reports.txt
  sudo chown www-data /home/trait/data/genetests-data.txt
  ./import_genetests_data.php /home/trait/data/genetests-data.txt
Add dbSNP data (newer versions of dbSNP should work just as well):

  wget -Otmp/dbsnp.bcp.gz ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/database/b130_archive/b130_SNPChrPosOnRef_36_3.bcp.gz
  ./import_dbsnp.php tmp/dbsnp.bcp.gz
Add OMIM data:

  cd ~/get-evidence
  make import_omim
Make sure the genome analysis server is executable:

  cd ~/get-evidence/server/
  chmod +x genome_analyzer.py
Modify php.ini settings to enable genome uploads and extend idle session
timeouts beyond the default 24 minutes. On Ubuntu you can either edit
/etc/php5/apache2/php.ini or create /etc/php5/conf.d/get-evidence.ini with
the following values:

  magic_quotes_gpc = Off
  max_input_time = 600
  post_max_size = 512M
  upload_max_filesize = 512M
  memory_limit = 128M
  session.gc_maxlifetime = 172800
If you use Debian's PHP modifications, make sure your
/usr/lib/php5/maxlifetime script checks all *.ini files in the config
directories (as PHP itself does), not just php.ini. You might need to make
this change:

  -for ini in /etc/php5/*/php.ini; do
  +for ini in /etc/php5/*/*.ini; do
Alternatively, you can create a php.ini file in a new /etc/php5/get-evidence/
directory and put the session.gc_maxlifetime setting there as well. PHP will
ignore it, but the Debian script will pay attention to it, even if your
modified maxlifetime script gets overwritten by an upgrade.

  sudo mkdir /etc/php5/get-evidence
  echo 'session.gc_maxlifetime = 172800' | sudo tee /etc/php5/get-evidence/php.ini
Populate the upload directory with its initial directory structure:

  cd ~/get-evidence/server/script/
  USER=www-data SOURCE=$HOME/get-evidence CORE=$HOME/get-evidence/server \
    CONFIG=/home/trait/config TMP=/home/trait/tmp \
    DATA=/home/trait/data UPLOAD=/home/trait/upload LOG=/home/trait/log \
    BASE_URL=http://localhost/ ./configure.sh
  source ~/get-evidence/server/script/config-local.sh
  sudo -u $USER mkdir -p $TMP $UPLOAD $LOG $CONFIG $DATA
Log in as root, load the environment variables, and set up the genome
analysis server:

  cd ~/get-evidence/server/script
  sudo su
  source defaults.sh
  perl -p -e 's/%([A-Z]+)%/$ENV{$1}/g' \
    < $SOURCE/server/script/genome-analyzer.in \
    > /etc/init.d/genome-analyzer.tmp
  chmod 755 /etc/init.d/genome-analyzer.tmp
  chown 0:0 /etc/init.d/genome-analyzer.tmp
  mv /etc/init.d/genome-analyzer.tmp /etc/init.d/genome-analyzer
  update-rc.d genome-analyzer start 20 2 3 4 5 . stop 80 0 1 6 .
  exit
Run install-user.sh as www-data (this includes some file downloads):

  cd ~/get-evidence/server/script/
  source config-local.sh
  sudo -u $USER ./install-user.sh
Build the Python extensions:

  cd ~/get-evidence/server
  python setup.py build_ext --inplace
Start the genome analysis server:

  sudo /etc/init.d/genome-analyzer start
Set up a cron job to run "make" periodically:

  echo "12 3 * * * $USER cd $HOME/get-evidence && make daily" | sudo tee /etc/cron.d/get-evidence
Run through the daily make once to set up the flat files, some of which
GET-Evidence will expect to find:

  cd ~/get-evidence
  make daily
------

The following are old instructions created prior to Trait-o-matic
integration. Some of this may still be useful, so it is left here in case
it is informative (e.g. as a record of how the tables in the SQL dump were
made). It is *not* required to use the current version of GET-Evidence.
Run "make" to import genomes from Trait-o-matic:

  make
Import dbSNP (import_dbsnp.php reads the gzipped file directly, as in the
current instructions above):

  wget -Otmp/dbsnp.bcp.gz ftp://ftp.ncbi.nih.gov/snp/database/organism_data/human_9606/b130_SNPChrPosOnRef_36_3.bcp.gz
  ./import_dbsnp.php tmp/dbsnp.bcp.gz
  wget -Otmp/snp130.txt.gz http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/snp130.txt.gz
  wget -Otmp/snp130.sql http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/snp130.sql
  mysql -uevidence -p evidence < tmp/snp130.sql
  if [ -e tmp/fifo ]; then rm tmp/fifo; fi
  mkfifo tmp/fifo
  gunzip < tmp/snp130.txt.gz > tmp/fifo &
  echo "load data local infile 'tmp/fifo' into table snp130 fields terminated by '\t' lines terminated by '\n'" | mysql -uevidence -p evidence
Import PharmGKB data:

  wget -Otmp/variantAnnotations.zip 'http://www.pharmgkb.org/commonFileDownload.action?filename=variantAnnotations.zip'
  (cd tmp && unzip variantAnnotations.zip)
  ./import_pharmgkb.php tmp/variantAnnotations.tsv
Import OMIM data using omim.tsv from the Trait-o-matic import process:

  ./import_omim.php omim.tsv
Import gwas data using the spreadsheet downloaded from the web site (first
convert it from the proprietary format to comma-separated, optionally
doublequoted, fields).

** IMPORTANT: the ordering of the following import steps is relevant.
**
** Run import_genomes.php first (see above)
** Then import_variant_locations.php
** Then import_gwas.php
**   (relies on variant_locations to look up chr,pos->AA and add variants)
** Then import_1000genomes.php
**   (discards too many allele freqs if import_gwas hasn't added variants)
** Then update_variant_frequency.php
**   (merges frequencies from hapmap/import_genomes and import_1000genomes)
Look up gene/aa changes for GWAS SNPs:

  1. perl -ne 'print "$1\n" while /\brs(\d+)\b/g' < gwas.csv \
       | sort -u > /tmp/gwas.rs

  2. On the trait-o-matic host, using the dbsnp database:

       CREATE TEMPORARY TABLE acgt (allele CHAR(1) PRIMARY KEY);
       INSERT INTO acgt VALUES ('A'),('C'),('G'),('T');
       CREATE TEMPORARY TABLE gwas_rs (gwas_snp_id INT UNSIGNED PRIMARY KEY);
       LOAD DATA LOCAL INFILE '/tmp/gwas.rs' INTO TABLE gwas_rs;
       ALTER TABLE gwas_rs ADD chr CHAR(6), ADD chr_pos INT UNSIGNED;
       UPDATE gwas_rs
         LEFT JOIN SNPChrPosOnRef dbsnp
           ON snp_id=gwas_snp_id
         SET gwas_rs.chr=dbsnp.chr,
             gwas_rs.chr_pos=dbsnp.pos+1;
       SELECT * FROM gwas_rs INTO OUTFILE '/tmp/gwas.chr';
       SELECT CONCAT('chr',chr),'gwas','SNP',chr_pos,chr_pos,'.','+','.',
              CONCAT('alleles ',allele,';dbsnp rs',gwas_snp_id)
         FROM gwas_rs
         LEFT JOIN acgt ON 1=1
         WHERE chr IS NOT NULL AND chr NOT LIKE 'Multi%'
         INTO OUTFILE '/tmp/gwas.gff.txt';

  3. Upload /tmp/gwas.gff.txt to Trait-o-matic.
  4. Download nsSNPs from the Trait-o-matic results page and save them to
     /tmp/gwas.ns.gff.
  5. ./gwas_gff2tsv /tmp/gwas.ns.gff > /tmp/gwas.ns.tsv
  6. ./import_variant_locations.php /tmp/gwas.ns.tsv
  7. Copy ns.json from the Trait-o-matic output directory and save it to
     /tmp/gwas.ns.json.
  8. ./import_hapmap_ns_json.php /tmp/gwas.ns.json
Import the gwas comments for "other external references":

  ./import_gwas.php gwas.csv
Download 1000genomes data (*.hap.*) from
http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/2009_04/

Import 1000genomes data:

  ./import_1000genomes.php /tmp/*.hap.2009_04.gz
Import EVS frequency data. See ./evs-import.txt, then:

  ./evs-getev-reformat.pl tmp/ESP5400_getev-aa-changes_allele_freqs.txt | ./import_variant_frequency.php /dev/stdin EVS
Merge variant frequencies from multiple databases:

  ./update_variant_frequency.php
Import the genenames database:

  mkdir tmp
  wget -O./tmp/genenames.txt 'http://www.genenames.org/cgi-bin/hgnc_downloads.cgi?title=HGNC+output+data&hgnc_dbtag=onlevel=pri&=on&order_by=gd_app_sym_sort&limit=&format=text&.cgifields=&.cgifields=level&.cgifields=chr&.cgifields=status&.cgifields=hgnc_dbtag&&where=&status=Approved&status_opt=1&submit=submit&col=gd_hgnc_id&col=gd_app_sym&col=gd_app_name&col=gd_status&col=gd_prev_sym&col=gd_aliases&col=gd_pub_chrom_map&col=gd_pub_acc_ids&col=gd_pub_refseq_ids'
  ./import_genenames.php ./tmp/genenames.txt
Import the genetests database:

  wget -O./tmp/genetests-data.txt \
    ftp://ftp.ncbi.nih.gov/pub/GeneTests/data_to_build_custom_reports.txt
  ./import_genetests_data.php ./tmp/genetests-data.txt