<!-- Source: /tools/rgenetics/rgEigPCA.xml in https://bitbucket.org/cistrome/cistrome-harvard/ -->
<tool id="rgEigPCA1" name="Eigensoft:">
    <description>PCA Ancestry using SNP</description>

    <command interpreter="python">
    rgEigPCA.py "$i.extra_files_path/$i.metadata.base_name" "$title" "$out_file1"
    "$out_file1.files_path" "$k" "$m" "$t" "$s" "$pca"
    </command>
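    <!-- Illustration only, not part of the tool definition: with the default values declared under
         inputs, the template above would expand to roughly the following call. Every path shown is
         a hypothetical placeholder; Galaxy substitutes the real dataset locations at run time.

         python rgEigPCA.py "/hypothetical/dataset_1_files/mygeno" "Ancestry PCA" "/hypothetical/dataset_2.dat"
         "/hypothetical/dataset_2_files" "4" "0" "5" "6" "/hypothetical/dataset_3.dat"

         The positional arguments are, in order: input base path, title, html output, html output
         files directory, k, m, t, s and the txt output. -->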

    <inputs>

       <param name="i"  type="data" label="Input genotype data file"
          size="120" format="ldindep" />
       <param name="title"  type="text" value="Ancestry PCA" label="Title for outputs from this run"
          size="80"  />
       <param name="k"  type="integer" value="4" label="Number of principal components to output"
          size="3"  />
       <param name="m"  type="integer" value="0" label="Max. outlier removal iterations"
          help="The default of 0 turns outlier removal off; set m=5 or so to turn it on. Do this if you plan on adjusting any analyses for ancestry."
          size="3"  />
       <param name="t"  type="integer" value="5" label="# principal components used for outlier removal"
          size="3"  />
       <param name="s"  type="integer" value="6" label="#SDs for outlier removal"
          help="Any individual exceeding s SDs along one of the top t principal components (the setting above) will be removed as an outlier."
          size="3"  />

   </inputs>

   <outputs>
       <data name="out_file1" format="html" label="${title}_rgEig.html"/>
       <data name="pca" format="txt" label="${title}_rgEig.txt"/>
   </outputs>

<tests>
 <test>
   <param name='i' value='tinywga' ftype='ldindep' >
   <metadata name='base_name' value='tinywga' />
   <composite_data value='tinywga.bim' />
   <composite_data value='tinywga.bed' />
   <composite_data value='tinywga.fam' />
   <edit_attributes type='name' value='tinywga' />
   </param>
    <param name='title' value='rgEigPCAtest1' />
    <param name="k" value="4" />
    <param name="m" value="2" />
    <param name="t" value="2" />
    <param name="s" value="2" />
    <output name='out_file1' file='rgtestouts/rgEigPCA/rgEigPCAtest1.html' ftype='html' compare='diff' lines_diff='195'>
    <extra_files type="file" name='rgEigPCAtest1_PCAPlot.pdf' value="rgtestouts/rgEigPCA/rgEigPCAtest1_PCAPlot.pdf" compare="sim_size" delta="3000"/>
    </output>
    <output name='pca' file='rgtestouts/rgEigPCA/rgEigPCAtest1.txt' compare='diff'/>
 </test>
</tests>

<help>


**Syntax**

- **Genotype data** is an input genotype dataset in Plink lped (http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml) format. See below for notes.
- **Title** is used to name the output files so you can remember what the outputs are for.
- **Tuning parameters** are documented in the Eigensoft (http://genepath.med.harvard.edu/~reich/Software.htm) documentation - see below.


-----

**Summary**

Eigensoft requires LD-reduced genotype data.
Galaxy has an automatic converter for genotype data in Plink linkage pedigree (lped) format.
For details of this generic genotype format, please see the Plink documentation at
http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml

Reading that documentation, you'll see that the linkage pedigree format is really two related files with the same
file base name - a map and ped file - e.g. 'mygeno.ped' and 'mygeno.map'.
The map file has the chromosome, snp name, genetic offset and base-pair offset corresponding to each
genotype stored as separate alleles in the ped file. The ped file has family id, individual id, father id (or 0), mother id
(or 0), gender (1=male, 2=female, 0=unknown) and affection (1=unaffected, 2=affected, 0=unknown),
then two separate allele columns for each genotype.
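
As a purely illustrative sketch (every id, snp name, position and genotype here is invented), a 'mygeno.ped' row
for one unaffected male founder typed at two snps could look like::

    FAM1 IND1 0 0 1 1 A A G T

and the two matching 'mygeno.map' rows, in Plink's column order of chromosome, snp name, genetic offset
and base-pair offset, like::

    1 snp_a 0 1000
    1 snp_b 0 2000

Each map row describes one pair of allele columns in the ped file, in the same order.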

Once you have your data in the right format, you can upload it into your Galaxy history using the "upload" tool.

To upload your lped data in the upload tool, choose 'lped' as the 'file format'. The tool form will change to
allow you to navigate to and select each member of the pair of ped and map files stored on your local computer
(or available at a public URL for Galaxy to grab).
Give the dataset a meaningful name (replace rgeneticsData with something more useful!) and click execute.

When the upload is done, your new lped format dataset will appear in your history and then,
when you choose the ancestry tool, that history dataset will be available as input.

**Warning for the Impatient**

When you execute the tool, it may look as if nothing is happening for a while: the automatic converter
first reduces the amount of LD in the data, because otherwise eigenstrat gives biased results.


**Attribution**

This tool runs and relies on the work of many others, including the
maintainers of the Eigensoft program, and the R and
Bioconductor projects. For full attribution, source code and documentation, please see
http://genepath.med.harvard.edu/~reich/Software.htm, http://cran.r-project.org/
and http://www.bioconductor.org/ respectively.

This implementation is a Galaxy tool wrapper around these third party applications.
It was originally designed and written for family based data from the CAMP Illumina run of 2007 by
Ross Lazarus (ross.lazarus@gmail.com) and incorporated into the rgenetics toolkit.

Copyright Ross Lazarus 2007
Licensed under the terms of the LGPL as documented at http://www.gnu.org/licenses/lgpl.html
but is about as useful as a sponge boat without the EIGENSOFT PCA code.

**README from eigensoft2 distribution at http://genepath.med.harvard.edu/~reich/Software.htm**

[rerla@beast eigensoft2]$ cat README
EIGENSOFT version 2.0, January 2008 (for Linux only)

This is the same as our EIGENSOFT 2.0 BETA release with a few recent changes
as described at http://genepath.med.harvard.edu/~reich/New_In_EIGENSOFT.htm.

Features of EIGENSOFT version 2.0 include:
-- Keeping track of ref/var alleles in all file formats: see CONVERTF/README
-- Handling data sets up to 8 billion genotypes: see CONVERTF/README
-- Output SNP weightings of each principal component: see POPGEN/README

The EIGENSOFT package implements methods from the following 2 papers:
Patterson N. et al. 2006 PLoS Genetics in press (population structure)
Price A.L. et al. 2006 NG 38:904-9 (EIGENSTRAT stratification correction)

See POPGEN/README for documentation of population structure programs.

See EIGENSTRAT/README for documentation of EIGENSTRAT programs.

See CONVERTF/README for documentation of programs for converting file formats.


Executables and source code:
----------------------------
All C executables are in the bin/ directory.

We have placed source code for all C executables in the src/ directory,
for users who wish to modify and recompile our programs.  For example, to
recompile the eigenstrat program, type
"cd src"
"make eigenstrat"
"mv eigenstrat ../bin"

Note that some of our software will only compile if your system has the
lapack package installed.  (This package is used to compute eigenvectors.)
Some users may need to change "blas-3" to "blas" in the Makefile,
depending on how blas and lapack are installed.

If cc is not available on your system, try "cp Makefile.alt Makefile"
and then recompile.

If you have trouble compiling and running our code, try compiling and
running the pcatoy program in the src directory:
"cd src"
"make pcatoy"
"./pcatoy"
If you are unable to run the pcatoy program successfully, please contact
your system administrator for help, as this is a systems issue which is
beyond our scope.  Your system administrator will be able to troubleshoot
your systems issue using this trivial program.  [You can also try running
the pcatoy program in the bin directory, which we have already compiled.]
</help>
</tool>