
/tools/human_genome_variation/beam.xml

https://bitbucket.org/cistrome/cistrome-harvard/
<tool id="hgv_beam" name="BEAM" version="1.0.0">
  <description>significant single- and multi-locus SNP associations in case-control studies</description>

  <command interpreter="bash">
    BEAM2_wrapper.sh map=${input.extra_files_path}/${input.metadata.base_name}.map ped=${input.extra_files_path}/${input.metadata.base_name}.ped $burnin $mcmc $pvalue significance=$significance posterior=$posterior
  </command>
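  <!--
Sketch: how the command line above locates the two members of the composite lped dataset. Galaxy substitutes `${input.extra_files_path}` and `${input.metadata.base_name}`, and the wrapper receives base_name.map and base_name.ped from that directory. The Python below mirrors that path logic and reads the four-column map format shown in the help example (chromosome, SNP ID, genetic distance, base-pair position). The helpers `lped_paths` and `read_map` and the directory used in the usage example are hypothetical. This illustrates the file layout under those assumptions and is not code the tool runs.

```python
import os
from typing import Iterable, List, NamedTuple, Tuple


class MapRecord(NamedTuple):
    """One line of a map file."""
    chrom: str
    snp_id: str
    genetic_distance: float
    position: int


def lped_paths(extra_files_path: str, base_name: str) -> Tuple[str, str]:
    """Hypothetical helper: build the .map and .ped paths the way the command line does."""
    stem = os.path.join(extra_files_path, base_name)
    return stem + ".map", stem + ".ped"


def read_map(lines: Iterable[str]) -> List[MapRecord]:
    """Parse whitespace-separated map lines: chromosome, SNP ID, genetic distance, position."""
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        chrom, snp_id, distance, position = fields[:4]
        records.append(MapRecord(chrom, snp_id, float(distance), int(position)))
    return records


if __name__ == "__main__":
    # Hypothetical dataset directory; Galaxy chooses the real one.
    map_path, ped_path = lped_paths("/galaxy/files/dataset_1_files", "gpass_and_beam_input")
    print(map_path, ped_path)

    # First rows of the map file from the help example.
    example = ["1  rs0  0  738547", "1  rs1  0  5597094", "1  rs2  0  9424115"]
    for record in read_map(example):
        print(record)
```

The ped file sits next to the map file under the same stem, so both paths are derived from one join.
  -->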

  <inputs>
    <param format="lped" name="input" type="data" label="Dataset"/>
    <param name="burnin" label="Number of MCMC burn-in steps" type="integer" value="200" />
    <param name="mcmc" label="Number of MCMC sampling steps" type="integer" value="200" />
    <param name="pvalue" label="Significance cutoff (after Bonferroni adjustment)" type="float" value="0.05" />
  </inputs>
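  <!--
Sketch: what the `pvalue` cutoff implies for each individual test. A Bonferroni adjustment divides the overall significance level by the number of tests. This file does not say how BEAM counts tests. For illustration only, the Python below assumes that tests of interaction order k are counted as C(L, k) for L SNPs, that is, L single SNPs, L(L-1)/2 pairs, and so on. The function name `bonferroni_thresholds` and the 1,000-SNP panel are hypothetical.

```python
from math import comb
from typing import Dict


def bonferroni_thresholds(alpha: float, n_snps: int, max_order: int = 2) -> Dict[int, float]:
    """Per-test p-value cutoffs for interaction orders 1..max_order.

    Assumption (illustration only): order k involves C(n_snps, k) tests,
    and the Bonferroni cutoff is alpha divided by that count.
    """
    thresholds = {}
    for k in range(1, max_order + 1):
        n_tests = comb(n_snps, k)  # number of SNP sets of size k
        if n_tests == 0:
            break  # fewer SNPs than the interaction order
        thresholds[k] = alpha / n_tests
    return thresholds


if __name__ == "__main__":
    # The tool's default cutoff (0.05) with a hypothetical 1,000-SNP panel.
    for order, cutoff in bonferroni_thresholds(0.05, 1000).items():
        print(f"order {order}: p must not exceed {cutoff:.2g}")
```

Under these assumptions, the default cutoff of 0.05 with 1,000 SNPs gives 5e-05 for single SNPs and about 1.0e-07 for pairs. Interaction signals therefore need much stronger evidence to pass.
  -->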

  <outputs>
    <data format="tabular" name="significance" />
    <data format="tabular" name="posterior" />
  </outputs>
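  <!--
Sketch: reading the `posterior` output. In the help example, the first table of posterior.txt gives one row per SNP: an internal index, the chromosome, the position, and the marginal and interaction posterior probabilities with their total. The Python below extracts those rows with a regular expression. It assumes the fixed-point layout shown in the example (`0.0000 + 0.0000 = 0.0000`). Header lines and rows of the second (block boundary) table do not match the pattern and are skipped. The names `parse_posterior_rows` and `PosteriorRow` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches rows such as "0:   1 738547      0.0000 + 0.0000 = 0.0000"
ROW = re.compile(
    r"^\s*(\d+):\s+(\S+)\s+(\d+)\s+([\d.]+)\s*\+\s*([\d.]+)\s*=\s*([\d.]+)\s*$"
)


class PosteriorRow(NamedTuple):
    index: int
    chrom: str
    position: int
    marginal: float
    interaction: float
    total: float


def parse_posterior_rows(lines: Iterable[str]) -> List[PosteriorRow]:
    rows = []
    for line in lines:
        m = ROW.match(line)
        if m is None:
            continue  # header lines, "etc.", and the block-boundary table
        idx, chrom, pos, marg, inter, total = m.groups()
        rows.append(PosteriorRow(int(idx), chrom, int(pos),
                                 float(marg), float(inter), float(total)))
    return rows


if __name__ == "__main__":
    sample = [
        "id:  chr position  marginal + interaction = total posterior",
        "0:   1 738547      0.0000 + 0.0000 = 0.0000",
        "1:   1 5597094     0.0000 + 0.0000 = 0.0000",
        "0:   1 738547      1.000          | 156 93 251 | 169 83 248",
    ]
    for row in parse_posterior_rows(sample):
        print(row)
```

Sorting the parsed rows by `total` would be a natural next step for ranking SNPs.
  -->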

  <requirements>
    <requirement type="package">beam</requirement>
    <requirement type="binary">mv</requirement>
    <requirement type="binary">rm</requirement>
  </requirements>

  <!-- Tests disabled: currently broken, will be fixed soon.
  <tests>
    <test>
      <param name='input' value='gpass_and_beam_input' ftype='lped' >
        <metadata name='base_name' value='gpass_and_beam_input' />
        <composite_data value='gpass_and_beam_input.ped' />
        <composite_data value='gpass_and_beam_input.map' />
        <edit_attributes type='name' value='gpass_and_beam_input' />
      </param>
      <param name="burnin" value="200"/>
      <param name="mcmc" value="200"/>
      <param name="pvalue" value="0.05"/>
      <output name="significance" file="beam_output1.tab"/>
      <output name="posterior" file="beam_output2.tab"/>
    </test>
  </tests>
  -->

  <help>
.. class:: infomark

This tool can take a long time to run, depending on the number of SNPs, the
sample size, and the number of MCMC steps specified.  With hundreds of
thousands of SNPs, a run may take more than a day.  Most of the time is spent
searching for interactions and dynamically partitioning the SNPs into blocks;
these steps have not yet been optimized.  **If you only want to detect SNPs
with primary effects (i.e., single-SNP associations), use the GPASS tool
instead.**

-----

**Dataset formats**

The input dataset must be in lped_ format.  The output datasets are both tabular_.
(`Dataset missing?`_)

.. _lped: ./static/formatHelp.html#lped
.. _tabular: ./static/formatHelp.html#tabular
.. _Dataset missing?: ./static/formatHelp.html

-----

**What it does**

BEAM (Bayesian Epistasis Association Mapping) uses a Markov chain Monte Carlo (MCMC) method to detect both single-marker
and interaction effects from case-control SNP data. At the same time, it infers the SNP block structure by partitioning the SNPs into blocks based on linkage disequilibrium (LD).
Because the method is Bayesian, the outputs are posterior probabilities of association, together with the block partitions.  An advantage of this approach is that it provides uncertainty measures for both the associations and the block partitions, and it scales well from small to large sample sizes.  It is powerful in detecting gene-gene interactions, although it can be slow on large datasets.

-----

**Example**

- input map file::

    1  rs0  0  738547
    1  rs1  0  5597094
    1  rs2  0  9424115
    etc.

- input ped file::

    1 1 0 0 1  1  G G  A A  A A  A A  A A  A G  A A  G G  G G  A A  G G  G G  G G  A A  A A  A G  A A  G G  A G  A G  A A  G G  A A  G G  A A  G G  A G  A A  G G  A A  G G  A A  A G  A G  G G  A G  G G  G G  A A  A G  A A  G G  G G  G G  G G  A G  A A  A A  A A  A A
    1 1 0 0 1  1  G G  A G  G G  A A  A A  A G  A A  G G  G G  G G  A A  G G  A G  A G  G G  G G  A G  G G  A G  A A  G G  A G  G G  A A  G G  G G  A G  A G  G G  A G  A A  A A  G G  G G  A G  A G  G G  A G  A A  A A  A G  G G  A G  G G  A G  G G  G G  A A  G G  A G
    etc.

- first output file, significance.txt::

    ID   chr   position  results
    rs0  chr1  738547    10 20 score= 45.101397 , df= 8 , p= 0.000431 , N=1225

- second output file, posterior.txt::

    id:  chr position  marginal + interaction = total posterior
    0:   1 738547      0.0000 + 0.0000 = 0.0000
    1:   1 5597094     0.0000 + 0.0000 = 0.0000
    2:   1 9424115     0.0000 + 0.0000 = 0.0000
    3:   1 13879818    0.0000 + 0.0000 = 0.0000
    4:   1 13934751    0.0000 + 0.0000 = 0.0000
    5:   1 16803491    0.0000 + 0.0000 = 0.0000
    6:   1 17236854    0.0000 + 0.0000 = 0.0000
    7:   1 18445387    0.0000 + 0.0000 = 0.0000
    8:   1 21222571    0.0000 + 0.0000 = 0.0000
    etc.

    id:  chr position block_boundary  | allele counts in cases and controls
    0:   1 738547      1.000          | 156 93 251 | 169 83 248
    1:   1 5597094     1.000          | 323 19 158 | 328 16 156
    2:   1 9424115     1.000          | 366 6 128 | 369 11 120
    3:   1 13879818    1.000          | 252 31 217 | 278 32 190
    4:   1 13934751    1.000          | 246 64 190 | 224 58 218
    5:   1 16803491    1.000          | 91 160 249 | 91 174 235
    6:   1 17236854    1.000          | 252 43 205 | 249 44 207
    7:   1 18445387    1.000          | 205 66 229 | 217 56 227
    8:   1 21222571    1.000          | 353 9 138 | 352 8 140
    etc.

  The "id" field is an internally used index.

-----

**References**

Zhang Y, Liu JS. (2007)
Bayesian inference of epistatic interactions in case-control studies.
Nat Genet. 39(9):1167-73. Epub 2007 Aug 26.

Zhang Y, Zhang J, Liu JS. (2010)
Block-based Bayesian epistasis association mapping with application to WTCCC type 1 diabetes data.
Submitted.

  </help>
</tool>
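<!--
Sketch: reading the `significance` output. In the help example, each row of significance.txt ends with `score=`, `df=`, `p=`, and `N=` fields. The Python below collects every `name= value` pair from such a row, assuming the comma-separated layout shown in the example. The help does not document the leading numbers (`10 20` in the example), so the sketch does not parse them. The function name `parse_significance_fields` is hypothetical.

```python
import re
from typing import Dict, Union

# "name= value" pairs such as "score= 45.101397" or "N=1225"
FIELD = re.compile(r"(\w+)=\s*([^\s,]+)")


def parse_significance_fields(line: str) -> Dict[str, Union[int, float]]:
    fields: Dict[str, Union[int, float]] = {}
    for name, raw in FIELD.findall(line):
        # Plain digit strings (df, N) become int; everything else becomes float.
        fields[name] = int(raw) if raw.isdigit() else float(raw)
    return fields


if __name__ == "__main__":
    row = ("rs0  chr1  738547    10 20 score= 45.101397 , "
           "df= 8 , p= 0.000431 , N=1225")
    print(parse_significance_fields(row))
```

Once parsed, a row can be compared against a per-test cutoff by reading its `p` entry.
-->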