<!-- /tools/human_genome_variation/beam.xml (https://bitbucket.org/cistrome/cistrome-harvard/) -->
<tool id="hgv_beam" name="BEAM" version="1.0.0">
  <description>significant single- and multi-locus SNP associations in case-control studies</description>

  <command interpreter="bash">
    BEAM2_wrapper.sh map=${input.extra_files_path}/${input.metadata.base_name}.map ped=${input.extra_files_path}/${input.metadata.base_name}.ped $burnin $mcmc $pvalue significance=$significance posterior=$posterior
  </command>

  <inputs>
    <param format="lped" name="input" type="data" label="Dataset"/>
    <param name="burnin" label="Number of MCMC burn-in steps" type="integer" value="200" />
    <param name="mcmc" label="Number of MCMC sampling steps" type="integer" value="200" />
    <param name="pvalue" label="Significance cutoff (after Bonferroni adjustment)" type="float" value="0.05" />
  </inputs>

  <outputs>
    <data format="tabular" name="significance" />
    <data format="tabular" name="posterior" />
  </outputs>

  <requirements>
    <requirement type="package">beam</requirement>
    <requirement type="binary">mv</requirement>
    <requirement type="binary">rm</requirement>
  </requirements>

  <!-- Tests are broken; they will be fixed soon.
  <tests>
    <test>
      <param name='input' value='gpass_and_beam_input' ftype='lped' >
        <metadata name='base_name' value='gpass_and_beam_input' />
        <composite_data value='gpass_and_beam_input.ped' />
        <composite_data value='gpass_and_beam_input.map' />
        <edit_attributes type='name' value='gpass_and_beam_input' />
      </param>
      <param name="burnin" value="200"/>
      <param name="mcmc" value="200"/>
      <param name="pvalue" value="0.05"/>
      <output name="significance" file="beam_output1.tab"/>
      <output name="posterior" file="beam_output2.tab"/>
    </test>
  </tests>
  -->

  <help>
.. class:: infomark

This tool can take a long time to run, depending on the number of SNPs, the
sample size, and the number of MCMC steps specified. With hundreds of
thousands of SNPs, a run may take more than a day. Most of the run time is
spent searching for interactions and dynamically partitioning the SNPs into
blocks; these steps could be optimized, but that has not been done yet.
**If you are only interested in detecting SNPs with primary effects (i.e.,
single-SNP associations), please use the GPASS tool instead.**

-----

**Dataset formats**

The input dataset must be in lped_ format. The output datasets are both tabular_.
(`Dataset missing?`_)

.. _lped: ./static/formatHelp.html#lped
.. _tabular: ./static/formatHelp.html#tabular
.. _Dataset missing?: ./static/formatHelp.html
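
The lped format pairs a .map file with a .ped file laid out in the standard PLINK text format: each .map line holds chromosome, SNP identifier, genetic distance and base-pair position, and each .ped line holds family ID, individual ID, paternal ID, maternal ID, sex and phenotype, followed by two allele columns per SNP in .map order. The Python sketch below splits such lines for a quick sanity check of an input dataset. It assumes whitespace-separated fields; the names MapEntry, PedEntry, parse_map_line and parse_ped_line are hypothetical helpers, not part of this tool or of BEAM.

```python
from typing import NamedTuple


class MapEntry(NamedTuple):
    chrom: str
    snp_id: str
    genetic_distance: float
    position: int


class PedEntry(NamedTuple):
    family_id: str
    individual_id: str
    paternal_id: str
    maternal_id: str
    sex: str
    phenotype: str
    genotypes: list  # one (allele1, allele2) tuple per SNP, in .map order


def parse_map_line(line: str) -> MapEntry:
    """Split one .map line: chromosome, SNP id, genetic distance, bp position."""
    chrom, snp_id, dist, pos = line.split()
    return MapEntry(chrom, snp_id, float(dist), int(pos))


def parse_ped_line(line: str) -> PedEntry:
    """Split one .ped line: six pedigree/phenotype columns, then allele pairs."""
    fields = line.split()
    alleles = fields[6:]
    if len(alleles) % 2:
        raise ValueError("genotype columns must come in allele pairs")
    # Pair consecutive allele columns into one genotype per SNP.
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return PedEntry(*fields[:6], genotypes)


if __name__ == "__main__":
    print(parse_map_line("1 rs0 0 738547"))
    ped = parse_ped_line("1 1 0 0 1 1 G G A A A A")
    print(ped.phenotype, ped.genotypes)  # 1 [('G', 'G'), ('A', 'A'), ('A', 'A')]
```

A real .ped line should contain exactly twice as many allele columns as there are lines in the matching .map file, which is an easy consistency check to add on top of this sketch.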

-----

**What it does**

BEAM (Bayesian Epistasis Association Mapping) uses a Markov chain Monte Carlo
(MCMC) method to infer SNP block structures and to detect both single-marker
and interaction effects from case-control SNP data.

This tool also partitions SNPs into blocks based on linkage disequilibrium (LD).
Because the method is Bayesian, the outputs are posterior probabilities of
association, along with block partitions. One advantage of this approach is
that it provides uncertainty measures for both the associations and the block
partitions, and it scales well from small to large sample sizes. It is
powerful at detecting gene-gene interactions, but slow on large datasets.
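
The two MCMC parameters of this tool (burn-in steps and sampling steps) correspond to the usual way posterior probabilities are estimated from a Markov chain: the first burn-in steps are discarded, and a posterior probability is approximated by the fraction of the remaining sampling steps in which the chain is in the state of interest. The Python sketch below illustrates only that bookkeeping, under the assumption that BEAM reports posteriors in a similar way. The chain is a toy Metropolis sampler over a single 0/1 "associated" indicator with a known target probability; it is not BEAM's sampler, and metropolis_indicator_chain and its parameters are hypothetical.

```python
import random


def metropolis_indicator_chain(target_prob, burnin, mcmc, seed=0):
    """Toy Metropolis sampler over one 0/1 "associated" indicator whose
    stationary probability of being 1 is target_prob (not BEAM's sampler).

    The first `burnin` steps are discarded; the return value is the fraction
    of the next `mcmc` steps in which the indicator equals 1, i.e. the Monte
    Carlo estimate of the posterior probability."""
    if not mcmc > 0 or target_prob >= 1.0 or 0.0 >= target_prob:
        raise ValueError("mcmc must be positive and target_prob strictly between 0 and 1")
    rng = random.Random(seed)
    weight = {0: 1.0 - target_prob, 1: target_prob}  # target distribution
    state = 0  # arbitrary start; burn-in reduces its influence
    hits = 0
    for step in range(burnin + mcmc):
        proposal = 1 - state  # symmetric proposal: flip the indicator
        # Metropolis rule: accept with probability min(1, weight ratio).
        if weight[proposal] / weight[state] > rng.random():
            state = proposal
        if step >= burnin:  # count only post-burn-in samples
            hits += state
    return hits / mcmc


if __name__ == "__main__":
    # With the tool's default of 200 + 200 steps the estimate is noisy;
    # more sampling steps bring it closer to the true value of 0.8.
    for n in (200, 20000):
        print(n, metropolis_indicator_chain(0.8, burnin=200, mcmc=n))
```

The same trade-off applies to the real tool: more sampling steps reduce Monte Carlo noise in the reported posteriors at the cost of a longer run.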

-----

**Example**

- input map file::

    1 rs0 0 738547
    1 rs1 0 5597094
    1 rs2 0 9424115
    etc.

- input ped file::

    1 1 0 0 1 1 G G A A A A A A A A A G A A G G G G A A G G G G G G A A A A A G A A G G A G A G A A G G A A G G A A G G A G A A G G A A G G A A A G A G G G A G G G G G A A A G A A G G G G G G G G A G A A A A A A A A
    1 1 0 0 1 1 G G A G G G A A A A A G A A G G G G G G A A G G A G A G G G G G A G G G A G A A G G A G G G A A G G G G A G A G G G A G A A A A G G G G A G A G G G A G A A A A A G G G A G G G A G G G G G A A G G A G
    etc.

- first output file, significance.txt::

    ID chr position results
    rs0 chr1 738547 10 20 score= 45.101397 , df= 8 , p= 0.000431 , N=1225

- second output file, posterior.txt::

    id: chr position marginal + interaction = total posterior
    0: 1 738547 0.0000 + 0.0000 = 0.0000
    1: 1 5597094 0.0000 + 0.0000 = 0.0000
    2: 1 9424115 0.0000 + 0.0000 = 0.0000
    3: 1 13879818 0.0000 + 0.0000 = 0.0000
    4: 1 13934751 0.0000 + 0.0000 = 0.0000
    5: 1 16803491 0.0000 + 0.0000 = 0.0000
    6: 1 17236854 0.0000 + 0.0000 = 0.0000
    7: 1 18445387 0.0000 + 0.0000 = 0.0000
    8: 1 21222571 0.0000 + 0.0000 = 0.0000
    etc.

    id: chr position block_boundary | allele counts in cases and controls
    0: 1 738547 1.000 | 156 93 251 | 169 83 248
    1: 1 5597094 1.000 | 323 19 158 | 328 16 156
    2: 1 9424115 1.000 | 366 6 128 | 369 11 120
    3: 1 13879818 1.000 | 252 31 217 | 278 32 190
    4: 1 13934751 1.000 | 246 64 190 | 224 58 218
    5: 1 16803491 1.000 | 91 160 249 | 91 174 235
    6: 1 17236854 1.000 | 252 43 205 | 249 44 207
    7: 1 18445387 1.000 | 205 66 229 | 217 56 227
    8: 1 21222571 1.000 | 353 9 138 | 352 8 140
    etc.

The "id" field is an internally used index.
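
To post-process the posterior output outside Galaxy, its two tables can be separated by their delimiters: the first table uses "marginal + interaction = total", the second "block_boundary | case counts | control counts". The Python sketch below assumes exactly the line layouts shown in the example above; parse_posterior_output is a hypothetical helper, and any line that does not fit either layout (headers, "etc.", or other content a real output file might contain) is skipped rather than interpreted.

```python
import math


def parse_posterior_output(lines):
    """Split the posterior output (layout assumed from the example) into
    its two tables.

    Lines with '|' are read as 'id: chr position block_boundary | case
    counts | control counts'; lines with '=' and '+' as 'id: chr position
    marginal + interaction = total'. Header lines ('id: ...') and any line
    that does not match these layouts are skipped."""
    posteriors, blocks = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("id:"):
            continue
        try:
            if "|" in line:
                head, cases, controls = line.split("|")
                idx, chrom, pos, boundary = head.split()
                blocks.append({
                    "id": int(idx.rstrip(":")),
                    "chr": chrom,
                    "position": int(pos),
                    "block_boundary": float(boundary),
                    "cases": [int(x) for x in cases.split()],
                    "controls": [int(x) for x in controls.split()],
                })
            elif "=" in line and "+" in line:
                left, total = line.split("=")
                idx, chrom, pos, marginal, interaction = left.replace("+", " ").split()
                m, i, t = float(marginal), float(interaction), float(total)
                posteriors.append({
                    "id": int(idx.rstrip(":")),
                    "chr": chrom,
                    "position": int(pos),
                    "marginal": m,
                    "interaction": i,
                    "total": t,
                    # Values are printed to 4 decimals, so allow rounding slack.
                    "sum_matches_total": math.isclose(m + i, t, abs_tol=1e-3),
                })
        except ValueError:
            continue  # line does not fit either assumed layout
    return posteriors, blocks


if __name__ == "__main__":
    example = [
        "id: chr position marginal + interaction = total posterior",
        "0: 1 738547 0.0000 + 0.0000 = 0.0000",
        "etc.",
        "id: chr position block_boundary | allele counts in cases and controls",
        "0: 1 738547 1.000 | 156 93 251 | 169 83 248",
    ]
    posteriors, blocks = parse_posterior_output(example)
    print(posteriors[0]["total"], blocks[0]["cases"])  # 0.0 [156, 93, 251]
```

The sum_matches_total flag is a cheap sanity check that the marginal and interaction posteriors add up to the reported total, as the header line states.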

-----

**References**

Zhang Y, Liu JS. (2007) Bayesian inference of epistatic interactions in case-control studies. Nat Genet. 39(9):1167-73. Epub 2007 Aug 26.

Zhang Y, Zhang J, Liu JS. (2010) Block-based Bayesian epistasis association mapping with application to WTCCC type 1 diabetes data. Submitted.

  </help>
</tool>