/tools/human_genome_variation/ldtools.xml

https://bitbucket.org/cistrome/cistrome-harvard/ · XML · 111 lines · 89 code · 22 blank · 0 comment · 0 complexity · a1b8534ad59a9f6d77172b583134805f MD5 · raw file

  1. <tool id="hgv_ldtools" name="LD" version="1.0.0">
  2. <description>linkage disequilibrium and tag SNPs</description>
  3. <command interpreter="bash">
  4. ldtools_wrapper.sh rsquare=$rsquare freq=$freq input=$input output=$output
  5. </command>
  6. <inputs>
  7. <param format="tabular" name="input" type="data" label="Dataset"/>
  8. <param name="rsquare" label="r&lt;sup&gt;2&lt;/sup&gt; threshold" type="float" value="0.64">
  9. <validator type="in_range" message="rsquare must be in range [0.00, 1.00]" min="0.00" max="1.00" />
  10. </param>
  11. <param name="freq" label="Minimum allele frequency threshold" type="float" value="0.00">
  12. <validator type="in_range" message="freq must be in range (0.00, 0.50]" min="0.00" max="0.50" />
  13. </param>
  14. </inputs>
  15. <outputs>
  16. <data format="tabular" name="output" />
  17. </outputs>
  18. <tests>
  19. <test>
  20. <param name="input" value="ldInput1.txt" />
  21. <param name="rsquare" value="0.64" />
  22. <param name="freq" value="0.00" />
  23. <output name="output" file="ldOutput1.txt" />
  24. </test>
  25. </tests>
  26. <help>
  27. **Dataset formats**
  28. The input and output datasets are tabular_.
  29. (`Dataset missing?`_)
  30. .. _tabular: ./static/formatHelp.html#tab
  31. .. _Dataset missing?: ./static/formatHelp.html
  32. -----
  33. **What it does**
  34. This tool can be used to analyze the patterns of linkage disequilibrium
  35. (LD) between polymorphic sites in a locus. SNPs are grouped based on the
  36. threshold level of LD as measured by r\ :sup:`2` (regardless of genomic
  37. position), and a representative "tag SNP" is reported for each group.
  38. The other SNPs in the group are in LD with the tag SNP, but not necessarily
  39. with each other.
  40. The underlying algorithm is the same as the one used in ldSelect (Carlson
  41. et al. 2004). However, this tool is implemented to be much faster and more
  42. efficient than ldSelect.
  43. The input is a tabular file with genotype information for each individual
  44. at each SNP site, in exactly four columns: site ID, sample ID, and the
  45. two allele nucleotides.
  46. -----
  47. **Example**
  48. - input file::
  49. rs2334386 NA20364 G T
  50. rs2334386 NA20363 G G
  51. rs2334386 NA20360 G G
  52. rs2334386 NA20359 G G
  53. rs2334386 NA20358 G G
  54. rs2334386 NA20356 G G
  55. rs2334386 NA20357 G G
  56. rs2334386 NA20350 G G
  57. rs2334386 NA20349 G G
  58. rs2334386 NA20348 G G
  59. rs2334386 NA20347 G G
  60. rs2334386 NA20346 G G
  61. rs2334386 NA20345 G G
  62. rs2334386 NA20344 G G
  63. rs2334386 NA20342 G G
  64. etc.
  65. - output file::
  66. rs2238748 rs2793064,rs6518516,rs6518517,rs2283641,rs5993533,rs715590,rs2072123,rs2105421,rs2800954,rs1557847,rs807750,rs807753,rs5993488,rs8138035,rs2800980,rs2525079,rs5992353,rs712966,rs2525036,rs807743,rs1034727,rs807744,rs2074003
  67. rs2871023 rs1210715,rs1210711,rs5748189,rs1210709,rs3788298,rs7284649,rs9306217,rs9604954,rs1210703,rs5748179,rs5746727,rs5748190,rs5993603,rs2238766,rs885981,rs2238763,rs5748165,rs9605996,rs9606001,rs5992398
  68. rs7292006 rs13447232,rs5993665,rs2073733,rs1057457,rs756658,rs5992395,rs2073760,rs739369,rs9606017,rs739370,rs4493360,rs2073736
  69. rs2518840 rs1061325,rs2283646,rs362148,rs1340958,rs361956,rs361991,rs2073754,rs2040771,rs2073740,rs2282684
  70. rs2073775 rs10160,rs2800981,rs807751,rs5993492,rs2189490,rs5747997,rs2238743
  71. rs5747263 rs12159924,rs2300688,rs4239846,rs3747025,rs3747024,rs3747023,rs2300691
  72. rs433576 rs9605439,rs1109052,rs400509,rs401099,rs396012,rs410456,rs385105
  73. rs2106145 rs5748131,rs2013516,rs1210684,rs1210685,rs2238767,rs2277837
  74. rs2587082 rs2257083,rs2109659,rs2587081,rs5747306,rs2535704,rs2535694
  75. rs807667 rs2800974,rs756651,rs762523,rs2800973,rs1018764
  76. rs2518866 rs1206542,rs807467,rs807464,rs807462,rs712950
  77. rs1110661 rs1110660,rs7286607,rs1110659,rs5992917,rs1110662
  78. rs759076 rs5748760,rs5748755,rs5748752,rs4819925,rs933461
  79. rs5746487 rs5992895,rs2034113,rs2075455,rs1867353
  80. rs5748212 rs5746736,rs4141527,rs5748147,rs5748202
  81. etc.
  82. -----
  83. **Reference**
  84. Carlson CS, Eberle MA, Rieder MJ, Yi Q, Kruglyak L, Nickerson DA. (2004)
  85. Selecting a maximally informative set of single-nucleotide polymorphisms for
  86. association analyses using linkage disequilibrium.
  87. Am J Hum Genet. 74(1):106-20. Epub 2003 Dec 15.
  88. </help>
  89. </tool>