/tools/samtools/pileup_interval.xml

https://bitbucket.org/cistrome/cistrome-harvard/

<tool id="pileup_interval" name="Pileup-to-Interval" version="1.0.0">
  <description>condenses pileup format into ranges of bases</description>
  <requirements>
    <requirement type="package">samtools</requirement>
  </requirements>
  <command interpreter="python">
    pileup_interval.py
      --input=$input
      --output=$output
      --coverage=$coverage
      --format=$format_type.format
      #if $format_type.format == "ten":
        --base=$format_type.which_base
        --seq_column="None"
        --loc_column="None"
        --base_column="None"
        --cvrg_column="None"
      #elif $format_type.format == "manual":
        --base="None"
        --seq_column=$format_type.seq_column
        --loc_column=$format_type.loc_column
        --base_column=$format_type.base_column
        --cvrg_column=$format_type.cvrg_column
      #else:
        --base="None"
        --seq_column="None"
        --loc_column="None"
        --base_column="None"
        --cvrg_column="None"
      #end if
  </command>
  <inputs>
    <param name="input" type="data" format="tabular" label="Choose a pileup file to condense:" />
    <conditional name="format_type">
      <param name="format" type="select" label="which contains:" help="See &quot;Types of pileup datasets&quot; below for examples">
        <option value="six" selected="true">Pileup with six columns (simple)</option>
        <option value="ten">Pileup with ten columns (with consensus)</option>
        <option value="manual">Set columns manually</option>
      </param>
      <when value="six" />
      <when value="ten">
        <param name="which_base" type="select" label="Which base do you want to concatenate">
          <option value="first" selected="true">Reference base (first)</option>
          <option value="second">Consensus base (second)</option>
        </param>
      </when>
      <when value="manual">
        <param name="seq_column" label="Select column with sequence name" type="data_column" numerical="false" data_ref="input" />
        <param name="loc_column" label="Select column with base location" type="data_column" numerical="false" data_ref="input" />
        <param name="base_column" label="Select column with base to concatenate" type="data_column" numerical="false" data_ref="input" />
        <param name="cvrg_column" label="Select column with coverage" type="data_column" numerical="true" data_ref="input" />
      </when>
    </conditional>
    <param name="coverage" type="integer" value="3" label="Do not report bases with coverage less than:" />
  </inputs>
  <outputs>
    <data format="tabular" name="output" />
  </outputs>
  <tests>
    <test>
      <param name="input" value="pileup_interval_in1.tabular" />
      <param name="format" value="six" />
      <param name="coverage" value="3" />
      <output name="output" file="pileup_interval_out1.tabular" />
    </test>
    <test>
      <param name="input" value="pileup_interval_in2.tabular" />
      <param name="format" value="ten" />
      <param name="which_base" value="first" />
      <param name="coverage" value="3" />
      <output name="output" file="pileup_interval_out2.tabular" />
    </test>
    <test>
      <param name="input" value="pileup_interval_in2.tabular" />
      <param name="format" value="manual" />
      <param name="seq_column" value="1" />
      <param name="loc_column" value="2" />
      <param name="base_column" value="3" />
      <param name="cvrg_column" value="8" />
      <param name="coverage" value="3" />
      <output name="output" file="pileup_interval_out2.tabular" />
    </test>
  </tests>
  <help>

**What it does**

Reduces the size of a result set by taking a pileup file and producing a condensed version that lists runs of consecutive bases meeting the coverage criterion. The tool works on the six- and ten-column pileup formats produced by the *samtools pileup* command. If your dataset has a different layout, you can set the column assignments manually.

--------

**Types of pileup datasets**

The description of the pileup format is largely based on the SAMTools_ documentation page. Both the 6- and 10-column variants are described below.

.. _SAMTools: http://samtools.sourceforge.net/pileup.shtml

**Six column pileup**::

    1    2  3  4        5        6
 ---------------------------------
 chrM  412  A  2       .,       II
 chrM  413  G  4     ..t,     IIIH
 chrM  414  C  4     ...a     III2
 chrM  415  C  4     TTTt     III7

where::

 Column Definition
 ------ ----------------------------
      1 Chromosome
      2 Position (1-based)
      3 Reference base at that position
      4 Coverage (# reads aligning over that position)
      5 Bases within reads (see Galaxy wiki for more info)
      6 Quality values (phred33 scale, see Galaxy wiki for more)

**Ten column pileup**

The `ten-column`__ pileup incorporates additional consensus information generated with the *-c* option of the *samtools pileup* command::

    1    2  3  4   5   6   7   8       9       10
 ------------------------------------------------
 chrM  412  A  A  75   0  25  2       .,       II
 chrM  413  G  G  72   0  25  4     ..t,     IIIH
 chrM  414  C  C  75   0  25  4     ...a     III2
 chrM  415  C  T  75  75  25  4     TTTt     III7

where::

 Column Definition
 ------- ----------------------------
       1 Chromosome
       2 Position (1-based)
       3 Reference base at that position
       4 Consensus bases
       5 Consensus quality
       6 SNP quality
       7 Maximum mapping quality
       8 Coverage (# reads aligning over that position)
       9 Bases within reads (see Galaxy wiki for more info)
      10 Quality values (phred33 scale, see Galaxy wiki for more)

.. __: http://samtools.sourceforge.net/cns0.shtml

------

**The output format**

The output file condenses the information in the pileup file so that consecutive bases are listed together as sequences. The start and end of each range are listed, with the start converted to a 0-based value.

Given the following input with minimum coverage set to 3::

    1    2  3  4        5        6
 ---------------------------------
 chr1  112  G  3     ..Ta     III6
 chr1  113  T  2     aT..     III5
 chr1  114  A  5     ,,..     IIH2
 chr1  115  C  4      ,.,      III
 chrM  412  A  2       .,       II
 chrM  413  G  4     ..t,     IIIH
 chrM  414  C  4     ...a     III2
 chrM  415  C  4     TTTt     III7
 chrM  490  T  3        a        I

the following would be the output::

    1    2    3  4
 -------------------
 chr1  111  112  G
 chr1  113  115  AC
 chrM  412  415  GCC
 chrM  489  490  T

where::

 Column Definition
 ------- ----------------------------
       1 Chromosome
       2 Starting position (0-based)
       3 Ending position (1-based)
       4 Sequence of bases

  </help>
</tool>
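
The condensing step described under "The output format" can be sketched in Python. This is only an illustration and not the contents of pileup_interval.py, which is not shown on this page. The function name `condense_pileup` and its parameters are hypothetical, and the sketch assumes six-column input that is already sorted by chromosome and position.

```python
def condense_pileup(rows, min_coverage=3):
    """Yield (chrom, start0, end1, bases) for runs of consecutive covered positions.

    Hypothetical helper: assumes whitespace-separated six-column pileup rows
    (chromosome, 1-based position, reference base, coverage, ...).
    """
    current = None  # [chrom, 0-based start, 1-based end, concatenated bases]
    for row in rows:
        fields = row.split()
        chrom, pos, base, cov = fields[0], int(fields[1]), fields[2], int(fields[3])
        if cov < min_coverage:
            continue  # a skipped position also breaks the run, since end + 1 != next pos
        if current and current[0] == chrom and current[2] + 1 == pos:
            # Same chromosome and adjacent position: extend the open run.
            current[2] = pos
            current[3] += base
        else:
            # Start a new run, emitting the previous one first if there is one.
            if current:
                yield tuple(current)
            current = [chrom, pos - 1, pos, base]
    if current:
        yield tuple(current)


# The example input from the help text above.
sample = [
    "chr1 112 G 3 ..Ta III6",
    "chr1 113 T 2 aT.. III5",
    "chr1 114 A 5 ,,.. IIH2",
    "chr1 115 C 4 ,., III",
    "chrM 412 A 2 ., II",
    "chrM 413 G 4 ..t, IIIH",
    "chrM 414 C 4 ...a III2",
    "chrM 415 C 4 TTTt III7",
    "chrM 490 T 3 a I",
]

result = list(condense_pileup(sample, min_coverage=3))
for chrom, start, end, bases in result:
    print(f"{chrom}\t{start}\t{end}\t{bases}")
```

With the sample rows and a minimum coverage of 3, the sketch yields the same four intervals as the example output in the help text. Handling of the ten-column and manual column layouts is left out to keep the example short.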