/tools/filters/trimmer.xml

https://bitbucket.org/cistrome/cistrome-harvard/

<tool id="trimmer" name="Trim" version="0.0.1">
  <description>leading or trailing characters</description>
  <command interpreter="python">
    trimmer.py -a -f $input1 -c $col -s $start -e $end -i $ignore $fastq > $out_file1
  </command>
  <inputs>
    <param format="tabular,txt" name="input1" type="data" label="this dataset"/>
    <param name="col" type="integer" value="0" label="Trim this column only" help="0 = process entire line" />
    <param name="start" type="integer" size="10" value="1" label="Trim from the beginning up to this position" help="Only positive positions allowed. 1 = do not trim the beginning"/>
    <param name="end" type="integer" size="10" value="0" label="Remove everything from this position to the end" help="Use negative position to indicate position starting from the end. 0 = do not trim the end"/>
    <param name="fastq" type="select" label="Is input dataset in fastq format?" help="If set to YES, the tool will not trim evenly numbered lines (0, 2, 4, etc...). This allows for trimming the seq and qual lines, only if they are not spread over multiple lines (see warning below).">
      <option selected="true" value="">No</option>
      <option value="-q">Yes</option>
    </param>
    <param name="ignore" type="select" display="checkboxes" multiple="True" label="Ignore lines beginning with these characters" help="lines beginning with these are not trimmed">
      <option value="62">&gt;</option>
      <option value="64">@</option>
      <option value="43">+</option>
      <option value="60">&lt;</option>
      <option value="42">*</option>
      <option value="45">-</option>
      <option value="61">=</option>
      <option value="124">|</option>
      <option value="63">?</option>
      <option value="36">$</option>
      <option value="46">.</option>
      <option value="58">:</option>
      <option value="38">&amp;</option>
      <option value="37">%</option>
      <option value="94">^</option>
      <option value="35">&#35;</option>
    </param>
  </inputs>
  <outputs>
    <data name="out_file1" format="input" metadata_source="input1"/>
  </outputs>
  <tests>
    <test>
      <param name="input1" value="trimmer_tab_delimited.dat"/>
      <param name="col" value="0"/>
      <param name="start" value="1"/>
      <param name="end" value="13"/>
      <param name="ignore" value="62"/>
      <param name="fastq" value="No"/>
      <output name="out_file1" file="trimmer_a_f_c0_s1_e13_i62.dat"/>
    </test>
    <test>
      <param name="input1" value="trimmer_tab_delimited.dat"/>
      <param name="col" value="2"/>
      <param name="start" value="1"/>
      <param name="end" value="2"/>
      <param name="ignore" value="62"/>
      <param name="fastq" value="No"/>
      <output name="out_file1" file="trimmer_a_f_c2_s1_e2_i62.dat"/>
    </test>
    <test>
      <param name="input1" value="trimmer_tab_delimited.dat"/>
      <param name="col" value="2"/>
      <param name="start" value="2"/>
      <param name="end" value="-2"/>
      <param name="ignore" value="62"/>
      <param name="fastq" value="No"/>
      <output name="out_file1" file="trimmer_a_f_c2_s2_e-2_i62.dat"/>
    </test>
  </tests>
  <help>

**What it does**

Trims a specified number of characters from each line of a dataset, or from a single field if the dataset is tab-delimited.

-----

**Example 1**

Trimming this dataset::

  1234567890
  abcdefghijk

by setting **Trim from the beginning up to this position** to *2* and **Remove everything from this position to the end** to *6* will produce::

  23456
  bcdef

-----

**Example 2**

Trimming column 2 of this dataset::

  abcde 12345 fghij 67890
  fghij 67890 abcde 12345

by setting **Trim this column only** to *2*, **Trim from the beginning up to this position** to *2*, and **Remove everything from this position to the end** to *4* will produce::

  abcde 234 fghij 67890
  fghij 789 abcde 12345

-----

**Example 3**

Trimming column 2 of this dataset::

  abcde 12345 fghij 67890
  fghij 67890 abcde 12345

by setting **Trim this column only** to *2*, **Trim from the beginning up to this position** to *2*, and **Remove everything from this position to the end** to *-2* will produce::

  abcde 23 fghij 67890
  fghij 78 abcde 12345

-----

**Trimming FASTQ datasets**

This tool can be used to trim sequences and quality strings in fastq datasets. This is done by selecting *Yes* from the **Is input dataset in fastq format?** dropdown. If set to *Yes*, the tool will skip all even-numbered lines, counting from 0 (see warning below). For example, trimming the last 5 bases of this dataset::

  @081017-and-081020:1:1:1715:1759
  GGACTCAGATAGTAATCCACGCTCCTTTAAAATATC
  +
  II#IIIIIII$5+.(9IIIIIII$%*$G$A31I&amp;&amp;B

can be done by setting **Remove everything from this position to the end** to *31*::

  @081017-and-081020:1:1:1715:1759
  GGACTCAGATAGTAATCCACGCTCCTTTAAA
  +
  II#IIIIIII$5+.(9IIIIIII$%*$G$A3

**Note** that the header lines are skipped and left unchanged.

.. class:: warningmark

**WARNING:** This tool works only on properly formatted fastq datasets in which (1) each read and each quality string occupies a single line and (2) the '@' (read header) and '+' (quality header) lines are even-numbered, as in the example above.

  </help>
</tool>
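
The trimming rules described in the help text can be sketched in Python as follows. This is not the code of trimmer.py, which is not shown here. The function names `trim_text`, `trim_line`, and `trim_lines` are hypothetical, and the slicing rule is an assumption inferred from the three examples and the FASTQ example in the help. Ignored characters are passed directly as a string rather than as the ASCII codes used by the `ignore` option values.

```python
def trim_text(text, start=1, end=0):
    """Keep the characters from 1-based position `start` through `end`.

    Assumed semantics, inferred from the help examples:
    end == 0 keeps everything up to the end of the text,
    a negative end counts positions back from the end.
    """
    if start < 1:
        raise ValueError("start must be a positive position")
    if end == 0:
        return text[start - 1:]
    return text[start - 1:end]


def trim_line(line, col=0, start=1, end=0):
    """Trim the whole line (col == 0) or one tab-delimited field (1-based col)."""
    if col == 0:
        return trim_text(line, start, end)
    fields = line.split("\t")
    fields[col - 1] = trim_text(fields[col - 1], start, end)
    return "\t".join(fields)


def trim_lines(lines, col=0, start=1, end=0, ignore="", fastq=False):
    """Trim each line, leaving ignored lines and FASTQ header lines unchanged.

    `ignore` is a string of leading characters that protect a line.
    With fastq=True, lines 0, 2, 4, ... (counting from 0) are skipped.
    """
    result = []
    for index, line in enumerate(lines):
        is_header = fastq and index % 2 == 0
        is_ignored = bool(line) and line[0] in ignore
        if is_header or is_ignored:
            result.append(line)
        else:
            result.append(trim_line(line, col, start, end))
    return result
```

Under these assumptions, `trim_line("1234567890", start=2, end=6)` gives `23456` as in Example 1, and calling `trim_lines` on the four FASTQ lines with `end=31` and `fastq=True` leaves the `@` and `+` lines unchanged while shortening the sequence and quality lines to 31 characters.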