/tools/metag_tools/short_reads_trim_seq.xml

https://bitbucket.org/ialbert/galaxy-genetrack

<tool id="trim_reads" name="Select high quality segments" version="1.0.0">
    <description>from short reads</description>
    <command interpreter="python">
        short_reads_trim_seq.py $trim $length $output1 $input1 $input2 $sequencing_method_choice.input3
    </command>
    <inputs>
        <page>
            <param name="input1" type="data" format="fasta,txtseq.zip" label="Reads" />
            <param name="input2" type="data" format="qualsolexa,qual454,txtseq.zip" label="Quality scores" />
            <param name="trim" type="integer" size="5" value="20" label="Minimal quality score" help="bases scoring below this value will trigger splitting"/>
            <param name="length" type="integer" size="5" value="100" label="Minimal length of contiguous segment" help="report all high quality segments above this length. Setting this option to '0' will cause the program to return a single longest run of high quality bases per read" />
            <conditional name="sequencing_method_choice">
                <param name="sequencer" type="select" label="Select technology">
                    <option value="454">Roche (454) or ABI SOLiD</option>
                    <option value="Solexa">Illumina (Solexa)</option>
                </param>
                <when value="454">
                    <param name="input3" type="select" label="Low quality bases in homopolymers" help="if set to 'DO NOT trigger splitting' the program will not count low quality bases that are within or adjacent to homonucleotide runs. This will significantly reduce fragmentation of 454 data">
                        <option value="yes">DO NOT trigger splitting </option>
                        <option value="no">trigger splitting</option>
                    </param>
                </when>
                <when value="Solexa">
                    <param name="input3" type="integer" size="5" value="0" label="Restrict length of each read to" help="('0' = do not trim) The quality of Solexa reads drops towards the end. This option allows selecting the specified number of nucleotides from the beginning and then running the tool." />
                </when>
            </conditional>
        </page>
    </inputs>
    <outputs>
        <data name="output1" format="fasta" />
    </outputs>
    <tests>
        <test>
            <param name="sequencer" value="454" />
            <param name="input1" value="454.fasta" ftype="fasta" />
            <param name="input2" value="454.qual" ftype="qual454" />
            <param name="input3" value="no" />
            <param name="trim" value="20" />
            <param name="length" value="0" />
            <output name="output1" file="short_reads_trim_seq_out1.fasta" />
        </test>
        <test>
            <param name="sequencer" value="Solexa" />
            <param name="input1" value="solexa.fasta" ftype="fasta" />
            <param name="input2" value="solexa.qual" ftype="qualsolexa" />
            <param name="input3" value="0" />
            <param name="trim" value="20" />
            <param name="length" value="0" />
            <output name="output1" file="short_reads_trim_seq_out2.fasta" />
        </test>
    </tests>
    <help>

.. class:: warningmark

To use this tool your quality score dataset needs to be in *Quality Score* format. Click the pencil icon next to your dataset to set the datatype to *Quality Score*.

-----

**What it does**

This tool finds high quality segments within sequencing reads generated by Roche (454), Illumina (Solexa), or ABI SOLiD machines.

-----

**Example**

Suppose this is your sequencing read::

  5'---------*-------------*------**----3'

where **dashes** (-) are HIGH quality bases (above 20) and **asterisks** (*) are LOW quality bases (below 20). If the **Minimal length of contiguous segment** is set to **5** (of course, only for the purposes of this example), the tool will return::

  5'---------
  -------------
  -------

You can see that the tool simply splits the read on low quality bases and then returns all segments longer than 5. **Note** that the output of this tool will likely contain a higher number of shorter sequences than the original input. If we set the **Minimal length of contiguous segment** to **0**, the tool will only return the single longest segment::

  -------------

    </help>
</tool>
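
The splitting behaviour described in the help text can be sketched in a few lines of Python. This is only an illustration, not the code of short_reads_trim_seq.py, which is not shown here. The function name `high_quality_segments` and its parameters are hypothetical, and the sketch assumes that a segment is kept when its length is at least the minimal length. It does not model the homopolymer option for 454 data or the read-length restriction for Solexa data.

```python
def high_quality_segments(seq, quals, min_quality=20, min_length=100):
    """Split a read on low-quality bases and return the high-quality segments.

    Hypothetical helper for illustration only; not the tool's own code.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have the same length")

    segments = []
    current = []
    for base, q in zip(seq, quals):
        if q < min_quality:
            # A low-quality base ends the current segment and is itself dropped.
            if current:
                segments.append("".join(current))
                current = []
        else:
            current.append(base)
    if current:
        segments.append("".join(current))

    if min_length == 0:
        # '0' means: return only the single longest segment (first wins on ties).
        return [max(segments, key=len)] if segments else []
    # Assumption: "minimal length" is inclusive (>=).
    return [s for s in segments if len(s) >= min_length]


# The read from the help text: '-' is high quality (30 here), '*' is low (10).
pattern = "---------*-------------*------**----"
read = "A" * len(pattern)
quals = [30 if c == "-" else 10 for c in pattern]

print([len(s) for s in high_quality_segments(read, quals, 20, 5)])  # [9, 13, 6]
print([len(s) for s in high_quality_segments(read, quals, 20, 0)])  # [13]
```

With a minimal length of 5 the trailing run of four bases is dropped, and with 0 only the 13-base run is returned, which matches the example in the help text.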