/tools/metag_tools/short_reads_figure_score.xml

https://bitbucket.org/cistrome/cistrome-harvard/

<tool id="quality_score_distribution" name="Build base quality distribution" version="1.0.2">
<description></description>

<command interpreter="python">short_reads_figure_score.py $input1 $output1 </command>

<inputs>
<page>
    <param name="input1" type="data" format="qualsolexa, qual454" label="Quality score file" help="No dataset? Read tip below"/>
</page>
</inputs>

<outputs>
    <data name="output1" format="png" />
</outputs>
<requirements>
    <requirement type="python-module">rpy</requirement>
</requirements>
<tests>
    <test>
        <param name="input1" value="solexa.qual" ftype="qualsolexa" />
        <output name="output1" file="solexaScore.png" ftype="png" />
    </test>
    <test>
        <param name="input1" value="454.qual" ftype="qual454" />
        <output name="output1" file="454Score.png" ftype="png" />
    </test>
</tests>
<help>

.. class:: warningmark

To use this tool, your dataset needs to be in the *Quality Score* format. Click the pencil icon next to your dataset to set the datatype to *Quality Score* (see below for examples).

-----

**What it does**

This tool takes Quality Files generated by Roche (454), Illumina (Solexa), or ABI SOLiD machines and builds a graph showing the score distribution, like the one below. Such a graph allows you to perform an initial evaluation of data quality in a single pass.

-----

**Examples of Quality Data**

Roche (454) or ABI SOLiD data::

    &gt;seq1
    23 33 34 25 28 28 28 32 23 34 27 4 28 28 31 21 28

Illumina (Solexa) data::

    -40 -40 40 -40 -40 -40 -40 40

-----

**Output example**

Quality scores are summarized as a boxplot (Roche 454 FLX data):

.. image:: ${static_path}/images/short_reads_boxplot.png

where the **X-axis** is the coordinate along the read and the **Y-axis** is the quality score adjusted to comply with the Phred score metric. Units on the X-axis depend on whether your data comes from Roche (454) or from Illumina (Solexa) and ABI SOLiD machines:

  - For Roche (454), the X-axis (shown above) indicates **relative** position (in %) within reads, as this technology produces reads of different lengths;
  - For Illumina (Solexa) and ABI SOLiD, the X-axis shows **absolute** position in nucleotides within reads.

Every box on the plot shows the following values::

       o     &lt;---- Outliers
       o
      -+-    &lt;---- Upper Extreme Value that is no more
       |           than one box length away from the box
       |
    +--+--+  &lt;---- Upper Quartile
    |     |
    +-----+  &lt;---- Median
    |     |
    +--+--+  &lt;---- Lower Quartile
       |
       |
      -+-    &lt;---- Lower Extreme Value that is no more
                   than one box length away from the box
       o     &lt;---- Outlier

</help>
</tool>
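
The two ideas described in the help text above, relative-position binning for variable-length Roche (454) reads and the box-and-whisker summary, can be sketched in Python. This is a minimal illustration only and is not the code of short_reads_figure_score.py, which is not shown here. The function names `relative_bin`, `bin_scores`, and `box_stats`, the default of 100 bins, the choice of quartile method, and the whisker reach of one box length (read off the diagram's wording) are all assumptions.

```python
from statistics import quantiles


def relative_bin(index, read_length, n_bins=100):
    """Map a 0-based position in a read to a relative-position bin.

    Hypothetical helper: with n_bins=100 the bin is the position in percent.
    """
    return index * n_bins // read_length


def bin_scores(reads, n_bins=100):
    """Group scores from reads of different lengths by relative position."""
    bins = [[] for _ in range(n_bins)]
    for scores in reads:
        for i, score in enumerate(scores):
            bins[relative_bin(i, len(scores), n_bins)].append(score)
    return bins


def box_stats(values, whisker=1.0):
    """Return the values that one box of the plot displays.

    Assumption: whiskers reach at most `whisker` box lengths (interquartile
    ranges) beyond the box; anything further out is reported as an outlier.
    Needs at least two values, as required by statistics.quantiles.
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    reach = whisker * (q3 - q1)
    inside = [v for v in values if q1 - reach <= v <= q3 + reach]
    return {
        "lower_extreme": min(inside),
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "upper_extreme": max(inside),
        "outliers": sorted(v for v in values if v < q1 - reach or v > q3 + reach),
    }


# The Roche (454) example scores from the help text.
seq1 = [23, 33, 34, 25, 28, 28, 28, 32, 23, 34, 27, 4, 28, 28, 31, 21, 28]
stats = box_stats(seq1)
```

For the `seq1` scores, the single very low score of 4 falls more than one box length below the lower quartile, so under these assumptions it is reported as an outlier rather than as the lower extreme value.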