/tools/rgenetics/rgQQ.xml

https://bitbucket.org/cistrome/cistrome-harvard/

<tool id="rgQQ1" name="QQ Plots:">
  <code file="rgQQ_code.py"/>
  <description>for p values from an analysis</description>
  <command interpreter="python">
    rgQQ.py "$input1" "$title" "$sample" "$cols" "$allqq" "$height" "$width" "$logtrans" "$allqq.id" "$__new_file_path__"
  </command>
  <inputs>
    <page>
      <param name="input1" type="data" label="Choose the History dataset containing p values to QQ plot"
             size="80" format="tabular" help="Dataset missing? See Tip below" />
      <param name="title" type="text" size="80" label="Descriptive title for QQ plot" value="QQ" />
      <param name="logtrans" type="boolean" label="Use a log scale - recommended for p values in range 0-1.0"
             truevalue="true" falsevalue="false"/>
      <param name="sample" type="float" label="Random sample fraction - set to 1.0 for all data points" value="0.01"
             help="If you have a million values, the QQ plots will be huge - a random sample of 1% will be fine" />
      <param name="height" type="integer" label="PDF image height (inches)" value="6" />
      <param name="width" type="integer" label="PDF image width (inches)" value="6" />
    </page>
    <page>
      <param name="cols" type="select" display="checkboxes" multiple="True"
             help="Choose from these numeric columns in the data file to make a quantile-quantile plot against a uniform distribution"
             label="Columns (p values 0-1 eg) to make QQ plots" dynamic_options="get_columns( input1 )" />
    </page>
  </inputs>
  <outputs>
    <data format="pdf" name="allqq" label="${title}.pdf"/>
  </outputs>
  <tests>
    <test>
      <param name='input1' value='tinywga.pphe' />
      <param name='title' value="rgQQtest1" />
      <param name='logtrans' value="false" />
      <param name='sample' value='1.0' />
      <param name='height' value='8' />
      <param name='width' value='10' />
      <param name='cols' value='3' />
      <output name='allqq' file='rgQQtest1.pdf' ftype='binary' compare="diff" lines_diff="29"/>
    </test>
  </tests>
  <help>

.. class:: infomark

**Explanation**

A quantile-quantile (QQ) plot is a good way to see systematic departures from the null expectation of uniformly distributed p values
from a genomic analysis. If the QQ plot shows a departure from the null (i.e. a uniform 0-1 distribution), you hope that it will be
in the very smallest p values, suggesting that there might be some interesting results to look at. A log scale makes departures
from the null at low p values clearer.

-----

.. class:: infomark

**Syntax**

This tool has 2 pages. On the first page you choose the data set and output options; on the second page, the
column names are shown so you can choose the columns containing the p values you wish to plot.

- **History data** is one of your history tabular data sets
- **Descriptive Title** is the text that appears in the output file names to remind you what the plots are
- **Use a Log scale** is recommended for p values in the range 0-1 as it highlights departures from the null at small p values
- **Random Sample Fraction** is the fraction of points to randomly sample - highly recommended for more than 5k or so values
- **Height and Width** set the size of the PDF images in inches

-----

.. class:: infomark

**Summary**

Generate a uniform QQ plot for any large number of p values from an analysis.
Essentially, this is a plot of n ranked p values against their rank expressed as a fraction of n, i.e. rank/n.

It works well where you have a column containing p values from
a statistical test of some sort. These are plotted against the values expected under the null. Departure
from the diagonal suggests one distribution is more extreme than the other. You hope your p values are
smaller than expected under the null.

The sampling fraction helps cut down the size of the PDFs. If there are fewer than 5k points on any plot, all are shown.
Otherwise the number of points plotted is the sampling fraction times the number of values or 5k, whichever is larger.
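
As a rough illustration of the rank/n pairing and the sampling rule described above, here is a minimal Python sketch. It is not the code in rgQQ.py: the function name `qq_points`, its parameters, and the choice to rank the values after sampling are all assumptions made for this example.

```python
import random


def qq_points(pvalues, fraction=0.01, floor=5000, seed=0):
    """Return (expected, observed) pairs for a uniform QQ plot.

    Hypothetical helper for illustration only, not taken from rgQQ.py.
    """
    n = len(pvalues)
    # Plot the larger of fraction * n and the floor (5k), capped at n,
    # so data sets with no more than `floor` values are kept whole.
    k = min(n, max(floor, int(round(fraction * n))))
    rng = random.Random(seed)
    observed = sorted(rng.sample(list(pvalues), k))
    # Assumption: ranks are taken within the sample, so the i-th smallest
    # of the k plotted values is paired with i / k.
    expected = [(i + 1) / float(k) for i in range(k)]
    return list(zip(expected, observed))
```

For example, `qq_points([0.5, 0.1, 0.9, 0.3])` keeps all four values and returns `[(0.25, 0.1), (0.5, 0.3), (0.75, 0.5), (1.0, 0.9)]`.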

Note that a log scale is ill-advised if the p values you are plotting are already log transformed, because the
uniform distribution used for the QQ plot is always 0-1, with the log transformation applied to it only if requested.
The most useful plots for p values are log QQ plots of untransformed p values in the range 0-1.
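
To show why the input should be untransformed, here is a small Python sketch of a log QQ transformation. It assumes that a minus log10 transform is applied to both the observed p values and the uniform expectation, and that zero or negative values are dropped before taking logs; the function name `log_qq_points` and both of those choices are assumptions for illustration, not a description of rgQQ.py.

```python
import math


def log_qq_points(pvalues):
    """Return (-log10 expected, -log10 observed) pairs.

    Hypothetical helper for illustration only, not taken from rgQQ.py.
    """
    # Assumption: values that are not strictly positive cannot be
    # log transformed, so they are dropped here.
    observed = sorted(p for p in pvalues if p > 0)
    n = len(observed)
    # The uniform expectation rank / n always lies in 0-1, so the same
    # transform on both axes only makes sense for raw p values.
    return [(-math.log10((i + 1) / float(n)), -math.log10(p))
            for i, p in enumerate(observed)]
```

Feeding already log transformed values into such a function would compare them with the logarithm of a 0-1 uniform expectation, which is the mismatch the note above warns about.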

Originally designed and written for family-based data from the CAMP Illumina run of 2007 by
Ross Lazarus (ross.lazarus@gmail.com)
  </help>
</tool>