/tools/expression/go_analysis.xml

https://bitbucket.org/cistrome/cistrome-harvard/ · XML

<?xml version="1.0"?>
<tool name="Conduct GO" id="goId" force_history_refresh="True">
  <description>
    Given a list of genes, using Bioconductor (GO, GOstats) and DAVID at NIH
  </description>
  <code file="go_analysis_code.py"/>
  <command interpreter="python">
    go_analysis.py '$title' '$diff_expr_file' '$logmeta' '$diff_expr_file.dbkey', '$annotation'
  </command>
  <inputs>
    <param name="title" label="Title to label the new output file" type="text" size="80" value="Conduct GO" />
    <param name="diff_expr_file" type="data" format="txt" label="Target Gene List"
           optional="false" size="120" help="Choose a target gene list from your history (the file must either have a column called 'Gene' containing Entrez gene IDs or consist of a single column of Entrez IDs)"/>
    <param name="annotation" type="select" label="Gene Universe">
      <option value="hgu133a" selected="True">Homo sapiens hgu133a</option>
      <option value="hgu133b">Homo sapiens hgu133b</option>
      <option value="hgu133plus2">Homo sapiens hgu133plus2</option>
      <option value="hgu95av2">Homo sapiens hgu95av2</option>
      <option value="mouse430a2">Mouse 430a2</option>
      <option value="celegans">C. elegans</option>
      <!--<option value="fly.db0">Fly</option>-->
      <option value="drosophila2">Drosophila</option>
      <option value="org.Hs.eg">org.Hs.eg</option>
      <option value="org.Mm.eg">org.Mm.eg</option>
      <option value="org.Ce.eg">org.Ce.eg</option>
      <option value="org.Dm.eg">org.Dm.eg</option>
    </param>
  </inputs>
  <outputs>
    <data format="txt" name="logmeta"/>
  </outputs>
  <help>
**Syntax**

- **Title:** used to name the output files, so make it meaningful
- **Target Gene List:** choose a target gene list from your history
- **Gene Universe:** select a gene universe

-----

**Summary**

For a list of input genes, this tool uses R/BioC packages (GO, GOstats) to
identify over-represented GO terms. The number of input genes that can be associated
with a GO term is compared to the number of genes from the gene universe that can
be associated with that GO term. The gene universe should be the set of genes from
which the differentially expressed genes (the input genes) were identified.
This gene universe can be either the collection of all genes that can be detected with
the microarray used in the analysis, or the list of genes that passed a non-specific
pre-filtering step in the differential expression analysis.

This tool can also perform GO analysis using DAVID (http://david.abcc.ncifcrf.gov).

The input list of target genes (Entrez Gene IDs) is typically obtained from the
"Calculate differential expression" tool, in the following format:

::

    Probe        Symbol  Description     Gene  Cytoband   Log2Ratio  PValue
    20042_at     SOD1    superoxide..    6647  21q22.1    0.838191   0.008021
    200818_at    ATP5O   ATP synthase..  539   21q22.1-q  0.711812   0.006348
    201123_s_at  EIF5A   eukaryotic..    1984  17p13-p12  -1.80077   0.008021

Any gene list with Entrez IDs can be used as input for this tool. Load the list into your
history using the "Upload File from your computer" tool; the tool will look for a column
called "Gene". The following is a valid example:

::

    Gene
    351
    6647
    3337
    754
    6612
    539
    1984
    1471
    5445
    8209
    522

The output consists of three text files:

- Cellular Component ontology: GO_CC_Result.txt
- Biological Process ontology: GO_BP_Result.txt
- Molecular Function ontology: GO_MF_Result.txt

The "Gene" column (Entrez IDs) is used to map genes to the significantly over-represented GO
terms for each GO analysis. The reported GO terms are sorted by significance (p-value).

After the GO analysis is conducted using the R/BioC packages (GO, GOstats), the original
target gene list is also sent to DAVID (http://david.abcc.ncifcrf.gov) for comparative
analysis.
  </help>
</tool>
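The help text in this tool describes two behaviours that a short sketch can make concrete: finding Entrez IDs in the input (a "Gene" column, or a single-column file) and comparing GO term counts in the gene list against the gene universe. The Python below is an illustration under stated assumptions, not the contents of go_analysis.py. The function names `read_entrez_ids` and `hypergeom_pvalue` are hypothetical, the input is assumed to be tab-delimited, and the one-sided hypergeometric test stands in for the comparison that GOstats carries out in R.

```python
import csv
import io
from math import comb


def read_entrez_ids(text):
    """Return Entrez IDs from a 'Gene' column, or from a single-column file.

    Assumption: columns are tab-delimited. This mirrors the rule stated in
    the help text, not the actual parsing done by go_analysis.py.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if not rows:
        return []
    header = [cell.strip() for cell in rows[0]]
    if "Gene" in header:
        # Named column: skip the header row and read that column.
        col, body = header.index("Gene"), rows[1:]
    elif len(header) == 1:
        # Single-column file without a 'Gene' header: every row is an ID.
        col, body = 0, rows
    else:
        raise ValueError("no 'Gene' column found")
    return [r[col].strip() for r in body if len(r) > col and r[col].strip()]


def hypergeom_pvalue(universe_size, universe_hits, list_size, list_hits):
    """One-sided over-representation p-value, P(X >= list_hits).

    universe_size: genes in the gene universe
    universe_hits: universe genes annotated with the GO term
    list_size:     genes in the target list
    list_hits:     target-list genes annotated with the GO term
    """
    upper = min(universe_hits, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(universe_hits, k) * comb(universe_size - universe_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total


# Usage: parse a minimal gene list, then score one hypothetical GO term.
ids = read_entrez_ids("Gene\n351\n6647\n539\n")
p = hypergeom_pvalue(universe_size=10, universe_hits=5, list_size=5, list_hits=5)
print(ids, p)
```

A small p-value means the term is annotated to more genes in the target list than its frequency in the gene universe would predict, which is why the choice of universe matters.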