/tools/taxonomy/find_diag_hits.xml

https://bitbucket.org/cistrome/cistrome-harvard/

<tool id="find_diag_hits" name="Find diagnostic hits" version="1.0.0">
  <description></description>
  <requirements>
    <requirement type="package">taxonomy</requirement>
  </requirements>
  <command interpreter="python">find_diag_hits.py $input1 $id_col $rank_list $out_format $out_file1</command>
  <inputs>
    <param format="taxonomy" name="input1" type="data" label="Find diagnostic hits in"/>
    <param name="id_col" type="data_column" data_ref="input1" numerical="False" label="Select column with sequence id" />
    <param name="rank_list" type="select" display="checkboxes" multiple="true" label="select taxonomic ranks">
      <option value="superkingdom">Superkingdom</option>
      <option value="kingdom">Kingdom</option>
      <option value="subkingdom">Subkingdom</option>
      <option value="superphylum">Superphylum</option>
      <option value="phylum">Phylum</option>
      <option value="subphylum">Subphylum</option>
      <option value="superclass">Superclass</option>
      <option value="class">Class</option>
      <option value="subclass">Subclass</option>
      <option value="superorder">Superorder</option>
      <option value="order">Order</option>
      <option value="suborder">Suborder</option>
      <option value="superfamily">Superfamily</option>
      <option value="family">Family</option>
      <option value="subfamily">Subfamily</option>
      <option value="tribe">Tribe</option>
      <option value="subtribe">Subtribe</option>
      <option value="genus">Genus</option>
      <option value="subgenus">Subgenus</option>
      <option selected="true" value="species">Species</option>
      <option value="subspecies">Subspecies</option>
    </param>
    <param name="out_format" type="select" label="Select output format">
      <option value="reads">Diagnostic read list</option>
      <option value="counts">Number of diagnostic reads per taxonomic rank</option>
    </param>
  </inputs>
  <outputs>
    <data format="tabular" name="out_file1" />
  </outputs>
  <tests>
    <test>
      <param name="input1" value="taxonomyGI.taxonomy" ftype="taxonomy"/>
      <param name="id_col" value="1" />
      <param name="rank_list" value="order,genus" />
      <param name="out_format" value="counts" />
      <output name="out_file1" file="find_diag_hits.tabular" />
    </test>
  </tests>
  <help>

**What it does**

When performing metagenomic analyses, it is often necessary to identify sequence reads that correspond to a single taxonomic group, in other words, reads that are diagnostic of a particular taxonomic rank. This tool performs that analysis. It takes data generated by *Taxonomy manipulation->Fetch Taxonomic Ranks* as input and outputs either a list of sequence reads unique to a particular taxonomic rank, or a list of taxonomic ranks with the count of unique reads corresponding to each rank.

------

**Example**

Suppose that *Taxonomy manipulation->Fetch Taxonomic Ranks* generated the following taxonomy representation::

    read1 2 root Eukaryota Metazoa n n Chordata Craniata Gnathostomata Mammalia n Laurasiatheria n Ruminantia n Bovidae Bovinae n n Bos n Bos taurus n
    read2 12585 root Eukaryota Metazoa n n Chordata Craniata Gnathostomata Mammalia n Euarchontoglires Primates Haplorrhini Hominoidea Hominidae n n n Homo n Homo sapiens n
    read1 58615 root Eukaryota Metazoa n n Arthropoda n Hexapoda Insecta Neoptera Amphiesmenoptera Lepidoptera Glossata Papilionoidea Nymphalidae Nymphalinae Melitaeini Phyciodina Anthanassa n Anthanassa otanes n
    read3 56785 root Eukaryota Metazoa n n Chordata Craniata Gnathostomata Mammalia n Euarchontoglires Primates Haplorrhini Hominoidea Hominidae n n n Homo n Homo sapiens n

Running this tool with the following parameters:

* *Select column with sequence id* set to **c1**
* *Select taxonomic ranks* with **order** and **genus** checked
* *Select output format* set to **Diagnostic read list**

will return::

    read2 Primates order
    read3 Primates order
    read2 Homo genus
    read3 Homo genus

Changing *Select output format* to **Number of diagnostic reads per taxonomic rank** will produce::

    Primates 2 order
    Homo 2 genus

.. class:: infomark

Note that **read1** is omitted because it is non-unique: it hits both mammals and insects.

--------

.. class:: warningmark

This tool ignores "**n**" entries, which mark ranks missing from the NCBI taxonomy. In the example above, the *Homo sapiens* lineage contains an order name (Primates) while the *Bos taurus* lineage does not.

  </help>
</tool>
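
The selection rule described in the help text can be sketched in Python. This is not the code of `find_diag_hits.py`, which is not shown here; it is one plausible reading that reproduces the example output. The assumptions are: each row has the sequence id, a GI number, and `root` in the first three columns, followed by the 21 ranks in the order of the `rank_list` options; a read is treated as diagnostic at a rank when all of its hits carry the same name at that rank and that name is not `n`. The names `RANK_OFFSET`, `diagnostic_hits`, `count_hits`, and `ROWS` are hypothetical, and commas stand in for the tab separators of the real input purely for readability.

```python
from collections import defaultdict

# Rank columns, in the order of the tool's rank_list options.
RANKS = ["superkingdom", "kingdom", "subkingdom", "superphylum", "phylum",
         "subphylum", "superclass", "class", "subclass", "superorder",
         "order", "suborder", "superfamily", "family", "subfamily",
         "tribe", "subtribe", "genus", "subgenus", "species", "subspecies"]
RANK_OFFSET = 3  # assumption: id, GI and "root" precede the rank columns


def diagnostic_hits(rows, ranks, id_col=0):
    """Return (read, taxon, rank) tuples for reads whose hits agree at a rank."""
    names_seen = defaultdict(set)  # (read, rank) -> taxon names over all hits
    reads = {}                     # dict used as an insertion-ordered set
    for fields in rows:
        read = fields[id_col]
        reads.setdefault(read, None)
        for rank in ranks:
            names_seen[(read, rank)].add(fields[RANK_OFFSET + RANKS.index(rank)])
    hits = []
    for rank in ranks:
        for read in reads:
            names = names_seen[(read, rank)]
            # Diagnostic: exactly one name at this rank, and not the "n" placeholder.
            if len(names) == 1 and "n" not in names:
                hits.append((read, next(iter(names)), rank))
    return hits


def count_hits(hits):
    """Collapse a diagnostic read list into (taxon, count, rank) tuples."""
    counts = {}
    for _read, name, rank in hits:
        counts[(name, rank)] = counts.get((name, rank), 0) + 1
    return [(name, n, rank) for (name, rank), n in counts.items()]


# The four rows from the help example (commas replace tabs here).
ROWS = [line.split(",") for line in (
    "read1,2,root,Eukaryota,Metazoa,n,n,Chordata,Craniata,Gnathostomata,Mammalia,n,Laurasiatheria,n,Ruminantia,n,Bovidae,Bovinae,n,n,Bos,n,Bos taurus,n",
    "read2,12585,root,Eukaryota,Metazoa,n,n,Chordata,Craniata,Gnathostomata,Mammalia,n,Euarchontoglires,Primates,Haplorrhini,Hominoidea,Hominidae,n,n,n,Homo,n,Homo sapiens,n",
    "read1,58615,root,Eukaryota,Metazoa,n,n,Arthropoda,n,Hexapoda,Insecta,Neoptera,Amphiesmenoptera,Lepidoptera,Glossata,Papilionoidea,Nymphalidae,Nymphalinae,Melitaeini,Phyciodina,Anthanassa,n,Anthanassa otanes,n",
    "read3,56785,root,Eukaryota,Metazoa,n,n,Chordata,Craniata,Gnathostomata,Mammalia,n,Euarchontoglires,Primates,Haplorrhini,Hominoidea,Hominidae,n,n,n,Homo,n,Homo sapiens,n",
)]

if __name__ == "__main__":
    hits = diagnostic_hits(ROWS, ["order", "genus"])
    for row in hits:
        print("\t".join(row))
    for name, n, rank in count_hits(hits):
        print("\t".join([name, str(n), rank]))
```

Under these assumptions, `read1` drops out at both ranks because its two hits disagree (`n` versus Lepidoptera at order, Bos versus Anthanassa at genus), which matches the note in the help text. Whether the real script treats `n` the same way when only some of a read's hits lack a rank is not stated in the source.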