/tools/taxonomy/find_diag_hits.xml
https://bitbucket.org/cistrome/cistrome-harvard/
- <tool id="find_diag_hits" name="Find diagnostic hits" version="1.0.0">
- <description></description>
- <requirements>
- <requirement type="package">taxonomy</requirement>
- </requirements>
- <command interpreter="python">find_diag_hits.py $input1 $id_col $rank_list $out_format $out_file1</command>
- <inputs>
- <param format="taxonomy" name="input1" type="data" label="Find diagnostic hits in"/>
- <param name="id_col" type="data_column" data_ref="input1" numerical="False" label="Select column with sequence id" />
- <param name="rank_list" type="select" display="checkboxes" multiple="true" label="Select taxonomic ranks">
- <option value="superkingdom">Superkingdom</option>
- <option value="kingdom">Kingdom</option>
- <option value="subkingdom">Subkingdom</option>
- <option value="superphylum">Superphylum</option>
- <option value="phylum">Phylum</option>
- <option value="subphylum">Subphylum</option>
- <option value="superclass">Superclass</option>
- <option value="class">Class</option>
- <option value="subclass">Subclass</option>
- <option value="superorder">Superorder</option>
- <option value="order">Order</option>
- <option value="suborder">Suborder</option>
- <option value="superfamily">Superfamily</option>
- <option value="family">Family</option>
- <option value="subfamily">Subfamily</option>
- <option value="tribe">Tribe</option>
- <option value="subtribe">Subtribe</option>
- <option value="genus">Genus</option>
- <option value="subgenus">Subgenus</option>
- <option selected="true" value="species">Species</option>
- <option value="subspecies">Subspecies</option>
- </param>
- <param name="out_format" type="select" label="Select output format">
- <option value="reads">Diagnostic read list</option>
- <option value="counts">Number of diagnostic reads per taxonomic rank</option>
- </param>
- </inputs>
- <outputs>
- <data format="tabular" name="out_file1" />
- </outputs>
- <tests>
- <test>
- <param name="input1" value="taxonomyGI.taxonomy" ftype="taxonomy"/>
- <param name="id_col" value="1" />
- <param name="rank_list" value="order,genus" />
- <param name="out_format" value="counts" />
- <output name="out_file1" file="find_diag_hits.tabular" />
- </test>
- </tests>
-
- <help>
- **What it does**
- Metagenomic analyses often require identifying sequence reads that are diagnostic of a particular taxonomic group, that is, reads whose hits all fall within a single taxon at a given rank. This tool takes the output of *Taxonomy manipulation->Fetch Taxonomic Ranks* as input and returns either a list of the reads that are diagnostic at each selected rank, or the number of diagnostic reads for each taxon at each selected rank.
- ------
- **Example**
- Suppose *Taxonomy manipulation->Fetch Taxonomic Ranks* generated the following taxonomy representation::
- read1 2 root Eukaryota Metazoa n n Chordata Craniata Gnathostomata Mammalia n Laurasiatheria n Ruminantia n Bovidae Bovinae n n Bos n Bos taurus n
- read2 12585 root Eukaryota Metazoa n n Chordata Craniata Gnathostomata Mammalia n Euarchontoglires Primates Haplorrhini Hominoidea Hominidae n n n Homo n Homo sapiens n
- read1 58615 root Eukaryota Metazoa n n Arthropoda n Hexapoda Insecta Neoptera Amphiesmenoptera Lepidoptera Glossata Papilionoidea Nymphalidae Nymphalinae Melitaeini Phyciodina Anthanassa n Anthanassa otanes n
- read3 56785 root Eukaryota Metazoa n n Chordata Craniata Gnathostomata Mammalia n Euarchontoglires Primates Haplorrhini Hominoidea Hominidae n n n Homo n Homo sapiens n
- Running this tool with the following parameters:
- * *Select column with sequence id* set to **c1**
- * *Select taxonomic ranks* with **order** and **genus** checked
- * *Select output format* set to **Diagnostic read list**
-
- will return::
- read2 Primates order
- read3 Primates order
- read2 Homo genus
- read3 Homo genus
-
- Changing *Select output format* to **Number of diagnostic reads per taxonomic rank** will produce::
- Primates 2 order
- Homo 2 genus
-
- .. class:: infomark
- Note that **read1** is omitted because it is non-unique: it hits Mammals and Insects at the same time.
- --------
- .. class:: warningmark
- This tool omits "**n**" entries, which mark ranks missing from the NCBI taxonomy. In the above example, the *Homo sapiens* lineage contains an order name (Primates) while the *Bos taurus* lineage does not.
- </help>
- </tool>
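The selection rule illustrated in the help text above can be sketched in Python. This is not the `find_diag_hits.py` script the wrapper calls, whose source is not shown here. It is a minimal sketch under assumptions inferred from the example: the input is tab-separated; the sequence id and a numeric id precede the `root` column; and a read is treated as diagnostic at a rank only when all of its hits share one name at that rank and that name is not `n`. The names `RANKS`, `RANK_OFFSET`, `find_diag_hits`, `count_diag_hits`, `_row`, and `EXAMPLE` are hypothetical.

```python
from collections import defaultdict

# Assumed column layout, inferred from the example rows: sequence id,
# a numeric id, then "root" followed by the ranks offered in the tool form.
RANKS = [
    "root", "superkingdom", "kingdom", "subkingdom", "superphylum", "phylum",
    "subphylum", "superclass", "class", "subclass", "superorder", "order",
    "suborder", "superfamily", "family", "subfamily", "tribe", "subtribe",
    "genus", "subgenus", "species", "subspecies",
]
RANK_OFFSET = 2  # number of columns before "root"


def find_diag_hits(lines, ranks, id_col=1):
    """Return (read, name, rank) tuples for reads diagnostic at each rank.

    id_col is 1-based, mirroring the "c1" convention in the help text.
    """
    names_seen = defaultdict(lambda: defaultdict(set))  # rank -> read -> names
    reads = []  # reads in first-seen order
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        read = fields[id_col - 1]
        if read not in reads:
            reads.append(read)
        for rank in ranks:
            names_seen[rank][read].add(fields[RANK_OFFSET + RANKS.index(rank)])
    hits = []
    for rank in ranks:
        for read in reads:
            names = names_seen[rank][read]
            # Assumed rule: one shared name across all hits, and not "n".
            if len(names) == 1 and "n" not in names:
                hits.append((read, next(iter(names)), rank))
    return hits


def count_diag_hits(hits):
    """Collapse a diagnostic read list into (name, count, rank) tuples."""
    counts = {}
    for _read, name, rank in hits:
        counts[(name, rank)] = counts.get((name, rank), 0) + 1
    return [(name, n, rank) for (name, rank), n in counts.items()]


def _row(read, numeric_id, lineage):
    # Build one tab-separated line from a "|"-delimited lineage string.
    return "\t".join([read, numeric_id] + lineage.split("|"))


# The four rows from the help text example.
EXAMPLE = [
    _row("read1", "2", "root|Eukaryota|Metazoa|n|n|Chordata|Craniata|Gnathostomata|Mammalia|n|Laurasiatheria|n|Ruminantia|n|Bovidae|Bovinae|n|n|Bos|n|Bos taurus|n"),
    _row("read2", "12585", "root|Eukaryota|Metazoa|n|n|Chordata|Craniata|Gnathostomata|Mammalia|n|Euarchontoglires|Primates|Haplorrhini|Hominoidea|Hominidae|n|n|n|Homo|n|Homo sapiens|n"),
    _row("read1", "58615", "root|Eukaryota|Metazoa|n|n|Arthropoda|n|Hexapoda|Insecta|Neoptera|Amphiesmenoptera|Lepidoptera|Glossata|Papilionoidea|Nymphalidae|Nymphalinae|Melitaeini|Phyciodina|Anthanassa|n|Anthanassa otanes|n"),
    _row("read3", "56785", "root|Eukaryota|Metazoa|n|n|Chordata|Craniata|Gnathostomata|Mammalia|n|Euarchontoglires|Primates|Haplorrhini|Hominoidea|Hominidae|n|n|n|Homo|n|Homo sapiens|n"),
]

if __name__ == "__main__":
    hits = find_diag_hits(EXAMPLE, ["order", "genus"])
    for read, name, rank in hits:
        print(read, name, rank, sep="\t")
    for name, n, rank in count_diag_hits(hits):
        print(name, n, rank, sep="\t")
```

Under these assumptions the sketch reproduces both outputs from the help text for the four example rows. Note that read1 is rejected at the order rank because its two hits carry different values there (`n` and Lepidoptera), so dropping `n` before the uniqueness check would change the result.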