featureCounter.xml | searchcode

/tools/regVariation/featureCounter.xml

https://bitbucket.org/cistrome/cistrome-harvard/ · XML · 75 lines · 58 code · 17 blank · 0 comment · 0 complexity · 914f22ecd7bc7ab39829b026586de2f7 MD5 · raw file

<tool id="featureCoverage1" name="Feature coverage" version="2.0.0">
  <description></description>
  <command interpreter="python">featureCounter.py $input1 $input2 $output -1 ${input1.metadata.chromCol},${input1.metadata.startCol},${input1.metadata.endCol},${input1.metadata.strandCol} -2 ${input2.metadata.chromCol},${input2.metadata.startCol},${input2.metadata.endCol},${input2.metadata.strandCol}</command>
  <inputs>
    <param format="interval" name="input1" type="data" help="First dataset">
      <label>What portion of</label>
    </param>
    <param format="interval" name="input2" type="data" help="Second dataset">
      <label>is covered by</label>
    </param>
   </inputs>
  <outputs>
    <data format="interval" name="output" metadata_source="input1" />
  </outputs>
  
  <tests>
    <test>
      <param name="input1" value="1.bed" />
      <param name="input2" value="2.bed" />
      <output name="output" file="6_feature_coverage.bed" />
    </test>
    <test>
      <param name="input1" value="chrY1.bed" />
      <param name="input2" value="chrY2.bed" />
      <output name="output" file="chrY_Coverage.bed" />
    </test>
  </tests>
  <help>

.. class:: infomark

**What it does**

This tool finds the coverage of intervals in the first dataset on intervals in the second dataset. The coverage and count are appended as 4 new columns in the resulting dataset.

-----

**Example**

- If **First dataset** consists of the following windows::

    chrX 1     10001 seg 0 -
    chrX 10001 20001 seg 0 -
    chrX 20001 30001 seg 0 -
    chrX 30001 40001 seg 0 -
      
- and **Second dataset** consists of the following exons::

    chrX 5000  6000  seg2 0 -
    chrX 5500  7000  seg2 0 -
    chrX 9000  22000 seg2 0 -
    chrX 24000 34000 seg2 0 -
    chrX 36000 38000 seg2 0 -
      
- the **Result** is the coverage of exons of the second dataset in each of the windows contained in first dataset::

    chrX 1     10001 seg 0 - 3001  0.3001 2 1
    chrX 10001 20001 seg 0 - 10000 1.0    1 0
    chrX 20001 30001 seg 0 - 8000  0.8    0 2
    chrX 30001 40001 seg 0 - 5999  0.5999 1 1
	  
- To clarify, the following line of output ( added columns are indexed by a, b and c )::

                         a    b      c d
    chrX 1 10001 seg 0 - 3001 0.3001 2 1
                                  
  implies that 2 exons (c) fall fully in this window (chrX:1-10001), 1 exon (d) partially overlaps this window, and these 3 exons cover 30.01% (c) of the window size, spanning 3001 nucleotides (a).

  * a: number of nucleotides in this window covered by the features in (c) and (d) - features overlapping with each other will be merged to calculate (a)
  * b: fraction of window size covered by features in (c) and (d) - features overlapping with each other will be merged to calculate (b)
  * c: number of features in the 2nd dataset that fall **completely** within this window
  * d: number of features in the 2nd dataset that **partially** overlap this window
  	 
</help>
</tool>