/tools/stats/cor.xml

https://bitbucket.org/ialbert/galaxy-genetrack
<tool id="cor2" name="Correlation">
  <description>for numeric columns</description>
  <command interpreter="python">cor.py $input1 $out_file1 $numeric_columns $method</command>
  <inputs>
    <param format="tabular" name="input1" type="data" label="Dataset" help="Query missing? See TIP below"/>
    <param name="numeric_columns" label="Numerical columns" type="data_column" numerical="True" multiple="True" data_ref="input1" help="Multi-select list - hold the appropriate key while clicking to select multiple columns" />
    <param name="method" type="select" label="Method">
      <option value="pearson">Pearson</option>
      <option value="kendall">Kendall rank</option>
      <option value="spearman">Spearman rank</option>
    </param>
  </inputs>
  <outputs>
    <data format="txt" name="out_file1" />
  </outputs>
  <requirements>
    <requirement type="python-module">rpy</requirement>
  </requirements>
  <tests>
    <!--
    Test a tabular input with the first line being a comment without a # character to start
    -->
    <test>
      <param name="input1" value="cor.tabular" />
      <param name="numeric_columns" value="2,3" />
      <param name="method" value="pearson" />
      <output name="out_file1" file="cor_out.txt" />
    </test>
  </tests>
  <help>

.. class:: infomark

**TIP:** If your data is not TAB delimited, use *Text Manipulation-&gt;Convert*

.. class:: warningmark

Missing data ("nan") is removed from each pairwise comparison.

-----

**Syntax**

This tool computes the matrix of correlation coefficients between numeric columns.

- All invalid, blank and comment lines are skipped when performing computations. The number of skipped lines is displayed in the resulting history item.

- **Pearson's Correlation** reflects the degree of linear relationship between two variables. It ranges from -1 to +1: a correlation of +1 means a perfect positive linear relationship between the variables, -1 a perfect negative linear relationship, and 0 no linear relationship. The formula for Pearson's correlation is:

    .. image:: ../static/images/pearson.png

    where n is the number of items

- **Kendall's rank correlation** measures the degree of correspondence between two rankings and assesses the significance of this correspondence. The formula for Kendall's rank correlation is:

    .. image:: ../static/images/kendall.png

    where n is the number of items, and P is the sum, over all items, of the number of items ranked after the given item by both rankings.

- **Spearman's rank correlation** assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any assumptions about the frequency distribution of the variables. The formula for Spearman's rank correlation is:

    .. image:: ../static/images/spearman.png

    where D is the difference between the ranks of corresponding values of X and Y, and N is the number of pairs of values.
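
The two rank-based coefficients described above can be illustrated with a minimal Python sketch. This is not the tool's own code: the tool requires the rpy module and presumably calls R. The formula images are not reproduced here, so the sketch assumes the common textbook forms that match the symbols defined above, with P taken as the number of pairs that both variables order the same way. It also assumes there are no tied values. The function names ranks, spearman and kendall are illustrative.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Assumed form: rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def kendall(x, y):
    """Assumed form: tau = 4P / (n * (n - 1)) - 1."""
    n = len(x)
    # P counts the pairs of items that x and y order the same way.
    p = sum(
        1
        for i, j in combinations(range(n), 2)
        if (x[i] - x[j]) * (y[i] - y[j]) > 0
    )
    return 4 * p / (n * (n - 1)) - 1


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(spearman(x, y), 3))  # 0.8
print(round(kendall(x, y), 3))   # 0.6
```

With tied values these simplified formulas are no longer exact, so results for such data may differ from what the tool reports.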

-----

**Example**

- Input file::

    #Person	Height	Self Esteem
    1	68	4.1
    2	71	4.6
    3	62	3.8
    4	75	4.4
    5	58	3.2
    6	60	3.1
    7	67	3.8
    8	68	4.1
    9	71	4.3
    10	69	3.7
    11	68	3.5
    12	67	3.2
    13	63	3.7
    14	62	3.3
    15	60	3.4
    16	63	4.0
    17	65	4.1
    18	67	3.8
    19	63	3.4
    20	61	3.6

- Computing the Pearson correlation coefficients between columns 2 and 3 of the above file produces the output::

    1.0	0.730635686279
    0.730635686279	1.0

  So the correlation for our twenty cases is 0.73, which is a fairly strong positive relationship.
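
The Pearson value in this example can be checked by hand. The following is a minimal Python sketch, not the tool's own code: the name pearson and the variable names are illustrative assumptions. It uses the common computational form of the formula, which may be laid out differently from the formula image in the Syntax section.

```python
from math import sqrt

# Columns 2 and 3 of the example input file.
height = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
          68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
          3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]


def pearson(x, y):
    """Pearson correlation from raw sums; n is the number of items."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson(height, esteem)
print(round(r, 4))  # 0.7306
```

The printed value agrees with the off-diagonal entries of the output matrix shown in this example.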
  </help>
</tool>