<!-- Source: https://bitbucket.org/cistrome/cistrome-harvard/ (tools/multivariate_stats/cca.xml) -->

<tool id="cca1" name="Canonical Correlation Analysis" version="1.0.0">
  <description> </description>
  <command interpreter="python">
    cca.py
      $input1
      $x_cols
      $y_cols
      $x_scale
      $y_scale
      $std_scores
      $out_file1
      $out_file2
  </command>
  <inputs>
    <param format="tabular" name="input1" type="data" label="Select data" help="Dataset missing? See TIP below."/>
    <param name="x_cols" label="Select columns containing X variables" type="data_column" data_ref="input1" numerical="True" multiple="true">
      <validator type="no_options" message="Please select at least one column."/>
    </param>
    <param name="y_cols" label="Select columns containing Y variables" type="data_column" data_ref="input1" numerical="True" multiple="true">
      <validator type="no_options" message="Please select at least one column."/>
    </param>
    <param name="x_scale" type="select" label="Type of Scaling for X variables" help="Can be used to center and/or scale variables">
      <option value="none" selected="true">None</option>
      <option value="center">Center only</option>
      <option value="scale">Scale only</option>
      <option value="both">Center and Scale</option>
    </param>
    <param name="y_scale" type="select" label="Type of Scaling for Y variables" help="Can be used to center and/or scale variables">
      <option value="none" selected="true">None</option>
      <option value="center">Center only</option>
      <option value="scale">Scale only</option>
      <option value="both">Center and Scale</option>
    </param>
    <param name="std_scores" type="select" label="Report standardized scores?" help="Selecting 'Yes' will rescale scores (and coefficients) to produce scores of unit variance">
      <option value="no" selected="true">No</option>
      <option value="yes">Yes</option>
    </param>
  </inputs>
  <outputs>
    <data format="input" name="out_file1" metadata_source="input1" />
    <data format="pdf" name="out_file2" />
  </outputs>
  <requirements>
    <requirement type="python-module">rpy</requirement>
  </requirements>
  <tests>
    <test>
      <param name="input1" value="iris.tabular"/>
      <param name="x_cols" value="3,4"/>
      <param name="y_cols" value="1,2"/>
      <param name="x_scale" value="both"/>
      <param name="y_scale" value="scale"/>
      <param name="std_scores" value="yes"/>
      <output name="out_file1" file="cca_out1.tabular"/>
      <output name="out_file2" file="cca_out2.pdf"/>
    </test>
  </tests>
  <help>
.. class:: infomark

**TIP:** If your data is not TAB delimited, use *Edit Datasets-&gt;Convert characters*.

-----

.. class:: infomark

**What it does**

This tool uses functions from the 'yacca' library for the R statistical package to perform Canonical Correlation Analysis (CCA) on the input data. It produces two output files: one containing the summary statistics of the CCA, and one containing helioplots, which display the structural loadings of the X and Y variables on each canonical component.

*Carter T. Butts (2009). yacca: Yet Another Canonical Correlation Analysis Package. R package version 1.1.*
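
As an illustration only, the quantities this tool reports (canonical correlations, coefficients, loadings and cross-loadings) can be sketched in Python with NumPy. This is not the code run by cca.py, which calls yacca from R through rpy. The helper name cca_sketch is hypothetical, centering is always applied, and yacca's output may differ in the sign and scaling of coefficients and scores. The canonical variates produced here have unit variance, which is assumed (not confirmed) to correspond to the 'Report standardized scores? Yes' setting.

```python
import numpy as np


def _inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca_sketch(X, Y):
    """Minimal CCA sketch (hypothetical helper, not part of cca.py or yacca).

    X: (n, p) array of X variables; Y: (n, q) array of Y variables.
    Returns canonical correlations, coefficients, loadings and cross-loadings.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)   # centering only; no rescaling
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)

    Kx, Ky = _inv_sqrt(Sxx), _inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical
    # correlations; NumPy returns them in decreasing order.
    U, corr, Vt = np.linalg.svd(Kx @ Sxy @ Ky, full_matrices=False)
    xcoef = Kx @ U        # X coefficients, one column per canonical variate
    ycoef = Ky @ Vt.T     # Y coefficients, one column per canonical variate
    u = Xc @ xcoef        # X canonical variates (unit variance)
    v = Yc @ ycoef        # Y canonical variates (unit variance)

    def cross_corr(A, B):
        # Correlations between the columns of A (rows) and the columns of B (columns).
        return np.corrcoef(A.T, B.T)[: A.shape[1], A.shape[1]:]

    return {
        "corr": corr,
        "xcoef": xcoef,
        "ycoef": ycoef,
        "xload": cross_corr(X, u),        # structural loadings of X
        "yload": cross_corr(Y, v),        # structural loadings of Y
        "xcrossload": cross_corr(X, v),   # X variables vs. Y variates
        "ycrossload": cross_corr(Y, u),   # Y variables vs. X variates
    }
```

For example, with the selected X and Y columns loaded as NumPy arrays, cca_sketch(X, Y)["corr"] gives the canonical correlations in decreasing order. The SVD of the whitened cross-covariance matrix is used because its singular values are exactly the canonical correlations, and its singular vectors map directly to the coefficients.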

-----

.. class:: warningmark

**Note**

- This tool currently treats all predictor and response variables as continuous numeric variables. Running the tool on categorical variables may produce incorrect results.
- Rows containing non-numeric (or missing) data in any of the selected columns are excluded from the analysis.
- The summary statistics in the output are described below:

  - correlation: canonical correlation between each pair of canonical variates (i.e., the transformed variables)
  - F-statistic: F-value from the F test for canonical correlations, using Rao's approximation
  - p-value: significance of the canonical correlations, based on the F test above
  - Coefficients: coefficients (weights) of the X and Y variables on each canonical variate
  - Loadings: correlations between the original variables in each set and their own canonical variates
  - CrossLoadings: correlations between the original variables in each set and the canonical variates of the opposite set
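
The F-statistic described above can be sketched as follows, assuming the standard textbook form of Rao's F approximation to Wilks' lambda for sequential tests of canonical correlations. This is an illustrative Python sketch, not the code used by cca.py or by yacca, whose details (for example the degrees-of-freedom correction) may differ; the function name rao_f_tests is hypothetical.

```python
import math


def rao_f_tests(corrs, n, p, q):
    """Sequential F tests for canonical correlations via Rao's approximation.

    corrs: canonical correlations in decreasing order.
    n: number of observations; p, q: number of X and Y variables.
    Returns one (wilks_lambda, F, df1, df2) tuple per test; test k asks
    whether the k-th and all later canonical correlations are zero.
    """
    results = []
    w = n - 1.5 - (p + q) / 2.0   # constant across the sequential tests
    for k in range(len(corrs)):
        # Wilks' lambda for correlations k, k+1, ..., last
        lam = 1.0
        for r in corrs[k:]:
            lam *= 1.0 - r * r
        pk, qk = p - k, q - k
        df1 = pk * qk
        denom = pk * pk + qk * qk - 5
        t = math.sqrt((pk * pk * qk * qk - 4) / denom) if denom > 0 else 1.0
        df2 = w * t - df1 / 2.0 + 1.0
        root = lam ** (1.0 / t)
        f_stat = (1.0 - root) / root * df2 / df1
        results.append((lam, f_stat, df1, df2))
    return results
```

For example, rao_f_tests([0.8, 0.3], n=50, p=2, q=2) returns two tests, the first with df1 = 4 and df2 = 92. A p-value for each test would then be the upper tail of the F distribution with (df1, df2) degrees of freedom, e.g. scipy.stats.f.sf(F, df1, df2) if SciPy is available.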
  </help>
</tool>