/tools/regVariation/categorize_elements_satisfying_criteria.xml

https://bitbucket.org/cistrome/cistrome-harvard/

<tool id="categorize_elements_satisfying_criteria" name="Categorize Elements" version="1.0.0">
  <description>satisfying criteria</description>
  <command interpreter="perl">
    categorize_elements_satisfying_criteria.pl $inputFile1 $inputFile2 $outputFile1
  </command>
  <inputs>
    <param format="tabular" name="inputFile1" type="data" label="Select file containing categories and their elements"/>
    <param format="tabular" name="inputFile2" type="data" label="Select file containing criteria and elements data"/>
  </inputs>
  <outputs>
    <data format="tabular" name="outputFile1"/>
  </outputs>
  <tests>
    <test>
      <param name="inputFile1" value="categories.tabular" ftype="tabular" />
      <param name="inputFile2" value="criteria_elements_data.tabular" ftype="tabular" />
      <output name="outputFile1" file="categorized_elements.tabular" />
    </test>
  </tests>
  <help>

.. class:: infomark

**What it does**

The program takes as input a set of categories, each containing one or more elements. It also takes a table relating elements to criteria, in which each element is assigned one number per criterion: the number of times the element satisfies that criterion.

- The first input is a TABULAR format file in which the left column holds the names of the categories and all other columns hold the names of the elements in each category.
- The second input is a TABULAR format file relating elements to criteria, in which the first line holds the names of the criteria and the left column holds the names of the elements.
- The output is a TABULAR format file relating categories to criteria, in which each category is assigned the total number of times its elements satisfy each criterion. Each category is therefore assigned as many numbers as there are criteria.

**Example**

Let the first input file be a group of motif categories as follows::

    Deletion_Hotspots             deletionHoptspot1           deletionHoptspot2           deletionHoptspot3
    Dna_Pol_Pause_Frameshift      dnaPolPauseFrameshift1      dnaPolPauseFrameshift2      dnaPolPauseFrameshift3      dnaPolPauseFrameshift4
    Indel_Hotspots                indelHotspot1
    Insertion_Hotspots            insertionHotspot1           insertionHotspot2
    Topoisomerase_Cleavage_Sites  topoisomeraseCleavageSite1  topoisomeraseCleavageSite2  topoisomeraseCleavageSite3

And let the second input file represent the number of times each motif occurs in a certain window size of indel flanking regions, as follows::

                                10bp  20bp  40bp
    deletionHoptspot1           1     1     2
    deletionHoptspot2           1     1     1
    deletionHoptspot3           0     0     0
    dnaPolPauseFrameshift1      1     1     1
    dnaPolPauseFrameshift2      0     2     1
    dnaPolPauseFrameshift3      0     0     0
    dnaPolPauseFrameshift4      0     1     2
    indelHotspot1               0     0     0
    insertionHotspot1           0     0     1
    insertionHotspot2           1     1     1
    topoisomeraseCleavageSite1  1     1     1
    topoisomeraseCleavageSite2  1     2     1
    topoisomeraseCleavageSite3  0     0     2

Running the program will give the total number of times the motifs of each category occur in every window size of indel flanking regions::

                                  10bp  20bp  40bp
    Deletion_Hotspots             2     2     3
    Dna_Pol_Pause_Frameshift      1     4     4
    Indel_Hotspots                0     0     0
    Insertion_Hotspots            1     1     2
    Topoisomerase_Cleavage_Sites  2     3     4

  </help>
</tool>
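
The aggregation described in the help text can be sketched in Python as below. This is only an illustration of the documented behaviour, not the code of `categorize_elements_satisfying_criteria.pl`. The function name `categorize` is hypothetical, and two choices are assumptions not stated in the source: elements missing from the criteria table count as zero, and a leading empty cell in the header row is tolerated.

```python
import csv
import io


def categorize(categories_text, criteria_text):
    """Sum per-element counts into per-category totals (hypothetical helper).

    Both arguments are tab-separated text in the layouts the help describes.
    Returns (criteria_names, {category: [total per criterion]}).
    """
    # Second input: first line names the criteria, then one row per element.
    rows = [r for r in csv.reader(io.StringIO(criteria_text), delimiter="\t") if r]
    criteria = [name for name in rows[0] if name]  # assumption: skip an empty corner cell
    counts = {r[0]: [int(v) for v in r[1:] if v] for r in rows[1:]}

    totals = {}
    # First input: category name in the left column, its elements in the rest.
    for row in csv.reader(io.StringIO(categories_text), delimiter="\t"):
        if not row:
            continue
        category = row[0]
        elements = [e for e in row[1:] if e]
        sums = [0] * len(criteria)
        for element in elements:
            # Assumption: an element absent from the criteria table adds nothing.
            values = counts.get(element, [])
            for i, value in enumerate(values[:len(sums)]):
                sums[i] += value
        totals[category] = sums
    return criteria, totals


if __name__ == "__main__":
    categories = (
        "Deletion_Hotspots\tdeletionHoptspot1\tdeletionHoptspot2\tdeletionHoptspot3\n"
    )
    table = (
        "\t10bp\t20bp\t40bp\n"
        "deletionHoptspot1\t1\t1\t2\n"
        "deletionHoptspot2\t1\t1\t1\n"
        "deletionHoptspot3\t0\t0\t0\n"
    )
    names, result = categorize(categories, table)
    print(names, result["Deletion_Hotspots"])  # ['10bp', '20bp', '40bp'] [2, 2, 3]
```

The totals for `Deletion_Hotspots` match the corresponding row of the example output in the help text.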