/tools/evolution/add_scores.xml

https://bitbucket.org/cistrome/cistrome-harvard/

<tool id="hgv_add_scores" name="phyloP" version="1.0.0">
  <description>interspecies conservation scores</description>

  <command interpreter="python">
    add_scores.py "$input1" "$out_file1" "${GALAXY_DATA_INDEX_DIR}/add_scores.loc" "${input1.metadata.dbkey}" "${input1.metadata.chromCol}" "${input1.metadata.startCol}"
  </command>

  <inputs>
    <param format="interval" name="input1" type="data" label="Dataset">
      <validator type="unspecified_build"/>
      <validator type="dataset_metadata_in_file" filename="add_scores.loc" metadata_name="dbkey" metadata_column="0" message="Data is currently not available for the specified build."/>
    </param>
  </inputs>

  <outputs>
    <data format="input" name="out_file1" />
  </outputs>

  <requirements>
    <requirement type="package">add_scores</requirement>
  </requirements>

  <tests>
    <test>
      <param name="input1" value="add_scores_input1.interval" ftype="interval" dbkey="hg18" />
      <output name="out_file1" file="add_scores_output1.interval" />
    </test>
    <test>
      <param name="input1" value="add_scores_input2.bed" ftype="interval" dbkey="hg18" />
      <output name="out_file1" file="add_scores_output2.interval" />
    </test>
  </tests>

  <help>
.. class:: warningmark

This tool currently works only for builds hg18 and hg19.

-----

**Dataset formats**

The input can be any interval_ format dataset. The output is also in interval format.
(`Dataset missing?`_)

.. _interval: ${static_path}/formatHelp.html#interval
.. _Dataset missing?: ${static_path}/formatHelp.html

-----

**What it does**

This tool adds a column that measures interspecies conservation at each SNP
position, using conservation scores for primates pre-computed by the
phyloP program. PhyloP performs an exact P-value computation under a
continuous-time Markov substitution model.

The chromosome and start position are used to look up the scores, so if an
input interval spans more than one nucleotide, only the score for its first
nucleotide is returned.

-----

**Example**

- input file, with SNPs::

    chr22  14440426  14440427  C/T
    chr22  14494851  14494852  A/G
    chr22  14494911  14494912  A/T
    chr22  14550435  14550436  A/G
    chr22  14611956  14611957  G/T
    chr22  14612076  14612077  A/G
    chr22  14668537  14668538  C
    chr22  14668703  14668704  A/T
    chr22  14668775  14668776  G
    chr22  14680074  14680075  A/T
    etc.

- output file, showing conservation scores for primates::

    chr22  14440426  14440427  C/T  0.509
    chr22  14494851  14494852  A/G  0.427
    chr22  14494911  14494912  A/T  NA
    chr22  14550435  14550436  A/G  NA
    chr22  14611956  14611957  G/T  -2.142
    chr22  14612076  14612077  A/G  0.369
    chr22  14668537  14668538  C    0.419
    chr22  14668703  14668704  A/T  -1.462
    chr22  14668775  14668776  G    0.470
    chr22  14680074  14680075  A/T  0.303
    etc.

"NA" means that the phyloP score was not available.

-----

**Reference**

Siepel A, Pollard KS, Haussler D. (2006)
New methods for detecting lineage-specific selection.
In Proceedings of the 10th International Conference on Research in Computational
Molecular Biology (RECOMB 2006), pp. 190-205.
  </help>
</tool>