/tools/evolution/add_scores.xml
https://bitbucket.org/cistrome/cistrome-harvard/ · XML
- <tool id="hgv_add_scores" name="phyloP" version="1.0.0">
- <description>interspecies conservation scores</description>
- <command interpreter="python">
- add_scores.py "$input1" "$out_file1" "${GALAXY_DATA_INDEX_DIR}/add_scores.loc" "${input1.metadata.dbkey}" "${input1.metadata.chromCol}" "${input1.metadata.startCol}"
- </command>
- <inputs>
- <param format="interval" name="input1" type="data" label="Dataset">
- <validator type="unspecified_build"/>
- <validator type="dataset_metadata_in_file" filename="add_scores.loc" metadata_name="dbkey" metadata_column="0" message="Data is currently not available for the specified build."/>
- </param>
- </inputs>
- <outputs>
- <data format="input" name="out_file1" />
- </outputs>
- <requirements>
- <requirement type="package">add_scores</requirement>
- </requirements>
- <tests>
- <test>
- <param name="input1" value="add_scores_input1.interval" ftype="interval" dbkey="hg18" />
- <output name="out_file1" file="add_scores_output1.interval" />
- </test>
- <test>
- <param name="input1" value="add_scores_input2.bed" ftype="interval" dbkey="hg18" />
- <output name="out_file1" file="add_scores_output2.interval" />
- </test>
- </tests>
- <help>
- .. class:: warningmark
- This currently works only for builds hg18 and hg19.
- -----
- **Dataset formats**
- The input can be any interval_ format dataset. The output is also in interval format.
- (`Dataset missing?`_)
- .. _interval: ${static_path}/formatHelp.html#interval
- .. _Dataset missing?: ${static_path}/formatHelp.html
- -----
- **What it does**
- This tool adds a column that measures interspecies conservation at each SNP
- position, using conservation scores for primates pre-computed by the
- phyloP program. PhyloP performs an exact P-value computation under a
- continuous-time Markov substitution model.
- Only the chromosome and start position
- are used to look up the scores, so if an input interval spans more than one
- nucleotide, only the score for its first nucleotide is returned.
- -----
- **Example**
- - input file, with SNPs::
- chr22 14440426 14440427 C/T
- chr22 14494851 14494852 A/G
- chr22 14494911 14494912 A/T
- chr22 14550435 14550436 A/G
- chr22 14611956 14611957 G/T
- chr22 14612076 14612077 A/G
- chr22 14668537 14668538 C
- chr22 14668703 14668704 A/T
- chr22 14668775 14668776 G
- chr22 14680074 14680075 A/T
- etc.
- - output file, showing conservation scores for primates::
- chr22 14440426 14440427 C/T 0.509
- chr22 14494851 14494852 A/G 0.427
- chr22 14494911 14494912 A/T NA
- chr22 14550435 14550436 A/G NA
- chr22 14611956 14611957 G/T -2.142
- chr22 14612076 14612077 A/G 0.369
- chr22 14668537 14668538 C 0.419
- chr22 14668703 14668704 A/T -1.462
- chr22 14668775 14668776 G 0.470
- chr22 14680074 14680075 A/T 0.303
- etc.
- "NA" means that the phyloP score was not available.
- -----
- **Reference**
- Siepel A, Pollard KS, Haussler D. (2006)
- New methods for detecting lineage-specific selection.
- In Proceedings of the 10th International Conference on Research in Computational
- Molecular Biology (RECOMB 2006), pp. 190-205.
- </help>
- </tool>
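The lookup described under "What it does" (chromosome plus start position selects a single score, and "NA" is written when none is available) can be sketched roughly as follows. This is an illustration only, not the code in add_scores.py: the function name `add_scores`, the in-memory `scores` dictionary, and the 0-based column arguments are assumptions made for the example, and the third input row is a made-up longer interval used to show the first-nucleotide behaviour.

```python
def add_scores(rows, scores, chrom_col=0, start_col=1):
    """Append a score column to tab-separated interval rows.

    `scores` is a hypothetical in-memory mapping {(chrom, start): score};
    the real tool reads pre-computed phyloP data located via add_scores.loc.
    Column indices here are 0-based (an assumption for this sketch).
    """
    out = []
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        key = (fields[chrom_col], int(fields[start_col]))
        score = scores.get(key)
        # Only the start position is consulted, so a longer interval
        # still receives the score of its first nucleotide.
        fields.append("NA" if score is None else f"{score:.3f}")
        out.append("\t".join(fields))
    return out


# Two scores taken from the help example; every other position is "NA".
scores = {("chr22", 14611956): -2.142, ("chr22", 14612076): 0.369}
rows = [
    "chr22\t14611956\t14611957\tG/T",
    "chr22\t14494911\t14494912\tA/T",
    "chr22\t14612076\t14612200\tA/G",  # hypothetical multi-base interval
]
result = add_scores(rows, scores)
for line in result:
    print(line)
```

Keying the table on the start position alone mirrors the tool's command line, which passes only the chromosome and start columns to the script.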