/tools/filters/axt_to_fasta.xml

https://bitbucket.org/cistrome/cistrome-harvard/
<tool id="axt_to_fasta" name="AXT to FASTA">
  <description>Converts an AXT formatted file to FASTA format</description>
  <command interpreter="python">axt_to_fasta.py $dbkey_1 $dbkey_2 &lt; $axt_input &gt; $out_file1</command>
  <inputs>
    <param format="axt" name="axt_input" type="data" label="AXT file"/>
    <param name="dbkey_1" type="genomebuild" label="Genome"/>
    <param name="dbkey_2" type="genomebuild" label="Genome"/>
  </inputs>
  <outputs>
    <data format="fasta" name="out_file1" />
  </outputs>
  <tests>
    <test>
      <param name="axt_input" value="1.axt" ftype="axt" />
      <param name="dbkey_1" value="hg17" />
      <param name="dbkey_2" value="panTro1" />
      <output name="out_file1" file="axt_to_fasta.dat" />
    </test>
  </tests>
  <help>

.. class:: warningmark

**IMPORTANT**: AXT formatted alignments will be phased out from Galaxy in the coming weeks. They will be replaced with pairwise MAF alignments, which are already available. To try pairwise MAF alignments, use the "Extract Pairwise MAF blocks" tool in the *Fetch Sequences and Alignments* section.

--------


**Syntax**

This tool converts an AXT formatted file to FASTA format.

- **AXT format** Alignments produced by Blastz, an alignment tool available from Webb Miller's lab at Penn State University. Blastz writes its output in lav format, which does not include the sequences; lavToAxt converts that output to AXT format. Each alignment block in an AXT file contains three lines: a summary line and two sequence lines. Blocks are separated from one another by blank lines.

- **FASTA format** A text-based format for representing nucleic acid and protein sequences, in which nucleotides or amino acids are represented by single-letter codes.

  - Each sequence begins with a one-line header that starts with a ">" symbol. The first word on this line is the name of the sequence. The rest of the line is a description of the sequence.
  - The remaining lines contain the sequence itself.
  - Blank lines in a FASTA file are ignored, and so are spaces or other gap symbols (dashes, underscores, periods) in a sequence.
  - FASTA files containing multiple sequences follow the same layout, with one sequence listed right after another. Many multiple sequence alignment programs accept this format.
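
The FASTA rules listed above can be sketched as a small Python reader. This is only an illustration of those rules, not part of this tool; the function name ``read_fasta`` is hypothetical, and treating tabs as ignorable alongside spaces is an assumption.

```python
def read_fasta(lines):
    """Yield (name, description, sequence) tuples from an iterable of FASTA lines."""
    # Characters ignored inside a sequence: dashes, underscores, periods, spaces
    # (tabs are included here as an assumption).
    gap_chars = "-_. \t"
    strip_gaps = str.maketrans("", "", gap_chars)
    name, description, chunks = None, "", []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, description, "".join(chunks)
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            description = header[1] if len(header) > 1 else ""
            chunks = []
        elif name is not None:
            chunks.append(line.translate(strip_gaps))
    if name is not None:
        yield name, description, "".join(chunks)
```

Because the function is a generator, a file with many sequences can be read one record at a time, for example with ``for name, description, sequence in read_fasta(open(path))``, where ``path`` is a placeholder for a FASTA file.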

-----

**Example**

- AXT format::

    0 chr19 3001012 3001075 chr11 70568380 70568443 - 3500
    TCAGCTCATAAATCACCTCCTGCCACAAGCCTGGCCTGGTCCCAGGAGAGTGTCCAGGCTCAGA
    TCTGTTCATAAACCACCTGCCATGACAAGCCTGGCCTGTTCCCAAGACAATGTCCAGGCTCAGA

    1 chr19 3008279 3008357 chr11 70573976 70574054 - 3900
    CACAATCTTCACATTGAGATCCTGAGTTGCTGATCAGAATGGAAGGCTGAGCTAAGATGAGCGACGAGGCAATGTCACA
    CACAGTCTTCACATTGAGGTACCAAGTTGTGGATCAGAATGGAAAGCTAGGCTATGATGAGGGACAGTGCGCTGTCACA

- The above file converted to FASTA format (here with hg16 and mm5 as the two genome builds)::

    &gt;hg16.chr19(+):3001012-3001075|hg16_0
    TCAGCTCATAAATCACCTCCTGCCACAAGCCTGGCCTGGTCCCAGGAGAGTGTCCAGGCTCAGA
    &gt;mm5.chr11(-):70568380-70568443|mm5_0
    TCTGTTCATAAACCACCTGCCATGACAAGCCTGGCCTGTTCCCAAGACAATGTCCAGGCTCAGA

    &gt;hg16.chr19(+):3008279-3008357|hg16_1
    CACAATCTTCACATTGAGATCCTGAGTTGCTGATCAGAATGGAAGGCTGAGCTAAGATGAGCGACGAGGCAATGTCACA
    &gt;mm5.chr11(-):70573976-70574054|mm5_1
    CACAGTCTTCACATTGAGGTACCAAGTTGTGGATCAGAATGGAAAGCTAGGCTATGATGAGGGACAGTGCGCTGTCACA
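
The conversion in the example above can be sketched in Python. This is an illustration modeled on the example output, not the code of axt_to_fasta.py itself; the function name ``axt_to_fasta`` and its arguments are hypothetical, and the header layout is inferred from the example, with the first sequence always reported on the "+" strand.

```python
def axt_to_fasta(lines, dbkey_1, dbkey_2):
    """Return FASTA text for the AXT blocks found in an iterable of lines."""
    # Drop the blank separator lines; what remains is three lines per block.
    content = [line.strip() for line in lines if line.strip()]
    if len(content) % 3 != 0:
        raise ValueError("expected three lines per AXT block")
    out = []
    for i in range(0, len(content), 3):
        fields = content[i].split()
        if len(fields) != 9:
            raise ValueError("malformed AXT summary line: " + content[i])
        # Summary line: block number, first chromosome and range,
        # second chromosome and range, strand of the second sequence, score.
        number, chrom1, start1, end1, chrom2, start2, end2, strand, _score = fields
        out.append(f">{dbkey_1}.{chrom1}(+):{start1}-{end1}|{dbkey_1}_{number}")
        out.append(content[i + 1])
        out.append(f">{dbkey_2}.{chrom2}({strand}):{start2}-{end2}|{dbkey_2}_{number}")
        out.append(content[i + 2])
        out.append("")  # blank line between converted blocks
    return "\n".join(out)
```

Called on the lines of the AXT example with "hg16" and "mm5" as the two genome builds, this sketch produces the headers shown in the FASTA example. The score field is read but not used in the output.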

  </help>
</tool>