/tools/filters/axt_to_lav.xml

https://bitbucket.org/cistrome/cistrome-harvard/
<tool id="axt_to_lav_1" name="AXT to LAV">
  <description>Converts an AXT formatted file to LAV format</description>
  <command interpreter="python">axt_to_lav.py /galaxy/data/$dbkey_1/seq/%s.nib:$dbkey_1:${GALAXY_DATA_INDEX_DIR}/shared/ucsc/chrom/${dbkey_1}.len /galaxy/data/$dbkey_2/seq/%s.nib:$dbkey_2:${GALAXY_DATA_INDEX_DIR}/shared/ucsc/chrom/${dbkey_2}.len $align_input $lav_file $seq_file1 $seq_file2</command>
  <inputs>
    <param name="align_input" type="data" format="axt" label="Alignment File" optional="False"/>
    <param name="dbkey_1" type="genomebuild" label="Genome"/>
    <param name="dbkey_2" type="genomebuild" label="Genome"/>
  </inputs>
  <outputs>
    <data name="lav_file" format="lav"/>
    <data name="seq_file1" format="fasta" parent="lav_file"/>
    <data name="seq_file2" format="fasta" parent="lav_file"/>
  </outputs>
  <help>

.. class:: warningmark

**IMPORTANT**: AXT formatted alignments will be phased out from Galaxy in the coming weeks. They will be replaced with pairwise MAF alignments, which are already available. To try pairwise MAF alignments, use the "Extract Pairwise MAF blocks" tool in the *Fetch Sequences and Alignments* section.

--------


**Syntax**

This tool converts an AXT formatted file to the LAV format.

- **AXT format** The alignments are produced by BLASTZ, an alignment tool available from Webb Miller's lab at Penn State University. The BLASTZ output in LAV format, which does not include the sequence, was converted to AXT format with lavToAxt. Each alignment block in an AXT file contains three lines: a summary line and two sequence lines. Blocks are separated from one another by blank lines.

- **LAV format** LAV is an alignment format developed by Webb Miller's group. It is the primary output format for BLASTZ.

- **FASTA format** A text-based format for representing both nucleic acid and protein sequences, in which nucleotides or amino acids are represented by single-letter codes.

  - Each sequence begins with a one-line header that starts with a ">" symbol. The first word on this line is the name of the sequence. The rest of the line is a description of the sequence.
  - The remaining lines contain the sequence itself.
  - Blank lines in a FASTA file are ignored, and so are spaces or other gap symbols (dashes, underscores, periods) in a sequence.
  - FASTA files containing multiple sequences follow the same layout, with one sequence listed right after another. This format is accepted by many multiple sequence alignment programs.

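The AXT block layout described above (a summary line followed by two sequence lines, with blank lines between blocks) can be read with a few lines of Python. This is a minimal sketch, not the code this tool runs: the function name ``parse_axt`` and the dictionary keys are hypothetical, and the meaning of the nine summary fields follows the UCSC description of the AXT format.

```python
def parse_axt(text):
    """Parse AXT-formatted text into a list of alignment blocks (dicts).

    Hypothetical helper for illustration; not part of axt_to_lav.py.
    """
    blocks = []
    chunk = []
    # A trailing empty string flushes the last block even if the text
    # does not end with a blank line.
    for raw in text.splitlines() + [""]:
        line = raw.strip()
        if line.startswith("#"):
            continue  # comment lines are not part of any block
        if line:
            chunk.append(line)
            continue
        if not chunk:
            continue  # consecutive blank lines
        if len(chunk) != 3:
            raise ValueError("expected 3 lines per block, got %d" % len(chunk))
        fields = chunk[0].split()
        if len(fields) != 9:
            raise ValueError("expected 9 summary fields, got %d" % len(fields))
        blocks.append({
            "number": int(fields[0]),   # alignment number
            "chrom1": fields[1],        # first (primary) sequence
            "start1": int(fields[2]),
            "end1": int(fields[3]),
            "chrom2": fields[4],        # second (aligning) sequence
            "start2": int(fields[5]),
            "end2": int(fields[6]),
            "strand2": fields[7],       # strand of the second sequence
            "score": int(fields[8]),    # BLASTZ score
            "seq1": chunk[1],
            "seq2": chunk[2],
        })
        chunk = []
    return blocks
```

Applied to the AXT example in the next section, this sketch would return two blocks, with scores 3500 and 3900.
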
-----

**Example**

- AXT format::

    0 chr19 3001012 3001075 chr11 70568380 70568443 - 3500
    TCAGCTCATAAATCACCTCCTGCCACAAGCCTGGCCTGGTCCCAGGAGAGTGTCCAGGCTCAGA
    TCTGTTCATAAACCACCTGCCATGACAAGCCTGGCCTGTTCCCAAGACAATGTCCAGGCTCAGA

    1 chr19 3008279 3008357 chr11 70573976 70574054 - 3900
    CACAATCTTCACATTGAGATCCTGAGTTGCTGATCAGAATGGAAGGCTGAGCTAAGATGAGCGACGAGGCAATGTCACA
    CACAGTCTTCACATTGAGGTACCAAGTTGTGGATCAGAATGGAAAGCTAGGCTATGATGAGGGACAGTGCGCTGTCACA

- The above file converted to LAV format::

    #:lav
    s {
      &quot;/galaxy/data/hg16/seq/chr19.nib&quot; 1 63811651 0 1
      &quot;/galaxy/data/mm5/seq/chr11.nib-&quot; 1 121648857 0 1
    }
    h {
      &quot;> hg16.chr19&quot;
      &quot;> mm5.chr11 (reverse complement)&quot;
    }
    a {
      s 3500
      b 3001012 70568380
      e 3001075 70568443
      l 3001012 70568380 3001075 70568443 81
    }
    a {
      s 3900
      b 3008279 70573976
      e 3008357 70574054
      l 3008279 70573976 3008357 70574054 78
    }
    #:eof

- Together with two files in FASTA format::

    &gt;hg16.chr19_-_3001011_3001075
    TCAGCTCATAAATCACCTCCTGCCACAAGCCTGGCCTGGTCCCAGGAGAGTGTCCAGGCTCAGA

    &gt;hg16.chr19_-_3008278_3008357
    CACAATCTTCACATTGAGATCCTGAGTTGCTGATCAGAATGGAAGGCTGAGCTAAGATGAGCGACGAGGCAATGTCACA

 **and**::

    &gt;mm5.chr11_-_70568379_70568443
    TCTGTTCATAAACCACCTGCCATGACAAGCCTGGCCTGTTCCCAAGACAATGTCCAGGCTCAGA

    &gt;mm5.chr11_-_70573975_70574054
    CACAGTCTTCACATTGAGGTACCAAGTTGTGGATCAGAATGGAAAGCTAGGCTATGATGAGGGACAGTGCGCTGTCACA
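
The relationship between the AXT input and the outputs in this example can be sketched in Python. The rules below are inferred from this example only and are not taken from the tool's source: the ``l`` line is assumed to end with the rounded percent identity of the block, and the FASTA header is assumed to use a zero-based start (the AXT start minus one). All function names are hypothetical, and the sketch covers ungapped blocks only, since a gapped block would need one ``l`` line per ungapped segment.

```python
def percent_identity(seq1, seq2):
    """Rounded percentage of aligned positions where both sequences agree.

    Assumption: this is how the last number on the LAV "l" line is derived.
    """
    pairs = zip(seq1.upper(), seq2.upper())
    matches = sum(1 for a, b in pairs if a == b and a != "-")
    return int(round(100.0 * matches / len(seq1)))


def lav_alignment_stanza(start1, end1, start2, end2, score, seq1, seq2):
    """Build one LAV "a { ... }" stanza for a single ungapped AXT block."""
    ident = percent_identity(seq1, seq2)
    return "\n".join([
        "a {",
        "  s %d" % score,                  # alignment score
        "  b %d %d" % (start1, start2),    # begin position in each sequence
        "  e %d %d" % (end1, end2),        # end position in each sequence
        "  l %d %d %d %d %d" % (start1, start2, end1, end2, ident),
        "}",
    ])


def fasta_header(build, chrom, strand, start, end):
    """Header in the style of the example: build.chrom_strand_start0_end."""
    return ">%s.%s_%s_%d_%d" % (build, chrom, strand, start - 1, end)


# First block of the AXT example.
seq1 = "TCAGCTCATAAATCACCTCCTGCCACAAGCCTGGCCTGGTCCCAGGAGAGTGTCCAGGCTCAGA"
seq2 = "TCTGTTCATAAACCACCTGCCATGACAAGCCTGGCCTGTTCCCAAGACAATGTCCAGGCTCAGA"
stanza = lav_alignment_stanza(3001012, 3001075, 70568380, 70568443,
                              3500, seq1, seq2)
header1 = fasta_header("hg16", "chr19", "-", 3001012, 3001075)
header2 = fasta_header("mm5", "chr11", "-", 70568380, 70568443)
```

For the first block, 52 of the 64 aligned positions agree, which rounds to the 81 shown on its ``l`` line.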
  </help>
  <code file="axt_to_lav_code.py"/>
</tool>