/tools/maf/maf_to_fasta.xml

https://bitbucket.org/cistrome/cistrome-harvard/
<tool id="MAF_To_Fasta1" name="MAF to FASTA" version="1.0.1">
  <description>Converts a MAF formatted file to FASTA format</description>
  <command interpreter="python">
    #if $fasta_target_type.fasta_type == "multiple" #maf_to_fasta_multiple_sets.py $input1 $out_file1 $fasta_target_type.species $fasta_target_type.complete_blocks
    #else                                           #maf_to_fasta_concat.py $fasta_target_type.species $input1 $out_file1
    #end if#
  </command>
  <inputs>
    <param format="maf" name="input1" type="data" label="MAF file to convert"/>
    <conditional name="fasta_target_type">
      <param name="fasta_type" type="select" label="Type of FASTA Output">
        <option value="multiple" selected="true">Multiple Blocks</option>
        <option value="concatenated">One Sequence per Species</option>
      </param>
      <when value="multiple">
        <param name="species" type="select" label="Select species" display="checkboxes" multiple="true" help="checked taxa will be included in the output">
          <options>
            <filter type="data_meta" ref="input1" key="species" />
          </options>
        </param>
        <param name="complete_blocks" type="select" label="Choose to">
          <option value="partial_allowed">include blocks with missing species</option>
          <option value="partial_disallowed">exclude blocks with missing species</option>
        </param>
      </when>
      <when value="concatenated">
        <param name="species" type="select" label="Species to extract" display="checkboxes" multiple="true">
          <options>
            <filter type="data_meta" ref="input1" key="species" />
          </options>
        </param>
      </when>
    </conditional>
  </inputs>
  <outputs>
    <data format="fasta" name="out_file1" />
  </outputs>
  <tests>
    <test>
      <param name="input1" value="3.maf" ftype="maf"/>
      <param name="fasta_type" value="concatenated"/>
      <param name="species" value="canFam1"/>
      <output name="out_file1" file="cf_maf2fasta_concat.dat" ftype="fasta"/>
    </test>
    <test>
      <param name="input1" value="4.maf" ftype="maf"/>
      <param name="fasta_type" value="multiple"/>
      <param name="species" value="hg17,panTro1,rheMac2,rn3,mm7,canFam2,bosTau2,dasNov1"/>
      <param name="complete_blocks" value="partial_allowed"/>
      <output name="out_file1" file="cf_maf2fasta_new.dat" ftype="fasta"/>
    </test>
  </tests>
  <help>

**Types of MAF to FASTA conversion**

 * **Multiple Blocks** converts each MAF block to a separate FASTA block. For example, if you have 6 MAF blocks, they will be converted to 6 FASTA blocks.
 * **One Sequence per Species** converts all MAF blocks to a single aggregated FASTA block. For example, if you have 6 MAF blocks, they will be converted and concatenated into a single FASTA block.

-------

**What it does**

This tool converts MAF blocks to FASTA format and either concatenates them into a single FASTA block or outputs multiple FASTA blocks separated by empty lines.

The interface for this tool contains two pages (steps):

 * **Step 1 of 2**. Choose the multiple alignments from your history to be converted to FASTA format.
 * **Step 2 of 2**. Choose the type of output as well as the species from the alignment to be included in the output.

   Multiple Block output has additional options:

   *  **Select species** - the tool reads the alignment provided during Step 1 and generates a list of species contained within that alignment. Using checkboxes you can specify taxa to be included in the output (all species are selected by default).
   *  **Choose to include/exclude blocks with missing species** - if this option is set to **exclude blocks with missing species** and an alignment block lacks any of the species you selected in the **Select species** menu, then that block **will not** be included in the output (see **Example 2b** below). For example, if you want to extract human, mouse, and rat from a series of alignments and one of the blocks does not contain a mouse sequence, then that block will not be converted to FASTA and will not be returned.


-----

**Example 1**:

In the concatenated approach, the following alignment::

  ##maf version=1
  a score=68686.000000
  s hg18.chr20     56827368 75 +  62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s panTro2.chr20  56528685 75 +  62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s rheMac2.chr10  89144112 69 -  94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  s mm8.chr2      173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  s canFam2.chr24  46551822 67 +  50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  a score=10289.000000
  s hg18.chr20    56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

will be converted to (**note** that because mm8 (mouse) and canFam2 (dog) are absent from the second block, they are replaced with gaps after concatenation)::

  &gt;canFam2
  CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C-------------------------------------
  &gt;hg18
  GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  &gt;mm8
  AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC--------------------------------------------
  &gt;panTro2
  GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  &gt;rheMac2
  GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

------

**Example 2a**: Multiple Block Approach **Include all species** and **include blocks with missing species**:

The following alignment::

  ##maf version=1
  a score=68686.000000
  s hg18.chr20     56827368 75 +  62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s panTro2.chr20  56528685 75 +  62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s rheMac2.chr10  89144112 69 -  94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  s mm8.chr2      173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  s canFam2.chr24  46551822 67 +  50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  a score=10289.000000
  s hg18.chr20    56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

will be converted to::

  &gt;hg18.chr20(+):56827368-56827443|hg18_0
  GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  &gt;panTro2.chr20(+):56528685-56528760|panTro2_0
  GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  &gt;rheMac2.chr10(-):89144112-89144181|rheMac2_0
  GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  &gt;mm8.chr2(+):173910832-173910893|mm8_0
  AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  &gt;canFam2.chr24(+):46551822-46551889|canFam2_0
  CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  &gt;hg18.chr20(+):56827443-56827480|hg18_1
  ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  &gt;panTro2.chr20(+):56528760-56528797|panTro2_1
  ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  &gt;rheMac2.chr10(-):89144181-89144218|rheMac2_1
  ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

-----

**Example 2b**: Multiple Block Approach **Include hg18 and mm8** and **exclude blocks with missing species**:

The following alignment::

  ##maf version=1
  a score=68686.000000
  s hg18.chr20     56827368 75 +  62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s panTro2.chr20  56528685 75 +  62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s rheMac2.chr10  89144112 69 -  94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  s mm8.chr2      173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  s canFam2.chr24  46551822 67 +  50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  a score=10289.000000
  s hg18.chr20    56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

will be converted to (**note** that the second MAF block, which does not have mm8, is not included in the output; in the block that is kept, the columns that contain only gaps in both hg18 and mm8 no longer appear)::

  &gt;hg18.chr20(+):56827368-56827443|hg18_0
  GACAGGGTGCATCTGGGAGGGCCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC
  &gt;mm8.chr2(+):173910832-173910893|mm8_0
  AGAAGGATCCACCT---------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC------

------

.. class:: infomark

**About formats**

 **MAF format** is the multiple alignment format. It stores multiple alignments at the DNA level between entire genomes.

 - The .maf format is line-oriented. Each multiple alignment ends with a blank line.
 - Each sequence in an alignment is on a single line.
 - Lines starting with # are considered to be comments.
 - Each multiple alignment is in a separate paragraph that begins with an "a" line and contains an "s" line for each sequence in the multiple alignment.
 - Some MAF files may contain two optional line types:

   - An "i" line containing information about what is in the aligned species DNA before and after the immediately preceding "s" line;
   - An "e" line containing information about the size of the gap between the alignments that span the current block.

------

**Citation**

If you use this tool, please cite `Blankenberg D, Taylor J, Nekrutenko A; The Galaxy Team. Making whole genome multiple alignments usable for biologists. Bioinformatics. 2011 Sep 1;27(17):2426-2428. &lt;http://www.ncbi.nlm.nih.gov/pubmed/21775304&gt;`_


  </help>
</tool>
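
The two conversion modes described in the help text can be sketched in a few lines of Python. This is only an illustration of the behaviour the examples show, not the code of `maf_to_fasta_concat.py` or `maf_to_fasta_multiple_sets.py`. It assumes each MAF block has already been parsed into a dict that maps a species name to its aligned text, and the function names `concat_blocks` and `filter_blocks` are hypothetical.

```python
# Illustrative sketch only; the names and the dict-based block
# representation are assumptions, not part of the Galaxy tool.

def concat_blocks(blocks, species):
    """One Sequence per Species: join the aligned text of every block,
    padding a species with gaps wherever it is absent from a block."""
    parts = {sp: [] for sp in species}
    for block in blocks:
        if not block:
            continue
        # Every row of one alignment block has the same number of columns.
        width = len(next(iter(block.values())))
        for sp in species:
            parts[sp].append(block.get(sp, "-" * width))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


def filter_blocks(blocks, species, allow_partial=True):
    """Multiple Blocks: optionally drop blocks that lack a selected species."""
    return [
        block for block in blocks
        if allow_partial or all(sp in block for sp in species)
    ]


blocks = [
    {"hg18": "GAC-A", "mm8": "AGAAG"},
    {"hg18": "ATG"},  # mm8 is missing from this block
]

print(concat_blocks(blocks, ["hg18", "mm8"]))
# {'hg18': 'GAC-AATG', 'mm8': 'AGAAG---'}
print(len(filter_blocks(blocks, ["hg18", "mm8"], allow_partial=False)))
# 1
```

With `allow_partial=False` the second toy block is dropped because it has no mm8 row, which mirrors Example 2b. The gap padding in `concat_blocks` mirrors the mm8 and canFam2 rows of Example 1.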