/tools/maf/maf_to_interval.xml

https://bitbucket.org/cistrome/cistrome-harvard/
<tool id="MAF_To_Interval1" name="MAF to Interval" force_history_refresh="True">
  <description>Converts a MAF formatted file to the Interval format</description>
  <command interpreter="python">maf_to_interval.py $input1 $out_file1 $out_file1.id $__new_file_path__ $input1.dbkey $species $input1.metadata.species $complete_blocks $remove_gaps</command>
  <inputs>
    <param format="maf" name="input1" type="data" label="MAF file to convert"/>
    <param name="species" type="select" label="Select additional species" display="checkboxes" multiple="true" help="The species matching the dbkey of the alignment is always included. A separate history item will be created for each species.">
      <options>
        <filter type="data_meta" ref="input1" key="species" />
        <filter type="remove_value" meta_ref="input1" key="dbkey" />
      </options>
    </param>
    <param name="complete_blocks" type="select" label="Exclude blocks which have a species missing">
      <option value="partial_allowed">include blocks with missing species</option>
      <option value="partial_disallowed">exclude blocks with missing species</option>
    </param>
    <param name="remove_gaps" type="select" label="Remove Gap characters from sequences">
      <option value="keep_gaps">keep gaps</option>
      <option value="remove_gaps">remove gaps</option>
    </param>
  </inputs>
  <outputs>
    <data format="interval" name="out_file1" />
  </outputs>
  <tests>
    <test>
      <param name="input1" value="4.maf" dbkey="hg17"/>
      <param name="complete_blocks" value="partial_disallowed"/>
      <param name="remove_gaps" value="keep_gaps"/>
      <param name="species" value="panTro1" />
      <output name="out_file1" file="maf_to_interval_out_hg17.interval"/>
      <output name="out_file1" file="maf_to_interval_out_panTro1.interval"/>
    </test>
  </tests>
  <help>

**What it does**

This tool converts every MAF block into a set of genomic intervals that describe the position of that alignment block within the corresponding genome. Sequences from the aligned species are also included in the output.
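
The Python sketch below shows the coordinate arithmetic for a single MAF "s" line. It is not the code of ``maf_to_interval.py``. The helper name ``s_line_to_interval`` is hypothetical, and the sketch assumes a "+" strand row whose source name has the form ``species.chrom``, as in the examples below. MAF start positions are zero-based and the size field counts aligned bases excluding gaps, so the interval end is start + size. For "-" strand rows, MAF counts the start on the reverse-complemented source sequence, and this sketch does not convert it.

```python
def s_line_to_interval(line):
    """Return (species, chrom, start, end, strand) for one MAF "s" line.

    Hypothetical helper for illustration only. Assumes a "+" strand row;
    for "-" strand rows MAF counts start on the reverse-complemented
    source sequence, which is not converted here.
    """
    # Fields: "s", src, start, size, strand, srcSize, aligned text
    _, src, start, size, strand, _src_size, _text = line.split()
    species, chrom = src.split(".", 1)  # e.g. "hg18.chr20"
    start = int(start)
    # size counts non-gap bases, so it is also the interval length
    return species, chrom, start, start + int(size), strand


hg18_line = ("s hg18.chr20     56827368 75 +  62435964 "
             "GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-"
             "TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-")
print(s_line_to_interval(hg18_line))
# ('hg18', 'chr20', 56827368, 56827443, '+')
```

The end coordinate matches the hg18 row of **Example 1** (56827368 + 75 = 56827443).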

The interface for this tool contains several options:

 * **MAF file to convert**. Choose a multiple alignment (MAF) dataset from your history to be converted to Interval format.
 * **Select additional species**. Choose additional species from the alignment to be included in the output. The species matching the dbkey of the alignment is always included, and a separate history item is created for each species.
 * **Exclude blocks which have a species missing**. If an alignment block is missing any of the species found in the alignment set and this option is set to **exclude blocks with missing species**, the coordinates of that block **will not** be included in the output (see **Example 2** below).
 * **Remove Gap characters from sequences**. Gap characters ("-") can be removed from the sequences before they are output.
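
The last two options can be illustrated with the Python sketch below. It makes several assumptions and is not the tool's implementation. The helper names ``remove_gaps`` and ``block_is_complete`` are hypothetical. A block is reduced to the set of species that have an "s" line in it, and the caller decides which species count as required.

```python
def remove_gaps(aligned_text):
    """Drop gap characters ("-") from an aligned sequence (hypothetical helper)."""
    return aligned_text.replace("-", "")


def block_is_complete(block_species, required_species):
    """True if every required species has a row in the block (hypothetical helper).

    With "exclude blocks with missing species", a block for which this
    returns False would contribute no intervals to the output.
    """
    return set(required_species).issubset(block_species)


# Species present in the second block of the examples below
second_block = {"hg18", "panTro2", "rheMac2"}
print(block_is_complete(second_block, {"hg18", "mm8"}))  # False: block skipped
print(remove_gaps("GACAGGG---CCTG"))                     # GACAGGGCCTG
```

Removing the gaps from the hg18 row of the examples leaves 75 bases, which equals the size field of that row.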


-----

**Example 1**: **Include only the reference genome** (hg18 in this case) and **include blocks with missing species**:

For the following alignment::

  ##maf version=1
  a score=68686.000000
  s hg18.chr20     56827368 75 +  62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s panTro2.chr20  56528685 75 +  62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s rheMac2.chr10  89144112 69 -  94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  s mm8.chr2      173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  s canFam2.chr24  46551822 67 +  50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  a score=10289.000000
  s hg18.chr20    56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

the tool will create **a single** history item containing the following. **Note** that the name field is numbered iteratively (hg18_0_0, hg18_1_0, etc.): the first number is the block number, and the second is the iteration through the block (if a species appears twice in a block, that interval will be repeated). Sequences for each species are included in the order given in the header, and a field is left empty when no sequence is available for that species::

  #chrom	start	end	strand	score	name	canFam2	hg18	mm8	panTro2	rheMac2
  chr20	56827368	56827443	+	68686.0	hg18_0_0	CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C	GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-	AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------	GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-	GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  chr20	56827443	56827480	+	10289.0	hg18_1_0		ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG		ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG	ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG
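
The row layout above can be reproduced with a short Python sketch. It is an illustration under assumptions, not the tool's code. ``block_rows`` is a hypothetical helper and components are plain tuples. The sketch also assumes that the sequence columns follow the alphabetical species order seen in the header, and that when a species occurs more than once in a block, only its first sequence fills its column.

```python
def block_rows(block_index, ref_species, score, components, all_species):
    """Build tab-separated output rows for one MAF block (hypothetical helper).

    components: list of (species, chrom, start, size, strand, text) tuples.
    One row is emitted per occurrence of ref_species, named
    species_block_iteration as in the note above.
    """
    texts = {}
    for species, _chrom, _start, _size, _strand, text in components:
        texts.setdefault(species, text)  # assumption: first occurrence wins
    # Empty string when a species has no sequence in this block
    seq_columns = [texts.get(sp, "") for sp in sorted(all_species)]

    rows = []
    iteration = 0
    for species, chrom, start, size, strand, _text in components:
        if species != ref_species:
            continue
        name = f"{species}_{block_index}_{iteration}"
        fields = [chrom, str(start), str(start + size), strand, str(score), name]
        rows.append("\t".join(fields + seq_columns))
        iteration += 1
    return rows


all_species = ["hg18", "panTro2", "rheMac2", "mm8", "canFam2"]
second_block = [
    ("hg18", "chr20", 56827443, 37, "+", "ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG"),
    ("panTro2", "chr20", 56528760, 37, "+", "ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG"),
    ("rheMac2", "chr10", 89144181, 37, "-", "ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG"),
]
for row in block_rows(1, "hg18", 10289.0, second_block, all_species):
    print(row)
```

The printed row has the same fields as the second line of the output above, with empty canFam2 and mm8 columns.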


-----

**Example 2**: **Include hg18 and mm8** and **exclude blocks with missing species**:

For the following alignment::

  ##maf version=1
  a score=68686.000000
  s hg18.chr20     56827368 75 +  62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s panTro2.chr20  56528685 75 +  62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s rheMac2.chr10  89144112 69 -  94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  s mm8.chr2      173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  s canFam2.chr24  46551822 67 +  50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  a score=10289.000000
  s hg18.chr20    56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

the tool will create **two** history items (one for hg18 and one for mm8) containing the following. **Note** that each history item contains only one line, which describes the first alignment block. The second MAF block is not included in the output because it does not contain mm8:

History item **1** (for hg18)::

   #chrom	start	end	strand	score	name	canFam2	hg18	mm8	panTro2	rheMac2
   chr20	56827368	56827443	+	68686.0	hg18_0_0	CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C	GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-	AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------	GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-	GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------


History item **2** (for mm8)::

   #chrom	start	end	strand	score	name	canFam2	hg18	mm8	panTro2	rheMac2
   chr2	173910832	173910893	+	68686.0	mm8_0_0	CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C	GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-	AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------	GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-	GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
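
The following Python sketch shows how one history item per species can be derived from the same blocks. It is a hypothetical illustration, not the tool's implementation. ``split_by_species`` is an invented name and sequence columns are omitted for brevity. Blocks missing any selected species are skipped, which mirrors the **exclude blocks with missing species** setting used in this example.

```python
from collections import defaultdict


def split_by_species(blocks, selected):
    """Group interval rows into one output list per selected species.

    blocks: list of (score, components), where each component is a
    (species, chrom, start, size, strand) tuple. Hypothetical helper.
    """
    outputs = defaultdict(list)
    for block_index, (score, components) in enumerate(blocks):
        present = {component[0] for component in components}
        if not set(selected).issubset(present):
            continue  # block lacks a selected species, so it is excluded
        for species in selected:
            iteration = 0
            for sp, chrom, start, size, strand in components:
                if sp != species:
                    continue
                name = f"{species}_{block_index}_{iteration}"
                outputs[species].append(
                    (chrom, start, start + size, strand, score, name))
                iteration += 1
    return dict(outputs)


blocks = [
    (68686.0, [("hg18", "chr20", 56827368, 75, "+"),
               ("panTro2", "chr20", 56528685, 75, "+"),
               ("rheMac2", "chr10", 89144112, 69, "-"),
               ("mm8", "chr2", 173910832, 61, "+"),
               ("canFam2", "chr24", 46551822, 67, "+")]),
    (10289.0, [("hg18", "chr20", 56827443, 37, "+"),
               ("panTro2", "chr20", 56528760, 37, "+"),
               ("rheMac2", "chr10", 89144181, 37, "-")]),
]
for species, rows in split_by_species(blocks, ["hg18", "mm8"]).items():
    print(species, rows)
# hg18 [('chr20', 56827368, 56827443, '+', 68686.0, 'hg18_0_0')]
# mm8 [('chr2', 173910832, 173910893, '+', 68686.0, 'mm8_0_0')]
```

The mm8 row reproduces the coordinates of History item 2 (173910832 + 61 = 173910893). The second block is dropped because it has no mm8 row.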


-------

.. class:: infomark

**About formats**

**MAF format**: Multiple Alignment Format. This format stores multiple alignments at the DNA level between entire genomes.

 - The .maf format is line-oriented. Each multiple alignment ends with a blank line.
 - Each sequence in an alignment is on a single line.
 - Lines starting with # are considered to be comments.
 - Each multiple alignment is in a separate paragraph that begins with an "a" line and contains an "s" line for each sequence in the multiple alignment.
 - Some MAF files may contain two optional line types:

   - An "i" line containing information about what is in the aligned species' DNA before and after the immediately preceding "s" line;
   - An "e" line containing information about the size of the gap between the alignments that span the current block.
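
The line structure described above can be illustrated with a small Python reader. This is a simplified sketch. It is neither a complete MAF parser nor the parser used by this tool, and ``parse_maf`` is a hypothetical name. Only "s" lines are split into fields. Optional lines such as "i" and "e" are kept as raw text, and lines outside any block are ignored.

```python
def parse_maf(text):
    """Split MAF text into blocks (simplified, hypothetical reader).

    Each block is a dict holding the "a" line score, the parsed "s" rows,
    and any other lines (e.g. "i" and "e") as raw strings.
    """
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # comments, including the "##maf version=1" header
        if not line:
            if current is not None:  # a blank line ends the current block
                blocks.append(current)
                current = None
            continue
        kind, *fields = line.split()
        if kind == "a":  # an "a" line starts a new block
            if current is not None:
                blocks.append(current)
            attrs = dict(f.split("=", 1) for f in fields if "=" in f)
            score = float(attrs["score"]) if "score" in attrs else None
            current = {"score": score, "s": [], "other": []}
        elif current is None:
            continue  # assumption: ignore stray lines outside a block
        elif kind == "s":
            src, start, size, strand, src_size, seq = fields
            current["s"].append(
                (src, int(start), int(size), strand, int(src_size), seq))
        else:
            current["other"].append(line)  # optional "i", "e", ... lines
    if current is not None:
        blocks.append(current)
    return blocks


example = """##maf version=1
a score=10289.000000
s hg18.chr20    56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG
"""
for block in parse_maf(example):
    print(block["score"], [row[0] for row in block["s"]])
# 10289.0 ['hg18.chr20', 'panTro2.chr20', 'rheMac2.chr10']
```

A production reader would also validate field counts and handle the "q" quality lines that the MAF specification allows. Those are left out here to keep the sketch short.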

------

**Citation**

If you use this tool, please cite `Blankenberg D, Taylor J, Nekrutenko A; The Galaxy Team. Making whole genome multiple alignments usable for biologists. Bioinformatics. 2011 Sep 1;27(17):2426-2428. &lt;http://www.ncbi.nlm.nih.gov/pubmed/21775304&gt;`_


    </help>
</tool>