<tool id="MAF_To_BED1" name="Maf to BED" force_history_refresh="True">
  <description>Converts a MAF formatted file to the BED format</description>
  <command interpreter="python">maf_to_bed.py $input1 $out_file1 $species $complete_blocks $__new_file_path__</command>
  <inputs>
    <param format="maf" name="input1" type="data" label="MAF file to convert"/>
    <param name="species" type="select" label="Select species" display="checkboxes" multiple="true" help="a separate history item will be created for each checked species">
      <options>
        <filter type="data_meta" ref="input1" key="species" />
      </options>
    </param>
    <param name="complete_blocks" type="select" label="Exclude blocks which have a requested species missing">
      <option value="partial_allowed">include blocks with missing species</option>
      <option value="partial_disallowed">exclude blocks with missing species</option>
    </param>
  </inputs>
  <outputs>
    <data format="bed" name="out_file1" />
  </outputs>
  <tests>
    <test>
      <param name="input1" value="4.maf"/>
      <param name="species" value="hg17"/>
      <param name="complete_blocks" value="partial_disallowed"/>
      <output name="out_file1" file="cf_maf_to_bed.dat"/>
    </test>
  </tests>
  <help>

**What it does**

This tool converts every MAF block to an interval line in BED format (scroll down for a description of the MAF and BED formats) that describes the position of that alignment block within the corresponding genome.

The interface for this tool contains two pages (steps):

 * **Step 1 of 2**. Choose a multiple alignment from your history to be converted to BED format.
 * **Step 2 of 2**. Choose the species from the alignment to be included in the output and specify how to handle alignment blocks that lack one or more of those species:

   *  **Choose species** - the tool reads the alignment provided in Step 1 and lists the species it contains. Use the checkboxes to select the taxa to include in the output (only the reference genome, shown in **bold**, is selected by default). One history item is created for each selected species.
   *  **Choose to include/exclude blocks with missing species** - if this option is set to **exclude blocks with missing species** and an alignment block lacks any of the species selected in the **Choose species** menu, the coordinates of that block **will not** be included in the output for any species (see **Example 2** below).


-----

**Example 1**: **Include only the reference genome** (hg18 in this case) and **include blocks with missing species**:

For the following alignment::

  ##maf version=1
  a score=68686.000000
  s hg18.chr20     56827368 75 +  62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s panTro2.chr20  56528685 75 +  62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s rheMac2.chr10  89144112 69 -  94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  s mm8.chr2      173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  s canFam2.chr24  46551822 67 +  50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  a score=10289.000000
  s hg18.chr20    56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

the tool will create **a single** history item containing the following (**note** that field 4, the name, is generated by the tool and numbered sequentially: hg18_0, hg18_1, etc.)::

  chr20    56827368    56827443   hg18_0   0   +
  chr20    56827443    56827480   hg18_1   0   +

-----

**Example 2**: **Include hg18 and mm8** and **exclude blocks with missing species**:

For the following alignment::

  ##maf version=1
  a score=68686.000000
  s hg18.chr20     56827368 75 +  62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s panTro2.chr20  56528685 75 +  62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s rheMac2.chr10  89144112 69 -  94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  s mm8.chr2      173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  s canFam2.chr24  46551822 67 +  50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  a score=10289.000000
  s hg18.chr20    56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

the tool will create **two** history items (one for hg18 and one for mm8), each containing a single line that describes the first alignment block. The second MAF block is not included in either output because it does not contain mm8:

History item **1** (for hg18)::

   chr20    56827368    56827443   hg18_0   0   +

History item **2** (for mm8)::

   chr2    173910832   173910893    mm8_0   0   +

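The filtering and naming behaviour shown in the two examples can be sketched in Python as follows. This is a minimal illustration, not the code of maf_to_bed.py: the function name blocks_to_bed and its input layout (one dict per block mapping a species to a (chrom, start, size, strand) tuple) are hypothetical, and only plus-strand components are handled.

```python
def blocks_to_bed(blocks, species_list, complete_blocks):
    """Hypothetical sketch: build one list of BED lines per requested species.

    Each block is assumed to be a dict mapping a species name to the
    (chrom, start, size, strand) of its "s" line; plus strand only.
    """
    outputs = {spec: [] for spec in species_list}  # one history item per species
    for block in blocks:
        missing = [spec for spec in species_list if spec not in block]
        if complete_blocks == "partial_disallowed" and missing:
            continue  # drop this block from every output, not only the missing species
        for spec in species_list:
            if spec not in block:
                continue  # partial_allowed: this species simply gets no line
            chrom, start, size, strand = block[spec]
            name = f"{spec}_{len(outputs[spec])}"  # hg18_0, hg18_1, ...
            outputs[spec].append(f"{chrom}\t{start}\t{start + size}\t{name}\t0\t{strand}")
    return outputs


# The two blocks of Example 2, reduced to the species used here.
example2 = [
    {"hg18": ("chr20", 56827368, 75, "+"), "mm8": ("chr2", 173910832, 61, "+")},
    {"hg18": ("chr20", 56827443, 37, "+")},
]
items = blocks_to_bed(example2, ["hg18", "mm8"], "partial_disallowed")
for spec, lines in items.items():
    print(spec, lines)
```

With complete_blocks set to partial_allowed and only hg18 requested, the same function returns the two lines of Example 1.
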
-------

.. class:: infomark

**About formats**

**MAF format** The multiple alignment format (MAF) stores multiple alignments at the DNA level between entire genomes.

 - The .maf format is line-oriented. Each multiple alignment ends with a blank line.
 - Each sequence in an alignment is on a single line.
 - Lines starting with # are comments.
 - Each multiple alignment is in a separate paragraph that begins with an "a" line and contains an "s" line for each sequence in the multiple alignment.
 - Some MAF files may contain two optional line types:

   - An "i" line containing information about what is in the aligned species' DNA before and after the immediately preceding "s" line;
   - An "e" line containing information about the size of the gap between the alignments that span the current block.

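The line-oriented rules above are simple enough that a reader for the "a" and "s" lines fits in a few lines of Python. The sketch below is a hypothetical helper (parse_maf is not part of this tool); it keeps the six fields of each "s" line (source, start, size, strand, source size, aligned text) and skips the optional "i" and "e" lines. Production code would normally rely on an established parser, such as the MAF reader in bx-python, instead.

```python
def parse_maf(text):
    """Hypothetical minimal MAF reader for "a" and "s" lines only.

    Returns a list of blocks; each block is a list of
    (src, start, size, strand, src_size, aligned_text) tuples.
    """
    blocks = []
    current = None  # block currently being filled, if any
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            current = None  # a blank line ends the current alignment block
            continue
        if fields[0].startswith("#"):
            continue  # comment or header line such as "##maf version=1"
        if fields[0] == "a":
            current = []  # an "a" line starts a new block
            blocks.append(current)
        elif fields[0] == "s" and current is not None:
            src, start, size, strand, src_size, seq = fields[1:7]
            current.append((src, int(start), int(size), strand, int(src_size), seq))
        # optional "i" and "e" lines (and any other line types) are ignored
    return blocks
```
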
**BED format** The Browser Extensible Data (BED) format was designed at UCSC for displaying data tracks in the Genome Browser. It has three required fields and several optional ones:

The first three BED fields (required) are::

    1. chrom - The name of the chromosome (e.g. chr1, chrY_random).
    2. chromStart - The starting position in the chromosome. (The first base in a chromosome is numbered 0.)
    3. chromEnd - The ending position in the chromosome, plus 1 (i.e., a half-open interval).

Additional (optional) fields are::

    4. name - The name of the BED line.
    5. score - A score between 0 and 1000.
    6. strand - Defines the strand, either '+' or '-'.

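Because MAF "s" line starts are zero-based like BED chromStart, the BED end of a plus-strand component is simply start + size (for example, 56827368 + 75 = 56827443 in Example 1). For "-" components, MAF counts the start from the end of the reverse-complemented source sequence. The sketch below assumes the tool maps such coordinates back to the forward strand, which has not been checked against maf_to_bed.py; the function name component_to_bed and the species.chrom source convention are also assumptions.

```python
def component_to_bed(src, start, size, strand, src_size, index):
    """Hypothetical sketch: convert one MAF "s" line into a 6-column BED line.

    Assumes src is written as species.chrom and that "-" components are
    reported in forward-strand coordinates.
    """
    species, chrom = src.split(".", 1)
    if strand == "-":
        # MAF counts "-" starts from the end of the reverse-complemented source.
        start = src_size - (start + size)
    end = start + size  # half-open: chromEnd is one past the last base
    return "\t".join([chrom, str(start), str(end), f"{species}_{index}", "0", strand])


print(component_to_bed("hg18.chr20", 56827368, 75, "+", 62435964, 0))
print(component_to_bed("rheMac2.chr10", 89144112, 69, "-", 94855758, 0))
```

For the rheMac2 line of Example 1 this gives chr10 5711577 5711646, i.e. the 69 aligned bases expressed from the start of the forward strand.
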
------

**Citation**

If you use this tool, please cite `Blankenberg D, Taylor J, Nekrutenko A; The Galaxy Team. Making whole genome multiple alignments usable for biologists. Bioinformatics. 2011 Sep 1;27(17):2426-2428. &lt;http://www.ncbi.nlm.nih.gov/pubmed/21775304&gt;`_


    </help>
    <code file="maf_to_bed_code.py"/>
</tool>