/tools/regVariation/delete_overlapping_indels.xml

https://bitbucket.org/cistrome/cistrome-harvard/
<tool id="delete_overlapping_indels" name="Delete Overlapping Indels" version="1.0.0">
  <description>from a chromosome indels file</description>

  <command interpreter="perl">
    delete_overlapping_indels.pl $inputFile1 $inputIndelStartColumnNumber2 $inputIndelEndColumnNumber3 $outputFile1
  </command>

  <inputs>
    <param format="tabular" name="inputFile1" type="data" label="Select indels file"/>
    <param type="data_column" name="inputIndelStartColumnNumber2" data_ref="inputFile1" accept_default="true" label="Choose the indel start coordinate column number" />
    <param type="data_column" name="inputIndelEndColumnNumber3" data_ref="inputFile1" accept_default="true" label="Choose the indel end coordinate column number" />
  </inputs>

  <outputs>
    <data format="tabular" name="outputFile1"/>
  </outputs>

  <tests>
    <test>
      <param name="inputFile1" value="indels1.tabular" />
      <param name="inputIndelStartColumnNumber2" value="5" />
      <param name="inputIndelEndColumnNumber3" value="6" />
      <output name="outputFile1" file="non_overlapping_indels1.tabular" />
    </test>
  </tests>

  <help>

.. class:: infomark

**What it does**

This program detects overlapping indels in a chromosome and keeps all non-overlapping indels. Among overlapping indels, the first one encountered is kept and all others are removed.
It requires three inputs and produces one output:

- The first input is a TABULAR format file containing coordinates of indels in blocks extracted from multi-alignment.
- The second input is the number of the column that holds the indel start coordinates in the input file.
- The third input is the number of the column that holds the indel end coordinates in the input file.
- The output is a TABULAR format file containing all non-overlapping indels in the input file, plus the first encountered indel of each set of overlapping ones.

Note: Column numbering starts at 1.


**Example**

Suppose we have the following insertions in the human genome, with the start and end coordinates of each insertion in columns 5 and 6, respectively::

	3	hg18.chr22_insert	3	hg18.chr22	14508610	14508612	3924	-	panTro2.chr2b	132518950	132518951	3910	+	rheMac2.chr17	14311798	14311799	3896	+
	7	hg18.chr22_insert	13	hg18.chr22	14513678	14513690	348	-	panTro2.chr2b	132517876	132517877	321	+	rheMac2.chr17	14274462	14274463	337	+
	7	hg18.chr22_insert	6	hg18.chr22	14513688	14513699	348	-	panTro2.chr2b	132517879	132517880	321	+	rheMac2.chr17	14274465	14274466	337	+
	25	hg18.chr22_insert	9	hg18.chr22	14529501	14529509	385	-	panTro2.chr22	14528775	14528776	376	-	rheMac2.chr9	42869449	42869450	375	-
	36	hg18.chr22_insert	4	hg18.chr22	14566316	14566319	540	-	panTro2.chr2b	132492077	132492078	533	+	rheMac2.chr10	59230438	59230439	533	-
	40	hg18.chr22_insert	7	hg18.chr22	14508610	14508616	2337	-	panTro2.chr2b	132487750	132487751	2313	+	rheMac2.chr10	59128305	59128306	2332	+
	41	hg18.chr22_insert	4	hg18.chr22	14571556	14571559	2483	-	panTro2.chr2b	132485878	132485879	2481	+	rheMac2.chr10	59126094	59126095	2508	+

After removing the overlapping indels (the third line, which overlaps the second, and the sixth line, which overlaps the first), we get::

	3	hg18.chr22_insert	3	hg18.chr22	14508610	14508612	3924	-	panTro2.chr2b	132518950	132518951	3910	+	rheMac2.chr17	14311798	14311799	3896	+
	7	hg18.chr22_insert	13	hg18.chr22	14513678	14513690	348	-	panTro2.chr2b	132517876	132517877	321	+	rheMac2.chr17	14274462	14274463	337	+
	25	hg18.chr22_insert	9	hg18.chr22	14529501	14529509	385	-	panTro2.chr22	14528775	14528776	376	-	rheMac2.chr9	42869449	42869450	375	-
	36	hg18.chr22_insert	4	hg18.chr22	14566316	14566319	540	-	panTro2.chr2b	132492077	132492078	533	+	rheMac2.chr10	59230438	59230439	533	-
	41	hg18.chr22_insert	4	hg18.chr22	14571556	14571559	2483	-	panTro2.chr2b	132485878	132485879	2481	+	rheMac2.chr10	59126094	59126095	2508	+

  </help>

</tool>
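<!--
The filtering rule described in the help text (keep the first indel encountered, drop any later indel that overlaps one already kept) can be sketched in Python. This is only an illustrative sketch, not the logic of delete_overlapping_indels.pl, whose source is not shown here. The function name, the inclusive treatment of start and end coordinates, and the choice to compare each row against every previously kept row are assumptions; the last one is at least consistent with the help example, where a row is removed for overlapping a non-adjacent earlier row.

```python
def delete_overlapping_indels(rows, start_col, end_col):
    """Return the rows whose [start, end] interval overlaps no earlier kept row.

    rows: iterable of lists of strings (the tab-separated fields of each line).
    start_col, end_col: 1-based column numbers, as in the tool's help text.
    All rows are assumed to lie on the same chromosome.
    """
    kept_rows = []
    kept_intervals = []
    for fields in rows:
        start = int(fields[start_col - 1])
        end = int(fields[end_col - 1])
        # Assumption: coordinates are inclusive, so intervals that share
        # a single position are treated as overlapping.
        overlaps = any(
            start <= kept_end and kept_start <= end
            for kept_start, kept_end in kept_intervals
        )
        if not overlaps:
            kept_rows.append(fields)
            kept_intervals.append((start, end))
    return kept_rows


if __name__ == "__main__":
    # The help example, truncated to its first six columns.
    example = [
        ["3", "hg18.chr22_insert", "3", "hg18.chr22", "14508610", "14508612"],
        ["7", "hg18.chr22_insert", "13", "hg18.chr22", "14513678", "14513690"],
        ["7", "hg18.chr22_insert", "6", "hg18.chr22", "14513688", "14513699"],
        ["25", "hg18.chr22_insert", "9", "hg18.chr22", "14529501", "14529509"],
        ["36", "hg18.chr22_insert", "4", "hg18.chr22", "14566316", "14566319"],
        ["40", "hg18.chr22_insert", "7", "hg18.chr22", "14508610", "14508616"],
        ["41", "hg18.chr22_insert", "4", "hg18.chr22", "14571556", "14571559"],
    ]
    for row in delete_overlapping_indels(example, 5, 6):
        print("\t".join(row))
```

The linear scan over kept intervals is quadratic in the worst case; it is used here because the input is not assumed to be sorted by coordinate.
-->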