<tool id="compute_motifs_frequency" name="Compute Motif Frequencies" version="1.0.0">
  <description>in indel flanking regions</description>

  <command interpreter="perl">
    compute_motifs_frequency.pl $inputFile1 $inputFile2 $inputNumber3 $outputFile1 $outputFile2
  </command>

  <inputs>
    <param format="tabular" name="inputFile1" type="data" label="Select motifs file"/>
    <param format="tabular" name="inputFile2" type="data" label="Select indel flanking regions file from your history"/>
    <param type="integer" name="inputNumber3" size="5" value="0" label="What is the size of each window?" help="0 = the entire upstream flanking sequence is one window, and likewise the entire downstream flanking sequence."/>
  </inputs>

  <outputs>
    <data format="tabular" name="outputFile1"/>
    <data format="tabular" name="outputFile2"/>
  </outputs>

  <tests>
    <test>
      <param name="inputFile1" value="motifs1.tabular" />
      <param name="inputFile2" value="indelsFlankingSequences1.tabular" />
      <param name="inputNumber3" value="0" />
      <output name="outputFile1" file="flankingSequencesWindows0.tabular" />
      <output name="outputFile2" file="motifFrequencies0.tabular" />
    </test>
    <test>
      <param name="inputFile1" value="motifs1.tabular" />
      <param name="inputFile2" value="indelsFlankingSequences1.tabular" />
      <param name="inputNumber3" value="10" />
      <output name="outputFile1" file="flankingSequencesWindows10.tabular" />
      <output name="outputFile2" file="motifFrequencies10.tabular" />
    </test>
  </tests>

  <help>

.. class:: infomark

**What it does**

This program computes the frequency of motifs in the flanking regions of indels found in a chromosome or a genome.
Each indel has an upstream flanking sequence and a downstream flanking sequence, and each of these is divided into
windows according to the window size you enter. The frequency of a motif in a given window of one of the two flanking
sequences is the total number of occurrences of that motif in that window, summed over all indels. The indel flanking
regions file can be taken from your history or uploaded, whereas the motifs file must be uploaded.

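As a rough illustration of the frequency definition above, the Python sketch below sums motif occurrences window by window over all indels. It is not the tool's Perl script (compute_motifs_frequency.pl); the function names are hypothetical, each indel is assumed to already be split into the same number of windows, and two matching behaviours are assumptions: case is ignored (the sample flanking sequences mix upper- and lowercase bases) and overlapping occurrences are counted.

```python
import re
from typing import Dict, List


def count_occurrences(seq: str, motif: str) -> int:
    """Count overlapping, case-insensitive occurrences (both are assumptions)."""
    # A zero-width lookahead lets matches overlap, e.g. "AA" occurs twice in "AAA".
    pattern = "(?=" + re.escape(motif.upper()) + ")"
    return len(re.findall(pattern, seq.upper()))


def motif_frequencies(motifs: Dict[str, str],
                      windows_per_indel: List[List[str]]) -> Dict[str, List[int]]:
    """For each motif, sum its occurrences in window i over all indels."""
    n_windows = len(windows_per_indel[0])
    freqs = {}
    for name, motif in motifs.items():
        totals = [0] * n_windows
        for windows in windows_per_indel:
            for i, window in enumerate(windows):
                totals[i] += count_occurrences(window, motif)
        freqs[name] = totals
    return freqs


# Two hypothetical indels, each already split into two windows.
example = motif_frequencies(
    {"dnaPolPauseFrameshift1": "GAG", "xSites1": "CCG"},
    [["GAGGAG", "ccgaag"], ["AGAGTT", "CCGCCG"]],
)
print(example)  # {'dnaPolPauseFrameshift1': [3, 0], 'xSites1': [0, 3]}
```

Counting non-overlapping matches instead (for example with str.count) would give different totals for self-overlapping motifs, so that choice matters when comparing against the tool's real output.
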
- The first input file is the motifs file, a tabular file with two columns:

  - the first column is the motif name
  - the second column is the motif sequence, for example::

	dnaPolPauseFrameshift1	GAG
	dnaPolPauseFrameshift2	ACG
	xSites1			CCG

- The second input file is the indel flanking regions file, a tabular file with five columns:

  - the first column is the indel start coordinate
  - the second column is the indel end coordinate
  - the third column is the indel length
  - the fourth column is the upstream flanking sequence
  - the fifth column is the downstream flanking sequence, for example::

	16694766   16694768   3   GTGGGTCCTGCCCAGCCTCTGCCTCAGAGGGAAGAGTAGAGAACTGGG   AGAGCAGGTCCTTAGGGAGCCCGAGGAAGTCCCTGACGCCAGCTGTTCTCGCGGACGAA
	25169542   25169545   4   caagcccacaagccttcagaccatagcaCGGGCTCCAGAGGTGTGAGG   CAGGTCAGGTGCTTTAGAAGTCAAAAACTCTCAGTAAGGCAAATCACCCCCTATCTCCT
	41929580   41929585   6   ggctgtcgtatggaatctggggctcaggactctgtcccatttctctaa   accattctgcTTCAACCCAGACACTGACTGTTTTCCAAATTTACTTGTTTGTTTGTTTT

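The two input layouts above can be read with a few lines of Python. This is a minimal sketch, not part of the tool: the names IndelFlanks, read_motifs and read_indels are hypothetical, and fields are assumed to be separated by any run of whitespace, since the sample motif line for xSites1 uses several tabs.

```python
import io
from typing import Dict, List, NamedTuple, TextIO


class IndelFlanks(NamedTuple):
    """One row of the indel flanking regions file (hypothetical container)."""
    start: int
    end: int
    length: int
    upstream: str
    downstream: str


def read_motifs(handle: TextIO) -> Dict[str, str]:
    """Map motif name to motif sequence from the two-column motifs file."""
    motifs = {}
    for line in handle:
        fields = line.split()  # splits on any run of tabs or spaces
        if not fields:
            continue  # skip blank lines
        name, sequence = fields[0], fields[1]
        motifs[name] = sequence
    return motifs


def read_indels(handle: TextIO) -> List[IndelFlanks]:
    """Parse the five-column file; a row with another field count raises ValueError."""
    indels = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        start, end, length, upstream, downstream = fields
        indels.append(IndelFlanks(int(start), int(end), int(length), upstream, downstream))
    return indels


# Example built from the first sample rows above (flanks shortened).
motifs = read_motifs(io.StringIO("dnaPolPauseFrameshift1\tGAG\nxSites1\t\t\tCCG\n"))
indels = read_indels(io.StringIO("16694766\t16694768\t3\tGTGGGTCC\tAGAGCAGG\n"))
print(motifs)
print(indels[0])
```
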
-----

.. class:: warningmark

**Notes**

- The upstream flanking sequences must all have the same length.
- The downstream flanking sequences must all have the same length.
- If the length L of the upstream flanking sequence is not an integer multiple of the window size S, that is, if L = m × S + r, where m is the integer quotient and r is a nonzero remainder, then the upstream flanking sequence is divided into only m windows, counted from the indel outward, and the remaining r bases at the far end of the sequence are ignored. The same rule applies to the downstream flanking sequence.

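The window rule in the last note can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes both flanks are written 5' to 3' on the same strand, so the upstream flank ends at the indel and the downstream flank starts at it; windows are returned in sequence order.

```python
from typing import List


def upstream_windows(seq: str, size: int) -> List[str]:
    """Split an upstream flank into windows counted back from the indel.

    The indel is assumed to sit at the 3' end of seq. With size 0 the whole
    flank is one window; otherwise the len(seq) % size leftover bases at the
    far (5') end are dropped.
    """
    if size == 0:
        return [seq]
    m, r = divmod(len(seq), size)
    # Skip the r leftover bases at the start, farthest from the indel.
    return [seq[r + i * size : r + (i + 1) * size] for i in range(m)]


def downstream_windows(seq: str, size: int) -> List[str]:
    """Split a downstream flank into windows counted forward from the indel.

    The indel is assumed to sit at the 5' end of seq, so leftover bases at
    the far (3') end are dropped.
    """
    if size == 0:
        return [seq]
    m = len(seq) // size
    return [seq[i * size : (i + 1) * size] for i in range(m)]


# L = 8 and S = 3 give m = 2 and r = 2: "AA" is dropped upstream.
print(upstream_windows("AACCCGGG", 3))   # ['CCC', 'GGG']
# L = 7 and S = 3 give m = 2 and r = 1: "C" is dropped downstream.
print(downstream_windows("TTTAAAC", 3))  # ['TTT', 'AAA']
```

A window size of 0 returns each flank as a single window, matching the help text of the window-size parameter.
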
-----

The **output** of this program consists of two files:

- The first output file is a tabular file containing the windows of both the upstream and downstream flanking sequences. It consists of several columns on the left holding the windows of the upstream flanking sequence, followed by one column marking the indel, followed by several columns on the right holding the windows of the downstream flanking sequence, for example::

	cgaggtcagg	agatcgagac	catcctggct	aacatggtga	aatcccgtct	ctactaaaaa	indel	aaatttatat	ttataaacaa	ttttaataca	cctatgttta	ttatacattt
	GCCAGTTTAT	GGTCTAACAA	GGAGAGAAAC	AGGGGGCTGA	AGGGGTTTCT	TAACCTCCAG	indel	TTCCGGGCTC	TGTCCCTAAC	CCCCAGCTAG	GTAAGTGGCA	AAGCACTTCT
	CAGTGGGACC	AAGCACTGAA	CCACTTTGGG	GAGAATCTCA	CACTGGGGCC	CTCTGACACC	indel	tatatatttt	tttttttttt	tttttttttt	tttttttttg	agatggtgtc
	AGAGCAGCAG	CACCCACTTT	TGCAGTGTGT	GACGTTGGTG	GAGCCATCGA	AGTCTGTGCT	indel	GAGCCCTCCC	CAGTGCTCCG	AGGAGCTGCT	GTTCCCCCTG	GAGCTCAGAA

- The second output file is a tabular file containing the motif frequencies in every window of every flanking sequence. The first column holds the motif names; the other columns hold the frequencies of each motif in the windows that correspond to those in the first output file, for example::

	dnaPolPauseFrameshift1	2	3	1	0	1	2	indel	0	2	2	1	3
	dnaPolPauseFrameshift2	2	3	1	0	1	2	indel	0	2	2	1	3
	xSites1			3	2	0	1	1	2	indel	1	1	3	2	3

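Both output layouts place the word indel between the upstream and downstream columns. A minimal Python sketch of how one row of each file could be assembled from window strings and per-window counts; the function names are hypothetical and single tab separators are assumed.

```python
from typing import List


def window_row(upstream: List[str], downstream: List[str]) -> str:
    """One line of the first output: upstream windows, 'indel', downstream windows."""
    return "\t".join(upstream + ["indel"] + downstream)


def frequency_row(motif_name: str, upstream_counts: List[int],
                  downstream_counts: List[int]) -> str:
    """One line of the second output: motif name, upstream counts, 'indel', downstream counts."""
    fields = [motif_name] + [str(c) for c in upstream_counts]
    fields += ["indel"] + [str(c) for c in downstream_counts]
    return "\t".join(fields)


print(window_row(["CCC", "GGG"], ["TTT", "AAA"]))
print(frequency_row("xSites1", [3, 2], [1, 1]))
```
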
  </help>
</tool>