/tools/fastx_toolkit/fastx_collapser.xml

https://bitbucket.org/cistrome/cistrome-harvard/
<tool id="cshl_fastx_collapser" name="Collapse">
	<description>sequences</description>
	<requirements><requirement type="package">fastx_toolkit</requirement></requirements>
	<command>zcat -f '$input' | fastx_collapser -v -o '$output' 
#if $input.ext == "fastqsanger":
-Q 33
#end if
	</command>

	<inputs>
		<param format="fasta,fastqsanger,fastqsolexa" name="input" type="data" label="Library to collapse" />
	</inputs>

    <!-- The order of sequences in the test output differs between 32-bit and 64-bit machines, so the tests are disabled.
	<tests>
		<test>
			<param name="input" value="fasta_collapser1.fasta" />
			<output name="output" file="fasta_collapser1.out" />
		</test>
	</tests>
    -->
	<outputs>
		<data format="fasta" name="output" metadata_source="input" />
	</outputs>
  <help>

**What it does**

This tool collapses identical sequences in a FASTA or FASTQ file into a single sequence. The output is always a FASTA file.

--------

**Example**

Example input file (the sequence "ATAT" appears multiple times)::

    >CSHL_2_FC0042AGLLOO_1_1_605_414
    TGCG
    >CSHL_2_FC0042AGLLOO_1_1_537_759
    ATAT
    >CSHL_2_FC0042AGLLOO_1_1_774_520
    TGGC
    >CSHL_2_FC0042AGLLOO_1_1_742_502
    ATAT
    >CSHL_2_FC0042AGLLOO_1_1_781_514
    TGAG
    >CSHL_2_FC0042AGLLOO_1_1_757_487
    TTCA
    >CSHL_2_FC0042AGLLOO_1_1_903_769
    ATAT
    >CSHL_2_FC0042AGLLOO_1_1_724_499
    ATAT

Example output file::

    >1-1
    TGCG
    >2-4
    ATAT
    >3-1
    TGGC
    >4-1
    TGAG
    >5-1
    TTCA

.. class:: infomark

Original sequence names / lane descriptions (e.g. "CSHL_2_FC0042AGLLOO_1_1_742_502") are discarded.

The output sequence name is composed of two numbers: the first is the sequence's number, the second is the multiplicity value.

The following output::

    >2-4
    ATAT

means that the sequence "ATAT" is the second sequence in the output file, and that it appeared 4 times in the input file.


------

This tool is based on `FASTX-toolkit`__ by Assaf Gordon.

 .. __: http://hannonlab.cshl.edu/fastx_toolkit/

</help>
</tool>
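
The collapsing behaviour described in the help text can be sketched in a few lines of Python. This is only an illustration, not the implementation used by `fastx_collapser`: the function names (`read_fasta`, `collapse`, `format_collapsed`) are hypothetical, the sketch handles FASTA input only, and numbering sequences in order of first appearance is an assumption chosen to reproduce the example in the help text. The disabled test comment in the tool file notes that output order can differ between machines, so the real tool's ordering should not be inferred from this sketch.

```python
# Illustrative sketch only: NOT the fastx_collapser implementation.
# All function names here are hypothetical.

EXAMPLE_INPUT = """\
>CSHL_2_FC0042AGLLOO_1_1_605_414
TGCG
>CSHL_2_FC0042AGLLOO_1_1_537_759
ATAT
>CSHL_2_FC0042AGLLOO_1_1_774_520
TGGC
>CSHL_2_FC0042AGLLOO_1_1_742_502
ATAT
>CSHL_2_FC0042AGLLOO_1_1_781_514
TGAG
>CSHL_2_FC0042AGLLOO_1_1_757_487
TTCA
>CSHL_2_FC0042AGLLOO_1_1_903_769
ATAT
>CSHL_2_FC0042AGLLOO_1_1_724_499
ATAT
"""


def read_fasta(lines):
    """Yield each sequence from FASTA lines; header names are discarded."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if parts:
                yield "".join(parts)
            parts = []
        else:
            parts.append(line)
    if parts:
        yield "".join(parts)


def collapse(sequences):
    """Count identical sequences, keeping first-appearance order (assumption)."""
    counts = {}  # dicts preserve insertion order in Python 3.7+
    for seq in sequences:
        counts[seq] = counts.get(seq, 0) + 1
    return list(counts.items())


def format_collapsed(collapsed):
    """Render (sequence, count) pairs as FASTA with 'number-multiplicity' names."""
    out = []
    for number, (seq, count) in enumerate(collapsed, start=1):
        out.append(f">{number}-{count}")
        out.append(seq)
    return "\n".join(out)


if __name__ == "__main__":
    collapsed = collapse(read_fasta(EXAMPLE_INPUT.splitlines()))
    print(format_collapsed(collapsed))
```

Run on the example input, the sketch names "ATAT" as `>2-4`: it is the second distinct sequence encountered and it occurs four times.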