/tools/expression/expPkg.xml
https://bitbucket.org/cistrome/cistrome-harvard/
- <tool name="Expression CEL file packager" id="expPkg">
- <description>downloads .CEL files from GEO for the given GSM IDs and prepares a cel.zip file for expression analysis</description>
- <command interpreter="command">/bin/bash $shscript</command>
- <inputs>
- <param name="group0name" type="text" label="Control group name"/>
- <repeat name="group0gsmid" title="GSM ID of control group">
- <param name="gsmid" type="text" label="GEO GSM ID"/>
- </repeat>
- <param name="group1name" type="text" label="Sample group name"/>
- <repeat name="group1gsmid" title="GSM ID of sample group">
- <param name="gsmid" type="text" label="GEO GSM ID"/>
- </repeat>
- </inputs>
- <outputs>
- <data format="cel.zip" name="output" label="Expression CEL file zip" />
- </outputs>
- <configfiles>
- <configfile name="shscript">
- #!/bin/bash
- #set $tmp0 = ""
- #for $g0gsmid in $group0gsmid
- #set $tmp0 = $tmp0 + " 0:"+str($group0name).replace(" ","_").replace("\t","_").replace(":","_")+":GSM"+str($g0gsmid.gsmid).strip().upper().lstrip("GSM")+" "
- #end for
- #set $tmp1 = ""
- #for $g1gsmid in $group1gsmid
- #set $tmp1 = $tmp1 + " 1:"+str($group1name).replace(" ","_").replace("\t","_").replace(":","_")+":GSM"+str($g1gsmid.gsmid).strip().upper().lstrip("GSM")+" "
- #end for
- #set $dollar = chr(36)
- #set $gt = chr(62)
- #set $lt = chr(60)
- #set $ad = chr(38)
- expressPkgr.py $tmp0 $tmp1
- mv package.zip $output
- </configfile>
- </configfiles>
- <help>
- This expression CEL file packager prepares a cel.zip file for the Cistrome expression array tools. It is designed only to fetch Affymetrix expression array CEL files from the NCBI GEO database. The script was written by Len Taing.
- The script assigns CEL files to two groups: a sample group and a control group. Typically, the control group is the wild type, and the sample group contains samples in which certain genes have been knocked down or knocked out. When the gene expression index results from the package generated by this script are used in differential expression analysis, the difference represents the sample group versus the control group, so a fold change of 1.5 means the gene is more highly expressed in the sample group.
- Possible causes of errors:
- 1. The GSM ID cannot be found on the GEO site.
- 2. If you fetch many CEL files, or many users are downloading CEL files from the Cistrome server at the same time, the connection to the GEO site may fail.
- </help>
- </tool>
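The Cheetah loops in `shscript` turn each GSM ID into an `index:group_name:GSMid` argument for `expressPkgr.py`. The following is a rough sketch of that logic in plain Python, not the tool's own code: the helper names `normalize_gsm`, `sanitize_group_name`, and `build_args` are hypothetical, and the accession numbers are made up. Unlike the template, the sketch trims surrounding whitespace and removes `GSM` as a literal prefix, because `lstrip("GSM")` strips any leading run of the characters G, S, and M rather than the exact prefix.

```python
def normalize_gsm(raw):
    """Return a GEO sample accession of the form 'GSM<rest>'."""
    text = str(raw).strip().upper()
    # Remove an explicit "GSM" prefix instead of calling lstrip("GSM"),
    # which would strip any leading mix of the characters G, S, and M.
    if text.startswith("GSM"):
        text = text[len("GSM"):]
    return "GSM" + text


def sanitize_group_name(name):
    """Replace spaces, tabs, and colons so the name stays one shell token."""
    return str(name).replace(" ", "_").replace("\t", "_").replace(":", "_")


def build_args(group_index, group_name, gsm_ids):
    """Build the 'index:name:GSMid' tokens for one group (0 = control, 1 = sample)."""
    name = sanitize_group_name(group_name)
    return ["%d:%s:%s" % (group_index, name, normalize_gsm(g)) for g in gsm_ids]


# Hypothetical input: two control samples and one treated sample.
control = build_args(0, "wild type", ["gsm1001", "GSM1002"])
sample = build_args(1, "knock:down", ["1003"])
print(" ".join(control + sample))
```

The printed string corresponds to what the template passes as `$tmp0 $tmp1`, apart from the extra padding spaces the template inserts around each token.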