/tools/ncbi_blast_plus/blastxml_to_tabular.xml

https://bitbucket.org/cistrome/cistrome-harvard/

<tool id="blastxml_to_tabular" name="BLAST XML to tabular" version="0.0.8">
    <description>Convert BLAST XML output to tabular</description>
    <command interpreter="python">
      blastxml_to_tabular.py $blastxml_file $tabular_file $out_format
    </command>
    <inputs>
        <param name="blastxml_file" type="data" format="blastxml" label="BLAST results as XML"/>
        <param name="out_format" type="select" label="Output format">
            <option value="std" selected="True">Tabular (standard 12 columns)</option>
            <option value="ext">Tabular (extended 24 columns)</option>
        </param>
    </inputs>
    <outputs>
        <data name="tabular_file" format="tabular" label="BLAST results as tabular" />
    </outputs>
    <requirements>
    </requirements>
    <tests>
        <test>
            <param name="blastxml_file" value="blastp_four_human_vs_rhodopsin.xml" ftype="blastxml" />
            <param name="out_format" value="std" />
            <!-- Note this has some white space differences from the actual blastp output blast_four_human_vs_rhodopsin.tabular -->
            <output name="tabular_file" file="blastp_four_human_vs_rhodopsin_converted.tabular" ftype="tabular" />
        </test>
        <test>
            <param name="blastxml_file" value="blastp_four_human_vs_rhodopsin.xml" ftype="blastxml" />
            <param name="out_format" value="ext" />
            <!-- Note this has some white space differences from the actual blastp output blast_four_human_vs_rhodopsin_22c.tabular -->
            <output name="tabular_file" file="blastp_four_human_vs_rhodopsin_converted_ext.tabular" ftype="tabular" />
        </test>
        <test>
            <param name="blastxml_file" value="blastp_sample.xml" ftype="blastxml" />
            <param name="out_format" value="std" />
            <!-- Note this has some white space differences from the actual blastp output -->
            <output name="tabular_file" file="blastp_sample_converted.tabular" ftype="tabular" />
        </test>
        <test>
            <param name="blastxml_file" value="blastx_rhodopsin_vs_four_human.xml" ftype="blastxml" />
            <param name="out_format" value="std" />
            <!-- Note this has some white space differences from the actual blastx output -->
            <output name="tabular_file" file="blastx_rhodopsin_vs_four_human_converted.tabular" ftype="tabular" />
        </test>
        <test>
            <param name="blastxml_file" value="blastx_rhodopsin_vs_four_human.xml" ftype="blastxml" />
            <param name="out_format" value="ext" />
            <!-- Note this has some white space and XXXX masking differences from the actual blastx output -->
            <output name="tabular_file" file="blastx_rhodopsin_vs_four_human_converted_ext.tabular" ftype="tabular" />
        </test>
        <test>
            <param name="blastxml_file" value="blastx_sample.xml" ftype="blastxml" />
            <param name="out_format" value="std" />
            <!-- Note this has some white space differences from the actual blastx output -->
            <output name="tabular_file" file="blastx_sample_converted.tabular" ftype="tabular" />
        </test>
        <test>
            <param name="blastxml_file" value="blastp_human_vs_pdb_seg_no.xml" ftype="blastxml" />
            <param name="out_format" value="std" />
            <!-- Note this has some white space differences from the actual blastp output -->
            <output name="tabular_file" file="blastp_human_vs_pdb_seg_no_converted_std.tabular" ftype="tabular" />
        </test>
        <test>
            <param name="blastxml_file" value="blastp_human_vs_pdb_seg_no.xml" ftype="blastxml" />
            <param name="out_format" value="ext" />
            <!-- Note this has some white space differences from the actual blastp output -->
            <output name="tabular_file" file="blastp_human_vs_pdb_seg_no_converted_ext.tabular" ftype="tabular" />
        </test>
    </tests>
    <help>

**What it does**

NCBI BLAST+ (and the older NCBI 'legacy' BLAST) can output in a range of
formats including tabular and a more detailed XML format. A complex workflow
may need both the XML and the tabular output - but running BLAST twice is
slow and wasteful.

This tool takes the BLAST XML output and by default converts it into the
standard 12 column tabular equivalent:

====== ========= ============================================
Column NCBI name Description
------ --------- --------------------------------------------
     1 qseqid    Query Seq-id (ID of your sequence)
     2 sseqid    Subject Seq-id (ID of the database hit)
     3 pident    Percentage of identical matches
     4 length    Alignment length
     5 mismatch  Number of mismatches
     6 gapopen   Number of gap openings
     7 qstart    Start of alignment in query
     8 qend      End of alignment in query
     9 sstart    Start of alignment in subject (database hit)
    10 send      End of alignment in subject (database hit)
    11 evalue    Expectation value (E-value)
    12 bitscore  Bit score
====== ========= ============================================

The BLAST+ tools can optionally output additional columns of information,
but this takes longer to calculate. Most (but not all) of these columns are
included by selecting the extended tabular output. The extra columns are
included *after* the standard 12 columns. This is so that you can write
workflow filtering steps that accept either the 12 or 24 column tabular
BLAST output.

====== ============= ===========================================
Column NCBI name     Description
------ ------------- -------------------------------------------
    13 sallseqid     All subject Seq-id(s), separated by a ';'
    14 score         Raw score
    15 nident        Number of identical matches
    16 positive      Number of positive-scoring matches
    17 gaps          Total number of gaps
    18 ppos          Percentage of positive-scoring matches
    19 qframe        Query frame
    20 sframe        Subject frame
    21 qseq          Aligned part of query sequence
    22 sseq          Aligned part of subject sequence
    23 qlen          Query sequence length
    24 slen          Subject sequence length
====== ============= ===========================================

Beware that the XML file (and thus the conversion) and the tabular output
direct from BLAST+ may differ in the presence of XXXX masking on regions of
low complexity (columns 21 and 22), and thus also in calculated figures like
the percentage identity (column 3).

    </help>
</tool>
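
The help text notes that the extra columns come after the standard 12 so that a downstream filtering step can accept either layout. The following is a minimal sketch of such a step, not part of this tool or of `blastxml_to_tabular.py`. The function names `read_blast_tabular` and `filter_by_evalue` are hypothetical, the sample rows are made up for illustration, and the column names are taken from the two tables in the help text.

```python
import io

# Column names as listed in the help tables (columns 1-12, then 13-24).
STD_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
EXT_COLUMNS = STD_COLUMNS + [
    "sallseqid", "score", "nident", "positive", "gaps", "ppos",
    "qframe", "sframe", "qseq", "sseq", "qlen", "slen",
]


def read_blast_tabular(handle):
    """Yield one dict per hit from a 12 or 24 column tab-separated handle.

    Hypothetical helper: all values are left as strings.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) == len(STD_COLUMNS):
            names = STD_COLUMNS
        elif len(fields) == len(EXT_COLUMNS):
            names = EXT_COLUMNS
        else:
            raise ValueError("expected 12 or 24 columns, got %d" % len(fields))
        yield dict(zip(names, fields))


def filter_by_evalue(hits, max_evalue):
    """Keep hits whose E-value (column 11) is at most max_evalue."""
    return [hit for hit in hits if float(hit["evalue"]) <= max_evalue]


# Made-up rows purely for illustration; these are not real BLAST results.
example = (
    "query1\tsubjectA\t95.00\t100\t5\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "query1\tsubjectB\t40.00\t50\t30\t2\t1\t50\t10\t59\t0.5\t30.1\n"
)
hits = filter_by_evalue(read_blast_tabular(io.StringIO(example)), 1e-5)
print([hit["sseqid"] for hit in hits])
```

Because the filter only looks at `evalue`, which sits in the first 12 columns, the same code works unchanged on the standard and the extended output.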