tabular_to_fastq.py - This Python script takes a tab-delimi…

/tools/fastq/tabular_to_fastq.py

https://bitbucket.org/cistrome/cistrome-harvard/ · Python · 29 lines · 24 code · 4 blank · 1 comment · 6 complexity · 430e3005b333784fa0df5345883bac75 MD5


#Dan Blankenberg
import sys

def main():
    input_filename = sys.argv[1]
    output_filename = sys.argv[2]
    identifier_col = int( sys.argv[3] ) - 1
    sequence_col = int( sys.argv[4] ) - 1
    quality_col = int( sys.argv[5] ) - 1
    
    max_col = max( identifier_col, sequence_col, quality_col )
    num_reads = None
    fastq_read = None
    skipped_lines = 0
    out = open( output_filename, 'wb' )
    for num_reads, line in enumerate( open( input_filename ) ):
        fields = line.rstrip( '\n\r' ).split( '\t' )
        if len( fields ) > max_col:
            out.write( "@%s\n%s\n+\n%s\n" % ( fields[identifier_col], fields[sequence_col], fields[quality_col] ) )
        else:
            skipped_lines += 1
    
    out.close()
    if num_reads is None:
        print "Input was empty."
    else:
        print "%i tabular lines were written as FASTQ reads. Be sure to use the FASTQ Groomer tool on this output before further analysis." % ( num_reads + 1 - skipped_lines )
    
if __name__ == "__main__": main()

Summary ✨

This Python script converts a tab-delimited file into FASTQ, writing one read per input row to a new file. It takes five command-line arguments: the input filename, the output filename, and the 1-based column numbers of the identifier, sequence, and quality fields. A row is skipped only when it has too few tab-separated fields to reach the highest requested column; fields that are present but empty are written as-is. The script counts the skipped rows and, at the end, either reports that the input was empty or prints how many lines were written as FASTQ reads, with a reminder to run the FASTQ Groomer tool on the output.
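
The per-row conversion in the script can be illustrated in isolation. The following is a minimal Python 3 sketch, not the original code (which is Python 2); the helper name `row_to_fastq` is hypothetical and introduced only for illustration. It takes 1-based column numbers, as the script does on its command line, and returns `None` where the script would count a skipped line.

```python
def row_to_fastq(line, identifier_col, sequence_col, quality_col):
    """Return a four-line FASTQ record for one tabular line, or None if it is too short.

    Hypothetical helper for illustration; column arguments are 1-based.
    """
    # Convert the 1-based column numbers to 0-based list indices.
    cols = [identifier_col - 1, sequence_col - 1, quality_col - 1]
    fields = line.rstrip("\n\r").split("\t")
    if len(fields) <= max(cols):
        # Too few fields to reach the highest requested column: skip this line.
        return None
    ident, seq, qual = (fields[c] for c in cols)
    return "@%s\n%s\n+\n%s\n" % (ident, seq, qual)


if __name__ == "__main__":
    # A complete row becomes one FASTQ record.
    print(row_to_fastq("read1\tACGT\tIIII\n", 1, 2, 3), end="")
    # A row with only two fields cannot supply column 3 and is skipped.
    print(row_to_fastq("read2\tACGT\n", 1, 2, 3))
```

Note that only the number of fields is checked. A row such as `r<TAB><TAB>` has three fields, two of them empty, and still yields a record with an empty sequence and quality line.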

Tech Fingerprint

Standard Library: OS Interaction

Alerts (3)

'def': Ensure functions have docstrings for documentation (script line 4)
'open(': Use 'with open()' to ensure files are properly closed (script lines 15, 16)
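
One way the alerts above could be addressed is sketched below in Python 3. This is an assumption about how a fix might look, not a patch from the repository: the function name `tabular_to_fastq`, its parameters, and its return value are hypothetical, and the column arguments here are already 0-based. It adds a docstring and opens both files in a single `with` statement so they are closed even if an error occurs partway through.

```python
def tabular_to_fastq(input_path, output_path, identifier_col, sequence_col, quality_col):
    """Write one FASTQ record per sufficiently long tabular line.

    Hypothetical sketch: column numbers are 0-based.
    Returns a (written, skipped) tuple of line counts.
    """
    max_col = max(identifier_col, sequence_col, quality_col)
    written = skipped = 0
    # Both files are closed automatically when the with-block exits.
    # newline="\n" keeps LF line endings in the output on every platform.
    with open(input_path) as src, open(output_path, "w", newline="\n") as out:
        for line in src:
            fields = line.rstrip("\n\r").split("\t")
            if len(fields) > max_col:
                out.write("@%s\n%s\n+\n%s\n" % (
                    fields[identifier_col],
                    fields[sequence_col],
                    fields[quality_col],
                ))
                written += 1
            else:
                skipped += 1
    return written, skipped
```

Returning the counts instead of printing them leaves the choice of message to the caller; an empty input simply yields `(0, 0)`.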