
:mod:`csv` --- CSV File Reading and Writing
===========================================

.. module:: csv
   :synopsis: Write and read tabular data to and from delimited files.
.. sectionauthor:: Skip Montanaro <skip@pobox.com>


.. versionadded:: 2.3

.. index::
   single: csv
   pair: data; tabular

The so-called CSV (Comma Separated Values) format is the most common import and
export format for spreadsheets and databases.  There is no "CSV standard", so
the format is operationally defined by the many applications which read and
write it.  The lack of a standard means that subtle differences often exist in
the data produced and consumed by different applications.  These differences can
make it annoying to process CSV files from multiple sources.  Still, while the
delimiters and quoting characters vary, the overall format is similar enough
that it is possible to write a single module which can efficiently manipulate
such data, hiding the details of reading and writing the data from the
programmer.

The :mod:`csv` module implements classes to read and write tabular data in CSV
format.  It allows programmers to say, "write this data in the format preferred
by Excel," or "read data from this file which was generated by Excel," without
knowing the precise details of the CSV format used by Excel.  Programmers can
also describe the CSV formats understood by other applications or define their
own special-purpose CSV formats.

The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
write sequences.  Programmers can also read and write data in dictionary form
using the :class:`DictReader` and :class:`DictWriter` classes.

.. note::

   This version of the :mod:`csv` module doesn't support Unicode input.  Also,
   there are currently some issues regarding ASCII NUL characters.  Accordingly,
   all input should be UTF-8 or printable ASCII to be safe; see the examples in
   section :ref:`csv-examples`. These restrictions will be removed in the future.


.. seealso::

   :pep:`305` - CSV File API
      The Python Enhancement Proposal which proposed this addition to Python.

.. _csv-contents:

Module Contents
---------------

The :mod:`csv` module defines the following functions:


.. function:: reader(csvfile[, dialect='excel'][, fmtparam])

   Return a reader object which will iterate over lines in the given *csvfile*.
   *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
   string each time its :meth:`next` method is called --- file objects and list
   objects are both suitable.   If *csvfile* is a file object, it must be opened
   with the 'b' flag on platforms where that makes a difference.  An optional
   *dialect* parameter can be given which is used to define a set of parameters
   specific to a particular CSV dialect.  It may be an instance of a subclass of
   the :class:`Dialect` class or one of the strings returned by the
   :func:`list_dialects` function.  The other optional *fmtparam* keyword arguments
   can be given to override individual formatting parameters in the current
   dialect.  For full details about the dialect and formatting parameters, see
   section :ref:`csv-fmt-params`.

   All data read are returned as strings.  No automatic data type conversion is
   performed.

   A short usage example::

      >>> import csv
      >>> spamReader = csv.reader(open('eggs.csv', 'rb'), delimiter=' ', quotechar='|')
      >>> for row in spamReader:
      ...     print ', '.join(row)
      Spam, Spam, Spam, Spam, Spam, Baked Beans
      Spam, Lovely Spam, Wonderful Spam

   .. versionchanged:: 2.5
      The parser is now stricter with respect to multi-line quoted fields. Previously,
      if a line ended within a quoted field without a terminating newline character, a
      newline would be inserted into the returned field. This behavior caused problems
      when reading files which contained carriage return characters within fields.
      The behavior was changed to return the field without inserting newlines. As a
      consequence, if newlines embedded within fields are important, the input should
      be split into lines in a manner which preserves the newline characters.

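As a minimal sketch of the :func:`reader` description above: because any
iterable of strings is accepted, a list of lines can stand in for a file, and
*fmtparam* keyword arguments override the dialect.  The sample data is invented
for illustration, and the snippet is written so that it should run unchanged on
Python 2.6+ and Python 3.

```python
import csv

# Hypothetical sample data; a list of strings stands in for a file object.
lines = ['1 |Baked Beans| 2.50', '2 Spam 1.25']

# delimiter and quotechar override the defaults of the 'excel' dialect.
rows = list(csv.reader(lines, delimiter=' ', quotechar='|'))

# Every field comes back as a string; convert explicitly where needed.
prices = [float(row[2]) for row in rows]
```

The explicit ``float()`` call is the point of the last line: the reader itself
performs no type conversion.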

.. function:: writer(csvfile[, dialect='excel'][, fmtparam])

   Return a writer object responsible for converting the user's data into delimited
   strings on the given file-like object.  *csvfile* can be any object with a
   :meth:`write` method.  If *csvfile* is a file object, it must be opened with the
   'b' flag on platforms where that makes a difference.  An optional *dialect*
   parameter can be given which is used to define a set of parameters specific to a
   particular CSV dialect.  It may be an instance of a subclass of the
   :class:`Dialect` class or one of the strings returned by the
   :func:`list_dialects` function.  The other optional *fmtparam* keyword arguments
   can be given to override individual formatting parameters in the current
   dialect.  For full details about the dialect and formatting parameters, see
   section :ref:`csv-fmt-params`. To make it
   as easy as possible to interface with modules which implement the DB API, the
   value :const:`None` is written as the empty string.  While this isn't a
   reversible transformation, it makes it easier to dump SQL NULL data values to
   CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
   All other non-string data are stringified with :func:`str` before being written.

   A short usage example::

      >>> import csv
      >>> spamWriter = csv.writer(open('eggs.csv', 'wb'), delimiter=' ',
      ...                         quotechar='|', quoting=csv.QUOTE_MINIMAL)
      >>> spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
      >>> spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

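The handling of :const:`None` and of non-string values described for
:func:`writer` can be sketched as follows.  ``LineCollector`` is a hypothetical
helper, not part of the module; it only illustrates that any object with a
``write`` method is accepted.  The snippet should run on Python 2.6+ and
Python 3.

```python
import csv

class LineCollector(object):
    """Hypothetical stand-in for a file: any object with write() works."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

out = LineCollector()
writer = csv.writer(out)  # no dialect given, so 'excel' is used

# None becomes an empty field; the int and the float are stringified.
writer.writerow(['spam', None, 42, 1.5])
written = ''.join(out.chunks)
```

Note that reading ``written`` back yields an empty string, not :const:`None`,
for the second field, which is why the transformation is not reversible.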

.. function:: register_dialect(name[, dialect][, fmtparam])

   Associate *dialect* with *name*.  *name* must be a string or Unicode object. The
   dialect can be specified either by passing a sub-class of :class:`Dialect`, or
   by *fmtparam* keyword arguments, or both, with keyword arguments overriding
   parameters of the dialect. For full details about the dialect and formatting
   parameters, see section :ref:`csv-fmt-params`.


.. function:: unregister_dialect(name)

   Delete the dialect associated with *name* from the dialect registry.  An
   :exc:`Error` is raised if *name* is not a registered dialect name.


.. function:: get_dialect(name)

   Return the dialect associated with *name*.  An :exc:`Error` is raised if *name*
   is not a registered dialect name.

   .. versionchanged:: 2.5
      This function now returns an immutable :class:`Dialect`.  Previously an
      instance of the requested dialect was returned.  Users could modify the
      underlying class, changing the behavior of active readers and writers.

.. function:: list_dialects()

   Return the names of all registered dialects.

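The four registry functions above fit together roughly as in this sketch.  The
dialect name ``'pipes'`` is an assumption made up for the example; the snippet
should run on Python 2.6+ and Python 3.

```python
import csv

# 'pipes' is a hypothetical dialect name chosen for this sketch.
csv.register_dialect('pipes', delimiter='|', quoting=csv.QUOTE_NONE)
try:
    registered = 'pipes' in csv.list_dialects()
    delimiter = csv.get_dialect('pipes').delimiter
    # A registered name can be passed wherever a dialect is expected.
    row = next(csv.reader(['a|b|c'], 'pipes'))
finally:
    # Remove the name again so the registry is left as it was found.
    csv.unregister_dialect('pipes')

# Looking up a name that is no longer registered raises csv.Error.
try:
    csv.get_dialect('pipes')
    lookup_failed = False
except csv.Error:
    lookup_failed = True
```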

.. function:: field_size_limit([new_limit])

   Return the current maximum field size allowed by the parser. If *new_limit* is
   given, this becomes the new limit.

   .. versionadded:: 2.5

The :mod:`csv` module defines the following classes:


.. class:: DictReader(csvfile[, fieldnames=None[, restkey=None[, restval=None[, dialect='excel'[, *args, **kwds]]]]])

   Create an object which operates like a regular reader but maps the information
   read into a dict whose keys are given by the optional *fieldnames* parameter.
   If the *fieldnames* parameter is omitted, the values in the first row of the
   *csvfile* will be used as the fieldnames.  If the row read has more fields
   than the fieldnames sequence, the remaining data is added as a sequence keyed
   by the value of *restkey*.  If the row read has fewer fields than the
   fieldnames sequence, the remaining keys take the value of the optional
   *restval* parameter.  Any other optional or keyword arguments are passed to
   the underlying :class:`reader` instance.

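The effect of *restkey* and *restval* on rows that are too long or too short
can be sketched as follows.  The field names and sample rows are invented for
illustration; the snippet should run on Python 2.6+ and Python 3.

```python
import csv

# Hypothetical rows: the first has two surplus fields, the second is short.
lines = ['alice,30,london,extra', 'bob']

reader = csv.DictReader(lines, fieldnames=['name', 'age'],
                        restkey='other', restval='n/a')

# dict() normalises the row type, which differs between Python versions.
records = [dict(row) for row in reader]
```

The surplus fields of the first row are collected in a list under ``'other'``,
while the missing ``'age'`` of the second row is filled with ``'n/a'``.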

.. class:: DictWriter(csvfile, fieldnames[, restval=''[, extrasaction='raise'[, dialect='excel'[, *args, **kwds]]]])

   Create an object which operates like a regular writer but maps dictionaries onto
   output rows.  The *fieldnames* parameter identifies the order in which values in
   the dictionary passed to the :meth:`writerow` method are written to the
   *csvfile*.  The optional *restval* parameter specifies the value to be written
   if the dictionary is missing a key in *fieldnames*.  If the dictionary passed to
   the :meth:`writerow` method contains a key not found in *fieldnames*, the
   optional *extrasaction* parameter indicates what action to take.  If it is set
   to ``'raise'`` a :exc:`ValueError` is raised.  If it is set to ``'ignore'``,
   extra values in the dictionary are ignored.  Any other optional or keyword
   arguments are passed to the underlying :class:`writer` instance.

   Note that unlike the :class:`DictReader` class, the *fieldnames* parameter of
   the :class:`DictWriter` is not optional.  Since Python's :class:`dict` objects
   are not ordered, there is not enough information available to deduce the order
   in which the row should be written to the *csvfile*.

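A small sketch of how *restval* and *extrasaction* behave for
:class:`DictWriter`.  ``LineCollector`` is a hypothetical helper standing in
for a file object, and the field names are invented; the snippet should run on
Python 2.6+ and Python 3.

```python
import csv

class LineCollector(object):
    """Hypothetical stand-in for a file: any object with write() works."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

out = LineCollector()
writer = csv.DictWriter(out, fieldnames=['name', 'age'], restval='?',
                        extrasaction='ignore')
writer.writerow({'name': 'alice', 'age': 30, 'city': 'london'})  # 'city' dropped
writer.writerow({'name': 'bob'})                                 # 'age' missing
written = ''.join(out.chunks)

# With the default extrasaction='raise', an unknown key is an error.
strict = csv.DictWriter(LineCollector(), fieldnames=['name'])
try:
    strict.writerow({'name': 'carol', 'city': 'paris'})
    raised = False
except ValueError:
    raised = True
```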

.. class:: Dialect

   The :class:`Dialect` class is a container class relied on primarily for its
   attributes, which are used to define the parameters for a specific
   :class:`reader` or :class:`writer` instance.


.. class:: excel()

   The :class:`excel` class defines the usual properties of an Excel-generated CSV
   file.  It is registered with the dialect name ``'excel'``.


.. class:: excel_tab()

   The :class:`excel_tab` class defines the usual properties of an Excel-generated
   TAB-delimited file.  It is registered with the dialect name ``'excel-tab'``.


.. class:: Sniffer()

   The :class:`Sniffer` class is used to deduce the format of a CSV file.

   The :class:`Sniffer` class provides two methods:

   .. method:: sniff(sample[, delimiters=None])

      Analyze the given *sample* and return a :class:`Dialect` subclass
      reflecting the parameters found.  If the optional *delimiters* parameter
      is given, it is interpreted as a string containing possible valid
      delimiter characters.


   .. method:: has_header(sample)

      Analyze the sample text (presumed to be in CSV format) and return
      :const:`True` if the first row appears to be a series of column headers.

An example for :class:`Sniffer` use::

   csvfile = open("example.csv")
   dialect = csv.Sniffer().sniff(csvfile.read(1024))
   csvfile.seek(0)
   reader = csv.reader(csvfile, dialect)
   # ... process CSV file contents here ...

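The same idea can be tried without a file by sniffing an in-memory sample, as
in this sketch.  The sample text is invented, and restricting the candidates
with *delimiters* is a choice made here to keep the guess predictable.
:meth:`has_header` is a heuristic, so its answer should be treated as a hint
rather than a guarantee.  The snippet should run on Python 2.6+ and Python 3.

```python
import csv

# Hypothetical semicolon-separated sample with a header row.
sample = 'name;age\nalice;30\nbob;25\ncarol;41\n'

sniffer = csv.Sniffer()
# Only ';' and ',' are considered as possible delimiters.
dialect = sniffer.sniff(sample, delimiters=';,')

# The returned dialect can be handed straight to a reader.
rows = list(csv.reader(sample.splitlines(), dialect))

# Heuristic guess about whether the first row is a header.
looks_like_header = sniffer.has_header(sample)
```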

The :mod:`csv` module defines the following constants:

.. data:: QUOTE_ALL

   Instructs :class:`writer` objects to quote all fields.


.. data:: QUOTE_MINIMAL

   Instructs :class:`writer` objects to only quote those fields which contain
   special characters such as *delimiter*, *quotechar* or any of the characters in
   *lineterminator*.


.. data:: QUOTE_NONNUMERIC

   Instructs :class:`writer` objects to quote all non-numeric fields.

   Instructs the reader to convert all non-quoted fields to type *float*.


.. data:: QUOTE_NONE

   Instructs :class:`writer` objects to never quote fields.  When the current
   *delimiter* occurs in output data it is preceded by the current *escapechar*
   character.  If *escapechar* is not set, the writer will raise :exc:`Error` if
   any characters that require escaping are encountered.

   Instructs :class:`reader` to perform no special processing of quote characters.

The :mod:`csv` module defines the following exception:


.. exception:: Error

   Raised by any of the functions when an error is detected.

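The quoting constants and the :exc:`Error` exception described above can be
compared side by side with a sketch like the following.  ``LineCollector`` and
``render`` are hypothetical helpers written for this example, and
*lineterminator* is set to ``'\n'`` only to keep the results short.  The
snippet should run on Python 2.6+ and Python 3.

```python
import csv

class LineCollector(object):
    """Hypothetical stand-in for a file: any object with write() works."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

def render(row, **fmtparams):
    """Return the text csv.writer produces for a single row."""
    out = LineCollector()
    csv.writer(out, lineterminator='\n', **fmtparams).writerow(row)
    return ''.join(out.chunks)

row = ['spam', 1, 'eggs,ham']  # the last field contains the delimiter
quoted_all = render(row, quoting=csv.QUOTE_ALL)
quoted_minimal = render(row, quoting=csv.QUOTE_MINIMAL)
quoted_nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)
escaped = render(row, quoting=csv.QUOTE_NONE, escapechar='\\')

# QUOTE_NONE without an escapechar cannot represent the embedded comma.
try:
    render(row, quoting=csv.QUOTE_NONE)
    failed = False
except csv.Error:
    failed = True

# On input, QUOTE_NONNUMERIC converts unquoted fields to float.
parsed = next(csv.reader(['"spam",1.5'], quoting=csv.QUOTE_NONNUMERIC))
```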

.. _csv-fmt-params:

Dialects and Formatting Parameters
----------------------------------

To make it easier to specify the format of input and output records, specific
formatting parameters are grouped together into dialects.  A dialect is a
subclass of the :class:`Dialect` class having a set of specific attributes and a
single :meth:`validate` method.  When creating :class:`reader` or
:class:`writer` objects, the programmer can specify a string or a subclass of
the :class:`Dialect` class as the dialect parameter.  In addition to, or instead
of, the *dialect* parameter, the programmer can also specify individual
formatting parameters, which have the same names as the attributes defined below
for the :class:`Dialect` class.

Dialects support the following attributes:


.. attribute:: Dialect.delimiter

   A one-character string used to separate fields.  It defaults to ``','``.


.. attribute:: Dialect.doublequote

   Controls how instances of *quotechar* appearing inside a field should
   themselves be quoted.  When :const:`True`, the character is doubled. When
   :const:`False`, the *escapechar* is used as a prefix to the *quotechar*.  It
   defaults to :const:`True`.

   On output, if *doublequote* is :const:`False` and no *escapechar* is set,
   :exc:`Error` is raised if a *quotechar* is found in a field.


.. attribute:: Dialect.escapechar

   A one-character string used by the writer to escape the *delimiter* if *quoting*
   is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
   :const:`False`. On reading, the *escapechar* removes any special meaning from
   the following character. It defaults to :const:`None`, which disables escaping.


.. attribute:: Dialect.lineterminator

   The string used to terminate lines produced by the :class:`writer`. It defaults
   to ``'\r\n'``.

   .. note::

      The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
      end-of-line, and ignores *lineterminator*. This behavior may change in the
      future.


.. attribute:: Dialect.quotechar

   A one-character string used to quote fields containing special characters, such
   as the *delimiter* or *quotechar*, or which contain new-line characters.  It
   defaults to ``'"'``.


.. attribute:: Dialect.quoting

   Controls when quotes should be generated by the writer and recognised by the
   reader.  It can take on any of the :const:`QUOTE_\*` constants (see section
   :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.


.. attribute:: Dialect.skipinitialspace

   When :const:`True`, whitespace immediately following the *delimiter* is ignored.
   The default is :const:`False`.

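As a sketch of how these attributes combine, the following defines a dialect
by subclassing :class:`excel` and overriding a few attributes.  The class name
``EscapedPipes`` and the helper ``LineCollector`` are hypothetical and exist
only for this example; the snippet should run on Python 2.6+ and Python 3.

```python
import csv

class EscapedPipes(csv.excel):
    """Hypothetical dialect: pipe-delimited, quotes escaped instead of doubled."""
    delimiter = '|'
    doublequote = False
    escapechar = '\\'
    skipinitialspace = True

class LineCollector(object):
    """Hypothetical stand-in for a file: any object with write() works."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

out = LineCollector()
original = ['say "hi"', 'a|b']  # contains the quotechar and the delimiter
csv.writer(out, dialect=EscapedPipes).writerow(original)
written = ''.join(out.chunks)

# Reading with the same dialect recovers the original fields.
round_trip = next(csv.reader(written.splitlines(), dialect=EscapedPipes))

# skipinitialspace drops the blank that follows each delimiter.
spaced = next(csv.reader(['x| y| z'], dialect=EscapedPipes))
```

Using one dialect class for both the writer and the reader keeps the escaping
rules symmetric, which is what makes the round trip work.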

Reader Objects
--------------

Reader objects (:class:`DictReader` instances and objects returned by the
:func:`reader` function) have the following public methods:


.. method:: csvreader.next()

   Return the next row of the reader's iterable object as a list, parsed according
   to the current dialect.

Reader objects have the following public attributes:


.. attribute:: csvreader.dialect

   A read-only description of the dialect in use by the parser.


.. attribute:: csvreader.line_num

   The number of lines read from the source iterator. This is not the same as the
   number of records returned, as records can span multiple lines.

   .. versionadded:: 2.5

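The difference between lines read and records returned can be seen in a sketch
like this one, where a quoted field spans two physical lines.  The sample data
is invented; the snippet should run on Python 2.6+ and Python 3.

```python
import csv

# Hypothetical input: the second record is split over two physical lines.
lines = ['id,comment\n', '1,"first line\n', 'second line"\n', '2,short\n']

reader = csv.reader(lines)
seen = []
for row in reader:
    # line_num counts physical lines consumed so far, not records.
    seen.append((reader.line_num, row))
```

Three records are produced from four lines, so ``line_num`` jumps from 1 to 3
when the multi-line record is returned.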

DictReader objects have the following public attribute:


.. attribute:: csvreader.fieldnames

   If not passed as a parameter when creating the object, this attribute is
   initialized upon first access or when the first record is read from the
   file.

   .. versionchanged:: 2.6


Writer Objects
--------------

:class:`Writer` objects (:class:`DictWriter` instances and objects returned by
the :func:`writer` function) have the following public methods.  A *row* must be
a sequence of strings or numbers for :class:`Writer` objects and a dictionary
mapping fieldnames to strings or numbers for :class:`DictWriter` objects;
numbers are passed through :func:`str` before being written.  Note that complex
numbers are written out surrounded by parentheses.  This may cause some problems
for other programs which read CSV files (assuming they support complex numbers
at all).


.. method:: csvwriter.writerow(row)

   Write the *row* parameter to the writer's file object, formatted according to
   the current dialect.


.. method:: csvwriter.writerows(rows)

   Write all the rows in *rows* (a list of *row* objects as described above) to
   the writer's file object, formatted according to the current dialect.

Writer objects have the following public attribute:


.. attribute:: csvwriter.dialect

   A read-only description of the dialect in use by the writer.

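A brief sketch of :meth:`writerow`, :meth:`writerows` and the treatment of
complex numbers mentioned above.  ``LineCollector`` is a hypothetical helper
standing in for a file object; the snippet should run on Python 2.6+ and
Python 3.

```python
import csv

class LineCollector(object):
    """Hypothetical stand-in for a file: any object with write() works."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

out = LineCollector()
writer = csv.writer(out)

writer.writerow(['x', 'y'])                    # a single row
writer.writerows([(1, 2.5), ('a', 1 + 2j)])    # several rows at once
written = ''.join(out.chunks)

# The dialect attribute exposes the formatting parameters in use.
delimiter_in_use = writer.dialect.delimiter
```

The complex value appears in the output with its surrounding parentheses,
which a consumer expecting plain numbers would have to strip.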

.. _csv-examples:

Examples
--------

The simplest example of reading a CSV file::

   import csv
   reader = csv.reader(open("some.csv", "rb"))
   for row in reader:
       print row

Reading a file with an alternate format::

   import csv
   reader = csv.reader(open("passwd", "rb"), delimiter=':', quoting=csv.QUOTE_NONE)
   for row in reader:
       print row

The corresponding simplest possible writing example is::

   import csv
   writer = csv.writer(open("some.csv", "wb"))
   writer.writerows(someiterable)

Registering a new dialect::

   import csv

   csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)

   reader = csv.reader(open("passwd", "rb"), 'unixpwd')

A slightly more advanced use of the reader --- catching and reporting errors::

   import csv, sys
   filename = "some.csv"
   reader = csv.reader(open(filename, "rb"))
   try:
       for row in reader:
           print row
   except csv.Error, e:
       sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))

And while the module doesn't directly support parsing strings, it can easily be
done::

   import csv
   for row in csv.reader(['one,two,three']):
       print row

The :mod:`csv` module doesn't directly support reading and writing Unicode, but
it is 8-bit-clean save for some problems with ASCII NUL characters.  So you can
write functions or classes that handle the encoding and decoding for you as long
as you avoid encodings like UTF-16 that use NULs.  UTF-8 is recommended.

:func:`unicode_csv_reader` below is a :term:`generator` that wraps :class:`csv.reader`
to handle Unicode CSV data (a list of Unicode strings).  :func:`utf_8_encoder`
is a :term:`generator` that encodes the Unicode strings as UTF-8, one string (or row) at
a time.  The encoded strings are parsed by the CSV reader, and
:func:`unicode_csv_reader` decodes the UTF-8-encoded cells back into Unicode::

   import csv

   def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
       # csv.py doesn't do Unicode; encode temporarily as UTF-8:
       csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                               dialect=dialect, **kwargs)
       for row in csv_reader:
           # decode UTF-8 back to Unicode, cell by cell:
           yield [unicode(cell, 'utf-8') for cell in row]

   def utf_8_encoder(unicode_csv_data):
       for line in unicode_csv_data:
           yield line.encode('utf-8')

For all other encodings the following :class:`UnicodeReader` and
:class:`UnicodeWriter` classes can be used. They take an additional *encoding*
parameter in their constructor and make sure that the data passes through the
real reader or writer encoded as UTF-8::

   import csv, codecs, cStringIO

   class UTF8Recoder:
       """
       Iterator that reads an encoded stream and reencodes the input to UTF-8
       """
       def __init__(self, f, encoding):
           self.reader = codecs.getreader(encoding)(f)

       def __iter__(self):
           return self

       def next(self):
           return self.reader.next().encode("utf-8")

   class UnicodeReader:
       """
       A CSV reader which will iterate over lines in the CSV file "f",
       which is encoded in the given encoding.
       """

       def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
           f = UTF8Recoder(f, encoding)
           self.reader = csv.reader(f, dialect=dialect, **kwds)

       def next(self):
           row = self.reader.next()
           return [unicode(s, "utf-8") for s in row]

       def __iter__(self):
           return self

   class UnicodeWriter:
       """
       A CSV writer which will write rows to CSV file "f",
       which is encoded in the given encoding.
       """

       def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
           # Redirect output to a queue
           self.queue = cStringIO.StringIO()
           self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
           self.stream = f
           self.encoder = codecs.getincrementalencoder(encoding)()

       def writerow(self, row):
           self.writer.writerow([s.encode("utf-8") for s in row])
           # Fetch UTF-8 output from the queue ...
           data = self.queue.getvalue()
           data = data.decode("utf-8")
           # ... and reencode it into the target encoding
           data = self.encoder.encode(data)
           # write to the target stream
           self.stream.write(data)
           # empty queue
           self.queue.truncate(0)

       def writerows(self, rows):
           for row in rows:
               self.writerow(row)
