PageRenderTime 79ms CodeModel.GetById 16ms app.highlight 42ms RepoModel.GetById 11ms app.codeStats 0ms

/Doc/library/difflib.rst

http://unladen-swallow.googlecode.com/
ReStructuredText | 761 lines | 550 code | 211 blank | 0 comment | 0 complexity | 362f58373750a1d90bc5d57c2ede098d MD5 | raw file
  1:mod:`difflib` --- Helpers for computing deltas
  2===============================================
  3
  4.. module:: difflib
  5   :synopsis: Helpers for computing differences between objects.
  6.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
  7.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
  8.. Markup by Fred L. Drake, Jr. <fdrake@acm.org>
  9
 10.. testsetup::
 11
 12   import sys
 13   from difflib import *
 14
 15.. versionadded:: 2.1
 16
 17This module provides classes and functions for comparing sequences. It
 18can be used for example, for comparing files, and can produce difference
 19information in various formats, including HTML and context and unified
 20diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
 21
 22.. class:: SequenceMatcher
 23
 24   This is a flexible class for comparing pairs of sequences of any type, so long
 25   as the sequence elements are :term:`hashable`.  The basic algorithm predates, and is a
 26   little fancier than, an algorithm published in the late 1980's by Ratcliff and
 27   Obershelp under the hyperbolic name "gestalt pattern matching."  The idea is to
 28   find the longest contiguous matching subsequence that contains no "junk"
 29   elements (the Ratcliff and Obershelp algorithm doesn't address junk).  The same
 30   idea is then applied recursively to the pieces of the sequences to the left and
 31   to the right of the matching subsequence.  This does not yield minimal edit
 32   sequences, but does tend to yield matches that "look right" to people.
 33
 34   **Timing:** The basic Ratcliff-Obershelp algorithm is cubic time in the worst
 35   case and quadratic time in the expected case. :class:`SequenceMatcher` is
 36   quadratic time for the worst case and has expected-case behavior dependent in a
 37   complicated way on how many elements the sequences have in common; best case
 38   time is linear.
 39
 40
 41.. class:: Differ
 42
 43   This is a class for comparing sequences of lines of text, and producing
 44   human-readable differences or deltas.  Differ uses :class:`SequenceMatcher`
 45   both to compare sequences of lines, and to compare sequences of characters
 46   within similar (near-matching) lines.
 47
 48   Each line of a :class:`Differ` delta begins with a two-letter code:
 49
 50   +----------+-------------------------------------------+
 51   | Code     | Meaning                                   |
 52   +==========+===========================================+
 53   | ``'- '`` | line unique to sequence 1                 |
 54   +----------+-------------------------------------------+
 55   | ``'+ '`` | line unique to sequence 2                 |
 56   +----------+-------------------------------------------+
 57   | ``'  '`` | line common to both sequences             |
 58   +----------+-------------------------------------------+
 59   | ``'? '`` | line not present in either input sequence |
 60   +----------+-------------------------------------------+
 61
 62   Lines beginning with '``?``' attempt to guide the eye to intraline differences,
 63   and were not present in either input sequence. These lines can be confusing if
 64   the sequences contain tab characters.
 65
 66
 67.. class:: HtmlDiff
 68
 69   This class can be used to create an HTML table (or a complete HTML file
 70   containing the table) showing a side by side, line by line comparison of text
 71   with inter-line and intra-line change highlights.  The table can be generated in
 72   either full or contextual difference mode.
 73
 74   The constructor for this class is:
 75
 76
 77   .. function:: __init__([tabsize][, wrapcolumn][, linejunk][, charjunk])
 78
 79      Initializes instance of :class:`HtmlDiff`.
 80
 81      *tabsize* is an optional keyword argument to specify tab stop spacing and
 82      defaults to ``8``.
 83
 84      *wrapcolumn* is an optional keyword to specify column number where lines are
 85      broken and wrapped, defaults to ``None`` where lines are not wrapped.
 86
 87      *linejunk* and *charjunk* are optional keyword arguments passed into ``ndiff()``
 88      (used by :class:`HtmlDiff` to generate the side by side HTML differences).  See
 89      ``ndiff()`` documentation for argument default values and descriptions.
 90
 91   The following methods are public:
 92
 93
 94   .. function:: make_file(fromlines, tolines [, fromdesc][, todesc][, context][, numlines])
 95
 96      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
 97      is a complete HTML file containing a table showing line by line differences with
 98      inter-line and intra-line changes highlighted.
 99
100      *fromdesc* and *todesc* are optional keyword arguments to specify from/to file
101      column header strings (both default to an empty string).
102
103      *context* and *numlines* are both optional keyword arguments. Set *context* to
104      ``True`` when contextual differences are to be shown, else the default is
105      ``False`` to show the full files. *numlines* defaults to ``5``.  When *context*
106      is ``True`` *numlines* controls the number of context lines which surround the
107      difference highlights.  When *context* is ``False`` *numlines* controls the
108      number of lines which are shown before a difference highlight when using the
109      "next" hyperlinks (setting to zero would cause the "next" hyperlinks to place
110      the next difference highlight at the top of the browser without any leading
111      context).
112
113
114   .. function:: make_table(fromlines, tolines [, fromdesc][, todesc][, context][, numlines])
115
116      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
117      is a complete HTML table showing line by line differences with inter-line and
118      intra-line changes highlighted.
119
120      The arguments for this method are the same as those for the :meth:`make_file`
121      method.
122
123   :file:`Tools/scripts/diff.py` is a command-line front-end to this class and
124   contains a good example of its use.
125
126   .. versionadded:: 2.4
127
128
129.. function:: context_diff(a, b[, fromfile][, tofile][, fromfiledate][, tofiledate][, n][, lineterm])
130
131   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
132   generating the delta lines) in context diff format.
133
134   Context diffs are a compact way of showing just the lines that have changed plus
135   a few lines of context.  The changes are shown in a before/after style.  The
136   number of context lines is set by *n* which defaults to three.
137
138   By default, the diff control lines (those with ``***`` or ``---``) are created
139   with a trailing newline.  This is helpful so that inputs created from
140   :func:`file.readlines` result in diffs that are suitable for use with
141   :func:`file.writelines` since both the inputs and outputs have trailing
142   newlines.
143
144   For inputs that do not have trailing newlines, set the *lineterm* argument to
145   ``""`` so that the output will be uniformly newline free.
146
147   The context diff format normally has a header for filenames and modification
148   times.  Any or all of these may be specified using strings for *fromfile*,
149   *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally
150   expressed in the format returned by :func:`time.ctime`.  If not specified, the
151   strings default to blanks.
152
153      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
154      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
155      >>> for line in context_diff(s1, s2, fromfile='before.py', tofile='after.py'):
156      ...     sys.stdout.write(line)  # doctest: +NORMALIZE_WHITESPACE
157      *** before.py
158      --- after.py
159      ***************
160      *** 1,4 ****
161      ! bacon
162      ! eggs
163      ! ham
164        guido
165      --- 1,4 ----
166      ! python
167      ! eggy
168      ! hamster
169        guido
170
171   See :ref:`difflib-interface` for a more detailed example.
172
173   .. versionadded:: 2.3
174
175
176.. function:: get_close_matches(word, possibilities[, n][, cutoff])
177
178   Return a list of the best "good enough" matches.  *word* is a sequence for which
179   close matches are desired (typically a string), and *possibilities* is a list of
180   sequences against which to match *word* (typically a list of strings).
181
182   Optional argument *n* (default ``3``) is the maximum number of close matches to
183   return; *n* must be greater than ``0``.
184
185   Optional argument *cutoff* (default ``0.6``) is a float in the range [0, 1].
186   Possibilities that don't score at least that similar to *word* are ignored.
187
188   The best (no more than *n*) matches among the possibilities are returned in a
189   list, sorted by similarity score, most similar first.
190
191      >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
192      ['apple', 'ape']
193      >>> import keyword
194      >>> get_close_matches('wheel', keyword.kwlist)
195      ['while']
196      >>> get_close_matches('apple', keyword.kwlist)
197      []
198      >>> get_close_matches('accept', keyword.kwlist)
199      ['except']
200
201
202.. function:: ndiff(a, b[, linejunk][, charjunk])
203
204   Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
205   delta (a :term:`generator` generating the delta lines).
206
207   Optional keyword parameters *linejunk* and *charjunk* are for filter functions
208   (or ``None``):
209
210   *linejunk*: A function that accepts a single string argument, and returns true
211   if the string is junk, or false if not. The default is (``None``), starting with
212   Python 2.3.  Before then, the default was the module-level function
213   :func:`IS_LINE_JUNK`, which filters out lines without visible characters, except
214   for at most one pound character (``'#'``). As of Python 2.3, the underlying
215   :class:`SequenceMatcher` class does a dynamic analysis of which lines are so
216   frequent as to constitute noise, and this usually works better than the pre-2.3
217   default.
218
219   *charjunk*: A function that accepts a character (a string of length 1), and
220   returns if the character is junk, or false if not. The default is module-level
221   function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
222   blank or tab; note: bad idea to include newline in this!).
223
224   :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.
225
226      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
227      ...              'ore\ntree\nemu\n'.splitlines(1))
228      >>> print ''.join(diff),
229      - one
230      ?  ^
231      + ore
232      ?  ^
233      - two
234      - three
235      ?  -
236      + tree
237      + emu
238
239
240.. function:: restore(sequence, which)
241
242   Return one of the two sequences that generated a delta.
243
244   Given a *sequence* produced by :meth:`Differ.compare` or :func:`ndiff`, extract
245   lines originating from file 1 or 2 (parameter *which*), stripping off line
246   prefixes.
247
248   Example:
249
250      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
251      ...              'ore\ntree\nemu\n'.splitlines(1))
252      >>> diff = list(diff) # materialize the generated delta into a list
253      >>> print ''.join(restore(diff, 1)),
254      one
255      two
256      three
257      >>> print ''.join(restore(diff, 2)),
258      ore
259      tree
260      emu
261
262
263.. function:: unified_diff(a, b[, fromfile][, tofile][, fromfiledate][, tofiledate][, n][, lineterm])
264
265   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
266   generating the delta lines) in unified diff format.
267
268   Unified diffs are a compact way of showing just the lines that have changed plus
269   a few lines of context.  The changes are shown in a inline style (instead of
270   separate before/after blocks).  The number of context lines is set by *n* which
271   defaults to three.
272
273   By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
274   created with a trailing newline.  This is helpful so that inputs created from
275   :func:`file.readlines` result in diffs that are suitable for use with
276   :func:`file.writelines` since both the inputs and outputs have trailing
277   newlines.
278
279   For inputs that do not have trailing newlines, set the *lineterm* argument to
280   ``""`` so that the output will be uniformly newline free.
281
282   The context diff format normally has a header for filenames and modification
283   times.  Any or all of these may be specified using strings for *fromfile*,
284   *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally
285   expressed in the format returned by :func:`time.ctime`.  If not specified, the
286   strings default to blanks.
287
288      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
289      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
290      >>> for line in unified_diff(s1, s2, fromfile='before.py', tofile='after.py'):
291      ...     sys.stdout.write(line)   # doctest: +NORMALIZE_WHITESPACE
292      --- before.py
293      +++ after.py
294      @@ -1,4 +1,4 @@
295      -bacon
296      -eggs
297      -ham
298      +python
299      +eggy
300      +hamster
301       guido
302
303   See :ref:`difflib-interface` for a more detailed example.
304
305   .. versionadded:: 2.3
306
307
308.. function:: IS_LINE_JUNK(line)
309
310   Return true for ignorable lines.  The line *line* is ignorable if *line* is
311   blank or contains a single ``'#'``, otherwise it is not ignorable.  Used as a
312   default for parameter *linejunk* in :func:`ndiff` before Python 2.3.
313
314
315.. function:: IS_CHARACTER_JUNK(ch)
316
317   Return true for ignorable characters.  The character *ch* is ignorable if *ch*
318   is a space or tab, otherwise it is not ignorable.  Used as a default for
319   parameter *charjunk* in :func:`ndiff`.
320
321
322.. seealso::
323
324   `Pattern Matching: The Gestalt Approach <http://www.ddj.com/184407970?pgno=5>`_
325      Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
326      was published in `Dr. Dobb's Journal <http://www.ddj.com/>`_ in July, 1988.
327
328
329.. _sequence-matcher:
330
331SequenceMatcher Objects
332-----------------------
333
334The :class:`SequenceMatcher` class has this constructor:
335
336
337.. class:: SequenceMatcher([isjunk[, a[, b]]])
338
339   Optional argument *isjunk* must be ``None`` (the default) or a one-argument
340   function that takes a sequence element and returns true if and only if the
341   element is "junk" and should be ignored. Passing ``None`` for *isjunk* is
342   equivalent to passing ``lambda x: 0``; in other words, no elements are ignored.
343   For example, pass::
344
345      lambda x: x in " \t"
346
347   if you're comparing lines as sequences of characters, and don't want to synch up
348   on blanks or hard tabs.
349
350   The optional arguments *a* and *b* are sequences to be compared; both default to
351   empty strings.  The elements of both sequences must be :term:`hashable`.
352
353   :class:`SequenceMatcher` objects have the following methods:
354
355
356   .. method:: set_seqs(a, b)
357
358      Set the two sequences to be compared.
359
360   :class:`SequenceMatcher` computes and caches detailed information about the
361   second sequence, so if you want to compare one sequence against many
362   sequences, use :meth:`set_seq2` to set the commonly used sequence once and
363   call :meth:`set_seq1` repeatedly, once for each of the other sequences.
364
365
366   .. method:: set_seq1(a)
367
368      Set the first sequence to be compared.  The second sequence to be compared
369      is not changed.
370
371
372   .. method:: set_seq2(b)
373
374      Set the second sequence to be compared.  The first sequence to be compared
375      is not changed.
376
377
378   .. method:: find_longest_match(alo, ahi, blo, bhi)
379
380      Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.
381
382      If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns
383      ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
384      <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j',
385      k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
386      <= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of
387      all maximal matching blocks, return one that starts earliest in *a*, and
388      of all those maximal matching blocks that start earliest in *a*, return
389      the one that starts earliest in *b*.
390
391         >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
392         >>> s.find_longest_match(0, 5, 0, 9)
393         Match(a=0, b=4, size=5)
394
395      If *isjunk* was provided, first the longest matching block is determined
396      as above, but with the additional restriction that no junk element appears
397      in the block.  Then that block is extended as far as possible by matching
398      (only) junk elements on both sides. So the resulting block never matches
399      on junk except as identical junk happens to be adjacent to an interesting
400      match.
401
402      Here's the same example as before, but considering blanks to be junk. That
403      prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
404      second sequence directly.  Instead only the ``'abcd'`` can match, and
405      matches the leftmost ``'abcd'`` in the second sequence:
406
407         >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
408         >>> s.find_longest_match(0, 5, 0, 9)
409         Match(a=1, b=0, size=4)
410
411      If no blocks match, this returns ``(alo, blo, 0)``.
412
413      .. versionchanged:: 2.6
414         This method returns a :term:`named tuple` ``Match(a, b, size)``.
415
416
417   .. method:: get_matching_blocks()
418
419      Return list of triples describing matching subsequences. Each triple is of
420      the form ``(i, j, n)``, and means that ``a[i:i+n] == b[j:j+n]``.  The
421      triples are monotonically increasing in *i* and *j*.
422
423      The last triple is a dummy, and has the value ``(len(a), len(b), 0)``.  It
424      is the only triple with ``n == 0``.  If ``(i, j, n)`` and ``(i', j', n')``
425      are adjacent triples in the list, and the second is not the last triple in
426      the list, then ``i+n != i'`` or ``j+n != j'``; in other words, adjacent
427      triples always describe non-adjacent equal blocks.
428
429      .. XXX Explain why a dummy is used!
430
431      .. versionchanged:: 2.5
432         The guarantee that adjacent triples always describe non-adjacent blocks
433         was implemented.
434
435      .. doctest::
436
437         >>> s = SequenceMatcher(None, "abxcd", "abcd")
438         >>> s.get_matching_blocks()
439         [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
440
441
442   .. method:: get_opcodes()
443
444      Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is
445      of the form ``(tag, i1, i2, j1, j2)``.  The first tuple has ``i1 == j1 ==
446      0``, and remaining tuples have *i1* equal to the *i2* from the preceding
447      tuple, and, likewise, *j1* equal to the previous *j2*.
448
449      The *tag* values are strings, with these meanings:
450
451      +---------------+---------------------------------------------+
452      | Value         | Meaning                                     |
453      +===============+=============================================+
454      | ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
455      |               | ``b[j1:j2]``.                               |
456      +---------------+---------------------------------------------+
457      | ``'delete'``  | ``a[i1:i2]`` should be deleted.  Note that  |
458      |               | ``j1 == j2`` in this case.                  |
459      +---------------+---------------------------------------------+
460      | ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
461      |               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
462      |               | this case.                                  |
463      +---------------+---------------------------------------------+
464      | ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
465      |               | are equal).                                 |
466      +---------------+---------------------------------------------+
467
468      For example:
469
470         >>> a = "qabxcd"
471         >>> b = "abycdf"
472         >>> s = SequenceMatcher(None, a, b)
473         >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
474         ...    print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
475         ...           (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
476          delete a[0:1] (q) b[0:0] ()
477           equal a[1:3] (ab) b[0:2] (ab)
478         replace a[3:4] (x) b[2:3] (y)
479           equal a[4:6] (cd) b[3:5] (cd)
480          insert a[6:6] () b[5:6] (f)
481
482
483   .. method:: get_grouped_opcodes([n])
484
485      Return a :term:`generator` of groups with up to *n* lines of context.
486
487      Starting with the groups returned by :meth:`get_opcodes`, this method
488      splits out smaller change clusters and eliminates intervening ranges which
489      have no changes.
490
491      The groups are returned in the same format as :meth:`get_opcodes`.
492
493      .. versionadded:: 2.3
494
495
496   .. method:: ratio()
497
498      Return a measure of the sequences' similarity as a float in the range [0,
499      1].
500
501      Where T is the total number of elements in both sequences, and M is the
502      number of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the
503      sequences are identical, and ``0.0`` if they have nothing in common.
504
505      This is expensive to compute if :meth:`get_matching_blocks` or
506      :meth:`get_opcodes` hasn't already been called, in which case you may want
507      to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
508      upper bound.
509
510
511   .. method:: quick_ratio()
512
513      Return an upper bound on :meth:`ratio` relatively quickly.
514
515      This isn't defined beyond that it is an upper bound on :meth:`ratio`, and
516      is faster to compute.
517
518
519   .. method:: real_quick_ratio()
520
521      Return an upper bound on :meth:`ratio` very quickly.
522
523      This isn't defined beyond that it is an upper bound on :meth:`ratio`, and
524      is faster to compute than either :meth:`ratio` or :meth:`quick_ratio`.
525
526The three methods that return the ratio of matching to total characters can give
527different results due to differing levels of approximation, although
528:meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as
529:meth:`ratio`:
530
531   >>> s = SequenceMatcher(None, "abcd", "bcde")
532   >>> s.ratio()
533   0.75
534   >>> s.quick_ratio()
535   0.75
536   >>> s.real_quick_ratio()
537   1.0
538
539
540.. _sequencematcher-examples:
541
542SequenceMatcher Examples
543------------------------
544
545This example compares two strings, considering blanks to be "junk:"
546
547   >>> s = SequenceMatcher(lambda x: x == " ",
548   ...                     "private Thread currentThread;",
549   ...                     "private volatile Thread currentThread;")
550
551:meth:`ratio` returns a float in [0, 1], measuring the similarity of the
552sequences.  As a rule of thumb, a :meth:`ratio` value over 0.6 means the
553sequences are close matches:
554
555   >>> print round(s.ratio(), 3)
556   0.866
557
558If you're only interested in where the sequences match,
559:meth:`get_matching_blocks` is handy:
560
561   >>> for block in s.get_matching_blocks():
562   ...     print "a[%d] and b[%d] match for %d elements" % block
563   a[0] and b[0] match for 8 elements
564   a[8] and b[17] match for 21 elements
565   a[29] and b[38] match for 0 elements
566
567Note that the last tuple returned by :meth:`get_matching_blocks` is always a
568dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
569tuple element (number of elements matched) is ``0``.
570
571If you want to know how to change the first sequence into the second, use
572:meth:`get_opcodes`:
573
574   >>> for opcode in s.get_opcodes():
575   ...     print "%6s a[%d:%d] b[%d:%d]" % opcode
576    equal a[0:8] b[0:8]
577   insert a[8:8] b[8:17]
578    equal a[8:29] b[17:38]
579
580See also the function :func:`get_close_matches` in this module, which shows how
581simple code building on :class:`SequenceMatcher` can be used to do useful work.
582
583
584.. _differ-objects:
585
586Differ Objects
587--------------
588
589Note that :class:`Differ`\ -generated deltas make no claim to be **minimal**
590diffs. To the contrary, minimal diffs are often counter-intuitive, because they
591synch up anywhere possible, sometimes accidental matches 100 pages apart.
592Restricting synch points to contiguous matches preserves some notion of
593locality, at the occasional cost of producing a longer diff.
594
595The :class:`Differ` class has this constructor:
596
597
598.. class:: Differ([linejunk[, charjunk]])
599
600   Optional keyword parameters *linejunk* and *charjunk* are for filter functions
601   (or ``None``):
602
603   *linejunk*: A function that accepts a single string argument, and returns true
604   if the string is junk.  The default is ``None``, meaning that no line is
605   considered junk.
606
607   *charjunk*: A function that accepts a single character argument (a string of
608   length 1), and returns true if the character is junk. The default is ``None``,
609   meaning that no character is considered junk.
610
611   :class:`Differ` objects are used (deltas generated) via a single method:
612
613
614   .. method:: Differ.compare(a, b)
615
616      Compare two sequences of lines, and generate the delta (a sequence of lines).
617
618      Each sequence must contain individual single-line strings ending with newlines.
619      Such sequences can be obtained from the :meth:`readlines` method of file-like
620      objects.  The delta generated also consists of newline-terminated strings, ready
621      to be printed as-is via the :meth:`writelines` method of a file-like object.
622
623
624.. _differ-examples:
625
626Differ Example
627--------------
628
629This example compares two texts. First we set up the texts, sequences of
630individual single-line strings ending with newlines (such sequences can also be
631obtained from the :meth:`readlines` method of file-like objects):
632
633   >>> text1 = '''  1. Beautiful is better than ugly.
634   ...   2. Explicit is better than implicit.
635   ...   3. Simple is better than complex.
636   ...   4. Complex is better than complicated.
637   ... '''.splitlines(1)
638   >>> len(text1)
639   4
640   >>> text1[0][-1]
641   '\n'
642   >>> text2 = '''  1. Beautiful is better than ugly.
643   ...   3.   Simple is better than complex.
644   ...   4. Complicated is better than complex.
645   ...   5. Flat is better than nested.
646   ... '''.splitlines(1)
647
648Next we instantiate a Differ object:
649
650   >>> d = Differ()
651
652Note that when instantiating a :class:`Differ` object we may pass functions to
653filter out line and character "junk."  See the :meth:`Differ` constructor for
654details.
655
656Finally, we compare the two:
657
658   >>> result = list(d.compare(text1, text2))
659
660``result`` is a list of strings, so let's pretty-print it:
661
662   >>> from pprint import pprint
663   >>> pprint(result)
664   ['    1. Beautiful is better than ugly.\n',
665    '-   2. Explicit is better than implicit.\n',
666    '-   3. Simple is better than complex.\n',
667    '+   3.   Simple is better than complex.\n',
668    '?     ++\n',
669    '-   4. Complex is better than complicated.\n',
670    '?            ^                     ---- ^\n',
671    '+   4. Complicated is better than complex.\n',
672    '?           ++++ ^                      ^\n',
673    '+   5. Flat is better than nested.\n']
674
675As a single multi-line string it looks like this:
676
677   >>> import sys
678   >>> sys.stdout.writelines(result)
679       1. Beautiful is better than ugly.
680   -   2. Explicit is better than implicit.
681   -   3. Simple is better than complex.
682   +   3.   Simple is better than complex.
683   ?     ++
684   -   4. Complex is better than complicated.
685   ?            ^                     ---- ^
686   +   4. Complicated is better than complex.
687   ?           ++++ ^                      ^
688   +   5. Flat is better than nested.
689
690
691.. _difflib-interface:
692
693A command-line interface to difflib
694-----------------------------------
695
696This example shows how to use difflib to create a ``diff``-like utility.
697It is also contained in the Python source distribution, as
698:file:`Tools/scripts/diff.py`.
699
700.. testcode::
701
702   """ Command line interface to difflib.py providing diffs in four formats:
703
704   * ndiff:    lists every line and highlights interline changes.
705   * context:  highlights clusters of changes in a before/after format.
706   * unified:  highlights clusters of changes in an inline format.
707   * html:     generates side by side comparison with change highlights.
708
709   """
710
711   import sys, os, time, difflib, optparse
712
713   def main():
714        # Configure the option parser
715       usage = "usage: %prog [options] fromfile tofile"
716       parser = optparse.OptionParser(usage)
717       parser.add_option("-c", action="store_true", default=False,
718                         help='Produce a context format diff (default)')
719       parser.add_option("-u", action="store_true", default=False,
720                         help='Produce a unified format diff')
721       hlp = 'Produce HTML side by side diff (can use -c and -l in conjunction)'
722       parser.add_option("-m", action="store_true", default=False, help=hlp)
723       parser.add_option("-n", action="store_true", default=False,
724                         help='Produce a ndiff format diff')
725       parser.add_option("-l", "--lines", type="int", default=3,
726                         help='Set number of context lines (default 3)')
727       (options, args) = parser.parse_args()
728
729       if len(args) == 0:
730           parser.print_help()
731           sys.exit(1)
732       if len(args) != 2:
733           parser.error("need to specify both a fromfile and tofile")
734
735       n = options.lines
736       fromfile, tofile = args # as specified in the usage string
737
738       # we're passing these as arguments to the diff function
739       fromdate = time.ctime(os.stat(fromfile).st_mtime)
740       todate = time.ctime(os.stat(tofile).st_mtime)
741       fromlines = open(fromfile, 'U').readlines()
742       tolines = open(tofile, 'U').readlines()
743
744       if options.u:
745           diff = difflib.unified_diff(fromlines, tolines, fromfile, tofile,
746                                       fromdate, todate, n=n)
747       elif options.n:
748           diff = difflib.ndiff(fromlines, tolines)
749       elif options.m:
750           diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile,
751                                               tofile, context=options.c,
752                                               numlines=n)
753       else:
754           diff = difflib.context_diff(fromlines, tolines, fromfile, tofile,
755                                       fromdate, todate, n=n)
756
757       # we're using writelines because diff is a generator
758       sys.stdout.writelines(diff)
759
760   if __name__ == '__main__':
761       main()