/Doc/howto/functional.rst
ReStructuredText | 1378 lines | 1022 code | 356 blank | 0 comment | 0 complexity | 477be5252aa21501c01d4da0e255dce1 MD5 | raw file
1******************************** 2 Functional Programming HOWTO 3******************************** 4 5:Author: A. M. Kuchling 6:Release: 0.31 7 8(This is a first draft. Please send comments/error reports/suggestions to 9amk@amk.ca.) 10 11In this document, we'll take a tour of Python's features suitable for 12implementing programs in a functional style. After an introduction to the 13concepts of functional programming, we'll look at language features such as 14:term:`iterator`\s and :term:`generator`\s and relevant library modules such as 15:mod:`itertools` and :mod:`functools`. 16 17 18Introduction 19============ 20 21This section explains the basic concept of functional programming; if you're 22just interested in learning about Python language features, skip to the next 23section. 24 25Programming languages support decomposing problems in several different ways: 26 27* Most programming languages are **procedural**: programs are lists of 28 instructions that tell the computer what to do with the program's input. C, 29 Pascal, and even Unix shells are procedural languages. 30 31* In **declarative** languages, you write a specification that describes the 32 problem to be solved, and the language implementation figures out how to 33 perform the computation efficiently. SQL is the declarative language you're 34 most likely to be familiar with; a SQL query describes the data set you want 35 to retrieve, and the SQL engine decides whether to scan tables or use indexes, 36 which subclauses should be performed first, etc. 37 38* **Object-oriented** programs manipulate collections of objects. Objects have 39 internal state and support methods that query or modify this internal state in 40 some way. Smalltalk and Java are object-oriented languages. C++ and Python 41 are languages that support object-oriented programming, but don't force the 42 use of object-oriented features. 43 44* **Functional** programming decomposes a problem into a set of functions. 45 Ideally, functions only take inputs and produce outputs, and don't have any 46 internal state that affects the output produced for a given input. Well-known 47 functional languages include the ML family (Standard ML, OCaml, and other 48 variants) and Haskell. 49 50The designers of some computer languages choose to emphasize one 51particular approach to programming. This often makes it difficult to 52write programs that use a different approach. Other languages are 53multi-paradigm languages that support several different approaches. 54Lisp, C++, and Python are multi-paradigm; you can write programs or 55libraries that are largely procedural, object-oriented, or functional 56in all of these languages. In a large program, different sections 57might be written using different approaches; the GUI might be 58object-oriented while the processing logic is procedural or 59functional, for example. 60 61In a functional program, input flows through a set of functions. Each function 62operates on its input and produces some output. Functional style discourages 63functions with side effects that modify internal state or make other changes 64that aren't visible in the function's return value. Functions that have no side 65effects at all are called **purely functional**. Avoiding side effects means 66not using data structures that get updated as a program runs; every function's 67output must only depend on its input. 68 69Some languages are very strict about purity and don't even have assignment 70statements such as ``a=3`` or ``c = a + b``, but it's difficult to avoid all 71side effects. Printing to the screen or writing to a disk file are side 72effects, for example. For example, in Python a ``print`` statement or a 73``time.sleep(1)`` both return no useful value; they're only called for their 74side effects of sending some text to the screen or pausing execution for a 75second. 76 77Python programs written in functional style usually won't go to the extreme of 78avoiding all I/O or all assignments; instead, they'll provide a 79functional-appearing interface but will use non-functional features internally. 80For example, the implementation of a function will still use assignments to 81local variables, but won't modify global variables or have other side effects. 82 83Functional programming can be considered the opposite of object-oriented 84programming. Objects are little capsules containing some internal state along 85with a collection of method calls that let you modify this state, and programs 86consist of making the right set of state changes. Functional programming wants 87to avoid state changes as much as possible and works with data flowing between 88functions. In Python you might combine the two approaches by writing functions 89that take and return instances representing objects in your application (e-mail 90messages, transactions, etc.). 91 92Functional design may seem like an odd constraint to work under. Why should you 93avoid objects and side effects? There are theoretical and practical advantages 94to the functional style: 95 96* Formal provability. 97* Modularity. 98* Composability. 99* Ease of debugging and testing. 100 101 102Formal provability 103------------------ 104 105A theoretical benefit is that it's easier to construct a mathematical proof that 106a functional program is correct. 107 108For a long time researchers have been interested in finding ways to 109mathematically prove programs correct. This is different from testing a program 110on numerous inputs and concluding that its output is usually correct, or reading 111a program's source code and concluding that the code looks right; the goal is 112instead a rigorous proof that a program produces the right result for all 113possible inputs. 114 115The technique used to prove programs correct is to write down **invariants**, 116properties of the input data and of the program's variables that are always 117true. For each line of code, you then show that if invariants X and Y are true 118**before** the line is executed, the slightly different invariants X' and Y' are 119true **after** the line is executed. This continues until you reach the end of 120the program, at which point the invariants should match the desired conditions 121on the program's output. 122 123Functional programming's avoidance of assignments arose because assignments are 124difficult to handle with this technique; assignments can break invariants that 125were true before the assignment without producing any new invariants that can be 126propagated onward. 127 128Unfortunately, proving programs correct is largely impractical and not relevant 129to Python software. Even trivial programs require proofs that are several pages 130long; the proof of correctness for a moderately complicated program would be 131enormous, and few or none of the programs you use daily (the Python interpreter, 132your XML parser, your web browser) could be proven correct. Even if you wrote 133down or generated a proof, there would then be the question of verifying the 134proof; maybe there's an error in it, and you wrongly believe you've proved the 135program correct. 136 137 138Modularity 139---------- 140 141A more practical benefit of functional programming is that it forces you to 142break apart your problem into small pieces. Programs are more modular as a 143result. It's easier to specify and write a small function that does one thing 144than a large function that performs a complicated transformation. Small 145functions are also easier to read and to check for errors. 146 147 148Ease of debugging and testing 149----------------------------- 150 151Testing and debugging a functional-style program is easier. 152 153Debugging is simplified because functions are generally small and clearly 154specified. When a program doesn't work, each function is an interface point 155where you can check that the data are correct. You can look at the intermediate 156inputs and outputs to quickly isolate the function that's responsible for a bug. 157 158Testing is easier because each function is a potential subject for a unit test. 159Functions don't depend on system state that needs to be replicated before 160running a test; instead you only have to synthesize the right input and then 161check that the output matches expectations. 162 163 164Composability 165------------- 166 167As you work on a functional-style program, you'll write a number of functions 168with varying inputs and outputs. Some of these functions will be unavoidably 169specialized to a particular application, but others will be useful in a wide 170variety of programs. For example, a function that takes a directory path and 171returns all the XML files in the directory, or a function that takes a filename 172and returns its contents, can be applied to many different situations. 173 174Over time you'll form a personal library of utilities. Often you'll assemble 175new programs by arranging existing functions in a new configuration and writing 176a few functions specialized for the current task. 177 178 179Iterators 180========= 181 182I'll start by looking at a Python language feature that's an important 183foundation for writing functional-style programs: iterators. 184 185An iterator is an object representing a stream of data; this object returns the 186data one element at a time. A Python iterator must support a method called 187``next()`` that takes no arguments and always returns the next element of the 188stream. If there are no more elements in the stream, ``next()`` must raise the 189``StopIteration`` exception. Iterators don't have to be finite, though; it's 190perfectly reasonable to write an iterator that produces an infinite stream of 191data. 192 193The built-in :func:`iter` function takes an arbitrary object and tries to return 194an iterator that will return the object's contents or elements, raising 195:exc:`TypeError` if the object doesn't support iteration. Several of Python's 196built-in data types support iteration, the most common being lists and 197dictionaries. An object is called an **iterable** object if you can get an 198iterator for it. 199 200You can experiment with the iteration interface manually: 201 202 >>> L = [1,2,3] 203 >>> it = iter(L) 204 >>> print it 205 <...iterator object at ...> 206 >>> it.next() 207 1 208 >>> it.next() 209 2 210 >>> it.next() 211 3 212 >>> it.next() 213 Traceback (most recent call last): 214 File "<stdin>", line 1, in ? 215 StopIteration 216 >>> 217 218Python expects iterable objects in several different contexts, the most 219important being the ``for`` statement. In the statement ``for X in Y``, Y must 220be an iterator or some object for which ``iter()`` can create an iterator. 221These two statements are equivalent:: 222 223 for i in iter(obj): 224 print i 225 226 for i in obj: 227 print i 228 229Iterators can be materialized as lists or tuples by using the :func:`list` or 230:func:`tuple` constructor functions: 231 232 >>> L = [1,2,3] 233 >>> iterator = iter(L) 234 >>> t = tuple(iterator) 235 >>> t 236 (1, 2, 3) 237 238Sequence unpacking also supports iterators: if you know an iterator will return 239N elements, you can unpack them into an N-tuple: 240 241 >>> L = [1,2,3] 242 >>> iterator = iter(L) 243 >>> a,b,c = iterator 244 >>> a,b,c 245 (1, 2, 3) 246 247Built-in functions such as :func:`max` and :func:`min` can take a single 248iterator argument and will return the largest or smallest element. The ``"in"`` 249and ``"not in"`` operators also support iterators: ``X in iterator`` is true if 250X is found in the stream returned by the iterator. You'll run into obvious 251problems if the iterator is infinite; ``max()``, ``min()``, and ``"not in"`` 252will never return, and if the element X never appears in the stream, the 253``"in"`` operator won't return either. 254 255Note that you can only go forward in an iterator; there's no way to get the 256previous element, reset the iterator, or make a copy of it. Iterator objects 257can optionally provide these additional capabilities, but the iterator protocol 258only specifies the ``next()`` method. Functions may therefore consume all of 259the iterator's output, and if you need to do something different with the same 260stream, you'll have to create a new iterator. 261 262 263 264Data Types That Support Iterators 265--------------------------------- 266 267We've already seen how lists and tuples support iterators. In fact, any Python 268sequence type, such as strings, will automatically support creation of an 269iterator. 270 271Calling :func:`iter` on a dictionary returns an iterator that will loop over the 272dictionary's keys: 273 274.. not a doctest since dict ordering varies across Pythons 275 276:: 277 278 >>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6, 279 ... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12} 280 >>> for key in m: 281 ... print key, m[key] 282 Mar 3 283 Feb 2 284 Aug 8 285 Sep 9 286 Apr 4 287 Jun 6 288 Jul 7 289 Jan 1 290 May 5 291 Nov 11 292 Dec 12 293 Oct 10 294 295Note that the order is essentially random, because it's based on the hash 296ordering of the objects in the dictionary. 297 298Applying ``iter()`` to a dictionary always loops over the keys, but dictionaries 299have methods that return other iterators. If you want to iterate over keys, 300values, or key/value pairs, you can explicitly call the ``iterkeys()``, 301``itervalues()``, or ``iteritems()`` methods to get an appropriate iterator. 302 303The :func:`dict` constructor can accept an iterator that returns a finite stream 304of ``(key, value)`` tuples: 305 306 >>> L = [('Italy', 'Rome'), ('France', 'Paris'), ('US', 'Washington DC')] 307 >>> dict(iter(L)) 308 {'Italy': 'Rome', 'US': 'Washington DC', 'France': 'Paris'} 309 310Files also support iteration by calling the ``readline()`` method until there 311are no more lines in the file. This means you can read each line of a file like 312this:: 313 314 for line in file: 315 # do something for each line 316 ... 317 318Sets can take their contents from an iterable and let you iterate over the set's 319elements:: 320 321 S = set((2, 3, 5, 7, 11, 13)) 322 for i in S: 323 print i 324 325 326 327Generator expressions and list comprehensions 328============================================= 329 330Two common operations on an iterator's output are 1) performing some operation 331for every element, 2) selecting a subset of elements that meet some condition. 332For example, given a list of strings, you might want to strip off trailing 333whitespace from each line or extract all the strings containing a given 334substring. 335 336List comprehensions and generator expressions (short form: "listcomps" and 337"genexps") are a concise notation for such operations, borrowed from the 338functional programming language Haskell (http://www.haskell.org). You can strip 339all the whitespace from a stream of strings with the following code:: 340 341 line_list = [' line 1\n', 'line 2 \n', ...] 342 343 # Generator expression -- returns iterator 344 stripped_iter = (line.strip() for line in line_list) 345 346 # List comprehension -- returns list 347 stripped_list = [line.strip() for line in line_list] 348 349You can select only certain elements by adding an ``"if"`` condition:: 350 351 stripped_list = [line.strip() for line in line_list 352 if line != ""] 353 354With a list comprehension, you get back a Python list; ``stripped_list`` is a 355list containing the resulting lines, not an iterator. Generator expressions 356return an iterator that computes the values as necessary, not needing to 357materialize all the values at once. This means that list comprehensions aren't 358useful if you're working with iterators that return an infinite stream or a very 359large amount of data. Generator expressions are preferable in these situations. 360 361Generator expressions are surrounded by parentheses ("()") and list 362comprehensions are surrounded by square brackets ("[]"). Generator expressions 363have the form:: 364 365 ( expression for expr in sequence1 366 if condition1 367 for expr2 in sequence2 368 if condition2 369 for expr3 in sequence3 ... 370 if condition3 371 for exprN in sequenceN 372 if conditionN ) 373 374Again, for a list comprehension only the outside brackets are different (square 375brackets instead of parentheses). 376 377The elements of the generated output will be the successive values of 378``expression``. The ``if`` clauses are all optional; if present, ``expression`` 379is only evaluated and added to the result when ``condition`` is true. 380 381Generator expressions always have to be written inside parentheses, but the 382parentheses signalling a function call also count. If you want to create an 383iterator that will be immediately passed to a function you can write:: 384 385 obj_total = sum(obj.count for obj in list_all_objects()) 386 387The ``for...in`` clauses contain the sequences to be iterated over. The 388sequences do not have to be the same length, because they are iterated over from 389left to right, **not** in parallel. For each element in ``sequence1``, 390``sequence2`` is looped over from the beginning. ``sequence3`` is then looped 391over for each resulting pair of elements from ``sequence1`` and ``sequence2``. 392 393To put it another way, a list comprehension or generator expression is 394equivalent to the following Python code:: 395 396 for expr1 in sequence1: 397 if not (condition1): 398 continue # Skip this element 399 for expr2 in sequence2: 400 if not (condition2): 401 continue # Skip this element 402 ... 403 for exprN in sequenceN: 404 if not (conditionN): 405 continue # Skip this element 406 407 # Output the value of 408 # the expression. 409 410This means that when there are multiple ``for...in`` clauses but no ``if`` 411clauses, the length of the resulting output will be equal to the product of the 412lengths of all the sequences. If you have two lists of length 3, the output 413list is 9 elements long: 414 415.. doctest:: 416 :options: +NORMALIZE_WHITESPACE 417 418 >>> seq1 = 'abc' 419 >>> seq2 = (1,2,3) 420 >>> [(x,y) for x in seq1 for y in seq2] 421 [('a', 1), ('a', 2), ('a', 3), 422 ('b', 1), ('b', 2), ('b', 3), 423 ('c', 1), ('c', 2), ('c', 3)] 424 425To avoid introducing an ambiguity into Python's grammar, if ``expression`` is 426creating a tuple, it must be surrounded with parentheses. The first list 427comprehension below is a syntax error, while the second one is correct:: 428 429 # Syntax error 430 [ x,y for x in seq1 for y in seq2] 431 # Correct 432 [ (x,y) for x in seq1 for y in seq2] 433 434 435Generators 436========== 437 438Generators are a special class of functions that simplify the task of writing 439iterators. Regular functions compute a value and return it, but generators 440return an iterator that returns a stream of values. 441 442You're doubtless familiar with how regular function calls work in Python or C. 443When you call a function, it gets a private namespace where its local variables 444are created. When the function reaches a ``return`` statement, the local 445variables are destroyed and the value is returned to the caller. A later call 446to the same function creates a new private namespace and a fresh set of local 447variables. But, what if the local variables weren't thrown away on exiting a 448function? What if you could later resume the function where it left off? This 449is what generators provide; they can be thought of as resumable functions. 450 451Here's the simplest example of a generator function: 452 453.. testcode:: 454 455 def generate_ints(N): 456 for i in range(N): 457 yield i 458 459Any function containing a ``yield`` keyword is a generator function; this is 460detected by Python's :term:`bytecode` compiler which compiles the function 461specially as a result. 462 463When you call a generator function, it doesn't return a single value; instead it 464returns a generator object that supports the iterator protocol. On executing 465the ``yield`` expression, the generator outputs the value of ``i``, similar to a 466``return`` statement. The big difference between ``yield`` and a ``return`` 467statement is that on reaching a ``yield`` the generator's state of execution is 468suspended and local variables are preserved. On the next call to the 469generator's ``.next()`` method, the function will resume executing. 470 471Here's a sample usage of the ``generate_ints()`` generator: 472 473 >>> gen = generate_ints(3) 474 >>> gen 475 <generator object generate_ints at ...> 476 >>> gen.next() 477 0 478 >>> gen.next() 479 1 480 >>> gen.next() 481 2 482 >>> gen.next() 483 Traceback (most recent call last): 484 File "stdin", line 1, in ? 485 File "stdin", line 2, in generate_ints 486 StopIteration 487 488You could equally write ``for i in generate_ints(5)``, or ``a,b,c = 489generate_ints(3)``. 490 491Inside a generator function, the ``return`` statement can only be used without a 492value, and signals the end of the procession of values; after executing a 493``return`` the generator cannot return any further values. ``return`` with a 494value, such as ``return 5``, is a syntax error inside a generator function. The 495end of the generator's results can also be indicated by raising 496``StopIteration`` manually, or by just letting the flow of execution fall off 497the bottom of the function. 498 499You could achieve the effect of generators manually by writing your own class 500and storing all the local variables of the generator as instance variables. For 501example, returning a list of integers could be done by setting ``self.count`` to 5020, and having the ``next()`` method increment ``self.count`` and return it. 503However, for a moderately complicated generator, writing a corresponding class 504can be much messier. 505 506The test suite included with Python's library, ``test_generators.py``, contains 507a number of more interesting examples. Here's one generator that implements an 508in-order traversal of a tree using generators recursively. :: 509 510 # A recursive generator that generates Tree leaves in in-order. 511 def inorder(t): 512 if t: 513 for x in inorder(t.left): 514 yield x 515 516 yield t.label 517 518 for x in inorder(t.right): 519 yield x 520 521Two other examples in ``test_generators.py`` produce solutions for the N-Queens 522problem (placing N queens on an NxN chess board so that no queen threatens 523another) and the Knight's Tour (finding a route that takes a knight to every 524square of an NxN chessboard without visiting any square twice). 525 526 527 528Passing values into a generator 529------------------------------- 530 531In Python 2.4 and earlier, generators only produced output. Once a generator's 532code was invoked to create an iterator, there was no way to pass any new 533information into the function when its execution is resumed. You could hack 534together this ability by making the generator look at a global variable or by 535passing in some mutable object that callers then modify, but these approaches 536are messy. 537 538In Python 2.5 there's a simple way to pass values into a generator. 539:keyword:`yield` became an expression, returning a value that can be assigned to 540a variable or otherwise operated on:: 541 542 val = (yield i) 543 544I recommend that you **always** put parentheses around a ``yield`` expression 545when you're doing something with the returned value, as in the above example. 546The parentheses aren't always necessary, but it's easier to always add them 547instead of having to remember when they're needed. 548 549(PEP 342 explains the exact rules, which are that a ``yield``-expression must 550always be parenthesized except when it occurs at the top-level expression on the 551right-hand side of an assignment. This means you can write ``val = yield i`` 552but have to use parentheses when there's an operation, as in ``val = (yield i) 553+ 12``.) 554 555Values are sent into a generator by calling its ``send(value)`` method. This 556method resumes the generator's code and the ``yield`` expression returns the 557specified value. If the regular ``next()`` method is called, the ``yield`` 558returns ``None``. 559 560Here's a simple counter that increments by 1 and allows changing the value of 561the internal counter. 562 563.. testcode:: 564 565 def counter (maximum): 566 i = 0 567 while i < maximum: 568 val = (yield i) 569 # If value provided, change counter 570 if val is not None: 571 i = val 572 else: 573 i += 1 574 575And here's an example of changing the counter: 576 577 >>> it = counter(10) 578 >>> print it.next() 579 0 580 >>> print it.next() 581 1 582 >>> print it.send(8) 583 8 584 >>> print it.next() 585 9 586 >>> print it.next() 587 Traceback (most recent call last): 588 File "t.py", line 15, in ? 589 print it.next() 590 StopIteration 591 592Because ``yield`` will often be returning ``None``, you should always check for 593this case. Don't just use its value in expressions unless you're sure that the 594``send()`` method will be the only method used resume your generator function. 595 596In addition to ``send()``, there are two other new methods on generators: 597 598* ``throw(type, value=None, traceback=None)`` is used to raise an exception 599 inside the generator; the exception is raised by the ``yield`` expression 600 where the generator's execution is paused. 601 602* ``close()`` raises a :exc:`GeneratorExit` exception inside the generator to 603 terminate the iteration. On receiving this exception, the generator's code 604 must either raise :exc:`GeneratorExit` or :exc:`StopIteration`; catching the 605 exception and doing anything else is illegal and will trigger a 606 :exc:`RuntimeError`. ``close()`` will also be called by Python's garbage 607 collector when the generator is garbage-collected. 608 609 If you need to run cleanup code when a :exc:`GeneratorExit` occurs, I suggest 610 using a ``try: ... finally:`` suite instead of catching :exc:`GeneratorExit`. 611 612The cumulative effect of these changes is to turn generators from one-way 613producers of information into both producers and consumers. 614 615Generators also become **coroutines**, a more generalized form of subroutines. 616Subroutines are entered at one point and exited at another point (the top of the 617function, and a ``return`` statement), but coroutines can be entered, exited, 618and resumed at many different points (the ``yield`` statements). 619 620 621Built-in functions 622================== 623 624Let's look in more detail at built-in functions often used with iterators. 625 626Two of Python's built-in functions, :func:`map` and :func:`filter`, are somewhat 627obsolete; they duplicate the features of list comprehensions but return actual 628lists instead of iterators. 629 630``map(f, iterA, iterB, ...)`` returns a list containing ``f(iterA[0], iterB[0]), 631f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``. 632 633 >>> def upper(s): 634 ... return s.upper() 635 636 >>> map(upper, ['sentence', 'fragment']) 637 ['SENTENCE', 'FRAGMENT'] 638 639 >>> [upper(s) for s in ['sentence', 'fragment']] 640 ['SENTENCE', 'FRAGMENT'] 641 642As shown above, you can achieve the same effect with a list comprehension. The 643:func:`itertools.imap` function does the same thing but can handle infinite 644iterators; it'll be discussed later, in the section on the :mod:`itertools` module. 645 646``filter(predicate, iter)`` returns a list that contains all the sequence 647elements that meet a certain condition, and is similarly duplicated by list 648comprehensions. A **predicate** is a function that returns the truth value of 649some condition; for use with :func:`filter`, the predicate must take a single 650value. 651 652 >>> def is_even(x): 653 ... return (x % 2) == 0 654 655 >>> filter(is_even, range(10)) 656 [0, 2, 4, 6, 8] 657 658This can also be written as a list comprehension: 659 660 >>> [x for x in range(10) if is_even(x)] 661 [0, 2, 4, 6, 8] 662 663:func:`filter` also has a counterpart in the :mod:`itertools` module, 664:func:`itertools.ifilter`, that returns an iterator and can therefore handle 665infinite sequences just as :func:`itertools.imap` can. 666 667``reduce(func, iter, [initial_value])`` doesn't have a counterpart in the 668:mod:`itertools` module because it cumulatively performs an operation on all the 669iterable's elements and therefore can't be applied to infinite iterables. 670``func`` must be a function that takes two elements and returns a single value. 671:func:`reduce` takes the first two elements A and B returned by the iterator and 672calculates ``func(A, B)``. It then requests the third element, C, calculates 673``func(func(A, B), C)``, combines this result with the fourth element returned, 674and continues until the iterable is exhausted. If the iterable returns no 675values at all, a :exc:`TypeError` exception is raised. If the initial value is 676supplied, it's used as a starting point and ``func(initial_value, A)`` is the 677first calculation. 678 679 >>> import operator 680 >>> reduce(operator.concat, ['A', 'BB', 'C']) 681 'ABBC' 682 >>> reduce(operator.concat, []) 683 Traceback (most recent call last): 684 ... 685 TypeError: reduce() of empty sequence with no initial value 686 >>> reduce(operator.mul, [1,2,3], 1) 687 6 688 >>> reduce(operator.mul, [], 1) 689 1 690 691If you use :func:`operator.add` with :func:`reduce`, you'll add up all the 692elements of the iterable. This case is so common that there's a special 693built-in called :func:`sum` to compute it: 694 695 >>> reduce(operator.add, [1,2,3,4], 0) 696 10 697 >>> sum([1,2,3,4]) 698 10 699 >>> sum([]) 700 0 701 702For many uses of :func:`reduce`, though, it can be clearer to just write the 703obvious :keyword:`for` loop:: 704 705 # Instead of: 706 product = reduce(operator.mul, [1,2,3], 1) 707 708 # You can write: 709 product = 1 710 for i in [1,2,3]: 711 product *= i 712 713 714``enumerate(iter)`` counts off the elements in the iterable, returning 2-tuples 715containing the count and each element. 716 717 >>> for item in enumerate(['subject', 'verb', 'object']): 718 ... print item 719 (0, 'subject') 720 (1, 'verb') 721 (2, 'object') 722 723:func:`enumerate` is often used when looping through a list and recording the 724indexes at which certain conditions are met:: 725 726 f = open('data.txt', 'r') 727 for i, line in enumerate(f): 728 if line.strip() == '': 729 print 'Blank line at line #%i' % i 730 731``sorted(iterable, [cmp=None], [key=None], [reverse=False])`` collects all the 732elements of the iterable into a list, sorts the list, and returns the sorted 733result. The ``cmp``, ``key``, and ``reverse`` arguments are passed through to 734the constructed list's ``.sort()`` method. :: 735 736 >>> import random 737 >>> # Generate 8 random numbers between [0, 10000) 738 >>> rand_list = random.sample(range(10000), 8) 739 >>> rand_list 740 [769, 7953, 9828, 6431, 8442, 9878, 6213, 2207] 741 >>> sorted(rand_list) 742 [769, 2207, 6213, 6431, 7953, 8442, 9828, 9878] 743 >>> sorted(rand_list, reverse=True) 744 [9878, 9828, 8442, 7953, 6431, 6213, 2207, 769] 745 746(For a more detailed discussion of sorting, see the Sorting mini-HOWTO in the 747Python wiki at http://wiki.python.org/moin/HowTo/Sorting.) 748 749The ``any(iter)`` and ``all(iter)`` built-ins look at the truth values of an 750iterable's contents. :func:`any` returns True if any element in the iterable is 751a true value, and :func:`all` returns True if all of the elements are true 752values: 753 754 >>> any([0,1,0]) 755 True 756 >>> any([0,0,0]) 757 False 758 >>> any([1,1,1]) 759 True 760 >>> all([0,1,0]) 761 False 762 >>> all([0,0,0]) 763 False 764 >>> all([1,1,1]) 765 True 766 767 768Small functions and the lambda expression 769========================================= 770 771When writing functional-style programs, you'll often need little functions that 772act as predicates or that combine elements in some way. 773 774If there's a Python built-in or a module function that's suitable, you don't 775need to define a new function at all:: 776 777 stripped_lines = [line.strip() for line in lines] 778 existing_files = filter(os.path.exists, file_list) 779 780If the function you need doesn't exist, you need to write it. One way to write 781small functions is to use the ``lambda`` statement. ``lambda`` takes a number 782of parameters and an expression combining these parameters, and creates a small 783function that returns the value of the expression:: 784 785 lowercase = lambda x: x.lower() 786 787 print_assign = lambda name, value: name + '=' + str(value) 788 789 adder = lambda x, y: x+y 790 791An alternative is to just use the ``def`` statement and define a function in the 792usual way:: 793 794 def lowercase(x): 795 return x.lower() 796 797 def print_assign(name, value): 798 return name + '=' + str(value) 799 800 def adder(x,y): 801 return x + y 802 803Which alternative is preferable? That's a style question; my usual course is to 804avoid using ``lambda``. 805 806One reason for my preference is that ``lambda`` is quite limited in the 807functions it can define. The result has to be computable as a single 808expression, which means you can't have multiway ``if... elif... else`` 809comparisons or ``try... except`` statements. If you try to do too much in a 810``lambda`` statement, you'll end up with an overly complicated expression that's 811hard to read. Quick, what's the following code doing? 812 813:: 814 815 total = reduce(lambda a, b: (0, a[1] + b[1]), items)[1] 816 817You can figure it out, but it takes time to disentangle the expression to figure 818out what's going on. Using a short nested ``def`` statements makes things a 819little bit better:: 820 821 def combine (a, b): 822 return 0, a[1] + b[1] 823 824 total = reduce(combine, items)[1] 825 826But it would be best of all if I had simply used a ``for`` loop:: 827 828 total = 0 829 for a, b in items: 830 total += b 831 832Or the :func:`sum` built-in and a generator expression:: 833 834 total = sum(b for a,b in items) 835 836Many uses of :func:`reduce` are clearer when written as ``for`` loops. 837 838Fredrik Lundh once suggested the following set of rules for refactoring uses of 839``lambda``: 840 8411) Write a lambda function. 8422) Write a comment explaining what the heck that lambda does. 8433) Study the comment for a while, and think of a name that captures the essence 844 of the comment. 8454) Convert the lambda to a def statement, using that name. 8465) Remove the comment. 847 848I really like these rules, but you're free to disagree 849about whether this lambda-free style is better. 850 851 852The itertools module 853==================== 854 855The :mod:`itertools` module contains a number of commonly-used iterators as well 856as functions for combining several iterators. This section will introduce the 857module's contents by showing small examples. 858 859The module's functions fall into a few broad classes: 860 861* Functions that create a new iterator based on an existing iterator. 862* Functions for treating an iterator's elements as function arguments. 863* Functions for selecting portions of an iterator's output. 864* A function for grouping an iterator's output. 865 866Creating new iterators 867---------------------- 868 869``itertools.count(n)`` returns an infinite stream of integers, increasing by 1 870each time. You can optionally supply the starting number, which defaults to 0:: 871 872 itertools.count() => 873 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... 874 itertools.count(10) => 875 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... 876 877``itertools.cycle(iter)`` saves a copy of the contents of a provided iterable 878and returns a new iterator that returns its elements from first to last. The 879new iterator will repeat these elements infinitely. :: 880 881 itertools.cycle([1,2,3,4,5]) => 882 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ... 883 884``itertools.repeat(elem, [n])`` returns the provided element ``n`` times, or 885returns the element endlessly if ``n`` is not provided. :: 886 887 itertools.repeat('abc') => 888 abc, abc, abc, abc, abc, abc, abc, abc, abc, abc, ... 889 itertools.repeat('abc', 5) => 890 abc, abc, abc, abc, abc 891 892``itertools.chain(iterA, iterB, ...)`` takes an arbitrary number of iterables as 893input, and returns all the elements of the first iterator, then all the elements 894of the second, and so on, until all of the iterables have been exhausted. :: 895 896 itertools.chain(['a', 'b', 'c'], (1, 2, 3)) => 897 a, b, c, 1, 2, 3 898 899``itertools.izip(iterA, iterB, ...)`` takes one element from each iterable and 900returns them in a tuple:: 901 902 itertools.izip(['a', 'b', 'c'], (1, 2, 3)) => 903 ('a', 1), ('b', 2), ('c', 3) 904 905It's similar to the built-in :func:`zip` function, but doesn't construct an 906in-memory list and exhaust all the input iterators before returning; instead 907tuples are constructed and returned only if they're requested. (The technical 908term for this behaviour is `lazy evaluation 909<http://en.wikipedia.org/wiki/Lazy_evaluation>`__.) 910 911This iterator is intended to be used with iterables that are all of the same 912length. If the iterables are of different lengths, the resulting stream will be 913the same length as the shortest iterable. :: 914 915 itertools.izip(['a', 'b'], (1, 2, 3)) => 916 ('a', 1), ('b', 2) 917 918You should avoid doing this, though, because an element may be taken from the 919longer iterators and discarded. This means you can't go on to use the iterators 920further because you risk skipping a discarded element. 921 922``itertools.islice(iter, [start], stop, [step])`` returns a stream that's a 923slice of the iterator. With a single ``stop`` argument, it will return the 924first ``stop`` elements. If you supply a starting index, you'll get 925``stop-start`` elements, and if you supply a value for ``step``, elements will 926be skipped accordingly. Unlike Python's string and list slicing, you can't use 927negative values for ``start``, ``stop``, or ``step``. :: 928 929 itertools.islice(range(10), 8) => 930 0, 1, 2, 3, 4, 5, 6, 7 931 itertools.islice(range(10), 2, 8) => 932 2, 3, 4, 5, 6, 7 933 itertools.islice(range(10), 2, 8, 2) => 934 2, 4, 6 935 936``itertools.tee(iter, [n])`` replicates an iterator; it returns ``n`` 937independent iterators that will all return the contents of the source iterator. 938If you don't supply a value for ``n``, the default is 2. Replicating iterators 939requires saving some of the contents of the source iterator, so this can consume 940significant memory if the iterator is large and one of the new iterators is 941consumed more than the others. :: 942 943 itertools.tee( itertools.count() ) => 944 iterA, iterB 945 946 where iterA -> 947 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... 948 949 and iterB -> 950 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... 951 952 953Calling functions on elements 954----------------------------- 955 956Two functions are used for calling other functions on the contents of an 957iterable. 958 959``itertools.imap(f, iterA, iterB, ...)`` returns a stream containing 960``f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``:: 961 962 itertools.imap(operator.add, [5, 6, 5], [1, 2, 3]) => 963 6, 8, 8 964 965The ``operator`` module contains a set of functions corresponding to Python's 966operators. Some examples are ``operator.add(a, b)`` (adds two values), 967``operator.ne(a, b)`` (same as ``a!=b``), and ``operator.attrgetter('id')`` 968(returns a callable that fetches the ``"id"`` attribute). 969 970``itertools.starmap(func, iter)`` assumes that the iterable will return a stream 971of tuples, and calls ``f()`` using these tuples as the arguments:: 972 973 itertools.starmap(os.path.join, 974 [('/usr', 'bin', 'java'), ('/bin', 'python'), 975 ('/usr', 'bin', 'perl'),('/usr', 'bin', 'ruby')]) 976 => 977 /usr/bin/java, /bin/python, /usr/bin/perl, /usr/bin/ruby 978 979 980Selecting elements 981------------------ 982 983Another group of functions chooses a subset of an iterator's elements based on a 984predicate. 985 986``itertools.ifilter(predicate, iter)`` returns all the elements for which the 987predicate returns true:: 988 989 def is_even(x): 990 return (x % 2) == 0 991 992 itertools.ifilter(is_even, itertools.count()) => 993 0, 2, 4, 6, 8, 10, 12, 14, ... 994 995``itertools.ifilterfalse(predicate, iter)`` is the opposite, returning all 996elements for which the predicate returns false:: 997 998 itertools.ifilterfalse(is_even, itertools.count()) => 999 1, 3, 5, 7, 9, 11, 13, 15, ... 1000 1001``itertools.takewhile(predicate, iter)`` returns elements for as long as the 1002predicate returns true. Once the predicate returns false, the iterator will 1003signal the end of its results. 1004 1005:: 1006 1007 def less_than_10(x): 1008 return (x < 10) 1009 1010 itertools.takewhile(less_than_10, itertools.count()) => 1011 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 1012 1013 itertools.takewhile(is_even, itertools.count()) => 1014 0 1015 1016``itertools.dropwhile(predicate, iter)`` discards elements while the predicate 1017returns true, and then returns the rest of the iterable's results. 1018 1019:: 1020 1021 itertools.dropwhile(less_than_10, itertools.count()) => 1022 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... 1023 1024 itertools.dropwhile(is_even, itertools.count()) => 1025 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... 1026 1027 1028Grouping elements 1029----------------- 1030 1031The last function I'll discuss, ``itertools.groupby(iter, key_func=None)``, is 1032the most complicated. ``key_func(elem)`` is a function that can compute a key 1033value for each element returned by the iterable. If you don't supply a key 1034function, the key is simply each element itself. 1035 1036``groupby()`` collects all the consecutive elements from the underlying iterable 1037that have the same key value, and returns a stream of 2-tuples containing a key 1038value and an iterator for the elements with that key. 1039 1040:: 1041 1042 city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'), 1043 ('Anchorage', 'AK'), ('Nome', 'AK'), 1044 ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ'), 1045 ... 1046 ] 1047 1048 def get_state ((city, state)): 1049 return state 1050 1051 itertools.groupby(city_list, get_state) => 1052 ('AL', iterator-1), 1053 ('AK', iterator-2), 1054 ('AZ', iterator-3), ... 1055 1056 where 1057 iterator-1 => 1058 ('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL') 1059 iterator-2 => 1060 ('Anchorage', 'AK'), ('Nome', 'AK') 1061 iterator-3 => 1062 ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ') 1063 1064``groupby()`` assumes that the underlying iterable's contents will already be 1065sorted based on the key. Note that the returned iterators also use the 1066underlying iterable, so you have to consume the results of iterator-1 before 1067requesting iterator-2 and its corresponding key. 1068 1069 1070The functools module 1071==================== 1072 1073The :mod:`functools` module in Python 2.5 contains some higher-order functions. 1074A **higher-order function** takes one or more functions as input and returns a 1075new function. The most useful tool in this module is the 1076:func:`functools.partial` function. 1077 1078For programs written in a functional style, you'll sometimes want to construct 1079variants of existing functions that have some of the parameters filled in. 1080Consider a Python function ``f(a, b, c)``; you may wish to create a new function 1081``g(b, c)`` that's equivalent to ``f(1, b, c)``; you're filling in a value for 1082one of ``f()``'s parameters. This is called "partial function application". 1083 1084The constructor for ``partial`` takes the arguments ``(function, arg1, arg2, 1085... kwarg1=value1, kwarg2=value2)``. The resulting object is callable, so you 1086can just call it to invoke ``function`` with the filled-in arguments. 1087 1088Here's a small but realistic example:: 1089 1090 import functools 1091 1092 def log (message, subsystem): 1093 "Write the contents of 'message' to the specified subsystem." 1094 print '%s: %s' % (subsystem, message) 1095 ... 1096 1097 server_log = functools.partial(log, subsystem='server') 1098 server_log('Unable to open socket') 1099 1100 1101The operator module 1102------------------- 1103 1104The :mod:`operator` module was mentioned earlier. It contains a set of 1105functions corresponding to Python's operators. These functions are often useful 1106in functional-style code because they save you from writing trivial functions 1107that perform a single operation. 1108 1109Some of the functions in this module are: 1110 1111* Math operations: ``add()``, ``sub()``, ``mul()``, ``div()``, ``floordiv()``, 1112 ``abs()``, ... 1113* Logical operations: ``not_()``, ``truth()``. 1114* Bitwise operations: ``and_()``, ``or_()``, ``invert()``. 1115* Comparisons: ``eq()``, ``ne()``, ``lt()``, ``le()``, ``gt()``, and ``ge()``. 1116* Object identity: ``is_()``, ``is_not()``. 1117 1118Consult the operator module's documentation for a complete list. 1119 1120 1121 1122The functional module 1123--------------------- 1124 1125Collin Winter's `functional module <http://oakwinter.com/code/functional/>`__ 1126provides a number of more advanced tools for functional programming. It also 1127reimplements several Python built-ins, trying to make them more intuitive to 1128those used to functional programming in other languages. 1129 1130This section contains an introduction to some of the most important functions in 1131``functional``; full documentation can be found at `the project's website 1132<http://oakwinter.com/code/functional/documentation/>`__. 1133 1134``compose(outer, inner, unpack=False)`` 1135 1136The ``compose()`` function implements function composition. In other words, it 1137returns a wrapper around the ``outer`` and ``inner`` callables, such that the 1138return value from ``inner`` is fed directly to ``outer``. That is, :: 1139 1140 >>> def add(a, b): 1141 ... return a + b 1142 ... 1143 >>> def double(a): 1144 ... return 2 * a 1145 ... 1146 >>> compose(double, add)(5, 6) 1147 22 1148 1149is equivalent to :: 1150 1151 >>> double(add(5, 6)) 1152 22 1153 1154The ``unpack`` keyword is provided to work around the fact that Python functions 1155are not always `fully curried <http://en.wikipedia.org/wiki/Currying>`__. By 1156default, it is expected that the ``inner`` function will return a single object 1157and that the ``outer`` function will take a single argument. Setting the 1158``unpack`` argument causes ``compose`` to expect a tuple from ``inner`` which 1159will be expanded before being passed to ``outer``. Put simply, :: 1160 1161 compose(f, g)(5, 6) 1162 1163is equivalent to:: 1164 1165 f(g(5, 6)) 1166 1167while :: 1168 1169 compose(f, g, unpack=True)(5, 6) 1170 1171is equivalent to:: 1172 1173 f(*g(5, 6)) 1174 1175Even though ``compose()`` only accepts two functions, it's trivial to build up a 1176version that will compose any number of functions. We'll use ``reduce()``, 1177``compose()`` and ``partial()`` (the last of which is provided by both 1178``functional`` and ``functools``). :: 1179 1180 from functional import compose, partial 1181 1182 multi_compose = partial(reduce, compose) 1183 1184 1185We can also use ``map()``, ``compose()`` and ``partial()`` to craft a version of 1186``"".join(...)`` that converts its arguments to string:: 1187 1188 from functional import compose, partial 1189 1190 join = compose("".join, partial(map, str)) 1191 1192 1193``flip(func)`` 1194 1195``flip()`` wraps the callable in ``func`` and causes it to receive its 1196non-keyword arguments in reverse order. :: 1197 1198 >>> def triple(a, b, c): 1199 ... return (a, b, c) 1200 ... 1201 >>> triple(5, 6, 7) 1202 (5, 6, 7) 1203 >>> 1204 >>> flipped_triple = flip(triple) 1205 >>> flipped_triple(5, 6, 7) 1206 (7, 6, 5) 1207 1208``foldl(func, start, iterable)`` 1209 1210``foldl()`` takes a binary function, a starting value (usually some kind of 1211'zero'), and an iterable. The function is applied to the starting value and the 1212first element of the list, then the result of that and the second element of the 1213list, then the result of that and the third element of the list, and so on. 1214 1215This means that a call such as:: 1216 1217 foldl(f, 0, [1, 2, 3]) 1218 1219is equivalent to:: 1220 1221 f(f(f(0, 1), 2), 3) 1222 1223 1224``foldl()`` is roughly equivalent to the following recursive function:: 1225 1226 def foldl(func, start, seq): 1227 if len(seq) == 0: 1228 return start 1229 1230 return foldl(func, func(start, seq[0]), seq[1:]) 1231 1232Speaking of equivalence, the above ``foldl`` call can be expressed in terms of 1233the built-in ``reduce`` like so:: 1234 1235 reduce(f, [1, 2, 3], 0) 1236 1237 1238We can use ``foldl()``, ``operator.concat()`` and ``partial()`` to write a 1239cleaner, more aesthetically-pleasing version of Python's ``"".join(...)`` 1240idiom:: 1241 1242 from functional import foldl, partial from operator import concat 1243 1244 join = partial(foldl, concat, "") 1245 1246 1247Revision History and Acknowledgements 1248===================================== 1249 1250The author would like to thank the following people for offering suggestions, 1251corrections and assistance with various drafts of this article: Ian Bicking, 1252Nick Coghlan, Nick Efford, Raymond Hettinger, Jim Jewett, Mike Krell, Leandro 1253Lameiro, Jussi Salmela, Collin Winter, Blake Winton. 1254 1255Version 0.1: posted June 30 2006. 1256 1257Version 0.11: posted July 1 2006. Typo fixes. 1258 1259Version 0.2: posted July 10 2006. Merged genexp and listcomp sections into one. 1260Typo fixes. 1261 1262Version 0.21: Added more references suggested on the tutor mailing list. 1263 1264Version 0.30: Adds a section on the ``functional`` module written by Collin 1265Winter; adds short section on the operator module; a few other edits. 1266 1267 1268References 1269========== 1270 1271General 1272------- 1273 1274**Structure and Interpretation of Computer Programs**, by Harold Abelson and 1275Gerald Jay Sussman with Julie Sussman. Full text at 1276http://mitpress.mit.edu/sicp/. In this classic textbook of computer science, 1277chapters 2 and 3 discuss the use of sequences and streams to organize the data 1278flow inside a program. The book uses Scheme for its examples, but many of the 1279design approaches described in these chapters are applicable to functional-style 1280Python code. 1281 1282http://www.defmacro.org/ramblings/fp.html: A general introduction to functional 1283programming that uses Java examples and has a lengthy historical introduction. 1284 1285http://en.wikipedia.org/wiki/Functional_programming: General Wikipedia entry 1286describing functional programming. 1287 1288http://en.wikipedia.org/wiki/Coroutine: Entry for coroutines. 1289 1290http://en.wikipedia.org/wiki/Currying: Entry for the concept of currying. 1291 1292Python-specific 1293--------------- 1294 1295http://gnosis.cx/TPiP/: The first chapter of David Mertz's book 1296:title-reference:`Text Processing in Python` discusses functional programming 1297for text processing, in the section titled "Utilizing Higher-Order Functions in 1298Text Processing". 1299 1300Mertz also wrote a 3-part series of articles on functional programming 1301for IBM's DeveloperWorks site; see 1302`part 1 <http://www-128.ibm.com/developerworks/library/l-prog.html>`__, 1303`part 2 <http://www-128.ibm.com/developerworks/library/l-prog2.html>`__, and 1304`part 3 <http://www-128.ibm.com/developerworks/linux/library/l-prog3.html>`__, 1305 1306 1307Python documentation 1308-------------------- 1309 1310Documentation for the :mod:`itertools` module. 1311 1312Documentation for the :mod:`operator` module. 1313 1314:pep:`289`: "Generator Expressions" 1315 1316:pep:`342`: "Coroutines via Enhanced Generators" describes the new generator 1317features in Python 2.5. 1318 1319.. comment 1320 1321 Topics to place 1322 ----------------------------- 1323 1324 XXX os.walk() 1325 1326 XXX Need a large example. 1327 1328 But will an example add much? I'll post a first draft and see 1329 what the comments say. 1330 1331.. comment 1332 1333 Original outline: 1334 Introduction 1335 Idea of FP 1336 Programs built out of functions 1337 Functions are strictly input-output, no internal state 1338 Opposed to OO programming, where objects have state 1339 1340 Why FP? 1341 Formal provability 1342 Assignment is difficult to reason about 1343 Not very relevant to Python 1344 Modularity 1345 Small functions that do one thing 1346 Debuggability: 1347 Easy to test due to lack of state 1348 Easy to verify output from intermediate steps 1349 Composability 1350 You assemble a toolbox of functions that can be mixed 1351 1352 Tackling a problem 1353 Need a significant example 1354 1355 Iterators 1356 Generators 1357 The itertools module 1358 List comprehensions 1359 Small functions and the lambda statement 1360 Built-in functions 1361 map 1362 filter 1363 reduce 1364 1365.. comment 1366 1367 Handy little function for printing part of an iterator -- used 1368 while writing this document. 1369 1370 import itertools 1371 def print_iter(it): 1372 slice = itertools.islice(it, 10) 1373 for elem in slice[:-1]: 1374 sys.stdout.write(str(elem)) 1375 sys.stdout.write(', ') 1376 print elem[-1] 1377 1378