.. include:: ../../global.inc
.. _Simple_Tutorial_6th_step:
.. _tutorial.transform:

.. index::
    pair: @transform; Tutorial

###################################################################
Step 6: Running jobs in parallel
###################################################################

* :ref:`Simple tutorial overview <Simple_Tutorial>`
* :ref:`@transform in detail <decorators.transform>`

.. note::
    Remember to look at the example code:

    * :ref:`Python Code for step 6 <Simple_Tutorial_6th_step_code>`

**************************************************************************************
Calculating sums and sum of squares in parallel
**************************************************************************************

Now that we have many smaller lists of numbers in separate files, we can calculate their sums and
sums of squares in parallel.

All we need is a function which takes a ``*.chunks`` file, reads the numbers, calculates
the answers and writes them out to a corresponding ``*.sums`` file.

*Ruffus* takes care of applying this task function to each of the
data files in parallel.

.. image:: ../../images/simple_tutorial_transform.png

.. ::

    ::

        #---------------------------------------------------------------
        #
        #   Calculate sum and sum of squares for each chunk file
        #
        @transform(step_5_split_numbers_into_chunks, suffix(".chunks"), ".sums")
        def step_6_calculate_sum_of_squares (input_file_name, output_file_name):
            #
            #   calculate sums and sums of squares for all values in the input_file_name
            #       writing to output_file_name
            ""

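
The body of such a task function might look like the following minimal sketch. It assumes that each chunk file holds one number per line and that the ``.sums`` file should contain the sum on its first line and the sum of squares on its second; neither format is specified above, and the name ``calculate_sum_of_squares`` is hypothetical. The ``@transform`` decorator is left off so that the function can be tried without a pipeline.

```python
def calculate_sum_of_squares(input_file_name, output_file_name):
    """Read numbers from input_file_name; write their sum and sum of squares."""
    total = 0.0
    total_of_squares = 0.0
    with open(input_file_name) as input_file:
        for line in input_file:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            value = float(line)
            total += value
            total_of_squares += value * value
    # Assumed output format: sum on line 1, sum of squares on line 2
    with open(output_file_name, "w") as output_file:
        output_file.write("%r\n%r\n" % (total, total_of_squares))
```

Because the function only touches the two file names it is given, each call is independent of the others, which is what allows the jobs to run side by side.
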
| The first thing to note about this example is that the *input* files are not specified
  as a |glob|_ (e.g. ``*.chunks``) but as the preceding task.
| *Ruffus* will take all
  the files produced by ``step_5_split_numbers_into_chunks()`` and feed them as the *input*
  into step 6.

This handy shortcut also means that *Ruffus* knows that ``step_6_calculate_sum_of_squares``
depends on ``step_5_split_numbers_into_chunks``, so an additional ``@follows`` decorator
is unnecessary.

The use of :ref:`suffix<decorators.transform.suffix_string>` within the decorator tells
*Ruffus* to take all *input* files with the ``.chunks`` suffix and substitute a ``.sums``
suffix to generate the corresponding *output* file name.

Thus if ``step_5_split_numbers_into_chunks`` created

::

    "1.chunks"
    "2.chunks"
    "3.chunks"

this would result in the following function calls:

::

    step_6_calculate_sum_of_squares ("1.chunks", "1.sums")
    step_6_calculate_sum_of_squares ("2.chunks", "2.sums")
    step_6_calculate_sum_of_squares ("3.chunks", "3.sums")

    # etc...

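
The renaming rule can be sketched in plain Python. This illustrates only the file name mapping described above; it is not how *Ruffus* implements ``suffix``, and ``replace_suffix`` is a hypothetical helper. Returning ``None`` for names that lack the suffix is a choice made for this sketch.

```python
def replace_suffix(input_file_name, old_suffix, new_suffix):
    """Return input_file_name with old_suffix swapped for new_suffix.

    Returns None if the name does not end with old_suffix.
    """
    if not input_file_name.endswith(old_suffix):
        return None
    return input_file_name[:-len(old_suffix)] + new_suffix


input_file_names = ["1.chunks", "2.chunks", "3.chunks"]
# One (input, output) pair per job, mirroring the calls listed above
jobs = [(name, replace_suffix(name, ".chunks", ".sums"))
        for name in input_file_names]
for input_file_name, output_file_name in jobs:
    print(input_file_name, "->", output_file_name)
```
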
.. note::

    It is also possible to generate *output* file names using more powerful regular expressions.
    See the :ref:`@transform <decorators.transform>` syntax documentation for more details.

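
As a rough illustration of why regular expressions are more flexible than a fixed suffix, the sketch below uses Python's standard ``re`` module to build an *output* name from a captured part of the *input* name. The pattern, the ``results/`` directory and the helper name ``output_name_from_regex`` are assumptions made for this example; the *Ruffus* syntax itself is described in the ``@transform`` documentation.

```python
import re

# Hypothetical pattern: capture the base name in front of ".chunks"
CHUNK_PATTERN = re.compile(r"^(.+)\.chunks$")


def output_name_from_regex(input_file_name):
    """Build an output name from the captured base name, or None if no match."""
    match = CHUNK_PATTERN.match(input_file_name)
    if match is None:
        return None
    # Unlike a plain suffix swap, the captured text can be placed anywhere
    return match.expand(r"results/\1.sums")


print(output_name_from_regex("1.chunks"))
```
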