#### /doc/tutorials/simple_tutorial/step6_transform.rst

```
.. include:: ../../global.inc
.. _Simple_Tutorial_6th_step:
.. _tutorial.transform:

.. index::
    pair: @transform; Tutorial



###################################################################
Step 6: Running jobs in parallel
###################################################################
* :ref:`Simple tutorial overview <Simple_Tutorial>`
* :ref:`@transform in detail <decorators.transform>`

.. note::
    Remember to look at the example code:

    * :ref:`Python Code for step 6 <Simple_Tutorial_6th_step_code>`

**************************************************************************************
Calculating sums and sum of squares in parallel
**************************************************************************************
    Now that we have many smaller lists of numbers in separate files, we can calculate their sums and
    sums of squares in parallel.

    All we need is a function which takes a ``*.chunks`` file, reads the numbers, calculates
    the answers and writes them back out to a corresponding ``*.sums`` file.

    *Ruffus* takes care of applying this task function to all the different
    data files in parallel.

        .. image:: ../../images/simple_tutorial_transform.png

    .. ::
        ::

            #---------------------------------------------------------------
            #
            #   Calculate sum and sum of squares for each chunk file
            #
            @transform(step_5_split_numbers_into_chunks, suffix(".chunks"), ".sums")
            def step_6_calculate_sum_of_squares (input_file_name, output_file_name):
                #
                #   calculate sums and sums of squares for all values in the input_file_name
                #       writing to output_file_name
                pass



    | The first thing to note about this example is that the *input* files are not specified
      as a |glob|_ (e.g. ``*.chunks``) but as the preceding task.
    | *Ruffus* will take all
      the files produced by ``step_5_split_numbers_into_chunks()`` and feed them as the *input*
      into step 6.

    This handy shortcut also means that *Ruffus* knows that ``step_6_calculate_sum_of_squares``
    depends on ``step_5_split_numbers_into_chunks``, so an additional ``@follows`` directive
    is unnecessary.

    The use of :ref:`suffix<decorators.transform.suffix_string>` within the decorator tells
    *Ruffus* to take all *input* files with the ``.chunks`` suffix and substitute a ``.sums``
    suffix to generate the corresponding *output* file name.


    Thus, if ``step_5_split_numbers_into_chunks`` created:

        ::

            "1.chunks"
            "2.chunks"
            "3.chunks"

    this would result in the following function calls:

        ::

            step_6_calculate_sum_of_squares ("1.chunks", "1.sums")
            step_6_calculate_sum_of_squares ("2.chunks", "2.sums")
            step_6_calculate_sum_of_squares ("3.chunks", "3.sums")

            # etc...



    .. note::

        It is also possible to generate *output* file names using more powerful regular expressions.
        See the :ref:`@transform <decorators.transform>` syntax documentation for more details.

```
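The task function in the tutorial source is shown only as a placeholder, so here is a minimal sketch of what its body might look like, together with a plain-Python imitation of the `.chunks` to `.sums` suffix substitution. It is not the tutorial's own code. It assumes each `*.chunks` file holds one number per line and that the `*.sums` file holds the sum and the sum of squares on two lines; neither format is stated in this step. The names `calculate_sum_of_squares`, `replace_suffix` and `run_demo` are hypothetical. The `@transform` decorator is left out so the sketch runs without Ruffus installed, and the loop calls the function serially where Ruffus would schedule the calls in parallel.

```python
import os
import tempfile


def calculate_sum_of_squares(input_file_name, output_file_name):
    """Read numbers from input_file_name; write their sum and sum of squares.

    Assumption: the input file holds one number per line.
    """
    with open(input_file_name) as in_file:
        values = [float(line) for line in in_file if line.strip()]
    total = sum(values)
    total_of_squares = sum(v * v for v in values)
    with open(output_file_name, "w") as out_file:
        out_file.write("%r\n%r\n" % (total, total_of_squares))


def replace_suffix(file_name, old_suffix, new_suffix):
    """Hypothetical helper: swap one file name suffix for another."""
    if not file_name.endswith(old_suffix):
        raise ValueError("%s does not end with %s" % (file_name, old_suffix))
    return file_name[: -len(old_suffix)] + new_suffix


def run_demo():
    """Create three small chunk files and process each with one call."""
    work_dir = tempfile.mkdtemp()
    chunks = {"1.chunks": [1, 2, 3], "2.chunks": [4, 5], "3.chunks": [6]}
    results = {}
    for name, numbers in sorted(chunks.items()):
        input_path = os.path.join(work_dir, name)
        with open(input_path, "w") as chunk_file:
            chunk_file.write("\n".join(str(n) for n in numbers) + "\n")
        # One output name per input name, as in the function calls listed above.
        output_path = replace_suffix(input_path, ".chunks", ".sums")
        calculate_sum_of_squares(input_path, output_path)
        with open(output_path) as sums_file:
            results[os.path.basename(output_path)] = sums_file.read().split()
    return results


if __name__ == "__main__":
    print(run_demo())
```

In a real pipeline the function would keep the two-argument signature and carry the `@transform(step_5_split_numbers_into_chunks, suffix(".chunks"), ".sums")` decorator from the tutorial, so no explicit loop or suffix helper would be needed.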