.. include:: ../../global.inc
.. _Simple_Tutorial_6th_step:
.. _tutorial.transform:

.. index::
    pair: @transform; Tutorial

###################################################################
Step 6: Running jobs in parallel
###################################################################
* :ref:`Simple tutorial overview <Simple_Tutorial>`
* :ref:`@transform in detail <decorators.transform>`

.. note::
    Remember to look at the example code:

    * :ref:`Python Code for step 6 <Simple_Tutorial_6th_step_code>`

**************************************************************************************
Calculating sums and sum of squares in parallel
**************************************************************************************
    Now that we have many smaller lists of numbers in separate files, we can calculate their sums and
    sums of squares in parallel.

    All we need is a function that takes a ``*.chunks`` file, reads the numbers, calculates
    the answers and writes them out to a corresponding ``*.sums`` file.
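
    A minimal sketch of such a task function is shown below, with the ``@transform`` decorator left off so that it runs on its own. It assumes that each chunk file holds one number per line and that the result is written as two lines (the sum, then the sum of squares); both file formats are assumptions for illustration, not necessarily those used by the tutorial's own code.

```python
import os
import tempfile


def step_6_calculate_sum_of_squares(input_file_name, output_file_name):
    """Sum and sum of squares of all numbers in one chunk file.

    Assumed formats (illustrative only): one number per line in the
    input; two lines in the output, the sum followed by the sum of squares.
    """
    total = 0.0
    total_of_squares = 0.0
    with open(input_file_name) as infile:
        for line in infile:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            value = float(line)
            total += value
            total_of_squares += value * value
    with open(output_file_name, "w") as outfile:
        outfile.write("%s\n%s\n" % (total, total_of_squares))


# Usage: one chunk file in, one .sums file out.
workdir = tempfile.mkdtemp()
chunk = os.path.join(workdir, "1.chunks")
with open(chunk, "w") as f:
    f.write("1\n2\n3\n")
step_6_calculate_sum_of_squares(chunk, os.path.join(workdir, "1.sums"))
with open(os.path.join(workdir, "1.sums")) as f:
    print(f.read())  # prints 6.0 and 14.0 on separate lines
```

    Each call reads exactly one input file and writes exactly one output file, which is what allows the calls for different chunks to run independently of each other.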

    *Ruffus* then takes care of applying this task function to each of the data files,
    running the resulting jobs in parallel.
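
    The idea of one independent job per data file can be illustrated without *Ruffus* by using the standard library's ``concurrent.futures``. This is an analogy only, not how *Ruffus* schedules its jobs, and the chunk contents below are invented. A thread pool keeps the sketch portable; for CPU-bound work ``ProcessPoolExecutor`` can be swapped in (with the usual ``if __name__ == "__main__":`` guard). In *Ruffus* itself, the number of jobs run at once is chosen when the pipeline is started, for example with the ``multiprocess`` argument of ``pipeline_run``.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_and_sum_of_squares(numbers):
    """The work done by a single job: one chunk of numbers."""
    return sum(numbers), sum(x * x for x in numbers)


# Hypothetical chunk contents, keyed by the chunk file each would come from.
chunks = {
    "1.chunks": [1.0, 2.0, 3.0],
    "2.chunks": [4.0, 5.0],
    "3.chunks": [6.0],
}

# Submit one independent job per chunk; the pool runs them concurrently.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {name: pool.submit(sum_and_sum_of_squares, numbers)
               for name, numbers in chunks.items()}
    results = {name: future.result() for name, future in futures.items()}

for name in sorted(results):
    print(name, results[name])
```

    Because no job reads another job's output, the order in which the pool finishes them does not affect the results.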

        .. image:: ../../images/simple_tutorial_transform.png

    .. ::
        ::

            #---------------------------------------------------------------
            #
            #   Calculate sum and sum of squares for each chunk file
            #
            @transform(step_5_split_numbers_into_chunks, suffix(".chunks"), ".sums")
            def step_6_calculate_sum_of_squares (input_file_name, output_file_name):
                #
                #   calculate sums and sums of squares for all values in the input_file_name
                #       writing to output_file_name
                ""

    | The first thing to note about this example is that the *input* files are not specified
      as a |glob|_ (e.g. ``*.chunks``) but by naming the preceding task.
    | *Ruffus* will take all the files produced by ``step_5_split_numbers_into_chunks()``
      and feed them as the *input* into step 6.

    This handy shortcut also means that *Ruffus* knows that ``step_6_calculate_sum_of_squares``
    depends on ``step_5_split_numbers_into_chunks``, so an additional ``@follows`` directive
    is unnecessary.

    The use of :ref:`suffix<decorators.transform.suffix_string>` within the decorator tells
    *Ruffus* to take all *input* files with the ``.chunks`` suffix and substitute a ``.sums``
    suffix to generate the corresponding *output* file name.

    Thus, if ``step_5_split_numbers_into_chunks`` created::

        "1.chunks"
        "2.chunks"
        "3.chunks"

    this would result in the following function calls::

        step_6_calculate_sum_of_squares ("1.chunks", "1.sums")
        step_6_calculate_sum_of_squares ("2.chunks", "2.sums")
        step_6_calculate_sum_of_squares ("3.chunks", "3.sums")

        # etc...
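
    The renaming rule can be imitated in a few lines of plain Python to make it concrete. The ``suffix_output_name`` helper below is hypothetical and not part of *Ruffus*; it only reproduces the ``.chunks`` to ``.sums`` substitution described here, and skipping names that lack the suffix is a choice made for this sketch.

```python
def suffix_output_name(input_name, old_suffix=".chunks", new_suffix=".sums"):
    """Swap a trailing suffix for a new one; return None if it is absent."""
    if not input_name.endswith(old_suffix):
        return None
    return input_name[: -len(old_suffix)] + new_suffix


inputs = ["1.chunks", "2.chunks", "3.chunks", "notes.txt"]

# Build one (input, output) pair per matching file: one job per pair.
jobs = []
for name in inputs:
    output_name = suffix_output_name(name)
    if output_name is not None:  # unmatched files produce no job here
        jobs.append((name, output_name))

for input_name, output_name in jobs:
    print("step_6_calculate_sum_of_squares(%r, %r)" % (input_name, output_name))
```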

    .. note::

        It is possible to generate *output* filenames using more powerful regular expressions
        as well. See the :ref:`@transform <decorators.transform>` syntax documentation for more details.

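
    To give a feel for what a regular expression adds over a plain suffix swap, the sketch below uses Python's standard ``re`` module to place each output in a separate ``results/`` directory, something ``suffix`` alone cannot express. The pattern, the template and the ``regex_output_name`` helper are hypothetical illustrations, not *Ruffus* syntax; the exact form *Ruffus* accepts is described in the ``@transform`` documentation.

```python
import re

# Hypothetical rule: capture the base name of any "*.chunks" file
# (dropping its directory) and write the ".sums" output under "results/".
CHUNK_PATTERN = re.compile(r"^(?:.*/)?([^/]+)\.chunks$")
OUTPUT_TEMPLATE = r"results/\1.sums"


def regex_output_name(input_name):
    """Return the output file name for a matching input, or None."""
    match = CHUNK_PATTERN.match(input_name)
    if match is None:
        return None
    return match.expand(OUTPUT_TEMPLATE)


print(regex_output_name("chunks/1.chunks"))  # results/1.sums
print(regex_output_name("2.chunks"))         # results/2.sums
print(regex_output_name("notes.txt"))        # None
```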