.. include:: ../../global.inc
.. _Simple_Tutorial_7th_step:

.. index::
    pair: @merge; Tutorial


###################################################################
Step 7: Merging results back together
###################################################################
* :ref:`Simple tutorial overview <Simple_Tutorial>`
* :ref:`@merge in detail <decorators.merge>`

.. note::
    Remember to look at the example code:

    * :ref:`Python Code for step 7 <Simple_Tutorial_7th_step_code>`


Now that we have all the partial solutions in ``*.sums``, we can merge them
together to generate the final answer: the variance of all 100,000 random
numbers.

**************************************************************************************
Calculating the variance from the sums and sums of squares of all chunks
**************************************************************************************

    If we add up all the sums and sums of squares we calculated previously, we can
    obtain the variance as follows::

        variance = (sum_squared - sum * sum / N) / N

    where ``N`` is the total number of values. (Dividing by ``N`` gives the population variance.)

    See the `Wikipedia <http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance>`_ entry for a discussion of
    why this is a very naive approach: when the values are large compared with their spread, subtracting
    two nearly equal numbers loses most of the floating-point precision.
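
    As a quick check of the formula above, here is a minimal sketch in plain Python (no *Ruffus*
    involved). The helper name ``naive_variance`` is hypothetical; it assumes that ``total``,
    ``total_squared`` and ``n`` are the sum, the sum of squares and the number of values, and it
    compares the result with ``statistics.pvariance`` from the standard library, which computes the
    same population variance in a numerically careful way::

        import statistics

        def naive_variance(total, total_squared, n):
            """Population variance from a sum, a sum of squares and a count (hypothetical helper).

            This is the same naive formula as above, so it can lose precision when the
            values are large compared with their spread.
            """
            return (total_squared - total * total / n) / n

        values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
        total = sum(values)
        total_squared = sum(v * v for v in values)
        print(naive_variance(total, total_squared, len(values)))   # 4.0
        print(statistics.pvariance(values))                         # 4.0

    For small, well-behaved numbers like these the two agree exactly; the difference only shows up
    for data where the cancellation described above becomes significant.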

    To do this, all we have to do is merge the values in ``*.sums``, i.e. add up the
    ``sum`` and ``sum_squared`` values from every chunk, and then apply the (naive) formula above to the totals.
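
    The merge step can be sketched in memory before touching any files. In this minimal sketch each
    chunk is represented by a hypothetical ``(sum, sum_squared, count)`` tuple; merging simply adds
    the three components across chunks, and the naive formula is then applied once to the totals.
    Because sums are additive, the answer matches the variance computed from all values at once
    (up to floating-point rounding)::

        import statistics

        def chunk_partials(chunk):
            """Return (sum, sum_squared, count) for one chunk of values."""
            return sum(chunk), sum(v * v for v in chunk), len(chunk)

        def merge_partials(partials):
            """Add up the per-chunk partial results: this is the "merge" step."""
            total = total_squared = 0.0
            count = 0
            for chunk_sum, chunk_sum_squared, chunk_count in partials:
                total += chunk_sum
                total_squared += chunk_sum_squared
                count += chunk_count
            return total, total_squared, count

        values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
        chunks = [values[0:3], values[3:6], values[6:8]]
        total, total_squared, count = merge_partials(chunk_partials(c) for c in chunks)
        variance = (total_squared - total * total / count) / count
        print(variance)                       # 4.0
        print(statistics.pvariance(values))   # 4.0

    Only three numbers per chunk need to be kept, however many values each chunk held, which is
    what makes splitting the work into independent chunks worthwhile.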

    Merging files is straightforward in **Ruffus**:

        .. image:: ../../images/simple_tutorial_merge1.png

    .. ::

        ::

            @merge(step_6_calculate_sum_of_squares, "variance.result")
            def step_7_calculate_variance (input_file_names, output_file_name):
                #
                #   add together sums and sums of squares from each input_file_name
                #       calculate variance and write to output_file_name
                ""


    The :ref:`@merge <decorators.merge>` decorator tells *Ruffus* to take all the files produced by the step 6 task (i.e. ``*.sums``)
    and pass them, as a single list, to one call of ``step_7_calculate_variance``, together with the name of the merged output file, ``"variance.result"``.

    Thus, if ``step_6_calculate_sum_of_squares`` created ``1.sums``, ``2.sums``, etc.,
    this would result in the following function call:

        .. image:: ../../images/simple_tutorial_merge2.png

    .. ::

        ::

            step_7_calculate_variance (["1.sums", "2.sums"], "variance.result")


    The final result is, of course, in ``"variance.result"``.
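
    What goes inside ``step_7_calculate_variance`` is up to you. The following is a minimal sketch,
    not the tutorial's own code: it assumes (hypothetically) that each ``.sums`` file written in
    step 6 contains three lines holding the sum of squares, the sum and the number of values, so
    check the step 6 code for the real layout. The ``@merge`` decorator is left out so that the
    function can be run on its own; in the pipeline it would be decorated as shown above::

        import os
        import tempfile

        # In the pipeline: @merge(step_6_calculate_sum_of_squares, "variance.result")
        def step_7_calculate_variance(input_file_names, output_file_name):
            """Add up the partial results from every input file and write the variance."""
            all_sum_squared = all_sum = 0.0
            all_count = 0
            for input_file_name in input_file_names:
                # assumed layout: sum of squares, sum, count (one number per line)
                with open(input_file_name) as input_file:
                    sum_squared, total, count = input_file.read().split()
                all_sum_squared += float(sum_squared)
                all_sum += float(total)
                all_count += int(count)
            # the same naive formula as above, applied once to the merged totals
            variance = (all_sum_squared - all_sum * all_sum / all_count) / all_count
            with open(output_file_name, "w") as output_file:
                output_file.write("%s\n" % variance)

        # Usage: fake the output of step 6 in a temporary directory, then merge it
        work_dir = tempfile.mkdtemp()
        chunks = {"1.sums": (36.0, 10.0, 3), "2.sums": (66.0, 14.0, 3), "3.sums": (130.0, 16.0, 2)}
        input_file_names = []
        for name, (sum_squared, total, count) in sorted(chunks.items()):
            path = os.path.join(work_dir, name)
            with open(path, "w") as f:
                f.write("%s\n%s\n%d\n" % (sum_squared, total, count))
            input_file_names.append(path)
        step_7_calculate_variance(input_file_names, os.path.join(work_dir, "variance.result"))
        with open(os.path.join(work_dir, "variance.result")) as result_file:
            print(result_file.read().strip())   # 4.0

    Reading every input inside a single call is what ``@merge`` is for: the function sees all the
    partial results at once, so the totals, and hence the variance, are computed exactly once.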