.. include:: ../../global.inc
.. _Simple_Tutorial_7th_step:

.. index::
    pair: @merge; Tutorial

###################################################################
Step 7: Merging results back together
###################################################################

* :ref:`Simple tutorial overview <Simple_Tutorial>`
* :ref:`@merge in detail <decorators.merge>`

.. note::

    Remember to look at the example code:

    * :ref:`Python Code for step 7 <Simple_Tutorial_7th_step_code>`

Now that we have all the partial solutions in ``*.sums``, we can merge them
together to generate the final answer: the variance of all 100,000 random
numbers.

**************************************************************************************
Calculating variances from the sums and sum of squares of all chunks
**************************************************************************************

If we add up all the sums and sums of squares we calculated previously, we can
obtain the variance as follows::

    variance = (sum_squared - sum * sum / N) / N

where ``N`` is the number of values.

See the `Wikipedia <http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance>`_ entry for a discussion of
why this is a very naive approach: the subtraction can lose precision when
``sum_squared`` and ``sum * sum / N`` are large and nearly equal.

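As a quick sanity check of the formula, here is a minimal sketch in plain Python. The helper name ``naive_variance`` and the sample values are made up for illustration and are not part of the tutorial code; the result is compared against ``statistics.pvariance`` from the standard library.

```python
import statistics


def naive_variance(total, total_squared, n):
    """Population variance from the sum, the sum of squares and the count.

    Hypothetical helper: it simply spells out the formula given above.
    """
    return (total_squared - total * total / n) / n


# Illustrative data only (not the tutorial's random numbers)
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

total = sum(values)                          # sum
total_squared = sum(v * v for v in values)   # sum_squared
variance = naive_variance(total, total_squared, len(values))

# The standard library computes the same quantity in a more careful way
reference = statistics.pvariance(values)
print(variance, reference)
```

For small, well-behaved inputs like these the two results agree; the naive formula only becomes a problem when the values are large relative to their spread.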
To do this, all we have to do is merge together all the values in ``*.sums``, i.e.
add up the ``sum`` and ``sum_squared`` values for each chunk. We can then apply the above (naive) formula.

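The reason the chunks can be merged this way is that sums, sums of squares and counts are all additive. The sketch below illustrates this with two small in-memory chunks; the function ``combine_chunks`` and the ``(sum_squared, sum, count)`` tuple layout are assumptions made for this example, not something defined by the tutorial.

```python
def combine_chunks(chunk_totals):
    """Add up per-chunk (sum_squared, sum, count) triples.

    Hypothetical helper; the triple layout is an assumption for this sketch.
    """
    sum_squared = sum(t[0] for t in chunk_totals)
    total = sum(t[1] for t in chunk_totals)
    count = sum(t[2] for t in chunk_totals)
    return sum_squared, total, count


# Two illustrative chunks standing in for the per-chunk partial results
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0]]
chunk_totals = [(sum(v * v for v in c), sum(c), len(c)) for c in chunks]

sum_squared, total, count = combine_chunks(chunk_totals)

# Apply the (naive) formula to the combined totals
variance = (sum_squared - total * total / count) / count
print(variance)
```

The combined totals are the same as those computed over the unsplit data, so the variance does not depend on how the values were divided into chunks.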
Merging files is straightforward in **Ruffus**:

    .. image:: ../../images/simple_tutorial_merge1.png

.. ::

    ::

        @merge(step_6_calculate_sum_of_squares, "variance.result")
        def step_7_calculate_variance (input_file_names, output_file_name):
            #
            #   add together sums and sums of squares from each input_file_name
            #   calculate variance and write to output_file_name
            #
            pass

The :ref:`@merge <decorators.merge>` decorator tells *Ruffus* to take all the files from the step 6 task (i.e. ``*.sums``),
and produce a single merged file, ``"variance.result"``.

Thus if ``step_6_calculate_sum_of_squares`` created

    | ``1.sums`` and
    | ``2.sums`` etc.

this would result in the following function call:

    .. image:: ../../images/simple_tutorial_merge2.png

.. ::

    ::

        step_7_calculate_variance (["1.sums", "2.sums"], "variance.result")

The final result is, of course, in ``"variance.result"``.

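The body of ``step_7_calculate_variance`` is only described in comments on this page. The following is one possible sketch of it as a plain Python function, without the ``@merge`` decorator so that it runs on its own. It assumes, purely for illustration, that each ``*.sums`` file holds three lines (sum of squares, sum, and number of values); the actual layout is whatever the step 6 task writes. The temporary directory and sample numbers are likewise made up.

```python
import os
import tempfile


def step_7_calculate_variance(input_file_names, output_file_name):
    """Plain-Python stand-in for the task body (no @merge decorator here)."""
    sum_squared, total, count = 0.0, 0.0, 0
    for input_file_name in input_file_names:
        # ASSUMED file layout: sum of squares, sum, count (one per line)
        with open(input_file_name) as sums_file:
            fields = sums_file.read().split()
        sum_squared += float(fields[0])
        total += float(fields[1])
        count += int(fields[2])

    # Naive formula from earlier on this page
    variance = (sum_squared - total * total / count) / count
    with open(output_file_name, "w") as result_file:
        result_file.write("%s\n" % variance)


# Usage: write two small hypothetical *.sums files, then merge them
work_dir = tempfile.mkdtemp()
sums_file_names = []
for i, (sq, s, n) in enumerate([(14.0, 6.0, 3), (41.0, 9.0, 2)], start=1):
    name = os.path.join(work_dir, "%d.sums" % i)
    with open(name, "w") as sums_file:
        sums_file.write("%r\n%r\n%d\n" % (sq, s, n))
    sums_file_names.append(name)

result_name = os.path.join(work_dir, "variance.result")
step_7_calculate_variance(sums_file_names, result_name)

with open(result_name) as result_file:
    result = float(result_file.read())
print(result)
```

In the real pipeline the function would keep its ``@merge`` decorator, and *Ruffus* would supply the list of input files and the output file name.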