.. include:: ../../global.inc
.. _manual.introduction:

.. index::
    pair: manual; introduction

####################################################################
**Ruffus** Manual
####################################################################

| The chapters of this manual go through each of the features of **Ruffus** in turn.
| Some of these (especially those labelled **esoteric** or **deprecated**) may not
  be of interest to all users of **Ruffus**.

If you are looking for a quick introduction to **Ruffus**, you may want to read the
:ref:`Simple Tutorial <Simple_Tutorial>` first. Some of its content is repeated,
or elaborated on, in this manual.

***************************************
**Ruffus** Manual: Table of Contents:
***************************************

.. toctree::
    :maxdepth: 1

    follows.rst
    tasks_as_recipes.rst
    files.rst
    tasks_and_globs_in_inputs.rst
    tracing_pipeline_parameters.rst
    parallel_processing.rst
    split.rst
    transform.rst
    merge.rst
    posttask.rst
    jobs_limit.rst
    dependencies.rst
    onthefly.rst
    collate.rst
    advanced_transform.rst
    parallel.rst
    check_if_uptodate.rst
    exceptions.rst
    logging.rst
    files_re.rst

***************************************
Introduction
***************************************

The **Ruffus** module is a lightweight way to run computational pipelines.

Computational pipelines often become quite simple
if we break down the process into simple stages.

.. note::

    Ruffus refers to each stage of your pipeline as a :term:`task`.

| Let us start with the usual "Hello World".
| We have the following two Python functions which
  we would like to turn into an automatic pipeline:

.. image:: ../../images/simple_tutorial_hello_world.png

.. ::

    ::

        def first_task():
            print("Hello ")

        def second_task():
            print("world")

The simplest **Ruffus** pipeline would look like this:

.. image:: ../../images/simple_tutorial_intro_follows.png

.. ::

    ::

        from ruffus import *

        def first_task():
            print("Hello ")

        @follows(first_task)
        def second_task():
            print("world")

        pipeline_run([second_task])

The functions which do the actual work of each stage of the pipeline remain unchanged.
The role of **Ruffus** is to make sure these functions are called in the right order,
with the right parameters, running in parallel using multiprocessing if desired.

There are three simple parts to building a **Ruffus** pipeline:

#. Importing ruffus
#. "Decorating" functions which are part of the pipeline
#. Running the pipeline!

.. _manual.introduction.import:

.. index::
    single: importing ruffus

****************************
Importing ruffus
****************************

The most convenient way to use ruffus is to import the various names directly:

::

    from ruffus import *

This will allow **ruffus** terms to be used directly in your code. This is also
the style we have adopted for this manual.

.. csv-table::
    :header: "Category", "Terms"
    :stub-columns: 1

    "*Pipeline functions*", "
    ::

      pipeline_printout
      pipeline_printout_graph
      pipeline_run
      register_cleanup"
    "*Decorators*", "
    ::

      @follows
      @files
      @split
      @transform
      @merge
      @collate
      @posttask
      @jobs_limit
      @parallel
      @check_if_uptodate
      @files_re"
    "*Loggers*", "
    ::

      stderr_logger
      black_hole_logger"
    "*Parameter disambiguating Indicators*", "
    ::

      suffix
      regex
      inputs
      touch_file
      combine
      mkdir
      output_from"

If any of these clash with names in your code, you can use qualified names instead:

::

    import ruffus

    ruffus.pipeline_printout("...")

.. index::
    pair: decorators; Manual

.. _manual.introduction.decorators:

****************************
"Decorating" functions
****************************

You need to tag (or :term:`decorate <decorator>`) existing functions to tell **Ruffus**
that they are part of the pipeline.

.. note::

    :term:`decorator`\ s are ways to tag or mark out functions.

    They start with an ``@`` prefix and take a number of parameters in parentheses.

    .. image:: ../../images/simple_tutorial_decorator_syntax.png

The **ruffus** decorator :ref:`@follows <decorators.follows>` makes sure that
``second_task`` follows ``first_task``.

| Multiple :term:`decorator`\ s can be used for each :term:`task` function to add functionality
  to *Ruffus* pipeline functions.
| However, the decorated Python functions can still be
  called normally, outside of *Ruffus*.
| *Ruffus* :term:`decorator`\ s can be added to (stacked on top of) any function in any order.

* :ref:`More on @follows in |manual.follows.chapter_num| <manual.follows>`
* :ref:`@follows syntax in detail <decorators.follows>`

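
The idea of tagging a function without changing how it is called can be illustrated
in plain Python. This is only a sketch and not how **Ruffus** is implemented:
``toy_follows``, ``toy_label`` and the ``depends_on`` and ``label`` attributes are
hypothetical names made up for this example, and the functions return strings
(instead of printing) so that the result is easy to check.

::

    def toy_follows(*upstream):
        """Hypothetical stand-in for a dependency decorator (not Ruffus code)."""
        def decorate(func):
            # Record the upstream functions on the function object itself.
            func.depends_on = list(upstream) + getattr(func, "depends_on", [])
            # Return the original function unchanged, so it stays callable.
            return func
        return decorate

    def toy_label(text):
        """Second hypothetical decorator, used only to show stacking."""
        def decorate(func):
            func.label = text
            return func
        return decorate

    def first_task():
        return "Hello "

    # Two decorators stacked on one function.
    @toy_label("final stage")
    @toy_follows(first_task)
    def second_task():
        return "world"

    # The decorated function is still an ordinary Python function.
    greeting = first_task() + second_task()

Because each toy decorator only attaches an attribute and hands back the same function,
swapping the order of the two ``@`` lines gives the same result in this sketch.
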
.. index::
    pair: Running the pipeline; Manual
    pair: pipeline_run; Manual

.. _manual.introduction.running_pipeline:

****************************
Running the pipeline
****************************

We run the pipeline by specifying its **last** stage (:term:`task` function).

Ruffus will know what other functions this depends on, following the appropriate chain of
dependencies automatically, making sure that the entire pipeline is up-to-date.

In our example above, because ``second_task`` depends on ``first_task``, both functions are executed in order.

::

    >>> pipeline_run([second_task], verbose = 1)

**Ruffus** by default prints its progress through your pipeline
(``verbose`` sets the level of detail), interleaved with our ``Hello`` and ``world``.

.. image:: ../../images/simple_tutorial_hello_world_output.png

.. ::

    ::

        >>> pipeline_run([second_task], verbose = 1)
        Start Task = first_task
        Hello
        Job completed
        Completed Task = first_task
        Start Task = second_task
        world
        Job completed
        Completed Task = second_task

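
As a rough illustration of why naming only the last task is enough, the sketch below
walks a dependency table depth-first and calls each function after the ones it depends on.
It is not the **Ruffus** scheduler: ``toy_run`` and the ``depends_on`` dictionary are
hypothetical, the dependencies are assumed to contain no cycles, and no up-to-date
checking is attempted.

::

    def toy_run(targets, depends_on):
        """Call each target after everything it depends on (depth-first)."""
        done = []

        def visit(task):
            if task in done:
                return
            # Run everything upstream of this task first.
            for upstream in depends_on.get(task, []):
                visit(upstream)
            task()
            done.append(task)

        for target in targets:
            visit(target)
        # Report the order in which the tasks were called.
        return [task.__name__ for task in done]

    output = []

    def first_task():
        output.append("Hello ")

    def second_task():
        output.append("world")

    # Only the last stage is named; first_task is found via the table.
    order = toy_run([second_task], {second_task: [first_task]})

Here ``order`` lists ``first_task`` before ``second_task``, mirroring the sequence of
``Start Task`` lines in the output shown above.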