- .. include:: ../../global.inc
- .. include:: chapter_numbers.inc
- .. _manual.multiprocessing:
- #######################################################################################
- |manual.multiprocessing.chapter_num|: `Running Tasks and Jobs in parallel`
- #######################################################################################
- .. hlist::
-
- * :ref:`Manual overview <manual>`
- =====================
- Multi Processing
- =====================
- *Ruffus* uses the Python `multiprocessing <http://docs.python.org/library/multiprocessing.html>`_ module to run
- each job in a separate process.
-
- This means that jobs do *not* necessarily complete in the order in which their parameters were defined.
- Task hierarchies are, of course, inviolate: upstream tasks always run before the downstream tasks that depend on them.
-
- Independent tasks (i.e. tasks where neither depends on the other) may also be run in parallel.
-
- The number of concurrent jobs can be set in :ref:`pipeline_run<pipeline_functions.pipeline_run>`:
- ::
-
- pipeline_run([parallel_task], multiprocess = 5)
-
-
- If ``multiprocess`` is set to 1, then jobs will be run in a single process.
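
The effect of running jobs in separate processes can be seen with a minimal sketch that uses only the standard library ``multiprocessing`` module directly. It is not *Ruffus* code and makes no claim about how *Ruffus* schedules jobs internally. The names ``job`` and ``in_parameter_order`` are hypothetical, and the pool size of 5 simply mirrors the ``multiprocess = 5`` example above.

```python
import multiprocessing
import os
import random
import time


def job(index):
    """Stand-in for one job: sleep briefly, then report which process ran it."""
    time.sleep(random.uniform(0.0, 0.05))
    return index, os.getpid()


def in_parameter_order(results):
    """Recover the original parameter order from (index, pid) results."""
    return [index for index, _pid in sorted(results)]


if __name__ == "__main__":
    # Five worker processes; results are yielded as each job finishes.
    with multiprocessing.Pool(processes=5) as pool:
        results = list(pool.imap_unordered(job, range(10)))

    # Completion order will usually differ from run to run.
    print("completion order:", [index for index, _pid in results])
    print("parameter order: ", in_parameter_order(results))
    print("worker processes:", len({pid for _index, pid in results}))
```

Because completion order is not guaranteed, anything that must be processed in parameter order has to be re-sorted (or keyed by parameter) after the jobs finish, as ``in_parameter_order`` does here.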
-
- =====================
- Data sharing
- =====================
-
- Running jobs in separate processes allows *Ruffus* to make full use of the multiple
- processors in modern computers. However, some of the
- `multiprocessing guidelines <http://docs.python.org/library/multiprocessing.html#multiprocessing-programming>`_
- should be borne in mind when writing *Ruffus* pipelines. In particular:
-
- * Try not to pass large amounts of data between jobs, or at least be aware that this has to be marshalled
- across process boundaries.
-
- * Only data which can be `pickled <http://docs.python.org/library/pickle.html>`_ can be passed as
- parameters to *Ruffus* task functions. Happily, that applies to almost any Python data type.
- Using one of the rare unpicklable objects will cause Python to fail loudly when the *Ruffus* pipeline
- is run.
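
Both guidelines can be checked ahead of time with the standard library ``pickle`` module. The sketch below is not part of *Ruffus*: ``pickled_size`` is a hypothetical helper and ``big_input.txt`` is a made-up file name. It reports how many bytes an object would occupy when marshalled, or ``None`` if the object cannot be pickled at all.

```python
import pickle
import threading


def pickled_size(obj):
    """Return the pickled size of obj in bytes, or None if it cannot be pickled."""
    try:
        return len(pickle.dumps(obj))
    except (pickle.PicklingError, TypeError, AttributeError):
        # Different unpicklable objects raise different exception types.
        return None


if __name__ == "__main__":
    file_name = "big_input.txt"          # hypothetical path: cheap to pass
    big_list = list(range(1000000))      # large in-memory data: costly to pass

    print("file name:", pickled_size(file_name), "bytes")
    print("big list: ", pickled_size(big_list), "bytes")

    # Locks and lambdas are examples of objects that cannot be pickled.
    print("lock:     ", pickled_size(threading.Lock()))
    print("lambda:   ", pickled_size(lambda x: x))
```

Passing a file name and letting each job read the file itself keeps the marshalled payload small, which is one way to follow the first guideline.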