.. include:: ../../global.inc
.. include:: chapter_numbers.inc

.. _manual.multiprocessing:

#######################################################################################
|manual.multiprocessing.chapter_num|: `Running Tasks and Jobs in parallel`
#######################################################################################
    .. hlist::

       * :ref:`Manual overview <manual>`

=====================
Multi Processing
=====================

    *Ruffus* uses python `multiprocessing <http://docs.python.org/library/multiprocessing.html>`_ to run
    each job in a separate process.

    This means that jobs do *not* necessarily complete in the order of their defined parameters.
    Task hierarchies are, of course, inviolate: upstream tasks always run before downstream, dependent tasks.

    Tasks that are independent of each other (i.e. neither precedes the other) may also run in parallel.

    The number of concurrent jobs can be set in :ref:`pipeline_run<pipeline_functions.pipeline_run>`:

        ::

            pipeline_run([parallel_task], multiprocess = 5)

    If ``multiprocess`` is set to 1, then jobs will be run in a single process.

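    For illustration, here is a minimal, self-contained sketch of two independent tasks whose
    jobs may run side by side, followed by a task which depends on both. The task and file
    names are invented for this example:

        ::

            from ruffus import *

            # Three jobs for this task, each writing its own output file.
            # The jobs are independent and may run in any order or simultaneously.
            @files([[None, "a.1"], [None, "b.1"], [None, "c.1"]])
            def create_files(input_file, output_file):
                open(output_file, "w").close()

            # A second task with no dependency on the first:
            # its jobs may be interleaved with those of create_files.
            @files([[None, "x.1"], [None, "y.1"], [None, "z.1"]])
            def create_other_files(input_file, output_file):
                open(output_file, "w").close()

            # A downstream task: starts only after *both* upstream tasks have completed.
            @transform([create_files, create_other_files], suffix(".1"), ".2")
            def process_files(input_file, output_file):
                open(output_file, "w").close()

            pipeline_run([process_files], multiprocess = 5)

    With ``multiprocess = 5``, up to five of these jobs run at the same time.
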
=====================
Data sharing
=====================

    Running jobs in separate processes allows *Ruffus* to make full use of the multiple
    processors in modern computers. However, some of the
    `multiprocessing guidelines <http://docs.python.org/library/multiprocessing.html#multiprocessing-programming>`_
    should be borne in mind when writing *Ruffus* pipelines. In particular:

    * Try not to pass large amounts of data between jobs, or at least be aware that all such data
      has to be marshalled across process boundaries.

    * Only data which can be `pickled <http://docs.python.org/library/pickle.html>`_ can be passed as
      parameters to *Ruffus* task functions. Happily, that applies to almost any Python data type.
      The rare unpicklable object (see the sketch after this list) will cause python to complain
      (fail) loudly when *Ruffus* pipelines are run.

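    As a quick illustration of the pickling rule, here is a hypothetical sketch (the task and
    parameter names are invented for this example). Plain strings, numbers, lists and
    dictionaries travel across process boundaries without trouble; something like an open
    file handle passed as an extra parameter does not:

        ::

            from ruffus import *

            # Extra parameters made of plain data types: these pickle cleanly
            # and can be sent to the separate process running each job.
            @files(None, "counts.txt", {"min_quality": 20, "samples": ["a", "b"]})
            def picklable_parameters(input_file, output_file, options):
                open(output_file, "w").close()

            # An open file handle cannot be pickled: with multiprocess > 1,
            # a parameter like this would fail loudly at run time.
            #
            #   log = open("pipeline.log", "w")
            #
            #   @files(None, "result.txt", log)
            #   def unpicklable_parameter(input_file, output_file, log_file):
            #       pass

            pipeline_run([picklable_parameters], multiprocess = 5)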