/CHANGES.TXT

https://code.google.com/p/ruffus/
  1= v. 2.3=
  2    _03/October/2011_
  3    ==`@active_if` turns off tasks at runtime==
  4        * Issue 36
  5        * Design and initial implementation from Jacob Biesinger
  6        * Takes one or more parameters which can be either booleans or functions or callable objects which return True / False
  7        * The expressions inside @active_if are evaluated each time 
  8          `pipeline_run`, `pipeline_printout` or `pipeline_printout_graph` is called.
  9        * Dormant tasks behave as if they are up to date and have no output.
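        The evaluation rule described above can be sketched in plain Python. This is only an
        illustration, not the Ruffus implementation: `is_active` is a hypothetical helper, and it
        assumes that a task is active only when every condition holds.

```python
def is_active(*conditions):
    """Return True only if every condition holds.

    Each condition is either a plain boolean or a callable returning
    True / False. Callables are invoked at call time, so the answer can
    change between pipeline runs. (Hypothetical helper for illustration.)
    """
    for condition in conditions:
        value = condition() if callable(condition) else condition
        if not value:
            return False
    return True


run_expensive_step = False      # e.g. set from a command line option

dormant = is_active(True, lambda: run_expensive_step)
run_expensive_step = True
active = is_active(True, lambda: run_expensive_step)
print(dormant, active)
```

        Because the callable is re-evaluated on every call, flipping the flag between two runs
        changes the result without redefining the task.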
 10    ==Command line parsing for pipelines==
 11        * From "Issue 44"
 12        * Added ruffus/cmdline.py
 13        * Supports both argparse (python 2.7) and optparse (python 2.6):
 14        * The following options are defined by default:
          {{{
    --verbose
    --version
    --log_file

-t, --target_tasks
-j, --jobs
-n, --just_print
    --flowchart
    --key_legend_in_graph
    --draw_graph_horizontally
    --flowchart_format
    --forced_tasks
          }}}
        * Usage with argparse (Python >= 2.7):
          {{{
from ruffus import *

parser = cmdline.get_argparse(description='WHAT DOES THIS PIPELINE DO?')

# for example...
parser.add_argument("--input_file")

options = parser.parse_args()

#  optional logger which can be passed to ruffus tasks
logger, logger_mutex = cmdline.setup_logging (__name__, options.log_file, options.verbose)

#_____________________________________________________________________________________
#   pipelined functions go here
#_____________________________________________________________________________________

cmdline.run (options)
          }}}
        * Usage with optparse (Python 2.6):
          {{{
from ruffus import *

parser = cmdline.get_optgparse(version="%prog 1.0", usage = "\n\n    %prog [options]")

# for example...
parser.add_option("-c", "--custom", dest="custom", action="count")

(options, remaining_args) = parser.parse_args()

#  logger which can be passed to ruffus tasks
logger, logger_mutex = cmdline.setup_logging ("this_program", options.log_file, options.verbose)

#_____________________________________________________________________________________
#   pipelined functions go here
#_____________________________________________________________________________________

cmdline.run (options)
          }}}
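        For readers without Ruffus to hand, the way a pipeline script adds its own options on top
        of a shared set of defaults can be sketched with the standard library alone.
        `build_parser` is a hypothetical stand-in, not `ruffus.cmdline`; only a few of the options
        listed above are modelled, and their types and defaults are assumptions.

```python
import argparse


def build_parser(description):
    """Return a parser pre-loaded with a few shared pipeline options.

    Hypothetical stand-in for illustration; option types are assumptions.
    """
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument("--log_file", metavar="FILE")
    parser.add_argument("-t", "--target_tasks", action="append", default=[])
    parser.add_argument("-j", "--jobs", type=int, default=1)
    parser.add_argument("-n", "--just_print", action="store_true")
    return parser


parser = build_parser("Example pipeline")

# pipeline-specific option added by the script author
parser.add_argument("--input_file")

# an explicit argument list is parsed here so the sketch runs anywhere
options = parser.parse_args(["-j", "4", "-t", "final_task", "--input_file", "a.txt"])
print(options.jobs, options.target_tasks, options.input_file)
```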
    ==Optionally terminate pipeline after first exception==
        * To have all exceptions interrupt immediately:
          {{{
pipeline_run(..., exceptions_terminate_immediately = True)
          }}}
        * From "Issue 43"
        * By default, Ruffus accumulates `NN` errors before interrupting the pipeline prematurely, where `NN` is the parallelism specified for `pipeline_run(...)`.
        * By default, a pipeline will only be interrupted immediately if exceptions of type `ruffus.JobSignalledBreak` are thrown.
    ==Display exceptions without delay==
        * To see exceptions as they occur:
          {{{
pipeline_run(..., log_exceptions = True)
          }}}
        * From "Issue 43"
        * By default, Ruffus re-throws all exceptions together after the pipeline terminates.
        * `logger.error(...)` will be invoked with the string representation of each exception and its associated stack trace.
        * The default logger prints to sys.stderr, but this can be changed to any class from the logging module or compatible object via `pipeline_run(..., logger = ???)`
 86    ==`@split` operations now show the 1->many output in pipeline_printout==
 87        * From "Issue 45"
 88        * New output
          {{{
Task = split_animals
     Job = [None
           -> cows
           -> horses
           -> pigs
            , any_extra_parameters]
          }}}
 97    ==Improved display from `pipeline_printout()`==
 98        * File date and time are displayed in human readable form and out of date
 99          files are flagged with asterisks. 
100
101
102
103= v. 2.2=
104    _21/July/2010_
105    ==Parameter substitution for `inputs(...)` / `add_inputs(...)`==
106        `glob`s and tasks can be added as the prerequisites / input files using
107        `inputs(...)` and `add_inputs(...)`. `glob` expansions will take place when the task
108        is run.
109    ==Simplifying `@transform` syntax with suffix==
110        Regular expressions within ruffus are very powerful, and can allow files to be moved
111        from one directory to another and renamed at will.<br><br>
112        However, using consistent file extensions and
113        `@transform(..., suffix(...))` makes the code much simpler and easier to read. <br><br>
        Previously, `suffix(...)` did not cooperate well with `inputs(...)`.
        For example, finding the corresponding header file (``'.h'``) for the matching input
        required a complicated `regex(...)` regular expression and `inputs(...)`. This simple case,
117        e.g. matching ``'something.c'`` with ``'something.h'``, is now much easier in Ruffus.<br><br>
118        For example:
          {{{
source_files = ["something.c", "more_code.c"]
@transform(source_files, suffix(".c"), add_inputs(r"\1.h", "common.h"), ".o")
def compile(input_files, output_file):
    ( source_file,
      header_file,
      common_header) = input_files
    # call compiler to make object file
          }}}
          This is equivalent to calling:
          {{{
compile(["something.c", "something.h", "common.h"], "something.o")
compile(["more_code.c", "more_code.h", "common.h"], "more_code.o")
          }}}
133
134        The `\1` matches everything *but* the suffix and will be applied to both `glob`s and file names.<br>
135        For simplicity and compatibility with previous versions, there is always an implied `r"\1"` before
136        the output parameters. I.e. output parameters strings are *always* substituted.<br>
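        The substitution rule described above can be imitated with the standard `re` module.
        This is a sketch under stated assumptions rather than Ruffus code: `substitute_suffix` is a
        hypothetical helper that treats the suffix as a literal string anchored at the end of the
        file name, with `\1` standing for everything before it.

```python
import re


def substitute_suffix(filename, suffix, input_patterns, output_pattern):
    """Imitate suffix-based substitution (hypothetical helper).

    Returns (inputs, output), or None if filename lacks the suffix.
    """
    # group 1 captures everything *but* the suffix
    match = re.match("(.*)" + re.escape(suffix) + "$", filename)
    if match is None:
        return None
    # extra inputs are substituted only where they mention \1
    inputs = [filename] + [match.expand(p) for p in input_patterns]
    # an implied \1 always precedes the output pattern
    output = match.expand(r"\1" + output_pattern)
    return inputs, output


for source in ["something.c", "more_code.c"]:
    print(substitute_suffix(source, ".c", [r"\1.h", "common.h"], ".o"))
```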
137        
138    ==Advanced form of `@split`:==
139        The standard `@split` divided one set of inputs into multiple outputs (the number of which
140        can be determined at runtime).<br>
141        This is a `one->many` operation.<br><br>
142        An advanced form of `@split` has been added which can split each of several files further.<br>
143        In other words, this is a `many->"many more"` operation.<br><br>
144        For example, given three starting files:
        {{{
original_files = ["original_0.file",
                  "original_1.file",
                  "original_2.file"]
        }}}
        We can split each into its own set of sub-sections:
        {{{
@split(original_files,
   regex(r"original_(\d+)\.file"),                      # match starting files
         r"files.split.\1.*.fa",                        # glob pattern
         r"\1")                                         # index of original file
def split_files(input_file, output_files, original_index):
    """
        Code to split each input_file
            "original_0.file" -> "files.split.0.*.fa"
            "original_1.file" -> "files.split.1.*.fa"
            "original_2.file" -> "files.split.2.*.fa"
    """
        }}}
164        This is, conceptually, the reverse of the @collate(...) decorator
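        The mapping from each starting file to its glob pattern and index can be checked with the
        standard `re` module alone. This sketch only reproduces the string substitution (no files
        are split), and the regular expression is an assumption chosen to match the file names in
        the example.

```python
import re

original_files = ["original_0.file",
                  "original_1.file",
                  "original_2.file"]

# assumed pattern: capture the numeric index of each starting file
pattern = re.compile(r"original_(\d+)\.file")

split_jobs = []
for name in original_files:
    match = pattern.match(name)
    output_glob = match.expand(r"files.split.\1.*.fa")   # glob pattern for the outputs
    original_index = match.expand(r"\1")                 # index of the original file
    split_jobs.append((name, output_glob, original_index))

for job in split_jobs:
    print(job)
```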
165    ==Ruffus will complain about unescaped regular expression special characters:==
166        Ruffus uses ``'\1'`` and ``'\2'`` in regular expression substitutions. Even seasoned python
167        users may not remember that these have to be 'escaped' in strings. The best option is
168        to use 'raw' python strings e.g. `r"\1_substitutes\2correctly\3four\4times"`.<br>
169        Ruffus will throw an exception if it sees an unescaped ``'\1'`` or ``'\2'`` in a file name,
170        which should catch most of these bugs.
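        The difference is easy to demonstrate in plain Python, independently of Ruffus: without the
        `r` prefix, ``'\1'`` is an octal escape for the single control character `chr(1)`, so the
        back-reference never reaches the regular expression engine.

```python
import re

unescaped = "\1.h"      # 3 characters: chr(1), ".", "h"
raw = r"\1.h"           # 4 characters: backslash, "1", ".", "h"

print(len(unescaped), len(raw))

# only the raw string is treated as a back-reference to group 1
good = re.sub(r"(.+)\.c$", raw, "something.c")
bad = re.sub(r"(.+)\.c$", unescaped, "something.c")
print(repr(good), repr(bad))
```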
171    ==Flowchart changes:==
        Changed to nicer colours, symbols etc. for a more professional look.
        Colours, size and resolution are now fully customisable. An svg bug in firefox has
        been worked around so that font sizes are displayed correctly
        {{{
pipeline_printout_graph( #...
                        user_colour_scheme = {
                                              "colour_scheme_index": 1,
                                              "Task to run"  : {"fillcolor": "blue"}},
                        pipeline_name = "My flowchart",
                        size          = (11,8),
                        dpi           = 120)
        }}}
184    ==Bug Fix:==
185        * From "Issue 27"
            Previously, Ruffus paused for one second after each job.
            This accommodates poor (one second) timestamp precision in some older file systems (ext3?),
            and makes sure that output from the previous tasks has a different
            timestamp from that of the following task.<br><br>
            Unfortunately, Ruffus (was too clever by half and) paused only when the jobs were less
            than a second in duration.
            Output files may be created at the end of a task, and
            the timestamps checked at the beginning of the following task. We thus *always* need a
            gap of more than one second between tasks in older file systems, whether the jobs are long or short.<br><br>
            The fix is to introduce a pause before the first job of each task.
            (See `one_second_per_job` in `task.py:make_job_parameter_generator(...)`)<br><br>
            As previously, if you are using a modern file system (e.g. ext4 / JFS / NTFS), you can avoid these unnecessary pauses by setting the `one_second_per_job` flag:
            {{{
pipeline_run(one_second_per_job=False)
            }}}
201        * From "Issue 30"
            Fixed: `@split` with empty input files crashed Ruffus
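        The timestamp problem behind "Issue 27" can be illustrated without touching the file
        system. The sketch below is an assumption-laden model, not Ruffus code: `output_looks_newer`
        is a hypothetical helper that truncates modification times to the file system's resolution
        before comparing them.

```python
import math


def output_looks_newer(input_mtime, output_mtime, resolution=1.0):
    """Compare two timestamps as a file system with coarse resolution would.

    Both times are truncated to a multiple of `resolution` seconds, so two
    files written within the same tick are indistinguishable.
    (Hypothetical helper for illustration.)
    """
    return (math.floor(output_mtime / resolution)
            > math.floor(input_mtime / resolution))


# output written 0.7 s after its input: same one-second tick, looks out of date
same_tick = output_looks_newer(100.2, 100.9)

# a gap of more than one second always crosses a tick boundary
after_pause = output_looks_newer(100.2, 101.3)

# a finer resolution (0.5 s here) tells the first pair apart
finer_clock = output_looks_newer(100.2, 100.9, resolution=0.5)

print(same_tick, after_pause, finer_clock)
```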
203    ==Documentation changes:==
204        * New bioinformatics example
205        * New contributed Gallery of flowcharts
206
207
208
209= v. 2.1.1=
210    _12/March/2010_
211    ==Bug Fix:==
212        * From "Issue 26"
          The code "with job_limit_semaphore" breaks compatibility with python 2.5<br>
          Thanks to a patch from S. Binet
        * From "Issue 25"
          @merge forwarding single arguments to @merge erroneously as lists<br>
          Thanks to A. Heger.
218        * @transform(..., suffix(...), inputs(...))
219          Suffix substitution should not have been taking place within `inputs()`. This
220          makes it pass a file name to `inputs()` without suffix substitution.
221          `Regex()` regular expression substitution continues to take place within `inputs()`<br>
222          However, see changes in v. 2.2
223    ==Documentation changes:==
        * New step in tutorial emphasising the value of `pipeline_printout(...)` in pipeline development
        * pipeline_printout discussion in the manual.
        * @jobs_limit directive described in the manual.
        * Advanced uses of @split described in the manual.
228        * touch_files_only parameter described in the manual.
229        * `add_inputs(...)` parameter described in the manual.
230    ==`@transform(.., add_inputs(...))`==
231        * `inputs(...)` allowed the addition of extra input dependencies / parameters for each job.
232          For example, compiling a source file might require pulling in a corresponding 
233          header file.
234          However, replacing all the input parameters always seemed a very blunt instrument
235          just to inject an extra dependency (e.g. a header file).
236        * `add_inputs(...)`, as the name suggests, just adds the additional items as the input parameter
          {{{
from ruffus import *
@transform(["a.input", "b.input"], suffix(".input"), add_inputs("just.1.more","just.2.more"), ".output")
def task(i, o):
  ""
          }}}
          produces:
          {{{
Job = [[a.input, just.1.more, just.2.more] ->a.output]
Job = [[b.input, just.1.more, just.2.more] ->b.output]
          }}}
248        * like `inputs`, `add_inputs` accepts strings, tasks and globs
249          This minor syntactic change promises to add much clarity to some of our
250          Ruffus code.
251        * `add_inputs()` is available for `@transform`, `@collate` and `@split`
252   
253
254
255= v. 2.1.0=
256    _2/March/2010_
257    ==Bug Fix:==
258        * From "Issue 25".
259          Regression for v. 2.0.10
260          @files forwarding single arguments as lists.
261          (Thanks to A. Heger)
262    ==@jobs_limit directive==
263        * Some tasks are resource intensive and too many jobs should not be run at the 
264          same time. Examples include disk intensive operations such as unzipping, or 
265          downloading from FTP sites. 
266          Adding 
          {{{
@jobs_limit(4)
@transform(new_data_list, suffix(".big_data.gz"), ".big_data")
def unzip(i, o):
  "unzip code goes here"
          }}}
273          would limit the unzip operation to 4 jobs at a time, even if the rest of the
274          pipeline runs highly in parallel.
275          (Thanks to R. Young for suggesting this.)
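          The idea of capping one task's concurrency can be sketched with a semaphore from the
          standard library. This uses threads and a hypothetical helper, `run_with_jobs_limit`;
          it illustrates the general technique and makes no claim about how Ruffus implements
          `@jobs_limit`.

```python
import threading
import time


def run_with_jobs_limit(n_jobs, limit):
    """Run n_jobs dummy jobs, at most `limit` at a time; return the peak seen."""
    semaphore = threading.Semaphore(limit)
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def job():
        with semaphore:                      # blocks while `limit` jobs are running
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            time.sleep(0.01)                 # stands in for the real work
            with lock:
                state["running"] -= 1

    threads = [threading.Thread(target=job) for _ in range(n_jobs)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["peak"]


peak = run_with_jobs_limit(n_jobs=20, limit=4)
print(peak <= 4)
```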
276
277= v. 2.0.10=
278    _27/February/2010_
279    ==pipeline_run(..., touch_files_only = True)==
280        * This will only `touch` output files for each job without running the 
281          python function. I.e. The output files are updated if they are old, or created
282          if missing.
283          This can be useful for simulating a pipeline run so that all files look as
284          if they are up-to-date.
285        Caveats:
286        * This may not work correctly where output files are only determined at runtime, e.g. with @split
287        * Only the output from pipelined jobs which are currently out-of-date will be touched.
288          In other words, the pipeline runs *as normal*, the only difference is that the
289          output files are touched instead of being created by the python task functions
290          which would otherwise have been invoked.
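        What "touching" an output file means can be shown with the standard library. `touch` below
        is a hypothetical helper written for this note, not the Ruffus function: it creates the
        file if it is missing and otherwise only refreshes its modification time.

```python
import os
import shutil
import tempfile


def touch(path):
    """Create `path` if missing, else update its timestamps to now."""
    with open(path, "a"):        # append mode never truncates existing content
        pass
    os.utime(path, None)         # None means "set access/modified time to now"


workdir = tempfile.mkdtemp()

missing = os.path.join(workdir, "job1.output")
touch(missing)
created = os.path.exists(missing)

stale = os.path.join(workdir, "job2.output")
touch(stale)
os.utime(stale, (0, 0))          # pretend the file is very old
touch(stale)
refreshed = os.path.getmtime(stale) > 0

shutil.rmtree(workdir)           # clean up the scratch directory
print(created, refreshed)
```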
291    ==parameter substitution for inputs(...)==
292        * The inputs(...) parameter in @transform, @collate can now take tasks and globs,
293          and these will be substituted appropriately (after regular expression replacement).
294    ==Bug Fix:==
295        * From "Issue 21".
296          Empty @files specifications no longer throw exceptions.
297          If verbose logging is on, a warning is printed.
298          (Thanks to A. Heger)
299
300
301= v. 2.0.9=
302    _25/February/2010_
303    ==Bug Fix:==
304        * From "Issue 10".
305          Source code directory under svn is now in /ruffus rather than src/ruffus
306          (Thanks to P.J. Davis)
307        * Better display of @split parameters when logging output
308          The output parameters in @split should not be expanded 
309          if they are wildcards. This was previously handled as a special case. Now
310          all parameter factories return two sets of parameters:
311          The first to go to jobs, the second for displaying in trace logs.
312        * Pipeline_printout defaults to verbose of 1. Verbose of 0 does nothing
313          (Thanks to L.S.G)
314        * The "Start Task" log message at verbosity of 3 was misleading.
315          This is only when the task enters the queue. 
316          If there are multiple independent tasks, they may all enter the queue at the 
317          same time even with multiprocess=1. Jobs will be run one at a time.                                     
318          (Thanks to C. Nellaker.)
319    ==Advanced form of split:==
        * Previously, split took only one set of inputs (tasks/files/globs) and split these into an indeterminate number of outputs.
          The new advanced form of split takes multiple inputs, and splits EACH of these
          further. I.e. it is like a combination of @split and @transform.
          For example:
          {{{
@split(get_files, regex(r"(.+).original"), r"\1.*.split")
def split_files(i, o):
     pass
          }}}
329          This experimental feature will be in beta without documentation. Caveat utilitor!
330            
331
332
333
334
335= v. 2.0.8=
336    _22/January/2010_
337    ==Bug Fix:==
338        * Now accepts unicode file names: 
339          Change `isinstance(x,str)` to `isinstance(x, basestring)`
340          (Thanks to P.J. Davis for contributing this.)
341        * inputs(...) now raises an exception when passed multiple arguments.
342          If the input parameter is being passed a tuple, add an extra set of enclosing
343          brackets. Documentation updated accordingly.
344          (Thanks to Z. Harris for spotting this.)
345        * tasks where regular expressions are incorrectly specified are a great source of frustration
346          and puzzlement.
347          Now if no regular expression matches occur, a warning is printed
348          (Thanks to C. Nellaker for suggesting this)
349
350= v. 2.0.7=
351    _11/December/2009_
352    ==Bug Fix:==
353        * graph printout blows up because of missing run time data error
354          (Thanks to A. Heger for reporting this!)
355
356
357= v. 2.0.6=
358    _10/December/2009_
359    ==Bug Fix:==
360        * several minor bugs
        * better error messages when encountering decorator problems when checking if the pipeline is up to date
362        * Exception when output specifications in @split were expanded (unnecessarily) in logging.
363          (Thanks to N. Spies for reporting this!)
364
365= v. 2.0.4=
366    _22/November/2009_
367    ==Bug Fix:==
368        * task.get_job_names() dies for jobs with no parameters
369        * JobSignalledBreak was not exported
370
371= v. 2.0.3=
372    _18/November/2009_
373    ==Bug Fix:==
374        * @transform accepts single file names. Thanks Chris N.
375
376= v. 2.0.2=
377    _18/November/2009_
378    ==Better Logging:==
379        * pipeline_printout output much prettier
380        * pipeline_run at high verbose levels 
381
382          Shows which tasks are being checked
383          to see if they are up-to-date or not
384    ==Documentation:==
385        * New tutorial
386        * New manual
387        * pretty code figures
388
389= v. 2.0.1=
390    _18/November/2009_
391    All unit tests passed
392    ==Bug Fix:==
393        * Numerous bugs to do with ordering of glob / job output consistency
394
395= v. 2.0.1 beta4=
396    _16/November/2009_
397    ==Bug Fix:==
398        * Fixed problems with tasks depending on @split
399
400= v. 2.0 beta=
401    _30/October/2009_
402    With the experience and feedback over the past few months, I have reworked **Ruffus** 
403    completely mainly to make the syntax more transparent, with fewer gotchas.
404    Previous limitations to do with parameters have been removed.
405    The experience with what *Ruffus* calls "Indicator Objects" has been very positive
406    and there are more of them. 
407    These are dummy classes with obvious names like "regex" or "suffix" which indicate the
408    type of optional parameters much like named parameters.
409
410    ==New Decorators:==
411        * @split
412        * @merge
413        * @transform
414        * @collate
415
416    ==Deprecated Decorators:==
417        * @files_re
418          Functionality is divided among the new decorators
419            
420    ==New Features:==
421        * Files can be chained from task to task, implicit dependencies are inferred automatically
422        * Limitations on parameters removed. Any degree of nesting is allowed.
        * Strings containing glob letters ``[]?*`` are automatically inferred to be globs and expanded
        * Input and output parameters containing strings are assumed to be file names, whatever the nested data structures they are found in
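        A rough picture of these two rules, in plain Python: walk an arbitrarily nested parameter,
        collect every string, and flag those containing glob letters. Both helpers are hypothetical
        names written for this note, and only lists and tuples are descended into here.

```python
import re

GLOB_LETTERS = re.compile(r"[\[\]?*]")


def looks_like_glob(text):
    """True if the string contains any of the glob letters []?*"""
    return GLOB_LETTERS.search(text) is not None


def iter_strings(params):
    """Yield every string found in nested lists / tuples, in order."""
    if isinstance(params, str):
        yield params
    elif isinstance(params, (list, tuple)):
        for item in params:
            for found in iter_strings(item):
                yield found
    # anything else (numbers, None, ...) is not a file name and is skipped


job_params = ["a.txt", ("*.fa", 3, ["chr[12].bed", "b.txt"])]
names = list(iter_strings(job_params))
globs = [name for name in names if looks_like_glob(name)]
print(names, globs)
```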
425
426    ==Documentation:==
427        * New documentation almost complete
428        * New Simplified 7 step tutorial
429        * New manual work in progress
430
431    ==Bug Fix:==
432        * Scheduling errors
433
434= v. 1.1.4=
435    _15/October/2009_
436
437    ==New Feature:==
        * Tasks can get their input by automatically chaining to the output from one or more parent tasks using `@files_re`
439        * Added example showing how files can be split up into multiple jobs and then recombined
440           # Run `test/test_filesre_split_and_combine.py` with `-v|--verbose` `-s|--start_again`
441           # Run with `-D|--debug` to test.
442        * Documentation to follow
443
444    ==Bug Fix:==
445        * Scheduling race conditions
446
447= v. 1.1.3=
448    _14/October/2009_
449
450    ==Bug Fix:==
451        * Minor (but show stopping) bug in task.generic_job_descriptor
452
453= v. 1.1.2=
454    _9/October/2009_
455    
456    ==Bug Fix:==
        * Nasty (long standing) bug which caused single job tasks decorated only with `@follows(mkdir(...))` to be caught in an infinite loop
458
459    ==Code Changes:==
460        * Add example of combining multiple input files depending on a regular expression pattern. 
461           # Run `test/test_filesre_combine.py` with -v (verbose)
462           # Run with -D (debug) to test.
463
464
465
466= v. 1.1.1=
467    _8/October/2009_
468    
469    ==New Feature:==
470        * _Combine multiple input files using a regular expression_
471        * Added `combine` syntax to `@files_re` decorators:
472        * Documentation to follow...
473        * Example from `src/ruffus/test/test_branching_dependencies.py`:
        {{{
@files_re('*.*', r'(.*/)(.*\.[345]$)', combine(r'\1\2'), r'\1final.6')
def test(input_files, output_files):
    pass
        }}}
479        * will take all files in the current directory
480        * will identify files which end in `.3`,  `.4` and `.5` as input files
481        * will use `final.6` as the output file
482        * `input_files  == [a.3, a.4, b.3, b.5]`  (for example)
483        * `output_files == [final.6]` 
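        The grouping behaviour listed above can be imitated with the standard `re` module. This is a
        sketch only: `combine_by_regex` is a hypothetical helper, the file names are made up, and
        input files are grouped under the output name produced by the substitution.

```python
import re
from collections import defaultdict


def combine_by_regex(paths, pattern, output_template):
    """Group matching paths by the output name they map to."""
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for path in sorted(paths):
        match = regex.search(path)
        if match is None:
            continue                       # not an input file for this task
        groups[match.expand(output_template)].append(path)
    return dict(groups)


paths = ["dir/a.3", "dir/a.4", "dir/b.3", "dir/b.5", "dir/c.7"]
jobs = combine_by_regex(paths, r"(.*/)(.*\.[345]$)", r"\1final.6")
print(jobs)
```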
484           
485    ==Bug Fix:==
486        * All (known) bugs for running jobs from independent tasks in parallel
487
488
489
490= v. 1.0.9=
491    _8/October/2009_
492    
493    ==New Feature:==
494        _Multitasking independent tasks_
        * In a major piece of retooling, jobs from independent tasks which do not depend on each other will be run in parallel.
496        * This involves major changes to the scheduling code. 
497        * Please contact me asap if anything breaks.
498
499    ==Code Changes:==
500        * Add example of independent tasks running concurrently in
501          `test/test_branching_dependencies.py`
          * Run with -v (verbose) and -j 1 or -j 10 to show the indeterminacy of multiprocessing.
503          * Run with -D (debug) to test.
504
505= v. 1.0.8=
506    _12/August/2009_
507    
508    ==Documentation:==
509        * Errors fixed. Thanks to Matteo Bertini!
510
511    ==Code Changes:==
512        * Added functions which print out job parameters more prettily.
513        * `task.shorten_filenames_encoder`
514        * `task.ignore_unknown_encoder`
515        * Parameters which look like file paths will only have the file part printed
516          (i.e. `"/a/b/c" -> 'c'`)
517        * Test scripts `simpler_with_shared_logging.py` and `test_follows_mkdir.py`
518          have been changed to test for this.
519
520
521= v. 1.0.7=
522    _17/June/2009_
523    
524    ==Code Changes:==
525        * Added `proxy_logger` module for accessing a shared log across multiple jobs in
526          different processes.
527
528= v. 1.0.6=
529    _12/June/2009_
530    
531    ==Bug fix:==
532        * _Ruffus_ version module (`ruffus_version.py`) links fixed
533          Soft links in linux do not travel well
534        * `mkdir` now can take a list of strings
535          added test case
536
537    ==Documentation:==
538        * Added history of changes
539
540= v. 1.0.5=
541    _11/June/2009_
542
543    ==Bug fix:==
544        * Changed "graph_printout" to `pipeline_printout_graph` in documentation.
545          This function had been renamed in the code but not in the documentation :-(
546
547    ==Documentation:==
        * Added example for sharing and synchronising data between jobs.
          This shows how different jobs can write to a common log file while still leveraging the full power of _ruffus_.
550        
551
552    ==Code Changes:==
553        * The graph and print_dependencies modules are no longer exported by default from task.
554          Please email me if this breaks anything.
        * More informative error message when referring to unadorned (without _Ruffus_ decorators) python functions as pipelined Tasks
556        * Added Ruffus version module `ruffus_version.py`
557
558
559
560= v. 1.0.4=
561    _05/June/2009_
562    ==Bug fix: ==
563        * `task.task_names_to_tasks` did not include tasks specified by function rather than name
564        * `task.run_all_jobs_in_task` did not work properly without multiprocessing (# of jobs = 1)
565        * `task.pipeline_run` only uses multiprocessing pools if `multiprocess` (# of jobs)  > 1
566    
567    ==Changes to allow python 2.4/2.5 to run:==
568        * `setup.py` changed to remove dependency
569        * `simplejson` can be loaded instead of python 2.6 `json` module
570        * Changed `NamedTemporaryFile` to `mkstemp` because delete parameter is not available before python 2.6
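        For reference, the `mkstemp` pattern looks roughly like this. The helper name
        `write_temp_file` is hypothetical; the point is that `mkstemp` returns an open file
        descriptor plus a path, and the caller is responsible for deleting the file.

```python
import os
import tempfile


def write_temp_file(text, suffix=".tmp"):
    """Write text to a new temporary file and return its path.

    Unlike NamedTemporaryFile, the file is never deleted automatically.
    """
    fd, path = tempfile.mkstemp(suffix=suffix, text=True)
    with os.fdopen(fd, "w") as handle:     # wrap the raw descriptor in a file object
        handle.write(text)
    return path


path = write_temp_file("job parameters\n")
with open(path) as handle:
    contents = handle.read()
os.remove(path)                            # explicit clean-up by the caller
print(repr(contents))
```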
571
572    ==Windows programmes==
        It is necessary to protect the "entry point" of the program under Windows.
        Otherwise, a new process will be created recursively, like the magician's apprentice.
        See: http://docs.python.org/library/multiprocessing.html#multiprocessing-programming
576
577= v. 1.0.3=
578    _04/June/2009_
579    ==Documentation ==
580        
581        Including SGE `qrsh` workaround in FAQ.
582
583= v. 1.0.1=
584    _22/May/2009_
585    ==Add simple tutorial.==
586    
587        No major bugs so far...!!
588
589= v. 1.0.0 beta =
590    _28/April/2009_
591
592    Initial Release in Oxford       
593