/PAPER

http://github.com/fizx/parsley
Abstract
================================================================
A common programming task is data extraction from XML and HTML documents.  I introduce parsley, an embedded language (à la SQL or regular expressions) that improves on the usability and/or speed of current extraction techniques.

Introduction
================================================================

Today, developers use a couple of toolsets for data extraction.  Many use libraries such as Hpricot for Ruby and Beautiful Soup for Python.  These libraries extract XML subtrees via XPath or CSS selectors.  The subtrees are then further refined in the host scripting language, often with the help of regular expressions.
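
The library-based workflow described above (select subtrees, then refine them in the scripting language) can be sketched roughly as follows.  This is only an illustration under stated assumptions: it uses Python's standard-library ElementTree and its limited XPath subset in place of Hpricot or Beautiful Soup, and the sample markup, the `PRICE_RE` pattern, and the `extract_prices` helper are hypothetical names invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration only.
SAMPLE = """
<html>
  <body>
    <div class="item"><span class="price">Price: $12.50</span></div>
    <div class="item"><span class="price">Price: $3.99</span></div>
  </body>
</html>
""".strip()

# Hypothetical refinement pattern: a dollar amount with two decimals.
PRICE_RE = re.compile(r"\$(\d+\.\d{2})")


def extract_prices(markup):
    """Select subtrees with an XPath expression, then refine each with a regex."""
    root = ET.fromstring(markup)
    prices = []
    # Step 1: select candidate subtrees (ElementTree supports an XPath subset).
    for node in root.findall(".//span[@class='price']"):
        # Step 2: refine the selected text in the host language.
        match = PRICE_RE.search(node.text or "")
        if match:
            prices.append(float(match.group(1)))
    return prices


if __name__ == "__main__":
    print(extract_prices(SAMPLE))
```

Note that the selection logic (the XPath string) and the refinement logic (the regular expression and the loop) live in two different notations, glued together by host-language code.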

Other developers use XSLT.  While fast, mature, and conceptually elegant, XSLT

- current techniques
- benefits of standardization
- best of current

Features
================================================================
- integrated grammars
  - with some expression examples
- multiple elements, one pass / context switching
- exslt / standard library
- json
- language integration
- pruning
- structural parsing

Examples
- Ruby/python/json
- structural parse
- 

Benchmarks
- size comparison with XSLT
- speed comparison with nokogiri, hpricot

Conclusion