/tools/Ruby/lib/ruby/1.8/rdoc/markup/simple_markup.rb

http://github.com/agross/netopenspace
# = Introduction
#
# SimpleMarkup parses plain text documents and attempts to decompose
# them into their constituent parts. Some of these parts are high-level:
# paragraphs, chunks of verbatim text, list entries and the like. Other
# parts happen at the character level: a piece of bold text, a word in
# code font. This markup is similar in spirit to that used on WikiWiki
# webs, where folks create web pages using a simple set of formatting
# rules.
#
# SimpleMarkup itself does no output formatting: this is left to a
# different set of classes.
#
# SimpleMarkup is extendable at runtime: you can add new markup
# elements to be recognised in the documents that SimpleMarkup parses.
#
# SimpleMarkup is intended to be the basis for a family of tools which
# share the common requirement that simple plain text should be
# rendered in a variety of different output formats and media. It is
# envisaged that SimpleMarkup could be the basis for formatting RDoc
# style comment blocks, Wiki entries, and online FAQs.
#
# = Basic Formatting
#
# * SimpleMarkup looks for a document's natural left margin. This is
#   used as the initial margin for the document.
#
# * Consecutive lines starting at this margin are considered to be a
#   paragraph.
#
# * If a paragraph starts with a "*", "-", or with "<digit>.", then it is
#   taken to be the start of a list. The margin is increased to be the
#   first non-space following the list start flag. Subsequent lines
#   should be indented to this new margin until the list ends. For
#   example:
#
#      * this is a list with three paragraphs in
#        the first item. This is the first paragraph.
#
#        And this is the second paragraph.
#
#        1. This is an indented, numbered list.
#        2. This is the second item in that list
#
#        This is the third conventional paragraph in the
#        first list item.
#
#      * This is the second item in the original list
#
# * You can also construct labeled lists, sometimes called description
#   or definition lists. Do this by putting the label in square brackets
#   and indenting the list body:
#
#       [cat]  a small furry mammal
#              that seems to sleep a lot
#
#       [ant]  a little insect that is known
#              to enjoy picnics
#
#   A minor variation on labeled lists uses two colons to separate the
#   label from the list body:
#
#       cat::  a small furry mammal
#              that seems to sleep a lot
#
#       ant::  a little insect that is known
#              to enjoy picnics
#
#   This latter style guarantees that the list bodies' left margins are
#   aligned: think of them as a two column table.
#
# * Any line that starts to the right of the current margin is treated
#   as verbatim text. This is useful for code listings. The example of a
#   list above is also verbatim text.
#
# * A line starting with an equals sign (=) is treated as a
#   heading. Level one headings have one equals sign, level two headings
#   have two, and so on.
#
# * A line starting with three or more hyphens (at the current indent)
#   generates a horizontal rule. The more hyphens, the thicker the rule
#   (within reason, and if supported by the output device).
#
# * You can use markup within text (except verbatim) to change the
#   appearance of parts of that text. Out of the box, SimpleMarkup
#   supports word-based and general markup.
#
#   Word-based markup uses flag characters around individual words:
#
#   [\*word*]  displays word in a *bold* font
#   [\_word_]  displays word in an _emphasized_ font
#   [\+word+]  displays word in a +code+ font
#
#   General markup affects text between a start delimiter and an end
#   delimiter. Not surprisingly, these delimiters look like HTML markup.
#
#   [\<b>text...</b>]    displays word in a *bold* font
#   [\<em>text...</em>]  displays word in an _emphasized_ font
#   [\<i>text...</i>]    displays word in an _emphasized_ font
#   [\<tt>text...</tt>]  displays word in a +code+ font
#
#   Unlike conventional Wiki markup, general markup can cross line
#   boundaries. You can turn off the interpretation of markup by
#   preceding the first character with a backslash, so \\\<b>bold
#   text</b> and \\\*bold* produce \<b>bold text</b> and \*bold*
#   respectively.
#
# = Using SimpleMarkup
#
# For information on using SimpleMarkup programmatically,
# see SM::SimpleMarkup.
#
# Author::   Dave Thomas,  dave@pragmaticprogrammer.com
# Version::  0.0
# License::  Ruby license

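=begin
The word-based markup rules in the header comment can be illustrated with a
small standalone sketch. This is not how SimpleMarkup itself works (the real
parser hands inline markup to its attribute manager and leaves rendering to
the output formatters). The names render_word_markup and WORD_FLAGS are
hypothetical, and mapping the flags to <b>, <em> and <tt> is an assumption
made purely for illustration.

```ruby
# Hypothetical mapping from flag character to an HTML tag name.
WORD_FLAGS = { "*" => "b", "_" => "em", "+" => "tt" }.freeze

# Wrap flagged words in tags; a leading backslash turns the markup off.
def render_word_markup(text)
  text.gsub(/(?<![\w\\])(\\?)([*_+])(\w+)\2(?!\w)/) do
    m = Regexp.last_match
    escaped, flag, word = m[1], m[2], m[3]
    if escaped.empty?
      tag = WORD_FLAGS[flag]
      "<#{tag}>#{word}</#{tag}>"
    else
      "#{flag}#{word}#{flag}" # drop the backslash, keep the literal flags
    end
  end
end

puts render_word_markup('a *bold* word, some +code+ and \*plain*')
```

The lookbehind and lookahead keep identifiers such as snake_case_name from
being mistaken for emphasized words.
=end
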
require 'rdoc/markup/simple_markup/fragments'
require 'rdoc/markup/simple_markup/lines.rb'

module SM  #:nodoc:

  # == Synopsis
  #
  # This code converts <tt>input_string</tt>, which is in the format
  # described in markup/simple_markup.rb, to HTML. The conversion
  # takes place in the +convert+ method, so you can use the same
  # SimpleMarkup object to convert multiple input strings.
  #
  #   require 'rdoc/markup/simple_markup'
  #   require 'rdoc/markup/simple_markup/to_html'
  #
  #   p = SM::SimpleMarkup.new
  #   h = SM::ToHtml.new
  #
  #   puts p.convert(input_string, h)
  #
  # You can extend the SimpleMarkup parser to recognise new markup
  # sequences, and to add special processing for text that matches a
  # regular expression. Here we make WikiWords significant to the parser,
  # and also make the sequences {word} and \<no>text...</no> signify
  # strike-through text. We then subclass the HTML output class to deal
  # with these:
  #
  #   require 'rdoc/markup/simple_markup'
  #   require 'rdoc/markup/simple_markup/to_html'
  #
  #   class WikiHtml < SM::ToHtml
  #     def handle_special_WIKIWORD(special)
  #       "<font color=red>" + special.text + "</font>"
  #     end
  #   end
  #
  #   p = SM::SimpleMarkup.new
  #   p.add_word_pair("{", "}", :STRIKE)
  #   p.add_html("no", :STRIKE)
  #
  #   p.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)
  #
  #   h = WikiHtml.new
  #   h.add_tag(:STRIKE, "<strike>", "</strike>")
  #
  #   puts "<body>" + p.convert(ARGF.read, h) + "</body>"
  #
  # == Output Formatters
  #
  # _missing_
  #
  #

  class SimpleMarkup

    SPACE = ?\s

    # List entries look like:
    #  *       text
    #  1.      text
    #  [label] text
    #  label:: text
    #
    # Flag it as a list entry, and
    # work out the indent for subsequent lines

    SIMPLE_LIST_RE = /^(
                  (  \*          (?# bullet)
                    |-           (?# bullet)
                    |\d+\.       (?# numbered )
                    |[A-Za-z]\.  (?# alphabetically numbered )
                  )
                  \s+
                )\S/x

    LABEL_LIST_RE = /^(
                        (  \[.*?\]    (?# labeled  )
                          |\S.*::     (?# note     )
                        )(?:\s+|$)
                      )/x
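=begin
A simplified look at what the two list patterns above capture. Group 1 is the
list flag plus its trailing whitespace (its length is what the parser adds to
the margin to get the new offset) and group 2 is the flag itself. The patterns
below are compact restatements of SIMPLE_LIST_RE and LABEL_LIST_RE without the
extended-mode layout; list_prefix is a hypothetical helper that does not exist
in this file.

```ruby
SIMPLE_LIST = /^((\*|-|\d+\.|[A-Za-z]\.)\s+)\S/
LABEL_LIST  = /^((\[.*?\]|\S.*::)(?:\s+|$))/

# Return the list flag and the offset of the body text, or nil for non-list lines.
def list_prefix(line)
  m = SIMPLE_LIST.match(line) || LABEL_LIST.match(line)
  return nil unless m
  { prefix: m[2], offset: m[1].length }
end

p list_prefix("* item")
p list_prefix("12.  numbered")
p list_prefix("[cat]  a small furry mammal")
p list_prefix("cat::  a small furry mammal")
p list_prefix("plain text")
```

Note that the simple pattern insists on a non-space character after the flag,
while the label pattern also accepts a label that ends the line.
=end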


    ##
    # Take a block of text and use various heuristics to determine
    # its structure (paragraphs, lists, and so on). Invoke an
    # event handler as we identify significant chunks.
    #

    def initialize
      @am = AttributeManager.new
      @output = nil
    end

    ##
    # Add to the sequences used to add formatting to an individual word
    # (such as *bold*). Matching entries will generate attributes
    # that the output formatters can recognize by their +name+.

    def add_word_pair(start, stop, name)
      @am.add_word_pair(start, stop, name)
    end

    ##
    # Add to the sequences recognized as general markup
    #

    def add_html(tag, name)
      @am.add_html(tag, name)
    end

    ##
    # Add to other inline sequences. For example, we could add
    # WikiWords using something like:
    #
    #    parser.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)
    #
    # Each wiki word will be presented to the output formatter
    # via the accept_special method
    #

    def add_special(pattern, name)
      @am.add_special(pattern, name)
    end


    # We take a string, split it into lines, work out the type of
    # each line, and from there deduce groups of lines (for example
    # all lines in a paragraph). We then invoke the output formatter
    # using a Visitor to display the result.

    def convert(str, op)
      @lines = Lines.new(str.split(/\r?\n/).collect { |aLine|
                           Line.new(aLine) })
      return "" if @lines.empty?
      @lines.normalize
      assign_types_to_lines
      group = group_lines
      # call the output formatter to handle the result
      #      group.to_a.each {|i| p i}
      group.accept(@am, op)
    end


    #######
    private
    #######


    ##
    # Look through the text at line indentation. We flag each line as being
    # blank, a paragraph, a list element, or verbatim text.
    #

    def assign_types_to_lines(margin = 0, level = 0)

      while line = @lines.next
        if line.isBlank?
          line.stamp(Line::BLANK, level)
          next
        end

        # if a line contains non-blanks before the margin, then it must belong
        # to an outer level

        text = line.text

        for i in 0...margin
          if text[i] != SPACE
            @lines.unget
            return
          end
        end

        active_line = text[margin..-1]

        # Rules (horizontal lines) look like
        #
        #  ---   (three or more hyphens)
        #
        # The more hyphens, the thicker the rule
        #

        if /^(---+)\s*$/ =~ active_line
          line.stamp(Line::RULE, level, $1.length-2)
          next
        end

        # Then look for list entries. First the ones that have to have
        # text following them (* xxx, - xxx, and dd. xxx)

        if SIMPLE_LIST_RE =~ active_line

          offset = margin + $1.length
          prefix = $2
          prefix_length = prefix.length

          flag = case prefix
                 when "*","-" then ListBase::BULLET
                 when /^\d/   then ListBase::NUMBER
                 when /^[A-Z]/ then ListBase::UPPERALPHA
                 when /^[a-z]/ then ListBase::LOWERALPHA
                 else raise "Invalid List Type: #{self.inspect}"
                 end

          line.stamp(Line::LIST, level+1, prefix, flag)
          text[margin, prefix_length] = " " * prefix_length
          assign_types_to_lines(offset, level + 1)
          next
        end


        if LABEL_LIST_RE =~ active_line
          offset = margin + $1.length
          prefix = $2
          prefix_length = prefix.length

          next if handled_labeled_list(line, level, margin, offset, prefix)
        end

        # Headings look like
        # = Main heading
        # == Second level
        # === Third
        #
        # Headings reset the level to 0

        if active_line[0] == ?= and active_line =~ /^(=+)\s*(.*)/
          prefix_length = $1.length
          prefix_length = 6 if prefix_length > 6
          line.stamp(Line::HEADING, 0, prefix_length)
          line.strip_leading(margin + prefix_length)
          next
        end

        # If the character's a space, then we have verbatim text,
        # otherwise it is a paragraph

        if active_line[0] == SPACE
          line.strip_leading(margin) if margin > 0
          line.stamp(Line::VERBATIM, level)
        else
          line.stamp(Line::PARAGRAPH, level)
        end
      end
    end
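
=begin
The non-list branches of assign_types_to_lines can be summarised in a
standalone sketch. It assumes a margin of 0, skips the list handling and the
recursion entirely, and returns symbols instead of stamping Line objects;
classify is a hypothetical name, not part of this class. The rule weight
(hyphen count minus 2) and the cap of 6 on heading levels follow the method
above.

```ruby
# Classify one line at margin 0, mirroring the order of the checks above
# (list detection omitted).
def classify(line)
  case line
  when /^\s*$/        then [:blank]
  when /^(---+)\s*$/  then [:rule, $1.length - 2]
  when /^(=+)\s*(.*)/ then [:heading, [$1.length, 6].min]
  when /^ /           then [:verbatim]
  else                     [:paragraph]
  end
end

p classify("-----")
p classify("== Second level")
p classify("   indented code")
p classify("ordinary text")
```
=end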

    # Handle labeled list entries. We have a special case
    # to deal with. Because the labels can be long, they force
    # the remaining block of text over to the right:
    #
    # this is a long label that I wrote:: and here is the
    #                                     block of text with
    #                                     a silly margin
    #
    # So we allow the special case. If the label is followed
    # by nothing, and if the following line is indented, then
    # we take the indent of that line as the new margin
    #
    # this is a long label that I wrote::
    #     here is a more reasonably indented block which
    #     will be attached to the label.
    #

    def handled_labeled_list(line, level, margin, offset, prefix)
      prefix_length = prefix.length
      text = line.text
      flag = nil
      case prefix
      when /^\[/
        flag = ListBase::LABELED
        prefix = prefix[1, prefix.length-2]
      when /:$/
        flag = ListBase::NOTE
        prefix.chop!
      else raise "Invalid List Type: #{self.inspect}"
      end

      # body is on the next line

      if text.length <= offset
        original_line = line
        line = @lines.next
        return(false) unless line
        text = line.text

        for i in 0..margin
          if text[i] != SPACE
            @lines.unget
            return false
          end
        end
        i = margin
        i += 1 while text[i] == SPACE
        if i >= text.length
          @lines.unget
          return false
        else
          offset = i
          prefix_length = 0
          @lines.delete(original_line)
        end
      end

      line.stamp(Line::LIST, level+1, prefix, flag)
      text[margin, prefix_length] = " " * prefix_length
      assign_types_to_lines(offset, level + 1)
      return true
    end
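
=begin
The case statement at the top of handled_labeled_list turns the matched
prefix into a list type and a label. A standalone restatement, using symbols
in place of the ListBase constants, is sketched below; split_label is a
hypothetical name. One detail worth seeing in isolation: the note form is
chopped only once, so a single trailing colon stays on the label.

```ruby
# Map a matched prefix to [list type, label text].
def split_label(prefix)
  case prefix
  when /^\[/ then [:labeled, prefix[1, prefix.length - 2]] # strip the brackets
  when /:$/  then [:note, prefix.chop]                     # drop one colon only
  else raise ArgumentError, "Invalid list type: #{prefix.inspect}"
  end
end

p split_label("[cat]")
p split_label("cat::")
```
=end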

    # Return a block consisting of fragments which are
    # paragraphs, list entries or verbatim text. We merge consecutive
    # lines of the same type and level together. We are also slightly
    # tricky with lists: the lines following a list introduction
    # look like paragraph lines at the next level, and we remap them
    # into list entries instead

    def group_lines
      @lines.rewind

      inList = false
      wantedType = wantedLevel = nil

      block = LineCollection.new
      group = nil

      while line = @lines.next
        if line.level == wantedLevel and line.type == wantedType
          group.add_text(line.text)
        else
          group = block.fragment_for(line)
          block.add(group)
          if line.type == Line::LIST
            wantedType = Line::PARAGRAPH
          else
            wantedType = line.type
          end
          wantedLevel = line.type == Line::HEADING ? line.param : line.level
        end
      end

      block.normalize
      block
    end
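
=begin
The merging rule in group_lines can be tried out on plain data. The sketch
below is a simplification under stated assumptions: StampedLine and
group_stamped are hypothetical names, fragments are plain hashes rather than
the Fragment classes, merged text is joined with a single space (the real
joining is done by add_text, which is defined elsewhere), and the special
level handling for headings is left out.

```ruby
StampedLine = Struct.new(:type, :level, :text)

# Merge consecutive lines of the same type and level. After a list line,
# paragraph lines at the same level are folded into the list entry.
def group_stamped(lines)
  groups = []
  wanted_type = wanted_level = nil
  lines.each do |line|
    if line.level == wanted_level && line.type == wanted_type
      groups.last[:text] << " " << line.text
    else
      groups << { type: line.type, level: line.level, text: line.text.dup }
      wanted_type  = line.type == :list ? :paragraph : line.type
      wanted_level = line.level
    end
  end
  groups
end

sample = [
  StampedLine.new(:paragraph, 0, "one"),
  StampedLine.new(:paragraph, 0, "two"),
  StampedLine.new(:list,      1, "item"),
  StampedLine.new(:paragraph, 1, "more"),
  StampedLine.new(:verbatim,  0, "  code")
]
group_stamped(sample).each { |g| p g }
```
=end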

    ## for debugging, we allow access to our line contents as text
    def content
      @lines.as_text
    end
    public :content

    ## for debugging, return the list of line types
    def get_line_types
      @lines.line_types
    end
    public :get_line_types
  end

end