/tools/Ruby/lib/ruby/1.8/rdoc/markup/simple_markup.rb

# = Introduction
#
# SimpleMarkup parses plain text documents and attempts to decompose
# them into their constituent parts. Some of these parts are high-level:
# paragraphs, chunks of verbatim text, list entries and the like. Other
# parts happen at the character level: a piece of bold text, a word in
# code font. This markup is similar in spirit to that used on WikiWiki
# webs, where folks create web pages using a simple set of formatting
# rules.
#
# SimpleMarkup itself does no output formatting: this is left to a
# different set of classes.
#
# SimpleMarkup is extendable at runtime: you can add new markup
# elements to be recognised in the documents that SimpleMarkup parses.
#
# SimpleMarkup is intended to be the basis for a family of tools which
# share the common requirement that simple, plain text should be
# rendered in a variety of different output formats and media. It is
# envisaged that SimpleMarkup could be the basis for formatting RDoc
# style comment blocks, Wiki entries, and online FAQs.
#
# = Basic Formatting
#
# * SimpleMarkup looks for a document's natural left margin. This is
#   used as the initial margin for the document.
#
# * Consecutive lines starting at this margin are considered to be a
#   paragraph.
#
# * If a paragraph starts with a "*", "-", or with "<digit>.", then it is
#   taken to be the start of a list. The margin is increased to be the
#   first non-space following the list start flag. Subsequent lines
#   should be indented to this new margin until the list ends. For
#   example:
#
#      * this is a list with three paragraphs in
#        the first item. This is the first paragraph.
#
#        And this is the second paragraph.
#
#        1. This is an indented, numbered list.
#        2. This is the second item in that list
#
#        This is the third conventional paragraph in the
#        first list item.
#
#      * This is the second item in the original list
#
# * You can also construct labeled lists, sometimes called description
#   or definition lists. Do this by putting the label in square brackets
#   and indenting the list body:
#
#       [cat]  a small furry mammal
#              that seems to sleep a lot
#
#       [ant]  a little insect that is known
#              to enjoy picnics
#
#   A minor variation on labeled lists uses two colons to separate the
#   label from the list body:
#
#       cat::  a small furry mammal
#              that seems to sleep a lot
#
#       ant::  a little insect that is known
#              to enjoy picnics
#
#   This latter style guarantees that the list bodies' left margins are
#   aligned: think of them as a two column table.
#
# * Any line that starts to the right of the current margin is treated
#   as verbatim text. This is useful for code listings. The example of a
#   list above is also verbatim text.
#
# * A line starting with an equals sign (=) is treated as a
#   heading. Level one headings have one equals sign, level two headings
#   have two, and so on.
#
# * A line starting with three or more hyphens (at the current indent)
#   generates a horizontal rule. The more hyphens, the thicker the rule
#   (within reason, and if supported by the output device).
#
# * You can use markup within text (except verbatim) to change the
#   appearance of parts of that text. Out of the box, SimpleMarkup
#   supports word-based and general markup.
#
#   Word-based markup uses flag characters around individual words:
#
#   [\*word*]  displays word in a *bold* font
#   [\_word_]  displays word in an _emphasized_ font
#   [\+word+]  displays word in a +code+ font
#
#   General markup affects text between a start delimiter and an end
#   delimiter. Not surprisingly, these delimiters look like HTML markup.
#
#   [\<b>text...</b>]    displays word in a *bold* font
#   [\<em>text...</em>]  displays word in an _emphasized_ font
#   [\<i>text...</i>]    displays word in an _emphasized_ font
#   [\<tt>text...</tt>]  displays word in a +code+ font
#
#   Unlike conventional Wiki markup, general markup can cross line
#   boundaries. You can turn off the interpretation of markup by
#   preceding the first character with a backslash, so \\\<b>bold
#   text</b> and \\\*bold* produce \<b>bold text</b> and \*bold*
#   respectively.
#
# = Using SimpleMarkup
#
# For information on using SimpleMarkup programmatically,
# see SM::SimpleMarkup.
#
# Author::   Dave Thomas,  dave@pragmaticprogrammer.com
# Version::  0.0
# License::  Ruby license

require 'rdoc/markup/simple_markup/fragments'
require 'rdoc/markup/simple_markup/lines.rb'

module SM  #:nodoc:

  # == Synopsis
  #
  # This code converts <tt>input_string</tt>, which is in the format
  # described in markup/simple_markup.rb, to HTML. The conversion
  # takes place in the +convert+ method, so you can use the same
  # SimpleMarkup object to convert multiple input strings.
  #
  #   require 'rdoc/markup/simple_markup'
  #   require 'rdoc/markup/simple_markup/to_html'
  #
  #   p = SM::SimpleMarkup.new
  #   h = SM::ToHtml.new
  #
  #   puts p.convert(input_string, h)
  #
  # You can extend the SimpleMarkup parser to recognise new markup
  # sequences, and to add special processing for text that matches a
  # regular expression. Here we make WikiWords significant to the parser,
  # and also make the sequences {word} and \<no>text...</no> signify
  # strike-through text. We then subclass the HTML output class to deal
  # with these:
  #
  #   require 'rdoc/markup/simple_markup'
  #   require 'rdoc/markup/simple_markup/to_html'
  #
  #   class WikiHtml < SM::ToHtml
  #     def handle_special_WIKIWORD(special)
  #       "<font color=red>" + special.text + "</font>"
  #     end
  #   end
  #
  #   p = SM::SimpleMarkup.new
  #   p.add_word_pair("{", "}", :STRIKE)
  #   p.add_html("no", :STRIKE)
  #
  #   p.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)
  #
  #   h = WikiHtml.new
  #   h.add_tag(:STRIKE, "<strike>", "</strike>")
  #
  #   puts "<body>" + p.convert(ARGF.read, h) + "</body>"
  #
  # == Output Formatters
  #
  # _missing_
  #
  #

  class SimpleMarkup

    SPACE = ?\s

    # List entries look like:
    #  *       text
    #  1.      text
    #  [label] text
    #  label:: text
    #
    # Flag it as a list entry, and
    # work out the indent for subsequent lines

    SIMPLE_LIST_RE = /^(
                  (  \*          (?# bullet)
                    |-           (?# bullet)
                    |\d+\.       (?# numbered )
                    |[A-Za-z]\.  (?# alphabetically numbered )
                  )
                  \s+
                )\S/x

    LABEL_LIST_RE = /^(
                        (  \[.*?\]    (?# labeled  )
                          |\S.*::     (?# note     )
                        )(?:\s+|$)
                      )/x

    ##
    # Take a block of text and use various heuristics to determine
    # its structure (paragraphs, lists, and so on). Invoke an
    # event handler as we identify significant chunks.
    #

    def initialize
      @am = AttributeManager.new
      @output = nil
    end

    ##
    # Add to the sequences used to add formatting to an individual word
    # (such as *bold*). Matching entries will generate attributes
    # that the output formatters can recognize by their +name+

    def add_word_pair(start, stop, name)
      @am.add_word_pair(start, stop, name)
    end

    ##
    # Add to the sequences recognized as general markup
    #

    def add_html(tag, name)
      @am.add_html(tag, name)
    end

    ##
    # Add to other inline sequences. For example, we could add
    # WikiWords using something like:
    #
    #    parser.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)
    #
    # Each wiki word will be presented to the output formatter
    # via the accept_special method
    #

    def add_special(pattern, name)
      @am.add_special(pattern, name)
    end

    # We take a string, split it into lines, work out the type of
    # each line, and from there deduce groups of lines (for example
    # all lines in a paragraph). We then invoke the output formatter
    # using a Visitor to display the result

    def convert(str, op)
      @lines = Lines.new(str.split(/\r?\n/).collect { |aLine|
                           Line.new(aLine) })
      return "" if @lines.empty?
      @lines.normalize
      assign_types_to_lines
      group = group_lines
      # call the output formatter to handle the result
      #      group.to_a.each {|i| p i}
      group.accept(@am, op)
    end

    #######
    private
    #######

    ##
    # Look through the text at line indentation. We flag each line as being
    # blank, a paragraph, a list element, or verbatim text
    #

    def assign_types_to_lines(margin = 0, level = 0)

      while line = @lines.next
        if line.isBlank?
          line.stamp(Line::BLANK, level)
          next
        end

        # if a line contains non-blanks before the margin, then it must belong
        # to an outer level

        text = line.text

        for i in 0...margin
          if text[i] != SPACE
            @lines.unget
            return
          end
        end

        active_line = text[margin..-1]

        # Rules (horizontal lines) look like
        #
        #  ---   (three or more hyphens)
        #
        # The more hyphens, the thicker the rule
        #

        if /^(---+)\s*$/ =~ active_line
          line.stamp(Line::RULE, level, $1.length-2)
          next
        end

        # Then look for list entries. First the ones that have to have
        # text following them (* xxx, - xxx, and dd. xxx)

        if SIMPLE_LIST_RE =~ active_line

          offset = margin + $1.length
          prefix = $2
          prefix_length = prefix.length

          flag = case prefix
                 when "*","-" then ListBase::BULLET
                 when /^\d/   then ListBase::NUMBER
                 when /^[A-Z]/ then ListBase::UPPERALPHA
                 when /^[a-z]/ then ListBase::LOWERALPHA
                 else raise "Invalid List Type: #{self.inspect}"
                 end

          line.stamp(Line::LIST, level+1, prefix, flag)
          text[margin, prefix_length] = " " * prefix_length
          assign_types_to_lines(offset, level + 1)
          next
        end

        if LABEL_LIST_RE =~ active_line
          offset = margin + $1.length
          prefix = $2
          prefix_length = prefix.length

          next if handled_labeled_list(line, level, margin, offset, prefix)
        end

        # Headings look like
        # = Main heading
        # == Second level
        # === Third
        #
        # Headings reset the level to 0

        if active_line[0] == ?= and active_line =~ /^(=+)\s*(.*)/
          prefix_length = $1.length
          prefix_length = 6 if prefix_length > 6
          line.stamp(Line::HEADING, 0, prefix_length)
          line.strip_leading(margin + prefix_length)
          next
        end

        # If the character's a space, then we have verbatim text,
        # otherwise it is a paragraph

        if active_line[0] == SPACE
          line.strip_leading(margin) if margin > 0
          line.stamp(Line::VERBATIM, level)
        else
          line.stamp(Line::PARAGRAPH, level)
        end
      end
    end

    # Handle labeled list entries. We have a special case
    # to deal with. Because the labels can be long, they force
    # the remaining block of text over to the right:
    #
    #     this is a long label that I wrote:: and here is the
    #                                         block of text with
    #                                         a silly margin
    #
    # So we allow the special case. If the label is followed
    # by nothing, and if the following line is indented, then
    # we take the indent of that line as the new margin
    #
    #     this is a long label that I wrote::
    #         here is a more reasonably indented block which
    #         will be attached to the label.
    #

    def handled_labeled_list(line, level, margin, offset, prefix)
      prefix_length = prefix.length
      text = line.text
      flag = nil
      case prefix
      when /^\[/
        flag = ListBase::LABELED
        prefix = prefix[1, prefix.length-2]
      when /:$/
        flag = ListBase::NOTE
        prefix.chop!
      else raise "Invalid List Type: #{self.inspect}"
      end

      # body is on the next line

      if text.length <= offset
        original_line = line
        line = @lines.next
        return(false) unless line
        text = line.text

        for i in 0..margin
          if text[i] != SPACE
            @lines.unget
            return false
          end
        end
        i = margin
        i += 1 while text[i] == SPACE
        if i >= text.length
          @lines.unget
          return false
        else
          offset = i
          prefix_length = 0
          @lines.delete(original_line)
        end
      end

      line.stamp(Line::LIST, level+1, prefix, flag)
      text[margin, prefix_length] = " " * prefix_length
      assign_types_to_lines(offset, level + 1)
      return true
    end

    # Return a block consisting of fragments which are
    # paragraphs, list entries or verbatim text. We merge consecutive
    # lines of the same type and level together. We are also slightly
    # tricky with lists: the lines following a list introduction
    # look like paragraph lines at the next level, and we remap them
    # into list entries instead

    def group_lines
      @lines.rewind

      inList = false
      wantedType = wantedLevel = nil

      block = LineCollection.new
      group = nil

      while line = @lines.next
        if line.level == wantedLevel and line.type == wantedType
          group.add_text(line.text)
        else
          group = block.fragment_for(line)
          block.add(group)
          if line.type == Line::LIST
            wantedType = Line::PARAGRAPH
          else
            wantedType = line.type
          end
          wantedLevel = line.type == Line::HEADING ? line.param : line.level
        end
      end

      block.normalize
      block
    end

    ## for debugging, we allow access to our line contents as text
    def content
      @lines.as_text
    end
    public :content

    ## for debugging, return the list of line types
    def get_line_types
      @lines.line_types
    end
    public :get_line_types
  end

end
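
The list detection in assign_types_to_lines can be tried out in isolation. The following is a minimal sketch, not part of SimpleMarkup: it copies the two list patterns from the listing so that it runs without the Ruby 1.8 rdoc library, and list_prefix is a hypothetical helper introduced only for illustration. It reports the prefix ($2 in the listing) and the new margin (margin plus the length of $1) that the parser would use for the lines following a list entry.

```ruby
# Standalone copies of the two list patterns from the listing,
# so this sketch does not depend on the Ruby 1.8 rdoc library.
SIMPLE_LIST_RE = /^(
  (  \*          (?# bullet)
    |-           (?# bullet)
    |\d+\.       (?# numbered )
    |[A-Za-z]\.  (?# alphabetically numbered )
  )
  \s+
)\S/x

LABEL_LIST_RE = /^(
  (  \[.*?\]     (?# labeled )
    |\S.*::      (?# note )
  )(?:\s+|$)
)/x

# Hypothetical helper (an assumption for illustration, not in SimpleMarkup).
# Returns the kind of list entry, its prefix, and the margin offset for the
# lines that follow, or nil when the line is not a list entry.
def list_prefix(line, margin = 0)
  active = line[margin..-1]
  if (m = SIMPLE_LIST_RE.match(active))
    { :kind => :simple, :prefix => m[2], :offset => margin + m[1].length }
  elsif (m = LABEL_LIST_RE.match(active))
    { :kind => :label, :prefix => m[2], :offset => margin + m[1].length }
  end
end

["* item", "12. text", "cat::  a small furry mammal", "plain paragraph"].each do |l|
  info = list_prefix(l)
  if info
    puts "#{l.inspect}: #{info[:kind]} prefix=#{info[:prefix].inspect} offset=#{info[:offset]}"
  else
    puts "#{l.inspect}: not a list entry"
  end
end
```

As in the listing, the simple pattern is tried first, so a line such as "a. text" is treated as an alphabetically numbered entry rather than a label.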