PageRenderTime 75ms CodeModel.GetById 22ms RepoModel.GetById 0ms app.codeStats 0ms

/documentor/libraries/docutils-0.9.1-py3.2/docutils/parsers/rst/states.py

https://github.com/tictactatic/Superdesk
Python | 3049 lines | 2953 code | 21 blank | 75 comment | 48 complexity | 14748e446acf36c12334059de74259fe MD5 | raw file
Possible License(s): BSD-3-Clause, GPL-3.0, GPL-2.0
  1. # $Id: states.py 7363 2012-02-20 21:31:48Z goodger $
  2. # Author: David Goodger <goodger@python.org>
  3. # Copyright: This module has been placed in the public domain.
  4. """
  5. This is the ``docutils.parsers.rst.states`` module, the core of
  6. the reStructuredText parser. It defines the following:
  7. :Classes:
  8. - `RSTStateMachine`: reStructuredText parser's entry point.
  9. - `NestedStateMachine`: recursive StateMachine.
  10. - `RSTState`: reStructuredText State superclass.
  11. - `Inliner`: For parsing inline markup.
  12. - `Body`: Generic classifier of the first line of a block.
  13. - `SpecializedBody`: Superclass for compound element members.
  14. - `BulletList`: Second and subsequent bullet_list list_items
  15. - `DefinitionList`: Second+ definition_list_items.
  16. - `EnumeratedList`: Second+ enumerated_list list_items.
  17. - `FieldList`: Second+ fields.
  18. - `OptionList`: Second+ option_list_items.
  19. - `RFC2822List`: Second+ RFC2822-style fields.
  20. - `ExtensionOptions`: Parses directive option fields.
  21. - `Explicit`: Second+ explicit markup constructs.
  22. - `SubstitutionDef`: For embedded directives in substitution definitions.
  23. - `Text`: Classifier of second line of a text block.
  24. - `SpecializedText`: Superclass for continuation lines of Text-variants.
  25. - `Definition`: Second line of potential definition_list_item.
  26. - `Line`: Second line of overlined section title or transition marker.
  27. - `Struct`: An auxiliary collection class.
  28. :Exception classes:
  29. - `MarkupError`
  30. - `ParserError`
  31. - `MarkupMismatch`
  32. :Functions:
  33. - `escape2null()`: Return a string, escape-backslashes converted to nulls.
  34. - `unescape()`: Return a string, nulls removed or restored to backslashes.
  35. :Attributes:
  36. - `state_classes`: set of State classes used with `RSTStateMachine`.
  37. Parser Overview
  38. ===============
  39. The reStructuredText parser is implemented as a recursive state machine,
  40. examining its input one line at a time. To understand how the parser works,
  41. please first become familiar with the `docutils.statemachine` module. In the
  42. description below, references are made to classes defined in this module;
  43. please see the individual classes for details.
  44. Parsing proceeds as follows:
  45. 1. The state machine examines each line of input, checking each of the
  46. transition patterns of the state `Body`, in order, looking for a match.
  47. The implicit transitions (blank lines and indentation) are checked before
  48. any others. The 'text' transition is a catch-all (matches anything).
  49. 2. The method associated with the matched transition pattern is called.
  50. A. Some transition methods are self-contained, appending elements to the
  51. document tree (`Body.doctest` parses a doctest block). The parser's
  52. current line index is advanced to the end of the element, and parsing
  53. continues with step 1.
  54. B. Other transition methods trigger the creation of a nested state machine,
  55. whose job is to parse a compound construct ('indent' does a block quote,
  56. 'bullet' does a bullet list, 'overline' does a section [first checking
  57. for a valid section header], etc.).
  58. - In the case of lists and explicit markup, a one-off state machine is
  59. created and run to parse contents of the first item.
  60. - A new state machine is created and its initial state is set to the
  61. appropriate specialized state (`BulletList` in the case of the
  62. 'bullet' transition; see `SpecializedBody` for more detail). This
  63. state machine is run to parse the compound element (or series of
  64. explicit markup elements), and returns as soon as a non-member element
  65. is encountered. For example, the `BulletList` state machine ends as
  66. soon as it encounters an element which is not a list item of that
  67. bullet list. The optional omission of inter-element blank lines is
  68. enabled by this nested state machine.
  69. - The current line index is advanced to the end of the elements parsed,
  70. and parsing continues with step 1.
  71. C. The result of the 'text' transition depends on the next line of text.
  72. The current state is changed to `Text`, under which the second line is
  73. examined. If the second line is:
  74. - Indented: The element is a definition list item, and parsing proceeds
  75. similarly to step 2.B, using the `DefinitionList` state.
  76. - A line of uniform punctuation characters: The element is a section
  77. header; again, parsing proceeds as in step 2.B, and `Body` is still
  78. used.
  79. - Anything else: The element is a paragraph, which is examined for
  80. inline markup and appended to the parent element. Processing
  81. continues with step 1.
  82. """
  83. __docformat__ = 'reStructuredText'
  84. import sys
  85. import re
  86. try:
  87. import roman
  88. except ImportError:
  89. import docutils.utils.roman as roman
  90. from types import FunctionType, MethodType
  91. from docutils import nodes, statemachine, utils, urischemes
  92. from docutils import ApplicationError, DataError
  93. from docutils.statemachine import StateMachineWS, StateWS
  94. from docutils.nodes import fully_normalize_name as normalize_name
  95. from docutils.nodes import whitespace_normalize_name
  96. import docutils.parsers.rst
  97. from docutils.parsers.rst import directives, languages, tableparser, roles
  98. from docutils.parsers.rst.languages import en as _fallback_language_module
  99. from docutils.utils import escape2null, unescape, column_width
  100. from docutils.utils import punctuation_chars
  101. class MarkupError(DataError): pass
  102. class UnknownInterpretedRoleError(DataError): pass
  103. class InterpretedRoleNotImplementedError(DataError): pass
  104. class ParserError(ApplicationError): pass
  105. class MarkupMismatch(Exception): pass
  106. class Struct:
  107. """Stores data attributes for dotted-attribute access."""
  108. def __init__(self, **keywordargs):
  109. self.__dict__.update(keywordargs)
  110. class RSTStateMachine(StateMachineWS):
  111. """
  112. reStructuredText's master StateMachine.
  113. The entry point to reStructuredText parsing is the `run()` method.
  114. """
  115. def run(self, input_lines, document, input_offset=0, match_titles=True,
  116. inliner=None):
  117. """
  118. Parse `input_lines` and modify the `document` node in place.
  119. Extend `StateMachineWS.run()`: set up parse-global data and
  120. run the StateMachine.
  121. """
  122. self.language = languages.get_language(
  123. document.settings.language_code)
  124. self.match_titles = match_titles
  125. if inliner is None:
  126. inliner = Inliner()
  127. inliner.init_customizations(document.settings)
  128. self.memo = Struct(document=document,
  129. reporter=document.reporter,
  130. language=self.language,
  131. title_styles=[],
  132. section_level=0,
  133. section_bubble_up_kludge=False,
  134. inliner=inliner)
  135. self.document = document
  136. self.attach_observer(document.note_source)
  137. self.reporter = self.memo.reporter
  138. self.node = document
  139. results = StateMachineWS.run(self, input_lines, input_offset,
  140. input_source=document['source'])
  141. assert results == [], 'RSTStateMachine.run() results should be empty!'
  142. self.node = self.memo = None # remove unneeded references
  143. class NestedStateMachine(StateMachineWS):
  144. """
  145. StateMachine run from within other StateMachine runs, to parse nested
  146. document structures.
  147. """
  148. def run(self, input_lines, input_offset, memo, node, match_titles=True):
  149. """
  150. Parse `input_lines` and populate a `docutils.nodes.document` instance.
  151. Extend `StateMachineWS.run()`: set up document-wide data.
  152. """
  153. self.match_titles = match_titles
  154. self.memo = memo
  155. self.document = memo.document
  156. self.attach_observer(self.document.note_source)
  157. self.reporter = memo.reporter
  158. self.language = memo.language
  159. self.node = node
  160. results = StateMachineWS.run(self, input_lines, input_offset)
  161. assert results == [], ('NestedStateMachine.run() results should be '
  162. 'empty!')
  163. return results
  164. class RSTState(StateWS):
  165. """
  166. reStructuredText State superclass.
  167. Contains methods used by all State subclasses.
  168. """
  169. nested_sm = NestedStateMachine
  170. nested_sm_cache = []
  171. def __init__(self, state_machine, debug=False):
  172. self.nested_sm_kwargs = {'state_classes': state_classes,
  173. 'initial_state': 'Body'}
  174. StateWS.__init__(self, state_machine, debug)
  175. def runtime_init(self):
  176. StateWS.runtime_init(self)
  177. memo = self.state_machine.memo
  178. self.memo = memo
  179. self.reporter = memo.reporter
  180. self.inliner = memo.inliner
  181. self.document = memo.document
  182. self.parent = self.state_machine.node
  183. # enable the reporter to determine source and source-line
  184. if not hasattr(self.reporter, 'get_source_and_line'):
  185. self.reporter.get_source_and_line = self.state_machine.get_source_and_line
  186. # print "adding get_source_and_line to reporter", self.state_machine.input_offset
  187. def goto_line(self, abs_line_offset):
  188. """
  189. Jump to input line `abs_line_offset`, ignoring jumps past the end.
  190. """
  191. try:
  192. self.state_machine.goto_line(abs_line_offset)
  193. except EOFError:
  194. pass
  195. def no_match(self, context, transitions):
  196. """
  197. Override `StateWS.no_match` to generate a system message.
  198. This code should never be run.
  199. """
  200. self.reporter.severe(
  201. 'Internal error: no transition pattern match. State: "%s"; '
  202. 'transitions: %s; context: %s; current line: %r.'
  203. % (self.__class__.__name__, transitions, context,
  204. self.state_machine.line))
  205. return context, None, []
  206. def bof(self, context):
  207. """Called at beginning of file."""
  208. return [], []
  209. def nested_parse(self, block, input_offset, node, match_titles=False,
  210. state_machine_class=None, state_machine_kwargs=None):
  211. """
  212. Create a new StateMachine rooted at `node` and run it over the input
  213. `block`.
  214. """
  215. use_default = 0
  216. if state_machine_class is None:
  217. state_machine_class = self.nested_sm
  218. use_default += 1
  219. if state_machine_kwargs is None:
  220. state_machine_kwargs = self.nested_sm_kwargs
  221. use_default += 1
  222. block_length = len(block)
  223. state_machine = None
  224. if use_default == 2:
  225. try:
  226. state_machine = self.nested_sm_cache.pop()
  227. except IndexError:
  228. pass
  229. if not state_machine:
  230. state_machine = state_machine_class(debug=self.debug,
  231. **state_machine_kwargs)
  232. state_machine.run(block, input_offset, memo=self.memo,
  233. node=node, match_titles=match_titles)
  234. if use_default == 2:
  235. self.nested_sm_cache.append(state_machine)
  236. else:
  237. state_machine.unlink()
  238. new_offset = state_machine.abs_line_offset()
  239. # No `block.parent` implies disconnected -- lines aren't in sync:
  240. if block.parent and (len(block) - block_length) != 0:
  241. # Adjustment for block if modified in nested parse:
  242. self.state_machine.next_line(len(block) - block_length)
  243. return new_offset
  244. def nested_list_parse(self, block, input_offset, node, initial_state,
  245. blank_finish,
  246. blank_finish_state=None,
  247. extra_settings={},
  248. match_titles=False,
  249. state_machine_class=None,
  250. state_machine_kwargs=None):
  251. """
  252. Create a new StateMachine rooted at `node` and run it over the input
  253. `block`. Also keep track of optional intermediate blank lines and the
  254. required final one.
  255. """
  256. if state_machine_class is None:
  257. state_machine_class = self.nested_sm
  258. if state_machine_kwargs is None:
  259. state_machine_kwargs = self.nested_sm_kwargs.copy()
  260. state_machine_kwargs['initial_state'] = initial_state
  261. state_machine = state_machine_class(debug=self.debug,
  262. **state_machine_kwargs)
  263. if blank_finish_state is None:
  264. blank_finish_state = initial_state
  265. state_machine.states[blank_finish_state].blank_finish = blank_finish
  266. for key, value in list(extra_settings.items()):
  267. setattr(state_machine.states[initial_state], key, value)
  268. state_machine.run(block, input_offset, memo=self.memo,
  269. node=node, match_titles=match_titles)
  270. blank_finish = state_machine.states[blank_finish_state].blank_finish
  271. state_machine.unlink()
  272. return state_machine.abs_line_offset(), blank_finish
  273. def section(self, title, source, style, lineno, messages):
  274. """Check for a valid subsection and create one if it checks out."""
  275. if self.check_subsection(source, style, lineno):
  276. self.new_subsection(title, lineno, messages)
  277. def check_subsection(self, source, style, lineno):
  278. """
  279. Check for a valid subsection header. Return 1 (true) or None (false).
  280. When a new section is reached that isn't a subsection of the current
  281. section, back up the line count (use ``previous_line(-x)``), then
  282. ``raise EOFError``. The current StateMachine will finish, then the
  283. calling StateMachine can re-examine the title. This will work its way
  284. back up the calling chain until the correct section level isreached.
  285. @@@ Alternative: Evaluate the title, store the title info & level, and
  286. back up the chain until that level is reached. Store in memo? Or
  287. return in results?
  288. :Exception: `EOFError` when a sibling or supersection encountered.
  289. """
  290. memo = self.memo
  291. title_styles = memo.title_styles
  292. mylevel = memo.section_level
  293. try: # check for existing title style
  294. level = title_styles.index(style) + 1
  295. except ValueError: # new title style
  296. if len(title_styles) == memo.section_level: # new subsection
  297. title_styles.append(style)
  298. return 1
  299. else: # not at lowest level
  300. self.parent += self.title_inconsistent(source, lineno)
  301. return None
  302. if level <= mylevel: # sibling or supersection
  303. memo.section_level = level # bubble up to parent section
  304. if len(style) == 2:
  305. memo.section_bubble_up_kludge = True
  306. # back up 2 lines for underline title, 3 for overline title
  307. self.state_machine.previous_line(len(style) + 1)
  308. raise EOFError # let parent section re-evaluate
  309. if level == mylevel + 1: # immediate subsection
  310. return 1
  311. else: # invalid subsection
  312. self.parent += self.title_inconsistent(source, lineno)
  313. return None
  314. def title_inconsistent(self, sourcetext, lineno):
  315. error = self.reporter.severe(
  316. 'Title level inconsistent:', nodes.literal_block('', sourcetext),
  317. line=lineno)
  318. return error
  319. def new_subsection(self, title, lineno, messages):
  320. """Append new subsection to document tree. On return, check level."""
  321. memo = self.memo
  322. mylevel = memo.section_level
  323. memo.section_level += 1
  324. section_node = nodes.section()
  325. self.parent += section_node
  326. textnodes, title_messages = self.inline_text(title, lineno)
  327. titlenode = nodes.title(title, '', *textnodes)
  328. name = normalize_name(titlenode.astext())
  329. section_node['names'].append(name)
  330. section_node += titlenode
  331. section_node += messages
  332. section_node += title_messages
  333. self.document.note_implicit_target(section_node, section_node)
  334. offset = self.state_machine.line_offset + 1
  335. absoffset = self.state_machine.abs_line_offset() + 1
  336. newabsoffset = self.nested_parse(
  337. self.state_machine.input_lines[offset:], input_offset=absoffset,
  338. node=section_node, match_titles=True)
  339. self.goto_line(newabsoffset)
  340. if memo.section_level <= mylevel: # can't handle next section?
  341. raise EOFError # bubble up to supersection
  342. # reset section_level; next pass will detect it properly
  343. memo.section_level = mylevel
  344. def paragraph(self, lines, lineno):
  345. """
  346. Return a list (paragraph & messages) & a boolean: literal_block next?
  347. """
  348. data = '\n'.join(lines).rstrip()
  349. if re.search(r'(?<!\\)(\\\\)*::$', data):
  350. if len(data) == 2:
  351. return [], 1
  352. elif data[-3] in ' \n':
  353. text = data[:-3].rstrip()
  354. else:
  355. text = data[:-1]
  356. literalnext = 1
  357. else:
  358. text = data
  359. literalnext = 0
  360. textnodes, messages = self.inline_text(text, lineno)
  361. p = nodes.paragraph(data, '', *textnodes)
  362. p.source, p.line = self.state_machine.get_source_and_line(lineno)
  363. return [p] + messages, literalnext
  364. def inline_text(self, text, lineno):
  365. """
  366. Return 2 lists: nodes (text and inline elements), and system_messages.
  367. """
  368. return self.inliner.parse(text, lineno, self.memo, self.parent)
  369. def unindent_warning(self, node_name):
  370. # the actual problem is one line below the current line
  371. lineno = self.state_machine.abs_line_number()+1
  372. return self.reporter.warning('%s ends without a blank line; '
  373. 'unexpected unindent.' % node_name,
  374. line=lineno)
  375. def build_regexp(definition, compile=True):
  376. """
  377. Build, compile and return a regular expression based on `definition`.
  378. :Parameter: `definition`: a 4-tuple (group name, prefix, suffix, parts),
  379. where "parts" is a list of regular expressions and/or regular
  380. expression definitions to be joined into an or-group.
  381. """
  382. name, prefix, suffix, parts = definition
  383. part_strings = []
  384. for part in parts:
  385. if type(part) is tuple:
  386. part_strings.append(build_regexp(part, None))
  387. else:
  388. part_strings.append(part)
  389. or_group = '|'.join(part_strings)
  390. regexp = '%(prefix)s(?P<%(name)s>%(or_group)s)%(suffix)s' % locals()
  391. if compile:
  392. return re.compile(regexp, re.UNICODE)
  393. else:
  394. return regexp
  395. class Inliner:
  396. """
  397. Parse inline markup; call the `parse()` method.
  398. """
  399. def __init__(self):
  400. self.implicit_dispatch = [(self.patterns.uri, self.standalone_uri),]
  401. """List of (pattern, bound method) tuples, used by
  402. `self.implicit_inline`."""
  403. def init_customizations(self, settings):
  404. """Setting-based customizations; run when parsing begins."""
  405. if settings.pep_references:
  406. self.implicit_dispatch.append((self.patterns.pep,
  407. self.pep_reference))
  408. if settings.rfc_references:
  409. self.implicit_dispatch.append((self.patterns.rfc,
  410. self.rfc_reference))
  411. def parse(self, text, lineno, memo, parent):
  412. # Needs to be refactored for nested inline markup.
  413. # Add nested_parse() method?
  414. """
  415. Return 2 lists: nodes (text and inline elements), and system_messages.
  416. Using `self.patterns.initial`, a pattern which matches start-strings
  417. (emphasis, strong, interpreted, phrase reference, literal,
  418. substitution reference, and inline target) and complete constructs
  419. (simple reference, footnote reference), search for a candidate. When
  420. one is found, check for validity (e.g., not a quoted '*' character).
  421. If valid, search for the corresponding end string if applicable, and
  422. check it for validity. If not found or invalid, generate a warning
  423. and ignore the start-string. Implicit inline markup (e.g. standalone
  424. URIs) is found last.
  425. """
  426. self.reporter = memo.reporter
  427. self.document = memo.document
  428. self.language = memo.language
  429. self.parent = parent
  430. pattern_search = self.patterns.initial.search
  431. dispatch = self.dispatch
  432. remaining = escape2null(text)
  433. processed = []
  434. unprocessed = []
  435. messages = []
  436. while remaining:
  437. match = pattern_search(remaining)
  438. if match:
  439. groups = match.groupdict()
  440. method = dispatch[groups['start'] or groups['backquote']
  441. or groups['refend'] or groups['fnend']]
  442. before, inlines, remaining, sysmessages = method(self, match,
  443. lineno)
  444. unprocessed.append(before)
  445. messages += sysmessages
  446. if inlines:
  447. processed += self.implicit_inline(''.join(unprocessed),
  448. lineno)
  449. processed += inlines
  450. unprocessed = []
  451. else:
  452. break
  453. remaining = ''.join(unprocessed) + remaining
  454. if remaining:
  455. processed += self.implicit_inline(remaining, lineno)
  456. return processed, messages
  457. # Inline object recognition
  458. # -------------------------
  459. # lookahead and look-behind expressions for inline markup rules
  460. start_string_prefix = ('(^|(?<=\\s|[%s%s]))' %
  461. (punctuation_chars.openers,
  462. punctuation_chars.delimiters))
  463. end_string_suffix = ('($|(?=\\s|[\x00%s%s%s]))' %
  464. (punctuation_chars.closing_delimiters,
  465. punctuation_chars.delimiters,
  466. punctuation_chars.closers))
  467. # print start_string_prefix.encode('utf8')
  468. # TODO: support non-ASCII whitespace in the following 4 patterns?
  469. non_whitespace_before = r'(?<![ \n])'
  470. non_whitespace_escape_before = r'(?<![ \n\x00])'
  471. non_unescaped_whitespace_escape_before = r'(?<!(?<!\x00)[ \n\x00])'
  472. non_whitespace_after = r'(?![ \n])'
  473. # Alphanumerics with isolated internal [-._+:] chars (i.e. not 2 together):
  474. simplename = r'(?:(?!_)\w)+(?:[-._+:](?:(?!_)\w)+)*'
  475. # Valid URI characters (see RFC 2396 & RFC 2732);
  476. # final \x00 allows backslash escapes in URIs:
  477. uric = r"""[-_.!~*'()[\];/:@&=+$,%a-zA-Z0-9\x00]"""
  478. # Delimiter indicating the end of a URI (not part of the URI):
  479. uri_end_delim = r"""[>]"""
  480. # Last URI character; same as uric but no punctuation:
  481. urilast = r"""[_~*/=+a-zA-Z0-9]"""
  482. # End of a URI (either 'urilast' or 'uric followed by a
  483. # uri_end_delim'):
  484. uri_end = r"""(?:%(urilast)s|%(uric)s(?=%(uri_end_delim)s))""" % locals()
  485. emailc = r"""[-_!~*'{|}/#?^`&=+$%a-zA-Z0-9\x00]"""
  486. email_pattern = r"""
  487. %(emailc)s+(?:\.%(emailc)s+)* # name
  488. (?<!\x00)@ # at
  489. %(emailc)s+(?:\.%(emailc)s*)* # host
  490. %(uri_end)s # final URI char
  491. """
  492. parts = ('initial_inline', start_string_prefix, '',
  493. [('start', '', non_whitespace_after, # simple start-strings
  494. [r'\*\*', # strong
  495. r'\*(?!\*)', # emphasis but not strong
  496. r'``', # literal
  497. r'_`', # inline internal target
  498. r'\|(?!\|)'] # substitution reference
  499. ),
  500. ('whole', '', end_string_suffix, # whole constructs
  501. [# reference name & end-string
  502. r'(?P<refname>%s)(?P<refend>__?)' % simplename,
  503. ('footnotelabel', r'\[', r'(?P<fnend>\]_)',
  504. [r'[0-9]+', # manually numbered
  505. r'\#(%s)?' % simplename, # auto-numbered (w/ label?)
  506. r'\*', # auto-symbol
  507. r'(?P<citationlabel>%s)' % simplename] # citation reference
  508. )
  509. ]
  510. ),
  511. ('backquote', # interpreted text or phrase reference
  512. '(?P<role>(:%s:)?)' % simplename, # optional role
  513. non_whitespace_after,
  514. ['`(?!`)'] # but not literal
  515. )
  516. ]
  517. )
  518. patterns = Struct(
  519. initial=build_regexp(parts),
  520. emphasis=re.compile(non_whitespace_escape_before
  521. + r'(\*)' + end_string_suffix, re.UNICODE),
  522. strong=re.compile(non_whitespace_escape_before
  523. + r'(\*\*)' + end_string_suffix, re.UNICODE),
  524. interpreted_or_phrase_ref=re.compile(
  525. r"""
  526. %(non_unescaped_whitespace_escape_before)s
  527. (
  528. `
  529. (?P<suffix>
  530. (?P<role>:%(simplename)s:)?
  531. (?P<refend>__?)?
  532. )
  533. )
  534. %(end_string_suffix)s
  535. """ % locals(), re.VERBOSE | re.UNICODE),
  536. embedded_uri=re.compile(
  537. r"""
  538. (
  539. (?:[ \n]+|^) # spaces or beginning of line/string
  540. < # open bracket
  541. %(non_whitespace_after)s
  542. ([^<>\x00]+) # anything but angle brackets & nulls
  543. %(non_whitespace_before)s
  544. > # close bracket w/o whitespace before
  545. )
  546. $ # end of string
  547. """ % locals(), re.VERBOSE | re.UNICODE),
  548. literal=re.compile(non_whitespace_before + '(``)'
  549. + end_string_suffix),
  550. target=re.compile(non_whitespace_escape_before
  551. + r'(`)' + end_string_suffix),
  552. substitution_ref=re.compile(non_whitespace_escape_before
  553. + r'(\|_{0,2})'
  554. + end_string_suffix),
  555. email=re.compile(email_pattern % locals() + '$',
  556. re.VERBOSE | re.UNICODE),
  557. uri=re.compile(
  558. (r"""
  559. %(start_string_prefix)s
  560. (?P<whole>
  561. (?P<absolute> # absolute URI
  562. (?P<scheme> # scheme (http, ftp, mailto)
  563. [a-zA-Z][a-zA-Z0-9.+-]*
  564. )
  565. :
  566. (
  567. ( # either:
  568. (//?)? # hierarchical URI
  569. %(uric)s* # URI characters
  570. %(uri_end)s # final URI char
  571. )
  572. ( # optional query
  573. \?%(uric)s*
  574. %(uri_end)s
  575. )?
  576. ( # optional fragment
  577. \#%(uric)s*
  578. %(uri_end)s
  579. )?
  580. )
  581. )
  582. | # *OR*
  583. (?P<email> # email address
  584. """ + email_pattern + r"""
  585. )
  586. )
  587. %(end_string_suffix)s
  588. """) % locals(), re.VERBOSE | re.UNICODE),
  589. pep=re.compile(
  590. r"""
  591. %(start_string_prefix)s
  592. (
  593. (pep-(?P<pepnum1>\d+)(.txt)?) # reference to source file
  594. |
  595. (PEP\s+(?P<pepnum2>\d+)) # reference by name
  596. )
  597. %(end_string_suffix)s""" % locals(), re.VERBOSE | re.UNICODE),
  598. rfc=re.compile(
  599. r"""
  600. %(start_string_prefix)s
  601. (RFC(-|\s+)?(?P<rfcnum>\d+))
  602. %(end_string_suffix)s""" % locals(), re.VERBOSE | re.UNICODE))
  603. def quoted_start(self, match):
  604. """Test if inline markup start-string is 'quoted'.
  605. 'Quoted' in this context means the start-string is enclosed in a pair
  606. of matching opening/closing delimiters (not necessarily quotes)
  607. or at the end of the match.
  608. """
  609. string = match.string
  610. start = match.start()
  611. if start == 0: # start-string at beginning of text
  612. return False
  613. prestart = string[start - 1]
  614. try:
  615. poststart = string[match.end()]
  616. except IndexError: # start-string at end of text
  617. return True # not "quoted" but no markup start-string either
  618. return punctuation_chars.match_chars(prestart, poststart)
  619. def inline_obj(self, match, lineno, end_pattern, nodeclass,
  620. restore_backslashes=False):
  621. string = match.string
  622. matchstart = match.start('start')
  623. matchend = match.end('start')
  624. if self.quoted_start(match):
  625. return (string[:matchend], [], string[matchend:], [], '')
  626. endmatch = end_pattern.search(string[matchend:])
  627. if endmatch and endmatch.start(1): # 1 or more chars
  628. text = unescape(endmatch.string[:endmatch.start(1)],
  629. restore_backslashes)
  630. textend = matchend + endmatch.end(1)
  631. rawsource = unescape(string[matchstart:textend], 1)
  632. return (string[:matchstart], [nodeclass(rawsource, text)],
  633. string[textend:], [], endmatch.group(1))
  634. msg = self.reporter.warning(
  635. 'Inline %s start-string without end-string.'
  636. % nodeclass.__name__, line=lineno)
  637. text = unescape(string[matchstart:matchend], 1)
  638. rawsource = unescape(string[matchstart:matchend], 1)
  639. prb = self.problematic(text, rawsource, msg)
  640. return string[:matchstart], [prb], string[matchend:], [msg], ''
  641. def problematic(self, text, rawsource, message):
  642. msgid = self.document.set_id(message, self.parent)
  643. problematic = nodes.problematic(rawsource, text, refid=msgid)
  644. prbid = self.document.set_id(problematic)
  645. message.add_backref(prbid)
  646. return problematic
  647. def emphasis(self, match, lineno):
  648. before, inlines, remaining, sysmessages, endstring = self.inline_obj(
  649. match, lineno, self.patterns.emphasis, nodes.emphasis)
  650. return before, inlines, remaining, sysmessages
  651. def strong(self, match, lineno):
  652. before, inlines, remaining, sysmessages, endstring = self.inline_obj(
  653. match, lineno, self.patterns.strong, nodes.strong)
  654. return before, inlines, remaining, sysmessages
  655. def interpreted_or_phrase_ref(self, match, lineno):
  656. end_pattern = self.patterns.interpreted_or_phrase_ref
  657. string = match.string
  658. matchstart = match.start('backquote')
  659. matchend = match.end('backquote')
  660. rolestart = match.start('role')
  661. role = match.group('role')
  662. position = ''
  663. if role:
  664. role = role[1:-1]
  665. position = 'prefix'
  666. elif self.quoted_start(match):
  667. return (string[:matchend], [], string[matchend:], [])
  668. endmatch = end_pattern.search(string[matchend:])
  669. if endmatch and endmatch.start(1): # 1 or more chars
  670. textend = matchend + endmatch.end()
  671. if endmatch.group('role'):
  672. if role:
  673. msg = self.reporter.warning(
  674. 'Multiple roles in interpreted text (both '
  675. 'prefix and suffix present; only one allowed).',
  676. line=lineno)
  677. text = unescape(string[rolestart:textend], 1)
  678. prb = self.problematic(text, text, msg)
  679. return string[:rolestart], [prb], string[textend:], [msg]
  680. role = endmatch.group('suffix')[1:-1]
  681. position = 'suffix'
  682. escaped = endmatch.string[:endmatch.start(1)]
  683. rawsource = unescape(string[matchstart:textend], 1)
  684. if rawsource[-1:] == '_':
  685. if role:
  686. msg = self.reporter.warning(
  687. 'Mismatch: both interpreted text role %s and '
  688. 'reference suffix.' % position, line=lineno)
  689. text = unescape(string[rolestart:textend], 1)
  690. prb = self.problematic(text, text, msg)
  691. return string[:rolestart], [prb], string[textend:], [msg]
  692. return self.phrase_ref(string[:matchstart], string[textend:],
  693. rawsource, escaped, unescape(escaped))
  694. else:
  695. rawsource = unescape(string[rolestart:textend], 1)
  696. nodelist, messages = self.interpreted(rawsource, escaped, role,
  697. lineno)
  698. return (string[:rolestart], nodelist,
  699. string[textend:], messages)
  700. msg = self.reporter.warning(
  701. 'Inline interpreted text or phrase reference start-string '
  702. 'without end-string.', line=lineno)
  703. text = unescape(string[matchstart:matchend], 1)
  704. prb = self.problematic(text, text, msg)
  705. return string[:matchstart], [prb], string[matchend:], [msg]
  706. def phrase_ref(self, before, after, rawsource, escaped, text):
  707. match = self.patterns.embedded_uri.search(escaped)
  708. if match:
  709. text = unescape(escaped[:match.start(0)])
  710. uri_text = match.group(2)
  711. uri = ''.join(uri_text.split())
  712. uri = self.adjust_uri(uri)
  713. if uri:
  714. target = nodes.target(match.group(1), refuri=uri)
  715. target.referenced = 1
  716. else:
  717. raise ApplicationError('problem with URI: %r' % uri_text)
  718. if not text:
  719. text = uri
  720. else:
  721. target = None
  722. refname = normalize_name(text)
  723. reference = nodes.reference(rawsource, text,
  724. name=whitespace_normalize_name(text))
  725. node_list = [reference]
  726. if rawsource[-2:] == '__':
  727. if target:
  728. reference['refuri'] = uri
  729. else:
  730. reference['anonymous'] = 1
  731. else:
  732. if target:
  733. reference['refuri'] = uri
  734. target['names'].append(refname)
  735. self.document.note_explicit_target(target, self.parent)
  736. node_list.append(target)
  737. else:
  738. reference['refname'] = refname
  739. self.document.note_refname(reference)
  740. return before, node_list, after, []
  741. def adjust_uri(self, uri):
  742. match = self.patterns.email.match(uri)
  743. if match:
  744. return 'mailto:' + uri
  745. else:
  746. return uri
  747. def interpreted(self, rawsource, text, role, lineno):
  748. role_fn, messages = roles.role(role, self.language, lineno,
  749. self.reporter)
  750. if role_fn:
  751. nodes, messages2 = role_fn(role, rawsource, text, lineno, self)
  752. return nodes, messages + messages2
  753. else:
  754. msg = self.reporter.error(
  755. 'Unknown interpreted text role "%s".' % role,
  756. line=lineno)
  757. return ([self.problematic(rawsource, rawsource, msg)],
  758. messages + [msg])
  759. def literal(self, match, lineno):
  760. before, inlines, remaining, sysmessages, endstring = self.inline_obj(
  761. match, lineno, self.patterns.literal, nodes.literal,
  762. restore_backslashes=True)
  763. return before, inlines, remaining, sysmessages
  764. def inline_internal_target(self, match, lineno):
  765. before, inlines, remaining, sysmessages, endstring = self.inline_obj(
  766. match, lineno, self.patterns.target, nodes.target)
  767. if inlines and isinstance(inlines[0], nodes.target):
  768. assert len(inlines) == 1
  769. target = inlines[0]
  770. name = normalize_name(target.astext())
  771. target['names'].append(name)
  772. self.document.note_explicit_target(target, self.parent)
  773. return before, inlines, remaining, sysmessages
  774. def substitution_reference(self, match, lineno):
  775. before, inlines, remaining, sysmessages, endstring = self.inline_obj(
  776. match, lineno, self.patterns.substitution_ref,
  777. nodes.substitution_reference)
  778. if len(inlines) == 1:
  779. subref_node = inlines[0]
  780. if isinstance(subref_node, nodes.substitution_reference):
  781. subref_text = subref_node.astext()
  782. self.document.note_substitution_ref(subref_node, subref_text)
  783. if endstring[-1:] == '_':
  784. reference_node = nodes.reference(
  785. '|%s%s' % (subref_text, endstring), '')
  786. if endstring[-2:] == '__':
  787. reference_node['anonymous'] = 1
  788. else:
  789. reference_node['refname'] = normalize_name(subref_text)
  790. self.document.note_refname(reference_node)
  791. reference_node += subref_node
  792. inlines = [reference_node]
  793. return before, inlines, remaining, sysmessages
  794. def footnote_reference(self, match, lineno):
  795. """
  796. Handles `nodes.footnote_reference` and `nodes.citation_reference`
  797. elements.
  798. """
  799. label = match.group('footnotelabel')
  800. refname = normalize_name(label)
  801. string = match.string
  802. before = string[:match.start('whole')]
  803. remaining = string[match.end('whole'):]
  804. if match.group('citationlabel'):
  805. refnode = nodes.citation_reference('[%s]_' % label,
  806. refname=refname)
  807. refnode += nodes.Text(label)
  808. self.document.note_citation_ref(refnode)
  809. else:
  810. refnode = nodes.footnote_reference('[%s]_' % label)
  811. if refname[0] == '#':
  812. refname = refname[1:]
  813. refnode['auto'] = 1
  814. self.document.note_autofootnote_ref(refnode)
  815. elif refname == '*':
  816. refname = ''
  817. refnode['auto'] = '*'
  818. self.document.note_symbol_footnote_ref(
  819. refnode)
  820. else:
  821. refnode += nodes.Text(label)
  822. if refname:
  823. refnode['refname'] = refname
  824. self.document.note_footnote_ref(refnode)
  825. if utils.get_trim_footnote_ref_space(self.document.settings):
  826. before = before.rstrip()
  827. return (before, [refnode], remaining, [])
  828. def reference(self, match, lineno, anonymous=False):
  829. referencename = match.group('refname')
  830. refname = normalize_name(referencename)
  831. referencenode = nodes.reference(
  832. referencename + match.group('refend'), referencename,
  833. name=whitespace_normalize_name(referencename))
  834. if anonymous:
  835. referencenode['anonymous'] = 1
  836. else:
  837. referencenode['refname'] = refname
  838. self.document.note_refname(referencenode)
  839. string = match.string
  840. matchstart = match.start('whole')
  841. matchend = match.end('whole')
  842. return (string[:matchstart], [referencenode], string[matchend:], [])
  843. def anonymous_reference(self, match, lineno):
  844. return self.reference(match, lineno, anonymous=1)
  845. def standalone_uri(self, match, lineno):
  846. if (not match.group('scheme')
  847. or match.group('scheme').lower() in urischemes.schemes):
  848. if match.group('email'):
  849. addscheme = 'mailto:'
  850. else:
  851. addscheme = ''
  852. text = match.group('whole')
  853. unescaped = unescape(text, 0)
  854. return [nodes.reference(unescape(text, 1), unescaped,
  855. refuri=addscheme + unescaped)]
  856. else: # not a valid scheme
  857. raise MarkupMismatch
  858. def pep_reference(self, match, lineno):
  859. text = match.group(0)
  860. if text.startswith('pep-'):
  861. pepnum = int(match.group('pepnum1'))
  862. elif text.startswith('PEP'):
  863. pepnum = int(match.group('pepnum2'))
  864. else:
  865. raise MarkupMismatch
  866. ref = (self.document.settings.pep_base_url
  867. + self.document.settings.pep_file_url_template % pepnum)
  868. unescaped = unescape(text, 0)
  869. return [nodes.reference(unescape(text, 1), unescaped, refuri=ref)]
  870. rfc_url = 'rfc%d.html'
  871. def rfc_reference(self, match, lineno):
  872. text = match.group(0)
  873. if text.startswith('RFC'):
  874. rfcnum = int(match.group('rfcnum'))
  875. ref = self.document.settings.rfc_base_url + self.rfc_url % rfcnum
  876. else:
  877. raise MarkupMismatch
  878. unescaped = unescape(text, 0)
  879. return [nodes.reference(unescape(text, 1), unescaped, refuri=ref)]
  880. def implicit_inline(self, text, lineno):
  881. """
  882. Check each of the patterns in `self.implicit_dispatch` for a match,
  883. and dispatch to the stored method for the pattern. Recursively check
  884. the text before and after the match. Return a list of `nodes.Text`
  885. and inline element nodes.
  886. """
  887. if not text:
  888. return []
  889. for pattern, method in self.implicit_dispatch:
  890. match = pattern.search(text)
  891. if match:
  892. try:
  893. # Must recurse on strings before *and* after the match;
  894. # there may be multiple patterns.
  895. return (self.implicit_inline(text[:match.start()], lineno)
  896. + method(match, lineno) +
  897. self.implicit_inline(text[match.end():], lineno))
  898. except MarkupMismatch:
  899. pass
  900. return [nodes.Text(unescape(text), rawsource=unescape(text, 1))]
  901. dispatch = {'*': emphasis,
  902. '**': strong,
  903. '`': interpreted_or_phrase_ref,
  904. '``': literal,
  905. '_`': inline_internal_target,
  906. ']_': footnote_reference,
  907. '|': substitution_reference,
  908. '_': reference,
  909. '__': anonymous_reference}
  910. def _loweralpha_to_int(s, _zero=(ord('a')-1)):
  911. return ord(s) - _zero
  912. def _upperalpha_to_int(s, _zero=(ord('A')-1)):
  913. return ord(s) - _zero
  914. def _lowerroman_to_int(s):
  915. return roman.fromRoman(s.upper())
  916. class Body(RSTState):
  917. """
  918. Generic classifier of the first line of a block.
  919. """
  920. double_width_pad_char = tableparser.TableParser.double_width_pad_char
  921. """Padding character for East Asian double-width text."""
  922. enum = Struct()
  923. """Enumerated list parsing information."""
  924. enum.formatinfo = {
  925. 'parens': Struct(prefix='(', suffix=')', start=1, end=-1),
  926. 'rparen': Struct(prefix='', suffix=')', start=0, end=-1),
  927. 'period': Struct(prefix='', suffix='.', start=0, end=-1)}
  928. enum.formats = list(enum.formatinfo.keys())
  929. enum.sequences = ['arabic', 'loweralpha', 'upperalpha',
  930. 'lowerroman', 'upperroman'] # ORDERED!
  931. enum.sequencepats = {'arabic': '[0-9]+',
  932. 'loweralpha': '[a-z]',
  933. 'upperalpha': '[A-Z]',
  934. 'lowerroman': '[ivxlcdm]+',
  935. 'upperroman': '[IVXLCDM]+',}
  936. enum.converters = {'arabic': int,
  937. 'loweralpha': _loweralpha_to_int,
  938. 'upperalpha': _upperalpha_to_int,
  939. 'lowerroman': _lowerroman_to_int,
  940. 'upperroman': roman.fromRoman}
  941. enum.sequenceregexps = {}
  942. for sequence in enum.sequences:
  943. enum.sequenceregexps[sequence] = re.compile(
  944. enum.sequencepats[sequence] + '$', re.UNICODE)
  945. grid_table_top_pat = re.compile(r'\+-[-+]+-\+ *$')
  946. """Matches the top (& bottom) of a full table)."""
  947. simple_table_top_pat = re.compile('=+( +=+)+ *$')
  948. """Matches the top of a simple table."""
  949. simple_table_border_pat = re.compile('=+[ =]*$')
  950. """Matches the bottom & header bottom of a simple table."""
  951. pats = {}
  952. """Fragments of patterns used by transitions."""
  953. pats['nonalphanum7bit'] = '[!-/:-@[-`{-~]'
  954. pats['alpha'] = '[a-zA-Z]'
  955. pats['alphanum'] = '[a-zA-Z0-9]'
  956. pats['alphanumplus'] = '[a-zA-Z0-9_-]'
  957. pats['enum'] = ('(%(arabic)s|%(loweralpha)s|%(upperalpha)s|%(lowerroman)s'
  958. '|%(upperroman)s|#)' % enum.sequencepats)
  959. pats['optname'] = '%(alphanum)s%(alphanumplus)s*' % pats
  960. # @@@ Loosen up the pattern? Allow Unicode?
  961. pats['optarg'] = '(%(alpha)s%(alphanumplus)s*|<[^<>]+>)' % pats
  962. pats['shortopt'] = r'(-|\+)%(alphanum)s( ?%(optarg)s)?' % pats
  963. pats['longopt'] = r'(--|/)%(optname)s([ =]%(optarg)s)?' % pats
  964. pats['option'] = r'(%(shortopt)s|%(longopt)s)' % pats
  965. for format in enum.formats:
  966. pats[format] = '(?P<%s>%s%s%s)' % (
  967. format, re.escape(enum.formatinfo[format].prefix),
  968. pats['enum'], re.escape(enum.formatinfo[format].suffix))
  969. patterns = {
  970. 'bullet': '[-+*\u2022\u2023\u2043]( +|$)',
  971. 'enumerator': r'(%(parens)s|%(rparen)s|%(period)s)( +|$)' % pats,
  972. 'field_marker': r':(?![: ])([^:\\]|\\.)*(?<! ):( +|$)',
  973. 'option_marker': r'%(option)s(, %(option)s)*( +| ?$)' % pats,
  974. 'doctest': r'>>>( +|$)',
  975. 'line_block': r'\|( +|$)',
  976. 'grid_table_top': grid_table_top_pat,
  977. 'simple_table_top': simple_table_top_pat,
  978. 'explicit_markup': r'\.\.( +|$)',
  979. 'anonymous': r'__( +|$)',
  980. 'line': r'(%(nonalphanum7bit)s)\1* *$' % pats,
  981. 'text': r''}
  982. initial_transitions = (
  983. 'bullet',
  984. 'enumerator',
  985. 'field_marker',
  986. 'option_marker',
  987. 'doctest',
  988. 'line_block',
  989. 'grid_table_top',
  990. 'simple_table_top',
  991. 'explicit_markup',
  992. 'anonymous',
  993. 'line',
  994. 'text')
  995. def indent(self, match, context, next_state):
  996. """Block quote."""
  997. indented, indent, line_offset, blank_finish = \
  998. self.state_machine.get_indented()
  999. elements = self.block_quote(indented, line_offset)
  1000. self.parent += elements
  1001. if not blank_finish:
  1002. self.parent += self.unindent_warning('Block quote')
  1003. return context, next_state, []
  1004. def block_quote(self, indented, line_offset):
  1005. elements = []
  1006. while indented:
  1007. (blockquote_lines,
  1008. attribution_lines,
  1009. attribution_offset,
  1010. indented,
  1011. new_line_offset) = self.split_attribution(indented, line_offset)
  1012. blockquote = nodes.block_quote()
  1013. self.nested_parse(blockquote_lines, line_offset, blockquote)
  1014. elements.append(blockquote)
  1015. if attribution_lines:
  1016. attribution, messages = self.parse_attribution(
  1017. attribution_lines, attribution_offset)
  1018. blockquote += attribution
  1019. elements += messages
  1020. line_offset = new_line_offset
  1021. while indented and not indented[0]:
  1022. indented = indented[1:]
  1023. line_offset += 1
  1024. return elements
  1025. # U+2014 is an em-dash:
  1026. attribution_pattern = re.compile('(---?(?!-)|\u2014) *(?=[^ \\n])',
  1027. re.UNICODE)
  1028. def split_attribution(self, indented, line_offset):
  1029. """
  1030. Check for a block quote attribution and split it off:
  1031. * First line after a blank line must begin with a dash ("--", "---",
  1032. em-dash; matches `self.attribution_pattern`).
  1033. * Every line after that must have consistent indentation.
  1034. * Attributions must be preceded by block quote content.
  1035. Return a tuple of: (block quote content lines, content offset,
  1036. attribution lines, attribution offset, remaining indented lines).
  1037. """
  1038. blank = None
  1039. nonblank_seen = False
  1040. for i in range(len(indented)):
  1041. line = indented[i].rstrip()
  1042. if line:
  1043. if nonblank_seen and blank == i - 1: # last line blank
  1044. match = self.attribution_pattern.match(line)
  1045. if match:
  1046. attribution_end, indent = self.check_attribution(
  1047. indented, i)
  1048. if attribution_end:
  1049. a_lines = indented[i:attribution_end]
  1050. a_lines.trim_left(match.end(), end=1)
  1051. a_lines.trim_left(indent, start=1)
  1052. return (indented[:i], a_lines,
  1053. i, indented[attribution_end:],
  1054. line_offset + attribution_end)
  1055. nonblank_seen = True
  1056. else:
  1057. blank = i
  1058. else:
  1059. return (indented, None, None, None, None)
  1060. def check_attribution(self, indented, attribution_start):
  1061. """
  1062. Check attribution shape.
  1063. Return the index past the end of the attribution, and the indent.
  1064. """
  1065. indent = None
  1066. i = attribution_start + 1
  1067. for i in range(attribution_start + 1, len(indented)):
  1068. line = indented[i].rstrip()
  1069. if not line:
  1070. break
  1071. if indent is None:
  1072. indent = len(line) - len(line.lstrip())
  1073. elif len(line) - len(line.lstrip()) != indent:
  1074. return None, None # bad shape; not an attribution
  1075. else:
  1076. # return index of line after last attribution line:
  1077. i += 1
  1078. return i, (indent or 0)
  1079. def parse_attribution(self, indented, line_offset):
  1080. text = '\n'.join(indented).rstrip()
  1081. lineno = self.state_machine.abs_line_number() + line_offset
  1082. textnodes, messages = self.inline_text(text, lineno)
  1083. node = nodes.attribution(text, '', *textnodes)
  1084. node.source, node.line = self.state_machine.get_source_and_line(lineno)
  1085. return node, messages
  1086. def bullet(self, match, context, next_state):
  1087. """Bullet list item."""
  1088. bulletlist = nodes.bullet_list()
  1089. self.parent += bulletlist
  1090. bulletlist['bullet'] = match.string[0]
  1091. i, blank_finish = self.list_item(match.end())
  1092. bulletlist += i
  1093. offset = self.state_machine.line_offset + 1 # next line
  1094. new_line_offset, blank_finish = self.nested_list_parse(
  1095. self.state_machine.input_lines[offset:],
  1096. input_offset=self.state_machine.abs_line_offset() + 1,
  1097. node=bulletlist, initial_state='BulletList',
  1098. blank_finish=blank_finish)
  1099. self.goto_line(new_line_offset)
  1100. if not blank_finish:
  1101. self.parent += self.unindent_warning('Bullet list')
  1102. return [], next_state, []
  1103. def list_item(self, indent):
  1104. if self.state_machine.line[indent:]:
  1105. indented, line_offset, blank_finish = (
  1106. self.state_machine.get_known_indented(indent))
  1107. else:
  1108. indented, indent, line_offset, blank_finish = (
  1109. self.state_machine.get_first_known_indented(indent))
  1110. listitem = nodes.list_item('\n'.join(indented))
  1111. if indented:
  1112. self.nested_parse(indented, input_offset=line_offset,
  1113. node=listitem)
  1114. return listitem, blank_finish
  1115. def enumerator(self, match, context, next_state):
  1116. """Enumerated List Item"""
  1117. format, sequence, text, ordinal = self.parse_enumerator(match)
  1118. if not self.is_enumerated_list_item(ordinal, sequence, format):
  1119. raise statemachine.TransitionCorrection('text')
  1120. enumlist = nodes.enumerated_list()
  1121. self.parent += enumlist
  1122. if sequence == '#':
  1123. enumlist['enumtype'] = 'arabic'
  1124. else:
  1125. enumlist['enumtype'] = sequence
  1126. enumlist['prefix'] = self.enum.formatinfo[format].prefix
  1127. enumlist['suffix'] = self.enum.formatinfo[format].suffix
  1128. if ordinal != 1:
  1129. enumlist['start'] = ordinal
  1130. msg = self.reporter.info(
  1131. 'Enumerated list start value not ordinal-1: "%s" (ordinal %s)'
  1132. % (text, ordinal))
  1133. self.parent += msg
  1134. listitem, blank_finish = self.list_item(match.end())
  1135. enumlist += listitem
  1136. offset = self.state_machine.line_offset + 1 # next line
  1137. newline_offset, blank_finish = self.nested_list_parse(
  1138. self.state_machine.input_lines[offset:],
  1139. input_offset=self.state_machine.abs_line_offset() + 1,
  1140. node=enumlist, initial_state='EnumeratedList',
  1141. blank_finish=blank_finish,
  1142. extra_settings={'lastordinal': ordinal,
  1143. 'format': format,
  1144. 'auto': sequence == '#'})
  1145. self.goto_line(newline_offset)
  1146. if not blank_finish:
  1147. self.parent += self.unindent_warning('Enumerated list')
  1148. return [], next_state, []
  1149. def parse_enumerator(self, match, expected_sequence=None):
  1150. """
  1151. Analyze an enumerator and return the results.
  1152. :Return:
  1153. - the enumerator format ('period', 'parens', or 'rparen'),
  1154. - the sequence used ('arabic', 'loweralpha', 'upperroman', etc.),
  1155. - the text of the enumerator, stripped of formatting, and
  1156. - the ordinal value of the enumerator ('a' -> 1, 'ii' -> 2, etc.;
  1157. ``None`` is returned for invalid enumerator text).
  1158. The enumerator format has already been determined by the regular
  1159. expression match. If `expected_sequence` is given, that sequence is
  1160. tried first. If not, we check for Roman numeral 1. This way,
  1161. single-character Roman numerals (which are also alphabetical) can be
  1162. matched. If no sequence has been matched, all sequences are checked in
  1163. order.
  1164. """
  1165. groupdict = match.groupdict()
  1166. sequence = ''
  1167. for format in self.enum.formats:
  1168. if groupdict[format]: # was this the format matched?
  1169. break # yes; keep `format`
  1170. else: # shouldn't happen
  1171. raise ParserError('enumerator format not matched')
  1172. text = groupdict[format][self.enum.formatinfo[format].start
  1173. :self.enum.formatinfo[format].end]
  1174. if text == '#':
  1175. sequence = '#'
  1176. elif expected_sequence:
  1177. try:
  1178. if self.enum.sequenceregexps[expected_sequence].match(text):
  1179. sequence = expected_sequence
  1180. except KeyError: # shouldn't happen
  1181. raise ParserError('unknown enumerator sequence: %s'
  1182. % sequence)
  1183. elif text == 'i':
  1184. sequence = 'lowerroman'
  1185. elif text == 'I':
  1186. sequence = 'upperroman'
  1187. if not sequence:
  1188. for sequence in self.enum.sequences:
  1189. if self.enum.sequenceregexps[sequence].match(text):
  1190. break
  1191. else: # shouldn't happen
  1192. raise ParserError('enumerator sequence not matched')
  1193. if sequence == '#':
  1194. ordinal = 1
  1195. else:
  1196. try:
  1197. ordinal = self.enum.converters[sequence](text)
  1198. except roman.InvalidRomanNumeralError:
  1199. ordinal = None
  1200. return format, sequence, text, ordinal
  1201. def is_enumerated_list_item(self, ordinal, sequence, format):
  1202. """
  1203. Check validity based on the ordinal value and the second line.
  1204. Return true if the ordinal is valid and the second line is blank,
  1205. indented, or starts with the next enumerator or an auto-enumerator.
  1206. """
  1207. if ordinal is None:
  1208. return None
  1209. try:
  1210. next_line = self.state_machine.next_line()
  1211. except EOFError: # end of input lines
  1212. self.state_machine.previous_line()
  1213. return 1
  1214. else:
  1215. self.state_machine.previous_line()
  1216. if not next_line[:1].strip(): # blank or indented
  1217. return 1
  1218. result = self.make_enumerator(ordinal + 1, sequence, format)
  1219. if result:
  1220. next_enumerator, auto_enumerator = result
  1221. try:
  1222. if ( next_line.startswith(next_enumerator) or
  1223. next_line.startswith(auto_enumerator) ):
  1224. return 1
  1225. except TypeError:
  1226. pass
  1227. return None
  1228. def make_enumerator(self, ordinal, sequence, format):
  1229. """
  1230. Construct and return the next enumerated list item marker, and an
  1231. auto-enumerator ("#" instead of the regular enumerator).
  1232. Return ``None`` for invalid (out of range) ordinals.
  1233. """ #"
  1234. if sequence == '#':
  1235. enumerator = '#'
  1236. elif sequence == 'arabic':
  1237. enumerator = str(ordinal)
  1238. else:
  1239. if sequence.endswith('alpha'):
  1240. if ordinal > 26:
  1241. return None
  1242. enumerator = chr(ordinal + ord('a') - 1)
  1243. elif sequence.endswith('roman'):
  1244. try:
  1245. enumerator = roman.toRoman(ordinal)
  1246. except roman.RomanError:
  1247. return None
  1248. else: # shouldn't happen
  1249. raise ParserError('unknown enumerator sequence: "%s"'
  1250. % sequence)
  1251. if sequence.startswith('lower'):
  1252. enumerator = enumerator.lower()
  1253. elif sequence.startswith('upper'):
  1254. enumerator = enumerator.upper()
  1255. else: # shouldn't happen
  1256. raise ParserError('unknown enumerator sequence: "%s"'
  1257. % sequence)
  1258. formatinfo = self.enum.formatinfo[format]
  1259. next_enumerator = (formatinfo.prefix + enumerator + formatinfo.suffix
  1260. + ' ')
  1261. auto_enumerator = formatinfo.prefix + '#' + formatinfo.suffix + ' '
  1262. return next_enumerator, auto_enumerator
  1263. def field_marker(self, match, context, next_state):
  1264. """Field list item."""
  1265. field_list = nodes.field_list()
  1266. self.parent += field_list
  1267. field, blank_finish = self.field(match)
  1268. field_list += field
  1269. offset = self.state_machine.line_offset + 1 # next line
  1270. newline_offset, blank_finish = self.nested_list_parse(
  1271. self.state_machine.input_lines[offset:],
  1272. input_offset=self.state_machine.abs_line_offset() + 1,
  1273. node=field_list, initial_state='FieldList',
  1274. blank_finish=blank_finish)
  1275. self.goto_line(newline_offset)
  1276. if not blank_finish:
  1277. self.parent += self.unindent_warning('Field list')
  1278. return [], next_state, []
  1279. def field(self, match):
  1280. name = self.parse_field_marker(match)
  1281. src, srcline = self.state_machine.get_source_and_line()
  1282. lineno = self.state_machine.abs_line_number()
  1283. indented, indent, line_offset, blank_finish = \
  1284. self.state_machine.get_first_known_indented(match.end())
  1285. field_node = nodes.field()
  1286. field_node.source = src
  1287. field_node.line = srcline
  1288. name_nodes, name_messages = self.inline_text(name, lineno)
  1289. field_node += nodes.field_name(name, '', *name_nodes)
  1290. field_body = nodes.field_body('\n'.join(indented), *name_messages)
  1291. field_node += field_body
  1292. if indented:
  1293. self.parse_field_body(indented, line_offset, field_body)
  1294. return field_node, blank_finish
  1295. def parse_field_marker(self, match):
  1296. """Extract & return field name from a field marker match."""
  1297. field = match.group()[1:] # strip off leading ':'
  1298. field = field[:field.rfind(':')] # strip off trailing ':' etc.
  1299. return field
  1300. def parse_field_body(self, indented, offset, node):
  1301. self.nested_parse(indented, input_offset=offset, node=node)
  1302. def option_marker(self, match, context, next_state):
  1303. """Option list item."""
  1304. optionlist = nodes.option_list()
  1305. try:
  1306. listitem, blank_finish = self.option_list_item(match)
  1307. except MarkupError as error:
  1308. # This shouldn't happen; pattern won't match.
  1309. msg = self.reporter.error('Invalid option list marker: %s' %
  1310. error)
  1311. self.parent += msg
  1312. indented, indent, line_offset, blank_finish = \
  1313. self.state_machine.get_first_known_indented(match.end())
  1314. elements = self.block_quote(indented, line_offset)
  1315. self.parent += elements
  1316. if not blank_finish:
  1317. self.parent += self.unindent_warning('Option list')
  1318. return [], next_state, []
  1319. self.parent += optionlist
  1320. optionlist += listitem
  1321. offset = self.state_machine.line_offset + 1 # next line
  1322. newline_offset, blank_finish = self.nested_list_parse(
  1323. self.state_machine.input_lines[offset:],
  1324. input_offset=self.state_machine.abs_line_offset() + 1,
  1325. node=optionlist, initial_state='OptionList',
  1326. blank_finish=blank_finish)
  1327. self.goto_line(newline_offset)
  1328. if not blank_finish:
  1329. self.parent += self.unindent_warning('Option list')
  1330. return [], next_state, []
  1331. def option_list_item(self, match):
  1332. offset = self.state_machine.abs_line_offset()
  1333. options = self.parse_option_marker(match)
  1334. indented, indent, line_offset, blank_finish = \
  1335. self.state_machine.get_first_known_indented(match.end())
  1336. if not indented: # not an option list item
  1337. self.goto_line(offset)
  1338. raise statemachine.TransitionCorrection('text')
  1339. option_group = nodes.option_group('', *options)
  1340. description = nodes.description('\n'.join(indented))
  1341. option_list_item = nodes.option_list_item('', option_group,
  1342. description)
  1343. if indented:
  1344. self.nested_parse(indented, input_offset=line_offset,
  1345. node=description)
  1346. return option_list_item, blank_finish
  1347. def parse_option_marker(self, match):
  1348. """
  1349. Return a list of `node.option` and `node.option_argument` objects,
  1350. parsed from an option marker match.
  1351. :Exception: `MarkupError` for invalid option markers.
  1352. """
  1353. optlist = []
  1354. optionstrings = match.group().rstrip().split(', ')
  1355. for optionstring in optionstrings:
  1356. tokens = optionstring.split()
  1357. delimiter = ' '
  1358. firstopt = tokens[0].split('=', 1)
  1359. if len(firstopt) > 1:
  1360. # "--opt=value" form
  1361. tokens[:1] = firstopt
  1362. delimiter = '='
  1363. elif (len(tokens[0]) > 2
  1364. and ((tokens[0].startswith('-')
  1365. and not tokens[0].startswith('--'))
  1366. or tokens[0].startswith('+'))):
  1367. # "-ovalue" form
  1368. tokens[:1] = [tokens[0][:2], tokens[0][2:]]
  1369. delimiter = ''
  1370. if len(tokens) > 1 and (tokens[1].startswith('<')
  1371. and tokens[-1].endswith('>')):
  1372. # "-o <value1 value2>" form; join all values into one token
  1373. tokens[1:] = [' '.join(tokens[1:])]
  1374. if 0 < len(tokens) <= 2:
  1375. option = nodes.option(optionstring)
  1376. option += nodes.option_string(tokens[0], tokens[0])
  1377. if len(tokens) > 1:
  1378. option += nodes.option_argument(tokens[1], tokens[1],
  1379. delimiter=delimiter)
  1380. optlist.append(option)
  1381. else:
  1382. raise MarkupError(
  1383. 'wrong number of option tokens (=%s), should be 1 or 2: '
  1384. '"%s"' % (len(tokens), optionstring))
  1385. return optlist
  1386. def doctest(self, match, context, next_state):
  1387. data = '\n'.join(self.state_machine.get_text_block())
  1388. self.parent += nodes.doctest_block(data, data)
  1389. return [], next_state, []
  1390. def line_block(self, match, context, next_state):
  1391. """First line of a line block."""
  1392. block = nodes.line_block()
  1393. self.parent += block
  1394. lineno = self.state_machine.abs_line_number()
  1395. line, messages, blank_finish = self.line_block_line(match, lineno)
  1396. block += line
  1397. self.parent += messages
  1398. if not blank_finish:
  1399. offset = self.state_machine.line_offset + 1 # next line
  1400. new_line_offset, blank_finish = self.nested_list_parse(
  1401. self.state_machine.input_lines[offset:],
  1402. input_offset=self.state_machine.abs_line_offset() + 1,
  1403. node=block, initial_state='LineBlock',
  1404. blank_finish=0)
  1405. self.goto_line(new_line_offset)
  1406. if not blank_finish:
  1407. self.parent += self.reporter.warning(
  1408. 'Line block ends without a blank line.',
  1409. line=lineno+1)
  1410. if len(block):
  1411. if block[0].indent is None:
  1412. block[0].indent = 0
  1413. self.nest_line_block_lines(block)
  1414. return [], next_state, []
  1415. def line_block_line(self, match, lineno):
  1416. """Return one line element of a line_block."""
  1417. indented, indent, line_offset, blank_finish = \
  1418. self.state_machine.get_first_known_indented(match.end(),
  1419. until_blank=True)
  1420. text = '\n'.join(indented)
  1421. text_nodes, messages = self.inline_text(text, lineno)
  1422. line = nodes.line(text, '', *text_nodes)
  1423. if match.string.rstrip() != '|': # not empty
  1424. line.indent = len(match.group(1)) - 1
  1425. return line, messages, blank_finish
  1426. def nest_line_block_lines(self, block):
  1427. for index in range(1, len(block)):
  1428. if block[index].indent is None:
  1429. block[index].indent = block[index - 1].indent
  1430. self.nest_line_block_segment(block)
  1431. def nest_line_block_segment(self, block):
  1432. indents = [item.indent for item in block]
  1433. least = min(indents)
  1434. new_items = []
  1435. new_block = nodes.line_block()
  1436. for item in block:
  1437. if item.indent > least:
  1438. new_block.append(item)
  1439. else:
  1440. if len(new_block):
  1441. self.nest_line_block_segment(new_block)
  1442. new_items.append(new_block)
  1443. new_block = nodes.line_block()
  1444. new_items.append(item)
  1445. if len(new_block):
  1446. self.nest_line_block_segment(new_block)
  1447. new_items.append(new_block)
  1448. block[:] = new_items
  1449. def grid_table_top(self, match, context, next_state):
  1450. """Top border of a full table."""
  1451. return self.table_top(match, context, next_state,
  1452. self.isolate_grid_table,
  1453. tableparser.GridTableParser)
  1454. def simple_table_top(self, match, context, next_state):
  1455. """Top border of a simple table."""
  1456. return self.table_top(match, context, next_state,
  1457. self.isolate_simple_table,
  1458. tableparser.SimpleTableParser)
  1459. def table_top(self, match, context, next_state,
  1460. isolate_function, parser_class):
  1461. """Top border of a generic table."""
  1462. nodelist, blank_finish = self.table(isolate_function, parser_class)
  1463. self.parent += nodelist
  1464. if not blank_finish:
  1465. msg = self.reporter.warning(
  1466. 'Blank line required after table.',
  1467. line=self.state_machine.abs_line_number()+1)
  1468. self.parent += msg
  1469. return [], next_state, []
  1470. def table(self, isolate_function, parser_class):
  1471. """Parse a table."""
  1472. block, messages, blank_finish = isolate_function()
  1473. if block:
  1474. try:
  1475. parser = parser_class()
  1476. tabledata = parser.parse(block)
  1477. tableline = (self.state_machine.abs_line_number() - len(block)
  1478. + 1)
  1479. table = self.build_table(tabledata, tableline)
  1480. nodelist = [table] + messages
  1481. except tableparser.TableMarkupError as err:
  1482. nodelist = self.malformed_table(block, ' '.join(err.args),
  1483. offset=err.offset) + messages
  1484. else:
  1485. nodelist = messages
  1486. return nodelist, blank_finish
  1487. def isolate_grid_table(self):
  1488. messages = []
  1489. blank_finish = 1
  1490. try:
  1491. block = self.state_machine.get_text_block(flush_left=True)
  1492. except statemachine.UnexpectedIndentationError as err:
  1493. block, src, srcline = err.args
  1494. messages.append(self.reporter.error('Unexpected indentation.',
  1495. source=src, line=srcline))
  1496. blank_finish = 0
  1497. block.disconnect()
  1498. # for East Asian chars:
  1499. block.pad_double_width(self.double_width_pad_char)
  1500. width = len(block[0].strip())
  1501. for i in range(len(block)):
  1502. block[i] = block[i].strip()
  1503. if block[i][0] not in '+|': # check left edge
  1504. blank_finish = 0
  1505. self.state_machine.previous_line(len(block) - i)
  1506. del block[i:]
  1507. break
  1508. if not self.grid_table_top_pat.match(block[-1]): # find bottom
  1509. blank_finish = 0
  1510. # from second-last to third line of table:
  1511. for i in range(len(block) - 2, 1, -1):
  1512. if self.grid_table_top_pat.match(block[i]):
  1513. self.state_machine.previous_line(len(block) - i + 1)
  1514. del block[i+1:]
  1515. break
  1516. else:
  1517. messages.extend(self.malformed_table(block))
  1518. return [], messages, blank_finish
  1519. for i in range(len(block)): # check right edge
  1520. if len(block[i]) != width or block[i][-1] not in '+|':
  1521. messages.extend(self.malformed_table(block))
  1522. return [], messages, blank_finish
  1523. return block, messages, blank_finish
  1524. def isolate_simple_table(self):
  1525. start = self.state_machine.line_offset
  1526. lines = self.state_machine.input_lines
  1527. limit = len(lines) - 1
  1528. toplen = len(lines[start].strip())
  1529. pattern_match = self.simple_table_border_pat.match
  1530. found = 0
  1531. found_at = None
  1532. i = start + 1
  1533. while i <= limit:
  1534. line = lines[i]
  1535. match = pattern_match(line)
  1536. if match:
  1537. if len(line.strip()) != toplen:
  1538. self.state_machine.next_line(i - start)
  1539. messages = self.malformed_table(
  1540. lines[start:i+1], 'Bottom/header table border does '
  1541. 'not match top border.')
  1542. return [], messages, i == limit or not lines[i+1].strip()
  1543. found += 1
  1544. found_at = i
  1545. if found == 2 or i == limit or not lines[i+1].strip():
  1546. end = i
  1547. break
  1548. i += 1
  1549. else: # reached end of input_lines
  1550. if found:
  1551. extra = ' or no blank line after table bottom'
  1552. self.state_machine.next_line(found_at - start)
  1553. block = lines[start:found_at+1]
  1554. else:
  1555. extra = ''
  1556. self.state_machine.next_line(i - start - 1)
  1557. block = lines[start:]
  1558. messages = self.malformed_table(
  1559. block, 'No bottom table border found%s.' % extra)
  1560. return [], messages, not extra
  1561. self.state_machine.next_line(end - start)
  1562. block = lines[start:end+1]
  1563. # for East Asian chars:
  1564. block.pad_double_width(self.double_width_pad_char)
  1565. return block, [], end == limit or not lines[end+1].strip()
  1566. def malformed_table(self, block, detail='', offset=0):
  1567. block.replace(self.double_width_pad_char, '')
  1568. data = '\n'.join(block)
  1569. message = 'Malformed table.'
  1570. startline = self.state_machine.abs_line_number() - len(block) + 1
  1571. if detail:
  1572. message += '\n' + detail
  1573. error = self.reporter.error(message, nodes.literal_block(data, data),
  1574. line=startline+offset)
  1575. return [error]
  1576. def build_table(self, tabledata, tableline, stub_columns=0):
  1577. colwidths, headrows, bodyrows = tabledata
  1578. table = nodes.table()
  1579. tgroup = nodes.tgroup(cols=len(colwidths))
  1580. table += tgroup
  1581. for colwidth in colwidths:
  1582. colspec = nodes.colspec(colwidth=colwidth)
  1583. if stub_columns:
  1584. colspec.attributes['stub'] = 1
  1585. stub_columns -= 1
  1586. tgroup += colspec
  1587. if headrows:
  1588. thead = nodes.thead()
  1589. tgroup += thead
  1590. for row in headrows:
  1591. thead += self.build_table_row(row, tableline)
  1592. tbody = nodes.tbody()
  1593. tgroup += tbody
  1594. for row in bodyrows:
  1595. tbody += self.build_table_row(row, tableline)
  1596. return table
  1597. def build_table_row(self, rowdata, tableline):
  1598. row = nodes.row()
  1599. for cell in rowdata:
  1600. if cell is None:
  1601. continue
  1602. morerows, morecols, offset, cellblock = cell
  1603. attributes = {}
  1604. if morerows:
  1605. attributes['morerows'] = morerows
  1606. if morecols:
  1607. attributes['morecols'] = morecols
  1608. entry = nodes.entry(**attributes)
  1609. row += entry
  1610. if ''.join(cellblock):
  1611. self.nested_parse(cellblock, input_offset=tableline+offset,
  1612. node=entry)
  1613. return row
  1614. explicit = Struct()
  1615. """Patterns and constants used for explicit markup recognition."""
  1616. explicit.patterns = Struct(
  1617. target=re.compile(r"""
  1618. (
  1619. _ # anonymous target
  1620. | # *OR*
  1621. (?!_) # no underscore at the beginning
  1622. (?P<quote>`?) # optional open quote
  1623. (?![ `]) # first char. not space or
  1624. # backquote
  1625. (?P<name> # reference name
  1626. .+?
  1627. )
  1628. %(non_whitespace_escape_before)s
  1629. (?P=quote) # close quote if open quote used
  1630. )
  1631. (?<!(?<!\x00):) # no unescaped colon at end
  1632. %(non_whitespace_escape_before)s
  1633. [ ]? # optional space
  1634. : # end of reference name
  1635. ([ ]+|$) # followed by whitespace
  1636. """ % vars(Inliner), re.VERBOSE | re.UNICODE),
  1637. reference=re.compile(r"""
  1638. (
  1639. (?P<simple>%(simplename)s)_
  1640. | # *OR*
  1641. ` # open backquote
  1642. (?![ ]) # not space
  1643. (?P<phrase>.+?) # hyperlink phrase
  1644. %(non_whitespace_escape_before)s
  1645. `_ # close backquote,
  1646. # reference mark
  1647. )
  1648. $ # end of string
  1649. """ % vars(Inliner), re.VERBOSE | re.UNICODE),
  1650. substitution=re.compile(r"""
  1651. (
  1652. (?![ ]) # first char. not space
  1653. (?P<name>.+?) # substitution text
  1654. %(non_whitespace_escape_before)s
  1655. \| # close delimiter
  1656. )
  1657. ([ ]+|$) # followed by whitespace
  1658. """ % vars(Inliner),
  1659. re.VERBOSE | re.UNICODE),)
  1660. def footnote(self, match):
  1661. src, srcline = self.state_machine.get_source_and_line()
  1662. indented, indent, offset, blank_finish = \
  1663. self.state_machine.get_first_known_indented(match.end())
  1664. label = match.group(1)
  1665. name = normalize_name(label)
  1666. footnote = nodes.footnote('\n'.join(indented))
  1667. footnote.source = src
  1668. footnote.line = srcline
  1669. if name[0] == '#': # auto-numbered
  1670. name = name[1:] # autonumber label
  1671. footnote['auto'] = 1
  1672. if name:
  1673. footnote['names'].append(name)
  1674. self.document.note_autofootnote(footnote)
  1675. elif name == '*': # auto-symbol
  1676. name = ''
  1677. footnote['auto'] = '*'
  1678. self.document.note_symbol_footnote(footnote)
  1679. else: # manually numbered
  1680. footnote += nodes.label('', label)
  1681. footnote['names'].append(name)
  1682. self.document.note_footnote(footnote)
  1683. if name:
  1684. self.document.note_explicit_target(footnote, footnote)
  1685. else:
  1686. self.document.set_id(footnote, footnote)
  1687. if indented:
  1688. self.nested_parse(indented, input_offset=offset, node=footnote)
  1689. return [footnote], blank_finish
  1690. def citation(self, match):
  1691. src, srcline = self.state_machine.get_source_and_line()
  1692. indented, indent, offset, blank_finish = \
  1693. self.state_machine.get_first_known_indented(match.end())
  1694. label = match.group(1)
  1695. name = normalize_name(label)
  1696. citation = nodes.citation('\n'.join(indented))
  1697. citation.source = src
  1698. citation.line = srcline
  1699. citation += nodes.label('', label)
  1700. citation['names'].append(name)
  1701. self.document.note_citation(citation)
  1702. self.document.note_explicit_target(citation, citation)
  1703. if indented:
  1704. self.nested_parse(indented, input_offset=offset, node=citation)
  1705. return [citation], blank_finish
  1706. def hyperlink_target(self, match):
  1707. pattern = self.explicit.patterns.target
  1708. lineno = self.state_machine.abs_line_number()
  1709. block, indent, offset, blank_finish = \
  1710. self.state_machine.get_first_known_indented(
  1711. match.end(), until_blank=True, strip_indent=False)
  1712. blocktext = match.string[:match.end()] + '\n'.join(block)
  1713. block = [escape2null(line) for line in block]
  1714. escaped = block[0]
  1715. blockindex = 0
  1716. while True:
  1717. targetmatch = pattern.match(escaped)
  1718. if targetmatch:
  1719. break
  1720. blockindex += 1
  1721. try:
  1722. escaped += block[blockindex]
  1723. except IndexError:
  1724. raise MarkupError('malformed hyperlink target.')
  1725. del block[:blockindex]
  1726. block[0] = (block[0] + ' ')[targetmatch.end()-len(escaped)-1:].strip()
  1727. target = self.make_target(block, blocktext, lineno,
  1728. targetmatch.group('name'))
  1729. return [target], blank_finish
  1730. def make_target(self, block, block_text, lineno, target_name):
  1731. target_type, data = self.parse_target(block, block_text, lineno)
  1732. if target_type == 'refname':
  1733. target = nodes.target(block_text, '', refname=normalize_name(data))
  1734. target.indirect_reference_name = data
  1735. self.add_target(target_name, '', target, lineno)
  1736. self.document.note_indirect_target(target)
  1737. return target
  1738. elif target_type == 'refuri':
  1739. target = nodes.target(block_text, '')
  1740. self.add_target(target_name, data, target, lineno)
  1741. return target
  1742. else:
  1743. return data
  1744. def parse_target(self, block, block_text, lineno):
  1745. """
  1746. Determine the type of reference of a target.
  1747. :Return: A 2-tuple, one of:
  1748. - 'refname' and the indirect reference name
  1749. - 'refuri' and the URI
  1750. - 'malformed' and a system_message node
  1751. """
  1752. if block and block[-1].strip()[-1:] == '_': # possible indirect target
  1753. reference = ' '.join([line.strip() for line in block])
  1754. refname = self.is_reference(reference)
  1755. if refname:
  1756. return 'refname', refname
  1757. reference = ''.join([''.join(line.split()) for line in block])
  1758. return 'refuri', unescape(reference)
  1759. def is_reference(self, reference):
  1760. match = self.explicit.patterns.reference.match(
  1761. whitespace_normalize_name(reference))
  1762. if not match:
  1763. return None
  1764. return unescape(match.group('simple') or match.group('phrase'))
  1765. def add_target(self, targetname, refuri, target, lineno):
  1766. target.line = lineno
  1767. if targetname:
  1768. name = normalize_name(unescape(targetname))
  1769. target['names'].append(name)
  1770. if refuri:
  1771. uri = self.inliner.adjust_uri(refuri)
  1772. if uri:
  1773. target['refuri'] = uri
  1774. else:
  1775. raise ApplicationError('problem with URI: %r' % refuri)
  1776. self.document.note_explicit_target(target, self.parent)
  1777. else: # anonymous target
  1778. if refuri:
  1779. target['refuri'] = refuri
  1780. target['anonymous'] = 1
  1781. self.document.note_anonymous_target(target)
  1782. def substitution_def(self, match):
  1783. pattern = self.explicit.patterns.substitution
  1784. src, srcline = self.state_machine.get_source_and_line()
  1785. block, indent, offset, blank_finish = \
  1786. self.state_machine.get_first_known_indented(match.end(),
  1787. strip_indent=False)
  1788. blocktext = (match.string[:match.end()] + '\n'.join(block))
  1789. block.disconnect()
  1790. escaped = escape2null(block[0].rstrip())
  1791. blockindex = 0
  1792. while True:
  1793. subdefmatch = pattern.match(escaped)
  1794. if subdefmatch:
  1795. break
  1796. blockindex += 1
  1797. try:
  1798. escaped = escaped + ' ' + escape2null(block[blockindex].strip())
  1799. except IndexError:
  1800. raise MarkupError('malformed substitution definition.')
  1801. del block[:blockindex] # strip out the substitution marker
  1802. block[0] = (block[0].strip() + ' ')[subdefmatch.end()-len(escaped)-1:-1]
  1803. if not block[0]:
  1804. del block[0]
  1805. offset += 1
  1806. while block and not block[-1].strip():
  1807. block.pop()
  1808. subname = subdefmatch.group('name')
  1809. substitution_node = nodes.substitution_definition(blocktext)
  1810. substitution_node.source = src
  1811. substitution_node.line = srcline
  1812. if not block:
  1813. msg = self.reporter.warning(
  1814. 'Substitution definition "%s" missing contents.' % subname,
  1815. nodes.literal_block(blocktext, blocktext),
  1816. source=src, line=srcline)
  1817. return [msg], blank_finish
  1818. block[0] = block[0].strip()
  1819. substitution_node['names'].append(
  1820. nodes.whitespace_normalize_name(subname))
  1821. new_abs_offset, blank_finish = self.nested_list_parse(
  1822. block, input_offset=offset, node=substitution_node,
  1823. initial_state='SubstitutionDef', blank_finish=blank_finish)
  1824. i = 0
  1825. for node in substitution_node[:]:
  1826. if not (isinstance(node, nodes.Inline) or
  1827. isinstance(node, nodes.Text)):
  1828. self.parent += substitution_node[i]
  1829. del substitution_node[i]
  1830. else:
  1831. i += 1
  1832. for node in substitution_node.traverse(nodes.Element):
  1833. if self.disallowed_inside_substitution_definitions(node):
  1834. pformat = nodes.literal_block('', node.pformat().rstrip())
  1835. msg = self.reporter.error(
  1836. 'Substitution definition contains illegal element:',
  1837. pformat, nodes.literal_block(blocktext, blocktext),
  1838. source=src, line=srcline)
  1839. return [msg], blank_finish
  1840. if len(substitution_node) == 0:
  1841. msg = self.reporter.warning(
  1842. 'Substitution definition "%s" empty or invalid.' % subname,
  1843. nodes.literal_block(blocktext, blocktext),
  1844. source=src, line=srcline)
  1845. return [msg], blank_finish
  1846. self.document.note_substitution_def(
  1847. substitution_node, subname, self.parent)
  1848. return [substitution_node], blank_finish
  1849. def disallowed_inside_substitution_definitions(self, node):
  1850. if (node['ids'] or
  1851. isinstance(node, nodes.reference) and node.get('anonymous') or
  1852. isinstance(node, nodes.footnote_reference) and node.get('auto')):
  1853. return 1
  1854. else:
  1855. return 0
  1856. def directive(self, match, **option_presets):
  1857. """Returns a 2-tuple: list of nodes, and a "blank finish" boolean."""
  1858. type_name = match.group(1)
  1859. directive_class, messages = directives.directive(
  1860. type_name, self.memo.language, self.document)
  1861. self.parent += messages
  1862. if directive_class:
  1863. return self.run_directive(
  1864. directive_class, match, type_name, option_presets)
  1865. else:
  1866. return self.unknown_directive(type_name)
  1867. def run_directive(self, directive, match, type_name, option_presets):
  1868. """
  1869. Parse a directive then run its directive function.
  1870. Parameters:
  1871. - `directive`: The class implementing the directive. Must be
  1872. a subclass of `rst.Directive`.
  1873. - `match`: A regular expression match object which matched the first
  1874. line of the directive.
  1875. - `type_name`: The directive name, as used in the source text.
  1876. - `option_presets`: A dictionary of preset options, defaults for the
  1877. directive options. Currently, only an "alt" option is passed by
  1878. substitution definitions (value: the substitution name), which may
  1879. be used by an embedded image directive.
  1880. Returns a 2-tuple: list of nodes, and a "blank finish" boolean.
  1881. """
  1882. if isinstance(directive, (FunctionType, MethodType)):
  1883. from docutils.parsers.rst import convert_directive_function
  1884. directive = convert_directive_function(directive)
  1885. lineno = self.state_machine.abs_line_number()
  1886. initial_line_offset = self.state_machine.line_offset
  1887. indented, indent, line_offset, blank_finish \
  1888. = self.state_machine.get_first_known_indented(match.end(),
  1889. strip_top=0)
  1890. block_text = '\n'.join(self.state_machine.input_lines[
  1891. initial_line_offset : self.state_machine.line_offset + 1])
  1892. try:
  1893. arguments, options, content, content_offset = (
  1894. self.parse_directive_block(indented, line_offset,
  1895. directive, option_presets))
  1896. except MarkupError as detail:
  1897. error = self.reporter.error(
  1898. 'Error in "%s" directive:\n%s.' % (type_name,
  1899. ' '.join(detail.args)),
  1900. nodes.literal_block(block_text, block_text), line=lineno)
  1901. return [error], blank_finish
  1902. directive_instance = directive(
  1903. type_name, arguments, options, content, lineno,
  1904. content_offset, block_text, self, self.state_machine)
  1905. try:
  1906. result = directive_instance.run()
  1907. except docutils.parsers.rst.DirectiveError as error:
  1908. msg_node = self.reporter.system_message(error.level, error.msg,
  1909. line=lineno)
  1910. msg_node += nodes.literal_block(block_text, block_text)
  1911. result = [msg_node]
  1912. assert isinstance(result, list), \
  1913. 'Directive "%s" must return a list of nodes.' % type_name
  1914. for i in range(len(result)):
  1915. assert isinstance(result[i], nodes.Node), \
  1916. ('Directive "%s" returned non-Node object (index %s): %r'
  1917. % (type_name, i, result[i]))
  1918. return (result,
  1919. blank_finish or self.state_machine.is_next_line_blank())
  1920. def parse_directive_block(self, indented, line_offset, directive,
  1921. option_presets):
  1922. option_spec = directive.option_spec
  1923. has_content = directive.has_content
  1924. if indented and not indented[0].strip():
  1925. indented.trim_start()
  1926. line_offset += 1
  1927. while indented and not indented[-1].strip():
  1928. indented.trim_end()
  1929. if indented and (directive.required_arguments
  1930. or directive.optional_arguments
  1931. or option_spec):
  1932. for i, line in enumerate(indented):
  1933. if not line.strip():
  1934. break
  1935. else:
  1936. i += 1
  1937. arg_block = indented[:i]
  1938. content = indented[i+1:]
  1939. content_offset = line_offset + i + 1
  1940. else:
  1941. content = indented
  1942. content_offset = line_offset
  1943. arg_block = []
  1944. if option_spec:
  1945. options, arg_block = self.parse_directive_options(
  1946. option_presets, option_spec, arg_block)
  1947. else:
  1948. options = {}
  1949. if arg_block and not (directive.required_arguments
  1950. or directive.optional_arguments):
  1951. content = arg_block + indented[i:]
  1952. content_offset = line_offset
  1953. arg_block = []
  1954. while content and not content[0].strip():
  1955. content.trim_start()
  1956. content_offset += 1
  1957. if directive.required_arguments or directive.optional_arguments:
  1958. arguments = self.parse_directive_arguments(
  1959. directive, arg_block)
  1960. else:
  1961. arguments = []
  1962. if content and not has_content:
  1963. raise MarkupError('no content permitted')
  1964. return (arguments, options, content, content_offset)
  1965. def parse_directive_options(self, option_presets, option_spec, arg_block):
  1966. options = option_presets.copy()
  1967. for i, line in enumerate(arg_block):
  1968. if re.match(Body.patterns['field_marker'], line):
  1969. opt_block = arg_block[i:]
  1970. arg_block = arg_block[:i]
  1971. break
  1972. else:
  1973. opt_block = []
  1974. if opt_block:
  1975. success, data = self.parse_extension_options(option_spec,
  1976. opt_block)
  1977. if success: # data is a dict of options
  1978. options.update(data)
  1979. else: # data is an error string
  1980. raise MarkupError(data)
  1981. return options, arg_block
  1982. def parse_directive_arguments(self, directive, arg_block):
  1983. required = directive.required_arguments
  1984. optional = directive.optional_arguments
  1985. arg_text = '\n'.join(arg_block)
  1986. arguments = arg_text.split()
  1987. if len(arguments) < required:
  1988. raise MarkupError('%s argument(s) required, %s supplied'
  1989. % (required, len(arguments)))
  1990. elif len(arguments) > required + optional:
  1991. if directive.final_argument_whitespace:
  1992. arguments = arg_text.split(None, required + optional - 1)
  1993. else:
  1994. raise MarkupError(
  1995. 'maximum %s argument(s) allowed, %s supplied'
  1996. % (required + optional, len(arguments)))
  1997. return arguments
  1998. def parse_extension_options(self, option_spec, datalines):
  1999. """
  2000. Parse `datalines` for a field list containing extension options
  2001. matching `option_spec`.
  2002. :Parameters:
  2003. - `option_spec`: a mapping of option name to conversion
  2004. function, which should raise an exception on bad input.
  2005. - `datalines`: a list of input strings.
  2006. :Return:
  2007. - Success value, 1 or 0.
  2008. - An option dictionary on success, an error string on failure.
  2009. """
  2010. node = nodes.field_list()
  2011. newline_offset, blank_finish = self.nested_list_parse(
  2012. datalines, 0, node, initial_state='ExtensionOptions',
  2013. blank_finish=True)
  2014. if newline_offset != len(datalines): # incomplete parse of block
  2015. return 0, 'invalid option block'
  2016. try:
  2017. options = utils.extract_extension_options(node, option_spec)
  2018. except KeyError as detail:
  2019. return 0, ('unknown option: "%s"' % detail.args[0])
  2020. except (ValueError, TypeError) as detail:
  2021. return 0, ('invalid option value: %s' % ' '.join(detail.args))
  2022. except utils.ExtensionOptionError as detail:
  2023. return 0, ('invalid option data: %s' % ' '.join(detail.args))
  2024. if blank_finish:
  2025. return 1, options
  2026. else:
  2027. return 0, 'option data incompletely parsed'
  2028. def unknown_directive(self, type_name):
  2029. lineno = self.state_machine.abs_line_number()
  2030. indented, indent, offset, blank_finish = \
  2031. self.state_machine.get_first_known_indented(0, strip_indent=False)
  2032. text = '\n'.join(indented)
  2033. error = self.reporter.error(
  2034. 'Unknown directive type "%s".' % type_name,
  2035. nodes.literal_block(text, text), line=lineno)
  2036. return [error], blank_finish
  2037. def comment(self, match):
  2038. if not match.string[match.end():].strip() \
  2039. and self.state_machine.is_next_line_blank(): # an empty comment?
  2040. return [nodes.comment()], 1 # "A tiny but practical wart."
  2041. indented, indent, offset, blank_finish = \
  2042. self.state_machine.get_first_known_indented(match.end())
  2043. while indented and not indented[-1].strip():
  2044. indented.trim_end()
  2045. text = '\n'.join(indented)
  2046. return [nodes.comment(text, text)], blank_finish
  2047. explicit.constructs = [
  2048. (footnote,
  2049. re.compile(r"""
  2050. \.\.[ ]+ # explicit markup start
  2051. \[
  2052. ( # footnote label:
  2053. [0-9]+ # manually numbered footnote
  2054. | # *OR*
  2055. \# # anonymous auto-numbered footnote
  2056. | # *OR*
  2057. \#%s # auto-number ed?) footnote label
  2058. | # *OR*
  2059. \* # auto-symbol footnote
  2060. )
  2061. \]
  2062. ([ ]+|$) # whitespace or end of line
  2063. """ % Inliner.simplename, re.VERBOSE | re.UNICODE)),
  2064. (citation,
  2065. re.compile(r"""
  2066. \.\.[ ]+ # explicit markup start
  2067. \[(%s)\] # citation label
  2068. ([ ]+|$) # whitespace or end of line
  2069. """ % Inliner.simplename, re.VERBOSE | re.UNICODE)),
  2070. (hyperlink_target,
  2071. re.compile(r"""
  2072. \.\.[ ]+ # explicit markup start
  2073. _ # target indicator
  2074. (?![ ]|$) # first char. not space or EOL
  2075. """, re.VERBOSE | re.UNICODE)),
  2076. (substitution_def,
  2077. re.compile(r"""
  2078. \.\.[ ]+ # explicit markup start
  2079. \| # substitution indicator
  2080. (?![ ]|$) # first char. not space or EOL
  2081. """, re.VERBOSE | re.UNICODE)),
  2082. (directive,
  2083. re.compile(r"""
  2084. \.\.[ ]+ # explicit markup start
  2085. (%s) # directive name
  2086. [ ]? # optional space
  2087. :: # directive delimiter
  2088. ([ ]+|$) # whitespace or end of line
  2089. """ % Inliner.simplename, re.VERBOSE | re.UNICODE))]
  2090. def explicit_markup(self, match, context, next_state):
  2091. """Footnotes, hyperlink targets, directives, comments."""
  2092. nodelist, blank_finish = self.explicit_construct(match)
  2093. self.parent += nodelist
  2094. self.explicit_list(blank_finish)
  2095. return [], next_state, []
  2096. def explicit_construct(self, match):
  2097. """Determine which explicit construct this is, parse & return it."""
  2098. errors = []
  2099. for method, pattern in self.explicit.constructs:
  2100. expmatch = pattern.match(match.string)
  2101. if expmatch:
  2102. try:
  2103. return method(self, expmatch)
  2104. except MarkupError as error:
  2105. lineno = self.state_machine.abs_line_number()
  2106. message = ' '.join(error.args)
  2107. errors.append(self.reporter.warning(message, line=lineno))
  2108. break
  2109. nodelist, blank_finish = self.comment(match)
  2110. return nodelist + errors, blank_finish
  2111. def explicit_list(self, blank_finish):
  2112. """
  2113. Create a nested state machine for a series of explicit markup
  2114. constructs (including anonymous hyperlink targets).
  2115. """
  2116. offset = self.state_machine.line_offset + 1 # next line
  2117. newline_offset, blank_finish = self.nested_list_parse(
  2118. self.state_machine.input_lines[offset:],
  2119. input_offset=self.state_machine.abs_line_offset() + 1,
  2120. node=self.parent, initial_state='Explicit',
  2121. blank_finish=blank_finish,
  2122. match_titles=self.state_machine.match_titles)
  2123. self.goto_line(newline_offset)
  2124. if not blank_finish:
  2125. self.parent += self.unindent_warning('Explicit markup')
  2126. def anonymous(self, match, context, next_state):
  2127. """Anonymous hyperlink targets."""
  2128. nodelist, blank_finish = self.anonymous_target(match)
  2129. self.parent += nodelist
  2130. self.explicit_list(blank_finish)
  2131. return [], next_state, []
  2132. def anonymous_target(self, match):
  2133. lineno = self.state_machine.abs_line_number()
  2134. block, indent, offset, blank_finish \
  2135. = self.state_machine.get_first_known_indented(match.end(),
  2136. until_blank=True)
  2137. blocktext = match.string[:match.end()] + '\n'.join(block)
  2138. block = [escape2null(line) for line in block]
  2139. target = self.make_target(block, blocktext, lineno, '')
  2140. return [target], blank_finish
  2141. def line(self, match, context, next_state):
  2142. """Section title overline or transition marker."""
  2143. if self.state_machine.match_titles:
  2144. return [match.string], 'Line', []
  2145. elif match.string.strip() == '::':
  2146. raise statemachine.TransitionCorrection('text')
  2147. elif len(match.string.strip()) < 4:
  2148. msg = self.reporter.info(
  2149. 'Unexpected possible title overline or transition.\n'
  2150. "Treating it as ordinary text because it's so short.",
  2151. line=self.state_machine.abs_line_number())
  2152. self.parent += msg
  2153. raise statemachine.TransitionCorrection('text')
  2154. else:
  2155. blocktext = self.state_machine.line
  2156. msg = self.reporter.severe(
  2157. 'Unexpected section title or transition.',
  2158. nodes.literal_block(blocktext, blocktext),
  2159. line=self.state_machine.abs_line_number())
  2160. self.parent += msg
  2161. return [], next_state, []
  2162. def text(self, match, context, next_state):
  2163. """Titles, definition lists, paragraphs."""
  2164. return [match.string], 'Text', []
  2165. class RFC2822Body(Body):
  2166. """
  2167. RFC2822 headers are only valid as the first constructs in documents. As
  2168. soon as anything else appears, the `Body` state should take over.
  2169. """
  2170. patterns = Body.patterns.copy() # can't modify the original
  2171. patterns['rfc2822'] = r'[!-9;-~]+:( +|$)'
  2172. initial_transitions = [(name, 'Body')
  2173. for name in Body.initial_transitions]
  2174. initial_transitions.insert(-1, ('rfc2822', 'Body')) # just before 'text'
  2175. def rfc2822(self, match, context, next_state):
  2176. """RFC2822-style field list item."""
  2177. fieldlist = nodes.field_list(classes=['rfc2822'])
  2178. self.parent += fieldlist
  2179. field, blank_finish = self.rfc2822_field(match)
  2180. fieldlist += field
  2181. offset = self.state_machine.line_offset + 1 # next line
  2182. newline_offset, blank_finish = self.nested_list_parse(
  2183. self.state_machine.input_lines[offset:],
  2184. input_offset=self.state_machine.abs_line_offset() + 1,
  2185. node=fieldlist, initial_state='RFC2822List',
  2186. blank_finish=blank_finish)
  2187. self.goto_line(newline_offset)
  2188. if not blank_finish:
  2189. self.parent += self.unindent_warning(
  2190. 'RFC2822-style field list')
  2191. return [], next_state, []
  2192. def rfc2822_field(self, match):
  2193. name = match.string[:match.string.find(':')]
  2194. indented, indent, line_offset, blank_finish = \
  2195. self.state_machine.get_first_known_indented(match.end(),
  2196. until_blank=True)
  2197. fieldnode = nodes.field()
  2198. fieldnode += nodes.field_name(name, name)
  2199. fieldbody = nodes.field_body('\n'.join(indented))
  2200. fieldnode += fieldbody
  2201. if indented:
  2202. self.nested_parse(indented, input_offset=line_offset,
  2203. node=fieldbody)
  2204. return fieldnode, blank_finish
  2205. class SpecializedBody(Body):
  2206. """
  2207. Superclass for second and subsequent compound element members. Compound
  2208. elements are lists and list-like constructs.
  2209. All transition methods are disabled (redefined as `invalid_input`).
  2210. Override individual methods in subclasses to re-enable.
  2211. For example, once an initial bullet list item, say, is recognized, the
  2212. `BulletList` subclass takes over, with a "bullet_list" node as its
  2213. container. Upon encountering the initial bullet list item, `Body.bullet`
  2214. calls its ``self.nested_list_parse`` (`RSTState.nested_list_parse`), which
  2215. starts up a nested parsing session with `BulletList` as the initial state.
  2216. Only the ``bullet`` transition method is enabled in `BulletList`; as long
  2217. as only bullet list items are encountered, they are parsed and inserted
  2218. into the container. The first construct which is *not* a bullet list item
  2219. triggers the `invalid_input` method, which ends the nested parse and
  2220. closes the container. `BulletList` needs to recognize input that is
  2221. invalid in the context of a bullet list, which means everything *other
  2222. than* bullet list items, so it inherits the transition list created in
  2223. `Body`.
  2224. """
  2225. def invalid_input(self, match=None, context=None, next_state=None):
  2226. """Not a compound element member. Abort this state machine."""
  2227. self.state_machine.previous_line() # back up so parent SM can reassess
  2228. raise EOFError
  2229. indent = invalid_input
  2230. bullet = invalid_input
  2231. enumerator = invalid_input
  2232. field_marker = invalid_input
  2233. option_marker = invalid_input
  2234. doctest = invalid_input
  2235. line_block = invalid_input
  2236. grid_table_top = invalid_input
  2237. simple_table_top = invalid_input
  2238. explicit_markup = invalid_input
  2239. anonymous = invalid_input
  2240. line = invalid_input
  2241. text = invalid_input
  2242. class BulletList(SpecializedBody):
  2243. """Second and subsequent bullet_list list_items."""
  2244. def bullet(self, match, context, next_state):
  2245. """Bullet list item."""
  2246. if match.string[0] != self.parent['bullet']:
  2247. # different bullet: new list
  2248. self.invalid_input()
  2249. listitem, blank_finish = self.list_item(match.end())
  2250. self.parent += listitem
  2251. self.blank_finish = blank_finish
  2252. return [], next_state, []
  2253. class DefinitionList(SpecializedBody):
  2254. """Second and subsequent definition_list_items."""
  2255. def text(self, match, context, next_state):
  2256. """Definition lists."""
  2257. return [match.string], 'Definition', []
  2258. class EnumeratedList(SpecializedBody):
  2259. """Second and subsequent enumerated_list list_items."""
  2260. def enumerator(self, match, context, next_state):
  2261. """Enumerated list item."""
  2262. format, sequence, text, ordinal = self.parse_enumerator(
  2263. match, self.parent['enumtype'])
  2264. if ( format != self.format
  2265. or (sequence != '#' and (sequence != self.parent['enumtype']
  2266. or self.auto
  2267. or ordinal != (self.lastordinal + 1)))
  2268. or not self.is_enumerated_list_item(ordinal, sequence, format)):
  2269. # different enumeration: new list
  2270. self.invalid_input()
  2271. if sequence == '#':
  2272. self.auto = 1
  2273. listitem, blank_finish = self.list_item(match.end())
  2274. self.parent += listitem
  2275. self.blank_finish = blank_finish
  2276. self.lastordinal = ordinal
  2277. return [], next_state, []
  2278. class FieldList(SpecializedBody):
  2279. """Second and subsequent field_list fields."""
  2280. def field_marker(self, match, context, next_state):
  2281. """Field list field."""
  2282. field, blank_finish = self.field(match)
  2283. self.parent += field
  2284. self.blank_finish = blank_finish
  2285. return [], next_state, []
  2286. class OptionList(SpecializedBody):
  2287. """Second and subsequent option_list option_list_items."""
  2288. def option_marker(self, match, context, next_state):
  2289. """Option list item."""
  2290. try:
  2291. option_list_item, blank_finish = self.option_list_item(match)
  2292. except MarkupError:
  2293. self.invalid_input()
  2294. self.parent += option_list_item
  2295. self.blank_finish = blank_finish
  2296. return [], next_state, []
  2297. class RFC2822List(SpecializedBody, RFC2822Body):
  2298. """Second and subsequent RFC2822-style field_list fields."""
  2299. patterns = RFC2822Body.patterns
  2300. initial_transitions = RFC2822Body.initial_transitions
  2301. def rfc2822(self, match, context, next_state):
  2302. """RFC2822-style field list item."""
  2303. field, blank_finish = self.rfc2822_field(match)
  2304. self.parent += field
  2305. self.blank_finish = blank_finish
  2306. return [], 'RFC2822List', []
  2307. blank = SpecializedBody.invalid_input
  2308. class ExtensionOptions(FieldList):
  2309. """
  2310. Parse field_list fields for extension options.
  2311. No nested parsing is done (including inline markup parsing).
  2312. """
  2313. def parse_field_body(self, indented, offset, node):
  2314. """Override `Body.parse_field_body` for simpler parsing."""
  2315. lines = []
  2316. for line in list(indented) + ['']:
  2317. if line.strip():
  2318. lines.append(line)
  2319. elif lines:
  2320. text = '\n'.join(lines)
  2321. node += nodes.paragraph(text, text)
  2322. lines = []
  2323. class LineBlock(SpecializedBody):
  2324. """Second and subsequent lines of a line_block."""
  2325. blank = SpecializedBody.invalid_input
  2326. def line_block(self, match, context, next_state):
  2327. """New line of line block."""
  2328. lineno = self.state_machine.abs_line_number()
  2329. line, messages, blank_finish = self.line_block_line(match, lineno)
  2330. self.parent += line
  2331. self.parent.parent += messages
  2332. self.blank_finish = blank_finish
  2333. return [], next_state, []
  2334. class Explicit(SpecializedBody):
  2335. """Second and subsequent explicit markup construct."""
  2336. def explicit_markup(self, match, context, next_state):
  2337. """Footnotes, hyperlink targets, directives, comments."""
  2338. nodelist, blank_finish = self.explicit_construct(match)
  2339. self.parent += nodelist
  2340. self.blank_finish = blank_finish
  2341. return [], next_state, []
  2342. def anonymous(self, match, context, next_state):
  2343. """Anonymous hyperlink targets."""
  2344. nodelist, blank_finish = self.anonymous_target(match)
  2345. self.parent += nodelist
  2346. self.blank_finish = blank_finish
  2347. return [], next_state, []
  2348. blank = SpecializedBody.invalid_input
  2349. class SubstitutionDef(Body):
  2350. """
  2351. Parser for the contents of a substitution_definition element.
  2352. """
  2353. patterns = {
  2354. 'embedded_directive': re.compile(r'(%s)::( +|$)'
  2355. % Inliner.simplename, re.UNICODE),
  2356. 'text': r''}
  2357. initial_transitions = ['embedded_directive', 'text']
  2358. def embedded_directive(self, match, context, next_state):
  2359. nodelist, blank_finish = self.directive(match,
  2360. alt=self.parent['names'][0])
  2361. self.parent += nodelist
  2362. if not self.state_machine.at_eof():
  2363. self.blank_finish = blank_finish
  2364. raise EOFError
  2365. def text(self, match, context, next_state):
  2366. if not self.state_machine.at_eof():
  2367. self.blank_finish = self.state_machine.is_next_line_blank()
  2368. raise EOFError
  2369. class Text(RSTState):
  2370. """
  2371. Classifier of second line of a text block.
  2372. Could be a paragraph, a definition list item, or a title.
  2373. """
  2374. patterns = {'underline': Body.patterns['line'],
  2375. 'text': r''}
  2376. initial_transitions = [('underline', 'Body'), ('text', 'Body')]
  2377. def blank(self, match, context, next_state):
  2378. """End of paragraph."""
  2379. # NOTE: self.paragraph returns [ node, system_message(s) ], literalnext
  2380. paragraph, literalnext = self.paragraph(
  2381. context, self.state_machine.abs_line_number() - 1)
  2382. self.parent += paragraph
  2383. if literalnext:
  2384. self.parent += self.literal_block()
  2385. return [], 'Body', []
  2386. def eof(self, context):
  2387. if context:
  2388. self.blank(None, context, None)
  2389. return []
  2390. def indent(self, match, context, next_state):
  2391. """Definition list item."""
  2392. definitionlist = nodes.definition_list()
  2393. definitionlistitem, blank_finish = self.definition_list_item(context)
  2394. definitionlist += definitionlistitem
  2395. self.parent += definitionlist
  2396. offset = self.state_machine.line_offset + 1 # next line
  2397. newline_offset, blank_finish = self.nested_list_parse(
  2398. self.state_machine.input_lines[offset:],
  2399. input_offset=self.state_machine.abs_line_offset() + 1,
  2400. node=definitionlist, initial_state='DefinitionList',
  2401. blank_finish=blank_finish, blank_finish_state='Definition')
  2402. self.goto_line(newline_offset)
  2403. if not blank_finish:
  2404. self.parent += self.unindent_warning('Definition list')
  2405. return [], 'Body', []
  2406. def underline(self, match, context, next_state):
  2407. """Section title."""
  2408. lineno = self.state_machine.abs_line_number()
  2409. title = context[0].rstrip()
  2410. underline = match.string.rstrip()
  2411. source = title + '\n' + underline
  2412. messages = []
  2413. if column_width(title) > len(underline):
  2414. if len(underline) < 4:
  2415. if self.state_machine.match_titles:
  2416. msg = self.reporter.info(
  2417. 'Possible title underline, too short for the title.\n'
  2418. "Treating it as ordinary text because it's so short.",
  2419. line=lineno)
  2420. self.parent += msg
  2421. raise statemachine.TransitionCorrection('text')
  2422. else:
  2423. blocktext = context[0] + '\n' + self.state_machine.line
  2424. msg = self.reporter.warning('Title underline too short.',
  2425. nodes.literal_block(blocktext, blocktext), line=lineno)
  2426. messages.append(msg)
  2427. if not self.state_machine.match_titles:
  2428. blocktext = context[0] + '\n' + self.state_machine.line
  2429. # We need get_source_and_line() here to report correctly
  2430. src, srcline = self.state_machine.get_source_and_line()
  2431. # TODO: why is abs_line_number() == srcline+1
  2432. # if the error is in a table (try with test_tables.py)?
  2433. # print "get_source_and_line", srcline
  2434. # print "abs_line_number", self.state_machine.abs_line_number()
  2435. msg = self.reporter.severe('Unexpected section title.',
  2436. nodes.literal_block(blocktext, blocktext),
  2437. source=src, line=srcline)
  2438. self.parent += messages
  2439. self.parent += msg
  2440. return [], next_state, []
  2441. style = underline[0]
  2442. context[:] = []
  2443. self.section(title, source, style, lineno - 1, messages)
  2444. return [], next_state, []
  2445. def text(self, match, context, next_state):
  2446. """Paragraph."""
  2447. startline = self.state_machine.abs_line_number() - 1
  2448. msg = None
  2449. try:
  2450. block = self.state_machine.get_text_block(flush_left=True)
  2451. except statemachine.UnexpectedIndentationError as err:
  2452. block, src, srcline = err.args
  2453. msg = self.reporter.error('Unexpected indentation.',
  2454. source=src, line=srcline)
  2455. lines = context + list(block)
  2456. paragraph, literalnext = self.paragraph(lines, startline)
  2457. self.parent += paragraph
  2458. self.parent += msg
  2459. if literalnext:
  2460. try:
  2461. self.state_machine.next_line()
  2462. except EOFError:
  2463. pass
  2464. self.parent += self.literal_block()
  2465. return [], next_state, []
  2466. def literal_block(self):
  2467. """Return a list of nodes."""
  2468. indented, indent, offset, blank_finish = \
  2469. self.state_machine.get_indented()
  2470. while indented and not indented[-1].strip():
  2471. indented.trim_end()
  2472. if not indented:
  2473. return self.quoted_literal_block()
  2474. data = '\n'.join(indented)
  2475. literal_block = nodes.literal_block(data, data)
  2476. literal_block.line = offset + 1
  2477. nodelist = [literal_block]
  2478. if not blank_finish:
  2479. nodelist.append(self.unindent_warning('Literal block'))
  2480. return nodelist
  2481. def quoted_literal_block(self):
  2482. abs_line_offset = self.state_machine.abs_line_offset()
  2483. offset = self.state_machine.line_offset
  2484. parent_node = nodes.Element()
  2485. new_abs_offset = self.nested_parse(
  2486. self.state_machine.input_lines[offset:],
  2487. input_offset=abs_line_offset, node=parent_node, match_titles=False,
  2488. state_machine_kwargs={'state_classes': (QuotedLiteralBlock,),
  2489. 'initial_state': 'QuotedLiteralBlock'})
  2490. self.goto_line(new_abs_offset)
  2491. return parent_node.children
  2492. def definition_list_item(self, termline):
  2493. indented, indent, line_offset, blank_finish = \
  2494. self.state_machine.get_indented()
  2495. itemnode = nodes.definition_list_item(
  2496. '\n'.join(termline + list(indented)))
  2497. lineno = self.state_machine.abs_line_number() - 1
  2498. (itemnode.source,
  2499. itemnode.line) = self.state_machine.get_source_and_line(lineno)
  2500. termlist, messages = self.term(termline, lineno)
  2501. itemnode += termlist
  2502. definition = nodes.definition('', *messages)
  2503. itemnode += definition
  2504. if termline[0][-2:] == '::':
  2505. definition += self.reporter.info(
  2506. 'Blank line missing before literal block (after the "::")? '
  2507. 'Interpreted as a definition list item.',
  2508. line=lineno+1)
  2509. self.nested_parse(indented, input_offset=line_offset, node=definition)
  2510. return itemnode, blank_finish
  2511. classifier_delimiter = re.compile(' +: +')
  2512. def term(self, lines, lineno):
  2513. """Return a definition_list's term and optional classifiers."""
  2514. assert len(lines) == 1
  2515. text_nodes, messages = self.inline_text(lines[0], lineno)
  2516. term_node = nodes.term()
  2517. node_list = [term_node]
  2518. for i in range(len(text_nodes)):
  2519. node = text_nodes[i]
  2520. if isinstance(node, nodes.Text):
  2521. parts = self.classifier_delimiter.split(node.rawsource)
  2522. if len(parts) == 1:
  2523. node_list[-1] += node
  2524. else:
  2525. node_list[-1] += nodes.Text(parts[0].rstrip())
  2526. for part in parts[1:]:
  2527. classifier_node = nodes.classifier('', part)
  2528. node_list.append(classifier_node)
  2529. else:
  2530. node_list[-1] += node
  2531. return node_list, messages
  2532. class SpecializedText(Text):
  2533. """
  2534. Superclass for second and subsequent lines of Text-variants.
  2535. All transition methods are disabled. Override individual methods in
  2536. subclasses to re-enable.
  2537. """
  2538. def eof(self, context):
  2539. """Incomplete construct."""
  2540. return []
  2541. def invalid_input(self, match=None, context=None, next_state=None):
  2542. """Not a compound element member. Abort this state machine."""
  2543. raise EOFError
  2544. blank = invalid_input
  2545. indent = invalid_input
  2546. underline = invalid_input
  2547. text = invalid_input
  2548. class Definition(SpecializedText):
  2549. """Second line of potential definition_list_item."""
  2550. def eof(self, context):
  2551. """Not a definition."""
  2552. self.state_machine.previous_line(2) # so parent SM can reassess
  2553. return []
  2554. def indent(self, match, context, next_state):
  2555. """Definition list item."""
  2556. itemnode, blank_finish = self.definition_list_item(context)
  2557. self.parent += itemnode
  2558. self.blank_finish = blank_finish
  2559. return [], 'DefinitionList', []
  2560. class Line(SpecializedText):
  2561. """
  2562. Second line of over- & underlined section title or transition marker.
  2563. """
  2564. eofcheck = 1 # @@@ ???
  2565. """Set to 0 while parsing sections, so that we don't catch the EOF."""
  2566. def eof(self, context):
  2567. """Transition marker at end of section or document."""
  2568. marker = context[0].strip()
  2569. if self.memo.section_bubble_up_kludge:
  2570. self.memo.section_bubble_up_kludge = False
  2571. elif len(marker) < 4:
  2572. self.state_correction(context)
  2573. if self.eofcheck: # ignore EOFError with sections
  2574. lineno = self.state_machine.abs_line_number() - 1
  2575. transition = nodes.transition(rawsource=context[0])
  2576. transition.line = lineno
  2577. self.parent += transition
  2578. self.eofcheck = 1
  2579. return []
  2580. def blank(self, match, context, next_state):
  2581. """Transition marker."""
  2582. src, srcline = self.state_machine.get_source_and_line()
  2583. marker = context[0].strip()
  2584. if len(marker) < 4:
  2585. self.state_correction(context)
  2586. transition = nodes.transition(rawsource=marker)
  2587. transition.source = src
  2588. transition.line = srcline - 1
  2589. self.parent += transition
  2590. return [], 'Body', []
  2591. def text(self, match, context, next_state):
  2592. """Potential over- & underlined title."""
  2593. lineno = self.state_machine.abs_line_number() - 1
  2594. overline = context[0]
  2595. title = match.string
  2596. underline = ''
  2597. try:
  2598. underline = self.state_machine.next_line()
  2599. except EOFError:
  2600. blocktext = overline + '\n' + title
  2601. if len(overline.rstrip()) < 4:
  2602. self.short_overline(context, blocktext, lineno, 2)
  2603. else:
  2604. msg = self.reporter.severe(
  2605. 'Incomplete section title.',
  2606. nodes.literal_block(blocktext, blocktext),
  2607. line=lineno)
  2608. self.parent += msg
  2609. return [], 'Body', []
  2610. source = '%s\n%s\n%s' % (overline, title, underline)
  2611. overline = overline.rstrip()
  2612. underline = underline.rstrip()
  2613. if not self.transitions['underline'][0].match(underline):
  2614. blocktext = overline + '\n' + title + '\n' + underline
  2615. if len(overline.rstrip()) < 4:
  2616. self.short_overline(context, blocktext, lineno, 2)
  2617. else:
  2618. msg = self.reporter.severe(
  2619. 'Missing matching underline for section title overline.',
  2620. nodes.literal_block(source, source),
  2621. line=lineno)
  2622. self.parent += msg
  2623. return [], 'Body', []
  2624. elif overline != underline:
  2625. blocktext = overline + '\n' + title + '\n' + underline
  2626. if len(overline.rstrip()) < 4:
  2627. self.short_overline(context, blocktext, lineno, 2)
  2628. else:
  2629. msg = self.reporter.severe(
  2630. 'Title overline & underline mismatch.',
  2631. nodes.literal_block(source, source),
  2632. line=lineno)
  2633. self.parent += msg
  2634. return [], 'Body', []
  2635. title = title.rstrip()
  2636. messages = []
  2637. if column_width(title) > len(overline):
  2638. blocktext = overline + '\n' + title + '\n' + underline
  2639. if len(overline.rstrip()) < 4:
  2640. self.short_overline(context, blocktext, lineno, 2)
  2641. else:
  2642. msg = self.reporter.warning(
  2643. 'Title overline too short.',
  2644. nodes.literal_block(source, source),
  2645. line=lineno)
  2646. messages.append(msg)
  2647. style = (overline[0], underline[0])
  2648. self.eofcheck = 0 # @@@ not sure this is correct
  2649. self.section(title.lstrip(), source, style, lineno + 1, messages)
  2650. self.eofcheck = 1
  2651. return [], 'Body', []
  2652. indent = text # indented title
  2653. def underline(self, match, context, next_state):
  2654. overline = context[0]
  2655. blocktext = overline + '\n' + self.state_machine.line
  2656. lineno = self.state_machine.abs_line_number() - 1
  2657. if len(overline.rstrip()) < 4:
  2658. self.short_overline(context, blocktext, lineno, 1)
  2659. msg = self.reporter.error(
  2660. 'Invalid section title or transition marker.',
  2661. nodes.literal_block(blocktext, blocktext),
  2662. line=lineno)
  2663. self.parent += msg
  2664. return [], 'Body', []
  2665. def short_overline(self, context, blocktext, lineno, lines=1):
  2666. msg = self.reporter.info(
  2667. 'Possible incomplete section title.\nTreating the overline as '
  2668. "ordinary text because it's so short.",
  2669. line=lineno)
  2670. self.parent += msg
  2671. self.state_correction(context, lines)
  2672. def state_correction(self, context, lines=1):
  2673. self.state_machine.previous_line(lines)
  2674. context[:] = []
  2675. raise statemachine.StateCorrection('Body', 'text')
  2676. class QuotedLiteralBlock(RSTState):
  2677. """
  2678. Nested parse handler for quoted (unindented) literal blocks.
  2679. Special-purpose. Not for inclusion in `state_classes`.
  2680. """
  2681. patterns = {'initial_quoted': r'(%(nonalphanum7bit)s)' % Body.pats,
  2682. 'text': r''}
  2683. initial_transitions = ('initial_quoted', 'text')
  2684. def __init__(self, state_machine, debug=False):
  2685. RSTState.__init__(self, state_machine, debug)
  2686. self.messages = []
  2687. self.initial_lineno = None
  2688. def blank(self, match, context, next_state):
  2689. if context:
  2690. raise EOFError
  2691. else:
  2692. return context, next_state, []
  2693. def eof(self, context):
  2694. if context:
  2695. src, srcline = self.state_machine.get_source_and_line(
  2696. self.initial_lineno)
  2697. text = '\n'.join(context)
  2698. literal_block = nodes.literal_block(text, text)
  2699. literal_block.source = src
  2700. literal_block.line = srcline
  2701. self.parent += literal_block
  2702. else:
  2703. self.parent += self.reporter.warning(
  2704. 'Literal block expected; none found.',
  2705. line=self.state_machine.abs_line_number())
  2706. # src not available, because statemachine.input_lines is empty
  2707. self.state_machine.previous_line()
  2708. self.parent += self.messages
  2709. return []
  2710. def indent(self, match, context, next_state):
  2711. assert context, ('QuotedLiteralBlock.indent: context should not '
  2712. 'be empty!')
  2713. self.messages.append(
  2714. self.reporter.error('Unexpected indentation.',
  2715. line=self.state_machine.abs_line_number()))
  2716. self.state_machine.previous_line()
  2717. raise EOFError
  2718. def initial_quoted(self, match, context, next_state):
  2719. """Match arbitrary quote character on the first line only."""
  2720. self.remove_transition('initial_quoted')
  2721. quote = match.string[0]
  2722. pattern = re.compile(re.escape(quote), re.UNICODE)
  2723. # New transition matches consistent quotes only:
  2724. self.add_transition('quoted',
  2725. (pattern, self.quoted, self.__class__.__name__))
  2726. self.initial_lineno = self.state_machine.abs_line_number()
  2727. return [match.string], next_state, []
  2728. def quoted(self, match, context, next_state):
  2729. """Match consistent quotes on subsequent lines."""
  2730. context.append(match.string)
  2731. return context, next_state, []
  2732. def text(self, match, context, next_state):
  2733. if context:
  2734. self.messages.append(
  2735. self.reporter.error('Inconsistent literal block quoting.',
  2736. line=self.state_machine.abs_line_number()))
  2737. self.state_machine.previous_line()
  2738. raise EOFError
  2739. state_classes = (Body, BulletList, DefinitionList, EnumeratedList, FieldList,
  2740. OptionList, LineBlock, ExtensionOptions, Explicit, Text,
  2741. Definition, Line, SubstitutionDef, RFC2822Body, RFC2822List)
  2742. """Standard set of State classes used to start `RSTStateMachine`."""