/old/txt2tags-2.1.py

http://txt2tags.googlecode.com/
Python | 4229 lines | 3805 code | 158 blank | 266 comment | 160 complexity | 4e16129e62cf5a83c262b3a3d8a3d1c0 MD5
Possible License(s): GPL-2.0, GPL-3.0, WTFPL

  1. #!/usr/bin/env python
  2. # txt2tags - generic text conversion tool
  3. # http://txt2tags.sf.net
  4. #
  5. # Copyright 2001, 2002, 2003, 2004 Aurelio Marinho Jargas
  6. #
  7. # This program is free software; you can redistribute it and/or modify
  8. # it under the terms of the GNU General Public License as published by
  9. # the Free Software Foundation, version 2.
  10. #
  11. # This program is distributed in the hope that it will be useful,
  12. # but WITHOUT ANY WARRANTY; without even the implied warranty of
  13. # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  14. # GNU General Public License for more details.
  15. #
  16. # You have received a copy of the GNU General Public License along
  17. # with this program, on the COPYING file.
  18. #
  19. #
  20. #
  21. # +-------------------------------------------------------------+
  22. # | IMPORTANT MESSAGES, PLEASE READ |
  23. # +-------------------------------------------------------------+
  24. # | |
  25. # | |
  26. # | v1.x COMPATIBILITY |
  27. # | ------------------ |
  28. # | |
  29. # | Due to the major syntax changes, the new 2.x series |
  30. # | BREAKS backwards compatibility. |
  31. # | |
  32. # | Use the 't2tconv' script to upgrade your existing |
  33. # | v1.x files to conform to the new v2.x syntax. |
  34. # | |
  35. # | Do a visual inspection of the newly converted file. |
  36. # | Especially Pre & Post proc filters can break. |
  37. # | Check them! |
  38. # | |
  39. # | |
  40. # +-------------------------------------------------------------+
  41. #
  42. #
  43. ########################################################################
  44. #
  45. # BORING CODE EXPLANATION AHEAD
  46. #
  47. # Just read if you wish to understand how the txt2tags code works
  48. #
  49. ########################################################################
  50. #
  51. # Version 2.0 was a complete rewrite for the program 'core'.
  52. #
  53. # Now the code that [1] parses the marked text is separated from the
  54. # code that [2] inserts the target tags.
  55. #
  56. # [1] made by: def convert()
  57. # [2] made by: class BlockMaster
  58. #
  59. # The structures of the marked text are identified and their contents
  60. # are extracted into a data holder (Python lists and dictionaries).
  61. #
  62. # When parsing the source file, the blocks (para, lists, quote, table)
  63. # are opened with BlockMaster as soon as they are found. Then their
  64. # contents, which span several lines, are fed into a special holder on
  65. # the BlockMaster instance. Only when the block is closed are the target
  66. # tags inserted for the full block as a whole, in one pass. This way, we
  67. # have better control over blocks. Much better than the previous line by
  68. # line approach.
  69. #
  70. # In other words, whenever inside a block, the parser *holds* the tag
  71. # insertion process, waiting until the full block is read. That was
  72. # needed primarily to close paragraphs for the new XHTML target, but
  73. # proved to be a very good addition, improving much other processing.
  74. #
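# The "hold the tags until the block is closed" idea described above can
# be sketched as follows. This is only an illustration: BlockHolder and
# convert_paragraphs are hypothetical names, not the real BlockMaster
# class or convert() function, and the sketch handles paragraphs only,
# using blank lines as separators.

```python
# Hypothetical sketch; NOT the real BlockMaster implementation.
class BlockHolder:
    """Collect the lines of one block and tag it only when it is closed."""

    def __init__(self, open_tag, close_tag):
        self.open_tag = open_tag
        self.close_tag = close_tag
        self.lines = []

    def feed(self, line):
        # While the block is open, lines are only stored, never tagged.
        self.lines.append(line)

    def close(self):
        # Tags are inserted once, for the whole block, in one pass.
        result = [self.open_tag] + self.lines + [self.close_tag]
        self.lines = []
        return result


def convert_paragraphs(source_lines, open_tag="<P>", close_tag="</P>"):
    """Treat blank lines as paragraph separators and tag each paragraph."""
    output = []
    holder = None
    for line in source_lines:
        if line.strip():
            if holder is None:
                holder = BlockHolder(open_tag, close_tag)
            holder.feed(line)
        elif holder is not None:
            output.extend(holder.close())
            holder = None
    if holder is not None:
        # The source ended while a block was still open: close it now.
        output.extend(holder.close())
    return output
```

# Because the closing tag is only emitted by close(), the parser never
# has to guess, line by line, whether a paragraph has already ended.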
  75. # -------------------------------------------------------------------
  76. #
  77. # There is also brand new code for the Configuration schema, 100%
  78. # rewritten. There are new classes, all self-documented: CommandLine,
  79. # SourceDocument, ConfigMaster and ConfigLines. In short, a new RAW
  80. # Config format was created, and every kind of configuration is first
  81. # converted to this format, and then a generic method parses it.
  82. #
  83. # The init processing was also changed, and now the functions which
  84. # get information about the input files are: get_infiles_config(),
  85. # process_source_file() and convert_this_files()
  86. #
  87. # Other parts are untouched and remain the same as in v1.7, such as the
  88. # marks regexes, target Headers and target Tags&Rules.
  89. #
  90. ########################################################################
  91. # Now I think the code is nice, easier to read and understand
  92. #XXX Python coding warning
  93. # Avoid common mistakes:
  94. # - do NOT use newlist=list instead of newlist=list[:]
  95. # - do NOT use newdic=dic instead of newdic=dic.copy()
  96. # - do NOT use dic[key] instead of dic.get(key)
  97. # - do NOT use del dic[key] without has_key() before
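# A short illustration of why the rules above matter. The variable
# names are made up for this example, and it is written to run on
# Python 3, where has_key() no longer exists and the `in` operator
# takes its place.

```python
# Plain assignment only creates a second name for the same list.
original_list = ['html', 'xhtml']
alias_list = original_list       # alias: shares the same object
copied_list = original_list[:]   # shallow copy: an independent list
alias_list.append('tex')         # also changes original_list

# The same applies to dictionaries.
original_dict = {'target': 'html'}
copied_dict = original_dict.copy()
copied_dict['target'] = 'tex'    # original_dict is left untouched

# dic[key] raises KeyError for a missing key; dic.get(key) returns None.
missing = original_dict.get('style')

# Check for the key before deleting it, to avoid a KeyError.
if 'style' in original_dict:
    del original_dict['style']
```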
  98. #XXX Smart Image Align doesn't work if the image is a link
  99. # Can't fix that because the image is expanded together with the
  100. # link, at the linkbank filling moment. Only the image is passed
  101. # to parse_images(), not the full line, so it is always 'middle'.
  102. #XXX Paragraph separation not valid inside Quote
  103. # Quote will not have <p></p> inside; instead it will close and open
  104. # the <blockquote> again. This really sux in CSS, when defining a
  105. # different background color. Still don't know how to fix it.
  106. #XXX TODO (maybe)
  107. # New mark or macro which expands to an anchor full title.
  108. # It is necessary to parse the full document in this order:
  109. # DONE 1st scan: HEAD: get all settings, including %!includeconf
  110. # DONE 2nd scan: BODY: expand includes & apply %!preproc
  111. # 3rd scan: BODY: read titles and compose TOC info
  112. # 4th scan: BODY: full parsing, expanding [#anchor] 1st
  113. # Steps 2 and 3 can be made together, with no tag adding.
  114. # Two complete body scans will be *slow*, don't know if it is worth it.
  115. ##############################################################################
  116. # User config (1=ON, 0=OFF)
  117. USE_I18N = 1 # use gettext for i18ned messages? (default is 1)
  118. COLOR_DEBUG = 1 # show debug messages in colors? (default is 1)
  119. HTML_LOWER = 0 # use lowercased HTML tags instead of upper? (default is 0)
  120. ##############################################################################
  121. # these are all the core Python modules used by txt2tags (KISS!)
  122. import re, string, os, sys, time, getopt
  123. # program information
  124. my_url = 'http://txt2tags.sf.net'
  125. my_name = 'txt2tags'
  126. my_email = 'verde@aurelio.net'
  127. my_version = '2.1'
  128. # i18n - just use if available
  129. if USE_I18N:
  130. try:
  131. import gettext
  132. # if your locale dir is different, change it here
  133. cat = gettext.Catalog('txt2tags',localedir='/usr/share/locale/')
  134. _ = cat.gettext
  135. except:
  136. _ = lambda x:x
  137. else:
  138. _ = lambda x:x
  139. # FLAGS : the conversion related flags , may be used in %!options
  140. # OPTIONS : the conversion related options, may be used in %!options
  141. # ACTIONS : the other behaviour modifiers, valid on command line only
  142. # MACROS : the valid macros with their default values for formatting
  143. # SETTINGS: global miscellaneous settings, valid on RC file only
  144. # CONFIG_KEYWORDS: the valid %!key:val keywords
  145. #
  146. # FLAGS and OPTIONS are configs that affect the converted document.
  147. # They usually have also a --no-<option> to turn them OFF.
  148. # ACTIONS are needed because when doing multiple input files, strange
  149. # behaviour would be found, such as using the command line interface
  150. # for the first file and the GUI for the second. There is no
  151. # --no-<action>. --version and --help inside %!options are also odd
  152. #
  153. TARGETS = ['html', 'xhtml', 'sgml', 'tex', 'man', 'mgp', 'moin', 'pm6', 'txt']
  154. FLAGS = {'headers' :1 , 'enum-title' :0 , 'mask-email' :0 ,
  155. 'toc-only' :0 , 'toc' :0 , 'rc' :1 ,
  156. 'css-sugar' :0 , 'css-suggar' :0 , 'quiet' :0 }
  157. OPTIONS = {'target' :'', 'toc-level' :3 , 'style' :'',
  158. 'infile' :'', 'outfile' :'', 'encoding' :'',
  159. 'split' :0 , 'lang' :''}
  160. ACTIONS = {'help' :0 , 'version' :0 , 'gui' :0 ,
  161. 'verbose' :0 , 'debug' :0 , 'dump-config':0 }
  162. MACROS = {'date' : '%Y%m%d', 'infile': '%f',
  163. 'mtime': '%Y%m%d', 'outfile': '%f'}
  164. SETTINGS = {} # for future use
  165. CONFIG_KEYWORDS = [
  166. 'target', 'encoding', 'style', 'options', 'preproc','postproc',
  167. 'guicolors']
  168. TARGET_NAMES = {
  169. 'html' : _('HTML page'),
  170. 'xhtml': _('XHTML page'),
  171. 'sgml' : _('SGML document'),
  172. 'tex' : _('LaTeX document'),
  173. 'man' : _('UNIX Manual page'),
  174. 'mgp' : _('Magic Point presentation'),
  175. 'moin' : _('MoinMoin page'),
  176. 'pm6' : _('PageMaker 6.0 document'),
  177. 'txt' : _('Plain Text'),
  178. }
  179. DEBUG = 0 # do not edit here, please use --debug
  180. VERBOSE = 0 # do not edit here, please use -v, -vv or -vvv
  181. QUIET = 0 # do not edit here, please use --quiet
  182. GUI = 0
  183. AUTOTOC = 1
  184. RC_RAW = []
  185. CMDLINE_RAW = []
  186. CONF = {}
  187. BLOCK = None
  188. regex = {}
  189. TAGS = {}
  190. rules = {}
  191. lang = 'english'
  192. TARGET = ''
  193. STDIN = STDOUT = '-'
  194. ESCCHAR = '\x00'
  195. SEPARATOR = '\x01'
  196. LISTNAMES = {'-':'list', '+':'numlist', ':':'deflist'}
  197. LINEBREAK = {'default':'\n', 'win':'\r\n', 'mac':'\r'}
  198. RCFILE = {'default':'.txt2tagsrc', 'win':'_t2trc'}
  199. # platform specific settings
  200. LB = LINEBREAK.get(sys.platform[:3]) or LINEBREAK['default']
  201. RC = RCFILE.get(sys.platform[:3]) or RCFILE['default']
  202. # identify a development version
  203. #dev_suffix = '-dev'+time.strftime('%m%d',time.localtime(time.time()))
  204. #my_version = my_version + dev_suffix
  205. VERSIONSTR = _("%s version %s <%s>")%(my_name,my_version,my_url)
  206. USAGE = string.join([
  207. '',
  208. _("Usage: %s [OPTIONS] [infile.t2t ...]") % my_name,
  209. '',
  210. _(" -t, --target set target document type. currently supported:"),
  211. ' %s' % re.sub(r"[]'[]",'',repr(TARGETS)),
  212. _(" -i, --infile=FILE set FILE as the input file name ('-' for STDIN)"),
  213. _(" -o, --outfile=FILE set FILE as the output file name ('-' for STDOUT)"),
  214. _(" -n, --enum-title enumerate all title lines as 1, 1.1, 1.1.1, etc"),
  215. _(" -H, --no-headers suppress header, title and footer contents"),
  216. _(" --headers show header, title and footer contents (default ON)"),
  217. _(" --encoding set target file encoding (utf-8, iso-8859-1, etc)"),
  218. _(" --style=FILE use FILE as the document style (like HTML CSS)"),
  219. _(" --css-sugar insert CSS-friendly tags for HTML and XHTML targets"),
  220. _(" --mask-email hide email from spam robots. x@y.z turns <x (a) y z>"),
  221. _(" --toc add TOC (Table of Contents) to target document"),
  222. _(" --toc-only print document TOC and exit"),
  223. _(" --toc-level=N set maximum TOC level (depth) to N"),
  224. _(" --rc read user config file ~/.txt2tagsrc (default ON)"),
  225. _(" --gui invoke Graphical Tk Interface"),
  226. _(" -q, --quiet quiet mode, suppress all output (except errors)"),
  227. _(" -v, --verbose print informative messages during conversion"),
  228. _(" -h, --help print this help information and exit"),
  229. _(" -V, --version print program version and exit"),
  230. _(" --dump-config print all the config found and exit"),
  231. '',
  232. _("Turn OFF options:"),
  233. " --no-outfile, --no-infile, --no-style, --no-encoding, --no-headers",
  234. " --no-toc, --no-toc-only, --no-mask-email, --no-enum-title, --no-rc",
  235. " --no-css-sugar, --no-quiet",
  236. '',
  237. _("Example:\n %s -t html --toc myfile.t2t") % my_name,
  238. '',
  239. _("By default, converted output is saved to 'infile.<target>'."),
  240. _("Use --outfile to force an output file name."),
  241. _("If input file is '-', reads from STDIN."),
  242. _("If output file is '-', dumps output to STDOUT."),
  243. ''
  244. ], '\n')
  245. ##############################################################################
  246. # here are all the target templates
  247. # you may edit them to fit your needs
  248. # - the %(HEADERn)s strings represent the Header lines
  249. # - the %(STYLE)s string is changed by --style contents
  250. # - the %(ENCODING)s string is changed by --encoding contents
  251. # - if any of the above is empty, the full line is removed
  252. # - use %% to represent a literal %
  253. #
  254. HEADER_TEMPLATE = {
  255. 'txt': """\
  256. %(HEADER1)s
  257. %(HEADER2)s
  258. %(HEADER3)s
  259. """,
  260. 'sgml': """\
  261. <!doctype linuxdoc system>
  262. <article>
  263. <title>%(HEADER1)s
  264. <author>%(HEADER2)s
  265. <date>%(HEADER3)s
  266. """,
  267. 'html': """\
  268. <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
  269. <HTML>
  270. <HEAD>
  271. <META NAME="generator" CONTENT="http://txt2tags.sf.net">
  272. <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=%(ENCODING)s">
  273. <LINK REL="stylesheet" TYPE="text/css" HREF="%(STYLE)s">
  274. <TITLE>%(HEADER1)s</TITLE>
  275. </HEAD><BODY BGCOLOR="white" TEXT="black">
  276. <P ALIGN="center"><CENTER><H1>%(HEADER1)s</H1>
  277. <FONT SIZE="4">
  278. <I>%(HEADER2)s</I><BR>
  279. %(HEADER3)s
  280. </FONT></CENTER>
  281. """,
  282. 'htmlcss': """\
  283. <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
  284. <HTML>
  285. <HEAD>
  286. <META NAME="generator" CONTENT="http://txt2tags.sf.net">
  287. <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=%(ENCODING)s">
  288. <LINK REL="stylesheet" TYPE="text/css" HREF="%(STYLE)s">
  289. <TITLE>%(HEADER1)s</TITLE>
  290. </HEAD>
  291. <BODY>
  292. <DIV CLASS="header" ID="header">
  293. <H1>%(HEADER1)s</H1>
  294. <H2>%(HEADER2)s</H2>
  295. <H3>%(HEADER3)s</H3>
  296. </DIV>
  297. """,
  298. 'xhtml': """\
  299. <?xml version="1.0"?>
  300. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\
  301. "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  302. <html xmlns="http://www.w3.org/1999/xhtml">
  303. <head>
  304. <title>%(HEADER1)s</title>
  305. <meta name="generator" content="http://txt2tags.sf.net" />
  306. <meta http-equiv="Content-Type" content="text/html; charset=%(ENCODING)s" />
  307. <link rel="stylesheet" type="text/css" href="%(STYLE)s" />
  308. </head>
  309. <body bgcolor="white" text="black">
  310. <div align="center">
  311. <h1>%(HEADER1)s</h1>
  312. <h2>%(HEADER2)s</h2>
  313. <h3>%(HEADER3)s</h3>
  314. </div>
  315. """,
  316. 'xhtmlcss': """\
  317. <?xml version="1.0"?>
  318. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\
  319. "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  320. <html xmlns="http://www.w3.org/1999/xhtml">
  321. <head>
  322. <title>%(HEADER1)s</title>
  323. <meta name="generator" content="http://txt2tags.sf.net" />
  324. <meta http-equiv="Content-Type" content="text/html; charset=%(ENCODING)s" />
  325. <link rel="stylesheet" type="text/css" href="%(STYLE)s" />
  326. </head>
  327. <body>
  328. <div class="header" id="header">
  329. <h1>%(HEADER1)s</h1>
  330. <h2>%(HEADER2)s</h2>
  331. <h3>%(HEADER3)s</h3>
  332. </div>
  333. """,
  334. 'man': """\
  335. .TH "%(HEADER1)s" 1 "%(HEADER3)s" "%(HEADER2)s"
  336. """,
  337. # TODO style to <HR>
  338. 'pm6': """\
  339. <PMTags1.0 win><C-COLORTABLE ("Preto" 1 0 0 0)
  340. ><@Normal=
  341. <FONT "Times New Roman"><CCOLOR "Preto"><SIZE 11>
  342. <HORIZONTAL 100><LETTERSPACE 0><CTRACK 127><CSSIZE 70><C+SIZE 58.3>
  343. <C-POSITION 33.3><C+POSITION 33.3><P><CBASELINE 0><CNOBREAK 0><CLEADING -0.05>
  344. <GGRID 0><GLEFT 7.2><GRIGHT 0><GFIRST 0><G+BEFORE 7.2><G+AFTER 0>
  345. <GALIGNMENT "justify"><GMETHOD "proportional"><G& "ENGLISH">
  346. <GPAIRS 12><G%% 120><GKNEXT 0><GKWIDOW 0><GKORPHAN 0><GTABS $>
  347. <GHYPHENATION 2 34 0><GWORDSPACE 75 100 150><GSPACE -5 0 25>
  348. ><@Bullet=<@-PARENT "Normal"><FONT "Abadi MT Condensed Light">
  349. <GLEFT 14.4><G+BEFORE 2.15><G%% 110><GTABS(25.2 l "")>
  350. ><@PreFormat=<@-PARENT "Normal"><FONT "Lucida Console"><SIZE 8><CTRACK 0>
  351. <GLEFT 0><G+BEFORE 0><GALIGNMENT "left"><GWORDSPACE 100 100 100><GSPACE 0 0 0>
  352. ><@Title1=<@-PARENT "Normal"><FONT "Arial"><SIZE 14><B>
  353. <GCONTENTS><GLEFT 0><G+BEFORE 0><GALIGNMENT "left">
  354. ><@Title2=<@-PARENT "Title1"><SIZE 12><G+BEFORE 3.6>
  355. ><@Title3=<@-PARENT "Title1"><SIZE 10><GLEFT 7.2><G+BEFORE 7.2>
  356. ><@Title4=<@-PARENT "Title3">
  357. ><@Title5=<@-PARENT "Title3">
  358. ><@Quote=<@-PARENT "Normal"><SIZE 10><I>>
  359. %(HEADER1)s
  360. %(HEADER2)s
  361. %(HEADER3)s
  362. """,
  363. 'mgp': """\
  364. #!/usr/X11R6/bin/mgp -t 90
  365. %%deffont "normal" xfont "utopia-medium-r", charset "iso8859-1"
  366. %%deffont "normal-i" xfont "utopia-medium-i", charset "iso8859-1"
  367. %%deffont "normal-b" xfont "utopia-bold-r" , charset "iso8859-1"
  368. %%deffont "normal-bi" xfont "utopia-bold-i" , charset "iso8859-1"
  369. %%deffont "mono" xfont "courier-medium-r", charset "iso8859-1"
  370. %%default 1 size 5
  371. %%default 2 size 8, fore "yellow", font "normal-b", center
  372. %%default 3 size 5, fore "white", font "normal", left, prefix " "
  373. %%tab 1 size 4, vgap 30, prefix " ", icon arc "red" 40, leftfill
  374. %%tab 2 prefix " ", icon arc "orange" 40, leftfill
  375. %%tab 3 prefix " ", icon arc "brown" 40, leftfill
  376. %%tab 4 prefix " ", icon arc "darkmagenta" 40, leftfill
  377. %%tab 5 prefix " ", icon arc "magenta" 40, leftfill
  378. %%%%------------------------- end of headers -----------------------------
  379. %%page
  380. %%size 10, center, fore "yellow"
  381. %(HEADER1)s
  382. %%font "normal-i", size 6, fore "white", center
  383. %(HEADER2)s
  384. %%font "mono", size 7, center
  385. %(HEADER3)s
  386. """,
  387. # TODO please, improve me!
  388. 'moin': """\
  389. '''%(HEADER1)s'''
  390. ''%(HEADER2)s''
  391. %(HEADER3)s
  392. """,
  393. 'tex': \
  394. r"""\documentclass[11pt,a4paper]{article}
  395. \usepackage{amsfonts,graphicx,url}
  396. \usepackage[%(ENCODING)s]{inputenc} %% char encoding
  397. \usepackage{%(STYLE)s} %% user defined package
  398. \pagestyle{plain} %% do page numbering ('empty' turns off)
  399. \frenchspacing %% no aditional spaces after periods
  400. \setlength{\parskip}{8pt}\parindent=0pt %% no paragraph indentation
  401. %% uncomment next line for fancy PDF output on Adobe Acrobat Reader
  402. %%\usepackage[pdfstartview=FitV,colorlinks=true,bookmarks=true]{hyperref}
  403. \title{%(HEADER1)s}
  404. \author{%(HEADER2)s}
  405. \begin{document}
  406. \date{%(HEADER3)s}
  407. \maketitle
  408. \clearpage
  409. """
  410. }
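# The template rules stated before HEADER_TEMPLATE (fill the %(NAME)s
# strings, remove the full line when its value is empty, write %% for a
# literal %) can be sketched as below. fill_template is a hypothetical
# helper written for this illustration; the code that really fills the
# headers is not shown in this part of the file and may work differently.

```python
import re


def fill_template(template, values):
    """Fill %(NAME)s placeholders; drop any line whose value is empty."""
    kept = []
    for line in template.splitlines():
        names = re.findall(r'%\((\w+)\)s', line)
        # An empty or missing value removes the whole line.
        if any(not values.get(name) for name in names):
            continue
        # Dictionary-based % formatting also turns '%%' into '%'.
        kept.append(line % values)
    return '\n'.join(kept)


sample_template = (
    '<TITLE>%(HEADER1)s</TITLE>\n'
    '<LINK REL="stylesheet" HREF="%(STYLE)s">\n'
    '<!-- 100%% generated -->'
)
filled = fill_template(sample_template, {'HEADER1': 'My Doc', 'STYLE': ''})
```

# With an empty STYLE value the <LINK> line disappears from the result,
# so no dangling stylesheet reference is written to the output.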
  411. ##############################################################################
  412. def getTags(config):
  413. "Returns all the known tags for the specified target"
  414. keys = [
  415. 'paragraphOpen','paragraphClose',
  416. 'title1','title2','title3','title4','title5',
  417. 'numtitle1','numtitle2','numtitle3','numtitle4','numtitle5',
  418. 'blockVerbOpen','blockVerbClose',
  419. 'blockQuoteOpen','blockQuoteClose','blockQuoteLine',
  420. 'fontMonoOpen','fontMonoClose',
  421. 'fontBoldOpen','fontBoldClose',
  422. 'fontItalicOpen','fontItalicClose',
  423. 'fontUnderlineOpen','fontUnderlineClose',
  424. 'listOpen','listClose',
  425. 'listItemOpen','listItemClose','listItemLine',
  426. 'numlistOpen','numlistClose',
  427. 'numlistItemOpen','numlistItemClose','numlistItemLine',
  428. 'deflistOpen','deflistClose',
  429. 'deflistItem1Open','deflistItem1Close',
  430. 'deflistItem2Open','deflistItem2Close',
  431. 'bar1','bar2',
  432. 'url','urlMark','email','emailMark',
  433. 'img',
  434. 'tableOpen','tableClose',
  435. 'tableRowOpen','tableRowClose','tableRowSep',
  436. 'tableCellOpen','tableCellClose','tableCellSep',
  437. 'tableTitleCellOpen','tableTitleCellClose','tableTitleCellSep',
  438. 'tableTitleRowOpen','tableTitleRowClose',
  439. 'tableBorder', 'tableAlignLeft', 'tableAlignCenter',
  440. 'tableCellAlignLeft','tableCellAlignRight','tableCellAlignCenter',
  441. 'tableColAlignLeft','tableColAlignRight','tableColAlignCenter',
  442. 'tableColAlignSep',
  443. 'anchor','comment','pageBreak',
  444. 'TOC','tocOpen','tocClose',
  445. 'bodyOpen','bodyClose',
  446. 'EOD'
  447. ]
  448. alltags = {
  449. 'txt': {
  450. 'title1' : ' \a' ,
  451. 'title2' : '\t\a' ,
  452. 'title3' : '\t\t\a' ,
  453. 'title4' : '\t\t\t\a' ,
  454. 'title5' : '\t\t\t\t\a',
  455. 'blockQuoteLine' : '\t' ,
  456. 'listItemOpen' : '- ' ,
  457. 'numlistItemOpen' : '\a. ' ,
  458. 'bar1' : '\a' ,
  459. 'bar2' : '\a' ,
  460. 'url' : '\a' ,
  461. 'urlMark' : '\a (\a)' ,
  462. 'email' : '\a' ,
  463. 'emailMark' : '\a (\a)' ,
  464. 'img' : '[\a]' ,
  465. },
  466. 'html': {
  467. 'paragraphOpen' : '<P>' ,
  468. 'paragraphClose' : '</P>' ,
  469. 'title1' : '~A~<H1>\a</H1>' ,
  470. 'title2' : '~A~<H2>\a</H2>' ,
  471. 'title3' : '~A~<H3>\a</H3>' ,
  472. 'title4' : '~A~<H4>\a</H4>' ,
  473. 'title5' : '~A~<H5>\a</H5>' ,
  474. 'blockVerbOpen' : '<PRE>' ,
  475. 'blockVerbClose' : '</PRE>' ,
  476. 'blockQuoteOpen' : '<BLOCKQUOTE>' ,
  477. 'blockQuoteClose' : '</BLOCKQUOTE>' ,
  478. 'fontMonoOpen' : '<CODE>' ,
  479. 'fontMonoClose' : '</CODE>' ,
  480. 'fontBoldOpen' : '<B>' ,
  481. 'fontBoldClose' : '</B>' ,
  482. 'fontItalicOpen' : '<I>' ,
  483. 'fontItalicClose' : '</I>' ,
  484. 'fontUnderlineOpen' : '<U>' ,
  485. 'fontUnderlineClose' : '</U>' ,
  486. 'listOpen' : '<UL>' ,
  487. 'listClose' : '</UL>' ,
  488. 'listItemOpen' : '<LI>' ,
  489. 'numlistOpen' : '<OL>' ,
  490. 'numlistClose' : '</OL>' ,
  491. 'numlistItemOpen' : '<LI>' ,
  492. 'deflistOpen' : '<DL>' ,
  493. 'deflistClose' : '</DL>' ,
  494. 'deflistItem1Open' : '<DT>' ,
  495. 'deflistItem1Close' : '</DT>' ,
  496. 'deflistItem2Open' : '<DD>' ,
  497. 'bar1' : '<HR NOSHADE SIZE=1>' ,
  498. 'bar2' : '<HR NOSHADE SIZE=5>' ,
  499. 'url' : '<A HREF="\a">\a</A>' ,
  500. 'urlMark' : '<A HREF="\a">\a</A>' ,
  501. 'email' : '<A HREF="mailto:\a">\a</A>' ,
  502. 'emailMark' : '<A HREF="mailto:\a">\a</A>' ,
  503. 'img' :'<IMG ALIGN="~A~" SRC="\a" BORDER="0" ALT="">',
  504. 'tableOpen' : '<TABLE~A~ CELLPADDING="4"~B~>',
  505. 'tableClose' : '</TABLE>' ,
  506. 'tableRowOpen' : '<TR>' ,
  507. 'tableRowClose' : '</TR>' ,
  508. 'tableCellOpen' : '<TD\a>' ,
  509. 'tableCellClose' : '</TD>' ,
  510. 'tableTitleCellOpen' : '<TH>' ,
  511. 'tableTitleCellClose' : '</TH>' ,
  512. 'tableBorder' : ' BORDER="1"' ,
  513. 'tableAlignCenter' : ' ALIGN="center"',
  514. 'tableCellAlignRight' : ' ALIGN="right"' ,
  515. 'tableCellAlignCenter': ' ALIGN="center"',
  516. 'anchor' : '<A NAME="\a"></A>\n',
  517. 'comment' : '<!-- \a -->' ,
  518. 'EOD' : '</BODY></HTML>'
  519. },
  520. #TIP xhtml inherits all HTML definitions (lowercased)
  521. #TIP http://www.w3.org/TR/xhtml1/#guidelines
  522. #TIP http://www.htmlref.com/samples/Chapt17/17_08.htm
  523. 'xhtml': {
  524. 'listItemClose' : '</li>' ,
  525. 'numlistItemClose' : '</li>' ,
  526. 'deflistItem2Close' : '</dd>' ,
  527. 'bar1' : '<hr class="light" />',
  528. 'bar2' : '<hr class="heavy" />',
  529. 'anchor' : '<a id="\a" name="\a"></a>\n',
  530. 'img' :'<img align="~A~" src="\a" border="0" alt=""/>',
  531. },
  532. 'sgml': {
  533. 'paragraphOpen' : '<p>' ,
  534. 'title1' : '<sect>\a~A~<p>' ,
  535. 'title2' : '<sect1>\a~A~<p>' ,
  536. 'title3' : '<sect2>\a~A~<p>' ,
  537. 'title4' : '<sect3>\a~A~<p>' ,
  538. 'title5' : '<sect4>\a~A~<p>' ,
  539. 'blockVerbOpen' : '<tscreen><verb>' ,
  540. 'blockVerbClose' : '</verb></tscreen>' ,
  541. 'blockQuoteOpen' : '<quote>' ,
  542. 'blockQuoteClose' : '</quote>' ,
  543. 'fontMonoOpen' : '<tt>' ,
  544. 'fontMonoClose' : '</tt>' ,
  545. 'fontBoldOpen' : '<bf>' ,
  546. 'fontBoldClose' : '</bf>' ,
  547. 'fontItalicOpen' : '<em>' ,
  548. 'fontItalicClose' : '</em>' ,
  549. 'fontUnderlineOpen' : '<bf><em>' ,
  550. 'fontUnderlineClose' : '</em></bf>' ,
  551. 'listOpen' : '<itemize>' ,
  552. 'listClose' : '</itemize>' ,
  553. 'listItemOpen' : '<item>' ,
  554. 'numlistOpen' : '<enum>' ,
  555. 'numlistClose' : '</enum>' ,
  556. 'numlistItemOpen' : '<item>' ,
  557. 'deflistOpen' : '<descrip>' ,
  558. 'deflistClose' : '</descrip>' ,
  559. 'deflistItem1Open' : '<tag>' ,
  560. 'deflistItem1Close' : '</tag>' ,
  561. 'bar1' : '<!-- \a -->' ,
  562. 'bar2' : '<!-- \a -->' ,
  563. 'url' : '<htmlurl url="\a" name="\a">' ,
  564. 'urlMark' : '<htmlurl url="\a" name="\a">' ,
  565. 'email' : '<htmlurl url="mailto:\a" name="\a">' ,
  566. 'emailMark' : '<htmlurl url="mailto:\a" name="\a">' ,
  567. 'img' : '<figure><ph vspace=""><img src="\a">'+\
  568. '</figure>' ,
  569. 'tableOpen' : '<table><tabular ca="~C~">' ,
  570. 'tableClose' : '</tabular></table>' ,
  571. 'tableRowSep' : '<rowsep>' ,
  572. 'tableCellSep' : '<colsep>' ,
  573. 'tableColAlignLeft' : 'l' ,
  574. 'tableColAlignRight' : 'r' ,
  575. 'tableColAlignCenter' : 'c' ,
  576. 'comment' : '<!-- \a -->' ,
  577. 'anchor' : '<label id="\a">' ,
  578. 'TOC' : '<toc>' ,
  579. 'EOD' : '</article>'
  580. },
  581. 'tex': {
  582. 'title1' : '\n\\section*{\a}',
  583. 'title2' : '\\subsection*{\a}' ,
  584. 'title3' : '\\subsubsection*{\a}' ,
  585. # title 4/5: DIRTY: para+BF+\\+\n
  586. 'title4' : '\\paragraph{}\\textbf{\a}\\\\\n',
  587. 'title5' : '\\paragraph{}\\textbf{\a}\\\\\n',
  588. 'numtitle1' : '\n\\section{\a}',
  589. 'numtitle2' : '\\subsection{\a}' ,
  590. 'numtitle3' : '\\subsubsection{\a}' ,
  591. 'blockVerbOpen' : '\\begin{verbatim}' ,
  592. 'blockVerbClose' : '\\end{verbatim}' ,
  593. 'blockQuoteOpen' : '\\begin{quotation}' ,
  594. 'blockQuoteClose' : '\\end{quotation}' ,
  595. 'fontMonoOpen' : '\\texttt{' ,
  596. 'fontMonoClose' : '}' ,
  597. 'fontBoldOpen' : '\\textbf{' ,
  598. 'fontBoldClose' : '}' ,
  599. 'fontItalicOpen' : '\\textit{' ,
  600. 'fontItalicClose' : '}' ,
  601. 'fontUnderlineOpen' : '\\underline{' ,
  602. 'fontUnderlineClose' : '}' ,
  603. 'listOpen' : '\\begin{itemize}' ,
  604. 'listClose' : '\\end{itemize}' ,
  605. 'listItemOpen' : '\\item ' ,
  606. 'numlistOpen' : '\\begin{enumerate}' ,
  607. 'numlistClose' : '\\end{enumerate}' ,
  608. 'numlistItemOpen' : '\\item ' ,
  609. 'deflistOpen' : '\\begin{description}',
  610. 'deflistClose' : '\\end{description}' ,
  611. 'deflistItem1Open' : '\\item[' ,
  612. 'deflistItem1Close' : ']' ,
  613. 'bar1' : '\n\\hrulefill{}\n' ,
  614. 'bar2' : '\n\\rule{\\linewidth}{1mm}\n',
  615. 'url' : '\\url{\a}' ,
  616. 'urlMark' : '\\textit{\a} (\\url{\a})' ,
  617. 'email' : '\\url{\a}' ,
  618. 'emailMark' : '\\textit{\a} (\\url{\a})' ,
  619. 'img' : '\\includegraphics{\a}',
  620. 'tableOpen' : '\\begin{center}\\begin{tabular}{|~C~|}',
  621. 'tableClose' : '\\end{tabular}\\end{center}',
  622. 'tableRowOpen' : '\\hline ' ,
  623. 'tableRowClose' : ' \\\\' ,
  624. 'tableCellSep' : ' & ' ,
  625. 'tableColAlignLeft' : 'l' ,
  626. 'tableColAlignRight' : 'r' ,
  627. 'tableColAlignCenter' : 'c' ,
  628. 'tableColAlignSep' : '|' ,
  629. 'comment' : '% \a' ,
  630. 'TOC' : '\\tableofcontents',
  631. 'pageBreak' : '\\clearpage',
  632. 'EOD' : '\\end{document}'
  633. },
  634. 'moin': {
  635. 'title1' : '= \a =' ,
  636. 'title2' : '== \a ==' ,
  637. 'title3' : '=== \a ===' ,
  638. 'title4' : '==== \a ====' ,
  639. 'title5' : '===== \a =====',
  640. 'blockVerbOpen' : '{{{' ,
  641. 'blockVerbClose' : '}}}' ,
  642. 'blockQuoteLine' : ' ' ,
  643. 'fontMonoOpen' : '{{{' ,
  644. 'fontMonoClose' : '}}}' ,
  645. 'fontBoldOpen' : "'''" ,
  646. 'fontBoldClose' : "'''" ,
  647. 'fontItalicOpen' : "''" ,
  648. 'fontItalicClose' : "''" ,
  649. 'fontUnderlineOpen' : "__" ,
  650. 'fontUnderlineClose' : "__" ,
  651. 'listItemOpen' : ' * ' ,
  652. 'numlistItemOpen' : ' \a. ' ,
  653. 'bar1' : '----' ,
  654. 'bar2' : '----' ,
  655. 'url' : '[\a]' ,
  656. 'urlMark' : '[\a \a]' ,
  657. 'email' : '[\a]' ,
  658. 'emailMark' : '[\a \a]' ,
  659. 'img' : '[\a]' ,
  660. 'tableRowOpen' : '||' ,
  661. 'tableCellOpen' : '\a' ,
  662. 'tableCellClose' : '||' ,
  663. 'tableTitleCellClose' : '||' ,
  664. 'tableCellAlignRight' : '<)>' ,
  665. 'tableCellAlignCenter': '<:>' ,
  666. 'comment' : '## \a' ,
  667. 'TOC' : '[[TableOfContents]]'
  668. },
  669. 'mgp': {
  670. 'paragraphOpen' : '%font "normal", size 5' ,
  671. 'title1' : '%page\n\n\a\n' ,
  672. 'title2' : '%page\n\n\a\n' ,
  673. 'title3' : '%page\n\n\a\n' ,
  674. 'title4' : '%page\n\n\a\n' ,
  675. 'title5' : '%page\n\n\a\n' ,
  676. 'blockVerbOpen' : '%font "mono"' ,
  677. 'blockVerbClose' : '%font "normal"' ,
  678. 'blockQuoteOpen' : '%prefix " "' ,
  679. 'blockQuoteClose' : '%prefix " "' ,
  680. 'fontMonoOpen' : '\n%cont, font "mono"\n' ,
  681. 'fontMonoClose' : '\n%cont, font "normal"\n' ,
  682. 'fontBoldOpen' : '\n%cont, font "normal-b"\n' ,
  683. 'fontBoldClose' : '\n%cont, font "normal"\n' ,
  684. 'fontItalicOpen' : '\n%cont, font "normal-i"\n' ,
  685. 'fontItalicClose' : '\n%cont, font "normal"\n' ,
  686. 'fontUnderlineOpen' : '\n%cont, fore "cyan"\n' ,
  687. 'fontUnderlineClose' : '\n%cont, fore "white"\n' ,
  688. 'listItemLine' : '\t' ,
  689. 'numlistItemLine' : '\t' ,
  690. 'deflistItem1Open' : '\t\n%cont, font "normal-b"\n',
  691. 'deflistItem1Close' : '\n%cont, font "normal"\n' ,
  692. 'bar1' : '%bar "white" 5' ,
  693. 'bar2' : '%pause' ,
  694. 'url' : '\n%cont, fore "cyan"\n\a' +\
  695. '\n%cont, fore "white"\n' ,
  696. 'urlMark' : '\a \n%cont, fore "cyan"\n\a'+\
  697. '\n%cont, fore "white"\n' ,
  698. 'email' : '\n%cont, fore "cyan"\n\a' +\
  699. '\n%cont, fore "white"\n' ,
  700. 'emailMark' : '\a \n%cont, fore "cyan"\n\a'+\
  701. '\n%cont, fore "white"\n' ,
  702. 'img' : '\n%~A~\n%newimage "\a"\n%left\n',
  703. 'comment' : '%% \a' ,
  704. 'pageBreak' : '%page\n\n\n' ,
  705. 'EOD' : '%%EOD'
  706. },
  707. # man groff_man ; man 7 groff
  708. 'man': {
  709. 'paragraphOpen' : '.P' ,
  710. 'title1' : '.SH \a' ,
  711. 'title2' : '.SS \a' ,
  712. 'title3' : '.SS \a' ,
  713. 'title4' : '.SS \a' ,
  714. 'title5' : '.SS \a' ,
  715. 'blockVerbOpen' : '.nf' ,
  716. 'blockVerbClose' : '.fi\n' ,
  717. 'blockQuoteOpen' : '.RS' ,
  718. 'blockQuoteClose' : '.RE' ,
  719. 'fontBoldOpen' : '\\fB' ,
  720. 'fontBoldClose' : '\\fR' ,
  721. 'fontItalicOpen' : '\\fI' ,
  722. 'fontItalicClose' : '\\fR' ,
  723. 'listOpen' : '.RS' ,
  724. 'listItemOpen' : '.IP \\(bu 3\n',
  725. 'listClose' : '.RE' ,
  726. 'numlistOpen' : '.RS' ,
  727. 'numlistItemOpen' : '.IP \a. 3\n',
  728. 'numlistClose' : '.RE' ,
  729. 'deflistItem1Open' : '.TP\n' ,
  730. 'bar1' : '\n\n' ,
  731. 'bar2' : '\n\n' ,
  732. 'url' : '\a' ,
  733. 'urlMark' : '\a (\a)',
  734. 'email' : '\a' ,
  735. 'emailMark' : '\a (\a)',
  736. 'img' : '\a' ,
  737. 'tableOpen' : '.TS\n~A~~B~tab(^); ~C~.',
  738. 'tableClose' : '.TE' ,
  739. 'tableRowOpen' : ' ' ,
  740. 'tableCellSep' : '^' ,
  741. 'tableAlignCenter' : 'center, ',
  742. 'tableBorder' : 'allbox, ',
  743. 'tableColAlignLeft' : 'l' ,
  744. 'tableColAlignRight' : 'r' ,
  745. 'tableColAlignCenter' : 'c' ,
  746. 'comment' : '.\\" \a'
  747. },
  748. 'pm6': {
  749. 'paragraphOpen' : '<@Normal:>' ,
  750. 'title1' : '\n<@Title1:>\a',
  751. 'title2' : '\n<@Title2:>\a',
  752. 'title3' : '\n<@Title3:>\a',
  753. 'title4' : '\n<@Title4:>\a',
  754. 'title5' : '\n<@Title5:>\a',
  755. 'blockVerbOpen' : '<@PreFormat:>' ,
  756. 'blockQuoteLine' : '<@Quote:>' ,
  757. 'fontMonoOpen' : '<FONT "Lucida Console"><SIZE 9>' ,
  758. 'fontMonoClose' : '<SIZE$><FONT$>',
  759. 'fontBoldOpen' : '<B>' ,
  760. 'fontBoldClose' : '<P>' ,
  761. 'fontItalicOpen' : '<I>' ,
  762. 'fontItalicClose' : '<P>' ,
  763. 'fontUnderlineOpen' : '<U>' ,
  764. 'fontUnderlineClose' : '<P>' ,
  765. 'listOpen' : '<@Bullet:>' ,
  766. 'listItemOpen' : '\x95\t' , # \x95 == ~U
  767. 'numlistOpen' : '<@Bullet:>' ,
  768. 'numlistItemOpen' : '\x95\t' ,
  769. 'bar1' : '\a' ,
  770. 'bar2' : '\a' ,
  771. 'url' : '<U>\a<P>' , # underline
  772. 'urlMark' : '\a <U>\a<P>' ,
  773. 'email' : '\a' ,
  774. 'emailMark' : '\a \a' ,
  775. 'img' : '\a'
  776. }
  777. }

    # exceptions for --css-sugar
    if config['css-sugar'] and config['target'] in ('html','xhtml'):
        # change just HTML because XHTML inherits it
        htmltags = alltags['html']
        # table with no cellpadding
        htmltags['tableOpen'] = string.replace(
            htmltags['tableOpen'], ' CELLPADDING="4"', '')
        # DIVs
        htmltags['tocOpen' ] = '<DIV CLASS="toc" ID="toc">'
        htmltags['tocClose'] = '</DIV>'
        htmltags['bodyOpen'] = '<DIV CLASS="body" ID="body">'
        htmltags['bodyClose']= '</DIV>'

    # make the HTML -> XHTML inheritance
    xhtml = alltags['html'].copy()
    for key in xhtml.keys(): xhtml[key] = string.lower(xhtml[key])
    # some like HTML tags as lowercase, some don't... (headers out)
    if HTML_LOWER: alltags['html'] = xhtml.copy()
    xhtml.update(alltags['xhtml'])
    alltags['xhtml'] = xhtml.copy()

    # compose the target tags dictionary
    tags = {}
    target_tags = alltags[config['target']].copy()
    for key in keys: tags[key] = ''  # create empty keys
    for key in target_tags.keys():
        tags[key] = maskEscapeChar(target_tags[key])  # populate
    return tags
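
# Illustrative sketch, not part of txt2tags: the HTML -> XHTML inheritance
# step reduced to a standalone helper. The name derive_xhtml and the sample
# tag values are assumptions made for this example only. It is written to run
# unchanged on Python 2 and Python 3.

```python
def derive_xhtml(html_tags, xhtml_overrides):
    """Lowercase every HTML tag template, then apply the XHTML-specific tags."""
    # Lowercasing the whole template mirrors string.lower() in the original,
    # so any literal text inside a template is lowercased as well.
    xhtml = dict((key, value.lower()) for key, value in html_tags.items())
    # XHTML-specific entries win over the inherited (lowercased) HTML ones.
    xhtml.update(xhtml_overrides)
    return xhtml


# Sample values, assumed for illustration only.
html_tags = {'paragraphOpen': '<P>', 'bar1': '<HR NOSHADE SIZE=1>'}
xhtml_overrides = {'bar1': '<hr class="light" />'}
xhtml_tags = derive_xhtml(html_tags, xhtml_overrides)
```

# The helper builds a new dictionary, so the HTML table passed in is left
# untouched.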

##############################################################################

def getRules(config):
    "Returns all the target-specific syntax rules"
    ret = {}
    allrules = [
        # target rules (ON/OFF)
        'linkable',  # target supports external links
        'tableable',  # target supports tables
        'imglinkable',  # target supports images as links
        'imgalignable',  # target supports image alignment
        'imgasdefterm',  # target supports image as definition term
        'autonumberlist',  # target supports numbered lists natively
        'autonumbertitle',  # target supports numbered titles natively
        'parainsidelist',  # list items support paragraphs
        'spacedlistitem',  # lists support blank lines between items
        'listnotnested',  # lists cannot be nested
        'quotenotnested',  # quotes cannot be nested
        'verbblocknotescaped',  # don't escape specials in verb block
        'verbblockfinalescape',  # do final escapes in verb block
        'escapeurl',  # escape specials in link URL
        'onelinepara',  # dump paragraph as a single long line
        'tabletitlerowinbold',  # manually bold any cell on table titles
        'tablecellstrip',  # strip extra spaces from each table cell
        'barinsidequote',  # bars are allowed inside quote blocks
        'finalescapetitle',  # perform final escapes on title lines
        'autotocnewpagebefore',  # break page before automatic TOC
        'autotocnewpageafter',  # break page after automatic TOC
        'autotocwithbars',  # automatic TOC surrounded by bars
        # target code beautify (ON/OFF)
        'indentverbblock',  # add leading spaces to verb block lines
        'breaktablecell',  # break lines after any table cell
        'breaktablelineopen',  # break line after opening table line
        'notbreaklistopen',  # don't break line after opening a new list
        'notbreakparaopen',  # don't break line after opening a new para
        'keepquoteindent',  # don't remove the leading TABs on quotes
        'keeplistindent',  # don't remove the leading spaces on lists
        'blankendmotherlist',  # append a blank line at the mother list end
        'blankendtable',  # append a blank line at the table end
        'blankendautotoc',  # append a blank line at the auto TOC end
        'tagnotindentable',  # tags must be placed at the line beginning
        # value settings
        'listmaxdepth',  # maximum depth for lists
        'tablecellaligntype'  # type of table cell align: cell, column
    ]
    rules_bank = {
        'txt': {
            'indentverbblock':1,
            'spacedlistitem':1,
            'parainsidelist':1,
            'keeplistindent':1,
            'barinsidequote':1,
            'autotocwithbars':1,
            'blankendmotherlist':1
        },
        'html': {
            'indentverbblock':1,
            'linkable':1,
            'escapeurl':1,
            'imglinkable':1,
            'imgalignable':1,
            'imgasdefterm':1,
            'autonumberlist':1,
            'spacedlistitem':1,
            'parainsidelist':1,
            'blankendmotherlist':1,
            'tableable':1,
            'tablecellstrip':1,
            'blankendtable':1,
            'breaktablecell':1,
            'breaktablelineopen':1,
            'keeplistindent':1,
            'keepquoteindent':1,
            'barinsidequote':1,
            'autotocwithbars':1,
            'tablecellaligntype':'cell'
        },
        #TIP xhtml inherits all HTML rules
        'xhtml': {
        },
        'sgml': {
            'linkable':1,
            'escapeurl':1,
            'autonumberlist':1,
            'spacedlistitem':1,
            'blankendmotherlist':1,
            'tableable':1,
            'tablecellstrip':1,
            'blankendtable':1,
            'blankendautotoc':1,
            'quotenotnested':1,
            'keeplistindent':1,
            'keepquoteindent':1,
            'barinsidequote':1,
            'finalescapetitle':1,
            'tablecellaligntype':'column'
        },
        'mgp': {
            'blankendmotherlist':1,
            'tagnotindentable':1,
            'spacedlistitem':1,
            'imgalignable':1,
            'autotocnewpagebefore':1,
        },
        'tex': {
            'autonumberlist':1,
            'autonumbertitle':1,
            'spacedlistitem':1,
            'blankendmotherlist':1,
            'tableable':1,
            'tablecellstrip':1,
            'tabletitlerowinbold':1,
            'blankendtable':1,
            'verbblocknotescaped':1,
            'keeplistindent':1,
            'listmaxdepth':4,
            'barinsidequote':1,
            'finalescapetitle':1,
            'autotocnewpageafter':1,
            'tablecellaligntype':'column'
        },
        'moin': {
            'spacedlistitem':1,
            'linkable':1,
            'blankendmotherlist':1,
            'keeplistindent':1,
            'tableable':1,
            'barinsidequote':1,
            'blankendtable':1,
            'tabletitlerowinbold':1,
            'tablecellstrip':1,
            'autotocwithbars':1,
            'tablecellaligntype':'cell'
        },
        'man': {
            'spacedlistitem':1,
            'indentverbblock':1,
            'blankendmotherlist':1,
            'tagnotindentable':1,
            'tableable':1,
            'tablecellaligntype':'column',
            'tabletitlerowinbold':1,
            'tablecellstrip':1,
            'blankendtable':1,
            'keeplistindent':0,
            'barinsidequote':1,
            'parainsidelist':0,
        },
        'pm6': {
            'keeplistindent':1,
            'verbblockfinalescape':1,
            #TODO add support for these - maybe set a JOINNEXT char and
            # do it on addLineBreaks()
            'notbreaklistopen':1,
            'notbreakparaopen':1,
            'barinsidequote':1,
            'autotocwithbars':1,
            'onelinepara':1,
        }
    }

    # exceptions for --css-sugar
    if config['css-sugar'] and config['target'] in ('html','xhtml'):
        rules_bank['html']['indentverbblock'] = 0
        rules_bank['html']['autotocwithbars'] = 0

    # get the target specific rules
    if config['target'] == 'xhtml':
        myrules = rules_bank['html'].copy()  # inheritance
        myrules.update(rules_bank['xhtml'])  # get XHTML specific
    else:
        myrules = rules_bank[config['target']].copy()

    # populate return dictionary
    for key in allrules: ret[key] = 0  # reset all
    ret.update(myrules)  # get rules
    return ret
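
# Illustrative sketch, not part of txt2tags: how a target's rules are
# resolved. Every known rule starts at 0, the target's own settings are laid
# on top, and 'xhtml' starts from the 'html' rules. ALL_RULES, RULES_BANK and
# resolve_rules are hypothetical, trimmed-down stand-ins for the names used
# in getRules(). It runs unchanged on Python 2 and Python 3.

```python
# Trimmed-down rule list and bank, assumed for illustration only.
ALL_RULES = ['linkable', 'tableable', 'indentverbblock', 'listmaxdepth']
RULES_BANK = {
    'html': {'linkable': 1, 'tableable': 1, 'indentverbblock': 1},
    'xhtml': {},  # inherits everything from 'html'
    'tex': {'tableable': 1, 'listmaxdepth': 4},
}


def resolve_rules(target):
    """Return a dict with every rule set, defaulting to 0 (OFF)."""
    if target == 'xhtml':
        rules = RULES_BANK['html'].copy()  # inheritance
        rules.update(RULES_BANK['xhtml'])  # XHTML-specific overrides
    else:
        rules = RULES_BANK[target].copy()
    resolved = dict.fromkeys(ALL_RULES, 0)  # reset all
    resolved.update(rules)
    return resolved


xhtml_rules = resolve_rules('xhtml')
tex_rules = resolve_rules('tex')
```

# Because every key is pre-filled, callers can test any rule directly
# (for example tex_rules['linkable']) without guarding against a KeyError.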

##############################################################################

def getRegexes():
    "Returns all the regexes used to find the t2t marks"
    bank = {
        'blockVerbOpen':
            re.compile(r'^```\s*$'),
        'blockVerbClose':
            re.compile(r'^```\s*$'),
        'blockRawOpen':
            re.compile(r'^"""\s*$'),
        'blockRawClose':
            re.compile(r'^"""\s*$'),
        'quote':
            re.compile(r'^\t+'),
        '1lineVerb':
            re.compile(r'^``` (?=.)'),
        '1lineRaw':
            re.compile(r'^""" (?=.)'),
        # mono, raw, bold, italic, underline:
        # - marks must be glued to the contents, no boundary spaces
        # - they are greedy, so ****bold**** turns into <b>**bold**</b>
        'fontMono':
            re.compile(r'``([^\s](|.*?[^\s])`*)``'),
        'raw':
            re.compile(r'""([^\s](|.*?[^\s])"*)""'),
        'fontBold':
            re.compile(r'\*\*([^\s](|.*?[^\s])\**)\*\*'),
        'fontItalic':
            re.compile(r'//([^\s](|.*?[^\s])/*)//'),
        'fontUnderline':
            re.compile(r'__([^\s](|.*?[^\s])_*)__'),
        'list':
            re.compile(r'^( *)(-) (?=[^ ])'),
        'numlist':
            re.compile(r'^( *)(\+) (?=[^ ])'),
        'deflist':
            re.compile(r'^( *)(:) (.*)$'),
        'listclose':
            re.compile(r'^( *)([-+:])\s*$'),
        'bar':
            re.compile(r'^(\s*)([_=-]{20,})\s*$'),
        'table':
            re.compile(r'^ *\|\|? '),
        'blankline':
            re.compile(r'^\s*$'),
        'comment':
            re.compile(r'^%'),
        # auxiliary tag regexes
        '_imgAlign' : re.compile(r'~A~',re.I),
        '_tableAlign' : re.compile(r'~A~',re.I),
        '_anchor' : re.compile(r'~A~',re.I),
        '_tableBorder' : re.compile(r'~B~',re.I),
        '_tableColAlign': re.compile(r'~C~',re.I),
    }

    # special char to place data on TAGs contents (\a == bell)
    bank['x'] = re.compile('\a')
    # %%macroname [ (formatting) ]
    bank['macros'] = re.compile(r'%%%%(?P<name>%s)\b(\((?P<fmt>.*?)\))?'%(
        string.join(MACROS.keys(), '|')), re.I)
    # %%TOC special macro for TOC positioning
    bank['toc'] = re.compile(r'^ *%%toc\s*$', re.I)
    # almost complicated title regexes ;)
    titskel = r'^ *(?P<id>%s)(?P<txt>%s)\1(\[(?P<label>[\w-]*)\])?\s*$'
    bank[ 'title'] = re.compile(titskel%('[=]{1,5}','[^=](|.*[^=])'))
    bank['numtitle'] = re.compile(titskel%('[+]{1,5}','[^+](|.*[^+])'))

    ### complicated regexes begin here ;)
    #
    # textual descriptions in --help style: [...] is optional, | is OR

    ### first, some auxiliary variables
    #
    # [image.EXT]
    patt_img = r'\[([\w_,.+%$#@!?+~/-]+\.(png|jpe?g|gif|eps|bmp))\]'
    # link things
    urlskel = {
        'proto' : r'(https?|ftp|news|telnet|gopher|wais)://',
        'guess' : r'(www[23]?|ftp)\.',  # w/out proto, try to guess
        'login' : r'A-Za-z0-9_.-',  # for ftp://login@domain.com
        'pass' : r'[^ @]*',  # for ftp://login:pass@dom.com
        'chars' : r'A-Za-z0-9%._/~:,=$@&+-',  # %20(space), :80(port), D&D
        'anchor': r'A-Za-z0-9%._-',  # %nn(encoded)
        'form' : r'A-Za-z0-9/%&=+;.,$@*_-',  # .,@*_-(as is)
        'punct' : r'.,;:!?'
    }
    # username [ :password ] @
    patt_url_login = r'([%s]+(:%s)?@)?'%(urlskel['login'],urlskel['pass'])
    # [ http:// ] [ username:password@ ] domain.com [ / ]
    # [ #anchor | ?form=data ]
    retxt_url = r'\b(%s%s|%s)[%s]+\b/*(\?[%s]+)?(#[%s]+)?'%(
        urlskel['proto'],patt_url_login, urlskel['guess'],
        urlskel['chars'],urlskel['form'],urlskel['anchor'])
    # filename | [ filename ] #anchor
    retxt_url_local = r'[%s]+|[%s]*(#[%s]+)'%(
        urlskel['chars'],urlskel['chars'],urlskel['anchor'])
    # user@domain [ ?form=data ]
    patt_email = r'\b[%s]+@([A-Za-z0-9_-]+\.)+[A-Za-z]{2,4}\b(\?[%s]+)?'%(
        urlskel['login'],urlskel['form'])
    # saving for future use
    bank['_urlskel'] = urlskel

    ### and now the real regexes
    #
    bank['email'] = re.compile(patt_email,re.I)
    # email | url
    bank['link'] = re.compile(r'%s|%s'%(retxt_url,patt_email), re.I)
    # \[ label | imagetag url | email | filename \]
    bank['linkmark'] = re.compile(
        r'\[(?P<label>%s|[^]]+) (?P<link>%s|%s|%s)\]'%(
        patt_img, retxt_url, patt_email, retxt_url_local),
        re.L+re.I)
    # image
    bank['img'] = re.compile(patt_img, re.L+re.I)
    # special things
    bank['special'] = re.compile(r'^%!\s*')
    return bank
### END OF regex nightmares
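
# Illustrative sketch, not part of txt2tags: two of the patterns from
# getRegexes() exercised on their own. The patterns are copied from the bank;
# the helper names bold_to_html and parse_title, the <b> tag and the
# stripping of the title text are assumptions made for this example only.
# It runs unchanged on Python 2 and Python 3.

```python
import re

# Same pattern as bank['fontBold'].
FONT_BOLD = re.compile(r'\*\*([^\s](|.*?[^\s])\**)\*\*')

# Same skeleton as titskel, filled in for '=' titles.
TITSKEL = r'^ *(?P<id>%s)(?P<txt>%s)\1(\[(?P<label>[\w-]*)\])?\s*$'
TITLE = re.compile(TITSKEL % ('[=]{1,5}', '[^=](|.*[^=])'))


def bold_to_html(line):
    """Wrap every bold mark in <b>...</b> (tag chosen for illustration)."""
    return FONT_BOLD.sub(r'<b>\1</b>', line)


def parse_title(line):
    """Return (level, text, label) for a title line, or None if not a title."""
    match = TITLE.match(line)
    if not match:
        return None
    # The level is the number of '=' marks; \1 forces both sides to agree.
    return (len(match.group('id')), match.group('txt').strip(),
            match.group('label'))


plain_bold = bold_to_html('**bold**')
greedy_bold = bold_to_html('****bold****')
spaced_bold = bold_to_html('** not bold **')
labeled_title = parse_title('== Intro ==[intro]')
unbalanced_title = parse_title('== Intro =')
```

# The spaced mark is left alone because the contents must be glued to the
# marks, and the unbalanced title is rejected because the closing marks must
# repeat the opening ones exactly.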

##############################################################################

def echo(msg):  # for quick debug
    print '\033[32;1m%s\033[m'%msg

def Quit(msg, exitcode=0):
    print msg
    sys.exit(exitcode)

def Error(msg):
    sys.stderr.write(_("%s: Error: ")%my_name + "%s\n"%msg)
    sys.stderr.flush()
    sys.exit(1)

def ShowTraceback():
    try:
        from traceback import print_exc
        print_exc() ; print ; print
    except: pass

def Message(msg,level):
    if level <= VERBOSE and not QUIET:
        prefix = '-'*5
        print "%s %s"%(prefix*level, msg)

def Debug(msg,color=0,linenr=None):
    "0gray=init,1red=conf,3yellow=line,6cyan=block,2green=detail,5pink=gui"
    if QUIET or not DEBUG: return
    if COLOR_DEBUG: msg = '\033[3%s;1m%s\033[m'%(color,msg)
    if linenr is not None: msg = "LINE %04d: %s"%(linenr,msg)
    print "** %s"%msg

def Readfile(file, remove_linebreaks=0):
    if file == '-':
        try: data = sys.stdin.readlines()
        except: Error(_('You must feed me with data on STDIN!'))
    else:
        try: f = open(file); data = f.readlines() ; f.close()
        except: Error(_("Cannot read file:")+"\n %s"%file)
    if remove_linebreaks:
        data = map(lambda x:re.sub('[\n\r]+$','',x), data)
    Message(_("Readed file (%d lines): %s")%(len(data),file),2)
    return data

def Savefile(file, contents):
    try: f = open(file, 'wb')
    except: Error(_("Cannot open file for writing:")+"\n %s"%file)
    if type(contents) == type([]): doit = f.writelines
    else: doit = f.write
    doit(contents) ; f.close()

def showdic(dic):
    for k in dic.keys(): print "%15s : %s" % (k,dic[k])

def dotted_spaces(txt=''):
    return string.replace(txt,' ','.')

def get_rc_path():
    "Return the full path for the user's RC file"
    rc_file = RC
    # search the RC dir on the specified system variables
    # TIP: win: http://www.winnetmag.com/Article/ArticleID/23873/23873.html
    rc_dir_search = ['HOME', 'HOMEPATH']
    for var in rc_dir_search:
        rc_dir = os.environ.get(var)
        if rc_dir: break
    if rc_dir:
        # compose the full path and return it
        rc_path = os.path.join(rc_dir, rc_file)
        # on windows, prefix with the drive (%homedrive%: 2k/XP/NT)
        if sys.platform[:3] == 'win':
            # default to '' so an unset HOMEDRIVE does not break os.path.join
            rc_drive = os.environ.get('HOMEDRIVE', '')
            rc_path = os.path.join(rc_drive,rc_path)
        return rc_path
    return ''

##############################################################################

class CommandLine:
    """
    Command Line class - Masters command line

    This class checks and extracts data from the provided command line.
    The --long options and flags are taken from the global OPTIONS,
    FLAGS and ACTIONS dictionaries. The short options are registered
    here, and also their equivalence to the long ones.

    METHODS:
      _compose_short_opts() -> str
      _compose_long_opts() -> list
          Compose the valid short and long options list, in the
          'getopt' format.

      parse() -> (opts, args)
          Call getopt to check and parse the command line.
          It expects to receive the command line as a list, and
          without the program name (sys.argv[1:]).

      get_raw_config() -> [RAW config]
          Scan the command line and convert the data to the RAW config
          format. See the ConfigMaster class for the RAW format
          description. The optional 'ignore' and 'filter' arguments
          are used to filter specified keys in or out.

      compose_cmdline(dict) -> [Command line]
          Compose a command line list from an already parsed config
          dictionary, generated from RAW by ConfigMaster(). Use
          this to compose an optimal command line for a group of
          options.

    get_raw_config() calls parse(), so the typical use of this
    class is:

        raw = CommandLine().get_raw_config(sys.argv[1:])
    """
    def __init__(self):
        self.all_options = OPTIONS.keys()
        self.all_flags = FLAGS.keys()
        self.all_actions = ACTIONS.keys()
        # short:long options equivalence
        self.short_long = {
            'h':'help' , 'V':'version',
            'n':'enum-title', 'i':'infile' ,
            'H':'no-headers', 'o':'outfile',
            'v':'verbose' , 't':'target' ,
            'q':'quiet'
        }
        # compose valid short and long options data for getopt
        self.short_opts = self._compose_short_opts()
        self.long_opts = self._compose_long_opts()

    def _compose_short_opts(self):
        "Returns a string like 'hVt:o' with all short options/flags"
        ret = []
        for opt in self.short_long.keys():
            long = self.short_long[opt]
            if long in self.all_options:  # is flag or option?
                opt = opt+':'  # option: have param
            ret.append(opt)
        Debug('Valid SHORT options: %s'%ret)
        return string.join(ret, '')

    def _compose_long_opts(self):
        "Returns a list with all the valid long options/flags"
        ret = map(lambda x:x+'=', self.all_options)  # add =
        ret.extend(self.all_flags)  # flag ON
        ret.extend(self.all_actions)  # acts
        ret.extend(map(lambda x:'no-'+x, self.all_flags))  # add no-*
        ret.extend(['no-style'])  # turn
