/compat/smartypants.py

https://bitbucket.org/resplin/byteflow
Possible License(s): BSD-3-Clause
#!/usr/bin/python
r"""
==============
smartypants.py
==============

----------------------------
SmartyPants ported to Python
----------------------------

Ported by `Chad Miller`_
Copyright (c) 2004, 2007 Chad Miller

original `SmartyPants`_ by `John Gruber`_
Copyright (c) 2003 John Gruber


Synopsis
========

A smart-quotes plugin for Pyblosxom_.

The original "SmartyPants" is a free web publishing plug-in for Movable Type,
Blosxom, and BBEdit that easily translates plain ASCII punctuation characters
into "smart" typographic punctuation HTML entities.

This software, *smartypants.py*, endeavours to be a functional port of
SmartyPants to Python, for use with Pyblosxom_.

Description
===========

SmartyPants can perform the following transformations:

- Straight quotes ( " and ' ) into "curly" quote HTML entities
- Backticks-style quotes (\`\`like this'') into "curly" quote HTML entities
- Dashes (``--`` and ``---``) into en- and em-dash entities
- Three consecutive dots (``...`` or ``. . .``) into an ellipsis entity

This means you can write, edit, and save your posts using plain old
ASCII straight quotes, plain dashes, and plain dots, but your published
posts (and final HTML output) will appear with smart quotes, em-dashes,
and proper ellipses.

SmartyPants does not modify characters within ``<pre>``, ``<code>``, ``<kbd>``,
``<math>`` or ``<script>`` tag blocks. Typically, these tags are used to
display text where smart quotes and other "smart punctuation" would not be
appropriate, such as source code or example markup.
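
The tag-skipping behaviour described above can be sketched roughly as follows.
This is a simplified illustration, not the plugin's own code: the names
``educate_outside_code`` and ``_educate`` are hypothetical, the tag splitter is
a plain regular expression, and only the old-school dash shorthand and the
three-dot ellipsis are handled.

```python
import re

# Tags whose contents are left untouched (same list as in the text above).
SKIP_TAG = re.compile(r'<(/)?(pre|code|kbd|script|math)[^>]*>', re.I)


def _educate(text):
    # Hypothetical helper: old-school dashes (--- em, -- en) and ellipses only.
    text = text.replace('---', '&#8212;').replace('--', '&#8211;')
    return text.replace('...', '&#8230;')


def educate_outside_code(html):
    # Hypothetical helper: educate text runs unless a skip-tag is open.
    out = []
    depth = 0  # number of currently open skip-tags
    pos = 0
    for m in re.finditer(r'<[^>]*>', html):
        text = html[pos:m.start()]
        out.append(text if depth else _educate(text))
        tag = m.group(0)
        out.append(tag)  # tags themselves are never modified
        skip = SKIP_TAG.match(tag)
        if skip:
            # A closing skip-tag lowers the depth, an opening one raises it.
            depth = max(depth + (-1 if skip.group(1) else 1), 0)
        pos = m.end()
    tail = html[pos:]
    out.append(tail if depth else _educate(tail))
    return ''.join(out)


print(educate_outside_code('a -- b <code>x -- y</code> c...'))
```

Text inside ``<code>`` passes through unchanged, while the surrounding text is
educated.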

Backslash Escapes
=================

If you need to use literal straight quotes (or plain hyphens and
periods), SmartyPants accepts the following backslash escape sequences
to force non-smart punctuation. It does so by transforming the escape
sequence into a decimal-encoded HTML entity::

    Escape  Value  Character
    ------  -----  ---------
    \\      &#92;  \
    \"      &#34;  "
    \'      &#39;  '
    \.      &#46;  .
    \-      &#45;  -
    \`      &#96;  `

This is useful, for example, when you want to use straight quotes as
foot and inch marks: 6'2" tall; a 17" iMac.
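
As a rough sketch of the escape handling described above (a single-pass
variant written for illustration; ``process_escapes`` and ``ESCAPES`` are
hypothetical names, and the plugin itself may apply the substitutions
differently):

```python
import re

# Character following the backslash -> decimal entity.
ESCAPES = {
    '\\': '&#92;',
    '"': '&#34;',
    "'": '&#39;',
    '.': '&#46;',
    '-': '&#45;',
    '`': '&#96;',
}


def process_escapes(text):
    # Replace a backslash followed by one escapable character with its entity.
    return re.sub(r'\\([\\"\'.\-`])', lambda m: ESCAPES[m.group(1)], text)


# Foot and inch marks written with backslash escapes.
print(process_escapes(r'6\'2\" tall'))
```

Because the escaped characters become entities, the later quote-educating
passes no longer see them as quotes.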

Options
=======

For Pyblosxom users, the ``smartypants_attributes`` attribute is where you
specify configuration options.

Numeric values are the easiest way to configure SmartyPants' behavior:

"0"
    Suppress all transformations. (Do nothing.)

"1"
    Performs default SmartyPants transformations: quotes (including
    \`\`backticks'' -style), em-dashes, and ellipses. "``--``" (dash dash)
    is used to signify an em-dash. In this port, "``---``" (dash dash dash)
    is additionally turned into an en-dash.

"2"
    Same as "1", except that it uses the old-school typewriter
    shorthand for dashes: "``--``" (dash dash) for en-dashes, "``---``"
    (dash dash dash) for em-dashes.

"3"
    Same as "2", but inverts the shorthand for dashes:
    "``--``" (dash dash) for em-dashes, and "``---``" (dash dash dash) for
    en-dashes.

"-1"
    Stupefy mode. Reverses the SmartyPants transformation process, turning
    the HTML entities produced by SmartyPants into their ASCII equivalents.
    E.g. "&#8220;" is turned into a simple double-quote ("), "&#8212;" is
    turned into two dashes, etc.
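
Stupefy mode amounts to a fixed table of entity-to-ASCII replacements. A
minimal sketch, assuming plain string replacement is sufficient (``stupefy``
and ``STUPEFY_TABLE`` are hypothetical names used only for this example):

```python
# Entity -> ASCII equivalent, following the stupefy description above.
STUPEFY_TABLE = [
    ('&#8211;', '-'),    # en-dash
    ('&#8212;', '--'),   # em-dash
    ('&#8216;', "'"),    # opening single quote
    ('&#8217;', "'"),    # closing single quote
    ('&#8220;', '"'),    # opening double quote
    ('&#8221;', '"'),    # closing double quote
    ('&#8230;', '...'),  # ellipsis
]


def stupefy(text):
    # Apply every replacement in turn; the entities never overlap.
    for entity, ascii_equivalent in STUPEFY_TABLE:
        text = text.replace(entity, ascii_equivalent)
    return text


print(stupefy('&#8220;Hello &#8212; world.&#8221;'))
```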

The following single-character attribute values can be combined to toggle
individual transformations from within the ``smartypants_attributes``
attribute. For example, to educate normal quotes and em-dashes, but not
ellipses or \`\`backticks'' -style quotes:

``py['smartypants_attributes'] = "qd"``

"q"
    Educates normal quote characters: (") and (').

"b"
    Educates \`\`backticks'' -style double quotes.

"B"
    Educates \`\`backticks'' -style double quotes and \`single' quotes.

"d"
    Educates em-dashes.

"D"
    Educates em-dashes and en-dashes, using old-school typewriter shorthand:
    (dash dash) for en-dashes, (dash dash dash) for em-dashes.

"i"
    Educates em-dashes and en-dashes, using inverted old-school typewriter
    shorthand: (dash dash) for em-dashes, (dash dash dash) for en-dashes.

"e"
    Educates ellipses.

"w"
    Translates any instance of ``&quot;`` into a normal double-quote character.
    This should be of no interest to most people, but of particular interest
    to anyone who writes their posts using Dreamweaver, as Dreamweaver
    inexplicably uses this entity to represent a literal double-quote
    character. SmartyPants only educates normal quotes, not entities (because
    ordinarily, entities are used for the explicit purpose of representing the
    specific character they represent). The "w" option must be used in
    conjunction with one (or both) of the other quote options ("q" or "b").

Thus, if you wish to apply all SmartyPants transformations (quotes, en-
and em-dashes, and ellipses) and also translate ``&quot;`` entities into
regular quotes so SmartyPants can educate them, you should pass the
following to the ``smartypants_attributes`` attribute:

``py['smartypants_attributes'] = "qDew"``

The ``smartypants_forbidden_flavours`` list contains pyblosxom flavours for
which no Smarty Pants rendering will occur.
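
The way an attribute string maps onto individual transformations can be
sketched as below. This is an illustration of the option table only;
``parse_attributes`` and the flag names are hypothetical, and integer flag
values are an assumption made for readability.

```python
def parse_attributes(attr):
    # Hypothetical helper: turn an attribute string into a dict of flags.
    flags = {'quotes': 0, 'backticks': 0, 'dashes': 0,
             'ellipses': 0, 'stupefy': 0, 'convert_quot': 0}
    if attr == '0':
        return flags  # suppress all transformations
    if attr in ('1', '2', '3'):
        # Everything on; the number selects the dash shorthand.
        flags.update(quotes=1, backticks=1, dashes=int(attr), ellipses=1)
    elif attr == '-1':
        flags['stupefy'] = 1
    else:
        letters = {
            'q': ('quotes', 1),
            'b': ('backticks', 1),
            'B': ('backticks', 2),
            'd': ('dashes', 1),
            'D': ('dashes', 2),
            'i': ('dashes', 3),
            'e': ('ellipses', 1),
            'w': ('convert_quot', 1),
        }
        for c in attr:
            if c in letters:  # unknown letters are ignored
                name, value = letters[c]
                flags[name] = value
    return flags


print(parse_attributes('qDew'))
```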

Caveats
=======

Why You Might Not Want to Use Smart Quotes in Your Weblog
---------------------------------------------------------

For one thing, you might not care.

Most normal, mentally stable individuals do not take notice of proper
typographic punctuation. Many design and typography nerds, however, break
out in a nasty rash when they encounter, say, a restaurant sign that uses
a straight apostrophe to spell "Joe's".

If you're the sort of person who just doesn't care, you might well want to
continue not caring. Using straight quotes -- and sticking to the 7-bit
ASCII character set in general -- is certainly a simpler way to live.

Even if you *do* care about accurate typography, you still might want to
think twice before educating the quote characters in your weblog. One side
effect of publishing curly quote HTML entities is that it makes your
weblog a bit harder for others to quote from using copy-and-paste. What
happens is that when someone copies text from your blog, the copied text
contains the 8-bit curly quote characters (as well as the 8-bit characters
for em-dashes and ellipses, if you use these options). These characters
are not standard across different text encoding methods, which is why they
need to be encoded as HTML entities.

People copying text from your weblog, however, may not notice that you're
using curly quotes, and they'll go ahead and paste the unencoded 8-bit
characters copied from their browser into an email message or their own
weblog. When pasted as raw "smart quotes", these characters are likely to
get mangled beyond recognition.

That said, my own opinion is that any decent text editor or email client
makes it easy to stupefy smart quote characters into their 7-bit
equivalents, and I don't consider it my problem if you're using an
indecent text editor or email client.

Algorithmic Shortcomings
------------------------

One situation in which quotes will get curled the wrong way is when
apostrophes are used at the start of leading contractions. For example:

``'Twas the night before Christmas.``

In the case above, SmartyPants will turn the apostrophe into an opening
single-quote, when in fact it should be a closing one. I don't think
this problem can be solved in the general case -- every word processor
I've tried gets this wrong as well. In such cases, it's best to use the
proper HTML entity for closing single-quotes (``&#8217;``) by hand.
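
The shortcoming can be reproduced with a deliberately simplified rule. The
sketch below is an assumption made for illustration, not the plugin's actual
set of patterns: ``educate_single_quotes`` is a hypothetical function that
treats an apostrophe at the start of the text or after whitespace, followed by
a word character, as an opening quote.

```python
import re


def educate_single_quotes(text):
    # Simplified rule: start-of-text or whitespace, then ', then a word
    # character, means an opening single quote.
    text = re.sub(r"(^|\s)'(?=\w)", r'\1&#8216;', text)
    # Every remaining apostrophe is treated as a closing quote.
    return text.replace("'", '&#8217;')


# The leading contraction is curled the wrong way ...
print(educate_single_quotes("'Twas the night before Christmas."))
# ... while an entity written by hand is left alone.
print(educate_single_quotes("&#8217;Twas the night before Christmas."))
```

A rule based only on neighbouring characters cannot tell a leading
contraction from an opening quote, which is why the hand-written entity is
the suggested workaround.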

Bugs
====

To file bug reports or feature requests (other than topics listed in the
Caveats section above) please send email to: mailto:smartypantspy@chad.org

If the bug involves quotes being curled the wrong way, please send example
text to illustrate.

To Do list
----------

- Provide a function for use within templates to quote anything at all.

Version History
===============

1.5_1.6: Fri, 27 Jul 2007 07:06:40 -0400
    - Fixed bug where blocks of precious unalterable text were instead
      interpreted. Thanks to Le Roux and Dirk van Oosterbosch.

1.5_1.5: Sat, 13 Aug 2005 15:50:24 -0400
    - Fix bogus magical quotation when there is no hint that the
      user wants it, e.g., in "21st century". Thanks to Nathan Hamblen.
    - Be smarter about quotes before terminating numbers in an en-dash'ed
      range.

1.5_1.4: Thu, 10 Feb 2005 20:24:36 -0500
    - Fix a date-processing bug, as reported by jacob childress.
    - Begin a test-suite for ensuring correct output.
    - Removed import of "string", since I didn't really need it.
      (This was my first ever Python program. Sue me!)

1.5_1.3: Wed, 15 Sep 2004 18:25:58 -0400
    - Abort processing if the flavour is in forbidden-list. Default of
      [ "rss" ] (Idea of Wolfgang SCHNERRING.)
    - Remove stray virgules from en-dashes. Patch by Wolfgang SCHNERRING.

1.5_1.2: Mon, 24 May 2004 08:14:54 -0400
    - Some single quotes weren't replaced properly. Diff-tesuji played
      by Benjamin GEIGER.

1.5_1.1: Sun, 14 Mar 2004 14:38:28 -0500
    - Support upcoming pyblosxom 0.9 plugin verification feature.

1.5_1.0: Tue, 09 Mar 2004 08:08:35 -0500
    - Initial release

Version Information
-------------------

Version numbers will track the SmartyPants_ version numbers, with the addition
of an underscore and the smartypants.py version on the end.

New versions will be available at `http://wiki.chad.org/SmartyPantsPy`_

.. _http://wiki.chad.org/SmartyPantsPy: http://wiki.chad.org/SmartyPantsPy

Authors
=======

`John Gruber`_ did all of the hard work of writing this software in Perl for
`Movable Type`_ and almost all of this useful documentation. `Chad Miller`_
ported it to Python to use with Pyblosxom_.

Additional Credits
==================

Portions of the SmartyPants original work are based on Brad Choate's nifty
MTRegex plug-in. `Brad Choate`_ also contributed a few bits of source code to
this plug-in. Brad Choate is a fine hacker indeed.

`Jeremy Hedley`_ and `Charles Wiltgen`_ deserve mention for exemplary beta
testing of the original SmartyPants.

`Rael Dornfest`_ ported SmartyPants to Blosxom.

.. _Brad Choate: http://bradchoate.com/
.. _Jeremy Hedley: http://antipixel.com/
.. _Charles Wiltgen: http://playbacktime.com/
.. _Rael Dornfest: http://raelity.org/

Copyright and License
=====================

SmartyPants_ license::

    Copyright (c) 2003 John Gruber
    (http://daringfireball.net/)
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions are
    met:

    *   Redistributions of source code must retain the above copyright
        notice, this list of conditions and the following disclaimer.

    *   Redistributions in binary form must reproduce the above copyright
        notice, this list of conditions and the following disclaimer in
        the documentation and/or other materials provided with the
        distribution.

    *   Neither the name "SmartyPants" nor the names of its contributors
        may be used to endorse or promote products derived from this
        software without specific prior written permission.

    This software is provided by the copyright holders and contributors "as
    is" and any express or implied warranties, including, but not limited
    to, the implied warranties of merchantability and fitness for a
    particular purpose are disclaimed. In no event shall the copyright
    owner or contributors be liable for any direct, indirect, incidental,
    special, exemplary, or consequential damages (including, but not
    limited to, procurement of substitute goods or services; loss of use,
    data, or profits; or business interruption) however caused and on any
    theory of liability, whether in contract, strict liability, or tort
    (including negligence or otherwise) arising in any way out of the use
    of this software, even if advised of the possibility of such damage.

smartypants.py license::

    smartypants.py is a derivative work of SmartyPants.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions are
    met:

    *   Redistributions of source code must retain the above copyright
        notice, this list of conditions and the following disclaimer.

    *   Redistributions in binary form must reproduce the above copyright
        notice, this list of conditions and the following disclaimer in
        the documentation and/or other materials provided with the
        distribution.

    This software is provided by the copyright holders and contributors "as
    is" and any express or implied warranties, including, but not limited
    to, the implied warranties of merchantability and fitness for a
    particular purpose are disclaimed. In no event shall the copyright
    owner or contributors be liable for any direct, indirect, incidental,
    special, exemplary, or consequential damages (including, but not
    limited to, procurement of substitute goods or services; loss of use,
    data, or profits; or business interruption) however caused and on any
    theory of liability, whether in contract, strict liability, or tort
    (including negligence or otherwise) arising in any way out of the use
    of this software, even if advised of the possibility of such damage.

.. _John Gruber: http://daringfireball.net/
.. _Chad Miller: http://web.chad.org/

.. _Pyblosxom: http://roughingit.subtlehints.net/pyblosxom
.. _SmartyPants: http://daringfireball.net/projects/smartypants/
.. _Movable Type: http://www.movabletype.org/

"""

default_smartypants_attr = "1"

import re

tags_to_skip_regex = re.compile(r"<(/)?(pre|code|kbd|script|math)[^>]*>", re.I)


def verify_installation(request):
    # assert the plugin is functional
    return 1


def cb_story(args):
    global default_smartypants_attr

    try:
        forbidden_flavours = args["entry"]["smartypants_forbidden_flavours"]
    except KeyError:
        forbidden_flavours = [ "rss" ]

    try:
        attributes = args["entry"]["smartypants_attributes"]
    except KeyError:
        attributes = default_smartypants_attr

    if attributes is None:
        attributes = default_smartypants_attr

    entryData = args["entry"].getData()

    try:
        if args["request"]["flavour"] in forbidden_flavours:
            return
    except KeyError:
        if "&lt;" in args["entry"]["body"][0:15]:  # sniff the stream
            return  # abort if it looks like escaped HTML.  FIXME

    # FIXME: make these configurable, perhaps?
    args["entry"]["body"] = smartyPants(entryData, attributes)
    args["entry"]["title"] = smartyPants(args["entry"]["title"], attributes)


### internal functions below here

def smartyPants(text, attr=default_smartypants_attr):
    # Should we translate &quot; entities into normal quotes?
    # NOTE: this starts out as False, and the check further down compares it
    # against the string "0", so &quot; entities are converted in every mode.
    # test_skip_tags relies on that behaviour.
    convert_quot = False

    # Parse attributes:
    # 0 : do nothing
    # 1 : set all
    # 2 : set all, using old school en- and em- dash shortcuts
    # 3 : set all, using inverted old school en and em- dash shortcuts
    #
    # q : quotes
    # b : backtick quotes (``double'' only)
    # B : backtick quotes (``double'' and `single')
    # d : dashes
    # D : old school dashes
    # i : inverted old school dashes
    # e : ellipses
    # w : convert &quot; entities to " for Dreamweaver users

    skipped_tag_stack = []
    do_dashes = "0"
    do_backticks = "0"
    do_quotes = "0"
    do_ellipses = "0"
    do_stupefy = "0"

    if attr == "0":
        # Do nothing.
        return text
    elif attr == "1":
        # Do everything, turn all options on.
        do_quotes = "1"
        do_backticks = "1"
        do_dashes = "1"
        do_ellipses = "1"
    elif attr == "2":
        # Do everything, turn all options on, use old school dash shorthand.
        do_quotes = "1"
        do_backticks = "1"
        do_dashes = "2"
        do_ellipses = "1"
    elif attr == "3":
        # Do everything, turn all options on, use inverted old school dash shorthand.
        do_quotes = "1"
        do_backticks = "1"
        do_dashes = "3"
        do_ellipses = "1"
    elif attr == "-1":
        # Special "stupefy" mode.
        do_stupefy = "1"
    else:
        for c in attr:
            if c == "q": do_quotes = "1"
            elif c == "b": do_backticks = "1"
            elif c == "B": do_backticks = "2"
            elif c == "d": do_dashes = "1"
            elif c == "D": do_dashes = "2"
            elif c == "i": do_dashes = "3"
            elif c == "e": do_ellipses = "1"
            elif c == "w": convert_quot = "1"
            else:
                pass  # ignore unknown option

    tokens = _tokenize(text)
    result = []
    in_pre = False

    prev_token_last_char = ""
    # This is a cheat, used to get some context
    # for one-character tokens that consist of
    # just a quote char. What we do is remember
    # the last character of the previous text
    # token, to use as context to curl single-
    # character quote tokens correctly.

    for cur_token in tokens:
        if cur_token[0] == "tag":
            # Don't mess with quotes inside some tags.  This does not handle self <closing/> tags!
            result.append(cur_token[1])
            skip_match = tags_to_skip_regex.match(cur_token[1])
            if skip_match is not None:
                if not skip_match.group(1):
                    # Opening tag: remember it and stop educating text.
                    skipped_tag_stack.append(skip_match.group(2).lower())
                    in_pre = True
                else:
                    if len(skipped_tag_stack) > 0:
                        if skip_match.group(2).lower() == skipped_tag_stack[-1]:
                            skipped_tag_stack.pop()
                        else:
                            # This close doesn't match the open.  This isn't XHTML.  We should barf here.
                            pass
                    if len(skipped_tag_stack) == 0:
                        in_pre = False
        else:
            t = cur_token[1]
            last_char = t[-1:]  # Remember last char of this token before processing.
            if not in_pre:
                oldstr = t
                t = processEscapes(t)

                if convert_quot != "0":
                    t = re.sub('&quot;', '"', t)

                if do_dashes != "0":
                    if do_dashes == "1":
                        t = educateDashes(t)
                    if do_dashes == "2":
                        t = educateDashesOldSchool(t)
                    if do_dashes == "3":
                        t = educateDashesOldSchoolInverted(t)

                if do_ellipses != "0":
                    t = educateEllipses(t)

                # Note: backticks need to be processed before quotes.
                if do_backticks != "0":
                    t = educateBackticks(t)

                if do_backticks == "2":
                    t = educateSingleBackticks(t)

                if do_quotes != "0":
                    if t == "'":
                        # Special case: single-character ' token
                        if re.match(r"\S", prev_token_last_char):
                            t = "&#8217;"
                        else:
                            t = "&#8216;"
                    elif t == '"':
                        # Special case: single-character " token
                        if re.match(r"\S", prev_token_last_char):
                            t = "&#8221;"
                        else:
                            t = "&#8220;"
                    else:
                        # Normal case:
                        t = educateQuotes(t)

                if do_stupefy == "1":
                    t = stupefyEntities(t)

            prev_token_last_char = last_char
            result.append(t)

    return "".join(result)


def educateQuotes(str):
    """
    Parameter:  String.

    Returns:    The string, with "educated" curly quote HTML entities.

    Example input:  "Isn't this fun?"
    Example output: &#8220;Isn&#8217;t this fun?&#8221;
    """

    oldstr = str
    punct_class = r"""[!"#\$\%'()*+,-.\/:;<=>?\@\[\\\]\^_`{|}~]"""

    # Special case if the very first character is a quote
    # followed by punctuation at a non-word-break. Close the quotes by brute force:
    str = re.sub(r"""^'(?=%s\B)""" % (punct_class,), r"""&#8217;""", str)
    str = re.sub(r"""^"(?=%s\B)""" % (punct_class,), r"""&#8221;""", str)

    # Special case for double sets of quotes, e.g.:
    #   <p>He said, "'Quoted' words in a larger quote."</p>
    str = re.sub(r""""'(?=\w)""", """&#8220;&#8216;""", str)
    str = re.sub(r"""'"(?=\w)""", """&#8216;&#8220;""", str)

    # Special case for decade abbreviations (the '80s):
    str = re.sub(r"""\b'(?=\d{2}s)""", r"""&#8217;""", str)

    close_class = r"""[^\ \t\r\n\[\{\(\-]"""
    # The "#" is escaped because these alternatives are interpolated into
    # re.VERBOSE patterns, where a bare "#" would start a comment.
    dec_dashes = r"""&\#8211;|&\#8212;"""

    # Get most opening single quotes:
    opening_single_quotes_regex = re.compile(r"""
            (
                \s          |   # a whitespace char, or
                &nbsp;      |   # a non-breaking space entity, or
                --          |   # dashes, or
                &[mn]dash;  |   # named dash entities
                %s          |   # or decimal entities
                &\#x201[34];    # or hex
            )
            '                 # the quote
            (?=\w)            # followed by a word character
            """ % (dec_dashes,), re.VERBOSE)
    str = opening_single_quotes_regex.sub(r"""\1&#8216;""", str)

    closing_single_quotes_regex = re.compile(r"""
            (%s)
            '
            (?!\s | s\b | \d)
            """ % (close_class,), re.VERBOSE)
    str = closing_single_quotes_regex.sub(r"""\1&#8217;""", str)

    closing_single_quotes_regex = re.compile(r"""
            (%s)
            '
            (\s | s\b)
            """ % (close_class,), re.VERBOSE)
    str = closing_single_quotes_regex.sub(r"""\1&#8217;\2""", str)

    # Any remaining single quotes should be opening ones:
    str = re.sub(r"""'""", r"""&#8216;""", str)

    # Get most opening double quotes:
    opening_double_quotes_regex = re.compile(r"""
            (
                \s          |   # a whitespace char, or
                &nbsp;      |   # a non-breaking space entity, or
                --          |   # dashes, or
                &[mn]dash;  |   # named dash entities
                %s          |   # or decimal entities
                &\#x201[34];    # or hex
            )
            "                 # the quote
            (?=\w)            # followed by a word character
            """ % (dec_dashes,), re.VERBOSE)
    str = opening_double_quotes_regex.sub(r"""\1&#8220;""", str)

    # Double closing quotes:
    closing_double_quotes_regex = re.compile(r"""
            #(%s)?   # character that indicates the quote should be closing
            "
            (?=\s)
            """ % (close_class,), re.VERBOSE)
    str = closing_double_quotes_regex.sub(r"""&#8221;""", str)

    closing_double_quotes_regex = re.compile(r"""
            (%s)   # character that indicates the quote should be closing
            "
            """ % (close_class,), re.VERBOSE)
    str = closing_double_quotes_regex.sub(r"""\1&#8221;""", str)

    # Any remaining quotes should be opening ones.
    str = re.sub(r'"', r"""&#8220;""", str)

    return str


def educateBackticks(str):
    """
    Parameter:  String.
    Returns:    The string, with ``backticks'' -style double quotes
                translated into HTML curly quote entities.

    Example input:  ``Isn't this fun?''
    Example output: &#8220;Isn't this fun?&#8221;
    """

    str = re.sub(r"""``""", r"""&#8220;""", str)
    str = re.sub(r"""''""", r"""&#8221;""", str)
    return str


def educateSingleBackticks(str):
    """
    Parameter:  String.
    Returns:    The string, with `backticks' -style single quotes
                translated into HTML curly quote entities.

    Example input:  `Isn't this fun?'
    Example output: &#8216;Isn&#8217;t this fun?&#8217;
    """

    str = re.sub(r"""`""", r"""&#8216;""", str)
    str = re.sub(r"""'""", r"""&#8217;""", str)
    return str


def educateDashes(str):
    """
    Parameter:  String.

    Returns:    The string, with each instance of "--" translated to
                an em-dash HTML entity.
    """

    str = re.sub(r"""---""", r"""&#8211;""", str)  # en  (yes, backwards)
    str = re.sub(r"""--""", r"""&#8212;""", str)  # em (yes, backwards)
    return str


def educateDashesOldSchool(str):
    """
    Parameter:  String.

    Returns:    The string, with each instance of "--" translated to
                an en-dash HTML entity, and each "---" translated to
                an em-dash HTML entity.
    """

    str = re.sub(r"""---""", r"""&#8212;""", str)  # em (yes, backwards)
    str = re.sub(r"""--""", r"""&#8211;""", str)  # en (yes, backwards)
    return str


def educateDashesOldSchoolInverted(str):
    """
    Parameter:  String.

    Returns:    The string, with each instance of "--" translated to
                an em-dash HTML entity, and each "---" translated to
                an en-dash HTML entity. Two reasons why: First, unlike the
                en- and em-dash syntax supported by
                EducateDashesOldSchool(), it's compatible with existing
                entries written before SmartyPants 1.1, back when "--" was
                only used for em-dashes.  Second, em-dashes are more
                common than en-dashes, and so it sort of makes sense that
                the shortcut should be shorter to type. (Thanks to Aaron
                Swartz for the idea.)
    """
    str = re.sub(r"""---""", r"""&#8211;""", str)  # en
    str = re.sub(r"""--""", r"""&#8212;""", str)  # em
    return str


def educateEllipses(str):
    """
    Parameter:  String.
    Returns:    The string, with each instance of "..." translated to
                an ellipsis HTML entity.

    Example input:  Huh...?
    Example output: Huh&#8230;?
    """

    str = re.sub(r"""\.\.\.""", r"""&#8230;""", str)
    str = re.sub(r"""\. \. \.""", r"""&#8230;""", str)
    return str


def stupefyEntities(str):
    """
    Parameter:  String.
    Returns:    The string, with each SmartyPants HTML entity translated to
                its ASCII counterpart.

    Example input:  &#8220;Hello &#8212; world.&#8221;
    Example output: "Hello -- world."
    """

    str = re.sub(r"""&#8211;""", r"""-""", str)  # en-dash
    str = re.sub(r"""&#8212;""", r"""--""", str)  # em-dash

    str = re.sub(r"""&#8216;""", r"""'""", str)  # open single quote
    str = re.sub(r"""&#8217;""", r"""'""", str)  # close single quote

    str = re.sub(r"""&#8220;""", r'''"''', str)  # open double quote
    str = re.sub(r"""&#8221;""", r'''"''', str)  # close double quote

    str = re.sub(r"""&#8230;""", r"""...""", str)  # ellipsis

    return str


def processEscapes(str):
    r"""
    Parameter:  String.
    Returns:    The string, after processing the following backslash
                escape sequences. This is useful if you want to force a "dumb"
                quote or other character to appear.

                Escape  Value
                ------  -----
                \\      &#92;
                \"      &#34;
                \'      &#39;
                \.      &#46;
                \-      &#45;
                \`      &#96;
    """
    str = re.sub(r"""\\\\""", r"""&#92;""", str)
    str = re.sub(r'''\\"''', r"""&#34;""", str)
    str = re.sub(r"""\\'""", r"""&#39;""", str)
    str = re.sub(r"""\\\.""", r"""&#46;""", str)
    str = re.sub(r"""\\-""", r"""&#45;""", str)
    str = re.sub(r"""\\`""", r"""&#96;""", str)

    return str


def _tokenize(str):
    """
    Parameter:  String containing HTML markup.
    Returns:    Reference to an array of the tokens comprising the input
                string. Each token is either a tag (possibly with nested
                tags contained therein, such as <a href="<MTFoo>">), or a
                run of text between tags. Each element of the array is a
                two-element array; the first is either 'tag' or 'text';
                the second is the actual value.

    Based on the _tokenize() subroutine from Brad Choate's MTRegex plugin.
        <http://www.bradchoate.com/past/mtregex.php>
    """

    pos = 0
    length = len(str)
    tokens = []

    depth = 6
    nested_tags = "|".join(['(?:<(?:[^<>]',] * depth) + (')*>)' * depth)
    #match = r"""(?: <! ( -- .*? -- \s* )+ > ) |  # comments
    #        (?: <\? .*? \?> ) |  # directives
    #        %s  # nested tags       """ % (nested_tags,)
    # Each match is an optional run of text followed by one tag.
    tag_soup = re.compile(r"""([^<]*)(<[^>]*>)""")

    token_match = tag_soup.search(str)

    previous_end = 0
    while token_match is not None:
        if token_match.group(1):
            tokens.append(['text', token_match.group(1)])

        tokens.append(['tag', token_match.group(2)])

        previous_end = token_match.end()
        token_match = tag_soup.search(str, token_match.end())

    # Whatever follows the last tag is a final run of text.
    if previous_end < len(str):
        tokens.append(['text', str[previous_end:]])

    return tokens


if __name__ == "__main__":

    import locale

    try:
        locale.setlocale(locale.LC_ALL, '')
    except locale.Error:
        pass

    from docutils.core import publish_string
    docstring_html = publish_string(__doc__, writer_name='html')

    print(docstring_html)

    # Unit test output goes out stderr.  No worries.
    import unittest
    sp = smartyPants

    class TestSmartypantsAllAttributes(unittest.TestCase):
        # the default attribute is "1", which means "all".

        def test_dates(self):
            self.assertEqual(sp("1440-80's"), "1440-80&#8217;s")
            self.assertEqual(sp("1440-'80s"), "1440-&#8216;80s")
            self.assertEqual(sp("1440---'80s"), "1440&#8211;&#8216;80s")
            self.assertEqual(sp("1960s"), "1960s")  # no effect.
            self.assertEqual(sp("1960's"), "1960&#8217;s")
            self.assertEqual(sp("one two '60s"), "one two &#8216;60s")
            self.assertEqual(sp("'60s"), "&#8216;60s")

        def test_skip_tags(self):
            self.assertEqual(
                sp("""<script type="text/javascript">\n<!--\nvar href = "http://www.google.com";\nvar linktext = "google";\ndocument.write('<a href="' + href + '">' + linktext + "</a>");\n//-->\n</script>"""),
                   """<script type="text/javascript">\n<!--\nvar href = "http://www.google.com";\nvar linktext = "google";\ndocument.write('<a href="' + href + '">' + linktext + "</a>");\n//-->\n</script>""")
            self.assertEqual(
                sp("""<p>He said &quot;Let's write some code.&quot; This code here <code>if True:\n\tprint &quot;Okay&quot;</code> is python code.</p>"""),
                   """<p>He said &#8220;Let&#8217;s write some code.&#8221; This code here <code>if True:\n\tprint &quot;Okay&quot;</code> is python code.</p>""")

        def test_ordinal_numbers(self):
            self.assertEqual(sp("21st century"), "21st century")  # no effect.
            self.assertEqual(sp("3rd"), "3rd")  # no effect.

        def test_educated_quotes(self):
            self.assertEqual(sp('''"Isn't this fun?"'''), '''&#8220;Isn&#8217;t this fun?&#8221;''')

    unittest.main()


__author__ = "Chad Miller <smartypantspy@chad.org>"
__version__ = "1.5_1.6: Fri, 27 Jul 2007 07:06:40 -0400"
__url__ = "http://wiki.chad.org/SmartyPantsPy"
__description__ = "Smart-quotes, smart-ellipses, and smart-dashes for weblog entries in pyblosxom"