/tools/Ruby/lib/ruby/1.8/scanf.rb

http://github.com/agross/netopenspace · Ruby

  1. # scanf for Ruby
  2. #
  3. # $Revision: 21682 $
  4. # $Id: scanf.rb 21682 2009-01-20 03:23:46Z shyouhei $
  5. # $Author: shyouhei $
  6. # $Date: 2009-01-20 12:23:46 +0900 (Tue, 20 Jan 2009) $
  7. #
  8. # A product of the Austin Ruby Codefest (Austin, Texas, August 2002)
  9. =begin
  10. =scanf for Ruby
  11. ==Description
  12. scanf for Ruby is an implementation of the C function scanf(3),
  13. modified as necessary for Ruby compatibility.
  14. The methods provided are String#scanf, IO#scanf, and
  15. Kernel#scanf. Kernel#scanf is a wrapper around STDIN.scanf. IO#scanf
  16. can be used on any IO stream, including file handles and sockets.
  17. scanf can be called either with or without a block.
  18. scanf for Ruby scans an input string or stream according to a
  19. <b>format</b>, as described below ("Conversions"), and returns an
  20. array of matches between the format and the input. The format is
  21. defined in a string, and is similar (though not identical) to the
  22. formats used in Kernel#printf and Kernel#sprintf.
  23. The format may contain <b>conversion specifiers</b>, which tell scanf
  24. what form (type) each particular matched substring should be converted
  25. to (e.g., decimal integer, floating point number, literal string,
  26. etc.). The matches and conversions take place from left to right, and
  27. the conversions themselves are returned as an array.
  28. The format string may also contain characters other than those in the
  29. conversion specifiers. White space (blanks, tabs, or newlines) in the
  30. format string matches any amount of white space, including none, in
  31. the input. Everything else matches only itself.
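As a rough illustration of "everything else matches only itself": the FormatSpecifier class later in this file wraps literal text in Regexp.escape before matching it. The sketch below uses plain Ruby rather than calling scanf, and its variable names are illustrative only.

```ruby
# Literal text from a format string is escaped, so regex metacharacters
# such as "(" and ")" match only themselves.
literal  = "(c)"
fragment = Regexp.new('\A(' + Regexp.escape(literal) + ')')

hit  = fragment.match("(c) 2002")   # MatchData; hit[1] is "(c)"
miss = fragment.match("c 2002")     # nil, so scanning would stop here
```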
  32. Scanning stops, and scanf returns, when any input character fails to
  33. match the specifications in the format string, or when input is
  34. exhausted, or when everything in the format string has been
  35. matched. All matches found up to the stopping point are returned in
  36. the return array (or yielded to the block, if a block was given).
  37. ==Basic usage
  38. require 'scanf.rb'
  39. # String#scanf and IO#scanf take a single argument (a format string)
  40. array = aString.scanf("%d%s")
  41. array = anIO.scanf("%d%s")
  42. # Kernel#scanf reads from STDIN
  43. array = scanf("%d%s")
  44. ==Block usage
  45. When called with a block, scanf keeps scanning the input, cycling back
  46. to the beginning of the format string, and yields a new array of
  47. conversions to the block every time the format string is matched
  48. (including partial matches, but not including complete failures). The
  49. actual return value of scanf when called with a block is an array
  50. containing the results of all the executions of the block.
  51. str = "123 abc 456 def 789 ghi"
  52. str.scanf("%d%s") { |num,word| [ num * 2, word.upcase ] }
  53. # => [[246, "ABC"], [912, "DEF"], [1578, "GHI"]]
  54. ==Conversions
  55. The single argument to scanf is a format string, which generally
  56. includes one or more conversion specifiers. Conversion specifiers
  57. begin with the percent character ('%') and include information about
  58. what scanf should next scan for (string, decimal number, single
  59. character, etc.).
  60. There may be an optional maximum field width, expressed as a decimal
  61. integer, between the % and the conversion. If no width is given, a
  62. default of `infinity' is used (with the exception of the %c specifier;
  63. see below). Otherwise, given a field width of <em>n</em> for a given
  64. conversion, at most <em>n</em> characters are scanned in processing
  65. that conversion. Before conversion begins, most conversions skip
  66. white space in the input string; this white space is not counted
  67. against the field width.
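A minimal sketch of how a field width limits a conversion, assuming the regular expression fragment that the "%5d, %5u" branch of FormatSpecifier#initialize (later in this file) would build for a width of 3. It does not call scanf; the variable names are illustrative.

```ruby
# Fragment for a width-3 decimal conversion, built the same way as the
# "%5d, %5u" branch: a leading sign uses up one character of the width.
width = 3
re = Regexp.new("\\A([-+]\\d{1,#{width - 1}}|\\d{1,#{width}})")

# Leading white space is stripped first and is not counted.
unsigned = re.match("   12345".sub(/\A\s+/, ''))
signed   = re.match("-12345")

unsigned_value = unsigned[1].to_i      # 123
unsigned_rest  = unsigned.post_match   # "45"
signed_value   = signed[1].to_i        # -12
signed_rest    = signed.post_match     # "345"
```

The unconsumed remainder ("45" or "345") is what the next conversion in the format would see.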
  68. The following conversions are available. (See the files EXAMPLES
  69. and <tt>tests/scanftests.rb</tt> for examples.)
  70. [%]
  71. Matches a literal `%'. That is, `%%' in the format string matches a
  72. single input `%' character. No conversion is done, and the resulting
  73. '%' is not included in the return array.
  74. [d]
  75. Matches an optionally signed decimal integer.
  76. [u]
  77. Same as d.
  78. [i]
  79. Matches an optionally signed integer. The integer is read in base
  80. 16 if it begins with `0x' or `0X', in base 8 if it begins with `0',
  81. and in base 10 otherwise. Only characters that correspond to the
  82. base are recognized.
  83. [o]
  84. Matches an optionally signed octal integer.
  85. [x,X]
  86. Matches an optionally signed hexadecimal integer.
  87. [f,g,e,E]
  88. Matches an optionally signed floating-point number.
  89. [s]
  90. Matches a sequence of non-white-space characters. The input string stops at
  91. white space or at the maximum field width, whichever occurs first.
  92. [c]
  93. Matches a single character, or a sequence of <em>n</em> characters if a
  94. field width of <em>n</em> is specified. The usual skip of leading white
  95. space is suppressed. To skip white space first, use an explicit space in
  96. the format.
  97. [<tt>[</tt>]
  98. Matches a nonempty sequence of characters from the specified set
  99. of accepted characters. The usual skip of leading white space is
  100. suppressed. This bracketed sub-expression is interpreted exactly like a
  101. character class in a Ruby regular expression. (In fact, it is placed as-is
  102. in a regular expression.) The matching against the input string ends with
  103. the appearance of a character not in (or, with a circumflex, in) the set,
  104. or when the field width runs out, whichever comes first.
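The numeric conversions end in ordinary core-Ruby calls: the private extract_* handlers later in this file delegate to String#to_i, String#hex, String#oct, Kernel#Integer and String#to_f. The sketch below applies those methods directly to already-matched text, without going through scanf.

```ruby
decimal = "-42".to_i        # %d, %u          -> -42
hex_a   = "ff".hex          # %x              -> 255
hex_b   = "0x1F".hex        # %x with prefix  -> 31
octal   = "17".oct          # %o              -> 15
auto16  = Integer("0x1f")   # %i, hexadecimal -> 31
auto8   = Integer("017")    # %i, octal       -> 15
auto10  = Integer("17")     # %i, decimal     -> 17
float   = "-3.5e2".to_f     # %f              -> -350.0
```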
  105. ===Assignment suppression
  106. To require that a particular match occur, but without including the result
  107. in the return array, place the <b>assignment suppression flag</b>, which is
  108. the star character ('*'), immediately after the leading '%' of a format
  109. specifier (just before the field width, if any).
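A small sketch of how suppression can work, modeled on the skip check and the final compact call in the code later in this file: a suppressed specifier still has to match, but its handler yields nil, and nils are dropped from the result. The convert helper is hypothetical and exists only for this illustration.

```ruby
# Hypothetical helper mirroring the skip check: "%*d" converts to nil.
def convert(spec, matched)
  suppressed = /^\s*%\*/.match(spec)
  suppressed ? nil : matched.to_i
end

accum  = [convert("%*d", "10"), convert("%d", "20")]   # [nil, 20]
result = accum.compact                                  # [20]
```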
  110. ==Examples
  111. See the files <tt>EXAMPLES</tt> and <tt>tests/scanftests.rb</tt>.
  112. ==scanf for Ruby compared with scanf in C
  113. scanf for Ruby is based on the C function scanf(3), but with modifications,
  114. dictated mainly by the underlying differences between the languages.
  115. ===Unimplemented flags and specifiers
  116. * The only flag implemented in scanf for Ruby is '<tt>*</tt>' (ignore
  117. upcoming conversion). Many of the flags available in C versions of scanf(3)
  118. have to do with the type of upcoming pointer arguments, and are literally
  119. meaningless in Ruby.
  120. * The <tt>n</tt> specifier (store number of characters consumed so far in
  121. next pointer) is not implemented.
  122. * The <tt>p</tt> specifier (match a pointer value) is not implemented.
  123. ===Altered specifiers
  124. [o,u,x,X]
  125. In scanf for Ruby, all of these specifiers scan for an optionally signed
  126. integer, rather than for an unsigned integer like their C counterparts.
  127. ===Return values
  128. scanf for Ruby returns an array of successful conversions, whereas
  129. scanf(3) returns the number of conversions successfully
  130. completed. (See below for more details on scanf for Ruby's return
  131. values.)
  132. ==Return values
  133. Without a block, scanf returns an array containing all the conversions
  134. it has found. If none are found, scanf will return an empty array. An
  135. unsuccessful match is never ignored, but rather always signals the end
  136. of the scanning operation. If the first unsuccessful match takes place
  137. after one or more successful matches have already taken place, the
  138. returned array will contain the results of those successful matches.
  139. With a block, scanf returns a 'map'-like array of transformations from
  140. the block -- that is, an array reflecting what the block did with each
  141. yielded result from the iterative scanf operation. (See "Block
  142. usage", above.)
  143. ==Test suite
  144. scanf for Ruby includes a suite of unit tests (requiring the
  145. <tt>TestUnit</tt> package), which can be run with the command <tt>ruby
  146. tests/scanftests.rb</tt> or the command <tt>make test</tt>.
  147. ==Current limitations and bugs
  148. When using IO#scanf under Windows, make sure you open your files in
  149. binary mode:
  150. File.open("filename", "rb")
  151. so that scanf can keep track of characters correctly.
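The likely reason is that IO#scanf repositions the stream with pos and seek (see the IO class later in this file), so byte offsets must not shift under newline translation. The sketch below only shows that "rb" preserves "\r\n" exactly; the file name is hypothetical and scanf itself is not called.

```ruby
require 'tmpdir'

data = Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.txt")     # hypothetical file name
  File.open(path, "wb") { |f| f.write("12 abc\r\n34 def\r\n") }
  # "rb" disables newline translation, so byte offsets stay exact.
  File.open(path, "rb") { |f| f.read }
end

byte_count = data.bytesize   # 16: both "\r\n" pairs are preserved
```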
  152. Support for character classes is reasonably complete (since it
  153. essentially piggy-backs on Ruby's regular expression handling of
  154. character classes), but users are advised that character class testing
  155. has not been exhaustive, and that they should exercise some caution
  156. in using any of the more complex and/or arcane character class
  157. idioms.
  158. ==Technical notes
  159. ===Rationale behind scanf for Ruby
  160. The impetus for a scanf implementation in Ruby comes chiefly from the fact
  161. that existing pattern matching operations, such as Regexp#match and
  162. String#scan, return all results as strings, which have to be converted to
  163. integers or floats explicitly in cases where what's ultimately wanted are
  164. integer or float values.
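For instance, with core Ruby alone every captured value arrives as a String and has to be converted by hand:

```ruby
line = "3 apples 4.5 kg"

tokens = line.scan(/\S+/)           # ["3", "apples", "4.5", "kg"]
m      = /(\d+) (\w+)/.match(line)  # m[1] is the String "3"

count  = m[1].to_i                  # explicit conversion: 3
weight = tokens[2].to_f             # explicit conversion: 4.5
```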
  165. ===Design of scanf for Ruby
  166. scanf for Ruby is essentially a <format string>-to-<regular
  167. expression> converter.
  168. When scanf is called, a FormatString object is generated from the
  169. format string ("%d%s...") argument. The FormatString object breaks the
  170. format string down into atoms ("%d", "%5f", "blah", etc.), and from
  171. each atom it creates a FormatSpecifier object, which it
  172. saves.
  173. Each FormatSpecifier has a regular expression fragment and a "handler"
  174. associated with it. For example, the regular expression fragment
  175. associated with the format "%d" is "([-+]?\d+)", and the handler
  176. associated with it is a wrapper around String#to_i. scanf itself calls
  177. FormatString#match, passing in the input string. FormatString#match
  178. iterates through its FormatSpecifiers; for each one, it matches the
  179. corresponding regular expression fragment against the string. If
  180. there's a match, it sends the matched string to the handler associated
  181. with the FormatSpecifier.
  182. Thus, to follow up the "%d" example: if "123" occurs in the input
  183. string when a FormatSpecifier consisting of "%d" is reached, the "123"
  184. will be matched against "([-+]?\d+)", and the matched string will be
  185. rendered into an integer by a call to to_i.
  186. The rendered match is then saved to an accumulator array, and the
  187. input string is reduced to the post-match substring. Thus the string
  188. is "eaten" from the left as the FormatSpecifiers are applied in
  189. sequence. (This is done to a duplicate string; the original string is
  190. not altered.)
  191. As soon as a regular expression fragment fails to match the string, or
  192. when the FormatString object runs out of FormatSpecifiers, scanning
  193. stops and results accumulated so far are returned in an array.
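The pipeline described above can be sketched in miniature. This is a simplified, hypothetical illustration (MINI_SPECS and mini_scanf are not part of the library) that supports only %d and %s, but it follows the same steps: one regular expression fragment and one handler per atom, matching at the start of a copy of the input, and stopping at the first failure.

```ruby
# Hypothetical miniature of the FormatString/FormatSpecifier pipeline.
MINI_SPECS = {
  "%d" => ['([-+]?\d+)', lambda { |s| s.to_i }],
  "%s" => ['(\S+)',      lambda { |s| s }]
}

def mini_scanf(input, format)
  rest  = input.dup                     # work on a copy of the input
  accum = []
  format.scan(/%[ds]/).each do |atom|
    fragment, handler = MINI_SPECS[atom]
    rest = rest.sub(/\A\s+/, '')        # skip leading white space
    m = Regexp.new('\A' + fragment).match(rest)
    break unless m                      # first failure ends the scan
    accum << handler.call(m[1])         # convert and accumulate
    rest = m.post_match                 # "eat" the string from the left
  end
  accum
end

full    = mini_scanf("123 abc", "%d%s")     # [123, "abc"]
none    = mini_scanf("abc 123", "%d%s")     # []
partial = mini_scanf("12 34 x", "%d%d%d")   # [12, 34]
```

The last call shows the partial-result behavior: the third %d fails on "x", and the two earlier conversions are still returned.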
  194. ==License and copyright
  195. Copyright:: (c) 2002-2003 David Alan Black
  196. License:: Distributed on the same licensing terms as Ruby itself
  197. ==Warranty disclaimer
  198. This software is provided "as is" and without any express or implied
  199. warranties, including, without limitation, the implied warranties of
  200. merchantability and fitness for a particular purpose.
  201. ==Credits and acknowledgements
  202. scanf for Ruby was developed as the major activity of the Austin
  203. Ruby Codefest (Austin, Texas, August 2002).
  204. Principal author:: David Alan Black (mailto:dblack@superlink.net)
  205. Co-author:: Hal Fulton (mailto:hal9000@hypermetrics.com)
  206. Project contributors:: Nolan Darilek, Jason Johnston
  207. Thanks to Hal Fulton for hosting the Codefest.
  208. Thanks to Matz for suggestions about the class design.
  209. Thanks to Gavin Sinclair for some feedback on the documentation.
  210. The text for parts of this document, especially the Description and
  211. Conversions sections, above, was adapted from the Linux Programmer's
  212. Manual manpage for scanf(3), dated 1995-11-01.
  213. ==Bugs and bug reports
  214. scanf for Ruby is based on something of an amalgam of C scanf
  215. implementations and documentation, rather than on a single canonical
  216. description. Suggestions for features and behaviors which appear in
  217. other scanfs, and would be meaningful in Ruby, are welcome, as are
  218. reports of suspicious behaviors and/or bugs. (Please see "Credits and
  219. acknowledgements", above, for email addresses.)
  220. =end
  221. module Scanf
  222. class FormatSpecifier
  223. attr_reader :re_string, :matched_string, :conversion, :matched
  224. private
  225. def skip; /^\s*%\*/.match(@spec_string); end
  226. def extract_float(s); s.to_f if s &&! skip; end
  227. def extract_decimal(s); s.to_i if s &&! skip; end
  228. def extract_hex(s); s.hex if s &&! skip; end
  229. def extract_octal(s); s.oct if s &&! skip; end
  230. def extract_integer(s); Integer(s) if s &&! skip; end
  231. def extract_plain(s); s unless skip; end
  232. def nil_proc(s); nil; end
  233. public
  234. def to_s
  235. @spec_string
  236. end
  237. def count_space?
  238. /(?:\A|\S)%\*?\d*c|\[/.match(@spec_string)
  239. end
  240. def initialize(str)
  241. @spec_string = str
  242. h = '[A-Fa-f0-9]'
  243. @re_string, @handler =
  244. case @spec_string
  245. # %[[:...:]]
  246. when /%\*?(\[\[:[a-z]+:\]\])/
  247. [ "(#{$1}+)", :extract_plain ]
  248. # %5[[:...:]]
  249. when /%\*?(\d+)(\[\[:[a-z]+:\]\])/
  250. [ "(#{$2}{1,#{$1}})", :extract_plain ]
  251. # %[...]
  252. when /%\*?\[([^\]]*)\]/
  253. yes = $1
  254. if /^\^/.match(yes) then no = yes[1..-1] else no = '^' + yes end
  255. [ "([#{yes}]+)(?=[#{no}]|\\z)", :extract_plain ]
  256. # %5[...]
  257. when /%\*?(\d+)\[([^\]]*)\]/
  258. yes = $2
  259. w = $1
  260. [ "([#{yes}]{1,#{w}})", :extract_plain ]
  261. # %i
  262. when /%\*?i/
  263. [ "([-+]?(?:(?:0[0-7]+)|(?:0[Xx]#{h}+)|(?:[1-9]\\d*)))", :extract_integer ]
  264. # %5i
  265. when /%\*?(\d+)i/
  266. n = $1.to_i
  267. s = "("
  268. if n > 1 then s += "[1-9]\\d{1,#{n-1}}|" end
  269. if n > 1 then s += "0[0-7]{1,#{n-1}}|" end
  270. if n > 2 then s += "[-+]0[0-7]{1,#{n-2}}|" end
  271. if n > 2 then s += "[-+][1-9]\\d{1,#{n-2}}|" end
  272. if n > 2 then s += "0[Xx]#{h}{1,#{n-2}}|" end
  273. if n > 3 then s += "[-+]0[Xx]#{h}{1,#{n-3}}|" end
  274. s += "\\d"
  275. s += ")"
  276. [ s, :extract_integer ]
  277. # %d, %u
  278. when /%\*?[du]/
  279. [ '([-+]?\d+)', :extract_decimal ]
  280. # %5d, %5u
  281. when /%\*?(\d+)[du]/
  282. n = $1.to_i
  283. s = "("
  284. if n > 1 then s += "[-+]\\d{1,#{n-1}}|" end
  285. s += "\\d{1,#{$1}})"
  286. [ s, :extract_decimal ]
  287. # %x
  288. when /%\*?[Xx]/
  289. [ "([-+]?(?:0[Xx])?#{h}+)", :extract_hex ]
  290. # %5x
  291. when /%\*?(\d+)[Xx]/
  292. n = $1.to_i
  293. s = "("
  294. if n > 3 then s += "[-+]0[Xx]#{h}{1,#{n-3}}|" end
  295. if n > 2 then s += "0[Xx]#{h}{1,#{n-2}}|" end
  296. if n > 1 then s += "[-+]#{h}{1,#{n-1}}|" end
  297. s += "#{h}{1,#{n}}"
  298. s += ")"
  299. [ s, :extract_hex ]
  300. # %o
  301. when /%\*?o/
  302. [ '([-+]?[0-7]+)', :extract_octal ]
  303. # %5o
  304. when /%\*?(\d+)o/
  305. [ "([-+][0-7]{1,#{$1.to_i-1}}|[0-7]{1,#{$1}})", :extract_octal ]
  306. # %f, %g, %e, %E (all documented as floating-point conversions)
  307. when /%\*?[fgeE]/
  308. [ '([-+]?((\d+(?>(?=[^\d.]|$)))|(\d*(\.(\d*([eE][-+]?\d+)?)))))', :extract_float ]
  309. # %5f, %5g, %5e, %5E
  310. when /%\*?(\d+)[fgeE]/
  311. [ "(\\S{1,#{$1}})", :extract_float ]
  312. # %5s
  313. when /%\*?(\d+)s/
  314. [ "(\\S{1,#{$1}})", :extract_plain ]
  315. # %s
  316. when /%\*?s/
  317. [ '(\S+)', :extract_plain ]
  318. # %c preceded by white space in the format
  319. when /\s%\*?c/
  320. [ "\\s*(.)", :extract_plain ]
  321. # %c
  322. when /%\*?c/
  323. [ "(.)", :extract_plain ]
  324. # %5c (whitespace issues are handled by count_space?)
  325. when /%\*?(\d+)c/
  326. [ "(.{1,#{$1}})", :extract_plain ]
  327. # %%
  328. when /%%/
  329. [ '(\s*%)', :nil_proc ]
  330. # literal characters
  331. else
  332. [ "(#{Regexp.escape(@spec_string)})", :nil_proc ]
  333. end
  334. @re_string = '\A' + @re_string
  335. end
  336. def to_re
  337. Regexp.new(@re_string,Regexp::MULTILINE)
  338. end
  339. def match(str)
  340. @matched = false
  341. s = str.dup
  342. s.sub!(/\A\s+/,'') unless count_space?
  343. res = to_re.match(s)
  344. if res
  345. @conversion = send(@handler, res[1])
  346. @matched_string = @conversion.to_s
  347. @matched = true
  348. end
  349. res
  350. end
  351. def letter
  352. /%\*?\d*([a-z\[])/.match(@spec_string).to_a[1]
  353. end
  354. def width
  355. w = /%\*?(\d+)/.match(@spec_string).to_a[1]
  356. w && w.to_i
  357. end
  358. def mid_match?
  359. return false unless @matched
  360. cc_no_width = letter == '[' &&! width
  361. c_or_cc_width = (letter == 'c' || letter == '[') && width
  362. width_left = c_or_cc_width && (matched_string.size < width)
  363. return width_left || cc_no_width
  364. end
  365. end
  366. class FormatString
  367. attr_reader :string_left, :last_spec_tried,
  368. :last_match_tried, :matched_count, :space
  369. SPECIFIERS = 'diuXxofeEgsc'
  370. REGEX = /
  371. # possible space, followed by...
  372. (?:\s*
  373. # percent sign, followed by...
  374. %
  375. # another percent sign, or...
  376. (?:%|
  377. # optional assignment suppression flag
  378. \*?
  379. # optional maximum field width
  380. \d*
  381. # named character class, ...
  382. (?:\[\[:\w+:\]\]|
  383. # traditional character class, or...
  384. \[[^\]]*\]|
  385. # specifier letter.
  386. [#{SPECIFIERS}])))|
  387. # or miscellaneous characters
  388. [^%\s]+/ix
  389. def initialize(str)
  390. @specs = []
  391. @i = 1
  392. s = str.to_s
  393. return unless /\S/.match(s)
  394. @space = true if /\s\z/.match(s)
  395. @specs.replace s.scan(REGEX).map {|spec| FormatSpecifier.new(spec) }
  396. end
  397. def to_s
  398. @specs.join('')
  399. end
  400. def prune(n=matched_count)
  401. n.times { @specs.shift }
  402. end
  403. def spec_count
  404. @specs.size
  405. end
  406. def last_spec
  407. @i == spec_count - 1
  408. end
  409. def match(str)
  410. accum = []
  411. @string_left = str
  412. @matched_count = 0
  413. @specs.each_with_index do |spec,@i|
  414. @last_spec_tried = spec
  415. @last_match_tried = spec.match(@string_left)
  416. break unless @last_match_tried
  417. @matched_count += 1
  418. accum << spec.conversion
  419. @string_left = @last_match_tried.post_match
  420. break if @string_left.empty?
  421. end
  422. return accum.compact
  423. end
  424. end
  425. end
  426. class IO
  427. # The trick here is doing a match where you grab one *line*
  428. # of input at a time. The linebreak may or may not occur
  429. # at the boundary where the string matches a format specifier.
  430. # And if it does, some rule about whitespace may or may not
  431. # be in effect...
  432. #
  433. # That's why this is much more elaborate than the string
  434. # version.
  435. #
  436. # For each line:
  437. # Match succeeds (non-emptily)
  438. # and the last attempted spec/string sub-match succeeded:
  439. #
  440. # could the last spec keep matching?
  441. # yes: save interim results and continue (next line)
  442. #
  443. # The last attempted spec/string did not match:
  444. #
  445. # are we on the next-to-last spec in the string?
  446. # yes:
  447. # is fmt_string.string_left all spaces?
  448. # yes: does current spec care about input space?
  449. # yes: fatal failure
  450. # no: save interim results and continue
  451. # no: continue [this state could be analyzed further]
  452. #
  453. #
  454. def scanf(str,&b)
  455. return block_scanf(str,&b) if b
  456. return [] unless str.size > 0
  457. start_position = pos rescue 0
  458. matched_so_far = 0
  459. source_buffer = ""
  460. result_buffer = []
  461. final_result = []
  462. fstr = Scanf::FormatString.new(str)
  463. loop do
  464. if eof || (tty? &&! fstr.match(source_buffer))
  465. final_result.concat(result_buffer)
  466. break
  467. end
  468. source_buffer << gets
  469. current_match = fstr.match(source_buffer)
  470. spec = fstr.last_spec_tried
  471. if spec.matched
  472. if spec.mid_match?
  473. result_buffer.replace(current_match)
  474. next
  475. end
  476. elsif (fstr.matched_count == fstr.spec_count - 1)
  477. if /\A\s*\z/.match(fstr.string_left)
  478. break if spec.count_space?
  479. result_buffer.replace(current_match)
  480. next
  481. end
  482. end
  483. final_result.concat(current_match)
  484. matched_so_far += source_buffer.size
  485. source_buffer.replace(fstr.string_left)
  486. matched_so_far -= source_buffer.size
  487. break if fstr.last_spec
  488. fstr.prune
  489. end
  490. seek(start_position + matched_so_far, IO::SEEK_SET) rescue Errno::ESPIPE
  491. soak_up_spaces if fstr.last_spec && fstr.space
  492. return final_result
  493. end
  494. private
  495. def soak_up_spaces
  496. c = getc
  497. ungetc(c) if c
  498. until eof ||! c || /\S/.match(c.chr)
  499. c = getc
  500. end
  501. ungetc(c) if (c && /\S/.match(c.chr))
  502. end
  503. def block_scanf(str)
  504. final = []
  505. # Sub-ideal, since another FS gets created in scanf.
  506. # But used here to determine the number of specifiers.
  507. fstr = Scanf::FormatString.new(str)
  508. last_spec = fstr.last_spec
  509. begin
  510. current = scanf(str)
  511. break if current.empty?
  512. final.push(yield(current))
  513. end until eof || fstr.last_spec_tried == last_spec
  514. return final
  515. end
  516. end
  517. class String
  518. def scanf(fstr,&b)
  519. if b
  520. block_scanf(fstr,&b)
  521. else
  522. fs =
  523. if fstr.is_a? Scanf::FormatString
  524. fstr
  525. else
  526. Scanf::FormatString.new(fstr)
  527. end
  528. fs.match(self)
  529. end
  530. end
  531. def block_scanf(fstr,&b)
  532. fs = Scanf::FormatString.new(fstr)
  533. str = self.dup
  534. final = []
  535. begin
  536. current = str.scanf(fs)
  537. final.push(yield(current)) unless current.empty?
  538. str = fs.string_left
  539. end until current.empty? || str.empty?
  540. return final
  541. end
  542. end
  543. module Kernel
  544. private
  545. def scanf(fs,&b)
  546. STDIN.scanf(fs,&b)
  547. end
  548. end