/thirdparty/breakpad/third_party/protobuf/protobuf/src/google/protobuf/io/tokenizer.cc

http://github.com/tomahawk-player/tomahawk · C++ · 694 lines · 339 code · 15 blank · 340 comment · 50 complexity · 5ac845cb643c37a1f20d81ddb0dc1733 MD5

// Protocol Buffers - Google's data interchange format
// Copyright 2008 Google Inc. All rights reserved.
// http://code.google.com/p/protobuf/
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
//     * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//     * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
//     * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

// Author: kenton@google.com (Kenton Varda)
//  Based on original Protocol Buffers design by
//  Sanjay Ghemawat, Jeff Dean, and others.
//
// Here we have a hand-written lexer. At first you might ask yourself,
// "Hand-written text processing? Is Kenton crazy?!" Well, first of all,
// yes I am crazy, but that's beside the point. There are actually reasons
// why I ended up writing this this way.
//
// The traditional approach to lexing is to use lex to generate a lexer for
// you. Unfortunately, lex's output is ridiculously ugly and difficult to
// integrate cleanly with C++ code, especially abstract code or code meant
// as a library. Better parser-generators exist but would add dependencies
// which most users won't already have, which we'd like to avoid. (GNU flex
// has a C++ output option, but it's still ridiculously ugly, non-abstract,
// and not library-friendly.)
//
// The next approach that any good software engineer should look at is to
// use regular expressions. And, indeed, I did. I have code which
// implements this same class using regular expressions. It's about 200
// lines shorter. However:
// - Rather than error messages telling you "This string has an invalid
//   escape sequence at line 5, column 45", you get error messages like
//   "Parse error on line 5". Giving more precise errors requires adding
//   a lot of code that ends up basically as complex as the hand-coded
//   version anyway.
// - The regular expression to match a string literal looks like this:
//     kString = new RE("(\"([^\"\\\\]|"              // non-escaped
//                      "\\\\[abfnrtv?\"'\\\\0-7]|"   // normal escape
//                      "\\\\x[0-9a-fA-F])*\"|"       // hex escape
//                      "\'([^\'\\\\]|"               // Also support single-quotes.
//                      "\\\\[abfnrtv?\"'\\\\0-7]|"
//                      "\\\\x[0-9a-fA-F])*\')");
//   Verifying the correctness of this line noise is actually harder than
//   verifying the correctness of ConsumeString(), defined below. I'm not
//   even confident that the above is correct, after staring at it for some
//   time.
// - PCRE is fast, but there's still more overhead involved than the code
//   below.
// - Sadly, regular expressions are not part of the C standard library, so
//   using them would require depending on some other library. For the
//   open source release, this could be really annoying. Nobody likes
//   downloading one piece of software just to find that they need to
//   download something else to make it work, and in all likelihood
//   people downloading Protocol Buffers will already be doing so just
//   to make something else work. We could include a copy of PCRE with
//   our code, but that obligates us to keep it up-to-date and just seems
//   like a big waste just to save 200 lines of code.
//
// On a similar but unrelated note, I'm even scared to use ctype.h.
// Apparently functions like isalpha() are locale-dependent. So, if we used
// that, then if this code is being called from some program that doesn't
// have its locale set to "C", it would behave strangely. We can't just set
// the locale to "C" ourselves since we might break the calling program that
// way, particularly if it is multi-threaded. WTF? Someone please let me
// (Kenton) know if I'm missing something here...
//
// I'd love to hear about other alternatives, though, as this code isn't
// exactly pretty.

#include <google/protobuf/io/tokenizer.h>
#include <google/protobuf/io/zero_copy_stream.h>
#include <google/protobuf/stubs/strutil.h>

namespace google {
namespace protobuf {
namespace io {
namespace {

// As mentioned above, I don't trust ctype.h due to the presence of "locales".
// So, I have written replacement functions here. Someone please smack me if
// this is a bad idea or if there is some way around this.
//
// These "character classes" are designed to be used in template methods.
// For instance, Tokenizer::ConsumeZeroOrMore<Whitespace>() will eat
// whitespace.

// Note: No class is allowed to contain '\0', since this is used to mark end-
// of-input and is handled specially.

#define CHARACTER_CLASS(NAME, EXPRESSION)      \
  class NAME {                                 \
   public:                                     \
    static inline bool InClass(char c) {       \
      return EXPRESSION;                       \
    }                                          \
  }

CHARACTER_CLASS(Whitespace, c == ' ' || c == '\n' || c == '\t' ||
                            c == '\r' || c == '\v' || c == '\f');

CHARACTER_CLASS(Unprintable, c < ' ' && c > '\0');

CHARACTER_CLASS(Digit, '0' <= c && c <= '9');
CHARACTER_CLASS(OctalDigit, '0' <= c && c <= '7');
CHARACTER_CLASS(HexDigit, ('0' <= c && c <= '9') ||
                          ('a' <= c && c <= 'f') ||
                          ('A' <= c && c <= 'F'));

CHARACTER_CLASS(Letter, ('a' <= c && c <= 'z') ||
                        ('A' <= c && c <= 'Z') ||
                        (c == '_'));

CHARACTER_CLASS(Alphanumeric, ('a' <= c && c <= 'z') ||
                              ('A' <= c && c <= 'Z') ||
                              ('0' <= c && c <= '9') ||
                              (c == '_'));

CHARACTER_CLASS(Escape, c == 'a' || c == 'b' || c == 'f' || c == 'n' ||
                        c == 'r' || c == 't' || c == 'v' || c == '\\' ||
                        c == '?' || c == '\'' || c == '\"');

#undef CHARACTER_CLASS
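
// Illustration (not part of the original file): a minimal standalone sketch
// of how stateless character-class types plug into a template helper, in the
// spirit of Tokenizer::ConsumeZeroOrMore<Whitespace>(). The names
// SketchDigit, SketchWhitespace, and SkipWhile are hypothetical and exist
// only for this example; the snippet is meant to be compiled on its own, not
// as part of this translation unit.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical character classes that follow the InClass() convention.
struct SketchDigit {
  static bool InClass(char c) { return '0' <= c && c <= '9'; }
};

struct SketchWhitespace {
  static bool InClass(char c) {
    return c == ' ' || c == '\n' || c == '\t' ||
           c == '\r' || c == '\v' || c == '\f';
  }
};

// Returns the index of the first character at or after `pos` that is not in
// CharacterClass. The scan always stops at the terminating '\0' because no
// class is allowed to contain it.
template <typename CharacterClass>
std::size_t SkipWhile(const char* text, std::size_t pos) {
  while (CharacterClass::InClass(text[pos])) ++pos;
  return pos;
}
```

// Because the class is a template parameter, the membership test can be
// inlined into the loop, and no locale-dependent ctype.h call is involved.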

// Given a char, interpret it as a numeric digit and return its value.
// This supports any number base up to 36.
inline int DigitValue(char digit) {
  if ('0' <= digit && digit <= '9') return digit - '0';
  if ('a' <= digit && digit <= 'z') return digit - 'a' + 10;
  if ('A' <= digit && digit <= 'Z') return digit - 'A' + 10;
  return -1;
}

// Inline because it's only used in one place.
inline char TranslateEscape(char c) {
  switch (c) {
    case 'a':  return '\a';
    case 'b':  return '\b';
    case 'f':  return '\f';
    case 'n':  return '\n';
    case 'r':  return '\r';
    case 't':  return '\t';
    case 'v':  return '\v';
    case '\\': return '\\';
    case '?':  return '\?';    // Trigraphs = :(
    case '\'': return '\'';
    case '"':  return '\"';

    // We expect escape sequences to have been validated separately.
    default:   return '?';
  }
}

}  // anonymous namespace

ErrorCollector::~ErrorCollector() {}

// ===================================================================

Tokenizer::Tokenizer(ZeroCopyInputStream* input,
                     ErrorCollector* error_collector)
  : input_(input),
    error_collector_(error_collector),
    buffer_(NULL),
    buffer_size_(0),
    buffer_pos_(0),
    read_error_(false),
    line_(0),
    column_(0),
    token_start_(-1),
    allow_f_after_float_(false),
    comment_style_(CPP_COMMENT_STYLE) {

  current_.line = 0;
  current_.column = 0;
  current_.end_column = 0;
  current_.type = TYPE_START;

  Refresh();
}

Tokenizer::~Tokenizer() {
  // If we had any buffer left unread, return it to the underlying stream
  // so that someone else can read it.
  if (buffer_size_ > buffer_pos_) {
    input_->BackUp(buffer_size_ - buffer_pos_);
  }
}

// -------------------------------------------------------------------
// Internal helpers.

void Tokenizer::NextChar() {
  // Update our line and column counters based on the character being
  // consumed.
  if (current_char_ == '\n') {
    ++line_;
    column_ = 0;
  } else if (current_char_ == '\t') {
    column_ += kTabWidth - column_ % kTabWidth;
  } else {
    ++column_;
  }

  // Advance to the next character.
  ++buffer_pos_;
  if (buffer_pos_ < buffer_size_) {
    current_char_ = buffer_[buffer_pos_];
  } else {
    Refresh();
  }
}
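
// Illustration (not part of the original file): a small standalone sketch of
// the column bookkeeping in NextChar(), showing that a tab advances the
// column to the next multiple of the tab width. kSketchTabWidth and
// AdvanceColumn are hypothetical names; the value 8 is an assumption standing
// in for Tokenizer::kTabWidth, which is defined in tokenizer.h and not shown
// in this file.

```cpp
#include <cassert>

// Assumed tab width for this sketch only.
const int kSketchTabWidth = 8;

// Returns the zero-based column after consuming `c` at column `column`.
// A newline resets the column to 0; incrementing the line counter is left
// to the caller.
inline int AdvanceColumn(int column, char c) {
  if (c == '\n') return 0;
  if (c == '\t') return column + kSketchTabWidth - column % kSketchTabWidth;
  return column + 1;
}
```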

void Tokenizer::Refresh() {
  if (read_error_) {
    current_char_ = '\0';
    return;
  }

  // If we're in a token, append the rest of the buffer to it.
  if (token_start_ >= 0 && token_start_ < buffer_size_) {
    current_.text.append(buffer_ + token_start_, buffer_size_ - token_start_);
    token_start_ = 0;
  }

  const void* data = NULL;
  buffer_ = NULL;
  buffer_pos_ = 0;
  do {
    if (!input_->Next(&data, &buffer_size_)) {
      // end of stream (or read error)
      buffer_size_ = 0;
      read_error_ = true;
      current_char_ = '\0';
      return;
    }
  } while (buffer_size_ == 0);

  buffer_ = static_cast<const char*>(data);

  current_char_ = buffer_[0];
}

inline void Tokenizer::StartToken() {
  token_start_ = buffer_pos_;
  current_.type = TYPE_START;    // Just for the sake of initializing it.
  current_.text.clear();
  current_.line = line_;
  current_.column = column_;
}

inline void Tokenizer::EndToken() {
  // Note: The if() is necessary because some STL implementations crash when
  // you call string::append(NULL, 0), presumably because they are trying to
  // be helpful by detecting the NULL pointer, even though there's nothing
  // wrong with reading zero bytes from NULL.
  if (buffer_pos_ != token_start_) {
    current_.text.append(buffer_ + token_start_, buffer_pos_ - token_start_);
  }
  token_start_ = -1;
  current_.end_column = column_;
}

// -------------------------------------------------------------------
// Helper methods that consume characters.

template<typename CharacterClass>
inline bool Tokenizer::LookingAt() {
  return CharacterClass::InClass(current_char_);
}

template<typename CharacterClass>
inline bool Tokenizer::TryConsumeOne() {
  if (CharacterClass::InClass(current_char_)) {
    NextChar();
    return true;
  } else {
    return false;
  }
}

inline bool Tokenizer::TryConsume(char c) {
  if (current_char_ == c) {
    NextChar();
    return true;
  } else {
    return false;
  }
}

template<typename CharacterClass>
inline void Tokenizer::ConsumeZeroOrMore() {
  while (CharacterClass::InClass(current_char_)) {
    NextChar();
  }
}

template<typename CharacterClass>
inline void Tokenizer::ConsumeOneOrMore(const char* error) {
  if (!CharacterClass::InClass(current_char_)) {
    AddError(error);
  } else {
    do {
      NextChar();
    } while (CharacterClass::InClass(current_char_));
  }
}

// -------------------------------------------------------------------
// Methods that read whole patterns matching certain kinds of tokens
// or comments.

void Tokenizer::ConsumeString(char delimiter) {
  while (true) {
    switch (current_char_) {
      case '\0':
      case '\n': {
        AddError("String literals cannot cross line boundaries.");
        return;
      }

      case '\\': {
        // An escape sequence.
        NextChar();
        if (TryConsumeOne<Escape>()) {
          // Valid escape sequence.
        } else if (TryConsumeOne<OctalDigit>()) {
          // Possibly followed by two more octal digits, but these will
          // just be consumed by the main loop anyway so we don't need
          // to do so explicitly here.
        } else if (TryConsume('x') || TryConsume('X')) {
          if (!TryConsumeOne<HexDigit>()) {
            AddError("Expected hex digits for escape sequence.");
          }
          // Possibly followed by another hex digit, but again we don't care.
        } else {
          AddError("Invalid escape sequence in string literal.");
        }
        break;
      }

      default: {
        if (current_char_ == delimiter) {
          NextChar();
          return;
        }
        NextChar();
        break;
      }
    }
  }
}

Tokenizer::TokenType Tokenizer::ConsumeNumber(bool started_with_zero,
                                              bool started_with_dot) {
  bool is_float = false;

  if (started_with_zero && (TryConsume('x') || TryConsume('X'))) {
    // A hex number (started with "0x").
    ConsumeOneOrMore<HexDigit>("\"0x\" must be followed by hex digits.");

  } else if (started_with_zero && LookingAt<Digit>()) {
    // An octal number (had a leading zero).
    ConsumeZeroOrMore<OctalDigit>();
    if (LookingAt<Digit>()) {
      AddError("Numbers starting with leading zero must be in octal.");
      ConsumeZeroOrMore<Digit>();
    }

  } else {
    // A decimal number.
    if (started_with_dot) {
      is_float = true;
      ConsumeZeroOrMore<Digit>();
    } else {
      ConsumeZeroOrMore<Digit>();

      if (TryConsume('.')) {
        is_float = true;
        ConsumeZeroOrMore<Digit>();
      }
    }

    if (TryConsume('e') || TryConsume('E')) {
      is_float = true;
      TryConsume('-') || TryConsume('+');
      ConsumeOneOrMore<Digit>("\"e\" must be followed by exponent.");
    }

    if (allow_f_after_float_ && (TryConsume('f') || TryConsume('F'))) {
      is_float = true;
    }
  }

  if (LookingAt<Letter>()) {
    AddError("Need space between number and identifier.");
  } else if (current_char_ == '.') {
    if (is_float) {
      AddError(
        "Already saw decimal point or exponent; can't have another one.");
    } else {
      AddError("Hex and octal numbers must be integers.");
    }
  }

  return is_float ? TYPE_FLOAT : TYPE_INTEGER;
}

void Tokenizer::ConsumeLineComment() {
  while (current_char_ != '\0' && current_char_ != '\n') {
    NextChar();
  }
  TryConsume('\n');
}

void Tokenizer::ConsumeBlockComment() {
  int start_line = line_;
  int start_column = column_ - 2;

  while (true) {
    while (current_char_ != '\0' &&
           current_char_ != '*' &&
           current_char_ != '/') {
      NextChar();
    }

    if (TryConsume('*') && TryConsume('/')) {
      // End of comment.
      break;
    } else if (TryConsume('/') && current_char_ == '*') {
      // Note: We didn't consume the '*' because if there is a '/' after it
      // we want to interpret that as the end of the comment.
      AddError(
        "\"/*\" inside block comment. Block comments cannot be nested.");
    } else if (current_char_ == '\0') {
      AddError("End-of-file inside block comment.");
      error_collector_->AddError(
        start_line, start_column, " Comment started here.");
      break;
    }
  }
}

// -------------------------------------------------------------------

bool Tokenizer::Next() {
  previous_ = current_;

  // Did we skip any characters after the last token?
  bool skipped_stuff = false;

  while (!read_error_) {
    if (TryConsumeOne<Whitespace>()) {
      ConsumeZeroOrMore<Whitespace>();

    } else if (comment_style_ == CPP_COMMENT_STYLE && TryConsume('/')) {
      // Starting a comment?
      if (TryConsume('/')) {
        ConsumeLineComment();
      } else if (TryConsume('*')) {
        ConsumeBlockComment();
      } else {
        // Oops, it was just a slash. Return it.
        current_.type = TYPE_SYMBOL;
        current_.text = "/";
        current_.line = line_;
        current_.column = column_ - 1;
        return true;
      }

    } else if (comment_style_ == SH_COMMENT_STYLE && TryConsume('#')) {
      ConsumeLineComment();

    } else if (LookingAt<Unprintable>() || current_char_ == '\0') {
      AddError("Invalid control characters encountered in text.");
      NextChar();
      // Skip more unprintable characters, too. But, remember that '\0' is
      // also what current_char_ is set to after EOF / read error. We have
      // to be careful not to go into an infinite loop of trying to consume
      // it, so make sure to check read_error_ explicitly before consuming
      // '\0'.
      while (TryConsumeOne<Unprintable>() ||
             (!read_error_ && TryConsume('\0'))) {
        // Ignore.
      }

    } else {
      // Reading some sort of token.
      StartToken();

      if (TryConsumeOne<Letter>()) {
        ConsumeZeroOrMore<Alphanumeric>();
        current_.type = TYPE_IDENTIFIER;
      } else if (TryConsume('0')) {
        current_.type = ConsumeNumber(true, false);
      } else if (TryConsume('.')) {
        // This could be the beginning of a floating-point number, or it could
        // just be a '.' symbol.

        if (TryConsumeOne<Digit>()) {
          // It's a floating-point number.
          if (previous_.type == TYPE_IDENTIFIER && !skipped_stuff) {
            // We don't accept syntax like "blah.123".
            error_collector_->AddError(line_, column_ - 2,
              "Need space between identifier and decimal point.");
          }
          current_.type = ConsumeNumber(false, true);
        } else {
          current_.type = TYPE_SYMBOL;
        }
      } else if (TryConsumeOne<Digit>()) {
        current_.type = ConsumeNumber(false, false);
      } else if (TryConsume('\"')) {
        ConsumeString('\"');
        current_.type = TYPE_STRING;
      } else if (TryConsume('\'')) {
        ConsumeString('\'');
        current_.type = TYPE_STRING;
      } else {
        NextChar();
        current_.type = TYPE_SYMBOL;
      }

      EndToken();
      return true;
    }

    skipped_stuff = true;
  }

  // EOF
  current_.type = TYPE_END;
  current_.text.clear();
  current_.line = line_;
  current_.column = column_;
  current_.end_column = column_;
  return false;
}

// -------------------------------------------------------------------
// Token-parsing helpers. Remember that these don't need to report
// errors since any errors should already have been reported while
// tokenizing. Also, these can assume that whatever text they
// are given is text that the tokenizer actually parsed as a token
// of the given type.

bool Tokenizer::ParseInteger(const string& text, uint64 max_value,
                             uint64* output) {
  // Sadly, we can't just use strtoul() since it is only 32-bit and strtoull()
  // is non-standard. I hate the C standard library. :(

  // return strtoull(text.c_str(), NULL, 0);

  const char* ptr = text.c_str();
  int base = 10;
  if (ptr[0] == '0') {
    if (ptr[1] == 'x' || ptr[1] == 'X') {
      // This is hex.
      base = 16;
      ptr += 2;
    } else {
      // This is octal.
      base = 8;
    }
  }

  uint64 result = 0;
  for (; *ptr != '\0'; ptr++) {
    int digit = DigitValue(*ptr);
    GOOGLE_LOG_IF(DFATAL, digit < 0 || digit >= base)
      << " Tokenizer::ParseInteger() passed text that could not have been"
         " tokenized as an integer: " << CEscape(text);
    // The explicit cast spells out the conversion the comparison already
    // performed implicitly: a negative (invalid) digit wraps to a huge value.
    if (static_cast<uint64>(digit) > max_value ||
        result > (max_value - digit) / base) {
      // Overflow.
      return false;
    }
    result = result * base + digit;
  }

  *output = result;
  return true;
}
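
// Illustration (not part of the original file): a standalone sketch of the
// overflow test used in ParseInteger(). Since `result * base + digit` could
// wrap around before it can be compared with max_value, the check is
// rearranged so that nothing on either side can overflow. The names
// AppendDigitChecked and ParseDecimalChecked are hypothetical, and the sketch
// handles decimal text only.

```cpp
#include <cassert>
#include <cstdint>

// Appends one digit to *result in the given base, or returns false if doing
// so would exceed max_value. Assumes 0 <= digit < base.
inline bool AppendDigitChecked(std::uint64_t max_value, int base, int digit,
                               std::uint64_t* result) {
  const std::uint64_t d = static_cast<std::uint64_t>(digit);
  const std::uint64_t b = static_cast<std::uint64_t>(base);
  // result * b + d <= max_value  holds exactly when
  // d <= max_value  and  result <= (max_value - d) / b  (integer division).
  if (d > max_value || *result > (max_value - d) / b) return false;
  *result = *result * b + d;
  return true;
}

// Parses a non-empty run of decimal digits. Returns false on a non-digit
// character or if the value would exceed max_value.
inline bool ParseDecimalChecked(const char* text, std::uint64_t max_value,
                                std::uint64_t* output) {
  std::uint64_t result = 0;
  for (; *text != '\0'; ++text) {
    if (*text < '0' || *text > '9') return false;
    if (!AppendDigitChecked(max_value, 10, *text - '0', &result)) return false;
  }
  *output = result;
  return true;
}
```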

double Tokenizer::ParseFloat(const string& text) {
  const char* start = text.c_str();
  char* end;
  double result = NoLocaleStrtod(start, &end);

  // "1e" is not a valid float, but if the tokenizer reads it, it will
  // report an error but still return it as a valid token. We need to
  // accept anything the tokenizer could possibly return, error or not.
  if (*end == 'e' || *end == 'E') {
    ++end;
    if (*end == '-' || *end == '+') ++end;
  }

  // If the Tokenizer had allow_f_after_float_ enabled, the float may be
  // suffixed with the letter 'f'.
  if (*end == 'f' || *end == 'F') {
    ++end;
  }

  GOOGLE_LOG_IF(DFATAL, end - start != text.size() || *start == '-')
    << " Tokenizer::ParseFloat() passed text that could not have been"
       " tokenized as a float: " << CEscape(text);
  return result;
}

void Tokenizer::ParseStringAppend(const string& text, string* output) {
  // Reminder: text[0] is always the quote character. (If text is
  // empty, it's invalid, so we'll just return.)
  if (text.empty()) {
    GOOGLE_LOG(DFATAL)
      << " Tokenizer::ParseStringAppend() passed text that could not"
         " have been tokenized as a string: " << CEscape(text);
    return;
  }

  output->reserve(output->size() + text.size());

  // Loop through the string copying characters to "output" and
  // interpreting escape sequences. Note that any invalid escape
  // sequences or other errors were already reported while tokenizing.
  // In this case we do not need to produce valid results.
  for (const char* ptr = text.c_str() + 1; *ptr != '\0'; ptr++) {
    if (*ptr == '\\' && ptr[1] != '\0') {
      // An escape sequence.
      ++ptr;

      if (OctalDigit::InClass(*ptr)) {
        // An octal escape. May be one, two, or three digits.
        int code = DigitValue(*ptr);
        if (OctalDigit::InClass(ptr[1])) {
          ++ptr;
          code = code * 8 + DigitValue(*ptr);
        }
        if (OctalDigit::InClass(ptr[1])) {
          ++ptr;
          code = code * 8 + DigitValue(*ptr);
        }
        output->push_back(static_cast<char>(code));

      } else if (*ptr == 'x') {
        // A hex escape. May be zero, one, or two digits. (The zero case
        // will have been caught as an error earlier.)
        int code = 0;
        if (HexDigit::InClass(ptr[1])) {
          ++ptr;
          code = DigitValue(*ptr);
        }
        if (HexDigit::InClass(ptr[1])) {
          ++ptr;
          code = code * 16 + DigitValue(*ptr);
        }
        output->push_back(static_cast<char>(code));

      } else {
        // Some other escape code.
        output->push_back(TranslateEscape(*ptr));
      }

    } else if (*ptr == text[0]) {
      // Ignore quote matching the starting quote.
    } else {
      output->push_back(*ptr);
    }
  }

  return;
}
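
// Illustration (not part of the original file): a standalone sketch of the
// octal-escape branch of ParseStringAppend(), which folds up to three octal
// digits into one byte. IsOctalDigit and DecodeOctalEscape are hypothetical
// names, and the sketch indexes into a std::string instead of walking a
// char pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

inline bool IsOctalDigit(char c) { return '0' <= c && c <= '7'; }

// Decodes an octal escape whose first digit is text[pos] (the character just
// after the backslash). Consumes up to two further octal digits, appends the
// resulting byte to *output, and returns the index of the last digit used.
// Assumes text[pos] is an octal digit.
inline std::size_t DecodeOctalEscape(const std::string& text, std::size_t pos,
                                     std::string* output) {
  int code = text[pos] - '0';
  for (int extra = 0;
       extra < 2 && pos + 1 < text.size() && IsOctalDigit(text[pos + 1]);
       ++extra) {
    ++pos;
    code = code * 8 + (text[pos] - '0');
  }
  output->push_back(static_cast<char>(code));
  return pos;
}
```

// For example, the digits "101" decode to the byte 65 ('A'), and a fourth
// digit is left for the caller to treat as an ordinary character.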

}  // namespace io
}  // namespace protobuf
}  // namespace google