<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- This file documents the GNU C Library.
This is
The GNU C Library Reference Manual, for version
2.23.
Copyright (C) 1993-2016 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version
1.3 or any later version published by the Free
Software Foundation; with the Invariant Sections being "Free Software
Needs Free Documentation" and "GNU Lesser General Public License",
the Front-Cover texts being "A GNU Manual", and with the Back-Cover
Texts as in (a) below. A copy of the license is included in the
section entitled "GNU Free Documentation License".
(a) The FSF's Back-Cover Text is: "You have the freedom to
copy and modify this GNU manual. Buying copies from the FSF
supports it in developing GNU and promoting software freedom." -->
<!-- Created by GNU Texinfo 6.0, http://www.gnu.org/software/texinfo/ -->
<head>
<title>The GNU C Library: Representation of Strings</title>
<meta name="description" content="The GNU C Library: Representation of Strings">
<meta name="keywords" content="The GNU C Library: Representation of Strings">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="String-and-Array-Utilities.html#String-and-Array-Utilities" rel="up" title="String and Array Utilities">
<link href="String_002fArray-Conventions.html#String_002fArray-Conventions" rel="next" title="String/Array Conventions">
<link href="String-and-Array-Utilities.html#String-and-Array-Utilities" rel="prev" title="String and Array Utilities">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smalllisp {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nocodebreak {white-space: nowrap}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: serif; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
ul.no-bullet {list-style: none}
-->
</style>
</head>
<body lang="en">
<a name="Representation-of-Strings"></a>
<div class="header">
<p>
Next: <a href="String_002fArray-Conventions.html#String_002fArray-Conventions" accesskey="n" rel="next">String/Array Conventions</a>, Up: <a href="String-and-Array-Utilities.html#String-and-Array-Utilities" accesskey="u" rel="up">String and Array Utilities</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="Representation-of-Strings-1"></a>
<h3 class="section">5.1 Representation of Strings</h3>
<a name="index-string_002c-representation-of"></a>
<p>This section is a quick summary of string concepts for beginning C
programmers. It describes how strings are represented in C
and some common pitfalls. If you are already familiar with this
material, you can skip this section.
</p>
<a name="index-string"></a>
<p>A <em>string</em> is a null-terminated array of bytes of type <code>char</code>,
including the terminating null byte. String-valued
variables are usually declared to be pointers of type <code>char *</code>.
Such variables do not include space for the text of a string; that has
to be stored somewhere else: in an array variable, a string constant,
or dynamically allocated memory (see <a href="Memory-Allocation.html#Memory-Allocation">Memory Allocation</a>). It&rsquo;s up to
you to store the address of the chosen memory space into the pointer
variable. Alternatively, you can store a <em>null pointer</em> in the
pointer variable. The null pointer does not point anywhere, so
attempting to access a string through it is an error.
</p>
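<p>As a minimal sketch of the storage choices above, the following shows
text being copied into dynamically allocated memory and a null pointer
being checked before use. The helper names <code>copy_to_heap</code> and
<code>length_or_zero</code> are hypothetical and are not part of the library.
</p>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap-allocated copy of TEXT, or a
   null pointer if allocation fails.  The caller frees the result.  */
char *
copy_to_heap (const char *text)
{
  size_t size = strlen (text) + 1;  /* +1 for the terminating null byte */
  char *copy = malloc (size);
  if (copy != NULL)
    memcpy (copy, text, size);
  return copy;
}

/* Hypothetical helper: a null pointer refers to no string at all, so
   it has to be tested before the pointer is passed to strlen.  */
size_t
length_or_zero (const char *s)
{
  return s != NULL ? strlen (s) : 0;
}
```

<p>A <code>char *</code> variable can equally be set to the address of an array
variable or of a string constant; only the place where the bytes live differs.
</p>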
<a name="index-multibyte-character"></a>
<a name="index-multibyte-string"></a>
<a name="index-wide-string"></a>
<p>A <em>multibyte character</em> is a sequence of one or more bytes that
represents a single character using the locale&rsquo;s encoding scheme; a
null byte always represents the null character. A <em>multibyte
string</em> is a string that consists entirely of multibyte
characters. In contrast, a <em>wide string</em> is a null-terminated
sequence of <code>wchar_t</code> objects. A wide-string variable is usually
declared to be a pointer of type <code>wchar_t *</code>, by analogy with
string variables and <code>char *</code>. See <a href="Extended-Char-Intro.html#Extended-Char-Intro">Extended Char Intro</a>.
</p>
<a name="index-null-byte"></a>
<a name="index-null-wide-character"></a>
<p>By convention, the <em>null byte</em>, <code>'\0'</code>,
marks the end of a string and the <em>null wide character</em>,
<code>L'\0'</code>, marks the end of a wide string. For example, in
testing to see whether the <code>char *</code> variable <var>p</var> points to a
null byte marking the end of a string, you can write
<code>!*<var>p</var></code> or <code>*<var>p</var> == '\0'</code>.
</p>
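<p>The end-of-string test can be sketched as a pair of scanning loops.
The function names <code>count_bytes</code> and <code>count_wide</code> are
hypothetical; in real code <code>strlen</code> and <code>wcslen</code> do this job.
</p>

```c
#include <stddef.h>

/* Count the bytes before the null byte that ends the string P.  */
size_t
count_bytes (const char *p)
{
  size_t n = 0;
  while (*p != '\0')   /* equivalent to: while (*p) */
    {
      ++p;
      ++n;
    }
  return n;
}

/* Count the wide characters before the null wide character that
   ends the wide string P.  */
size_t
count_wide (const wchar_t *p)
{
  size_t n = 0;
  while (*p != L'\0')
    {
      ++p;
      ++n;
    }
  return n;
}
```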
<p>A null byte is quite different conceptually from a null pointer,
although both are represented by the integer constant <code>0</code>.
</p>
<a name="index-string-literal"></a>
<p>A <em>string literal</em> appears in C program source as a multibyte
string between double-quote characters (&lsquo;<samp>&quot;</samp>&rsquo;). If the
initial double-quote character is immediately preceded by a capital
&lsquo;<samp>L</samp>&rsquo; (ell) character (as in <code>L&quot;foo&quot;</code>), it is a wide string
literal. String literals can also contribute to <em>string
concatenation</em>: <code>&quot;a&quot; &quot;b&quot;</code> is the same as <code>&quot;ab&quot;</code>.
For wide strings one can use either
<code>L&quot;a&quot; L&quot;b&quot;</code> or <code>L&quot;a&quot; &quot;b&quot;</code>. Modification of string literals is
not allowed by the GNU C compiler, because literals are placed in
read-only storage.
</p>
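<p>A short sketch of literal concatenation follows. The array names are
hypothetical, and the mixed form <code>L"a" "b"</code> assumes a compiler that
follows ISO&nbsp;C99 or later.
</p>

```c
#include <wchar.h>

/* Adjacent literals are joined at compile time, so each array below
   holds two characters followed by one terminating null.  */
static const char joined[] = "a" "b";             /* same as "ab" */
static const wchar_t wide_joined[] = L"a" L"b";   /* same as L"ab" */
static const wchar_t wide_mixed[] = L"a" "b";     /* also L"ab" */
```

<p>Because the arrays are declared <code>const</code> and initialized from the
literals, the program never needs to write to the literals themselves.
</p>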
<p>Arrays that are declared <code>const</code> cannot be modified
either. It&rsquo;s generally good style to declare non-modifiable string
pointers to be of type <code>const char *</code>, since this often allows the
C compiler to detect accidental modifications as well as providing some
amount of documentation about what your program intends to do with the
string.
</p>
<p>The amount of memory allocated for a byte array may extend past the null byte
that marks the end of the string that the array contains. In this
document, the term <em>allocated size</em> is always used to refer to the
total amount of memory allocated for an array, while the term
<em>length</em> refers to the number of bytes up to (but not including)
the terminating null byte. Wide strings are similar, except their
sizes and lengths count wide characters, not bytes.
<a name="index-length-of-string"></a>
<a name="index-allocation-size-of-string"></a>
<a name="index-size-of-string"></a>
<a name="index-string-length"></a>
<a name="index-string-allocation"></a>
</p>
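<p>The difference between allocated size and length can be sketched with
<code>sizeof</code> and <code>strlen</code>. The buffers and accessor functions
below are hypothetical names chosen for illustration.
</p>

```c
#include <string.h>
#include <wchar.h>

static char buffer[16] = "hello";
static wchar_t wide_buffer[16] = L"hello";

/* Allocated size: every byte of the array, used or not.  */
size_t
buffer_allocated_size (void)
{
  return sizeof buffer;
}

/* Length: bytes up to, but not including, the null byte.  */
size_t
buffer_length (void)
{
  return strlen (buffer);
}

/* For wide strings both quantities count wide characters.  */
size_t
wide_buffer_allocated_size (void)
{
  return sizeof wide_buffer / sizeof wide_buffer[0];
}

size_t
wide_buffer_length (void)
{
  return wcslen (wide_buffer);
}
```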
<p>A notorious source of program bugs is trying to put more bytes into a
string than fit in its allocated size. When writing code that extends
strings or moves bytes into a pre-allocated array, you should be
very careful to keep track of the length of the text and make explicit
checks for overflowing the array. Many of the library functions
<em>do not</em> do this for you! Remember also that you need to allocate
an extra byte to hold the null byte that marks the end of the
string.
</p>
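<p>One way to make the overflow check explicit is sketched below. The
function <code>append_checked</code> is a hypothetical helper, not a library
function, and it assumes that <var>dest</var> already holds a null-terminated
string within its allocated size.
</p>

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: append SRC to the string in DEST, whose
   allocated size is DEST_SIZE bytes.  Return false, leaving DEST
   unchanged, if the result would not fit.  */
bool
append_checked (char *dest, size_t dest_size, const char *src)
{
  size_t dest_len = strlen (dest);
  size_t src_len = strlen (src);

  /* Both texts plus one null byte must fit in DEST_SIZE bytes.  The
     first test also guards the subtraction against wrapping.  */
  if (dest_len >= dest_size || src_len >= dest_size - dest_len)
    return false;

  memcpy (dest + dest_len, src, src_len + 1);  /* +1 copies the null byte */
  return true;
}
```

<p>For an 8-byte array holding <code>"abc"</code>, appending <code>"defg"</code>
succeeds because 3 + 4 + 1 bytes fit exactly, while a further one-byte append
is refused.
</p>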
<a name="index-single_002dbyte-string"></a>
<a name="index-multibyte-string-1"></a>
<p>Originally strings were sequences of bytes where each byte represented a
single character. This is still true today if the strings are encoded
using a single-byte character encoding. Things are different if the
strings are encoded using a multibyte encoding (for more information on
encodings see <a href="Extended-Char-Intro.html#Extended-Char-Intro">Extended Char Intro</a>). There is no difference in
the programming interface for these two kinds of strings; the programmer
has to be aware of this and interpret the byte sequences accordingly.
</p>
<p>But since there is no separate interface taking care of these
differences, the byte-based string functions are sometimes hard to use.
Since the count parameters of these functions specify bytes, a call to
<code>memcpy</code> could cut a multibyte character in the middle and put an
incomplete (and therefore unusable) byte sequence in the target buffer.
</p>
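<p>The cutting problem can be illustrated with a sketch that assumes the
multibyte encoding is UTF-8, which is only one of the encodings a locale may
use. The function <code>utf8_safe_prefix</code> is a hypothetical helper that
shrinks a byte count so that a following <code>memcpy</code> does not split a
character.
</p>

```c
#include <stddef.h>

/* Hypothetical helper, assuming S holds valid UTF-8 and LIMIT does
   not exceed the length of S: return the largest count <= LIMIT at
   which S can be cut without splitting a multibyte character.  */
size_t
utf8_safe_prefix (const char *s, size_t limit)
{
  /* UTF-8 continuation bytes have the bit pattern 10xxxxxx.  If the
     byte just past the cut is one of them, the cut falls inside a
     character, so back up until it lands on a character boundary.  */
  while (limit > 0 && ((unsigned char) s[limit] & 0xC0) == 0x80)
    --limit;
  return limit;
}
```

<p>For the three-byte string <code>"a\xC3\xA9"</code> (the letter a followed by
a two-byte character), a requested cut at 2 bytes is reduced to 1, whereas
cuts at 1 or 3 bytes are already on character boundaries.
</p>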
<a name="index-wide-string-1"></a>
<p>To avoid these problems, later versions of the ISO&nbsp;C<!-- /@w --> standard
introduced a second set of functions that operate on <em>wide
characters</em> (see <a href="Extended-Char-Intro.html#Extended-Char-Intro">Extended Char Intro</a>). These functions don&rsquo;t have
the problems the byte-based versions have, since every wide character is
a legal, interpretable value. This does not mean that cutting wide
strings at arbitrary points is without problems. It normally
is for alphabet-based languages (except for non-normalized text), but
languages based on syllables still have the problem that more than one
wide character is necessary to complete a logical unit. This is a
higher-level problem that the C&nbsp;library<!-- /@w --> functions are not designed
to solve. But it is at least good that no invalid byte sequences can be
created. Higher-level functions can also operate much more easily
on wide characters than on multibyte characters, so a common strategy
is to use wide characters internally whenever text is more than simply
copied.
</p>
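<p>As a sketch of cutting a wide string, the hypothetical helper
<code>wide_prefix</code> below copies a prefix counted in wide characters.
Whatever count is chosen, each copied element is a whole wide character,
although, as noted above, a logical unit made of several wide characters can
still be split.
</p>

```c
#include <wchar.h>

/* Hypothetical helper: copy at most COUNT wide characters of SRC
   into DEST and terminate the result.  DEST must have room for
   COUNT + 1 wide characters.  Return the number copied.  */
size_t
wide_prefix (wchar_t *dest, const wchar_t *src, size_t count)
{
  size_t len = wcslen (src);
  if (count > len)
    count = len;
  wmemcpy (dest, src, count);
  dest[count] = L'\0';
  return count;
}
```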
<p>The remainder of this chapter discusses the functions for handling
wide strings in parallel with the discussion of
strings, since there is almost always an exact equivalent
available.
</p>
</body>
</html>