<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Character Encodings</title><meta name="generator" content="DocBook XSL Stylesheets V1.73.2"><link rel="start" href="index.html" title="jEdit 4.3 User's Guide"><link rel="up" href="files.html" title="Chapter 4. Working With Files"><link rel="prev" href="line-separators.html" title="Line Separators"><link rel="next" href="vfs-browser.html" title="The File System Browser (FSB)"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Character Encodings</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="line-separators.html">Prev</a> </td><th width="60%" align="center">Chapter 4. Working With Files</th><td width="20%" align="right"> <a accesskey="n" href="vfs-browser.html">Next</a></td></tr></table><hr></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="encodings"></a>Character Encodings</h2></div></div></div><p>A character encoding is a mapping from a set of characters to
their on-disk representation. jEdit can use any encoding supported by
the Java platform.</p><p>Buffers in memory are always stored in <code class="literal">UTF-16</code>
encoding, which means each character is represented by one or two 16-bit
code units, each an integer between 0 and 65535. <code class="literal">UTF-16</code>
is the native encoding supported by Java, and has a large enough range of
characters to support most modern languages.</p><p>When a buffer is loaded, it is converted from its on-disk
representation to <code class="literal">UTF-16</code> using a specified
encoding.</p><p>The default encoding, used to load files for which no other
encoding is specified, can be set in the
<span class="guibutton"><strong>Encodings</strong></span> pane of the
<span class="guimenu"><strong>Utilities</strong></span>&gt;<span class="guimenuitem"><strong>Global
Options</strong></span> dialog box; see <a class="xref" href="global-opts.html#encodings-pane" title="The Encodings Pane">the section called &#8220;The Encodings Pane&#8221;</a>.
Unless you change this setting, it will be your operating system's
native encoding, for example <code class="literal">MacRoman</code> on Mac OS,
<code class="literal">windows-1252</code> on Windows, and
<code class="literal">ISO-8859-1</code> on Unix.</p><p>An encoding can be explicitly set when opening a file in the file
system browser's
<span class="guimenu"><strong>Commands</strong></span>&gt;<span class="guisubmenu"><strong>Encoding</strong></span>
menu.</p><p>There is no general way to auto-detect the encoding used
by a file; however, in a few cases it is possible:</p><div class="itemizedlist"><ul type="disc"><li><p><code class="literal">UTF-16</code> and <code class="literal">UTF-8Y</code>
files are auto-detected, because they begin with a certain fixed
character sequence (a byte order mark). Note that plain UTF-8 does not mandate a
specific header, and thus cannot be auto-detected, unless the
file in question is an XML file.</p></li><li><p>Encodings used in XML files with an XML declaration like the
following are auto-detected:</p><pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</pre></li></ul></div><p>The encoding that will be used to save the current buffer is shown
in the status bar, and can be changed in the
<span class="guimenu"><strong>Utilities</strong></span>&gt;<span class="guimenuitem"><strong>Buffer
Options</strong></span> dialog box. Note that changing this setting has no
effect on the buffer's contents; if you opened a file with the wrong
encoding and got garbage, you will need to reload it.
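</p><p>As a rough illustration of why the file has to be reloaded, the following Java sketch decodes the same on-disk bytes twice, once with the wrong encoding and once with the right one. The class name <code class="literal">ReloadWithEncoding</code> and its <code class="literal">decode</code> helper are hypothetical and are not part of jEdit; only the standard <code class="literal">java.nio.charset</code> classes are assumed.</p>
<pre class="programlisting">
```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReloadWithEncoding {
    // Hypothetical helper: decode raw file bytes into a Java String,
    // which is held in memory as UTF-16.
    public static String decode(byte[] onDisk, Charset encoding) {
        return new String(onDisk, encoding);
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, as stored on disk in UTF-8:
        // the accented letter takes two bytes, so there are 5 bytes in total.
        byte[] onDisk = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Wrong encoding: each of the two bytes becomes its own character.
        String garbled = decode(onDisk, StandardCharsets.ISO_8859_1);

        // Right encoding: decoding the original bytes again recovers the text.
        String reloaded = decode(onDisk, StandardCharsets.UTF_8);

        System.out.println(garbled.length());              // prints 5
        System.out.println(reloaded.equals("caf\u00e9"));  // prints true
    }
}
```
</pre><p>The garbled text can only be repaired by decoding the original bytes again with the correct encoding, which is what reloading does.</p><p>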
<span class="guimenu"><strong>File</strong></span>&gt;<span class="guimenuitem"><strong>Reload with
Encoding</strong></span> is an easy way to do this.</p><p>If a file is opened without an explicit encoding specified and it
appears in the recent file list, jEdit will use the encoding last used
when working with that file; otherwise the default encoding will be
used.</p><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2552004"></a>Commonly Used Encodings</h3></div></div></div><p>While the world is slowly converging on UTF-8 and UTF-16
encodings for storing text, a wide range of older encodings are
still in widespread use, and Java supports most of them.</p><p>The simplest character encoding still in use is ASCII, or
&#8220;<span class="quote">American Standard Code for Information Interchange</span>&#8221;.
ASCII encodes the Latin letters used in English, in addition to digits
and a range of punctuation characters. Because each ASCII character consists
of 7 bits, there is a limit of 128 distinct characters, which makes
ASCII unsuitable for anything other than English text. jEdit will load
and save files as ASCII if the <code class="literal">US-ASCII</code> encoding
is used.</p><p>Because ASCII is unsuitable for international use, most
operating systems use an 8-bit extension of ASCII, with the first
128 values mapped to the ASCII characters, and the rest used to
encode accents, umlauts, and various less commonly used
typographical marks. The three major operating systems all extend
ASCII in a different way. Files written by Macintosh programs can be
read using the <code class="literal">MacRoman</code> encoding; Windows text
files are usually stored as <code class="literal">windows-1252</code>. In the
Unix world, the <code class="literal">ISO-8859-1</code> character encoding has
found widespread usage.</p><p>On Windows, various other encodings, referred to as
<em class="firstterm">code pages</em> and identified by number, are used
to store non-English text. The corresponding Java encoding name is
<code class="literal">windows-</code> followed by the code page number, for
example <code class="literal">windows-1251</code>.</p><p>Many common cross-platform international character sets are
also supported: <code class="literal">KOI8_R</code> for Russian text,
<code class="literal">Big5</code> and <code class="literal">GBK</code> for Chinese, and
<code class="literal">SJIS</code> for Japanese.</p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="line-separators.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="files.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="vfs-browser.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Line Separators </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> The File System Browser (FSB)</td></tr></table></div></body></html>