- <html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Character Encodings</title><meta name="generator" content="DocBook XSL Stylesheets V1.73.2"><link rel="start" href="index.html" title="jEdit 4.3 User's Guide"><link rel="up" href="files.html" title="Chapter 4. Working With Files"><link rel="prev" href="line-separators.html" title="Line Separators"><link rel="next" href="vfs-browser.html" title="The File System Browser (FSB)"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Character Encodings</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="line-separators.html">Prev</a> </td><th width="60%" align="center">Chapter 4. Working With Files</th><td width="20%" align="right"> <a accesskey="n" href="vfs-browser.html">Next</a></td></tr></table><hr></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="encodings"></a>Character Encodings</h2></div></div></div><p>A character encoding is a mapping from a set of characters to
- their on-disk representation. jEdit can use any encoding supported by
- the Java platform.</p><p>Buffers in memory are always stored in <code class="literal">UTF-16</code>
- encoding, which means each character is stored as one or two 16-bit code
- units, each an integer between 0 and 65535. <code class="literal">UTF-16</code> is the encoding Java uses internally
- for strings, and it covers the full Unicode character set, which is enough for virtually all modern
- languages.</p><p>When a buffer is loaded, it is converted from its on-disk
- representation to <code class="literal">UTF-16</code> using a specified
- encoding.</p><p>The default encoding, used to load files for which no other
- encoding is specified, can be set in the
- <span class="guibutton"><strong>Encodings</strong></span> pane of the
- <span class="guimenu"><strong>Utilities</strong></span>&gt;<span class="guimenuitem"><strong>Global
- Options</strong></span> dialog box; see <a class="xref" href="global-opts.html#encodings-pane" title="The Encodings Pane">the section called “The Encodings Pane”</a>.
- Unless you change this setting, it will be your operating system's
- native encoding, for example <code class="literal">MacRoman</code> on Mac OS,
- <code class="literal">windows-1252</code> on Windows, and
- <code class="literal">ISO-8859-1</code> on Unix.</p><p>An encoding can be explicitly set when opening a file in the file
- system browser's
- <span class="guimenu"><strong>Commands</strong></span>&gt;<span class="guisubmenu"><strong>Encoding</strong></span>
- menu.</p><p>Note that there is no general way to auto-detect the encoding used
- by a file; however, in a few cases it is possible:</p><div class="itemizedlist"><ul type="disc"><li><p><code class="literal">UTF-16</code> and <code class="literal">UTF-8Y</code>
- files are auto-detected, because they begin with a fixed
- byte sequence, the byte order mark. Note that plain UTF-8 does not mandate a
- specific header, and thus cannot be auto-detected, unless the
- file in question is an XML file.</p></li><li><p>Encodings declared in XML files with an XML declaration like the
- following are auto-detected:</p><pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</pre></li></ul></div><p>The encoding that will be used to save the current buffer is shown
- in the status bar, and can be changed in the
- <span class="guimenu"><strong>Utilities</strong></span>&gt;<span class="guimenuitem"><strong>Buffer
- Options</strong></span> dialog box. Note that changing this setting has no
- effect on the buffer's contents; if you opened a file with the wrong
- encoding and got garbage, you will need to reload it.
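</p><p>The reason a reload is needed can be illustrated with a minimal Java sketch. This is not jEdit's own code: the class name <code class="literal">EncodingMismatchDemo</code> and the helper <code class="literal">decode</code> are hypothetical, and only the standard <code class="literal">java.nio.charset</code> classes are assumed. It decodes the same on-disk bytes once with the right encoding and once with the wrong one.</p>

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of jEdit.
class EncodingMismatchDemo {
    /** Decodes raw on-disk bytes into a Java (UTF-16) string using the named encoding. */
    public static String decode(byte[] onDisk, String encodingName) {
        return new String(onDisk, Charset.forName(encodingName));
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute (U+00E9), as stored on disk in UTF-8:
        // the accented letter takes two bytes, so the file is 5 bytes long.
        byte[] onDisk = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String right = decode(onDisk, "UTF-8");      // 4 characters
        String wrong = decode(onDisk, "ISO-8859-1"); // 5 characters: each byte became one character

        System.out.println("decoded as UTF-8:      " + right.length() + " chars");
        System.out.println("decoded as ISO-8859-1: " + wrong.length() + " chars");

        // Switching only the *save* encoding does not undo the damage:
        // the two garbage characters are written out as two bytes each.
        byte[] resaved = wrong.getBytes(StandardCharsets.UTF_8);
        System.out.println("original bytes: " + onDisk.length
                + ", re-saved bytes: " + resaved.length);
    }
}
```

<p>Once the bytes have been decoded with the wrong encoding, the buffer holds different characters, so choosing another encoding for saving only writes those wrong characters out in a new form. The file has to be read again with the correct encoding.</p><p>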
- <span class="guimenu"><strong>File</strong></span>&gt;<span class="guimenuitem"><strong>Reload with
- Encoding</strong></span> is an easy way to reload the file with a different encoding.</p><p>If a file is opened without an explicit encoding specified and it
- appears in the recent file list, jEdit will use the encoding last used
- when working with that file; otherwise the default encoding will be
- used.</p><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2552004"></a>Commonly Used Encodings</h3></div></div></div><p>While the world is slowly converging on UTF-8 and UTF-16
- encodings for storing text, a wide range of older encodings are
- still in widespread use and Java supports most of them.</p><p>The simplest character encoding still in use is ASCII, or
- “<span class="quote">American Standard Code for Information Interchange</span>”.
- ASCII encodes Latin letters used in English, in addition to numbers
- and a range of punctuation characters. Because each ASCII character consists
- of 7 bits, there is a limit of 128 distinct characters, which makes
- it unsuitable for anything other than English text. jEdit will load
- and save files as ASCII if the <code class="literal">US-ASCII</code> encoding
- is used.</p><p>Because ASCII is unsuitable for international use, most
- operating systems use an 8-bit extension of ASCII, with the first
- 128 values mapped to the ASCII characters, and the rest used to
- encode accents, umlauts, and various less commonly used
- typographical marks. The three major operating systems all extend
- ASCII in a different way. Files written by Macintosh programs can be
- read using the <code class="literal">MacRoman</code> encoding; Windows text
- files are usually stored as <code class="literal">windows-1252</code>. In the
- Unix world, the <code class="literal">ISO-8859-1</code> character encoding (which Java also accepts under the alias <code class="literal">8859_1</code>) has
- found widespread usage.</p><p>On Windows, various other encodings, referred to as
- <em class="firstterm">code pages</em> and identified by number, are used
- to store non-English text. The corresponding Java encoding name is
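- <code class="literal">windows-</code> followed by the code page number, for
- example <code class="literal">windows-1251</code>; older DOS code pages use the prefix <code class="literal">Cp</code> instead, for example <code class="literal">Cp850</code>.</p><p>Many common cross-platform international character sets are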
- also supported: <code class="literal">KOI8_R</code> for Russian text,
- <code class="literal">Big5</code> and <code class="literal">GBK</code> for Chinese, and
- <code class="literal">SJIS</code> for Japanese.</p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="line-separators.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="files.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="vfs-browser.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Line Separators </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> The File System Browser (FSB)</td></tr></table></div></body></html>