<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Character Encodings</title><meta name="generator" content="DocBook XSL Stylesheets V1.73.2"><link rel="start" href="index.html" title="jEdit 4.3 User's Guide"><link rel="up" href="files.html" title="Chapter 4. Working With Files"><link rel="prev" href="line-separators.html" title="Line Separators"><link rel="next" href="vfs-browser.html" title="The File System Browser (FSB)"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Character Encodings</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="line-separators.html">Prev</a> </td><th width="60%" align="center">Chapter 4. Working With Files</th><td width="20%" align="right"> <a accesskey="n" href="vfs-browser.html">Next</a></td></tr></table><hr></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="encodings"></a>Character Encodings</h2></div></div></div><p>A character encoding is a mapping from a set of characters to
        their on-disk representation. jEdit can use any encoding supported by
        the Java platform.</p><p>Buffers in memory are always stored in <code class="literal">UTF-16</code>
        encoding, which means each character is represented by one or two
        16-bit code units, each an integer between 0 and 65535.
        <code class="literal">UTF-16</code> is the native encoding used by
        Java, and has a large enough range of characters to support most modern
        languages.</p><p>When a buffer is loaded, it is converted from its on-disk
        representation to <code class="literal">UTF-16</code> using a specified
        encoding.</p><p>The default encoding, used to load files for which no other
        encoding is specified, can be set in the
        <span class="guibutton"><strong>Encodings</strong></span> pane of the
        <span class="guimenu"><strong>Utilities</strong></span>&gt;<span class="guimenuitem"><strong>Global
        Options</strong></span> dialog box; see <a class="xref" href="global-opts.html#encodings-pane" title="The Encodings Pane">the section called &#8220;The Encodings Pane&#8221;</a>.
        Unless you change this setting, it will be your operating system's
        native encoding, for example <code class="literal">MacRoman</code> on Mac OS,
        <code class="literal">windows-1252</code> on Windows, and
        <code class="literal">ISO-8859-1</code> on Unix.</p><p>An encoding can be explicitly set when opening a file in the file
        system browser's
        <span class="guimenu"><strong>Commands</strong></span>&gt;<span class="guisubmenu"><strong>Encoding</strong></span>
        menu.</p><p>Note that there is no general way to auto-detect the encoding used
        by a file; however, in a few cases it is possible:</p><div class="itemizedlist"><ul type="disc"><li><p><code class="literal">UTF-16</code> and <code class="literal">UTF-8Y</code>
                files are auto-detected, because they begin with a fixed
                byte sequence (a byte order mark). Note that plain UTF-8 does not mandate a
                specific header, and thus cannot be auto-detected, unless the
                file in question is an XML file.</p></li><li><p>Encodings used in XML files with an XML declaration like the
                following are auto-detected:</p><pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</pre></li></ul></div><p>The encoding that will be used to save the current buffer is shown
        in the status bar, and can be changed in the
        <span class="guimenu"><strong>Utilities</strong></span>&gt;<span class="guimenuitem"><strong>Buffer
        Options</strong></span> dialog box. Note that changing this setting has no
        effect on the buffer's contents; if you opened a file with the wrong
        encoding and got garbage, you will need to reload it.
        <span class="guimenu"><strong>File</strong></span>&gt;<span class="guimenuitem"><strong>Reload with
        Encoding</strong></span> is an easy way to do this.</p><p>If a file is opened without an explicit encoding specified and it
        appears in the recent file list, jEdit will use the encoding last used
        when working with that file; otherwise the default encoding will be
        used.</p><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2552004"></a>Commonly Used Encodings</h3></div></div></div><p>While the world is slowly converging on UTF-8 and UTF-16
            encodings for storing text, a wide range of older encodings are
            still in widespread use and Java supports most of them.</p><p>The simplest character encoding still in use is ASCII, or
            &#8220;<span class="quote">American Standard Code for Information Interchange</span>&#8221;.
            ASCII encodes Latin letters used in English, in addition to digits
            and a range of punctuation characters. Each ASCII character consists
            of 7 bits, so there is a limit of 128 distinct characters, which makes
            it unsuitable for anything other than English text. jEdit will load
            and save files as ASCII if the <code class="literal">US-ASCII</code> encoding
            is used.</p><p>Because ASCII is unsuitable for international use, most
            operating systems use an 8-bit extension of ASCII, with the first
            128 values mapped to the ASCII characters, and the rest used to
            encode accents, umlauts, and various less commonly used
            typographical marks. The three major operating systems all extend
            ASCII in a different way. Files written by Macintosh programs can be
            read using the <code class="literal">MacRoman</code> encoding; Windows text
            files are usually stored as <code class="literal">windows-1252</code>. In the
            Unix world, the <code class="literal">8859_1</code> character encoding has
            found widespread usage.</p><p>On Windows, various other encodings, referred to as
            <em class="firstterm">code pages</em> and identified by number, are used
            to store non-English text. The corresponding Java encoding name is
            <code class="literal">windows-</code> followed by the code page number, for
            example <code class="literal">windows-850</code>.</p><p>Many common cross-platform international character sets are
            also supported: <code class="literal">KOI8_R</code> for Russian text,
            <code class="literal">Big5</code> and <code class="literal">GBK</code> for Chinese, and
            <code class="literal">SJIS</code> for Japanese.</p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="line-separators.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="files.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="vfs-browser.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Line Separators </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> The File System Browser (FSB)</td></tr></table></div></body></html>