<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>

<head>
  <title>eSpeak Speech Synthesizer</title>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
<A href="index.html">Back</A>
<hr>
<h2>3. LANGUAGES</h2>
<hr>
<h4>Help Needed</h4>
Many of these are just experimental attempts at these languages, produced after a quick reading of the corresponding article on Wikipedia.  They will need work or advice from native speakers to improve them.  Please contact me if you want to advise or assist with these or other languages.<p>
The sound of some phonemes may be poorly implemented, particularly [r] since I'm English and therefore unable to make a "proper" [r] sound.<p>
A major factor is the rhythm or cadence.  An Italian speaker told me the Italian voice improved from "difficult to understand" to "good" by changing the relative length of stressed syllables.  Identifying unstressed function words in the xx_list file is also important to make the speech flow well.  See <a href="add_language.html">Adding or Improving a Language</a>.
<h4>Character sets</h4>
Languages recognise text either as UTF-8 or alternatively in an 8-bit character set which is appropriate for that language.  For example, for Polish this is Latin2, for Russian it is KOI8-R.  This choice can be overridden by a line in the voice file to specify an ISO 8859 character set, e.g. for Russian the line:<br>
<pre>     charset 5</pre>
will mean that ISO 8859-5 is used as the 8-bit character set rather than KOI8-R.
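<p>
The effect of choosing the wrong 8-bit character set can be illustrated with a short Python sketch.  This is only an illustration using Python's standard codecs, not part of eSpeak; the sample word (Russian "privet", written with escapes so that this page stays in Latin1) is an arbitrary choice.
<pre>
# Illustration only: the same Russian word has different byte values in
# KOI8-R and ISO 8859-5, so 8-bit text must be read with the right charset.
word = "\u043f\u0440\u0438\u0432\u0435\u0442"

koi8_bytes = word.encode("koi8_r")      # bytes as stored in a KOI8-R file
iso_bytes = word.encode("iso8859_5")    # bytes as stored in an ISO 8859-5 file

# Reading KOI8-R bytes as ISO 8859-5 succeeds, but gives the wrong letters.
misread = koi8_bytes.decode("iso8859_5")

print(koi8_bytes == iso_bytes)                # False
print(misread == word)                        # False
print(koi8_bytes.decode("koi8_r") == word)    # True
</pre>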
<p>
In the case of a language which uses a non-Latin character set (e.g. Greek or Russian), if the text contains a word with Latin characters then that particular word will be pronounced using English pronunciation rules and English phonemes.  Speaking entirely English text using a Greek or Russian voice will sound OK, but each word is spoken separately so it won't flow properly.
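<p>
The word-by-word switching described above can be sketched roughly as follows.  This Python fragment is a hypothetical illustration of the behaviour, not eSpeak's actual code; the function name <code>tag_words</code> and the use of "en" as the label for English are assumptions made for the example.
<pre>
import string

def tag_words(text, voice_language):
    """Label each word with the language whose rules would pronounce it."""
    tagged = []
    for word in text.split():
        # A word containing any of the letters a-z or A-Z is treated as English.
        has_latin = any(ch in string.ascii_letters for ch in word)
        tagged.append((word, "en" if has_latin else voice_language))
    return tagged

# Greek "kalimera" (written with escapes) followed by a Latin-alphabet word.
sample = "\u03ba\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1 Linux"
for word, language in tag_words(sample, "el"):
    print(language, ascii(word))
</pre>
Each word is labelled separately, which matches the note above that English text read by a Greek or Russian voice is spoken one word at a time.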
<p>
Sample texts in various languages can be found at  <a href="http://meta.wikimedia.org/wiki/List_of_Wikipedias"> http://&lt;language&gt;.wikipedia.org</a>  and <a href="http://www.gutenberg.org">www.gutenberg.org</a>.
<h3>3.1 Voice Files</h3>

A number of Voice files are provided in the <code>espeak-data/voices</code> directory.
You can select one of these with the <strong>-v &lt;voice filename&gt;</strong> parameter to the
espeak command, e.g.:
<pre>   espeak -vaf</pre>
to speak using the Afrikaans voice.<p>Language voices generally start with the two-letter <a href="http://en.wikipedia.org/wiki/ISO_639-1">ISO 639-1 code</a> for the language.  If the language does not have an ISO 639-1 code, then the three-letter <a href="http://www.sil.org/iso639-3/codes.asp">ISO 639-3 code</a> can be used.
<p>
For details of the voice files see <a href="voices.html">Voices</a>.
<h4>Default Voice</h4>
<ul>
<dl>
<dt>
<strong>default</strong><br>
<dd>   This voice is used if none is specified in the espeak command.  Copy your preferred voice to "default" so you can use the espeak command without the need to specify a voice.</dd>
</dl>
</ul>
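<p>
Copying a voice file to "default" could be scripted as in the Python sketch below.  The function name <code>set_default_voice</code> is hypothetical, and the sketch assumes the chosen voice file sits directly in the voices directory; the location of <code>espeak-data</code> depends on the installation and is passed in as a parameter.
<pre>
import os
import shutil

def set_default_voice(voices_dir, voice_name):
    """Copy the voice file voice_name to "default" in voices_dir."""
    source = os.path.join(voices_dir, voice_name)
    target = os.path.join(voices_dir, "default")
    shutil.copyfile(source, target)   # overwrites any existing default
    return target

# Example (the path is an assumption; adjust it to the local installation):
# set_default_voice("/usr/share/espeak-data/voices", "en")
</pre>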
<h3>3.2 English Voices</h3>
<ul><dl>
<dt>
<strong>en</strong><br>
<dd>   is the standard default English voice.</dd>
<p>
<dt>
<strong>en-sc</strong><br>
<dd>   Scottish English.
<p>
<dt>
<strong>en-r</strong><br>
<dd>   Some slight vowel changes, and a "rhotic" accent, where "r" is pronounced even when not followed by a vowel.  This may sound less "British" to an American.
<p>
<dt>
<strong>en-n<br>
en-rp<br>
en-wm</strong><br>
<dd>   are different English voices.  These can be considered caricatures of
   various British accents: Northern, Received Pronunciation, West Midlands
   respectively.</dd>
<p>

</dl></ul>
<h3>3.3 Voice Variants</h3>
To make alternative voices for a language, you can make additional voice files in espeak-data/voices which contain commands to change various voice and pronunciation attributes.  See <a href="voices.html">voices.html</a>.
<p>
Alternatively there are some preset voice variants which can be applied to any of the language voices, by appending <code>+</code> and a variant name. Their effects are defined by files in <code>espeak-data/voices/!v</code>.
<p>
The variants are <code> +m1  +m2  +m3  +m4  +m5 </code> for male voices, <code> +f1 +f2 +f3 +f4 </code> for female voices, and <code> +croak  +wisper</code> for other effects. For example:
<pre>   espeak -ven+m3</pre>
The available voice variants can be listed with:<br>
<pre>   espeak --voices=variant</pre>
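<p>
As a sketch of how a language voice and a variant combine into a single <code>-v</code> argument, the Python helper below builds the command line.  The helper name <code>espeak_command</code> is hypothetical; it only uses the <code>-v</code> option and the <code>+</code> variant syntax described above, and passes the text to speak as the final argument.
<pre>
import subprocess

def espeak_command(text, voice="en", variant=None):
    """Return the argument list for speaking text with a voice and variant."""
    name = voice if variant is None else voice + "+" + variant
    return ["espeak", "-v" + name, text]

print(espeak_command("Hello", "en", "m3"))   # ['espeak', '-ven+m3', 'Hello']
print(espeak_command("Hallo", "af"))         # ['espeak', '-vaf', 'Hallo']

# To actually speak (requires eSpeak to be installed):
# subprocess.run(espeak_command("Hello", "en", "m3"), check=True)
</pre>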
<h3>3.4 Other Languages</h3>
The eSpeak speech synthesizer does text to speech for the following additional languages.
<ul>
<dl>
<p>
<dt>
<strong>af &nbsp;Afrikaans</strong><br>
<dd>This has been worked on by native speakers and it should be OK.</dd>
<p>
<dt>
<strong>bs &nbsp;Bosnian</strong><br>
<dd>Usable, but I'm unsure whether wrongly stressed syllables are a problem. It accepts both Latin and Cyrillic characters. This voice is similar to <strong>sr Serbian</strong> and <strong>hr Croatian</strong>.
</dd>
<p>
<dt>
<strong>cs &nbsp;Czech</strong><br>
<dd>Usable.
</dd>
<p>
<dt>
<strong>de &nbsp;German</strong><br>
<dd>This has improved from earlier versions.  Remaining problems are stress placement (which, like English, is irregular), prosody, and the use of compound words, where correct detection of the sub-word boundaries would probably be needed for accurate pronunciation.
</dd>
<p>
<dt>
<strong>el &nbsp;Greek</strong><br>
<dd>Stress position is marked in text and spelling is fairly regular, so it shouldn't be too bad.  It uses a different alphabet and switches to English pronunciation for words which contain Latin characters a-z.</dd>
<p>
<dt>
<strong>eo &nbsp;Esperanto</strong><br>
<dd>Esperanto has simple and regular pronunciation rules, so it should be OK, although I'm not
   certain how it's supposed to sound, other than what I've read in an introduction.
   Text can be either UTF-8, or Latin3 alphabet, or
   can use the Latin1 convention of two-letter combinations (cx,
   gx, etc).</dd>
<p>
<dt>
<strong>es &nbsp;Spanish</strong><br>
<dd>Spanish has good spelling rules, so it should be OK.</dd>
<p>
<dt>
<strong>es-la &nbsp;Spanish - Latin America</strong><br>
<dd>
This contains a few changes from <strong>es</strong>, notably the pronunciation of "z","ce","ci".
</dd>
<p>
<dt>
<strong>fi &nbsp;Finnish</strong><br>
<dd>This has had assistance from native speakers and should be usable.
</dd>
<p>
<dt>
<strong>fr &nbsp;French</strong><br>
<dd>This has been improved by a native speaker, and should be OK.
</dd>
<p>
<dt>
<strong>hr &nbsp;Croatian</strong><br>
<dd>Usable, but I'm unsure whether wrongly stressed syllables are a problem.  It accepts both Latin and Cyrillic characters.  This voice is similar to <strong>sr Serbian</strong> and <strong>bs Bosnian</strong>.
</dd>
<p>
<dt>
<strong>hu &nbsp;Hungarian</strong><br>
<dd>This has had assistance from a native speaker and it should be OK.
</dd>
<p>
<dt>
<strong>it &nbsp;Italian</strong><br>
<dd>This has had some feedback from a native speaker but more work is needed.  Spelling is fairly regular, but stress marks and vowel accents are often omitted from text, so for some words the dictionary/exceptions list will need to determine the stress position or whether to use the close or open vowel: [e] or [E], and [o] or [O].</dd>
<p>
<dt>
<strong>ku &nbsp;Kurdish</strong><br>
<dd>
Not much work yet, but Kurdish has good spelling rules so it should be OK.
</dd>
<p>
<dt>
<strong>pt &nbsp;Portuguese (Brazil)</strong><br>
<dd>Brazilian Portuguese.  This has had assistance from a native speaker and it should be OK.  Like Italian there is further work to do about the ambiguity in the spelling between open/close "e" and "o" vowels.<p>
</dd>
<p>
<dt>
<strong>pt-pt &nbsp;Portuguese (European)</strong><br>
<dd>
</dd>
<p>
<dt>
<strong>ro &nbsp;Romanian</strong><br>
<dd>Probably OK. More work is needed to improve the position of stress within words.
</dd>
<p>
<dt>
<strong>sk &nbsp;Slovak</strong><br>
<dd>This has had assistance from a native speaker, so it should be OK.
</dd>
<p>
<dt>
<strong>sr &nbsp;Serbian</strong><br>
<dd>Usable. Wrongly stressed syllables may be a problem.   It accepts both Latin and Cyrillic characters.  This voice is similar to <strong>hr Croatian</strong> and <strong>bs &nbsp;Bosnian</strong>.
</dd>
<p>
<dt>
<strong>sv &nbsp;Swedish</strong><br>
<dd>This has now had some work done on the pronunciation rules, so it should be usable.
</dd>
<p>
<dt>
<strong>sw &nbsp;Swahili</strong><br>
<dd>Not much feedback yet, but the spelling and stress rules are fairly regular, so it's probably usable.
</dd>
<p>
<dt>
<strong>ta &nbsp;Tamil</strong><br>
<dd>
Not much work yet, but I'm told it sounds reasonable.
</dd>
<p>
<dt>
<strong>tr &nbsp;Turkish</strong><br>
<dd>
Not much work yet, but I'm told it sounds reasonable.
</dd>
<p>
<dt>
<strong>zh &nbsp;Mandarin Chinese</strong><br>
<dd>
This speaks Pinyin text and Chinese characters.  There is only a simple one-to-one translation of Chinese characters to a single Pinyin pronunciation.  There is no attempt yet at recognising different pronunciations of Chinese characters in context, or of recognising sequences of characters as "words".  The eSpeak installation includes a basic set of Chinese characters.  More are available in an additional data file for Mandarin Chinese at:
  <a href="http://espeak.sourceforge.net/data/">http://espeak.sourceforge.net/data/</a>.
</dd>
<p>
</dl></ul>
<h3>3.5 Provisional Languages</h3>
These languages are only initial naive implementations which have had little or no feedback and improvement from native speakers.
<ul>
<dl>
<p>
<dt>
<strong>cy &nbsp;Welsh</strong><br>
<dd>An initial guess, awaiting feedback.
</dd>
<p>
<dt>
<strong>grc &nbsp;Ancient Greek</strong><br>
<dd>Includes a short pause between words to help understanding.
</dd>
<p>
<dt>
<strong>hi &nbsp;Hindi</strong><br>
<dd>This is interesting because it uses the Devanagari characters.  I'm not sure about Hindi stress rules, and I expect the sound of aspirated/unaspirated consonant pairs needs improvement.
</dd>
<p>
<dt>
<strong>id &nbsp;Indonesian</strong><br>
<dd>An initial guess, no feedback yet.
</dd>
<p>
<dt>
<strong>is &nbsp;Icelandic</strong><br>
<dd>An initial guess, awaiting feedback.
</dd>
<p>
<dt>
<strong>jbo &nbsp;Lojban</strong><br>
<dd>An artificial language.
</dd>
<p>
<dt>
<strong>la &nbsp;Latin</strong><br>
<dd>Stress rules are implemented, but it needs text where long vowels are marked with macrons.
</dd>
<p>
<dt>
<strong>mk &nbsp;Macedonian</strong><br>
<dd>This is similar to <strong>hr Croatian</strong>, so it's probably usable.  It accepts both Latin and Cyrillic characters.
</dd>
<p>
<dt>
<strong>nl &nbsp;Dutch</strong><br>
<dd>Probably needs improvement of the spelling-to-phoneme rules.
</dd>
<p>
<dt>
<strong>no &nbsp;Norwegian</strong><br>
<dd>An initial guess, awaiting feedback.
</dd>
<p>
<dt>
<strong>pl &nbsp;Polish</strong><br>
<dd>Some initial feedback, but I'm told it's difficult to understand, so more work is needed.
</dd>
<p>
<dt>
<strong>ru &nbsp;Russian</strong><br>
<dd>So far it's just an initial attempt with basic pronunciation rules.  Work is needed especially on the consonants.  Russian has two versions of most consonants, "hard" and "soft" (palatalised) and in most cases eSpeak doesn't yet make a proper distinction.<br>
Russian stress position is unpredictable so a large lookup dictionary is needed of those words where eSpeak doesn't guess correctly.  To avoid increasing the size of the basic eSpeak package, this is available separately at: <a href="http://espeak.sourceforge.net/data/">http://espeak.sourceforge.net/data/</a>
</dd>
<p>
<dt>
<strong>vi &nbsp;Vietnamese</strong><br>
<dd>This is interesting because it's a tone language.  I don't know how it should sound, so it's just a guess and I need feedback.
</dd>
<p>
<dt>
<strong>zhy &nbsp;Cantonese Chinese</strong><br>
<dd>Just a naive, simple one-to-one translation from single Simplified Chinese characters to phonetic equivalents in Cantonese.  There is only a limited attempt at disambiguation, grouping characters into words, or adjusting tones according to their surrounding syllables.  This voice needs Chinese character to phonetic translation data, which is available as a separate download for Cantonese at:  <a href="http://espeak.sourceforge.net/data/">http://espeak.sourceforge.net/data/</a>.<br>The voice can also read Jyutping romanised text.
</dd>
</dl></ul>

<h3>3.6 Mbrola Voices</h3>
Some additional voices, whose names start with <b>mb-</b> (for example <b>mb-en1</b>), use eSpeak as a front-end to Mbrola diphone voices.  eSpeak does the spelling-to-phoneme translation and intonation.
See <a href="mbrola.html">mbrola.html</a>.
<p>

</body>
</html>