/doc/ICTCLAS_Diary/2006-03-08.rtf

http://ictclas4j.googlecode.com/
{\rtf1\ansi\ansicpg1252\uc2\deff0\stshfdbch13\stshfloch0\stshfhich0\stshfbi0\deflang1033\deflangfe2052{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}
{\f3\froman\fcharset2\fprq2{\*\panose 05050102010706020507}Symbol;}{\f13\fnil\fcharset134\fprq2{\*\panose 02010600030101010101}SimSun{\*\falt \'cb\'ce\'cc\'e5};}{\f37\fnil\fcharset134\fprq2{\*\panose 02010600030101010101}@SimSun;}
{\f38\froman\fcharset238\fprq2 Times New Roman CE;}{\f39\froman\fcharset204\fprq2 Times New Roman Cyr;}{\f41\froman\fcharset161\fprq2 Times New Roman Greek;}{\f42\froman\fcharset162\fprq2 Times New Roman Tur;}
{\f43\froman\fcharset177\fprq2 Times New Roman (Hebrew);}{\f44\froman\fcharset178\fprq2 Times New Roman (Arabic);}{\f45\froman\fcharset186\fprq2 Times New Roman Baltic;}{\f46\froman\fcharset163\fprq2 Times New Roman (Vietnamese);}
{\f48\fswiss\fcharset238\fprq2 Arial CE;}{\f49\fswiss\fcharset204\fprq2 Arial Cyr;}{\f51\fswiss\fcharset161\fprq2 Arial Greek;}{\f52\fswiss\fcharset162\fprq2 Arial Tur;}{\f53\fswiss\fcharset177\fprq2 Arial (Hebrew);}
{\f54\fswiss\fcharset178\fprq2 Arial (Arabic);}{\f55\fswiss\fcharset186\fprq2 Arial Baltic;}{\f56\fswiss\fcharset163\fprq2 Arial (Vietnamese);}{\f170\fnil\fcharset0\fprq2 SimSun Western{\*\falt \'cb\'ce\'cc\'e5};}
{\f410\fnil\fcharset0\fprq2 @SimSun Western;}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;
\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 
\fs24\lang1033\langfe3076\loch\f0\hich\af0\dbch\af13\cgrid\langnp1033\langfenp3076 \snext0 Normal;}{\*\cs10 \additive \ssemihidden Default Paragraph Font;}{\*
\ts11\tsrowd\trftsWidthB3\trpaddl108\trpaddr108\trpaddfl3\trpaddft3\trpaddfb3\trpaddfr3\trcbpat1\trcfpat1\tscellwidthfts0\tsvertalt\tsbrdrt\tsbrdrl\tsbrdrb\tsbrdrr\tsbrdrdgl\tsbrdrdgr\tsbrdrh\tsbrdrv 
\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs20\lang1024\langfe1024\cgrid\langnp1024\langfenp1024 \snext11 \ssemihidden Normal Table;}}{\*\latentstyles\lsdstimax156\lsdlockeddef0}{\*\listtable{\list\listtemplateid1034585289
\listsimple{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat0\levelspace0\levelindent0{\leveltext\'01_;}{\levelnumbers;}\f3\fs20 \fi-240\li240\jclisttab\tx390\lin240 }{\listname ;}\listid320372207}
{\list\listtemplateid1000771315\listsimple{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'00.;}{\levelnumbers\'01;}\f1\fs20 \fi-240\li240\jclisttab\tx390\lin240 }{\listname 
;}\listid529684673}{\list\listtemplateid1773126601\listsimple{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'00.;}{\levelnumbers\'01;}\f1\fs20 \fi-240\li240\jclisttab\tx390\lin240 }
{\listname ;}\listid599211252}}{\*\listoverridetable{\listoverride\listid529684673\listoverridecount0\ls1}{\listoverride\listid599211252\listoverridecount0\ls2}{\listoverride\listid320372207\listoverridecount0\ls3}}{\*\rsidtbl \rsid870687}
{\*\generator Microsoft Word 11.0.6359;}{\info{\author SEEM}{\operator SEEM}{\creatim\yr2006\mo4\dy13\hr22\min57}{\revtim\yr2006\mo4\dy13\hr22\min58}{\version2}{\edmins1}{\nofpages1}{\nofwords186}{\nofchars1064}{\*\company CUHK}{\nofcharsws1248}{\vern24703}}
\paperw12240\paperh15840\margl1800\margr1800\margt1440\margb1440\gutter0 \widowctrl\ftnbj\aenddoc\noxlattoyen\expshrtn\noultrlspc\dntblnsbdb\nospaceforul\hyphcaps0\horzdoc\dghspace120\dgvspace120\dghorigin1701\dgvorigin1984\dghshow0\dgvshow3
\jcompress\viewkind1\viewscale100\nolnhtadjtbl\rsidroot870687 \fet0\sectd \linex0\sectdefaultcl\sftnbj {\*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang {\pntxta \hich .}}{\*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang {\pntxta \hich .}}{\*\pnseclvl3
\pndec\pnstart1\pnindent720\pnhang {\pntxta \hich .}}{\*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang {\pntxta \hich )}}{\*\pnseclvl5\pndec\pnstart1\pnindent720\pnhang {\pntxtb \hich (}{\pntxta \hich )}}{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang 
{\pntxtb \hich (}{\pntxta \hich )}}{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang {\pntxtb \hich (}{\pntxta \hich )}}{\*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang {\pntxtb \hich (}{\pntxta \hich )}}{\*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang 
{\pntxtb \hich (}{\pntxta \hich )}}\pard\plain \ql \li0\ri0\nowidctlpar\faauto\rin0\lin0\itap0 \fs24\lang1033\langfe3076\loch\af0\hich\af0\dbch\af13\cgrid\langnp1033\langfenp3076 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 
\hich\af1\dbch\af13\loch\f1 Tasks to do after meeting:
\par {\pntext\pard\plain\f1\fs20\lang0\langfe2052\langnp0 \hich\af1\dbch\af13\loch\f1 1.\tab}}\pard \ql \fi-240\li240\ri0\nowidctlpar\jclisttab\tx390{\*\pn \pnlvlbody\ilvl0\ls1\pnrnot0\pndec\pnf1\pnfs20\pnstart1\pnindent360\pnsp120\pnhang {\pntxta \hich .}}
\faauto\ls1\rin0\lin240\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 \hich\af1\dbch\af13\loch\f1 Check what is inside the trained model (e.g. transition probabilities and emission probabilities).}{
\f1\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 
\par }\pard \ql \li0\ri0\nowidctlpar\faauto\rin0\lin0\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 
\par 
\par \hich\af1\dbch\af13\loch\f1 Discovery today:
\par {\pntext\pard\plain\f1\fs20\lang0\langfe2052\langnp0 \hich\af1\dbch\af13\loch\f1 1.\tab}}\pard \ql \fi-240\li240\ri0\nowidctlpar\jclisttab\tx390{\*\pn \pnlvlbody\ilvl0\ls2\pnrnot0\pndec\pnf1\pnfs20\pnstart1\pnindent360\pnsp120\pnhang {\pntxta \hich .}}
\faauto\ls2\rin0\lin240\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 \hich\af1\dbch\af13\loch\f1 The 2nd HMM, i.e. class-based segmentation, really is an HMM, and the shortest path through it is what we want. However, the Viterbi algorithm is not needed (and cannot be applied) in this HMM. Dijkstra's algorithm is sufficient because the whole class-based word segmentation has already been constructed, so the possible classes are already known. (See Figure 3 of the paper "Chinese Lexical Analysis Using HHMM".) }{\f1\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 
\par {\pntext\pard\plain\f1\fs20\lang0\langfe2052\langnp0 \hich\af1\dbch\af13\loch\f1 2.\tab}}\pard \ql \fi-240\li240\ri0\nowidctlpar\jclisttab\tx390{\*\pn \pnlvlbody\ilvl0\ls2\pnrnot0\pndec\pnf1\pnfs20\pnstart1\pnindent360\pnsp120\pnhang {\pntxta \hich .}}
\faauto\ls2\rin0\lin240\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 \hich\af1\dbch\af13\loch\f1 Is Atom Segment an HMM? Why does the paper say that it is the 5th level of the HMM? In theory, there is no need to use an HMM for it.}{
\f1\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 
\par {\pntext\pard\plain\f1\fs20\lang0\langfe2052\langnp0 \hich\af1\dbch\af13\loch\f1 3.\tab}}\pard \ql \fi-240\li240\ri0\nowidctlpar\jclisttab\tx390{\*\pn \pnlvlbody\ilvl0\ls2\pnrnot0\pndec\pnf1\pnfs20\pnstart1\pnindent360\pnsp120\pnhang {\pntxta \hich .}}
\faauto\ls2\rin0\lin240\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 \hich\af1\dbch\af13\loch\f1 CSegment::BiOptimumSegment is the method called after unknown word recognition and before POSTagging, so it is probably the class-based word segmentation.}{\f1\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 
\par }\pard \ql \li0\ri0\nowidctlpar\faauto\rin0\lin0\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 
\par \hich\af1\dbch\af13\loch\f1 CSegGraph::GenerateWordNet
\par {\pntext\pard\plain\f3\fs20\lang0\langfe2052\langnp0 \hich\af3\dbch\af13\loch\f3 _\tab}}\pard \ql \fi-240\li240\ri0\nowidctlpar\jclisttab\tx390{\*\pn \pnlvlblt\ilvl0\ls3\pnrnot0\pnf3\pnfs20\pnindent360\pnsp120\pnhang {\pntxtb \hich _}}
\faauto\ls3\rin0\lin240\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 \hich\af1\dbch\af13\loch\f1 Generates a simple segmentation graph (a word net) by scanning the whole input sentence and listing all the possible words found in the dictionary. The result is stored in the member variable m_segGraph, which is a sparse transition matrix; each entry records a word and its frequency.}{\f1\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 
\par {\pntext\pard\plain\f3\fs20\lang0\langfe2052\langnp0 \hich\af3\dbch\af13\loch\f3 _\tab}}\pard \ql \fi-240\li240\ri0\nowidctlpar\jclisttab\tx390{\*\pn \pnlvlblt\ilvl0\ls3\pnrnot0\pnf3\pnfs20\pnindent360\pnsp120\pnhang {\pntxtb \hich _}}
\faauto\ls3\rin0\lin240\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 \hich\af1\dbch\af13\loch\f1 The above is done when the input boolean parameter "bOriginalFreq" is TRUE. It is still unclear what the program does when "bOriginalFreq" is FALSE.}{\f1\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 
\par }\pard \ql \li0\ri0\nowidctlpar\faauto\rin0\lin0\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 
\par 
\par 
\par 
\par }}
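
The notes on CSegGraph::GenerateWordNet and on using Dijkstra for the shortest path can be illustrated with a small sketch. This is not the ICTCLAS code. The function names `generate_word_net` and `shortest_segmentation`, the plain-dict dictionary, the add-one smoothing, and the negative-log-frequency edge cost are all assumptions made for illustration. The graph is kept as a sparse mapping from (start, end) character positions to a (word, frequency) pair, in the spirit of what the diary says about m_segGraph.

```python
import heapq
import math


def generate_word_net(sentence, dictionary):
    """Build a sparse word net: (start, end) -> (word, frequency).

    `dictionary` is a hypothetical word -> frequency mapping. Single
    characters are always added (frequency 0 if unknown) so that every
    position stays reachable.
    """
    edges = {}
    n = len(sentence)
    for start in range(n):
        char = sentence[start]
        edges[(start, start + 1)] = (char, dictionary.get(char, 0))
        for end in range(start + 2, n + 1):
            word = sentence[start:end]
            if word in dictionary:
                edges[(start, end)] = (word, dictionary[word])
    return edges


def shortest_segmentation(sentence, dictionary):
    """Return the lowest-cost segmentation found by Dijkstra's algorithm.

    Assumed edge cost: -log((freq + 1) / (total + 1)), which is never
    negative, as Dijkstra requires.
    """
    n = len(sentence)
    total = sum(dictionary.values())
    outgoing = [[] for _ in range(n + 1)]
    for (start, end), (word, freq) in generate_word_net(sentence, dictionary).items():
        cost = -math.log((freq + 1) / (total + 1))
        outgoing[start].append((end, word, cost))

    dist = [math.inf] * (n + 1)
    prev = [None] * (n + 1)
    dist[0] = 0.0
    heap = [(0.0, 0)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for end, word, cost in outgoing[node]:
            if d + cost < dist[end]:
                dist[end] = d + cost
                prev[end] = (node, word)
                heapq.heappush(heap, (dist[end], end))

    # Walk back from the end of the sentence to recover the words.
    words = []
    node = n
    while node > 0:
        node, word = prev[node]
        words.append(word)
    return words[::-1]


if __name__ == "__main__":
    toy_dictionary = {"ab": 10, "c": 5, "a": 1, "b": 1, "bc": 1}
    print(shortest_segmentation("abc", toy_dictionary))
```

Because every edge in a word net points forward in the sentence, the graph is acyclic, so a simple left-to-right dynamic program would find the same path. Dijkstra is used here only because the diary names it.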