/doc/ICTCLAS_Diary/2006-03-08.rtf

  1. {\rtf1\ansi\ansicpg1252\uc2\deff0\stshfdbch13\stshfloch0\stshfhich0\stshfbi0\deflang1033\deflangfe2052{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}
  2. {\f3\froman\fcharset2\fprq2{\*\panose 05050102010706020507}Symbol;}{\f13\fnil\fcharset134\fprq2{\*\panose 02010600030101010101}SimSun{\*\falt \'cb\'ce\'cc\'e5};}{\f37\fnil\fcharset134\fprq2{\*\panose 02010600030101010101}@SimSun;}
  3. {\f38\froman\fcharset238\fprq2 Times New Roman CE;}{\f39\froman\fcharset204\fprq2 Times New Roman Cyr;}{\f41\froman\fcharset161\fprq2 Times New Roman Greek;}{\f42\froman\fcharset162\fprq2 Times New Roman Tur;}
  4. {\f43\froman\fcharset177\fprq2 Times New Roman (Hebrew);}{\f44\froman\fcharset178\fprq2 Times New Roman (Arabic);}{\f45\froman\fcharset186\fprq2 Times New Roman Baltic;}{\f46\froman\fcharset163\fprq2 Times New Roman (Vietnamese);}
  5. {\f48\fswiss\fcharset238\fprq2 Arial CE;}{\f49\fswiss\fcharset204\fprq2 Arial Cyr;}{\f51\fswiss\fcharset161\fprq2 Arial Greek;}{\f52\fswiss\fcharset162\fprq2 Arial Tur;}{\f53\fswiss\fcharset177\fprq2 Arial (Hebrew);}
  6. {\f54\fswiss\fcharset178\fprq2 Arial (Arabic);}{\f55\fswiss\fcharset186\fprq2 Arial Baltic;}{\f56\fswiss\fcharset163\fprq2 Arial (Vietnamese);}{\f170\fnil\fcharset0\fprq2 SimSun Western{\*\falt \'cb\'ce\'cc\'e5};}
  7. {\f410\fnil\fcharset0\fprq2 @SimSun Western;}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;
  8. \red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
  9. \fs24\lang1033\langfe3076\loch\f0\hich\af0\dbch\af13\cgrid\langnp1033\langfenp3076 \snext0 Normal;}{\*\cs10 \additive \ssemihidden Default Paragraph Font;}{\*
  10. \ts11\tsrowd\trftsWidthB3\trpaddl108\trpaddr108\trpaddfl3\trpaddft3\trpaddfb3\trpaddfr3\trcbpat1\trcfpat1\tscellwidthfts0\tsvertalt\tsbrdrt\tsbrdrl\tsbrdrb\tsbrdrr\tsbrdrdgl\tsbrdrdgr\tsbrdrh\tsbrdrv
  11. \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs20\lang1024\langfe1024\cgrid\langnp1024\langfenp1024 \snext11 \ssemihidden Normal Table;}}{\*\latentstyles\lsdstimax156\lsdlockeddef0}{\*\listtable{\list\listtemplateid1034585289
  12. \listsimple{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat0\levelspace0\levelindent0{\leveltext\'01_;}{\levelnumbers;}\f3\fs20 \fi-240\li240\jclisttab\tx390\lin240 }{\listname ;}\listid320372207}
  13. {\list\listtemplateid1000771315\listsimple{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'00.;}{\levelnumbers\'01;}\f1\fs20 \fi-240\li240\jclisttab\tx390\lin240 }{\listname
  14. ;}\listid529684673}{\list\listtemplateid1773126601\listsimple{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'00.;}{\levelnumbers\'01;}\f1\fs20 \fi-240\li240\jclisttab\tx390\lin240 }
  15. {\listname ;}\listid599211252}}{\*\listoverridetable{\listoverride\listid529684673\listoverridecount0\ls1}{\listoverride\listid599211252\listoverridecount0\ls2}{\listoverride\listid320372207\listoverridecount0\ls3}}{\*\rsidtbl \rsid870687}{\*\generator Mi
  16. crosoft Word 11.0.6359;}{\info{\author SEEM}{\operator SEEM}{\creatim\yr2006\mo4\dy13\hr22\min57}{\revtim\yr2006\mo4\dy13\hr22\min58}{\version2}{\edmins1}{\nofpages1}{\nofwords186}{\nofchars1064}{\*\company CUHK}{\nofcharsws1248}{\vern24703}}
  17. \paperw12240\paperh15840\margl1800\margr1800\margt1440\margb1440\gutter0 \widowctrl\ftnbj\aenddoc\noxlattoyen\expshrtn\noultrlspc\dntblnsbdb\nospaceforul\hyphcaps0\horzdoc\dghspace120\dgvspace120\dghorigin1701\dgvorigin1984\dghshow0\dgvshow3
  18. \jcompress\viewkind1\viewscale100\nolnhtadjtbl\rsidroot870687 \fet0\sectd \linex0\sectdefaultcl\sftnbj {\*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang {\pntxta \hich .}}{\*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang {\pntxta \hich .}}{\*\pnseclvl3
  19. \pndec\pnstart1\pnindent720\pnhang {\pntxta \hich .}}{\*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang {\pntxta \hich )}}{\*\pnseclvl5\pndec\pnstart1\pnindent720\pnhang {\pntxtb \hich (}{\pntxta \hich )}}{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang
  20. {\pntxtb \hich (}{\pntxta \hich )}}{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang {\pntxtb \hich (}{\pntxta \hich )}}{\*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang {\pntxtb \hich (}{\pntxta \hich )}}{\*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang
  21. {\pntxtb \hich (}{\pntxta \hich )}}\pard\plain \ql \li0\ri0\nowidctlpar\faauto\rin0\lin0\itap0 \fs24\lang1033\langfe3076\loch\af0\hich\af0\dbch\af13\cgrid\langnp1033\langfenp3076 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687
  22. \hich\af1\dbch\af13\loch\f1 Tasks to do after meeting:
  23. \par {\pntext\pard\plain\f1\fs20\lang0\langfe2052\langnp0 \hich\af1\dbch\af13\loch\f1 1.\tab}}\pard \ql \fi-240\li240\ri0\nowidctlpar\jclisttab\tx390{\*\pn \pnlvlbody\ilvl0\ls1\pnrnot0\pndec\pnf1\pnfs20\pnstart1\pnindent360\pnsp120\pnhang {\pntxta \hich .}}
  24. \faauto\ls1\rin0\lin240\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 \hich\af1\dbch\af13\loch\f1 Check what is inside the trained model (e.g. transition probabilities, emission probabilities).}{
  25. \f1\lang0\langfe2052\langnp0\langfenp2052\insrsid870687
  26. \par }\pard \ql \li0\ri0\nowidctlpar\faauto\rin0\lin0\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687
  27. \par
  28. \par \hich\af1\dbch\af13\loch\f1 Discoveries today:
  29. \par {\pntext\pard\plain\f1\fs20\lang0\langfe2052\langnp0 \hich\af1\dbch\af13\loch\f1 1.\tab}}\pard \ql \fi-240\li240\ri0\nowidctlpar\jclisttab\tx390{\*\pn \pnlvlbody\ilvl0\ls2\pnrnot0\pndec\pnf1\pnfs20\pnstart1\pnindent360\pnsp120\pnhang {\pntxta \hich .}}
  30. \faauto\ls2\rin0\lin240\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 \hich\af1\dbch\af13\loch\f1 The 2nd HMM, i.e. class-based segmentation, really is an HMM. The shortest path \hich\af1\dbch\af13\loch\f1 
  31. is wanted. However, the Viterbi algorithm does not need to be applied (and cannot be applied) in this HMM. Dijkstra's algorithm is sufficient because the whole class-based word segmentation graph has already been constructed, so the possible classes are already known. (See Figure 3 of the paper "Chinese Lexical
  32. \hich\af1\dbch\af13\loch\f1  Analysis Using HHMM".) }{\f1\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 
  33. \par {\pntext\pard\plain\f1\fs20\lang0\langfe2052\langnp0 \hich\af1\dbch\af13\loch\f1 2.\tab}}\pard \ql \fi-240\li240\ri0\nowidctlpar\jclisttab\tx390{\*\pn \pnlvlbody\ilvl0\ls2\pnrnot0\pndec\pnf1\pnfs20\pnstart1\pnindent360\pnsp120\pnhang {\pntxta \hich .}}
  34. \faauto\ls2\rin0\lin240\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 \hich\af1\dbch\af13\loch\f1 Is Atom Segment an HMM? Why does the paper say that it is the 5th level of the HMM? In theory, there is no need to use an HMM for it.}{
  35. \f1\lang0\langfe2052\langnp0\langfenp2052\insrsid870687
  36. \par {\pntext\pard\plain\f1\fs20\lang0\langfe2052\langnp0 \hich\af1\dbch\af13\loch\f1 3.\tab}}\pard \ql \fi-240\li240\ri0\nowidctlpar\jclisttab\tx390{\*\pn \pnlvlbody\ilvl0\ls2\pnrnot0\pndec\pnf1\pnfs20\pnstart1\pnindent360\pnsp120\pnhang {\pntxta \hich .}}
  37. \faauto\ls2\rin0\lin240\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 \hich\af1\dbch\af13\loch\f1 CSegment::BiOptimumSegment is the method called after unknown word recognition and before POSTagging, so
  38. \hich\af1\dbch\af13\loch\f1  it is probably the class-based word segmentation.}{\f1\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 
  39. \par }\pard \ql \li0\ri0\nowidctlpar\faauto\rin0\lin0\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687
  40. \par \hich\af1\dbch\af13\loch\f1 CSegGraph::GenerateWordNet
  41. \par {\pntext\pard\plain\f3\fs20\lang0\langfe2052\langnp0 \hich\af3\dbch\af13\loch\f3 _\tab}}\pard \ql \fi-240\li240\ri0\nowidctlpar\jclisttab\tx390{\*\pn \pnlvlblt\ilvl0\ls3\pnrnot0\pnf3\pnfs20\pnindent360\pnsp120\pnhang {\pntxtb \hich _}}
  42. \faauto\ls3\rin0\lin240\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 \hich\af1\dbch\af13\loch\f1
  43. Generates a simple segmentation graph (a word net) by listing all the possible words (those found in the dictionary) while scanning the whole input sentence. The result is stored in the \hich\af1\dbch\af13\loch\f1 
  44. member variable m_segGraph, a sparse transition matrix in which each entry records a word and its frequency.}{\f1\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 
  45. \par {\pntext\pard\plain\f3\fs20\lang0\langfe2052\langnp0 \hich\af3\dbch\af13\loch\f3 _\tab}}\pard \ql \fi-240\li240\ri0\nowidctlpar\jclisttab\tx390{\*\pn \pnlvlblt\ilvl0\ls3\pnrnot0\pnf3\pnfs20\pnindent360\pnsp120\pnhang {\pntxtb \hich _}}
  46. \faauto\ls3\rin0\lin240\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 \hich\af1\dbch\af13\loch\f1 The above is done if the input boolean parameter "bOriginalFreq" is TRUE. It is still unclear what the program does if
  47. \hich\af1\dbch\af13\loch\f1  "bOriginalFreq" is FALSE.}{\f1\lang0\langfe2052\langnp0\langfenp2052\insrsid870687 
  48. \par }\pard \ql \li0\ri0\nowidctlpar\faauto\rin0\lin0\itap0 {\f1\fs20\lang0\langfe2052\langnp0\langfenp2052\insrsid870687
  49. \par
  50. \par
  51. \par
  52. \par }}
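
The two ideas in the notes above (building a word net from dictionary matches, then taking the shortest path through it with Dijkstra's algorithm) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ICTCLAS implementation: the function names `generate_word_net` and `shortest_segmentation`, the toy dictionary, the single-character fallback edge, and the unigram cost `-log(freq / total)` are all hypothetical. The class-based model described in the paper uses different (bigram, class-based) weights.

```python
import heapq
import math


def generate_word_net(sentence, dictionary):
    """Build a sparse word graph: edges[i] holds (j, word, freq) for each
    dictionary word equal to sentence[i:j]. (Hypothetical helper.)"""
    n = len(sentence)
    edges = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n + 1):
            word = sentence[i:j]
            if word in dictionary:
                edges[i].append((j, word, dictionary[word]))
        # Assumption: add a single-character edge with frequency 1 when the
        # dictionary has none, so every position stays reachable.
        if not any(j == i + 1 for j, _, _ in edges[i]):
            edges[i].append((i + 1, sentence[i], 1))
    return edges


def shortest_segmentation(sentence, dictionary):
    """Run Dijkstra over the word net; edge cost is -log(freq / total),
    a unigram stand-in for the model's real transition weights."""
    n = len(sentence)
    edges = generate_word_net(sentence, dictionary)
    total = sum(dictionary.values()) + 1  # keeps every cost strictly positive
    dist = {0: 0.0}
    prev = {}
    heap = [(0.0, 0)]
    while heap:
        d, i = heapq.heappop(heap)
        if i == n:
            break  # the end of the sentence has its final distance
        if d > dist.get(i, math.inf):
            continue  # stale heap entry
        for j, word, freq in edges[i]:
            nd = d - math.log(freq / total)
            if nd < dist.get(j, math.inf):
                dist[j] = nd
                prev[j] = (i, word)
                heapq.heappush(heap, (nd, j))
    # Walk the predecessor links back from the end of the sentence.
    words = []
    pos = n
    while pos > 0:
        pos, word = prev[pos]
        words.append(word)
    return words[::-1]


if __name__ == "__main__":
    toy_dictionary = {"研究": 10, "研究生": 4, "生命": 8, "命": 1, "起源": 6}
    print(shortest_segmentation("研究生命起源", toy_dictionary))
    # prints ['研究', '生命', '起源']
```

Because every candidate word is already an edge in the graph before the search starts, a plain shortest-path search over positions is enough here, which matches the observation in the notes that the possible classes are known in advance.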