  1. % MMMMMMMMM
  2. %
  3. % MMA MM MMMMMM MMMMMMM MM MMMMMMMM MMA MM MMMMMMM MMMMMMM
  4. % MMMA AMMM MM MM MMMM MMMM MM MM MM
  5. % MM MMM MM MMMMMM MM IM MI MMMMMMM MM MMxMM MMMMMM MM
  6. % MM M MM MM MM .MMMMMM. MM MMMM MM MM
  7. % MM MM MMMMMM MM MM MM MM MMM MMMMMMM MM
  8. %
  9. %
  10. % - META-NET Language White Paper | Maltese content -
  11. %
  12. % ----------------------------------------------------------------------------
  13. \usepackage{covington}
  14. \begin{document}
  15. \maketitle
  16. % --------------------------------------------------------------------------
  17. \bsection*{Daħla --- Preface}
  18. \null
  19. \pagestyle{empty}
  20. \pagenumbering{Roman}
  21. \setcounter{page}{3}
  22. \pagestyle{scrheadings}
  23. \begin{Parallel}[c]{78mm}{78mm}
  24. \ParallelLText{\selectlanguage{maltese}%
  25. Din il-White Paper hija għall-edukaturi, ġurnalisti, politikanti, komunitajiet ta' lingwi u oħrajn, li jixtiequ jistabbilixxu Ewropa tabilħaqq multilingwali.
  26. Din hija parti minn serje White Papers li jippromwovu għarfien dwar it-teknoloġija lingwistika u l-potenzjal tagħha. Id-disponibiltà u l-użu tat-teknoloġija lingwistika fl-Ewropa tvarja bejn lingwa u oħra. Konsegwentement, l-azzjonijiet li huma meħtieġa sabiex jiġu appoġġjati r-riċerka u l-iżvilupp tat-teknoloġiji lingwistiċi ivarjaw ukoll f'kull lingwa. L-azzjonijiet meħtieġa jiddependu fuq bosta fatturi, bħall-kumplessità ta' lingwa partikolari u d-daqs tal-komunità tagħha.
  27. META-NET, Netwerk ta' Eċċellenza tal-Kummissjoni Ewropea, wettaq analiżi dwar ir-riżorsi u t-teknoloġiji lingwistiċi kurrenti f'din is-serje ta' \emph{white papers} (p.~\pageref{whitepaperseries}). Din l-analiżi kienet ibbażata fuq 23 lingwa uffiċjali Ewropea, kif ukoll lingwi reġjonali oħra importanti fl-Ewropa. Ir-riżultati ta' din l-analiżi jissuġġerixxu li hemm bosta nuqqasijiet fir-riċerka għal kull lingwa. Analiżi aktar dettaljata u esperta u assessjar tas-sitwazzjoni kurrenti għandha tgħin sabiex timmassimizza l-impatt ta' riċerka addizzjonali.
  28. Minn Novembru 2011 META-NET tikkonsisti f'54 ċentru ta' riċerka minn 33 pajjiż \cite{rehm2011} (p.~\pageref{metanetmembers}) li qed jaħdmu ma' partijiet interessati minn negozji kummerċjali, aġenziji governattivi, industriji, organizzazzjonijiet ta' riċerka, kumpaniji ta' software, fornituri ta' teknoloġija u universitajiet Ewropej. Flimkien, dawn qed joħolqu viżjoni teknoloġika komuni filwaqt li jiżviluppaw aġenda ta' riċerka strateġika li turi kif applikazzjonijiet teknoloġiċi lingwistiċi jistgħu jindirizzaw xi nuqqasijiet ta' riċerka sal-2020.}%
  29. \ParallelRText{\selectlanguage{english}%
  30. \vspace{-2mm}%
  31. This white paper is part of a series that promotes knowledge about language technology and its potential. It addresses journalists, politicians, language communities, educators and others.
  32. The availability and use of language technology in Europe vary between languages. Consequently, the actions that are required to further support research and development of language technologies also differ. The required actions depend on many factors, such as the complexity of a given language and the size of its community.
  33. META-NET, a Network of Excellence funded by the European Commission, has conducted an analysis of current language resources and technologies in this white paper series (p.~\pageref{whitepaperseries}). The analysis focused on the 23 official European languages as well as other important national and regional languages in Europe. The results of this analysis suggest that there are tremendous deficits in technology support and significant research gaps for each language. The detailed expert analysis and assessment of the current situation given here will help maximise the impact of additional research.
  34. META-NET consists of 54 research centres from 33 European countries \cite{rehm2011} (p.~\pageref{metanetmembers}). META-NET is working with stakeholders from the economy (software companies, technology providers, users), government agencies, research organisations, non-governmental organisations, language communities and European universities. Together with these communities, META-NET is creating a common technology vision and strategic research agenda for multilingual Europe 2020.}%\ParallelPar
  35. \end{Parallel}
  36. % --------------------------------------------------------------------------
  37. \makefundingnotice
  38. \cleardoublepage
  39. \bsection*{Werrej --- Contents}
  40. \renewcommand\contentsname{}
  41. \tableofcontents
  42. \addtocontents{toc}{\protect\thispagestyle{empty}\protect}
  43. \addtocontents{toc}{{\Large\textsf{\centerline{IL-LINGWA MALTIJA FL-ERA DIĠITALI}}\par}}
  44. % --------------------------------------------------------------------------
  45. \cleardoublepage
  46. \setcounter{page}{1}
  47. \pagenumbering{arabic}
  48. \pagestyle{scrheadings}
  49. \ssection[Sommarju Eżekuttiv]{Sommarju Eżekuttiv}
  50. \selectlanguage{maltese}
  51. \begin{multicols}{2}
  52. Matul dawn l-aħħar sittin sena, l-Ewropa saret struttura politika u ekonomika distinta, iżda kulturalment u lingwistikament għadha diversa ħafna. Dan ifisser li mill-Portugiż għall-Pollakk u t-Taljan għall-Islandiż, ta' kuljum il-komunikazzjoni bejn iċ-ċittadini tal-Ewropa kif ukoll il-komunikazzjoni fl-oqsma tan-negozju u l-politika hija inevitabbilment ikkonfrontata minn ostakli lingwistiċi. L-istituzzjonijiet tal-UE jonfqu madwar biljun euro fis-sena fuq iż-żamma tal-politika tagħhom tal-multilingwiżmu, jiġifieri, it-traduzzjoni ta' testi u l-interpretar tal-komunikazzjoni mitkellma. Madankollu, għandu dan ikun ta' daqshekk piż? It-teknoloġija moderna tal-lingwi u r-riċerka lingwistika jistgħu jkunu ta' kontribuzzjoni sinifikanti biex jitwaqqgħu dawn l-ostakli lingwistiċi. Meta kkombinata mal-apparati u l-applikazzjonijiet intelliġenti, it-teknoloġija lingwistika fil-futur tkun tista' tgħin lill-Ewropej jitkellmu ma' xulxin b'mod faċli u jagħmlu n-negozju ma' xulxin anke jekk ma jkunux jitkellmu lingwa komuni.
  53. \boxtext{It-teknoloġija lingwistika tibni rabtiet.}
  54. L-ostakli tal-lingwa jistgħu jwaqqfu n-negozju, b'mod speċjali għall-SMEs li m'għandhomx il-mezzi finanzjarji biex ireġġgħu lura s-sitwazzjoni. L-unika alternattiva (inkonċepibbli) għal dan it-tip ta' Ewropa multilingwali tkun li tippermetti lingwa waħda biex tieħu pożizzjoni dominanti u tispiċċa tissostitwixxi kull lingwa oħra.
  55. Mod klassiku ta' kif tista' tegħleb l-ostaklu tal-lingwa huwa li titgħallem il-lingwi barranin. Iżda mingħajr appoġġ teknoloġiku, li tkun taf it-23 lingwa uffiċjali tal-Istati Membri tal-Unjoni Ewropea u xi sittin lingwa Ewropea oħra huwa ostaklu insormontabbli għaċ-ċittadini tal-Ewropa u l-ekonomija, id-dibattitu politiku, u l-progress xjentifiku tagħha.
  56. Is-soluzzjoni hija li jinbnew teknoloġiji ewlenin li jippermettu dan. Dawn ikunu joffru lill-atturi Ewropej vantaġġi tremendi, mhux biss fis-suq komuni Ewropew iżda wkoll f'relazzjonijiet ta' kummerċ ma' pajjiżi terzi, b'mod speċjali l-ekonomiji emerġenti. Biex jintlaħaq dan il-għan u tiġi preservata d-diversità kulturali u lingwistika tal-Ewropa, huwa neċessarju li l-ewwel issir analiżi sistematika tal-partikolaritajiet tal-lingwi Ewropej kollha u l-istat attwali tal-appoġġ tat-teknoloġija lingwistika li hawn għalihom. Soluzzjonijiet tat-teknoloġiji lingwistiċi se jservu eventwalment bħala rabta unika bejn il-lingwi tal-Ewropa.
  57. L-għodda ta' traduzzjoni awtomatizzata u tal-ipproċessar tad-diskors li huma attwalment disponibbli fis-suq għadhom pjuttost lura minn dan il-għan ambizzjuż. L-atturi dominanti f'dan il-qasam huma primarjament l-intrapriżi b'sidien privati għall-profitt ibbażati fl-Amerka ta' Fuq. Diġà fl-aħħar tas-sebgħinijiet, l-UE rrealizzat ir-relevanza profonda tat-teknoloġija lingwistika bħala sewwieq lejn l-unità Ewropea, u bdiet tiffinanzja l-ewwel proġetti ta' riċerka tagħha, bħall-EUROTRA. Fl-istess żmien, proġetti nazzjonali ġew imwaqqfa u ġġeneraw riżultati ta' valur iżda qatt ma wasslu għal azzjoni Ewropea kkonċertata. F'kuntrast ma' dan l-isforz ta' finanzjament ferm selettiv, soċjetajiet multilingwali oħra bħall-Indja (22 lingwa uffiċjali) u l-Afrika t'Isfel (11-il lingwa uffiċjali) waqqfu reċentement programmi nazzjonali fit-tul għar-riċerka tal-lingwi u l-iżvilupp tat-teknoloġija.
  58. L-atturi predominanti fit-teknoloġija lingwistika llum jiddependu fuq approċċi ta' statistika mhux preċiżi li ma jagħmlux użu minn metodi lingwistiċi u għarfien aktar fondi. Per eżempju, is-sentenzi jiġu awtomatikament tradotti billi tiġi kkumparata sentenza ġdida ma' eluf ta' sentenzi oħrajn li jkunu ġew tradotti qabel minn umani. Il-kwalità tar-riżultat il-biċċa l-kbira tiddependi fuq l-ammont u l-kwalità tal-kampjun tal-korp disponibbli. Filwaqt li t-traduzzjoni awtomatika ta' sentenzi sempliċi fil-lingwi b'ammonti suffiċjenti ta' materjal ta' testi disponibbli tista' tikseb riżultati utli, metodi ta' statistika vojta bħal dawn huma destinati li jfallu fil-każ ta' lingwi b'korp ta' kampjuni tal-materjal ferm iżgħar jew fil-każ ta' sentenzi bi strutturi kumplessi.
  59. \boxtext{It-teknoloġija lingwistika bħala ċavetta għall-futur.}
  60. Għalhekk l-Unjoni Ewropea ddeċidiet li tiffinanzja proġetti bħall-EuroMatrix u l-EuroMatrixPlus (mill-2006) u iTranslate4 (mill-2010) li jwettqu riċerka bażika u applikata u jiġġeneraw riżorsi biex jiġu stabbiliti soluzzjonijiet għat-teknoloġija lingwistika ta' kwalità għolja għall-lingwi Ewropej kollha. Li tanalizza l-proprjetajiet strutturali aktar fondi tal-lingwi huwa l-uniku pass 'il quddiem jekk irridu nibnu applikazzjonijiet li jaħdmu tajjeb tul il-firxa kollha tal-lingwi tal-Ewropa.
  61. Ir-riċerka Ewropea f'dan il-qasam diġà kisbet numru ta' suċċessi. Per eżempju, is-servizzi ta' traduzzjoni tal-Unjoni Ewropea issa jużaw is-softwer ta' traduzzjoni bil-magni b'sors miftuħ MOSES li ġie prinċipalment żviluppat permezz ta' proġetti ta' riċerka Ewropej. F'Malta, l-oqsma tat-teknoloġija lingwistika l-aktar avvanzati bħalissa huma dawk tas-sinteżi tat-taħdit u l-korpora tat-testi: fil-qasam tas-sinteżi tat-taħdit bil-Malti, proġett appoġġjat mill-Gvern u parzjalment iffinanzjat mill-fond ta' żvilupp reġjonali tal-UE qiegħed fil-proċess li jwassal it-teknoloġija tat-taħdit lill-persuni b'diżabilità. Il-konsorzju, li jikkonsisti f'SME (Crimson Wing Ltd), fondazzjoni (FITA, Fundazzjoni għall-Aċċess tat-TI), u l-Università, wiegħed li dawn ir-riżorsi se jkunu disponibbli għal skopijiet ta' riċerka. Fil-qasam tal-korpora tat-testi, is-Server għar-Riżorsi Lingwistiċi bil-Malti (MLRS) qed jagħti l-frott, u hemm sforzi sinifikanti li għadhom għaddejjin fl-Università, permezz tal-Istitut tal-Lingwistika (A.~Gatt, C.~Borg, R.~Fabri) u d-Dipartiment ta' Sistemi Intelliġenti tal-Kompjuter (M.~Rosner), biex isostnu u jiżviluppaw dan. Bħalissa l-korpus jinkludi madwar 100M kelma, u hemm aktar għodod ippjanati inkluż tagger għall-kategoriji tal-kliem u ċekkjatur ortografiku.
  62. \boxtext{It-teknoloġija lingwistika tgħin\\ biex tgħaqqad lill-Ewropa.}
  63. Jekk inħarsu lejn l-għarfien miksub s'issa, jidher li t-teknoloġija ibrida tal-lingwi tal-lum li tħallat metodi ta' pproċessar fondi ma' dawk ta' statistika se tkun tista' timla l-vojt ta' bejn il-lingwi Ewropej kollha u aktar. Bħalma din is-serje ta' \emph{white papers} turi, hemm differenza drammatika fl-istat ta' prontezza fir-rigward ta' soluzzjonijiet ta' lingwi u l-istat tar-riċerka bejn l-Istati Membri tal-Ewropa. Din il-\emph{white paper} għall-lingwa Maltija turi li hemm il-potenzjal għal industrija tat-teknoloġija lingwistika u ambjent tar-riċerka f'Malta. Iżda għalkemm numru ta' teknoloġiji u riżorsi jeżisti, hemm ħafna anqas minn lingwi Ewropej li huma ``akbar'' u ċertament mhux biżżejjed biex tiġi appoġġjata l-firxa kompleta ta' applikazzjonijiet sensittivi għall-lingwi li huma disponibbli għal dawk il-lingwi l-oħra.
  64. Skont il-valutazzjoni ddettaljata f'dan ir-rapport, il-kisba ta' suċċess fit-teknoloġija tal-lingwa Maltija tirrikjedi ċiklu sħiħ ta' bidliet li jkun jinvolvi fornituri ta' kontenut, żviluppaturi u utenti tat-teknoloġija lingwistika. Xi bidliet fil-politika tal-lingwa nazzjonali jridu jiġu implimentati qabel ma xi suċċessi għall-Lingwa Maltija jkunu jistgħu jinkisbu.
  65. %\columnbreak
  66. L-għan fit-tul tal-META-NET huwa li tiġi introdotta teknoloġija lingwistika bi kwalità għolja għall-lingwi kollha sabiex tinkiseb l-unità politika u ekonomika permezz tad-diversità kulturali. It-teknoloġija tkun tgħin biex tkisser l-ostakli eżistenti u jinbnew ir-rabtiet bejn il-lingwi tal-Ewropa. Dan jirrikjedi l-partijiet interessati kollha fil-politika, ir-riċerka, in-negozju, u s-soċjetà biex jingħaqdu l-isforzi tagħhom fil-futur.
  67. Din is-serje ta' \emph{white papers} tikkumplimenta azzjonijiet strateġiċi oħra meħuda minn META-NET (ara l-appendiċi għal ħarsa ġenerali). Informazzjoni aġġornata bħall-verżjoni attwali tal-\emph{vision paper} \cite{Meta1} tal-META-NET jew l-Aġenda għar-Riċerka Strateġika (SRA) jistgħu jinstabu fuq il-websajt tal-META-NET \url{http://www.meta-net.eu}.
  68. \end{multicols}
  69. \clearpage
  70. % --------------------------------------------------------------------------
  71. \ssection[Riskju għal-Lingwi Tagħna u Sfida għat-Teknoloġija Lingwistika]{Riskju għal-Lingwi Tagħna u Sfida għat-Teknoloġija Lingwistika}
  72. \begin{multicols}{2}
  73. Qed naraw rivoluzzjoni diġitali li qed tħalli impatt b'mod drammatiku fuq il-komunikazzjoni u s-soċjetà. Żviluppi reċenti fit-teknoloġija tal-komunikazzjoni diġitizzata u tan-networks xi drabi jiġu mqabbla mal-invenzjoni tal-istampar ta' Gutenberg. X'tista' tgħidilna din l-analoġija dwar il-futur tas-soċjetà tal-informazzjoni Ewropea u b'mod partikolari l-lingwi tagħna?
  74. \boxtext{Ir-rivoluzzjoni diġitali hija komparabbli mal-invenzjoni tal-istamperija ta' Gutenberg.}
  75. Wara l-invenzjoni ta' Gutenberg, skoperti reali fil-komunikazzjoni u skambju ta' għarfien kienu mwettqa permezz ta' sforzi bħat-traduzzjoni tal-Bibbja ta' Luther għal-lingwa komuni. Fis-sekli sussegwenti, tekniki kulturali ġew żviluppati biex jimmaniġġjaw aħjar l-ipproċessar tal-lingwa u l-iskambju ta' għarfien:
  76. \begin{itemize}
  77. \item l-istandardizzazzjoni ortografika u grammatikali ta' lingwi kbar ippermettiet it-tixrid rapidu ta' ideat xjentifiċi u intellettwali ġodda;
  78. \item l-iżvilupp ta' lingwi uffiċjali għamel possibbli għaċ-ċittadini li jikkomunikaw bejn ċerti konfini (ta' spiss politiċi);
  79. \item it-tagħlim u t-traduzzjoni tal-lingwi ppermettew skambju bejn il-lingwi;
  80. \item il-ħolqien ta' linji gwida ġurnalistiċi u biblijografiċi żguraw il-kwalità u d-disponibilità ta' materjal ipprintjat;
  81. \item il-ħolqien ta' midja differenti bħal gazzetti, radju, televiżjoni, kotba, u formati oħra ssodisfaw ħtiġijiet differenti ta' komunikazzjoni.
  82. \end{itemize}
  83. Fl-aħħar għoxrin sena, it-teknoloġija tal-informatika (TI) għenet biex ħafna mill-proċessi jiġu awtomatizzati u ffaċilitati:
  84. \begin{itemize}
  85. \item is-software ta' desktop publishing jissostitwixxi l-ittajpjar u t-typesetting;
  86. \item il-Microsoft PowerPoint tissostitwixxi t-trasparenzi tal-projectors;
  87. \item il-posta elettronika tibgħat u tirċievi dokumenti aktar malajr minn fax;
  88. \item Skype jagħmel telefonati bl-internet u jospita laqgħat virtwali;
  89. \item il-formati ta' kodifikazzjoni audio u video jagħmluha faċli għal skambju ta' kontenut multimidjali;
  90. \item il-magni ta' tiftix jipprovdu aċċess ibbażat fuq kelma ewlenija għall-paġni web;
  91. \item is-servizzi fuq l-internet bħal Google Translate jipproduċu traduzzjonijiet ta' malajr u approssimattivi;
  92. \item il-pjattaformi tal-midja soċjali jiffaċilitaw il-kollaborazzjoni u l-qsim ta' informazzjoni.
  93. \end{itemize}
  94. Għalkemm għodod u applikazzjonijiet bħal dawn huma utli, bħalissa ma jistgħux jappoġġjaw b'mod suffiċjenti soċjetà tal-informazzjoni multilingwi Ewropea sostenibbli, soċjetà moderna u inklussiva fejn l-informazzjoni u l-merkanzija jistgħu jiċċirkolaw b’mod ħieles.
  95. \subsection[Il-Konfini Lingwistiċi jfixklu s-Soċjetà Ewropea tal-Informazzjoni]{Il-Konfini Lingwistiċi\newline jfixklu s-Soċjetà Ewropea tal-Informazzjoni}
  96. Ma nistgħux inbassru b'mod preċiż kif se tkun is-soċjetà tal-informazzjoni tal-ġejjieni. Madankollu, hemm probabbiltà li r-rivoluzzjoni fit-teknoloġija tal-komunikazzjoni tgħaqqad lin-nies li jitkellmu lingwi differenti b'modi ġodda. Din titfa' pressjoni kemm fuq l-individwi biex jitgħallmu lingwi ġodda kif ukoll, b'mod speċjali, fuq l-iżviluppaturi tas-softwer biex joħolqu applikazzjonijiet ta' teknoloġija ġodda biex jassiguraw ftehim komuni u aċċess għal għarfien kondiviżibbli. F'ekonomija globali u spazju ta' informazzjoni, aktar lingwi, kelliema u kontenut jikkonfrontawna u jirrikjeduna li ninteraġixxu malajr ma' tipi ġodda ta' midja. Il-popolarità kurrenti ta' midja soċjali (Wikipedia, Facebook, Twitter u YouTube) hija biss parti żgħira minn stampa akbar.
  97. %FIXME: Missing box texts! -- Fixed!
  98. \boxtext{L-ekonomija globali u l-ispazju ta'\\ informazzjoni jikkonfrontawna ma' lingwi,\\ kelliema u kontenut differenti.}
  99. Illum, aħna nistgħu nittrażmettu gigabytes ta' test madwar id-dinja fi ftit sekondi qabel nagħrfu li huwa b'lingwa li aħna ma nifhmux. Skont rapport reċenti mitlub mill-Kummissjoni Ewropea, 57\% tal-utenti tal-internet fl-Ewropa jixtru oġġetti u servizzi b'lingwi li mhumiex il-lingwa nattiva tagħhom (l-Ingliż huwa l-ilsien barrani l-aktar komuni, segwit mill-Franċiż, il-Ġermaniż u l-Ispanjol). 55\% tal-utenti jaqraw kontenut f'lingwa barranija filwaqt li 35\% biss jużaw lingwa oħra biex jiktbu ittri elettroniċi jew jibagħtu kummenti fuq il-web \cite{EC1}. Ftit snin ilu, l-Ingliż seta' kien il-\emph{lingua franca} tal-web -- il-maġġoranza l-kbira tal-kontenut fuq il-web kien bl-Ingliż -- iżda s-sitwazzjoni issa nbidlet drastikament. L-ammont ta' kontenut fuq l-internet b'lingwi oħra (partikolarment lingwi tal-Asja u l-Għarbi) kiber f'daqqa waħda.
  100. Firda diġitali li tinsab kullimkien u li hija kkawżata minn konfini lingwistiċi sorprendentement ma kisbitx ħafna attenzjoni fid-diskors pubbliku; madankollu, qajmet kwistjoni urġenti ħafna, ``Liema lingwi Ewropej se jirnexxu u jippersistu fis-soċjetà tal-informazzjoni u l-għarfien, ibbażata fuq in-networks?''
  101. \subsection{Il-Lingwi Tagħna f'Riskju}
  102. L-istampar ikkontribwixxa għal skambju imprezzabbli ta' informazzjoni fl-Ewropa, iżda wassal ukoll għall-estinzjoni ta' bosta lingwi Ewropej. Lingwi reġjonali u dawk f'minoranza rarament ġew stampati. Bħala riżultat ta' dan, ħafna lingwi bħal Cornish jew Dalmatian spiss kienu limitati għal forom orali ta' trasmissjoni, li llimita l-adozzjoni kontinwa, it-tixrid u l-użu tagħhom.
  103. \boxtext{Il-varjetà wiesgħa tal-lingwi fl-Ewropa hija wieħed mill-assi kulturali l-aktar sinjuri u importanti tagħha.}
  104. Il-lingwi tal-Ewropa, madwar 80, huma wieħed mill-assi l-aktar prezzjużi tagħha u l-aktar importanti f'dak li huwa assi kulturali. In-numru kbir ta' lingwi Ewropej huwa wkoll parti vitali mis-suċċess soċjali tagħha \cite{EC2}. Filwaqt li l-lingwi popolari bħall-Ingliż jew l-Ispanjol żgur li se jżommu l-preżenza tagħhom fis-soċjetà diġitali u s-suq li qed jitfaċċaw, bosta lingwi Ewropej jistgħu jinqatgħu mill-komunikazzjonijiet diġitali u jsiru irrilevanti għas-soċjetà tal-internet. Żviluppi bħal dawn żgur li ma jkunux mixtieqa. Minn naħa, tista' tintilef opportunità strateġika, li jista' jdgħajjef il-pożizzjoni globali tal-Ewropa. Min-naħa l-oħra, żviluppi bħal dawn jistgħu jmorru kontra l-għan ta' parteċipazzjoni ugwali għal kull ċittadin Ewropew irrispettivament mil-lingwa. Skont rapport tal-UNESCO dwar il-multilingwiżmu, il-lingwi huma mezz essenzjali għat-tgawdija tad-drittijiet fundamentali, bħall-espressjoni politika, l-edukazzjoni u l-parteċipazzjoni fis-soċjetà \cite{Unesco1}.
  105. \subsection{It-Teknoloġija Lingwistika hija Teknoloġija Katalizzanti Ewlenija}
  106. Fil-passat, l-isforzi tal-investiment iffukaw fuq l-edukazzjoni u t-traduzzjoni tal-lingwi. Pereżempju, skont ċerti stimi, is-suq Ewropew għat-traduzzjoni, l-interpretazzjoni, il-lokalizzazzjoni ta' software, u l-globalizzazzjoni ta' websajts kien ta' EUR 8.4 biljun fl-2008 u kien mistenni li jikber b'10\% fis-sena \cite{EC3}. Madankollu, din il-kapaċità eżistenti mhix biżżejjed biex tissodisfa l-ħtiġijiet kurrenti u futuri.
  107. It-teknoloġija lingwistika hija teknoloġija katalizzanti ewlenija li tista' tħares u trawwem lingwi Ewropej. It-teknoloġija lingwistika tgħin lin-nies jikkollaboraw, iwettqu negozju, jaqsmu l-għarfien u jieħdu sehem f'dibattiti soċjali u politiċi irrispettivament mill-ostakli lingwistiċi jew il-ħiliet tal-kompjuter. It-teknoloġija lingwistika diġà qed tassisti kompiti ta' kuljum, bħal kitba ta' ittri elettroniċi, twettiq ta' tiftix fuq l-internet jew prenotazzjoni ta' titjiriet. Aħna nibbenefikaw minn teknoloġija lingwistika meta:
  108. \begin{itemize}
  109. \item insibu informazzjoni permezz ta' magni ta' tiftix fuq l-internet;
  110. \item niċċekkjaw l-ortografija u l-grammatika fil-word processor;
  111. \item naraw rakkomandazzjonijiet dwar prodotti f'ħanut online;
  112. \item nisimgħu struzzjonijiet verbali ta' sistema ta' navigazzjoni;
  113. \item nittraduċu paġni web permezz ta' servizz fuq l-internet.
  114. \end{itemize}
  115. It-teknoloġiji lingwistiċi deskritti f'dan id-dokument huma parti essenzjali minn applikazzjonijiet futuri innovattivi. It-teknoloġija lingwistika hija teknoloġija li tipikament tippermetti xogħol f'qafas ta' applikazzjoni akbar bħal sistema ta' navigazzjoni jew magna ta' tiftix. Dawn il-White Papers jiffokaw fuq il-prontezza ta' teknoloġiji ewlenin għal kull lingwa.
  116. \boxtext{L-Ewropa teħtieġ teknoloġija robusta u affordabbli għal-lingwi kollha Ewropej.}
  117. Fil-futur qrib, inkunu neħtieġu teknoloġija lingwistika għal-lingwi kollha Ewropej li tkun disponibbli, bi prezz raġonevoli u integrata sewwa f'ambjenti ta' software akbar. Esperjenza interattiva, multimidjali u multilingwi tal-utent mhijiex possibbli mingħajr teknoloġija lingwistika.
  118. \subsection{Opportunitajiet għat-Teknoloġija Lingwistika}
  119. It-teknoloġija lingwistika tista' twettaq traduzzjoni awtomatika, produzzjoni ta' kontenut, ipproċessar ta' informazzjoni u ġestjoni ta' għarfien possibbli għal-lingwi Ewropej kollha. It-teknoloġija lingwistika tista' wkoll tkompli l-iżvilupp ta' interfaces intuwittivi bbażati fuq il-lingwa għal elettronika tad-dar, makkinarju, vetturi, kompjuters u robots. Għalkemm ħafna prototipi diġà jeżistu, l-applikazzjonijiet kummerċjali u industrijali għadhom fi stadji bikrin ta' żvilupp. Kisbiet reċenti fir-riċerka u l-iżvilupp ħolqu tieqa ġenwina ta' opportunità. Pereżempju, it-traduzzjoni awtomatika (TA) diġà qed tagħti ammont raġonevoli ta' preċiżjoni fi ħdan dominji speċifiċi, u applikazzjonijiet esperimentali jipprovdu informazzjoni multilingwi u ġestjoni ta' għarfien kif ukoll produzzjoni ta' kontenut f'ħafna ilsna Ewropej.
  120. Applikazzjonijiet lingwistiċi, interfaces għall-utenti bbażati fuq il-vuċi u sistemi ta' djalogu jinsabu tradizzjonalment f'dominji speċjalizzati ħafna, u ħafna drabi jagħtu prestazzjoni limitata. Qasam wieħed attiv ta' riċerka huwa l-użu tat-teknoloġija lingwistika għal operazzjonijiet ta' salvataġġ f'zoni ta' diżastri. F'ambjenti bħal dawn ta' riskju għoli, l-eżattezza tat-traduzzjoni tista' tkun kwistjoni ta' ħajja jew mewt. L-istess raġunament japplika għall-użu tat-teknoloġija lingwistika fl-industrija tal-kura tas-saħħa. Robots intelliġenti b'kapaċitajiet lingwistiċi u bejn il-lingwi għandhom il-potenzjal li jsalvaw il-ħajjiet.
  121. Hemm opportunitajiet tas-suq enormi fl-edukazzjoni u l-industriji tad-divertiment għall-integrazzjoni ta' teknoloġiji lingwistiċi fil-logħob, offerti ta' edudivertiment, ambjenti ta' simulazzjoni jew programmi ta' taħriġ. Servizzi mobbli ta' informazzjoni, software għat-tagħlim tal-lingwi assistit mill-kompjuter, ambjenti tal-eLearning, għodod għal awtoevalwazzjoni u software għal sejbien ta' plaġjariżmu huma biss ftit eżempji oħra fejn it-teknoloġija lingwistika jista' jkollha rwol importanti. Il-popolarità ta' applikazzjonijiet soċjali tal-midja bħal Twitter u Facebook tissuġġerixxi ħtieġa akbar għal teknoloġiji lingwistiċi sofistikati li jistgħu jissorveljaw posts, iwettqu sommarji ta' diskussjonijiet, jissuġġerixxu tendenzi ta' opinjonijiet, jiskopru reazzjonijiet emozzjonali, jidentifikaw ksur tad-drittijiet tal-awtur jew użu ħażin ta' sistemi ta' kompjuters.
  122. \boxtext{It-teknoloġija lingwistika tgħin biex tegħleb id-diżabilità tad-diversità lingwistika.}
  123. It-teknoloġija lingwistika tirrappreżenta opportunità kbira għall-Unjoni Ewropea li tagħmel sens kemm ekonomikament kif ukoll kulturalment. Il-multilingwiżmu fl-Ewropa sar ir-regola. Negozji, organizzazzjonijiet u skejjel Ewropej huma wkoll multinazzjonali u diversi. Iċ-ċittadini jridu jikkomunikaw bejn il-konfini lingwistiċi li għadhom jeżistu fis-Suq Komuni Ewropew. It-teknoloġija lingwistika tista' tgħin biex jingħelbu dawn l-ostakli li fadal waqt li tappoġġja l-użu ħieles u miftuħ tal-lingwi. Barra minn hekk, teknoloġija lingwistika innovattiva u multilingwi għall-Ewropej tista' wkoll tgħinna nikkomunikaw mal-imsieħba globali tagħna u l-komunitajiet multilingwi tagħhom. It-teknoloġiji lingwistiċi jappoġġjaw l-ammont kbir ta' opportunitajiet ekonomiċi internazzjonali.
  124. \subsection{Sfidi li t-Teknoloġija Lingwistika Taffaċċja}
  125. \boxtext{Il-pass kurrenti tal-progress teknoloġiku\\ huwa bil-mod wisq.}
  126. Għalkemm it-teknoloġija lingwistika għamlet progress konsiderevoli fl-aħħar ftit snin, il-pass preżenti tal-progress teknoloġiku u l-innovazzjoni tal-prodotti miexi bil-mod wisq. Teknoloġiji lingwistiċi b'użu wiesa', bħal karatteristiċi ta' ortografija u grammatika fil-word processors, huma tipikament monolingwi, u huma disponibbli biss għal numru żgħir ta' lingwi. Servizzi ta' traduzzjoni awtomatika fuq l-internet huma eċċellenti fil-ħolqien ta' approssimazzjoni tajba ta' kontenut f'dokument, iżda huma mimlija diffikultajiet varji meta jkunu meħtieġa traduzzjonijiet preċiżi ħafna u kompluti. Minħabba l-kumplessità tal-lingwa umana, li nimmudellaw l-ilsna b'softwer u nittestjawhom fid-dinja vera huwa proċess twil u għoli li jeħtieġ impenn ta' fondi sostnuti. L-Ewropa għandha għalhekk tmantni l-irwol pijunier tagħha fl-affaċċjar ta' sfidi teknoloġiċi ta' komunità multilingwi billi tivvinta metodi ġodda sabiex taċċellera l-iżvilupp dritt madwar il-mappa. Dawn jistgħu jinkludu kemm avvanzi ta' kompjutazzjoni kif ukoll tekniki bħal \emph{crowdsourcing}.
  127. \subsection{Il-Ksib tal-Lingwi tal-bnedmin u tal-magni}
  128. Sabiex nuru kif il-kompjuters jimmaniġġjaw il-lingwa u għaliex il-ksib tal-lingwi huwa kompitu diffiċli ħafna, nagħtu ħarsa fil-qosor lejn il-mod kif il-bnedmin jiksbu l-ewwel u t-tieni lingwa, imbagħad nagħmlu skeċċ kif sistemi tat-traduzzjoni awtomatika jaħdmu -- hemm raġuni għaliex il-qasam tat-teknoloġija lingwistika huwa marbut mill-qrib mal-qasam tal-intelliġenza artifiċjali.
  129. Il-bnedmin jiksbu l-ħiliet lingwistiċi permezz ta' żewġ modi differenti. L-ewwel, it-tarbija titgħallem lingwa billi tisma' l-interazzjoni bejn il-kelliema tal-lingwa. Espożizzjoni għal eżempji konkreti lingwistiċi mill-utenti tal-lingwi, bħal ġenituri, aħwa u membri oħra tal-familja, tgħin lit-trabi mill-età ta' madwar sentejn jipproduċu l-ewwel kelmiet u frażijiet qosra tagħhom. Dan huwa possibbli biss minħabba d-dispożizzjoni ġenetika speċjali li għandhom il-bnedmin għat-tagħlim tal-lingwi.
  130. It-tagħlim tat-tieni lingwa normalment jeħtieġ sforz ferm aktar meta tifel jew tifla ma jkollhiex immersjoni fil-komunità lingwistika ta' kelliema nattivi. Fl-età skolastika, lingwi barranin jinkisbu ġeneralment permezz ta' tagħlim tal-istrutturi grammatikali, il-vokabularju u l-ortografija tagħhom minn kotba u materjal edukattiv li jiddeskrivu l-għarfien lingwistiku f'termini ta' regoli astratti, tabelli u testi bħala eżempji.
  131. \boxtext{Il-bnedmin jiksbu l-ħiliet lingwistiċi permezz ta' żewġ modi differenti: billi jitgħallmu minn eżempji\\ u billi jitgħallmu r-regoli bażiċi tal-lingwa.}
  132. Iż-żewġ tipi ewlenin ta' sistemi ta' teknoloġija lingwistika jakkwistaw kapaċitajiet lingwistiċi b'mod simili bħall-bnedmin. Metodi statistiċi jiksbu għarfien lingwistiku minn ġbir vast ta' testi konkreti bħala eżempji f'lingwa waħda jew fl-hekk imsejħa testi paralleli li huma disponibbli f'żewġ lingwi jew aktar. L-algoritmi tat-tagħlim awtomatiku jiffurmaw ċerta fakultà lingwistika li tista' tikseb mudelli ta' kif kliem, frażijiet qosra u sentenzi sħaħ jintużaw b'mod korrett f'lingwa waħda jew jiġu tradotti minn lingwa għal oħra. In-numru ta' sentenzi li metodi statistiċi jeħtieġu huwa enormi. Il-kwalità tal-prestazzjoni tiżdied hekk kif in-numru ta' testi analizzati jiżdied. Huwa komuni li sistemi bħal dawn jitħarrġu fuq testi li jinkludu miljuni ta' sentenzi. Din hija waħda mir-raġunijiet għaliex fornituri ta' magni ta' tiftix huma ħerqana li jiġbru kemm jista' jkun materjal bil-miktub. Il-korrezzjoni ortografika f'word processors, l-informazzjoni disponibbli fuq l-internet, u s-servizzi ta' traduzzjoni bħal Google Search u Google Translate jiddependu fuq metodu statistiku (immexxi minn dejta).
  133. \boxtext{Iż-żewġ tipi prinċipali tas-sistemi tat-teknoloġija lingwistika jakkwistaw lingwi b'mod simili.}
  134. Sistemi bbażati fuq regoli huma t-tieni tip ewlieni ta' teknoloġija lingwistika. Esperti mil-lingwistika, lingwistika kompjutazzjonali u x-xjenza tal-kompjuter jikkodifikaw analiżi grammatikali (regoli tat-traduzzjoni) u jikkumpilaw listi ta' vokabularju (dizzjunarji). It-twaqqif ta' sistema bbażata fuq regoli jieħu ħafna ħin u jinvolvi xogħol intensiv. Sistemi bbażati fuq regoli jeħtieġu wkoll esperti speċjalizzati sew. Uħud mis-sistemi ta' traduzzjoni awtomatika bbażati fuq regoli kienu taħt żvilupp kostanti għal aktar minn għoxrin sena. Il-vantaġġ tas-sistemi bbażati fuq regoli huwa li l-esperti jistgħu jikkontrollaw b'mod aktar dettaljat l-ipproċessar tal-lingwa. Dan jagħmel possibbli li l-iżbalji jiġu kkoreġuti b'mod sistematiku fis-software u tingħata informazzjoni dettaljata lill-utent, speċjalment meta s-sistemi bbażati fuq regoli jintużaw għat-tagħlim tal-lingwi. Minħabba restrizzjonijiet finanzjarji, teknoloġija lingwistika b'sistemi bbażati fuq regoli hija possibbli għal-lingwi ewlenin biss.
  135. \end{multicols}
  136. \clearpage
  137. % --------------------------------------------------------------------------
  138. \ssection[Il-Malti fis-Soċjetà tal-Informazzjoni Ewropea]{Il-Malti fis-Soċjetà\newline tal-Informazzjoni Ewropea}
  139. \begin{multicols}{2}
  140. \subsection{Fatti Ġenerali}
  141. Il-Malti huwa l-lingwa nazzjonali tal-arċipelagu Malti, li jikkonsisti fil-gżejjer ta' Malta, Għawdex u Kemmuna.
  142. Flimkien mal-Ingliż, il-Malti huwa wkoll l-ilsien uffiċjali ta' Malta. Skont id-Demographic Review 2009 mill-Uffiċċju Nazzjonali tal-Istatistika ta' Malta, l-istima tal-popolazzjoni Maltija (minbarra l-barranin) fl-aħħar tas-sena 2009 kienet 396,278. Huwa stmat li llum, minħabba l-fażijiet tal-emigrazzjoni minn Malta l-aktar fil-ħamsinijiet u s-sittinijiet, bejn wieħed u ieħor l-istess numru ta' kelliema nattivi espatrijati jgħixu barra mill-pajjiż (l-aktar fir-Renju Unit, l-Awstralja, l-Istati Uniti u l-Kanada).
  143. Għalkemm il-Malti jappartjeni għall-fergħa Għarbija tan-Nofsinhar tal-familja lingwistika Semitika, huwa pjuttost differenti mill-ilsna neo-Għarbin l-oħra. L-istruttura tiegħu hija r-riżultat ta' sitwazzjonijiet ta' kuntatt lingwistiċi differenti li ffurmaw taħt mexxejja differenti tal-gżejjer fil-kors ta' millennju. Filwaqt li l-qalba tal-Malti hija Semitika, fiha wkoll superstrat Rumanz u adstrat Ingliż. Barra minn hekk, il-Malti huwa l-uniku lsien Semitiku miktub bl-alfabett Latin (modifikat).
  144. Il-qalba Semitika tal-Malti tnisslet mill-konkwista Għarbija fit-870 AD u l-popolazzjoni mill-ġdid sussegwenti tagħha permezz ta' nies li ġew jgħixu f'Malta li jitkellmu bl-Għarbi. L-ewwel kuntatt dirett ma' lingwi Rumanzi ġie stabbilit fl-1090 meta Malta nħakmet minn Normanni, li ġabu l-Isqalli magħhom, filwaqt li l-popolazzjoni kienet għadha qed tuża l-Għarbi vernakulari tagħha fil-ħajja ta' kuljum. Malta kienet aktar u aktar maqtugħa politikament, kulturalment u lingwistikament mid-dinja Għarbija. Fis-sekli ta' wara, taħt l-influwenza tal-ilsna Rumanzi tal-mexxejja, aktar u aktar kliem Rumanz misluf daħal fid-djalett Għarbi. Meta Malta kienet taħt il-ħakma Ingliża fl-1800, il-lingwa uffiċjali nbidlet mit-Taljan għall-Ingliż, li ġab miegħu numru dejjem akbar ta' kliem Ingliż misluf fil-Malti. Is-sentenza li ġejja meħuda minn artiklu ta' gazzetta (\emph{l-Orizzont} mis-7 ta' Settembru, 1995; riprodott f'\cite[p.~135]{Ambros:1998}) tista' turi l-influwenzi differenti tal-ilsna f'kuntatt (kliem Rumanz misluf huwa b'tipa grassa, kliem mill-Ingliż sottolinjat):
\begin{examples}
\item Il-\underline{hold-up} sar minn żagħżugħ li kien liebes \textbf{nuċċali} \textbf{skur} tax-xemx.
\end{examples}
%\begin{examples}
%\item
%\gll Il-ħold-up sar minn żagħżugħ li kien liebes nuċċali skur tax-xemx.
%the-hold-up happened from young.man that was wearing glasses dark of.the-sun
%\glt [The robbery was committed by a young man who was wearing dark sunglasses.]
%\glend
%\end{examples}
Wieħed mill-fatti notevoli dwar il-Malti huwa li minkejja n-numru relattivament żgħir tal-kelliema tiegħu u z-zona żgħira fejn hu mitkellem, hemm numru pjuttost rikk ta' varjanti jew djaletti. B'mod ġenerali, distinzjoni ewlenija tista' ssir bejn il-varjetà standard mitkellma fiz-zoni urbani bħall-Belt Valletta u Tas-Sliema u varjetajiet mhux standard mitkellma fiz-zoni rurali. Barra minn Malta, il-Malti mitkellem fl-Awstralja żviluppa f'etnolett uniku msejjaħ \emph{Maltraljan} \cite{Bovingdon:2001}. Dan huwa differenti mill-Malti Standard prinċipalment f'dak li huwa l-lessiku tiegħu (jiġifieri, il-vokabularju) li huwa r-riżultat ta’ kliem misluf b’mod estensiv mill-Ingliż (Awstraljan) u bidla sussegwenti fit-tifsira.
Minħabba li l-Ingliż huwa t-tieni ilsien uffiċjali f'Malta, ħafna Maltin huma bilingwi. Bejn il-pilastri tal-monolingwiżmu u l-bilingwiżmu sħiħ, hemm sekwenza ta’ taħlit ta’ lingwi u codeswitching. Fid-dar u bejniethom, ħafna Maltin jitkellmu biss bil-Malti. L-Ingliż, min-naħa l-oħra, huwa l-lingwa użata fil-kuntest ta’ kitba f’edukazzjoni ogħla u fil-komunikazzjoni mal-barranin.
\subsection{Partikolaritajiet tal-Lingwa Maltija}
Il-Malti huwa l-uniku lsien Semitiku fl-Unjoni Ewropea u l-uniku lsien Semitiku miktub bl-alfabett Latin. L-alfabett Malti jagħmel użu minn xi grafemi speċjali li jvarjaw minn alfabetti oħra Latini (il-valuri tal-ħoss huma mogħtija fl-Alfabett Fonetiku Internazzjonali):
ċ \lingua{tʃ}, ġ \lingua{dʒ}, għ (il-biċċa l-kbira siekta), ħ \lingua{h}, ż \lingua{z} \cite{Fabri:2011a,Borg-Alexander:1997}. Xi karatteristiċi partikolari tal-Malti huma:
\begin{itemize}
\item ordni tal-kliem ħielsa
\item morfoloġija Semitika
\item sistema temporali bbażata fuq l-aspett
\item nuqqas ta' infinittiv morfoloġiku
\end{itemize}
\boxtext{L-ordni tal-kliem huwa relattivament\\[.3ex] ħieles fis-sentenzi Maltin.}
Anki jekk ma fihx trufijiet tal-każi, il-Malti għandu ordni ta' kliem ħielsa ħafna. Is-sentenza \emph{Il-kelb gidem il-qattusa lbieraħ} għandha l-ordni tal-kliem S(uġġett) V(erb) O(ġġett) iżda tista' wkoll tiġi espressa bħala:
\begin{examples}\label{WO_no_clitics}
\item
\gll Ilbieraħ il-kelb gidem il-qattusa.
yesterday {the-dog (m)} he.bit {the-cat (f)}
\gln (SVO)
\glt `Yesterday, the dog bit the cat.'
\glend
\item
\gll Gidem il-qattusa l-kelb ilbieraħ.
he.bit {the-cat (f)} {the-dog (m)} yesterday
\gln (VOS)
\glt `The dog bit the cat yesterday.'
\glend
\item
\gll Il-qattusa gidimha l-kelb ilbieraħ.
{the-cat (f)} he.bit.her {the-dog (m)} yesterday
\gln (OVS)
\glt `The cat, it was bitten by the dog yesterday.'
\glend
\end{examples}
Kif it-traduzzjonijiet bl-Ingliż jippruvaw juru, l-ordnijiet tal-kliem differenti għandhom enfasi differenti fit-tifsira. Fl-ewwel żewġ eżempji, l-ordni tal-kliem mhijiex immarkata, bl-oġġett wara l-verb. Fl-aħħar eżempju, l-oġġett \emph{il-qattusa} jippreċedi l-verb. Kif jissemma f'\cite[p.~140]{Fabri:1993}, din l-ordni tal-kliem hija mmarkata u tenfasizza l-oġġett għal kuntrast. Bl-oġġett quddiem, kelliema nattivi jippreferu jimmarkaw \emph{il-qattusa} bl-enklitika tal-oġġett \emph{-ha} mal-verb. Barra minn hekk, fid-diskors, dan il-kuntrast huwa espress b’intonazzjoni differenti. L-ordni tal-kliem fit-tieni eżempju (VOS) tista’ tintuża biex tesprimi tifsira kuntrastiva kif ukoll, b’intonazzjoni xierqa, tqiegħed l-enfasi fuq \emph{gidem il-qattusa}. Mingħajr din it-tifsira kuntrastiva (u mingħajr l-intonazzjoni kuntrastiva) l-enfasi tkun fuq il-fatt innifsu bħal: `Ma smajtx dak li ġara lbieraħ? Il-kelb gidem il-qattusa lbieraħ!' (Fabri, konverżazzjoni personali).
\boxtext{Kliem Maltin jistgħu jinbidlu internament\\ matul inflessjoni u derivazzjoni.}
Bħala lsien Semitiku, il-Malti juri morfoloġija mhux konkatenattiva, jiġifieri l-forom ta' kliem inflessi u mnisslin jinbidlu internament.
F'lingwi bħall-Ingliż, il-forom tal-kliem huma magħmula minn zkuk u affissi, jiġifieri b'mod konkatenattiv. Il-verb \emph{shoot} jista' jkun ikkonjugat fit-terza persuna biż-żieda tal-affiss \emph{-s} maz-zokk bħal f'\emph{(he) shoot-s}. Barra minn hekk, miz-zokk verbali jista' jitnissel nom billi jiżdied l-affiss \emph{-er} bħal f'\emph{shoot-er}. Għaldaqstant kemm l-inflessjoni kif ukoll id-derivazzjoni jseħħu mingħajr tibdil fl-istruttura interna, jiġifieri b'mod konkatenattiv.
Fil-Malti, hemm taħlita ta' morfoloġija bbażata fuq iz-zokk morfemiku u morfoloġija bbażata fuq l-għerq u l-mudell. Fil-komponent Semitiku, l-``unità'' bażika f'kelma ta' spiss ma tkunx iz-zokk iżda l-għerq magħmul minn tliet (xi kultant erba') konsonanti f'ordni fissa li ġġorr magħha tifsira ġenerali. Zkuk tal-kliem bit-tifsira speċifika tagħhom huma ffurmati billi l-konsonanti jiġu organizzati skont ċertu mudell. Pereżempju, l-għerq \emph{k-t-b} iħaddan it-tifsira ta' dak kollu marbut mal-``kitba''. F'dan li ġej, il-mudelli huma rappreżentati bħala numri \textbf{1,2,3} għall-konsonanti tal-għerq u \textbf{v} għall-vokali bejniethom, pereżempju \textbf{1v2v3}. Meta wieħed japplika l-mudell \textbf{1v2v3} u jimla l-pożizzjonijiet tal-vokali bejn il-konsonanti tal-għerq \textbf{1,2} u \textbf{3} bis-sekwenza tal-vokali \textbf{i-e}, wieħed jifforma l-verb \emph{kiteb}. L-inflessjoni ta’ dan il-verb għall-plural issir bit-twaħħil tal-affiss tal-plural \emph{-u}, li tagħti l-forma \emph{kitbu}. L-applikazzjoni tal-\textbf{1v22v:3} mal-għerq tagħti n-nom aġent \emph{kittieb}. L-inflessjoni tan-nom biż-żieda tal-affiss \emph{-a} tagħti l-plural \emph{kittieba}. Wieħed jinnota li s-suffiss tal-plural \emph{-a} jixbah lill-markatur għall-femminili \emph{-a}, sabiex \emph{kittieba} tista’ wkoll tirreferi għal kittieb femminili. Is-suffissi l-oħra Semitiċi Maltin tal-plural huma \emph{-in} bħal fi \emph{mħallef}, \emph{imħallfin}; \emph{-at/-iet} bħal f’\emph{kittieba}, \emph{kittiebat}; \emph{-ijiet} bħal fi \emph{żmien}, \emph{żminijiet}.
Nomi fil-plural fil-Malti jistgħu jiġu ffurmati wkoll b'mod mhux konkatenattiv (l-hekk imsejħa forom ta' plural miksur), jiġifieri l-ebda affiss ma jiżdied, iżda n-nom jinbidel internament, pereżempju \emph{ktieb} vs.~\emph{kotba}.
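Il-forom imnissla mill-għerq \emph{k-t-b} li ssemmew hawn fuq jistgħu jinġabru fil-qosor f'tabella. Dan huwa biss skizz illustrattiv ibbażat fuq l-eżempji mogħtija f'din it-taqsima; it-tqassim tal-kolonni huwa għażla ta' preżentazzjoni u mhux analiżi ġdida.
\begin{center}
\begin{tabular}{lll}
\textbf{Mudell} & \textbf{Affiss} & \textbf{Forma} \\
\hline
\textbf{1v2v3} (vokali \textbf{i-e}) & -- & \emph{kiteb} \\
\textbf{1v2v3} (vokali \textbf{i-e}) & \emph{-u} & \emph{kitbu} \\
\textbf{1v22v:3} & -- & \emph{kittieb} \\
\textbf{1v22v:3} & \emph{-a} & \emph{kittieba} \\
\end{tabular}
\end{center}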
Verbi mislufa llum huma importati l-aktar permezz ta' klassi ta' verbi speċjali li tista' takkomoda zkuk mhux maħduma \cite{Mifsud:1995}. Pereżempju, iz-zokk \emph{park-} bl-Ingliż sar il-bażi tal-forom tal-verbi Maltin \emph{pparkjajt, pparkjat, pparkja}. Illum, din il-klassi ta' verbi speċjali li qabel kienet klassi marġinali Semitika żdiedet fid-daqs minħabba l-influss ta' verbi mislufa mill-Ingliż. Dan huwa produttiv ferm u ħafna drabi jwassal għal self \emph{ad-hoc} ta' verbi bl-Ingliż li diġà għandhom kontroparti Semitika bil-Malti. Pereżempju `to download (a file)' tista' tiġi espressa bl-użu tal-verb Semitiku \emph{niżżel} (oriġinarjament tfisser `he caused to come down'). Meta wieħed jieħu z-zokk Ingliż \emph{download} u jimportah permezz tal-klassi tal-verb speċjali dan minflok jagħti forom bħal \emph{ddawnlowdjajt, ddawnlowdjat, ddawnlowdja}. Din l-istrateġija tiġi ta' spiss ikkritikata li qed tikkorrompi l-lingwa \cite{Fabri:2011a}.
\boxtext{Is-sistema temporali tal-Malti\\ hija bbażata fuq l-aspett.}
Il-verbi bil-Malti huma mmarkati għall-aspett, jiġifieri jekk azzjoni tkunx kompluta (perfettiva) jew mhux kompluta (mhux perfettiva) -- għal rapport sħiħ dwar tempi u aspetti fil-Malti, ara \cite{Fabri:1995,Ebert:2000}. Fin-nuqqas ta' kwalunkwe markaturi grammatikali oħra, verbi perfettivi huma interpretati bħala ``it-temp tal-passat'' u verbi li mhumiex perfettivi bħala ``it-temp tal-preżent'': \emph{Andrew kiteb}; \emph{Andrew jikteb}. Il-kumbinazzjoni tal-verb li mhuwiex perfettiv ma' \emph{kien} tesprimi passat abitwali: \emph{Andrew kien jikteb}. Iż-żieda tal-kelma \emph{qed} `progressiva' (bħall-forma bl-Ingliż \emph{-ing}) tagħti \emph{Andrew kien qed jikteb} eċċ.
Il-verbi Maltin ma għandhomx infinittivi morfoloġiċi. Għaldaqstant, fi predikati kumplessi bħal fis-sentenza bl-Ingliż `Andrew wants to write', iż-żewġ verbi huma morfoloġikament finiti: \emph{Andrew jrid jikteb} (litteralment: `Andrew he wants he writes') anki jekk min-naħa semantika, \emph{jikteb} mhuwiex finit.
\subsection{Żviluppi riċenti}
Bl-avvanz tal-Ingliż għal-lingwa internazzjonali u l-lingwa tat-teknoloġija wara t-Tieni Gwerra Dinjija, l-ammont ta' kliem misluf mill-Ingliż fil-Malti kiber b'mod sostanzjali. Ħafna minnhom saru ``nattivi'', jiġifieri ġew adottati fl-użu regolari tant li anki kliem meħud mis-Semitiku ma jistax jieħu posthom. Pereżempju, minflok il-kelma użata ta' spiss \emph{ajruport} (mill-Ingliż \emph{airport}), il-kelma Semitika \emph{mitjar} kienet proposta (imnissla minn \emph{tar} `he flew'). Madankollu, din qatt ma ġiet aċċettata mill-komunità lingwistika. Min-naħa l-oħra, kliem misluf jidħol fil-lingwa pjuttost malajr, jiġi impurtat b’mod spontanju, anki jekk diġà hemm kliem Malti tajjeb għalihom (pereżempju \emph{ddawnlowdja} vs \emph{niżżel} `he downloaded'). Dan iqabbad biżgħat fost xi wħud li l-lingwa tista’ ssir ``korrotta'' \cite{Fabri:2011a}.
Żvilupp ieħor riċenti għall-Malti huwa l-istatus tiegħu bħala lingwa uffiċjali tal-Unjoni Ewropea. Dan għandu kemm vantaġġi, kif ukoll żvantaġġi \cite{Fabri:2011a}. Minn naħa waħda, il-Malti finalment sar ilsien rikonoxxut internazzjonalment, status li ma kellux għal żmien twil, għax kien imwarrab bħala l-``lingwa tal-kċina'' fis-sekli ta' qabel. Min-naħa l-oħra, it-tradutturi tal-UE Maltin qed jaffaċċjaw ċerti sfidi: bosta termini tekniċi u legali għad iridu jiġu ``ivvintati'' bil-Malti. Dan jirriżulta eventwalment f'espansjoni lessikali tal-lingwa (tabilħaqq aspett pożittiv), li, madankollu, għandha tiġi kkoordinata minn korp ċentrali sabiex tradutturi individwali ma joħolqux termini differenti għall-istess kunċetti indipendentement minn xulxin (li hija problema serja). Il-korp ċentrali biex jittratta din l-isfida huwa l-Kunsill Nazzjonali għall-Ilsien Malti.
Żviluppi oħra fis-snin riċenti jikkonċernaw l-ortografija Maltija. Il-Malti (flimkien mal-Ingliż) sar l-ilsien uffiċjali ta' Malta fl-1 ta' Jannar, 1934 bl-ortografija maħruġa mill-Għaqda tal-Kittieba tal-Malti fl-1924. Minn dakinhar, l-ortografija għaddiet minn tliet reviżjonijiet (1984, 1992 u 2008).
L-aħħar riforma ġiet rilaxxata fl-2008. L-għan tagħha kien li jitnaqqsu l-inċertezzi tal-kittieba li jirriżultaw minn numru konsiderevoli ta' varjanti ortografiċi għal ċerti kliem. Kif id-dokument \emph{Deċiżjonijiet 1} \cite{Kunsill:2008a} tal-Kunsill jirrimarka, ammont kbir ta' varjanti jista' jitnaqqas billi jinstab bilanċ konsistenti bejn l-ortografija grammatikali u fonetika. Għaldaqstant l-erba’ varjanti \emph{zobtu, zoptu, sobtu} u \emph{soptu} jistgħu jitnaqqsu għal żewġ varjanti, \emph{zoptu} \lingua{'zɔp.tʊ} u \emph{soptu} \lingua{'sɔp.tʊ}. Għal raġuni simili, il-kelma \emph{skond} \lingua{skɔnt} `according to' nbidlet għal \emph{skont} minħabba li l-forom grammatikali tagħha l-oħra ma jiġġustifikawx ortografija b'\emph{d} (imnissla minn \emph{secondo} Taljan), bħal pereżempju \emph{skontok} \lingua{'skɔn.tɔk} `according to you'.
Għat-tielet qasam (tal-kliem misluf), il-prinċipju jibqa' li l-kliem misluf jinkiteb skont l-ortografija Maltija jekk dawn jitqiesu bħala ``nattivi'' u jekk dan ma joħloqx kunflitti fil-pronunzja jew ma' regoli oħra tal-kitba Maltija. Madankollu, ħafna Maltin jippreferu li jiktbu kliem Ingliż misluf bl-ortografija oriġinali tagħhom, minħabba li drawhom. Fil-fatt, waqt seminar pubbliku dwar l-użu ta' kliem Ingliż misluf f'April 2008, kien hemm diskussjonijiet emozzjonali fost l-udjenza fil-każ ta’ kliem bħal \emph{email} u l-ortografija l-ġdida tagħha proposta bħala \emph{imejl}. Fatturi bħal drawwiet tal-komunità lingwistika jagħmlu l-istandardizzazzjoni tal-ortografija saħansitra aktar diffiċli milli jinstab bilanċ bejn il-prinċipji grammatikali u fonetiċi \cite{Kunsill:2008b}.
Dawn l-eżempji jagħtu biss idea żgħira tal-ħidma iebsa li l-Kunsill Nazzjonali għall-Ilsien Malti qed iwettaq bħala parti mill-kultivazzjoni tal-lingwa f'Malta. Is-sezzjoni li jmiss se tagħti ħarsa lejn l-istorja tal-kultivazzjoni tal-lingwa f'Malta.
\subsection{Il-Kultivazzjoni tal-Lingwa f'Malta}
Meta mqabbel ma' lingwi oħra tal-Ewropa, l-istatus tal-Malti bħala lsien uffiċjali (mill-1934) fih innifsu huwa żvilupp riċenti. Għaldaqstant il-kultivazzjoni tal-lingwa wkoll bdiet tard.
Għal sekli sħaħ, il-Malti kien biss il-mezz mitkellem tal-popolazzjoni Maltija u kien imwarrab meta mqabbel mal-lingwa uffiċjali rispettiva tal-mexxejja ta' Malta. Dan beda jinbidel mal-moviment tal-lingwa ta' nofs/tmiem is-seklu 18 meta l-ewwel studji sistematiċi lingwistiċi tmexxew minn Agius de Soldanis (1750) u Mikiel Anton Vassalli (1797). Speċjalment Vassalli ppromwova l-ilsien Malti billi nkoraġġixxa l-użu tiegħu f'kull qasam tal-ħajja ta’ kuljum. It-traduzzjonijiet tal-bibbja ta’ Fortunato Panzavecchia f’nofs is-seklu 19 ikkontribwixxew għal aktar standardizzazzjoni tal-lingwa \cite{Kontzi:2005}. Barra minn hekk, fil-bidu tas-seklu 20 ittieħed pass importanti lejn ortografija standardizzata bit-twaqqif tal-Għaqda tal-Kittieba tal-Malti fl-1920. Is-sistema ortografika, li ġiet żviluppata minn din l-organizzazzjoni, saret l-ortografija uffiċjali ta’ Malta fl-1934 u, b'xi bidliet u żidiet, ilha tintuża minn dak iż-żmien.
Fl-1964, wara li nkisbet l-indipendenza mill-Gran Brittanja, l-istatus tal-Malti bħala lingwa nazzjonali u bħala lingwa uffiċjali flimkien mal-Ingliż inkiteb fil-kostituzzjoni. Meta Malta ssieħbet fl-UE fl-2004, il-Malti sar ilsien uffiċjali tal-UE. Kif issemma fit-taqsima t'hawn fuq, dan wassal għal ċerti sfidi, li jistgħu jissolvew biss minn korp li jikkoordina l-istandardizzazzjoni u l-prassi komuni fix-xogħol tat-traduzzjoni.
Il-korp f'Malta biex jagħmel dan ix-xogħol huwa l-Kunsill Nazzjonali għall-Ilsien Malti. Dan twaqqaf fl-2005 bħala l-ewwel organizzazzjoni tal-gvern sabiex tittratta uffiċjalment kwistjonijiet lingwistiċi u ppjanar lingwistiku għal-lingwa Maltija. Il-kompiti tal-Kunsill huma, kif imniżżlin fl-Att tal-Ilsien Malti (Att Nru V tal-2004): li jippromwovi l-ilsien Malti, ``jadotta politika, pjan u strateġija lingwistika xierqa'' u jwettaq dan fil-prattika. Xogħol ieħor importanti tal-Kunsill huwa li jaġġorna l-ortografija Maltija u jiddeċiedi fuq ortografija korretta (jieħu f’idejh l-inkarigu mill-Akkademja tal-Malti u b’hekk ikun prinċipalment responsabbli għar-riforma għall-ortografija Maltija tal-2008). Fuq is-sitweb tiegħu, il-Kunsill joffri wkoll korsijiet ta’ taħriġ għall-qarrejja tal-provi u korsijiet tal-lingwa Maltija għall-barranin \cite{Kunsill1}.
\columnbreak
Qabel twaqqaf il-Kunsill, l-istandardizzazzjoni tal-ortografija kienet il-kompitu tal-Akkademja tal-Malti. Din oriġinat fl-1964 mill-Għaqda tal-Kittieba tal-Malti, li kienet il-korp li waqqaf l-ewwel ortografija uffiċjali fl-1924/1932. Illum l-għan prinċipali tal-Akkademja huwa li tippromwovi studji akkademiċi fil-lingwa u l-letteratura Maltija, tippromwovi l-użu tal-Malti f'kull qasam tal-ħajja ta' kuljum u tibni kuntatti mal-persuni li huma ħbieb tal-lingwa u li jużawha barra minn Malta \cite{Akkademja1}. L-Akkademja taħdem mill-qrib flimkien mal-Kunsill Nazzjonali għall-Ilsien Malti.
Il-motivazzjoni wara l-Att tal-Ilsien Malti kienet l-idea li lingwa nazzjonali waħda li hija kondiviża mill-individwi kollha fi ħdan dak in-nazzjon tifforma l-bażi għall-identità kulturali u nazzjonali. Dan naturalment jeħtieġ standardizzazzjoni tal-lingwa. Fil-fatt, mill-moviment tal-kultivazzjoni tal-lingwa mis-seklu 19 sal-lum, il-Malti avvanza minn ilsien imwarrab u vernakulari kif kien qabel għal ilsien nazzjonali ta' prestiġju għoli. Dan jidher ukoll fl-ammont dejjem jikber ta' xogħlijiet letterarji bil-Malti matul l-istess perjodu ta' żmien u fin-numru kbir ta' organizzazzjonijiet influwenti u korpi għal-lingwa u l-letteratura Maltija (ara \cite{Fabri:2011a}).
\subsection{Il-Lingwi fl-Edukazzjoni}
Partikolarment f'soċjetà bilingwali bħal dik ta’ Malta, diversi aspetti għandhom rwol meta niġu għal-lingwa fl-edukazzjoni.
Aspett wieħed huwa l-lingwa ta' istruzzjoni, jiġifieri l-lingwa li tintuża uffiċjalment mill-għalliema matul il-lezzjonijiet fl-iskola jew fis-seminars fl-università.
Fattur ieħor huwa l-lingwa użata f'ċerti kotba tal-iskola. Bl-Ingliż bħala l-lingwa tax-xjenzi teknoloġiċi u naturali, ħafna mill-kotba tal-iskola dwar dawn is-suġġetti huma bl-Ingliż. Fil-fatt, l-isforzi biex jiġu tradotti termini tekniċi u xjentifiċi għall-Malti ltaqgħu ma’ bosta problemi, waħda minnhom hija l-aċċettazzjoni mill-komunità tal-lingwa. Għaldaqstant is-suġġetti skolastiċi, ukoll, possibbilment jiddeterminaw il-lingwa ta’ istruzzjoni għal ċerti lezzjonijiet, għalkemm jista’ jkun ukoll li l-kotba tal-iskola bl-Ingliż (u t-terminoloġija bl-Ingliż li tinsab fihom) jintużaw waqt li l-lingwa tat-tagħlim tkun il-Malti.
Madankollu aspett ieħor huwa l-lingwa użata mill-individwi. Kelliema bilingwi mhux biss jużaw lingwi differenti f'kuntesti soċjali differenti (``dominji''), eż.~il-Malti ma’ tal-familja fid-dar, l-Ingliż mal-barranin, il-Malti jew l-Ingliż matul il-lezzjonijiet tal-iskola eċċ. Dawn għandhom tendenza wkoll li jużaw iż-żewġ lingwi flimkien, jew iħalltu ż-żewġ lingwi (eż.~kliem bl-Ingliż jitħalltu f’konverżazzjoni li tkun qed issir bil-Malti) jew permezz tal-codeswitching (eż.~konverżazzjoni bil-Malti tinqaleb għall-Ingliż u lura għall-Malti, bil-partijiet tal-Ingliż ikunu akbar minn kliem biss waħedhom, iżda spiss jikkonsistu minn diversi sentenzi). Għaldaqstant anki matul il-lezzjonijiet tal-iskola li jiġu mgħallma b’lingwa waħda, il-konverżazzjonijiet bejn l-għalliema u l-istudenti jistgħu jaqilbu bejn il-lingwi \cite{Camilleri:1995}.
Meta wieħed iżomm dawn it-tliet fatturi f'moħħu, wieħed jinduna li l-espożizzjoni attwali tal-istudenti għal-lingwa rispettiva fl-iskejjel jew fl-università hija xi ħaġa differenti mil-lingwa magħżula ta' istruzzjoni.
Rigward il-lingwa uffiċjali ta' istruzzjoni fl-edukazzjoni, kemm il-Malti kif ukoll l-Ingliż jintużaw fl-iskejjel u fl-università, minħabba li l-Malti u l-Ingliż jaqsmu l-istatus bħala lingwi uffiċjali ta' Malta. Fl-iskejjel, it-tnejn li huma jiġu mgħallma bħala suġġetti minn età bikrija. Liema lingwa tintuża bħala l-lingwa ta' istruzzjoni jiddependi mit-tip ta' skola. Skejjel privati għandhom it-tendenza li jużaw l-Ingliż aktar mill-Malti (xi kultant b'mod aktar estensiv), filwaqt li fl-iskejjel Maltin tal-istat il-Malti huwa kemxejn preferut mill-Ingliż. L-iskejjel tal-knisja għandhom il-preferenzi individwali tagħhom, jiġifieri li xi wħud tradizzjonalment jippreferu lingwa waħda minn oħra.
\columnbreak
Kif issemma qabel, il-biċċa l-kbira tal-kotba tax-xjenza li jintużaw fl-iskola huma bl-Ingliż. Għaldaqstant, bl-introduzzjoni ta' aktar u aktar suġġetti xjentifiċi aktar 'il quddiem fl-iskola u iżjed fl-università, l-istudenti huma esposti għal żewġ lingwi fl-istess ħin, li jintużaw f'sitwazzjonijiet differenti: jista’ jkollhom il-lezzjonijiet tagħhom mgħallma bil-Malti, iżda jaqraw il-kotba tagħhom u jiktbu l-essays tagħhom bl-Ingliż. Speċjalment għal studenti tal-università, il-konverżazzjonijiet bejniethom, mal-ħbieb u l-lecturers spiss iseħħu bil-Malti, xi kultant jużaw il-code-switching/iħalltu bejn il-Malti u l-Ingliż, jew ikunu saħansitra bl-Ingliż biss (tal-aħħar pereżempju ma’ studenti internazzjonali jew lecturers).
Madankollu, fid-dar mal-familja tagħhom u l-ħbieb, ħafna Maltin jitkellmu bil-Malti, xi wħud iħalltu l-lingwi u ftit familji biss jitkellmu bl-Ingliż biss.
Kif jidher mill-eżempji ta' hawn fuq, minkejja l-fatt li kemm il-Malti kif ukoll l-Ingliż jintużaw bħala lingwi fl-edukazzjoni, hemm distribuzzjoni ċara meta niġu għall-użu tagħhom fis-soċjetà. Sciriha u Vassallo (2001, p.~29, iċċitati f'\cite{Fabri:2011a}) isemmu li ``70\% ta' dawk li wieġbu qalu li jużaw il-Malti fuq ix-xogħol, filwaqt li 90\% qalu li jikkomunikaw mal-membri tal-familja tagħhom fid-dar bil-Malti. ...~il-persentaġġi għall-Malti mitkellem huma għolja ħafna iżda jonqsu f'ħiliet oħra bħall-qari u l-kitba.''
Din id-distribuzzjoni tal-Malti li qed jintuża prinċipalment bħala l-mezz mitkellem u l-Ingliż bħala l-mezz tal-kitba toħloq ċertu riskju, minħabba li jista' jkollha impatt fuq il-ħiliet differenti tal-kelliema nattivi tagħha f'dak li għandu x'jaqsam ma' taħdit, qari jew kitba. Sabiex wieħed jagħti r-raġunijiet għal dan il-fatt, wieħed għandu jħares lejn il-karatteristiċi bażiċi tal-lingwa mitħaddta u dik miktuba.
B'mod ġenerali, it-testi miktubin jiddistingwu ruħhom mid-diskors f'numru ta' modi. Li għandhom komuni huwa li t-tnejn huma modi ta' trasferiment ta' informazzjoni bejn iż-żewġ partijiet, jiġifieri l-kelliem u min qed jisma', u l-kittieb u l-qarrej, rispettivament. Madankollu, huma differenti fil-mod kif l-informazzjoni tgħaddi bejniethom. Fi kliem sempliċi, test miktub, kuntrarju għal diskors, iseħħ barra minn sitwazzjoni komunikattiva, interattiva u konkreta. Minn naħa, id-diskors jiddependi fuq l-interazzjoni bejn il-kelliem u min qed jisma'. Il-kelliem irid jibni struttura tal-informazzjoni b'ċertu mod. Dan huwa importanti minħabba l-memorja limitata u qasira tal-bniedem: min qed jisma' fil-konverżazzjoni jista' jassorbi ċertu ammont ta' informazzjoni biss qabel ma jkollu jinterrompi u jsaqsi lill-kelliem biex jiżgura li fehem.
Test miktub, min-naħa l-oħra, mhuwiex interattiv safejn il-qarrej ma jistax jitlob għal aktar informazzjoni speċifika. Madankollu huwa jista' jara x'hemm 'il quddiem u lura fit-test (xi ħaġa li min qed jisma' ma jistax jagħmilha fid-diskors). B'dan il-mod, it-test miktub innifsu jservi bħala memorja fit-tul għall-qarrej. Għaldaqstant, test miktub jistruttura l-informazzjoni b’mod differenti minn kif isir f’konverżazzjoni mitħaddta. Pereżempju, test għandu jipprovdi aktar informazzjoni ta’ sfond sabiex jagħti bażi komuni lill-qarrej qabel tibda għaddejja l-informazzjoni attwali. Din ma tkunx problema, jekk it-test jista’ jservi bħala memorja fit-tul għall-qarrej. Fil-fatt, dan jippermetti struttura aktar elaborata mid-diskors, jiġifieri normalment ikun fih sentenzi itwal u ammont ogħla ta’ propożizzjonijiet subordinati.
Din id-distinzjoni tar-reġistru (jiġifieri ``l-istil tal-lingwa'') hija dik li fil-letteratura, eż.~\cite{Biber:1991}, issejħet strutturi ta' testi \emph{orali} versus \emph{letterati}. Tabilħaqq, test jista' jkun miktub f'reġistru orali li jixbah konverżazzjonijiet mitħaddta (eż.~f’forum ta’ iċċettjar jew posta elettronika informali). Iżda dan mhuwiex ir-reġistru normalment użat pereżempju f'essays. Idealment, il-kelliema nattivi jiksbu r-reġistru letterat diġà minn età żgħira, eż.~permezz tal-ġenituri tagħhom li jaqrawlhom l-istejjer. Aktar tard fl-iskola, dan l-għarfien jissaħħaħ minn, pereżempju, eżerċizzju attiv tal-kitba tal-essays.
\columnbreak
Reġistru letterat jiżviluppa maż-żmien f'lingwa bi tradizzjoni letterarja. Il-Malti, meta mqabbel mal-istorja qasira tiegħu bħala lsien uffiċjali miktub (mill-1934), għandu storja letterarja twila u rikka. Anki jekk l-eqdem letteratura skoperta hija skarsa ħafna (\emph{Il Cantilena} minn Pietro Caxaro, li tmur lura għal madwar l-1450), tradizzjoni letterarja bdiet tifforma madwar l-erbgħinijiet fis-seklu 17. Fis-seklu 19, l-ammont ta' letteratura bil-Malti kien qed jikber \cite{Fabri:2011a}, u flimkien miegħu, il-Malti kien qed jespandi. Illum huwa lsien li għandu reġistru letterat komplut.
Madankollu, dan ir-reġistru jeħtieġ li jiġi pprattikat sabiex jinżamm l-istatus tal-lingwa bħala lingwa kemm konverżazzjonali kif ukoll letterarja. It-tendenza fl-edukazzjoni ogħla biex jinkitbu essays aktar bl-Ingliż milli bil-Malti, mill-anqas teoretikament, toħloq ir-riskju li l-Malti jibqa' jintuża f'reġistru orali biss. Ammont ogħla ta' websajts Maltin tal-ġeneri kollha huwa mixtieq biex wieħed ikopri ż-żewġ reġistri u s-sottotipi tagħhom u jiġi żgurat status stabbli tal-lingwa fir-rikkezza kollha tagħha.
\subsection{Aspetti internazzjonali}
Meta wieħed iżomm f'moħħu t-taqsimiet preċedenti, issa għandu jkun mifhum li l-aspetti internazzjonali tal-Malti huma pjuttost differenti minn lingwi oħra. B'anqas minn miljun kelliema nattivi madwar id-dinja, il-Malti huwa kkunsidrat bħala lingwa ``mitkellma anqas''. Fl-istorja tiegħu, il-Malti ma kienx l-ilsien tal-okkupanti iżda dak tal-popolazzjoni okkupata. Bħala riżultat ta' dan, il-Malti qatt ma kien meqjus bħala lingwa internazzjonali jew lingua franca kif kien il-każ eż.~tal-Latin, l-Ispanjol, il-Portugiż jew l-Ingliż, li kollha huma l-lingwi tal-konkwistaturi. Il-Malti tabilħaqq infirex lejn pajjiżi oħra, fejn għadu mitkellem sal-lum (l-Awstralja, il-Kanada, l-Istati Uniti u r-Renju Unit), iżda bħala lingwa tal-komunità biss. Kien jeħtieġ kważi 200 sena mill-ewwel interess tal-grammatiċi Maltin fil-lingwa tagħhom sakemm eventwalment kiseb l-istatus ta' lingwa uffiċjali. Saħansitra dakinhar, il-lingwa uffiċjali l-oħra, l-Ingliż, serviet bħala l-lingwa għar-relazzjonijiet internazzjonali.
Il-bidla biex il-Malti jsir ilsien internazzjonalment viżibbli seħħet mas-sħubija ta' Malta fl-UE fl-2004. Minn dakinhar, il-Malti huwa lsien uffiċjali fl-Unjoni Ewropea, flimkien mal-benefiċċji u l-isfidi kollha li huma marbutin ma' dan l-istatus.
Akkademikament, l-interess fil-Malti bħala suġġett tax-xjenza jmur lura sal-1603 meta Hieronymus Megiser ippubblika t-\emph{Thesaurus Polyglottus} tiegħu, li kien jinkludi lista ta' kliem bil-Malti. L-ewwel studjuż li b'mod sistematiku esplora u ppromwova l-lingwa Maltija kien Mikiel Anton Vassalli. Huwa ppubblika grammatika (1790), dizzjunarju (1797) u alfabetti diversi (1788 u 1790) għall-Malti u llum huwa msejjaħ ``il-missier tal-Ilsien Malti'' \cite{Brincat:2011}. Fis-seklu 20, kien ippubblikat il-\emph{Grammar of the Maltese Language} (1936) ta' Sutcliffe. Mis-sittinijiet tas-seklu 20, il-Lingwistika tal-Ilsien Malti kisbet għarfien akkademiku internazzjonali permezz tal-pubblikazzjonijiet ta' Joseph Aquilina (eż.~\emph{Papers in Maltese Linguistics} (1961) u \emph{Maltese-English Dictionary}, żewġ volumi (1987 u 1990)). Minn dak iż-żmien, aktar u aktar studjużi barra minn Malta wrew interess fil-Malti. L-2007 rat it-twaqqif tal-Għaqda Internazzjonali tal-Lingwistika Maltija \cite{GHILM1}, assoċjazzjoni ta' lingwisti li huma interessati fil-lingwa Maltija. L-għan ewlieni tal-GĦILM, kif jidher fuq is-sitweb tagħha, huwa li tipprovdi ``konnessjoni bejn studjużi interessati fil-Malti li ġejjin mid-dixxiplini kollha tal-Lingwistika'', b'hekk tiffaċilita r-riċerka dwar il-Malti. Din l-għaqda trid ukoll li tgħaqqad flimkien nies minn sfondi differenti li jaħdmu bl-ilsien Malti (lingwisti, tradutturi, studenti u oħrajn).
\subsection{Il-Malti fuq l-Internet}
Stħarriġ tal-Uffiċċju Nazzjonali tal-Istatistika ta' Malta fit-tieni kwart tal-2009 \cite{NSO2} juri li fost il-popolazzjoni ta' madwar 400,000 ruħ, 67 fil-mija kellhom aċċess għall-kompjuter u 64 fil-mija kellhom aċċess għall-internet. Stħarriġ riċenti tal-Ewrobarometru (ippubblikat f'Mejju tal-2011) \cite{Eurobarometer1} dwar id-drawwiet tat-tiftix fuq l-internet fost l-utenti Ewropej wera li huma biss 6.5 fil-mija tal-utenti tal-internet Maltin li jużaw esklussivament il-Malti fuq l-internet meta jaqraw, jikkunsmaw il-kontenut jew jikkomunikaw. Minflok, 90.6 fil-mija jagħżlu li jibbrawżjaw is-websajts bl-Ingliż u 20.1 fil-mija bit-Taljan, rispettivament. Dawn iċ-ċifri ffurmaw il-bażi tal-artiklu fil-gazzetta Maltija li toħroġ kuljum \emph{The Times of Malta}, li ħoloq diskussjoni interessanti l-aktar fost il-qarrejja Maltin tal-edizzjoni fuq l-internet \cite{TimesOfMalta1}.
Madankollu, ir-riżultati eżatti tal-istħarriġ iwasslu għall-konklużjoni li din l-abitudni mhix għażla maħsuba: meta mistoqsija liema lingwa l-Maltin jikkunsidraw bħala l-lingwa materna tagħhom, 89.5 fil-mija ta' dawk li wieġbu qalu li l-Malti kien l-ilsien nattiv tagħhom (meta mqabbel ma' 7.6 fil-mija biss għall-Ingliż u 0.2 fil-mija għat-Taljan).
Il-lingwi l-oħra, barra tagħhom, użati minn dawk li wieġbu biex jaqraw jew jaraw kontenut fuq l-internet kienu l-Ingliż (90.6 fil-mija) u t-Taljan (20.1 fil-mija). 6.5 fil-mija biss wieġbu li jużaw il-lingwa tagħhom biss, li mhuwiex fatt sorprendenti, minħabba li ħafna Maltin huma bilingwali bil-Malti u bl-Ingliż u numru konsiderevoli jitkellmu bit-Taljan ukoll.
F'dak li għandu x'jaqsam ma' kitba fuq l-internet, in-numri favur il-Malti huma ogħla minn meta l-utenti jaqraw jew jaraw kontenut: 87 fil-mija qalu li jużaw il-Malti, 85 fil-mija l-Ingliż u 8 fil-mija t-Taljan.
Ir-raġuni li l-maġġoranza jużaw l-Ingliż bħala l-lingwa biex jikkunsmaw kontenut fuq l-internet tista' tkun sempliċiment in-numru limitat ta' websajts bil-Malti minflok il-preferenza għall-Ingliż fiha nnifisha. Niftakru li ħafna minn dawk li wieġbu ma jikkunsidrawx l-Ingliż bħala l-lingwa tagħhom u li l-użu tal-Malti żdied meta ġie prodott kontenut fuq il-web, anki jekk dan l-użu tal-Malti fil-maġġoranza tal-każijiet iseħħ f'forums ta' iċċettjar u pjattaformi soċjali, għalhekk fi stil ta' lingwaġġ tat-taħdit, jiġifieri fir-reġistru orali.
Karatteristika partikolari dwar il-Malti użat mill-ġenerazzjoni żagħżugħa fi pjattaformi soċjali u forums ta' iċċettjar hija l-ortografija fonetika tagħha, mingħajr il-karattri bħal \emph{għ} siekta u l-\emph{h}. Għaldaqstant \emph{għax} tinkiteb \emph{ax}, \emph{tiegħi} \emph{tiei} eċċ. Ir-raġuni għal dan tista' tkun l-introduzzjoni tard tal-karattri speċjali tal-Malti fid-dinja tal-PC. Minkejja li l-Malti ġie implimentat fil-qafas tal-Unicode mill-bidu tiegħu, kompjuters u sistemi ta' operazzjoni segwew ħafna aktar tard. L-Awtorità Maltija tal-Istandards ħarġet forma standardizzata ta' tastiera Maltija fl-2002, u s-sistema ta' operazzjoni Windows tal-Microsoft kienet disponibbli fil-verżjoni tal-lingwa Maltija fl-2006 biss (mal-Windows XP). Fil-każ ta' telefonijiet ċellulari, l-ittri speċjali Maltin għadhom ma ġewx implimentati. Għaldaqstant, għad irridu naraw jekk l-ortografija \emph{ad hoc} tal-forums tal-iċċettjar hix se twitti t-triq għal ortografija b'karattri speċjali ladarba dawn ikunu disponibbli fuq it-telefonijiet ċellulari jew jekk din l-ortografija fonetika se tkompli teżisti bħala ``soċjolekt'' tal-ġenerazzjoni żagħżugħa \cite{Fabri:2011b}.
Fir-rigward tal-ammont ta' Malti fuq l-internet b'mod ġenerali, huwa diffiċli biex toħroġ b'numri eżatti, mhux l-anqas minħabba li l-għadd ta' websajts qed jinbidel kontinwament. Iżda hemm fatturi oħra li jagħtu idea dwar l-ammont ta' Malti fuq l-internet meta mqabbel ma' lingwi oħra.
L-ewwel ħarsa lejn l-ammont ta' daħliet fil-Wikipedia (fl-1 ta' Ġunju, 2011) uriet li kien hemm madwar 2,820 daħla bil-Malti b'kuntrast ma' aktar minn 3,640,000 daħla bl-Ingliż u aktar minn 1,238,000 daħla bil-Ġermaniż.
Meta wieħed iqabbel in-numru tad-Dominju tal-Ogħla Livell (TLD), it-TLD .mt jokkupa l-pożizzjoni 213 (minn 358) b'numru mhux speċifikat ta' dominji .mt irreġistrati (membru taċ-Ċentru Informazzjoni tan-Network ta' Malta ta stima ta’ madwar 5,000), imqabbel ma’ 21,336,063 dominju irreġistrat għal .com (kummerċjali, klassifikazzjoni 1) u 5,459,604 dominji għal .de (il-Ġermanja, klassifikazzjoni 2). Naturalment, in-numru ta’ dominji rreġistrati ma jgħid xejn dwar il-lingwa li l-paġni taħt ċertu dominju huma miktubin biha.
Xi numri approssimattivi tal-ammont tal-lingwa Maltija fuq l-internet jistgħu jiġu kkalkulati permezz ta' proċedura proposta minn \cite{Kilgarriff-Grefenstette:2003} (L-awturi huma obbligati lejn Dr Albert Gatt (L-Istitut tal-Lingwistika, L-Università ta' Malta) talli ġibed l-attenzjoni tagħhom għal dan id-dokument.). L-idea bażika hija li kliem funzjonali (eż.~iżda, għal, dan eċċ.) huma aktar frekwenti minn kliem ta’ kontenut (eż.~nomi, verbi, aġġettivi) u jiffurmaw sett finit fil-lingwa. Barra minn hekk, il-persentaġġ ta’ kliem funzjonali f'lingwa jkun stabbli f'kampjun ta’ test meta d-daqs tal-kampjun jiżdied (il-Liġi Zipf). Għaldaqstant, wieħed jista’ jikkalkula l-ammont ta’ kliem għal kull lingwa fuq l-internet kif ġej:
First, one calculates the frequency of selected Maltese function words in a corpus (i.e. a collection of texts) of known size. Second, one uses a search engine (e.g.~Google) to find the frequency of the same function words on the web. In the third step, the corpus frequency is extrapolated to the Google search results, and the mean of the extrapolated values is then calculated across the function words.
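As a rough sketch of these three steps (the symbols $f_w$, $h_w$, $\hat{N}_w$ and $\hat{N}$ are introduced here purely for illustration), let $f_w$ be the frequency per million words of function word $w$ in the reference corpus and $h_w$ the number of hits the search engine reports for $w$. Under that reading, the extrapolated size in words and its mean over the $n$ selected function words could be written as
\[
\hat{N}_w = \frac{h_w \times 10^{6}}{f_w},
\qquad
\hat{N} = \frac{1}{n} \sum_{w} \hat{N}_w .
\]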
\begin{figure*}[p]
\setlength{\tabcolsep}{2.5em}
\begin{tabularx}{\textwidth}{lrrr} \toprule\addlinespace
Word & f/m & Google (.mt only, region=mt) & Extrapolation \\ \addlinespace\midrule\addlinespace
għal & 3730.96 & 94,300 & 25,274,996 \\
qed & 4770.79 & 118,000 & 24,733,849 \\
minn & 4833.58 & 173,000 & 35,791,276 \\
kien & 4073.83 & 93,800 & 23,025,015 \\
biex & 5276.78 & 179,000 & 33,922,202 \\
dan & 6412.28 & 434,000 & 67,682,634 \\
kienet & 1452.42 & 116,000 & 79,866,705 \\
kienu & 1465.56 & 135,000 & 92,114,959 \\
kont & 521.43 & 34,200 & 65,588,861 \\
konna & 301.39 & 19,400 & 64,368,426 \\
jekk & 2776.8 & 72,100 & 25,965,140 \\
mhux & 2101.32 & 79,500 & 37,833,362 \\ \addlinespace\midrule\addlinespace
Mean & & & 48,013,952 \\ \addlinespace\bottomrule
\end{tabularx}
\caption{Google search, restricted to the .mt domain and the region Malta}
\label{table:Google_A_mt}
\end{figure*}
\begin{figure*}[p]
\setlength{\tabcolsep}{3.1em}
\begin{tabularx}{\textwidth}{lrrr} \toprule\addlinespace
Word & f/m & Google (.mt only) & Extrapolation \\ \addlinespace\midrule\addlinespace
għal & 3730.96 & 1,340,000 & 359,156,892 \\
qed & 4770.79 & 966,000 & 202,482,188 \\
minn & 4833.58 & 1,240,000 & 256,538,632 \\
kien & 4073.83 & 3,100,000 & 760,954,679 \\
biex & 5276.78 & 6,530,000 & 1,237,497,110 \\
dan & 6412.28 & 3,980,000 & 620,684,062 \\
kienet & 1452.42 & 665,000 & 457,856,543 \\
kienu & 1465.56 & 436,000 & 297,497,202 \\
kont & 521.43 & 450,000 & 863,011,334 \\
konna & 301.39 & 81,600 & 270,745,546 \\
jekk & 2776.8 & 1,120,000 & 403,341,976 \\
mhux & 2101.32 & 1,040,000 & 494,926,998 \\ \addlinespace\midrule\addlinespace
Mean & & & 518,724,430 \\ \addlinespace\bottomrule
\end{tabularx}
\caption{Google search, restricted to the .mt domain only}
\label{table:Google_B_mt}
\end{figure*}
Some limitations of this method should be mentioned. First, the numbers obtained by this method are page hits only. For example, 94,300 Google hits for the word \emph{għal} are not 94,300 occurrences of the word on the internet, but 94,300 web pages that contain the word \emph{għal} at least once. Second, the search only finds web pages that have an individual URL \cite{Kilgarriff-Grefenstette:2003}. Pages that are accessible only through a web interface are not retrieved by an internet search. Third, a search engine only looks for a string of letters irrespective of its environment on the web page. It makes no judgement as to whether a given string of letters is actually a word of a language.
The method described above, applied to Maltese function words, generates different estimates for Maltese. For websites under the .mt domain located in Malta, the estimated size is around 50 million words, while for websites under the .mt domain in all regions the size is around 500 million words. The reason for this difference is that many .mt domains are hosted on servers outside Malta.
The exact results of the Google searches (carried out on 8 July 2011) and their extrapolation can be traced in Figure~\ref{table:Google_A_mt} and Figure~\ref{table:Google_B_mt}. The column f/m (i.e. frequency per million) indicates how often the respective word occurs per million words in the MLRS corpus. For example, in Figure~\ref{table:Google_A_mt}, the word \emph{għal} occurs almost 3,731 times per million words. The Google search for \emph{għal} returns 94,300 pages with at least one occurrence of \emph{għal} on a web page under the .mt domain in Malta. Multiplying by one million and dividing by 3730.96 gives an estimated 25,274,996 words of Maltese on the pages under the .mt domain within Malta. Doing this calculation for the other words in the figure and taking the mean of the results yields a number slightly below 50 million words. For web pages worldwide listed under the .mt domain, the results are ten times higher.
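As a quick arithmetic check using only the values reported in Figure~\ref{table:Google_A_mt} and Figure~\ref{table:Google_B_mt} (not an additional result), the extrapolation for \emph{għal} works out as
\[
\frac{94{,}300 \times 10^{6}}{3730.96} \approx 25{,}274{,}996 ,
\]
and the ratio between the two reported means is
\[
\frac{518{,}724{,}430}{48{,}013{,}952} \approx 10.8 ,
\]
which is consistent with the worldwide figure being roughly ten times the Malta-only figure.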
Naturally, for a serious study, this search and extrapolation would have to include more words in order to arrive at more reliable numbers for the amount of Maltese on the internet. But comparing the results with Table 3 in \cite{Kilgarriff-Grefenstette:2003}, one can say that both numbers are very low: for web pages in Malta only, the amount is more than Latvian and less than Icelandic ten years ago (the numbers in Table 3 were calculated in March 2001). For web pages worldwide, the amount of Maltese is more than Hungarian and less than Czech ten years ago. Since the proportion of non-English text relative to English is growing according to \cite{Kilgarriff-Grefenstette:2003}, Maltese may be even less represented on the internet today than the languages just mentioned.
Apart from private homepages and weblogs, there are a number of official websites in Maltese. First, there is the homepage of the Maltese government \cite{GovernmentOfMalta1}, which is available in both Maltese and English. In addition, there are the internet editions of the daily and weekly Maltese-language newspapers: \emph{In-Nazzjon}, \emph{L-Orizzont} (daily), \emph{Illum}, \emph{Il-ĠENSillum}, \emph{KullĦadd}, \emph{Leħen is-Sewwa}, \emph{It-Torċa} (weekly).
The websites of Maltese TV and radio stations show a mixture of English and Maltese to varying degrees. For example, the websites of the television stations NET TV \cite{NetTV1} and One TV \cite{OneTV1} have an English framework together with some articles in Maltese, although their programme includes titles in both Maltese and English. The church radio station RTK \cite{RTK1} (Maltese and English) lets the user choose between the two languages. The website of Public Broadcasting Services (PBS) \cite{PBS1} has sections in English and sections in Maltese, as does the website of Radio 101 \cite{radio101}. This mixture of English and Maltese reflects language use in everyday life. In the programmes themselves, however, the situation is clearer, because the \emph{Malta Broadcasting Authority} has issued strict guidelines for the use of Maltese on TV and radio. According to these, presenters should speak either Maltese or English and not switch between the two languages \cite{Fabri:2011a}. The stations' programmes therefore include broadcasts in Maltese only and others in English only. These are also often available on the internet, either as live streams or as podcasts.
Outside Malta, a large collection of Maltese texts can be found within EUR-Lex \cite{EURLex1}, which hosts the official law and other documents of the European Union since 1951 in its 23 official languages.
Many if not all openly available web documents are used in corpus projects, e.g.~the \emph{JRC-Acquis Multilingual Parallel Corpus} \cite{JRC-Acquis1}, a parallel corpus containing the full texts of European Union law in 22 languages. Another corpus containing a steadily growing number of visible web documents in Maltese is the MLRS (Maltese Language Resource Server) corpus \cite{MLRS1}.
\end{multicols}
\clearpage
% --------------------------------------------------------------------------
\ssection[Language Technology Support for Maltese]{Language Technology Support for Maltese}
\begin{multicols}{2}
Language technologies are information technologies that are specialised for dealing with human language. These technologies are therefore often subsumed under the term Human Language Technology. Human language occurs in spoken and written form. While speech is the oldest and most natural mode of linguistic communication, complex information and most human knowledge are maintained and transmitted in written texts. Speech and text technologies process or produce these two forms of language using dictionaries, grammar rules and semantics. This means that language technology (LT) links language to various forms of knowledge, independently of the medium (speech or text) in which they are expressed. Figure~\ref{fig:ltincontext_mt} shows the LT landscape.
When we communicate, we mix language with other modes of communication and other information media; for example, we combine speech with gestures and facial expressions. Digital texts are combined with pictures and sounds. Films may contain language in spoken and written form. Speech and text technologies therefore overlap and interact with many other technologies for multimodal and multimedia communication.
%
In this section, we discuss the main application areas of language technology, namely language checking, web search, speech interaction and machine translation. These applications and basic technologies include:
\begin{figure*}[htb]
\colorrule{grey3}{\textwidth}{1.5pt}
\center
\includegraphics[width=\textwidth]{../_media/maltese/language_technologies}
\caption{Language technology in context}
\label{fig:ltincontext_mt}
\colorrule{grey3}{\textwidth}{1.5pt}
\end{figure*}
\begin{itemize}
\item spelling correction
\item authoring support
\item computer-assisted language learning
\item information retrieval
\item information extraction
\item text summarisation
\item question answering
\item speech recognition
\item speech synthesis
\end{itemize}
Language technology is an established field of research with an extensive body of introductory literature. The interested reader is referred to the following references: \cite{carstensen-etal1, jurafsky-martin01, manning-schuetze1, lt-world1, lt-survey1}.
Before discussing the application areas mentioned above, we briefly describe the architecture of a typical LT system.
\subsection{Application Architectures}
Typical software applications for language processing consist of several components that reflect different aspects of language and of the task they implement. Figure~\ref{fig:textprocessingarch_mt} shows a highly simplified architecture that can be found in a text processing system. The first three modules deal with the structure and meaning of the input text:
\begin{figure*}[b]
\colorrule{grey3}{\textwidth}{1.5pt}
\center
\includegraphics[width=\textwidth]{../_media/maltese/text_processing_app_architecture}
\caption{A typical text processing architecture}
\label{fig:textprocessingarch_mt}
\colorrule{grey3}{\textwidth}{1.5pt}
\end{figure*}
\begin{enumerate}
\item \textbf{Pre-processing}: cleaning the data, removing formatting where appropriate, detecting the input language, standardising the representation of special symbols such as the hyphen in Maltese.
\item \textbf{Grammatical analysis}: finding the verb and its objects, modifiers, etc.; detecting the sentence structure.
\item \textbf{Semantic analysis}: disambiguation (which meaning of \emph{bank} is the right one in a given context?), resolving anaphora and referring expressions such as ``she'', ``the car'', etc.; representing the meaning of the sentence in a machine-readable way.
\end{enumerate}
Task-specific modules then perform many different operations, such as automatic summarisation of an input text, database look-ups and many others. Below, we give examples of main application areas and highlight some of the modules of the different architectures in each section. Again, the architectures are highly simplified and idealised, and serve to illustrate the complexity of language technology applications in a generally understandable way.
\begin{figure*}[b]
\colorrule{grey3}{\textwidth}{1.5pt}
\center
\includegraphics[width=\textwidth]{../_media/maltese/language_checking}
\caption{Language checking (top: statistics-based; bottom: rule-based)}
\label{fig:langcheckingaarch_mt}
\colorrule{grey3}{\textwidth}{1.5pt}
\end{figure*}
After introducing the main application areas, we give a brief overview of the situation in LT research and education, and conclude with a description of (past) funding programmes. At the end of this section, we present an expert estimate of the situation regarding core LT tools and resources along a number of dimensions such as availability, maturity or quality. The general situation of LT for Maltese is summarised in Figure~\ref{fig:lrlttable_mt} (p.~\pageref{fig:lrlttable_mt}) at the end of this chapter. This table lists all the resources that are set in boldface in the text. LT support for Maltese is also compared with other languages that are part of this \emph{white paper} series.
\subsection{Core Application Areas}
In this section, we focus on the most important tools and resources and provide an overview of LT activities in Malta.
\subsubsection{Language Checking}
Anyone who uses a word processing tool such as Microsoft Word has come across a component that checks spelling, flags spelling mistakes and proposes corrections. Forty years after the first spelling correction program by Ralph Gorin, today's language checkers do not simply compare the list of extracted words against a dictionary of correctly spelled words, but have become increasingly sophisticated. In addition to language-dependent algorithms for handling morphology (e.g.~plural formation), some are now capable of recognising syntax-related errors, such as a missing verb or a verb that does not agree with its subject in person and number, e.g.~in \emph{`She *write a letter.'} However, most available checkers (including Microsoft Word) find no errors in the following first verse of a poem \cite{zar1}:
\begin{quote}
I have a spelling checker,\\
It came with my PC.\\
It plane lee marks four my revue\\
Miss steaks aye can knot sea.
\end{quote}
Handling this type of error requires an analysis of the context in many cases, e.g.~to decide in which position the silent \emph{għ} has to be written in a Maltese verb, as in the following example:
\begin{enumerate} %[(a)]
\item \textit{...~in-negozjati li kien għamel il-Gvern ...}
\item \textit{Pawlu, agħmel l-eżamijiet!}
\item *\textit{...~in-negozjati li kien agħmel il-Gvern ...}
\end{enumerate}
The two verbs \emph{għamel} and \emph{agħmel} are both pronounced \lingua{ɐː.mɛl}.
This either requires the formulation of language-specific \textbf{grammar} rules, i.e. a high degree of expertise and manual labour, or the use of a so-called statistical language model. Such models calculate the probability of a particular word occurring in a specific environment (i.e. the preceding and following words). For example, \emph{kien għamel} is a much more probable word sequence than \emph{kien agħmel}. A statistical language model can be derived automatically from a large amount of (suitable) language data (i.e. a \textbf{corpus}).
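One common way to make this concrete is a bigram model. This is only one possible choice, assumed here for illustration, and it uses the preceding word only; the count function $c(\cdot)$ is likewise notation introduced for this sketch. The probability of a word $w_i$ given the preceding word $w_{i-1}$ is estimated from corpus counts as
\[
P(w_i \mid w_{i-1}) = \frac{c(w_{i-1}\,w_i)}{c(w_{i-1})} .
\]
A checker built on such a model could flag \emph{kien agħmel} whenever the estimate for \emph{agħmel} after \emph{kien} is much lower than the estimate for \emph{għamel} after \emph{kien}.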
Up to now, these methods have mostly been developed and evaluated on English language data. However, they do not necessarily transfer well to highly inflected languages such as Maltese, where a particular word type, such as a verb, can yield a large number of orthographic forms.
As in other languages, a means of determining whether a given string is a valid word is not a sufficient condition for detecting spelling errors, but it is a necessary one. So far, although several attempts have been made, no such means exists for Maltese.
One of the first attempts was that of \cite{Mangion:1999}, which uses rudimentary forms of rule-based morphological analysis. A word was essentially considered valid if it could be derived from a stem found in a dictionary. The problem with this method is that it requires a complete list of all stems and, naturally, the rules have to be very precise. The results were somewhat limited by the list of stems, which was not complete, and by the imperfect nature of the rules.
Another approach looked to statistics for a solution. The intuitive idea is that, for a given language, certain character sequences are highly improbable. In English, for example, we almost never find the sequence \emph{kk}, so if this sequence occurs as a substring of a written word, we can predict, with a high degree of confidence, that the word is not valid. More generally, we can calculate the probability of a sequence as a function of the probabilities of all its subsequences, adopting the principle that for a word to be considered valid, its probability must exceed a certain threshold. A statistical spell checker making use of such a principle was developed by \cite{Mizzi:2000}. It did not require a dictionary, but was instead based on the distribution of character n-grams found in a newspaper corpus. It became clear that for this method to succeed (i) a more precise language model was needed, requiring more language data than was available at the time, and (ii) sequence probability alone was not sufficient to classify an orthographic word as an error. As suggested above, other information is needed, such as the word category of the surrounding context.
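A minimal sketch of this principle, assuming a character trigram model and a threshold $\theta$ (both choices and all symbols are illustrative and are not taken from \cite{Mizzi:2000}): for a word consisting of the characters $c_1 \ldots c_m$, with $c_{-1}$ and $c_0$ standing for padding symbols that mark the start of the word,
\[
P(c_1 \ldots c_m) \approx \prod_{i=1}^{m} P(c_i \mid c_{i-2}\,c_{i-1}),
\]
and the word would be accepted as valid only if $P(c_1 \ldots c_m) > \theta$. In practice one would probably normalise for word length, for instance by comparing the average log-probability per character with the threshold, since longer words otherwise receive systematically lower scores.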
Other attempts to develop a spell checker for Maltese include an online checker developed by Ramon Casha of the Linux User Group \cite{Linux-spellcheck1}. It is based on a word list of about one million word types, originally collected from a varied corpus and subsequently extended by means of different rules for handling inflections. Its accuracy has not been officially established. Microsoft has also been working on a spell checker for inclusion in its Maltese language interface pack, although it is not known when this will be released.
The use of language checking is not limited to word processing tools. Language checking is also applied to automatically correct queries sent to search engines, e.g.~Google's ``Did you mean ...'' suggestions.
As a result of the rapid increase in demand for technical products, many companies have begun to focus increasingly on the quality of technical documentation, faced with potential customer complaints about incorrect language use and damage claims resulting from poor or misunderstood instructions. Authoring support software can help the writer of technical documentation to use vocabulary and sentence structures that are consistent with certain formally expressed rules and (corporate) terminology restrictions.
Authoring support software does not currently exist for Maltese, but there may be considerable scope for using such software on the production side of Maltese. One of the reasons for the scarcity of written content in Maltese, for example in business correspondence, is that producing correct written Maltese is difficult. Many competent native speakers are prone to making mistakes when it comes to the written language, and therefore prefer to write in English.
The availability of simple, good tools to support writing could alleviate this problem.
Apart from an interactive picture dictionary on CD \cite{Sciriha:1997}, no such application has been developed for Maltese to date.
\subsubsection{Web Search}
\begin{figure*}[htb]
\colorrule{grey3}{\textwidth}{1.5pt}
\center
\includegraphics[width=\textwidth]{../_media/maltese/web_search_architecture}
\caption{Web search}
\label{fig:websearcharch_mt}
\colorrule{grey3}{\textwidth}{1.5pt}
\end{figure*}
Search on the web, in intranets or in digital libraries is probably the most widely used and yet underdeveloped language technology today. The search engine Google, which started in 1998, is nowadays used for about 80\% of all search queries worldwide \cite{spi1}. Since 2004, the verb \emph{google} has even been listed in the \emph{Cambridge Advanced Learner's Dictionary}. Neither the search interface nor the presentation of the retrieved results has changed significantly since the first version. In the current version, Google offers spelling correction for misspelled words and, in 2009, incorporated basic semantic search capabilities into its algorithmic mix \cite{pc1}, which can improve search accuracy by analysing the meaning of the query terms in context. The Google success story shows that with a substantial amount of available data and efficient techniques for indexing these data, a largely statistics-based approach can lead to satisfactory results.
\columnbreak
However, for more sophisticated information requests, integrating deeper linguistic knowledge is essential. In research laboratories, experiments using \textbf{lexical resources} such as machine-readable thesauri and ontological language resources like WordNet have shown improvements by allowing a page to be found on the basis of synonyms of the search terms, e.g.~\emph{atomic energy} and \emph{nuclear energy}, or even more loosely related terms.
\boxtext{The next generation of search engines will have to include more sophisticated language technology.}
The next generation of search engines will have to include much more sophisticated language technology. If a search query consists of a question or another type of sentence rather than a list of keywords, retrieving relevant answers to this query requires a syntactic and \textbf{semantic analysis} of the sentence as well as the availability of an index that allows fast retrieval of the relevant documents. For example, imagine a user enters the query ``Give me a list of all companies that were taken over by other companies in the last five years''. For a satisfactory answer, syntactic parsing needs to be applied to analyse the grammatical structure of the sentence and determine that the user is looking for companies that have been taken over by other companies. Moreover, the expression \emph{last five years} needs to be processed in order to find out which years it refers to.
Finally, the processed query needs to be matched against a huge amount of unstructured data in order to find the piece or pieces of information the user is looking for. This is commonly referred to as information retrieval (IR) and involves searching for and ranking relevant documents. In addition, to generate a list of companies, we also need to extract the information that a particular string of words in a document refers to a company name. This kind of information is made available by so-called named-entity recognition.
Even more demanding is the attempt to match a query to documents written in a different language. For cross-lingual information retrieval, we have to automatically translate the query into all possible source languages and transfer the retrieved information back into the target language. The increasing percentage of data available in non-textual formats drives the demand for services enabling multimedia information retrieval, i.e. search on images, audio and video data. For audio and video files, this involves a \textbf{speech recognition} module to convert speech content into text or a phonetic representation, against which user queries can be matched.
In Malta, there are a number of search websites that are specifically targeted at Malta \cite{philb1}. In addition, there is a small number of Malta-based SMEs that incorporate relatively sophisticated language processing techniques into search applications. Charonite \cite{charonite1}, for example, is a local SME that deals with search engine optimisation. However, there are currently no commercially available search engines specifically targeted at the Maltese language, apart from a prototype for cross-lingual information retrieval developed for the purposes of LT4eL \cite{let1}, a European FP6 research project that used multilingual language technology tools and semantic encoding techniques to improve the retrieval of learning material.
\subsubsection{Speech Interaction}
Speech interaction is the basis for creating interfaces that allow a user to interact with machines using spoken language rather than, for example, a graphical display, a keyboard and a mouse. Today, voice user interfaces (VUIs) are typically used for partially or fully automated service offerings provided by companies to their customers, employees or partners via the telephone. Business domains that rely heavily on VUIs are banking, logistics, public transport and telecommunications. Other uses of speech interaction technology are interfaces to particular devices, e.g.~in-car navigation systems, and the use of spoken language as an alternative to the input/output modalities of graphical user interfaces, e.g.~in smartphones.
  405. \boxtext{L-interazzjoni tat-taħdit hija l-bażi għall-ħolqien\\ ta’ interfaces li jippermettu lill-utent biex\\ jinteraġixxi billi juża l-lingwa mitkellma aktar\\ milli stampa grafika, tastiera, u mouse.}
  406. \begin{figure*}[htb]
  407. \colorrule{grey3}{\textwidth}{1.5pt}
  408. \center
  409. \includegraphics[width=\textwidth]{../_media/maltese/simple_speech-based_dialogue_architecture}
  410. \caption{Sistema tal-interazzjoni tat-taħdit}
  411. \label{fig:dialoguearch_mt}
  412. \colorrule{grey3}{\textwidth}{1.5pt}
  413. \end{figure*}
  414. Fil-qalba tagħha, l-Interazzjoni tat-Taħdit tinkludi l-erba’ teknoloġiji differenti li ġejjin:
  415. \begin{enumerate}
  416. \item L-\textbf{Identifikazzjoni awtomatika tat-taħdit} (ASR) hija responsabbli biex tiddetermina liema kliem kien attwalment mitkellem f’sekwenza partikolari ta’ ħsejjes imlissna minn utent.
  417. \item L-analiżi sintattika u l-interpretazzjoni semantika janalizzaw l-istruttura sintattika tat-tlissin tal-utent u jinterpretaw tal-aħħar skont l-għan tas-sistema rispettiva.
  418. \item Il-ġestjoni tad-djalogu hija meħtieġa sabiex jiġi determinata ma’ liema parti tas-sistema l-utent jinteraġixxi, liema azzjoni għandha tittieħed skont l-input tal-utent u l-funzjonalità tas-sistema.
  419. \item It-teknoloġija tas-\textbf{sinteżi tat-taħdit} (Text-to-Speech, TTS) hija użata biex tittrasforma l-kliem ta’ dik l-espressjoni fi ħsejjes li jkunu prodotti għall-utent.
  420. \end{enumerate}
  421. Waħda mill-isfidi ewlenin hija li jkollok sistema ASR li tidentifika l-kliem imlissen minn utent bl-aktar mod preċiż possibbli. Dan jeħtieġ jew restrizzjoni tal-firxa ta’ espressjonijiet possibbli tal-utent għal sett limitat ta’ kliem ewlieni, inkella l-ħolqien manwali ta’ mudelli lingwistiċi li jkopru firxa kbira ta’ espressjonijiet ta’ lingwa naturali tal-utent. Bl-użu ta' tekniki tat-tagħlim tal-magni, mudelli tal-lingwa jistgħu wkoll jiġu ġġenerati awtomatikament minn \textbf{korpora tat-taħdit}, jiġifieri kollezzjonijiet kbar ta' files audio ta' diskors u transkrizzjonijiet. Ir-restrizzjoni ta' espressjonijiet normalment jirriżulta f’użu pjuttost riġidu ta’ \emph{voice user interface} (VUI) u jikkawża aċċettazzjoni baxxa mill-utenti; iżda l-ħolqien, l-irfinar u l-manutenzjoni tal-mudelli lingwistiċi jistgħu jżidu l-ispejjeż b’mod sinifikanti. VUIs li jużaw mudelli lingwistiċi u fil-bidu jippermettu lill-utent biex jesprimi l-intenzjoni tiegħu b’mod flessibbli -- evokati, pereżempju, minn tislima bħal \emph{Kif nista’ ngħinek?} -- għandha aktar aċċettazzjoni mill-utenti.
Companies tend to make heavy use of utterances pre-recorded by professional speakers, ideally corporate voices. For static utterances, in which the wording does not depend on the particular context of use or on the personal data of a given user, this results in a rich user experience. However, the more dynamic content an utterance has to take into account, the more the user experience may suffer from the poor prosody that results from concatenating individual audio files. Through optimisation, today's TTS systems are getting better at producing natural-sounding prosody for dynamic utterances.
\boxtext{Speech interaction is the basis for creating\\ \emph{interfaces} that allow users to\\ interact through speech instead of using\\ a graphical display, keyboard, and mouse.}
Regarding the market for speech interaction technology, the last decade saw strong standardisation of the interfaces between the different technology components, as well as of the standards for creating particular software artefacts for a given application. There has also been strong market consolidation over the last ten years. The national markets in the G20 countries (i.e., economically strong countries with a considerable population) are dominated by fewer than five global players, with Nuance (USA) and Loquendo (Italy) being the most prominent in Europe. In 2011, Nuance announced the acquisition of Loquendo, which represents a further step in market consolidation.
Most speech technology development in Malta has concentrated on text-to-speech (TTS). Some early pioneering work was carried out by \cite{Micallef:1997}, and this was followed by a number of Master's-level theses \cite{Farrugia:2005}. Work on a web-based TTS system was begun by \cite{Buhagiar-Micallef:2008}.
A significant development in speech synthesis for Maltese was the award of a government tender for the development of a speech synthesiser to the local company Crimson Wing Ltd.~Malta. This work is partly financed by the European Regional Development Fund and was commissioned by the Foundation for Information Technology Accessibility (FITA). The prototype will be SAPI-compliant and will include three voices (male, female, and child). According to a recent presentation \cite{Borg-et-al:2011}, the work is progressing well, and a prototype, expected in 2012, will be available for download free of charge.
Work on speech recognition is less advanced. A prototype for recognising numbers in simple domains was created by \cite{Calleja:2002}. As regards speech, the fundamental problem remains the lack of suitably annotated data, since producing it requires significant manual effort. Some attempts to automate this step were made by \cite{Psaila:2008}. The creation of a corpus and a descriptive framework for the study of Maltese intonation was initiated at the Institute of Linguistics and carried out by Vella and Farrugia \cite{Vella-Farrugia:2006}. The corpus being developed by Crimson Wing is expected to be made available for research.
Looking beyond today's state of technology, there will be significant changes due to the spread of smartphones as a new platform for managing customer relationships, in addition to the telephone, internet, and email channels. This tendency will also affect the use of speech interaction technology. On the one hand, demand for telephony-based VUIs will decrease in the long run. On the other hand, the use of spoken language as a user-friendly input modality for smartphones will gain significant importance.
This tendency is supported by the observable improvement in the accuracy of speaker-independent speech recognition for dictation services, which are already offered as centralised services to smartphone users. Given this ``outsourcing'' of the recognition task to the infrastructure of applications, the application-specific use of core language technologies is expected to gain importance compared to the present situation.
\subsubsection{Machine Translation}
\begin{figure*}[htb]
\colorrule{grey3}{\textwidth}{1.5pt}
\centering
\bigskip
\includegraphics[width=\textwidth]{../_media/maltese/machine_translation}
\caption{Machine translation (left: statistical; right: rule-based)}
\label{fig:mtarch_mt}
\colorrule{grey3}{\textwidth}{1.5pt}
\end{figure*}
The idea of using digital computers to translate natural languages was put forward in 1946 by A.~D.~Booth and was followed by substantial funding for research in this area in the 1950s and again in the 1980s. Nevertheless, \textbf{machine translation} (MT) still fails to fulfil the high expectations it raised in its early years.
At its most basic level, MT simply substitutes the words of one natural language with those of another. This can be useful in subject domains with a very restricted, formulaic language, for example weather reports.
However, for a good translation of less standardised texts, larger text units (phrases, sentences, or even whole passages) need to be matched to their closest counterparts in the target language. The major difficulty here is that human language is ambiguous, which poses challenges on multiple levels, e.g., word sense disambiguation on the lexical level (`Jaguar' can mean a car or an animal) or the attachment of prepositional phrases on the syntactic level, as in the following examples:
\begin{enumerate}%[(a)]
\item \emph{The policeman observed the man with the telescope.}
\item \emph{The policeman observed the man with the revolver.}
\end{enumerate}
One way of approaching the task is based on linguistic rules. For translations between closely related languages, a direct translation may be feasible in cases like the example above. But rule-based (or knowledge-driven) systems often analyse the input text and create an intermediary, symbolic representation, from which the text in the target language is generated. The success of these methods depends heavily on the availability of extensive lexicons with morphological, syntactic, and semantic information, and of large sets of \textbf{grammar} rules carefully designed by a skilled linguist.
\boxtext{At its most basic level, MT simply\\ substitutes the words of one\\[.3mm] natural language with those of another.}
Towards the end of the 1980s, as computational power increased and became less expensive, more interest was shown in statistical models for MT. The parameters of these statistical models are derived from the analysis of bilingual \textbf{text corpora}, such as the Europarl \textbf{parallel corpus}, which contains the proceedings of the European Parliament in 21 European languages. Given enough data, statistical MT works well enough to derive an approximate meaning of a foreign-language text. However, unlike knowledge-driven systems, statistical (or data-driven) MT often generates ungrammatical output. On the other hand, besides the advantage that less human effort is required for grammar writing, data-driven MT can also cover particularities of the language that go missing in knowledge-driven systems, for example idiomatic expressions.
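To make the notion of ``parameters derived from bilingual corpora'' more concrete, the classical noisy-channel formulation of statistical MT can serve as a sketch. It is given here as one well-known formulation under stated assumptions, not as the method of any specific system discussed in this paper; the symbols $f$ (source sentence) and $e$ (target sentence) are introduced only for this illustration:
\[
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e).
\]
The translation model $P(f \mid e)$ is estimated from a parallel corpus, while the language model $P(e)$ is estimated from monolingual text in the target language. A weak or poorly trained $P(e)$ is one reason why the output of such a system can be ungrammatical.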
As the strengths and weaknesses of knowledge-driven and data-driven MT are complementary, researchers nowadays unanimously target hybrid approaches that combine the methodologies of both. This can be done in several ways. One is to use both knowledge-driven and data-driven systems and have a selection module decide on the best output for each sentence. However, for longer sentences, no result will be perfect. A better solution is to combine the best parts of each sentence from multiple outputs, which can be fairly complex, as corresponding parts of multiple alternatives are not always obvious and need to be aligned.
\boxtext{The quality of MT systems for Maltese is considered to still have great potential for improvement.}
In Malta, work on machine translation has been restricted to a few Bachelor's and Master's theses. An LFG-based transfer system for English/Maltese was developed by \cite{Farrugia:2000} and successfully translated weather forecasts. Later, J.~Bajada \cite{Bajada:2004, Bajada:2009} worked on statistical MT (SMT), with an emphasis on techniques for producing language models and translation models. The earlier work concerned word-based models, while the later work developed techniques for harvesting bilingual phrase data from a limited corpus.
As in many other areas, the basic problem is the lack of large quantities of suitably annotated bilingual data. Perhaps for this reason, the benchmark system against which advances are judged remains Google Translate.
The quality of MT systems is considered to still have great potential for improvement. Challenges include the adaptability of language resources to a given subject domain or user area and the integration into existing workflows with term bases and translation memories. In addition, most of the current systems are English-centred and support only a few languages from and into Maltese, which leads to friction in the overall translation workflow and, e.g., forces MT users to learn different lexicon coding tools for different systems.
Evaluation campaigns make it possible to compare the quality of MT systems, the various approaches, and the status of MT systems for different languages. Figure~\ref{fig:euromatrix_mt}, presented within the EC Euromatrix+ project, shows the pairwise performance obtained for 22 official EU languages (Irish Gaelic is missing) in terms of BLEU score \cite{bleu1}. The higher the score, the better the translation. A human translator would achieve around 80 \cite{bleu1}.
The best results (shown in green and blue) were achieved by languages that benefit from considerable research efforts within coordinated programmes and from the existence of many \textbf{parallel corpora} (e.g., English, French, Dutch, Spanish); the worst (in red) by languages that have not benefited from similar efforts or that are very different from other languages (e.g., Hungarian, Maltese, Finnish).
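For readers unfamiliar with the metric, the BLEU score used in Figure~\ref{fig:euromatrix_mt} can be summarised as follows. This is the standard definition of the metric, restated as a brief sketch; the symbols $p_n$, $w_n$, $c$, and $r$ are introduced only for this illustration:
\[
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left( \sum_{n=1}^{N} w_n \log p_n \right), \qquad \mathrm{BP} = \min\!\left( 1,\; e^{\,1 - r/c} \right).
\]
Here $p_n$ is the modified $n$-gram precision of the system output against the reference translations, $w_n$ are weights (typically $w_n = 1/N$ with $N = 4$), $c$ is the length of the system output, and $r$ is the reference length. The brevity penalty $\mathrm{BP}$ lowers the score of outputs that are shorter than the reference. The raw value lies between 0 and 1; the values in the figure are evidently reported on a scale from 0 to 100.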
\begin{figure*}[htbp]
\centering
\setlength{\tabcolsep}{0.17em}
\small
\begin{tabular}{>{\columncolor{corange1}}cccccccccccccccccccccccc}
& \multicolumn{22}{>{\columncolor{corange1}}c}{Target language}\\\addlinespace[{-.009cm}]
\rowcolor{corange1} & EN & BG & DE & CS & DA & EL & ES & ET & FI & FR & HU & IT & LT & LV & MT & NL & PL & PT & RO & SK & SL & SV\\
EN & -- & \textcolor{blue}{40.5} & \textcolor{blue}{46.8} & \textcolor{green2}{52.6} & \textcolor{green2}{50.0} & \textcolor{blue}{41.0} & \textcolor{green2}{55.2} & \textcolor{purple}{34.8} & \textcolor{purple}{38.6} & \textcolor{green2}{50.1} & \textcolor{purple}{37.2} & \textcolor{green2}{50.4} & \textcolor{purple}{39.6} & \textcolor{blue}{43.4} & \textcolor{purple}{39.8} & \textcolor{green2}{52.3} & \textcolor{blue}{49.2} & \textcolor{green2}{55.0} & \textcolor{blue}{49.0} & \textcolor{blue}{44.7} & \textcolor{green2}{50.7} & \textcolor{green2}{52.0}\\
BG & \textcolor{green}{61.3} & -- & \textcolor{purple}{38.7} & \textcolor{purple}{39.4} & \textcolor{purple}{39.6} & \textcolor{purple}{34.5} & \textcolor{blue}{46.9} & \textcolor{red3}{25.5} & \textcolor{red3}{26.7} & \textcolor{blue}{42.4} & \textcolor{red3}{22.0} & \textcolor{blue}{43.5} & \textcolor{red3}{29.3} & \textcolor{red3}{29.1} & \textcolor{red3}{25.9} & \textcolor{blue}{44.9} & \textcolor{purple}{35.1} & \textcolor{blue}{45.9} & \textcolor{purple}{36.8} & \textcolor{purple}{34.1} & \textcolor{purple}{34.1} & \textcolor{purple}{39.9}\\
DE & \textcolor{green2}{53.6} & \textcolor{red3}{26.3} & -- & \textcolor{purple}{35.4} & \textcolor{blue}{43.1} & \textcolor{purple}{32.8} & \textcolor{blue}{47.1} & \textcolor{red3}{26.7} & \textcolor{red3}{29.5} & \textcolor{purple}{39.4} & \textcolor{red3}{27.6} & \textcolor{blue}{42.7} & \textcolor{red3}{27.6} & \textcolor{purple}{30.3} & \textcolor{red2}{19.8} & \textcolor{green2}{50.2} & \textcolor{purple}{30.2} & \textcolor{blue}{44.1} & \textcolor{purple}{30.7} & \textcolor{red3}{29.4} & \textcolor{purple}{31.4} & \textcolor{blue}{41.2}\\
CS & \textcolor{green2}{58.4} & \textcolor{purple}{32.0} & \textcolor{blue}{42.6} & -- & \textcolor{blue}{43.6} & \textcolor{purple}{34.6} & \textcolor{blue}{48.9} & \textcolor{purple}{30.7} & \textcolor{purple}{30.5} & \textcolor{blue}{41.6} & \textcolor{red3}{27.4} & \textcolor{blue}{44.3} & \textcolor{purple}{34.5} & \textcolor{purple}{35.8} & \textcolor{red3}{26.3} & \textcolor{blue}{46.5} & \textcolor{purple}{39.2} & \textcolor{blue}{45.7} & \textcolor{purple}{36.5} & \textcolor{blue}{43.6} & \textcolor{blue}{41.3} & \textcolor{blue}{42.9}\\
DA & \textcolor{green2}{57.6} & \textcolor{red3}{28.7} & \textcolor{blue}{44.1} & \textcolor{purple}{35.7} & -- & \textcolor{purple}{34.3} & \textcolor{blue}{47.5} & \textcolor{red3}{27.8} & \textcolor{purple}{31.6} & \textcolor{blue}{41.3} & \textcolor{red3}{24.2} & \textcolor{blue}{43.8} & \textcolor{red3}{29.7} & \textcolor{purple}{32.9} & \textcolor{red3}{21.1} & \textcolor{blue}{48.5} & \textcolor{purple}{34.3} & \textcolor{blue}{45.4} & \textcolor{purple}{33.9} & \textcolor{purple}{33.0} & \textcolor{purple}{36.2} & \textcolor{blue}{47.2}\\
EL & \textcolor{green2}{59.5} & \textcolor{purple}{32.4} & \textcolor{blue}{43.1} & \textcolor{purple}{37.7} & \textcolor{blue}{44.5} & -- & \textcolor{green2}{54.0} & \textcolor{red3}{26.5} & \textcolor{red3}{29.0} & \textcolor{blue}{48.3} & \textcolor{red3}{23.7} & \textcolor{blue}{49.6} & \textcolor{red3}{29.0} & \textcolor{purple}{32.6} & \textcolor{red3}{23.8} & \textcolor{blue}{48.9} & \textcolor{purple}{34.2} & \textcolor{green2}{52.5} & \textcolor{purple}{37.2} & \textcolor{purple}{33.1} & \textcolor{purple}{36.3} & \textcolor{blue}{43.3}\\
ES & \textcolor{green}{60.0} & \textcolor{purple}{31.1} & \textcolor{blue}{42.7} & \textcolor{purple}{37.5} & \textcolor{blue}{44.4} & \textcolor{purple}{39.4} & -- & \textcolor{red3}{25.4} & \textcolor{red3}{28.5} & \textcolor{green2}{51.3} & \textcolor{red3}{24.0} & \textcolor{green2}{51.7} & \textcolor{red3}{26.8} & \textcolor{purple}{30.5} & \textcolor{red3}{24.6} & \textcolor{blue}{48.8} & \textcolor{purple}{33.9} & \textcolor{green2}{57.3} & \textcolor{purple}{38.1} & \textcolor{purple}{31.7} & \textcolor{purple}{33.9} & \textcolor{blue}{43.7}\\
ET & \textcolor{green2}{52.0} & \textcolor{red3}{24.6} & \textcolor{purple}{37.3} & \textcolor{purple}{35.2} & \textcolor{purple}{37.8} & \textcolor{red3}{28.2} & \textcolor{blue}{40.4} & -- & \textcolor{purple}{37.7} & \textcolor{purple}{33.4} & \textcolor{purple}{30.9} & \textcolor{purple}{37.0} & \textcolor{purple}{35.0} & \textcolor{purple}{36.9} & \textcolor{red3}{20.5} & \textcolor{blue}{41.3} & \textcolor{purple}{32.0} & \textcolor{purple}{37.8} & \textcolor{red3}{28.0} & \textcolor{purple}{30.6} & \textcolor{purple}{32.9} & \textcolor{purple}{37.3}\\
FI & \textcolor{blue}{49.3} & \textcolor{red3}{23.2} & \textcolor{purple}{36.0} & \textcolor{purple}{32.0} & \textcolor{purple}{37.9} & \textcolor{red3}{27.2} & \textcolor{purple}{39.7} & \textcolor{purple}{34.9} & -- & \textcolor{red3}{29.5} & \textcolor{red3}{27.2} & \textcolor{purple}{36.6} & \textcolor{purple}{30.5} & \textcolor{purple}{32.5} & \textcolor{red2}{19.4} & \textcolor{blue}{40.6} & \textcolor{red3}{28.8} & \textcolor{purple}{37.5} & \textcolor{red3}{26.5} & \textcolor{red3}{27.3} & \textcolor{red3}{28.2} & \textcolor{purple}{37.6}\\
FR & \textcolor{green}{64.0} & \textcolor{purple}{34.5} & \textcolor{blue}{45.1} & \textcolor{purple}{39.5} & \textcolor{blue}{47.4} & \textcolor{blue}{42.8} & \textcolor{green}{60.9} & \textcolor{red3}{26.7} & \textcolor{purple}{30.0} & -- & \textcolor{red3}{25.5} & \textcolor{green2}{56.1} & \textcolor{red3}{28.3} & \textcolor{purple}{31.9} & \textcolor{red3}{25.3} & \textcolor{green2}{51.6} & \textcolor{purple}{35.7} & \textcolor{green}{61.0} & \textcolor{blue}{43.8} & \textcolor{purple}{33.1} & \textcolor{purple}{35.6} & \textcolor{blue}{45.8}\\
HU & \textcolor{blue}{48.0} & \textcolor{red3}{24.7} & \textcolor{purple}{34.3} & \textcolor{purple}{30.0} & \textcolor{purple}{33.0} & \textcolor{red3}{25.5} & \textcolor{purple}{34.1} & \textcolor{red3}{29.6} & \textcolor{red3}{29.4} & \textcolor{purple}{30.7} & -- & \textcolor{purple}{33.5} & \textcolor{red3}{29.6} & \textcolor{purple}{31.9} & \textcolor{red2}{18.1} & \textcolor{purple}{36.1} & \textcolor{red3}{29.8} & \textcolor{purple}{34.2} & \textcolor{red3}{25.7} & \textcolor{red3}{25.6} & \textcolor{red3}{28.2} & \textcolor{purple}{30.5}\\
IT & \textcolor{green}{61.0} & \textcolor{purple}{32.1} & \textcolor{blue}{44.3} & \textcolor{purple}{38.9} & \textcolor{blue}{45.8} & \textcolor{blue}{40.6} & \textcolor{red3}{26.9} & \textcolor{red3}{25.0} & \textcolor{red3}{29.7} & \textcolor{green2}{52.7} & \textcolor{red3}{24.2} & -- & \textcolor{red3}{29.4} & \textcolor{purple}{32.6} & \textcolor{red3}{24.6} & \textcolor{green2}{50.5} & \textcolor{purple}{35.2} & \textcolor{green2}{56.5} & \textcolor{purple}{39.3} & \textcolor{purple}{32.5} & \textcolor{purple}{34.7} & \textcolor{blue}{44.3}\\
LT & \textcolor{green2}{51.8} & \textcolor{red3}{27.6} & \textcolor{purple}{33.9} & \textcolor{purple}{37.0} & \textcolor{purple}{36.8} & \textcolor{red3}{26.5} & \textcolor{red3}{21.1} & \textcolor{purple}{34.2} & \textcolor{purple}{32.0} & \textcolor{purple}{34.4} & \textcolor{red3}{28.5} & \textcolor{purple}{36.8} & -- & \textcolor{blue}{40.1} & \textcolor{red3}{22.2} & \textcolor{purple}{38.1} & \textcolor{purple}{31.6} & \textcolor{purple}{31.6} & \textcolor{red3}{29.3} & \textcolor{purple}{31.8} & \textcolor{purple}{35.3} & \textcolor{purple}{35.3}\\
LV & \textcolor{green2}{54.0} & \textcolor{red3}{29.1} & \textcolor{purple}{35.0} & \textcolor{purple}{37.8} & \textcolor{purple}{38.5} & \textcolor{red3}{29.7} & \textcolor{red2}{8.0} & \textcolor{purple}{34.2} & \textcolor{purple}{32.4} & \textcolor{purple}{35.6} & \textcolor{red3}{29.3} & \textcolor{purple}{38.9} & \textcolor{purple}{38.4} & -- & \textcolor{red3}{23.3} & \textcolor{blue}{41.5} & \textcolor{purple}{34.4} & \textcolor{purple}{39.6} & \textcolor{purple}{31.0} & \textcolor{purple}{33.3} & \textcolor{purple}{37.1} & \textcolor{purple}{38.0}\\
MT & \textcolor{green}{72.1} & \textcolor{purple}{32.2} & \textcolor{purple}{37.2} & \textcolor{purple}{37.9} & \textcolor{purple}{38.9} & \textcolor{purple}{33.7} & \textcolor{blue}{48.7} & \textcolor{red3}{26.9} & \textcolor{red3}{25.8} & \textcolor{blue}{42.4} & \textcolor{red3}{22.4} & \textcolor{blue}{43.7} & \textcolor{purple}{30.2} & \textcolor{purple}{33.2} & -- & \textcolor{blue}{44.0} & \textcolor{purple}{37.1} & \textcolor{blue}{45.9} & \textcolor{purple}{38.9} & \textcolor{purple}{35.8} & \textcolor{blue}{40.0} & \textcolor{blue}{41.6}\\
NL & \textcolor{green2}{56.9} & \textcolor{red3}{29.3} & \textcolor{blue}{46.9} & \textcolor{purple}{37.0} & \textcolor{blue}{45.4} & \textcolor{purple}{35.3} & \textcolor{blue}{49.7} & \textcolor{red3}{27.5} & \textcolor{red3}{29.8} & \textcolor{blue}{43.4} & \textcolor{red3}{25.3} & \textcolor{blue}{44.5} & \textcolor{red3}{28.6} & \textcolor{purple}{31.7} & \textcolor{red3}{22.0} & -- & \textcolor{purple}{32.0} & \textcolor{blue}{47.7} & \textcolor{purple}{33.0} & \textcolor{purple}{30.1} & \textcolor{purple}{34.6} & \textcolor{blue}{43.6}\\
PL & \textcolor{green}{60.8} & \textcolor{purple}{31.5} & \textcolor{blue}{40.2} & \textcolor{blue}{44.2} & \textcolor{blue}{42.1} & \textcolor{purple}{34.2} & \textcolor{blue}{46.2} & \textcolor{red3}{29.2} & \textcolor{red3}{29.0} & \textcolor{blue}{40.0} & \textcolor{red3}{24.5} & \textcolor{blue}{43.2} & \textcolor{purple}{33.2} & \textcolor{purple}{35.6} & \textcolor{red3}{27.9} & \textcolor{blue}{44.8} & -- & \textcolor{blue}{44.1} & \textcolor{purple}{38.2} & \textcolor{purple}{38.2} & \textcolor{purple}{39.8} & \textcolor{blue}{42.1}\\
PT & \textcolor{green}{60.7} & \textcolor{purple}{31.4} & \textcolor{blue}{42.9} & \textcolor{purple}{38.4} & \textcolor{blue}{42.8} & \textcolor{blue}{40.2} & \textcolor{green}{60.7} & \textcolor{red3}{26.4} & \textcolor{red3}{29.2} & \textcolor{green2}{53.2} & \textcolor{red3}{23.8} & \textcolor{green2}{52.8} & \textcolor{red3}{28.0} & \textcolor{purple}{31.5} & \textcolor{red3}{24.8} & \textcolor{blue}{49.3} & \textcolor{purple}{34.5} & -- & \textcolor{purple}{39.4} & \textcolor{purple}{32.1} & \textcolor{purple}{34.4} & \textcolor{blue}{43.9}\\
RO & \textcolor{green}{60.8} & \textcolor{purple}{33.1} & \textcolor{purple}{38.5} & \textcolor{purple}{37.8} & \textcolor{blue}{40.3} & \textcolor{purple}{35.6} & \textcolor{green2}{50.4} & \textcolor{red3}{24.6} & \textcolor{red3}{26.2} & \textcolor{blue}{46.5} & \textcolor{red3}{25.0} & \textcolor{blue}{44.8} & \textcolor{red3}{28.4} & \textcolor{red3}{29.9} & \textcolor{red3}{28.7} & \textcolor{blue}{43.0} & \textcolor{purple}{35.8} & \textcolor{blue}{48.5} & -- & \textcolor{purple}{31.5} & \textcolor{purple}{35.1} & \textcolor{purple}{39.4}\\
SK & \textcolor{green}{60.8} & \textcolor{purple}{32.6} & \textcolor{purple}{39.4} & \textcolor{blue}{48.1} & \textcolor{blue}{41.0} & \textcolor{purple}{33.3} & \textcolor{blue}{46.2} & \textcolor{red3}{29.8} & \textcolor{red3}{28.4} & \textcolor{purple}{39.4} & \textcolor{red3}{27.4} & \textcolor{blue}{41.8} & \textcolor{purple}{33.8} & \textcolor{purple}{36.7} & \textcolor{red3}{28.5} & \textcolor{blue}{44.4} & \textcolor{purple}{39.0} & \textcolor{blue}{43.3} & \textcolor{purple}{35.3} & -- & \textcolor{blue}{42.6} & \textcolor{blue}{41.8}\\
SL & \textcolor{green}{61.0} & \textcolor{purple}{33.1} & \textcolor{purple}{37.9} & \textcolor{blue}{43.5} & \textcolor{blue}{42.6} & \textcolor{purple}{34.0} & \textcolor{blue}{47.0} & \textcolor{purple}{31.1} & \textcolor{red3}{28.8} & \textcolor{purple}{38.2} & \textcolor{red3}{25.7} & \textcolor{blue}{42.3} & \textcolor{purple}{34.6} & \textcolor{purple}{37.3} & \textcolor{purple}{30.0} & \textcolor{blue}{45.9} & \textcolor{purple}{38.2} & \textcolor{blue}{44.1} & \textcolor{purple}{35.8} & \textcolor{purple}{38.9} & -- & \textcolor{blue}{42.7}\\
SV & \textcolor{green2}{58.5} & \textcolor{red3}{26.9} & \textcolor{blue}{41.0} & \textcolor{purple}{35.6} & \textcolor{blue}{46.6} & \textcolor{purple}{33.3} & \textcolor{blue}{46.6} & \textcolor{red3}{27.4} & \textcolor{purple}{30.9} & \textcolor{purple}{38.9} & \textcolor{red3}{22.7} & \textcolor{blue}{42.0} & \textcolor{red3}{28.2} & \textcolor{purple}{31.0} & \textcolor{red3}{23.7} & \textcolor{blue}{45.6} & \textcolor{purple}{32.2} & \textcolor{blue}{44.2} & \textcolor{purple}{32.7} & \textcolor{purple}{31.3} & \textcolor{purple}{33.5} & --\\
\end{tabular}
\caption{Machine translation between 22 official EU languages \cite{euro1}}
\label{fig:euromatrix_mt}
\end{figure*}
\subsection{Other Application Areas}
Building language technology applications involves a range of subtasks that do not always surface at the level of interaction with the user, but that provide significant service functionalities ``behind the scenes'' of the system. They therefore constitute important research issues that have become individual sub-disciplines of computational linguistics in academia.
\boxtext{Language technology applications often provide significant service functionalities ``behind the scenes'' of larger software systems.}
Question answering has become an active area of research, for which annotated \textbf{corpora} have been built and scientific competitions have been started. The idea is to move from keyword-based search (to which the engine responds with a whole collection of potentially relevant documents) to a scenario in which the user asks a concrete question and the system provides a single answer:
\begin{itemize}
\item[] \textit{Question: At what age did Neil Armstrong take the first step on the moon?}
\item[] \textit{Answer: 38.}
\end{itemize}
While this is obviously related to the aforementioned core area of web search, question answering nowadays is primarily an umbrella term for research questions such as which types of questions should be distinguished and how they should be handled, how a set of documents that potentially contain the answer can be analysed and compared (do they give conflicting answers?), and how specific information (the answer) can be reliably extracted from a document without ignoring the context.
This is in turn related to the information extraction (IE) task, an area that was extremely popular and influential at the time of the statistical turn in computational linguistics in the early 1990s. IE aims at identifying specific pieces of information in specific classes of documents; this could be, e.g., detecting the key players in company takeovers as reported in newspaper stories. Another scenario that has been worked on is reports on terrorist incidents, where the problem is to map the text to a template specifying the perpetrator, the target, the time and location, and the results of the incident. Domain-specific template filling is the central characteristic of IE, which for this reason is another example of a ``behind the scenes'' technology that constitutes a well-demarcated research area but, for practical purposes, then needs to be embedded in a suitable application environment.
Two ``borderline'' areas, which sometimes play the role of a stand-alone application and sometimes that of a supporting, ``under the hood'' component, are text summarisation and \textbf{text generation}. Summarisation obviously refers to the task of making a long text short, and is offered, for instance, as a functionality within MS Word. It works largely on a statistical basis, by first identifying the ``important'' words in a text (that is, for example, words that are highly frequent in this text but markedly less frequent in general language use) and then determining those sentences that contain many important words. These sentences are then marked in the document, or extracted from it, and are taken to constitute the summary. In this scenario, which is by far the most popular one, summarisation equals sentence extraction: the text is reduced to a subset of its sentences. All commercial summarisers make use of this idea. An alternative approach, to which some research is devoted, is to actually synthesise \emph{new} sentences, i.e., to build a summary of sentences that need not show up in that form in the source text. This requires a certain amount of deeper understanding of the text and is therefore much less robust. All in all, a text generator is in most cases not a stand-alone application but embedded in a larger software environment, such as a clinical information system in which patient data is collected, stored, and processed, and report generation is just one of many functionalities.
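The statistical sentence-extraction idea described above can be made concrete with a small sketch. It is one possible formalisation under stated assumptions, not the scoring used by any particular product; the symbols $f_T$, $f_G$, $N_T$, and $N_G$ are introduced only for this illustration. Let $f_T(w)$ be the number of occurrences of word $w$ in the text $T$ of length $N_T$, and $f_G(w)$ its number of occurrences in a general reference corpus of length $N_G$. The importance of a word and the score of a sentence $s$ can then be defined as
\[
\mathrm{imp}(w) = \frac{f_T(w) / N_T}{f_G(w) / N_G}, \qquad \mathrm{score}(s) = \sum_{w \in s} \mathrm{imp}(w).
\]
The summary consists of the $k$ highest-scoring sentences, presented in their original order. In practice $f_G(w)$ would need smoothing so that words absent from the reference corpus do not cause a division by zero, and sentence scores are often normalised by sentence length so that long sentences are not favoured automatically.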
\subsection{Education Programmes}
Language technology (LT) is a highly interdisciplinary field that involves the expertise of linguists, computer scientists, mathematicians, philosophers, psycholinguists, and neuroscientists, among others.
In Malta, the vast majority of research and education in LT has been carried out at the University of Malta. However, this was established rather late. One reason for this was the late appearance of Computer Science as a curricular subject at the University. The country's turbulent political leadership during the 1970s and 1980s did not foresee the revolution in informatics that was to take place, and it was only in the early 1990s that a university course option was offered through the Faculty of Computer Science with Mathematics.
The roots of this change date back to 1994, when a national strategic initiative was undertaken to recognise and strengthen the role of IT in the commercial, political and, above all, educational sectors. One of the immediate consequences was the introduction of a substantial four-year Bachelor's programme, the BSc.~IT (Hons), at the University, as well as the establishment of a new Department of Computer Science and Artificial Intelligence (CSAI, renamed the ``Department of Intelligent Computer Systems (ICS)'' in 2009). A course in NLP was included as an advanced option, and this led, four years later, to a series of final-year undergraduate projects that addressed language processing issues, including computational methods for Maltese \cite{Galea:1999, Mangion:1999, Farrugia:1999, Farrugia:2000, Mizzi:2000, Bajada:2004, Attard:2005, Farrugia:2008, Farrugia:2009, Vella:2010}. The Department of Communications and Computer Engineering also took part in the programme, and this led to a further set of undergraduate projects in speech technology.
Another important influence on research is the University's Institute of Linguistics (IOL), founded in 1988 with the aims of teaching as well as promoting and coordinating research in both general and applied linguistics, furthering research involving the description of particular languages, not least Maltese, fostering the study of various sub-fields of linguistics, and promoting interdisciplinary research that involves academics in practical cooperation across departmental boundaries and with faculties abroad. The Institute of Linguistics runs two undergraduate programmes: the B.A.~in General Linguistics and the new B.Sc.~in Human Language Technology, which will be offered from October 2011. It is also possible to read for a Master's degree and a doctorate in Linguistics with the Institute.
In 1997, an interdisciplinary group of computer scientists and linguists (M.~Rosner, R.~Fabri, J.~Caruana, M.~Montebello and others) began working on Maltilex, a project to create a computational lexicon, which was sustained by a small grant from the University supported by Mid-Med Bank. A simple web interface was developed to allow entries to be created and maintained, as reported at the first ACL Workshop on Computational Approaches to Semitic Languages \cite{Rosner-et-al:1998}.
Thousands of such entries were made manually, but the project ran into legal problems, since the compilation of entries had been largely inspired by Joseph Aquilina's printed dictionary \cite{Aquilina:1987,Aquilina:1990}.
The effort then shifted from printed dictionaries to acquiring lexical entries from other sources. Two theses \cite{Dalli:2001, Attard:2005} used an alignment-based technique borrowed from bioinformatics to cluster lexical entries, and this was used as a means of structuring the lexicon automatically.
Despite the lack of funding, the Maltilex effort continued in a somewhat fragmented fashion, supported by staff of the IOL and the CSAI Department. It was not until 2005 that the Malta Council for Science and Technology (MCST) launched the country's first Research and Technological Development Initiative and a joint proposal for the Maltese Language Resource Server (MLRS) was accepted, which provided sufficient financial support to employ a full-time researcher between 2006 and 2008. The project had the twin aims of creating both a lexicon and a corpus \cite{Rosner:2009}, and it laid the foundations for the present MLRS server.
The research mentioned above deals mainly with written language. Two strands of speech-related work are also under way.

The first grew out of a signal processing tradition within the Faculty of Engineering and produced a prototype speech synthesiser \cite{Micallef:1997}. This work influenced several other projects aimed at improving speech synthesis from a low-resource perspective, including \cite{Calleja:2002, Farrugia:2005, Camilleri:2010, Borg-et-al:2011}.

The second addresses intonation \cite{Vella:2009} from a linguistic perspective. Pioneering work on creating a corpus and a descriptive framework for the study of Maltese intonation was carried out by Vella and Farrugia \cite{Vella-Farrugia:2006}.
Outside Malta, two research groups that collaborate actively with local LT efforts deserve special mention.

At the University of Arizona, a group led by the linguist Adam Ussishkin is particularly interested in psycholinguistic issues pertaining to Semitic languages, including Maltese. An online corpus has been made available to support the study of these issues \cite{Ussishkin-et-al:2009}.

At the University of Bremen, Professor Thomas Stolz has been actively involved in the academic study of Maltese. He is particularly known for hosting the first conference on Maltese Linguistics in Bremen \cite{Comrie-et-al:2009}, and for founding a journal \cite{GHILM2} and the International Association of Maltese Linguistics, also based in Bremen, which works together with the Malta-based Council for the Maltese Language.
As already mentioned, the LT-aware communities at the University of Malta are found mainly within the Faculty of ICT and the Institute of Linguistics. There is also potential interest in the Faculty of Arts (Department of Maltese) and in other Humanities subjects. So far, however, computational linguistics has tended to be regarded as an exotic subject that sits in more scientific faculties such as computer science, or in the humanities, and as a result the research themes addressed overlap only partially.
Curiously, Malta is not short of international LT-related events. LREC 2010 was held in Valletta and attracted 1200 participants. The annual EAMT conference was also held in Malta, in 1994, and a number of smaller workshops have taken place over the last 10 years.
\subsection{National Programmes and Efforts}

Malta joined the EU in 2004, and this immediately gave Maltese the status of an official EU language. With this status came new obligations, in particular to translate large quantities of official documents. It also brought recognition at European level that, as a national language, Maltese should have ``first-class'' status from a technological as well as a social perspective, and should be granted all the rights and privileges enjoyed by the ``larger'' European languages (that is, those with greater numbers of native speakers).

The government's National IT Strategy 2008--10 included a number of goals related to the Maltese language, including (i) the development of online government services in Maltese, (ii) the creation of tools for the Maltese language in collaboration with the University, and (iii) support for online communities in Maltese. At the time of writing in 2011, not all of these goals have been met. Nevertheless, the long-term effects of this strategy are beginning to take shape.

At present the language technology scene in Malta is shaped by four main initiatives:
\begin{enumerate}
\item First, a government-backed project, partly financed by the EU regional development fund, is in the process of making speech technology accessible to persons with disabilities. The project is currently focusing on speech synthesis for Maltese, and the relevant language models are now being developed. The consortium, which consists of an SME (Crimson Wing Ltd), a foundation (FITA, the Foundation for Information Technology Accessibility), and the University, has promised that these resources will be made available for research purposes. It remains to be seen whether the components of the speech synthesiser will be made available to the resource-sharing networks inspired by CLARIN and META.
\item Second, as the present report shows, Malta is participating in METANET4U and therefore receives significant EU funding aimed at strengthening and distributing resources and tools specifically for Maltese. The University of Malta is a member of META-NET and intends to fulfil its obligations towards the goals of META, particularly regarding the identification of current and potential stakeholders.
\item Third, the Maltese Language Resource Server (MLRS) \cite{Rosner-et-al:2006, MLRS1} is bearing fruit, and significant efforts to sustain and develop it are still under way at the University, through the Institute of Linguistics (A.~Gatt, C.~Borg, R.~Fabri) and the Department of Intelligent Computer Systems (M.~Rosner). The MLRS is currently online at \url{http://mlrs.research.um.edu.mt}. The corpus contains about 100M words, and the system offers some basic services, including KWIC and pattern-based search and various kinds of statistical analysis. Further tools are currently planned, including a part-of-speech tagger and a spell checker.
\item Finally, a new undergraduate programme in Human Language Technology is due to be launched by the Institute of Linguistics in October 2011. It will cover a full range of subjects and will inevitably have a positive long-term impact on the study of Maltese from a computational perspective.
\end{enumerate}
In addition, a project to develop an electronic version of Aquilina's dictionary \cite{Aquilina:1987,Aquilina:1990} is currently being prepared. This is a collaborative effort between the University of Malta, which is providing the linguistic expertise, the University of Arizona, which has already digitised the dictionary into machine-readable form, and the publishers, Midsea Books, Valletta. The twin aims of the project are to update the content and to give researchers the flexibility to access the text quickly. A local effort is under way to assemble the level of lexicographic expertise needed to update the content.

We should also mention Malta's relationship with CLARIN, a proposed EU research infrastructure that addresses the provision of language resources for the Humanities and Social Sciences. During the specification phase, the University was able to participate thanks to a small support grant from the local Council for Science and Technology. However, securing the long-term funding needed for the construction phase of CLARIN has proved a greater challenge. The search for a suitable government entity to take responsibility for the programme has so far been unsuccessful. Consequently, Malta's future participation in the construction phase remains undecided.
\subsection{Availability of Tools and Resources}

Figure~\ref{fig:lrlttable_mt} provides an overview of the current state of language technology support for Maltese. The rating of existing technologies and resources is based on informed estimates by several leading experts according to seven criteria, each ranging from 0 (very low) to 6 (very high).
\begin{figure*}[htb]
\centering
\begin{tabular}{>{\columncolor{orange1}}p{.33\linewidth}@{\hspace*{6mm}}c@{\hspace*{6mm}}c@{\hspace*{6mm}}c@{\hspace*{6mm}}c@{\hspace*{6mm}}c@{\hspace*{6mm}}c@{\hspace*{6mm}}c}
\rowcolor{orange1}
\cellcolor{white}&\begin{sideways}\makecell[l]{Quantity}\end{sideways}
&\begin{sideways}\makecell[l]{Availability~~~}\end{sideways} &\begin{sideways}\makecell[l]{Quality}\end{sideways}
&\begin{sideways}\makecell[l]{Coverage}\end{sideways} &\begin{sideways}\makecell[l]{Maturity}\end{sideways} &\begin{sideways}\makecell[l]{Sustainability}\end{sideways} &\begin{sideways}\makecell[l]{Adaptability~~}\end{sideways} \\ \addlinespace
\multicolumn{8}{>{\columncolor{orange2}}l}{Language Technology (Tools, Technologies, Applications)} \\\addlinespace
Speech recognition &0.8 & 0.8 & 0.8 & 0.8 & 0.8 & 0.8 & 0.8 \\ \addlinespace
Speech synthesis &2.4 & 0.8 & 3.2 & 3.2 & 2.4 & 2.4 & 2.4\\ \addlinespace
Grammatical analysis &0.8 & 0.8 & 0.8 & 0.8 & 0.8 & 0.8 & 0.8\\ \addlinespace
Semantic analysis &0& 0& 0& 0& 0& 0& 0\\ \addlinespace
Text generation &0& 0& 0& 0& 0& 0&0\\ \addlinespace
Machine translation &1.6 &1.6 & 1.6 & 1.6 & 1.6 & 1.6 & 1.6\\ \addlinespace
\multicolumn{8}{>{\columncolor{orange2}}l}{Language Resources (Resources, Data, Knowledge Bases)} \\\addlinespace
Text corpora &3.2 &3.2 &2.4 &2.4 &2.4 &3.2 &3.2\\ \addlinespace
Speech corpora &2.4 &0.8 &2.4 &1.6 &2.4 &2.4 &2.4\\ \addlinespace
Parallel corpora &3.2& 3.2& 2.4& 1.6& 1.6& 1.6& 1.6\\ \addlinespace
Lexical resources &2.4&2.4 &1.6 &2.4 &2.4 &2.4 &2.4\\ \addlinespace
Grammars &0& 0& 0&0 &0 &0 &0\\
\end{tabular}
\caption{State of language technology support for Maltese}
\label{fig:lrlttable_mt}
\end{figure*}
For Maltese, the most striking features of the figure are that
\begin{itemize}
\item most of the entries are empty, and
\item the highest rating reached is 3.2.
\end{itemize}
The fact that almost all entries are empty reflects the immature state of LT-related research and development in Malta. Although there are signs that the situation is improving, investment in language technology remains low. As a result, despite modest local achievements, the effort is fragmented, both in its coverage of different areas and in the sustainability of research: too many projects have involved just one area, one researcher, and only a year or two. The collective effort falls short of what is desired.
So what has been achieved? We can see this by looking at the non-empty entries, whose average scores give the following ordering:
\begin{itemize}
\item Tools:
\begin{enumerate}
\item Tokeniser, Speech Synthesis
\item Speech Recognition
\end{enumerate}
\item Resources:
\begin{enumerate}
\item Reference Corpora
\item Parallel Corpora
\item Dictionaries, Terminologies (understood to include word lists)
\item Language Models
\end{enumerate}
\end{itemize}
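As a worked illustration of the averaging step, assume an unweighted arithmetic mean over the seven criteria of Figure~\ref{fig:lrlttable_mt}. This is only an assumption: the text does not state how the criteria were weighted, and the symbol $\bar{s}$ is introduced here purely for illustration. The speech synthesis row then gives
\[
\bar{s} = \frac{4 \cdot 2.4 + 2 \cdot 3.2 + 0.8}{7} = \frac{16.8}{7} = 2.4 ,
\]
whereas the speech recognition row scores 0.8 on every criterion and therefore averages 0.8. Under that assumption, synthesis ranks ahead of recognition, in line with the ordering of the tools above.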
Regarding tools: low-level text extraction and processing tools are available, including a tokeniser. A POS tagger is being developed, but its performance will not be state of the art until it has been trained further on better annotated data.

Higher-level tools (syntactic or semantic analysis, classification tools, information extraction, etc.) are completely lacking. One consequence is that, for example, no treebanks are available for Maltese.

Prototype speech recognition tools have been developed at the University but are not readily available at the time of writing. However, the government-funded speech engine mentioned earlier should deliver a working speech synthesiser by 2013. While this is a very positive development, it is heavily focused on the synthesis side. Almost no work on \textbf{speech recognition} is planned at this stage.

Regarding resources, the situation is somewhat more structured, because the MLRS already exists. It is an extensive computational infrastructure in the form of a server that provides basic functionality: web access to the available corpora, some services, and a rudimentary system to facilitate the submission of contributions. The MLRS currently provides some very basic services for text extraction, representation, search and analysis.

The existing MLRS corpus currently contains about 100 million tokens. It is mainly textual and monolingual. It is also somewhat unrepresentative: there is an abundance of legal material, but a shortage of academic texts and fiction.

At this stage, the material can only be searched and analysed through the server and cannot be accessed directly. The reasons are legal. By restricting access in this way, the complications of IPR and copyright have been neatly sidestepped. The price is that these complications will eventually have to be confronted. Indeed, META is in the process of formulating a set of licensing agreements for the distribution of resources such as the MLRS.
\subsection{Cross-language Comparison}

The current state of language technology varies considerably from one language community to another. To compare the situation across languages, this section presents an evaluation based on two sample application areas (machine translation and speech processing) and one underlying technology (text analysis), as well as the basic resources needed to build language technology applications.

The languages were categorised on a five-point scale:
\begin{enumerate}
\item Excellent support
\item Good support
\item Moderate support
\item Fragmentary support
\item Weak or no support
\end{enumerate}
Language technology support was measured according to the following criteria:

\textbf{Speech Processing:} the quality of existing speech recognition technologies, the quality of existing speech synthesis technologies, the coverage of domains, the number and size of existing speech corpora, and the amount and variety of available speech-based applications.

\textbf{Machine Translation:} the quality of existing machine translation technologies, the number of language pairs covered, the coverage of linguistic phenomena and domains, the quality and size of existing parallel corpora, and the amount and variety of available machine translation applications.

\textbf{Text Analysis:} the quality and coverage of existing text analysis technologies (morphology, syntax, semantics), the coverage of linguistic phenomena and domains, the amount and variety of available applications, the quality and size of existing (annotated) text corpora, and the quality and coverage of existing lexical resources (e.g.,~WordNet) and grammars.

\textbf{Resources:} the quality and size of existing text corpora, speech corpora and parallel corpora, and the quality and coverage of existing lexical resources and grammars.

Figures~\ref{fig:speech_cluster_mt} to~\ref{fig:resources_cluster_mt} show that Maltese has only weak to moderate language technology support, and thus compares well with other less widely spoken European languages. It is clear that language technology resources and tools for Maltese do not yet reach the quality and coverage of comparable resources and tools for major languages such as German, and certainly not those for English, which leads in almost all areas of language technology. Even for English, there are still many gaps in resources with regard to high-quality applications.
\subsection{Conclusions}

\emph{In this series of \emph{white papers}, we have made an important initial effort to assess language technology support for 30 European languages, and to provide a high-level comparison across these languages. By identifying the gaps, needs and deficits, the European language technology community and related stakeholders are now in a position to design a large-scale research and development programme aimed at building a truly multilingual, technology-enabled Europe.}

We have seen that there are large differences between Europe's languages. While good-quality software and resources are available for some languages and application areas, others (usually `smaller' languages) have substantial gaps. Many languages lack basic technologies and the essential resources needed to develop them. Others have basic tools and resources but are so far unable to invest in semantic processing. We therefore still need to make a large-scale effort to reach the ambitious goal of providing high-quality machine translation between all European languages.

In this report, we have tried to convey the paradoxical state of Maltese language technology. The paradox arises because significant efforts have been made by a small number of well-qualified people across a spectrum of language technology activities to improve the state of the art, whether in terms of tools, resources, or both. It is also clear that, within the wider context of educational, commercial and cultural activities in the country, there is room for language technology to make an important contribution. The problem is that the efforts made so far are uncoordinated, short-term and fragmentary, so progress is slower than it should be.

Sustained and directed coordination of efforts is, in our opinion, the only way in which the benefits of language technology for Maltese will be realised within a reasonable time. We believe that even in a country as small as Malta, the work needs to be divided among different stakeholders. We need to arrive at a feasible roadmap through a localised version of the tripartite division of labour upheld by META: identification of a community with a shared vision; extension of infrastructure to facilitate resource sharing; and strengthening of connections between language technology and neighbouring fields of research and development.
\begin{figure*}[t]
\small
\centering
\begin{tabular}
{
>{\columncolor{corange5}}p{.13\linewidth}@{\hspace{.040\linewidth}}
>{\columncolor{corange4}}p{.13\linewidth}@{\hspace{.040\linewidth}}
>{\columncolor{corange3}}p{.13\linewidth}@{\hspace{.040\linewidth}}
>{\columncolor{corange2}}p{.13\linewidth}@{\hspace{.040\linewidth}}
>{\columncolor{corange1}}p{.13\linewidth}
}
\multicolumn{1}{>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Excellent}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Good}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Moderate}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Fragmentary}} &
\multicolumn{1}{@{}>{\columncolor{white}}c}{\textbf{Weak/no}} \\
\multicolumn{1}{>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
\multicolumn{1}{@{}>{\columncolor{white}}c}{\textbf{support}} \\ \addlinespace
& \vspace*{0.5mm}English
& \vspace*{0.5mm}German \newline
Italian \newline
Finnish \newline
French \newline
Dutch \newline
Portuguese \newline
Spanish \newline
Czech
& \vspace*{0.5mm}Basque \newline
Bulgarian \newline
Danish \newline
Estonian \newline
Galician \newline
Greek \newline
Irish \newline
Catalan \newline
Norwegian \newline
Polish \newline
Swedish \newline
Serbian \newline
Slovak \newline
Slovene \newline
Hungarian
& \vspace*{0.5mm}Icelandic \newline
Croatian \newline
Latvian \newline
Lithuanian \newline
\textbf{Maltese} \newline
Romanian \\
\end{tabular}
\caption{Speech processing: state of language technology support for 30 European languages}
\label{fig:speech_cluster_mt}
\end{figure*}
\begin{figure*}[b]
\small
\centering
\begin{tabular}
{ % defines color for each column.
>{\columncolor{corange5}}p{.13\linewidth}@{\hspace{.040\linewidth}}
>{\columncolor{corange4}}p{.13\linewidth}@{\hspace{.040\linewidth}}
>{\columncolor{corange3}}p{.13\linewidth}@{\hspace{.040\linewidth}}
>{\columncolor{corange2}}p{.13\linewidth}@{\hspace{.040\linewidth}}
>{\columncolor{corange1}}p{.13\linewidth}
}
\multicolumn{1}{>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Excellent}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Good}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Moderate}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Fragmentary}} &
\multicolumn{1}{@{}>{\columncolor{white}}c}{\textbf{Weak/no}} \\
\multicolumn{1}{>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
\multicolumn{1}{@{}>{\columncolor{white}}c}{\textbf{support}} \\ \addlinespace
& \vspace*{0.5mm}English
& \vspace*{0.5mm}French \newline
Spanish
& \vspace*{0.5mm}German \newline
Italian \newline
Catalan \newline
Dutch \newline
Polish \newline
Romanian \newline
Hungarian
& \vspace*{0.5mm}Basque \newline
Bulgarian \newline
Danish \newline
Estonian \newline
Finnish \newline
Galician \newline
Greek \newline
Irish \newline
Icelandic \newline
Croatian \newline
Latvian \newline
Lithuanian \newline
\textbf{Maltese} \newline
Norwegian \newline
Portuguese \newline
Swedish \newline
Serbian \newline
Slovak \newline
Slovene \newline
Czech \\
\end{tabular}
\caption{Machine translation: state of language technology support for 30 European languages}
\label{fig:mt_cluster_mt}
\end{figure*}
\begin{figure*}[tb]
\small
\centering
\begin{tabular}
{ % defines color for each column.
>{\columncolor{corange5}}p{.13\linewidth}@{\hspace{.040\linewidth}}
>{\columncolor{corange4}}p{.13\linewidth}@{\hspace{.040\linewidth}}
>{\columncolor{corange3}}p{.13\linewidth}@{\hspace{.040\linewidth}}
>{\columncolor{corange2}}p{.13\linewidth}@{\hspace{.040\linewidth}}
>{\columncolor{corange1}}p{.13\linewidth}
}
\multicolumn{1}{>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Excellent}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Good}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Moderate}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Fragmentary}} &
\multicolumn{1}{@{}>{\columncolor{white}}c}{\textbf{Weak/no}} \\
\multicolumn{1}{>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
\multicolumn{1}{@{}>{\columncolor{white}}c}{\textbf{support}} \\ \addlinespace
& \vspace*{0.5mm}English
& \vspace*{0.5mm}German \newline
French \newline
Italian \newline
Dutch \newline
Spanish
& \vspace*{0.5mm}Basque \newline
Bulgarian \newline
Danish \newline
Finnish \newline
Galician \newline
Greek \newline
Catalan \newline
Norwegian \newline
Polish \newline
Portuguese \newline
Romanian \newline
Swedish \newline
Slovak \newline
Slovene \newline
Czech \newline
Hungarian
& \vspace*{0.5mm}Estonian \newline
Irish \newline
Icelandic \newline
Croatian \newline
Latvian \newline
Lithuanian \newline
\textbf{Maltese} \newline
Serbian \\
\end{tabular}
\caption{Text analysis: state of language technology support for 30 European languages}
\label{fig:text_cluster_mt}
\end{figure*}
\begin{figure*}[tb]
\small
\centering
\begin{tabular}
{ % defines color for each column.
>{\columncolor{corange5}}p{.13\linewidth}@{\hspace{.040\linewidth}}
>{\columncolor{corange4}}p{.13\linewidth}@{\hspace{.040\linewidth}}
>{\columncolor{corange3}}p{.13\linewidth}@{\hspace{.040\linewidth}}
>{\columncolor{corange2}}p{.13\linewidth}@{\hspace{.040\linewidth}}
>{\columncolor{corange1}}p{.13\linewidth}
}
\multicolumn{1}{>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Excellent}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Good}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Moderate}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Fragmentary}} &
\multicolumn{1}{@{}>{\columncolor{white}}c}{\textbf{Weak/no}} \\
\multicolumn{1}{>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
\multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
\multicolumn{1}{@{}>{\columncolor{white}}c}{\textbf{support}} \\ \addlinespace
& \vspace*{0.5mm}English
& \vspace*{0.5mm}German \newline
French \newline
Italian \newline
Dutch \newline
Polish \newline
Swedish \newline
Spanish \newline
Czech \newline
Hungarian
& \vspace*{0.5mm}Basque \newline
Bulgarian \newline
Danish \newline
Estonian \newline
Finnish \newline
Galician \newline
Greek \newline
Catalan \newline
Croatian \newline
Norwegian \newline
Portuguese \newline
Romanian \newline
Serbian \newline
Slovak \newline
Slovene
& \vspace*{0.5mm}Irish \newline
Icelandic \newline
Latvian \newline
Lithuanian \newline
\textbf{Maltese} \\
\end{tabular}
\caption{Language and text resources: state of support for 30 European languages}
\label{fig:resources_cluster_mt}
\end{figure*}
\end{multicols}
\clearpage
% --------------------------------------------------------------------------
\ssection[About META-NET]{About META-NET}
\begin{multicols}{2}
\textbf{META-NET} is a Network of Excellence funded by the European Commission. The network currently consists of 54 members from 33 European countries \cite{rehm2011}. META-NET fosters the Multilingual Europe Technology Alliance (META), a growing community of language technology professionals and organisations in Europe. META-NET fosters the technological foundations for establishing and maintaining a truly multilingual European information society that:
\begin{itemize}
\item makes communication and cooperation across languages possible;
\item provides equal access to information and knowledge in any language;
\item offers advanced and affordable networked information technology to European citizens.
\end{itemize}
The network supports a Europe that unites as a single digital market and information space. It stimulates and promotes multilingual technologies for all European languages. These technologies enable automatic translation, content production, information processing and knowledge management for a wide variety of applications and subject domains. They also enable the further development of intuitive language-based interfaces to household electronics, machinery, vehicles, computers and robots.
Launched on 1 February 2010, META-NET has already conducted various activities in its three lines of action: META-VISION, META-SHARE and META-RESEARCH.

\textbf{META-VISION} fosters a dynamic and influential stakeholder community that unites around a shared vision and a common strategic research agenda (SRA). The main focus of this activity is to build a coherent and cohesive LT community in Europe by bringing together representatives from highly fragmented and diverse groups of stakeholders. The present White Paper was prepared together with volumes for 29 other languages. The shared technology vision was developed in three sectorial Vision Groups. The META Technology Council was established to discuss and prepare the SRA based on the vision, in interaction with the entire LT community.

\textbf{META-SHARE} creates an open, distributed facility for exchanging and sharing resources. The peer-to-peer network of repositories will contain language data, tools and web services that are documented with high-quality metadata and organised in standardised categories. The resources must be readily accessible and uniformly searchable. The available resources include free, open-source materials as well as restricted, commercially available, fee-based items.

\textbf{META-RESEARCH} builds bridges to related technology fields. This activity seeks to leverage advances in other fields and to capitalise on innovative research that can benefit from language technology. In particular, the activity focuses on conducting research in machine translation; collecting data, preparing data sets and organising language resources for evaluation purposes; compiling inventories of tools and methods; and organising workshops and training events for members of the community.\\
\centerline{\textbf{office@meta-net.eu -- http://www.meta-net.eu}}
\end{multicols}
  843. \addtocontents{toc}{\protect\clearpage\protect}
  844. \addtocontents{toc}{\protect\thispagestyle{empty}\protect}
  845. \addtocontents{toc}{\protect\vspace*{4mm}\protect}
  846. \addtocontents{toc}{\smallskip{\Large\textsf{\centerline{THE MALTESE LANGUAGE IN THE DIGITAL AGE}}\par}}
  847. \setcounter{section}{0}
  848. \setcounter{figure}{0}
  849. \makeatletter
  850. \@ifundefined{theHsection}{
  851. \let
  852. }
  853. {
  854. \renewcommand*{\theHsection}{\thepart.\thesection}
  855. }
  856. \makeatother
  857. %\part*{\textcolor{white}{English}}
  858. \cleardoublepage
  859. \selectlanguage{english}
  860. \ssection[Executive Summary]{Executive Summary}
  861. \begin{multicols}{2}
  862. During the last 60 years, Europe has become a distinct political and economic structure. Culturally and linguistically it is rich and diverse. However, from Portuguese to Polish and Italian to Icelandic, everyday communication between Europe’s citizens, within business and among politicians is inevitably confronted with language barriers. The EU's institutions spend about a billion euros a year on maintaining their policy of multilingualism, i.\,e., translating texts and interpreting spoken communication. Does this have to be such a burden? Language technology and linguistic research can make a significant contribution to removing the linguistic borders. Combined with intelligent devices and applications, language technology will help Europeans talk and do business together even if they do not speak a common language.
  863. \boxtext{Language technology builds bridges.}
  864. Language barriers can bring business to a halt, especially for SMEs who do not have the financial means to reverse the situation. The only (unthinkable) alternative to this kind of a multilingual Europe would be to allow a single language to take a dominant position, to replace all other languages. One way to overcome the language barrier is to learn foreign languages. Yet without technological support, mastering the 23 official languages of the member states of the European Union and some 60 other European languages is an insurmountable obstacle for Europe’s citizens, economy, political debate, and scientific progress.
  865. The solution is to build key enabling technologies: language technologies will offer European stakeholders tremendous advantages, not only within the common European market, but also in trade relations with non-European countries, especially emerging economies. Language technology solutions will eventually serve as a unique bridge between Europe's languages. An indispensable prerequisite for their development is first to carry out a systematic analysis of the linguistic particularities of all European languages, and the current state of language technology support for them.
  866. The automated translation and speech processing tools currently available on the market fall short of the envisaged goals. The dominant actors in the field are primarily privately-owned for-profit enterprises based in Northern America. As early as the late 1970s, the EU realised the profound relevance of language technology as a driver of European unity, and began funding its first research projects, such as EUROTRA. At the same time, national projects were set up that generated valuable results, but never led to a concerted European effort. In contrast to these highly selective funding efforts, other multilingual societies such as India (22 official languages) and South Africa (11 official languages) have set up long-term national programmes for language research and technology development.
  867. The predominant actors in LT today rely on imprecise statistical approaches that do not make use of deeper linguistic methods and knowledge. For example, sentences are often automatically translated by comparing each new sentence against thousands of sentences previously translated by humans. The quality of the output largely depends on the size and quality of the available data. While the automatic translation of simple sentences in languages with sufficient amounts of available textual data can achieve useful results, shallow statistical methods are doomed to fail in the case of languages with a much smaller body of sample data or in the case of sentences with complex, non-repetitive structures. Analysing the deeper structural properties of languages is the only way forward if we want to build applications that perform well across the entire range of European languages.
  868. \boxtext{Language technology as a key for the future.}
  869. The European Union is thus funding projects such as EuroMatrix and EuroMatrixPlus (since 2006) and iTranslate4 (since 2010), which carry out basic and applied research, and generate resources for establishing high quality language technology solutions for all European languages.
  870. European research in the area of language technology has already achieved a number of successes. For example, the translation services of the European Union now use the Moses open-source machine translation software, which has been mainly developed in European research projects.
  871. In Malta, the most advanced areas in language technology are currently speech synthesis and text corpora: In the area of Maltese speech synthesis, a government-supported project partly funded by EU regional development funds is under way to bring speech technology within the reach of disabled persons. The consortium, which consists of an SME (Crimson Wing Ltd), a foundation (FITA, Foundation for IT Access), and the University, has pledged that these resources will be made available for research purposes.
  872. In the area of text corpora, the Maltese Language Resource Server (MLRS) has come to fruition and significant efforts are under way at University, through the Institute of Linguistics (A.~Gatt, C.~Borg, R.~Fabri) and the Department of Intelligent Computer Systems (M.~Rosner), to maintain and develop it. Currently, the corpus comprises some 100M words, and further tools are planned, including a part-of-speech tagger and a spell-checker.
  873. \boxtext{Language Technology helps to unify Europe.}
  874. Drawing on the insights gained so far, it appears that today’s `hybrid' language technology mixing deep processing with statistical methods will be able to bridge the gap between all European languages and beyond. As this series of white papers shows, there is a dramatic difference between Europe’s member states in terms of both the maturity of the research and in the state of readiness with respect to language solutions. This white paper for the Maltese language demonstrates that there is potential for a language technology industry and research environment in Malta. But although a number of technologies and resources for Maltese exist, there are far fewer than for “larger” European languages and certainly not enough to support the full range of language-sensitive applications that are available for those other languages.
  875. According to the assessment detailed in this report, the achievement of a breakthrough in Maltese language technology requires a whole cycle of changes involving content providers, developers and users of language technology. Some changes in national language policy must be implemented before any breakthroughs for the Maltese language can be achieved.
  876. META-NET's vision is high-quality language technology for all languages that supports political and economic unity through cultural diversity. This technology will help tear down existing barriers and build bridges between Europe's languages. This requires all stakeholders -- in politics, research, business, and society -- to unite their efforts for the future.
  877. This white paper series complements other strategic actions taken by META-NET (see the appendix for an overview). Up-to-date information such as the current version of the META-NET vision paper \cite{Meta1} or the Strategic Research Agenda (SRA) can be found on the META-NET web site: \url{http://www.meta-net.eu}.
  878. \end{multicols}
  879. \clearpage
  880. \ssection[Languages at Risk: a Challenge for Language Technology]{Languages at Risk: a Challenge for\newline Language Technology}
  881. \begin{multicols}{2}
  882. We are witnesses to a digital revolution that is dramatically impacting communication and society. Recent developments in information and communication technology are sometimes compared to Gutenberg's invention of the printing press. What can this analogy tell us about the future of the European information society and our languages in particular?
  883. \boxtext{The digital revolution is comparable to Gutenberg's invention of the printing press.}
  884. After Gutenberg's invention, real breakthroughs in communication were accomplished by efforts such as Luther's translation of the Bible into vernacular language. In subsequent centuries, cultural techniques have been developed to better handle language processing and knowledge exchange:
  885. \begin{itemize}
  886. \item the orthographic and grammatical standardisation of major languages enabled the rapid dissemination of new scientific and intellectual ideas;
  887. \item the development of official languages made it possible for citizens to communicate within certain (often political) boundaries;
  888. \item the teaching and translation of languages enabled exchanges across languages;
  889. \item the creation of editorial and bibliographic guidelines assured the quality of printed material;
  890. \item the creation of different media like newspapers, radio, television, books, and other formats satisfied different communication needs.
  891. \end{itemize}
  892. In the past twenty years, information technology has helped to automate and facilitate many processes:
  893. \begin{itemize}
  894. \item desktop publishing software has replaced typewriting and typesetting;
  895. \item Microsoft PowerPoint has replaced overhead projector transparencies;
  896. \item e-mail allows documents to be sent and received more quickly than using a fax machine;
  897. \item Skype offers cheap Internet phone calls and hosts virtual meetings;
  898. \item audio and video encoding formats make it easy to exchange multimedia content;
  899. \item web search engines provide keyword-based access;
  900. \item online services like Google Translate produce quick, approximate translations;
  901. \item social media platforms such as Facebook, Twitter and Google+ facilitate communication, collaboration, and information sharing.
  902. \end{itemize}
  903. Although these tools and applications are helpful, they are not yet capable of supporting a fully-sustainable, multilingual European society in which information and goods can flow freely.
  904. \subsection[Language Borders Hold back the European Information Society]{Language Borders\newline Hold back the European Information Society}
  905. We cannot predict exactly what the future information society will look like. However, there is a strong likelihood that the revolution in communication technology is bringing together people who speak different languages in new ways. This is putting pressure both on individuals to learn new languages and especially on developers to create new technology applications to ensure mutual understanding and access to shareable knowledge.
  906. In the global economic and information space, there is increasing interaction between different languages, speakers and content thanks to new types of media.
  907. The current popularity of social media (Wikipedia, Facebook, Twitter, YouTube, and, recently, Google+) is only the tip of the iceberg.
  908. \boxtext{The global economy and information\\ space confronts us with different\\ languages, speakers and content.}
  909. Today, we can transmit gigabytes of text around the world in a few seconds before we recognise that it is in a language that we do not understand. According to a recent report from the European Commission, 57\% of Internet users in Europe purchase goods and services in non-native languages; English is the most common foreign language followed by French, German and Spanish. 55\% of users read content in a foreign language while 35\% use another language to write e-mails or post comments on the web \cite{EC1}.
  910. A few years ago, English might have been the lingua franca of the web -- the vast majority of content on the web was in English -- but the situation has now drastically changed. The amount of online content in other European (as well as Asian and Middle Eastern) languages has exploded.
  911. Surprisingly, this ubiquitous digital linguistic divide has not gained much public attention; yet, it raises a very pressing question: Which European languages will thrive in the networked information and knowledge society, and which are doomed to disappear?
  912. \subsection{Our Languages at Risk}
  913. While the printing press helped step up the exchange of information in Europe, it also led to the extinction of many European languages. Regional and minority languages were rarely printed and languages such as Cornish and Dalmatian were limited to oral forms of transmission, which in turn restricted their scope of use. Will the Internet have the same impact on our modern languages?
  914. \boxtext{The variety of languages in Europe is one of its richest and most important cultural assets.}
  915. Europe's approximately 80 languages are one of our richest and most important cultural assets, and a vital part of this unique social model \cite{EC2}. While languages such as English and Spanish are likely to survive in the emerging digital marketplace, many European languages could become irrelevant in a networked society. This would weaken Europe's global standing, and run counter to the strategic goal of ensuring equal participation for every European citizen regardless of language. According to a UNESCO report on multilingualism, languages are an essential medium for the enjoyment of fundamental rights, such as political expression, education and participation in society \cite{Unesco1}.
  916. \subsection{Language Technology is a Key Enabling Technology}
  917. In the past, investments in language preservation focussed primarily on language education and translation. According to one estimate, the European market for translation, interpretation, software localisation and website globalisation was 8.4 billion euros in 2008 and is expected to grow by 10\% per annum \cite{EC3}. Yet this figure covers just a small proportion of current and future needs in communicating between languages. The most compelling solution for ensuring the breadth and depth of language usage in Europe tomorrow is to use appropriate technology, just as we use technology to solve our transport and energy needs among others.
  918. Language technology targeting all forms of written text and spoken discourse can help people to collaborate, conduct business, share knowledge and participate in social and political debate regardless of language barriers and computer skills. It often operates invisibly inside complex software systems to help us already today to:
  919. \begin{itemize}
  920. \item find information with a search engine;
  921. \item check spelling and grammar in a word processor;
  922. \item view product recommendations in an online shop;
  923. \item follow the spoken directions of a navigation system;
  924. \item translate webpages via an online service.
  925. \end{itemize}
  926. Language technology consists of a number of core applications that enable processes within a larger application framework. The purpose of the META-NET language white papers is to focus on how ready these core enabling technologies are for each European language.
  927. \boxtext{Europe needs robust and affordable language technology for all European languages.}
  928. To maintain our position in the frontline of global innovation, Europe will need language technology, tailored to all European languages, that is robust and affordable and can be tightly integrated within key software environments. Without language technology, we will not be able to achieve a really effective interactive, multimedia and multilingual user experience in the near future.
  929. \subsection{Opportunities for Language Technology}
  930. In the world of print, the technology breakthrough was the rapid duplication of an image of a text using a suitably powered printing press. Human beings had to do the hard work of looking up, assessing, translating, and summarising knowledge. We had to wait until Edison to record spoken language and again his technology simply made analogue copies.
  931. Language technology can now simplify and automate the processes of translation, content production, and knowledge management for all European languages. It can also empower intuitive speech-based interfaces for household electronics, machinery, vehicles, computers and robots. Real-world commercial and industrial applications are still in the early stages of development, yet R\&D achievements are creating a genuine window of opportunity. For example, machine translation is already reasonably accurate in specific domains, and experimental applications provide multilingual information and knowledge management, as well as content production, in many European languages.
  932. As with most technologies, the first language applications such as voice-based user interfaces and dialogue systems were developed for specialised domains, and often exhibit limited performance. However, there are huge market opportunities in the education and entertainment industries for integrating language technologies into games, edutainment packages, libraries, simulation environments and training programmes. Mobile information services, computer-assisted language learning software, eLearning environments, self-assessment tools and plagiarism detection software are just some of the application areas in which language technology can play an important role. The popularity of social media applications like Twitter and Facebook suggests a need for sophisticated language technologies that can monitor posts, summarise discussions, suggest opinion trends, detect emotional responses, identify copyright infringements or track misuse.
  933. \boxtext{Language technology helps overcome the disability of linguistic diversity.}
  934. Language technology represents a tremendous opportunity for the European Union. It can help to address the complex issue of multilingualism in Europe -- the fact that different languages coexist naturally in European businesses, organisations and schools. However, citizens need to communicate across the language borders of the European Common Market, and language technology can help overcome this final barrier, while supporting the free and open use of individual languages. Looking even further ahead, innovative European multilingual language technology will provide a benchmark for our global partners when they begin to support their own multilingual communities. Language technology can be seen as a form of assistive technology that helps overcome the disability of linguistic diversity and makes language communities more accessible to each other. Finally, one active field of research is the use of language technology for rescue operations in disaster areas, where performance can be a matter of life and death: Future intelligent robots with cross-lingual language capabilities have the potential to save lives.
  935. \subsection{Challenges Facing Language Technology}
  936. Although language technology has made considerable progress in the last few years, the current pace of technological progress and product innovation is too slow. Widely-used technologies such as the spelling and grammar correctors in word processors are typically monolingual, and are only available for a handful of languages. Online machine translation services, although useful for quickly generating a reasonable approximation of a document's contents, are fraught with difficulties when highly accurate and complete translations are required. Due to the complexity of human language, modelling our tongues in software and testing them in the real world is a long, costly business that requires sustained funding commitments. Europe must therefore maintain its pioneering role in facing the technological challenges of a multiple-language community by inventing new methods to accelerate development right across the map. These could include both computational advances and techniques such as crowdsourcing.
  937. \boxtext{Technological progress needs to be accelerated.}
  938. \subsection{Language Acquisition in Humans and Machines}
  939. To illustrate how computers handle language and why it is difficult to program them to process different tongues, let's look briefly at the way humans acquire first and second languages, and then see how language technology systems work.
  940. Humans acquire language skills in two different ways. Babies acquire a language by listening to the real interactions between their parents, siblings and other family members. From the age of about two, children produce their first words and short phrases. This is only possible because humans have a genetic disposition to imitate and then rationalise what they hear.
  941. Learning a second language at an older age requires more cognitive effort, largely because the child is not immersed in a language community of native speakers. At school, foreign languages are usually acquired by learning grammatical structure, vocabulary and spelling using drills that describe linguistic knowledge in terms of abstract rules, tables and examples.
  942. \boxtext{Humans acquire language skills in two\\ different ways: learning from examples and\\ learning the underlying language rules.}
  943. Moving now to language technology, the two main types of systems acquire language capabilities in a similar manner. Statistical (or data-driven) approaches obtain linguistic knowledge from vast collections of concrete example texts. While it is sufficient to use text in a single language for training, e.\,g., a spell checker, parallel texts in two (or more) languages have to be available for training a machine translation system. The machine learning algorithm then learns patterns of how words, short phrases and complete sentences are translated.
  944. This statistical approach usually requires millions of sentences to boost performance quality. This is one reason why search engine providers are eager to collect as much written material as possible. Spelling correction in word processors, and services such as Google Search and Google Translate, all rely on statistical approaches. The great advantage of statistics is that the machine learns quickly in a continuous series of training cycles, even though quality can vary randomly.
  945. The second approach to language technology, and to machine translation in particular, is to build rule-based systems. Experts in the fields of linguistics, computational linguistics and computer science first have to encode grammatical analyses (translation rules) and compile vocabulary lists (lexicons). This is very time consuming and labour intensive. Some of the leading rule-based machine translation systems have been under constant development for more than 20 years. The great advantage of rule-based systems is that the experts have more detailed control over the language processing. This makes it possible to systematically correct mistakes in the software and give detailed feedback to the user, especially when rule-based systems are used for language learning. However, due to the high cost of this work, rule-based language technology has so far only been developed for a few major languages.
  946. \boxtext{The two main types of language technology systems acquire language in a similar manner.}
  947. As the strengths and weaknesses of statistical and rule-based systems tend to be complementary, current research focusses on hybrid approaches that combine the two methodologies. However, these approaches have so far been less successful in industrial applications than in the research lab.
  948. As we have seen in this chapter, many applications widely used in today's information society rely heavily on language technology. Due to its multilingual community, this is particularly true of Europe's economic and information space. Although language technology has made considerable progress in the last few years, there is still huge potential in improving the quality of language technology systems. In the following, we will describe the role of Maltese in the European information society and assess the current state of language technology for the Maltese language.
  949. \end{multicols}
  950. \clearpage
  951. \ssection[The Maltese Language in the European Information Society]{The Maltese Language in the\newline European Information Society}
  952. \begin{multicols}{2}
  953. \subsection{General Facts}
  954. Maltese is the national language of the Maltese archipelago, which consists of the islands Malta, Gozo (\emph{Għawdex}) and Comino (\emph{Kemmuna}).
  955. Together with English, Maltese is also the official language of Malta. According to the \emph{Demographic Review 2009} by the National Statistics Office of Malta, the estimated Maltese population (excluding foreigners) in Malta for the end of the year 2009 was 396,278. It is estimated that today, due to emigration phases from Malta mostly in the 1950s and 1960s, roughly the same number of expatriate native speakers lives abroad (mostly in the United Kingdom, Australia, USA and Canada).
  956. Although Maltese belongs to the South Arabic branch of the Semitic language family, it differs considerably from the other neo-Arabic languages. Its structure is the result of different language contact situations that emerged under different rulers of the islands in the course of a millennium. While the core of Maltese is Semitic, it also contains a Romance superstrate and English adstrate. Also, Maltese is the only Semitic language written in a (modified) Latin alphabet.
  957. The Semitic core of the Maltese language stems from the Arab conquest of Malta in 870 AD and the subsequent repopulation of the islands with Arabic-speaking settlers. The first direct contact with Romance languages was established in 1090 when Malta was conquered by the Normans, who brought Sicilian with them, while the population still used their Arabic vernacular in everyday life. Malta was more and more cut off politically, culturally and linguistically from the Arabic world. In the following centuries, under the influence of the Romance languages of the rulers, more and more Romance loan words entered the Arabic dialect. When Malta came under British rule in 1800, the official language changed from Italian to English, which brought an increasing number of English loan words into Maltese. The following sentence taken from a newspaper article (\emph{l-Orizzont} of September 7th, 1995; reproduced in \cite[p.~135]{Ambros:1998}) can illustrate the different influences of the languages in contact (Romance loan words are in boldface, English loans underlined):
  958. \begin{examples}
  959. \item
  960. \gll Il-\underline{hold-up} sar minn żagħżugħ li kien liebes \textbf{nuċċali} \textbf{skur} tax-xemx.
  961. the-hold-up happened from young.man that was wearing glasses dark of.the-sun
  962. \glt `The robbery was committed by a young man who was wearing dark sunglasses.'
  963. \glend
  964. \end{examples}
  965. One remarkable fact about Maltese is that despite its relatively small number of speakers and the small area in which it is spoken, there is a comparatively rich number of variants or dialects. In general, a main distinction can be made between the Standard variety spoken in the urban areas like Valletta and Sliema and non-standard varieties spoken in the rural areas. Outside of Malta, the Maltese spoken in Australia has developed into an ethnolect of its own called \emph{Maltraljan} \cite{Bovingdon:2001}. It differs from Standard Maltese mainly in terms of its lexicon (i.\,e., the vocabulary) that is the result of extensively borrowing words from (Australian) English and subsequent change in meaning.
  966. With English being the second official language in Malta, many Maltese are bilingual. Between the poles of monolingualism and full bilingualism, there is a continuum of language-mixing and codeswitching. Most Maltese speak only Maltese at home and among each other. English, on the other hand, is the language used in the written context of higher education and in communication with foreigners.
  967. \subsection{Particularities of the Maltese Language}
  968. Maltese is the only Semitic language in the European Union and the only Semitic language written in a Latin alphabet. The Maltese alphabet makes use of some special graphemes that differ from other Latin alphabets (the sound values are given in the International Phonetic Alphabet): ċ \lingua{tʃ}, ġ \lingua{dʒ}, għ (mostly silent), ħ \lingua{h}, ż \lingua{z} \cite{Fabri:2011a,Borg-Alexander:1997}.
  969. Some particular characteristics of Maltese are:
  970. \begin{itemize}
  971. \item free word order
  972. \item Semitic morphology
  973. \item aspect-based temporal system
  974. \item lack of a morphological infinitive
  975. \end{itemize}
  976. \boxtext{Word order is relatively free in Maltese sentences.}
  977. Even though there are no case endings, Maltese has a very free word order. The sentence \emph{Il-kelb gidem il-qattusa lbieraħ} (`The dog bit the cat yesterday.') has the word order S(ubject) V(erb) O(bject) but could also be expressed as:
  978. \begin{examples}\label{WO_no_clitics_en}
  979. \item
  980. \gll Ilbieraħ il-kelb gidem il-qattusa.
  981. yesterday {the-dog (m)} he.bit {the-cat (f)}
  982. \gln (SVO)
  983. \glt `Yesterday, the dog bit the cat.'
  984. \glend
  985. \item
  986. \gll Gidem il-qattusa l-kelb ilbieraħ.
  987. he.bit {the-cat (f)} {the-dog (m)} yesterday
  988. \gln (VOS)
  989. \glt `The dog bit the cat yesterday.'
  990. \glend
  991. \item
  992. \gll Il-qattusa gidimha l-kelb ilbieraħ.
  993. {the-cat (f)} he.bit.her {the-dog (m)} yesterday
  994. \gln (OVS)
  995. \glt `The cat, it was bitten by the dog yesterday.'
  996. \glend
  997. \end{examples}
  998. As the English translations try to show, the different word orders have a different emphasis in meaning. In the first two examples, the word orders are unmarked, with the object following the verb. In the last example, the object \emph{il-qattusa} (`the cat') precedes the verb. As pointed out in \cite[p.~140]{Fabri:1993}, this word order is marked and emphasises the object for contrast. With the object in front, native speakers prefer to mark \emph{il-qattusa} with the object enclitic -\emph{ha} on the verb. Also, in spoken discourse, this contrast is expressed with different intonation. The word order in the second example (VOS) could be used for expressing a contrastive meaning as well, given the appropriate intonation, putting emphasis on \emph{gidem il-qattusa} `he bit the cat'. Without this contrastive meaning (and without the contrastive intonation) emphasis would be on the fact itself as in: ``Haven't you heard what happened yesterday? The dog bit the cat yesterday!'' (Fabri, personal conversation).
  999. \boxtext{Maltese words can change internally\\ during inflection and derivation.}
  1000. As a Semitic language, Maltese shows a non-concatenative morphology, i.\,e., inflected and derived word forms change internally:
  1001. In languages like English, word forms are made up of stems and affixes, i.\,e., concatenatively. The verb \emph{shoot} can be inflected for third person by attaching the affix \emph{-s} to the stem as in \emph{(he) shoot-s}. Also, from the verbal stem a noun can be derived by adding the affix \emph{-er} as in \emph{shoot-er}. Hence both inflection and derivation take place without internal changes to the structure, i.\,e., concatenatively.
  1002. In Maltese, there is a mixture of stem-based and root-and-pattern-based morphology. In the Semitic component, the basic ``unit'' within a word is often not a stem but a root made up of three (sometimes four) consonants in a fixed order that carry a general meaning. Word stems with their specific meaning are formed by arranging the consonants according to a certain pattern. For example, the root \emph{k-t-b} carries the meaning of everything connected with ``writing''. In the following, patterns are represented as numbers \textbf{1,2,3} for the root consonants and \textbf{v} for the vowels between them, for example \textbf{1v2v3}. By applying the pattern \textbf{1v2v3} and filling the vowel positions between the root consonants \textbf{1,2} and \textbf{3} with the vowel sequence \textbf{i-e}, one gets the perfective verb \emph{kiteb} `he wrote'. Inflection of this verb for plural takes place by affixation of the plural affix \emph{-u}, giving the form \emph{kitbu} `they wrote'. Applying the pattern \textbf{1v22v:3} (\textbf{v:} stands for a long vowel) to the root renders the agent noun \emph{kittieb} `writer'. Inflection of the noun by adding the affix \emph{-a} gives the plural \emph{kittieba} `writers'. Note that the plural suffix \emph{-a} looks similar to the feminine marker \emph{-a} so that \emph{kittieba} could also refer to a female writer. The other Semitic Maltese plural suffixes are \emph{-in} as in \emph{mħallef} `judge', \emph{mħallfin} `judges'; \emph{-at/-iet} as in \emph{kittieba} `(female) writer', \emph{kittiebat} `(female) writers'; \emph{-ijiet} as in \emph{żmien} `time', \emph{żminijiet} `times'.
  1003. Plural nouns in Maltese can also be formed non-concatenatively (the so-called broken plural forms), i.\,e., no affixation takes place, but the noun is changed internally, e.\,g., \emph{ktieb} `book' vs.~\emph{kotba} `books'.
  1004. Loan verbs today are mostly imported using a special verb class that can accommodate undigested stems \cite{Mifsud:1995}. For example, the English stem \emph{park-} became the basis of the Maltese verb forms \emph{pparkjajt, pparkjat, pparkja} `I/ she/ he parked'. Today, this formerly marginal Semitic special verb class has increased in size due to the influx of English loan verbs. It is highly productive, often giving way to \emph{ad-hoc} loans of English verbs which already have a Semitic counterpart in Maltese. For example `to download (a file)' can be expressed using the Semitic verb \emph{niżżel} (originally meaning `he caused to come down'). Taking the English stem \emph{download} and importing it via the special verb class instead gives forms like \emph{ddawnlowdjajt, ddawnlowdjat, ddawnlowdja} `I/ she/ he downloaded'. This strategy is often criticised as corrupting the language \cite{Fabri:2011a}.
  1005. \boxtext{The Maltese temporal system\\ is marked for aspect.}
  1006. Verbs in Maltese are marked for aspect, i.\,e., as to whether an action is completed (perfective) or not completed (imperfective) -- for a full account of tense and aspect in Maltese, see \cite{Fabri:1995, Ebert:2000}. In the absence of any other grammatical markers, verbs in the perfective are interpreted as ``past tense'' and verbs in the imperfective as ``present tense'': \emph{Andrew kiteb} `Andrew wrote'; \emph{Andrew jikteb} `Andrew writes'. Combination of the imperfective verb with \emph{kien}, the perfective form of the verb for `to be', expresses habitual past: \emph{Andrew kien jikteb} `Andrew used to write'. Adding the word \emph{qed} `progressive' (like the English \emph{-ing} form) gives \emph{Andrew kien qed jikteb} `Andrew was writing' etc.
  1007. Maltese verbs do not have morphological infinitives. Thus, in complex predicates like in the English sentence `Andrew wants to write', both verbs are morphologically finite: \emph{Andrew irid jikteb} (literally: `Andrew he wants he writes') even though semantically, \emph{jikteb} is not finite.
  1008. \subsection{Recent Developments}
  1009. With the rise of English to the status of an international language and language of technology after the Second World War, the number of English loan words in Maltese has grown to a great extent. Many of them have become ``nativised'', i.\,e., they are adopted in regular use so much that even derived Semitic words cannot replace them. For example, instead of the commonly used word \emph{ajruport} (from English \emph{airport}), the Semitic word \emph{mitjar} was once proposed (derived from \emph{tar} `he flew'). However, it was never accepted by the language community. On the other hand, loan words enter the language very rapidly, being imported spontaneously, even though there are already proper Maltese words for them (for example \emph{ddawnlowdja} vs \emph{niżżel} `he downloaded'). This fuels fears among some that the language might become ``corrupted'' \cite{Fabri:2011a}.
  1010. Another recent development for Maltese is its status as an official language of the European Union. This has both advantages and disadvantages \cite{Fabri:2011a}. On the one hand, Maltese has finally become an internationally recognised language, a status that it did not have for a long time, being marginalised as a ``kitchen language'' centuries before. On the other hand, Maltese EU translators are confronted with certain challenges: many technical and legal terms have yet to be ``invented'' for Maltese. This results eventually in lexical expansion of the language (definitely a positive aspect), which, however, has to be coordinated by a central body so that individual translators do not come up with different terms for the same concepts independently from each other (which is a serious problem). The central body to deal with this challenge is the National Council for the Maltese Language (\emph{Il-Kunsill Nazzjonali tal-Ilsien Malti}).
  1011. Other developments in recent years concern the Maltese orthography. Maltese (together with English) became the official language of Malta on January 1, 1934 in the orthography released by the Union of Maltese Writers (\emph{Għaqda tal-Kittieba tal-Malti}) in 1924. Since then, the orthography has undergone three revisions (1984, 1992 and 2008).
  1012. The last reform was released in 2008. Its aim was to reduce writers' insecurities resulting from a considerable number of spelling variants for certain words. As the \emph{Kunsill's} document \emph{Deċiżjonijiet 1} \cite{Kunsill:2008a} points out, the great number of variants could be reduced by finding a consistent balance between grammatical and phonetic spelling. Thus the four variants \emph{zobtu, zoptu, sobtu} and \emph{soptu} (`suddenly, unexpectedly') could be reduced to the two variants \emph{zoptu} for \lingua{'zɔp.tʊ} and \emph{soptu} for \lingua{'sɔp.tʊ}. For a similar reason, the word \emph{skond} \lingua{skɔnt} `according to', was changed to \emph{skont} since its other grammatical forms do not justify spelling with \emph{d} (derived from Italian \emph{secondo}), as, e.\,g., \emph{skontok} \lingua{'skɔn.tɔk} `according to you'.
  1013. For the third area (loan words), the principle remains to write loan words according to the Maltese orthography if they are regarded as ``nativised'' and if it does not result in conflicts with the pronunciation or with other Maltese writing rules. However, many Maltese prefer to write English loan words with their original spelling, since they have become used to them. In fact, during a public seminar on the treatment of English loan words in April 2008, there were emotional discussions among the audience when it came to words like \emph{email} and their proposed new spellings as \emph{imejl}. Factors like the habits of a language community make the standardisation of spellings even more difficult than finding the balance between grammatical and phonetic principles \cite{Kunsill:2008b}.
  1014. These examples only give a slight idea of the hard work that the \emph{National Council for the Maltese Language} is undertaking as part of language cultivation in Malta. The next section will give an insight into the history of language cultivation in Malta.
  1015. \subsection{Language Cultivation in Malta}
  1016. Compared to other languages of Europe, the status of Maltese as an official language (since 1934) itself is a recent development. Thus language cultivation, too, had a late start.
  1017. For centuries, Maltese was only the spoken medium of the Maltese population and marginalised in comparison to the respective official language of Malta's rulers. This started to change with the language movement of the mid- to late 18th century, when the first systematic studies of the language were conducted by Agius de Soldanis (1750) and Mikiel Anton Vassalli (1797). Vassalli in particular promoted the Maltese language by advocating its use in every domain of everyday life. Fortunato Panzavecchia's Bible translations of the mid 19th century contributed to further standardisation of the language \cite{Kontzi:2005}. And with the move towards a standardised orthography in the early 20th century, an important step was made by the foundation of the Union of Maltese Writers (\emph{Għaqda tal-Kittieba tal-Malti}) in 1920. The orthographic system, which was developed by this organisation, became Malta's official orthography in 1934 and, with some changes and additions, has been in use since.
  1018. In 1964, after gaining independence from Great Britain, the status of Maltese as national language and as official language together with English was written into the constitution. When Malta joined the EU in 2004, Maltese became an official language of the EU. As noted in the section above, this results in certain challenges, which can only be solved by a body that coordinates standardisation and common practice in translation work.
  1019. The body in Malta to do this work is the National Council for the Maltese Language (\emph{Il-Kunsill Nazzjonali tal-Ilsien Malti}). It was founded in 2005 as the first government organisation to officially deal with language matters and language planning for the Maltese language. The Council's tasks are, as formulated in the Maltese Language Act (ACT No.~V of 2004), to promote the Maltese language, to ``adopt a suitable linguistic policy backed by a strategic plan'' and to put it into practice. Another important task of the \emph{Kunsill} is to update the Maltese orthography and decide on correct spellings (taking over the task from the Academy of Maltese and thus being mainly responsible for the Maltese orthography reform of 2008). On its website, the Council also offers training courses for proof-readers and Maltese language courses for foreigners \cite{Kunsill1}.
  1020. Before the Council was founded, standardisation of orthography was the task of the Academy of Maltese (\emph{Akkademja tal-Malti}). It emerged in 1964 from the Union of Maltese Writers (\emph{Għaqda tal-Kittieba tal-Malti}), which had been the founding body for the first official orthography in 1924/1932. The Academy's main aim today is to promote academic studies in the Maltese language and literature, to promote the use of Maltese in every domain of everyday life and to build up contacts with people who are friends of the language and who use it outside of Malta \cite{Akkademja1}. The Academy works closely together with the National Council for the Maltese Language.
  1021. The motivation behind the Maltese Language Act was the idea that one national language which is shared by all individuals within that nation forms the basis for cultural and national identity. This of course calls for standardisation of the language. Indeed, from the language cultivation movement of the 19th century until today, Maltese has risen from a formerly marginalised vernacular to a national language of high prestige. This is also reflected in the ever-growing amount of literary works in Maltese during the same timespan and in the high number of influential organisations and bodies for the Maltese language and literature \cite{Fabri:2011a}.
  1022. \subsection{Language in Education}
  1023. Particularly in a bilingual society like in Malta, several aspects play a role when it comes to language in education.
  1024. %
  1025. One aspect is the language of instruction, i.\,e., the language that is used officially by the teachers during lessons in school or in seminars at university.
  1026. Another factor is the language used in certain school books. With English being the language of technology and natural sciences, most of the school books on these topics are in English. In fact, efforts to translate technical and scientific terms into Maltese have encountered several problems, one of them being the acceptance by the language community. Hence the school subjects, too, possibly determine the language of instruction for certain lessons, although it can also be that English school books (and the English terminology contained therein) are used while the language of instruction is Maltese.
  1027. Yet another aspect is the language used by individuals. Bilingual speakers not only use different languages in different social settings (``domains''), e.\,g., Maltese with the family at home, English with foreigners, Maltese or English during school lessons etc. They also tend to mix both languages, either by language mixing (e.\,g., English words are mixed into a conversation conducted in Maltese) or by code-switching (e.\,g., a conversation in Maltese switches to English and back again, with the English parts being larger than just single words, often consisting of several sentences). Thus even during school lessons that are taught in one language, conversations between teachers and students can switch between the languages \cite{Camilleri:1995}.
  1028. Keeping these three factors in mind, it becomes clear that the actual exposure of students to the respective language in schools or at the university is something different from the chosen language of instruction.
  1029. Regarding the official language of instruction in education, both Maltese and English can be found in schools and at the university, since Maltese and English share the status as official languages in Malta. In schools, both are taught as subjects from early on. Which language is used as language of instruction depends on the type of school. Private schools tend to use English more than Maltese (sometimes to a greater extent), while in state schools Maltese is slightly preferred to English. Church schools have their individual preferences in that some traditionally prefer one language over the other.
  1030. As was mentioned before, most science books that are used in school are in English. Thus, with the introduction of more and more scientific subjects later in school and even more so at the university, students are exposed to the two languages at the same time, using them for different situations: they might have their lessons taught in Maltese, but read their books and write their essays in English. Especially for students at university, conversations between them, friends and lecturers often take place in Maltese, sometimes code-switching/mixing between Maltese and English, or they are even in English only (the latter for example with international students or lecturers).
  1031. At home with their family and friends, however, most Maltese speak Maltese, some mix languages and only a few families speak English only.
  1032. %
  1033. As can be seen from the examples above, despite the fact that both Maltese and English are used as languages in education, there is a clear distribution when it comes to their use in society. Sciriha and Vassallo (2001, p.~29, cited in \cite{Fabri:2011a}) point out that ``70\% of the respondents claimed to use Maltese at work, while 90\% said they communicate with their family members at home in Maltese. the percentages for spoken Maltese are extremely high but go down for other skills like reading and writing.''
  1034. This distribution of Maltese being used mainly as the spoken medium and English mainly as the written medium bears a certain risk, as it can have an impact on different skills of its native speakers when it comes to speaking, reading or writing. To justify this statement, one has to look at the basic characteristics of spoken and written language.
  1035. In general, written texts differ from spoken discourse in a number of ways. What they have in common is that both are ways of transferring information between two parties, i.\,e., speaker and hearer, and writer and reader, respectively. However, they differ in the way information is passed on between them. Putting it in a simple way, a written text, unlike spoken discourse, is set outside a concrete interactive communicative situation. Spoken discourse, on the one hand, depends on the interaction between speaker and hearer. The speaker has to structure the information in a certain way. This is important because of the limited human short-term memory: a hearer in a conversation can only take in a certain amount of information before he has to interrupt and ask the speaker to make sure that he understood.
  1036. A written text, on the other hand, is non-interactive in so far as the reader cannot ask for more specific information. He can, however, browse back and forth in the text (something that a hearer cannot do in discourse). In that way, a written text itself serves as the long-term memory for the reader. Thus, a written text structures information differently than would be done in a spoken conversation. For example, a text has to provide more background information in order to provide a common ground with the reader before the actual information flow starts. This is not a problem, given that a text can serve as a long-term memory for the reader. In fact, it allows for a more elaborated structure than spoken discourse, i.\,e., it usually contains longer sentences and a higher number of subordinate clauses.
  1037. This register (i.\,e., ``language style'') distinction is what in the literature, e.\,g., \cite{Biber:1991}, has been dubbed \emph{oral} versus \emph{literate} text structures. Of course, a text can also be written in an oral register that resembles spoken conversations (e.\,g., in forum chats or informal emails). But it is not the register normally used in, e.\,g., essays. Ideally, native speakers acquire the literate register already from an early age on, e.\,g., by their parents reading stories to them. Later in school, this knowledge is deepened by active exercise in writing essays, for example.
  1038. A literate register develops over time in a language with a literary tradition. Maltese, compared to its short history as an officially written language (since 1934), has a long and rich literary history. Even though the oldest literature discovered is very sparse (\emph{Il Cantilena} by Pietro Caxaro, dating back to about 1450), a literary tradition started to form around the 1740s. In the 19th century, the amount of literature in Maltese was growing \cite{Fabri:2011a}, and with it, Maltese was expanding. Today it is a language with a fully fledged literate register.
  1039. This register, however, needs to be exercised in order to keep up the status of the language as a both conversational and literary language. The trend in higher education to write more essays in English than in Maltese, at least theoretically, bears the risk of reducing Maltese to the oral register. A higher number of Maltese websites of all genres is desirable to cover both registers and their subtypes in order to ensure a stable status of the language in all its richness.
  1040. \subsection{International Aspects}
  1041. Bearing the previous sections in mind, it should be clear now that the international aspects of Maltese differ to a great extent from those of other languages. With under a million native speakers worldwide, Maltese is considered a ``lesser-spoken'' language. In its history, it was not the language of occupiers but rather one of the occupied. As a result of this, Maltese has never become what is traditionally considered an international language or lingua franca as was the case for, e.\,g., Latin, Spanish, Portuguese or English, all of which can be considered as the languages of conquerors. It did spread to other countries, where it is still spoken today (Australia, Canada, USA and UK), but only as a community language. It took nearly 200 years from the first interest of Maltese grammarians in their own language until it eventually gained the status of an official language. And even then, the other official language, English, still served as the language for international relations.
  1042. A change for Maltese to become an internationally visible language came with Malta's joining of the EU in 2004. Since then, it has been an official language inside the European Union, with all the benefits and challenges which are connected to this status.
  1043. Academically, the interest in Maltese as a subject of science goes back to as far as 1603 when Hieronymus Megiser published his \emph{Thesaurus Polyglottus}, which included a list of Maltese words. The first scholar to systematically explore and promote the Maltese language was Mikiel Anton Vassalli. He published a grammar (1790), a dictionary (1797) and several alphabets (1788 and 1790) for Maltese and today is called ``the father of the Maltese Language'' \cite{Brincat:2011}. In the 20th century, Sutcliffe's Grammar of the Maltese Language (1936) was published. From the 1960s, Maltese linguistics gained wider international academic awareness through the publications by Joseph Aquilina (e.\,g., \emph{Papers in Maltese Linguistics} (1961) and \emph{Maltese-English Dictionary}, two volumes (1987 and 1990)). Since then, more and more scholars outside Malta have taken an interest in Maltese. 2007 saw the foundation of the International Association of Maltese Linguistics (\emph{Għaqda Internazzjonali tal-Lingwistika Maltija}) \cite{GHILM1}, an association of linguists who are interested in the Maltese language. The main aim of GĦILM, as stated on its website, is to provide ``a connection between interested scholars from all subdisciplines of Linguistics'', thus facilitating research on Maltese. It also wants to bring together people from different backgrounds who work with the Maltese language (linguists, translators, students and others).
  1044. \subsection{Maltese on the Internet}
  1045. A survey of the National Statistics Office of Malta in the second quarter of 2009 \cite{NSO2} shows that among a population of roughly 400,000 persons, 67 per cent had access to a computer and 64 per cent had access to the Internet. A recent Eurobarometer survey (published in May 2011) \cite{Eurobarometer1} among European Internet users' browsing habits showed that only 6.5 per cent of Maltese Internet users use exclusively Maltese on the Internet when reading, consuming content or communicating. Instead, 90.6 per cent choose to browse websites in English and 20.1 per cent in Italian. These figures formed the basis of an article in the Maltese daily newspaper \emph{The Times of Malta}, which provoked a lively discussion mostly among Maltese readers of the online edition \cite{TimesOfMalta1}.
  1046. The exact findings in the survey, however, point to the conclusion that this habit is not a deliberate choice: When asked which language Maltese considered their mother tongue, 89.5 per cent of the respondents claimed that Maltese was their mother tongue (opposed to only 7.6 per cent for English and 0.2 per cent for Italian).
  1047. Languages other than respondents' own used to read or watch content on the Internet were English (90.6 per cent) and Italian (20.1 per cent). Only 6.5 per cent responded that they only use their own language, which is not surprising, given that most Maltese are bilingual in Maltese and English and a considerable number speak Italian as well.
  1048. When writing on the Internet, numbers in favour of Maltese are higher than when reading or watching content: 87 per cent claimed they used Maltese, 85 per cent English and 8 per cent Italian.
  1049. The reason for the majority to use English as the language for consuming online content may be just the limited number of websites in Maltese rather than the favour for English \emph{per se}. Remember that most respondents did not regard English as their own language and that the usage of Maltese increased when producing content on the web, even though this use of Maltese in most cases takes place in chat forums and social platforms, hence in a colloquial style, i.\,e., in the oral register.
  1050. A peculiarity about the Maltese used by the younger generation in social platforms and chat forums is its phonetic spelling, without the silent characters like \emph{għ} and \emph{h}. Thus \emph{għax} `because' is written as \emph{ax}, \emph{tiegħi} `my' as \emph{tiei} etc. The reason for this may be the late introduction of Maltese special characters into the PC world. Although Maltese has been implemented in the Unicode framework since its inception, computers and operating systems followed much later. The \emph{Maltese Standards Authority} only released a standardised Maltese keyboard layout in 2002, and Microsoft's Windows operating system has been available in a Maltese language version since as late as 2006 (with Windows XP). In the case of mobile phones, the special Maltese letters are still not implemented. Hence it remains to be seen whether the \emph{ad-hoc} orthography of the chat forums will give way to a spelling with special characters once they are available on mobile phones or whether this phonetic orthography will survive as a ``sociolect'' of the younger generation \cite{Fabri:2011b}.
  1051. As for the amount of Maltese on the Internet in general, it is hard to come up with exact numbers, not least because the number of websites is changing constantly. But there are other factors which give an idea about the amount of Maltese online in comparison to other languages.
  1052. A first look at the number of Wikipedia entries (on June 1st, 2011) showed that there were about 2,820 entries in Maltese in contrast to more than 3,640,000 entries in English and more than 1,238,000 entries in German.
  1053. Comparing the number of top level domains (TLD), the TLD .mt occupies rank 213 (out of 358) with an unspecified number of registered .mt domains (a member of the Network Information Centre Malta gave an estimate of about 5,000), opposed to 21,336,063 registered domains for .com (commercial, rank 1) and 5,459,604 domains for .de (Germany, rank 2). Of course, the number of registered domains does not tell anything about the language in which the pages under a certain domain are written.
  1054. Some rough numbers of the amount of Maltese language on the Internet can be calculated using a procedure proposed by \cite{Kilgarriff-Grefenstette:2003} (The authors are indebted to Dr.~Albert Gatt (Institute of Linguistics, University of Malta) for drawing their attention to this paper.). The basic idea is that function words (e.\,g., \emph{but, for, this} etc) are more frequent than content words (e.\,g., nouns, verbs, adjectives) and form a finite set in a language. Also, the percentage of the function words in a language is stable in a text sample as the size of the sample increases (Zipf's Law). Thus, one can calculate the amount of words for any language on the Internet as follows:
  1057. In the first step, one calculates the amount of selected function words of Maltese in a corpus (i.\,e., a text collection) whose size is known. In the second step, one uses a search engine (e.\,g., Google) to find out the frequency for the same function words on the web. In the third step, the frequency from the corpus count is extrapolated to the Google search and then an average is calculated for the frequency of function words in the search results.
  1058. Some restrictions of this method should be mentioned: Firstly, the numbers gained by this method are only page hits. For example, 94,300 Google hits for the word \emph{għal} `for' are not 94,300 instances of the word on the Internet, but 94,300 webpages which contain the word \emph{għal} at least once. Secondly, the search only finds webpages which have an individual URL \cite{Kilgarriff-Grefenstette:2003}. Pages that are only accessible via a web interface are not retrieved in the web search. Thirdly, a search engine will only search for a string irrespective of its environment on a webpage. It does not make judgements about whether a certain string is actually a word of a language.
  1059. Applied to Maltese function words, the method described above generates different estimates for Maltese. For websites in the domain .mt situated in Malta, the estimated size is 50 million words, while for .mt websites in all regions the size is 500 million words. The reason for this difference is that a lot of .mt domains are reserved for servers outside Malta.
  1060. The exact results of the Google searches (conducted on July 8, 2011) and their extrapolation can be retraced in Figure~\ref{table:Google_A_en} and Figure~\ref{table:Google_B_en} below. The column f/m (i.\,e., frequency per million) identifies how often in a million words the respective word occurs in the MLRS corpus. For example, in Figure~\ref{table:Google_A_en}, the word \emph{għal} `for' appears nearly 3731 times among a million words. The Google search for \emph{għal} results in 94,300 pages with at least one instance of \emph{għal} on a webpage under the domain .mt in Malta. Multiplication by 1 million and division by 3730.96 makes an estimated 25,274,996 instances of any Maltese word on pages under the domain .mt inside Malta. If one does this calculation for the other words in the figure and averages the results, one arrives at a number slightly less than 50 million words. For all webpages worldwide listed under the domain .mt, the results are ten times as high.
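The extrapolation can also be written compactly. The symbols below are introduced here purely for illustration and are not taken from the cited procedure: $h_w$ is the number of Google page hits for a function word $w$, $f_w$ is its frequency per million words in the MLRS corpus, $\hat{N}_w$ is the resulting estimate of the total number of Maltese words, and $n$ is the number of function words used. As a rough sketch that treats each page hit as one word instance:
\[
\hat{N}_w = \frac{h_w \times 10^{6}}{f_w},
\qquad
\hat{N} = \frac{1}{n} \sum_{w} \hat{N}_w .
\]
For \emph{għal} in Figure~\ref{table:Google_A_en} this gives
\[
\hat{N}_{\mathrm{g\hbar al}} = \frac{94{,}300 \times 10^{6}}{3730.96} \approx 25{,}274{,}996 ,
\]
and averaging the $n = 12$ extrapolations listed in that figure yields $\hat{N} \approx 48{,}013{,}952$. Since a page hit records only that a word occurs at least once on a page, $\hat{N}$ should be read as a coarse lower-end indication rather than an exact word count.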
  1061. \begin{figure*}[p]
  1062. \setlength{\tabcolsep}{2.5em}
  1063. \begin{tabularx}{\textwidth}{lrrr} \toprule\addlinespace
  1064. Word & f/m & Google (.mt only, Region=mt) & Extrapolation \\ \addlinespace\midrule\addlinespace
  1065. għal & 3730.96 & 94,300 & 25,274,996 \\
  1066. qed & 4770.79 & 118,000 & 24,733,849 \\
  1067. minn & 4833.58 & 173,000 & 35,791,276 \\
  1068. kien & 4073.83 & 93,800 & 23,025,015 \\
  1069. biex & 5276.78 & 179,000 & 33,922,202 \\
  1070. dan & 6412.28 & 434,000 & 67,682,634 \\
  1071. kienet & 1452.42 & 116,000 & 79,866,705 \\
  1072. kienu & 1465.56 & 135,000 & 92,114,959 \\
  1073. kont & 521.43 & 34,200 & 65,588,861 \\
  1074. konna & 301.39 & 19,400 & 64,368,426 \\
  1075. jekk & 2776.8 & 72,100 & 25,965,140 \\
  1076. mhux & 2101.32 & 79,500 & 37,833,362 \\ \addlinespace\midrule\addlinespace
  1077. Average& & & 48,013,952 \\ \addlinespace\bottomrule
  1078. \end{tabularx}
  1079. \caption{Google search, restricted to domain .mt and region Malta}
  1080. \refstepcounter{refs_en}
  1081. \label{table:Google_A_en}
  1082. \end{figure*}
  1083. \begin{figure*}[p]
  1084. \setlength{\tabcolsep}{3.1em}
  1085. \begin{tabularx}{\textwidth}{lrrr} \toprule\addlinespace
  1086. Word & f/m & Google (.mt only) & Extrapolation \\ \addlinespace\midrule\addlinespace
  1087. għal & 3730.96 & 1,340,000 & 359,156,892 \\
  1088. qed & 4770.79 & 966,000 & 202,482,188 \\
  1089. minn & 4833.58 & 1,240,000 & 256,538,632 \\
  1090. kien & 4073.83 & 3,100,000 & 760,954,679 \\
  1091. biex & 5276.78 & 6,530,000 & 1,237,497,110 \\
  1092. dan & 6412.28 & 3,980,000 & 620,684,062 \\
  1093. kienet & 1452.42 & 665,000 & 457,856,543 \\
  1094. kienu & 1465.56 & 436,000 & 297,497,202 \\
  1095. kont & 521.43 & 450,000 & 863,011,334 \\
  1096. konna & 301.39 & 81,600 & 270,745,546\\
  1097. jekk & 2776.8 & 1,120,000 & 403,341,976 \\
  1098. mhux & 2101.32 & 1,040,000 & 494,926,998 \\ \addlinespace\midrule\addlinespace
  1099. Average & & & 518,724,430 \\ \addlinespace\bottomrule
  1100. \end{tabularx}
  1101. \caption{Google search, restricted to domain .mt only}
  1102. \refstepcounter{refs_en}
  1103. \label{table:Google_B_en}
  1104. \end{figure*}
  1105. Of course, for a serious study, this search and extrapolation would have to include more words to arrive at more reliable numbers for the amount of Maltese on the Internet. But comparing the results with Table 3 in \cite{Kilgarriff-Grefenstette:2003}, one can say that both numbers are very low: for webpages in Malta only, the amount is more than Latvian and less than Icelandic ten years ago (the numbers in Table 3 were calculated in March 2001). For webpages worldwide, the amount of Maltese is more than Hungarian and less than Czech ten years ago. Given that ``the proportion of non-English text to English is growing'' \cite{Kilgarriff-Grefenstette:2003}, Maltese might be even less represented online today than the languages just mentioned.
  1106. Apart from private home pages and blogs, there are a number of official websites in Maltese. First of all, there is the home page of the Maltese government \cite{GovernmentOfMalta1}, which is available in both Maltese and English. Also, there are the Internet editions of the Maltese language daily and weekly newspapers: \emph{In-Nazzjon}, \emph{L-Orizzont} (daily), \emph{Illum}, \emph{Il-ĠENSillum}, \emph{KullĦadd}, \emph{Leħen is-Sewwa}, \emph{It-Torċa} (weekly).
  1107. The websites of the Maltese TV and radio stations show a mixture of both English and Maltese to different degrees. For example, the websites of the stations NET TV \cite{NetTV1} and One TV \cite{OneTV1} show a framework in English, with some articles in Maltese, even though their programmes contain both Maltese and English titles. The church-owned radio station RTK \cite{RTK1} (Maltese and English) lets the user choose between the two languages. The website of the Public Broadcasting Services (PBS) \cite{PBS1} contains sections in English and sections in Maltese, as does the website of Radio 101 \cite{radio101}. This mixture of English and Maltese reflects the language use in everyday life. Within the programmes, however, the situation is clearer, since the \emph{Maltese Broadcasting Authority} has issued strict guidelines for the use of Maltese on TV and the radio. Following those, presenters should speak in either Maltese or English and not switch between the two languages \cite{Fabri:2011a}. Hence the programmes of the stations contain broadcasts in Maltese only and others in English only. Those are often available online as well, either as live streams or as podcasts.
  1108. Outside Malta, a large collection of Maltese texts is held in EUR-Lex \cite{EURLex1}, which hosts all official legal and other documents of the European Union since 1951 in its 23 official languages.
  1109. Many if not all of these openly available web documents are used in corpus projects, e.\,g., the \emph{JRC-Acquis Multilingual Parallel Corpus} \cite{JRC-Acquis1}, which is a parallel corpus containing the complete text of the European Union Law in 22 languages. Another corpus that contains a growing number of visible web documents in Maltese is the corpus on the MLRS (Maltese Language Resource Server) \cite{MLRS1}.
  1110. \end{multicols}
  1111. \clearpage
  1112. \ssection[Language Technology Support for Maltese]{Language Technology Support\newline for Maltese}
  1113. \begin{multicols}{2}
  1114. Language technology is used to develop software systems designed to handle human language and is therefore often called human language technology. Human language comes in spoken and written forms. While speech is the oldest and in terms of human evolution the most natural form of language communication, complex information and most human knowledge is stored and transmitted through the written word. Speech and text technologies process or produce these different forms of language, using dictionaries, rules of grammar, and semantics. This means that language technology (LT) links language to various forms of knowledge, independently of the media (speech or text) in which it is expressed. Figure~\ref{fig:ltincontext_en} illustrates the LT landscape.
  1115. \begin{figure*}[htb]
  1116. \colorrule{grey3}{\textwidth}{1.5pt}
  1117. \center
  1118. \includegraphics[width=\textwidth]{../_media/english/language_technologies}
  1119. \caption{Language technologies}
  1120. \refstepcounter{refs_en}
  1121. \label{fig:ltincontext_en}
  1122. \colorrule{grey3}{\textwidth}{1.5pt}
  1123. \end{figure*}
  1124. When we communicate, we combine language with other modes of communication and information media; for example, speaking can involve gestures and facial expressions. Digital texts link to pictures and sounds. Movies may contain language in spoken and written form. In other words, speech and text technologies overlap and interact with other multimodal communication and multimedia technologies.
  1125. In this section, we will discuss the main application areas of language technology, i.\,e., language checking, web search, speech interaction, and machine translation. These applications and basic technologies include
  1126. \begin{itemize}
  1127. \item spelling correction
  1128. \item authoring support
  1129. \item computer-assisted language learning
  1130. \item information retrieval
  1131. \item information extraction
  1132. \item text summarisation
  1133. \item question answering
  1134. \item speech recognition
  1135. \item speech synthesis
  1136. \end{itemize}
  1137. Language technology is an established area of research with an extensive set of introductory literature. The interested reader is referred to the following references: \cite{carstensen-etal1, jurafsky-martin01, manning-schuetze1, lt-world1, lt-survey1}.
  1138. Before discussing the above application areas, we will briefly describe the architecture of a typical LT system.
  1139. \subsection{Application Architectures}
  1140. Software applications for language processing typically consist of several components that mirror different aspects of language. While such applications tend to be very complex, figure~\ref{fig:textprocessingarch_en} shows a highly simplified architecture of a typical text processing system. The first three modules handle the structure and meaning of the text input:
  1141. \begin{figure*}[b]
  1142. \colorrule{grey3}{\textwidth}{1.5pt}
  1143. \center
  1144. \includegraphics[width=\textwidth]{../_media/english/text_processing_app_architecture}
  1145. \caption{A typical text processing architecture}
  1146. \refstepcounter{refs_en}
  1147. \label{fig:textprocessingarch_en}
  1148. \colorrule{grey3}{\textwidth}{1.5pt}
  1149. \end{figure*}
  1150. \begin{enumerate}
  1151. \item Pre-processing: cleans the data, analyses or removes formatting, detects the input languages, and so on.
  1152. \item Grammatical analysis: finds the verb, its objects, modifiers and other sentence elements; detects the sentence structure.
  1153. \item Semantic analysis: performs disambiguation (i.\,e., computes the appropriate meaning of words in a given context); resolves anaphora (i.\,e., which pronouns refer to which nouns); represents the meaning of the sentence in a machine-readable way.
  1154. \end{enumerate}
  1155. After analysing the text, task-specific modules can perform other operations, such as automatic summarisation and database look-ups.
  1156. In the remainder of this section, we firstly introduce the core application areas for language technology, and follow this with a brief overview of the state of LT research and education today, and a description of past and present research programmes. Finally, we present an expert estimate of core LT tools and resources for Maltese in terms of various dimensions such as availability, maturity and quality. %FIXME The general situation of LT for the Maltese language is summarised in a matrix (figure~\ref{fig:lrlttable_en}). Tools and resources that are boldfaced in the text can also be found in figure~\ref{fig:lrlttable_en} (p.~\pageref{fig:lrlttable_en}) at the end of this chapter.
  1157. The general situation of LT for the Maltese language is summarised in figure~\ref{fig:lrlttable_en} (p.~\pageref{fig:lrlttable_en}) at the end of this chapter. This table lists all tools and resources that are boldfaced in the text. LT support for Maltese is also compared to other languages that are part of this series.
  1158. \subsection{Core Application Areas}
  1159. In this section, we focus on the most important LT tools and resources, and provide an overview of LT activities in Malta.
  1160. \subsubsection{Language Checking}
  1161. Anyone who has used a word processor such as Microsoft Word knows that it has a spell checker that highlights spelling mistakes and proposes corrections. The first spelling correction programs compared a list of extracted words against a dictionary of correctly spelled words. Today these programs are far more sophisticated. Using language-dependent algorithms for \textbf{grammatical analysis}, they detect errors related to morphology (e.\,g., plural formation) as well as syntax-related errors, such as a missing verb or a conflict of verb-subject agreement (e.\,g., \textit{she *write a letter}). However, most spell checkers will not find any errors in the following text \cite{zar1}:
  1162. \begin{quote}
  1163. I have a spelling checker,\\
  1164. It came with my PC.\\
  1165. It plane lee marks four my revue\\
  1166. Miss steaks aye can knot sea.
  1167. \end{quote}
  1168. For handling this type of error, analysis of the context is needed in many cases, e.\,g., for deciding in which position in a Maltese verb the silent \emph{għ} has to be written, as in:
  1169. \begin{enumerate} %[(a)]
  1170. \item \emph{...~in-negozjati li kien għamel il-Gvern ...}\\
  1171. `...~the negotiations that the government had made...'
  1172. \item \emph{Pawlu, agħmel l-eżamijiet!} \\
  1173. `Paul, do the exams!'
  1174. \item *\emph{...~in-negozjati li kien agħmel il-Gvern ...}
  1175. \end{enumerate}
  1176. Both \emph{għamel} `he made' and \emph{agħmel} `make!' are pronounced \lingua{ˈɐː.mɛl}.
  1177. This type of analysis either needs to draw on language-specific \textbf{grammars} laboriously coded into the software by experts, or on a statistical language model. In this case, a model calculates the probability of a particular word as it occurs in a specific position (e.\,g., between the words that precede and follow it). For example, \emph{kien għamel} is a much more probable word sequence than \emph{kien agħmel}. A statistical language model can be automatically derived using a large amount of (correct) language data, a \textbf{text corpus}. Up to now, these approaches have mostly been developed and evaluated using English language data. However, they do not necessarily transfer well to highly inflectional languages like Maltese, where a given word type, such as a verb, can yield a large number of orthographic forms.
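The bigram idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any existing Maltese tool: the two-sentence toy corpus simply reuses the grammatical example sentences above, and the function name \texttt{bigram\_prob} and the choice of add-one smoothing are introduced here for illustration only.

```python
from collections import Counter

# Toy corpus (assumption): the two grammatical example sentences from the text.
# A usable model would be estimated from a large text corpus instead.
corpus = [
    "in-negozjati li kien għamel il-Gvern".split(),
    "Pawlu agħmel l-eżamijiet".split(),
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence          # <s> marks the sentence start
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

vocab_size = len(unigrams)


def bigram_prob(prev, word):
    """Estimate P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


p_good = bigram_prob("kien", "għamel")   # sequence seen in the corpus
p_bad = bigram_prob("kien", "agħmel")    # sequence never seen after "kien"
print(p_good > p_bad)
```

With this toy corpus, \emph{kien għamel} receives a higher probability than \emph{kien agħmel}, which is the signal a context-sensitive checker would use to flag the second form. The smoothing keeps unseen sequences from receiving probability zero, which matters for a highly inflectional language where many valid forms never occur in the training data.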
  1178. \begin{figure*}[htb]
  1179. \colorrule{grey3}{\textwidth}{1.5pt}
  1180. \center
  1181. \includegraphics[width=\textwidth]{../_media/english/language_checking}
  1182. \caption{Language checking (top: statistical; bottom: rule-based)}
  1183. \refstepcounter{refs_en}
  1184. \label{fig:langcheckingaarch_en}
  1185. \colorrule{grey3}{\textwidth}{1.5pt}
  1186. \end{figure*}
  1187. As with other languages, a means to determine whether a given string is a valid word is not a sufficient condition for spelling-error detection, but it is a necessary condition. As yet, no such means exists for Maltese, though various attempts have been made.
  1188. One of the earliest was by \cite{Mangion:1999} using a rudimentary form of rule-driven morphological analysis. Essentially a word was considered valid if it could be derived by rule from a citation form found in a dictionary. The problem with this approach is that it requires a complete list of all citation forms, and of course, the rules have to be very accurate. Results were somewhat limited by the list of citation forms, which was incomplete, and the imperfect nature of the rules.
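The principle of rule-driven validation can be illustrated with a deliberately tiny Python sketch. It is not a reconstruction of the system cited above: the one-word lexicon and the suffix list are toy assumptions chosen only because the forms \emph{kien}, \emph{kienet} and \emph{kienu} appear elsewhere in this document, and real Maltese morphology is far richer than simple suffixation.

```python
# Toy lexicon of citation forms (assumption, for illustration only).
CITATION_FORMS = {"kien"}

# Toy suffix rules (assumption): "" leaves the citation form unchanged,
# "et" and "u" derive kienet and kienu from kien.
SUFFIX_RULES = ["", "et", "u"]


def is_valid(word):
    """Accept a word if stripping one known suffix yields a citation form."""
    for suffix in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            if stem in CITATION_FORMS:
                return True
    return False


for candidate in ["kien", "kienet", "kienu", "konna"]:
    print(candidate, is_valid(candidate))
```

The sketch accepts \emph{kienet} and \emph{kienu} but rejects \emph{konna}, a form that occurs in the corpus, because neither the lexicon nor the rules cover it. This is the same weakness noted above: coverage depends entirely on the completeness of the citation list and the accuracy of the rules.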
  1189. A second approach looked to statistics for a solution. The intuitive idea is that for a given language, certain sequences of characters are highly unlikely. In English, for example, the sequence \emph{kk} is very rare, so if it occurs as a substring in a written word, we can guess, with a fairly high degree of confidence, that the word is not valid. More generally, we can calculate the probability of any string as a function of the probabilities of all its substrings, adopting the principle that to count as a valid word, that probability must exceed a certain threshold. A statistical spell checker making use of such a principle was developed by \cite{Mizzi:2000}. It did not require a lexicon, being based instead on the distribution of character n-grams found in a newspaper corpus. It became clear that for this approach to succeed (i) a more accurate language model was needed, requiring more language data than was then available, and (ii) string probability alone was insufficient to accurately classify an orthographic word as an error. As suggested above, other information is necessary, such as part-of-speech information from the surrounding context.
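The character n-gram principle can be sketched as follows. This is an illustrative Python sketch under stated assumptions and not the implementation of the checker cited above: it uses character bigrams with add-one smoothing, a twelve-word training list taken from the frequency tables earlier in this document, and hypothetical function names (\texttt{train}, \texttt{avg\_log\_prob}, \texttt{looks\_valid}).

```python
import math
from collections import Counter

# Training words (assumption): the twelve words from the frequency tables.
WORDS = ["għal", "qed", "minn", "kien", "biex", "dan",
         "kienet", "kienu", "kont", "konna", "jekk", "mhux"]


def char_bigrams(word):
    """Return adjacent character pairs, with ^ and $ as boundary markers."""
    padded = "^" + word + "$"
    return list(zip(padded, padded[1:]))


def train(words):
    """Count character bigrams and their left-hand contexts."""
    pair_counts, first_counts = Counter(), Counter()
    for w in words:
        for a, b in char_bigrams(w):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    alphabet = {c for w in words for c in "^" + w + "$"}
    return pair_counts, first_counts, len(alphabet)


def avg_log_prob(word, model):
    """Average smoothed log-probability of the word's character bigrams."""
    pair_counts, first_counts, alphabet_size = model
    pairs = char_bigrams(word)
    total = sum(
        math.log((pair_counts[(a, b)] + 1) / (first_counts[a] + alphabet_size))
        for a, b in pairs
    )
    return total / len(pairs)


def looks_valid(word, model, threshold):
    """Classify a string as a plausible word if its score meets the threshold."""
    return avg_log_prob(word, model) >= threshold


model = train(WORDS)
print(avg_log_prob("kienet", model) > avg_log_prob("xqzt", model))
```

A string made of unseen character sequences scores lower than a word built from frequent ones. Where to place the threshold is an empirical question that this sketch leaves open, and, as noted above, string probability alone does not separate errors from valid words reliably.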
  1190. Other attempts to develop a spell-checker for Maltese include an online checker that has been developed by Ramon Casha of the Linux User Group \cite{Linux-spellcheck1}. This is based on a wordlist of around 1 million word types originally collected from various corpora, and subsequently extended by various rules for handling inflections. Its accuracy has not been officially established. Microsoft has also been working on a spell checker for inclusion with their Maltese language interface pack though it is not clear when this will be released.
  1191. %
  1192. The use of language checking is not limited to word processing tools. Language checking is also applied to automatically correct queries sent to search engines, e.\,g., Google's \emph{Did you mean} suggestions. Other application areas include various kinds of authoring support software.
  1193. As a result of the rapid increase in demand for technical products, many companies have begun to focus increasingly on the quality of technical documentation in the face of potential customer complaints about wrong linguistic usage and damage claims resulting from bad or badly understood instructions. Authoring support software can assist the writer of technical documentation to use vocabulary and sentence structures that are consistent with certain formally expressed rules and (corporate) terminology restrictions.
  1194. Authoring support software for Maltese does not at the moment exist but there would be considerable scope for the use of such software at the production end of Maltese. One of the reasons for the comparative scarcity of written Maltese content, in business correspondence for example, is that the production of correct Maltese text is difficult. Many competent native speakers are inclined to make mistakes when it comes to the written language and so they prefer to write in English. The availability of the right kind of simple authoring support tools could alleviate this problem.
  1195. An evolving area of language technology is computer-assisted language learning but apart from an interactive CD picture dictionary \cite{Sciriha:1997}, no such applications have been specifically developed for Maltese to date.
  1196. \subsubsection{Web Search}
  1197. \begin{figure*}[htb]
  1198. \colorrule{grey3}{\textwidth}{1.5pt}
  1199. \center
  1200. \includegraphics[width=\textwidth]{../_media/english/web_search_architecture}
  1201. \caption{Web search}
  1202. \refstepcounter{refs_en}
  1203. \label{fig:websearcharch_en}
  1204. \colorrule{grey3}{\textwidth}{1.5pt}
  1205. \end{figure*}
  1206. Search on the web, in intranets, or in digital libraries is probably the most widely used and yet underdeveloped language technology today. The search engine Google, which started in 1998, is nowadays used for about 80\% of all search queries world-wide \cite{spi1}. Since 2004, the verb \emph{to google} has even had an entry in the \emph{Cambridge Advanced Learner's Dictionary}. Neither the search interface nor the presentation of the retrieved results has significantly changed since the first version.
  1207. In the current version, Google offers spelling correction for misspelled words and also, in 2009, incorporated basic semantic search capabilities into their algorithmic mix \cite{pc1}, which can improve search accuracy by analysing the meaning of the query terms in context. The success story of Google shows that with a lot of data at hand and efficient indexing techniques a mainly statistical approach can lead to satisfactory results.
  1208. However, for more sophisticated information requests, the integration of deeper linguistic knowledge is essential. In the research labs, experiments using \textbf{lexical resources} such as machine-readable thesauri and ontological language resources like WordNet have shown improvements by allowing a page to be found on the basis of synonyms of the search terms, e.\,g., Maltese \emph{enerġija atomika, enerġija nukleari} (atomic energy, nuclear energy) or even more loosely related terms.
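Synonym-based query expansion can be sketched in Python as follows. This is a minimal illustration, not a description of how any particular search engine works: the hand-built synonym table stands in for a machine-readable thesaurus, the two sample documents are invented placeholders, and the function names \texttt{expand} and \texttt{search} are hypothetical.

```python
# Hand-built synonym table (assumption) standing in for a real thesaurus.
SYNONYMS = {
    "enerġija atomika": {"enerġija nukleari"},
    "enerġija nukleari": {"enerġija atomika"},
}


def expand(query):
    """Return the query together with any synonyms listed for it."""
    normalised = query.lower()
    return {normalised} | SYNONYMS.get(normalised, set())


def search(query, documents):
    """Return ids of documents containing the query or one of its synonyms."""
    terms = expand(query)
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(term in text.lower() for term in terms)
    ]


# Invented placeholder documents, for illustration only.
documents = {
    "d1": "Report: enerġija nukleari",
    "d2": "Report: weather",
}
print(search("enerġija atomika", documents))
```

Without the expansion step, the query \emph{enerġija atomika} would not match the first document, since the literal string does not occur in it. A production system would rely on an index and on lexical resources rather than substring matching over raw text.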
  1209. %However, for more sophisticated information requests, the integration of deeper linguistic knowledge is essential. In the research labs, experiments using %\textbf{lexical resources} such as machine-readable thesauri and ontological language resources like WordNet have shown improvements by allowing a %page to be found on the basis of synonyms of the search terms, e.\,g., German \emph{Atomkraft, Kernenergie and Nuklearenergie} (atomic energy, atomic %power, and nuclear energy) or even more loosely related terms.
  1210. The next generation of search engines will have to include much more sophisticated language technology. If a search query consists of a question or another type of sentence rather than a list of keywords, retrieving relevant answers to this query requires a syntactic and \textbf{semantic analysis} of the sentence as well as the availability of an index that allows for a fast retrieval of the relevant documents. For example, imagine a user inputs the query ``Give me a list of all companies that were taken over by other companies in the last five years''. For a satisfactory answer, syntactic parsing needs to be applied to analyse the grammatical structure of the sentence and determine that the user is looking for companies that have been taken over and not companies that took over others. Also, the expression \emph{last five years} needs to be processed in order to find out which years it refers to.
  1211. Finally, the processed query needs to be matched against a huge amount of unstructured data in order to find the piece or pieces of information the user is looking for. This is commonly referred to as information retrieval and involves the search for and ranking of relevant documents. In addition, to generate a list of companies, we also need to extract the information that a particular string of words in a document actually refers to a company name. This kind of information is made available by so-called named-entity recognisers.
  1212. \boxtext{The next generation of search engines\\ will have to include much more sophisticated\\ language technology.}
  1213. Even more demanding is the attempt to match a query to documents written in a different language. For cross-lingual information retrieval, we have to automatically translate the query to all possible source languages and transfer the retrieved information back to the target language. The increasing percentage of data available in non-textual formats drives the demand for services enabling multimedia information retrieval, i.\,e., information search on images, audio, and video data. For audio and video files, this involves a \textbf{speech recognition} module to convert speech content into text or a phonetic representation, to which user queries can be matched.
  1214. In Malta, there are a number of search websites that are specifically oriented towards Malta \cite{philb1}. In addition, there are a small number of Malta-based SMEs that incorporate relatively sophisticated language processing techniques within search applications. Charonite \cite{charonite1}, for example, is a local SME dealing with search engine optimisation. However, at the time of writing there are no commercially available search engines that are specifically oriented towards the Maltese language, apart from a prototype for cross-lingual information retrieval developed within the scope of LT4EL \cite{let1}, a European FP6 research project which used multilingual language technology tools and semantic encoding techniques for improving the retrieval of learning material.
  1215. \subsubsection{Speech Interaction}
  1216. Speech interaction is one of many application areas that depend on speech technology, i.\,e., technologies for processing spoken language. Speech interaction technology is used to create interfaces that enable users to interact in spoken language instead of using a graphical display, keyboard and mouse. Today, these voice user interfaces (VUI) are used for partially or fully automated telephone services provided by companies to customers, employees or partners. Business domains that rely heavily on VUIs include banking, supply chain, public transportation, and telecommunications. Other uses of speech interaction technology include interfaces to car navigation systems and the use of spoken language as an alternative to the graphical or touchscreen interfaces in smartphones.
  1217. \boxtext{Speech interaction is the basis for interfaces that allow a user to interact with spoken language.}
  1218. \begin{figure*}[htb]
  1219. \colorrule{grey3}{\textwidth}{1.5pt}
  1220. \center
  1221. \includegraphics[width=\textwidth]{../_media/english/simple_speech-based_dialogue_architecture}
  1222. \caption{Speech-based dialogue system}
  1223. \refstepcounter{refs_en}
  1224. \label{fig:dialoguearch_en}
  1225. \colorrule{grey3}{\textwidth}{1.5pt}
  1226. \end{figure*}
  1227. Speech interaction technology comprises four technologies:
  1228. \begin{enumerate}
  1229. \item Automatic \textbf{speech recognition} (ASR) determines which words are actually spoken in a given sequence of sounds uttered by a user.
  1230. \item Natural language understanding analyses the syntactic structure of a user's utterance and interprets it according to the system in question.
  1231. \item Dialogue management determines which action to take given the user input and system functionality.
  1232. \item \textbf{Speech synthesis} (text-to-speech or TTS) transforms the system's reply into sounds for the user.
  1233. \end{enumerate}
  1234. One of the major challenges is to have an ASR system recognise the words uttered by a user as precisely as possible. This requires either a restriction of the range of possible user utterances to a limited set of keywords, or the manual creation of language models that cover a large range of natural language user utterances. Using machine learning techniques, language models can also be generated automatically from \textbf{speech corpora}, i.\,e., large collections of speech audio files and text transcriptions. Restricting utterances usually forces people to use the voice user interface (VUI) in a rigid way and can damage user acceptance; but the creation, tuning and maintenance of rich language models will significantly increase costs. VUIs that employ language models and initially allow a user to express their intent more flexibly -- prompted by a \textit{How may I help you?} greeting -- are better accepted by users.
  1235. Companies tend to use pre-recorded utterances of professional speakers for generating the output of the voice user interface. For static utterances, where the wording does not depend on the particular contexts of use or the personal user data, this can deliver a rich user experience. But more dynamic content in an utterance may suffer from unnatural intonation because different parts of audio files have simply been strung together. Through optimisation, today's TTS systems are getting better at producing natural-sounding dynamic utterances.
  1236. %\boxtext{Speech interaction is the basis for creating interfaces that allow a user to interact with spoken language instead of a graphical display, keyboard and mouse.}
  1237. Interfaces in speech interaction have been considerably standardised during the last decade in terms of their various technological components. There has also been strong market consolidation in speech recognition and speech synthesis. The national markets in the G20 countries (economically resilient countries with high populations) have been dominated by just five global players, with Nuance (USA) and Loquendo (Italy) being the most prominent players in Europe. In 2011, Nuance announced the acquisition of Loquendo, which represents a further step in market consolidation.
  1238. Most speech interaction technology development in Malta has concentrated on text-to-speech (TTS). Some pioneering work was initially carried out by \cite{Micallef:1997} and this was followed by a number of Master's dissertations \cite{Farrugia:2005}. Some work on a web-based TTS system was initiated by \cite{Buhagiar-Micallef:2008}.
  1239. A significant recent development for Maltese speech synthesis was the winning of a government tender for the development of a speech synthesiser by the local company Crimson Wing Malta Ltd. This work is partly financed by the EU Regional Development fund and commissioned by the Maltese Foundation for Information Access (FITA). The prototype will be SAPI compliant and will include three voices (male, female, and child). According to a recent presentation \cite{Borg-et-al:2011} the work is advancing well and a prototype, expected in 2012, will be freely available for download.
  1240. Work on speech recognition is less advanced. A prototype for recognising numerals in simple domains was created by \cite{Calleja:2002}. With respect to speech, the fundamental problem remains a lack of suitably annotated data, since producing it requires significant manual effort. Some attempts at automatic annotation have been made by \cite{Psaila:2008}. The creation of a corpus and descriptive framework for the study of Maltese intonation was initiated by the Institute of Linguistics and carried out by \cite{Vella-Farrugia:2006}. It is expected that the corpora being developed by Crimson Wing will be made available for research.
  1241. Looking beyond the state of today's technology, there will be significant changes due to the spread of smartphones as a new platform for managing customer relationships in addition to the telephone, Internet, and email channels. This tendency will also affect the employment of technology for speech interaction. On the one hand, demand for telephony-based VUIs will decrease in the long run. On the other hand, the usage of spoken language as a user-friendly input modality for smartphones will gain significant importance. This tendency is supported by the observable improvement of speaker-independent speech recognition accuracy for speech dictation services that are already offered as centralised services to smartphone users. Given this outsourcing of the recognition task to the infrastructure of applications, the application-specific employment of linguistic core technologies is expected to gain importance compared to the present situation.
  1242. \subsubsection{Machine Translation}
  1243. \begin{figure*}[htb]
  1244. \colorrule{grey3}{\textwidth}{1.5pt}
  1245. \center
  1246. \includegraphics[width=\textwidth]{../_media/english/machine_translation}
  1247. \caption{Machine translation (left: statistical; right: rule-based)}
  1248. \refstepcounter{refs_en}
  1249. \label{fig:mtarch_en}
  1250. \colorrule{grey3}{\textwidth}{1.5pt}
  1251. \end{figure*}
  1252. The idea of using digital computers to translate natural languages was put forward in 1946 by A.~D.~Booth and was followed by substantial funding for research in this area in the 1950s and again from the 1980s. Nevertheless, \textbf{Machine Translation} (MT) still fails to fulfil the high expectations it gave rise to in its early years.
  1253. The most basic approach to machine translation is the automatic replacement of the words in a text written in one natural language with the equivalent words of another language.
  1254. \boxtext{At its basic level, Machine Translation simply substitutes words in one natural language\\ with words in another language.}
  1255. This can be useful in subject domains with a very restricted, formulaic language, e.\,g., weather reports. However, for a good translation of less standardised texts, larger text units (phrases, sentences, or even whole passages) need to be matched to their closest counterparts in the target language. The major difficulty here lies in the fact that human language is ambiguous, which yields challenges on multiple levels, e.\,g., word sense disambiguation at the lexical level (`Jaguar' can mean a car or an animal) or the attachment of prepositional phrases on the syntactic level as in:
  1256. \begin{enumerate} %[(a)]
  1257. \item \emph{Il-Kuntistabbli osserva lir-raġel bit-teleskopju.}\\
  1258. `The policeman observed the man with the telescope.'
  1259. \item \emph{Il-Kuntistabbli osserva lir-raġel bir-rivolver.}\\
  1260. `The policeman observed the man with the revolver.'
  1261. \end{enumerate}
  1262. One way of approaching the task is based on linguistic rules. For translations between closely related languages, a direct translation may be feasible in cases like the example above. But often rule-based (or knowledge-driven) systems analyse the input text and create an intermediary, symbolic representation, from which the text in the target language is generated. The success of these methods is highly dependent on the availability of extensive lexicons with morphological, syntactic, and semantic information, and large sets of grammar rules carefully designed by a skilled linguist.
  1263. Beginning in the late 1980s, as computational power increased and became less expensive, more interest was shown in statistical models for MT. The parameters of these statistical models are derived from the analysis of bilingual text corpora, such as the Europarl \textbf{parallel corpus}, which contains the proceedings of the European Parliament in 21 European languages. Given enough data, statistical MT works well enough to derive an approximate meaning of a foreign language text. However, unlike knowledge-driven systems, statistical (or data-driven) MT often generates ungrammatical output. On the other hand, besides the advantage that less human effort is required for grammar writing, data-driven MT can also cover particularities of the language that go missing in knowledge-driven systems, for example idiomatic expressions.
  1264. As the strengths and weaknesses of knowledge- and data-driven MT are complementary, researchers nowadays unanimously target hybrid approaches combining methodologies of both. This can be done in several ways. One is to use both knowledge- and data-driven systems and have a selection module decide on the best output for each sentence. However, for longer sentences, no result will be perfect. A better solution is to combine the best parts of each sentence from multiple outputs, which can be fairly complex, as corresponding parts of multiple alternatives are not always obvious and need to be aligned.
  1265. \boxtext{The quality of MT systems is still considered\\ to have huge improvement potential.}
In Malta, work on machine translation has been restricted to a few Bachelor's and Master's dissertations. A transfer system based on LFG was developed for English/Maltese by \cite{Farrugia:2000} and successfully translated weather forecasts. Later, J.~Bajada \cite{Bajada:2004, Bajada:2009} worked on statistical MT (SMT), with the emphasis on techniques for producing language and translation models. The earlier work concerned word-based models, whilst the later work developed techniques for gathering bilingual phrase data from a limited corpus.
As in so many other areas, the underlying problem is a lack of sufficient quantities of suitably annotated bilingual data. Perhaps for this reason, the benchmark system against which to judge advances remains Google Translate.
The quality of MT systems is still considered to have huge improvement potential. Challenges include the adaptability of the language resources to a given subject domain or user area and the integration into existing workflows with term bases and translation memories. In addition, most of the current systems are English-centred and support only a few languages from and into other languages, which leads to friction in the overall translation workflow and, e.\,g., forces MT users to learn different lexicon coding tools for different systems.
  1269. Evaluation campaigns allow for comparing the quality of MT systems, the various approaches and the status of MT systems for the different languages. Figure~\ref{fig:euromatrix_mt} (p.~\pageref{fig:euromatrix_mt}), which was prepared during the EC Euromatrix+ project, shows the pairwise performances obtained for 22 of the 23 official EU languages (Irish was not compared). The results are ranked according to a BLEU score, which indicates higher scores for better translations \cite{bleu1}. A human translator would normally achieve a score of around 80 points.
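
For readers unfamiliar with the metric, the following sketch gives the usual definition of BLEU as we understand it from \cite{bleu1}. The symbols are introduced here for illustration: $p_n$ is the modified $n$-gram precision of the system output against the reference translations, $w_n$ are weights (commonly $w_n = 1/N$ with $N = 4$), $c$ is the length of the system output and $r$ the effective reference length.
\[
\mathrm{BLEU} \;=\; \mathrm{BP} \cdot \exp\Bigl( \sum_{n=1}^{N} w_n \log p_n \Bigr),
\qquad
\mathrm{BP} \;=\; \left\{
\begin{array}{ll}
1 & \mbox{if } c > r \\
e^{\,1 - r/c} & \mbox{if } c \leq r
\end{array}
\right.
\]
The raw value lies between 0 and 1; we assume that scores quoted in points, as in the figure, correspond to this value multiplied by 100. The brevity penalty $\mathrm{BP}$ prevents very short outputs from scoring well on precision alone.
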
  1270. The best results (shown in green and blue) were achieved by languages that benefit from considerable research efforts, within coordinated programs, and from the existence of many parallel corpora (e.\,g., English, French, Dutch, Spanish, German), the worst (in red) by languages that did not benefit from similar efforts, or that are very different from other languages (e.\,g., Hungarian, Maltese, Finnish).
  1271. \subsection{Other Application Areas}
Building language technology applications involves a range of subtasks that do not always surface at the level of interaction with the user, but provide significant service functionalities ``behind the scenes''. They form important research issues that have now evolved into individual subdisciplines of computational linguistics. Question answering, for example, is an active area of research for which annotated corpora have been built and scientific competitions have been initiated. The concept of question answering goes beyond keyword-based search (in which the search engine responds by delivering a collection of potentially relevant documents) and enables users to ask a concrete question to which the system provides a single answer, e.\,g.,
  1273. \begin{itemize}
  1274. \item[] \textit{Question: How old was Neil Armstrong when he stepped on the moon?}
  1275. \item[] \textit{Answer: 38.}
  1276. \end{itemize}
  1277. While question answering is obviously related to the core area of web search, it is nowadays an umbrella term for such research issues as which different types of questions exist, and how they should be handled; how a set of documents that potentially contain the answer can be analysed and compared (do they provide conflicting answers?); and how specific information (the answer) can be reliably extracted from a document without ignoring the context.
\boxtext{Language technology applications often provide significant service functionalities ``behind the scenes'' of larger software systems.}
  1279. This is in turn related to the information extraction (IE) task, an area that was extremely popular and influential at the time of the ``statistical turn'' in Computational Linguistics, in the early 1990s. IE aims at identifying specific pieces of information in specific classes of documents; this could be, e.\,g., the detection of the key players in company takeovers as reported in newspaper stories. Another scenario that has been worked on is reports on terrorist incidents, where the problem is to map the text to a template specifying the perpetrator, the target, time and location of the incident, and the results of the incident. Domain-specific template-filling is the central characteristic of IE, which for this reason is another example of a ``behind the scenes'' technology that constitutes a well-demarcated research area but for practical purposes then needs to be embedded into a suitable application environment.
Two ``borderline'' areas, which sometimes play the role of standalone application and sometimes that of supportive, ``under the hood'' component, are text summarisation and \textbf{text generation}. Summarisation refers to the task of making a long text short, and is offered for instance as a functionality within MS Word. It works largely on a statistical basis, by first identifying `important' words in a text (for example, words that are highly frequent in this text but markedly less frequent in general language use) and then determining those sentences that contain many important words. These sentences are then marked in the document, or extracted from it, and are taken to constitute the summary. In this scenario, which is by far the most popular one, summarisation equals sentence extraction: the text is reduced to a subset of its sentences. All commercial summarisers make use of this idea. An alternative approach, to which some research is devoted, is to actually synthesise \emph{new} sentences, i.\,e., to build a summary of sentences that need not show up in that form in the source text. This requires a certain amount of deeper understanding of the text and is therefore much less robust. All in all, a text generator is in most cases not a stand-alone application but is embedded into a larger software environment, such as a clinical information system where patient data is collected, stored and processed, and report generation is just one of many functionalities.
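
The sentence-extraction approach to summarisation described above can be made concrete as follows. This is a minimal sketch under our own assumptions: the ratio-based importance measure, the threshold $\theta$ and all symbols below are illustrative and are not taken from any particular summariser.
\[
\mathrm{imp}(w) \;=\; \frac{f_{\mathrm{text}}(w)}{f_{\mathrm{gen}}(w)},
\qquad
\mathrm{score}(s) \;=\; \bigl| \{\, w \in s : \mathrm{imp}(w) > \theta \,\} \bigr|
\]
Here $f_{\mathrm{text}}(w)$ is the relative frequency of the word $w$ in the document, $f_{\mathrm{gen}}(w)$ its relative frequency in a general reference corpus (assumed to be non-zero, e.\,g., after smoothing), and $s$ a sentence of the document. The highest-scoring sentences, kept in their original order, are then taken to constitute the summary.
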
  1281. \subsection{Educational Programmes}
  1282. Language technology is a highly interdisciplinary field, involving the expertise of linguists, computer scientists, mathematicians, philosophers, psycholinguists, and neuroscientists, among others.
In Malta the vast majority of research and education in LT has taken place at the University of Malta. However, the field was established there rather late. One reason for this was the late appearance of Computer Science as a curriculum subject at the University. The turbulent political leadership of the country during the 1970s and 1980s had not foreseen the information revolution to come, and it was not until the early 1990s that an undergraduate option in Computing with Mathematics was offered through the Faculty of Science.
The roots of change came in 1994, when a national strategic initiative was undertaken to recognise and strengthen the role of IT in the commercial, political and, above all, educational sectors. One immediate consequence was the introduction of a substantial four-year Bachelor's programme -- the BSc.~IT (Hons) -- at the University, as well as the founding of a new Department of Computer Science and Artificial Intelligence (CSAI, renamed ``Department of Intelligent Computer Systems (ICS)'' in 2009). A course in NLP was included as an advanced option, and this led, four years later, to a series of undergraduate final-year projects tackling language processing issues, including computational approaches to Maltese \cite{Galea:1999, Mangion:1999, Farrugia:1999, Farrugia:2000, Mizzi:2000, Bajada:2004, Attard:2005, Farrugia:2008, Farrugia:2009, Vella:2010}. The Department of Computer Communications Engineering also participated in the programme, and this led to another set of undergraduate projects in speech technology.
Another important influence on research is the University's Institute of Linguistics (IOL), founded in 1988 with the aim of teaching as well as promoting and coordinating research in both General and Applied Linguistics, furthering research involving the description of particular languages, not least Maltese, fostering the study of the various sub-fields of linguistics, and promoting interdisciplinary research involving academics in practical cooperation that cuts across departmental and faculty boundaries abroad. The Institute of Linguistics runs two undergraduate programmes: a B.A.~in General Linguistics and a new B.Sc.~in Human Language Technology, which will be on offer from October 2011. It is also possible to do a Master's degree and a Ph.D.~in Linguistics with the Institute.
  1286. In 1997, an interdisciplinary group of computer scientists and linguists embarked on Maltilex (M.~Rosner, R.~Fabri, J.~Caruana, M.~Montebello and others), a project to create a computational lexicon, which was sustained by a small grant from the University supported by the Mid-Med Bank. A simple web-based interface was developed to enable the creation and maintenance of entries, as reported in \cite{Rosner-et-al:1998} at the first ACL Workshop on Computational Approaches to Semitic Languages \cite{Rosner:1998}. Several thousand such entries were created by hand, but the project ran into legal problems, the compilation of entries having been largely inspired by Joseph Aquilina's existing paper dictionary \cite{Aquilina:1987,Aquilina:1990}.
Effort then shifted from paper dictionaries to the extraction of lexical entries from other sources. Two studies \cite{Dalli:2001, Attard:2005} used alignment techniques derived from bioinformatics to cluster lexical entries, as a means of structuring the lexicon automatically.
  1288. Despite lack of funding, the Maltilex effort continued in a somewhat piecemeal fashion, supported by staff at the IOL and CSAI Department. It was not until 2005 that Malta's Council for Science and Technology (MCST) launched the country's first Research and Technology Development Initiative and a joint proposal for a Maltese Language Resource Server (MLRS) was accepted, providing sufficient financial support to employ a researcher full time between 2006 and 2008. The project had the twin goals of creating both a lexicon and a corpus \cite{Rosner:2009}, and it laid the foundations for the present MLRS server.
  1289. The research mentioned above mainly deals with the written language. Two branches of speech-related work are also ongoing.
The first, initiated from the signal-processing tradition within the Engineering Faculty, yielded a prototype speech synthesiser \cite{Micallef:1997}. This work has influenced several other projects aimed at improving speech synthesis from a low-resource perspective, including \cite{Calleja:2002, Farrugia:2005, Camilleri:2010, Borg-et-al:2011}.
  1291. The second tackles the issue of intonation \cite{Vella:2009} from a linguistic perspective. Some pioneering work to create a corpus and descriptive framework for the study of Maltese intonation was carried out by \cite{Vella-Farrugia:2006}.
  1292. \columnbreak
  1293. Outside Malta, two research groups that are in active collaboration with local LT-oriented efforts deserve a special mention.
  1294. At the University of Arizona, a group led by linguist Adam Ussishkin is particularly interested in the psycholinguistic issues pertaining to Semitic languages including Maltese. To study these issues an online corpus has been made available \cite{Ussishkin-et-al:2009}.
  1295. At the University of Bremen, Prof.~Thomas Stolz has been actively involved with the academic study of Maltese but is particularly known for having hosted the first conference on Maltese Linguistics in Bremen \cite{Comrie-et-al:2009}, founded a periodical \cite{GHILM2} and the International Association of Maltese Linguistics, also based in Bremen, that exists alongside the Malta-based Council for the Maltese Language.
As mentioned, the LT-sensitive communities at the University of Malta are mainly found in the Faculty of ICT and the Institute of Linguistics. There is also potential interest in the Faculty of Arts (Department of Maltese) and other Humanities subjects, though up until now computational linguistics has tended to be regarded as an exotic topic located either in the more scientific computer science faculties or in the humanities and, therefore, the research topics dealt with overlap only partially.
  1297. Curiously, Malta does not lack for LT-related international events. LREC 2010 was held in Valletta, drawing 1200 participants. The annual EAMT conference was also held in Malta in 1994, and there have also been a number of smaller workshops held during the last 10 years.
  1298. \subsection{National Projects and Initiatives}
Malta joined the EU in 2004, and this event immediately conferred on Maltese the status of an official EU language. With this status came new obligations -- in particular to translate large quantities of official documents -- and, in addition, a recognition at European level that, as a national language, it should have ``first-class'' status from a technological as well as a social perspective, and be accorded all the rights and privileges enjoyed by ``larger'' European languages (i.\,e., those having larger numbers of native speakers).
The government's National IT Strategy 2008--10 included a number of objectives related to the Maltese language, including (i) the development of online government in Maltese, (ii) the creation of Maltese language tools, in collaboration with the University, and (iii) support for Maltese online communities. At the time of writing in 2011, not all the objectives have been realised. However, the longer-term effects of this strategy are beginning to take shape.
  1301. Currently the language technology scene in Malta is under the influence of four main initiatives:
  1302. \begin{enumerate}
  1303. \item First of all, a government-supported project partly funded by EU regional development funds is under way to bring speech technology within the reach of disabled persons. The project is currently focused on Maltese speech synthesis, and at this point the relevant language models are in the process of being developed. The consortium, which consists of an SME (Crimson Wing Ltd), a foundation (FITA, Foundation for IT Access), and the University, has pledged that these resources will be made available for research purposes. It remains to be seen whether components of the speech synthesiser will be made available to resource sharing networks inspired by CLARIN and META.
\item Second, as is evident from the current report, Malta participates in METANET4U and is thus in receipt of significant EU funding aimed at the enhancement and distribution of resources and tools that are specifically for Maltese. The University of Malta is a member of META-NET and intends to fulfil its obligations towards the aims of META, particularly regarding the identification of stakeholders, actual and potential.
\item Third, the Maltese Language Resource Server (MLRS) \cite{Rosner-et-al:2006, MLRS1} has come to fruition and significant efforts are under way at the University, through the Institute of Linguistics (A.~Gatt, C.~Borg, R.~Fabri) and the Department of Intelligent Computer Systems (M.~Rosner), to maintain and develop it. MLRS is currently online at \url{http://mlrs.research.um.edu.mt}. The corpus comprises some 100M words, and the system offers basic services including KWIC search and display, pattern-directed search and various kinds of statistical analysis. Further tools are planned, including a part-of-speech tagger and a spell-checker.
\item Finally, a new undergraduate programme in Human Language Technology is due to be launched by the Institute of Linguistics in October 2011. This will cover a full range of topics and should have a positive long-term effect on the study of Maltese from a computational perspective.
  1307. \end{enumerate}
Besides these, a project to develop an electronic version of the Aquilina dictionary \cite{Aquilina:1987,Aquilina:1990} is currently in preparation. This is a collaborative effort between the University of Malta, which is supplying the linguistic expertise, the University of Arizona, which has already digitised the dictionary into machine-readable form, and the publishers Midsea Books of Valletta. The dual aims of the project are to update the content and to give researchers flexible and swift access to the text. An effort is in progress locally to organise the level of lexicographic expertise necessary to update the content.
We should also mention Malta's relationship to CLARIN, a proposed EU research infrastructure addressing the provision of language resources for the Humanities and Social Sciences. During the specification phase, the University was able to participate thanks to a small support grant from the local Council for Science and Technology. However, it has turned out to be more challenging to secure the longer-term funding required for the construction phase of CLARIN. Attempts to identify a suitable government entity to take responsibility for the programme have so far been unsuccessful. Consequently, Malta's future participation in the construction phase currently hangs in the balance.
  1310. \subsection{Availability of Tools and Resources}
  1311. Figure~\ref{fig:lrlttable_en} provides a rating for language technology support for the Maltese language. This rating of existing tools and resources was generated by leading experts in the field who provided estimates based on a scale from 0 (very low) to 6 (very high) using seven criteria.
  1312. For Maltese, the most evident characteristics revealed by the figure are that
  1313. \begin{itemize}
  1314. \item many entries are blank, and
  1315. \item the highest grade scored is 3.2.
  1316. \end{itemize}
  1317. \begin{figure*}[htb]
  1318. \centering
  1319. %\begin{tabular}{>{\columncolor{orange1}}p{.33\linewidth}ccccccc} % ORIGINAL
  1320. \begin{tabular}{>{\columncolor{orange1}}p{.33\linewidth}@{\hspace*{6mm}}c@{\hspace*{6mm}}c@{\hspace*{6mm}}c@{\hspace*{6mm}}c@{\hspace*{6mm}}c@{\hspace*{6mm}}c@{\hspace*{6mm}}c}
  1321. \rowcolor{orange1}
  1322. \cellcolor{white}&\begin{sideways}\makecell[l]{Quantity}\end{sideways}
  1323. &\begin{sideways}\makecell[l]{\makecell[l]{Availability} }\end{sideways} &\begin{sideways}\makecell[l]{Quality}\end{sideways}
  1324. &\begin{sideways}\makecell[l]{Coverage}\end{sideways} &\begin{sideways}\makecell[l]{Maturity}\end{sideways} &\begin{sideways}\makecell[l]{Sustainability~~~}\end{sideways} &\begin{sideways}\makecell[l]{Adaptability}\end{sideways} \\ \addlinespace
  1325. \multicolumn{8}{>{\columncolor{orange2}}l}{Language Technology: Tools, Technologies and Applications} \\ \addlinespace
  1326. Speech Recognition &0.8 & 0.8 & 0.8 & 0.8 & 0.8 & 0.8 & 0.8 \\ \addlinespace
  1327. Speech Synthesis &2.4 & 0.8 & 3.2 & 3.2 & 2.4 & 2.4 & 2.4\\ \addlinespace
  1328. Grammatical analysis & 0.8 & 0.8 & 0.8 & 0.8 & 0.8 & 0.8 & 0.8\\ \addlinespace
  1329. Semantic analysis &0& 0& 0& 0& 0& 0& 0\\ \addlinespace
  1330. Text generation & 0& 0& 0& 0& 0& 0&0\\ \addlinespace
  1331. Machine translation &1.6 &1.6 & 1.6 & 1.6 & 1.6 & 1.6 & 1.6\\ \addlinespace
  1332. \multicolumn{8}{>{\columncolor{orange2}}l}{Language Resources: Resources, Data and Knowledge Bases} \\ \addlinespace
  1333. Text corpora &3.2 &3.2 &2.4 &2.4 &2.4 &3.2 &3.2\\ \addlinespace
  1334. Speech corpora &2.4 &0.8 &2.4 &1.6 &2.4 &2.4 &2.4\\ \addlinespace
  1335. Parallel corpora &3.2& 3.2& 2.4& 1.6& 1.6& 1.6& 1.6\\ \addlinespace
  1336. Lexical resources &2.4&2.4 &1.6 &2.4 &2.4 &2.4 &2.4\\ \addlinespace
  1337. Grammars &0& 0& 0&0 &0 &0 &0\\
  1338. \end{tabular}
  1339. \caption{State of language technology support for Maltese}
  1340. \refstepcounter{refs_en}
  1341. \label{fig:lrlttable_en}
  1342. \end{figure*}
The fact that most entries are blank reflects the immature state of LT-related research and development in Malta. Although there are signs that the situation is improving, investment in language technology remains at a low level and, as a result, despite modest local achievements, the effort is fragmentary, both in terms of coverage of different areas and in terms of sustainability of research: there have been too many projects involving just one area, just one researcher, and just one or two years. The collective efforts do not add up as they should.
  1344. So what has been achieved? We can see by looking at the non-blank entries, whose average score yields the following ordering:
  1345. \begin{itemize}
  1346. \item Tools:
  1347. \begin{enumerate}
  1348. \item Tokenisation, Speech Synthesis
  1349. \item Speech Recognition
  1350. \end{enumerate}
  1351. \item Resources:
  1352. \begin{enumerate}
  1353. \item Reference Corpora
  1354. \item Parallel Corpora
  1355. \item Lexicons, Terminology (this should be understood to include wordlists)
  1356. \item Language Models
  1357. \end{enumerate}
  1358. \end{itemize}
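
As an illustration of how such an average is obtained, take the row for text corpora in Figure~\ref{fig:lrlttable_en}. The sketch assumes an unweighted arithmetic mean over the seven criteria; the exact averaging procedure is not stated, and the categories listed above do not all correspond to rows of the figure.
\[
\bar{x} \;=\; \frac{3.2 + 3.2 + 2.4 + 2.4 + 2.4 + 3.2 + 3.2}{7} \;=\; \frac{20.0}{7} \;\approx\; 2.86
\]
By the same calculation, the row for speech synthesis averages $16.8 / 7 = 2.4$.
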
  1359. With respect to tools, low level text extraction and processing tools are available, including a tokeniser. A POS-tagger is under development, but its performance is not state-of-the-art, pending further training with better annotated data.
  1360. Higher level tools (syntactic or semantic analysis, classification tools, information extraction etc.) are entirely lacking. A consequence is that, for example, there are no treebanks available for Maltese.
  1361. Prototype speech recognition tools have been developed at University but are not readily available at the time of writing. However, the government-funded speech engine mentioned earlier should yield a working speech synthesiser by 2013. Whilst this is a very positive development, it is highly focused on the synthesis side of speech. Hardly any work on \textbf{speech recognition} is planned at this stage.
  1362. With respect to resources, the situation is a little more structured, in so far as there already exists MLRS, an extensible computational infrastructure in the form of a server providing the basic functionality to enable access over the web to available corpora, some services, and a rudimentary system to facilitate the submission of contributions. MLRS currently provides some very basic services for the extraction, representation, search and analysis of text.
  1363. The existing MLRS corpus is currently around 100 million tokens in length. It is predominantly textual and monolingual. It is also somewhat non-representative: there is an abundance of legalistic material, but a shortage of academic text and works of fiction.
As things stand, these materials can only be searched and analysed through the server and cannot be accessed directly. The reasons are legal. With access restricted in this way, the complications of IPR and copyright have been neatly sidestepped. The price is that these complications will eventually have to be confronted, and in fact META is in the process of formulating a set of licence agreements to suit the distribution of resources such as MLRS.
  1365. \subsection{Cross-language comparison}
  1366. The current state of LT support varies considerably from one language community to another. In order to compare the situation between languages, this section will present an evaluation based on two sample application areas (machine translation and speech processing) and one underlying technology (text analysis), as well as basic resources needed for building LT applications. The languages were categorised using the following five-point scale:
  1367. \begin{enumerate}
  1368. \item Excellent support
  1369. \item Good support
  1370. \item Moderate support
  1371. \item Fragmentary support
  1372. \item Weak or no support
  1373. \end{enumerate}
  1374. LT support was measured according to the following criteria:
  1375. \textbf{Speech Processing:} Quality of existing speech recognition technologies, quality of existing speech synthesis technologies, coverage of domains, number and size of existing speech corpora, amount and variety of available speech-based applications.\vspace*{0.09cm}
  1376. \textbf{Machine Translation:} Quality of existing MT technologies, number of language pairs covered, coverage of linguistic phenomena and domains, quality and size of existing parallel corpora, amount and variety of available MT applications.\vspace*{0.09cm}
  1377. \textbf{Text Analysis:} Quality and coverage of existing text analysis technologies (morphology, syntax, semantics), coverage of linguistic phenomena and domains, amount and variety of applications, quality and size of existing (annotated) text corpora, quality and coverage of lexical resources (e.\,g., WordNet) and grammars.\vspace*{0.09cm}
  1378. \textbf{Resources:} Quality and size of existing text corpora, speech corpora and parallel corpora, quality and coverage of existing lexical resources and grammars.\vspace*{0.09cm}
  1379. Figures~\ref{fig:speech_cluster_en} to~\ref{fig:resources_cluster_en} show that the Maltese language has only low to medium LT support and thus compares well with other less spoken languages of Europe. LT resources and tools for Maltese clearly do not yet reach the quality and coverage of comparable resources and tools for major languages like German, and certainly not that of those for the English language, which is in the lead in almost all LT areas. And there are still plenty of gaps in English language resources with regard to high quality applications. %\vspace*{0.009cm}
  1380. \subsection{Conclusions}
  1381. \emph{In this series of white papers, we have made an important initial effort to assess language technology support for 30 European languages, and provide a high-level comparison across these languages. By identifying the gaps, needs and deficits, the European language technology community and related stakeholders are now in a position to design a large scale research and development programme aimed at building a truly multilingual, technology-enabled Europe.}
The results of this white paper series show that there is a dramatic difference in language technology support between the various European languages. While good-quality software and resources are available for some languages and application areas, others, usually smaller languages, have substantial gaps. Many languages lack basic technologies for text analysis and the essential resources. Others have basic tools and resources, but the implementation of, for example, semantic methods is still far away. Therefore a large-scale effort is needed to attain the ambitious goal of providing high-quality language technology support for all European languages, for example through high-quality machine translation.
  1383. In this report, we have tried to convey the paradoxical state of Maltese Language Technology. The paradox arises because there are significant efforts made by a small number of well-qualified people across a spectrum of LT-related activities to improve the state of the art, whether this be in terms of tools, or resources, or both. It is also clear that within the wider context of educational, commercial and cultural activities in the country, there is a place for LT to make an important contribution. The problem is that efforts that have been made are uncoordinated, short term, and fragmentary, so progress is slower than it has to be.
Sustained and directed coordination of effort is, in our opinion, the only way in which the benefits of LT for Maltese will be realised in a reasonable time. We believe that even in a country as small as Malta, the work needs to be shared out amongst different stakeholders. We must arrive at a workable roadmap via a localised version of the tripartite division of labour advocated by META: identification of a community with a shared vision; extension of an infrastructure to facilitate the sharing of resources; and reinforcement of connections between LT and neighbouring fields of research and development.
  1385. \begin{figure*}[tb]
  1386. \small
  1387. \centering
  1388. \begin{tabular}
  1389. { % defines color for each column.
  1390. >{\columncolor{corange5}}p{.13\linewidth}@{\hspace{.040\linewidth}}
  1391. >{\columncolor{corange4}}p{.13\linewidth}@{\hspace{.040\linewidth}}
  1392. >{\columncolor{corange3}}p{.13\linewidth}@{\hspace{.040\linewidth}}
  1393. >{\columncolor{corange2}}p{.13\linewidth}@{\hspace{.040\linewidth}}
  1394. >{\columncolor{corange1}}p{.13\linewidth}
  1395. }
  1396. \multicolumn{1}{>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Excellent}} &
  1397. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Good}} &
  1398. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Moderate}} &
  1399. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Fragmentary}} &
  1400. \multicolumn{1}{@{}>{\columncolor{white}}c}{\textbf{Weak/no}} \\
  1401. \multicolumn{1}{>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
  1402. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
  1403. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
  1404. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
  1405. \multicolumn{1}{@{}>{\columncolor{white}}c}{\textbf{support}} \\ \addlinespace
  1406. & \vspace*{0.5mm}English
  1407. & \vspace*{0.5mm}
  1408. Czech \newline
  1409. Dutch \newline
  1410. Finnish \newline
  1411. French \newline
  1412. German \newline
  1413. Italian \newline
  1414. Portuguese \newline
  1415. Spanish \newline
  1416. & \vspace*{0.5mm}Basque \newline
  1417. Bulgarian \newline
  1418. Catalan \newline
  1419. Danish \newline
  1420. Estonian \newline
  1421. Galician\newline
  1422. Greek \newline
  1423. Hungarian \newline
  1424. Irish \newline
  1425. Norwegian \newline
  1426. Polish \newline
  1427. Serbian \newline
  1428. Slovak \newline
  1429. Slovene \newline
  1430. Swedish \newline
  1431. & \vspace*{0.5mm}
  1432. Croatian \newline
  1433. Icelandic \newline
  1434. Latvian \newline
  1435. Lithuanian \newline
  1436. \textbf{Maltese} \newline
  1437. Romanian\\
  1438. \end{tabular}
  1439. \caption{Speech processing: state of language technology support for 30 European languages}
  1440. \refstepcounter{refs_en}
  1441. \label{fig:speech_cluster_en}
  1442. \end{figure*}
  1443. \begin{figure*}[tb]
  1444. \small
  1445. \centering
  1446. \begin{tabular}
  1447. { % defines color for each column.
  1448. >{\columncolor{corange5}}p{.13\linewidth}@{\hspace{.040\linewidth}}
  1449. >{\columncolor{corange4}}p{.13\linewidth}@{\hspace{.040\linewidth}}
  1450. >{\columncolor{corange3}}p{.13\linewidth}@{\hspace{.040\linewidth}}
  1451. >{\columncolor{corange2}}p{.13\linewidth}@{\hspace{.040\linewidth}}
  1452. >{\columncolor{corange1}}p{.13\linewidth}
  1453. }
  1454. \multicolumn{1}{>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Excellent}} &
  1455. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Good}} &
  1456. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Moderate}} &
  1457. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Fragmentary}} &
  1458. \multicolumn{1}{@{}>{\columncolor{white}}c}{\textbf{Weak/no}} \\
  1459. \multicolumn{1}{>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
  1460. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
  1461. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
  1462. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
  1463. \multicolumn{1}{@{}>{\columncolor{white}}c}{\textbf{support}} \\ \addlinespace
  1464. & \vspace*{0.5mm} English
  1465. & \vspace*{0.5mm}
  1466. French \newline
  1467. Spanish
  1468. & \vspace*{0.5mm}
  1469. Catalan \newline
  1470. Dutch \newline
  1471. German \newline
  1472. Hungarian \newline
  1473. Italian \newline
  1474. Polish \newline
  1475. Romanian \newline
  1476. & \vspace*{0.5mm}Basque \newline
  1477. Bulgarian \newline
  1478. Croatian \newline
  1479. Czech \newline
  1480. Danish \newline
  1481. Estonian \newline
  1482. Finnish \newline
  1483. Galician \newline
  1484. Greek \newline
  1485. Icelandic \newline
  1486. Irish \newline
  1487. Latvian \newline
  1488. Lithuanian \newline
  1489. \textbf{Maltese} \newline
  1490. Norwegian \newline
  1491. Portuguese \newline
  1492. Serbian \newline
  1493. Slovak \newline
  1494. Slovene \newline
Swedish \\
  1496. \end{tabular}
  1497. \caption{Machine translation: state of language technology support for 30 European languages}
  1498. \refstepcounter{refs_en}
  1499. \label{fig:mt_cluster_en}
  1500. \end{figure*}
  1501. \begin{figure*}[tb]
  1502. \small
  1503. \centering
  1504. \begin{tabular}
  1505. { % defines color for each column.
  1506. >{\columncolor{corange5}}p{.13\linewidth}@{\hspace{.040\linewidth}}
  1507. >{\columncolor{corange4}}p{.13\linewidth}@{\hspace{.040\linewidth}}
  1508. >{\columncolor{corange3}}p{.13\linewidth}@{\hspace{.040\linewidth}}
  1509. >{\columncolor{corange2}}p{.13\linewidth}@{\hspace{.040\linewidth}}
  1510. >{\columncolor{corange1}}p{.13\linewidth}
  1511. }
  1512. \multicolumn{1}{>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Excellent}} &
  1513. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Good}} &
  1514. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Moderate}} &
  1515. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Fragmentary}} &
  1516. \multicolumn{1}{@{}>{\columncolor{white}}c}{\textbf{Weak/no}} \\
  1517. \multicolumn{1}{>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
  1518. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
  1519. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
  1520. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
  1521. \multicolumn{1}{@{}>{\columncolor{white}}c}{\textbf{support}} \\ \addlinespace
  1522. & \vspace*{0.5mm}English
  1523. & \vspace*{0.5mm}
  1524. Dutch \newline
  1525. French \newline
  1526. German \newline
  1527. Italian \newline
  1528. Spanish
  1529. & \vspace*{0.5mm}Basque \newline
  1530. Bulgarian \newline
  1531. Catalan \newline
  1532. Czech \newline
  1533. Danish \newline
  1534. Finnish \newline
  1535. Galician \newline
  1536. Greek \newline
  1537. Hungarian \newline
  1538. Norwegian \newline
  1539. Polish \newline
  1540. Portuguese \newline
  1541. Romanian \newline
  1542. Slovak \newline
  1543. Slovene \newline
  1544. Swedish \newline
  1545. & \vspace*{0.5mm}
  1546. Croatian \newline
  1547. Estonian \newline
  1548. Icelandic \newline
  1549. Irish \newline
  1550. Latvian \newline
  1551. Lithuanian \newline
  1552. \textbf{Maltese} \newline
  1553. Serbian \\
  1554. \end{tabular}
  1555. \caption{Text analysis: state of language technology support for 30 European languages}
  1556. \refstepcounter{refs_en}
  1557. \label{fig:text_cluster_en}
  1558. \end{figure*}
  1559. \begin{figure*}[tb]
  1560. \small
  1561. \centering
  1562. \begin{tabular}
  1563. { % defines color for each column.
  1564. >{\columncolor{corange5}}p{.13\linewidth}@{\hspace{.040\linewidth}}
  1565. >{\columncolor{corange4}}p{.13\linewidth}@{\hspace{.040\linewidth}}
  1566. >{\columncolor{corange3}}p{.13\linewidth}@{\hspace{.040\linewidth}}
  1567. >{\columncolor{corange2}}p{.13\linewidth}@{\hspace{.040\linewidth}}
  1568. >{\columncolor{corange1}}p{.13\linewidth}
  1569. }
  1570. \multicolumn{1}{>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Excellent}} &
  1571. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Good}} &
  1572. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Moderate}} &
  1573. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{Fragmentary}} &
  1574. \multicolumn{1}{@{}>{\columncolor{white}}c}{\textbf{Weak/no}} \\
  1575. \multicolumn{1}{>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
  1576. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
  1577. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
  1578. \multicolumn{1}{@{}>{\columncolor{white}}c@{\hspace{.040\linewidth}}}{\textbf{support}} &
  1579. \multicolumn{1}{@{}>{\columncolor{white}}c}{\textbf{support}} \\ \addlinespace
  1580. & \vspace*{0.5mm}English
  1581. & \vspace*{0.5mm}
  1582. Czech \newline
  1583. Dutch \newline
  1584. French \newline
  1585. German \newline
  1586. Hungarian \newline
  1587. Italian \newline
  1588. Polish \newline
  1589. Spanish \newline
  1590. Swedish \newline
  1591. & \vspace*{0.5mm} Basque\newline
  1592. Bulgarian\newline
  1593. Catalan \newline
  1594. Croatian \newline
  1595. Danish \newline
  1596. Estonian \newline
  1597. Finnish \newline
  1598. Galician \newline
  1599. Greek \newline
  1600. Norwegian \newline
  1601. Portuguese \newline
  1602. Romanian \newline
  1603. Serbian \newline
  1604. Slovak \newline
  1605. Slovene \newline
  1606. & \vspace*{0.5mm}
  1607. Icelandic \newline
  1608. Irish \newline
  1609. Latvian \newline
  1610. Lithuanian \newline
  1611. \textbf{Maltese} \\
  1612. \end{tabular}
\caption{Speech and text resources: state of support for 30 European languages}
  1614. \refstepcounter{refs_en}
  1615. \label{fig:resources_cluster_en}
  1616. \end{figure*}
  1617. \end{multicols}
  1618. \clearpage
  1619. \ssection[About META-NET]{About META-NET}
  1620. \begin{multicols}{2}
  1621. \textbf{META-NET} is a Network of Excellence partially funded by the European Commission. The network currently consists of 54 research centres in 33 European countries \cite{rehm2011}. META-NET forges META, the Multilingual Europe Technology Alliance, a growing community of language technology professionals and organisations in Europe. META-NET fosters the technological foundations for a truly multilingual European information society that:
  1622. \begin{itemize}
  1623. \item makes communication and cooperation possible across languages;
  1624. \item grants all Europeans equal access to information and knowledge regardless of their language;
  1625. \item builds upon and advances functionalities of networked information technology.
  1626. \end{itemize}
  1627. The network supports a Europe that unites as a single digital market and information space. It stimulates and promotes multilingual technologies for all European languages. These technologies support automatic translation, content production, information processing and knowledge management for a wide variety of subject domains and applications. They also enable intuitive language-based interfaces to technology ranging from household electronics, machinery and vehicles to computers and robots.
Launched on 1 February 2010, META-NET has already conducted various activities in its three lines of action: META-VISION, META-SHARE and META-RESEARCH.
\textbf{META-VISION} fosters a dynamic and influential stakeholder community that unites around a shared vision and a common strategic research agenda (SRA). The main focus of this activity is to build a coherent and cohesive language technology (LT) community in Europe by bringing together representatives from highly fragmented and diverse groups of stakeholders. The present White Paper was prepared together with volumes for 29 other languages. The shared technology vision was developed in three sectoral Vision Groups. The META Technology Council was established to discuss and prepare the SRA, based on the vision, in close interaction with the entire LT community.
\textbf{META-SHARE} creates an open, distributed facility for exchanging and sharing resources. The peer-to-peer network of repositories will contain language data, tools and web services that are documented with high-quality metadata and organised in standardised categories. The resources can be readily accessed and uniformly searched. The available resources include free, open-source materials as well as restricted, commercially available, fee-based items.
\textbf{META-RESEARCH} builds bridges to related technology fields. This activity seeks to leverage advances in other fields and to capitalise on innovative research that can benefit language technology. In particular, the action line focuses on conducting leading-edge research in machine translation; collecting data, preparing data sets and organising language resources for evaluation purposes; compiling inventories of tools and methods; and organising workshops and training events for members of the community.\\
  1632. \centerline{\textbf{office@meta-net.eu -- http://www.meta-net.eu}}
  1633. \end{multicols}
  1634. \vfill
  1635. \cleardoublepage
  1636. \appendix
  1637. \addtocontents{toc}{\protect\bigskip}
  1638. \phantomsection\bsection[Referenzi -- References]{Referenzi --- References}
  1639. \bibliographystyle{unsrt}
  1640. \bibliography{maltese_references}
  1641. \cleardoublepage
  1642. \phantomsection\bsection[Membri ta' META-NET -- META-NET Members]{Membri ta' META-NET --- META-NET Members}
  1643. \label{metanetmembers}
  1644. \small
  1645. \begin{longtable}{@{}llp{113mm}@{}}
  1646. Awstrija & \textcolor{grey1}{Austria} & Zentrum für Translationswissenschaft, Universität Wien: Gerhard Budin\\ \addlinespace
  1647. Belġju & \textcolor{grey1}{Belgium} & Computational Linguistics and Psycholinguistics Research Centre, University of Antwerp: Walter Daelemans\\ \addlinespace
  1648. & & Centre for Processing Speech and Images, University of Leuven: Dirk van Compernolle \\ \addlinespace
  1649. Bulgarija & \textcolor{grey1}{Bulgaria} & Institute for Bulgarian Language, Bulgarian Academy of Sciences: Svetla Koeva\\ \addlinespace
  1650. Ċekja & \textcolor{grey1}{Czech Republic} & Institute of Formal and Applied Linguistics, Charles University in Prague: Jan Hajič\\ \addlinespace
  1651. Ċipru & \textcolor{grey1}{Cyprus} & Language Centre, School of Humanities: Jack Burston\\ \addlinespace
  1652. Danimarka & \textcolor{grey1}{Denmark} & Centre for Language Technology, University of Copenhagen: \newline Bolette Sandford Pedersen, Bente Maegaard\\ \addlinespace
  1653. Estonja & \textcolor{grey1}{Estonia} & Institute of Computer Science, University of Tartu: Tiit Roosmaa, Kadri Vider\\ \addlinespace
  1654. Finlandja & \textcolor{grey1}{Finland} & Computational Cognitive Systems Research Group, Aalto University: Timo Honkela\\ \addlinespace
  1655. & & Department of Modern Languages, University of Helsinki: Kimmo Koskenniemi,\newline Krister Lindén \\ \addlinespace
  1656. Franza & \textcolor{grey1}{France} & Centre National de la Recherche Scientifique, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur and Institute for Multilingual and Multimedia Information: Joseph Mariani \\ \addlinespace
  1657. & & Evaluations and Language Resources Distribution Agency: Khalid Choukri\\ \addlinespace
  1658. Ġermanja & \textcolor{grey1}{Germany} & Language Technology Lab, DFKI: Hans Uszkoreit, Georg Rehm\\ \addlinespace
  1659. & & Human Language Technology and Pattern Recognition, RWTH Aachen University: Hermann Ney \\ \addlinespace
  1660. & & Department of Computational Linguistics, Saarland University: Manfred Pinkal\\ \addlinespace
  1661. Gran Brittanja & \textcolor{grey1}{UK} &
  1662. School of Computer Science, University of Manchester: Sophia Ananiadou \\ \addlinespace
& & Institute for Language, Cognition and Computation, Centre for Speech Technology Research, University of Edinburgh: Steve Renals \\ \addlinespace
  1664. & & Research Institute of Informatics and Language Processing, University of Wolverhampton: Ruslan Mitkov \\ \addlinespace
  1665. Greċja & \textcolor{grey1}{Greece} & R.C. Athena, Institute for Language and Speech Processing: Stelios Piperidis\\ \addlinespace
  1666. Irlanda & \textcolor{grey1}{Ireland} & School of Computing, Dublin City University: Josef van Genabith\\ \addlinespace
  1667. Islanda & \textcolor{grey1}{Iceland} & School of Humanities, University of Iceland: Eiríkur Rögnvaldsson\\ \addlinespace
  1668. Isvezja & \textcolor{grey1}{Sweden} & Department of Swedish, University of Gothenburg: Lars Borin \\ \addlinespace
  1669. Isvizzera & \textcolor{grey1}{Switzerland} & Idiap Research Institute: Hervé Bourlard \\ \addlinespace
  1670. Italja & \textcolor{grey1}{Italy} & Consiglio Nazionale delle Ricerche, Istituto di Linguistica Computazionale Antonio Zampolli: Nicoletta Calzolari\\ \addlinespace
  1671. & & Human Language Technology Research Unit, Fondazione Bruno Kessler:\newline Bernardo Magnini\\ \addlinespace
  1672. Kroazja & \textcolor{grey1}{Croatia} & Institute of Linguistics, Faculty of Humanities and Social Science, University of Zagreb: Marko Tadić \\ \addlinespace
  1673. Latvja & \textcolor{grey1}{Latvia} & Tilde: Andrejs Vasiļjevs\\ \addlinespace
  1674. & & Institute of Mathematics and Computer Science, University of Latvia: Inguna Skadiņa\\ \addlinespace
  1675. Litwanja & \textcolor{grey1}{Lithuania} & Institute of the Lithuanian Language: Jolanta Zabarskaitė\\ \addlinespace
  1676. Lussemburgu & \textcolor{grey1}{Luxembourg} & Arax Ltd.: Vartkes Goetcherian\\ \addlinespace
Malta & \textcolor{grey1}{Malta} & Department of Intelligent Computer Systems, University of Malta: Mike Rosner\\ \addlinespace
  1678. Norveġja & \textcolor{grey1}{Norway} & Department of Linguistic, Literary and Aesthetic Studies, University of Bergen:\newline Koenraad De Smedt\\ \addlinespace
  1679. & & Department of Informatics, Language Technology Group, University of Oslo:\newline Stephan Oepen \\ \addlinespace
  1680. Olanda & \textcolor{grey1}{Netherlands} & Utrecht Institute of Linguistics, Utrecht University: Jan Odijk\\ \addlinespace
  1681. & & Computational Linguistics, University of Groningen: Gertjan van Noord\\ \addlinespace
  1682. Polonja & \textcolor{grey1}{Poland} & Institute of Computer Science, Polish Academy of Sciences: Adam Przepiórkowski, Maciej Ogrodniczuk \\ \addlinespace
  1683. & & University of Łódź: Barbara Lewandowska-Tomaszczyk, Piotr Pęzik\\ \addlinespace
  1684. & & Department of Computer Linguistics and Artificial Intelligence, Adam Mickiewicz University: Zygmunt Vetulani \\ \addlinespace
  1685. Portugall & \textcolor{grey1}{Portugal} & University of Lisbon: António Branco, Amália Mendes \\ \addlinespace
  1686. & & Spoken Language Systems Laboratory, Institute for Systems Engineering and Computers: Isabel Trancoso \\ \addlinespace
  1687. Rumanija & \textcolor{grey1}{Romania} & Research Institute for Artificial Intelligence, Romanian Academy of Sciences:\newline Dan Tufiș \\ \addlinespace
  1688. & & Faculty of Computer Science, University Alexandru Ioan Cuza of Iași: Dan Cristea \\ \addlinespace
  1689. Serbja & \textcolor{grey1}{Serbia} & University of Belgrade, Faculty of Mathematics: Duško Vitas, Cvetana Krstev,\newline Ivan Obradović \\ \addlinespace
  1690. & & Pupin Institute: Sanja Vranes \\ \addlinespace
  1691. Slovakkja & \textcolor{grey1}{Slovakia} & Ľudovít Štúr Institute of Linguistics, Slovak Academy of Sciences: Radovan Garabík \\ \addlinespace
  1692. Slovenja & \textcolor{grey1}{Slovenia} & Jožef Stefan Institute: Marko Grobelnik \\ \addlinespace
  1693. Spanja & \textcolor{grey1}{Spain} & Barcelona Media: Toni Badia, Maite Melero \\ \addlinespace
  1694. & & Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra: Núria Bel \\ \addlinespace
  1695. & & Aholab Signal Processing Laboratory, University of the Basque Country:\newline Inma Hernaez Rioja \\ \addlinespace
  1696. & & Center for Language and Speech Technologies and Applications, Universitat Politècnica de Catalunya: Asunción Moreno \\ \addlinespace
  1697. & & Department of Signal Processing and Communications, University of Vigo:\newline Carmen García Mateo \\ \addlinespace
  1698. Ungerija & \textcolor{grey1}{Hungary} & Research Institute for Linguistics, Hungarian Academy of Sciences: Tamás Váradi\\ \addlinespace
  1699. & & Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics: Géza Németh, Gábor Olaszy\\ \addlinespace
  1700. \end{longtable}
  1701. \normalsize
  1702. \renewcommand*{\figureformat}{}
  1703. \renewcommand*{\captionformat}{}
  1704. \begin{figure*}[htbp]
  1705. \colorrule{grey3}{\textwidth}{1.5pt}
\centering
  1707. %\fbox{-- META-NET group picture omitted to keep the size of the PDF file small. --}
  1708. % \iftoggle{lowres}{%
  1709. \includegraphics[width=\textwidth]{../_media/meta-net_team_ebook.jpg}
  1710. % }{%
  1711. % \includegraphics[width=\textwidth]{../_media/meta-net_team.jpg}
  1712. % }
\caption{Kważi 100 esperti tat-teknoloġija lingwistika -- rappreżentanti tal-pajjiżi u tal-lingwi li huma rappreżentati f'META-NET -- ddiskutew u ffinalizzaw ir-riżultati ewlenin u l-messaġġi ta' Serje ta' White Papers tal-META-NET f'laqgħa f'Berlin, fil-Ġermanja, dwar Ottubru 21/22, 2011. --- \textcolor{grey1}{About 100 language technology experts -- representatives of the countries and languages represented in META-NET -- discussed and finalised the key results and messages of the White Paper Series at a META-NET meeting in Berlin, Germany, on 21/22 October 2011.}}
  1714. \medskip
  1715. \colorrule{grey3}{\textwidth}{1.5pt}
  1716. \end{figure*}
  1717. \cleardoublepage
\phantomsection\bsection[Is-Serje ta' White Papers ta' META-NET -- The META-NET White Paper Series]{Is-Serje ta' White Papers ta' META-NET --- The META-NET\ \ \ \ \ \ White Paper Series}
  1719. \label{whitepaperseries}
  1720. \vspace*{-5mm}
  1721. \centering
  1722. \setlength{\tabcolsep}{2.2em}
  1723. \begin{tabularx}{\textwidth}{lllll} \toprule\addlinespace
  1724. %\begin{tabulary}{170mm}{LLL} \toprule
  1725. & Bask & Basque & euskara\\
  1726. & Bulgaru & Bulgarian & български\\
  1727. & Ċek & Czech & čeština\\
  1728. & Daniż & Danish & dansk\\
  1729. & Estonjan & Estonian & eesti\\
  1730. & Finlandiż & Finnish & suomi\\
  1731. & Franċiż & French & français\\
  1732. & Ġermaniż & German & Deutsch\\
  1733. & Galizjan & Galician & galego\\
  1734. & Grieg & Greek & ελληνικά\\
  1735. & Ingliż & English & English\\
  1736. & Irlandiż & Irish & Gaeilge\\
  1737. & Islandiż & Icelandic & íslenska\\
  1738. & Katalan & Catalan & català\\
  1739. & Kroat & Croatian & hrvatski\\
  1740. & Latvjan & Latvian & latviešu valoda\\
  1741. & Litwan & Lithuanian & lietuvių kalba\\
  1742. & Malti & Maltese & Malti\\
  1743. & Norveġiż Bokmål & Norwegian Bokmål & bokmål\\
  1744. & Norveġiż Nynorsk & Norwegian Nynorsk & nynorsk\\
  1745. & Olandiż & Dutch & Nederlands\\
  1746. & Pollakk & Polish & polski\\
  1747. & Portugiż & Portuguese & português\\
  1748. & Rumen & Romanian & română\\
  1749. & Serb & Serbian & српски\\
  1750. & Slovakk & Slovak & slovenčina\\
  1751. & Sloven & Slovene & slovenščina\\
  1752. & Spanjol & Spanish & español\\
  1753. & Svediż & Swedish & svenska\\
  1754. & Taljan & Italian & italiano\\
  1755. & Ungeriż & Hungarian & magyar\\ \addlinespace \bottomrule
  1756. \end{tabularx}