package Uplug::TreeTagger;

=head1 NAME

Uplug::TreeTagger - Uplug add-on for using TreeTagger models for POS tagging

=head1 SYNOPSIS

    # prepare some data (for example, for English)
    uplug pre/markup -in input.txt | uplug pre/sent -l en > sentences.xml
    uplug pre/en/basic -in input.txt -out tokenized.xml

    # tag text with marked sentence boundaries (using the TreeTagger tokenizer)
    uplug pre/en/toktag -in sentences.xml -out tagged.xml

    # tag a tokenized corpus
    uplug pre/en/tagTree -in tokenized.xml -out tagged.xml

    # run the entire pipeline (for English in this example)
    uplug pre/en/all-treetagger -in input.txt -out output.xml

=head1 DESCRIPTION

You need to install the main components of L<Uplug> first: download the latest version of uplug-main from L<https://bitbucket.org/tiedemann/uplug> or from CPAN and install it on your system.

The Uplug::TreeTagger package contains only configuration files for running TreeTagger from Uplug; it adds no code of its own. Installing TreeTagger itself and the relevant POS tagging models is handled by the installation routines. Simply run

    perl Makefile.PL
    make
    make install

to install binaries, model files and configuration files into Uplug's global shared directory. Downloading the POS tagging models takes some time, and you must accept the TreeTagger terms and conditions, which are printed on screen when you run the first command (C<perl Makefile.PL>).

The languages currently integrated into Uplug are Bulgarian, German, English, Spanish, Estonian, French, Italian, Latin, Dutch and Swahili (see F<share/systems/pre>).
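
If you drive Uplug from a Perl script, the full pipeline from the SYNOPSIS can be selected per language. The following is a minimal sketch that is not part of this package: the helper C<treetagger_pipeline_cmd> and the C<%LANG_CODE> table are hypothetical, and the sketch assumes that each supported language has a configuration named C<< pre/<code>/all-treetagger >> with an ISO 639-1 code, by analogy with the English example. Check F<share/systems/pre> for the names actually installed.

    use strict;
    use warnings;

    # Hypothetical mapping from language name to ISO 639-1 code; assumes the
    # configuration directories under share/systems/pre use these codes.
    my %LANG_CODE = (
        bulgarian => 'bg', german  => 'de', english => 'en', spanish => 'es',
        estonian  => 'et', french  => 'fr', italian => 'it', latin   => 'la',
        dutch     => 'nl', swahili => 'sw',
    );

    # Hypothetical helper: return the uplug command (as a list) for the full
    # TreeTagger pipeline, assuming the naming of the English example
    # (pre/en/all-treetagger) holds for every language.
    sub treetagger_pipeline_cmd {
        my ($lang, $in, $out) = @_;
        my $code = $LANG_CODE{ lc $lang }
            or die "unsupported language: $lang\n";
        return ('uplug', "pre/$code/all-treetagger", '-in', $in, '-out', $out);
    }

Returning a list rather than a single string lets C<system> run uplug directly without going through a shell, so file names with spaces need no quoting:

    system(treetagger_pipeline_cmd('German', 'input.txt', 'output.xml')) == 0
        or die "uplug failed: $?\n";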

=head1 SEE ALSO

Project website: L<https://bitbucket.org/tiedemann/uplug>

CPAN: L<http://search.cpan.org/~tiedemann/uplug-main/>

=cut

1;