Ucto is a product of the ILK Research Group, Tilburg University (The Netherlands).
This package provides the runtime files required to run programs that use ucto.
Ucto can tokenize UTF-8 encoded text files (i.e. separate words from punctuation and split sentences), and offers several other basic preprocessing steps, such as changing case, that make your text suited for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for English, Dutch, French and Italian, but can be easily extended to sui…
0.5.2 2012-03-29
[Ko vd Sloot]
* some small changes. Made it work with libfolia 0.9

0.5.1 2012-02-27
[Ko vd Sloot]
* added 'escape' possibility for regexps that start with a [
* better debugging output
* removed all (?i) stuff from regexps. This attempts to avoid an ICU bug
* added -X and --id= options
* adapted to libfolia 0.8 (/tests too!)
* some cleanup and refactoring
[Maarten van Gompel]
* a…
ucto (0.5.2-2) unstable; urgency=low

  * Rebuild on amd64 to pull in libicu48 (was libicu44).

 -- Joost van Baal-Ilić <email@example.com>  Sat, 23 Jun 2012 08:16:17 +0200

ucto (0.5.2-1) unstable; urgency=low

  * New Upstream Release
  * debian/control: depends on libfolia1-dev >= 0.9
  * debian/watch updated to watch http://software.ticc.uvt.nl

 -- Ko van der Sloot <firstname.lastname@example.org> …
2012-03-19 10:54  sloot

	* [r14472] src/ucto.cxx: numb change

2012-03-19 10:54  sloot

	* [r14471] …
We need unit tests.
We need to compile a list of known problems for several languages.
Maarten van Gompel
Ko van der Sloot
This package was debianized by Joost van Baal <email@example.com> on Sat Dec 25 12:55:06 CET 2010.