lt-proc(1)                                              2006-03-23

NAME

lt-proc - This application is part of the lexical processing
modules and tools (lttoolbox).

This tool is part of the Apertium machine translation
architecture.

SYNOPSIS

lt-proc [option] fst_file [input_file [output_file]]

DESCRIPTION

lt-proc is the application responsible for providing the four
lexical processing functionalities:

· morphological analyser (option -a)
· lexical transfer (option -n)
· morphological generator (option -g)
· post-generator (option -p)

It accomplishes these tasks by reading binary files containing a
compact and efficient representation of dictionaries (a class of
finite-state transducers called augmented letter transducers).
These files are generated by lt-comp(1).

Note that some characters (‘[’, ‘]’, ‘$’, ‘^’, ‘/’, ‘+’) are
special characters used for format and encapsulation.
They must be escaped when they are to be used literally. For
instance, anything of the form ‘[’...‘]’ is ignored, and the
format of a linefeed is ‘^...$’.

OPTIONS

Morphological analysis (option -a). Tokenizes the text into
surface forms (lexical units as they appear in texts) and
delivers, for each surface form, one or more lexical forms
consisting of lemma, lexical category and morphological
inflection information. Tokenization is not straightforward due
to the existence, on the one hand, of contractions and, on the
other hand, of multi-word lexical units. For contractions, the
system reads in a single surface form and delivers the
corresponding sequence of lexical forms. Multi-word surface forms
are analysed in a left-to-right, longest-match fashion. They may
be invariable (such as a multi-word preposition or conjunction)
or inflected (for example, in Spanish (es), "echaban de menos",
"they missed", is a form of the imperfect indicative tense of the
verb "echar de menos", "to miss"). Limited support for some kinds
of discontinuous multi-word units is also available.

Single-word surface form analysis produces output like the
following:

"cantar" -> ‘^cantar/cantar<vblex><inf>$’
"daba" -> ‘^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$’

Case sensitivity. Uses the literal case of the incoming
characters.

Morphological generation (option -g). Delivers a target-language
surface form for each target-language lexical form, by suitably
inflecting it.

Generation without unknown-word marks. Performs morphological
generation (like -g) but without unknown-word marks (asterisk
‘*’).

Post-generation (option -p). Performs orthographical operations
such as contractions and apostrophations. The post-generator is
usually dormant (it just copies the input to the output) until a
special alarm symbol contained in some target-language surface
forms wakes it up to perform a particular string transformation
if necessary; then it goes back to sleep.

Orthoepikon input. Input processing is in the orthoepikon
(previously ‘sao’) annotation system format.

Transliteration. Applies a transliteration dictionary.

Version. Displays the version number.

Help. Displays this help.

FILES

The input compiled dictionary.

BUGS

Lots of...lurking in the dark and waiting for you!

AUTHOR

(c) 2005,2006 Universitat d’Alacant / Universidad de Alicante.
All rights reserved.
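
The synopsis above can be driven from a script. The following Python sketch only assembles the argument list in the documented order (option, fst_file, then the optional input_file and output_file) and, optionally, runs it. The mode names, the file name es.bin in the usage note, and the assumption that lt-proc reads standard input when no input_file is given are illustrative and not taken from this page. Only the -a, -g and -p letters named in the description are used.

```python
import subprocess

# Option letters as given in the description; no other flags are assumed.
MODES = {"analysis": "-a", "generation": "-g", "post-generation": "-p"}


def build_lt_proc_command(mode, fst_file, input_file=None, output_file=None):
    """Return the argument list for one lt-proc run, in synopsis order."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if output_file is not None and input_file is None:
        # The synopsis nests output_file inside input_file.
        raise ValueError("output_file requires input_file")
    cmd = ["lt-proc", MODES[mode], fst_file]
    if input_file is not None:
        cmd.append(input_file)
        if output_file is not None:
            cmd.append(output_file)
    return cmd


def run_lt_proc(mode, fst_file, text):
    """Run lt-proc on text (requires lt-proc to be installed).

    Assumption: with no input_file, lt-proc reads standard input.
    """
    result = subprocess.run(
        build_lt_proc_command(mode, fst_file),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, build_lt_proc_command("analysis", "es.bin") yields ["lt-proc", "-a", "es.bin"], where es.bin stands for a hypothetical dictionary compiled with lt-comp(1).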
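
The analysis output shown in the examples above (a surface form followed by one or more lexical forms, separated by ‘/’ and wrapped in ‘^...$’) can be split into its parts with a few lines of Python. This is a minimal sketch based only on those two examples: the function name is hypothetical, and escaped special characters are deliberately not handled.

```python
import re

# A tag is any run of characters enclosed in angle brackets, e.g. <vblex>.
TAG = re.compile(r"<([^<>]+)>")


def parse_lexical_unit(unit):
    """Split '^surface/lemma<tag>.../lemma<tag>...$' into its parts.

    Returns (surface_form, [(lemma, [tags]), ...]).
    Simplification: escaped special characters are not handled.
    """
    body = unit.strip()
    if not (body.startswith("^") and body.endswith("$")):
        raise ValueError(f"not a lexical unit: {unit!r}")
    surface, *analyses = body[1:-1].split("/")
    lexical_forms = []
    for analysis in analyses:
        lemma = analysis.split("<", 1)[0]  # text before the first tag
        lexical_forms.append((lemma, TAG.findall(analysis)))
    return surface, lexical_forms


print(parse_lexical_unit("^cantar/cantar<vblex><inf>$"))
# ('cantar', [('cantar', ['vblex', 'inf'])])
```

Applied to the "daba" example, the same function returns two lexical forms for the lemma "dar", one tagged p1 and one tagged p3, which reflects the ambiguity between first and third person.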
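
The left-to-right, longest-match treatment of multi-word surface forms described under morphological analysis can be illustrated with a toy segmenter. This is not how lt-proc is implemented (it walks a compiled letter transducer character by character). The sketch below works on whitespace-separated words and a hypothetical set of known multi-word units, purely to show the matching strategy.

```python
def longest_match_tokenize(words, lexicon):
    """Greedy left-to-right segmentation preferring the longest known unit.

    words   -- list of word strings
    lexicon -- set of tuples of words (hypothetical multi-word units)
    Single words that are not in the lexicon pass through unchanged.
    """
    max_len = max((len(entry) for entry in lexicon), default=1)
    units = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, shrinking down to one word.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if n == 1 or candidate in lexicon:
                units.append(" ".join(candidate))
                i += n
                break
    return units


lexicon = {("echaban", "de", "menos")}
print(longest_match_tokenize("te echaban de menos ayer".split(), lexicon))
# ['te', 'echaban de menos', 'ayer']
```

Because the longest candidate is tried first at each position, "echaban de menos" is taken as one unit rather than as three separate words.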
