

=== Programs ===
-- Regenerates all */*.base and */*.c files from the source file
           (given as the first parameter in */), used by */ to
           regenerate stuff in individual directories too.  Uses many of
           the following scripts.

*/ -- Customized scripts for individual directories.  Once a directory
             contains such a script, it is run by the main one.

-- Removes most auxiliary files from language subdirs.

basetoc.c -- [filter] Converts one .base file to a .c file, used by
              $ ./basetoc <CHARSET.base >CHARSET.c

-- Reads the generated .c files, computes significance data, weight
             sums and other summary data, and writes the file `totals.c'.
             $ ./ CHARSET1.c ... CHARSETn.c

-- [filter] Does some kind of funny weight normalization, useful
                for producing CHARSET.base files, since the weights must fit
                into an unsigned short int:
                $ ./ <COUNTS >NORMALIZED_COUNTS
                Given a file on the command line, it normalizes the input
                to have exactly(!) the same weight sum as that file:
                This is not run by

-- Given two count files, it finds the characters most suitable for a
              hook deciding between these two, i.e. the characters with
              the biggest difference in occurrences:
              $ ./ COUNT1 COUNT2
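A minimal sketch of the pair-deciding step follows. It assumes each count file has already been parsed into a 256-entry array indexed by byte value, since the count-file format is not described here. Both histograms are first converted to relative frequencies so that texts of different lengths can be compared. That normalization, the name `best_discriminators`, and the tie-breaking rule are all assumptions, not the script's actual behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: pick up to k byte values whose relative frequency
 * differs most between two charsets' count tables.  The chosen bytes go
 * to out[] (largest difference first); returns how many were chosen. */
size_t best_discriminators(const unsigned long c1[256],
                           const unsigned long c2[256],
                           size_t k, unsigned char *out)
{
    double t1 = 0.0, t2 = 0.0, diff[256];
    unsigned char chosen[256] = {0};
    size_t n = 0;

    for (int b = 0; b < 256; b++) {
        t1 += (double)c1[b];
        t2 += (double)c2[b];
    }
    assert(t1 > 0.0 && t2 > 0.0);   /* both tables must be non-empty */

    /* |p1(b) - p2(b)| on relative frequencies (assumed normalization) */
    for (int b = 0; b < 256; b++) {
        double d = c1[b] / t1 - c2[b] / t2;
        diff[b] = d < 0.0 ? -d : d;
    }

    /* Repeated selection of the largest remaining difference;
     * ties go to the lower byte value. */
    while (n < k && n < 256) {
        int best = -1;
        for (int b = 0; b < 256; b++)
            if (!chosen[b] && (best < 0 || diff[b] > diff[best]))
                best = b;
        chosen[best] = 1;
        out[n++] = (unsigned char)best;
    }
    return n;
}
```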

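The weight-normalization filter described above is only characterized as "some kind of funny weight normalization", so its real method is unknown. The sketch below illustrates the exact-sum variant using a swapped-in technique, largest-remainder rounding, which is not necessarily what the script does. Each count is scaled proportionally and rounded down, and the leftover units go to the entries with the largest remainders, so the results sum to exactly `target`. Picking a `target` of at most 65535 keeps every weight within an unsigned short int. The name `scale_to_sum` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch (largest-remainder rounding, not necessarily the
 * script's method): scale in[0..n-1] so that out[] sums to exactly
 * `target`.  Returns 0 on success, -1 if all inputs are zero.
 * Assumes in[i] * target fits in 64 bits. */
int scale_to_sum(const unsigned long *in, size_t n, unsigned long target,
                 unsigned long *out)
{
    unsigned long long total = 0, assigned = 0;

    for (size_t i = 0; i < n; i++)
        total += in[i];
    if (total == 0)
        return -1;

    /* Proportional share of each entry, rounded down. */
    for (size_t i = 0; i < n; i++) {
        out[i] = (unsigned long)((unsigned long long)in[i] * target / total);
        assigned += out[i];
    }

    /* Give the remaining units, one each, to the entries with the
     * largest remainders that have not been bumped yet (lowest index
     * wins ties). */
    while (assigned < target) {
        size_t best = n;
        unsigned long long best_rem = 0;
        for (size_t i = 0; i < n; i++) {
            unsigned long long prod = (unsigned long long)in[i] * target;
            if (out[i] != prod / total)
                continue;               /* already got its extra unit */
            if (best == n || prod % total > best_rem) {
                best = i;
                best_rem = prod % total;
            }
        }
        assert(best < n);
        out[best]++;
        assigned++;
    }
    return 0;
}
```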
xlt.c -- [filter] Extremely simple charset converter, to become independent
         of the other broken converters:
         $ ./xlt <TEXT >CONVERTED_TEXT
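The table-driven idea behind such a converter can be sketched as follows. The sketch assumes the source and target charsets are given as already-parsed 256-entry byte-to-UCS2 arrays like the maps in maps/; the map file format is not shown here. Each source byte is looked up in the target map by its UCS2 value, and bytes with no counterpart become a fallback character. `build_xlt`, `translate`, `UCS_UNDEFINED`, and the fallback parameter are hypothetical, not xlt.c's actual interface.

```c
#include <assert.h>
#include <stddef.h>

#define UCS_UNDEFINED 0xFFFFu  /* hypothetical marker for unmapped bytes */

/* Build a byte -> byte table converting charset `from` to charset `to`,
 * both given as byte -> UCS2 maps.  Bytes whose character does not
 * exist in `to` map to `fallback`. */
void build_xlt(const unsigned short from[256], const unsigned short to[256],
               unsigned char table[256], unsigned char fallback)
{
    for (int s = 0; s < 256; s++) {
        table[s] = fallback;
        if (from[s] == UCS_UNDEFINED)
            continue;
        for (int t = 0; t < 256; t++) {
            if (to[t] == from[s]) {     /* first (lowest) match wins */
                table[s] = (unsigned char)t;
                break;
            }
        }
    }
}

/* Convert a buffer in place using a table from build_xlt(). */
void translate(unsigned char *buf, size_t len, const unsigned char table[256])
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        buf[i] = table[buf[i]];
}
```

If a target map lists the same UCS2 value under two bytes, this sketch picks the lowest byte; a real converter may need a different rule.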

mystrings.c -- [filter] Extract text chunks from input (strings(1) doesn't
               seem to do a good job on 8bit files):
               $ ./mystrings <FILE | ...
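A minimal sketch of an 8bit-clean `strings` follows. It assumes that "text" means tab, printable ASCII, and every byte from 0x80 up; in charsets like CP1250 even 0x80-0x9F hold letters, so none of those are excluded. Runs shorter than a minimum length are dropped, and each kept run is printed on its own line. The function `extract_chunks` and its `min_len` parameter are hypothetical, and the real mystrings.c may use different rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Treat tab, printable ASCII and all 8bit bytes as text (assumption). */
static int is_text_byte(int c)
{
    return c == '\t' || (c >= 0x20 && c != 0x7F);
}

/* Copy every run of at least min_len text bytes from `in` to `out`,
 * one run per line.  Only the first min_len bytes of a run are
 * buffered; once the run qualifies, the rest is streamed through. */
void extract_chunks(FILE *in, FILE *out, size_t min_len)
{
    unsigned char pending[256];
    size_t len = 0;   /* bytes of the current run held in pending[] */
    int printing = 0; /* current run is already long enough */
    int c;

    assert(min_len > 0 && min_len <= sizeof pending);
    while ((c = getc(in)) != EOF) {
        if (!is_text_byte(c)) {
            if (printing)
                putc('\n', out);
            len = 0;
            printing = 0;
        } else if (printing) {
            putc(c, out);
        } else {
            pending[len++] = (unsigned char)c;
            if (len == min_len) {
                fwrite(pending, 1, len, out);
                printing = 1;
            }
        }
    }
    if (printing)
        putc('\n', out);  /* terminate a run that ends at EOF */
}
```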

countall.c -- [filter] Count character frequencies:
              $ ./countall <TEXT >rawcounts.CHARSET
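At its core, counting character frequencies is a 256-bin histogram over byte values. The minimal sketch below prints one "byte count" line per non-zero entry. That output layout is an assumption, because the real rawcounts.CHARSET format is not documented in this file. `count_bytes` and `count_stream` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Add the occurrences of each byte value in buf to counts[].  The
 * caller zeroes counts[] first, so several buffers can be accumulated. */
void count_bytes(const unsigned char *buf, size_t len,
                 unsigned long counts[256])
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        counts[buf[i]]++;
}

/* Filter driver: histogram all of `in`, then print one
 * "<byte value> <count>" line per byte that occurred (assumed format). */
void count_stream(FILE *in, FILE *out)
{
    unsigned long counts[256] = {0};
    unsigned char buf[4096];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        count_bytes(buf, n, counts);
    for (int c = 0; c < 256; c++)
        if (counts[c] > 0)
            fprintf(out, "%d %lu\n", c, counts[c]);
}
```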

countpair.c -- [filter] Count 8bit letter pair frequencies and print a table
               containing as many pairs as needed to cover 95% of all pairs:
               $ ./countpair CHARSET.letters <TEXT >paircounts.CHARSET
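The pair counting and the 95% cut-off can be sketched as follows. The sketch assumes CHARSET.letters has already been turned into a 256-entry flag array. It also assumes a pair is counted only when both adjacent bytes are flagged letters, since the text only says "8bit letter pair". Pairs are sorted by frequency and kept until their cumulative count reaches the requested percentage. `count_pairs`, `top_pairs`, and `PairCount` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned pair;          /* first byte * 256 + second byte */
    unsigned long count;
} PairCount;

/* Count adjacent byte pairs where both bytes are flagged as letters.
 * pairs[] has 256 * 256 zero-initialized entries. */
void count_pairs(const unsigned char *buf, size_t len,
                 const unsigned char is_letter[256], unsigned long *pairs)
{
    for (size_t i = 1; i < len; i++)
        if (is_letter[buf[i - 1]] && is_letter[buf[i]])
            pairs[buf[i - 1] * 256u + buf[i]]++;
}

/* qsort comparator: higher counts first, then lower pair index. */
static int by_count_desc(const void *x, const void *y)
{
    const PairCount *a = x, *b = y;
    if (a->count != b->count)
        return a->count < b->count ? 1 : -1;
    return (a->pair > b->pair) - (a->pair < b->pair);
}

/* Store the most frequent pairs in out[] (room for 256 * 256 entries),
 * keeping just enough of them to cover `percent` % of all counted
 * pairs.  Returns how many were kept. */
size_t top_pairs(const unsigned long *pairs, unsigned percent, PairCount *out)
{
    unsigned long long total = 0, covered = 0;
    size_t n = 0, kept = 0;

    assert(percent <= 100);
    for (unsigned p = 0; p < 256u * 256u; p++) {
        if (pairs[p] > 0) {
            out[n].pair = p;
            out[n].count = pairs[p];
            total += pairs[p];
            n++;
        }
    }
    qsort(out, n, sizeof *out, by_count_desc);
    while (kept < n && covered * 100 < (unsigned long long)percent * total)
        covered += out[kept++].count;
    return kept;
}
```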

findletters.c -- [filter] Find which 8bit characters from a charset map are
                 letters:
                 $ ./findletters <Letters >CHARSET.letters

-- Run findletters.c for all charsets in maps/.
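The letter-finding step can be sketched as follows. The sketch assumes the charset map is an already-parsed 256-entry byte-to-UCS2 array and that Letters has been loaded into an ascending array of code points; neither file format is shown here. Because Letters excludes 7bit characters, only bytes 0x80-0xFF are checked. `find_letters` and `UCS_UNDEFINED` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define UCS_UNDEFINED 0xFFFFu  /* hypothetical marker for unmapped bytes */

static int cmp_ucs(const void *a, const void *b)
{
    unsigned short x = *(const unsigned short *)a;
    unsigned short y = *(const unsigned short *)b;
    return (x > y) - (x < y);
}

/* Flag every 8bit byte whose UCS2 value appears in `letters` (sorted
 * ascending).  Returns the number of letters found. */
int find_letters(const unsigned short map[256],
                 const unsigned short *letters, size_t nletters,
                 unsigned char is_letter[256])
{
    int found = 0;

    assert(letters != NULL || nletters == 0);
    for (int b = 0; b < 256; b++) {
        is_letter[b] = 0;
        if (b < 0x80 || map[b] == UCS_UNDEFINED)
            continue;                   /* 7bit or unmapped byte */
        if (bsearch(&map[b], letters, nletters, sizeof *letters, cmp_ucs)) {
            is_letter[b] = 1;
            found++;
        }
    }
    return found;
}
```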

=== Data ===
Letters -- Unicode characters assumed to be letters, excluding 7bit ones.
           Also excluding non-European scripts, to keep it small.

maps/ -- 8bit charset -> UCS2 maps, notable ones:
      -- Translates Latin `i' and `I' to Cyrillic 0x0456 and 0x0406,
         thus approximates them the opposite way when used as TARGET.
      -- It's Macintosh Cyrillic after Apple's unification of the Russian
         and Ukrainian variants and the addition of the Euro symbol, in
         Mac OS 9.0 or so (recode uses the old Russian maccyr -- FIXME:
         with iconv it doesn't?).
      -- Macintosh Central European encoding, the real one, not the
         crappy one used by recode.
      -- KOI8-U (Ukrainian) (recode uses some strange mapping?).
      -- KOI8-Unified.
      -- KOI8-UB (Ukrainian/Belarusian).
      -- T1 Cork encoding (recode uses some strange mapping?).
      -- ISO-8859-13 map (recode uses some strange mapping?).

letters/ -- lists of 8bit characters that are letters, for various
            charsets (generated); run to create it.
