Lingua-EN-Bigram-0.02

README

This module is designed to 1) pull out all of the two-, three-, and
four-word phrases in a given text, and 2) list these phrases according
to their frequency. Using this module, it is possible to create lists of
the most common phrases in a text as well as to order them by their
probability of occurrence, thus implying significance. This process is
useful for the purposes of textual analysis and "distant reading".
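The two steps above (extracting every two-, three-, and four-word phrase, then ranking the phrases by frequency) can be sketched as follows. This is a minimal Python illustration, not the module's own Perl code. It assumes a simple tokenizer that lowercases the text and keeps runs of letters and apostrophes, and it treats an n-word phrase as n consecutive tokens with no stop-word filtering. The function names `tokenize`, `ngram_counts`, and `most_common_phrases` are hypothetical. The relative frequency returned with each count is one simple estimate of a phrase's probability of occurrence.

```python
import re
from collections import Counter


def tokenize(text):
    """ASSUMPTION: lowercase the text and keep runs of letters/apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(words, n):
    """Count every run of n consecutive words (n=2, 3, or 4 for two-, three-, four-word phrases)."""
    # zip over n staggered views of the word list yields each n-word window once
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


def most_common_phrases(text, n, limit=10):
    """Return (phrase, count, relative frequency) tuples, most frequent first."""
    counts = ngram_counts(tokenize(text), n)
    total = sum(counts.values())  # number of n-word windows in the text
    return [(phrase, c, c / total) for phrase, c in counts.most_common(limit)]


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat slept on the mat"
    for n in (2, 3, 4):
        print(n, most_common_phrases(sample, n, limit=3))
```

For the sample sentence, the three most frequent two-word phrases are "the cat", "on the", and "the mat", each occurring twice among the 12 two-word windows.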

The two-word phrases (bigrams) can also be listed by their T-Score.
The T-Score, like several of the values computed by the module's other
methods, is calculated as described in Nugues, P. M. (2006). An
introduction to language processing with Perl and Prolog: An outline of
theories, implementation, and application with special consideration of
English, French, and German. Cognitive Technologies. Berlin: Springer.
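To make the T-Score concrete, the sketch below uses a count-based form of the bigram t-score that is common in the collocation literature:

t(w1 w2) = (C(w1 w2) - C(w1) * C(w2) / N) / sqrt(C(w1 w2))

where C(...) is a count in the text and N is the total number of words. We assume, without having checked it against the book, that this agrees with the definition in Nugues (2006) and with the module's own calculation. The tokenizer and the function names `bigram_tscores` and `rank_by_tscore` are illustrative assumptions, not part of the module.

```python
import math
import re
from collections import Counter


def bigram_tscores(text):
    """Return {"w1 w2": t-score} for every bigram in the text.

    ASSUMPTION: t = (C(w1 w2) - C(w1) * C(w2) / N) / sqrt(C(w1 w2)),
    with N the number of word tokens; not verified against Nugues (2006).
    """
    words = re.findall(r"[a-z']+", text.lower())  # assumed simple tokenizer
    n = len(words)  # N: total number of word tokens
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    scores = {}
    for (w1, w2), c12 in bigrams.items():
        # count expected if w1 and w2 occurred independently of each other
        expected = unigrams[w1] * unigrams[w2] / n
        scores[f"{w1} {w2}"] = (c12 - expected) / math.sqrt(c12)
    return scores


def rank_by_tscore(text, limit=10):
    """Return (bigram, t-score) pairs, highest score first."""
    scores = bigram_tscores(text)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:limit]


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat slept on the mat"
    for bigram, t in rank_by_tscore(sample, limit=5):
        print(f"{t:.3f}  {bigram}")
```

Because the formula subtracts C(w1) * C(w2) / N, the count expected if the two words occurred independently, a pair of individually common words must co-occur more often than chance to earn a high score; raw frequency makes no such correction.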

-- 
Eric Lease Morgan <eric_morgan@infomotions.com>
August 22, 2010