README

NAME
    Lingua::JA::NormalizeText - Text Normalizer

SYNOPSIS
      use Lingua::JA::NormalizeText;
      use utf8;

      my @options = ( qw/nfkc decode_entities/, \&dearinsu_to_desu );
      my $normalizer = Lingua::JA::NormalizeText->new(@options);

      print $normalizer->normalize('鳥が㌧㌦でありんす&hearts;');
      # -> 鳥がトンドルです♥

      sub dearinsu_to_desu
      {
          my $text = shift;
          $text =~ s/でありんす/です/g;

          return $text;
      }

    # or

      use Lingua::JA::NormalizeText qw/nfkc decode_entities/;
      use utf8;

      my $text = '㈱㋰㋫㋫&hearts;';
      print decode_entities( nfkc($text) );
      # -> (株)ムフフ♥

DESCRIPTION
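    Lingua::JA::NormalizeText normalizes Japanese text by applying a
    sequence of conversions that you choose, such as Unicode normalization,
    full-width/half-width conversion, and kana conversion.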

METHODS
  new(@options)
    Creates a new Lingua::JA::NormalizeText instance.

    The following options are available.

      OPTION                 SAMPLE INPUT        OUTPUT FOR SAMPLE INPUT
      ---------------------  ------------------  -----------------------
      lc                     DdD                 ddd
      uc                     DdD                 DDD
      nfkc                   ㌦                  ドル (length: 2)
      nfkd                   ㌦                  ドル (length: 3)
      nfc
      nfd
      decode_entities        &hearts;            ♥
      strip_html             <em>あ</em>         あ
      alnum_z2h              ＡＢＣ１２３        ABC123
      alnum_h2z              ABC123              ＡＢＣ１２３
      space_z2h
      space_h2z
      katakana_z2h           ハァハァ            ﾊｧﾊｧ
      katakana_h2z           ｽｰﾊｰｽｰﾊｰ            スーハースーハー
      katakana2hiragana      パンツ              ぱんつ
      hiragana2katakana      ぱんつ              パンツ
      unify_3dots            はぁ。。。          はぁ…
      wave2tilde             〜                  ～
      tilde2wave             ～                  〜
      wavetilde2long         〜, ～              ー
      wave2long              〜                  ー
      tilde2long             ～                  ー
      fullminus2long         −                   ー
      dashes2long            —                   ー
      drawing_lines2long     ─                   ー
      unify_long_repeats     ヴァーーー          ヴァー
      nl2space               (new line)          (space)
      unify_long_spaces      (space)(space)      (space)
      remove_head_space      (space)あ(space)あ  あ(space)あ
      remove_tail_space      ああ(space)(space)  ああ
      old2new_kana           ゐヰゑヱ            いイえエ
      old2new_kanji          亞逸鬭              亜逸闘
      tab2space              (tab)(tab)          (space)(space)
      remove_controls        あ\x{0000}あ        ああ
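
    The length difference between the nfkc and nfkd rows can be checked
    with the core Unicode::Normalize module. This is only a sketch: it
    assumes the nfkc and nfkd options correspond to the standard NFKC and
    NFKD forms, and it calls Unicode::Normalize directly rather than this
    module.

```perl
use strict;
use warnings;
use Unicode::Normalize qw/NFKC NFKD/;

# U+3326 SQUARE DORU, the single character used in the table above.
my $square_doru = "\x{3326}";

# NFKC: compatibility decomposition, then recomposition.
my $composed = NFKC($square_doru);     # U+30C9 U+30EB

# NFKD: compatibility decomposition only, so the voiced-sound mark
# stays a separate combining character.
my $decomposed = NFKD($square_doru);   # U+30C8 U+3099 U+30EB

printf "NFKC length: %d\n", length($composed);     # 2
printf "NFKD length: %d\n", length($decomposed);   # 3
```

    Both results display as the same two kana, which is why only the
    lengths differ in the table.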

    Options are applied in the order in which they appear in @options: the
    first element is applied first, and the last element is applied last.
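
    A small sketch of why the order matters, using only the lc and uc
    options from the table and the new() and normalize() methods
    documented here. The variable names are illustrative, and the expected
    results are inferred from the table rather than from a test run.

```perl
use strict;
use warnings;
use Lingua::JA::NormalizeText;

# The same two options, in opposite order.
my $lc_then_uc = Lingua::JA::NormalizeText->new(qw/lc uc/);
my $uc_then_lc = Lingua::JA::NormalizeText->new(qw/uc lc/);

my $upper = $lc_then_uc->normalize('DdD');   # uc is applied last
my $lower = $uc_then_lc->normalize('DdD');   # lc is applied last

print "$upper\n";   # expected per the table above: DDD
print "$lower\n";   # expected per the table above: ddd
```

    The same reasoning applies to pairs such as katakana2hiragana and
    hiragana2katakana, where the later option undoes the earlier one.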

    You can also pass your own functions as code references. (See the
    dearinsu_to_desu function in the SYNOPSIS section.)

   remove_controls
    Note that this option does not remove the following characters:

      CHARACTER TABULATION(tab)
      LINE FEED(LF)
      CARRIAGE RETURN(CR)
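
    The following sketch shows one way to check this. It relies only on
    new() and normalize() as documented here. The input string is made up
    for illustration, and the expected result is inferred from the note
    above, not from a test run.

```perl
use strict;
use warnings;
use Lingua::JA::NormalizeText;

my $normalizer = Lingua::JA::NormalizeText->new('remove_controls');

# NUL (U+0000) is a control character; tab, CR and LF are the
# documented exceptions.
my $input  = "a\x{0000}b\tc\r\n";
my $output = $normalizer->normalize($input);

# Print code points so that the invisible characters can be inspected.
print join(' ', map { sprintf 'U+%04X', ord } split //, $output), "\n";
```

    If the option behaves as described, U+0000 should be missing from the
    printed list while U+0009, U+000D and U+000A remain.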

  normalize($text)
    Normalizes $text by applying the options passed to new(), in order, and
    returns the normalized text.

AUTHOR
    pawa <pawapawa@cpan.org>

SEE ALSO
    Table of old and new kanji forms (新旧字体表):
    <http://www.asahi-net.or.jp/~ax2s-kmtn/ref/old_chara.html>

LICENSE
    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.
