EvoTalk

09 January, 2006

Splitting Chinese Text into Characters in Perl

Posted by: asd In: Code Snippet | Perl | Programming

Using utf8

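Set the encoding of both the file input and standard output to Big5, so that the text is read as wide characters and can be printed back out as Big5. Once the input is decoded, split(//, ...) splits the string one character at a time rather than one byte at a time.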

PERL:
  use utf8;
  my $grm = shift;                                 # input file name from the command line
  open(IN, '<:encoding(big5)', $grm) || die $!;    # decode Big5 bytes into characters on read

  #binmode(IN, ':encoding(big5)');                 # not needed: open() already sets the layer
  binmode(STDOUT, ':encoding(big5)');              # encode characters back to Big5 on output

  while (my $line = <IN>)
  {
      if ($line =~ /.<(.*)>/)
      {
          print $line;
          my $big5 = $1;                           # the text between < and >
          #print $cht ;
          my @tokens = split(//, $big5);           # one element per character
          print " ";
          foreach my $token (@tokens)
          {
              print "#$token ";
          }
          print "\n";
      }
      else
      {
          print $line;
      }
  }
  close IN;
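
To see why the encoding layer matters, here is a minimal sketch that does not need an input file. It is not part of the original script: the sample string (the two characters U+4E2D and U+6587) and the variable names are assumptions chosen for illustration, and it uses the core Encode module in place of the I/O layers above. It compares split(//, ...) on raw Big5 bytes with the same split on the decoded string.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample text: two Chinese characters, written as code points
# so that the example does not depend on how this file itself is saved.
my $text = "\x{4E2D}\x{6587}";

# What the same text looks like on disk in a Big5 file: a string of bytes.
my $bytes = encode('big5', $text);

# Splitting the undecoded bytes cuts each two-byte character in half.
my @raw = split(//, $bytes);

# Decoding first (which is what the :encoding(big5) layer does on read)
# lets split see whole characters.
my @chars = split(//, decode('big5', $bytes));

printf "bytes: %d, characters: %d\n", scalar(@raw), scalar(@chars);
# prints "bytes: 4, characters: 2"
```

The same reasoning applies to output: without an encoding layer on STDOUT, printing the decoded characters would trigger Perl's "Wide character in print" warning.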
