perlcn - ¼òÌåÖÐÎÄ Perl Ö¸ÄÏ
»¶ÓÀ´µ½ Perl µÄÌìµØ!
´Ó 5.8.0 °æ¿ªÊ¼, Perl ¾ß±¸ÁËÍêÉÆµÄ Unicode (ͳһÂë) Ö§Ô®, Ò²Á¬´øÖ§Ô®ÁËÐí¶àÀ¶¡ÓïϵÒÔÍâµÄ±àÂ뷽ʽ; CJK (ÖÐÈÕº«) ±ãÊÇÆäÖеÄÒ»²¿·Ý. Unicode Êǹú¼ÊÐԵıê×¼, ÊÔͼº¸ÇÊÀ½çÉÏËùÓеÄ×Ö·û: Î÷·½ÊÀ½ç, ¶«·½ÊÀ½ç, ÒÔ¼°Á½Õß¼äµÄÒ»ÇÐ (Ï£À°ÎÄ, ÐðÀûÑÇÎÄ, ÑÇÀ²®ÎÄ, Ï£²®À´ÎÄ, Ó¡¶ÈÎÄ, Ó¡µØ°²ÎÄ, µÈµÈ). ËüÒ²ÈÝÄÉÁ˶àÖÖ×÷ҵϵͳÓëƽ̨ (Èç PC ¼°Âó½ðËþ).
Perl ±¾ÉíÒÔ Unicode ½øÐвÙ×÷. Õâ±íʾ Perl ÄÚ²¿µÄ×Ö·û´®Êý¾Ý¿ÉÓà Unicode ±íʾ; Perl µÄº¯Ê½ÓëËã·û (ÀýÈçÕý¹æ±íʾʽ±È¶Ô) Ò²ÄÜ¶Ô Unicode ½øÐвÙ×÷. ÔÚÊäÈë¼°Êä³öʱ, ΪÁË´¦ÀíÒÔ Unicode ֮ǰµÄ±àÂ뷽ʽ´æ·ÅµÄÊý¾Ý, Perl ÌṩÁË Encode Õâ¸öÄ£¿é, ¿ÉÒÔÈÃÄãÇáÒ׵ضÁÈ¡¼°Ð´Èë¾ÉÓеıàÂëÊý¾Ý.
Encode ÑÓÉìÄ£¿éÖ§Ô®ÏÂÁмòÌåÖÐÎĵıàÂ뷽ʽ ('gb2312' ±íʾ 'euc-cn'):
euc-cn Unix ÑÓÉì×Ö·û¼¯, Ò²¾ÍÊÇË׳ƵĹú±êÂë gb2312-raw δ¾´¦ÀíµÄ (µÍ±ÈÌØ) GB2312 ×Ö·û±í gb12345 δ¾´¦ÀíµÄÖйúÓ÷±ÌåÖÐÎıàÂë iso-ir-165 GB2312 + GB6345 + GB8565 + ÐÂÔö×Ö·û cp936 ×ÖÂëÒ³ 936, Ò²¿ÉÒÔÓà 'GBK' (À©³ä¹ú±êÂë) Ö¸Ã÷ hz 7 ±ÈÌØÒݳöʽ GB2312 ±àÂë
¾ÙÀýÀ´Ëµ, ½« EUC-CN ±àÂëµÄµµ°¸×ª³É Unicode, ìóÐè¼üÈëÏÂÁÐÖ¸Áî:
perl -Mencoding=euc-cn,STDOUT,utf8 -pe1 < file.euc-cn > file.utf8
Perl Ò²ÄÚ¸½ÁË ``piconv'', Ò»Ö§ÍêÈ«ÒÔ Perl д³ÉµÄ×Ö·ûת»»¹¤¾ß³ÌÐò, Ó÷¨ÈçÏÂ:
piconv -f euc-cn -t utf8 < file.euc-cn > file.utf8 piconv -f utf8 -t euc-cn < file.utf8 > file.euc-cn
ÁíÍâ, ÀûÓà encoding Ä£¿é, Äã¿ÉÒÔÇáÒ×д³öÒÔ×Ö·ûΪµ¥Î»µÄ³ÌÐòÂë, ÈçÏÂËùʾ:
#!/usr/bin/env perl # Æô¶¯ euc-cn ×Ö´®½âÎö; ±ê×¼Êä³öÈë¼°±ê×¼´íÎó¶¼ÉèΪ euc-cn ±àÂë use encoding 'euc-cn', STDIN => 'euc-cn', STDOUT => 'euc-cn'; print length("ÂæÍÕ"); # 2 (Ë«ÒýºÅ±íʾ×Ö·û) print length('ÂæÍÕ'); # 4 (µ¥ÒýºÅ±íʾ×Ö½Ú) print index("×»×»½Ì»å", "»×»½"); # -1 (²»°üº¬´Ë×Ó×Ö·û´®) print index('×»×»½Ì»å', '»×»½'); # 1 (´ÓµÚ¶þ¸ö×Ö½Ú¿ªÊ¼)
ÔÚ×îºóÒ»ÁÐÀý×ÓÀï, ``×»'' µÄµÚ¶þ¸ö×Ö½ÚÓë ``×»'' µÄµÚÒ»¸ö×Ö½Ú½áºÏ³É EUC-CN ÂëµÄ ``»×''; ``×»'' µÄµÚ¶þ¸ö×Ö½ÚÔòÓë ``½Ì'' µÄµÚÒ»¸ö×Ö½Ú½áºÏ³É ``»½''. Õâ½â¾öÁËÒÔÇ° EUC-CN Âë±È¶Ô´¦ÀíÉϳ£¼ûµÄÎÊÌâ.
Èç¹ûÐèÒª¸ü¶àµÄÖÐÎıàÂë, ¿ÉÒÔ´Ó CPAN (http://www.cpan.org/) ÏÂÔØ Encode::HanExtra Ä£¿é. ËüÄ¿Ç°ÌṩÏÂÁбàÂ뷽ʽ:
gb18030 À©³ä¹ýµÄ¹ú±êÂë, °üº¬·±ÌåÖÐÎÄ
ÁíÍâ, Encode::HanConvert Ä£¿éÔòÌṩÁ˼ò·±×ª»»ÓõÄÁ½ÖÖ±àÂë:
big5-simp Big5 ·±ÌåÖÐÎÄÓë Unicode ¼òÌåÖÐÎÄ»¥×ª gbk-trad GBK ¼òÌåÖÐÎÄÓë Unicode ·±ÌåÖÐÎÄ»¥×ª
ÈôÏëÔÚ GBK Óë Big5 Ö®¼ä»¥×ª, Çë²Î¿¼¸ÃÄ£¿éÄÚ¸½µÄ b2g.pl Óë g2b.pl Á½Ö§³ÌÐò, »òÔÚ³ÌÐòÄÚʹÓÃÏÂÁÐд·¨:
use Encode::HanConvert; $euc_cn = big5_to_gb($big5); # ´Ó Big5 תΪ GBK $big5 = gb_to_big5($euc_cn); # ´Ó GBK תΪ Big5
Çë²Î¿¼ Perl ÄÚ¸½µÄ´óÁ¿ËµÃ÷Îļþ (²»ÐÒÈ«ÊÇÓÃÓ¢ÎÄдµÄ), À´Ñ§Ï°¸ü¶à¹ØÓÚ Perl µÄ֪ʶ, ÒÔ¼° Unicode µÄʹÓ÷½Ê½. ²»¹ý, ÍⲿµÄ×ÊÔ´Ï൱·á¸»:
Perl µÄÊ×Ò³ (ÓÉÅ·À³Àñ¹«Ë¾Î¬»¤)
Perl ×ۺϵä²ØÍø (Comprehensive Perl Archive Network)
Perl ÓʵÝÂÛ̳һÀÀ
¼òÌåÖÐÎÄ°æµÄÅ·À³Àñ Perl Êé½å
Öйú Perl Íƹã×éÒ»ÀÀ
Unicode ѧÊõѧ»á (Unicode ±ê×¼µÄÖƶ¨Õß)
Unix/Linux É쵀 UTF-8 ¼° Unicode ´ð¿ÍÎÊ
Encode, the Encode::CN manpage, encoding, the perluniintro manpage, the perlunicode manpage
Jarkko Hietaniemi <jhi@iki.fi>
Autrijus Tang (ÌÆ×Úºº) <autrijus@autrijus.org>