EvoTalk

Posts Tagged ‘utf8’

18 June, 2009

UTF-8 <-> UTF-16 C Implementation

Posted by: asd In: C++ | Code Snippet | Programming

References:
C++ Functions for UTF-8/UTF-16 Conversion and for Converting \uhhhh to UTF-16 (Part 1): The Functions
C++ Functions for UTF-8/UTF-16 Conversion and for Converting \uhhhh to UTF-16 (Part 2): Usage
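The referenced articles are not reproduced here. As a minimal sketch of the UTF-8 to UTF-16 direction (not the code from those articles), the C functions below decode each UTF-8 sequence into a code point, reject malformed or overlong input, and emit either one UTF-16 unit or a surrogate pair for code points above U+FFFF. The second function turns one \uhhhh escape into a single UTF-16 unit. Both names, utf8_to_utf16 and parse_u_escape, are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the referenced articles).
   Decodes n bytes of UTF-8 into UTF-16 code units in out (room for cap units).
   Returns the number of units written, or -1 on malformed input or overflow. */
long utf8_to_utf16(const unsigned char *in, size_t n, uint16_t *out, size_t cap)
{
    /* smallest code point allowed for each sequence length (rejects overlong forms) */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t i = 0, o = 0;

    while (i < n) {
        unsigned char c = in[i];
        uint32_t cp;
        size_t len, k;

        if (c < 0x80)                { cp = c;        len = 1; } /* 0xxxxxxx */
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; } /* 110xxxxx */
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; } /* 1110xxxx */
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; len = 4; } /* 11110xxx */
        else return -1;                                  /* invalid lead byte */

        if (len > n - i) return -1;                      /* truncated sequence */
        for (k = 1; k < len; k++) {
            if ((in[i + k] & 0xC0) != 0x80) return -1;   /* not 10xxxxxx */
            cp = (cp << 6) | (in[i + k] & 0x3F);
        }
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;                                   /* overlong, too big, or surrogate */
        i += len;

        if (cp < 0x10000) {                              /* BMP: one unit */
            if (o + 1 > cap) return -1;
            out[o++] = (uint16_t)cp;
        } else {                                         /* supplementary: surrogate pair */
            if (o + 2 > cap) return -1;
            cp -= 0x10000;
            out[o++] = (uint16_t)(0xD800 | (cp >> 10));  /* high surrogate */
            out[o++] = (uint16_t)(0xDC00 | (cp & 0x3FF));/* low surrogate */
        }
    }
    return (long)o;
}

/* Hypothetical helper: parses "\uhhhh" (exactly four hex digits) into one
   UTF-16 unit. Returns 0 on success, -1 if s does not start with such an escape. */
int parse_u_escape(const char *s, uint16_t *unit)
{
    unsigned v = 0;
    int k;

    if (s[0] != '\\' || s[1] != 'u') return -1;
    for (k = 2; k < 6; k++) {
        char c = s[k];
        if (c >= '0' && c <= '9')      v = v * 16 + (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f') v = v * 16 + (unsigned)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') v = v * 16 + (unsigned)(c - 'A' + 10);
        else return -1;                /* also stops at the terminating NUL */
    }
    *unit = (uint16_t)v;
    return 0;
}
```

A character above U+FFFF written in escaped form arrives as two consecutive \uhhhh escapes (a surrogate pair); calling parse_u_escape on each and storing the two units in order already gives valid UTF-16.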


Use the command-line tools sed and iconv.
Create a batch file "u2b.bat":

CODE:

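@echo off
REM u2b.bat: convert a text file from UTF-8 to Big5 (code page 950) in place
SET FILE=%~1
IF NOT DEFINED FILE GOTO USAGE

REM Convert into a temporary file; keep the original if iconv fails
iconv -f utf-8 -t cp950 "%FILE%" > 1.tmp
IF ERRORLEVEL 1 (
    DEL 1.tmp
    ECHO Conversion failed: %FILE% was left unchanged.
    EXIT /B 1
)

REM Replace the original file with the converted one (works with paths too)
MOVE /Y 1.tmp "%FILE%" > NUL
EXIT /B 0

:USAGE
ECHO Convert a file from UTF-8 to Big5 (CP950)
ECHO Usage: u2b.bat [file]
EXIT /B 1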

Create a batch file "unix2dos.bat":

CODE:

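@echo off
REM unix2dos.bat: convert LF line endings to CRLF in place (GNU sed syntax)
SET FILE=%~1
IF NOT DEFINED FILE GOTO USAGE

REM "s/$/\r/" appends a carriage return to the end of every line
sed -i "s/$/\r/" "%FILE%"

EXIT [...]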
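The batch file above is cut off. For comparison, here is a rough C sketch of the same LF to CRLF step, assuming both streams are opened in binary mode (on Windows a text-mode stream would translate newlines by itself). lf_to_crlf is a hypothetical name. Unlike the sed expression, which appends a CR to every line it reads and so may double the CR on a file that already uses CRLF, this version only adds a CR before an LF that does not already follow one.

```c
#include <stdio.h>

/* Hypothetical helper: copies in to out, inserting CR before each bare LF.
   An LF already preceded by CR is copied unchanged, so running it twice
   does not produce CR CR LF. Returns 0 on success, -1 on an I/O error. */
int lf_to_crlf(FILE *in, FILE *out)
{
    int c, prev = EOF;

    while ((c = getc(in)) != EOF) {
        if (c == '\n' && prev != '\r') {
            if (putc('\r', out) == EOF) return -1;
        }
        if (putc(c, out) == EOF) return -1;
        prev = c;
    }
    return ferror(in) ? -1 : 0;
}
```

Open the files with "rb" and "wb" so the C library does not add or strip carriage returns on its own.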


19 November, 2007

UTF-8 Encoding and Decoding

Posted by: asd In: C++ | Code Snippet | Programming

References:

CodeProject - UTF-8 Encoding and Decoding
Unicode UTF-8 encoding

Ported to C:

C++:

#include <stdio.h>
#include <string.h>

// const: main passes string literals, which must not be modified
void EncodeToUTF8(const char *szSource, char *szFinal);
void DecodeFromUTF8(const char *szSource, char *szFinal);

int main(int argc, char* argv[])
{
    char szEncodeFinal[256];
    char szDecodeFinal[256];

    EncodeToUTF8("123abc測試", szEncodeFinal);
    printf("Encode:%s\n", szEncodeFinal);

    DecodeFromUTF8(szEncodeFinal, szDecodeFinal);
    printf("Decode:%s\n", szDecodeFinal);

    return 0;
}

void EncodeToUTF8(const char *szSource, char *szFinal)
{
    unsigned short ch;
    unsigned char bt1, bt2, bt3, bt4, bt5, bt6;
    int n, nMax = strlen(szSource);

    //CString sFinal, szTemp;
    szFinal[0] = [...]
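The excerpt stops before the conversion logic. The six byte variables bt1 to bt6 suggest the original follows the older UTF-8 definition with sequences of up to six bytes; RFC 3629 restricts UTF-8 to U+10FFFF, which needs at most four. A minimal encoder for a single code point, following the RFC 3629 bit patterns, might look like this. It assumes the input is already a Unicode code point, whereas the original main passes a char string whose encoding depends on the source file; utf8_encode is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes one code point as UTF-8 (RFC 3629).
   out must have room for 4 bytes. Returns the number of bytes written,
   or 0 if cp is a UTF-16 surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                                 /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                                /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                                    /* surrogates are not characters */
    if (cp < 0x10000) {                              /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                            /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                                        /* beyond the Unicode range */
}
```

For example, utf8_encode(0x6E2C, buf) (the character 測) stores E6 B8 AC and returns 3.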


15 January, 2007

Catch Google Suggest Keyword

Posted by: asd In: Code Snippet | Perl | Programming

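Based on the Google Suggest script, modified to support Chinese queries. The key point is that the Chinese text must be UTF-8 encoded, and Chinese characters in the URL must be URL-encoded, i.e. written in %xx%xx%xx form. The input is the 5,401 commonly used Chinese characters; the output is, for each of these characters, the ranking of suggested keywords (UTF-8 encoded) that begin with it.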
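The crucial step in the post is turning the UTF-8 query into %xx%xx%xx form. The original script is Perl; a minimal C sketch of the same step follows, assuming the RFC 3986 rule that only unreserved characters (A-Z, a-z, 0-9, -, _, ., ~) stay as they are. url_encode_utf8 is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: percent-encodes a UTF-8 string for a URL query.
   Unreserved characters (RFC 3986: A-Z a-z 0-9 - _ . ~) are copied as-is;
   every other byte, including each byte of a multi-byte character, becomes %XX.
   Returns 0 on success, -1 if cap is below the worst case 3 * strlen(in) + 1. */
int url_encode_utf8(const char *in, char *out, size_t cap)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *p = (const unsigned char *)in;

    if (3 * strlen(in) + 1 > cap) return -1;   /* worst case: every byte escaped */
    for (; *p; p++) {
        unsigned char c = *p;
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.' || c == '~') {
            *out++ = (char)c;
        } else {
            *out++ = '%';
            *out++ = hex[c >> 4];              /* high nibble */
            *out++ = hex[c & 0x0F];            /* low nibble */
        }
    }
    *out = '\0';
    return 0;
}
```

With a UTF-8 input, 測試 (bytes E6 B8 AC E8 A9 A6) becomes %E6%B8%AC%E8%A9%A6.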


13 February, 2006

Checking Whether a String Is UTF-8 Encoded

Posted by: asd In: C++ | Code Snippet | Programming

Based on "A C# program that automatically detects whether text is UTF-8", ported to C.

C++:

#include <windows.h>  // BOOL, BYTE, TRUE, FALSE
#include <string.h>   // strlen

// 0000 0000-0000 007F - 0xxxxxxx (ASCII converts to 1 octet!)
// 0000 0080-0000 07FF - 110xxxxx 10xxxxxx (2-octet format)
// 0000 0800-0000 FFFF - 1110xxxx 10xxxxxx 10xxxxxx (3-octet format)
BOOL IsUTF8(const char *str)
{
    int  i;
    BYTE cOctets;  // octets to go in this UTF-8 encoded character
    BYTE chr;
    BOOL bAllAscii = TRUE;
    long iLen = strlen(str);

    cOctets = 0;
    for( i=0; i <iLen; [...]
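The excerpt is cut off inside the loop, and its comment table stops at the 3-octet form. A sketch of a complete check under RFC 3629 is below. Besides checking lead and continuation bytes, it accepts 4-byte sequences and rejects overlong forms, UTF-16 surrogates (U+D800 to U+DFFF), and values above U+10FFFF. It uses <stdbool.h> instead of the Windows BOOL and BYTE types; is_utf8 is a hypothetical name. The original sets a bAllAscii flag, but how it is used is not shown, so this version simply treats pure ASCII as valid UTF-8.

```c
#include <stdbool.h>

/* Hypothetical replacement for IsUTF8 (not the original code).
   Returns true if the NUL-terminated string str is well-formed UTF-8 per RFC 3629. */
bool is_utf8(const char *str)
{
    static const unsigned long min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)str;

    while (*s) {
        unsigned char c = *s;
        unsigned long cp;
        int len, k;

        if (c < 0x80)                { s++; continue; }          /* 0xxxxxxx: ASCII */
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; } /* 110xxxxx */
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; } /* 1110xxxx */
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; len = 4; } /* 11110xxx */
        else return false;            /* stray continuation byte or 0xF8..0xFF */

        for (k = 1; k < len; k++) {
            /* a NUL here fails the test, so we never read past the terminator */
            if ((s[k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (s[k] & 0x3F);
        }
        if (cp < min_cp[len]) return false;                      /* overlong form */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        s += len;
    }
    return true;
}
```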

