Converting Chinese Character Encodings on Linux

guitarhua 2012-05-03


Source: 中國IT實驗室 (China IT Lab)

http://www. 2011-11-28 17:05:08

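A project of mine required converting GBK-encoded text to UTF-8 on Linux. A Google search turned up few relevant resources, so the procedure below is the result of my own repeated experiments. The same approach also applies to conversions between other encodings.

Converting from GBK to UTF-8 goes through Unicode as an intermediate encoding. Because the conversion is relatively simple on Windows, I describe the Windows procedure first; the procedure on Linux is essentially the same, but the functions used differ.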
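To make the role of the intermediate encoding concrete: the character 中 is the byte pair D6 D0 in GBK, the code point U+4E2D in Unicode, and the bytes E4 B8 AD in UTF-8. The first step (GBK to Unicode) needs a lookup table, which is why library functions are used for it, while the second step (Unicode to UTF-8) is plain bit arithmetic. The following is a minimal sketch of that second step only; utf8_encode is a hypothetical helper name and is not part of the original article.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 if cp cannot be encoded.
 * Hypothetical helper for illustration; not from the original article. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not valid code points */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

Calling utf8_encode(0x4E2D, buf) writes the three bytes E4 B8 AD, which is what the library-based conversions described in this article should produce for 中.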

On Windows:

1. First use the function MultiByteToWideChar to convert the multibyte string to Unicode.

2. Then use the function WideCharToMultiByte to convert the Unicode string to UTF-8.

A quick Google search finds many examples online. Here is a simple piece of source code that converts an ANSI string to UTF-8:


#include <stdlib.h>
#include <windows.h>

/* Convert a string in the system ANSI code page to UTF-8.
 * Returns a malloc'ed string that the caller must free, or NULL on failure. */
char *multichar_2_utf8(const char *m_string)
{
    int len = 0;
    wchar_t *w_string;
    char *utf8_string;

    /* Number of wide characters (terminator included) needed for the
     * ANSI-to-Unicode conversion. CP_ACP selects the source encoding. */
    len = MultiByteToWideChar(CP_ACP, 0, m_string, -1, NULL, 0);
    if (len == 0)
        return NULL;
    w_string = (wchar_t *)calloc((size_t)len, sizeof(wchar_t));
    if (w_string == NULL)
        return NULL;
    /* ANSI to Unicode */
    MultiByteToWideChar(CP_ACP, 0, m_string, -1, w_string, len);

    /* Number of bytes (terminator included) needed for the
     * Unicode-to-UTF-8 conversion. CP_UTF8 selects the target encoding. */
    len = WideCharToMultiByte(CP_UTF8, 0, w_string, -1, NULL, 0, NULL, NULL);
    if (len == 0)
    {
        free(w_string);
        return NULL;
    }
    utf8_string = (char *)calloc((size_t)len, 1);
    if (utf8_string != NULL)
    {
        /* Unicode to UTF-8 */
        WideCharToMultiByte(CP_UTF8, 0, w_string, -1, utf8_string, len, NULL, NULL);
    }
    free(w_string);
    return utf8_string;
}

On Linux:

Linux does not have the two functions above; use mbstowcs and wcstombs instead.

mbstowcs converts a multibyte string to a wide-character string.

wcstombs converts a wide-character string to a multibyte string.

Both functions depend on the encoding of the current locale, so the source and target encodings have to be selected by setting the locale. The function setlocale does that.
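The locale dependence can be seen with a small sketch. The helper below, count_wide_chars (a hypothetical name, not from the original article), asks mbstowcs how many wide characters a byte string would produce under a given locale. It switches only LC_CTYPE, the category that governs multibyte conversion, which is an assumption of this sketch; the article itself uses LC_ALL. It also restores the previous locale afterwards, because setlocale changes process-wide state.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Count the wide characters mbstowcs would produce for `bytes` when
 * LC_CTYPE is `locale_name`. Returns -1 if the locale is unavailable or
 * the bytes are invalid in that locale's encoding.
 * Hypothetical helper for illustration. */
long count_wide_chars(const char *locale_name, const char *bytes)
{
    /* The string returned by setlocale(..., NULL) may be overwritten by
     * later setlocale calls, so keep a private copy. */
    const char *current = setlocale(LC_CTYPE, NULL);
    if (current == NULL)
        return -1;
    char *saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    long result = -1;
    if (setlocale(LC_CTYPE, locale_name) != NULL) {
        /* With a NULL destination, mbstowcs only counts. */
        size_t n = mbstowcs(NULL, bytes, 0);
        if (n != (size_t)-1)
            result = (long)n;
    }

    setlocale(LC_CTYPE, saved);   /* restore the caller's locale */
    free(saved);
    return result;
}
```

On a system where both locales are installed, the two bytes D6 D0 should count as one wide character under a GBK locale and be rejected under a UTF-8 locale, since they are not a valid UTF-8 sequence.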

On Linux, run the command locale -a to list the locales, and with them the encodings, that the system supports.


andy@andy-linux:~$ locale -a
en_AG
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_NG
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZW.utf8
POSIX
zh_CN.gb18030
zh_CN.gbk
zh_CN.utf8
zh_HK.utf8
zh_SG.utf8
zh_TW.utf8

This example converts from zh_CN.gbk to zh_CN.utf8.
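Whether these two locales exist varies from machine to machine, so it can help to check at run time before converting. The sketch below tries a list of candidate names and returns the first one that setlocale accepts; first_available_locale is a hypothetical helper, and the list of names to try is up to the caller.

```c
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return the first entry of `candidates` that setlocale accepts for
 * LC_CTYPE, or NULL if none is accepted. The caller's locale is restored
 * before returning. Hypothetical helper for illustration. */
const char *first_available_locale(const char *const *candidates, size_t count)
{
    /* Copy the current locale name so that it can be restored later. */
    const char *current = setlocale(LC_CTYPE, NULL);
    if (current == NULL)
        return NULL;
    char *saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return NULL;
    strcpy(saved, current);

    const char *found = NULL;
    for (size_t i = 0; i < count && found == NULL; i++) {
        /* setlocale returns NULL when the request cannot be honored. */
        if (setlocale(LC_CTYPE, candidates[i]) != NULL)
            found = candidates[i];
    }

    setlocale(LC_CTYPE, saved);   /* undo the probing */
    free(saved);
    return found;
}
```

A caller could pass names taken from the locale -a listing, such as "zh_CN.gbk", and report a clear error when the result is NULL instead of failing later inside the conversion.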

Procedure:

1. Call setlocale(LC_ALL,"zh_CN.gbk") to declare that the string to be converted is GBK-encoded.

2. Call mbstowcs to convert from the encoding set in step 1 to Unicode.

3. Call setlocale(LC_ALL,"zh_CN.utf8") to set the target encoding to UTF-8.

4. Call wcstombs to convert from Unicode to the encoding set in step 3.

Below is the source code I wrote:


#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/******************************************************************************
* FUNCTION: gbk2utf8
* DESCRIPTION: Converts a GBK-encoded string to UTF-8
* Input: utfStr, buffer for the converted string; srcStr, string to convert;
*        maxUtfStrlen, size of utfStr in bytes
* Output: utfStr
* Returns: -1, fail; >0, success (length of the UTF-8 string in bytes)
* modification history
* --------------------
* 2011-nov-25, lvhongya written
* --------------------
******************************************************************************/
int gbk2utf8(char *utfStr, const char *srcStr, int maxUtfStrlen)
{
    if (NULL == srcStr || NULL == utfStr || maxUtfStrlen <= 0)
    {
        printf("Bad Parameter\n");
        return -1;
    }

    /* First convert the GBK string to Unicode. */
    if (NULL == setlocale(LC_ALL, "zh_CN.gbk")) /* source encoding: GBK */
    {
        printf("Bad Parameter\n");
        return -1;
    }
    size_t unicodeLen = mbstowcs(NULL, srcStr, 0); /* length after conversion */
    if (unicodeLen == (size_t)-1 || unicodeLen == 0)
    {
        printf("Can not Transfer!!!\n");
        return -1;
    }
    wchar_t *unicodeStr = (wchar_t *)calloc(unicodeLen + 1, sizeof(wchar_t));
    if (NULL == unicodeStr)
    {
        printf("Out of memory\n");
        return -1;
    }
    mbstowcs(unicodeStr, srcStr, unicodeLen + 1); /* GBK to Unicode */

    /* Then convert the Unicode string to UTF-8. */
    if (NULL == setlocale(LC_ALL, "zh_CN.utf8")) /* target encoding: UTF-8 */
    {
        printf("Bad Parameter\n");
        free(unicodeStr);
        return -1;
    }
    size_t utfLen = wcstombs(NULL, unicodeStr, 0); /* length after conversion */
    if (utfLen == (size_t)-1 || utfLen == 0)
    {
        printf("Can not Transfer!!!\n");
        free(unicodeStr);
        return -1;
    }
    else if (utfLen >= (size_t)maxUtfStrlen) /* is the destination large enough? */
    {
        printf("Dst Str memory not enough\n");
        free(unicodeStr);
        return -1;
    }
    wcstombs(utfStr, unicodeStr, utfLen);
    utfStr[utfLen] = 0; /* add the terminator */
    free(unicodeStr);
    return (int)utfLen;
}
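For comparison, the same conversion can also be sketched with the POSIX iconv interface. This is a different technique from the setlocale-based method above, not the author's: it names the two encodings directly and leaves the process locale untouched. The function name gbk_to_utf8_iconv is hypothetical, and the sketch assumes the system's iconv implementation accepts the encoding names "GBK" and "UTF-8".

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert the NUL-terminated GBK string `src` to UTF-8 in `dst`, whose
 * capacity is `dst_size` bytes including the terminator. Returns the
 * UTF-8 length in bytes, or -1 on failure.
 * Hypothetical helper; assumes iconv knows "GBK" and "UTF-8". */
int gbk_to_utf8_iconv(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;

    /* Note the argument order: target encoding first, then source. */
    iconv_t cd = iconv_open("UTF-8", "GBK");
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)src;            /* iconv takes char **, input is only read */
    size_t in_left = strlen(src);
    char *out = dst;
    size_t out_left = dst_size - 1;    /* reserve one byte for the terminator */

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;                     /* invalid input or destination too small */

    *out = '\0';
    return (int)(out - dst);
}
```

Because no locale is switched, this variant does not depend on zh_CN.gbk or zh_CN.utf8 being installed, though it does depend on the iconv implementation supporting GBK.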
