import urllib2
from BeautifulSoup import BeautifulSoup

# Fetch the raw page (BeautifulSoup 3 accepts a file-like object)
page = urllib2.urlopen('http://www.')
# Decode as gb18030, a superset of gbk and gb2312
soup = BeautifulSoup(page, fromEncoding="gb18030")

print soup.originalEncoding  # the encoding BeautifulSoup actually used
print soup.prettify()
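If a Chinese page is encoded as gb2312 or gbk, passing fromEncoding="gb18030" to the BeautifulSoup constructor fixes the garbled text. This works because GB18030 is a superset of GBK, which is in turn a superset of GB2312. A gb18030 decoder therefore accepts every character those encodings contain, including GBK-only characters that pages declared as gb2312 sometimes use anyway. Do not rely on gb18030 for pages that are actually UTF-8, though. UTF-8 byte sequences often also form valid GB18030 byte pairs, so the page can decode without any error and still come out as the wrong characters.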
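The superset relationship and the UTF-8 pitfall can be checked with Python 3's built-in codecs, without BeautifulSoup. This is a minimal sketch, not BeautifulSoup's own decoding logic. `try_decode` and `pick_encoding` are hypothetical helpers written for illustration, and the "upgrade GB-family labels to gb18030" rule is an assumed policy. The character 镕 is used because it is a well-known GBK character that GB2312 lacks.

```python
# Minimal sketch using only Python 3's built-in codecs (no BeautifulSoup).
# try_decode and pick_encoding are hypothetical helpers for illustration.

GB_FAMILY = {"gb2312", "gbk", "gb18030"}


def try_decode(raw, encoding):
    """Decode strictly; return None if the bytes are invalid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


def pick_encoding(declared):
    """Assumed policy: treat any GB-family label as gb18030, keep others."""
    name = declared.strip().lower()
    return "gb18030" if name in GB_FAMILY else name


# 镕 exists in GBK (and therefore in GB18030) but not in GB2312.
gbk_bytes = "朱镕基".encode("gbk")
as_gb2312 = try_decode(gbk_bytes, "gb2312")    # None: gb2312 rejects 镕
as_gb18030 = try_decode(gbk_bytes, "gb18030")  # decodes back to 朱镕基

# UTF-8 bytes read as gb18030 raise no error, but the text is wrong.
utf8_bytes = "你好".encode("utf-8")
misread = try_decode(utf8_bytes, "gb18030")    # garbled, not 你好

print(as_gb2312, as_gb18030, misread)
print(pick_encoding("GB2312"), pick_encoding("utf-8"))
```

A policy like `pick_encoding` is one way to follow the gb18030 advice only when the page declares a GB-family charset. Its result could then serve as the fromEncoding hint, so UTF-8 pages are never forced through gb18030.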