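When a Java program fetches a page from a URL, Chinese text in the output sometimes comes out garbled. The main cause is a character encoding mismatch. Most web pages are stored as UTF-8, and a fetched page arrives locally as a stream of UTF-8 bytes, so those bytes have to be decoded back into text as UTF-8. If no charset is specified (new String(byte[]) then falls back to the platform default charset), or the bytes are decoded with some other charset, the Chinese text comes out garbled.

Here is the code I originally wrote:

    import java.io.IOException;

    import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.HttpException;
    import org.apache.commons.httpclient.HttpStatus;
    import org.apache.commons.httpclient.methods.GetMethod;
    import org.apache.commons.httpclient.params.HttpMethodParams;

    // Fetch the source of a web page
    public String getdata(String url) {
        String data = null;
        HttpClient client = new HttpClient();
        GetMethod getMethod = new GetMethod(url);
        getMethod.setRequestHeader("User-Agent",
                "Mozilla/5.0(Windows NT 6.1;Win64;x64;rv:39.0) Gecko/20100101 Firefox/39.0");
        // Use the library's default retry strategy
        getMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
                new DefaultHttpMethodRetryHandler());
        try {
            int statusCode = client.executeMethod(getMethod);
            if (statusCode != HttpStatus.SC_OK) {
                System.out.println("Wrong");
            }
            byte[] responseBody = getMethod.getResponseBody();
            // Problem line: no charset given, so the platform default is used
            data = new String(responseBody);
            return data;
        } catch (HttpException e) {
            System.out.println("Please check your provided http address!");
            data = "";
            e.printStackTrace();
        } catch (IOException e) {
            data = "";
            e.printStackTrace();
        } finally {
            getMethod.releaseConnection();
        }
        return data;
    }

Note the line data = new String(responseBody); which I had marked in red. Written this way, all Chinese text is garbled when the program runs. The printed output looked like this:

[Screenshot: console output with garbled Chinese text]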
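The mismatch can be reproduced without any network access. The following is a minimal sketch, not the code from this post: the class name EncodingMismatchDemo and the helper decode are made up for illustration, and ISO-8859-1 is only an assumed stand-in for whatever non-UTF-8 default charset a given machine might have.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Hypothetical helper: decode raw bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the two-character string meaning "Chinese text".
        String original = "\u4e2d\u6587";

        // This is what travels over the network: the UTF-8 bytes of the text.
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the same charset restores the text.
        String right = decode(wire, StandardCharsets.UTF_8);

        // Decoding with a different charset (assumed stand-in: ISO-8859-1)
        // turns every byte into its own unrelated character.
        String wrong = decode(wire, StandardCharsets.ISO_8859_1);

        // new String(bytes) with no charset uses this value.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8 round trip matches: " + original.equals(right));
        System.out.println("ISO-8859-1 decode matches: " + original.equals(wrong));
    }
}
```

Each of the two Chinese characters occupies three bytes in UTF-8, so the wrongly decoded string has six characters instead of two, which is the kind of garbage that shows up in the console.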
The fix is to decode with UTF-8 explicitly: String data = new String(responseBody, "utf8"); With that change the Chinese text displays correctly. The complete code follows; note the changed line, which I had marked in red:

    // Fetch the source of a web page
    public String getdata(String url) {
        String data = null;
        HttpClient client = new HttpClient();
        GetMethod getMethod = new GetMethod(url);
        getMethod.setRequestHeader("User-Agent",
                "Mozilla/5.0(Windows NT 6.1;Win64;x64;rv:39.0) Gecko/20100101 Firefox/39.0");
        // Use the library's default retry strategy
        getMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
                new DefaultHttpMethodRetryHandler());
        try {
            int statusCode = client.executeMethod(getMethod);
            if (statusCode != HttpStatus.SC_OK) {
                System.out.println("Wrong");
            }
            byte[] responseBody = getMethod.getResponseBody();
            // Changed line: decode the bytes explicitly as UTF-8
            data = new String(responseBody, "utf8");
            return data;
        } catch (HttpException e) {
            System.out.println("Please check your provided http address!");
            data = "";
            e.printStackTrace();
        } catch (IOException e) {
            data = "";
            e.printStackTrace();
        } finally {
            getMethod.releaseConnection();
        }
        return data;
    }

After running the code, the printed output looked like this:

[Screenshot: console output with Chinese text displayed correctly]

Problem solved.
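Hard-coding UTF-8 works for most pages, but not for a page served in another encoding such as GBK. One possible refinement, sketched here under assumptions rather than taken from this post, is to read the charset from the response's Content-Type header and fall back to UTF-8 when none is usable. The class ResponseCharset and its methods are hypothetical names, and the header value is passed in as a plain string so the sketch stays independent of any HTTP library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ResponseCharset {

    // Hypothetical helper: pick a charset from a Content-Type header value
    // such as "text/html; charset=GBK". Falls back to UTF-8 when the header
    // is missing, has no charset parameter, or names an unknown charset.
    static Charset fromContentType(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length())
                            .replace("\"", "")
                            .trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Illegal or unsupported charset name: use the fallback.
                        break;
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decode a response body using the charset announced in the header.
    static String decode(byte[] body, String contentType) {
        return new String(body, fromContentType(contentType));
    }

    public static void main(String[] args) {
        byte[] body = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        System.out.println(fromContentType("text/html; charset=UTF-8"));
        System.out.println(decode(body, "text/html").equals("\u4e2d\u6587"));
    }
}
```

In getdata, the string passed as contentType would be the value of the response's Content-Type header. Pages that declare their encoding only in an HTML meta tag are not covered by this sketch.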