Note: whether garbled text appears depends on the concrete request implementation class. As far as has been traced so far, the request is an org.apache.catalina.connector.RequestFacade before RequestDispatcher.forward is called and an org.apache.catalina.core.ApplicationHttpRequest afterwards. The two classes apply different default decoding logic when they parse parameters, and the factors that lead to garbled text also differ with the protocol in use.
For details, see: Tomcat source analysis -- inside ServletRequest.getParameterValues, Request character set & QueryStringEncoding
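How garbled text arises
Take the Chinese character “中”. Encoded as UTF-8 it becomes three bytes, percent-encoded as %E4%B8%AD, and those three bytes are submitted to Tomcat by GET or POST. If you do not tell Tomcat that the parameter is UTF-8 encoded, Tomcat assumes it is ISO-8859-1. ISO-8859-1 is a single-byte encoding that is compatible with ASCII (and therefore with US-ASCII, the standard character set of URIs) and that assigns a character to every value a single byte can hold. Tomcat therefore treats what you sent as three ISO-8859-1 characters and decodes them as ä, ¸ and a soft hyphen (U+00AD), which typically displays as something like ä¸-. Inside the JVM this decoded string is held as Unicode, whereas HTTP transport and database storage deal in bytes. Depending on what each endpoint needs, you can encode that three-character Unicode string as UTF-8 and store the resulting bytes in the database (these are the UTF-8 bytes of the three wrong characters), or you can take the three ISO-8859-1 bytes behind those characters and decode them as UTF-8 to recover the Unicode character “中”. This works because a byte stream in any other encoding can be treated as ISO-8859-1 without losing a byte. The result is then sent to the client through the response, and the bytes actually transmitted depend on the Content-Type you set.
Summary: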
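- 1. HTTP GET and POST transmit bytes, and a database stores bytes too (500 MB of space means 500 M bytes).
- 2. Garbled text appears when the character set (scheme) used for decoding differs from the one used for encoding: the same bytes can map to different characters under different encodings, and some byte sequences have no mapping at all in a given encoding (which is where the ? in garbled text comes from).
- 3. A decoded string exists in the JVM as Unicode.
- 4. If the Unicode characters in the JVM are the ones you expect (the encoding and decoding character sets were the same or compatible), there is no problem. If they are not, as in the example above where the JVM holds three unexpected Unicode characters, you can take the three ISO-8859-1 bytes behind those characters and decode them as UTF-8 to produce the intended Unicode character: “中”.
- 5. ISO-8859-1 is an ASCII-compatible single-byte encoding that uses every value a byte can hold, so a system that supports ISO-8859-1 can transmit and store a byte stream in any other encoding without discarding anything. In other words, any byte stream can safely be treated as ISO-8859-1.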
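The code below shows that encoding with different character sets gives different results, and that the output is garbled whenever the encoder and decoder disagree or the character does not exist in the target character set (as with a Chinese character in ISO-8859-1).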
- import java.io.UnsupportedEncodingException;
- import java.math.BigInteger;
- import java.net.URLDecoder;
- import java.net.URLEncoder;
- import java.util.Arrays;
-
- public class EncodingDemo {
-     public static void main(String[] args) {
-         try {
-             // The UTF-8 bytes of "中" are e4 b8 ad (URL-encoded: %E4%B8%AD).
-             byte[] utf8Bytes = { (byte) 0xe4, (byte) 0xb8, (byte) 0xad };
-
-             // Decoded as UTF-8: 中
-             String item = new String(utf8Bytes, "UTF-8");
-             System.out.println(item);
-
-             // Decoded as ISO-8859-1: three characters, ä ¸ and a soft hyphen (U+00AD)
-             item = new String(utf8Bytes, "ISO-8859-1");
-             System.out.println(item);
-
-             // 253 is 0xFD; BigInteger adds a leading sign byte, so this prints [0, -3]
-             System.out.println(Arrays.toString(new BigInteger("253").toByteArray()));
-             // Prints 11111101
-             System.out.println(Integer.toBinaryString(253));
-
-             // Get the original bytes back via ISO-8859-1, then decode as UTF-8: 中
-             item = new String(item.getBytes("ISO-8859-1"), "UTF-8");
-             System.out.println(item);
-             // UTF-8 bytes decoded as ISO-8859-1 again: back to the garbled form
-             item = new String(item.getBytes("UTF-8"), "ISO-8859-1");
-             System.out.println(item);
-
-             // "中" URL-encoded with UTF-8 is %E4%B8%AD (3 bytes)
-             System.out.println(URLEncoder.encode("中", "UTF-8"));
-             // "中" URL-encoded with ISO-8859-1 is %3F (1 byte): the character does not
-             // exist in ISO-8859-1, so the encoding of "?" is returned instead
-             System.out.println(URLEncoder.encode("中", "ISO-8859-1"));
-             // "中" URL-encoded with GB2312 is %D6%D0 (2 bytes)
-             System.out.println(URLEncoder.encode("中", "GB2312"));
-
-             // UTF-8 encoded %E4%B8%AD decoded with UTF-8: the expected 中
-             System.out.println(URLDecoder.decode("%E4%B8%AD", "UTF-8"));
-             // ISO-8859-1 encoded %3F decoded with ISO-8859-1: ?
-             System.out.println(URLDecoder.decode("%3F", "ISO-8859-1"));
-             // GB2312 encoded %D6%D0 decoded with GB2312: the expected 中
-             System.out.println(URLDecoder.decode("%D6%D0", "GB2312"));
-             // UTF-8 encoded %E4%B8%AD decoded with ISO-8859-1: the "garbled" text,
-             // which is simply the ISO-8859-1 character for each of the 3 bytes
-             // (ISO-8859-1 uses every value a single byte can hold)
-             System.out.println(URLDecoder.decode("%E4%B8%AD", "ISO-8859-1"));
-             // UTF-8 encoded %E4%B8%AD decoded with GB2312: the first two bytes
-             // %E4%B8 are the GB2312 character 涓, and the third byte %AD is not a
-             // complete GB2312 character, so a replacement character (often shown
-             // as ?) follows
-             System.out.println(URLDecoder.decode("%E4%B8%AD", "GB2312"));
-         } catch (UnsupportedEncodingException e) {
-             // Not expected for the charset names used above
-             e.printStackTrace();
-         }
-     }
- }
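Point 5 of the summary (ISO-8859-1 never discards a byte) can be checked directly. The following is a minimal sketch, not part of the original article: the class name RoundTripCheck and the helpers survivesRoundTrip and repair are hypothetical names chosen for illustration. It uses java.nio.charset.StandardCharsets, which avoids the checked UnsupportedEncodingException.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripCheck {

    /** Decodes the bytes with the charset, re-encodes, and reports whether the bytes survived. */
    public static boolean survivesRoundTrip(byte[] bytes, Charset charset) {
        String decoded = new String(bytes, charset);
        return Arrays.equals(bytes, decoded.getBytes(charset));
    }

    /** Repairs a string that was decoded as ISO-8859-1 although its bytes were really UTF-8. */
    public static String repair(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Every possible byte value, 0x00 to 0xFF.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }

        // ISO-8859-1 maps each byte to exactly one character, so nothing is lost.
        System.out.println(survivesRoundTrip(all, StandardCharsets.ISO_8859_1)); // true
        // Lone bytes above 0x7F are malformed UTF-8 and get replaced, so bytes are lost.
        System.out.println(survivesRoundTrip(all, StandardCharsets.UTF_8));      // false

        // The UTF-8 bytes of U+4E2D, wrongly decoded as ISO-8859-1 and then repaired.
        byte[] utf8 = { (byte) 0xe4, (byte) 0xb8, (byte) 0xad };
        String misdecoded = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(repair(misdecoded).equals("\u4e2d"));                 // true
    }
}
```

The repair only works because the wrong decoding was ISO-8859-1; had the bytes been decoded with a charset that replaces unmappable input, the original bytes would already be gone.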
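Tomcat's default encoding settings and the relevant standards:
For GET requests, the "URI Syntax" specification requires HTTP query strings (also known as GET parameters) to use US-ASCII; any character outside that range must be percent-encoded, in a form such as %61. Because ISO-8859-1 and ASCII agree on the characters in the range 0x20 to 0x7E, most web containers, Tomcat among them, decode the %xx bytes in a URI as ISO-8859-1 by default (this is the default in Tomcat 7 and earlier; Tomcat 8 and later default to UTF-8). The URIEncoding attribute of the Connector changes the character set used to decode those %xx bytes. URIEncoding must match the encoding the client used when it encoded the query string of the GET request. Alternatively, setting useBodyEncodingForURI="true" on the Connector makes Tomcat decode the query string with the encoding given in Content-Type.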
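To make the role of the decoding character set concrete, here is a minimal sketch of percent-decoding followed by charset decoding. It is not Tomcat's implementation; the class PercentDecodeSketch and its decode method are hypothetical, and malformed escapes are not handled gracefully.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PercentDecodeSketch {

    /** Turns %xx escapes back into raw bytes, then decodes those bytes with the given charset. */
    public static String decode(String encoded, Charset charset) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%' && i + 2 < encoded.length()) {
                // Two hex digits give one byte; invalid digits throw NumberFormatException.
                raw.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2;
            } else if (c == '+') {
                // Form encoding uses '+' for a space.
                raw.write(' ');
            } else {
                // Unescaped characters are expected to be plain US-ASCII.
                raw.write(c);
            }
        }
        // The charset only matters here, after the escapes have become bytes.
        return new String(raw.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String value = "%E4%B8%AD";
        // Matches the encoding the client used: one character, U+4E2D.
        System.out.println(decode(value, StandardCharsets.UTF_8).length());      // 1
        // Mismatch: three ISO-8859-1 characters, one per byte.
        System.out.println(decode(value, StandardCharsets.ISO_8859_1).length()); // 3
    }
}
```

The escapes themselves carry no charset information, which is why the container needs a setting such as URIEncoding to know how to interpret the resulting bytes.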
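A POST request should state its encoding itself through the Content-Type header. Because many clients do not set an explicit encoding, Tomcat falls back to ISO-8859-1. Note the difference between the character set used to decode the URI, the request character set, and the response character set! Different Request implementations relate these three encodings differently.
For POST requests, ISO-8859-1 is the default encoding that the Servlet specification defines for HTTP requests and responses. If the character set of a request or response has not been set, the Servlet specification prescribes ISO-8859-1. The encoding of a request or response is specified through the Content-Type header.
If a GET or POST request does not set an encoding through Content-Type, Tomcat defaults to ISO-8859-1. SetCharacterEncodingFilter can change the default request encoding (encoding: the encoding to use; ignore: when true, the encoding is set regardless of whether the client specified one; when false, it is set only if the client did not specify one; the default is true).
Note: this filter should generally be placed ahead of all other filters (before Servlet 3.0 the order follows the order of the filter-mapping elements in web.xml; from Servlet 3.0 on, the order can be specified explicitly), because setting the encoding has no effect once a value has been read from the request. On the first read, Tomcat decodes the query string or the POSTed variables with the encoding in force at that moment and stores them in a parameters array; every later lookup reads the value straight from that array.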
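The ordering rule above follows from parse-once caching. The sketch below imitates that behaviour with a hypothetical ToyRequest class; it is not a Tomcat or Servlet API class, and it holds a single parameter value only to keep the example short.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ParseOnceSketch {

    /** Hypothetical stand-in for a request object; decodes its parameter once and caches it. */
    public static class ToyRequest {
        private final byte[] rawValue;
        private Charset encoding = StandardCharsets.ISO_8859_1; // assumed container default
        private String parsedValue;                             // filled on the first read

        public ToyRequest(byte[] rawValue) {
            this.rawValue = rawValue;
        }

        public void setCharacterEncoding(Charset encoding) {
            this.encoding = encoding;
        }

        public String getParameter() {
            if (parsedValue == null) {
                // Decoding happens only here, with whatever encoding is set right now.
                parsedValue = new String(rawValue, encoding);
            }
            return parsedValue;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = { (byte) 0xe4, (byte) 0xb8, (byte) 0xad };

        // Encoding set before the first read: the value is decoded correctly.
        ToyRequest early = new ToyRequest(utf8);
        early.setCharacterEncoding(StandardCharsets.UTF_8);
        System.out.println(early.getParameter().equals("\u4e2d")); // true

        // Something read the parameter first: the later call changes nothing.
        ToyRequest late = new ToyRequest(utf8);
        late.getParameter();
        late.setCharacterEncoding(StandardCharsets.UTF_8);
        System.out.println(late.getParameter().equals("\u4e2d"));  // false
    }
}
```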
Recommended steps for using UTF-8 everywhere:
- 1. Set URIEncoding="UTF-8" on your <Connector> in server.xml, so that Tomcat decodes HTTP GET requests as UTF-8.
- 2. Use a character encoding filter with the default encoding set to UTF-8. Many requests do not specify an encoding themselves, and Tomcat then uses ISO-8859-1 as the encoding of the HttpServletRequest; the filter changes that.
- 3. Change all your JSPs to include charset name in their contentType. For example, use <%@page contentType="text/html; charset=UTF-8" %> for the usual JSP pages and <jsp:directive.page contentType="text/html; charset=UTF-8" /> for the pages in XML syntax (aka JSP Documents). This specifies the encoding the JSP page uses.
- 4. Change all your servlets to set the content type for responses and to include charset name in the content type to be UTF-8. Use response.setContentType("text/html; charset=UTF-8") or response.setCharacterEncoding("UTF-8"). This sets the encoding of the response.
- 5. Change any content-generation libraries you use (Velocity, Freemarker, etc.) to use UTF-8 and to specify UTF-8 in the content type of the responses that they generate. This specifies the encoding every template engine uses.
- 6. Disable any valves or filters that may read request parameters before your character encoding filter or jsp page has a chance to set the encoding to UTF-8. SetCharacterEncodingFilter should generally come first; otherwise it may have no effect.
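As a hedged illustration of step 1, the fragment below shows where the attribute goes. The port, protocol, connectionTimeout and redirectPort values are assumptions modelled on a typical default server.xml, not taken from this article; only URIEncoding is the point.

```xml
<!-- server.xml: every attribute except URIEncoding is an assumed default -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8" />
```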
- /*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- * http://www./licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
- package filters;
-
-
- import java.io.IOException;
- import javax.servlet.Filter;
- import javax.servlet.FilterChain;
- import javax.servlet.FilterConfig;
- import javax.servlet.ServletException;
- import javax.servlet.ServletRequest;
- import javax.servlet.ServletResponse;
-
-
- /**
- * <p>Example filter that sets the character encoding to be used in parsing the
- * incoming request, either unconditionally or only if the client did not
- * specify a character encoding. Configuration of this filter is based on
- * the following initialization parameters:</p>
- * <ul>
- * <li><strong>encoding</strong> - The character encoding to be configured
- * for this request, either conditionally or unconditionally based on
- * the <code>ignore</code> initialization parameter. This parameter
- * is required, so there is no default.</li>
- * <li><strong>ignore</strong> - If set to "true", any character encoding
- * specified by the client is ignored, and the value returned by the
- * <code>selectEncoding()</code> method is set. If set to "false",
- * <code>selectEncoding()</code> is called <strong>only</strong> if the
- * client has not already specified an encoding. By default, this
- * parameter is set to "true".</li>
- * </ul>
- *
- * <p>Although this filter can be used unchanged, it is also easy to
- * subclass it and make the <code>selectEncoding()</code> method more
- * intelligent about what encoding to choose, based on characteristics of
- * the incoming request (such as the values of the <code>Accept-Language</code>
- * and <code>User-Agent</code> headers, or a value stashed in the current
- * user's session).</p>
- *
- * @author Craig McClanahan
- * @version $Id: SetCharacterEncodingFilter.java 939521 2010-04-30 00:16:33Z kkolinko $
- */
-
- public class SetCharacterEncodingFilter implements Filter {
-
-     // ----------------------------------------------------- Instance Variables
-
-     /**
-      * The default character encoding to set for requests that pass through
-      * this filter.
-      */
-     protected String encoding = null;
-
-     /**
-      * The filter configuration object we are associated with. If this value
-      * is null, this filter instance is not currently configured.
-      */
-     protected FilterConfig filterConfig = null;
-
-     /**
-      * Should a character encoding specified by the client be ignored?
-      */
-     protected boolean ignore = true;
-
-     // --------------------------------------------------------- Public Methods
-
-     /**
-      * Take this filter out of service.
-      */
-     public void destroy() {
-
-         this.encoding = null;
-         this.filterConfig = null;
-
-     }
-
-     /**
-      * Select and set (if specified) the character encoding to be used to
-      * interpret request parameters for this request.
-      *
-      * @param request The servlet request we are processing
-      * @param response The servlet response we are creating
-      * @param chain The filter chain we are processing
-      *
-      * @exception IOException if an input/output error occurs
-      * @exception ServletException if a servlet error occurs
-      */
-     public void doFilter(ServletRequest request, ServletResponse response,
-                          FilterChain chain)
-         throws IOException, ServletException {
-
-         // Conditionally select and set the character encoding to be used
-         if (ignore || (request.getCharacterEncoding() == null)) {
-             String encoding = selectEncoding(request);
-             if (encoding != null)
-                 request.setCharacterEncoding(encoding);
-         }
-
-         // Pass control on to the next filter
-         chain.doFilter(request, response);
-
-     }
-
-     /**
-      * Place this filter into service.
-      *
-      * @param filterConfig The filter configuration object
-      */
-     public void init(FilterConfig filterConfig) throws ServletException {
-
-         this.filterConfig = filterConfig;
-         this.encoding = filterConfig.getInitParameter("encoding");
-         String value = filterConfig.getInitParameter("ignore");
-         if (value == null)
-             this.ignore = true;
-         else if (value.equalsIgnoreCase("true"))
-             this.ignore = true;
-         else if (value.equalsIgnoreCase("yes"))
-             this.ignore = true;
-         else
-             this.ignore = false;
-
-     }
-
-     // ------------------------------------------------------ Protected Methods
-
-     /**
-      * Select an appropriate character encoding to be used, based on the
-      * characteristics of the current request and/or filter initialization
-      * parameters. If no character encoding should be set, return
-      * <code>null</code>.
-      * <p>
-      * The default implementation unconditionally returns the value configured
-      * by the <strong>encoding</strong> initialization parameter for this
-      * filter.
-      *
-      * @param request The servlet request we are processing
-      */
-     protected String selectEncoding(ServletRequest request) {
-
-         return (this.encoding);
-
-     }
-
- }
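A filter like the one above still has to be registered. The web.xml fragment below is a sketch under stated assumptions: the filter-name and the /* url-pattern are illustrative choices, the filter-class follows the package filters declaration in the source above, and the two init-param names are the ones the filter's init method reads. Placing this mapping ahead of the other filter-mapping elements keeps the filter first in the chain.

```xml
<!-- web.xml: filter-name and url-pattern are assumptions for illustration -->
<filter>
    <filter-name>SetCharacterEncodingFilter</filter-name>
    <filter-class>filters.SetCharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
        <param-name>ignore</param-name>
        <param-value>true</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>SetCharacterEncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
```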
References: Tomcat wiki FAQ, Character Encoding Issues
Apache Tomcat Configuration Reference - The HTTP Connector