Multilingual Display in Servlets and JSP
Published: 2024-01-21 Source: compiled by 明輝站
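I never believed that Java could really have a bug that prevents text in several languages from being displayed together, so I spent this weekend studying multilingual display in Servlets and JSP, that is, the multiple-character-set problem in Servlets. My grasp of character sets is still not very firm, so what I write here may not be entirely accurate. This is how I understand character sets in Java: at run time, every String object stores its text in Unicode. (I believe every language has some internal encoding, because inside the computer a string is always represented in an internal code; in most languages that encoding is platform dependent, whereas Java uses platform-independent Unicode.)
When Java reads a string from a byte stream, it converts the platform-dependent bytes into a platform-independent Unicode string. On output, Java converts the Unicode string back into a platform-dependent byte stream; if a Unicode character does not exist in the target character set, a '?' is written instead. For example, on Chinese Windows, when Java reads a GB2312-encoded file (or any stream) into memory to build a String, the GB2312 text is converted to Unicode; when that string is written out again, it is converted back to a GB2312 byte stream or array: "中文测试" -----> "\u4e2d\u6587\u6d4b\u8bd5" -----> "中文测试".
Consider the following example: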
// GBK-encoded bytes of "中文测试"
byte[] bytes = new byte[]{(byte)0xd6, (byte)0xd0, (byte)0xce, (byte)0xc4, (byte)0xb2, (byte)0xe2, (byte)0xca, (byte)0xd4};
java.io.ByteArrayInputStream bin = new java.io.ByteArrayInputStream(bytes);
// The reader decodes the GBK bytes into a Unicode string
java.io.BufferedReader reader = new java.io.BufferedReader(new java.io.InputStreamReader(bin, "GBK"));
String msg = reader.readLine();
// Printing re-encodes the string with the platform's default character set
System.out.println(msg);
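Run on a system whose character set contains the four characters "中文测试" (a Chinese system, for instance), this program prints them correctly. The string msg holds the correct Unicode for "中文测试", namely "\u4e2d\u6587\u6d4b\u8bd5"; when printed, it is converted to the operating system's default character set. Whether it displays correctly therefore depends on that character set: only on a system that supports the required characters is the message output correctly; otherwise we get garbage.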
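The round trip described above, and the '?' substitution for characters the target character set lacks, can be sketched as follows. The class and method names are hypothetical, and the sketch assumes a JDK that ships the GBK charset (full JDK installations normally do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetRoundTrip {
    // GBK bytes of the four characters used in the article's example.
    static final byte[] GBK_BYTES = {
        (byte) 0xd6, (byte) 0xd0, (byte) 0xce, (byte) 0xc4,
        (byte) 0xb2, (byte) 0xe2, (byte) 0xca, (byte) 0xd4
    };

    // Decode: charset-specific bytes -> charset-independent Java string.
    static String decodeGbk(byte[] bytes) {
        return new String(bytes, Charset.forName("GBK"));
    }

    // Encode into the target charset and decode again, to see what survives.
    static String encodeThenDecode(String text, Charset target) {
        byte[] out = text.getBytes(target);
        return new String(out, target);
    }

    public static void main(String[] args) {
        String msg = decodeGbk(GBK_BYTES);
        // true: the decoded string holds the expected Unicode characters
        System.out.println(msg.equals("\u4e2d\u6587\u6d4b\u8bd5"));
        // true: UTF-8 can represent every character, so nothing is lost
        System.out.println(encodeThenDecode(msg, StandardCharsets.UTF_8).equals(msg));
        // ????: ISO-8859-1 has no Chinese characters, each becomes '?'
        System.out.println(encodeThenDecode(msg, StandardCharsets.ISO_8859_1));
    }
}
```

The last line shows why the output encoding matters: the string in memory is intact, but the information is lost at the moment it is encoded into a character set that cannot represent it.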
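Now to the main topic: multilingual text in Servlets and JSP. Our goal is that a client in any country can send information to the server through a form, the server stores it in a database, and the client later retrieves exactly what it sent. In practice this means three things. First, the SQL statements built on the server must contain the correct Unicode for the text the client sent. Second, the encoding JDBC uses to talk to the database must be able to represent that text; ideally JDBC talks to the database in Unicode/UTF-8 directly, which ensures that nothing is lost. Third, the server must send its response to the client in an encoding that loses nothing, which can again be Unicode/UTF-8.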
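If the form's enctype attribute is not specified, the browser URL-encodes the input according to the character set of the current page before submitting it, and the server receives a URL-encoded string. The result depends on the page encoding: a GB2312 page submitting "中文测试" produces "%D6%D0%CE%C4%B2%E2%CA%D4", where each "%" is followed by two hexadecimal digits giving one byte, while a UTF-8 page produces "%E4%B8%AD%E6%96%87%E6%B5%8B%E8%AF%95". The difference arises because a Chinese character takes 16 bits in GB2312 but 24 bits in UTF-8. IE4 and later browsers in China, Japan, and Korea all support UTF-8, and UTF-8 certainly covers all three languages, so if our HTML pages use UTF-8 we can support at least these three languages.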
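The two encoded forms quoted above can be reproduced with the JDK's java.net.URLEncoder, which applies the same rules a browser uses for form submission. This is only an illustrative sketch; the class name UrlEncodingDemo is made up, and it assumes the GB2312 charset is available in the running JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncodingDemo {
    // URL-encodes text after converting it to bytes in the named charset.
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(charsetName + " is not available", e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587\u6d4b\u8bd5";
        // Two bytes per character: %D6%D0%CE%C4%B2%E2%CA%D4
        System.out.println(encode(text, "GB2312"));
        // Three bytes per character: %E4%B8%AD%E6%96%87%E6%B5%8B%E8%AF%95
        System.out.println(encode(text, "UTF-8"));
    }
}
```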
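However, when our HTML/JSP pages use UTF-8, the application server may not know it. If the browser's request carries no charset information, the most the server can read is the Accept-Language request header, and that header alone does not tell us which encoding the browser used. The application server therefore cannot reliably parse the submitted content. Why? All strings in Java are 16-bit Unicode, and the job of HttpServletRequest.getParameter(String) is to turn the URL-encoded data submitted by the client into a Unicode string. Some servers simply assume that the client's encoding matches the server platform's and decode with URLDecoder.decode(String). If the client's encoding happens to match the server's, the resulting string is correct; otherwise, any submitted string that contains local (non-ASCII) characters turns into garbage.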
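To make the failure concrete, the sketch below decodes the same UTF-8 form data twice: once with the right charset and once with ISO-8859-1, standing in for a server that guesses wrong. The class name and the choice of ISO-8859-1 as the wrong guess are assumptions for illustration only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class WrongCharsetDecode {
    // What a UTF-8 page submits for the four-character test string.
    static final String SUBMITTED = "%E4%B8%AD%E6%96%87%E6%B5%8B%E8%AF%95";

    // Undoes the URL encoding and interprets the bytes in the named charset.
    static String decode(String urlEncoded, String charsetName) {
        try {
            return URLDecoder.decode(urlEncoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(charsetName + " is not available", e);
        }
    }

    public static void main(String[] args) {
        String right = decode(SUBMITTED, "UTF-8");
        String wrong = decode(SUBMITTED, "ISO-8859-1");
        System.out.println(right.length());      // 4: the original characters
        System.out.println(wrong.length());      // 12: one junk char per byte
        System.out.println(right.equals(wrong)); // false
    }
}
```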
The solution I propose fixes the encoding as UTF-8, so this problem can be avoided. We can write our own decode method:
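public static String decode(String s, String encoding) throws Exception {
    // Step 1: undo the URL encoding; each %XX escape becomes one char in the range 0-255
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        switch (c) {
            case '+':
                sb.append(' ');
                break;
            case '%':
                // Reject a truncated escape such as "%E" at the end of the input
                if (i + 2 >= s.length()) throw new IllegalArgumentException();
                try {
                    sb.append((char) Integer.parseInt(s.substring(i + 1, i + 3), 16));
                } catch (NumberFormatException e) {
                    throw new IllegalArgumentException();
                }
                i += 2;
                break;
            default:
                sb.append(c);
                break;
        }
    }
    // Step 2: undo conversion to external encoding. ISO-8859-1 maps each
    // char 0-255 back to the same byte; the bytes are then decoded properly.
    String result = sb.toString();
    byte[] inputBytes = result.getBytes("8859_1");
    return new String(inputBytes, encoding);
}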
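This method takes the encoding as a parameter; passing UTF8 gives exactly what we need. For example, decoding "%E4%B8%AD%E6%96%87%E6%B5%8B%E8%AF%95" with it yields the correct Unicode string for the Chinese text "中文测试".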
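The final step of the decode method relies on ISO-8859-1 mapping every byte value 0-255 to the char with the same number, so a string that holds one byte per char can be turned back into the original bytes and decoded again. A minimal, hypothetical sketch of just that step (the names Latin1Bridge and redecode are not from the article):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Latin1Bridge {
    // Recovers the raw bytes from a one-byte-per-char string, then decodes
    // them with the charset the client actually used.
    static String redecode(String oneCharPerByte, Charset actual) {
        byte[] raw = oneCharPerByte.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587\u6d4b\u8bd5";
        // Simulate the intermediate string built from the %XX escapes.
        String intermediate = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(intermediate.length()); // 12
        System.out.println(
                redecode(intermediate, StandardCharsets.UTF_8).equals(original)); // true
    }
}
```

On Java 1.4 and later, java.net.URLDecoder.decode(String, String) performs both steps in one call, so a hand-written decoder is mainly useful on older platforms or for learning how the conversion works.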
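The remaining problem is that we must get hold of the URL-encoded string the client submitted. For a form submitted with method GET, it can be read with HttpServletRequest.getQueryString(). For a form submitted with POST, it can only be read from the ServletInputStream. In fact, the first call to the standard getParameter method reads the submitted form data, and the ServletInputStream cannot be read twice. We therefore have to read and parse the form data before getParameter is called for the first time.
This is what I did: create a Servlet base class that overrides the service method and reads and parses the submitted form content before calling the parent's service method. See the source code below: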
package com.hto.servlet;

import javax.servlet.http.HttpServletRequest;
import java.util.*;

/**
 * Reads the URL-encoded form data of a request (query string and POST body)
 * and decodes it with an explicit character encoding, UTF-8 by default.
 * Creation date: (2001-2-4 15:43:46)
 * @author: 錢衛春
 */
public class UTF8ParameterReader {
    // Maps each parameter name to the ArrayList of its values.
    Hashtable pairs = new Hashtable();

    /**
     * Parses the request's parameters as UTF-8.
     */
    public UTF8ParameterReader(HttpServletRequest request) throws java.io.IOException {
        this(request, "UTF8");
    }

    /**
     * Parses the request's parameters using the given encoding.
     */
    public UTF8ParameterReader(HttpServletRequest request, String encoding) throws java.io.IOException {
        super();
        parse(request.getQueryString(), encoding);
        parse(request.getReader().readLine(), encoding);
    }
    /**
     * Decodes a URL-encoded string as UTF-8.
     */
    public static String decode(String s) throws Exception {
        return decode(s, "UTF8");
    }
    /**
     * Decodes a URL-encoded string using the given encoding.
     */
    public static String decode(String s, String encoding) throws Exception {
        // Undo the URL encoding; each %XX escape becomes one char in the range 0-255
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '+':
                    sb.append(' ');
                    break;
                case '%':
                    // Reject a truncated escape such as "%E" at the end of the input
                    if (i + 2 >= s.length()) throw new IllegalArgumentException();
                    try {
                        sb.append((char) Integer.parseInt(s.substring(i + 1, i + 3), 16));
                    } catch (NumberFormatException e) {
                        throw new IllegalArgumentException();
                    }
                    i += 2;
                    break;
                default:
                    sb.append(c);
                    break;
            }
        }
        // Undo conversion to external encoding
        String result = sb.toString();
        byte[] inputBytes = result.getBytes("8859_1");
        return new String(inputBytes, encoding);
    }
    /**
     * Returns the first value of the named parameter, or null if it is absent.
     * Creation date: (2001-2-4 17:30:59)
     * @return java.lang.String
     * @param name java.lang.String
     */
    public String getParameter(String name) {
        if (pairs == null || !pairs.containsKey(name)) return null;
        return (String) (((ArrayList) pairs.get(name)).get(0));
    }

    /**
     * Returns the names of all parsed parameters.
     * Creation date: (2001-2-4 17:28:17)
     * @return java.util.Enumeration
     */
    public Enumeration getParameterNames() {
        if (pairs == null) return null;
        return pairs.keys();
    }

    /**
     * Returns all values of the named parameter, or null if it is absent.
     * Creation date: (2001-2-4 17:33:40)
     * @return java.lang.String[]
     * @param name java.lang.String
     */
    public String[] getParameterValues(String name) {
        if (pairs == null || !pairs.containsKey(name)) return null;
        ArrayList al = (ArrayList) pairs.get(name);
        String[] values = new String[al.size()];
        for (int i = 0; i < values.length; i++)
            values[i] = (String) al.get(i);
        return values;
    }
    /**
     * Parses a URL-encoded string of name=value pairs as UTF-8.
     * Creation date: (2001-2-4 20:34:37)
     * @param urlenc java.lang.String
     */
    private void parse(String urlenc) throws java.io.IOException {
        parse(urlenc, "UTF8");
    }
    /**
     * Parses a URL-encoded string of name=value pairs using the given encoding.
     * Creation date: (2001-2-4 20:34:37)
     * @param urlenc java.lang.String
     * @param encoding java.lang.String
     */
    private void parse(String urlenc, String encoding) throws java.io.IOException {
        if (urlenc == null) return;
        StringTokenizer tok = new StringTokenizer(urlenc, "&");
        try {
            while (tok.hasMoreTokens()) {
                String aPair = tok.nextToken();
                int pos = aPair.indexOf("=");
                String name = null;
                String value = null;
                if (pos != -1) {
                    name = decode(aPair.substring(0, pos), encoding);
                    value = decode(aPair.substring(pos + 1), encoding);
                } else {
                    // A bare name without '=' is treated as having an empty value
                    name = decode(aPair, encoding);
                    value = "";
                }
                if (pairs.containsKey(name)) {
                    // Repeated name: append to the existing value list
                    ArrayList values = (ArrayList) pairs.get(name);
                    values.add(value);
                } else {
                    ArrayList values = new ArrayList();
                    values.add(value);
                    pairs.put(name, values);
                }
            }
        } catch (Exception e) {
            throw new java.io.IOException(e.getMessage());
        }
    }
}
This class reads and stores the data submitted by the form and implements the commonly used getParameter methods.
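For comparison, the same parsing idea can be written compactly with generics and the JDK's own URLDecoder. This is a sketch under the assumption of Java 8 or later, not a replacement for the class above; FormParser and its parse method are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FormParser {
    // Splits "a=1&a=2&b=x" into name -> list of values, decoding each part
    // with the given charset. Insertion order of names is preserved.
    static Map<String, List<String>> parse(String urlEncoded, String charsetName) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (urlEncoded == null || urlEncoded.isEmpty()) {
            return params;
        }
        for (String pair : urlEncoded.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int pos = pair.indexOf('=');
            String rawName = pos < 0 ? pair : pair.substring(0, pos);
            String rawValue = pos < 0 ? "" : pair.substring(pos + 1);
            try {
                String name = URLDecoder.decode(rawName, charsetName);
                String value = URLDecoder.decode(rawValue, charsetName);
                params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, List<String>> p =
                parse("lang=zh&lang=ja&msg=%E4%B8%AD%E6%96%87", "UTF-8");
        System.out.println(p.get("lang").size());                         // 2
        System.out.println(p.get("msg").get(0).equals("\u4e2d\u6587"));   // true
    }
}
```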
package com.hto.servlet;

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

/**
 * Base servlet that parses URL-encoded form data as UTF-8 before the
 * request is dispatched to doGet or doPost.
 * Creation date: (2001-2-5 8:28:20)
 * @author: 錢衛春
 */
public class UtfBaseServlet extends HttpServlet {
    // Request attribute under which the parsed UTF8ParameterReader is stored.
    public static final String PARAMS_ATTR_NAME = "PARAMS_ATTR_NAME";

    /**
     * Process incoming HTTP GET requests
     *
     * @param request Object that encapsulates the request to the servlet
     * @param response Object that encapsulates the response from the servlet
     */
    public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        performTask(request, response);
    }

    /**
     * Process incoming HTTP POST requests
     *
     * @param request Object that encapsulates the request to the servlet
     * @param response Object that encapsulates the response from the servlet
     */
    public void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        performTask(request, response);
    }
    /**
     * Returns the named parameter as a java.sql.Date (format yyyy-mm-dd).
     * Creation date: (2001-2-5 8:52:43)
     * @return java.sql.Date
     * @param request javax.servlet.http.HttpServletRequest
     * @param name java.lang.String
     * @param required boolean
     * @param defValue java.sql.Date
     */
    public static java.sql.Date getDateParameter(HttpServletRequest request, String name, boolean required, java.sql.Date defValue) throws ServletException {
        String value = getParameter(request, name, required, null);
        // Return the default directly; String.valueOf(null) would yield "null", which Date.valueOf rejects
        return value == null ? defValue : java.sql.Date.valueOf(value);
    }

    /**
     * Returns the named parameter as a double.
     * Creation date: (2001-2-5 8:52:43)
     * @return double
     * @param request javax.servlet.http.HttpServletRequest
     * @param name java.lang.String
     * @param required boolean
     * @param defValue double
     */
    public static double getDoubleParameter(HttpServletRequest request, String name, boolean required, double defValue) throws ServletException {
        String value = getParameter(request, name, required, String.valueOf(defValue));
        return Double.parseDouble(value);
    }

    /**
     * Returns the named parameter as a float.
     * Creation date: (2001-2-5 8:52:43)
     * @return float
     * @param request javax.servlet.http.HttpServletRequest
     * @param name java.lang.String
     * @param required boolean
     * @param defValue float
     */
    public static float getFloatParameter(HttpServletRequest request, String name, boolean required, float defValue) throws ServletException {
        String value = getParameter(request, name, required, String.valueOf(defValue));
        return Float.parseFloat(value);
    }

    /**
     * Returns the named parameter as an int.
     * Creation date: (2001-2-5 8:52:43)
     * @return int
     * @param request javax.servlet.http.HttpServletRequest
     * @param name java.lang.String
     * @param required boolean
     * @param defValue int
     */
    public static int getIntParameter(HttpServletRequest request, String name, boolean required, int defValue) throws ServletException {
        String value = getParameter(request, name, required, String.valueOf(defValue));
        return Integer.parseInt(value);
    }
    /**
     * Returns the named parameter, preferring the UTF-8 parsed values stored
     * on the request and falling back to the container's getParameter.
     * Creation date: (2001-2-5 8:43:36)
     * @return java.lang.String
     * @param request javax.servlet.http.HttpServletRequest
     * @param name java.lang.String
     * @param required boolean
     * @param defValue java.lang.String
     */
    public static String getParameter(HttpServletRequest request, String name, boolean required, String defValue) throws ServletException {
        if (request.getAttribute(UtfBaseServlet.PARAMS_ATTR_NAME) != null) {
            UTF8ParameterReader params = (UTF8ParameterReader) request.getAttribute(UtfBaseServlet.PARAMS_ATTR_NAME);
            if (params.getParameter(name) != null) return params.getParameter(name);
            if (required) throw new ServletException("The parameter " + name + " is required but was not provided!");
            else return defValue;
        } else {
            if (request.getParameter(name) != null) return request.getParameter(name);
            if (required) throw new ServletException("The parameter " + name + " is required but was not provided!");
            else return defValue;
        }
    }
    /**
     * Returns the servlet info string.
     */
    public String getServletInfo() {
        return super.getServletInfo();
    }

    /**
     * Returns the named parameter as a java.sql.Timestamp
     * (format yyyy-mm-dd hh:mm:ss[.f...]).
     * Creation date: (2001-2-5 8:52:43)
     * @return java.sql.Timestamp
     * @param request javax.servlet.http.HttpServletRequest
     * @param name java.lang.String
     * @param required boolean
     * @param defValue java.sql.Timestamp
     */
    public static java.sql.Timestamp getTimestampParameter(HttpServletRequest request, String name, boolean required, java.sql.Timestamp defValue) throws ServletException {
        String value = getParameter(request, name, required, null);
        // Return the default directly; String.valueOf(null) would yield "null", which Timestamp.valueOf rejects
        return value == null ? defValue : java.sql.Timestamp.valueOf(value);
    }

    /**
     * Initializes the servlet.
     */
    public void init() {
        // insert code to initialize the servlet here
    }

    /**
     * Process incoming requests for information
     *
     * @param request Object that encapsulates the request to the servlet
     * @param response Object that encapsulates the response from the servlet
     */
    public void performTask(HttpServletRequest request, HttpServletResponse response) {
        try {
            // Insert user code from here.
        } catch (Throwable theException) {
            // uncomment the following line when unexpected exceptions
            // are occurring to aid in debugging the problem.
            //theException.printStackTrace();
        }
    }
    /**
     * Parses URL-encoded form data as UTF-8 and stores it on the request
     * before delegating to the standard service method.
     * Creation date: (2001-2-5 8:31:54)
     * @param request javax.servlet.ServletRequest
     * @param response javax.servlet.ServletResponse
     * @exception javax.servlet.ServletException The exception description.
     * @exception java.io.IOException The exception description.
     */
    public void service(ServletRequest request, ServletResponse response) throws javax.servlet.ServletException, java.io.IOException {
        String content = request.getContentType();
        // No content type (typically GET) or a URL-encoded form: parse it ourselves
        if (content == null || content.toLowerCase().startsWith("application/x-www-form-urlencoded"))
            request.setAttribute(PARAMS_ATTR_NAME, new UTF8ParameterReader((HttpServletRequest) request));
        super.service(request, response);
    }
}
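This is the Servlet base class. It overrides the parent's service method: before calling the parent's service, it creates a UTF8ParameterReader object holding the data submitted by the form and stores that object as an attribute of the request. It then calls the parent's service method as usual.
Servlets that extend this class must note that the "standard" getParameter can no longer read POST data, because this class has already consumed it from the ServletInputStream. Use the getParameter method provided by this class instead.
What remains is output: we need to send the response as a UTF-8 byte stream. As long as we specify the charset as UTF-8 when setting the Content-Type and then write through a PrintWriter, the conversion happens automatically. In a Servlet, set it like this: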
response.setContentType("text/html;charset=UTF-8");
In a JSP, set it like this:
<%@ page contentType="text/html;charset=UTF-8"%>
This guarantees that the output is a UTF-8 stream; whether it is displayed correctly is then up to the client.
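What the container does behind the response's PrintWriter can be approximated with plain JDK classes: a writer that encodes characters to UTF-8 bytes as they are written. The sketch below is only an analogy and uses an in-memory stream in place of the network connection; Utf8OutputDemo and render are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

final class Utf8OutputDemo {
    // Writes text through a PrintWriter whose underlying writer encodes to
    // UTF-8, and returns the bytes that would go out on the wire.
    static byte[] render(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8));
        out.print(text);
        out.flush(); // push buffered characters through the encoder
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = render("\u4e2d\u6587\u6d4b\u8bd5");
        System.out.println(bytes.length); // 12: three bytes per character
    }
}
```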