A Complete Fix for Writing UTF-8 XML with JDOM or Dom4j

9loong 2009-05-13

A Complete Fix for Writing UTF-8 XML with JDOM
http://dev.yesky.com/82/8205582.shtml

2008-07-08 07:00  Author: 王琦  Source: 天極網  Editor: nancy

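Symptom: when JDOM writes an XML file, the output is correct with the GBK character encoding but garbled with UTF-8.

A complete fix starts with dispelling two myths:

1) Whether JDOM produces a UTF-8 file does not depend on setting the encoding on Format. UTF-8 is the default, and the encoding only needs to be set when writing some other character encoding (see the comments in the test code below).

2) The root cause of garbled UTF-8 output from JDOM is not the JDOM API but the JDK.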

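Details:

JDOM's output class, XMLOutputter, has two output methods. Both take a Document; one additionally takes a Writer and the other an OutputStream.

This gives the false impression that the two overloads are interchangeable.

First, test with output(doc, System.out): the console shows garbled text.

Then switch to output(doc, new PrintWriter(System.out)): the output is no longer garbled.

In other words, for console output the stream has to be wrapped in a Writer.

Next, test with output(doc, new FileWriter(path)): the file comes out garbled.

Then switch to output(doc, new FileOutputStream(path)): the output is no longer garbled.

In other words, for file output an OutputStream has to be used.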

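Crazy, isn't it? Quite funny, too. Stepping through the JDOM source in a debugger turned up nothing wrong; the problem lies in the JDK. A Writer constructed without an explicit charset, such as PrintWriter(System.out) or FileWriter(path), encodes with the platform default encoding (GBK on Chinese-language Windows). That happens to match a GBK console, but it contradicts the UTF-8 declared in the XML header of a file.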
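The four experiments can be reproduced without JDOM at all. The following is a minimal sketch using only the JDK; the class name `EncodingMismatchDemo` and the helper `reinterpret` are made up for illustration, and the sample text is written with Unicode escapes so that the source file itself stays ASCII. It assumes the JDK in use ships the GBK charset, which standard JDK builds do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Two CJK characters (U+4E2D, U+6587), escaped to keep this file ASCII-only.
    static final String SAMPLE = "\u4e2d\u6587";

    /** Encodes text with one charset, then decodes the bytes as if they were in another. */
    static String reinterpret(String text, Charset written, Charset assumed) {
        return new String(text.getBytes(written), assumed);
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        Charset gbk = Charset.forName("GBK");

        // The same characters become different byte sequences.
        System.out.println("UTF-8 bytes: " + SAMPLE.getBytes(utf8).length);
        System.out.println("GBK bytes:   " + SAMPLE.getBytes(gbk).length);

        // UTF-8 bytes shown by a GBK console: text no longer matches.
        System.out.println(SAMPLE.equals(reinterpret(SAMPLE, utf8, gbk)));
        // GBK bytes read by something that trusts a UTF-8 declaration: no match either.
        System.out.println(SAMPLE.equals(reinterpret(SAMPLE, gbk, utf8)));
        // Writer and reader agree on the charset: text survives.
        System.out.println(SAMPLE.equals(reinterpret(SAMPLE, utf8, utf8)));
    }
}
```

The console case and the file case are the same mismatch seen from two sides: what matters is whether the bytes written agree with what the reader expects, not whether a Writer or an OutputStream was used.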

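How the corresponding JDK classes behave:

1) PrintWriter has a constructor that takes an OutputStream, so System.out can be wrapped in a PrintWriter.

2) FileWriter has no constructor that takes an OutputStream, so a FileOutputStream cannot be wrapped in a FileWriter.

3) If PrintWriter is constructed from a Writer (with a FileWriter as that Writer), the output is garbled as well.

4) If a FileOutputStream is used to wrap console output, the result is garbled too.

So the JDK's I/O families (the various InputStream, OutputStream, Reader, and Writer classes) need to be understood thoroughly; otherwise unexpected problems like these come up very easily.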
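When a Writer is needed after all, the JDK class that bridges the two families is OutputStreamWriter, which takes the charset explicitly. Here is a small sketch of that approach; the class `ExplicitCharsetWriter` and its methods are illustrative names, not part of JDOM or the original test code.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;

class ExplicitCharsetWriter {
    /** Opens a Writer that encodes with the given charset instead of the platform default. */
    static Writer open(File file, Charset charset) throws IOException {
        return new OutputStreamWriter(new FileOutputStream(file), charset);
    }

    /** Writes text to a temporary file with the given charset and returns the bytes on disk. */
    static byte[] writeAndReadBack(String text, Charset charset) {
        try {
            File tmp = File.createTempFile("charset-demo", ".xml");
            try (Writer w = open(tmp, charset)) {
                w.write(text);
            }
            byte[] onDisk = Files.readAllBytes(tmp.toPath());
            Files.delete(tmp.toPath());
            return onDisk;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is two CJK characters, escaped to keep this file ASCII-only.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><code>\u4e2d\u6587</code>";
        byte[] onDisk = writeAndReadBack(xml, StandardCharsets.UTF_8);
        // The bytes on disk are exactly the UTF-8 encoding, whatever the platform default is.
        System.out.println(Arrays.equals(onDisk, xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

A Writer opened this way could presumably be handed to the output(Document, Writer) overload described above, provided the charset passed here matches the encoding that ends up in the XML declaration.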

JDOM versions tested: 1.0, 1.1

Test code:

import java.io.File;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.PrintWriter;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;

public class BuildXML {
    public static void main(String[] args) throws Exception {
        File xmlfile = new File("C:\\EditTemp\\xml\\abc.xml");
        // Chinese text problem: GBK works, but UTF-8 comes out garbled.
        // Cause:
        // 1) For a file on disk, use the byte stream FileOutputStream.
        //    FileWriter out = new FileWriter(xmlfile); produces garbled output.
        // 2) For console output, use a PrintWriter; passing System.out directly is garbled too.
        //    PrintWriter out = new PrintWriter(System.out);
        FileOutputStream out = new FileOutputStream(xmlfile);
        Element eroot = new Element("root");
        eroot.addContent((new Element("code")).addContent("代碼"));
        eroot.addContent((new Element("ds")).addContent("數據源"));
        eroot.addContent((new Element("sql")).addContent("檢索sql"));
        eroot.addContent((new Element("order")).addContent("排序"));
        Document doc = new Document(eroot);
        XMLOutputter outputter = new XMLOutputter();
        // Without a format the only difference is missing indentation; the XML is
        // still UTF-8, so setting a format is not required.
        Format f = Format.getPrettyFormat();
        // f.setEncoding("UTF-8"); // default = UTF-8
        outputter.setFormat(f);
        outputter.output(doc, out);
        out.close();
    }
}

A Thorough Fix for the Dom4j Encoding Problem

http://www./resource/article/2004-10-31/1090.html

Posted by lonsen on 2004-10-31 01:39:00

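I started learning dom4j a few days ago. I found an article online and dove straight in, and it was very quick to pick up, but I ran into one problem: I could not save an XML file as UTF-8. Reading the saved file back failed with the error "Invalid byte 2 of 2-byte UTF-8 sequence." On inspection, the Chinese text in the file generated by dom4j was garbled in every editor that handles XML encodings correctly, yet it displayed correctly in Notepad. This gave me quite a headache. XML files generated with the GBK or gb2312 encoding, on the other hand, parsed without problems. I therefore suspected that dom4j did not handle UTF-8 and started reading its source code. In the end I found the problem, and it was in my own program.
The dom4j samples and the popular online tutorial 《DOM4J 使用簡介》 both create a new XML document with code like the following: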

    public void createXML(String fileName) {
        Document doc = org.dom4j.DocumentHelper.createDocument();
        Element root = doc.addElement("book");
        root.addAttribute("name", "我的圖書");
        Element childTmp;
        childTmp = root.addElement("price");
        childTmp.setText("21.22");
        Element writer = root.addElement("author");
        writer.setText("李四");
        writer.addAttribute("ID", "001");
        try {
            org.dom4j.io.XMLWriter xmlWriter = new org.dom4j.io.XMLWriter(
                    new FileWriter(fileName));
            xmlWriter.write(doc);
            xmlWriter.close();
        }
        catch (Exception e) {
            System.out.println(e);
        }
    }

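The code above writes the file through a FileWriter, and that is why the file is not encoded correctly. A FileWriter constructed this way offers no means of specifying an encoding, so dom4j has no way to encode the output file correctly. The file is saved in the system default encoding, and on Chinese-language Windows Java's default encoding is GBK. In other words, although we declared that the XML should be saved as UTF-8, the file is actually saved as GBK. That is why XML files we generate with GBK or GB2312 parse correctly, while files generated as UTF-8 cannot be parsed by an XML parser.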
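The parser error can be imitated with the JDK alone. The sketch below is an illustration under assumptions: `StrictUtf8Check` and `isValidUtf8` are made-up names, the GBK bytes stand in for what FileWriter would produce on a GBK system, and a strict CharsetDecoder stands in for the XML parser's UTF-8 reader.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8Check {
    /** Returns true if the bytes are well-formed UTF-8; malformed input is reported, not replaced. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "\u674e\u56db" is the author name from the sample above, escaped to keep this file ASCII-only.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><author>\u674e\u56db</author>";

        // What ends up on disk when the Writer silently uses GBK.
        byte[] writtenAsGbk = xml.getBytes(Charset.forName("GBK"));
        // What the declaration promises.
        byte[] writtenAsUtf8 = xml.getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(writtenAsGbk));  // false
        System.out.println(isValidUtf8(writtenAsUtf8)); // true
    }
}
```

This also explains why Notepad showed the text correctly: it ignores the XML declaration and reads the bytes in the system encoding, which is the encoding they were really written in.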
Now that we have found the cause, let's look for a solution. First, let's see how dom4j implements its encoding handling:

    public XMLWriter(OutputStream out) throws UnsupportedEncodingException {
        //System.out.println("In OutputStream");
        this.format = DEFAULT_FORMAT;
        this.writer = createWriter(out, format.getEncoding());
        this.autoFlush = true;
        namespaceStack.push(Namespace.NO_NAMESPACE);
    }

    public XMLWriter(OutputStream out, OutputFormat format) throws UnsupportedEncodingException {
        //System.out.println("In OutputStream,OutputFormat");
        this.format = format;
        this.writer = createWriter(out, format.getEncoding());
        this.autoFlush = true;
        namespaceStack.push(Namespace.NO_NAMESPACE);
    }

    /**
     * Get an OutputStreamWriter, use preferred encoding.
     */
    protected Writer createWriter(OutputStream outStream, String encoding) throws UnsupportedEncodingException {
        return new BufferedWriter(
            new OutputStreamWriter( outStream, encoding )
        );
    }

As the code above shows, dom4j does nothing complicated about encoding; it relies entirely on what Java itself provides. So when we construct the XMLWriter that generates our XML file, we should not hand it a Writer directly but build it from an object of an OutputStream subclass. In our code above, that means using a FileOutputStream instead of a FileWriter, so the code is changed as follows:
    public void createXML(String fileName) {
        Document doc = org.dom4j.DocumentHelper.createDocument();
        Element root = doc.addElement("book");
        root.addAttribute("name", "我的圖書");
        Element childTmp;
        childTmp = root.addElement("price");
        childTmp.setText("21.22");
        Element writer = root.addElement("author");
        writer.setText("李四");
        writer.addAttribute("ID", "001");
        try {
            // Note the change here: an OutputStream instead of a FileWriter
            org.dom4j.io.XMLWriter xmlWriter = new org.dom4j.io.XMLWriter(
                    new FileOutputStream(fileName));
            xmlWriter.write(doc);
            xmlWriter.close();
        }
        catch (Exception e) {
            System.out.println(e);
        }
    }
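To confirm that output written this way reads back cleanly, a round trip through the JDK's built-in XML parser is enough. The sketch below does not use dom4j: `RoundTripCheck` and `writeThenParse` are hypothetical names, `createWriter` only copies the wrapping from the dom4j source quoted earlier, and the XML is assembled by hand on the assumption that the attribute value needs no escaping.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class RoundTripCheck {
    /** Same wrapping as the createWriter method quoted from dom4j. */
    static Writer createWriter(OutputStream out, String encoding) throws UnsupportedEncodingException {
        return new BufferedWriter(new OutputStreamWriter(out, encoding));
    }

    /** Writes a tiny document in the given encoding, parses the bytes, and returns the attribute read back. */
    static String writeThenParse(String name, String encoding) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (Writer w = createWriter(bytes, encoding)) {
                // The declared encoding and the Writer's encoding come from the same variable.
                w.write("<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>");
                w.write("<book name=\"" + name + "\"/>"); // assumes name needs no XML escaping
            }
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes.toByteArray()));
            return doc.getDocumentElement().getAttribute("name");
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // "\u6211\u7684" is the first two characters of the book name above, escaped to keep this file ASCII-only.
        String name = "\u6211\u7684";
        System.out.println(name.equals(writeThenParse(name, "UTF-8")));
    }
}
```

The design point is that the declaration and the bytes are driven by one value, which is what building XMLWriter from an OutputStream achieves inside dom4j.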

That wraps up the DOM4J encoding problem. I hope this article is useful to others.
