Java XML Parsers in Detail: Document, SAXParser, and XMLStreamReader

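1. Document

The Document interface is the official W3C standard (DOM). The parser loads an HTML or XML document into memory as a tree of objects, and the application then walks that tree, typically with loops, to extract the data.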
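To make the "load everything, then loop" idea concrete, here is a minimal sketch. The class name DomSketch, the method provinceNames, and the sample XML are all made up for illustration; the element and attribute names are only assumed to resemble the area data used later in this post.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomSketch {
    // Hypothetical sample data, not taken from the real area.xml.
    static final String SAMPLE =
            "<root>"
          + "<province name=\"ProvinceA\" postcode=\"100000\"><city name=\"CityA\" postcode=\"100100\"/></province>"
          + "<province name=\"ProvinceB\" postcode=\"200000\"/>"
          + "</root>";

    // Parses the whole document into memory, then loops over the root's children.
    static List<String> provinceNames(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> names = new ArrayList<>();
            NodeList children = document.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip text and comment nodes; only elements carry attributes.
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && "province".equals(child.getNodeName())) {
                    names.add(((Element) child).getAttribute("name"));
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(provinceNames(SAMPLE));
    }
}
```

The node-type check matters with real files, because indentation between elements shows up in the child list as text nodes.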

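2. SAXParser

SAXParser is an event-driven "push" model for processing XML. SAX is not a W3C standard, but it is a widely accepted API, and most SAX parser implementations follow it.

Unlike DOM, a SAX parser does not build a tree representation of the whole document. It reads the document as a stream and fires events according to the kind of content it encounters. Those events are pushed to event handlers, which give the application access to the document content.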

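There are three basic types of event handler:

  • DTDHandler, for accessing the contents of the XML DTD;
  • ErrorHandler, for low-level access to parsing errors;
  • ContentHandler, the most commonly used type, for accessing document content.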
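As a rough illustration of how these handlers receive pushed events, the sketch below overrides one ContentHandler callback and one ErrorHandler callback through DefaultHandler. The class name SaxHandlerSketch, the method elementNames, and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SaxHandlerSketch {
    // Hypothetical sample data for illustration only.
    static final String SAMPLE =
            "<root><province name=\"ProvinceA\">"
          + "<city name=\"CityA\"/><city name=\"CityB\"/>"
          + "</province></root>";

    // Records the name of every element in the order the parser reports it.
    static List<String> elementNames(String xml) {
        List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // ContentHandler callback: the parser calls this, not the application.
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                seen.add(qName);
            }

            // ErrorHandler callback: rethrow recoverable errors instead of ignoring them.
            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e;
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(elementNames(SAMPLE));
    }
}
```

Note that the application never asks for the next element; it only supplies callbacks and the parser drives the whole run.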

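3. XMLStreamReader (StAX)

XMLStreamReader is also a stream-based parser: it reads the file linearly from beginning to end and, like SAXParser, reports what it finds as events. The difference is that XMLStreamReader uses a "pull" model instead of SAXParser's push model. It does not use callbacks; it returns an event only when the application asks for one. StAX also provides a user-friendly API for both reading (XMLStreamReader) and writing (XMLStreamWriter).

Whereas SAXParser delivers its various event types to a ContentHandler, XMLStreamReader returns its events directly to the application, and StAX can even deliver events as objects.

When the application requests an event, the parser reads as much of the XML document as it needs and returns that event to the application. StAX also provides factories (XMLInputFactory and XMLOutputFactory) for creating readers and writers, so an application can use the StAX interfaces without depending on the details of a particular implementation.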
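Since the writing side of StAX is not shown anywhere else in this post, here is a small sketch of it. The class name StaxWriterSketch and the method writeProvince are hypothetical, and the exact formatting of the output may differ between StAX implementations.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StaxWriterSketch {
    // Writes a single <province> element with two attributes and returns it as text.
    static String writeProvince(String name, String postcode) {
        StringWriter buffer = new StringWriter();
        try {
            // The factory hides which StAX implementation is actually used.
            XMLStreamWriter writer = XMLOutputFactory.newFactory().createXMLStreamWriter(buffer);
            writer.writeStartElement("province");
            writer.writeAttribute("name", name);
            writer.writeAttribute("postcode", postcode);
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX writing failed", e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // The values here are placeholders, not real region data.
        System.out.println(writeProvince("ProvinceA", "100000"));
    }
}
```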

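Unlike Document and SAXParser, StAX specifies two parsing models: the cursor model (XMLStreamReader), which, like SAXParser, simply reports events; and the iterator model (XMLEventReader), which returns events as objects. (A small gripe here: I personally prefer SAXParser's handler-based event processing, because the code is more intuitive.) StAX can in fact be used in much the same style as SAXParser, but doing so incurs extra object-creation overhead.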
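The cursor model is used in the listings later in this post; the iterator model is not, so here is a minimal sketch of it. The class name StaxIteratorSketch, the method cityNames, and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class StaxIteratorSketch {
    // Hypothetical sample data for illustration only.
    static final String SAMPLE =
            "<root><province name=\"ProvinceA\">"
          + "<city name=\"CityA\"/><city name=\"CityB\"/>"
          + "</province></root>";

    // Collects the name attribute of every <city> element using the iterator API.
    static List<String> cityNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLEventReader events =
                    XMLInputFactory.newFactory().createXMLEventReader(new StringReader(xml));
            while (events.hasNext()) {
                // Each event is a separate object, which is where the extra allocation comes from.
                XMLEvent event = events.nextEvent();
                if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    if ("city".equals(start.getName().getLocalPart())) {
                        Attribute name = start.getAttributeByName(new QName("name"));
                        if (name != null) {
                            names.add(name.getValue());
                        }
                    }
                }
            }
            events.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX parsing failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(cityNames(SAMPLE));
    }
}
```

Because every event is its own object, it can be stored or passed to other methods, which the cursor model does not allow; the price is the allocation cost mentioned above.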

Now let's look at some sample code:

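1. Basic code for parsing XML with Document:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(path);
Element element = document.getDocumentElement();

A few lines are enough to obtain the root Element. From there, you only need to traverse the Element to assemble the data.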

2. Basic code for parsing XML with SAXParser:

SAXParserFactory factory = SAXParserFactory.newInstance();
try {
    SAXParser parser = factory.newSAXParser();
    parser.parse(path, handler);
} catch (ParserConfigurationException e) {
    e.printStackTrace();
} catch (SAXException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

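Again, the core is three lines. The important part is the handler's event callbacks; here the handler is a DefaultHandler, which supplies empty implementations of the SAX handler interfaces so that only the callbacks of interest need to be overridden.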

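3. Basic code for parsing XML with XMLStreamReader (StAX):

InputStream in = new FileInputStream(path);
XMLInputFactory factory = XMLInputFactory.newFactory();
XMLStreamReader reader = factory.createXMLStreamReader(in);
long t = System.currentTimeMillis(); // start time for the timing message below

while (reader.hasNext()) {
    int event = reader.next(); // advance the cursor and get the next event type
    if (event == XMLStreamConstants.START_ELEMENT) {
        // read the element name and attributes here
    } else if (event == XMLStreamConstants.END_ELEMENT) {
        // attach the finished element to its parent here
    } else if (event == XMLStreamConstants.END_DOCUMENT) {
        out("Use StAXParser object,and use time is " + (System.currentTimeMillis() - t) + "ms");
    }
}

Here the file is opened as an InputStream and the stream is handed to an XMLStreamReader. The code then loops over the document; inside the loop, each call to .next() advances the parser and returns the type of the next event.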

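Below are the times I measured when reading the nationwide region data (down to county level):

[Screenshot: timing output of the three parsers]

Document took 103 ms. SAXParser was the fastest, generally between 10 and 16 ms. These numbers depend on the machine; mine is a rather poor laptop.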
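The figures above come from single runs timed with System.currentTimeMillis(), and the Document parser runs first in main(), so it may also be paying for one-time start-up costs. A slightly fairer comparison could warm up each parser and average several runs. The helper below is only a sketch of that idea; the class name TimingSketch and the method averageMillis are hypothetical and not part of the original test.

```java
class TimingSketch {
    // Runs the task a few times untimed, then returns the average time per timed run in ms.
    static long averageMillis(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run(); // warm-up: lets class loading and JIT compilation happen first
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / runs / 1_000_000;
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute a call to one of the parsers.
        long ms = averageMillis(() -> new StringBuilder("x").reverse(), 5, 20);
        System.out.println("average: " + ms + "ms");
    }
}
```

For example, averageMillis(() -> new XmlParserBySAX(path).parser(), 5, 20) would time the SAX version, keeping in mind that parser() also prints its own timing line on every call.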

Below is the Java code that reads the nationwide region XML data, in all three ways:

I. Document

import model.AreaModel;
import model.AreaNode;
import org.w3c.dom.*;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Parsing with Document (DOM)
 * Created by alan on 2018/12/16.
 */
public class XmlParserByDocument extends OutPut {

    private String path;

    private List<AreaModel> areaModels = new ArrayList<>();

    public XmlParserByDocument() {
    }

    public XmlParserByDocument(String path) {
        this.path = path;
    }

    public List<AreaModel> getAreaModels() {
        return areaModels;
    }

    public void parser() {
        long t = System.currentTimeMillis();
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(path);
            Element element = document.getDocumentElement();

            out("document v" + document.getXmlVersion() + " encode " + document.getInputEncoding());
            if ("root".equals(element.getTagName())) {
                NodeList nodeList = element.getChildNodes();

                AreaModel area = null;
                for (int i = 0; i < nodeList.getLength(); i++) {
                    String nodeName = nodeList.item(i).getNodeName();
                    if ("province".equals(nodeName)) {
                        area = new AreaModel(parserNode(nodeList.item(i)), parserNodeList(nodeList.item(i).getChildNodes()));
                        areaModels.add(area);

                    }
                }

                out("Use Document object and use time is " + (System.currentTimeMillis() - t) + "ms.");

            } else {
                throw new Exception("invalid xml file.");
            }


        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {

        }
    }

    public void test(){
        String str = "";
        for (AreaModel a : areaModels) {
            str += a.getProvince() + "\n";
            for (AreaNode n : a.getCitys()) {
                str += "\t" + n + "\n";
                for (AreaNode j : n.getChild()) {
                    str += "\t\t" + j + "\n";
                }
            }
        }
        out(str);
    }

    private List<AreaNode> parserNodeList(NodeList list) {
        List<AreaNode> nodes = new ArrayList<>();
        for (int i = 0; i < list.getLength(); i++) {
            if (list.item(i).hasChildNodes()) {
                AreaNode node = parserNode(list.item(i));
                node.setChild(parserNodeList(list.item(i).getChildNodes()));
                nodes.add(node);
            } else {
                AreaNode node = parserNode(list.item(i));
                if (node != null) {
                    nodes.add(node);
                }
            }
        }
        return nodes;
    }

    private AreaNode parserNode(Node node) {
        AreaNode areaNode = null;
        NamedNodeMap attrs = node.getAttributes();
        if (attrs != null) {
            areaNode = new AreaNode(attrs.getNamedItem("name").getTextContent(), Integer.valueOf(attrs.getNamedItem("postcode").getTextContent()));
        }
        return areaNode;
    }

}

 

II. SAXParser

import model.AreaModel;
import model.AreaNode;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Stream-based parsing with SAX
 * Created by alan on 2018/12/16.
 */
public class XmlParserBySAX extends OutPut {

    private String path = "d:/test/area.xml";

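    // Initialized up front so getAreaModels() never returns null before parser() runs
    private List<AreaModel> areaModels = new ArrayList<>();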

    public XmlParserBySAX() {
    }

    public XmlParserBySAX(String path) {
        this.path = path;
    }

    public List<AreaModel> getAreaModels() {
        return areaModels;
    }

    public void parser() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            SAXParser parser = factory.newSAXParser();
            parser.parse(path, handler);
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    public void test(){
        String str = "";
        for (AreaModel a : areaModels) {
            str += a.getProvince() + "\n";
            for (AreaNode n : a.getCitys()) {
                str += "\t" + n + "\n";
                for (AreaNode j : n.getChild()) {
                    str += "\t\t" + j + "\n";
                }
            }
        }
        out(str);
    }

    private long t = 0;
    private DefaultHandler handler = new DefaultHandler() {

        private AreaModel province;
        private List<AreaNode> citys;
        private List<AreaNode> areas;
        private AreaNode city;


        @Override
        public void startDocument() throws SAXException {
            areaModels = new ArrayList<>();
            t = System.currentTimeMillis();

//            out("start....");
        }

        @Override
        public void endDocument() throws SAXException {
            out("Use SAXParser object,and use time is " + (System.currentTimeMillis() - t) + "ms");
        }


        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            switch (qName) {
                case "province":
                    province = new AreaModel();
                    province.setProvince(new AreaNode(attributes.getValue("name"), Integer.valueOf(attributes.getValue("postcode"))));

                    citys = new ArrayList<>();
                    break;
                case "city":
                    city = new AreaNode(attributes.getValue("name"), Integer.valueOf(attributes.getValue("postcode")));
                    areas = new ArrayList<>();
                    break;
                case "area":
                    areas.add(new AreaNode(attributes.getValue("name"), Integer.valueOf(attributes.getValue("postcode"))));
                    break;
            }
        }


        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            switch (qName) {
                case "province":
                    province.setCitys(citys);
                    areaModels.add(province);
                    break;
                case "city":
                    city.setChild(areas);
                    citys.add(city);
                    break;
                case "area":
                    break;
            }
        }
    };




}

III. XMLStreamReader (StAX)

import model.AreaModel;
import model.AreaNode;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Parsing with a pull parser (StAX)
 * Created by alan on 2018/12/16.
 */
public class XmlParserByStAX extends OutPut {

    private String path;

    private List<AreaModel> areaModels = new ArrayList<>();

    public XmlParserByStAX() {
    }

    public XmlParserByStAX(String path) {
        this.path = path;
    }

    public List<AreaModel> getAreaModels() {
        return areaModels;
    }

    public void parser() {
        // try-with-resources closes the file stream even if parsing fails
        try (InputStream in = new FileInputStream(path)) {
            XMLInputFactory factory = XMLInputFactory.newFactory();
            XMLStreamReader reader = factory.createXMLStreamReader(in);

            AreaModel province = null;
            List<AreaNode> citys = null;
            List<AreaNode> areas = null;
            AreaNode city = null;
            long t = System.currentTimeMillis();
            areaModels = new ArrayList<>();

            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // getLocalName() returns the bare element name, with or without a namespace
                    switch (reader.getLocalName()) {
                        case "province":
                            province = new AreaModel();
                            province.setProvince(new AreaNode(reader.getAttributeValue(null,"name"),
                                    Integer.valueOf(reader.getAttributeValue(null,"postcode"))));
                            citys = new ArrayList<>();
                            break;
                        case "city":
                            city = new AreaNode(reader.getAttributeValue(null,"name"),
                                    Integer.valueOf(reader.getAttributeValue(null,"postcode")));
                            areas = new ArrayList<>();
                            break;
                        case "area":
                            areas.add(new AreaNode(reader.getAttributeValue(null,"name"),
                                    Integer.valueOf(reader.getAttributeValue(null,"postcode"))));
                            break;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    switch (reader.getLocalName()) {
                        case "province":
                            province.setCitys(citys);
                            areaModels.add(province);
                            break;
                        case "city":
                            city.setChild(areas);
                            citys.add(city);
                            break;
                        case "area":
                            break;
                    }

                } else if (event == XMLStreamConstants.END_DOCUMENT) {
                    out("Use StAXParser object,and use time is " + (System.currentTimeMillis() - t) + "ms");
                }
            }
            reader.close();

        } catch (IOException e) {
            // also covers FileNotFoundException when the file cannot be opened
            e.printStackTrace();
        } catch (XMLStreamException e) {
            e.printStackTrace();
        }

    }

    public void test() {
        String str = "";
        for (AreaModel a : areaModels) {
            str += a.getProvince() + "\n";
            for (AreaNode n : a.getCitys()) {
                str += "\t" + n + "\n";
                for (AreaNode j : n.getChild()) {
                    str += "\t\t" + j + "\n";
                }
            }
        }
        out(str);
    }


}

IV. Source of the AreaModel model class

package model;

import java.util.List;

/**
 * Created by alan on 2018/12/15.
 */
public class AreaModel {

    private AreaNode province;

    private List<AreaNode> citys;

    public AreaModel(){}

    public AreaModel(AreaNode province, List<AreaNode> citys) {
        this.province = province;
        this.citys = citys;
    }

    public AreaNode getProvince() {
        return province;
    }

    public void setProvince(AreaNode province) {
        this.province = province;
    }

    public List<AreaNode> getCitys() {
        return citys;
    }

    public void setCitys(List<AreaNode> citys) {
        this.citys = citys;
    }
}

V. Source of the AreaNode model class

package model;

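import java.util.ArrayList;
import java.util.List;

/**
 * Created by alan on 2018/12/15.
 */
public class AreaNode {

    private String name;

    private Integer postCode;

    // Starts as an empty list so nodes without children never expose a null child list
    private List<AreaNode> child = new ArrayList<>();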

    public AreaNode() {
    }

    public AreaNode(String name, Integer postCode) {
        this.name = name;
        this.postCode = postCode;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Integer getPostCode() {
        return postCode;
    }

    public void setPostCode(Integer postCode) {
        this.postCode = postCode;
    }

    public List<AreaNode> getChild() {
        return child;
    }

    public void setChild(List<AreaNode> child) {
        this.child = child;
    }

    @Override
    public String toString() {
        String r = "{name:\"%s\",postCode:\"%s\"}";
        String str = String.format(r, this.getName(), this.getPostCode());
        return str;
    }
}
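The three parser classes extend OutPut and call out(...), but that class is not included in this post. The sketch below is only a guess at its smallest possible form, assuming out() does nothing more than print its argument on its own line; the real class may well differ.

```java
// Assumed stand-in for the OutPut base class, which the post does not show.
class OutPut {
    // Accepts Object so that both strings and numbers (such as a list size) can be printed.
    static void out(Object message) {
        System.out.println(message);
    }

    public static void main(String[] args) {
        out("...");
        out(3);
    }
}
```

Making out() static is a design guess: it lets the same method be called from instance methods in the parser classes and from a static main().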

 

All the code has now been posted; what remains is a main() method to test it:

 private static String path = "d:/test/area.xml";

    public static void main(String[] args) {

        EventQueue.invokeLater(() -> {
            out("...");
            XmlParserByDocument document = new XmlParserByDocument(path);
            document.parser();

            //the 2.
            XmlParserBySAX sax = new XmlParserBySAX(path);
            sax.parser();

            //the 3.
            XmlParserByStAX stAX = new XmlParserByStAX(path);
            stAX.parser();

            out(document.getAreaModels().size());
            out(sax.getAreaModels().size());
            out(stAX.getAreaModels().size());

//            document.test();
//            stAX.test();
//            sax.test();
        });


    }

By the way, I am sharing the area.xml file as well: local download

 

 
