Introduction
A few years ago there were two main XML parsing technologies: SAX and DOM.
- SAX is event-driven: events are fired and forgotten as the XML is parsed. Advantages: It doesn’t need to cache the whole XML document in memory, and you don’t need to wait until the whole document has been parsed before the first event is emitted. Disadvantages: It uses a push API, so the parser holds control during parsing. Clients cannot control the parsing, and it is a poor fit for XML manipulation.
- DOM converts the XML into an in-memory object tree before any manipulation. Advantages: The XML is easier to manipulate. Disadvantages: It consumes a lot of memory, which makes it unsuitable for documents larger than a few MB or for memory-constrained environments such as J2ME.
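The difference between the two models can be sketched with the parsers that ship with the JDK. This is a minimal illustration, not production code: the class name `SaxVsDom` and the sample document are my own assumptions. Both methods count elements with a given name, but the SAX version keeps only a counter while the DOM version builds the whole tree first.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the name is not from any library.
class SaxVsDom {

    // SAX: the parser pushes events into the handler. Only a counter is kept.
    static int countWithSax(String xml, String name) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    // DOM: the whole document becomes an object tree before we can query it.
    static int countWithDom(String xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(name).getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order id='1'/><order id='2'/></orders>";
        System.out.println(countWithSax(xml, "order"));
        System.out.println(countWithDom(xml, "order"));
    }
}
```

Note that in the SAX version the client never gets a chance to say "stop here": once `parse` is called, the parser runs to the end of the document.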
A pull API is a more comfortable alternative for streaming XML processing. It is based on the familiar iterator design pattern rather than the observer design pattern. In a pull API, the client program asks the parser for the next piece of information instead of the parser telling the client program when the next datum is available. In a pull API the client program drives the parser; in a push API the parser drives the client. This idea led to the creation of StAX.
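To make the "client drives the parser" idea concrete, here is a small sketch using the StAX `XMLStreamReader` from the JDK. The class name `PullDemo`, the method `firstTitles`, and the `<title>` element are assumptions made up for this example. The loop asks for events one at a time and simply stops asking once it has what it needs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; the name is not from any library.
class PullDemo {

    // Returns the text of at most `limit` <title> elements, then stops pulling.
    static List<String> firstTitles(String xml, int limit) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> titles = new ArrayList<>();
        try {
            // The client decides when to ask for the next event.
            while (titles.size() < limit && reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("title")) {
                    // Reads the text content and moves to the end tag.
                    titles.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }
}
```

Calling `PullDemo.firstTitles(xml, 2)` on a document with many `<title>` elements stops requesting events after the second title. With SAX the equivalent would usually mean throwing an exception from the handler to abort the parse.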
In this article, I will introduce a new object model from Axis2 named AXIOM, which uses StAX underneath for XML parsing. With it, XML parsing uses less memory and gives the client better control.
Evolution of Axis
One of the first-generation SOAP engines, Apache SOAP, uses a DOM-based object model internally to represent the XML document, where the XML handling techniques force the entire XML object model to be built at once. The second-generation Apache Axis shifted to SAX to avoid keeping the complete information in memory. SAX, however, has a major constraint: it is built around a "push" technique, and once the parsing of the XML document starts it cannot be stopped. To get over this hurdle, Apache Axis has to record the SAX events. So, effectively, the XML message has to be kept in memory in the form of SAX events, which makes Apache Axis yet another memory-intensive programming model.
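The cost of recording SAX events can be illustrated with a few lines of code. This is not how Apache Axis actually implements it; it is only a sketch, and the class name `RecordingHandler` and the string encoding of the events are my own assumptions. The point is that the recorded list grows with the document, so the memory saving of SAX is lost.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that buffers every event so it can be replayed later.
class RecordingHandler extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        events.add("start:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one text node in several chunks.
        events.add("text:" + new String(ch, start, length));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    // Parses the whole document and returns the buffered events.
    static List<String> record(String xml) throws Exception {
        RecordingHandler handler = new RecordingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }
}
```

Every element and text node of the message ends up in `events`, which is roughly as expensive as holding an object tree.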
Axis2 avoids keeping the complete SOAP message in memory by introducing a new object model for representing the SOAP message: AXIOM. AXIOM takes a dramatically new approach. Although AXIOM has an "external" resemblance to DOM, the difference is that it generates objects only when they are required. This "on-demand building" feature gives AXIOM the edge needed to overcome the memory barrier that early SOAP engines failed to pass.
An interesting feature of AXIOM is that it is based on pull parsing. It is capable of generating pull events from the object model that has been built. Further, if the object model happens to be half built, AXIOM is capable of shifting to the underlying pull parser to generate pull events directly from the stream. The heart of AXIOM is the XML pull parser, since pull parsing is the only parsing model that supports pausing the parsing process. AXIOM uses the Streaming API for XML (StAX), which makes it easy to manipulate while using only a fraction of the memory used by a conventional object model. Combined with the speed of the streaming pull parser, AXIOM puts Axis2 well ahead of its predecessors in terms of efficiency and speed.
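The on-demand building idea can be sketched with plain StAX, without the AXIOM library. The class below is not the AXIOM API: `DeferredChildren` and its methods are hypothetical names, and to stay short it records only the names of the root's direct children and skips their subtrees instead of building them. What it does show is the key behaviour: nothing is read from the stream until a child is asked for, and the parser is paused between requests.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch of deferred building; not part of AXIOM.
class DeferredChildren {
    private final XMLStreamReader reader;
    private final List<String> built = new ArrayList<>();
    private boolean done;

    DeferredChildren(String xml) throws XMLStreamException {
        reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        // Advance only as far as the root start tag.
        while (reader.next() != XMLStreamConstants.START_ELEMENT) {
            // skip anything before the root element
        }
    }

    // Number of children pulled from the stream so far.
    int builtCount() {
        return built.size();
    }

    // Name of the index-th child of the root, or null if there is none.
    // Pulls from the parser only as far as needed to answer.
    String childName(int index) throws XMLStreamException {
        while (built.size() <= index && !done) {
            buildNextChild();
        }
        return index < built.size() ? built.get(index) : null;
    }

    private void buildNextChild() throws XMLStreamException {
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                built.add(reader.getLocalName());
                skipToMatchingEnd();
                return;
            }
            if (event == XMLStreamConstants.END_ELEMENT) {
                break; // end tag of the root: no more children
            }
        }
        done = true;
    }

    // Consumes the current element's subtree up to its end tag.
    private void skipToMatchingEnd() throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }
}
```

For a SOAP-like message, asking for `childName(0)` reads the header only, and `builtCount()` stays at 1 until the body is requested. A real object model would keep the built nodes and their content rather than just the names.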
Apart from the new parser, Axis2 also has other new features. They are:
- Pluggable Data Binding - you can pick and choose among JAXB, Castor and XMLBeans for XML-to-Java conversion.
- Improved Support for Message-style interaction (RPC vs Message-based)
- Improved handlers
The goal of this article is to focus on parsing technology, so I will not discuss the new features of Axis2 in detail. If you want to find out more, read this.
Reference
Fast and lightweight object model for XML





































