Tag Archives: XFire

Evolution of XML parsing technologies

Introduction

A few years ago there were two main XML parsing technologies: SAX and DOM.

  1. SAX is event-driven: events are fired and forgotten as the XML is parsed. Advantages: it doesn’t need to hold the whole XML document in memory, and you don’t need to wait until the whole document has been parsed before the first event is emitted. Disadvantages: it uses a push API, so the parser holds control during parsing. Clients cannot control the parsing, and it is a poor fit for XML manipulation.
  2. DOM converts the XML into an object tree in memory before any manipulation. Advantages: the XML is easier to manipulate. Disadvantages: it eats up a lot of memory, which is a problem for documents larger than a few MB or in memory-constrained environments such as J2ME.
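To make the contrast concrete, here is a minimal sketch that reads the same values once with SAX and once with DOM, using only the JDK's built-in parsers. The class name `SaxVsDom` and the sample `<orders>` document are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxVsDom {
    // Hypothetical sample document, used only for this illustration.
    static final String XML = "<orders><order id=\"1\"/><order id=\"2\"/></orders>";

    private static InputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // SAX (push): the parser drives and calls the handler for every element.
    // Nothing is kept in memory except what the handler chooses to store.
    static List<String> idsWithSax(String xml) {
        List<String> ids = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("order".equals(qName)) {
                    ids.add(attrs.getValue("id"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(stream(xml), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return ids;
    }

    // DOM: the whole document is turned into a tree first, then navigated.
    static List<String> idsWithDom(String xml) {
        List<String> ids = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(stream(xml));
            NodeList orders = doc.getElementsByTagName("order");
            for (int i = 0; i < orders.getLength(); i++) {
                ids.add(((Element) orders.item(i)).getAttribute("id"));
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(idsWithSax(XML));
        System.out.println(idsWithDom(XML));
    }
}
```

Both methods return the same ids. The difference is who is in charge: with SAX the handler only reacts to callbacks, while with DOM the client walks a tree that is already fully built.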

A pull API is a more comfortable alternative for streaming XML processing. It is based on the familiar iterator design pattern rather than the observer design pattern. In a pull API, the client program asks the parser for the next piece of information, rather than the parser telling the client program when the next datum is available. In a pull API the client program drives the parser; in a push API the parser drives the client. This idea led to the invention of StAX.
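Here is a minimal sketch of the pull style, using the StAX cursor API (`javax.xml.stream`) that ships with the JDK. The class name, method name, and sample document are made up for illustration. The loop belongs to the client, so it can simply stop asking for events once it has what it needs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxPullExample {
    // Hypothetical sample document, used only for this illustration.
    static final String XML =
            "<orders><order id=\"1\"/><order id=\"2\"/><order id=\"3\"/></orders>";

    // Collects the ids of at most `limit` <order> elements, then stops pulling.
    static List<String> firstOrderIds(String xml, int limit) {
        List<String> ids = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            // The client asks for the next event; the parser never calls back.
            while (ids.size() < limit && reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "order".equals(reader.getLocalName())) {
                    ids.add(reader.getAttributeValue(null, "id"));
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Only the first two orders are read; the rest of the stream is never parsed.
        System.out.println(firstOrderIds(XML, 2));
    }
}
```

With SAX, stopping early like this usually means throwing an exception out of the handler. Here it is just a loop condition.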

In this article, I will introduce a new object model from Axis2 named AXIOM, which uses StAX underneath for XML parsing. With it, XML parsing uses less memory and gives the client better control.

Evolution of Axis

One of the first generation SOAP engines, Apache SOAP, uses a DOM-based object model internally to represent the XML document, where the XML handling techniques force the entire XML object model to be built at once. The second generation, Apache Axis, shifted to SAX to avoid keeping the complete information in memory. SAX, however, has a major constraint: it is built around a "push" technique, and once the parsing of the XML document starts it cannot be stopped. To get over this hurdle, Apache Axis has to record SAX events. So, effectively, the XML message has to be kept in memory in the form of SAX events, which makes Apache Axis yet another memory-intensive programming model.

Axis2 avoids keeping the complete SOAP message in memory by introducing a new object model for representing the SOAP message: AXIOM. AXIOM takes a dramatically new approach. Although AXIOM has an "external" resemblance to DOM, the difference is that it generates objects only when required. This "on-demand building" feature gives AXIOM the edge needed to overcome the memory barrier that early SOAP engines failed to pass.

An interesting feature of AXIOM is that it is based on Pull parsing. It is capable of generating pull events from the Object Model that is built. Further, if the Object Model happens to be half built, AXIOM is capable of shifting to the underlying pull parser to generate pull events directly from the stream. The heart of AXIOM is the XML Pull parser since it is the only parsing model that supports the pausing of the parsing process. AXIOM uses the Streaming API for XML (StAX), making it easy to manipulate and utilizing only a fraction of the memory used by a conventional object model. Combined with the speed of the streaming pull parser, AXIOM pushes Axis2 leaps ahead of its predecessors in terms of efficiency and speed.
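The on-demand building idea can be sketched with plain StAX. To be clear, this is not the AXIOM API. `DeferredChildren` is a made-up toy class that assumes a flat document whose root contains only text-only child elements. It builds a child only when a caller asks for it and leaves the rest of the stream unparsed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Toy illustration of deferred building; NOT the AXIOM API.
class DeferredChildren {
    // Hypothetical sample document, used only for this illustration.
    static final String XML = "<items><item>a</item><item>b</item><item>c</item></items>";

    private final XMLStreamReader reader;
    private final List<String> built = new ArrayList<>(); // children built so far
    private boolean exhausted = false;

    DeferredChildren(String xml) {
        try {
            reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            reader.nextTag(); // advance only as far as the root start tag
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the text of the index-th child, pulling only as many events as needed.
    String childText(int index) {
        try {
            while (built.size() <= index && !exhausted) {
                int event = reader.nextTag(); // next child start tag, or the root end tag
                if (event == XMLStreamConstants.START_ELEMENT) {
                    built.add(reader.getElementText()); // leaves the reader on the child's end tag
                } else {
                    exhausted = true; // reached the end of the root element
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return index < built.size() ? built.get(index) : null;
    }

    // How many children have actually been materialized so far.
    int builtCount() {
        return built.size();
    }

    public static void main(String[] args) {
        DeferredChildren doc = new DeferredChildren(XML);
        System.out.println(doc.childText(0) + " built=" + doc.builtCount());
        System.out.println(doc.childText(2) + " built=" + doc.builtCount());
    }
}
```

Asking for the first child parses only up to the end of that child, and the parser stays paused there until someone asks for more. That pause is what a push parser cannot offer.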

Apart from the new parser, Axis2 also has other new features:

  1. Pluggable Data Binding – you can pick and choose among JAXB, Castor and XMLBeans for XML-to-Java conversion.
  2. Improved Support for Message-style interaction (RPC vs Message-based)
  3. Improved handlers

The goal of this article is to focus on parsing technology, so I will not discuss the new Axis2 features in detail. If you want to find out more, read this.

 

Reference

An Introduction to StAX

Fast and lightweight object model for XML

 

 


Axis2 vs XFire

Recently, I assigned a web service project to one of my developers. To provide a guideline for choosing the stack, I needed to look again at the solutions provided by Axis and XFire. Previously, it was a no-brainer in favor of XFire, because Axis 1 used a DOM-based parser whereas XFire used StAX. Now Axis 2 has come out, and it uses a StAX parser as well. After going through a series of benchmark comparisons and arguments from both parties, I feel that picking either framework isn’t likely to matter much. And if it does matter, make sure you choose the proper databinding toolkit, as that will have the biggest effect on your performance.

XFire uses JAXB for databinding by default, which you can replace with JiBX. Why is JiBX faster? Because JAXB uses reflection to populate the bean, whereas JiBX uses byte code generation. According to Dan, JiBX offers significant performance improvements:

"JiBX can provide significant performance improvements over JAXB as it does not use reflection, but instead does byte code generation to optimize databinding" - Dan (http://netzooid.com/blog/)
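As a rough illustration of the two styles, and not of how JAXB or JiBX are actually implemented, the sketch below populates a made-up `Customer` bean once through reflection and once through a direct setter call. The direct call is a hand-written stand-in for what generated byte code would do. All names here are hypothetical.

```java
import java.lang.reflect.Method;

class BindingStyles {
    // Hypothetical bean, used only for this illustration.
    static class Customer {
        private String name;
        public void setName(String name) { this.name = name; }
        public String getName() { return name; }
    }

    // Reflection style: the setter is looked up by name at run time and invoked indirectly.
    static Customer bindWithReflection(String property, String value) {
        try {
            Customer customer = new Customer();
            String setter = "set" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
            Method method = Customer.class.getMethod(setter, String.class);
            method.invoke(customer, value);
            return customer;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Direct style: an ordinary compiled call, standing in for generated binding code.
    static Customer bindDirectly(String value) {
        Customer customer = new Customer();
        customer.setName(value);
        return customer;
    }

    public static void main(String[] args) {
        System.out.println(bindWithReflection("name", "Ann").getName());
        System.out.println(bindDirectly("Ann").getName());
    }
}
```

Both paths produce the same bean. The reflective one pays for a method lookup and an indirect invocation on every property, which is the overhead the quote refers to.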

Axis2 created its own databinding solution named ADB, which is comparable with JiBX. So Axis2 really has caught up with its competitors in terms of performance. Will I switch to Axis2? Probably not, because I don’t think it is worth the hassle to learn another databinding solution and WS framework that gives you practically the same thing that XFire already provides. By switching to a StAX based framework you’re likely to see something like a 3-5x speedup (or something like 30x if you switch from rpc/encoded to doc/literal as well). This can make a significant difference in your application’s responsiveness and load handling. As long as you are using StAX, you are good.

Oh, before I forget: anyone who is using XFire, or is planning to, should look at Apache CXF. According to the XFire site, CXF is XFire 2.0. I am excited to see these new features in CXF:

  1. Spring 2.0 XML support
  2. RESTful service support
  3. JSON support
  4. Support for some of the WS specs as well

Here is what I found on the XFire site:

"We encourage all users who are currently evaluating XFire to use CXF instead at this point. If you are already an XFire user, you may wish to consider migrating to CXF depending on where you are in your own release cycle. We will continue to support XFire in the future with bug fix releases, but feature development will be focused on CXF."

* UPDATE * 12/6/2007: Apache CXF 2.0 currently doesn’t support JiBX. Support is expected in CXF 2.1.

Reference

http://www.infoq.com/news/2007/02/axis2-xfire-benchmark
