Java XML APIs are lacking!


I can’t believe I’m having so much trouble working with XML in Java. Has anyone ever really tried to practice TDD on XSLT? If so, could you please explain it to me? You know you’re walking untrodden terrain when you start asking questions nobody has ever clearly answered. Questions like, “where the hell is a valid XSLT Schema?” …and “how do I get the results from a single template in my stylesheet?”

Now I’m finding what appear to be serious flaws in the JDK. Can somebody explain to me why SAXException completely ignores the message text passed to its constructor when you also pass an exception as the second parameter? Am I seeing things, or is that intended behaviour? The developers clearly thought the wrapped exception’s message was way more important than any contextual information here.

Also, XMLUnit is missing some vital information in its error messages. That’s what led me to uncover the ugly SAXException thing in the first place. When I create a Diff object from two input XML docs and one of them fails to parse, I’d like to see the text of the document that caused the failure. I realize that this may not always be ideal, so maybe an optional parameter controlling the behaviour is necessary? At any rate, it forced me to spill some ugly logic into my XML-unit-test-wrapper-special-doo-hickey-uber-wave-nuvo-insert-superfluous-adjective-here-thingy to handle all the stoopidity I was facing. I hate spilling ugly logic into my code, especially new code I’m trying to keep clean.
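If the lost context comes from how the exception is printed rather than from what it stores, one workaround is to spell out both parts yourself instead of relying on toString(). Reportedly, in some older SAX versions SAXException.toString(), which is what a printed stack trace starts with, returned only the embedded exception's toString(). getMessage() and getException(), on the other hand, are standard org.xml.sax.SAXException methods. The SaxErrors class below is a hypothetical helper, a minimal sketch rather than anything the JDK or XMLUnit provides:

```java
import java.io.IOException;
import org.xml.sax.SAXException;

/** Hypothetical helper: reports a SAXException's own message plus every wrapped cause. */
final class SaxErrors {
    private SaxErrors() {}

    static String describe(SAXException e) {
        // getMessage() returns the constructor's message text, falling back to
        // the wrapped exception's message only when that text is null
        StringBuilder sb = new StringBuilder(String.valueOf(e.getMessage()));
        // getException() returns the embedded exception (or null); follow nested SAXExceptions too
        Exception cause = e.getException();
        while (cause != null) {
            // Build the line from class name + message so toString() is never involved
            sb.append("\n  caused by: ")
              .append(cause.getClass().getName()).append(": ").append(cause.getMessage());
            cause = (cause instanceof SAXException) ? ((SAXException) cause).getException() : null;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SAXException e = new SAXException("Could not diff expected XML", new IOException("stream closed"));
        System.out.println(describe(e));
    }
}
```

Because the cause lines are built from getClass().getName() and getMessage(), the output looks the same whichever SAXException.toString() your JDK ships.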

Well, what were you doing?

What I was trying to do was run an XSL transform and assert the result. Performing the transform is complicated, even with the help of the XMLUnit API, because there are many features in the transform that I want to test in isolation. After the transform, the results are harvested into a String, which I then pass to XMLUnit to diff. In my particular case the output was what I expected, but I wouldn’t know that for another thirty minutes as I danced along the XMLUnit stack trace trying to find out why my test failed. Apparently the fragment I was diffing was not well-formed, causing a SAXParseException to be thrown. The fragment I was diffing against wasn’t well-formed either. It took thirty minutes for the message “not well formed” to sink into my skull and materialize as, “Hey stoopid! You’re s’posed to have a single root element in an XML document!” Maybe if the XMLUnit team made that the error message, it would save plenty of developers a lot of headaches. At any rate, I dove into my XMLUnitExtension framework (because I’m starting to build extensions for all of the APIs I use) and smattered in some logic to better handle diffs on XML fragments.
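The “single root element” rule is easy to see with the JDK's built-in DOM parser. This is a minimal sketch, assuming the transform output carries no XML declaration or DOCTYPE: it wraps a multi-root fragment in a synthetic element so that it parses at all. The Fragments class and the fragment-root element name are hypothetical choices, not XMLUnit API:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

/** Hypothetical helper for XSLT output that may have several top-level elements. */
final class Fragments {
    private Fragments() {}

    // Assumed wrapper name; pick anything that cannot clash with your own elements
    static final String WRAPPER = "fragment-root";

    /** Gives a fragment the single root element a well-formed XML document requires. */
    static String wrap(String fragment) {
        return "<" + WRAPPER + ">" + fragment + "</" + WRAPPER + ">";
    }

    /** Parses xml as a whole document; returns null (after reporting why) if it is not well-formed. */
    static Document parseOrNull(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Line/column point at the first offending markup, e.g. the second root element
            System.err.println("Not well-formed at line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber() + ": " + e.getMessage());
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("<a/><b/>") != null);       // false: two root elements
        System.out.println(parseOrNull(wrap("<a/><b/>")) != null); // true
    }
}
```

Wrap the expected and the actual fragments the same way before diffing, so the synthetic root never shows up as a difference. The JDK's default DocumentBuilder error handler also prints a [Fatal Error] line to stderr before throwing; install your own ErrorHandler if that noise gets in the way.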

Remember this?


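    public void assertXMLEqualWithDetailedDiff(String message, String expectedXML, String actual)
            throws SAXException, IOException, ParserConfigurationException
    {
        // XMLUnit parses both strings itself
        Diff diff = new Diff(expectedXML, actual);
        if(! diff.identical())
        {
            // Guard against a null message so it doesn't print as "null"
            String msg = (message == null || EMPTY_STRING.equals(message)) ? EMPTY_STRING : message + "\n";
            // Re-assert on the raw strings: JUnit throws a ComparisonFailure,
            // which IntelliJ IDEA shows with a clickable diff link
            assertEquals(msg + new DetailedDiff(diff).toString(),
                    expectedXML, actual);
        }
    }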


(It’s a cool piece of code that lets me click the IDEA diff hyperlink when XMLUnit coughs up an issue comparing two XML inputs. One of my prouder moments.)

It now looks like this:


    public void assertXMLEqualWithDetailedDiff(String message, String expectedXML, String actual)
            throws SAXException, IOException, ParserConfigurationException
    {
        // Parse up front so a malformed input is reported together with its text
        DOMSource expectedDOM = toDOMSource(expectedXML);
        DOMSource actualDOM = toDOMSource(actual);
        Diff diff = new Diff(expectedDOM, actualDOM);
        if(! diff.identical())
        {
            String msg = (message == null || EMPTY_STRING.equals(message)) ? EMPTY_STRING : message + "\n";
            // Fall back to a string comparison so IDEA offers its diff link
            assertEquals(msg + new DetailedDiff(diff).toString(),
                    expectedXML, actual);
        }
    }


(Still not too bad looking on the surface, right?)

I initially wanted to build out the framework to support XML that isn’t well-formed, but decided it was not the right time for that. Instead I settled for dumping the contents of the offending XML into the message. That way I could at least see which XML was in error: the one supplied by the unit test or the one generated by the transform. What does that mean? It means I have to parse both XML inputs before I diff them. By the way, the Diff does its own parse anyway, so parsing everything twice is yuck! I looked briefly for an assertXMLWellFormed equivalent and found plenty of XML validation methods. The problem here is the thin line between well-formed and valid. I don’t want to determine whether it’s valid (I can’t in that particular area anyway); I just want to make sure it’s well-formed.

So in through the front door walks the ugly XML parsing logic. “Hey, wassup?” he says. “I hope you don’t mind, but I brought friends.” Before I can slam the door shut, hideous exception handling comes barging in behind him. Mr. Exception Handling brings bags for an overnight stay.
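One way to check well-formedness without dragging in validation is to feed the XML to a non-validating SAX parser whose handler ignores every event: if the parse completes, the document is well-formed, and no DTD or schema validation happens. A minimal sketch follows; assertXMLWellFormed is a hypothetical name echoing the method mentioned above, not something XMLUnit is known to ship:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical assertion: checks well-formedness only, never validity. */
final class WellFormed {
    private WellFormed() {}

    static void assertXMLWellFormed(String xml) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // well-formedness only: no DTD validation
        try {
            // DefaultHandler ignores every event; we only care whether the parse completes.
            // It is also the ErrorHandler, and its fatalError() rethrows the SAXParseException.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
        } catch (SAXParseException e) {
            // Put the offending XML in the message, so you can tell which input was broken
            throw new AssertionError("Not well-formed (line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber() + "): " + e.getMessage() + "\n" + xml, e);
        } catch (Exception e) {
            throw new IllegalStateException("Could not set up the SAX parser", e);
        }
    }
}
```

Because SAX builds no tree, this check is cheaper than a DOM parse, although the Diff will still parse the input again afterwards. A non-validating parser may still try to load an external DTD named in a DOCTYPE.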

The assertXMLEqualWithDetailedDiff snippet above is augmented by this:


    private DOMSource toDOMSource(String xml) throws ParserConfigurationException, IOException, SAXException
    {
        try
        {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return new DOMSource(doc);
        }
        catch(SAXParseException e)
        {
            // Re-throw with the offending XML and the parser's own complaint in the message.
            // SAXParseException has a constructor that takes the location directly, so no Locator is needed.
            // The original exception is not chained as a cause (see the SAXException gripe above);
            // copying its stack trace keeps the parser frames visible instead.
            SAXParseException withXml = new SAXParseException(
                    "Could not parse: " + xml + "\n" + e.getMessage(),
                    e.getPublicId(), e.getSystemId(), e.getLineNumber(), e.getColumnNumber());
            withXml.setStackTrace(e.getStackTrace());
            throw withXml;
        }
    }


So now you see what I’m dealing with, right? There’s gotta be a better way to code that. Right now I’m getting so far behind on the project that I can’t spend time being picture-perfect. If you’ve got any ideas, break me off some…

3 thoughts on “Java XML APIs are lacking!”

  1. May I say something that will definitely not be of ANY help?

    The reason there are no good XML APIs is because people are doing other stuff. Healthier stuff. Simple stuff like serialized HashMaps. XML is only used for very simple things and, sadly, there is much more XML being written by developers (e.g. Spring configuration) than XML being consumed.

    The only guys doing hardcore XML all have high-level tools like TIBCO BusinessWorks and pay a fortune for them.

    Maybe you should get one!🙂

  2. Tiago,

    I feel ya’ bro. In one way I feel like we suffer from N.I.H. syndrome sometimes. It’s true that there are many high level tools available for XML and they do cost a fortune. (I had been begging for a license for XMLSpy until just recently and that’s pretty pricey.) Then I think, hey there’s nothing more healthy than a daily dose of vitamin X. A pointy bracket a day keeps the doctor away! (That saying cannot be applied, unfortunately, to the deadlines.) I have this love hate relationship with XML. I can’t figure out if I love to hate it or if it hates me for the love I have for it, or if I hate my love in spite of the hate XML loves to give while I’m hating (or is that loving?) the thing that loves… where was I? Oh yeah! XML is banging!🙂 It’s banging the living crap outta me!

    There’s a reason I’m in this pointy-bracket paradigm. It’s definitely not by choice. We started with a simple idea for reporting and realized that we needed to support multiple outputs, like PDF, HTML, raw print, etc. Of course FOP stood out as a shining answer. That’s what started the bulk of my XML drama. All we needed to do was run SQL against our DB, combining user options and preferences to customize the queries, and then create some XML that contained the results. We could then style the XML and format it with… and I’ll let you guess the technology choices here… no I won’t… ok, yes I will… nah, forget it, I’ll just tell you… XSLT and XSL-FO! The big idea was that we could develop transforms on the same XML to produce outputs that FOP didn’t support. That idea went far enough that now I’m supposed to build on the same concept to support the variable output formats of system orders. The big twist here is that the user is supposed to have complete control over the formats and layouts and such. By now I know enough XSLT to try something really stoopid and harebrained, like writing a report designer with it. I mean, to me it only makes sense. I’m not going to write an XSLT editor user-friendly enough for someone like my mother to use, so the next best thing is to develop a simple grammar that attempts to capture the user’s layout preferences and then generate the same kind of XSLT/XSL-FO mix I had been writing by hand.

    Back to my original long-winded point. I learned that you can do much more than simple things with XML. (Like build an armada of angle-bracketed warships to assault the Pentagon!) It’s kinda like Silly Putty when you get the hang of it. Used incorrectly (as it is most of the time), it quickly makes a mess. You ever sit a small child down with some Silly Putty and no clear objective? A quick and total mess is assured! However, people have sculpted fine works of art with the same material. I’m somewhere in the middle, but closer to the small child. Seriously though, the TrAX API is off the hook. You should check it out sometime. With a custom SAX event generator you can present virtually anything as XML and pipe it through a transform. (You can push it through a chain of transforms, at that.) I’m taking the idea a step further (and I wanna blog about it next): I developed a method broadcaster that I can use to feed multiple content handlers simultaneously. The net effect would be… well, I’ll keep that quiet until I can work it into a blog entry.

  3. Pingback: Anonymous
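Chaining transforms with TrAX usually goes through SAXTransformerFactory: each downstream stylesheet becomes a TransformerHandler (a SAX ContentHandler), and the upstream transform writes into it through a SAXResult. This is a minimal sketch of that general idiom using in-memory strings; it is not the method broadcaster mentioned in the comment above, and TransformChain.chain is a hypothetical name:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: pipes xml through stylesheets[0], then stylesheets[1], and so on. */
final class TransformChain {
    private TransformChain() {}

    static String chain(String xml, String... stylesheets) {
        if (stylesheets.length == 0) {
            throw new IllegalArgumentException("need at least one stylesheet");
        }
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("this TrAX implementation cannot chain via SAX");
        }
        SAXTransformerFactory factory = (SAXTransformerFactory) tf;
        StringWriter out = new StringWriter();
        try {
            // Build the chain back to front: the last stylesheet serializes into the StringWriter
            Result next = new StreamResult(out);
            for (int i = stylesheets.length - 1; i > 0; i--) {
                TransformerHandler handler = factory.newTransformerHandler(
                        new StreamSource(new StringReader(stylesheets[i])));
                handler.setResult(next);
                next = new SAXResult(handler); // upstream output arrives here as SAX events
            }
            // The first stylesheet parses the input XML and pushes SAX events down the chain
            factory.newTransformer(new StreamSource(new StringReader(stylesheets[0])))
                   .transform(new StreamSource(new StringReader(xml)), next);
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Calling chain with a single stylesheet is just an ordinary transform, so running the same report XML through chain(xml, htmlXsl) and chain(xml, textXsl) covers the “several outputs from one XML” case as well (htmlXsl and textXsl being whatever stylesheets you have). In real code you would compile each stylesheet once into a Templates object, which is thread-safe, and reuse it.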
