Saturday, April 25, 2009

NullPointered

As I mentioned in one of the previous posts, I have been outsourced to a project that uses a home-made, rather naive web framework. The framework has something in common with Struts: an abstract Action class that every action in the system must extend. The main method of this class, which concrete actions must implement, has a signature of this shape:


abstract protected Result perform(...) throws ProcessDataException, NullPointerException;

So, how would you implement the "Hello World!" example in this framework? I guess more or less this way:


public Result perform(...) throws ProcessDataException, NullPointerException {
    throw new NullPointerException("Hello World!");
}

Well, this looks like a good candidate for The Daily WTF :)

Wednesday, April 15, 2009

Ext Core - a new weapon in your arsenal

Recently, the Ext-JS team announced a new product, Ext Core. At the moment it is in beta, but I guess we should see the final version soon. The good news is that Ext Core will be available under the free MIT license for any use, as opposed to Ext-JS, which is not free for commercial use.


Why is this announcement so important? Well, Ext-JS is an excellent JavaScript library, so if you want to build a really big, pure-JavaScript client, Ext-JS is the best choice. However, such clients are still not that popular, and server-side frameworks remain mainstream. So what most developers do is enhance server-side-generated web pages with JavaScript add-ons. Low-level DOM manipulation libraries like Prototype or jQuery are perfect for this purpose. They are lightweight and free, and that's why they got so popular. Obviously, you could use Ext-JS for the same purpose, because it contains low-level tools capable of doing the same thing. But the whole power of Ext-JS, the component-based UI, wouldn't be used at all. So would you pay for Ext-JS, full of features you won't use, if jQuery can do the job for free? Obviously not. This, I believe, has been an important barrier to Ext-JS adoption.


Now, from what I understand, starting from version 3, Ext will be split into two distinct layers (and products). Ext Core will contain the low-level code for dealing with DOM elements, with an API concentrated around the Ext.Element class, while Ext-JS will be the full distribution: the bundled Ext Core plus all the UI widgets, tools and managers, with an API concentrated around Ext.Component.


This means that Ext Core becomes a viable alternative to jQuery: it is also lightweight and free, and has fantastic community support on the Ext-JS forum (I've never seen another forum where questions get answered so quickly, despite really high traffic). I haven't compared Ext Core vs jQuery in terms of API, to see which library is more powerful and/or simpler. Many concepts are similar, even if they are named or implemented a bit differently. In fact, I haven't used the low-level features of Ext-JS much before, because when you use Ext components, you usually don't need to bother with DOM-level details. Ext Core is at the beta stage, so the API can probably still evolve a bit. (For example, it seems strange to me that there is no simple method in Ext.Element to get an attribute value of the element; there is only the namespaced version getAttributeNS(namespace, name), but why is there no getAttribute(name)? How often do you come across namespaced attributes in HTML pages?)
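
To give a feeling of how close the two APIs are, here is a quick sketch of the same DOM work done both ways (assuming Ext Core keeps the Ext.Element methods known from Ext-JS 2.x; the element id and values are made up for the example):

// Ext Core: grab an element (wrapped as Ext.Element) and manipulate it
var el = Ext.get('header');
el.addClass('highlight');
el.setStyle('color', '#336699');
el.on('click', function() { alert('clicked'); });

// the jQuery equivalent
var $el = jQuery('#header');
$el.addClass('highlight');
$el.css('color', '#336699');
$el.click(function() { alert('clicked'); });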


One big advantage of Ext-JS is documentation. I've always found the Ext-JS API documentation clearer and more complete than that of jQuery. Ext Core, I hope, will not be worse in this area. At the moment there is API documentation available, but it also looks like a "beta" version, because there is no details section (like in the standard Ext-JS docs), and method links point directly to the source instead of to the details section. Linking to the source is a good idea, but I shouldn't be forced to go to the source every time I want to check a full method description. I hope this will be fixed before the final release. There is also an Ext Core manual available, which looks very promising, even if it is not finished yet either (there are typos and errors, and some sections are only sketched). One thing is very interesting: the authors of the manual teach you Ext Core with the aid of Firebug: there are plenty of examples based on Firebug. That's really cool. On the other hand, the section describing effects is poor: you see a red rectangle and the code which is supposed to apply some effects to it (fade, switch off, ...), but you can't execute the code and see it in action. Quite annoying. But I hope this will also be fixed before Ext Core goes final.


Ext Core, being free, lightweight and high-quality, should attract many developers. And obviously, in the longer term, those who start with Ext Core and get to know it well will be more eager to go one step further and switch to the "full version", Ext-JS. Things are getting more and more interesting in the JavaScript framework circle. With the new player, I guess we'll soon see some significant changes in JavaScript library usage statistics.

Thursday, April 9, 2009

Fortunately, XSLT is dead (in web frameworks)

Several years ago there was a trend in Java web frameworks to use XML processing as the foundation of the framework logic: take the data from the database and present it as an XML document (either converted from relational data, or fetched from an XML-capable database), then transform it to HTML using XSLT. Some books were written about it, and some frameworks created (like Cocoon). After a while the hype started to decline, and today virtually no modern web framework applies this approach, I think.


Actually, I had never used it myself, but I liked the idea at the time. It seemed clear and powerful. Funny that only now do I have a chance to check how it works in practice. And at least now I know why this approach failed.


For two weeks now, I have been outsourced to help a company with their web project. They use their own home-made web framework, which talks to their own home-made document-repository system. Unfortunately, both seem buggy and unstable, though they say this is not the first project they have used them in (I find that hard to believe, really). I guess they would be better off using the standard Java Content Repository (JSR-170) for the document repository, and some modern web framework instead of their home-grown one. If they insisted on XML/XSLT transformations, they could use Cocoon. At least there would be more documentation available, and it would be well-tested and stable. But OK, they are not the first company to suffer from the NIH syndrome; or maybe the guys are simply too lazy or overloaded to look around for other stuff. The interesting question is: how does XML-based processing work in practice? The short answer is: very poorly. And the weakest link in the whole chain is XSLT.


XSLT bloat


I won't dare to say that XSLT is worthless - perhaps in some contexts it can be useful, especially for transforming one document tree into another (valid) document tree, i.e. from DOM to DOM. XSLT guarantees that the input and output documents are valid XML - this is crucial e.g. in SOA applications, where one document is transformed into other documents to be processed by machines. (X)HTML is a document tree too, at least formally, but from the web browser's point of view perfectly valid XHTML is nice but not crucial, and from the web designer's or developer's point of view the DOM behind it doesn't matter at all, so making the template valid XML is of no importance. For dynamic generation of HTML pages it is in most cases much easier to treat the HTML code as unformatted text and build the page template by embedding special processing directives in that text. This is the approach taken by JSP (first with scriptlets, then with JSTL), Velocity, FreeMarker, and other technologies. None of them uses strict XML as the template. On the opposite side we have JSPX (JSP using strict XML), which never caught on - I guess many Java developers have never even met it - and XSLT.


I've used JSP with JSTL a lot. It wasn't perfect, but it worked. Now I have to do the same things with XSLT, and it's a nightmare. Things that took me half an hour with JSP take several hours in XSLT. This is the list of things I hate the most in XSLT:



  1. Conditional attributes. For example: how do you hide a table row (using a different CSS style) based on some CONDITION in XSLT? See:

    <tr>
      <xsl:attribute name="style">
        <xsl:choose>
          <xsl:when test="CONDITION">
            <xsl:value-of select="'visibility: visible'"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="'visibility: collapse'"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:attribute>
      ...
    </tr>

    and now the same with JSP 1.x:

    <tr style='visibility:<%=CONDITION ? "visible" : "collapse"%>'>
    ...
    </tr>

    or with JSP 2.x:

    <tr style='visibility:${CONDITION ? "visible" : "collapse"}'>
    ...
    </tr>


  2. Nested loops. In JSTL, the <c:forEach> tag has a var attribute: the variable that gets assigned the current element of the collection during iteration. In nested loops you choose different var names, so you have easy access to the variable at any level. The analogous <xsl:for-each> in XSLT has no var attribute. You must use additional variables as child nodes, or some other workaround (see the sketch below this list). It's very easy to get lost.

  3. Every XML-like fragment which is not actual XML must be escaped. Say you have an inline JavaScript call which appends a row to a table:
    onclick="append('<tr><td></td><td></td></tr>')"

    This works quite well in JSP, but will blow up in XSLT with a "could not compile stylesheet" message. You must escape each < character:
    onclick="append('&lt;tr>&lt;td>&lt;/td>&lt;td>&lt;/td>&lt;/tr>')"

    Now nobody can tell at first glance what is going on here.

  4. The functional approach applied in the XSLT design, instead of the procedural one well known to all programmers, makes "thinking in XSLT" very hard. The "normal" approach (JSP, Velocity, etc.) takes an HTML template, starting from the familiar <html><head>...<body>..., and looks for special markers where data from the "model" is inserted. This data can be a Java object, or XPath-extracted data from another XML document. XSLT does it completely the other way round: it starts with <template match="..."><apply-templates>..., so it takes the XML data document first and tries to manipulate its content to obtain the other document. As I said, in SOA processing this is fine. But in HTML generation it looks completely alien. I must say I have always had problems with mentally visualizing this process.

  5. No formal XML schema for XSLT 1.0 exists. At least I couldn't find one - there is only an unofficial DTD somewhere. This ruins IDE code-completion abilities. And XSLT is so complicated that you can't simply learn it in a day (or even a week), so some inline help would be of real use.
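
To illustrate point 2, here is a sketch of the nested-loop difference (the departments/employees data is made up for the example). First JSTL, where each loop gets its own variable:

<c:forEach var="dept" items="${departments}">
  <c:forEach var="emp" items="${dept.employees}">
    <%-- both loop variables are visible here --%>
    <td>${dept.name}: ${emp.name}</td>
  </c:forEach>
</c:forEach>

And the XSLT counterpart, where the context node changes inside the inner loop, so the outer element must be saved in a variable first:

<xsl:for-each select="departments/department">
  <xsl:variable name="dept" select="."/>
  <xsl:for-each select="employee">
    <td><xsl:value-of select="$dept/@name"/>: <xsl:value-of select="@name"/></td>
  </xsl:for-each>
</xsl:for-each>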


Now take all those points together and multiply them by all the places on a web page where dynamic content generation happens. Lengthy, complicated, with all that escaping, plus complicated XPath expressions, plus this "other-way-round" functional approach. That is developer horror. And worse, it is maintainer hell. In the current project, after two weeks I'm not able to understand sections I wrote a few days ago. I can't imagine looking at them after several months. What's worse, the template code is so different from the resulting HTML that navigating the template and finding the actual place I need to edit always takes too much time.


After two weeks I'm fed up with XSLT. It is absolutely unproductive in web frameworks. Now I know why no XML-based framework ever got really popular. And I know I was completely wrong three years ago, when I would have bet that JSPX would shortly replace JSP. Fortunately, it didn't.


Any alternatives?


Now that we know XSLT is evil, which view technology should we use in our projects? Stick with JSP? For a JSF-based project I wouldn't use JSP for sure, because it simply doesn't play well with JSF; instead I would go for something like Facelets. But actually I wouldn't go for JSF at all any longer (that's another story of a thing I used to believe in and predict a long life for, and that finally disappointed me completely). So for non-JSF projects there are JSP/JSTL, Velocity, FreeMarker, GSP for Grails projects, SiteMesh for page layouts, and perhaps other technologies that I'm not aware of.

JSP/JSTL is the most widely used and best known, and has the best tool support, even if it is probably the worst one of the group. Take those crazy SQL-based tags in JSTL, or the funny standard JSP tags for dealing with request parameters. Why didn't they just take the JSTL tags that are used all the time (if, for-each, format, ...) and make them part of standard JSP? Why do I always have to include them as a separate library on Tomcat? Besides, I said earlier that the input template doesn't have to be valid XML, but I must say I don't like constructs like <input value="<c:out ...>">. A tag inside another tag's attribute - this looks horrible.

That is why I now think that template directives should not be built from XML tags: all those custom tags, JSTL tags etc. are the wrong direction. Such code is simply too hard to read, because it resembles HTML too much (OK, templates are not always used to generate HTML, but I guess in about 95% of cases they are). The better approach is to use some special characters, like the # directives in Velocity, the [# directives in FreeMarker, or the EL syntax ${...} in JSP/JSTL (though EL is very limited, and it is not actually a directive; besides, the assumption that only getters can be called from EL was a serious mistake: e.g. you cannot check a collection's size, because there is no getSize() method, only size()). Compare the if-then-else block created with JSP/JSTL:


<c:choose>
  <c:when test="${expr1}">
    ...
  </c:when>
  <c:when test="${expr2}">
    ...
  </c:when>
  <c:otherwise>
    ...
  </c:otherwise>
</c:choose>

with the same written with Velocity:

#if (expr1)
...
#elseif (expr2)
...
#else
...
#end

Which one is easier to read?
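
For completeness, the same block sketched in FreeMarker, using its alternative square-bracket syntax (the default syntax uses <#if> ... </#if> instead):

[#if expr1]
...
[#elseif expr2]
...
[#else]
...
[/#if]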


FreeMarker can also be a good replacement in existing projects that already use XSLT. From what I see, you can bind the XML data document to a variable, and then access it with XPath queries from the FreeMarker template to extract the data. Velocity offers a similar thing, called DVSL, but it doesn't look good to me, because it applies the same functional, other-way-round, alien-looking "apply-templates" approach as XSLT.
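
A minimal sketch of how this could look, assuming FreeMarker's standard freemarker.ext.dom.NodeModel wrapper (the file names and the projects/project XML structure are made up for the example):

import java.io.File;
import java.io.OutputStreamWriter;
import java.util.HashMap;
import java.util.Map;

import freemarker.ext.dom.NodeModel;
import freemarker.template.Configuration;
import freemarker.template.Template;

public class XmlToHtml {
    public static void main(String[] args) throws Exception {
        Configuration cfg = new Configuration();
        cfg.setDirectoryForTemplateLoading(new File("templates"));

        // parse the XML data document and expose it to the template as "doc"
        Map<String, Object> model = new HashMap<String, Object>();
        model.put("doc", NodeModel.parse(new File("projects.xml")));

        Template template = cfg.getTemplate("projects.ftl");
        template.process(model, new OutputStreamWriter(System.out));
    }
}

In the template you can then navigate the document directly (e.g. doc.projects.project[0].@name) or with an XPath query like ${doc["//project[1]/@name"]} (XPath queries require Jaxen or Apache Xalan on the classpath, as far as I know).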


Velocity and FreeMarker also integrate well with Spring. From my point of view, the only serious drawback of those technologies compared to JSP is IDE support. In the company where I do my painful job with XSLT, "the only IDE" is NetBeans (I think it is not obligatory, but simply all the guys use it, and the projects are built not with an external Ant script or Maven, but simply by NetBeans, so it is hard to use any other IDE anyway). I tried to find a plugin with Velocity or FreeMarker support for NetBeans (at least for syntax highlighting), but it looks like there is none. That's really strange to me - those technologies have been on the market for many years, and are quite popular, I think. So why is there no support for them in the second most popular Java IDE? For Eclipse the situation is better, from what I see.


So if you start a new project, think twice (or ten times) before jumping into XSLT. And if you use Eclipse, you can even think twice before using JSP/JSTL: Velocity or FreeMarker might be a better option.


Footnote: this article was selected for the "Developer's Perspective" column of the Javalobby News newsletter from Apr 21, 2009, and also reposted on DZone, where it started a very interesting discussion. If you are interested, look at the comments there: you will also find some good points in favor of XSLT, to complete the picture.

Tuesday, April 7, 2009

Relocating resources with RESTful web services

Recently I spent a long time redesigning the internals of a RESTful web service. In that service I try to follow the rules presented in the "REST canonical reference" book, RESTful Web Services. I decided to use "meaningful" URIs based on resource names. A resource name is editable, so when it changes, the URI must change too (the alternative is "non-meaningful" URIs based on the immutable, usually auto-generated IDs of the resources). Say I have a resource named "myProject", addressed by the URI http://myservice.org/v1/Projects/myProject. To change its name to "ourProject", one needs to send a PUT request to this URI with the new name in the request body. Say we use form-encoded representations in requests; then we would send a PUT request to http://myservice.org/v1/Projects/myProject with the request body name=ourProject. The server processes the request, changes the name in the database, and must send a response back to the client. What should the response be? Here the interesting part starts.
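
On the wire, the rename request would look something like this (a sketch, with the headers reduced to the essentials):

PUT /v1/Projects/myProject HTTP/1.1
Host: myservice.org
Content-Type: application/x-www-form-urlencoded

name=ourProject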


First implementation


According to the book, if the PUT request causes the URI to change, the response should be 301 Moved Permanently with the header Location: http://myservice.org/v1/Projects/ourProject, in order to let the client know the new location. I've implemented it this way, using Spring 3.0 REST support (with a bunch of my custom classes and annotations):



@RequestMapping(method=PUT, value="/projects/{projectName}")
public HttpResponse createOrUpdate(@PathVariable String projectName, PropertyValues newValues, @RequestUrl String url) {

    SaveResult result = projectService.createOrUpdate(projectName, newValues);
    if (result.isCreated()) {
        // a new resource was created: 201, Location points at the request URL
        return new HttpResponse(SC_CREATED).setHeader("Location", url);
    } else if (result.isRenamed()) {
        // the name (and thus the URI) changed: build the new URL and return 301
        String newUrl = url.substring(0, url.lastIndexOf("/") + 1) + result.getObject().getName();
        return new HttpResponse(SC_MOVED_PERMANENTLY).setHeader("Location", newUrl);
    } else {
        // a plain update: nothing special to report
        return new HttpResponse(SC_OK);
    }
}

So when the resource is renamed, I construct a new URL and return the 301 code with the new location. This should have worked, but it didn't. It looked like this response code triggered an immediate GET request to the new location. Initially I thought it was done internally on the server, either by Spring or by Tomcat, but it turned out to be triggered by the client. The same behavior occurred with different clients: Firefox, IE, and the pure Java client I use for testing web services, the Wiztools.org RESTClient. I don't know what Java HTTP library the latter tool uses internally, but I guess the behavior is implemented in that library, not in the client itself.
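
Schematically, the observed traffic looked like this (the second request is the surprising part):

PUT /v1/Projects/myProject   ->  301 Moved Permanently
                                 Location: http://myservice.org/v1/Projects/ourProject
GET /v1/Projects/ourProject  ->  200 OK   (sent automatically by the client)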


I started searching for information about this problem, but Google was of very little help for queries like "PUT with 301 Moved Permanently". Finally I looked at RFC 2616, the HTTP 1.1 specification. Here is an excerpt from the description of code 301:


If the 301 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.

So it looks like all the browsers and other clients are broken in this area: they should not redirect automatically to the new location. I had to find some workaround for this.


A bit of theory behind it


Suppose that after issuing the PUT request which returned the 301 code, the browser actually did ask me for permission to redirect. And suppose I gave the permission. What should the browser (or other client) do? Obviously, send a request to the new URL. But which request? GET? Or PUT (as the initial request was PUT)? What does the 301 response actually mean in this case? My first thinking was: I send a PUT request to http://myservice.org/v1/Projects/myProject, which changes the data of that resource. If the URL is not changed, the response is 200. If the URL is changed (say, to http://myservice.org/v1/Projects/ourProject) as a result of the new data submission, the server updates the resource and lets me know the new URL with a 301 response containing the Location header. (At least, this is the logic roughly presented in the aforementioned book, I think.) So, is there any point in doing any redirection in such a case? The 301 code is only information here, not an obligation to do any redirect. I sent the PUT request to update the resource, not GET to retrieve it. So sending GET to the relocated resource (either with or without permission) makes no sense, as I didn't want to retrieve the resource, only to update it.


But then I looked at the HTTP spec again, at the general description of the 3xx code series:


This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD.

This clearly indicates that action needs to be taken, not can be taken. Interesting. So I infer that the 301 code is actually an obligation to redirect. (BTW, this quote says the action can be carried out without user interaction only if the second request is GET or HEAD. At the same time, the first quote says the same, but about the first request. Is it a bug in the spec? Logically, both make sense...) So perhaps my thinking was wrong. Imagine a different situation: I retrieve the myProject representation. After some time, I want to update its description, so I send PUT with the new description to the http://myservice.org/v1/Projects/myProject URL. But in the meantime, another user has changed the project's name to ourProject. If the server is able to remember such changes, then when my request to http://myservice.org/v1/Projects/myProject comes in, it sees that myProject no longer exists, but remembers that it was moved to http://myservice.org/v1/Projects/ourProject, so it sends a 301 response with the new location, telling me "this is the new location of your resource, send your PUT request there". Now the browser can ask me for confirmation: "the resource has been relocated, should I resend the same PUT request to the new URL?". If I say "yes", it sends PUT to the new location and updates the description of ourProject. So the second request should be PUT, not GET. The note found below the 301 description in the spec also indirectly points out that the original method should be preserved for the second request:


Note: When automatically redirecting a POST request after receiving a 301 status code, some existing HTTP/1.0 user agents will erroneously change it into a GET request.

Well, it looks like not only "some existing HTTP/1.0 user agents", but also modern HTTP/1.1 clients do...


(Note: in the presented scenario, a 409 response indicating a concurrent modification problem would probably be more accurate; if another user renamed the resource, they could also have changed the description, so the current user would probably be more interested in retrieving the new representation first, and only then updating it.)


But GET is safe, right?


Obviously, the confirmation (in any form, say a dialog box) would work better in the case of the client being a human. If the client is a program (and in the case of REST web services it usually will be), for example a single-page, rich-client JavaScript application, the confirmation would be a bit more problematic. But that's an academic discussion at this moment, because browsers are buggy with 301 codes and don't ask for confirmation anyway. Even if they did, it would probably be beyond the control of the actual JavaScript application (as in the case of the basic authentication dialog pop-up), so it would be of very little practical use.


So what is the solution? The first idea: in a valid REST service, GET requests are safe, so such a request being sent automatically by the browser after it obtains the 301 code would do no harm, apart from increasing the load on the server a bit (without real need). But I see two problems here. First, the redirection is handled by the browser (or other HTTP library) automatically, so the actual client doesn't have access to the response from the first request (the PUT), only to the one from the second request. If the response to the PUT request contained some special, important headers, they will be lost. The second problem is more serious. In my service I use basic HTTP authentication, with a small trick to bypass the browser authentication dialog (I'll describe this technique in one of the next posts). After the user gets authenticated, I set a default authentication header for all Ajax requests, using the Ext-JS Ajax.defaultHeaders property. But it happens that in the case of a 301 response, the second, after-redirect request is launched without Ext-JS's awareness, so the authentication header is not applied, and the actual response to that request is a 401 or 403 error. So this seems like a dead end.
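
A sketch of that setup (the credentials are made up; note that the standard HTTP header name for this is Authorization):

// after a successful login, attach the credentials to every Ext-JS Ajax call
Ext.Ajax.defaultHeaders = {
    Authorization: 'Basic dXNlcjpzZWNyZXQ='  // base64("user:secret"), made up for the example
};

A redirect followed internally by the browser's HTTP machinery bypasses this setting, which is exactly the problem described above.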


Say goodbye to 301


After thinking about it, I came to the conclusion that the authors of the book were probably wrong: when the URL changes as a direct result of a PUT request, the 301 code is not the proper response. This conclusion is based on the aforementioned phrase from the HTTP 1.1 spec, describing the 3xx codes as an indication that further action needs to be taken by the user agent in order to fulfill the request. So a 301 response is not meant to be just information about a resource metadata change. It is rather a kind of directive for the client: indeed, most people using HTTP client libraries written in different languages usually expect the 3xx redirects to be handled automatically by the library itself.


So what response code should be used? 204 No Content seems to be the best choice - see the spec description for this code:


The server has fulfilled the request but does not need to return an entity-body, and might want to return updated metainformation. The response MAY include new or updated metainformation in the form of entity-headers, which if present SHOULD be associated with the requested variant.

This looks like a perfect fit. The server has fulfilled the request and updated the resource, but doesn't have to return the resource representation, as it is already known to the client. But as a result of the change, part of the resource metadata has changed: namely, the URI of the resource. So the response contains the Location header with the updated metadata. Simple, and works well with all (or most) HTTP clients.
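
Applied to the controller shown earlier, the rename branch reduces to something like this (a sketch reusing my HttpResponse helper; SC_NO_CONTENT is the usual constant for the 204 status):

} else if (result.isRenamed()) {
    // resource updated and relocated: 204 No Content plus the new location
    String newUrl = url.substring(0, url.lastIndexOf("/") + 1) + result.getObject().getName();
    return new HttpResponse(SC_NO_CONTENT).setHeader("Location", newUrl);
}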