Tuesday, April 7, 2009

Relocating resources with RESTful web services

Recently I spent a long time redesigning the internals of a RESTful web service. In that service, I try to follow the rules presented in the "REST canonical reference" book, RESTful Web Services. I decided to use "meaningful" URIs based on resource names. A resource name is editable, so when it changes, the URI must change too (the alternative is to use "non-meaningful" URIs based on immutable, usually auto-generated resource IDs). Say I have a resource named "myProject" addressed by the URI http://myservice.org/v1/Projects/myProject. To change its name to "ourProject", one sends a PUT request to this URI with the new name in the request body. Say we use form-encoded representations in requests; then we would send a PUT request to http://myservice.org/v1/Projects/myProject with the request body name=ourProject. The server processes the request, changes the name in the database, and must send a response back to the client. What should the response be? Here the interesting part starts.
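To make the rename request concrete, here is a minimal sketch of how a Java client could build it. It assumes Java 11+ and the standard java.net.http API; the class name RenameRequest and the method build are made up for illustration, and the request is only constructed, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class RenameRequest {
    // Builds (but does not send) a PUT request carrying the new name as a form-encoded body.
    static HttpRequest build(String resourceUrl, String newName) {
        String body = "name=" + URLEncoder.encode(newName, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(resourceUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://myservice.org/v1/Projects/myProject", "ourProject");
        // Prints the method and target URI of the prepared request.
        System.out.println(request.method() + " " + request.uri());
    }
}
```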

First implementation

According to the book, if a PUT request causes the URI to change, the response should be 301 Moved Permanently with the header Location: http://myservice.org/v1/Projects/ourProject to let the client know the new location. I implemented it this way, using Spring 3.0 REST support (with a bunch of my custom classes and annotations):

@RequestMapping(method=PUT, value="/projects/{projectName}")
public HttpResponse createOrUpdate(@PathVariable String projectName, PropertyValues newValues, @RequestUrl String url) {
    SaveResult result = projectService.createOrUpdate(projectName, newValues);
    if (result.isCreated()) {
        // New resource: report where it lives.
        return new HttpResponse(SC_CREATED).setHeader("Location", url);
    } else if (result.isRenamed()) {
        // Renamed resource: replace the last path segment with the new name.
        String newUrl = url.substring(0, url.lastIndexOf("/") + 1) + result.getObject().getName();
        return new HttpResponse(SC_MOVED_PERMANENTLY).setHeader("Location", newUrl);
    } else {
        // Plain update: the URL is unchanged.
        return new HttpResponse(SC_OK);
    }
}

So when the resource is renamed, I construct a new URL and return a 301 code with the new location. This should have worked, but it didn't. It looked like this response code triggered an immediate GET request to the new location. Initially I thought this was done internally on the server, either by Spring or by Tomcat, but it turned out to be triggered by the client. The same behavior occurred with different clients: Firefox, IE, and a pure Java client used for testing web services, the WizTools.org RESTClient. I don't know which Java HTTP library the latter tool uses internally, but I guess the behavior is implemented in that library, not in the client itself.
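The URL construction in the handler above can be isolated into a small helper. This is only a sketch: the class and method names are hypothetical, and the percent-encoding step is my own addition (the original handler concatenates the raw name), using URLEncoder as a rough approximation of path-segment encoding.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class RenamedUrl {
    // Replaces the last path segment of the URL with the (encoded) new resource name.
    static String replaceLastSegment(String url, String newName) {
        String base = url.substring(0, url.lastIndexOf('/') + 1);
        // URLEncoder targets form encoding, so map "+" back to "%20" for use in a path.
        String encoded = URLEncoder.encode(newName, StandardCharsets.UTF_8).replace("+", "%20");
        return base + encoded;
    }

    public static void main(String[] args) {
        // Prints http://myservice.org/v1/Projects/ourProject
        System.out.println(replaceLastSegment("http://myservice.org/v1/Projects/myProject", "ourProject"));
    }
}
```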

I started searching for information about this problem, but Google was of very little help for queries like "PUT with 301 Moved Permanently". Finally I looked at RFC 2616, the HTTP 1.1 specification. Here is an excerpt from the description of code 301:

If the 301 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.

So it looks like all browsers and other clients are broken in this area: they should not redirect automatically to the new location. I had to find some workaround for this.
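The rule from the excerpt can be written down as a one-line check. The sketch below is just my reading of the quoted sentence, with a hypothetical class and method name; it is not how any particular browser or library decides.

```java
class RedirectPolicy {
    // Per the quoted RFC 2616 text, a 301 may be followed without user
    // confirmation only when the original request was GET or HEAD.
    static boolean mayFollowAutomatically(String method) {
        return "GET".equals(method) || "HEAD".equals(method);
    }

    public static void main(String[] args) {
        System.out.println("PUT: " + mayFollowAutomatically("PUT")); // prints "PUT: false"
        System.out.println("GET: " + mayFollowAutomatically("GET")); // prints "GET: true"
    }
}
```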

A bit of theory behind it

Suppose that after issuing a PUT request that returned a 301 code, the browser actually did ask me for permission to redirect. And suppose I gave that permission. What should the browser (or other client) do? Obviously, send a request to the new URL. But what request? GET? Or PUT (as the initial request was PUT)? What does a 301 response actually mean in this case? My first thought was: I send a PUT request to http://myservice.org/v1/Projects/myProject, which changes the data of that resource. If the URL does not change, the response is 200. If the URL changes (say to http://myservice.org/v1/Projects/ourProject) as a result of the new data, the server updates the resource and lets me know the new URL with a 301 response containing a Location header. (At least, this is roughly the logic presented in the aforementioned book, I think.) So is there any point in doing any redirection in such a case? The 301 code is only information here, not an obligation to redirect. I sent a PUT request to update the resource, not a GET to retrieve it. So sending a GET to the new location (with or without permission) makes no sense, as I didn't want to retrieve the resource, only update it.

But then I looked at the HTTP spec again, at the general description of the 3xx series of codes:

This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD.

This clearly indicates that action needs to be taken, not that it can be taken. Interesting. So I infer that a 301 code is actually an obligation to redirect. (BTW, this quote says the action can be carried out without user interaction only if the second request is GET or HEAD, while the first quote says the same thing about the first request. Is it a bug in the spec? Logically, both make sense...)

So perhaps my thinking was wrong. Imagine a different situation: I retrieve the myProject representation. After some time, I want to update its description, so I send a PUT with the new description to the http://myservice.org/v1/Projects/myProject URL. But in the meantime, another user has changed the project's name to ourProject. If the server is able to remember such changes, then when my request to http://myservice.org/v1/Projects/myProject arrives, it sees that myProject no longer exists but remembers that it was moved to http://myservice.org/v1/Projects/ourProject, so it sends a 301 response with the new location, telling me "this is the new location of your resource, send your PUT request there". Now the browser can ask me for confirmation: "the resource has been relocated, should I resend the same PUT request to the new URL?". If I say "Yes", it sends the PUT to the new location and updates the description of ourProject. So the second request should be PUT, not GET. The note in the spec below the 301 description also indirectly suggests that the original method should be preserved for the second request:

Note: When automatically redirecting a POST request after receiving a 301 status code, some existing HTTP/1.0 user agents will erroneously change it into a GET request.

Well, it looks like not only some existing HTTP/1.0 user agents do this, but modern HTTP/1.1 clients too...

(Note: in the presented scenario, a 409 response indicating a concurrent modification problem would probably be more accurate; if another user renamed the resource, they could also have changed the description, so the current user may be more interested in retrieving the new representation first, and only then updating it.)

But GET is safe, right?

Obviously, confirmation (in any form, say a dialog box) works better when the client is a human. If the client is a program (and with REST web services it usually will be), for example a single-page, rich-client JavaScript application, confirmation is more problematic. But that's an academic discussion at this point, because browsers are buggy for 301 codes and don't ask for confirmation anyway. Even if they did, this would probably be beyond the control of the actual JavaScript application (as with the basic authentication dialog pop-up), so it would be of very little practical use.

So what is the solution? The first idea: in a valid REST service, GET requests are safe, so a GET sent automatically by the browser after receiving a 301 code would do no harm, apart from slightly increasing the load on the server (without real need). But I see two problems here. First, the redirection is handled automatically by the browser (or other HTTP library), so the actual client doesn't have access to the response to the first request (PUT), only to the response to the second one. If the response to the PUT request contained some special, important headers, they are lost. The second problem is more important. My service uses basic HTTP authentication, with a small trick to bypass the browser's authentication dialog (I'll describe this technique in one of the next posts). After the user is authenticated, I set a default authentication header for all Ajax requests, using the Ext JS code Ext.Ajax.defaultHeaders = {Authorization: 'Basic HASH'}. But in the case of a 301 response, the second, post-redirect request is launched without Ext JS being aware of it, so the authentication headers are not applied, and the actual response to that request is a 401 or 403 error. So this seems like a dead end.
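Both problems come from the redirect being followed outside the application's control. A browser gives Ajax code no such hook, but a non-browser client that handles the 301 itself could rebuild the request for the new location while keeping the method, body and headers. The following is a minimal sketch assuming Java 11+ and java.net.http; ManualRedirect and retarget are hypothetical names, "Basic HASH" is the same placeholder as above, and nothing is actually sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class ManualRedirect {
    // Rebuilds the original request against the Location value,
    // preserving the HTTP method, the body publisher and all headers.
    static HttpRequest retarget(HttpRequest original, String location) {
        URI target = original.uri().resolve(location); // also handles relative Location values
        HttpRequest.Builder builder = HttpRequest.newBuilder(target)
                .method(original.method(),
                        original.bodyPublisher().orElse(HttpRequest.BodyPublishers.noBody()));
        // Copy every header (including Authorization) onto the new request.
        original.headers().map().forEach(
                (name, values) -> values.forEach(value -> builder.header(name, value)));
        return builder.build();
    }

    public static void main(String[] args) {
        HttpRequest original = HttpRequest.newBuilder(URI.create("http://myservice.org/v1/Projects/myProject"))
                .header("Authorization", "Basic HASH")
                .PUT(HttpRequest.BodyPublishers.ofString("description=updated"))
                .build();
        HttpRequest next = retarget(original, "http://myservice.org/v1/Projects/ourProject");
        System.out.println(next.method() + " " + next.uri());
    }
}
```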

Say goodbye to 301

After thinking about it, I came to the conclusion that the authors of the book were probably wrong: when the URL changes as a direct result of a PUT request, the 301 code is not the proper response. This conclusion is based on the aforementioned phrase from the HTTP 1.1 spec, which describes 3xx codes as an indication that further action needs to be taken by the user agent in order to fulfill the request. So a 301 response is not meant to be just information about a change in resource metadata. It is rather a kind of directive for the client: indeed, most people using HTTP client libraries, in whatever language, expect 3xx redirections to be handled automatically by the library itself.

So what response code should be used? 204 No Content seems to be the best choice; see the spec's description of this code:

The server has fulfilled the request but does not need to return an entity-body, and might want to return updated metainformation. The response MAY include new or updated metainformation in the form of entity-headers, which if present SHOULD be associated with the requested variant.

This looks like a perfect fit. The server fulfilled the request and updated the resource, but doesn't have to return the resource representation, as it is already known to the client. But as a result of the change, part of the object's metadata has changed: namely, the URI of the resource. So the response contains a Location header with the updated metadata. Simple, and it works well with all (or most) HTTP clients.
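To see the 204 approach end to end, here is a self-contained sketch. It does not use my Spring handler; instead it assumes Java 11+ and uses the JDK's built-in com.sun.net.httpserver.HttpServer as a throwaway stand-in for the service, plus java.net.http.HttpClient as the client. The class name NoContentRename and the hard-coded rename are purely illustrative.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class NoContentRename {
    // Starts a local stub server, sends one rename PUT, and returns "<status> <Location>".
    static String demo() {
        try {
            HttpServer server = HttpServer.create(
                    new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 0), 0);
            server.createContext("/v1/Projects/myProject", exchange -> {
                exchange.getRequestBody().readAllBytes(); // drain the form-encoded body
                String host = exchange.getRequestHeaders().getFirst("Host");
                exchange.getResponseHeaders().add("Location", "http://" + host + "/v1/Projects/ourProject");
                exchange.sendResponseHeaders(204, -1);    // 204 No Content, no response body
                exchange.close();
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                HttpRequest request = HttpRequest.newBuilder(
                                URI.create("http://127.0.0.1:" + port + "/v1/Projects/myProject"))
                        .header("Content-Type", "application/x-www-form-urlencoded")
                        .PUT(HttpRequest.BodyPublishers.ofString("name=ourProject"))
                        .build();
                HttpResponse<Void> response = HttpClient.newHttpClient()
                        .send(request, HttpResponse.BodyHandlers.discarding());
                // The client code sees the PUT response itself, including the new location.
                return response.statusCode() + " " + response.headers().firstValue("Location").orElse("");
            } finally {
                server.stop(0);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because 204 is not a redirect, no second request is issued behind the application's back; the client reads the Location header and decides for itself what to do with the new URL.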
