How Amazon’s REST Compares To REST (H9)
Amazon’s REST service is one of the better-known attempts to build a web service on REST principles. It is frequently cited, particularly in articles comparing REST with SOAP, XML-RPC or other RPC-style services (not least because Amazon offer both a REST and a SOAP interface). As such it is worth examining how well it adheres to the REST style. (HA)
It’s always worth examining the practical effects of succeeding or failing to implement a style, but in this case it’s even more useful, as many of the articles that cite it when comparing REST and SOAP are quick to point to claims that the REST service is 6 times more efficient and nearly 6 times as popular with developers – any philosophical differences with Amazon’s approach would have to justify themselves with at least a suggestion of practical benefit. (HB)
Philosophy (HC)
Amazon have no statement of a philosophical goal or even a requirements document; after all, they are in the business of selling books and other products, not saving the web for all mankind. However, it is worth examining the description of REST they give in the API documentation: (HD)
REST (or XML over HTTP or XML/HTTP) uses URLs with specific name/value pairs to invoke methods and processes within Amazon.com's Web Services framework. The URL is the primary method used for message passing. Once the URL is processed, a well-formatted XML document is returned as a response. Because REST is based on such a widely accepted methodology, most developers should have no problem creating applications capable of quickly communicating with the Web services that expose this interface. (HE)
REST is not necessarily XML/HTTP nor is XML/HTTP necessarily REST except in so far that use of HTTP means touching the REST application that is the web. Their SOAP interface also uses XML/HTTP after all. [I’ve heard that Amazon didn’t initially label this approach REST themselves, can anyone confirm?] (HF)
The focus on the URL[sic] is important here, but there is no indication of any view of the URI as identifying a resource. This is perhaps a wee bit angels-on-pinheads since the use of a URI as identifying a resource in the general sense or of locating a file (or octet-stream, or whatever you want to call it without using the word “representation”) can generally be projected onto the same system by two people with differing views of the role of URIs. A discussion of the distinction between resources and representations would perhaps be best avoided for fear of making the style seem more intimidating than it should be. Elsewhere they also define REST as “a Web services protocol that was coined by Roy Fielding in his Ph.D. thesis (see http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm)”, which is plainly wrong, but which does indicate an awareness of the literature on the subject. (HG)
The mention of name/value pairs is an accurate description of how Amazon’s service works, but such pairs are not inherent to REST and are arguably at odds with it (see QueryStringsConsideredHarmful). (HH)
The description of XML/HTTP as a “widely accepted methodology” and of the ease with which it can be used is accurate and significant to Amazon’s goals. (Against this, of course, are the tools which vendors provide to support the use of SOAP.) (HI)
“Well-formatted” is presumably a matter of taste :) (HJ)
Conversely SOAP is defined thusly: (HK)
SOAP (Simple Object Access Protocol), a more complex method of sharing messages between client and server, was developed to deal with the limitations of REST. SOAP is a lightweight protocol intended for exchanging structured information in a decentralized, distributed environment. It uses XML technologies to define an extensible messaging framework providing a message construct that can be exchanged over a variety of underlying protocols. The framework has been designed to be independent of any particular programming model and other implementation-specific semantics. (HL)
The “limitations of REST” are not explored or justified, perhaps reflecting a hype disadvantage on REST’s part (maybe there are no limitations…). Since this is an article about REST rather than directly about the REST vs SOAP debate, I’ll only note further that: (HM)
1. REST is similarly independent of programming models and implementation-specific semantics – indeed one of the things that prompted Fielding’s work was a desire to move away from a position where HTTP was defined in terms of a reference library.
2. SOAP’s description is no more likely to be read favorably by a SOAP aficionado than REST’s description is by a RESTafarian. (HN)
Obtaining information from Amazon. (HO)
Most uses of the service will begin with an attempt to obtain data from it about a product, product category, or the results of a particular search. This is a read-only operation of the type that is naturally modeled in REST as GETting a representation of a resource. Amazon does indeed provide this service by allowing us to dereference a URI to obtain an XML document. It’s interesting to note that from a business perspective this is likely to be the most common type of request, and hence the one that costs Amazon the most, and that it does not in itself result in either Amazon or an associate getting paid; hence there is a particularly strong motivation for both parties to make it efficient. (HP)
URI Construction (HQ)
GenerativeNaming is used to obtain the URI to use. The API documentation gives instructions for creating the URIs for various types of resource and this is relatively easy to do. (HR)
Note that the way they do it isn't RESTful because it doesn't respect the "hypermedia as the engine of application state" constraint. In order to do so, they'd need to declare the query parameters in a form (e.g. HTML or RDF). -- MarkBaker (HS)
The constructed URIs use query-strings (see QueryStringsConsideredHarmful). It is hard to think of a reason why this should be so, as Amazon have been ahead of the trend in avoiding query strings in the URIs they use elsewhere (though there are criticisms that could be made there also). This has implications for the type of XML document returned, for caching, and for possible brittleness on the part of the application (such brittleness would be reduced or removed by a commitment to maintaining URIs; the fact that they are maintaining URIs used by a previous version of the service indicates that there is at least some commitment to do so). (HT)
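As a rough illustration of the generative approach described above, the sketch below builds a query-string URI from name/value pairs. The host, path and parameter names are placeholders invented for this page, not Amazon's; the real ones are given in the API documentation. Only the default associate id of webservices-20 comes from the text.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
BASE = "http://webservice.example/catalog"


def build_lookup_uri(asin, developer_token, associate_id="webservices-20"):
    """Generate a query-string URI for a single product lookup."""
    params = {
        "asin": asin,
        "developer-token": developer_token,
        "associate-id": associate_id,
    }
    # urlencode joins the pairs with '&' and percent-encodes unsafe characters.
    return BASE + "?" + urlencode(params)


uri = build_lookup_uri("0000000027", "D1EXAMPLE")
```

Note that the client has to know the template in advance; nothing in a response tells it how to form the next URI, which is the point MarkBaker makes above.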
User Identification (HU)
Two pieces of user identification are embedded into the generated URIs, one is referred to as a “developer token” and is given to the developer when they sign up to use the API, the other is the “associate id” and is used to identify associates so that they can be rewarded for referring customers or initiating sales. (HV)
Problems with such user-ids have been noted many times; those pertaining to REST are noted by RoyFielding [http://www.ics.uci.edu/~fielding/pubs/dissertation/evaluation.htm#sec_6_2_5]: (HW)
One form of abuse is to include information that identifies the current user within all of the URI referenced by a hypermedia response representation. Such embedded user-ids can be used to maintain session state on the server, track user behavior by logging their actions, or carry user preferences across multiple actions…. However, by violating REST's constraints, these systems also cause shared caching to become ineffective, reduce server scalability, and result in undesirable effects when a user shares those references with others. (HX)
The associate id is clearly necessary when purchases are made, or if the user is passed on to one of Amazon’s sites. It also allows for the possibility of representations of resources that are not directly involved in such actions containing URIs for resources that are. Since generative techniques are used to get a URI for the web service, generative techniques could just as easily be used to get a URI for these operations as well (what’s sauce for the goose…). As such the id could easily be dropped as a requirement, and Amazon do allow you to use a general id of webservices-20 (or any string for that matter) should you not be an associate. (HY)
The purpose of the developer token is less clear, but it appears to be a license that helps track use and, potentially, the means by which abusers of the service may have their rights to use it revoked. However, many means of using the service will expose this identifier, so it doesn’t have even a trivial measure of credibility as an authentication mechanism, and at the time of writing the service will still work with bogus tokens. If authentication is needed it would be better provided through the authentication mechanisms of RFC 2617. However, this would make the service more difficult to use with many tools than it currently is. Since they’ve been successful without secure authentication, it seems reasonable that they should just drop all attempts at authentication (though one can’t rule out the possibility that they are future-proofing against a later need for authentication here). (HZ)
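For comparison, here is a minimal sketch of what carrying credentials in a header rather than in the URI looks like, using the Basic scheme from RFC 2617 (the simpler of the two schemes defined there; Digest is the other). The function name is my own. Bear in mind that Basic sends the credentials merely base64-encoded, so it is not secure on its own; the point is only that the identifying information leaves the URI.

```python
import base64


def basic_auth_header(user, password):
    """Build an Authorization header for the RFC 2617 Basic scheme."""
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": "Basic " + token}


# The user-id and password used as the example in RFC 2617 itself.
header = basic_auth_header("Aladdin", "open sesame")
```

With the identifier in a header, two developers asking about the same product would at least be asking for the same URI.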
Status codes (I0)
Amazon returns a 200 status for error messages; essentially it sits on top of HTTP and uses it as a transport protocol for tunneling its proprietary protocol. Hence a search for the book with the ISBN of 0000000027 “successfully” returns an error message, rather than the 404 we would expect for a book that doesn’t exist in Amazon’s records (actually the error message complains that it’s an invalid ASIN; while it may be an invalid ASIN it’s a valid ISBN, but that’s a minor niggle). (I1)
This has advantages for users of some tools (for example we can use the above with XSLT’s document() function, which is allowed to lose on a 404), but it is problematic with others. Essentially it is being dishonest in its return codes, but while tools for accessing the web continue to hide status codes from users, or to reveal them in brittle ways (losing when encountering a 4xx or 5xx error), we have to forgive them. (I2)
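The practical consequence for a client is that it cannot trust the status line and has to look inside the body. A minimal sketch of that extra step follows; the `ErrorMsg` element name and the sample documents are assumptions made for illustration, and mapping every tunnelled error to 404 is a deliberate simplification.

```python
import xml.etree.ElementTree as ET


def effective_status(http_status, body):
    """Return the status a client should actually act on.

    Assumes (hypothetically) that an error tunnelled inside a 200
    response shows up as an <ErrorMsg> element in the XML body.
    """
    if http_status != 200:
        return http_status  # an honest status code: pass it through
    root = ET.fromstring(body)
    # iter() walks the root element and all of its descendants.
    if next(root.iter("ErrorMsg"), None) is not None:
        return 404  # simplification: treat any tunnelled error as "not found"
    return 200


error_body = "<ProductInfo><ErrorMsg>invalid ASIN</ErrorMsg></ProductInfo>"
ok_body = "<ProductInfo><Details><Asin>0000000035</Asin></Details></ProductInfo>"
```

Every client has to repeat this check, and generic intermediaries (caches, proxies, monitoring tools) that only see the 200 cannot.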
Caching (I3)
The two pressures on network caching are strongly felt in the case of Amazon: little or no caching results in heavy use of the service and the consumption of bandwidth and other resources, while overly aggressive caching may result in inaccurate information, with pricing information having the greatest potential to cause damage. (I4)
As such the license and documentation restrict how long users can cache different types of information (1 hour, 24 hours or 3 months depending on the type of data) but also strongly suggest that they do have some sort of caching. (I5)
REST explicitly contains caching mechanisms as part of its design, and this is reflected in explicit, and detailed, cache-control mechanisms in HTTP. A RESTful design will therefore use these explicit caching mechanisms – which include both marking something as cacheable and marking it as not cacheable, as appropriate. (I6)
Amazon doesn’t do this. The fact that the URIs contain ? means that the default behaviour of compliant caches will be to treat them as not cacheable (this is largely for backwards-compatibility with applications that would have side-effects for such GET operations). Hence the burden of creating and maintaining a cache is placed entirely on the user of the web service, and entirely at the point where the service is used. With some tools this will be very difficult or impossible; with all tools it is a pointless reinvention of a wheel (and not a wheel of insignificant complexity). Further, it is difficult for the end user to do this well: they do not know the last-modification times, and have no way of creating ETags beyond perhaps hashing the entire output (which might save a transmission from them to a further user of a web application or web service built on top of Amazon’s, but which does nothing to reduce traffic with Amazon, and which is seriously expensive). Even the images offered have last-modification times but no ETag, expiry or cache-control headers. Conversely, Amazon presumably does have some idea of when their data was last changed for a given request (at least for some requests, such as the request for information on a particular item) and may well be able to provide a time until which it is confident the data won’t change. If such information were sent in headers such as Last-Modified, ETag, Expires and Cache-Control, and if conditional GETs (If-Modified-Since, If-None-Match) were supported, then this would enable users to offload their responsibility for caching onto a normal HTTP cache, would enable such caching to achieve the desirable balance between network efficiency and accuracy, and would enable them to use this information in calculating the same headers for representations that they are in turn creating from the web service. (I7)
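To make the conditional GET idea concrete, here is a small sketch of the decision a server would make. It is not a description of anything Amazon runs: the function name and arguments are mine, only strong ETag comparison is handled, and header names are looked up exactly as written. The `max_age` argument is where a limit such as the license's one hour (3600 seconds) could be stated explicitly.

```python
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def respond(request_headers, etag, last_modified, max_age):
    """Return (status, headers) for a GET, honouring conditional requests.

    etag is the quoted entity tag of the current representation and
    last_modified is a POSIX timestamp.
    """
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(last_modified, usegmt=True),
        "Cache-Control": f"max-age={max_age}",
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_none_match is not None:
        # When both are sent, If-None-Match takes precedence.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers
    elif if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the condition
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if int(last_modified) <= since.timestamp():
                return 304, headers
    return 200, headers


stamp = 1072915200  # 2004-01-01 00:00:00 UTC
first_status, first_headers = respond({}, '"v1"', stamp, 3600)
```

A client or intermediary cache that stored `first_headers` can send the ETag back in If-None-Match and receive a bodiless 304 for as long as nothing has changed, which is exactly the traffic saving that hashing the output on the client side cannot deliver.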
Further, such a cache could potentially be shared between multiple users; indeed Amazon could place a cache “in front of” their server and hence greatly relieve the load on the implementation that constructs the resources. This makes particular sense when we consider the differences between Girl With A Pearl Earring, a best-selling novel that has been adapted into a successful film, and Breton Grammar, which is out of stock (as for my choice of out-of-stock book, it was selected because I know one of the authors, to whom I apologise should he read this). We can certainly expect that requests for information on the former would heavily outnumber requests for information on the latter over the next few weeks. A shared cache “close” to the service, or better yet a shared cache close to the service and others shared between some users (such as the users of a large ISP), would provide a performance benefit, potentially of orders of magnitude. (I8)
To really benefit from caching, the query strings, and any information in the URI that does not relate directly to the resource the end user is interested in (such as user-ids), would have to be removed. Without this step the benefits from shared caching would be nowhere near as great, and would be much more expensive, since duplicate information would quickly fill the cache. However, it would still be beneficial to allow users of the service to cache with a normal HTTP cache, and for such caches to verify the freshness of cached representations through normal HTTP conditional GETs. (I9)
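The duplication problem is easy to demonstrate. The sketch below strips the user-identifying parameters from two URIs and shows that they then name the same thing; a shared cache keyed on the full URI would have stored the same representation twice. The host and parameter names are placeholders of my own, and this is meant to show the duplication, not to suggest that a real cache should rewrite URIs this way.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names standing in for the developer token and associate id.
USER_PARAMS = {"developer-token", "associate-id"}


def resource_key(uri):
    """Drop user-identifying parameters and sort what remains."""
    parts = urlsplit(uri)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in USER_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


alice = "http://webservice.example/catalog?asin=0000000027&developer-token=AAA&associate-id=alice-20"
bob = "http://webservice.example/catalog?associate-id=bob-20&developer-token=BBB&asin=0000000027"
```

Here `resource_key(alice)` and `resource_key(bob)` are equal even though the original URIs are not, so every additional developer or associate multiplies the number of cache entries for the same book.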
Document Format (IA)
The format of a document has only a slight bearing on REST, but the effects are not completely insignificant. In particular REST is designed explicitly to cater for hypermedia and hence hypermedia is the document format that a RESTful system will work best with. (IB)
A hypermedia document provides a mechanism to link to other documents, whether through replacing itself with the other document, embedding, inclusion, causing another view-area to be created, having one audio track be a background to another, or any other such mechanism. In terms of computer-targeted representations this means providing links to other documents describing related resources in a way that describes the relationship between resources and how to access them if desired. (IC)
The document format used by the Amazon service provides explicit URIs only for images of products and for pages on one of Amazon’s sites about the same product. URIs for further use of the web service are generated by inserting ASINs, search strings, or other data into a URI as described in various parts of the API documentation. This is potentially a brittle approach (but it’s safe as long as the URIs are CoolURIs?) and rules out the use of general tools that understand XLink or RDF, which would be possible were either of those technologies used to transmit the URIs. (ID)
The Amazon document also contains information that relates to the HTTP request: a UserAgent field, which reflects the User-Agent header, and a RequestID field, which contains a 20-character alphanumeric code of no discernible value to the user. The UserAgent field might potentially be useful for debugging purposes, but adds nothing in general (users presumably either know what their user agent is or don’t care). This part of the representation is not related to the resource. (IE)
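By way of contrast, here is a sketch of what a generic client could do if the links were in the document. The sample XML is entirely made up (element names, host and paths are mine, not Amazon's); the only real thing in it is the XLink namespace. The function knows nothing about the service beyond "follow xlink:href".

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# A hypothetical representation that carries its own links.
SAMPLE = """<Product xmlns:xlink="http://www.w3.org/1999/xlink">
  <Title>Example Book</Title>
  <Similar xlink:href="http://webservice.example/products/0000000035"/>
  <Reviews xlink:href="http://webservice.example/products/0000000027/reviews"/>
</Product>"""


def linked_resources(xml_text):
    """Map each linking element's name to the URI it points at."""
    root = ET.fromstring(xml_text)
    return {
        element.tag: element.get(XLINK_HREF)
        for element in root.iter()
        if element.get(XLINK_HREF) is not None
    }


links = linked_resources(SAMPLE)
```

The element name gives the relationship and the attribute gives the URI, so the service could change its URI layout without breaking a client written this way.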
I also really hate the (often invalid) HTML source kludged into some PCDATA elements, though I suppose that’s not really a REST issue. (IF)
Okay more to add to this, but I'm getting tired -- JonHanna (IG)