For the latest on this topic, see RestAndUriOpacity    (1DJ)

Introduction: Onions in the Varnish    (1DK)

(by JeffBone, 2002-03-08)    (1DL)

In The Periodic Table, [1] ''Primo Levi tells a story that happened when he was working in a varnish factory. He was a chemist, and he was fascinated by the fact that the varnish recipe included a raw onion. What could it be for? No one knew; it was just part of the recipe. So he investigated, and eventually discovered that they had started throwing the onion in years ago to test the temperature of the varnish: if it was hot enough, the onion would fry.    (1DM)

-- Paul Graham (anecdotal)''    (1DN)

The Axiom of Opacity (of URI) is an onion in the varnish of REST / the Web. That is, it's something that was once suggested that has taken on a life of its own. Nobody really understands it or knows why it exists, but it's acquired the force of law. It's prima facie false; it should at best be called The Conceit of Opacity and at worst The Delusion of Opacity. There are a number of common misconceptions, myths, apocrypha, and so forth that are used to support the idea, and this document examines and debunks some of these.    (1DO)

Motivation    (1DP)

Let me say that the intent of this line of enquiry is not to needlessly antagonize REST, the nascent "principled architecture" / philosophy of system design of the Web. REST is in part an attempt to recognize and clarify how certain design decisions of the Web give rise to some of its desirable characteristics and therefore inform its further evolution. As such, it succeeds admirably; the recognition of the importance of HTTP's generic interfaces and its use as a coordination language is a fundamental and hugely important insight.    (1DQ)

OTOH, REST is largely defined by a small number of semi-formal specifications, a larger number of secondary publications and writings, and the collected design notes, anecdotes, and essays of Tim Berners-Lee. Among these artifacts there are a number of contradictions, a number of ambiguities, and perhaps at least some assertions and principles which are not sufficiently supported. Where certain principles and assertions can be formalized and their observable impact quantifiably understood, REST is solid. As is the case in other endeavors, as scientists and reasonable people we should always be willing to re-examine our assumptions and revise our thinking if necessary. Otherwise REST ceases to be a useful, reasonable philosophy and becomes a kind of religion.    (1DR)

My beef with the Opacity Axiom (hereafter, "the axiom") is that it (needlessly, IMHO) discards the important and useful concept of structured naming and semantically rich namespaces. In filesystems research and elsewhere over the last decade or two (described in slightly more detail in OpacityReconsidered) many researchers have recognized the power and utility gained by enriching the structure and semantics of names and namespaces. The Web, curiously, runs absolutely counter to this trend. It may be that indeed, Web-based names are better left opaque; but none of the arguments ("myths") used to support this assertion to date are, on further examination, valid. I would hate for a misinformed design "prejudice" to become a hard-and-fast axiom that needlessly and harmfully limits the future evolution of the Web.    (1DS)

Myth #1: URI are Intentionally Opaque by Design    (1DT)

URI as defined by RFC2396 [2] (particularly HTTP URI; we'll sloppily refer just to URI throughout this document when in fact we're usually talking about HTTP URI) are not, in fact, opaque. As Mark Baker puts it:    (1DU)

"A URI is an odd creature with opaque, semi-opaque, and non-opaque parts. The authority/port part is non-opaque, the path part is opaque, and query parameters are semi-opaque. That is, they are opaque to intermediaries, but are not opaque to clients because they are a way for the publisher to say 'put what you want here, I can take it'." [3]    (1DV)

This is a good start to recognizing that URI are in no way opaque, but not quite all the way there. Even the path part, the part that is quasi-religiously regarded as opaque by many, is not in fact opaque. Mark's own opacity investigation [4] reveals this: he starts with the assumption that the path part is opaque and then immediately recognizes that the separators '/',';', and '?' decompose the path into distinct parts with specific relationships between them. This is a kind of *meaning* --- the syntax and specified semantics of these sub-parts of paths defines a particular model for the information space accessible via HTTP.    (1DW)

If URI (or their path components, anyway) were in fact truly opaque, we wouldn't need all that grammar in RFC2396; instead, we'd define things down past the authority and then just have a single, opaque part with specified acceptable characters. All that grammar exists to define a structure and particular semantics of the subcomponents of our "opaque" URI paths.    (1DX)
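To make the point concrete, here is a minimal sketch of how much structure a generic parser recovers from a URI without knowing anything about the server. It uses Python's standard urllib.parse; the example.org URI and the decompose helper are made up for illustration.

```python
from urllib.parse import urlsplit


def decompose(uri: str) -> dict:
    """Split a URI into the sub-parts that the generic grammar defines."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        # '/' separates hierarchical path segments
        "segments": [s for s in parts.path.split("/") if s],
        # everything after '?' is the query component
        "query": parts.query,
    }


# Hypothetical URI, used only for illustration.
example = decompose("http://example.org/pubs/1998/report.htm?lang=en")
print(example)
```

If the path were truly opaque, the "segments" entry would have no defined meaning; the grammar is what lets generic software compute it.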

Indeed, we *need* URI to be non-opaque. If it weren't for the particular semantics of the path portion of the URI, we couldn't construct relative URI. Relative URI allow us to refer to one resource strictly through name construction given the name of another resource, and the benefits of this are large. The syntactic and algorithmic machinery that makes this possible relies not only on URI being transparent, but having a particular set of semantics. Given that a particular and well-defined exception to the Opacity Rule is "endorsed," why shouldn't other (presumptively equally) well-defined and explicit exceptions be acceptable?    (1DY)
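As a small illustration of name construction, the sketch below resolves relative references against a base URI using urljoin from Python's standard library. The URIs are hypothetical, and note that current Python follows the later RFC 3986 resolution rules rather than RFC2396; for simple cases like these I'd expect the two to agree.

```python
from urllib.parse import urljoin

# Hypothetical base URI, used only for illustration.
base = "http://example.org/pubs/1998/report.htm"

# A sibling name is constructed purely from the base name and the path syntax.
sibling = urljoin(base, "summary.htm")

# ".." only makes sense because "/" carries hierarchical meaning.
other_year = urljoin(base, "../1999/report.htm")

print(sibling)
print(other_year)
```

Neither result requires fetching anything: both names are computed from another name, which is only possible because the path is not opaque.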

Even Tim Berners-Lee apparently can't decide just how opaque URI really are. His own use and discussion of URI in some contexts directly contradict the axiom. Consider the following two quotes from Tim:    (1DZ)

"When you are not dereferencing you should not look at the contents of the URI string to gain other information." [5]    (1E0)

"Looking at this [ http://www.nsf.gov/pubs/1998/nsf9814/nsf9814.htm ], the "pubs/1998" header [sic, he's referring to hierarchical parts of a URI -jb] is going to give any future archive service a good clue that the old 1998 document classification scheme is in progress." [6]    (1E1)

So we're not supposed to look at the contents of a URI string. But here we are looking at the contents of a URI string. Hmmm... where there's a contradiction, there's probably confusion, and confusion about and selective application of an axiom makes one question the validity of the axiom.    (1E2)

Myth #2: Embedding Information in Names Decreases Their Longevity    (1E3)

"If you put information in a name, it decreases its longevity." [7]    (1E4)

In fact, this isn't so much a myth as a useless tautology. My speculation is that this assertion stems from the recognition that some kinds of information --- particularly the kind that it is often tempting to put in names, such as rigid categorization --- lose relevance with age, and therefore names which contain such information become less (or not at all) useful over time.    (1E5)

This is a valid observation, but it speaks to the brittle nature of static ontological schemes, not to the desirability of putting certain kinds of information in certain kinds of names. The assertion is too strict; a better formulation would be "if you put certain kinds of information in certain names, it may decrease their longevity." On the other hand, judicious use of other kinds of information in naming may increase their usefulness. As always, engineering is about trade-offs.    (1E6)

Ultimately, all names except what I call "generative identifiers" (names which can be generated directly from the content of the entity in question, and are therefore only applicable to immutable instances of potentially mutable abstract objects --- see below, or companion piece KindsOfNames) suffer from the longevity problem. The choice of character sets usable in URI, for instance, ultimately bounds their useful lifetime. It is naive to think that the languages and alphabets we're using today will persist indefinitely. So on some level the statement is true, but this is hardly a compelling reason to not embed appropriate information (whatever that is) in names when that is interesting and useful to the designers or users of those names.    (1E7)
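One way to picture a generative identifier is as a content hash. The sketch below is an assumption on my part about what such a name could look like (the "sha256:" prefix and the generative_id helper are invented for illustration), not a description of any existing scheme.

```python
import hashlib


def generative_id(content: bytes) -> str:
    """Derive a name directly from the bytes of an immutable instance."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


# Two hypothetical versions of the same abstract document.
v1 = generative_id(b"draft one")
v2 = generative_id(b"draft two")

# The same content always yields the same name; changed content yields a new one.
print(v1 == generative_id(b"draft one"))
print(v1 != v2)
```

This shows why such names apply only to immutable instances: as soon as the content changes, the name no longer refers to it.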

Myth #3: Unless Names are Opaque, They Are Brittle    (1E8)

"If names aren't opaque, then if I 'move' a resource I'm going to have to change its name!"    (1E9)

Brittle names are a problem, but the problem isn't unique to the Web, nor does it require opaque names as its solution. This problem and its solution can both be seen in filesystems: consider the difference between c:\foo.txt and /home/jbone/foo.txt. The former suffers from this problem because it embeds location information in the name. The latter avoids it by providing a flexible, composable namespace without low-level location information in the name. (/home/jbone is best thought of as an abstract view of all jbone's named objects rather than a location or container in which they "reside.")    (1EA)

The problem of brittle names has little to do with opacity, really; while certain kinds of opaque names might solve the problem, it's not an appropriate solution any more than global use of device:inode pairs by users and programs would be an appropriate solution in filesystems. Names are only brittle when they directly encode mutable information that doesn't relate to the abstract entity being named but rather to its implementation in some context. Without a doubt, tight-coupling between names and locations is undesirable; indeed IMHO existing URI are often unnecessarily brittle due to their dependency on domain names.    (1EB)

Myth #4: There's No Difference Between a Name and an Address    (1EC)

This is trivially false. List all the differences you can think of. They're different.    (1ED)

Just off the top of their head, any computer scientist can tell you all sorts of qualities that typically differentiate the two. Names need not be unique in some cases; addresses usually are, in a given context. Address spaces are often flat, while namespaces need not be. And so on. Tim Berners-Lee's reluctance to accept different kinds of names, IMO, might stem from the trauma of the URL/URN/URI debates and all the hand-wringing about identity, persistence, etc. that the Web community endured in its early days. This is understandable, but it doesn't make this POV authoritative or correct. A more complete treatment of the different kinds of names may be found at KindsOfNames.    (1EE)

Myth #5: Some Information is Inappropriate in Names, Therefore It All Is    (1EF)

The form of this argument is "some X are Y, therefore all X are Y." Trivially false.    (1EG)

Another early trauma of the Web was its use of filename extensions to embed content type information in names. This was notoriously problematic. Generally speaking, names should encode only information about the abstract entity being represented; they should avoid embedding information that is potentially transient, related to implementation or location, and so on. Good semantic names occupy a high level of logical abstraction and in some sense describe immutable or very infrequently mutable "Platonic" qualities of the abstract thing being named.    (1EH)

While I certainly have no problem with the current Content-Type scheme, it's not the only or obviously best one. In fact, filename extensions and other type information *could* be usefully embedded in URI if you standardized a mechanism for allowing servers to serve up their own mappings from extensions to types to clients. A client would then no longer have to "guess" about these mappings. This would in fact be just as flexible as the current scheme. Both schemes suffer somewhat from the need to maintain an authoritative (but potentially extensible) list of a subset of standard content types (MIME types) via some "centralized" authority. This seems to me to be an illustration of TBL's Test of Independent Invention. [8]    (1EI)
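A rough sketch of the idea follows. No such standard mechanism exists as far as I know, so the PUBLISHED_TYPES table and the type_from_name helper are purely hypothetical stand-ins for a mapping that a server would publish and a client would fetch.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical mapping a server might publish for its own namespace.
PUBLISHED_TYPES = {
    ".htm": "text/html",
    ".txt": "text/plain",
    ".png": "image/png",
}


def type_from_name(uri, mapping):
    """Look up a content type from the extension in the URI path.

    Returns None when the server's mapping says nothing about the name.
    """
    path = urlsplit(uri).path
    _, ext = posixpath.splitext(path)
    return mapping.get(ext.lower())


known = type_from_name("http://example.org/pubs/report.htm", PUBLISHED_TYPES)
unknown = type_from_name("http://example.org/pubs/data", PUBLISHED_TYPES)
print(known, unknown)
```

The client guesses nothing here: it either finds the extension in the server's own mapping or gets no answer and has to fall back on some other mechanism.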

Myth #6: Link Topologies Can Do Everything Structured Names Can    (1EJ)

Consider the following task. I am a Web crawler, and I want to find every publicly available object that is served up by a Web server. This Web server contains and makes available one or more resources that are not linked to by any other publicly available resource, but for which one or more predictable naming schemes conceptually exist. If the server provides information which I can use to dynamically understand and generate structured names in its namespace, I may be able to find those objects. If names are opaque and I am restricted to traversing the hypergraph topology, finding the objects that are not linked to is impossible. See GenerativeNaming for more examples of how constructable semantic names can sometimes be more expressive or efficient or enable things that can't be done with graph traversal.    (1EK)
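The crawler task can be sketched with a toy model. Everything below is invented for illustration: SITE stands in for a server's resources and their outgoing links, and SCHEME stands in for a naming-scheme description that the server would have to publish somehow.

```python
from itertools import product

# Hypothetical site: path -> outgoing links. "/reports/1998/q2" exists,
# but no other resource links to it.
SITE = {
    "/index": ["/reports/1998/q1"],
    "/reports/1998/q1": [],
    "/reports/1998/q2": [],
}

# Hypothetical naming-scheme description published by the server.
SCHEME = {
    "template": "/reports/{year}/{quarter}",
    "year": ["1998"],
    "quarter": ["q1", "q2", "q3"],
}


def crawl_links(start):
    """Find resources by link traversal only (names treated as opaque)."""
    seen, stack = set(), [start]
    while stack:
        path = stack.pop()
        if path in seen or path not in SITE:
            continue
        seen.add(path)
        stack.extend(SITE[path])
    return seen


def generate_names(scheme):
    """Enumerate every candidate name the published scheme describes."""
    keys = [k for k in scheme if k != "template"]
    for values in product(*(scheme[k] for k in keys)):
        yield scheme["template"].format(**dict(zip(keys, values)))


linked = crawl_links("/index")
generated = {name for name in generate_names(SCHEME) if name in SITE}
print(sorted(linked))
print(sorted(generated))
```

Link traversal never reaches the unlinked report, while name generation finds it. The candidate "/reports/1998/q3" is simply discarded because the server has no such resource.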

There's been some discussion of whether generative naming introduces a new navigational metaphor to the Web, where we've already got a perfectly good one: hypertext. It does no such thing. There is a clear difference between naming and navigating: naming allows random access to an information space, while navigating a hypermedia topology is a form of iterative access. Not only that, but you *need* naming to *enable* navigating: you have to start navigating from somewhere, and you have to get to that place somehow. See NamingIsntNavigating for more details.    (1EL)

Myth #7: Use the Query String. That's What It's For.    (1EM)

A URI has a multi-dimensional information model. There are parts that encode the starting point in a semi-flat (regarding DNS names as opaque for the purposes of URI) space, parts that encode a path through a graph-structured space, properties for differentiating and parameterizing nodes in that graph-structured space, and queries which provide an arbitrarily-dimensional space associated with each parameterizable node of each name subspace in HTTP URIspace. That's a long way of saying that URIs already provide a rich model of information organization, and this model is already exploited to good effect by, e.g., relative URI.    (1EN)

If we call everything but the query string "opaque" and encourage people to shove all non-opaque information into query strings, then we in effect *lose information.* Query strings do not encode the hierarchical relationships between the provided parameters, and cannot easily be made to. Nor should they be made to: we already have '/' for that purpose, and we already use it for that purpose in transforming relative URI to and from absolute URI. Hierarchical or other namespace-traversal information belongs in the path part; the query string encodes an arbitrary namespace for each resource identified by the rest of the URI, not the entire authority's namespace itself.    (1EO)
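The information loss is easy to demonstrate with a small sketch. The URIs and the two helper functions are hypothetical, and parse_qs is used here only as one common way of reading a query string as a set of named parameters.

```python
from urllib.parse import parse_qs, urlsplit


def path_hierarchy(uri):
    """Ordered path segments: position in the list carries meaning."""
    return [s for s in urlsplit(uri).path.split("/") if s]


def query_params(uri):
    """Query parameters as an unordered mapping of names to values."""
    return parse_qs(urlsplit(uri).query)


# Swapping two parameters yields the same parameter set...
a = query_params("http://example.org/find?year=1998&series=pubs")
b = query_params("http://example.org/find?series=pubs&year=1998")

# ...while swapping two path segments names a different place in the hierarchy.
p1 = path_hierarchy("http://example.org/pubs/1998")
p2 = path_hierarchy("http://example.org/1998/pubs")

print(a == b)
print(p1 == p2)
```

Read as named parameters, the query string has no way to say that 1998 sits *under* pubs; the path says exactly that.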

Myth #8: Non-opaque Names Result In Tight Coupling    (1EP)

This is no more necessarily true --- or important --- than the statement "the expectation of conformance to a particular XML Schema or RDF structure tightly couples a client to a server." Both disciplined naming and schemas may be used to give information to a client for use in interacting with some resource or resources. Tight coupling arises when software "hardcodes" rather than discovers particular resources that it needs to use. Non-opaque names do not encourage this behavior, nor does opacity of names prevent it in naming or elsewhere. In fact, if you look at many XML applications, you ironically see a *lot* of very explicit URI hardcoded in; this tightly couples the XML application to various names. Let's hope those names are "cool" and don't change. If not for the pretext of URI opacity, computation of names for various entities might be used to break that tight coupling and increase the resiliency of the XML application.    (1EQ)

Myth #9: If Tim (or Roy, etc.) Says It, You Should Believe It    (1ER)

This isn't necessarily true. The form of the argument is "acknowledged expert asserts something --> something." That's not science or engineering, that's religion. A more useful assertion would be "if Tim et al. say it, you'd better have a really good understanding of it in order to dispute it." Engineering is about quantifiable tradeoffs imposed as design decisions. I believe that Tim et al. believe that URI opacity is a design decision that they've made, and I also believe that a cursory critical examination of their belief is enough to reveal that it is inaccurate. It's not a design decision, it's a cultural artifact. (Consider Tim's own dissatisfaction with previous instances where he bowed to peer pressure, such as the whole UDI thing...)    (1ES)

Roy in particular spent a lot of time thinking about how to formalize the non-opaque notions of URI; Tim has also spent a lot of time on the syntactic and semantic requirements for modeling different "shapes" of information spaces. The post-facto attempt to constrain the non-opaque URI notions to a particular set of uses is mostly an attempt to avoid further rather painful metaphysical arguments about information modeling such as characterized the whole early URL/URN debacle.    (1ET)

Or another related myth: "if Tim et al. say it and you don't believe it, you just don't get it." In fact, once I "got" REST (by having Roy and Mark beat on me for over a year) I bought the whole package, including URI opacity. But familiarity with the use of namespaces in other contexts led me back to re-examine the opacity axiom and the related problem (better left for later discussion) of the growing dependence of the Web on hypermedia structures. I get what they're saying. I think I understand why they're saying it. I just don't agree. That's an informed opinion.    (1EU)

Myth #10: Non-opaque URI Will Break The Semantic Web    (1EV)

This doesn't make any sense at all. Semantic naming is a powerful tool that supplements the Semantic Web in interesting and useful ways. Any name semantics necessary to convey information about how namespaces are organized, accessed, and used can easily be added to various XML descriptions in order to allow generative construction or meaningful deconstruction of names in those namespaces. Semantic naming also allows for certain things that are impossible or inefficient in purely declarative descriptions. Perhaps better put, generative mechanisms can be embedded in purely declarative descriptions in order to achieve things that can't be achieved efficiently or at all without them. Consider the push operational mode of XSLT for an example of a generative machine encoded in a declarative framework.    (1EW)

It is true that, to the extent that the URI namespace is re-recognized as a powerful, first-class, central organizational metaphor of the Web, the focus on capturing all knowledge explicitly in hypermedia is lessened. This may be "politically" dangerous for some efforts, camps, etc. I'm not too worried about that; I have my suspicions about the practicality of that effort, anyway. Just because you've got a hammer doesn't mean everything is a nail; we've got a hammer and a saw, and we shouldn't use the hammer to cut boards.    (1EX)

Acknowledgements    (1EY)

Thanks (and apologies) ;) to Mark Baker, Paul Prescod, Roy Fielding, and particularly Tim Berners-Lee for providing much insight along with many of the myths that I'm debunking here. Without the possibility of frank, open argument and reasoned disagreement there is no learning, and without learning there is no progress. Cheers!    (1EZ)

References    (1F0)

  1. The Periodic Table by Primo Levi    (1F1)
  2. RFC2396 Uniform Resource Identifiers (URI): Generic Syntax by Tim Berners-Lee et al.    (1F2)
  3. Message to the rest-discuss list by Mark Baker    (1F3)
  4. An Investigation into the Opacity Properties of RFC 2396 by Mark Baker    (1F4)
  5. Universal Resource Identifiers -- Axioms of Web Architecture by Tim Berners-Lee    (1F5)
  6. Cool URIs Don't Change by Tim Berners-Lee    (1F6)
  7. The Myth of Names and Addresses by Tim Berners-Lee    (1F7)
  8. Principles of Design by Tim Berners-Lee    (1F8)