What Does "Generative" Mean, Anyway?    (7G)

(by JeffBone, 2002-03-08) Note: THIS DOCUMENT CONTAINS REST HERESY.    (7H)

gen·er·a·tive (adj.) Having the ability to originate, produce, or procreate.    (7I)

Evil Genius: When I have the map, I will be free, and the world will be different, because I have understanding. / Robert: Understanding of what, master? / Evil Genius: Digital watches. And soon I will have understanding of videocassette recorders and car telephones. And when I have understanding of them, I shall have understanding of computers. And when I have understanding of computers, I shall be the Supreme Being! God isn't interested in technology. He knows nothing of the potential of the microchip or the silicon revolution. Look how he spends his time: forty-three species of parrots! Nipples for men!    (7J)

-- Time Bandits    (7K)

Give a person a fish, and you've fed him dinner. Teach a person to fish, and you've fed him for life. (traditional) Give a person a URI, and you've given him a reference to a resource. Teach a person to make URIs to your resources, and you've given him references to all of them. (heretical)    (7L)

Something is generative if it may be used to generate or produce something else of interest. For our purposes, a something is generative if it has a set of rules that operate on a set or sets of objects to yield other objects of interest. In computer science, the term has been used in various contexts: "generative communication" is a term associated with the Linda coordination language for loosely-coupled distributed and parallel computation (see below); generative.net is a Website devoted to the generation of artistic works through various processes, usually by machine or machine-human collaboration. What we're interested in here is the notion of generative naming.    (7M)

What is Generative Naming?    (7N)

I've been using the terms "semantic names," "structured names," and "generative names" rather willy-nilly in my recent OpacityReconsidered and OpacityMythsDebunked discussions. A little clarity is in order. Semantic names refer to names that mean something --- anything. Structured names refer to "compound" semantic names which consist of distinct subparts --- each potentially meaningful --- which are composed according to some disciplined compositional mechanism. Generative names are a particular use of structured naming. Generative naming is the use of rules, semantics, a compositional mechanism, and so forth to generate complete, transparent, meaningful compound names from their meaningful parts. Use of the "hierarchical" or "path" part of URI --- particularly HTTP URI --- for generative naming is specifically considered to be a bugaboo in URIspace given the pretext of opacity.    (7O)

Is This About Linda?    (7P)

We've referred to HTTP as a "coordination language" --- a term normally associated with Linda [x] --- and now we're talking about generative naming on the Web, which kind of sounds like generative communication. But we're not necessarily implying or describing any equivalence between the Web and Linda. URIspace is actually a much richer namespace than the flat tuplespace of most Linda systems, and generative naming in URIspace is different from generative communication in Linda. As it turns out, there is in fact a near-equivalence between Linda and the Web, made possible by a trivial mapping of concepts between the two. That topic is (or will be) dealt with in LindaAndTheWeb. For our purposes here, what we're talking about has nothing to do with Linda.    (7Q)

Why Do We Need Generative Naming?    (7R)

Once we dispense with the pretense that URI are and should be opaque, we are able to use generative naming in the URI namespace to accomplish things that cannot be accomplished (or cannot be accomplished easily or efficiently) with truly opaque URI + hypermedia. This page documents some of the scenarios in which generative naming might be used.    (7S)

No, Really, What is it Good For?    (7T)

Jumping the gun, aren't you? ;) Here's a sneak preview: Generative naming --- that is, computable names and some description of how to compute them --- is good for describing information spaces that cannot be (or cannot easily or efficiently be) completely enumerated. These include infospaces that are or have embedded in them dense, monotonic, regular sequences; infinite but regular and computable sets; and so on. It's good for representing time, spatial, and taxonomic organizations of information on the Web. It's good for dealing with things that you don't want to make hypertext links to. It is also often a more efficient means of dealing with the same kinds of information spaces and organization that hypertext can represent. If you want more details, you'll just have to forge onward...    (7U)

Oh, what is it not good for? It's not good for characterizing sparse or irregular information spaces. And generally, it's no substitute for hypertext unless the abstraction being represented just cannot be efficiently represented in hypertext.    (7V)

The Generative Semantics of URI    (7W)

As Mark indicates in notes on his own opacity investigation, the existing URIspace has several well-defined dimensions or degrees of freedom. '//' selects between roots in a (flat or hierarchical, depending on whether DNS names are regarded as opaque) namespace; '/' traverses named links in a single-rooted graph-structured namespace rooted in an authority, yielding a two-dimensional object, a directed graph. The addition of properties via ';' introduces a "2nd-and-a-half" dimension to the namespace. The query string '?' turns each "point" in an arbitrary number of so-far "2.5" dimensional graphs into its own namespace of arbitrary dimensionality or degree.    (7X)

Informally, '/' should only be used to indicate traversal of named arcs in (i.e. a path "through") the graph-structured part of the namespace. That is, given a partial path    (7Y)

a/b/    (7Z)

then the following should hold    (80)

a/b/.. == a/    (81)
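A minimal sketch of this property, using the standard relative-reference resolution in Python's urllib.parse. The host example.com is an assumption added for illustration; only the partial path a/b/ comes from the text above.

```python
from urllib.parse import urljoin

# Hypothetical base URI; only the partial path a/b/ is from the text.
BASE = "http://example.com/a/b/"


def parent(uri: str) -> str:
    """Resolve '..' against uri, i.e. step back along one arc of the path."""
    return urljoin(uri, "..")


# Traversing '..' from a/b/ should land on a/.
print(parent(BASE))
```

If '/' were used for something other than arc traversal, stepping back with '..' would no longer be guaranteed to name the enclosing node.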

Generative naming in URIspace does not at all require redefining the semantics associated with existing URI syntax; it simply points out that those semantics do indeed exist and can, with a bit of discipline and additional information, be usefully exploited. Given an understanding of the various dimensions available in URIspace and perhaps a bit of additional information, URI can usefully be both constructed and deconstructed along any of those dimensions.    (82)

"But shouldn't all this kind of stuff be in query parameters?"    (83)

No. See PathsAndQueryStrings for a detailed description of why it's bad to put hierarchical information into query strings. Even Paul agrees, at this point --- though he's still fighting hard to find things in the whole position to disagree with.    (84)

Representing Space    (85)

Structured names are good for meaningfully representing the hierarchical spatial arrangements of physical (or potentially physical) locations and objects. Consider a hypothetical service smartspace.com that presents Web-based interfaces to smart home and environment controls for its subscribers. Its information space is the collection of users, their homes and other buildings, the rooms in those buildings, and the controllable devices in those rooms. This information space is hierarchical and captures a hierarchical kind of "scope" that maps to real space: users "have" buildings that contain rooms that contain devices. This organizational scheme could informally be described as    (86)

  http://smartspace.com/<user>/<building>/<room>/<device>    (87)

In this notation, <part> describes an abstract type for which the extent --- collection of allowable entities --- may be obtained in some fashion. How the allowable values for each of those are represented and how they are retrieved is basically irrelevant; they might be described via RDF, some form of grammar, or some other mechanism. For instance, the following URI might be dereferenced in order to find an RDF or other description of how the information space is organized:    (88)

  http://smartspace.com/metaspace    (89)

The point is that there are "knowable" values which can be substituted for each <part> above to generate a meaningful URI. Consider the URI    (8A)

  http://smartspace.com/joe/home/livingroom/dvr    (8B)

Assuming that "joe" is a valid <user>, that "home" is a recognized <building>, that "livingroom" is a recognized <room>, and "dvr" is a recognized <device>, then the above URI means "Joe's home's livingroom's DVR." Note that the ability to construct this name does *not* imply that Joe has a "home," that even if he does that his home has a "livingroom," or that there's a "dvr" in the living room. It names the abstract concept of the DVR in the living room of Joe's home. It is meaningful *even if the resource doesn't exist* --- a 404 on dereference of this URI means that there is no physical instantiation of this concept. (This is also a "Cool URI" [x] per TBL; even if things change in physical reality, the concept remains valid.)    (8C)
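The construction (and deconstruction) of such a name might be sketched as follows. This is only an illustration: the DeviceName type and the build_uri and parse_uri helpers are hypothetical names invented here, and smartspace.com is the hypothetical service from the text.

```python
from typing import NamedTuple
from urllib.parse import quote, unquote, urlsplit

AUTHORITY = "http://smartspace.com"  # hypothetical service from the text


class DeviceName(NamedTuple):
    """The four meaningful parts of a name in this information space."""
    user: str
    building: str
    room: str
    device: str


def build_uri(name: DeviceName) -> str:
    """Compose <user>/<building>/<room>/<device> into a complete URI."""
    segments = (quote(part, safe="") for part in name)
    return AUTHORITY + "/" + "/".join(segments)


def parse_uri(uri: str) -> DeviceName:
    """Recover the four parts from a URI that follows the scheme."""
    segments = urlsplit(uri).path.strip("/").split("/")
    if len(segments) != 4:
        raise ValueError(f"expected 4 path segments, got {len(segments)}")
    return DeviceName(*(unquote(s) for s in segments))


name = DeviceName("joe", "home", "livingroom", "dvr")
uri = build_uri(name)
print(uri)
```

Note that neither function touches the network: the name can be built and taken apart whether or not the resource it names exists.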

Now that we've defined a generative scheme for names in this information space, we can use these names to answer questions that are harder / less efficient to answer using traversal of hypertext embedded in objects in this namespace. Ignoring issues of privacy, authentication, and authorization, assume that it is desirable for some third party to answer the question    (8D)

"What are the names of all the users that have DVRs in their living rooms?"    (8E)

A non-generative mechanism such as traversal of semantically-useful hypermedia links would result in a crawl of a potentially very large graph. Depending on how "decentralized" the description was, this could result in a separate HTTP transaction for each "step" in traversing each possible "path" through the hierarchy. Who are the users? Ok. For user 1: what buildings does this user have? Ok. For that building, what rooms does it have? Ok. Etc. etc. etc.    (8F)

To avoid this, the information space owner might arrange for a centralized, canonical description of every resource on the site --- which might prove difficult or impossible depending on the nature of the space and the rate of change in the space. Even if he did so, notice the deviation between what's being provided in the declarative and the generative case: the URI we described above described a *concept*, while such a canonical listing describes *actual resources that are available.*    (8G)

A mechanism for answering this question using generative naming might be described as follows. First, let's informally re-express the query in something that resembles URI syntax    (8H)

  http://smartspace.com/*/home/livingroom/dvr    (8I)

If we dispense with the myth of opacity, then we can begin to think about treating URI as a kind of query language and build systems that can recognize various wildcarding and other selection mechanisms embedded in URI directly. Such mechanisms would avoid a whole mess of graph traversal, client-side processing, potential network round-trips, and so forth. IN FACT, you've compressed a whole series of expensive operations into a single operation, the dereference of a semantically-meaningful name.    (8J)
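One way a server might interpret such a wildcarded name is sketched below. The matching rule (a '*' stands for exactly one path segment) is an assumption made for this example; the text above does not define a wildcard grammar. The list of known paths is likewise invented sample data.

```python
def matches(pattern: str, path: str) -> bool:
    """True if path matches pattern segment by segment.

    Assumed rule: '*' matches any single segment, nothing else is special.
    """
    p_segs = pattern.strip("/").split("/")
    t_segs = path.strip("/").split("/")
    if len(p_segs) != len(t_segs):
        return False
    return all(p == "*" or p == t for p, t in zip(p_segs, t_segs))


def select(pattern: str, known_paths: list[str]) -> list[str]:
    """Return the known paths selected by a wildcarded path."""
    return [p for p in known_paths if matches(pattern, p)]


# Hypothetical resources that actually exist on the server.
KNOWN = [
    "/joe/home/livingroom/dvr",
    "/joe/home/kitchen/oven",
    "/ann/home/livingroom/dvr",
    "/ann/office/lobby/dvr",
]

hits = select("/*/home/livingroom/dvr", KNOWN)
owners = [p.strip("/").split("/")[0] for p in hits]
print(owners)
```

The client makes one request; the enumeration happens where the data lives.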

Yes, you can probably answer the question posed with hypermedia in most cases wherever resources are reified and enumerable. But in doing so, you've (a) stopped naming abstract concepts, (b) devalued the meaning of 404, and (c) introduced a mechanism that may not be as efficient, or indeed work, in all cases. And (d) why should I have to "pass through" hypertext when what I'm interested in could in fact be answered just by meaningful names?    (8K)

Even without such server-side support for wildcarding and names-as-queries, clients can (given the correct information) construct meaningful names to use to answer the kind of question posed above. The following pseudocode describes this, assuming that there is a priori agreement that a particular organizational and naming scheme exists and will be maintained (no more or less risky a proposition than that some hypertextual KR scheme exists and will be maintained):    (8L)

fun() {
  dvrOwners = []
  users = GET (some URI yielding a list of users in some expected form)
  for each user in users {
    if GET "http://smartspace.com/" + user + "/home/livingroom/dvr" != 404 {
      dvrOwners += user
    }
  }
  return dvrOwners
}    (8M)
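The pseudocode above might be rendered in Python roughly as follows. To keep the sketch runnable without a network, the HTTP lookup is passed in as a function; status_of and fake_status are hypothetical stand-ins, and how the list of users is obtained is left open, as in the original.

```python
from typing import Callable, Iterable

BASE = "http://smartspace.com"  # hypothetical service from the text


def dvr_owners(users: Iterable[str],
               status_of: Callable[[str], int]) -> list[str]:
    """Return the users whose generated living-room DVR URI is not a 404."""
    owners = []
    for user in users:
        # Generate the name from its parts; no hypertext traversal needed.
        uri = f"{BASE}/{user}/home/livingroom/dvr"
        if status_of(uri) != 404:
            owners.append(user)
    return owners


# Stand-in for a real HTTP client: pretend only joe's DVR exists.
EXISTING = {f"{BASE}/joe/home/livingroom/dvr"}


def fake_status(uri: str) -> int:
    return 200 if uri in EXISTING else 404


print(dvr_owners(["joe", "ann"], fake_status))
```

This costs one request per user rather than one request per step of every path through the hierarchy.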

There are pros and cons to all of these approaches. Semantic naming is simpler for some purposes (such as this one), more efficient in some cases (depending on the information space being modeled), and may enable information spaces that are too large or too mutable to be practically "mapped" in hypertext graphs. The bottom line is that structured names that *can* represent graph-structured namespaces and hypergraphs that *do* represent namespaces are different tools that are *both* fundamental components of the Web architecture, and you should use the right tool for the job.    (8N)

Representing Time    (8O)

Chronological schemes for organizing and presenting information are common and very powerful. Time has a natural, hierarchical breakdown: we think of things in terms of time units which break down into ever-smaller increments: year, month, day, hour, minute, second, etc. This sort of organization is often more powerful, useful, and durable than other (particularly taxonomic) schemes; cf. "lifestreams." [x] TBL even talks about this in his discussion of "cool URI." [1] Since the creation, publication, acquisition, etc. time of a piece of information is in some sense immutable, it is useful for creating persistent names to that information.    (8P)

Some applications are inherently organized around time. Consider the task of archiving (some or all of) the Web. By archiving, we mean the creation of immutable snapshots of the Web at some given point. This kind of archiving can occur in several contexts: a personal, partial archive of the content a user has accessed is a useful dataset; a global archive of all Web content is a tremendously useful and valuable dataset. In any such dataset, it makes sense to leverage the hierarchical, time-based organizational scheme.    (8Q)

Brewster Kahle's Wayback Machine is just such an archive. According to the FAQ the Wayback Machine "is a service that allows people to visit archived versions of Web sites. Visitors to the Wayback Machine can type in a URL, select a date range, and then begin surfing on an archived version of the Web. Imagine surfing circa 1999 and looking at all the Y2K hype, or revisiting an older version of your favorite Web site. The Internet Archive Wayback Machine can make all of this possible."    (8R)

The Wayback Machine uses structured naming to represent the information space of its archive. It specifically uses generative naming --- including wildcarding --- to allow the pathname part of the namespace to be used generatively to query the archive (cf. Advanced Search). (NB: I'm not particularly thrilled with the way their namespace is organized --- it doesn't fully leverage the hierarchical nature of the namespace to map the hierarchical timespace --- but it demonstrates some of the power of generative naming.) A Wayback URI looks like:    (8S)

http://web.archive.org/20011007203917/http://www.cnet.com    (8T)

Directed searches may be performed by generating URI according to some rules, namely "CONS-ing" up a date element and a target authority element. Each of these elements has additional syntax and semantics: the time element may be truncated to provide only year and month, just year, etc. A list of all archived copies of a given target URI within a particular period may be constructed using a wildcard syntax. A similar list of all archived copies of a given "site" within a particular period may also be obtained. These uses are illustrated below:    (8U)

http://web.archive.org/200010/http://www.cnet.com # closest match to middle of named month
http://web.archive.org/2000/http://www.cnet.com # closest match to middle of named year
http://web.archive.org/200109*/http://www.cnet.com # list of all archived copies in period
http://web.archive.org/200109*/http://www.cnet.com* # all URI for indicated site in September 2001    (8V)
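The rules just described can be sketched as a small generator function. It reproduces only the URI forms quoted on this page and makes no claim about how the service lays out its namespace today; the wayback_uri name and its parameters are invented for this example.

```python
ARCHIVE = "http://web.archive.org"


def wayback_uri(target: str, timestamp: str,
                wildcard: bool = False, site: bool = False) -> str:
    """CONS up a date element and a target element.

    timestamp: a digit prefix of YYYYMMDDhhmmss (year only, year+month, ...).
    wildcard:  append '*' to the date to list all copies in the period.
    site:      append '*' to the target to cover every URI under it.
    """
    if not timestamp.isdigit() or not 4 <= len(timestamp) <= 14:
        raise ValueError("timestamp must be 4 to 14 digits")
    date_part = timestamp + ("*" if wildcard else "")
    target_part = target + ("*" if site else "")
    return f"{ARCHIVE}/{date_part}/{target_part}"


print(wayback_uri("http://www.cnet.com", "200010"))
print(wayback_uri("http://www.cnet.com", "200109", wildcard=True, site=True))
```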

Obviously, the names are semantically meaningful and intended to be generated. This is a powerful demonstration of the utility of structured and generative naming. What is not really demonstrated in this example is whether (and if so, why) this is preferable to putting the semantic information in the query string --- and indeed, the munging of all date information into a single path element obscures this somewhat. A more complete discussion of the uses and tradeoffs of paths vs. query strings can be found in PathsAndQueryStrings.    (8W)

Representing Taxonomy    (8X)

Structured names are useful for describing taxonomy, and generative naming is useful for addressing objects organized according to some taxonomic scheme. Taxonomies are often hierarchical, sometimes heterarchical. In either case, taxonomies can be described by rooted graphs. The path syntax can be used to traverse the namespace of the taxonomy, assuming that paths are transparent and can be used generatively. The "official" mechanism for doing this in a Webby way --- for any generative application of names --- is to shove all the transparent information into the query string. There are some benefits to this, but many drawbacks. A detailed example of a taxonomic information space may be found in PathsAndQueryStrings.    (8Y)
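As a rough illustration of path traversal over a rooted graph, here is a tiny sketch. The nested-dictionary representation and the resolve and children helpers are assumptions made for this example, and the handful of biological taxa is only a sample. A heterarchy would need nodes shared between parents, which this tree-shaped sketch does not attempt.

```python
# A tiny sample taxonomy as nested dictionaries: each key is a named arc.
TAXONOMY = {
    "Animalia": {
        "Chordata": {
            "Mammalia": {},
            "Aves": {},
        },
        "Arthropoda": {
            "Insecta": {},
        },
    },
}


def resolve(path: str, root: dict = TAXONOMY) -> dict:
    """Follow each path segment as a named arc from the root.

    Raises KeyError if a segment names no arc (the analogue of a 404).
    """
    node = root
    for segment in filter(None, path.split("/")):
        node = node[segment]
    return node


def children(path: str) -> list[str]:
    """List the names reachable one arc below the node at path."""
    return sorted(resolve(path))


print(children("/Animalia/Chordata"))
```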

Disclaimer: Taxonomy is generally overused. It is a peculiarly human trait to attempt to force things into "hierarchical" classification schemes. We do it all the time. Often, the resulting taxonomies are very brittle and lose meaning over time as our understanding of what we're classifying changes. We see this all the time in filing systems, etc. --- as mentioned above, this recognition is what led Eric Freeman, David Gelernter, and colleagues to create the "lifestreams" metaphor [x] for organizing and dealing with information. However, certain taxonomies exist that have withstood the test of time. The taxonomy used in biology for describing lifeforms is a good example of this.    (8Z)

Interacting With Non-Hypermedia Objects    (90)


Generative naming has an interesting relationship to coolness. Arguments against generative naming have been made on the basis that it makes systems brittle, because systems depend on client knowledge of the URIs defined by the server. If we strive towards coolness then we should include generated URIs in that. This is sometimes more difficult than the equivalent case with other URIs, but normally not massively so. Being cool means not being brittle.    (91)

Also, there is no reason why we can't combine generated URIs with the same sort of techniques that would be used without them (the representation of one generated URI linking to others). -- JonHanna    (92)


As an ad once said, "Where's the HERESY?"    (93)

See also: A Description format for REST    (94)

-- MarkNottingham    (95)


Interesting generative identifiers may be built by using the cross-reference feature of XRIs, where each component of the path may itself be a URI, something like (not quite XRI syntax, but the idea):    (5CW)

  http://smartspace.com/(mailto:user@example.com)/(urn:foo:building:bar)/(roomschema:room)/(http://device)    (5CX)

That way you may construct paths from less ambiguous names.    (5CZ)

-- LaurianGridinoc    (5CY)