Naming Things

There’s a quote that’s well known in the programming world:

“There are only two hard things in Computer Science: cache invalidation and naming things.”

It’s attributed to Phil Karlton, but most of the sites that quote it either simply provide the quote or make some reference to the cache invalidation side.

As the title suggests, this post is about the other side: naming things.

Naming is a vital part of social activity.  We call each other by names.  We call all of the tools we use, the places we inhabit, the feelings and concepts, the songs and literature and occupations, all by names.

It’s probable that language itself was invented based on naming.  A name in itself can represent the thing, but it can also connote an action associated with that thing: The master carpenter points to the board, and says, “hammer.”  The board isn’t a hammer, but the word hammer is the verb, meaning the apprentice should put a nail in the board with the hammer.

The name of the thing can also represent a property of the thing: “the night was electric,” meaning that the night in question was teeming with the promise of sparks and activity and excitement.

When you add technology or any large number of things to the mix, however, naming gets a bit complicated.

One of the big naming trends on the modern web is tagging.  So you start tagging at some point, whatever the items (songs, albums, posts, websites, images), and you attempt to establish a formal taxonomy.  You want the names to be consistent, so that when you use the tags, you get all of the relevant entries and nothing more.

But, as per Wittgenstein’s argument against private language, with tags there isn’t the direct social feedback to hone the taxonomy, to keep it consistent.  The practice of speaking with other humans, using words, enforces some amount of consistency and evolutionary function upon language.  Tagging seldom does.

One attempt to inject the honing function is a simple game implemented by a few sites.  Two users, paired at random, are given a brief time to give their list of words describing one or more images.  Where they agree on the words (which are not shown to the partners), the website considers that a candidate for a tag.
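The agreement step of such a game can be sketched roughly as follows.  This is only an illustration, not how any particular site does it: the function name candidate_tags and the normalization rule (trim whitespace, lowercase) are assumptions made for the example.

```python
def candidate_tags(words_a, words_b):
    """Return the words both players gave, in player A's order.

    Hypothetical helper for illustration. Words are compared after
    trimming whitespace and lowercasing, which is an assumed rule.
    """
    def normalize(word):
        return word.strip().lower()

    given_by_b = {normalize(w) for w in words_b}
    agreed = []
    for word in words_a:
        n = normalize(word)
        # Keep a word only if it is non-empty, B also gave it,
        # and we have not already recorded it.
        if n and n in given_by_b and n not in agreed:
            agreed.append(n)
    return agreed


if __name__ == "__main__":
    player_a = ["Dog", "grass ", "ball"]
    player_b = ["ball", "dog", "park"]
    print(candidate_tags(player_a, player_b))
```

Only the overlap survives, so a word that one player alone finds meaningful never becomes a tag.  That is the social feedback in miniature.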

Another attempt might be to search for similar images or items and offer the previously used tags as candidates.

In both cases, how the tags are actually used could further refine them, in ways similar to the network effects used to enhance search results.  Namely, if certain tags garner more attention, then their related tags that are also used could be included by default to broaden a failed search.
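One way to picture the broadening of a failed search is the sketch below.  It makes a substitution worth stating plainly: there is no attention data here, so "related" is approximated by how often two tags appear on the same item.  The names related_tags and search, the limit parameter, and the sample data are all invented for the example.

```python
from collections import Counter


def related_tags(items, tag, limit=1):
    """Tags that most often share an item with `tag`, most frequent first.

    Co-occurrence stands in for "attention"; that is an assumption.
    """
    counts = Counter()
    for tags in items.values():
        if tag in tags:
            counts.update(t for t in tags if t != tag)
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [t for t, _ in ranked[:limit]]


def search(items, query, limit=1):
    """Items carrying every query tag; if none, retry with related tags."""
    query = set(query)
    strict = sorted(name for name, tags in items.items() if query <= set(tags))
    if strict:
        return strict
    # Failed search: each query tag may now be satisfied by itself
    # or by one of its most closely related tags.
    groups = [{t, *related_tags(items, t, limit)} for t in query]
    return sorted(
        name for name, tags in items.items()
        if all(group & set(tags) for group in groups)
    )


if __name__ == "__main__":
    photos = {
        "a.jpg": {"cat", "kitten"},
        "b.jpg": {"cat", "kitten", "pet"},
        "c.jpg": {"kitten", "sleeping"},
        "d.jpg": {"dog", "sleeping"},
    }
    print(search(photos, ["cat", "sleeping"]))
```

No photo is tagged with both cat and sleeping, so the strict search fails, and the retry lets kitten stand in for cat.  Raising limit widens the net and also lets in more noise, which is the trade-off any such default would have to tune.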

The other instance of naming is in writing programs.  Every object, variable, and function needs a name that is unique within its scope so that you can refer to it.  But you still have to type it every time, which leads to simple typographic errors and the delay of finding them.

There are solutions here, too.  Word completion is one example, but if you have similarly prefixed names (e.g., because they share some characteristic, like belonging to an input versus an output), then you may still have to wade through many hits or keep typing, and you have to pause to decide.

An alternative to this might be to keep a cache of 20 or so tokens, and keep it fresh based on frecency or LRU, etc.  The top ten could be selected via a simple key combination involving 0-9 keys, while any of them could be clicked to include them.
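As a rough sketch of that token cache, here is the LRU variant (frecency would need usage counts as well as recency).  The class name RecentTokens and its methods are made up for the example, and how an editor would wire the 0-9 keys to pick is left out entirely.

```python
from collections import OrderedDict


class RecentTokens:
    """Hypothetical cache of recently used identifiers, LRU eviction."""

    def __init__(self, capacity=20):
        self.capacity = capacity
        self._tokens = OrderedDict()  # oldest first, newest last

    def use(self, token):
        """Record that `token` was just typed or selected."""
        self._tokens.pop(token, None)   # forget its old position, if any
        self._tokens[token] = True      # re-add as the most recent
        if len(self._tokens) > self.capacity:
            self._tokens.popitem(last=False)  # evict least recently used

    def ranked(self):
        """All cached tokens, most recently used first."""
        return list(reversed(self._tokens))

    def pick(self, digit):
        """Token bound to shortcut key 0-9, or None if the slot is empty."""
        top_ten = self.ranked()[:10]
        return top_ten[digit] if 0 <= digit < len(top_ten) else None


if __name__ == "__main__":
    cache = RecentTokens(capacity=3)
    for name in ["input_buffer", "output_buffer", "input_length", "input_buffer"]:
        cache.use(name)
    cache.use("output_length")   # pushes out output_buffer
    print(cache.ranked())
    print(cache.pick(0))
```

The point of the digit shortcuts is that the similarly prefixed names stop mattering: the choice is made by position in the list, not by typing far enough to disambiguate.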

But that’s the general look of solutions: better tracking of the code by the editor.

That solution isn’t sufficient, though.  The problem runs deeper, as code complexity and reuse attempts create namespaces.  Think of a namespace like a distinct deck of cards.  There are many of the Ace of Spades in the world, but each regular deck only has the one.

But not only do you need to keep the namespace under your control straight; the namespace itself needs a unique name.  If I download your library and already have one with the same name, renaming either of them is a pain (especially if I want to keep pulling from upstream), and it adds a burden for any users of your work.
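Both halves of the problem fit in a small sketch: the same name living happily in two namespaces, and the collision that happens one level up when two namespaces want the same name.  The Registry class and the deck names are invented for illustration and do not correspond to any real package system.

```python
class Registry:
    """Hypothetical registry of namespaces, each mapping names to things."""

    def __init__(self):
        self._namespaces = {}

    def register(self, namespace, names):
        """Add a namespace; its own name must be unique in the registry."""
        if namespace in self._namespaces:
            raise ValueError(f"namespace {namespace!r} is already taken")
        self._namespaces[namespace] = dict(names)

    def resolve(self, qualified_name):
        """Look up 'namespace.name' and return what it refers to."""
        namespace, _, name = qualified_name.partition(".")
        return self._namespaces[namespace][name]


if __name__ == "__main__":
    decks = Registry()
    decks.register("red_deck", {"ace_of_spades": "the ace from the red deck"})
    decks.register("blue_deck", {"ace_of_spades": "the ace from the blue deck"})

    # The same card name resolves differently in each deck.
    print(decks.resolve("red_deck.ace_of_spades"))
    print(decks.resolve("blue_deck.ace_of_spades"))

    # A second deck that also calls itself red_deck is the real problem.
    try:
        decks.register("red_deck", {"ace_of_spades": "someone else's ace"})
    except ValueError as err:
        print(err)
```

Namespaces solve the collision between the two aces only by moving it up to the names of the decks.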

And that namespace problem brings up the final case for today: the Internet’s DNS.  This is a single namespace that holds the same sorts of traps as above.  And that’s before you get governments and corporations starting to muck about in it.  But it’s also more complicated due to the commercial aspect of domains.

A large number of useful domains are currently held by individuals with no interest in them other than profiting off their sale.  A land grab ran through most of the mid-to-late 1990s.  People thought, “if I buy this domain, then someone will want it, and I’ll get rich.”  But most of those domains simply go unused, while most startups choose strange names instead of trying to acquire them.

This sort of system has ancestry in the trademark system, a system of official names with restricted use.  And it’s a major headache there, too.  Again, we must recognize that language is ultimately a social construct, where shared use morphs the tongue.  Relying on limited, official sources of names goes against the mechanisms proven to work.

But in the case of the Internet, we already rely on unique numeric names: IP addresses.  There is a difference here.  These names are arbitrary, but they are more orderly.  With few exceptions, nobody cares what their IP address is.  Phone numbers tried to have it both ways, of course, shoehorning a naming system atop a numeric system.

The fix seems to be more top-level domains, which means the value of names in the major ones like com and net goes down.  A better trademark system might also help.  Such a system could make it easier to look up a trademarked name and get to the canonical website.  Maybe that even means a separate lookup protocol for trademarks that can be used alongside DNS.

I’ve not had much direct experience with cache invalidation, but I do use names every day.  It seems reasonable that continuing to look at naming in technology as a process requiring the same sort of feedback as natural language will yield better naming strategies that keep the Internet more open and more useful.