
The best ID is a web of IDs

If you want to link to a book you’ve just read, what do you use? Amazon? Sure, but suppose you don’t feel like giving them the free advertising. Maybe you use Open Library, although their book pages are a little geeky. Maybe you Google for the book and link to the publisher’s page about it. Any of these sources is better than leaving the reference unlinked, but the fact that we’re not sure what to point at is a problem.

It might seem that the solution is to have everyone link books to a single catalog of all existing books, perhaps Open Library or WorldCat.org. But there are good reasons to keep things much messier than that.

To see why, take it out of the realm of books and instead think about people. Let’s say you want to post about someone named Christina Gomez. You probably have a few options for making it clear which Christina Gomez you mean. You might link to her blog, her Twitter handle, her LinkedIn page, the bio of her on her employer’s site, the bio on the site of the choir she sings with, or her police record for the time she shot a man in Reno just to watch him die.

Fortunately, there is a way to stitch all those Christina Gomez links together. In the world of the Semantic Web — the world in which Web pages yield more of their meaning to computers examining them — there’s something called a “SameAs” statement. As the name implies, saying that one link is the SameAs another means that they are both talking about the same thing in the world…the same person, book, place, etc.
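To make that a little more concrete, here is a minimal sketch of what such statements can look like when written out for a computer. It assumes the OWL vocabulary's `owl:sameAs` property (the usual Semantic Web spelling of "SameAs") and the N-Triples line format. The three URLs and the helper name `same_as_triples` are invented for illustration; they are not real pages for anyone.

```python
# The IRI of the owl:sameAs property in the OWL vocabulary.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(uris):
    """Return N-Triples lines linking the first URI to each of the others."""
    first, *rest = uris
    return [f"<{first}> <{OWL_SAME_AS}> <{other}> ." for other in rest]


# Hypothetical URLs, made up for this example only.
christina = [
    "https://example.org/blog/christina-gomez",
    "https://example.com/staff/cgomez",
    "https://example.net/choir/members/christina",
]

for line in same_as_triples(christina):
    print(line)
```

Each printed line is one statement: this link and that link are about the same thing in the world.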

SameAs statements, which are made visible to computers but hidden from human eyes, look like hacks to get over the unpleasant fact that we don’t all link to the same places. In fact, the world is better off with many ways of linking things. There’s richness in that messiness.

This may seem counter-intuitive. We’ve long assumed that if you want to disambiguate references – “Which Christina Gomez is this talking about?” – it’s best to have a single source that everyone uses, like having a single Social Security number or a single passport for any particular country. (“Wait, which US passport did I use when I left the country?” is a bad thing to mutter to a US Immigration officer.)

But the book Linked Data: Evolving the Web into a Global Data Space, by Tom Heath and Christian Bizer, explains why SameAs is not a bandaid for a suboptimal situation. That “bandaid” actually provides important social functions. The book lists three.

First, SameAs lets people disagree about a person, a disagreement that may well be expressed by the sources they link to. To you, Christina Gomez should be understood by her link to the Littleton Amateur Hockey League page; to someone else, she should be known as the person who donated a wing to the hospital; whereas I think it’s important that she be understood as CFO of Acrid Smoke Corp. With SameAs, we can all express ourselves, and our computers will still figure out we’re talking about the same person.
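How might a computer "figure out" that three different links mean one person? One simple way, offered here as a sketch and not as how any particular Semantic Web tool does it, is to treat each SameAs statement as a join between two identifiers and then ask whether two links ended up in the same group. The class name `IdentityWeb` and all the URLs are hypothetical.

```python
class IdentityWeb:
    """Groups identifiers that have been declared the same (union-find)."""

    def __init__(self):
        self._parent = {}

    def _find(self, uri):
        # Register unseen identifiers as their own group.
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point everything we walked through at the root.
        while self._parent[uri] != root:
            self._parent[uri], uri = root, self._parent[uri]
        return root

    def same_as(self, a, b):
        """Record a statement that a and b refer to the same thing."""
        self._parent[self._find(a)] = self._find(b)

    def same_thing(self, a, b):
        """True if a chain of SameAs statements connects a and b."""
        return self._find(a) == self._find(b)


# Hypothetical URLs for the three views of Christina Gomez.
hockey = "https://example.org/littleton-hockey/players/gomez"
hospital = "https://example.com/hospital/donors/christina-gomez"
cfo = "https://example.net/acrid-smoke/leadership/cfo"
other = "https://example.org/people/a-different-christina-gomez"

web = IdentityWeb()
web.same_as(hockey, hospital)  # your statement
web.same_as(hospital, cfo)     # my statement

print(web.same_thing(hockey, cfo))    # linked through the hospital page
print(web.same_thing(hockey, other))  # nobody has connected these
```

Nobody had to say directly that the hockey page and the CFO page are the same person. The two statements chain together, and each of us still got to link to the Christina Gomez we care about.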

Second, when people can use multiple links to refer to the same person, we learn not just about Christina Gomez but about the people linking to her. The fact that you linked to her as a member of an ice hockey team tells us something about you.

Third, if we rely upon single, complete authoritative services, those services become single points of failure. Those sources would have to bear the burden of trying to include everyone and every thing in their domains, constantly updating, adjudicating disputes, etc. And suppose the service goes down or the organization that’s maintaining it calls it quits?
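The flip side is that a web of identifiers gives you somewhere else to go. As a rough sketch, assuming you already hold a list of links known to be the SameAs one another, and some way of checking whether a source is still up (the `is_reachable` argument below is a stand-in for that check, and the URLs are made up), falling back is a short loop:

```python
def first_available(equivalent_uris, is_reachable):
    """Return the first identifier whose source still responds, else None."""
    for uri in equivalent_uris:
        if is_reachable(uri):
            return uri
    return None


# Hypothetical equivalent links for one book, in order of preference.
book_links = [
    "https://catalog.example.org/books/12345",
    "https://publisher.example.com/titles/linked-data",
    "https://library.example.net/record/987",
]

# Pretend the preferred catalog has shut down.
defunct = {"https://catalog.example.org/books/12345"}

print(first_available(book_links, lambda uri: uri not in defunct))
```

With a single authoritative catalog, that list would have one entry and the function would have nothing to fall back on.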

For those three reasons, it’s better to have our identifiers be webs, not anchors. This is not only a more useful expression of identity, it is also in my view a more accurate one.

Tip o’ the hat to Rob Sanderson (Twitter) at Stanford for pointing to the discussion in the Linked Data book.
