Ten persistent myths about persistent identifiers

[Slightly expanded from a Twitter thread published 2018-08-24.] Here is a list of 10 persistent myths about persistent identifiers (PIDs), whether of type ARK, DOI, Handle, URN, PURL, or “Cool URL”, based on direct experience managing EZID.cdlib.org and N2T.net.

Myth 1: PIDs guarantee access. Get real. All PID services run on evolving software and hardware that no vendor warrants. How could any organization change that, let alone for-profit publishers, whose mission does not include preservation, or non-profit archives and libraries, whose budgets have been shrinking for decades?

Myth 2: PIDs rarely break. Nonsense. Millions of PIDs are broken, regardless of type. All PID types burden archival organizations with the heavy work of remaining solvent, not losing things, and updating redirection tables.

Myth 3: PIDs must not be URLs. What a crock. PIDs not carried inside clickable URLs are irrelevant or only of academic interest.
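To make “carried inside clickable URLs” concrete, here is a minimal Python sketch that wraps a bare, scheme-prefixed PID in a resolver URL. The resolver bases in RESOLVER_BASES, the sample identifiers, and the function name to_clickable_url are illustrative assumptions; any resolver that accepts the identifier could be substituted.

```python
# Hypothetical mapping from a PID scheme label to a resolver base URL.
# These bases are illustrative assumptions, not the only valid choices.
RESOLVER_BASES = {
    "ark": "https://n2t.net/",
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
}


def to_clickable_url(pid: str) -> str:
    """Wrap a bare PID such as 'doi:10.1000/182' in a resolver URL."""
    scheme, sep, rest = pid.partition(":")
    if not sep:
        raise ValueError(f"not a scheme-prefixed PID: {pid!r}")
    base = RESOLVER_BASES.get(scheme.lower())
    if base is None:
        raise ValueError(f"no resolver configured for scheme {scheme!r}")
    # ARK URLs keep the "ark:" label in the path; the DOI and Handle
    # resolvers used here expect only the part after the label.
    if scheme.lower() == "ark":
        return base + pid
    return base + rest


print(to_clickable_url("ark:/12345/x54xz321"))  # https://n2t.net/ark:/12345/x54xz321
print(to_clickable_url("doi:10.1000/182"))      # https://doi.org/10.1000/182
```

The bare string is still the identifier; the URL is simply the form in which people and browsers can act on it.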

Myth 4: PIDs are opaque and are never used as vanity URLs. Nope. Tons of PIDs contain transient organizational names and acronyms. It is hard for most of us to accept the inevitable, that one’s own brand will become defunct, perhaps even poisonous. Embedding brands in poorly selected URL hostnames made many URLs notoriously fragile and erroneously led many people to blame URLs in general.

Myth 5: PIDs, unlike URLs, are not locations. Wrong. No URL or clickable PID is a location. It’s true that the URL hostname, where resolution starts, can be a weak point. But a hostname alone tells no one, not even an expert, where the content will be assembled, or via what redirect chain, DNS routing indirection, proxies, etc.

Myth 6: PIDs aren’t needed, so just use “Cool URLs”. Sorry. How do you know a URL is or even might be cool? By one account, the average URL breaks in 44 days, and most URLs were never meant to persist. Assigning a PID tells people there’s hope.

Myth 7: PID resolver technology is hard. Fiddlesticks. Global resolvers have been simple to build since the 1990s: table lookup plus HTTP redirection plus $20 a year to rent a hostname. Every URL shortener, for example, is an identifier resolver.
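As a rough illustration of “table lookup plus HTTP redirection”, here is a minimal Python sketch using only the standard library’s http.server. The table entries, the example.org targets, and the names REDIRECTS, resolve, and serve are hypothetical. Note that the table keys are plain strings, so ARKs and DOIs sit side by side without the software discriminating among them. A production resolver would likely also need persistent storage for the table, logging, and richer handling of query strings.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote

# Hypothetical redirection table: identifier -> current target URL.
# Keys are plain strings, so different PID types can share one table.
REDIRECTS = {
    "ark:/12345/x54xz321": "https://example.org/collections/item-321",
    "doi:10.1000/182": "https://example.org/handbook",
}


def resolve(identifier):
    """Look up an identifier; return (HTTP status, Location or None)."""
    target = REDIRECTS.get(identifier)
    if target is None:
        return 404, None
    return 302, target


class ResolverHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Drop any query string and the leading "/", then undo %-encoding,
        # so "/ark:/12345/x54xz321" becomes the key "ark:/12345/x54xz321".
        path = self.path.split("?", 1)[0]
        status, location = resolve(unquote(path[1:]))
        self.send_response(status)
        if location is not None:
            self.send_header("Location", location)
        self.end_headers()


def serve(port=8080):
    """Run the resolver until interrupted (port 8080 is an arbitrary choice)."""
    HTTPServer(("", port), ResolverHandler).serve_forever()
```

After calling serve(), a request for http://localhost:8080/doi:10.1000/182 should receive a 302 redirect to the table’s target, and an unknown identifier a 404. Keeping the table current is the real work; the code is the easy part.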

Myth 8: PIDs require vendor lock-in. Poppycock. No database system discriminates among identifier types unless the service provider directs it to discriminate.

Myth 9: PIDs must be centralized. False. Any PID with a globally unique core after the URL hostname is persistable. In fact, if it cannot be served by other hosts, it cannot persist, since its fate is tied to one host. No hostname or protocol lasts forever.
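A minimal Python sketch of that idea: strip the hostname from a PID URL to recover its globally unique core, then attach the same core to a different host. It assumes the core is everything after the hostname (for an ARK, the ark: label, the NAAN, and the name); the hostnames, the sample ARK, and the function names pid_core and rehost are illustrative assumptions.

```python
from urllib.parse import urlsplit


def pid_core(url: str) -> str:
    """Return the part of a PID URL after the hostname."""
    parts = urlsplit(url)
    core = parts.path.lstrip("/")
    # Keep any query string so the request is forwarded unchanged.
    if parts.query:
        core += "?" + parts.query
    return core


def rehost(url: str, new_base: str) -> str:
    """Serve the same globally unique core from a different host."""
    return new_base.rstrip("/") + "/" + pid_core(url)


original = "https://n2t.net/ark:/12345/x54xz321"
print(pid_core(original))  # ark:/12345/x54xz321
print(rehost(original, "https://resolver.example.org"))
```

Because uniqueness lives in the core rather than in the hostname, any host willing to serve that core can take over if the original host disappears.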

Myth 10: PIDs should be free. No. While you can choose a PID type that avoids locking you in and charging you for the right to create PIDs, every PID that you maintain represents a service commitment that must be paid for with at least some sweat equity.

Punchline: ARK identifiers don’t fit the PID mold — no fees, flexible metadata, and decentralized. ARKs are offered, not mandated, via the N2T.net resolver, which supports 700+ other types of identifier.

ARK Alliance

John Kunze is a pioneer in the theory and practice of digital libraries. With a background in computer science and mathematics, he wrote BSD Unix software tools that come pre-installed with Mac and Linux systems. He created the ARK identifier scheme, the N2T.net scheme-agnostic resolver, and contributed heavily to the first standards for URLs (RFC1736, RFC1625, RFC2056), for library search and retrieval (Z39.50), for archival transfer (BagIt - RFC8493), for web archiving (WARC), and for metadata (RFC2413, RFC2731, ANSI/NISO Z39.85). Follow-on work in metadata includes creation of the Dublin Kernel and yamz.net.

  1. Hi John, is there a fuller description of your comment in Myth #6, “Assigning a PID tells people there’s hope.”?

    I.e., what is that “hope” which the ARK identifier, in particular and in detail, provides, or more assuredly seeks to provide (per your “Punchline”, beyond* “… no fees, flexible metadata, and decentralized. ARKs are offered, not mandated ….”)? Note that you do include “… whether of type ARK …” among the PIDs about which your Myths perhaps swirl.

    * Or are those particular features all that are needed to fix the PID “mold” problems?

    Glad to see you working on such an important problem!

    1. Hi Peter,

      Good to hear from you. The “hope” referred to is that if an identifier is recognizable as a known type of PID (such as ARK, DOI, Handle, PURL, URN), then it suggests that the “intent” of whoever assigned it was that it be permanent. This is a big deal in the wider internet, where the vast majority of URLs are not created with that intent. Having said that, original “intent” gets one only far enough to hope. Intent is not a contract or even a promise (from whom? to whom?), and much less a guarantee; even with the strongest original backing, PIDs cannot protect against reversal of fortune, human error, natural disaster, war, etc.

      ARKs aren’t better or worse than other PIDs in this regard. But they do break the mold in that they are cheaper, more flexible, and decentralized, which leaves organizations with more resources to get on with the hard work of preservation.
