ARK Alliance

Home of the Archival Resource Key (ARK)

Notes from ARK Summit Experts Meetings

ARK Experts Day @ National Library of France (BnF), March 22nd 2018

[ Discussions are noted in italics to distinguish them from the original agenda. Notes are mostly unedited. ]

1. ARK specification change proposals

(present IETF draft at https://tools.ietf.org/html/draft-kunze-ark-18)

  1. Goals of changes to the spec
    1. fix what’s broken, if anything
      1. remove barriers to acceptance
      2. move from Draft to RFC

5+1 proposed changes:

  1. Literal character repertoire changes: allow ‘~’, but disallow ‘#’ (which is reserved for fragments in URIs and is used in LOD).
    • ‘~’ is filesystem-friendly and will not cause problems in the current situation
    • The number of characters in the repertoire will not change; just these 2 characters will change
    • BnF: use the W3C recommendations for coolURIs: https://www.w3.org/TR/cooluris/ especially for hash URIs. Goal: distinguish the URI of a real-world object’s description from the URI for the RWO itself.
    • Logilab is also in favour of disallowing the hash
    • No opinions against => the group approves the change
  2. Make the first ‘/’ optional, so that ark:/12345/678 is equivalent to ark:12345/678. This would match a near universal practice in other id schemes, and is a commonplace and understandable mistake that currently penalizes ARK users and potential adopters.
    • This shortens the URL, and users like compact URLs
    • It’s optional in the sense that the parsers should still accept it and not break any old ARKs
    • Going forward, the canonical form would be the shorter form, but the documentation should always mention that the / is still accepted
    • Would it make sense to accept ark:12148:cb12345689? → No
    • Colons should be encoded according to the URI spec, but people are using them anyway -> colons are an even bigger problem
    • This is passed, with a note about the hierarchical computation
  3. Parsers (resolvers) should check for inflections (final punctuation character combinations) before normalization of final structural characters (‘/’ and ‘.’), for example, given “ark:/12345/678./”, parsers should check if “./” is an inflection and only normalize to “ark:/12345/678” if no inflection is matched
    • When deciding what to do with an ARK, resolvers are sensitive to inflections, but they normalize the ARK first, and a final ‘.’ or ‘/’ is usually dropped in the process
    • The idea is to keep open the possibility of using ‘/’ and ‘.’ for reserved uses. E.g. direct consumption vs. landing page: it would be useful to be able to use the final ‘/’ to request a landing page for the object
    • When registering an ARK in a database, leave the inflections out and store the normalized form, but for resolving ARKs we should use these final characters and check if they are in the list of inflections
    • For those final characters, instead of recommending to toss them out, recommend to “set them aside” because they might have a special meaning;
    • A ? at the end of the id is not considered part of the identifier. Proposal: state in the spec that reserved characters (with a list of inflections) at the end are not part of the identifier. This would be a set of rules for resolvers
    • Some concerns about overloading the ARK specification with REST API type requests
    • This change would not affect current ARKs. Inflections should only be used when you are using a resolver, checking what was at the end and whether it has an associated feature or not in the context
    • Inflections are a situational use of the ARK
    • A / at the end of an ARK is a violation of the current specification
    • Software doesn’t want a landing page, but users do want to make a choice
    • People want access to previous versions or the history of the object; this kind of use would be possible with this change in the specification
    • This is about implementing a resolver, and not so much about the ARK proper
    • Proposal: leave new inflections out from the ARK spec, and elaborate the use cases in a separate specification that could also be used for other types of identifiers. Qualifiers would remain built in the ARK spec
    • Question: should we leave all inflections out of the ARK spec and define them for all kinds of Identifiers? -> but very disruptive to the current spec.
    • How do inflections and suffix pass-through work? Can we put an inflection at the end of a suffix? -> It should be subject to the same rules we were talking about, but not add this to the ARK specs
    • OK from the group, but will still look at the final wording
  4. Make the NAAN more flexible – instead of just 5 digits or 9 digits, allow any “beta-numeric” string (defined to be the same as noid repertoire: bcdfghjkmnpqrstvwxz0-9) with no runs of adjacent letters longer than two, eg, ark:/bc8/… but not ark:/bcd8/….
    • Leave lots of room for lots of name assigning authorities
    • Original idea: 5 digits cannot be mistaken for a date; the same goes for 9 digits.
    • People now want shorter addresses, so a 9-digit NAAN would not be that desirable. So we might want to densify the NAAN namespace (allow letters in the mix, but do everything possible to prevent branding)
    • Use the same opaque naming and character set as NOID, same rules (not more than 2 letters in a row).
    • Proposal: do not block this in the spec, but suggest very strongly absolutely no changes to the way we currently assign NAANs
    • Have a separate policy that defines who gets which type of NAANs (NAANs other than the 5 digits the CDL currently assigns)
    • Minimum length: the NAAN could be a single character. With this change we could also have NAANs for namespaces marked by special starting character (for example “x” for UUIDs or “p” for physical objects), not just organisations.
    • With this change is it still a “Number”? We could keep the “NAAN”, but call it a “Name”.
    • In terms of the registry indexing might be an issue if switching from integer to text
    • The policy could be decided within the project “ARKs in the Open”
    • Everyone agrees with this change
  5. Update our understanding of what it means for metadata returned by inflections (‘?’ and ‘??’) in 2018 to be both human- and machine-readable. In 2003, a simple email-header format (eg, ANVL) served both purposes, but now it is common to see a human-readable HTML landing page with machine-readable metadata embedded in it (where it doesn’t interfere with the user experience).
    • There are now new norms for this: enter the ARK in a browser and get a human-readable HTML page, with machine-readable metadata underneath (JavaScript, JSON, metadata tags; there are a number of ways to embed metadata in HTML)
    • The spec should recommend providing human- and machine-readable metadata, but not specifying its form (in particular, keep silent about content negotiation), just providing examples.
    • Resolvers would have a choice in what format they would return upon the inflections
    • Keep the resolving (including content negotiation and inflections) for a separate specification
    • All of these changes are subject to reviewing the final wording of the draft
    • Simple rewording: provide an example (“e.g., HTML page with embedded JSON-LD”…)
    • OK with the group
  6. Max link length for ARKs: currently a 128-character limit
    • Should we raise it? Leave it alone
    • Whenever you are running a database, you have to set a limit for your column. However this is implementation-specific.
    • The idea is to remove obstacles, but the current trend is to have
    • Qualifiers are included in this limit (the BnF has already longer links when considering the qualifiers)
    • Current limits for databases shouldn’t be the reason for limiting something in the specification
    • 2 sentences would be struck from the spec and not mention the limit
    • Would dropping the limit have any impact on the suffix pass-through? -> No
    • The spec could just make a recommendation, but specific implementations could have a higher limit
    • We will change the spec to eliminate the limit, but we will make a recommendation for a minimum of 255 characters supported in implementations
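
Proposals 2 to 4 above can be sketched as a small parser. This is only an illustration of the discussion, not wording from the draft: the names `parse_ark`, `is_valid_naan`, and `ParsedArk` are hypothetical, the default inflection list contains only the ‘?’ and ‘??’ named in these notes, and requiring a non-empty name after the NAAN is an assumption made for simplicity.

```python
import re
from typing import NamedTuple, Optional, Sequence

# "Beta-numeric" repertoire as quoted in proposal 4.
BETANUMERIC_RE = re.compile(r"[bcdfghjkmnpqrstvwxz0-9]+")
# Any run of three letters violates the "no more than two in a row" rule.
LETTER_RUN_RE = re.compile(r"[a-z]{3}")


class ParsedArk(NamedTuple):
    naan: str
    name: str
    inflection: Optional[str]  # set aside, not part of the identifier


def is_valid_naan(naan: str) -> bool:
    """Proposal 4: beta-numeric string with no letter run longer than two."""
    return bool(BETANUMERIC_RE.fullmatch(naan)) and not LETTER_RUN_RE.search(naan)


def parse_ark(text: str, inflections: Sequence[str] = ("?", "??")) -> ParsedArk:
    """Parse 'ark:/NAAN/name' or 'ark:NAAN/name' (hypothetical helper)."""
    if not text.startswith("ark:"):
        raise ValueError("not an ARK: " + text)
    rest = text[len("ark:"):]

    # Proposal 2: the first '/' is optional.
    if rest.startswith("/"):
        rest = rest[1:]

    # Proposal 3: check for an inflection BEFORE normalizing final
    # structural characters; try the longest candidates first.
    inflection = None
    for candidate in sorted(inflections, key=len, reverse=True):
        if candidate and rest.endswith(candidate):
            inflection = candidate
            rest = rest[: -len(candidate)]
            break
    if inflection is None:
        # No inflection matched: drop trailing '/' and '.' characters.
        rest = rest.rstrip("/.")

    naan, sep, name = rest.partition("/")
    if not sep or not name:
        raise ValueError("missing name after NAAN: " + text)  # assumption
    if not is_valid_naan(naan):
        raise ValueError("invalid NAAN: " + naan)
    return ParsedArk(naan, name, inflection)


def canonical(ark: ParsedArk) -> str:
    """Shorter form, proposed as canonical going forward."""
    return "ark:" + ark.naan + "/" + ark.name
```

With this sketch, `parse_ark("ark:/12345/678")` and `parse_ark("ark:12345/678")` give the same result, and passing `inflections=("./",)` keeps a trailing “./” as a set-aside inflection instead of normalizing it away.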

2. Counting ARKs project

It is a feature of ARKs that there’s no centralized maintenance authority, but that makes it difficult to count how many ARKs there are in the world. We propose an easy way for registered ARK implementers – those who are willing – to post a small JSON or YAML file (eg, at a well-known URL path) containing a date and an estimated number of ARKs published. Such files would be harvested to obtain a base total.
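
As a rough sketch of what such a file and its harvesting might look like: the field names `date` and `estimated_arks` and both function names below are hypothetical, since the proposal only says the file would contain a date and an estimated number of ARKs published.

```python
import json
from datetime import date
from typing import Iterable


def make_count_report(estimated_arks: int, as_of: date) -> str:
    """Build the small JSON file an implementer would post (hypothetical fields)."""
    return json.dumps(
        {"date": as_of.isoformat(), "estimated_arks": estimated_arks},
        indent=2,
    )


def base_total(reports: Iterable[str]) -> int:
    """Sum the estimates from harvested report files to get a base total."""
    return sum(json.loads(report)["estimated_arks"] for report in reports)


# Example with made-up numbers: two implementers post their estimates.
reports = [
    make_count_report(1000, date(2018, 3, 22)),
    make_count_report(250, date(2018, 3, 1)),
]
total = base_total(reports)
```

How the files would actually be fetched (for example from a well-known URL path) is left out here, because the notes do not settle on a location.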

3. Persistence statements

It has long been said that ARKs should provide a commitment or policy statement on demand from the current archival institution (object provider, name mapping authority). The day when this becomes true is closer with the publication of “Persistence Statements: Describing Digital Stickiness” (https://datascience.codata.org/articles/10.5334/dsj-2017-039/). The paper proposes certain controlled vocabulary terms as building blocks for exactly this purpose. All that is lacking is to select, review, and revise the terms (which we can evolve ourselves in a crowdsourced metadata dictionary), and finally test and propose as a community consensus.

4. Towards ARK sustainability

Any persistent identifier system maintained solely by one organization is vulnerable, and ARK is no exception. CDL is seeking guidance on sustainability of the ARK “infrastructure” and on building a coalition of organizations with shared responsibility and governance. The ARK infrastructure includes the specification, the NAAN registry, the arks-forum googlegroup, and the N2T.net resolver (code, admin scripts, and primary and secondary servers).

5. ARK survey: joint BnF-CDL proposal

BnF and CDL would like feedback on a proposed online survey to get a better understanding of the different ARK implementations.

6. Wrap-up

ACTIONS By May 22, 2018

Follow up telecon 2018 June 16

— 2018.06.11 Notes from ARK experts group meeting —

// Attending

Sébastien Peyrard
Bertrand Caron
Jean-Philippe Tramoni
Adrien Di Mascio
Sheila Morrissey
Amy Kirchhoff
Pascale Montmartin
John Deck
Mark Phillips

// Agenda + notes

  1. ARKs-in-the-Open (AitO) project update
    Advisory Group to meet in next 4-7 weeks
    Working groups to be launched: technical, outreach, financial/sustainability

  2. Keeping momentum for the ARK Experts Day ad hoc group

    • collaboration pros and cons (eg, is there something urgent we need
      done soon?)
    • should we offer outcomes to seed AitO working groups?
    • should we offer ourselves to seed AitO working groups?

ARKsInTheOpen.org has all the info
Great deal of overlap between this group and the expected working groups from AitO.
The Experts Group is fine using our outcomes to feed into the working groups, and volunteer to carry our work forward in the context of the AitO project.

  3. ARK usage survey review – next steps

SP: Thanks to all for the draft survey comments. BnF is ok letting this survey be carried forward in a future outreach working group. Survey specialists have had a chance to review it as well. There will also be a French version of the survey.
JK: If the working groups don’t form in a way that’s to our liking we can always resume our Experts Group meetings. I don’t expect that outcome, but there are elements that we cannot control.
SP: We will have to consider where the survey will be hosted.
BC: Meanwhile, BnF will move forward in setting up the French-speaking forum.

  4. Other items

SM: It seems like a good idea to do a brief talk at iPres about some of our activities. I could draft something in response to a recent call and send it to this group for review.
JK: +1!
JD: It would be great to have a single place to go for information about ARKs.
JK: I hope the outreach group can direct the building out of arks.org (currently pointing to n2t.net).

  5. Around the room:
    Confirmed that everyone is ok to wait for the AitO working groups and merge our efforts with theirs.