Overview of Archival Resource Key (ARK) Tools
1 July 2005
John Kunze, California Digital Library
ARK Summary
Instead of one Name Authority: Assigning Authority + Mapping Authorities
http://foobar.zaf.org/ark:/12025/654xz321/s3/f8.05v.tiff
Parts of the example ARK:
1 Name Mapping Authority Hostport (NMAH): http://foobar.zaf.org (replaceable)
ARK Label: ark:
2 Name Assigning Authority Number (NAAN): 12025
3 Name (NAA-assigned): 654xz321
4 Qualifier (NMA-supported): s3/f8.05v.tiff
1 = current service provider; identity inert; replaceable
2 = organization that originally assigned the id
3 = name originally assigned to the abstract object, often opaque
4 = extension disclosing object hierarchy & variants, often non-opaque
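As a rough illustration of the anatomy above, here is a Perl sketch (Perl being the language of the resolver script later in these slides) that splits an ARK URL into its numbered parts. The name parse_ark is hypothetical. The sketch assumes an http(s) hostport, a label written as "ark:/", and a Name containing no '/', '.' or '?'. The returned qualifier keeps its leading '/' or '.' so that hierarchy and variants stay distinguishable.

```perl
use strict;
use warnings;

# Hypothetical helper: split an ARK URL into NMAH, NAAN, Name and Qualifier.
# Assumes an http(s) hostport and a Name free of '/', '.' and '?'.
sub parse_ark {
    my ($url) = @_;
    my ($nmah, $naan, $name, $qualifier) = $url =~ m{
        ^(https?://[^/]+)/     # 1: NMAH (replaceable)
        ark:/                  #    ARK label
        (\d+)/                 # 2: NAAN
        ([^/.?]+)              # 3: Name
        ([/.][^?]*)?           # 4: Qualifier, starting with '/' or '.'
    }x or return;
    return {
        nmah      => $nmah,
        naan      => $naan,
        name      => $name,
        qualifier => defined $qualifier ? $qualifier : '',
    };
}

my $ark = parse_ark('http://foobar.zaf.org/ark:/12025/654xz321/s3/f8.05v.tiff');
print "$_ = $ark->{$_}\n" for qw(nmah naan name qualifier);
```

Only the first part is meant to change over time. Everything from "ark:/" onward is the part an archive commits to.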
ARK usage
Two ARKs accessing the same thing
http://loc.gov/ark:/12025/654xz321
http://rutgers.edu/ark:/12025/654xz321
Access to metadata -- add a ‘?’
http://loc.gov/ark:/12025/654xz321?
Access to support statement -- add ‘??’
http://loc.gov/ark:/12025/654xz321??
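Only the hostport is replaceable. Re-homing an ARK on another mapping authority, or asking it for metadata or its support statement, is therefore plain string manipulation. The following is a minimal sketch using a hypothetical ark_requests helper:

```perl
use strict;
use warnings;

# Hypothetical helper: optionally swap the NMAH, then build the three
# requests an ARK should answer: the object, '?' and '??'.
sub ark_requests {
    my ($url, $new_nmah) = @_;
    my ($nmah, $core) = $url =~ m{^(https?://[^/]+)(/ark:/.+)$} or return;
    $nmah = $new_nmah if defined $new_nmah;   # e.g. loc.gov -> rutgers.edu
    my $object = $nmah . $core;               # "ark:/NAAN/Name" core unchanged
    return {
        object   => $object,                  # the thing itself
        metadata => "$object?",               # metadata record
        support  => "$object??",              # support statement
    };
}

my $r = ark_requests('http://loc.gov/ark:/12025/654xz321', 'http://rutgers.edu');
print "$r->{$_}\n" for qw(object metadata support);
```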
3 minimal requirements to be an ARK: it leads to the object, to its metadata (‘?’), and to a support statement (‘??’)
An archive that can’t do all 3 -- trustworthy?
Is an ARK persistent? Maybe. Have to ask.
Persistence and opaqueness
Do ARKs have to be this ugly (opaque)?
http://foobar.zaf.org/ark:/12025/654xz321/s3/f8.05v.tiff
(NMAH = http://foobar.zaf.org; Label = ark:; NAAN = 12025; Name = 654xz321; Qualifier = s3/f8.05v.tiff)
No, but they encourage it.
Persistence is all about managing associations between strings and things
And the landscape is littered with links that were required to die for political, legal, or social reasons
The appearance, deliberate or even accidental, of once-true assertions that are now misleading, infringing, or offensive makes it hard for our descendants to continue managing such links
Pain of managing opaque ids is mitigated by the certainty of having strongly bound metadata
A hostname may also break
Did it break because it appears to assert a branding that is no longer relevant?
Have to pay attention to this.
Semantic rot is inevitable in all ids
The more opaque, the more protected
Non-opaque ids are very useful ad hoc metadata containers; in the tradeoff, consider the more regular and complete metadata promised by ARKs
Non-opaque service label extensions to opaque base ARKs are suitable, e.g., “thumb”, “hi-res”
When the hostname breaks
Use a low-tech file lookup (like the old /etc/hosts)
Or use the MAPTR algorithm in a client or plug-in
Resolver discovery using vanilla DNS and script:
use strict;
use warnings;
use Net::DNS;                              # include simple DNS package (CPAN)
my $qtype = "NAPTR";                       # initialize query type
my $naa = shift or die "usage: $0 NAAN\n"; # get NAAN script argument
my $mad = Net::DNS::Resolver->new;         # mapping authority discovery
maptr("$naa.ark.arpa");                    # call maptr - that's it
sub maptr {                                # recursive maptr algorithm
    my $dname = shift;                     # domain name as argument
    my $query = $mad->query($dname, $qtype);
    return if (! $query || ! $query->answer);
    # keep NAPTR records only; lowest order, then lowest preference, first
    my @rrs = sort { $a->order <=> $b->order || $a->preference <=> $b->preference }
              grep { $_->type eq $qtype } $query->answer;
    foreach my $rr (@rrs) {
        my $flags = lc $rr->flags;         # accessor returns flags unquoted
        if ($flags eq "") {
            maptr($rr->replacement);       # non-terminal record: recurse
        } elsif ($flags eq "h") {
            print $rr->replacement, "\n";  # candidate NMAH
        }
    }
}
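The low-tech alternative is a local table in the spirit of /etc/hosts, and it might look like the following sketch. The file format (NAAN, whitespace, hostport, with '#' comments) is an assumption for illustration. The loc.gov and rutgers.edu entries echo the usage slide, and the example.org entry is hypothetical.

```perl
use strict;
use warnings;

# Read a NAAN -> NMAH table. Assumed format per line: "NAAN  hostport",
# with '#' starting a comment. A NAAN may list several NMAHs.
sub load_nmah_table {
    my ($fh) = @_;
    my %table;
    while (my $line = <$fh>) {
        $line =~ s/#.*//;                      # strip comments
        my ($naan, $nmah) = split ' ', $line;  # split on whitespace
        next unless defined $nmah;             # skip blank or short lines
        push @{ $table{$naan} }, $nmah;
    }
    return \%table;
}

my $sample = <<'END';
# NAAN   NMAH
12025    http://loc.gov
12025    http://rutgers.edu
13030    http://example.org    # hypothetical entry
END
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $table = load_nmah_table($fh);
print "$_\n" for @{ $table->{12025} };
```

A client that finds its usual hostname dead can try each listed NMAH in turn, keeping the rest of the ARK unchanged.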
ARK lexical goodies
Hyphens ignored
Neutralizes harm done by typesetters
Too many search results? Providers may disclose (or not)…
Sub-object hierarchy using reserved ‘/’
Variant objects using reserved ‘.’
Usual %hh (hex encoding) as an escape
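The following sketch shows one way a client might apply these lexical rules to the part of an ARK after "ark:/". The helper name ark_lexical is hypothetical. Hyphens are dropped first, and %hh escapes are decoded last, after splitting. This order lets an escaped '-', '/' or '.' keep its literal meaning instead of being ignored or treated as a separator.

```perl
use strict;
use warnings;

# Hypothetical helper: normalize and split "NAAN/Name/..." per the rules above.
sub ark_lexical {
    my ($s) = @_;
    $s =~ tr/-//d;                                   # hyphens are ignored
    my @parts = split m{/}, $s;                      # '/' = sub-object hierarchy
    return unless @parts;
    my ($base, @variants) = split /\./, pop @parts;  # '.' = variants of the leaf
    return unless defined $base;
    my $unescape = sub {                             # %hh escapes, decoded last
        my $t = shift;
        $t =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
        return $t;
    };
    return {
        path     => [ map { $unescape->($_) } @parts, $base ],
        variants => [ map { $unescape->($_) } @variants ],
    };
}

my $r = ark_lexical('12025/654-xz321/s3/f8.05v.tiff');
print join('/', @{ $r->{path} }), "  variants: @{ $r->{variants} }\n";
```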
ARK namespaces reserved
12025 National Library of Medicine
12026 Library of Congress
12027 National Agricultural Library
13030 California Digital Library
13038 World Intellectual Property Organization
20775 University of California San Diego
29114 University of California San Francisco
28722 University of California Berkeley
15230 Rutgers University Libraries
13960 Internet Archive
64269 Digital Curation Centre
62624 New York University Libraries
67531 University of North Texas Libraries
27927 Ithaka Electronic-Archiving Initiative
12148 National Library of France
Reserve a namespace by email to ark@cdlib.org
The Their Stuff problem is easier
We can’t do much about Their Stuff except defensively test and fix Our links to it
Not worth Our ARKs -- we can’t vouch for the objects
Indirect naming may help (e.g., PURL, SFX)
So get a link validator, staff to replace dead URLs, and figure out how much effort you’ll expend
Email Them (external providers), if appropriate, but if They don’t maintain their ids, no scheme will help
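The following is a minimal Perl sketch of a link validator built on HTTP::Tiny, which has been bundled with Perl since 5.14. It sends one HEAD request per URL and reports the failures. The optional $head hook is a hypothetical addition that lets the checker run against canned responses instead of the network.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Report Our outbound links that no longer answer successfully.
# $head (optional) is a hypothetical hook returning an HTTP::Tiny-style
# response hash; by default a real HEAD request is made.
sub dead_links {
    my ($urls, $head) = @_;
    my $http = HTTP::Tiny->new(timeout => 15);
    $head ||= sub { $http->head($_[0]) };
    my @dead;
    for my $url (@$urls) {
        my $res = $head->($url);
        push @dead, "$res->{status} $url" unless $res->{success};
    }
    return @dead;
}

# Usage (needs network): print "$_\n" for dead_links(\@our_outbound_links);
```

Some servers handle HEAD poorly, so a production checker might retry failures with GET before handing them to staff.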
Our Stuff: Solutions for persistent identifier problems
Identifier maintenance is different from, but deeply implicated in, collection management
Recall: an identifier is [a string and] an association between a string and a thing
If you maintain object metadata, you already maintain ids (assuming your object has an id)
It is natural to maintain redirection info as one more column of metadata, and to ask your DB admin to regenerate the web server's redirect config files nightly
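The following sketches that nightly job. It assumes that rows of (ARK, current URL) have already been pulled from the metadata database (for instance with DBI) and that the web server is Apache with mod_alias. The 302 status and the example.org targets are assumptions. Rows with no location are skipped.

```perl
use strict;
use warnings;

# Turn (ark_id, target_url) rows into Apache mod_alias Redirect lines,
# sorted for stable diffs between nightly runs.
sub redirect_config {
    my @rows = @_;
    my @lines;
    for my $row (sort { $a->[0] cmp $b->[0] } @rows) {
        my ($id, $target) = @$row;
        next unless defined $target && length $target;   # no location yet
        push @lines, "Redirect 302 /$id $target";
    }
    return join '', map { "$_\n" } @lines;
}

print redirect_config(
    [ 'ark:/13030/f54x54g11', 'http://example.org/objects/f54x54g11' ],
);
```

Redirect matches by URL prefix and appends the remainder of the path. A request carrying a qualifier such as /s3/f8.05v.tiff is therefore forwarded with that qualifier attached to the target.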
Opaque identifier tools
Non-opaque identifier strings are chosen deliberately to assert some things that are true at the time of assignment
Opaque identifier strings are best chosen by automated means, such as
NOID (nice opaque identifier)
Or UUID/GUID (universally unique identifier)
Sequence of hex encodings of your computer’s MAC address, current time, and sometimes a random number
No need to ask permission or register yourself
Looks like something found in nature, but it is actually based on IEEE and hardware vendor registries
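To make "MAC address, current time, and sometimes a random number" concrete, the following sketches a version-1 style UUID using the RFC 4122 field layout. It deliberately substitutes a random node id, with the multicast bit set as RFC 4122 allows, for the real MAC address. It also assumes a perl built with 64-bit integers. In practice an existing UUID module is the better choice.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

# Needs 64-bit integers for the 60-bit timestamp arithmetic.
die "needs 64-bit integers\n" unless ~0 > 4294967295;

# Version-1 style UUID: timestamp in 100 ns units since 1582-10-15,
# a random 14-bit clock sequence, and a random stand-in for the MAC.
sub uuid_v1_sketch {
    my ($sec, $usec) = gettimeofday();
    my $t = $sec * 10_000_000 + $usec * 10 + 122_192_928_000_000_000;
    my $clock_seq = int rand 0x4000;                  # 14 random bits
    my @node = map { int rand 256 } 1 .. 6;           # random node id
    $node[0] |= 0x01;                                 # multicast bit: not a real MAC
    return sprintf '%08x-%04x-%04x-%02x%02x-' . '%02x' x 6,
        $t & 0xFFFFFFFF,                              # time_low
        ($t >> 32) & 0xFFFF,                          # time_mid
        (($t >> 48) & 0x0FFF) | 0x1000,               # time_hi + version 1
        (($clock_seq >> 8) & 0x3F) | 0x80,            # clock_seq_hi + variant
        $clock_seq & 0xFF,                            # clock_seq_low
        @node;                                        # node
}

print uuid_v1_sketch(), "\n";
```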
Nice opaque identifiers (NOID)
A noid minter is a lightweight database for generating, tracking, and binding unique ids
The noid tool creates minters and accepts commands that operate them
Open source, available at www.cpan.org
Can mint in random or sequential order, with or without a check character guaranteeing against the most common transcription errors
Anyone can run a noid minter, maintain associations via bindings to arbitrary elements (assertions), and set up a resolver (including rule-based)
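The check character follows a simple position-weighted rule. The Perl sketch below is written from the noid documentation's description of its check-character algorithm (NCDA). The helper names are hypothetical. Each character's value in the 29-character extended-digit alphabet is multiplied by its 1-based position, and the sum mod 29 indexes back into the alphabet. Characters outside the alphabet, such as '/', count as 0 but still take up a position.

```perl
use strict;
use warnings;

my $XDIGITS = '0123456789bcdfghjkmnpqrstvwxz';   # 29 "extended digits"
my %VALUE;
@VALUE{ split //, $XDIGITS } = 0 .. 28;

# Check character for an id (NAAN and '/' included).
sub ncda_char {
    my ($id) = @_;
    my ($sum, $pos) = (0, 0);
    for my $c (split //, $id) {
        $pos++;
        $sum += $pos * ($VALUE{$c} // 0);        # non-alphabet chars count 0
    }
    return substr($XDIGITS, $sum % 29, 1);
}

# True if the final character is the right check character for the rest.
sub ncda_ok {
    my ($id) = @_;
    return 0 unless length($id) >= 2;
    return ncda_char(substr($id, 0, -1)) eq substr($id, -1);
}

print ncda_char('13030/f54x54g1'), "\n";         # prints 1
```

Because 29 is prime, changing one alphabet character to another, or swapping two different ones, always changes the sum mod 29, provided the id is shorter than 29 characters and the change lies before the check character.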
Using NOID
Identifiers minted according to a template:
noid dbcreate f5.reedeedk long 13030
which produces as first minted id
13030/f54x54g11
Noid is scheme-independent
Can be used to mint DOIs, URNs, URLs, lotto numbers, etc.
We (at CDL) use it to mint random ARKs with check chars
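To make the template concrete, the following sketch turns a NAAN and a noid template such as f5.reedeedk into a regex that matches the ids the template can produce. The mask meanings follow the noid documentation: 'r' means random order, 's' sequential, 'd' a digit, 'e' an extended digit, and a final 'k' the check character. template_regex is a hypothetical helper. It does not handle 'z' (unbounded sequential) masks, and it checks only the shape of the check character, not its value.

```perl
use strict;
use warnings;

my $XD = '[0-9bcdfghjkmnpqrstvwxz]';   # extended-digit alphabet

# Hypothetical helper: regex for ids of the form NAAN/prefix+mask.
sub template_regex {
    my ($naan, $template) = @_;
    my ($prefix, $mask) = split /\./, $template, 2;
    die "bad template: $template\n" unless defined $mask && $mask =~ /^[rs]/;
    my $re = quotemeta("$naan/$prefix");
    for my $m (split //, substr($mask, 1)) {      # skip the r/s order char
        if    ($m eq 'd')              { $re .= '[0-9]' }
        elsif ($m eq 'e' || $m eq 'k') { $re .= $XD }
        else  { die "unsupported mask char: $m\n" }
    }
    return qr/^$re$/;
}

my $re = template_regex('13030', 'f5.reedeedk');
print '13030/f54x54g11', ('13030/f54x54g11' =~ $re ? ' fits' : ' does not fit'), "\n";
```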
ARK Documentation
ARK specification
http://www.ietf.org/internet-drafts/draft-kunze-ark-09.txt
ARK information sites
http://www.cdlib.org/inside/diglib/ark/
http://ark.nlm.nih.gov/
Overview article
http://www.infotoday.com/cilmag/feb04/primers.shtml
Background paper
http://bibnum.bnf.fr/ecdl/2003/proceedings.php?f=kunze