New Developments in OAI

Presentation Transcript

New Developments in OAI : 

Michael L. Nelson
Old Dominion University
http://www.cs.odu.edu/~mln/
mln@cs.odu.edu
OA-Forum, May 13-14, 2002, Pisa, Italy
Many slides borrowed from Herbert Van de Sompel & Carl Lagoze

N.B. : 

OAI-PMH 2.0 is not scheduled for public beta release until May 19, 2002
some of the details of this presentation are still subject to change!
final public release of 2.0 scheduled for June 1

What’s New in 2.0?! : 

Good news:
  OAI-PMH is still Six Verbs + DC
  Incremental improvements
    single XML schema
    ambiguities removed
    more expressive options
    cleaner separation of roles & responsibilities
Bad news:
  not backwards compatible with 1.1

Open Archives Initiative : 

Open Archives Initiative

The Rise and Fall of Distributed Searching : 

wholesale distributed searching, popular at the time, is attractive in theory but troublesome in practice
  Davis & Lagoze, JASIS 51(3), pp. 273-80
  Powell & French, Proc 5th ACM DL, pp. 264-265
distributed searching of N nodes still viable, but only for small values of N
  NCSTRL: N > 100; bad
  NTRS/NIX: N <= 20; ok (but could be better)

The Rise and Fall of Distributed Searching : 

Other problems of distributed searching (from STARTS)
  source-metadata problem
    how do you know which nodes to search?
  query-language problem
    syntax varies and drifts over time between the various nodes
  rank-merging problem
    how do you meaningfully merge multiple result sets?
Temptations:
  centralize all functions
    “everything will be done at X”
  standardize on a single product
    “everyone will use system Y”

Metadata Harvesting : 

Move away from distributed searching
Extract metadata from various sources
Build services on local copies of metadata
  data remains at remote repositories
[Figure: a user searches for “cfd applications” against a local copy of metadata, harvested offline from several independently maintained nodes. All searching, browsing, etc. is performed on the local metadata; individual nodes can still support direct user interaction.]

Slide 8: 

Santa Fe convention
OAI-PMH v.1.0/1.1
OAI-PMH v.2.0

Slide 9: 

Santa Fe Convention [02/2000]
goal: optimize discovery of e-prints
input:
  the UPS prototype
  RePEc / SODA “data provider / service provider model”
  Dienst protocol
  deliberations at Santa Fe meeting [10/99]

Slide 10: 

OAI-PMH v.1.0 [01/2001]
goal: optimize discovery of document-like objects
input:
  SFC
  DLF meetings on metadata harvesting
  deliberations at Cornell meeting [09/00]
  alpha test group of OAI-PMH v.1.0

Slide 11: 

OAI-PMH v.1.0 [01/2001]
low-barrier interoperability specification
metadata harvesting model: data provider / service provider
focus on document-like objects
autonomous protocol
HTTP based
XML responses
unqualified Dublin Core
experimental: 12-18 months

pre-2.0 OAI Timeline Highlights : 

October 21-22, 1999 - initial UPS meeting
February 15, 2000 - Santa Fe Convention published in D-Lib Magazine
  precursor to the OAI metadata harvesting protocol
June 3, 2000 - workshop at ACM DL 2000 (Texas)
August 25, 2000 - OAI steering committee formed, DLF/CNI support
September 7-8, 2000 - technical meeting at Cornell University
  defined the core of the current OAI metadata harvesting protocol
September 21, 2000 - workshop at ECDL 2000 (Portugal)
November 1, 2000 - Alpha test group announced (~15 organizations)
January 23, 2001 - OAI protocol 1.0 announced, OAI Open Day in the U.S. (Washington DC)
  purpose: freeze protocol for 12-16 months, generate critical mass
February 26, 2001 - OAI Open Day in Europe (Berlin)
July 3, 2001 - OAI protocol 1.1 announced
  to reflect changes in the W3C’s latest XML Schema recommendation
September 8, 2001 - workshop at ECDL 2001 (Darmstadt)

Slide 13: 

OAI-PMH v.2.0 [06/2002]
goal: recurrent exchange of metadata about resources between systems
input:
  OAI-PMH v.1.0
  feedback on OAI-implementers
  deliberations by OAI-tech [09/01 -]
  alpha test group of OAI-PMH v.2.0 [03/02 -]

Slide 14: 

OAI-PMH v.2.0 [06/2002]
low-barrier interoperability specification
metadata harvesting model: data provider / service provider
metadata about resources
autonomous protocol
HTTP based
XML responses
unqualified Dublin Core
stable

Slide 15: 

process leading to OAI-PMH v.2.0
  creation of OAI-tech
  pre-alpha phase
  alpha-phase
  beta-phase

Slide 16: 

creation of OAI-tech [06/01]
created for 1 year period
charge:
  review functionality and nature of OAI-PMH v.1.0
  investigate extensions
  release stable version of OAI-PMH by 05/02
  determine need for infrastructure to support broad adoption of the protocol
communication: listserv, SourceForge, conference calls

Slide 17: 

OAI-tech
US representatives
  Thomas Krichel (Long Island U)
  Jeff Young (OCLC)
  Tim Cole (U of Illinois at Urbana Champaign)
  Hussein Suleman (Virginia Tech)
  Simeon Warner (Cornell U)
  Michael Nelson (NASA)
  Caroline Arms (LoC)
  Mohammad Zubair (Old Dominion U)
  Steven Bird (U Penn.)
European representatives
  Andy Powell (Bath U. & UKOLN)
  Mogens Sandfaer (DTV)
  Thomas Baron (CERN)
  Les Carr (U of Southampton)

Slide 18: 

pre-alpha phase [09/01 – 02/02]
review process by OAI-tech:
  identification of issues
  conference call to filter/combine issues
  white paper per issue
  on-line discussion per white paper
  proposal for resolution of issue by OAI-exec
  discussion of proposal & closure of issue
  conference call to resolve open issues

Slide 19: 

pre-alpha phase [02/02]
creation of revised protocol document
  in-person meeting Lagoze - Van de Sompel - Nelson - Warner
  autonomous decisions
  internal vetting of protocol document

Slide 20: 

alpha phase [02/02 – 05/02]
alpha-1 release to OAI-tech March 1st 2002
OAI-tech extended with alpha testers
discussions/implementations by OAI-tech
ongoing revision of protocol document

Slide 21: 

OAI-PMH 2.0 alpha testers (1/2)
The British Library
Cornell U. -- NSDL project & e-print arXiv
Ex Libris
FS Consulting Inc -- harvester for my.OAI
Humboldt-Universität zu Berlin
InQuirion Pty Ltd, RMIT University
Library of Congress
NASA
OCLC

Slide 22: 

OAI-PMH 2.0 alpha testers (2/2)
Old Dominion U. -- ARC, DP9
U. of Illinois at Urbana-Champaign
U. of Southampton -- OAIA, CiteBase, eprints.org
UCLA, Johns Hopkins U., Indiana U., NYU -- sheet music collection
UKOLN, U. of Bath -- RDN
Virginia Tech -- repository explorer

Slide 23: 

beta phase [05/02]
beta release on May 1st 2002 to:
  registered data providers and service providers
  interested parties
fine tuning of protocol document
preparation for the release of 2.0 conformant tools by alpha testers

Slide 24: 

What’s new in OAI-PMH v.2.0?
  corrections
  new functionality
  general changes
  to improve solidity of protocol
  quick recap

Overview of OAI Verbs : 

archival metadata
harvesting verbs
most verbs take arguments: dates, sets, ids, metadata formats and resumption token (for flow control)

Identify : 

1.1
  Arguments: none
  Errors: none
2.0
  Arguments: none
  Errors: badArgument

ListMetadataFormats : 

1.1
  Arguments: identifier (OPTIONAL)
  Errors: id does not exist
2.0
  Arguments: identifier (OPTIONAL)
  Errors: badArgument, noMetadataFormats, idDoesNotExist

ListSets : 

1.1
  Arguments: resumptionToken (EXCLUSIVE)
  Errors: no set hierarchy
2.0
  Arguments: resumptionToken (EXCLUSIVE)
  Errors: badArgument, badResumptionToken, noSetHierarchy

ListIdentifiers : 

1.1
  Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL), resumptionToken (EXCLUSIVE)
  Errors: no records match
2.0
  Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL), resumptionToken (EXCLUSIVE), metadataPrefix (REQUIRED)
  Errors: badArgument, cannotDisseminateFormat, badGranularity, badResumptionToken, noSetHierarchy, noRecordsMatch

ListRecords : 

1.1
  Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL), resumptionToken (EXCLUSIVE), metadataPrefix (REQUIRED)
  Errors: no records match, metadata format cannot be disseminated
2.0
  Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL), resumptionToken (EXCLUSIVE), metadataPrefix (REQUIRED)
  Errors: noRecordsMatch, cannotDisseminateFormat, badGranularity, badResumptionToken, noSetHierarchy, badArgument

GetRecord : 

1.1
  Arguments: identifier (REQUIRED), metadataPrefix (REQUIRED)
  Errors: id does not exist, metadata format cannot be disseminated
2.0
  Arguments: identifier (REQUIRED), metadataPrefix (REQUIRED)
  Errors: badArgument, cannotDisseminateFormat, idDoesNotExist
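
The 2.0 argument rules on the six verb slides can be collected into one lookup table. The following is a minimal sketch, not part of the protocol document: the names RULES and check_request are hypothetical, and treating an EXCLUSIVE argument as "the only argument besides the verb" is an assumption about what the slides mean by that label.

```python
# Argument rules transcribed from the "2.0" column of the verb slides.
# The table layout and the function name are illustrative assumptions.
_LIST_RULES = {
    "required": {"metadataPrefix"},
    "optional": {"from", "until", "set"},
    "exclusive": "resumptionToken",
}

RULES = {
    "Identify": {"required": set(), "optional": set(), "exclusive": None},
    "ListMetadataFormats": {"required": set(), "optional": {"identifier"}, "exclusive": None},
    "ListSets": {"required": set(), "optional": set(), "exclusive": "resumptionToken"},
    "ListIdentifiers": _LIST_RULES,
    "ListRecords": _LIST_RULES,
    "GetRecord": {"required": {"identifier", "metadataPrefix"}, "optional": set(), "exclusive": None},
}


def check_request(verb, arguments):
    """Return None for a well-formed request, otherwise an error code."""
    rules = RULES.get(verb)
    if rules is None:
        return "badVerb"
    names = set(arguments)
    exclusive = rules["exclusive"]
    if exclusive is not None and exclusive in names:
        # Assumption: an EXCLUSIVE argument may not be combined with others.
        return None if names == {exclusive} else "badArgument"
    if not rules["required"] <= names:
        return "badArgument"  # a REQUIRED argument is missing
    if names - rules["required"] - rules["optional"]:
        return "badArgument"  # an argument the verb does not accept
    return None
```

For example, check_request("ListIdentifiers", {"from": "2001-01-01"}) reports badArgument because metadataPrefix became REQUIRED for that verb in 2.0.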

Slide 32: 

general changes
clear distinction between protocol and periphery
fixed protocol document
extensible implementation guidelines:
  e.g. sample metadata formats, description containers, about containers
allows for OAI guidelines and community guidelines

Slide 33: 

general changes
clear separation of OAI-PMH and HTTP
OAI-PMH error handling
  all OK at HTTP level? => 200 OK
  something wrong at OAI-PMH level? => OAI-PMH error (e.g. badVerb)

OAI Data Model: Resources / Items / Records : 

item = identifier
record = identifier + metadata format + datestamp
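
The two equations of the data model can be written down as a small data structure. This is only a sketch: the class and field names are hypothetical, and keeping one record per metadata format inside an item is an assumption made for the illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """record = identifier + metadata format + datestamp"""
    identifier: str
    metadata_prefix: str
    datestamp: str


@dataclass
class Item:
    """item = identifier; records are derived from it per metadata format."""
    identifier: str
    records: dict = field(default_factory=dict)

    def add_record(self, metadata_prefix, datestamp):
        # Every record of an item shares the item's identifier.
        record = Record(self.identifier, metadata_prefix, datestamp)
        self.records[metadata_prefix] = record
        return record
```

An item such as Item("oai:arXiv:cs/0112017") can then carry, say, an oai_dc record alongside records in other formats, all under the same identifier.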

Slide 35: 

general changes
better definitions of harvester, repository, item, unique identifier, record, set, selective harvesting
oai_dc schema builds on DCMI XML Schema for unqualified Dublin Core
usage of must, must not etc. as in RFC2119
wording on response compression

Slide 36: 

general changes
all protocol responses can be validated with a single XML Schema
  easier for data providers
  no redundancy in type definitions
  SOAP-ready
  clean for error handling

Slide 37: 

response no errors
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord" … …>http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata>
        …..
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>

Slide 38: 

response with error
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>
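
A harvester therefore has two separate checks to make: first the HTTP status, then any error element inside the OAI-PMH response. The sketch below illustrates that split with Python's standard library. The function names are hypothetical, the sample mirrors the error response above, and the namespace stripping is a precaution in case a real response qualifies its elements with an XML namespace.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def oai_errors(http_status, body):
    """Return (code, message) pairs for errors at the OAI-PMH level.

    A failure at the HTTP level is reported separately, as an exception.
    """
    if http_status != 200:
        raise RuntimeError(f"HTTP-level failure: status {http_status}")
    root = ET.fromstring(body)
    return [
        (child.get("code"), (child.text or "").strip())
        for child in root
        if local_name(child.tag) == "error"
    ]


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>"""

print(oai_errors(200, SAMPLE))
```

An empty list means the request succeeded at both levels.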

Slide 39: 

corrections
all dates/times are UTC, encoded in ISO8601, Z-notation
  1957-03-20T20:30:00.00Z
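
As an illustration of the UTC rule, a local timestamp has to be converted to UTC before it is written in Z-notation. The helper below is a hypothetical sketch using Python's datetime module; it emits whole seconds (YYYY-MM-DDThh:mm:ssZ) and refuses naive datetimes, since their offset from UTC is unknown.

```python
from datetime import datetime, timedelta, timezone


def to_oai_datestamp(moment):
    """Format a timezone-aware datetime as UTC in ISO 8601 Z-notation."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: the offset from UTC is unknown")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# 15:30 at UTC-5 is 20:30 in UTC.
local = datetime(1957, 3, 20, 15, 30, tzinfo=timezone(timedelta(hours=-5)))
print(to_oai_datestamp(local))
```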

Slide 40: 

resumptionToken
idempotency of resumptionToken: return same incomplete list when rT is reissued
  while no changes occur in the repo: strict
  while changes occur in the repo: all items with unchanged datestamp
new attributes for the resumptionToken:
  expirationDate
  completeListSize
  cursor
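
The three new attributes let a harvester estimate progress and know when a token stops being usable. The sketch below reads them from a resumptionToken element. The sample element, its token value and its attribute values are invented for illustration, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_resumption_token(xml_fragment):
    """Extract the token text and its optional attributes."""
    element = ET.fromstring(xml_fragment)
    size = element.get("completeListSize")
    cursor = element.get("cursor")
    return {
        "token": (element.text or "").strip(),
        "expirationDate": element.get("expirationDate"),
        "completeListSize": int(size) if size is not None else None,
        "cursor": int(cursor) if cursor is not None else None,
    }


# Hypothetical element: all values here are made up for the example.
SAMPLE = (
    '<resumptionToken expirationDate="2002-06-01T23:20:00Z" '
    'completeListSize="277" cursor="100">hypothetical-token</resumptionToken>'
)

info = parse_resumption_token(SAMPLE)
print(info["completeListSize"] - info["cursor"], "records at or after the cursor")
```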

Slide 41: 

new functionality
harvesting granularity
  mandatory support of YYYY-MM-DD
  optional support of YYYY-MM-DDThh:mm:ssZ
  granularity of from and until must be the same
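
A repository can enforce these granularity rules with two patterns, one per format. This is a minimal sketch with hypothetical function names; it checks only the shape of the datestamps, not whether they are valid calendar dates.

```python
import re

_PATTERNS = {
    "YYYY-MM-DD": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "YYYY-MM-DDThh:mm:ssZ": re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"),
}


def granularity_of(stamp):
    """Name the granularity of a datestamp, judged by its shape alone."""
    for name, pattern in _PATTERNS.items():
        if pattern.fullmatch(stamp):
            return name
    raise ValueError(f"not a recognised datestamp: {stamp!r}")


def range_is_acceptable(from_stamp, until_stamp, repository_granularity="YYYY-MM-DD"):
    """Check that from and until agree and that the repository supports them."""
    used = granularity_of(from_stamp)
    if used != granularity_of(until_stamp):
        return False  # from and until must have the same granularity
    # Day granularity is mandatory; seconds granularity is optional.
    return used == "YYYY-MM-DD" or repository_granularity == "YYYY-MM-DDThh:mm:ssZ"
```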

Slide 42: 

new functionality
Identify more expressive
<Identify>
  <repositoryName>Library of Congress 1</repositoryName>
  <baseURL>http://memory.loc.gov/cgi-bin/oai</baseURL>
  <protocolVersion>2.0</protocolVersion>
  <adminEmail>dwoo@loc.gov</adminEmail>
  <adminEmail>caar@loc.gov</adminEmail>
  <deletedRecord>transient</deletedRecord>
  <earliestDatestamp>1990-02-01T00:00:00Z</earliestDatestamp>
  <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
  <compression>deflate</compression>

Slide 43: 

new functionality
header contains set membership of item
<record>
  <header>
    <identifier>oai:arXiv:cs/0112017</identifier>
    <datestamp>2001-12-14</datestamp>
    <setSpec>cs</setSpec>
    <setSpec>math</setSpec>
  </header>
  <metadata>
    …..
  </metadata>
</record>

Slide 44: 

new functionality
ListIdentifiers returns headers
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="…" …>http://arXiv.org/oai2</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:arXiv:hep-th/9801001</identifier>
      <datestamp>1999-02-23</datestamp>
      <setSpec>physic:hep</setSpec>
    </header>
    <header>
      <identifier>oai:arXiv:hep-th/9801002</identifier>
      <datestamp>1999-03-20</datestamp>
      <setSpec>physic:hep</setSpec>
      <setSpec>physic:exp</setSpec>
    </header>
    ……
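
Because each header now carries the identifier, the datestamp and the set membership, a harvester can decide what to fetch without requesting full records. The sketch below parses two headers shaped like the ones on this slide. The function name is hypothetical, the sample is trimmed to a complete ListIdentifiers element, and it is assumed to be un-namespaced as on the slide; a namespaced response would need qualified tag names.

```python
import xml.etree.ElementTree as ET

SAMPLE = b"""<ListIdentifiers>
  <header>
    <identifier>oai:arXiv:hep-th/9801001</identifier>
    <datestamp>1999-02-23</datestamp>
    <setSpec>physic:hep</setSpec>
  </header>
  <header>
    <identifier>oai:arXiv:hep-th/9801002</identifier>
    <datestamp>1999-03-20</datestamp>
    <setSpec>physic:hep</setSpec>
    <setSpec>physic:exp</setSpec>
  </header>
</ListIdentifiers>"""


def parse_headers(xml_bytes):
    """Return one dict per <header>, including its set membership."""
    root = ET.fromstring(xml_bytes)
    return [
        {
            "identifier": header.findtext("identifier"),
            "datestamp": header.findtext("datestamp"),
            "setSpecs": [spec.text for spec in header.findall("setSpec")],
        }
        for header in root.iter("header")
    ]


for entry in parse_headers(SAMPLE):
    print(entry["identifier"], entry["datestamp"], entry["setSpecs"])
```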

Slide 45: 

new functionality
ListIdentifiers mandates metadataPrefix as argument
http://www.perseus.tufts.edu/cgi-bin/pdataprov?
  verb=ListIdentifiers
  &metadataPrefix=olac
  &from=2001-01-01
  &until=2001-01-01
  &set=Perseus:collection:PersInfo
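
A request like this one can be assembled with the standard library. This is a sketch: the function name is hypothetical, and allowing a lone resumptionToken in place of metadataPrefix follows the EXCLUSIVE label on the ListIdentifiers verb slide. Note that urlencode percent-encodes the colons in the set name, which a server decodes back to the same value.

```python
from urllib.parse import urlencode


def list_identifiers_url(base_url, arguments):
    """Build a ListIdentifiers request URL from a dict of arguments."""
    if "resumptionToken" not in arguments and "metadataPrefix" not in arguments:
        raise ValueError("ListIdentifiers requires metadataPrefix in 2.0")
    query = {"verb": "ListIdentifiers", **arguments}
    return base_url + "?" + urlencode(query)


url = list_identifiers_url(
    "http://www.perseus.tufts.edu/cgi-bin/pdataprov",
    {
        "metadataPrefix": "olac",
        "from": "2001-01-01",
        "until": "2001-01-01",
        "set": "Perseus:collection:PersInfo",
    },
)
print(url)
```

The arguments are passed as a dict rather than keyword arguments because from is a reserved word in Python.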

Slide 46: 

new functionality
character set for metadataPrefix and setSpec extended to URL-safe characters
  A-Z a-z 0-9 _ ! ' $ ( ) + - . *
identifierType = anyURI
repositoryName = string
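
The character list on this slide translates directly into a regular expression. The sketch below takes the list as shown (reading the quote character as a plain ASCII apostrophe), so it reflects this presentation rather than the final specification. Splitting a setSpec on colons is an assumption based on examples such as physic:hep, where the colon separates levels of the set hierarchy. The function names are hypothetical.

```python
import re

# One or more of the characters listed on the slide.
_SEGMENT = re.compile(r"[A-Za-z0-9_!'$()+\-.*]+")


def is_valid_metadata_prefix(value):
    """True if every character of the prefix is in the listed set."""
    return _SEGMENT.fullmatch(value) is not None


def is_valid_set_spec(value):
    """True if every colon-separated segment is non-empty and valid."""
    return all(_SEGMENT.fullmatch(part) is not None for part in value.split(":"))
```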

Slide 47: 

in the periphery
introduction of provenance container to facilitate tracing of harvesting history
<about>
  <provenance>
    <originDescription>
      <baseURL>http://an.oa.org</baseURL>
      <identifier>oai:r1:plog/9801001</identifier>
      <datestamp>2001-08-13T13:00:02Z</datestamp>
      <metadataPrefix>oai_dc</metadataPrefix>
      <harvestDate>2001-08-15T12:01:30Z</harvestDate>
    </originDescription>
    <originDescription>
      … … …
    </originDescription>
  </provenance>
</about>

Slide 48: 

in the periphery
introduction of friends container to facilitate discovery of repositories
<description>
  <Friends>
    <baseURL>http://cav2001.library.caltech.edu/perl/oai</baseURL>
    <baseURL>http://formations2.ulst.ac.uk/perl/oai</baseURL>
    <baseURL>http://cogprints.soton.ac.uk/perl/oai</baseURL>
    <baseURL>http://wave.ldc.upenn.edu/OLAC/dp/aps.php4</baseURL>
  </Friends>
</description>

Slide 49: 

in the periphery
revision of oai-identifier
guidelines for collection-level and set-level metadata

Slide 50: 

future
  OAI-PMH
  communities
  adoption

Slide 51: 

the OAI-PMH
release of OAI-PMH v.2.0 [06/2002]
  no backwards compatibility with v.1.0/1.1
  stable
  migration process for registered repos
formal standardization?
SOAP version ~ web services framework [SOAP, WSDL, UDDI]?

Slide 52: 

communities
proliferation of community-specific add-ons for:
  collection & set level metadata
  expressive metadata formats (e.g. qualified DC XML Schema)
  shared set-structures
  machine readable rights (about the metadata)

Slide 53: 

adoption
evolution
  from talking about OAI-PMH
  to talking about projects that use OAI-PMH
  to talking about projects and failing to mention they use OAI-PMH
=> OAI-PMH becomes part of the infrastructure

Slide 54: 

indicators of adoption of OAI-PMH
  tools
  structural support
  service providers
  data providers

Slide 55: 

data providers
49 registered repositories [11/2001]
65 registered repositories [03/2002]
77 registered repositories [05/2002]
5+ million records
many unregistered repositories

Slide 56: 

service providers
Arc: cross-searching of registered repositories [Old Dominion U] [http://arc.cs.odu.edu]
OLAC: cross-searching of Language Archive Community repositories http://www.language-archives.org/index.html

Slide 57: 

service providers
Scirus scientific search engine [Elsevier] [http://www.scirus.com]
my.OAI: user-tailorable cross-searching of registered repositories [FS Consulting, Inc.] [http://www.myoai.com]
growing interest from web search engines

Slide 58: 

OAI-PMH tools
Repository Explorer: interactive exploration of repositories [Virginia Tech] [http://www.purl.org/NET/oai_explorer]
eprints.org: generic OAI-PMH compliant repository software [U of Southampton] [http://www.eprints.org]
ALCME repository and harvester software [OCLC] [http://alcme.oclc.org/index.html]

Slide 59: 

exploration
Kepler [Old Dominion U]
  your personal OAI data provider: Kepler archivelet
  the Kepler service provider harvests from archivelets that register
  archivelet downloadable
  http://www.dlib.org/dlib/april01/maly/04maly.html

Slide 60: 

exploration
DP9 [Old Dominion U]
  provides entry page to repositories for web-crawlers
  provides bookmarkable URL for OAI record
  provides resolution of OAI identifier into metadata
  software downloadable

Slide 61: 

http://www.openarchives.org
openarchives@openarchives.org

Emergency Backup Slides : 

Emergency Backup Slides

resumptionToken : 

scenario: harvesting 277 records in 3 separate 100 record “chunks”
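
The scenario can be simulated end to end with an in-memory stand-in for a repository. Everything here is hypothetical: the function names, the integer records, and above all the use of a numeric offset as the token. A real token is opaque to the harvester, which only hands it back unchanged.

```python
def make_repository(records, chunk_size):
    """Return a list_records(token) function over an in-memory list."""
    def list_records(resumption_token=None):
        start = int(resumption_token) if resumption_token else 0
        chunk = records[start:start + chunk_size]
        next_start = start + chunk_size
        # No token is issued once the list is complete.
        token = str(next_start) if next_start < len(records) else None
        return chunk, token
    return list_records


def harvest(list_records):
    """Follow resumption tokens until the repository stops issuing them."""
    harvested, requests, token = [], 0, None
    while True:
        chunk, token = list_records(token)
        requests += 1
        harvested.extend(chunk)
        if token is None:
            return harvested, requests


records, requests = harvest(make_repository(list(range(277)), chunk_size=100))
print(len(records), "records in", requests, "requests")
```

The three responses carry 100, 100 and 77 records, so the last chunk is shorter than the others.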

Slide 64: 

Open Archives Initiative: exposure of metadata for harvesting
Open Archival Information System: ensuring long-term preservation of archival materials
http://www.dlib.org/dlib/april01/04editorial.html
http://www.dlib.org/dlib/may01/05letters.html
http://ssdoo.gsfc.nasa.gov/nost/isoas/us/overview.html
OAIS
OAIS w/ an OAI interface

Field of Dreams : 

It should be easy to be a data provider, even if it makes more work for the service provider.
  if enough data providers exist, the service providers will come (DPs >> SPs)
Open-source / freely available tools
  “drop-in” data providers:
    industrial strength: http://www.eprints.org/
    personal size: http://kepler.cs.odu.edu/
  tools to make your existing DL a data provider:
    http://www.openarchives.org/tools/tools.htm
    also: OAI-implementers mailing list / mail archive!
  service providers:
    only bits and pieces currently publicly available...
