The Repository Roadmap
Are we heading in the right direction?


Background
Digital Repositories Roadmap: looking forward, by Rachel Heery (UKOLN, University of Bath) and Andy Powell (Eduserv Foundation), July 2006
looking forward to 2010
therefore, we should be about 25% there!
are we?
are the remaining milestones correct?

Aims (of this talk)
provide an overview of the roadmap as we wrote it
  policy
  cultural
  technical
  legal
critique the roadmap from a current perspective
look again at the technical environment in which we are operating

Why look specifically at technology?
the report says: while the current technical infrastructure in the UK is in need of some development, it is primarily in the areas of policy (both national and institutional), culture and working practices that changes need to be made
however, we know that getting the technology right can have a huge impact on policy, culture and working practices
just look at Flickr, Google, Wikipedia, YouTube, blogs, Technorati, Twitter, Slideshare, …


Vision
our vision…
a vision for 2010 in which a high percentage of newly published UK scholarly output is made available on an open access basis and in which there is a growing recognition of the benefits of making research data, learning resources and other academic content freely available for sharing and re-use. Furthermore, geospatial information will be better integrated with other data through improved licensing agreements.
Achieving this vision over a four-year period will not be easy, but it is intentionally set as a challenging aim in order to help focus discussion on what needs to happen to make it a reality
not if but when

What is a repository?
a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution. … An institutional repository is not simply a fixed set of software and hardware (Cliff Lynch, 2003)

Where are we going?
policy/political view
organisational view
cultural view
material type view
  academic papers
  geospatial data
  learning materials
  research data
technical infrastructure

Policy/political perspective
ensure that there is a clear open access mandate from the Research Councils and other funding bodies
realise greater national collaboration towards a common agenda: DTI, HEFCE, JISC, research councils, charities, etc.
encourage institutions to define strategy for access and preservation of research outcomes
explore national and institutional preservation responsibilities; provide institutions with a preservation audit toolkit

Organisational perspective
carry out analysis of existing business processes, workflows and dataflows; identify opportunities for innovative inter-working between repositories and between repositories and other applications
clarify the ‘business process’ and ‘business function’ aspects of a repository ‘reference model’
support institutions in embedding open access and/or repositories into their information strategies

Cultural perspective
explore user requirements in greater detail
progress IPR and copyright issues to develop a licensing framework within which the intellectual property of institutions, individuals and third-parties is protected while at the same time encouraging the adoption of open access
reach consensus on linking (manually citing) data and academic papers
seek to encourage the involvement of a greater range of academics in the debate about open access

Academic papers
the vision… for 2010 is that there will be open access to a significant proportion of newly published publicly-funded UK academic papers, primarily through an interoperable network of institutional repositories, subject based repositories, and national services
academic authors will self-archive because of
  funding body and/or institutional mandate
  recognition that reward relies on ‘impact’
  recognition that impact is significantly enhanced by ensuring open access to scholarly papers


Milestones
ensure that open access self-archiving is mandated with all major funding grants
ensure that open access is embedded in the outlook of the Commission and European funding rounds
encourage institutional open access policy commitment in line with above
clearly demonstrate benefits of making papers available on an open access basis in terms of higher profile and more ‘hits’ for author/institutions

Milestones (cont.)
citation counts and usage stats need to be unpicked and made to work in an open access environment
provide clear guidelines aimed at various stakeholders
ensure that JISC endorses Creative Commons (and similar) licensing approaches
instigate major national advocacy campaigns
instigate an equivalent of the Dutch Cream of Science project to raise awareness about open access

Geospatial data
the vision… is for an information environment in which geospatial data will be much better integrated with research publications and data, learning resources and other content through an enhanced technical infrastructure and improved licensing agreements. Licensing of geospatial data collected with public funding will be reformed to make possible re-use by third parties


Milestones
ensure that digital geospatial data rights (licensing simplification) and related security issues are addressed
produce clear guidance frameworks in order that the community more readily appreciates and understands copyright restrictions
generate improved license agreements between commercial data suppliers and academia
ensure that appropriate repository-related training is available to the community
mandate standards and enforce them

Learning materials
the vision… is for a growing culture of sharing and re-using learning objects, facilitated by a network of repositories at institutional and national levels (e.g. JORUM) and an enhanced technical infrastructure. Furthermore, the licensing of learning materials will be protective of the rights of authors, institutions and third-parties but supportive of an open access approach


Milestones
provide staff with seamless access to the different kinds of online resources available
overcome divisions between library and learning and teaching support services
overcome the unwillingness on the part of many teachers, especially at HE level, to use materials that they themselves have not developed
minimise the time and effort spent discovering suitable resources
clarify IPR policies for learning materials developed by academics

Research data
the vision… is for an information environment in which there is a growing culture of making raw research data available on an open access basis. In many cases this will be done through departmental or institutional repositories, often with direct links to laboratory equipment. The metadata required to access, understand, and manipulate scientific datasets will continue to be largely the preserve of domain-experts. The community’s adoption of a common technical infrastructure for repositories will ensure interoperability between all types of repositories, particularly between those holding scholarly publications and those holding research data


Milestones
institutions need to invest in research data repositories
develop robust acquisition policies and discovery mechanisms
reach agreements about unique identifiers and citations for all datasets
provide services that support curation, migration and preservation
develop approaches that allow proper management of IPR in research data

Technical infrastructure
the vision… is for a technical infrastructure that supports the deposit, discovery, access and use of objects in repositories by software applications
… across both open access and closed repositories and be based on a more thorough modeling of the objects being made available, the way such objects are described and identified and the mechanisms for automatically interlinking and manually citing scholarly output, research data and learning objects
… machine to machine interfaces (the services) that open access repositories should support in order to ingest and make available content and metadata

Key milestones
develop a ‘complex object’ model (i.e. an agreed way to model arbitrary bundles of objects) and XML syntax
agree mechanisms for identifying and citing ‘complex objects’ and their component parts
agree APIs (the machine to machine interfaces) that open access repositories should support: at least ‘putting’, ‘getting’ and ‘deleting’ content and metadata in repositories
ensure that repository content is well integrated with the large-scale Web search engines (e.g. Google)

Key milestones (cont.)
ensure that repositories are well integrated into institutional and national access management approaches (such as Shibboleth)
ensure that content licences are adopted as consistently as possible and work towards DRM solutions that allow software to take decisions based on machine-readable licences
develop aggregator services that use the features of the technical infrastructure to hide the complexities of the repository landscape and offer a single, seamless view of UK repository content
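
The technical milestones above (a ‘complex object’ model, ‘putting’, ‘getting’ and ‘deleting’ content and metadata, and software decisions based on machine-readable licences) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the class names, the `obj:` identifier scheme, the licence labels and the permissions table are hypothetical and do not come from the roadmap or from any existing repository API.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One part of a bundle, e.g. a PDF, a dataset or a slide deck."""
    identifier: str   # hypothetical identifier scheme, e.g. "obj:1/pdf"
    media_type: str
    content: bytes


@dataclass
class ComplexObject:
    """An arbitrary bundle of components plus descriptive metadata."""
    identifier: str
    metadata: dict
    licence: str      # machine-readable licence label (assumed vocabulary)
    components: list = field(default_factory=list)


# Assumed mapping from licence label to permitted actions; illustrative only.
LICENCE_PERMISSIONS = {
    "cc-by": {"read", "reuse"},
    "all-rights-reserved": {"read"},
}


class InMemoryRepository:
    """Toy stand-in for a repository exposing put/get/delete."""

    def __init__(self):
        self._objects = {}

    def put(self, obj):
        # Store (or overwrite) the bundle under its own identifier.
        self._objects[obj.identifier] = obj
        return obj.identifier

    def get(self, identifier):
        # Raises KeyError if the identifier is unknown.
        return self._objects[identifier]

    def delete(self, identifier):
        del self._objects[identifier]

    def permits(self, identifier, action):
        # Let software decide from the licence label alone.
        licence = self.get(identifier).licence
        return action in LICENCE_PERMISSIONS.get(licence, set())


repo = InMemoryRepository()
paper = ComplexObject(
    identifier="obj:1",
    metadata={"title": "Example paper"},
    licence="cc-by",
    components=[Component("obj:1/pdf", "application/pdf", b"%PDF-")],
)
repo.put(paper)
print(repo.permits("obj:1", "reuse"))
```

The point of the sketch is only that each component carries its own identifier, so that a bundle and its parts can be cited separately, and that a licence recorded as data rather than as prose can be checked by software. A real infrastructure would need agreed identifier and licence vocabularies, which is exactly what the milestones call for.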


Summary
Policy –
  Research councils and other funding bodies need to mandate open access
  the RAE needs to move significantly towards using open access copies of scholarly publications
  institutions should build curation of scholarly publications, research data and learning objects into their information strategies
Cultural –
  The ‘reward structures’ and ‘professional development’ infrastructure within the academic community need to recognise validity and impact of open access

Summary (cont.)
Technical –
  more thorough modelling of the materials being made available and the way such materials are described and identified
  agreement about the machine to machine interfaces (the services) that open access repositories should support
Legal –
  licensing of community-developed content needs to be supportive of the open access approach
  need to avoid a situation where concerns about IPR are allowed to stifle the creative sharing and re-use of academic content


Critique
is the vision right?
  yes, absolutely
are the milestones right?
  yes, probably
but… strangely(!)… what is missing from the roadmap is the Web

Only 2 significant ‘Web’ quotes
the issues associated with sharing knowledge about the modelling constructs being used within complex objects are non-trivial … this is a ‘semantic Web’ issue that requires significant research work
the conceptual thinking that underpins the technical infrastructure sounds complex, it needs to be instantiated in a relatively simple and intuitive form … encourage adoption of the framework by a wide range of developers and service providers, including those creating services outside the academic domain … content exposed through the infrastructure must be made available in a form that is suitable for use by the 2010 equivalents of Google and Yahoo and in a way that is compatible with the ranking mechanisms that they adopt for ‘ordinary’ Web sites

What is a Repository? - revisited
from the perspective of the content consumer a repository is just a Web site
think existing Web presences… think BBC… think museum… think Flickr… think content management systems
are these Web sites or repositories? who cares?
but conceptualising the repository as a Web site changes priorities
  Web architecture, Google, usability, accessibility, …

BBC 15 principles

A brief aside…
Let’s consider two Web 2.0 repository-like services
  Slideshare
  Scribd





Web 2.0 and all that
Flickr, Slideshare, YouTube, Scribd, … repositories by another name?
we need to ask… what makes these things successful?
  offering useful functionality – browser-based viewer, in-built format conversion (Scribd)
  social tools, tagging, commentary, more-like-this, favorites, …
  persistent URIs to content
  ability to embed documents in other Web sites
  visibility in Google

Yes but…
they don’t do preservation
  who cares!?
  if I want content in Flickr to be preserved I don’t necessarily expect Flickr to do it
  preservation is important… but it doesn’t have to be solved in the repository
they don’t do OAI
  yes, scholarly communication has some particular functional requirements which are not met by Google…
  author searching, citation counting, object complexity not handled well by the current Web
  how are these requirements best met? through richer metadata?

ArXiv – the first Web 2.0 application?

Notice anything else?
successful Web 2.0 services tend to be global in nature


Conclusion
roadmap vision still compelling
non-technical milestones remain valid and we seem to be broadly on target
our technical approaches need to be firmly in line with the architecture of the Web
scholarly communication is a social activity
we need to understand what makes social Web sites work
we need to build services that individuals choose to use because it is obvious and intuitive to do so (not because they are told to)

Closing thought…
the Web 2.0 repository is out there…
…but I’m not sure that we have found it yet!

Thank you

