Open Access Infrastructure: Where We Are and Where We Need to Go

Although the open access movement can be traced back to the late 1980s and early 1990s, many consider the Budapest Open Access Initiative in February 2002, the Bethesda Statement on Open Access Publishing in June 2003, and the Berlin Declaration on Open Access in October 2003 as the tipping points for the movement.

The number of institutions and funders issuing policies regarding the availability of their research in some form of open access (OA) grew from one in 2003 to over 350 by the end of 2013, according to ROARMAP (Registry of Open Access Repositories Mandatory Archiving Policies).

There’s no doubt that open access is here to stay, but the underlying infrastructure needed to support and sustain OA publishing is still very much in its development stages. Systems and services are in early stages of adoption with little interoperability between them. Some needed standards like ISSN and DOI are widely, though not universally, used, while others such as ISNI and ORCID are just beginning to be adopted. Additional needed standards in the areas of metadata, APIs, and protocols are either in discussion stages or not yet even envisioned.

This article, through a series of interviews with experts in the OA arena, highlights some of the major areas of infrastructure that are needed including institutional policies, compliance tracking and reporting, publishing tools, new economic models and licensing, and sustainability.

Implementation of the green OA repository by the Office for Scholarly Communication (OSC) is dependent on a number of standards. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) ensures the content is discoverable and searchable. SWORD (Simple Web-service Offering Repository Deposit) provides interoperability between repositories. OSC references the publisher's DOI for the published article, if one exists, and also uses the DOI to look up and ingest relevant metadata about the article. Researchers are encouraged to obtain an ORCID identifier and associate it with all their publications. The primary article format is PDF, which is what most publishers still use, although OSC would prefer well marked-up XML or even HTML. Tools to convert from PDF into XML are not yet reliable enough and require substantial manual intervention. However, when the tools are better, the Harvard repository will add buttons to let users convert deposited PDFs to other formats on the fly. The recently issued PIRUS Code of Practice for recording and reporting usage at the individual article level will be adopted soon. Currently some repositories are reluctant to share deposits with other repositories because doing so takes away from their own usage data. With PIRUS, they will be able to collect usage from wherever the article is accessed, which should help to encourage sharing.
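As a rough illustration of how harvesting over OAI-PMH works, the sketch below builds a ListRecords request and pulls titles and identifiers out of a response in the protocol's baseline Dublin Core format (oai_dc). The repository address, the sample record, and the function names are hypothetical and are not taken from DASH or any other real repository; the sample response is hard-coded so that the example runs without network access.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and by Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Optional selective harvesting: only records changed since this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical, heavily trimmed response; a real one carries more elements.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1234</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Deposited Article</dc:title>
          <dc:identifier>doi:10.1234/example.5678</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_records(xml_text):
    """Return one dict per harvested record with its OAI id, title, and identifiers."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        oai_id = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = next((e.text for e in rec.iter(f"{{{DC_NS}}}title")), None)
        identifiers = [e.text for e in rec.iter(f"{{{DC_NS}}}identifier")]
        records.append({"oai_id": oai_id, "title": title, "identifiers": identifiers})
    return records


print(build_list_records_url("https://repository.example.org/oai", from_date="2014-01-01"))
print(extract_records(SAMPLE_RESPONSE))
```

A production harvester would also need to follow resumption tokens for large result sets and handle protocol error responses, both of which are omitted here.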

There is much that publishers could do to aid institutions in managing their repositories. Adoption of community- or discipline-specific metadata vocabularies that are more robust than Dublin Core would eliminate or reduce the manual classification of article deposits. Using and sharing standardized article metadata through accessible APIs would serve numerous purposes and be useful beyond just repositories. Publishers could require or incentivize researchers to get an ORCID and provide it with all submitted manuscripts. They could also deposit the final accepted manuscript directly into institutional repositories, as many publishers currently do with PubMed Central. Using formats other than PDF, or providing multiple formats (e.g., PDF plus HTML or XML), would aid machine-readability and reusability of content.

The Harvard Open Access Project (HOAP) is distinct from the Office for Scholarly Communication (OSC) and looks beyond OA at Harvard to OA everywhere. It provides a current awareness service called the Open Access Tracking Project, creates an ontology for classifying OA developments, and catalogs OA journals published by scholarly societies. It particularly tries to spread awareness of Good Practices for University Open-Access Policies, consulting pro bono with other universities to assist them in developing their own OA policies.

Institutional Policies for Open Access

Peter Suber | Director of Office for Scholarly Communication (Harvard Library) and Director, Harvard Open Access Project (Berkman Center), Harvard University

The ten-year anniversary statement of the Budapest Open Access Initiative reaffirmed the two strategies of OA through repositories (also called “green OA”) and OA through journals (also called “gold OA”). Additionally, Recommendation 4.2 stated, “We should develop guidelines to universities and funding agencies considering OA policies, including recommended policy terms, best practices, and answers to frequently asked questions.” A month later, the first public edition of Good Practices for University Open-Access Policies, which had already been in development for several years, was released by the Harvard Open Access Project.

The Good Practices guide was based on the type of policy first adopted at Harvard, which asked faculty to deposit scholarly articles in the university’s institutional repository DASH (Digital Access to Scholarship at Harvard). Additionally, researchers grant the university a nonexclusive, irrevocable right to distribute their scholarly articles for any non-commercial purpose. This ensures that the repository can distribute the articles and does not have to track down rights or have different rights for different articles—a common problem with many institutional repositories. While there is a provision for obtaining a waiver regarding these rights, fewer than five publishers systematically require such a waiver as a prerequisite to publication.

Harvard researchers are free to publish articles in any journal of their choice. The policy is strictly about green OA; researchers are not required to choose gold OA journals for their publication. Commercial, subscription-based publications are equally acceptable. However, the university does want to encourage OA publishing and hosts a fund to pay the APCs for publication in fee-based OA journals, as long as they aren’t hybrid. Hybrid journals rarely reduce their subscription fees even when receiving APCs for selected OA articles, which would mean the university is paying twice for the same content.


CC BY License
A Creative Commons Attribution license that allows the content to be shared and adapted for any purpose, including commercial, provided appropriate credit is given to the creator(s).

Embargo
A requirement by the publisher of record wherein a green repository deposit must be delayed for some period following the official publication.

Gold OA
The publication of a scholarly article in open access in a journal, usually peer-reviewed, and financed through article publication charges.

Green OA
The archiving of a scholarly publication for public access in a repository other than that of the publisher, e.g., an institutional repository (IR) or discipline-related repository service. The deposited version is usually the final manuscript accepted for publication, but may not be the version that includes the publisher’s final design and format. Also referred to as open access archiving.

Hybrid Journal
A journal where some articles are available in open access while others are available only by payment (individually or by subscription).

Mandate (Open Access) 
A requirement by an institution, funding agency, or government body that published research outcomes be available in some type of open access (green or gold). Mandates may dictate additional requirements regarding acceptable reuse licensing.

Open Access
Unrestricted, online access to a scholarly publication that is free to read (gratis), and may have additional free reuse rights (libre).

Institutional Repository (IR)
A database of content that contains, among other things, copies of the research output of authors. Repositories can be institution-based (representing the broad output of an institution), subject-based (representing the output of specific or related subjects), funder-based (representing the output of a funding agency, such as the NIH) or national (representing the output of a country or geographical region). Repositories can hold published or unpublished articles, presentations, datasets, and/or metadata about them.

Article Publication Charge (APC)
A fee paid to the publisher (usually by the author, author’s institution, or funding agency) to make an article available in open access. Essentially shifts the cost of production from the subscriber to the author. Also referred to as article processing charge.

Tracking and Reporting Compliance with OA Policies

Robert Kiley | Head of Digital Services, Wellcome Trust Library

The Wellcome Trust (WT) has been in the vanguard of the open access (OA) movement over the last 10 years and expects recipients of its funding to provide free, online access to their published research results. Electronic copies of any research papers that have been accepted for publication in a journal have to be made available through PubMed Central (PMC) and Europe PubMed Central (Europe PMC) as soon as possible, and no later than six months after publication. In April 2013, an additional requirement was introduced that if WT pays an article publication charge (APC), the article must be licensed using CC BY (Creative Commons Attribution).

Funding organizations, both governmental and private, as well as researchers’ institutions need to be able to track and report compliance with OA policies, which can be difficult, time consuming to compile, and not 100% accurate. There is currently no standardized metadata that can be used consistently with search and discovery services for identifying that an article is published in some type of open access. The information about the funding agency and the grant number is often included in an acknowledgements section of the text and either not repeated in the metadata or not used with standard formats or syntax. WT encourages researchers to make full use of available identifiers and metrics, in particular ORCID (Open Researcher and Contributor ID) and persistent digital identifiers, such as DOI, for both articles and datasets. Standard identifiers for funding sources would also be helpful.

Since WT requires article deposits in PubMed Central, it can run an automated search every month to find the number of articles attributed to the Trust. These searches are showing about a 70% compliance level with WT’s OA policies. The searches do occasionally pick up some false hits where Wellcome Trust is mentioned but is not a funder, and also miss some papers where WT funding is not properly attributed. WT has been working with PubMed Central to more consistently index WT-funded research and put that information in the grant funding attribution field. WT manages the Europe PMC repository on behalf of 26 other funders, so it is able to ensure the infrastructure is in place there to search and report on the content. WT is also an early adopter of the CrossRef service FundRef and would like to see more publishers use this system to report funding sources for published scholarly research.

Far more difficult to track is the compliance with the CC BY licensing requirement. License metadata isn’t always included at the article level or done in a consistent way. One publisher, for example, included the license type as a footnote to the article. Other publishers are only identifying licenses at a journal level or the license information is only available within the publisher's internal system. A standardized method and taxonomy is needed to express licensing at the article level in a machine-readable way.

It’s often not clear what the publisher’s policy for open access is, even at the journal level. Is it full gold OA, or hybrid, with or without support for green archiving? Deciphering this can be very difficult for researchers, especially where publishing is being done by one organization on behalf of another, such as a professional society. Thus researchers are uncertain whether they will be complying with WT policy if they choose a particular journal. WT has been providing some funding support for SHERPA-FACT to help get this information better collected and searchable in the SHERPA system. Much of this information still has to be manually interpreted by SHERPA.

Machine-readable licensing terms and/or an API to this information in the publishers’ systems could go a long way in enabling the collection and maintenance of policy and licensing information.
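To make the point concrete, the following sketch shows how a funder could check the two Wellcome Trust requirements described in this section automatically if deposit dates and license URLs were available as article-level metadata. The record structure, field names, and function names are hypothetical, and the reading of "six months" as calendar months is an assumption for illustration rather than WT's own definition.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


def add_months(d: date, months: int) -> date:
    """Return the date `months` calendar months after `d`, clamped to month end."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


@dataclass
class ArticleRecord:
    """Hypothetical article-level metadata a publisher or repository might expose."""
    doi: str
    published: date
    pmc_available: Optional[date]   # when the paper became available in PMC/Europe PMC
    license_url: Optional[str]      # e.g. a Creative Commons license URL
    apc_paid_by_funder: bool


def compliance_problems(rec: ArticleRecord) -> List[str]:
    """List the ways a record falls short of the two rules; empty means compliant."""
    problems = []
    if rec.pmc_available is None:
        problems.append("not available in PMC/Europe PMC")
    elif rec.pmc_available > add_months(rec.published, 6):
        problems.append("available in PMC/Europe PMC more than six months after publication")
    if rec.apc_paid_by_funder:
        if rec.license_url is None:
            problems.append("no machine-readable license")
        elif "creativecommons.org/licenses/by/" not in rec.license_url:
            # CC BY-NC, CC BY-ND, etc. use different URL paths and fail this test.
            problems.append("APC paid but license is not CC BY")
    return problems


example = ArticleRecord(
    doi="10.1234/example.5678",
    published=date(2014, 1, 15),
    pmc_available=date(2014, 9, 1),
    license_url="https://creativecommons.org/licenses/by-nc/4.0/",
    apc_paid_by_funder=True,
)
print(compliance_problems(example))
```

With license information buried in footnotes or held only in publishers' internal systems, neither the license_url field nor the check that depends on it can be populated without manual work, which is the gap the interview describes.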

Not captured at all yet, outside of the publishers’ systems, are the fees to publish an article. Wellcome Trust currently has to go back to each institution to see what was paid per article, per publication, and per publisher. WT will give institutions a block of money to use for APCs and the institutions have to send a yearly spreadsheet showing how they spent the monies. The data that is returned can be variable in content and format. Last year, WT put this data online (see: spreadsheets/d/1RXMhqzOZDqygWzyE4HXi9DnJnxj dp0NOhlHcB5SrSZo/edit#gid=0) and used community crowdsourcing to enhance it with DOIs, OA status, and licensing. While a fairly successful effort, widely implemented standards for reporting could eliminate the need for such enhancement work.

Progress is being made, but much more attention is needed to get the required infrastructure in place for compliance tracking and reporting. Metadata is still used inconsistently, and too much information is still exchanged manually through spreadsheets. A great deal of the data needed is held by individual publishers, and better tools and mechanisms are required to enable publishers to share the data they hold with funders and researchers' institutions.

Integrating New Economic Models for OA Publishing

Roy Kaufman | Managing Director for New Ventures and Executive-level lead on Open Access, Copyright Clearance Center
Jennifer Goodrich | Director of Product Management, Copyright Clearance Center

In the subscription model of STM journal publishing, the number of relationships between the publisher and the paying customers is fairly concentrated. Libraries are the majority of the subscribers and most libraries work through subscription agents like Swets or EBSCO. Individuals may also subscribe as members of a learned society, with the payments aggregated by the societies. So publishers have a small number of payers to deal with, many payments are made once a year, and the payments, which are often aggregated, are on a larger scale.

Gold open access (OA) publishing, where the economic model switches to the author (or author’s institution) paying article publication charges (APCs), changes things considerably. The number of payments and of paying individuals or organizations has increased sharply, the payments are made throughout the year, and many of the individual payments are small in comparison to subscriptions. Additionally, APCs can vary depending not just on technical factors such as page count and number of color illustrations, but also on the location/currency of the author and on whether different types of discounts apply, such as society memberships, institutional volume discounts, pre-paid deposit account discounts, or discounts for institutions that subscribe to the journal.
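The variability described above can be pictured with a small calculation. The sketch below is purely illustrative: the fee amounts, the page allowance, the discount rates, and the rule that only the single largest applicable discount is applied are all invented assumptions, not any publisher's actual pricing, and currency conversion is left out entirely.

```python
from decimal import Decimal, ROUND_HALF_UP


def estimate_apc(base_fee, pages, color_figures, discounts,
                 included_pages=10,
                 per_extra_page=Decimal("50"),
                 per_color_figure=Decimal("100")):
    """Estimate an APC from a base fee, surcharges, and discounts.

    `discounts` maps a discount name to a rate between 0 and 1. Assumption:
    discounts do not stack, so only the largest one is applied.
    """
    extra_pages = max(0, pages - included_pages)
    gross = base_fee + extra_pages * per_extra_page + color_figures * per_color_figure
    best_discount = max(discounts.values(), default=Decimal("0"))
    net = gross * (Decimal("1") - best_discount)
    return net.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


fee = estimate_apc(
    base_fee=Decimal("1500"),
    pages=12,
    color_figures=1,
    discounts={
        "society_member": Decimal("0.10"),
        "institutional_volume": Decimal("0.15"),
    },
)
print(fee)  # 1445.00
```

Even in this simplified form, the final charge depends on facts held by several parties (the manuscript, the author's memberships, the institution's agreements), which is why the payment cannot be settled from the publisher's data alone.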

Publishers’ systems are often not set up to handle the volume and variations of these new OA payments and the workflows are not always established to tie payments to specific articles and track that payments have been received prior to publication.

Previously, many institutions had few or no systems and processes in place to track their researchers’ publication activities. When any tracking was done, it was usually post-publication and often at the departmental level. Now institutions have to develop new policies regarding APC payments, as well as set up systems and processes to budget, fund, and manage such payments. The processes have to be initiated very early in the publication cycle, often prior to article acceptance, rather than post-publication. Researchers, who were often used to dealing directly and alone with publishers about their articles, have many more institutional hoops to jump through before they can get published. And institutions have to set up reporting mechanisms to funding agencies to prove compliance with OA policies. If the funder’s monies are used for APCs, these also have to be tracked and reported at the grant level or even by the specific article. For those institutions that are also doing Green OA repository publishing, even more processes and systems have to be established.

Article publication workflows are further complicated by, and increasingly tied to, licensing issues. Licenses used to be imposed by the publisher with little or no negotiation room. Often the licenses were standardized across a publisher’s entire portfolio; if more granular, they may have been at a discipline level or at most a journal title level. With OA publishing, some funding agencies and author institutions are dictating the type of license that is required, often CC BY (Creative Commons Attribution), but other license variations may apply. (On the other hand, a March 2014 survey conducted for Taylor and Francis showed that the majority of authors preferred more restrictions on the reuse of their published research.) Thus licenses can vary at the article level, especially in hybrid publications. An article could also have more than one grant and funder associated with it, each with different or possibly conflicting publishing and licensing requirements. These license nuances have to be identified from the time of article acceptance through to publication and distribution to the end users. Licenses can also affect the APC rates, since publishers may lose their rights to sell reprints with certain licenses—a major revenue stream for some—and may increase the APCs in those cases to make up the difference.

To date, many publishers and institutions are still struggling to set up working systems and processes to support OA workflow for APCs and licensing. New software and services are being introduced, both commercial and open source, but are not yet widely used or well integrated. Standards will be critical to making these new services integrate with each other and with existing systems, both within and between organizations. Metadata attached to the individual article that travels with it throughout the workflow is especially important. Among the standards that need to be utilized in this metadata are researcher identifiers such as ORCID, author and institution identifiers such as ISNI, and article identifiers such as DOI. The use of the DOI is furthest along, but even after close to 15 years of standardization, it is still not universally used by all publishers. And DOIs are usually not assigned until the time of publication (or even afterwards). ORCID and ISNI are more recent standards and are in the early adoption stages. Missing standards are those addressing funding information, such as funder and grant identifiers, licensing terms that are machine-readable, identification of the type of open access article and ties to embargo periods that may apply, article versioning (especially where green and gold versions both exist), and APIs or protocols for moving commonly used data between disparate systems.
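One practical benefit of standard identifiers is that they can be checked mechanically before they enter a workflow. ORCID iDs are 16-character identifiers whose final character is a check character computed with the ISO 7064 MOD 11-2 algorithm, the same scheme used by ISNI. The sketch below validates that check character; the function names are illustrative, and passing the check says nothing about whether the iD is actually registered.

```python
import re


def mod11_2_check_char(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for a string of digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and check character of an ORCID iD such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return mod11_2_check_char(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

A submission system could run a check like this when an author enters an identifier, catching typing errors long before the metadata reaches a funder or repository.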

Two services that are gaining some traction in aiding publishers and institutions in implementing the new economic models are FundRef from CrossRef and RightsLink® for Open Access from the Copyright Clearance Center. With FundRef, publishers deposit funding information from articles using a standard taxonomy of funder names. This funding data is then made publicly available through CrossRef's search interfaces and APIs for funders and other interested parties to use and analyze.

RightsLink provides a service that integrates directly with a publisher’s workflow to allow authors and institutions to pay, track, and manage APCs. Users can view estimated mandatory and optional charges before acceptance, as well as the final charges at time of acceptance. Payments can be made by credit card directly through the system, by crediting to a deposit account, or an invoice can be requested for one of seven different currencies. Monies are collected for and remitted to the publishers, eliminating their burden of handling these numerous payments. Various publisher reports are available at any time, including order history, manuscript status, and payment status. Forthcoming reports will show aggregated information by publication, institution, or funder. The service makes heavy use of metadata supplied by the publishers, utilizing APIs with their systems that allow the metadata in RightsLink to get updated as a manuscript moves through the publisher’s workflow from submission through publication. Thus it is a perfect example of how widespread use of standardized metadata by the publishers can improve the information and services available to everyone who uses this service.

Open Access Publishing Tools

Martin Eve | Lecturer in Literature at the University of Lincoln, UK, Academic Project Director of the Open Library of Humanities, and founding member of the Open Access Toolset Alliance

Tools for open access publishing of scholarly journals run the gamut from proprietary systems and large software packages that cover the whole workflow to niche open source tools for a single function. Interoperability between different systems is nearly nonexistent as are standard APIs and protocols to move data between them. Systems available outside of the commercial arena are still developing and the learning curve for using them can be quite steep.

Open Journal Systems from the Public Knowledge Project is one of the more widely deployed open source journal management and publishing systems, but is still missing some needed functionality on the production end, such as content editing and XML generation. PLOS uses the Ambra platform, but it has not been adopted by many others, even though it is open source, possibly due to lack of modularity in its design. WordPress plug-in solutions, such as Annotum, can take a blog and turn it into an OA journal, but do not address other needed parts of the workflow. Still missing is a single, modular system that would allow a journal to be designed with drag-and-drop functionality, have plug-ins for all the different modules of the workflow, and support standards for creation, discovery, and preservation. Even more problematic is the inability to migrate content from one platform solution to another, as export formats and protocols do not currently exist.

A key standard for making scholarly information more reusable and accessible is JATS (Journal Article Tag Suite, ANSI/NISO Z39.96) for XML markup. Most researchers, however, are still writing and submitting their manuscripts in word processing software and there are few tools to easily convert such text into JATS XML. Those that exist are proprietary, rather than open source, and can be expensive. Even where JATS is used, different viewers can produce different results for the end user. The JATS for Reuse (JATS4R) project is working to define best practice tagging guidelines, along with tools that can help publishers identify whether their content is compliant with those best practices.
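For readers unfamiliar with what JATS markup looks like, the sketch below generates a small fragment of article front matter using element names from the tag suite (article-meta, article-id, contrib-id, permissions, and license). It is an illustration only: the article details are invented, the function name is hypothetical, and the output has not been validated against a JATS DTD or schema, which would require further elements such as journal metadata.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)  # serialize the license link as xlink:href


def build_front_matter(doi, title, authors, license_url):
    """Build a minimal JATS-style <article> element holding only front matter."""
    article = ET.Element("article")
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")
    ET.SubElement(meta, "article-id", {"pub-id-type": "doi"}).text = doi
    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = title
    contribs = ET.SubElement(meta, "contrib-group")
    for author in authors:
        contrib = ET.SubElement(contribs, "contrib", {"contrib-type": "author"})
        if author.get("orcid"):
            contrib_id = ET.SubElement(contrib, "contrib-id", {"contrib-id-type": "orcid"})
            contrib_id.text = author["orcid"]
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = author["surname"]
        ET.SubElement(name, "given-names").text = author["given_names"]
    permissions = ET.SubElement(meta, "permissions")
    # Article-level license expressed as a machine-readable link.
    ET.SubElement(permissions, "license", {f"{{{XLINK_NS}}}href": license_url})
    return article


article = build_front_matter(
    doi="10.1234/example.5678",
    title="An Example Article",
    authors=[{"surname": "Carberry", "given_names": "Josiah",
              "orcid": "https://orcid.org/0000-0002-1825-0097"}],
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(ET.tostring(article, encoding="unicode"))
```

The fragment shows how identifiers and license information can travel inside the article file itself, which is what makes consistent tagging practices such as those proposed by JATS4R valuable to downstream systems.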

Also needed are standardized preservation solutions. Some libraries and repositories are participating in semi-private networks like LOCKSS or CLOCKSS. Some commercial publishers are using services such as Portico. But many journal publishers, both open access and commercial, are not using such preservation solutions for their e-journals. With libraries no longer owning their e-journals, this dependency on the content creator for long-term preservation is a serious concern.

Currently, a high degree of expertise is needed to use the existing tools for open access publishing. New tools, both commercial and open source, are in development, but a substantial lowering of the barrier to entry for using these toolsets is needed. More awareness and education about all the elements that must work together and where standards like JATS fit into the workflow are also critical to expanding open access publishing.

The Open Access Toolset Alliance was formed in August 2013 to create open source tools for open access scholarly publishing, facilitate discussion and collaboration, and showcase relevant projects. Individuals or institutions who are engaged in open source initiatives related to open access publishing are welcome to join.

Sustainability of an OA Infrastructure

Dr. Alma Swan | Director of European Advocacy Programmes for SPARC Europe, and Director, Key Perspectives Ltd.
Dr. Caroline Sutton | Publisher and Co-Founder, Co-Action Publishing

Almost all of the infrastructure services for open access were created on project money and many significant services still depend on such "soft" funding sources. This is a major concern for the future sustainability of these systems and services. In an effort to secure their long term future, some of these services have developed business models that involve individually approaching libraries and institutional repositories every year to obtain ongoing funds. While this may suit these individual services and the libraries involved at the moment, it is clearly not a workable solution for the long term if every service adopts this model. So far, there have been few efforts made to group the services together or to approach library consortia or associations for a more sustainable funding method. Services are often ephemeral “proofs of concept” with no plan or intent for ongoing management.

The Knowledge Exchange—a joint project of CSC-IT centre for Science in Finland, Denmark’s Electronic Research Library (DEFF), the German Research Foundation (DFG), Jisc in the United Kingdom, and SURF in the Netherlands—has undertaken work to look at the sustainability of the OA infrastructure. Their Sustainability of Open Access Services Phase 1 and 2 report identifies three strategic areas that are needed: “embedding business development expertise into service development; consideration of how to move money around the system to enable Open Access to be achieved optimally; and governance and coordination of the infrastructural foundation of Open Access.” The Phase 3 report discusses “two critical elements to designing an effective sustainability model for a free-to-the user infrastructure service: 1) inducing potential participants to reveal their demand for the service, and 2) getting organizations to contribute voluntarily to its provision.” It also states that “in some cases, a sustainable fee-based model—that enables an initiative to deliver key infrastructure services to those organizations in the value chain that most require them—may be preferable to the free dissemination of a less-robust service to a broader audience.”

Infrastructure Services for Open Access (IS4OA) was formed as an umbrella entity that aims to shelter a set of complementary OA services and to obtain ongoing funding for them from the research community using a few-to-few approach, rather than the many-to-many methods currently used for each individual project. In support of their mission to facilitate easy access to Open Access resources, IS4OA assumed responsibility in December 2012 for the ongoing support and maintenance of the Directory of Open Access Journals (DOAJ). Since then, they have implemented new governance and workflow; created an Advisory Board consisting of publishers, institutions, and libraries; introduced a more extensive application form to describe each journal; and are piloting the use of associate editor positions to review the applications and validate the information before it is added to DOAJ. In May 2014, IS4OA added the Open Citations Corpus, an open access repository of scholarly citation data, as a supported service. As more services are sheltered under its umbrella, IS4OA anticipates being able to further reduce administrative overhead for duplicated activities. It also foresees being able to implement data feeds between the services, thus improving individual services and exploiting potential mutual benefits to the full.

Creating and managing a sustainable OA infrastructure is a challenging task and much more joint, collaborative effort is needed to move successful projects and experiments into the mainstream. Publishers, in particular, are needed to join and support such efforts and bring their comprehensive knowledge and expertise to the table. One opportunity for such collaboration is the Jisc Open Access Good Practice project, which is planning a series of workshops in 2014-2015 to explore various open access issues and solutions.

Cynthia Hodgson is the Managing Editor of NISO’s Information Standards Quarterly magazine.


The author wishes to express sincere thanks to Kristen Ratan, Product Director, Public Library of Science (PLOS) for her assistance in scoping this article, identifying issues, and providing contacts for interviews.



