
Archive for the ‘infrastructure’ Category

NISO response to the National Science Board on Data Policies

Wednesday, January 18th, 2012

Earlier this month, the National Science Board (NSB) announced it was seeking comments from the public on the report from the Committee on Strategy and Budget Task Force on Data Policies, Digital Research Data Sharing and Management.  That report was distributed last December.

NISO has prepared a response on behalf of the standards development community, which was submitted today.  Here are some excerpts of that response:

The National Science Board’s Task Force on Data Policies comes at a watershed moment in the development of an infrastructure for data-intensive science based on sharing and interoperability. The NISO community applauds this effort and the focused attention on the key issues related to a robust and interoperable data environment.

….

NISO has particular interest in Key Challenge #4: The reproducibility of scientific findings requires that digital research data be searchable and accessible through documented protocols or method. Beyond its historical involvement in these issues, NISO is actively engaged in forward-looking projects related to data sharing and data citation. NISO, in partnership with the National Federation of Advanced Information Services (NFAIS), is nearing completion of a best practice for how publishers should manage supplemental materials that are associated with the journal articles they publish. With a funding award from the Alfred P. Sloan Foundation and in partnership with the Open Archives Initiative, NISO began work on ResourceSync, a web protocol to ensure large-scale data repositories can be replicated and maintained in real-time. We’ve also had conversations with the DataCite group for formal standardization of their IsCitedBy specification. [Todd Carpenter serves] as a member of the ICSTI/CODATA task force working on best practices for data citation and NISO is looking forward to promoting and formalizing any recommendations and best practices that derive from that work.

….

We strongly urge that any further development of data-related best practices and standards take place in neutral forums that engage all relevant stakeholder communities, such as the one that NISO provides for consensus development. As noted in Appendix F of the report, Summary Notes on Expert Panel Discussion on Data Policies, standards for descriptive and structural metadata and persistent identifiers for all people and entities in the data exchange process are critical components of an interoperable data environment. We cannot agree more with this statement from the report of the meeting: “Funding agencies should work with stakeholders and research communities to support the establishment of standards that enable sharing and interoperability internationally.”

There is great potential for NSF to expand its leadership role in fostering well-managed use of data. This would include not only support of the repository community, but also the promulgation of community standards. In partnership with NISO and using the consensus development process, NSF could support the creation of new standards and best practices. More importantly, NSF could, through its funding role, provide advocacy for—or even require—the use of these broad community standards and best practices by researchers in the dissemination of their research. We note that there are more than a dozen references to standards in the Digital Research Data Sharing and Management report, so we are sure that this point is not falling on unreceptive ears.

The engagement of all relevant stakeholders in the establishment of data sharing and management practices, as described in Recommendation #1, is critical in today’s environment—at both the national and international levels. While the promotion of individual communities of practice is laudable, it does present problems and issues when it comes to systems interoperability. A robust system of data exchange must by default be grounded on a core set of interoperable data. More often than not, computational systems will need to act with a minimum of human intervention to be truly successful. This approach will not require a single schema or metadata system for all data, which is of course impossible and unworkable. However, a focus on and inclusion of core data elements and common base-level data standards is critical. For example, geo-location, bibliographic information, identifiers and discoverability data could all easily be standardized to foster interoperability. Domain-specific information can then be layered over this base of common and consistent data in a way that maintains domain specificity without sacrificing interoperability.
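To make that layering concrete, here is a minimal sketch in Python of a record that carries a small standardized core and attaches domain-specific detail as an extension. The field names are purely illustrative assumptions, not drawn from any existing NISO, NSF or other specification.

    # Minimal sketch of core-plus-domain metadata layering.
    # Field names are illustrative assumptions, not an existing standard.
    from dataclasses import dataclass, field
    from typing import Dict, List, Optional, Tuple

    @dataclass
    class CoreRecord:
        identifier: str                                    # persistent ID, e.g. a DOI
        title: str                                         # basic bibliographic information
        creators: List[str]
        geolocation: Optional[Tuple[float, float]] = None  # (latitude, longitude) if applicable
        keywords: List[str] = field(default_factory=list)  # discoverability data
        domain_extension: Dict[str, object] = field(default_factory=dict)

    # Domain-specific detail rides on top of the common core without disturbing
    # the fields that cross-domain systems depend on.
    record = CoreRecord(
        identifier="doi:10.1234/placeholder",
        title="Example sensor dataset",
        creators=["J. Researcher"],
        geolocation=(38.89, -77.03),
        keywords=["hydrology", "sensor network"],
        domain_extension={"sampling_interval_s": 60, "sensor_model": "XYZ-100"},
    )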

One of the key problems that the NSB and the NSF should work to avoid is the proliferation of standards for the exchange of information. This is often the butt of standards jokes, but in reality it does create significant problems. It is commonplace for communities of interest to review the landscape of existing standards and determine that existing standards do not meet their exact needs. That community then proceeds to duplicate seventy to eighty percent of existing work to create a specification that is custom-tailored to their specific needs, but which is not necessarily compatible with existing standards. In this way, standards proliferate and complicate interoperability. The NSB is uniquely positioned to help avoid this unnecessary and complicating tendency. Through its funding role, the NSB should promote the application, use and, if necessary, extension of existing standards. It should aggressively work to avoid the creation of new standards, when relevant standards already exist.

The sharing of data on a massive scale is a relatively new activity and we should be cautious in declaring fixed standards at this stage. It is conceivable that standards may not exist to address some of the issues in data sharing, or that it may be too early in the lifecycle for standards to be promulgated in the community. In that case, lower-level consensus forms, such as consensus-developed best practices or white papers, could advance the state of the art without inhibiting the advancement of new services, activities or trends. The NSB should promote these forms of activity as well, when standards development is not yet an appropriate path.

We hope that this response is well received by the NSB in the formulation of its data policies. There is terrific potential in creating an interoperable data environment, but that system will need to be based on standards and rely on best practices within the community to be fully functional. The scientific community, in partnership with the library, publisher and systems provider communities, can collectively help to create this important infrastructure. Its potential can only be helped by consensus agreement on base-level technologies. If development continues along a domain-centered path, the goal of interoperability, and delivering on its potential, will only be delayed and quite possibly harmed.

The full text PDF of the entire response is available here.  Comments from the public related to this document are welcome.

Mandatory Copyright Deposit for Electronic-only Materials

Thursday, April 1st, 2010

In late February, the Copyright Office at the Library of Congress published a new rule that expands the requirement for mandatory deposit to include items published only in digital format.  The interim regulation, Mandatory Deposit of Published Electronic Works Available Only Online (37 CFR Part 202 [Docket No. RM 2009–3]), was released in the Federal Register.  The Library of Congress will focus its first attention on e-only deposit of journals, since this is the area where electronic-only publishing is most advanced.  Very likely, this will move into the space of digital books as well, but that will likely take some time to coalesce.

I wrote a column about this in Against the Grain last September outlining some of the issues that this change will raise.  A free copy of that article is available here.  The Library of Congress is aware of these challenges, and will become painfully more so when this stream of online content begins to flow its way.  To support an understanding of these new regulations, LC is hosting a forum in Washington in May to discuss publishers’ technology for providing these data on a regular basis.  Below is the description of the meeting that LC provided.

Electronic Deposit Publishers Forum
May 10-11, 2010
Library of Congress — Washington, DC

The Mandatory deposit provision of the US Copyright Law requires that published works be deposited with the US Copyright Office for use by the Library of Congress in its collection.  Previously, copyright deposits were required only for works published in a physical form, but recently revised regulations now include the deposit of electronic works published only online.  The purpose of this workshop is to establish a submission process for these works and to explore technical and procedural options that will work for the publishing community and the Library of Congress.

Discussion topics will include:

  • Revised mandatory deposit regulations
  • Metadata elements and file formats to be submitted
  • Proposed transfer mechanisms

Space for this meeting is very limited, but if you’re interested in participating in the meeting, you should contact the Copyright Office.
Upcoming Forum on Library Resource Management Systems

Thursday, August 27th, 2009

In Boston on October 8-9, NISO will host a 2-day educational forum, Library Resource Management Systems: New Challenges, New Opportunities. We are pleased to bring together a terrific program of expert speakers to discuss some of the key issues and emerging trends in library resource management systems, as well as to take a look at the standards used and needed in these systems.

The back-end systems upon which libraries rely have become the center of a great deal of study, reconsideration and development activity over the past few years.  The integration of search functionality, social discovery tools, access control and even delivery mechanisms into traditional cataloging systems is necessitating a conversation about how these component parts will work together in a seamless fashion.  There are a variety of approaches, from a fully integrated system to a best-of-breed patchwork of systems, and from locally managed installations to software-as-a-service offerings.  No single approach is right for all institutions and there is no panacea for all the challenges institutions face in providing services to their constituents.  However, there are many options an organization can choose from.  Careful planning can help to find the right one and can save the institution tremendous amounts of time and effort.  This program will provide some of the background on the key issues that management will need to assess to make the right decision.

Registration is now open and we hope that you can join us.

Changing the ideas of a catalog: Do we really need one?

Wednesday, November 19th, 2008

Here’s one last post on thoughts regarding the Charleston Conference.

On Friday afternoon during the Charleston meeting, Karen Calhoun, Vice President, WorldCat and Metadata Services at OCLC, and Janet Hawk, Director, Market Analysis and Sales Programs at OCLC, gave a joint presentation entitled Defining Quality As If End Users Matter: The End of the World As We Know It (link to presentations page – actual presentation not up yet). While this program focused on the needs, expectations and desired functionality of users of WorldCat, there was an underlying theme that stood out to me and could have deep implications for the community.

“Comprehensive, complete and accurate.” I expect that every librarian, and catalogers in particular, would strive to achieve these goals with regard to the information about their collection. The management of the library would likely add cost-effective and efficient to this list as well. These goals have driven a tremendous amount of effort at almost every institution when building its catalog. Information is duplicated, entered into systems (be they card catalogs, ILS or ERM systems), maintained, and eventually migrated to new systems. However, is this the best approach?

When you log into the Yahoo home page, for example, or the Washington Post, or a service like Netvibes or Pageflakes, what you are presented with is not information culled from a single source, or even two or three. On my Netvibes landing page, I have information pulled from no fewer than 65 feeds, some mashed up, some straight RSS feeds. Possibly (probably), the information in these feeds is derived from dozens of other systems. Increasingly, what the end user experiences might seem like an integrated and cohesive experience; however, on the back end the page is drawing from multiple sources, multiple formats, multiple streams of data. These data streams can be aggregated, merged and mashed up to provide any number of user experiences. And yet, building a catalog has been an effort to build a single all-encompassing system, with data integrated and combined into one place. It is little wonder that developing, populating and maintaining these systems requires tremendous amounts of time and effort.
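For the technically inclined, the kind of back-end aggregation described above can be sketched in a few lines of Python using the widely available feedparser library; the feed URLs below are placeholders, not real sources:

    # Rough illustration of merging several RSS feeds into one stream, in the
    # spirit of a Netvibes-style start page. The feed URLs are placeholders.
    import feedparser

    FEEDS = [
        "https://example.org/news.rss",
        "https://example.org/library-blog.rss",
    ]

    entries = []
    for url in FEEDS:
        parsed = feedparser.parse(url)      # fetch and parse one feed
        entries.extend(parsed.entries)

    # Sort the combined stream by publication date, newest first, and present
    # it as a single list regardless of where each item originated.
    entries.sort(key=lambda e: tuple(e.get("published_parsed") or ()), reverse=True)
    for entry in entries[:20]:
        print(entry.get("title", "(untitled)"), "-", entry.get("link", ""))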

Karen’s and Janet’s presentation last week provided some interesting data about the enhancements that different types of users would like to see in WorldCat and WorldCat Local. The key takeaway was that there are different users of the system, with different expectations, needs and problems. Patrons have one set of problems and desired enhancements, while librarians have another. Neither is right or wrong; they represent different sides of the same coin – what a user wants depends entirely on what they need and expect from a service. This is as true for banking and auto repair as it is for ILS systems and metasearch services.

Putting together the pieces.

Karen’s presentation followed interestingly from another session that I attended on Friday, in which Andreas Biedenbach, eProduct Manager Data Systems & Quality at Springer Science + Business Media, spoke about the challenges of supplying data from a publisher’s perspective. Andreas manages a team that distributes metadata and content to the wide variety of users of Springer data. This includes libraries, but also a diverse range of other organizations such as aggregators, A&I services, preservation services, link resolver suppliers, and even Springer’s own marketing and web site departments. Each of these users of the data that Andreas’ team supplies has its own requirements, formats and business terms, which govern the use of the data. The streams range from complicated feeds of XML structures to simple comma-separated text files, each in its own format, some standardized, some not. It is little wonder there are gaps in the data, non-conformance, or format issues. The problem is not so much a lack of appropriate or well-developed standards as it is one of conformance, use and rationalization. We as a community cannot continue to fulfill customer-specific requests for data that is distributed into the community.
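To illustrate the burden, here is a rough Python sketch of the “one record, many bespoke feeds” pattern; the record and both output shapes are invented for this example and are not taken from any publisher’s actual feeds:

    # Invented example of one canonical record being reshaped for two consumers.
    import csv
    import io
    from xml.etree.ElementTree import Element, SubElement, tostring

    record = {
        "doi": "10.1234/placeholder",
        "title": "An Example Article",
        "issn": "1234-5678",
        "year": "2008",
    }

    # Consumer A wants a small XML structure...
    art = Element("article", attrib={"doi": record["doi"]})
    SubElement(art, "title").text = record["title"]
    SubElement(art, "journal", attrib={"issn": record["issn"], "year": record["year"]})
    xml_feed = tostring(art, encoding="unicode")

    # ...while consumer B wants comma-separated text in its own column order.
    buf = io.StringIO()
    csv.writer(buf).writerow([record["issn"], record["doi"], record["year"], record["title"]])
    csv_feed = buf.getvalue()

    # Every additional consumer-specific variant multiplies the chances of gaps
    # and non-conformance; a shared standard feed would carry the same data once.
    print(xml_feed)
    print(csv_feed)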

Perhaps the two problems have a related solution. Rather than the community moving data from place to place and populating their own systems with data streams from a variety of authoritative sources, could a solution exist where data streams are merged together in a seamless user interface? There was a session at ALA Annual hosted by OCLC on the topic of mashing up library services. Delving deeper: rather than entering or populating library services with gigabytes and terabytes of metadata about holdings, might it be possible to have entire catalogs that were mashed-up combinations of information drawn from a range of other sources? The only critical information that a library might need to hold is an identifier (ISBN, ISSN, DOI, ISTC, etc.) of the item it holds, drawing additional metadata from other sources on demand. Publishers could supply a single authoritative data stream to the community, which could be combined with other data to provide a custom view of the information based on the user’s needs and engagement. Content is regularly manipulated and re-presented in a variety of ways by many sites; why can’t we do the same with library holdings and other data?
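As a hedged sketch of what identifier-only holdings might look like in practice, the snippet below keeps nothing locally but a list of ISBNs and resolves descriptive metadata on demand; the metadata service URL is a hypothetical placeholder rather than a real endpoint:

    # Sketch of a catalog that stores only identifiers and pulls descriptive
    # metadata at display time. The resolver URL is a hypothetical placeholder;
    # in practice it might point at an authoritative publisher or registry feed.
    import json
    import urllib.request

    HOLDINGS = ["9780000000002", "9780000000019"]   # local holdings: ISBNs only

    METADATA_SERVICE = "https://metadata.example.org/isbn/{isbn}.json"  # placeholder

    def fetch_metadata(isbn: str) -> dict:
        """Resolve an identifier to a full descriptive record on demand."""
        with urllib.request.urlopen(METADATA_SERVICE.format(isbn=isbn)) as resp:
            return json.load(resp)

    # Institution-specific data (location, cost, special collections) stays
    # in-house; everything descriptive is drawn from the shared stream.
    for isbn in HOLDINGS:
        metadata = fetch_metadata(isbn)
        print(isbn, "->", metadata.get("title", "(no title returned)"))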

Of course, there are limitations to how far this could go: what about unique special collections holdings, physical location information, or cost and other institution-specific data? However, if the workload of librarians could be reduced in significant measure by mashing up data rather than replicating it in hundreds or thousands of libraries, perhaps it would free up time to focus on other services that add greater value for patrons. Similarly, simplifying the information flow out of publishers would reduce errors and incorrect data, as well as reduce costs.

EU Research Data Preservation Project Seeks Survey Input from Publishers

Tuesday, November 11th, 2008

PARSE.Insight, a European Union project initiated in March 2008 “to highlight the longevity and vulnerability of digital research data,” is conducting an online survey about access and storage of research data.

PARSE.Insight is “concerned with the preservation of digital information in science, from primary data through analysis to the final publications resulting from the research. The problem is how to safeguard this valuable digital material over time, to ensure that it is accessible, usable and understandable in future.”

They are interested in getting publishers’ views included in their survey, in addition to researchers’, since publishers play a critical role in the digital preservation of publications and related research data.

The survey is available here:
https://www.surveymonkey.com/s.aspx?sm=VfIpOoxogOv73uWOyaOhoQ_3d_3d

Responses are aggregated for analysis and made anonymous. If you wish to be informed about the results of the survey, you can enter your e-mail address at the end of the survey.

Ultimately, PARSE.Insight plans “to develop a roadmap and recommendations for developing the e-infrastructure in order to maintain the long-term accessibility and usability of scientific digital information in Europe.”

Posted by Cynthia Hodgson

NSF issues report on Cyberlearning

Wednesday, August 13th, 2008

The National Science Foundation released a report on Monday entitled Fostering Learning in the Networked World: The Cyberlearning Opportunity and Challenge. The report was issued by the NSF Task Force on Cyberlearning, chaired by Christine Borgman at UCLA. Cyberlearning is “the use of networked computing and communication technologies to support learning.” This is an incredibly broad term and encompasses nearly everything that the scholarly community engages in: research, publishing, pedagogy, assessment, records management, discovery and access. The report outlines a range of technical, social and pedagogical recommendations to NSF and the community, focused on more broadly applying and benefiting from existing cyberinfrastructure for learning, as well as on building out capacity for the future.

The five top-level recommendations of the group were:

• Encourage the development of a cross-disciplinary “cyberlearning field”
• Instill a “platform perspective” into cyberlearning, including interoperable design of hardware, software and services
• Focus on the power of technology to
• Adopt policies and programs that promote open resources
• Focus on sustainability of post-grant funded initiatives

It is very clear that we are only beginning to grapple with the questions regarding the incorporation of technology in learning. While there is much to chew on in this report, the two recommendations that caught my eye during a quick skim were the focus on openness and interoperability. Interoperability is a key problem that most systems face. The group’s recommendation that NSF fund the creation of an open cyberlearning platform into which new hardware and software modules can plug, and then feed the system with further interoperable components funded by NSF, is an intriguing model. Although we are at an early stage in the evolution of networked tools for learning, I expect that this approach would be too top-down and unwieldy to be widely successful. A more realistic approach would be to ensure that the existing and growing networks and tools are interoperable, because we can certainly expect that there will be approaches growing up outside of the funding network of NSF and the numerous other funding bodies (even presuming that they do work together as proposed in the report).

The second focus, however, could pay real dividends. We as a community have limited ability to envision how scholars and students will engage with content in the future. The ability to reuse, remix and apply discoveries in new ways will be the critical area for the success of information technology in the future. The report highlights the application of research data in this regard, but it is equally true of software tools, methodologies, and educational materials. Enforcing the application of openness principles on funded research would not only expand the availability of these resources, but also likely speed the creative application of those resources in new and innovative learning methodologies.

If you are engaged at all in the educational environment, the report is worthy of a close read.

Open Library Environment (OLE) Project – Planning open ILS systems

Tuesday, August 5th, 2008

The Open Library Environment (OLE) Project, a new initiative funded by the Mellon Foundation, launched its website this week.  The group aims to develop plans for the next generation of library automation systems built upon a modular SOA approach. Quoting from their Project Overview, the group “will convene the academic library community in planning an open library management system built on Service Oriented Architecture (SOA). Our goal is to think beyond the current model of an Integrated Library System and to design a new system that is flexible, customizable and able to meet the changing and complex needs of modern, dynamic academic libraries.”  The group will first research library processes and model the practices and systems necessary to support them, and through that process they hope to build a community around the effort. This project has ties to the DLF project on ILS Discovery Interfaces and a number of other open source development initiatives in the community looking to address this issue.  It is also interesting to note that at least one ILS system vendor, Ex Libris, recently announced its new Open-Platform Strategy.  There will certainly be interesting developments from the OLE Project, and it will be worth watching how their recommendations tie in with other ongoing work.  Of course, system interoperability relies heavily on standard data structures and interfaces.  If the end results aren’t easily plug and play, only the largest and most technically savvy organizations will be able to take advantage of the advances.
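To make the plug-and-play point concrete, here is a small illustrative sketch of the kind of narrow service contracts a modular, SOA-style library system could standardize on; the interfaces, method names and fields are invented for this example and are not taken from the OLE Project documents or any vendor API:

    # Illustrative plug-and-play service contracts for a modular library system.
    # The names below are invented for this sketch, not drawn from OLE or a vendor.
    from typing import Dict, List, Protocol


    class DiscoveryService(Protocol):
        """Any discovery component exposing this contract can be swapped in."""

        def search(self, query: str, limit: int = 10) -> List[Dict[str, str]]:
            ...


    class CirculationService(Protocol):
        """Any circulation component exposing this contract can be swapped in."""

        def checkout(self, patron_id: str, item_id: str) -> Dict[str, str]:
            ...


    def find_and_checkout(disc: DiscoveryService, circ: CirculationService,
                          patron_id: str, query: str) -> Dict[str, str]:
        # Application logic is written against the contracts, not against a
        # particular vendor's implementation, so components stay replaceable.
        hits = disc.search(query, limit=1)
        if not hits:
            raise LookupError(f"no items match {query!r}")
        return circ.checkout(patron_id, hits[0]["item_id"])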