Home | About NISO | Blog

Archive for the ‘Standards’ Category

NISO response to the National Science Board on Data Policies

Wednesday, January 18th, 2012

Earlier this month, the National Science Board (NSB) announced it was seeking comments from the public on the report from the Committee on Strategy and Budget Task Force on Data Policies, Digital Research Data Sharing and Management.  That report was distributed last December.

NISO has prepared a response on behalf of the standards development community, which was submitted today.  Here are some excerpts of that response:

The National Science Board’s Task Force on Data Policies comes at a watershed moment in the development of an infrastructure for data-intensive science based on sharing and interoperability. The NISO community applauds this effort and the focused attention on the key issues related to a robust and interoperable data environment.

….

NISO has particular interest in Key Challenge #4: The reproducibility of scientific findings requires that digital research data be searchable and accessible through documented protocols or method. Beyond its historical involvement in these issues, NISO is actively engaged in forward-looking projects related to data sharing and data citation. NISO, in partnership with the National Federation of Advanced Information Services (NFAIS), is nearing completion of a best practice for how publishers should manage supplemental materials that are associated with the journal articles they publish. With a funding award from the Alfred P. Sloan Foundation and in partnership with the Open Archives Initiative, NISO began work on ResourceSync, a web protocol to ensure large-scale data repositories can be replicated and maintained in real-time. We’ve also had conversations with the DataCite group for formal standardization of their IsCitedBy specification. [Todd Carpenter serves] as a member of the ICSTI/CODATA task force working on best practices for data citation and NISO is looking forward to promoting and formalizing any recommendations and best practices that derive from that work.

….

We strongly urge that any further development of data-related best practices and standards take place in neutral forums that engage all relevant stakeholder communities, such as the one that NISO provides for consensus development. As noted in Appendix F of the report, Summary Notes on Expert Panel Discussion on Data Policies, standards for descriptive and structural metadata and persistent identifiers for all people and entities in the data exchange process are critical components of an interoperable data environment. We cannot agree more with this statement from the report of the meeting: “Funding agencies should work with stakeholders and research communities to support the establishment of standards that enable sharing and interoperability internationally.”

There is great potential for NSF to expand its leadership role in fostering well-managed use of data. This would include not only support of the repository community, but also in the promulgation of community standards. In partnership with NISO and using the consensus development process, NSF could support the creation of new standards and best practices. More importantly, NSF could, through its funding role, provide advocacy for—even require—how researchers should use these broad community standards and best practices in the dissemination of their research. We note that there are more than a dozen references to standards in Digital Research Data Sharing and Management report, so we are sure that this point is not falling on unreceptive ears.

The engagement of all relevant stakeholders in the establishment of data sharing and management practices as described in Recommendation #1 is critical in today’s environment—at both the national and international levels. While the promotion of individual communities of practice is a laudable one, it does present problems and issues when it comes to systems interoperability. A robust system of data exchange by default must be one grounded on a core set of interoperable data. More often than not, computational systems will need to act with a minimum of human intervention to be truly successful. This approach will not require a single schema or metadata system for all data, which is of course impossible and unworkable. However, a focus on and inclusion of core data elements and common base-level data standards is critical. For example, geo-location, bibliographic information, identifiers and discoverability data are all things that could be easily standardized and concentrated on to foster interoperability. Domain-specific information can be layered over this base of common and consistent data in a way that maintains domain specificity without sacrificing interoperability.

One of the key problems that the NSB and the NSF should work to avoid is the proliferation of standards for the exchange of information. This is often the butt of standards jokes, but in reality it does create significant problems. It is commonplace for communities of interest to review the landscape of existing standards and determine that existing standards do not meet their exact needs. That community then proceeds to duplicate seventy to eighty percent of existing work to create a specification that is custom-tailored to their specific needs, but which is not necessarily compatible with existing standards. In this way, standards proliferate and complicate interoperability. The NSB is uniquely positioned to help avoid this unnecessary and complicating tendency. Through its funding role, the NSB should promote the application, use and, if necessary, extension of existing standards. It should aggressively work to avoid the creation of new standards, when relevant standards already exist.

The sharing of data on a massive scale is a relatively new activity and we should be cautious in declaring fixed standards at this state. It is conceivable that standards may not exist to address some of the issues in data sharing or that it may be too early in the lifecycle for standards to be promulgated in the community. In that case, lower-level consensus forms, such as consensus-developed best practices or white papers could advance the state of the art without inhibiting the advancement of new services, activities or trends. The NSB should promote these forms of activity as well, when standards development is not yet an appropriate path.

We hope that this response is well received by the NSB in the formulation of its data policies. There is terrific potential in creating an interoperable data environment, but that system will need to be based on standards and rely on best practices within the community to be fully functional. The scientific community, in partnership with the library, publisher and systems provider communities can all collectively help to create this important infrastructure. Its potential can only be helped by consensus agreement on base-level technologies. If development continues in a domain-centered path, the goal of interoperability and delivering on its potential will only be delayed and quite possibly harmed.

The full text PDF of the entire response is available here.  Comments from the public related to this document are welcome.

When is a new thing a new thing?

Thursday, June 10th, 2010

I recently gave a presentation at the National Central Library in Taiwan at a symposium on digital publishing and international standards that they hosted. It was a tremendous meeting and I am grateful to my hosts, Director General Karl Min Ku and his staff for a terrific visit.  One of the topics that I discussed was the issue of the identification of ebooks. This is increasingly becoming an important issue in our community and I am serving on a BISG Working Group to explore thes issues. Below are some notes from one slide that I gave during that presentation, which covers one of the core questions: At what point do changes in a digital file qualify it as a new product?  The full slide deck is here. I’ll be expanding on these ideas in other forums in the near future, but here are some initial thoughts on this question.

——-

In a print world, what made one item different from another was generally it’s physical form. Was the binding hardcover or soft-cover? Was the type regular or large-size for the visually impaired, or even was it printed using Braille instead of ink? Was the item a book or a reading of the book, i.e. an audio book, was about as far afield as the form question had gone prior to the rise of the internet in the mid 1990s. In a digital environment, what constitutes a new item is considerably more complex. This poses tremendous issues regarding the supply chain, identification, and collections management in libraries.

This is a list of some of the defining characteristics for a digital text that are distinct from those in a print environment.  Each poses a unique challenge to the management and identification of digital items.

  • Encoding structure possibilities (file formats)
  • Platform dependencies (different devices)
  • Reflowable (resize)
  • Mutable (easily changed/updated)
  • Chunked (the entire item or only elements)
  • Networkable (location isn’t applicable)
  • Actionable/interactive
  • Linkable (to other content)
  • Transformable (text to speech)
  • Multimedia capable
  • Extensible (not constrained by page)
  • Operate under license terms (not copyright)
  • Digital Rights Management (DRM)

Just some of these examples pose tremendous issues for the supply chain of ebooks when it comes to fitting our current business practices, such as ISBN into this environment.

One question is whether the form of the ebook which needs a new identifier is the file format. If the publisher is distributing a single file format, say an epub file, but then in order for that item go get displayed onto a Kindle, it needs to be transformed into a different file format, that of the Kindle, at what point does the transformation of that file become a new thing? Similarly, if you wrap that same epub file with a specific form of digital rights management, does that create a new thing? From an end-user perspective, the existence and type of DRM could render a file as useless to the users as it would be if you supplied a Braille version to someone who can’t read Braille.

To take another, even thornier question, let’s consider location. What does location mean in a network environment. While I was in Taiwan, if I wanted to buy a book using my Kindle from there, where “am I” and where is the transaction taking place? Now in the supply chain, this makes a tremendous amount of difference. A book in Taiwan likely has a different ISBN number, assigned to a different publishers, because the original publisher might not have worldwide distribution rights. The price might be different, even the content of the book might be slightly different-based on cultural or legal sensitivities. But while I may have been physically located in Taiwan, my Amazon account is based in Maryland, where I live and where my Kindle is registered. Will Amazon recognize me as the account holder in the US or the fact of my present physical location in Taiwan, despite the fact that I traveled back home a week later and live in the US? Now, this isn’t even considering where the actual transaction is taking place, which could be a server farm somewhere in California, Iceland or Tokyo.  The complexity and potential challenges for rights holders and rights management could be tremendous.

These questions about when is a new thing a new thing are critically important question in the identification of objects and the registration and systems that underlie them. How we manage this information and the decisions we take now about what is important, what we should track, and how should we distinguish between these items will have profound impacts on how we distribute information decades into the future.

ISTC and Ur-Texts

Thursday, April 1st, 2010

Tuesday, I attended a meeting on the International Standard Text Code (ISTC), organized by the Book Industry Study Group (BISG) in Manhattan.  The meeting was held in conjunction with the release of a white paper on the ISTC by Michael Holdsworth entitled ISTC: A Work in Progress. This is a terrific paper and worthy of reading for those interested in this topic and I commend it to you all, if you haven’t seen it.  The paper provides a detailed introduction to the ISTC and what role this new identifier will play in our community.

During the meeting as I was tweeting about the standard, I got into a brief twitter discussion with John Mark Ockerbloom at the University of Pennsylvania Library.  Unfortunately as wonderful as Twitter is for instantaneous conversation, it is not at all easy to communicate nuance.    For that, a longer form is necessary, hence this blog post.

As a jumping off point, let us start with the fact that the ISTC has a fairly good definition about what it is identifying: the text of a work as a distinct abstract item that may be the same or different across different products or manifestations.  Distinguishing between those changes can be critical, as is tying together the various manifestations for collection development, rights and product management reasons.

One of the key principles of the ISTC is that:

“If two entities share identical ISTC metadata, they shall be treated as the same textual work and shall have the same ISTC.”

Where to draw this distinction is quite an interesting point.  As John pointed out in his question to me, “How are works with no definitive original text handled? (e.g. Hamlet) Is there an #ISTC for some hypothetical ur-Hamlet?”  The issue here is that there are multiple “original versions” of the text of Hamlet. Quoting from Wikikpedia: “Three different early versions of [Hamlet] have survived: these are known as the First Quarto (Q1), the Second Quarto (Q2) and the First Folio (F1). Each has lines, and even scenes, that are missing from the others.”

In this case, the three different versions would each have three different ISTCs assigned to them, since the text of the versions is different.  They could be noted as related to the other ISTCs (as well as the cascade of other related editions) in the descriptive metadata fields.  Hamlet is a perfect example of where the ISTC could be of critical value, since those who have an interest in the variances between the three different versions would want to know which text is the basis of the copy of Hamlet they are purchasing, since there are significant differences between the three copies.

Perhaps most stringent solution in keeping with the letter of the standard might be that the First Quatro, have been the first known to published, since it was the first to appear in the Stationers’ Register in 1602 although it likely was not published until summer or fall 1603.  The Second Quarto and First Folio were published later—in 1604 and 1623 respectively.  Although the first Quatro is often considered “inferior” to later versions, assigning it the “Source” ISTC would be no different than if it were published today, and subsequently re-published as a revision (which would be assigned a related ISTC).  While there has been controversy about the source text of Hamlet that probably began not long after the day it was published and has certainly grown as the field of scholarship around Shakespeare has grown, for the purposes of identification and linking does the “Ur-text” matter?

Certainly, a user would want to know that this is the canonical version, be that the Second Quatro or First Folio versions.  The critical point is that we identify things differently when there are important reasons to make the distinctions.  In the case of Hamlet, there is a need to make the distinction.  Which copy is considered “original” and which is a derivative isn’t nearly as important as making the distinction.

It is valuable to note the description in the ISTC User’s Manuel in the section on Original works and derivations.  Quoting from the Manuel:

7.1    What is an “original” work?

For the purposes of registration on the ISTC database, a work may be regarded as being “original” if it cannot be adequately described using one or more of the controlled values allowed for the “Derivation Type” element (specified elsewhere in this document).

A work is considered to be “original” for registration purposes unless it replicates a significant proportion of a previously existing work or it is a direct translation of the previously existing one (where all the words may be different but the concepts and their sequence are the same). It should be noted that this is a different approach from that used by FRBR2, which regards translations as simply different “expressions” of the same work.

The “Source ISTC” metadata field is an optional one and is “Used to identify the original work(s) from which this one is derived (where appropriate). It is recommended that these are provided whenever possible.”  In the case of the three Hamlet “original versions” this field would likely be left blank, since there is no way to distinguish between the “Original” and the “Derivation”.  Each of the three versions could be considered “Original”, but this would get messy if one were not noted as original.   There is a “Derivation type” metadata field with restricted values, although “Unspecified” is one option.  Since there isn’t necessarily a value in the “original” distinction, there isn’t a point arguing about which is original.  In the real world, what will likely be the “original” will be the first version that receives the assignment.

This same problem will likely be true of a variety of other texts, especially from distant historical periods.   A focus on core principles, that we distinguish what is important, that disambiguation is important, and avoiding the philosophical arguments surrounding “original” versus “derivative”, just as the ISTC community is trying to avoid “ownership” of the record, will help to serve the entire community.

There is a lot more information about the ISTC provided by NISO. Members and subscribers can read the article that Andy Weissberg VP of Identifier Services & Corporate Marketing at Bowker wrote in Information Standards Quarterly last summer, The International Standard Text Code (ISTC): An Overview and Status Report. For non-subscribers, Andy Weissberg also presented during the 2009 NISO-BISG Changing Standards Landscape forum prior to ALA’s Annual conference in Chicago.  You can view his presentation slides or watch the video from that meeting.

The International ISTC Agency Ltd is a not-for-profit company, limited by guarantee and registered in England and Wales. Its sole purpose is to implement and promote the ISO 21047 (ISTC) standard and it is operated by representatives of its founding members, namely RR Bowker, CISAC, IFRRO, and Nielsen Book Services.

The first edition of “ISO 21047 Information and Documentation – International Standard Text Code (ISTC)” was published by ISO in March 2009. It is available for purchase in separate English and French versions either as an electronic download or printed document from ISO.

Trust but verify: Are you sure this document is real?

Tuesday, November 3rd, 2009

Continuing on the theme of a “leaked” document that was posted last week from a systems supplier in the community.  One thing that few asked initially regarding this document is: “Is it real?”  In this case, not 24 hours after the document was “released”, it was confirmed by the author that he had written the document and that it had been circulating for some time. However, it is amazing the stir that can be started by posting a PDF document anonymously on the Wikileaks website, regardless of its provenance.

Last week was the 40th anniversary of the “birth” of the internet, when two computers were first connected using a primitive router and transmitted the first message from two computers: “Lo”.  They were trying to send the command “Login”, but the systems crashed before the full message was sent. Later that evening, they were able to get the full message through and with that the internet – in a very nascent form was born.  During a radio interview that week, Dr. Leonard Kleinrock, Professor of Computer Science, UCLA, who was a one of the scientists that was working on those systems that night, spoke about the event.  During one of the questions, Dr. Klenirock was asked about the adoption of IP Version 6. His response was quite fascinating:

Dr. KLEINROCK: Yes. In fact, in those early days, the culture of the Internet was one of trust, openness, shared ideas. You know, I knew everybody on the Internet in those days and I trusted them all. And everybody behaved well, so we had a very easy, open access. We did not introduce any limitations nor did we introduce what we should have, which was the ability to do strong user authentication and strong file authentication. So I know that if you are communicating with me, it’s you, Ira Flatow, and not someone else. And if you send me a file, I receive the file you intended me to receive.

We should’ve installed that in the architecture in the early days. And the first thing we should’ve done with it is turn it off, because we needed this open, trusted, available, shared environment, which was the culture, the ethics of the early Internet. And then when we approach the late ‘80s and the early ‘90s and spam, and viruses, and pornography and eventually the identity theft and the fraud, and the botnets and the denial of service we see today, as that began to emerge, we should then slowly have turned on that authentication process, which is part of what your other caller referred to is this IPV6 is an attempt to bring on and patch on some of this authentication capability. But it’s very hard now that it’s not built deep into the architecture of the Internet.

The issue of provenance has been a critical gap in the structure of the internet from the very beginning.  At the outset, when the number of computers and people who were connected to the network was small, the issue of authentication and validation were significant barriers to a working system.  If you know and trust everyone in your neighborhood, locking your doors is an unnecessary hassle.  In a large city, where you don’t know all of your neighbors, locking your doors is a critical routine that becomes second nature.  In our digital environment, the community has gotten so large that locking doors, authenticating and passwords to ensure you are who you claim to be is essential to a functioning community.

Unfortunately, as Dr. Kleinrock notes, we are in a situation where we need to patch some of the authentication and provenance holes in our digital lives.  This brings me back to the document that was distributed last week via Wikileaks.

There is an important need, particularly in the legal and scientific communities that provenance be assured.  With digital documents, which are easily manipulated or created and distributed anonymously, confirming the author and source of a document can be.  Fortunately, in this case, the authorship can be and was confirmed easily and quickly enough.  However, in many situations this is not the case, particularly for forged or manipulated documents.  Even when denials are issued, there is no way to prove the negative to a doubtful audience.

The tools for creating extremely professional looking documents are ubiquitous.  Indeed, the same software that most publishers companies use to create formal published documents is available to almost anyone with a computer.  It would not be difficult to create one’s own “professional” documents and distribute them as real.  The internet is full of hoaxes of these sorts and they run the gamut from absurd, to humorous, to quite damaging.

There have been discussions about the need for better online provenance information for nearly two decades now. Some work on metadata provenance is gaining broader adoption including PREMIS, METS and DCMI, some significant work on standards remains regarding the authenticity of documents.  The US Government and the Government Printing Office has made progress with the GPO Seal of Authenticity and digital signature/public key technology in Acrobat v. 7.0 & 8.0.  In January, 2009, GPO digitally signed and certified PDF files of all versions of Congressional bills introduced during the 111th and 110th Congresses. Unfortunately, these types of authentication technologies have not been broadly adopted outside the government.  The importance of provenance metadata was also re-affirmed in a recent Arizona Supreme Court case.

Although it might not help in every case, knowing the source of a document is crucial in assessing its validity.  Until standards are broadly adopted and relied upon, a word of warning to the wise about content on the Internet: “Trust but verify.”

Open Source isn’t for everyone, and that’s OK. Proprietary Systems aren’t for everyone, and that’s OK too.

Monday, November 2nd, 2009

Last week, there was a small dust up in the community about a “leaked” document from one of the systems suppliers in the community about issues regarding Open Source (OS) software.  The merits of the document itself aren’t nearly as interesting as the issues surrounding it and the reactions from the community.  The paper outlined from the company’s perspective the many issues that face organizations that choose an open source solution as well as the benefits to proprietary software.  Almost immediately after the paper was released on Wikileaks, the OS community pounced on its release as “spreading FUD {i.e., Fear Uncertainty and Doubt}” about OS solutions.  This is a description OS supporters use for corporate communications that support the use and benefits of proprietary solutions.

From my mind the first interesting issue is that there is a presumption that any one solution is the “right” one, and the sales techniques from both communities understandably presume that each community’s approach is best for everyone.  This is almost never the case in a marketplace as large, broad and diverse as the US library market.  Each approach has it’s own strengths AND weaknesses and the community should work to understand what those strengths and weaknesses are, from both sides.  A clearer understanding and discussion of those qualities should do much to improve both options for the consumers.  There are potential issues with OS software, such as support, bug fixing, long-term sustainability, and staffing costs that implementers of OS options need to consider.  Similarly, proprietary options could have problems with data lock-in, interoperability challenges with other systems, and customization limitations.   However, each too has their strengths.  With OS these include and openness and an opportunity to collaboratively problem solve with other users and an infinite customizability.  Proprietary solutions provide a greater level of support and accountability, a mature support and development environment, and generally known fixed costs.

During the NISO Library Resources Management Systems educational forum in Boston last month, part of the program was devoted to a discussion of whether an organization should build or buy LRMS system.  There were certainly positives and downsides described from each approach.  The point that was driven home for me is that each organization’s situation is different and each team brings distinct skills that could push an organization in one direction or another.  Each organization needs to weigh the known and potential costs against their needs and resources.  A small public library might not have the technical skills to tweak OS systems in a way that is often needed.  A mid-sized institution might have staff that are technically expert enough to engage in an OS project.  A large library might be able to reallocate resources, but want the support commitments that come with a proprietary solution.  One positive thing about the marketplace for library systems is the variety of options and choices available to management.

Last year, during the Charleston Conference during a discussion of Open Source, I made the comment that, yes, everyone could build their own car, but why would they.  I personally don’t have the skills or time to build my own, I rely on large car manufacturers to do so for me.  When it breaks, I bring it to a specialized mechanic who knows how to fix it.  On the other hand, I have friends who do have the skills to build and repair cars. They save lots of money doing their own maintenance and have even built sports cars and made a decent amount of money doing so.  That doesn’t make one approach right or wrong, better or worse.  Unfortunately, people frequently let these value judgments color the debate about costs and benefits. As with everything where people have a vested interest in a project’s success, there are strong passions in the OS solutions debate.

What make these systems better for everyone is that there are common data structures and a common language for interacting.  Standards such as MARC, Z39.50, and OpenURL, among others make the storage, discovery and delivery of library content more functional and more interoperable.  As with all standards, they may not be perfect, but they have served the community well and provide an example of how we can as a community move forward in a collaborative way.

For all of the complaints hurled at the proprietary systems vendors (rightly or wrongly), they do a tremendous amount to support the development of voluntary consensus standards, which all systems are using.  Interoperability among library systems couldn’t take place without them.  Unfortunately, the same can’t be said for the OS community.  As Carl Grant, President of Ex Libris, made the point during the vendor roundtable in Boston, “How many of the OS support vendors and suppliers are members of and participants in NISO?”  Unfortunately, the answer to that question is “None” as yet.  Given how critical open standards are to the smooth functioning of these systems, it is surprising that they haven’t engaged in standards development.  We certainly would welcome their engagement and support.

The other issue that is raised about the release of this document is its provenance.  I’ll discuss that in my next post.

Changing the ideas of a catalog: Do we really need one?

Wednesday, November 19th, 2008

Here’s one last post on thoughts regarding the Charleston Conference.

Friday afternoon during the Charleston meeting, Karen Calhoun, Vice President, WorldCat and Metadata Services at OCLC and Janet Hawk, Director, Market Analysis and Sales Programs at OCLC gave a joint presentation entitled: Defining Quality As If End Users Matter: The End of the World As We Know It(link to presentations page – actual presentation not up yet). While this program focused on the needs, expectations and desired functionality of users of WorldCat, there was an underlying theme which came out to me and could have deep implications for the community.

“Comprehensive, complete and accurate.” I expect that every librarian, catalogers in particular, would strive to achieve these goals with regard to the information about their collection. The management of the library would likely add cost-effective and efficient to this list as well. Theses goals have driven a tremendous amount of effort at almost every institution when building its catalog. Information is duplicated, entered into systems (be they card catalogs, ILS or ERM systems) and maintained, eventually migrated to new systems. However, is this the best approach?

When you log into the Yahoo web page, for example, the Washington Post, or a service like Netvibes or Pageflakes, what you are presented with is not information culled from a single source, or even 2 or three. On my Netvibes landing page, I have information pulled from no less than 65 feeds, some mashed up, some straight RSS feeds. Possibly (probably), the information in these feeds is derived from dozens of other systems. Increasingly, what the end-user experiences might seem like an integrated and cohesive experience, however on the back-end the page is drawing from multiple sources, multiple formats, multiple streams of data. These data stream could be aggregated, merged and mashed up to provide any number of user experiences. And yet, building a catalog has been an effort to build a single all-encompassing system with data integrated and combined into a single system. It is little wonder that developing, populating and maintaining these systems requires tremendous amounts of time and effort.

During Karen’s and Janet’s presentation last week provided some interesting data about the enhancements that different types of users would like to see in WorldCat and WorldCatLocal. The key take away was that there were different users of the system, with different expectations, needs and problems. Patrons have one set of problems and desired enhancements, while librarians have another. Neither is right or wrong, but represent different sides of the same coin – what a user wants depends entirely on what the need and expect from a service. This is as true for banking and auto repair as it is for ILS systems and metasearch services.

    Putting together the pieces.

Karen’s presentation followed interestingly from another session that I attended on Friday in which Andreas Biedenbach, eProduct Manager Data Systems & Quality at Springer Science + Business Media, spoke about the challenges of supplying data from a publisher’s perspective. Andreas manages a team that distributes metadata and content to the variety of complicated users of Springer data. This includes libraries, but also a diverse range of other organizations such as aggregators, A&I services, preservation services, link resolver suppliers, and even Springer’s own marketing and web site departments. Each of these users of the data that Andreas’ team supplies has their own requirements, formats and business terms, which govern the use of the data. Some of these streams are complicated feeds of XML structures to simple comma-separated text files. Each of which is in its own format, some standardized, some not. It is little wonder there are gaps in the data, non-conformance, or format issues. Similarly, it is not a lack of appropriate or well-developed standards as much as it is conformance, use and rationalization. We as a community cannot continue to provide customer-specific requests to data requests for data that is distributed into the community.

Perhaps the two problems have a related solution. Rather than the community moving data from place to place, populating their own systems with data streams from a variety of authoritative sources could a solution exist where data streams are merged together in a seamless user interface? There was a session at ALA Annual hosted by OCLC on the topic of mashing up library services. Delving deeper, rather than entering or populating library services with gigabytes and terabytes of metadata about holdings, might it be possible to have entire catalogs that were mashed up combinations of information drawn from a range of other sources? The only critical information that a library might need to hold is an identifier (ISBN, ISSN, DOI, ISTC, etc) of the item they hold drawing additional metadata from other sources on demand. Publishers could supply a single authoritative data stream to the community, which could be combined with other data to provide a custom view of the information based on the user’s needs and engagement. Content is regularly manipulated and represented in a variety of ways by many sites, why can’t we do the same with library holdings and other data?

Of course, there are limitations to how far this could go: what about unique special collections holdings; physical location information; cost and other institution-specific data. However, if the workload of librarians could be reduced in significant measure by mashing up data and not replicating it in hundreds or thousands of libraries, perhaps it would free up time to focus on other services that add greater value to the patrons. Similarly, simplifying the information flow out of publishers would reduce errors and incorrect data, as well as reduce costs.

RDA Draft now available

Monday, November 17th, 2008

The full draft of the Resource Description and Access (RDA) document is now available on the JSC website.

 From the JSC website:

For information on submitting comments on the draft content, see Making comments on RDA drafts.

The deadline for constituency responses is 2 February, 2009, to allow time for the comments to be compiled for consideration by the JSC at their meeting in March 2009. Note that each constituency committee will be setting their own deadlines for comments in advance of 2 February.    

Microsoft, Open ID and the future of authentication

Tuesday, October 28th, 2008

Microsoft announced today that the company is throwing its weight behind the OpenID system.  Microsoft’s Live ID will become an OpenID with the launch of their OpenID Provider (OP), which will initially be launched within Microsoft’s Community Technology Preview testing service. ”The current Technology Preview release is for testing purposes only, and is not intended for widespread adoption at this stage. After a period of industry testing and feedback, we will be incorporating any necessary fixes and feature enhancements into the next revision, to be released to Production sometime in 2009.”

There is a list of non-compliant websites, which users are demanding the use of OpenID on their sites, Demand OpenID.  The site lists some of the most recognized sites on the web, such as Google, Twitter, FacebookWikipediaYoutube and del.icio.us.  It will be very interesting to see who else follows Microsoft’s lead in this area.

This action is not surprising given Microsoft’s support of Open Standards, which hit its stride with the standardization within ISO of OOXML earlier this year.  In the release, they note “We have been tracking the evolution of the OpenID specification, from its birth as just a dream and a vision through its development into a mature, de facto standard with terms that make it viable for us to implement it now.” The fact that Microsoft is awaiting the maturity of non-Microsoft standards before they throw their weight behind them indicates that there will be a competitive approach to standards development in the coming years and we will be in a period with a number of competing standards for several years. 

The New York Times covered the announcement in today’s edition.  Quoting from the article:

“This move by a traditionally proprietary organization like Microsoft could be the signal that gives the market – both large and small players combined – the confidence to invest more time and energy into the widespread adoption of OpenID. That is good news for OpenID proponents. And it’s equally good news for all of us who are interested in simplifying the management of our identity across the multitude of sites we use on a day-to-day basis.” 

Microsoft has a Windows Live ID blog where more information will be posted as the testing moves forward. The library and publishing communities have been dealing with the thorny issue of authentication for a number of years.  The application of OpenID solves part of the problem, but does not address the other key aspect of authentication: certification. There will still need to be some considerable work toward rationalizing authentication and identity management, making the process simpler for end-users through a single sign-on is a big step in the right direction.  

The el-dente style of standards

Monday, October 27th, 2008

Although outside of NISO’s normal remit, I thought I’d provide some interesting fun for a rainy fall Monday.  ISO has just announced the publication of a standard on “state of the art” cooking of pasta.  From the release:

ISO 7304-2:2008 – Alimentary pasta produced from durum wheat semolina – Estimation of cooking quality by sensory analysis – Part 2: Routine methoddescribes a test method for laboratories to determine a minimum of cooking time for pasta.” “This International Standard specifies a method for assessing, by sensory analysis, the quality of cooked alimentary pasta in the form of long, solid strands (e.g. spaghetti) or short, hollow strands (e.g. macaroni) produced from durum wheat semolina, expressed in terms of the starch release, liveliness and firmness characteristics (i.e. texture) of the pasta.”

I know at least one of my college roommates who could have benefited from the advice this standard could provide.  I am sure that the development of this standards was … tasty.

Advice from Peter Drucker – an idea from the Resource Sharing meeting – Part 2

Saturday, October 11th, 2008

During the Collaborative Resource Sharing meeting earlier this week, Adam Wathem, Interim Head of Collections Services Department, Kansas State University Libraries wrapped up the meeting by discussing barriers to efficiencies within libraries.  It was  a great presentation that brought together the threads of the conversations and presentations throughout the meeting.  At one point in the presentation (available here), Adam quoted Peter Drucker, which summarizes one of the problems that libraries face:

“There is nothing so useless as doing efficiently that which should not be done at all.”  

How much of the workflow processes in institutions is bound by a focus on improving how things that neither meet user needs or expectation of today?