
Archive for the ‘cloud computing’ Category

Changing the ideas of a catalog: Do we really need one?

Wednesday, November 19th, 2008

Here’s one last post on thoughts regarding the Charleston Conference.

Friday afternoon during the Charleston meeting, Karen Calhoun, Vice President, WorldCat and Metadata Services at OCLC, and Janet Hawk, Director, Market Analysis and Sales Programs at OCLC, gave a joint presentation entitled Defining Quality As If End Users Matter: The End of the World As We Know It (link to presentations page – actual presentation not up yet). While this program focused on the needs, expectations and desired functionality of users of WorldCat, there was an underlying theme that stood out to me and could have deep implications for the community.

“Comprehensive, complete and accurate.” I expect that every librarian, catalogers in particular, would strive to achieve these goals with regard to the information about their collection. The management of the library would likely add cost-effective and efficient to this list as well. These goals have driven a tremendous amount of effort at almost every institution when building its catalog. Information is duplicated, entered into systems (be they card catalogs, ILS or ERM systems), maintained, and eventually migrated to new systems. However, is this the best approach?

When you log into a page like Yahoo or the Washington Post, or a service like Netvibes or Pageflakes, what you are presented with is not information culled from a single source, or even two or three. On my Netvibes landing page, I have information pulled from no fewer than 65 feeds, some mashed up, some straight RSS feeds. Possibly (probably), the information in these feeds is itself derived from dozens of other systems. Increasingly, what the end user experiences seems like an integrated and cohesive experience; on the back end, however, the page is drawing from multiple sources, multiple formats, and multiple streams of data. These data streams can be aggregated, merged and mashed up to provide any number of user experiences. And yet, building a catalog has been an effort to build a single all-encompassing system, with data integrated and combined into one place. It is little wonder that developing, populating and maintaining these systems requires tremendous amounts of time and effort.
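To make the contrast concrete, here is a minimal sketch of that kind of aggregation, assuming the third-party feedparser library and a few placeholder feed URLs (none of them real): it simply pulls several feeds and interleaves their entries newest-first, which is roughly what a start page like Netvibes does behind the scenes.

```python
# A minimal sketch of feed aggregation, the way a start page merges sources.
# The feed URLs are placeholders; feedparser is a third-party library.
import time
import feedparser

FEED_URLS = [
    "https://example.org/library-news.rss",    # placeholder feed
    "https://example.org/new-titles.rss",      # placeholder feed
    "https://example.org/vendor-updates.rss",  # placeholder feed
]

def merged_view(urls, limit=20):
    """Fetch every feed, then interleave entries newest-first."""
    entries = []
    for url in urls:
        feed = feedparser.parse(url)
        for entry in feed.entries:
            entries.append({
                "source": feed.feed.get("title", url),
                "title": entry.get("title", "(untitled)"),
                "link": entry.get("link", ""),
                # published_parsed is a time.struct_time when the feed supplies a date
                "published": entry.get("published_parsed") or time.gmtime(0),
            })
    entries.sort(key=lambda e: e["published"], reverse=True)
    return entries[:limit]

if __name__ == "__main__":
    for item in merged_view(FEED_URLS):
        print(f"[{item['source']}] {item['title']} - {item['link']}")
```

The point is that none of the underlying data lives in the page itself; the "integrated" experience is assembled at view time from sources someone else maintains.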

Karen’s and Janet’s presentation last week provided some interesting data about the enhancements that different types of users would like to see in WorldCat and WorldCat Local. The key takeaway was that there are different users of the system, with different expectations, needs and problems. Patrons have one set of problems and desired enhancements, while librarians have another. Neither is right or wrong; they represent different sides of the same coin: what a user wants depends entirely on what they need and expect from a service. This is as true for banking and auto repair as it is for ILS systems and metasearch services.

    Putting together the pieces.

Karen’s presentation followed interestingly from another session that I attended on Friday, in which Andreas Biedenbach, eProduct Manager Data Systems & Quality at Springer Science + Business Media, spoke about the challenges of supplying data from a publisher’s perspective. Andreas manages a team that distributes metadata and content to a wide variety of users of Springer data. This includes libraries, but also a diverse range of other organizations such as aggregators, A&I services, preservation services, link resolver suppliers, and even Springer’s own marketing and web site departments. Each of these users of the data that Andreas’ team supplies has its own requirements, formats and business terms, which govern the use of the data. These streams range from complicated XML structures to simple comma-separated text files, each in its own format, some standardized, some not. It is little wonder there are gaps in the data, non-conformance, or format issues. Similarly, the problem is not a lack of appropriate or well-developed standards as much as it is one of conformance, use and rationalization. We as a community cannot continue to fulfill customer-specific requests for data that is distributed into the community.
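To illustrate the format problem, here is a rough sketch of the same record being serialized two different ways, the kind of thing a publisher ends up doing over and over for different downstream customers. The field names and values are hypothetical, not Springer’s actual schema; the point is how quickly one authoritative record turns into many customer-specific outputs.

```python
# One book record, serialized as XML for one customer and CSV for another.
# Field names and values are invented for illustration only.
import csv
import io
import xml.etree.ElementTree as ET

record = {
    "isbn": "9783540123456",   # made-up identifier for illustration
    "title": "An Example Monograph",
    "year": "2008",
    "publisher": "Example Publisher",
}

def to_xml(rec):
    book = ET.Element("book")
    for field, value in rec.items():
        ET.SubElement(book, field).text = value
    return ET.tostring(book, encoding="unicode")

def to_csv(rec):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=rec.keys())
    writer.writeheader()
    writer.writerow(rec)
    return buf.getvalue()

print(to_xml(record))
print(to_csv(record))
```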

Perhaps the two problems have a related solution. Rather than the community moving data from place to place, each institution populating its own systems with data streams from a variety of authoritative sources, could a solution exist where data streams are merged together in a seamless user interface? There was a session at ALA Annual hosted by OCLC on the topic of mashing up library services. Delving deeper, rather than entering or populating library services with gigabytes and terabytes of metadata about holdings, might it be possible to have entire catalogs that were mashed-up combinations of information drawn from a range of other sources? The only critical information that a library might need to hold is an identifier (ISBN, ISSN, DOI, ISTC, etc.) for each item it holds, drawing additional metadata from other sources on demand. Publishers could supply a single authoritative data stream to the community, which could be combined with other data to provide a custom view of the information based on the user’s needs and engagement. Content is regularly manipulated and represented in a variety of ways by many sites; why can’t we do the same with library holdings and other data?
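As a thought experiment, here is a minimal sketch of what such an identifier-only catalog might look like. Open Library’s public ISBN endpoint stands in here for an authoritative publisher feed; the endpoint shape and the fields pulled are assumptions for illustration, and any comparable source could be substituted.

```python
# A sketch of a "catalog" that stores only identifiers and pulls descriptive
# metadata on demand from an external source at request time.
import json
import urllib.request

# The only data the library itself holds: identifiers for its items.
HOLDINGS = ["9780140328721", "9780262033848"]  # example ISBNs

def fetch_metadata(isbn):
    """Resolve an ISBN to descriptive metadata on demand."""
    url = f"https://openlibrary.org/isbn/{isbn}.json"  # assumed endpoint shape
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    # Pull fields defensively; the schema is the remote source's concern,
    # not something the local catalog has to store or maintain.
    return {
        "isbn": isbn,
        "title": data.get("title", "(unknown title)"),
        "publish_date": data.get("publish_date", "(unknown date)"),
    }

if __name__ == "__main__":
    for isbn in HOLDINGS:
        print(fetch_metadata(isbn))
```

The local footprint is a list of identifiers; everything descriptive is assembled when a user actually asks for it.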

Of course, there are limitations to how far this could go: what about unique special collections holdings, physical location information, cost, and other institution-specific data? However, if the workload of librarians could be reduced in significant measure by mashing up data rather than replicating it in hundreds or thousands of libraries, perhaps it would free up time to focus on other services that add greater value for patrons. Similarly, simplifying the information flow out of publishers would reduce errors and incorrect data, as well as reduce costs.

Charleston Conference: Every librarian need not be a programmer too

Saturday, November 8th, 2008

Over dinner on Friday with the Swets team and their customers, I had the chance to speak with Mia Brazil of Smith College. We had a great conversation. She was telling me about her frustration with getting systems to work and lamenting the challenges of not understanding programming. She said she’d tried learning SQL, but didn’t have much luck. Now, learning SQL programming is no small feat and I can appreciate her frustrations (years ago, I helped build and implement marketing and circulation databases for publishers). However, realistically, librarians aren’t programmers and shouldn’t be expected to be.

The systems that publishers and systems providers sell to libraries shouldn’t require that everyone get a master’s in database programming to implement or use. While the larger libraries are going to have the resources to implement and tweak these systems to meet their own needs, the smaller college or public libraries are not going to have the resources to keep programmers on staff. We shouldn’t expect the staff at those libraries, on top of their other responsibilities, to have to code their own system hacks to get their work done.

In a way, this was what Andrew Pace discussed in his session Friday on moving library services to the grid. Essentially, Andrew argued that many libraries should consider moving to a software-as-a-service model for their ILS, catalog and other IT needs. Much as Salesforce.com provides an online platform for customer relationship management, or Quicken does for accounting software, libraries shouldn’t have to locally load, support and hack systems to manage their work. Some suppliers are headed in that direction. While there are pros and cons to this approach, it certainly is a viable solution for some organizations. I hope for Mia’s sake it happens sooner rather than later.

CENDI Meeting on Metadata and the future of the iPod

Wednesday, October 29th, 2008

I was at the CENDI meeting to speak today about metadata and new developments related to metadata. There were several great presentations during the morning and some worthy of additional attention. My particular presentation is here.

The presenter prior to me was Dr. Carl Randall, Project Officer from the Defense Technical Information Center (DTIC). Carl’s presentation was excellent. He spoke to the future of search and a research report that he wrote, Current Searching Methodology And Retrieval Issues: An Assessment. Carl ended his presentation with a note about an article he’d just read entitled Why the iPod is Doomed, written by Kevin Maney for portfolio.com.

The article focuses on why the iPod is doomed. The author posits that the technology of the iPod is outdated and will soon be replaced by online “cloud” computing services. To paraphrase the article: the more entrenched a business is, the less likely it will be able to change when new competitors arise to challenge its existing model.

Another great quote from the article: “In tech years, they [i.e., the iPod and iTunes] are older than Henry Kissinger.”

I don’t quibble with the main tenet of the article: that services will move to the web and that we will think it quaint to have had to purchase and download individual songs, then carry those songs around on hard drives that store the files locally. The iPod hardware and the iTunes model of by-the-drink downloads are both likely to have limited futures. I do think that Apple, through the iPhone, is probably better placed than anyone else to transition its iTunes service to a subscription or cloud-based model. The article dismisses this as unlikely because Apple hasn’t talked about it, which ignores the fact that Apple never talks about its plans until it is ready to announce a product or service.

As we move to an era of “cloud” computing, where both applications and content are hosted on the network rather than on individual devices, it is likely that people will want to purchase subscription access to all content on demand, as opposed to the limited content that they specifically purchase.

A subscription model also provides new opportunities to expose users to new content. From my perspective, despite having over 10,000 songs in my iTunes library, I’ve been reluctant to purchase new content that I wasn’t already familiar with. I have used LastFM and other services (anyone remember FM radio?) to become acquainted with new music. Part of the reason is that the barrier for me is time rather than cost, but I expect that the perceived cost issue is a real one for many potential users. I say “perceived” because much research and practical experience show that consumers will pay more for ongoing subscription services than they will in one-time, up-front costs.

Moving content to the “cloud” provides many opportunities for content providers to exercise a measure of control that had been lost. By hosting files rather than distributing them (streaming as distinct from downloading, for example), content providers have greater ability to control distribution. Access becomes an issue of authentication and rights management, as opposed to DRM wrapping and other more onerous and intrusive approaches. Many of us have become quite comfortable with renting movies through Blockbuster, Netflix or cable OnDemand services.

There are downsides for customers in moving to the cloud. There are very different rights associated with “renting” a thing (content, cars, houses, etc.) versus owning it. How willing users will be to give up those rights for the convenience of the cloud is an open question. Likely, the convenience will override any long-term interest in the rights. Frequently, it isn’t until the owners of a service take it away in some fashion and people get burned that they realize they don’t have any control over the cloud. If you’ve stored all of your photos on Flickr and the company deletes your account for whatever reason, you’ll wish that you had more control over the terms of service. From my perspective, I’d rather retain ownership and control of the content I’ve purchased in those areas where I’m invested in preserving access or rights to reuse. I don’t know that the majority of users share my view on this, likely because they don’t spend much time thinking about the potential impacts.

This is something libraries in particular should be focused on, having outsourced preservation of digital content to publishers and organizations like Portico.

However, I do know that looking at these distribution models is a huge opportunity for suppliers of content in all forms. The risk of not acting, or reacting too slowly, is that a new upstart provider will displace the old. I grew up in Rochester, NY, where Kodak was king of photography around the world for decades. Now Kodak is but a shadow of its former self, looking for a new business model in an era of digital imaging rather than film and processing, which were its specialty.