Home | Data Dictionary Archive | February 18, 2011 | Table of Contents | Appendices | Appendix B: Measuring the Use of Electronic Library Services


NISO Z39.7-201X, Information Services and Use: Metrics & statistics for libraries and information providers - Data Dictionary

Draft for Trial Use. A proposed revision to the 2004 edition of the standard.

Appendix B: Measuring the Use of Electronic Library Services

B.1. General

Libraries today provide electronic as well as traditional services. With new forms of information resources developing and new possibilities for document delivery, the use of electronic library services is growing rapidly.

Ways of providing and delivering information have changed fundamentally and will continue to change in years to come. Nevertheless, libraries cannot wait for consolidation of this process, but must try to measure and report their activity in this field. Traditional statistics on collection building and use can only show part of a library's current performance. Therefore, this International Standard contains definitions of statistical measures for electronic as well as for traditional library services.

While most traditional statistics can be produced by the library itself, statistical data for electronic services, especially for their use, must be collected from different sources: vendors and suppliers, computing centers and library consortia will be involved. It is important at this time that libraries reach agreement about the statistical data they need to evaluate their services, and that they negotiate with vendors and suppliers of information resources and suppliers of automated library systems to provide such data.

This Standard sets out to define the:

    Various forms of electronic library services;
    Various forms of electronic information resources; and
    Various forms of use of electronic services.

Definition and collection of data for electronic information resources and electronic document delivery are treated in the main standard, because in this area it seems possible at this time to find definitions that are reliable.

Statistics for the use of electronic services are dealt with in this Appendix because, for some time to come, such statistics may be incomplete, and methods of data collection are likely to change quickly as technology develops.

B.2. Issues of Measuring the Electronic Collection

In contrast to conventional resources, electronic resources often have no physical form or boundaries, and this affects the measurement of both collection and use. For example:

    Documents can consist of several files or elements (text, image, multimedia), and be embedded in web frames. The same document may also look different when viewed through different web browsers. Furthermore, the contents of electronic resources (whether individual full texts or those in databases) can change over time. Uniform Resource Identifiers (URIs) are becoming more widespread and support the clear identification of documents.

    Databases can be configured to combine and sort information so that every search command may constitute a new object (document). Active Server Page (ASP) technology, for example, allows a web page to be generated from a number of database entries upon each request. Such pages cannot be counted as documents prior to their generation, and it is difficult to measure their use.

    As abstracting and indexing, full text, and other databases begin to merge into complex database products, it becomes increasingly difficult to differentiate between them. Therefore, subdivision is proposed only as an optional measure in Appendix B.

    In the future, many differences between electronic serials and full text databases are likely to diminish as well. A precise count of their number will therefore become difficult.

    Many resources (electronic serials, databases, or digital documents) can be accessed free on the Internet, and libraries may catalogue and index some of these. This is dealt with in the main standard (see ISO 2789, 6.2.14).

B.3. Issues of Measuring Use

Communication on the Internet can be described as stateless and transaction-based. Each web server records some significant parameters of these transactions. Depending on individual settings, the statistical information is gathered in one or more "log files". In the standard setting, called Common Logfile Format (CLF), seven basic parameters are recorded. Among these are the requesting IP address (the unique Internet Protocol number attached to each Internet computer), authentication information, a time stamp, the transfer success status, and the transfer volume. The CLF can be extended by two more parameters: the referring link, and the computer's browser and operating system. Log files therefore collect statistical data only on transactions between Internet computers; time-based data (e.g., search time, time of document or resource exposure) can only be assessed if web log mining tools are used to analyze site or server traffic.
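As an illustration of the log parameters described above, the following is a minimal sketch of reading one log entry in Python. It assumes the conventional field order (host, identity, authenticated user, time stamp, request line, status, bytes transferred), optionally followed by the referring link and the browser/operating system string. The names `CLF_PATTERN` and `parse_clf_line` and the sample entry are hypothetical and are not part of this Standard; real server configurations may differ.

```python
import re
from datetime import datetime

# Seven basic fields, optionally followed by the two extended fields
# (referring link and browser/operating system string).
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_clf_line(line):
    """Return a dict of the recorded parameters, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Time stamp format assumed here: 18/Feb/2011:13:55:36 -0500
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A "-" in the size field means no content was transferred.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

# Hypothetical entry using a documentation-only IP address.
sample = '192.0.2.10 - - [18/Feb/2011:13:55:36 -0500] "GET /opac/search?q=metrics HTTP/1.0" 200 2326'
record = parse_clf_line(sample)
print(record["host"], record["status"], record["size"])
```

Each parsed entry describes a single transaction. Nothing in it records how long a user looked at the document, which is why time-based measures require additional analysis tools.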

In order not to affect the usability of electronic collections, libraries rarely implement personal authentication. Use by members of the population to be served, however, can only be determined if some identification information is recorded. For the purpose of measurement, a request is therefore regarded as originating from a member of the population to be served if the IP address belongs to the library or institution/legal service area. Access to paid-for electronic library services (e.g., acquired or licensed databases, serials, etc.) is usually authenticated for lists or blocks of IP addresses. It must therefore be presumed that members of the population to be served have originated all successful requests. Requests for free services (e.g., OPAC and library website), however, are impossible to validate in total. While access from inside the institution (identified by IP addresses) is assumed to originate from members of the population, remote use (e.g., from computers at home) will generally be anonymous. Furthermore, individual IP addresses using the same proxy server will not be recognizable, as only the IP address of the proxy is recorded in the log file.
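The IP-based attribution rule described above can be sketched as follows. This is an illustration only: the address blocks are hypothetical documentation ranges, and the names `INSTITUTION_NETWORKS`, `is_population_request`, and `count_by_origin` are assumptions introduced here, not terms defined by this Standard.

```python
import ipaddress

# Hypothetical address blocks belonging to the library or institution.
INSTITUTION_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]

def is_population_request(host):
    """True if the requesting address lies inside one of the institution's blocks."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A host name or malformed value cannot be attributed.
        return False
    return any(address in network for network in INSTITUTION_NETWORKS)

def count_by_origin(hosts):
    """Split a list of requesting addresses into attributable and other requests."""
    inside = sum(1 for host in hosts if is_population_request(host))
    return {"inside": inside, "outside_or_unknown": len(hosts) - inside}

print(count_by_origin(["192.0.2.45", "203.0.113.9", "198.51.100.200"]))
```

Requests in the "outside_or_unknown" group are not necessarily from non-members. They include remote use from home, which such a rule cannot attribute, and a proxy address may stand for many individual users.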

At the time of this writing, a wide range of software tools is available to extract and analyze descriptive statistical information from log files, and a number of online statistics suppliers offer professional guidance in collecting and presenting log data. It must, however, be recognized that the quality and precision of statistics for web-based electronic collections will vary in a number of areas:

    Many paid-for electronic collections must be accessed on remote (supplier) servers. Although an increasing number of suppliers now present use statistics for electronic resources in accordance with a variety of guidelines (including ICOLC and others), libraries are dependent on suppliers for the completeness and quality of the data made available to them, and results are difficult to compare.

    Most Internet providers use proxy servers, and users can activate local cache files in their browsers to store copies of documents that have previously been accessed. In a proxy server environment, repeat requests for a document are served from caches/proxies instead of from the document server, thereby shortening the time of transmission. As these requests do not reach the document server, no statistical entry is recorded in the log file, and the number of requests counted will underestimate the amount of real use. Individual browser cache settings can add further complication, however, because some professional web analysis tools (many of them developed to measure web advertising) can induce computers to ignore the stored copy and instead request the document anew.

    Not all requests for a page can be regarded as use: search engines will usually request websites for indexing purposes, and library website administrators will access their pages, as most of these are subject to frequent maintenance. The number of requests counted will therefore overestimate the amount of real use. These entries can be removed if the requesting IP address is recorded in the file. If no automatic filtering is available, the total count must be reduced manually by the number of these page requests.
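The removal of indexing and maintenance requests described in the last item can be sketched as below. The staff address, the list of crawler markers, and the names `STAFF_ADDRESSES`, `CRAWLER_MARKERS`, `is_non_use`, and `adjusted_request_count` are hypothetical. Matching on the browser string is an assumption that only works where the extended log parameters are recorded. Requests served from proxies or caches never reach the log, so the undercount they cause cannot be corrected this way.

```python
import ipaddress

# Hypothetical workstation used for website maintenance.
STAFF_ADDRESSES = {ipaddress.ip_address("192.0.2.5")}
# Assumed substrings that suggest a search engine crawler in the browser string.
CRAWLER_MARKERS = ("bot", "crawler", "spider")

def is_non_use(record):
    """True if the request looks like indexing or maintenance rather than use."""
    agent = (record.get("agent") or "").lower()
    if any(marker in agent for marker in CRAWLER_MARKERS):
        return True
    try:
        return ipaddress.ip_address(record["host"]) in STAFF_ADDRESSES
    except ValueError:
        # Unparseable host values are kept in the count.
        return False

def adjusted_request_count(records):
    """Count requests after removing indexing and maintenance entries."""
    return sum(1 for record in records if not is_non_use(record))

records = [
    {"host": "203.0.113.9", "agent": "Mozilla/4.0"},
    {"host": "203.0.113.77", "agent": "ExampleBot/1.0"},
    {"host": "192.0.2.5", "agent": "Mozilla/4.0"},
    {"host": "192.0.2.45", "agent": None},
]
print(adjusted_request_count(records))
```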

B.4. Use of Electronic Services

In the last few years, various libraries and institutions have tested datasets that could be used to assess the amount and the different kinds of use of some or all electronic library services. Testing is still in progress, and reports show that data derived from vendor systems, automated library systems, or library servers may differ considerably. A few measures have been developed that the participating libraries deem most useful and that may turn out to be reliable when based on the same definitions and the same methods of data collection.