The Ethics of Data: Anonymity vs. Analytics

We are living in unprecedented times. We walk around with powerful computers in our pockets that can track our every move. We regularly offer up our location and vital information on what we buy, watch, and read to digital global powerhouses such as Facebook, Google, and Amazon. 

This data is, of course, used to provide us with product and service suggestions designed to improve our lives. But the field now known as “big data” has also become a battleground over surveillance. Many feel we are living in a Big Brother world, where our every physical and online movement, purchase, and personal message is stored to create a picture of us that may or may not be accurate. 

The age of big data is now firmly upon us, and we therefore face collective societal challenges on how our data is handled and used to target and track us. Data ethics is an emergent theme and one that poses complex questions for those of us who work in the identity and knowledge sector.

Data Ethics Explained

Luciano Floridi and Mariarosaria Taddeo, on behalf of the Turing Institute and the Oxford Internet Institute, defined data ethics in 2016 as “a new branch of ethics that studies and evaluates moral problems related to data (including generation, recording, curation, processing, dissemination, sharing and use), algorithms (including artificial intelligence, artificial agents, machine learning and robots) and corresponding practices (including responsible innovation, programming, hacking and professional codes).” 

We have been offering up our personal information for many years—Google was founded in 1998 and Facebook in 2004. Understanding the principles behind data ethics is only a small part of the story. As individuals, most of us have created a digital footprint and an online profile that global corporations have long used for commercial gain. 

What we should now be concerned with is the question of who, or which organizations, have oversight on how, and where, our personal data is used, stored, and passed on. We need to think about policy, governance, and the law, as well as ethical implications.  

The digital era is borderless. One of the biggest directives we have seen in recent years is the 2018 introduction of Europe’s GDPR legislation. This has had a far-reaching impact outside the EU, with other global regions adopting its guiding principles where they have European users or audiences. At the core of the GDPR framework are the ethical principles that information be used fairly, lawfully, and transparently.


A common theme we now face is the intricate issue of consent. In its basic format, we either agree or disagree to let our personal data be stored, used, and passed on. However, this can feel like an overly simplistic definition, given the complexity of data handling. 

In the case of a library environment, patrons might be required to give personal details, or attributes, in order to be granted login credentials, e.g. first name, surname, email address, etc. Small decisions about a single attribute may seem harmless at the point of discovery but have much broader implications once released. How does the user know the extent to which their credentials may be used to track and report on their movements, as well as on their access patterns and choices of resources? 

While legislation makes it clear we have the right to give informed consent, this is a murky area. How many of us really read the small print, instead scrolling to the end of a lengthy document to press the “accept” button? 

Guardians of Information

Those of us working in the knowledge or information sector—be it in publishing, libraries, or identity and access management—face a plethora of ethical decisions. It is our duty to act in the interests of society at large in making choices that put safeguards in place for future protection. 

Are librarians effectively the gatekeepers of privacy when it comes to making decisions about what data is collected and, in turn, how it is used to report trends and generate useful insight? This is one of the key questions. In many instances, IT departments consider themselves the gatekeepers, controlling the creation of the identity that is then used by the library to simplify access to information resources.

When looking at information access and personal authentication, we need to consider what attributes we ask for and how they are (or could be) used. It’s vital to build ethical considerations into the roadmap for any product that requires users to offer up personal information. The risk of information being used in the wrong way needs to be considered and built into the planning stage of any access system. Failure to do this can render systems ineffective at best, and in contravention of data laws at worst. 

This is where federations are useful. A federation is defined as a technical framework where service providers (publishers) and identity providers (organizations) agree to exchange encrypted user attribute data. They also create a trust authority that gives service providers appropriate assurances that a user passing their unique organizational ID can be considered a valid and current member of that organization. In summary, “your name's on the list, you’ve got your ID—you can come in!”
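To make the federation model above concrete, here is a minimal sketch (in Python, with hypothetical names and a made-up signing key) of how an identity provider might derive the opaque, per-service organizational ID described: the publisher never sees the real username, only a pairwise pseudonym that is stable for that publisher but unlinkable across services.

```python
import hashlib
import hmac

# Assumed secret held only by the organization's identity provider (IdP).
IDP_SECRET = b"idp-signing-key"

def pairwise_id(username: str, service_provider: str) -> str:
    """Derive a stable, service-scoped pseudonym for a user.

    The same user gets a different, unlinkable ID at each service
    provider, but a consistent ID on every visit to the same one.
    """
    message = f"{username}|{service_provider}".encode()
    return hmac.new(IDP_SECRET, message, hashlib.sha256).hexdigest()

# Different publishers receive different, unlinkable IDs for one patron:
id_at_pub_a = pairwise_id("jane.doe", "publisher-a.example")
id_at_pub_b = pairwise_id("jane.doe", "publisher-b.example")
assert id_at_pub_a != id_at_pub_b
# ...but each publisher sees a stable ID across visits:
assert id_at_pub_a == pairwise_id("jane.doe", "publisher-a.example")
```

Real federations (e.g. SAML-based ones) layer metadata, signatures, and trust registries on top of this idea; the sketch shows only the pairwise-identifier principle.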

A federated access platform will have built-in appropriate data handling, consent, and storage functions giving both organizations and providers reassurance that personal data is being treated ethically and legally. 


We all need to report on the effectiveness and impact of our work and the digital systems we use. Tracking and analyzing student or library patron movements and information access should not automatically be assumed to be negative. 

Some libraries track usage patterns to identify when a student has not used the library. The discovery can then lead to outreach on welfare grounds. This is an important point, as research has shown that students who engage with their library achieve better academic results. We have seen this in practice at UK-based Anglia Ruskin University, which has highlighted the use of data in improving student outcomes. Organizations can only do this if they have access to data linked to individual credentials. 

As the Association of College and Research Libraries (a division of the American Library Association) has reported in its research, “Several AiA studies point to increased academic success when students use the library. The analysis of multiple data points (e.g., circulation, library instruction session attendance, online database access, study room use, interlibrary loan) shows that students who used the library in some way achieved higher levels of academic success (e.g., GPA, course grades, retention) than students who did not use the library.” (Brown, 2017)
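The welfare-outreach pattern described above can be sketched in a few lines. This is an illustrative Python example only—the student IDs, dates, and 60-day threshold are assumptions, not any institution's actual policy.

```python
from datetime import date, timedelta

# Assumed policy: flag students with no library activity in 60 days.
OUTREACH_THRESHOLD = timedelta(days=60)

# Hypothetical record of each student's most recent library interaction
# (loan, e-resource login, study-room booking, etc.).
last_activity = {
    "s001": date(2021, 9, 1),
    "s002": date(2021, 5, 20),
}

def needs_outreach(today: date) -> list[str]:
    """Return student IDs whose last activity is older than the threshold."""
    return [sid for sid, seen in last_activity.items()
            if today - seen > OUTREACH_THRESHOLD]
```

Note that this only works because activity data is linked to individual credentials—exactly the trade-off between insight and privacy the article is weighing.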

Information professionals need to know which resources are most useful in order to make budgetary decisions on subscriptions and accessibility to published works. Pseudonymization—a technique in which stored data can no longer be attributed to an individual without additional information—has been used as a methodology by which organizations can still track and report on usage without compromising the user. However, be mindful that GDPR’s Recital 26 makes it clear that pseudonymized data remains personal data within the scope of the regulation. The alternative is offering complete anonymity; however, this does not provide the same granular information for organizations planning resources.
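A minimal sketch of pseudonymized usage reporting, with hypothetical patron IDs and resource names: raw identifiers in an access log are replaced with a keyed hash before analysts see them. Because the library still holds the key—and could therefore re-link pseudonyms to patrons—this data remains personal data in the Recital 26 sense, unlike true anonymization.

```python
import hashlib
import hmac
from collections import Counter

# Assumed secret held by the library; rotating it periodically further
# limits how long pseudonyms stay linkable.
REPORTING_KEY = b"rotate-me-each-term"

def pseudonymize(patron_id: str) -> str:
    """Replace a patron ID with a short keyed-hash pseudonym."""
    digest = hmac.new(REPORTING_KEY, patron_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:12]

# Hypothetical access log: (patron ID, resource used).
access_log = [
    ("p1001", "JSTOR"),
    ("p1002", "JSTOR"),
    ("p1001", "PubMed"),
]
pseudo_log = [(pseudonymize(pid), resource) for pid, resource in access_log]

# Analysts can still answer "how many distinct users per resource?"
# without ever handling real patron IDs.
distinct_users = Counter(resource for _, resource in set(pseudo_log))
```

With full anonymity (dropping the ID column entirely), the distinct-user count above would be impossible—which is precisely the granularity trade-off the paragraph describes.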

Our digital identities are often key unifiers across multiple platforms. Single sign-on is a standout benefit of a common ID, but the technology that ties together the user experience can be complex.

The Darker Side of Data

While we as well-meaning professionals may have a sound ethical mindset in only using personal data for positive means, this is unfortunately not always the case. There have already been cases where data has been manipulated, extracted, and used in ways not intended when it was originally collected.

One of the best-known examples of data misuse is the Cambridge Analytica scandal, revealed in 2018 by whistleblower and former employee Christopher Wylie. The data of 87 million Facebook profiles was found to have been harvested without consent and used to target and influence voting in the 2016 political campaigns of Ted Cruz and Donald Trump. This systematic practice was later recognized as a data breach, but the subsequent legal action could not undo the course of history.  

Data breaches have now become regular occurrences in our lives, and legislation and ethical considerations cannot prevent them. Cyber criminality relies on our personal data. Indeed, how can we combat these violations when reports of government-backed online espionage appear alongside breaches of banking and security data and ransom demands?

Such is the threat of cyber-criminality that the impact of incidents is far-reaching. Reuters estimated that between 800 and 1,500 businesses around the world were affected by a single ransomware attack on US-based information technology firm Kaseya in July 2021.

Data as a Force for Good

It’s not all bad news. Organizations such as NISO set the standard in information handling and dissemination through a community approach to defining and preserving our future digital identities. The power of data can actually serve us well and has the ability to enrich our lives. Within the information ecosystem alone, multiple platforms work together to provide patrons and users with the most relevant and valuable experiences. For example, it is helpful to be given a recommendation for a book or a source relating to something we have already viewed or read—certainly, this can significantly reduce our online research time.

We have seen over the last two years that data can be used for good in a health care context, as part of the pandemic response. Many countries have harnessed public data to communicate with their populations and have used data to inform decisions relating to vaccine rollouts and lockdowns and to trace spikes in community transmission. 

The Role of the Information Professional

Knowledge professionals will continue to play a pivotal role when it comes to the ethics of data. They are the custodians of information and shoulder the responsibility for the decisions made on handling, usage, and distribution of data today and for generations to come. However, opinions and decisions are always going to be nuanced when working with data.

As Dr. Philippa Sheail noted in her plenary session at industry event VALA2020, quoting CILIP (the UK-based library and information association): “There is increasing anxiety in information work about the tension between the freedom of access to information and the right of an individual to privacy concerning personal data, as well as the role of the information professional in defending people’s right to seek out and access different kinds of information. Recent legislation has not clarified that tension. The impact of technology and the ease with which data is now generated, stored, processed, and published means we, as information professionals, must be even more vigilant about protecting the interests of the users.”

What Now?

Federated access management provides the framework to navigate the complex topic of data ethics and helps to overcome the issues relating to policy, law, and governance—but it is the institutions and the individuals working within them who need to plot the course.

We need to collectively commit to a continual debate about ethical implications. As technology changes, we will no doubt face more challenges: virtual and augmented reality, for instance, will generate even richer streams of personal data. Mark Zuckerberg has announced his vision for Facebook to become a “metaverse” where we will live, work, and play, incorporating VR/AR technology that can track our physical movements alongside our friendships, shopping habits, and conversations. An ethical mindset is more important than ever.

There are several overarching questions that we can, and should, consistently and continually ask ourselves when handling and designing systems that deal with data: 

  1. Are we handling and using data fairly and transparently?
  2. Have we considered who else has access to our data and for what purpose?
  3. What is the system for data access, requests for changes, and deletions, and is this system easy to navigate and universally understood?
  4. Have we done all we can to ensure security and limit risk? Is this an ongoing commitment and subject to collective review?
  5. Is the way we handle data reflecting any bias or subjective analysis? How can we limit this to ensure parity?

As with many complex topics, awareness and a conscious commitment to standards are required. No one claims data ethics is an easy path to follow, but it is an endeavor that should be undertaken collectively within a supportive community.