Guest post by David Haynes, author of Metadata for Information Management and Retrieval, 2nd edition: Understanding metadata and its use
Use of metadata by the security services
“Metadata tells you everything about somebody’s life. If you have enough metadata you don’t really need content” (Schneier 2015, p.23)
If anyone wondered about the importance of metadata, this quote by Stewart Baker of the US National Security Agency should leave no one in any doubt. The Snowden revelations about the routine gathering of metadata about international telephone calls to or from the United States continue to have repercussions today (Greenwald 2013). Indeed, Privacy International (2017) has identified the following types of metadata that are gathered or could be gathered by security agencies:
- Device used
- Length of call
“Metadata in aggregate is content”, as Jacob Appelbaum observed when the WikiLeaks controversy first blew up (Democracy Now 2013). In other words, when metadata from different sources is aggregated it can be used to reconstruct the information content of individual communications.
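To make the point concrete, here is a minimal sketch in Python of how a handful of call records, holding only metadata, can add up to a revealing profile. The records, names and the `profile` function are invented for illustration; this is not a description of how any agency or company actually processes data.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """Metadata about one call; no audio or transcript is stored."""
    caller: str
    callee: str
    started: datetime
    seconds: int
    device: str


def profile(records, person):
    """Summarise what the metadata alone suggests about `person`."""
    mine = [r for r in records if r.caller == person]
    contacts = Counter(r.callee for r in mine)
    # Calls placed between 22:00 and 06:00 are treated as night calls.
    night = [r for r in mine if r.started.hour >= 22 or r.started.hour < 6]
    return {
        "calls": len(mine),
        "top_contact": contacts.most_common(1)[0][0] if contacts else None,
        "night_call_contacts": sorted({r.callee for r in night}),
        "total_seconds": sum(r.seconds for r in mine),
        "devices": sorted({r.device for r in mine}),
    }


# Invented records, for illustration only.
RECORDS = [
    CallRecord("alice", "oncology-clinic", datetime(2017, 3, 1, 9, 15), 420, "mobile"),
    CallRecord("alice", "oncology-clinic", datetime(2017, 3, 8, 9, 5), 610, "mobile"),
    CallRecord("alice", "helpline", datetime(2017, 3, 8, 23, 40), 1800, "landline"),
    CallRecord("bob", "alice", datetime(2017, 3, 9, 12, 0), 60, "mobile"),
]

summary = profile(RECORDS, "alice")
```

No call content appears anywhere in these records, yet the summary already hints at a medical concern and a long late-night call to a helpline.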
Invasion of privacy or personal benefit?
These concerns extend well beyond the use of metadata by Governments and the security services. The social media giants prosper by exploiting personal data and targeting digital advertising. Personal profiles of targeted individuals are based on metadata about online use and are the basis of online behavioural advertising. Cookies and other tracking technologies can monitor the online activity of an individual to predict future behaviour. Metadata about online sessions reveals a great deal about an individual and his or her life. This may extend to gathering information about friends, family, colleagues and other contacts.
The upside of this is that metadata is a powerful tool to facilitate use of online services, by remembering users’ preferences and delivering content that is more likely to be of interest or relevance to them. This has to be balanced against the risks associated with online disclosure of personal data.
Metadata describes an information object, whether that be raw data or more descriptive information about an individual. This is important because the treatment of metadata has become a political issue. Personal data, especially data that reveals opinions, attitudes and beliefs, is potentially very sensitive. Use of this personal data by service providers or by third parties can expose users to risks such as nuisance from unwanted ads, harassment from internet trolls or fraud through identity theft, if the data is not held or transmitted securely. Many digital advertisers would say that because the data is aggregated it is not possible to identify individuals – i.e. the data is anonymised. However, this is no protection against privacy breaches, as has been demonstrated by Narayanan and Shmatikov (2009) and others.
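A simple way to see why removing names is not enough is a linkage on shared attributes. The Python sketch below is a deliberately basic illustration with invented data and a hypothetical `reidentify` function; it is not the graph-based method of Narayanan and Shmatikov (2009), only the general idea that attributes left in an "anonymised" record can be matched against another source.

```python
def reidentify(anonymised, public, keys):
    """Link rows that share the same values for every attribute in `keys`.

    Returns {index of anonymised row: name} for rows that match exactly
    one person in the public list.
    """
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = {}
    for i, row in enumerate(anonymised):
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique combination pins down one individual
            matches[i] = names[0]
    return matches


# Invented data: names have been removed from the first list, but the
# remaining attributes are still present.
ANONYMISED = [
    {"postcode": "AB1", "birth_year": 1980, "sex": "F", "interest": "debt advice"},
    {"postcode": "CD2", "birth_year": 1975, "sex": "M", "interest": "gardening"},
]
PUBLIC = [
    {"name": "A. Example", "postcode": "AB1", "birth_year": 1980, "sex": "F"},
    {"name": "B. Sample", "postcode": "CD2", "birth_year": 1975, "sex": "M"},
    {"name": "C. Other", "postcode": "CD2", "birth_year": 1975, "sex": "M"},
]

linked = reidentify(ANONYMISED, PUBLIC, ("postcode", "birth_year", "sex"))
```

In this toy example the first record is re-identified because its combination of attributes is unique, while the second is not because two people share it. The rarer the combination, the weaker the protection.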
Daniel Rosenberg (2013) makes a nice distinction between data, facts and evidence. Data, if true, may be a fact, but if false ceases to be a fact. Samuel Arbesman (2012), in his book ‘The Half-Life of Facts’, introduced the idea that in a given period half the certainties that we had are shown to be false or are superseded by new understandings, and that they cease to be ‘facts’. Data, whether it is true or not, continues to be data, but is only factual if true. Perhaps there is some way of recording the reliability of information or data so that it can be exploited appropriately. Many of the arguments and counter-arguments on climate change, for instance, centre on the quality and veracity of the evidence used by each side of the debate. This idea is not new, as medical researchers have for some time evaluated the quality of research used to make clinical decisions. This information about the quality and reliability of data is metadata.
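As a rough sketch of what recording reliability as metadata might look like, the Python below attaches a source, a reliability grade and a review date to each statement. The four-point scale, the field names and the sample claims are all assumptions made for illustration; they are not drawn from any published standard or evidence-grading scheme.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum


class Reliability(IntEnum):
    """Hypothetical ordinal scale; a higher value means better supported."""
    UNVERIFIED = 0
    SINGLE_SOURCE = 1
    CORROBORATED = 2
    PEER_REVIEWED = 3


@dataclass(frozen=True)
class Claim:
    statement: str            # the data itself
    source: str               # metadata: where it came from
    reliability: Reliability  # metadata: how well supported it is
    last_checked: date        # metadata: when that judgement was made


def usable(claims, minimum, checked_since):
    """Keep claims graded at least `minimum` and reviewed on or after `checked_since`."""
    return [
        c for c in claims
        if c.reliability >= minimum and c.last_checked >= checked_since
    ]


# Invented examples, for illustration only.
CLAIMS = [
    Claim("Finding A", "journal article", Reliability.PEER_REVIEWED, date(2017, 5, 1)),
    Claim("Finding B", "anonymous blog", Reliability.UNVERIFIED, date(2017, 6, 1)),
    Claim("Finding C", "journal article", Reliability.PEER_REVIEWED, date(2009, 1, 1)),
]

trusted = usable(CLAIMS, Reliability.CORROBORATED, date(2015, 1, 1))
```

The statements themselves are unchanged by the grading. What changes is how confidently each one can be used, and the review date reflects the point that a well-supported claim may still need rechecking as understanding moves on.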
Metadata is political
Metadata has become a political issue because of its use by security agencies and because of wider privacy issues in the commercial world. Anyone who asked the question ‘What does metadata matter?’ prior to 2013 will now realise just how important a bearing it has on current political issues. The Fourth Amendment to the U.S. Constitution protects ‘The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures’ (United States 1791). A lot hangs on the interpretation of privacy, as Solove (2011) has so eloquently discussed in his book ‘Nothing to Hide’. ‘Fake news’ is not new, but the phenomenon has reared its head in recent elections and is unlikely to go away any time soon. Good governance also depends on a good understanding of metadata and accountability for past actions.
Metadata for information management and retrieval
In the new edition of Metadata for Information Management and Retrieval, published in January 2018, I consider the origins of metadata and look at the ways in which it is used for managing information resources. The ethical dimensions of metadata are explored and issues such as governance, privacy, security and human rights are considered. The book also discusses the digital divide and the potential that metadata has for making information accessible to wider audiences.
Metadata has an important role in politics and ethics. How then do we manage it to best effect?
Haynes, D (2018) Metadata for Information Management and Retrieval: understanding metadata and its use. ISBN 9781856048248. Facet Publishing. London, 2018, 267pp. http://www.facetpublishing.co.uk/title.php?id=048248
You can follow David on Twitter @JDavidHaynes
Arbesman, S., 2012. The half-life of facts : why everything we know has an expiration date,
Democracy Now, 2013. Court: Gov’t Can Secretly Obtain Email, Twitter Info from Ex-WikiLeaks Volunteer Jacob Appelbaum. Available at: https://www.democracynow.org/2013/2/5/court_govt_can_secretly_obtain_email [Accessed March 21, 2017].
Greenwald, G., 2013. NSA Collecting Phone Records of Millions of Verizon Customers Daily. The Guardian. Available at: http://www.theguardian.com/world/2013/jun/06/nsa-phone-records-verizon-court-order [Accessed July 7, 2014].
Narayanan, A. & Shmatikov, V., 2009. De-anonymizing Social Networks. In 2009 30th IEEE Symposium on Security and Privacy. IEEE, pp. 173–187.
Privacy International, 2017. Privacy 101. Metadata. Available at: https://www.privacyinternational.org/node/53 [Accessed March 23, 2017].
Rosenberg, D., 2013. Data before the Fact. In L. Gitelman, ed. “Raw Data” is an Oxymoron. Cambridge, MA: MIT Press, pp. 15–40.
Schneier, B., 2015. Data and Goliath: the hidden battles to collect your data and control your world, New York, NY: W.W.Norton.
Solove, D.J., 2011. Nothing to Hide: the false tradeoff between privacy and security, New Haven, CT: Yale University Press.
United States, 1791. U.S. Constitution Amendment IV, United States.
Facet Publishing have announced the publication of Managing Digital Cultural Objects: Analysis, discovery and retrieval, edited by Allen Foster and Pauline Rafferty, both at Aberystwyth University.
The book explores the analysis and interpretation, discovery and retrieval of a variety of non-textual objects, including image, music and moving image.
Bringing together chapters written by leading experts in the field, the first part of this book provides an overview of the theoretical and academic aspects of digital cultural documentation and considers both technical and strategic issues relating to cultural heritage projects, digital asset management and sustainability. The second part includes contributions from practitioners in the field focusing on case studies from libraries, archives and museums. The third and final part considers social networking and digital cultural objects.
Managing Digital Cultural Objects: Analysis, discovery and retrieval draws from disciplines including information retrieval, library and information science (LIS), digital preservation, digital humanities, cultural theory, digital media studies and art history. It’s argued that this multidisciplinary and interdisciplinary approach is both necessary and useful in the age of the ubiquitous and mobile web.
Key topics covered include:
- Managing, searching and finding digital cultural objects
- Data modelling for analysis, discovery and retrieval
- Social media data as a historical source
- Visual digital humanities
- Digital preservation of audio content
- Photos on social networking sites
- Searching and creating affinities in web music collections
- Film retrieval on the web.
The book will provide inspiration for students seeking to develop creative and innovative research projects at Masters and PhD levels and will be essential reading for those studying digital cultural object management. Equally, it should serve practitioners in the field who wish to create and develop innovative, creative and exciting projects in the future.
About the editors:
Allen Foster has a BA in Social History, a Master’s in Information Management and a PhD in Information Science. As Reader in Information Science, he has held various roles, including Head of Department for Information Studies, at Aberystwyth University. His research interest areas span the research process of Master’s and PhD students, the development of models for information behaviour and serendipity, and user experience of information systems, creativity and information retrieval. He has guest edited for several journal special issues, is a regional editor for The Electronic Library and is a member of journal editorial boards, international panels and conference committees.
Dr Pauline Rafferty MA(Hons) MSc MCLIP is a Senior Lecturer and Director of Teaching and Learning at the Department of Information Studies, Aberystwyth University. She previously taught at the Department of Information Science, City University London, and in the School of Information Studies and Department of Media and Communication at the University of Central England, Birmingham.
Contributors:
Sarah Higgins, Aberystwyth University
Katrin Weller, GESIS Leibniz Institute for the Social Sciences
Hannah Dee, Aberystwyth University
Lorna Hughes, University of Glasgow
Lloyd Roderick, Aberystwyth University
Alexander Brown, Aberystwyth University
Maureen Pennock, British Library
Michael Day, British Library
Will Prentice, British Library
Corinne Jörgensen, Florida State University (Emeritus)
Nicola Orio, University of Padua
Kathryn La Barre, University of Illinois at Urbana-Champaign
Rosa Ines de Novias Cordeiro, Federal Fluminense University, Rio de Janeiro
Foundations of Library and Information Science offers a firm underpinning of knowledge and guidance for LIS students and professionals alike. It will prepare LIS students and professionals to cope with and effectively manage their many complex responsibilities by:
- providing an introduction to the LIS field
- identifying and discussing the current major topics and issues in LIS that will continue to affect the profession for years to come
- providing librarians and information professionals with an opportunity to refresh their knowledge through a systematic review of the major issues and topics that have changed the field
- placing LIS in a larger social, economic, political and cultural context
- inviting readers to further explore topics raised in the book.
Responding to the many changes occurring both in the field and in society at large, this text includes comprehensive coverage of:
- the impact of digital devices and social networking
- the impact of digital publishing and e-books
- the evolution of library services including virtual reference, embedded librarianship, digital access and repositories, digital preservation and civic engagement
- the new efforts to organize knowledge including FRBR, RDF, BIBFRAME, the semantic web and the next-generation library catalogue
- the significance of the digital divide and policy issues related to broadband access and network neutrality
- legal developments including new interpretations of copyright related to mass digitization of books and scholarly articles
- the continuing tensions in LIS education between information science and library science
- new initiatives to integrate libraries, archives, and museums.
Spanning all types of libraries, from public to academic, school, and special, this book illuminates the major facets of library and information science for aspiring professionals as well as those already practicing in the field.
Foundations of Library and Information Science; December 2015; paperback; 648pp; 9781783300846; £54.95; is published by Facet Publishing and is available from Bookpoint Ltd | Tel: +44 (0)1235 827702 | Fax: +44 (0)1235 827703 | Email: email@example.com | Web: www.facetpublishing.co.uk. | Mailing Address: Mail Order Dept, 39 Milton Park, Abingdon, Oxon OX14 4TD. The US edition is available in North America through ALA Editions.
The following is extracted from The Information Society, 6th edition by John Feather
A little more than a decade into the new century, people over the age of 35 in the industrialized countries are increasingly conscious of living in a world that is profoundly and fundamentally different from that into which many of them were born. In less than two decades, we have seen technological, economic, political and cultural change on a scale which, as a retrospective view becomes possible, is beginning to justify the use of the word ‘revolution’ to describe it. But revolution is a word that we associate with violence, with the storming of the Bastille or the bombardment of the Winter Palace. The 1990s were indeed a violent decade in some places, but our revolution was only indirectly a part of that. It began in the 1970s and is not yet complete; it has been at once less obvious and more far-reaching than a mere change in a regime or even in a whole political system. It has been a revolution in our way of living, which, in one way or another, has affected every human being on the planet.
The symbol of the revolution is the computer, the ‘electronic brain’ of the ‘boffins’ in science fiction films of two or three generations ago, which now seem far older than their 50 or 60 years. The computer is in every office, on most desks and in millions of homes. Behind the scenes it is involved in almost everything we do, from buying our groceries to making a telephone call. Even after more than a century of almost continuous innovation in the technology of communication, and the invention of devices from the telegraph and the telephone to the television, the computer is perceived, however vaguely, as being in some way different. By understanding that difference, we can begin to understand the new society which the computer is helping to create, the revolution which it has both inspired and driven.
It is now two centuries since the last comparable revolution was at its height in Britain. The exploitation of the power of steam was creating a new economy, and in so doing reordering patterns of work, social relationships and the structure and political organization of society. The new arrangements which stabilized in the first half of the 19th century were recognizably the successor of what had gone before, but unmistakably different from it. Institutions that survived were changed; many vanished, and many new ones were created. The revolution through which we are now living is at least as great in its significance.
The steam engine was the motive power, both literal and metaphorical, of the industrial revolution; the computer is driving the revolution which is taking us into the third millennium. Why has this machine become so important? What is so special about these devices that has made them the force behind changes far greater than those wrought by any other invention of an inventive century? The answer lies in their ability to simulate skills and attributes that we once thought were unique to ourselves: memory, logic, communication. Machines that are able to emulate, and in some ways to surpass, the intellectual and social capacities of those who make them are both fascinating and frightening. The virtually unlimited production and availability of such devices cannot leave any aspect of human thought and activity wholly untouched.
Communication and memory are central to the human experience. So far as we know, we are the only creatures on earth with a true sense of history, a desire and an ability to remember and analyse events in the past, and to make arrangements that allow us to record our knowledge and ideas in perpetuity, so that they can be recovered and understood by generations not yet born in societies which do not yet exist. Uniquely, we can communicate across time and space and have developed systems and devices that enable us to do so. These developments began in the dawn of human history, with the evolution of language itself (which some anthropologists would argue is the dawn of human history in any meaningful sense) and the later invention of the first systems for recording and preserving language in a material form.
The information-dependent society that is emerging from our revolution – the post-industrial revolution as some analysts call it – combines both profound change and fundamental continuity. It can only be understood in context. Part of this context is historical: the development of writing, printing and systems of communication. Part of it is economic: the means by which systems for the communication of information have become enmeshed in general systems of social and economic organization, so that information and the means of its storage and transmission have been commodified. A third part is political: commodified information is valorized by more than merely the cost of its production and distribution, for there is a real power to be derived from its possession and a loss of empowerment caused by its absence. These hypotheses about the origins, development and implications of the information society are at the heart of this book. The book begins with an historical survey, which sweeps without apology across much of the history of mankind. In that history, we observe first the development of writing, as people seek to preserve more information than their memories can hold and communicate it to those to whom they cannot speak. We trace the development of different systems of writing until one – the alphabet – emerges and supersedes almost all of the others because it is an adaptable and flexible means of preserving the languages in which we think and speak. Even the alphabet, however, cannot cope with all the concepts that the human mind can invent. Systems were developed which enabled our ancestors to record sound (as musical notation), numeric data and the relationship between them (as numbers and symbols for mathematical functions) and visual representation of size, shape and colour.
In the second phase of our history, a mechanical device – printing – was applied to the chronicling and dissemination of the information which was thus recorded. The invention of printing has been seen as a defining moment in the history of mankind. Certainly, it facilitated important changes in the organization and structure of western European culture, religion and politics, and was to be one of the instruments of European domination of almost all of the rest of the world. In the smaller world of communications, printing had another effect which we consider at length: it was the fundamental reason for the commodification of communications. A printer, we shall argue, needed more than merely skills in order to practise his craft successfully; a printer also needed both capital for the equipment with which the product was made and distribution systems through which the product could be sold. The printed book was the first mass medium, because it was economically impossible for it to be anything else.
Out of printing there developed the vast edifice of the publishing industry, the first significant manifestation of communications entering the world of commerce. The process of writing, producing and selling printed books was, for 400 years, the unchallenged system of communication between literate people. It became so familiar as to become a paradigm; its vocabulary and some of its customs have been imitated by the producers and consumers of very different media. In this book, the paradigm has been exploited to the full. There is a substantial analysis of the process of book publishing, and of the industry that has developed around it. This is developed as a model of commercial systems for the communication of knowledge and information, which can be applied in turn to the other media that have proliferated in the last 100 years.
The development of those other media – sound, vision, computing, and various combinations of them – is the final historical strand in this study. The history of information and communication in the last 150 years is, in part, the history of the development of new devices and systems which have extended our power to communicate in two ways. First, they have made it more systematic and faster and hence more efficient. Secondly, and more importantly, they have extended the scope of what can be communicated. Above all, accurate representations of visual phenomena – photography, film, video – have become a part of our daily lives. We have moved beyond text and language into the storage and communication of images of the visual world in which we actually live. Other inventions have speeded the transmission of information: the telegraph, the telephone, radio, television. These tools of communication are the building blocks of the information society. An increasingly literate society has, paradoxically, become more dependent than ever on oral and visual communication systems.
Only at the very end of our historical story do we reach the computer, and yet as soon as we do so we can begin to see its all-pervasive effects. The computer has brought together so many of the developments of the past. It has both demanded and facilitated the convergence of technologies, which allows us to combine computing with telecommunications and the digitization of text and image to permit almost instantaneous worldwide (and indeed extra-terrestrial) transmission of data.
The historical approach in Chapter 1 (free to view and download as a PDF) and Chapter 2 is essentially an attempt to sketch the history of the storage, communication and retrieval of information, in terms of media and technology. We turn next to the economic issues that have arisen, which are becoming more acute and which are being more urgently addressed because of the increasing predominance of technology in the process of information provision and the delivery of information services. Information, as has already been suggested, was commodified and valorized by the invention of printing and the consequent development of an industry which used printing as its key technology. Publishing – the paradigm – is in the front line of exposure to change under the impact of the information revolution. The market-place itself is being redefined and extended. Some activities traditionally associated with publishing and others traditionally associated with libraries are being disaggregated and recombined. The new configurations have wide implications far beyond the boundaries of the academic world in which many of them originated. E-mail and electronic publishing are only two of the more obvious applications of the combination of computing and telecommunications which we broadly describe as ‘information technology’.
The printed word, which has been the traditional commodity in the information market-place, was supplemented and to a limited extent displaced throughout the 20th century. The information revolution encompasses all those media that communicate information to recipients. In the developed world, and indeed far beyond it, the most potent medium of all is television, the near-universal domestic source of information, entertainment and social interaction. Broadcasting, first in sound only and then in both sound and vision, has been with us for nearly 100 years. Its ability to transmit information and opinion instantaneously, with great apparent authority and directly to the home, was a force whose power was recognized before World War II and has been consistently exploited by governments, pressure groups and commercial interests ever since it was identified. Radio and television are integral to the information revolution, and yet they are also subject to it. Satellite broadcasting, which is computer-dependent, has brought a new sense of freedom to the television industry, but, like so many other developments, has also reiterated, if reiteration were needed, the need for huge capital investment to gain access to this key medium of information and influence.
It is not only the mass media that have changed the information marketplace. Broadcasting is, by definition, a public activity. Information, however, is increasingly seen, in some respects, as being too valuable to be public. Stored in databases throughout the world is information with commercial potential to which access is restricted by the ability of the information-seeker to pay for it. Again a revolution is being wrought. The library is the historic paradigm of information storage and retrieval as publishing is of information marketing. Libraries, like publishers, have been in the front line of change. These changes are far from superficial; it is not just that libraries now contain a wide range of media, and are increasingly dependent upon technology both for their management and for the provision of services to users. There are far more profound economic changes, for libraries are part of the increasingly commercialized chain of information supply. Traditionally, the library was merely the customer of the publisher. Now it has the potential to be the publisher’s partner in many enterprises, and librarians are reassessing their attitudes to the cost of information supply. Outside the confines of the institutional library, information providers have few of the inhibitions that have traditionally made librarians look askance at such matters. Information has values assigned to it, and it is provided at a profit to the provider; prices are determined by the forces of the market.
It is out of these economic themes covered in Chapters 3 and 4 that the political themes that predominate in Chapters 5 and 6 emerge. On a global scale, there is a growing gap between the rich and the poor in access to information as in so much else. The technological developments of the last 60 years have made more information more available to more people than at any other time in human history. At the same time, however, the cost of those technologies, and the cost of gaining access to information through them, have made it often difficult and sometimes impossible for information to be obtained by its potential beneficiaries. This is the central paradox and the central political dilemma of the information revolution. As in the industrial revolution, in different ways, the benefits to the majority, encompassed in the abstraction of ‘society’, are being achieved partly at the expense of weaker and poorer individuals whose skills are becoming outmoded and whose earning power is consequently declining.
The revolution in the communication of information has created what is sometimes called a ‘global village’. Yet instant access and instantaneous transmission depend upon a vastly expensive infrastructure of telecommunications and broadcasting systems on the part of the providers, and the acquisition of appropriate equipment (and sometimes skills) on the side of the consumers. Those who are excluded are the majority of the populations of most of the Third World and significant minorities even in richer countries. Even in the USA, the cabling of the ‘information superhighway’, the optic fibre network which can bring digital communications to the home, was politicized as the provider companies avoided poorer areas of cities to concentrate on the richer areas where demand and profits were higher. The gap between information rich and information poor is increasingly overt.
If that gap is the wider political dimension of the information revolution, its most obvious immediate political consequence has been to change, or to threaten or promise to change, the relationship between the state and its own citizens. Governments, like businesses, cannot function without information, and as they become more complex so do their information needs. Much of this is not only legitimate, it is both essential and benevolent. A modern state cannot function without such basic data as that provided by censuses, tax returns and electoral registers. There is, however, a debate, perhaps not yet sufficiently well articulated, about the boundaries of the legitimate information needs of a democratic state. Information about identifiable individuals is sensitive, and yet there are cases in which its dissemination, perhaps to a tightly defined group of recipients, is clearly in the public interest. In other cases, there can be no such interest, and dissemination is clearly an invasion of legitimate privacy. But there is an increasing number of less clear-cut areas, where the organs and agencies of the state are collecting information of great potential value or harm. The process of regulation – of balancing the general good against individual rights – has begun, but is still embryonic. The whole issue, complex enough already, was at the beginning of the 21st century further complicated by the growing concern with international networks of both criminals and terrorists; they are also among the beneficiaries of information and communications technology.
Historically, the state has always been a participant in the process of information transfer. It regulates the operation of the market-place through laws that control the dissemination of intellectual property, such as copyrights and patents, and also, in most cases, by exercising a certain level of moral jurisdiction through censorship. The state’s role, however, like so much else, is being transformed by the information revolution. The very concept of copyright, which for 300 years has been the legal foundation-stone of the publishing industry, becomes blurred when the technology of copying is uncontrollably widely available. As the historic functions of publishers and libraries begin to converge in electronic publishing and electronic document supply services, the very nature of copyright will need to be redefined. There are changes too in the general perception of the state’s right to intervene by intercepting private communications and controlling the content or availability of those intended for public consumption. These are lively political issues which touch on the fundamental principles of political democracy and an open society.
These three aspects of the evolution of the information society – the historical, the economic, the political – are considered in turn. Each raises its own questions, yet all are interrelated. The questions are not new, but they have all been made more urgent by the power of the computer to store, process and transmit information. In the industrialized countries it is no longer possible to conduct many of the most basic transactions of daily life without using the power of computers. Our greatest tool of information and communication is in danger of becoming our master. Much of this book is concerned with trying to define the issues that are raised by this prospect.
Finally, there is one group of people in society who have a special role to play in the information revolution. Computer scientists and information workers are the engineers of the post-industrial revolution (see Chapter 7). More than any other group, publishers, librarians and archivists have seen their professions transformed; whole new professions have come into existence as governments, businesses, industries and institutions have struggled to reposition themselves to deal with the new technologies of information and communication. In those parts of the world where the information revolution has made its greatest impact, the information professionals are becoming a larger and larger part of the workforce as a whole. But it is not only those professionals whose lives are being profoundly changed. Patterns of work and patterns of employment are being transformed as radically now as they were in the move from an agricultural to an industrial economy 200 years ago. Manufacturing itself is no longer dependent upon the mass employment of labour as computer-based devices are made which can undertake the routine work of making and assembling parts. We are living in the midst of this revolution. Those who are seeking to enter the information and communications professions – for whom this book is principally intended – need to be able to formulate the questions to which the answers will prescribe the limits of their professional lives. Some of those questions are posed, and a few of the answers are suggested, in the rest of this book.