Reliable and sustainable providers of open information, knowledge and knowledge tools

A huge collection of information – new possibilities for libraries. Photo: Plashing Vole/Flickr, CC BY-NC

Developments in research and technology pose new challenges for libraries. The area of digital humanities highlights these developments by bringing together in-depth understanding of the humanities and social sciences on the one hand and large-scale developments in computer science on the other.

These developments are closely related to age-old philosophical questions, but they also give libraries new possibilities to strengthen their position as a backbone of a future society in which people are not only well informed but knowledgeable and even wise. This will be made possible by a huge collection of information in digital form and by computational tools for analyzing it.

Good philosophy works in practice

Information science is closely linked with philosophy, in particular with epistemology, i.e. the study of knowledge. In information storage and retrieval, many aspects of epistemology have been tested at a practical level. In a broader sense, the same is true of digital humanities.

In the broad definition of digital humanities, the study of the different disciplines within the humanities and social sciences is supported by digital representation of information and computational modelling. Although digital humanities has attracted considerable attention in recent years, the area has traditions going back several decades, for example in corpus linguistics.

As the role of libraries is to serve as a neutral provider of increasingly digital content, it is natural to rely on tested traditional approaches. One such approach is to use classification systems as a means of establishing common ground.

Ontologies can be seen as a step further in this direction, systematically providing information on relations between information items.
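As a minimal sketch of what "relations between information items" can mean in practice, an ontology can be stored as subject-relation-object triples and queried by following the relations. The terms, the relation names "broader" and "related", and the function broader_terms are illustrative assumptions, not taken from any particular library system.

```python
# Hypothetical mini-ontology stored as (subject, relation, object) triples.
# The terms and relation names are illustrative assumptions only.
TRIPLES = [
    ("corpus linguistics", "broader", "linguistics"),
    ("linguistics", "broader", "humanities"),
    ("epistemology", "broader", "philosophy"),
    ("philosophy", "broader", "humanities"),
    ("information retrieval", "related", "epistemology"),
]


def broader_terms(term, triples=TRIPLES):
    """Return every term reachable from `term` via 'broader' links."""
    found = []
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subj, rel, obj in triples:
            if subj == current and rel == "broader" and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


print(broader_terms("corpus linguistics"))  # ['linguistics', 'humanities']
```

A plain classification scheme would record only the "broader" hierarchy; the "related" link is the kind of additional relation an ontology makes explicit.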

Relevant and irrelevant

From the point of view of humanities, social sciences and other complex disciplines, some concerns can be raised, the primary ones being contextuality and subjectivity. The context or point of view has a remarkable impact on what is a good way of encoding information. What is relevant in one context may be irrelevant in another.

Similarly, one category system may be useful in one context but even misleading in another. In biology, one can ask what the relationship is between Linné’s taxonomy and modern gene information.

In addition to contextuality, subjectivity was mentioned as another area of concern. Subjectivity refers to the fact that each person has a personal vocabulary and conceptual system. Strictly speaking, each person understands each word or phrase in a different manner from another person due to differences in education, personal experience, etc. Naturally, differences are often minor because otherwise communication would not be possible.

Computational methods

The effect of subjectivity becomes clear when one moves from everyday contexts to specific professional and disciplinary contexts. In summary, the number of potential ways of conceptualizing a domain is large, and the value of a particular conceptualization depends on the purpose and the point of view.

A classification system always creates a divide between those who master it and those who do not. Luckily, new computational means can be used to deal with these types of issues. Computational methods have been developed for instance to:

  • create taxonomies and classification systems in an automatic, data-driven manner
  • assess the terminological difficulty of documents in an automated way
  • compare the degree of contextual subjectivity in interpreting and using words and phrases
  • analyze compatibilities and incompatibilities between different conceptual systems.
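To make one of these methods concrete, the following is a minimal sketch of assessing terminological difficulty, assuming difficulty is approximated by the share of a document's words that fall outside a general reference vocabulary. The tiny COMMON_WORDS set and the names tokenize and difficulty are assumptions for illustration; an actual tool would derive its reference vocabulary from a large general-language corpus.

```python
import re

# Assumed stand-in for a general-language reference vocabulary.
# A real tool would derive this from a large general-purpose corpus.
COMMON_WORDS = {
    "the", "a", "of", "is", "and", "in", "to",
    "study", "knowledge", "words", "how", "people", "use",
}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def difficulty(text, common=COMMON_WORDS):
    """Share of tokens not found in the reference vocabulary (0.0 to 1.0)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    rare = [t for t in tokens if t not in common]
    return len(rare) / len(tokens)


plain = "The study of how people use words"
specialist = "Epistemology is the study of knowledge in ontologies"

print(difficulty(plain))       # 0.0
print(difficulty(specialist))  # 0.25
```

Because the score depends entirely on the chosen reference vocabulary, the same document can be rated easy for one readership and difficult for another, which mirrors the point about contextuality and subjectivity made above.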

In general, new types of knowledge tools are becoming available. An important aspect here is that the tools are data-driven, which means that the models are built automatically from data. In the following, the relevance of these developments to libraries is discussed.

Opportunities for libraries

As computers develop into devices that can carry out many knowledge-intensive processes in an increasingly automated fashion, one can ask what libraries should and could do. This is a broad and complex question, for which the following list suggests some preliminary answers.

  1. Increase the provision of new types of tools for information access and analysis, offering parallel approaches: classification systems and ontologies as well as machine-learning-based ways of accessing information.
  2. Include easy-to-use tools on a server with direct access to the library’s collections, and provide open-source tools that have been collected and in some cases developed in collaboration with the research community. For instance, so-called hackathons can be a means of developing such tools or prototypes of them.
  3. Increase the use of means to cross linguistic and multimodal borders. This includes cross-language information retrieval, machine translation and various types of mappings between written, spoken and visual information. For example, research on speech recognition and machine vision has proved very challenging, but important breakthroughs have taken place in very recent years or even months. These make new kinds of services possible, which should also be provided by libraries, not only by companies such as Google, Facebook or Microsoft.

These types of companies have huge economic resources for implementing the services, but it is good to keep in mind that the underlying innovations and core technologies have been and are being created in publicly funded universities and other research institutions.

If political decision makers are able to see the ‘bigger picture’ clearly enough, we can arrive at a situation in which open information, open knowledge and open knowledge tools provide a massive benefit for citizens and for the entire society, including SMEs, not only large companies. The natural providers of these services and tools are the different types of libraries, starting from the national libraries in the case of complex applications.

For a long time, libraries have not only guided their patrons in finding information, but also provided tools for information retrieval. Provision of knowledge tools relying on large digital libraries and based on machine learning and pattern recognition technologies can be seen as a natural extension of the traditional role. The extended role requires, of course, new resources, skills and an open mind.

Professor, University of Helsinki and National Library of Finland