Digital Libraries, Knowledge Networks, and Human-centered
Information Systems

Y.T. Chien
National Science Foundation
Arlington, VA 22230, USA
Email: ytchien@nsf.gov

Abstract

One of the most dramatic changes in the ongoing information revolution is the rapid convergence of the computing, communications, and content industries. Digital content, especially in the form of large, distributed, heterogeneous collections of electronic objects - text, voice, images, graphics, video, and others - is fueling the growth of computing and communications, each reinforcing the other. This paper discusses the role of digital libraries, and of knowledge networks in general, in this process, in the context of human-centered information systems.

Keywords: Digital Libraries, Knowledge Networks, Human-centered Systems, Knowledge and Distributed Intelligence, New Research Initiatives.

I. Introduction

Human-centered information systems (HCIS) [2-5] research is concerned with improving the interactions among humans, computing systems, and information resources. It builds on the foundations of computing and information sciences, with a special emphasis on human users as an essential component. Pathways in this research include finding and exploring new modes and environments for communication among these three components; improving the computing system's perception and understanding of human expression in the forms of languages and other communication modalities; enhancing the computing system's effectiveness in providing information and information services to the human user; and developing physical devices as intelligent extensions of human capabilities. Among the research issues addressed are data capture and storage; information management and access; knowledge representation, delivery and distribution; intelligent human and computer interfaces; group and organizational interactions; determination of usability and adaptability; and programming paradigms and software environments tailored to problem domains and task specifications. The key challenge in this research is how to harness new information technologies for the benefit of diverse end users.

This paper discusses some of the key ingredients for moving human-centered information systems forward in the broad context of the information age.

II. A New Framework

Past work in human-centered information systems has encompassed several fields of study, from symbolic computation, artificial intelligence, models of cognition, databases and information retrieval, and expert systems technology to robotics. Traditionally, these studies have concentrated on computationally intensive models and tasks. Their focus has been primarily on machines. With humans at the center of computing and communication, the new emphasis is on content creation, information infrastructures, and information transfer. The important research tasks here deal with multiple modalities of input and output, multiple communication media, and multiple players. The goal of future human-centered systems must be to achieve ease of use (by ordinary citizens and specialists) while simultaneously solving the problems of scale, heterogeneity, and evolution of user needs.

The foundational aspects of human-centered information systems concentrate on computational models of human cognition and intelligent behaviors such as reasoning, planning, learning and adaptation; understanding of human language acquisition and processing and its role in automated systems; algorithms, methodologies and programming environments for data capture, storage, transport, and access; architectures enabling the creation, integration, and easy use of diverse information sources, including voice, text, image, and gesture; representation of human expertise, domain knowledge, objects and their semantics and relationships; and theories and models of coordination and collaboration among people and in organizational settings. Research in this area is highly interdisciplinary, drawing on mathematics, cognitive sciences, biological and behavioral sciences, and engineering.

Human-centered information systems research includes experimentation as well as theoretical studies. The purpose of experimental research is threefold. The first is for the validation of theoretical results from abstract models, computational algorithms, or engineering designs. The second is for evaluating the performance of an integrated system that consists of components or parts, against realistic data sources and users. In speech understanding research, for example, the experimental system often consists of the voice input, speech analysis, recognition algorithms, a language understanding component, as well as other supporting functions to make use of related linguistic and dialog-based domain knowledge. Performance evaluation is then done by testing the system against speech input of a given vocabulary from many users. Lastly, experimental research may also take the form of system prototyping and testbed demonstration. This is usually for investigating issues of a cross-cutting nature such as scale, interoperability, alternative architectures, and usability, as well as for illustrating the applicability of technologies in real problems.

III. Knowledge Networks

In [1], Tapscott describes the convergence of three key industries in forming a new alliance in the "digital economy": the Computing industry, the Communications industry, and the Content industry, as illustrated below:

The emerging national and global information infrastructure connects a growing population of people, computers, information appliances, data repositories, and special instruments and other physical resources. This convergence, in many forms, presents a new challenge and an enormous opportunity to knit a vast array of resources into a knowledge network for content-rich and collaborative activities among its many users. At the National Science Foundation (NSF), we envision a series of programs of integrated research and education that bring together concerted efforts from academe, industry, and government to build a new substrate of science and technology needed to drive and benefit from this convergence on behalf of all members of society. This relationship is shown in Figure 1.

Forms of Convergence. The "Knowledge Networks" we envision represent new research opportunities combining computing, communication, content, and collaboration in ways beyond the current technologies being pursued in each field.

Computing/Communication. It has been widely recognized that the computing and communications fields are converging to provide an exciting infrastructure through which to support new kinds of human activity. This convergence has taken many forms, such as:

Taken together, the convergence of computing and communications will form the foundation layer of the knowledge network. The National Science Foundation has taken the first steps in this foundational work through such cross-directorate initiatives as the Gigabit Testbed program, the High Performance Connections program, and the new initiative on the Next Generation Internet [7].

In future knowledge networks, two critical ingredients stand out: Content and Collaboration. They form the new fabric in weaving the new information, computing, and communications infrastructure on a global scale.

Content. New kinds of scientific and practical digital content are appearing: the digitization effort of the Library of Congress, the Visible Human Project of the National Library of Medicine, the vast databases from the Human Genome Project, the information systems for Global Change studies, and the diverse data repositories from many other science and engineering disciplines are just a few examples. Interoperability - the convergence of content with different communication modalities, platforms, locations, knowledge domains, representations and semantic interpretations - is a key driving theme of current research. NSF's recent cross-agency research initiative on Digital Libraries represents a major commitment in that direction (see below).

Collaboration. The activities of people, separated by distance, time, and understanding, are also converging through the existing and fast-developing information infrastructures. The National Science Foundation has been behind many of the new software tools (e.g., the CAVEs and the Upper Atmospheric Research Collaboratory - UARC) that enable interactivity among people, processes, and organizations in distributed, heterogeneous environments. Future research needs to focus on the convergence of technological, social, and behavioral aspects of coordination and collaboration to achieve symbiosis among people, knowledge sources, and the physical environments.

IV. Digital Libraries: An Example of Knowledge Networks

We describe an example here to illustrate the kind of foundational work that could lead to rich, high-impact future research in the "Knowledge Networks" area.

The Digital Library Research Initiative.

Launched in 1994-95, the joint-agency (NSF, DARPA, and NASA) initiative for advanced research in digital libraries is a four-year effort that also involves cross-directorate participation (with the Directorate for Social, Behavioral, and Economic Sciences for funding and the Directorate for Education and Human Resources for planning). The projects are centered at Carnegie Mellon University; the University of California-Berkeley; the University of Michigan; the University of Illinois; the University of California-Santa Barbara; and Stanford University. Each project brings together researchers and users from academic faculty, in partnership with libraries, museums, publishers, schools, and the computer and telecommunications industries. The projects' goal is to dramatically advance the means to collect, store, and organize information in digital forms of all kinds - data, text, images, motion video, sound, and integrated media - and make it available and sharable for searching, retrieval, and processing via high-performance communication networks in ways that transcend distance and time. The research goal is accompanied by the development of associated testbeds with experiments designed to evaluate the technologies in real application domains ranging from engineering, earth and space sciences, geosciences, environmental sciences, and publishing and broadcasting to distance education. A complete description of this Initiative and the projects may be found in [8].

One of the distinguishing features of this initiative is the partnership arrangement. Each project consortium, led by a university, is joined by a large and diverse group of parties participating in its research. Together, these "partners" number more than several hundred and represent the real vitality of this program. See Figure 2 (a).

If one looks at the industrial partners closely, it is not hard to see how they scatter around the digital triangle, clustering in groups of computing, communications, and content, respectively. Yet, beyond the traditional roles of these industries, it is hardly surprising that many of the companies would fit equally well in more than one group in the triangle relationship. Their research interests and the roles they play in the DLI partnership mirror their present positions in the digital economy - converging towards each other, fueled by content interests. This relationship is depicted in Figure 2 (b).

V. Research Threads

Convergence within these three activities, including the three industries in the various forms discussed above, is the driving theme for future research on Human-centered Information Systems. A new paradigm to advance this research can be organized around the following five conceptual threads. These threads, in turn, can be the major elements of collaborative work to be pursued by any one, or a combination, of the scientific and engineering disciplines.

Interactivity - Connectivity, interfaces, resource-sharing
Representation - Knowledge/expertise, concepts, objects, artifacts
Cognition - Group behavior, perception, action, coordination on the net
Agents - Intelligent software/middleware for networked users
Corpora/Repository - Scientific databases, digital libraries/museums

The specific goals and scopes of research for these threads are discussed below.

1. Interactivity

Interactivity between humans and information systems is one of the most fundamental research and development threads. Interactivity technologies encompass all input and output devices and their interface characteristics, honed with the aim of making the best match, or adaptive match, to what is known about the needs and requirements of individual humans, groups, teams and organizations so that information flows properly. The spectrum of research needed includes, for example: speech recognition for ordinary citizen interfaces; database query methods; dialogue and discourse structure and function methodologies; collaboration methodologies; query and image query; gestures, body language and facial expression; community, group and team interaction models; ecological and virtual environmental interactions; computer vision; teleoperation; haptics, smell, taste and balance technologies; and signal processing and understanding.

Sensor and effector subsystems are interposed between the human and the environment being interacted with. Dialog, discourse and collaboration research seeks to understand and describe the details of the give-and-take between humans and agents, among humans in groups, and by extension, possible dialog strategies between agents. In many cases the environment interactions provide clues and feedback from environmental cognitive processes. Signal capture, filtering, processing and understanding precede encoding for permanent information storage and for human usage. Remote and network-based systems involve issues in teleoperation and telerobotics, and sensory access via vision, audio, tactile and other feedback channels. The understanding and embodiment of perceptual and motor cognition intersects this research and development thread.

Research and development in interactivity technologies includes understanding and developing methods that promote discourse and collaboration; dialog constraints in support agents; intermodal mappings for people with disabilities; query languages; understanding the information user's intent; rapid prototyping tools and workbenches; usability methods; problem solving environments; virtual environments; knowledge-based sensors and effectors; dexterity; and interactivity in training and education.

2. Representation

Research in Representation can be divided for convenience into human representation, computational representation, and multimodal and multimedia representation. The examination of human knowledge representation provides the basis for determining how best to encode, manipulate and display internal information or signals which indicate their nature and content. Importantly, the way humans assign meaning and semantics to their representations must be understood and modeled. The variety of computational representations display various taxonomies depending on the scope of objects and activities. Signal representation and display include all sensory modalities (visual, audio, haptics, etc.), individually and in suitable combinations (multimodal and multimedia).

Technologies developed under Representation include various complex systems and their interiors; complex data types; representation of uncertainty; multimedia indexing, abstraction, integration and compression; scalability, functionality mapping and data fusion; domain-independent text and image abstraction; visualization and display; illumination and rendering; properties and functionality of immersion environments; gesture and facial expression representation; representation of haptics, smell, taste and balance; common knowledge representation; and representation of languages, syntax and constraints. As an example, the representation of quantum-mechanical objects, such as are anticipated in future simulations and virtual environments surrounding current nanotechnology developments, is a necessary concomitant to, and probably a tool for, the physical building, development and understanding of such technologies.

In short, the scope of research for Representation ranges over conceptual models, processes, and methods of all kinds in both informational and physical spaces.

3. Cognition

Understanding human cognition, including perception and motor actions, helps us to do two things: (1) to discover clues on how to build and evaluate computational models of cognition, and (2) to determine characteristics of such systems and the best interface to humans, other agents, and the environment. Models are exercised in the research activities to observe their emergent behaviors and properties. These outcomes are evaluated against appropriate evaluation criteria. The overall approach is to improve the partnership of computational systems and machines with humans. Research in cognition also examines human learning mechanisms to discover the nature, organization, mechanisms and properties of realizable systems consistent with the needs of humans for learning and discovery. This includes determining ways of modifying representations and system behaviors. The overall impact on learning transfer and skill acquisition when human capability is augmented is a theme for evaluation that underlies further developments.

Research and development in the area of Cognition includes, for example, exploiting parallel architecture for computation; behavior and event processing of non-rigid objects; models of individuals, small groups, teams and organizations; focus of attention; dynamic adaptation; error processing; sensory-motor models; perceptual and motor cognition; non-conceptual cognition; knowledge-based information processing; high-level reasoning; symbolic and geometric processing; transfer of learning; skill training; and an understanding of the learning effects of human exposure to virtual and real environments.

4. Agents

Information agents at the most abstract level may be said to include all systems, software and communications. The spectrum of research in this area includes understanding the nature and scope of capabilities of agents which can interact with humans or other agents in useful ways. Information knowbots that seek specific information from distributed storage systems are an example. Also included are cooperating physical agents such as robots, intelligent devices and other non-human natural agents or environments. Research issues include how to determine the possible or optimal domain range and scope of the functionality of agents. How to decide the degree and type of interactivity with humans and other agents is another area. These capabilities are packaged in ways that are guided by principles of decomposition and organization, which themselves need to be understood and further developed under this research and development thread. The assimilation and accommodation of adaptive agents, as well as information-seeking and cooperative behaviors, are important capabilities of agents that must be researched and developed under this part of the initiative.

Research and development of information agents includes exploiting parallel architecture for information agents; dynamic adaptation and evolution of agents; focus of attention of agents; sensory-motor systems, design and criteria; modularity, parallelism and complexity; the impact of computation on the latency of signals over networks and their timeliness; interoperability; load management agents; connectivity; knowbots, robots and heterogeneous systems; evolution, adaptation, and maintenance of intelligent agents; and the development of robustness and fault-tolerant behaviors for such agents.

5. Corpora

The ability of humans to access, retrieve and comprehend information depends on the structure, function and capabilities of systems that store, manipulate and communicate with corpora. Database structure and function is the basis for all sorts of information stores from ad hoc databases to large digital libraries and repositories. The key to retrieval is the nature of the information itself and how it is searched, indexed and used by humans. Corpora in research and development domains also include databases containing the results of evaluations of databases and database systems.

Research and development in Corpora includes advances in database structure, functionality and organization; the evolution of structure, function and content; object databases; multimedia databases; multi-modal information management; experimental evaluation databases; high-confidence systems; security and authorization; dissemination and distributed databases; intelligent transaction processing; searching, filtering, information fusion, indexing and retrieval; digital libraries and repositories; and the development of disciplinary databases. Also included in Corpora are digital collections of artifacts such as those in traditional art galleries and museums.

VI. New Research Initiatives

A major shift in research paradigm often means a new opportunity to pursue new work that cuts across traditional boundaries. At the National Science Foundation, we are formulating a number of research initiatives as new ways to stimulate high-risk, interdisciplinary work in the HCIS arena. These new activities will build on the results of their predecessors or attempt to provide new opportunities to do collaborative research in the context of the research threads discussed in the previous section. Two of these are briefly described below.

Digital Libraries - Phase 2

As the first phase of the cross-agency Digital Library Initiative (DLI) draws to a close, plans are underway to extend the efforts of the six consortium projects as well as many others that have since started both in the U.S. and around the world. Digital libraries have now become both an exciting field of study and a rapidly growing force in the digital economy. The follow-on initiative - Digital Library Initiative Phase 2 - aims to further advance and expand the research scope of DLI, while also looking to build real collections and produce operational digital libraries as valuable intellectual infrastructure for other domain communities. Programmatically, the new initiative will also enrich the continuing shift to human-centered systems perspectives by informing design approaches and allowing early implementation of systems based on these design principles. Thus, the new program will help address the entire life cycle of digital libraries: information and knowledge creation in the form of new information objects and collections; new technologies for access, discovery, search and retrieval; enduring resources to allow long-term use and continuous embellishment; and archival and preservation strategies and tools. The following table is a comparative summary of the complementary goals of DLI and its successor, DLI 2, in the planning.

Universal Access

Another new theme in the broad HCIS area being advanced at NSF and its research communities is providing new opportunities for interdisciplinary efforts that may lead to dramatically improved access to information resources by all citizens [6]. In the digital economy, information technology, and especially that associated with knowledge creation and use, affords a dual opportunity: increasing the productive portion of society by reducing the barriers to participation, and reducing the enormous human and economic cost borne by those citizens who must depend on others or on other means for information access. Furthermore, if technology is developed to augment all modalities by which people with a variety of requirements and needs ultimately communicate and interact, it will end up serving everyone by removing time, distance and many technological barriers to normal human discourse. This is the basic motivation behind the "Universal Access" initiative.

VII. Conclusion

Digital libraries and, more generally, knowledge networks are two complementary ingredients for human-centered information systems. Together they will become an increasingly powerful force in both the scientific and economic contexts as we enter the 21st century. It is important, however, to seed and cultivate their growth with proper fundamental research perspectives, whether from the viewpoint of industry or from that of government agencies which fund research. This paper discusses such a perspective and suggests five research threads as the foci for interdisciplinary research.

References

[1] Don Tapscott, The Digital Economy: Promise and Peril of Networked Intelligence, McGraw-Hill, 1996.
[2] Jim Flanagan, et al. (Eds.), Human-centered Systems: Information, Interactivity, and Intelligence, Final Report, NSF Workshop held February 17-19, 1997.
[3] Dan Atkins (Ed.), Digital Libraries, Report of the NSF Santa Fe Workshop on Distributed Knowledge Work Environments, held March 9-11, 1997.
[4] Y.T. Chien, John Hestenes, Gary Strong, and Steve Griffin, Human-centered Information Systems, NSF Internal Document, 1996.
[5] Y.T. Chien, et al. (Eds.), Interdisciplinary Research in Knowledge Networking: an NSF Prospectus, NSF Internal Document, 1997.
[6] National Research Council, More Than Screen Deep: Towards Every-citizen Interfaces to the Nation's Information Infrastructure, Report by the Computer and Telecommunications Board, National Academy Press, 1997.
[7] Jean Smith and Fred Weingarten (Eds.), Research Challenges for the Next Generation Internet, Report of a Workshop by the Computing Research Association, May 12-14, 1997.
[8] Special Issue on Digital Library Initiative, IEEE Computer Magazine, May 1996.

Acknowledgments

The author is indebted to his colleagues at the National Science Foundation, especially those in the Division of Information, Robotics, and Intelligent Systems. While he alone is responsible for the content, many of the concepts and ideas were a result of discussions and shared work with his colleagues. The viewpoints expressed in this paper are solely those of the author and do not necessarily reflect any official position of the National Science Foundation.