Using Digital Libraries
as a Community Hall for
Worldwide Information Spiral Development

Minoru Ashizawa, Hideaki Kikuchi, Yusuke Mishina, Hiromichi Fujisawa,
Minoru Hidaka, Naoko Yamazaki, and Akito Sakurai
Hitachi Ltd., Central Research Laboratory
1-280, Higashi-koigakubo, Kokubunji-shi, Tokyo 185, Japan
http://koigakubo.hitachi.co.jp/
ashiza@crl.hitachi.co.jp

Abstract

This paper describes our concept of a digital library from the user's viewpoint based on information flow and information organization, and explains it using a three-layer model of information distribution -- information provider, information broker, and information user.

Our worldwide communication hall prototype system is connected to the World Wide Web and supports the total flow of information around the user. It consists of the virtual personal library client Webshelf, an information publishing server, and the hypermedia archive server Webarchive. Webshelf allows users to virtually personalize information on the World Wide Web, to organize it to meet their needs, to create new information on the basis of the organized information, and to publish this new information. Our prototype implementation has shown that Webarchive offers an effective solution for the information volatility problem by storing all the information obtained by the user and managing link consistency.

Keywords: digital library, World Wide Web, HTML, annotation, information sharing, archive system, virtual personal library, information volatility problem, stiff-necked link problem, link-consistency management

1 Introduction

A digital library consists of (1) computer networks, (2) a catalog database, and (3) a multimedia database. Regarding (1) and (2), Hitachi has developed the Library Information Integrated System, which helps librarians and users to manage budgets, inventory books in stacks, purchase books, and search for, loan, and return books [1]. Regarding (1) and (3), the Hitachi Central Research Laboratory is developing a multimedia digital library prototype system [2] connected to the World Wide Web. The word cornucopia, which means "a goat's horn overflowing with fruit, flowers, and corn," is associated with the prototype system in the sense of "abundance of knowledge." This prototype system uses the Hitachi Integrated Document Information System for multimedia document management with multiview classification, full-text search [3], and a virtual personal library [4]. The virtual personal library supports the "information activities" of users, i.e., searching, retrieving, browsing, annotating, editing, storing, circulating, and so on.

In this paper we describe our concept of the digital library from the user's viewpoint and the three-layer model of information distribution. We next discuss the spiral development of information and describe our worldwide community hall prototype.

2 Digital Library Concept from Users' Viewpoint

2.1 Worldwide Information On Demand

People generally want to be able to access information on demand from anywhere they are. A system that allows us to access databases and get information quickly through a personal computer (PC) or some other computer is called an "Information on Demand (IOD)" System. A digital library is a typical IOD system.

Some users might be satisfied with a single digital library, while other users might require information that one library alone cannot deliver. In this case, the users must navigate through other digital libraries to get the information they need.

Once a user finds the desired information, he or she keeps only the links to the information, not the information itself, to minimize PC storage requirements. The user can always recall the contents by using the links.

2.2 Information Organization

The amount of information that a digital library stores is not small, of course. However, users do not always know, and cannot always describe, exactly what they want. Information organization is thus very important. It enables a digital library to guide the users. Information should be reorganized in various ways, to meet the requirements of the users.

From the global viewpoint, there is more information, more ways to organize information, and more requirements by users. Although it is difficult to meet all the requirements for information organization by users, information organization is still very important.

2.3 Open System and Connectivity

When searching for information, many users will connect to many dissimilar digital libraries. To ensure that the information can be displayed on the users' PCs, the data describing the information must be in a standard application-independent format, such as SGML, HTML, JPEG, or MPEG. If the data is application-dependent, users must prepare not only various programs but also various versions of those programs to browse old information. Furthermore, application-dependent information may eventually become undisplayable once computer architectures and software have changed substantially.

Turning to the past, there is a huge amount of information recorded on conventional media: paper, microfilm, and so on. Putting all of this information into digital libraries is of course too expensive. A practical solution is to digitize only the catalogues and to loan and transport conventional media.

3 Three-layer Model of Information Distribution

To determine the optimal digital library, we first analyzed the information distribution process from the user's viewpoint. A diagram of our analysis is shown in Figure 1. Sections 3.1 to 3.3 describe the three layers of our model.

Figure 1: Three-layer model of information distribution from viewpoint of users.

3.1 Information Provider

The first layer of the model consists of information providers. Information providers are typically publishing houses, newspapers, museums, galleries, and so on. They produce digital contents or digitize old paper media, microfilm, and exhibits, and store them in multimedia content databases. Some information providers purchase databases from other information providers.

To make it easier for users to retrieve information, information providers organize information by attaching keywords to the contents, sequencing them, and so on, based on typical retrieval patterns. However, the methods used for attaching keywords, sequencing, and similar tasks vary among providers.

3.2 Information Broker

The second layer consists of information brokers. Rather than create databases of contents, information brokers create databases of links to contents of information providers. Yahoo [5] and Hole-In-One [6] are two such information brokers.

Because information organization varies between information providers and does not always suit user requirements, a user accessing information from more than one information provider may be inconvenienced.

An information broker collects links to contents and organizes the links according to how it believes its users want to access the information. The users accessing the broker database search for and select links, then obtain the actual contents to which the links point. An information broker thus appears to the users as a huge information provider. In other words, an information broker is a virtual library. The users no longer need to navigate many different information providers.

Information brokers can also provide various other services, for example, information filtering, database mirroring, copyright management, and collection of payment for information providers.

3.3 Information User

The third layer of the model consists of information users.

The amount of information available from providers and brokers is overwhelming and most of it does not match the needs of users. The users must therefore search for the desired information by browsing the obtained information. Because the results of this searching are not insignificant, the user stores the links to the desired information, and sequences the links according to how the information will be accessed again. As a result, the user creates his or her own virtual personal library. This activity is information organization.

4 Spiral Development of Information

To clarify the total flow of information, we analyzed the user activities.

A user may create a virtual personal library not only to obtain information, but also possibly to research, consider, plan, report, and publish. The user may annotate, make scrapbooks of, or rearrange the information. The user may write a manuscript quoting and citing the information. When the user finishes writing, the user may create new information. All of these processes constitute information organization by the user. The information obtained is personalized, and most of the new information is built on the personalized information.

Finally, the new information may be published by registering it with an information provider. It can then be searched for and obtained by other users. The new information may also be sent to the user's supervisor, colleagues, or other parties. In this case the user acts as an information broker for the recipients.

This new information may be used as the basis for another round of information creation. Information thus develops in a spiral as shown in Figure 2, and the spiral extends worldwide through digital libraries connected by networks.

Figure 2: Spiral development of information.

5 Worldwide Communication Hall Prototype

By focusing on the spiral development of information, we have developed a worldwide communication hall prototype system. The system is connected to the World Wide Web and allows users to virtually obtain, personalize, organize, and publish information. The system consists of a virtual personal library client (Webshelf) that runs on a Windows PC, an information publishing server (equivalent to a World Wide Web server on Unix), and a hypermedia archive server (Webarchive) that runs on Unix (Figure 3).

Figure 3: Prototype of worldwide communication hall system.

5.1 Virtual Personal Library: Webshelf

Webshelf uses a book-metaphor interface, showing virtual books and bookshelves on the users' displays to give the users a feeling of possessing the information they have obtained. In general, users get a feeling of possession through organization, because information itself gives no sense of reality to the user unless it is strongly connected to the user's memory of the time when the information was organized. The book-metaphor interface is valid because it supports information management by using spatial memory and an outline of the whole body of information [7] [8].

Webshelf has two types of windows: a book window and a bookshelf window. The book window, an example of which is displayed in Figure 4, enables users

  1. to browse the Hypertext Markup Language (HTML) documents obtained from the World Wide Web in a virtual book form.
  2. to annotate the documents virtually by using a label metaphor. The user can put labels anywhere on the pages of a virtual book and can easily jump to the location of any label.
  3. to make virtual scrapbooks of the documents.
  4. to make personal links in addition to the original links.

Annotations, scrapbooks, and personal links are value-added information to the original information.

Figure 4: Webshelf's book (left) and scrapbook (right) windows with annotations.

The bookshelf window, shown in Figure 5, enables users

  1. to make virtual bookshelves with alterable names and widths in the local storage of the PCs.
  2. to virtually personalize the documents by storing the URLs (Uniform Resource Locators) of the original documents and the URLs of any value-added information as book icons in the virtual bookshelves.
  3. to recall the original documents to a book window with their value-added information by clicking on the book icons.
  4. to rearrange the book icons in the virtual bookshelves by using drag and drop actions.
  5. to make virtual bookshelves in the storage area of an information publishing server in order to make the value-added information available to other users.
  6. to copy the value-added information in local storage to server storage by using drag and drop actions of book icons among the virtual bookshelves.
  7. to search the book icons by strings in names and URLs.
  8. to store the URL and alterable name of a document from, and to recall the original document to, a standard browser such as Netscape Navigator instead of a book window.

Figure 5: Webshelf's bookshelf window.
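The bookshelf operations listed above can be pictured with a small data-structure sketch. It is an illustration only, not Webshelf's implementation: the class names `BookIcon` and `Bookshelf`, their fields, and the case-insensitive matching are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BookIcon:
    """One book icon: an alterable name plus links, never the content itself."""
    name: str
    original_url: str
    value_added_url: Optional[str] = None  # link to annotations or a scrapbook


@dataclass
class Bookshelf:
    """A named shelf holding an ordered row of book icons (hypothetical model)."""
    name: str
    books: List[BookIcon] = field(default_factory=list)

    def add(self, book: BookIcon) -> None:
        self.books.append(book)

    def move(self, old_index: int, new_index: int) -> None:
        """Rearrange icons, as a drag-and-drop action would."""
        self.books.insert(new_index, self.books.pop(old_index))

    def search(self, text: str) -> List[BookIcon]:
        """Find icons whose name or URL contains the given string."""
        needle = text.lower()
        return [b for b in self.books
                if needle in b.name.lower() or needle in b.original_url.lower()]


shelf = Bookshelf("Digital libraries")
shelf.add(BookIcon("Yahoo", "http://www.yahoo.com/"))
shelf.add(BookIcon("Hitachi CRL", "http://koigakubo.hitachi.co.jp/"))
shelf.move(1, 0)  # drag the second icon to the front
```

Copying a book icon to a shelf on the publishing server would, in this model, copy only the name and the two URLs.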

5.2 Virtual Personalization of Information

Webshelf uses a pointer-storing method based on the concept of "transclusion" [9] to store the value-added information. In our interpretation, transclusion means not storing the original information but storing only the links to the original information and reacquiring the original information only when the value-added information is to be used. The hyper-structure of the value-added information in the virtual personal library is shown in Figure 6.

Figure 6: Hyper-structure of virtual personal library.

The pointer-storing method allows the users to personalize the original information virtually. It makes it easy to determine whether value-added information from different users refers to the same original information, and to synthesize the value-added information created from the same original. Furthermore, the pointer-storing method enables digital libraries to handle copyright management and to legally validate the use of information resources.
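As a rough illustration of why storing pointers makes synthesis easy, the sketch below groups annotations from several users by the URL of the original they point to. The `Annotation` record, its `offset` field, and the `synthesize` function are hypothetical names chosen for this example; the paper does not specify such a structure.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Annotation:
    user: str
    original_url: str  # pointer to the original; the content is not copied
    offset: int        # assumed: byte position of the label in the original
    note: str


def synthesize(annotations: Iterable[Annotation]) -> Dict[str, List[Annotation]]:
    """Group value-added information by the original it points to."""
    merged: Dict[str, List[Annotation]] = defaultdict(list)
    for a in annotations:
        merged[a.original_url].append(a)
    for notes in merged.values():
        notes.sort(key=lambda a: a.offset)  # present labels in document order
    return dict(merged)
```

Because every annotation carries the same pointer, comparing URLs is enough to tell whether two users annotated the same original.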

To denote value-added information, we defined an extended version of HTML in which the link element <A> has additional attributes. These include the location where the information in a scrapbook begins ("start=206," in Figure 7) and ends ("end=336," in Figure 7) [4]. These locations are represented by the number of bytes from the top of the original information in the book window.

Figure 7: Example of extended HTML (in scrapbook).
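To make the byte-range attributes concrete, here is a minimal sketch of how a reader of the extended HTML might recover a clipped passage. It assumes the attributes appear as plain `start=N end=M` on the `<A>` tag and that `end` is exclusive; the exact syntax in Figure 7 may differ, and `ClipParser` and `clip` are names invented for this example.

```python
from html.parser import HTMLParser


class ClipParser(HTMLParser):
    """Collect (href, start, end) from <A> tags carrying byte-range attributes."""

    def __init__(self):
        super().__init__()
        self.clips = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if tag == "a" and "start" in a and "end" in a:
            self.clips.append((a.get("href"), int(a["start"]), int(a["end"])))


def clip(original: bytes, start: int, end: int) -> bytes:
    """Return the clipped bytes, counting offsets from the top of the original.

    Treating `end` as exclusive is an assumption of this sketch.
    """
    return original[start:end]


# Hypothetical scrapbook entry and original document.
scrapbook = '<A HREF="http://example.org/doc.html" start=4 end=9>clip</A>'
parser = ClipParser()
parser.feed(scrapbook)
href, start, end = parser.clips[0]
passage = clip(b"abc defgh ijk", start, end)
```

A standard browser ignores the unknown attributes and simply renders the link, which matches the behavior described for Figures 8 and 9.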

These extensions do not cause errors in standard browsers, and the titles or headlines of the original documents can be displayed by a standard browser. An example of Webshelf's bookshelf data is shown in Figure 8. The links are displayed instead of the bookshelf and book icons. An example of Webshelf's scrapbook data is shown in Figure 9. The links to the clipped information are displayed instead of their contents. Even users who do not use Webshelf can browse part of the published value-added information.

Figure 8: Webshelf's bookshelf data displayed by Netscape Navigator.

Figure 9: Webshelf's scrapbook data displayed by Netscape Navigator.

5.3 Webarchive: A Solution for the Information Volatility Problem

A big problem with the information on the World Wide Web is its volatility: digital information is modified and deleted easily and frequently. This problem creates a big obstacle to using the pointer-storing method. Modification or deletion of the original information invalidates the links from the value-added information. Information in an intranet can be managed to avoid this problem, but it is impossible for all information on the World Wide Web to be managed in a similar way.

Our solution to the information volatility problem is Webarchive, a hypermedia archive server which operates as a proxy server between the user's PC and the World Wide Web servers. Webarchive stores all information obtained by the users from the World Wide Web servers.

Webarchive uses the created/updated time of the original information as an index to manage the consistency of the links (Figure 10). Because link consistency is maintained, users can easily navigate both spatially and chronologically between World Wide Web sites.

Figure 10: Link-consistency management in Webarchive.

Webarchive works not only with Webshelf, but also with standard browsers like Netscape Navigator. Webarchive automatically adds a time index to URLs as a query by using the redirection mechanism and the "Referer" of Hypertext Transfer Protocol (HTTP) [10]. The details of this process are explained below and shown in Figure 11.

When the user specifies URL "A", which has no time index, the browser sends an HTTP request to Webarchive. When Webarchive receives the request, it searches the versions stored in its cache and selects one of the time indices of "A" for a redirection response. In this case we assume the time index "t2" is selected according to some user-designated strategy, so Webarchive specifies the URL "A-t2" in the "Location" header of the response. A sample of a URL with a time index is shown in Figure 7.

After receiving the redirection response, the browser sends the same request to Webarchive, but in the new request the URL "A" is replaced by the URL "A-t2," according to the Location header. Webarchive searches and returns the content because the URL of the request has a time index.

If the content specified by URL "A-t2" has an inline image or a link to URL "B," the HTTP request for URL "B" carries a "Referer" header which specifies URL "A-t2", i.e., the starting point of the link. Because URL "B" has no time index, Webarchive searches the stored versions and selects one of the time indices of "B" for a redirection response. In this case Webarchive specifies URL "B-t1" in the "Location" header of the response, according to URL "A-t2" in the "Referer" header.

Figure 11: Link-consistency management mechanism of Webarchive.
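The redirection steps described above can be traced with a small simulation. This is a sketch under stated assumptions, not Webarchive's code: the `?t=` query form of the time index, the in-memory `ARCHIVE` table, and the selection rule (newest stored version not later than the referring page's time index) are all hypothetical choices. The paper says only that the index is added as a query and that the version is chosen according to the "Referer" header.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical archive contents: URL -> sorted list of stored time indices.
ARCHIVE: Dict[str, List[int]] = {"A": [1, 2], "B": [1, 3]}


def split_time(url: str) -> Tuple[str, Optional[int]]:
    """Split 'A?t=2' into ('A', 2); a plain 'A' yields ('A', None)."""
    base, sep, query = url.partition("?t=")
    return (base, int(query)) if sep else (base, None)


def choose_version(base: str, not_after: Optional[int]) -> int:
    """Assumed strategy: newest stored version, capped by the referrer's time."""
    times = ARCHIVE[base]
    if not_after is None:
        return times[-1]
    older = [t for t in times if t <= not_after]
    # Falling back to the oldest version is also an assumption of this sketch.
    return older[-1] if older else times[0]


def handle(url: str, referer: Optional[str] = None) -> Tuple[int, str]:
    """Return (302, Location URL) or (200, key of the stored content)."""
    base, t = split_time(url)
    if t is not None:
        return 200, f"{base}@{t}"  # time-indexed request: serve the stored copy
    ref_time = split_time(referer)[1] if referer else None
    return 302, f"{base}?t={choose_version(base, ref_time)}"


print(handle("A"))                    # redirect to the time-indexed URL of A
print(handle("A?t=2"))                # stored copy of A at t2
print(handle("B", referer="A?t=2"))   # redirect chosen from the Referer's time
```

With the sample table, the three calls reproduce the sequence in the text: "A" is redirected to its t2 version, that version is served, and the follow-up request for "B" is redirected to its t1 version because the referring page carries time index t2.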

5.4 Link Conversion: A Solution for the Stiff-necked Link Problem

When the value-added information is copied and relocated from local PC storage to an information publishing server, another problem occurs, which we call the "stiff-necked link problem" (Figure 12). If a link pointing to the local PC storage is described by an absolute address, the link still points to the local PC storage even after the information is copied to the server. We call this a "stiff-necked link." Webshelf detects stiff-necked links in the server and converts them by using a simple algorithm.

Another solution is to rewrite the link at the time of relocation. But link rewriting can invalidate other links by shifting the number of bytes in the link descriptions. If other links point to this value-added information containing rewritten links, the other links are invalidated. If the other links are rewritten, link invalidation is propagated backwards. Therefore, link rewriting must be avoided to prevent the propagation of invalid links.

Figure 12: Stiff-necked link and link conversion by Webshelf.
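The following sketch illustrates why rewriting a link at relocation time can invalidate byte-offset references, and what a conversion at read time might look like instead. The paper does not give Webshelf's conversion algorithm, so `convert` is only one plausible reading; the prefixes `LOCAL` and `SERVER` and the sample scrapbook are invented for the example.

```python
LOCAL = b"file:///C:/shelf/"              # hypothetical local-storage prefix
SERVER = b"http://server.example/shelf/"  # hypothetical publishing-server prefix

scrapbook = b'<A HREF="file:///C:/shelf/note.html">note</A> clipped text'

# Another document refers to the word "clipped" by byte offsets.
start = scrapbook.index(b"clipped")
end = start + len(b"clipped")

# Rewriting the link in place changes the length of the link description,
# so every later byte offset shifts and the stored range no longer matches.
rewritten = scrapbook.replace(LOCAL, SERVER)
shift = len(SERVER) - len(LOCAL)


def convert(href: bytes) -> bytes:
    """Map a stiff-necked local link to the server when the link is followed.

    The stored bytes stay untouched, so offsets held by other links survive.
    """
    return SERVER + href[len(LOCAL):] if href.startswith(LOCAL) else href


print(scrapbook[start:end])   # the range still selects the intended word
print(rewritten[start:end])   # the same range now selects different bytes
print(convert(b"file:///C:/shelf/note.html"))
```

Converting at read time leaves the published bytes identical to the local ones, which is why the backward propagation of invalid links described above does not arise.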

6 Discussion

6.1 Bookshelf Metaphor

We demonstrated our prototype system to about one thousand persons at our laboratory and at some exhibitions in Japan. After discussions with some observers, we have concluded that different users may prefer different GUIs. Some of the observers prefer the folder-type GUI adopted by Apple Macintosh, Microsoft Windows, and so on. Some of them, however, prefer the bookshelf metaphor. Many persons who are not already familiar with computers prefer the bookshelf metaphor GUI to the folder-type GUI.

Though the metaphor constrains the GUI design and restricts the convenience and efficiency of the view design, it is approachable for persons who are not accustomed to computers. This approachability will be very important for digital libraries because a wide variety of users will want to access them. Designing operations well, so that the metaphor reminds users of the real object, will increase the approachability of the entire system.

6.2 Book Metaphor

Page-turning of the book metaphor has two advantages over scrolling:
  1. The thickness of the virtual book helps users to assess the amount of content more easily than a scroll bar does.
  2. The human eye more easily keeps track of its place in the contents when page turning, rather than scrolling, is used.

Many HTML documents on the World Wide Web do not fit the book metaphor because the documents have complex frames, forms, and wide figures which require two-dimensional scrolling. Long and simply structured text, such as plain text and HTML text using only a few tags (<A>, <H1>, <H2>, and so on), fits the book metaphor. The amount of this kind of text on the World Wide Web is substantial.

6.3 Label Metaphor

The label metaphor accurately represents the actual activity of many persons who make memos and annotations in real books and articles. It offers high browsability. The label metaphor will be applicable to mail and work flow.

6.4 Copyright Problems

Because Webarchive stores copies of original information, copyrights may be infringed upon. A management policy such as the following is therefore needed.
  1. If the user(s) of Webarchive is/are the author(s) of the original information, copyright infringement cannot occur. This holds true for an intranet.
  2. If the user(s) of Webarchive is/are not the author(s):
    (2-1) If the authors have specifically given permission for the users to copy and distribute the information, copyright infringement is not a problem.
    (2-2) If the authors have not given permission, copyright infringement is possible. In this case Webarchive must be operated for only personal use.
    (2-3) If the authors have prohibited copying and distribution of the information, copyright infringement is possible. In this case Webarchive must be operated for only personal use, or the users of Webarchive must make a contract with the authors.
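The policy above is a small decision table, which the sketch below writes out as a function. The `Permission` enumeration, the function name, and the returned labels are hypothetical; they merely restate cases 1 and 2-1 to 2-3 and are not a description of any mechanism in Webarchive.

```python
from enum import Enum


class Permission(Enum):
    GRANTED = "granted"          # case 2-1: authors allow copying and distribution
    UNSPECIFIED = "unspecified"  # case 2-2: no permission has been given
    PROHIBITED = "prohibited"    # case 2-3: authors forbid copying and distribution


def operating_mode(user_is_author: bool, permission: Permission) -> str:
    """Return how Webarchive may be operated under the policy sketched above."""
    if user_is_author or permission is Permission.GRANTED:
        return "unrestricted"                     # cases 1 and 2-1
    if permission is Permission.UNSPECIFIED:
        return "personal use only"                # case 2-2
    return "personal use only, or by contract"    # case 2-3
```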

6.5 Storage Required

Because Webarchive stores all the information obtained by users from the World Wide Web servers, it requires a large amount of storage. We estimate the amount of required storage to be about 2 GB per user per year when the mean size of a page is 200 KB and one user browses about 20 pages a day. This estimate shows that Webarchive running on a PC is practical only for personal or small-group use.
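The arithmetic behind the estimate can be checked in a few lines. The sketch assumes browsing on every day of the year and decimal units (1 GB = 10^6 KB); neither assumption is stated in the text.

```python
MEAN_PAGE_KB = 200    # mean size of one page, from the text
PAGES_PER_DAY = 20    # pages browsed per user per day, from the text
DAYS_PER_YEAR = 365   # assumption: browsing every day of the year

kb_per_year = MEAN_PAGE_KB * PAGES_PER_DAY * DAYS_PER_YEAR
gb_per_year = kb_per_year / 1_000_000  # assumption: 1 GB = 10**6 KB

print(f"{gb_per_year:.2f} GB per user per year")  # prints "1.46 GB per user per year"
```

The raw product is roughly 1.5 GB, the same order of magnitude as the figure of about 2 GB quoted above; how that figure was rounded or what overhead it includes is not stated.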

6.6 Further Applications of Webarchive

Webarchive can be used in various applications when it is connected to a DBMS managing more than 1 TB of data. One such application is a copyright library; another is a deposit library for hypermedia publications which are available only online. Much important information is now missing from the World Wide Web [11]. Using the link-consistency management of Webarchive will allow the hyper-structure at any point in time to be recalled.

7 Conclusion

We have described our concept of a digital library from the user's viewpoint using a three-layer model of information distribution: information provider, information broker, and information user. The information flowing through these layers is organized in various ways, and is developed spirally. The users virtually personalize the obtained information and create new information on the basis of the personalized information.

Our worldwide communication hall prototype system is connected to the World Wide Web and supports the total flow of information around a user's virtual personal library. This system consists of the virtual personal library client Webshelf, an information publishing server, and the hypermedia archive server Webarchive. Webshelf allows users to virtually personalize information obtained from the World Wide Web, to organize it to meet their needs and to share the personalized and organized information.

Our prototype implementation has shown that our hypermedia archive server Webarchive is an effective solution for the information volatility problem, because it stores all the information obtained by the user, and maintains the link consistency.

The bookshelf metaphor is approachable for persons who are not accustomed to computers. This approachability will be very important for digital libraries because a wide variety of users will want to access them.

Our future research goals are archive management to control huge databases and to delete unnecessary data, refinement of the Webshelf GUI, connectivity of Webshelf to other Internet information media such as mail and news, and realization of those functions on standard browsers using Java, XML, and so on.

References

[1] Hitachi Ltd., "Library Information Integrated System -- LOOKS21 (in Japanese)," http://www.hitachi.co.jp/app/looks/

[2] Fujisawa, H., Mishina, Y., Ashizawa, M., and Kato, K., "Multimedia Digital Library Systems for the Global Information Network," Hitachi Review Vol. 44, No. 5, Oct. 1995, pp. 273-280.

[3] Kato, K., et al., "An Index-Free Full-Text Search Machine for Large Japanese Text Bases," Proc. Advanced Database System Symposium, 1989, pp. 75-82.

[4] Kikuchi, H., Mishina, Y., Ashizawa, M., Yamazaki, N., and Fujisawa, H., "User Interface for a Digital Library to Support Construction of a 'Virtual Personal Library,'" International Conference on Multimedia Computing and Systems (ICMCS96), IEEE Computer Society, Jun. 1996, Hiroshima, Japan, pp. 429-432.

[5] http://www.yahoo.com/

[6] http://www.hole-in-one.com/

[7] Okada, K., Kinoshita, K., and Matsushita, Y., "Window System with Leafing Through Mode: Book Window," Proc. of 1st Moscow Int. HCI '91, 1991, pp. 242-248.

[8] Shipman, F. M., Chaney, R. J., and Gorry, G. A., "Distributed Hypertext for Collaborative Research: The Virtual Notebook System," Proc. of Hypertext '89, 1989, pp. 129-135.

[9] Nelson, T. H., "Literary Machines," Self-Published, 1983.

[10] Berners-Lee, T., Fielding, R., and Frystyk, H., "Hypertext Transfer Protocol -- HTTP/1.0," RFC1945, May 1996.

[11] Kahle, B., "Preserving the Internet," Scientific American, March 1997.