Using Digital Libraries
as a Community Hall for
Worldwide Information Spiral Development
Minoru Ashizawa, Hideaki Kikuchi, Yusuke Mishina, Hiromichi Fujisawa,
Minoru Hidaka, Naoko Yamazaki, and Akito Sakurai
Hitachi Ltd., Central Research Laboratory
1-280, Higashi-koigakubo, Kokubunji-shi, Tokyo 185, Japan
http://koigakubo.hitachi.co.jp/
ashiza@crl.hitachi.co.jp
Abstract
This paper describes our concept of a digital library from the user's viewpoint
based on information flow and information organization, and explains it
using a three-layer model of information distribution -- information provider,
information broker, and information user.
Our worldwide communication hall prototype system is connected to the
World Wide Web and supports the total flow of information around the user.
It consists of the virtual personal library client Webshelf, an
information publishing server, and the hypermedia archive server Webarchive.
Webshelf allows users to virtually personalize information on the World
Wide Web, to organize it to meet their needs, to create new information
on the basis of the organized information, and to publish this new information.
Our prototype implementation has shown that Webarchive offers an effective
solution for the information volatility problem by storing all the information
obtained by the user and managing link consistency.
Keywords: digital library, World Wide Web, HTML, annotation,
information sharing, archive system, virtual personal library, information
volatility problem, stiff-necked link problem, link-consistency management
1 Introduction
A digital library consists of (1) computer networks, (2) a catalog database,
and (3) a multimedia database. Regarding (1) and (2), Hitachi has developed
the Library Information Integrated System which helps librarians and users
to manage budgets, inventory books in stacks, purchase books, and search,
loan and return books [1].
Regarding (1) and (3), the Hitachi Central Research Laboratory is developing
a multimedia digital library prototype system [2]
connected to the World Wide Web. The word cornucopia means
"a goat's horn overflowing with fruit, flowers, and corn," and we associate it
with the prototype system in the sense of "abundance of knowledge."
This prototype system uses the Hitachi Integrated Document Information
System for multimedia document management with multiview classification,
full-text search [3], and a
virtual personal library [4].
The virtual personal library supports the "information activities" of users,
i.e. searching, retrieving, browsing, annotating, editing, storing, circulating,
and so on.
In this paper we describe our concept of the digital library from the
user's viewpoint and the three-layer model of information distribution.
We next discuss the spiral development of information and describe our
worldwide community hall prototype.
2 Digital Library Concept from Users' Viewpoint
2.1 Worldwide Information On Demand
People generally want to be able to access information on demand from anywhere
they are. A system that allows us to access databases and get information
quickly through a personal computer (PC) or some other computer is called
an "Information on Demand (IOD)" System. A digital library is a typical
IOD system.
Some users might be satisfied with a single digital library, while other
users might require information that one library alone cannot deliver.
In this case, the users must navigate through other digital libraries to
get the information they need.
Once a user finds the desired information, he or she keeps only the
links to the information, not the information itself, to minimize PC storage
requirements. The user can always recall the contents by using the links.
2.2 Information Organization
The amount of information that a digital library stores is not small, of
course. However, users do not always know, and cannot always describe,
exactly what they want. Information organization is thus very important.
It enables a digital library to guide the users. Information should be
reorganized in various ways, to meet the requirements of the users.
From the global viewpoint, there is more information, more ways to organize
information, and more requirements by users. Although it is difficult to
meet all the requirements for information organization by users, information
organization is still very important.
2.3 Open System and Connectivity
When searching for information, many users will connect to many dissimilar
digital libraries. To ensure that the information can be displayed on the
users' PCs, the data describing the information must be in a standard application-independent
format, such as SGML, HTML, JPEG, or MPEG. If the data is application-dependent,
users must prepare not only various programs but also various versions
of the programs to browse old information. Furthermore, if the information
is application-dependent, it may eventually become undisplayable when,
in the future, the computer architectures and software become quite different.
Turning to the past, there is a huge amount of information recorded
on conventional media: paper, microfilm, and so on. Putting all of this
information into digital libraries is of course too expensive. A practical
solution is to digitize only the catalogues and to loan and transport conventional
media.
3 Three-layer Model of Information Distribution
To determine the optimal digital library, we first analyzed the information
distribution process from the user's viewpoint. A diagram of our analysis
is shown in Figure 1. Sections
3.1 to 3.3 describe the three layers
of our model.
Figure 1: Three-layer model of information
distribution from viewpoint of users.
3.1 Information Provider
The first layer of the model consists of information providers. Information
providers are typically publishing houses, newspapers, museums, galleries,
and so on. They produce digital contents or digitize old paper media, microfilm,
and exhibits, and store them in multimedia content databases. Some information
providers purchase databases from other information providers.
To make it easier for users to retrieve information, information providers
organize information by attaching keywords to the contents, sequencing
them, and so on, based on typical retrieval patterns. However, the methods
used in, for example, attaching keywords and sequencing, vary among providers.
3.2 Information Broker
The second layer consists of information brokers. Rather than create databases
of contents, information brokers create databases of links to contents
of information providers. Yahoo [5]
and Hole-In-One [6] are
two such information brokers.
Because information organization varies between information providers
and does not always suit user requirements, a user accessing information
from more than one information provider may be inconvenienced.
An information broker collects links to contents and organizes the links
according to how it believes its users want to access the information.
The users accessing the broker database search for and select links, then
obtain the actual contents which the links point to. An information broker
thus appears to the users as a huge information provider. In other words,
an information broker is a virtual library. The users no longer need to
navigate many different information providers.
Information brokers can also provide various other services, for example,
information filtering, database mirroring, copyright management, and collection
of payment for information providers.
3.3 Information User
The third layer of the model consists of information users.
The amount of information available from providers and brokers is overwhelming
and most of it does not match the needs of users. The users must therefore
search for the desired information by browsing the obtained information.
Because the results of this search are valuable, the user stores
the links to the desired information, and sequences the links according
to how the information will be accessed again. As a result, the user creates
his or her own virtual personal library. This activity is information organization.
4 Spiral Development of Information
To clarify the total flow of information, we analyzed the user activities.
A user may create a virtual personal library not only to obtain information,
but also possibly to research, consider, plan, report, and publish. The
user may annotate, make scrapbooks of, or rearrange the information. The
user may write a manuscript quoting and citing the information. When the
user finishes writing, new information has been created. All of these
processes constitute information organization by the user. The information
obtained is personalized, and most of the new information is built on the
personalized information.
Finally, the new information may be published by registering it with
an information provider. It can then be searched for and obtained by other users.
The new information may be sent to the user's supervisor, colleagues, or
other parties. In this case the user acts as an information broker for
the recipients.
This new information may be used as the basis for another round of information
creation. Information thus develops in a spiral as shown in Figure
2, and the spiral extends worldwide through digital libraries connected
by networks.
Figure 2: Spiral development of information.
5 Worldwide Communication Hall Prototype
By focusing on the spiral development of information, we have developed
a worldwide communication hall prototype system. The system is connected
to the World Wide Web and allows users to virtually obtain, personalize,
organize, and publish information. The system consists of a virtual personal
library client (Webshelf) that runs on a Windows PC, an information
publishing server (equivalent to a World Wide Web server on Unix), and
a hypermedia archive server (Webarchive) that runs on Unix (Figure
3).
Figure 3: Prototype of worldwide communication
hall system.
5.1 Virtual Personal Library: Webshelf
Webshelf uses a book-metaphor interface, showing virtual books and bookshelves
on the users' displays to give the users a feeling of possessing the information
they have obtained. In general, the users get a feeling of possession through
organization, because information itself gives no sense of reality to the
user unless it is strongly connected to the memory of the user during the
time the information was organized. The book-metaphor interface is valid
because it supports information management by using spatial memory and
an outline of the whole body of information [7] [8].
Webshelf has two types of windows: a book window and a bookshelf window.
The book window, an example of which is displayed in Figure
4, enables users
-
to browse the Hypertext Markup Language (HTML) documents obtained from
the World Wide Web in a virtual book form.
-
to annotate the documents virtually by using a label metaphor. The user
can put labels anywhere on the pages of a virtual book and can easily jump
to the location of any label.
-
to make virtual scrapbooks of the documents.
-
to make personal links in addition to the original links.
Annotations, scrapbooks, and personal links are value-added information
to the original information.
Figure 4: Webshelf's book (left) and scrapbook
(right) windows with annotations.
The bookshelf window, shown in Figure
5, enables users
-
to make virtual bookshelves with alterable names and widths in the local
storage of the PCs.
-
to virtually personalize the documents by storing the URLs (Uniform Resource
Locators) of the original documents and the URLs of any value-added information
as book icons in the virtual bookshelves.
-
to recall the original documents to a book window with their value-added
information by clicking on the book icons.
-
to rearrange the book icons in the virtual bookshelves by using drag and
drop actions.
-
to make virtual bookshelves in the storage area of an information publishing
server in order to make the value-added information available to other
users.
-
to copy the value-added information in local storage to server storage
by using drag and drop actions of book icons among the virtual bookshelves.
-
to search the book icons by strings in names and URLs.
-
to store the URL and an alterable name of a document taken from a standard
browser such as Netscape Navigator, and to recall the original document in
that browser instead of in a book window.
5.2 Virtual Personalization of Information
Webshelf uses a pointer-storing method based on the concept of "transclusion" [9]
to store the value-added information. In our interpretation, transclusion
means not storing the original information but storing only the links to
the original information and reacquiring the original information only
when the value-added information is to be used. The hyper-structure of
the value-added information in the virtual personal library is shown in Figure
6.
Figure 6: Hyper-structure of virtual personal
library.
The pointer-storing method allows the users to personalize the original
information virtually. This method makes it easy to determine whether the original
information to which many users have added value is the same, and to
synthesize the value-added information created from the same original information.
Furthermore, the pointer-storing method enables digital libraries to handle
copyright management and to legally validate the use of information resources.
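The pointer-storing method can be illustrated with a minimal Python sketch. It is not the Webshelf implementation: the names `Label`, `BookEntry`, and `merge_by_original`, the sample URL, and the in-memory stand-in for an HTTP fetch are all assumptions made for illustration. The sketch stores only the URL and the user's labels, reacquires the original when an entry is opened, and groups the value-added information of several users by the URL of the shared original.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Label:
    offset: int  # byte offset into the original document
    note: str    # the user's annotation text


@dataclass
class BookEntry:
    owner: str
    url: str  # pointer to the original; the content itself is never stored here
    labels: List[Label] = field(default_factory=list)

    def open(self, fetch: Callable[[str], bytes]) -> bytes:
        # Reacquire the original only when the value-added information is used.
        return fetch(self.url)


def merge_by_original(entries: List[BookEntry]) -> Dict[str, List[Label]]:
    # Entries that share a URL point to the same original, so their labels
    # can be combined without comparing document contents.
    merged: Dict[str, List[Label]] = defaultdict(list)
    for entry in entries:
        merged[entry.url].extend(entry.labels)
    return {url: sorted(labels, key=lambda label: label.offset)
            for url, labels in merged.items()}


# Stand-in for an HTTP fetch (hypothetical data).
ORIGINALS = {"http://example.org/paper.html": b"<H1>Title</H1> body text"}
fetch = ORIGINALS.__getitem__

a = BookEntry("user-a", "http://example.org/paper.html", [Label(15, "check this")])
b = BookEntry("user-b", "http://example.org/paper.html", [Label(4, "good title")])
merged = merge_by_original([a, b])
```

Because identity is decided by the stored pointer alone, the two users' labels end up in one list for the shared original, ordered by position.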
To denote value-added information, we defined an extended version of
HTML. The link element <A> of HTML has extended attributes. These include the
location where the information in a scrapbook begins ("start=206," in Figure
7) and ends ("end=336," in Figure 7) [4].
These locations are represented by the number of bytes from the top of
the original information in the book window.
Figure 7: Example
of extended HTML (in scrapbook).
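As a hedged illustration of how such extended links could be resolved, the Python sketch below reads `start` and `end` attributes from an <A> element and cuts the corresponding byte range out of the original. The exact markup of Figure 7 is not reproduced here; the sample scrapbook line, the URL, the small offsets, and the helper names are assumptions, and the end offset is treated as exclusive because the text does not say which convention is used.

```python
from html.parser import HTMLParser


class ClipParser(HTMLParser):
    """Collects (href, start, end) from <A> elements that carry byte offsets."""

    def __init__(self):
        super().__init__()
        self.clips = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names.
        if tag != "a":
            return
        attributes = dict(attrs)
        if {"href", "start", "end"} <= attributes.keys():
            self.clips.append(
                (attributes["href"], int(attributes["start"]), int(attributes["end"]))
            )


def extract_clip(original: bytes, start: int, end: int) -> bytes:
    # Offsets count bytes from the top of the original information.
    return original[start:end]


# Hypothetical scrapbook entry and original document.
scrapbook = '<A HREF="http://example.org/doc.html" start=6 end=11>clip</A>'
original = b"Hello world, this is the original."

parser = ClipParser()
parser.feed(scrapbook)
href, start, end = parser.clips[0]
clip = extract_clip(original, start, end)
```

A parser that does not know the extra attributes simply ignores them, which is why the link text still renders in a standard browser.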
These extensions do not cause errors in standard browsers, and the titles,
or headlines of the original documents can be displayed by the standard
browser. An example of Webshelf's bookshelf data is shown in Figure
8. The links are displayed instead of the bookshelf and book icons.
An example of Webshelf's scrapbook data is shown in Figure
9. The links to the clipped information are displayed instead of their
contents. Even the users who don't use Webshelf can browse some part of
the value-added information that is published.
Figure 8: Webshelf's bookshelf data displayed
by Netscape Navigator.
Figure 9: Webshelf's scrapbook data displayed
by Netscape Navigator.
5.3 Webarchive: A Solution for the Information
Volatility Problem
A big problem with the information on the World Wide Web is its volatility:
digital information is modified and deleted easily and frequently. This
problem creates a big obstacle to using the pointer-storing method. Modification
or deletion of the original information invalidates the links from the
value-added information. Information in an intranet can be managed to avoid
this problem, but it is impossible for all information on the World Wide
Web to be managed in a similar way.
Our solution to the information volatility problem is Webarchive,
a hypermedia archive server which operates as a proxy server between the
user's PC and the World Wide Web servers. Webarchive stores all information
obtained by the users from the World Wide Web servers.
Webarchive uses the created/updated time of the original information
as an index to manage the consistency of the links (Figure
10). Because link consistency is maintained, users can easily navigate
both spatially and chronologically between the World Wide Web sites.
Figure 10: Link-consistency management in
Webarchive.
Webarchive works not only with Webshelf, but also with standard browsers
like Netscape Navigator. Webarchive automatically adds a time index to
URLs as a query by using the redirection mechanism and the "Referer" of
Hypertext Transfer Protocol (HTTP) [10].
The details of this process are explained below and shown in Figure
11.
When the user specifies URL "A", which has no time index, the browser
sends an HTTP request to Webarchive. When Webarchive receives the request,
it searches the versions stored in cache memory and selects one of
the time indices of "A" to build a redirection response for the browser.
In this case we assume the time index "t2" is selected according to some
user-designated strategy, so Webarchive specifies a URL "A-t2" in the "Location"
header of the response. A sample of the URL with a time index is shown
in Figure 7.
After receiving the redirection response, the browser sends the same
request to Webarchive, but in the new request the URL "A" is replaced by
the URL "A-t2," according to the Location header. Webarchive searches and
returns the content because the URL of the request has a time index.
If the content specified by URL "A-t2" has an inline image or a link
to URL "B," the HTTP request of URL "B" has a "Referer" header which specifies
URL "A-t2", i.e., the starting point of the link. Because URL "B" has no time
index, Webarchive searches the stored versions and selects one of the time
indices of "B" to build a redirection response for the browser. In this
case Webarchive specifies URL "B-t1" in the Location header of the response,
according to URL "A-t2" in the "Referer" header.
Figure 11: Link-consistency management mechanism
of Webarchive.
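The request handling described in Section 5.3 can be sketched in Python as follows. This is an illustrative model, not the Webarchive code. Several details are assumptions: the query parameter name `t`, integer time indices, the default strategy of choosing the newest version, and the rule that a linked resource is resolved to its newest version that is not later than the time index found in the "Referer" header. The text only states that the choice is made "according to" the referring URL.

```python
from bisect import bisect_right
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TIME_KEY = "t"  # hypothetical name of the time-index query parameter


def split_time_index(url):
    """Return (URL without time index, time index or None)."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    times = [int(v) for k, v in query if k == TIME_KEY]
    rest = [(k, v) for k, v in query if k != TIME_KEY]
    base = urlunsplit(parts._replace(query=urlencode(rest)))
    return base, (times[0] if times else None)


def with_time_index(url, t):
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True) + [(TIME_KEY, str(t))]
    return urlunsplit(parts._replace(query=urlencode(query)))


def handle_request(archive, url, referer=None):
    """archive maps a base URL to {time index: stored content}."""
    base, t = split_time_index(url)
    stored = archive.get(base, {})
    if t is not None:
        # The URL already carries a time index: return that version directly.
        return (200, stored[t]) if t in stored else (404, None)
    if not stored:
        return 404, None
    versions = sorted(stored)
    chosen = versions[-1]  # assumed default strategy: newest version
    if referer is not None:
        _, ref_t = split_time_index(referer)
        if ref_t is not None:
            # Assumed rule: newest version not later than the referring page.
            i = bisect_right(versions, ref_t)
            chosen = versions[i - 1] if i else versions[0]
    # Redirect; the second element is the value of the Location header.
    return 302, with_time_index(base, chosen)


archive = {
    "http://example.org/A": {1: b"A at t1", 2: b"A at t2"},
    "http://example.org/B": {1: b"B at t1", 3: b"B at t3"},
}
first = handle_request(archive, "http://example.org/A")
second = handle_request(archive, first[1])
linked = handle_request(archive, "http://example.org/B", referer=first[1])
```

With this sample archive, the request for "A" is redirected to its version at time 2, and the link to "B" followed from that page is redirected to the version of "B" at time 1, mirroring the "A-t2" and "B-t1" walk-through above.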
5.4 Link Conversion: A Solution for the Stiff-necked Link
Problem
When the value-added information is copied and relocated from local PC
storage to an information publishing server, another problem occurs, which
we call the "stiff-necked link problem" (Figure
12). If a link pointing to the local PC storage is described by an
absolute address, the link still points to the local PC storage even after
the information is copied to the server. We call this a "stiff-necked link."
Webshelf detects stiff-necked links in the server and converts them by
using a simple algorithm.
Another solution is to rewrite the link at the time of relocation. But
link rewriting can invalidate other links by shifting the number of bytes
in the link descriptions. If other links point to this value-added information
containing rewritten links, the other links are invalidated. If the other
links are rewritten, link invalidation is propagated backwards. Therefore,
link rewriting must be avoided to prevent the propagation of invalid links.
Figure 12: Stiff-necked link and link conversion
by Webshelf.
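The paper does not spell out the conversion algorithm, so the following Python sketch is only one plausible reading: the stored link is left untouched and is mapped onto the server copy when it is followed. The function name `convert_link` and both root addresses are hypothetical. The last lines also show why rewriting the stored bytes is risky: the length changes, so byte offsets held by other links no longer match.

```python
def convert_link(href: str, local_root: str, server_root: str) -> str:
    """Map a link into local PC storage onto the published copy at read time."""
    if href.startswith(local_root):
        return server_root + href[len(local_root):]
    return href  # links to other locations are left as they are


# Hypothetical storage roots.
LOCAL_ROOT = "file:///C:/webshelf/"
SERVER_ROOT = "http://server.example.org/shelf/"

stored = b'<A HREF="file:///C:/webshelf/notes/clip1.html">clip</A> tail'

# Conversion: the stored bytes stay as they are, only the target is mapped.
resolved = convert_link(
    "file:///C:/webshelf/notes/clip1.html", LOCAL_ROOT, SERVER_ROOT
)

# Rewriting instead changes the document length and shifts later offsets.
rewritten = stored.replace(LOCAL_ROOT.encode(), SERVER_ROOT.encode())
shift = len(rewritten) - len(stored)
```

Here `shift` is nonzero, so any link that addressed the text after the rewritten link by byte offset would now point at the wrong bytes.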
6 Discussion
6.1 Bookshelf Metaphor
We demonstrated our prototype system to about one thousand persons at our
laboratory and at some exhibitions in Japan. After discussions with some
observers, we have concluded that different users may prefer different
GUIs. Some of the observers prefer the folder-type GUI adopted by Apple
Macintosh, Microsoft Windows, and so on. Some of them, however, prefer
the bookshelf metaphor. Many persons who are not already familiar with
computers prefer the bookshelf metaphor GUI to the folder-type GUI.
Though the metaphor constrains the GUI design and restricts the convenience
and efficiency of view design, the metaphor is approachable for persons
who are not accustomed to computers. This approachability will be very
important for digital libraries because various users will want to access
digital libraries. Designing operations well, so that the metaphor reminds
users of the real object, will increase the entire system's approachability.
6.2 Book Metaphor
Page-turning of the book metaphor has two advantages over scrolling:
-
The thickness of the virtual book helps users to assess the amount of content
more easily than a scroll bar does.
-
The human eye more easily keeps track of its place in the contents when
page turning, rather than scrolling, is used.
Many HTML documents on the World Wide Web do not fit the book metaphor
because the documents have complex frames, forms and wide figures which
require two-dimensional scrolling. Long and simply structured text, such
as plain text and HTML texts using a few tags (<A>, <H1>, <H2>,
and so on), fits the book metaphor. The amount of this kind of text on
the World Wide Web is substantial.
6.3 Label Metaphor
The label metaphor accurately represents the actual activity of many persons
who make memos and annotations in real books and articles. It offers high
browsability. The label metaphor will be applicable to mail and work flow.
6.4 Copyright Problems
Because Webarchive stores copies of original information, copyrights may
be infringed upon. A management policy such as the following is therefore
needed.
-
If the user(s) of Webarchive is/are the author(s) of the original information,
copyright infringement cannot occur. This holds true for an intranet.
-
If the user(s) of Webarchive is/are not the author(s):
(2-1) If the authors have specifically given permission for the users
to copy and distribute the information, copyright infringement is not a
problem.
(2-2) If the authors have not given permission, copyright infringement
is possible. In this case Webarchive must be operated for only personal
use.
(2-3) If the authors have prohibited copying and distribution of the
information, copyright infringement is possible. In this case Webarchive
must be operated for only personal use, or the users of Webarchive must
make a contract with the authors.
6.5 Storage Required
Because Webarchive stores all the information obtained by users from the
World Wide Web servers, Webarchive requires a large amount of storage.
We estimate the amount of required storage to be about 2 GB per user per
year when the mean length of information is 200 KB and one user browses
about 20 pages a day. This estimate shows that Webarchive running on a
PC can be practically used for only personal use or small group use.
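The arithmetic behind this estimate can be checked with a few lines of Python. The inputs are the figures given in the text. The use of 365 browsing days per year and decimal units (1 GB = 1,000,000 KB) are assumptions.

```python
mean_size_kb = 200    # mean length of one piece of information
pages_per_day = 20    # pages browsed by one user per day
days_per_year = 365   # assumption: the user browses every day

total_kb = mean_size_kb * pages_per_day * days_per_year
total_gb = total_kb / 1_000_000  # assumption: decimal gigabytes
```

The raw product comes to about 1.5 GB per user per year, which is of the same order as the rounded figure of about 2 GB quoted above.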
6.6 Further Applications of Webarchive
Webarchive can be used in various applications when it is connected to
a DBMS managing more than 1 TB of data. One such application is a copyright
library; another is a deposit library for hypermedia publications which
are available only online. Much important information is now missing from
the World Wide Web [11]. Using
the link-consistency management of Webarchive will allow the hyper-structure
of any point in time to be recalled.
7 Conclusion
We have described our concept of a digital library from the user's viewpoint
using a three-layer model of information distribution: information provider,
information broker, and information user. The information flowing through
these layers is organized in various ways, and is developed spirally. The
users virtually personalize the obtained information and create new information
on the basis of the personalized information.
Our worldwide communication hall prototype system is connected to the
World Wide Web and supports the total flow of information around a user's
virtual personal library. This system consists of the virtual personal
library client Webshelf, an information publishing server, and the
hypermedia archive server Webarchive. Webshelf allows users to virtually
personalize information obtained from the World Wide Web, to organize it
to meet their needs and to share the personalized and organized information.
Our prototype implementation has shown that our hypermedia archive server
Webarchive is an effective solution for the information volatility
problem, because it stores all the information obtained by the user, and
maintains the link consistency.
The bookshelf metaphor of Webshelf is approachable for persons who are not accustomed to computers.
This approachability will be very important for digital libraries because
various users will want to access digital libraries.
Our future research goals are archive management to control huge databases
and to delete unnecessary data, refinement of the Webshelf GUI, connectivity
of Webshelf to other Internet information media, such as mail and news,
and realization of those functions on standard browsers using Java, XML,
and so on.
References
[1] Hitachi Ltd., "Library Information Integrated
System -- LOOKS21 (in Japanese)," http://www.hitachi.co.jp/app/looks/
[2] Fujisawa, H.,
Mishina, Y., Ashizawa, M., and Kato, K., "Multimedia Digital Library Systems
for the Global Information Network," Hitachi Review Vol. 44, No. 5, Oct,
1995, pp. 273-280.
[3] Kato, K., et al., "An
Index-Free Full-Text Search Machine for Large Japanese Text Bases," Proc.
Advanced Database System Symposium, 1989, pp. 75-82.
[4] Kikuchi,
H., Mishina, Y., Ashizawa, M., Yamazaki, N., and Fujisawa, H., "User Interface
for a Digital Library to Support Construction of a 'Virtual Personal Library,'"
International Conference on Multimedia Computing and Systems (ICMCS96),
IEEE Computer Society, Jun 1996, Hiroshima, Japan, pp. 429-432.
[5] http://www.yahoo.com/
[6] http://www.hole-in-one.com/
[7] Okada, K., Kinoshita,
K., and Matsushita, Y., "Window System with Leafing Through Mode: Book
Window," Proc. of 1st Moscow Int. HCI '91, 1991, pp. 242-248.
[8] Shipman, F. M.,
Chaney, R. J., and Gorry, G. A., "Distributed Hypertext for Collaborative
Research: The Virtual Notebook System," Proc. of Hypertext '89, 1989, pp.
129-135.
[9] Nelson, T. H., "Literary
Machines," Self-Published, 1983.
[10] Berners-Lee, T., Fielding, R., and Frystyk,
H., "Hypertext Transfer Protocol
-- HTTP/1.0," RFC1945, May 1996.
[11] Kahle, B., "Preserving
the Internet," Scientific American, March 1997.