Digital Library System at Kyoto University

Sadao Kurohashi Makoto Nagao

Department of Electronics and Communication, Kyoto University
Yoshida-Honmachi, Sakyo-ku, Kyoto, 606, Japan
E-mail: {kuro,nagao}@kuee.kyoto-u.ac.jp

Abstract

This paper describes a digital library system at Kyoto University. Construction of the system started in April 1997, with the first version of the library system scheduled to open in January 1998. The key aims of this library system are to realize two facilities: 1) an Encyclopedia of Kyoto University and 2) a digital publishing support system. The library also provides several useful support functions for retrieval and electronic reading. The paper also reports a preliminary experiment with a retrieval system that utilizes the tables of contents of books.

Keywords:

Digital Library, Encyclopedia of Kyoto University, Retrieval utilizing Tables of Contents.

1 Introduction

One of the authors of this paper presided over the Digital Library Research Group from 1990 to 1997. This research group sought a framework for an ideal digital library, rather than a compromised framework that could easily be realized with present-day technologies. As a result, we developed a prototype digital library system, Ariadne [1], in 1994, in which we realized several useful support functions for retrieval and electronic reading.
Including Ariadne, many digital library research projects and prototype systems have been published since the early 1990s, and digital libraries have now entered a practical, large-scale experimental phase.
Since fiscal year 1997, the Ministry of Education has granted Kyoto University Library a regular budget for digitalizing library activities, including digital library functions. Within this budget, we are trying to realize an ideal digital library system incorporating several of the functions developed in Ariadne. This paper describes several aspects of the digital library system at Kyoto University.

2 Key Concepts

The key aims of this library system are to realize the following two facilities (Figure 1):
  1. Encyclopedia of Kyoto University,
  2. Digital publishing support system.
The encyclopedia aims to answer any question about Kyoto University and its academic and research activities. It covers both the historical materials held at Kyoto University and the reports of advanced research activities carried out at Kyoto University. These reports must be added and updated day by day. The second facility of the system, the digital publishing support system, helps with producing such reports.

3 Contents

The current contents and future plans of the digital library are summarized in Table 1. Kyoto University Library currently houses 5.5 million books, of which 0.7 million have been cataloged in the OPAC (Online Public Access Catalog; a bibliographic database). About 90 thousand new books are acquired every year. By adding them and by cataloging some older books, the library plans to increase the OPAC to one million entries by the year 2000. Under this budget, the library plans to digitalize not only bibliographic data but also the tables of contents of the books. So far, however, only 1,000 tables of contents of scientific books have been digitalized, in order to test the retrieval method utilizing tables of contents (see Section 5).

Table 1: The contents of Kyoto University Digital Library.

By the Library
  Present contents:
    OPAC (700,000 entries)
    Tables of contents (1,000 scientific books)
    National treasure: ``Konjaku Monogatari'' (1 item, 9 volumes)
    Important cultural properties (4 items, 53 volumes)
    Treasure books: ``Okuni Kabuki'' (1 volume)
    Special Collection in the Meiji Restoration (3,000 volumes)
    Catalog of the ``Zokyo Syoin Collection'' (7,600 volumes)
  Future plans:
    OPAC (1,000,000 entries by the year 2000)
    Tables of contents (monographs and series)
    Centennial History of Kyoto University (7 written volumes and 1 volume of photographs)
    Dissertation abstracts (700 titles a year)
    Important cultural properties (35 items, 117 volumes)
    Treasure books, archival materials, maps, etc.

By Faculties, Schools & Institutes
    Summaries of research activities of the Graduate School of Engineering
    Textbooks
    Project reports supported by Grant-in-Aid for Scientific Research
    Bulletins
    Electronic journals (EES etc.)
    University publications

As for the historical materials, the library has already digitalized several over the years, including a special collection on the Meiji Restoration (3,000 volumes), a treasure book ``Okuni Kabuki,'' a national treasure ``Konjaku Monogatari,'' and others. These historical materials have been made available via the WWW home page of Kyoto University Library (http://www.kulib.kyoto-u.ac.jp). The library is planning to digitalize other important cultural properties (170 volumes in total), and many treasure books, pictures and maps. The library also plans to digitalize the ``Centennial History of Kyoto University,'' which consists of seven written volumes and a volume of photographs. It will form one important body of the Encyclopedia of Kyoto University.
As for the advanced research reports, the core consists of the dissertation abstracts of Kyoto University (about 700 titles a year), project reports supported by Grant-in-Aid for Scientific Research, and bulletins. Digitalization of such documents can be supported by the digital publishing support system, both in software (DTP software) and in hardware (OCR etc.).

4 System Overview

The practical design of the digital library started in April 1997. After careful design of the overall architecture, we are currently finalizing the software construction in cooperation with Fujitsu Limited, with the opening scheduled for January 1998.
An overview of the system is shown in Figure 2. The digital library system not only handles digital contents, but also supports conventional library administrative activities. The system contains several server workstations for purchasing/cataloging administration, loan administration, database management, retrieval management, client management and advanced research. It also contains about 250 client PCs for retrieval, as well as printers and OCR devices.
All of this hardware is connected through KUINS (Kyoto University Integrated Information Network System) to almost all workstations and PCs at Kyoto University, so that faculty members and students can access the library from their own labs and from several computer rooms.
KUINS is in turn connected to the Internet, so that all the contents of the library can be accessed at any time by anybody from anywhere in the world.

5 Advanced Aspects

Ariadne, our prototype of an ideal digital library system, provides the following functions:
  1. Support functions for electronic reading:
    a. concurrent reading of more than one document,
    b. dictionary look-up while reading,
    c. digital insertion of memos and markings,
    d. automatic translation,
    e. automatic text reading, and others.

  2. Support functions for retrieval:
    a. hypertext retrieval,
    b. retrieval by bibliographic data,
    c. retrieval by full text, and
    d. retrieval using tables of contents.
Functions 1-a through 2-c can be realized to some extent with present-day technologies, and they are to be incorporated into the digital library system of Kyoto University.
The remaining function, retrieval using tables of contents (RTOCs), is currently being investigated for experimental integration into the digital library in the near future.
The idea of RTOCs is to search for relevant books by matching the user's input words against the TOCs of the books. The TOCs of a book usually represent its contents appropriately, so they can be used as a set of keywords for the book. Furthermore, by exploiting the hierarchical structure of TOCs, the method can become superior to conventional AND retrieval over a set of keywords. In the hierarchical structure (tree structure) of TOCs (title, chapter titles, section titles, and so on), we can say that the closer two words are, the stronger their relation. That is, the strength of the relationship between two words can be ranked according to their locations in the TOCs text as follows (Figure 3):
A = B : both A and B are in the same title,

A > B : A appears in a higher level of the TOCs tree than B,

A < B : B appears in a higher level of the TOCs tree than A,

A || B : A and B share the same immediate upper title,

A # B : other than the above four relations.
If this ranking holds, it can be used to improve AND retrieval performance. Given two words, A and B, as user input, the system first retrieves all books whose TOCs contain both A and B (conventional AND retrieval). It can then rank the output according to the above order of A and B's locations in the TOCs.
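The ranking procedure described above can be sketched as follows. This is only a minimal illustration, not the system used in the experiment: the `TocNode` class, the function names and the sample books are all hypothetical. It also makes three assumptions that the text does not state: "higher level" is read as "in an ancestor title", a word that occurs in several titles is given its strongest relation, and titles are split into words on whitespace (Japanese titles would need a morphological analyzer instead).

```python
from dataclasses import dataclass, field
from itertools import product

# Relations in the order given in the text, strongest first.
RANK = ["=", ">", "<", "||", "#"]


@dataclass
class TocNode:
    """One title in a TOC tree (book title, chapter title, section title, ...)."""
    title: str
    children: list = field(default_factory=list)


def index_titles(root):
    """Return (path, words) for every title; a path is a tuple of child indexes."""
    entries = []

    def walk(node, path):
        entries.append((path, set(node.title.lower().split())))
        for i, child in enumerate(node.children):
            walk(child, path + (i,))

    walk(root, ())
    return entries


def relation(path_a, path_b):
    """Classify the relation between the titles containing A and B."""
    if path_a == path_b:
        return "="                      # both words in the same title
    if path_b[:len(path_a)] == path_a:
        return ">"                      # A's title is an ancestor of B's (assumption)
    if path_a[:len(path_b)] == path_b:
        return "<"                      # B's title is an ancestor of A's (assumption)
    if path_a[:-1] == path_b[:-1]:
        return "||"                     # same immediate upper title
    return "#"                          # anything else


def best_relation(root, a, b):
    """Strongest relation between words a and b in one TOC, or None if either is absent."""
    entries = index_titles(root)
    paths_a = [p for p, words in entries if a.lower() in words]
    paths_b = [p for p, words in entries if b.lower() in words]
    if not paths_a or not paths_b:
        return None                     # fails conventional AND retrieval
    pairs = product(paths_a, paths_b)
    return min((relation(pa, pb) for pa, pb in pairs), key=RANK.index)


def rank_books(books, a, b):
    """AND retrieval followed by ranking on the TOC relation of a and b."""
    hits = [(best_relation(toc, a, b), name) for name, toc in books.items()]
    hits = [(rel, name) for rel, name in hits if rel is not None]
    return sorted(hits, key=lambda hit: RANK.index(hit[0]))


# Hypothetical sample data.
books = {
    "book1": TocNode("Information Retrieval",
                     [TocNode("Indexing"), TocNode("Query Processing")]),
    "book2": TocNode("Databases", [TocNode("Query and retrieval languages")]),
    "book3": TocNode("Libraries",
                     [TocNode("Catalog", [TocNode("Retrieval")]),
                      TocNode("Services", [TocNode("Query handling")])]),
    "book4": TocNode("Search",
                     [TocNode("Retrieval models"), TocNode("Query logs")]),
    "book5": TocNode("Networks", [TocNode("Routing")]),
}
ranked = rank_books(books, "retrieval", "query")
print(ranked)
```

In this sample, book5 contains neither word and is dropped by the AND step; the remaining four books are ordered by the strength of the relation between the two query words.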
A preliminary experiment was carried out using the TOCs data of 1,000 scientific books of Kyoto University Library [2]. We asked graduate students of the engineering department to use the RTOCs system: they entered two words describing what they wanted to know, and then ranked the outputs from 5 (good) to 1 (bad). The results are shown in Table 2. The table clearly indicates that the human evaluation of the outputs and the locations of the two input words in the TOCs are closely correlated.

Table 2: The experimental results of retrieval utilizing tables of contents.
(Each cell gives ind ; acc.)

                             Ranked 4 or 5                 Ranked 3, 4 or 5
Relation   # of books    Precision (%)   Recall (%)    Precision (%)   Recall (%)
           retrieved
A = B      0.7 ; 0.7     71 ; 71         26 ; 26       74 ; 74         22 ; 22
A > B      0.8 ; 1.5     60 ; 66         16 ; 42       77 ; 76         20 ; 42
A < B      0.4 ; 1.9     52 ; 62          8 ; 50       75 ; 77          9 ; 51
A || B     0.5 ; 2.4     37 ; 58          9 ; 59       52 ; 74          9 ; 60
A # B      2.7 ; 5.1     29 ; 43         41 ; 100      42 ; 59         40 ; 100

* The above results are averages over 80 queries (four subjects, each of whom submitted 20 queries).
The values under ``ind'' are the results for the individual relation listed at the left of the row; the values under ``acc'' are the accumulated results over all relations from the top (A = B) down to that relation.
Precision and recall were calculated as follows:

Precision = (# of retrieved and highly ranked books) / (# of retrieved books)

Recall = (# of retrieved and highly ranked books) / (# of highly ranked books retrieved by AND retrieval)

(We assumed that AND retrieval finds all appropriate books for a query, so the accumulated recall down to A # B is 100%.)
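As a worked illustration of the two formulas, the following sketch computes the individual (``ind'') and accumulated (``acc'') precision and recall for a single query. The function names and the sample judgments are hypothetical, and the sketch does not reproduce how the figures in Table 2 were averaged over the 80 queries, which is not detailed here.

```python
# Relations in the order used in Table 2, from the top row down.
RANK = ["=", ">", "<", "||", "#"]


def ratio(num, den):
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def precision_recall(judged, threshold=4):
    """Compute (precision, recall) per relation for one query.

    judged: (relation, score) pairs for every book returned by AND
    retrieval; a book counts as highly ranked if score >= threshold.
    """
    # Denominator of recall: all highly ranked books found by AND retrieval.
    total_good = sum(1 for _, score in judged if score >= threshold)
    rows = {}
    acc_n = acc_good = 0
    for rel in RANK:
        scores = [score for r, score in judged if r == rel]
        n = len(scores)
        good = sum(1 for score in scores if score >= threshold)
        acc_n += n
        acc_good += good
        rows[rel] = {
            "ind": (ratio(good, n), ratio(good, total_good)),
            "acc": (ratio(acc_good, acc_n), ratio(acc_good, total_good)),
        }
    return rows


# Hypothetical judgments for one query: (relation, score from 1 to 5).
judged = [("=", 5), (">", 4), (">", 2), ("#", 3), ("#", 1), ("#", 4)]
rows = precision_recall(judged)
for rel in RANK:
    print(rel, rows[rel])
```

With these sample judgments the accumulated recall in the last row (A # B) is 1.0, which mirrors the assumption that AND retrieval finds every appropriate book.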

We are currently continuing our research on RTOCs by 1) enlarging the TOCs data, 2) investigating the relationship between the ambiguity of words' roles and the hierarchical structure of TOCs, and 3) trying to disambiguate and exploit semantic relations in the title phrases (especially the very ambiguous Japanese phrase ``A no B (A of B)'').
Apart from RTOCs, we are working on several research topics in information retrieval (IR). One is to realize a fact retrieval function, which is closely related to the Encyclopedia of Kyoto University. A fact retrieval function answers queries like ``who is the president of Kyoto University?'', ``how many departments compose Kyoto University?'', or ``what's new about Kyoto University?'' When data is highly organized, as in a relational database, it is easy to answer queries (within the subjects covered by the database). The Encyclopedia of Kyoto University, however, is basically a large collection of documents and texts. To answer queries like the above, these documents have to be analyzed with natural language processing techniques, and the information in them has to be extracted and organized. This is an interesting and challenging topic in the field of natural language processing [3].
Another important topic is the user interface. In most cases, a user cannot obtain satisfactory results from the first query. The problem is how to present the results in an intuitively understandable way (especially when many records are retrieved) and how to guide users to specify their demands properly [4].
Such research does not make sense in an imaginary environment. The Kyoto University digital library provides a very good experimental ground for testing new ideas and new frameworks in IR. This environment may bring about a revolution in IR research.

6 Conclusion

To achieve a truly serviceable information handling environment, we have to manage several levels of integration and collaboration: 1) integration between conventional manual library administration and computerized administration, 2) integration between the management of physical books/documents and that of electronic books/documents, and 3) integration and collaboration among many organizations in Kyoto University: the University Library, the University Museum, the Center for Information and Multimedia Studies, the Data Processing Center, and the Graduate School of Information Science.

Acknowledgment

We would like to thank Mr. Jun Katayama and Mr. Shinpei Ogawa, Kyoto University Library, for their help.

References

[1] Makoto Nagao, ``Multimedia digital library: ARIADNE,'' in Proc. of International Symposium on Digital Libraries, 1995.
[2] Sadao Kurohashi and Makoto Nagao, ``A system for document retrieval by utilizing the table of contents information,'' IPSJ SIGFI Note 45-5, 1997.
[3] Proceedings of the Sixth Message Understanding Conference (MUC-6), Morgan Kaufmann Publishers, Inc., 1995.
[4] R. Rao, J. O. Pedersen, M. A. Hearst, J. D. Mackinlay, S. K. Card, L. Masinter, P. K. Halvorsen, and G. G. Robertson, ``Rich interaction in the digital library,'' Communications of the ACM, 38(4), 1995.