Based on the submissions to the request for proposal, we are now developing the following technologies: a system architecture that consists of a messaging architecture, agent architecture, multimedia database architecture, and application architecture; individual technologies such as digital conversion of documents, an intellectual information retrieval agent, a selective information distribution agent, concept-based text retrieval, hypermedia retrieval using 3D visualization, and content-based retrieval for video; and integration technologies such as the contents entry framework.
This paper first describes the system architecture and then the individual technologies built on it.
Contents processing technology
Technology that provides effective creation, storage, and retrieval of primary and secondary information, including digital conversion from conventional, non-digital media.
Information access technology
Technology that enables efficient access to myriad types of
information without time or location limitations.
Human-friendly, intelligent interface
User interface that brings increased intellectual productivity and an
improved, active cultural environment to diverse users.
Interoperability
Technology that makes interoperation possible in heterogeneous
environments.
Scalability
Technology that enables DL systems to handle increases in information
and users.
Open system development
Development using international and de facto standards, without loss of
performance.
Highly flexible system development
Technology that can adjust quickly to new information and related
changes to social systems.
After a preliminary study of these technologies, we issued the request for proposal publicly. The RFP covered system architecture development based on a three-layer model, individual technology development, and integration technology. The three-layer model separates the system's functions into the Presentation layer, the Function layer, and the Data layer, which brings high flexibility and extensibility. Each technology development was required to use the three-layer model and object-oriented technology.
Through evaluation of the proposals, we selected the following developments: a system architecture that consists of a messaging architecture, agent architecture, multimedia database architecture, and application system architecture; a contents entry framework that incorporates the entry functions as components; advanced contents entry technologies; advanced search and retrieval technologies; and integration technologies to combine these technologies into a DL system.
We are now developing these technologies by forming a project that consists of Hitachi, Fujitsu, NEC, IBM Japan, Toshiba, Mitsubishi Electric, Oki Electric Industry, Nihon UNISYS, and Ricoh.
This project is funded by the Ministry of International Trade and Industry (MITI) from fiscal year 1996 through fiscal year 1999.
The system architecture consists of a basic architecture and an application architecture. The basic architecture has three sub-architectures (the Messaging Architecture, the Agent Architecture, and the Database Architecture) and a reference model that enables mapping of the sub-architectures onto the basic architecture.
Within the Presentation layer, the synchronous messaging service provides an HTTP-based communication service with an optional CORBA-based service. The service within the Function layer, and between the Function and Data layers, is a CORBA-based communication service. If a CORBA message needs to go through another messaging platform, or if interoperability with a system constructed on another messaging platform is needed, a gateway (also called a proxy object) converts the protocol.
Within the Presentation layer, the asynchronous messaging service provides the mail service if no CORBA-based service is provided within the layer. When a CORBA-based service is provided, the asynchronous service provides an asynchronous extension service of CORBA. The service within the Function layer and between the Function layer and the Data layer also provides an asynchronous extension service of CORBA.
When the synchronous messaging service within the Presentation layer provides a CORBA-based service, a CORBA message can be sent from the Presentation layer to the Function layer. The WWW CORBA gateway in the Function layer makes it possible to send a pseudo CORBA message from the Presentation layer to the Function layer by translating an HTTP message from the Presentation layer into a CORBA message.
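The translation performed by the WWW CORBA gateway can be sketched as follows. This is an illustrative simplification, not the project's actual gateway: a real ORB, IDL stubs, and the Naming Service are replaced by a plain Python object and a dictionary, and the URL-to-method mapping is an assumption.

```python
# Hypothetical sketch of an HTTP-to-CORBA gateway: the gateway parses an
# HTTP request path into an object name, a method name, and arguments,
# then forwards the call to a registered servant object. A dictionary
# stands in for the CORBA Naming Service; a plain class stands in for
# an IDL-generated servant.

from urllib.parse import urlparse, parse_qs

class SearchService:                        # stand-in for a CORBA servant
    def search(self, keyword):
        return f"results for {keyword}"

REGISTRY = {"SearchService": SearchService()}   # stand-in Naming Service

def gateway(http_request_line):
    """Translate e.g. 'GET /SearchService/search?keyword=library HTTP/1.0'
    into a method invocation on the target object."""
    path = http_request_line.split()[1]
    parsed = urlparse(path)
    _, object_name, method_name = parsed.path.split("/")
    args = {k: v[0] for k, v in parse_qs(parsed.query).items()}
    servant = REGISTRY[object_name]         # locate the target object
    return getattr(servant, method_name)(**args)
```

In the real architecture, the gateway would marshal the result back into an HTTP response for the Presentation layer.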
Messaging architecture provides the following services as primitive services for DL systems: Event Notification, Lifecycle, Naming, Transaction, and Security. These services provide common functions and common interfaces to all the functional objects which make up a DL system.
A place provides the following services:
When a message has no address, the Facilitator finds a suitable receiver agent in accordance with the content and ontology specified in the message. The Facilitator also provides a function for multicasting information to interested agents. The main features of the Facilitator are:
Publishing and subscribing
A client agent requests information monitored by the Facilitator. A
server agent subsequently notifies the Facilitator of the information
corresponding to the request. The Facilitator can then notify the client
agent.
Recommending
The Facilitator can find a suitable agent and notify a message sender of
the location and identifier of the agent.
Brokering
The Facilitator can find an agent which can provide a requested service,
and can then send a message to the agent. A reply message corresponding
to the request is also transferred to the original agent.
The Facilitator uses a match-making algorithm to provide these features. Users can customize this match-making algorithm, which increases the flexibility of the message routing function.
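The recommending and brokering features above can be sketched as follows. The class and method names are assumptions for illustration; the actual Facilitator matches on message content and ontology with a customizable algorithm, which is reduced here to exact matching over advertised capabilities.

```python
# Illustrative sketch of the Facilitator's match-making. Agents
# advertise the (ontology, content) pairs they can serve; an
# address-less message is routed to an agent whose advertisement
# matches. Exact-match lookup stands in for the customizable
# match-making algorithm.

class Facilitator:
    def __init__(self):
        self.advertisements = {}        # agent id -> set of (ontology, content)

    def advertise(self, agent_id, ontology, contents):
        self.advertisements[agent_id] = {(ontology, c) for c in contents}

    def recommend(self, ontology, content):
        """Recommending: return the id of a suitable agent, or None."""
        for agent_id, caps in self.advertisements.items():
            if (ontology, content) in caps:
                return agent_id
        return None

    def broker(self, ontology, content, request, agents):
        """Brokering: forward the request to a matching agent and
        relay its reply back to the original sender."""
        agent_id = self.recommend(ontology, content)
        return None if agent_id is None else agents[agent_id](request)
```

Publishing and subscribing would add a table of standing requests that the Facilitator checks whenever a server agent supplies new information.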
An agent can move among nodes by using the Migration Service in the Agent Architecture. The Migration Service provides agent mobility by using the Lifecycle Service, Persistent Memory Service, Security Service, and the Directory Service in the Agent Architecture.
The Migration Service is implemented as a CORBA object in the Messaging Architecture and has the following two methods:
One-way model
In the one-way model, an agent can move from an original node to a
destination node; however, the destination node must be different from
the original node. The original node is the node where the agent was
first created.
Synchronous model
In the synchronous model, a child of the original agent is created when
the original agent requests migration. The child agent then moves to the
destination node to execute its own task, and then moves back to the
original node. The child agent then reports the execution results to the
original agent and terminates. Until the original agent obtains
execution results from the child agent, the original agent is blocked.
Asynchronous model
The asynchronous model is basically the same as the synchronous model;
however, in the asynchronous model the original agent is not blocked.
After execution of its own task, the child agent moves back to the
original node. The child agent then writes the execution results in
persistent memory and terminates. The original agent or another agent
can access that persistent memory and obtain the execution results.
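The difference in result handling between the synchronous and asynchronous models can be illustrated with a toy simulation. This is not the Migration Service API: nodes are plain strings, task execution is a function call, and persistent memory is a dictionary.

```python
# Toy illustration of result handling in the migration models.
# synchronous_migrate: the parent waits for (is blocked on) the
# child's result. asynchronous_migrate: the child writes its result
# to persistent memory, so the parent (or another agent) can read it
# later without blocking.

PERSISTENT_MEMORY = {}                     # stand-in Persistent Memory Service

def run_task_at(node, task):
    """Child agent executes its task at the destination node."""
    return f"{task}@{node}"

def synchronous_migrate(task, destination):
    # Parent is blocked until the child moves back and reports.
    return run_task_at(destination, task)

def asynchronous_migrate(task, destination, key):
    # Child writes the execution results to persistent memory and
    # terminates; the parent is not blocked.
    PERSISTENT_MEMORY[key] = run_task_at(destination, task)

def read_result(key):
    """Parent or another agent obtains the results later."""
    return PERSISTENT_MEMORY.get(key)
```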
The Mobile Agent Facility consists of the following layers:
This extension enables users to integrate their own data types, which can have methods such as new index searches that can handle new media content. To implement the plug-in extension to the DB, we are developing the following interfaces and plug-ins.
The SQL3 Abstract Data Type (ADT) capability that we add to the MMDB provides definition, construction, inheritance, implied observer and mutator functions, encapsulation, polymorphism, and references. We believe that these features are necessary for managing SGML structured documents, image data, voice data, and so on.
Because conventional SQL procedures lack these functions, they encounter difficulty when we want to implement user-defined indexes (access methods) with high performance and high reliability. A plug-in function solves this problem and makes it possible for DBMS users to incorporate their own methods, such as an index, into the database kernel.
To implement new indexes, we also provide new entry points to maintain, roll back, and recover an index. Since these entry points cannot be expressed conveniently in SQL3 ADT, they are defined using a new IDL (Interface Definition Language) that we developed. The IDL can also define the details of interfaces used for calling from the database kernel to plug-in modules, and for generating stub modules and C language headers, as in CORBA IDL. To access internal database resources such as records, pages, BLOBs, and database journals without overhead or security risks, we are planning low-level access interfaces from the plug-in modules to the database kernel.
The SGML plug-in converts SGML texts into internal tree-structured data using their DTD (Document Type Definition), stores the data in a BLOB, and calls an n-gram plug-in to maintain the index. The new Contains ADT function can execute a full text search using the n-gram plug-in.
Using structural queries, the n-gram Japanese text search index plug-in can quickly search not only plain Japanese text but also SGML text.
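The core of such an n-gram index can be sketched as follows. This is a minimal illustration under assumed simplifications: character bigrams only, an in-memory posting-list dictionary, and no SGML structure handling.

```python
# Minimal sketch of an n-gram (here, character bigram) full-text
# index of the kind the n-gram plug-in maintains. Because Japanese
# text has no word delimiters, it is indexed by overlapping character
# bigrams; a query is answered by intersecting the posting lists of
# its bigrams and then verifying each candidate document.

from collections import defaultdict

def bigrams(text):
    return [text[i:i + 2] for i in range(len(text) - 1)]

def build_index(docs):
    index = defaultdict(set)          # bigram -> ids of docs containing it
    for doc_id, text in docs.items():
        for gram in bigrams(text):
            index[gram].add(doc_id)
    return index

def search(index, docs, query):
    grams = bigrams(query)
    if not grams:                     # 1-character query: fall back to a scan
        return sorted(d for d, t in docs.items() if query in t)
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    return sorted(d for d in candidates if query in docs[d])  # verify
```

A structural query over SGML text would additionally restrict matches to positions inside the requested element of the stored document tree.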
To solve the scalability problem, our MMDB can be used as a set of shared-nothing, parallel database servers.
DL middleware has facilities for APIs, document registration, document retrieval, document version control, and compound document management.
The functional specifications of the middleware are based on the proposed DMA (Document Management Alliance) model. The major part of our specifications comes from the DMA specifications; however, a structured document management facility and extended document retrieval facilities have been added. Through this extension, the DL middleware will be able to handle SGML documents.
The DMA specification does not discuss sophisticated retrieval facilities such as ranking, proximity, Z39.50 queries, or structure-specific retrieval. Also, practical usage will require merging ranked retrieval results from heterogeneous information sources. STARTS (Stanford Protocol Proposal for Internet Retrieval and Search) is one solution to these problems.
In addition to the basic SGML handling function, these features will be incorporated into the application system architecture.
Some additional problems relating to access of information sources are:
Uniform interface
A uniform interface absorbs the differences among access methods and
provides unified operation on various kinds of information sources.
Templates for access command scripts for each information source are
held in the system. Because a conditional command sequence can be coded
in the script, even an interactive retrieval can be executed
autonomously. For an entered query, a template appropriate for the query
is selected and keywords in the query are substituted for variables in
the template.
Automatic selection of source to be accessed
Appropriate information sources are selected automatically according to
the user's query and network conditions. A knowledge database is formed,
which includes the specialty, service hours, and expected response time
of each source.
Merging and unifying search results
Individual search results from various kinds of servers are modified to
a uniform format, merged, and made less redundant.
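The merging step above can be sketched as follows. The record fields and the per-source max-score normalization are assumptions for illustration; the actual system's uniform format and redundancy removal may differ.

```python
# Sketch of merging ranked results from heterogeneous servers: each
# server's scores are normalized to [0, 1] (scores from different
# engines are not directly comparable), records are mapped to a
# uniform (id, title, score) format, and a document returned by
# several servers keeps its best normalized score.

def merge_results(result_sets):
    merged = {}                            # doc id -> (title, best score)
    for results in result_sets:            # one list per information source
        if not results:
            continue
        top = max(r["score"] for r in results) or 1.0
        for r in results:
            norm = r["score"] / top        # per-source normalization
            doc_id = r["id"]
            if doc_id not in merged or norm > merged[doc_id][1]:
                merged[doc_id] = (r["title"], norm)
    return sorted(
        ({"id": i, "title": t, "score": s} for i, (t, s) in merged.items()),
        key=lambda r: -r["score"])
```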
A thesaurus, which plays an essential role in coping with the vocabulary problem in text retrieval, is automatically generated from a text corpus. This automation will drastically reduce the cost of creating and maintaining a corpus-specific thesaurus.
The thesaurus browser is used as the front end for a text retrieval engine. It helps users navigate in the concept space of the subject field of the corpus. Therefore, they can easily articulate their information needs, and choose appropriate terms for retrieval.
The document clustering module post-processes the results of text retrieval. It extracts clusters of similar documents from the set of retrieved documents, and shows a digest of each cluster to users. The users can thereby efficiently judge the relevance of the retrieved documents.
Automatic thesaurus generation is the most essential issue of this research. We are integrating technologies for the following: extracting terms (including compound words), performing constituent analysis of compound words, acquiring co-occurring data, and analyzing term correlations.
In order to extract terms precisely from a corpus, statistical processing is combined with morphological analysis. Co-occurrence data are the most important clue for extracting relations between terms. We extract several kinds of co-occurrence data, including co-occurrence in sentences, co-occurrence in windows, and syntactic co-occurrence. Correlations between terms are analyzed based on these co-occurrence data. We calculate not only first-order correlations (like mutual information) but also second-order correlations (like contextual similarity). Thus we can extract various types of relations between terms: for example, synonym relations, broader term-narrower term relations, and predicate-argument relations.
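The two correlation levels can be made concrete with a small sketch: first-order correlation as pointwise mutual information over co-occurrence counts, and second-order correlation as the cosine of two terms' co-occurrence vectors. The exact measures used in the project may differ; these are the standard textbook forms.

```python
# First-order correlation: pointwise mutual information between two
# terms, computed from their individual and joint occurrence counts.
# Second-order correlation: contextual similarity, i.e. the cosine of
# the terms' co-occurrence vectors (term -> co-occurrence weight).

import math

def pmi(pair_count, count_a, count_b, total):
    """log P(a,b) / (P(a) P(b)); zero means independence."""
    p_ab = pair_count / total
    return math.log(p_ab / ((count_a / total) * (count_b / total)))

def contextual_similarity(vec_a, vec_b):
    """Cosine similarity of two co-occurrence vectors."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm = (math.sqrt(sum(v * v for v in vec_a.values())) *
            math.sqrt(sum(v * v for v in vec_b.values())))
    return dot / norm if norm else 0.0
```

Two terms that rarely co-occur directly but share similar co-occurrence vectors (high contextual similarity) are good synonym candidates.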
Having developed a prototype thesaurus generator and browser, we are evaluating it with a large newspaper corpus. Both the quality of the thesauruses and the computing efficiency are to be improved. The document clustering module has just been designed.
Our experience with 3D hypermedia technologies, which can relate each medium to the others in a 3D VR space constructed from various media such as CG, video, images, and text, shows that a user can use visual information to search documents effectively. On the other hand, it is reported that it is hard to select a target exactly when complicated 3D models overlap each other. We are therefore developing technology that makes it possible to define the relevance of meaning for anchoring and linking in 3D VR space. If the user roughly points out a relevant area, the system can analyze the user's search goals, the relevance of meaning, and the relevance among links in 3D space, and then access the information most relevant to the user's goals.
We are also developing a technology that enables a user to make any 3D shape anywhere in the 3D VR space. Even persons who do not know how to construct a simple 3D model will be able to make such a shape. This technology allows the user to construct a 3D anchor not only for a 3D object itself, but also for a 3D area that contains several overlapping objects or an area unrelated to any 3D object. The user will be able to search information easily by using these flexible 3D anchors.
Many conventional content-based retrieval methods such as QBIC and Jacob use features of the overall image or frame. The information about features used for retrieval is a mixture of object features and background features. Thus, with conventional methods, it is difficult to retrieve video shots that contain desired objects because the background affects the information about features.
Our content-based retrieval method for video data uses the features of each object in the video (MPEG-2). We focused our attention on moving objects in the video to simplify the object region detection. Our method uses colors, color location, and the motion direction of the object as the information about features. The information is automatically extracted from the raw data.
The information about features is calculated as follows:
The moving object region consists of several regions, each with a unique color. The object region is segmented, and the representative color, area, and centroid location of each segment are used as the object feature information.
The user submits a query through a GUI: the user selects a color from the color palette and then places colored rectangles on a color-pattern input subwindow. The motion direction input subwindow is used to specify the direction of the object. The search program uses the color values, locations, and motion direction to calculate the similarity between the specified color pattern and the data in the database. Video data with higher similarity is displayed as the result.
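The similarity computation can be sketched as follows. The weights, the distance measures, and the hard motion-direction filter are illustrative assumptions, not the project's actual formula; each query region is paired with its nearest stored region in combined color-plus-location distance.

```python
# Hedged sketch of object-feature similarity for video retrieval.
# An object's features: a set of (representative color, centroid)
# regions plus a motion direction. A query's color pattern is scored
# by averaging, over its regions, the distance to the closest stored
# region; the score is mapped so that higher means more similar.

import math

def region_distance(q, r):
    color = math.dist(q["color"], r["color"])       # RGB distance
    loc = math.dist(q["centroid"], r["centroid"])   # normalized coords
    return color / 441.7 + loc                      # 441.7 ~ max RGB distance

def similarity(query, obj):
    if query["direction"] != obj["direction"]:      # motion direction filter
        return 0.0
    d = sum(min(region_distance(q, r) for r in obj["regions"])
            for q in query["regions"]) / len(query["regions"])
    return 1.0 / (1.0 + d)                          # in (0, 1], 1 = identical
```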
This system is based on social information recommendation technology. In social information recommendation, we recommend information that other, similar users have liked. The social approach has the merit that no analysis of the information is needed. A demerit is that new information that nobody has rated cannot be recommended. To avoid this demerit, we devised a new method that combines content-based recommendation with the social recommendation approach.
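One way to realize such a combination is sketched below. The linear blend, the term-overlap content score, and all function names are assumptions for illustration, not the project's actual method; the key property shown is the fallback to content-based scoring for items nobody has rated.

```python
# Sketch of combining social (collaborative) and content-based
# recommendation. Social score: mean rating of the item among users
# similar to the target user. Content score: overlap between the
# item's terms and the user's profile terms. A new, unrated item
# falls back to the content score alone, avoiding the cold-start
# demerit of the purely social approach.

def social_score(item, ratings, similar_users):
    votes = [ratings[u][item] for u in similar_users
             if item in ratings.get(u, {})]
    return sum(votes) / len(votes) if votes else None

def content_score(item_terms, profile_terms):
    overlap = len(set(item_terms) & set(profile_terms))
    return overlap / len(set(item_terms)) if item_terms else 0.0

def recommend_score(item, ratings, similar_users,
                    item_terms, profile_terms, alpha=0.5):
    social = social_score(item, ratings, similar_users)
    content = content_score(item_terms, profile_terms)
    if social is None:                 # nobody has rated this item yet
        return content
    return alpha * social + (1 - alpha) * content
```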
This system includes two kinds of agents: distribution agents and information server agents. When an author releases information, a distribution agent selects appropriate information servers and registers the information based on its content. An information server agent selects readers who are likely to be interested in the information. The distribution agent can automatically adapt to changes in users' interests.
This system has the following merits:
A document can be represented as a set of various types of data whose structural level might vary as digitization progresses. The most fundamental level would be a combination of bibliographic information and information about the physical location of a document, perhaps with page images scanned in by a scanner. As the table of contents, the abstract, the full text, and images for figures and pictures are added incrementally, the contents of the digital document become richer and richer. The final structural level will be decided document by document.
Retroactive contents entry will have to be performed for a great variety of documents printed in many different layout and font styles. Some of these documents might be written in old languages or printed with old characters, such as old Japanese (Kanji) characters. In addition, the fields of digitization interest might vary from literature and history, to science and technology, or even to law and legislation.
To improve the performance of such retroactive contents entry, it is necessary to cope with many different layouts and many font styles, recognize both current and old Japanese characters, maintain many post-processing dictionaries, etc. Also, although the basic functions are common, the user interface might have to be tuned on a case-by-case basis, depending on the types of target documents or the operational environment.
To satisfy the above requirements, we must have a framework that can integrate various component technologies as open, highly flexible parts. Therefore, a new OCR (ICR) framework has been designed to achieve prioritized, step-by-step content entry.
The framework will provide the flexibility to combine various processing objects depending on needs, to define and adopt standardized common protocols that are used to record the exchange of interim data among processing objects, and so on. The framework research is based on the design of an OCR (ICR) system which implements hierarchical object management and common message protocols among processing objects. Our research is intended to define common specifications such as for inter-object messages in the Function layer, the interim data models in the Data layer, the verification and correction interface in the Presentation layer, and batch script definition. The specifications are to be implemented in the DL system architecture. The final goal is to establish a highly-flexible framework called the "Contents Entry Framework" which will cover all tasks concerning the contents entry.
When we started the project, our policy was to adopt international standards and de facto standards as much as possible, and to extend the standards where their functions are insufficient.
According to this guideline, we have added the CORBA/WWW gateway function to the CORBA standard and extended DMA and SQL to handle SGML documents.
In fields where no standards exist, we are planning to present our technologies to the standards bodies. For example, we have already submitted our agent architecture to FIPA (The Foundation for Intelligent Physical Agents).
Our technology development does not cover all the areas that a future digital library needs. We expect the missing areas to be covered by other projects. For example, we could use the payment mechanism being developed in the Electronic Commerce project.
The integration technology that we are developing will make such combinations possible.
We are also putting significant research effort into areas other than those described here: for example, property right protection mechanisms, SGML automatic conversion methods, filtering systems, and component integration technology. A subsequent paper will cover these activities.