A Multimedia Document Retrieval Technique in Digital Libraries

Jongpil Yoon
Computer Science Dept
Sookmyung W. Univ.
Seoul, Korea

Sung-Hyuk Kim
Library & Info. Sci. Dept
Sookmyung W. Univ.
Seoul, Korea

Sang-Wan Han
Library & Info. Sci. Dept
Yonsei Univ
Seoul, Korea


This paper proposes a new document retrieval technique that makes it possible to query SGML/HyTime document instances. Among the many research issues in digital libraries, this paper focuses on document retrieval techniques that are partly implemented in Korea, and presents our new approach for retrieving SGML document instances efficiently from very large document databases. Our approach, called "three phase document retrieval," combines database techniques with intelligent multimedia retrieval. In the first phase, queries are constituted with metadata about a document. In the second phase, queries are constituted with any combination of the "ELEMENTs" defined in the corresponding document type definition. Finally, for sophisticated users, semantic approaches are used: queries refer to annotations or heuristics that carry the user's subjective meanings.


Multimedia Document Modeling, Content-based retrieval, Meta-data supporting, query processing, query optimization

1 Introduction

Recently, as the Internet and the World Wide Web have become popular, digital libraries have become an active research topic. As digital libraries grow larger, current information retrieval techniques cannot be used efficiently for either browsing or querying. Browsing in digital libraries is not as feasible as in conventional databases because digital documents are free-formatted and very large in volume. Querying for exact document instances is not easy because digital documents come in multiple media types. To resolve these difficulties, we propose a framework for a digital document retrieval model that employs database and intelligence techniques.

In addition, we have implemented a way of combining browsing and querying documents.

Our approach is directly motivated by the querying mechanism PESTO [CHMW96], which in turn originates from QBE [Zlo77]. Extending our previous work [YK96], we propose "three phase multimedia document retrieval" (3PR for short), which combines database techniques with intelligent multimedia retrieval. In the first phase, queries are constituted with metadata about a document. As in QBE, a query is posed by filling frame slots with the values to select. Each slot entry is a piece of metadata, by which we mean external information about document instances. Examples of metadata are document catalog information such as publisher, document type, subject, call number, and publication year. The slots presented in the first phase are filled with user-provided values only where the user needs to constrain them.
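The slot-filling idea of the first phase can be illustrated with a minimal sketch. It assumes a flat catalog record; the field names, the sample entries, and the `match_frame` helper are hypothetical and are not taken from the paper's implementation. An empty slot (None) is treated as "not asked".

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Metadata (catalog information) about one document instance."""
    title: str
    publisher: str
    doc_type: str
    subject: str
    year: int


def match_frame(entry, frame):
    """Return True if the entry agrees with every filled slot.

    Slots left as None are ignored, mirroring a QBE form in which
    the user fills in only the values of interest.
    """
    return all(
        getattr(entry, slot) == value
        for slot, value in frame.items()
        if value is not None
    )


# Hypothetical catalog contents, for illustration only.
catalog = [
    CatalogEntry("Paper A", "Publisher X", "article", "databases", 1996),
    CatalogEntry("Paper B", "Publisher X", "article", "networks", 1997),
    CatalogEntry("Report C", "Publisher Y", "report", "databases", 1997),
]

# Only the publisher and year slots are filled; the others stay empty.
frame = {"publisher": "Publisher X", "doc_type": None, "subject": None, "year": 1997}
hits = [entry.title for entry in catalog if match_frame(entry, frame)]
print(hits)
```

Treating empty slots as unconstrained keeps the form usable when the user knows only part of the catalog information.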

Then, in the second phase, queries can be constituted with any combination of the "ELEMENTs" that are defined in the corresponding document type definition (DTD). Part of the DTD for Korean technical journals (or articles) is given in a later section. Document instances are populated in the database according to the corresponding DTD. The ELEMENTs in a DTD are used to form the slots in the second phase. In the Korean technical journals DTD, ELEMENTs include abstract, chapters, sections, table or figure captions, references, etc.

Finally, a semantic approach is used for sophisticated users to ask queries. In this third phase, queries take the user's subjective interests or meanings into account. Subjective meanings include the user's annotations or heuristics, and can be represented in the IF-THEN rule format.

The contribution of this paper is a framework for a multimedia document retrieval facility that goes beyond current text-based document retrieval technology. It covers current technologies and extends them toward metadata-based, content-based, and semantic-based queries. The 3PR makes it possible for users to constitute queries over a large number of documents efficiently.

In this paper, we assume that documents, both text-based and multimedia, are structurally tagged in SGML or, more generally, HyTime, and that they are defined in an object-relational database (ORDB). Queries are constituted to retrieve SGML/HyTime instances, which may be in text, image, video, or audio form.

The remainder of this paper is organized as follows. Section 2 describes related work, especially ongoing research funded in Korea. Section 3 models a document database in the object-relational model, and Section 4 describes the 3PR query model for multimedia document databases. Finally, Section 5 presents conclusions.

2 Related Work

Traditionally, SQL features two classes of types: named built-in types with nominal type equivalence, and user-defined types with structural type equivalence. The latter applies to tables and their constituent rows [PB97]. With the advent of SQL3, user-defined types become available once they are registered in the database. Named types with nominal type equivalence are abstract data types (ADTs) and reference types, while structural type equivalence is a characteristic of named row types. The category of unnamed user-defined types with structural type equivalence is extended by support for collection types. In this paper we also use most of the features available in SQL3.

User-friendly queries have been prototyped in research database systems. Zloof developed a query language in which a query is specified as an example in a form [Zlo77]. Stonebraker and Kalash developed a browser for relational databases [SK82]. Rowe and Shoens developed a form-based query system [RS82]. Motro et al. developed a browser for relational databases [MDT88]. Carey et al. developed a form-based language combining queries and browsing in object-oriented databases [CHMW96].

This paper is closest in its browsing style to PESTO; we were heavily influenced by PESTO's browsing facilities. However, unlike PESTO, where frame slots are explicitly indicated by the system, we develop DTD based frame slots depending on the documents available in databases.

For digital library research, Clifton et al. developed a document query model that exploits a filtering mechanism [CGMB95]. Schatz et al. developed a document thesaurus for use in query processing [SJC96, SMC+96]. Our work is similar in that a document thesaurus is used; unlike theirs, we also develop a database thesaurus about database operators for similarity matching. Kobsa et al. [KNF97] and Vassileva [Vas97] describe work on adaptive hypertext and hypermedia systems that are tailored to a user's knowledge, expertise, interests and abilities.

Although many digital libraries are being constructed, only a few are in service, and only a couple of them can query structurally designed and marked-up documents. The MIRAGE system allows users to retrieve and browse through multimedia information [Mya96]. Kim and Yoon have prototyped a new information retrieval technique at Sookmyung W. University [YK96].

3 Multimedia Document Database Model

In this section, we define a general strategy for defining multimedia document databases. Introducing a DTD for multimedia documents, we use the SGML/HyTime formalism to encode the structural characteristics of documents.

3.1 Document Modeling

The DTD defines the rules for marking up a class of documents [O94a, O86, O94b]. The DTD for document type "article" is shown in Figure 1.

Each document instance contains text or multimedia data marked up according to a DTD. There are several data models for multimedia-enriched databases, ranging from the relational data model to the object-oriented data model. Within this spectrum, we propose an object-relational data model, implemented in the Illustra ORDBMS. Each datum in a multimedia document database is assumed to be a pair <data_type, data>. The data type can be integer, text, audio, image, video, etc. Each tuple can have a composite data type, whose value is another tuple, and a complex data type, whose value consists of more than one datum of the same type.

For example, an article can be defined as in Figure 2. The section attribute of article refers to one or more sections, each of which in turn contains a title and bodies. Likewise, the body attribute of section refers to one or more figures and/or paragraphs. The section attribute is of a composite type, in that its value is another tuple, and of a complex type, in that it is set-valued.
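The nesting described above (article, section, body, figure or paragraph) can be sketched as follows. This is only an illustration in Python rather than in an object-relational schema; the class and field names are assumptions chosen to follow the prose, not the actual definitions of Figure 2.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Paragraph:
    text: str


@dataclass
class Figure:
    caption: str
    media_type: str  # e.g. "Image", "Video", "Audio"
    file: str        # hypothetical location of the stored media


@dataclass
class Section:
    title: str
    # Set-valued ("complex") attribute: one or more figures and/or paragraphs.
    body: List[Union[Figure, Paragraph]] = field(default_factory=list)


@dataclass
class Article:
    title: str
    # Composite and set-valued: each value is itself a tuple (a Section).
    sections: List[Section] = field(default_factory=list)


article = Article(
    title="Autumn Hiking",
    sections=[
        Section("Mountain trails", [
            Paragraph("Trails open in October."),
            Figure("Ridge in fall", "Image", "ridge.jpg"),
        ]),
    ],
)

# Walk the nesting article -> section -> body to collect figure captions.
captions = [
    item.caption
    for section in article.sections
    for item in section.body
    if isinstance(item, Figure)
]
print(captions)
```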

3.2 Multimedia Data Modeling

Typically, multimedia data, whether image, graphics, audio, or video, are stored as a whole in a multimedia database. Storing multimedia data as a whole does not by nature allow content-based retrieval: neither a part of the content nor its meaning can be retrieved. We investigate the attributes associated with multimedia data and show how a database thesaurus serves multimedia content retrieval.

Figure 1: A HyTime DTD for Article Documents

Figure 2: An Object Relational Data Model Example

Multimedia data can be represented by meta-attributes, logical-attributes, and semantic-attributes. Meta-attributes carry information that is represented externally, without referring to any internal contents. Logical-attributes are typical database attributes that represent the internal contents of multimedia data. Semantic-attributes are user annotations about the multimedia data; they are therefore neither meta- nor logical-attributes. For example, for the book "History of Movies," the author or publisher attribute is a meta-attribute, the Oscar-awarded-movie name attribute is a logical-attribute, and the annotation attribute is a rule stating that Oscar-awarded movies are well marketed.

Since a multimedia datum may consist of one or more multimedia component objects, the three attributes defined above can be classified into six attribute types: Multimedia Meta-attributes, Object Meta-attributes, Multimedia Logical-attributes, Object Logical-attributes, Multimedia Semantic-attributes, and Object Semantic-attributes. The prefix indicates whether the attribute applies to the multimedia datum as a whole or to a component object. In Figure 2, the multimedia type is defined over one or more objects that are in turn multimedia types.

For example, logical-attributes can be defined in an object-oriented data model as in Figure 3. Semantic-attributes will be defined as triggers or rules available in object-oriented data models. Figure 4 specifies that "if dominant color of mountains is red, the season is fall."
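The quoted rule can be sketched as an ordinary IF-THEN function. This is a hedged illustration, not the trigger definition of Figure 4: the `dominant_color` helper, the use of color labels instead of pixel values, and the function names are all assumptions.

```python
from collections import Counter


def dominant_color(pixels):
    """Return the most frequent color label among the given pixels."""
    return Counter(pixels).most_common(1)[0][0]


def infer_season(object_name, pixels):
    """IF the object is a mountain AND its dominant color is red
    THEN the season is fall; otherwise the rule does not fire."""
    if object_name == "mountain" and dominant_color(pixels) == "red":
        return "fall"
    return None


# Hypothetical image content, reduced to color labels for brevity.
season = infer_season("mountain", ["red", "red", "green", "red"])
print(season)
```

In a database setting the same condition-action pair would be registered as a rule or trigger, so that the derived attribute is available to queries without being stored.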

4 Query Model: 3PR

An SQL-like database query specifies a subset of a database as a table. An SQL expression consists of three clauses: the select clause specifies what is to be retrieved; the from clause specifies the scope to be evaluated; the where clause specifies the condition to be satisfied. However, such a query is not appropriate for multimedia document databases because it is not expressive enough for multimedia data: a multimedia document is represented by three different kinds of attributes, namely the meta-attributes, logical-attributes, and semantic-attributes discussed in the previous section. Queries posed to multimedia document databases should be expressed in different ways, and their results should be returned in different ways. That is, the select clause of a multimedia query is not only text-formatted but also image-, video-, and audio-based. The from clause includes not only well-formatted tables but also format-free multimedia data sets. A condition (in the where clause) specifies not only logical comparisons but also "semantic" comparisons. Possible queries in multimedia document databases are listed in Table 1.

In order to retrieve multimedia data efficiently (or for QOS [VKvBG95]) from multimedia document databases, we extend SQL query answering techniques toward multimedia document retrieval. The new technique is based on the multimedia data attributes. As meta-attributes, logical-attributes, and semantic-attributes are used to represent multimedia data, queries are constituted with those attributes. Those queries are then classified as metadata-based queries, content-based queries, and semantic-based queries. These queries are used in order to constitute a user-interactive query. We call such an ordered sequence a 3PR query. Each of these queries is discussed in the following sections.
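One possible reading of this ordering is a pipeline in which each phase narrows the candidates left by the previous one. The sketch below assumes that reading; the function name, the flat record layout, and the three predicates are hypothetical and stand in for the metadata-based, content-based, and semantic-based conditions.

```python
def three_phase_retrieve(documents, metadata_pred, content_pred, semantic_pred):
    """Apply the three query phases in order, narrowing the candidates each time."""
    after_metadata = [d for d in documents if metadata_pred(d)]
    after_content = [d for d in after_metadata if content_pred(d)]
    after_semantic = [d for d in after_content if semantic_pred(d)]
    return after_metadata, after_content, after_semantic


# Hypothetical flat records; real instances would be nested SGML/HyTime documents.
documents = [
    {"id": 1, "year": 1997, "section_title": "Mountain trails", "annotation": "fall"},
    {"id": 2, "year": 1997, "section_title": "Mountain huts", "annotation": "winter"},
    {"id": 3, "year": 1997, "section_title": "River deltas", "annotation": "fall"},
    {"id": 4, "year": 1995, "section_title": "Mountain flora", "annotation": "fall"},
]

phase1, phase2, phase3 = three_phase_retrieve(
    documents,
    metadata_pred=lambda d: d["year"] == 1997,
    content_pred=lambda d: "mountain" in d["section_title"].lower(),
    semantic_pred=lambda d: d["annotation"] == "fall",
)
print([d["id"] for d in phase1], [d["id"] for d in phase2], [d["id"] for d in phase3])
```

Running the cheap metadata test first means the more expensive content and semantic tests see fewer candidates.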

The condition specified in the where clause is defined as a formula. A formula is recursively defined as follows: 1) an atom is a formula, 2) if $p$ is a formula, then so are $\neg p$ and $(p)$, 3) if $p_1$ and $p_2$ are formulae, then so are $p_1 \vee p_2$, $p_1 \wedge p_2$, and $p_1 \Rightarrow p_2$. An atom is (attribute $\Theta$ attribute) or (attribute $\Theta$ c), where $\Theta$ is a comparison operator, e.g., $=, \neq, >, <, \geq, \leq$, and attribute can be one of those six attributes as defined in the previous section. An attribute in object relational databases can be specified over a dot notation through which joins with other attributes become possible. For example, the expression "(article.section.body.film.type='AVD')" compares if the file type shown in the body of an article section is AVD.
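The recursive definition of a formula lends itself to a small recursive evaluator. The sketch below is illustrative only: formulas are encoded as nested tuples, only atoms of the form (attribute, operator, constant) are supported, and the dot notation is resolved against nested dictionaries with single-valued attributes. None of these encodings come from the paper.

```python
import operator

# Comparison operators allowed in an atom.
OPS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt,
       "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def resolve(record, path):
    """Follow a dotted path such as 'section.body.film.type' through nested dicts."""
    value = record
    for name in path.split("."):
        value = value[name]
    return value


def evaluate(formula, record):
    """Evaluate a formula built from atoms, negation, or, and, and implication."""
    kind = formula[0]
    if kind == "atom":
        _, path, op, constant = formula
        return OPS[op](resolve(record, path), constant)
    if kind == "not":
        return not evaluate(formula[1], record)
    if kind in ("or", "and", "implies"):
        left = evaluate(formula[1], record)
        right = evaluate(formula[2], record)
        if kind == "or":
            return left or right
        if kind == "and":
            return left and right
        return (not left) or right
    raise ValueError(f"unknown connective: {kind}")


# Hypothetical article instance with single-valued attributes.
article = {"section": {"title": "Mountain", "body": {"film": {"type": "AVD"}}}}

formula = ("and",
           ("atom", "section.body.film.type", "=", "AVD"),
           ("not", ("atom", "section.title", "=", "Sea")))
result = evaluate(formula, article)
print(result)
```

Set-valued attributes, which the data model allows, would require the path resolution to range over several values; that extension is left out here.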

4.1 First Phase: Metadata-based Queries

We propose a new paradigm for specifying a query with metadata. Metadata in this case includes external information about multimedia documents. Meta-attributes can be document type (e.g., text, audio, video, image), document size (e.g., number of bytes, number of pages, number of minutes of playback), publication year, publisher, document color, document texture, etc. Although some of these meta-attributes appear in DTDs, most should be defined by the document database designer. Metadata-based queries are constructed from the meta-attributes. From the set of meta-attributes, a query designer chooses those needed to construct a query with a selection condition.

For example, consider the request "Find all multimedia documents which are published in 1997 and contain audio files of over 10 minutes." An SQL-like query will be

Figure 3: An Object Relational Data Model for Meta-attributes

Figure 4: An Object Relational Data Model for Semantic-attributes

SELECT d.title
FROM meta-article d
WHERE d.media.type='Audio' AND
d.media.minute > 10 AND
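The request stated in English (documents published in 1997 that contain an audio file longer than 10 minutes) can be sketched in Python as follows. The record layout, the `year` field, and the assumption that a document may carry several media entries are illustrative choices and not the paper's schema.

```python
# Hypothetical metadata records; each document may contain several media items.
meta_articles = [
    {"title": "Folk Songs", "year": 1997,
     "media": [{"type": "Audio", "minute": 15}]},
    {"title": "Short Clips", "year": 1997,
     "media": [{"type": "Audio", "minute": 3}, {"type": "Image", "minute": 0}]},
    {"title": "Old Tapes", "year": 1995,
     "media": [{"type": "Audio", "minute": 40}]},
]


def has_long_audio(document, min_minutes=10):
    """True if at least one audio item runs longer than min_minutes."""
    return any(
        m["type"] == "Audio" and m["minute"] > min_minutes
        for m in document["media"]
    )


titles = [
    d["title"]
    for d in meta_articles
    if d["year"] == 1997 and has_long_audio(d)
]
print(titles)
```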

4.2 Second Phase: Content-based Queries

Content-based queries are defined over logical-attributes. As discussed earlier, the logical-attributes describe the internal content of multimedia documents. These logical-attributes appear in DTDs. From the set of logical-attributes, a query designer chooses those needed to construct a query with a selection condition.

Consider an example: "Find all multimedia documents which are written about `Mountain' in a section title and contain those mountain image files." An SQL-like query will be

SELECT d.title
FROM article d
WHERE d.section.body.film.type='Image' AND
d.section.body.title like '%mountain%'
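A hedged Python sketch of the same content-based request is given below. It assumes that sections and bodies are set-valued, that the title test is case-insensitive, and that both conditions must hold within the same section; the record layout is hypothetical.

```python
# Hypothetical article instances following the article -> section -> body nesting.
articles = [
    {"title": "Hiking Guide", "section": [
        {"title": "Mountain routes", "body": [{"film": {"type": "Image"}}]},
    ]},
    {"title": "City Walks", "section": [
        {"title": "Downtown", "body": [{"film": {"type": "Image"}}]},
    ]},
    {"title": "Mountain Essays", "section": [
        {"title": "On mountain life", "body": [{"text": "No pictures here."}]},
    ]},
]


def section_matches(section):
    """True if the section title mentions 'mountain' and its body holds an image."""
    has_image = any(
        item.get("film", {}).get("type") == "Image" for item in section["body"]
    )
    return "mountain" in section["title"].lower() and has_image


titles = [
    a["title"]
    for a in articles
    if any(section_matches(s) for s in a["section"])
]
print(titles)
```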

4.3 Third Phase: Semantic-based Queries

Semantic-based queries are defined over semantic-attributes. As discussed earlier, the semantic-attributes are annotations or heuristics supplied by users. These semantic-attributes do not appear in DTDs; they are defined using a database capability. From the set of semantic-attributes, a query designer chooses those needed to construct a query with a selection condition.

Consider an example: "Find all multimedia documents which are written about `mountain' in a section title and contain those fall mountain image files." An SQL-like query will be

SELECT *
FROM article d
WHERE d.section.body.film.type='Image' AND
d.section.body.film.object.naming='mountain' AND
d.section.body.title like '%mountain%'

In this case, note that the attribute "season" has not been defined. However, since semantic-attributes are defined over the attribute "season," the rule is activated to substitute the predicate in the above query. This procedure is so-called semantic query optimization [YK93b]. Using this optimization technique, the given query is rewritten as follows:

SELECT *
FROM article d
WHERE d.section.body.film.type='Image' AND
d.section.body.film.object.naming ='mountain' AND
d.section.body.title like '%mountain%' AND
GetImgMaxColor( d.section.body.title.object.file) ='#00FFFF'
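The substitution step can be sketched as a lookup that replaces a predicate over an undefined attribute with the predicate its rule defines. This is a minimal illustration and not the optimizer of [YK93b]: the tuple encoding, the `season` predicate as written here, and the `rewrite` and `to_sql` helpers are assumptions; the replacement predicate is copied from the rewritten query above.

```python
# Rule table: a predicate over an undefined attribute maps to the
# predicate that the semantic rule substitutes for it.
RULES = {
    ("season", "=", "fall"):
        ("GetImgMaxColor(d.section.body.title.object.file)", "=", "#00FFFF"),
}


def rewrite(predicates, rules):
    """Replace every predicate that has a rule; keep the others unchanged."""
    return [rules.get(p, p) for p in predicates]


def to_sql(predicates):
    """Render the predicate list as a single SQL-like statement."""
    where = " AND ".join(f"{lhs} {op} '{rhs}'" for lhs, op, rhs in predicates)
    return f"SELECT * FROM article d WHERE {where}"


query = [
    ("d.section.body.film.type", "=", "Image"),
    ("d.section.body.film.object.naming", "=", "mountain"),
    ("d.section.body.title", "like", "%mountain%"),
    ("season", "=", "fall"),  # hypothetical form of the undefined-attribute predicate
]

rewritten = rewrite(query, RULES)
print(to_sql(rewritten))
```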

5 Conclusion

This paper proposed 3PR, a multimedia document information retrieval language for querying multimedia document databases. The metadata-based query is used first, to ask with external information other than the document contents themselves. Metadata can naturally be modeled in a typical relational database; however, in order to support the queries of the other two phases, we employ an object-relational database. Next, the content-based query is asked with DTD information. The DTDs are defined in an object-relational database, and each HyTime document instance is populated in the database according to this definition. Lastly, the semantic-based query is used to refer to the user's annotations or heuristics. Even undefined attributes may be used in a query as long as the corresponding semantic-attributes are defined as rules or triggers in the database.

Table 1: Queries in Multimedia Document Database Systems

queries       | objective attribute  | constraint          | similarity      | semantic attribute
Structured    | formatted data       | integrity           | values          | meaning
Unstructured  | full-text data       | tagging constraint  |                 |
Image         | image, graphic data  | spatial constraint  | shapes, colors  | features
Video         | video data           | video constraint    | movement        |
Audio         | speech, audio data   | audio constraint    | pitch, tone     |

The contribution of this paper is a framework for a multimedia document retrieval facility that includes not only current technology but also metadata-based, content-based, and semantic-based query constituents. The 3PR makes it possible for users to constitute queries over a large number of documents efficiently. 3PR queries can be constituted in the form of frames whose slots users fill with appropriate search values.

This work will be used in the data mining and knowledge discovery (KDD) process for multimedia documents. Querying documents delivers a user's intention to the KDD process [YK93a, Yoo96]. We believe that the various ways of constituting a user's queries can be used to extract useful knowledge sets from multimedia document databases.


This work was partially funded by Korean Science and Engineering Foundation grant 96-0101-08-01-3, and partially by Ministry of Information and Communication grant 1996-52.


[CGMB95] C. Clifton, H. Garcia-Molina, and D. Bloom. HyperFile: A data and query model for documents. The VLDB Journal, 4(1):45-86, 1995.

[CHMW96] M. Carey, L. Haas, V. Maganty, and J. Williams. PESTO: An integrated query/browser for object databases. In Proc. Intl. Conf. on Very Large Data Bases, 1996.

[KNF97] A. Kobsa, A. Nill, and J. Fink. Hypertext and hypermedia clients of the user modeling system BGPMS. In M. Maybury, editor, Intelligent Multimedia Information Retrieval, pages 339-356. MIT Press, 1997.

[MDT88] A. Motro, A. D'Atri, and L. Tarantino. The design of KIVIEW: An object-oriented browser. In 2nd Int'l Conf. on Expert Database Systems, pages 17-32, Fairfax, 1988.

[Mya96] Sung-Hyun Myaeng. MIRAGE: A prototype for a multimedia information retrieval and gathering environment. In Proc. of the Int'l Conf. on Digital Libraries and Information Services for the 21st Century, pages 115-125, Seoul, Korea, 1996.

[O86] ISO. Information Processing - Text and Office Systems - Standard Generalized Markup Language (SGML). International Organization for Standardization, ISO 8879-1986, 1986.

[O94a] ISO. Information and Documentation - Electronic Manuscript Preparation and Markup. International Organization for Standardization, Switzerland, 1994.

[O94b] ISO. Information Technology - Hypermedia/Time-based Structuring Language (HyTime). International Organization for Standardization, ISO/IEC 10744-1992, 1994.

[PB97] P. Pistor and H. Blanken. The SQL3 server interface. In Multimedia Databases In Perspective, pages 101-116. 1997.

[RS82] L. Rowe and K. Shoens. FADS - a forms application development system. In Proc. ACM SIGMOD Intl. Conf. on Management of Data, 1982.

[SJC96] B. Schatz, E. Johnson, and P. Cochrane. Interactive term suggestion for users of digital libraries: Using subject thesauri and co-occurrence lists for information retrieval. In 1st ACM Int'l Conf. on Digital Libraries, pages 126-133, 1996.

[SK82] M. Stonebraker and J. Kalash. TIMBER - a sophisticated relational browser. In Proc. Intl. Conf. on Very Large Data Bases, 1982.

[SMC+96] B. Schatz, W. Mischo, T. Cole, J. Hardin, and A. Bishop. Federating diverse collections of scientific literature. IEEE Computer, pages 28-36, May 1996.

[Vas97] J. Vassileva. Ensuring a task-based individualized interface for hypermedia information retrieval through user modeling. In M. Maybury, editor, Intelligent Multimedia Information Retrieval, pages 357-380. MIT Press, 1997.

[VKvBG95] A. Vogel, B. Kerherve, G. von Bochmann, and J. Gecsei. Distributed multimedia and QOS: A survey. IEEE Multimedia, 2(2):10-19, 1995.

[YK93a] Jong P. Yoon and Larry Kerschberg. A framework for knowledge discovery and evolution in databases. IEEE Transactions on Knowledge and Data Engineering, 5(6):973-978, December 1993.

[YK93b] Jong P. Yoon and Larry Kerschberg. Semantic query optimization in deductive object-oriented databases. In Proc. of the Third International Conference on Deductive and Object-Oriented Databases, pages 169-182, Phoenix, Arizona, 1993.

[YK96] Jongpil Yoon and Sung-Hyuk Kim. Multimedia query processing in digital libraries. In Proc. of the Int'l Conf. on Digital Libraries and Information Services for the 21st Century, pages 88-106, Seoul, Korea, 1996.

[Yoo96] Jongpil Yoon. Extracting database knowledge from query trees. Journal of Electrical Engineering and Information Science, 1(2):145-156, 1996.

[Zlo77] M. Zloof. Query by example. IBM Systems J., 16, 1977.