Context-Dependent Information Filtering

Maria R. Lee

Research Data Network Cooperative Research Centre
CSIRO Mathematical and Information Sciences
Lock Bag 17, North Ryde, NSW 2113, Australia
E-mail: maria.lee@cmis.csiro.au, Fax: +61-2-9325-3101

Abstract

People tend to use different terms to describe a similar concept. Because people differ in background, training and experience, it is impractical to force them to use the same set of terms for information retrieval. This paper presents an approach that allows different user groups to access and view information from heterogeneous systems using their own preferred vocabularies. At the same time, the retrieval concept depends on the task context. A task ontology is used to reflect users' common perception of problem solving processes. The discovered concepts then uniquely reflect the contextual needs of distinct user groups.

Keywords: Task Ontology, Information Filtering, User Context

1 Introduction

Finding and filtering information are challenges in the area of Digital Libraries [1]. Preventing users from receiving an overload of irrelevant information is a major problem in information filtering [2]. If everyone always agreed upon what to call things, the searcher's word would be the designer's word would be the indexer's word, and what the searcher typed would be well understood by all groups. We would then be able to find precise and useful information. Unfortunately, few knowledge fields have an agreed-upon, consistent taxonomy. For example:

One concept can be represented by many names (e.g. railroad and railway).
Many concepts can be represented with the same name (e.g. a bank may be a financial bank or a river bank).
Concepts overlap (e.g. light rail and tramway).

Research has shown that two people favour the same term with probability < 0.20 [4]. This vocabulary difference has created difficulties in information retrieval.

However, it is impractical to force users to use the same term to describe a concept or an object, because different user groups often interpret information in different ways based on their own domain knowledge. Different contextual interpretations of the same piece of information often result in inherent differences among user groups. Interpretation is also governed by a user's perception of information content in a particular context [3, 11]. These differences suggest that it is not sensible to force users to use identical sets of terms for information retrieval. Instead, it is necessary to add some intelligence to the external view with which each user group is familiar.

We hypothesise that ontologies lay the foundations for coping with a variety of users' needs. An ontology is an explicit representation of concepts and their relations [5, 8]. In general, there are two types of ontologies: a general ontology and a task ontology. The general ontology consists of a taxonomy and axioms. The taxonomy represents concepts, whereas the axioms are established rules and principles for those concepts. Taking an information retrieval task as an example, the general ontology can be used to represent the content of information.

However, in information filtering it is important to relate a general activity to a specific problem solving task. Generally speaking, the deeper a computer gets involved in problem solving, the less important a general ontology becomes [6]. It is important to note that the concept of ontology depends on the task context. The task ontology is a human-friendly vocabulary with which users can easily describe their own problem solving processes [6, 9].

2 A Proposed Framework

The main goal of our proposed framework is to allow users to search for information using terms from the domains with which they are most familiar. The framework allows users to browse and extend their search using concepts that are meaningful to them. In addition, users describe their own problem solving processes by means of the task ontology.

Ikeda et al. [6] define the generic problem solving process as:

Generic Problem Solving Process =
Generic Verb + Generic Noun

We have adapted this idea for discovering context-dependent concepts. The discovered concepts are based on the retrieval concepts and task descriptions from different user groups. The retrieval concepts can be represented by generic nouns, and the task descriptions can be represented by generic verbs or phrases that reflect users' common perception of problem solving, e.g. collect information, inquire loan, etc.
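As a minimal sketch of this pairing, a request can be modelled as a record that holds a task description (the generic verb or phrase) together with a retrieval concept (the generic noun). The class name `Request`, its fields and the `normalised` helper are illustrative assumptions and are not taken from Ikeda et al. [6].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical record pairing a task description with a retrieval concept."""

    task_description: str   # generic verb or phrase, e.g. "inquire loan"
    retrieval_concept: str  # generic noun, e.g. "bank"

    def normalised(self) -> "Request":
        # Lower-case and trim both parts so that "Bank" and " bank" compare equal.
        return Request(self.task_description.strip().lower(),
                       self.retrieval_concept.strip().lower())


request = Request("Collect information", " Bank").normalised()
print(request)
```

Normalising both parts before any lookup keeps differences in case and spacing from being mistaken for differences in vocabulary.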

Figure 1 shows the proposed architecture. Users represent various groups of users, e.g. employees from the computing, finance or tourism departments. The rectangular boxes show the main components for implementing the framework. The storage boxes are databases for the retrieval concepts and task descriptions. The cloud with documents represents heterogeneous systems, e.g. the Internet or intranets. The oval shapes represent the retrieval results for a particular user group.

Different user groups retrieve information from heterogeneous systems.
Context Identifier identifies the interpretation differences among different user groups.
Retrieval concepts contain the preferred vocabularies used by each user group, e.g. thesauri.
Task Description provides a task ontology to reflect users' problem solving processes.
Context Modeller models the needs of a particular user group and retrieves context-related information. It then passes the result to the discovered concept layer.
The discovered concept layer displays the retrieved results for individual user groups.

Figure 1. A proposed architecture.
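The flow through these components might be sketched as follows. This is only an illustration under stated assumptions: the function names, the in-memory dictionary and the sample documents are hypothetical, documents are assumed to be already labelled with the concept they describe, and task descriptions are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    concept: str  # assumed label, e.g. "money bank"


# Hypothetical store of retrieval concepts: (user group, term) -> concepts.
RETRIEVAL_CONCEPTS = {
    ("finance", "bank"): {"money bank"},
    ("tourism", "bank"): {"river bank"},
}


def identify_context(user_group: str, term: str) -> set[str]:
    """Context Identifier: how this user group interprets the term."""
    return RETRIEVAL_CONCEPTS.get((user_group.lower(), term.lower()), set())


def model_context(concepts: set[str], documents: list[Document]) -> list[Document]:
    """Context Modeller: keep only documents matching the group's concepts."""
    return [doc for doc in documents if doc.concept in concepts]


# Stand-in for the heterogeneous systems shown as a cloud in Figure 1.
documents = [
    Document("Home loan rates", "money bank"),
    Document("Riverside walking tracks", "river bank"),
]

# Discovered concept layer: the filtered result shown to one user group.
discovered = model_context(identify_context("Tourism", "Bank"), documents)
print([doc.title for doc in discovered])
```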

3 Examples

In this section, three examples illustrate how different user groups use their preferred vocabularies together with task descriptions for information retrieval.

Example 1: A many-to-one relationship between concepts and name.

This example demonstrates how three user groups, a Computing user group, a Finance user group and a Tourism user group, use their preferred vocabulary to discover information from their own domain. Table 1 shows the three user groups using the same term "bank" with the same task description "collect information". Depending on the user-group (domain) context, the discovered concepts are semantically different. This occurs when the same term is interpreted differently by different users.

User-Group	Retrieval Concept	Task Description	Discovered Concept
Computing	Bank	Collect information	Information bank, Data bank
Finance	Bank	Collect information	Money bank
Tourism	Bank	Collect information	River bank

Table 1. A many-to-one relationship between concepts and name - Example 1.

Example 2: A one-to-one relationship between concept and name.

To protect users from an overload of irrelevant information, Table 2 demonstrates that the three user groups use the same term "bank" with the same task description "inquire loans" to discover the concept of "money bank". Depending on the task context, the discovered concepts do not include irrelevant information, e.g. data bank or river bank.

User-Group	Retrieval Concept	Task Description	Discovered Concept
Computing	Bank	Inquire loans	Money bank
Finance	Bank	Inquire loans	Money bank
Tourism	Bank	Inquire loans	Money bank

Table 2. A one-to-one relationship between concept and name - Example 2.
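One way to picture the behaviour in Tables 1 and 2 is as a task filter applied on top of each group's preferred senses. The sketch below is an assumption made for illustration, not the framework's actual algorithm: the names `GROUP_SENSES`, `TASK_SENSES` and `discover` are hypothetical, and the rule "a specific task overrides the group's preference" is inferred only from the two tables.

```python
# Senses of "bank" preferred by each user group (from Table 1).
GROUP_SENSES = {
    "computing": ["information bank", "data bank"],
    "finance": ["money bank"],
    "tourism": ["river bank"],
}

# Assumption: a specific task admits only some senses; tasks that are
# not listed here (e.g. "collect information") place no restriction.
TASK_SENSES = {
    "inquire loans": ["money bank"],
}


def discover(user_group: str, task: str) -> list[str]:
    """Return the discovered concepts for the term "bank"."""
    restricted = TASK_SENSES.get(task.lower())
    if restricted is not None:
        return restricted  # the task context decides (Table 2)
    return GROUP_SENSES.get(user_group.lower(), [])  # the group decides (Table 1)


print(discover("Tourism", "Collect information"))
print(discover("Tourism", "Inquire loans"))
```

With these sample entries, the general task leaves the Tourism group with its own sense of the term, while the loan task yields the same sense for every group.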

Example 3: A one-to-many relationship between concept and names.

Table 3 demonstrates that one concept may be represented by many names, e.g. synonyms. In order to retrieve the relevant information and filter out the irrelevant, the discovered concept reflects the context needed by different user groups. Suppose three user groups, Council, Police Department and State Rail Authority, are looking for information at the Transport and Regional Development Government Department. Based on their own domain knowledge, the three groups use vocabularies that are familiar to them, e.g. railroad or railway. Although the retrieval concepts differ, the task description is the same, so the discovered concepts are the same.

User-Group	Retrieval Concept	Task Description	Discovered Concept
Council	Railroad	Plan	Plan for local railroad
Police Department	Railroad	Plan	Plan for local railroad
State Rail Authority	Railway	Plan	Plan for local railroad

Table 3. A one-to-many relationship between concept and names - Example 3.
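The synonym case in Table 3 can be sketched as a thesaurus lookup followed by a combination with the task description. The mapping `THESAURUS`, the function `discover_plan` and the naming pattern for the discovered concept are assumptions chosen to reproduce the table, not a description of the implemented system.

```python
# Hypothetical thesaurus mapping each group's preferred term to one
# canonical concept; "railroad" is chosen as canonical to match Table 3.
THESAURUS = {
    "railroad": "railroad",
    "railway": "railroad",
}


def discover_plan(term: str, task: str) -> str:
    """Normalise the term, then combine it with the task description."""
    # Unknown terms fall back to themselves rather than raising an error.
    concept = THESAURUS.get(term.lower(), term.lower())
    # Assumption: the discovered concept is named "<Task> for local <concept>",
    # mirroring the wording used in Table 3.
    return f"{task.capitalize()} for local {concept}"


print(discover_plan("Railway", "plan"))
print(discover_plan("Railroad", "plan"))
```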

4 Related Work

4.1 IICA: an Ontology-based Internet Navigation System

IICA (Intelligent Information Collector and Analyser) is an Ontology-based Internet Navigation System [7]. It provides functions for gathering, categorising and reorganising information from heterogeneous information resources on the Internet. Ontologies are used to provide the common background knowledge shared by users and agents.

The major differences between our proposed framework and IICA are that we provide a task ontology to customise users' problem solving processes and that users can use their preferred vocabularies to access information. However, the construction of the task ontology is similar to that of IICA: it depends on a pre-defined set of terms for organising heterogeneous task information. The construction effort may therefore be inflexible.

5 Conclusions

Increasing use of the Internet and the World Wide Web has heightened attention to digital libraries' research and development [10, 1]. This paper has presented an information filtering service that is customised to suit the needs of distinct user groups. The proposed framework achieves this through the use of retrieval concepts and task descriptions to address the contextual differences among different user groups. The retrieval concepts store the terminology preferences of different user groups, and the task descriptions define users' problem solving processes. Rather than provide users with uniform information, the discovered concepts will reflect the contextual differences among different user groups.

We are currently developing the context identifier and context modeller components of the system and are running small pilot studies with various groups of users. Through interactive testing and design, we expect to finalise our system, perform a formal experiment and analyse the results. We expect to learn more about the technical issues involved in implementing the context modelling components, including the impact on response time and integration and maintenance issues. Our final aim is to increase recall and precision in information filtering.
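For reference, recall and precision can be computed as in the sketch below, which uses the standard definitions: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that are retrieved. The function name and the sample sets are hypothetical and do not represent experimental data.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query; empty sets score 0.0."""
    hits = len(retrieved & relevant)  # relevant items that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative only: two concepts returned, one of which the user wanted.
retrieved = {"money bank", "river bank"}
relevant = {"money bank"}
print(precision_recall(retrieved, relevant))  # (0.5, 1.0)
```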

6 Acknowledgments

The work reported in this paper has been funded in part by the Research Data Networks (RDN) Co-operative Research Centre (CRC) program, Australia.

7 References

[1] Adam, N. and Yesha, Y. (eds), Introduction, International Journal on Digital Libraries, Springer-Verlag, vol 1, 1-2, 1997.

[2] Belkin, N. and Croft, W. Information Filtering and Information Retrieval: Two Sides of the Same Coin? Communications of the ACM, 35(12), 29-38, 1992.

[3] Carrol, J. and Olson, J. Mental Models, in Foundations of Cognitive Science, Posner, M. (ed), Cambridge MA, MIT Press, 469-393, 1988.

[4] Furnas, G., Landauer, T., Gomez, L. and Dumais, S. The Vocabulary Problem in Human-System Communication, Communications of the ACM, 30(11), 964-971, 1987.

[5] Gruber, T. Ontolingua: a Mechanism to Support Portable Ontologies, Technical Report 91-66, Stanford University, Knowledge Systems Laboratory, 1992.

[6] Ikeda, M., Seta, K., Kakusho, O., and Mizoguchi, R. An Environment for Building Conceptual Model of Problem Solving, Proceedings of the Pacific Knowledge Acquisition Workshop (PKAW'96), 210-225, 1996.

[7] Iwazume, M., Shirakami, K., Hatadani, K., Takeda, H. and Nishida, T. IICA: an Ontology-based Internet Navigation System, http://ai-www.aist-nara.ac.jp/doc/people/mitiak-i/aaai96/, 1996.

[8] Mizoguchi, R. Knowledge Acquisition and Ontology, Proceedings of KB&KS'93, 121-128, 1993.

[9] Mizoguchi, R. and Ikeda, M. Towards Ontology Engineering, Proceedings of the Joint 1997 Pacific Asian Conference on Expert Systems / Singapore International Conference on Intelligent Systems, 259-266, 1997.

[10] Schatz, B. and Chen, H. (eds), IEEE Computer, Vol 29 (5), May 1996.

[11] Srinivasan, U. A Framework for Conceptual Integration of Heterogeneous Databases, PhD thesis, University of New South Wales, 1997.