Emmy-Noether Group Grant - German Science Foundation (DFG)
 

Topic

Supporting semantic information processing for complex applications using distributed knowledge representation and reasoning


Goals

Many important applications of information technology are characterized by the heterogeneity and distribution of the information sources involved. The most prominent example is the World Wide Web. Originally, information on the Web was mainly created for human consumption, and weakly structured information therefore dominated Web-based applications. With the increasing importance of the Web as a medium for business and other forms of cooperation, this lack of structure increasingly turned out to be a problem. As a result, technologies for structuring and describing information in weakly structured environments have been developed under the term 'semantic web'. Key technologies in this context are machine-readable metadata encoded in the Resource Description Framework (RDF) and explicit models of the terminology used in a specific application area (so-called ontologies) encoded in the Web Ontology Language (OWL).

Existing infrastructures for developing semantic web applications support creating, storing and accessing RDF and OWL models, but most of them do not adequately address the nature of the Web in terms of heterogeneity, distribution and scale. In particular, most systems assume the use of a single data store and a single ontology for describing the domain of interest. This often causes problems like the following:

  • Information from different sources is described using different terminologies that need to be aligned
  • Merging different ontologies into a single model causes inconsistencies due to misinterpretations of concepts
  • Systems fail to process very large integrated ontologies due to lack of memory and processing resources

The goal of this project is to address these problems by developing advanced methods for supporting distributed and highly scalable processing of ontologies and metadata. In particular, the project aims at:

  1. The extension of semantic web languages, in particular OWL, with new features supporting the integration of distributed and heterogeneous ontologies.
  2. The development of methods for testing and maintaining the consistency of integrated models based on the formal semantics of the descriptions involved.
  3. The implementation of a scalable infrastructure for processing integrated models by distributing storage and inference process over a number of nodes in a network.

Extending Semantic Web Languages

While the necessity of integrating multiple ontologies into a common model has already been identified by the developers of the OWL standard, it has turned out that the import mechanism which is part of the OWL specification is insufficient for adequately representing such integrated ontologies. What is needed is an extension of the language with the possibility to describe the relation between the intended meaning of terms in different ontologies. In previous work we have already proposed such an extension, called C-OWL, which provides the possibility of specifying inter-ontology links as logical axioms.
In the presence of real-world ontologies, it is unrealistic to assume that mappings between ontologies are created manually by domain experts, since existing ontologies, e.g., in the area of medicine, contain thousands of concepts and hundreds of relations. Recently, a number of heuristic methods for matching elements from different ontologies have been proposed that support the creation of mappings by suggesting candidate mappings. These methods rely on linguistic and structural criteria. As a result, automatically created mappings often contain uncertain hypotheses and errors that need to be dealt with. The main problems are:

  • mapping hypotheses are often oversimplifying, since most matchers only support very simple kinds of semantic relations (mostly equivalence between individual elements);
  • there may be conflicts between different hypotheses for semantic relations from different matching components and often even from the same matcher;
  • semantic relations are only given with a certain degree of confidence in their correctness.

To account for these problems, we are investigating probabilistic extensions of semantic web languages as a basis for representing and reasoning with automatically generated mappings. For this purpose, we conducted a survey of existing approaches for extending semantic web languages with probabilistic information [1]. Based on the results of this survey, we developed two different models for representing uncertain mappings.

The first model has limited expressiveness but allows us to use Bayesian networks as an efficient way of implementing probabilistic reasoning. In this framework, which is described in [4], the representation of ontologies is restricted to the logic programming fragment of description logics. Mappings between elements of different ontologies are represented by conjunctive queries annotated with a conditional probability that the head follows from the body. We can pose conjunctive queries to the model and compute the probability of possible answers by computing the Herbrand model of the integrated logic program and constructing a Bayesian network whose nodes represent elements of the Herbrand model; this network is then used to compute the probability of answers.

The second model, presented in [2], generalizes this simple model by allowing the ontologies to be represented in OWL-DL and mappings to be disjunctive logic programs. The connection between the ontologies and the rules is established via a novel model-theoretic semantics that preserves the classical semantics of ontologies and rules and still remains decidable. Uncertainty about mappings can be represented based on independent choice logic, which can be used to enrich the rule set with special terms that represent random variables. We have shown that this approach can be used to represent confidence values attached to mappings as well as levels of trust in matching systems [3]. There is no obvious translation of this more expressive model into Bayesian networks. Reasoning can be performed by solving a system of linear equations whose construction involves ontological reasoning. Currently, we are investigating ways of simplifying the reasoning process by compiling the whole model into disjunctive datalog and using a disjunctive datalog engine as a basis for the non-probabilistic part of the reasoning problem.
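
The idea of attaching independent random choices to mapping rules can be illustrated with a heavily simplified sketch. It assumes definite rules over concept assertions only (no disjunction, no OWL-DL reasoning) and enumerates all total choices instead of solving a system of linear equations; the facts, rules, probabilities and function names are hypothetical and not taken from the project.

```python
from itertools import product

# Hypothetical fact from a source ontology: p1 is a Paper.
FACTS = {("Paper", "p1")}

# Hypothetical uncertain mapping rules: (body concept, head concept, probability).
# Each rule is switched on or off by its own independent binary choice.
MAPPINGS = [
    ("Paper", "Article", 0.8),
    ("Paper", "Publication", 0.6),
    ("Article", "Publication", 0.9),
]


def closure(facts, rules):
    """Forward-chain the enabled rules until no new fact is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            for concept, individual in list(derived):
                if concept == body and (head, individual) not in derived:
                    derived.add((head, individual))
                    changed = True
    return derived


def query_probability(query, facts=FACTS, mappings=MAPPINGS):
    """Sum the probabilities of all total choices in which the query holds."""
    total = 0.0
    for choice in product([True, False], repeat=len(mappings)):
        weight = 1.0
        enabled = []
        for switched_on, (body, head, p) in zip(choice, mappings):
            weight *= p if switched_on else 1.0 - p
            if switched_on:
                enabled.append((body, head))
        if query in closure(facts, enabled):
            total += weight
    return total
```

In this toy setting, `query_probability(("Publication", "p1"))` adds up the weights of every choice in which either the direct rule or the two-step chain via `Article` is enabled. The enumeration is exponential in the number of rules, so it only serves to make the semantics concrete.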

 


Publications on Extending Semantic Web Languages

1.   Livia Predoiu and Heiner Stuckenschmidt. Probabilistic Extensions of Semantic Web Languages - A Survey. The Semantic Web for Knowledge and Data Management: Technologies and Practices, Idea Group Inc, 2008, to appear.
2.   Andrea Cali, Thomas Lukasiewicz, Livia Predoiu, Heiner Stuckenschmidt. Tightly Integrated Probabilistic Description Logic Programs for Representing Ontology Mappings. Proceedings of the International Symposium on Foundations of Information and Knowledge Systems, Pisa, Italy, 2008.
3.   Andrea Cali, Thomas Lukasiewicz, Livia Predoiu and Heiner Stuckenschmidt. A Framework for Representing Ontology Mappings under Probabilities and Inconsistency. Proceedings of the ISWC 2007 Workshop on Uncertainty Reasoning for the Semantic Web, Pusan, Korea, 2007.
4.   Livia Predoiu, Heiner Stuckenschmidt. A Probabilistic Framework for Information Integration and Retrieval on the Semantic Web. Proceedings of 3rd International Workshop on Database Interoperability (InterDB) in conjunction with the VLDB conference, Vienna, Austria, 2007.
5.   Livia Predoiu. Probabilistic Information Integration and Retrieval on the Semantic Web. Proceedings of the 6th International Semantic Web Conference (ISWC), Pusan, Korea, 2007.

Supporting Mapping Creation and Revision

The ability to reason across ontologies using mappings depends on the consistency of the overall model. As automatically created mappings often contain errors, the resulting models often contain conflicts. This is a problem not only for purely logical reasoning but also for probabilistic mappings, as the probability assignment is defined on the models of the logical theory. If no such models exist due to an inconsistency, the probability distribution is not defined. To solve this problem, we have to make sure that only those mappings are retained that do not cause the overall model to become inconsistent. To reach this goal, we developed methods for diagnosing and debugging automatically created mappings based on the formal semantics of mappings as defined in the C-OWL language.

A starting point for the development of debugging methods is the ability to reason about ontology mappings. In [8,10] we defined a number of inference problems such as consistency and entailment of mappings. Methods for solving them have been implemented as part of the DRAGO reasoner developed at the University of Trento. Based on these basic methods we have developed debugging strategies for inconsistent mappings that proceed in the following steps:

  • Detection of inconsistencies in the mapping
  • Computation of minimal conflict sets for each inconsistency
  • Determination of a diagnosis that maximizes the confidence in the correctness of the mappings involved.

We have investigated complete and approximate methods for computing conflict sets as well as greedy and optimal strategies for maximizing the confidence of the retained mappings.
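
The three debugging steps can be sketched on a toy example. The sketch assumes that a conflict arises when a source concept and one of its superconcepts are mapped to disjoint target concepts, and it reads "maximizing confidence" as removing mappings with minimal total confidence; both are simplifying assumptions, and all ontology data and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy input, not taken from the project or from OntoFarm.
SUBCLASS_1 = {("Author", "Person")}              # source ontology: Author is a Person
DISJOINT_2 = {frozenset({"Human", "Document"})}  # target ontology: disjoint concepts

# Mapping hypotheses: (source concept, target concept, confidence).
MAPPINGS = [
    ("Person", "Human", 0.9),
    ("Author", "Document", 0.4),
    ("Author", "Writer", 0.7),
]


def conflict_sets(mappings):
    """Return pairs of mappings that make a source concept unsatisfiable."""
    conflicts = []
    for m1, m2 in combinations(mappings, 2):
        for (a, x, _), (b, y, _) in ((m1, m2), (m2, m1)):
            subsumed = a == b or (a, b) in SUBCLASS_1
            if subsumed and frozenset({x, y}) in DISJOINT_2:
                conflicts.append(frozenset({m1, m2}))
                break
    return conflicts


def greedy_diagnosis(mappings):
    """Resolve each open conflict by dropping its least confident mapping."""
    removed = set()
    for conflict in conflict_sets(mappings):
        if not conflict & removed:
            removed.add(min(conflict, key=lambda m: m[2]))
    return removed


def optimal_diagnosis(mappings):
    """Brute-force the removal set with minimal total confidence."""
    conflicts = conflict_sets(mappings)
    best, best_cost = None, float("inf")
    for size in range(len(mappings) + 1):
        for candidate in combinations(mappings, size):
            chosen = set(candidate)
            if all(conflict & chosen for conflict in conflicts):
                cost = sum(m[2] for m in chosen)
                if cost < best_cost:
                    best, best_cost = chosen, cost
    return best
```

On this input both strategies remove only the `Author`/`Document` hypothesis. On larger inputs the greedy strategy can remove more confidence than necessary, while the brute-force search is exponential in the number of mappings.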

We have extensively tested the methods on the OntoFarm benchmark dataset for ontology matching. In a first set of experiments we have shown that debugging can be applied completely automatically, reaching a precision between 80 and 100% and a recall of 25 to 50% [4]. In a second set of experiments we have investigated the use of our system to support the manual evaluation of mappings. In this context we showed that logical reasoning can be used to reduce the manual effort of mapping evaluation by up to 50%. Further, we showed that using ordering heuristics to determine the order in which mappings are assessed consistently yields an advantage over a lexicographic ordering [1].
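
For reference, precision and recall of a debugging run can be computed as follows. The sketch assumes one plausible reading of the figures above: precision is the share of removed mappings that are actually incorrect, and recall is the share of incorrect mappings that were removed. The function name and the example identifiers are hypothetical.

```python
def precision_recall(removed, incorrect):
    """Compare the removed mappings with the truly incorrect ones."""
    removed, incorrect = set(removed), set(incorrect)
    hits = removed & incorrect
    precision = len(hits) / len(removed) if removed else 0.0
    recall = len(hits) / len(incorrect) if incorrect else 0.0
    return precision, recall


# Made-up example: 5 mappings removed, 4 of them truly incorrect,
# out of 10 incorrect mappings overall.
removed = {"m1", "m2", "m3", "m4", "m5"}
incorrect = {"m1", "m2", "m3", "m4", "m6", "m7", "m8", "m9", "m10", "m11"}
precision, recall = precision_recall(removed, incorrect)
```

A high precision combined with a moderate recall, as in this made-up example, means that the debugger rarely removes a correct mapping but leaves a part of the errors undetected.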

We have also investigated the option of including logical reasoning in existing matching systems for verifying mapping hypotheses early in the matching process [3]. In particular, we have investigated a new class of extraction functions that are used to derive a mapping from a similarity matrix, which is the basis of most matching systems. We were able to show that logical reasoning consistently improves precision and recall of matching systems [2]. Current work is concerned with the development of a software component that can be integrated in existing matching systems and mapping editors.
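
As an illustration of what an extraction function does, the following sketch derives a one-to-one mapping from a similarity matrix by greedily picking the highest scores above a threshold. The optional `accept` callback stands in for a logical consistency check; it, the threshold and the example data are assumptions made for this sketch and do not reproduce the extraction functions studied in [2].

```python
def extract_greedy(similarity, threshold=0.5, accept=None):
    """Greedy one-to-one extraction from {(source, target): score}."""
    selected = []
    used_sources, used_targets = set(), set()
    ranked = sorted(similarity.items(), key=lambda item: item[1], reverse=True)
    for (source, target), score in ranked:
        if score < threshold:
            break  # all remaining scores are lower as well
        if source in used_sources or target in used_targets:
            continue
        if accept is not None and not accept(selected, (source, target)):
            continue  # rejected by the (hypothetical) logical check
        selected.append((source, target))
        used_sources.add(source)
        used_targets.add(target)
    return selected


# Hypothetical similarity matrix in sparse form.
SIMILARITY = {
    ("Person", "Human"): 0.9,
    ("Author", "Document"): 0.6,
    ("Author", "Writer"): 0.55,
    ("Person", "Writer"): 0.3,
}


def no_clash(selected, candidate):
    """Toy constraint: Author cannot map to Document once Person maps to Human."""
    clash = ("Person", "Human") in selected and candidate == ("Author", "Document")
    return not clash
```

Without the callback the function returns the two top-scoring pairs; with `accept=no_clash` the rejected pair frees `Author` for the next best candidate, which shows how a logical check applied during extraction can change the resulting mapping rather than just shrink it.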

 


Publications on Mapping Creation and Revision

1.   Christian Meilicke, Heiner Stuckenschmidt and Andrei Tamilin. Supporting Manual Mapping Revision using Logical Reasoning. Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (AAAI-08), 2008.
2.   Christian Meilicke and Heiner Stuckenschmidt. Analyzing Mapping Extraction Approaches. Proceedings of the ISWC 2007 Workshop on Ontology Matching, Busan, Korea, 2007.
3.   Christian Meilicke and Heiner Stuckenschmidt. Applying Logical Constraints to Ontology Matching. Proceedings of the 30th German Conference on Artificial Intelligence, Lecture Notes in Artificial Intelligence, Springer, 2007, 4667, 99-113.
4.   Christian Meilicke, Heiner Stuckenschmidt and Andrei Tamilin. Repairing Ontology Mappings. Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI-07), Vancouver, Canada, 2007.
5.   Jérôme Euzenat, Antoine Isaac, Christian Meilicke, Pavel Shvaiko, Heiner Stuckenschmidt, Vojtech Svátek, Willem Robert van Hage and Mikalai Yatskevich. Results of the Ontology Alignment Evaluation Initiative 2007. Proceedings of the ISWC 2007 Workshop on Ontology Matching, Busan, Korea, 2007.
6.   Ondrej Svab, Vojtech Svatek and Heiner Stuckenschmidt. A Study in Empirical and Casuistic Ontology Mapping. Proceedings of the 4th European Semantic Web Conference ESWC'2007, Innsbruck, Austria, 2007.
7.   Christian Meilicke, Heiner Stuckenschmidt and Andrei Tamilin. Improving Automatically Created Mappings using Logical Reasoning. Proceedings of the ISWC 2006 Workshop on Ontology Matching, Athens, GA, USA, 2006.
8.   Christian Meilicke. Reasoning about Mappings in Distributed Description Logics. Bachelor Thesis, University of Mannheim, 2006.
9.   Jerome Euzenat, Malgorzata Mochol, Pavel Shvaiko, Heiner Stuckenschmidt, Ondrej Svab, Vojtech Svatek, Willem Robert van Hage and Mikalai Yatskevich. Results of the Ontology Alignment Evaluation Initiative 2006. Proceedings of the ISWC 2006 Workshop on Ontology Matching at the International Semantic Web Conference ISWC 2006, 2006.
10.   Heiner Stuckenschmidt, Luciano Serafini and Holger Wache. Reasoning about Ontology Mappings. Proceedings of the ECAI workshop on contextual representation and reasoning, 2006.

Distributed Representation and Reasoning

The ability to integrate different ontologies into a common logical model using ontology mappings creates new challenges with respect to the scalability of infrastructures for storing and processing these models. Most current reasoning systems for ontologies are designed as centralized systems where the whole model has to be stored in one repository to enable reasoning. At the start of the project, the only notable exception was the DRAGO system, which supports reasoning in the C-OWL language. In the course of the project we are investigating methods for scaling up reasoning about very large and possibly connected ontologies by modularizing and distributing representation and reasoning. This task can roughly be broken down into the following subtasks:

  • Representation of interlinked ontologies
  • Distributed reasoning methods
  • Partitioning strategies for optimal knowledge distribution

We took the C-OWL language and the underlying logic as a starting point for developing representations for interlinked ontologies. In particular, we defined a notion of modular ontologies based on distributed description logics that allows a limited form of interaction between different modules [7]. Based on this representation model, we investigated options for localizing reasoning in the individual modules. It turned out that, due to the limited interaction between modules, it is possible to compile derived knowledge into the corresponding module using a distributed tableaux algorithm and to perform reasoning locally using standard DL reasoners without sacrificing completeness [5].

In more recent work, we investigated ways to perform distributed reasoning without limiting the expressiveness of connections between different modules. It turns out that tableaux methods are not a good basis for this kind of distributed reasoning because the distribution disables most of the existing optimization techniques essential for practical use. Therefore, we are investigating sound and complete resolution methods for description logics as a basis for distributing reasoning. We have shown that existing resolution methods for ALC can be distributed without creating redundancy in the inference process [1]. Currently we are investigating options for extending these results to ALCHIQ-. So far it is clear that distribution will introduce some redundancy; however, we still expect good practical properties of the resulting distributed resolution method. Currently we are using the SPASS reasoner for simulating distributed inference locally as a basis for improving the partitioning strategy.
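
The intuition behind distributing resolution can be shown with a propositional simplification. This is not the ALC method from [1]: the sketch assumes ordered resolution on propositional clauses, where each clause is stored at the node that owns its maximal atom, so both premises of every inference sit on the same node. The atom ordering (alphabetical), the `owner` assignment and all names are hypothetical.

```python
from collections import deque


def max_atom(clause):
    """Maximal atom of a clause under alphabetical ordering."""
    return max(atom for atom, _ in clause)


def is_tautology(clause):
    return any((atom, not positive) in clause for atom, positive in clause)


def distributed_refute(clauses, owner):
    """Return (refuted, messages); owner maps each atom to a node name.

    A literal is a pair (atom, positive), a clause a set of literals.
    """
    stores = {node: set() for node in set(owner.values())}
    queue = deque((frozenset(c), None) for c in clauses)
    messages = 0
    while queue:
        clause, sender = queue.popleft()
        if not clause:
            return True, messages          # empty clause: input is unsatisfiable
        if is_tautology(clause):
            continue                       # dropped by the sender, never shipped
        atom = max_atom(clause)
        node = owner[atom]
        if sender is not None and sender != node:
            messages += 1                  # resolvent shipped to another node
        if clause in stores[node]:
            continue
        positive = (atom, True) in clause
        for other in stores[node]:
            # Resolve only on the maximal atom of both premises.
            if max_atom(other) == atom and (atom, not positive) in other:
                resolvent = (clause - {(atom, positive)}) | (other - {(atom, not positive)})
                queue.append((resolvent, node))
        stores[node].add(clause)
    return False, messages


# Toy input: a, (not a or b), not b is unsatisfiable.
OWNER = {"a": "node1", "b": "node2"}
CLAUSES = [{("a", True)}, {("a", False), ("b", True)}, {("b", False)}]
```

Because a resolvent's maximal atom is smaller than the atom resolved upon, it may have to be sent to a different node, which is the communication that a good partitioning tries to keep low. In the toy input, `distributed_refute(CLAUSES, OWNER)` derives the empty clause after shipping one resolvent from `node2` to `node1`.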

We have also started investigating partitioning and distribution strategies for optimizing distributed inference. We used existing work on partitioning ontologies with the PATO system as a starting point. We developed criteria for measuring the quality of a partitioning that can be expected to reduce communication costs between modules [6] and extended the PATO system with a partitioning method that optimizes the partitioning according to these criteria [2]. We tested the result on real-world ontologies and compared it with other state-of-the-art partitioning tools [3]. Currently we are investigating partitioning methods aimed more specifically at optimizing distributed resolution, which use the actual communication between modules in the inference process as a basis for improving the distribution.
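
One simple way to think about partitioning quality is to count the dependencies that cross module boundaries. The sketch below assumes that cross-module links are a usable proxy for communication cost; it does not reproduce the criteria defined in [6], and the dependency graph, the module assignments and the function names are hypothetical.

```python
def cut_size(edges, module_of):
    """Number of dependencies whose endpoints lie in different modules."""
    return sum(1 for a, b in edges if module_of[a] != module_of[b])


def cut_ratio(edges, module_of):
    """Share of dependencies that cross a module boundary."""
    edges = list(edges)
    return cut_size(edges, module_of) / len(edges) if edges else 0.0


# Hypothetical dependency graph between concepts of one ontology.
EDGES = [
    ("Author", "Person"),
    ("Person", "Organization"),
    ("Paper", "Document"),
    ("Author", "Paper"),
]

# Two candidate partitionings into two modules.
BY_TOPIC = {"Author": 0, "Person": 0, "Organization": 0, "Paper": 1, "Document": 1}
SCATTERED = {"Author": 0, "Person": 1, "Organization": 0, "Paper": 1, "Document": 0}
```

Here `BY_TOPIC` cuts a single dependency while `SCATTERED` cuts all four, so a partitioning method guided by this measure would prefer the first assignment.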


Publications on Distributed Representation and Reasoning

1.   Anne Schlicht and Heiner Stuckenschmidt. Distributed Resolution for ALC. Proceedings of the 21st International Workshop on Description Logics (DL2008), Dresden, Germany, 2008.
2.   Anne Schlicht and Heiner Stuckenschmidt. Criteria-Based Partitioning of Large Ontologies (poster paper). 4th international conference on Knowledge capture (KCAP), 2007.
3.   Mathieu d’Aquin, Anne Schlicht, Heiner Stuckenschmidt and Marta Sabou. Ontology Modularization for Knowledge Selection: Experiments and Evaluations. 18th International Conference on Database and Expert Systems Applications (DEXA), 2007.
4.   Anne Schlicht. Improving the Usability of Large Ontologies by Modularization. Knowledge Web PhD Symposium, 2007.
5.   Heiner Stuckenschmidt and Michel Klein. Reasoning and Change Management in Modular Ontologies. Data & Knowledge Engineering, 63, 2007, 2.
6.   Anne Schlicht and Heiner Stuckenschmidt. Towards Structural Criteria for Ontology Modularization. Workshop on Modular Ontologies (WoMO), 2006.
7.   Heiner Stuckenschmidt. Implementing Modular Ontologies with Distributed Description Logics. Proceedings of the first international workshop on modular ontologies at the International Semantic Web Conference ISWC'06, Athens, GA, USA, 2006.