Reasoning Tools

Semantic Studio includes several tools that perform reasoning and related operations on ontologies. 

One set of these tools is named SETL, an acronym for "Semantic Extract, Transformation, and Load."  These tools are somewhat analogous to ETL tools in the database world: they change format and form, merge and extract, and align and match.  The difference lies in the degree to which semantics plays a role in these operations. Complex ontological relationships mean that most operations cannot proceed record by record, as in conventional ETL, but must instead act on an interlinked set of concepts in tandem.

Another set of tools is responsible for reasoning using "description logics" and rules. Description logics provide reasoning over formal definitions of terms ("a software company is a company that makes at least one software product") and assertions ("Microsoft makes Excel").  Many description logics have been investigated; they differ in the formal operators they allow for constructing definitions and assertions.  Arity has built several reasoning engines based on description logics, the most important and useful being based on a particular logic named "EL++".  This logic has the virtues of being useful, reasonably expressive, and computationally efficient.  Arity's implementation of EL++ has been extended with reasoning features that are particularly useful within Semantic Studio.
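
The definition and assertion quoted above can be sketched in a few lines of Python. This is a toy illustration only: it is not Arity's engine and not a full EL++ reasoner, and the names assertions, definitions, instances_of, and classify, as well as the Acme and Anvil facts, are assumptions introduced for the example. It shows one definition of the form "a C is a P that has relation R to at least one F" being applied to a handful of stored facts.

```python
# Toy illustration only: not Arity's engine and not a full EL++ reasoner.
# All names and the Acme/Anvil facts are hypothetical.

# Assertions (facts) as (subject, relation, object) tuples.
assertions = {
    ("Microsoft", "is_a", "Company"),
    ("Microsoft", "makes", "Excel"),
    ("Excel", "is_a", "SoftwareProduct"),
    ("Acme", "is_a", "Company"),
    ("Acme", "makes", "Anvil"),
    ("Anvil", "is_a", "HardwareProduct"),
}

# Definition: a SoftwareCompany is a Company that makes
# at least one SoftwareProduct.
definitions = {
    "SoftwareCompany": ("Company", "makes", "SoftwareProduct"),
}


def instances_of(concept, facts):
    """Return every individual asserted to be an instance of concept."""
    return {s for (s, r, o) in facts if r == "is_a" and o == concept}


def classify(facts, defs):
    """Return the is_a facts that the definitions imply but facts lack."""
    inferred = set()
    for name, (parent, relation, filler) in defs.items():
        fillers = instances_of(filler, facts)
        for individual in instances_of(parent, facts):
            if any((individual, relation, f) in facts for f in fillers):
                inferred.add((individual, "is_a", name))
    return inferred - facts


new_facts = classify(assertions, definitions)
print(sorted(new_facts))
```

Only Microsoft is classified as a software company, because Acme makes nothing that is asserted to be a software product. A real description logic reasoner also handles nested definitions, concept hierarchies, and repeated application until no new facts appear, none of which this sketch attempts.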

 

Going a bit deeper ...

Arity matches the style of knowledge representation and reasoning to the problem to be solved. There is a wide range of choices to make in this process.

Relational databases are great for collecting and processing transactional data. The records in the relational database are then queried to match some set of criteria. The Resource Description Framework (RDF) “triple” is a style of knowledge encoding that attempts to extend the database paradigm of storing facts and querying against them. RDF represents an important special case of knowledge representation that is not well suited to reasoning without transformation.  Working in RDF is analogous to programming in assembly language instead of a higher-level language.
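
As a rough sketch of the store-and-query paradigm described above, the following Python fragment keeps facts as subject-predicate-object triples and answers a pattern query. It uses plain Python rather than an RDF library; the ex: names, the triples list, and the match helper are assumptions made for illustration.

```python
# Hypothetical data and helper names; plain Python, no RDF library.
# Each fact is a (subject, predicate, object) triple.
triples = [
    ("ex:Microsoft", "rdf:type", "ex:Company"),
    ("ex:Microsoft", "ex:makes", "ex:Excel"),
    ("ex:Excel", "rdf:type", "ex:SoftwareProduct"),
]


def match(pattern, store):
    """Return the triples that fit pattern; None acts as a wildcard."""
    return [
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


# "What does Microsoft make?"
made_by_microsoft = match(("ex:Microsoft", "ex:makes", None), triples)

# "Which things are software companies?" Nothing is returned, because
# that fact was never stored and a query alone does not derive it.
software_companies = match((None, "rdf:type", "ex:SoftwareCompany"), triples)

print(made_by_microsoft)
print(software_companies)
```

The second query illustrates the limitation noted above: the store returns only what was asserted, so anything implied by the facts has to be produced by a separate reasoning step.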

There is a lot of knowledge that does not encode well in the RDF style, and a lot of things that you want to do for reasoning do not fit the querying paradigm. Description logics provide the opportunity to model the world faithfully and more completely so that reasoning over the model becomes practical. Typically, these descriptions are concept-based. There are currently two competing views about the representation of concepts. One view is that a concept such as “people” includes, by extension, the set of all people as instances of that concept. The other view is that the concept is a definition of who or what belongs to the set of “people.” Definitions and descriptions create a more complete picture and can include relationships between items such as “owns” or “part of.” Axioms that encode knowledge can be based on the descriptions and relationships represented in this more complex model. Reasoning in the form of classification, similarity, and analogy becomes feasible in this context.

Arity has developed an advanced reasoning engine based on OWL 2’s EL++ fragment. The engine’s distinctive contribution is to analyze implicit knowledge and make it explicit.

In Arity’s products, these reasoning tools improve investigations by:

  • Suggesting alternative areas for research using “reasoning by analogy” across similar entities (you may want to consider B as you are looking at A and B is similar to A);
  • Suggesting new items to consider using “reasoning by conclusion” (you may want to consider X as you have already considered A and X is very likely a conclusion of A);
  • Supporting the exploration of concepts by linking them via explicit relationships and associations rather than relying on search; and
  • Indicating the strength of inferences through confidence factors.

 

A bit deeper still ...

Here we use the term ontology to refer only to a set of data with a particular organization. An ontology typically consists of one or more hierarchical descriptions of important concepts in some domain and descriptions of properties of instances of each concept. Ontologies capture important semantic relationships that may be expressed informally as “is a” and “part of” statements. An automobile “is a” transportation device. An engine is a “part of” an automobile.
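
The “is a” and “part of” statements above can be followed transitively. The sketch below is a minimal illustration under simplifying assumptions: each term has at most one parent per relation, the relations contain no cycles, and the Sedan and Piston entries and the ancestors helper are hypothetical additions to the automobile example.

```python
# Hypothetical mini-ontology; Sedan and Piston are added for illustration.
# Each dictionary maps a term to its single parent under one relation.
is_a = {
    "Sedan": "Automobile",
    "Automobile": "TransportationDevice",
}
part_of = {
    "Engine": "Automobile",
    "Piston": "Engine",
}


def ancestors(term, parent_of):
    """Follow a single-parent, acyclic relation upward from term."""
    found = []
    while term in parent_of:
        term = parent_of[term]
        found.append(term)
    return found


print(ancestors("Sedan", is_a))      # every concept a sedan "is a"
print(ancestors("Piston", part_of))  # everything a piston is "part of"
```

Keeping the two relations in separate structures reflects the point that they carry different meanings: a piston is part of an automobile, but it is not a kind of automobile.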

The structure, richness, and diversity of relationships that are typically expressed in an ontology are formalized in several ways. First, the language for the expression of those relationships is made rigorous. Differences in types of relationships are made explicit. For example, “X is a part of Y” may mean that X is a component (as an engine in an automobile), an ingredient (as in flour in a cake), a member (as in a person in a club), or some other partonomic type.

Second, different qualifications on what may be expressed are formalized. For example, you may wish to say that “Joe has 3 daughters” without necessarily listing all (or any) of the daughters explicitly.
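
One way to picture such a qualified statement is as a count that is recorded separately from any named individuals. The sketch below is only an illustration; the CardinalityAssertion class, its field names, and the daughter name Ann are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CardinalityAssertion:
    """Hypothetical record: subject has `count` fillers of `relation`."""
    subject: str
    relation: str
    count: int
    known: set = field(default_factory=set)  # fillers named so far

    def unnamed(self):
        """How many fillers are asserted to exist but are not yet named."""
        return self.count - len(self.known)


# "Joe has 3 daughters", with none of them listed.
joe = CardinalityAssertion("Joe", "has_daughter", 3)
print(joe.unnamed())

# Naming one daughter later does not change the asserted count.
joe.known.add("Ann")
print(joe.unnamed())
```

The count stays a fact in its own right, so a reasoner can use it (for example, to conclude that Joe is a parent) even when no individual daughter is known.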

Third, we distinguish between those assertions that are both necessary and sufficient to fully define a relationship and those assertions that are only necessary. This corresponds to the distinction between things that can be defined exactly and things that cannot, and which therefore need some additional qualitative verification.
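
The practical effect of this distinction can be sketched as follows. In this hypothetical example, Triangle is fully defined (its conditions are necessary and sufficient), while Person has only necessary conditions. The concepts table, the property names, and both helper functions are assumptions made for illustration.

```python
# Hypothetical concepts. "sufficient" marks whether the listed conditions
# fully define the concept or are merely necessary for membership.
concepts = {
    "Triangle": {"conditions": {"polygon", "three_sides"}, "sufficient": True},
    "Person": {"conditions": {"animate", "mortal"}, "sufficient": False},
}


def implied_properties(concept):
    """Membership in any concept implies all of its necessary conditions."""
    return set(concepts[concept]["conditions"])


def recognized_concepts(properties):
    """Only fully defined concepts can be recognized from properties alone."""
    return sorted(
        name
        for name, c in concepts.items()
        if c["sufficient"] and c["conditions"] <= properties
    )


print(recognized_concepts({"polygon", "three_sides", "red"}))
print(recognized_concepts({"animate", "mortal"}))
print(sorted(implied_properties("Person")))
```

Something with the listed properties is automatically classified as a triangle, but an animate, mortal thing is not automatically classified as a person: membership in Person has to be asserted or verified by other means, after which its necessary conditions follow.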

Ontologies are formalized and exist because they enable knowledge to be written down in ways that are more expressive or more natural for understanding than other formalisms such as relational datasets or logic-based rules. Ontologies exist and are in development for many domains. In practice, though, the power of an ontology is typically unlocked by some automatic reasoning engine. Such a reasoning engine is much like a program that proves theorems (such as those in high school geometry). Given such a reasoning engine, a powerful new set of application features may be provided. The language that ontologies are written in for such an engine is called Description Logic (DL).

Description Logic engines enable the construction of many types of intelligent software agents that act on behalf of people. The most important agent behaviors are automatic classification of knowledge (particularly knowledge that is learned incrementally), translation of one organization of knowledge to another, and negotiation of protocols and services provided by different programs or data sources. A core enabling operation of many agents is the ability to create semantic metadata for documents (and other media) and to combine knowledge from that metadata in ways that are meaningful to automated processes. For this to happen, ontologies will play a key role as a source of precisely defined and related terms (vocabulary) that can be shared across applications (and humans). DL technology formalizes ontology construction and use by providing a decidable fragment of First Order Logic (FOL). This formality and regularity enable machine understanding in support of agent-to-agent communication and semantic-based searches, and provide richer service descriptions that can be interpreted by intelligent agents.