Semantic Studio

Arity’s Semantic Studio is the workhorse that powers Semantic Search. It predigests documents using natural language processing (NLP) and ontologies so that the knowledge within them is exposed. This processing includes identifying the “gist” of a document, or of a subsection of a document, so that content can be matched precisely to the user’s need. When documents are updated, the indices are updated automatically as well.

Arity’s Semantic Studio improves the quality of search across both internal and external resources by adding the dimension of meaning. Relevance is determined from both the context of terms used within documents and the context of user preferences and expected usage. This product is now being deployed as the indexing and search engine within EXPRESSway, as the information extraction engine for content being re-targeted in a new product to be released by Elsevier later this year, and for an exciting new portal Web site for physicians, also to be rolled out later this year.

The naïve approach to indexing is to recognize key terms and their relative frequency in a corpus.  In addition to traditional search tactics, Arity employs semantic indexing to provide a richer experience based on these strategies:

  • Document Context
    1. recognition that the same concept has many manifestations via synonyms and linguistic variations (for example, a single enzyme can have 20 or more different names or identifiers, some of which may be unique to a given resource),
    2. assessment of the gist of a concept (roughly, the way the context applies to the concept, such as use vs. synthesis of a material) from the framework and context in which it appears, and
    3. an ontology of associations and relationships (for example, certain biomarkers might be associated with certain signals or adverse events) that can reveal related, analogous or more focused trains of thought or interest; and
  • User Context
    1. recognition of the user’s objectives and interests (e.g., job role, or the sophistication of articles sought, from instructional to survey to study results), and
    2. an ontology of associations and relationships based on the user’s formal and informal work groups, job functions and work flow.
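
To make the first Document Context strategy concrete, the sketch below contrasts a naïve term index with an index keyed on canonical concepts. It is an illustration only, not Semantic Studio's implementation: the synonym lexicon, the concept identifiers (such as C:ASPIRIN) and the function names are all hypothetical, and a production system would draw its synonyms from a curated ontology rather than a hard-coded table.

```python
from collections import Counter, defaultdict
import re

# Hypothetical synonym lexicon: surface form -> canonical concept ID.
# These entries and IDs are invented for illustration.
SYNONYMS = {
    "aspirin": "C:ASPIRIN",
    "acetylsalicylic acid": "C:ASPIRIN",
    "asa": "C:ASPIRIN",
    "heart attack": "C:MI",
    "myocardial infarction": "C:MI",
}


def naive_terms(text):
    """Naive indexing: count lower-cased word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def concepts(text):
    """Concept indexing: count canonical concepts, matching longest phrases first."""
    remaining = text.lower()
    counts = Counter()
    for phrase in sorted(SYNONYMS, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        # Blank out each match so a shorter synonym cannot match inside it.
        remaining, hits = re.subn(pattern, " ", remaining)
        if hits:
            counts[SYNONYMS[phrase]] += hits
    return counts


def build_index(docs, extract):
    """Inverted index: key -> {doc_id: occurrence count}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for key, n in extract(text).items():
            index[key][doc_id] = n
    return index


DOCS = {
    "d1": "Aspirin lowers the risk of a second heart attack.",
    "d2": "Acetylsalicylic acid after myocardial infarction: a survey.",
    "d3": "ASA dosing guidance.",
}

term_index = build_index(DOCS, naive_terms)
concept_index = build_index(DOCS, concepts)

print(sorted(term_index["aspirin"]))       # ['d1']
print(sorted(concept_index["C:ASPIRIN"]))  # ['d1', 'd2', 'd3']
```

A search for "aspirin" against the term index finds only the one document that uses that exact word, while the concept index returns all three documents that mention the same substance under different names.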

The Semantic Studio includes an ontology development and information extraction engine that promotes continuous refinement of the vocabularies and concept relationships that are specific to the business situation and goals. The Semantic Studio can be pointed at any set of documents, including those in PDF form, to extract and index their content.

One of the most exciting features of the Semantic Studio is the bootstrapping of ontologies and lexicons through linguistic analysis of content. Semantic Studio automatically discovers and suggests likely synonyms and new concepts, and it likewise surfaces novel terms and trends.
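
One common way to suggest synonym candidates from raw text is distributional similarity: terms that appear in the same surrounding contexts are proposed as possible synonyms for a human to confirm. The sketch below shows that general idea under stated assumptions. It is not a description of how Semantic Studio performs its linguistic analysis; the stop-word list, window size, threshold and function names are all hypothetical choices.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt
import re

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "of", "in", "and", "is", "to", "by"}


def context_vectors(sentences, window=2):
    """Map each token to counts of the tokens seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = [t for t in re.findall(r"[a-z0-9-]+", sentence.lower())
                  if t not in STOPWORDS]
        for i, token in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[token][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_synonyms(sentences, threshold=0.8):
    """Return (term, term, score) pairs whose context similarity meets the threshold."""
    vectors = context_vectors(sentences)
    pairs = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda p: -p[2])


CORPUS = [
    "Patients received aspirin daily after surgery.",
    "Patients received ASA daily after surgery.",
]

suggestions = suggest_synonyms(CORPUS)
print(suggestions)  # [('asa', 'aspirin', 1.0)]
```

On this toy corpus the only pair above the threshold is "asa" and "aspirin", because the two terms occur in identical contexts. On real content the scores are far noisier, which is why the output is best treated as a list of suggestions for review rather than as confirmed synonyms.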