FIELD OF THE INVENTION
The invention relates to a system and method for graphically displaying multi-relational ontology data.
BACKGROUND OF THE INVENTION
Knowledge within a given domain may be represented in many ways. One form of knowledge representation may comprise a list representing all available values for a given subject. For example, knowledge in the area of “human body tissue types” may be represented by a list including “hepatic tissue,” “muscle tissue,” “epithelial tissue,” and many others. To represent the total knowledge in a given domain, a number of lists may be needed. For instance, one list may be needed for each subject contained in a domain. Lists may be useful for some applications; however, they generally lack the ability to define relationships between the terms comprising the lists. Moreover, the further division and subdivision of subjects in a given domain typically results in the generation of additional lists, which often include repeated terms, and which do not provide comprehensive representation of concepts as a whole.
Some lists, such as structured lists, may enable computer-implemented keyword searching. The shallow information store often contained in list-formatted knowledge, however, may lead to searches that return incomplete representations of a concept in a given domain.
An additional method of representing knowledge is through thesauri. Thesauri are similar to lists, but they further include synonyms provided alongside each list entry. Synonyms may be useful for improving the recall of a search by returning results for related terms not specifically provided in a query. Thesauri still fail, however, to provide information regarding relationships between terms in a given domain.
Taxonomies build on thesauri by adding an additional level of relationships to a collection of terms. For example, taxonomies provide parent-child relationships between terms. “Anorexia is-a eating disorder” is an example of a parent-child relationship via the “is-a” relationship form. Other parent-child relationship forms, such as “is-a-part-of” or “contains,” may be used in a taxonomy. The parent-child relationships of taxonomies may be useful for improving the precision of a search by removing false positive search results. Unfortunately, exploring only hierarchical parent-child relationships may limit the type and depth of information that may be conveyed using a taxonomy. Accordingly, the use of lists, thesauri, and taxonomies presents drawbacks for those attempting to explore and utilize knowledge organized in these traditional formats.
Additional drawbacks may be encountered when searches of electronic data sources are conducted. As an example, searches of electronic data sources typically return a voluminous number of results, many of which tend to be only marginally relevant to the specific problem or subject being investigated. Researchers or other individuals are then often forced to spend valuable time sorting through a multitude of search results to find the most relevant results. It is estimated, for example, that scientists spend 20% of their time searching for information existing in a particular area. This is time that highly trained investigative researchers must spend simply uncovering background knowledge. Furthermore, when an electronic search is conducted, data sources containing highly relevant information may not be returned to a researcher because the concept sought by the researcher is identified by a different set of terms in the relevant data source. This may lead to an incomplete representation of the knowledge in a given subject area. These and other drawbacks exist.
SUMMARY OF THE INVENTION
The invention addresses these and other drawbacks. According to one embodiment, the invention relates to a system and method for graphically displaying ontology data from one or more ontologies. According to one aspect of the invention, the one or more ontologies may be domain specific ontologies that may be used individually or collectively, in whole or in part, based on user preferences, user access rights, or other criteria.
As used herein, a domain may include a subject matter topic such as, for example, a disease, an organism, a drug, or other topic. A domain may also include one or more entities such as, for example, a person or group of people, a corporation, a governmental entity, or other entities. A domain involving an organization may focus on the organization's activities. For example, a pharmaceutical company may produce numerous drugs or focus on treating numerous diseases. An ontology built on the domain of that pharmaceutical company may include information on the company's drugs, their target diseases, or both. A domain may also include an entire industry such as, for example, automobile production, pharmaceuticals, legal services, or other industries. Other types of domains may be used.
As used herein, an ontology may include a collection of assertions. An assertion may include a pair of concepts that have some specified relationship. One aspect of the invention relates to the creation of a multi-relational ontology. A multi-relational ontology is an ontology containing pairs of related concepts. For each pair of related concepts there is a broad set of descriptive relationships connecting them. As each concept within each pair may also be paired (and thus related by multiple descriptive relationships) with other concepts within the ontology, a complex set of logical connections is formed. These complex connections provide a comprehensive “knowledge network” of what is known directly and indirectly about concepts within a single domain. The knowledge network may also be used to represent knowledge between and among multiple domains. This knowledge network allows discovery of complex relationships between the different concepts or concept types in the ontology. The knowledge network enables, inter alia, queries involving both direct and indirect relationships between multiple concepts such as, for example, “show me all genes expressed-in liver tissue that-are-associated-with diabetes.”
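By way of non-limiting illustration only, the example query quoted above may be sketched as follows. The sketch assumes that assertions are stored as (concept, relationship, concept) tuples; the sample genes, the relationship labels, and the helper subjects_of are hypothetical and are not taken from any particular embodiment.

```python
# Hypothetical assertions, each a (concept, relationship, concept) tuple.
ASSERTIONS = [
    ("gene A", "expressed-in", "liver tissue"),
    ("gene A", "associated-with", "diabetes"),
    ("gene B", "expressed-in", "liver tissue"),
    ("gene B", "associated-with", "asthma"),
    ("gene C", "expressed-in", "muscle tissue"),
    ("gene C", "associated-with", "diabetes"),
]


def subjects_of(assertions, relationship, target):
    """Return the set of concepts that stand in `relationship` to `target`."""
    return {s for (s, r, o) in assertions if r == relationship and o == target}


# "Show me all genes expressed-in liver tissue that-are-associated-with diabetes."
liver_genes = subjects_of(ASSERTIONS, "expressed-in", "liver tissue")
diabetes_genes = subjects_of(ASSERTIONS, "associated-with", "diabetes")
result = liver_genes & diabetes_genes
print(sorted(result))
```

In this sketch, the query is answered by intersecting two sets of assertions that share a concept. Other storage and query strategies may be used.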
As described herein, one or more ontologies may be displayed to users, curators, or other persons in a graphic display that facilitates representation of the complex knowledge network contained within one or more ontologies. As such, the graphic display also enables navigation and utilization of the comprehensive set of direct and indirect relationships associated with any given concept within an ontology. These features enable users, curators, or other persons to fully utilize one or more functions of one or more ontologies.
According to one embodiment of the invention, graph-centered views of ontology data may include the representation of assertions as concept-relationship-concept (CRC) “triplets.” In these triplets, two nodes are connected by an edge, wherein the nodes correspond to concepts and the edge corresponds to a relationship.
In one embodiment, CRC triplets may be used to produce a directed graph representing the knowledge network contained in one or more ontologies. A directed graph may include two or more interconnected CRC triplets that potentially form cyclic paths of direct and indirect relationships between concepts in an ontology or part thereof.
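As a minimal, non-limiting sketch, interconnected CRC triplets may be assembled into a directed graph and checked for cyclic paths as shown below. The sample triplets and the function names build_graph and has_cycle are hypothetical; the depth-first search is one common technique for detecting cycles and is assumed here only for illustration.

```python
from collections import defaultdict

# Hypothetical CRC triplets: (concept, relationship, concept).
TRIPLETS = [
    ("compound X", "causes", "disease Y"),
    ("disease Y", "affects", "liver tissue"),
    ("liver tissue", "metabolizes", "compound X"),
    ("disease Y", "is-a", "metabolic disorder"),
]


def build_graph(triplets):
    """Map each source concept to a list of (relationship, target) edges."""
    graph = defaultdict(list)
    for source, relationship, target in triplets:
        graph[source].append((relationship, target))
    return graph


def has_cycle(graph):
    """Depth-first search that reports whether any directed cycle exists."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node already on the current path
        visiting.add(node)
        found = any(visit(target) for _, target in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


print(has_cycle(build_graph(TRIPLETS)))
```

Here the first three triplets form a cyclic path (compound X, disease Y, liver tissue, and back to compound X), which a strictly hierarchical taxonomy could not represent.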
In one embodiment, one form of directed graph used for visualizing ontologies may include a clustered cone graph. A clustered cone graph may display a selected concept as a central node, surrounded by sets of connected nodes, the sets of connected nodes being concepts connected by relationships. In one embodiment, the sets of connected nodes may be clustered or grouped by common characteristics. These common characteristics may include one or more of concept type, data source, relationship to the central node, associated property, or other common characteristic.
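For illustration only, the clustering of connected nodes around a central node may be sketched as below. The sketch assumes clustering by concept type; the sample data, the CONCEPT_TYPES lookup, and the function cluster_neighbors are hypothetical. Placement of the clusters on a display is not addressed.

```python
from collections import defaultdict

# Hypothetical concept types and assertions, for illustration only.
CONCEPT_TYPES = {
    "compound X": "compound",
    "disease Y": "disease",
    "disease Z": "disease",
    "gene A": "gene",
}
ASSERTIONS = [
    ("compound X", "causes", "disease Y"),
    ("compound X", "treats", "disease Z"),
    ("gene A", "is-target-of", "compound X"),
    ("gene A", "associated-with", "disease Y"),
]


def cluster_neighbors(assertions, concept_types, central):
    """Group every concept directly related to `central` by its concept type."""
    clusters = defaultdict(set)
    for source, _relationship, target in assertions:
        if source == central:
            neighbor = target
        elif target == central:
            neighbor = source
        else:
            continue  # assertion does not involve the central node
        clusters[concept_types.get(neighbor, "unknown")].add(neighbor)
    return {kind: sorted(names) for kind, names in clusters.items()}


clusters = cluster_neighbors(ASSERTIONS, CONCEPT_TYPES, "compound X")
print(clusters)
```

Calling the same function with a different concept, for example "disease Y", yields the clustered sets for that concept as the central node. Clustering by data source, relationship, or associated property may be substituted for the concept-type key.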
In one embodiment, connected nodes in a clustered cone graph may also have relationships with each other, which may be represented by edges connecting the connected nodes. Additionally, edges and nodes within a clustered cone graph may be varied in appearance to convey specific characteristics of relationships or concepts (e.g., thicker edges for high assertion confidence weights, etc.). The textual information underlying a node or edge in a clustered cone graph may be displayed to a user upon user selection of a node or edge. Furthermore, a connected node may be selected by a user and placed as the central node in the graph. Accordingly, all concepts directly related to the new central node may be arranged in clustered sets around the new central node.
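As one hedged example of varying edge appearance, an assertion confidence weight may be mapped to an edge thickness. The sketch assumes confidence weights in the range 0 to 1 and a linear mapping; the function edge_width and the width limits are hypothetical choices, not values given in this description.

```python
def edge_width(confidence, min_width=1.0, max_width=5.0):
    """Linearly map a confidence weight in [0, 1] to a line width.

    Values outside [0, 1] are clamped to the nearest bound.
    """
    clamped = max(0.0, min(1.0, confidence))
    return min_width + (max_width - min_width) * clamped


for weight in (0.0, 0.5, 1.0):
    print(weight, edge_width(weight))
```

Other visual attributes, such as color or line style, may be derived from other characteristics in a similar manner.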
In one embodiment, more than one concept may be selected and placed as a merged central node (merged graph). Accordingly, all of the concepts directly related to at least one of the two or more concepts in the merged central node may be arranged in clustered sets around the merged central node. If concepts in the clustered sets have relationships to all of the merged central concepts, this quality may be indicated by varying the appearance of these connected nodes or their connecting edges (e.g., displaying them in a different color, etc.). In one embodiment, two or more nodes (concepts) sharing the same relationship (e.g., “causes”) may be selected and merged into a single central node. Thus the nodes connected to the merged central node may show the context surrounding concepts that share the selected relationship.
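A merged central node may be sketched, for illustration only, as follows. The sketch collects every concept related to at least one merged concept and flags those related to all of them, so that a display could render the flagged nodes differently. The sample assertions and the function merged_neighbors are hypothetical.

```python
# Hypothetical assertions, for illustration only.
ASSERTIONS = [
    ("compound X", "causes", "disease Y"),
    ("compound Z", "causes", "disease Y"),
    ("compound X", "binds", "protein P"),
    ("compound Z", "inhibits", "protein Q"),
]


def merged_neighbors(assertions, central_concepts):
    """Map each connected concept to True if it is related to every merged concept."""
    central = set(central_concepts)
    related_to = {}  # connected concept -> merged concepts it is related to
    for source, _relationship, target in assertions:
        if source in central and target not in central:
            related_to.setdefault(target, set()).add(source)
        elif target in central and source not in central:
            related_to.setdefault(source, set()).add(target)
    return {name: found == central for name, found in related_to.items()}


shared = merged_neighbors(ASSERTIONS, ["compound X", "compound Z"])
print(shared)
```

In this example, only "disease Y" is related to both merged concepts and would be the node shown with a varied appearance.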
In one embodiment, more than one concept may be aggregated into a single connected node. That is, a node connected to a central node may represent more than one concept. For example, a central node in a clustered cone graph may be a concept “compound X.” Compound X may cause “disease Y” in many different species of animals. As such, the central node of the clustered cone graph may have numerous connected nodes, each representing disease Y as it occurs in each species. If a user is not in need of immediately investigating possible differences that disease Y may have in each separate species, each of these connected nodes may be aggregated into a single connected node. The single merged connected node may then simply represent the fact that “compound X” causes the “disease Y” in a number of species. This may simplify display of the graph, while conveying all relevant information.
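The aggregation described above may be sketched, under stated assumptions, as follows. The sketch assumes each connected node carries a relationship, a concept name, and a species; these fields, the sample data, and the function aggregate are hypothetical.

```python
from collections import defaultdict

# Hypothetical nodes connected to the central node "compound X":
# (relationship, concept, species).
CONNECTED = [
    ("causes", "disease Y", "mouse"),
    ("causes", "disease Y", "rat"),
    ("causes", "disease Y", "dog"),
    ("binds", "protein P", "human"),
]


def aggregate(connected):
    """Collapse nodes that share a relationship and concept into one node."""
    groups = defaultdict(list)
    for relationship, concept, species in connected:
        groups[(relationship, concept)].append(species)
    return [
        {
            "relationship": relationship,
            "concept": concept,
            "species": sorted(members),
            "count": len(members),
        }
        for (relationship, concept), members in groups.items()
    ]


aggregated = aggregate(CONNECTED)
for node in aggregated:
    print(node)
```

The per-species detail is retained in the aggregated node, so a user may later expand the node to investigate differences between species.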
In one embodiment, each of the sets of clustered nodes of a clustered cone graph may be faceted. Faceting may include grouping concepts within a clustered set by common characteristics. These common characteristics may include one or more of data source, concept type, common relationship, properties, or other characteristic. Faceting may also include displaying empirical or other information regarding concepts within a clustered group. Faceting within a set of connected nodes may take the form of a graph, a chart, a list, display of different colors, or other indicator capable of conveying faceting information. A user may sort through, and selectively apply, different types of faceting for each of the sets of connected nodes in a clustered cone graph. Furthermore, a user may switch faceting on or off for each of the sets of connected nodes within a clustered cone graph.
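By way of a minimal sketch, faceting of one clustered set may be expressed as a count of concepts per value of a chosen characteristic, with faceting switched off when no characteristic is selected. The field names, the sample cluster, and the function facet are hypothetical.

```python
from collections import Counter

# Hypothetical clustered set of connected nodes with descriptive fields.
CLUSTER = [
    {"name": "gene A", "data_source": "journal article", "relationship": "expressed-in"},
    {"name": "gene B", "data_source": "journal article", "relationship": "associated-with"},
    {"name": "gene C", "data_source": "curated database", "relationship": "expressed-in"},
]


def facet(cluster, characteristic=None):
    """Count concepts per value of `characteristic`; None means faceting is off."""
    if characteristic is None:
        return None
    return Counter(item.get(characteristic, "unknown") for item in cluster)


print(facet(CLUSTER, "data_source"))
print(facet(CLUSTER, "relationship"))
print(facet(CLUSTER))  # faceting switched off
```

The resulting counts could feed a chart, a list, or a color coding for the clustered set; the choice of indicator is left to the display.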
Additionally, faceting may also apply to a taxonomy view of ontology data. For example, a user may wish to reconstruct the organization of data represented in a taxonomy view such as, for example, chemical compound data. The user may reconstruct this taxonomic organization using therapeutic class, pharmacological class, molecular weight, or by other category or characteristic of the data. Other characteristics may be used to reconstruct organizations of other data.
These and other objects, features, and advantages of the invention will be apparent through the detailed description of the preferred embodiments and the drawings attached hereto. It is also to be understood that both the foregoing general description and the following detailed description are exemplary and not restrictive of the scope of the invention.