FIELD OF THE INVENTION
The invention relates to a system and method for support of chemical data within multi-relational ontologies.
BACKGROUND OF THE INVENTION
Knowledge within a given domain may be represented in many ways. One form of knowledge representation may comprise a list representing all available values for a given subject. For example, knowledge in the area of “human body tissue types” may be represented by a list including “hepatic tissue,” “muscle tissue,” “epithelial tissue,” and many others. To represent the total knowledge in a given domain, a number of lists may be needed. For instance, one list may be needed for each subject contained in a domain. Lists may be useful for some applications; however, they generally lack the ability to define relationships between the terms comprising the lists. Moreover, the further division and subdivision of subjects in a given domain typically results in the generation of additional lists, which often include repeated terms, and which do not provide comprehensive representation of concepts as a whole.
Some lists, such as structured lists, for example, may enable computer-implemented keyword searching. The shallow information store often contained in list-formatted knowledge, however, may lead to searches that return incomplete representations of a concept in a given domain.
An additional method of representing knowledge is through thesauri. Thesauri are similar to lists, but they further include synonyms provided alongside each list entry. Synonyms may be useful for improving the recall of a search by returning results for related terms not specifically provided in a query. Thesauri still fail, however, to provide information regarding relationships between terms in a given domain.
Taxonomies build on thesauri by adding an additional level of relationships to a collection of terms. For example, taxonomies provide parent-child relationships between terms. “Anorexia is-a eating disorder” is an example of a parent-child relationship via the “is-a” relationship form. Other parent-child relationship forms, such as “is-a-part-of” or “contains,” may be used in a taxonomy. The parent-child relationships of taxonomies may be useful for improving the precision of a search by removing false positive search results. Unfortunately, exploring only hierarchical parent-child relationships may limit the type and depth of information that may be conveyed using a taxonomy. Accordingly, the use of lists, thesauri, and taxonomies presents drawbacks for those attempting to explore and utilize knowledge organized in these traditional formats.
Additional drawbacks may be encountered when searches of electronic data sources are conducted. As an example, searches of electronic data sources typically return a voluminous amount of results, many of which tend to be only marginally relevant to the specific problem or subject being investigated. Researchers or other individuals are then often forced to spend valuable time sorting through a multitude of search results to find the most relevant results. It is estimated, for example, that scientists spend 20% of their time searching for information existing in a particular area. This is time that highly-trained investigative researchers must spend simply uncovering background knowledge. Furthermore, when an electronic search is conducted, data sources containing highly relevant information may not be returned to a researcher because the concept sought by the researcher is identified by a different set of terms in the relevant data source. This may lead to an incomplete representation of the knowledge in a given subject area. These and other drawbacks exist.
SUMMARY OF THE INVENTION
The invention addresses these and other drawbacks. According to one embodiment, the invention relates to a system and method for support of chemical data within multi-relational ontologies. As described below, enabling entry, storage, manipulation, search, and use of chemical data in multi-relational ontologies adds to the broad knowledge network contained therein. Furthermore, the functionality provided by support of chemical data within multi-relational ontologies enables multi-faceted avenues for discovery within knowledge domains where chemistry plays even a peripheral role.
According to one aspect of the invention, the one or more ontologies may be domain-specific ontologies that may be used individually or collectively, in whole or in part, based on user preferences, user access rights, or other criteria. As used herein, a domain may include a subject matter topic such as, for example, a disease, an organism, a drug, or other topic. A domain may also include one or more entities such as, for example, a person or group of people, a corporation, a governmental entity or other entities. A domain involving an organization may focus on the organization's activities. For example, a pharmaceutical company may produce numerous drugs or focus on treating numerous diseases. An ontology built on the domain of that pharmaceutical company may include information on the company's drugs, their target diseases, or both. A domain may also include an entire industry such as, for example, automobile production, pharmaceuticals, legal services, or other industries. Other types of domains may be used.
As used herein, an ontology may include a collection of assertions. An assertion may include a pair of concepts that have some specified relationship. One aspect of the invention relates to the creation of a multi-relational ontology. A multi-relational ontology is an ontology containing pairs of related concepts. For each pair of related concepts there may be a broad set of descriptive relationships connecting them. As each concept within each pair may also be paired (and thus related by multiple descriptive relationships) with other concepts within the ontology, a complex set of logical connections is formed. These complex connections provide a comprehensive “knowledge network” of what is known directly and indirectly about concepts within a single domain. The knowledge network may also be used to represent knowledge between and among multiple domains. This knowledge network enables discovery of complex relationships between the different concepts or concept types in the ontology. The knowledge network also enables, inter alia, queries involving both direct and indirect relationships between multiple concepts such as, for example, “show me all genes expressed-in liver tissue that-are-associated-with diabetes.”
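By way of non-limiting illustration, the following minimal Python sketch models an ontology as a collection of assertions and answers a query of the kind quoted above by combining two relationships. The gene names, the relationship labels, and the helper functions are hypothetical and are chosen only to mirror the example; they do not describe an actual implementation of the system.

```python
from collections import namedtuple

# An assertion pairs two concepts through a specified relationship.
Assertion = namedtuple("Assertion", ["subject", "relationship", "object"])

# Hypothetical assertions; the gene names are placeholders, not real data.
ONTOLOGY = [
    Assertion("gene A", "expressed-in", "liver tissue"),
    Assertion("gene A", "is-associated-with", "diabetes"),
    Assertion("gene B", "expressed-in", "liver tissue"),
    Assertion("gene C", "is-associated-with", "diabetes"),
]


def subjects_of(assertions, relationship, obj):
    """Return every subject linked to `obj` by `relationship`."""
    return {a.subject for a in assertions
            if a.relationship == relationship and a.object == obj}


def genes_expressed_in_and_associated_with(assertions, tissue, disease):
    """Intersect two relationship lookups to answer a combined query."""
    expressed = subjects_of(assertions, "expressed-in", tissue)
    associated = subjects_of(assertions, "is-associated-with", disease)
    return sorted(expressed & associated)


result = genes_expressed_in_and_associated_with(
    ONTOLOGY, "liver tissue", "diabetes")
```

In this sketch only the placeholder "gene A" satisfies both relationships, so it is the only concept returned.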
Another aspect of the invention relates to specifying each concept type and relationship type that may exist in an ontology. These concept types and relationship types may be arranged according to a structured organization. This structured organization may include defining the set of possible relationships that may exist for each pair of concept types (e.g., two concept types that can be related in one or more ways). In one embodiment, this set of possible relationships may be organized as a hierarchy. The hierarchy may include one or more levels of relationships and/or synonyms. In one embodiment, the set of possible concept types and the set of possible relationships that can be used to relate each pair of concept types may be organized as an ontology. As detailed below, these organizational features (as well as other features) enable novel uses of multi-relational ontologies that contain knowledge within a particular domain.
Concept types may themselves be concepts within an ontology (and vice versa). For example, the term “muscle tissue” may exist as a specific concept within an ontology, but may also be considered a concept type within the same ontology, as there may be different kinds of muscle tissue represented within the ontology. As such, a pair of concept types that can be related in one or more ways may be referred to herein as a “concept pair.” Thus, reference herein to “concept pairs” and “concepts” does not preclude these objects from retaining the qualities of both concepts and concept types.
According to one embodiment of the invention, the computer-implemented system may include an upper ontology, an extraction module, a rules engine, an editor module, a chemical support module, one or more databases and servers, and a user interface module. Additionally, the system may include one or more of a quality assurance module, a publishing module, a path-finding module, an alerts module and an export manager. Other modules may be used.
According to one embodiment, the upper ontology may store rules regarding the concept types that may exist in an ontology, the relationship types that may exist in an ontology, the specific relationship types that may exist for a given pair of concept types, and the types of properties that those concepts and relationships may have.
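As a non-limiting sketch of how such rules might be consulted, the following Python fragment stores the permitted relationship types for each ordered pair of concept types and checks a candidate assertion against them. The table contents and the `is_permitted` function are illustrative assumptions and are not taken from the described system.

```python
# Hypothetical upper ontology: the relationship types permitted for each
# ordered pair of concept types. All entries are illustrative assumptions.
UPPER_ONTOLOGY = {
    ("COMPOUND", "DISEASE"): {"causes", "treats"},
    ("GENE", "TISSUE"): {"expressed-in"},
}


def is_permitted(subject_type, relationship, object_type):
    """Return True if the relationship may link the two concept types."""
    allowed = UPPER_ONTOLOGY.get((subject_type, object_type), set())
    return relationship in allowed
```

For example, `is_permitted("COMPOUND", "treats", "DISEASE")` returns `True`, whereas a concept type pair that is absent from the table permits no relationships.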
The system may have access to various data sources. These data sources may be structured, semi-structured, or unstructured data sources. The data sources may include public or private databases; books, journals, or other textual materials in print or electronic format; websites, or other data sources. In one embodiment, data sources may also include one or more searches of locally or remotely available information stores, including, for example, hard drives, email repositories, shared file systems, or other information stores. These information stores may be useful when utilizing an organization's internal information to provide ontology services to the organization. From this plurality of data sources, a “corpus” of documents may be selected. A corpus may include a body of documents within the specific domain from which one or more ontologies are to be constructed. As used herein, the term “document” is used broadly and is not limited to text-based documents. For example, it may include database records, web pages, and much more.
The upper ontology may also include curator information. As detailed below, one or more curators may interact with the system. The upper ontology may store information about the curator and curator activity.
In one embodiment of the invention, a data extraction module may be used to extract data, including assertions, from one or more specified data sources. For different ontologies, different data sources may be specified. The rules engine, and rules included therein, may be used by the data extraction module for this extraction. According to one embodiment, the data extraction module may perform a series of steps to extract “rules-based assertions” from one or more data sources. These rules-based assertions may be based on concept types and relationship types specified in the upper ontology, rules in the rules engine, or other rules.
Some rules-based assertions may be “virtual assertions.” Virtual assertions may be created when data is extracted from certain data sources (usually structured data sources). In one embodiment, one or more structured data sources may be mapped to discern their structure. The resultant “mappings” may be considered rules that may be created using, and/or utilized by, the rules engine. Mappings may include rules that bind two or more data fields from one or more data sources (usually structured data sources). The specific assertions created by mappings may not physically exist in the data sources in explicit linguistic form (hence, the term “virtual assertion”); rather, they may be created by applying a mapping to the structured data sources.
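By way of illustration only, a mapping that binds two data fields may be sketched in Python as follows. The field names, the "binds-to" relationship, and the row contents are hypothetical; the sketch merely shows how an assertion that appears nowhere in explicit linguistic form can be produced by applying a mapping to structured records.

```python
# Hypothetical rows from a structured data source (e.g., a database table).
ROWS = [
    {"compound_name": "compound X", "target_name": "receptor R"},
    {"compound_name": "compound Y", "target_name": "receptor S"},
]

# A mapping (rule) binding two data fields through a relationship.
MAPPING = {
    "subject_field": "compound_name",
    "relationship": "binds-to",
    "object_field": "target_name",
}


def apply_mapping(rows, mapping):
    """Yield one (subject, relationship, object) virtual assertion per row."""
    for row in rows:
        yield (row[mapping["subject_field"]],
               mapping["relationship"],
               row[mapping["object_field"]])


virtual_assertions = list(apply_mapping(ROWS, MAPPING))
```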
Virtual assertions and other rules-based assertions extracted by the extraction module may be stored in one or more databases. For convenience, this may be referred to as a “rules-based assertion store.” According to another aspect of the invention, various types of information related to an assertion may be extracted by the extraction module and stored with the virtual assertions or other assertions within the rules-based assertion store.
In one embodiment, properties may be extracted from the corpus and stored with concept, relationship and assertion data. Properties may include one or more of the data source from which a concept was extracted, the type of data source from which it was extracted, the mechanism by which it was extracted, when it was extracted, the evidence underlying concepts and assertions, confidence weights associated with concepts and assertions, and/or other information. In addition, each concept within an ontology may be associated with a label, at least one relationship, at least one concept type, and/or any number of other properties. In some embodiments, properties may indicate specific units of measurement.
In one embodiment, one or more multi-relational ontologies may include chemical compounds as concepts. In some embodiments, the structure of a chemical compound may be considered the name of a chemical compound concept. The use of an actual structure rather than a lexical (text) name may avoid potential ambiguity over what the compound actually is, especially among compounds where the same lexical name is used for structurally distinct compounds (e.g., a salt form or a racemic form of the same compound). In some embodiments, chemical compounds have lexical names, as well as structural names.
In some embodiments, the chemical structure of a chemical compound may be stored as a simplified molecular input line entry specification (SMILES) string or other chemical structure nomenclature or representation. As used herein, a SMILES string refers to a particular comprehensive chemical nomenclature capable of representing the structure of a chemical compound using text characters. A one-dimensional SMILES string or other nomenclature or representation may be used to regenerate two-dimensional drawings and three-dimensional coordinates of chemical structures, and may therefore enable a compressed representation of the structure. As mentioned throughout the specification, chemical structure nomenclatures other than SMILES strings may be used.
Because the chemical structure of a chemical compound is a concept within the ontology, it may form assertions with other concepts and/or properties within the ontology. The chemical structure, its lexical names, its properties, and other information may present a multi-dimensional description of the chemical compound concept within the ontology.
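As a non-limiting sketch, a chemical compound concept keyed by its structure may be represented in Python as shown below. The `CompoundConcept` class, the `add_compound` helper, and the index layout are hypothetical. The sketch treats the SMILES string as an opaque key and performs no canonicalization; because one structure can be written as more than one valid SMILES string, a practical system would be expected to canonicalize structures with a cheminformatics toolkit before using them as names.

```python
from dataclasses import dataclass, field


@dataclass
class CompoundConcept:
    smiles: str                                        # structural name
    lexical_names: set = field(default_factory=set)    # text names
    properties: dict = field(default_factory=dict)     # other properties


# Hypothetical index of compound concepts keyed by structure.
index = {}


def add_compound(smiles, lexical_name, **properties):
    """Attach a lexical name and properties to the concept for a structure."""
    concept = index.setdefault(smiles, CompoundConcept(smiles))
    concept.lexical_names.add(lexical_name)
    concept.properties.update(properties)
    return concept


# Two lexical names resolve to a single structural concept.
add_compound("CCO", "ethanol", molecular_formula="C2H6O")
add_compound("CCO", "ethyl alcohol")
# An acid and its sodium salt remain structurally distinct concepts.
add_compound("CC(=O)O", "acetic acid")
add_compound("CC(=O)[O-].[Na+]", "sodium acetate")
```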
In one embodiment, rules may be applied to the documents to generate “rules-based assertions” from the tagged and/or parsed concept, relationship, assertion, or other information within the corpus. The upper ontology of concept and relationship types may be used by the rules to guide the generation of these rules-based assertions.
Disambiguation may be applied as part of rules-based assertion generation. Disambiguation may utilize semantic divergence of single terms to correctly identify concepts relevant to the ontology. For a term that may have multiple meanings, disambiguation may discern what meanings are relevant to the specific domain for which one or more ontologies are to be created. The context and relationships around instances of a term (lexical name/lexical label) may be recognized and utilized for disambiguation. For example, rules used to create a disease-based ontology may create the rules-based assertion “cancer is-caused-by smoking” upon tagging the term “cancer” in a document. However, the same rules may tag the term “cancer,” but may recognize that the text “cancer is a sign of the zodiac” does not contain relevant information for a disease-based ontology.
Another example, closely tied to ontology-seeded NLP, is the text “compound x eradicates BP.” BP could be an acronym for Blood Pressure or for Bacillus pneumoniae, but since it does not make sense to eradicate blood pressure (as informed by an ontology as a priori knowledge), the system can disambiguate the acronym properly from the context to be Bacillus pneumoniae. This is an example of using the relationships in the multi-relational ontology as a seed, as well as the concept types and specific instances. In practical terms, the ERADICATES relation only occurs between COMPOUND and ORGANISM, and not between COMPOUND and PHYSIOLOGICAL PHENOMENON.
The knowledge that underpins decisions such as these may be based on a full matrix analysis of previous instances of terms and/or verbs. The number of times a given verb connects all pairs of concept types may be measured and used as a guide to the likely validity of a given assertion when it is identified. For example, the verb “activates” may occur 56 times between the concept pair COMPOUND and BIOCHEMICAL PROCESS, but never between the concept pair COMPOUND and PHARMACEUTICAL COMPANY. This knowledge may be utilized by rules and/or curators to identify and disambiguate assertions, and/or for other purposes.
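The use of verb counts per concept type pair to disambiguate a term may be sketched, by way of example only, as follows. The count of 56 for “activates” follows the example above; the count for “eradicates,” the candidate table, and the `disambiguate` function are hypothetical assumptions introduced for illustration.

```python
from collections import Counter

# Counts of how often a verb links an ordered pair of concept types.
# The "eradicates" count is a hypothetical value for illustration.
VERB_PAIR_COUNTS = Counter({
    ("eradicates", "COMPOUND", "ORGANISM"): 40,
    ("activates", "COMPOUND", "BIOCHEMICAL PROCESS"): 56,
})

# Candidate expansions of an ambiguous term, each with its concept type.
CANDIDATES = {
    "BP": [
        ("blood pressure", "PHYSIOLOGICAL PHENOMENON"),
        ("Bacillus pneumoniae", "ORGANISM"),
    ],
}


def disambiguate(subject_type, verb, term):
    """Pick the expansion whose concept type the verb most often links to."""
    scored = [(VERB_PAIR_COUNTS[(verb, subject_type, ctype)], name)
              for name, ctype in CANDIDATES[term]]
    best_count, best_name = max(scored)
    # A zero count means no prior evidence supports any candidate.
    return best_name if best_count > 0 else None
```

Under these assumed counts, `disambiguate("COMPOUND", "eradicates", "BP")` selects “Bacillus pneumoniae,” because no prior instance links a compound to a physiological phenomenon through “eradicates.”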
As mentioned above, the application of rules may be directed by the upper ontology. In defining relationship types that can exist in one or more domain specific ontologies and the rules that can be used for extraction and creation of rules-based assertions, the upper ontology may factor in semantic variations of relationships. Semantic variations may dictate that different words may be used to describe the same relationship. The upper ontology may take this variation into account. Additionally, the upper ontology may take into account the inverse of each relationship type used. As a result, the vocabulary for assertions being entered into the system is accurately controlled. By enabling this rich set of relationships for a given concept, the system of the invention may connect concepts within and across domains, and may provide a comprehensive knowledge network of what is known directly and indirectly about each particular concept.
The upper ontology may also enable flags that factor negation and inevitability of relationships into specific instances of assertions. In some embodiments, certain flags (e.g., negation, uncertainty, or others) may be used with a single form of a relationship to alter the meaning of the relationship. For example, instead of storing all the variations of the relationship “causes” (e.g., does-not-cause, may-cause) the upper ontology may simply add one or more flags to the root form “causes” when specific assertions require one of the variations. For example, a statement from a document such as “compound X does not cause disease Y” may be initially generated as the assertion “compound X causes disease Y.” The assertion may be tagged with a negation flag to indicate that the intended sense is “compound X does-not-cause disease Y.” Similarly, an inevitability flag may be used to indicate that there is a degree of uncertainty or lack of complete applicability about an original statement, e.g., “compound X may-cause disease Y.” These flags can be used together to indicate that “compound X may-not-cause disease Y.” Inverse relationship flags may also be utilized for assertions representing inverse relationships. For example, applying an inverse relationship flag to the relationship “causes” may produce the relationship “is-caused-by.” Other flags may be used alone or in combination with one another.
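By way of non-limiting illustration, storing a single root relationship together with flags may be sketched in Python as shown below. The `FlaggedAssertion` class, the table of surface forms, and the choice to render the inverse only for an otherwise unflagged assertion are assumptions made for this sketch.

```python
from dataclasses import dataclass

# Hypothetical surface forms for one root relationship, keyed by
# (negation flag, inevitability flag).
FORMS = {
    "causes": {
        (False, False): "causes",
        (True, False): "does-not-cause",
        (False, True): "may-cause",
        (True, True): "may-not-cause",
    },
}
INVERSE = {"causes": "is-caused-by"}


@dataclass(frozen=True)
class FlaggedAssertion:
    subject: str
    root: str                  # root form of the relationship
    obj: str
    negated: bool = False      # negation flag
    uncertain: bool = False    # inevitability (uncertainty) flag
    inverse: bool = False      # inverse relationship flag

    def render(self):
        """Return the intended sense of the stored root-form assertion."""
        if self.inverse:
            if self.negated or self.uncertain:
                # Combining the inverse with other flags is out of scope here.
                raise NotImplementedError("inverse with other flags")
            return f"{self.obj} {INVERSE[self.root]} {self.subject}"
        form = FORMS[self.root][(self.negated, self.uncertain)]
        return f"{self.subject} {form} {self.obj}"
```

For example, `FlaggedAssertion("compound X", "causes", "disease Y", negated=True).render()` yields "compound X does-not-cause disease Y" while only the root form “causes” is stored.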
In one embodiment, the system and/or a curator may curate assertions by undertaking one or more actions regarding assertions within the rules-based assertion store. Examples of actions/processes of curation may include, for example, reifying/validating rules-based assertions (which entails accepting individual, many, or all assertions created by a rule or mapping), identifying new assertions (including those created by inferencing methods), editing assertions, or other actions.
In some embodiments, the actions undertaken in curation may be automated, manual, or a combination of both. For example, manual curation processes may be used when a curator has identified a novel association between two concepts in an ontology that has not previously been present at any level. The curator may directly enter these novel assertions into an ontology in a manual fashion. Manually created assertions are considered automatically validated because they are the product of human thought. However, they may still be subject to the same or similar semantic normalization and quality assurance processes as rules-based assertions.
Automated curation processes may be conducted by rules stored by the rules engine. Automated curation may also result from the application of other rules, such as extraction rules. For example, one or more rules may be run against a corpus of documents to identify and extract rules-based assertions. If a rule has been identified as sufficiently accurate (e.g., >98% accurate as determined by application against a test-corpus), the rules-based assertions that it extracts/generates may be automatically considered curated without further validation. If a rule falls below this (or other) accuracy threshold, the assertions it extracts/generates may be identified as requiring further attention. A curator may choose to perform further validation by applying a curation rule or by validating the assertions manually. Automated curation of virtual assertions may be accomplished in a similar fashion. If a mapping (rule) is identified as performing above a certain threshold, a curator may decide to reify or validate all of the virtual assertions in one step. A curator may also decide to reify them individually or in groups.
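The threshold-based treatment of rules described above may be sketched, as one non-limiting example, in Python. The sketch assumes that accuracy is measured as the fraction of a rule's extracted assertions that appear in a hand-validated set for a test corpus; that definition, the function names, and the status labels are assumptions, and only the 98% figure comes from the example above.

```python
ACCURACY_THRESHOLD = 0.98  # example threshold from the description


def rule_accuracy(extracted, validated):
    """Fraction of extracted assertions found in the validated test set."""
    if not extracted:
        return 0.0
    hits = sum(1 for assertion in extracted if assertion in validated)
    return hits / len(extracted)


def triage(extracted, validated, threshold=ACCURACY_THRESHOLD):
    """Label a rule's output as auto-curated or as needing attention."""
    accuracy = rule_accuracy(extracted, validated)
    return "auto-curated" if accuracy > threshold else "needs-review"
```

A rule whose output falls at or below the threshold is labeled "needs-review" here, leaving the choice between a curation rule and manual validation to the curator.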
In some embodiments, curators may also work with and further annotate reified assertions in the same way as rules-based assertions.
Throughout the invention, it may be desirable to document, through evidence and properties, the mechanisms by which assertions were created and curated. As such, curator information (e.g., who curated and what they did) may be associated with assertions. Accordingly, curators or other persons may filter out some or all assertions based on curator information, confidence scores, inference types, rules, mechanisms, and/or other properties.
In one embodiment, curation processes may utilize an editor module. The editor module may include an interface through which a curator interacts with various parts of the system and the data contained therein. The editor module may be used to facilitate various functions. For example, the editor module may enable an authorized individual (e.g., a curator) to engage in a curation process. Through the curation processes, one or more curators may interact with the rules-based assertions and/or create new assertions. Interacting with the rules-based assertions may include one or more of viewing the assertions and related information (e.g., evidence sets), reifying the rules-based assertions, editing the rules-based assertions, rejecting the validity of the rules-based assertions, or performing other tasks. In one embodiment, assertions whose validity has been rejected may be retained in the system alongside other “dark nodes” or assertions considered to be untrue. The curator may also use the editor module to create new assertions. In some embodiments, the editor module may be used to define and coordinate some or all automated elements of data (e.g., concept, relationship, assertion) extraction.
In some embodiments, the curator may also add tags to assertions regarding confidence weights or other weighing factors determined by the curator to be relevant to the purpose of the ontology. Confidence weights may also be added by the system through an automated process.
Curation processes may produce a plurality of reified assertions. Reified assertions may be stored in one or more databases. For convenience, this may be referred to as the reified assertion store. The reified assertion store may also include assertions resulting from manual creation/editing, and other non-rule based assertions. The rules-based assertion store and the reified assertion store may exist in the same database or may exist in separate databases. Both the rules-based assertion store and the reified assertion store may be queried by SQL or other procedures. Additionally, both the rules-based and reified assertions stores may contain version information. Version information may include information regarding the contents of the rules-based and/or reified assertion stores at particular points in time.
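As a non-limiting sketch of querying the assertion stores by SQL, the following Python fragment uses an in-memory SQLite database. The single-table layout, the column names, and the integer version numbers are hypothetical; the description does not specify a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: both stores share one table, distinguished by `store`.
conn.execute(
    """CREATE TABLE assertions (
           subject TEXT, relationship TEXT, object TEXT,
           store TEXT, version INTEGER)"""
)
conn.executemany(
    "INSERT INTO assertions VALUES (?, ?, ?, ?, ?)",
    [
        ("compound X", "causes", "disease Y", "reified", 1),
        ("compound Z", "treats", "disease Y", "reified", 2),
        ("compound W", "causes", "disease Y", "rules-based", 1),
    ],
)

# Reified assertions as they existed at version 1.
rows = conn.execute(
    "SELECT subject, relationship, object FROM assertions "
    "WHERE store = ? AND version <= ? ORDER BY subject",
    ("reified", 1),
).fetchall()
conn.close()
```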
In one embodiment, a quality assurance module may perform various quality assurance operations on the reified assertion store. The quality assurance module may include a series of rules, which may be utilized by the rules engine to test the internal and external consistency of the assertions that comprise an ontology. The tests performed by these rules may include certain “mundane” tests such as, for example, tests for proper capitalization or connectedness of individual concepts (in some embodiments, concepts may be required to be connected to at least one other concept). Other tests may ensure that concept typing is consistent with the relationships for individual concepts (upstream processes/elements such as various rules and/or the upper ontology generally ensure that these will already be correct, but they still may be checked). More complex tests may include those that ensure semantic consistency. For example, if an individual concept shares 75% of its synonyms with another individual concept, they may be candidates for semantic normalization, and therefore may be flagged for manual curation.
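The synonym-overlap test may be sketched, by way of example only, as follows. The sketch assumes that a pair is flagged when either concept shares at least 75% of its own synonyms with the other; that reading of the threshold, the function names, and the synonym sets are illustrative assumptions.

```python
from itertools import combinations


def shared_fraction(synonyms_a, synonyms_b):
    """Fraction of concept A's synonyms that concept B also carries."""
    if not synonyms_a:
        return 0.0
    return len(synonyms_a & synonyms_b) / len(synonyms_a)


def normalization_candidates(synonyms_by_concept, threshold=0.75):
    """Return concept pairs to flag for manual semantic normalization."""
    flagged = []
    for a, b in combinations(sorted(synonyms_by_concept), 2):
        syn_a, syn_b = synonyms_by_concept[a], synonyms_by_concept[b]
        overlap = max(shared_fraction(syn_a, syn_b),
                      shared_fraction(syn_b, syn_a))
        if overlap >= threshold:
            flagged.append((a, b))
    return flagged


# Hypothetical synonym sets for three concepts.
SYNONYMS = {
    "heart attack": {"heart attack", "myocardial infarction", "MI",
                     "cardiac infarction"},
    "myocardial infarction": {"heart attack", "myocardial infarction", "MI"},
    "migraine": {"migraine headache"},
}

flagged = normalization_candidates(SYNONYMS)
```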
A publishing module may then publish reified assertions as a functional ontology. In connection with publication of reified assertions, the reified assertion store may be converted from a node-centered edit schema to a graph-centered browse schema. In some embodiments, virtual assertions derived from structured data sources may not be considered “reified.” However, if these virtual assertions are the product of high percentage rules/mappings, they may not require substantive reification during curation and may achieve a nominal “reified” status upon preparation for publication. As such, the conversion from edit schema to browse schema may also serve to reify any of the remaining un-reified virtual assertions in the system (at least those included in publication).
Publication and/or conversion (from edit to browse schema) may occur whenever it is desired to “freeze” a version of an ontology as it exists with the information accumulated at that time and use the accumulated information according to the systems and methods described herein (or with other systems or methods). In some embodiments, the publishing module may enable an administrative curator or other person with appropriate access rights to indicate that the information as it exists is to be published and/or converted (from edit to browse schema). The publishing module may then perform the conversion (from edit to browse schema) and may load a new set of tables (according to the browse schema) in a database. In some embodiments, data stored in the browse schema may be stored in a separate database from the data stored in an edit schema. In other embodiments, it may be stored in the same database.
During extraction and curation, assertions may be stored in an edit schema using a node-centered approach. Node-centered data focuses on the structural and conceptual framework of the defined logical connection between concepts and relationships. In connection with publication, however, assertions may be stored in a browse schema using a graph-centered approach.
Graph-centered views of ontology data may include the representation of assertions as concept-relationship-concept (CRC) “triplets.” In these triplets, two nodes are connected by an edge, wherein the nodes correspond to concepts and the edge corresponds to a relationship.
In one embodiment, CRC triplets may be used to produce a directed graph representing the knowledge network contained in one or more ontologies. A directed graph may include two or more interconnected CRC triplets that potentially form cyclic paths of direct and indirect relationships between concepts in an ontology or part thereof.
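As a non-limiting illustration, a directed graph may be assembled from CRC triplets and traversed to find a path of assertions between two concepts as sketched below. The triplets, the relationship labels, and the use of a breadth-first search are assumptions of this sketch; the description does not prescribe a particular traversal technique.

```python
from collections import defaultdict, deque

# Hypothetical CRC triplets; the last one closes a cyclic path.
TRIPLETS = [
    ("compound X", "inhibits", "protein P"),
    ("protein P", "is-expressed-in", "liver tissue"),
    ("liver tissue", "is-affected-by", "disease Y"),
    ("disease Y", "is-treated-by", "compound X"),
]


def build_graph(triplets):
    """Map each concept (node) to its outgoing (relationship, concept) edges."""
    graph = defaultdict(list)
    for subject, relationship, obj in triplets:
        graph[subject].append((relationship, obj))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns a shortest list of triplets or None."""
    queue = deque([(start, [])])
    visited = {start}  # guards against looping around cyclic paths
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relationship, nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, relationship, nxt)]))
    return None


graph = build_graph(TRIPLETS)
path = find_path(graph, "compound X", "liver tissue")
```

Here `path` holds two triplets, showing an indirect relationship between "compound X" and "liver tissue" by way of "protein P."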
The elements and processes described above may be utilized in whole or in part to generate and publish one or more multi-relational, domain-specific ontologies. In some embodiments, not all elements or processes may be necessary. The one or more ontologies may be then used, collectively or individually, in whole or in part, as described below.
Once one or more ontologies are published, they can be used in a variety of ways. For example, one or more users may view one or more ontologies and perform other knowledge discovery processes via a graphical user interface (GUI) as enabled by a user interface module. A path-finding module may enable the paths of assertions existing between concepts of an ontology to be selectively navigated. A chemical support module may enable the storage, manipulation, and use of chemical structure information within an ontology. Also, the system may enable a service provider to provide various ontology services to one or more entities, including exportation of one or more ontologies (or portions thereof), the creation of custom ontologies, knowledge capture services, ontology alert services, merging of independent taxonomies or existing ontologies, optimization of queries, integration of data, and/or other services.
According to another aspect of the invention, a user interface module may enable a novel graphical user interface. The graphical user interface may enable a user to interact with one or more ontologies. In one embodiment, a graphical user interface may include a search pane. Within the search pane, a user may input a concept of interest, term of interest, or relevant string of characters. The system may search one or more ontologies for the concept of interest, term of interest, or the relevant string (including identifying and searching synonyms of concepts in the ontologies). The graphical user interface may then display the results of the search, including the name of the concepts returned by the search, their concept type, their synonyms, or other information. The user may then select a concept from the displayed results and utilize the functionality described below.
In one embodiment, the graphical user interface may utilize a chemical support module to enable a chemical search pane. The chemical search pane may be part of, or integrated with, the search pane described above. The chemical search pane may enable a user to search for chemical compounds and/or their chemical structures within one or more ontologies. The chemical search pane may enable a user to search the chemical by name, chemical formula, SMILES string (or other chemical structure nomenclature or representation), two-dimensional representation, chemical similarity, chemical substructure, or other identifier or quality. The chemical search pane may also enable a user to search for portions of chemical structures.
In one embodiment, the chemical support module may enable a chemical structure editor. The chemical structure editor may enable a user to select, create, edit, or manipulate chemical structures within one or more ontologies. For example, if the user desires to search for chemical structures by inputting a two-dimensional representation of a chemical structure into the chemical search pane, the user may construct the two-dimensional representation (or modify an existing representation) in a chemical structure editor. The chemical structure editor may enable a user to select constituent atoms and chemical bonds existing therebetween to construct a two-dimensional representation of the chemical structure of interest.
In one embodiment, the chemical support module may return a list or spreadsheet of compounds similar to a searched (or otherwise selected) chemical structure to the extent that the similar compounds exist as concepts within the searched ontologies. The user may then select a compound from the list. The selected compound may be displayed by its lexical label, as any other selected concept would be displayed by the graphical user interface in the various embodiments described below (e.g., in a hierarchical pane, multi-relational pane, etc.). The user may then utilize the totality of tools enabled by the invention as described herein to access and navigate through the knowledge directly or indirectly associated with the selected compound.
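One way such a list of similar compounds might be produced is sketched below, by way of example only. The description does not specify a similarity measure; this sketch assumes the Tanimoto coefficient, a measure commonly applied to structural fingerprints, and uses hypothetical sets of structural features in place of real fingerprints. The compound names, feature labels, and threshold are likewise assumptions.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient: shared features divided by total features."""
    union = features_a | features_b
    return len(features_a & features_b) / len(union) if union else 0.0


# Hypothetical feature sets standing in for structural fingerprints of
# compounds that exist as concepts in the searched ontologies.
FINGERPRINTS = {
    "compound A": {"aromatic ring", "hydroxyl", "carboxyl"},
    "compound B": {"aromatic ring", "hydroxyl", "amine"},
    "compound C": {"alkyl chain", "ester"},
}


def similar_compounds(query_features, fingerprints, threshold=0.5):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(tanimoto(query_features, features), name)
              for name, features in fingerprints.items()]
    return [(name, score)
            for score, name in sorted(scored, reverse=True)
            if score >= threshold]


hits = similar_compounds({"aromatic ring", "hydroxyl", "carboxyl"},
                         FINGERPRINTS)
```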
In one embodiment, the chemical support module may enable a user to select a group of chemical compounds. The compounds may be grouped by a common characteristic, or may be grouped manually by the user. The chemical support module may then enable the user to visualize the structure and analyze the similarities and differences (structural or otherwise) between the compounds in the group. This functionality, along with the ability to access a knowledge web containing direct and indirect relationships about each compound in the group, may enable further knowledge discovery between and among the compounds in the group.
In one embodiment, the chemical support module may enable a user to select a chemical compound from within one or more ontologies and use a cheminformatics software application (e.g., an application provided by Daylight Chemical Information Systems, Inc.) in conjunction with the collective data of the one or more ontologies to assess a broader set of related information. This related information may include, for example, contextually-related annotation information or other information from the structure of the class of compounds. This related information may also include biological information such as, for example, receptors that a selected compound binds to. Related information may also include legal, business, and/or other information regarding a selected compound such as, for example, patent information (e.g., rights holders, issue date, or other information) or licensing information regarding the compound. This biological, legal, business, or other information may be stored within the ontology properties of a selected compound.
In some embodiments, cheminformatics software may also enable the generation of a number of different physicochemical properties for a chemical or substructure of interest such as, for example, cLogP (a measure of hydrophobicity), hydrogen bond donor/acceptor potential, surface area, volume, size/shape parameters, or other properties. These properties may be utilized to cluster compounds or substructures on the basis of similarities or differences in these properties. In some embodiments, these properties may be analyzed by exporting ontology data, including chemical data, to analysis applications. Clustering of this kind may be utilized to, for example, differentiate active/non-active or toxic/non-toxic compounds by their physicochemical properties. The chemical support module may also utilize the properties and contextually related information (e.g., biology, business, patent, or other information) of chemical structure concepts to cluster chemical structures based on biological, legal, business, or other criteria, rather than simply on physicochemical properties.
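By way of illustration only, the property-based clustering described above may be sketched as follows. The sketch assumes that the properties have already been computed by a cheminformatics application and placed on comparable scales. The function name `leader_cluster`, the radius, and the property values are hypothetical, and the simple greedy method shown is a stand-in rather than the clustering method of any particular embodiment.

```python
import math

def leader_cluster(properties, radius):
    """Greedy 'leader' clustering: each compound joins the first cluster
    whose leader lies within `radius` (Euclidean distance); otherwise it
    starts a new cluster and becomes that cluster's leader."""
    clusters = []  # list of (leader_vector, member_names)
    for name, vector in properties.items():
        for leader, members in clusters:
            if math.dist(leader, vector) <= radius:
                members.append(name)
                break
        else:
            clusters.append((vector, [name]))
    return [members for _, members in clusters]

# Hypothetical (cLogP, hydrogen bond donor count) pairs, for illustration only.
example_properties = {
    "compound A": (1.0, 2.0),
    "compound B": (1.2, 2.0),
    "compound C": (4.0, 1.0),
}
example_clusters = leader_cluster(example_properties, radius=0.5)
```

In this sketch, compounds A and B fall into one cluster and compound C into another. Such clusters could then be compared against activity or toxicity labels held in the ontology.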
In one embodiment, one or more selected chemical compounds, their associated chemical structures, and other information may be assembled into a subset and exported to a remote location, to cheminformatics software, or to other software or applications for use.
In one embodiment, the chemical support module may enable chemical structures existing as concepts within one or more ontologies to be displayed to a user as a two-dimensional representation of the chemical structure. Three-dimensional representations may also be enabled by the chemical support module.
In one embodiment, a chemical support module may enable the chemical structure (or a part thereof) of a chemical compound to be subject to a similarity search. The similarity search may enable a user to apply search constraints such as, for example, “return only compounds directly related to rhabdomyolysis.” The similarity search may also enable the user to select appropriate similarity or dissimilarity criteria such as, for example, Tanimoto similarity or dissimilarity, cLogP value, hydrogen bond donor/acceptor potential, surface area, size/shape parameters, and/or other criteria. The user may then be presented with compounds existing within the ontology that meet the specified search constraints (if any) and similarity criteria. The user may then view the structure of any of the returned compounds and utilize the system's chemical support functionality as desired.
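By way of illustration only, a constrained Tanimoto similarity search of the kind described above may be sketched as follows. The sketch assumes that each structure has already been reduced to a fingerprint, represented here as a set of on-bit indices. The function names, the fingerprints, and the `allowed` constraint set are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient for two fingerprints given as sets of on-bit indices."""
    union = a | b
    if not union:
        return 1.0  # assumed convention when both fingerprints are empty
    return len(a & b) / len(union)

def similarity_search(query, fingerprints, threshold, allowed=None):
    """Return (name, score) pairs with score >= threshold, best first.
    `allowed`, if given, is the set of names satisfying the search constraint."""
    hits = []
    for name, fingerprint in fingerprints.items():
        if allowed is not None and name not in allowed:
            continue
        score = tanimoto(query, fingerprint)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical fingerprints, for illustration only.
fingerprints = {
    "compound A": {1, 2, 3, 4},
    "compound B": {1, 2, 3, 9},
    "compound C": {7, 8},
}
# Hypothetical result of a constraint such as "directly related to a given condition".
related_to_condition = {"compound B", "compound C"}
hits = similarity_search({1, 2, 3, 4}, fingerprints, 0.5, allowed=related_to_condition)
```

Here the constraint is applied before scoring, so compound A is excluded even though its fingerprint is identical to the query. Only compound B is returned, with a score of 3/5.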
In some embodiments, the chemical support module may sit alongside any existing or subsequently developed chemistry infrastructure/applications. In one embodiment, a set of canonical SMILES strings is generated for each chemical structure in an ontology. An existing chemistry application may then be used to search, analyze, or otherwise browse or manipulate the chemical data to elucidate compounds of interest. The compounds of interest may then be compared to the SMILES strings in the ontology's structure lookup lists, and all contextual information from the ontology may be associated with the compounds of interest. This feature may provide independence from the specific chemistry application and may allow issues of scalability to be deferred to the existing chemistry application or cheminformatics system.
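By way of illustration only, the comparison of compounds of interest against a structure lookup list may be sketched as follows. The sketch assumes that the external chemistry application and the ontology use the same canonicalization, so that identical structures yield identical strings. The concept identifiers, the dictionary layout, and the function name `attach_context` are hypothetical.

```python
# Hypothetical structure lookup list: canonical SMILES -> ontology concept id.
structure_lookup = {
    "CCO": "concept:0001",
    "CC(=O)O": "concept:0002",
}

# Hypothetical contextual information held in the ontology for each concept.
ontology_context = {
    "concept:0001": {"label": "ethanol", "assertions": ["is-a alcohol"]},
    "concept:0002": {"label": "acetic acid", "assertions": ["is-a carboxylic acid"]},
}

def attach_context(hit_smiles, lookup, context):
    """Associate ontology context with each SMILES string returned by an
    external chemistry application; collect strings with no matching concept."""
    matched, unmatched = {}, []
    for smiles in hit_smiles:
        concept_id = lookup.get(smiles)
        if concept_id is None:
            unmatched.append(smiles)
        else:
            matched[smiles] = context[concept_id]
    return matched, unmatched

matched, unmatched = attach_context(["CCO", "c1ccccc1"], structure_lookup, ontology_context)
```

Because the join is a plain string lookup, the search itself may remain entirely within the external application, consistent with the independence described above.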
In one embodiment, the graphical user interface may include a hierarchical pane. A hierarchical pane may display a hierarchy of concept types as defined by the upper ontology. Within this hierarchy, specific instances of concept types contained within the ontology may be displayed along with certain relationships between these instances and concept types. In one embodiment, the relationships displayed here may include “instance,” “part-of,” or other relationships. Certain concepts may be instances of (or parts of) concept types and may have additional concepts organized underneath them. In one embodiment, a user may select a concept from the hierarchical pane and view all of the descendants of that concept. The descendants may be displayed with their accompanying assertions as a list, or in a merged graph, similar to those described in detail below.
In one embodiment, the graphical user interface according to the invention may include a relationship pane. The relationship pane may display the relationships that are present in the hierarchical pane for a selected concept. For instance, the relationship pane may display the relationship between a selected concept and its parents. Because of the interconnectedness of an ontology, a given concept may have multiple hierarchical parents. Additionally, the relationship pane may display relationships up one or more levels in the hierarchy, down one or more levels in the hierarchy, or sideways in the hierarchy (e.g., synonyms).
In one embodiment, the graphical user interface according to the invention may include a multi-relational display pane. The multi-relational display pane may display multi-relational information regarding a selected concept. For example, the multi-relational display pane may display descriptive relationships or all known relationships of the selected concept from within one or more ontologies. The multi-relational display pane may enable display of these relationships in one or more forms.
In one embodiment, the multi-relational display pane may display concepts and relationships in graphical form. One form of graphical display that may be used includes a clustered cone graph. A clustered cone graph may display a selected concept as a central node, surrounded by sets of connected nodes, the sets of connected nodes being concepts connected by relationships. In one embodiment, the sets of connected nodes may be clustered or grouped by common characteristics. These common characteristics may include one or more of concept type, data source, relationship to the central node, associated property, or other common characteristic.
In one embodiment, connected nodes in a clustered cone graph may also have relationships with each other, which may be represented by edges connecting the connected nodes. Additionally, edges and nodes within a clustered cone graph may be varied in appearance to convey specific characteristics of relationships or concepts (e.g., thicker edges for high assertion confidence weights, etc.). The textual information underlying a node or edge in a clustered cone graph may be displayed to a user upon user selection of a node or edge. Furthermore, a connected node may be selected by a user and placed as the central node in the graph. Accordingly, all concepts directly related to the new central node may be arranged in clustered sets around the new central node.
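By way of illustration only, the data underlying a clustered cone graph, and the re-centering of the graph on a connected node, may be sketched as follows. The sketch assumes assertions are stored as (concept, relationship, concept) triples and clusters connected nodes by relationship; the triples and the function name `clustered_neighbors` are hypothetical.

```python
from collections import defaultdict

# Hypothetical assertions as (concept, relationship, concept) triples.
assertions = [
    ("compound X", "causes", "disease Y"),
    ("compound X", "binds", "receptor R"),
    ("compound Z", "treats", "disease Y"),
]

def clustered_neighbors(center, triples):
    """Collect every concept directly related to `center`, grouped by the
    relationship that connects it to the central node."""
    clusters = defaultdict(list)
    for subject, relationship, obj in triples:
        if subject == center:
            clusters[relationship].append(obj)
        elif obj == center:
            clusters[relationship].append(subject)
    return dict(clusters)

# Initial graph centered on "compound X".
around_x = clustered_neighbors("compound X", assertions)
# The user selects the connected node "disease Y" as the new central node.
around_y = clustered_neighbors("disease Y", assertions)
```

Re-centering is simply a second call with a different central concept. Grouping by concept type or data source instead would only require a different grouping key.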
In one embodiment, more than one concept may be selected and placed as a merged central node (merged graph). Accordingly, all of the concepts directly related to at least one of the two or more concepts in the merged central node may be arranged in clustered sets around the merged central node. If concepts in the clustered sets have relationships to all of the merged central concepts, this quality may be indicated by varying the appearance of these connected nodes or their connecting edges (e.g., displaying them in a different color, etc.).
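By way of illustration only, the selection of connected nodes for a merged central node may be sketched as follows. The sketch assumes relationships are available as undirected concept pairs; the pairs and the function name `merged_neighbors` are hypothetical.

```python
def merged_neighbors(centers, edges):
    """Return (connected, shared): concepts related to at least one merged
    central concept, and the subset related to all of them."""
    centers = set(centers)
    related = {center: set() for center in centers}
    for a, b in edges:
        if a in related:
            related[a].add(b)
        if b in related:
            related[b].add(a)
    neighbor_sets = list(related.values())
    connected = set().union(*neighbor_sets) - centers
    shared = set.intersection(*neighbor_sets) - centers if neighbor_sets else set()
    return connected, shared

# Hypothetical relationships, for illustration only.
edges = [
    ("compound X", "disease Y"),
    ("compound Z", "disease Y"),
    ("compound X", "receptor R"),
]
connected, shared = merged_neighbors(["compound X", "compound Z"], edges)
```

The `shared` set identifies the connected nodes that a display could render differently (for example, in a different color) because they relate to every concept in the merged central node.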
In one embodiment, more than one concept may be aggregated into a single connected node. That is, a node connected to a central node may represent more than one concept. For example, a central node in a clustered cone graph may be a concept “compound X.” Compound X may cause “disease Y” in many different species of animals. As such, the central node of the clustered cone graph may have numerous connected nodes, each representing disease Y as it occurs in each species. If a user is not in need of immediately investigating possible differences that disease Y may have in each separate species, each of these connected nodes may be aggregated into a single connected node. The single merged connected node may then simply represent the fact that “compound X” causes the “disease Y” in a number of species. This may simplify display of the graph, while conveying all relevant information.
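By way of illustration only, the aggregation of connected nodes may be sketched as follows. The sketch assumes each connected node carries a relationship, a concept, and a species qualifier; these fields and the function name `aggregate_nodes` are hypothetical.

```python
from collections import defaultdict

# Hypothetical connected nodes of a central node "compound X".
connected_nodes = [
    ("causes", "disease Y", "mouse"),
    ("causes", "disease Y", "rat"),
    ("causes", "disease Y", "dog"),
    ("binds", "receptor R", "human"),
]

def aggregate_nodes(nodes):
    """Merge connected nodes that share a relationship and concept into one
    node, keeping the species-level detail available for later expansion."""
    groups = defaultdict(list)
    for relationship, concept, species in nodes:
        groups[(relationship, concept)].append(species)
    return dict(groups)

aggregated = aggregate_nodes(connected_nodes)
```

In this sketch the three species-specific “disease Y” nodes collapse into a single node, while the species list is retained so that the node may be expanded again if the user wishes to investigate species differences.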
In one embodiment, each of the sets of clustered nodes of a clustered cone graph may be faceted. Faceting may include grouping concepts within a clustered set by common characteristics. These common characteristics may include one or more of data source, concept type, common relationship, properties, or other characteristic. Faceting display within a set of connected nodes may take the form of a graph, a list, display of different colors, or other form. A user may sort through, and selectively apply, different types of faceting for each of the sets of connected nodes in a clustered cone graph. Furthermore, a user may switch faceting on or off for each of the sets of connected nodes within a clustered cone graph.
Faceting may also apply to a taxonomy view of ontology data. For example, a user may wish to reconstruct the organization of data represented in a taxonomy view such as, for example, chemical compound data. The user may reconstruct this taxonomic organization using therapeutic class, pharmacological class, molecular weight, or by other category or characteristic of the data. Other characteristics may be used to reconstruct organizations of other data.
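By way of illustration only, faceting may be sketched as regrouping the same set of concepts under a user-selected characteristic. The compound records, the property names, the 500 molecular weight cutoff, and the function name `facet` are hypothetical.

```python
from collections import defaultdict

# Hypothetical compound concepts with properties, for illustration only.
compounds = [
    {"label": "compound A", "therapeutic_class": "analgesic", "molecular_weight": 180.2},
    {"label": "compound B", "therapeutic_class": "analgesic", "molecular_weight": 151.2},
    {"label": "compound C", "therapeutic_class": "statin", "molecular_weight": 558.6},
]

def facet(items, key):
    """Group concept labels by the value that `key` returns for each item."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item["label"])
    return dict(groups)

# The same data organized two ways, as a user might switch between facets.
by_class = facet(compounds, lambda c: c["therapeutic_class"])
by_weight = facet(
    compounds,
    lambda c: "under 500" if c["molecular_weight"] < 500 else "500 or more",
)
```

Switching faceting off corresponds to presenting the ungrouped list. Each facet is only a different key function applied to unchanged underlying data.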
In one embodiment, the multi-relational display pane of the graphical user interface may display information regarding a selected concept in list form (as opposed to the graphical form described above). That information may include all relationships for the selected concept, the label of each related concept, the type of each related concept, evidence information for each assertion of the related concepts, or other information. Evidence information for an assertion may include the number of pieces of evidence underlying the assertion or other information. Additionally, a user may select one or more of the assertions of the selected concept and aggregate all the related concepts of the selected assertions as selected concepts in the multi-relational display pane (either list view or graphical view [i.e., merged graph]).
In one embodiment, the multi-relational display pane may enable the display of confidence weights for assertions in one or more ontologies. Confidence weights may include a measure of the strength of evidence underlying an assertion. The multi-relational display pane may also enable application of filters to displayed data from one or more ontologies. Filters may selectively display data from one or more ontologies based on user preferences, user access rights, or other criteria. Furthermore, the multi-relational display pane and the hierarchical display pane may be linked, such that one or more concepts selected in one may become selected concepts in the other.
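By way of illustration only, filtering of displayed assertions by confidence weight and access rights may be sketched as follows. The sketch assumes confidence weights on a scale from 0 to 1 and a per-assertion data source label; the scale, the field names, and the function name `filter_assertions` are hypothetical.

```python
# Hypothetical assertions with confidence weights and data sources.
assertions = [
    {"text": "compound X causes disease Y", "confidence": 0.9, "source": "public"},
    {"text": "compound X binds receptor R", "confidence": 0.4, "source": "public"},
    {"text": "compound Z treats disease Y", "confidence": 0.8, "source": "proprietary"},
]

def filter_assertions(items, min_confidence, permitted_sources):
    """Keep assertions that meet the user's confidence preference and come
    from a data source the user is permitted to see."""
    return [
        item for item in items
        if item["confidence"] >= min_confidence
        and item["source"] in permitted_sources
    ]

visible = filter_assertions(assertions, min_confidence=0.5, permitted_sources={"public"})
```

In this sketch only the first assertion is displayed: the second falls below the confidence preference, and the third comes from a source outside the user's access rights.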
In one embodiment, the graphical user interface of the invention may include an evidence pane. The evidence pane may display information regarding each piece of evidence for a selected assertion. The information displayed may include one or more of the data source of a piece of evidence, its version, information identifying the record or document that contains the evidence, or other information. In one embodiment, the evidence pane may include a document viewer that enables display of actual evidence-laden documents to a user. A user may also link to the data source containing the document via the evidence pane. In some embodiments, a user's access control rights may dictate the user's ability to view or link to evidence underlying a concept. For instance, a user with minimal rights may be presented with a description of the data source for a piece of evidence, but may not be able to view or access the document containing that evidence.
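By way of illustration only, the effect of access control rights on the evidence pane may be sketched as follows. The evidence record layout, the boolean rights flag, and the function name `evidence_view` are hypothetical simplifications.

```python
def evidence_view(evidence, can_view_documents):
    """Build what the evidence pane shows for one piece of evidence.
    Every user sees the data source description; only users with
    sufficient rights also see the record identifier and document."""
    view = {
        "source": evidence["source"],
        "version": evidence["version"],
        "description": evidence["description"],
    }
    if can_view_documents:
        view["record_id"] = evidence["record_id"]
        view["document"] = evidence["document"]
    return view

# Hypothetical evidence record, for illustration only.
example_evidence = {
    "source": "journal database",
    "version": "2.1",
    "description": "Licensed collection of journal abstracts",
    "record_id": "rec-001",
    "document": "Text of the abstract supporting the assertion.",
}
minimal_view = evidence_view(example_evidence, can_view_documents=False)
full_view = evidence_view(example_evidence, can_view_documents=True)
```

A user with minimal rights thus receives the description of the data source but neither the record identifier nor the document itself.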
In one embodiment, the graphical user interface may include a details pane. The details pane may show one or more of properties, synonyms, evidence (concept evidence, not assertion evidence), or other information underlying a selected concept.
Other processes and uses may be enabled by the system of the invention. For example, the complex knowledge network of one or more ontologies maintained by the system of the invention may be used to enhance queries, semantically integrate data, contextualize information, or for other useful operations.
These and other objects, features, and advantages of the invention will be apparent through the detailed description of the preferred embodiments and the drawings attached hereto. It is also to be understood that both the foregoing general description and the following detailed description are exemplary and not restrictive of the scope of the invention.