-
An Open-Source Knowledge Graph Ecosystem for the Life Sciences
Authors:
Tiffany J. Callahan,
Ignacio J. Tripodi,
Adrianne L. Stefanski,
Luca Cappelletti,
Sanya B. Taneja,
Jordan M. Wyrwa,
Elena Casiraghi,
Nicolas A. Matentzoglu,
Justin Reese,
Jonathan C. Silverstein,
Charles Tapley Hoyt,
Richard D. Boyce,
Scott A. Malec,
Deepak R. Unni,
Marcin P. Joachimiak,
Peter N. Robinson,
Christopher J. Mungall,
Emanuele Cavalleri,
Tommaso Fontana,
Giorgio Valentini,
Marco Mesiti,
Lucas A. Gillenwater,
Brook Santangelo,
Nicole A. Vasilevsky,
Robert Hoehndorf, et al. (7 additional authors not shown)
Abstract:
Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoints and abstraction algorithms), and benchmarks (e.g., prebuilt KGs and embeddings). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.
Submitted 30 January, 2024; v1 submitted 11 July, 2023;
originally announced July 2023.
-
Developing a Knowledge Graph Framework for Pharmacokinetic Natural Product-Drug Interactions
Authors:
Sanya B. Taneja,
Tiffany J. Callahan,
Mary F. Paine,
Sandra L. Kane-Gill,
Halil Kilicoglu,
Marcin P. Joachimiak,
Richard D. Boyce
Abstract:
Pharmacokinetic natural product-drug interactions (NPDIs) occur when botanical natural products are co-consumed with pharmaceutical drugs. Understanding mechanisms of NPDIs is key to preventing adverse events. We constructed a knowledge graph framework, NP-KG, as a step toward computational discovery of pharmacokinetic NPDIs. NP-KG is a heterogeneous KG with biomedical ontologies, linked data, and full texts of the scientific literature, constructed with the Phenotype Knowledge Translator framework and the semantic relation extraction systems, SemRep and Integrated Network and Dynamic Reasoning Assembler. NP-KG was evaluated with case studies of pharmacokinetic green tea- and kratom-drug interactions through path searches and meta-path discovery to determine congruent and contradictory information compared to ground truth data. The fully integrated NP-KG consisted of 745,512 nodes and 7,249,576 edges. Evaluation of NP-KG resulted in congruent (38.98% for green tea, 50% for kratom), contradictory (15.25% for green tea, 21.43% for kratom), and both congruent and contradictory (15.25% for green tea, 21.43% for kratom) information. Potential pharmacokinetic mechanisms for several purported NPDIs, including the green tea-raloxifene, green tea-nadolol, kratom-midazolam, kratom-quetiapine, and kratom-venlafaxine interactions were congruent with the published literature. NP-KG is the first KG to integrate biomedical ontologies with full texts of the scientific literature focused on natural products. We demonstrate the application of NP-KG to identify pharmacokinetic interactions involving enzymes, transporters, and pharmaceutical drugs. We envision that NP-KG will facilitate improved human-machine collaboration to guide researchers in future studies of pharmacokinetic NPDIs. The NP-KG framework is publicly available at https://doi.org/10.5281/zenodo.6814507 and https://github.com/sanyabt/np-kg.
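To make the idea of path searches and meta-paths concrete, here is a minimal sketch, assuming the KG is available as a plain list of (subject, predicate, object) triples. The node names, predicates, and types below are hypothetical placeholders, not NP-KG identifiers or curated biomedical assertions, and the breadth-first search is a generic technique rather than the NP-KG authors' implementation. A meta-path is obtained by replacing each node on a path with its type while keeping the predicates.

```python
from collections import deque

# Hypothetical toy triples (subject, predicate, object); placeholders only,
# not NP-KG identifiers or real pharmacokinetic findings.
TRIPLES = [
    ("natural_product_A", "inhibits", "enzyme_X"),
    ("enzyme_X", "metabolizes", "drug_B"),
    ("natural_product_A", "inhibits", "transporter_Y"),
    ("transporter_Y", "transports", "drug_B"),
    ("drug_B", "treats", "condition_Z"),
]
NODE_TYPES = {
    "natural_product_A": "NaturalProduct",
    "enzyme_X": "Enzyme",
    "transporter_Y": "Transporter",
    "drug_B": "Drug",
    "condition_Z": "Condition",
}


def find_paths(triples, source, target, max_hops=3):
    """Breadth-first enumeration of acyclic directed paths from source to target.

    Each path is a list alternating node, predicate, node, ...
    """
    adj: dict[str, list[tuple[str, str]]] = {}
    for s, p, o in triples:
        adj.setdefault(s, []).append((p, o))

    paths = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        hops = (len(path) - 1) // 2
        if node == target and hops > 0:
            paths.append(path)
            continue
        if hops == max_hops:
            continue
        for pred, nxt in adj.get(node, []):
            if nxt in path[::2]:  # nodes sit at even positions; skip cycles
                continue
            queue.append(path + [pred, nxt])
    return paths


def meta_path(path, node_types):
    """Abstract a path to its meta-path: node types plus predicates."""
    return tuple(node_types[x] if i % 2 == 0 else x for i, x in enumerate(path))


if __name__ == "__main__":
    for p in find_paths(TRIPLES, "natural_product_A", "drug_B"):
        print(" -> ".join(p), "|", meta_path(p, NODE_TYPES))
```

On this toy graph the search returns one enzyme-mediated path and one transporter-mediated path; in a real KG the resulting meta-paths would then be compared against ground-truth interaction data, as the abstract describes.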
Submitted 24 September, 2022;
originally announced September 2022.
-
Ontologizing Health Systems Data at Scale: Making Translational Discovery a Reality
Authors:
Tiffany J. Callahan,
Adrianne L. Stefanski,
Jordan M. Wyrwa,
Chenjie Zeng,
Anna Ostropolets,
Juan M. Banda,
William A. Baumgartner Jr.,
Richard D. Boyce,
Elena Casiraghi,
Ben D. Coleman,
Janine H. Collins,
Sara J. Deakyne-Davies,
James A. Feinstein,
Melissa A. Haendel,
Asiyah Y. Lin,
Blake Martin,
Nicolas A. Matentzoglu,
Daniella Meeker,
Justin Reese,
Jessica Sinclair,
Sanya B. Taneja,
Katy E. Trinkley,
Nicole A. Vasilevsky,
Andrew Williams,
Xingman A. Zhang, et al. (7 additional authors not shown)
Abstract:
Background: Common data models solve many challenges of standardizing electronic health record (EHR) data, but are unable to semantically integrate all the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OBO ontologies requires significant manual curation and domain expertise. Objective: We introduce OMOP2OBO, an algorithm for mapping Observational Medical Outcomes Partnership (OMOP) vocabularies to OBO ontologies. Results: Using OMOP2OBO, we produced mappings for 92,367 conditions, 8611 drug ingredients, and 10,673 measurement results, which covered 68-99% of concepts used in clinical practice when examined across 24 hospitals. When used to phenotype rare disease patients, the mappings helped systematically identify undiagnosed patients who might benefit from genetic testing. Conclusions: By aligning OMOP vocabularies to OBO ontologies, our algorithm presents new opportunities to advance EHR-based deep phenotyping.
Submitted 30 January, 2023; v1 submitted 10 September, 2022;
originally announced September 2022.
-
Introducing Information Retrieval for Biomedical Informatics Students
Authors:
Sanya B. Taneja,
Richard D. Boyce,
William T. Reynolds,
Denis Newman-Griffis
Abstract:
Introducing biomedical informatics (BMI) students to natural language processing (NLP) requires balancing technical depth with practical know-how to address application-focused needs. We developed a set of three activities that introduce beginning BMI students to information retrieval with NLP, covering document representation strategies and language models from TF-IDF to BERT. These activities give students hands-on experience with common use cases and introduce the fundamental components of NLP workflows for a wide variety of applications.
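As an illustration of the simplest end of that spectrum, the sketch below ranks documents against a query using TF-IDF vectors and cosine similarity. It is a generic, dependency-free example, not material from the activities themselves; the toy corpus is hypothetical, and the smoothed IDF formula log((1 + n) / (1 + df)) + 1 is assumed here because it matches scikit-learn's default `smooth_idf` behavior.

```python
import math
from collections import Counter

# Hypothetical toy corpus used only for illustration.
DOCS = [
    "Statins lower cholesterol in adult patients.",
    "Insulin therapy for type 2 diabetes.",
    "Cholesterol guidelines recommend statins for high risk patients.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t]


def build_idf(docs_tokens: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1."""
    n = len(docs_tokens)
    df = Counter()
    for toks in docs_tokens:
        df.update(set(toks))  # count each term once per document
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf_vector(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector; terms unseen in the corpus are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, docs: list[str]) -> list[tuple[float, int]]:
    """Return (score, doc_index) pairs, best match first."""
    docs_tokens = [tokenize(d) for d in docs]
    idf = build_idf(docs_tokens)
    doc_vecs = [tfidf_vector(toks, idf) for toks in docs_tokens]
    q_vec = tfidf_vector(tokenize(query), idf)
    scores = [(cosine(q_vec, v), i) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


if __name__ == "__main__":
    for score, idx in rank("statins cholesterol", DOCS):
        print(f"{score:.3f}  {DOCS[idx]}")
```

For the query "statins cholesterol", the shorter first document outranks the third because the query terms make up a larger share of its vector, and the diabetes document scores zero. Swapping `tfidf_vector` for a dense encoder such as a BERT model would keep the same ranking loop.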
Submitted 6 May, 2021;
originally announced May 2021.
-
Fine-grained Entity Recognition with Reduced False Negatives and Large Type Coverage
Authors:
Abhishek Abhishek,
Sanya Bathla Taneja,
Garima Malik,
Ashish Anand,
Amit Awekar
Abstract:
Fine-grained Entity Recognition (FgER) is the task of detecting and classifying entity mentions into a large set of types spanning diverse domains such as biomedicine, finance, and sports. We observe that when the type set spans several domains, detecting entity mentions becomes a limiting factor for supervised learning models. The primary reason is the lack of a dataset in which entity boundaries are properly annotated while covering a large spectrum of entity types. Our work directly addresses this issue. We propose the Heuristics Allied with Distant Supervision (HAnDS) framework to automatically construct a high-quality dataset suitable for the FgER task. The HAnDS framework exploits the dense interlinking between Wikipedia and Freebase in a pipelined manner, reducing the annotation errors introduced by naively applying distant supervision. Using the HAnDS framework, we create two datasets: one suitable for building FgER systems that recognize up to 118 entity types based on the FIGER type hierarchy, and another for up to 1115 entity types based on the TypeNet hierarchy. Our extensive empirical experiments demonstrate the quality of the generated datasets. In addition, we provide a manually annotated dataset for benchmarking FgER systems.
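For readers unfamiliar with the baseline, here is a rough sketch of the naive dictionary-based distant supervision that HAnDS is described as improving on. It is not the HAnDS pipeline: the entity dictionary `KB` and the coarse type labels are hypothetical stand-ins for types derived from a knowledge base such as Freebase. The example sentence shows how false negatives arise, since "Honolulu" is a real location mention but is left untagged because the dictionary lacks it.

```python
# Hypothetical entity dictionary mapping surface forms to types; in distant
# supervision this would be derived from a knowledge base such as Freebase.
KB = {
    "Barack Obama": "person",
    "Hawaii": "location",
    "Obama": "person",
}


def distant_label(tokens: list[str], kb: dict[str, str], max_len: int = 4) -> list[str]:
    """Greedy longest-match dictionary labeling into BIO tags."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first so "Barack Obama" beats "Obama".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in kb:
                match = (n, kb[span])
                break
        if match is None:
            i += 1
            continue
        n, etype = match
        tags[i] = f"B-{etype}"
        for j in range(i + 1, i + n):
            tags[j] = f"I-{etype}"
        i += n
    return tags


if __name__ == "__main__":
    toks = "Barack Obama was born in Honolulu , Hawaii".split()
    print(list(zip(toks, distant_label(toks, KB))))
```

Any mention missing from the dictionary silently becomes a negative ("O") training example, which is the kind of annotation error that motivates combining distant supervision with additional heuristics.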
Submitted 30 April, 2019;
originally announced April 2019.