Genetic Discovery Enabled by A Large Language Model

bioRxiv [Preprint]. 2023 Nov 12:2023.11.09.566468. doi: 10.1101/2023.11.09.566468.

Abstract

Artificial intelligence (AI) has been used in many areas of medicine, and large language models (LLMs) have recently shown potential utility for clinical applications. However, it is not known whether LLMs can accelerate the pace of genetic discovery, so we used data generated from mouse genetic models to investigate this possibility. We examined whether a recently developed specialized LLM (Med-PaLM 2) could analyze sets of candidate genes derived from analysis of murine models of biomedical traits. In response to free-text input, Med-PaLM 2 correctly identified the murine genes that contained experimentally verified causative genetic factors for six biomedical traits, which included susceptibility to diabetes and cataracts. Med-PaLM 2 was also able to analyze a list of genes with high-impact alleles, which were identified by comparative analysis of murine genomic sequence data, and it identified a causative murine genetic factor for spontaneous hearing loss. Based upon this Med-PaLM 2 finding, a novel bigenic model for susceptibility to spontaneous hearing loss was developed. These results demonstrate that Med-PaLM 2 can analyze gene-phenotype relationships and generate novel hypotheses, which can facilitate genetic discovery.

Publication types

  • Preprint