LLM-CGM: A Benchmark for Large Language Model-Enabled Querying of Continuous Glucose Monitoring Data for Conversational Diabetes Management

Pac Symp Biocomput. 2025;30:82-93.

Abstract

Over the past decade, wearable technology has dramatically changed how patients manage chronic diseases. The widespread availability of on-body sensors, such as heart rate monitors and continuous glucose monitoring (CGM) sensors, has given patients access to real-time data about their health. Most of these data are readily available in smartphone applications, where patients can view both current and retrospective data. For patients with diabetes, CGM has transformed disease management. Many sensor devices interface with smartphones to display charts, metrics, and alerts. However, some patients may find these metrics and plots difficult to interpret. In this work, we explore how large language models (LLMs) can be used to answer questions about CGM data. We present an open-source benchmark of time-series question-answering tasks for CGM data in diabetes management. We evaluate several LLM frameworks to establish baseline performance on this benchmark. Lastly, we highlight the need for further research on optimizing LLM frameworks to handle questions about wearable data. Our benchmark is publicly available for future use and development. Although the benchmark is designed specifically for diabetes care, our model implementation and several of the statistical tasks can be extended to other wearable device domains.

MeSH terms

  • Benchmarking*
  • Blood Glucose / analysis
  • Blood Glucose Self-Monitoring* / instrumentation
  • Blood Glucose Self-Monitoring* / statistics & numerical data
  • Computational Biology*
  • Continuous Glucose Monitoring
  • Diabetes Mellitus* / blood
  • Diabetes Mellitus* / therapy
  • Humans
  • Language
  • Mobile Applications
  • Smartphone
  • Wearable Electronic Devices / statistics & numerical data

Substances

  • Blood Glucose