Factors Associated with Missing Sociodemographic Data in the IRIS® (Intelligent Research in Sight) Registry

Ophthalmol Sci. 2024 Apr 30;4(6):100542. doi: 10.1016/j.xops.2024.100542. eCollection 2024 Nov-Dec.

Abstract

Purpose: To describe the prevalence of missing sociodemographic data in the IRIS® (Intelligent Research in Sight) Registry and to identify practice-level characteristics associated with missing sociodemographic data.

Design: Cross-sectional study.

Participants: All patients with clinical encounters at practices participating in the IRIS Registry prior to December 31, 2020.

Methods: We describe geographic and temporal trends in the prevalence of missing data for each sociodemographic variable (age, sex, race, ethnicity, geographic location, insurance type, and smoking status). Each practice contributing data to the registry was categorized based on the number of patients, number of physicians, geographic location, patient visit frequency, and patient population demographics.
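As an illustration of the prevalence calculation described above, the following is a minimal sketch and not the authors' code. It assumes a hypothetical record layout in which a missing value is stored as `None` or an empty string. The field names and sample records are invented for the example.

```python
from typing import Dict, List, Optional

# Hypothetical patient record: variable name -> value, with None meaning "missing".
Record = Dict[str, Optional[str]]


def missing_prevalence(records: List[Record], variables: List[str]) -> Dict[str, float]:
    """Return, for each variable, the percentage of records with a missing value."""
    total = len(records)
    if total == 0:
        raise ValueError("no records supplied")
    result: Dict[str, float] = {}
    for var in variables:
        # Treat an absent key, None, or an empty string as missing (an assumption).
        missing = sum(1 for r in records if r.get(var) in (None, ""))
        result[var] = 100.0 * missing / total
    return result


# Invented sample data, for illustration only.
records: List[Record] = [
    {"sex": "F", "race": "White", "ethnicity": None},
    {"sex": "M", "race": None, "ethnicity": None},
    {"sex": "F", "race": "Asian", "ethnicity": "Not Hispanic"},
    {"sex": None, "race": "Black", "ethnicity": "Hispanic"},
]

prevalence = missing_prevalence(records, ["sex", "race", "ethnicity"])
print(prevalence)
```

Grouping the same calculation by state or by encounter year would give the geographic and temporal breakdowns mentioned in the Methods.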

Main outcome measures: Multivariable linear regression was used to describe the association of practice-level characteristics with missing patient-level sociodemographic data.
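To make the regression step concrete, the sketch below fits a multivariable linear model by ordinary least squares using the normal equations. It is an assumption-laden illustration: the abstract does not state the model specification or software used, the predictor names (`visits_per_patient`, `urban`) are hypothetical, and the data are synthetic values generated from a made-up linear rule, not study results. It also does not compute P values.

```python
from typing import List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(predictors: List[List[float]], outcome: List[float]) -> List[float]:
    """Least-squares fit of y = b0 + b1*x1 + ...; returns [b0, b1, ...]."""
    design = [[1.0] + row for row in predictors]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic practice-level data: [visits_per_patient, urban (1 = yes, 0 = no)].
practices = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [2.0, 0.0], [3.0, 1.0]]
# Synthetic outcome (% of patients missing race), built as 40 - 5*visits + 10*urban.
pct_missing_race = [35.0, 40.0, 25.0, 30.0, 30.0, 35.0]

coefficients = fit_ols(practices, pct_missing_race)
print(coefficients)
```

Each fitted coefficient is read as the change in the practice's percentage of missing data per unit change in that characteristic, holding the other characteristics fixed.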

Results: This study included the electronic health records of 66 477 365 patients receiving care at 3306 practices participating in the IRIS Registry. The median number of patients per practice was 11 415 (interquartile range: 5849-24 148) and the median number of physicians per practice was 3 (interquartile range: 1-7). The prevalence of missing patient sociodemographic data was 0.1% for birth year, 0.4% for sex, 24.8% for race, 30.2% for ethnicity, 2.3% for 3-digit zip code, 14.8% for state, 5.5% for smoking status, and 17.0% for insurance type. The prevalence of missing data increased over time and varied at the state level. Missing race data were associated with practices that had fewer visits per patient (P < 0.001), cared for a larger nonprivately insured patient population (P = 0.001), and were located in urban areas (P < 0.001). Frequent patient visits were associated with a lower prevalence of missing race (P < 0.001), ethnicity (P < 0.001), and insurance (P < 0.001), but a higher prevalence of missing smoking status (P < 0.001).

Conclusions: There are geographic and temporal trends in missing race, ethnicity, and insurance type data in the IRIS Registry. Several practice-level characteristics, including practice size, geographic location, and patient population, are associated with missing sociodemographic data. While the prevalence and patterns of missing data may change in future versions of the IRIS Registry, there will remain a need to develop standardized approaches for minimizing potential sources of bias and ensuring reproducibility across research studies.

Financial disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

Keywords: Electronic health records; IRIS registry; Missing data; Sociodemographic data.