A Computable Phenotype Model for Classification of Men Who Have Sex With Men Within a Large Linked Database of Laboratory, Surveillance, and Administrative Healthcare Records

Front Digit Health. 2020 Oct 6;2:547324. doi: 10.3389/fdgth.2020.547324. eCollection 2020.

Abstract

Background: Most public health datasets do not include sexual orientation measures, which limits the data available to monitor health disparities and evaluate tailored interventions. We therefore developed, validated, and applied a novel computable phenotype model to classify men who have sex with men (MSM) using multiple health datasets from British Columbia, Canada, 1990-2015.

Methods: Three case surveillance databases, a public health laboratory database, and five administrative health databases were linked and deidentified (BC Hepatitis Testers Cohort), resulting in a retrospective cohort of 727,091 adult men. Known MSM status from the three disease case surveillance databases was used to develop a multivariable model for classifying MSM in the full cohort. Models were selected using elastic-net regularization (glmnet package) in R, and the final model was chosen to optimize the area under the receiver operating characteristic curve. We compared characteristics of known MSM, classified MSM, and classified heterosexual men.

Findings: A history of gonorrhea and syphilis diagnoses, HIV tests in the past year, a history of visiting an identified gay and bisexual men's clinic, and residence in MSM-dense neighborhoods were all positively associated with being MSM. The selected model had a sensitivity of 72% and a specificity of 94%. Excluding those with known MSM status, a total of 85,521 men (12% of the cohort) were classified as MSM.

Interpretation: Computable phenotyping is a promising approach for classifying sexual minorities and investigating health outcomes in the absence of routinely available self-report data.
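As an illustration of the evaluation metrics reported above (sensitivity, specificity, and area under the receiver operating characteristic curve), the following minimal Python sketch computes them from a set of model scores. It is not the authors' code, which used R. The function names, the 0.5 threshold, and the toy labels and scores are hypothetical and chosen only to show the arithmetic.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are called positive."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = known MSM, 0 = not known MSM (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]

sens, spec = sensitivity_specificity(y_true, scores, threshold=0.5)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} auc={auc:.3f}")
```

Raising the threshold in a sketch like this generally trades sensitivity for specificity, which is why a threshold-independent summary such as the AUC is a common criterion for choosing among candidate models.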

Keywords: HIV; administrative data; big data; computable phenotypes; sexual and gender minorities.