In the United States, more than 12% of the population will experience thyroid dysfunction. Commonly reported symptoms of thyroid dysfunction include fatigue and weight change. However, little is known about how these symptoms, as documented in the outpatient setting, relate to ordering patterns for thyroid testing across patient groups defined by age and sex. We developed a natural language processing and deep learning pipeline to identify patient-reported outcomes of weight change and fatigue among patients with a thyroid-stimulating hormone (TSH) test. Building on prior work, we compared five open-source Bidirectional Encoder Representations from Transformers (BERT) models to determine which could most accurately identify these symptoms in clinical text. For both fatigue (f) and weight change (wc), Bio_ClinicalBERT achieved the highest F1-score (f: 0.900; wc: 0.906) compared with BERT (f: 0.899; wc: 0.890), DistilBERT (f: 0.852; wc: 0.912), Biomedical RoBERTa (f: 0.864; wc: 0.904), and PubMedBERT (f: 0.882; wc: 0.892).
Keywords: Natural language processing; electronic health records; machine learning.
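The models above are compared by F1-score, the harmonic mean of precision and recall. As a minimal sketch of how such a score could be computed for one symptom, assuming binary note-level labels (1 = symptom present, 0 = absent), the following uses hypothetical toy labels rather than study data; the function name `f1_score` and the binary, positive-class formulation are assumptions, since the abstract does not state the averaging scheme used.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1) from parallel lists of 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive predictions: define F1 as 0 to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for a single symptom (not data from the study).
gold = [1, 1, 1, 0, 0, 1, 0, 0]
pred = [1, 1, 0, 0, 1, 1, 0, 0]
print(round(f1_score(gold, pred), 3))
```

Applying the same function to each model's predictions on a shared held-out set would give per-model scores that can be compared directly, as in the results reported above.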