Objective: The Dutch newborn screening (NBS) algorithm for congenital hypothyroidism (CH) targets both thyroidal and central CH (CH-T and CH-C, respectively). It is primarily based on the thyroxine (T4) concentration in dried blood spots, followed by thyroid-stimulating hormone (TSH) and thyroxine-binding globulin (TBG) measurements, which enables detection of both CH-T and CH-C with a positive predictive value (PPV) of 21%. A calculated T4/TBG ratio serves as an indirect measure of free T4. The aim of this study was to investigate whether machine learning techniques can improve the PPV of the algorithm without missing positive cases that the current algorithm would detect.
Design & methods: NBS data and parameters of CH patients and false-positive referrals from the period 2007-2017, and of a healthy reference population, were included in the study. A random forest model was trained and tested using a stratified split and improved using the synthetic minority oversampling technique (SMOTE). The dataset covered 4668 newborns, including 458 CH-T and 82 CH-C patients, 2332 false-positive referrals and 1670 healthy newborns.
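The two data-preparation steps named above, a stratified split and SMOTE, can be illustrated with a minimal standard-library sketch. It is not the study's implementation. The toy data, the function names `stratified_split` and `smote`, the 25% test fraction, and the choice to oversample only the training portion are all assumptions made for illustration. The SMOTE variant is simplified: it picks base samples at random and interpolates towards one of their k nearest minority-class neighbours. A random forest would then be fitted on the balanced training set.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so each class is split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points on segments between a minority sample
    and one of its k nearest minority-class neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy, made-up data: two features per record, label 1 is the minority class.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(80)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(20)]
y = [0] * 80 + [1] * 20

train_idx, test_idx = stratified_split(y, test_fraction=0.25)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]

# Oversample the minority class in the training portion only (an assumption),
# so that the held-out test set keeps the original class balance.
minority = [x for x, label in zip(X_train, y_train) if label == 1]
n_missing = y_train.count(0) - len(minority)
new_points = smote(minority, n_new=n_missing)
X_balanced = X_train + new_points
y_balanced = y_train + [1] * len(new_points)
```

Keeping synthetic records out of the test set means that sensitivity and PPV are estimated on real observations only.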
Results: The variables determining identification of CH were, in order of importance, TSH, T4/TBG ratio, gestational age, TBG, T4 and age at NBS sampling. In a receiver operating characteristic (ROC) analysis on the test set, the current sensitivity could be maintained while the PPV increased to 26%.
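The trade-off described above, holding sensitivity fixed while raising the PPV, can be sketched as a threshold search over model scores. The abstract does not state how the operating point was chosen, so the procedure below is an assumption: among all thresholds that reach a required sensitivity, it picks the one with the highest PPV. The scores, labels, and the function names `confusion_at` and `pick_threshold` are made up for illustration.

```python
def confusion_at(scores, labels, threshold):
    """Count true positives, false positives and false negatives
    when every score >= threshold is treated as a referral."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return tp, fp, fn


def pick_threshold(scores, labels, min_sensitivity):
    """Among thresholds whose sensitivity is at least min_sensitivity,
    return (threshold, sensitivity, ppv) for the one with the highest PPV."""
    best = None
    for t in sorted(set(scores)):
        tp, fp, fn = confusion_at(scores, labels, t)
        sensitivity = tp / (tp + fn)   # TP / (TP + FN)
        if sensitivity < min_sensitivity:
            continue
        ppv = tp / (tp + fp)           # TP / (TP + FP)
        if best is None or ppv > best[2]:
            best = (t, sensitivity, ppv)
    if best is None:
        raise ValueError("no threshold reaches the requested sensitivity")
    return best


# Toy, made-up model scores (higher = more likely CH) and true labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

# Require that no true case is missed (sensitivity of 1.0 on this toy set).
threshold, sensitivity, ppv = pick_threshold(scores, labels, min_sensitivity=1.0)
```

On this toy set, the lowest-scoring true case (0.40) forces the threshold down and lets three false positives through. The same mechanism limits how much the PPV can gain when sensitivity must not drop.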
Conclusions: Machine learning techniques have the potential to improve the PPV of the Dutch CH NBS. However, improved detection of currently missed cases is only possible with new, better predictors, especially of CH-C, and with better registration and inclusion of these cases in future models.
Keywords: Congenital hypothyroidism; Machine learning; Neonatal screening; Random forest.
Copyright © 2023. Published by Elsevier Inc.