Updates to staging models are needed to reflect an improved understanding of tumor behavior and clinical outcomes in well-differentiated thyroid carcinoma. We applied a machine learning algorithm to disease-specific survival data for differentiated thyroid carcinoma from the Surveillance, Epidemiology, and End Results Program of the National Cancer Institute, integrating clinical factors to improve prognostic accuracy. The concordance statistic (C-index) was used to decide where to cut the dendrograms produced by the learning process, thereby defining prognostic groups. We created a computational prognostic model based on tumor size (T), regional lymph node status (N), distant metastasis status (M), and age, designed to mirror the contemporary American Joint Committee on Cancer (AJCC) staging system; this model, with 7 prognostic groups, achieved a C-index of 0.8583, compared with 0.8387 for the AJCC staging system. Adding histologic type (papillary vs. follicular) further improved the model's survival prediction. We also showed that 55 years is the optimal age cutoff in the model, consistent with the change introduced in the 8th edition of the AJCC staging manual. The demonstrated approach could support prognostic systems that permit data-driven, real-time analysis to aid prognostication and decision-making in patient management.
Keywords: C-index; cancer staging; dendrogram; machine learning; survival; thyroid cancer.
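To make the role of the C-index concrete, the following is a minimal sketch, assuming Harrell's form of the concordance statistic and assuming each dendrogram cut can be written as a mapping from a patient's factor combination (for example, a T/N/M/age cell) to a prognostic group number, where a larger number means worse prognosis. The abstract does not specify the clustering algorithm or the exact rule for choosing the cut, so the function names (`harrell_c_index`, `c_index_for_grouping`, `choose_cut`), the "fewest groups within a tolerance of the best C-index" selection rule, and the toy data are all hypothetical illustrations rather than the authors' method. Pairs with tied follow-up times are simply skipped, which is a simplification.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times:  observed follow-up times
    events: 1 if death was observed, 0 if the patient was censored
    risks:  predicted risk, where a higher value means worse expected survival
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            # Simplification: pairs with tied follow-up times are skipped.
            continue
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        usable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # same prognostic group: a tie counts one half
    if usable == 0:
        raise ValueError("no comparable pairs")
    return concordant / usable


def c_index_for_grouping(patients, grouping):
    """Score one (hypothetical) dendrogram cut.

    patients: list of (combination, time, event) tuples
    grouping: dict mapping each combination to a prognostic group number,
              where a larger number means worse prognosis
    """
    times = [t for _, t, _ in patients]
    events = [e for _, _, e in patients]
    risks = [grouping[c] for c, _, _ in patients]
    return harrell_c_index(times, events, risks)


def choose_cut(patients, cuts, tolerance=0.0):
    """Hypothetical rule: among cuts whose C-index is within `tolerance`
    of the best, return the one with the fewest prognostic groups."""
    scores = {name: c_index_for_grouping(patients, g) for name, g in cuts.items()}
    best = max(scores.values())
    eligible = [n for n, s in scores.items() if s >= best - tolerance]
    chosen = min(eligible, key=lambda n: len(set(cuts[n].values())))
    return chosen, scores


# Toy, made-up data: combinations "A" (best) to "C" (worst).
patients = [
    ("C", 2, 1), ("C", 4, 1), ("B", 5, 1),
    ("B", 8, 0), ("A", 9, 1), ("A", 12, 0),
]
cuts = {
    "2 groups": {"A": 1, "B": 1, "C": 2},
    "3 groups": {"A": 1, "B": 2, "C": 3},
}
chosen, scores = choose_cut(patients, cuts)
print(chosen, {k: round(v, 4) for k, v in scores.items()})
# → 3 groups {'2 groups': 0.8077, '3 groups': 0.8846}
```

On the toy data the finer cut separates "A" from "B", turning several tied pairs into concordant ones and raising the C-index. Because finer cuts will often score at least as high, a tolerance such as `tolerance=0.1` trades a small loss in concordance for fewer, more usable groups; here it would select the 2-group cut instead.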