Reliable preoperative diagnosis of malignant thyroid tumors remains challenging because cytological examination of fine-needle aspiration biopsies is often inconclusive. Although numerous studies have successfully demonstrated the use of high-throughput molecular diagnostics in cancer prediction, the application of microarrays in routine clinical use remains limited. Our aim was, therefore, to identify a small subset of genes with which to develop a practical and inexpensive diagnostic tool for clinical use. We developed a two-step feature selection method, composed of a linear model fitted with linear models for microarray data (LIMMA) followed by iterative Bayesian model averaging, to identify a suitable gene set signature. Using one public dataset for training, we discovered a three-gene signature: dipeptidyl-peptidase 4 (DPP4), secretogranin V (SCG5) and carbonic anhydrase XII (CA12). We then evaluated the robustness of this gene set using three other independent public datasets, on which the signature achieved accuracies of 85.7%, 78.8% and 85.7%, respectively. For experimental validation, we collected 70 thyroid samples from surgery, and our three-gene signature achieved an accuracy of 94.3% when measured by quantitative polymerase chain reaction (QPCR). Furthermore, immunohistochemistry in 29 samples showed that the proteins encoded by these three genes are also differentially expressed in thyroid samples. Our protocol discovered a robust three-gene signature that can distinguish benign from malignant thyroid tumors and has potential for routine clinical application.
Keywords: biomarkers; diagnostic panel; machine learning; prediction model; thyroid cancer.
© 2014 UICC.
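The two-step selection idea described in the abstract (a per-gene filter followed by a search for a small predictive subset) can be illustrated with a minimal sketch. This is not the authors' pipeline: a Welch t-statistic stands in for LIMMA's moderated statistic, an exhaustive search over triplets scored by leave-one-out nearest-centroid accuracy stands in for iterative Bayesian model averaging, and the data are synthetic. All function names (`welch_t`, `centroid_predict`, `loo_accuracy`, `two_step_select`) and parameters are hypothetical.

```python
import numpy as np
from itertools import combinations


def welch_t(X, y):
    """Per-gene Welch t-statistic, class 1 vs class 0 (stand-in for LIMMA)."""
    a, b = X[y == 1], X[y == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se


def centroid_predict(X_train, y_train, X_test):
    """Assign each test sample to the class with the nearer centroid."""
    c1 = X_train[y_train == 1].mean(axis=0)
    c0 = X_train[y_train == 0].mean(axis=0)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)


def loo_accuracy(X, y):
    """Leave-one-out accuracy of the nearest-centroid classifier."""
    n = len(y)
    hits = 0
    for i in range(n):
        keep = np.arange(n) != i
        pred = centroid_predict(X[keep], y[keep], X[i:i + 1])[0]
        hits += int(pred == y[i])
    return hits / n


def two_step_select(X, y, n_keep=10, size=3):
    """Step 1: keep the n_keep genes with largest |t|.
    Step 2: pick the subset of `size` genes with the best leave-one-out
    accuracy (a simple stand-in for iterative Bayesian model averaging)."""
    t = welch_t(X, y)
    top = sorted(np.argsort(-np.abs(t))[:n_keep].tolist())
    best = max(combinations(top, size),
               key=lambda idx: loo_accuracy(X[:, list(idx)], y))
    return tuple(int(i) for i in best)


# Synthetic data (assumption): 40 samples, 200 genes, 3 informative genes.
rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
X = rng.normal(size=(40, 200))
X[y == 1, :3] += 5.0  # shift the informative genes in the "malignant" class

selected = two_step_select(X, y)
acc = loo_accuracy(X[:, list(selected)], y)
print("selected gene indices:", selected)
print("leave-one-out accuracy:", acc)
```

Filtering first keeps the subset search small: with ten retained genes there are only 120 candidate triplets, so each can be scored directly. With real expression data, the accuracy should be estimated on independent datasets, as the abstract describes, rather than on the training set used for selection.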