The genetic diversity of human papillomavirus (HPV) 16 within cervical cells and tissue is usually associated with persistent virus infection and precancerous lesions. To explore the HPV16 mutation patterns that contribute to cervical cancer (CC) progression, a total of 199 DNA samples from HPV16-positive cervical specimens were collected and divided into high-grade squamous intraepithelial lesion (HSIL) and non-HSIL (NHSIL) groups. The HPV16 E6 region (nt 7125-7566) was sequenced using next-generation sequencing. Based on HPV16 E6 amino acid mutation features selected by the Lasso algorithm, four machine learning approaches were used to establish HSIL prediction models. Receiver operating characteristic (ROC) analysis, summarized as the area under the curve (AUC), was used to evaluate model performance in both the training and validation cohorts. Western blotting was used to detect the degradation of p53 by the E6 variants. Based on the 13 significant mutation features, the logistic regression (LR) model demonstrated the best predictive performance in the training cohort (AUC = 0.944, 95% CI: 0.913-0.976) and also achieved high discriminative ability in the independent validation cohort (AUC = 0.802, 95% CI: 0.601-1.000). Among these features, the E6 D32E and H85Y variants showed a higher ability to degrade p53 than wild-type E6 (P < 0.05). In conclusion, our study provides the first evidence that HPV16 E6 sequences contain mutation features that are informative for predicting HSIL. Moreover, the D32E and H85Y variants of E6 exhibited a significantly higher ability to degrade p53, which may play a vital role in the development of CC.

IMPORTANCE This study provides the first evidence that HPV16 E6 sequences contain mutation features that are informative for predicting high-grade squamous intraepithelial lesions, and that such prediction could help reduce unnecessary colposcopies without a loss of sensitivity for detecting cervical cancer. Moreover, the D32E and H85Y variants of E6 exhibited a significantly higher ability to degrade p53, which may play a vital role in the development of cervical cancer.
Keywords: E6 oncoprotein; cervical cancer; human papillomavirus type 16; machine learning; next-generation sequencing.
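
The AUC values and 95% confidence intervals reported above can be illustrated with a minimal sketch. It computes the AUC as the probability that a randomly chosen HSIL sample receives a higher predicted score than a randomly chosen NHSIL sample, and estimates a confidence interval with a percentile bootstrap. The abstract does not state how the intervals were obtained, so the bootstrap is an assumption, and the function names (`auc_score`, `bootstrap_ci`) and the toy data are hypothetical.

```python
import random


def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for HSIL, 0 for NHSIL (assumed encoding).
    scores: predicted probabilities from any classifier.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc_score(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data, not taken from the study.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_score(toy_labels, toy_scores)
toy_lower, toy_upper = bootstrap_ci(toy_labels, toy_scores, n_boot=500)
```

With few samples the bootstrap interval becomes wide and can reach 1.0, which is the usual behavior of such intervals on small cohorts.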