Data augmentation based on the WGAN-GP with data block to enhance the prediction of genes associated with RNA methylation pathways

Tuo Jiang; Cong Shen; Pingjian Ding; Lingyun Luo

doi:10.1038/s41598-024-77107-0

Sci Rep. 2024 Nov 1;14(1):26321. doi: 10.1038/s41598-024-77107-0.

Authors

Tuo Jiang¹, Cong Shen², Pingjian Ding³, Lingyun Luo⁴,⁵

Affiliations

¹ School of Computer Science, University of South China, Hengyang, 421001, Hunan, China.
² Department of Mathematics, National University of Singapore, Singapore, 119076, Singapore.
³ School of Computer Science, University of South China, Hengyang, 421001, Hunan, China.
⁴ School of Computer Science, University of South China, Hengyang, 421001, Hunan, China.
⁵ Hunan Medical Big Data International Science and Technology Innovation Cooperation Base, Hengyang, 421001, Hunan, China.

Abstract

RNA methylation modification influences various processes in the human body and has attracted increasing research attention. Predicting genes associated with RNA methylation pathways can significantly aid biologists in studying RNA methylation processes. Several prediction methods have been investigated, but their performance is still limited by the scarcity of positive samples. To address the data imbalance in RNA methylation-associated gene prediction, this study employed a generative adversarial network, the Wasserstein GAN with gradient penalty (WGAN-GP), to learn the feature distribution of the original dataset. The quality of the synthetic samples was controlled using the Classifier Two-Sample Test (CTST). These synthetic samples were then added to the data blocks to mitigate the class imbalance. Experimental results demonstrated that combining the synthetic samples generated by the proposed model with the original data enhanced the prediction performance of various classifiers and outperformed other oversampling methods. Moreover, gene ontology (GO) enrichment analyses further supported the relevance of the predicted genes to RNA methylation pathways. The PyTorch implementation of the model that generates gene samples is available at https://github.com/heyheyheyheyhey1/WGAN-GP_RNA_methylation.
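
The quality check mentioned in the abstract can be illustrated with a small, self-contained sketch of a classifier two-sample test. This is not the authors' implementation: the k-nearest-neighbour classifier, the Gaussian toy data, and the helper names (`knn_predict`, `ctst_accuracy`, `gaussian_samples`) are all assumptions chosen for illustration. The idea is that a classifier is trained to separate real from synthetic samples; held-out accuracy near 0.5 suggests the two sets are hard to tell apart, while accuracy near 1.0 suggests the synthetic samples are easy to spot.

```python
import math
import random


def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(
        range(len(train_x)), key=lambda i: math.dist(train_x[i], query)
    )[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def ctst_accuracy(real, synthetic, k=5, test_fraction=0.5, seed=0):
    """Held-out accuracy of a classifier separating real (1) from synthetic (0)."""
    rng = random.Random(seed)
    pooled = [(x, 1) for x in real] + [(x, 0) for x in synthetic]
    rng.shuffle(pooled)
    n_test = int(len(pooled) * test_fraction)
    test, train = pooled[:n_test], pooled[n_test:]
    train_x = [x for x, _ in train]
    train_y = [y for _, y in train]
    correct = sum(knn_predict(train_x, train_y, x, k) == y for x, y in test)
    return correct / len(test)


def gaussian_samples(n, dim, mean, rng):
    """Toy stand-in for gene feature vectors (hypothetical data, not from the paper)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(42)
real = gaussian_samples(200, 5, 0.0, rng)
good_fake = gaussian_samples(200, 5, 0.0, rng)  # same distribution as real
bad_fake = gaussian_samples(200, 5, 3.0, rng)   # clearly shifted distribution

acc_good = ctst_accuracy(real, good_fake)
acc_bad = ctst_accuracy(real, bad_fake)
print(f"same distribution:    {acc_good:.2f}")
print(f"shifted distribution: {acc_bad:.2f}")
```

In a pipeline like the one described, a generator's output would presumably be kept only when this accuracy stays close to chance; the threshold and the choice of classifier are design decisions the abstract does not specify.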

Keywords: Generative adversarial nets; Machine learning; Pathways; RNA methylation.

MeSH terms

Algorithms
Computational Biology / methods
Gene Ontology
Humans
Methylation
RNA Methylation
RNA* / genetics

Substances

RNA
