Data augmentation based on the WGAN-GP with data block to enhance the prediction of genes associated with RNA methylation pathways

Sci Rep. 2024 Nov 1;14(1):26321. doi: 10.1038/s41598-024-77107-0.

Abstract

RNA methylation influences various processes in the human body and has attracted increasing attention from researchers. Predicting genes associated with RNA methylation pathways can significantly aid biologists in studying RNA methylation processes. Several prediction methods have been investigated, but their performance remains limited by the scarcity of positive samples. To address the data imbalance in RNA methylation-associated gene prediction, this study employed a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) to learn the feature distribution of the original dataset. The quality of the synthetic samples was controlled using the Classifier Two-Sample Test (CTST). These synthetic samples were then added to the data blocks to mitigate the class imbalance. Experimental results demonstrated that integrating the synthetic samples generated by the proposed model with the original data enhanced the prediction performance of various classifiers and outperformed other oversampling methods. Moreover, gene ontology (GO) enrichment analyses further supported the association of the predicted genes with RNA methylation pathways. The PyTorch implementation of the model for generating gene samples is available at https://github.com/heyheyheyheyhey1/WGAN-GP_RNA_methylation.
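The general idea behind a Classifier Two-Sample Test can be illustrated with a minimal sketch: a binary classifier is trained to tell real samples from synthetic ones, and a held-out accuracy close to chance (0.5) suggests the two sets are hard to distinguish. The abstract does not state which classifier, split, or acceptance threshold the authors used, so everything below is an assumption made for illustration: the k-nearest-neighbour classifier, the toy Gaussian data, and the hypothetical names `knn_predict`, `classifier_two_sample_test`, and `gaussian_samples`. It is not the authors' implementation.

```python
import math
import random


def knn_predict(train, query, k=5):
    """Predict a 0/1 label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def classifier_two_sample_test(real, synthetic, k=5, test_fraction=0.3, seed=0):
    """Return (held-out accuracy, one-sided p-value) for real vs. synthetic.

    Assumes both sets have the same size, so chance accuracy is 0.5.
    """
    rng = random.Random(seed)
    # Label real samples 1 and synthetic samples 0, then shuffle and split.
    labelled = [(x, 1) for x in real] + [(x, 0) for x in synthetic]
    rng.shuffle(labelled)
    n_test = max(1, int(len(labelled) * test_fraction))
    test, train = labelled[:n_test], labelled[n_test:]

    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    accuracy = correct / n_test

    # If the two sets share a distribution, held-out accuracy is approximately
    # Normal(0.5, 1 / (4 * n_test)); a small p-value means "distinguishable".
    z = (accuracy - 0.5) * 2.0 * math.sqrt(n_test)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return accuracy, p_value


def gaussian_samples(n, dim, mean, rng):
    """Toy stand-in for gene feature vectors (illustrative data only)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(42)
real = gaussian_samples(200, 4, 0.0, rng)
good_synthetic = gaussian_samples(200, 4, 0.0, rng)   # same distribution as real
poor_synthetic = gaussian_samples(200, 4, 3.0, rng)   # shifted distribution

acc_good, p_good = classifier_two_sample_test(real, good_synthetic)
acc_poor, p_poor = classifier_two_sample_test(real, poor_synthetic)
print(f"well-matched synthetic: accuracy={acc_good:.2f}, p={p_good:.3f}")
print(f"poorly matched synthetic: accuracy={acc_poor:.2f}, p={p_poor:.3g}")
```

In this toy setting the well-matched set should yield an accuracy near 0.5, while the shifted set should be separated almost perfectly. A quality filter along these lines could keep only synthetic batches whose accuracy stays near chance, although the actual criterion used in the study may differ.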

Keywords: Generative adversarial nets; Machine learning; Pathways; RNA methylation.

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Gene Ontology
  • Humans
  • Methylation
  • RNA Methylation
  • RNA* / genetics

Substances

  • RNA