Motivation: 5-Methylcytosine (m5c), a modified cytosine base, arises from adding a methyl group at the 5th carbon position. This modification is a prevalent form of post-transcriptional modification (PTM) found in various types of RNA. Traditional laboratory techniques often fail to provide rapid and accurate identification of m5c sites. However, with the growing accessibility of sequence data, expanding computational models offers a more efficient and reliable approach to m5c site detection. This research focused on creating advanced in-silico methods using ensemble learning techniques. The encoded data was processed through ensemble models, including bagging and boosting techniques. These models were then rigorously evaluated through independent testing and 10-fold cross-validation.
Results: Among the models tested, the Bagging ensemble-based predictor, m5C-iEnsem, demonstrated superior performance to existing m5c prediction tools.
Availability and implementation: To further support the research community, m5c-iEnsem has been made available via a user-friendly web server at https://m5c-iensem.streamlit.app/.
© The Author(s) 2024. Published by Oxford University Press.