BatchEval Pipeline: batch effect evaluation workflow for multiple datasets joint analysis

GigaByte. 2024 Feb 20:2024:gigabyte108. doi: 10.46471/gigabyte.108. eCollection 2024.

Abstract

As genomic sequencing technology continues to advance, it becomes increasingly important to perform joint analyses of multiple datasets of transcriptomics. However, batch effect presents challenges for dataset integration, such as sequencing data measured on different platforms, and datasets collected at different times. Here, we report the development of BatchEval Pipeline, a batch effect workflow used to evaluate batch effect on dataset integration. The BatchEval Pipeline generates a comprehensive report, which consists of a series of HTML pages for assessment findings, including a main page, a raw dataset evaluation page, and several built-in methods evaluation pages. The main page exhibits basic information of the integrated datasets, a comprehensive score of batch effect, and the most recommended method for removing batch effect from the current datasets. The remaining pages exhibit evaluation details for the raw dataset, and evaluation results from the built-in batch effect removal methods after removing batch effect. This comprehensive report enables researchers to accurately identify and remove batch effects, resulting in more reliable and meaningful biological insights from integrated datasets. In summary, the BatchEval Pipeline represents a significant advancement in batch effect evaluation, and is a valuable tool to improve the accuracy and reliability of the experimental results.

Availability & implementation: The source code of the BatchEval Pipeline is available at https://github.com/STOmics/BatchEval.

Grants and funding

This work was supported by the National Key R&D Program of China (2022YFC3400400).