Background: Extracellular vesicle-derived (EV)-miRNAs have potential to serve as biomarkers for the diagnosis of various diseases. miRNA microarrays are widely used to quantify circulating EV-miRNA levels, and the preprocessing of miRNA microarray data is critical for analytical accuracy and reliability. Thus, although microarray data have been used in various studies, the effects of preprocessing have not been studied for Toray's 3D-Gene chip, a widely used measurement method. We aimed to evaluate batch effect, missing value imputation accuracy, and the influence of preprocessing on measured values in 18 different preprocessing pipelines for EV-miRNA microarray data from two cohorts with amyotrophic lateral sclerosis using 3D-Gene technology.
Results: Eighteen different pipelines with different types and orders of missing value completion and normalization were used to preprocess the 3D-Gene microarray EV-miRNA data. Notable results were suppressed in the batch effects in all pipelines using the batch effect correction method ComBat. Furthermore, pipelines utilizing missForest for missing value imputation showed high agreement with measured values. In contrast, imputation using constant values for missing data exhibited low agreement.
Conclusions: This study highlights the importance of selecting the appropriate preprocessing strategy for EV-miRNA microarray data when using 3D-Gene technology. These findings emphasize the importance of validating preprocessing approaches, particularly in the context of batch effect correction and missing value imputation, for reliably analyzing data in biomarker discovery and disease research.
Keywords: Data imputation; Extracellular vesicle; Normalization; miRNA.
© 2024. The Author(s).