Cell-type identification is the most crucial step in single-cell RNA-seq (scRNA-seq) data analysis, and supervised cell-type identification methods are a desirable solution because of their accuracy and efficiency. The performance of such methods depends heavily on the quality of the reference data. Although many supervised cell-type identification tools exist, there is no method for selecting and constructing reference data. Here we develop Target-Oriented Reference Construction (TORC), a widely applicable strategy for constructing a reference for a given target dataset in supervised scRNA-seq cell-type identification. TORC alleviates the differences in data distribution and cell-type composition between the reference and the target. Extensive benchmarks on simulated and real data demonstrate that TORC consistently improves cell-type identification. TORC is freely available at https://github.com/weix21/TORC.
Keywords: Cell-type identification; Reference construction; Supervised learning; scRNA-seq.