Sfaira accelerates data and model reuse in single cell genomics

Genome Biol. 2021 Aug 25;22(1):248. doi: 10.1186/s13059-021-02452-6.

Abstract

Single-cell RNA-seq datasets are often first analyzed independently without harnessing model fits from previous studies, and are then contextualized with public datasets, requiring time-consuming data wrangling. We address these issues with sfaira, a single-cell data zoo for public datasets paired with a model zoo for executable pre-trained models. The data zoo is designed to facilitate contribution of datasets using ontologies for metadata. We propose an adaptation of cross-entropy loss for cell type classification tailored to datasets annotated at different levels of coarseness. We demonstrate the utility of sfaira by training models across anatomic data partitions on 8 million cells.
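
The abstract does not give the form of the adapted cross-entropy loss, so the following is only an illustrative sketch of one way such a loss could handle labels of mixed coarseness, not the sfaira implementation. It assumes the classifier predicts over leaf cell types only, and that a coarse label (for example "T cell") is treated as correct whenever the predicted probability mass falls on any of its leaf descendants. The toy ontology, the names `LEAVES`, `DESCENDANTS`, and the function `coarse_aware_cross_entropy` are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_aware_cross_entropy(logits, allowed_leaves):
    """Negative log of the probability mass assigned to the allowed leaves.

    For a fine label, allowed_leaves holds a single index and this reduces
    to ordinary cross-entropy. For a coarse label, it holds every leaf
    descendant of that label (an assumption of this sketch).
    """
    probs = softmax(logits)
    mass = sum(probs[i] for i in allowed_leaves)
    return -math.log(mass)


# Hypothetical toy ontology: three leaf classes, one coarse parent.
LEAVES = ["CD4 T cell", "CD8 T cell", "B cell"]
DESCENDANTS = {
    "T cell": [0, 1],      # coarse label covering both T cell leaves
    "CD4 T cell": [0],
    "CD8 T cell": [1],
    "B cell": [2],
}

logits = [2.0, 1.0, 0.0]  # model output for one cell, ordered as LEAVES

fine_loss = coarse_aware_cross_entropy(logits, DESCENDANTS["CD4 T cell"])
coarse_loss = coarse_aware_cross_entropy(logits, DESCENDANTS["T cell"])

print(f"fine label loss:   {fine_loss:.4f}")
print(f"coarse label loss: {coarse_loss:.4f}")
```

Under this assumption, a cell annotated only as "T cell" is not penalized for being predicted as either T cell subtype, so datasets annotated at different resolutions can contribute to training the same leaf-level classifier.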

Keywords: Data zoo; Model zoo; Single-cell genomics.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Databases, Genetic
  • Gene Ontology
  • Genomics*
  • Humans
  • Mice
  • Molecular Sequence Annotation
  • Reproducibility of Results
  • Single-Cell Analysis*
  • Statistics as Topic