The advent of RNA-sequencing techniques has made it possible to generate large, unbiased gene expression datasets of tissues and cell types. Several studies describing microglia gene expression in Alzheimer's disease or multiple sclerosis have been published, with the aim of gaining more insight into the role of microglia in these neurological diseases. Though the raw sequencing data are often deposited in open access databases, the most accessible source of data for scientists is what is reported in published manuscripts. We observed a relatively limited overlap in reported differentially expressed genes between various microglia RNA-sequencing studies of multiple sclerosis or Alzheimer's disease. Differences in experimental setup clearly influenced the number of overlapping reported genes. However, even when the experimental setup was very similar, the overlap in reported genes could be low. Papers reporting large numbers of differentially expressed microglial genes generally showed higher overlap with other papers. In addition, though the pathology present in the sequenced tissue can greatly influence microglia gene expression, this pathology was often underreported, making the data difficult to assess. Whereas reanalyzing every raw dataset could reduce the variation that contributes to the observed limited overlap in reported genes, this is not feasible for labs without (access to) bioinformatic expertise. In this study, we thus provide an overview of the data present in manuscripts and their supplementary files and of how these data can be interpreted.
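The overlap between two reported gene lists can be quantified with the Jaccard index named in the keywords: the size of the intersection divided by the size of the union. The sketch below is a minimal illustration only and is not taken from the study. The function name `jaccard_index` and the gene labels (`GENE_A` and so on) are hypothetical placeholders, and it assumes that gene symbols have already been harmonized between the two lists.

```python
def jaccard_index(genes_a, genes_b):
    """Return |A ∩ B| / |A ∪ B| for two collections of gene identifiers.

    Assumes identifiers are already harmonized (same nomenclature and case).
    Returns 0.0 when both collections are empty, to avoid division by zero.
    """
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Placeholder gene labels, for illustration only (not real study data).
study_1 = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
study_2 = ["GENE_C", "GENE_D", "GENE_E"]

# Two shared genes out of five distinct genes in total.
overlap = jaccard_index(study_1, study_2)
print(f"Jaccard index: {overlap:.2f}")
```

Because the denominator is the union of both lists, the index depends on how many genes each paper reports as well as on how many they share, so comparisons between studies with very different list lengths should be read with that in mind.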
Keywords: Alzheimer's disease; Jaccard index; RNA-seq; bioinformatics; microglia; multiple sclerosis.
© 2021 The Authors. GLIA published by Wiley Periodicals LLC.