DRAMS: A tool to detect and re-align mixed-up samples for integrative studies of multi-omics data

PLoS Comput Biol. 2020 Apr 13;16(4):e1007522. doi: 10.1371/journal.pcbi.1007522. eCollection 2020 Apr.

Abstract

Studies of complex disorders benefit from integrative analyses of multiple omics data. However, sample mix-ups occur frequently in multi-omics studies, weakening statistical power and risking false findings. Accurately aligning sample information, genotype, and the corresponding omics data is critical for integrative analyses. To address the sample mix-up problem, we developed DRAMS (https://github.com/Yi-Jiang/DRAMS), a tool to Detect and Re-Align Mixed-up Samples. It uses a logistic regression model followed by a modified topological sorting algorithm to identify the potential true IDs based on the relationships among multi-omics data. In tests on simulated data, DRAMS performed better as more types of omics data were used and as the proportion of mix-ups decreased. Applying DRAMS to real data from the PsychENCODE BrainGVEX project, we detected and corrected 201 mix-ups (12.5% of the total data generated). Of the 21 mix-ups involving errors of racial identity, DRAMS re-assigned all data to the correct racial group in the 1000 Genomes project. After correction, the number of quantitative trait loci (QTL) detected at FDR<0.01 increased by an average of 1.62-fold. The use of DRAMS in multi-omics studies will strengthen statistical power and improve the quality of results. Although few studies currently have multi-omics data in place, we expect such data, and with them the need for DRAMS, to increase quickly.
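
The general idea of recovering true IDs from relationships among data types can be illustrated with a small sketch. This is not the DRAMS implementation: it replaces the logistic regression model and the modified topological sorting algorithm with a much simpler stand-in (grouping matched data files and taking a majority vote over their labels). The function name `realign_by_majority`, the file names, and the input format are all hypothetical, and the matched pairs are assumed to have been computed beforehand, for example from genotype concordance between data files.

```python
from collections import Counter, defaultdict


def _find(parent, x):
    """Union-find root lookup with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def realign_by_majority(labels, matched_pairs):
    """Suggest corrected sample IDs by majority vote (illustrative only).

    labels:        dict mapping each data file to its labeled sample ID
    matched_pairs: pairs of data files judged to come from the same
                   individual (hypothetical input, computed elsewhere)
    Returns a dict mapping each suspected mixed-up file to a suggested ID.
    """
    # Group files that are linked, directly or indirectly, by a match.
    parent = {f: f for f in labels}
    for a, b in matched_pairs:
        root_a, root_b = _find(parent, a), _find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(list)
    for f in labels:
        groups[_find(parent, f)].append(f)

    corrections = {}
    for members in groups.values():
        counts = Counter(labels[f] for f in members).most_common()
        # With a tie there is no clear consensus, so leave the group alone.
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            continue
        consensus = counts[0][0]
        for f in members:
            if labels[f] != consensus:
                corrections[f] = consensus
    return corrections


# Toy example: two individuals, three data types, one swapped pair of labels.
labels = {
    "wgs_1": "S1", "rna_1": "S1", "atac_1": "S2",
    "wgs_2": "S2", "rna_2": "S2", "atac_2": "S1",
}
pairs = [("wgs_1", "rna_1"), ("rna_1", "atac_1"),
         ("wgs_2", "rna_2"), ("rna_2", "atac_2")]
suggested = realign_by_majority(labels, pairs)
```

In this toy input the two files labeled with the wrong ID are outvoted by the other two data types from the same individual. With only two data types a disagreement is a tie and cannot be resolved by voting, which is consistent with the observation above that performance improves as more types of omics data are used.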

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Chromatin / chemistry
  • Computational Biology / methods*
  • Computer Simulation
  • Ethnicity
  • Female
  • Frontal Lobe / metabolism*
  • Genome
  • Genomics / methods*
  • Genotype
  • Humans
  • Logistic Models
  • Male
  • Models, Genetic
  • Oligonucleotide Array Sequence Analysis
  • Polymorphism, Single Nucleotide*
  • RNA-Seq
  • Reproducibility of Results
  • Sex Factors
  • Software
  • User-Computer Interface
  • Whole Genome Sequencing

Substances

  • Chromatin