Systematic clustering algorithm for chromatin accessibility data and its application to hematopoietic cells

PLoS Comput Biol. 2020 Nov 30;16(11):e1008422. doi: 10.1371/journal.pcbi.1008422. eCollection 2020 Nov.

Abstract

The huge amount of data acquired by high-throughput sequencing requires data reduction for effective analysis. Here we present a clustering algorithm for genome-wide open chromatin data that uses a new data reduction method. This method represents the genome of each sample as a string of 1s and 0s based on a set of peaks and calculates the Hamming distances between these strings. Combined with a systematically optimized set of peaks, this algorithm enables us to quantitatively evaluate differences between samples of hematopoietic cells and classify cell types, potentially leading to a better understanding of leukemia pathogenesis.
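
The data reduction described above can be illustrated with a minimal sketch. The abstract does not specify how peaks are mapped to bits or how the peak set is optimized, so this example assumes that a bit is 1 when any peak in the sample overlaps the corresponding reference peak. The reference peaks, sample names, coordinates, and function names are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical reference peak set: (chromosome, start, end), half-open intervals.
REFERENCE_PEAKS = [
    ("chr1", 100, 200),
    ("chr1", 500, 650),
    ("chr2", 50, 120),
    ("chr2", 900, 1000),
]


def overlaps(a, b):
    """Return True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def binarize(sample_peaks, reference_peaks=REFERENCE_PEAKS):
    """Encode a sample as a list of 0/1 values, one bit per reference peak.

    Assumption: a bit is 1 if any peak called in the sample overlaps the
    reference peak, and 0 otherwise.
    """
    return [
        int(any(overlaps(ref, peak) for peak in sample_peaks))
        for ref in reference_peaks
    ]


def hamming(u, v):
    """Count the positions at which two equal-length bit strings differ."""
    if len(u) != len(v):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(u, v))


def pairwise_hamming(samples):
    """Return {(name_a, name_b): distance} for every unordered pair of samples."""
    vectors = {name: binarize(peaks) for name, peaks in samples.items()}
    return {
        (a, b): hamming(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


# Made-up toy samples; real input would be peak calls from open chromatin data.
SAMPLES = {
    "sample_A": [("chr1", 120, 180), ("chr2", 60, 100)],
    "sample_B": [("chr1", 150, 210), ("chr2", 950, 990)],
    "sample_C": [("chr1", 110, 190), ("chr2", 40, 70), ("chr1", 520, 600)],
}

if __name__ == "__main__":
    for pair, distance in pairwise_hamming(SAMPLES).items():
        print(pair, distance)
```

The resulting pairwise distances could then be passed to any standard clustering procedure. Which procedure the authors use, and how they optimize the peak set, is not stated in the abstract.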

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Bone Marrow Cells / metabolism
  • Chromatin / metabolism*
  • Cluster Analysis
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Leukemia / genetics*
  • Leukemia / pathology

Substances

  • Chromatin

Grants and funding

This research was supported by JSPS KAKENHI Grant Numbers JP19K16740 (AT), JP18J40119 (AT), JP19H03689 (MM), JP20H03514 (JiY), and by Japan Agency for Medical Research and Development (AMED) Grant Numbers JP20fk0108088h0002 (MM), JP17km0405207h0002 (AF), JP18km0405207S0103 (AF), and by a grant from the Naito Foundation (AT). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.