Crumble: reference free lossy compression of sequence quality values

Bioinformatics. 2019 Jan 15;35(2):337-339. doi: 10.1093/bioinformatics/bty608.

Abstract

Motivation: The bulk of the space taken up by CRAM files of next-generation sequencing (NGS) data consists of per-base quality values. Most of these are unnecessary for variant calling, offering an opportunity for space saving.

Results: On the Syndip test set, a 17-fold reduction in the quality storage portion of a CRAM file can be achieved while maintaining variant calling accuracy. The size reduction of an entire CRAM file varied from 2.2- to 7.4-fold, depending on the non-quality content of the original file (see Supplementary Material S6 for details).
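
The dependence of the whole-file reduction on non-quality content follows from simple arithmetic. The sketch below is illustrative only: it assumes that the quality portion shrinks by a single uniform factor and that every other part of the file is left unchanged. The function name and the example fractions are hypothetical and are not taken from the paper or from Supplementary Material S6.

```python
def overall_fold_reduction(quality_fraction, quality_fold=17.0):
    """Estimate whole-file fold reduction when only quality values shrink.

    quality_fraction: share of the original file occupied by quality
        values, between 0 and 1 (hypothetical input).
    quality_fold: fold reduction applied to the quality portion; the
        default of 17 is the figure quoted in the abstract.
    Assumption: non-quality data keeps its original size.
    """
    if not 0.0 <= quality_fraction <= 1.0:
        raise ValueError("quality_fraction must be between 0 and 1")
    if quality_fold <= 0:
        raise ValueError("quality_fold must be positive")
    # Size of the new file relative to the original (original = 1.0).
    remaining = (1.0 - quality_fraction) + quality_fraction / quality_fold
    return 1.0 / remaining


if __name__ == "__main__":
    # Hypothetical quality shares, chosen only to show the trend.
    for fraction in (0.5, 0.7, 0.9):
        fold = overall_fold_reduction(fraction)
        print(f"quality share {fraction:.0%}: about {fold:.1f}-fold overall")
```

Under this assumption the overall gain is capped by whatever is not quality data, which is consistent with the abstract's statement that whole-file reduction depends on the non-quality content: a file dominated by quality values approaches the full 17-fold figure, while one with substantial other content gains far less.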

Availability and implementation: Crumble is open source and can be obtained from https://github.com/jkbonfield/crumble.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Compression*
  • High-Throughput Nucleotide Sequencing*