PriLive: privacy-preserving real-time filtering for next-generation sequencing

Tobias P Loka; Simon H Tausch; Piotr W Dabrowski; Aleksandar Radonic; Andreas Nitsche; Bernhard Y Renard

doi:10.1093/bioinformatics/bty128

Bioinformatics. 2018 Jul 15;34(14):2376-2383. doi: 10.1093/bioinformatics/bty128.

Authors

Tobias P Loka¹, Simon H Tausch¹,², Piotr W Dabrowski¹, Aleksandar Radonic²,³, Andreas Nitsche², Bernhard Y Renard¹

Affiliations

¹ Bioinformatics Division (MF 1), Department for Methods Development and Research Infrastructure.
² Centre for Biological Threats and Special Pathogens: Highly Pathogenic Viruses (ZBS 1).
³ Genome Sequencing Unit (MF 2), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany.

PMID: 29522157
DOI: 10.1093/bioinformatics/bty128

Abstract

Motivation: In next-generation sequencing, re-identification of individuals and other privacy-breaching strategies can be applied even to anonymized data. This also holds for applications in which human DNA is acquired as a by-product, e.g. viral or metagenomic samples from a human host. Conventional data protection strategies, including cryptography and post-hoc filtering, are only applicable to the final, processed sequencing data. This can result in an insufficient level of data protection and a considerable time delay in the subsequent analysis workflow.

Results: We present PriLive, a novel tool for the automated removal of sensitive data while the sequencing machine is running. Human sequence information can thereby be detected and removed before it is completely produced, which facilitates compliance with strict data protection regulations. Because it causes almost no time delay for further analyses, PriLive is also beneficial for applications other than data protection. Especially if the sequencing data are dominated by known background signals, PriLive considerably accelerates subsequent analyses by passing on only a fraction of the input data. Besides these conceptual advantages, PriLive achieves filtering results at least as accurate as those of conventional post-hoc filtering tools.
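
To illustrate the general idea of filtering reads while they are still being produced, here is a minimal Python sketch. It is not PriLive's algorithm or API: the k-mer lookup, the class name `StreamingFilter`, the helper `build_kmer_index` and the `min_hits` threshold are all hypothetical and chosen only for illustration. The sketch assumes that each sequencing cycle delivers one new base per read and that a read is flagged once enough of its k-mers match a known background (e.g. human) reference.

```python
from typing import Dict, Set


def build_kmer_index(reference: str, k: int) -> Set[str]:
    """Collect every k-mer of a background reference (toy stand-in for a real index)."""
    return {reference[i:i + k] for i in range(len(reference) - k + 1)}


class StreamingFilter:
    """Hypothetical cycle-by-cycle filter; an illustration, not PriLive's method."""

    def __init__(self, background_kmers: Set[str], k: int, min_hits: int) -> None:
        self.background = background_kmers
        self.k = k
        self.min_hits = min_hits
        self.partial: Dict[str, str] = {}  # read id -> bases seen so far
        self.hits: Dict[str, int] = {}     # read id -> matching k-mers so far
        self.flagged: Set[str] = set()     # reads classified as background

    def add_cycle(self, bases: Dict[str, str]) -> None:
        """Consume one sequencing cycle, i.e. one new base per read."""
        for read_id, base in bases.items():
            if read_id in self.flagged:
                continue  # bases of flagged reads are never stored again
            seq = self.partial.get(read_id, "") + base
            self.partial[read_id] = seq
            # Check only the newest k-mer, which ends at the current cycle.
            if len(seq) >= self.k and seq[-self.k:] in self.background:
                self.hits[read_id] = self.hits.get(read_id, 0) + 1
                if self.hits[read_id] >= self.min_hits:
                    self.flagged.add(read_id)
                    self.partial[read_id] = ""  # discard bases already held

    def surviving_reads(self) -> Dict[str, str]:
        """Return the reads that were not flagged as background."""
        return {rid: seq for rid, seq in self.partial.items()
                if rid not in self.flagged}
```

In this sketch, calling `add_cycle` once per cycle and `surviving_reads` at the end yields only unflagged reads. A flagged read is dropped before its remaining bases are ever stored, which mirrors the abstract's point that sensitive sequence can be removed before it is completely produced.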

Availability and implementation: PriLive is open-source software available at https://gitlab.com/rki_bioinformatics/PriLive.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Genetic Privacy*
Genomics / methods*
High-Throughput Nucleotide Sequencing / methods*
Humans
Sequence Analysis, DNA / methods
Sequence Analysis, RNA / methods
Software*