Opportunities and challenges in long-read sequencing data analysis

Shanika L Amarasinghe; Shian Su; Xueyi Dong; Luke Zappia; Matthew E Ritchie; Quentin Gouil

doi:10.1186/s13059-020-1935-5

Genome Biol. 2020 Feb 7;21(1):30. doi: 10.1186/s13059-020-1935-5.

Authors

Shanika L Amarasinghe¹,², Shian Su¹,², Xueyi Dong¹,², Luke Zappia³,⁴, Matthew E Ritchie¹,²,⁵, Quentin Gouil⁶,⁷

Affiliations

¹ Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052, Australia.
² Department of Medical Biology, The University of Melbourne, Parkville, 3010, Australia.
³ Bioinformatics, Murdoch Children's Research Institute, Parkville, 3052, Australia.
⁴ School of Biosciences, Faculty of Science, The University of Melbourne, Parkville, 3010, Australia.
⁵ School of Mathematics and Statistics, The University of Melbourne, Parkville, 3010, Australia.
⁶ Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052, Australia.
⁷ Department of Medical Biology, The University of Melbourne, Parkville, 3010, Australia.

Abstract

Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.

Keywords: Data analysis; Long-read sequencing; Oxford Nanopore; PacBio.

Publication types

Review

MeSH terms

Animals
Data Science / methods
Data Science / standards
Genomics / methods*
Genomics / standards
Humans
Nanopore Sequencing / methods*
Nanopore Sequencing / standards
Whole Genome Sequencing / methods*
Whole Genome Sequencing / standards