Halcyon: an accurate basecaller exploiting an encoder-decoder model with monotonic attention

Bioinformatics. 2021 Jun 9;37(9):1211-1217. doi: 10.1093/bioinformatics/btaa953.

Abstract

Motivation: In recent years, nanopore sequencing technology has enabled inexpensive long-read sequencing, which promises reads longer than a few thousand bases. Such long reads contribute to the precise detection of structural variations and to accurate haplotype phasing. However, although various basecallers have been introduced to date, accurately decoding DNA sequences from noisy and complex raw nanopore signals remains a crucial requirement for downstream analyses that depend on higher-quality nanopore sequencing.

Results: To address this need, we developed a novel basecaller, Halcyon, that incorporates neural-network techniques frequently used in the field of machine translation. Our model employs monotonic-attention mechanisms to learn semantic correspondences between nucleotides and signal levels without any pre-segmentation of the input signals. We evaluated its performance on a human whole-genome sequencing dataset and demonstrated that Halcyon outperformed existing third-party basecallers and achieved performance competitive with the latest Oxford Nanopore Technologies' basecallers.
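
As an illustration of the general idea behind monotonic attention, the following is a minimal sketch of one common variant, hard monotonic attention at inference time: for each output symbol the decoder resumes scanning the encoded signal from the position it attended to last, and stops at the first position whose selection probability reaches a threshold. This is not taken from the Halcyon source code, and the abstract does not state which variant Halcyon uses. The function names, the threshold of 0.5, and the precomputed toy energy matrix are all assumptions made for illustration; in a real model the energies would be computed from the decoder state and the encoder outputs.

```python
import math
from typing import List, Optional, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def monotonic_attend(
    energies: Sequence[float], start: int, threshold: float = 0.5
) -> Optional[int]:
    """Scan encoder positions left to right, beginning at `start`.

    Returns the first index whose selection probability reaches
    `threshold`, or None if the scan runs off the end of the input.
    """
    for j in range(start, len(energies)):
        if sigmoid(energies[j]) >= threshold:
            return j
    return None


def decode_positions(
    energy_rows: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[Optional[int]]:
    """Return the attended encoder index for each output step.

    `energy_rows[i][j]` is a hypothetical, precomputed attention energy
    between output step i and encoder position j (toy input only).
    """
    attended: List[Optional[int]] = []
    start = 0
    for row in energy_rows:
        j = monotonic_attend(row, start, threshold)
        attended.append(j)
        # The next step resumes from the attended position, so the
        # attention can never move backwards along the signal.
        start = j if j is not None else len(row)
    return attended


# Toy example: 3 output steps over 4 encoder positions.
toy_energies = [
    [-2.0, 1.0, -1.0, 3.0],
    [-1.0, -1.0, 2.0, 0.0],
    [-3.0, -3.0, -3.0, -3.0],
]
positions = decode_positions(toy_energies)
```

Because each scan starts where the previous one stopped, the attended positions are non-decreasing, so the alignment between the signal and the output sequence emerges from decoding rather than from a separate segmentation step.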

Availability and implementation: The source code (halcyon) can be found at https://github.com/relastle/halcyon.

MeSH terms

  • DNA
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Nanopore Sequencing*
  • Nanopores*
  • Sequence Analysis, DNA
  • Software

Substances

  • DNA