NextPolish2: A Repeat-aware Polishing Tool for Genomes Assembled Using HiFi Long Reads

Genomics Proteomics Bioinformatics. 2024 May 9;22(1):qzad009. doi: 10.1093/gpbjnl/qzad009.

Abstract

The high-fidelity (HiFi) long-read sequencing technology developed by PacBio has greatly improved the base-level accuracy of genome assemblies. However, these assemblies still contain base-level errors, particularly within the error-prone regions of HiFi long reads. Existing genome polishing tools usually introduce overcorrections and haplotype switch errors when correcting errors in genomes assembled from HiFi long reads. Here, we describe an upgraded genome polishing tool, NextPolish2, which can fix base errors remaining in those "highly accurate" genomes assembled from HiFi long reads without introducing excessive overcorrections and haplotype switch errors. We believe that NextPolish2 is of great significance for further improving the accuracy of telomere-to-telomere (T2T) genomes. NextPolish2 is freely available at https://github.com/Nextomics/NextPolish2.

Keywords: Error correction; Genome assembly; Genome polishing; HiFi long read; Telomere-to-telomere.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genome / genetics
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Repetitive Sequences, Nucleic Acid / genetics
  • Sequence Analysis, DNA / methods
  • Software*