ReAlign-N: an integrated realignment approach for multiple nucleic acid sequence alignment, combining global and local realignments

Yixiao Zhai; Tong Zhou; Yanming Wei; Quan Zou; Yansu Wang

doi:10.1093/nargab/lqae170

NAR Genom Bioinform. 2024 Dec 18;6(4):lqae170. doi: 10.1093/nargab/lqae170. eCollection 2024 Dec.

Authors

Yixiao Zhai¹,², Tong Zhou¹,², Yanming Wei²,³, Quan Zou¹,², Yansu Wang¹,²

Affiliations

¹ Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No.2006, Xiyuan Avenue, Pidu Zone, Chengdu 610054, China.
² Institute of Digital Health, Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No.1, Chengdian Road, Kecheng Zone, Quzhou 324003, China.
³ School of Computer Science and Technology, Xidian University, No.266, Xifeng Road, Chang'an Zone, Xi'an 710071, China.

Abstract

Accurate multiple sequence alignment (MSA) is essential for comprehensive biological sequence analysis. However, the complexity of evolutionary relationships often produces variations that generic alignment tools do not adequately handle, and realignment is needed to remedy this. Realignment methods tailored to nucleic acid sequences, particularly long sequences, are currently lacking. This study presents ReAlign-N, a realignment method designed explicitly for multiple nucleic acid sequence alignment. ReAlign-N integrates global and local realignment strategies to improve accuracy. In the global realignment phase, ReAlign-N incorporates K-Band and a memory-saving technique into the dynamic programming approach, giving high efficiency and low memory requirements for large-scale realignment tasks. The local realignment stage uses full matching and entropy scoring to identify low-quality regions and realigns them with MAFFT. Experimental results show that ReAlign-N consistently improves on the initial alignments for both simulated and real datasets. Furthermore, compared with ReformAlign, the only existing multiple nucleic acid sequence realignment tool, ReAlign-N runs faster and uses less memory. The source code and test data for ReAlign-N are available on GitHub (https://github.com/malabz/ReAlign-N).
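As an illustration of the entropy scoring idea mentioned in the abstract, the following is a minimal sketch, not the ReAlign-N implementation. It assumes that a column's quality can be approximated by its Shannon entropy (with gaps counted as an ordinary symbol) and that runs of consecutive high-entropy columns form candidate regions for local realignment. The function names `column_entropy` and `low_quality_regions` and the threshold of 1.0 bit are hypothetical choices made for this example; the abstract does not specify how ReAlign-N scores or thresholds columns.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column.

    Gaps ('-') are treated as an ordinary symbol (an assumption of this sketch).
    """
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def low_quality_regions(alignment, threshold=1.0):
    """Return half-open (start, end) column ranges whose entropy exceeds threshold.

    `alignment` is a list of equal-length aligned sequences.
    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    ncols = len(alignment[0])
    regions = []
    start = None
    for j in range(ncols):
        column = [row[j] for row in alignment]
        noisy = column_entropy(column) > threshold
        if noisy and start is None:
            start = j                      # open a new high-entropy run
        elif not noisy and start is not None:
            regions.append((start, j))     # close the current run
            start = None
    if start is not None:
        regions.append((start, ncols))     # run extends to the last column
    return regions


if __name__ == "__main__":
    toy = ["ACGTAC", "ACGAAC", "ACTCAC", "AC-GAC"]
    print(low_quality_regions(toy))
```

In a pipeline of the kind the abstract describes, each flagged column range would be extracted and passed to an aligner such as MAFFT, then spliced back into the alignment; that step is omitted here because the abstract does not describe how it is carried out.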

Associated data

figshare/10.6084/m9.figshare.25801384.v1