Fast exact gap-affine partial order alignment with POASTA

Lucas R van Dijk; Abigail L Manson; Ashlee M Earl; Kiran V Garimella; Thomas Abeel

doi:10.1093/bioinformatics/btae757


Bioinformatics. 2025 Jan 3:btae757. doi: 10.1093/bioinformatics/btae757. Online ahead of print.

Authors

Lucas R van Dijk¹,², Abigail L Manson¹, Ashlee M Earl¹, Kiran V Garimella³, Thomas Abeel¹,²

Affiliations

¹ Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States.
² Delft Bioinformatics Lab, TU Delft, Van Mourik Broekmanweg 6, 2628 XE, Delft, Zuid-Holland, The Netherlands.
³ Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States.

PMID: 39752324

Abstract

Motivation: Partial order alignment is a widely used method for computing multiple sequence alignments, with applications in genome assembly and pangenomics, among many others. Current algorithms for computing the optimal gap-affine partial order alignment do not scale well to larger graphs and longer sequences. Heuristic approaches exist, but they do not guarantee an optimal alignment and sacrifice alignment accuracy.
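To make concrete what an optimal gap-affine partial order alignment computes, here is a minimal sketch of the conventional baseline: Gotoh's three-state dynamic program generalized from a sequence to a directed acyclic graph, where each cell takes the minimum over all graph predecessors. This is not POASTA's algorithm. It fills a full table whose size is proportional to the number of graph nodes times the query length, which is the scaling problem described above. The function name `poa_gap_affine_score`, the graph representation (node characters in topological order plus predecessor lists), and the penalty values are hypothetical choices for illustration, not necessarily POASTA's defaults.

```python
import math
from typing import Sequence

INF = math.inf


def poa_gap_affine_score(
    node_chars: Sequence[str],
    preds: Sequence[Sequence[int]],
    query: str,
    mismatch: int = 4,    # illustrative penalty (assumption)
    gap_open: int = 6,    # illustrative penalty (assumption)
    gap_extend: int = 2,  # illustrative penalty (assumption)
) -> float:
    """Minimum gap-affine cost of aligning `query` end-to-end to some
    source-to-sink path of a DAG (hypothetical helper for illustration).

    Nodes must be in topological order: every index in preds[i] is < i.
    Matches cost 0; a gap of length L costs gap_open + L * gap_extend.
    """
    n, m = len(node_chars), len(query)
    # Row 0 is a virtual start node; row i + 1 holds graph node i.
    H = [[INF] * (m + 1) for _ in range(n + 1)]  # best cost in any state
    I = [[INF] * (m + 1) for _ in range(n + 1)]  # ends in an insertion (query char, no node)
    D = [[INF] * (m + 1) for _ in range(n + 1)]  # ends in a deletion (node, no query char)

    # Before any graph node, the query prefix can only be inserted.
    H[0][0] = 0
    for j in range(1, m + 1):
        I[0][j] = min(H[0][j - 1] + gap_open + gap_extend, I[0][j - 1] + gap_extend)
        H[0][j] = I[0][j]

    # Sinks (nodes without successors) are valid end points.
    has_succ = [False] * n
    for i in range(n):
        for p in preds[i]:
            has_succ[p] = True

    for i in range(n):
        r = i + 1
        # Source nodes hang off the virtual start row.
        pred_rows = [p + 1 for p in preds[i]] or [0]
        for j in range(m + 1):
            # Deletion: consume this node, open or extend a gap from any predecessor.
            D[r][j] = min(
                min(H[p][j] + gap_open + gap_extend, D[p][j] + gap_extend)
                for p in pred_rows
            )
            best = D[r][j]
            if j > 0:
                # Match/mismatch: diagonal step from the best predecessor.
                sub = 0 if node_chars[i] == query[j - 1] else mismatch
                diag = min(H[p][j - 1] for p in pred_rows) + sub
                # Insertion: consume a query char while staying on this node.
                I[r][j] = min(H[r][j - 1] + gap_open + gap_extend, I[r][j - 1] + gap_extend)
                best = min(best, diag, I[r][j])
            H[r][j] = best

    return min(H[i + 1][m] for i in range(n) if not has_succ[i])
```

For example, with the bubble graph A→C→{G, A}→T (node characters `["A", "C", "G", "A", "T"]`, predecessor lists `[[], [0], [1], [1], [2, 3]]`), the query `"ACAT"` scores 0 by following the lower branch, `"ACTT"` scores 4 (one mismatch), and `"ACT"` scores 8 (one deleted node, 6 + 2).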

Results: We present POASTA, a new optimal algorithm for partial order alignment that exploits long stretches of matching sequence between the graph and a query. We benchmarked POASTA against the state of the art on several diverse bacterial gene datasets and demonstrated an average speed-up of 4.1× (up to 9.8×) while using less memory. POASTA's memory scaling characteristics enabled the construction of much larger POA graphs than previously possible, as demonstrated by megabase-length alignments of 342 Mycobacterium tuberculosis sequences.

Availability and implementation: POASTA is available on GitHub at https://github.com/broadinstitute/poasta.

Keywords: multiple sequence alignment; pangenome graphs; partial order alignment.