What can we infer about mutation calling by using time-series mutation accumulation data and a Bayesian Mutation Finder?

Ecol Evol. 2024 Nov 10;14(11):e70339. doi: 10.1002/ece3.70339. eCollection 2024 Nov.

Abstract

Accurate estimates of mutation rates derived from genome-wide mutation accumulation (MA) data are fundamental to understanding basic evolutionary processes. Rapidly improving high-throughput sequencing technologies provide unprecedented opportunities to identify single nucleotide mutations across genomes. However, such MA-derived data are often difficult to analyze, and the performance of the available analysis methods is not well understood. In this study, we used the existing Bayesian Genotype Caller adapted for MA data, which we refer to as the Bayesian Mutation Finder (BMF), to identify single nucleotide mutations while accounting for the characteristics of the data. We compared the performance of BMF with that of the widely used Genome Analysis Toolkit (GATK) by applying both methods to time-series MA data as well as simulated data. The time-series data were obtained by propagating Daphnia pulex over an average of 188 generations and performing whole-genome sequencing of 14 MA lines at three time points. The results indicate that BMF identifies single nucleotide mutations more accurately than GATK, especially when applied to the empirical data. Furthermore, BMF uses fewer parameters and is more computationally efficient than GATK. Both BMF and GATK found a surprisingly large number of candidate mutations that were not confirmed at later time points. We systematically infer the causes of the unconfirmed candidate mutations, introduce a framework for estimating mutation rates based on genome-wide candidate mutations confirmed by subsequent sequencing, and provide an improved mutation rate estimate for D. pulex.
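The idea of counting only candidate mutations that are confirmed by subsequent sequencing can be illustrated with a minimal Python sketch. It is not the BMF implementation and not the framework from the paper: the `Candidate` record, the confirmation rule (a call must reappear at every later time point), the per-line callable-site and generation inputs, and the `ploidy` factor in the denominator are all assumptions chosen for illustration. The rate formula is the generic per-site, per-generation estimator (confirmed mutations divided by sites times generations).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one candidate single nucleotide mutation."""
    line: str            # MA line identifier
    position: int        # genomic coordinate (illustrative flat index)
    seen_at: frozenset   # indices of the time points at which it was called


def is_confirmed(c, n_time_points):
    """Assumed rule: a candidate is confirmed only if it is called again at
    every time point after the one where it first appeared."""
    if not c.seen_at:
        return False
    first = min(c.seen_at)
    later = set(range(first + 1, n_time_points))
    # Candidates first called at the final time point have no later
    # sequencing to confirm them, so they are not counted here.
    return bool(later) and later <= c.seen_at


def mutation_rate(candidates, n_time_points, callable_sites, generations, ploidy=2):
    """Per-site, per-generation rate from confirmed candidates.

    callable_sites and generations are dicts keyed by MA line (assumed inputs).
    ploidy=2 assumes each site is counted once per chromosome copy.
    """
    n_confirmed = sum(is_confirmed(c, n_time_points) for c in candidates)
    exposure = ploidy * sum(callable_sites[l] * generations[l] for l in callable_sites)
    return n_confirmed / exposure


# Toy data, invented for illustration only.
candidates = [
    Candidate("A", 10, frozenset({0, 1, 2})),  # called at all three time points
    Candidate("A", 20, frozenset({0})),        # never seen again: unconfirmed
    Candidate("B", 30, frozenset({1, 2})),     # arose later, then confirmed
    Candidate("B", 40, frozenset({2})),        # only at the final time point
]
rate = mutation_rate(
    candidates,
    n_time_points=3,
    callable_sites={"A": 1_000_000, "B": 1_000_000},
    generations={"A": 100, "B": 100},
)
print(rate)
```

In a real analysis the generation count in the denominator would need to match the window in which a new mutation can still be confirmed by later sequencing, which this sketch does not model.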

Keywords: Bayesian Mutation Finder; Daphnia pulex; mutation rate; single nucleotide mutations; time‐series mutation accumulation data.