Tree shape metrics can be computed quickly for trees of any size, which makes them promising alternatives to computationally intensive statistical methods and parameter-rich evolutionary models in an era of massive data availability. Previous studies have demonstrated their effectiveness in revealing important parameters of viral evolutionary dynamics, but the impact of natural selection on tree topology shape has not been thoroughly investigated. We carried out forward-time, individual-based simulations to investigate whether several kinds of tree shape metrics could predict the selection regime used to generate the data. To examine the impact of the genetic diversity of the founder viral population, simulations were run under two opposing starting configurations of that diversity. Tree topology shape metrics successfully distinguished four evolutionary regimes: negative, positive, and frequency-dependent selection, as well as neutral evolution. Two metrics from the Laplacian spectral density profile (principal eigenvalue and peakedness), together with the number of cherries, were the most informative for indicating selection type. The genetic diversity of the founder population influenced how well the evolutionary scenarios could be differentiated. Tree imbalance, which has frequently been associated with the action of natural selection on intrahost viral diversity, was also characteristic of neutrally evolving, serially sampled data. Metrics calculated from empirical HIV datasets indicated that most tree topologies had shapes closer to those of the frequency-dependent selection or neutral evolution regimes.
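As an illustration of why such metrics are cheap to compute, the sketch below counts cherries (internal nodes whose two children are both tips) in a single traversal of a rooted binary tree. It also returns the Colless index as one common measure of tree imbalance; the abstract does not say which imbalance statistic was used, so that choice is an assumption. The nested-tuple tree encoding and the function name `tree_shape` are hypothetical and not taken from the study, and the Laplacian spectral metrics are not covered here.

```python
def tree_shape(tree):
    """Return (n_tips, n_cherries, colless_index) for a rooted binary tree.

    Assumed encoding (illustrative only): a tip is any non-tuple value,
    and an internal node is a 2-tuple (left_subtree, right_subtree).
    """
    if not isinstance(tree, tuple):
        # A single tip: one tip, no cherries, no imbalance.
        return 1, 0, 0

    left, right = tree
    n_left, cherries_left, colless_left = tree_shape(left)
    n_right, cherries_right, colless_right = tree_shape(right)

    # This node is a cherry if both of its children are tips.
    is_cherry = 1 if (n_left == 1 and n_right == 1) else 0

    # Colless index: sum over internal nodes of |tips on left - tips on right|.
    imbalance_here = abs(n_left - n_right)

    return (
        n_left + n_right,
        cherries_left + cherries_right + is_cherry,
        colless_left + colless_right + imbalance_here,
    )


# Two four-tip topologies: fully balanced versus fully ladder-like.
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")

print(tree_shape(balanced))     # (4, 2, 0)
print(tree_shape(caterpillar))  # (4, 1, 3)
```

Each node is visited once, so the cost grows linearly with the number of tips. The recursive form is kept for readability; very deep, ladder-like trees would need an iterative traversal to stay within Python's recursion limit.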
Keywords: Big data; Intrahost evolution; Tree topology shape; Virus diversity; Virus evolution.
Copyright © 2023 Elsevier Inc. All rights reserved.