Analysis of selection in protein-coding sequences accounting for common biases

Brief Bioinform. 2021 Sep 2;22(5):bbaa431. doi: 10.1093/bib/bbaa431.

Abstract

The evolution of protein-coding genes is usually driven by selective processes, which favor some evolutionary trajectories over others and thereby optimize the stability and activity of the encoded proteins. Selection in this type of genetic data is most often analyzed with the nonsynonymous/synonymous substitution rate ratio (dN/dS). However, most of the well-established methodologies for estimating this metric make crucial assumptions, such as the absence of recombination or invariable codon frequencies along genes, and these assumptions can bias the estimate. Here, we review the most relevant biases in dN/dS estimation and provide a detailed guide to estimating this metric using state-of-the-art procedures that account for such biases, along with illustrative practical examples and recommendations. We also discuss the traditional interpretation of the estimated dN/dS, emphasizing the importance of considering complementary biological information such as the role of the observed substitutions in the stability and function of proteins. This review is intended to help evolutionary biologists who aim to accurately estimate selection in protein-coding sequences.
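To make the distinction behind dN/dS concrete, the following minimal sketch classifies codon differences between two aligned coding sequences as synonymous (same amino acid) or nonsynonymous (different amino acid) under the standard genetic code. It is an illustration only, not one of the estimation procedures reviewed in the article: it returns raw counts, whereas dN and dS are rates per nonsynonymous and per synonymous site, and it does not address recombination, codon frequencies or multiple substitutions at the same site. The function name `classify_codon_differences` and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_differences(seq_a, seq_b):
    """Count codon differences between two aligned, in-frame sequences.

    Only codon pairs that differ at exactly one nucleotide are classified.
    Pairs with several differences, gaps or ambiguous bases are counted as
    'skipped', because classifying them requires assumptions about the
    order of substitutions that this sketch does not make.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")

    counts = {"synonymous": 0, "nonsynonymous": 0, "skipped": 0}
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons carry no difference
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff != 1 or codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            counts["skipped"] += 1
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


# Hypothetical toy alignment: AAA/AAG (Lys/Lys), CTG/CTA (Leu/Leu), GAT/GAA (Asp/Glu).
example = classify_codon_differences("ATGAAACTGGAT", "ATGAAGCTAGAA")
print(example)
```

Turning such counts into a dN/dS estimate additionally requires normalizing by the numbers of synonymous and nonsynonymous sites and correcting for unobserved substitutions, which is where the modeling assumptions discussed in the abstract enter.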

Keywords: dN/dS estimation; dN/dS interpretation; codon evolution; codon frequencies; molecular adaptation; recombination.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Evolution, Molecular*
  • Models, Genetic*
  • Mutation, Missense*
  • Open Reading Frames*
  • Proteins / genetics*
  • Selection, Genetic*

Substances

  • Proteins