Disease association and comparative genomics of compositional bias in human proteins

F1000Res. 2023 Feb 20;12:198. doi: 10.12688/f1000research.129929.2. eCollection 2023.

Abstract

Background: The evolutionary rate of disordered protein regions varies greatly due to the lack of structural constraints. So far, few studies have investigated the presence/absence patterns of compositional bias, indicative of disorder, across phylogenies in conjunction with human disease. In this study, we report a genome-wide analysis of the association between compositional bias and disease in human proteins, together with their taxonomic distribution.

Methods: The protein set of the human genome, as provided by the Ensembl database, was annotated and analysed with respect to both disease association and compositional bias detection. The UniProt Reference Proteome dataset, containing 11,297 proteomes, was used as the target dataset for the comparative genomics of a well-defined subset of the human genome, comprising 100 characteristic, compositionally biased proteins, some of which are linked to disease.

Results: Cross-evaluation of compositional bias and disease association in the human genome reveals a significant enrichment of compositionally biased regions in disease-associated genes, with charged, hydrophilic amino acids over-represented. The phylogenetic profiling of 17 disease-associated proteins with compositional bias across 11,297 proteomes captures characteristic taxonomic distribution patterns.

Conclusions: This is the first report of a combined genome-wide analysis of compositional bias, disease association and taxonomic distribution of human proteins, covering structural, functional and evolutionary properties. The reported framework can form the basis for large-scale follow-up projects encompassing the entire human genome and all known gene-disease associations.
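The abstract does not name the tool used to detect compositional bias. Purely as an illustration of the idea, the sketch below flags low-complexity stretches with a sliding-window Shannon entropy filter, a simplified relative of SEG-style filtering; it is not the study's detection method, and the window size, entropy threshold, toy sequence and function names are all hypothetical. The helper `dominant_residues` shows one way to check which residues (here the charged E and K) drive a flagged segment.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the amino-acid composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_segments(seq: str, window: int = 12, max_entropy: float = 2.2):
    """Return merged half-open (start, end) intervals covered by low-entropy windows.

    `window` and `max_entropy` are hypothetical illustrative defaults,
    not parameters taken from the study.
    """
    segments = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= max_entropy:
            start, end = i, i + window
            # Extend the previous segment when consecutive flagged windows overlap.
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


def dominant_residues(segment: str, top: int = 2):
    """Most frequent residues in a segment, i.e. which amino acids drive the bias."""
    return Counter(segment).most_common(top)


# Hypothetical toy sequence: an E/K-rich stretch between two mixed-composition regions.
seq = "MKTAYIAKQRQISFVKSHFSRQ" + "E" * 10 + "K" * 5 + "LDNAGWPVMAQLRE"
for start, end in low_complexity_segments(seq):
    print(start, end, dominant_residues(seq[start:end]))
```

Entropy filters of this kind favour short, strongly skewed stretches; dedicated tools add statistical scoring and residue-specific handling that this sketch omits.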
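The Results report that compositionally biased regions are significantly enriched in disease-associated genes. A standard way to assess such an association is a one-sided hypergeometric test, equivalent to a one-sided Fisher's exact test on a 2x2 table (biased or not, disease-associated or not). The abstract does not state which test the authors used, so this is an assumed choice, and the function names and the counts in the usage lines are hypothetical, not data from the study.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric tail P(X >= k).

    N: genes in the genome, K: genes with compositionally biased regions,
    n: disease-associated genes, k: disease-associated genes that are also biased.
    Equivalent to a one-sided ("greater") Fisher's exact test on the 2x2 table.
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def odds_ratio(k: int, n: int, K: int, N: int) -> float:
    """Sample odds ratio of the 2x2 table; undefined if any off-diagonal cell is 0."""
    a = k                  # biased, disease-associated
    b = n - k              # not biased, disease-associated
    c = K - k              # biased, not disease-associated
    d = N - K - (n - k)    # neither biased nor disease-associated
    return (a * d) / (b * c)


# Hypothetical counts: 1000 genes, 200 biased, 100 disease-associated, 35 of them biased
# (20 would be expected by chance).
print(enrichment_p_value(35, 100, 200, 1000), odds_ratio(35, 100, 200, 1000))
```

An odds ratio above 1 with a small tail probability indicates enrichment; with many such tests (for example, per amino acid) a multiple-testing correction would also be needed.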
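Phylogenetic profiling represents each query protein as a presence/absence vector over a fixed, ordered set of proteomes (11,297 reference proteomes in the study). The minimal sketch below assumes that homology hits have already been computed with some search tool and are supplied as a set of proteome identifiers per query; the identifiers, taxon labels and helper names are hypothetical.

```python
from collections import defaultdict


def phylogenetic_profile(hits, proteomes):
    """Binary presence/absence vector of one query over an ordered proteome list."""
    return [1 if p in hits else 0 for p in proteomes]


def coverage_by_taxon(profile, proteomes, taxon_of):
    """Fraction of proteomes in each taxon where the query has a detected homolog."""
    totals, present = defaultdict(int), defaultdict(int)
    for bit, p in zip(profile, proteomes):
        taxon = taxon_of[p]
        totals[taxon] += 1
        present[taxon] += bit
    return {t: present[t] / totals[t] for t in totals}


def hamming(a, b):
    """Number of proteomes where two equal-length profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical proteome IDs and taxon assignments.
proteomes = ["P1", "P2", "P3", "P4"]
taxon_of = {"P1": "Eukaryota", "P2": "Eukaryota", "P3": "Bacteria", "P4": "Archaea"}
prof = phylogenetic_profile({"P1", "P2"}, proteomes)
print(prof, coverage_by_taxon(prof, proteomes, taxon_of))
```

Per-taxon coverage summarizes a profile's taxonomic distribution, and the Hamming distance is one simple way to compare the profiles of two proteins.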

Keywords: compositional bias; disease-associated gene; human disease; human genome; intrinsically disordered protein (IDP); intrinsically disordered region (IDR); low complexity; phylogenetic profile.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bias
  • Genome, Human
  • Genomics*
  • Humans
  • Phylogeny
  • Proteome* / genetics

Substances

  • Proteome

Grants and funding

This research was co-financed by Greece and the European Union (European Social Fund-ESF) through the Operational Programme «Human Resources Development, Education and Lifelong Learning» in the context of the project “Reinforcement of Postdoctoral Researchers - 2nd Cycle” (MIS-5033021), implemented by the State Scholarships Foundation (IKY). The work was also supported by Elixir-GR (grant # MIS 5002780), implemented under the Action “Reinforcement of the Research & Innovation Infrastructure,” funded by the Operational Program Competitiveness, Entrepreneurship, & Innovation (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund).