Analyzing sex imbalance in EGA and dbGaP biological databases: Recommendations for better practices

iScience. 2024 Sep 23;27(10):110831. doi: 10.1016/j.isci.2024.110831. eCollection 2024 Oct 18.

Abstract

Precision medicine aims to tailor treatments to each patient's individual characteristics. Recognizing the significance of sex and gender is therefore indispensable for meeting the distinct healthcare needs of diverse populations. To this end, continuing a trend of improving data quality observed since 2014, the European Genome-phenome Archive (EGA) established a policy in 2018 that requires data providers to declare the sex of donor samples, aiming to enhance data accuracy and prevent imbalance in sex classification. We analyzed sex classification imbalance in human data from EGA and its U.S. counterpart, the Database of Genotypes and Phenotypes (dbGaP). Our findings show a significant decrease in the number of EGA samples classified as unknown, suggesting that the policy may be promoting better sex reporting during data collection. Based on these findings, we raise awareness of sample imbalance problems and provide a list of recommendations for improving biomedical research practices.

Keywords: Artificial intelligence; Genomics; Human genetics.