GENCODE 2025: reference gene annotation for human and mouse

Jonathan M Mudge; Sílvia Carbonell-Sala; Mark Diekhans; Jose Gonzalez Martinez; Toby Hunt; Irwin Jungreis; Jane E Loveland; Carme Arnan; If Barnes; Ruth Bennett; Andrew Berry; Alexandra Bignell; Daniel Cerdán-Vélez; Kelly Cochran; Lucas T Cortés; Claire Davidson; Sarah Donaldson; Cagatay Dursun; Reham Fatima; Matthew Hardy; Prajna Hebbar; Zoe Hollis; Benjamin T James; Yunzhe Jiang; Rory Johnson; Gazaldeep Kaur; Mike Kay; Riley J Mangan; Miguel Maquedano; Laura Martínez Gómez; Nourhen Mathlouthi; Ryan Merritt; Pengyu Ni; Emilio Palumbo; Tamara Perteghella; Fernando Pozo; Shriya Raj; Cristina Sisu; Emily Steed; Dulika Sumathipala; Marie-Marthe Suner; Barbara Uszczynska-Ratajczak; Elizabeth Wass; Yucheng T Yang; Dingyao Zhang; Robert D Finn; Mark Gerstein; Roderic Guigó; Tim J P Hubbard; Manolis Kellis; Anshul Kundaje; Benedict Paten; Michael L Tress; Ewan Birney; Fergal J Martin; Adam Frankish

doi:10.1093/nar/gkae1078

Nucleic Acids Res. 2024 Nov 20:gkae1078. doi: 10.1093/nar/gkae1078. Online ahead of print.

Authors

Jonathan M Mudge¹, Sílvia Carbonell-Sala², Mark Diekhans³, Jose Gonzalez Martinez¹, Toby Hunt¹, Irwin Jungreis^{4,5}, Jane E Loveland¹, Carme Arnan², If Barnes¹, Ruth Bennett¹, Andrew Berry¹, Alexandra Bignell¹, Daniel Cerdán-Vélez⁶, Kelly Cochran⁷, Lucas T Cortés¹, Claire Davidson¹, Sarah Donaldson¹, Cagatay Dursun^{8,9}, Reham Fatima¹, Matthew Hardy¹, Prajna Hebbar³, Zoe Hollis¹, Benjamin T James^{4,5}, Yunzhe Jiang^{8,9}, Rory Johnson^{10,11}, Gazaldeep Kaur², Mike Kay¹, Riley J Mangan^{4,5,12}, Miguel Maquedano⁶, Laura Martínez Gómez⁶, Nourhen Mathlouthi¹, Ryan Merritt¹, Pengyu Ni^{8,9}, Emilio Palumbo², Tamara Perteghella^{2,13}, Fernando Pozo⁶, Shriya Raj¹, Cristina Sisu^{9,14}, Emily Steed¹, Dulika Sumathipala¹, Marie-Marthe Suner¹, Barbara Uszczynska-Ratajczak¹⁵, Elizabeth Wass¹, Yucheng T Yang^{9,16}, Dingyao Zhang^{8,9}, Robert D Finn¹, Mark Gerstein^{8,9}, Roderic Guigó^{2,13}, Tim J P Hubbard^{17,18}, Manolis Kellis^{4,5}, Anshul Kundaje^{7,19}, Benedict Paten³, Michael L Tress⁶, Ewan Birney¹, Fergal J Martin¹, Adam Frankish¹

Affiliations

¹ European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
² Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003 Catalonia, Spain.
³ UC Santa Cruz Genomics Institute, 2300 Delaware Avenue, University of California, Santa Cruz, CA 95060, USA.
⁴ Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, MA 02139, USA.
⁵ The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA.
⁶ Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain.
⁷ Department of Computer Science, Stanford University, 353 Jane Stanford Way, Stanford, CA, USA.
⁸ Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA.
⁹ Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA.
¹⁰ Department of Medical Oncology, Bern University Hospital, Murtenstrasse 35, 3008 Bern, Switzerland.
¹¹ School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4 D04 V1W8, Ireland.
¹² Genetics Training Program, Harvard Medical School, Boston, MA 02115, USA.
¹³ Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra (UPF), Carrer de la Mercè, 12, Ciutat Vella 08002 Barcelona, Spain.
¹⁴ Department of Life Sciences, Brunel University London, Kingston Lane, Uxbridge, London UB8 3PH, UK.
¹⁵ Department of Computational Biology of Noncoding RNA, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland.
¹⁶ Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, 220 Handan Road, Shanghai 200433, China.
¹⁷ Department of Medical and Molecular Genetics, King's College London, Guys Hospital, Great Maze Pond, London SE1 9RT, UK.
¹⁸ ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
¹⁹ Department of Genetics, Stanford University, Stanford, CA, USA.

PMID: 39565199
DOI: 10.1093/nar/gkae1078

Abstract

GENCODE produces comprehensive reference gene annotation for human and mouse. Entering its twentieth year, the project remains highly active as new technologies and methodologies allow us to catalog the genome at ever-increasing granularity. In particular, long-read transcriptome sequencing enables us to identify large numbers of missing transcripts and to substantially improve existing models, and our long non-coding RNA catalogs have undergone a dramatic expansion and reconfiguration as a result. Meanwhile, we are incorporating data from state-of-the-art proteomics and Ribo-seq experiments to fine-tune our annotation of translated sequences, while further insights into function can be gained from multi-genome alignments that grow richer as more species' genomes are sequenced. Such methodologies are combined into a fully integrated annotation workflow. However, the increasing complexity of our resources can present usability challenges, and we are resolving these with the creation of filtered genesets such as MANE Select and GENCODE Primary. The next challenge is to propagate annotations throughout multiple human and mouse genomes, as we enter the pangenome era. Our resources are freely available at our web portal www.gencodegenes.org, and via the Ensembl and UCSC genome browsers.
