The prevalence of concomitant proteinopathies and heterogeneous clinical symptoms in neurodegenerative diseases hinders the identification of individuals who might be candidates for a particular intervention. Here, by applying an unsupervised clustering algorithm to post-mortem histopathological data from 895 patients with degeneration in the central nervous system, we show that six non-overlapping disease clusters can simultaneously account for tau neurofibrillary tangles, α-synuclein inclusions, neuritic plaques, inclusions of the transcriptional repressor TDP-43, angiopathy, neuron loss and gliosis. We also show that membership in the six transdiagnostic disease clusters explains more variance in cognitive phenotypes than individual diagnoses do, and that it can be accurately predicted, via cross-validated multiple logistic regression, from scores on the Mini-Mental State Examination, protein levels in cerebrospinal fluid, and genotype at the APOE and MAPT loci. This combination of unsupervised and supervised data-driven tools provides a framework that could be used to identify latent disease subtypes in other areas of medicine.
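The two-stage logic described above (cluster on pathology first, then predict cluster membership from in-vivo measures under cross-validation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, three clusters stand in for the six reported, plain k-means stands in for the clustering algorithm (which is not named here), and the function names `kmeans`, `fit_softmax`, `predict` and `cross_val_accuracy` are hypothetical. It is not the authors' implementation.

```python
import math
import random


def kmeans(points, k, iters=20):
    """Plain k-means with farthest-point initialisation (a stand-in algorithm)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from its nearest existing centre.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def fit_softmax(X, y, k, lr=0.5, epochs=200):
    """Multinomial logistic regression fitted by stochastic gradient descent."""
    W = [[0.0] * (len(X[0]) + 1) for _ in range(k)]  # last weight is the bias
    for _ in range(epochs):
        for x, label in zip(X, y):
            xb = x + [1.0]
            p = softmax([sum(w * v for w, v in zip(row, xb)) for row in W])
            for c in range(k):
                g = p[c] - (1.0 if c == label else 0.0)
                W[c] = [w - lr * g * v for w, v in zip(W[c], xb)]
    return W


def predict(W, x):
    xb = x + [1.0]
    scores = [sum(w * v for w, v in zip(row, xb)) for row in W]
    return max(range(len(W)), key=scores.__getitem__)


def cross_val_accuracy(X, y, k, folds=5, seed=0):
    """k-fold cross-validation: every sample is predicted by a model that never saw it."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        held_out = set(idx[f::folds])
        train = [i for i in idx if i not in held_out]
        W = fit_softmax([X[i] for i in train], [y[i] for i in train], k)
        correct += sum(predict(W, X[i]) == y[i] for i in held_out)
    return correct / len(X)


# Synthetic stand-ins (assumptions): 2-D "pathology" scores in three groups,
# and noisy "clinical" measurements that partly reflect the same structure.
rng = random.Random(42)
group_centers = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
pathology = [[cx + rng.uniform(-0.1, 0.1), cy + rng.uniform(-0.1, 0.1)]
             for cx, cy in group_centers for _ in range(30)]
clinical = [[v + rng.uniform(-0.1, 0.1) for v in p] for p in pathology]

labels = kmeans(pathology, k=3)                    # stage 1: unsupervised
acc = cross_val_accuracy(clinical, labels, k=3)    # stage 2: supervised
print(f"cross-validated accuracy: {acc:.2f}")
```

The point of the split is that cluster labels are derived only from post-mortem features, while the classifier sees only measures available in living patients, so the cross-validated accuracy estimates how well the latent subtype could be inferred before autopsy.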