Although machine learning holds enormous potential to accelerate developments in homogeneous catalysis, the frequent need for extensive experimental data can be a bottleneck for implementation. Here, we report an unsupervised machine learning workflow that uses only five experimental data points. It draws on generalized parameter databases, complemented with problem-specific in silico data acquisition and clustering. We showcase the power of this strategy for the challenging problem of speciation of palladium (Pd) catalysts, for which a mechanistic rationale is currently lacking. From a total space of 348 ligands, the algorithm predicted, and we experimentally verified, a number of phosphine ligands (including some never previously synthesized) that give dinuclear Pd(I) complexes in preference to the more common Pd(0) and Pd(II) species.