Rapidly advancing high-content, single-cell technologies, such as robotic confocal microscopy with multiplexed dyes (morphological profiling), can be leveraged to reveal fundamental biology, ranging from microbial and abiotic stress responses to organ development. Specifically, heterogeneous cell systems can be perturbed genetically or with chemical treatments to enable inference of causal mechanisms. An exciting strategy for navigating the high-dimensional space of possible perturbation and cell type combinations is to use generative models as priors that anticipate high-content outcomes, guiding the design of informative experiments. Towards this goal, we present Latent diffUsion for Multiplexed Images of Cells (LUMIC), a framework that generates high-quality, high-fidelity images of cells. LUMIC combines diffusion models with DINO (self-Distillation with NO labels), a vision-transformer-based, self-supervised method trained on images to learn feature embeddings, and HGraph2Graph, a hierarchical graph encoder-decoder that represents chemicals. To demonstrate LUMIC's ability to generalize across cell lines and treatments, we apply it to two datasets: ~27,000 images from the JUMP Pilot dataset of two cell lines treated with 306 chemicals and stained with three dyes, and a newly generated dataset of ~3,000 images of five cell lines treated with 61 chemicals and stained with three dyes. To quantify prediction quality, we evaluate DINO embeddings, the Kernel Inception Distance (KID) score, and recovery of morphological feature distributions. LUMIC significantly outperforms previous methods and generates realistic out-of-sample images of cells across unseen compounds and cell types.
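For readers unfamiliar with the Kernel Inception Distance named above, the sketch below shows one standard way to compute it: an unbiased squared-MMD estimate with a cubic polynomial kernel over image embeddings (Bińkowski et al., 2018). This is a minimal illustration, not the paper's exact evaluation pipeline; the `real`/`fake` array names and the use of DINO features as inputs are assumptions for the example.

```python
import numpy as np

def polynomial_kernel(X, Y, degree=3, coef0=1.0):
    """Cubic polynomial kernel k(x, y) = (x.y / d + coef0)^degree, d = feature dim."""
    d = X.shape[1]
    return (X @ Y.T / d + coef0) ** degree

def kid(real, fake):
    """Unbiased squared-MMD estimate with a cubic polynomial kernel (KID).

    `real` and `fake` are (n_samples, dim) embedding matrices; illustratively,
    these could be DINO features of real vs. generated cell images.
    """
    m, n = real.shape[0], fake.shape[0]
    k_rr = polynomial_kernel(real, real)
    k_ff = polynomial_kernel(fake, fake)
    k_rf = polynomial_kernel(real, fake)
    # Drop diagonal terms so the within-set estimates are unbiased.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    term_rf = 2.0 * k_rf.mean()
    return term_rr + term_ff - term_rf

# Toy usage with random vectors standing in for embeddings
# (384 matches the DINO ViT-S feature dimension, chosen here for illustration).
rng = np.random.default_rng(0)
real_emb = rng.normal(size=(500, 384))
fake_emb = rng.normal(loc=0.1, size=(500, 384))
print(f"KID: {kid(real_emb, fake_emb):.5f}")
```

In practice, KID is often reported as the mean of this estimate over several random subsets of fixed size, which gives a variance estimate alongside the score.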