ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model

EM Bakr, L Zhao, VT Hu, M Cord, P Perez… - arXiv preprint arXiv:2311.14542, 2023
Diffusion-based generative models excel at perceptually impressive synthesis but face challenges in interpretability. This paper introduces ToddlerDiffusion, an interpretable 2D diffusion image-synthesis framework inspired by the human generation system. Unlike traditional diffusion models with opaque denoising steps, our approach decomposes the generation process into simpler, interpretable stages: generating contours, then a palette, and finally a detailed colored image. This decomposition not only enhances overall performance but also enables robust editing and interaction capabilities. Each stage is meticulously formulated for efficiency and accuracy, surpassing Stable Diffusion (LDM). Extensive experiments on datasets such as LSUN-Churches and COCO validate our approach, which consistently outperforms existing methods. ToddlerDiffusion achieves notable efficiency, matching LDM performance on LSUN-Churches while running three times faster with a 3.76-times-smaller architecture. Our source code is provided in the supplementary material and will be publicly accessible.
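
To make the staged pipeline concrete, here is a minimal sketch of the contour → palette → image cascade the abstract describes. Everything in it is an illustrative assumption: the names (StageDenoiser, run_stage), channel counts, step count, and the toy denoising update are stand-ins, not the authors' actual architecture or sampler.

import torch
import torch.nn as nn

# Sketch of a three-stage cascade (contours -> palette -> detailed image).
# All names and the update rule below are hypothetical placeholders.

class StageDenoiser(nn.Module):
    """Stand-in for a per-stage denoising network (a U-Net in practice)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.out_ch = out_ch
        self.net = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def run_stage(denoiser, cond, steps=10):
    """Run one diffusion stage, conditioning every step on the previous stage's output."""
    b, _, h, w = cond.shape
    x = torch.randn(b, denoiser.out_ch, h, w)
    for _ in range(steps):
        eps = denoiser(torch.cat([x, cond], dim=1))  # predict noise from sample + condition
        x = x - 0.1 * eps                            # toy denoising update, not a real sampler
    return x

# Stage 1 produces contours (1 ch), stage 2 a palette (3 ch), stage 3 the image (3 ch).
contour_net = StageDenoiser(in_ch=1 + 1, out_ch=1)      # conditioned on a blank map here
palette_net = StageDenoiser(in_ch=3 + 1, out_ch=3)      # conditioned on contours
image_net   = StageDenoiser(in_ch=3 + 3 + 1, out_ch=3)  # conditioned on palette + contours

blank    = torch.zeros(1, 1, 64, 64)
contours = run_stage(contour_net, blank)
palette  = run_stage(palette_net, contours)
image    = run_stage(image_net, torch.cat([palette, contours], dim=1))

The point the sketch captures is the abstract's claim: each stage solves a simpler, human-inspectable subproblem and conditions the next stage on its output, so intermediate results (contours, palette) can be inspected or edited before the final image is synthesized.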