Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning

Hsu, Kyle; Hamid, Jubayer Ibn; Burns, Kaylee; Finn, Chelsea; Wu, Jiajun

arXiv:2404.10282 (cs)

[Submitted on 16 Apr 2024 (v1), last revised 24 May 2024 (this version, v2)]

Abstract: Inductive biases are crucial in disentangled representation learning for narrowing down an underspecified solution set. In this work, we consider endowing a neural network autoencoder with three select inductive biases from the literature: data compression into a grid-like latent space via quantization, collective independence amongst latents, and minimal functional influence of any latent on how other latents determine data generation. In principle, these inductive biases are deeply complementary: they most directly specify properties of the latent space, encoder, and decoder, respectively. In practice, however, naively combining existing techniques instantiating these inductive biases fails to yield significant benefits. To address this, we propose adaptations to the three techniques that simplify the learning problem, equip key regularization terms with stabilizing invariances, and quash degenerate incentives. The resulting model, Tripod, achieves state-of-the-art results on a suite of four image disentanglement benchmarks. We also verify that Tripod significantly improves upon its naive incarnation and that all three of its "legs" are necessary for best performance.

Comments:	ICML 2024 camera-ready. 22 pages, 10 figures, code available at this https URL
Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2404.10282 [cs.LG]
	(or arXiv:2404.10282v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2404.10282

Submission history

From: Kyle Hsu
[v1] Tue, 16 Apr 2024 04:52:41 UTC (14,861 KB)
[v2] Fri, 24 May 2024 20:52:02 UTC (14,871 KB)
