Existing emotion-driven music generation models rely heavily on labeled data and offer limited interpretability and controllability over emotion. To address these limitations, a semi-supervised emotion-driven music generation model based on a category-dispersed Gaussian mixture variational autoencoder is proposed. First, a controllable music generation model is introduced that disentangles and manipulates rhythm and tonal features, enabling controlled music generation. Building on this, a semi-supervised model is developed that leverages a category-dispersed Gaussian mixture variational autoencoder to infer emotions from the latent representations of rhythm and tonal features. Finally, the objective loss function is optimized to enhance the separation between distinct emotion clusters. Experimental results on real-world datasets demonstrate that the proposed method effectively separates music with different emotions in the latent space, strengthening the association between music and emotion. In addition, the model successfully disentangles the individual musical features, enabling more accurate emotion-driven music generation and emotion transitions through feature manipulation.
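To make the described architecture concrete, the sketch below outlines one possible reading of a Gaussian mixture VAE with separate rhythm and tone latent blocks, soft emotion assignment from the latent code, and a dispersion penalty that pushes the per-emotion mixture means apart. This is a minimal, hypothetical illustration rather than the authors' implementation: all layer sizes, feature dimensions, the specific dispersion term, and the loss weights `beta` and `gamma` are assumptions made for clarity.

```python
# Hypothetical sketch of a category-dispersed Gaussian mixture VAE (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GMMVAE(nn.Module):
    def __init__(self, input_dim=128, latent_dim=16, n_emotions=4):
        super().__init__()
        # Separate encoders keep rhythm and tonal factors in disjoint latent blocks.
        self.rhythm_enc = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU(),
                                        nn.Linear(64, 2 * latent_dim))
        self.tone_enc = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU(),
                                      nn.Linear(64, 2 * latent_dim))
        # Learnable Gaussian mixture means, one component per emotion category.
        self.mix_means = nn.Parameter(torch.randn(n_emotions, 2 * latent_dim))
        self.decoder = nn.Sequential(nn.Linear(2 * latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, input_dim))

    def encode(self, x):
        mu_r, logvar_r = self.rhythm_enc(x).chunk(2, dim=-1)
        mu_t, logvar_t = self.tone_enc(x).chunk(2, dim=-1)
        return (torch.cat([mu_r, mu_t], dim=-1),
                torch.cat([logvar_r, logvar_t], dim=-1))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        # Soft assignment of the latent code to emotion components (inferred emotion).
        q_y = F.softmax(-torch.cdist(z, self.mix_means) ** 2, dim=-1)
        return self.decoder(z), mu, logvar, z, q_y


def loss_fn(model, x, x_hat, mu, logvar, q_y, y_lab=None, beta=1.0, gamma=0.1):
    """ELBO-style loss with an extra term that disperses the emotion clusters."""
    recon = F.mse_loss(x_hat, x, reduction="mean")
    # Simplified KL term against the mixture component selected by q(y|z).
    comp_mu = q_y @ model.mix_means
    kl = 0.5 * torch.mean(torch.exp(logvar) + (mu - comp_mu) ** 2 - 1.0 - logvar)
    # Semi-supervision: classification loss only on the labeled subset.
    sup = F.nll_loss(q_y.clamp_min(1e-8).log(), y_lab) if y_lab is not None else 0.0
    # Category-dispersion penalty: penalize mixture means that sit close together.
    dist = torch.cdist(model.mix_means, model.mix_means)
    dispersion = torch.exp(-dist).triu(diagonal=1).sum()
    return recon + beta * kl + sup + gamma * dispersion
```

Under this reading, emotion-driven generation amounts to sampling (or shifting) `z` toward the mixture mean of the target emotion and decoding it, while emotion transitions correspond to interpolating between mixture means in the rhythm and tone latent blocks.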
Copyright: © 2024 Ning et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.