Semi-supervised emotion-driven music generation model based on category-dispersed Gaussian Mixture Variational Autoencoders

PLoS One. 2024 Dec 30;19(12):e0311541. doi: 10.1371/journal.pone.0311541. eCollection 2024.

Abstract

Existing emotion-driven music generation models rely heavily on labeled data and offer limited interpretability and controllability of emotion. To address these limitations, a semi-supervised emotion-driven music generation model based on a category-dispersed Gaussian mixture variational autoencoder is proposed. First, a controllable music generation model is introduced that disentangles and manipulates rhythm and tonal features, enabling controlled music generation. Building on this, a semi-supervised model is developed that uses a category-dispersed Gaussian mixture variational autoencoder to infer emotions from the latent representations of the rhythm and tonal features. Finally, the loss function is refined to enhance the separation between distinct emotion clusters in the latent space. Experimental results on real-world datasets show that the proposed method effectively separates music with different emotions in the latent space, strengthening the association between music and emotion. The model also successfully disentangles and separates individual musical features, enabling more accurate emotion-driven music generation and emotion transitions through feature manipulation.
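
As a rough illustration of the approach summarized above, the sketch below shows a Gaussian-mixture VAE latent space with one learnable prior component per emotion category and a semi-supervised loss that adds a classification term only for labeled batches. The module names, dimensionalities, four-class emotion set, and the omission of the categorical-prior and cluster-dispersion terms are simplifying assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch of a category-conditioned Gaussian-mixture VAE latent space,
# assuming PyTorch. All architecture details here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GMVAE(nn.Module):
    def __init__(self, input_dim=128, latent_dim=16, n_emotions=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU())
        self.mu_head = nn.Linear(64, latent_dim)        # q(z|x) mean
        self.logvar_head = nn.Linear(64, latent_dim)    # q(z|x) log-variance
        self.classifier = nn.Linear(latent_dim, n_emotions)  # q(y|z), emotion inference
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, input_dim))
        # One Gaussian prior component per emotion category: p(z|y)
        self.prior_mu = nn.Parameter(torch.randn(n_emotions, latent_dim))
        self.prior_logvar = nn.Parameter(torch.zeros(n_emotions, latent_dim))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decoder(z), mu, logvar, self.classifier(z)

def gmvae_loss(model, x, y=None):
    """ELBO-style loss; labeled batches add a supervised emotion term."""
    x_hat, mu, logvar, y_logits = model(x)
    recon = F.mse_loss(x_hat, x, reduction="mean")
    q_y = F.softmax(y_logits, dim=-1)               # inferred emotion posterior q(y|z)
    # KL(q(z|x) || p(z|y)) for every component, averaged under q(y|z)
    kl_zy = 0.5 * ((model.prior_logvar - logvar.unsqueeze(1))
                   + (logvar.exp().unsqueeze(1) + (mu.unsqueeze(1) - model.prior_mu) ** 2)
                   / model.prior_logvar.exp() - 1).sum(-1)   # shape [batch, n_emotions]
    kl = (q_y * kl_zy).sum(-1).mean()
    total = recon + kl
    if y is not None:  # semi-supervised: cross-entropy only on labeled data
        total = total + F.cross_entropy(y_logits, y)
    return total

# Example usage (random data, hypothetical shapes):
# model = GMVAE()
# x = torch.randn(8, 128)            # unlabeled batch of feature vectors
# unsup = gmvae_loss(model, x)
# y = torch.randint(0, 4, (8,))      # labeled batch with emotion indices
# sup = gmvae_loss(model, x, y)
```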

MeSH terms

  • Algorithms
  • Emotions* / physiology
  • Humans
  • Models, Theoretical
  • Music* / psychology
  • Normal Distribution

Grants and funding

The author(s) received no specific funding for this work.