Neural audio synthesis of musical notes with WaveNet autoencoders

J Engel, C Resnick, A Roberts… - International …, 2017 - proceedings.mlr.press
Generative models in vision have seen rapid progress due to algorithmic improvements and
the availability of high-quality image datasets. In this paper, we offer contributions in both …

SING: Symbol-to-instrument neural generator

A Défossez, N Zeghidour, N Usunier… - Advances in neural …, 2018 - proceedings.neurips.cc
Recent progress in deep learning for audio synthesis opens the way to models that directly
produce the waveform, shifting away from the traditional paradigm of relying on vocoders or …

WaveNet: A generative model for raw audio

A Van Den Oord, S Dieleman, H Zen… - arXiv preprint arXiv …, 2016 - academia.edu
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
The model is fully probabilistic and autoregressive, with the predictive distribution for each …

RAVE: A variational autoencoder for fast and high-quality neural audio synthesis

A Caillon, P Esling - arXiv preprint arXiv:2111.05011, 2021 - arxiv.org
Deep generative models applied to audio have improved by a large margin the
state-of-the-art in many speech and music related tasks. However, as raw waveform modelling remains …

Neural waveshaping synthesis

B Hayes, C Saitis, G Fazekas - arXiv preprint arXiv:2107.05050, 2021 - arxiv.org
We present the Neural Waveshaping Unit (NEWT): a novel, lightweight, fully causal
approach to neural audio synthesis which operates directly in the waveform domain, with an …

A generative model for raw audio using transformer architectures

P Verma, C Chafe - … Conference on Digital Audio Effects (DAFx …, 2021 - ieeexplore.ieee.org
This paper proposes a novel way of doing audio synthesis at the waveform level using
Transformer architectures. We propose a deep neural network for generating waveforms …

GANSynth: Adversarial neural audio synthesis

J Engel, KK Agrawal, S Chen, I Gulrajani… - arXiv preprint arXiv …, 2019 - arxiv.org
Efficient audio synthesis is an inherently difficult machine learning task, as human
perception is sensitive to both global structure and fine-scale waveform coherence …

Multi-instrument music synthesis with spectrogram diffusion

C Hawthorne, I Simon, A Roberts, N Zeghidour… - arXiv preprint arXiv …, 2022 - arxiv.org
An ideal music synthesizer should be both interactive and expressive, generating
high-fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural …

DDSP: Differentiable digital signal processing

J Engel, L Hantrakul, C Gu, A Roberts - arXiv preprint arXiv:2001.04643, 2020 - arxiv.org
Most generative models of audio directly generate samples in one of two domains: time or
frequency. While sufficient to express any signal, these representations are inefficient, as …

MidiNet: A convolutional generative adversarial network for symbolic-domain music generation

LC Yang, SY Chou, YH Yang - arXiv preprint arXiv:1703.10847, 2017 - arxiv.org
Most existing neural network models for music generation use recurrent neural networks.
However, the recent WaveNet model proposed by DeepMind shows that convolutional …