Showing 1–28 of 28 results for author: Esling, P

  1. arXiv:2408.09792

    cs.LG cs.SD eess.AS

    Unsupervised Composable Representations for Audio

    Authors: Giovanni Bindi, Philippe Esling

    Abstract: Current generative models are able to generate high-quality artefacts but have been shown to struggle with compositional reasoning, which can be defined as the ability to generate complex structures from simpler elements. In this paper, we focus on the problem of compositional representation learning for music data, specifically targeting the fully-unsupervised setting. We propose a simple and ext…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: ISMIR 2024

  2. arXiv:2408.00196

    cs.SD cs.LG eess.AS stat.ML

    Combining audio control and style transfer using latent diffusion

    Authors: Nils Demerlé, Philippe Esling, Guillaume Doras, David Genova

    Abstract: Deep generative models are now able to synthesize high-quality audio signals, shifting the critical aspect in their development from audio quality to control capabilities. Although text-to-music generation is getting largely adopted by the general public, explicit control and example-based style transfer are more adequate modalities to capture the intents of artists and musicians. In this paper,…

    Submitted 31 July, 2024; originally announced August 2024.

    Comments: ISMIR 2024

    Journal ref: Proceedings of the 25th Int. Society for Music Information Retrieval Conference, San Francisco, United States, 2024

  3. arXiv:2302.13542

    cs.SD cs.LG eess.AS

    Continuous descriptor-based control for deep audio synthesis

    Authors: Ninon Devis, Nils Demerlé, Sarah Nabi, David Genova, Philippe Esling

    Abstract: Despite significant advances in deep models for music generation, the use of these techniques remains restricted to expert users. Before being democratized among musicians, generative models must first provide expressive control over the generation, as this conditions the integration of deep generative models in creative workflows. In this paper, we tackle this issue by introducing a deep generati…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: ICASSP 2023

  4. arXiv:2301.12662

    cs.SD cs.AI cs.LG cs.MM eess.AS

    SingSong: Generating musical accompaniments from singing

    Authors: Chris Donahue, Antoine Caillon, Adam Roberts, Ethan Manilow, Philippe Esling, Andrea Agostinelli, Mauro Verzetti, Ian Simon, Olivier Pietquin, Neil Zeghidour, Jesse Engel

    Abstract: We present SingSong, a system that generates instrumental music to accompany input vocals, potentially offering musicians and non-musicians alike an intuitive new way to create music featuring their own voice. To accomplish this, we build on recent developments in musical source separation and audio generation. Specifically, we apply a state-of-the-art source separation algorithm to a large corpus…

    Submitted 29 January, 2023; originally announced January 2023.

  5. arXiv:2211.08861

    cs.LG stat.ML

    Creative divergent synthesis with generative models

    Authors: Axel Chemla--Romeu-Santos, Philippe Esling

    Abstract: Machine learning approaches now achieve impressive generation capabilities in numerous domains such as image, audio or video. However, most training & evaluation frameworks revolve around the idea of strictly modelling the original data distribution rather than trying to extrapolate from it. This precludes the ability of such models to diverge from the original distribution and, hence, exhibit so…

    Submitted 16 November, 2022; originally announced November 2022.

  6. arXiv:2211.08856

    stat.ML cs.LG stat.AP

    Challenges in creative generative models for music: a divergence maximization perspective

    Authors: Axel Chemla--Romeu-Santos, Philippe Esling

    Abstract: The development of generative Machine Learning (ML) models in creative practices, enabled by the recent improvements in usability and availability of pre-trained models, is raising more and more interest among artists, practitioners and performers. Yet, the introduction of such techniques in artistic domains also revealed multiple limitations that escape current evaluation methods used by scientis…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: to be published in AI Music Creativity Conference proceedings (AIMC2022)

  7. arXiv:2204.07064

    cs.SD cs.LG eess.AS stat.ML

    Streamable Neural Audio Synthesis With Non-Causal Convolutions

    Authors: Antoine Caillon, Philippe Esling

    Abstract: Deep learning models are mostly used in an offline inference fashion. However, this strongly limits the use of these models inside audio generation setups, as most creative workflows are based on real-time digital signal processing. Although approaches based on recurrent networks can be naturally adapted to this buffer-based computation, the use of convolutions still poses some serious challenges.…

    Submitted 14 April, 2022; originally announced April 2022.

  8. arXiv:2203.03022

    cs.SD cs.AI cs.LG eess.AS stat.ML

    HEAR: Holistic Evaluation of Audio Representations

    Authors: Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk

    Abstract: What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. HEAR evaluates audio representations using a benchmark suite across a variety of domains, in…

    Submitted 29 May, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

    Comments: to appear in Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track

  9. arXiv:2111.05011

    cs.LG cs.SD eess.AS

    RAVE: A variational autoencoder for fast and high-quality neural audio synthesis

    Authors: Antoine Caillon, Philippe Esling

    Abstract: Deep generative models applied to audio have improved by a large margin the state-of-the-art in many speech and music related tasks. However, as raw waveform modelling remains an inherently difficult task, audio generative models are either computationally intensive, rely on low sampling rates, are complicated to control or restrict the nature of possible signals. Among those models, Variational A…

    Submitted 15 December, 2021; v1 submitted 9 November, 2021; originally announced November 2021.

  10. arXiv:2109.03454

    cs.LG cs.SD eess.AS

    Signal-domain representation of symbolic music for learning embedding spaces

    Authors: Mathieu Prang, Philippe Esling

    Abstract: A key aspect of machine learning models lies in their ability to learn efficient intermediate features. However, the input representation plays a crucial role in this process, and polyphonic musical scores remain a particularly complex type of information. In this paper, we introduce a novel representation of symbolic music data, which transforms a polyphonic score into a continuous signal. We eva…

    Submitted 8 September, 2021; originally announced September 2021.

    Journal ref: The 2020 Joint Conference on AI Music Creativity, Oct 2020, Stockholm, Sweden

  11. arXiv:2107.02621

    cs.LG cs.SD eess.AS

    Energy Consumption of Deep Generative Audio Models

    Authors: Constance Douwes, Philippe Esling, Jean-Pierre Briot

    Abstract: In most scientific domains, the deep learning community has largely focused on the quality of deep generative models, resulting in highly accurate and successful solutions. However, this race for quality comes at a tremendous computational cost, which incurs vast energy consumption and greenhouse gas emissions. At the heart of this problem are the measures that we use as a scientific community to…

    Submitted 13 October, 2021; v1 submitted 6 July, 2021; originally announced July 2021.

    Comments: 5 pages, 2 figures, ICASSP 2022

  12. arXiv:2104.07519

    cs.SD cs.AI cs.HC eess.AS

    Spectrogram Inpainting for Interactive Generation of Instrument Sounds

    Authors: Théis Bazin, Gaëtan Hadjeres, Philippe Esling, Mikhail Malt

    Abstract: Modern approaches to sound synthesis using deep neural networks are hard to control, especially when fine-grained conditioning information is not available, hindering their adoption by musicians. In this paper, we cast the generation of individual instrumental notes as an inpainting-based task, introducing novel and unique ways to iteratively shape sounds. To this end, we propose a two-step appr…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: 8 pages + references + appendices. 4 figures. Published as a conference paper at the 2020 Joint Conference on AI Music Creativity, October 19-23, 2020, organized and hosted virtually by the Royal Institute of Technology (KTH), Stockholm, Sweden

    Journal ref: Proceedings of the 1st Joint Conference on AI Music Creativity, 2020 (p. 10). Stockholm, Sweden: AIMC

  13. arXiv:2008.05959

    cs.CY cs.AI cs.HC cs.LG

    Creativity in the era of artificial intelligence

    Authors: Philippe Esling, Ninon Devis

    Abstract: Creativity is a deeply debated topic, as this concept is arguably quintessential to our humanity. Across different epochs, it has been infused with an extensive variety of meanings relevant to that era. Along these, the evolution of technology has provided a plurality of novel tools for creative purposes. Recently, the advent of Artificial Intelligence (AI), through deep learning approaches, has…

    Submitted 13 August, 2020; originally announced August 2020.

    Comments: Keynote paper - JIM Conference 2020 - 12 pages

  14. arXiv:2008.01393

    cs.SD cs.LG eess.AS

    Neural Granular Sound Synthesis

    Authors: Adrien Bitton, Philippe Esling, Tatsuya Harada

    Abstract: Granular sound synthesis is a popular audio generation technique based on rearranging sequences of small waveform windows. In order to control the synthesis, all grains in a given corpus are analyzed through a set of acoustic descriptors. This provides a representation reflecting some form of local similarities across the grains. However, the quality of this grain space is bound by that of the des…

    Submitted 3 July, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

    Comments: presented for ICMC 2021 (2020 postponed)

  15. arXiv:2008.01370

    cs.SD cs.LG eess.AS

    Timbre latent space: exploration and creative aspects

    Authors: Antoine Caillon, Adrien Bitton, Brice Gatinet, Philippe Esling

    Abstract: Recent studies show the ability of unsupervised models to learn invertible audio representations using Auto-Encoders. They enable high-quality sound synthesis but a limited control since the latent spaces do not disentangle timbre properties. The emergence of disentangled representations was studied in Variational Auto-Encoders (VAEs), and has been applied to audio. Using an additional perceptual…

    Submitted 17 August, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

  16. arXiv:2007.16187

    cs.LG cs.IR cs.MM cs.SD eess.AS stat.ML

    Ultra-light deep MIR by trimming lottery tickets

    Authors: Philippe Esling, Theis Bazin, Adrien Bitton, Tristan Carsault, Ninon Devis

    Abstract: Current state-of-the-art results in Music Information Retrieval are largely dominated by deep learning approaches. These provide unprecedented accuracy across all tasks. However, the consistently overlooked downside of these models is their stunningly massive complexity, which seems concomitantly crucial to their success. In this paper, we address this issue by proposing a model pruning method bas…

    Submitted 31 July, 2020; originally announced July 2020.

    Comments: 8 pages, 2 figures. 21st International Society for Music Information Retrieval Conference 11-15 October 2020, Montreal, Canada

  17. arXiv:2007.16170

    cs.LG cs.MM cs.SD eess.AS stat.ML

    Diet deep generative audio models with structured lottery

    Authors: Philippe Esling, Ninon Devis, Adrien Bitton, Antoine Caillon, Axel Chemla--Romeu-Santos, Constance Douwes

    Abstract: Deep learning models have provided extremely successful solutions in most audio application fields. However, the high accuracy of these models comes at the expense of a tremendous computation cost. This aspect is almost always overlooked in evaluating the quality of proposed models. However, models should not be evaluated without taking into account their complexity. This aspect is especially crit…

    Submitted 31 July, 2020; originally announced July 2020.

    Comments: 8 pages, 5 figures. Proceedings of the 23rd International Conference on Digital Audio Effects (DAFx-20), Vienna, Austria, September 8-12, 2020

  18. arXiv:2007.06349

    eess.AS cs.LG

    Vector-Quantized Timbre Representation

    Authors: Adrien Bitton, Philippe Esling, Tatsuya Harada

    Abstract: Timbre is a set of perceptual attributes that identifies different types of sound sources. Although its definition is usually elusive, it can be seen from a signal processing viewpoint as all the spectral features that are perceived independently from pitch and loudness. Some works have studied high-level timbre synthesis by analyzing the feature relationships of different instruments, but acousti…

    Submitted 13 July, 2020; originally announced July 2020.

  19. arXiv:2002.03862

    stat.ML cs.LG

    Cross-modal variational inference for bijective signal-symbol translation

    Authors: Axel Chemla--Romeu-Santos, Stavros Ntalampiras, Philippe Esling, Goffredo Haus, Gérard Assayag

    Abstract: Extraction of symbolic information from signals is an active field of research enabling numerous applications especially in the Musical Information Retrieval domain. This complex task, that is also related to other topics such as pitch extraction or instrument recognition, is a demanding subject that gave birth to numerous approaches, mostly based on advanced signal processing-based algorithms. Ho…

    Submitted 10 February, 2020; originally announced February 2020.

    Comments: Proceedings of the 22nd International Conference on Digital Audio Effects (DAFx-2019)

  20. arXiv:1911.04973

    cs.SD cs.IR cs.LG eess.AS

    Using musical relationships between chord labels in automatic chord extraction tasks

    Authors: Tristan Carsault, Jérôme Nika, Philippe Esling

    Abstract: Recent research on Automatic Chord Extraction (ACE) has focused on the improvement of models based on machine learning. However, most models still fail to take into account the prior knowledge underlying the labeling alphabets (chord labels). Furthermore, recent works have shown that ACE performances are converging towards a glass ceiling. Therefore, this prompts the need to focus on other aspe…

    Submitted 14 November, 2019; v1 submitted 12 November, 2019; originally announced November 2019.

    Comments: Accepted for publication in ISMIR, 2018

  21. arXiv:1911.04972

    cs.LG cs.SD eess.AS stat.ML

    Multi-Step Chord Sequence Prediction Based on Aggregated Multi-Scale Encoder-Decoder Network

    Authors: Tristan Carsault, Andrew McLeod, Philippe Esling, Jérôme Nika, Eita Nakamura, Kazuyoshi Yoshii

    Abstract: This paper studies the prediction of chord progressions for jazz music by relying on machine learning models. The motivation of our study comes from the recent success of neural networks for performing automatic music composition. Although high accuracies are obtained in single-step prediction scenarios, most models fail to generate accurate multi-step chord predictions. In this paper, we postulat…

    Submitted 12 November, 2019; originally announced November 2019.

    Comments: Accepted for publication in MLSP, 2019

  22. arXiv:1907.02637

    cs.SD cs.LG eess.AS

    Neural Drum Machine : An Interactive System for Real-time Synthesis of Drum Sounds

    Authors: Cyran Aouameur, Philippe Esling, Gaëtan Hadjeres

    Abstract: In this work, we introduce a system for real-time generation of drum sounds. This system is composed of two parts: a generative model for drum sounds together with a Max4Live plugin providing intuitive controls on the generative process. The generative model consists of a Conditional Wasserstein autoencoder (CWAE), which learns to generate Mel-scaled magnitude spectrograms of short percussion samp…

    Submitted 13 November, 2019; v1 submitted 4 July, 2019; originally announced July 2019.

    Comments: 8 pages, accepted at the International Conference on Computational Creativity 2019

    MSC Class: 68T99

  23. arXiv:1907.00971

    cs.LG cs.HC cs.MM cs.SD eess.AS stat.ML

    Universal audio synthesizer control with normalizing flows

    Authors: Philippe Esling, Naotake Masuda, Adrien Bardet, Romeo Despres, Axel Chemla--Romeu-Santos

    Abstract: The ubiquity of sound synthesizers has reshaped music production and even entirely defined new music genres. However, the increasing complexity and number of parameters in modern synthesizers make them harder to master. Hence, the development of methods allowing to easily create and explore with synthesizers is a crucial need. Here, we introduce a novel formulation of audio synthesizer control. We…

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: DaFX 2019

  24. arXiv:1904.06215

    cs.SD cs.LG eess.AS

    Assisted Sound Sample Generation with Musical Conditioning in Adversarial Auto-Encoders

    Authors: Adrien Bitton, Philippe Esling, Antoine Caillon, Martin Fouilleul

    Abstract: Generative models have thrived in computer vision, enabling unprecedented image processes. Yet the results in audio remain less advanced. Our project targets real-time sound synthesis from a reduced set of high-level parameters, including semantic controls that can be adapted to different sound libraries and specific tags. These generative variables should allow expressive modulations of target mu…

    Submitted 22 June, 2019; v1 submitted 12 April, 2019; originally announced April 2019.

    Comments: this article has been accepted for presentation to the 22nd International Conference on Digital Audio Effects (DAFx 2019) ; we provide additional content on this companion repository https://github.com/acids-ircam/Expressive_WAE_FADER

  25. arXiv:1810.08611

    cs.SD cs.LG eess.AS

    A database linking piano and orchestral MIDI scores with application to automatic projective orchestration

    Authors: Léopold Crestel, Philippe Esling, Lena Heng, Stephen McAdams

    Abstract: This article introduces the Projective Orchestral Database (POD), a collection of MIDI scores composed of pairs linking piano scores to their corresponding orchestrations. To the best of our knowledge, this is the first database of its kind, which performs piano or orchestral prediction, but more importantly which tries to learn the correlations between piano and orchestral scores. Hence, we also…

    Submitted 19 October, 2018; originally announced October 2018.

  26. arXiv:1810.00222

    cs.SD eess.AS

    Modulated Variational auto-Encoders for many-to-many musical timbre transfer

    Authors: Adrien Bitton, Philippe Esling, Axel Chemla-Romeu-Santos

    Abstract: Generative models have been successfully applied to image style transfer and domain translation. However, there is still a wide gap in the quality of results when learning such tasks on musical audio. Furthermore, most translation models only enable one-to-one or one-to-many transfer by relying on separate encoders or decoders and complex, computationally-heavy models. In this paper, we introduce…

    Submitted 29 September, 2018; originally announced October 2018.

  27. arXiv:1805.08501

    cs.SD eess.AS

    Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics

    Authors: Philippe Esling, Axel Chemla--Romeu-Santos, Adrien Bitton

    Abstract: Timbre spaces have been used in music perception to study the perceptual relationships between instruments based on dissimilarity ratings. However, these spaces do not generalize to novel examples and do not provide an invertible mapping, preventing audio synthesis. In parallel, generative models have aimed to provide methods for synthesizing novel timbres. However, these systems do not provide an…

    Submitted 1 October, 2018; v1 submitted 22 May, 2018; originally announced May 2018.

    Comments: Digital Audio Conference (DaFX 2018)

  28. arXiv:1609.01203

    cs.LG

    Live Orchestral Piano, a system for real-time orchestral music generation

    Authors: Léopold Crestel, Philippe Esling

    Abstract: This paper introduces the first system for performing automatic orchestration based on a real-time piano input. We believe that it is possible to learn the underlying regularities existing between piano scores and their orchestrations by renowned composers, in order to automatically perform this task on novel piano inputs. To that end, we investigate a class of statistical inference models called…

    Submitted 18 May, 2017; v1 submitted 5 September, 2016; originally announced September 2016.