Showing 1–18 of 18 results for author: Manilow, E

Searching in archive cs.
  1. arXiv:2301.12662  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    SingSong: Generating musical accompaniments from singing

    Authors: Chris Donahue, Antoine Caillon, Adam Roberts, Ethan Manilow, Philippe Esling, Andrea Agostinelli, Mauro Verzetti, Ian Simon, Olivier Pietquin, Neil Zeghidour, Jesse Engel

    Abstract: We present SingSong, a system that generates instrumental music to accompany input vocals, potentially offering musicians and non-musicians alike an intuitive new way to create music featuring their own voice. To accomplish this, we build on recent developments in musical source separation and audio generation. Specifically, we apply a state-of-the-art source separation algorithm to a large corpus…

    Submitted 29 January, 2023; originally announced January 2023.

  2. arXiv:2212.08038  [pdf, ps, other]

    cs.CY

    Redefining Relationships in Music

    Authors: Christian Detweiler, Beth Coleman, Fernando Diaz, Lieke Dom, Chris Donahue, Jesse Engel, Cheng-Zhi Anna Huang, Larry James, Ethan Manilow, Amanda McCroskery, Kyle Pedersen, Pamela Peter-Agbia, Negar Rostamzadeh, Robert Thomas, Marco Zamarato, Ben Zevenbergen

    Abstract: AI tools increasingly shape how we discover, make and experience music. While these tools can have the potential to empower creativity, they may fundamentally redefine relationships between stakeholders, to the benefit of some and the detriment of others. In this position paper, we argue that these tools will fundamentally reshape our music culture, with profound effects (for better and for worse)…

    Submitted 16 December, 2022; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: Presented at Cultures in AI/AI in Culture workshop at NeurIPS 2022

  3. arXiv:2209.14458  [pdf, other]

    cs.SD cs.IR cs.LG eess.AS

    The Chamber Ensemble Generator: Limitless High-Quality MIR Data via Generative Modeling

    Authors: Yusong Wu, Josh Gardner, Ethan Manilow, Ian Simon, Curtis Hawthorne, Jesse Engel

    Abstract: Data is the lifeblood of modern machine learning systems, including for those in Music Information Retrieval (MIR). However, MIR has long been mired by small datasets and unreliable labels. In this work, we propose to break this bottleneck using generative modeling. By pipelining a generative model of notes (Coconet trained on Bach Chorales) with a structured synthesis model of chamber ensembles (…

    Submitted 28 September, 2022; originally announced September 2022.

  4. arXiv:2208.12387  [pdf, other]

    cs.SD cs.LG eess.AS

    Music Separation Enhancement with Generative Modeling

    Authors: Noah Schaffer, Boaz Cogan, Ethan Manilow, Max Morrison, Prem Seetharaman, Bryan Pardo

    Abstract: Despite phenomenal progress in recent years, state-of-the-art music separation systems produce source estimates with significant perceptual shortcomings, such as adding extraneous noise or removing harmonics. We propose a post-processing model (the Make it Sound Good (MSG) post-processor) to enhance the output of music source separation systems. We apply our post-processing model to state-of-the-a…

    Submitted 25 August, 2022; originally announced August 2022.

    Comments: Accepted to ISMIR 2022

  5. arXiv:2206.05408  [pdf, other]

    cs.SD cs.LG eess.AS

    Multi-instrument Music Synthesis with Spectrogram Diffusion

    Authors: Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Josh Gardner, Ethan Manilow, Jesse Engel

    Abstract: An ideal music synthesizer should be both interactive and expressive, generating high-fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural synthesizers have exhibited a tradeoff between domain-specific models that offer detailed control of only specific instruments, or raw waveform models that can train on any music but with minimal control and slow generat…

    Submitted 12 December, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

  6. arXiv:2203.15140  [pdf, other]

    cs.SD eess.AS

    Improving Source Separation by Explicitly Modeling Dependencies Between Sources

    Authors: Ethan Manilow, Curtis Hawthorne, Cheng-Zhi Anna Huang, Bryan Pardo, Jesse Engel

    Abstract: We propose a new method for training a supervised source separation system that aims to learn the interdependent relationships between all combinations of sources in a mixture. Rather than independently estimating each source from a mix, we reframe the source separation problem as an Orderless Neural Autoregressive Density Estimator (NADE), and estimate each source from both the mix and a random s…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: To appear at ICASSP 2022

  7. arXiv:2112.09312  [pdf, other]

    cs.SD cs.LG eess.AS

    MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling

    Authors: Yusong Wu, Ethan Manilow, Yi Deng, Rigel Swavely, Kyle Kastner, Tim Cooijmans, Aaron Courville, Cheng-Zhi Anna Huang, Jesse Engel

    Abstract: Musical expression requires control of both what notes are played, and how they are performed. Conventional audio synthesizers provide detailed expressive controls, but at the cost of realism. Black-box neural audio synthesis and concatenative samplers can produce realistic audio, but have few mechanisms for control. In this work, we introduce MIDI-DDSP, a hierarchical model of musical instruments…

    Submitted 17 March, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted by International Conference on Learning Representations (ICLR) 2022

  8. arXiv:2111.03017  [pdf, other]

    cs.SD cs.LG eess.AS

    MT3: Multi-Task Multitrack Music Transcription

    Authors: Josh Gardner, Ian Simon, Ethan Manilow, Curtis Hawthorne, Jesse Engel

    Abstract: Automatic Music Transcription (AMT), inferring musical notes from raw audio, is a challenging task at the core of music understanding. Unlike Automatic Speech Recognition (ASR), which typically focuses on the words of a single speaker, AMT often requires transcribing multiple instruments simultaneously, all while preserving fine-scale pitch and timing information. Further, many AMT datasets are "l…

    Submitted 15 March, 2022; v1 submitted 4 November, 2021; originally announced November 2021.

    Comments: ICLR 2022 camera-ready version

  9. arXiv:2110.13323  [pdf, other]

    cs.SD cs.LG eess.AS

    Deep Learning Tools for Audacity: Helping Researchers Expand the Artist's Toolkit

    Authors: Hugo Flores Garcia, Aldo Aguilar, Ethan Manilow, Dmitry Vedenko, Bryan Pardo

    Abstract: We present a software framework that integrates neural networks into the popular open-source audio editing software, Audacity, with a minimal amount of developer effort. In this paper, we showcase some example use cases for both end-users and neural network developers. We hope that this work fosters a new level of interactivity between deep learning practitioners and end-users.

    Submitted 28 October, 2021; v1 submitted 25 October, 2021; originally announced October 2021.

  10. arXiv:2110.13071  [pdf, other]

    cs.SD cs.LG eess.AS

    Unsupervised Source Separation By Steering Pretrained Music Models

    Authors: Ethan Manilow, Patrick O'Reilly, Prem Seetharaman, Bryan Pardo

    Abstract: We showcase an unsupervised method that repurposes deep models trained for music generation and music tagging for audio source separation, without any retraining. An audio generation model is conditioned on an input mixture, producing a latent encoding of the audio used to generate audio. This generated audio is fed to a pretrained music tagger that creates source labels. The cross-entropy loss be…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022

  11. arXiv:2107.09142  [pdf, other]

    cs.SD cs.LG eess.AS

    Sequence-to-Sequence Piano Transcription with Transformers

    Authors: Curtis Hawthorne, Ian Simon, Rigel Swavely, Ethan Manilow, Jesse Engel

    Abstract: Automatic Music Transcription has seen significant progress in recent years by training custom deep neural networks on large datasets. However, these models have required extensive domain-specific design of network architectures, input/output representations, and complex decoding schemes. In this work, we show that equivalent performance can be achieved using a generic encoder-decoder Transformer…

    Submitted 19 July, 2021; originally announced July 2021.

  12. arXiv:2107.07029  [pdf, other]

    cs.SD cs.LG eess.AS

    Leveraging Hierarchical Structures for Few-Shot Musical Instrument Recognition

    Authors: Hugo Flores Garcia, Aldo Aguilar, Ethan Manilow, Bryan Pardo

    Abstract: Deep learning work on musical instrument recognition has generally focused on instrument classes for which we have abundant data. In this work, we exploit hierarchical relationships between instruments in a few-shot learning setup to enable classification of a wider set of musical instruments, given a few examples at inference. We apply a hierarchical loss function to the training of prototypical…

    Submitted 29 July, 2021; v1 submitted 14 July, 2021; originally announced July 2021.

  13. arXiv:2009.13729  [pdf, other]

    cs.SD cs.LG eess.AS

    Bespoke Neural Networks for Score-Informed Source Separation

    Authors: Ethan Manilow, Bryan Pardo

    Abstract: In this paper, we introduce a simple method that can separate arbitrary musical instruments from an audio mixture. Given an unaligned MIDI transcription for a target instrument from an input mixture, we synthesize new mixtures from the MIDI transcription that sound similar to the mixture to be separated. This lets us create a labeled training set to train a network on the specific bespoke task. Wh…

    Submitted 28 September, 2020; originally announced September 2020.

    Comments: ISMIR 2020 - Late Breaking Demo

  14. arXiv:2009.02051  [pdf, other]

    cs.SD cs.IR cs.LG eess.AS

    Towards Musically Meaningful Explanations Using Source Separation

    Authors: Verena Haunschmid, Ethan Manilow, Gerhard Widmer

    Abstract: Deep neural networks (DNNs) are successfully applied in a wide variety of music information retrieval (MIR) tasks. Such models are usually considered "black boxes", meaning that their predictions are not interpretable. Prior work on explainable models in MIR has generally used image processing tools to produce explanations for DNN predictions, but these are not necessarily musically meaningful, or…

    Submitted 4 September, 2020; originally announced September 2020.

    Comments: 6+2 pages, 4 figures; Submitted to International Society for Music Information Retrieval Conference 2020

  15. arXiv:2008.00582  [pdf, other]

    cs.SD cs.IR cs.LG eess.AS

    audioLIME: Listenable Explanations Using Source Separation

    Authors: Verena Haunschmid, Ethan Manilow, Gerhard Widmer

    Abstract: Deep neural networks (DNNs) are successfully applied in a wide variety of music information retrieval (MIR) tasks but their predictions are usually not interpretable. We propose audioLIME, a method based on Local Interpretable Model-agnostic Explanations (LIME) extended by a musical definition of locality. The perturbations used in LIME are created by switching on/off components extracted by sourc…

    Submitted 7 September, 2020; v1 submitted 2 August, 2020; originally announced August 2020.

    Comments: In The 13th International Workshop on Machine Learning and Music, ECML-PKDD 2020

  16. arXiv:1910.12621  [pdf, other]

    eess.AS cs.LG cs.SD

    Simultaneous Separation and Transcription of Mixtures with Multiple Polyphonic and Percussive Instruments

    Authors: Ethan Manilow, Prem Seetharaman, Bryan Pardo

    Abstract: We present a single deep learning architecture that can both separate an audio recording of a musical mixture into constituent single-instrument recordings and transcribe these instruments into a human-readable format at the same time, learning a shared musical representation for both tasks. This novel architecture, which we call Cerberus, builds on the Chimera network for source separation by add…

    Submitted 12 February, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: Accepted to ICASSP 2020

  17. arXiv:1909.08494  [pdf, other]

    cs.SD cs.LG eess.AS

    Cutting Music Source Separation Some Slakh: A Dataset to Study the Impact of Training Data Quality and Quantity

    Authors: Ethan Manilow, Gordon Wichern, Prem Seetharaman, Jonathan Le Roux

    Abstract: Music source separation performance has greatly improved in recent years with the advent of approaches based on deep learning. Such methods typically require large amounts of labelled training data, which in the case of music consist of mixtures and corresponding instrument stems. However, stems are unavailable for most commercial music, and only limited datasets have so far been released to the p…

    Submitted 18 September, 2019; originally announced September 2019.

    Comments: Accepted for publication at WASPAA 2019

  18. arXiv:1907.01160  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS stat.ML

    WHAM!: Extending Speech Separation to Noisy Environments

    Authors: Gordon Wichern, Joe Antognini, Michael Flynn, Licheng Richard Zhu, Emmett McQuinn, Dwight Crow, Ethan Manilow, Jonathan Le Roux

    Abstract: Recent progress in separating the speech signals from multiple overlapping speakers using a single audio channel has brought us closer to solving the cocktail party problem. However, most studies in this area use a constrained problem setup, comparing performance when speakers overlap almost completely, at artificially low sampling rates, and with no external background noise. In this paper, we st…

    Submitted 2 July, 2019; originally announced July 2019.

    Comments: Accepted for publication at Interspeech 2019