
Showing 1–50 of 58 results for author: Richard, G

Searching in archive cs.
  1. arXiv:2408.03795  [pdf, other]

    cs.AI

    Frank's triangular norms in Piaget's logical proportions

    Authors: Henri Prade, Gilles Richard

    Abstract: Starting from the Boolean notion of logical proportion in Piaget's sense, which turns out to be equivalent to analogical proportion, this note proposes a definition of analogical proportion between numerical values based on triangular norms (and dual co-norms). Frank's family of triangular norms is particularly interesting from this perspective. The article concludes with a comparative discussion…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 6 pages

  2. arXiv:2407.15580  [pdf, other]

    cs.LG cs.SD eess.AS math.PR stat.ML

    Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing

    Authors: David Perera, Victor Letzelter, Théo Mariotte, Adrien Cortés, Mickael Chen, Slim Essid, Gaël Richard

    Abstract: We introduce Annealed Multiple Choice Learning (aMCL) which combines simulated annealing with MCL. MCL is a learning framework handling ambiguous tasks by predicting a small set of plausible hypotheses. These hypotheses are trained using the Winner-takes-all (WTA) scheme, which promotes the diversity of the predictions. However, this scheme may converge toward an arbitrarily suboptimal local minim…

    Submitted 22 July, 2024; originally announced July 2024.

  3. arXiv:2407.08657  [pdf, other]

    cs.SD eess.AS eess.SP

    Speech dereverberation constrained on room impulse response characteristics

    Authors: Louis Bahrman, Mathieu Fontaine, Jonathan Le Roux, Gaël Richard

    Abstract: Single-channel speech dereverberation aims at extracting a dry speech signal from a recording affected by the acoustic reflections in a room. However, most current deep learning-based approaches for speech dereverberation are not interpretable for room acoustics, and can be considered as black-box systems in that regard. In this work, we address this problem by regularizing the training loss using…

    Submitted 10 July, 2024; originally announced July 2024.

    Journal ref: INTERSPEECH, Sep 2024, Kos Island, Greece

  4. arXiv:2406.14150  [pdf, other]

    cs.LG

    Multi-modal Transfer Learning between Biological Foundation Models

    Authors: Juan Jose Garau-Luis, Patrick Bordes, Liam Gonzalez, Masa Roller, Bernardo P. de Almeida, Lorenz Hexemer, Christopher Blum, Stefan Laurent, Jan Grzegorzewski, Maren Lang, Thomas Pierrot, Guillaume Richard

    Abstract: Biological sequences encode fundamental instructions for the building blocks of life, in the form of DNA, RNA, and proteins. Modeling these sequences is key to understanding disease mechanisms and is an active research area in computational biology. Recently, Large Language Models have shown great promise in solving certain biological tasks but current approaches are limited to a single sequence moda…

    Submitted 20 June, 2024; originally announced June 2024.

    MSC Class: 68T07 (Primary)

  5. arXiv:2406.04706  [pdf, other]

    cs.LG cs.NE eess.SP math.PR stat.ML

    Winner-takes-all learners are geometry-aware conditional density estimators

    Authors: Victor Letzelter, David Perera, Cédric Rommel, Mathieu Fontaine, Slim Essid, Gael Richard, Patrick Pérez

    Abstract: Winner-takes-all training is a simple learning paradigm, which handles ambiguous tasks by predicting a set of plausible hypotheses. Recently, a connection was established between Winner-takes-all training and centroidal Voronoi tessellations, showing that, once trained, hypotheses should quantize optimally the shape of the conditional distribution to predict. However, the best use of these hypothe…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: International Conference on Machine Learning, Jul 2024, Vienna, Austria

  6. arXiv:2402.15516  [pdf, other]

    cs.SD cs.LG eess.AS eess.SP

    GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model

    Authors: Haocheng Liu, Teysir Baoueb, Mathieu Fontaine, Jonathan Le Roux, Gael Richard

    Abstract: Diffusion models are receiving growing interest for a variety of signal generation tasks such as speech or music synthesis. WaveGrad, for example, is a successful diffusion model that conditionally uses the mel spectrogram to guide a diffusion process for the generation of high-fidelity audio. However, such models face important challenges concerning the noise diffusion process for training and…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: Accepted at ICASSP 2024

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2024, Seoul, South Korea

  7. arXiv:2402.13301  [pdf, other]

    cs.SD cs.AI eess.AS

    Structure-informed Positional Encoding for Music Generation

    Authors: Manvi Agarwal, Changhong Wang, Gaël Richard

    Abstract: Music generated by deep learning methods often suffers from a lack of coherence and long-term organization. Yet, multi-scale hierarchical structure is a distinctive feature of music signals. To leverage this information, we propose a structure-informed positional encoding framework for music generation with Transformers. We design three variants in terms of absolute, relative and non-stationary po…

    Submitted 28 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2024, Seoul, South Korea

  8. arXiv:2402.01753  [pdf, other]

    cs.SD cs.LG eess.AS eess.SP

    SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis

    Authors: Teysir Baoueb, Haocheng Liu, Mathieu Fontaine, Jonathan Le Roux, Gael Richard

    Abstract: Generative adversarial network (GAN) models can synthesize high-quality audio signals while ensuring fast sample generation. However, they are difficult to train and are prone to several issues including mode collapse and divergence. In this paper, we introduce SpecDiff-GAN, a neural vocoder based on HiFi-GAN, which was initially devised for speech synthesis from mel spectrogram. In our model, the…

    Submitted 30 January, 2024; originally announced February 2024.

    Comments: Accepted at ICASSP 2024

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2024, Seoul, South Korea

  9. arXiv:2401.05064  [pdf, other]

    cs.SD cs.LG eess.AS

    Singer Identity Representation Learning using Self-Supervised Techniques

    Authors: Bernardo Torres, Stefan Lattner, Gaël Richard

    Abstract: Significant strides have been made in creating voice identity representations using speech data. However, the same level of progress has not been achieved for singing voices. To bridge this gap, we suggest a framework for training singer identity encoders to extract representations suitable for various singing-related tasks, such as singing voice similarity and synthesis. We explore different self…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Accepted at the ISMIR conference, Milan, Italy, 2023

    Journal ref: Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR 2023), Milan, Italy

  10. arXiv:2312.14507  [pdf, other]

    cs.SD cs.LG eess.AS eess.SP

    Unsupervised Harmonic Parameter Estimation Using Differentiable DSP and Spectral Optimal Transport

    Authors: Bernardo Torres, Geoffroy Peeters, Gaël Richard

    Abstract: In neural audio signal processing, pitch conditioning has been used to enhance the performance of synthesizers. However, jointly training pitch estimators and synthesizers is a challenge when using standard audio-to-audio reconstruction loss, leading to reliance on external pitch trackers. To address this issue, we propose using a spectral loss function inspired by optimal transportation theory th…

    Submitted 15 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted in ICASSP 2024

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2024, Seoul, South Korea

  11. arXiv:2311.01052  [pdf, other]

    stat.ML cs.LG

    Resilient Multiple Choice Learning: A learned scoring scheme with application to audio scene analysis

    Authors: Victor Letzelter, Mathieu Fontaine, Mickaël Chen, Patrick Pérez, Slim Essid, Gaël Richard

    Abstract: We introduce Resilient Multiple Choice Learning (rMCL), an extension of the MCL approach for conditional distribution estimation in regression settings where multiple targets may be sampled for each training input. Multiple Choice Learning is a simple framework to tackle multimodal density estimation, using the Winner-Takes-All (WTA) loss for a set of hypotheses. In regression settings, the existi…

    Submitted 16 November, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

    Journal ref: Advances in neural information processing systems, Dec 2023, New Orleans, United States

  12. arXiv:2307.10936  [pdf, other]

    cs.AI cs.LG

    PASTA: Pretrained Action-State Transformer Agents

    Authors: Raphael Boige, Yannis Flet-Berliac, Arthur Flajolet, Guillaume Richard, Thomas Pierrot

    Abstract: Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches involve pre-training transformer models on vast amounts of unlabeled data, serving as a starting point for efficiently solving downstream tasks. In reinforcement learning, researchers have recently adapted these approaches, developing models…

    Submitted 4 December, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

  13. arXiv:2307.10834  [pdf, other]

    eess.AS cs.SD

    Transfer Learning and Bias Correction with Pre-trained Audio Embeddings

    Authors: Changhong Wang, Gaël Richard, Brian McFee

    Abstract: Deep neural network models have become the dominant approach to a large variety of tasks within music information retrieval (MIR). These models generally require large amounts of (annotated) training data to achieve high accuracy. Because not all applications in MIR have sufficient quantities of training data, it is becoming increasingly common to transfer models across domains. This approach allo…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: 7 pages, 3 figures, accepted to the conference of the International Society for Music Information Retrieval (ISMIR 2023)

  14. arXiv:2306.07187  [pdf, other]

    cs.MM cs.IR cs.LG cs.SD eess.AS

    Video-to-Music Recommendation using Temporal Alignment of Segments

    Authors: Laure Prétet, Gaël Richard, Clément Souchier, Geoffroy Peeters

    Abstract: We study cross-modal recommendation of music tracks to be used as soundtracks for videos. This problem is known as the music supervision task. We build on a self-supervised system that learns a content association between music and video. In addition to the adequacy of content, adequacy of structure is crucial in music supervision to obtain relevant recommendations. We propose a novel approach to…

    Submitted 12 June, 2023; originally announced June 2023.

    Journal ref: IEEE Transactions on Multimedia, 18 February 2022

  15. Gradient-Informed Quality Diversity for the Illumination of Discrete Spaces

    Authors: Raphael Boige, Guillaume Richard, Jérémie Dona, Thomas Pierrot, Antoine Cully

    Abstract: Quality Diversity (QD) algorithms have been proposed to search for a large collection of both diverse and high-performing solutions instead of a single set of local optima. While early QD algorithms view the objective and descriptor functions as black-box functions, novel tools have been introduced to use gradient information to accelerate the search and improve overall performance of those algori…

    Submitted 13 September, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Journal ref: GECCO 2023 Proceedings of the Genetic and Evolutionary Computation Conference; Pages 119-128

  16. arXiv:2305.07132  [pdf, other]

    cs.SD cs.LG eess.AS

    Tackling Interpretability in Audio Classification Networks with Non-negative Matrix Factorization

    Authors: Jayneel Parekh, Sanjeel Parekh, Pavlo Mozharovskyi, Gaël Richard, Florence d'Alché-Buc

    Abstract: This paper tackles two major problem settings for interpretability of audio processing networks, post-hoc and by-design interpretation. For post-hoc interpretation, we aim to interpret decisions of a network in terms of high-level audio objects that are also listenable for the end-user. This is extended to present an inherently interpretable model with high performance. To this end, we propose a n…

    Submitted 11 May, 2023; originally announced May 2023.

    Comments: Under submission at IEEE/ACM TASLP. arXiv admin note: text overlap with arXiv:2202.11479

  17. The Quality-Diversity Transformer: Generating Behavior-Conditioned Trajectories with Decision Transformers

    Authors: Valentin Macé, Raphaël Boige, Felix Chalumeau, Thomas Pierrot, Guillaume Richard, Nicolas Perrin-Gilbert

    Abstract: In the context of neuroevolution, Quality-Diversity algorithms have proven effective in generating repertoires of diverse and efficient policies by relying on the definition of a behavior space. A natural goal induced by the creation of such a repertoire is trying to achieve behaviors on demand, which can be done by running the corresponding policy from the repertoire. However, in uncertain enviro…

    Submitted 13 September, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: 10+7 pages

  18. arXiv:2301.04134  [pdf, other]

    cs.LG cs.AI

    Analogical Relevance Index

    Authors: Suryani Lim, Henri Prade, Gilles Richard

    Abstract: Focusing on the most significant features of a dataset is useful both in machine learning (ML) and data mining. In ML, it can lead to higher accuracy, a faster learning process, and ultimately a simpler and more understandable model. In data mining, identifying significant features is essential not only for gaining a better understanding of the data but also for visualization. In this paper, we…

    Submitted 8 January, 2023; originally announced January 2023.

    Comments: 14 pages, 5 figures, 6 tables

  19. arXiv:2212.11717  [pdf, ps, other]

    cs.AI

    Some recent advances in reasoning based on analogical proportions

    Authors: Myriam Bounhas, Henri Prade, Gilles Richard

    Abstract: Analogical proportions compare pairs of items (a, b) and (c, d) in terms of their differences and similarities. They play a key role in the formalization of analogical inference. The paper first discusses how to improve analogical inference in terms of accuracy and in terms of computational cost. Then it indicates the potential of analogical proportions for explanation. Finally, it highlights the…

    Submitted 22 December, 2022; originally announced December 2022.

    Comments: 11 pages

  20. arXiv:2212.07531  [pdf, other]

    cs.CY

    A Reverse Engineering Education Needs Analysis Survey

    Authors: Charles R. Barone IV, Robert Serafin, Ilya Shavrov, Ibrahim Baggili, Aisha Ali-Gombe, Golden G. Richard III, Andrew Case

    Abstract: This paper presents the results of a needs analysis survey for Reverse Engineering (RE). The need for reverse engineers in digital forensics continues to grow as malware analysis becomes more complicated. The survey was created to investigate tools used in the cybersecurity industry, the methods for teaching RE and educational resources related to RE. Ninety-three (n=93) people responded to our 5…

    Submitted 14 December, 2022; originally announced December 2022.

  21. arXiv:2211.07250  [pdf, other]

    cs.SD cs.LG eess.AS

    Exploiting Device and Audio Data to Tag Music with User-Aware Listening Contexts

    Authors: Karim M. Ibrahim, Elena V. Epure, Geoffroy Peeters, Gaël Richard

    Abstract: As music has become more available, especially on music streaming platforms, people have started to have distinct preferences to fit their varying listening situations, also known as context. Hence, there has been a growing interest in considering the user's situation when recommending music to users. Previous works have proposed user-aware autotaggers to infer situation-related tags from music…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: Published in ISMIR

  22. arXiv:2203.02474  [pdf, other]

    stat.ML cs.IT cs.LG

    Rate-Distortion Theoretic Generalization Bounds for Stochastic Learning Algorithms

    Authors: Milad Sefidgaran, Amin Gohari, Gaël Richard, Umut Şimşekli

    Abstract: Understanding generalization in modern machine learning settings has been one of the major challenges in statistical learning theory. In this context, recent years have witnessed the development of various generalization bounds suggesting different complexity notions such as the mutual information between the data sample and the algorithm output, compressibility of the hypothesis space, and the fr…

    Submitted 29 June, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

    Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2022

  23. arXiv:2202.11479  [pdf, other]

    cs.SD cs.LG eess.AS

    Listen to Interpret: Post-hoc Interpretability for Audio Networks with NMF

    Authors: Jayneel Parekh, Sanjeel Parekh, Pavlo Mozharovskyi, Florence d'Alché-Buc, Gaël Richard

    Abstract: This paper tackles post-hoc interpretability for audio processing networks. Our goal is to interpret decisions of a network in terms of high-level audio objects that are also listenable for the end-user. To this end, we propose a novel interpreter design that incorporates non-negative matrix factorization (NMF). In particular, a carefully regularized interpreter module is trained to take hidden la…

    Submitted 24 October, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

    Comments: Accepted at NeurIPS 2022

  24. Multi-Objective Quality Diversity Optimization

    Authors: Thomas Pierrot, Guillaume Richard, Karim Beguir, Antoine Cully

    Abstract: In this work, we consider the problem of Quality-Diversity (QD) optimization with multiple objectives. QD algorithms have been proposed to search for a large collection of both diverse and high-performing solutions instead of a single set of local optima. Striving for diversity was shown to be useful in many industrial and robotics applications. On the other hand, most real-life problems exhibit s…

    Submitted 31 May, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

  25. arXiv:2201.09592  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Unsupervised Music Source Separation Using Differentiable Parametric Source Models

    Authors: Kilian Schulze-Forster, Gaël Richard, Liam Kelley, Clement S. J. Doire, Roland Badeau

    Abstract: Supervised deep learning approaches to underdetermined audio source separation achieve state-of-the-art performance but require a dataset of mixtures along with their corresponding isolated source signals. Such datasets can be extremely costly to obtain for musical mixtures. This raises a need for unsupervised methods. We propose a novel unsupervised model-based deep learning approach to musical s…

    Submitted 31 January, 2023; v1 submitted 24 January, 2022; originally announced January 2022.

    Comments: Revised version of the submission

  26. arXiv:2112.14072  [pdf, other]

    astro-ph.GA cs.AI

    Unsupervised Domain Adaptation for Constraining Star Formation Histories

    Authors: Sankalp Gilda, Antoine de Mathelin, Sabine Bellstedt, Guillaume Richard

    Abstract: The prevalent paradigm of machine learning today is to use past observations to predict future ones. What if, however, we are interested in knowing the past given the present? This situation is indeed one that astronomers must contend with often. To understand the formation of our universe, we must derive the time evolution of the visible mass content of galaxies. However, to observe a complete st…

    Submitted 26 August, 2022; v1 submitted 28 December, 2021; originally announced December 2021.

    Comments: Accepted for oral presentation at the 1st Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE). Journal article to follow

    Journal ref: Astronomy 2024, 3(3), 189-207

  27. arXiv:2108.01216  [pdf, other]

    cs.SD eess.AS

    DarkGAN: Exploiting Knowledge Distillation for Comprehensible Audio Synthesis with GANs

    Authors: Javier Nistal, Stefan Lattner, Gaël Richard

    Abstract: Generative Adversarial Networks (GANs) have achieved excellent audio synthesis quality in recent years. However, making them operable with semantically meaningful controls remains an open challenge. An obvious approach is to control the GAN by conditioning it on metadata contained in audio datasets. Unfortunately, audio datasets often lack the desired annotations, especially in the musical domai…

    Submitted 2 August, 2021; originally announced August 2021.

    Comments: 9 pages, 3 figures, 2 tables, accepted to ISMIR2021

    Journal ref: 22nd International Society for Music Information Retrieval (ISMIR 2021)

  28. arXiv:2108.00970  [pdf, other]

    cs.MM

    Is there a "language of music-video clips" ? A qualitative and quantitative study

    Authors: Laure Prétet, Gaël Richard, Geoffroy Peeters

    Abstract: Automatically recommending a video given a music track, or a music track given a video, has become an important asset for the audiovisual industry, whether with user-generated or professional content. While both music and video have specific temporal organizations, most current works do not consider those and only focus on globally recommending a media item. As a first step toward the improvement of these recommendation sy…

    Submitted 2 August, 2021; originally announced August 2021.

  29. arXiv:2107.03317  [pdf, other]

    cs.LG stat.ML

    Probabilistic semi-nonnegative matrix factorization: a Skellam-based framework

    Authors: Benoit Fuentes, Gaël Richard

    Abstract: We present a new probabilistic model to address semi-nonnegative matrix factorization (SNMF), called Skellam-SNMF. It is a hierarchical generative model consisting of prior components, Skellam-distributed hidden variables and observed data. Two inference algorithms are derived: an Expectation-Maximization (EM) algorithm for maximum a posteriori estimation and Variational Bayes EM (VBEM) for fu…

    Submitted 7 July, 2021; originally announced July 2021.

    Comments: Submitted for publication

  30. arXiv:2107.03049  [pdf, other]

    cs.LG

    ADAPT : Awesome Domain Adaptation Python Toolbox

    Authors: Antoine de Mathelin, Mounir Atiq, Guillaume Richard, Alejandro de la Concha, Mouad Yachouti, François Deheeger, Mathilde Mougeot, Nicolas Vayatis

    Abstract: In this paper, we introduce the ADAPT library, an open-source Python API providing the implementation of the main transfer learning and domain adaptation methods. The library is designed with a user-friendly approach to make domain adaptation accessible to a wide audience. ADAPT is compatible with scikit-learn and TensorFlow, and full documentation is available online at https://adapt-python.…

    Submitted 1 February, 2023; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: 11 pages, 6 figures

  31. arXiv:2106.03795  [pdf, other]

    stat.ML cs.LG

    Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks

    Authors: Melih Barsbey, Milad Sefidgaran, Murat A. Erdogdu, Gaël Richard, Umut Şimşekli

    Abstract: Neural network compression techniques have become increasingly popular as they can drastically reduce the storage and computation requirements for very large networks. Recent empirical studies have illustrated that even simple pruning strategies can be surprisingly effective, and several theoretical studies have shown that compressible networks (in specific senses) should achieve a low generalizat…

    Submitted 7 June, 2021; originally announced June 2021.

  32. arXiv:2105.08399  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS stat.ML

    Relative Positional Encoding for Transformers with Linear Complexity

    Authors: Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Şimşekli, Yi-Hsuan Yang, Gaël Richard

    Abstract: Recent advances in Transformer models allow for unprecedented sequence lengths, due to linear space and time complexity. In the meantime, relative positional encoding (RPE) was proposed as beneficial for classical Transformers and consists in exploiting lags instead of absolute positions for inference. Still, RPE is not available for the recent linear variants of the Transformer, because it requir…

    Submitted 10 June, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

    Comments: ICML 2021 (long talk) camera-ready. 24 pages

  33. arXiv:2105.01531  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    VQCPC-GAN: Variable-Length Adversarial Audio Synthesis Using Vector-Quantized Contrastive Predictive Coding

    Authors: Javier Nistal, Cyran Aouameur, Stefan Lattner, Gaël Richard

    Abstract: Influenced by the field of Computer Vision, Generative Adversarial Networks (GANs) are often adopted for the audio domain using fixed-size two-dimensional spectrogram representations as the "image data". However, in the (musical) audio domain, it is often desired to generate output of variable duration. This paper presents VQCPC-GAN, an adversarial framework for synthesizing variable-length audio…

    Submitted 30 July, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

    Comments: 5 pages, 1 figure, 1 table; accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

    Journal ref: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021

  34. arXiv:2104.14799  [pdf, other]

    cs.MM

    Cross-Modal Music-Video Recommendation: A Study of Design Choices

    Authors: Laure Pretet, Gael Richard, Geoffroy Peeters

    Abstract: In this work, we study music/video cross-modal recommendation, i.e. recommending a music track for a video or vice versa. We rely on a self-supervised learning paradigm to learn from a large amount of unlabelled data. More precisely, we jointly learn audio and video embeddings by using their co-occurren…

    Submitted 30 April, 2021; originally announced April 2021.

  35. arXiv:2102.05749  [pdf, ps, other]

    cs.SD cs.LG eess.AS stat.ML

    Self-Supervised VQ-VAE for One-Shot Music Style Transfer

    Authors: Ondřej Cífka, Alexey Ozerov, Umut Şimşekli, Gaël Richard

    Abstract: Neural style transfer, which applies the artistic style of one image to another, became one of the most widely showcased computer vision applications shortly after its introduction. In contrast, related tasks in the music audio domain remained, until recently, largely untackled. While several style conversion methods tailored to musical signals have been proposed, most lack the 'one-shot'…

    Submitted 10 June, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

    Comments: ICASSP 2021. Website: https://adasp.telecom-paris.fr/s/ss-vq-vae

    Journal ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (2021) 96-100

  36. arXiv:2008.12073  [pdf, other]

    eess.AS cs.SD

    DrumGAN: Synthesis of Drum Sounds With Timbral Feature Conditioning Using Generative Adversarial Networks

    Authors: J. Nistal, S. Lattner, G. Richard

    Abstract: Synthetic creation of drum sounds (e.g., in drum machines) is commonly performed using analog or digital synthesis, allowing a musician to sculpt the desired timbre by modifying various parameters. Typically, such parameters control low-level features of the sound and often have no musical meaning or perceptual correspondence. With the rise of Deep Learning, data-driven processing of audio emerges as…

    Submitted 28 June, 2022; v1 submitted 27 August, 2020; originally announced August 2020.

    Comments: 8 pages, 1 figure, 3 tables, accepted in Proc. of the 21st International Society for Music Information Retrieval (ISMIR2020)

  37. arXiv:2007.00186  [pdf, other]

    cs.DC

    The Hermes BFT for Blockchains

    Authors: Mohammad M. Jalalzai, Chen Feng, Costas Busch, Golden G. Richard III, Jianyu Niu

    Abstract: The performance of partially synchronous BFT-based consensus protocols is highly dependent on the primary node. All participant nodes in the network are blocked until they receive a proposal from the primary node to begin the consensus process. Therefore, an honest but slack node (with limited bandwidth) can adversely affect the performance when selected as primary. Hermes decreases protocol depend…

    Submitted 30 June, 2020; originally announced July 2020.

  38. arXiv:2006.09266  [pdf, other]

    eess.AS cs.SD

    Comparing Representations for Audio Synthesis Using Generative Adversarial Networks

    Authors: Javier Nistal, Stefan Lattner, Gaël Richard

    Abstract: In this paper, we compare different audio signal representations, including the raw audio waveform and a variety of time-frequency representations, for the task of audio synthesis with Generative Adversarial Networks (GANs). We conduct the experiments on a subset of the NSynth dataset. The architecture follows the benchmark Progressive Growing Wasserstein GAN. We perform experiments both in a full…

    Submitted 17 June, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: 5 pages, 1 figure, 5 tables, to be published in European Signal Processing Conference (EUSIPCO)

  39. arXiv:2006.08251  [pdf, other]

    cs.LG stat.ML

    Adversarial Weighting for Domain Adaptation in Regression

    Authors: Antoine de Mathelin, Guillaume Richard, Francois Deheeger, Mathilde Mougeot, Nicolas Vayatis

    Abstract: We present a novel instance-based approach to handle regression tasks in the context of supervised domain adaptation under an assumption of covariate shift. The approach developed in this paper is based on the assumption that the task on the target domain can be efficiently learned by adequately reweighting the source instances during the training phase. We introduce a novel formulation of the optimiz…

    Submitted 15 September, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: 8 pages, 6 figures

  40. arXiv:2005.12977  [pdf, other]

    cs.IR cs.CV cs.SD eess.AS

    Learning to rank music tracks using triplet loss

    Authors: Laure Prétet, Gaël Richard, Geoffroy Peeters

    Abstract: Most music streaming services rely on automatic recommendation algorithms to exploit their large music catalogs. These algorithms aim at retrieving a ranked list of music tracks based on their similarity with a target music track. In this work, we propose a method for direct recommendation based on the audio content without explicitly tagging the music tracks. To that aim, we propose several strat…

    Submitted 18 May, 2020; originally announced May 2020.

  41. arXiv:2005.06401  [pdf, other]

    cs.OH cs.LG stat.ML

    Dyslexia and Dysgraphia prediction: A new machine learning approach

    Authors: Gilles Richard, Mathieu Serrurier

    Abstract: Learning disabilities like dysgraphia, dyslexia, dyspraxia, etc. interfere with academic achievements but also have long-term consequences beyond the academic years. It is widely admitted that between 5% and 10% of the world population is subject to this kind of disability. For assessing such disabilities in early childhood, children have to solve a battery of tests. Human experts score these tes…

    Submitted 15 April, 2020; originally announced May 2020.

  42. arXiv:2002.03624  [pdf, other]

    stat.ML cs.LG

    Autoencoder-based time series clustering with energy applications

    Authors: Guillaume Richard, Benoît Grossin, Guillaume Germaine, Georges Hébrail, Anne de Moliner

    Abstract: Time series clustering is a challenging task due to the specific nature of the data. Classical approaches do not perform well and need to be adapted either through a new distance measure or a data transformation. In this paper we investigate the combination of a convolutional autoencoder and a k-medoids algorithm to perform time series clustering. The convolutional autoencoder allows to extract mea…

    Submitted 10 February, 2020; originally announced February 2020.

    Journal ref: Conférence sur l'Apprentissage Automatique 2018

  43. arXiv:1912.00018  [pdf, other]

    stat.ML cs.LG math.CA

    On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks

    Authors: Umut Şimşekli, Mert Gürbüzbalaban, Thanh Huy Nguyen, Gaël Richard, Levent Sagun

    Abstract: The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large data regime by assuming that the \emph{classical} central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Ga…

    Submitted 29 November, 2019; originally announced December 2019.

    Comments: 32 pages. arXiv admin note: substantial text overlap with arXiv:1901.06053

  44. arXiv:1907.02265  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Supervised Symbolic Music Style Translation Using Synthetic Data

    Authors: Ondřej Cífka, Umut Şimşekli, Gaël Richard

    Abstract: Research on style transfer and domain translation has clearly demonstrated the ability of deep learning-based algorithms to manipulate images in terms of artistic style. More recently, several attempts have been made to extend such approaches to music (both symbolic and audio) in order to enable transforming musical style in a similar manner. In this study, we focus on symbolic music with the goal…

    Submitted 4 July, 2019; originally announced July 2019.

    Comments: ISMIR 2019 camera-ready

    Journal ref: Proceedings of the 20th International Society for Music Information Retrieval Conference (2019) 588-595

  45. arXiv:1906.09069  [pdf, other]

    stat.ML cs.LG

    First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise

    Authors: Thanh Huy Nguyen, Umut Şimşekli, Mert Gürbüzbalaban, Gaël Richard

    Abstract: Stochastic gradient descent (SGD) has been widely used in machine learning due to its computational efficiency and favorable generalization properties. Recently, it has been empirically demonstrated that the gradient noise in several deep learning settings admits a non-Gaussian, heavy-tailed behavior. This suggests that the gradient noise can be modeled by using $α$-stable distributions, a family…

    Submitted 21 June, 2019; originally announced June 2019.

  46. arXiv:1903.04134  [pdf, other]

    cs.DC

    Proteus: A Scalable BFT Consesus Protocol for Blockchains

    Authors: Mohammad M. Jalalzai, Costas Busch, Golden Richard III

    Abstract: Byzantine Fault Tolerant (BFT) consensus exhibits higher throughput in comparison to Proof of Work (PoW) in blockchains. But BFT-based protocols suffer from scalability problems with respect to the number of replicas in the network. The main reason for this limitation is the quadratic message complexity of BFT protocols. Previously proposed solutions improve BFT performance for normal operation,…

    Submitted 26 April, 2019; v1 submitted 11 March, 2019; originally announced March 2019.

  47. arXiv:1901.07487  [pdf, other]

    math.OC cs.LG stat.ML

    Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization

    Authors: Thanh Huy Nguyen, Umut Şimşekli, Gaël Richard

    Abstract: Recent studies on diffusion-based sampling methods have shown that Langevin Monte Carlo (LMC) algorithms can be beneficial for non-convex optimization, and rigorous theoretical guarantees have been proven for both asymptotic and finite-time regimes. Algorithmically, LMC-based algorithms resemble the well-known gradient descent (GD) algorithm, where the GD recursion is perturbed by an additive Gaus…

    Submitted 22 January, 2019; originally announced January 2019.

  48. arXiv:1811.04000  [pdf, other]

    cs.CV cs.NE

    Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision

    Authors: Sanjeel Parekh, Alexey Ozerov, Slim Essid, Ngoc Duong, Patrick Pérez, Gaël Richard

    Abstract: We tackle the problem of audiovisual scene analysis for weakly-labeled data. To this end, we build upon our previous audiovisual representation learning framework to perform object classification in noisy acoustic environments and integrate audio source enhancement capability. This is made possible by a novel use of non-negative matrix factorization for the audio modality. Our approach is founded…

    Submitted 9 November, 2018; originally announced November 2018.

  49. arXiv:1806.02617  [pdf, other]

    stat.ML cs.LG

    Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization

    Authors: Umut Şimşekli, Çağatay Yıldız, Thanh Huy Nguyen, Gaël Richard, A. Taylan Cemgil

    Abstract: Recent studies have illustrated that stochastic gradient Markov Chain Monte Carlo techniques have a strong potential in non-convex optimization, where local and global convergence guarantees can be shown under certain conditions. Building upon this recent theory, in this study, we develop an asynchronous-parallel stochastic L-BFGS algorithm for non-convex optimization. The proposed algorithm i…

    Submitted 7 June, 2018; originally announced June 2018.

    Comments: Published in the International Conference on Machine Learning (ICML 2018)

  50. arXiv:1804.07345  [pdf, other]

    cs.CV cs.SD eess.AS

    Weakly Supervised Representation Learning for Unsynchronized Audio-Visual Events

    Authors: Sanjeel Parekh, Slim Essid, Alexey Ozerov, Ngoc Q. K. Duong, Patrick Pérez, Gaël Richard

    Abstract: Audio-visual representation learning is an important task from the perspective of designing machines with the ability to understand complex events. To this end, we propose a novel multimodal framework that instantiates multiple instance learning. We show that the learnt representations are useful for classifying events and localizing their characteristic audio-visual elements. The system is traine…

    Submitted 9 July, 2018; v1 submitted 19 April, 2018; originally announced April 2018.