Showing 1–24 of 24 results for author: Oore, S

  1. arXiv:2406.17229  [pdf, other]

    cs.SD cs.LG eess.AS

    Self-Supervised Embeddings for Detecting Individual Symptoms of Depression

    Authors: Sri Harsha Dumpala, Katerina Dikaios, Abraham Nunes, Frank Rudzicz, Rudolf Uher, Sageev Oore

    Abstract: Depression, a prevalent mental health disorder impacting millions globally, demands reliable assessment systems. Unlike previous studies that focus solely on either detecting depression or predicting its severity, our work identifies individual symptoms of depression while also predicting its severity using speech input. We leverage self-supervised learning (SSL)-based speech models to better util…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  2. arXiv:2406.16000  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Predicting Individual Depression Symptoms from Acoustic Features During Speech

    Authors: Sebastian Rodriguez, Sri Harsha Dumpala, Katerina Dikaios, Sheri Rempel, Rudolf Uher, Sageev Oore

    Abstract: Current automatic depression detection systems provide predictions directly without relying on the individual symptoms/items of depression as denoted in the clinical depression rating scales. In contrast, clinicians assess each item in the depression rating scale in a clinical setting, thus implicitly providing a more detailed rationale for a depression diagnosis. In this work, we make a first ste…

    Submitted 22 June, 2024; originally announced June 2024.

  3. arXiv:2406.11171  [pdf, other]

    cs.CV cs.CL cs.LG

    SUGARCREPE++ Dataset: Vision-Language Model Sensitivity to Semantic and Lexical Alterations

    Authors: Sri Harsha Dumpala, Aman Jaiswal, Chandramouli Sastry, Evangelos Milios, Sageev Oore, Hassan Sajjad

    Abstract: Despite their remarkable successes, state-of-the-art large language models (LLMs), including vision-and-language models (VLMs) and unimodal language models (ULMs), fail to understand precise semantics. For example, semantically equivalent sentences expressed using different lexical compositions elicit diverging representations. The degree of this divergence and its impact on encoded semantics is n…

    Submitted 18 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: Added the dataset link to the abstract

    MSC Class: 68T45; 68T50

    ACM Class: I.2.7; I.2.10

  4. arXiv:2406.02465  [pdf, other]

    cs.LG cs.AI cs.CV

    An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders

    Authors: Scott C. Lowe, Joakim Bruslund Haurum, Sageev Oore, Thomas B. Moeslund, Graham W. Taylor

    Abstract: Can pretrained models generalize to new datasets without any retraining? We deploy pretrained image models on datasets they were not trained for, and investigate whether their embeddings form meaningful clusters. Our suite of benchmarking experiments use encoders pretrained solely on ImageNet-1k with either supervised or self-supervised training techniques, deployed on image datasets that were not…

    Submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2404.16365  [pdf, other]

    cs.CL cs.AI

    VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations

    Authors: Sri Harsha Dumpala, Aman Jaiswal, Chandramouli Sastry, Evangelos Milios, Sageev Oore, Hassan Sajjad

    Abstract: Despite their remarkable successes, state-of-the-art language models face challenges in grasping certain important semantic details. This paper introduces the VISLA (Variance and Invariance to Semantic and Lexical Alterations) benchmark, designed to evaluate the semantic and lexical understanding of language models. VISLA presents a 3-way semantic (in)equivalence task with a triplet of sentences a…

    Submitted 25 April, 2024; originally announced April 2024.

  6. arXiv:2404.05071  [pdf, other]

    cs.LG cs.SD eess.AS

    Test-Time Training for Depression Detection

    Authors: Sri Harsha Dumpala, Chandramouli Shama Sastry, Rudolf Uher, Sageev Oore

    Abstract: Previous works on depression detection use datasets collected in similar environments to train and test the models. In practice, however, the train and test distributions cannot be guaranteed to be identical. Distribution shifts can be introduced due to variations such as recording environment (e.g., background noise) and demographics (e.g., gender, age, etc). Such distributional shifts can surpri…

    Submitted 7 April, 2024; originally announced April 2024.

  7. arXiv:2402.14285  [pdf, other]

    cs.SD cs.LG eess.AS

    Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion

    Authors: Yujia Huang, Adishree Ghatare, Yuanzhe Liu, Ziniu Hu, Qinsheng Zhang, Chandramouli S Sastry, Siddharth Gururani, Sageev Oore, Yisong Yue

    Abstract: We study the problem of symbolic music generation (e.g., generating piano rolls), with a technical focus on non-differentiable rule guidance. Musical rules are often expressed in symbolic form on note characteristics, such as note density or chord progression, many of which are non-differentiable which pose a challenge when using them for guided diffusion. We propose \oursfull (\ours), a novel gui…

    Submitted 2 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: ICML 2024 (Oral)

  8. arXiv:2402.03884  [pdf, other]

    physics.optics cond-mat.mes-hall cond-mat.mtrl-sci

    Investigation of the Nonlinear Optical Frequency Conversion in Ultrathin Franckeite Heterostructures

    Authors: Alisson R. Cadore, Alexandre S. M. V. Ore, David Steinberg, Juan D. Zapata, Eunézio A. T. de Souza, Dario A. Bahamon, Christiano J. S. de Matos

    Abstract: Layered franckeite is a natural superlattice composed of two alternating layers of different compositions, SnS$_2$- and PbS-like. This creates incommensurability between the two species along the planes of the layers, resulting in spontaneous symmetry-break periodic ripples in the $a$-axis orientation. Nevertheless, natural franckeite heterostructure has shown potential for optoelectronic a…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 27 pages, 5 figures. The following article has been accepted by the Journal of Applied Physics. After it is published, it will be found by DOI: 10.1063/5.0186615

  9. arXiv:2312.04690  [pdf, other]

    cs.HC cs.AI cs.SD eess.AS

    SynthScribe: Deep Multimodal Tools for Synthesizer Sound Retrieval and Exploration

    Authors: Stephen Brade, Bryan Wang, Mauricio Sousa, Gregory Lee Newsome, Sageev Oore, Tovi Grossman

    Abstract: Synthesizers are powerful tools that allow musicians to create dynamic and original sounds. Existing commercial interfaces for synthesizers typically require musicians to interact with complex low-level parameters or to manage large libraries of premade sounds. To address these challenges, we implement SynthScribe -- a fullstack system that uses multimodal deep learning to let users express their…

    Submitted 20 February, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

  10. arXiv:2309.10930  [pdf, other]

    cs.SD cs.LG eess.AS

    Test-Time Training for Speech

    Authors: Sri Harsha Dumpala, Chandramouli Sastry, Sageev Oore

    Abstract: In this paper, we study the application of Test-Time Training (TTT) as a solution to handling distribution shifts in speech applications. In particular, we introduce distribution-shifts to the test datasets of standard speech-classification tasks -- for example, speaker-identification and emotion-detection -- and explore how Test-Time Training (TTT) can help adjust to the distribution-shift. In ou…

    Submitted 28 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

  11. arXiv:2306.09192  [pdf, other]

    cs.CV cs.LG

    DiffAug: A Diffuse-and-Denoise Augmentation for Training Robust Classifiers

    Authors: Chandramouli Sastry, Sri Harsha Dumpala, Sageev Oore

    Abstract: We introduce DiffAug, a simple and efficient diffusion-based augmentation technique to train image classifiers for the crucial yet challenging goal of improved classifier robustness. Applying DiffAug to a given example consists of one forward-diffusion step followed by one reverse-diffusion step. Using both ResNet-50 and Vision Transformer architectures, we comprehensively evaluate classifiers tra…

    Submitted 28 May, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Shorter version of this work was accepted in the CVPR 2024 Workshop on Synthetic Data for Computer Vision

  12. arXiv:2304.09337  [pdf, other]

    cs.HC cs.AI cs.MM

    Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models

    Authors: Stephen Brade, Bryan Wang, Mauricio Sousa, Sageev Oore, Tovi Grossman

    Abstract: Text-to-image generative models have demonstrated remarkable capabilities in generating high-quality images based on textual prompts. However, crafting prompts that accurately capture the user's creative intent remains challenging. It often involves laborious trial-and-error procedures to ensure that the model interprets the prompts in alignment with the user's intention. To address the challenges…

    Submitted 18 April, 2023; originally announced April 2023.

  13. arXiv:2207.12816  [pdf, other]

    cs.CR cs.SD eess.AS

    Generative Extraction of Audio Classifiers for Speaker Identification

    Authors: Tejumade Afonja, Lucas Bourtoule, Varun Chandrasekaran, Sageev Oore, Nicolas Papernot

    Abstract: It is perhaps no longer surprising that machine learning models, especially deep neural networks, are particularly vulnerable to attacks. One such vulnerability that has been well studied is model extraction: a phenomenon in which the attacker attempts to steal a victim's model by training a surrogate model to mimic the decision boundaries of the victim model. Previous works have demonstrated the…

    Submitted 26 July, 2022; originally announced July 2022.

  14. arXiv:2202.09648  [pdf, other]

    cs.LG cs.CV eess.SP

    Echofilter: A Deep Learning Segmentation Model Improves the Automation, Standardization, and Timeliness for Post-Processing Echosounder Data in Tidal Energy Streams

    Authors: Scott C. Lowe, Louise P. McGarry, Jessica Douglas, Jason Newport, Sageev Oore, Christopher Whidden, Daniel J. Hasselman

    Abstract: Understanding the abundance and distribution of fish in tidal energy streams is important to assess risks presented by introducing tidal energy devices to the habitat. However tidal current flows suitable for tidal energy are often highly turbulent, complicating the interpretation of echosounder data. The portion of the water column contaminated by returns from entrained air must be excluded from…

    Submitted 18 August, 2022; v1 submitted 19 February, 2022; originally announced February 2022.

    Journal ref: Front. Mar. Sci. 9:867857 (2022)

  15. arXiv:2111.01742  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    LogAvgExp Provides a Principled and Performant Global Pooling Operator

    Authors: Scott C. Lowe, Thomas Trappenberg, Sageev Oore

    Abstract: We seek to improve the pooling operation in neural networks, by applying a more theoretically justified operator. We demonstrate that LogSumExp provides a natural OR operator for logits. When one corrects for the number of elements inside the pooling operator, this becomes $\text{LogAvgExp} := \log(\text{mean}(\exp(x)))$. By introducing a single temperature parameter, LogAvgExp smoothly transition…

    Submitted 2 November, 2021; originally announced November 2021.

  16. arXiv:2110.11940  [pdf, other]

    cs.LG cs.AI cs.CV

    Logical Activation Functions: Logit-space equivalents of Probabilistic Boolean Operators

    Authors: Scott C. Lowe, Robert Earle, Jason d'Eon, Thomas Trappenberg, Sageev Oore

    Abstract: The choice of activation functions and their motivation is a long-standing issue within the neural network community. Neuronal representations within artificial neural networks are commonly understood as logits, representing the log-odds score of presence of features within the stimulus. We derive logit-space operators equivalent to probabilistic Boolean logic-gates AND, OR, and XNOR for independe…

    Submitted 29 November, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

    Journal ref: Neural Information Processing Systems (2022)

  17. arXiv:2108.01043  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Musical Speech: A Transformer-based Composition Tool

    Authors: Jason d'Eon, Sri Harsha Dumpala, Chandramouli Shama Sastry, Dani Oore, Sageev Oore

    Abstract: In this paper, we propose a new compositional tool that will generate a musical outline of speech recorded/provided by the user for use as a musical building block in their compositions. The tool allows any user to use their own speech to generate musical material, while still being able to hear the direct connection between their recorded speech and the resulting music. The tool is built on our p…

    Submitted 2 August, 2021; originally announced August 2021.

    Comments: NeurIPS 2020 Demonstration Track; extended for PMLR

  18. arXiv:2107.13969  [pdf, other]

    cs.CY cs.LG cs.SD eess.AS

    Significance of Speaker Embeddings and Temporal Context for Depression Detection

    Authors: Sri Harsha Dumpala, Sebastian Rodriguez, Sheri Rempel, Rudolf Uher, Sageev Oore

    Abstract: Depression detection from speech has attracted a lot of attention in recent years. However, the significance of speaker-specific information in depression detection has not yet been explored. In this work, we analyze the significance of speaker embeddings for the task of depression detection from speech. Experimental results show that the speaker embeddings provide important cues to achieve state-…

    Submitted 24 July, 2021; originally announced July 2021.

  19. arXiv:1912.12510  [pdf, other]

    cs.LG cs.CV stat.ML

    Detecting Out-of-Distribution Examples with In-distribution Examples and Gram Matrices

    Authors: Chandramouli Shama Sastry, Sageev Oore

    Abstract: When presented with Out-of-Distribution (OOD) examples, deep neural networks yield confident, incorrect predictions. Detecting OOD examples is challenging, and the potential risks are high. In this paper, we propose to detect OOD examples by identifying inconsistencies between activity patterns and class predicted. We find that characterizing activity patterns by Gram matrices and identifying anom…

    Submitted 9 January, 2020; v1 submitted 28 December, 2019; originally announced December 2019.

    Comments: NeurIPS 2019 Workshop on Safety and Robustness in Decision Making

  20. arXiv:1907.04352  [pdf, other]

    cs.SD cs.LG eess.AS

    Exploring Conditioning for Generative Music Systems with Human-Interpretable Controls

    Authors: Nicholas Meade, Nicholas Barreyre, Scott C. Lowe, Sageev Oore

    Abstract: Performance RNN is a machine-learning system designed primarily for the generation of solo piano performances using an event-based (rather than audio) representation. More specifically, Performance RNN is a long short-term memory (LSTM) based recurrent neural network that models polyphonic music with expressive timing and dynamics (Oore et al., 2018). The neural network uses a simple language mode…

    Submitted 3 August, 2019; v1 submitted 9 July, 2019; originally announced July 2019.

    Journal ref: International Conference on Computational Creativity, 2019

  21. arXiv:1811.09620  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer

    Authors: Sicong Huang, Qiyang Li, Cem Anil, Xuchan Bao, Sageev Oore, Roger B. Grosse

    Abstract: In this work, we address the problem of musical timbre transfer, where the goal is to manipulate the timbre of a sound sample from one instrument to match another instrument while preserving other musical content, such as pitch, rhythm, and loudness. In principle, one could apply image-based style transfer techniques to a time-frequency representation of an audio signal, but this depends on having…

    Submitted 22 October, 2023; v1 submitted 22 November, 2018; originally announced November 2018.

    Comments: 17 pages, published as a conference paper at ICLR 2019

    Journal ref: ICLR 2019

  22. arXiv:1808.03715  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    This Time with Feeling: Learning Expressive Musical Performance

    Authors: Sageev Oore, Ian Simon, Sander Dieleman, Douglas Eck, Karen Simonyan

    Abstract: Music generation has generally been focused on either creating scores or interpreting them. We discuss differences between these two problems and propose that, in fact, it may be valuable to work in the space of direct $\it performance$ generation: jointly predicting the notes $\it and$ $\it also$ their expressive timing and dynamics. We consider the significance and qualities of the data set need…

    Submitted 10 August, 2018; originally announced August 2018.

    Comments: Includes links to urls for audio samples

  23. arXiv:1710.11153  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Onsets and Frames: Dual-Objective Piano Transcription

    Authors: Curtis Hawthorne, Erich Elsen, Jialin Song, Adam Roberts, Ian Simon, Colin Raffel, Jesse Engel, Sageev Oore, Douglas Eck

    Abstract: We advance the state of the art in polyphonic piano music transcription by using a deep convolutional and recurrent neural network which is trained to jointly predict onsets and frames. Our model predicts pitch onset events and then uses those predictions to condition framewise pitch predictions. During inference, we restrict the predictions from the framewise detector by not allowing a new note t…

    Submitted 5 June, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

    Comments: Examples available at https://goo.gl/magenta/onsets-frames-examples

  24. arXiv:1706.04486  [pdf, other]

    cs.SD cs.AI

    Learning and Evaluating Musical Features with Deep Autoencoders

    Authors: Mason Bretan, Sageev Oore, Doug Eck, Larry Heck

    Abstract: In this work we describe and evaluate methods to learn musical embeddings. Each embedding is a vector that represents four contiguous beats of music and is derived from a symbolic representation. We consider autoencoding-based methods including denoising autoencoders, and context reconstruction, and evaluate the resulting embeddings on a forward prediction and a classification task.

    Submitted 15 June, 2017; v1 submitted 14 June, 2017; originally announced June 2017.
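Entry 11 (DiffAug) describes an augmentation consisting of one forward-diffusion step followed by one reverse-diffusion step. The Python sketch below shows one plausible reading of that recipe. It assumes a standard DDPM-style schedule given as cumulative products $\bar\alpha_t$ (the hypothetical argument alpha_bars) and a pretrained noise-prediction model (the hypothetical callable eps_model). The single-step reverse update, which recovers an estimate of the clean input from the predicted noise, is an assumption and not the paper's confirmed procedure.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical interface: maps a noisy input x_t and timestep t to predicted noise.
NoisePredictor = Callable[[List[float], int], List[float]]


def diff_aug(
    x: Sequence[float],
    eps_model: NoisePredictor,
    alpha_bars: Sequence[float],
    rng: random.Random,
) -> List[float]:
    """Diffuse-and-denoise sketch: one forward step, then one reverse step."""
    # Pick a random diffusion timestep (assumption: uniform over the schedule).
    t = rng.randrange(len(alpha_bars))
    ab = alpha_bars[t]
    if not 0.0 < ab <= 1.0:
        raise ValueError("alpha_bar must lie in (0, 1]")
    signal, noise_scale = math.sqrt(ab), math.sqrt(1.0 - ab)

    # Forward diffusion: x_t = sqrt(ab) * x + sqrt(1 - ab) * noise.
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    x_t = [signal * xi + noise_scale * ni for xi, ni in zip(x, noise)]

    # Reverse step: estimate the clean input from the predicted noise.
    eps_hat = eps_model(x_t, t)
    return [(xt - noise_scale * e) / signal for xt, e in zip(x_t, eps_hat)]
```

With a perfect noise predictor the output equals the input exactly; a real, imperfect model would return something close to but different from x, which is what would make the result usable as an augmented training example.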
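Entry 15 defines $\text{LogAvgExp} := \log(\text{mean}(\exp(x)))$. A minimal, numerically stable Python sketch of that pooling operator follows. The temperature argument is an assumption about how a single temperature could be introduced, namely $t \log(\text{mean}(\exp(x/t)))$; with $t = 1$ it reduces to the formula in the abstract, and as a matter of standard analysis it tends to $\max(x)$ as $t \to 0^+$ and to $\text{mean}(x)$ as $t \to \infty$. The function name log_avg_exp is hypothetical.

```python
import math
from typing import Sequence


def log_avg_exp(x: Sequence[float], temperature: float = 1.0) -> float:
    """Temperature-scaled LogAvgExp: t * log(mean(exp(x / t)))."""
    if len(x) == 0:
        raise ValueError("cannot pool an empty sequence")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [v / temperature for v in x]
    m = max(scaled)
    # log(mean(exp(s))) = m + log(sum(exp(s - m))) - log(n); shifting by m avoids overflow.
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return temperature * (lse - math.log(len(x)))
```

Subtracting the maximum before exponentiating is the usual log-sum-exp trick: it leaves the result unchanged but keeps exp() from overflowing on large logits.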
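Entry 16 describes logit-space operators equivalent to probabilistic AND, OR and XNOR. The sketch below computes the exact versions directly, assuming the two inputs are logits of independent events with probabilities $p = \sigma(x)$ and $q = \sigma(y)$: AND is $\text{logit}(pq)$, OR is $\text{logit}(1 - (1-p)(1-q))$, and XNOR is $\text{logit}(pq + (1-p)(1-q))$. These are textbook probability identities, not necessarily the activation functions proposed in the paper; the function names are hypothetical, and the code is not hardened against saturation for very large $|x|$.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p) - math.log1p(-p)


def and_logit(x: float, y: float) -> float:
    # P(A and B) = p * q for independent events.
    return logit(sigmoid(x) * sigmoid(y))


def or_logit(x: float, y: float) -> float:
    # P(A or B) = 1 - (1 - p)(1 - q) for independent events.
    return logit(1.0 - (1.0 - sigmoid(x)) * (1.0 - sigmoid(y)))


def xnor_logit(x: float, y: float) -> float:
    # P(A == B) = p * q + (1 - p)(1 - q) for independent events.
    p, q = sigmoid(x), sigmoid(y)
    return logit(p * q + (1.0 - p) * (1.0 - q))
```

For example, and_logit(0.0, 0.0) is $\text{logit}(0.25) = \log(1/3)$ and or_logit(0.0, 0.0) is $\log 3$; De Morgan's law shows up as or_logit(x, y) being equal to -and_logit(-x, -y) up to rounding.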
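Entry 19 proposes detecting OOD examples by checking Gram-matrix characterizations of activity patterns against the predicted class. The Python sketch below illustrates that general idea under assumptions not stated in the listing: a layer's activations arrive as a list of channels (each a flattened spatial map), the Gram matrix is $G = F F^\top$ with only its upper triangle kept, in-distribution statistics are per-class elementwise minima and maxima, and the score sums relative deviations outside those bounds. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Bounds = Tuple[List[float], List[float]]  # (elementwise minima, elementwise maxima)


def gram_features(channels: Sequence[Sequence[float]]) -> List[float]:
    """Upper triangle of G = F F^T, where row i of F is channel i's flattened activations."""
    n = len(channels)
    return [
        sum(a * b for a, b in zip(channels[i], channels[j]))
        for i in range(n)
        for j in range(i, n)
    ]


def fit_class_bounds(train: Dict[int, List[List[float]]]) -> Dict[int, Bounds]:
    """Per-class elementwise min/max of Gram features from in-distribution data."""
    bounds: Dict[int, Bounds] = {}
    for cls, feats in train.items():
        cols = list(zip(*feats))
        bounds[cls] = ([min(c) for c in cols], [max(c) for c in cols])
    return bounds


def deviation(feats: Sequence[float], bounds: Bounds, eps: float = 1e-12) -> float:
    """Sum of relative excursions outside [min, max]; 0.0 means fully in range."""
    lo, hi = bounds
    total = 0.0
    for v, mn, mx in zip(feats, lo, hi):
        if v < mn:
            total += (mn - v) / (abs(mn) + eps)
        elif v > mx:
            total += (v - mx) / (abs(mx) + eps)
    return total
```

At test time one would score an example against the bounds of the class the network predicts, for instance deviation(gram_features(acts), bounds[predicted_class]), and flag it when the score exceeds a threshold chosen on held-out in-distribution data; acts, predicted_class and the thresholding rule are illustrative.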