Showing 1–15 of 15 results for author: Jou, B

  1. arXiv:2405.13762

    cs.CV cs.LG cs.MM cs.SD eess.AS

    A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation

    Authors: Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, Krishna Somandepalli

    Abstract: Training diffusion models for audiovisual sequences allows for a range of generation tasks by learning conditional distributions of various input-output combinations of the two modalities. Nevertheless, this strategy often requires training a separate model for each task which is expensive. Here, we propose a novel training approach to effectively learn arbitrary conditional distributions in the a…

    Submitted 22 May, 2024; originally announced May 2024.

  2. arXiv:2309.03978

    cs.CL cs.LG cs.SD eess.AS

    LanSER: Language-Model Supported Speech Emotion Recognition

    Authors: Taesik Gong, Josh Belanich, Krishna Somandepalli, Arsha Nagrani, Brian Eoff, Brendan Jou

    Abstract: Speech emotion recognition (SER) models typically rely on costly human-labeled data for training, making scaling methods to large speech datasets and nuanced emotion taxonomies difficult. We present LanSER, a method that enables the use of unlabeled data by inferring weak emotion labels via pre-trained large language models through weakly-supervised learning. For inferring weak labels constrained…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: Presented at INTERSPEECH 2023

    Journal ref: INTERSPEECH (2023) 2408-2412

  3. arXiv:2206.12494

    cs.SD cs.LG eess.AS

    Multitask vocal burst modeling with ResNets and pre-trained paralinguistic Conformers

    Authors: Josh Belanich, Krishna Somandepalli, Brian Eoff, Brendan Jou

    Abstract: This technical report presents the modeling approaches used in our submission to the ICML Expressive Vocalizations Workshop & Competition multitask track (ExVo-MultiTask). We first applied image classification models of various sizes on mel-spectrogram representations of the vocal bursts, as is standard in sound event detection literature. Results from these models show an increase of 21.24% over…

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: To be published in the ICML Expressive Vocalizations Workshop & Competition 2022 (https://www.competitions.hume.ai/exvo2022)

  4. arXiv:2105.15164

    cs.LG cs.AI

    DISSECT: Disentangled Simultaneous Explanations via Concept Traversals

    Authors: Asma Ghandeharioun, Been Kim, Chun-Liang Li, Brendan Jou, Brian Eoff, Rosalind W. Picard

    Abstract: Explaining deep learning model inferences is a promising venue for scientific understanding, improving safety, uncovering hidden biases, evaluating fairness, and beyond, as argued by many scholars. One of the principal benefits of counterfactual explanations is allowing users to explore "what-if" scenarios through what does not and cannot exist in the data, a quality that many other forms of expla…

    Submitted 15 March, 2022; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: Accepted for publication at ICLR 2022

  5. arXiv:2105.03014

    cs.CV

    BasisNet: Two-stage Model Synthesis for Efficient Inference

    Authors: Mingda Zhang, Chun-Te Chu, Andrey Zhmoginov, Andrew Howard, Brendan Jou, Yukun Zhu, Li Zhang, Rebecca Hwa, Adriana Kovashka

    Abstract: In this work, we present BasisNet which combines recent advancements in efficient neural network architectures, conditional computation, and early termination in a simple new form. Our approach incorporates a lightweight model to preview the input and generate input-dependent combination coefficients, which later controls the synthesis of a more accurate specialist model to make final prediction.…

    Submitted 6 May, 2021; originally announced May 2021.

    Comments: To appear, 4th Workshop on Efficient Deep Learning for Computer Vision (ECV2021), CVPR2021 Workshop

  6. arXiv:2002.05037

    cs.NI

    An Extensible Network Slicing Framework for Satellite Integration into 5G

    Authors: Youssouf Drif, Emmanuel Chaput, Emmanuel Lavinal, Pascal Berthou, Boris Tiomela Jou, Olivier Gremillet, Fabrice Arnal

    Abstract: For the past decades, networks have evolved to increase their performances, their capacities, to reduce latencies and optimize their resource management in order to remain competitive and adapted to the market. Today, the way consumers use networks has changed and more heterogeneous services with their own requirements have emerged. This has led network operators to define the network slicing para…

    Submitted 12 February, 2020; originally announced February 2020.

    Comments: 2 pages, 1 figure

    ACM Class: C.2.1

  7. arXiv:1909.09285

    cs.LG stat.ML

    Characterizing Sources of Uncertainty to Proxy Calibration and Disambiguate Annotator and Data Bias

    Authors: Asma Ghandeharioun, Brian Eoff, Brendan Jou, Rosalind W. Picard

    Abstract: Supporting model interpretability for complex phenomena where annotators can legitimately disagree, such as emotion recognition, is a challenging machine learning task. In this work, we show that explicitly quantifying the uncertainty in such settings has interpretability benefits. We use a simple modification of a classical network inference using Monte Carlo dropout to give measures of epistemic…

    Submitted 5 October, 2019; v1 submitted 19 September, 2019; originally announced September 2019.

    Comments: Accepted for presentation at 2019 ICCV Workshop on Interpreting and Explaining Visual Artificial Intelligence Models

  8. arXiv:1708.06834

    cs.AI cs.CV

    Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks

    Authors: Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, Shih-Fu Chang

    Abstract: Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often face challenges like slow inference, vanishing gradients and difficulty in capturing long term dependencies. In backpropagation through time settings, these issues are tightly coupled with the large, sequential computational graph resulting from unfol…

    Submitted 5 February, 2018; v1 submitted 22 August, 2017; originally announced August 2017.

    Comments: Accepted as conference paper at ICLR 2018

  9. arXiv:1708.06039

    cs.CV cs.AI cs.MM

    More cat than cute? Interpretable Prediction of Adjective-Noun Pairs

    Authors: Delia Fernandez, Alejandro Woodward, Victor Campos, Xavier Giro-i-Nieto, Brendan Jou, Shih-Fu Chang

    Abstract: The increasing availability of affect-rich multimedia resources has bolstered interest in understanding sentiment and emotions in and from visual content. Adjective-noun pairs (ANP) are a popular mid-level semantic construct for capturing affect via visually detectable concepts such as "cute dog" or "beautiful landscape". Current state-of-the-art methods approach ANP prediction by considering each…

    Submitted 20 August, 2017; originally announced August 2017.

    Comments: Oral paper at ACM Multimedia 2017 Workshop on Multimodal Understanding of Social, Affective and Subjective Attributes (MUSA2)

  10. arXiv:1606.02276

    cs.CL cs.CV cs.IR cs.MM

    Multilingual Visual Sentiment Concept Matching

    Authors: Nikolaos Pappas, Miriam Redi, Mercan Topkara, Brendan Jou, Hongyi Liu, Tao Chen, Shih-Fu Chang

    Abstract: The impact of culture in visual emotion perception has recently captured the attention of multimedia research. In this study, we provide powerful computational linguistics tools to explore, retrieve and browse a dataset of 16K multilingual affective visual concepts and 7.3M Flickr images. First, we design an effective crowdsourcing experiment to collect human judgements of sentiment connected…

    Submitted 7 June, 2016; originally announced June 2016.

    Journal ref: ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, pages 151-158

  11. arXiv:1605.09211

    cs.MM cs.CL cs.CV

    Going Deeper for Multilingual Visual Sentiment Detection

    Authors: Brendan Jou, Shih-Fu Chang

    Abstract: This technical report details several improvements to the visual concept detector banks built on images from the Multilingual Visual Sentiment Ontology (MVSO). The detector banks are trained to detect a total of 9,918 sentiment-biased visual concepts from six major languages: English, Spanish, Italian, French, German and Chinese. In the original MVSO release, adjective-noun pair (ANP) detectors we…

    Submitted 30 May, 2016; originally announced May 2016.

    Comments: technical report, 7 pages

  12. arXiv:1604.03489

    cs.CV cs.MM

    From Pixels to Sentiment: Fine-tuning CNNs for Visual Sentiment Prediction

    Authors: Victor Campos, Brendan Jou, Xavier Giro-i-Nieto

    Abstract: Visual multimedia have become an inseparable part of our digital social lives, and they often capture moments tied with deep affections. Automated visual sentiment analysis tools can provide a means of extracting the rich feelings and latent dispositions embedded in these media. In this work, we explore how Convolutional Neural Networks (CNNs), a now de facto computational machine learning tool pa…

    Submitted 27 January, 2017; v1 submitted 12 April, 2016; originally announced April 2016.

    Comments: Accepted for publication in Image and Vision Computing. Models and source code available at https://github.com/imatge-upc/sentiment-2016

  13. arXiv:1604.01335

    cs.CV cs.AI cs.MM

    Deep Cross Residual Learning for Multitask Visual Recognition

    Authors: Brendan Jou, Shih-Fu Chang

    Abstract: Residual learning has recently surfaced as an effective means of constructing very deep neural networks for object recognition. However, current incarnations of residual networks do not allow for the modeling and integration of complex relations between closely coupled recognition tasks or across domains. Such problems are often encountered in multimedia applications involving large-scale content…

    Submitted 19 July, 2016; v1 submitted 5 April, 2016; originally announced April 2016.

    Comments: 10 pages, 6 figures, To appear in ACM Multimedia

    ACM Class: I.2.6; I.5.1; I.5.4; H.5.1

  14. Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction

    Authors: Victor Campos, Amaia Salvador, Brendan Jou, Xavier Giró-i-Nieto

    Abstract: Visual media are powerful means of expressing emotions and sentiments. The constant generation of new content in social networks highlights the need of automated visual sentiment analysis tools. While Convolutional Neural Networks (CNNs) have established a new state-of-the-art in several vision problems, their application to the task of sentiment analysis is mostly unexplored and there are few stu…

    Submitted 24 August, 2015; v1 submitted 20 August, 2015; originally announced August 2015.

    Comments: Preprint of the paper accepted at the 1st Workshop on Affect and Sentiment in Multimedia (ASM), in ACM MultiMedia 2015. Brisbane, Australia

    ACM Class: I.2.10; H.1.2

  15. arXiv:1508.03868

    cs.MM cs.CL cs.CV cs.IR

    Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology

    Authors: Brendan Jou, Tao Chen, Nikolaos Pappas, Miriam Redi, Mercan Topkara, Shih-Fu Chang

    Abstract: Every culture and language is unique. Our work expressly focuses on the uniqueness of culture and language in relation to human affect, specifically sentiment and emotion semantics, and how they manifest in social multimedia. We develop sets of sentiment- and emotion-polarized visual concepts by adapting semantic structures called adjective-noun pairs, originally introduced by Borth et al. (2013),…

    Submitted 7 October, 2015; v1 submitted 16 August, 2015; originally announced August 2015.

    Comments: 11 pages, to appear at ACM MM'15

    ACM Class: H.1.2; H.5.1; H.5.4; I.2.10