Showing 1–33 of 33 results for author: Cardinael, P

  1. arXiv:2404.02707  [pdf]

    physics.app-ph

    AlN/Si interface engineering to mitigate RF losses in MOCVD grown GaN-on-Si substrates

    Authors: Pieter Cardinael, Sachin Yadav, Herwig Hahn, Ming Zhao, Sourish Banerjee, Babak Kazemi Esfeh, Christof Mauder, Barry O Sullivan, Uthayasankaran Peralagu, Anurag Vohra, Robert Langer, Nadine Collaert, Bertrand Parvais, Jean-Pierre Raskin

    Abstract: Fabrication of low-RF loss GaN-on-Si HEMT stacks is critical to enable competitive front-end-modules for 5G and 6G applications. The main contribution to RF losses is the interface between the III-N layer and the HR Si wafer, more specifically the AlN/Si interface. At this interface, a parasitic surface conduction layer exists in Si, which decreases the substrate effective resistivity sensed by ov…

    Submitted 13 August, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: The following article has been accepted for publication in Applied Physics Letters. After it is published, it will be found at https://pubs.aip.org/aip/apl

  2. arXiv:2304.07958  [pdf, other]

    cs.CV cs.SD eess.AS

    Recursive Joint Attention for Audio-Visual Fusion in Regression based Emotion Recognition

    Authors: R Gnana Praveen, Eric Granger, Patrick Cardinal

    Abstract: In video-based emotion recognition (ER), it is important to effectively leverage the complementary relationship among audio (A) and visual (V) modalities, while retaining the intra-modal characteristics of individual modalities. In this paper, a recursive joint attention model is proposed along with long short-term memory (LSTM) modules for the fusion of vocal and facial expressions in regression-…

    Submitted 16 April, 2023; originally announced April 2023.

  3. arXiv:2209.09068  [pdf, other]

    cs.CV

    Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention

    Authors: R Gnana Praveen, Eric Granger, Patrick Cardinal

    Abstract: Automatic emotion recognition (ER) has recently gained a lot of interest due to its potential in many real-world applications. In this context, multimodal approaches have been shown to improve performance (over unimodal approaches) by combining diverse and complementary sources of information, providing some robustness to noisy and missing modalities. In this paper, we focus on dimensional ER based…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2203.14779, arXiv:2111.05222

  4. arXiv:2207.06858  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    RSD-GAN: Regularized Sobolev Defense GAN Against Speech-to-Text Adversarial Attacks

    Authors: Mohammad Esmaeilpour, Nourhene Chaalia, Patrick Cardinal

    Abstract: This paper introduces a new synthesis-based defense algorithm for counteracting a variety of adversarial attacks developed for challenging the performance of cutting-edge speech-to-text transcription systems. Our algorithm implements a Sobolev-based GAN and proposes a novel regularizer for effectively controlling the functionality of the entire generative model, particularly the di…

    Submitted 24 September, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: Paper ACCEPTED FOR PUBLICATION IEEE Signal Processing Letters Journal

  5. arXiv:2205.11693  [pdf, other]

    cs.LG cs.AI cs.DB

    RCC-GAN: Regularized Compound Conditional GAN for Large-Scale Tabular Data Synthesis

    Authors: Mohammad Esmaeilpour, Nourhene Chaalia, Adel Abusitta, Francois-Xavier Devailly, Wissem Maazoun, Patrick Cardinal

    Abstract: This paper introduces a novel generative adversarial network (GAN) for synthesizing large-scale tabular databases which contain various features such as continuous, discrete, and binary. Technically, our GAN belongs to the category of class-conditioned generative models with a predefined conditional vector. However, we propose a new formulation for deriving such a vector incorporating both binary…

    Submitted 23 May, 2022; originally announced May 2022.

    Comments: Paper submitted to IEEE Transactions on Neural Networks and Learning Systems

  6. arXiv:2204.12622  [pdf, other]

    cs.SD cs.CR eess.AS

    Named Entity Recognition for Audio De-Identification

    Authors: Guillaume Baril, Patrick Cardinal, Alessandro Lameiras Koerich

    Abstract: Data anonymization is often a task carried out by humans. Automating it would reduce the cost and time required to complete this task. This paper presents a pipeline to automate the anonymization of audio data in French. We propose a pipeline, which takes audio files with their transcriptions and removes the named entities (NEs) present in the audio. Our pipeline is made up of a forced aligner, wh…

    Submitted 26 April, 2022; originally announced April 2022.

    Comments: 8 pages

  7. arXiv:2204.07018  [pdf, other]

    cs.SD cs.CR cs.CV cs.LG eess.AS

    From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks

    Authors: Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich

    Abstract: This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network, namely ResNet-18. Our main motivation for focusing on such a front-end classifier rather than other complex architectures is balancing recognition accuracy and the total number…

    Submitted 14 April, 2022; originally announced April 2022.

    Comments: 32 pages, Preprint Submitted to Journal of Applied Acoustics. arXiv admin note: substantial text overlap with arXiv:2007.13703

  8. arXiv:2203.14779  [pdf, other]

    cs.CV cs.HC cs.SD eess.AS

    A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition

    Authors: R. Gnana Praveen, Wheidima Carneiro de Melo, Nasib Ullah, Haseeb Aslam, Osama Zeeshan, Théo Denorme, Marco Pedersoli, Alessandro Koerich, Simon Bacon, Patrick Cardinal, Eric Granger

    Abstract: Multimodal emotion recognition has recently gained much attention since it can leverage diverse and complementary relationships over multiple modalities (e.g., audio, visual, biosignals, etc.), and can provide some robustness to noisy modalities. Most state-of-the-art methods for audio-visual (A-V) fusion rely on recurrent networks or conventional attention mechanisms that do not effectively lever…

    Submitted 6 July, 2024; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: arXiv admin note: text overlap with arXiv:2111.05222

  9. arXiv:2111.06549  [pdf, other]

    cs.LG

    Bi-Discriminator Class-Conditional Tabular GAN

    Authors: Mohammad Esmaeilpour, Nourhene Chaalia, Adel Abusitta, Francois-Xavier Devailly, Wissem Maazoun, Patrick Cardinal

    Abstract: This paper introduces a bi-discriminator GAN for synthesizing tabular datasets containing continuous, binary, and discrete columns. Our proposed approach employs an adapted preprocessing scheme and a novel conditional term for the generator network to more effectively capture the input sample distributions. Additionally, we implement straightforward yet effective architectures for discriminator ne…

    Submitted 2 December, 2021; v1 submitted 11 November, 2021; originally announced November 2021.

    Comments: Submitted to Elsevier Pattern Recognition Letters

  10. arXiv:2111.05222  [pdf, other]

    cs.CV cs.SD eess.AS eess.IV

    Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition

    Authors: R. Gnana Praveen, Eric Granger, Patrick Cardinal

    Abstract: Multimodal analysis has recently drawn much interest in affective computing, since it can improve the overall accuracy of emotion recognition over isolated uni-modal approaches. The most effective techniques for multimodal emotion recognition efficiently leverage diverse and complementary sources of information, such as facial, vocal, and physiological modalities, to provide comprehensive feature…

    Submitted 6 July, 2024; v1 submitted 9 November, 2021; originally announced November 2021.

    Comments: Accepted in FG2021

  11. arXiv:2103.14717  [pdf, other]

    cs.SD cs.CR eess.AS

    Cyclic Defense GAN Against Speech Adversarial Attacks

    Authors: Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich

    Abstract: This paper proposes a new defense approach for counteracting state-of-the-art white and black-box adversarial attack algorithms. Our approach fits into the implicit reactive defense algorithm category since it does not directly manipulate the potentially malicious input signals. Instead, it reconstructs a similar signal with a synthesized spectrogram using a cyclic generative adversarial network.…

    Submitted 22 August, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

    Comments: 5

    Journal ref: IEEE Signal Processing Letters (2021) 1-5

  12. arXiv:2103.08095  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Towards Robust Speech-to-Text Adversarial Attack

    Authors: Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich

    Abstract: This paper introduces a novel adversarial algorithm for attacking the state-of-the-art speech-to-text systems, namely DeepSpeech, Kaldi, and Lingvo. Our approach is based on developing an extension for the conventional distortion condition of the adversarial optimization formulation using the Cramér integral probability metric. Minimizing over this metric, which measures the discrepancies between…

    Submitted 14 March, 2021; originally announced March 2021.

    Comments: 5 pages

  13. arXiv:2103.08086  [pdf, other]

    cs.SD cs.LG eess.AS

    Multi-Discriminator Sobolev Defense-GAN Against Adversarial Attacks for End-to-End Speech Systems

    Authors: Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich

    Abstract: This paper introduces a defense approach against end-to-end adversarial attacks developed for cutting-edge speech-to-text systems. The proposed defense algorithm has four major steps. First, we represent speech signals with 2D spectrograms using the short-time Fourier transform. Second, we iteratively find a safe vector using a spectrogram subspace projection operation. This operation minimizes th…

    Submitted 14 March, 2021; originally announced March 2021.

    Comments: 10 pages

  14. arXiv:2101.09858  [pdf, other]

    cs.CV

    Weakly Supervised Learning for Facial Behavior Analysis : A Review

    Authors: R. Gnana Praveen, Eric Granger, Patrick Cardinal

    Abstract: In recent years, there has been a shift in facial behavior analysis from laboratory-controlled conditions to challenging in-the-wild conditions due to the superior performance of deep learning based approaches for many real-world applications. However, the performance of deep learning approaches relies on the amount of training data. One of the major problems with data acquisition is th…

    Submitted 6 July, 2024; v1 submitted 24 January, 2021; originally announced January 2021.

  15. arXiv:2010.15675   

    cs.CV

    Deep DA for Ordinal Regression of Pain Intensity Estimation Using Weakly-Labeled Videos

    Authors: Gnana Praveen R, Eric Granger, Patrick Cardinal

    Abstract: Automatic estimation of pain intensity from facial expressions in videos has an immense potential in health care applications. However, domain adaptation (DA) is needed to alleviate the problem of domain shifts that typically occurs between video data captured in source and target domains. Given the laborious task of collecting and annotating videos, and the subjective bias due to ambiguity among…

    Submitted 11 September, 2023; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: due to multiple submission

  16. arXiv:2010.11352  [pdf, other]

    cs.SD cs.CR cs.CV cs.LG eess.AS

    Class-Conditional Defense GAN Against End-to-End Speech Attacks

    Authors: Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich

    Abstract: In this paper we propose a novel defense approach against end-to-end adversarial attacks developed to fool advanced speech-to-text systems such as DeepSpeech and Lingvo. Unlike conventional defense approaches, the proposed approach does not directly employ low-level transformations such as autoencoding a given input signal aiming at removing potential adversarial perturbation. Instead of that, we…

    Submitted 19 February, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: 5 pages

    Journal ref: 46th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP), 2021

  17. arXiv:2010.05844  [pdf, other]

    cs.SD cs.LG eess.AS

    Conditioning Trick for Training Stable GANs

    Authors: Mohammad Esmaeilpour, Raymel Alfonso Sallo, Olivier St-Georges, Patrick Cardinal, Alessandro Lameiras Koerich

    Abstract: In this paper we propose a conditioning trick, called difference departure from normality, applied on the generator network in response to instability issues during GAN training. We force the generator to get closer to the departure from normality function of real samples computed in the spectral domain of Schur decomposition. This binding makes the generator amenable to truncation and does not li…

    Submitted 12 October, 2020; originally announced October 2020.

  18. arXiv:2008.11618  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Adversarially Training for Audio Classifiers

    Authors: Raymel Alfonso Sallo, Mohammad Esmaeilpour, Patrick Cardinal

    Abstract: In this paper, we investigate the potential effect of adversarial training on the robustness of six advanced deep neural networks against a variety of targeted and non-targeted adversarial attacks. We first show that the ResNet-56 model trained on the 2D representation of the discrete wavelet transform appended with the tonnetz chromagram outperforms other models in terms of recognition a…

    Submitted 25 October, 2020; v1 submitted 26 August, 2020; originally announced August 2020.

    Comments: Paper accepted to International Conference on Pattern Recognition (ICPR) 2020

  19. arXiv:2008.06392  [pdf, other]

    cs.CV

    Deep Domain Adaptation for Ordinal Regression of Pain Intensity Estimation Using Weakly-Labelled Videos

    Authors: R. Gnana Praveen, Eric Granger, Patrick Cardinal

    Abstract: Estimation of pain intensity from facial expressions captured in videos has an immense potential for health care applications. Given the challenges related to subjective variations of facial expressions, and operational capture conditions, the accuracy of state-of-the-art DL models for recognizing facial expressions may decline. Domain adaptation has been widely explored to alleviate the problem o…

    Submitted 6 July, 2024; v1 submitted 13 August, 2020; originally announced August 2020.

    Comments: Under review for a journal. arXiv admin note: text overlap with arXiv:1910.08173

  20. arXiv:2008.05454  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Improving Stability of LS-GANs for Audio and Speech Signals

    Authors: Mohammad Esmaeilpour, Raymel Alfonso Sallo, Olivier St-Georges, Patrick Cardinal, Alessandro Lameiras Koerich

    Abstract: In this paper we address the instability issue of generative adversarial network (GAN) by proposing a new similarity metric in unitary space of Schur decomposition for 2D representations of audio and speech signals. We show that encoding departure from normality computed in this vector space into the generator optimization formulation helps to craft more comprehensive spectrograms. We demonstrate…

    Submitted 12 August, 2020; originally announced August 2020.

    Comments: 10 pages

  21. arXiv:2007.13703  [pdf, other]

    eess.AS cs.LG cs.SD

    From Sound Representation to Model Robustness

    Authors: Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich

    Abstract: In this paper, we investigate the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network. Averaged over various experiments on three benchmarking environmental sound datasets, we found the ResNet-18 model outperforms other deep learning architectures such as G…

    Submitted 17 January, 2021; v1 submitted 27 July, 2020; originally announced July 2020.

    Comments: 12 pages

  22. arXiv:1910.12084  [pdf, ps, other]

    cs.LG cs.CR cs.SD eess.AS stat.ML

    Detection of Adversarial Attacks and Characterization of Adversarial Subspace

    Authors: Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich

    Abstract: Adversarial attacks have always been a serious threat for any data-driven model. In this paper, we explore subspaces of adversarial examples in unitary vector domain, and we propose a novel detector for defending our models trained for environmental sound classification. We measure chordal distance between legitimate and malicious representation of sounds in unitary space of generalized Schur deco…

    Submitted 26 October, 2019; originally announced October 2019.

    Comments: Submitted to ICASSP 2020

  23. arXiv:1910.08173  [pdf, other]

    cs.CV

    Deep Weakly-Supervised Domain Adaptation for Pain Localization in Videos

    Authors: R. Gnana Praveen, Eric Granger, Patrick Cardinal

    Abstract: Automatic pain assessment has an important potential diagnostic value for populations that are incapable of articulating their pain experiences. As one of the dominating nonverbal channels for eliciting pain expression events, facial expressions have been widely investigated for estimating the pain intensity of individuals. However, using state-of-the-art deep learning (DL) models in real-world pain…

    Submitted 6 July, 2024; v1 submitted 17 October, 2019; originally announced October 2019.

    Comments: Accepted in FG 2020

  24. Emotion Recognition with Spatial Attention and Temporal Softmax Pooling

    Authors: Masih Aminbeidokhti, Marco Pedersoli, Patrick Cardinal, Eric Granger

    Abstract: Video-based emotion recognition is a challenging task because it requires distinguishing the small deformations of the human face that represent emotions, while being invariant to stronger visual differences due to different identities. State-of-the-art methods normally use complex deep learning models such as recurrent neural networks (RNNs, LSTMs, GRUs), convolutional neural networks (CNNs, C3D,…

    Submitted 3 October, 2019; v1 submitted 2 October, 2019; originally announced October 2019.

    Comments: 9 pages; 2 figures; 2 tables; Best paper award at ICIAR 2019

  25. arXiv:1908.03173  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Universal Adversarial Audio Perturbations

    Authors: Sajjad Abdoli, Luiz G. Hafemann, Jerome Rony, Ismail Ben Ayed, Patrick Cardinal, Alessandro L. Koerich

    Abstract: We demonstrate the existence of universal adversarial perturbations, which can fool a family of audio classification architectures, for both targeted and untargeted attack scenarios. We propose two methods for finding such perturbations. The first method is based on an iterative, greedy approach that is well-known in computer vision: it aggregates small perturbations to the input so as to push it…

    Submitted 16 November, 2020; v1 submitted 8 August, 2019; originally announced August 2019.

  26. arXiv:1907.04928  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Bag-of-Audio-Words based on Autoencoder Codebook for Continuous Emotion Prediction

    Authors: Mohammed Senoussaoui, Patrick Cardinal, Alessandro Lameiras Koerich

    Abstract: In this paper we present a novel approach for extracting a Bag-of-Words (BoW) representation based on a Neural Network codebook. The conventional BoW model is based on a dictionary (codebook) built from elementary representations which are selected randomly or by using a clustering algorithm on a training dataset. A metric is then used to assign unseen elementary representations to the closest dic…

    Submitted 6 July, 2019; originally announced July 2019.

  27. arXiv:1907.03196  [pdf, other]

    cs.CV eess.AS eess.IV

    Multimodal Fusion with Deep Neural Networks for Audio-Video Emotion Recognition

    Authors: Juan D. S. Ortega, Mohammed Senoussaoui, Eric Granger, Marco Pedersoli, Patrick Cardinal, Alessandro L. Koerich

    Abstract: This paper presents a novel deep neural network (DNN) for multimodal fusion of audio, video and text modalities for emotion recognition. The proposed DNN architecture has independent and shared layers which aim to learn the representation for each modality, as well as the best combined representation to achieve the best prediction. Experimental results on the AVEC Sentiment Analysis in the Wild da…

    Submitted 6 July, 2019; originally announced July 2019.

  28. arXiv:1906.10623  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Emotion Recognition Using Fusion of Audio and Video Features

    Authors: Juan D. S. Ortega, Patrick Cardinal, Alessandro L. Koerich

    Abstract: In this paper we propose a fusion approach to continuous emotion recognition that combines visual and auditory modalities in their representation spaces to predict the arousal and valence levels. The proposed approach employs a pre-trained convolution neural network and transfer learning to extract features from video frames that capture the emotional content. For the auditory content, a minimalis…

    Submitted 25 June, 2019; originally announced June 2019.

  29. arXiv:1904.11641  [pdf, other]

    cs.SD cs.CL eess.AS

    Speaker Sincerity Detection based on Covariance Feature Vectors and Ensemble Methods

    Authors: Mohammed Senoussaoui, Patrick Cardinal, Najim Dehak, Alessandro Lameiras Koerich

    Abstract: Automatic measuring of speaker sincerity degree is a novel research problem in computational paralinguistics. This paper proposes covariance-based feature vectors to model speech and ensembles of support vector regressors to estimate the degree of sincerity of a speaker. The elements of each covariance vector are pairwise statistics between the short-term feature components. These features are use…

    Submitted 25 April, 2019; originally announced April 2019.

  30. arXiv:1904.10990  [pdf, other]

    cs.LG cs.CR cs.SD eess.AS stat.ML

    A Robust Approach for Securing Audio Classification Against Adversarial Attacks

    Authors: Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich

    Abstract: Adversarial audio attacks can be considered as a small perturbation imperceptible to human ears that is intentionally added to the audio signal and causes a machine learning model to make mistakes. This poses a security concern about the safety of machine learning models since the adversarial attacks can fool such models toward the wrong predictions. In this paper we first review some strong advers…

    Submitted 25 November, 2019; v1 submitted 24 April, 2019; originally announced April 2019.

    Comments: Paper Accepted for Publication in IEEE Transactions on Information Forensics and Security

  31. arXiv:1904.08990  [pdf, other]

    cs.SD cs.LG stat.ML

    End-to-End Environmental Sound Classification using a 1D Convolutional Neural Network

    Authors: Sajjad Abdoli, Patrick Cardinal, Alessandro Lameiras Koerich

    Abstract: In this paper, we present an end-to-end approach for environmental sound classification based on a 1D Convolution Neural Network (CNN) that learns a representation directly from the audio signal. Several convolutional layers are used to capture the signal's fine time structure and learn diverse filters that are relevant to the classification task. The proposed approach can deal with audio signals…

    Submitted 18 April, 2019; originally announced April 2019.

  32. arXiv:1904.04221  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Unsupervised Feature Learning for Environmental Sound Classification Using Weighted Cycle-Consistent Generative Adversarial Network

    Authors: Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich

    Abstract: In this paper we propose a novel environmental sound classification approach incorporating unsupervised feature learning from codebook via spherical $K$-Means++ algorithm and a new architecture for high-level data augmentation. The audio signal is transformed into a 2D representation using a discrete wavelet transform (DWT). The DWT spectrograms are then augmented by a novel architecture for cycle…

    Submitted 25 November, 2019; v1 submitted 8 April, 2019; originally announced April 2019.

    Comments: Paper Accepted for Publication in Elsevier Applied Soft Computing

  33. arXiv:1509.06928  [pdf, ps, other]

    cs.CL

    Automatic Dialect Detection in Arabic Broadcast Speech

    Authors: Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James Glass, Peter Bell, Steve Renals

    Abstract: We investigate different approaches for dialect identification in Arabic broadcast speech, using phonetic, lexical features obtained from a speech recognition system, and acoustic features using the i-vector framework. We studied both generative and discriminative classifiers, and we combined these features using a multi-class Support Vector Machine (SVM). We validated our results on an Arabic/Engli…

    Submitted 10 August, 2016; v1 submitted 23 September, 2015; originally announced September 2015.