Showing 1–23 of 23 results for author: Comminiello, D

Searching in archive eess.
  1. arXiv:2405.09976  [pdf, other]

    cs.CV eess.SP

    Language-Oriented Semantic Latent Representation for Image Transmission

    Authors: Giordano Cicchetti, Eleonora Grassucci, Jihong Park, Jinho Choi, Sergio Barbarossa, Danilo Comminiello

    Abstract: In the new paradigm of semantic communication (SC), the focus is on delivering meanings behind bits by extracting semantic information from raw data. Recent advances in data-to-text models facilitate language-oriented SC, particularly for text-transformed image communication via image-to-text (I2T) encoding and text-to-image (T2I) decoding. However, although semantically aligned, the text is too c…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Under review at IEEE International Workshop on Machine Learning for Signal Processing (MLSP) 2024

  2. arXiv:2405.09866  [pdf, other]

    eess.SP cs.LG

    Rethinking Multi-User Semantic Communications with Deep Generative Models

    Authors: Eleonora Grassucci, Jinho Choi, Jihong Park, Riccardo F. Gramaccioni, Giordano Cicchetti, Danilo Comminiello

    Abstract: In recent years, novel communication strategies have emerged to face the challenges that the increased number of connected devices and the higher quality of transmitted information are posing. Among them, semantic communication obtained promising results especially when combined with state-of-the-art deep generative models, such as large language or diffusion models, able to regenerate content fro…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Under review in IEEE Journal on Selected Areas in Communications

  3. arXiv:2405.07024  [pdf, other]

    cs.LG eess.SP

    Demystifying the Hypercomplex: Inductive Biases in Hypercomplex Deep Learning

    Authors: Danilo Comminiello, Eleonora Grassucci, Danilo P. Mandic, Aurelio Uncini

    Abstract: Hypercomplex algebras have recently been gaining prominence in the field of deep learning owing to the advantages of their division algebras over real vector spaces and their superior results when dealing with multidimensional signals in real-world 3D and 4D paradigms. This paper provides a foundational framework that serves as a roadmap for understanding why hypercomplex deep learning methods are…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: Accepted for Publication in IEEE Signal Processing Magazine

  4. arXiv:2405.02961  [pdf, other]

    cs.CV eess.IV

    JOSENet: A Joint Stream Embedding Network for Violence Detection in Surveillance Videos

    Authors: Pietro Nardelli, Danilo Comminiello

    Abstract: The increasing proliferation of video surveillance cameras and the escalating demand for crime prevention have intensified interest in the task of violence detection within the research community. Compared to other action recognition tasks, violence detection in surveillance videos presents additional issues, such as the wide variety of real fight scenes. Unfortunately, existing datasets for viole…

    Submitted 3 August, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

  5. arXiv:2402.09245  [pdf, other]

    eess.AS cs.LG eess.SP

    Overview of the L3DAS23 Challenge on Audio-Visual Extended Reality

    Authors: Christian Marinoni, Riccardo Fosco Gramaccioni, Changan Chen, Aurelio Uncini, Danilo Comminiello

    Abstract: The primary goal of the L3DAS23 Signal Processing Grand Challenge at ICASSP 2023 is to promote and support collaborative research on machine learning for 3D audio signal processing, with a specific emphasis on 3D speech enhancement and 3D Sound Event Localization and Detection in Extended Reality applications. As part of our latest competition, we provide a brand-new dataset, which maintains the s…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted to 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023)

  6. arXiv:2310.15247  [pdf, other]

    cs.SD cs.CV cs.LG cs.MM eess.AS

    SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis

    Authors: Marco Comunità, Riccardo F. Gramaccioni, Emilian Postolache, Emanuele Rodolà, Danilo Comminiello, Joshua D. Reiss

    Abstract: Sound design involves creatively selecting, recording, and editing sound effects for various media like cinema, video games, and virtual/augmented reality. One of the most time-consuming steps when designing sound is synchronizing audio with video. In some cases, environmental recordings from video shoots are available, which can aid in the process. However, in video games and animations, no refer…

    Submitted 23 October, 2023; originally announced October 2023.

  7. arXiv:2310.10224  [pdf, other]

    eess.IV cs.CV cs.LG

    Generalizing Medical Image Representations via Quaternion Wavelet Networks

    Authors: Luigi Sigillo, Eleonora Grassucci, Aurelio Uncini, Danilo Comminiello

    Abstract: Neural network generalizability is becoming a broad research field due to the increasing availability of datasets from different sources and for various tasks. This issue is even wider when processing medical data, where a lack of methodological standards causes large variations being provided by different imaging centers or acquired with various devices and cofactors. To overcome these limitation…

    Submitted 17 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: This paper is currently under review

  8. arXiv:2310.07648  [pdf, other]

    cs.HC cs.LG eess.SP

    Hypercomplex Multimodal Emotion Recognition from EEG and Peripheral Physiological Signals

    Authors: Eleonora Lopez, Eleonora Chiarantano, Eleonora Grassucci, Danilo Comminiello

    Abstract: Multimodal emotion recognition from physiological signals is receiving an increasing amount of attention due to the impossibility to control them at will unlike behavioral reactions, thus providing more reliable information. Existing deep learning-based methods still rely on extracted handcrafted features, not taking full advantage of the learning ability of neural networks, and often adopt a sing…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Published at IEEE ICASSP workshops 2023

  9. arXiv:2310.07633  [pdf, other]

    eess.IV cs.CV

    Attention-Map Augmentation for Hypercomplex Breast Cancer Classification

    Authors: Eleonora Lopez, Filippo Betello, Federico Carmignani, Eleonora Grassucci, Danilo Comminiello

    Abstract: Breast cancer is the most widespread neoplasm among women and early detection of this disease is critical. Deep learning techniques have become of great interest to improve diagnostic performance. However, distinguishing between malignant and benign masses in whole mammograms poses a challenge, as they appear nearly identical to an untrained eye, and the region of interest (ROI) constitutes only a…

    Submitted 23 April, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Published in Elsevier Pattern Recognition Letters

  10. arXiv:2309.07195  [pdf, other]

    cs.SD cs.ET eess.AS

    Diffusion models for audio semantic communication

    Authors: Eleonora Grassucci, Christian Marinoni, Andrea Rodriguez, Danilo Comminiello

    Abstract: Directly sending audio signals from a transmitter to a receiver across a noisy channel may absorb consistent bandwidth and be prone to errors when trying to recover the transmitted bits. On the contrary, the recent semantic communication approach proposes to send the semantics and then regenerate semantically consistent content at the receiver without exactly recovering the bitstream. In this pape…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: Submitted to IEEE ICASSP 2024

  11. arXiv:2309.02478  [pdf, other]

    cs.LG cs.AI eess.SP

    Enhancing Semantic Communication with Deep Generative Models -- An ICASSP Special Session Overview

    Authors: Eleonora Grassucci, Yuki Mitsufuji, Ping Zhang, Danilo Comminiello

    Abstract: Semantic communication is poised to play a pivotal role in shaping the landscape of future AI-driven communication systems. Its challenge of extracting semantic information from the original complex content and regenerating semantically consistent data at the receiver, possibly being robust to channel corruptions, can be addressed with deep generative models. This ICASSP special session overview p…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Submitted to IEEE ICASSP

  12. arXiv:2309.02387  [pdf, other]

    eess.SP

    Semantic Communications Based on Adaptive Generative Models and Information Bottleneck

    Authors: S. Barbarossa, D. Comminiello, E. Grassucci, F. Pezone, S. Sardellitti, P. Di Lorenzo

    Abstract: Semantic communications represent a significant breakthrough with respect to the current communication paradigm, as they focus on recovering the meaning behind the transmitted sequence of symbols, rather than the symbols themselves. In semantic communications, the scope of the destination is not to recover a list of symbols symbolically identical to the transmitted ones, but rather to recover a me…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: To appear in IEEE Communications Magazine, special issue on Semantic Communications: Transmission beyond Shannon, 2023

  13. arXiv:2305.10882  [pdf, other]

    cs.CV cs.LG eess.IV

    StawGAN: Structural-Aware Generative Adversarial Networks for Infrared Image Translation

    Authors: Luigi Sigillo, Eleonora Grassucci, Danilo Comminiello

    Abstract: This paper addresses the problem of translating night-time thermal infrared images, which are the most adopted image modalities to analyze night-time scenes, to daytime color images (NTIT2DC), which provide better perceptions of objects. We introduce a novel model that focuses on enhancing the quality of the target generation without merely colorizing it. The proposed structural aware (StawGAN) en…

    Submitted 18 May, 2023; originally announced May 2023.

    Journal ref: 2023 IEEE International Symposium on Circuits and Systems (ISCAS)

  14. arXiv:2204.02385  [pdf, other]

    eess.AS cs.LG cs.SD

    Learning Speech Emotion Representations in the Quaternion Domain

    Authors: Eric Guizzo, Tillman Weyde, Simone Scardapane, Danilo Comminiello

    Abstract: The modeling of human emotion expression in speech signals is an important, yet challenging task. The high resource demand of speech emotion recognition models, combined with the general scarcity of emotion-labelled data are obstacles to the development and application of effective solutions in this field. In this paper, we present an approach to jointly circumvent these difficulties. Our meth…

    Submitted 3 March, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: Accepted for Publication in IEEE/ACM Transactions on Audio, Speech and Language Processing

  15. arXiv:2204.01851  [pdf, other]

    eess.AS cs.LG cs.SD

    Dual Quaternion Ambisonics Array for Six-Degree-of-Freedom Acoustic Representation

    Authors: Eleonora Grassucci, Gioia Mancini, Christian Brignone, Aurelio Uncini, Danilo Comminiello

    Abstract: Spatial audio methods are gaining a growing interest due to the spread of immersive audio experiences and applications, such as virtual and augmented reality. For these purposes, 3D audio signals are often acquired through arrays of Ambisonics microphones, each comprising four capsules that decompose the sound field in spherical harmonics. In this paper, we propose a dual quaternion representation…

    Submitted 14 December, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: Paper accepted for publication in Elsevier Pattern Recognition Letters

  16. arXiv:2202.10372  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    L3DAS22 Challenge: Learning 3D Audio Sources in a Real Office Environment

    Authors: Eric Guizzo, Christian Marinoni, Marco Pennese, Xinlei Ren, Xiguang Zheng, Chen Zhang, Bruno Masiero, Aurelio Uncini, Danilo Comminiello

    Abstract: The L3DAS22 Challenge is aimed at encouraging the development of machine learning strategies for 3D speech enhancement and 3D sound localization and detection in office-like environments. This challenge improves and extends the tasks of the L3DAS21 edition. We generated a new dataset, which maintains the same general characteristics of L3DAS21 datasets, but with an extended number of data points a…

    Submitted 21 February, 2022; originally announced February 2022.

    Comments: Accepted to 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022). arXiv admin note: substantial text overlap with arXiv:2104.05499

    Journal ref: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 9186-9190

  17. arXiv:2104.09641  [pdf, ps, other]

    cs.LG cs.SD eess.AS eess.SP eess.SY

    A New Class of Efficient Adaptive Filters for Online Nonlinear Modeling

    Authors: Danilo Comminiello, Alireza Nezamdoust, Simone Scardapane, Michele Scarpiniti, Amir Hussain, Aurelio Uncini

    Abstract: Nonlinear models are known to provide excellent performance in real-world applications that often operate in non-ideal conditions. However, such applications often require online processing to be performed with limited computational resources. To address this problem, we propose a new class of efficient nonlinear models for online applications. The proposed algorithms are based on linear-in-the-pa…

    Submitted 26 August, 2022; v1 submitted 19 April, 2021; originally announced April 2021.

    Comments: This work has been accepted for publication in IEEE Transactions on Systems, Man, and Cybernetics: Systems. Copyright may be transferred without notice, after which this version may no longer be accessible

  18. arXiv:2104.09630  [pdf, other]

    cs.LG cs.AI cs.CV eess.IV

    Quaternion Generative Adversarial Networks

    Authors: Eleonora Grassucci, Edoardo Cicero, Danilo Comminiello

    Abstract: Latest Generative Adversarial Networks (GANs) are gathering outstanding results through a large-scale training, thus employing models composed of millions of parameters requiring extensive computational capabilities. Building such huge models undermines their replicability and increases the training instability. Moreover, multi-channel data, such as images or audio, are usually processed by realva…

    Submitted 27 July, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

    Comments: Accepted as a Chapter for the SPRINGER book "Generative Adversarial Learning: Architectures and Applications"

    Journal ref: Generative Adversarial Learning: Architectures and Applications. Intelligent Systems Reference Library, vol 217. Springer, Cham, Feb. 2022

  19. arXiv:2104.05499  [pdf, ps, other]

    eess.AS cs.LG cs.SD eess.SP

    L3DAS21 Challenge: Machine Learning for 3D Audio Signal Processing

    Authors: Eric Guizzo, Riccardo F. Gramaccioni, Saeid Jamili, Christian Marinoni, Edoardo Massaro, Claudia Medaglia, Giuseppe Nachira, Leonardo Nucciarelli, Ludovica Paglialunga, Marco Pennese, Sveva Pepe, Enrico Rocchi, Aurelio Uncini, Danilo Comminiello

    Abstract: The L3DAS21 Challenge is aimed at encouraging and fostering collaborative research on machine learning for 3D audio signal processing, with particular focus on 3D speech enhancement (SE) and 3D sound localization and detection (SELD). Alongside with the challenge, we release the L3DAS21 dataset, a 65 hours 3D audio corpus, accompanied with a Python API that facilitates the data usage and results s…

    Submitted 29 April, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: Documentation paper for the L3DAS21 Challenge for IEEE MLSP 2021. Further information on www.l3das.com/mlsp2021

    Journal ref: 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP), 2021, pp. 1-6

  20. A Quaternion-Valued Variational Autoencoder

    Authors: Eleonora Grassucci, Danilo Comminiello, Aurelio Uncini

    Abstract: Deep probabilistic generative models have achieved incredible success in many fields of application. Among such models, variational autoencoders (VAEs) have proved their ability in modeling a generative process by learning a latent representation of the input. In this paper, we propose a novel VAE defined in the quaternion domain, which exploits the properties of quaternion algebra to improve perf…

    Submitted 22 April, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: Accepted for publication at the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

    Journal ref: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 3310-3314

  21. Combined Sparse Regularization for Nonlinear Adaptive Filters

    Authors: Danilo Comminiello, Michele Scarpiniti, Simone Scardapane, Luis A. Azpicueta-Ruiz, Aurelio Uncini

    Abstract: Nonlinear adaptive filters often show some sparse behavior due to the fact that not all the coefficients are equally useful for the modeling of any nonlinearity. Recently, a class of proportionate algorithms has been proposed for nonlinear filters to leverage sparsity of their coefficients. However, the choice of the norm penalty of the cost function may be not always appropriate depending on the…

    Submitted 24 July, 2020; originally announced July 2020.

    Comments: This is a corrected version of the paper presented at EUSIPCO 2018 and published on IEEE https://ieeexplore.ieee.org/document/8552955

    Journal ref: 2018 26th European Signal Processing Conference (EUSIPCO), Sep. 2018

  22. A Multimodal Deep Network for the Reconstruction of T2W MR Images

    Authors: Antonio Falvo, Danilo Comminiello, Simone Scardapane, Michele Scarpiniti, Aurelio Uncini

    Abstract: Multiple sclerosis is one of the most common chronic neurological diseases affecting the central nervous system. Lesions produced by the MS can be observed through two modalities of magnetic resonance (MR), known as T2W and FLAIR sequences, both providing useful information for formulating a diagnosis. However, long acquisition time makes the acquired MR image vulnerable to motion artifacts. This…

    Submitted 24 February, 2020; v1 submitted 8 August, 2019; originally announced August 2019.

    Comments: 29th Italian Neural Networks Workshop (WIRN 2019)

    Journal ref: Progresses in Artificial Intelligence and Neural Systems. Smart Innovation, Systems and Technologies, vol 184. Springer, Singapore, Jul. 2020

  23. arXiv:1812.06811  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Quaternion Convolutional Neural Networks for Detection and Localization of 3D Sound Events

    Authors: Danilo Comminiello, Marco Lella, Simone Scardapane, Aurelio Uncini

    Abstract: Learning from data in the quaternion domain enables us to exploit internal dependencies of 4D signals and treat them as a single entity. One of the models that perfectly suits quaternion-valued data processing is represented by 3D acoustic signals in their spherical harmonics decomposition. In this paper, we address the problem of localizing and detecting sound events in the spatial sound…

    Submitted 17 December, 2018; originally announced December 2018.

    Comments: Submitted to ICASSP 2019

    Journal ref: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 8533-8537