Showing 1–13 of 13 results for author: Roman, S

Searching in archive cs.
  1. Enhancing Students' Learning Process Through Self-Generated Tests

    Authors: Marcos Sánchez-Élez, Inmaculada Pardines, Pablo García, Guadalupe Miñana, Sara Román, Margarita Sánchez, José L. Risco-Martín

    Abstract: The use of new technologies in higher education has surprisingly emphasized students' tendency to adopt a passive behavior in class. Participation and interaction of students are essential to improve academic results. This paper describes an educational experiment aimed at the promotion of students' autonomous learning by requiring them to generate test type questions related to the contents of th…

    Submitted 21 March, 2024; originally announced March 2024.

    Journal ref: Journal of Science Education and Technology, 23(1), pp. 15-25, 2014

  2. arXiv:2401.17264

    cs.SD cs.AI cs.CR

    Proactive Detection of Voice Cloning with Localized Watermarking

    Authors: Robin San Roman, Pierre Fernandez, Alexandre Défossez, Teddy Furon, Tuan Tran, Hady Elsahar

    Abstract: In the rapidly evolving field of speech generative models, there is a pressing need to ensure audio authenticity against the risks of voice cloning. We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech. AudioSeal employs a generator/detector architecture trained jointly with a localization loss to enable localized waterma…

    Submitted 6 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: Published at ICML 2024. Code at https://github.com/facebookresearch/audioseal - webpage at https://pierrefdz.github.io/publications/audioseal/

  3. arXiv:2401.17129

    cs.SD cs.AI eess.AS

    Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes

    Authors: Adrian S. Roman, Baladithya Balamurugan, Rithik Pothuganti

    Abstract: This technical report details our work towards building an enhanced audio-visual sound event localization and detection (SELD) network. We build on top of the audio-only SELDnet23 model and adapt it to be audio-visual by merging both audio and video information prior to the gated recurrent unit (GRU) of the audio-only network. Our model leverages YOLO and DETIC object detectors. We also build a fr…

    Submitted 29 January, 2024; originally announced January 2024.

  4. arXiv:2401.12238

    eess.AS cs.LG cs.SD

    Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms

    Authors: Iran R. Roman, Christopher Ick, Sivan Ding, Adrian S. Roman, Brian McFee, Juan P. Bello

    Abstract: Sound event localization and detection (SELD) is an important task in machine listening. Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels. SELD data is simulated by convolving spatially-localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape. However, RIRs require manual collection in specific…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, 1 table, to be presented at ICASSP 2024 in Seoul, South Korea

  5. arXiv:2401.08717

    cs.SD eess.AS

    Robust DOA estimation using deep acoustic imaging

    Authors: Adrian S. Roman, Iran R. Roman, Juan P. Bello

    Abstract: Direction of arrival estimation (DoAE) aims at tracking a sound in azimuth and elevation. Recent advancements include data-driven models with inputs derived from ambisonics intensity vectors or correlations between channels in a microphone array. A spherical intensity map (SIM), or acoustic image, is an alternative input representation that remains underexplored. SIMs benefit from high-resolution…

    Submitted 15 January, 2024; originally announced January 2024.

  6. arXiv:2312.05187

    cs.CL cs.SD eess.AS

    Seamless: Multilingual Expressive and Streaming Speech Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, et al. (40 additional authors not shown)

    Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4…

    Submitted 8 December, 2023; originally announced December 2023.

  7. arXiv:2308.02560

    cs.SD cs.LG eess.AS

    From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion

    Authors: Robin San Roman, Yossi Adi, Antoine Deleforge, Romain Serizel, Gabriel Synnaeve, Alexandre Défossez

    Abstract: Deep generative models can generate high-fidelity audio conditioned on various types of representations (e.g., mel-spectrograms, Mel-frequency Cepstral Coefficients (MFCC)). Recently, such models have been used to synthesize audio waveforms conditioned on highly compressed representations. Although such methods produce impressive results, they are prone to generate audible artifacts when the condi…

    Submitted 8 November, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: 10 pages

    Journal ref: Thirty-seventh Conference on Neural Information Processing Systems (2023)

  8. arXiv:2110.05948

    eess.SP cs.AI cs.CV cs.GR cs.LG cs.SD eess.AS eess.IV

    Denoising Diffusion Gamma Models

    Authors: Eliya Nachmani, Robin San Roman, Lior Wolf

    Abstract: Generative diffusion processes are an emerging and effective tool for image and speech generation. In the existing methods, the underlying noise distribution of the diffusion process is Gaussian noise. However, fitting distributions with more degrees of freedom could improve the performance of such generative models. In this work, we investigate other types of noise distribution for the diffusion…

    Submitted 10 October, 2021; originally announced October 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2106.07582

  9. arXiv:2106.07582

    cs.LG cs.CV cs.SD eess.AS

    Non Gaussian Denoising Diffusion Models

    Authors: Eliya Nachmani, Robin San Roman, Lior Wolf

    Abstract: Generative diffusion processes are an emerging and effective tool for image and speech generation. In the existing methods, the underlying noise distribution of the diffusion process is Gaussian noise. However, fitting distributions with more degrees of freedom could help the performance of such generative models. In this work, we investigate other types of noise distribution for the diffusion pro…

    Submitted 14 June, 2021; originally announced June 2021.

  10. arXiv:1910.13794

    cs.CL

    Let Me Know What to Ask: Interrogative-Word-Aware Question Generation

    Authors: Junmo Kang, Haritz Puerto San Roman, Sung-Hyon Myaeng

    Abstract: Question Generation (QG) is a Natural Language Processing (NLP) task that aids advances in Question Answering (QA) and conversational assistants. Existing models focus on generating a question based on a text and possibly the answer to the generated question. They need to determine the type of interrogative word to be generated while having to pay attention to the grammar and vocabulary of the que…

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: Accepted at 2nd Workshop on Machine Reading for Question Answering (MRQA), EMNLP 2019

  11. arXiv:1809.08887

    cs.CL cs.AI

    Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task

    Authors: Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, Dragomir Radev

    Abstract: We present Spider, a large-scale, complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 college students. It consists of 10,181 questions and 5,693 unique complex SQL queries on 200 databases with multiple tables, covering 138 different domains. We define a new complex and cross-domain semantic parsing and text-to-SQL task where different complex SQL queries and databas…

    Submitted 2 February, 2019; v1 submitted 24 September, 2018; originally announced September 2018.

    Comments: EMNLP 2018, Long Paper

  12. arXiv:1606.07256

    cs.CV

    Saliency Driven Object recognition in egocentric videos with deep CNN

    Authors: Philippe Pérez de San Roman, Jenny Benois-Pineau, Jean-Philippe Domenger, Florent Paclet, Daniel Cataert, Aymar de Rugy

    Abstract: The problem of object recognition in natural scenes has been recently successfully addressed with Deep Convolutional Neuronal Networks giving a significant break-through in recognition scores. The computational efficiency of Deep CNNs as a function of their depth, allows for their use in real-time applications. One of the key issues here is to reduce the number of windows selected from images to b…

    Submitted 23 June, 2016; originally announced June 2016.

    Comments: 20 pages, 8 figures, 3 tables, Submitted to the Journal of Computer Vision and Image Understanding

  13. Reconfiguration Strategies for Online Hardware Multitasking in Embedded Systems

    Authors: Marcos Sanchez-Elez, Sara Roman

    Abstract: An intensive use of reconfigurable hardware is expected in future embedded systems. This means that the system has to decide which tasks are more suitable for hardware execution. In order to make an efficient use of the FPGA it is convenient to choose one that allows hardware multitasking, which is implemented by using partial dynamic reconfiguration. One of the challenges for hardware multitaskin…

    Submitted 15 January, 2013; originally announced January 2013.

    Comments: Computer Science & Engineering: An International Journal (CSEIJ), Vol.2, No.6, December 2012