Showing 1–7 of 7 results for author: Gris, L R S

  1. arXiv:2306.09979

    cs.SD cs.AI eess.AS

    Evaluation of Speech Representations for MOS prediction

    Authors: Frederico S. Oliveira, Edresson Casanova, Arnaldo Cândido Júnior, Lucas R. S. Gris, Anderson S. Soares, Arlindo R. Galvão Filho

    Abstract: In this paper, we evaluate feature extraction models for predicting speech quality. We also propose a model architecture to compare embeddings of supervised learning and self-supervised learning models with embeddings of speaker verification models to predict the MOS metric. Our experiments were performed on the VCC2018 dataset and a Brazilian-Portuguese dataset called BRSpeechMOS, which was creat…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: 12 pages, 4 figures, Accepted to the 26th International Conference of Text, Speech and Dialogue (TSD2023)

  2. arXiv:2305.14580

    cs.CL cs.AI

    Evaluating OpenAI's Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person

    Authors: Lucas Rafael Stefanel Gris, Ricardo Marcacini, Arnaldo Candido Junior, Edresson Casanova, Anderson Soares, Sandra Maria Aluísio

    Abstract: Automatic speech recognition (ASR) systems play a key role in applications involving human-machine interactions. Despite their importance, ASR models for the Portuguese language proposed in the last decade have limitations in relation to the correct identification of punctuation marks in automatic transcriptions, which hinder the use of transcriptions by other systems, models, and even by humans.…

    Submitted 26 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  3. arXiv:2211.14372

    eess.AS cs.CL cs.LG cs.SD

    Interpretability Analysis of Deep Models for COVID-19 Detection

    Authors: Daniel Peixoto Pinto da Silva, Edresson Casanova, Lucas Rafael Stefanel Gris, Arnaldo Candido Junior, Marcelo Finger, Flaviane Svartman, Beatriz Raposo, Marcus Vinícius Moreira Martins, Sandra Maria Aluísio, Larissa Cristina Berti, João Paulo Teixeira

    Abstract: During the outbreak of the COVID-19 pandemic, several research areas joined efforts to mitigate the damage caused by SARS-CoV-2. In this paper, we present an interpretability analysis of a convolutional neural network-based model for COVID-19 detection in audio. We investigate which features are important for the model's decision process, examining spectrograms, F0, F0 standard deviation, sex and age.…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: 14 pages, 4 figures

  4. arXiv:2210.07852

    cs.CL cs.SD eess.AS

    Bringing NURC/SP to Digital Life: the Role of Open-source Automatic Speech Recognition Models

    Authors: Lucas Rafael Stefanel Gris, Arnaldo Candido Junior, Vinícius G. dos Santos, Bruno A. Papa Dias, Marli Quadros Leite, Flaviane Romani Fernandes Svartman, Sandra Aluísio

    Abstract: The NURC Project, which started in 1969 to study the cultured linguistic urban norm spoken in five Brazilian capitals, was responsible for compiling a large corpus for each capital. The digitized NURC/SP comprises 375 inquiries in 334 hours of recordings made in the capital city of São Paulo. Although 47 inquiries have transcripts, there was no alignment between the audio and the transcription, and 328 inquiries were…

    Submitted 14 October, 2022; originally announced October 2022.

  5. arXiv:2110.15731

    cs.CL cs.SD eess.AS

    CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese

    Authors: Arnaldo Candido Junior, Edresson Casanova, Anderson Soares, Frederico Santos de Oliveira, Lucas Oliveira, Ricardo Corso Fernandes Junior, Daniel Peixoto Pinto da Silva, Fernando Gorgulho Fayet, Bruno Baldissera Carlotto, Lucas Rafael Stefanel Gris, Sandra Maria Aluísio

    Abstract: Automatic speech recognition (ASR) is a complex and challenging task. In recent years, there have been significant advances in the area. In particular, for the Brazilian Portuguese (BP) language, there were about 376 hours publicly available for the ASR task until the second half of 2020. With the release of new datasets in early 2021, this number increased to 574 hours. The existing resources, however,…

    Submitted 18 November, 2021; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: This paper is under consideration at Language Resources and Evaluation (LREV)

  6. arXiv:2107.11414

    cs.CL

    Brazilian Portuguese Speech Recognition Using Wav2vec 2.0

    Authors: Lucas Rafael Stefanel Gris, Edresson Casanova, Frederico Santos de Oliveira, Anderson da Silva Soares, Arnaldo Candido Junior

    Abstract: Deep learning techniques have been shown to be efficient in various tasks, especially in the development of speech recognition systems, that is, systems that aim to transcribe an audio sentence into a sequence of written words. Despite the progress in the area, speech recognition can still be considered difficult, especially for languages lacking available data, such as Brazilian Portuguese (BP). In…

    Submitted 22 December, 2021; v1 submitted 23 July, 2021; originally announced July 2021.

  7. arXiv:2002.11213

    cs.CL cs.SD eess.AS

    Speech2Phone: A Novel and Efficient Method for Training Speaker Recognition Models

    Authors: Edresson Casanova, Arnaldo Candido Junior, Christopher Shulby, Frederico Santos de Oliveira, Lucas Rafael Stefanel Gris, Hamilton Pereira da Silva, Sandra Maria Aluisio, Moacir Antonelli Ponti

    Abstract: In this paper, we present an efficient method for training models for speaker recognition using small or under-resourced datasets. This method requires less data than other SOTA (State-Of-The-Art) methods, e.g. the Angular Prototypical and GE2E loss functions, while achieving similar results to those methods. This is done using the knowledge of the reconstruction of a phoneme in the speaker's voice…

    Submitted 18 June, 2021; v1 submitted 25 February, 2020; originally announced February 2020.

    Comments: Submitted to BRACIS