Showing 1–4 of 4 results for author: Domingues, P

Searching in archive cs.
  1. arXiv:2407.12832  [pdf, other]

    cs.CL

    Sentence-level Aggregation of Lexical Metrics Correlate Stronger with Human Judgements than Corpus-level Aggregation

    Authors: Paulo Cavalin, Pedro Henrique Domingues, Claudio Pinhanez

    Abstract: In this paper we show that corpus-level aggregation hinders considerably the capability of lexical metrics to accurately evaluate machine translation (MT) systems. With empirical experiments we demonstrate that averaging individual segment-level scores can make metrics such as BLEU and chrF correlate much stronger with human judgements and make them behave considerably more similar to neural metri…

    Submitted 3 July, 2024; originally announced July 2024.

  2. arXiv:2407.12620  [pdf, other]

    cs.CL cs.AI

    Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences

    Authors: Claudio Pinhanez, Paulo Cavalin, Luciana Storto, Thomas Finbow, Alexander Cobbinah, Julio Nogima, Marisa Vasconcelos, Pedro Domingues, Priscila de Souza Mizukami, Nicole Grell, Majoí Gongora, Isabel Gonçalves

    Abstract: Since 2022 we have been exploring application areas and technologies in which Artificial Intelligence (AI) and modern Natural Language Processing (NLP), such as Large Language Models (LLMs), can be employed to foster the usage and facilitate the documentation of Indigenous languages which are in danger of disappearing. We start by discussing the decreasing diversity of languages in the world and h…

    Submitted 29 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

  3. arXiv:2402.19204  [pdf, other]

    cs.CL

    PeLLE: Encoder-based language models for Brazilian Portuguese based on open data

    Authors: Guilherme Lamartine de Mello, Marcelo Finger, Felipe Serras, Miguel de Mello Carpi, Marcos Menon Jose, Pedro Henrique Domingues, Paulo Cavalim

    Abstract: In this paper we present PeLLE, a family of large language models based on the RoBERTa architecture, for Brazilian Portuguese, trained on curated, open data from the Carolina corpus. Aiming at reproducible results, we describe details of the pretraining of the models. We also evaluate PeLLE models against a set of existing multilingual and PT-BR refined pretrained Transformer-based LLM encoders, c…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 15 pages

    ACM Class: I.2.7

  4. arXiv:1707.06336  [pdf]

    cs.DL

    Open Source Software for Digital Preservation Repositories: a Survey

    Authors: Carlos André Rosa, Olga Craveiro, Patricio Domingues

    Abstract: In the digital age, the amount of data produced is growing exponentially. Governments and institutions can no longer rely on old methods for storing data and passing on the knowledge to future generations. Digital data preservation is a mandatory issue that needs proper strategies and tools. With this awareness, efforts are being made to create and perfect software solutions capable of responding…

    Submitted 19 July, 2017; originally announced July 2017.

    Comments: http://airccse.org/journal/ijcses/

    Journal ref: International Journal of Computer Science & Engineering Survey (IJCSES) Vol.8, No.3, June 2017