Showing 1–7 of 7 results for author: Bueno, M

  1. arXiv:2404.06976  [pdf, other]

    cs.IR

    Quati: A Brazilian Portuguese Information Retrieval Dataset from Native Speakers

    Authors: Mirelle Bueno, Eduardo Seiti de Oliveira, Rodrigo Nogueira, Roberto A. Lotufo, Jayr Alencar Pereira

    Abstract: Despite Portuguese being one of the most spoken languages in the world, there is a lack of high-quality information retrieval datasets in that language. We present Quati, a dataset specifically designed for the Brazilian Portuguese language. It comprises a collection of queries formulated by native speakers and a curated set of documents sourced from a selection of high-quality Brazilian Portugues…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 22 pages

  2. arXiv:2402.07859  [pdf, other]

    cs.CL cs.AI

    Lissard: Long and Simple Sequential Reasoning Datasets

    Authors: Mirelle Bueno, Roberto Lotufo, Rodrigo Nogueira

    Abstract: Language models are now capable of solving tasks that require dealing with long sequences consisting of hundreds of thousands of tokens. However, they often fail on tasks that require repetitive use of simple rules, even on sequences that are much shorter than those seen during training. For example, state-of-the-art LLMs can find common items in two lists with up to 20 items but fail when lists h…

    Submitted 20 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  3. arXiv:2208.11445  [pdf, other]

    cs.CL

    Induced Natural Language Rationales and Interleaved Markup Tokens Enable Extrapolation in Large Language Models

    Authors: Mirelle Bueno, Carlos Gemmell, Jeffrey Dalton, Roberto Lotufo, Rodrigo Nogueira

    Abstract: The ability to extrapolate, i.e., to make predictions on sequences that are longer than those presented as training examples, is a challenging problem for current deep learning models. Recent work shows that this limitation persists in state-of-the-art Transformer-based models. Most solutions to this problem use specific architectures or training methods that do not generalize to other tasks. We d…

    Submitted 28 November, 2022; v1 submitted 24 August, 2022; originally announced August 2022.

  4. arXiv:2207.12560  [pdf, other]

    cs.LG stat.ML

    AMLB: an AutoML Benchmark

    Authors: Pieter Gijsbers, Marcos L. P. Bueno, Stefan Coors, Erin LeDell, Sébastien Poirier, Janek Thomas, Bernd Bischl, Joaquin Vanschoren

    Abstract: Comparing different AutoML frameworks is notoriously challenging and often done incorrectly. We introduce an open and extensible benchmark that follows best practices and avoids common mistakes when comparing AutoML frameworks. We conduct a thorough comparison of 9 well-known AutoML frameworks across 71 classification and 33 regression tasks. The differences between the AutoML frameworks are explo…

    Submitted 16 November, 2023; v1 submitted 25 July, 2022; originally announced July 2022.

    Comments: UNDER REVIEW: Revised submission to JMLR, with updated results from June 2023

  5. arXiv:2104.01717  [pdf, other]

    cs.SE

    Issue Auto-Assignment in Software Projects with Machine Learning Techniques

    Authors: Pedro Oliveira, Rossana M. C. Andrade, Tales P. Nogueira, Isaac Barreto, Leandro Morais Bueno

    Abstract: Usually, managers or technical leaders in software projects assign issues manually. This task may become more complex as the issue description becomes more detailed. This complexity can also make the process more time-consuming and more prone to errors (misassignments). In the literature, many studies aim to address this problem by using machine learning strategies. Although there is no specific solution th…

    Submitted 4 April, 2021; originally announced April 2021.

  6. arXiv:1812.09951  [pdf, other]

    cs.CG

    3-Colorable Delaunay Triangulations

    Authors: Lucas Moutinho Bueno

    Abstract: We propose an algorithm to create a 3-colorable Delaunay triangulation. The input of the problem we are trying to solve is a set X of n two-dimensional points. The output is a 3-colorable two-dimensional Delaunay triangulation T for X ∪ Y, where Y is a set of m new points. We want m to be as small as possible.

    Submitted 24 December, 2018; originally announced December 2018.

  7. Bayesian approach for near-duplicate image detection

    Authors: Lucas Moutinho Bueno, Eduardo Valle, Ricardo da Silva Torres

    Abstract: In this paper we propose a Bayesian approach for near-duplicate image detection, and investigate how different probabilistic models affect the performance obtained. The task of identifying an image whose metadata are missing is often demanded for a myriad of applications: metadata retrieval in cultural institutions, detection of copyright violations, investigation of latent cross-links in archives…

    Submitted 25 April, 2011; originally announced April 2011.