Showing 1–11 of 11 results for author: Zavrel, J

  1. arXiv:2406.14783 [pdf, other]

    cs.IR cs.CL

    Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework

    Authors: Zackary Rackauckas, Arthur Câmara, Jakub Zavrel

    Abstract: Challenges in the automated evaluation of Retrieval-Augmented Generation (RAG) Question-Answering (QA) systems include hallucination problems in domain-specific knowledge and the lack of gold standard benchmarks for company internal tasks. This results in difficulties in evaluating RAG variations, like RAG-Fusion (RAGF), in the context of a product QA task at Infineon Technologies. To solve these…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted to LLM4Eval @ SIGIR24

  2. arXiv:2307.04601 [pdf, ps, other]

    cs.IR

    InPars Toolkit: A Unified and Reproducible Synthetic Data Generation Pipeline for Neural Information Retrieval

    Authors: Hugo Abonizio, Luiz Bonifacio, Vitor Jeronymo, Roberto Lotufo, Jakub Zavrel, Rodrigo Nogueira

    Abstract: Recent work has explored Large Language Models (LLMs) to overcome the lack of training data for Information Retrieval (IR) tasks. The generalization abilities of these models have enabled the creation of synthetic in-domain data by providing instructions and a few examples on a prompt. InPars and Promptagator have pioneered this approach and both methods have demonstrated the potential of using LL…

    Submitted 10 July, 2023; originally announced July 2023.

  3. arXiv:2301.01820 [pdf, ps, other]

    cs.IR cs.AI

    InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval

    Authors: Vitor Jeronymo, Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Jakub Zavrel, Rodrigo Nogueira

    Abstract: Recently, InPars introduced a method to efficiently use large language models (LLMs) in information retrieval tasks: via few-shot examples, an LLM is induced to generate relevant queries for documents. These synthetic query-document pairs can then be used to train a retriever. However, InPars and, more recently, Promptagator, rely on proprietary LLMs such as GPT-3 and FLAN to generate such dataset…

    Submitted 26 May, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

  4. arXiv:2109.12854 [pdf, other]

    cs.NI cs.PF

    Quality Control Methodology for Simulation Models of Computer Network Protocols

    Authors: Vladimír Veselý, Jan Zavřel

    Abstract: This paper summarizes know-how about modeling and simulation of computer networking protocols we contributed to the OMNeT++ community. We propose a methodology aiming to set a reliable ground truth for the quality of simulation models of networking protocols. We demonstrate the application of this methodology on our EIGRP source code pull-requested to the INET framework.

    Submitted 27 September, 2021; originally announced September 2021.

    Comments: Published in: M. Marek, G. Nardini, V. Vesely (Eds.), Proceedings of the 8th OMNeT++ Community Summit, Virtual Summit, September 8-10, 2021

    Report number: OMNET/2021/09

  5. arXiv:2011.00061 [pdf, other]

    cs.CL cs.IR

    A New Neural Search and Insights Platform for Navigating and Organizing AI Research

    Authors: Marzieh Fadaee, Olga Gureenkova, Fernando Rejon Barrera, Carsten Schnober, Wouter Weerkamp, Jakub Zavrel

    Abstract: To provide AI researchers with modern tools for dealing with the explosive growth of the research literature in their field, we introduce a new platform, AI Research Navigator, that combines classical keyword search with neural retrieval to discover and organize relevant literature. The system provides search at multiple levels of textual granularity, from sentences to aggregations across document…

    Submitted 30 October, 2020; originally announced November 2020.

    Comments: Accepted to Workshop on Scholarly Document Processing (SDP) at EMNLP 2020

  6. arXiv:2010.08269 [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    Effective Distributed Representations for Academic Expert Search

    Authors: Mark Berger, Jakub Zavrel, Paul Groth

    Abstract: Expert search aims to find and rank experts based on a user's query. In academia, retrieving experts is an efficient way to navigate through a large amount of academic knowledge. Here, we study how different distributed representations of academic papers (i.e. embeddings) impact academic expert retrieval. We use the Microsoft Academic Graph dataset and experiment with different configurations of a…

    Submitted 16 October, 2020; originally announced October 2020.

    Comments: To be published in the Scholarly Document Processing 2020 Workshop @ EMNLP 2020 proceedings

  7. arXiv:cs/0007018 [pdf, ps, other]

    cs.CL

    Bootstrapping a Tagged Corpus through Combination of Existing Heterogeneous Taggers

    Authors: Jakub Zavrel, Walter Daelemans

    Abstract: This paper describes a new method, Combi-bootstrap, to exploit existing taggers and lexical resources for the annotation of corpora with new tagsets. Combi-bootstrap uses existing resources as features for a second level machine learning module, that is trained to make the mapping to the new tagset on a very small sample of annotated corpus material. Experiments show that Combi-bootstrap: i) can…

    Submitted 13 July, 2000; originally announced July 2000.

    Comments: 4 pages

    ACM Class: I.2.7; I.2.6

    Journal ref: Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC 2000), pp. 17–20

  8. arXiv:cs/9812021 [pdf, ps, other]

    cs.CL cs.LG

    Forgetting Exceptions is Harmful in Language Learning

    Authors: Walter Daelemans, Antal van den Bosch, Jakub Zavrel

    Abstract: We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, part-of-speech tagging, prepositional-phrase attachment, and base noun phrase chunking. In a first s…

    Submitted 22 December, 1998; originally announced December 1998.

    Comments: 31 pages, 7 figures, 10 tables. uses 11pt, fullname, a4wide tex styles. Pre-print version of article to appear in Machine Learning 11:1-3, Special Issue on Natural Language Learning. Figures on page 22 slightly compressed to avoid page overload

    ACM Class: I.2.6; I.2.7

  9. Improving Data Driven Wordclass Tagging by System Combination

    Authors: Hans van Halteren, Jakub Zavrel, Walter Daelemans

    Abstract: In this paper we examine how the differences in modelling between different data driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic wordclass tagging. Four well-known tagger generators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximu…

    Submitted 31 July, 1998; originally announced July 1998.

    Comments: 7 pages, LaTeX, uses acl.bst, colacl.sty

    Journal ref: Proceedings of the 17th International Conference on Computational Linguistics (COLING-ACL'98)

  10. Memory-Based Learning: Using Similarity for Smoothing

    Authors: Jakub Zavrel, Walter Daelemans

    Abstract: This paper analyses the relation between the use of similarity in Memory-Based Learning and the notion of backed-off smoothing in statistical language modeling. We show that the two approaches are closely related, and we argue that feature weighting methods in the Memory-Based paradigm can offer the advantage of automatically specifying a suitable domain-specific hierarchy between most specific…

    Submitted 12 May, 1997; originally announced May 1997.

    Comments: 8 pages, uses aclap.sty, To appear in Proc. ACL/EACL 97

    Report number: ILK-9702

  11. MBT: A Memory-Based Part of Speech Tagger-Generator

    Authors: Walter Daelemans, Jakub Zavrel, Peter Berck, Steven Gillis

    Abstract: We introduce a memory-based approach to part of speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning. The part of speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger…

    Submitted 11 July, 1996; originally announced July 1996.

    Comments: 14 pages, 2 Postscript figures

    Journal ref: Proceedings WVLC, Copenhagen