
Showing 1–15 of 15 results for author: Smrz, P

  1. arXiv:2405.01930  [pdf, other]

    cs.CL

    OARelatedWork: A Large-Scale Dataset of Related Work Sections with Full-texts from Open Access Sources

    Authors: Martin Docekal, Martin Fajcik, Pavel Smrz

    Abstract: This paper introduces OARelatedWork, the first large-scale multi-document summarization dataset for related work generation containing whole related work sections and full-texts of cited papers. The dataset includes 94 450 papers and 5 824 689 unique referenced papers. It was designed for the task of automatically generating related work to shift the field toward generating entire related work sec…

    Submitted 3 May, 2024; originally announced May 2024.

  2. arXiv:2306.08464  [pdf, other]

    cs.RO cs.HC

    ARCOR2: Framework for Collaborative End-User Management of Industrial Robotic Workplaces using Augmented Reality

    Authors: Michal Kapinus, Zdeněk Materna, Daniel Bambušek, Vítězslav Beran, Pavel Smrž

    Abstract: This paper presents a novel framework enabling end-users to perform the management of complex robotic workplaces using a tablet and augmented reality. The framework allows users to commission the workplace comprising different types of robots, machines, or services irrespective of the vendor, set task-important points in space, specify program steps, generate a code, and control its execution. Mor…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: Submitted to Journal of Intelligent and Robotic Systems

  3. IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach

    Authors: Sergio Burdisso, Juan Zuluaga-Gomez, Esau Villatoro-Tello, Martin Fajcik, Muskaan Singh, Pavel Smrz, Petr Motlicek

    Abstract: In this paper, we describe our participation in Subtask 1 of CASE-2022, Event Causality Identification with Causal News Corpus. We address the Causal Relation Identification (CRI) task by exploiting a set of simple yet complementary techniques for fine-tuning language models (LMs) on a small number of annotated examples (i.e., a few-shot configuration). We follow a prompt-based prediction appr…

    Submitted 14 October, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

    Comments: To be published in CASE@EMNLP 2022 (5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text)

    Journal ref: CASE @ EMNLP 2022

  4. arXiv:2209.03891  [pdf, other]

    cs.CL cs.AI

    IDIAPers @ Causal News Corpus 2022: Extracting Cause-Effect-Signal Triplets via Pre-trained Autoregressive Language Model

    Authors: Martin Fajcik, Muskaan Singh, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Sergio Burdisso, Petr Motlicek, Pavel Smrz

    Abstract: In this paper, we describe our shared task submissions for Subtask 2 in CASE-2022, Event Causality Identification with Causal News Corpus. The challenge focused on the automatic detection of all cause-effect-signal spans present in a sentence from news media. We detect cause-effect-signal spans in a sentence using T5 -- a pre-trained autoregressive language model. We iteratively identify all cau…

    Submitted 20 October, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

    Comments: Camera-ready for CASE@EMNLP

  5. arXiv:2207.14116  [pdf, other]

    cs.CL cs.AI

    Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction

    Authors: Martin Fajcik, Petr Motlicek, Pavel Smrz

    Abstract: We present Claim-Dissector: a novel latent variable model for fact-checking and analysis, which, given a claim and a set of retrieved evidences, jointly learns to identify: (i) the evidences relevant to the given claim, (ii) the veracity of the claim. We propose to disentangle the per-evidence relevance probability and its contribution to the final veracity probability in an interpretable way -- the…

    Submitted 7 August, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

    Comments: updated acknowledgement

  6. Query-Based Keyphrase Extraction from Long Documents

    Authors: Martin Docekal, Pavel Smrz

    Abstract: Transformer-based architectures in natural language processing impose input size limits that can be problematic when long documents need to be processed. This paper overcomes this issue for keyphrase extraction by chunking the long documents while keeping a global context as a query defining the topic for which relevant keyphrases should be extracted. The developed system employs a pre-trained BERT…

    Submitted 11 May, 2022; originally announced May 2022.

    Journal ref: The International FLAIRS Conference Proceedings. 35, (May 2022)

  7. arXiv:2109.03502  [pdf, other]

    cs.CL cs.IR cs.LG

    R2-D2: A Modular Baseline for Open-Domain Question Answering

    Authors: Martin Fajcik, Martin Docekal, Karel Ondrej, Pavel Smrz

    Abstract: This work presents a novel four-stage open-domain QA pipeline R2-D2 (Rank twice, reaD twice). The pipeline is composed of a retriever, passage reranker, extractive reader, generative reader and a mechanism that aggregates the final prediction from all of the system's components. We demonstrate its strength across three open-domain QA datasets: NaturalQuestions, TriviaQA and EfficientQA, surpassing state-…

    Submitted 8 September, 2021; originally announced September 2021.

    Comments: Accepted to Findings of EMNLP'21. arXiv admin note: substantial text overlap with arXiv:2102.10697

  8. arXiv:2102.10697  [pdf, other]

    cs.CL cs.AI cs.LG

    Pruning the Index Contents for Memory Efficient Open-Domain QA

    Authors: Martin Fajcik, Martin Docekal, Karel Ondrej, Pavel Smrz

    Abstract: This work presents a novel pipeline that demonstrates what is achievable with a combined effort of state-of-the-art approaches. Specifically, it proposes the novel R2-D2 (Rank twice, reaD twice) pipeline composed of retriever, passage reranker, extractive reader, generative reader and a simple way to combine them. Furthermore, previous work often comes with a massive index of external documents th…

    Submitted 9 April, 2021; v1 submitted 21 February, 2021; originally announced February 2021.

    Comments: v2 - added connection between pruner and DPR, results on TriviaQA, new reranker, results with HN-DPR checkpoint and additional analyses

  9. arXiv:2101.00133  [pdf, other]

    cs.CL cs.AI

    NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

    Authors: Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, et al. (28 additional authors not shown)

    Abstract: We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers. The aim of the competition was to build systems that can predict correct answers while also satisfying strict on-disk memory budgets. These memory budgets were designed to encourage conte…

    Submitted 19 September, 2021; v1 submitted 31 December, 2020; originally announced January 2021.

    Comments: 26 pages; Published in Proceedings of Machine Learning Research (PMLR), NeurIPS 2020 Competition and Demonstration Track

  10. arXiv:2008.12804  [pdf, other]

    cs.CL cs.AI cs.LG

    Rethinking the Objectives of Extractive Question Answering

    Authors: Martin Fajcik, Josef Jon, Pavel Smrz

    Abstract: This work demonstrates that using an objective with the independence assumption for modelling the span probability $P(a_s,a_e) = P(a_s)P(a_e)$ of a span starting at position $a_s$ and ending at position $a_e$ has adverse effects. Therefore, we propose multiple approaches to modelling the joint probability $P(a_s,a_e)$ directly. Among those, we propose a compound objective, composed from the joint probabilit…

    Submitted 12 October, 2021; v1 submitted 28 August, 2020; originally announced August 2020.

    Comments: camera-ready version accepted to MRQA'21

  11. arXiv:2008.11053  [pdf, ps, other]

    cs.CL

    JokeMeter at SemEval-2020 Task 7: Convolutional humor

    Authors: Martin Docekal, Martin Fajcik, Josef Jon, Pavel Smrz

    Abstract: This paper describes our system designed for humor evaluation in SemEval-2020 Task 7. The system is based on a convolutional neural network architecture. We evaluate the system on the official dataset and provide more insight into the model itself to see how the learned inner features look.

    Submitted 25 August, 2020; originally announced August 2020.

  12. arXiv:2008.07259  [pdf, ps, other]

    cs.CL

    BUT-FIT at SemEval-2020 Task 4: Multilingual commonsense

    Authors: Josef Jon, Martin Fajčík, Martin Dočekal, Pavel Smrž

    Abstract: This paper describes the work of the BUT-FIT team at SemEval 2020 Task 4 - Commonsense Validation and Explanation. We participated in all three subtasks. In subtasks A and B, our submissions are based on pretrained language representation models (namely ALBERT) and data augmentation. We experimented with solving the task for another language, Czech, by means of multilingual models and machine transl…

    Submitted 21 August, 2020; v1 submitted 17 August, 2020; originally announced August 2020.

  13. arXiv:2007.14128  [pdf, other]

    cs.CL cs.LG stat.ML

    BUT-FIT at SemEval-2020 Task 5: Automatic detection of counterfactual statements with deep pre-trained language representation models

    Authors: Martin Fajcik, Josef Jon, Martin Docekal, Pavel Smrz

    Abstract: This paper describes BUT-FIT's submission at SemEval-2020 Task 5: Modelling Causal Reasoning in Language: Detecting Counterfactuals. The challenge focused on detecting whether a given statement contains a counterfactual (Subtask 1) and extracting both antecedent and consequent parts of the counterfactual from the text (Subtask 2). We experimented with various state-of-the-art language representati…

    Submitted 28 July, 2020; originally announced July 2020.

  14. arXiv:1902.10126  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    BUT-FIT at SemEval-2019 Task 7: Determining the Rumour Stance with Pre-Trained Deep Bidirectional Transformers

    Authors: Martin Fajcik, Lukáš Burget, Pavel Smrz

    Abstract: This paper describes our system submitted to SemEval 2019 Task 7: RumourEval 2019: Determining Rumour Veracity and Support for Rumours, Subtask A (Gorrell et al., 2019). The challenge focused on classifying whether posts from Twitter and Reddit support, deny, query, or comment on a hidden rumour, the truthfulness of which is the topic of an underlying discussion thread. We formulate the problem as a stan…

    Submitted 21 March, 2019; v1 submitted 25 February, 2019; originally announced February 2019.

    Comments: This work has been submitted to the NAACL SemEval workshop. Work in progress

    Journal ref: Proceedings of the 13th International Workshop on Semantic Evaluation 13 (2019) 1097-1104

  15. arXiv:1803.09810  [pdf, other]

    cs.OH

    Automation of Processor Verification Using Recurrent Neural Networks

    Authors: Martin Fajcik, Marcela Zachariasova, Pavel Smrz

    Abstract: When considering simulation-based verification of processors, the current trend is to generate stimuli using pseudorandom generators (PRGs), apply them to the processor inputs and monitor the achieved coverage of its functionality in order to determine verification completeness. Stimuli can have different forms, for example, they can be represented by bit vectors applied to the input ports of the…

    Submitted 6 March, 2018; originally announced March 2018.

    Comments: Paper contains 6 pages, 6 figures. Presented at MTVCon 2017. Soon to be released by IEEE