Showing 1–10 of 10 results for author: Perlitz, Y

  1. arXiv:2408.12259

    cs.AI

    Can You Trust Your Metric? Automatic Concatenation-Based Tests for Metric Validity

    Authors: Ora Nova Fandina, Leshem Choshen, Eitan Farchi, George Kour, Yotam Perlitz, Orna Raz

    Abstract: Consider a scenario where a harmfulness detection metric is employed by a system to filter unsafe responses generated by a Large Language Model. When analyzing individual harmful and unethical prompt-response pairs, the metric correctly classifies each pair as highly unsafe, assigning the highest score. However, when these same prompts and responses are concatenated, the metric's decision flips, a…

    Submitted 22 August, 2024; originally announced August 2024.

    MSC Class: 68T50

  2. arXiv:2407.13696

    cs.CL

    Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation

    Authors: Yotam Perlitz, Ariel Gera, Ofir Arviv, Asaf Yehudai, Elron Bandel, Eyal Shnarch, Michal Shmueli-Scheuer, Leshem Choshen

    Abstract: Recent advancements in Language Models (LMs) have catalyzed the creation of multiple benchmarks, designed to assess these models' general capabilities. A crucial task, however, is assessing the validity of the benchmarks themselves. This is most commonly done via Benchmark Agreement Testing (BAT), where new benchmarks are validated against established ones using some agreement metric (e.g., rank c…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Under Review

  3. arXiv:2404.18923

    cs.CL

    Holmes: Benchmark the Linguistic Competence of Language Models

    Authors: Andreas Waldis, Yotam Perlitz, Leshem Choshen, Yufang Hou, Iryna Gurevych

    Abstract: We introduce Holmes, a benchmark to assess the linguistic competence of language models (LMs) - their ability to grasp linguistic phenomena. Unlike prior prompting-based evaluations, Holmes assesses the linguistic competence of LMs via their internal representations using classifier-based probing. In doing so, we disentangle specific phenomena (e.g., part-of-speech of words) from other cognitive a…

    Submitted 22 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  4. arXiv:2401.14019

    cs.CL cs.AI

    Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI

    Authors: Elron Bandel, Yotam Perlitz, Elad Venezian, Roni Friedman-Melamed, Ofir Arviv, Matan Orbach, Shachar Don-Yehyia, Dafna Sheinwald, Ariel Gera, Leshem Choshen, Michal Shmueli-Scheuer, Yoav Katz

    Abstract: In the dynamic landscape of generative NLP, traditional text processing pipelines limit research flexibility and reproducibility, as they are tailored to specific dataset, task, and model combinations. The escalating complexity, involving system prompts, model-specific formats, instructions, and more, calls for a shift to a structured, modular, and customizable solution. Addressing this need, we p…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Submitted to NAACL demo track

  5. arXiv:2308.11696

    cs.CL cs.AI cs.CV cs.LG

    Efficient Benchmarking of Language Models

    Authors: Yotam Perlitz, Elron Bandel, Ariel Gera, Ofir Arviv, Liat Ein-Dor, Eyal Shnarch, Noam Slonim, Michal Shmueli-Scheuer, Leshem Choshen

    Abstract: The increasing versatility of language models (LMs) has given rise to a new class of benchmarks that comprehensively assess a broad range of capabilities. Such benchmarks are associated with massive computational costs, extending to thousands of GPU hours per model. However, the efficiency aspect of these evaluation efforts had raised little discussion in the literature. In this work, we present t…

    Submitted 1 April, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: Accepted to NAACL main track

  6. arXiv:2305.15040

    cs.CL

    Active Learning for Natural Language Generation

    Authors: Yotam Perlitz, Ariel Gera, Michal Shmueli-Scheuer, Dafna Sheinwald, Noam Slonim, Liat Ein-Dor

    Abstract: The field of Natural Language Generation (NLG) suffers from a severe shortage of labeled data due to the extremely expensive and time-consuming process involved in manual annotation. A natural approach for coping with this problem is active learning (AL), a well-known machine learning technique for improving annotation efficiency by selectively choosing the most informative examples to label. Howe…

    Submitted 17 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to EMNLP2023 as a long paper

  7. arXiv:2211.04417

    cs.CL

    nBIIG: A Neural BI Insights Generation System for Table Reporting

    Authors: Yotam Perlitz, Dafna Sheinwald, Noam Slonim, Michal Shmueli-Scheuer

    Abstract: We present nBIIG, a neural Business Intelligence (BI) Insights Generation system. Given a table, our system applies various analyses to create corresponding RDF representations, and then uses a neural model to generate fluent textual insights out of these representations. The generated insights can be used by an analyst, via a human-in-the-loop paradigm, to enhance the task of creating compelling…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: Accepted to AAAI-23

  8. arXiv:2210.17541

    cs.CL cs.LG

    Zero-Shot Text Classification with Self-Training

    Authors: Ariel Gera, Alon Halfon, Eyal Shnarch, Yotam Perlitz, Liat Ein-Dor, Noam Slonim

    Abstract: Recent advances in large pretrained language models have increased attention to zero-shot text classification. In particular, models finetuned on natural language inference datasets have been widely adopted as zero-shot classifiers due to their promising results and off-the-shelf availability. However, the fact that such models are unfamiliar with the target task can lead to instability and perfor…

    Submitted 31 October, 2022; originally announced October 2022.

    Comments: 9 pages, 5 figures; To be published in EMNLP 2022

  9. arXiv:2205.10938

    cs.CL

    Diversity Enhanced Table-to-Text Generation via Type Control

    Authors: Yotam Perlitz, Liat Ein-Dor, Dafna Sheinwald, Noam Slonim, Michal Shmueli-Scheuer

    Abstract: Generating natural language statements to convey logical inferences from tabular data (i.e., Logical NLG) is a process with one input and a variety of valid outputs. This characteristic underscores the need for a method to produce a diverse set of valid outputs, presenting different perspectives of the input data. We propose a simple yet effective diversity-enhancing scheme that builds upon an inh…

    Submitted 30 May, 2023; v1 submitted 22 May, 2022; originally announced May 2022.

    Comments: 4 pages, 4 figures

  10. arXiv:2107.10050

    cs.CV

    You Better Look Twice: a new perspective for designing accurate detectors with reduced computations

    Authors: Alexandra Dana, Maor Shutman, Yotam Perlitz, Ran Vitek, Tomer Peleg, Roy J Jevnisek

    Abstract: General object detectors use powerful backbones that uniformly extract features from images for enabling detection of a vast amount of object types. However, utilization of such backbones in object detection applications developed for specific object types can unnecessarily over-process an extensive amount of background. In addition, they are agnostic to object scales, thus redundantly process all…

    Submitted 3 August, 2021; v1 submitted 21 July, 2021; originally announced July 2021.