
Showing 1–22 of 22 results for author: Yehudai, A

  1. arXiv:2408.14175  [pdf, other]

    cs.PL

    MetaFFI -- Multilingual Indirect Interoperability System

    Authors: Tsvi Cherny-Shahar, Amiram Yehudai

    Abstract: The development of software applications using multiple programming languages has increased in recent years, as it allows the selection of the most suitable language and runtime for each component of the system and the integration of third-party libraries. However, this practice involves complexity and error proneness, due to the absence of an adequate system for the interoperability of multiple p…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 46 pages, 3 figures

  2. arXiv:2407.13696  [pdf, other]

    cs.CL

    Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation

    Authors: Yotam Perlitz, Ariel Gera, Ofir Arviv, Asaf Yehudai, Elron Bandel, Eyal Shnarch, Michal Shmueli-Scheuer, Leshem Choshen

    Abstract: Recent advancements in Language Models (LMs) have catalyzed the creation of multiple benchmarks, designed to assess these models' general capabilities. A crucial task, however, is assessing the validity of the benchmarks themselves. This is most commonly done via Benchmark Agreement Testing (BAT), where new benchmarks are validated against established ones using some agreement metric (e.g., rank c…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Under Review

  3. arXiv:2406.00787  [pdf, other]

    cs.CL

    Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation

    Authors: Bar Iluz, Yanai Elazar, Asaf Yehudai, Gabriel Stanovsky

    Abstract: Most works on gender bias focus on intrinsic bias -- removing traces of information about a protected group from the model's internal representation. However, these works are often disconnected from the impact of such debiasing on downstream applications, which is the main motivation for debiasing in the first place. In this work, we systematically test how methods for intrinsic debiasing affect n…

    Submitted 2 June, 2024; originally announced June 2024.

  4. arXiv:2405.14863  [pdf, other]

    cs.CL cs.AI cs.LG

    A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns

    Authors: Asaf Yehudai, Taelin Karidi, Gabriel Stanovsky, Ariel Goldstein, Omri Abend

    Abstract: Cross-domain alignment refers to the task of mapping a concept from one domain to another. For example, "If a doctor were a color, what color would it be?" This seemingly peculiar task is designed to investigate how people represent concrete and abstract concepts through their mappings between categories and their reasoning processes over those mappings. In this paper, we adap…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: CogSci

  5. arXiv:2404.12365  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes

    Authors: Asaf Yehudai, Elron Bendel

    Abstract: We present FastFit, a method and a Python package designed to provide fast and accurate few-shot classification, especially for scenarios with many semantically similar classes. FastFit utilizes a novel approach integrating batch contrastive learning and token-level similarity score. Compared to existing few-shot learning packages, such as SetFit, Transformers, or few-shot prompting of large langua…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL

  6. arXiv:2401.14367  [pdf, other]

    cs.CL cs.AI cs.LG

    Genie: Achieving Human Parity in Content-Grounded Datasets Generation

    Authors: Asaf Yehudai, Boaz Carmeli, Yosi Mass, Ofir Arviv, Nathaniel Mills, Assaf Toledo, Eyal Shnarch, Leshem Choshen

    Abstract: The lack of high-quality data for content-grounded generation tasks has been identified as a major obstacle to advancing these tasks. To address this gap, we propose Genie, a novel method for automatically generating high-quality content-grounded data. It consists of three stages: (a) Content Preparation, (b) Generation: creating task-specific examples from the content (e.g., question-answer pairs…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR24

  7. arXiv:2303.01593  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    QAID: Question Answering Inspired Few-shot Intent Detection

    Authors: Asaf Yehudai, Matan Vetzler, Yosi Mass, Koren Lazar, Doron Cohen, Boaz Carmeli

    Abstract: Intent detection with semantically similar fine-grained intents is a challenging task. To address it, we reformulate intent detection as a question-answering retrieval task by treating utterances and intent names as questions and answers. To that end, we utilize a question-answering retrieval architecture and adopt a two-stage training schema with batch contrastive loss. In the pre-training stage…

    Submitted 21 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: ICLR paper

  8. arXiv:2302.08464  [pdf, other]

    cs.CL cs.AI cs.LG

    Evaluating and Improving the Coreference Capabilities of Machine Translation Models

    Authors: Asaf Yehudai, Arie Cattan, Omri Abend, Gabriel Stanovsky

    Abstract: Machine translation (MT) requires a wide range of linguistic capabilities, which current end-to-end models are expected to learn implicitly by observing aligned sentences in bilingual corpora. In this work, we ask: How well do MT models learn coreference resolution from implicit signal? To answer this question, we develop an evaluation methodology that derives coreference clusters from MT o…

    Submitted 16 February, 2023; originally announced February 2023.

    Comments: EACL paper

  9. arXiv:2210.03053  [pdf, other]

    cs.CL cs.AI cs.LG

    Reinforcement Learning with Large Action Spaces for Neural Machine Translation

    Authors: Asaf Yehudai, Leshem Choshen, Lior Fox, Omri Abend

    Abstract: Applying reinforcement learning (RL) following maximum likelihood estimation (MLE) pre-training is a versatile method for enhancing neural machine translation (NMT) performance. However, recent work has argued that the gains produced by RL for NMT are mostly due to promoting tokens that have already received a fairly high probability in pre-training. We hypothesize that the large action space is a…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: Accepted for Coling

  10. arXiv:2112.07308  [pdf, other]

    cs.CL

    Conversational Search with Mixed-Initiative -- Asking Good Clarification Questions backed-up by Passage Retrieval

    Authors: Yosi Mass, Doron Cohen, Asaf Yehudai, David Konopnicki

    Abstract: We deal with the scenario of conversational search, where user queries are under-specified or ambiguous. This calls for a mixed-initiative setup: user-asks (queries) and system-answers, as well as system-asks (clarification questions) and user responses, in order to clarify the user's information needs. We focus on the task of selecting the next clarification question, given the conversation context. Our…

    Submitted 23 May, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

  11. arXiv:2109.04513  [pdf, other]

    cs.CL

    Filling the Gaps in Ancient Akkadian Texts: A Masked Language Modelling Approach

    Authors: Koren Lazar, Benny Saret, Asaf Yehudai, Wayne Horowitz, Nathan Wasserman, Gabriel Stanovsky

    Abstract: We present models which complete missing text given transliterations of ancient Mesopotamian documents, originally written on cuneiform clay tablets (2500 BCE - 100 CE). Due to the tablets' deterioration, scholars often rely on contextual cues to manually fill in missing parts in the text in a subjective and time-consuming process. We identify that this challenge can be formulated as a masked lang…

    Submitted 24 October, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: Accepted to EMNLP 2021 (Main Conference)

  12. arXiv:1910.08908  [pdf, ps, other]

    cs.SE

    Processing Large Datasets of Fined Grained Source Code Changes

    Authors: Stanislav Levin, Amiram Yehudai

    Abstract: In the era of Big Code, when researchers seek to study an increasingly large number of repositories to support their findings, the data processing stage may require manipulating millions of records or more. In this work we focus on studies involving fine-grained AST-level source code changes. We present how we extended the CodeDistillery source code mining framework with data manipulation capab…

    Submitted 20 October, 2019; originally announced October 2019.

    Comments: Preprint

  13. arXiv:1910.08907  [pdf, other]

    cs.SE

    Visually Exploring Software Maintenance Activities

    Authors: Stanislav Levin, Amiram Yehudai

    Abstract: Lehman's Laws teach us that a software system will become progressively less satisfying to its users over time, unless it is continually adapted to meet new needs. A line of previous works sought to better understand software maintenance by studying how commits can be classified into three main software maintenance activities. Corrective: fault fixing; Perfective: system improvements; Adaptive: ne…

    Submitted 20 October, 2019; originally announced October 2019.

    Comments: Preprint

  14. arXiv:1903.04909  [pdf, other]

    cs.SE

    Towards Software Analytics: Modeling Maintenance Activities

    Authors: Stanislav Levin, Amiram Yehudai

    Abstract: Lehman's Laws teach us that a software system will become progressively less satisfying to its users over time, unless it is continually adapted to meet new needs. Understanding software maintenance can potentially relieve many of the pains currently experienced by practitioners in the industry and assist in reducing uncertainty, improving cost-effectiveness, reliability and more. The research com…

    Submitted 9 March, 2019; originally announced March 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1711.05340

  15. arXiv:1711.05340  [pdf, other]

    cs.SE

    Boosting Automatic Commit Classification Into Maintenance Activities By Utilizing Source Code Changes

    Authors: Stanislav Levin, Amiram Yehudai

    Abstract: Background: Understanding maintenance activities performed in a source code repository could help practitioners reduce uncertainty and improve cost-effectiveness by planning ahead and pre-allocating resources towards source code maintenance. The research community uses 3 main classification categories for maintenance activities: Corrective: fault fixing; Perfective: system improvements; Adaptive:…

    Submitted 14 November, 2017; originally announced November 2017.

    Comments: postprint, PROMISE 2017

  16. arXiv:1709.09029  [pdf, other]

    cs.SE

    The Co-Evolution of Test Maintenance and Code Maintenance through the lens of Fine-Grained Semantic Changes

    Authors: Stanislav Levin, Amiram Yehudai

    Abstract: Automatic testing is a widely adopted technique for improving software quality. Software developers add, remove and update test methods and test classes as part of the software development process as well as during the evolution phase, following the initial release. In this work we conduct a large scale study of 61 popular open source projects and report the relationships we have established betwe…

    Submitted 26 September, 2017; originally announced September 2017.

    Comments: postprint, ICSME 2017

  17. arXiv:1611.10053  [pdf, other]

    cs.SE

    Using Temporal and Semantic Developer-Level Information to Predict Maintenance Activity Profiles

    Authors: Stanislav Levin, Amiram Yehudai

    Abstract: Predictive models for software projects' characteristics have traditionally been based on project-level metrics, employing little developer-level information, or none at all. In this work we suggest novel metrics that capture temporal and semantic developer-level information collected on a per-developer basis. To address the scalability challenges involved in computing these metrics for each…

    Submitted 30 November, 2016; originally announced November 2016.

    Comments: Postprint, ICSME 2016 proceedings

  18. arXiv:1508.01872  [pdf, other]

    cs.SE

    Alleviating Merge Conflicts with Fine-grained Visual Awareness

    Authors: Stanislav Levin, Amiram Yehudai

    Abstract: Merge conflicts created by software team members working on the same code can be costly to resolve, and adversely affect productivity. In this work, we suggest the approach of fine-grained merge conflict awareness, where software team members are notified of potential merge conflicts via graphical decoration of the relevant semantic elements, in near real-time. The novelty of this approach is that…

    Submitted 8 August, 2015; originally announced August 2015.

  19. arXiv:1505.01286  [pdf, other]

    cs.SE

    Localization of real world regression Bugs using single execution

    Authors: Dekel Cohen, Amiram Yehudai

    Abstract: Regression bugs occur whenever software functionality that previously worked as desired stops working, or no longer works as expected. Code changes, such as bug fixes or new feature work, may result in a regression bug. Regression bugs are an annoying and painful phenomenon in the software development process, requiring a great deal of effort to localize, effectively hindering team progress. In thi…

    Submitted 6 May, 2015; originally announced May 2015.

  20. arXiv:1504.06742  [pdf, other]

    cs.SE

    Improving software team collaboration with Synchronized Software Development

    Authors: Stanislav Levin, Amiram Yehudai

    Abstract: Effective collaboration is a key factor in the success of a software project developed by a team. In this work, we suggest the approach of Synchronized Software Development (SSD), which promotes a new mechanism of collaboration in general, and for code synchronization in particular. In SSD, code changes made by one developer are automatically propagated to others as long as they keep the code free…

    Submitted 28 April, 2015; v1 submitted 25 April, 2015; originally announced April 2015.

    Comments: This paper was written in 2012; a footnote acknowledging ISF's support was added. arXiv admin note: text overlap with arXiv:1504.06741

  21. arXiv:1504.06741  [pdf, other]

    cs.SE

    Collaborative Real Time Coding or How to Avoid the Dreaded Merge

    Authors: Stanislav Levin, Amiram Yehudai

    Abstract: Software engineers who collaborate to develop software in teams often have to manually merge changes they made to a module (e.g. a class), because the change conflicts with one that has just been made by another engineer to the same or another module (e.g. a supplier class). This is due to the fact that engineers edit code separately, and loosely coordinate their work via a source control or a sof…

    Submitted 25 April, 2015; originally announced April 2015.

    Comments: This paper was written in 2011

  22. arXiv:1409.0982  [pdf, ps, other]

    cs.SE

    Taming the Concurrency: Controlling Concurrent Behavior while Testing Multithreaded Software

    Authors: Evgeny Vainer, Amiram Yehudai

    Abstract: Developing multithreaded software is an extremely challenging task, even for experienced programmers. The challenge does not end after the code is written. There are other tasks associated with a development process that become exceptionally hard in a multithreaded environment. A good example of this is creating unit tests for concurrent data structures. In addition to the desired test logic, such…

    Submitted 3 September, 2014; originally announced September 2014.

    ACM Class: D.2.5; D.3.3