Search | arXiv e-print repository

MetaFFI -- Multilingual Indirect Interoperability System

Authors: Tsvi Cherny-Shahar, Amiram Yehudai

Abstract: The development of software applications using multiple programming languages has increased in recent years, as it allows the selection of the most suitable language and runtime for each component of the system and the integration of third-party libraries. However, this practice involves complexity and error proneness, due to the absence of an adequate system for the interoperability of multiple programming languages. Developers are compelled to resort to workarounds, such as library reimplementation or language-specific wrappers, which are often dependent on C as the common denominator for interoperability. These challenges render the use of multiple programming languages a burdensome and demanding task that necessitates highly skilled developers for implementation, debugging, and maintenance, and raise doubts about the benefits of interoperability. To overcome these challenges, we propose MetaFFI, a pluggable in-process indirect-interoperability system that allows the loading and utilization of entities from multiple programming languages. This is achieved by exploiting the less restrictive shallow binding mechanisms (e.g., Foreign Function Interface) to offer deep binding features (e.g., object creation, methods, fields). MetaFFI provides a runtime-independent framework to load and xcall (Cross-Call) foreign entities (e.g., functions, objects). MetaFFI uses Common Data Types (CDTs) to pass parameters and return values, including objects and complex types, and even cross-language callbacks. The indirect interoperability approach of MetaFFI has the significant advantage of requiring only $2n$ mechanisms to support $n$ languages, as opposed to the direct interoperability approaches that need $n^2$ mechanisms. We have successfully tested the binding between Go, Python 3.11, and Java in a proof-of-concept on Windows and Ubuntu.

Submitted 26 August, 2024; originally announced August 2024.

Comments: 46 pages, 3 figures
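
As a quick check on the $2n$ versus $n^2$ claim in the MetaFFI abstract above, the following sketch tabulates both counts. The function names are hypothetical and the formulas are taken literally from the abstract; nothing here reflects how MetaFFI itself is implemented.

```python
def indirect_mechanisms(n: int) -> int:
    """Mechanism count quoted in the abstract for indirect interoperability: 2n."""
    return 2 * n


def direct_mechanisms(n: int) -> int:
    """Mechanism count quoted in the abstract for direct interoperability: n^2."""
    return n * n


# Print n, the indirect count, and the direct count for a few language-set sizes.
for n in (3, 5, 10):
    print(n, indirect_mechanisms(n), direct_mechanisms(n))
```

For the three languages in the proof-of-concept (Go, Python, Java) the two counts are 6 and 9; the gap widens as more languages are added.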

arXiv:2407.13696 [pdf, other]

Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation

Authors: Yotam Perlitz, Ariel Gera, Ofir Arviv, Asaf Yehudai, Elron Bandel, Eyal Shnarch, Michal Shmueli-Scheuer, Leshem Choshen

Abstract: Recent advancements in Language Models (LMs) have catalyzed the creation of multiple benchmarks, designed to assess these models' general capabilities. A crucial task, however, is assessing the validity of the benchmarks themselves. This is most commonly done via Benchmark Agreement Testing (BAT), where new benchmarks are validated against established ones using some agreement metric (e.g., rank correlation). Despite the crucial role of BAT for benchmark builders and consumers, there are no standardized procedures for such agreement testing. This deficiency can lead to invalid conclusions, fostering mistrust in benchmarks and upending the ability to properly choose the appropriate benchmark to use. By analyzing over 40 prominent benchmarks, we demonstrate how some overlooked methodological choices can significantly influence BAT results, potentially undermining the validity of conclusions. To address these inconsistencies, we propose a set of best practices for BAT and demonstrate how utilizing these methodologies greatly improves BAT robustness and validity. To foster adoption and facilitate future research, we introduce BenchBench, a Python package for BAT, and release the BenchBench-leaderboard, a meta-benchmark designed to evaluate benchmarks using their peers. Our findings underscore the necessity for standardized BAT, ensuring the robustness and validity of benchmark evaluations in the evolving landscape of language model research. BenchBench Package: https://github.com/IBM/BenchBench Leaderboard: https://huggingface.co/spaces/per/BenchBench

Submitted 18 July, 2024; originally announced July 2024.

Comments: Under Review
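
The abstract above names rank correlation as a typical agreement metric for BAT. A minimal sketch of one such metric, Kendall's tau-a, is shown below in plain Python. It is not the BenchBench API; the function name, the no-ties assumption, and the scores are all illustrative.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score lists over the same models (assumes no ties)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        # Positive product: both benchmarks order models i and j the same way.
        direction = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up scores for four hypothetical models on two hypothetical benchmarks.
new_benchmark = [0.71, 0.64, 0.58, 0.40]
established_benchmark = [0.80, 0.62, 0.66, 0.35]
print(kendall_tau(new_benchmark, established_benchmark))
```

A value near 1 means the two benchmarks rank the models almost identically. The abstract's point is that choices such as which models and which reference benchmark are used can shift this number considerably.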

arXiv:2406.00787 [pdf, other]

Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation

Authors: Bar Iluz, Yanai Elazar, Asaf Yehudai, Gabriel Stanovsky

Abstract: Most works on gender bias focus on intrinsic bias -- removing traces of information about a protected group from the model's internal representation. However, these works are often disconnected from the impact of such debiasing on downstream applications, which is the main motivation for debiasing in the first place. In this work, we systematically test how methods for intrinsic debiasing affect neural machine translation models, by measuring the extrinsic bias of such systems under different design choices. We highlight three challenges and mismatches between the debiasing techniques and their end-goal usage, including the choice of embeddings to debias, the mismatch between words and sub-word tokens debiasing, and the effect on different target languages. We find that these considerations have a significant impact on downstream performance and the success of debiasing.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.14863 [pdf, other]

A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns

Authors: Asaf Yehudai, Taelin Karidi, Gabriel Stanovsky, Ariel Goldstein, Omri Abend

Abstract: Cross-domain alignment refers to the task of mapping a concept from one domain to another. For example, "If a doctor were a color, what color would it be?". This seemingly peculiar task is designed to investigate how people represent concrete and abstract concepts through their mappings between categories and their reasoning processes over those mappings. In this paper, we adapt this task from cognitive science to evaluate the conceptualization and reasoning abilities of large language models (LLMs) through a behavioral study. We examine several LLMs by prompting them with a cross-domain mapping task and analyzing their responses at both the population and individual levels. Additionally, we assess the models' ability to reason about their predictions by analyzing and categorizing their explanations for these mappings. The results reveal several similarities between humans' and models' mappings and explanations, suggesting that models represent concepts similarly to humans. This similarity is evident not only in the model representation but also in their behavior. Furthermore, the models mostly provide valid explanations and deploy reasoning paths that are similar to those of humans.

Submitted 23 May, 2024; originally announced May 2024.

Comments: CogSci

arXiv:2404.12365 [pdf, other]

When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes

Authors: Asaf Yehudai, Elron Bendel

Abstract: We present FastFit, a method and a Python package designed to provide fast and accurate few-shot classification, especially for scenarios with many semantically similar classes. FastFit utilizes a novel approach integrating batch contrastive learning and token-level similarity scores. Compared to existing few-shot learning packages, such as SetFit, Transformers, or few-shot prompting of large language models via API calls, FastFit significantly improves multiclass classification performance in speed and accuracy across FewMany, our newly curated English benchmark, and Multilingual datasets. FastFit demonstrates a 3-20x improvement in training speed, completing training in just a few seconds. The FastFit package is now available on GitHub and PyPI, presenting a user-friendly solution for NLP practitioners.

Submitted 18 April, 2024; originally announced April 2024.

Comments: Accepted to NAACL

arXiv:2401.14367 [pdf, other]

Genie: Achieving Human Parity in Content-Grounded Datasets Generation

Authors: Asaf Yehudai, Boaz Carmeli, Yosi Mass, Ofir Arviv, Nathaniel Mills, Assaf Toledo, Eyal Shnarch, Leshem Choshen

Abstract: The lack of high-quality data for content-grounded generation tasks has been identified as a major obstacle to advancing these tasks. To address this gap, we propose Genie, a novel method for automatically generating high-quality content-grounded data. It consists of three stages: (a) Content Preparation, (b) Generation: creating task-specific examples from the content (e.g., question-answer pairs or summaries), and (c) Filtering: a mechanism aiming to ensure the quality and faithfulness of the generated data. We showcase this methodology by generating three large-scale synthetic datasets, making wishes, for Long-Form Question-Answering (LFQA), summarization, and information extraction. In a human evaluation, our generated data was found to be natural and of high quality. Furthermore, we compare models trained on our data with models trained on human-written data -- ELI5 and ASQA for LFQA and CNN-DailyMail for Summarization. We show that our models are on par with or outperforming models trained on human-generated data and consistently outperforming them in faithfulness. Finally, we applied our method to create LFQA data within the medical domain and compared a model trained on it with models trained on other domains.

Submitted 25 January, 2024; originally announced January 2024.

Comments: Accepted to ICLR24

arXiv:2303.01593 [pdf, other]

QAID: Question Answering Inspired Few-shot Intent Detection

Authors: Asaf Yehudai, Matan Vetzler, Yosi Mass, Koren Lazar, Doron Cohen, Boaz Carmeli

Abstract: Intent detection with semantically similar fine-grained intents is a challenging task. To address it, we reformulate intent detection as a question-answering retrieval task by treating utterances and intent names as questions and answers. To that end, we utilize a question-answering retrieval architecture and adopt a two-stage training scheme with batch contrastive loss. In the pre-training stage, we improve query representations through self-supervised training. Then, in the fine-tuning stage, we increase contextualized token-level similarity scores between queries and answers from the same intent. Our results on three few-shot intent detection benchmarks achieve state-of-the-art performance.

Submitted 21 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: ICLR paper

arXiv:2302.08464 [pdf, other]

Evaluating and Improving the Coreference Capabilities of Machine Translation Models

Authors: Asaf Yehudai, Arie Cattan, Omri Abend, Gabriel Stanovsky

Abstract: Machine translation (MT) requires a wide range of linguistic capabilities, which current end-to-end models are expected to learn implicitly by observing aligned sentences in bilingual corpora. In this work, we ask: How well do MT models learn coreference resolution from implicit signal? To answer this question, we develop an evaluation methodology that derives coreference clusters from MT output and evaluates them without requiring annotations in the target language. We further evaluate several prominent open-source and commercial MT systems, translating from English to six target languages, and compare them to state-of-the-art coreference resolvers on three challenging benchmarks. Our results show that the monolingual resolvers greatly outperform MT models. Motivated by this result, we experiment with different methods for incorporating the output of coreference resolution models in MT, showing improvement over strong baselines.

Submitted 16 February, 2023; originally announced February 2023.

Comments: EACL paper

arXiv:2210.03053 [pdf, other]

Reinforcement Learning with Large Action Spaces for Neural Machine Translation

Authors: Asaf Yehudai, Leshem Choshen, Lior Fox, Omri Abend

Abstract: Applying reinforcement learning (RL) following maximum likelihood estimation (MLE) pre-training is a versatile method for enhancing neural machine translation (NMT) performance. However, recent work has argued that the gains produced by RL for NMT are mostly due to promoting tokens that have already received a fairly high probability in pre-training. We hypothesize that the large action space is a main obstacle to RL's effectiveness in MT, and conduct two sets of experiments that lend support to our hypothesis. First, we find that reducing the size of the vocabulary improves RL's effectiveness. Second, we find that effectively reducing the dimension of the action space without changing the vocabulary also yields notable improvement as evaluated by BLEU, semantic similarity, and human evaluation. Indeed, by initializing the network's final fully connected layer (that maps the network's internal dimension to the vocabulary dimension) with a layer that generalizes over similar actions, we obtain a substantial improvement in RL performance: 1.5 BLEU points on average.

Submitted 6 October, 2022; originally announced October 2022.

Comments: Accepted for Coling

arXiv:2112.07308 [pdf, other]

Conversational Search with Mixed-Initiative -- Asking Good Clarification Questions backed-up by Passage Retrieval

Authors: Yosi Mass, Doron Cohen, Asaf Yehudai, David Konopnicki

Abstract: We deal with the scenario of conversational search, where user queries are under-specified or ambiguous. This calls for a mixed-initiative setup: user-asks (queries) and system-answers, as well as system-asks (clarification questions) and user responses, in order to clarify the user's information needs. We focus on the task of selecting the next clarification question, given the conversation context. Our method leverages passage retrieval from background content to fine-tune two deep-learning models for ranking candidate clarification questions. We evaluated our method on two different use-cases. The first is an open domain conversational search in a large web collection. The second is a task-oriented customer-support setup. We show that our method performs well on both use-cases.

Submitted 23 May, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

arXiv:2109.04513 [pdf, other]

Filling the Gaps in Ancient Akkadian Texts: A Masked Language Modelling Approach

Authors: Koren Lazar, Benny Saret, Asaf Yehudai, Wayne Horowitz, Nathan Wasserman, Gabriel Stanovsky

Abstract: We present models which complete missing text given transliterations of ancient Mesopotamian documents, originally written on cuneiform clay tablets (2500 BCE - 100 CE). Due to the tablets' deterioration, scholars often rely on contextual cues to manually fill in missing parts in the text in a subjective and time-consuming process. We identify that this challenge can be formulated as a masked language modelling task, used mostly as a pretraining objective for contextualized language models. Following this, we develop several architectures focusing on the Akkadian language, the lingua franca of the time. We find that despite data scarcity (1M tokens) we can achieve state-of-the-art performance on missing-token prediction (89% hit@5) using a greedy decoding scheme and pretraining on data from other languages and different time periods. Finally, we conduct human evaluations showing the applicability of our models in assisting experts to transcribe texts in extinct languages.

Submitted 24 October, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

Comments: Accepted to EMNLP 2021 (Main Conference)

arXiv:1910.08908 [pdf, ps, other]

Processing Large Datasets of Fined Grained Source Code Changes

Authors: Stanislav Levin, Amiram Yehudai

Abstract: In the era of Big Code, when researchers seek to study an increasingly large number of repositories to support their findings, the data processing stage may require manipulating millions of records or more. In this work we focus on studies involving fine-grained AST-level source code changes. We present how we extended the CodeDistillery source code mining framework with data manipulation capabilities, aimed at easing the processing of large datasets of fine-grained source code changes. The capabilities we have introduced allow researchers to highly automate their repository mining process and streamline the data acquisition and processing phases. These capabilities have been successfully used to conduct a number of studies, in the course of which tens of millions of fine-grained source code changes have been processed.

Submitted 20 October, 2019; originally announced October 2019.

Comments: Preprint

arXiv:1910.08907 [pdf, other]

Visually Exploring Software Maintenance Activities

Authors: Stanislav Levin, Amiram Yehudai

Abstract: Lehman's Laws teach us that a software system will become progressively less satisfying to its users over time, unless it is continually adapted to meet new needs. A line of previous works sought to better understand software maintenance by studying how commits can be classified into three main software maintenance activities. Corrective: fault fixing; Perfective: system improvements; Adaptive: new feature introduction. In this work we suggest visualizations for exploring software maintenance activities in both project and individual developer scopes. We demonstrate our approach using a prototype we have built using the Shiny R framework. In addition, we have also published our prototype as an online demo. This demo allows users to explore the maintenance activities of a number of popular open source projects. We believe that the visualizations we provide can assist practitioners in monitoring and maintaining the health of software projects. In particular, they can be useful for identifying general imbalances, peaks, dips, and other anomalies in projects' and developers' maintenance activities.

Submitted 20 October, 2019; originally announced October 2019.

Comments: Preprint

arXiv:1903.04909 [pdf, other]

Towards Software Analytics: Modeling Maintenance Activities

Authors: Stanislav Levin, Amiram Yehudai

Abstract: Lehman's Laws teach us that a software system will become progressively less satisfying to its users over time, unless it is continually adapted to meet new needs. Understanding software maintenance can potentially relieve many of the pains currently experienced by practitioners in the industry and assist in reducing uncertainty, improving cost-effectiveness, reliability and more. The research community classifies software maintenance into 3 main activities: Corrective: fault fixing; Perfective: system improvements; Adaptive: new feature introduction. In this work we seek to model software maintenance activities and design a commit classification method capable of yielding a high quality classification model. We performed a comparative analysis of our method and existing techniques based on 11 popular open source projects from which we had manually classified 1151 commits, over 100 commits from each of the studied projects. The model we devised was able to achieve an accuracy of 76% and Kappa of 63% (considered "Good" in this context) for the test dataset, an improvement of over 20 percentage points, and a relative improvement of ~40% in the context of cross-project classification. We then leverage our commit classification method to demonstrate two applications: (1) a tool aimed at providing an intuitive visualization of software maintenance activities over time, and (2) an in-depth analysis of the relationship between maintenance activities and unit tests.

Submitted 9 March, 2019; originally announced March 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1711.05340

arXiv:1711.05340 [pdf, other]

Boosting Automatic Commit Classification Into Maintenance Activities By Utilizing Source Code Changes

Authors: Stanislav Levin, Amiram Yehudai

Abstract: Background: Understanding maintenance activities performed in a source code repository could help practitioners reduce uncertainty and improve cost-effectiveness by planning ahead and pre-allocating resources towards source code maintenance. The research community uses 3 main classification categories for maintenance activities: Corrective: fault fixing; Perfective: system improvements; Adaptive: new feature introduction. Previous work in this area has mostly concentrated on evaluating commit classification (into maintenance activities) models in the scope of a single software project. Aims: In this work we seek to design a commit classification model capable of providing high accuracy and Kappa across different projects. In addition, we wish to compare the accuracy and Kappa characteristics of classification models that utilize word frequency analysis, source code changes, and a combination thereof. Method: We suggest a novel method for automatically classifying commits into maintenance activities by utilizing source code changes (e.g., statement added, method removed, etc.). The results we report are based on studying 11 popular open source projects from various professional domains from which we had manually classified 1151 commits, over 100 from each of the studied projects. Our models were trained using 85% of the dataset, while the remaining 15% were used as a test set. Results: Our method shows a promising accuracy of 76% and Cohen's kappa of 63% (considered "Good" in this context) for the test dataset, an improvement of over 20 percentage points, and a relative boost of ~40% in the context of cross-project classification. Conclusions: We show that by using source code changes in combination with commit message word frequency analysis we are able to considerably boost classification quality in a project agnostic manner.

Submitted 14 November, 2017; originally announced November 2017.

Comments: postprint, PROMISE 2017
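
The abstract above reports Cohen's kappa alongside accuracy. As a reminder of how the two differ, here is a minimal sketch that computes kappa from a confusion matrix. The function name and the counts are made up for illustration and are not taken from the paper's data.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, columns: predicted)."""
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement is plain accuracy: the diagonal mass.
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Expected agreement by chance, from the row and column marginals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


# Made-up counts for corrective, perfective, and adaptive commits.
confusion = [
    [40, 5, 5],
    [6, 30, 4],
    [4, 6, 20],
]
print(round(cohens_kappa(confusion), 3))
```

Kappa discounts the agreement a classifier would reach by chance given the class distribution, which is why it comes out lower than raw accuracy on the same predictions.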

arXiv:1709.09029 [pdf, other]

The Co-Evolution of Test Maintenance and Code Maintenance through the lens of Fine-Grained Semantic Changes

Authors: Stanislav Levin, Amiram Yehudai

Abstract: Automatic testing is a widely adopted technique for improving software quality. Software developers add, remove and update test methods and test classes as part of the software development process as well as during the evolution phase, following the initial release. In this work we conduct a large scale study of 61 popular open source projects and report the relationships we have established between test maintenance, production code maintenance, and semantic changes (e.g., statement added, method removed, etc.) performed in developers' commits. We build predictive models, and show that the number of tests in a software project can be well predicted by employing code maintenance profiles (i.e., how many commits were performed in each of the maintenance activities: corrective, perfective, adaptive). Our findings also reveal that more often than not, developers perform code fixes without performing complementary test maintenance in the same commit (e.g., update an existing test or add a new one). When developers do perform test maintenance, it is likely to be affected by the semantic changes they perform as part of their commit. Our work is based on studying 61 popular open source projects, comprising over 240,000 commits consisting of over 16,000,000 semantic change type instances, performed by over 4,000 software engineers.

Submitted 26 September, 2017; originally announced September 2017.

Comments: postprint, ICSME 2017

arXiv:1611.10053 [pdf, other]

Using Temporal and Semantic Developer-Level Information to Predict Maintenance Activity Profiles

Authors: Stanislav Levin, Amiram Yehudai

Abstract: Predictive models for software projects' characteristics have been traditionally based on project-level metrics, employing little developer-level information, or none at all. In this work we suggest novel metrics that capture temporal and semantic developer-level information collected on a per developer basis. To address the scalability challenges involved in computing these metrics for each and every developer for a large number of source code repositories, we have built a designated repository mining platform. This platform was used to create a metrics dataset based on processing nearly 1000 highly popular open source GitHub repositories, consisting of 147 million LOC, and maintained by 30,000 developers. The computed metrics were then employed to predict the corrective, perfective, and adaptive maintenance activity profiles identified in previous works. Our results show both strong correlation and promising predictive power with R-squared values of 0.83, 0.64, and 0.75. We also show how these results may help project managers to detect anomalies in the development process and to build better development teams. In addition, the platform we built has the potential to yield further predictive models leveraging developer-level metrics at scale.

Submitted 30 November, 2016; originally announced November 2016.

Comments: Postprint, ICSME 2016 proceedings

arXiv:1508.01872 [pdf, other]

Alleviating Merge Conflicts with Fine-grained Visual Awareness

Authors: Stanislav Levin, Amiram Yehudai

Abstract: Merge conflicts created by software team members working on the same code can be costly to resolve, and adversely affect productivity. In this work, we suggest the approach of fine-grained merge conflict awareness, where software team members are notified of potential merge conflicts via graphical decoration of the relevant semantic elements, in near real-time. The novelty of this approach is that it allows software developers to pinpoint the element in conflict, such as a method's body, parameter, return value, and so on, promoting communication about conflicting changes soon after they take place and on a semantic level. We have also conducted a preliminary qualitative evaluation of our approach, the results of which we report in this paper.

Submitted 8 August, 2015; originally announced August 2015.

arXiv:1505.01286 [pdf, other]

Localization of real world regression Bugs using single execution

Authors: Dekel Cohen, Amiram Yehudai

Abstract: Regression bugs occur whenever software functionality that previously worked as desired stops working, or no longer works as expected. Code changes, such as bug fixes or new feature work, may result in a regression bug. Regression bugs are an annoying and painful phenomenon in the software development process, requiring a great deal of effort to localize, effectively hindering team progress. In this paper we present Regression Detective, a method which assists the developer in locating source code segments that caused a given regression bug. Unlike some of the existing tools, our approach doesn't require an automated test suite or executing past versions of the system. It scales to systems with millions of lines of code. The developer, who has no prior knowledge of the code or the bug, reproduces the bug according to the steps described in the bug database. We evaluated our approach with bugs from leading open source projects (Eclipse, Tomcat, Ant). In over 90% of the cases, the developer only has to examine 10-20 lines of code in order to locate the bug, regardless of the code base size.

Submitted 6 May, 2015; originally announced May 2015.

arXiv:1504.06742 [pdf, other]

Improving software team collaboration with Synchronized Software Development

Authors: Stanislav Levin, Amiram Yehudai

Abstract: Effective collaboration is a key factor in the success of a software project developed by a team. In this work, we suggest the approach of Synchronized Software Development (SSD), which promotes a new mechanism of collaboration in general, and for code synchronization in particular. In SSD, code changes made by one developer are automatically propagated to others as long as they keep the code free of compilation errors. Changes that introduce compilation errors are not propagated until the errors are fixed. Moreover, other developers are restricted from concurrently editing the entities involved in these changes. While in this state, developers are, however, free to modify the rest of the entities. The novelty of our approach is that it actively synchronizes developers with the latest error-free version of the source code, preventing possible conflicts and merges that may arise due to concurrent changes made by fellow team members. SSD also allows for a more transparent and practically near-real-time awareness of new code that is being introduced by multiple developers. We built CSI (Code Synchronizing Intelligence), a prototype demonstrating key features of SSD.

Submitted 28 April, 2015; v1 submitted 25 April, 2015; originally announced April 2015.

Comments: This paper was written in 2012; a footnote acknowledging ISF's support was added. arXiv admin note: text overlap with arXiv:1504.06741

arXiv:1504.06741 [pdf, other]

Collaborative Real Time Coding or How to Avoid the Dreaded Merge

Authors: Stanislav Levin, Amiram Yehudai

Abstract: Software engineers who collaborate to develop software in teams often have to manually merge changes they made to a module (e.g. a class), because the change conflicts with one that has just been made by another engineer to the same or another module (e.g. a supplier class). This is due to the fact that engineers edit code separately, and loosely coordinate their work via a source control or a software configuration management system (SCM). This work proposes to eliminate almost all the need to manually merge a recent change, by proposing a Collaborative Real Time Coding approach. In this approach, valid changes to the code are seen by others in real time, but intermediate changes (that cause the code not to compile) result in blocking other engineers from making changes related to the entity (e.g. method) being modified, while allowing them to work on most of the system. The subject of collaborative real time editing systems has been studied for the past 20 years. Research in this field has mostly concentrated on collaborative textual and graphical editing. In this work we address the challenges involved in designing a collaborative real time coding system, as well as present the major differences when compared to collaborative editing of plain text. We then present a prototype plug-in for the Eclipse Integrated Development Environment (IDE) that allows collaborative coding to take place.

Submitted 25 April, 2015; originally announced April 2015.

Comments: This paper was written in 2011

arXiv:1409.0982 [pdf, ps, other]

Taming the Concurrency: Controlling Concurrent Behavior while Testing Multithreaded Software

Authors: Evgeny Vainer, Amiram Yehudai

Abstract: Developing multithreaded software is an extremely challenging task, even for experienced programmers. The challenge does not end after the code is written. There are other tasks associated with a development process that become exceptionally hard in a multithreaded environment. A good example of this is creating unit tests for concurrent data structures. In addition to the desired test logic, such a test contains plenty of synchronization code that makes it hard to understand and maintain. In our work we propose a novel approach for specifying and executing schedules for multithreaded tests. It allows explicit specification of the desired thread scheduling for a unit test and enforces it during the test execution, giving the developer the ability to construct deterministic and repeatable unit tests. This goal is achieved by combining a few basic tools available in every modern runtime/IDE and does not require a dedicated runtime environment, a new specification language, or modifications to the code under test.

Submitted 3 September, 2014; originally announced September 2014.

ACM Class: D.2.5; D.3.3
