-
Retrieving and Ranking Relevant JavaScript Technologies from Web Repositories
Authors:
Hernan C. Vazquez,
J. Andres Diaz Pace,
Claudia Marcos,
Santiago Vidal
Abstract:
The selection of software technologies is an important but complex task. We consider developers of JavaScript (JS) applications, for whom the assessment of JS libraries has become difficult and time-consuming due to the growing number of technology options available. A common strategy is to browse software repositories via search engines (e.g., NPM or Google), although this brings some problems. First, given a technology need, the engines might return a long list of results, which often causes information overload. Second, the results should be ranked according to criteria of interest to the developer. However, deciding how to weight these criteria to make a decision is not straightforward. In this work, we propose a two-phase approach for assisting developers in retrieving and ranking JS technologies in a semi-automated fashion. The first phase (ST-Retrieval) uses a meta-search technique for collecting JS technologies that meet the developer's needs. The second phase (ST-Rank) relies on a machine learning technique to infer, based on criteria used by other projects on the Web, a ranking of the output of ST-Retrieval. We evaluated our approach with NPM and obtained satisfactory results in terms of the accuracy of the technologies retrieved and the order in which they were ranked.
Submitted 30 May, 2022;
originally announced May 2022.
-
A Systematic Mapping Study of Empirical Studies performed with Collections of Software Projects
Authors:
Juan Andres Carruthers,
Jorge Andres Diaz Pace,
Emanuel Agustin Irrazabal
Abstract:
Context: software projects are common resources in Software Engineering experiments, although these are often selected without following a specific strategy, which reduces the representativeness and replicability of the results. An option is the use of preserved collections of software projects, but these must be current, with explicit guidelines that guarantee their updating over a long period of time. Goal: to carry out a systematic secondary study of the strategies used to select software projects in empirical studies, in order to discover the guidelines taken into account, the degree of use of project collections, the metadata extracted, and the subsequent statistical analysis conducted. Method: a systematic mapping study to identify studies published from January 2013 to December 2020. Results: 122 studies were identified, of which 72% used their own guidelines for project selection and 27% used existing project collections. Likewise, there was no evidence of a standardized framework for the project selection process, nor of the application of statistical methods related to the sample collection strategy.
Submitted 15 September, 2021;
originally announced September 2021.
-
Learning Diverse Representations for Fast Adaptation to Distribution Shift
Authors:
Daniel Pace,
Alessandra Russo,
Murray Shanahan
Abstract:
The i.i.d. assumption is a useful idealization that underpins many successful approaches to supervised machine learning. However, its violation can lead to models that learn to exploit spurious correlations in the training data, rendering them vulnerable to adversarial interventions, undermining their reliability, and limiting their practical application. To mitigate this problem, we present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task. We propose a notion of diversity based on minimizing the conditional total correlation of final layer representations across models given the label, which we approximate using a variational estimator and minimize using adversarial training. To demonstrate our framework's ability to facilitate rapid adaptation to distribution shift, we train a number of simple classifiers from scratch on the frozen outputs of our models using a small amount of data from the shifted distribution. Under this evaluation protocol, our framework significantly outperforms a baseline trained using the empirical risk minimization principle.
Submitted 12 June, 2020;
originally announced June 2020.
-
Iterative Segmentation from Limited Training Data: Applications to Congenital Heart Disease
Authors:
Danielle F. Pace,
Adrian V. Dalca,
Tom Brosch,
Tal Geva,
Andrew J. Powell,
Jürgen Weese,
Mehdi H. Moghari,
Polina Golland
Abstract:
We propose a new iterative segmentation model which can be accurately learned from a small dataset. A common approach is to train a model to directly segment an image, requiring a large collection of manually annotated images to capture the anatomical variability in a cohort. In contrast, we develop a segmentation model that recursively evolves a segmentation in several steps, and implement it as a recurrent neural network. We learn model parameters by optimizing the intermediate steps of the evolution in addition to the final segmentation. To this end, we train our segmentation propagation model by presenting incomplete and/or inaccurate input segmentations paired with a recommended next step. Our work aims to alleviate challenges in segmenting heart structures from cardiac MRI for patients with congenital heart disease (CHD), which encompasses a range of morphological deformations and topological changes. We demonstrate the advantages of this approach on a dataset of 20 images from CHD patients, learning a model that accurately segments individual heart chambers and great vessels. Compared to direct segmentation, the iterative method yields more accurate segmentation for patients with the most severe CHD malformations.
Submitted 11 September, 2018;
originally announced September 2018.