Search | arXiv e-print repository

Assessing Python Style Guides: An Eye-Tracking Study with Novice Developers

Authors: Pablo Roberto, Rohit Gheyi, José Aldo Silva da Costa, Márcio Ribeiro

Abstract: The incorporation and adaptation of style guides play an essential role in software development, influencing code formatting, naming conventions, and structure to enhance readability and simplify maintenance. However, many of these guides lack empirical studies to validate their recommendations. Previous studies have examined the impact of code styles on developer performance, concluding that some styles have a negative impact on code readability. However, more studies are needed that assess other perspectives, and combinations of these perspectives, on a common basis through experiments. This study aimed to investigate, through eye-tracking, the impact of guidelines in style guides, with a special focus on the PEP8 guide in Python, recognized for its best practices. We conducted a controlled experiment with 32 Python novices, measuring time, the number of attempts, and visual effort through eye-tracking, using fixation duration, fixation count, and regression count for four PEP8 recommendations. Additionally, we conducted interviews to explore the subjects' difficulties and preferences with the programs. The results highlighted that not following the PEP8 Line Break after an Operator guideline increased the eye regression count by 70% in the code snippet where the standard should have been applied. Most subjects preferred the version that adhered to the PEP8 guideline, and some found the left-aligned organization of operators easier to understand. The other evaluated guidelines revealed further nuances; for example, for True Comparison the PEP8-compliant version negatively impacted eye metrics, although subjects preferred the PEP8 suggestion. We recommend that practitioners select guidelines supported by experimental evaluations.

Submitted 26 August, 2024; originally announced August 2024.
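The two PEP8 recommendations named in this abstract can be illustrated with a short sketch. The operator-layout example reuses the variable names from PEP 8's own example; the values and the two boolean function names are hypothetical, and none of this is taken from the study's experimental programs. PEP 8 suggests breaking lines before binary operators in new code (so operators line up on the left) and advises against comparing booleans to True with ==.

```python
# Hypothetical values; only the layout matters here.
gross_wages = 5000
taxable_interest = 120
dividends = 300
qualified_dividends = 100
ira_deduction = 200
student_loan_interest = 50

# Line break AFTER each operator: operators trail at the right edge.
income_after = (gross_wages +
                taxable_interest +
                (dividends - qualified_dividends) -
                ira_deduction -
                student_loan_interest)

# Line break BEFORE each operator (PEP 8's suggestion for new code):
# operators align on the left, next to the operand they apply to.
income_before = (gross_wages
                 + taxable_interest
                 + (dividends - qualified_dividends)
                 - ira_deduction
                 - student_loan_interest)


def is_active_verbose(flag: bool) -> bool:
    # Explicit comparison with True, which PEP 8 advises against.
    if flag == True:  # noqa: E712 (kept on purpose for contrast)
        return True
    return False


def is_active(flag: bool) -> bool:
    # PEP 8 form: test the boolean directly.
    if flag:
        return True
    return False
```

Both layouts compute the same value and both functions behave identically; the differences examined in the study are purely visual.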

arXiv:2408.11841

Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants

Authors: Beatriz Borges, Negar Foroutan, Deniz Bayazit, Anna Sotnikova, Syrielle Montariol, Tanya Nazaretzky, Mohammadreza Banaei, Alireza Sakhaeirad, Philippe Servant, Seyed Parsa Neshaei, Jibril Frej, Angelika Romanou, Gail Weiss, Sepideh Mamooler, Zeming Chen, Simin Fan, Silin Gao, Mete Ismayilzada, Debjit Paul, Alexandre Schöpfer, Andrej Janchevski, Anja Tiede, Clarence Linden, Emanuele Troiani, Francesco Salvi, et al. (65 additional authors not shown)

Abstract: AI assistants are increasingly being used by students enrolled in higher education institutions. While these tools provide opportunities for improved teaching and education, they also pose significant challenges for assessment and learning outcomes. We conceptualize these challenges through the lens of vulnerability, the potential for university assessments and learning outcomes to be impacted by student use of generative AI. We investigate the potential scale of this vulnerability by measuring the degree to which AI assistants can complete assessment questions in standard university-level STEM courses. Specifically, we compile a novel dataset of textual assessment questions from 50 courses at EPFL and evaluate whether two AI assistants, GPT-3.5 and GPT-4, can adequately answer these questions. We use eight prompting strategies to produce responses and find that GPT-4 answers an average of 65.8% of questions correctly, and can even produce the correct answer with at least one prompting strategy for 85.1% of questions. When the courses in our dataset are grouped by degree program, these systems already pass non-project assessments of large numbers of core courses in various degree programs, posing risks to higher education accreditation that will be amplified as these models improve. Our results call for revising program-level assessment design in higher education in light of advances in generative AI.

Submitted 7 August, 2024; originally announced August 2024.

Comments: 20 pages, 8 figures

arXiv:2408.06067

Don't You (Project Around Discs)? Neural Network Surrogate and Projected Gradient Descent for Calibrating an Intervertebral Disc Finite Element Model

Authors: Matan Atad, Gabriel Gruber, Marx Ribeiro, Luis Fernando Nicolini, Robert Graf, Hendrik Möller, Kati Nispel, Ivan Ezhov, Daniel Rueckert, Jan S. Kirschke

Abstract: Accurate calibration of finite element (FE) models of human intervertebral discs (IVDs) is essential for their reliability and application in diagnosing and planning treatments for spinal conditions. Traditional calibration methods are computationally intensive, requiring iterative, derivative-free optimization algorithms that often take hours or days to converge. This study addresses these challenges by introducing a novel, efficient, and effective calibration method for an L4-L5 IVD FE model using a neural network (NN) surrogate. The NN surrogate predicts simulation outcomes with high accuracy, outperforming other machine learning models, and significantly reduces the computational cost associated with traditional FE simulations. Next, a Projected Gradient Descent (PGD) approach guided by gradients of the NN surrogate is proposed to efficiently calibrate FE models. Our method explicitly enforces feasibility with a projection step, thus maintaining material bounds throughout the optimization process. The proposed method is evaluated against state-of-the-art Genetic Algorithm (GA) and inverse model baselines on synthetic and in vitro experimental datasets. Our approach demonstrates superior performance on synthetic data, achieving a Mean Absolute Error (MAE) of 0.06 compared to the baselines' MAE of 0.18 and 0.54, respectively. On experimental specimens, our method outperforms the baseline in 5 out of 6 cases.
Most importantly, our approach reduces calibration time to under three seconds, compared to up to 8 days per sample required by traditional calibration. Such efficiency paves the way for applying more complex FE models, enabling accurate patient-specific simulations and advancing spinal treatment planning.

Submitted 12 August, 2024; originally announced August 2024.

Comments: Under submission. Project code: https://github.com/matanat/IVD-CalibNN/
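The projection step that keeps parameters within material bounds can be sketched as projected gradient descent over a box. This is a minimal illustration, not the authors' implementation (their code is at the project link above): the paper takes gradients from a neural-network surrogate, whereas here a hypothetical quadratic loss with an analytic gradient stands in for it, and the target, bounds, and step size are arbitrary assumptions.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def project_to_box(x: Sequence[float], lower: Sequence[float],
                   upper: Sequence[float]) -> Vector:
    """Euclidean projection onto the box lower <= x <= upper (per-coordinate clamp)."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def projected_gradient_descent(
    grad: Callable[[Vector], Vector],
    x0: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    step: float = 0.1,
    iters: int = 200,
) -> Vector:
    """Gradient step followed by projection, so every iterate stays feasible."""
    x = project_to_box(x0, lower, upper)  # start from a feasible point
    for _ in range(iters):
        g = grad(x)
        x = project_to_box([xi - step * gi for xi, gi in zip(x, g)], lower, upper)
    return x


# Hypothetical stand-in for a surrogate-based loss: f(x) = sum((x_i - t_i)^2).
TARGET = [2.0, -0.5]


def quadratic_grad(x: Vector) -> Vector:
    return [2.0 * (xi - ti) for xi, ti in zip(x, TARGET)]


params = projected_gradient_descent(quadratic_grad, x0=[0.5, 0.5],
                                    lower=[0.0, 0.0], upper=[1.0, 1.0])
```

Because the unconstrained minimizer (2.0, -0.5) lies outside the box, the iterates settle on its projection (1.0, 0.0); for box constraints the projection is simply clamping, which is why feasibility is cheap to enforce at every step.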

arXiv:2407.19261

Evaluating Large Language Models in Detecting Test Smells

Authors: Keila Lucas, Rohit Gheyi, Elvys Soares, Márcio Ribeiro, Ivan Machado

Abstract: Test smells are coding issues that typically arise from inadequate practices, a lack of knowledge about effective testing, or deadline pressures to complete projects. The presence of test smells can negatively impact the maintainability and reliability of software. While there are tools that use advanced static analysis or machine learning techniques to detect test smells, these tools often require effort to use. This study evaluates the capability of Large Language Models (LLMs) to automatically detect test smells. We evaluated ChatGPT-4, Mistral Large, and Gemini Advanced using 30 types of test smells across codebases in seven different programming languages collected from the literature. ChatGPT-4 identified 21 types of test smells, Gemini Advanced identified 17, and Mistral Large detected 15. Conclusion: The LLMs demonstrated potential as valuable tools for identifying test smells.

Submitted 30 July, 2024; v1 submitted 27 July, 2024; originally announced July 2024.

Comments: 7 pages, Accepted at Insightful Ideas and Emerging Results (IIER) Track of the Brazilian Symposium on Software Engineering (SBES 2024)

arXiv:2407.19087

Evaluating the Capability of LLMs in Identifying Compilation Errors in Configurable Systems

Authors: Lucas Albuquerque, Rohit Gheyi, Márcio Ribeiro

Abstract: Compilation is an important process in developing configurable systems, such as Linux. However, identifying compilation errors in configurable systems is not straightforward because traditional compilers are not variability-aware. Previous approaches that detect some of these compilation errors often rely on advanced techniques that require significant effort from programmers. This study evaluates the efficacy of Large Language Models (LLMs), specifically ChatGPT4, Le Chat Mistral, and Gemini Advanced 1.5, in identifying compilation errors in configurable systems. We first evaluate 50 small products in C++, Java, and C, followed by 30 small configurable systems in C, covering 17 different types of compilation errors. ChatGPT4 successfully identified most compilation errors in individual products and in configurable systems, while Le Chat Mistral and Gemini Advanced 1.5 detected some of them. LLMs have shown potential in assisting developers to identify compilation errors in configurable systems.

Submitted 30 July, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

Comments: Accepted at NIER track of the Brazilian Symposium on Software Engineering (SBES 2024), 7 Pages

arXiv:2407.11978

doi 10.1145/3643834.3660707

"It depends": Configuring AI to Improve Clinical Usefulness Across Contexts

Authors: Hubert D. Zając, Jorge M. N. Ribeiro, Silvia Ingala, Simona Gentile, Ruth Wanjohi, Samuel N. Gitau, Jonathan F. Carlsen, Michael B. Nielsen, Tariq O. Andersen

Abstract: Artificial Intelligence (AI) repeatedly matches or outperforms radiologists in lab experiments. However, real-world implementations of radiological AI-based systems have been found to provide little to no clinical value. This paper explores how to design AI for clinical usefulness in different contexts. We conducted 19 design sessions and design interventions with 13 radiologists from 7 clinical sites in Denmark and Kenya, based on three iterations of a functional AI-based prototype. Ten sociotechnical dependencies were identified as crucial for the design of AI in radiology. We conceptualised four technical dimensions that must be configured to the intended clinical context of use: AI functionality, AI medical focus, AI decision threshold, and AI explainability. We present four design recommendations on how to address dependencies pertaining to the medical knowledge, clinic type, user expertise level, patient context, and user situation that condition the configuration of these technical dimensions.

Submitted 27 May, 2024; originally announced July 2024.

arXiv:2407.04858

Question Answering with Texts and Tables through Deep Reinforcement Learning

Authors: Marcos M. José, Flávio N. Cação, Maria F. Ribeiro, Rafael M. Cheang, Paulo Pirozelli, Fabio G. Cozman

Abstract: This paper proposes a novel architecture to generate multi-hop answers to open-domain questions that require information from texts and tables, using the Open Table-and-Text Question Answering dataset for validation and training. One of the most common ways to generate answers in this setting is to retrieve information sequentially, where each selected piece of data helps in searching for the next piece. Because different models can behave differently when called in this sequential information search, a challenge is how to select a model at each step. Our architecture employs reinforcement learning to choose sequentially between different state-of-the-art tools until a desired answer is generated. This system achieved an F1-score of 19.03, comparable to iterative systems in the literature.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2405.02150

The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates

Authors: Giuseppe Russo Latona, Manoel Horta Ribeiro, Tim R. Davidson, Veniamin Veselovsky, Robert West

Abstract: Journals and conferences worry that peer reviews assisted by artificial intelligence (AI), in particular, large language models (LLMs), may negatively influence the validity and fairness of the peer-review system, a cornerstone of modern science. In this work, we address this concern with a quasi-experimental study of the prevalence and impact of AI-assisted peer reviews in the context of the 2024 International Conference on Learning Representations (ICLR), a large and prestigious machine-learning conference. Our contributions are threefold. Firstly, we obtain a lower bound for the prevalence of AI-assisted reviews at ICLR 2024 using the GPTZero LLM detector, estimating that at least $15.8\%$ of reviews were written with AI assistance. Secondly, we estimate the impact of AI-assisted reviews on submission scores. Considering pairs of reviews with different scores assigned to the same paper, we find that in $53.4\%$ of pairs the AI-assisted review scores higher than the human review ($p = 0.002$; relative difference in probability of scoring higher: $+14.4\%$ in favor of AI-assisted reviews). Thirdly, we assess the impact of receiving an AI-assisted peer review on submission acceptance. In a matched study, submissions near the acceptance threshold that received an AI-assisted peer review were $4.9$ percentage points ($p = 0.024$) more likely to be accepted than submissions that did not. Overall, we show that AI-assisted reviews are consequential to the peer-review process and discuss the future implications of current trends.

Submitted 3 May, 2024; originally announced May 2024.

Comments: Manoel Horta Ribeiro, Tim R. Davidson, and Veniamin Veselovsky contributed equally to this work

arXiv:2404.16992

doi 10.1145/3661167.3661225

A Catalog of Transformations to Remove Smells From Natural Language Tests

Authors: Manoel Aranda, Naelson Oliveira, Elvys Soares, Márcio Ribeiro, Davi Romão, Ullyanne Patriota, Rohit Gheyi, Emerson Souza, Ivan Machado

Abstract: Test smells can pose difficulties during testing activities, such as poor maintainability, non-deterministic behavior, and incomplete verification. Existing research has extensively addressed test smells in automated software tests, but little attention has been given to smells in natural language tests. While some research has identified and catalogued such smells, there is a lack of systematic approaches for their removal. Consequently, there is also a lack of tools to automatically identify and remove natural language test smells. This paper introduces a catalog of transformations designed to remove seven natural language test smells and a companion tool implemented using Natural Language Processing (NLP) techniques. Our work aims to enhance the quality and reliability of natural language tests during software development. The research employs a two-fold empirical strategy to evaluate its contributions. First, a survey involving 15 software testing professionals assesses the acceptance and usefulness of the catalog's transformations. Second, an empirical study evaluates our tool's ability to remove natural language test smells by analyzing a sample of real-practice tests from the Ubuntu OS. The results indicate that software testing professionals find the transformations valuable. Additionally, the automated tool performs well, achieving an F-measure of 83.70%.

Submitted 25 April, 2024; originally announced April 2024.

Comments: Distinguished Paper Award at International Conference on Evaluation and Assessment in Software Engineering (EASE), 2024 edition

ACM Class: D.2.5

arXiv:2404.11789

An Invitation to Resolvent Analysis

Authors: Laura Victoria Rolandi, Jean Hélder Marques Ribeiro, Chi-An Yeh, Kunihiko Taira

Abstract: Resolvent analysis is a powerful tool that can reveal the linear amplification mechanisms between forcing inputs and response outputs about a base flow. These mechanisms can be revealed in terms of pairs of forcing and response modes and the associated gains (amplification magnitudes), ranked by energy content at a given frequency. The linear relationship that ties the forcing and the response is represented through the resolvent operator (transfer function), which is constructed by spatially discretizing the linearized Navier-Stokes operator. One of the unique strengths of resolvent analysis is its ability to analyze statistically stationary turbulent flows. In light of the increasing interest in using resolvent analysis to study a variety of flows, we offer this guide in the hope of lowering the barrier for students and researchers to develop a resolvent analysis code and apply it to their problems of interest. To achieve this goal, we discuss various aspects of resolvent analysis and its role in identifying dominant flow structures about the base flow. The discussion in this paper is based on the compressible Navier-Stokes equations in their most general form. We cover essential considerations ranging from selecting the base flow and appropriate energy norms to the intricacies of constructing the linear operator and performing eigenvalue and singular value decompositions. Throughout the paper, we offer details and know-how that may not be available to readers collected in one place elsewhere.
Towards the end of this paper, examples are offered to demonstrate the practical applicability of resolvent analysis, aiming to guide readers through its implementation and inspire further extensions. We invite readers to consider resolvent analysis as a companion for their research endeavors.

Submitted 25 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.
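The resolvent operator and its singular value decomposition described in this abstract can be summarized in a few equations. This is the standard textbook formulation written under assumed conventions (perturbations proportional to $e^{-i\omega t}$, and an energy norm $\|q\|_E^2 = q^* W q$ with a Hermitian positive-definite weight $W$ used for both forcing and response); sign conventions and weighting choices vary between references, and the notation is not necessarily the paper's. Here $L_{\bar q}$ denotes the spatially discretized linearized Navier-Stokes operator about the base flow $\bar q$.

```latex
\partial_t q' = L_{\bar q}\, q' + f', \qquad
q'(x,t) = \hat q(x)\, e^{-i\omega t}, \quad f'(x,t) = \hat f(x)\, e^{-i\omega t}

\hat q = H(\omega)\, \hat f, \qquad
H(\omega) = \left(-i\omega I - L_{\bar q}\right)^{-1}

W^{1/2} H(\omega)\, W^{-1/2} = \tilde U\, \Sigma\, \tilde V^{*}, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots), \quad
\sigma_1 \ge \sigma_2 \ge \cdots \ge 0

\hat Q = W^{-1/2}\, \tilde U, \qquad
\hat F = W^{-1/2}\, \tilde V, \qquad
H(\omega)\, \hat f_j = \sigma_j\, \hat q_j
```

The columns $\hat f_j$ of $\hat F$ are the forcing modes, the columns $\hat q_j$ of $\hat Q$ are the response modes, and the singular values $\sigma_j$ are the gains; $\sigma_1$ is the largest achievable ratio $\|\hat q\|_E / \|\hat f\|_E$ at frequency $\omega$.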

arXiv:2404.00750

Can Language Models Recognize Convincing Arguments?

Authors: Paula Rescala, Manoel Horta Ribeiro, Tiancheng Hu, Robert West

Abstract: The remarkable and ever-increasing capabilities of Large Language Models (LLMs) have raised concerns about their potential misuse for creating personalized, convincing misinformation and propaganda. To gain insights into LLMs' persuasive capabilities without directly engaging in experimentation with humans, we propose studying their performance on the related task of detecting convincing arguments. We extend a dataset by Durmus & Cardie (2018) with debates, votes, and user traits and propose tasks measuring LLMs' ability to (1) distinguish between strong and weak arguments, (2) predict stances based on beliefs and demographic characteristics, and (3) determine the appeal of an argument to an individual based on their traits. We show that LLMs perform on par with humans in these tasks and that combining predictions from different LLMs yields significant performance gains, even surpassing human performance. The data and code released with this paper contribute to the crucial ongoing effort of continuously evaluating and monitoring the rapidly evolving capabilities and potential impact of LLMs.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2403.14380

On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial

Authors: Francesco Salvi, Manoel Horta Ribeiro, Riccardo Gallotti, Robert West

Abstract: The development and popularization of large language models (LLMs) have raised concerns that they will be used to create tailor-made, convincing arguments to push false or misleading narratives online. Early work has found that language models can generate content perceived as at least on par with, and often more persuasive than, human-written messages. However, there is still limited knowledge about LLMs' persuasive capabilities in direct conversations with human counterparts and how personalization can improve their performance. In this pre-registered study, we analyze the effect of AI-driven persuasion in a controlled, harmless setting. We create a web-based platform where participants engage in short, multiple-round debates with a live opponent. Each participant is randomly assigned to one of four treatment conditions, corresponding to a two-by-two factorial design: (1) games are played either between two humans or between a human and an LLM; (2) personalization is either enabled or disabled; when enabled, one of the two players is granted access to basic sociodemographic information about their opponent. We found that participants who debated GPT-4 with access to their personal information had 81.7% (p < 0.01; N=820 unique participants) higher odds of increased agreement with their opponents compared to participants who debated humans. Without personalization, GPT-4 still outperforms humans, but the effect is lower and statistically non-significant (p=0.31).
Overall, our results suggest that concerns around personalization are meaningful and have important implications for the governance of social media and the design of new online environments.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 33 pages, 10 figures, 7 tables

arXiv:2403.08655

Efficient electronic cooling above 2 K by niobium-based superconducting tunnel junctions

Authors: J. Hätinen, A. Ronzani, R. P. Loreto, E. Mykkänen, A. Kemppinen, K. Viisanen, T. Rantanen, J. Geisor, J. Lehtinen, M. Ribeiro, J-P. Kaikkonen, O. Prakash, V. Vesterinen, W. Förbom, E. T. Mannila, M. Kervinen, J. Govenius, M. Prunnila

Abstract: Replacing the bulky cryo-liquid-based cooling stages of cryo-enabled instruments with chip-scale refrigeration is envisioned to disruptively reduce system size, much as microprocessors did for computers. Electronic refrigerators based on superconducting tunnel junctions have been envisioned to provide a solution, but reaching the necessary operating temperature range above 1 kelvin has remained out of reach for several decades. We show efficient electronic refrigeration by Al-AlOx-Nb superconducting tunnel junctions starting from bath temperatures above 2 K. The junctions can deliver electronic cooling power up to $\sim$mW/mm$^2$, which enables us to demonstrate tunnel-current-driven electron temperature reduction from 2.4 K to below 1.6 K (34% relative cooling) against the phonon bath. Our work shows that the key material of integrated superconducting circuits, niobium, enables powerful cryogenic refrigerator technology. This result is a prerequisite for practical cryogenic chip-scale refrigerators and, at the same time, it introduces a new electro-thermal tool for quantum heat transport experiments.

Submitted 23 July, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2402.12553

Triglobal resolvent-analysis-based control of separated flows around low-aspect-ratio wings

Authors: Jean Hélder Marques Ribeiro, Kunihiko Taira

Abstract: We perform direct numerical simulations (DNS) of actively controlled laminar separated wakes around low-aspect-ratio wings with two primary goals: (i) reducing the size of the separation bubble and (ii) attenuating the wing tip vortex. Instead of preventing separation, we modify the three-dimensional ($3$-D) dynamics to exploit wake vortices for aerodynamic enhancements. A direct wake modification is considered using optimal harmonic forcing modes from triglobal resolvent analysis. For this study, we consider wings at angles of attack of $14^\circ$ and $22^\circ$, taper ratios $0.27$ and $1$, and leading edge sweep angles of $0^\circ$ and $30^\circ$, at a mean-chord-based Reynolds number of $600$. The wakes behind these wings exhibit $3$-D reversed-flow bubbles and large-scale vortical structures. For tapered swept wings, the diversity of wake vortices increases substantially, posing a challenge for flow control. To achieve the first control objective for an untapered unswept wing, root-based actuation at the shedding frequency is introduced to reduce the reversed-flow bubble size, taking advantage of the wake vortices to significantly enhance the aerodynamic performance of the wing. For both untapered and tapered swept wings, root-based actuation modifies the stalled flow, reduces the reversed-flow region, and enhances aerodynamic performance by increasing the root contribution to lift. For the goal of controlling the tip vortex, we demonstrate the effectiveness of actuation with high-frequency perturbations near the tip.
This study shows how insights from resolvent analysis for unsteady actuation can enable global modification of $3$-D separated wakes and achieve improved aerodynamics of wings.

Submitted 18 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.01573

An Actionable Framework for Understanding and Improving Talent Retention as a Competitive Advantage in IT Organizations

Authors: Luiz Alexandre Costa, Edson Dias, Danilo Monteiro Ribeiro, Awdren Fontão, Gustavo Pinto, Rodrigo Pereira dos Santos, Alexander Serebrenik

Abstract: In the rapidly evolving global business landscape, the demand for software has intensified competition among organizations, leading to challenges in retaining highly qualified IT professionals in software organizations. One of the problems faced by IT organizations is the retention of these strategic professionals, also known as talent. This work presents an actionable framework for Talent Retention (TR) used in IT organizations. It is based on our findings from interviews conducted with 21 IT managers. The TR Framework is our main research outcome. Our framework encompasses a set of factors, contextual characteristics, barriers, strategies, and coping mechanisms. Our findings indicated that software engineers can be differentiated from other professional groups, and that, beyond competitive salaries, other elements for retaining talent in IT organizations should be considered, such as psychological safety, work-life balance, a positive work environment, innovative and challenging projects, and flexible work. A better understanding of these factors could guide IT managers in improving talent management processes by addressing Software Engineering challenges, identifying important elements, and exploring strategies at the individual, team, and organizational levels.

Submitted 24 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: arXiv admin note: text overlap with arXiv:2205.06352 by other authors

arXiv:2401.18060 [pdf, ps, other]

Rarity of the infinite chains in the tree of numerical semigroups

Authors: Maria Bras-Amorós, Mariana Rosas Ribeiro

Abstract: We prove that the proportion of numerical semigroups of genus g that belong to infinite chains in the semigroup tree approaches 0 as g grows to infinity. This means that most numerical semigroups have a finite number of descendants in the semigroup tree. This problem has been open since 2009.
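For readers unfamiliar with the semigroup tree, a minimal Python sketch may help. It assumes the standard construction: each numerical semigroup S is stored as its finite set of gaps (so the genus is the number of gaps), and the children of S are obtained by removing from S one minimal generator larger than its Frobenius number. The helper names `children` and `count_by_genus` are hypothetical, and the code only enumerates the tree level by level; it does not decide whether a node lies on an infinite chain.

```python
def children(gaps: frozenset[int]) -> list[frozenset[int]]:
    """Children of the numerical semigroup S = N minus `gaps` in the semigroup tree."""
    frob = max(gaps, default=-1)  # Frobenius number; -1 for S = N by convention

    def in_s(x: int) -> bool:
        return x >= 0 and x not in gaps

    # Multiplicity: the smallest positive element of S (frob + 1 is always in S).
    m = next(x for x in range(1, frob + 3) if in_s(x))

    def is_minimal_generator(s: int) -> bool:
        # s is a minimal generator if it is not a sum of two positive elements of S.
        return in_s(s) and s > 0 and not any(
            in_s(a) and in_s(s - a) for a in range(1, s)
        )

    # Minimal generators above the Frobenius number lie in (frob, frob + m];
    # the extra +1 in the range also covers S = N, whose only generator is 1.
    candidates = range(max(frob + 1, 1), frob + m + 2)
    return [gaps | {s} for s in candidates if is_minimal_generator(s)]


def count_by_genus(max_genus: int) -> list[int]:
    """Number of numerical semigroups of each genus 0..max_genus, walking the tree."""
    counts = []
    level = [frozenset()]  # genus 0: the semigroup N itself
    for _ in range(max_genus + 1):
        counts.append(len(level))
        level = [child for node in level for child in children(node)]
    return counts


print(count_by_genus(7))  # [1, 1, 2, 4, 7, 12, 23, 39]
```

Because every semigroup of genus g + 1 has exactly one parent (itself plus its Frobenius number), each semigroup is visited once; a node with no children, such as the one with gaps {1, 2, 5}, has finitely many descendants.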

Submitted 31 January, 2024; originally announced January 2024.

MSC Class: 68W30; 06F05; 20M14; 05A99

arXiv:2401.01406 [pdf]

Influence of Surface Roughness on Linear Behavior and Mechanical Properties of Three Cyanoacrylate-Based Adhesives Used to Bond Strain Gages

Authors: L. G. Simao, W. P. Jesus, M. E. A. Ribeiro, H. C. Rangel, R. J. S. Rodriguez, E. A. Carvalho

Abstract: Failures in logistic chains have highlighted the challenge of accessing specialized adhesives designed for strain gage applications, requiring the exploration of local alternatives. A direct simulation of strain gage bonding behavior with two steel plates is infeasible due to the unique construction of strain gages. Therefore, this study employed an indirect simulation method, comparing local alternatives to a widely accepted adhesive, Loctite 496. Two potential replacements, Loctite 401 and Tekbond 793, were tested and matched against the benchmark adhesive, with a focus on the key mechanical properties: Proportional Shear Strain (PSS), Proportional Shear Stress (PSSt), and Apparent Shear Modulus (G*). Loctite 401 exhibited the highest G*, suggesting its potential use in strain gage installations if G* is considered most important. However, Tekbond 793 demonstrated superior PSS, Maximum Shear Stress (MSSt), and Rupture Shear Stress (RSSt) performance, displaying linear behavior even without an accelerator. Surface preparation was also discussed: hand abrading produces twice the surface roughness of an orbital sander. The study further identified two main failure-mode regions as a function of the surface roughness Ra, with values below 0.31 μm causing significant variations in the observed mechanical properties, which points to factors beyond adhesive layer thickness affecting bond properties.
Lastly, the general recommendation is to use an accelerator with all tested adhesives, whereas using a surface conditioner and neutralizer was found to impair adhesive performance.
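The roughness comparison above is expressed through Ra. As a hedged illustration, assuming the usual definition of Ra as the arithmetic mean of the absolute deviations of a sampled profile from its mean line (as in ISO 4287), a minimal sketch follows. The function name and the sample profile are hypothetical, not data from the study, and the mean line is approximated by the sample mean without the filtering a profilometer would normally apply.

```python
def arithmetic_mean_roughness(profile_um: list[float]) -> float:
    """Ra: mean absolute deviation of profile heights from their mean line (same unit as input)."""
    if not profile_um:
        raise ValueError("profile must contain at least one height sample")
    mean_line = sum(profile_um) / len(profile_um)
    return sum(abs(z - mean_line) for z in profile_um) / len(profile_um)


# Hypothetical profile heights in micrometres along the evaluation length.
print(arithmetic_mean_roughness([0.2, -0.4, 0.3, -0.1]))
```

Under this definition, doubling every deviation from the mean line doubles Ra, which is the sense in which one preparation method can yield "twice the roughness" of another.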

Submitted 2 January, 2024; originally announced January 2024.

Comments: 44 pages, 24 figures

arXiv:2401.01253 [pdf, other]

Deplatforming Norm-Violating Influencers on Social Media Reduces Overall Online Attention Toward Them

Authors: Manoel Horta Ribeiro, Shagun Jhaver, Jordi Cluet i Martinell, Marie Reignier-Tayar, Robert West

Abstract: From politicians to podcast hosts, online platforms have systematically banned ("deplatformed") influential users for breaking platform guidelines. Previous inquiries into the effectiveness of this intervention are inconclusive because 1) they consider only a few deplatforming events; 2) they consider only overt engagement traces (e.g., likes and posts) but not passive engagement (e.g., views); and 3) they do not consider all the potential places users impacted by the deplatforming event might migrate to. We address these limitations in a longitudinal, quasi-experimental study of 165 deplatforming events targeted at 101 influencers. We collect deplatforming events from Reddit posts and then manually curate the data, ensuring the correctness of a large dataset of deplatforming events. Then, we link these events to Google Trends and Wikipedia page views, platform-agnostic measures of online attention that capture the general public's interest in specific influencers. Through a difference-in-differences approach, we find that deplatforming reduces online attention toward influencers. After 12 months, we estimate that online attention toward deplatformed influencers changes by -63% (95% CI [-75%, -46%]) on Google and by -43% (95% CI [-57%, -24%]) on Wikipedia. Further, because we study over a hundred deplatforming events, we can analyze in which cases deplatforming is more or less impactful, revealing nuances about the intervention.
Notably, we find that both permanent and temporary deplatforming reduce online attention toward influencers. Overall, this work contributes to the ongoing effort to map the effectiveness of content moderation interventions, driving platform governance away from speculation.
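The difference-in-differences logic can be illustrated with the simplest 2×2 design: the change in the treated group's outcome minus the change in the control group's outcome. The sketch below assumes outcomes are compared on a log scale, so that the estimate converts to a percentage change via exp(delta) - 1. This is an illustrative simplification, not the authors' exact specification, and the function name and numbers are hypothetical.

```python
import math


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences on log outcomes:
    (post - pre) for the treated group minus (post - pre) for the control group."""

    def mean_log(xs):
        return sum(math.log(x) for x in xs) / len(xs)

    treated_change = mean_log(treated_post) - mean_log(treated_pre)
    control_change = mean_log(control_post) - mean_log(control_pre)
    return treated_change - control_change


# Hypothetical attention values (e.g. weekly page views) before and after an event.
delta = did_estimate([100, 120], [40, 50], [100, 110], [95, 105])
print(f"estimated change in attention: {math.exp(delta) - 1:.0%}")
```

Subtracting the control group's change removes trends shared by both groups (for example, a general decline in interest), which is what lets the remaining difference be attributed to deplatforming under the parallel-trends assumption.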

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2312.11307 [pdf, other]

New Constraint for Isotropic Lorentz Violation from LHC Data

Authors: David Amram, Killian Bouzoud, Nicolas Chanon, Hubert Hansen, Marcos R. Ribeiro Jr., Marco Schreck

Abstract: New calculations for the kinematics of photon decay to fermions in vacuo under an isotropic violation of Lorentz invariance (LV), parameterized by the Standard-Model Extension (SME), are presented in this paper and used to interpret prompt photon production in LHC data. The measurement of inclusive prompt photon production at the LHC Run 2, with photons observed up to a transverse energy of 2.5 TeV, provides the lower bound $\tilde{\kappa}_{\mathrm{tr}} > -1.06 \times 10^{-13}$ on the isotropic coefficient $\tilde{\kappa}_{\mathrm{tr}}$ at 95% confidence level. This result improves over the previous bound from hadron colliders by a factor of 55. The calculations for the kinematics of photon decay have further potential use to constrain LV coefficients from the appearance of fermion pairs, for instance, top-antitop pairs.

Submitted 18 December, 2023; originally announced December 2023.

Comments: 9 pages, 3 figures

arXiv:2312.11240 [pdf, other]

Evaluation of Barlow Twins and VICReg self-supervised learning for sound patterns of bird and anuran species

Authors: Fábio Felix Dias, Moacir Antonelli Ponti, Mílton Cezar Ribeiro, Rosane Minghim

Abstract: Taking advantage of the structure of large datasets to pre-train Deep Learning models is a promising strategy to decrease the need for supervised data. Self-supervised learning methods, such as contrastive learning and its variants, are a promising way to obtain better representations in many Deep Learning applications. Soundscape ecology is one application in which annotations are expensive and scarce, which makes it worth investigating whether methods that do not require annotations can approach those that rely on supervision. Our study uses the Barlow Twins and VICReg methods to pre-train different models on the same small dataset of sound patterns of bird and anuran species. In a downstream task of classifying those animal species, the models obtained results close to those of supervised models pre-trained on large generic datasets and fine-tuned on the same task.
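To make one of the two pre-training objectives concrete, here is a minimal pure-Python sketch of the Barlow Twins loss as described in its original paper: embeddings of two augmented views are standardized per feature over the batch, their cross-correlation matrix C is computed, diagonal entries are pushed toward 1 (invariance), and off-diagonal entries toward 0 (redundancy reduction). The default weight `lambd=5e-3` and the small `eps` are assumptions, and VICReg's variance-invariance-covariance objective is not shown. This is not the study's training code.

```python
def barlow_twins_loss(z_a, z_b, lambd=5e-3, eps=1e-12):
    """Barlow Twins objective for two views' embeddings (lists of N rows with D features each)."""
    n, d = len(z_a), len(z_a[0])

    def standardize(z):
        # Normalize each feature to zero mean and unit (population) variance over the batch;
        # returns a feature-major list, so cols[j][k] is feature j of sample k.
        cols = []
        for j in range(d):
            col = [row[j] for row in z]
            mean = sum(col) / n
            std = (sum((x - mean) ** 2 for x in col) / n) ** 0.5
            cols.append([(x - mean) / (std + eps) for x in col])
        return cols

    a, b = standardize(z_a), standardize(z_b)
    # Cross-correlation matrix C[i][j] between feature i of view A and feature j of view B.
    c = [[sum(a[i][k] * b[j][k] for k in range(n)) / n for j in range(d)] for i in range(d)]
    invariance = sum((1.0 - c[i][i]) ** 2 for i in range(d))  # diagonal pushed toward 1
    redundancy = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)  # off-diagonal toward 0
    return invariance + lambd * redundancy


views = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
print(barlow_twins_loss(views, views))  # near 0: identical, decorrelated views
```

Because the loss needs no labels, a model can be pre-trained with it on unannotated recordings and then fine-tuned on the small labelled set, which is the setting the abstract describes.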

Submitted 18 December, 2023; originally announced December 2023.

Comments: 10 pages, 2 figures, 3 tables

arXiv:2310.15683 [pdf, other]

Prevalence and prevention of large language model use in crowd work

Authors: Veniamin Veselovsky, Manoel Horta Ribeiro, Philip Cozzolino, Andrew Gordon, David Rothschild, Robert West

Abstract: We show that the use of large language models (LLMs) is prevalent among crowd workers, and that targeted mitigation strategies can significantly reduce, but not eliminate, LLM use. On a text summarization task where workers were not directed in any way regarding their LLM use, the estimated prevalence of LLM use was around 30%, but was reduced by about half by asking workers to not use LLMs and by raising the cost of using them, e.g., by disabling copy-pasting. Secondary analyses give further insight into LLM use and its prevention: LLM use yields high-quality but homogeneous responses, which may harm research concerned with human (rather than model) behavior and degrade future models trained with crowdsourced data. At the same time, preventing LLM use may be at odds with obtaining high-quality responses; e.g., when requesting workers not to use LLMs, summaries contained fewer keywords carrying essential information. Our estimates will likely change as LLMs increase in popularity or capabilities, and as norms around their usage change. Yet, understanding the co-evolution of LLM-based tools and users is key to maintaining the validity of research done using crowdsourcing, and we provide a critical baseline before widespread adoption ensues.

Submitted 24 October, 2023; originally announced October 2023.

Comments: VV and MHR equal contribution. 14 pages, 1 figure, 1 table

arXiv:2310.12696 [pdf, other]

Protection from Evil and Good: The Differential Effects of Page Protection on Wikipedia Article Quality

Authors: Thorsten Ruprechter, Manoel Horta Ribeiro, Robert West, Denis Helic

Abstract: Wikipedia, the Web's largest encyclopedia, frequently faces content disputes or malicious users seeking to subvert its integrity. Administrators can mitigate such disruptions by enforcing "page protection" that selectively limits contributions to specific articles to help prevent the degradation of content. However, this practice contradicts one of Wikipedia's fundamental principles, that it is open to all contributors, and may hinder further improvement of the encyclopedia. In this paper, we examine the effect of page protection on article quality to better understand whether and when page protections are warranted. Using decade-long data on page protections from the English Wikipedia, we conduct a quasi-experimental study analyzing pages that received "requests for page protection", i.e., written appeals submitted by Wikipedia editors to administrators to impose page protections. We match pages that indeed received page protection with similar pages that did not and quantify the causal effect of the interventions on a well-established measure of article quality. Our findings indicate that the effect of page protection on article quality depends on the characteristics of the page prior to the intervention: high-quality articles are affected positively, whereas low-quality articles are affected negatively. Subsequent analysis suggests that high-quality articles degrade when left unprotected, whereas low-quality articles improve.
Overall, our study characterizes page protection on Wikipedia and informs best practices on whether and when to protect an article.

Submitted 19 October, 2023; originally announced October 2023.

Comments: Under Review, 11 pages

arXiv:2310.12186 [pdf, other]

Stranger Danger! Cross-Community Interactions with Fringe Users Increase the Growth of Fringe Communities on Reddit

Authors: Giuseppe Russo, Manoel Horta Ribeiro, Robert West

Abstract: Fringe communities promoting conspiracy theories and extremist ideologies have thrived on mainstream platforms, raising questions about the mechanisms driving their growth. Here, we hypothesize and study a possible mechanism: new members may be recruited through fringe-interactions, i.e., the exchange of comments between members and non-members of fringe communities. We apply text-based causal inference techniques to study the impact of fringe-interactions on the growth of three prominent fringe communities on Reddit: r/Incel, r/GenderCritical, and r/The_Donald. Our results indicate that fringe-interactions attract new members to fringe communities. Users who receive these interactions are up to 4.2 percentage points (pp) more likely to join fringe communities than similar, matched users who do not. This effect is influenced by 1) the characteristics of communities where the interaction happens (e.g., left vs. right-leaning communities) and 2) the language used in the interactions. Interactions using toxic language have a 5pp higher chance of attracting newcomers to fringe communities than non-toxic interactions. We find no effect when repeating this analysis by replacing fringe (r/Incel, r/GenderCritical, and r/The_Donald) with non-fringe communities (r/climatechange, r/NBA, r/leagueoflegends), suggesting this growth mechanism is specific to fringe communities. Overall, our findings suggest that curtailing fringe-interactions may reduce the growth of fringe communities on mainstream platforms.

Submitted 18 October, 2023; originally announced October 2023.

Comments: 11 Pages, 7 Figures, 3 Tables

arXiv:2309.10190 [pdf, ps, other]

Revisited convexity notions for $L^\infty$ variational problems

Authors: Ana Margarida Ribeiro, Elvira Zappale

Abstract: We carry out an in-depth study of the convexity notions that arise in the study of weak* lower semicontinuity of supremal functionals, as well as those raised by the power-law approximation of such functionals. Our work is motivated by what is known about the analogous integral functionals and aims to establish a solid groundwork to ease further research in the $L^\infty$ context.

Submitted 18 September, 2023; originally announced September 2023.

MSC Class: 26B25; 49J45

arXiv:2308.16428 [pdf, other]

On the topology of the Milnor Boundary for real analytic singularities

Authors: R. Araújo dos Santos, A. Menegon, M. Ribeiro, J. Seade, I. D. Santamaria Guarín

Abstract: We study the topology of the boundaries of the Milnor fibers of real analytic map-germs $f: (\mathbb{R}^M,0) \to (\mathbb{R}^K,0)$ and $f_{I}:=\Pi_{I}\circ f : (\mathbb{R}^M,0) \to (\mathbb{R}^I,0)$ that admit Milnor's tube fibrations, where $\Pi_{I}:(\mathbb{R}^K,0)\to (\mathbb{R}^{I},0)$ is the canonical projection for $1\leq I<K$. For each $I$ we prove that the Milnor boundary $\partial F_{I}$ is given by the double of the Milnor tube fiber $F_{I+1}$. We prove that if $K-I\geq 2$, then the pair $(\partial F_{I},\partial F_{f})$ is a generalized $(K-I-1)$-open-book decomposition with binding $\partial F_{f}$ and page $F_{f} \setminus \partial F_{f}$, the interior of the Milnor fiber $F_{f}$ (see the definition below). This allows us to prove several new Euler characteristic formulae connecting the Milnor boundaries $\partial F_{f}$, $\partial F_{I}$ with the respective links $\mathcal{L}_{f}$, $\mathcal{L}_{I}$, for each $1\leq I<K$, and a Lê-Greuel type formula for the Milnor boundary.
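Since the abstract states that $\partial F_{I}$ is the double of $F_{I+1}$, one Euler characteristic relation follows directly from the standard formula for the double $D(M) = M \cup_{\partial M} M$ of a compact manifold with boundary (Mayer-Vietoris with two copies of $M$ glued along $\partial M$). It is offered as a worked consequence under that assumption, not as one of the paper's own formulae.

```latex
\chi(\partial F_{I}) \;=\; \chi\bigl(D(F_{I+1})\bigr) \;=\; 2\,\chi(F_{I+1}) - \chi(\partial F_{I+1})
```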

Submitted 30 August, 2023; originally announced August 2023.

arXiv:2308.14624 [pdf, ps, other]

doi 10.1016/j.jalgebra.2024.08.024

Universally injective and integral contractions on relative Lipschitz saturation of algebras

Authors: Thiago da Silva, Maico Ribeiro

Abstract: In this work, we obtain contraction results for a class of diagrams of ring morphisms that strictly includes the ones obtained by Lipman. Further, we present some applications to quotients and to changing the base ring in the saturation.

Submitted 28 August, 2023; originally announced August 2023.

arXiv:2308.13359 [pdf, ps, other]

Topology of first integrals via Milnor fibrations II

Authors: Fernando Reis, Maico Ribeiro, Euripedes da Silva

Abstract: This survey is the continuation of a series of works aimed at applying tools from Singularity Theory to Differential Equations. More precisely, we use the powerful theory of Milnor fibrations to give geometric-topological classifications of first integrals of differential systems. In the previous paper, systems of first-order quasilinear partial differential equations were examined, focusing on the case of an isolated singularity. Now, we address both cases of isolated and non-isolated singularities for more general dynamical systems (namely, foliations) that admit at least one first integral. For this, we use recently established connections between harmonic morphisms and Milnor fibrations to provide topological and geometric descriptions of the foliations under consideration. In particular, we apply these results to analyze the graph of solutions of some quasilinear systems.

Submitted 25 August, 2023; originally announced August 2023.

MSC Class: 14J17; 57R30; 14D06

arXiv:2308.13350 [pdf, other]

doi 10.1007/s40687-024-00453-y

Some remarks about $ρ$-regularity for real analytic maps

Authors: Maico Ribeiro, Ivan Santamaria, Thiago da Silva

Abstract: In this paper, we discuss the concept of $\rho$-regularity of analytic map germs and its close relationship with the existence of locally trivial smooth fibrations, known as Milnor fibrations. The presence of a Thom regular stratification or of the Milnor condition (b) at the origin indicates the transversality of the fibers of the map G with respect to the levels of a function $\rho$, which guarantees $\rho$-regularity. Consequently, both conditions are crucial for the presence of open book structures and Milnor fibrations. The work aims to provide a comprehensive overview of the main results concerning the existence of Thom regular stratifications and the Milnor condition (b) for germs of analytic maps. It presents strategies and criteria to identify and ensure these regularity conditions and discusses situations where they may not be satisfied. The goal is to understand the presence and limitations of these conditions in various contexts.

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2308.12442 [pdf, other]

Similarities in Massive Separation Across Reynolds Numbers for Swept and Tapered Finite Span Wings

Authors: Jacob Neal, Anton Burtsev, Jean Helder Marques Ribeiro, Kunihiko Taira, Vassilios Theofilis, Michael Amitay

Abstract: Experimental investigations were performed to elucidate the features of flow fields occurring over cantilevered, finite-aspect-ratio NACA 0015 wings at high angles of attack with various sweep angles and taper ratios. Volumetric Stereoscopic Particle Image Velocimetry experiments were performed at mean-chord-based Reynolds numbers of 247,500 in a wind tunnel and 600 in a water tunnel. Direct Numerical Simulations (DNS) of the water tunnel test section, including the cantilevered model, were also performed at the lower Reynolds number. The low Reynolds number experiments, low Reynolds number simulations, and high Reynolds number experiments all showed that sweeping the leading edge back shifted the largest portion of the reversed flow towards the wingtip, while sweeping the trailing edge forward shifted the reversed flow towards the wing root. A detailed parametric sweep of planform geometry systematically varied the leading and trailing edge sweep angles and taper ratios of the finite wings. It was found that the large-scale vortical structures resulting from varying these parameters at the two Reynolds numbers share surprisingly many three-dimensional topological features, despite Reynolds numbers that differ by orders of magnitude.

Submitted 23 August, 2023; originally announced August 2023.

arXiv:2308.10398 [pdf, other]

Causally estimating the effect of YouTube's recommender system using counterfactual bots

Authors: Homa Hosseinmardi, Amir Ghasemian, Miguel Rivera-Lanas, Manoel Horta Ribeiro, Robert West, Duncan J. Watts

Abstract: In recent years, critics of online platforms have raised concerns about the ability of recommendation algorithms to amplify problematic content, with potentially radicalizing consequences. However, attempts to evaluate the effect of recommenders have suffered from a lack of appropriate counterfactuals (what a user would have viewed in the absence of algorithmic recommendations) and hence cannot disentangle the effects of the algorithm from a user's intentions. Here we propose a method that we call "counterfactual bots" to causally estimate the role of algorithmic recommendations on the consumption of highly partisan content. By comparing bots that replicate real users' consumption patterns with "counterfactual" bots that follow rule-based trajectories, we show that, on average, relying exclusively on the recommender results in less partisan consumption, where the effect is most pronounced for heavy partisan consumers. Following a similar method, we also show that if partisan consumers switch to moderate content, YouTube's sidebar recommender "forgets" their partisan preference within roughly 30 videos regardless of their prior history, while homepage recommendations shift more gradually towards moderate content. Overall, our findings indicate that, at least since the algorithm changes that YouTube implemented in 2019, individual consumption patterns mostly reflect individual preferences, where algorithmic recommendations play, if anything, a moderating role.

Submitted 1 December, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

arXiv:2308.01386 [pdf, other]

Manual Tests Do Smell! Cataloging and Identifying Natural Language Test Smells

Authors: Elvys Soares, Manoel Aranda, Naelson Oliveira, Márcio Ribeiro, Rohit Gheyi, Emerson Souza, Ivan Machado, André Santos, Baldoino Fonseca, Rodrigo Bonifácio

Abstract: Background: Test smells indicate potential problems in the design and implementation of automated software tests that may negatively impact test code maintainability, coverage, and reliability. When poorly described, manual tests written in natural language may suffer from related problems, which enables their analysis from the point of view of test smells. Despite the possible harm to manually tested software products, little is known about test smells in manual tests, which leaves many open questions regarding their types, frequency, and harm to tests written in natural language. Aims: Therefore, this study aims to contribute to a catalog of test smells for manual tests. Method: We follow a two-fold empirical strategy. First, we conduct an exploratory study of the manual tests of three systems: the Ubuntu Operating System, the Brazilian Electronic Voting Machine, and the User Interface of a large smartphone manufacturer. We use our findings to propose a catalog of eight test smells and identification rules based on syntactical and morphological text analysis, validating our catalog with 24 in-company test engineers. Second, using our proposals, we create a tool based on Natural Language Processing (NLP) to analyze the subject systems' tests, validating the results. Results: We observed the occurrence of eight test smells. A survey of 24 in-company test professionals showed that 80.7% agreed with our catalog definitions and examples.
Our NLP-based tool achieved a precision of 92%, recall of 95%, and f-measure of 93.5%, and its execution revealed 13,169 occurrences of our cataloged test smells in the analyzed systems. Conclusion: We contribute a catalog of natural language test smells and novel detection strategies that better exploit the capabilities of current NLP mechanisms, with promising results and reduced effort to analyze tests written in different languages.
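Assuming the reported f-measure is the usual F1 score (the harmonic mean of precision and recall), the three reported figures are mutually consistent, as this small Python check shows. The general F-beta form and the function name are illustrative, not taken from the paper's tool.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall (F1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(round(f_measure(0.92, 0.95) * 100, 1))  # 93.5
```

The harmonic mean is used rather than the arithmetic mean because it stays low whenever either precision or recall is low, so a detector cannot score well by trading one entirely for the other.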

Submitted 2 August, 2023; originally announced August 2023.

Comments: The 17th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2023

arXiv:2307.16709 [pdf, other]

Multilingual context-based pronunciation learning for Text-to-Speech

Authors: Giulia Comini, Manuel Sam Ribeiro, Fan Yang, Heereen Shim, Jaime Lorenzo-Trueba

Abstract: Phonetic information and linguistic knowledge are essential components of a Text-to-speech (TTS) front-end. Given a language, a lexicon can be collected offline, and Grapheme-to-Phoneme (G2P) relationships are usually modeled in order to predict the pronunciation of out-of-vocabulary (OOV) words. Additionally, post-lexical phonology, often defined in the form of rule-based systems, is used to correct pronunciation within or between words. In this work we showcase a multilingual unified front-end system that addresses any pronunciation-related task, typically handled by separate modules. We evaluate the proposed model on G2P conversion and other language-specific challenges, such as homograph and polyphone disambiguation, post-lexical rules and implicit diacritization. We find that the multilingual model is competitive across languages and tasks; however, some trade-offs exist when compared to equivalent monolingual solutions.

Submitted 31 July, 2023; originally announced July 2023.

Comments: 5 pages, 2 figures, 5 tables. Interspeech 2023

arXiv:2307.16696 [pdf, other]

Large Language Models for Education: Grading Open-Ended Questions Using ChatGPT

Authors: Gustavo Pinto, Isadora Cardoso-Pereira, Danilo Monteiro Ribeiro, Danilo Lucena, Alberto de Souza, Kiev Gama

Abstract: As a way of addressing increasingly sophisticated problems, software professionals face the constant challenge of seeking improvement. However, for these individuals to enhance their skills, their process of studying and training must involve feedback that is both immediate and accurate. In the context of software companies, where the number of professionals undergoing training is large but the number of qualified professionals available to provide corrections is small, delivering effective feedback becomes even more challenging. To circumvent this challenge, this work explores using Large Language Models (LLMs) to support the correction of answers to open-ended questions in technical training. In this study, we used ChatGPT to correct answers to open-ended questions given by 42 industry professionals on two topics. Evaluating the corrections and feedback provided by ChatGPT, we observed that it is capable of identifying semantic details in responses that other metrics cannot observe. Furthermore, we noticed that, in general, subject matter experts tended to agree with the corrections and feedback given by ChatGPT.

Submitted 1 August, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

Comments: 10 pages, 2 figures

Journal ref: SBES EDU Track, 2023

arXiv:2307.16679

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

Authors: Guangyan Zhang, Thomas Merritt, Manuel Sam Ribeiro, Biel Tura-Vecino, Kayoko Yanagisawa, Kamil Pokora, Abdelhamid Ezzerg, Sebastian Cygert, Ammar Abbas, Piotr Bilinski, Roberto Barra-Chicote, Daniel Korzekwa, Jaime Lorenzo-Trueba

Abstract: Neural text-to-speech systems are often optimized on L1/L2 losses, which make strong assumptions about the distributions of the target data space. To address those assumptions, Normalizing Flows and Diffusion Probabilistic Models were recently proposed as alternatives. In this paper, we compare traditional L1/L2-based approaches to diffusion and flow-based approaches for the tasks of prosody and mel-spectrogram prediction for text-to-speech synthesis. We use a prosody model to generate log-f0 and duration features, which are used to condition an acoustic model that generates mel-spectrograms. Experimental results demonstrate that the flow-based model achieves the best performance for spectrogram prediction, improving over equivalent diffusion and L1 models. Meanwhile, both diffusion and flow-based prosody predictors result in significant improvements over a typical L2-trained prosody model.
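As background only (the abstract does not give the paper's model equations), flow-based models such as those compared here are trained with the standard change-of-variables identity. Assuming an invertible, differentiable map $f$ from a target frame $x$ (for example a mel-spectrogram frame) to a latent $z = f(x)$ with a simple base density $p_Z$, the exact log-likelihood is:

```latex
\log p_X(x) = \log p_Z\big(f(x)\big) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|
```

Maximizing this likelihood lets the model fit non-Gaussian targets, whereas minimizing an L2 loss is equivalent to maximum likelihood under a fixed-variance Gaussian and minimizing an L1 loss is equivalent to maximum likelihood under a fixed-scale Laplace distribution, which is the kind of distributional assumption the abstract refers to.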

Submitted 31 July, 2023; originally announced July 2023.

Comments: 5 pages, 2 figures, 5 tables. Interspeech 2023

arXiv:2307.16643

Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings

Authors: Manuel Sam Ribeiro, Giulia Comini, Jaime Lorenzo-Trueba

Abstract: The Grapheme-to-Phoneme (G2P) task aims to convert orthographic input into a discrete phonetic representation. G2P conversion is beneficial to various speech processing applications, such as text-to-speech and speech recognition. However, these tend to rely on manually annotated pronunciation dictionaries, which are often time-consuming and costly to acquire. In this paper, we propose a method to improve the G2P conversion task by learning pronunciation examples from audio recordings. Our approach bootstraps a G2P model with a small set of annotated examples. The G2P model is used to train a multilingual phone recognition system, which then decodes speech recordings with a phonetic representation. Given hypothesized phoneme labels, we learn pronunciation dictionaries for out-of-vocabulary words, and we use those to re-train the G2P system. Results indicate that our approach consistently improves the phone error rate of G2P systems across languages and amounts of available data.
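The abstract reports results as phone error rate (PER). Below is a minimal Python sketch of the usual definition, assuming PER is the Levenshtein distance between hypothesized and reference phone sequences divided by the number of reference phones; the paper's exact scoring procedure is not described here, and the phone symbols are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences.

    Substitutions, insertions and deletions each cost 1.
    """
    # prev[j] holds the distance between the first (i - 1) ref phones and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference phone r
                curr[j - 1] + 1,         # insert hypothesized phone h
                prev[j - 1] + (r != h),  # substitute (cost 0 if phones match)
            )
        prev = curr
    return prev[-1]


def phone_error_rate(refs, hyps):
    """Corpus-level PER: total edit operations / total reference phones."""
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("references contain no phones")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / total


# Illustrative usage with hypothetical ARPAbet-style transcriptions
refs = [["k", "ae", "t"], ["d", "ao", "g"]]
hyps = [["k", "ah", "t"], ["d", "ao", "g"]]
print(phone_error_rate(refs, hyps))
```

Summing errors and reference lengths over the whole corpus, rather than averaging per-word rates, weights each word by its length.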

Submitted 31 July, 2023; originally announced July 2023.

Comments: 5 pages, 2 figures, 4 tables. Interspeech 2023

arXiv:2307.09151

doi 10.1109/ACCESS.2023.3292788

Enhancing Network Slicing Architectures with Machine Learning, Security, Sustainability and Experimental Networks Integration

Authors: Joberto S. B. Martins, Tereza C. Carvalho, Rodrigo Moreira, Cristiano Both, Adnei Donatti, João H. Corrêa, José A. Suruagy, Sand L. Corrêa, Antonio J. G. Abelem, Moisés R. N. Ribeiro, Jose-Marcos Nogueira, Luiz C. S. Magalhães, Juliano Wickboldt, Tiago Ferreto, Ricardo Mello, Rafael Pasquini, Marcos Schwarz, Leobino N. Sampaio, Daniel F. Macedo, José F. de Rezende, Kleber V. Cardoso, Flávio O. Silva

Abstract: Network Slicing (NS) is an essential technique extensively used in 5G network computing strategies, mobile edge computing, mobile cloud computing, and verticals like the Internet of Vehicles and industrial IoT, among others. NS is foreseen as one of the leading enablers for futuristic and highly demanding 6G applications since it allows the optimization and customization of scarce and disputed resources among dynamic, demanding clients with highly distinct application requirements. Various standardization organizations (e.g., 3GPP, with its proposal for new-generation networks) and state-of-the-art 5G/6G research projects are proposing new NS architectures. However, new NS architectures have to deal with an extensive range of requirements, which means that NS architecture proposals typically fulfill the needs of specific sets of domains with commonalities. The Slicing Future Internet Infrastructures (SFI2) architecture proposal explores the gap resulting from the diversity of NS architectures' target domains by proposing a new NS reference architecture with a defined focus on integrating experimental networks and enhancing the NS architecture with Machine Learning (ML) native optimizations, energy-efficient slicing, and slicing-tailored security functionalities. The main architectural contribution of SFI2 is the use of the slice-as-a-service paradigm for end-to-end orchestration of resources across multi-domain and multi-technology experimental networks.
In addition, the SFI2 reference architecture instantiations will enhance the multi-domain and multi-technology integrated experimental network deployment with native ML optimization, energy-aware slicing, and slicing-tailored security functionalities for the practical domain.

Submitted 18 July, 2023; originally announced July 2023.

Comments: 10 pages, 11 figures

ACM Class: I.2.1; C.2.1; C.2.3

Journal ref: IEEE ACCESS 2023

arXiv:2307.06954

ACTI at EVALITA 2023: Overview of the Conspiracy Theory Identification Task

Authors: Giuseppe Russo, Niklas Stoehr, Manoel Horta Ribeiro

Abstract: The Conspiracy Theory Identification task is a new shared task proposed for the first time at Evalita 2023. The ACTI challenge, based exclusively on comments published on conspiratorial Telegram channels, is divided into two subtasks: (i) Conspiratorial Content Classification, identifying conspiratorial content, and (ii) Conspiratorial Category Classification, classifying content by specific conspiracy theory. A total of fifteen teams participated in the task, with 81 submissions in total. We show that the best-performing approaches were based on large language models. We finally draw conclusions about the use of these models for counteracting the spread of misinformation on online platforms.

Submitted 2 September, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

Comments: Accepted at the Evalita Workshop 2023

arXiv:2307.03791

Tameness conditions and the Milnor fibrations for composite singularities

Authors: R. N. Araújo dos Santos, D. Dreibelbis, M. F. Ribeiro, I. D. Santamaría Guarín

Abstract: In this paper, we introduce a new regularity condition that characterizes the tameness of a composite singularity $H=G\circ F$ in a sharp way. Our approach provides a natural tool that links the topology of the Milnor tube fibrations through the Milnor fibers of the respective components of the map germs $F$, $G$ and $H = G\circ F$. We also study the invariance of tameness under $\mathcal{L}$-equivalence, $\mathcal{R}$-equivalence, and hence under $\mathcal{A}$-equivalence, and we give conditions under which the tameness of two of the map germs $F$, $G$ and $H=G\circ F$ implies the tameness of the third. As an application, we show how to relate the Euler characteristics of the Milnor fibers of $F$, $G$ and $H$ to each other.

Submitted 24 May, 2023; originally announced July 2023.

MSC Class: 58K15; 14D06; 58K35; 14B05; 32S55; 32S05

arXiv:2307.02903

PUFFIN: A Path-Unifying Feed-Forward Interfaced Network for Vapor Pressure Prediction

Authors: Vinicius Viena Santana, Carine Menezes Rebello, Luana P. Queiroz, Ana Mafalda Ribeiro, Nadia Shardt, Idelfonso B. R. Nogueira

Abstract: Accurately predicting vapor pressure is vital for various industrial and environmental applications. However, obtaining accurate measurements for all compounds of interest is not possible due to the resource and labor intensity of experiments. The demand for resources and labor further multiplies when a temperature-dependent relationship for predicting vapor pressure is desired. In this paper, we propose PUFFIN (Path-Unifying Feed-Forward Interfaced Network), a machine learning framework that combines transfer learning with a new inductive bias node inspired by domain knowledge (the Antoine equation) to improve vapor pressure prediction. By leveraging inductive bias and transfer learning using graph embeddings, PUFFIN outperforms alternative strategies that do not use inductive bias or that use generic descriptors of compounds. The framework's incorporation of domain-specific knowledge to overcome the limitation of poor data availability shows its potential for broader applications in chemical compound analysis, including the prediction of other physicochemical properties. Importantly, our proposed machine learning framework is partially interpretable, because the inductive Antoine node yields network-derived Antoine equation coefficients. It would then be possible to directly incorporate the obtained analytical expression in process design software for better prediction and control of processes occurring in industry and the environment.
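The Antoine equation that inspires PUFFIN's inductive-bias node has the form $\log_{10} P = A - B/(C + T)$. Below is a minimal Python sketch of that closed form. The function name is hypothetical, and the coefficients are commonly tabulated literature values for water (P in mmHg, T in °C, valid roughly from 1 to 100 °C), not coefficients produced by PUFFIN.

```python
def antoine_vapor_pressure(temp_c, a, b, c):
    """Vapor pressure from the Antoine equation: log10(P) = A - B / (C + T)."""
    return 10 ** (a - b / (c + temp_c))


# Literature Antoine coefficients for water (P in mmHg, T in °C, ~1-100 °C)
WATER = (8.07131, 1730.63, 233.426)

p_boil = antoine_vapor_pressure(100.0, *WATER)
print(f"{p_boil:.1f} mmHg")  # about 760 mmHg, i.e. roughly 1 atm at the normal boiling point
```

In the framework described above, a network would instead supply A, B and C for each compound; this closed-form expression is what makes those outputs interpretable and reusable in process design software.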

Submitted 8 July, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

arXiv:2306.17298

Tube2Vec: Social and Semantic Embeddings of YouTube Channels

Authors: Léopaul Boesinger, Manoel Horta Ribeiro, Veniamin Veselovsky, Robert West

Abstract: Research using YouTube data often explores social and semantic dimensions of channels and videos. Typically, analyses rely on laborious manual annotation of content and content creators, often found by low-recall methods such as keyword search. Here, we explore an alternative approach, using latent representations (embeddings) obtained via machine learning. Using a large dataset of YouTube links shared on Reddit, we create embeddings that capture social sharing behavior, video metadata (title, description, etc.), and YouTube's video recommendations. We evaluate these embeddings using crowdsourcing and existing datasets, finding that recommendation embeddings excel at capturing both social and semantic dimensions, although social-sharing embeddings better correlate with existing partisan scores. We share embeddings capturing the social and semantic dimensions of 44,000 YouTube channels for the benefit of future research on YouTube: https://github.com/epfl-dlab/youtube-embeddings.

Submitted 29 June, 2023; originally announced June 2023.

arXiv:2306.07899

Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks

Authors: Veniamin Veselovsky, Manoel Horta Ribeiro, Robert West

Abstract: Large language models (LLMs) are remarkable data annotators. They can be used to generate high-fidelity supervised training data, as well as survey and experimental data. With the widespread adoption of LLMs, human gold-standard annotations are key to understanding the capabilities of LLMs and the validity of their results. However, crowdsourcing, an important, inexpensive way to obtain human annotations, may itself be impacted by LLMs, as crowd workers have financial incentives to use LLMs to increase their productivity and income. To investigate this concern, we conducted a case study on the prevalence of LLM usage by crowd workers. We reran an abstract summarization task from the literature on Amazon Mechanical Turk and, through a combination of keystroke detection and synthetic text classification, estimated that 33–46% of crowd workers used LLMs when completing the task. Although generalization to other, less LLM-friendly tasks is unclear, our results call for platforms, researchers, and crowd workers to find new ways to ensure that human data remain human, perhaps using the methodology proposed here as a stepping stone. Code/data: https://github.com/epfl-dlab/GPTurk

Submitted 13 June, 2023; originally announced June 2023.

Comments: 9 pages, 4 figures

arXiv:2306.03280

AHA!: Facilitating AI Impact Assessment by Generating Examples of Harms

Authors: Zana Buçinca, Chau Minh Pham, Maurice Jakesch, Marco Tulio Ribeiro, Alexandra Olteanu, Saleema Amershi

Abstract: While demands for change and accountability for harmful AI consequences mount, foreseeing the downstream effects of deploying AI systems remains a challenging task. We developed AHA! (Anticipating Harms of AI), a generative framework to assist AI practitioners and decision-makers in anticipating potential harms and unintended consequences of AI systems prior to development or deployment. Given an AI deployment scenario, AHA! generates descriptions of possible harms for different stakeholders. To do so, AHA! systematically considers the interplay between common problematic AI behaviors and their potential impacts on different stakeholders, and narrates these conditions through vignettes. These vignettes are then filled in with descriptions of possible harms by prompting crowd workers and large language models. By examining 4113 harms surfaced by AHA! for five different AI deployment scenarios, we found that AHA! generates meaningful examples of harms, with different problematic AI behaviors resulting in different types of harms. Prompting both crowds and a large language model with the vignettes resulted in more diverse examples of harms than those generated by either the crowd or the model alone. To gauge AHA!'s potential practical utility, we also conducted semi-structured interviews with responsible AI professionals (N=9). Participants found AHA!'s systematic approach to surfacing harms important for ethical reflection and discovered meaningful stakeholders and harms they believed they would not have thought of otherwise.
Participants, however, differed in their opinions about whether AHA! should be used upfront or as a secondary check, and noted that AHA! may shift harm anticipation from an ideation problem to a potentially demanding review problem. Drawing on our results, we discuss design implications of building tools to help practitioners envision possible harms.

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2305.17804

Targeted Data Generation: Finding and Fixing Model Weaknesses

Authors: Zexue He, Marco Tulio Ribeiro, Fereshte Khani

Abstract: Even when aggregate accuracy is high, state-of-the-art NLP models often fail systematically on specific subgroups of data, resulting in unfair outcomes and eroding user trust. Additional data collection may not help in addressing these weaknesses, as such challenging subgroups may be unknown to users and underrepresented in the existing and new data. We propose Targeted Data Generation (TDG), a framework that automatically identifies challenging subgroups and generates new data for those subgroups using large language models (LLMs) with a human in the loop. TDG estimates the expected benefit and potential harm of data augmentation for each subgroup, and selects the ones most likely to improve within-group performance without hurting overall performance. In our experiments, TDG significantly improves the accuracy on challenging subgroups for state-of-the-art sentiment analysis and natural language inference models, while also improving overall test accuracy.

Submitted 28 May, 2023; originally announced May 2023.

Comments: Accepted to ACL 2023

arXiv:2305.17106

Understanding Self-Efficacy in the Context of Software Engineering: A Qualitative Study in the Industry

Authors: Danilo Monteiro Ribeiro, Rayfran Rocha Lima, César França, Alberto de Souza, Isadora Cardoso-Pereira, Gustavo Pinto

Abstract: CONTEXT: Self-efficacy is a concept researched in various areas of knowledge that impacts various factors such as performance, satisfaction, and motivation. In Software Engineering, it has mainly been studied in the academic context, presenting results similar to other areas of knowledge. However, it is also important to understand its impact in the industrial context. OBJECTIVE: Therefore, this study aims to understand the impact of self-efficacy in the software development context, with a focus on understanding the behavioral signs of self-efficacy in software engineers and how self-efficacy can impact the workday of software engineers. METHOD: A qualitative study was conducted using semi-structured questionnaires with 31 interviewees from a software development company located in Brazil. The interviewees participated in a bootcamp and were later assigned to software development teams. Thematic analysis was used to analyze the data. RESULTS: In the perception of the interviewees, 21 signs were found that are related to people with high and low self-efficacy. These signs were divided into two dimensions: social and cognitive. Also, 18 situations were found that can lead to an increase or decrease of self-efficacy of software engineers. Finally, 12 factors were mentioned that can impact software development teams. CONCLUSION: This work identifies a set of behavioral signs that can help team leaders better perceive the self-efficacy of their members.
It also presents a set of situations that both leaders and individuals can use to improve self-efficacy in the development context, as well as factors in the software development context that can be impacted by self-efficacy. Finally, this work emphasizes the importance of understanding self-efficacy in the industrial context.

Submitted 2 June, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: 10 pages, 3 figures

Journal ref: Published at EASE 2023

arXiv:2305.15041

Generating Faithful Synthetic Data with Large Language Models: A Case Study in Computational Social Science

Authors: Veniamin Veselovsky, Manoel Horta Ribeiro, Akhil Arora, Martin Josifoski, Ashton Anderson, Robert West

Abstract: Large Language Models (LLMs) have democratized synthetic data generation, which in turn has the potential to simplify and broaden a wide gamut of NLP tasks. Here, we tackle a pervasive problem in synthetic data generation: its generative distribution often differs from the distribution of real-world data researchers care about (in other words, it is unfaithful). In a case study on sarcasm detection, we study three strategies to increase the faithfulness of synthetic data: grounding, filtering, and taxonomy-based generation. We evaluate these strategies using the performance of classifiers trained with generated synthetic data on real-world data. While all three strategies improve the performance of classifiers, we find that grounding works best for the task at hand. As synthetic data generation plays an ever-increasing role in NLP research, we expect this work to be a stepping stone in improving its utility. We conclude this paper with some recommendations on how to generate high(er)-fidelity synthetic data for specific tasks.

Submitted 24 May, 2023; originally announced May 2023.

Comments: 8 pages

arXiv:2305.12219

Collaborative Development of NLP models

Authors: Fereshte Khani, Marco Tulio Ribeiro

Abstract: Despite substantial advancements, Natural Language Processing (NLP) models often require post-training adjustments to enforce business rules, rectify undesired behavior, and align with user values. These adjustments involve operationalizing "concepts", i.e., dictating desired model responses to certain inputs. However, it is difficult for a single entity to enumerate and define all possible concepts, indicating a need for a multi-user, collaborative model alignment framework. Moreover, the exhaustive delineation of a concept is challenging, and an improper approach can create shortcuts or interfere with original data or other concepts. To address these challenges, we introduce CoDev, a framework that enables multi-user interaction with the model, thereby mitigating individual limitations. CoDev aids users in operationalizing their concepts using Large Language Models, relying on the principle that NLP models exhibit simpler behaviors in local regions. Our main insight is learning a local model for each concept, and a global model to integrate the original data with all concepts. We then steer a large language model to generate instances within concept boundaries where local and global disagree. Our experiments show CoDev is effective at helping multiple users operationalize concepts and avoid interference for a variety of scenarios, tasks, and models.

Submitted 24 May, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

arXiv:2304.09991

doi 10.1145/3600211.3604712

Supporting Human-AI Collaboration in Auditing LLMs with LLMs

Authors: Charvi Rastogi, Marco Tulio Ribeiro, Nicholas King, Harsha Nori, Saleema Amershi

Abstract: Large language models are becoming increasingly pervasive in society via deployment in sociotechnical systems. Yet these language models, be it for classification or generation, have been shown to be biased and behave irresponsibly, causing harm to people at scale. It is crucial to audit these language models rigorously. Existing auditing tools leverage humans, AI, or both to find failures. In this work, we draw upon literature in human-AI collaboration and sensemaking, and conduct interviews with research experts in safe and fair AI, to build upon the auditing tool AdaTest (Ribeiro and Lundberg, 2022), which is powered by a generative large language model (LLM). Through the design process we highlight the importance of sensemaking and human-AI communication to leverage complementary strengths of humans and generative models in collaborative auditing. To evaluate the effectiveness of the augmented tool, AdaTest++, we conduct user studies with participants auditing two commercial language models: OpenAI's GPT-3 and Azure's sentiment analysis model. Qualitative analysis shows that AdaTest++ effectively leverages human strengths such as schematization, hypothesis formation and testing. Further, with our tool, participants identified a variety of failure modes, covering 26 different topics over 2 tasks, including failures shown before in formal audits as well as previously under-reported ones.

Submitted 30 November, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: 21 pages, 3 figures

Journal ref: In Proceedings of the 2023 AAAI and ACM Conference on AI, Ethics, and Society. Association for Computing Machinery, New York, NY, USA, 913-926

arXiv:2304.07587

Laminar post-stall wakes of tapered swept wings

Authors: Jean Hélder Marques Ribeiro, Jacob Neal, Anton Burtsev, Michael Amitay, Vassilios Theofilis, Kunihiko Taira

Abstract: While tapered swept wings are widely used, the influence of taper on their post-stall wake characteristics remains largely unexplored. To address this issue, we conduct an extensive study using direct numerical simulations to characterize the wing taper and sweep effects on laminar separated wakes. We analyze flows behind NACA 0015 cross-sectional profile wings at post-stall angles of attack $\alpha=14^\circ$--$22^\circ$ with taper ratios $\lambda=0.27$--$1$, leading-edge sweep angles $0^\circ$--$50^\circ$, and semi-aspect ratios $sAR =1$ and $2$ at a mean-chord-based Reynolds number of $600$. Tapered wings have a smaller tip chord, which generates a weaker tip vortex and attenuates inboard downwash. This results in the development of unsteadiness over a large portion of the wingspan at high angles of attack. For tapered wings with backward-swept leading edges, unsteadiness emerges near the wing tip. On the other hand, wings with forward-swept trailing edges are shown to concentrate wake shedding structures near the wing root. For highly swept untapered wings, the wake is steady, while unsteady shedding vortices appear near the tip for tapered wings with high leading-edge sweep angles. For such wings, larger wake oscillations emerge near the root as the taper ratio decreases. While the combination of taper and sweep increases flow unsteadiness, we find that tapered swept wings have better aerodynamic performance than untapered and unswept wings, exhibiting higher time-averaged lift and lift-to-drag ratio.
The current findings shed light on the fundamental aspects of flow separation over tapered wings in the absence of turbulent flow effects.

Submitted 19 October, 2023; v1 submitted 15 April, 2023; originally announced April 2023.

arXiv:2304.06774

doi 10.1063/5.0141388

Confined ionic liquids films under shear: The importance of the chemical nature of the solid surface

Authors: Kalil Bernardino, Mauro C. C. Ribeiro

Abstract: Ionic liquids have generated interest in applications as lubricants and as additives to conventional lubricants due to their unique physical properties. In these applications, the liquid thin film can be subjected simultaneously to extremely high shear and loads in addition to nanoconfinement effects. Here, we use molecular dynamics simulations with a coarse-grained model to study a nanometric film of an ionic liquid confined between two planar solid surfaces both at equilibrium and at several shear rates. The strength of the interaction between the solid surface and the ions was changed by simulating three different surfaces with enhanced interactions with different ions. Increasing the interaction with either the cation or the anion leads to the formation of a solid-like layer that moves alongside the substrates, but this layer can exhibit different structures and stability. An increase in interaction with the high-symmetry anion produces a more regular structure that is more resistant to the effects of shear and viscous heating. Two definitions were proposed and used for the calculation of the viscosity: a local definition based on microscopic characteristics of the liquid and an engineering definition based on the forces measured at the solid surfaces, with the former displaying a correlation with the layered structure induced by the surfaces.
Because of the shear-thinning behavior of the ionic liquids as well as the temperature rise brought on by viscous heating, both the engineering and the local viscosities decrease as the shear rate increases.
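The abstract contrasts a local viscosity with an engineering viscosity based on the forces measured at the solid surfaces. Below is a minimal Python sketch of a generic force-based definition, assuming the ratio of wall shear stress to nominal shear rate across the film. The paper's exact expression is not given in the abstract, and the function name and numbers are hypothetical.

```python
def effective_viscosity(shear_force, wall_area, wall_speed_difference, gap):
    """Engineering-style viscosity of a sheared film, eta = tau / gamma_dot.

    tau       = shear_force / wall_area           (shear stress transmitted to the walls)
    gamma_dot = wall_speed_difference / gap       (nominal shear rate across the film)
    Units must be consistent, e.g. SI: N, m^2, m/s, m -> Pa*s.
    """
    if gap <= 0 or wall_area <= 0:
        raise ValueError("gap and wall_area must be positive")
    if wall_speed_difference == 0:
        raise ValueError("viscosity is undefined at zero shear rate")
    shear_stress = shear_force / wall_area
    shear_rate = wall_speed_difference / gap
    return shear_stress / shear_rate


# Hypothetical nanoscale numbers: 1 nN over a 10 nm x 10 nm wall,
# walls moving 10 m/s relative to each other across a 2 nm film
print(effective_viscosity(1e-9, 1e-16, 10.0, 2e-9))
```

Under this definition, shear thinning appears as a wall shear stress that grows more slowly than the imposed shear rate, so the computed viscosity falls as the shear rate increases.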

Submitted 13 April, 2023; originally announced April 2023.

Comments: 32 pages, 8 figures

Journal ref: J. Chem. Phys. 158, 094712 (2023)

arXiv:2303.16151

Forecasting Large Realized Covariance Matrices: The Benefits of Factor Models and Shrinkage

Authors: Rafael Alves, Diego S. de Brito, Marcelo C. Medeiros, Ruy M. Ribeiro

Abstract: We propose a model to forecast large realized covariance matrices of returns, applying it daily to the constituents of the S&P 500. To address the curse of dimensionality, we decompose the return covariance matrix using standard firm-level factors (e.g., size, value, and profitability) and use sectoral restrictions in the residual covariance matrix. This restricted model is then estimated using vector heterogeneous autoregressive (VHAR) models with the least absolute shrinkage and selection operator (LASSO). Our methodology improves forecasting precision relative to standard benchmarks and leads to better estimates of minimum-variance portfolios.
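A heterogeneous autoregressive (HAR) model forecasts a realized measure from its daily value and its averages over roughly the past week and month. Below is a minimal Python sketch of those regressors for a single series, assuming the conventional 5-day and 22-day windows; the paper's exact windows, the vectorization across covariance entries, and the LASSO estimation step are not shown.

```python
def har_features(series, t, weekly=5, monthly=22):
    """HAR regressors available at day t, used to forecast series[t + 1].

    Returns (daily value, mean of the last `weekly` days, mean of the last
    `monthly` days), where every window ends at and includes day t.
    """
    if t + 1 < monthly:
        raise ValueError("need at least `monthly` observations up to day t")
    daily = series[t]
    week = sum(series[t - weekly + 1 : t + 1]) / weekly
    month = sum(series[t - monthly + 1 : t + 1]) / monthly
    return daily, week, month


# Toy realized-variance series (values 1..30), features at the last day
toy_series = list(range(1, 31))
print(har_features(toy_series, 29))
```

In a VHAR setting, these three regressors would be built for every modeled series and stacked, and a LASSO penalty would then shrink the resulting large coefficient matrix.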

Submitted 22 March, 2023; originally announced March 2023.

Showing 1–50 of 331 results for author: Ribeiro, M