doi 10.13140/RG.2.2.16176.74248/1

A Survey on Small-Scale Testbeds for Connected and Automated Vehicles and Robot Swarms

Authors: Armin Mokhtarian, Jianye Xu, Patrick Scheffe, Maximilian Kloock, Simon Schäfer, Heeseung Bang, Viet-Anh Le, Sangeet Ulhas, Johannes Betz, Sean Wilson, Spring Berman, Liam Paull, Amanda Prorok, Bassam Alrifaee

Abstract: Connected and automated vehicles and robot swarms hold transformative potential for enhancing safety, efficiency, and sustainability in the transportation and manufacturing sectors. Extensive testing and validation of these technologies is crucial for their deployment in the real world. While simulations are essential for initial testing, they often have limitations in capturing the complex dynamics of real-world interactions. This limitation underscores the importance of small-scale testbeds. These testbeds provide a realistic, cost-effective, and controlled environment for testing and validating algorithms, acting as an essential intermediary between simulation and full-scale experiments. This work serves to facilitate researchers' efforts in identifying existing small-scale testbeds suitable for their experiments and provide insights for those who want to build their own. In addition, it delivers a comprehensive survey of the current landscape of these testbeds. We derive 62 characteristics of testbeds based on the well-known sense-plan-act paradigm and offer an online table comparing 22 small-scale testbeds based on these characteristics. The online table is hosted on our designated public webpage www.cpm-remote.de/testbeds, and we invite testbed creators and developers to contribute to it. We closely examine nine testbeds in this paper, demonstrating how the derived characteristics can be used to present testbeds. Furthermore, we discuss three ongoing challenges concerning small-scale testbeds that we identified, i.e., small-scale to full-scale transition, sustainability, and power and resource management.

Submitted 26 August, 2024; originally announced August 2024.

Comments: 16 pages, 11 figures, 1 table. This work has been submitted to the IEEE Robotics & Automation Magazine for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2408.10886 [pdf, other]

doi 10.1109/RE59067.2024.00046

Leveraging LLMs for the Quality Assurance of Software Requirements

Authors: Sebastian Lubos, Alexander Felfernig, Thi Ngoc Trang Tran, Damian Garber, Merfat El Mansi, Seda Polat Erdeniz, Viet-Man Le

Abstract: Successful software projects depend on the quality of software requirements. Creating high-quality requirements is a crucial step toward successful software development. Effective support in this area can significantly reduce development costs and enhance the software quality. In this paper, we introduce and assess the capabilities of a Large Language Model (LLM) to evaluate the quality characteristics of software requirements according to the ISO 29148 standard. We aim to further improve the support of stakeholders engaged in requirements engineering (RE). We show how an LLM can assess requirements, explain its decision-making process, and examine its capacity to propose improved versions of requirements. We conduct a study with software engineers to validate our approach. Our findings emphasize the potential of LLMs for improving the quality of software requirements.

Submitted 20 August, 2024; originally announced August 2024.

Comments: Accepted for publication at the RE@Next! track of RE 2024

arXiv:2408.08781 [pdf, other]

Evaluating the Evaluator: Measuring LLMs' Adherence to Task Evaluation Instructions

Authors: Bhuvanashree Murugadoss, Christian Poelitz, Ian Drosos, Vu Le, Nick McKenna, Carina Suzana Negreanu, Chris Parnin, Advait Sarkar

Abstract: LLMs-as-a-judge is a recently popularized method which replaces human judgements in task evaluation (Zheng et al. 2024) with automatic evaluation using LLMs. Due to widespread use of RLHF (Reinforcement Learning from Human Feedback), state-of-the-art LLMs like GPT4 and Llama3 are expected to have strong alignment with human preferences when prompted for a quality judgement, such as the coherence of a text. While this seems beneficial, it is not clear whether the assessments by an LLM-as-a-judge constitute only an evaluation based on the instructions in the prompts, or reflect its preference for high-quality data similar to its fine-tuning data. To investigate how much influence prompting the LLMs-as-a-judge has on the alignment of AI judgements to human judgements, we analyze prompts with increasing levels of instructions about the target quality of an evaluation, for several LLMs-as-a-judge. Further, we compare to a prompt-free method using model perplexity as a quality measure instead. We aggregate a taxonomy of quality criteria commonly used across state-of-the-art evaluations with LLMs and provide this as a rigorous benchmark of models as judges. Overall, we show that the LLMs-as-a-judge benefit only slightly from highly detailed instructions in prompts and that perplexity can sometimes align better with human judgements than prompting, especially on textual quality.

Submitted 16 August, 2024; originally announced August 2024.

arXiv:2408.05865 [pdf, ps, other]

The complexity of strong conflict-free vertex-connection $k$-colorability

Authors: Sun-Yuan Hsieh, Hoang-Oanh Le, Van Bang Le, Sheng-Lung Peng

Abstract: We study a new variant of graph coloring by adding a connectivity constraint. A path in a vertex-colored graph is called conflict-free if there is a color that appears exactly once on its vertices. A connected graph $G$ is said to be strongly conflict-free vertex-connection $k$-colorable if $G$ admits a vertex $k$-coloring such that any two distinct vertices of $G$ are connected by a conflict-free shortest path. Among others, we show that deciding whether a given graph is strongly conflict-free vertex-connection $3$-colorable is NP-complete even when restricted to $3$-colorable graphs with diameter $3$, radius $2$ and domination number $3$, and, assuming the Exponential Time Hypothesis (ETH), cannot be solved in $2^{o(n)}$ time on such restricted input graphs with $n$ vertices. This hardness result is quite strong when compared to the ordinary $3$-COLORING problem: it is known that $3$-COLORING is solvable in polynomial time in graphs with bounded domination number, and assuming ETH, cannot be solved in $2^{o(\sqrt{n})}$ time in $n$-vertex graphs with diameter $3$ and radius $2$. On the positive side, we point out that a strong conflict-free vertex-connection coloring with minimum color number of a given split graph or a co-bipartite graph can be computed in polynomial time.
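
Illustration (not from the paper): the two definitions in this abstract can be checked by brute force on small graphs. The sketch below assumes an undirected graph stored as an adjacency dict and a coloring stored as a dict; the function names are hypothetical, and the enumeration of shortest paths is exponential in the worst case, so it is only meant to make the definitions concrete.

```python
from collections import Counter, deque
from itertools import combinations


def is_conflict_free(path, color):
    """True if some color appears exactly once among the path's vertices."""
    counts = Counter(color[v] for v in path)
    return any(n == 1 for n in counts.values())


def shortest_paths(adj, s, t):
    """Yield every shortest s-t path of an unweighted graph."""
    # Breadth-first search from s gives the distance layer of each vertex.
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if t not in dist:
        return

    def extend(path):
        u = path[-1]
        if u == t:
            yield path
            return
        # Only step to the next layer, and never past the layer of t.
        for w in adj[u]:
            if dist[w] == dist[u] + 1 and dist[w] <= dist[t]:
                yield from extend(path + [w])

    yield from extend([s])


def is_strong_cfvc_coloring(adj, color):
    """True if every pair of distinct vertices has a conflict-free shortest path."""
    return all(
        any(is_conflict_free(p, color) for p in shortest_paths(adj, s, t))
        for s, t in combinations(adj, 2)
    )
```

Since the only shortest path between adjacent vertices is the edge itself, any coloring accepted by `is_strong_cfvc_coloring` is in particular a proper coloring.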

Submitted 14 August, 2024; v1 submitted 11 August, 2024; originally announced August 2024.

Comments: The full version of a COCOON 2024 paper

arXiv:2407.21787 [pdf, other]

Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Authors: Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, Azalia Mirhoseini

Abstract: Scaling the amount of compute used to train language models has dramatically improved their capabilities. However, when it comes to inference, we often limit the amount of compute to only one attempt per problem. Here, we explore inference compute as another axis for scaling by increasing the number of generated samples. Across multiple tasks and models, we observe that coverage - the fraction of problems solved by any attempt - scales with the number of samples over four orders of magnitude. In domains like coding and formal proofs, where all answers can be automatically verified, these increases in coverage directly translate into improved performance. When we apply repeated sampling to SWE-bench Lite, the fraction of issues solved with DeepSeek-V2-Coder-Instruct increases from 15.9% with one sample to 56% with 250 samples, outperforming the single-attempt state-of-the-art of 43% which uses more capable frontier models. Moreover, using current API pricing, amplifying the cheaper DeepSeek model with five samples is more cost-effective and solves more issues than paying a premium for one sample from GPT-4o or Claude 3.5 Sonnet. Interestingly, the relationship between coverage and the number of samples is often log-linear and can be modelled with an exponentiated power law, suggesting the existence of inference-time scaling laws. Finally, we find that identifying correct samples out of many generations remains an important direction for future research in domains without automatic verifiers. When solving math word problems from GSM8K and MATH, coverage with Llama-3 models grows to over 95% with 10,000 samples. However, common methods to pick correct solutions from a sample collection, such as majority voting or reward models, plateau beyond several hundred samples and fail to fully scale with the sample budget.
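
Illustration (not from the paper): coverage as defined in this abstract, the fraction of problems solved by any attempt, can be computed directly from a table of per-sample outcomes. The sketch below assumes each problem is a list of booleans (one per generated sample, `True` meaning the verifier accepted it); the function name and the toy data are hypothetical, and the paper's own estimator may differ.

```python
def coverage(results, k):
    """Fraction of problems with at least one correct sample among the first k."""
    solved = sum(any(attempts[:k]) for attempts in results)
    return solved / len(results)


# Toy data: 4 problems, 3 samples each (hypothetical outcomes).
results = [
    [False, True, False],
    [False, False, False],
    [True, False, False],
    [False, False, True],
]
for k in (1, 2, 3):
    print(k, coverage(results, k))
```

Coverage can only stay flat or grow as `k` increases, because each extra sample can solve a previously unsolved problem but cannot unsolve one.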

Submitted 31 July, 2024; originally announced July 2024.

arXiv:2407.17790 [pdf, other]

Exploring the Limitations of Kolmogorov-Arnold Networks in Classification: Insights to Software Training and Hardware Implementation

Authors: Van Duy Tran, Tran Xuan Hieu Le, Thi Diem Tran, Hoai Luan Pham, Vu Trung Duong Le, Tuan Hai Vu, Van Tinh Nguyen, Yasuhiko Nakashima

Abstract: Kolmogorov-Arnold Networks (KANs), a novel type of neural network, have recently gained popularity and attention due to the ability to substitute multi-layer perceptrons (MLPs) in artificial intelligence (AI) with higher accuracy and interoperability. However, KAN assessment is still limited and cannot provide an in-depth analysis of a specific domain. Furthermore, no study has been conducted on the implementation of KANs in hardware design, which would directly demonstrate whether KANs are truly superior to MLPs in practical applications. As a result, in this paper, we focus on verifying KANs for classification problems, a common but significant topic in AI, using four different types of datasets. Furthermore, the corresponding hardware implementation is considered using the Vitis high-level synthesis (HLS) tool. To the best of our knowledge, this is the first article to implement hardware for KAN. The results indicate that KANs cannot achieve more accuracy than MLPs in highly complex datasets while utilizing substantially higher hardware resources. Therefore, MLP remains an effective approach for achieving accuracy and efficiency in software and hardware implementation.

Submitted 25 July, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

Comments: 6 pages, 3 figures, 2 tables

arXiv:2407.10657 [pdf, other]

An Empirical Study of Validating Synthetic Data for Formula Generation

Authors: Usneek Singh, José Cambronero, Sumit Gulwani, Aditya Kanade, Anirudh Khatry, Vu Le, Mukul Singh, Gust Verbruggen

Abstract: Large language models (LLMs) can be leveraged to help with writing formulas in spreadsheets, but resources on these formulas are scarce, impacting both the base performance of pre-trained models and limiting the ability to fine-tune them. Given a corpus of formulas, we can use a(nother) model to generate synthetic natural language utterances for fine-tuning. However, it is important to validate whether the NL generated by the LLM is indeed accurate to be beneficial for fine-tuning. In this paper, we provide empirical results on the impact of validating these synthetic training examples with surrogate objectives that evaluate the accuracy of the synthetic annotations. We demonstrate that validation improves performance over raw data across four models (2 open and 2 closed weight). Interestingly, we show that although validation tends to prune more challenging examples, it increases the complexity of problems that models can solve after being fine-tuned on validated data.

Submitted 23 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10227 [pdf, other]

doi 10.1109/ICST60714.2024.00017

KAT: Dependency-aware Automated API Testing with Large Language Models

Authors: Tri Le, Thien Tran, Duy Cao, Vy Le, Tien Nguyen, Vu Nguyen

Abstract: API testing has increasing demands for software companies. Prior API testing tools were aware of certain types of dependencies that needed to be concise between operations and parameters. However, their approaches, which are mostly done manually or using heuristic-based algorithms, have limitations due to the complexity of these dependencies. In this paper, we present KAT (Katalon API Testing), a novel AI-driven approach that leverages the large language model GPT in conjunction with advanced prompting techniques to autonomously generate test cases to validate RESTful APIs. Our comprehensive strategy encompasses various processes to construct an operation dependency graph from an OpenAPI specification and to generate test scripts, constraint validation scripts, test cases, and test data. Our evaluation of KAT using 12 real-world RESTful services shows that it can improve test coverage, detect more undocumented status codes, and reduce false positives in these services in comparison with a state-of-the-art automated test generation tool. These results indicate the effectiveness of using the large language model for generating test scripts and data for API testing.

Submitted 14 July, 2024; originally announced July 2024.

Comments: ICST 2024

arXiv:2407.02086 [pdf, ps, other]

On polynomial kernelization for Stable Cutset

Authors: Stefan Kratsch, Van Bang Le

Abstract: A stable cutset in a graph $G$ is a set $S\subseteq V(G)$ such that vertices of $S$ are pairwise non-adjacent and such that $G-S$ is disconnected, i.e., it is both a stable (or independent) set and a cutset (or separator). Unlike general cutsets, it is $NP$-complete to determine whether a given graph $G$ has any stable cutset. Recently, Rauch et al. [FCT 2023] gave a number of fixed-parameter tractable (FPT) algorithms, time $f(k)\cdot |V(G)|^c$, for Stable Cutset under a variety of parameters $k$ such as the size of a (given) dominating set, the size of an odd cycle transversal, or the deletion distance to $P_5$-free graphs. Earlier works imply FPT algorithms relative to clique-width and relative to solution size. We complement these findings by giving the first results on the existence of polynomial kernelizations for Stable Cutset, i.e., efficient preprocessing algorithms that return an equivalent instance of size polynomial in the parameter value. Under the standard assumption that $NP\nsubseteq coNP/poly$, we show that no polynomial kernelization is possible relative to the deletion distance to a single path, generalizing deletion distance to various graph classes, nor by the size of a (given) dominating set. We also show that under the same assumption no polynomial kernelization is possible relative to solution size, i.e., given $(G,k)$ answering whether there is a stable cutset of size at most $k$. On the positive side, we show polynomial kernelizations for parameterization by modulators to a single clique, to a cluster or a co-cluster graph, and by twin cover.
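
Illustration (not from the paper): verifying that a given set is a stable cutset is easy, even though deciding whether one exists is NP-complete. The sketch below assumes an undirected graph stored as an adjacency dict; the function name is hypothetical, and it treats an empty remainder $G-S$ as not disconnected, which is a convention chosen here.

```python
def is_stable_cutset(adj, S):
    """True if S is an independent set whose removal disconnects the graph."""
    S = set(S)
    # Stable: no edge has both endpoints inside S.
    if any(w in S for v in S for w in adj[v]):
        return False
    rest = [v for v in adj if v not in S]
    if not rest:
        return False  # convention: the empty graph is not disconnected
    # Depth-first search in G - S from an arbitrary remaining vertex.
    seen = {rest[0]}
    stack = [rest[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in S and w not in seen:
                seen.add(w)
                stack.append(w)
    # Cutset: some remaining vertex was not reached.
    return len(seen) < len(rest)
```

For example, in the 4-cycle 0-1-2-3-0 the set {0, 2} is a stable cutset, whereas no single vertex of a triangle is one.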

Submitted 2 July, 2024; originally announced July 2024.

Comments: For Dieter Kratsch on his 65th birthday

arXiv:2407.01983 [pdf, other]

SADL: An Effective In-Context Learning Method for Compositional Visual QA

Authors: Long Hoang Dang, Thao Minh Le, Vuong Le, Tu Minh Phuong, Truyen Tran

Abstract: Large vision-language models (LVLMs) offer a novel capability for performing in-context learning (ICL) in Visual QA. When prompted with a few demonstrations of image-question-answer triplets, LVLMs have demonstrated the ability to discern underlying patterns and transfer this latent knowledge to answer new questions about unseen images without the need for expensive supervised fine-tuning. However, designing effective vision-language prompts, especially for compositional questions, remains poorly understood. Adapting language-only ICL techniques may not necessarily work because we need to bridge the visual-linguistic semantic gap: Symbolic concepts must be grounded in visual content, which does not share the syntactic linguistic structures. This paper introduces SADL, a new visual-linguistic prompting framework for the task. SADL revolves around three key components: SAmpling, Deliberation, and Pseudo-Labeling of image-question pairs. Given an image-question query, we sample image-question pairs from the training data that are in semantic proximity to the query. To address the compositional nature of questions, the deliberation step decomposes complex questions into a sequence of subquestions. Finally, the sequence is progressively annotated one subquestion at a time to generate a sequence of pseudo-labels. We investigate the behaviors of SADL under OpenFlamingo on large-scale Visual QA datasets, namely GQA, GQA-OOD, CLEVR, and CRIC. The evaluation demonstrates the critical roles of sampling in the neighborhood of the image, the decomposition of complex questions, and the accurate pairing of the subquestions and labels. These findings do not always align with those found in language-only ICL, suggesting fresh insights in vision-language settings.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.06156 [pdf, other]

Stronger, Cheaper and Demonstration-Free Log Parsing with LLMs

Authors: Yi Xiao, Van-Hoang Le, Hongyu Zhang

Abstract: Log parsing, the process of converting raw log messages into structured formats, is an important initial step for automated analysis of logs of large-scale software systems. Traditional log parsers often rely on heuristics or handcrafted features, which may not generalize well across diverse log sources or require extensive model tuning. Recently, some log parsers have utilized powerful generative capabilities of large language models (LLMs). However, they heavily rely on demonstration examples, resulting in substantial overhead in LLM invocations. To address these issues, we propose LogBatcher, a cost-effective LLM-based log parser that requires no training process or labeled data. To leverage latent characteristics of log data and reduce the overhead, we divide logs into several partitions through clustering. Then we perform a cache matching process to match logs with previously parsed log templates. Finally, we provide LLMs with better prompt context specialized for log parsing by batching a group of logs from each partition. We have conducted experiments on 16 public log datasets and the results show that LogBatcher is effective and efficient for log parsing.

Submitted 12 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.04520 [pdf, other]

NATURAL PLAN: Benchmarking LLMs on Natural Language Planning

Authors: Huaixiu Steven Zheng, Swaroop Mishra, Hugh Zhang, Xinyun Chen, Minmin Chen, Azade Nova, Le Hou, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou

Abstract: We introduce NATURAL PLAN, a realistic planning benchmark in natural language containing 3 key tasks: Trip Planning, Meeting Planning, and Calendar Scheduling. We focus our evaluation on the planning capabilities of LLMs with full information on the task, by providing outputs from tools such as Google Flights, Google Maps, and Google Calendar as contexts to the models. This eliminates the need for a tool-use environment for evaluating LLMs on Planning. We observe that NATURAL PLAN is a challenging benchmark for state-of-the-art models. For example, in Trip Planning, GPT-4 and Gemini 1.5 Pro could only achieve 31.1% and 34.8% solve rate respectively. We find that model performance drops drastically as the complexity of the problem increases: all models perform below 5% when there are 10 cities, highlighting a significant gap in planning in natural language for SoTA LLMs. We also conduct extensive ablation studies on NATURAL PLAN to further shed light on the (in)effectiveness of approaches such as self-correction, few-shot generalization, and in-context planning with long contexts on improving LLM planning.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2405.15130 [pdf, other]

OptLLM: Optimal Assignment of Queries to Large Language Models

Authors: Yueyue Liu, Hongyu Zhang, Yuantian Miao, Van-Hoang Le, Zhiqiang Li

Abstract: Large Language Models (LLMs) have garnered considerable attention owing to their remarkable capabilities, leading to an increasing number of companies offering LLMs as services. Different LLMs achieve different performance at different costs. A challenge for users lies in choosing the LLMs that best fit their needs, balancing cost and performance. In this paper, we propose a framework for addressing the cost-effective query allocation problem for LLMs. Given a set of input queries and candidate LLMs, our framework, named OptLLM, provides users with a range of optimal solutions to choose from, aligning with their budget constraints and performance preferences, including options for maximizing accuracy and minimizing cost. OptLLM predicts the performance of candidate LLMs on each query using a multi-label classification model with uncertainty estimation and then iteratively generates a set of non-dominated solutions by destructing and reconstructing the current solution. To evaluate the effectiveness of OptLLM, we conduct extensive experiments on various types of tasks, including text classification, question answering, sentiment analysis, reasoning, and log parsing. Our experimental results demonstrate that OptLLM substantially reduces costs by 2.40% to 49.18% while achieving the same accuracy as the best LLM. Compared to other multi-objective optimization algorithms, OptLLM improves accuracy by 2.94% to 69.05% at the same cost or saves costs by 8.79% and 95.87% while maintaining the highest attainable accuracy.
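
Illustration (not from the paper): the "non-dominated solutions" mentioned in this abstract are the standard Pareto-optimal set for two objectives, here lower cost and higher accuracy. The sketch below is a plain quadratic-time filter over hypothetical `(cost, accuracy)` pairs; it is not OptLLM's destruct-and-reconstruct search, only the dominance test such a search would rely on.

```python
def dominates(a, b):
    """a dominates b if it is no worse on both objectives and better on one."""
    cost_a, acc_a = a
    cost_b, acc_b = b
    no_worse = cost_a <= cost_b and acc_a >= acc_b
    strictly_better = cost_a < cost_b or acc_a > acc_b
    return no_worse and strictly_better


def non_dominated(solutions):
    """Keep the (cost, accuracy) pairs that no other pair dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical candidate allocations as (cost, accuracy).
candidates = [(1.0, 0.70), (2.0, 0.90), (2.5, 0.85), (3.0, 0.90)]
print(non_dominated(candidates))
```

A user with a fixed budget would then pick the most accurate surviving pair whose cost fits that budget.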

Submitted 23 May, 2024; originally announced May 2024.

Comments: This paper is accepted by ICWS 2024

arXiv:2405.07475 [pdf, other]

How Non-native English Speakers Use, Assess, and Select AI-Generated Paraphrases with Information Aids

Authors: Yewon Kim, Thanh-Long V. Le, Donghwi Kim, Mina Lee, Sung-Ju Lee

Abstract: Non-native English speakers (NNESs) often face challenges in achieving fluency in their written English. AI paraphrasing tools have the potential to improve their writing by suggesting more fluent paraphrases to their original sentences. Yet, the effectiveness of these tools depends on the user's ability to accurately assess and select context-appropriate suggestions, which is a significant challenge for those with limited English proficiency. This paper explores how NNESs utilize a paraphrasing tool augmented with information aids designed to facilitate the assessment of paraphrased suggestions. Through a formative study with 15 NNESs, we identify their specific needs when paraphrasing with AI, leading to the design of a paraphrasing tool integrated with five types of information aids, termed "support features." A user study with 22 NNESs demonstrates their heavy reliance on the paraphrasing functionality throughout the writing process, where they leverage the support features to assess and select suggestions efficiently and comprehensively. When equipped with the support features, NNESs have an enhanced writing experience in terms of efficiency, confidence, and trust. Our findings contribute to the HCI community by (i) identifying the distinct needs of NNESs in AI paraphrasing tools, (ii) elucidating how NNESs use paraphrasing tools with support features, and (iii) offering design implications for the development of more effective AI paraphrasing tools tailored to NNESs' requirements.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.03411 [pdf, other]

Greedy Heuristics for Sampling-based Motion Planning in High-Dimensional State Spaces

Authors: Phone Thiha Kyaw, Anh Vu Le, Lim Yi, Prabakaran Veerajagadheswar, Mohan Rajesh Elara, Dinh Tung Vo, Minh Bui Vu

Abstract: Sampling-based motion planning algorithms are very effective at finding solutions in high-dimensional continuous state spaces as they do not require prior approximations of the problem domain compared to traditional discrete graph-based searches. The anytime version of the Rapidly-exploring Random Trees (RRT) algorithm, denoted as RRT*, often finds high-quality solutions by incrementally approximating and searching the problem domain through random sampling. However, due to its low sampling efficiency and slow convergence rate, research has proposed many variants of RRT*, incorporating different heuristics and sampling strategies to overcome the constraints in complex planning problems. Yet, these approaches address specific convergence aspects of RRT* limitations, leaving a need for a sampling-based algorithm that can quickly find better solutions in complex high-dimensional state spaces with a faster convergence rate for practical motion planning applications. This article unifies and leverages the greedy search and heuristic techniques used in various RRT* variants to develop a greedy version of the anytime Rapidly-exploring Random Trees algorithm, denoted as Greedy RRT* (G-RRT*). It improves the initial solution-finding time of RRT* by maintaining two trees rooted at both the start and goal ends, advancing toward each other using greedy connection heuristics. It also accelerates the convergence rate of RRT* by introducing a greedy version of direct informed sampling procedure, which guides the sampling towards the promising region of the problem domain based on heuristics. We validate our approach on simulated planning problems, manipulation problems on Barrett WAM Arms, and on a self-reconfigurable robot, Panthera. Results show that G-RRT* produces asymptotically optimal solution paths and outperforms state-of-the-art RRT* variants, especially in high-dimensional planning problems.

Submitted 6 May, 2024; originally announced May 2024.

Comments: To be published at the International Journal of Robotics Research (IJRR)

arXiv:2405.01556 [pdf, other]

Semantically Aligned Question and Code Generation for Automated Insight Generation

Authors: Ananya Singha, Bhavya Chopra, Anirudh Khatry, Sumit Gulwani, Austin Z. Henley, Vu Le, Chris Parnin, Mukul Singh, Gust Verbruggen

Abstract: Automated insight generation is a common tactic for helping knowledge workers, such as data scientists, to quickly understand the potential value of new and unfamiliar data. Unfortunately, automated insights produced by large-language models can generate code that does not correctly correspond (or align) to the insight. In this paper, we leverage the semantic knowledge of large language models to generate targeted and insightful questions about data and the corresponding code to answer those questions. Then through an empirical study on data from Open-WikiTable, we show that embeddings can be effectively used for filtering out semantically unaligned pairs of question and code. Additionally, we found that generating questions and code together yields more diverse questions.

Submitted 21 March, 2024; originally announced May 2024.

arXiv:2405.00688 [pdf]

Understanding Social Perception, Interactions, and Safety Aspects of Sidewalk Delivery Robots Using Sentiment Analysis

Authors: Yuchen Du, Tho V. Le

Abstract: This article presents a comprehensive sentiment analysis (SA) of comments on YouTube videos related to Sidewalk Delivery Robots (SDRs). We manually annotated the collected YouTube comments with three sentiment labels: negative (0), positive (1), and neutral (2). We then constructed models for text sentiment classification and tested the models' performance on both binary and ternary classification tasks in terms of accuracy, precision, recall, and F1 score. Our results indicate that, in binary classification tasks, the Support Vector Machine (SVM) model using Term Frequency-Inverse Document Frequency (TF-IDF) and N-gram achieves the highest accuracy. In ternary classification tasks, the model using Bidirectional Encoder Representations from Transformers (BERT), Long Short-Term Memory Networks (LSTM) and Gated Recurrent Unit (GRU) significantly outperforms other machine learning models, achieving an accuracy, precision, recall, and F1 score of 0.78. Additionally, we employ the Latent Dirichlet Allocation model to generate 10 topics from the comments to explore the public's underlying views on SDRs. Drawing from these findings, we propose targeted recommendations for shaping future policies concerning SDRs. This work provides valuable insights for stakeholders in the SDR sector regarding social perception, interaction, and safety.

Submitted 9 March, 2024; originally announced May 2024.

Comments: 34 pages, 7 figures, 2 tables

arXiv:2403.18802 [pdf, other]

Long-form factuality in large language models

Authors: Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le

Abstract: Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factuality through a method which we call Search-Augmented Factuality Evaluator (SAFE). SAFE utilizes an LLM to break down a long-form response into a set of individual facts and to evaluate the accuracy of each fact using a multi-step reasoning process comprising sending search queries to Google Search and determining whether a fact is supported by the search results. Furthermore, we propose extending F1 score as an aggregated metric for long-form factuality. To do so, we balance the percentage of supported facts in a response (precision) with the percentage of provided facts relative to a hyperparameter representing a user's preferred response length (recall). Empirically, we demonstrate that LLM agents can outperform crowdsourced human annotators - on a set of ~16k individual facts, SAFE agrees with crowdsourced human annotators 72% of the time, and on a random subset of 100 disagreement cases, SAFE wins 76% of the time. At the same time, SAFE is more than 20 times cheaper than human annotators. We also benchmark thirteen language models on LongFact across four model families (Gemini, GPT, Claude, and PaLM-2), finding that larger language models generally achieve better long-form factuality. LongFact, SAFE, and all experimental code are available at https://github.com/google-deepmind/long-form-factuality.
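The length-aware F1 extension described in this abstract can be illustrated with a small sketch. This is not taken from the authors' released code: the function name `f1_at_k`, its argument names, the harmonic-mean combination, and the cap of recall at 1 are assumptions chosen to match the abstract's wording (precision as the share of supported facts, recall as supported facts relative to a preferred length `k`).

```python
def f1_at_k(num_supported: int, num_not_supported: int, k: int) -> float:
    """Length-aware F1 sketch (hypothetical helper, not the authors' API).

    num_supported / num_not_supported: counts of individual facts in one
    response that were judged supported or not supported.
    k: assumed hyperparameter for the user's preferred number of facts.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if num_supported < 0 or num_not_supported < 0:
        raise ValueError("fact counts must be non-negative")
    if num_supported == 0:
        # No supported facts: both precision and recall are zero.
        return 0.0
    # Precision: share of the provided facts that are supported.
    precision = num_supported / (num_supported + num_not_supported)
    # Recall: supported facts relative to the preferred length,
    # capped at 1 (the cap is an assumption of this sketch).
    recall = min(num_supported / k, 1.0)
    # Standard harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 8 supported and 2 unsupported facts, preferred length of 16 facts.
    print(round(f1_at_k(8, 2, 16), 4))
```

Under these assumptions, a response cannot raise its score by padding with unsupported facts (precision drops), nor by staying very short (recall stays below 1 until `k` supported facts are provided).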

Submitted 3 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.06119 [pdf, other]

CLEAR: Cross-Transformers with Pre-trained Language Model is All you need for Person Attribute Recognition and Retrieval

Authors: Doanh C. Bui, Thinh V. Le, Ba Hung Ngo, Tae Jong Choi

Abstract: Person attribute recognition and attribute-based retrieval are two core human-centric tasks. In the recognition task, the challenge is specifying attributes depending on a person's appearance, while the retrieval task involves searching for matching persons based on attribute queries. There is a significant relationship between recognition and retrieval tasks. In this study, we demonstrate that if there is a sufficiently robust network to solve person attribute recognition, it can be adapted to facilitate better performance for the retrieval task. Another issue that needs addressing in the retrieval task is the modality gap between attribute queries and persons' images. Therefore, in this paper, we present CLEAR, a unified network designed to address both tasks. We introduce a robust cross-transformers network to handle person attribute recognition. Additionally, leveraging a pre-trained language model, we construct pseudo-descriptions for attribute queries and introduce an effective training strategy to train only a few additional parameters for adapters, facilitating the handling of the retrieval task. Finally, the unified CLEAR model is evaluated on five benchmarks: PETA, PA100K, Market-1501, RAPv2, and UPAR-2024. Without bells and whistles, CLEAR achieves state-of-the-art performance or competitive results for both tasks, significantly outperforming other competitors in terms of person retrieval performance on the widely-used Market-1501 dataset.

Submitted 30 April, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

arXiv:2403.05855 [pdf, other]

doi 10.1145/3613904.3642631

Assessing User Apprehensions About Mixed Reality Artifacts and Applications: The Mixed Reality Concerns (MRC) Questionnaire

Authors: Christopher Katins, Paweł W. Woźniak, Aodi Chen, Ihsan Tumay, Luu Viet Trinh Le, John Uschold, Thomas Kosch

Abstract: Current research in Mixed Reality (MR) presents a wide range of novel use cases for blending virtual elements with the real world. This yet-to-be-ubiquitous technology challenges how users currently work and interact with digital content. While offering many potential advantages, MR technologies introduce new security, safety, and privacy challenges. Thus, it is relevant to understand users' apprehensions towards MR technologies, ranging from security concerns to social acceptance. To address this challenge, we present the Mixed Reality Concerns (MRC) Questionnaire, designed to assess users' concerns towards MR artifacts and applications systematically. The development followed a structured process considering previous work, expert interviews, iterative refinements, and confirmatory tests to analytically validate the questionnaire. The MRC Questionnaire offers a new method of assessing users' critical opinions to compare and assess novel MR artifacts and applications regarding security, privacy, social implications, and trust.

Submitted 5 April, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

arXiv:2402.11734 [pdf, other]

Solving Data-centric Tasks using Large Language Models

Authors: Shraddha Barke, Christian Poelitz, Carina Suzana Negreanu, Benjamin Zorn, José Cambronero, Andrew D. Gordon, Vu Le, Elnaz Nouri, Nadia Polikarpova, Advait Sarkar, Brian Slininger, Neil Toronto, Jack Williams

Abstract: Large language models (LLMs) are rapidly replacing help forums like StackOverflow, and are especially helpful for non-professional programmers and end users. These users are often interested in data-centric tasks, such as spreadsheet manipulation and data wrangling, which are hard to solve if the intent is only communicated using a natural-language description, without including the data. But how do we decide how much data and which data to include in the prompt? This paper makes two contributions towards answering this question. First, we create a dataset of real-world NL-to-code tasks manipulating tabular data, mined from StackOverflow posts. Second, we introduce a cluster-then-select prompting technique, which adds the most representative rows from the input data to the LLM prompt. Our experiments show that LLM performance is indeed sensitive to the amount of data passed in the prompt, and that for tasks with a lot of syntactic variation in the input table, our cluster-then-select technique outperforms a random selection baseline.

Submitted 24 March, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

Comments: Paper accepted to NAACL 2024 (Findings)

arXiv:2402.04931 [pdf, other]

Complexity of the (Connected) Cluster Vertex Deletion problem on $H$-free graphs

Authors: Hoang-Oanh Le, Van Bang Le

Abstract: The well-known Cluster Vertex Deletion problem (CVD) asks for a given graph $G$ and an integer $k$ whether it is possible to delete a set $S$ of at most $k$ vertices of $G$ such that the resulting graph $G-S$ is a cluster graph (a disjoint union of cliques). We give a complete characterization of graphs $H$ for which CVD on $H$-free graphs is polynomially solvable and for which it is NP-complete. Moreover, in the NP-completeness cases, CVD cannot be solved in sub-exponential time in the vertex number of the $H$-free input graphs unless the Exponential-Time Hypothesis fails. We also consider the connected variant of CVD, the Connected Cluster Vertex Deletion problem (CCVD), in which the set $S$ has to induce a connected subgraph of $G$. It turns out that CCVD admits the same complexity dichotomy for $H$-free graphs. Our results enlarge a list of rare dichotomy theorems for well-studied problems on $H$-free graphs.

Submitted 7 February, 2024; originally announced February 2024.

Comments: Extended version of a MFCS 2022 paper. To appear in Theory of Computing Systems

arXiv:2402.03620 [pdf, other]

Self-Discover: Large Language Models Self-Compose Reasoning Structures

Authors: Pei Zhou, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, Huaixiu Steven Zheng

Abstract: We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, SELF-DISCOVER outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x fewer inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama2, and share commonalities with human reasoning patterns.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 17 pages, 11 figures, 5 tables

arXiv:2312.12960 [pdf, ps, other]

Maximizing Matching Cuts

Authors: Van Bang Le, Felicia Lucke, Daniël Paulusma, Bernard Ries

Abstract: A matching cut in a graph G is an edge cut of G that is also a matching. This short survey gives an overview of old and new results and open problems for Maximum Matching Cut, which is to determine the size of a largest matching cut in a graph. We also compare this problem with the related problems Matching Cut, Minimum Matching Cut, and Perfect Matching Cut, which are to determine if a graph has a matching cut; the size of a smallest matching cut in a graph; and if a graph has a matching cut that is a perfect matching, respectively. Moreover, we discuss a relationship between Maximum Matching Cut and Max Cut, which is to determine the size of a largest edge cut in a graph, as well as a relationship between Minimum Matching Cut and Min Cut, which is to determine the size of a smallest edge cut in a graph.
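The definition in this abstract (an edge cut that is also a matching) can be checked directly for a given vertex bipartition. The sketch below is illustrative only and does not come from the survey: the function name `matching_cut_size` and its interface are hypothetical, and it assumes a simple undirected graph with the cut given as one side of a bipartition into two non-empty parts.

```python
from typing import Hashable, Iterable, Optional, Set, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]


def matching_cut_size(
    vertices: Iterable[Vertex],
    edges: Iterable[Edge],
    side: Iterable[Vertex],
) -> Optional[int]:
    """Return the number of edges crossing (side, rest) if those edges
    form a matching, otherwise None. Hypothetical helper for illustration."""
    all_vertices: Set[Vertex] = set(vertices)
    part: Set[Vertex] = set(side)
    if not part or not part < all_vertices:
        raise ValueError("side must be a non-empty proper subset of vertices")

    used: Set[Vertex] = set()  # endpoints already covered by a crossing edge
    size = 0
    for u, v in edges:
        if (u in part) == (v in part):
            continue  # both endpoints on the same side: not a cut edge
        if u in used or v in used:
            return None  # two cut edges share a vertex: not a matching
        used.update((u, v))
        size += 1
    return size


if __name__ == "__main__":
    # 4-cycle a-b-c-d-a, split into {a, b} and {c, d}.
    cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
    print(matching_cut_size("abcd", cycle, {"a", "b"}))
```

Maximum Matching Cut, as described in the abstract, would ask for the largest value this check can return over all valid bipartitions; the sketch only verifies a single candidate.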

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.11524 [pdf, other]

Assessing GPT4-V on Structured Reasoning Tasks

Authors: Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Gust Verbruggen

Abstract: Multi-modality promises to unlock further uses for large language models. Recently, the state-of-the-art language model GPT-4 was enhanced with vision capabilities. We carry out a prompting evaluation of GPT-4V and five other baselines on structured reasoning tasks, such as mathematical reasoning, visual data analysis, and code generation. We show that visual Chain-of-Thought, an extension of Chain-of-Thought to multi-modal LLMs, yields significant improvements over the vanilla model. We also present a categorized analysis of scenarios where these models perform well and where they struggle, highlighting challenges associated with coherent multimodal reasoning.

Submitted 13 December, 2023; originally announced December 2023.

Comments: 9 pages, 9 figures

arXiv:2312.08472 [pdf, other]

AutoNumerics-Zero: Automated Discovery of State-of-the-Art Mathematical Functions

Authors: Esteban Real, Yao Chen, Mirko Rossini, Connal de Souza, Manav Garg, Akhil Verghese, Moritz Firsching, Quoc V. Le, Ekin Dogus Cubuk, David H. Park

Abstract: Computers calculate transcendental functions by approximating them through the composition of a few limited-precision instructions. For example, an exponential can be calculated with a Taylor series. These approximation methods were developed over the centuries by mathematicians, who emphasized the attainability of arbitrary precision. Computers, however, operate on few limited precision types, such as the popular float32. In this study, we show that when aiming for limited precision, existing approximation methods can be outperformed by programs automatically discovered from scratch by a simple evolutionary algorithm. In particular, over real numbers, our method can approximate the exponential function reaching orders of magnitude more precision for a given number of operations when compared to previous approaches. More practically, over float32 numbers and constrained to less than 1 ULP of error, the same method attains a speedup over baselines by generating code that triggers better XLA/LLVM compilation paths. In other words, in both cases, evolution searched a vast space of possible programs, without knowledge of mathematics, to discover previously unknown optimized approximations to high precision, for the first time. We also give evidence that these results extend beyond the exponential. The ubiquity of transcendental functions suggests that our method has the potential to reduce the cost of scientific computing applications.

Submitted 13 December, 2023; originally announced December 2023.

ACM Class: I.2.2; I.2.6; G.1.2

arXiv:2312.03785 [pdf, ps, other]

Sports Recommender Systems: Overview and Research Issues

Authors: Alexander Felfernig, Manfred Wundara, Thi Ngoc Trang Tran, Viet-Man Le, Sebastian Lubos, Seda Polat-Erdeniz

Abstract: Sports recommender systems receive increasing attention due to their potential for fostering healthy living, improving personal well-being, and increasing performance in sport. These systems support people in sports, for example, by the recommendation of healthy and performance-boosting food items, the recommendation of training practices, talent and team recommendation, and the recommendation of specific tactics in competitions. With applications in the virtual world, for example, the recommendation of maps or opponents in e-sports, these systems already transcend conventional sports scenarios where physical presence is needed. On the basis of different working examples, we present an overview of sports recommender systems applications and techniques. Overall, we analyze the related state-of-the-art and discuss open research issues.

Submitted 6 December, 2023; originally announced December 2023.

Comments: Article under review in the Journal of Intelligent Information Systems (Springer JIIS)

ACM Class: I.2; J.3

arXiv:2311.17317 [pdf]

Digital Twins for Logistics and Supply Chain Systems: Literature Review, Conceptual Framework, Research Potential, and Practical Challenges

Authors: Tho V. Le, Ruoling Fan

Abstract: To facilitate an effective, efficient, transparent, and timely decision-making process as well as to provide guidelines for industry planning and public policy development, a conceptual framework of digital twins (DTs) for logistics and supply chain systems (LSCS) is needed. This paper first introduces the background of the logistics and supply chain industry, the DT and its potential benefits, and the motivations and scope of this research. The literature review indicates research and practice gaps and needs that motivate proposing a new conceptual DT framework for LSCS. As each element of the new framework has different requirements and goals, it initiates new research opportunities and creates practical implementation challenges. As such, the future of DT computation involves advanced analytics and modeling techniques to address the new agenda's requirements. Finally, ideas on the next steps to deploy a transparent, trustworthy, and resilient DT for LSCS are presented.

Submitted 28 November, 2023; originally announced November 2023.

Comments: 45 pages

arXiv:2311.08502 [pdf, other]

Variational Quantum Eigensolver with Constraints (VQEC): Solving Constrained Optimization Problems via VQE

Authors: Thinh Viet Le, Vassilis Kekatos

Abstract: Variational quantum approaches have shown great promise in finding near-optimal solutions to computationally challenging tasks. Nonetheless, enforcing constraints in a disciplined fashion has been largely unexplored. To address this gap, this work proposes a hybrid quantum-classical algorithmic paradigm termed VQEC that extends the celebrated VQE to handle optimization with constraints. As with the standard VQE, the vector of optimization variables is captured by the state of a variational quantum circuit (VQC). To deal with constraints, VQEC optimizes a Lagrangian function classically over both the VQC parameters as well as the dual variables associated with constraints. To comply with the quantum setup, variables are updated via a perturbed primal-dual method leveraging the parameter shift rule. Among a wide gamut of potential applications, we showcase how VQEC can approximately solve quadratically-constrained binary optimization (QCBO) problems, find stochastic binary policies satisfying quadratic constraints on the average and in probability, and solve large-scale linear programs (LP) over the probability simplex. Under an assumption on the error for the VQC to approximate an arbitrary probability mass function (PMF), we provide bounds on the optimality gap attained by a VQC. Numerical tests on a quantum simulator investigate the effect of various parameters and corroborate that VQEC can generate high-quality solutions.

Submitted 26 April, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: 22 pages, 13 figures, 1 table

arXiv:2311.02803 [pdf, other]

Fast and Interpretable Face Identification for Out-Of-Distribution Data Using Vision Transformers

Authors: Hai Phan, Cindy Le, Vu Le, Yihui He, Anh Totti Nguyen

Abstract: Most face identification approaches employ a Siamese neural network to compare two images at the image embedding level. Yet, this technique can be subject to occlusion (e.g. faces with masks or sunglasses) and out-of-distribution data. DeepFace-EMD (Phan et al. 2022) reaches state-of-the-art accuracy on out-of-distribution data by first comparing two images at the image level, and then at the patch level. Yet, its later patch-wise re-ranking stage admits a large $O(n^3 \log n)$ time complexity (for $n$ patches in an image) due to the optimal transport optimization. In this paper, we propose a novel 2-image Vision Transformer (ViT) that compares two images at the patch level using cross-attention. After training on 2M pairs of images on CASIA Webface (Yi et al. 2014), our model performs at an accuracy comparable to DeepFace-EMD on out-of-distribution data, yet at an inference speed more than twice as fast as DeepFace-EMD (Phan et al. 2022). In addition, via a human study, our model shows promising explainability through the visualization of cross-attention. We believe our work can inspire more explorations in using ViTs for face identification.

Submitted 5 November, 2023; originally announced November 2023.

Comments: 20 pages, 15 Figures

arXiv:2310.17680

CodeFusion: A Pre-trained Diffusion Model for Code Generation

Authors: Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu, Gust Verbruggen

Abstract: Imagine a developer who can only change their last line of code, how often would they have to start writing a function from scratch before it is correct? Auto-regressive models for code generation from natural language have a similar limitation: they do not easily allow reconsidering earlier tokens generated. We introduce CodeFusion, a pre-trained diffusion code generation model that addresses this limitation by iteratively denoising a complete program conditioned on the encoded natural language. We evaluate CodeFusion on the task of natural language to code generation for Bash, Python, and Microsoft Excel conditional formatting (CF) rules. Experiments show that CodeFusion (75M parameters) performs on par with state-of-the-art auto-regressive systems (350M-175B parameters) in top-1 accuracy and outperforms them in top-3 and top-5 accuracy due to its better balance in diversity versus quality.

Submitted 1 November, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: Contains inappropriately sourced conjecture of OpenAI's ChatGPT parameter count from www.forbes.com/sites/forbestechcouncil/2023/02/17/is-bigger-better-why-the-chatgpt-vs-gpt-3-vs-gpt-4-battle-is-just-a-family-chat, a citation which was omitted. The authors do not have direct knowledge or verification of this information, and relied solely on this article, which may lead to public confusion

arXiv:2310.17475 [pdf]

Analytical model for large-scale design of sidewalk delivery robot systems

Authors: Hai Yang, Yuchen Du, Tho V. Le, Joseph Y. J. Chow

Abstract: With the rise in demand for local deliveries and e-commerce, robotic deliveries are being considered as efficient and sustainable solutions. However, the deployment of such systems can be highly complex due to numerous factors involving stochastic demand, stochastic charging and maintenance needs, complex routing, etc. We propose a model that uses continuous approximation methods for evaluating service trade-offs that consider the unique characteristics of large-scale sidewalk delivery robot systems used to serve online food deliveries. The model captures both the initial cost and the operation cost of the delivery system and evaluates the impact of constraints and operation strategies on the deployment. By minimizing the system cost, variables related to the system design can be determined. First, the minimization problem is formulated based on a homogeneous area, and the optimal system cost can be derived as a closed-form expression. By evaluating the expression, relationships between variables and the system cost can be directly obtained. We then apply the model in neighborhoods in New York City to evaluate the cost of deploying the sidewalk delivery robot system in a real-world scenario. The results shed light on the potential of deploying such a system in the future.

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.17306

FormaT5: Abstention and Examples for Conditional Table Formatting with Natural Language

Authors: Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu, Elnaz Nouri, Mohammad Raza, Gust Verbruggen

Abstract: Formatting is an important property in tables for visualization, presentation, and analysis. Spreadsheet software allows users to automatically format their tables by writing data-dependent conditional formatting (CF) rules. Writing such rules is often challenging for users as it requires them to understand and implement the underlying logic. We present FormaT5, a transformer-based model that can generate a CF rule given the target table and a natural language description of the desired formatting logic. We find that user descriptions for these tasks are often under-specified or ambiguous, making it harder for code generation systems to accurately learn the desired rule in a single step. To tackle this problem of under-specification and minimise argument errors, FormaT5 learns to predict placeholders through an abstention objective. These placeholders can then be filled by a second model or, when examples of rows that should be formatted are available, by a programming-by-example system. To evaluate FormaT5 on diverse and real scenarios, we create an extensive benchmark of 1053 CF tasks, containing real-world descriptions collected from four different sources. We release our benchmarks to encourage research in this area. Abstention and filling allow FormaT5 to outperform 8 different neural approaches on our benchmarks, both with and without examples. Our results illustrate the value of building domain-specific learning systems.

Submitted 1 November, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: Contains inappropriately sourced conjecture of OpenAI's ChatGPT parameter count from www.forbes.com/sites/forbestechcouncil/2023/02/17/is-bigger-better-why-the-chatgpt-vs-gpt-3-vs-gpt-4-battle-is-just-a-family-chat, a citation which was omitted. The authors do not have direct knowledge or verification of this information, and relied solely on this article, which may lead to public confusion

arXiv:2310.17228 [pdf, other]

TST$^\mathrm{R}$: Target Similarity Tuning Meets the Real World

Authors: Anirudh Khatry, Sumit Gulwani, Priyanshu Gupta, Vu Le, Ananya Singha, Mukul Singh, Gust Verbruggen

Abstract: Target similarity tuning (TST) is a method of selecting relevant examples in natural language (NL) to code generation through large language models (LLMs) to improve performance. Its goal is to adapt a sentence embedding model to have the similarity between two NL inputs match the similarity between their associated code outputs. In this paper, we propose different methods to apply and improve TST in the real world. First, we replace the sentence transformer with embeddings from a larger model, which reduces sensitivity to the language distribution and thus provides more flexibility in synthetic generation of examples, and we train a tiny model that transforms these embeddings to a space where embedding similarity matches code similarity, which allows the model to remain a black box and only requires a few matrix multiplications at inference time. Second, we show how to efficiently select a smaller number of training examples to train the TST model. Third, we introduce a ranking-based evaluation for TST that does not require end-to-end code generation experiments, which can be expensive to perform.

Submitted 28 October, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: Accepted for EMNLP-Findings, 2023

arXiv:2310.10358 [pdf, other]

Tabular Representation, Noisy Operators, and Impacts on Table Structure Understanding Tasks in LLMs

Authors: Ananya Singha, José Cambronero, Sumit Gulwani, Vu Le, Chris Parnin

Abstract: Large language models (LLMs) are increasingly applied for tabular tasks using in-context learning. The prompt representation for a table may play a role in the LLM's ability to process the table. Inspired by prior work, we generate a collection of self-supervised structural tasks (e.g. navigate to a cell and row; transpose the table) and evaluate the performance differences when using 8 formats. In contrast to past work, we introduce 8 noise operations inspired by real-world messy data and adversarial inputs, and show that such operations can impact LLM performance across formats for different structural understanding tasks.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.06964 [pdf, other]

Multi-Robot Cooperative Navigation in Crowds: A Game-Theoretic Learning-Based Model Predictive Control Approach

Authors: Viet-Anh Le, Vaishnav Tadiparthi, Behdad Chalaki, Hossein Nourkhiz Mahjoub, Jovin D'sa, Ehsan Moradi-Pari, Andreas A. Malikopoulos

Abstract: In this paper, we develop a control framework for the coordination of multiple robots as they navigate through crowded environments. Our framework comprises a local model predictive control (MPC) for each robot and a social long short-term memory model that forecasts pedestrians' trajectories. We formulate the local MPC for each individual robot so that it includes both individual and shared objectives, in which the latter encourages the emergence of coordination among robots. Next, we consider the multi-robot navigation and human-robot interaction, respectively, as a potential game and a two-player game, then employ an iterative best response approach to solve the resulting optimization problems in a centralized and distributed fashion. Finally, we demonstrate the effectiveness of coordination among robots in simulated crowd navigation.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.06117 [pdf, other]

Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models

Authors: Huaixiu Steven Zheng, Swaroop Mishra, Xinyun Chen, Heng-Tze Cheng, Ed H. Chi, Quoc V Le, Denny Zhou

Abstract: We present Step-Back Prompting, a simple prompting technique that enables LLMs to do abstractions to derive high-level concepts and first principles from instances containing specific details. Using the concepts and principles to guide reasoning, LLMs significantly improve their abilities in following a correct reasoning path towards the solution. We conduct experiments of Step-Back Prompting with PaLM-2L, GPT-4 and Llama2-70B models, and observe substantial performance gains on various challenging reasoning-intensive tasks including STEM, Knowledge QA, and Multi-Hop Reasoning. For instance, Step-Back Prompting improves PaLM-2L performance on MMLU (Physics and Chemistry) by 7% and 11% respectively, TimeQA by 27%, and MuSiQue by 7%.

Submitted 12 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

Comments: ICLR 2024

arXiv:2310.02658 [pdf, other]

Solving Multi-Configuration Problems: A Performance Analysis with Choco Solver

Authors: Benjamin Ritz, Alexander Felfernig, Viet-Man Le, Sebastian Lubos

Abstract: In many scenarios, configurators support the configuration of a solution that satisfies the preferences of a single user. The concept of multi-configuration is based on the idea of configuring a set of configurations. Such a functionality is relevant in scenarios such as the configuration of personalized exams, the configuration of project teams, and the configuration of different trips for individual members of a tourist group (e.g., when visiting a specific city). In this paper, we exemplify the application of multi-configuration for generating individualized exams. We also provide a constraint solver performance analysis which helps to gain some insights into corresponding performance issues.

Submitted 19 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: The paper was presented at ConfWS'23: 25th International Workshop on Configuration, September 6-7, 2023, Málaga, Spain and is published in the conference proceedings: https://ceur-ws.org/Vol-3509/

arXiv:2309.16838 [pdf, other]

Social Navigation in Crowded Environments with Model Predictive Control and Deep Learning-Based Human Trajectory Prediction

Authors: Viet-Anh Le, Behdad Chalaki, Vaishnav Tadiparthi, Hossein Nourkhiz Mahjoub, Jovin D'sa, Ehsan Moradi-Pari

Abstract: Crowd navigation has received increasing attention from researchers over the last few decades, resulting in the emergence of numerous approaches aimed at addressing this problem to date. Our proposed approach couples agent motion prediction and planning to avoid the freezing robot problem while simultaneously capturing multi-agent social interactions by utilizing a state-of-the-art trajectory prediction model, i.e., the social long short-term memory model (Social-LSTM). Leveraging the output of Social-LSTM for the prediction of future trajectories of pedestrians at each time-step given the robot's possible actions, our framework computes the optimal control action using Model Predictive Control (MPC) for the robot to navigate among pedestrians. We demonstrate the effectiveness of our proposed approach in multiple scenarios of simulated crowd navigation and compare it against several state-of-the-art reinforcement learning-based methods.

Submitted 28 September, 2023; originally announced September 2023.

Comments: 7 pages, 3 figures, 6 tables

arXiv:2309.09479 [pdf, other]

LogShrink: Effective Log Compression by Leveraging Commonality and Variability of Log Data

Authors: Xiaoyun Li, Hongyu Zhang, Van-Hoang Le, Pengfei Chen

Abstract: Log data is a crucial resource for recording system events and states during system execution. However, as systems grow in scale, log data generation has become increasingly explosive, leading to an expensive overhead on log storage, such as several petabytes per day in production. To address this issue, log compression has become a crucial task in reducing disk storage while allowing for further log analysis. Unfortunately, existing general-purpose and log-specific compression methods have been limited in their ability to utilize log data characteristics. To overcome these limitations, we conduct an empirical study and obtain three major observations on the characteristics of log data that can facilitate the log compression task. Based on these observations, we propose LogShrink, a novel and effective log compression method by leveraging commonality and variability of log data. An analyzer based on longest common subsequence and entropy techniques is proposed to identify the latent commonality and variability in log messages. The key idea behind this is that the commonality and variability can be exploited to shrink log data with a shorter representation. Besides, a clustering-based sequence sampler is introduced to accelerate the commonality and variability analyzer. The extensive experimental results demonstrate that LogShrink can exceed baselines in compression ratio by 16% to 356% on average while preserving a reasonable compression speed.

Submitted 18 September, 2023; originally announced September 2023.

Comments: Accepted by ICSE 2024 Research Track

arXiv:2309.03409 [pdf, other]

Large Language Models as Optimizers

Authors: Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, Xinyun Chen

Abstract: Optimization is ubiquitous. While derivative-based algorithms have been powerful tools for various problems, the absence of gradient imposes challenges on many real-world applications. In this work, we propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as optimizers, where the optimization task is described in natural language. In each optimization step, the LLM generates new solutions from the prompt that contains previously generated solutions with their values, then the new solutions are evaluated and added to the prompt for the next optimization step. We first showcase OPRO on linear regression and traveling salesman problems, then move on to our main application in prompt optimization, where the goal is to find instructions that maximize the task accuracy. With a variety of LLMs, we demonstrate that the best prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench Hard tasks. Code at https://github.com/google-deepmind/opro.

Submitted 15 April, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: ICLR 2024; 42 pages, 26 figures, 15 tables. Code at https://github.com/google-deepmind/opro

arXiv:2308.10922 [pdf, other]

DataVinci: Learning Syntactic and Semantic String Repairs

Authors: Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu, Gust Verbruggen

Abstract: String data is common in real-world datasets: 67.6% of values in a sample of 1.8 million real Excel spreadsheets from the web were represented as text. Systems that successfully clean such string data can have a significant impact on real users. While prior work has explored errors in string data, proposed approaches have often been limited to error detection or require that the user provide annotations, examples, or constraints to fix the errors. Furthermore, these systems have focused independently on syntactic errors or semantic errors in strings, but ignore that strings often contain both syntactic and semantic substrings. We introduce DataVinci, a fully unsupervised string data error detection and repair system. DataVinci learns regular-expression-based patterns that cover a majority of values in a column and reports values that do not satisfy such patterns as data errors. DataVinci can automatically derive edits to the data error based on the majority patterns and constraints learned over other columns without the need for further user interaction. To handle strings with both syntactic and semantic substrings, DataVinci uses an LLM to abstract (and re-concretize) portions of strings that are semantic prior to learning majority patterns and deriving edits. Because not all data can result in majority patterns, DataVinci leverages execution information from an existing program (which reads the target data) to identify and correct data repairs that would not otherwise be identified. DataVinci outperforms 7 baselines on both error detection and repair when evaluated on 4 existing and new benchmarks.

Submitted 21 August, 2023; originally announced August 2023.

Comments: 13 pages

arXiv:2308.10756 [pdf, ps, other]

Computing Optimal Leaf Roots of Chordal Cographs in Linear Time

Authors: Van Bang Le, Christian Rosenke

Abstract: A graph G is a k-leaf power, for an integer k >= 2, if there is a tree T with leaf set V(G) such that, for all vertices x, y in V(G), the edge xy exists in G if and only if the distance between x and y in T is at most k. Such a tree T is called a k-leaf root of G. The computational problem of constructing a k-leaf root for a given graph G and an integer k, if any, is motivated by the challenge from computational biology to reconstruct phylogenetic trees. For fixed k, Lafond [SODA 2022] recently solved this problem in polynomial time. In this paper, we propose to study optimal leaf roots of graphs G, that is, the k-leaf roots of G with minimum k value. Thus, all k'-leaf roots of G satisfy k <= k'. In terms of computational biology, seeking optimal leaf roots is more justified as they yield more probable phylogenetic trees. Lafond's result does not imply polynomial-time computability of optimal leaf roots, because, even for optimal k-leaf roots, k may (exponentially) depend on the size of G. This paper presents a linear-time construction of optimal leaf roots for chordal cographs (also known as trivially perfect graphs). Additionally, it highlights the importance of the parity of the parameter k and provides a deeper insight into the differences between optimal k-leaf roots of even versus odd k. Keywords: k-leaf power, k-leaf root, optimal k-leaf root, trivially perfect leaf power, chordal cograph

Submitted 21 August, 2023; originally announced August 2023.

Comments: 22 pages, 2 figures, full version of the FCT 2023 paper

MSC Class: 05C85 ACM Class: F.2.2

arXiv:2308.10249 [pdf, other]

doi 10.1145/3623652.3623668

Towards a Formally Verified Security Monitor for VM-based Confidential Computing

Authors: Wojciech Ozga, Guerney D. H. Hunt, Michael V. Le, Elaine R. Palmer, Avraham Shinnar

Abstract: Confidential computing is a key technology for isolating high-assurance applications from the large amounts of untrusted code typical in modern systems. Existing confidential computing systems cannot be certified for use in critical applications, like systems controlling critical infrastructure, hardware security modules, or aircraft, as they lack formal verification. This paper presents an approach to formally modeling and proving a security monitor. It introduces a canonical architecture for virtual machine (VM)-based confidential computing systems. It abstracts processor-specific components and identifies a minimal set of hardware primitives required by a trusted security monitor to enforce security guarantees. We demonstrate our methodology and proposed approach with an example from our Rust implementation of the security monitor for RISC-V.

Submitted 1 October, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

Journal ref: HASP '23: Proceedings of the 12th International Workshop on Hardware and Architectural Support for Security and Privacy, October 2023

arXiv:2308.07357 [pdf, other]

Demonstration of CORNET: A System For Learning Spreadsheet Formatting Rules By Example

Authors: Mukul Singh, Jose Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu, Gust Verbruggen

Abstract: Data management and analysis tasks are often carried out using spreadsheet software. A popular feature in most spreadsheet platforms is the ability to define data-dependent formatting rules. These rules can express actions such as "color red all entries in a column that are negative" or "bold all rows not containing error or failure." Unfortunately, users who want to exercise this functionality need to manually write these conditional formatting (CF) rules. We introduce CORNET, a system that automatically learns such conditional formatting rules from user examples. CORNET takes inspiration from inductive program synthesis and combines symbolic rule enumeration, based on semi-supervised clustering and iterative decision tree learning, with a neural ranker to produce accurate conditional formatting rules. In this demonstration, we show CORNET in action as a simple add-in to Microsoft Excel. After the user provides one or two formatted cells as examples, CORNET generates formatting rule suggestions for the user to apply to the spreadsheet.

Submitted 14 August, 2023; originally announced August 2023.

Comments: 4 Pages, VLDB 2023 Demonstration Track

arXiv:2308.06539 [pdf, other]

Phase Shift Design for RIS-Aided Cell-Free Massive MIMO with Improved Differential Evolution

Authors: Trinh Van Chien, Cuong V. Le, Huynh Thi Thanh Binh, Hien Quoc Ngo, Symeon Chatzinotas

Abstract: This paper proposes a novel phase shift design for cell-free massive multiple-input and multiple-output (MIMO) systems assisted by reconfigurable intelligent surface (RIS), which only utilizes channel statistics to achieve the uplink sum ergodic throughput maximization under spatial channel correlations. Due to the non-convexity and the scale of the derived optimization problem, we develop an improved version of the differential evolution (DE) algorithm. The proposed scheme is capable of providing high-quality solutions within reasonable computing time. Numerical results demonstrate superior improvements of the proposed phase shift designs over the other benchmarks, particularly in scenarios where direct links are highly probable.

Submitted 12 August, 2023; originally announced August 2023.

Comments: 5 pages, 2 figures. Accepted by IEEE WCL

arXiv:2308.03958 [pdf, other]

Simple synthetic data reduces sycophancy in large language models

Authors: Jerry Wei, Da Huang, Yifeng Lu, Denny Zhou, Quoc V. Le

Abstract: Sycophancy is an undesirable behavior where models tailor their responses to follow a human user's view even when that view is not objectively correct (e.g., adapting liberal views once a user reveals that they are liberal). In this paper, we study the prevalence of sycophancy in language models and propose a simple synthetic-data intervention to reduce this behavior. First, on a set of three sycophancy tasks (Perez et al., 2022) where models are asked for an opinion on statements with no correct answers (e.g., politics), we observe that both model scaling and instruction tuning significantly increase sycophancy for PaLM models up to 540B parameters. Second, we extend sycophancy evaluations to simple addition statements that are objectively incorrect, finding that despite knowing that these statements are wrong, language models will still agree with them if the user does as well. To reduce sycophancy, we present a straightforward synthetic-data intervention that takes public NLP tasks and encourages models to be robust to user opinions on these tasks. Adding these data in a lightweight finetuning step can significantly reduce sycophantic behavior on held-out prompts. Code for generating synthetic data for intervention can be found at https://github.com/google/sycophancy-intervention.

Submitted 14 February, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

arXiv:2308.03290 [pdf, other]

FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search

Authors: Jordan Dotzel, Gang Wu, Andrew Li, Muhammad Umar, Yun Ni, Mohamed S. Abdelfattah, Zhiru Zhang, Liqun Cheng, Martin G. Dixon, Norman P. Jouppi, Quoc V. Le, Sheng Li

Abstract: Quantization has become a mainstream compression technique for reducing model size, computational requirements, and energy consumption for modern deep neural networks (DNNs). With improved numerical support in recent hardware, including multiple variants of integer and floating point, mixed-precision quantization has become necessary to achieve high-quality results with low model cost. Prior mixed-precision methods have performed either a post-training quantization search, which compromises on accuracy, or a differentiable quantization search, which leads to high memory usage from branching. Therefore, we propose the first one-shot mixed-precision quantization search that eliminates the need for retraining in both integer and low-precision floating point models. We evaluate our search (FLIQS) on multiple convolutional and vision transformer networks to discover Pareto-optimal models. Our approach improves upon uniform precision, manual mixed-precision, and recent integer quantization search methods. With integer models, we increase the accuracy of ResNet-18 on ImageNet by 1.31% and ResNet-50 by 0.90% with equivalent model cost over previous methods. Additionally, for the first time, we explore a novel mixed-precision floating-point search and improve MobileNetV2 by up to 0.98% compared to prior state-of-the-art FP8 models. Finally, we extend FLIQS to simultaneously search a joint quantization and neural architecture space and improve the ImageNet accuracy by 2.69% with similar model cost on a MobileNetV2 search space.

Submitted 1 May, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

Comments: Accepted to AutoML 2024

arXiv:2308.03139 [pdf, other]

Unfolded proximal neural networks for robust image Gaussian denoising

Authors: Hoang Trieu Vy Le, Audrey Repetti, Nelly Pustelnik

Abstract: A common approach to solve inverse imaging problems relies on finding a maximum a posteriori (MAP) estimate of the original unknown image, by solving a minimization problem. In this context, iterative proximal algorithms are widely used, as they can handle non-smooth functions and linear operators. Recently, these algorithms have been paired with deep learning strategies, to further improve the estimate quality. In particular, proximal neural networks (PNNs) have been introduced, obtained by unrolling a proximal algorithm as for finding a MAP estimate, but over a fixed number of iterations, with learned linear operators and parameters. As PNNs are based on optimization theory, they are very flexible, and can be adapted to any image restoration task, as soon as a proximal algorithm can solve it. They further have much lighter architectures than traditional networks. In this article we propose a unified framework to build PNNs for the Gaussian denoising task, based on both the dual-FB and the primal-dual Chambolle-Pock algorithms. We further show that accelerated inertial versions of these algorithms enable skip connections in the associated NN layers. We propose different learning strategies for our PNN framework, and investigate their robustness (Lipschitz property) and denoising efficiency. Finally, we assess the robustness of our PNNs when plugged in a forward-backward algorithm for an image deblurring problem.

Submitted 21 August, 2024; v1 submitted 6 August, 2023; originally announced August 2023.

arXiv:2307.12729 [pdf, ps, other]

Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction

Authors: Hung Tran, Vuong Le, Svetha Venkatesh, Truyen Tran

Abstract: Humans are highly adaptable, swiftly switching between different modes to progressively handle different tasks, situations and contexts. In Human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale consistent plan for the whole activity and (2) the small-scale children interactive actions that start and end along the timeline. While neuroscience and cognitive science have confirmed this multi-mechanism nature of human behavior, machine modeling approaches for human motion are trailing behind. While attempted to use gradually morphing structures (e.g., graph attention networks) to model the dynamic HOI patterns, they miss the expeditious and discrete mode-switching nature of the human motion. To bridge that gap, this work proposes to model two concurrent mechanisms that jointly control human motion: the Persistent process that runs continually on the global scale, and the Transient sub-processes that operate intermittently on the local context of the human while interacting with objects. These two mechanisms form an interactive Persistent-Transient Duality that synergistically governs the activity sequences. We model this conceptual duality by a parent-child neural network of Persistent and Transient channels with a dedicated neural module for dynamic mechanism switching. The framework is trialed on HOI motion forecasting. On two rich datasets and a wide variety of settings, the model consistently delivers superior performances, proving its suitability for the challenge.

Submitted 24 July, 2023; originally announced July 2023.

Comments: Accepted at ICCV 2023

Showing 1–50 of 258 results for author: Le, V