-
Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic
Authors:
Xin Zheng,
Jie Lou,
Boxi Cao,
Xueru Wen,
Yuqiu Ji,
Hongyu Lin,
Yaojie Lu,
Xianpei Han,
Debing Zhang,
Le Sun
Abstract:
Self-critic has become an important mechanism for enhancing the reasoning performance of LLMs. However, current approaches mainly involve basic prompts without further training, which tend to be over-simplified, leading to limited accuracy. Moreover, there is a lack of in-depth investigation of the relationship between an LLM's ability to critique and its task-solving performance. To address these issues, we propose Critic-CoT, a novel framework that pushes LLMs toward System-2-like critic capability, via step-wise CoT reasoning format and distant-supervision data construction, without the need for human annotation. Experiments on GSM8K and MATH show that via filtering out invalid solutions or iterative refinement, our enhanced model boosts task-solving performance, which demonstrates the effectiveness of our method. Further, we find that training on critique and refinement alone improves the generation. We hope our work could shed light on future research on improving the reasoning and critic ability of LLMs.
Submitted 29 August, 2024;
originally announced August 2024.
-
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Authors:
Mengkang Hu,
Pu Zhao,
Can Xu,
Qingfeng Sun,
Jianguang Lou,
Qingwei Lin,
Ping Luo,
Saravan Rajmohan,
Dongmei Zhang
Abstract:
Large Language Model (LLM) based agents have garnered significant attention and are becoming increasingly popular. Furthermore, planning ability is a crucial component of an LLM-based agent, involving interaction with the environment and executing actions to complete a planning task, which generally entails achieving a desired goal from an initial state. This paper investigates enhancing the planning abilities of LLMs through instruction tuning, referred to as agent training. Recent studies have demonstrated that utilizing expert-level trajectory for instruction-tuning LLMs effectively enhances their planning capabilities. However, existing work primarily focuses on synthesizing trajectories from manually designed planning tasks and environments. The labor-intensive nature of creating these environments and tasks impedes the generation of sufficiently varied and extensive trajectories. To address this limitation, this paper explores the automated synthesis of diverse environments and a gradual range of planning tasks, from easy to difficult. We introduce a framework, AgentGen, that leverages LLMs first to generate environments and subsequently generate planning tasks conditioned on these environments. Specifically, to improve environmental diversity, we propose using an inspiration corpus composed of various domain-specific text segments as the context for synthesizing environments. Moreover, to increase the difficulty diversity of generated planning tasks, we propose a bidirectional evolution method, Bi-Evol, that evolves planning tasks from easier and harder directions to synthesize a task set with a smoother difficulty curve. The evaluation results derived from AgentBoard show that AgentGen greatly improves LLMs' planning ability, e.g., the AgentGen instruction-tuned Llama-3 8B surpasses GPT-3.5 in overall performance. Moreover, in certain tasks, it even outperforms GPT-4.
Submitted 1 August, 2024;
originally announced August 2024.
-
Towards Robust Vision Transformer via Masked Adaptive Ensemble
Authors:
Fudong Lin,
Jiadong Lou,
Xu Yuan,
Nian-Feng Tzeng
Abstract:
Adversarial training (AT) can help improve the robustness of Vision Transformers (ViT) against adversarial attacks by intentionally injecting adversarial examples into the training data. However, this way of adversarial injection inevitably incurs standard accuracy degradation to some extent, thereby calling for a trade-off between standard accuracy and robustness. Besides, the prominent AT solutions are still vulnerable to adaptive attacks. To tackle such shortcomings, this paper proposes a novel ViT architecture, including a detector and a classifier bridged by our newly developed adaptive ensemble. Specifically, we empirically discover that detecting adversarial examples can benefit from the Guided Backpropagation technique. Driven by this discovery, a novel Multi-head Self-Attention (MSA) mechanism is introduced to enhance our detector to sniff adversarial examples. Then, a classifier with two encoders is employed for extracting visual representations respectively from clean images and adversarial examples, with our adaptive ensemble to adaptively adjust the proportion of visual representations from the two encoders for accurate classification. This design enables our ViT architecture to achieve a better trade-off between standard accuracy and robustness. Besides, our adaptive ensemble technique allows us to mask off a random subset of image patches within input data, boosting our ViT's robustness against adaptive attacks, while maintaining high standard accuracy. Experimental results exhibit that our ViT architecture, on CIFAR-10, achieves the best standard accuracy and adversarial robustness of 90.3% and 49.8%, respectively.
Submitted 22 July, 2024;
originally announced July 2024.
-
Hadamard Adapter: An Extreme Parameter-Efficient Adapter Tuning Method for Pre-trained Language Models
Authors:
Yuyan Chen,
Qiang Fu,
Ge Fan,
Lun Du,
Jian-Guang Lou,
Shi Han,
Dongmei Zhang,
Zhixu Li,
Yanghua Xiao
Abstract:
In recent years, pre-trained language models (PLMs) have swept into various fields of artificial intelligence and achieved great success. However, most PLMs, such as T5 and GPT3, have a huge number of parameters; fine-tuning them is often expensive and time-consuming, and storing them takes up a lot of space. Therefore, it is necessary to adopt a parameter-efficient approach to reduce the parameters of PLMs in fine-tuning without compromising their performance in downstream tasks. In this paper, we design a novel adapter which only acts on self-attention outputs in PLMs. This adapter adopts an element-wise linear transformation using the Hadamard product, hence named the Hadamard adapter, and requires the fewest parameters compared to previous parameter-efficient adapters. In addition, we also summarize some tuning patterns for the Hadamard adapter shared by various downstream tasks, expecting to provide some guidance for further parameter reduction with shared adapters in future studies. The experiments conducted on the widely-used GLUE benchmark with several SOTA PLMs show that the Hadamard adapter achieves competitive performance with only 0.033% of the parameters of full fine-tuning, and it has the fewest parameters compared with other adapters. Moreover, we further find that there are also some redundant layers in the Hadamard adapter which can be removed to achieve more parameter efficiency with only 0.022% of the parameters.
Submitted 4 July, 2024;
originally announced July 2024.
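The element-wise linear transformation described in the Hadamard adapter abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `hadamard_adapter`, the identity initialization, and the toy dimensions are assumptions made only for illustration. It shows that such an adapter needs just one weight vector and one bias vector per adapted layer.

```python
from typing import List


def hadamard_adapter(
    attn_out: List[List[float]], weight: List[float], bias: List[float]
) -> List[List[float]]:
    """Apply y = weight * x + bias element-wise to every token vector.

    attn_out is a list of token vectors (the self-attention output);
    weight and bias each have one entry per hidden dimension.
    """
    return [
        [w * x + b for x, w, b in zip(token, weight, bias)]
        for token in attn_out
    ]


# Toy setup (assumed values): hidden size 4, two tokens.
hidden = 4
weight = [1.0] * hidden  # identity initialization, an assumption
bias = [0.0] * hidden
tokens = [[0.5, -1.0, 2.0, 0.0], [1.0, 1.0, 1.0, 1.0]]

adapted = hadamard_adapter(tokens, weight, bias)

# Trainable parameters for this layer: one weight and one bias per dimension.
trainable = len(weight) + len(bias)
```

With identity initialization the adapter leaves the attention output unchanged, so tuning starts from the behavior of the frozen model; the parameter count grows only linearly with the hidden size.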
-
Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena
Authors:
Haipeng Luo,
Qingfeng Sun,
Can Xu,
Pu Zhao,
Qingwei Lin,
Jianguang Lou,
Shifeng Chen,
Yansong Tang,
Weizhu Chen
Abstract:
Assessing the effectiveness of large language models (LLMs) presents substantial challenges. The method of conducting human-annotated battles in an online Chatbot Arena is a highly effective evaluative technique. However, this approach is limited by the costs and time required for human annotation. In this paper, we introduce Arena Learning, an innovative offline strategy designed to simulate these arena battles using AI-driven annotations to evaluate battle outcomes, thus facilitating the continuous improvement of the target model through both supervised fine-tuning and reinforcement learning. Arena Learning comprises two key elements. First, it ensures precise evaluations and maintains consistency between offline simulations and online competitions via WizardArena, a pipeline developed to accurately predict the Elo rankings of various models using a meticulously designed offline test set. Our results demonstrate that WizardArena's predictions closely align with those from the online Arena. Second, it involves the continuous improvement of training data based on the battle results and the refined model. We establish a data flywheel to iteratively update the training data by highlighting the weaknesses of the target model based on its battle results, enabling it to learn from the strengths of multiple different models. We apply Arena Learning to train our target model, WizardLM-$β$, and demonstrate significant performance enhancements across various metrics. This fully automated training and evaluation pipeline sets the stage for continuous advancements in various LLMs via post-training. Notably, Arena Learning plays a pivotal role in the success of WizardLM-2, and this paper serves both as an exploration of its efficacy and a foundational study for future discussions related to WizardLM-2 and its derivatives.
Submitted 15 July, 2024;
originally announced July 2024.
-
FE-GUT: Factor Graph Optimization hybrid with Extended Kalman Filter for tightly coupled GNSS/UWB Integration
Authors:
Qijia Zhao,
Shaolin Lü,
Jianan Lou,
Rong Zhang
Abstract:
Precise positioning and navigation information has become increasingly important with the development of the consumer electronics market. Due to some deficits of the Global Navigation Satellite System (GNSS), such as susceptibility to interference, integrating GNSS with additional alternative sensors is a promising approach to overcome the performance limitations of GNSS-based localization systems. Ultra-Wideband (UWB) can be used to enhance GNSS in constructing an integrated localization system. However, most low-cost UWB devices lack a hardware-level time synchronization feature, which necessitates the estimation and compensation of the time-offset in the tightly coupled GNSS/UWB integration. Given the flexibility of probabilistic graphical models, the time-offset can be modeled as an invariant constant in the discretization of the continuous model. This work proposes a novel architecture in which Factor Graph Optimization (FGO) is hybridized with an Extended Kalman Filter (EKF) for tightly coupled GNSS/UWB integration with online Temporal calibration (FE-GUT). FGO is utilized to precisely estimate the time-offset, while EKF provides initialization for the new factors and performs time-offset compensation. Simulation-based experiments validate the integrated localization performance of FE-GUT. In a four-wheeled robot scenario, the results demonstrate that, compared to EKF, FE-GUT can improve horizontal and vertical localization accuracy by 58.59% and 34.80%, respectively, while the time-offset estimation accuracy is improved by 76.80%. All the source code and datasets are available at https://github.com/zhaoqj23/FE-GUT/.
Submitted 9 July, 2024;
originally announced July 2024.
-
Low-Latency Layer-Aware Proactive and Passive Container Migration in Meta Computing
Authors:
Mengjie Liu,
Yihua Li,
Fangyi Mou,
Zhiqing Tang,
Jiong Lou,
Jianxiong Guo,
Weijia Jia
Abstract:
Meta computing is a new computing paradigm that aims to efficiently utilize all network computing resources to provide fault-tolerant, personalized services with strong security and privacy guarantees. It also seeks to virtualize the Internet as many meta computers. In meta computing, tasks can be assigned to containers at edge nodes for processing, based on container images with multiple layers. The dynamic and resource-constrained nature of meta computing environments requires an optimal container migration strategy for mobile users to minimize latency. However, the problem of container migration in meta computing has not been thoroughly explored. To address this gap, we present low-latency, layer-aware container migration strategies that consider both proactive and passive migration. Specifically: 1) We formulate the container migration problem in meta computing, taking into account layer dependencies to reduce migration costs and overall task duration by considering four delays. 2) We introduce a reinforcement learning algorithm based on policy gradients to minimize total latency by identifying layer dependencies for action selection, making decisions for both proactive and passive migration. Expert demonstrations are introduced to enhance exploitation. 3) Experiments using real data trajectories show that the algorithm outperforms baseline algorithms, achieving lower total latency.
Submitted 19 June, 2024;
originally announced June 2024.
-
VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework
Authors:
Zhi Yao,
Zhiqing Tang,
Jiong Lou,
Ping Shen,
Weijia Jia
Abstract:
The Large Language Model (LLM) has gained significant popularity and is extensively utilized across various domains. Most LLM deployments occur within cloud data centers, where they encounter substantial response delays and incur high costs, thereby impacting the Quality of Services (QoS) at the network edge. Leveraging vector database caching to store LLM request results at the edge can substantially mitigate response delays and cost associated with similar requests, which has been overlooked by previous research. Addressing these gaps, this paper introduces a novel Vector database-assisted cloud-Edge collaborative LLM QoS Optimization (VELO) framework. Firstly, we propose the VELO framework, which ingeniously employs vector database to cache the results of some LLM requests at the edge to reduce the response time of subsequent similar requests. Diverging from direct optimization of the LLM, our VELO framework does not necessitate altering the internal structure of LLM and is broadly applicable to diverse LLMs. Subsequently, building upon the VELO framework, we formulate the QoS optimization problem as a Markov Decision Process (MDP) and devise an algorithm grounded in Multi-Agent Reinforcement Learning (MARL) to decide whether to request the LLM in the cloud or directly return the results from the vector database at the edge. Moreover, to enhance request feature extraction and expedite training, we refine the policy network of MARL and integrate expert demonstrations. Finally, we implement the proposed algorithm within a real edge system. Experimental findings confirm that our VELO framework substantially enhances user satisfaction by concurrently diminishing delay and resource consumption for edge users utilizing LLMs.
Submitted 19 June, 2024;
originally announced June 2024.
-
Automatic Instruction Evolving for Large Language Models
Authors:
Weihao Zeng,
Can Xu,
Yingxiu Zhao,
Jian-Guang Lou,
Weizhu Chen
Abstract:
Fine-tuning large pre-trained language models with Evol-Instruct has achieved encouraging results across a wide range of tasks. However, designing effective evolving methods for instruction evolution requires substantial human expertise. This paper proposes Auto Evol-Instruct, an end-to-end framework that evolves instruction datasets using large language models without any human effort. The framework automatically analyzes and summarizes suitable evolutionary strategies for the given instruction data and iteratively improves the evolving method based on issues exposed during the instruction evolution process. Our extensive experiments demonstrate that the best method optimized by Auto Evol-Instruct outperforms human-designed methods on various benchmarks, including MT-Bench, AlpacaEval, GSM8K, and HumanEval.
Submitted 2 June, 2024;
originally announced June 2024.
-
Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World
Authors:
Wen Yin,
Jian Lou,
Pan Zhou,
Yulai Xie,
Dan Feng,
Yuhua Sun,
Tailai Zhang,
Lichao Sun
Abstract:
Backdoor attacks have been well-studied in visible light object detection (VLOD) in recent years. However, VLOD can not effectively work in dark and temperature-sensitive scenarios. Instead, thermal infrared object detection (TIOD) is the most accessible and practical in such environments. In this paper, our team is the first to investigate the security vulnerabilities associated with TIOD in the context of backdoor attacks, spanning both the digital and physical realms. We introduce two novel types of backdoor attacks on TIOD, each offering unique capabilities: Object-affecting Attack and Range-affecting Attack. We conduct a comprehensive analysis of key factors influencing trigger design, which include temperature, size, material, and concealment. These factors, especially temperature, significantly impact the efficacy of backdoor attacks on TIOD. A thorough understanding of these factors will serve as a foundation for designing physical triggers and temperature controlling experiments. Our study includes extensive experiments conducted in both digital and physical environments. In the digital realm, we evaluate our approach using benchmark datasets for TIOD, achieving an Attack Success Rate (ASR) of up to 98.21%. In the physical realm, we test our approach in two real-world settings: a traffic intersection and a parking lot, using a thermal infrared camera. Here, we attain an ASR of up to 98.38%.
Submitted 30 April, 2024;
originally announced April 2024.
-
Make Your LLM Fully Utilize the Context
Authors:
Shengnan An,
Zexiong Ma,
Zeqi Lin,
Nanning Zheng,
Jian-Guang Lou
Abstract:
While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We hypothesize that it stems from insufficient explicit supervision during the long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on this intuition, our study presents information-intensive (IN2) training, a purely data-driven solution to overcome lost-in-the-middle. Specifically, IN2 training leverages a synthesized long-context question-answer dataset, where the answer requires (1) fine-grained information awareness on a short segment (~128 tokens) within a synthesized long context (4K-32K tokens), and (2) the integration and reasoning of information from two or more short segments. Through applying this information-intensive training on Mistral-7B, we present FILM-7B (FILl-in-the-Middle). To thoroughly assess the ability of FILM-7B for utilizing long contexts, we design three probing tasks that encompass various context styles (document, code, and structured-data context) and information retrieval patterns (forward, backward, and bi-directional retrieval). The probing results demonstrate that FILM-7B can robustly retrieve information from different positions in its 32K context window. Beyond these probing tasks, FILM-7B significantly improves the performance on real-world long-context tasks (e.g., 23.5->26.9 F1 score on NarrativeQA), while maintaining a comparable performance on short-context tasks (e.g., 59.3->59.2 accuracy on MMLU). Github Link: https://github.com/microsoft/FILM.
Submitted 26 April, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Merits of Time-Domain Computing for VMM -- A Quantitative Comparison
Authors:
Florian Freye,
Jie Lou,
Christian Lanius,
Tobias Gemmeke
Abstract:
Vector-matrix-multiplication (VMM) accelerators have gained a lot of traction, especially due to the rise of convolutional neural networks (CNNs) and the desire to compute them on the edge. Besides the classical digital approach, analog computing has gone through a renaissance to push energy efficiency further. A more recent approach is called time-domain (TD) computing. In contrast to analog computing, TD computing permits easy technology as well as voltage scaling. As it has received limited research attention, it is not yet clear which scenarios are most suitable to be computed in the TD. In this work, we investigate these scenarios, focussing on energy efficiency considering approximative computations that preserve accuracy. Both goals are addressed by a novel efficiency metric, which is used to find a baseline design. We use SPICE simulation data which is fed into a python framework to evaluate how performance scales for VMM computation. We see that TD computing offers best energy efficiency for small to medium sized arrays. With throughput and silicon footprint we investigate two additional metrics, giving a holistic comparison.
Submitted 21 May, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents
Authors:
Jinyang Li,
Nan Huo,
Yan Gao,
Jiayi Shi,
Yingxiu Zhao,
Ge Qu,
Yurong Wu,
Chenhao Ma,
Jian-Guang Lou,
Reynold Cheng
Abstract:
Interactive Data Analysis, the collaboration between humans and LLM agents, enables real-time data exploration for informed decision-making. The challenges and costs of collecting realistic interactive logs for data analysis hinder the quantitative evaluation of Large Language Model (LLM) agents in this task. To mitigate this issue, we introduce Tapilot-Crossing, a new benchmark to evaluate LLM agents on interactive data analysis. Tapilot-Crossing contains 1024 interactions, covering 4 practical scenarios: Normal, Action, Private, and Private Action. Notably, Tapilot-Crossing is constructed by an economical multi-agent environment, Decision Company, with little human effort. We evaluate popular and advanced LLM agents in Tapilot-Crossing, which underscores the challenges of interactive data analysis. Furthermore, we propose Adaptive Interaction Reflection (AIR), a self-generated reflection strategy that guides LLM agents to learn from successful history. Experiments demonstrate that AIR can evolve LLMs into effective interactive data analysis agents, achieving a relative performance improvement of up to 44.5%.
Submitted 8 March, 2024;
originally announced March 2024.
-
Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning
Authors:
Z Liu,
J Lou,
W Bao,
Y Hu,
B Li,
Z Qin,
K Ren
Abstract:
Fine-tuning on task-specific datasets is a widely-embraced paradigm for harnessing the powerful capability of pretrained LLMs for various downstream tasks. Due to the popularity of LLM fine-tuning and its accompanying privacy concerns, differentially private (DP) fine-tuning of pretrained LLMs has been widely used to safeguard the privacy of task-specific datasets. Lying at the design core of DP LLM fine-tuning methods is the satisfactory tradeoff among privacy, utility, and scalability. Most existing methods build upon the seminal work of DP-SGD. Despite pushing the scalability of DP-SGD to its limit, DP-SGD-based fine-tuning methods are unfortunately limited by the inherent inefficiency of SGD.
In this paper, we investigate the potential of DP zeroth-order methods for LLM pretraining, which avoids the scalability bottleneck of SGD by approximating the gradient with the more efficient zeroth-order gradient. Rather than treating the zeroth-order method as a drop-in replacement for SGD, this paper presents a comprehensive study both theoretically and empirically. First, we propose the stagewise DP zeroth-order method (DP-ZOSO) that dynamically schedules key hyperparameters. This design is grounded on the synergy between DP random perturbation and the gradient approximation error of the zeroth-order method, and its effect on fine-tuning trajectory.
We provide theoretical analysis for both proposed methods. We conduct extensive empirical analysis on both encoder-only masked language model and decoder-only autoregressive language model, achieving impressive results in terms of scalability and utility (compared with DPZero, DP-ZOPO improves 4.5% on SST-5, 5.5% on MNLI with RoBERTa-Large and 9.2% on CB, 3.9% on BoolQ with OPT-2.7B when $ε=4$).
Submitted 9 May, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
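The abstract above refers to approximating the gradient with a zeroth-order estimate. A minimal sketch of a standard two-point zeroth-order estimator follows; it is a generic illustration, not the paper's DP-ZOSO method, and it omits the differential-privacy clipping and noise entirely. The names `zo_gradient` and `loss`, the smoothing radius `mu`, and the toy quadratic objective are all assumptions.

```python
import random
from typing import Callable, List, Optional, Tuple


def zo_gradient(
    f: Callable[[List[float]], float],
    x: List[float],
    mu: float = 1e-3,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float]]:
    """Two-point zeroth-order gradient estimate along one random direction.

    Uses only two evaluations of f (no backpropagation):
        g = (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u
    Returns the estimate g and the sampled direction u.
    """
    rng = rng or random.Random(0)
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
    x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
    scale = (f(x_plus) - f(x_minus)) / (2.0 * mu)
    return [scale * ui for ui in u], u


def loss(x: List[float]) -> float:
    """Toy quadratic objective (assumed for illustration)."""
    return sum(xi * xi for xi in x)


point = [1.0, -2.0, 0.5]
grad_est, direction = zo_gradient(loss, point, rng=random.Random(1))

# For a quadratic, the finite difference equals the true directional
# derivative 2 * <x, u>, so the estimate is that scalar times u.
directional = 2.0 * sum(xi * ui for xi, ui in zip(point, direction))
```

Because the estimator needs only forward evaluations of the loss, the per-step information about the data reduces to a single scalar (`scale`), which is the quantity a private variant would typically clip and perturb.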
-
Clients Collaborate: Flexible Differentially Private Federated Learning with Guaranteed Improvement of Utility-Privacy Trade-off
Authors:
Yuecheng Li,
Tong Wang,
Chuan Chen,
Jian Lou,
Bin Chen,
Lei Yang,
Zibin Zheng
Abstract:
To defend against privacy leakage of user data, differential privacy is widely used in federated learning, but it is not free. The addition of noise randomly disrupts the semantic integrity of the model and this disturbance accumulates with increased communication rounds. In this paper, we introduce a novel federated learning framework with rigorous privacy guarantees, named FedCEO, designed to strike a trade-off between model utility and user privacy by letting clients "Collaborate with Each Other". Specifically, we perform efficient tensor low-rank proximal optimization on stacked local model parameters at the server, demonstrating its capability to flexibly truncate high-frequency components in spectral space. This implies that our FedCEO can effectively recover the disrupted semantic information by smoothing the global semantic space for different privacy settings and continuous training processes. Moreover, we improve the SOTA utility-privacy trade-off bound by an order of $\sqrt{d}$, where $d$ is the input dimension. We illustrate our theoretical results with experiments on representative image datasets. We observe significant performance improvements and strict privacy guarantees under different privacy settings.
Submitted 10 February, 2024;
originally announced February 2024.
-
Cross-silo Federated Learning with Record-level Personalized Differential Privacy
Authors:
Junxu Liu,
Jian Lou,
Li Xiong,
Jinfei Liu,
Xiaofeng Meng
Abstract:
Federated learning (FL) enhanced by differential privacy has emerged as a popular approach to better safeguard the privacy of client-side data by protecting clients' contributions during the training process. Existing solutions typically assume a uniform privacy budget for all records and provide one-size-fits-all solutions that may not be adequate to meet each record's privacy requirement. In this paper, we explore the uncharted territory of cross-silo FL with record-level personalized differential privacy. We devise a novel framework named rPDP-FL, employing a two-stage hybrid sampling scheme with both uniform client-level sampling and non-uniform record-level sampling to accommodate varying privacy requirements.
A critical and non-trivial problem is how to determine the ideal per-record sampling probability $q$ given the personalized privacy budget $\varepsilon$. We introduce a versatile solution named Simulation-CurveFitting, allowing us to uncover a significant insight into the nonlinear correlation between $q$ and $\varepsilon$ and derive an elegant mathematical model to tackle the problem. Our evaluation demonstrates that our solution can provide significant performance gains over the baselines that do not consider personalized privacy preservation.
Submitted 29 June, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Contrastive Unlearning: A Contrastive Approach to Machine Unlearning
Authors:
Hong kyu Lee,
Qiuchen Zhang,
Carl Yang,
Jian Lou,
Li Xiong
Abstract:
Machine unlearning aims to eliminate the influence of a subset of training samples (i.e., unlearning samples) from a trained model. Effectively and efficiently removing the unlearning samples without negatively impacting the overall model performance is still challenging. In this paper, we propose a contrastive unlearning framework, leveraging the concept of representation learning for more effective unlearning. It removes the influence of unlearning samples by contrasting their embeddings against the remaining samples so that they are pushed away from their original classes and pulled toward other classes. By directly optimizing the representation space, it effectively removes the influence of unlearning samples while maintaining the representations learned from the remaining samples. Experiments on a variety of datasets and models on both class unlearning and sample unlearning showed that contrastive unlearning achieves the best unlearning effects and efficiency with the lowest performance loss compared with the state-of-the-art algorithms.
Submitted 18 January, 2024;
originally announced January 2024.
-
Prompt Valuation Based on Shapley Values
Authors:
Hanxi Liu,
Xiaokai Mao,
Haocheng Xia,
Jian Lou,
Jinfei Liu
Abstract:
Large language models (LLMs) excel on new tasks without additional training, simply by providing natural language prompts that demonstrate how the task should be performed. Prompt ensemble methods comprehensively harness the knowledge of LLMs while mitigating individual biases and errors and further enhancing performance. However, more prompts do not necessarily lead to better results, and not all prompts are beneficial. A small number of high-quality prompts often outperform many low-quality prompts. Currently, there is a lack of a suitable method for evaluating the impact of prompts on the results. In this paper, we utilize the Shapley value to fairly quantify the contributions of prompts, helping to identify beneficial or detrimental prompts, and potentially guiding prompt valuation in data markets. Through extensive experiments employing various ensemble methods and utility functions on diverse tasks, we validate the effectiveness of using the Shapley value method for prompts as it effectively distinguishes and quantifies the contributions of each prompt.
Submitted 23 December, 2023;
originally announced December 2023.
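As a rough illustration of the Shapley-value idea in the abstract above, the following Python sketch computes exact Shapley values for a tiny prompt ensemble. The prompt names A, B, C and the utility table (standing in for ensemble accuracy) are made-up assumptions, not data from the paper. The paper's own utility functions and any approximation scheme it uses are not described in this listing.

```python
from itertools import permutations

def shapley_values(players, utility):
    """Exact Shapley values: average each player's marginal
    contribution over every ordering of the players."""
    values = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            values[p] += utility(with_p) - utility(coalition)
            coalition = with_p
    return {p: v / len(orderings) for p, v in values.items()}

# Hypothetical utility table: accuracy of an ensemble built from each
# subset of prompts. These numbers are invented for illustration only.
TOY_UTILITY = {
    frozenset(): 0.0,
    frozenset({"A"}): 0.6,
    frozenset({"B"}): 0.5,
    frozenset({"C"}): 0.2,
    frozenset({"A", "B"}): 0.8,
    frozenset({"A", "C"}): 0.55,
    frozenset({"B", "C"}): 0.45,
    frozenset({"A", "B", "C"}): 0.75,
}

def toy_utility(coalition):
    return TOY_UTILITY[frozenset(coalition)]

prompt_values = shapley_values(["A", "B", "C"], toy_utility)
for name, value in sorted(prompt_values.items()):
    print(f"prompt {name}: {value:.4f}")
```

The values sum to the utility of the full ensemble, so each prompt's share can be read as its contribution to the final score. Exact enumeration grows factorially with the number of prompts, so larger ensembles would need a sampling-based estimate.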
-
Data Transformation to Construct a Dataset for Generating Entity-Relationship Model from Natural Language
Authors:
Zhenwen Li,
Jian-Guang Lou,
Tao Xie
Abstract:
In order to reduce the manual cost of designing ER models, recent approaches have been proposed to address the task of NL2ERM, i.e., automatically generating entity-relationship (ER) models from natural language (NL) utterances such as software requirements. These approaches are typically rule-based and rely on rigid heuristic rules, so they cannot generalize well to the various linguistic ways of describing the same requirement. Despite having better generalization capability than rule-based approaches, deep-learning-based models are lacking for NL2ERM because no large-scale dataset exists. To address this issue, in this paper, we report our insight that there exists a high similarity between the task of NL2ERM and the increasingly popular task of text-to-SQL, and propose a data transformation algorithm that transforms the existing data of text-to-SQL into the data of NL2ERM. We apply our data transformation algorithm on Spider, one of the most popular text-to-SQL datasets, and we also collect some data entries with different NL types, to obtain a large-scale NL2ERM dataset. Because NL2ERM can be seen as a special information extraction (IE) task, we train two state-of-the-art IE models on our dataset. The experimental results show that both models achieve high performance and outperform existing baselines.
Submitted 21 December, 2023;
originally announced December 2023.
-
Signed Graph Neural Ordinary Differential Equation for Modeling Continuous-time Dynamics
Authors:
Lanlan Chen,
Kai Wu,
Jian Lou,
Jing Liu
Abstract:
Modeling continuous-time dynamics constitutes a foundational challenge, and uncovering inter-component correlations within complex systems holds promise for enhancing the efficacy of dynamic modeling. The prevailing approach of integrating graph neural networks with ordinary differential equations has demonstrated promising performance. However, they disregard the crucial signed information intrinsic to graphs, impeding their capacity to accurately capture real-world phenomena and leading to subpar outcomes.
In response, we introduce a novel approach: a signed graph neural ordinary differential equation, adeptly addressing the limitations of miscapturing signed information. Our proposed solution boasts both flexibility and efficiency. To substantiate its effectiveness, we seamlessly integrate our devised strategies into three preeminent graph-based dynamic modeling frameworks: graph neural ordinary differential equations, graph neural controlled differential equations, and graph recurrent neural networks. Rigorous assessments encompass three intricate dynamic scenarios from physics and biology, as well as scrutiny across four authentic real-world traffic datasets. Remarkably outperforming the trio of baselines, empirical results underscore the substantial performance enhancements facilitated by our proposed approach. Our code can be found at https://github.com/beautyonce/SGODE.
Submitted 18 December, 2023;
originally announced December 2023.
-
Certified Minimax Unlearning with Generalization Rates and Deletion Capacity
Authors:
Jiaqi Liu,
Jian Lou,
Zhan Qin,
Kui Ren
Abstract:
We study the problem of $(ε,δ)$-certified machine unlearning for minimax models. Most of the existing works focus on unlearning from standard statistical learning models that have a single variable and their unlearning steps hinge on the direct Hessian-based conventional Newton update. We develop a new $(ε,δ)$-certified machine unlearning algorithm for minimax models. It proposes a minimax unlearning step consisting of a total-Hessian-based complete Newton update and the Gaussian mechanism borrowed from differential privacy. To obtain the unlearning certification, our method injects calibrated Gaussian noises by carefully analyzing the "sensitivity" of the minimax unlearning step (i.e., the closeness between the minimax unlearning variables and the retraining-from-scratch variables). We derive the generalization rates in terms of population strong and weak primal-dual risk for three different cases of loss functions, i.e., (strongly-)convex-(strongly-)concave losses. We also provide the deletion capacity to guarantee that a desired population risk can be maintained as long as the number of deleted samples does not exceed the derived amount. With training samples $n$ and model dimension $d$, it yields the order $\mathcal O(n/d^{1/4})$, which shows a strict gap over the baseline method of differentially private minimax learning that has $\mathcal O(n/d^{1/2})$. In addition, our rates of generalization and deletion capacity match the state-of-the-art rates derived previously for standard statistical learning models.
Submitted 16 December, 2023;
originally announced December 2023.
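The abstract above mentions injecting calibrated Gaussian noise according to the sensitivity of the unlearning step. The Python sketch below shows only the textbook Gaussian mechanism from differential privacy, using the classical calibration sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon, which holds for 0 < epsilon < 1. It is an assumption that a calibration of this form applies here; the paper's actual noise scale and its Newton update are not given in this listing, and the function names are hypothetical.

```python
import math
import random

def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise standard deviation for the classical Gaussian mechanism.
    Assumes 0 < epsilon < 1 and 0 < delta < 1 (textbook calibration)."""
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("classical calibration needs 0 < epsilon, delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def gaussian_mechanism(vector, sensitivity, epsilon, delta, rng=None):
    """Add independent Gaussian noise to each coordinate of `vector`.
    `sensitivity` is an L2 bound on how far the vector can move; in the
    unlearning setting it would bound the gap to the retrained model."""
    rng = rng or random.Random()
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return [x + rng.gauss(0.0, sigma) for x in vector]

# Hypothetical updated parameters and sensitivity bound, for illustration.
updated_params = [0.12, -0.40, 0.95]
noisy_params = gaussian_mechanism(
    updated_params, sensitivity=0.01, epsilon=0.5, delta=1e-5,
    rng=random.Random(0),
)
print(noisy_params)
```

A tighter sensitivity bound means less noise for the same (epsilon, delta), which is why the abstract stresses a careful sensitivity analysis.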
-
ERASER: Machine Unlearning in MLaaS via an Inference Serving-Aware Approach
Authors:
Yuke Hu,
Jian Lou,
Jiaqi Liu,
Wangze Ni,
Feng Lin,
Zhan Qin,
Kui Ren
Abstract:
Over the past years, Machine Learning-as-a-Service (MLaaS) has received a surging demand for supporting Machine Learning-driven services to offer revolutionized user experience across diverse application areas. MLaaS provides inference service with low inference latency based on an ML model trained using a dataset collected from numerous individual data owners. Recently, for the sake of data owners' privacy and to comply with the "right to be forgotten (RTBF)" as enacted by data protection legislation, many machine unlearning methods have been proposed to remove data owners' data from trained models upon their unlearning requests. However, despite their promising efficiency, almost all existing machine unlearning methods handle unlearning requests independently from inference requests, which unfortunately introduces a new security issue of inference service obsolescence and a privacy vulnerability of undesirable exposure for machine unlearning in MLaaS.
In this paper, we propose the ERASER framework for machinE unleaRning in MLaAS via an inferencE seRving-aware approach. ERASER strategically chooses appropriate unlearning execution timing to address the inference service obsolescence issue. A novel inference consistency certification mechanism is proposed to avoid the violation of the RTBF principle caused by postponed unlearning executions, thereby mitigating the undesirable exposure vulnerability. ERASER offers three groups of design choices to allow for tailor-made variants that best suit the specific environments and preferences of various MLaaS systems. Extensive empirical evaluations across various settings confirm ERASER's effectiveness, e.g., it can effectively save up to 99% of inference latency and 31% of computation overhead over the inference-oblivion baseline.
Submitted 18 June, 2024; v1 submitted 3 November, 2023;
originally announced November 2023.
-
Local Differentially Private Heavy Hitter Detection in Data Streams with Bounded Memory
Authors:
Xiaochen Li,
Weiran Liu,
Jian Lou,
Yuan Hong,
Lei Zhang,
Zhan Qin,
Kui Ren
Abstract:
Top-$k$ frequent items detection is a fundamental task in data stream mining. Many promising solutions are proposed to improve memory efficiency while still maintaining high accuracy for detecting the Top-$k$ items. Despite the memory efficiency concern, the users could suffer from privacy loss if participating in the task without proper protection, since their contributed local data streams may continually leak sensitive individual information. However, most existing works solely focus on addressing either the memory-efficiency problem or the privacy concerns but seldom jointly, which cannot achieve a satisfactory tradeoff between memory efficiency, privacy protection, and detection accuracy.
In this paper, we present a novel framework HG-LDP to achieve accurate Top-$k$ item detection at bounded memory expense, while providing rigorous local differential privacy (LDP) protection. Specifically, we identify two key challenges naturally arising in the task, which reveal that directly applying existing LDP techniques will lead to an inferior "accuracy-privacy-memory efficiency" tradeoff. Therefore, we instantiate three advanced schemes under the framework by designing novel LDP randomization methods, which address the hurdles caused by the large size of the item domain and by the limited space of the memory. We conduct comprehensive experiments on both synthetic and real-world datasets to show that the proposed advanced schemes achieve a superior "accuracy-privacy-memory efficiency" tradeoff, saving $2300\times$ memory over baseline methods when the item domain size is $41,270$. Our code is open-sourced via the link.
Submitted 27 November, 2023;
originally announced November 2023.
-
Thread of Thought Unraveling Chaotic Contexts
Authors:
Yucheng Zhou,
Xiubo Geng,
Tao Shen,
Chongyang Tao,
Guodong Long,
Jian-Guang Lou,
Jianbing Shen
Abstract:
Large Language Models (LLMs) have ushered in a transformative era in the field of natural language processing, excelling in tasks related to text comprehension and generation. Nevertheless, they encounter difficulties when confronted with chaotic contexts (e.g., distractors rather than long irrelevant context), leading to the inadvertent omission of certain details within the chaotic context. In response to these challenges, we introduce the "Thread of Thought" (ThoT) strategy, which draws inspiration from human cognitive processes. ThoT systematically segments and analyzes extended contexts while adeptly selecting pertinent information. This strategy serves as a versatile "plug-and-play" module, seamlessly integrating with various LLMs and prompting techniques. In the experiments, we utilize the PopQA and EntityQ datasets, as well as a Multi-Turn Conversation Response dataset (MTCR) we collected, to illustrate that ThoT significantly improves reasoning performance compared to other prompting techniques.
Submitted 15 November, 2023;
originally announced November 2023.
-
LayoutPrompter: Awaken the Design Ability of Large Language Models
Authors:
Jiawei Lin,
Jiaqi Guo,
Shizhao Sun,
Zijiang James Yang,
Jian-Guang Lou,
Dongmei Zhang
Abstract:
Conditional graphic layout generation, which automatically maps user constraints to high-quality layouts, has attracted widespread attention today. Although recent works have achieved promising performance, the lack of versatility and data efficiency hinders their practical applications. In this work, we propose LayoutPrompter, which leverages large language models (LLMs) to address the above problems through in-context learning. LayoutPrompter is made up of three key components, namely input-output serialization, dynamic exemplar selection and layout ranking. Specifically, the input-output serialization component meticulously designs the input and output formats for each layout generation task. Dynamic exemplar selection is responsible for selecting the most helpful prompting exemplars for a given input. And a layout ranker is used to pick the highest quality layout from multiple outputs of LLMs. We conduct experiments on all existing layout generation tasks using four public datasets. Despite the simplicity of our approach, experimental results show that LayoutPrompter can compete with or even outperform state-of-the-art approaches on these tasks without any model training or fine-tuning. This demonstrates the effectiveness of this versatile and training-free approach. In addition, the ablation studies show that LayoutPrompter is significantly superior to the training-based baseline in a low-data regime, further indicating the data efficiency of LayoutPrompter. Our project is available at https://github.com/microsoft/LayoutGeneration/tree/main/LayoutPrompter.
Submitted 11 November, 2023;
originally announced November 2023.
-
Retro-BLEU: Quantifying Chemical Plausibility of Retrosynthesis Routes through Reaction Template Sequence Analysis
Authors:
Junren Li,
Lei Fang,
Jian-Guang Lou
Abstract:
Computer-assisted methods have emerged as valuable tools for retrosynthesis analysis. However, quantifying the plausibility of generated retrosynthesis routes remains a challenging task. We introduce Retro-BLEU, a statistical metric adapted from the well-established BLEU score in machine translation, to evaluate the plausibility of retrosynthesis routes based on reaction template sequences analysis. We demonstrate the effectiveness of Retro-BLEU by applying it to a diverse set of retrosynthesis routes generated by state-of-the-art algorithms and compare the performance with other evaluation metrics. The results show that Retro-BLEU is capable of differentiating between plausible and implausible routes. Furthermore, we provide insights into the strengths and weaknesses of Retro-BLEU, paving the way for future developments and improvements in this field.
Submitted 7 November, 2023;
originally announced November 2023.
-
Does Differential Privacy Prevent Backdoor Attacks in Practice?
Authors:
Fereshteh Razmi,
Jian Lou,
Li Xiong
Abstract:
Differential Privacy (DP) was originally developed to protect privacy. However, it has recently been utilized to secure machine learning (ML) models from poisoning attacks, with DP-SGD receiving substantial attention. Nevertheless, a thorough investigation is required to assess the effectiveness of different DP techniques in preventing backdoor attacks in practice. In this paper, we investigate the effectiveness of DP-SGD and, for the first time in literature, examine PATE in the context of backdoor attacks. We also explore the role of different components of DP algorithms in defending against backdoor attacks and will show that PATE is effective against these attacks due to the bagging structure of the teacher models it employs. Our experiments reveal that hyperparameters and the number of backdoors in the training dataset impact the success of DP algorithms. Additionally, we propose Label-DP as a faster and more accurate alternative to DP-SGD and PATE. We conclude that while Label-DP algorithms generally offer weaker privacy protection, accurate hyper-parameter tuning can make them more effective than DP methods in defending against backdoor attacks while maintaining model accuracy.
Submitted 10 November, 2023;
originally announced November 2023.
-
Robust and Communication-Efficient Federated Domain Adaptation via Random Features
Authors:
Zhanbo Feng,
Yuanjie Wang,
Jie Li,
Fan Yang,
Jiong Lou,
Tiebin Mi,
Robert. C. Qiu,
Zhenyu Liao
Abstract:
Modern machine learning (ML) models have grown to a scale where training them on a single machine becomes impractical. As a result, there is a growing trend to leverage federated learning (FL) techniques to train large ML models in a distributed and collaborative manner. These models, however, when deployed on new devices, might struggle to generalize well due to domain shifts. In this context, federated domain adaptation (FDA) emerges as a powerful approach to address this challenge.
Most existing FDA approaches typically focus on aligning the distributions between source and target domains by minimizing their (e.g., MMD) distance. Such strategies, however, inevitably introduce high communication overheads and can be highly sensitive to network reliability.
In this paper, we introduce RF-TCA, an enhancement to the standard Transfer Component Analysis approach that significantly accelerates computation without compromising theoretical and empirical performance. Leveraging the computational advantage of RF-TCA, we further extend it to the FDA setting with FedRF-TCA. The proposed FedRF-TCA protocol boasts communication complexity that is independent of the sample size, while maintaining performance that is comparable to, or even surpasses, that of state-of-the-art FDA methods. We present extensive experiments to showcase the superior performance and robustness (to network condition) of FedRF-TCA.
Submitted 8 November, 2023;
originally announced November 2023.
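The title and abstract above refer to random features. The Python sketch below shows one standard construction, random Fourier features for the Gaussian (RBF) kernel, as a way to see why a feature-based summary can have a size that depends on the number of features and not on the number of samples. Whether RF-TCA uses exactly this construction is an assumption; the function names, dimensions and bandwidth are hypothetical.

```python
import math
import random

def make_rff(dim, num_features, sigma, seed=0):
    """Build a random Fourier feature map z(x) such that
    z(x) . z(y) approximates exp(-||x - y||^2 / (2 * sigma^2))."""
    rng = random.Random(seed)
    # Frequencies are drawn from N(0, sigma^-2 I), phases from U[0, 2*pi].
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return feature_map

def rbf_kernel(x, y, sigma):
    """Exact Gaussian kernel, used here only for comparison."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

z = make_rff(dim=2, num_features=2000, sigma=1.0, seed=0)
x, y = [0.0, 0.0], [1.0, 0.0]
print("approximate kernel:", dot(z(x), z(y)))
print("exact kernel:      ", rbf_kernel(x, y, 1.0))
```

Under this reading, a client could send an aggregate of its feature vectors, whose length is `num_features`, instead of anything that scales with its sample count.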
-
Learning From Mistakes Makes LLM Better Reasoner
Authors:
Shengnan An,
Zexiong Ma,
Zeqi Lin,
Nanning Zheng,
Jian-Guang Lou,
Weizhu Chen
Abstract:
Large language models (LLMs) have recently exhibited remarkable reasoning capabilities in solving math problems. To further improve their reasoning capabilities, this work explores whether LLMs can LEarn from MistAkes (LEMA), akin to the human learning process. Consider a human student who fails to solve a math problem: the student learns from the mistake made and from how to correct it. Mimicking this error-driven learning process, LEMA incorporates mistake-correction data pairs when fine-tuning LLMs. Specifically, we first collect inaccurate reasoning paths from various LLMs, and then employ GPT-4 as a "corrector" to identify the mistake step, explain the reason for the mistake, correct the mistake and generate the final answer. In addition, we apply a correction-centric evolution strategy that effectively expands the question set for generating correction data. Experiments across various LLMs and reasoning tasks show that LEMA effectively improves CoT-alone fine-tuning. Our further ablations shed light on the non-homogeneous effectiveness between CoT data and correction data. These results suggest a significant potential for LLMs to improve through learning from their mistakes. Our code, models and prompts are publicly available at https://github.com/microsoft/LEMA.
Submitted 29 March, 2024; v1 submitted 31 October, 2023;
originally announced October 2023.
-
Efficient Serverless Function Scheduling at the Network Edge
Authors:
Jiong Lou,
Zhiqing Tang,
Shijing Yuan,
Jie Li,
Chengtao Wu,
Weijia Jia
Abstract:
Serverless computing is a promising approach for edge computing because of its inherent features, e.g., lightweight virtualization, rapid scalability, and economic efficiency. However, previous studies have not adequately addressed the issues of significant cold start latency and highly dynamic workloads in serverless function scheduling, which are exacerbated at the resource-limited network edge. In this paper, we formulate the Serverless Function Scheduling (SFS) problem for resource-limited edge computing, aiming to minimize the average response time. To efficiently solve this intractable scheduling problem, we first consider a simplified offline form of the problem and design a polynomial-time optimal scheduling algorithm based on each function's weight. Furthermore, we propose an Enhanced Shortest Function First (ESFF) algorithm, in which the function weight represents the scheduling urgency. To avoid frequent cold starts, ESFF selectively decides the initialization of new function instances when receiving requests. To deal with dynamic workloads, ESFF judiciously replaces serverless functions based on the function weight at the completion time of requests. Extensive simulations based on real-world serverless request traces are conducted, and the results show that ESFF consistently and substantially outperforms existing baselines under different settings.
Submitted 31 October, 2023; v1 submitted 25 October, 2023;
originally announced October 2023.
-
PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models
Authors:
Hongwei Yao,
Jian Lou,
Zhan Qin
Abstract:
Prompts have significantly improved the performance of pretrained Large Language Models (LLMs) on various downstream tasks recently, making them increasingly indispensable for a diverse range of LLM application scenarios. However, the backdoor vulnerability, a serious security threat that can maliciously alter the victim model's normal predictions, has not been sufficiently explored for prompt-based LLMs. In this paper, we present POISONPROMPT, a novel backdoor attack capable of successfully compromising both hard and soft prompt-based LLMs. We evaluate the effectiveness, fidelity, and robustness of POISONPROMPT through extensive experiments on three popular prompt methods, using six datasets and three widely used LLMs. Our findings highlight the potential security threats posed by backdoor attacks on prompt-based LLMs and emphasize the need for further research in this area.
Submitted 18 December, 2023; v1 submitted 18 October, 2023;
originally announced October 2023.
-
DP-starJ: A Differential Private Scheme towards Analytical Star-Join Queries
Authors:
Congcong Fu,
Hui Li,
Jian Lou,
Jiangtao Cui
Abstract:
Star-join query is the fundamental task in data warehouse and has wide applications in On-line Analytical Processing (OLAP) scenarios. Due to the large number of foreign key constraints and the asymmetric effect in the neighboring instance between the fact and dimension tables, even those latest DP efforts specifically designed for join, if directly applied to star-join query, will suffer from extremely large estimation errors and expensive computational cost. In this paper, we are thus motivated to propose DP-starJ, a novel Differentially Private framework for star-Join queries. DP-starJ consists of a series of strategies tailored to specific features of star-join, including 1) we unveil the different effect of fact and dimension tables on the neighboring database instances, and accordingly revisit the definitions tailored to different cases of star-join; 2) we propose Predicate Mechanism (PM), which utilizes predicate perturbation to inject noise into the join procedure instead of the results; 3) to further boost the robust performance, we propose a DP-compliant star-join algorithm for various types of star-join tasks based on PM. We provide both theoretical analysis and empirical study, which demonstrate the superiority of the proposed methods over the state-of-the-art solutions in terms of accuracy, efficiency, and scalability.
Submitted 17 November, 2023; v1 submitted 7 October, 2023;
originally announced October 2023.
-
Joint Task Scheduling and Container Image Caching in Edge Computing
Authors:
Fangyi Mou,
Zhiqing Tang,
Jiong Lou,
Jianxiong Guo,
Wenhua Wang,
Tian Wang
Abstract:
In Edge Computing (EC), containers have been increasingly used to deploy applications to provide mobile users services. Each container must run based on a container image file that exists locally. However, it has been conspicuously neglected by existing work that effective task scheduling combined with dynamic container image caching is a promising way to reduce the container image download time with the limited bandwidth resource of edge nodes. To fill in such gaps, in this paper, we propose novel joint Task Scheduling and Image Caching (TSIC) algorithms, specifically: 1) We consider the joint task scheduling and image caching problem and formulate it as a Markov Decision Process (MDP), taking the communication delay, waiting delay, and computation delay into consideration; 2) To solve the MDP problem, a TSIC algorithm based on deep reinforcement learning is proposed with the customized state and action spaces and combined with an adaptive caching update algorithm; 3) A real container system is implemented to validate our algorithms. The experiments show that our strategy outperforms the existing baseline approaches by 23% and 35% on average in terms of total delay and waiting delay, respectively.
Submitted 30 September, 2023;
originally announced October 2023.
-
Stock Market Sentiment Classification and Backtesting via Fine-tuned BERT
Authors:
Jiashu Lou
Abstract:
With the rapid development of big data and computing devices, low-latency automatic trading platforms based on real-time information acquisition have become the main components of the stock trading market, so the topic of quantitative trading has received widespread attention. And for non-strongly efficient trading markets, human emotions and expectations always dominate market trends and trading decisions. Therefore, this paper starts from the theory of emotion, taking East Money as an example, crawling user comment titles data from its corresponding stock bar and performing data cleaning. Subsequently, a natural language processing model BERT was constructed, and the BERT model was fine-tuned using existing annotated data sets. The experimental results show that the fine-tuned model has different degrees of performance improvement compared to the original model and the baseline model. Subsequently, based on the above model, the user comment data crawled is labeled with emotional polarity, and the obtained label information is combined with the Alpha191 model to participate in regression, and significant regression results are obtained. Subsequently, the regression model is used to predict the average price change for the next five days, and use it as a signal to guide automatic trading. The experimental results show that the incorporation of emotional factors increased the return rate by 73.8% compared to the baseline during the trading period, and by 32.41% compared to the original Alpha191 model. Finally, we discuss the advantages and disadvantages of incorporating emotional factors into quantitative trading, and give possible directions for further research in the future.
Submitted 21 September, 2023;
originally announced September 2023.
-
Re-Reading Improves Reasoning in Large Language Models
Authors:
Xiaohan Xu,
Chongyang Tao,
Tao Shen,
Can Xu,
Hongbo Xu,
Guodong Long,
Jian-guang Lou
Abstract:
To enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs), we introduce a simple, yet general and effective prompting method, Re2, i.e., Re-Reading the question as input. Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), which aim to elicit the reasoning process in the output, Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process. Consequently, Re2 demonstrates strong generality and compatibility with most thought-eliciting prompting methods, including CoT. Crucially, Re2 facilitates a "bidirectional" encoding in unidirectional decoder-only LLMs because the first pass could provide global information for the second pass. We begin with a preliminary empirical study as the foundation of Re2, illustrating its potential to enable "bidirectional" attention mechanisms. We then evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality. Our findings indicate that, with the exception of a few scenarios on vanilla ChatGPT, Re2 consistently enhances the reasoning performance of LLMs through a simple re-reading strategy. Further analyses reveal Re2's adaptability, showing how it can be effectively integrated with different LLMs, thought-eliciting prompting, and ensemble strategies. Our code is available at https://github.com/Tebmer/Rereading-LLM-Reasoning/
Submitted 29 February, 2024; v1 submitted 12 September, 2023;
originally announced September 2023.
-
A Survey for Graphic Design Intelligence
Authors:
Danqing Huang,
Jiaqi Guo,
Shizhao Sun,
Hanling Tian,
Jieru Lin,
Zheng Hu,
Chin-Yew Lin,
Jian-Guang Lou,
Dongmei Zhang
Abstract:
Graphic design is an effective language for visual communication. Using complex composition of visual elements (e.g., shape, color, font) guided by design principles and aesthetics, design helps produce more visually appealing content. The creation of a harmonious design requires carefully selecting and combining different visual elements, which can be challenging and time-consuming. To expedite the design process, emerging AI techniques have been proposed to automate tedious tasks and facilitate human creativity. However, most current works focus only on specific tasks targeting different scenarios, without a high-level abstraction. This paper aims to provide a systematic overview of graphic design intelligence and summarize the literature in a taxonomy of representation, understanding and generation. Specifically, we consider related works for individual visual elements as well as the overall design composition. Furthermore, we highlight some of the potential directions for future explorations.
Submitted 4 September, 2023;
originally announced September 2023.
-
A Parse-Then-Place Approach for Generating Graphic Layouts from Textual Descriptions
Authors:
Jiawei Lin,
Jiaqi Guo,
Shizhao Sun,
Weijiang Xu,
Ting Liu,
Jian-Guang Lou,
Dongmei Zhang
Abstract:
Creating layouts is a fundamental step in graphic design. In this work, we propose to use text as the guidance to create graphic layouts, i.e., Text-to-Layout, aiming to lower the design barriers. Text-to-Layout is a challenging task, because it needs to consider the implicit, combined, and incomplete layout constraints from text, each of which has not been studied in previous work. To address this, we present a two-stage approach, named parse-then-place. The approach introduces an intermediate representation (IR) between text and layout to represent diverse layout constraints. With IR, Text-to-Layout is decomposed into a parse stage and a place stage. The parse stage takes a textual description as input and generates an IR, in which the implicit constraints from the text are transformed into explicit ones. The place stage generates layouts based on the IR. To model combined and incomplete constraints, we use a Transformer-based layout generation model and carefully design a way to represent constraints and layouts as sequences. Besides, we adopt the pretrain-then-finetune strategy to boost the performance of the layout generation model with large-scale unlabeled layouts. To evaluate our approach, we construct two Text-to-Layout datasets and conduct experiments on them. Quantitative results, qualitative analysis, and user studies demonstrate the effectiveness of our approach.
Submitted 24 August, 2023;
originally announced August 2023.
-
RemovalNet: DNN Fingerprint Removal Attacks
Authors:
Hongwei Yao,
Zheng Li,
Kunzhe Huang,
Jian Lou,
Zhan Qin,
Kui Ren
Abstract:
With the performance of deep neural networks (DNNs) remarkably improving, DNNs have been widely used in many areas. Consequently, the DNN model has become a valuable asset, and its intellectual property is safeguarded by ownership verification techniques (e.g., DNN fingerprinting). However, the feasibility of the DNN fingerprint removal attack and its potential influence remains an open problem. In this paper, we perform the first comprehensive investigation of DNN fingerprint removal attacks. Generally, the knowledge contained in a DNN model can be categorized into general semantic and fingerprint-specific knowledge. To this end, we propose a min-max bilevel optimization-based DNN fingerprint removal attack named RemovalNet, to evade model ownership verification. The lower-level optimization is designed to remove fingerprint-specific knowledge, while in the upper-level optimization, we distill the victim model's general semantic knowledge to maintain the surrogate model's performance. We conduct extensive experiments to evaluate the fidelity, effectiveness, and efficiency of the RemovalNet against four advanced defense methods on six metrics. The empirical results demonstrate that (1) the RemovalNet is effective: after our DNN fingerprint removal attack, the model distance between the target and surrogate models is 100 times higher than that of the baseline attacks; (2) the RemovalNet is efficient: it uses only 0.2% (400 samples) of the substitute dataset and 1,000 iterations to conduct our attack, and compared with advanced model stealing attacks, it saves nearly 85% of computational resources at most; (3) the RemovalNet achieves high fidelity, in that the created surrogate model maintains high accuracy after the DNN fingerprint removal process. Our code is available at: https://github.com/grasses/RemovalNet.
Submitted 22 November, 2023; v1 submitted 23 August, 2023;
originally announced August 2023.
-
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Authors:
Haipeng Luo,
Qingfeng Sun,
Can Xu,
Pu Zhao,
Jianguang Lou,
Chongyang Tao,
Xiubo Geng,
Qingwei Lin,
Shifeng Chen,
Dongmei Zhang
Abstract:
Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are only pre-trained on large-scale internet data, without math-related optimization. In this paper, we present WizardMath, which enhances the mathematical reasoning abilities of Llama-2 by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our model. WizardMath surpasses all other open-source LLMs by a substantial margin. Furthermore, our model even outperforms ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8k, and simultaneously surpasses Text-davinci-002, PaLM-1 and GPT-3 on MATH. More details and model weights are public at https://github.com/nlpxucan/WizardLM and https://huggingface.co/WizardLM.
Submitted 18 August, 2023;
originally announced August 2023.
-
FINER: Enhancing State-of-the-art Classifiers with Feature Attribution to Facilitate Security Analysis
Authors:
Yiling He,
Jian Lou,
Zhan Qin,
Kui Ren
Abstract:
Deep learning classifiers achieve state-of-the-art performance in various risk detection applications. They explore rich semantic representations and are supposed to automatically discover risk behaviors. However, due to the lack of transparency, the behavioral semantics cannot be conveyed to downstream security experts to reduce their heavy workload in security analysis. Although feature attribution (FA) methods can be used to explain deep learning, the underlying classifier is still blind to what behavior is suspicious, and the generated explanation cannot adapt to downstream tasks, incurring poor explanation fidelity and intelligibility. In this paper, we propose FINER, the first framework for risk detection classifiers to generate high-fidelity and high-intelligibility explanations. The high-level idea is to gather explanation efforts from model developer, FA designer, and security experts. To improve fidelity, we fine-tune the classifier with an explanation-guided multi-task learning strategy. To improve intelligibility, we engage task knowledge to adjust and ensemble FA methods. Extensive evaluations show that FINER improves explanation quality for risk detection. Moreover, we demonstrate that FINER outperforms a state-of-the-art tool in facilitating malware analysis.
Submitted 10 August, 2023;
originally announced August 2023.
-
PromptCARE: Prompt Copyright Protection by Watermark Injection and Verification
Authors:
Hongwei Yao,
Jian Lou,
Kui Ren,
Zhan Qin
Abstract:
Large language models (LLMs) have witnessed a meteoric rise in popularity among the general public users over the past few months, facilitating diverse downstream tasks with human-level accuracy and proficiency. Prompts play an essential role in this success, which efficiently adapt pre-trained LLMs to task-specific applications by simply prepending a sequence of tokens to the query texts. However, designing and selecting an optimal prompt can be both expensive and demanding, leading to the emergence of Prompt-as-a-Service providers who profit by providing well-designed prompts for authorized use. With the growing popularity of prompts and their indispensable role in LLM-based services, there is an urgent need to protect the copyright of prompts against unauthorized use.
In this paper, we propose PromptCARE, the first framework for prompt copyright protection through watermark injection and verification. Prompt watermarking presents unique challenges that render existing watermarking techniques developed for model and dataset copyright verification ineffective. PromptCARE overcomes these hurdles by proposing watermark injection and verification schemes tailor-made for prompts and NLP characteristics. Extensive experiments on six well-known benchmark datasets, using three prevalent pre-trained LLMs (BERT, RoBERTa, and Facebook OPT-1.3b), demonstrate the effectiveness, harmlessness, robustness, and stealthiness of PromptCARE.
Submitted 28 November, 2023; v1 submitted 5 August, 2023;
originally announced August 2023.
-
ELFNet: Evidential Local-global Fusion for Stereo Matching
Authors:
Jieming Lou,
Weide Liu,
Zhuo Chen,
Fayao Liu,
Jun Cheng
Abstract:
Although existing stereo matching models have achieved continuous improvement, they often face issues related to trustworthiness due to the absence of uncertainty estimation. Additionally, effectively leveraging multi-scale and multi-view knowledge of stereo pairs remains unexplored. In this paper, we introduce the Evidential Local-global Fusion (ELF) framework for stereo matching, which endows both uncertainty estimation and confidence-aware fusion with trustworthy heads. Instead of predicting the disparity map alone, our model estimates an evidential-based disparity considering both aleatoric and epistemic uncertainties. With the normal inverse-Gamma distribution as a bridge, the proposed framework realizes intra evidential fusion of multi-level predictions and inter evidential fusion between cost-volume-based and transformer-based stereo matching. Extensive experimental results show that the proposed framework exploits multi-view information effectively and achieves state-of-the-art overall performance both on accuracy and cross-domain generalization.
The codes are available at https://github.com/jimmy19991222/ELFNet.
Submitted 1 August, 2023;
originally announced August 2023.
-
Online Container Scheduling for Low-Latency IoT Services in Edge Cluster Upgrade: A Reinforcement Learning Approach
Authors:
Hanshuai Cui,
Zhiqing Tang,
Jiong Lou,
Weijia Jia
Abstract:
In Mobile Edge Computing (MEC), Internet of Things (IoT) devices offload computationally-intensive tasks to edge nodes, where they are executed within containers, reducing the reliance on centralized cloud infrastructure. Frequent upgrades are essential to maintain the efficient and secure operation of edge clusters. However, traditional cloud cluster upgrade strategies are ill-suited for edge clusters due to their geographically distributed nature and resource limitations. Therefore, it is crucial to properly schedule containers and upgrade edge clusters to minimize the impact on running tasks. In this paper, we propose a low-latency container scheduling algorithm for edge cluster upgrades. Specifically: 1) We formulate the online container scheduling problem for edge cluster upgrade to minimize the total task latency. 2) We propose a policy gradient-based reinforcement learning algorithm to address this problem, considering the unique characteristics of MEC. 3) Experimental results demonstrate that our algorithm reduces total task latency by approximately 27% compared to baseline algorithms.
Submitted 22 July, 2023;
originally announced July 2023.
-
LaDe: The First Comprehensive Last-mile Delivery Dataset from Industry
Authors:
Lixia Wu,
Haomin Wen,
Haoyuan Hu,
Xiaowei Mao,
Yutong Xia,
Ergang Shan,
Jianbin Zhen,
Junhong Lou,
Yuxuan Liang,
Liuqing Yang,
Roger Zimmermann,
Youfang Lin,
Huaiyu Wan
Abstract:
Real-world last-mile delivery datasets are crucial for research in logistics, supply chain management, and spatio-temporal data mining. Despite a plethora of algorithms developed to date, no widely accepted, publicly available last-mile delivery dataset exists to support research in this field. In this paper, we introduce LaDe, the first publicly available last-mile delivery dataset with millions of packages from the industry. LaDe has three unique characteristics: (1) Large-scale. It involves 10,677k packages of 21k couriers over 6 months of real-world operation. (2) Comprehensive information. It offers original package information, such as its location and time requirements, as well as task-event information, which records when and where the courier is while events such as task-accept and task-finish events happen. (3) Diversity. The dataset includes data from various scenarios, including package pick-up and delivery, and from multiple cities, each with its unique spatio-temporal patterns due to their distinct characteristics such as populations. We verify LaDe on three tasks by running several classical baseline models per task. We believe that the large-scale, comprehensive, diverse feature of LaDe can offer unparalleled opportunities to researchers in the supply chain community, data mining community, and beyond. The dataset homepage is publicly available at https://huggingface.co/datasets/Cainiao-AI/LaDe.
Submitted 2 January, 2024; v1 submitted 18 June, 2023;
originally announced June 2023.
-
Pre-trained transformer for adversarial purification
Authors:
Kai Wu,
Yujian Betterest Li,
Jian Lou,
Xiaoyu Zhang,
Handing Wang,
Jing Liu
Abstract:
With more and more deep neural networks being deployed as various daily services, their reliability is essential. It is frightening that deep neural networks are vulnerable and sensitive to adversarial attacks, the most common one of which for the services is evasion-based. Recent works usually strengthen the robustness by adversarial training or leveraging the knowledge of an amount of clean data. However, retraining and redeploying the model need a large computational budget, leading to heavy losses to the online service. In addition, when training, it is likely that only limited adversarial examples are available for the service provider, while much clean data may not be accessible. Based on the analysis on the defense for deployed models, we find that how to rapidly defend against a certain attack for a frozen original service model with limitations of few clean and adversarial examples, which is named as RaPiD (Rapid Plug-in Defender), is really important. Motivated by the generalization and the universal computation ability of pre-trained transformer models, we come up with a new defender method, CeTaD, which stands for Considering Pretrained Transformers as Defenders. In particular, we evaluate the effectiveness and the transferability of CeTaD in the case of one-shot adversarial examples and explore the impact of different parts of CeTaD as well as training data conditions. CeTaD is flexible for different differentiable service models, and suitable for various types of attacks.
Submitted 25 September, 2023; v1 submitted 27 May, 2023;
originally announced June 2023.
-
Uncovering and Categorizing Social Biases in Text-to-SQL
Authors:
Yan Liu,
Yan Gao,
Zhe Su,
Xiaokang Chen,
Elliott Ash,
Jian-Guang Lou
Abstract:
Content Warning: This work contains examples that potentially implicate stereotypes, associations, and other harms that could be offensive to individuals in certain social groups. Large pre-trained language models are acknowledged to carry social biases towards different demographics, which can further amplify existing stereotypes in our society and cause even more harm. Text-to-SQL is an important task, models of which are mainly adopted by administrative industries, where unfair decisions may lead to catastrophic consequences. However, existing Text-to-SQL models are trained on clean, neutral datasets, such as Spider and WikiSQL. This, to some extent, covers up social bias in models under ideal conditions, which nevertheless may emerge in real application scenarios. In this work, we aim to uncover and categorize social biases in Text-to-SQL models. We summarize the categories of social biases that may occur in structured data for Text-to-SQL models. We build test benchmarks and reveal that models with similar task accuracy can contain social biases at very different rates. We show how to take advantage of our methodology to uncover and assess social biases in the downstream Text-to-SQL task. We will release our code and data.
Submitted 7 June, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Uncovering and Quantifying Social Biases in Code Generation
Authors:
Yan Liu,
Xiaokang Chen,
Yan Gao,
Zhe Su,
Fengji Zhang,
Daoguang Zan,
Jian-Guang Lou,
Pin-Yu Chen,
Tsung-Yi Ho
Abstract:
With the popularity of automatic code generation tools, such as Copilot, the study of the potential hazards of these tools is gaining importance. In this work, we explore the social bias problem in pre-trained code generation models. We propose a new paradigm to construct code prompts and successfully uncover social biases in code generation models. To quantify the severity of social biases in generated code, we develop a dataset along with three metrics to evaluate the overall social bias and fine-grained unfairness across different demographics. Experimental results on three pre-trained code generation models (Codex, InCoder, and CodeGen) with varying sizes, reveal severe social biases. Moreover, we conduct analysis to provide useful insights for further choice of code generation models with low social bias. (This work contains examples that potentially implicate stereotypes, associations, and other harms that could be offensive to individuals in certain social groups.)
Submitted 24 May, 2023;
originally announced May 2023.
-
TACR: A Table-alignment-based Cell-selection and Reasoning Model for Hybrid Question-Answering
Authors:
Jian Wu,
Yicheng Xu,
Yan Gao,
Jian-Guang Lou,
Börje F. Karlsson,
Manabu Okumura
Abstract:
Hybrid Question-Answering (HQA), which targets reasoning over tables and passages linked from table cells, has witnessed significant research in recent years. A common challenge in HQA and other passage-table QA datasets is that it is generally unrealistic to iterate over all table rows, columns, and linked passages to retrieve evidence. Such a challenge made it difficult for previous studies to show their reasoning ability in retrieving answers. To bridge this gap, we propose a novel Table-alignment-based Cell-selection and Reasoning model (TACR) for hybrid text and table QA, evaluated on the HybridQA and WikiTableQuestions datasets. In evidence retrieval, we design a table-question-alignment enhanced cell-selection method to retrieve fine-grained evidence. In answer reasoning, we incorporate a QA module that treats the row containing selected cells as context. Experimental results over the HybridQA and WikiTableQuestions (WTQ) datasets show that TACR achieves state-of-the-art results on cell selection and outperforms fine-grained evidence retrieval baselines on HybridQA, while achieving competitive performance on WTQ. We also conducted a detailed analysis to demonstrate that being able to align questions to tables in the cell-selection stage can result in important gains from experiments of over 90% table row and column selection accuracy, meanwhile also improving output explainability.
Submitted 23 May, 2023;
originally announced May 2023.
-
Question Answering as Programming for Solving Time-Sensitive Questions
Authors:
Xinyu Zhu,
Cheng Yang,
Bei Chen,
Siheng Li,
Jian-Guang Lou,
Yujiu Yang
Abstract:
Question answering plays a pivotal role in human daily life because it involves our acquisition of knowledge about the world. However, due to the dynamic and ever-changing nature of real-world facts, the answer can be completely different when the time constraint in the question changes. Recently, Large Language Models (LLMs) have shown remarkable intelligence in question answering, while our experiments reveal that the aforementioned problems still pose a significant challenge to existing LLMs. This can be attributed to the LLMs' inability to perform rigorous reasoning based on surface-level text semantics. To overcome this limitation, rather than requiring LLMs to directly answer the question, we propose a novel approach where we reframe the Question Answering task as Programming (QAaP). Concretely, by leveraging modern LLMs' superior capability in understanding both natural language and programming language, we endeavor to harness LLMs to represent diversely expressed text as well-structured code and select the best matching answer from multiple candidates through programming. We evaluate our QAaP framework on several time-sensitive question answering datasets and achieve improvements of up to 14.5% over strong baselines. Our code and data are available at https://github.com/TianHongZXY/qaap
Submitted 20 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
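The idea in the abstract above, representing text as structured code and then selecting among candidate answers programmatically, might look roughly like the following. This is a minimal sketch and not the authors' implementation: the dictionary schema, the `best_answer` helper, and the example facts are all hypothetical, and the structured records that an LLM would generate in QAaP are written by hand here.

```python
# Hypothetical structured form of the question
# "Who was Alice's employer between 2010 and 2012?"
query = {"subject": "Alice", "relation": "employer", "start": 2010, "end": 2012}

# Hypothetical candidate facts, as they might be extracted from text.
candidates = [
    {"subject": "Alice", "relation": "employer", "answer": "Acme",
     "start": 2005, "end": 2009},
    {"subject": "Alice", "relation": "employer", "answer": "Globex",
     "start": 2009, "end": 2015},
]


def best_answer(query, candidates):
    """Return the answer whose time span overlaps the query span the most."""

    def overlap_years(fact):
        # Negative when the two intervals do not intersect.
        return min(query["end"], fact["end"]) - max(query["start"], fact["start"])

    matching = [
        fact for fact in candidates
        if fact["subject"] == query["subject"]
        and fact["relation"] == query["relation"]
        and overlap_years(fact) >= 0
    ]
    if not matching:
        return None
    return max(matching, key=overlap_years)["answer"]


answer = best_answer(query, candidates)
```

Because the time constraint is checked by ordinary comparison rather than by reading surface text, changing the years in `query` changes which candidate is selected.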
-
Skill-Based Few-Shot Selection for In-Context Learning
Authors:
Shengnan An,
Bo Zhou,
Zeqi Lin,
Qiang Fu,
Bei Chen,
Nanning Zheng,
Weizhu Chen,
Jian-Guang Lou
Abstract:
In-context learning is a paradigm that adapts large language models to downstream tasks by providing a few examples. Few-shot selection -- selecting appropriate examples for each test instance separately -- is important for in-context learning. In this paper, we propose Skill-KNN, a skill-based few-shot selection method for in-context learning. The key advantages of Skill-KNN include: (1) it addresses the problem that existing methods based on pre-trained embeddings can be easily biased by surface natural language features that are not important for the target task; (2) it does not require training or fine-tuning of any models, making it suitable for frequently expanding or changing example banks. The key insight is to optimize the inputs fed into the embedding model, rather than tuning the model itself. Technically, Skill-KNN generates skill-based descriptions for each test case and candidate example using few-shot prompting as a pre-processing step, thus eliminating unimportant surface features. Experimental results across five cross-domain semantic parsing datasets and six backbone models show that Skill-KNN significantly outperforms existing methods.
Submitted 10 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
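The selection step described in the abstract above, nearest-neighbour search over skill descriptions instead of raw inputs, could be sketched as follows. This is an illustration under stated assumptions, not the Skill-KNN implementation: a toy bag-of-words vector stands in for a real embedding model, the skill descriptions are hand-written rather than generated by few-shot prompting, and `embed`, `cosine`, `select_few_shot`, and the example bank are hypothetical names and data.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def select_few_shot(test_skill, bank, k=2):
    """Return the k bank examples whose skill descriptions are closest to test_skill.

    bank is a list of (example_text, skill_description) pairs.
    """
    query = embed(test_skill)
    ranked = sorted(bank, key=lambda item: cosine(query, embed(item[1])),
                    reverse=True)
    return [example for example, _ in ranked[:k]]


# Hypothetical example bank: (question, hand-written skill description).
bank = [
    ("How many singers are older than 30?", "count rows with a numeric filter"),
    ("List all stadium names.", "select a single column"),
    ("What is the average age of students?",
     "aggregate a numeric column with average"),
]

selected = select_few_shot("count rows with a filter on a date", bank, k=1)
```

Similarity is computed on the skill descriptions only, so the surface wording of the questions (singers, stadiums, students) does not affect which examples are chosen.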