Search | arXiv e-print repository

World Models Increase Autonomy in Reinforcement Learning

Authors: Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat, Edward S. Hu

Abstract: Reinforcement learning (RL) is an appealing paradigm for training intelligent agents, enabling policy acquisition from the agent's own autonomously acquired experience. However, the training process of RL is far from automatic, requiring extensive human effort to reset the agent and environments. To tackle the challenging reset-free setting, we first demonstrate the superiority of model-based (MB)… ▽ More Reinforcement learning (RL) is an appealing paradigm for training intelligent agents, enabling policy acquisition from the agent's own autonomously acquired experience. However, the training process of RL is far from automatic, requiring extensive human effort to reset the agent and environments. To tackle the challenging reset-free setting, we first demonstrate the superiority of model-based (MB) RL methods in such setting, showing that a straightforward adaptation of MBRL can outperform all the prior state-of-the-art methods while requiring less supervision. We then identify limitations inherent to this direct extension and propose a solution called model-based reset-free (MoReFree) agent, which further enhances the performance. MoReFree adapts two key mechanisms, exploration and policy learning, to handle reset-free tasks by prioritizing task-relevant states. It exhibits superior data-efficiency across various reset-free tasks without access to environmental reward or demonstrations while significantly outperforming privileged baselines that require supervision. Our findings suggest model-based methods hold significant promise for reducing human effort in RL. Website: https://sites.google.com/view/morefree △ Less

Submitted 20 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

arXiv:2407.05645 [pdf, other]

OneDiff: A Generalist Model for Image Difference Captioning

Authors: Erdong Hu, Longteng Guo, Tongtian Yue, Zijia Zhao, Shuning Xue, Jing Liu

Abstract: In computer vision, Image Difference Captioning (IDC) is crucial for accurately describing variations between closely related images. Traditional IDC methods often rely on specialist models, which restrict their applicability across varied contexts. This paper introduces the OneDiff model, a novel generalist approach that utilizes a robust vision-language model architecture, integrating a siamese… ▽ More In computer vision, Image Difference Captioning (IDC) is crucial for accurately describing variations between closely related images. Traditional IDC methods often rely on specialist models, which restrict their applicability across varied contexts. This paper introduces the OneDiff model, a novel generalist approach that utilizes a robust vision-language model architecture, integrating a siamese image encoder with a Visual Delta Module. This innovative configuration allows for the precise detection and articulation of fine-grained differences between image pairs. OneDiff is trained through a dual-phase strategy, encompassing Coupled Sample Training and multi-task learning across a diverse array of data types, supported by our newly developed DiffCap Dataset. This dataset merges real-world and synthetic data, enhancing the training process and bolstering the model's robustness. Extensive testing on diverse IDC benchmarks, such as Spot-the-Diff, CLEVR-Change, and Birds-to-Words, shows that OneDiff consistently outperforms existing state-of-the-art models in accuracy and adaptability, achieving improvements of up to 85\% CIDEr points in average. By setting a new benchmark in IDC, OneDiff paves the way for more versatile and effective applications in detecting and describing visual differences. The code, models, and data will be made publicly available. △ Less

Submitted 16 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

arXiv:2405.14853 [pdf, other]

Privileged Sensing Scaffolds Reinforcement Learning

Authors: Edward S. Hu, James Springer, Oleh Rybkin, Dinesh Jayaraman

Abstract: We need to look at our shoelaces as we first learn to tie them but having mastered this skill, can do it from touch alone. We call this phenomenon "sensory scaffolding": observation streams that are not needed by a master might yet aid a novice learner. We consider such sensory scaffolding setups for training artificial agents. For example, a robot arm may need to be deployed with just a low-cost,… ▽ More We need to look at our shoelaces as we first learn to tie them but having mastered this skill, can do it from touch alone. We call this phenomenon "sensory scaffolding": observation streams that are not needed by a master might yet aid a novice learner. We consider such sensory scaffolding setups for training artificial agents. For example, a robot arm may need to be deployed with just a low-cost, robust, general-purpose camera; yet its performance may improve by having privileged training-time-only access to informative albeit expensive and unwieldy motion capture rigs or fragile tactile sensors. For these settings, we propose "Scaffolder", a reinforcement learning approach which effectively exploits privileged sensing in critics, world models, reward estimators, and other such auxiliary components that are only used at training time, to improve the target policy. For evaluating sensory scaffolding agents, we design a new "S3" suite of ten diverse simulated robotic tasks that explore a wide range of practical sensor setups. Agents must use privileged camera sensing to train blind hurdlers, privileged active visual perception to help robot arms overcome visual occlusions, privileged touch sensors to train robot hands, and more. Scaffolder easily outperforms relevant prior baselines and frequently performs comparably even to policies that have test-time access to the privileged sensors. Website: https://penn-pal-lab.github.io/scaffolder/ △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: ICLR 2024 Spotlight version

arXiv:2404.05802 [pdf, other]

BatSort: Enhanced Battery Classification with Transfer Learning for Battery Sorting and Recycling

Authors: Yunyi Zhao, Wei Zhang, Erhai Hu, Qingyu Yan, Cheng Xiang, King Jet Tseng, Dusit Niyato

Abstract: Battery recycling is a critical process for minimizing environmental harm and resource waste for used batteries. However, it is challenging, largely because sorting batteries is costly and hardly automated to group batteries based on battery types. In this paper, we introduce a machine learning-based approach for battery-type classification and address the daunting problem of data scarcity for the… ▽ More Battery recycling is a critical process for minimizing environmental harm and resource waste for used batteries. However, it is challenging, largely because sorting batteries is costly and hardly automated to group batteries based on battery types. In this paper, we introduce a machine learning-based approach for battery-type classification and address the daunting problem of data scarcity for the application. We propose BatSort which applies transfer learning to utilize the existing knowledge optimized with large-scale datasets and customizes ResNet to be specialized for classifying battery types. We collected our in-house battery-type dataset of small-scale to guide the knowledge transfer as a case study and evaluate the system performance. We conducted an experimental study and the results show that BatSort can achieve outstanding accuracy of 92.1% on average and up to 96.2% and the performance is stable for battery-type classification. Our solution helps realize fast and automated battery sorting with minimized cost and can be transferred to related industry applications with insufficient data. △ Less

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2402.04513 [pdf, other]

Online Cascade Learning for Efficient Inference over Streams

Authors: Lunyiu Nie, Zhimin Ding, Erdong Hu, Christopher Jermaine, Swarat Chaudhuri

Abstract: Large Language Models (LLMs) have a natural role in answering complex queries about data streams, but the high computational cost of LLM inference makes them infeasible in many such tasks. We propose online cascade learning, the first approach to address this challenge. The objective here is to learn a "cascade" of models, starting with lower-capacity models (such as logistic regression) and endin… ▽ More Large Language Models (LLMs) have a natural role in answering complex queries about data streams, but the high computational cost of LLM inference makes them infeasible in many such tasks. We propose online cascade learning, the first approach to address this challenge. The objective here is to learn a "cascade" of models, starting with lower-capacity models (such as logistic regression) and ending with a powerful LLM, along with a deferral policy that determines the model to be used on a given input. We formulate the task of learning cascades online as an imitation-learning problem, where smaller models are updated over time imitating the collected LLM demonstrations, and give a no-regret algorithm for the problem. Experimental results across four benchmarks show that our method parallels LLMs in accuracy while cutting down inference costs by as much as 90% with strong robustness against input distribution shifts, underscoring its efficacy and adaptability in stream processing. △ Less

Submitted 17 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: ICML 2024 Main Conference Paper

arXiv:2402.03988 [pdf, other]

REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR

Authors: Liang-Hsuan Tseng, En-Pei Hu, Cheng-Han Chiang, Yuan Tseng, Hung-yi Lee, Lin-shan Lee, Shao-Hua Sun

Abstract: Unsupervised automatic speech recognition (ASR) aims to learn the mapping between the speech signal and its corresponding textual transcription without the supervision of paired speech-text data. A word/phoneme in the speech signal is represented by a segment of speech signal with variable length and unknown boundary, and this segmental structure makes learning the mapping between speech and text… ▽ More Unsupervised automatic speech recognition (ASR) aims to learn the mapping between the speech signal and its corresponding textual transcription without the supervision of paired speech-text data. A word/phoneme in the speech signal is represented by a segment of speech signal with variable length and unknown boundary, and this segmental structure makes learning the mapping between speech and text challenging, especially without paired data. In this paper, we propose REBORN,Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR. REBORN alternates between (1) training a segmentation model that predicts the boundaries of the segmental structures in speech signals and (2) training the phoneme prediction model, whose input is the speech feature segmented by the segmentation model, to predict a phoneme transcription. Since supervised data for training the segmentation model is not available, we use reinforcement learning to train the segmentation model to favor segmentations that yield phoneme sequence predictions with a lower perplexity. We conduct extensive experiments and find that under the same setting, REBORN outperforms all prior unsupervised ASR models on LibriSpeech, TIMIT, and five non-English languages in Multilingual LibriSpeech. We comprehensively analyze why the boundaries learned by REBORN improve the unsupervised ASR performance. △ Less

Submitted 28 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2312.12655 [pdf, other]

Can Transformers Learn Sequential Function Classes In Context?

Authors: Ryan Campbell, Emma Guo, Evan Hu, Reya Vir, Ethan Hsiao

Abstract: In-context learning (ICL) has revolutionized the capabilities of transformer models in NLP. In our project, we extend the understanding of the mechanisms underpinning ICL by exploring whether transformers can learn from sequential, non-textual function class data distributions. We introduce a novel sliding window sequential function class and employ toy-sized transformers with a GPT-2 architecture… ▽ More In-context learning (ICL) has revolutionized the capabilities of transformer models in NLP. In our project, we extend the understanding of the mechanisms underpinning ICL by exploring whether transformers can learn from sequential, non-textual function class data distributions. We introduce a novel sliding window sequential function class and employ toy-sized transformers with a GPT-2 architecture to conduct our experiments. Our analysis indicates that these models can indeed leverage ICL when trained on non-textual sequential function classes. Additionally, our experiments with randomized y-label sequences highlights that transformers retain some ICL capabilities even when the label associations are obfuscated. We provide evidence that transformers can reason with and understand sequentiality encoded within function classes, as reflected by the effective learning of our proposed tasks. Our results also show that the performance deteriorated with increasing randomness in the labels, though not to the extent one might expect, implying a potential robustness of learned sequentiality against label noise. Future research may want to look into how previous explanations of transformers, such as induction heads and task vectors, relate to sequentiality in ICL in these toy examples. Our investigation lays the groundwork for further research into how transformers process and perceive sequential data. △ Less

Submitted 20 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: 8 pages, 8 figures

arXiv:2310.05513 [pdf, other]

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond

Authors: Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chuang, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe

Abstract: The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification. The challenge comprises a research track focused on applying ML-SUPERB to specific multilingual subjects, a Challenge Track for model submissions, and a New Language Track w… ▽ More The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification. The challenge comprises a research track focused on applying ML-SUPERB to specific multilingual subjects, a Challenge Track for model submissions, and a New Language Track where language resource researchers can contribute and evaluate their low-resource language data in the context of the latest progress in multilingual speech recognition. The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages. The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks, and a variety of speech/voice types present significant challenges in multilingual speech processing. △ Less

Submitted 9 October, 2023; originally announced October 2023.

Comments: Accepted by ASRU

arXiv:2310.04363 [pdf, other]

Amortizing intractable inference in large language models

Authors: Edward J. Hu, Moksh Jain, Eric Elmoznino, Younesse Kaddar, Guillaume Lajoie, Yoshua Bengio, Nikolay Malkin

Abstract: Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions. This limits tractable querying of this knowledge to start-to-end autoregressive sampling. However, many tasks of interest -- including sequence continuation, infilling, and other forms of constrained generation -- involve sampling from intractable posterior distribu… ▽ More Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions. This limits tractable querying of this knowledge to start-to-end autoregressive sampling. However, many tasks of interest -- including sequence continuation, infilling, and other forms of constrained generation -- involve sampling from intractable posterior distributions. We address this limitation by using amortized Bayesian inference to sample from these intractable posteriors. Such amortization is algorithmically achieved by fine-tuning LLMs via diversity-seeking reinforcement learning algorithms: generative flow networks (GFlowNets). We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training and reward-maximizing policy optimization. As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem and demonstrate that our approach enables data-efficient adaptation of LLMs to tasks that require multi-step rationalization and tool use. △ Less

Submitted 13 March, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: ICLR 2024; 23 pages; code: https://github.com/GFNOrg/gfn-lm-tuning

arXiv:2309.03237 [pdf, other]

Federated Learning Over Images: Vertical Decompositions and Pre-Trained Backbones Are Difficult to Beat

Authors: Erdong Hu, Yuxin Tang, Anastasios Kyrillidis, Chris Jermaine

Abstract: We carefully evaluate a number of algorithms for learning in a federated environment, and test their utility for a variety of image classification tasks. We consider many issues that have not been adequately considered before: whether learning over data sets that do not have diverse sets of images affects the results; whether to use a pre-trained feature extraction "backbone"; how to evaluate lear… ▽ More We carefully evaluate a number of algorithms for learning in a federated environment, and test their utility for a variety of image classification tasks. We consider many issues that have not been adequately considered before: whether learning over data sets that do not have diverse sets of images affects the results; whether to use a pre-trained feature extraction "backbone"; how to evaluate learner performance (we argue that classification accuracy is not enough), among others. Overall, across a wide variety of settings, we find that vertically decomposing a neural network seems to give the best results, and outperforms more standard reconciliation-used methods. △ Less

Submitted 5 September, 2023; originally announced September 2023.

Comments: 16 pages, 7 figures, Accepted at ICCV2023

arXiv:2306.00751 [pdf, other]

Differentiable Tree Operations Promote Compositional Generalization

Authors: Paul Soulos, Edward Hu, Kate McCurdy, Yunmo Chen, Roland Fernandez, Paul Smolensky, Jianfeng Gao

Abstract: In the context of structure-to-structure transformation tasks, learning sequences of discrete symbolic operations poses significant challenges due to their non-differentiability. To facilitate the learning of these symbolic sequences, we introduce a differentiable tree interpreter that compiles high-level symbolic tree operations into subsymbolic matrix operations on tensors. We present a novel Di… ▽ More In the context of structure-to-structure transformation tasks, learning sequences of discrete symbolic operations poses significant challenges due to their non-differentiability. To facilitate the learning of these symbolic sequences, we introduce a differentiable tree interpreter that compiles high-level symbolic tree operations into subsymbolic matrix operations on tensors. We present a novel Differentiable Tree Machine (DTM) architecture that integrates our interpreter with an external memory and an agent that learns to sequentially select tree operations to execute the target transformation in an end-to-end manner. With respect to out-of-distribution compositional generalization on synthetic semantic parsing and language generation tasks, DTM achieves 100% while existing baselines such as Transformer, Tree Transformer, LSTM, and Tree2Tree LSTM achieve less than 30%. DTM remains highly interpretable in addition to its perfect performance. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: ICML 2023. Code available at https://github.com/psoulos/dtm

arXiv:2305.10615 [pdf, other]

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

Authors: Jiatong Shi, Dan Berrebbi, William Chen, Ho-Lam Chung, En-Pei Hu, Wei Ping Huang, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe

Abstract: Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to benchmark the performance of Self-Supervised Learning (SSL) models on various speech processing tasks. However, SUPERB largely considers English speech in its evaluation. This paper presents multilingual SUPERB (ML-SUPERB), covering 143 languages (ranging from high-resource to endangered), and considering both automatic… ▽ More Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to benchmark the performance of Self-Supervised Learning (SSL) models on various speech processing tasks. However, SUPERB largely considers English speech in its evaluation. This paper presents multilingual SUPERB (ML-SUPERB), covering 143 languages (ranging from high-resource to endangered), and considering both automatic speech recognition and language identification. Following the concept of SUPERB, ML-SUPERB utilizes frozen SSL features and employs a simple framework for multilingual tasks by learning a shallow downstream model. Similar to the SUPERB benchmark, we find speech SSL models can significantly improve performance compared to FBANK features. Furthermore, we find that multilingual models do not always perform better than their monolingual counterparts. We will release ML-SUPERB as a challenge with organized datasets and reproducible training scripts for future multilingual representation research. △ Less

Submitted 11 August, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: Accepted by Interspeech

arXiv:2303.13002 [pdf, other]

Planning Goals for Exploration

Authors: Edward S. Hu, Richard Chang, Oleh Rybkin, Dinesh Jayaraman

Abstract: Dropped into an unknown environment, what should an agent do to quickly learn about the environment and how to accomplish diverse tasks within it? We address this question within the goal-conditioned reinforcement learning paradigm, by identifying how the agent should set its goals at training time to maximize exploration. We propose "Planning Exploratory Goals" (PEG), a method that sets goals for… ▽ More Dropped into an unknown environment, what should an agent do to quickly learn about the environment and how to accomplish diverse tasks within it? We address this question within the goal-conditioned reinforcement learning paradigm, by identifying how the agent should set its goals at training time to maximize exploration. We propose "Planning Exploratory Goals" (PEG), a method that sets goals for each training episode to directly optimize an intrinsic exploration reward. PEG first chooses goal commands such that the agent's goal-conditioned policy, at its current level of training, will end up in states with high exploration potential. It then launches an exploration policy starting at those promising states. To enable this direct optimization, PEG learns world models and adapts sampling-based planning algorithms to "plan goal commands". In challenging simulated robotics environments including a multi-legged ant robot in a maze, and a robot arm on a cluttered tabletop, PEG exploration enables more efficient and effective training of goal-conditioned policies relative to baselines and ablations. Our ant successfully navigates a long maze, and the robot arm successfully builds a stack of three blocks upon command. Website: https://penn-pal-lab.github.io/peg/ △ Less

Submitted 22 March, 2023; originally announced March 2023.

Comments: Camera Ready version for ICLR2023 Spotlight

arXiv:2303.04242 [pdf, other]

MEV in fixed gas price blockchains: Terra Classic as a case of study

Authors: Facundo Carrillo, Elaine Hu

Abstract: Maximum extractable value (MEV) has been extensively studied. In most papers, the researchers have worked with the Ethereum blockchain almost exclusively. Even though, Ethereum and other blockchains have dynamic gas prices this is not the case for all blockchains; many of them have fixed gas prices. Extending the research to other blockchains with fixed gas price could broaden the scope of the exi… ▽ More Maximum extractable value (MEV) has been extensively studied. In most papers, the researchers have worked with the Ethereum blockchain almost exclusively. Even though, Ethereum and other blockchains have dynamic gas prices this is not the case for all blockchains; many of them have fixed gas prices. Extending the research to other blockchains with fixed gas price could broaden the scope of the existing studies on MEV. To our knowledge, there is not a vast understanding of MEV in fixed gas price blockchains. Therefore, we propose to study Terra Classic as an example to understand how MEV activities affect blockchains with fixed gas price. We first analysed the data from Terra Classic before the UST de-peg event in May 2022 and described the nature of the exploited arbitrage opportunities. We found more than 188K successful arbitrages, and most of them used UST as the initial token. The capital to perform the arbitrage was less than 1K UST in 50% of the cases, and 80% of the arbitrages had less than four swaps. Then, we explored the characteristics that attribute to higher MEV. We found that searchers who use more complex mechanisms, i.e. different contracts and accounts, made higher profits. Finally, we concluded that the most profitable searchers used a strategy of running bots in a multi-instance environment, i.e. running bots with different virtual machines. We measured the importance of the geographic distribution of the virtual machines that run the bots. We found that having good geographic coverage makes the difference between winning or losing the arbitrage opportunities. That is because, unlike MEV extraction in Ethereum, bots in fixed gas price blockchains are not battling a gas war; they are fighting in a latency war. △ Less

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2302.06576 [pdf, other]

GFlowNet-EM for learning compositional latent variable models

Authors: Edward J. Hu, Nikolay Malkin, Moksh Jain, Katie Everett, Alexandros Graikos, Yoshua Bengio

Abstract: Latent variable models (LVMs) with discrete compositional latents are an important but challenging setting due to a combinatorially large number of possible configurations of the latents. A key tradeoff in modeling the posteriors over latents is between expressivity and tractable optimization. For algorithms based on expectation-maximization (EM), the E-step is often intractable without restrictiv… ▽ More Latent variable models (LVMs) with discrete compositional latents are an important but challenging setting due to a combinatorially large number of possible configurations of the latents. A key tradeoff in modeling the posteriors over latents is between expressivity and tractable optimization. For algorithms based on expectation-maximization (EM), the E-step is often intractable without restrictive approximations to the posterior. We propose the use of GFlowNets, algorithms for sampling from an unnormalized density by learning a stochastic policy for sequential construction of samples, for this intractable E-step. By training GFlowNets to sample from the posterior over latents, we take advantage of their strengths as amortized variational inference algorithms for complex distributions over discrete structures. Our approach, GFlowNet-EM, enables the training of expressive LVMs with discrete compositional latents, as shown by experiments on non-context-free grammar induction and on images using discrete variational autoencoders (VAEs) without conditional independence enforced in the encoder. △ Less

Submitted 3 June, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: ICML 2023; code: https://github.com/GFNOrg/GFlowNet-EM

arXiv:2301.12950 [pdf, other]

Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs

Authors: Guan-Ting Liu, En-Pei Hu, Pu-Jen Cheng, Hung-yi Lee, Shao-Hua Sun

Abstract: Aiming to produce reinforcement learning (RL) policies that are human-interpretable and can generalize better to novel scenarios, Trivedi et al. (2021) present a method (LEAPS) that first learns a program embedding space to continuously parameterize diverse programs from a pre-generated program dataset, and then searches for a task-solving program in the learned program embedding space when given… ▽ More Aiming to produce reinforcement learning (RL) policies that are human-interpretable and can generalize better to novel scenarios, Trivedi et al. (2021) present a method (LEAPS) that first learns a program embedding space to continuously parameterize diverse programs from a pre-generated program dataset, and then searches for a task-solving program in the learned program embedding space when given a task. Despite the encouraging results, the program policies that LEAPS can produce are limited by the distribution of the program dataset. Furthermore, during searching, LEAPS evaluates each candidate program solely based on its return, failing to precisely reward correct parts of programs and penalize incorrect parts. To address these issues, we propose to learn a meta-policy that composes a series of programs sampled from the learned program embedding space. By learning to compose programs, our proposed hierarchical programmatic reinforcement learning (HPRL) framework can produce program policies that describe out-of-distributionally complex behaviors and directly assign credits to programs that induce desired behaviors. The experimental results in the Karel domain show that our proposed framework outperforms baselines. The ablation studies confirm the limitations of LEAPS and justify our design choices. △ Less

Submitted 31 May, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

Comments: ICML 2023

arXiv:2301.10859 [pdf, other]

Salesforce CausalAI Library: A Fast and Scalable Framework for Causal Analysis of Time Series and Tabular Data

Authors: Devansh Arpit, Matthew Fernandez, Itai Feigenbaum, Weiran Yao, Chenghao Liu, Wenzhuo Yang, Paul Josel, Shelby Heinecke, Eric Hu, Huan Wang, Stephen Hoi, Caiming Xiong, Kun Zhang, Juan Carlos Niebles

Abstract: We introduce the Salesforce CausalAI Library, an open-source library for causal analysis using observational data. It supports causal discovery and causal inference for tabular and time series data, of discrete, continuous and heterogeneous types. This library includes algorithms that handle linear and non-linear causal relationships between variables, and uses multi-processing for speed-up. We al… ▽ More We introduce the Salesforce CausalAI Library, an open-source library for causal analysis using observational data. It supports causal discovery and causal inference for tabular and time series data, of discrete, continuous and heterogeneous types. This library includes algorithms that handle linear and non-linear causal relationships between variables, and uses multi-processing for speed-up. We also include a data generator capable of generating synthetic data with specified structural equation model for the aforementioned data formats and types, that helps users control the ground-truth causal process while investigating various algorithms. Finally, we provide a user interface (UI) that allows users to perform causal analysis on data without coding. The goal of this library is to provide a fast and flexible solution for a variety of problems in the domain of causality. This technical report describes the Salesforce CausalAI API along with its capabilities, the implementations of the supported algorithms, and experiments demonstrating their performance and speed. Our library is available at \url{https://github.com/salesforce/causalai}. △ Less

Submitted 22 September, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

arXiv:2212.08961 [pdf, other]

Training Robots to Evaluate Robots: Example-Based Interactive Reward Functions for Policy Learning

Authors: Kun Huang, Edward S. Hu, Dinesh Jayaraman

Abstract: Physical interactions can often help reveal information that is not readily apparent. For example, we may tug at a table leg to evaluate whether it is built well, or turn a water bottle upside down to check that it is watertight. We propose to train robots to acquire such interactive behaviors automatically, for the purpose of evaluating the result of an attempted robotic skill execution. These ev… ▽ More Physical interactions can often help reveal information that is not readily apparent. For example, we may tug at a table leg to evaluate whether it is built well, or turn a water bottle upside down to check that it is watertight. We propose to train robots to acquire such interactive behaviors automatically, for the purpose of evaluating the result of an attempted robotic skill execution. These evaluations in turn serve as "interactive reward functions" (IRFs) for training reinforcement learning policies to perform the target skill, such as screwing the table leg tightly. In addition, even after task policies are fully trained, IRFs can serve as verification mechanisms that improve online task execution. For any given task, our IRFs can be conveniently trained using only examples of successful outcomes, and no further specification is needed to train the task policy thereafter. In our evaluations on door locking and weighted block stacking in simulation, and screw tightening on a real robot, IRFs enable large performance improvements, even outperforming baselines with access to demonstrations or carefully engineered rewards. Project website: https://sites.google.com/view/lirf-corl-2022/ △ Less

Submitted 17 December, 2022; originally announced December 2022.

Comments: CoRL 2022

arXiv:2210.00580 [pdf, other]

GFlowNets and variational inference

Authors: Nikolay Malkin, Salem Lahlou, Tristan Deleu, Xu Ji, Edward Hu, Katie Everett, Dinghuai Zhang, Yoshua Bengio

Abstract: This paper builds bridges between two families of probabilistic algorithms: (hierarchical) variational inference (VI), which is typically used to model distributions over continuous spaces, and generative flow networks (GFlowNets), which have been used for distributions over discrete structures such as graphs. We demonstrate that, in certain cases, VI algorithms are equivalent to special cases of… ▽ More This paper builds bridges between two families of probabilistic algorithms: (hierarchical) variational inference (VI), which is typically used to model distributions over continuous spaces, and generative flow networks (GFlowNets), which have been used for distributions over discrete structures such as graphs. We demonstrate that, in certain cases, VI algorithms are equivalent to special cases of GFlowNets in the sense of equality of expected gradients of their learning objectives. We then point out the differences between the two families and show how these differences emerge experimentally. Notably, GFlowNets, which borrow ideas from reinforcement learning, are more amenable than VI to off-policy training without the cost of high gradient variance induced by importance sampling. We argue that this property of GFlowNets can provide advantages for capturing diversity in multimodal target distributions. △ Less

Submitted 2 March, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

Comments: ICLR 2023 final version; code: https://github.com/GFNOrg/GFN_vs_HVI

arXiv:2208.13900 [pdf, other]

doi 10.1145/3543174.3546835

Enjoy the Ride Consciously with CAWA: Context-Aware Advisory Warnings for Automated Driving

Authors: Erfan Pakdamanian, Erzhen Hu, Shili Sheng, Sarit Kraus, Seongkook Heo, Lu Feng

Abstract: In conditionally automated driving, drivers decoupled from driving while immersed in non-driving-related tasks (NDRTs) could potentially either miss the system-initiated takeover request (TOR) or a sudden TOR may startle them. To better prepare drivers for a safer takeover in an emergency, we propose novel context-aware advisory warnings (CAWA) for automated driving to gently inform drivers. This… ▽ More In conditionally automated driving, drivers decoupled from driving while immersed in non-driving-related tasks (NDRTs) could potentially either miss the system-initiated takeover request (TOR) or a sudden TOR may startle them. To better prepare drivers for a safer takeover in an emergency, we propose novel context-aware advisory warnings (CAWA) for automated driving to gently inform drivers. This will help them stay vigilant while engaging in NDRTs. The key innovation is that CAWA adapts warning modalities according to the context of NDRTs. We conducted a user study to investigate the effectiveness of CAWA. The study results show that CAWA has statistically significant effects on safer takeover behavior, improved driver situational awareness, less attention demand, and more positive user feedback, compared with uniformly distributed speech-based warnings across all NDRTs. △ Less

Submitted 29 August, 2022; originally announced August 2022.

Comments: Proceeding of the 14th International Conference on Automotive User Interfaces and Interactive Vehicular Applications (AutomotiveUI '22)

arXiv:2203.06942 [pdf, other]

Hyperlink-induced Pre-training for Passage Retrieval in Open-domain Question Answering

Authors: Jiawei Zhou, Xiaoguang Li, Lifeng Shang, Lan Luo, Ke Zhan, Enrui Hu, Xinyu Zhang, Hao Jiang, Zhao Cao, Fan Yu, Xin Jiang, Qun Liu, Lei Chen

Abstract: To alleviate the data scarcity problem in training question answering systems, recent works propose additional intermediate pre-training for dense passage retrieval (DPR). However, there still remains a large discrepancy between the provided upstream signals and the downstream question-passage relevance, which leads to less improvement. To bridge this gap, we propose the HyperLink-induced Pre-trai… ▽ More To alleviate the data scarcity problem in training question answering systems, recent works propose additional intermediate pre-training for dense passage retrieval (DPR). However, there still remains a large discrepancy between the provided upstream signals and the downstream question-passage relevance, which leads to less improvement. To bridge this gap, we propose the HyperLink-induced Pre-training (HLP), a method to pre-train the dense retriever with the text relevance induced by hyperlink-based topology within Web documents. We demonstrate that the hyperlink-based structures of dual-link and co-mention can provide effective relevance signals for large-scale pre-training that better facilitate downstream passage retrieval. We investigate the effectiveness of our approach across a wide range of open-domain QA datasets under zero-shot, few-shot, multi-hop, and out-of-domain scenarios. The experiments show our HLP outperforms the BM25 by up to 7 points as well as other pre-training methods by more than 10 points in terms of top-20 retrieval accuracy under the zero-shot scenario. Furthermore, HLP significantly outperforms other pre-training methods under the other scenarios. △ Less

Submitted 12 April, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: Accepted by ACL 2022 main conference; The dataset and code are available at https://github.com/jzhoubu/HLP

arXiv:2203.03466 [pdf, other]

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

Authors: Greg Yang, Edward J. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao

Abstract: Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters. We show that, in the recently discovered Maximal Update Parametrization (muP), many optimal HPs remain stable even as model size changes. This leads to a new HP tuning paradigm we call muTransfer: parametrize the target model in muP, tune the HP indirectly on… ▽ More Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters. We show that, in the recently discovered Maximal Update Parametrization (muP), many optimal HPs remain stable even as model size changes. This leads to a new HP tuning paradigm we call muTransfer: parametrize the target model in muP, tune the HP indirectly on a smaller model, and zero-shot transfer them to the full-sized model, i.e., without directly tuning the latter at all. We verify muTransfer on Transformer and ResNet. For example, 1) by transferring pretraining HPs from a model of 13M parameters, we outperform published numbers of BERT-large (350M parameters), with a total tuning cost equivalent to pretraining BERT-large once; 2) by transferring from 40M parameters, we outperform published numbers of the 6.7B GPT-3 model, with tuning cost only 7% of total pretraining cost. A Pytorch implementation of our technique can be found at github.com/microsoft/mup and installable via `pip install mup`. △ Less

Submitted 28 March, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

Comments: NeurIPS 2021

arXiv:2202.02484 [pdf, other]

doi 10.1145/3512895

A "Distance Matters" Paradox: Facilitating Intra-Team Collaboration Can Harm Inter-Team Collaboration

Authors: Xinlan Emily Hu, Rebecca Hinds, Melissa A. Valentine, Michael S. Bernstein

Abstract: By identifying the socio-technical conditions required for teams to work effectively remotely, the Distance Matters framework has been influential in CSCW since its introduction in 2000. Advances in collaboration technology and practices have since brought teams increasingly closer to achieving these conditions. This paper presents a ten-month ethnography in a remote organization, where we observe… ▽ More By identifying the socio-technical conditions required for teams to work effectively remotely, the Distance Matters framework has been influential in CSCW since its introduction in 2000. Advances in collaboration technology and practices have since brought teams increasingly closer to achieving these conditions. This paper presents a ten-month ethnography in a remote organization, where we observed that despite exhibiting excellent remote collaboration, teams paradoxically struggled to collaborate across team boundaries. We extend the Distance Matters framework to account for inter-team collaboration, arguing that challenges analogous to those in the original intra-team framework -- common ground, collaboration readiness, collaboration technology readiness, and coupling of work -- persist but are actualized differently at the inter-team scale. Finally, we identify a fundamental tension between the intra- and inter-team layers: the collaboration technology and practices that help individual teams thrive (e.g., adopting customized collaboration software) can also prompt collaboration challenges in the inter-team layer, and conversely the technology and practices that facilitate inter-team collaboration (e.g., strong centralized IT organizations) can harm practices at the intra-team layer. The addition of the inter-team layer to the Distance Matters framework opens new opportunities for CSCW, where balancing the tension between team and organizational collaboration needs will be a critical technological, operational, and organizational challenge for remote work in the coming decades. △ Less

Submitted 4 February, 2022; originally announced February 2022.

Comments: Accepted at CSCW 2022 (The 25th ACM Conference on Computer-Supported Cooperative Work and Social Computing)

Journal ref: Proc. ACM Hum.-Comput. Interact. 6, CSCW1, Article 48 (April 2022), 36 pages

arXiv:2111.09266 [pdf, other]

GFlowNet Foundations

Authors: Yoshua Bengio, Salem Lahlou, Tristan Deleu, Edward J. Hu, Mo Tiwari, Emmanuel Bengio

Abstract: Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function. In this paper, we show a number of additional theoretical properties of GFlowNets. They can be used to estimate joint probability distributions and the corr… ▽ More Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function. In this paper, we show a number of additional theoretical properties of GFlowNets. They can be used to estimate joint probability distributions and the corresponding marginal distributions where some variables are unspecified and, of particular interest, can represent distributions over composite objects like sets and graphs. GFlowNets amortize the work typically done by computationally expensive MCMC methods in a single but trained generative pass. They could also be used to estimate partition functions and free energies, conditional probabilities of supersets (supergraphs) given a subset (subgraph), as well as marginal distributions over all supersets (supergraphs) of a given set (graph). We introduce variations enabling the estimation of entropy and mutual information, sampling from a Pareto frontier, connections to reward-maximizing policies, and extensions to stochastic environments, continuous actions and modular energy functions. △ Less

Submitted 10 July, 2023; v1 submitted 17 November, 2021; originally announced November 2021.

arXiv:2110.07431 [pdf, other]

Towards More Effective and Economic Sparsely-Activated Model

Authors: Hao Jiang, Ke Zhan, Jianwei Qu, Yongkang Wu, Zhaoye Fei, Xinyu Zhang, Lei Chen, Zhicheng Dou, Xipeng Qiu, Zikai Guo, Ruofei Lai, Jiawen Wu, Enrui Hu, Yinxia Zhang, Yantao Jia, Fan Yu, Zhao Cao

Abstract: The sparsely-activated models have achieved great success in natural language processing through large-scale parameters and relatively low computational cost, and gradually become a feasible technique for training and implementing extremely large models. Due to the limit of communication cost, activating multiple experts is hardly affordable during training and inference. Therefore, previous work… ▽ More The sparsely-activated models have achieved great success in natural language processing through large-scale parameters and relatively low computational cost, and gradually become a feasible technique for training and implementing extremely large models. Due to the limit of communication cost, activating multiple experts is hardly affordable during training and inference. Therefore, previous work usually activate just one expert at a time to alleviate additional communication cost. Such routing mechanism limits the upper bound of model performance. In this paper, we first investigate a phenomenon that increasing the number of activated experts can boost the model performance with higher sparse ratio. To increase the number of activated experts without an increase in computational cost, we propose SAM (Switch and Mixture) routing, an efficient hierarchical routing mechanism that activates multiple experts in a same device (GPU). Our methods shed light on the training of extremely large sparse models and experiments prove that our models can achieve significant performance gain with great efficiency improvement. △ Less

Submitted 14 October, 2021; originally announced October 2021.

arXiv:2110.05053 [pdf]

Integrating Structural Description of Data Format Information into Programming to Auto-generate File Reading Programs

Authors: Xinghua Cheng, Erjie Hu, Di Hu

Abstract: File reading is the basis for data sharing and scientific computing. However, manual programming for file reading is labour-intensive and time-consuming, as data formats are heterogeneous and complex. To address such an issue, this study proposes a novel approach for the automatic generation of file reading programs based on structured and self-described data format information. This approach prov… ▽ More File reading is the basis for data sharing and scientific computing. However, manual programming for file reading is labour-intensive and time-consuming, as data formats are heterogeneous and complex. To address such an issue, this study proposes a novel approach for the automatic generation of file reading programs based on structured and self-described data format information. This approach provides two modes composed of sequentially and randomly reading. The file data format is described by Data Format Markup Language and thus DFML documents are generated. The formation of data type sequences by parsing those DFML documents. The generation of programs for sequential or random reading data with formed data type sequences and general programing rules for specific programming languages. A tool named DFML Editor was developed for generating and editing DFML documents. Case studies on binary files, i.e., ESRI point shapefiles and plain text files, i.e., input files of Storm Water Management Model, were conducted with the software developed for automatic program generation and file reading. Experimental results show that the proposed approach is effective for automatically generating programs for reading files. The idea in this study is also helpful for automatically writing files. △ Less

Submitted 11 October, 2021; originally announced October 2021.

Comments: 28 pages, 17 figures

ACM Class: K.6.3

arXiv:2107.09846 [pdf, other]

doi 10.24963/ijcai.2020/502

Guided Generation of Cause and Effect

Authors: Zhongyang Li, Xiao Ding, Ting Liu, J. Edward Hu, Benjamin Van Durme

Abstract: We present a conditional text generation framework that posits sentential expressions of possible causes and effects. This framework depends on two novel resources we develop in the course of this work: a very large-scale collection of English sentences expressing causal patterns CausalBank; and a refinement over previous work on constructing large lexical causal knowledge graphs Cause Effect Grap… ▽ More We present a conditional text generation framework that posits sentential expressions of possible causes and effects. This framework depends on two novel resources we develop in the course of this work: a very large-scale collection of English sentences expressing causal patterns CausalBank; and a refinement over previous work on constructing large lexical causal knowledge graphs Cause Effect Graph. Further, we extend prior work in lexically-constrained decoding to support disjunctive positive constraints. Human assessment confirms that our approach gives high-quality and diverse outputs. Finally, we use CausalBank to perform continued training of an encoder supporting a recent state-of-the-art model for causal reasoning, leading to a 3-point improvement on the COPA challenge set, with no change in model architecture. △ Less

Submitted 20 July, 2021; originally announced July 2021.

Comments: accepted in IJCAI 2020 main track

arXiv:2107.09047 [pdf, other]

Know Thyself: Transferable Visual Control Policies Through Robot-Awareness

Authors: Edward S. Hu, Kun Huang, Oleh Rybkin, Dinesh Jayaraman

Abstract: Training visual control policies from scratch on a new robot typically requires generating large amounts of robot-specific data. How might we leverage data previously collected on another robot to reduce or even completely remove this need for robot-specific data? We propose a "robot-aware control" paradigm that achieves this by exploiting readily available knowledge about the robot. We then insta… ▽ More Training visual control policies from scratch on a new robot typically requires generating large amounts of robot-specific data. How might we leverage data previously collected on another robot to reduce or even completely remove this need for robot-specific data? We propose a "robot-aware control" paradigm that achieves this by exploiting readily available knowledge about the robot. We then instantiate this in a robot-aware model-based RL policy by training modular dynamics models that couple a transferable, robot-aware world dynamics module with a robot-specific, potentially analytical, robot dynamics module. This also enables us to set up visual planning costs that separately consider the robot agent and the world. Our experiments on tabletop manipulation tasks with simulated and real robots demonstrate that these plug-in improvements dramatically boost the transferability of visual model-based RL policies, even permitting zero-shot transfer of visual manipulation skills onto new robots. Project website: https://www.seas.upenn.edu/~hued/rac △ Less

Submitted 17 October, 2022; v1 submitted 19 July, 2021; originally announced July 2021.

Comments: Updated to ICLR22 version

arXiv:2106.09685 [pdf, other]

LoRA: Low-Rank Adaptation of Large Language Models

Authors: Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen

Abstract: An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively… ▽ More An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA. △ Less

Submitted 16 October, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

Comments: Draft V2 includes better baselines, experiments on GLUE, and more on adapter latency

arXiv:2103.09190 [pdf, other]

Using Grammar Patterns to Interpret Test Method Name Evolution

Authors: Anthony Peruma, Emily Hu, Jiajun Chen, Eman Abdullah Alomar, Mohamed Wiem Mkaouer, Christian D. Newman

Abstract: It is good practice to name test methods such that they are comprehensible to developers; they must be written in such a way that their purpose and functionality are clear to those who will maintain them. Unfortunately, there is little automated support for writing or maintaining the names of test methods. This can lead to inconsistent and low-quality test names and increase the maintenance cost o… ▽ More It is good practice to name test methods such that they are comprehensible to developers; they must be written in such a way that their purpose and functionality are clear to those who will maintain them. Unfortunately, there is little automated support for writing or maintaining the names of test methods. This can lead to inconsistent and low-quality test names and increase the maintenance cost of supporting these methods. Due to this risk, it is essential to help developers in maintaining their test method names over time. In this paper, we use grammar patterns, and how they relate to test method behavior, to understand test naming practices. This data will be used to support an automated tool for maintaining test names. △ Less

Submitted 16 March, 2021; originally announced March 2021.

arXiv:2011.14522 [pdf, other]

Feature Learning in Infinite-Width Neural Networks

Authors: Greg Yang, Edward J. Hu

Abstract: As its width tends to infinity, a deep neural network's behavior under gradient descent can become simplified and predictable (e.g. given by the Neural Tangent Kernel (NTK)), if it is parametrized appropriately (e.g. the NTK parametrization). However, we show that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can learn features, which is crucial… ▽ More As its width tends to infinity, a deep neural network's behavior under gradient descent can become simplified and predictable (e.g. given by the Neural Tangent Kernel (NTK)), if it is parametrized appropriately (e.g. the NTK parametrization). However, we show that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can learn features, which is crucial for pretraining and transfer learning such as with BERT. We propose simple modifications to the standard parametrization to allow for feature learning in the limit. Using the *Tensor Programs* technique, we derive explicit formulas for such limits. On Word2Vec and few-shot learning on Omniglot via MAML, two canonical tasks that rely crucially on feature learning, we compute these limits exactly. We find that they outperform both NTK baselines and finite-width networks, with the latter approaching the infinite-width feature learning performance as width increases. More generally, we classify a natural space of neural network parametrizations that generalizes standard, NTK, and Mean Field parametrizations. We show 1) any parametrization in this space either admits feature learning or has an infinite-width training dynamics given by kernel gradient descent, but not both; 2) any such infinite-width limit can be computed using the Tensor Programs technique. Code for our experiments can be found at github.com/edwardjhu/TP4. △ Less

Submitted 15 July, 2022; v1 submitted 29 November, 2020; originally announced November 2020.

Comments: 4th paper in the Tensor Programs series. Appearing in ICML 2021

arXiv:2007.00320 [pdf, other]

Iterative Paraphrastic Augmentation with Discriminative Span Alignment

Authors: Ryan Culkin, J. Edward Hu, Elias Stengel-Eskin, Guanghui Qin, Benjamin Van Durme

Abstract: We introduce a novel paraphrastic augmentation strategy based on sentence-level lexically constrained paraphrasing and discriminative span alignment. Our approach allows for the large-scale expansion of existing resources, or the rapid creation of new resources from a small, manually-produced seed corpus. We illustrate our framework on the Berkeley FrameNet Project, a large-scale language understa… ▽ More We introduce a novel paraphrastic augmentation strategy based on sentence-level lexically constrained paraphrasing and discriminative span alignment. Our approach allows for the large-scale expansion of existing resources, or the rapid creation of new resources from a small, manually-produced seed corpus. We illustrate our framework on the Berkeley FrameNet Project, a large-scale language understanding effort spanning more than two decades of human labor. Based on roughly four days of collecting training data for the alignment model and approximately one day of parallel compute, we automatically generate 495,300 unique (Frame, Trigger) combinations annotated in context, a roughly 50x expansion atop FrameNet v1.7. △ Less

Submitted 1 July, 2020; originally announced July 2020.

arXiv:2004.12478 [pdf, other]

Improved Image Wasserstein Attacks and Defenses

Authors: Edward J. Hu, Adith Swaminathan, Hadi Salman, Greg Yang

Abstract: Robustness against image perturbations bounded by a $\ell_p$ ball have been well-studied in recent literature. Perturbations in the real-world, however, rarely exhibit the pixel independence that $\ell_p$ threat models assume. A recently proposed Wasserstein distance-bounded threat model is a promising alternative that limits the perturbation to pixel mass movements. We point out and rectify flaws… ▽ More Robustness against image perturbations bounded by a $\ell_p$ ball have been well-studied in recent literature. Perturbations in the real-world, however, rarely exhibit the pixel independence that $\ell_p$ threat models assume. A recently proposed Wasserstein distance-bounded threat model is a promising alternative that limits the perturbation to pixel mass movements. We point out and rectify flaws in previous definition of the Wasserstein threat model and explore stronger attacks and defenses under our better-defined framework. Lastly, we discuss the inability of current Wasserstein-robust models in defending against perturbations seen in the real world. Our code and trained models are available at https://github.com/edwardjhu/improved_wasserstein . △ Less

Submitted 9 May, 2023; v1 submitted 26 April, 2020; originally announced April 2020.

Comments: Best paper award at ICLR Trustworthy ML Workshop 2020

arXiv:2003.10045 [pdf, other]

Architectural Resilience to Foreground-and-Background Adversarial Noise

Authors: Carl Cheng, Evan Hu

Abstract: Adversarial attacks in the form of imperceptible perturbations of normal images have been extensively studied, and for every new defense methodology created, multiple adversarial attacks are found to counteract it. In particular, a popular style of attack, exemplified in recent years by DeepFool and Carlini-Wagner, relies solely on white-box scenarios in which full access to the predictive model a… ▽ More Adversarial attacks in the form of imperceptible perturbations of normal images have been extensively studied, and for every new defense methodology created, multiple adversarial attacks are found to counteract it. In particular, a popular style of attack, exemplified in recent years by DeepFool and Carlini-Wagner, relies solely on white-box scenarios in which full access to the predictive model and its weights are required. In this work, we instead propose distinct model-agnostic benchmark perturbations of images in order to investigate the resilience and robustness of different network architectures. Results empirically determine that increasing depth within most types of Convolutional Neural Networks typically improves model resilience towards general attacks, with improvement steadily decreasing as the model becomes deeper. Additionally, we find that a notable difference in adversarial robustness exists between residual architectures with skip connections and non-residual architectures of similar complexity. Our findings provide direction for future understanding of residual connections and depth on network robustness. △ Less

Submitted 7 June, 2020; v1 submitted 22 March, 2020; originally announced March 2020.

Comments: 9 pages, 8 figures; updated email addresses

arXiv:2002.08118 [pdf, other]

Randomized Smoothing of All Shapes and Sizes

Authors: Greg Yang, Tony Duan, J. Edward Hu, Hadi Salman, Ilya Razenshteyn, Jerry Li

Abstract: Randomized smoothing is the current state-of-the-art defense with provable robustness against $\ell_2$ adversarial attacks. Many works have devised new randomized smoothing schemes for other metrics, such as $\ell_1$ or $\ell_\infty$; however, substantial effort was needed to derive such new guarantees. This begs the question: can we find a general theory for randomized smoothing? We propose a n… ▽ More Randomized smoothing is the current state-of-the-art defense with provable robustness against $\ell_2$ adversarial attacks. Many works have devised new randomized smoothing schemes for other metrics, such as $\ell_1$ or $\ell_\infty$; however, substantial effort was needed to derive such new guarantees. This begs the question: can we find a general theory for randomized smoothing? We propose a novel framework for devising and analyzing randomized smoothing schemes, and validate its effectiveness in practice. Our theoretical contributions are: (1) we show that for an appropriate notion of "optimal", the optimal smoothing distributions for any "nice" norms have level sets given by the norm's *Wulff Crystal*; (2) we propose two novel and complementary methods for deriving provably robust radii for any smoothing distribution; and, (3) we show fundamental limits to current randomized smoothing techniques via the theory of *Banach space cotypes*. By combining (1) and (2), we significantly improve the state-of-the-art certified accuracy in $\ell_1$ on standard datasets. Meanwhile, we show using (3) that with only label statistics under random input perturbations, randomized smoothing cannot achieve nontrivial certified accuracy against perturbations of $\ell_p$-norm $Ω(\min(1, d^{\frac{1}{p} - \frac{1}{2}}))$, when the input dimension $d$ is large. We provide code in github.com/tonyduan/rs4a. △ Less

Submitted 23 July, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

Comments: 9 pages main text, 49 pages total

arXiv:1912.07670 [pdf, other]

To Follow or not to Follow: Selective Imitation Learning from Observations

Authors: Youngwoon Lee, Edward S. Hu, Zhengyu Yang, Joseph J. Lim

Abstract: Learning from demonstrations is a useful way to transfer a skill from one agent to another. While most imitation learning methods aim to mimic an expert skill by following the demonstration step-by-step, imitating every step in the demonstration often becomes infeasible when the learner and its environment are different from the demonstration. In this paper, we propose a method that can imitate a… ▽ More Learning from demonstrations is a useful way to transfer a skill from one agent to another. While most imitation learning methods aim to mimic an expert skill by following the demonstration step-by-step, imitating every step in the demonstration often becomes infeasible when the learner and its environment are different from the demonstration. In this paper, we propose a method that can imitate a demonstration composed solely of observations, which may not be reproducible with the current agent. Our method, dubbed selective imitation learning from observations (SILO), selects reachable states in the demonstration and learns how to reach the selected states. Our experiments on both simulated and real robot environments show that our method reliably performs a new task by following a demonstration. Videos and code are available at https://clvrai.com/silo . △ Less

Submitted 16 December, 2019; originally announced December 2019.

Comments: Published at the Conference on Robot Learning (CoRL) 2019

arXiv:1911.07246 [pdf, other]

IKEA Furniture Assembly Environment for Long-Horizon Complex Manipulation Tasks

Authors: Youngwoon Lee, Edward S. Hu, Zhengyu Yang, Alex Yin, Joseph J. Lim

Abstract: The IKEA Furniture Assembly Environment is one of the first benchmarks for testing and accelerating the automation of complex manipulation tasks. The environment is designed to advance reinforcement learning from simple toy tasks to complex tasks requiring both long-term planning and sophisticated low-level control. Our environment supports over 80 different furniture models, Sawyer and Baxter rob… ▽ More The IKEA Furniture Assembly Environment is one of the first benchmarks for testing and accelerating the automation of complex manipulation tasks. The environment is designed to advance reinforcement learning from simple toy tasks to complex tasks requiring both long-term planning and sophisticated low-level control. Our environment supports over 80 different furniture models, Sawyer and Baxter robot simulation, and domain randomization. The IKEA Furniture Assembly Environment is a testbed for methods aiming to solve complex manipulation tasks. The environment is publicly available at https://clvrai.com/furniture △ Less

Submitted 17 November, 2019; originally announced November 2019.

Comments: Simulator

arXiv:1905.04629 [pdf, other]

Efficient Low-Rank Semidefinite Programming with Robust Loss Functions

Authors: Quanming Yao, Hangsi Yang, En-Liang Hu, James Kwok

Abstract: In real-world applications, it is important for machine learning algorithms to be robust against data outliers or corruptions. In this paper, we focus on improving the robustness of a large class of learning algorithms that are formulated as low-rank semi-definite programming (SDP) problems. Traditional formulations use square loss, which is notorious for being sensitive to outliers. We propose to… ▽ More In real-world applications, it is important for machine learning algorithms to be robust against data outliers or corruptions. In this paper, we focus on improving the robustness of a large class of learning algorithms that are formulated as low-rank semi-definite programming (SDP) problems. Traditional formulations use square loss, which is notorious for being sensitive to outliers. We propose to replace this with more robust noise models, including the $\ell_1$-loss and other nonconvex losses. However, the resultant optimization problem becomes difficult as the objective is no longer convex or smooth. To alleviate this problem, we design an efficient algorithm based on majorization-minimization. The crux is on constructing a good optimization surrogate, and we show that this surrogate can be efficiently obtained by the alternating direction method of multipliers (ADMM). By properly monitoring ADMM's convergence, the proposed algorithm is empirically efficient and also theoretically guaranteed to converge to a critical point. Extensive experiments are performed on four machine learning applications using both synthetic and real-world data sets. Results show that the proposed algorithm is not only fast but also has better performance than the state-of-the-art. △ Less

Submitted 2 June, 2021; v1 submitted 11 May, 2019; originally announced May 2019.

Comments: Preprint version. Final version is accepted to "IEEE Transactions on Pattern Analysis and Machine Intelligence"

arXiv:1901.03644 [pdf, other]

ParaBank: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-constrained Neural Machine Translation

Authors: J. Edward Hu, Rachel Rudinger, Matt Post, Benjamin Van Durme

Abstract: We present ParaBank, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality. Following the approach of ParaNMT, we train a Czech-English neural machine translation (NMT) system to generate novel paraphrases of English reference sentences. By adding lexical constraints to the NMT decoding procedure, however, we are able to produce multiple high-quality sente… ▽ More We present ParaBank, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality. Following the approach of ParaNMT, we train a Czech-English neural machine translation (NMT) system to generate novel paraphrases of English reference sentences. By adding lexical constraints to the NMT decoding procedure, however, we are able to produce multiple high-quality sentential paraphrases per source sentence, yielding an English paraphrase resource with more than 4 billion generated tokens and exhibiting greater lexical diversity. Using human judgments, we also demonstrate that ParaBank's paraphrases improve over ParaNMT on both semantic similarity and fluency. Finally, we use ParaBank to train a monolingual NMT model with the same support for lexically-constrained decoding for sentence rewriting tasks. △ Less

Submitted 11 January, 2019; originally announced January 2019.

Comments: To be presented at AAAI 2019. 8 pages

arXiv:1811.01198 [pdf, other]

A biconvex optimization for solving semidefinite programs via bilinear factorization

Authors: En-Liang Hu

Abstract: Many problems in machine learning can be reduced to learning a low-rank positive semidefinite matrix (denoted as $Z$), which encounters semidefinite program (SDP). Existing SDP solvers by classical convex optimization are expensive to solve large-scale problems. Employing the low rank of solution, Burer-Monteiro's method reformulated SDP as a nonconvex problem via the $quadratic$ factorization (… ▽ More Many problems in machine learning can be reduced to learning a low-rank positive semidefinite matrix (denoted as $Z$), which encounters semidefinite program (SDP). Existing SDP solvers by classical convex optimization are expensive to solve large-scale problems. Employing the low rank of solution, Burer-Monteiro's method reformulated SDP as a nonconvex problem via the $quadratic$ factorization ($Z$ as $XX^\top$). However, this would lose the structure of problem in optimization. In this paper, we propose to convert SDP into a biconvex problem via the $bilinear$ factorization ($Z$ as $XY^\top$), and while adding the term $\fracγ{2}||X-Y||_F^2$ to penalize the difference of $X$ and $Y$. Thus, the biconvex structure (w.r.t. $X$ and $Y$) can be exploited naturally in optimization. As a theoretical result, we provide a bound to the penalty parameter $γ$ under the assumption of $L$-Lipschitz smoothness and $σ$-strongly biconvexity, such that, at stationary points, the proposed bilinear factorization is equivalent to Burer-Monteiro's factorization when the bound is arrived, that is $γ>\frac{1}{4}(L-σ)_+$. Our proposal opens up a new way to surrogate SDP by biconvex program. Experiments on two SDP-related applications demonstrate that the proposed method is effective as the state-of-the-art. △ Less

Submitted 28 September, 2021; v1 submitted 3 November, 2018; originally announced November 2018.

arXiv:1804.08207 [pdf, ps, other]

Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation

Authors: Adam Poliak, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie Pavlick, Aaron Steven White, Benjamin Van Durme

Abstract: We present a large-scale collection of diverse natural language inference (NLI) datasets that help provide insight into how well a sentence representation captures distinct types of reasoning. The collection results from recasting 13 existing datasets from 7 semantic phenomena into a common NLI structure, resulting in over half a million labeled context-hypothesis pairs in total. We refer to our c… ▽ More We present a large-scale collection of diverse natural language inference (NLI) datasets that help provide insight into how well a sentence representation captures distinct types of reasoning. The collection results from recasting 13 existing datasets from 7 semantic phenomena into a common NLI structure, resulting in over half a million labeled context-hypothesis pairs in total. We refer to our collection as the DNC: Diverse Natural Language Inference Collection. The DNC is available online at https://www.decomp.net, and will grow over time as additional resources are recast and added from novel sources. △ Less

Submitted 29 August, 2018; v1 submitted 22 April, 2018; originally announced April 2018.

Comments: To be presented at EMNLP 2018. 15 pages

Showing 1–41 of 41 results for author: Hu, E