Search | arXiv e-print repository

Logically Constrained Robotics Transformers for Enhanced Perception-Action Planning

Authors: Parv Kapoor, Sai Vemprala, Ashish Kapoor

Abstract: With the advent of large foundation model based planning, there is a dire need to ensure their output aligns with the stakeholder's intent. When these models are deployed in the real world, the need for alignment is magnified due to the potential cost to life and infrastructure due to unexpected faliures. Temporal Logic specifications have long provided a way to constrain system behaviors and are… ▽ More With the advent of large foundation model based planning, there is a dire need to ensure their output aligns with the stakeholder's intent. When these models are deployed in the real world, the need for alignment is magnified due to the potential cost to life and infrastructure due to unexpected faliures. Temporal Logic specifications have long provided a way to constrain system behaviors and are a natural fit for these use cases. In this work, we propose a novel approach to factor in signal temporal logic specifications while using autoregressive transformer models for trajectory planning. We also provide a trajectory dataset for pretraining and evaluating foundation models. Our proposed technique acheives 74.3 % higher specification satisfaction over the baselines. △ Less

Submitted 9 August, 2024; originally announced August 2024.

Comments: Robotics Science and Systems: Towards Safe Autonomy

arXiv:2310.02437 [pdf, other]

EvDNeRF: Reconstructing Event Data with Dynamic Neural Radiance Fields

Authors: Anish Bhattacharya, Ratnesh Madaan, Fernando Cladera, Sai Vemprala, Rogerio Bonatti, Kostas Daniilidis, Ashish Kapoor, Vijay Kumar, Nikolai Matni, Jayesh K. Gupta

Abstract: We present EvDNeRF, a pipeline for generating event data and training an event-based dynamic NeRF, for the purpose of faithfully reconstructing eventstreams on scenes with rigid and non-rigid deformations that may be too fast to capture with a standard camera. Event cameras register asynchronous per-pixel brightness changes at MHz rates with high dynamic range, making them ideal for observing fast… ▽ More We present EvDNeRF, a pipeline for generating event data and training an event-based dynamic NeRF, for the purpose of faithfully reconstructing eventstreams on scenes with rigid and non-rigid deformations that may be too fast to capture with a standard camera. Event cameras register asynchronous per-pixel brightness changes at MHz rates with high dynamic range, making them ideal for observing fast motion with almost no motion blur. Neural radiance fields (NeRFs) offer visual-quality geometric-based learnable rendering, but prior work with events has only considered reconstruction of static scenes. Our EvDNeRF can predict eventstreams of dynamic scenes from a static or moving viewpoint between any desired timestamps, thereby allowing it to be used as an event-based simulator for a given scene. We show that by training on varied batch sizes of events, we can improve test-time predictions of events at fine time resolutions, outperforming baselines that pair standard dynamic NeRFs with event generators. We release our simulated and real datasets, as well as code for multi-view event-based data generation and the training and evaluation of EvDNeRF models (https://github.com/anish-bhattacharya/EvDNeRF). △ Less

Submitted 6 December, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: 16 pages, 20 figures, 2 tables

arXiv:2310.00887 [pdf, other]

GRID: A Platform for General Robot Intelligence Development

Authors: Sai Vemprala, Shuhang Chen, Abhinav Shukla, Dinesh Narayanan, Ashish Kapoor

Abstract: Developing machine intelligence abilities in robots and autonomous systems is an expensive and time consuming process. Existing solutions are tailored to specific applications and are harder to generalize. Furthermore, scarcity of training data adds a layer of complexity in deploying deep machine learning models. We present a new platform for General Robot Intelligence Development (GRID) to addres… ▽ More Developing machine intelligence abilities in robots and autonomous systems is an expensive and time consuming process. Existing solutions are tailored to specific applications and are harder to generalize. Furthermore, scarcity of training data adds a layer of complexity in deploying deep machine learning models. We present a new platform for General Robot Intelligence Development (GRID) to address both of these issues. The platform enables robots to learn, compose and adapt skills to their physical capabilities, environmental constraints and goals. The platform addresses AI problems in robotics via foundation models that know the physical world. GRID is designed from the ground up to be extensible to accommodate new types of robots, vehicles, hardware platforms and software protocols. In addition, the modular design enables various deep ML components and existing foundation models to be easily usable in a wider variety of robot-centric problems. We demonstrate the platform in various aerial robotics scenarios and demonstrate how the platform dramatically accelerates development of machine intelligent robots. △ Less

Submitted 7 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

arXiv:2307.07909 [pdf, other]

Is Imitation All You Need? Generalized Decision-Making with Dual-Phase Training

Authors: Yao Wei, Yanchao Sun, Ruijie Zheng, Sai Vemprala, Rogerio Bonatti, Shuhang Chen, Ratnesh Madaan, Zhongjie Ba, Ashish Kapoor, Shuang Ma

Abstract: We introduce DualMind, a generalist agent designed to tackle various decision-making tasks that addresses challenges posed by current methods, such as overfitting behaviors and dependence on task-specific fine-tuning. DualMind uses a novel "Dual-phase" training strategy that emulates how humans learn to act in the world. The model first learns fundamental common knowledge through a self-supervised… ▽ More We introduce DualMind, a generalist agent designed to tackle various decision-making tasks that addresses challenges posed by current methods, such as overfitting behaviors and dependence on task-specific fine-tuning. DualMind uses a novel "Dual-phase" training strategy that emulates how humans learn to act in the world. The model first learns fundamental common knowledge through a self-supervised objective tailored for control tasks and then learns how to make decisions based on different contexts through imitating behaviors conditioned on given prompts. DualMind can handle tasks across domains, scenes, and embodiments using just a single set of model weights and can execute zero-shot prompting without requiring task-specific fine-tuning. We evaluate DualMind on MetaWorld and Habitat through extensive experiments and demonstrate its superior generalizability compared to previous techniques, outperforming other generalist agents by over 50$\%$ and 70$\%$ on Habitat and MetaWorld, respectively. On the 45 tasks in MetaWorld, DualMind achieves over 30 tasks at a 90$\%$ success rate. △ Less

Submitted 9 October, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

arXiv:2306.17582 [pdf, other]

ChatGPT for Robotics: Design Principles and Model Abilities

Authors: Sai Vemprala, Rogerio Bonatti, Arthur Bucker, Ashish Kapoor

Abstract: This paper presents an experimental study regarding the use of OpenAI's ChatGPT for robotics applications. We outline a strategy that combines design principles for prompt engineering and the creation of a high-level function library which allows ChatGPT to adapt to different robotics tasks, simulators, and form factors. We focus our evaluations on the effectiveness of different prompt engineering… ▽ More This paper presents an experimental study regarding the use of OpenAI's ChatGPT for robotics applications. We outline a strategy that combines design principles for prompt engineering and the creation of a high-level function library which allows ChatGPT to adapt to different robotics tasks, simulators, and form factors. We focus our evaluations on the effectiveness of different prompt engineering techniques and dialog strategies towards the execution of various types of robotics tasks. We explore ChatGPT's ability to use free-form dialog, parse XML tags, and to synthesize code, in addition to the use of task-specific prompting functions and closed-loop reasoning through dialogues. Our study encompasses a range of tasks within the robotics domain, from basic logical, geometrical, and mathematical reasoning all the way to complex domains such as aerial navigation, manipulation, and embodied agents. We show that ChatGPT can be effective at solving several of such tasks, while allowing users to interact with it primarily via natural language instructions. In addition to these studies, we introduce an open-sourced research tool called PromptCraft, which contains a platform where researchers can collaboratively upload and vote on examples of good prompting schemes for robotics applications, as well as a sample robotics simulator with ChatGPT integration, making it easier for users to get started with using ChatGPT for robotics. △ Less

Submitted 19 July, 2023; v1 submitted 20 February, 2023; originally announced June 2023.

arXiv:2303.04212 [pdf, other]

ConBaT: Control Barrier Transformer for Safe Policy Learning

Authors: Yue Meng, Sai Vemprala, Rogerio Bonatti, Chuchu Fan, Ashish Kapoor

Abstract: Large-scale self-supervised models have recently revolutionized our ability to perform a variety of tasks within the vision and language domains. However, using such models for autonomous systems is challenging because of safety requirements: besides executing correct actions, an autonomous agent must also avoid the high cost and potentially fatal critical mistakes. Traditionally, self-supervised… ▽ More Large-scale self-supervised models have recently revolutionized our ability to perform a variety of tasks within the vision and language domains. However, using such models for autonomous systems is challenging because of safety requirements: besides executing correct actions, an autonomous agent must also avoid the high cost and potentially fatal critical mistakes. Traditionally, self-supervised training mainly focuses on imitating previously observed behaviors, and the training demonstrations carry no notion of which behaviors should be explicitly avoided. In this work, we propose Control Barrier Transformer (ConBaT), an approach that learns safe behaviors from demonstrations in a self-supervised fashion. ConBaT is inspired by the concept of control barrier functions in control theory and uses a causal transformer that learns to predict safe robot actions autoregressively using a critic that requires minimal safety data labeling. During deployment, we employ a lightweight online optimization to find actions that ensure future states lie within the learned safe set. We apply our approach to different simulated control tasks and show that our method results in safer control policies compared to other classical and learning-based methods such as imitation learning, reinforcement learning, and model predictive control. △ Less

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2211.15286 [pdf, other]

Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022

Authors: Jiachen Lei, Shuang Ma, Zhongjie Ba, Sai Vemprala, Ashish Kapoor, Kui Ren

Abstract: In this report, we present our approach and empirical results of applying masked autoencoders in two egocentric video understanding tasks, namely, Object State Change Classification and PNR Temporal Localization, of Ego4D Challenge 2022. As team TheSSVL, we ranked 2nd place in both tasks. Our code will be made available. In this report, we present our approach and empirical results of applying masked autoencoders in two egocentric video understanding tasks, namely, Object State Change Classification and PNR Temporal Localization, of Ego4D Challenge 2022. As team TheSSVL, we ranked 2nd place in both tasks. Our code will be made available. △ Less

Submitted 18 November, 2022; originally announced November 2022.

Comments: 5 pages

arXiv:2210.16294 [pdf]

Learning Modular Simulations for Homogeneous Systems

Authors: Jayesh K. Gupta, Sai Vemprala, Ashish Kapoor

Abstract: Complex systems are often decomposed into modular subsystems for engineering tractability. Although various equation based white-box modeling techniques make use of such structure, learning based methods have yet to incorporate these ideas broadly. We present a modular simulation framework for modeling homogeneous multibody dynamical systems, which combines ideas from graph neural networks and neu… ▽ More Complex systems are often decomposed into modular subsystems for engineering tractability. Although various equation based white-box modeling techniques make use of such structure, learning based methods have yet to incorporate these ideas broadly. We present a modular simulation framework for modeling homogeneous multibody dynamical systems, which combines ideas from graph neural networks and neural differential equations. We learn to model the individual dynamical subsystem as a neural ODE module. Full simulation of the composite system is orchestrated via spatio-temporal message passing between these modules. An arbitrary number of modules can be combined to simulate systems of a wide variety of coupling topologies. We evaluate our framework on a variety of systems and show that message passing allows coordination between multiple modules over time for accurate predictions and in certain cases, enables zero-shot generalization to new system configurations. Furthermore, we show that our models can be transferred to new system configurations with lower data requirement and training effort, compared to those trained from scratch. △ Less

Submitted 28 October, 2022; originally announced October 2022.

Comments: First two authors contributed equally. Accepted at NeurIPS 2022

arXiv:2209.11133 [pdf, other]

PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training

Authors: Rogerio Bonatti, Sai Vemprala, Shuang Ma, Felipe Frujeri, Shuhang Chen, Ashish Kapoor

Abstract: Robotics has long been a field riddled with complex systems architectures whose modules and connections, whether traditional or learning-based, require significant human expertise and prior knowledge. Inspired by large pre-trained language models, this work introduces a paradigm for pre-training a general purpose representation that can serve as a starting point for multiple tasks on a given robot… ▽ More Robotics has long been a field riddled with complex systems architectures whose modules and connections, whether traditional or learning-based, require significant human expertise and prior knowledge. Inspired by large pre-trained language models, this work introduces a paradigm for pre-training a general purpose representation that can serve as a starting point for multiple tasks on a given robot. We present the Perception-Action Causal Transformer (PACT), a generative transformer-based architecture that aims to build representations directly from robot data in a self-supervised fashion. Through autoregressive prediction of states and actions over time, our model implicitly encodes dynamics and behaviors for a particular robot. Our experimental evaluation focuses on the domain of mobile agents, where we show that this robot-specific representation can function as a single starting point to achieve distinct tasks such as safe navigation, localization and mapping. We evaluate two form factors: a wheeled robot that uses a LiDAR sensor as perception input (MuSHR), and a simulated agent that uses first-person RGB images (Habitat). We show that finetuning small task-specific networks on top of the larger pretrained model results in significantly better performance compared to training a single model from scratch for all tasks simultaneously, and comparable performance to training a separate large model for each task independently. By sharing a common good-quality representation across tasks we can lower overall model capacity and speed up the real-time deployment of such systems. △ Less

Submitted 23 September, 2022; v1 submitted 22 September, 2022; originally announced September 2022.

arXiv:2209.10986 [pdf, other]

Learning to Simulate Realistic LiDARs

Authors: Benoit Guillard, Sai Vemprala, Jayesh K. Gupta, Ondrej Miksik, Vibhav Vineet, Pascal Fua, Ashish Kapoor

Abstract: Simulating realistic sensors is a challenging part in data generation for autonomous systems, often involving carefully handcrafted sensor design, scene properties, and physics modeling. To alleviate this, we introduce a pipeline for data-driven simulation of a realistic LiDAR sensor. We propose a model that learns a mapping between RGB images and corresponding LiDAR features such as raydrop or pe… ▽ More Simulating realistic sensors is a challenging part in data generation for autonomous systems, often involving carefully handcrafted sensor design, scene properties, and physics modeling. To alleviate this, we introduce a pipeline for data-driven simulation of a realistic LiDAR sensor. We propose a model that learns a mapping between RGB images and corresponding LiDAR features such as raydrop or per-point intensities directly from real datasets. We show that our model can learn to encode realistic effects such as dropped points on transparent surfaces or high intensity returns on reflective materials. When applied to naively raycasted point clouds provided by off-the-shelf simulator software, our model enhances the data by predicting intensities and removing points based on the scene's appearance to match a real LiDAR sensor. We use our technique to learn models of two distinct LiDAR sensors and use them to improve simulated LiDAR data accordingly. Through a sample task of vehicle segmentation, we show that enhancing simulated point clouds with our technique improves downstream task performance. △ Less

Submitted 22 September, 2022; originally announced September 2022.

Comments: IROS2022 paper

arXiv:2208.02918 [pdf, other]

LATTE: LAnguage Trajectory TransformEr

Authors: Arthur Bucker, Luis Figueredo, Sami Haddadin, Ashish Kapoor, Shuang Ma, Sai Vemprala, Rogerio Bonatti

Abstract: Natural language is one of the most intuitive ways to express human intent. However, translating instructions and commands towards robotic motion generation and deployment in the real world is far from being an easy task. The challenge of combining a robot's inherent low-level geometric and kinodynamic constraints with a human's high-level semantic instructions traditionally is solved using task-s… ▽ More Natural language is one of the most intuitive ways to express human intent. However, translating instructions and commands towards robotic motion generation and deployment in the real world is far from being an easy task. The challenge of combining a robot's inherent low-level geometric and kinodynamic constraints with a human's high-level semantic instructions traditionally is solved using task-specific solutions with little generalizability between hardware platforms, often with the use of static sets of target actions and commands. This work instead proposes a flexible language-based framework that allows a user to modify generic robotic trajectories. Our method leverages pre-trained language models (BERT and CLIP) to encode the user's intent and target objects directly from a free-form text input and scene images, fuses geometrical features generated by a transformer encoder network, and finally outputs trajectories using a transformer decoder, without the need of priors related to the task or robot information. We significantly extend our own previous work presented in Bucker et al. by expanding the trajectory parametrization space to 3D and velocity as opposed to just XY movements. In addition, we now train the model to use actual images of the objects in the scene for context (as opposed to textual descriptions), and we evaluate the system in a diverse set of scenarios beyond manipulation, such as aerial and legged robots. Our simulated and real-life experiments demonstrate that our transformer model can successfully follow human intent, modifying the shape and speed of trajectories within multiple environments. Codebase available at: https://github.com/arthurfenderbucker/LaTTe-Language-Trajectory-TransformEr.git △ Less

Submitted 16 September, 2022; v1 submitted 4 August, 2022; originally announced August 2022.

arXiv:2204.08945 [pdf, other]

Missingness Bias in Model Debugging

Authors: Saachi Jain, Hadi Salman, Eric Wong, Pengchuan Zhang, Vibhav Vineet, Sai Vemprala, Aleksander Madry

Abstract: Missingness, or the absence of features from an input, is a concept fundamental to many model debugging tools. However, in computer vision, pixels cannot simply be removed from an image. One thus tends to resort to heuristics such as blacking out pixels, which may in turn introduce bias into the debugging process. We study such biases and, in particular, show how transformer-based architectures ca… ▽ More Missingness, or the absence of features from an input, is a concept fundamental to many model debugging tools. However, in computer vision, pixels cannot simply be removed from an image. One thus tends to resort to heuristics such as blacking out pixels, which may in turn introduce bias into the debugging process. We study such biases and, in particular, show how transformer-based architectures can enable a more natural implementation of missingness, which side-steps these issues and improves the reliability of model debugging in practice. Our code is available at https://github.com/madrylab/missingness △ Less

Submitted 13 June, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

Comments: Published at ICLR 2022

arXiv:2203.15788 [pdf, other]

COMPASS: Contrastive Multimodal Pretraining for Autonomous Systems

Authors: Shuang Ma, Sai Vemprala, Wenshan Wang, Jayesh K. Gupta, Yale Song, Daniel McDuff, Ashish Kapoor

Abstract: Learning representations that generalize across tasks and domains is challenging yet necessary for autonomous systems. Although task-driven approaches are appealing, designing models specific to each application can be difficult in the face of limited data, especially when dealing with highly variable multimodal input spaces arising from different tasks in different environments.We introduce the f… ▽ More Learning representations that generalize across tasks and domains is challenging yet necessary for autonomous systems. Although task-driven approaches are appealing, designing models specific to each application can be difficult in the face of limited data, especially when dealing with highly variable multimodal input spaces arising from different tasks in different environments.We introduce the first general-purpose pretraining pipeline, COntrastive Multimodal Pretraining for AutonomouS Systems (COMPASS), to overcome the limitations of task-specific models and existing pretraining approaches. COMPASS constructs a multimodal graph by considering the essential information for autonomous systems and the properties of different modalities. Through this graph, multimodal signals are connected and mapped into two factorized spatio-temporal latent spaces: a "motion pattern space" and a "current state space." By learning from multimodal correspondences in each latent space, COMPASS creates state representations that models necessary information such as temporal dynamics, geometry, and semantics. We pretrain COMPASS on a large-scale multimodal simulation dataset TartanAir \cite{tartanair2020iros} and evaluate it on drone navigation, vehicle racing, and visual odometry tasks. The experiments indicate that COMPASS can tackle all three scenarios and can also generalize to unseen environments and real-world data. △ Less

Submitted 19 February, 2022; originally announced March 2022.

arXiv:2106.13364 [pdf, other]

CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning

Authors: Daniel McDuff, Yale Song, Jiyoung Lee, Vibhav Vineet, Sai Vemprala, Nicholas Gyde, Hadi Salman, Shuang Ma, Kwanghoon Sohn, Ashish Kapoor

Abstract: The ability to perform causal and counterfactual reasoning are central properties of human intelligence. Decision-making systems that can perform these types of reasoning have the potential to be more generalizable and interpretable. Simulations have helped advance the state-of-the-art in this domain, by providing the ability to systematically vary parameters (e.g., confounders) and generate examp… ▽ More The ability to perform causal and counterfactual reasoning are central properties of human intelligence. Decision-making systems that can perform these types of reasoning have the potential to be more generalizable and interpretable. Simulations have helped advance the state-of-the-art in this domain, by providing the ability to systematically vary parameters (e.g., confounders) and generate examples of the outcomes in the case of counterfactual scenarios. However, simulating complex temporal causal events in multi-agent scenarios, such as those that exist in driving and vehicle navigation, is challenging. To help address this, we present a high-fidelity simulation environment that is designed for developing algorithms for causal discovery and counterfactual reasoning in the safety-critical context. A core component of our work is to introduce \textit{agency}, such that it is simple to define and create complex scenarios using high-level definitions. The vehicles then operate with agency to complete these objectives, meaning low-level behaviors need only be controlled if necessary. We perform experiments with three state-of-the-art methods to create baselines and highlight the affordances of this environment. Finally, we highlight challenges and opportunities for future work. △ Less

Submitted 24 June, 2021; originally announced June 2021.

arXiv:2106.03805 [pdf, other]

3DB: A Framework for Debugging Computer Vision Models

Authors: Guillaume Leclerc, Hadi Salman, Andrew Ilyas, Sai Vemprala, Logan Engstrom, Vibhav Vineet, Kai Xiao, Pengchuan Zhang, Shibani Santurkar, Greg Yang, Ashish Kapoor, Aleksander Madry

Abstract: We introduce 3DB: an extendable, unified framework for testing and debugging vision models using photorealistic simulation. We demonstrate, through a wide range of use cases, that 3DB allows users to discover vulnerabilities in computer vision systems and gain insights into how models make decisions. 3DB captures and generalizes many robustness analyses from prior work, and enables one to study th… ▽ More We introduce 3DB: an extendable, unified framework for testing and debugging vision models using photorealistic simulation. We demonstrate, through a wide range of use cases, that 3DB allows users to discover vulnerabilities in computer vision systems and gain insights into how models make decisions. 3DB captures and generalizes many robustness analyses from prior work, and enables one to study their interplay. Finally, we find that the insights generated by the system transfer to the physical world. We are releasing 3DB as a library (https://github.com/3db/3db) alongside a set of example analyses, guides, and documentation: https://3db.github.io/3db/ . △ Less

Submitted 7 June, 2021; originally announced June 2021.

arXiv:2103.00806 [pdf, other]

Representation Learning for Event-based Visuomotor Policies

Authors: Sai Vemprala, Sami Mian, Ashish Kapoor

Abstract: Event-based cameras are dynamic vision sensors that provide asynchronous measurements of changes in per-pixel brightness at a microsecond level. This makes them significantly faster than conventional frame-based cameras, and an appealing choice for high-speed navigation. While an interesting sensor modality, this asynchronously streamed event data poses a challenge for machine learning techniques… ▽ More Event-based cameras are dynamic vision sensors that provide asynchronous measurements of changes in per-pixel brightness at a microsecond level. This makes them significantly faster than conventional frame-based cameras, and an appealing choice for high-speed navigation. While an interesting sensor modality, this asynchronously streamed event data poses a challenge for machine learning techniques that are more suited for frame-based data. In this paper, we present an event variational autoencoder and show that it is feasible to learn compact representations directly from asynchronous spatiotemporal event data. Furthermore, we show that such pretrained representations can be used for event-based reinforcement learning instead of end-to-end reward driven perception. We validate this framework of learning event-based visuomotor policies by applying it to an obstacle avoidance scenario in simulation. Compared to techniques that treat event data as images, we show that representations learnt from event streams result in faster policy training, adapt to different control capacities, and demonstrate a higher degree of robustness. △ Less

Submitted 29 September, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

arXiv:2012.12235 [pdf, other]

Unadversarial Examples: Designing Objects for Robust Vision

Authors: Hadi Salman, Andrew Ilyas, Logan Engstrom, Sai Vemprala, Aleksander Madry, Ashish Kapoor

Abstract: We study a class of realistic computer vision settings wherein one can influence the design of the objects being recognized. We develop a framework that leverages this capability to significantly improve vision models' performance and robustness. This framework exploits the sensitivity of modern machine learning algorithms to input perturbations in order to design "robust objects," i.e., objects t… ▽ More We study a class of realistic computer vision settings wherein one can influence the design of the objects being recognized. We develop a framework that leverages this capability to significantly improve vision models' performance and robustness. This framework exploits the sensitivity of modern machine learning algorithms to input perturbations in order to design "robust objects," i.e., objects that are explicitly optimized to be confidently detected or classified. We demonstrate the efficacy of the framework on a wide variety of vision-based tasks ranging from standard benchmarks, to (in-simulation) robotics, to real-world experiments. Our code can be found at https://git.io/unadversarial . △ Less

Submitted 22 December, 2020; originally announced December 2020.

arXiv:2011.00095 [pdf, other]

Adversarial Attacks on Optimization based Planners

Authors: Sai Vemprala, Ashish Kapoor

Abstract: Trajectory planning is a key piece in the algorithmic architecture of a robot. Trajectory planners typically use iterative optimization schemes for generating smooth trajectories that avoid collisions and are optimal for tracking given the robot's physical specifications. Starting from an initial estimate, the planners iteratively refine the solution so as to satisfy the desired constraints. In th… ▽ More Trajectory planning is a key piece in the algorithmic architecture of a robot. Trajectory planners typically use iterative optimization schemes for generating smooth trajectories that avoid collisions and are optimal for tracking given the robot's physical specifications. Starting from an initial estimate, the planners iteratively refine the solution so as to satisfy the desired constraints. In this paper, we show that such iterative optimization based planners can be vulnerable to adversarial attacks that force the planner either to fail completely, or significantly increase the time required to find a solution. The key insight here is that an adversary in the environment can directly affect the optimization cost function of a planner. We demonstrate how the adversary can adjust its own state configurations to result in poorly conditioned eigenstructure of the objective leading to failures. We apply our method against two state of the art trajectory planners and demonstrate that an adversary can consistently exploit certain weaknesses of an iterative optimization scheme. △ Less

Submitted 4 June, 2021; v1 submitted 30 October, 2020; originally announced November 2020.

Comments: 7 pages. Presented at ICRA 2021

arXiv:2003.05654 [pdf, other]

AirSim Drone Racing Lab

Authors: Ratnesh Madaan, Nicholas Gyde, Sai Vemprala, Matthew Brown, Keiko Nagami, Tim Taubner, Eric Cristofalo, Davide Scaramuzza, Mac Schwager, Ashish Kapoor

Abstract: Autonomous drone racing is a challenging research problem at the intersection of computer vision, planning, state estimation, and control. We introduce AirSim Drone Racing Lab, a simulation framework for enabling fast prototyping of algorithms for autonomy and enabling machine learning research in this domain, with the goal of reducing the time, money, and risks associated with field robotics. Our… ▽ More Autonomous drone racing is a challenging research problem at the intersection of computer vision, planning, state estimation, and control. We introduce AirSim Drone Racing Lab, a simulation framework for enabling fast prototyping of algorithms for autonomy and enabling machine learning research in this domain, with the goal of reducing the time, money, and risks associated with field robotics. Our framework enables generation of racing tracks in multiple photo-realistic environments, orchestration of drone races, comes with a suite of gate assets, allows for multiple sensor modalities (monocular, depth, neuromorphic events, optical flow), different camera models, and benchmarking of planning, control, computer vision, and learning-based algorithms. We used our framework to host a simulation based drone racing competition at NeurIPS 2019. The competition binaries are available at our github repository. △ Less

Submitted 12 March, 2020; originally announced March 2020.

Comments: 14 pages, 6 figures

arXiv:2001.08198 [pdf, other]

Safety Considerations in Deep Control Policies with Safety Barrier Certificates Under Uncertainty

Authors: Tom Hirshberg, Sai Vemprala, Ashish Kapoor

Abstract: Recent advances in Deep Machine Learning have shown promise in solving complex perception and control loops via methods such as reinforcement and imitation learning. However, guaranteeing safety for such learned deep policies has been a challenge due to issues such as partial observability and difficulties in characterizing the behavior of the neural networks. While a lot of emphasis in safe learn… ▽ More Recent advances in Deep Machine Learning have shown promise in solving complex perception and control loops via methods such as reinforcement and imitation learning. However, guaranteeing safety for such learned deep policies has been a challenge due to issues such as partial observability and difficulties in characterizing the behavior of the neural networks. While a lot of emphasis in safe learning has been placed during training, it is non-trivial to guarantee safety at deployment or test time. This paper extends how under mild assumptions, Safety Barrier Certificates can be used to guarantee safety with deep control policies despite uncertainty arising due to perception and other latent variables. Specifically for scenarios where the dynamics are smooth and uncertainty has a finite support, the proposed framework wraps around an existing deep control policy and generates safe actions by dynamically evaluating and modifying the policy from the embedded network. Our framework utilizes control barrier functions to create spaces of control actions that are safe under uncertainty, and when the original actions are found to be in violation of the safety constraint, uses quadratic programming to minimally modify the original actions to ensure they lie in the safe set. Representations of the environment are built through Euclidean signed distance fields that are then used to infer the safety of actions and to guarantee forward invariance. We implement this method in simulation in a drone-racing environment and show that our method results in safer actions compared to a baseline that only relies on imitation learning to generate control actions. △ Less

Submitted 2 March, 2020; v1 submitted 22 January, 2020; originally announced January 2020.

arXiv:1905.02648 [pdf, other]

Collaborative Localization for Micro Aerial Vehicles

Authors: Sai Vemprala, Srikanth Saripalli

Abstract: In this paper, we present a framework for performing collaborative localization for groups of micro aerial vehicles (MAV) that use vision based sensing. The vehicles are each assumed to be equipped with a forward-facing monocular camera, and to be capable of communicating with each other. This collaborative localization approach is developed as a decentralized algorithm and built in a distributed… ▽ More In this paper, we present a framework for performing collaborative localization for groups of micro aerial vehicles (MAV) that use vision based sensing. The vehicles are each assumed to be equipped with a forward-facing monocular camera, and to be capable of communicating with each other. This collaborative localization approach is developed as a decentralized algorithm and built in a distributed fashion where individual and relative pose estimation techniques are combined for the group to localize against surrounding environments. The MAVs initially detect and match salient features between each other to create a sparse reconstruction of the observed environment, which acts as a global map. Once a map is available, each MAV performs feature detection and tracking with a robust outlier rejection process to estimate its own pose in 6 degrees of freedom. Occasionally, one or more MAVs can be tasked to compute poses for another MAV through relative measurements, which is achieved by exploiting multiple view geometry concepts. These relative measurements are then fused with individual measurements in a consistent fashion. We present the results of the algorithm on image data from MAV flights both in simulation and real life, and discuss the advantages of collaborative localization in improving pose estimation accuracy. △ Less

Submitted 10 May, 2020; v1 submitted 7 May, 2019; originally announced May 2019.

Comments: Supplementary video at https://www.youtube.com/watch?v=LvaTOWuTOPo

arXiv:1808.00259 [pdf, other]

Drone Detection Using Depth Maps

Authors: Adrian Carrio, Sai Vemprala, Andres Ripoll, Srikanth Saripalli, Pascual Campoy

Abstract: Obstacle avoidance is a key feature for safe Unmanned Aerial Vehicle (UAV) navigation. While solutions have been proposed for static obstacle avoidance, systems enabling avoidance of dynamic objects, such as drones, are hard to implement due to the detection range and field-of-view (FOV) requirements, as well as the constraints for integrating such systems on-board small UAVs. In this work, a data… ▽ More Obstacle avoidance is a key feature for safe Unmanned Aerial Vehicle (UAV) navigation. While solutions have been proposed for static obstacle avoidance, systems enabling avoidance of dynamic objects, such as drones, are hard to implement due to the detection range and field-of-view (FOV) requirements, as well as the constraints for integrating such systems on-board small UAVs. In this work, a dataset of 6k synthetic depth maps of drones has been generated and used to train a state-of-the-art deep learning-based drone detection model. While many sensing technologies can only provide relative altitude and azimuth of an obstacle, our depth map-based approach enables full 3D localization of the obstacle. This is extremely useful for collision avoidance, as 3D localization of detected drones is key to perform efficient collision-free path planning. The proposed detection technique has been validated in several real depth map sequences, with multiple types of drones flying at up to 2 m/s, achieving an average precision of 98.7%, an average recall of 74.7% and a record detection range of 9.5 meters. △ Less

Submitted 1 August, 2018; originally announced August 2018.

Comments: Accepted at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018), Madrid, Spain

arXiv:1804.02510 [pdf, other]

Monocular Vision based Collaborative Localization for Micro Aerial Vehicle Swarms

Authors: Sai Vemprala, Srikanth Saripalli

Abstract: In this paper, we present a vision based collaborative localization framework for groups of micro aerial vehicles (MAV). The vehicles are each assumed to be equipped with a forward-facing monocular camera, and to be capable of communicating with each other. This collaborative localization approach is built upon a distributed algorithm where individual and relative pose estimation techniques are co… ▽ More In this paper, we present a vision based collaborative localization framework for groups of micro aerial vehicles (MAV). The vehicles are each assumed to be equipped with a forward-facing monocular camera, and to be capable of communicating with each other. This collaborative localization approach is built upon a distributed algorithm where individual and relative pose estimation techniques are combined for the group to localize against surrounding environments. The MAVs initially detect and match salient features between each other to create a sparse reconstruction of the observed environment, which acts as a global map. Once a map is available, each MAV performs feature detection and tracking with a robust outlier rejection process to estimate its own six degree-of-freedom pose. Occasionally, the MAVs can also fuse relative measurements with individual measurements through feature matching and multiple-view geometry based relative pose computation. We present the implementation of this algorithm for MAVs and environments simulated within Microsoft AirSim, and discuss the results and the advantages of collaborative localization. △ Less

Submitted 7 April, 2018; originally announced April 2018.

Comments: 9 pages, 12 figures, IEEE conference format

Showing 1–23 of 23 results for author: Vemprala, S