-
Modular Deutsch Entropic Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Khosravi, Drnovšek and Moslehian [\textit{Filomat, 2012}] derived Buzano inequality for Hilbert C*-modules. Using this inequality we derive Deutsch entropic uncertainty principle for Hilbert C*-modules over commutative unital C*-algebras.
Khosravi, Drnovšek and Moslehian [\textit{Filomat, 2012}] derived Buzano inequality for Hilbert C*-modules. Using this inequality we derive Deutsch entropic uncertainty principle for Hilbert C*-modules over commutative unital C*-algebras.
△ Less
Submitted 8 August, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Continuous Krishna-Parthasarathy Entropic Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
In 2002, Krishna and Parthasarathy [\textit{Sankhyā Ser. A}] derived discrete quantum version of Maassen-Uffink [\textit{Phys. Rev. Lett., 1988}] entropic uncertainty principle. In this paper, using the notion of continuous operator-valued frames, we derive an entropic uncertainty principle for arbitrary family of operators indexed by measure spaces having finite measure. We give an application to…
▽ More
In 2002, Krishna and Parthasarathy [\textit{Sankhyā Ser. A}] derived discrete quantum version of Maassen-Uffink [\textit{Phys. Rev. Lett., 1988}] entropic uncertainty principle. In this paper, using the notion of continuous operator-valued frames, we derive an entropic uncertainty principle for arbitrary family of operators indexed by measure spaces having finite measure. We give an application to the special case of compact groups.
△ Less
Submitted 7 May, 2024;
originally announced May 2024.
-
Open-Set 3D Semantic Instance Maps for Vision Language Navigation -- O3D-SIM
Authors:
Laksh Nanwani,
Kumaraditya Gupta,
Aditya Mathur,
Swayam Agrawal,
A. H. Abdul Hafez,
K. Madhava Krishna
Abstract:
Humans excel at forming mental maps of their surroundings, equipping them to understand object relationships and navigate based on language queries. Our previous work SI Maps [1] showed that having instance-level information and the semantic understanding of an environment helps significantly improve performance for language-guided tasks. We extend this instance-level approach to 3D while increasi…
▽ More
Humans excel at forming mental maps of their surroundings, equipping them to understand object relationships and navigate based on language queries. Our previous work SI Maps [1] showed that having instance-level information and the semantic understanding of an environment helps significantly improve performance for language-guided tasks. We extend this instance-level approach to 3D while increasing the pipeline's robustness and improving quantitative and qualitative results. Our method leverages foundational models for object recognition, image segmentation, and feature extraction. We propose a representation that results in a 3D point cloud map with instance-level embeddings, which bring in the semantic understanding that natural language commands can query. Quantitatively, the work improves upon the success rate of language-guided tasks. At the same time, we qualitatively observe the ability to identify instances more clearly and leverage the foundational models and language and image-aligned embeddings to identify objects that, otherwise, a closed-set approach wouldn't be able to identify.
△ Less
Submitted 27 April, 2024;
originally announced April 2024.
-
Constrained 6-DoF Grasp Generation on Complex Shapes for Improved Dual-Arm Manipulation
Authors:
Gaurav Singh,
Sanket Kalwar,
Md Faizal Karim,
Bipasha Sen,
Nagamanikandan Govindan,
Srinath Sridhar,
K Madhava Krishna
Abstract:
Efficiently generating grasp poses tailored to specific regions of an object is vital for various robotic manipulation tasks, especially in a dual-arm setup. This scenario presents a significant challenge due to the complex geometries involved, requiring a deep understanding of the local geometry to generate grasps efficiently on the specified constrained regions. Existing methods only explore set…
▽ More
Efficiently generating grasp poses tailored to specific regions of an object is vital for various robotic manipulation tasks, especially in a dual-arm setup. This scenario presents a significant challenge due to the complex geometries involved, requiring a deep understanding of the local geometry to generate grasps efficiently on the specified constrained regions. Existing methods only explore settings involving table-top/small objects and require augmented datasets to train, limiting their performance on complex objects. We propose CGDF: Constrained Grasp Diffusion Fields, a diffusion-based grasp generative model that generalizes to objects with arbitrary geometries, as well as generates dense grasps on the target regions. CGDF uses a part-guided diffusion approach that enables it to get high sample efficiency in constrained grasping without explicitly training on massive constraint-augmented datasets. We provide qualitative and quantitative comparisons using analytical metrics and in simulation, in both unconstrained and constrained settings to show that our method can generalize to generate stable grasps on complex objects, especially useful for dual-arm manipulation settings, while existing methods struggle to do so.
△ Less
Submitted 15 July, 2024; v1 submitted 6 April, 2024;
originally announced April 2024.
-
Bi-level Trajectory Optimization on Uneven Terrains with Differentiable Wheel-Terrain Interaction Model
Authors:
Amith Manoharan,
Aditya Sharma,
Himani Belsare,
Kaustab Pal,
K. Madhava Krishna,
Arun Kumar Singh
Abstract:
Navigation of wheeled vehicles on uneven terrain necessitates going beyond the 2D approaches for trajectory planning. Specifically, it is essential to incorporate the full 6dof variation of vehicle pose and its associated stability cost in the planning process. To this end, most recent works aim to learn a neural network model to predict the vehicle evolution. However, such approaches are data-int…
▽ More
Navigation of wheeled vehicles on uneven terrain necessitates going beyond the 2D approaches for trajectory planning. Specifically, it is essential to incorporate the full 6dof variation of vehicle pose and its associated stability cost in the planning process. To this end, most recent works aim to learn a neural network model to predict the vehicle evolution. However, such approaches are data-intensive and fraught with generalization issues. In this paper, we present a purely model-based approach that just requires the digital elevation information of the terrain. Specifically, we express the wheel-terrain interaction and 6dof pose prediction as a non-linear least squares (NLS) problem. As a result, trajectory planning can be viewed as a bi-level optimization. The inner optimization layer predicts the pose on the terrain along a given trajectory, while the outer layer deforms the trajectory itself to reduce the stability and kinematic costs of the pose. We improve the state-of-the-art in the following respects. First, we show that our NLS based pose prediction closely matches the output from a high-fidelity physics engine. This result coupled with the fact that we can query gradients of the NLS solver, makes our pose predictor, a differentiable wheel-terrain interaction model. We further leverage this differentiability to efficiently solve the proposed bi-level trajectory optimization problem. Finally, we perform extensive experiments, and comparison with a baseline to showcase the effectiveness of our approach in obtaining smooth, stable trajectories.
△ Less
Submitted 11 April, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
Unexpected Uncertainty Principle for Disc Banach Spaces
Authors:
K. Mahesh Krishna
Abstract:
Let $(\{f_n\}_{n=1}^\infty, \{τ_n\}_{n=1}^\infty)$ and $(\{g_n\}_{n=1}^\infty, \{ω_n\}_{n=1}^\infty)$ be unbounded continuous p-Schauder frames ($0<p<1$) for a disc Banach space $\mathcal{X}$. Then for every $x \in ( \mathcal{D}(θ_f) \cap\mathcal{D}(θ_g))\setminus\{0\}$, we show that \begin{align}\label{UB} (1) \quad \quad \quad \quad \|θ_f x\|_0\|θ_g x\|_0 \geq \frac{1}{\left(\displaystyle\sup_{n…
▽ More
Let $(\{f_n\}_{n=1}^\infty, \{τ_n\}_{n=1}^\infty)$ and $(\{g_n\}_{n=1}^\infty, \{ω_n\}_{n=1}^\infty)$ be unbounded continuous p-Schauder frames ($0<p<1$) for a disc Banach space $\mathcal{X}$. Then for every $x \in ( \mathcal{D}(θ_f) \cap\mathcal{D}(θ_g))\setminus\{0\}$, we show that \begin{align}\label{UB} (1) \quad \quad \quad \quad \|θ_f x\|_0\|θ_g x\|_0 \geq \frac{1}{\left(\displaystyle\sup_{n,m \in \mathbb{N} }|f_n(ω_m)|\right)^p\left(\displaystyle\sup_{n, m \in \mathbb{N}}|g_m(τ_n)|\right)^p}, \end{align} where \begin{align*} & θ_f: \mathcal{D}(θ_f) \ni x \mapsto θ_fx := \{f_n(x)\}_{n=1}^\infty\in \ell^p(\mathbb{N}), \quad θ_g: \mathcal{D}(θ_g) \ni x \mapsto θ_gx := \{g_n(x)\}_{n=1}^\infty\in \ell^p(\mathbb{N}). \end{align*} Inequality (1) is unexpectedly different from both bounded uncertainty principle arXiv:2308.00312v1 and unbounded uncertainty principle arXiv:2312.00366v1 for Banach spaces.
△ Less
Submitted 1 April, 2024;
originally announced April 2024.
-
LeGo-Drive: Language-enhanced Goal-oriented Closed-Loop End-to-End Autonomous Driving
Authors:
Pranjal Paul,
Anant Garg,
Tushar Choudhary,
Arun Kumar Singh,
K. Madhava Krishna
Abstract:
Existing Vision-Language models (VLMs) estimate either long-term trajectory waypoints or a set of control actions as a reactive solution for closed-loop planning based on their rich scene comprehension. However, these estimations are coarse and are subjective to their "world understanding" which may generate sub-optimal decisions due to perception errors. In this paper, we introduce LeGo-Drive, wh…
▽ More
Existing Vision-Language models (VLMs) estimate either long-term trajectory waypoints or a set of control actions as a reactive solution for closed-loop planning based on their rich scene comprehension. However, these estimations are coarse and are subjective to their "world understanding" which may generate sub-optimal decisions due to perception errors. In this paper, we introduce LeGo-Drive, which aims to address this issue by estimating a goal location based on the given language command as an intermediate representation in an end-to-end setting. The estimated goal might fall in a non-desirable region, like on top of a car for a parking-like command, leading to inadequate planning. Hence, we propose to train the architecture in an end-to-end manner, resulting in iterative refinement of both the goal and the trajectory collectively. We validate the effectiveness of our method through comprehensive experiments conducted in diverse simulated environments. We report significant improvements in standard autonomous driving metrics, with a goal reaching Success Rate of 81%. We further showcase the versatility of LeGo-Drive across different driving scenarios and linguistic inputs, underscoring its potential for practical deployment in autonomous vehicles and intelligent transportation systems.
△ Less
Submitted 29 March, 2024;
originally announced March 2024.
-
Nonlinear Heisenberg-Robertson-Schrodinger Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
We derive an uncertainty principle for Lipschitz maps acting on subsets of Banach spaces. We show that this nonlinear uncertainty principle reduces to the Heisenberg-Robertson-Schrodinger uncertainty principle for linear operators acting on Hilbert spaces.
We derive an uncertainty principle for Lipschitz maps acting on subsets of Banach spaces. We show that this nonlinear uncertainty principle reduces to the Heisenberg-Robertson-Schrodinger uncertainty principle for linear operators acting on Hilbert spaces.
△ Less
Submitted 8 August, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
Nonlinear Maccone-Pati Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
We show that one of the two important uncertainty principles derived by Maccone and Pati \textit{[Phys. Rev. Lett., 2014]} can be derived for arbitrary maps defined on subsets of $\mathcal{L}^p$ spaces for $1< p<\infty$. Our main tool is the Clarkson inequalities. We also derive a nonlinear uncertainty principle for weak parallelogram spaces and Type-p Banach spaces.
We show that one of the two important uncertainty principles derived by Maccone and Pati \textit{[Phys. Rev. Lett., 2014]} can be derived for arbitrary maps defined on subsets of $\mathcal{L}^p$ spaces for $1< p<\infty$. Our main tool is the Clarkson inequalities. We also derive a nonlinear uncertainty principle for weak parallelogram spaces and Type-p Banach spaces.
△ Less
Submitted 1 February, 2024;
originally announced February 2024.
-
Functional Kuppinger-Durisi-Bölcskei Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $\mathcal{X}$ be a Banach space. Let $\{τ_j\}_{j=1}^n, \{ω_k\}_{k=1}^m\subseteq \mathcal{X}$ and $\{f_j\}_{j=1}^n$, $\{g_k\}_{k=1}^m\subseteq \mathcal{X}^*$ satisfy $ |f_j(τ_j)|\geq 1$ for all $ 1\leq j \leq n$, $|g_k(ω_k)|\geq 1 $ for all $1\leq k \leq m$. If $x \in \mathcal{X}\setminus \{0\}$ is such that $x=θ_τθ_f x=θ_ωθ_g x$, then we show that \begin{align}\label{FKDB} (1) \quad\quad\quad\…
▽ More
Let $\mathcal{X}$ be a Banach space. Let $\{τ_j\}_{j=1}^n, \{ω_k\}_{k=1}^m\subseteq \mathcal{X}$ and $\{f_j\}_{j=1}^n$, $\{g_k\}_{k=1}^m\subseteq \mathcal{X}^*$ satisfy $ |f_j(τ_j)|\geq 1$ for all $ 1\leq j \leq n$, $|g_k(ω_k)|\geq 1 $ for all $1\leq k \leq m$. If $x \in \mathcal{X}\setminus \{0\}$ is such that $x=θ_τθ_f x=θ_ωθ_g x$, then we show that \begin{align}\label{FKDB} (1) \quad\quad\quad\quad \|θ_fx\|_0\|θ_gx\|_0\geq \frac{\bigg[1-(\|θ_fx\|_0-1)\max\limits_{1\leq j,r \leq n,j\neq r}|f_j(τ_r)|\bigg]^+\bigg[1-(\|θ_g x\|_0-1)\max\limits_{1\leq k,s \leq m,k\neq s}|g_k(ω_s)|\bigg]^+}{\left(\displaystyle\max_{1\leq j \leq n, 1\leq k \leq m}|f_j(ω_k)|\right)\left(\displaystyle\max_{1\leq j \leq n, 1\leq k \leq m}|g_k(τ_j)|\right)}. \end{align}
We call Inequality (1) as \textbf{Functional Kuppinger-Durisi-Bölcskei Uncertainty Principle}. Inequality (1) improves the uncertainty principle obtained by Kuppinger, Durisi and Bölcskei \textit{[IEEE Trans. Inform. Theory (2012)]} (which improved the Donoho-Stark-Elad-Bruckstein uncertainty principle \textit{[SIAM J. Appl. Math. (1989), IEEE Trans. Inform. Theory (2002)]}). We also derive functional form of the uncertainity principle obtained by Studer, Kuppinger, Pope and Bölcskei \textit{[EEE Trans. Inform. Theory (2012)]}.
△ Less
Submitted 1 January, 2024;
originally announced February 2024.
-
ATPPNet: Attention based Temporal Point cloud Prediction Network
Authors:
Kaustab Pal,
Aditya Sharma,
Avinash Sharma,
K. Madhava Krishna
Abstract:
Point cloud prediction is an important yet challenging task in the field of autonomous driving. The goal is to predict future point cloud sequences that maintain object structures while accurately representing their temporal motion. These predicted point clouds help in other subsequent tasks like object trajectory estimation for collision avoidance or estimating locations with the least odometry d…
▽ More
Point cloud prediction is an important yet challenging task in the field of autonomous driving. The goal is to predict future point cloud sequences that maintain object structures while accurately representing their temporal motion. These predicted point clouds help in other subsequent tasks like object trajectory estimation for collision avoidance or estimating locations with the least odometry drift. In this work, we present ATPPNet, a novel architecture that predicts future point cloud sequences given a sequence of previous time step point clouds obtained with LiDAR sensor. ATPPNet leverages Conv-LSTM along with channel-wise and spatial attention dually complemented by a 3D-CNN branch for extracting an enhanced spatio-temporal context to recover high quality fidel predictions of future point clouds. We conduct extensive experiments on publicly available datasets and report impressive performance outperforming the existing methods. We also conduct a thorough ablative study of the proposed architecture and provide an application study that highlights the potential of our model for tasks like odometry estimation.
△ Less
Submitted 30 January, 2024;
originally announced January 2024.
-
Unbounded Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principles
Authors:
K. Mahesh Krishna
Abstract:
Let $(Ω, μ)$, $(Δ, ν)$ be measure spaces and $p=1$ or $p=\infty$. Let $(\{f_α\}_{α\in Ω}, \{τ_α\}_{α\in Ω})$ and $(\{g_β\}_{β\in Δ}, \{ω_β\}_{β\in Δ})$ be unbounded continuous p-Schauder frames for a Banach space $\mathcal{X}$. Then for every $x \in ( \mathcal{D}(θ_f) \cap\mathcal{D}(θ_g))\setminus\{0\}$, we show that \begin{align}\label{UB}
(1) \quad \quad \quad \quad μ(\operatorname{supp}(θ_f…
▽ More
Let $(Ω, μ)$, $(Δ, ν)$ be measure spaces and $p=1$ or $p=\infty$. Let $(\{f_α\}_{α\in Ω}, \{τ_α\}_{α\in Ω})$ and $(\{g_β\}_{β\in Δ}, \{ω_β\}_{β\in Δ})$ be unbounded continuous p-Schauder frames for a Banach space $\mathcal{X}$. Then for every $x \in ( \mathcal{D}(θ_f) \cap\mathcal{D}(θ_g))\setminus\{0\}$, we show that \begin{align}\label{UB}
(1) \quad \quad \quad \quad μ(\operatorname{supp}(θ_f x))ν(\operatorname{supp}(θ_g x)) \geq \frac{1}{\left(\displaystyle\sup_{α\in Ω, β\in Δ}|f_α(ω_β)|\right)\left(\displaystyle\sup_{α\in Ω, β\in Δ}|g_β(τ_α)|\right)}, \end{align} where \begin{align*} &θ_f:\mathcal{D}(θ_f) \ni x \mapsto θ_fx \in \mathcal{L}^p(Ω, μ); \quad θ_fx: Ω\ni α\mapsto (θ_fx) (α):= f_α(x) \in \mathbb{K},\\ &θ_g: \mathcal{D}(θ_g) \ni x \mapsto θ_gx \in \mathcal{L}^p(Δ, ν); \quad θ_gx: Δ\ni β\mapsto (θ_gx) (β):= g_β(x) \in \mathbb{K}. \end{align*} We call Inequality (1) as \textbf{Unbounded Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principle}. Along with recent \textbf{Functional Continuous Uncertainty Principle} [arXiv:2308.00312], Inequality (1) also improves Ricaud-Torrésani uncertainty principle [IEEE Trans. Inform. Theory, 2013]. In particular, it improves Elad-Bruckstein uncertainty principle [IEEE Trans. Inform. Theory, 2002] and Donoho-Stark uncertainty principle [SIAM J. Appl. Math., 1989].
△ Less
Submitted 1 December, 2023;
originally announced December 2023.
-
Automated Detection and Counting of Windows using UAV Imagery based Remote Sensing
Authors:
Dhruv Patel,
Shivani Chepuri,
Sarvesh Thakur,
K. Harikumar,
Ravi Kiran S.,
K. Madhava Krishna
Abstract:
Despite the technological advancements in the construction and surveying sector, the inspection of salient features like windows in an under-construction or existing building is predominantly a manual process. Moreover, the number of windows present in a building is directly related to the magnitude of deformation it suffers under earthquakes. In this research, a method to accurately detect and co…
▽ More
Despite the technological advancements in the construction and surveying sector, the inspection of salient features like windows in an under-construction or existing building is predominantly a manual process. Moreover, the number of windows present in a building is directly related to the magnitude of deformation it suffers under earthquakes. In this research, a method to accurately detect and count the number of windows of a building by deploying an Unmanned Aerial Vehicle (UAV) based remote sensing system is proposed. The proposed two-stage method automates the identification and counting of windows by developing computer vision pipelines that utilize data from UAV's onboard camera and other sensors. Quantitative and Qualitative results show the effectiveness of our proposed approach in accurately detecting and counting the windows compared to the existing method.
△ Less
Submitted 24 November, 2023;
originally announced November 2023.
-
NeuroSMPC: A Neural Network guided Sampling Based MPC for On-Road Autonomous Driving
Authors:
Kaustab Pal,
Aditya Sharma,
Mohd Omama,
Parth N. Shah,
K. Madhava Krishna
Abstract:
In this paper we show an effective means of integrating data driven frameworks to sampling based optimal control to vastly reduce the compute time for easy adoption and adaptation to real time applications such as on-road autonomous driving in the presence of dynamic actors. Presented with training examples, a spatio-temporal CNN learns to predict the optimal mean control over a finite horizon tha…
▽ More
In this paper we show an effective means of integrating data driven frameworks to sampling based optimal control to vastly reduce the compute time for easy adoption and adaptation to real time applications such as on-road autonomous driving in the presence of dynamic actors. Presented with training examples, a spatio-temporal CNN learns to predict the optimal mean control over a finite horizon that precludes further resampling, an iterative process that makes sampling based optimal control formulations difficult to adopt in real time settings. Generating control samples around the network-predicted optimal mean retains the advantage of sample diversity while enabling real time rollout of trajectories that avoids multiple dynamic obstacles in an on-road navigation setting. Further the 3D CNN architecture implicitly learns the future trajectories of the dynamic agents in the scene resulting in successful collision free navigation despite no explicit future trajectory prediction. We show performance gain over multiple baselines in a number of on-road scenes through closed loop simulations in CARLA. We also showcase the real world applicability of our system by running it on our custom Autonomous Driving Platform (AutoDP).
△ Less
Submitted 19 October, 2023;
originally announced October 2023.
-
Hilbert Space Embedding-based Trajectory Optimization for Multi-Modal Uncertain Obstacle Trajectory Prediction
Authors:
Basant Sharma,
Aditya Sharma,
K. Madhava Krishna,
Arun Kumar Singh
Abstract:
Safe autonomous driving critically depends on how well the ego-vehicle can predict the trajectories of neighboring vehicles. To this end, several trajectory prediction algorithms have been presented in the existing literature. Many of these approaches output a multi-modal distribution of obstacle trajectories instead of a single deterministic prediction to account for the underlying uncertainty. H…
▽ More
Safe autonomous driving critically depends on how well the ego-vehicle can predict the trajectories of neighboring vehicles. To this end, several trajectory prediction algorithms have been presented in the existing literature. Many of these approaches output a multi-modal distribution of obstacle trajectories instead of a single deterministic prediction to account for the underlying uncertainty. However, existing planners cannot handle the multi-modality based on just sample-level information of the predictions. With this motivation, this paper proposes a trajectory optimizer that can leverage the distributional aspects of the prediction in a computationally tractable and sample-efficient manner. Our optimizer can work with arbitrarily complex distributions and thus can be used with output distribution represented as a deep neural network. The core of our approach is built on embedding distribution in Reproducing Kernel Hilbert Space (RKHS), which we leverage in two ways. First, we propose an RKHS embedding approach to select probable samples from the obstacle trajectory distribution. Second, we rephrase chance-constrained optimization as distribution matching in RKHS and propose a novel sampling-based optimizer for its solution. We validate our approach with hand-crafted and neural network-based predictors trained on real-world datasets and show improvement over the existing stochastic optimization approaches in safety metrics.
△ Less
Submitted 12 October, 2023;
originally announced October 2023.
-
DiffPrompter: Differentiable Implicit Visual Prompts for Semantic-Segmentation in Adverse Conditions
Authors:
Sanket Kalwar,
Mihir Ungarala,
Shruti Jain,
Aaron Monis,
Krishna Reddy Konda,
Sourav Garg,
K Madhava Krishna
Abstract:
Semantic segmentation in adverse weather scenarios is a critical task for autonomous driving systems. While foundation models have shown promise, the need for specialized adaptors becomes evident for handling more challenging scenarios. We introduce DiffPrompter, a novel differentiable visual and latent prompting mechanism aimed at expanding the learning capabilities of existing adaptors in founda…
▽ More
Semantic segmentation in adverse weather scenarios is a critical task for autonomous driving systems. While foundation models have shown promise, the need for specialized adaptors becomes evident for handling more challenging scenarios. We introduce DiffPrompter, a novel differentiable visual and latent prompting mechanism aimed at expanding the learning capabilities of existing adaptors in foundation models. Our proposed $\nabla$HFC image processing block excels particularly in adverse weather conditions, where conventional methods often fall short. Furthermore, we investigate the advantages of jointly training visual and latent prompts, demonstrating that this combined approach significantly enhances performance in out-of-distribution scenarios. Our differentiable visual prompts leverage parallel and series architectures to generate prompts, effectively improving object segmentation tasks in adverse conditions. Through a comprehensive series of experiments and evaluations, we provide empirical evidence to support the efficacy of our approach. Project page at https://diffprompter.github.io.
△ Less
Submitted 26 March, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving
Authors:
Tushar Choudhary,
Vikrant Dewangan,
Shivam Chandhok,
Shubham Priyadarshan,
Anushka Jain,
Arun K. Singh,
Siddharth Srivastava,
Krishna Murthy Jatavallabhula,
K. Madhava Krishna
Abstract:
Talk2BEV is a large vision-language model (LVLM) interface for bird's-eye view (BEV) maps in autonomous driving contexts. While existing perception systems for autonomous driving scenarios have largely focused on a pre-defined (closed) set of object categories and driving scenarios, Talk2BEV blends recent advances in general-purpose language and vision models with BEV-structured map representation…
▽ More
Talk2BEV is a large vision-language model (LVLM) interface for bird's-eye view (BEV) maps in autonomous driving contexts. While existing perception systems for autonomous driving scenarios have largely focused on a pre-defined (closed) set of object categories and driving scenarios, Talk2BEV blends recent advances in general-purpose language and vision models with BEV-structured map representations, eliminating the need for task-specific models. This enables a single system to cater to a variety of autonomous driving tasks encompassing visual and spatial reasoning, predicting the intents of traffic actors, and decision-making based on visual cues. We extensively evaluate Talk2BEV on a large number of scene understanding tasks that rely on both the ability to interpret free-form natural language queries, and in grounding these queries to the visual context embedded into the language-enhanced BEV map. To enable further research in LVLMs for autonomous driving scenarios, we develop and release Talk2BEV-Bench, a benchmark encompassing 1000 human-annotated BEV scenarios, with more than 20,000 questions and ground-truth responses from the NuScenes dataset.
△ Less
Submitted 14 November, 2023; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Functional Donoho-Stark Approximate Support Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^n, \{ω_k\}_{k=1}^n)$ be two p-orthonormal bases for a finite dimensional Banach space $\mathcal{X}$. If $ x \in \mathcal{X}\setminus\{0\}$ is such that $θ_fx$ is $\varepsilon$-supported on $M\subseteq \{1,\dots, n\}$ w.r.t. p-norm and $θ_gx$ is $δ$-supported on $N\subseteq \{1,\dots, n\}$ w.r.t. p-norm, then we show that \begin{align}\la…
▽ More
Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^n, \{ω_k\}_{k=1}^n)$ be two p-orthonormal bases for a finite dimensional Banach space $\mathcal{X}$. If $ x \in \mathcal{X}\setminus\{0\}$ is such that $θ_fx$ is $\varepsilon$-supported on $M\subseteq \{1,\dots, n\}$ w.r.t. p-norm and $θ_gx$ is $δ$-supported on $N\subseteq \{1,\dots, n\}$ w.r.t. p-norm, then we show that \begin{align}\label{ME} (1) \quad \quad \quad \quad &o(M)^\frac{1}{p}o(N)^\frac{1}{q}\geq \frac{1}{\displaystyle \max_{1\leq j,k\leq n}|f_j(ω_k) |}\max \{1-\varepsilon-δ, 0\},\\ (2) \quad \quad \quad \quad&o(M)^\frac{1}{q}o(N)^\frac{1}{p}\geq \frac{1}{\displaystyle \max_{1\leq j,k\leq n}|g_k(τ_j) |}\max \{1-\varepsilon-δ, 0\},\label{ME2} \end{align} where \begin{align*} θ_f: \mathcal{X} \ni x \mapsto (f_j(x) )_{j=1}^n \in \ell^p([n]); \quad θ_g: \mathcal{X} \ni x \mapsto (g_k(x) )_{k=1}^n \in \ell^p([n]) \end{align*} and $q$ is the conjugate index of $p$. We call Inequalities (1) and (2) as \textbf{Functional Donoho-Stark Approximate Support Uncertainty Principle}. Inequalities (1) and (2) improve the finite approximate support uncertainty principle obtained by Donoho and Stark \textit{[SIAM J. Appl. Math., 1989]}.
△ Less
Submitted 1 July, 2023;
originally announced July 2023.
-
HyP-NeRF: Learning Improved NeRF Priors using a HyperNetwork
Authors:
Bipasha Sen,
Gaurav Singh,
Aditya Agarwal,
Rohith Agaram,
K Madhava Krishna,
Srinath Sridhar
Abstract:
Neural Radiance Fields (NeRF) have become an increasingly popular representation to capture high-quality appearance and shape of scenes and objects. However, learning generalizable NeRF priors over categories of scenes or objects has been challenging due to the high dimensionality of network weight space. To address the limitations of existing work on generalization, multi-view consistency and to…
▽ More
Neural Radiance Fields (NeRF) have become an increasingly popular representation to capture high-quality appearance and shape of scenes and objects. However, learning generalizable NeRF priors over categories of scenes or objects has been challenging due to the high dimensionality of network weight space. To address the limitations of existing work on generalization, multi-view consistency and to improve quality, we propose HyP-NeRF, a latent conditioning method for learning generalizable category-level NeRF priors using hypernetworks. Rather than using hypernetworks to estimate only the weights of a NeRF, we estimate both the weights and the multi-resolution hash encodings resulting in significant quality gains. To improve quality even further, we incorporate a denoise and finetune strategy that denoises images rendered from NeRFs estimated by the hypernetwork and finetunes it while retaining multiview consistency. These improvements enable us to use HyP-NeRF as a generalizable prior for multiple downstream tasks including NeRF reconstruction from single-view or cluttered scenes and text-to-NeRF. We provide qualitative comparisons and evaluate HyP-NeRF on three tasks: generalization, compression, and retrieval, demonstrating our state-of-the-art results.
△ Less
Submitted 23 December, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
UAP-BEV: Uncertainty Aware Planning using Bird's Eye View generated from Surround Monocular Images
Authors:
Vikrant Dewangan,
Basant Sharma,
Tushar Choudhary,
Sarthak Sharma,
Aakash Aanegola,
Arun K. Singh,
K. Madhava Krishna
Abstract:
Autonomous driving requires accurate reasoning of the location of objects from raw sensor data. Recent end-to-end learning methods go from raw sensor data to a trajectory output via Bird's Eye View(BEV) segmentation as an interpretable intermediate representation. Motion planning over cost maps generated via Birds Eye View (BEV) segmentation has emerged as a prominent approach in autonomous drivin…
▽ More
Autonomous driving requires accurate reasoning of the location of objects from raw sensor data. Recent end-to-end learning methods go from raw sensor data to a trajectory output via Bird's Eye View(BEV) segmentation as an interpretable intermediate representation. Motion planning over cost maps generated via Birds Eye View (BEV) segmentation has emerged as a prominent approach in autonomous driving. However, the current approaches have two critical gaps. First, the optimization process is simplistic and involves just evaluating a fixed set of trajectories over the cost map. The trajectory samples are not adapted based on their associated cost values. Second, the existing cost maps do not account for the uncertainty in the cost maps that can arise due to noise in RGB images, and BEV annotations. As a result, these approaches can struggle in challenging scenarios where there is abrupt cut-in, stopping, overtaking, merging, etc from the neighboring vehicles.
In this paper, we propose UAP-BEV: A novel approach that models the noise in Spatio-Temporal BEV predictions to create an uncertainty-aware occupancy grid map. Using queries of the distance to the closest occupied cell, we obtain a sample estimate of the collision probability of the ego-vehicle. Subsequently, our approach uses gradient-free sampling-based optimization to compute low-cost trajectories over the cost map. Importantly, the sampling distribution is adapted based on the optimal cost values of the sampled trajectories. By explicitly modeling probabilistic collision avoidance in the BEV space, our approach is able to outperform the cost-map-based baselines in collision avoidance, route completion, time to completion, and smoothness. To further validate our method, we also show results on the real-world dataset NuScenes, where we report improvements in collision avoidance and smoothness.
△ Less
Submitted 8 June, 2023;
originally announced June 2023.
-
Functional Ghobber-Jaming Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^n, \{ω_k\}_{k=1}^n)$ be two p-orthonormal bases for a finite dimensional Banach space $\mathcal{X}$. Let $M,N\subseteq \{1, \dots, n\}$ be such that \begin{align*}
o(M)^\frac{1}{q}o(N)^\frac{1}{p}< \frac{1}{\displaystyle \max_{1\leq j,k\leq n}|g_k(τ_j) |}, \end{align*} where $q$ is the conjugate index of $p$. Then for all…
▽ More
Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^n, \{ω_k\}_{k=1}^n)$ be two p-orthonormal bases for a finite dimensional Banach space $\mathcal{X}$. Let $M,N\subseteq \{1, \dots, n\}$ be such that \begin{align*}
o(M)^\frac{1}{q}o(N)^\frac{1}{p}< \frac{1}{\displaystyle \max_{1\leq j,k\leq n}|g_k(τ_j) |}, \end{align*} where $q$ is the conjugate index of $p$. Then for all $x \in \mathcal{X}$, we show that \begin{align}\label{FGJU} (1) \quad \quad \quad \quad \|x\|\leq \left(1+\frac{1}{1-o(M)^\frac{1}{q}o(N)^\frac{1}{p}\displaystyle\max_{1\leq j,k\leq n}|g_k(τ_j)|}\right)\left[\left(\sum_{j\in M^c}|f_j(x)|^p\right)^\frac{1}{p}+\left(\sum_{k\in N^c}|g_k(x) |^p\right)^\frac{1}{p}\right]. \end{align}
We call Inequality (1) as \textbf{Functional Ghobber-Jaming Uncertainty Principle}. Inequality (1) improves the uncertainty principle obtained by Ghobber and Jaming \textit{[Linear Algebra Appl., 2011]}.
△ Less
Submitted 1 June, 2023;
originally announced June 2023.
-
Instance-Level Semantic Maps for Vision Language Navigation
Authors:
Laksh Nanwani,
Anmol Agarwal,
Kanishk Jain,
Raghav Prabhakar,
Aaron Monis,
Aditya Mathur,
Krishna Murthy,
Abdul Hafez,
Vineet Gandhi,
K. Madhava Krishna
Abstract:
Humans have a natural ability to perform semantic associations with the surrounding objects in the environment. This allows them to create a mental map of the environment, allowing them to navigate on-demand when given linguistic instructions. A natural goal in Vision Language Navigation (VLN) research is to impart autonomous agents with similar capabilities. Recent works take a step towards this…
▽ More
Humans have a natural ability to perform semantic associations with the surrounding objects in the environment. This allows them to create a mental map of the environment, allowing them to navigate on-demand when given linguistic instructions. A natural goal in Vision Language Navigation (VLN) research is to impart autonomous agents with similar capabilities. Recent works take a step towards this goal by creating a semantic spatial map representation of the environment without any labeled data. However, their representations are limited for practical applicability as they do not distinguish between different instances of the same object. In this work, we address this limitation by integrating instance-level information into spatial map representation using a community detection algorithm and utilizing word ontology learned by large language models (LLMs) to perform open-set semantic associations in the mapping representation. The resulting map representation improves the navigation performance by two-fold (233%) on realistic language commands with instance-specific descriptions compared to the baseline. We validate the practicality and effectiveness of our approach through extensive qualitative and quantitative experiments.
△ Less
Submitted 1 July, 2023; v1 submitted 21 May, 2023;
originally announced May 2023.
-
Functional Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^m, \{ω_k\}_{k=1}^m)$ be p-Schauder frames for a finite dimensional Banach space $\mathcal{X}$. Then for every $x \in \mathcal{X}\setminus\{0\}$, we show that \begin{align} (1) \quad \|θ_f x\|_0^\frac{1}{p}\|θ_g x\|_0^\frac{1}{q} \geq \frac{1}{\displaystyle\max_{1\leq j\leq n, 1\leq k\leq m}|f_j(ω_k)|}\quad \text{and} \quad \|θ_g x\|_0^\f…
▽ More
Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^m, \{ω_k\}_{k=1}^m)$ be p-Schauder frames for a finite dimensional Banach space $\mathcal{X}$. Then for every $x \in \mathcal{X}\setminus\{0\}$, we show that \begin{align} (1) \quad \|θ_f x\|_0^\frac{1}{p}\|θ_g x\|_0^\frac{1}{q} \geq \frac{1}{\displaystyle\max_{1\leq j\leq n, 1\leq k\leq m}|f_j(ω_k)|}\quad \text{and} \quad \|θ_g x\|_0^\frac{1}{p}\|θ_f x\|_0^\frac{1}{q}\geq \frac{1}{\displaystyle\max_{1\leq j\leq n, 1\leq k\leq m}|g_k(τ_j)|}. \end{align} where \begin{align*} θ_f: \mathcal{X} \ni x \mapsto (f_j(x) )_{j=1}^n \in \ell^p([n]); \quad θ_g: \mathcal{X} \ni x \mapsto (g_k(x) )_{k=1}^m \in \ell^p([m]) \end{align*} and $q$ is the conjugate index of $p$. We call Inequality (1) as \textbf{Functional Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principle}. Inequality (1) improves Ricaud-Torrésani uncertainty principle \textit{[IEEE Trans. Inform. Theory, 2013]}. In particular, it improves Elad-Bruckstein uncertainty principle \textit{[IEEE Trans. Inform. Theory, 2002]} and Donoho-Stark uncertainty principle \textit{[SIAM J. Appl. Math., 1989]}.
△ Less
Submitted 5 April, 2023;
originally announced April 2023.
-
FinderNet: A Data Augmentation Free Canonicalization aided Loop Detection and Closure technique for Point clouds in 6-DOF separation
Authors:
Sudarshan S Harithas,
Gurkirat Singh,
Aneesh Chavan,
Sarthak Sharma,
Suraj Patni,
Chetan Arora,
K. Madhava Krishna
Abstract:
We focus on the problem of LiDAR point cloud based loop detection (or Finding) and closure (LDC) in a multi-agent setting. State-of-the-art (SOTA) techniques directly generate learned embeddings of a given point cloud, require large data transfers, and are not robust to wide variations in 6 Degrees-of-Freedom (DOF) viewpoint. Moreover, absence of strong priors in an unstructured point cloud leads…
▽ More
We focus on the problem of LiDAR point cloud based loop detection (or Finding) and closure (LDC) in a multi-agent setting. State-of-the-art (SOTA) techniques directly generate learned embeddings of a given point cloud, require large data transfers, and are not robust to wide variations in 6 Degrees-of-Freedom (DOF) viewpoint. Moreover, absence of strong priors in an unstructured point cloud leads to highly inaccurate LDC. In this original approach, we propose independent roll and pitch canonicalization of the point clouds using a common dominant ground plane. Discretization of the canonicalized point cloud along the axis perpendicular to the ground plane leads to an image similar to Digital Elevation Maps (DEMs), which exposes strong spatial priors in the scene. Our experiments show that LDC based on learnt embeddings of such DEMs is not only data efficient but also significantly more robust, and generalizable than the current SOTA. We report significant performance gain in terms of Average Precision for loop detection and absolute translation/rotation error for relative pose estimation (or loop closure) on Kitti, GPR and Oxford Robot Car over multiple SOTA LDC methods. Our encoder technique allows to compress the original point cloud by over 830 times. To further test the robustness of our technique we create and opensource a custom dataset called Lidar-UrbanFly Dataset (LUF) which consists of point clouds obtained from a LiDAR mounted on a quadrotor.
△ Less
Submitted 3 April, 2023;
originally announced April 2023.
-
MVRackLay: Monocular Multi-View Layout Estimation for Warehouse Racks and Shelves
Authors:
Pranjali Pathre,
Anurag Sahu,
Ashwin Rao,
Avinash Prabhu,
Meher Shashwat Nigam,
Tanvi Karandikar,
Harit Pandya,
K. Madhava Krishna
Abstract:
In this paper, we propose and showcase, for the first time, monocular multi-view layout estimation for warehouse racks and shelves. Unlike typical layout estimation methods, MVRackLay estimates multi-layered layouts, wherein each layer corresponds to the layout of a shelf within a rack. Given a sequence of images of a warehouse scene, a dual-headed Convolutional-LSTM architecture outputs segmented…
▽ More
In this paper, we propose and showcase, for the first time, monocular multi-view layout estimation for warehouse racks and shelves. Unlike typical layout estimation methods, MVRackLay estimates multi-layered layouts, wherein each layer corresponds to the layout of a shelf within a rack. Given a sequence of images of a warehouse scene, a dual-headed Convolutional-LSTM architecture outputs segmented racks, the front and the top view layout of each shelf within a rack. With minimal effort, such an output is transformed into a 3D rendering of all racks, shelves and objects on the shelves, giving an accurate 3D depiction of the entire warehouse scene in terms of racks, shelves and the number of objects on each shelf. MVRackLay generalizes to a diverse set of warehouse scenes with varying number of objects on each shelf, number of shelves and in the presence of other such racks in the background. Further, MVRackLay shows superior performance vis-a-vis its single view counterpart, RackLay, in layout accuracy, quantized in terms of the mean IoU and mAP metrics. We also showcase a multi-view stitching of the 3D layouts resulting in a representation of the warehouse scene with respect to a global reference frame akin to a rendering of the scene from a SLAM pipeline. To the best of our knowledge, this is the first such work to portray a 3D rendering of a warehouse scene in terms of its semantic components - Racks, Shelves and Objects - all from a single monocular camera.
△ Less
Submitted 30 November, 2022;
originally announced November 2022.
-
Non-Archimedean Welch Bounds and Non-Archimedean Zauner Conjecture
Authors:
K. Mahesh Krishna
Abstract:
Let $\mathbb{K}$ be a non-Archimedean (complete) valued field satisfying \begin{align*} \left|\sum_{j=1}^{n}λ_j^2\right|=\max_{1\leq j \leq n}|λ_j|^2, \quad \forall λ_j \in \mathbb{K}, 1\leq j \leq n, \forall n \in \mathbb{N}. \end{align*} For $d\in \mathbb{N}$, let $\mathbb{K}^d$ be the standard $d$-dimensional non-Archimedean Hilbert space. Let $m \in \mathbb{N}$ and…
▽ More
Let $\mathbb{K}$ be a non-Archimedean (complete) valued field satisfying \begin{align*} \left|\sum_{j=1}^{n}λ_j^2\right|=\max_{1\leq j \leq n}|λ_j|^2, \quad \forall λ_j \in \mathbb{K}, 1\leq j \leq n, \forall n \in \mathbb{N}. \end{align*} For $d\in \mathbb{N}$, let $\mathbb{K}^d$ be the standard $d$-dimensional non-Archimedean Hilbert space. Let $m \in \mathbb{N}$ and $\text{Sym}^m(\mathbb{K}^d)$ be the non-Archimedean Hilbert space of symmetric m-tensors. We prove the following result. If $\{τ_j\}_{j=1}^n$ is a collection in $\mathbb{K}^d$ satisfying $\langle τ_j, τ_j\rangle =1$ for all $1\leq j \leq n$ and the operator $\text{Sym}^m(\mathbb{K}^d)\ni x \mapsto \sum_{j=1}^n\langle x, τ_j^{\otimes m}\rangle τ_j^{\otimes m} \in \text{Sym}^m(\mathbb{K}^d)$ is diagonalizable, then \begin{align} (1) \quad \quad \quad \max_{1\leq j,k \leq n, j \neq k}\{|n|, |\langle τ_j, τ_k\rangle|^{2m} \}\geq \frac{|n|^2}{\left|{d+m-1 \choose m}\right| }. \end{align} We call Inequality (1) as the non-Archimedean version of Welch bounds obtained by Welch [\textit{IEEE Transactions on Information Theory, 1974}]. We formulate non-Archimedean Zauner conjecture.
△ Less
Submitted 28 August, 2022;
originally announced October 2022.
-
GDIP: Gated Differentiable Image Processing for Object-Detection in Adverse Conditions
Authors:
Sanket Kalwar,
Dhruv Patel,
Aakash Aanegola,
Krishna Reddy Konda,
Sourav Garg,
K Madhava Krishna
Abstract:
Detecting objects under adverse weather and lighting conditions is crucial for the safe and continuous operation of an autonomous vehicle, and remains an unsolved problem. We present a Gated Differentiable Image Processing (GDIP) block, a domain-agnostic network architecture, which can be plugged into existing object detection networks (e.g., Yolo) and trained end-to-end with adverse condition ima…
▽ More
Detecting objects under adverse weather and lighting conditions is crucial for the safe and continuous operation of an autonomous vehicle, and remains an unsolved problem. We present a Gated Differentiable Image Processing (GDIP) block, a domain-agnostic network architecture, which can be plugged into existing object detection networks (e.g., Yolo) and trained end-to-end with adverse condition images such as those captured under fog and low lighting. Our proposed GDIP block learns to enhance images directly through the downstream object detection loss. This is achieved by learning parameters of multiple image pre-processing (IP) techniques that operate concurrently, with their outputs combined using weights learned through a novel gating mechanism. We further improve GDIP through a multi-stage guidance procedure for progressive image enhancement. Finally, trading off accuracy for speed, we propose a variant of GDIP that can be used as a regularizer for training Yolo, which eliminates the need for GDIP-based image enhancement during inference, resulting in higher throughput and plausible real-world deployment. We demonstrate significant improvement in detection performance over several state-of-the-art methods through quantitative and qualitative studies on synthetic datasets such as PascalVOC, and real-world foggy (RTTS) and low-lighting (ExDark) datasets.
△ Less
Submitted 29 September, 2022;
originally announced September 2022.
-
UAV-based Visual Remote Sensing for Automated Building Inspection
Authors:
Kushagra Srivastava,
Dhruv Patel,
Aditya Kumar Jha,
Mohhit Kumar Jha,
Jaskirat Singh,
Ravi Kiran Sarvadevabhatla,
Pradeep Kumar Ramancharla,
Harikumar Kandath,
K. Madhava Krishna
Abstract:
Unmanned Aerial Vehicle (UAV) based remote sensing system incorporated with computer vision has demonstrated potential for assisting building construction and in disaster management like damage assessment during earthquakes. The vulnerability of a building to earthquake can be assessed through inspection that takes into account the expected damage progression of the associated component and the co…
▽ More
Unmanned Aerial Vehicle (UAV) based remote sensing system incorporated with computer vision has demonstrated potential for assisting building construction and in disaster management like damage assessment during earthquakes. The vulnerability of a building to earthquake can be assessed through inspection that takes into account the expected damage progression of the associated component and the component's contribution to structural system performance. Most of these inspections are done manually, leading to high utilization of manpower, time, and cost. This paper proposes a methodology to automate these inspections through UAV-based image data collection and a software library for post-processing that helps in estimating the seismic structural parameters. The key parameters considered here are the distances between adjacent buildings, building plan-shape, building plan area, objects on the rooftop and rooftop layout. The accuracy of the proposed methodology in estimating the above-mentioned parameters is verified through field measurements taken using a distance measuring sensor and also from the data obtained through Google Earth. Additional details and code can be accessed from https://uvrsabi.github.io/ .
△ Less
Submitted 27 September, 2022;
originally announced September 2022.
-
Ground then Navigate: Language-guided Navigation in Dynamic Scenes
Authors:
Kanishk Jain,
Varun Chhangani,
Amogh Tiwari,
K. Madhava Krishna,
Vineet Gandhi
Abstract:
We investigate the Vision-and-Language Navigation (VLN) problem in the context of autonomous driving in outdoor settings. We solve the problem by explicitly grounding the navigable regions corresponding to the textual command. At each timestamp, the model predicts a segmentation mask corresponding to the intermediate or the final navigable region. Our work contrasts with existing efforts in VLN, w…
▽ More
We investigate the Vision-and-Language Navigation (VLN) problem in the context of autonomous driving in outdoor settings. We solve the problem by explicitly grounding the navigable regions corresponding to the textual command. At each timestamp, the model predicts a segmentation mask corresponding to the intermediate or the final navigable region. Our work contrasts with existing efforts in VLN, which pose this task as a node selection problem, given a discrete connected graph corresponding to the environment. We do not assume the availability of such a discretised map. Our work moves towards continuity in action space, provides interpretability through visual feedback and allows VLN on commands requiring finer manoeuvres like "park between the two cars". Furthermore, we propose a novel meta-dataset CARLA-NAV to allow efficient training and validation. The dataset comprises pre-recorded training sequences and a live environment for validation and testing. We provide extensive qualitative and quantitive empirical results to validate the efficacy of the proposed approach.
△ Less
Submitted 24 September, 2022;
originally announced September 2022.
-
Real-Time Heuristic Framework for Safe Landing of UAVs in Dynamic Scenarios
Authors:
Jaskirat Singh,
Neel Adwani,
Harikumar Kandath,
K. Madhava Krishna
Abstract:
The world we live in is full of technology and with each passing day the advancement and usage of UAVs increases efficiently. As a result of the many application scenarios, there are some missions where the UAVs are vulnerable to external disruptions, such as a ground station's loss of connectivity, security missions, safety concerns, and delivery-related missions. Therefore, depending on the scen…
▽ More
The world we live in is full of technology and with each passing day the advancement and usage of UAVs increases efficiently. As a result of the many application scenarios, there are some missions where the UAVs are vulnerable to external disruptions, such as a ground station's loss of connectivity, security missions, safety concerns, and delivery-related missions. Therefore, depending on the scenario, this could affect the operations and result in the safe landing of UAVs. Hence, this paper presents a heuristic approach towards safe landing of multi-rotor UAVs in the dynamic environments. The aim of this approach is to detect safe potential landing zones - PLZ, and find out the best one to land in. The PLZ is initially, detected by processing an image through the canny edge algorithm, and then the diameter-area estimation is applied for each region with minimal edges. The spots that have a higher area than the vehicle's clearance are labeled as safe PLZ. Onto the second phase of this approach, the velocities of dynamic obstacles that are moving towards the PLZs are calculated and their time to reach the zones are taken into consideration. The ETA of the UAV is calculated and during the descending of UAV, the dynamic obstacle avoidance is executed. The approach tested on the real-world environments have shown better results from existing work.
△ Less
Submitted 11 September, 2022;
originally announced September 2022.
-
Leveraging Distributional Bias for Reactive Collision Avoidance under Uncertainty: A Kernel Embedding Approach
Authors:
Anish Gupta,
Arun Kumar Singh,
K. Madhava Krishna
Abstract:
Many commodity sensors that measure the robot and dynamic obstacle's state have non-Gaussian noise characteristics. Yet, many current approaches treat the underlying-uncertainty in motion and perception as Gaussian, primarily to ensure computational tractability. On the other hand, existing planners working with non-Gaussian uncertainty do not shed light on leveraging distributional characteristic…
▽ More
Many commodity sensors that measure the robot and dynamic obstacle's state have non-Gaussian noise characteristics. Yet, many current approaches treat the underlying-uncertainty in motion and perception as Gaussian, primarily to ensure computational tractability. On the other hand, existing planners working with non-Gaussian uncertainty do not shed light on leveraging distributional characteristics of motion and perception noise, such as bias for efficient collision avoidance.
This paper fills this gap by interpreting reactive collision avoidance as a distribution matching problem between the collision constraint violations and Dirac Delta distribution. To ensure fast reactivity in the planner, we embed each distribution in Reproducing Kernel Hilbert Space and reformulate the distribution matching as minimizing the Maximum Mean Discrepancy (MMD) between the two distributions. We show that evaluating the MMD for a given control input boils down to just matrix-matrix products. We leverage this insight to develop a simple control sampling approach for reactive collision avoidance with dynamic and uncertain obstacles.
We advance the state-of-the-art in two respects. First, we conduct an extensive empirical study to show that our planner can infer distributional bias from sample-level information. Consequently, it uses this insight to guide the robot to good homotopy. We also highlight how a Gaussian approximation of the underlying uncertainty can lose the bias estimate and guide the robot to unfavorable states with a high collision probability. Second, we show tangible comparative advantages of the proposed distribution matching approach for collision avoidance with previous non-parametric and Gaussian approximated methods of reactive collision avoidance.
△ Less
Submitted 22 September, 2022; v1 submitted 5 August, 2022;
originally announced August 2022.
-
Flow Synthesis Based Visual Servoing Frameworks for Monocular Obstacle Avoidance Amidst High-Rises
Authors:
Harshit K. Sankhla,
M. Nomaan Qureshi,
Shankara Narayanan V.,
Vedansh Mittal,
Gunjan Gupta,
Harit Pandya,
K. Madhava Krishna
Abstract:
We propose a novel flow synthesis based visual servoing framework enabling long-range obstacle avoidance for Micro Air Vehicles (MAV) flying amongst tall skyscrapers. Recent deep learning based frameworks use optical flow to do high-precision visual servoing. In this paper, we explore the question: can we design a surrogate flow for these high-precision visual-servoing methods, which leads to obst…
▽ More
We propose a novel flow synthesis based visual servoing framework enabling long-range obstacle avoidance for Micro Air Vehicles (MAV) flying amongst tall skyscrapers. Recent deep learning based frameworks use optical flow to do high-precision visual servoing. In this paper, we explore the question: can we design a surrogate flow for these high-precision visual-servoing methods, which leads to obstacle avoidance? We revisit the concept of saliency for identifying high-rise structures in/close to the line of attack amongst other competing skyscrapers and buildings as a collision obstacle. A synthesised flow is used to displace the salient object segmentation mask. This flow is so computed that the visual servoing controller maneuvers the MAV safely around the obstacle. In this approach, we use a multi-step Cross-Entropy Method (CEM) based servo control to achieve flow convergence, resulting in obstacle avoidance. We use this novel pipeline to successfully and persistently maneuver high-rises and reach the goal in simulated and photo-realistic real-world scenes. We conduct extensive experimentation and compare our approach with optical flow and short-range depth-based obstacle avoidance methods to demonstrate the proposed framework's merit. Additional Visualisation can be found at https://sites.google.com/view/monocular-obstacle/home
△ Less
Submitted 7 July, 2022;
originally announced July 2022.
-
Approaches and Challenges in Robotic Perception for Table-top Rearrangement and Planning
Authors:
Aditya Agarwal,
Bipasha Sen,
Shankara Narayanan V,
Vishal Reddy Mandadi,
Brojeshwar Bhowmick,
K Madhava Krishna
Abstract:
Table-top Rearrangement and Planning is a challenging problem that relies heavily on an excellent perception stack. The perception stack involves observing and registering the 3D scene on the table, detecting what objects are on the table, and how to manipulate them. Consequently, it greatly influences the system's task-planning and motion-planning stacks that follow. We present a comprehensive ov…
▽ More
Table-top Rearrangement and Planning is a challenging problem that relies heavily on an excellent perception stack. The perception stack involves observing and registering the 3D scene on the table, detecting what objects are on the table, and how to manipulate them. Consequently, it greatly influences the system's task-planning and motion-planning stacks that follow. We present a comprehensive overview and discuss the different challenges associated with the perception module. This work is a result of our extensive involvement in the ICRA 2022 Open Cloud Robot Table Organization Challenge, in which we stood third in the final rankings.
△ Less
Submitted 3 June, 2022; v1 submitted 9 May, 2022;
originally announced May 2022.
-
UrbanFly: Uncertainty-Aware Planning for Navigation Amongst High-Rises with Monocular Visual-Inertial SLAM Maps
Authors:
Sudarshan S Harithas,
Ayyappa Swamy Thatavarthy,
Gurkirat Singh,
Arun K Singh,
K Madhava Krishna
Abstract:
We present UrbanFly: an uncertainty-aware real-time planning framework for quadrotor navigation in urban high-rise environments. A core aspect of UrbanFly is its ability to robustly plan directly on the sparse point clouds generated by a Monocular Visual Inertial SLAM (VINS) backend. It achieves this by using the sparse point clouds to build an uncertainty-integrated cuboid representation of the e…
▽ More
We present UrbanFly: an uncertainty-aware real-time planning framework for quadrotor navigation in urban high-rise environments. A core aspect of UrbanFly is its ability to robustly plan directly on the sparse point clouds generated by a Monocular Visual Inertial SLAM (VINS) backend. It achieves this by using the sparse point clouds to build an uncertainty-integrated cuboid representation of the environment through a data-driven monocular plane segmentation network. Our chosen world model provides faster distance queries than the more common voxel-grid representation, and UrbanFly leverages this capability in two different ways leading to two trajectory optimizers. The first optimizer uses a gradient-free cross-entropy method to compute trajectories that minimize collision probability and smoothness cost. Our second optimizer is a simplified version of the first and uses a sequential convex programming optimizer initialized based on probabilistic safety estimates on a set of randomly drawn trajectories. Both our trajectory optimizers are made computationally tractable and independent of the nature of underlying uncertainty by embedding the distribution of collision violations in Reproducing Kernel Hilbert Space. Empowered by the algorithmic innovation, UrbanFly outperforms competing baselines in metrics such as collision rate, trajectory length, etc., on a high-fidelity AirSim simulator augmented with synthetic and real-world dataset scenes.
△ Less
Submitted 3 October, 2022; v1 submitted 2 April, 2022;
originally announced April 2022.
-
Drift Reduced Navigation with Deep Explainable Features
Authors:
Mohd Omama,
Sundar Sripada Venugopalaswamy Sriraman,
Sandeep Chinchali,
Arun Kumar Singh,
K. Madhava Krishna
Abstract:
Modern autonomous vehicles (AVs) often rely on vision, LIDAR, and even radar-based simultaneous localization and mapping (SLAM) frameworks for precise localization and navigation. However, modern SLAM frameworks often lead to unacceptably high levels of drift (i.e., localization error) when AVs observe few visually distinct features or encounter occlusions due to dynamic obstacles. This paper argu…
▽ More
Modern autonomous vehicles (AVs) often rely on vision, LIDAR, and even radar-based simultaneous localization and mapping (SLAM) frameworks for precise localization and navigation. However, modern SLAM frameworks often lead to unacceptably high levels of drift (i.e., localization error) when AVs observe few visually distinct features or encounter occlusions due to dynamic obstacles. This paper argues that minimizing drift must be a key desiderata in AV motion planning, which requires an AV to take active control decisions to move towards feature-rich regions while also minimizing conventional control cost. To do so, we first introduce a novel data-driven perception module that observes LIDAR point clouds and estimates which features/regions an AV must navigate towards for drift minimization. Then, we introduce an interpretable model predictive controller (MPC) that moves an AV toward such feature-rich regions while avoiding visual occlusions and gracefully trading off drift and control cost. Our experiments on challenging, dynamic scenarios in the state-of-the-art CARLA simulator indicate our method reduces drift up to 76.76% compared to benchmark approaches.
△ Less
Submitted 25 November, 2022; v1 submitted 14 March, 2022;
originally announced March 2022.
-
ReF -- Rotation Equivariant Features for Local Feature Matching
Authors:
Abhishek Peri,
Kinal Mehta,
Avneesh Mishra,
Michael Milford,
Sourav Garg,
K. Madhava Krishna
Abstract:
Sparse local feature matching is pivotal for many computer vision and robotics tasks. To improve their invariance to challenging appearance conditions and viewing angles, and hence their usefulness, existing learning-based methods have primarily focused on data augmentation-based training. In this work, we propose an alternative, complementary approach that centers on inducing bias in the model ar…
▽ More
Sparse local feature matching is pivotal for many computer vision and robotics tasks. To improve their invariance to challenging appearance conditions and viewing angles, and hence their usefulness, existing learning-based methods have primarily focused on data augmentation-based training. In this work, we propose an alternative, complementary approach that centers on inducing bias in the model architecture itself to generate `rotation-specific' features using Steerable E2-CNNs, that are then group-pooled to achieve rotation-invariant local features. We demonstrate that this high performance, rotation-specific coverage from the steerable CNNs can be expanded to all rotation angles by combining it with augmentation-trained standard CNNs which have broader coverage but are often inaccurate, thus creating a state-of-the-art rotation-robust local feature matcher. We benchmark our proposed methods against existing techniques on HPatches and a newly proposed UrbanScenes3D-Air dataset for visual place recognition. Furthermore, we present a detailed analysis of the performance effects of ensembling, robust estimation, network architecture variations, and the use of rotation priors.
△ Less
Submitted 10 March, 2022;
originally announced March 2022.
-
The $(abc,pqr)$-problem for Approximate Schauder Frames for Banach Spaces
Authors:
K. Mahesh Krishna
Abstract:
Motivated from the complete solution of important $abc$-problem for Gabor system for the Hilbert space $\mathcal{L}^2(\mathbb{R})$ by Dai and Sun [\textit{Memoirs of Amer. Math. Soc., 2016}] and from the existential result of approximate Schauder frames for $\mathcal{L}^p(\mathbb{R})$ using translation operators on $\mathcal{L}^p(\mathbb{R})$ by Freeman, Odell, Schlumprecht, and Zsak [\textit{Isra…
▽ More
Motivated from the complete solution of important $abc$-problem for Gabor system for the Hilbert space $\mathcal{L}^2(\mathbb{R})$ by Dai and Sun [\textit{Memoirs of Amer. Math. Soc., 2016}] and from the existential result of approximate Schauder frames for $\mathcal{L}^p(\mathbb{R})$ using translation operators on $\mathcal{L}^p(\mathbb{R})$ by Freeman, Odell, Schlumprecht, and Zsak [\textit{Israel. J. Math, 2014}], we formulate $(abc, pqr)$-problem for approximate Schauder frames for Banach spaces $\mathcal{L}^p(\mathbb{R})$, $1<p<\infty$.
△ Less
Submitted 12 January, 2022;
originally announced January 2022.
-
Non Holonomic Collision Avoidance of Dynamic Obstacles under Non-Parametric Uncertainty: A Hilbert Space Approach
Authors:
Unni Krishnan R Nair,
Anish Gupta,
D. A. Sasi Kiran,
Ajay Shrihari,
Vanshil Shah,
Arun Kumar Singh,
K. Madhava Krishna
Abstract:
We consider the problem of an agent/robot with non-holonomic kinematics avoiding many dynamic obstacles. State and velocity noise of both the robot and obstacles as well as the robot's control noise are modelled as non-parametric distributions as often the Gaussian assumptions of noise models are violated in real-world scenarios. Under these assumptions, we formulate a robust MPC that samples robo…
▽ More
We consider the problem of an agent/robot with non-holonomic kinematics avoiding many dynamic obstacles. State and velocity noise of both the robot and obstacles as well as the robot's control noise are modelled as non-parametric distributions as often the Gaussian assumptions of noise models are violated in real-world scenarios. Under these assumptions, we formulate a robust MPC that samples robotic controls effectively in a manner that aligns the robot to the goal state while avoiding obstacles under the duress of such non-parametric noise. In particular, the MPC incorporates a distribution matching cost that effectively aligns the distribution of the current collision cone to a certain desired distribution whose samples are collision-free. This cost is posed as a distance function in the Hilbert Space, whose minimization typically results in the collision cone samples becoming collision-free. We compare and show tangible performance gain with methods that model the collision cone distribution by linearizing the Gaussian approximations of the original non-parametric state and obstacle distributions. We also show superior performance with methods that pose a chance constraint formulation of the Gaussian approximations of non-parametric noise without subjecting such approximations to further linearizations. The performance gain is shown both in terms of trajectory length and control costs that vindicates the efficacy of the proposed method. To the best of our knowledge, this is the first presentation of non-holonomic collision avoidance of moving obstacles in the presence of non-parametric state, velocity and actuator noise models.
△ Less
Submitted 1 January, 2022; v1 submitted 24 December, 2021;
originally announced December 2021.
-
Grounding Linguistic Commands to Navigable Regions
Authors:
Nivedita Rufus,
Kanishk Jain,
Unni Krishnan R Nair,
Vineet Gandhi,
K Madhava Krishna
Abstract:
Humans have a natural ability to effortlessly comprehend linguistic commands such as "park next to the yellow sedan" and instinctively know which region of the road the vehicle should navigate. Extending this ability to autonomous vehicles is the next step towards creating fully autonomous agents that respond and act according to human commands. To this end, we propose the novel task of Referring…
▽ More
Humans have a natural ability to effortlessly comprehend linguistic commands such as "park next to the yellow sedan" and instinctively know which region of the road the vehicle should navigate. Extending this ability to autonomous vehicles is the next step towards creating fully autonomous agents that respond and act according to human commands. To this end, we propose the novel task of Referring Navigable Regions (RNR), i.e., grounding regions of interest for navigation based on the linguistic command. RNR is different from Referring Image Segmentation (RIS), which focuses on grounding an object referred to by the natural language expression instead of grounding a navigable region. For example, for a command "park next to the yellow sedan," RIS will aim to segment the referred sedan, and RNR aims to segment the suggested parking region on the road. We introduce a new dataset, Talk2Car-RegSeg, which extends the existing Talk2car dataset with segmentation masks for the regions described by the linguistic commands. A separate test split with concise manoeuvre-oriented commands is provided to assess the practicality of our dataset. We benchmark the proposed dataset using a novel transformer-based architecture. We present extensive ablations and show superior performance over baselines on multiple evaluation metrics. A downstream path planner generating trajectories based on RNR outputs confirms the efficacy of the proposed framework.
△ Less
Submitted 24 December, 2021;
originally announced December 2021.
-
Design And Analysis Of Three-Output Open Differential with 3-DOF
Authors:
Rama Vadapalli,
Nagamanikandan Govindan,
K Madhava Krishna
Abstract:
This paper presents a novel passive three-output differential with three degrees of freedom (3DOF), that translates motion and torque from a single input to three outputs. The proposed Three-Output Open Differential is designed such that its functioning is analogous to the functioning of a traditional two-output open differential. That is, the differential translates equal motion and torque to all…
▽ More
This paper presents a novel passive three-output differential with three degrees of freedom (3DOF), that translates motion and torque from a single input to three outputs. The proposed Three-Output Open Differential is designed such that its functioning is analogous to the functioning of a traditional two-output open differential. That is, the differential translates equal motion and torque to all its three outputs when the outputs are unconstrained or are subjected to equivalent load conditions. The introduced design is the first differential with three outputs to realise this outcome. The differential action between the three outputs is realised passively by a symmetric arrangement of three two-output open differentials and three two-input open differentials. The resulting differential mechanism achieves the novel result of equivalent input to output angular velocity and torque relations for all its three outputs. Furthermore, Three-Output Open Differential achieves the novel result for differentials with more than two outputs where each of its outputs shares equivalent angular velocity and torque relations with all the other outputs. The kinematics and dynamics of the Three-Output Open Differential are derived using the bond graph method. In addition, the merits of the differential mechanism along with its current and potential applications are presented.
△ Less
Submitted 21 December, 2021;
originally announced December 2021.
-
Modular Pipe Climber III with Three-Output Open Differential
Authors:
Rama Vadapalli,
Saharsh Agarwal,
Vishnu Kumar,
Kartik Suryavanshi,
Nagamanikandan,
K Madhava Krishna
Abstract:
The paper introduces the novel Modular Pipe Climber III with a Three-Output Open Differential (3-OOD) mechanism to eliminate slipping of the tracks due to the changing cross-sections of the pipe. This will be achieved in any orientation of the robot. Previous pipe climbers use three-wheel/track modules, each with an individual driving mechanism to achieve stable traversing. Slipping of tracks is p…
▽ More
The paper introduces the novel Modular Pipe Climber III with a Three-Output Open Differential (3-OOD) mechanism to eliminate slipping of the tracks due to the changing cross-sections of the pipe. This will be achieved in any orientation of the robot. Previous pipe climbers use three-wheel/track modules, each with an individual driving mechanism to achieve stable traversing. Slipping of tracks is prevalent in such robots when it encounters the pipe turns. Thus, active control of each module's speed is employed to mitigate the slip, thereby requiring substantial control effort. The proposed pipe climber implements the 3-OOD to address this issue by allowing the robot to mechanically modulate the track speeds as it encounters a turn. The proposed 3-OOD is the first three-output differential to realize the functional abilities of a traditional two-output differential.
△ Less
Submitted 8 January, 2022; v1 submitted 1 November, 2021;
originally announced December 2021.
-
Learning Actions for Drift-Free Navigation in Highly Dynamic Scenes
Authors:
Mohd Omama,
Sundar Sripada V. S.,
Sandeep Chinchali,
K. Madhava Krishna
Abstract:
We embark on a hitherto unreported problem of an autonomous robot (self-driving car) navigating in dynamic scenes in a manner that reduces its localization error and eventual cumulative drift or Absolute Trajectory Error, which is pronounced in such dynamic scenes. With the hugely popular Velodyne-16 3D LIDAR as the main sensing modality, and the accurate LIDAR-based Localization and Mapping algor…
▽ More
We embark on a hitherto unreported problem of an autonomous robot (self-driving car) navigating in dynamic scenes in a manner that reduces its localization error and eventual cumulative drift or Absolute Trajectory Error, which is pronounced in such dynamic scenes. With the hugely popular Velodyne-16 3D LIDAR as the main sensing modality, and the accurate LIDAR-based Localization and Mapping algorithm, LOAM, as the state estimation framework, we show that in the absence of a navigation policy, drift rapidly accumulates in the presence of moving objects. To overcome this, we learn actions that lead to drift-minimized navigation through a suitable set of reward and penalty functions. We use Proximal Policy Optimization, a class of Deep Reinforcement Learning methods, to learn the actions that result in drift-minimized trajectories. We show by extensive comparisons on a variety of synthetic, yet photo-realistic scenes made available through the CARLA Simulator the superior performance of the proposed framework vis-a-vis methods that do not adopt such policies.
△ Less
Submitted 31 March, 2022; v1 submitted 28 October, 2021;
originally announced October 2021.
-
CCO-VOXEL: Chance Constrained Optimization over Uncertain Voxel-Grid Representation for Safe Trajectory Planning
Authors:
Sudarshan S Harithas,
Rishabh Dev Yadav,
Deepak Singh,
Arun Kumar Singh,
K Madhava Krishna
Abstract:
We present CCO-VOXEL: the very first chance-constrained optimization (CCO) algorithm that can compute trajectory plans with probabilistic safety guarantees in real-time directly on the voxel-grid representation of the world. CCO-VOXEL maps the distribution over the distance to the closest obstacle to a distribution over collision-constraint violation and computes an optimal trajectory that minimiz…
▽ More
We present CCO-VOXEL: the very first chance-constrained optimization (CCO) algorithm that can compute trajectory plans with probabilistic safety guarantees in real-time directly on the voxel-grid representation of the world. CCO-VOXEL maps the distribution over the distance to the closest obstacle to a distribution over collision-constraint violation and computes an optimal trajectory that minimizes the violation probability. Importantly, unlike existing works, we never assume the nature of the sensor uncertainty or the probability distribution of the resulting collision-constraint violations. We leverage the notion of Hilbert Space embedding of distributions and Maximum Mean Discrepancy (MMD) to compute a tractable surrogate for the original chance-constrained optimization problem and employ a combination of A* based graph-search and Cross-Entropy Method for obtaining its minimum. We show tangible performance gain in terms of collision avoidance and trajectory smoothness as a consequence of our probabilistic formulation vis a vis state-of-the-art planning methods that do not account for such nonparametric noise. Finally, we also show how a combination of low-dimensional feature embedding and pre-caching of Kernel Matrices of MMD allows us to achieve real-time performance in simulations as well as in implementations on on-board commodity hardware that controls the quadrotor flight
△ Less
Submitted 5 April, 2022; v1 submitted 6 October, 2021;
originally announced October 2021.
-
Multi-Modal Model Predictive Control through Batch Non-Holonomic Trajectory Optimization: Application to Highway Driving
Authors:
Vivek K. Adajania,
Aditya Sharma,
Anish Gupta,
Houman Masnavi,
K Madhava Krishna,
Arun K. Singh
Abstract:
Standard Model Predictive Control (MPC) or trajectory optimization approaches perform only a local search to solve a complex non-convex optimization problem. As a result, they cannot capture the multi-modal characteristic of human driving. A global optimizer can be a potential solution but is computationally intractable in a real-time setting. In this paper, we present a real-time MPC capable of s…
▽ More
Standard Model Predictive Control (MPC) or trajectory optimization approaches perform only a local search to solve a complex non-convex optimization problem. As a result, they cannot capture the multi-modal characteristic of human driving. A global optimizer can be a potential solution but is computationally intractable in a real-time setting. In this paper, we present a real-time MPC capable of searching over different driving modalities. Our basic idea is simple: we run several goal-directed parallel trajectory optimizations and score the resulting trajectories based on user-defined meta cost functions. This allows us to perform a global search over several locally optimal motion plans. Although conceptually straightforward, realizing this idea in real-time with existing optimizers is highly challenging from technical and computational standpoints. With this motivation, we present a novel batch non-holonomic trajectory optimization whose underlying matrix algebra is easily parallelizable across problem instances and reduces to computing large batch matrix-vector products. This structure, in turn, is achieved by deriving a linearization-free multi-convex reformulation of the non-holonomic kinematics and collision avoidance constraints. We extensively validate our approach using both synthetic and real data sets (NGSIM) of traffic scenarios. We highlight how our algorithm automatically takes lane-change and overtaking decisions based on the defined meta cost function. Our batch optimizer achieves trajectories with lower meta cost, up to 6x faster than competing baselines.
△ Less
Submitted 14 March, 2022; v1 submitted 21 September, 2021;
originally announced September 2021.
-
AutoLay: Benchmarking amodal layout estimation for autonomous driving
Authors:
Kaustubh Mani,
N. Sai Shankar,
Krishna Murthy Jatavallabhula,
K. Madhava Krishna
Abstract:
Given an image or a video captured from a monocular camera, amodal layout estimation is the task of predicting semantics and occupancy in bird's eye view. The term amodal implies we also reason about entities in the scene that are occluded or truncated in image space. While several recent efforts have tackled this problem, there is a lack of standardization in task specification, datasets, and eva…
▽ More
Given an image or a video captured from a monocular camera, amodal layout estimation is the task of predicting semantics and occupancy in bird's eye view. The term amodal implies we also reason about entities in the scene that are occluded or truncated in image space. While several recent efforts have tackled this problem, there is a lack of standardization in task specification, datasets, and evaluation protocols. We address these gaps with AutoLay, a dataset and benchmark for amodal layout estimation from monocular images. AutoLay encompasses driving imagery from two popular datasets: KITTI and Argoverse. In addition to fine-grained attributes such as lanes, sidewalks, and vehicles, we also provide semantically annotated 3D point clouds. We implement several baselines and bleeding edge approaches, and release our data and code.
△ Less
Submitted 20 August, 2021;
originally announced August 2021.
-
RP-VIO: Robust Plane-based Visual-Inertial Odometry for Dynamic Environments
Authors:
Karnik Ram,
Chaitanya Kharyal,
Sudarshan S. Harithas,
K. Madhava Krishna
Abstract:
Modern visual-inertial navigation systems (VINS) are faced with a critical challenge in real-world deployment: they need to operate reliably and robustly in highly dynamic environments. Current best solutions merely filter dynamic objects as outliers based on the semantics of the object category. Such an approach does not scale as it requires semantic classifiers to encompass all possibly-moving o…
▽ More
Modern visual-inertial navigation systems (VINS) are faced with a critical challenge in real-world deployment: they need to operate reliably and robustly in highly dynamic environments. Current best solutions merely filter dynamic objects as outliers based on the semantics of the object category. Such an approach does not scale as it requires semantic classifiers to encompass all possibly-moving object classes; this is hard to define, let alone deploy. On the other hand, many real-world environments exhibit strong structural regularities in the form of planes such as walls and ground surfaces, which are also crucially static. We present RP-VIO, a monocular visual-inertial odometry system that leverages the simple geometry of these planes for improved robustness and accuracy in challenging dynamic environments. Since existing datasets have a limited number of dynamic elements, we also present a highly-dynamic, photorealistic synthetic dataset for a more effective evaluation of the capabilities of modern VINS systems. We evaluate our approach on this dataset, and three diverse sequences from standard datasets including two real-world dynamic sequences and show a significant improvement in robustness and accuracy over a state-of-the-art monocular visual-inertial odometry system. We also show in simulation an improvement over a simple dynamic-features masking approach. Our code and dataset are publicly available.
△ Less
Submitted 5 December, 2021; v1 submitted 18 March, 2021;
originally announced March 2021.
-
Monocular Multi-Layer Layout Estimation for Warehouse Racks
Authors:
Meher Shashwat Nigam,
Avinash Prabhu,
Anurag Sahu,
Puru Gupta,
Tanvi Karandikar,
N. Sai Shankar,
Ravi Kiran Sarvadevabhatla,
K. Madhava Krishna
Abstract:
Given a monocular colour image of a warehouse rack, we aim to predict the bird's-eye view layout for each shelf in the rack, which we term as multi-layer layout prediction. To this end, we present RackLay, a deep neural network for real-time shelf layout estimation from a single image. Unlike previous layout estimation methods, which provide a single layout for the dominant ground plane alone, Rac…
▽ More
Given a monocular colour image of a warehouse rack, we aim to predict the bird's-eye view layout for each shelf in the rack, which we term as multi-layer layout prediction. To this end, we present RackLay, a deep neural network for real-time shelf layout estimation from a single image. Unlike previous layout estimation methods, which provide a single layout for the dominant ground plane alone, RackLay estimates the top-view and front-view layout for each shelf in the considered rack populated with objects. RackLay's architecture and its variants are versatile and estimate accurate layouts for diverse scenes characterized by varying number of visible shelves in an image, large range in shelf occupancy factor and varied background clutter. Given the extreme paucity of datasets in this space and the difficulty involved in acquiring real data from warehouses, we additionally release a flexible synthetic dataset generation pipeline WareSynth which allows users to control the generation process and tailor the dataset according to contingent application. The ablations across architectural variants and comparison with strong prior baselines vindicate the efficacy of RackLay as an apt architecture for the novel problem of multi-layered layout estimation. We also show that fusing the top-view and front-view enables 3D reasoning applications such as metric free space estimation for the considered rack.
△ Less
Submitted 28 October, 2021; v1 submitted 16 March, 2021;
originally announced March 2021.
-
RoRD: Rotation-Robust Descriptors and Orthographic Views for Local Feature Matching
Authors:
Udit Singh Parihar,
Aniket Gujarathi,
Kinal Mehta,
Satyajit Tourani,
Sourav Garg,
Michael Milford,
K. Madhava Krishna
Abstract:
The use of local detectors and descriptors in typical computer vision pipelines work well until variations in viewpoint and appearance change become extreme. Past research in this area has typically focused on one of two approaches to this challenge: the use of projections into spaces more suitable for feature matching under extreme viewpoint changes, and attempting to learn features that are inhe…
▽ More
The use of local detectors and descriptors in typical computer vision pipelines work well until variations in viewpoint and appearance change become extreme. Past research in this area has typically focused on one of two approaches to this challenge: the use of projections into spaces more suitable for feature matching under extreme viewpoint changes, and attempting to learn features that are inherently more robust to viewpoint change. In this paper, we present a novel framework that combines learning of invariant descriptors through data augmentation and orthographic viewpoint projection. We propose rotation-robust local descriptors, learnt through training data augmentation based on rotation homographies, and a correspondence ensemble technique that combines vanilla feature correspondences with those obtained through rotation-robust features. Using a range of benchmark datasets as well as contributing a new bespoke dataset for this research domain, we evaluate the effectiveness of the proposed approach on key tasks including pose estimation and visual place recognition. Our system outperforms a range of baseline and state-of-the-art techniques, including enabling higher levels of place recognition precision across opposing place viewpoints and achieves practically-useful performance levels even under extreme viewpoint changes.
△ Less
Submitted 24 March, 2022; v1 submitted 15 March, 2021;
originally announced March 2021.
-
DRACO: Weakly Supervised Dense Reconstruction And Canonicalization of Objects
Authors:
Rahul Sajnani,
AadilMehdi Sanchawala,
Krishna Murthy Jatavallabhula,
Srinath Sridhar,
K. Madhava Krishna
Abstract:
We present DRACO, a method for Dense Reconstruction And Canonicalization of Object shape from one or more RGB images. Canonical shape reconstruction, estimating 3D object shape in a coordinate space canonicalized for scale, rotation, and translation parameters, is an emerging paradigm that holds promise for a multitude of robotic applications. Prior approaches either rely on painstakingly gathered…
▽ More
We present DRACO, a method for Dense Reconstruction And Canonicalization of Object shape from one or more RGB images. Canonical shape reconstruction, estimating 3D object shape in a coordinate space canonicalized for scale, rotation, and translation parameters, is an emerging paradigm that holds promise for a multitude of robotic applications. Prior approaches either rely on painstakingly gathered dense 3D supervision, or produce only sparse canonical representations, limiting real-world applicability. DRACO performs dense canonicalization using only weak supervision in the form of camera poses and semantic keypoints at train time. During inference, DRACO predicts dense object-centric depth maps in a canonical coordinate-space, solely using one or more RGB images of an object. Extensive experiments on canonical shape reconstruction and pose estimation show that DRACO is competitive or superior to fully-supervised methods.
△ Less
Submitted 25 November, 2020;
originally announced November 2020.
-
BirdSLAM: Monocular Multibody SLAM in Bird's-Eye View
Authors:
Swapnil Daga,
Gokul B. Nair,
Anirudha Ramesh,
Rahul Sajnani,
Junaid Ahmed Ansari,
K. Madhava Krishna
Abstract:
In this paper, we present BirdSLAM, a novel simultaneous localization and mapping (SLAM) system for the challenging scenario of autonomous driving platforms equipped with only a monocular camera. BirdSLAM tackles challenges faced by other monocular SLAM systems (such as scale ambiguity in monocular reconstruction, dynamic object localization, and uncertainty in feature representation) by using an…
▽ More
In this paper, we present BirdSLAM, a novel simultaneous localization and mapping (SLAM) system for the challenging scenario of autonomous driving platforms equipped with only a monocular camera. BirdSLAM tackles challenges faced by other monocular SLAM systems (such as scale ambiguity in monocular reconstruction, dynamic object localization, and uncertainty in feature representation) by using an orthographic (bird's-eye) view as the configuration space in which localization and mapping are performed. By assuming only the height of the ego-camera above the ground, BirdSLAM leverages single-view metrology cues to accurately localize the ego-vehicle and all other traffic participants in bird's-eye view. We demonstrate that our system outperforms prior work that uses strictly greater information, and highlight the relevance of each design decision via an ablation analysis.
△ Less
Submitted 15 November, 2020;
originally announced November 2020.