-
Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models
Authors:
Yi Yang,
Qingwen Zhang,
Kei Ikemura,
Nazre Batool,
John Folkesson
Abstract:
Addressing hard cases in autonomous driving, such as anomalous road users, extreme weather conditions, and complex traffic interactions, presents significant challenges. To ensure safety, autonomous driving systems must detect and manage these scenarios effectively. However, the rarity and high-risk nature of these cases demand extensive, diverse datasets for training robust models. Vision-Language Foundation Models (VLMs) have shown remarkable zero-shot capabilities as a result of being trained on extensive datasets. This work explores the potential of VLMs for detecting hard cases in autonomous driving. We demonstrate the capability of VLMs such as GPT-4v to detect hard cases in traffic participant motion prediction at both the agent and scenario levels. We introduce a feasible pipeline in which VLMs, fed with sequential image frames and designed prompts, effectively identify challenging agents or scenarios, which are then verified by existing prediction models. Moreover, by taking advantage of this VLM-based detection of hard cases, we further improve the training efficiency of the existing motion prediction pipeline by performing data selection on the training samples suggested by GPT. We show the effectiveness and feasibility of our pipeline incorporating VLMs with state-of-the-art methods on the NuScenes dataset. The code is available at https://github.com/KTH-RPL/Detect_VLM.
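The verification step can be sketched as follows: if the VLM's flags are meaningful, agents it marks as hard should show larger errors from an existing motion predictor. This is a minimal illustration under stated assumptions, not the authors' pipeline: the per-agent error dictionary (for example minADE in metres) and the set of flagged agent IDs are hypothetical inputs, and the function name `compare_flagged_errors` is invented for this example.

```python
from statistics import mean


def compare_flagged_errors(errors, flagged):
    """Mean prediction error for VLM-flagged vs. unflagged agents.

    errors:  dict agent_id -> displacement error produced by an existing
             motion prediction model (hypothetical input, e.g. minADE in m)
    flagged: set of agent_ids the VLM marked as hard cases (hypothetical)

    Both groups must be non-empty, otherwise statistics.mean raises.
    """
    hard = [e for agent, e in errors.items() if agent in flagged]
    easy = [e for agent, e in errors.items() if agent not in flagged]
    hard_mean, easy_mean = mean(hard), mean(easy)
    # A positive gap suggests the VLM's picks really are harder for the predictor.
    return hard_mean, easy_mean, hard_mean - easy_mean


errors = {"a1": 0.4, "a2": 2.1, "a3": 0.6, "a4": 1.9}
hard, easy, gap = compare_flagged_errors(errors, {"a2", "a4"})
print(f"flagged {hard:.2f} m, unflagged {easy:.2f} m, gap {gap:.2f} m")
```

The same flagged set could then drive the data-selection step, for instance by keeping only flagged samples in a reduced training set; how the paper actually selects samples is not specified in the abstract.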
Submitted 31 May, 2024;
originally announced May 2024.
-
Human-Centric Autonomous Systems With LLMs for User Command Reasoning
Authors:
Yi Yang,
Qingwen Zhang,
Ci Li,
Daniel Simões Marta,
Nazre Batool,
John Folkesson
Abstract:
Autonomous driving has advanced remarkably in recent years and is becoming a tangible reality. However, large-scale, human-centric adoption hinges on meeting a variety of multifaceted requirements. To ensure that the autonomous system meets the user's intent, it is essential to accurately discern and interpret user commands, especially in complex or emergency situations. To this end, we propose to leverage the reasoning capabilities of Large Language Models (LLMs) to infer system requirements from in-cabin users' commands. Through a series of experiments with different LLMs and prompt designs, we explore the few-shot multivariate binary classification accuracy of system requirements inferred from natural-language textual commands. We confirm the general ability of LLMs to understand and reason about prompts, but underline that their effectiveness depends on both the quality of the LLM and the design of appropriate sequential prompts. Code and models are publicly available at https://github.com/KTH-RPL/DriveCmd_LLM.
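In multivariate binary classification, each command maps to a vector of yes/no labels, one per system requirement. The sketch below shows one common way to score such outputs, assuming per-label accuracy and exact-match accuracy; the abstract does not say which metric the paper reports, and the function name and example label layout are hypothetical.

```python
def multilabel_accuracy(y_true, y_pred):
    """Accuracy for multivariate binary classification.

    y_true, y_pred: lists of equal-length 0/1 vectors, one per command;
    each position stands for one (hypothetical) system requirement.
    Returns (per_label_accuracy, exact_match_accuracy).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equally many, non-empty label and prediction lists")
    n = len(y_true)
    n_labels = len(y_true[0])
    # Fraction of commands for which each individual requirement was predicted correctly.
    per_label = [
        sum(t[k] == p[k] for t, p in zip(y_true, y_pred)) / n
        for k in range(n_labels)
    ]
    # Fraction of commands for which every requirement was predicted correctly.
    exact = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return per_label, exact


# Two commands, three hypothetical requirements each.
per_label, exact = multilabel_accuracy([[1, 0, 1], [0, 0, 1]], [[1, 0, 0], [0, 0, 1]])
print(per_label, exact)  # [1.0, 1.0, 0.5] 0.5
```

Exact match is the stricter of the two: a single wrong requirement makes the whole command count as misclassified.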
Submitted 19 December, 2023; v1 submitted 14 November, 2023;
originally announced November 2023.
-
Towards Long-Range 3D Object Detection for Autonomous Vehicles
Authors:
Ajinkya Khoche,
Laura Pereira Sánchez,
Nazre Batool,
Sina Sharif Mansouri,
Patric Jensfelt
Abstract:
3D object detection at long range is crucial for the safety and efficiency of self-driving vehicles, allowing them to accurately perceive and react to objects, obstacles, and potential hazards from a distance. However, most current state-of-the-art LiDAR-based methods are range-limited because of point sparsity at long range, which creates a form of domain gap between points closer to and farther away from the ego vehicle. A related problem is the label imbalance for faraway objects, which inhibits the performance of deep neural networks at long range. To address these limitations, we investigate two ways to improve the long-range performance of current LiDAR-based 3D detectors. First, we combine two 3D detection networks, referred to as range experts, one specializing in near- to mid-range objects and one in long-range 3D detection. To train a detector at long range under a scarce-label regime, we further weight the loss according to the labelled point's distance from the ego vehicle. Second, we augment LiDAR scans with virtual points generated using Multimodal Virtual Points (MVP), a readily available image-based depth completion algorithm. Our experiments on the long-range Argoverse2 (AV2) dataset indicate that MVP is more effective in improving long-range performance while keeping a straightforward implementation. The range experts, on the other hand, offer a computationally efficient and simpler alternative, avoiding dependency on image-based segmentation networks and perfect camera-LiDAR calibration.
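Distance-based loss weighting can be illustrated with a short sketch. The abstract does not give the weighting function, so the linear scheme below (`w = 1 + alpha * min(d, max_range) / max_range`), the parameters `alpha` and `max_range`, and the assumption that the ego vehicle sits at the origin of the (x, y) frame are all hypothetical choices for illustration.

```python
import math


def distance_weighted_loss(per_sample_losses, positions_xy, alpha=1.0, max_range=200.0):
    """Weighted mean of per-sample losses, up-weighting distant labelled points.

    Hypothetical linear scheme: w = 1 + alpha * min(d, max_range) / max_range,
    where d is the planar distance from the ego vehicle (assumed at the origin).
    Faraway, sparsely labelled samples therefore contribute more to the loss.
    """
    total, weight_sum = 0.0, 0.0
    for loss, (x, y) in zip(per_sample_losses, positions_xy):
        d = math.hypot(x, y)
        w = 1.0 + alpha * min(d, max_range) / max_range
        total += w * loss
        weight_sum += w
    # Normalising by the weight sum keeps the loss scale comparable to a plain mean.
    return total / weight_sum if weight_sum else 0.0


# A near sample with zero loss and a far sample (200 m) with loss 3.0:
print(distance_weighted_loss([0.0, 3.0], [(0.0, 0.0), (200.0, 0.0)]))  # 2.0
```

An unweighted mean of the same two losses would be 1.5; the weighting shifts the objective toward the far sample, which is the intended effect under a scarce long-range label regime.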
Submitted 20 May, 2024; v1 submitted 7 October, 2023;
originally announced October 2023.
-
RMP: A Random Mask Pretrain Framework for Motion Prediction
Authors:
Yi Yang,
Qingwen Zhang,
Thomas Gilles,
Nazre Batool,
John Folkesson
Abstract:
Although pretraining is growing in popularity, little work has been done on pretraining learning-based motion prediction methods for autonomous driving. In this paper, we propose a framework that formalizes the pretraining task for trajectory prediction of traffic participants. Within our framework, inspired by random masked models in natural language processing (NLP) and computer vision (CV), objects' positions at random timesteps are masked and then filled in by the learned neural network (NN). By changing the mask profile, our framework can easily switch among a range of motion-related tasks. Evaluating on the Argoverse and NuScenes datasets, we show that the proposed pretraining framework can deal with noisy inputs and improves motion prediction accuracy and miss rate, especially for objects that are occluded over time.
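The idea that a mask profile selects the task can be sketched on a single trajectory. In this minimal example, assuming a trajectory is a list of (x, y) positions, a "random" profile hides random timesteps (the pretraining task) and a "future" profile hides every step after an observed history (ordinary motion prediction). The profile names, the `None` placeholder, and the function signature are hypothetical; the network that fills in the masked positions is not shown.

```python
import random


def mask_trajectory(traj, profile="random", ratio=0.3, n_history=None, seed=None):
    """Hide positions of a trajectory according to a mask profile.

    traj: list of (x, y) positions, one per timestep.
    profile:
      "random" - hide a random subset of timesteps (pretraining)
      "future" - hide every timestep from n_history onward (prediction)
    Returns (masked_traj, hidden_indices); hidden positions become None.
    """
    if not traj:
        raise ValueError("trajectory must be non-empty")
    T = len(traj)
    if profile == "random":
        k = min(T, max(1, round(ratio * T)))
        hidden = set(random.Random(seed).sample(range(T), k))
    elif profile == "future":
        if n_history is None:
            raise ValueError("n_history is required for the 'future' profile")
        hidden = set(range(n_history, T))
    else:
        raise ValueError(f"unknown mask profile: {profile!r}")
    masked = [None if t in hidden else p for t, p in enumerate(traj)]
    return masked, sorted(hidden)


traj = [(float(t), 0.0) for t in range(10)]
print(mask_trajectory(traj, "future", n_history=6)[1])  # [6, 7, 8, 9]
```

Because only the mask changes between profiles, the same model and training loop can in principle serve several tasks, which matches the switching behaviour the abstract describes.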
Submitted 16 September, 2023;
originally announced September 2023.
-
Efficient Lightweight Encryption Algorithm for Smart Video Applications
Authors:
Amna Shifa,
Mamoona Naveed Asghar,
Naila Batool,
Martin Fluery
Abstract:
Future-generation networks such as the Internet of Things (IoT), combined with advanced computer vision techniques, pose new challenges for securing video for end users. Visual devices generally have constrained resources: low computational power and small memory with a limited power supply. Therefore, to secure video in smart environments, lightweight security schemes are required in place of inefficient traditional cryptographic algorithms. This paper proposes a solution to these problems: a novel lightweight cipher algorithm targeting multimedia in IoT, named EXPer (Extended Permutation with eXclusive OR (XOR)). EXPer is a symmetric stream cipher consisting of simple XOR and left-shift operations with three 128-bit keys. The proposed cipher algorithm has been tested on various sample videos and compared with the traditional XOR and Advanced Encryption Standard (AES) cipher algorithms. Visual results confirm that EXPer provides a security level equivalent to that of AES at a lower computational cost. Therefore, EXPer can be considered a better replacement for AES for securing real-time video applications in IoT.
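To make the "symmetric stream cipher built from XOR and shifts" structure concrete, here is a toy sketch. It is NOT EXPer: the abstract does not describe EXPer's key schedule, so the way the three 128-bit keys are combined below (circular left shifts plus XOR per 16-byte block) is a hypothetical construction, and it is not a secure cipher. It only shows why such a design is symmetric: XOR-ing with the same keystream twice restores the plaintext.

```python
MASK128 = (1 << 128) - 1


def rotl128(x, r):
    """Circular left shift of a 128-bit integer by r bits."""
    r %= 128
    return ((x << r) | (x >> (128 - r))) & MASK128


def toy_xor_stream(data: bytes, k1: int, k2: int, k3: int) -> bytes:
    """Toy XOR stream cipher keyed by three 128-bit integers (illustration only).

    Hypothetical keystream: block i uses rotl(k1, i) ^ rotl(k2, 2i + 1) ^ k3.
    Encryption and decryption are the same operation.
    """
    for k in (k1, k2, k3):
        if not 0 <= k <= MASK128:
            raise ValueError("keys must be 128-bit non-negative integers")
    out = bytearray()
    for i in range(0, len(data), 16):
        block_index = i // 16
        block_key = rotl128(k1, block_index) ^ rotl128(k2, 2 * block_index + 1) ^ k3
        keystream = block_key.to_bytes(16, "big")
        chunk = data[i:i + 16]
        # zip stops at the shorter input, so a final partial block is handled.
        out.extend(b ^ s for b, s in zip(chunk, keystream))
    return bytes(out)


k1 = 0x0123456789ABCDEF0123456789ABCDEF
k2 = 0x0F1E2D3C4B5A69788796A5B4C3D2E1F0
k3 = 0xDEADBEEFCAFEBABE0011223344556677
frame = b"frame 0001: example video payload bytes"
assert toy_xor_stream(toy_xor_stream(frame, k1, k2, k3), k1, k2, k3) == frame
```

The low cost of XOR and shift operations is what makes this family of designs attractive for constrained devices; whether a given design reaches AES-level security depends on its keystream, which this toy does not attempt to address.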
Submitted 24 January, 2019;
originally announced January 2019.
-
A Fault Tolerant, Dynamic and Low Latency BDII Architecture for Grids
Authors:
Asif Osman,
Ashiq Anjum,
Naheed Batool,
Richard McClatchey
Abstract:
The current BDII model relies on gathering information from agents that run on each core node of a Grid. This information is then published into a Grid-wide information resource known as the Top BDII. The top-level BDIIs are typically updated in cycles of a few minutes each. This paper proposes and describes a new BDII architecture based on the hypothesis that only a few attribute values change in each BDII information cycle, so it may not be necessary to update every parameter in each cycle. We demonstrate that significant performance gains can be achieved by exchanging only information about the records that changed during a cycle. Our investigations have led us to implement a low-latency, fault-tolerant BDII system that involves only minimal data transfer and facilitates secure transactions in a Grid environment.
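The change-only exchange can be sketched as a diff between two information cycles. In this minimal illustration, assuming each cycle's state is a dictionary from record ID to a dictionary of attribute values, a site computes which records were added, changed, or removed and ships only that delta; the top-level store then applies it. The function names and the attribute names in the example (`free_slots`, `state`, `free_gb`) are hypothetical and are not taken from any BDII schema.

```python
def compute_delta(previous, current):
    """Records that changed between two information cycles.

    previous, current: dict record_id -> dict of attribute values.
    Returns (upserts, deletions): new or modified records, and removed IDs.
    """
    upserts = {rid: attrs for rid, attrs in current.items() if previous.get(rid) != attrs}
    deletions = [rid for rid in previous if rid not in current]
    return upserts, deletions


def apply_delta(store, upserts, deletions):
    """Apply a delta to a copy of the top-level store and return it."""
    updated = dict(store)
    updated.update(upserts)
    for rid in deletions:
        updated.pop(rid, None)
    return updated


prev = {
    "ce1": {"free_slots": 10, "state": "Production"},
    "ce2": {"free_slots": 4, "state": "Production"},
    "se1": {"free_gb": 500},
}
curr = {
    "ce1": {"free_slots": 8, "state": "Production"},
    "ce2": {"free_slots": 4, "state": "Production"},
    "ce3": {"free_slots": 16, "state": "Draining"},
}
ups, dels = compute_delta(prev, curr)
print(sorted(ups), dels)  # ['ce1', 'ce3'] ['se1']
```

Here the unchanged record `ce2` is never transmitted; when most attributes are stable between cycles, as the hypothesis above assumes, the delta is much smaller than a full republish.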
Submitted 24 February, 2012;
originally announced February 2012.