Search | arXiv e-print repository

arXiv:2406.13046 [pdf, other]

Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates

Authors: Cristian Meo, Ksenia Sycheva, Anirudh Goyal, Justin Dauwels

Abstract: It is a common practice in natural language processing to pre-train a single model on a general domain and then fine-tune it for downstream tasks. However, when it comes to Large Language Models, fine-tuning the entire model can be computationally expensive, resulting in very intensive energy consumption. As a result, several Parameter Efficient Fine-Tuning (PEFT) approaches were recently proposed. One of the most popular approaches is low-rank adaptation (LoRA), where the key insight is decomposing the update weights of the pre-trained model into two low-rank matrices. However, the proposed approaches either use the same rank value across all different weight matrices, which has been shown to be a sub-optimal choice, or do not use any quantization technique, one of the most important factors when it comes to a model's energy consumption. In this work, we propose Bayesian-LoRA which approaches low-rank adaptation and quantization from a Bayesian perspective by employing a prior distribution on both quantization levels and rank values. As a result, B-LoRA is able to fine-tune a pre-trained model on a specific downstream task, finding the optimal rank values and quantization levels for every low-rank matrix. We validate the proposed model by fine-tuning a pre-trained DeBERTaV3 on the GLUE benchmark. Moreover, we compare it to relevant baselines and present both qualitative and quantitative results, showing how the proposed approach is able to learn optimal-rank quantized matrices. B-LoRA performs on par with or better than the baselines while reducing the total number of bit operations by roughly 70% compared to the baseline methods.

Submitted 9 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.
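
The low-rank decomposition mentioned in the abstract above can be illustrated with a minimal sketch. It shows only the generic LoRA idea (a dense update written as the product of two thin matrices) and the resulting parameter savings; it does not implement the paper's Bayesian gates, rank selection, or quantization. The function names, the toy matrices, and the 768 x 768 layer size with rank 8 are assumptions chosen for illustration, not values taken from the paper.

```python
# Minimal sketch of a low-rank (LoRA-style) weight update.
# All names and sizes here are illustrative assumptions.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_update(B, A, scale=1.0):
    """Return the dense update delta_W = scale * (B @ A)."""
    return [[scale * v for v in row] for row in matmul(B, A)]


def param_counts(d, k, r):
    """Trainable parameters: full update (d*k) vs. low-rank factors (r*(d+k))."""
    return d * k, r * (d + k)


# Toy example: a 4 x 3 update of rank 1.
B = [[1.0], [0.0], [2.0], [-1.0]]  # shape d x r
A = [[0.5, 0.0, 1.0]]              # shape r x k
delta_W = lora_update(B, A)

# Hypothetical layer size: 768 x 768 weights adapted with rank 8.
full, low_rank = param_counts(d=768, k=768, r=8)
print(full, low_rank)
```

With these assumed sizes the low-rank factors hold 12,288 trainable values against 589,824 for a full update, which is why the choice of rank per weight matrix matters.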

arXiv:2406.10108 [pdf, other]

Precipitation Nowcasting Using Physics Informed Discriminator Generative Models

Authors: Junzhe Yin, Cristian Meo, Ankush Roy, Zeineh Bou Cher, Yanbo Wang, Ruben Imhoff, Remko Uijlenhoet, Justin Dauwels

Abstract: Nowcasting leverages real-time atmospheric conditions to forecast weather over short periods. State-of-the-art models, including PySTEPS, encounter difficulties in accurately forecasting extreme weather events because of their unpredictable distribution patterns. In this study, we design a physics-informed neural network to perform precipitation nowcasting using the precipitation and meteorological data from the Royal Netherlands Meteorological Institute (KNMI). This model draws inspiration from the novel Physics-Informed Discriminator GAN (PID-GAN) formulation, directly integrating physics-based supervision within the adversarial learning framework. The proposed model adopts a GAN structure, featuring a Vector Quantization Generative Adversarial Network (VQ-GAN) and a Transformer as the generator, with a temporal discriminator serving as the discriminator. Our findings demonstrate that the PID-GAN model outperforms numerical and SOTA deep generative models in terms of precipitation nowcasting downstream metrics.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2403.03929 [pdf, other]

Extreme Precipitation Nowcasting using Transformer-based Generative Models

Authors: Cristian Meo, Ankush Roy, Mircea Lică, Junzhe Yin, Zeineb Bou Che, Yanbo Wang, Ruben Imhoff, Remko Uijlenhoet, Justin Dauwels

Abstract: This paper presents an innovative approach to extreme precipitation nowcasting by employing Transformer-based generative models, namely NowcastingGPT with Extreme Value Loss (EVL) regularization. Leveraging a comprehensive dataset from the Royal Netherlands Meteorological Institute (KNMI), our study focuses on predicting short-term precipitation with high accuracy. We introduce a novel method for computing EVL without assuming fixed extreme representations, addressing the limitations of current models in capturing extreme weather events. We present both qualitative and quantitative analyses, demonstrating the superior performance of the proposed NowcastingGPT-EVL in generating accurate precipitation forecasts, especially when dealing with extreme precipitation events. The code is available at https://github.com/Cmeo97/NowcastingGPT.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2306.06997 [pdf, other]

Slot-VAE: Object-Centric Scene Generation with Slot Attention

Authors: Yanbo Wang, Letao Liu, Justin Dauwels

Abstract: Slot attention has shown remarkable object-centric representation learning performance in computer vision tasks without requiring any supervision. Despite its object-centric binding ability brought by compositional modelling, as a deterministic module, slot attention lacks the ability to generate novel scenes. In this paper, we propose the Slot-VAE, a generative model that integrates slot attention with the hierarchical VAE framework for object-centric structured scene generation. For each image, the model simultaneously infers a global scene representation to capture high-level scene structure and object-centric slot representations to embed individual object components. During generation, slot representations are generated from the global scene representation to ensure coherent scene structures. Our extensive evaluation of the scene generation ability indicates that Slot-VAE outperforms slot representation-based generative baselines in terms of sample quality and scene structure accuracy.

Submitted 13 February, 2024; v1 submitted 12 June, 2023; originally announced June 2023.

Comments: ICML 2023 https://proceedings.mlr.press/v202/wang23r.html

arXiv:2305.18925 [pdf, other]

Investigating model performance in language identification: beyond simple error statistics

Authors: Suzy J. Styles, Victoria Y. H. Chua, Fei Ting Woon, Hexin Liu, Leibny Paola Garcia Perera, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels

Abstract: Language development experts need tools that can automatically identify languages from fluent, conversational speech, and provide reliable estimates of usage rates at the level of an individual recording. However, language identification systems are typically evaluated on metrics such as equal error rate and balanced accuracy, applied at the level of an entire speech corpus. These overview metrics do not provide information about model performance at the level of individual speakers, recordings, or units of speech with different linguistic characteristics. Overview statistics may therefore mask systematic errors in model performance for some subsets of the data, and consequently, have worse performance on data derived from some subsets of human speakers, creating a kind of algorithmic bias. In the current paper, we investigate how well a number of language identification systems perform on individual recordings and speech units with different linguistic properties in the MERLIon CCS Challenge. The Challenge dataset features accented English-Mandarin code-switched child-directed speech.

Submitted 30 May, 2023; originally announced May 2023.

Comments: Accepted to Interspeech 2023, 5 pages, 5 figures
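
The point made in the abstract above, that a corpus-level metric can hide systematic errors on individual recordings, can be shown with a small sketch. It uses plain accuracy as a stand-in for the equal error rate and balanced accuracy named in the abstract, and the recordings, labels, and function names are invented for illustration; none of this is data or code from the paper.

```python
# Minimal sketch: corpus-level accuracy vs. per-recording accuracy.
# The samples below are invented; plain accuracy is a simplifying assumption.
from collections import defaultdict


def corpus_accuracy(samples):
    """Fraction of correctly labelled segments over the whole corpus."""
    correct = sum(1 for _, ref, hyp in samples if ref == hyp)
    return correct / len(samples)


def per_recording_accuracy(samples):
    """Fraction of correctly labelled segments within each recording."""
    counts = defaultdict(lambda: [0, 0])  # recording -> [correct, total]
    for recording, ref, hyp in samples:
        counts[recording][1] += 1
        if ref == hyp:
            counts[recording][0] += 1
    return {rec: c / n for rec, (c, n) in counts.items()}


# Each sample is (recording id, reference language, predicted language).
samples = (
    [("rec_a", "en", "en")] * 8    # long recording, every segment correct
    + [("rec_b", "zh", "en")] * 2  # short recording, every segment wrong
)

overall = corpus_accuracy(samples)
by_recording = per_recording_accuracy(samples)
print(overall, by_recording)
```

In this toy corpus the overall accuracy is 0.8, yet one recording is labelled entirely wrong, which is the kind of masked failure the abstract describes.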

arXiv:2304.04103

TC-VAE: Uncovering Out-of-Distribution Data Generative Factors

Authors: Cristian Meo, Anirudh Goyal, Justin Dauwels

Abstract: Uncovering data generative factors is the ultimate goal of disentanglement learning. Although many works proposed disentangling generative models able to uncover the underlying generative factors of a dataset, so far no one was able to uncover OOD generative factors (i.e., factors of variations that are not explicitly shown on the dataset). Moreover, the datasets used to validate these models are synthetically generated using a balanced mixture of some predefined generative factors, implicitly assuming that generative factors are uniformly distributed across the datasets. However, real datasets do not present this property. In this work we analyse the effect of using datasets with unbalanced generative factors, providing qualitative and quantitative results for widely used generative models. Moreover, we propose TC-VAE, a generative model optimized using a lower bound of the joint total correlation between the learned latent representations and the input data. We show that the proposed model is able to uncover OOD generative factors on different datasets and outperforms on average the related baselines in terms of downstream disentanglement metrics.

Submitted 10 April, 2023; v1 submitted 8 April, 2023; originally announced April 2023.

Comments: The paper is incomplete as it is. We will work on it and repost it

arXiv:2302.11919 [pdf, other]

doi 10.1109/TITS.2023.3311633

PEM: Perception Error Model for Virtual Testing of Autonomous Vehicles

Authors: Andrea Piazzoni, Jim Cherian, Justin Dauwels, Lap-Pui Chau

Abstract: Even though virtual testing of Autonomous Vehicles (AVs) has been well recognized as essential for safety assessment, AV simulators are still undergoing active development. One particularly challenging question is to effectively include the Sensing and Perception (S&P) subsystem into the simulation loop. In this article, we define Perception Error Models (PEM), a virtual simulation component that can enable the analysis of the impact of perception errors on AV safety, without the need to model the sensors themselves. We propose a generalized data-driven procedure towards parametric modeling and evaluate it using Apollo, an open-source driving software, and nuScenes, a public AV dataset. Additionally, we implement PEMs in SVL, an open-source vehicle simulator. Furthermore, we demonstrate the usefulness of PEM-based virtual tests, by evaluating camera, LiDAR, and camera-LiDAR setups. Our virtual tests highlight limitations in the current evaluation metrics, and the proposed approach can help study the impact of perception errors on AV safety.

Submitted 27 February, 2024; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: 12 pages, 10 figures. This is a preprint, and version 2 only updates the title and the reference to the final published article, which can be found at DOI: 10.1109/TITS.2023.3311633

ACM Class: C.4; I.2; I.6

Journal ref: IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 1, pp. 670-681, Jan. 2024

arXiv:2211.11175 [pdf, other]

doi 10.1109/ITSC55140.2022.9921807

CoPEM: Cooperative Perception Error Models for Autonomous Driving

Authors: Andrea Piazzoni, Jim Cherian, Roshan Vijay, Lap-Pui Chau, Justin Dauwels

Abstract: In this paper, we introduce the notion of Cooperative Perception Error Models (coPEMs) towards achieving an effective and efficient integration of V2X solutions within a virtual test environment. We focus our analysis on the occlusion problem in the (onboard) perception of Autonomous Vehicles (AV), which can manifest as misdetection errors on the occluded objects. Cooperative perception (CP) solutions based on Vehicle-to-Everything (V2X) communications aim to avoid such issues by cooperatively leveraging additional points of view for the world around the AV. This approach usually requires many sensors, mainly cameras and LiDARs, to be deployed simultaneously in the environment either as part of the road infrastructure or on other traffic vehicles. However, implementing a large number of sensor models in a virtual simulation pipeline is often prohibitively computationally expensive. Therefore, in this paper, we rely on extending Perception Error Models (PEMs) to efficiently implement such cooperative perception solutions along with the errors and uncertainties associated with them. We demonstrate the approach by comparing the safety achievable by an AV challenged with a traffic scenario where occlusion is the primary cause of a potential collision.

Submitted 21 November, 2022; v1 submitted 20 November, 2022; originally announced November 2022.

Comments: Accepted at 2022 IEEE International Conference on Intelligent Transportation Systems (ITSC2022). 6 pages, 6 figures

ACM Class: C.4; I.2; I.6

arXiv:2209.06094 [pdf, other]

Learning to Solve Multiple-TSP with Time Window and Rejections via Deep Reinforcement Learning

Authors: Rongkai Zhang, Cong Zhang, Zhiguang Cao, Wen Song, Puay Siew Tan, Jie Zhang, Bihan Wen, Justin Dauwels

Abstract: We propose a manager-worker framework based on deep reinforcement learning to tackle a hard yet nontrivial variant of Travelling Salesman Problem (TSP), i.e., multiple-vehicle TSP with time window and rejections (mTSPTWR), where customers who cannot be served before the deadline are subject to rejections. Particularly, in the proposed framework, a manager agent learns to divide mTSPTWR into sub-routing tasks by assigning customers to each vehicle via a Graph Isomorphism Network (GIN) based policy network. A worker agent learns to solve sub-routing tasks by minimizing the cost in terms of both tour length and rejection rate for each vehicle, the maximum of which is then fed back to the manager agent to learn better assignments. Experimental results demonstrate that the proposed framework outperforms strong baselines in terms of higher solution quality and shorter computation time. More importantly, the trained agents also achieve competitive performance for solving unseen larger instances.

Submitted 13 September, 2022; originally announced September 2022.

arXiv:2203.03218 [pdf, other]

Enhance Language Identification using Dual-mode Model with Knowledge Distillation

Authors: Hexin Liu, Leibny Paola Garcia Perera, Andy W. H. Khong, Justin Dauwels, Suzy J. Styles, Sanjeev Khudanpur

Abstract: In this paper, we propose to employ a dual-mode framework on the x-vector self-attention (XSA-LID) model with knowledge distillation (KD) to enhance its language identification (LID) performance for both long and short utterances. The dual-mode XSA-LID model is trained by jointly optimizing both the full and short modes with their respective inputs being the full-length speech and its short clip extracted by a specific Boolean mask, and KD is applied to further boost the performance on short utterances. In addition, we investigate the impact of clip-wise linguistic variability and lexical integrity for LID by analyzing the variation of LID performance in terms of the lengths and positions of the mimicked speech clips. We evaluated our approach on the MLS14 data from the NIST 2017 LRE. With the 3 s random-location Boolean mask, our proposed method achieved 19.23%, 21.52% and 8.37% relative improvement in average cost compared with the XSA-LID model on 3 s, 10 s, and 30 s speech, respectively.

Submitted 7 March, 2022; originally announced March 2022.

Comments: Submitted to Odyssey 2022

arXiv:2107.05318 [pdf, other]

R3L: Connecting Deep Reinforcement Learning to Recurrent Neural Networks for Image Denoising via Residual Recovery

Authors: Rongkai Zhang, Jiang Zhu, Zhiyuan Zha, Justin Dauwels, Bihan Wen

Abstract: State-of-the-art image denoisers exploit various types of deep neural networks via deterministic training. Alternatively, very recent works utilize deep reinforcement learning for restoring images with diverse or unknown corruptions. Though deep reinforcement learning can generate effective policy networks for operator selection or architecture search in image restoration, how it is connected to the classic deterministic training in solving inverse problems remains unclear. In this work, we propose a novel image denoising scheme via Residual Recovery using Reinforcement Learning, dubbed R3L. We show that R3L is equivalent to a deep recurrent neural network that is trained using a stochastic reward, in contrast to many popular denoisers using supervised learning with deterministic losses. To benchmark the effectiveness of reinforcement learning in R3L, we train a recurrent neural network with the same architecture for residual recovery using the deterministic loss, in order to analyze how the two different training strategies affect the denoising performance. With such a unified benchmarking system, we demonstrate that the proposed R3L has better generalizability and robustness in image denoising when the estimated noise level varies, compared to its counterparts using deterministic training, as well as various state-of-the-art image denoising algorithms.

Submitted 12 July, 2021; originally announced July 2021.

Comments: Accepted by ICIP 2021

arXiv:2009.07703 [pdf, other]

doi 10.1109/TPAMI.2022.3140886

Efficient Variational Bayes Learning of Graphical Models with Smooth Structural Changes

Authors: Hang Yu, Songwei Wu, Justin Dauwels

Abstract: Estimating time-varying graphical models is of paramount importance in various social, financial, biological, and engineering systems, since the evolution of such networks can be utilized for example to spot trends, detect anomalies, predict vulnerability, and evaluate the impact of interventions. Existing methods require extensive tuning of parameters that control the graph sparsity and temporal smoothness. Furthermore, these methods are computationally burdensome with time complexity $O(NP^3)$ for $P$ variables and $N$ time points. As a remedy, we propose a low-complexity tuning-free Bayesian approach, named BASS. Specifically, we impose temporally-dependent spike-and-slab priors on the graphs such that they are sparse and varying smoothly across time. A variational inference algorithm is then derived to learn the graph structures from the data automatically. Owing to the pseudo-likelihood and the mean-field approximation, the time complexity of BASS is only $O(NP^2)$. Additionally, by identifying the frequency-domain resemblance to the time-varying graphical models, we show that BASS can be extended to learning frequency-varying inverse spectral density matrices, and yields graphical models for multivariate stationary time series. Numerical results on both synthetic and real data show that BASS can better recover the underlying true graphs, while being more efficient than the existing methods, especially for high-dimensional cases.

Submitted 4 February, 2023; v1 submitted 16 September, 2020; originally announced September 2020.

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (2023)

arXiv:2008.13443 [pdf, other]

doi 10.1016/j.commtr.2021.100008

On the Quality Requirements of Demand Prediction for Dynamic Public Transport

Authors: Inon Peled, Kelvin Lee, Yu Jiang, Justin Dauwels, Francisco C. Pereira

Abstract: As Public Transport (PT) becomes more dynamic and demand-responsive, it increasingly depends on predictions of transport demand. But how accurate need such predictions be for effective PT operation? We address this question through an experimental case study of PT trips in Metropolitan Copenhagen, Denmark, which we conduct independently of any specific prediction models. First, we simulate errors in demand prediction through unbiased noise distributions that vary considerably in shape. Using the noisy predictions, we then simulate and optimize demand-responsive PT fleets via a linear programming formulation and measure their performance. Our results suggest that the optimized performance is mainly affected by the skew of the noise distribution and the presence of infrequently large prediction errors. In particular, the optimized performance can improve under non-Gaussian vs. Gaussian noise. We also find that dynamic routing could reduce trip time by at least 23% vs. static routing. This reduction is estimated at 809,000 EUR/year in terms of Value of Travel Time Savings for the case study.

Submitted 6 November, 2021; v1 submitted 31 August, 2020; originally announced August 2020.

Comments: 26 pages, 9 tables, 6 figures

arXiv:2001.11695 [pdf, other]

doi 10.24963/ijcai.2020/483

Modeling Perception Errors towards Robust Decision Making in Autonomous Vehicles

Authors: Andrea Piazzoni, Jim Cherian, Martin Slavik, Justin Dauwels

Abstract: Sensing and Perception (S&P) is a crucial component of an autonomous system (such as a robot), especially when deployed in highly dynamic environments where it is required to react to unexpected situations. This is particularly true in case of Autonomous Vehicles (AVs) driving on public roads. However, the current evaluation metrics for perception algorithms are typically designed to measure their accuracy per se and do not account for their impact on the decision making subsystem(s). This limitation does not help developers and third party evaluators to answer a critical question: is the performance of a perception subsystem sufficient for the decision making subsystem to make robust, safe decisions? In this paper, we propose a simulation-based methodology towards answering this question. At the same time, we show how to analyze the impact of different kinds of sensing and perception errors on the behavior of the autonomous system.

Submitted 3 September, 2021; v1 submitted 31 January, 2020; originally announced January 2020.

Comments: 11 pages, 8 figures. Preprint of an article published at IJCAI2020. Update: fixed title and metadata

ACM Class: C.4; I.2; I.6

Journal ref: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence Main track (2020). Pages 3494-3500

arXiv:1911.03667 [pdf, other]

Factored Latent-Dynamic Conditional Random Fields for Single and Multi-label Sequence Modeling

Authors: Satyajit Neogi, Justin Dauwels

Abstract: Conditional Random Fields (CRF) are frequently applied for labeling and segmenting sequence data. Morency et al. (2007) introduced hidden state variables in a labeled CRF structure in order to model the latent dynamics within class labels, thus improving the labeling performance. Such a model is known as Latent-Dynamic CRF (LDCRF). We present Factored LDCRF (FLDCRF), a structure that allows multiple latent dynamics of the class labels to interact with each other. Including such latent-dynamic interactions leads to improved labeling performance on single-label and multi-label sequence modeling tasks. We apply our FLDCRF models on two single-label (one nested cross-validation) and one multi-label sequence tagging (nested cross-validation) experiments across two different datasets - UCI gesture phase data and UCI opportunity data. FLDCRF outperforms all state-of-the-art sequence models, i.e., CRF, LDCRF, LSTM, LSTM-CRF, Factorial CRF, Coupled CRF and a multi-label LSTM model in all our experiments. In addition, LSTM based models display inconsistent performance across validation and test data, and pose difficulty in selecting models on validation data during our experiments. FLDCRF offers easier model selection, consistency across validation and test performance and lucid model intuition. FLDCRF is also much faster to train compared to LSTM, even without a GPU. FLDCRF outshines the best LSTM model by ~4% on a single-label task on UCI gesture phase data and outperforms LSTM performance by ~2% on average across nested cross-validation test sets on the multi-label sequence tagging experiment on UCI opportunity data. The idea of FLDCRF can be extended to joint (multi-agent interactions) and heterogeneous (discrete and continuous) state space models.

Submitted 12 November, 2019; v1 submitted 9 November, 2019; originally announced November 2019.

Comments: To be submitted to Journal of Machine Learning Research (JMLR)

arXiv:1907.11881 [pdf, other]

doi 10.1109/TITS.2020.2995166

Context Model for Pedestrian Intention Prediction using Factored Latent-Dynamic Conditional Random Fields

Authors: Satyajit Neogi, Michael Hoy, Kang Dang, Hang Yu, Justin Dauwels

Abstract: Smooth handling of pedestrian interactions is a key requirement for Autonomous Vehicles (AV) and Advanced Driver Assistance Systems (ADAS). Such systems call for early and accurate prediction of a pedestrian's crossing/not-crossing behaviour in front of the vehicle. Existing approaches to pedestrian behaviour prediction make use of pedestrian motion, his/her location in a scene and static context variables such as traffic lights, zebra crossings etc. We stress the necessity of early prediction for smooth operation of such systems. We introduce the influence of vehicle interactions on pedestrian intention for this purpose. In this paper, we show a discernible advance in prediction time aided by the inclusion of such vehicle interaction context. We apply our methods to two different datasets, one in-house collected - NTU dataset and another public real-life benchmark - JAAD dataset. We also propose a generic graphical model Factored Latent-Dynamic Conditional Random Fields (FLDCRF) for single and multi-label sequence prediction as well as joint interaction modeling tasks. FLDCRF outperforms Long Short-Term Memory (LSTM) networks across the datasets (~100 sequences per dataset) over identical time-series features. While the existing best system predicts pedestrian stopping behaviour with 70% accuracy 0.38 seconds before the actual events, our system achieves such accuracy at least 0.9 seconds on average before the actual events across datasets.

Submitted 15 September, 2020; v1 submitted 27 July, 2019; originally announced July 2019.

Comments: Accepted by IEEE Transactions on Intelligent Transportation Systems

arXiv:1907.05274 [pdf, other]

Affine Disentangled GAN for Interpretable and Robust AV Perception

Authors: Letao Liu, Martin Saerbeck, Justin Dauwels

Abstract: Autonomous vehicles (AV) have progressed rapidly with the advancements in computer vision algorithms. The deep convolutional neural network as the main contributor to this advancement has boosted the classification accuracy dramatically. However, the discovery of adversarial examples reveals the generalization gap between dataset and the real world. Furthermore, affine transformations may also confuse computer vision based object detectors. The degradation of the perception system is undesirable for safety critical systems such as autonomous vehicles. In this paper, a deep learning system is proposed: Affine Disentangled GAN (ADIS-GAN), which is robust against affine transformations and adversarial attacks. It is demonstrated that conventional data augmentation for affine transformation and adversarial attacks are orthogonal, while ADIS-GAN can handle both attacks at the same time. Useful information such as image rotation angle and scaling factor are also generated in ADIS-GAN. On MNIST dataset, ADIS-GAN can achieve over 98 percent classification accuracy within 30 degrees rotation, and over 90 percent classification accuracy against FGSM and PGD adversarial attack.

Submitted 6 July, 2019; originally announced July 2019.

arXiv:1902.09745 [pdf, other]

doi 10.1109/ITSC.2019.8916878

Online Predictive Optimization Framework for Stochastic Demand-Responsive Transit Services

Authors: Inon Peled, Kelvin Lee, Yu Jiang, Justin Dauwels, Francisco C. Pereira

Abstract: This study develops an online predictive optimization framework for dynamically operating a transit service in an area of crowd movements. The proposed framework integrates demand prediction and supply optimization to periodically redesign the service routes based on recently observed demand. To predict demand for the service, we use Quantile Regression to estimate the marginal distribution of movement counts between each pair of serviced locations. The framework then combines these marginals into a joint demand distribution by constructing a Gaussian copula, which captures the structure of correlation between the marginals. For supply optimization, we devise a linear programming model, which simultaneously determines the route structure and the service frequency according to the predicted demand. Importantly, our framework both preserves the uncertainty structure of future demand and leverages this for robust route optimization, while keeping both components decoupled. We evaluate our framework using a real-world case study of autonomous mobility in a university campus in Denmark. The results show that our framework often obtains the ground truth optimal solution, and can outperform conventional methods for route optimization, which do not leverage full predictive distributions.

Submitted 21 May, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

Comments: 34 pages, 12 figures, 5 tables

Journal ref: 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 2019, pp. 3043-3048
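
The abstract above describes combining per-pair marginals into a joint demand distribution through a Gaussian copula. The following is a minimal Python sketch of that idea for two marginals only. The quantile levels, demand values, the correlation of 0.8, and the helper names `make_quantile_fn` and `sample_joint_demand` are illustrative assumptions, not taken from the paper, and the piecewise-linear interpolation between estimated quantiles is a stand-in for whatever marginal model the authors use.

```python
import random
from statistics import NormalDist


def make_quantile_fn(levels, values):
    """Build an inverse CDF by linear interpolation between estimated quantiles.

    `levels` are increasing probabilities and `values` the matching quantiles
    (hypothetical numbers here, e.g. outputs of a quantile regression).
    """
    def quantile(u):
        # Clamp to the outermost estimated quantiles.
        if u <= levels[0]:
            return values[0]
        if u >= levels[-1]:
            return values[-1]
        for i in range(1, len(levels)):
            if u <= levels[i]:
                t = (u - levels[i - 1]) / (levels[i] - levels[i - 1])
                return values[i - 1] + t * (values[i] - values[i - 1])
    return quantile


def sample_joint_demand(quantile_a, quantile_b, corr, rng):
    """Draw one joint sample for two marginals through a Gaussian copula."""
    # Correlated standard normals (2x2 Cholesky factor written out by hand).
    z_a = rng.gauss(0.0, 1.0)
    z_b = corr * z_a + (1.0 - corr ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
    # Map to uniforms with the standard normal CDF, then to each marginal.
    std_normal = NormalDist()
    return quantile_a(std_normal.cdf(z_a)), quantile_b(std_normal.cdf(z_b))


if __name__ == "__main__":
    rng = random.Random(0)
    demand_a = make_quantile_fn([0.05, 0.5, 0.95], [2.0, 10.0, 25.0])
    demand_b = make_quantile_fn([0.05, 0.5, 0.95], [1.0, 4.0, 12.0])
    for _ in range(3):
        print(sample_joint_demand(demand_a, demand_b, 0.8, rng))
```

The copula keeps each marginal exactly as estimated and puts all of the dependence into the single correlation parameter, which is why the marginal and joint parts can be fitted separately.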

arXiv:1807.08430 [pdf, other]

Actor-Action Semantic Segmentation with Region Masks

Authors: Kang Dang, Chunluan Zhou, Zhigang Tu, Michael Hoy, Justin Dauwels, Junsong Yuan

Abstract: In this paper, we study the actor-action semantic segmentation problem, which requires joint labeling of both actor and action categories in video frames. One major challenge for this task is that when an actor performs an action, different body parts of the actor provide different types of cues for the action category and may receive inconsistent action labeling when they are labeled independently. To address this issue, we propose an end-to-end region-based actor-action segmentation approach which relies on region masks from an instance segmentation algorithm. Our main novelty is to avoid labeling pixels in a region mask independently - instead we assign a single action label to these pixels to achieve consistent action labeling. When a pixel belongs to multiple region masks, max pooling is applied to resolve labeling conflicts. Our approach uses a two-stream network as the front-end (which learns features capturing both appearance and motion information), and uses two region-based segmentation networks as the back-end (which takes the fused features from the two-stream network as the input and predicts actor-action labeling). Experiments on the A2D dataset demonstrate that both the region-based segmentation strategy and the fused features from the two-stream network contribute to the performance improvements. The proposed approach outperforms the state-of-the-art results by more than 8% in mean class accuracy, and more than 5% in mean class IOU, which validates its effectiveness.

Submitted 23 July, 2018; originally announced July 2018.

Comments: Accepted by BMVC 2018

arXiv:1807.02406 [pdf, other]

Multi-atomic Annealing Heuristic for Static Dial-a-ride Problem

Authors: Song Guang Ho, Ramesh Ramasamy Pandi, Sarat Chandra Nagavarapu, Justin Dauwels

Abstract: Dial-a-ride problem (DARP) deals with the transportation of users between pickup and drop-off locations associated with specified time windows. This paper proposes a novel algorithm called multi-atomic annealing (MATA) to solve static dial-a-ride problem. Two new local search operators (burn and reform), a new construction heuristic and two request sequencing mechanisms (Sorted List and Random List) are developed. Computational experiments conducted on various standard DARP test instances prove that MATA is an expeditious meta-heuristic in contrast to other existing methods. In all experiments, MATA demonstrates the capability to obtain high quality solutions, faster convergence, and quicker attainment of a first feasible solution. It is observed that MATA attains a first feasible solution 29.8 to 65.1% faster, and obtains a final solution that is 3.9 to 5.2% better, when compared to other algorithms within 60 sec.

Submitted 29 June, 2018; originally announced July 2018.

Comments: To be presented at the IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI), Singapore, 2018

arXiv:1801.09547 [pdf, ps, other]

An Improved Tabu Search Heuristic for Static Dial-A-Ride Problem

Authors: Songguang Ho, Sarat Chandra Nagavarapu, Ramesh Ramasamy Pandi, Justin Dauwels

Abstract: Multi-vehicle routing has become increasingly important with the rapid development of autonomous vehicle technology. Dial-a-ride problem, a variant of vehicle routing problem (VRP), deals with the allocation of customer requests to vehicles, scheduling the pick-up and drop-off times and the sequence of serving those requests by ensuring high customer satisfaction with minimized travel cost. In this paper, we propose an improved tabu search (ITS) heuristic for static dial-a-ride problem (DARP) with the objective of obtaining high-quality solutions in short time. Two new techniques, initialization heuristic, and time window adjustment are proposed to achieve faster convergence to the global optimum. Various numerical experiments are conducted for the proposed solution methodology using DARP test instances from the literature and the convergence speed up is validated.

Submitted 13 February, 2018; v1 submitted 25 January, 2018; originally announced January 2018.

Comments: Journal Paper

arXiv:1711.06195 [pdf, other]

Neurology-as-a-Service for the Developing World

Authors: Tejas Dharamsi, Payel Das, Tejaswini Pedapati, Gregory Bramble, Vinod Muthusamy, Horst Samulowitz, Kush R. Varshney, Yuvaraj Rajamanickam, John Thomas, Justin Dauwels

Abstract: Electroencephalography (EEG) is an extensively-used and well-studied technique in the field of medical diagnostics and treatment for brain disorders, including epilepsy, migraines, and tumors. The analysis and interpretation of EEGs require physicians to have specialized training, which is not common even among most doctors in the developed world, let alone the developing world where physician shortages plague society. This problem can be addressed by teleEEG that uses remote EEG analysis by experts or by local computer processing of EEGs. However, both of these options are prohibitively expensive and the second option requires abundant computing resources and infrastructure, which is another concern in developing countries where there are resource constraints on capital and computing infrastructure. In this work, we present a cloud-based deep neural network approach to provide decision support for non-specialist physicians in EEG analysis and interpretation. Named `neurology-as-a-service,' the approach requires almost no manual intervention in feature engineering and in the selection of an optimal architecture and hyperparameters of the neural network. In this study, we deploy a pipeline that includes moving EEG data to the cloud and getting optimal models for various classification tasks. Our initial prototype has been tested only in developed world environments to-date, but our intention is to test it in developing world environments in future work. We demonstrate the performance of our proposed approach using the BCI2000 EEG MMI dataset, on which our service attains 63.4% accuracy for the task of classifying real vs. imaginary activity performed by the subject, which is significantly higher than what is obtained with a shallow approach such as support vector machines.

Submitted 21 November, 2017; v1 submitted 16 November, 2017; originally announced November 2017.

Comments: Presented at NIPS 2017 Workshop on Machine Learning for the Developing World

arXiv:0910.2832 [pdf, ps, other]

Expectation Maximization as Message Passing - Part I: Principles and Gaussian Messages

Authors: Justin Dauwels, Andrew Eckford, Sascha Korl, Hans-Andrea Loeliger

Abstract: It is shown how expectation maximization (EM) may be viewed as a message passing algorithm in factor graphs. In particular, a general EM message computation rule is identified. As a factor graph tool, EM may be used to break cycles in a factor graph, and tractable messages may in some cases be obtained where the sum-product messages are unwieldy. As an exemplary application, the paper considers linear Gaussian state space models. Unknown coefficients in such models give rise to multipliers in the corresponding factor graph. A main attraction of EM in such cases is that it results in purely Gaussian message passing algorithms. These Gaussian EM messages are tabulated for several (scalar, vector, matrix) multipliers that frequently appear in applications.

Submitted 15 October, 2009; originally announced October 2009.

arXiv:0904.4741 [pdf, other]

Belief-Propagation Decoding of Lattices Using Gaussian Mixtures

Authors: Brian M. Kurkoski, Justin Dauwels

Abstract: A belief-propagation decoder for low-density lattice codes is given which represents messages explicitly as a mixture of Gaussians functions. The key component is an algorithm for approximating a mixture of several Gaussians with another mixture with a smaller number of Gaussians. This Gaussian mixture reduction algorithm iteratively reduces the number of Gaussians by minimizing the distance between the original mixture and an approximation with one fewer Gaussians. Error rates and noise thresholds of this decoder are compared with those for the previously-proposed decoder which discretely quantizes the messages. The error rates are indistinguishable for dimension 1000 and 10000 lattices, and the Gaussian-mixture decoder has a 0.2 dB loss for dimension 100 lattices. The Gaussian-mixture decoder has a loss of about 0.03 dB in the noise threshold, which is evaluated via Monte Carlo density evolution. Further, the Gaussian-mixture decoder uses far less storage for the messages.

Submitted 30 April, 2009; originally announced April 2009.

Comments: 7 pages, 5 figures, submitted to IEEE Transactions on Information Theory

arXiv:0802.0554 [pdf, other]

Message-Passing Decoding of Lattices Using Gaussian Mixtures

Authors: Brian M. Kurkoski, Justin Dauwels

Abstract: A lattice decoder which represents messages explicitly as a mixture of Gaussians functions is given. In order to prevent the number of functions in a mixture from growing as the decoder iterations progress, a method for replacing N Gaussian functions with M Gaussian functions, with M < N, is given. A squared distance metric is used to select functions for combining. A pair of selected Gaussians is replaced by a single Gaussian with the same first and second moments. The metric can be computed efficiently, and at the same time, the proposed algorithm empirically gives good results, for example, a dimension 100 lattice has a loss of 0.2 dB in signal-to-noise ratio at a probability of symbol error of 10^{-5}.

Submitted 5 February, 2008; originally announced February 2008.

Comments: Cite this paper as: Brian Kurkoski and Justin Dauwels, "Message-passing decoding of lattices using Gaussian mixtures," in Proceedings of the 30th Symposium on Information Theory and its Applications (SITA 2007), pp. 877-882, November 27-30, 2007, Shima, Mie, Japan
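
The abstract above states that a selected pair of Gaussians is replaced by a single Gaussian with the same first and second moments. A minimal Python sketch of that moment-matching step for scalar, weighted components follows. The function name `merge_gaussians` and the (weight, mean, variance) representation are assumptions made for illustration. The squared-distance metric used to choose which pair to merge is not specified in the abstract and is not implemented here.

```python
def merge_gaussians(w1, m1, v1, w2, m2, v2):
    """Replace two weighted scalar Gaussians by one with matching moments.

    Each component is given as (weight, mean, variance). The result keeps
    the total weight and the first and second moments of the pair.
    """
    w = w1 + w2
    if w <= 0:
        raise ValueError("total weight must be positive")
    # First moment of the two-component mixture.
    mean = (w1 * m1 + w2 * m2) / w
    # Second raw moment: E[x^2] = variance + mean^2 for each component.
    second = (w1 * (v1 + m1 * m1) + w2 * (v2 + m2 * m2)) / w
    return w, mean, second - mean * mean


if __name__ == "__main__":
    # Two equally weighted unit-variance components centred at 0 and 2.
    print(merge_gaussians(0.5, 0.0, 1.0, 0.5, 2.0, 1.0))
```

The merged variance is never smaller than the weighted average of the two input variances, because the spread between the two means is folded into it.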

arXiv:cs/0607027 [pdf, ps, other]

A general computation rule for lossy summaries/messages with examples from equalization

Authors: Junli Hu, Hans-Andrea Loeliger, Justin Dauwels, Frank Kschischang

Abstract: Elaborating on prior work by Minka, we formulate a general computation rule for lossy messages. An important special case (with many applications in communications) is the conversion of "soft-bit" messages to Gaussian messages. By this method, the performance of a Kalman equalizer is improved, both for uncoded and coded transmission.

Submitted 1 October, 2006; v1 submitted 7 July, 2006; originally announced July 2006.

Comments: Proc. of the 44th Allerton Conference on Communication, Control, and Computing, Monticello, Ill., USA, Sept. 2006

arXiv:cs/0508027 [pdf, ps, other]

doi 10.1109/ISIT.2005.1523402

Expectation maximization as message passing

Authors: J. Dauwels, S. Korl, H. -A. Loeliger

Abstract: Based on prior work by Eckford, it is shown how expectation maximization (EM) may be viewed, and used, as a message passing algorithm in factor graphs.

Submitted 3 August, 2005; originally announced August 2005.

Comments: To appear in the proceedings of the 2005 IEEE International Symposium on Information Theory, Adelaide, Australia, September 4-9, 2005

Showing 1–27 of 27 results for author: Dauwels, J