Search | arXiv e-print repository

Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese

Authors: Khang T. Doan, Bao G. Huynh, Dung T. Hoang, Thuc D. Pham, Nhat H. Pham, Quan T. M. Nguyen, Bang Q. Vo, Suong N. Hoang

Abstract: In this report, we introduce Vintern-1B, a reliable 1-billion-parameters multimodal large language model (MLLM) for Vietnamese language tasks. By integrating the Qwen2-0.5B-Instruct language model with the InternViT-300M-448px visual model, Vintern-1B is optimized for a range of applications, including optical character recognition (OCR), document extraction, and general question-answering in Viet… ▽ More In this report, we introduce Vintern-1B, a reliable 1-billion-parameters multimodal large language model (MLLM) for Vietnamese language tasks. By integrating the Qwen2-0.5B-Instruct language model with the InternViT-300M-448px visual model, Vintern-1B is optimized for a range of applications, including optical character recognition (OCR), document extraction, and general question-answering in Vietnamese context. The model is fine-tuned on an extensive dataset of over 3 million image-question-answer pairs, achieving robust performance and reliable results across multiple Vietnamese language benchmarks like OpenViVQA and ViTextVQA. Vintern-1B is small enough to fit into various on-device applications easily. Additionally, we have open-sourced several Vietnamese vision question answering (VQA) datasets for text and diagrams, created with Gemini 1.5 Flash. Our models are available at: https://huggingface.co/5CD-AI/Vintern-1B-v2. △ Less

Submitted 23 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.06115 [pdf, other]

Measurement Study of Programmable Network Coding in Cloud-native 5G and Beyond Networks

Authors: Osel Lhamo, Tung V. Doan, Elif Tasdemir, Mahdi Attawna, Giang T. Nguyen, Patrick Seeling, Martin Reisslein, Frank H. P. Fitzek

Abstract: Emerging 5G/6G use cases span various industries, necessitating flexible solutions that leverage emerging technologies to meet diverse and stringent application requirements under changing network conditions. The standard 5G RAN solution, retransmission, reduces packet loss but can increase transmission delay in the process. Random Linear Network Coding (RLNC) offers an alternative by proactively… ▽ More Emerging 5G/6G use cases span various industries, necessitating flexible solutions that leverage emerging technologies to meet diverse and stringent application requirements under changing network conditions. The standard 5G RAN solution, retransmission, reduces packet loss but can increase transmission delay in the process. Random Linear Network Coding (RLNC) offers an alternative by proactively sending combinations of original packets, thus reducing both delay and packet loss. Current research often only simulates the integration of RLNC in 5G while we implement and evaluate our approach on real commercially available hardware in a real-world deployment. We introduce Flexible Network Coding (FlexNC), which enables the flexible fusion of several RLNC protocols by incorporating a forwarder with multiple RLNC nodes. Network operators can configure FlexNC based on network conditions and application requirements. To further boost network programmability, our Recoder in the Network (RecNet) leverages intermediate network nodes to join the coding process. Both the proposed algorithms have been implemented on OpenAirInterface and extensively tested with traffic from different applications in a real network. While FlexNC adapts to various application needs of latency and packet loss, RecNet significantly minimizes packet loss for a remote user with minimal increase in delay compared to pure RLNC. △ Less

Submitted 12 August, 2024; originally announced August 2024.

arXiv:2407.19287 [pdf, other]

Bayesian meta learning for trustworthy uncertainty quantification

Authors: Zhenyuan Yuan, Thinh T. Doan

Abstract: We consider the problem of Bayesian regression with trustworthy uncertainty quantification. We define that the uncertainty quantification is trustworthy if the ground truth can be captured by intervals dependent on the predictive distributions with a pre-specified probability. Furthermore, we propose, Trust-Bayes, a novel optimization framework for Bayesian meta learning which is cognizant of trus… ▽ More We consider the problem of Bayesian regression with trustworthy uncertainty quantification. We define that the uncertainty quantification is trustworthy if the ground truth can be captured by intervals dependent on the predictive distributions with a pre-specified probability. Furthermore, we propose, Trust-Bayes, a novel optimization framework for Bayesian meta learning which is cognizant of trustworthy uncertainty quantification without explicit assumptions on the prior model/distribution of the functions. We characterize the lower bounds of the probabilities of the ground truth being captured by the specified intervals and analyze the sample complexity with respect to the feasible probability for trustworthy uncertainty quantification. Monte Carlo simulation of a case study using Gaussian process regression is conducted for verification and comparison with the Meta-prior algorithm. △ Less

Submitted 27 July, 2024; originally announced July 2024.

arXiv:2407.18454 [pdf, other]

Fairness Definitions in Language Models Explained

Authors: Thang Viet Doan, Zhibo Chu, Zichong Wang, Wenbin Zhang

Abstract: Language Models (LMs) have demonstrated exceptional performance across various Natural Language Processing (NLP) tasks. Despite these advancements, LMs can inherit and amplify societal biases related to sensitive attributes such as gender and race, limiting their adoption in real-world applications. Therefore, fairness has been extensively explored in LMs, leading to the proposal of various fairne… ▽ More Language Models (LMs) have demonstrated exceptional performance across various Natural Language Processing (NLP) tasks. Despite these advancements, LMs can inherit and amplify societal biases related to sensitive attributes such as gender and race, limiting their adoption in real-world applications. Therefore, fairness has been extensively explored in LMs, leading to the proposal of various fairness notions. However, the lack of clear agreement on which fairness definition to apply in specific contexts (\textit{e.g.,} medium-sized LMs versus large-sized LMs) and the complexity of understanding the distinctions between these definitions can create confusion and impede further progress. To this end, this paper proposes a systematic survey that clarifies the definitions of fairness as they apply to LMs. Specifically, we begin with a brief introduction to LMs and fairness in LMs, followed by a comprehensive, up-to-date overview of existing fairness notions in LMs and the introduction of a novel taxonomy that categorizes these concepts based on their foundational principles and operational distinctions. We further illustrate each definition through experiments, showcasing their practical implications and outcomes. Finally, we discuss current research challenges and open questions, aiming to foster innovative ideas and advance the field. The implementation and additional resources are publicly available at https://github.com/LavinWong/Fairness-in-Large-Language-Models/tree/main/definitions. △ Less

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2406.14835 [pdf, other]

ToVo: Toxicity Taxonomy via Voting

Authors: Tinh Son Luong, Thanh-Thien Le, Thang Viet Doan, Linh Ngo Van, Thien Huu Nguyen, Diep Thi-Ngoc Nguyen

Abstract: Existing toxic detection models face significant limitations, such as lack of transparency, customization, and reproducibility. These challenges stem from the closed-source nature of their training data and the paucity of explanations for their evaluation mechanism. To address these issues, we propose a dataset creation mechanism that integrates voting and chain-of-thought processes, producing a h… ▽ More Existing toxic detection models face significant limitations, such as lack of transparency, customization, and reproducibility. These challenges stem from the closed-source nature of their training data and the paucity of explanations for their evaluation mechanism. To address these issues, we propose a dataset creation mechanism that integrates voting and chain-of-thought processes, producing a high-quality open-source dataset for toxic content detection. Our methodology ensures diverse classification metrics for each sample and includes both classification scores and explanatory reasoning for the classifications. We utilize the dataset created through our proposed mechanism to train our model, which is then compared against existing widely-used detectors. Our approach not only enhances transparency and customizability but also facilitates better fine-tuning for specific use cases. This work contributes a robust framework for developing toxic content detection models, emphasizing openness and adaptability, thus paving the way for more effective and user-specific content moderation solutions. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.05271 [pdf, other]

USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation

Authors: Xiaoqi Wang, Wenbin He, Xiwei Xuan, Clint Sebastian, Jorge Piazentin Ono, Xin Li, Sima Behpour, Thang Doan, Liang Gou, Han Wei Shen, Liu Ren

Abstract: The open-vocabulary image segmentation task involves partitioning images into semantically meaningful segments and classifying them with flexible text-defined categories. The recent vision-based foundation models such as the Segment Anything Model (SAM) have shown superior performance in generating class-agnostic image segments. The main challenge in open-vocabulary image segmentation now lies in… ▽ More The open-vocabulary image segmentation task involves partitioning images into semantically meaningful segments and classifying them with flexible text-defined categories. The recent vision-based foundation models such as the Segment Anything Model (SAM) have shown superior performance in generating class-agnostic image segments. The main challenge in open-vocabulary image segmentation now lies in accurately classifying these segments into text-defined categories. In this paper, we introduce the Universal Segment Embedding (USE) framework to address this challenge. This framework is comprised of two key components: 1) a data pipeline designed to efficiently curate a large amount of segment-text pairs at various granularities, and 2) a universal segment embedding model that enables precise segment classification into a vast range of text-defined categories. The USE model can not only help open-vocabulary image segmentation but also facilitate other downstream tasks (e.g., querying and ranking). Through comprehensive experimental studies on semantic segmentation and part segmentation benchmarks, we demonstrate that the USE framework outperforms state-of-the-art open-vocabulary segmentation methods. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.02554 [pdf, other]

Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition

Authors: Shijian Deng, Erin E. Kosloski, Siddhi Patel, Zeke A. Barnett, Yiyang Nan, Alexander Kaplan, Sisira Aarukapalli, William T. Doan, Matthew Wang, Harsh Singh, Pamela R. Rollins, Yapeng Tian

Abstract: In this article, we introduce a novel problem of audio-visual autism behavior recognition, which includes social behavior recognition, an essential aspect previously omitted in AI-assisted autism screening research. We define the task at hand as one that is audio-visual autism behavior recognition, which uses audio and visual cues, including any speech present in the audio, to recognize autism-rel… ▽ More In this article, we introduce a novel problem of audio-visual autism behavior recognition, which includes social behavior recognition, an essential aspect previously omitted in AI-assisted autism screening research. We define the task at hand as one that is audio-visual autism behavior recognition, which uses audio and visual cues, including any speech present in the audio, to recognize autism-related behaviors. To facilitate this new research direction, we collected an audio-visual autism spectrum dataset (AV-ASD), currently the largest video dataset for autism screening using a behavioral approach. It covers an extensive range of autism-associated behaviors, including those related to social communication and interaction. To pave the way for further research on this new problem, we intensively explored leveraging foundation models and multimodal large language models across different modalities. Our experiments on the AV-ASD dataset demonstrate that integrating audio, visual, and speech modalities significantly enhances the performance in autism behavior recognition. Additionally, we explored the use of a post-hoc to ad-hoc pipeline in a multimodal large language model to investigate its potential to augment the model's explanatory capability during autism behavior recognition. We will release our dataset, code, and pre-trained models. △ Less

Submitted 22 March, 2024; originally announced June 2024.

arXiv:2405.09660 [pdf, other]

Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning

Authors: Sihan Zeng, Thinh T. Doan

Abstract: Two-time-scale optimization is a framework introduced in Zeng et al. (2024) that abstracts a range of policy evaluation and policy optimization problems in reinforcement learning (RL). Akin to bi-level optimization under a particular type of stochastic oracle, the two-time-scale optimization framework has an upper level objective whose gradient evaluation depends on the solution of a lower level p… ▽ More Two-time-scale optimization is a framework introduced in Zeng et al. (2024) that abstracts a range of policy evaluation and policy optimization problems in reinforcement learning (RL). Akin to bi-level optimization under a particular type of stochastic oracle, the two-time-scale optimization framework has an upper level objective whose gradient evaluation depends on the solution of a lower level problem, which is to find the root of a strongly monotone operator. In this work, we propose a new method for solving two-time-scale optimization that achieves significantly faster convergence than the prior arts. The key idea of our approach is to leverage an averaging step to improve the estimates of the operators in both lower and upper levels before using them to update the decision variables. These additional averaging steps eliminate the direct coupling between the main variables, enabling the accelerated performance of our algorithm. We characterize the finite-time convergence rates of the proposed algorithm under various conditions of the underlying objective function, including strong convexity, convexity, Polyak-Lojasiewicz condition, and general non-convexity. These rates significantly improve over the best-known complexity of the standard two-time-scale stochastic approximation algorithm. When applied to RL, we show how the proposed algorithm specializes to novel online sample-based methods that surpass or match the performance of the existing state of the art. Finally, we support our theoretical results with numerical simulations in RL. △ Less

Submitted 10 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.02456 [pdf, ps, other]

Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning

Authors: Sihan Zeng, Thinh T. Doan, Justin Romberg

Abstract: Multi-task reinforcement learning (RL) aims to find a single policy that effectively solves multiple tasks at the same time. This paper presents a constrained formulation for multi-task RL where the goal is to maximize the average performance of the policy across tasks subject to bounds on the performance in each task. We consider solving this problem both in the centralized setting, where informa… ▽ More Multi-task reinforcement learning (RL) aims to find a single policy that effectively solves multiple tasks at the same time. This paper presents a constrained formulation for multi-task RL where the goal is to maximize the average performance of the policy across tasks subject to bounds on the performance in each task. We consider solving this problem both in the centralized setting, where information for all tasks is accessible to a single server, and in the decentralized setting, where a network of agents, each given one task and observing local information, cooperate to find the solution of the globally constrained objective using local communication. We first propose a primal-dual algorithm that provably converges to the globally optimal solution of this constrained formulation under exact gradient evaluations. When the gradient is unknown, we further develop a sampled-based actor-critic algorithm that finds the optimal policy using online samples of state, action, and reward. Finally, we study the extension of the algorithm to the linear function approximation setting. △ Less

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2403.06295 [pdf, other]

A streamlined Approach to Multimodal Few-Shot Class Incremental Learning for Fine-Grained Datasets

Authors: Thang Doan, Sima Behpour, Xin Li, Wenbin He, Liang Gou, Liu Ren

Abstract: Few-shot Class-Incremental Learning (FSCIL) poses the challenge of retaining prior knowledge while learning from limited new data streams, all without overfitting. The rise of Vision-Language models (VLMs) has unlocked numerous applications, leveraging their existing knowledge to fine-tune on custom data. However, training the whole model is computationally prohibitive, and VLMs while being versat… ▽ More Few-shot Class-Incremental Learning (FSCIL) poses the challenge of retaining prior knowledge while learning from limited new data streams, all without overfitting. The rise of Vision-Language models (VLMs) has unlocked numerous applications, leveraging their existing knowledge to fine-tune on custom data. However, training the whole model is computationally prohibitive, and VLMs while being versatile in general domains still struggle with fine-grained datasets crucial for many applications. We tackle these challenges with two proposed simple modules. The first, Session-Specific Prompts (SSP), enhances the separability of image-text embeddings across sessions. The second, Hyperbolic distance, compresses representations of image-text pairs within the same class while expanding those from different classes, leading to better representations. Experimental results demonstrate an average 10-point increase compared to baselines while requiring at least 8 times fewer trainable parameters. This improvement is further underscored on our three newly introduced fine-grained datasets. △ Less

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2401.12764 [pdf, other]

Fast Nonlinear Two-Time-Scale Stochastic Approximation: Achieving $O(1/k)$ Finite-Sample Complexity

Authors: Thinh T. Doan

Abstract: This paper proposes to develop a new variant of the two-time-scale stochastic approximation to find the roots of two coupled nonlinear operators, assuming only noisy samples of these operators can be observed. Our key idea is to leverage the classic Ruppert-Polyak averaging technique to dynamically estimate the operators through their samples. The estimated values of these averaging steps will the… ▽ More This paper proposes to develop a new variant of the two-time-scale stochastic approximation to find the roots of two coupled nonlinear operators, assuming only noisy samples of these operators can be observed. Our key idea is to leverage the classic Ruppert-Polyak averaging technique to dynamically estimate the operators through their samples. The estimated values of these averaging steps will then be used in the two-time-scale stochastic approximation updates to find the desired solution. Our main theoretical result is to show that under the strongly monotone condition of the underlying nonlinear operators the mean-squared errors of the iterates generated by the proposed method converge to zero at an optimal rate $O(1/k)$, where $k$ is the number of iterations. Our result significantly improves the existing result of two-time-scale stochastic approximation, where the best known finite-time convergence rate is $O(1/k^{2/3})$. We illustrate this result by applying the proposed method to develop new reinforcement learning algorithms with improved performance. △ Less

Submitted 22 March, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

arXiv:2312.16129 [pdf, other]

doi 10.1145/3613904.3642257

Interactive Shape Sonification for Tumor Localization in Breast Cancer Surgery

Authors: Laura Schütz, Trishia El Chemaly, Emmanuelle Weber, Anh Thien Doan, Jacqueline Tsai, Christoph Leuze, Bruce Daniel, Nassir Navab

Abstract: About 20 percent of patients undergoing breast-conserving surgery require reoperation due to cancerous tissue remaining inside the breast. Breast cancer localization systems utilize auditory feedback to convey the distance between a localization probe and a small marker (seed) implanted into the breast tumor prior to surgery. However, no information on the location of the tumor margin is provided.… ▽ More About 20 percent of patients undergoing breast-conserving surgery require reoperation due to cancerous tissue remaining inside the breast. Breast cancer localization systems utilize auditory feedback to convey the distance between a localization probe and a small marker (seed) implanted into the breast tumor prior to surgery. However, no information on the location of the tumor margin is provided. To reduce the reoperation rate by improving the usability and accuracy of the surgical task, we developed an auditory display using shape sonification to assist with tumor margin localization. Accuracy and usability of the interactive shape sonification were determined on models of the female breast in three user studies with both breast surgeons and non-clinical participants. The comparative studies showed a significant increase in usability (p<0.05) and localization accuracy (p<0.001) of the shape sonification over the auditory feedback currently used in surgery. △ Less

Submitted 28 January, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

Comments: 15 pages, 9 figures

ACM Class: H.5.2; H.5.5; J.3

Journal ref: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11-16, 2024, Honolulu, HI, USA. ACM, New York, NY, USA

arXiv:2312.07035 [pdf, other]

HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts

Authors: Giang Do, Khiem Le, Quang Pham, TrungTin Nguyen, Thanh-Nam Doan, Bint T. Nguyen, Chenghao Liu, Savitha Ramasamy, Xiaoli Li, Steven Hoi

Abstract: By routing input tokens to only a few split experts, Sparse Mixture-of-Experts has enabled efficient training of large language models. Recent findings suggest that fixing the routers can achieve competitive performance by alleviating the collapsing problem, where all experts eventually learn similar representations. However, this strategy has two key limitations: (i) the policy derived from rando… ▽ More By routing input tokens to only a few split experts, Sparse Mixture-of-Experts has enabled efficient training of large language models. Recent findings suggest that fixing the routers can achieve competitive performance by alleviating the collapsing problem, where all experts eventually learn similar representations. However, this strategy has two key limitations: (i) the policy derived from random routers might be sub-optimal, and (ii) it requires extensive resources during training and evaluation, leading to limited efficiency gains. This work introduces \HyperRout, which dynamically generates the router's parameters through a fixed hypernetwork and trainable embeddings to achieve a balance between training the routers and freezing them to learn an improved routing policy. Extensive experiments across a wide range of tasks demonstrate the superior performance and efficiency gains of \HyperRouter compared to existing routing methods. Our implementation is publicly available at {\url{https://github.com/giangdip2410/HyperRouter}}. △ Less

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2308.00310 [pdf, other]

GradOrth: A Simple yet Efficient Out-of-Distribution Detection with Orthogonal Projection of Gradients

Authors: Sima Behpour, Thang Doan, Xin Li, Wenbin He, Liang Gou, Liu Ren

Abstract: Detecting out-of-distribution (OOD) data is crucial for ensuring the safe deployment of machine learning models in real-world applications. However, existing OOD detection approaches primarily rely on the feature maps or the full gradient space information to derive OOD scores neglecting the role of most important parameters of the pre-trained network over in-distribution (ID) data. In this study,… ▽ More Detecting out-of-distribution (OOD) data is crucial for ensuring the safe deployment of machine learning models in real-world applications. However, existing OOD detection approaches primarily rely on the feature maps or the full gradient space information to derive OOD scores neglecting the role of most important parameters of the pre-trained network over in-distribution (ID) data. In this study, we propose a novel approach called GradOrth to facilitate OOD detection based on one intriguing observation that the important features to identify OOD data lie in the lower-rank subspace of in-distribution (ID) data. In particular, we identify OOD data by computing the norm of gradient projection on the subspaces considered important for the in-distribution data. A large orthogonal projection value (i.e. a small projection value) indicates the sample as OOD as it captures a weak correlation of the ID data. This simple yet effective method exhibits outstanding performance, showcasing a notable reduction in the average false positive rate at a 95% true positive rate (FPR95) of up to 8% when compared to the current state-of-the-art methods. △ Less

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2307.11227 [pdf, other]

UP-DP: Unsupervised Prompt Learning for Data Pre-Selection with Vision-Language Models

Authors: Xin Li, Sima Behpour, Thang Doan, Wenbin He, Liang Gou, Liu Ren

Abstract: In this study, we investigate the task of data pre-selection, which aims to select instances for labeling from an unlabeled dataset through a single pass, thereby optimizing performance for undefined downstream tasks with a limited annotation budget. Previous approaches to data pre-selection relied solely on visual features extracted from foundation models, such as CLIP and BLIP-2, but largely ign… ▽ More In this study, we investigate the task of data pre-selection, which aims to select instances for labeling from an unlabeled dataset through a single pass, thereby optimizing performance for undefined downstream tasks with a limited annotation budget. Previous approaches to data pre-selection relied solely on visual features extracted from foundation models, such as CLIP and BLIP-2, but largely ignored the powerfulness of text features. In this work, we argue that, with proper design, the joint feature space of both vision and text can yield a better representation for data pre-selection. To this end, we introduce UP-DP, a simple yet effective unsupervised prompt learning approach that adapts vision-language models, like BLIP-2, for data pre-selection. Specifically, with the BLIP-2 parameters frozen, we train text prompts to extract the joint features with improved representation, ensuring a diverse cluster structure that covers the entire dataset. We extensively compare our method with the state-of-the-art using seven benchmark datasets in different settings, achieving up to a performance gain of 20%. Interestingly, the prompts learned from one dataset demonstrate significant generalizability and can be applied directly to enhance the feature extraction of BLIP-2 from other datasets. To the best of our knowledge, UP-DP is the first work to incorporate unsupervised prompt learning in a vision-language model for data pre-selection. △ Less

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2306.14291 [pdf, other]

Hyp-OW: Exploiting Hierarchical Structure Learning with Hyperbolic Distance Enhances Open World Object Detection

Authors: Thang Doan, Xin Li, Sima Behpour, Wenbin He, Liang Gou, Liu Ren

Abstract: Open World Object Detection (OWOD) is a challenging and realistic task that extends beyond the scope of standard Object Detection task. It involves detecting both known and unknown objects while integrating learned knowledge for future tasks. However, the level of "unknownness" varies significantly depending on the context. For example, a tree is typically considered part of the background in a se… ▽ More Open World Object Detection (OWOD) is a challenging and realistic task that extends beyond the scope of standard Object Detection task. It involves detecting both known and unknown objects while integrating learned knowledge for future tasks. However, the level of "unknownness" varies significantly depending on the context. For example, a tree is typically considered part of the background in a self-driving scene, but it may be significant in a household context. We argue that this contextual information should already be embedded within the known classes. In other words, there should be a semantic or latent structure relationship between the known and unknown items to be discovered. Motivated by this observation, we propose Hyp-OW, a method that learns and models hierarchical representation of known items through a SuperClass Regularizer. Leveraging this representation allows us to effectively detect unknown objects using a similarity distance-based relabeling module. Extensive experiments on benchmark datasets demonstrate the effectiveness of Hyp-OW, achieving improvement in both known and unknown detection (up to 6 percent). These findings are particularly pronounced in our newly designed benchmark, where a strong hierarchical structure exists between known and unknown objects. Our code can be found at https://github.com/boschresearch/Hyp-OW △ Less

Submitted 15 February, 2024; v1 submitted 25 June, 2023; originally announced June 2023.

Comments: Accepted at AAAI 2024 || keywords: Open World Object Detection, Hyperbolic Distance, Unknown Detection, Deformable Transformers, Hierarchical Representation Learning

arXiv:2305.13696 [pdf, other]

doi 10.18653/v1/2023.findings-acl.7

Abstractive Text Summarization Using the BRIO Training Paradigm

Authors: Khang Nhut Lam, Thieu Gia Doan, Khang Thua Pham, Jugal Kalita

Abstract: Summary sentences produced by abstractive summarization models may be coherent and comprehensive, but they lack control and rely heavily on reference summaries. The BRIO training paradigm assumes a non-deterministic distribution to reduce the model's dependence on reference summaries, and improve model performance during inference. This paper presents a straightforward but effective technique to i… ▽ More Summary sentences produced by abstractive summarization models may be coherent and comprehensive, but they lack control and rely heavily on reference summaries. The BRIO training paradigm assumes a non-deterministic distribution to reduce the model's dependence on reference summaries, and improve model performance during inference. This paper presents a straightforward but effective technique to improve abstractive summaries by fine-tuning pre-trained language models, and training them with the BRIO paradigm. We build a text summarization dataset for Vietnamese, called VieSum. We perform experiments with abstractive summarization models trained with the BRIO paradigm on the CNNDM and the VieSum datasets. The results show that the models, trained on basic hardware, outperform all existing abstractive summarization models, especially for Vietnamese. △ Less

Submitted 23 May, 2023; originally announced May 2023.

Comments: 6 pages, Findings of the Association for Computational Linguistics: ACL 2023

Journal ref: Findings of the Association for Computational Linguistics: ACL 2023

arXiv:2305.00790 [pdf, other]

doi 10.1145/3517745.3561445

DNS Privacy with Speed? Evaluating DNS over QUIC and its Impact on Web Performance

Authors: Mike Kosek, Luca Schumann, Robin Marx, Trinh Viet Doan, Vaibhav Bajpai

Abstract: Over the last decade, Web traffic has significantly shifted towards HTTPS due to an increased awareness for privacy. However, DNS traffic is still largely unencrypted, which allows user profiles to be derived from plaintext DNS queries. While DNS over TLS (DoT) and DNS over HTTPS (DoH) address this problem by leveraging transport encryption for DNS, both protocols are constrained by the underlying… ▽ More Over the last decade, Web traffic has significantly shifted towards HTTPS due to an increased awareness for privacy. However, DNS traffic is still largely unencrypted, which allows user profiles to be derived from plaintext DNS queries. While DNS over TLS (DoT) and DNS over HTTPS (DoH) address this problem by leveraging transport encryption for DNS, both protocols are constrained by the underlying transport (TCP) and encryption (TLS) protocols, requiring multiple round-trips to establish a secure connection. In contrast, QUIC combines the transport and cryptographic handshake into a single round-trip, which allows the recently standardized DNS over QUIC (DoQ) to provide DNS privacy with minimal latency. In the first study of its kind, we perform distributed DoQ measurements across multiple vantage points to evaluate the impact of DoQ on Web performance. We find that DoQ excels over DoH, leading to significant improvements with up to 10% faster loads for simple webpages. With increasing complexity of webpages, DoQ even catches up to DNS over UDP (DoUDP) as the cost of encryption amortizes: With DoQ being only ~2% slower than DoUDP, encrypted DNS becomes much more appealing for the Web. △ Less

Submitted 3 May, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

Journal ref: ACM Internet Measurement Conference (IMC) 2023

arXiv:2303.12981 [pdf, other]

Connected Superlevel Set in (Deep) Reinforcement Learning and its Application to Minimax Theorems

Authors: Sihan Zeng, Thinh T. Doan, Justin Romberg

Abstract: The aim of this paper is to improve the understanding of the optimization landscape for policy optimization problems in reinforcement learning. Specifically, we show that the superlevel set of the objective function with respect to the policy parameter is always a connected set both in the tabular setting and under policies represented by a class of neural networks. In addition, we show that the o… ▽ More The aim of this paper is to improve the understanding of the optimization landscape for policy optimization problems in reinforcement learning. Specifically, we show that the superlevel set of the objective function with respect to the policy parameter is always a connected set both in the tabular setting and under policies represented by a class of neural networks. In addition, we show that the optimization objective as a function of the policy parameter and reward satisfies a stronger "equiconnectedness" property. To our best knowledge, these are novel and previously unknown discoveries. We present an application of the connectedness of these superlevel sets to the derivation of minimax theorems for robust reinforcement learning. We show that any minimax optimization program which is convex on one side and is equiconnected on the other side observes the minimax equality (i.e. has a Nash equilibrium). We find that this exact structure is exhibited by an interesting robust reinforcement learning problem under an adversarial reward attack, and the validity of its minimax equality immediately follows. This is the first time such a result is established in the literature. △ Less

Submitted 30 September, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

arXiv:2301.10965 [pdf]

Design of Mobile Manipulator for Fire Extinguisher Testing. Part I Key Specifications and Conceptual Design

Authors: Xuan Quang Ngo, Thai Nguyen Chau, Cong Thang Doan, Van Tu Duong, Duy Vo Hoang, Tan Tien Nguyen

Abstract: All flames are extinguished as early as possible, or fire services have to deal with major conflagrations. This leads to the fact that the quality of fire extinguishers has become a very sensitive and important issue in firefighting. Inspired by the development of automatic fire fighting systems, this paper proposes key specifications based on the standard of fire extinguishers that is ISO 7165:20… ▽ More All flames are extinguished as early as possible, or fire services have to deal with major conflagrations. This leads to the fact that the quality of fire extinguishers has become a very sensitive and important issue in firefighting. Inspired by the development of automatic fire fighting systems, this paper proposes key specifications based on the standard of fire extinguishers that is ISO 7165:2009 and ISO 11601:2008, and feasible solutions to design a mobile manipulator for automatically evaluating the quality or, more specifically, power of fire extinguishers. In addition, a part of the mechanical design is also discussed. △ Less

Submitted 26 January, 2023; originally announced January 2023.

Comments: 10 pages, 8 figures, the 7th International Conference on Advanced Engineering, Theory and Applications

arXiv:2211.10445 [pdf, other]

Building a Subspace of Policies for Scalable Continual Learning

Authors: Jean-Baptiste Gaya, Thang Doan, Lucas Caccia, Laure Soulier, Ludovic Denoyer, Roberta Raileanu

Abstract: The ability to continuously acquire new knowledge and skills is crucial for autonomous agents. Existing methods are typically based on either fixed-size models that struggle to learn a large number of diverse behaviors, or growing-size models that scale poorly with the number of tasks. In this work, we aim to strike a better balance between an agent's size and performance by designing a method tha… ▽ More The ability to continuously acquire new knowledge and skills is crucial for autonomous agents. Existing methods are typically based on either fixed-size models that struggle to learn a large number of diverse behaviors, or growing-size models that scale poorly with the number of tasks. In this work, we aim to strike a better balance between an agent's size and performance by designing a method that grows adaptively depending on the task sequence. We introduce Continual Subspace of Policies (CSP), a new approach that incrementally builds a subspace of policies for training a reinforcement learning agent on a sequence of tasks. The subspace's high expressivity allows CSP to perform well for many different tasks while growing sublinearly with the number of tasks. Our method does not suffer from forgetting and displays positive transfer to new tasks. CSP outperforms a number of popular baselines on a wide range of scenarios from two challenging domains, Brax (locomotion) and Continual World (manipulation). △ Less

Submitted 2 March, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

Comments: Accepted at ICLR2023 (notable-top-25%). website: https://continual-subspace-policies-streamlit-app-gofujp.streamlit.app/ code: https://github.com/facebookresearch/salina/tree/main/salina_cl

arXiv:2211.03151 [pdf, other]

LG-Hand: Advancing 3D Hand Pose Estimation with Locally and Globally Kinematic Knowledge

Authors: Tu Le-Xuan, Trung Tran-Quang, Thi Ngoc Hien Doan, Thanh-Hai Tran

Abstract: 3D hand pose estimation from RGB images suffers from the difficulty of obtaining the depth information. Therefore, a great deal of attention has been spent on estimating 3D hand pose from 2D hand joints. In this paper, we leverage the advantage of spatial-temporal Graph Convolutional Neural Networks and propose LG-Hand, a powerful method for 3D hand pose estimation. Our method incorporates both sp… ▽ More 3D hand pose estimation from RGB images suffers from the difficulty of obtaining the depth information. Therefore, a great deal of attention has been spent on estimating 3D hand pose from 2D hand joints. In this paper, we leverage the advantage of spatial-temporal Graph Convolutional Neural Networks and propose LG-Hand, a powerful method for 3D hand pose estimation. Our method incorporates both spatial and temporal dependencies into a single process. We argue that kinematic information plays an important role, contributing to the performance of 3D hand pose estimation. We thereby introduce two new objective functions, Angle and Direction loss, to take the hand structure into account. While Angle loss covers locally kinematic information, Direction loss handles globally kinematic one. Our LG-Hand achieves promising results on the First-Person Hand Action Benchmark (FPHAB) dataset. We also perform an ablation study to show the efficacy of the two proposed objective functions. △ Less

Submitted 6 November, 2022; originally announced November 2022.

arXiv:2210.12938 [pdf]

GradMix for nuclei segmentation and classification in imbalanced pathology image datasets

Authors: Tan Nhu Nhat Doan, Kyungeun Kim, Boram Song, Jin Tae Kwak

Abstract: An automated segmentation and classification of nuclei is an essential task in digital pathology. The current deep learning-based approaches require a vast amount of annotated datasets by pathologists. However, the existing datasets are imbalanced among different types of nuclei in general, leading to a substantial performance degradation. In this paper, we propose a simple but effective data augm… ▽ More An automated segmentation and classification of nuclei is an essential task in digital pathology. The current deep learning-based approaches require a vast amount of annotated datasets by pathologists. However, the existing datasets are imbalanced among different types of nuclei in general, leading to a substantial performance degradation. In this paper, we propose a simple but effective data augmentation technique, termed GradMix, that is specifically designed for nuclei segmentation and classification. GradMix takes a pair of a major-class nucleus and a rare-class nucleus, creates a customized mixing mask, and combines them using the mask to generate a new rare-class nucleus. As it combines two nuclei, GradMix considers both nuclei and the neighboring environment by using the customized mixing mask. This allows us to generate realistic rare-class nuclei with varying environments. We employed two datasets to evaluate the effectiveness of GradMix. The experimental results suggest that GradMix is able to improve the performance of nuclei segmentation and classification in imbalanced pathology image datasets. △ Less

Submitted 23 October, 2022; originally announced October 2022.

Comments: submitted to MICCAI2022

arXiv:2208.13252 [pdf, other]

MANDO: Multi-Level Heterogeneous Graph Embeddings for Fine-Grained Detection of Smart Contract Vulnerabilities

Authors: Hoang H. Nguyen, Nhat-Minh Nguyen, Chunyao Xie, Zahra Ahmadi, Daniel Kudendo, Thanh-Nam Doan, Lingxiao Jiang

Abstract: Learning heterogeneous graphs consisting of different types of nodes and edges enhances the results of homogeneous graph techniques. An interesting example of such graphs is control-flow graphs representing possible software code execution flows. As such graphs represent more semantic information of code, developing techniques and tools for such graphs can be highly beneficial for detecting vulner… ▽ More Learning heterogeneous graphs consisting of different types of nodes and edges enhances the results of homogeneous graph techniques. An interesting example of such graphs is control-flow graphs representing possible software code execution flows. As such graphs represent more semantic information of code, developing techniques and tools for such graphs can be highly beneficial for detecting vulnerabilities in software for its reliability. However, existing heterogeneous graph techniques are still insufficient in handling complex graphs where the number of different types of nodes and edges is large and variable. This paper concentrates on the Ethereum smart contracts as a sample of software codes represented by heterogeneous contract graphs built upon both control-flow graphs and call graphs containing different types of nodes and links. We propose MANDO, a new heterogeneous graph representation to learn such heterogeneous contract graphs' structures. MANDO extracts customized metapaths, which compose relational connections between different types of nodes and their neighbors. Moreover, it develops a multi-metapath heterogeneous graph attention network to learn multi-level embeddings of different types of nodes and their metapaths in the heterogeneous contract graphs, which can capture the code semantics of smart contracts more accurately and facilitate both fine-grained line-level and coarse-grained contract-level vulnerability detection. Our extensive evaluation of large smart contract datasets shows that MANDO improves the vulnerability detection results of other techniques at the coarse-grained contract level. More importantly, it is the first learning-based approach capable of identifying vulnerabilities at the fine-grained line-level, and significantly improves the traditional code analysis-based vulnerability detection approaches by 11.35% to 70.81% in terms of F1-score. △ Less

Submitted 7 September, 2022; v1 submitted 28 August, 2022; originally announced August 2022.

Comments: Accepted at the 9th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2022 - Research Track)

ACM Class: I.2.5; D.2.4

arXiv:2206.07642 [pdf, other]

Convergence and Price of Anarchy Guarantees of the Softmax Policy Gradient in Markov Potential Games

Authors: Dingyang Chen, Qi Zhang, Thinh T. Doan

Abstract: We study the performance of policy gradient methods for the subclass of Markov games known as Markov potential games (MPGs), which extends the notion of normal-form potential games to the stateful setting and includes the important special case of the fully cooperative setting where the agents share an identical reward function. Our focus in this paper is to study the convergence of the policy gra… ▽ More We study the performance of policy gradient methods for the subclass of Markov games known as Markov potential games (MPGs), which extends the notion of normal-form potential games to the stateful setting and includes the important special case of the fully cooperative setting where the agents share an identical reward function. Our focus in this paper is to study the convergence of the policy gradient method for solving MPGs under softmax policy parameterization, both tabular and parameterized with general function approximators such as neural networks. We first show the asymptotic convergence of this method to a Nash equilibrium of MPGs for tabular softmax policies. Second, we derive the finite-time performance of the policy gradient in two settings: 1) using the log-barrier regularization, and 2) using the natural policy gradient under the best-response dynamics (NPG-BR). Finally, extending the notion of price of anarchy (POA) and smoothness in normal-form games, we introduce the POA for MPGs and provide a POA bound for NPG-BR. To our knowledge, this is the first POA bound for solving MPGs. To support our theoretical results, we empirically compare the convergence rates and POA of policy gradient variants for both tabular and neural softmax policies. △ Less

Submitted 15 June, 2022; originally announced June 2022.

arXiv:2205.13746 [pdf, other]

Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games

Authors: Sihan Zeng, Thinh T. Doan, Justin Romberg

Abstract: We study the problem of finding the Nash equilibrium in a two-player zero-sum Markov game. Due to its formulation as a minimax optimization program, a natural approach to solve the problem is to perform gradient descent/ascent with respect to each player in an alternating fashion. However, due to the non-convexity/non-concavity of the underlying objective function, theoretical understandings of th… ▽ More We study the problem of finding the Nash equilibrium in a two-player zero-sum Markov game. Due to its formulation as a minimax optimization program, a natural approach to solve the problem is to perform gradient descent/ascent with respect to each player in an alternating fashion. However, due to the non-convexity/non-concavity of the underlying objective function, theoretical understandings of this method are limited. In our paper, we consider solving an entropy-regularized variant of the Markov game. The regularization introduces structure into the optimization landscape that make the solutions more identifiable and allow the problem to be solved more efficiently. Our main contribution is to show that under proper choices of the regularization parameter, the gradient descent ascent algorithm converges to the Nash equilibrium of the original unregularized problem. We explicitly characterize the finite-time performance of the last iterate of our algorithm, which vastly improves over the existing convergence bound of the gradient descent ascent algorithm without regularization. Finally, we complement the analysis with numerical simulations that illustrate the accelerated convergence of the algorithm. △ Less

Submitted 12 October, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

arXiv:2205.05940 [pdf, other]

SimCPSR: Simple Contrastive Learning for Paper Submission Recommendation System

Authors: Duc H. Le, Tram T. Doan, Son T. Huynh, Binh T. Nguyen

Abstract: The recommendation system plays a vital role in many areas, especially academic fields, to support researchers in submitting and increasing the acceptance of their work through the conference or journal selection process. This study proposes a transformer-based model using transfer learning as an efficient approach for the paper submission recommendation system. By combining essential information… ▽ More The recommendation system plays a vital role in many areas, especially academic fields, to support researchers in submitting and increasing the acceptance of their work through the conference or journal selection process. This study proposes a transformer-based model using transfer learning as an efficient approach for the paper submission recommendation system. By combining essential information (such as the title, the abstract, and the list of keywords) with the aims and scopes of journals, the model can recommend the Top K journals that maximize the acceptance of the paper. Our model had developed through two states: (i) Fine-tuning the pre-trained language model (LM) with a simple contrastive learning framework. We utilized a simple supervised contrastive objective to fine-tune all parameters, encouraging the LM to learn the document representation effectively. (ii) The fine-tuned LM was then trained on different combinations of the features for the downstream task. This study suggests a more advanced method for enhancing the efficiency of the paper submission recommendation system compared to previous approaches when we respectively achieve 0.5173, 0.8097, 0.8862, 0.9496 for Top 1, 3, 5, and 10 accuracies on the test set for combining the title, abstract, and keywords as input features. Incorporating the journals' aims and scopes, our model shows an exciting result by getting 0.5194, 0.8112, 0.8866, and 0.9496 respective to Top 1, 3, 5, and 10. △ Less

Submitted 12 May, 2022; originally announced May 2022.

Comments: 13 pages, 1 table, 4 figures

arXiv:2205.00746 [pdf, other]

doi 10.1145/3544912.3544918

Measuring DNS over TCP in the Era of Increasing DNS Response Sizes: A View from the Edge

Authors: Mike Kosek, Trinh Viet Doan, Simon Huber, Vaibhav Bajpai

Abstract: The Domain Name System (DNS) is one of the most crucial parts of the Internet. Although the original standard defined the usage of DNS over UDP (DoUDP) as well as DNS over TCP (DoTCP), UDP has become the predominant protocol used in the DNS. With the introduction of new Resource Records (RRs), the sizes of DNS responses have increased considerably. Since this can lead to truncation or IP fragmenta… ▽ More The Domain Name System (DNS) is one of the most crucial parts of the Internet. Although the original standard defined the usage of DNS over UDP (DoUDP) as well as DNS over TCP (DoTCP), UDP has become the predominant protocol used in the DNS. With the introduction of new Resource Records (RRs), the sizes of DNS responses have increased considerably. Since this can lead to truncation or IP fragmentation, the fallback to DoTCP as required by the standard ensures successful DNS responses by overcoming the size limitations of DoUDP. However, the effects of the usage of DoTCP by stub resolvers are not extensively studied to this date. We close this gap by presenting a view at DoTCP from the Edge, issuing 12.1M DNS requests from 2,500 probes toward Public as well as Probe DNS recursive resolvers. In our measurement study, we observe that DoTCP is generally slower than DoUDP, where the relative increase in Response Time is less than 37% for most resolvers. While optimizations to DoTCP can be leveraged to further reduce the response times, we show that support on Public resolvers is still missing, hence leaving room for optimizations in the future. Moreover, we also find that Public resolvers generally have comparable reliability for DoTCP and DoUDP. However, Probe resolvers show a significantly different behavior: DoTCP queries targeting Probe resolvers fail in 3 out of 4 cases, and, therefore, do not comply with the standard. This problem will only aggravate in the future: As DNS response sizes will continue to grow, the need for DoTCP will solidify. △ Less

Submitted 18 July, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

Comments: Published in ACM SIGCOMM Computer Communication Review Volume 52 Issue 2, April 2022

arXiv:2202.09826 [pdf, other]

Continual Learning Beyond a Single Model

Authors: Thang Doan, Seyed Iman Mirzadeh, Mehrdad Farajtabar

Abstract: A growing body of research in continual learning focuses on the catastrophic forgetting problem. While many attempts have been made to alleviate this problem, the majority of the methods assume a single model in the continual learning setup. In this work, we question this assumption and show that employing ensemble models can be a simple yet effective method to improve continual performance. Howev… ▽ More A growing body of research in continual learning focuses on the catastrophic forgetting problem. While many attempts have been made to alleviate this problem, the majority of the methods assume a single model in the continual learning setup. In this work, we question this assumption and show that employing ensemble models can be a simple yet effective method to improve continual performance. However, ensembles' training and inference costs can increase significantly as the number of models grows. Motivated by this limitation, we study different ensemble models to understand their benefits and drawbacks in continual learning scenarios. Finally, to overcome the high compute cost of ensembles, we leverage recent advances in neural network subspace to propose a computationally cheap algorithm with similar runtime to a single model yet enjoying the performance benefits of ensembles. △ Less

Submitted 3 July, 2023; v1 submitted 20 February, 2022; originally announced February 2022.

Comments: Accepted to 2nd Conference on Lifelong Learning Agents (CoLLAs 2023); Keywords: continual learning, neural network subspaces, ensemble models, computationally efficient training

arXiv:2202.06315 [pdf, other]

Towards Decentralised Cloud Storage with IPFS: Opportunities, Challenges, and Future Directions

Authors: Trinh Viet Doan, Yiannis Psaras, Jörg Ott, Vaibhav Bajpai

Abstract: The InterPlanetary File System (IPFS) is a novel decentralised storage architecture, which attempts to provide decentralised cloud storage by building on founding principles of P2P networking and content addressing. IPFS is used by more than 230k peers per week and serves tens of millions of requests per day, which makes it an interesting large-scale operational network to study. While it is used… ▽ More The InterPlanetary File System (IPFS) is a novel decentralised storage architecture, which attempts to provide decentralised cloud storage by building on founding principles of P2P networking and content addressing. IPFS is used by more than 230k peers per week and serves tens of millions of requests per day, which makes it an interesting large-scale operational network to study. While it is used as a building block in several projects and studies, its inner workings, properties, and implications have only been marginally explored in research. Thus, we provide an overview of the IPFS design and its core features, along with the opportunities that it opens as well as the challenges that it faces because of its properties. Overall, IPFS presents an interesting set of characteristics and offers lessons which can help building decentralised systems of the future. △ Less

Submitted 2 April, 2022; v1 submitted 13 February, 2022; originally announced February 2022.

arXiv:2202.02987 [pdf, ps, other]

doi 10.1007/978-3030-98785-5_24

One to Rule them All? A First Look at DNS over QUIC

Authors: Mike Kosek, Trinh Viet Doan, Malte Granderath, Vaibhav Bajpai

Abstract: The DNS is one of the most crucial parts of the Internet. Since the original DNS specifications defined UDP and TCP as the underlying transport protocols, DNS queries are inherently unencrypted, making them vulnerable to eavesdropping and on-path manipulations. Consequently, concerns about DNS privacy have gained attention in recent years, which resulted in the introduction of the encrypted protoc… ▽ More The DNS is one of the most crucial parts of the Internet. Since the original DNS specifications defined UDP and TCP as the underlying transport protocols, DNS queries are inherently unencrypted, making them vulnerable to eavesdropping and on-path manipulations. Consequently, concerns about DNS privacy have gained attention in recent years, which resulted in the introduction of the encrypted protocols DNS over TLS (DoT) and DNS over HTTPS (DoH). Although these protocols address the key issues of adding privacy to the DNS, they are inherently restrained by their underlying transport protocols, which are at strife with, e.g., IP fragmentation or multi-RTT handshakes - challenges which are addressed by QUIC. As such, the recent addition of DNS over QUIC (DoQ) promises to improve upon the established DNS protocols. However, no studies focusing on DoQ, its adoption, or its response times exist to this date - a gap we close with our study. Our active measurements show a slowly but steadily increasing adoption of DoQ and reveal a high week-over-week fluctuation, which reflects the ongoing development process: As DoQ is still in standardization, implementations and services undergo rapid changes. Analyzing the response times of DoQ, we find that roughly 40% of measurements show considerably higher handshake times than expected, which traces back to the enforcement of the traffic amplification limit despite successful validation of the client's address. However, DoQ already outperforms DoT as well as DoH, which makes it the best choice for encrypted DNS to date. △ Less

Submitted 23 March, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

Comments: The final publication is available at Springer via https://doi.org/10.1007/978-3030-98785-5_24

Journal ref: International Conference on Passive and Active Network Measurement (PAM) 2022

arXiv:2201.00142 [pdf, other]

Impact of Evolving Protocols and COVID-19 on Internet Traffic Shares

Authors: Luca Schumann, Trinh Viet Doan, Tanya Shreedhar, Ricky Mok, Vaibhav Bajpai

Abstract: The rapid deployment of new Internet protocols over the last few years and the COVID-19 pandemic more recently (2020) has resulted in a change in the Internet traffic composition. Consequently, an updated microscopic view of traffic shares is needed to understand how the Internet is evolving to capture both such shorter- and longer-term events. Toward this end, we observe traffic composition at a… ▽ More The rapid deployment of new Internet protocols over the last few years and the COVID-19 pandemic more recently (2020) has resulted in a change in the Internet traffic composition. Consequently, an updated microscopic view of traffic shares is needed to understand how the Internet is evolving to capture both such shorter- and longer-term events. Toward this end, we observe traffic composition at a research network in Japan and a Tier-1 ISP in the USA. We analyze the traffic traces passively captured at two inter-domain links: MAWI (Japan) and CAIDA (New York-Sao Paulo), which cover 100GB of data for MAWI traces and 4TB of data for CAIDA traces in total. We begin by studying the impact of COVID-19 on the MAWI link: We find a substantial increase in the traffic volume of OpenVPN and rsync, as well as increases in traffic volume from cloud storage and video conferencing services, which shows that clients shift to remote work during the pandemic. For traffic traces between March 2018 to December 2018, we find that the use of IPv6 is increasing quickly on the CAIDA monitor: The IPv6 traffic volume increases from 1.1% in March 2018 to 6.1% in December 2018, while the IPv6 traffic share remains stable in the MAWI dataset at around 9% of the traffic volume. Among other protocols at the application layer, 60%-70% of IPv4 traffic on the CAIDA link is HTTP(S) traffic, out of which two-thirds are encrypted; for the MAWI link, more than 90% of the traffic is Web, of which nearly 75% is encrypted. Compared to previous studies, this depicts a larger increase in encrypted Web traffic of up to a 3-to-1 ratio of HTTPS to HTTP. As such, our observations in this study further reconfirm that traffic shares change with time and can vary greatly depending on the vantage point studied despite the use of the same generalized methodology and analyses, which can also be applied to other traffic monitoring datasets. △ Less

Submitted 15 January, 2022; v1 submitted 1 January, 2022; originally announced January 2022.

arXiv:2112.09579 [pdf, ps, other]

Convergence Rates of Two-Time-Scale Gradient Descent-Ascent Dynamics for Solving Nonconvex Min-Max Problems

Authors: Thinh T. Doan

Abstract: There are much recent interests in solving noncovnex min-max optimization problems due to its broad applications in many areas including machine learning, networked resource allocations, and distributed optimization. Perhaps, the most popular first-order method in solving min-max optimization is the so-called simultaneous (or single-loop) gradient descent-ascent algorithm due to its simplicity in… ▽ More There are much recent interests in solving noncovnex min-max optimization problems due to its broad applications in many areas including machine learning, networked resource allocations, and distributed optimization. Perhaps, the most popular first-order method in solving min-max optimization is the so-called simultaneous (or single-loop) gradient descent-ascent algorithm due to its simplicity in implementation. However, theoretical guarantees on the convergence of this algorithm is very sparse since it can diverge even in a simple bilinear problem. In this paper, our focus is to characterize the finite-time performance (or convergence rates) of the continuous-time variant of simultaneous gradient descent-ascent algorithm. In particular, we derive the rates of convergence of this method under a number of different conditions on the underlying objective function, namely, two-sided Polyak-L ojasiewicz (PL), one-sided PL, nonconvex-strongly concave, and strongly convex-nonconcave conditions. Our convergence results improve the ones in prior works under the same conditions of objective functions. The key idea in our analysis is to use the classic singular perturbation theory and coupling Lyapunov functions to address the time-scale difference and interactions between the gradient descent and ascent dynamics. Our results on the behavior of continuous-time algorithm may be used to enhance the convergence properties of its discrete-time counterpart. △ Less

Submitted 17 December, 2021; originally announced December 2021.

arXiv:2110.11383 [pdf, other]

Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes

Authors: Sihan Zeng, Thinh T. Doan, Justin Romberg

Abstract: We consider a discounted cost constrained Markov decision process (CMDP) policy optimization problem, in which an agent seeks to maximize a discounted cumulative reward subject to a number of constraints on discounted cumulative utilities. To solve this constrained optimization program, we study an online actor-critic variant of a classic primal-dual method where the gradients of both the primal a… ▽ More We consider a discounted cost constrained Markov decision process (CMDP) policy optimization problem, in which an agent seeks to maximize a discounted cumulative reward subject to a number of constraints on discounted cumulative utilities. To solve this constrained optimization program, we study an online actor-critic variant of a classic primal-dual method where the gradients of both the primal and dual functions are estimated using samples from a single trajectory generated by the underlying time-varying Markov processes. This online primal-dual natural actor-critic algorithm maintains and iteratively updates three variables: a dual variable (or Lagrangian multiplier), a primal variable (or actor), and a critic variable used to estimate the gradients of both primal and dual variables. These variables are updated simultaneously but on different time scales (using different step sizes) and they are all intertwined with each other. Our main contribution is to derive a finite-time analysis for the convergence of this algorithm to the global optimum of a CMDP problem. Specifically, we show that with a proper choice of step sizes the optimality gap and constraint violation converge to zero in expectation at a rate $\mathcal{O}(1/K^{1/6})$, where K is the number of iterations. To our knowledge, this paper is the first to study the finite-time complexity of an online primal-dual actor-critic method for solving a CMDP problem. We also validate the effectiveness of this algorithm through numerical simulations. △ Less

Submitted 23 September, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

arXiv:2109.14756 [pdf, other]

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

Authors: Sihan Zeng, Thinh T. Doan, Justin Romberg

Abstract: We study a new two-time-scale stochastic gradient method for solving optimization problems, where the gradients are computed with the aid of an auxiliary variable under samples generated by time-varying MDPs controlled by the underlying optimization variable. These time-varying samples make gradient directions in our update biased and dependent, which can potentially lead to the divergence of the… ▽ More We study a new two-time-scale stochastic gradient method for solving optimization problems, where the gradients are computed with the aid of an auxiliary variable under samples generated by time-varying MDPs controlled by the underlying optimization variable. These time-varying samples make gradient directions in our update biased and dependent, which can potentially lead to the divergence of the iterates. In our two-time-scale approach, one scale is to estimate the true gradient from these samples, which is then used to update the estimate of the optimal solution. While these two iterates are implemented simultaneously, the former is updated "faster" than the latter. Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale stochastic gradient method. In particular, we provide explicit formulas for the convergence rates of this method under different structural assumptions, namely, strong convexity, PL condition, and general non-convexity. We apply our framework to various policy optimization problems. First, we look at the infinite-horizon average-reward MDP with finite state and action spaces and derive a convergence rate of $O(k^{-2/5})$ for the online actor-critic algorithm under function approximation, which recovers the best known rate derived specifically for this problem. Second, we study the linear-quadratic regulator and show that an online actor-critic method converges with rate $O(k^{-2/3})$. Third, we use the actor-critic algorithm to solve the policy optimization problem in an entropy regularized Markov decision process, where we also establish a convergence of $O(k^{-2/3})$. The results we derive for both the second and third problem are novel and previously unknown in the literature. Finally, we briefly present the application of our framework to gradient-based policy evaluation algorithms in reinforcement learning. △ Less

Submitted 23 August, 2024; v1 submitted 29 September, 2021; originally announced September 2021.

arXiv:2108.11867 [pdf, other]

A Typed Programmatic Interface to Contracts on the Blockchain

Authors: Thi Thu Ha Doan, Peter Thiemann

Abstract: Smart contract applications on the blockchain can only reach their full potential if they integrate seamlessly with traditional software systems via a programmatic interface. This interface should provide for originating and invoking contracts as well as observing the state of the blockchain. We propose a typed API for this purpose and establish some properties of the combined system. Specifically… ▽ More Smart contract applications on the blockchain can only reach their full potential if they integrate seamlessly with traditional software systems via a programmatic interface. This interface should provide for originating and invoking contracts as well as observing the state of the blockchain. We propose a typed API for this purpose and establish some properties of the combined system. Specifically, we provide an execution model that enables us to prove type-safe interaction between programs and the blockchain. We establish further properties of the model that give rise to requirements on the API. A prototype of the interface is implemented in OCaml for the Tezos blockchain. △ Less

Submitted 29 August, 2021; v1 submitted 26 August, 2021; originally announced August 2021.

Comments: 19 pages + 8 pages appendix. Appears in APLAS 2021. Extended version with proofs in appendix

MSC Class: 68N15

arXiv:2108.11769 [pdf, other]

Byzantine Fault-Tolerance in Federated Local SGD under 2f-Redundancy

Authors: Nirupam Gupta, Thinh T. Doan, Nitin Vaidya

Abstract: We consider the problem of Byzantine fault-tolerance in federated machine learning. In this problem, the system comprises multiple agents each with local data, and a trusted centralized coordinator. In fault-free setting, the agents collaborate with the coordinator to find a minimizer of the aggregate of their local cost functions defined over their local data. We consider a scenario where some ag… ▽ More We consider the problem of Byzantine fault-tolerance in federated machine learning. In this problem, the system comprises multiple agents each with local data, and a trusted centralized coordinator. In fault-free setting, the agents collaborate with the coordinator to find a minimizer of the aggregate of their local cost functions defined over their local data. We consider a scenario where some agents ($f$ out of $N$) are Byzantine faulty. Such agents need not follow a prescribed algorithm correctly, and may communicate arbitrary incorrect information to the coordinator. In the presence of Byzantine agents, a more reasonable goal for the non-faulty agents is to find a minimizer of the aggregate cost function of only the non-faulty agents. This particular goal is commonly referred as exact fault-tolerance. Recent work has shown that exact fault-tolerance is achievable if only if the non-faulty agents satisfy the property of $2f$-redundancy. Now, under this property, techniques are known to impart exact fault-tolerance to the distributed implementation of the classical stochastic gradient-descent (SGD) algorithm. However, we do not know of any such techniques for the federated local SGD algorithm - a more commonly used method for federated machine learning. To address this issue, we propose a novel technique named comparative elimination (CE). We show that, under $2f$-redundancy, the federated local SGD algorithm with CE can indeed obtain exact fault-tolerance in the deterministic setting when the non-faulty agents can accurately compute gradients of their local cost functions. In the general stochastic case, when agents can only compute unbiased noisy estimates of their local gradients, our algorithm achieves approximate fault-tolerance with approximation error proportional to the variance of stochastic gradients and the fraction of Byzantine agents. △ Less

Submitted 26 August, 2021; originally announced August 2021.

Comments: 14 pages, 2 figures

arXiv:2106.11541 [pdf, other]

doi 10.1109/ACCESS.2022.3182345

Kernel Clustering with Sigmoid-based Regularization for Efficient Segmentation of Sequential Data

Authors: Tung Doan, Atsuhiro Takasu

Abstract: Kernel segmentation aims at partitioning a data sequence into several non-overlapping segments that may have nonlinear and complex structures. In general, it is formulated as a discrete optimization problem with combinatorial constraints. A popular algorithm for optimally solving this problem is dynamic programming (DP), which has quadratic computation and memory requirements. Given that sequences… ▽ More Kernel segmentation aims at partitioning a data sequence into several non-overlapping segments that may have nonlinear and complex structures. In general, it is formulated as a discrete optimization problem with combinatorial constraints. A popular algorithm for optimally solving this problem is dynamic programming (DP), which has quadratic computation and memory requirements. Given that sequences in practice are too long, this algorithm is not a practical approach. Although many heuristic algorithms have been proposed to approximate the optimal segmentation, they have no guarantee on the quality of their solutions. In this paper, we take a differentiable approach to alleviate the aforementioned issues. First, we introduce a novel sigmoid-based regularization to smoothly approximate the combinatorial constraints. Combining it with objective of the balanced kernel clustering, we formulate a differentiable model termed Kernel clustering with sigmoid-based regularization (KCSR), where the gradient-based algorithm can be exploited to obtain the optimal segmentation. Second, we develop a stochastic variant of the proposed model. By using the stochastic gradient descent algorithm, which has much lower time and space complexities, for optimization, the second model can perform segmentation on overlong data sequences. Finally, for simultaneously segmenting multiple data sequences, we slightly modify the sigmoid-based regularization to further introduce an extended variant of the proposed model. Through extensive experiments on various types of data sequences performances of our models are evaluated and compared with those of the existing methods. The experimental results validate advantages of the proposed models. Our Matlab source code is available on github. △ Less

Submitted 22 June, 2022; v1 submitted 22 June, 2021; originally announced June 2021.

arXiv:2104.01627 [pdf, ps, other]

Finite-Time Convergence Rates of Nonlinear Two-Time-Scale Stochastic Approximation under Markovian Noise

Authors: Thinh T. Doan

Abstract: We study the so-called two-time-scale stochastic approximation, a simulation-based approach for finding the roots of two coupled nonlinear operators. Our focus is to characterize its finite-time performance in a Markov setting, which often arises in stochastic control and reinforcement learning problems. In particular, we consider the scenario where the data in the method are generated by Markov p… ▽ More We study the so-called two-time-scale stochastic approximation, a simulation-based approach for finding the roots of two coupled nonlinear operators. Our focus is to characterize its finite-time performance in a Markov setting, which often arises in stochastic control and reinforcement learning problems. In particular, we consider the scenario where the data in the method are generated by Markov processes, therefore, they are dependent. Such dependent data result to biased observations of the underlying operators. Under some fairly standard assumptions on the operators and the Markov processes, we provide a formula that characterizes the convergence rate of the mean square errors generated by the method to zero. Our result shows that the method achieves a convergence in expectation at a rate $\mathcal{O}(1/k^{2/3})$, where $k$ is the number of iterations. Our analysis is mainly motivated by the classic singular perturbation theory for studying the asymptotic convergence of two-time-scale systems, that is, we consider a Lyapunov function that carefully characterizes the coupling between the two iterates. In addition, we utilize the geometric mixing time of the underlying Markov process to handle the bias and dependence in the data. Our theoretical result complements for the existing literature, where the rate of nonlinear two-time-scale stochastic approximation under Markovian noise is unknown. △ Less

Submitted 4 April, 2021; originally announced April 2021.

Comments: arXiv admin note: text overlap with arXiv:2011.01868

arXiv:2102.07097 [pdf, other]

Domain Adversarial Reinforcement Learning

Authors: Bonnie Li, Vincent François-Lavet, Thang Doan, Joelle Pineau

Abstract: We consider the problem of generalization in reinforcement learning where visual aspects of the observations might differ, e.g. when there are different backgrounds or change in contrast, brightness, etc. We assume that our agent has access to only a few of the MDPs from the MDP distribution during training. The performance of the agent is then reported on new unknown test domains drawn from the d… ▽ More We consider the problem of generalization in reinforcement learning where visual aspects of the observations might differ, e.g. when there are different backgrounds or change in contrast, brightness, etc. We assume that our agent has access to only a few of the MDPs from the MDP distribution during training. The performance of the agent is then reported on new unknown test domains drawn from the distribution (e.g. unseen backgrounds). For this "zero-shot RL" task, we enforce invariance of the learned representations to visual domains via a domain adversarial optimization process. We empirically show that this approach allows achieving a significant generalization improvement to new unseen domains. △ Less

Submitted 14 February, 2021; originally announced February 2021.

arXiv:2101.10506 [pdf, other]

Finite Sample Analysis of Two-Time-Scale Natural Actor-Critic Algorithm

Authors: Sajad Khodadadian, Thinh T. Doan, Justin Romberg, Siva Theja Maguluri

Abstract: Actor-critic style two-time-scale algorithms are one of the most popular methods in reinforcement learning, and have seen great empirical success. However, their performance is not completely understood theoretically. In this paper, we characterize the \emph{global} convergence of an online natural actor-critic algorithm in the tabular setting using a single trajectory of samples. Our analysis app… ▽ More Actor-critic style two-time-scale algorithms are one of the most popular methods in reinforcement learning, and have seen great empirical success. However, their performance is not completely understood theoretically. In this paper, we characterize the \emph{global} convergence of an online natural actor-critic algorithm in the tabular setting using a single trajectory of samples. Our analysis applies to very general settings, as we only assume ergodicity of the underlying Markov decision process. In order to ensure enough exploration, we employ an $ε$-greedy sampling of the trajectory. For a fixed and small enough exploration parameter $ε$, we show that the two-time-scale natural actor-critic algorithm has a rate of convergence of $\tilde{\mathcal{O}}(1/T^{1/4})$, where $T$ is the number of samples, and this leads to a sample complexity of $\Tilde{\mathcal{O}}(1/δ^{8})$ samples to find a policy that is within an error of $δ$ from the \emph{global optimum}. Moreover, by carefully decreasing the exploration parameter $ε$ as the iterations proceed, we present an improved sample complexity of $\Tilde{\mathcal{O}}(1/δ^{6})$ for convergence to the global optimum. △ Less

Submitted 20 February, 2022; v1 submitted 25 January, 2021; originally announced January 2021.

Comments: 28 pages, 2 figures

arXiv:2011.01868 [pdf, ps, other]

Nonlinear Two-Time-Scale Stochastic Approximation: Convergence and Finite-Time Performance

Authors: Thinh T. Doan

Abstract: Two-time-scale stochastic approximation, a generalized version of the popular stochastic approximation, has found broad applications in many areas including stochastic control, optimization, and machine learning. Despite its popularity, theoretical guarantees of this method, especially its finite-time performance, are mostly achieved for the linear case while the results for the nonlinear counterp… ▽ More Two-time-scale stochastic approximation, a generalized version of the popular stochastic approximation, has found broad applications in many areas including stochastic control, optimization, and machine learning. Despite its popularity, theoretical guarantees of this method, especially its finite-time performance, are mostly achieved for the linear case while the results for the nonlinear counterpart are very sparse. Motivated by the classic control theory for singularly perturbed systems, we study in this paper the asymptotic convergence and finite-time analysis of the nonlinear two-time-scale stochastic approximation. Under some fairly standard assumptions, we provide a formula that characterizes the rate of convergence of the main iterates to the desired solutions. In particular, we show that the method achieves a convergence in expectation at a rate $\mathcal{O}(1/k^{2/3})$, where $k$ is the number of iterations. The key idea in our analysis is to properly choose the two step sizes to characterize the coupling between the fast and slow-time-scale iterates. △ Less

Submitted 23 March, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

arXiv:2010.15088 [pdf, other]

Finite-Time Convergence Rates of Decentralized Stochastic Approximation with Applications in Multi-Agent and Multi-Task Learning

Authors: Sihan Zeng, Thinh T. Doan, Justin Romberg

Abstract: We study a decentralized variant of stochastic approximation, a data-driven approach for finding the root of an operator under noisy measurements. A network of agents, each with its own operator and data observations, cooperatively find the fixed point of the aggregate operator over a decentralized communication graph. Our main contribution is to provide a finite-time analysis of this decentralize… ▽ More We study a decentralized variant of stochastic approximation, a data-driven approach for finding the root of an operator under noisy measurements. A network of agents, each with its own operator and data observations, cooperatively find the fixed point of the aggregate operator over a decentralized communication graph. Our main contribution is to provide a finite-time analysis of this decentralized stochastic approximation method when the data observed at each agent are sampled from a Markov process; this lack of independence makes the iterates biased and (potentially) unbounded. Under fairly standard assumptions, we show that the convergence rate of the proposed method is essentially the same as if the samples were independent, differing only by a log factor that accounts for the mixing time of the Markov processes. The key idea in our analysis is to introduce a novel Razumikhin-Lyapunov function, motivated by the one used in analyzing the stability of delayed ordinary differential equations. We also discuss applications of the proposed method on a number of interesting learning problems in multi-agent systems. △ Less

Submitted 16 June, 2022; v1 submitted 28 October, 2020; originally announced October 2020.

arXiv:2010.04003 [pdf, other]

A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix

Authors: Thang Doan, Mehdi Bennani, Bogdan Mazoure, Guillaume Rabusseau, Pierre Alquier

Abstract: Continual learning (CL) is a setting in which an agent has to learn from an incoming stream of data during its entire lifetime. Although major advances have been made in the field, one recurring problem which remains unsolved is that of Catastrophic Forgetting (CF). While the issue has been extensively studied empirically, little attention has been paid from a theoretical angle. In this paper, we… ▽ More Continual learning (CL) is a setting in which an agent has to learn from an incoming stream of data during its entire lifetime. Although major advances have been made in the field, one recurring problem which remains unsolved is that of Catastrophic Forgetting (CF). While the issue has been extensively studied empirically, little attention has been paid from a theoretical angle. In this paper, we show that the impact of CF increases as two tasks increasingly align. We introduce a measure of task similarity called the NTK overlap matrix which is at the core of CF. We analyze common projected gradient algorithms and demonstrate how they mitigate forgetting. Then, we propose a variant of Orthogonal Gradient Descent (OGD) which leverages structure of the data through Principal Component Analysis (PCA). Experiments support our theoretical findings and show how our method can help reduce CF on classical CL datasets. △ Less

Submitted 25 February, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

Comments: Accepted to AISTATS 2021. Keywords: continual learning, catastrophic forgetting, NTK regime, orthgonal gradient descent

Journal ref: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021)

arXiv:2010.03691 [pdf, other]

Regularized Inverse Reinforcement Learning

Authors: Wonseok Jeon, Chen-Yang Su, Paul Barde, Thang Doan, Derek Nowrouzezahrai, Joelle Pineau

Abstract: Inverse Reinforcement Learning (IRL) aims to facilitate a learner's ability to imitate expert behavior by acquiring reward functions that explain the expert's decisions. Regularized IRL applies strongly convex regularizers to the learner's policy in order to avoid the expert's behavior being rationalized by arbitrary constant rewards, also known as degenerate solutions. We propose tractable soluti… ▽ More Inverse Reinforcement Learning (IRL) aims to facilitate a learner's ability to imitate expert behavior by acquiring reward functions that explain the expert's decisions. Regularized IRL applies strongly convex regularizers to the learner's policy in order to avoid the expert's behavior being rationalized by arbitrary constant rewards, also known as degenerate solutions. We propose tractable solutions, and practical methods to obtain them, for regularized IRL. Current methods are restricted to the maximum-entropy IRL framework, limiting them to Shannon-entropy regularizers, as well as proposing the solutions that are intractable in practice. We present theoretical backing for our proposed IRL method's applicability for both discrete and continuous controls, empirically validating our performance on a variety of tasks. △ Less

Submitted 2 December, 2020; v1 submitted 7 October, 2020; originally announced October 2020.

Comments: 26 pages, 7 figures

arXiv:2009.14763 [pdf, other]

Byzantine Fault-Tolerance in Decentralized Optimization under Minimal Redundancy

Authors: Nirupam Gupta, Thinh T. Doan, Nitin H. Vaidya

Abstract: This paper considers the problem of Byzantine fault-tolerance in multi-agent decentralized optimization. In this problem, each agent has a local cost function. The goal of a decentralized optimization algorithm is to allow the agents to cooperatively compute a common minimum point of their aggregate cost function. We consider the case when a certain number of agents may be Byzantine faulty. Such f… ▽ More This paper considers the problem of Byzantine fault-tolerance in multi-agent decentralized optimization. In this problem, each agent has a local cost function. The goal of a decentralized optimization algorithm is to allow the agents to cooperatively compute a common minimum point of their aggregate cost function. We consider the case when a certain number of agents may be Byzantine faulty. Such faulty agents may not follow a prescribed algorithm, and they may share arbitrary or incorrect information with other non-faulty agents. Presence of such Byzantine agents renders a typical decentralized optimization algorithm ineffective. We propose a decentralized optimization algorithm with provable exact fault-tolerance against a bounded number of Byzantine agents, provided the non-faulty agents have a minimal redundancy. △ Less

Submitted 30 September, 2020; originally announced September 2020.

Comments: An extension of our prior work on fault-tolerant distributed optimization, for the server-based system architecture (https://dl.acm.org/doi/10.1145/3382734.3405748), to the more general peer-to-peer system architecture

arXiv:2006.13460 [pdf, ps, other]

Local Stochastic Approximation: A Unified View of Federated Learning and Distributed Multi-Task Reinforcement Learning Algorithms

Authors: Thinh T. Doan

Abstract: Motivated by broad applications in reinforcement learning and federated learning, we study local stochastic approximation over a network of agents, where their goal is to find the root of an operator composed of the local operators at the agents. Our focus is to characterize the finite-time performance of this method when the data at each agent are generated from Markov processes, and hence they a… ▽ More Motivated by broad applications in reinforcement learning and federated learning, we study local stochastic approximation over a network of agents, where their goal is to find the root of an operator composed of the local operators at the agents. Our focus is to characterize the finite-time performance of this method when the data at each agent are generated from Markov processes, and hence they are dependent. In particular, we provide the convergence rates of local stochastic approximation for both constant and time-varying step sizes. Our results show that these rates are within a logarithmic factor of the ones under independent data. We then illustrate the applications of these results to different interesting problems in multi-task reinforcement learning and federated learning. △ Less

Submitted 24 June, 2020; originally announced June 2020.

arXiv:2006.11942 [pdf, other]

Generalisation Guarantees for Continual Learning with Orthogonal Gradient Descent

Authors: Mehdi Abbana Bennani, Thang Doan, Masashi Sugiyama

Abstract: In Continual Learning settings, deep neural networks are prone to Catastrophic Forgetting. Orthogonal Gradient Descent was proposed to tackle the challenge. However, no theoretical guarantees have been proven yet. We present a theoretical framework to study Continual Learning algorithms in the Neural Tangent Kernel regime. This framework comprises closed form expression of the model through tasks… ▽ More In Continual Learning settings, deep neural networks are prone to Catastrophic Forgetting. Orthogonal Gradient Descent was proposed to tackle the challenge. However, no theoretical guarantees have been proven yet. We present a theoretical framework to study Continual Learning algorithms in the Neural Tangent Kernel regime. This framework comprises closed form expression of the model through tasks and proxies for Transfer Learning, generalisation and tasks similarity. In this framework, we prove that OGD is robust to Catastrophic Forgetting then derive the first generalisation bound for SGD and OGD for Continual Learning. Finally, we study the limits of this framework in practice for OGD and highlight the importance of the Neural Tangent Kernel variation for Continual Learning with OGD. △ Less

Submitted 4 December, 2020; v1 submitted 21 June, 2020; originally announced June 2020.

arXiv:2006.07217 [pdf, other]

Deep Reinforcement and InfoMax Learning

Authors: Bogdan Mazoure, Remi Tachet des Combes, Thang Doan, Philip Bachman, R Devon Hjelm

Abstract: We begin with the hypothesis that a model-free agent whose representations are predictive of properties of future states (beyond expected rewards) will be more capable of solving and adapting to new RL problems. To test that hypothesis, we introduce an objective based on Deep InfoMax (DIM) which trains the agent to predict the future by maximizing the mutual information between its internal repres… ▽ More We begin with the hypothesis that a model-free agent whose representations are predictive of properties of future states (beyond expected rewards) will be more capable of solving and adapting to new RL problems. To test that hypothesis, we introduce an objective based on Deep InfoMax (DIM) which trains the agent to predict the future by maximizing the mutual information between its internal representation of successive timesteps. We test our approach in several synthetic settings, where it successfully learns representations that are predictive of the future. Finally, we augment C51, a strong RL baseline, with our temporal DIM objective and demonstrate improved performance on a continual learning task and on the recently introduced Procgen environment. △ Less

Submitted 16 November, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

Comments: NeurIPS 2020

arXiv:2006.04338 [pdf, other]

A Decentralized Policy Gradient Approach to Multi-task Reinforcement Learning

Authors: Sihan Zeng, Aqeel Anwar, Thinh Doan, Arijit Raychowdhury, Justin Romberg

Abstract: We develop a mathematical framework for solving multi-task reinforcement learning (MTRL) problems based on a type of policy gradient method. The goal in MTRL is to learn a common policy that operates effectively in different environments; these environments have similar (or overlapping) state spaces, but have different rewards and dynamics. We highlight two fundamental challenges in MTRL that are… ▽ More We develop a mathematical framework for solving multi-task reinforcement learning (MTRL) problems based on a type of policy gradient method. The goal in MTRL is to learn a common policy that operates effectively in different environments; these environments have similar (or overlapping) state spaces, but have different rewards and dynamics. We highlight two fundamental challenges in MTRL that are not present in its single task counterpart, and illustrate them with simple examples. We then develop a decentralized entropy-regularized policy gradient method for solving the MTRL problem, and study its finite-time convergence rate. We demonstrate the effectiveness of the proposed method using a series of numerical experiments. These experiments range from small-scale "GridWorld" problems that readily demonstrate the trade-offs involved in multi-task learning to large-scale problems, where common policies are learned to navigate an airborne drone in multiple (simulated) environments. △ Less

Submitted 27 May, 2021; v1 submitted 7 June, 2020; originally announced June 2020.

Showing 1–50 of 68 results for author: Doan, T