Showing 1–50 of 68 results for author: Doan, T

Searching in archive cs.
  1. arXiv:2408.12480  [pdf, other]

    cs.LG cs.CL

    Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese

    Authors: Khang T. Doan, Bao G. Huynh, Dung T. Hoang, Thuc D. Pham, Nhat H. Pham, Quan T. M. Nguyen, Bang Q. Vo, Suong N. Hoang

    Abstract: In this report, we introduce Vintern-1B, a reliable 1-billion-parameter multimodal large language model (MLLM) for Vietnamese language tasks. By integrating the Qwen2-0.5B-Instruct language model with the InternViT-300M-448px visual model, Vintern-1B is optimized for a range of applications, including optical character recognition (OCR), document extraction, and general question-answering in Viet…

    Submitted 23 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

  2. arXiv:2408.06115  [pdf, other]

    cs.NI

    Measurement Study of Programmable Network Coding in Cloud-native 5G and Beyond Networks

    Authors: Osel Lhamo, Tung V. Doan, Elif Tasdemir, Mahdi Attawna, Giang T. Nguyen, Patrick Seeling, Martin Reisslein, Frank H. P. Fitzek

    Abstract: Emerging 5G/6G use cases span various industries, necessitating flexible solutions that leverage emerging technologies to meet diverse and stringent application requirements under changing network conditions. The standard 5G RAN solution, retransmission, reduces packet loss but can increase transmission delay in the process. Random Linear Network Coding (RLNC) offers an alternative by proactively…

    Submitted 12 August, 2024; originally announced August 2024.

  3. arXiv:2407.19287  [pdf, other]

    stat.ML cs.LG eess.SY

    Bayesian meta learning for trustworthy uncertainty quantification

    Authors: Zhenyuan Yuan, Thinh T. Doan

    Abstract: We consider the problem of Bayesian regression with trustworthy uncertainty quantification. We define that the uncertainty quantification is trustworthy if the ground truth can be captured by intervals dependent on the predictive distributions with a pre-specified probability. Furthermore, we propose, Trust-Bayes, a novel optimization framework for Bayesian meta learning which is cognizant of trus…

    Submitted 27 July, 2024; originally announced July 2024.

  4. arXiv:2407.18454  [pdf, other]

    cs.CL cs.AI cs.LG

    Fairness Definitions in Language Models Explained

    Authors: Thang Viet Doan, Zhibo Chu, Zichong Wang, Wenbin Zhang

    Abstract: Language Models (LMs) have demonstrated exceptional performance across various Natural Language Processing (NLP) tasks. Despite these advancements, LMs can inherit and amplify societal biases related to sensitive attributes such as gender and race, limiting their adoption in real-world applications. Therefore, fairness has been extensively explored in LMs, leading to the proposal of various fairne…

    Submitted 25 July, 2024; originally announced July 2024.

  5. arXiv:2406.14835  [pdf, other]

    cs.CL cs.LG

    ToVo: Toxicity Taxonomy via Voting

    Authors: Tinh Son Luong, Thanh-Thien Le, Thang Viet Doan, Linh Ngo Van, Thien Huu Nguyen, Diep Thi-Ngoc Nguyen

    Abstract: Existing toxic detection models face significant limitations, such as lack of transparency, customization, and reproducibility. These challenges stem from the closed-source nature of their training data and the paucity of explanations for their evaluation mechanism. To address these issues, we propose a dataset creation mechanism that integrates voting and chain-of-thought processes, producing a h…

    Submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2406.05271  [pdf, other]

    cs.CV

    USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation

    Authors: Xiaoqi Wang, Wenbin He, Xiwei Xuan, Clint Sebastian, Jorge Piazentin Ono, Xin Li, Sima Behpour, Thang Doan, Liang Gou, Han Wei Shen, Liu Ren

    Abstract: The open-vocabulary image segmentation task involves partitioning images into semantically meaningful segments and classifying them with flexible text-defined categories. The recent vision-based foundation models such as the Segment Anything Model (SAM) have shown superior performance in generating class-agnostic image segments. The main challenge in open-vocabulary image segmentation now lies in…

    Submitted 7 June, 2024; originally announced June 2024.

  7. arXiv:2406.02554  [pdf, other]

    eess.AS cs.AI cs.CL cs.CV cs.LG cs.MM

    Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition

    Authors: Shijian Deng, Erin E. Kosloski, Siddhi Patel, Zeke A. Barnett, Yiyang Nan, Alexander Kaplan, Sisira Aarukapalli, William T. Doan, Matthew Wang, Harsh Singh, Pamela R. Rollins, Yapeng Tian

    Abstract: In this article, we introduce a novel problem of audio-visual autism behavior recognition, which includes social behavior recognition, an essential aspect previously omitted in AI-assisted autism screening research. We define the task at hand as one that is audio-visual autism behavior recognition, which uses audio and visual cues, including any speech present in the audio, to recognize autism-rel…

    Submitted 22 March, 2024; originally announced June 2024.

  8. arXiv:2405.09660  [pdf, other]

    math.OC cs.LG

    Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning

    Authors: Sihan Zeng, Thinh T. Doan

    Abstract: Two-time-scale optimization is a framework introduced in Zeng et al. (2024) that abstracts a range of policy evaluation and policy optimization problems in reinforcement learning (RL). Akin to bi-level optimization under a particular type of stochastic oracle, the two-time-scale optimization framework has an upper level objective whose gradient evaluation depends on the solution of a lower level p…

    Submitted 10 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

  9. arXiv:2405.02456  [pdf, ps, other]

    math.OC cs.LG

    Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning

    Authors: Sihan Zeng, Thinh T. Doan, Justin Romberg

    Abstract: Multi-task reinforcement learning (RL) aims to find a single policy that effectively solves multiple tasks at the same time. This paper presents a constrained formulation for multi-task RL where the goal is to maximize the average performance of the policy across tasks subject to bounds on the performance in each task. We consider solving this problem both in the centralized setting, where informa…

    Submitted 3 May, 2024; originally announced May 2024.

  10. arXiv:2403.06295  [pdf, other]

    cs.CV

    A streamlined Approach to Multimodal Few-Shot Class Incremental Learning for Fine-Grained Datasets

    Authors: Thang Doan, Sima Behpour, Xin Li, Wenbin He, Liang Gou, Liu Ren

    Abstract: Few-shot Class-Incremental Learning (FSCIL) poses the challenge of retaining prior knowledge while learning from limited new data streams, all without overfitting. The rise of Vision-Language models (VLMs) has unlocked numerous applications, leveraging their existing knowledge to fine-tune on custom data. However, training the whole model is computationally prohibitive, and VLMs while being versat…

    Submitted 10 March, 2024; originally announced March 2024.

  11. arXiv:2401.12764  [pdf, other]

    math.OC cs.LG

    Fast Nonlinear Two-Time-Scale Stochastic Approximation: Achieving $O(1/k)$ Finite-Sample Complexity

    Authors: Thinh T. Doan

    Abstract: This paper proposes to develop a new variant of the two-time-scale stochastic approximation to find the roots of two coupled nonlinear operators, assuming only noisy samples of these operators can be observed. Our key idea is to leverage the classic Ruppert-Polyak averaging technique to dynamically estimate the operators through their samples. The estimated values of these averaging steps will the…

    Submitted 22 March, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  12. Interactive Shape Sonification for Tumor Localization in Breast Cancer Surgery

    Authors: Laura Schütz, Trishia El Chemaly, Emmanuelle Weber, Anh Thien Doan, Jacqueline Tsai, Christoph Leuze, Bruce Daniel, Nassir Navab

    Abstract: About 20 percent of patients undergoing breast-conserving surgery require reoperation due to cancerous tissue remaining inside the breast. Breast cancer localization systems utilize auditory feedback to convey the distance between a localization probe and a small marker (seed) implanted into the breast tumor prior to surgery. However, no information on the location of the tumor margin is provided.…

    Submitted 28 January, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

    Comments: 15 pages, 9 figures

    ACM Class: H.5.2; H.5.5; J.3

    Journal ref: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11-16, 2024, Honolulu, HI, USA. ACM, New York, NY, USA

  13. arXiv:2312.07035  [pdf, other]

    cs.LG cs.AI

    HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts

    Authors: Giang Do, Khiem Le, Quang Pham, TrungTin Nguyen, Thanh-Nam Doan, Bint T. Nguyen, Chenghao Liu, Savitha Ramasamy, Xiaoli Li, Steven Hoi

    Abstract: By routing input tokens to only a few split experts, Sparse Mixture-of-Experts has enabled efficient training of large language models. Recent findings suggest that fixing the routers can achieve competitive performance by alleviating the collapsing problem, where all experts eventually learn similar representations. However, this strategy has two key limitations: (i) the policy derived from rando…

    Submitted 12 December, 2023; originally announced December 2023.

  14. arXiv:2308.00310  [pdf, other]

    cs.CV cs.LG

    GradOrth: A Simple yet Efficient Out-of-Distribution Detection with Orthogonal Projection of Gradients

    Authors: Sima Behpour, Thang Doan, Xin Li, Wenbin He, Liang Gou, Liu Ren

    Abstract: Detecting out-of-distribution (OOD) data is crucial for ensuring the safe deployment of machine learning models in real-world applications. However, existing OOD detection approaches primarily rely on the feature maps or the full gradient space information to derive OOD scores neglecting the role of most important parameters of the pre-trained network over in-distribution (ID) data. In this study,…

    Submitted 1 August, 2023; originally announced August 2023.

  15. arXiv:2307.11227  [pdf, other]

    cs.CV

    UP-DP: Unsupervised Prompt Learning for Data Pre-Selection with Vision-Language Models

    Authors: Xin Li, Sima Behpour, Thang Doan, Wenbin He, Liang Gou, Liu Ren

    Abstract: In this study, we investigate the task of data pre-selection, which aims to select instances for labeling from an unlabeled dataset through a single pass, thereby optimizing performance for undefined downstream tasks with a limited annotation budget. Previous approaches to data pre-selection relied solely on visual features extracted from foundation models, such as CLIP and BLIP-2, but largely ign…

    Submitted 20 July, 2023; originally announced July 2023.

  16. arXiv:2306.14291  [pdf, other]

    cs.CV cs.LG

    Hyp-OW: Exploiting Hierarchical Structure Learning with Hyperbolic Distance Enhances Open World Object Detection

    Authors: Thang Doan, Xin Li, Sima Behpour, Wenbin He, Liang Gou, Liu Ren

    Abstract: Open World Object Detection (OWOD) is a challenging and realistic task that extends beyond the scope of standard Object Detection task. It involves detecting both known and unknown objects while integrating learned knowledge for future tasks. However, the level of "unknownness" varies significantly depending on the context. For example, a tree is typically considered part of the background in a se…

    Submitted 15 February, 2024; v1 submitted 25 June, 2023; originally announced June 2023.

    Comments: Accepted at AAAI 2024 || keywords: Open World Object Detection, Hyperbolic Distance, Unknown Detection, Deformable Transformers, Hierarchical Representation Learning

  17. Abstractive Text Summarization Using the BRIO Training Paradigm

    Authors: Khang Nhut Lam, Thieu Gia Doan, Khang Thua Pham, Jugal Kalita

    Abstract: Summary sentences produced by abstractive summarization models may be coherent and comprehensive, but they lack control and rely heavily on reference summaries. The BRIO training paradigm assumes a non-deterministic distribution to reduce the model's dependence on reference summaries, and improve model performance during inference. This paper presents a straightforward but effective technique to i…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: 6 pages, Findings of the Association for Computational Linguistics: ACL 2023

    Journal ref: Findings of the Association for Computational Linguistics: ACL 2023

  18. DNS Privacy with Speed? Evaluating DNS over QUIC and its Impact on Web Performance

    Authors: Mike Kosek, Luca Schumann, Robin Marx, Trinh Viet Doan, Vaibhav Bajpai

    Abstract: Over the last decade, Web traffic has significantly shifted towards HTTPS due to an increased awareness for privacy. However, DNS traffic is still largely unencrypted, which allows user profiles to be derived from plaintext DNS queries. While DNS over TLS (DoT) and DNS over HTTPS (DoH) address this problem by leveraging transport encryption for DNS, both protocols are constrained by the underlying…

    Submitted 3 May, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

    Journal ref: ACM Internet Measurement Conference (IMC) 2023

  19. arXiv:2303.12981  [pdf, other]

    cs.LG math.OC

    Connected Superlevel Set in (Deep) Reinforcement Learning and its Application to Minimax Theorems

    Authors: Sihan Zeng, Thinh T. Doan, Justin Romberg

    Abstract: The aim of this paper is to improve the understanding of the optimization landscape for policy optimization problems in reinforcement learning. Specifically, we show that the superlevel set of the objective function with respect to the policy parameter is always a connected set both in the tabular setting and under policies represented by a class of neural networks. In addition, we show that the o…

    Submitted 30 September, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

  20. arXiv:2301.10965  [pdf]

    cs.RO

    Design of Mobile Manipulator for Fire Extinguisher Testing. Part I Key Specifications and Conceptual Design

    Authors: Xuan Quang Ngo, Thai Nguyen Chau, Cong Thang Doan, Van Tu Duong, Duy Vo Hoang, Tan Tien Nguyen

    Abstract: All flames are extinguished as early as possible, or fire services have to deal with major conflagrations. This leads to the fact that the quality of fire extinguishers has become a very sensitive and important issue in firefighting. Inspired by the development of automatic fire fighting systems, this paper proposes key specifications based on the standard of fire extinguishers that is ISO 7165:20…

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: 10 pages, 8 figures, the 7th International Conference on Advanced Engineering, Theory and Applications

  21. arXiv:2211.10445  [pdf, other]

    cs.LG cs.AI

    Building a Subspace of Policies for Scalable Continual Learning

    Authors: Jean-Baptiste Gaya, Thang Doan, Lucas Caccia, Laure Soulier, Ludovic Denoyer, Roberta Raileanu

    Abstract: The ability to continuously acquire new knowledge and skills is crucial for autonomous agents. Existing methods are typically based on either fixed-size models that struggle to learn a large number of diverse behaviors, or growing-size models that scale poorly with the number of tasks. In this work, we aim to strike a better balance between an agent's size and performance by designing a method tha…

    Submitted 2 March, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: Accepted at ICLR2023 (notable-top-25%). website: https://continual-subspace-policies-streamlit-app-gofujp.streamlit.app/ code: https://github.com/facebookresearch/salina/tree/main/salina_cl

  22. arXiv:2211.03151  [pdf, other]

    cs.CV

    LG-Hand: Advancing 3D Hand Pose Estimation with Locally and Globally Kinematic Knowledge

    Authors: Tu Le-Xuan, Trung Tran-Quang, Thi Ngoc Hien Doan, Thanh-Hai Tran

    Abstract: 3D hand pose estimation from RGB images suffers from the difficulty of obtaining the depth information. Therefore, a great deal of attention has been spent on estimating 3D hand pose from 2D hand joints. In this paper, we leverage the advantage of spatial-temporal Graph Convolutional Neural Networks and propose LG-Hand, a powerful method for 3D hand pose estimation. Our method incorporates both sp…

    Submitted 6 November, 2022; originally announced November 2022.

  23. arXiv:2210.12938  [pdf]

    eess.IV cs.CV

    GradMix for nuclei segmentation and classification in imbalanced pathology image datasets

    Authors: Tan Nhu Nhat Doan, Kyungeun Kim, Boram Song, Jin Tae Kwak

    Abstract: An automated segmentation and classification of nuclei is an essential task in digital pathology. The current deep learning-based approaches require a vast amount of annotated datasets by pathologists. However, the existing datasets are imbalanced among different types of nuclei in general, leading to a substantial performance degradation. In this paper, we propose a simple but effective data augm…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: submitted to MICCAI2022

  24. arXiv:2208.13252  [pdf, other]

    cs.SE cs.LG

    MANDO: Multi-Level Heterogeneous Graph Embeddings for Fine-Grained Detection of Smart Contract Vulnerabilities

    Authors: Hoang H. Nguyen, Nhat-Minh Nguyen, Chunyao Xie, Zahra Ahmadi, Daniel Kudendo, Thanh-Nam Doan, Lingxiao Jiang

    Abstract: Learning heterogeneous graphs consisting of different types of nodes and edges enhances the results of homogeneous graph techniques. An interesting example of such graphs is control-flow graphs representing possible software code execution flows. As such graphs represent more semantic information of code, developing techniques and tools for such graphs can be highly beneficial for detecting vulner…

    Submitted 7 September, 2022; v1 submitted 28 August, 2022; originally announced August 2022.

    Comments: Accepted at the 9th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2022 - Research Track)

    ACM Class: I.2.5; D.2.4

  25. arXiv:2206.07642  [pdf, other]

    cs.MA cs.AI cs.GT cs.LG

    Convergence and Price of Anarchy Guarantees of the Softmax Policy Gradient in Markov Potential Games

    Authors: Dingyang Chen, Qi Zhang, Thinh T. Doan

    Abstract: We study the performance of policy gradient methods for the subclass of Markov games known as Markov potential games (MPGs), which extends the notion of normal-form potential games to the stateful setting and includes the important special case of the fully cooperative setting where the agents share an identical reward function. Our focus in this paper is to study the convergence of the policy gra…

    Submitted 15 June, 2022; originally announced June 2022.

  26. arXiv:2205.13746  [pdf, other]

    math.OC cs.LG

    Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games

    Authors: Sihan Zeng, Thinh T. Doan, Justin Romberg

    Abstract: We study the problem of finding the Nash equilibrium in a two-player zero-sum Markov game. Due to its formulation as a minimax optimization program, a natural approach to solve the problem is to perform gradient descent/ascent with respect to each player in an alternating fashion. However, due to the non-convexity/non-concavity of the underlying objective function, theoretical understandings of th…

    Submitted 12 October, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

  27. arXiv:2205.05940  [pdf, other]

    cs.IR cs.CV

    SimCPSR: Simple Contrastive Learning for Paper Submission Recommendation System

    Authors: Duc H. Le, Tram T. Doan, Son T. Huynh, Binh T. Nguyen

    Abstract: The recommendation system plays a vital role in many areas, especially academic fields, to support researchers in submitting and increasing the acceptance of their work through the conference or journal selection process. This study proposes a transformer-based model using transfer learning as an efficient approach for the paper submission recommendation system. By combining essential information…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: 13 pages, 1 table, 4 figures

  28. Measuring DNS over TCP in the Era of Increasing DNS Response Sizes: A View from the Edge

    Authors: Mike Kosek, Trinh Viet Doan, Simon Huber, Vaibhav Bajpai

    Abstract: The Domain Name System (DNS) is one of the most crucial parts of the Internet. Although the original standard defined the usage of DNS over UDP (DoUDP) as well as DNS over TCP (DoTCP), UDP has become the predominant protocol used in the DNS. With the introduction of new Resource Records (RRs), the sizes of DNS responses have increased considerably. Since this can lead to truncation or IP fragmenta…

    Submitted 18 July, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

    Comments: Published in ACM SIGCOMM Computer Communication Review Volume 52 Issue 2, April 2022

  29. arXiv:2202.09826  [pdf, other]

    cs.LG cs.AI

    Continual Learning Beyond a Single Model

    Authors: Thang Doan, Seyed Iman Mirzadeh, Mehrdad Farajtabar

    Abstract: A growing body of research in continual learning focuses on the catastrophic forgetting problem. While many attempts have been made to alleviate this problem, the majority of the methods assume a single model in the continual learning setup. In this work, we question this assumption and show that employing ensemble models can be a simple yet effective method to improve continual performance. Howev…

    Submitted 3 July, 2023; v1 submitted 20 February, 2022; originally announced February 2022.

    Comments: Accepted to 2nd Conference on Lifelong Learning Agents (CoLLAs 2023); Keywords: continual learning, neural network subspaces, ensemble models, computationally efficient training

  30. arXiv:2202.06315  [pdf, other]

    cs.NI

    Towards Decentralised Cloud Storage with IPFS: Opportunities, Challenges, and Future Directions

    Authors: Trinh Viet Doan, Yiannis Psaras, Jörg Ott, Vaibhav Bajpai

    Abstract: The InterPlanetary File System (IPFS) is a novel decentralised storage architecture, which attempts to provide decentralised cloud storage by building on founding principles of P2P networking and content addressing. IPFS is used by more than 230k peers per week and serves tens of millions of requests per day, which makes it an interesting large-scale operational network to study. While it is used…

    Submitted 2 April, 2022; v1 submitted 13 February, 2022; originally announced February 2022.

  31. One to Rule them All? A First Look at DNS over QUIC

    Authors: Mike Kosek, Trinh Viet Doan, Malte Granderath, Vaibhav Bajpai

    Abstract: The DNS is one of the most crucial parts of the Internet. Since the original DNS specifications defined UDP and TCP as the underlying transport protocols, DNS queries are inherently unencrypted, making them vulnerable to eavesdropping and on-path manipulations. Consequently, concerns about DNS privacy have gained attention in recent years, which resulted in the introduction of the encrypted protoc…

    Submitted 23 March, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: The final publication is available at Springer via https://doi.org/10.1007/978-3030-98785-5_24

    Journal ref: International Conference on Passive and Active Network Measurement (PAM) 2022

  32. arXiv:2201.00142  [pdf, other]

    cs.NI

    Impact of Evolving Protocols and COVID-19 on Internet Traffic Shares

    Authors: Luca Schumann, Trinh Viet Doan, Tanya Shreedhar, Ricky Mok, Vaibhav Bajpai

    Abstract: The rapid deployment of new Internet protocols over the last few years and the COVID-19 pandemic more recently (2020) has resulted in a change in the Internet traffic composition. Consequently, an updated microscopic view of traffic shares is needed to understand how the Internet is evolving to capture both such shorter- and longer-term events. Toward this end, we observe traffic composition at a…

    Submitted 15 January, 2022; v1 submitted 1 January, 2022; originally announced January 2022.

  33. arXiv:2112.09579  [pdf, ps, other]

    math.OC cs.GT cs.LG

    Convergence Rates of Two-Time-Scale Gradient Descent-Ascent Dynamics for Solving Nonconvex Min-Max Problems

    Authors: Thinh T. Doan

    Abstract: There has been much recent interest in solving nonconvex min-max optimization problems due to their broad applications in many areas including machine learning, networked resource allocations, and distributed optimization. Perhaps the most popular first-order method for solving min-max optimization is the so-called simultaneous (or single-loop) gradient descent-ascent algorithm due to its simplicity in…

    Submitted 17 December, 2021; originally announced December 2021.

  34. arXiv:2110.11383  [pdf, other]

    math.OC cs.LG

    Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes

    Authors: Sihan Zeng, Thinh T. Doan, Justin Romberg

    Abstract: We consider a discounted cost constrained Markov decision process (CMDP) policy optimization problem, in which an agent seeks to maximize a discounted cumulative reward subject to a number of constraints on discounted cumulative utilities. To solve this constrained optimization program, we study an online actor-critic variant of a classic primal-dual method where the gradients of both the primal a…

    Submitted 23 September, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

  35. arXiv:2109.14756  [pdf, other]

    math.OC cs.LG

    A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

    Authors: Sihan Zeng, Thinh T. Doan, Justin Romberg

    Abstract: We study a new two-time-scale stochastic gradient method for solving optimization problems, where the gradients are computed with the aid of an auxiliary variable under samples generated by time-varying MDPs controlled by the underlying optimization variable. These time-varying samples make gradient directions in our update biased and dependent, which can potentially lead to the divergence of the…

    Submitted 23 August, 2024; v1 submitted 29 September, 2021; originally announced September 2021.

  36. arXiv:2108.11867  [pdf, other]

    cs.PL

    A Typed Programmatic Interface to Contracts on the Blockchain

    Authors: Thi Thu Ha Doan, Peter Thiemann

    Abstract: Smart contract applications on the blockchain can only reach their full potential if they integrate seamlessly with traditional software systems via a programmatic interface. This interface should provide for originating and invoking contracts as well as observing the state of the blockchain. We propose a typed API for this purpose and establish some properties of the combined system. Specifically…

    Submitted 29 August, 2021; v1 submitted 26 August, 2021; originally announced August 2021.

    Comments: 19 pages + 8 pages appendix. Appears in APLAS 2021. Extended version with proofs in appendix

    MSC Class: 68N15

  37. arXiv:2108.11769  [pdf, other]

    cs.DC cs.LG

    Byzantine Fault-Tolerance in Federated Local SGD under 2f-Redundancy

    Authors: Nirupam Gupta, Thinh T. Doan, Nitin Vaidya

    Abstract: We consider the problem of Byzantine fault-tolerance in federated machine learning. In this problem, the system comprises multiple agents each with local data, and a trusted centralized coordinator. In fault-free setting, the agents collaborate with the coordinator to find a minimizer of the aggregate of their local cost functions defined over their local data. We consider a scenario where some ag…

    Submitted 26 August, 2021; originally announced August 2021.

    Comments: 14 pages, 2 figures

  38. Kernel Clustering with Sigmoid-based Regularization for Efficient Segmentation of Sequential Data

    Authors: Tung Doan, Atsuhiro Takasu

    Abstract: Kernel segmentation aims at partitioning a data sequence into several non-overlapping segments that may have nonlinear and complex structures. In general, it is formulated as a discrete optimization problem with combinatorial constraints. A popular algorithm for optimally solving this problem is dynamic programming (DP), which has quadratic computation and memory requirements. Given that sequences…

    Submitted 22 June, 2022; v1 submitted 22 June, 2021; originally announced June 2021.

  39. arXiv:2104.01627  [pdf, ps, other]

    math.OC cs.LG

    Finite-Time Convergence Rates of Nonlinear Two-Time-Scale Stochastic Approximation under Markovian Noise

    Authors: Thinh T. Doan

    Abstract: We study the so-called two-time-scale stochastic approximation, a simulation-based approach for finding the roots of two coupled nonlinear operators. Our focus is to characterize its finite-time performance in a Markov setting, which often arises in stochastic control and reinforcement learning problems. In particular, we consider the scenario where the data in the method are generated by Markov p…

    Submitted 4 April, 2021; originally announced April 2021.

    Comments: arXiv admin note: text overlap with arXiv:2011.01868

  40. arXiv:2102.07097  [pdf, other]

    cs.LG cs.AI

    Domain Adversarial Reinforcement Learning

    Authors: Bonnie Li, Vincent François-Lavet, Thang Doan, Joelle Pineau

    Abstract: We consider the problem of generalization in reinforcement learning where visual aspects of the observations might differ, e.g. when there are different backgrounds or change in contrast, brightness, etc. We assume that our agent has access to only a few of the MDPs from the MDP distribution during training. The performance of the agent is then reported on new unknown test domains drawn from the d…

    Submitted 14 February, 2021; originally announced February 2021.

  41. arXiv:2101.10506  [pdf, other]

    cs.LG

    Finite Sample Analysis of Two-Time-Scale Natural Actor-Critic Algorithm

    Authors: Sajad Khodadadian, Thinh T. Doan, Justin Romberg, Siva Theja Maguluri

    Abstract: Actor-critic style two-time-scale algorithms are one of the most popular methods in reinforcement learning, and have seen great empirical success. However, their performance is not completely understood theoretically. In this paper, we characterize the global convergence of an online natural actor-critic algorithm in the tabular setting using a single trajectory of samples. Our analysis app…

    Submitted 20 February, 2022; v1 submitted 25 January, 2021; originally announced January 2021.

    Comments: 28 pages, 2 figures

  42. arXiv:2011.01868  [pdf, ps, other]

    math.OC cs.LG eess.SY

    Nonlinear Two-Time-Scale Stochastic Approximation: Convergence and Finite-Time Performance

    Authors: Thinh T. Doan

    Abstract: Two-time-scale stochastic approximation, a generalized version of the popular stochastic approximation, has found broad applications in many areas including stochastic control, optimization, and machine learning. Despite its popularity, theoretical guarantees of this method, especially its finite-time performance, are mostly achieved for the linear case while the results for the nonlinear counterp…

    Submitted 23 March, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

  43. arXiv:2010.15088  [pdf, other]

    cs.LG math.OC

    Finite-Time Convergence Rates of Decentralized Stochastic Approximation with Applications in Multi-Agent and Multi-Task Learning

    Authors: Sihan Zeng, Thinh T. Doan, Justin Romberg

    Abstract: We study a decentralized variant of stochastic approximation, a data-driven approach for finding the root of an operator under noisy measurements. A network of agents, each with its own operator and data observations, cooperatively find the fixed point of the aggregate operator over a decentralized communication graph. Our main contribution is to provide a finite-time analysis of this decentralize…

    Submitted 16 June, 2022; v1 submitted 28 October, 2020; originally announced October 2020.

  44. arXiv:2010.04003  [pdf, other]

    cs.LG cs.AI stat.ML

    A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix

    Authors: Thang Doan, Mehdi Bennani, Bogdan Mazoure, Guillaume Rabusseau, Pierre Alquier

    Abstract: Continual learning (CL) is a setting in which an agent has to learn from an incoming stream of data during its entire lifetime. Although major advances have been made in the field, one recurring problem which remains unsolved is that of Catastrophic Forgetting (CF). While the issue has been extensively studied empirically, little attention has been paid from a theoretical angle. In this paper, we…

    Submitted 25 February, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: Accepted to AISTATS 2021. Keywords: continual learning, catastrophic forgetting, NTK regime, orthogonal gradient descent

    Journal ref: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021)

  45. arXiv:2010.03691  [pdf, other]

    cs.LG

    Regularized Inverse Reinforcement Learning

    Authors: Wonseok Jeon, Chen-Yang Su, Paul Barde, Thang Doan, Derek Nowrouzezahrai, Joelle Pineau

    Abstract: Inverse Reinforcement Learning (IRL) aims to facilitate a learner's ability to imitate expert behavior by acquiring reward functions that explain the expert's decisions. Regularized IRL applies strongly convex regularizers to the learner's policy in order to avoid the expert's behavior being rationalized by arbitrary constant rewards, also known as degenerate solutions. We propose tractable soluti…

    Submitted 2 December, 2020; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: 26 pages, 7 figures

  46. arXiv:2009.14763  [pdf, other]

    cs.DC cs.MA eess.SY

    Byzantine Fault-Tolerance in Decentralized Optimization under Minimal Redundancy

    Authors: Nirupam Gupta, Thinh T. Doan, Nitin H. Vaidya

    Abstract: This paper considers the problem of Byzantine fault-tolerance in multi-agent decentralized optimization. In this problem, each agent has a local cost function. The goal of a decentralized optimization algorithm is to allow the agents to cooperatively compute a common minimum point of their aggregate cost function. We consider the case when a certain number of agents may be Byzantine faulty. Such f…

    Submitted 30 September, 2020; originally announced September 2020.

    Comments: An extension of our prior work on fault-tolerant distributed optimization, for the server-based system architecture (https://dl.acm.org/doi/10.1145/3382734.3405748), to the more general peer-to-peer system architecture

  47. arXiv:2006.13460  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Local Stochastic Approximation: A Unified View of Federated Learning and Distributed Multi-Task Reinforcement Learning Algorithms

    Authors: Thinh T. Doan

    Abstract: Motivated by broad applications in reinforcement learning and federated learning, we study local stochastic approximation over a network of agents, where their goal is to find the root of an operator composed of the local operators at the agents. Our focus is to characterize the finite-time performance of this method when the data at each agent are generated from Markov processes, and hence they a…

    Submitted 24 June, 2020; originally announced June 2020.

  48. arXiv:2006.11942  [pdf, other]

    stat.ML cs.LG

    Generalisation Guarantees for Continual Learning with Orthogonal Gradient Descent

    Authors: Mehdi Abbana Bennani, Thang Doan, Masashi Sugiyama

    Abstract: In Continual Learning settings, deep neural networks are prone to Catastrophic Forgetting. Orthogonal Gradient Descent was proposed to tackle the challenge. However, no theoretical guarantees have been proven yet. We present a theoretical framework to study Continual Learning algorithms in the Neural Tangent Kernel regime. This framework comprises closed form expression of the model through tasks…

    Submitted 4 December, 2020; v1 submitted 21 June, 2020; originally announced June 2020.

  49. arXiv:2006.07217  [pdf, other]

    cs.LG stat.ML

    Deep Reinforcement and InfoMax Learning

    Authors: Bogdan Mazoure, Remi Tachet des Combes, Thang Doan, Philip Bachman, R Devon Hjelm

    Abstract: We begin with the hypothesis that a model-free agent whose representations are predictive of properties of future states (beyond expected rewards) will be more capable of solving and adapting to new RL problems. To test that hypothesis, we introduce an objective based on Deep InfoMax (DIM) which trains the agent to predict the future by maximizing the mutual information between its internal repres…

    Submitted 16 November, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020

  50. arXiv:2006.04338  [pdf, other]

    cs.LG stat.ML

    A Decentralized Policy Gradient Approach to Multi-task Reinforcement Learning

    Authors: Sihan Zeng, Aqeel Anwar, Thinh Doan, Arijit Raychowdhury, Justin Romberg

    Abstract: We develop a mathematical framework for solving multi-task reinforcement learning (MTRL) problems based on a type of policy gradient method. The goal in MTRL is to learn a common policy that operates effectively in different environments; these environments have similar (or overlapping) state spaces, but have different rewards and dynamics. We highlight two fundamental challenges in MTRL that are…

    Submitted 27 May, 2021; v1 submitted 7 June, 2020; originally announced June 2020.