Search | arXiv e-print repository

arXiv:2407.17790

Exploring the Limitations of Kolmogorov-Arnold Networks in Classification: Insights to Software Training and Hardware Implementation

Authors: Van Duy Tran, Tran Xuan Hieu Le, Thi Diem Tran, Hoai Luan Pham, Vu Trung Duong Le, Tuan Hai Vu, Van Tinh Nguyen, Yasuhiko Nakashima

Abstract: Kolmogorov-Arnold Networks (KANs), a novel type of neural network, have recently gained popularity and attention due to their ability to substitute for multi-layer perceptrons (MLPs) in artificial intelligence (AI) with higher accuracy and interpretability. However, KAN assessment is still limited and cannot provide an in-depth analysis of a specific domain. Furthermore, no study has been conducted on the implementation of KANs in hardware design, which would directly demonstrate whether KANs are truly superior to MLPs in practical applications. As a result, in this paper, we focus on verifying KANs for classification problems, a common but significant topic in AI, using four different types of datasets. Furthermore, the corresponding hardware implementation is considered using the Vitis high-level synthesis (HLS) tool. To the best of our knowledge, this is the first article to implement KANs in hardware. The results indicate that KANs cannot achieve higher accuracy than MLPs on highly complex datasets while utilizing substantially more hardware resources. Therefore, MLPs remain an effective approach for achieving accuracy and efficiency in software and hardware implementation.

Submitted 25 July, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

Comments: 6 pages, 3 figures, 2 tables

arXiv:2407.17053

doi 10.1145/3674805.3686670

Automated Code-centric Software Vulnerability Assessment: How Far Are We? An Empirical Study in C/C++

Authors: Anh The Nguyen, Triet Huynh Minh Le, M. Ali Babar

Abstract: Background: The C and C++ languages hold significant importance in Software Engineering research because of their widespread use in practice. Numerous studies have utilized Machine Learning (ML) and Deep Learning (DL) techniques to detect software vulnerabilities (SVs) in source code written in these languages. However, the application of these techniques to function-level SV assessment has been largely unexplored. SV assessment is increasingly crucial as it provides detailed information on the exploitability, impacts, and severity of security defects, thereby aiding in their prioritization and remediation. Aims: We conduct the first empirical study to investigate and compare the performance of ML and DL models, many of which have been used for SV detection, for function-level SV assessment in C/C++. Method: Using 9,993 vulnerable C/C++ functions, we evaluated the performance of six multi-class ML models and five multi-class DL models for SV assessment at the function level based on the Common Vulnerability Scoring System (CVSS). We further explore multi-task learning, which can leverage common vulnerable code to predict all SV assessment outputs simultaneously in a single model, and compare the effectiveness and efficiency of this model type with those of the original multi-class models. Results: We show that ML matches or even outperforms the multi-class DL models for function-level SV assessment, with significantly less training time. Employing multi-task learning allows the DL models to perform significantly better, with an average increase of 8-22% in Matthews Correlation Coefficient (MCC). Conclusions: We distill the practices of using data-driven techniques for function-level SV assessment in C/C++, including the use of multi-task DL to balance efficiency and effectiveness. This can establish a strong foundation for future work in this area.
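MCC is the metric behind the 8-22% figure above. For a multi-class target such as a CVSS metric, a standard generalization (Gorodkin's R_K, the form scikit-learn's `matthews_corrcoef` uses for the multi-class case) can be written purely from the confusion-matrix marginals. The Python sketch below is an illustration under that assumption; the function name is hypothetical, and the paper may aggregate MCC differently (for example, averaging over CVSS metrics).

```python
import math
from collections import Counter


def matthews_corrcoef_multiclass(y_true, y_pred):
    """Multi-class MCC (Gorodkin's R_K) from confusion-matrix marginals (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                   # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly classified samples
    t_k = Counter(y_true)                             # how often each class truly occurs
    p_k = Counter(y_pred)                             # how often each class is predicted
    classes = set(t_k) | set(p_k)
    cov_tp = c * s - sum(p_k[k] * t_k[k] for k in classes)
    cov_pp = s * s - sum(p_k[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t_k[k] ** 2 for k in classes)
    if cov_pp == 0 or cov_tt == 0:
        return 0.0  # degenerate case: only one class present or predicted
    return cov_tp / math.sqrt(cov_pp * cov_tt)


# Binary sanity check: reduces to the familiar TP/TN/FP/FN formula.
print(matthews_corrcoef_multiclass([1, 1, 0, 0], [1, 0, 0, 0]))  # ≈ 0.577 (= 1/sqrt(3))
```

With two classes the expression reduces to the usual binary MCC, which is why the check above agrees with (TP·TN − FP·FN)/sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).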

Submitted 3 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

Comments: Accepted as a full paper in the technical track at The International Symposium on Empirical Software Engineering and Measurement (ESEM) 2024

arXiv:2407.16946

Automatic Categorization of GitHub Actions with Transformers and Few-shot Learning

Authors: Phuong T. Nguyen, Juri Di Rocco, Claudio Di Sipio, Mudita Shakya, Davide Di Ruscio, Massimiliano Di Penta

Abstract: In the GitHub ecosystem, workflows are used as an effective means to automate development tasks and to set up a Continuous Integration and Delivery (CI/CD) pipeline. GitHub Actions (GHA) have been conceived to provide developers with a practical tool to create and maintain workflows, avoiding reinventing the wheel and cluttering the workflow with shell commands. Properly leveraging the power of GitHub Actions can facilitate development processes, enhance collaboration, and significantly impact project outcomes. To expose actions to search engines, GitHub allows developers to assign them to one or more categories manually. These are used as an effective means to group actions sharing similar functionality. Nevertheless, while actions provide a practical way to execute workflows, many have unclear purposes, and some are not categorized at all. In this work, we bridge this gap by conceptualizing Gavel, a practical solution for increasing the visibility of actions in GitHub. By leveraging the content of the README.MD file of each action, we use a Transformer, a deep learning model, to assign suitable categories to the action. We conducted an empirical investigation and compared Gavel with a state-of-the-art baseline. The experimental results show that our proposed approach can assign categories to GitHub actions effectively, outperforming the state-of-the-art baseline.

Submitted 23 July, 2024; originally announced July 2024.

Comments: The paper has been peer-reviewed and accepted for publication in the Proceedings of the 18th International Symposium on Empirical Software Engineering and Measurement (ESEM 2024)

arXiv:2407.15812

On the stability of blowup solutions to the complex Ginzburg-Landau equation in R^d

Authors: Jiajie Chen, Thomas Y. Hou, Van Tien Nguyen, Yixuan Wang

Abstract: Building upon the idea in \cite{HNWarXiv24}, we establish the stability of the type-I blowup with log correction for the complex Ginzburg-Landau equation. In the amplitude-phase representation, a generalized dynamic rescaling formulation is introduced, with modulation parameters capturing the spatial translation and rotation symmetries of the equation and novel additional modulation parameters perturbing the scaling symmetry. This new formulation provides enough degrees of freedom to impose normalization conditions on the rescaled solution, completely eliminating the unstable and neutrally stable modes of the linearized operator around the blowup profile. It enables us to establish the full stability of the blowup by enforcing vanishing conditions via the choice of normalization and using weighted energy estimates, without relying on a topological argument or a spectral analysis. The log correction for the blowup rate is captured by the energy estimates and refined estimates of the modulation parameters.

Submitted 22 July, 2024; originally announced July 2024.

Comments: 38 pages

MSC Class: 35Q56

arXiv:2407.15468

Efficient influence functions for Sobol' indices under two designs of experiments

Authors: Thierry Klein, Agnès Lagnoux, Paul Rochet, Thi Mong Ngoc Nguyen

Abstract: In this note, we are interested in the asymptotic efficiency of Sobol' index estimators. After recalling the basics of asymptotic efficiency, we compute the efficient influence functions for Sobol' indices in two different contexts: the Pick-Freeze and the given-data settings.
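For readers unfamiliar with the Pick-Freeze design: the first-order Sobol' index of input X_i is S_i = Var(E[Y | X_i]) / Var(Y), and Pick-Freeze estimates it from pairs (Y, Y^i), where Y^i reuses ("freezes") X_i while every other input is resampled, so that Cov(Y, Y^i) = Var(E[Y | X_i]). The Python sketch below uses one common symmetrized form of this estimator, assuming independent Uniform(0, 1) inputs; it illustrates the design only and is not the note's efficiency analysis. The function name `pick_freeze_first_order` is hypothetical.

```python
import random


def pick_freeze_first_order(f, dim, i, n, seed=0):
    """Estimate the first-order Sobol' index of input i with a Pick-Freeze design.

    Assumes i.i.d. Uniform(0, 1) inputs; f maps a list of floats to a float.
    """
    rng = random.Random(seed)
    y, y_frozen = [], []
    for _ in range(n):
        x = [rng.random() for _ in range(dim)]
        x_prime = [rng.random() for _ in range(dim)]
        # Keep ("freeze") coordinate i from x; take every other coordinate from x_prime.
        x_i = [x[j] if j == i else x_prime[j] for j in range(dim)]
        y.append(f(x))
        y_frozen.append(f(x_i))
    # Symmetrized mean and second moment pooled over both output samples.
    mean = sum(a + b for a, b in zip(y, y_frozen)) / (2 * n)
    second = sum(a * a + b * b for a, b in zip(y, y_frozen)) / (2 * n)
    cross = sum(a * b for a, b in zip(y, y_frozen)) / n
    return (cross - mean ** 2) / (second - mean ** 2)


def linear(x):
    return x[0] + 2 * x[1]


print(pick_freeze_first_order(linear, dim=2, i=0, n=20000, seed=1))
print(pick_freeze_first_order(linear, dim=2, i=1, n=20000, seed=2))
```

For f(x) = x_1 + 2 x_2 with independent uniform inputs, Var(X_1) = 1/12 and Var(2 X_2) = 4/12, so the exact indices are S_1 = 0.2 and S_2 = 0.8; the two estimates should land close to those values.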

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.13904

In defense of MAR over latent ignorability (or latent MAR) for outcome missingness in studying principal causal effects: a causal graph view

Authors: Trang Quynh Nguyen

Abstract: This paper concerns outcome missingness in principal stratification analysis. We revisit a common assumption known as latent ignorability or latent missing-at-random (LMAR), often considered a relaxation of missing-at-random (MAR). LMAR posits that the outcome is independent of its missingness if one conditions on principal stratum (which is partially unobservable) in addition to observed variables. The literature has focused on methods assuming LMAR (usually supplemented with a more specific assumption about the missingness), without considering the theoretical plausibility and necessity of LMAR. In this paper, we devise a way to represent principal stratum in causal graphs, and use causal graphs to examine this assumption. We find that LMAR is harder to satisfy than MAR, and for the purpose of breaking the dependence between the outcome and its missingness, no benefit is gained from conditioning on principal stratum on top of conditioning on observed variables. This finding has an important implication: MAR should be preferred over LMAR. This is convenient because MAR is easier to handle and (unlike LMAR) if MAR is assumed no additional assumption is needed. We thus turn to focus on the plausibility of MAR and its implications, with a view to facilitating appropriate use of this assumption. We clarify conditions on the causal structure and on auxiliary variables (if available) that need to hold for MAR to hold, and we use MAR to recover effect identification under two dominant identification assumptions (exclusion restriction and principal ignorability). We briefly comment on cases where MAR does not hold. In terms of broader connections, most of the MAR findings are also relevant to classic instrumental variable analysis that targets the local average treatment effect; and the LMAR finding suggests general caution with assumptions that condition on principal stratum.

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.13842

Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance

Authors: Toan Nguyen, Minh Nhat Vu, Baoru Huang, An Vuong, Quan Vuong, Ngan Le, Thieu Vo, Anh Nguyen

Abstract: 6-DoF grasp detection has been a fundamental and challenging problem in robotic vision. While previous works have focused on ensuring grasp stability, they often do not consider human intention conveyed through natural language, hindering effective collaboration between robots and users in complex 3D environments. In this paper, we present a new approach for language-driven 6-DoF grasp detection in cluttered point clouds. We first introduce Grasp-Anything-6D, a large-scale dataset for the language-driven 6-DoF grasp detection task with 1M point cloud scenes and more than 200M language-associated 3D grasp poses. We further introduce a novel diffusion model that incorporates a new negative prompt guidance learning strategy. The proposed negative prompt strategy directs the detection process toward the desired object while steering away from unwanted ones given the language input. Our method enables an end-to-end framework where humans can command the robot to grasp desired objects in a cluttered scene using natural language. Extensive experimental results show the effectiveness of our method in both benchmarking experiments and real-world scenarios, surpassing other baselines. In addition, we demonstrate the practicality of our approach in real-world robotic applications. Our project is available at https://airvlab.github.io/grasp-anything.

Submitted 25 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

Comments: Accepted at ECCV 2024

arXiv:2407.12094

Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models

Authors: Minh Nguyen, Franck Dernoncourt, Seunghyun Yoon, Hanieh Deilamsalehy, Hao Tan, Ryan Rossi, Quan Hung Tran, Trung Bui, Thien Huu Nguyen

Abstract: We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives. Despite advances in speech recognition, the task of text-based speaker identification (SpeakerID) has received limited attention, lacking large-scale, diverse datasets for effective model training. Addressing these gaps, we present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources. We propose novel transformer-based models tailored for SpeakerID, leveraging contextual cues within dialogues to accurately attribute speaker names. Through extensive experiments, our best model achieves a precision of 80.3%, setting a new benchmark for SpeakerID. The data and code are publicly available at https://github.com/adobe-research/speaker-identification

Submitted 16 July, 2024; originally announced July 2024.

Comments: accepted to INTERSPEECH 2024

arXiv:2407.12064

LiteGPT: Large Vision-Language Model for Joint Chest X-ray Localization and Classification Task

Authors: Khai Le-Duc, Ryan Zhang, Ngoc Son Nguyen, Tan-Hanh Pham, Anh Dao, Ba Hung Ngo, Anh Totti Nguyen, Truong-Son Hy

Abstract: Vision-language models have been extensively explored across a wide range of tasks, achieving satisfactory performance; however, their application in medical imaging remains underexplored. In this work, we propose a unified framework, LiteGPT, for medical imaging. We leverage multiple pre-trained visual encoders to enrich information and enhance the performance of vision-language models. To the best of our knowledge, this is the first study to utilize vision-language models for the novel task of joint localization and classification in medical images. Besides, we are pioneers in providing baselines for disease localization in chest X-rays. Finally, we set new state-of-the-art performance in the image classification task on the well-benchmarked VinDr-CXR dataset. All code and models are publicly available online: https://github.com/leduckhai/LiteGPT

Submitted 15 July, 2024; originally announced July 2024.

Comments: Preprint, 19 pages

arXiv:2407.12034

Understanding Transformers via N-gram Statistics

Authors: Timothy Nguyen

Abstract: Transformer-based large language models (LLMs) display extreme proficiency with language, yet a precise understanding of how they work remains elusive. One way of demystifying transformer predictions would be to describe how they depend on their context in terms of simple template functions. This paper takes a first step in this direction by considering families of functions (i.e. rules) formed out of simple N-gram based statistics of the training data. By studying how well these rulesets approximate transformer predictions, we obtain a variety of novel discoveries: a simple method to detect overfitting during training without using a holdout set, a quantitative measure of how transformers progress from learning simple to more complex statistical rules over the course of training, a model-variance criterion governing when transformer predictions tend to be described by N-gram rules, and insights into how well transformers can be approximated by N-gram rulesets in the limit where these rulesets become increasingly complex. In this latter direction, we find that for 78% of LLM next-token distributions on TinyStories, their top-1 predictions agree with those provided by our N-gram rulesets.
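As a deliberately simplified illustration of what an N-gram rule and a top-1 agreement rate can look like: map each length-(N-1) context in the training tokens to its most frequent next token, then count how often that prediction matches a model's top-1 token on the same contexts. The Python sketch below is a toy under those assumptions, not the paper's rulesets (which the abstract describes only as families of N-gram-based functions); the helper names are hypothetical, and the "model" predictions in the usage lines are made-up stand-ins for a transformer's outputs.

```python
from collections import Counter, defaultdict


def build_ngram_rule(tokens, n):
    """Map each (n-1)-token context seen in training to its most frequent next token."""
    counts = defaultdict(Counter)
    for k in range(len(tokens) - n + 1):
        context = tuple(tokens[k:k + n - 1])
        counts[context][tokens[k + n - 1]] += 1
    # most_common(1) breaks ties by first occurrence (Counter preserves insertion order).
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def top1_agreement(rule, contexts, model_top1):
    """Fraction of rule-covered contexts where the rule's prediction equals the model's top-1."""
    matched = covered = 0
    for ctx, pred in zip(contexts, model_top1):
        key = tuple(ctx)
        if key in rule:
            covered += 1
            matched += rule[key] == pred
    return matched / covered if covered else 0.0


tokens = "the cat sat on the mat the cat ran".split()
rule = build_ngram_rule(tokens, n=2)
# Hypothetical model top-1 predictions for three single-token contexts.
print(top1_agreement(rule, [["the"], ["on"], ["dog"]], ["cat", "a", "x"]))  # 0.5
```

Contexts never seen in training ("dog" here) are simply excluded from the rate, which is one of several reasonable conventions.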

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.11771

XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach

Authors: Truong Thanh Hung Nguyen, Phuc Truong Loc Nguyen, Hung Cao

Abstract: Recent advancements in deep learning have significantly improved visual quality inspection and predictive maintenance within industrial settings. However, deploying these technologies on low-resource edge devices poses substantial challenges due to their high computational demands and the inherent complexity of Explainable AI (XAI) methods. This paper addresses these challenges by introducing a novel XAI-integrated Visual Quality Inspection framework that optimizes the deployment of semantic segmentation models on low-resource edge devices. Our framework incorporates XAI and a Large Vision Language Model to deliver human-centered interpretability through visual and textual explanations to end-users. This is crucial for end-user trust and model interpretability. We outline a comprehensive methodology consisting of six fundamental modules: base model fine-tuning, XAI-based explanation generation, evaluation of XAI approaches, XAI-guided data augmentation, development of an edge-compatible model, and the generation of understandable visual and textual explanations. Through XAI-guided data augmentation, the enhanced model incorporating domain expert knowledge with visual and textual explanations is successfully deployed on mobile devices to support end-users in real-world scenarios. Experimental results showcase the effectiveness of the proposed framework, with the mobile model achieving competitive accuracy while significantly reducing model size. This approach paves the way for the broader adoption of reliable and interpretable AI tools in critical industrial applications, where decisions must be both rapid and justifiable.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 28 pages, preprint submitted to Information Fusion journal

arXiv:2407.11525

On a Theorem of Nathanson on Diophantine Approximation

Authors: Jaroslav Hančl, Tho Phuoc Nguyen

Abstract: In 1974, M. B. Nathanson proved that every irrational number $α$ represented by a simple continued fraction with infinitely many elements greater than or equal to $k$ is approximable by an infinite number of rational numbers $p/q$ satisfying $|α-p/q|<1/(\sqrt{k^2+4}q^2)$. In this paper we refine this result.

Submitted 16 July, 2024; originally announced July 2024.

MSC Class: 11J82; 11A55

arXiv:2407.11194

AstroMLab 1: Who Wins Astronomy Jeopardy!?

Authors: Yuan-Sen Ting, Tuan Dung Nguyen, Tirthankar Ghosal, Rui Pan, Hardik Arora, Zechang Sun, Tijmen de Haan, Nesar Ramachandra, Azton Wells, Sandeep Madireddy, Alberto Accomazzi

Abstract: We present a comprehensive evaluation of proprietary and open-weights large language models using the first astronomy-specific benchmarking dataset. This dataset comprises 4,425 multiple-choice questions curated from the Annual Review of Astronomy and Astrophysics, covering a broad range of astrophysical topics. Our analysis examines model performance across various astronomical subfields and assesses response calibration, crucial for potential deployment in research environments. Claude-3.5-Sonnet outperforms competitors by up to 4.6 percentage points, achieving 85.0% accuracy. For proprietary models, we observed a universal reduction in cost every 3-to-12 months to achieve a similar score on this particular astronomy benchmark. Open-source models have rapidly improved, with LLaMA-3-70b (80.6%) and Qwen-2-72b (77.7%) now competing with some of the best proprietary models. We identify performance variations across topics, with non-English-focused models generally struggling more in exoplanet-related fields, stellar astrophysics, and instrumentation-related questions. These challenges likely stem from less abundant training data, limited historical context, and rapid recent developments in these areas. This pattern is observed across both open-weights and proprietary models, with regional dependencies evident, highlighting the impact of training data diversity on model performance in specialized scientific domains. Top-performing models demonstrate well-calibrated confidence, with correlations above 0.9 between confidence and correctness, though they tend to be slightly underconfident. The development of fast, low-cost inference of open-weights models presents new opportunities for affordable deployment in astronomy. The rapid progress observed suggests that LLM-driven research in astronomy may become feasible in the near future.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 45 pages, 12 figures, 7 tables. Submitted to ApJ. Comments welcome. AstroMLab homepage: https://astromlab.org/

arXiv:2407.11166

On a Theorem of Legendre on Diophantine Approximation

Authors: Jaroslav Hančl, Tho Phuoc Nguyen

Abstract: Legendre's theorem states that every irreducible fraction $\frac{p}{q}$ which satisfies the inequality $\left|α-\frac{p}{q}\right| < \frac{1}{2q^2}$ is a convergent of $α$. Later, Barbolosi and Jager improved this theorem. In this paper we refine these results.
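Here a convergent of α is a truncation p_n/q_n of its simple continued fraction [a_0; a_1, a_2, ...], generated by the recurrences p_n = a_n p_{n-1} + p_{n-2} and q_n = a_n q_{n-1} + q_{n-2}. The Python sketch below checks Legendre's classical statement numerically for α = √2 = [1; 2, 2, 2, ...]: every reduced p/q with q ≤ 1000 that satisfies the bound should be a convergent. It illustrates only the original theorem, not the refinements discussed in the paper, and it assumes a floating-point value of α is accurate enough at this small scale. The helper names are hypothetical.

```python
import math
from fractions import Fraction


def convergents(partial_quotients):
    """Yield the convergents p_n/q_n of [a0; a1, a2, ...] via the standard recurrence."""
    p_prev, p = 1, partial_quotients[0]   # p_{-1}, p_0
    q_prev, q = 0, 1                      # q_{-1}, q_0
    yield Fraction(p, q)
    for a in partial_quotients[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        yield Fraction(p, q)


def legendre_hits(alpha, max_q):
    """Reduced fractions p/q (q <= max_q) with |alpha - p/q| < 1/(2q^2)."""
    hits = []
    for q in range(1, max_q + 1):
        p = round(alpha * q)  # only the nearest integer can satisfy the bound
        if math.gcd(p, q) == 1 and abs(alpha - p / q) < 1 / (2 * q * q):
            hits.append(Fraction(p, q))
    return hits


alpha = math.sqrt(2)
sqrt2_convergents = set(convergents([1] + [2] * 12))
hits = legendre_hits(alpha, 1000)
print(all(h in sqrt2_convergents for h in hits))
```

For √2 every convergent happens to satisfy the bound, so the hits are exactly 1/1, 3/2, 7/5, 17/12, and so on up to 1393/985.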

Submitted 15 July, 2024; originally announced July 2024.

MSC Class: 11J82; 11A55

arXiv:2407.11078

Overcoming Catastrophic Forgetting in Federated Class-Incremental Learning via Federated Global Twin Generator

Authors: Thinh Nguyen, Khoa D Doan, Binh T. Nguyen, Danh Le-Phuoc, Kok-Seng Wong

Abstract: Federated Class-Incremental Learning (FCIL) has become increasingly important in the decentralized setting, where it enables multiple participants to collaboratively train a global model to perform well on a sequence of tasks without sharing their private data. In FCIL, conventional Federated Learning algorithms such as FedAVG often suffer from catastrophic forgetting, resulting in significant performance declines on earlier tasks. Recent works based on generative models produce synthetic images to help mitigate this issue across all classes, but these approaches' testing accuracy on previous classes is still much lower than on recent classes, i.e., they exhibit better plasticity than stability. To overcome these issues, this paper presents Federated Global Twin Generator (FedGTG), an FCIL framework that exploits privacy-preserving generative-model training on the global side without accessing client data. Specifically, the server trains a data generator and a feature generator to create two types of information from all seen classes, and then it sends the synthetic data to the client side. The clients then use feature-direction-controlling losses to make the local models retain knowledge and learn new tasks well. We extensively analyze the robustness of FedGTG on natural images, as well as its ability to converge to flat local minima and achieve better prediction confidence (calibration). Experimental results on CIFAR-10, CIFAR-100, and tiny-ImageNet demonstrate the improvements in accuracy and forgetting measures of FedGTG compared to previous frameworks.
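The FedAVG baseline mentioned above aggregates client models on the server by averaging their parameters, weighted by each client's number of local training samples: w = Σ_k (n_k / n) w_k. The Python sketch below shows only that standard aggregation step, assuming each client's model is flattened into a plain list of floats; it is not FedGTG, whose generator-based server training the abstract describes only at a high level. The function name is hypothetical.

```python
def fedavg(client_weights, client_sizes):
    """FedAvg server aggregation: sample-size-weighted average of client parameter vectors."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one parameter vector per client and at least one client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        if len(w) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for j in range(dim):
            # Each client contributes in proportion to its share of the training data.
            global_weights[j] += (n / total) * w[j]
    return global_weights


# Two clients: the second holds three times as much data, so it dominates the average.
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Because the average is weighted by data volume, clients whose local data covers only recent classes can pull the global model away from earlier classes, which is the forgetting problem the abstract describes.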

Submitted 13 July, 2024; originally announced July 2024.

MSC Class: 68T07 (Primary); 68T45 (Secondary)

arXiv:2407.10227

doi 10.1109/ICST60714.2024.00017

KAT: Dependency-aware Automated API Testing with Large Language Models

Authors: Tri Le, Thien Tran, Duy Cao, Vy Le, Tien Nguyen, Vu Nguyen

Abstract: API testing is in increasing demand at software companies. Prior API testing tools were aware of certain types of dependencies between operations and parameters that need to be considered. However, their approaches, which are mostly manual or based on heuristic algorithms, have limitations due to the complexity of these dependencies. In this paper, we present KAT (Katalon API Testing), a novel AI-driven approach that leverages the large language model GPT in conjunction with advanced prompting techniques to autonomously generate test cases to validate RESTful APIs. Our comprehensive strategy encompasses various processes to construct an operation dependency graph from an OpenAPI specification and to generate test scripts, constraint validation scripts, test cases, and test data. Our evaluation of KAT using 12 real-world RESTful services shows that it can improve test coverage, detect more undocumented status codes, and reduce false positives in these services in comparison with a state-of-the-art automated test generation tool. These results indicate the effectiveness of using the large language model for generating test scripts and data for API testing.

Submitted 14 July, 2024; originally announced July 2024.

Comments: ICST 2024

arXiv:2407.09740

doi 10.1063/5.0181217

Ferroelectric AlBN Films by Molecular Beam Epitaxy

Authors: Chandrashekhar Savant, Ved Gund, Kazuki Nomoto, Takuya Maeda, Shubham Jadhav, Joongwon Lee, Madhav Ramesh, Eungkyun Kim, Thai-Son Nguyen, Yu-Hsin Chen, Joseph Casamento, Farhan Rana, Amit Lal, Huili Xing, Debdeep Jena

Abstract: We report the properties of AlBN thin films deposited by molecular beam epitaxy on a recently developed epitaxial nitride metal electrode, Nb2N. While a control AlN thin film exhibits standard capacitive behavior, distinct ferroelectric switching is observed in the AlBN films with increasing boron mole fraction. The measured remnant polarization Pr of 15 μC/cm² and coercive field Ec of 1.45 MV/cm in these films are smaller than those recently reported for films deposited by sputtering, due to incomplete wake-up limited by current leakage. Because, compared to other nitride high-K dielectrics and ferroelectrics, AlBN preserves the ultrawide energy bandgap of AlN, and because it can be epitaxially integrated with GaN and AlN semiconductors, its development will enable several opportunities for unique electronic, photonic, and memory devices.

Submitted 17 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

Comments: DOI: 10.1063/5.0181217

arXiv:2407.09281 [pdf, other]

Predicting and Understanding Human Action Decisions: Insights from Large Language Models and Cognitive Instance-Based Learning

Authors: Thuy Ngoc Nguyen, Kasturi Jamale, Cleotilde Gonzalez

Abstract: Large Language Models (LLMs) have demonstrated their capabilities across various tasks, from language translation to complex reasoning. Understanding and predicting human behavior and biases are crucial for artificial intelligence (AI)-assisted systems to provide useful assistance, yet it remains an open question whether these models can achieve this. This paper addresses this gap by leveraging the reasoning and generative capabilities of LLMs to predict human behavior in two sequential decision-making tasks. These tasks involve balancing exploitative and exploratory actions and handling delayed feedback, both essential for simulating real-life decision processes. We compare the performance of LLMs with a cognitive instance-based learning (IBL) model, which imitates human experiential decision-making. Our findings indicate that LLMs excel at rapidly incorporating feedback to enhance prediction accuracy. In contrast, the cognitive IBL model better accounts for human exploratory behaviors and effectively captures loss aversion bias, i.e., the tendency to choose a sub-optimal goal with fewer step-cost penalties rather than exploring to find the optimal choice, even with limited experience. The results highlight the benefits of integrating LLMs with cognitive architectures, suggesting that this synergy could enhance the modeling and understanding of complex human decision-making patterns.

Submitted 5 August, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09035 [pdf, other]

GPC: Generative and General Pathology Image Classifier

Authors: Anh Tien Nguyen, Jin Tae Kwak

Abstract: Deep learning has been increasingly incorporated into various computational pathology applications to improve their efficiency, accuracy, and robustness. Although successful, most previous approaches to image classification have crucial drawbacks. There exist numerous tasks in pathology, but one needs to build a model per task, i.e., a task-specific model, thereby increasing the number of models, training resources, and cost. Moreover, transferring an arbitrary task-specific model to another task is still a challenging problem. Herein, we propose a task-agnostic generative and general pathology image classifier, called GPC, that aims to learn from diverse kinds of pathology images and to conduct numerous classification tasks in a unified model. GPC, equipped with a convolutional neural network and a Transformer-based language model, maps pathology images into a high-dimensional feature space and generates pertinent class labels as text via an image-to-text classification mechanism. We evaluate GPC on six datasets for four different pathology image classification tasks. Experimental results show that GPC holds considerable potential for developing an effective and efficient universal model for pathology image analysis.

Submitted 12 July, 2024; originally announced July 2024.

Comments: MICCAI-MedAGI 2023 (Best Paper Honorable Mention)

arXiv:2407.09030 [pdf, other]

CAMP: Continuous and Adaptive Learning Model in Pathology

Authors: Anh Tien Nguyen, Keunho Byeon, Kyungeun Kim, Boram Song, Seoung Wan Chae, Jin Tae Kwak

Abstract: There exist numerous diagnostic tasks in pathology. Conventional computational pathology formulates and tackles them as independent, individual image classification problems, resulting in computational inefficiency and high costs. To address these challenges, we propose a generic, unified, and universal framework for pathology image classification, called a continuous and adaptive learning model in pathology (CAMP). CAMP is a generative, efficient, and adaptive classification model that can continuously adapt to any classification task by leveraging pathology-specific prior knowledge and learning task-specific knowledge with minimal computational cost and without forgetting the knowledge from existing tasks. We evaluated CAMP on 22 datasets, including 1,171,526 patches and 11,811 pathology slides, across 17 classification tasks. CAMP achieves state-of-the-art classification performance on a wide range of datasets and tasks at both the patch and slide levels, and reduces computation time by up to 94% and storage memory by up to 85% compared with conventional classification models. Our results demonstrate that CAMP can offer a fundamental transformation in pathology image classification, paving the way for fully digitized and computerized pathology practice.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Under review

arXiv:2407.08872 [pdf, other]

Visual Multi-Object Tracking with Re-Identification and Occlusion Handling using Labeled Random Finite Sets

Authors: Linh Van Ma, Tran Thien Dat Nguyen, Changbeom Shim, Du Yong Kim, Namkoo Ha, Moongu Jeon

Abstract: This paper proposes an online visual multi-object tracking (MOT) algorithm that resolves object appearance-reappearance and occlusion. Our solution is based on the labeled random finite set (LRFS) filtering approach, which, in principle, addresses disappearance, appearance, reappearance, and occlusion via a single Bayesian recursion. In practice, however, existing numerical approximations cause reappearing objects to be initialized as new tracks, especially after long periods of being undetected. In occlusion handling, the filter's efficacy is dictated by trade-offs between the sophistication of the occlusion model and the computational demand. Our contribution is a novel modeling method that exploits object features to address reappearing objects while maintaining linear complexity in the number of detections. Moreover, to improve the filter's occlusion handling, we propose a fuzzy detection model that takes into account the overlapping areas between tracks and their sizes. We also develop a fast version of the filter to further reduce computation time.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.08470 [pdf, other]

Brain Tumor Segmentation in MRI Images with 3D U-Net and Contextual Transformer

Authors: Thien-Qua T. Nguyen, Hieu-Nghia Nguyen, Thanh-Hieu Bui, Thien B. Nguyen-Tat, Vuong M. Ngo

Abstract: This research presents an enhanced approach for precise segmentation of brain tumor masses in magnetic resonance imaging (MRI) using an advanced 3D-UNet model combined with a Context Transformer (CoT). By expanding the CoT architecture to a 3D format, the proposed model integrates it smoothly with the base model to exploit the complex contextual information found in MRI scans, emphasizing how elements depend on each other across an extended spatial range. The proposed model synchronizes tumor mass characteristics from CoT, mutually reinforcing feature extraction and facilitating the precise capture of detailed tumor mass structures, including location, size, and boundaries. Experimental results show the strong segmentation performance of the proposed method compared with current state-of-the-art approaches, achieving Dice scores of 82.0%, 81.5%, and 89.0% for Enhancing Tumor, Tumor Core, and Whole Tumor, respectively, on BraTS2019.
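The Dice scores above are reported per tumor sub-region. A minimal sketch of how such per-region scores are commonly computed, assuming the usual BraTS 2019 label convention (1 = necrotic/non-enhancing core, 2 = edema, 4 = enhancing tumor, so ET = {4}, TC = {1, 4}, WT = {1, 2, 4}) and flat label sequences instead of 3D volumes; this is not the paper's evaluation code, and the example labels are made up.

```python
# Assumed BraTS 2019 label convention:
# 1 = necrotic / non-enhancing tumor core, 2 = peritumoral edema, 4 = enhancing tumor.
BRATS_REGIONS = {
    "ET": {4},        # Enhancing Tumor
    "TC": {1, 4},     # Tumor Core
    "WT": {1, 2, 4},  # Whole Tumor
}

def region_mask(labels, region):
    """Convert a flat sequence of voxel labels into a binary mask for one region."""
    return [1 if label in BRATS_REGIONS[region] else 0 for label in labels]

def dice_score(pred_mask, true_mask, eps=1e-8):
    """Dice = 2|P ∩ T| / (|P| + |T|); eps makes two empty masks score 1.0."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(p and t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    return (2.0 * intersection + eps) / (total + eps)

def brats_dice(pred_labels, true_labels):
    """Dice score for each nested BraTS region."""
    return {
        region: dice_score(region_mask(pred_labels, region), region_mask(true_labels, region))
        for region in BRATS_REGIONS
    }

pred = [0, 4, 1, 2, 2]  # hypothetical predicted labels for five voxels
gt = [0, 4, 4, 2, 0]    # hypothetical ground-truth labels
print(brats_dice(pred, gt))
```

For these toy labels ET ≈ 0.667, TC ≈ 1.0, and WT ≈ 0.857; because the regions are nested, one mislabeled voxel can affect several scores at once.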

Submitted 11 July, 2024; originally announced July 2024.

Comments: 6 pages, 7 figures

arXiv:2407.07917 [pdf, other]

Non-Cooperative Backdoor Attacks in Federated Learning: A New Threat Landscape

Authors: Tuan Nguyen, Dung Thuy Nguyen, Khoa D Doan, Kok-Seng Wong

Abstract: Despite the promise of Federated Learning (FL) for privacy-preserving model training on distributed data, it remains susceptible to backdoor attacks. These attacks manipulate models by embedding triggers (specific input patterns) in the training data, forcing misclassification as predefined classes during deployment. Traditional single-trigger attacks and recent work on cooperative multiple-trigger attacks, where clients collaborate, highlight limitations in attack realism due to coordination requirements. We investigate a more alarming scenario: non-cooperative multiple-trigger attacks. Here, independent adversaries introduce distinct triggers targeting unique classes. These parallel attacks exploit FL's decentralized nature, making detection difficult. Our experiments demonstrate the alarming vulnerability of FL to such attacks, where individual backdoors can be successfully learned without impacting the main task. This research emphasizes the critical need for robust defenses against diverse backdoor attacks in the evolving FL landscape. While our focus is on empirical analysis, we believe it can guide backdoor research toward more realistic settings, highlighting the crucial role of FL in building robust defenses against diverse backdoor threats. The code is available at https://anonymous.4open.science/r/nba-980F/.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.07472 [pdf, other]

Rectifier: Code Translation with Corrector via LLMs

Authors: Xin Yin, Chao Ni, Tien N. Nguyen, Shaohua Wang, Xiaohu Yang

Abstract: Software migration is garnering increasing attention with the evolution of software and society. Early studies mainly relied on handcrafted translation rules to translate between two languages; this translation process is error-prone and time-consuming. In recent years, researchers have begun to explore the use of pre-trained large language models (LLMs) in code translation. However, code translation is a complex task, and LLMs make mistakes when performing it: they all produce certain types of errors, which include (1) compilation errors, (2) runtime errors, (3) functional errors, and (4) non-terminating execution. We found that the root causes of these errors are very similar (e.g., failure to import packages, errors in loop boundaries, operator errors, and more). In this paper, we propose a general corrector, namely Rectifier, which is a micro and universal model for repairing translation errors. It learns from errors generated by existing LLMs and can be widely applied to correct errors generated by any LLM. The experimental results on translation tasks between C++, Java, and Python show that our model has effective repair ability, and cross experiments also demonstrate the robustness of our method.

Submitted 10 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2308.03109, arXiv:2302.03908 by other authors

arXiv:2407.07421 [pdf, other]

doi 10.1109/TNET.2024.3423780

Federated PCA on Grassmann Manifold for IoT Anomaly Detection

Authors: Tung-Anh Nguyen, Long Tan Le, Tuan Dung Nguyen, Wei Bao, Suranga Seneviratne, Choong Seon Hong, Nguyen H. Tran

Abstract: With the proliferation of the Internet of Things (IoT) and the rising interconnectedness of devices, network security faces significant challenges, especially from anomalous activities. While traditional machine learning-based intrusion detection systems (ML-IDS) effectively employ supervised learning methods, they have limitations such as the requirement for labeled data and challenges with high dimensionality. Recent unsupervised ML-IDS approaches such as AutoEncoders and Generative Adversarial Networks (GANs) offer alternative solutions but are difficult to deploy on resource-constrained IoT devices and to interpret. To address these concerns, this paper proposes a novel federated unsupervised anomaly detection framework, FedPCA, that leverages Principal Component Analysis (PCA) and the Alternating Direction Method of Multipliers (ADMM) to learn common representations of distributed non-i.i.d. datasets. Building on the FedPCA framework, we propose two algorithms, FEDPE in Euclidean space and FEDPG on Grassmann manifolds. Our approach enables real-time threat detection and mitigation at the device level, enhancing network resilience while ensuring privacy. Moreover, the proposed algorithms come with theoretical convergence rates even under a subsampling scheme, a novel result.
Experimental results on the UNSW-NB15 and TON-IoT datasets show that our proposed methods offer anomaly detection performance comparable to nonlinear baselines, while providing significant improvements in communication and memory efficiency, underscoring their potential for securing IoT networks.
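FedPCA learns its principal subspace in a federated way, via ADMM (FEDPE) or on the Grassmann manifold (FEDPG); neither procedure is reproduced here. The sketch below only illustrates the centralized building block that PCA-based anomaly detection commonly relies on, assuming the anomaly score is the squared reconstruction error outside the learned subspace and the threshold is a 99th-percentile cut on normal data (both are assumptions, not details from the abstract). The data are synthetic.

```python
import numpy as np

def fit_pca(X, k):
    """Return the training mean and a d x k matrix whose columns are the top-k principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T

def residual_scores(X, mean, basis):
    """Squared reconstruction error after projecting centered rows onto the principal subspace."""
    centered = X - mean
    residual = centered - (centered @ basis) @ basis.T
    return np.sum(residual ** 2, axis=1)

rng = np.random.default_rng(0)
# Synthetic "normal" feature vectors: 500 points near a 2-D subspace of R^5, plus small noise.
mixing = rng.normal(size=(2, 5))
normal = rng.normal(size=(500, 2)) @ mixing + 0.01 * rng.normal(size=(500, 5))

mean, basis = fit_pca(normal, k=2)
# Flag anything whose residual exceeds the 99th percentile observed on normal data.
threshold = np.quantile(residual_scores(normal, mean, basis), 0.99)

# A point displaced 5 units orthogonally to the learned subspace (squared residual = 25).
direction = np.ones(5) - basis @ (basis.T @ np.ones(5))
outlier = (mean + 5.0 * direction / np.linalg.norm(direction)).reshape(1, -1)

print(residual_scores(outlier, mean, basis)[0] > threshold)  # True
```

Only the mean and a d x k basis need to be stored per device, which is consistent with the abstract's emphasis on memory and communication efficiency for a linear model.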

Submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted for publication at IEEE/ACM Transactions on Networking

Journal ref: IEEE/ACM Transactions on Networking, pp. 1-16. Print ISSN: 1063-6692; Online ISSN: 1558-2566; DOI: 10.1109/TNET.2024.3423780

arXiv:2407.07369 [pdf, ps, other]

Viscosity estimation for 2D pipe flows I. Construction, consistency, asymptotic normality

Authors: Thi Hien Nguyen, Armen Shirikyan

Abstract: We consider the motion of an incompressible viscous fluid in a rectangle, imposing the periodicity condition in one direction and the no-slip boundary condition in the other. Assuming that the flow is subject to an external random force, white in time and regular in space, we construct an estimator for the viscosity using only observations of the enstrophy. The goal of the paper is to prove that the estimator is strongly consistent and asymptotically normal. The proof of consistency is based on the explicit formula for the estimator and some bounds for trajectories, while the proof of asymptotic normality additionally uses mixing properties of the Navier-Stokes flow.

Submitted 10 July, 2024; originally announced July 2024.

MSC Class: 35Q30; 37L55; 62M05; 76D06

arXiv:2407.07360 [pdf, other]

Towards a text-based quantitative and explainable histopathology image analysis

Authors: Anh Tien Nguyen, Trinh Thi Le Vuong, Jin Tae Kwak

Abstract: Recently, vision-language pre-trained models have emerged in computational pathology. Previous works generally focused on the alignment of image-text pairs via the contrastive pre-training paradigm. Such pre-trained models have been applied to pathology image classification in a zero-shot or transfer learning fashion. Herein, we hypothesize that pre-trained vision-language models can be utilized for quantitative histopathology image analysis through a simple image-to-text retrieval. To this end, we propose a Text-based Quantitative and Explainable histopathology image analysis, which we call TQx. Given a set of histopathology images, we adopt a pre-trained vision-language model to retrieve a word-of-interest pool. The retrieved words are then used to quantify the histopathology images and generate understandable feature embeddings, owing to the direct mapping to the text description. To evaluate the proposed method, the text-based embeddings of four histopathology image datasets are used to perform clustering and classification tasks. The results demonstrate that TQx can quantify and analyze histopathology images with performance comparable to the prevalent visual models in computational pathology.

Submitted 10 July, 2024; originally announced July 2024.

Comments: MICCAI 2024 - Early acceptance (Top 11%)

arXiv:2407.06826 [pdf, other]

VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction

Authors: Thanh-Dat Nguyen, Tung Do-Viet, Hung Nguyen-Duy, Tuan-Hai Luu, Hung Le, Bach Le, Patanamon Thongtanunam

Abstract: Businesses need to query visually rich documents (VRDs) such as receipts, medical records, and insurance forms to make decisions. Existing techniques for extracting entities from VRDs struggle with new layouts or require extensive pre-training data. We introduce VRDSynth, a program synthesis method to automatically extract entity relations from multilingual VRDs without pre-training data. To capture the complexity of the VRD domain, we design a domain-specific language (DSL) that captures spatial and textual relations to describe the synthesized programs. Along with this, we derive a new synthesis algorithm utilizing frequent spatial relations, search space pruning, and a combination of positive, negative, and exclusive programs to improve coverage. We evaluate VRDSynth on the FUNSD and XFUND benchmarks for semantic entity linking, consisting of 1,592 forms in 8 languages. VRDSynth outperforms state-of-the-art pre-trained models (LayoutXLM, InfoXLMBase, and XLMRobertaBase) in 5, 6, and 7 out of 8 languages, respectively, improving the F1 score by 42% over LayoutXLM in English. To test the extensibility of the model, we further improve VRDSynth with automated table recognition, creating VRDSynth(Table), and compare it with extended versions of the pre-trained models, InfoXLM(Large) and XLMRoberta(Large). VRDSynth(Table) outperforms these baselines in 4 out of 8 languages and in average F1 score. VRDSynth also significantly reduces memory footprint (1M and 380MB vs. 1.48GB and 3GB for LayoutXLM) while maintaining similar time efficiency.

Submitted 9 July, 2024; originally announced July 2024.

Comments: Accepted in ISSTA'24

arXiv:2407.06581 [pdf, other]

Vision language models are blind

Authors: Pooyan Rahmanzadehgervi, Logan Bolton, Mohammad Reza Taesiri, Anh Totti Nguyen

Abstract: While large language models with vision capabilities (VLMs), e.g., GPT-4o and Gemini 1.5 Pro, are powering various image-text applications and scoring high on many vision-understanding benchmarks, we find that they still struggle, surprisingly, with low-level vision tasks that are easy for humans. Specifically, on BlindTest, our suite of 7 very simple tasks such as identifying (a) whether two circles overlap; (b) whether two lines intersect; (c) which letter is being circled in a word; and (d) counting circles in an Olympic-like logo, four state-of-the-art VLMs are only 58.57% accurate on average. Claude 3.5 Sonnet performs the best at 74.94% accuracy, but this is still far from the human expected accuracy of 100%. Across different image resolutions and line widths, VLMs consistently struggle with tasks that require precise spatial information and recognizing geometric primitives that overlap or are close together. Code and data are available at: https://vlmsareblind.github.io
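Tasks (a) and (b) have exact geometric answers, which is what makes 100% the expected human accuracy. A minimal sketch of how such ground truth can be computed, assuming circles are given as center and radius, "lines" are straight segments given by their endpoints, and tangent circles count as not overlapping; the benchmark's own generator may define these cases differently.

```python
import math

def circles_overlap(c1, r1, c2, r2):
    """Circles overlap when the center distance is strictly less than the sum of radii."""
    return math.dist(c1, c2) < r1 + r2

def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 counter-clockwise, -1 clockwise, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def _on_segment(p, q, r):
    """For collinear p, q, r: check whether r lies within the bounding box of segment pq."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

def segments_intersect(a, b, c, d):
    """True if segment ab and segment cd share at least one point."""
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True  # general case: each segment straddles the other's line
    # Collinear special cases: an endpoint lies on the other segment.
    return ((o1 == 0 and _on_segment(a, b, c)) or (o2 == 0 and _on_segment(a, b, d))
            or (o3 == 0 and _on_segment(c, d, a)) or (o4 == 0 and _on_segment(c, d, b)))

print(circles_overlap((0, 0), 1, (1.5, 0), 1))             # True
print(segments_intersect((0, 0), (2, 2), (0, 2), (2, 0)))  # True
```

Integer or exactly representable coordinates keep the orientation test free of rounding issues.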

Submitted 25 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06142 [pdf, ps, other]

Delay-Aware Robust Edge Network Hardening Under Decision-Dependent Uncertainty

Authors: Jiaming Cheng, Duong Thuy Anh Nguyen, Ni Trieu, Duong Tung Nguyen

Abstract: Edge computing promises to offer low-latency and ubiquitous computation to numerous devices at the network edge. For delay-sensitive applications, link delays can have a direct impact on service quality. These delays can fluctuate drastically over time due to various factors such as network congestion, changing traffic conditions, cyberattacks, component failures, and natural disasters. Thus, it is crucial to efficiently harden the edge network to mitigate link delay variation as well as ensure a stable and improved user experience. To this end, we propose a novel robust model for optimal edge network hardening, considering the link delay uncertainty. Departing from the existing literature that treats uncertainties as exogenous, our model incorporates an endogenous uncertainty set to properly capture the impact of hardening and workload allocation decisions on link delays. However, the endogenous set introduces additional complexity to the problem due to the interdependence between decisions and uncertainties. We present two efficient methods to transform the problem into a solvable form. Extensive numerical results demonstrate the effectiveness of the proposed approach.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 14 pages, 18 figures

arXiv:2407.06045 [pdf, other]

OpenCIL: Benchmarking Out-of-Distribution Detection in Class-Incremental Learning

Authors: Wenjun Miao, Guansong Pang, Trong-Tung Nguyen, Ruohang Fang, Jin Zheng, Xiao Bai

Abstract: Class incremental learning (CIL) aims to learn a model that can not only incrementally accommodate new classes but also maintain the learned knowledge of old classes. Out-of-distribution (OOD) detection in CIL aims to retain this incremental learning ability while being able to reject unknown samples drawn from distributions different from those of the learned classes. This capability is crucial to the safety of deploying CIL models in open worlds. However, despite remarkable advancements in CIL and OOD detection respectively, there is no systematic, large-scale benchmark to assess the capability of advanced CIL models in detecting OOD samples. To fill this gap, we design a comprehensive empirical study to establish such a benchmark, named OpenCIL. To this end, we propose two principled frameworks for enabling four representative CIL models with 15 diverse OOD detection methods, resulting in 60 baseline models for OOD detection in CIL. The empirical evaluation is performed on two popular CIL datasets with six commonly used OOD datasets. One key observation from our comprehensive evaluation is that CIL models can be severely biased towards OOD samples and newly added classes when they are exposed to open environments. Motivated by this, we further propose a new baseline for OOD detection in CIL, namely Bi-directional Energy Regularization (BER), which is specially designed to mitigate these two biases in different CIL models by applying energy regularization to both old and new classes.
Our experiments demonstrate its superior performance. All code and datasets are open-source at https://github.com/mala-lab/OpenCIL.
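BER regularizes energy on old and new classes; its training objective is not reproduced here. For orientation, the sketch below shows the standard energy-based OOD score that such regularization acts on, E(x) = -T log Σ_i exp(f_i(x)/T), where lower energy suggests an in-distribution input. The temperature T and the threshold are hypothetical parameters, and whether OpenCIL uses exactly this score is an assumption.

```python
import math

def energy_score(logits, temperature=1.0):
    """Free energy E(x) = -T * log(sum_i exp(f_i(x) / T)), computed with a stable log-sum-exp."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp

def is_ood(logits, threshold, temperature=1.0):
    """Flag a sample as OOD when its energy exceeds a (hypothetical) threshold."""
    return energy_score(logits, temperature) > threshold

confident = [10.0, 0.0, 0.0]  # one dominant class logit: low energy
uncertain = [1.0, 1.0, 1.0]   # flat logits: higher energy
print(energy_score(confident) < energy_score(uncertain))  # True
```

Subtracting the maximum before exponentiating keeps the log-sum-exp numerically stable for large logits. A model biased toward newly added classes tends to produce large logits for those classes, which lowers energy for everything and blurs the in/out separation; per the abstract, BER targets this bias.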

Submitted 9 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05469 [pdf, other]

Smart Camera Parking System With Auto Parking Spot Detection

Authors: Tuan T. Nguyen, Mina Sartipi

Abstract: Given the rising urban population and the consequent rise in traffic congestion, the implementation of smart parking systems has emerged as a critical concern. Smart parking solutions use cameras, sensors, and algorithms such as computer vision to find available parking spaces. This approach improves parking space recognition, reduces traffic and pollution, and optimizes travel time. In recent years, computer vision-based approaches have been widely used. However, most existing studies rely on manually labeled parking spots, which affects the cost and practicality of implementation. To solve this problem, we propose a novel approach, PakLoc, which automatically localizes parking spots. Furthermore, we present the PakSke module, which automatically adjusts the rotation and size of detected bounding boxes. On the PKLot dataset, our proposed methodology reduces human labor by 94.25%. Another fundamental aspect of a smart parking system is its capacity to accurately determine and indicate the state of parking spots within a parking lot. The conventional approach employs classification techniques to predict the condition of parking spots based on bounding boxes derived from manually labeled grids. In this study, we provide a novel approach called PakSta for automatically identifying the state of parking spots. Our method uses the object detector from PakLoc to simultaneously determine the occupancy status of all parking spots within a video frame.
Our proposed method PakSta exhibits competitive performance on the PKLot dataset when compared to other classification methods.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.05452 [pdf, other]

Semantic Segmentation for Real-World and Synthetic Vehicle's Forward-Facing Camera Images

Authors: Tuan T. Nguyen, Phan Le, Yasir Hassan, Mina Sartipi

Abstract: In this paper, we present our submission to the 5th Annual Smoky Mountains Computational Sciences Data Challenge, Challenge 3. Our solution addresses the semantic segmentation problem for both real-world and synthetic images from a vehicle's forward-facing camera. We concentrate on building a robust model that performs well across domains with different outdoor conditions, such as sunny, snowy, and rainy weather. In particular, our method is developed along two main directions: model development and domain adaptation. In model development, we use the High Resolution Network (HRNet) as the baseline. The baseline's result is then processed by two coarse-to-fine models, Object-Contextual Representations (OCR) and Hierarchical Multi-scale Attention (HMA), to obtain more robust features. For domain adaptation, we implement Domain-Based Batch Normalization (DNB) to reduce the distribution shift across diverse domains. Our proposed method yields a mean intersection-over-union (mIoU) of 81.259 on the validation set. This paper studies the effectiveness of employing real-world and synthetic data to handle domain adaptation in the semantic segmentation problem.
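mIoU, the metric reported above, averages per-class intersection-over-union, IoU_c = TP_c / (TP_c + FP_c + FN_c), where TP, FP, and FN are the true-positive, false-positive, and false-negative pixel counts for class c. A minimal sketch over flat per-pixel class IDs, assuming classes absent from both prediction and ground truth are skipped and an optional hypothetical `ignore_index` marks unlabeled pixels. The challenge's official scoring script may differ, and scores like 81.259 are conventionally IoU × 100.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean of per-class IoU = TP / (TP + FP + FN) over classes present in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, t in zip(pred, target):
        if ignore_index is not None and t == ignore_index:
            continue  # unlabeled pixels do not count for any class
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p where it was not p
            fn[t] += 1  # missed the true class t
    ious = [
        tp[c] / (tp[c] + fp[c] + fn[c])
        for c in range(num_classes)
        if tp[c] + fp[c] + fn[c] > 0
    ]
    if not ious:
        raise ValueError("no class appears in pred or target")
    return sum(ious) / len(ious)

pred = [0, 0, 1, 1, 2]
target = [0, 1, 1, 1, 2]
print(round(100 * mean_iou(pred, target, num_classes=3), 3))  # 72.222
```

Here class 0 scores 1/2, class 1 scores 2/3, and class 2 scores 1, so the mean is about 0.722. In practice, per-class counts are accumulated over the whole validation set before dividing, rather than averaged per image.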

Submitted 7 July, 2024; originally announced July 2024.

Comments: 13 pages
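The mIoU figure reported above is the per-class intersection-over-union, TP / (TP + FP + FN), averaged over classes. A minimal sketch of that metric computed from a pixel confusion matrix; the function names and the convention of skipping classes absent from both prediction and ground truth are assumptions for illustration, not details taken from the paper:

```python
from typing import List


def confusion_matrix(gt: List[int], pred: List[int], num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average of TP / (TP + FP + FN) over classes that occur in gt or pred."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # gt == c but predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c but gt is another class
        denom = tp + fp + fn
        if denom > 0:                               # skip classes absent from both
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Six "pixels", three classes (toy labels, not challenge data).
gt = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(confusion_matrix(gt, pred, 3)))  # per-class IoUs 1/3, 2/3, 1/2 -> 0.5
```

Accumulating a confusion matrix first, rather than averaging per-image IoUs, is a common choice because it weights every pixel of the validation set equally.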

arXiv:2407.05205 [pdf, other]

The AI Companion in Education: Analyzing the Pedagogical Potential of ChatGPT in Computer Science and Engineering

Authors: Zhangying He, Thomas Nguyen, Tahereh Miari, Mehrdad Aliasgari, Setareh Rafatirad, Hossein Sayadi

Abstract: Artificial Intelligence (AI), with ChatGPT as a prominent example, has recently taken center stage in various domains including higher education, particularly in Computer Science and Engineering (CSE). The AI revolution brings both convenience and controversy, offering substantial benefits while lacking formal guidance on its application. The primary objective of this work is to comprehensively analyze the pedagogical potential of ChatGPT in CSE education, understanding its strengths and limitations from the perspectives of educators and learners. We employ a systematic approach, creating a diverse range of educational practice problems within the CSE field, covering subjects such as data science, programming, AI, machine learning, and networks. According to our examinations, certain question types, such as conceptual knowledge queries, typically do not pose significant challenges to ChatGPT and are therefore excluded from our analysis. Instead, we focus our efforts on developing more in-depth and personalized questions and project-based tasks. These questions are presented to ChatGPT, followed by interactions to assess its effectiveness in delivering complete and meaningful responses. To this end, we propose a comprehensive five-factor reliability analysis framework to evaluate the responses. This assessment aims to identify when ChatGPT excels and when it faces challenges. Our study concludes with a correlation analysis, delving into the relationships among subjects, task types, and limiting factors. This analysis offers valuable insights to enhance ChatGPT's utility in CSE education, providing guidance to educators and students regarding its reliability and efficacy.

Submitted 23 April, 2024; originally announced July 2024.

Comments: conference, 13 pages

arXiv:2407.04992 [pdf, other]

Scalable Variational Causal Discovery Unconstrained by Acyclicity

Authors: Nu Hoang, Bao Duong, Thin Nguyen

Abstract: Bayesian causal discovery offers the power to quantify epistemic uncertainty among a broad range of structurally diverse causal theories that potentially explain the data, represented in the form of directed acyclic graphs (DAGs). However, existing methods struggle with efficient DAG sampling due to the complex acyclicity constraint. In this study, we propose a scalable Bayesian approach that effectively learns the posterior distribution over causal graphs given observational data, thanks to the ability to generate DAGs without explicitly enforcing acyclicity. Specifically, we introduce a novel differentiable DAG sampling method that can generate a valid acyclic causal graph by mapping an unconstrained distribution of implicit topological orders to a distribution over DAGs. Given this efficient DAG sampling scheme, we are able to model the posterior distribution over causal graphs using a simple variational distribution over a continuous domain, which can be learned via the variational inference framework. Extensive empirical experiments on both simulated and real datasets demonstrate the superior performance of the proposed model compared to several state-of-the-art baselines.

Submitted 28 August, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

Comments: Accepted at ECAI 2024
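The key property behind the sampler described above is that any graph whose edges only point from earlier to later nodes in some ordering is acyclic by construction. The sketch below shows that property with a hard, non-differentiable mapping: a random permutation stands in for a sampled topological order and random numbers stand in for unconstrained edge scores. Both, along with the helper names and the 0.5 threshold, are illustrative assumptions rather than the paper's learned, differentiable sampler; `is_acyclic` uses Kahn's algorithm as an independent check.

```python
import random
from typing import List


def dag_from_order(order: List[int], scores: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Keep edge i -> j only if i precedes j in `order` and its score exceeds `threshold`."""
    n = len(order)
    pos = {node: k for k, node in enumerate(order)}  # node -> position in the order
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if pos[i] < pos[j] and scores[i][j] > threshold:
                adj[i][j] = 1
    return adj


def is_acyclic(adj: List[List[int]]) -> bool:
    """Kahn's algorithm: a graph is a DAG iff all nodes can be peeled off by in-degree 0."""
    n = len(adj)
    indeg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    queue = [j for j in range(n) if indeg[j] == 0]
    removed = 0
    while queue:
        i = queue.pop()
        removed += 1
        for j in range(n):
            if adj[i][j]:
                indeg[j] -= 1
                if indeg[j] == 0:
                    queue.append(j)
    return removed == n


rng = random.Random(0)
n = 6
order = list(range(n))
rng.shuffle(order)                                             # a random "topological order"
scores = [[rng.random() for _ in range(n)] for _ in range(n)]  # unconstrained edge scores
adj = dag_from_order(order, scores)
print(is_acyclic(adj))  # True for any order and any scores
```

Because acyclicity is guaranteed by the order alone, the edge scores need no acyclicity penalty; per the abstract, the paper makes the order distribution itself learnable and the mapping differentiable.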

arXiv:2407.04980 [pdf, other]

Enabling Causal Discovery in Post-Nonlinear Models with Normalizing Flows

Authors: Nu Hoang, Bao Duong, Thin Nguyen

Abstract: Post-nonlinear (PNL) causal models stand out as a versatile and adaptable framework for modeling intricate causal relationships. However, accurately capturing the invertibility constraint required in PNL models remains challenging in existing studies. To address this problem, we introduce CAF-PoNo (Causal discovery via Normalizing Flows for Post-Nonlinear models), harnessing the power of the normalizing flows architecture to enforce the crucial invertibility constraint in PNL models. Through normalizing flows, our method precisely reconstructs the hidden noise, which plays a vital role in cause-effect identification through statistical independence testing. Furthermore, the proposed approach exhibits remarkable extensibility, as it can be seamlessly expanded to facilitate multivariate causal discovery via causal order identification, empowering us to efficiently unravel complex causal relationships. Extensive experimental evaluations on both simulated and real datasets consistently demonstrate that the proposed method outperforms several state-of-the-art approaches in both bivariate and multivariate causal discovery tasks.

Submitted 28 August, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

Comments: Accepted at ECAI 2024
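In a PNL model the effect is generated as x2 = f2(f1(x1) + e) with f2 invertible and e independent of x1, so inverting f2 is what exposes the hidden noise the abstract refers to. The sketch below assumes f1 and f2 are known (the hypothetical `sin` and `y**3 + y`) and inverts f2 by bisection; the paper instead learns these maps with normalizing flows and uses a proper statistical independence test, for which the plain correlation here is only a crude stand-in.

```python
import math
import random


def f1(x: float) -> float:
    return math.sin(x)  # hypothetical inner nonlinearity


def f2(y: float) -> float:
    return y ** 3 + y  # hypothetical outer function, strictly increasing and hence invertible


def f2_inv(z: float, lo: float = -1e3, hi: float = 1e3) -> float:
    """Invert the monotone f2 by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f2(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def corr(a, b):
    """Pearson correlation (a weak proxy for independence)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(1)
x1 = [rng.uniform(-3, 3) for _ in range(2000)]
e = [rng.gauss(0, 0.3) for _ in range(2000)]
x2 = [f2(f1(a) + n) for a, n in zip(x1, e)]       # PNL generation: x2 = f2(f1(x1) + e)

e_hat = [f2_inv(b) - f1(a) for a, b in zip(x1, x2)]  # reconstructed noise
print(max(abs(p - q) for p, q in zip(e, e_hat)))      # ~0: noise recovered exactly
print(abs(corr(x1, e_hat)))                           # small: residual unrelated to the cause
```

Cause-effect identification then compares how independent the reconstructed noise is from the putative cause in each direction; without the invertibility of f2, this noise could not be recovered at all.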

arXiv:2407.04489 [pdf, other]

Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model

Authors: Duy M. H. Nguyen, An T. Le, Trung Q. Nguyen, Nghiem T. Diep, Tai Nguyen, Duy Duong-Tran, Jan Peters, Li Shen, Mathias Niepert, Daniel Sonntag

Abstract: Prompt learning methods are gaining increasing attention due to their ability to customize large vision-language models to new domains using pre-trained contextual knowledge and minimal training data. However, existing works typically rely on optimizing unified prompt inputs, often struggling with fine-grained classification tasks due to insufficient discriminative attributes. To tackle this, we consider a new framework based on a dual context of both domain-shared and class-specific contexts, where the latter is generated by Large Language Models (LLMs) such as GPTs. Such dual prompt methods enhance the model's feature representation by joining implicit and explicit factors encoded in LLM knowledge. Moreover, we employ Unbalanced Optimal Transport (UOT) theory to quantify the relationships between constructed prompts and visual tokens. Through partial matching, UOT can properly align discrete sets of visual tokens and prompt embeddings under different mass distributions, which is particularly valuable for handling irrelevant or noisy elements, ensuring that the preservation of mass does not restrict transport solutions. Furthermore, UOT's characteristics integrate seamlessly with image augmentation, expanding the training sample pool while maintaining a reasonable distance between perturbed images and prompt inputs. Extensive experiments across few-shot classification and adapter settings substantiate the superiority of our model over current state-of-the-art baselines.

Submitted 5 July, 2024; originally announced July 2024.

Comments: Version 1
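Unbalanced optimal transport replaces the hard marginal constraints of classical OT with KL penalties, so mass with no good match (for example, a noisy visual token) does not have to be fully transported. A minimal sketch of the standard entropic, KL-relaxed scaling iterations, assuming NumPy; `eps` (entropic regularization), `rho` (marginal penalty weight), and the toy cost matrix are illustrative choices, not values or code from the paper:

```python
import numpy as np


def unbalanced_sinkhorn(a, b, C, eps=0.2, rho=1.0, n_iter=2000):
    """Entropic UOT with KL marginal penalties of weight rho.

    Returns a plan P >= 0 whose marginals only approximately match a and b;
    a smaller rho lets the marginals deviate further from a and b.
    """
    K = np.exp(-C / eps)            # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    power = rho / (rho + eps)       # exponent from the KL relaxation; power -> 1 recovers balanced Sinkhorn
    for _ in range(n_iter):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]


# Toy example: 2 "prompt embeddings" vs. 3 "visual tokens" with hypothetical costs.
a = np.array([0.5, 0.5])
b = np.array([0.3, 0.3, 0.4])
C = np.array([[0.0, 0.5, 1.0],
              [1.0, 0.5, 0.0]])

P_tight = unbalanced_sinkhorn(a, b, C, rho=1e4)   # nearly balanced OT
P_loose = unbalanced_sinkhorn(a, b, C, rho=0.05)  # marginals strongly relaxed
print(P_tight.round(3))
print(P_loose.sum(axis=0), P_loose.sum(axis=1))
```

With a large `rho` the plan's row and column sums essentially reproduce `a` and `b`; lowering `rho` trades marginal fidelity for lower transport cost, which is the "partial matching" behavior the abstract relies on.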

arXiv:2407.04408 [pdf, ps, other]

Hybrid Receiver Design for Massive MIMO-OFDM with Low-Resolution ADCs and Oversampling

Authors: Mengyuan Ma, Nhan Thanh Nguyen, Italo Atzeni, Markku Juntti

Abstract: Low-resolution analog-to-digital converters (ADCs) and hybrid beamforming have emerged as efficient solutions to reduce power consumption with satisfactory spectral efficiency (SE) in massive multiple-input multiple-output (MIMO) systems. In this paper, we investigate the performance of a hybrid receiver in uplink massive MIMO orthogonal frequency-division multiplexing (OFDM) systems with low-resolution ADCs and oversampling. Considering both the temporal and spatial correlation of the quantization distortion (QD), we derive a closed-form approximation of the frequency-domain QD covariance matrix, which facilitates the evaluation of the system SE. Then we jointly design the analog and baseband combiners to maximize the SE. The formulated problem is significantly challenging due to the constant-modulus constraint of the analog combiner and its coupling with the digital one. To overcome these challenges, we transform the objective function into an equivalent but more tractable form and then iteratively update the analog and digital combiners. Numerical simulations verify the superiority of the proposed algorithm compared to the considered benchmarks and show the resilience of the hybrid receiver to beam squint in low-resolution systems. Furthermore, the results show that the proposed hybrid receiver design with oversampling can achieve significantly higher energy efficiency than the digital one.

Submitted 9 August, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

Comments: 6 pages, 4 figures, submitted to GlobeCom 2024

arXiv:2407.03796 [pdf, ps, other]

Joint Beamforming Design and Bit Allocation in Massive MIMO with Resolution-Adaptive ADCs

Authors: Mengyuan Ma, Nhan Thanh Nguyen, Italo Atzeni, Markku Juntti

Abstract: Low-resolution analog-to-digital converters (ADCs) have emerged as a promising technology for reducing power consumption and complexity in massive multiple-input multiple-output (MIMO) systems while maintaining satisfactory spectral and energy efficiencies (SE/EE). In this work, we first identify the essential properties of optimal quantization and leverage them to derive a closed-form approximation of the covariance matrix of the quantization distortion. The theoretical finding facilitates the system SE analysis in the presence of low-resolution ADCs. We then focus on the joint optimization of the transmit-receive beamforming and bit allocation to maximize the SE under constraints on the transmit power and the total number of active ADC bits. To solve the resulting mixed-integer problem, we first develop an efficient beamforming design for fixed ADC resolutions. Then, we propose a low-complexity heuristic algorithm to iteratively optimize the ADC resolutions and beamforming matrices. Numerical results for a $64 \times 64$ MIMO system demonstrate that the proposed design offers $6\%$ improvement in both SE and EE with $40\%$ fewer active ADC bits compared with the uniform bit allocation. Furthermore, we numerically show that receiving more data streams with low-resolution ADCs can achieve higher SE and EE compared to receiving fewer data streams with high-resolution ADCs.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 13 pages, 14 figures
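As background for the "properties of optimal quantization" mentioned above (the abstract does not list them, so this is a textbook property, not the paper's derivation): in a mean-squared-error-optimal (Lloyd-Max) scalar quantizer every reconstruction level is the centroid of its cell. As a result, the quantization distortion has zero mean and is uncorrelated with the quantizer output. The sketch runs Lloyd's algorithm on Gaussian samples for a hypothetical 2-bit ADC and checks that property empirically; the sample size, initialization, and iteration count are arbitrary choices.

```python
import random
from bisect import bisect_left
from typing import List, Tuple


def lloyd_max(samples: List[float], bits: int, iters: int = 200) -> Tuple[List[float], List[float]]:
    """Lloyd's algorithm; returns levels that are exactly the centroids of the returned cells."""
    k = 2 ** bits
    s = sorted(samples)
    levels = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]  # initialize at quantiles
    for _ in range(iters):
        # Nearest-level rule: cell boundaries sit midway between adjacent levels.
        thresholds = [(levels[i] + levels[i + 1]) / 2 for i in range(k - 1)]
        cells = [[] for _ in range(k)]
        for x in samples:
            cells[bisect_left(thresholds, x)].append(x)
        # Centroid rule: each level moves to the mean of its cell.
        levels = [sum(c) / len(c) if c else levels[i] for i, c in enumerate(cells)]
    return levels, thresholds


def quantize(x: float, levels: List[float], thresholds: List[float]) -> float:
    return levels[bisect_left(thresholds, x)]


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
levels, thresholds = lloyd_max(x, bits=2)
q = [quantize(v, levels, thresholds) for v in x]
d = [v - w for v, w in zip(x, q)]                  # quantization distortion
mse = sum(t * t for t in d) / len(d)
cross = sum(w * t for w, t in zip(q, d)) / len(d)  # empirical E[q * d]
print(mse, cross)  # cross vanishes up to rounding: distortion uncorrelated with the output
```

Orthogonality between output and distortion is what makes a simple additive, uncorrelated-distortion model tractable for covariance analysis; whether the paper uses exactly this property is not stated in the abstract.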

arXiv:2407.03788 [pdf, other]

Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning

Authors: Thong Nguyen, Yi Bin, Xiaobao Wu, Xinshuai Dong, Zhiyuan Hu, Khoi Le, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

Abstract: Data quality stands at the forefront of deciding the effectiveness of video-language representation learning. However, video-text pairs in previous data typically do not align perfectly with each other, which might lead to video-language representations that do not accurately reflect cross-modal semantics. Moreover, previous data also possess an uneven distribution of concepts, thereby hampering downstream performance on unpopular subjects. To address these problems, we propose a contrastive objective with a subtractive angular margin to regularize cross-modal representations in their effort to reach perfect similarity. Furthermore, to adapt to the non-uniform concept distribution, we propose a multi-layer perceptron (MLP)-parameterized weighting function that maps loss values to sample weights, enabling dynamic adjustment of the model's focus throughout training. With training guided by a small amount of unbiased meta-data and augmented by video-text data generated by a large vision-language model, we improve video-language representations and achieve superior performance on commonly used video question answering and text-video retrieval datasets.

Submitted 19 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV 2024

arXiv:2407.03665 [pdf, other]

Heterogeneous Hypergraph Embedding for Recommendation Systems

Authors: Darnbi Sakong, Viet Hung Vu, Thanh Trung Huynh, Phi Le Nguyen, Hongzhi Yin, Quoc Viet Hung Nguyen, Thanh Tam Nguyen

Abstract: Recent advancements in recommender systems have focused on integrating knowledge graphs (KGs) to leverage their auxiliary information. The core idea of KG-enhanced recommenders is to incorporate rich semantic information for more accurate recommendations. However, two main challenges persist: (i) neglecting complex higher-order interactions in the KG-based user-item network, potentially leading to sub-optimal recommendations, and (ii) dealing with the heterogeneous modalities of input sources, such as user-item bipartite graphs and KGs, which may introduce noise and inaccuracies. To address these issues, we present a novel Knowledge-enhanced Heterogeneous Hypergraph Recommender System (KHGRec). KHGRec captures group-wise characteristics of both the interaction network and the KG, modeling complex connections in the KG. Using a collaborative knowledge heterogeneous hypergraph (CKHG), it employs two hypergraph encoders to model group-wise interdependencies and ensure explainability. Additionally, it fuses signals from the input graphs with cross-view self-supervised learning and attention mechanisms. Extensive experiments on four real-world datasets show our model's superiority over various state-of-the-art baselines, with an average 5.18% relative improvement. Additional tests on noise resilience, missing data, and cold-start problems demonstrate the robustness of our KHGRec framework. Our model and evaluation datasets are publicly available at https://github.com/viethungvu1998/KHGRec.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.03611 [pdf, other]

An Empirical Study on Capability of Large Language Models in Understanding Code Semantics

Authors: Thu-Trang Nguyen, Thanh Trong Vu, Hieu Dinh Vo, Son Nguyen

Abstract: Large Language Models for Code (code LLMs) have demonstrated remarkable performance across various software engineering (SE) tasks, increasing the application of code LLMs in software development. Despite the success of code LLMs, there remain significant concerns about the actual capabilities and reliability of these models: "whether these models really learn the semantics of code from the training data and leverage the learned knowledge to perform the SE tasks". In this paper, we introduce EMPICA, a comprehensive framework designed to systematically and empirically evaluate the capabilities of code LLMs in understanding code semantics. Specifically, EMPICA introduces controlled modifications/transformations into the input code and examines the models' responses. In general, code LLMs must be robust to semantically equivalent code inputs and sensitive to non-equivalent ones for all SE tasks. That is, for every SE task, given an input code snippet c and its semantically equivalent variants, code LLMs must robustly produce consistent/equivalent outputs, while they are expected to generate different outputs for c and its semantically non-equivalent variants. Our experimental results on three representative code understanding tasks, namely code summarization, method name prediction, and output prediction, reveal that the robustness and sensitivity of state-of-the-art code LLMs to code transformations vary significantly across tasks and transformation operators.
In addition, the code LLMs exhibit better robustness to semantics-preserving transformations than sensitivity to semantics non-preserving transformations. These results highlight a need to enhance the models' capabilities of understanding code semantics, especially the sensitivity property.

Submitted 3 July, 2024; originally announced July 2024.
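The abstract does not list EMPICA's transformation operators, so as a hedged illustration of one plausible semantics-preserving operator, the sketch below renames a function's parameters and assigned variables with Python's `ast` module and checks that the variant computes the same results as the original. `RenameLocals`, `load`, and the sample function are hypothetical; `ast.unparse` requires Python 3.9 or later.

```python
import ast

SOURCE = """
def total_price(items, tax):
    subtotal = 0
    for price in items:
        subtotal += price
    return subtotal * (1 + tax)
"""


class RenameLocals(ast.NodeTransformer):
    """Rename parameters and assigned names to v0, v1, ... (a semantics-preserving edit)."""

    def __init__(self):
        self.mapping = {}

    def _new_name(self, old):
        if old not in self.mapping:
            self.mapping[old] = f"v{len(self.mapping)}"
        return self.mapping[old]

    def visit_arg(self, node):
        node.arg = self._new_name(node.arg)  # function parameters
        return node

    def visit_Name(self, node):
        # Rename names being assigned, and later reads of names already renamed;
        # globals and builtins that are only read keep their original names.
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._new_name(node.id)
        return node


def load(src, name):
    """Execute source text and return the function called `name` from it."""
    namespace = {}
    exec(src, namespace)
    return namespace[name]


renamed = ast.unparse(RenameLocals().visit(ast.parse(SOURCE)))
print(renamed)

original_fn = load(SOURCE, "total_price")
variant_fn = load(renamed, "total_price")
for items, tax in [([1.0, 2.5], 0.1), ([], 0.2), ([10], 0.0)]:
    assert original_fn(items, tax) == variant_fn(items, tax)  # same behavior
```

Under the robustness criterion stated in the abstract, a code LLM should, for example, predict the same output for `original_fn` and `variant_fn`. A semantics non-preserving counterpart, such as turning `+=` into `-=`, should instead change its prediction.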

arXiv:2407.03144 [pdf, other]

Venomancer: Towards Imperceptible and Target-on-Demand Backdoor Attacks in Federated Learning

Authors: Son Nguyen, Thinh Nguyen, Khoa D Doan, Kok-Seng Wong

Abstract: Federated Learning (FL) is a distributed machine learning approach that maintains data privacy by training on decentralized data sources. Similar to centralized machine learning, FL is also susceptible to backdoor attacks, where an attacker can compromise some clients by injecting a backdoor trigger into their local models, causing the global model to behave as the attacker desires. Most backdoor attacks in FL assume a predefined target class and require control over a large number of clients or knowledge of benign clients' information. Furthermore, they are not imperceptible and are easily detected by human inspection due to clear artifacts left on the poisoned data. To overcome these challenges, we propose Venomancer, an effective backdoor attack that is imperceptible and allows target-on-demand. Specifically, imperceptibility is achieved by using a visual loss function to make the poisoned data visually indistinguishable from the original data. The target-on-demand property allows the attacker to choose arbitrary target classes via conditional adversarial training. Additionally, experiments showed that the method is robust against state-of-the-art defenses such as Norm Clipping, Weak DP, Krum, Multi-Krum, RLR, FedRAD, Deepsight, and RFLBAT. The source code is available at https://github.com/nguyenhongson1902/Venomancer.

Submitted 11 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03110 [pdf, other]

A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection)

Authors: Lam Pham, Phat Lam, Tin Nguyen, Hieu Tang, Alexander Schindler

Abstract: In this paper, we present a toolchain for comprehensive audio/video analysis that leverages a deep learning based multimodal approach. To this end, the specific tasks of Speech to Text (S2T), Acoustic Scene Classification (ASC), Acoustic Event Detection (AED), Visual Object Detection (VOD), Image Captioning (IC), and Video Captioning (VC) are conducted and integrated into the toolchain. By combining individual tasks and analyzing both audio and visual data extracted from the input video, the toolchain offers various audio/video-based applications: two general applications, audio/video clustering and comprehensive audio/video summarization, and a specific application of riot or violent context detection. Furthermore, the toolchain has a flexible and adaptable architecture that makes it easy to integrate new models for further audio/video-based applications.

Submitted 2 May, 2024; originally announced July 2024.

arXiv:2407.02966 [pdf, other]

Efficient Forward-Mode Algorithmic Derivatives of Geant4

Authors: Max Aehle, Xuan Tung Nguyen, Mihály Novák, Tommaso Dorigo, Nicolas R. Gauger, Jan Kieseler, Markus Klute, Vassil Vassilev

Abstract: We have applied an operator-overloading forward-mode algorithmic differentiation tool to the Monte-Carlo particle simulation toolkit Geant4. Our differentiated version of Geant4 allows computing mean pathwise derivatives of user-defined outputs of Geant4 applications with respect to user-defined inputs. This constitutes a major step towards enabling gradient-based optimization techniques in high-energy physics, as well as other application domains of Geant4. This is a preliminary report on the technical aspects of applying operator-overloading AD to Geant4, as well as a first analysis of some results obtained by our differentiated Geant4 prototype. We plan to follow up with a more refined analysis.

Submitted 3 July, 2024; originally announced July 2024.
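Operator-overloading forward-mode AD, as described above, replaces each floating-point value with a pair (value, derivative) and overloads arithmetic so that the chain rule is applied at every operation. A minimal Python sketch of the idea; the `Dual` class and the toy `attenuation` "simulation" are hypothetical stand-ins, not the AD tool or the Geant4 code used in the paper:

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    val: float  # primal value
    dot: float  # derivative with respect to the seeded input

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val, self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__

    def __neg__(self):
        return Dual(-self.val, -self.dot)


def exp(x):
    """exp overloaded for Dual numbers: d/dx e^x = e^x."""
    if isinstance(x, Dual):
        e = math.exp(x.val)
        return Dual(e, e * x.dot)
    return math.exp(x)


def attenuation(thickness, mu=0.7):
    """Toy 'simulation output': surviving fraction exp(-mu * thickness)."""
    return exp(-(mu * thickness))


# Seed dot = 1.0 to differentiate with respect to thickness.
out = attenuation(Dual(2.0, 1.0))
print(out.val, out.dot)  # exp(-1.4) and -0.7 * exp(-1.4)
```

The same `attenuation` code runs unchanged on plain floats or on `Dual` values, which is the appeal of operator overloading for a large existing code base. In a Monte Carlo setting, derivatives propagated along each sampled history with its random numbers held fixed can be averaged, which is one reading of the "mean pathwise derivatives" in the abstract.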

arXiv:2407.02828 [pdf]

Quantum Serverless Paradigm and Application Development using the QFaaS Framework

Authors: Hoa T. Nguyen, Bui Binh An Pham, Muhammad Usman, Rajkumar Buyya

Abstract: Quantum computing has the potential to solve complex problems beyond the capabilities of classical computers. However, its practical use is currently limited due to early-stage quantum software engineering and the constraints of Noisy Intermediate-Scale Quantum (NISQ) devices. To address this issue, this chapter introduces the concept of serverless quantum computing with examples using QFaaS, a practical Quantum Function-as-a-Service framework. This framework utilizes the serverless computing model to simplify quantum application development and deployment by abstracting the complexities of quantum hardware and enhancing application portability across different quantum software development kits and quantum backends. The chapter provides comprehensive documentation and guidelines for deploying and using QFaaS, detailing the setup, component deployment, and examples of service-oriented quantum applications. This framework offers a promising approach to overcoming current limitations and advancing the practical software engineering of quantum computing.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Guidelines for deploying and using the QFaaS Framework (for the original paper, see https://doi.org/10.1016/j.future.2024.01.018)

arXiv:2407.02748 [pdf, other]

DRLQ: A Deep Reinforcement Learning-based Task Placement for Quantum Cloud Computing

Authors: Hoa T. Nguyen, Muhammad Usman, Rajkumar Buyya

Abstract: The quantum cloud computing paradigm presents unique challenges in task placement due to the dynamic and heterogeneous nature of quantum computation resources. Traditional heuristic approaches fall short in adapting to the rapidly evolving landscape of quantum computing. This paper proposes DRLQ, a novel Deep Reinforcement Learning (DRL)-based technique for task placement in quantum cloud computing environments, addressing the optimization of task completion time and quantum task scheduling efficiency. It leverages the Deep Q Network (DQN) architecture, enhanced with the Rainbow DQN approach, to create a dynamic task placement strategy. This approach is one of the first in the field of quantum cloud resource management, enabling adaptive learning and decision-making for quantum cloud environments and effectively optimizing task placement based on changing conditions and resource availability. We conduct extensive experiments using the QSimPy simulation toolkit to evaluate the performance of our method, demonstrating substantial improvements in task execution efficiency and a reduction in the need to reschedule quantum tasks. Our results show that utilizing the DRLQ approach for task placement can significantly reduce total quantum task completion time by 37.81% to 72.93% and prevent task rescheduling attempts compared to other heuristic approaches.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted paper at IEEE CLOUD 2024 conference

arXiv:2407.02190 [pdf, other]

I2EKF-LO: A Dual-Iteration Extended Kalman Filter Based LiDAR Odometry

Authors: Wenlu Yu, Jie Xu, Chengwei Zhao, Lijun Zhao, Thien-Minh Nguyen, Shenghai Yuan, Mingming Bai, Lihua Xie

Abstract: LiDAR odometry is a pivotal technology in the fields of autonomous driving and autonomous mobile robotics. However, most current works focus on nonlinear optimization methods, and many challenges remain in using the traditional Iterative Extended Kalman Filter (IEKF) framework to tackle the problem: the IEKF only iterates over the observation equation, relying on a rough estimate of the initial state, which is insufficient to fully eliminate motion distortion in the input point cloud; the system process noise is difficult to determine during state estimation of complex motions; and motion models vary across different sensor carriers. To address these issues, we propose the Dual-Iteration Extended Kalman Filter (I2EKF) and a LiDAR odometry based on I2EKF (I2EKF-LO). This approach not only iterates over the observation equation but also leverages state updates to iteratively mitigate motion distortion in LiDAR point clouds. Moreover, it dynamically adjusts process noise based on the confidence level of prior predictions during state estimation and establishes motion models for different sensor carriers to achieve accurate and efficient state estimation. Comprehensive experiments demonstrate that I2EKF-LO achieves outstanding accuracy and computational efficiency in LiDAR odometry. Additionally, to foster community development, our code is open-sourced at https://github.com/YWL0720/I2EKF-LO.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted by IROS 2024
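For context on the baseline this entry extends: a conventional IEKF iterates only the measurement update, relinearizing the observation model around its latest estimate. A scalar sketch of that standard update, whose fixed point is the MAP estimate combining prior and measurement; the range-style measurement function `h`, the noise values, and the iteration count are illustrative assumptions, and I2EKF's additional iteration over state updates and motion distortion is not shown:

```python
import math


def h(x, d=1.0):
    """Hypothetical nonlinear observation: range to a beacon offset by d."""
    return math.sqrt(x * x + d * d)


def h_jac(x, d=1.0):
    """Derivative of h with respect to x."""
    return x / math.sqrt(x * x + d * d)


def iekf_update(x_prior, p_prior, z, r, iters=10):
    """Iterated EKF measurement update (Gauss-Newton on the MAP cost)."""
    x_i = x_prior
    for _ in range(iters):
        H = h_jac(x_i)                           # relinearize at the current iterate
        K = p_prior * H / (H * p_prior * H + r)  # Kalman gain for this linearization
        x_i = x_prior + K * (z - h(x_i) - H * (x_prior - x_i))
    H = h_jac(x_i)
    K = p_prior * H / (H * p_prior * H + r)
    p_post = (1.0 - K * H) * p_prior             # covariance from the final linearization
    return x_i, p_post


x_true = 4.0
z = h(x_true)  # noise-free measurement for illustration
x_post, p_post = iekf_update(x_prior=3.0, p_prior=4.0, z=z, r=0.01)
print(x_post, p_post)
```

With `iters=1` this reduces to a plain EKF update; in this example that first step lands at about 4.010, and further relinearizations move the estimate to roughly 3.997, the MAP estimate, which sits slightly toward the prior mean of 3.0.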

arXiv:2407.01987 [pdf, other]

AHMsys: An Automated HVAC Modeling System for BIM Project

Authors: Long Hoang Dang, Duy-Hung Nguyen, Thai Quang Le, Thinh Truong Nguyen, Clark Mei, Vu Hoang

Abstract: This paper presents a novel system, named AHMsys, designed to automate the process of generating 3D Heating, Ventilation, and Air Conditioning (HVAC) models from 2D Computer-Aided Design (CAD) drawings, a key component of Building Information Modeling (BIM). By automatically preprocessing and extracting essential HVAC object information and then creating detailed 3D models, our proposed AHMsys reduced the work schedule of the BIM process at Akila by 20 percent. This advancement highlights the essential impact of integrating AI technologies into managing the lifecycle of a digital representation of a building.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.01963 [pdf, other]

Towards Unsupervised Speaker Diarization System for Multilingual Telephone Calls Using Pre-trained Whisper Model and Mixture of Sparse Autoencoders

Authors: Phat Lam, Lam Pham, Tin Nguyen, Hieu Tang, Thinh Pham, Loi Khanh Nguyen, Alexander Schindler

Abstract: Existing speaker diarization systems rely heavily on large amounts of manually annotated data, which are labor-intensive and challenging to collect in real-world scenarios. Additionally, the language-specific constraints of speaker diarization systems significantly hinder their applicability and scalability in multilingual settings. In this paper, we therefore propose a cluster-based speaker diarization system for multilingual telephone call applications. The proposed system supports multiple languages and does not require large-scale annotated data for training, since it leverages the multilingual Whisper model to extract speaker embeddings and introduces a novel Mixture of Sparse Autoencoders (Mix-SAE) network architecture for unsupervised speaker clustering. Experimental results on an evaluation dataset derived from two-speaker subsets of the CALLHOME and CALLFRIEND telephonic speech corpora demonstrate the superior efficiency of the proposed Mix-SAE network compared with other autoencoder-based clustering methods. The overall performance of the proposed system also indicates the promising potential of our approach for developing unsupervised multilingual speaker diarization applications when annotated data are limited, and for integration into comprehensive multi-task speech analysis systems (i.e., speech-to-text, language detection, and speaker diarization integrated in a low-complexity system).

Submitted 7 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

Comments: 8 pages, 7 figures
