Search | arXiv e-print repository

arXiv:2409.08143 [pdf, other]

Effective Segmentation of Post-Treatment Gliomas Using Simple Approaches: Artificial Sequence Generation and Ensemble Models

Authors: Heejong Kim, Leo Milecki, Mina C Moghadam, Fengbei Liu, Minh Nguyen, Eric Qiu, Abhishek Thanki, Mert R Sabuncu

Abstract: Segmentation is a crucial task in the medical imaging field and is often an important primary step or even a prerequisite to the analysis of medical volumes. Yet treatments such as surgery complicate the accurate delineation of regions of interest. The BraTS Post-Treatment 2024 Challenge published the first public dataset for post-surgery glioma segmentation and addresses the aforementioned issue… ▽ More Segmentation is a crucial task in the medical imaging field and is often an important primary step or even a prerequisite to the analysis of medical volumes. Yet treatments such as surgery complicate the accurate delineation of regions of interest. The BraTS Post-Treatment 2024 Challenge published the first public dataset for post-surgery glioma segmentation and addresses the aforementioned issue by fostering the development of automated segmentation tools for glioma in MRI data. In this effort, we propose two straightforward approaches to enhance the segmentation performances of deep learning-based methodologies. First, we incorporate an additional input based on a simple linear combination of the available MRI sequences input, which highlights enhancing tumors. Second, we employ various ensembling methods to weigh the contribution of a battery of models. Our results demonstrate that these approaches significantly improve segmentation performance compared to baseline models, underscoring the effectiveness of these simple approaches in improving medical image segmentation tasks. △ Less

Submitted 12 September, 2024; originally announced September 2024.

Comments: Invited for an Oral Presentation at the MICCAI BraTS Challenge 2024

arXiv:2409.05996 [pdf, other]

Adapting to Shifting Correlations with Unlabeled Data Calibration

Authors: Minh Nguyen, Alan Q. Wang, Heejong Kim, Mert R. Sabuncu

Abstract: Distribution shifts between sites can seriously degrade model performance since models are prone to exploiting unstable correlations. Thus, many methods try to find features that are stable across sites and discard unstable features. However, unstable features might have complementary information that, if used appropriately, could increase accuracy. More recent methods try to adapt to unstable fea… ▽ More Distribution shifts between sites can seriously degrade model performance since models are prone to exploiting unstable correlations. Thus, many methods try to find features that are stable across sites and discard unstable features. However, unstable features might have complementary information that, if used appropriately, could increase accuracy. More recent methods try to adapt to unstable features at the new sites to achieve higher accuracy. However, they make unrealistic assumptions or fail to scale to multiple confounding features. We propose Generalized Prevalence Adjustment (GPA for short), a flexible method that adjusts model predictions to the shifting correlations between prediction target and confounders to safely exploit unstable features. GPA can infer the interaction between target and confounders in new sites using unlabeled samples from those sites. We evaluate GPA on several real and synthetic datasets, and show that it outperforms competitive baselines. △ Less

Submitted 9 September, 2024; originally announced September 2024.

Comments: Accepted at ECCV

arXiv:2409.05089 [pdf]

Leveraging WaveNet for Dynamic Listening Head Modeling from Speech

Authors: Minh-Duc Nguyen, Hyung-Jeong Yang, Seung-Won Kim, Ji-Eun Shin, Soo-Hyung Kim

Abstract: The creation of listener facial responses aims to simulate interactive communication feedback from a listener during a face-to-face conversation. Our goal is to generate believable videos of listeners' heads that respond authentically to a single speaker by a sequence-to-sequence model with an combination of WaveNet and Long short-term memory network. Our approach focuses on capturing the subtle n… ▽ More The creation of listener facial responses aims to simulate interactive communication feedback from a listener during a face-to-face conversation. Our goal is to generate believable videos of listeners' heads that respond authentically to a single speaker by a sequence-to-sequence model with an combination of WaveNet and Long short-term memory network. Our approach focuses on capturing the subtle nuances of listener feedback, ensuring the preservation of individual listener identity while expressing appropriate attitudes and viewpoints. Experiment results show that our method surpasses the baseline models on ViCo benchmark Dataset. △ Less

Submitted 8 September, 2024; originally announced September 2024.

arXiv:2409.05088 [pdf, other]

Transformer with Leveraged Masked Autoencoder for video-based Pain Assessment

Authors: Minh-Duc Nguyen, Hyung-Jeong Yang, Soo-Hyung Kim, Ji-Eun Shin, Seung-Won Kim

Abstract: Accurate pain assessment is crucial in healthcare for effective diagnosis and treatment; however, traditional methods relying on self-reporting are inadequate for populations unable to communicate their pain. Cutting-edge AI is promising for supporting clinicians in pain recognition using facial video data. In this paper, we enhance pain recognition by employing facial video analysis within a Tran… ▽ More Accurate pain assessment is crucial in healthcare for effective diagnosis and treatment; however, traditional methods relying on self-reporting are inadequate for populations unable to communicate their pain. Cutting-edge AI is promising for supporting clinicians in pain recognition using facial video data. In this paper, we enhance pain recognition by employing facial video analysis within a Transformer-based deep learning model. By combining a powerful Masked Autoencoder with a Transformers-based classifier, our model effectively captures pain level indicators through both expressions and micro-expressions. We conducted our experiment on the AI4Pain dataset, which produced promising results that pave the way for innovative healthcare solutions that are both comprehensive and objective. △ Less

Submitted 8 September, 2024; originally announced September 2024.

arXiv:2409.04954 [pdf, ps, other]

Spectral invariants and equivariant monopole Floer homology for rational homology three-spheres

Authors: Minh Lam Nguyen

Abstract: In this paper, we study a model for $S^1$-equivariant monopole Floer homology for rational homology three-spheres via a homological device called $\mathcal{S}$-complex. Using the Chern-Simons-Dirac functional, we define an $\mathbf{R}$-filtration on the (equivariant) complex of monopole Floer homology $HM$. This $\mathbf{R}$-filtration fits $HM$ into a persistent homology theory, from which one ca… ▽ More In this paper, we study a model for $S^1$-equivariant monopole Floer homology for rational homology three-spheres via a homological device called $\mathcal{S}$-complex. Using the Chern-Simons-Dirac functional, we define an $\mathbf{R}$-filtration on the (equivariant) complex of monopole Floer homology $HM$. This $\mathbf{R}$-filtration fits $HM$ into a persistent homology theory, from which one can define a numerical quantity called the spectral invariant $ρ$. The spectral invariant $ρ$ is tied with the geometry of the underlying manifold. The main result of the papers shows that $ρ$ provides an obstruction to the existence of positive scalar curvature metric on a ribbon homology cobordism. △ Less

Submitted 7 September, 2024; originally announced September 2024.

Comments: Comments are welcome!

MSC Class: 57Rxx; 57Mxx; 57Kxx

arXiv:2409.03411 [pdf, other]

doi 10.1117/12.3018900

Mid-order wavefront control for exoplanet imaging: preliminary characterization of the segmented deformable mirror and Zernike wavefront sensor on HiCAT

Authors: B. Buralli, M. N'Diaye, R. Pourcelot, M. Carbillet, E. H. Por, I. Laginja, L. Canas, S. Steiger, P. Petrone, M. M. Nguyen, B. Nickson, S. F. Redmond, A. Sahoo, L. Pueyo, M. D. Perrin, R. Soummer

Abstract: We study a mid-order wavefront sensor (MOWFS) to address fine cophasing errors in exoplanet imaging with future large segmented aperture space telescopes. Observing Earth analogs around Sun-like stars requires contrasts down to $10^{-10}$ in visible light. One promising solution consists of producing a high-contrast dark zone in the image of an observed star. In a space observatory, this dark regi… ▽ More We study a mid-order wavefront sensor (MOWFS) to address fine cophasing errors in exoplanet imaging with future large segmented aperture space telescopes. Observing Earth analogs around Sun-like stars requires contrasts down to $10^{-10}$ in visible light. One promising solution consists of producing a high-contrast dark zone in the image of an observed star. In a space observatory, this dark region will be altered by several effects, and among them, the small misalignments of the telescope mirror segments due to fine thermo-mechanical drifts. To correct for these errors in real time, we investigate a wavefront control loop based on a MOWFS with a Zernike sensor. Such a MOWFS was installed on the high-contrast imager for complex aperture telescopes (HiCAT) testbed in Baltimore in June 2023. The bench uses a 37-segment Iris-AO deformable mirror to mimic telescope segmentation and some wavefront control strategies to produce a dark zone with such an aperture. In this contribution, we first use the MOWFS to characterize the Iris-AO segment discretization steps. For the central segment, we find a minimal step of 125 $\pm$ 31 pm. This result will help us to assess the contribution of the Iris-AO DM on the contrast in HiCAT. We then determine the detection limits of the MOWFS, estimating wavefront error amplitudes of 119 and 102 pm for 10 s and 1 min exposure time with a SNR of 3. These values inform us about the measurement capabilities of our wavefront sensor on the testbed. These preliminary results will be useful to provide insights on metrology and stability for exo-Earth observations with the Habitable Worlds Observatory. △ Less

Submitted 9 September, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

Comments: 9 pages, 6 figures

arXiv:2408.13687 [pdf, other]

Quantum error correction below the surface code threshold

Authors: Rajeev Acharya, Laleh Aghababaie-Beni, Igor Aleiner, Trond I. Andersen, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Nikita Astrakhantsev, Juan Atalaya, Ryan Babbush, Dave Bacon, Brian Ballard, Joseph C. Bardin, Johannes Bausch, Andreas Bengtsson, Alexander Bilmes, Sam Blackwell, Sergio Boixo, Gina Bortoli, Alexandre Bourassa, Jenna Bovaird, Leon Brill, Michael Broughton, David A. Browne , et al. (224 additional authors not shown)

Abstract: Quantum error correction provides a path to reach practical quantum computing by combining multiple physical qubits into a logical qubit, where the logical error rate is suppressed exponentially as more qubits are added. However, this exponential suppression only occurs if the physical error rate is below a critical threshold. In this work, we present two surface code memories operating below this… ▽ More Quantum error correction provides a path to reach practical quantum computing by combining multiple physical qubits into a logical qubit, where the logical error rate is suppressed exponentially as more qubits are added. However, this exponential suppression only occurs if the physical error rate is below a critical threshold. In this work, we present two surface code memories operating below this threshold: a distance-7 code and a distance-5 code integrated with a real-time decoder. The logical error rate of our larger quantum memory is suppressed by a factor of $Λ$ = 2.14 $\pm$ 0.02 when increasing the code distance by two, culminating in a 101-qubit distance-7 code with 0.143% $\pm$ 0.003% error per cycle of error correction. This logical memory is also beyond break-even, exceeding its best physical qubit's lifetime by a factor of 2.4 $\pm$ 0.3. We maintain below-threshold performance when decoding in real time, achieving an average decoder latency of 63 $μ$s at distance-5 up to a million cycles, with a cycle time of 1.1 $μ$s. To probe the limits of our error-correction performance, we run repetition codes up to distance-29 and find that logical performance is limited by rare correlated error events occurring approximately once every hour, or 3 $\times$ 10$^9$ cycles. Our results present device performance that, if scaled, could realize the operational requirements of large scale fault-tolerant quantum algorithms. △ Less

Submitted 24 August, 2024; originally announced August 2024.

Comments: 10 pages, 4 figures, Supplementary Information

arXiv:2408.13126 [pdf, other]

CathAction: A Benchmark for Endovascular Intervention Understanding

Authors: Baoru Huang, Tuan Vo, Chayun Kongtongvattana, Giulio Dagnino, Dennis Kundrat, Wenqiang Chi, Mohamed Abdelaziz, Trevor Kwok, Tudor Jianu, Tuong Do, Hieu Le, Minh Nguyen, Hoan Nguyen, Erman Tjiputra, Quang Tran, Jianyang Xie, Yanda Meng, Binod Bhattarai, Zhaorui Tan, Hongbin Liu, Hong Seng Gan, Wei Wang, Xi Yang, Qiufeng Wang, Jionglong Su , et al. (13 additional authors not shown)

Abstract: Real-time visual feedback from catheterization analysis is crucial for enhancing surgical safety and efficiency during endovascular interventions. However, existing datasets are often limited to specific tasks, small scale, and lack the comprehensive annotations necessary for broader endovascular intervention understanding. To tackle these limitations, we introduce CathAction, a large-scale datase… ▽ More Real-time visual feedback from catheterization analysis is crucial for enhancing surgical safety and efficiency during endovascular interventions. However, existing datasets are often limited to specific tasks, small scale, and lack the comprehensive annotations necessary for broader endovascular intervention understanding. To tackle these limitations, we introduce CathAction, a large-scale dataset for catheterization understanding. Our CathAction dataset encompasses approximately 500,000 annotated frames for catheterization action understanding and collision detection, and 25,000 ground truth masks for catheter and guidewire segmentation. For each task, we benchmark recent related works in the field. We further discuss the challenges of endovascular intentions compared to traditional computer vision tasks and point out open research questions. We hope that CathAction will facilitate the development of endovascular intervention understanding methods that can be applied to real-world applications. The dataset is available at https://airvlab.github.io/cathaction/. △ Less

Submitted 30 August, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

Comments: 10 pages. Webpage: https://airvlab.github.io/cathaction/

arXiv:2408.12959 [pdf, other]

Multimodal Contrastive In-Context Learning

Authors: Yosuke Miyanishi, Minh Le Nguyen

Abstract: The rapid growth of Large Language Models (LLMs) usage has highlighted the importance of gradient-free in-context learning (ICL). However, interpreting their inner workings remains challenging. This paper introduces a novel multimodal contrastive in-context learning framework to enhance our understanding of ICL in LLMs. First, we present a contrastive learning-based interpretation of ICL in real-w… ▽ More The rapid growth of Large Language Models (LLMs) usage has highlighted the importance of gradient-free in-context learning (ICL). However, interpreting their inner workings remains challenging. This paper introduces a novel multimodal contrastive in-context learning framework to enhance our understanding of ICL in LLMs. First, we present a contrastive learning-based interpretation of ICL in real-world settings, marking the distance of the key-value representation as the differentiator in ICL. Second, we develop an analytical framework to address biases in multimodal input formatting for real-world datasets. We demonstrate the effectiveness of ICL examples where baseline performance is poor, even when they are represented in unseen formats. Lastly, we propose an on-the-fly approach for ICL (Anchored-by-Text ICL) that demonstrates effectiveness in detecting hateful memes, a task where typical ICL struggles due to resource limitations. Extensive experiments on multimodal datasets reveal that our approach significantly improves ICL performance across various scenarios, such as challenging tasks and resource-constrained environments. Moreover, it provides valuable insights into the mechanisms of in-context learning in LLMs. Our findings have important implications for developing more interpretable, efficient, and robust multimodal AI systems, especially in challenging tasks and resource-constrained environments. △ Less

Submitted 23 August, 2024; originally announced August 2024.

arXiv:2408.12480 [pdf, other]

Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese

Authors: Khang T. Doan, Bao G. Huynh, Dung T. Hoang, Thuc D. Pham, Nhat H. Pham, Quan T. M. Nguyen, Bang Q. Vo, Suong N. Hoang

Abstract: In this report, we introduce Vintern-1B, a reliable 1-billion-parameters multimodal large language model (MLLM) for Vietnamese language tasks. By integrating the Qwen2-0.5B-Instruct language model with the InternViT-300M-448px visual model, Vintern-1B is optimized for a range of applications, including optical character recognition (OCR), document extraction, and general question-answering in Viet… ▽ More In this report, we introduce Vintern-1B, a reliable 1-billion-parameters multimodal large language model (MLLM) for Vietnamese language tasks. By integrating the Qwen2-0.5B-Instruct language model with the InternViT-300M-448px visual model, Vintern-1B is optimized for a range of applications, including optical character recognition (OCR), document extraction, and general question-answering in Vietnamese context. The model is fine-tuned on an extensive dataset of over 3 million image-question-answer pairs, achieving robust performance and reliable results across multiple Vietnamese language benchmarks like OpenViVQA and ViTextVQA. Vintern-1B is small enough to fit into various on-device applications easily. Additionally, we have open-sourced several Vietnamese vision question answering (VQA) datasets for text and diagrams, created with Gemini 1.5 Flash. Our models are available at: https://huggingface.co/5CD-AI/Vintern-1B-v2. △ Less

Submitted 23 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.06618 [pdf, other]

Generalized knowledge-enhanced framework for biomedical entity and relation extraction

Authors: Minh Nguyen, Phuong Le

Abstract: In recent years, there has been an increasing number of frameworks developed for biomedical entity and relation extraction. This research effort aims to address the accelerating growth in biomedical publications and the intricate nature of biomedical texts, which are written for mainly domain experts. To handle these challenges, we develop a novel framework that utilizes external knowledge to cons… ▽ More In recent years, there has been an increasing number of frameworks developed for biomedical entity and relation extraction. This research effort aims to address the accelerating growth in biomedical publications and the intricate nature of biomedical texts, which are written for mainly domain experts. To handle these challenges, we develop a novel framework that utilizes external knowledge to construct a task-independent and reusable background knowledge graph for biomedical entity and relation extraction. The design of our model is inspired by how humans learn domain-specific topics. In particular, humans often first acquire the most basic and common knowledge regarding a field to build the foundational knowledge and then use that as a basis for extending to various specialized topics. Our framework employs such common-knowledge-sharing mechanism to build a general neural-network knowledge graph that is learning transferable to different domain-specific biomedical texts effectively. Experimental evaluations demonstrate that our model, equipped with this generalized and cross-transferable knowledge base, achieves competitive performance benchmarks, including BioRelEx for binding interaction detection and ADE for Adverse Drug Effect identification. △ Less

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2408.04874 [pdf, other]

DG Comics: Semi-Automatically Authoring Graph Comics for Dynamic Graphs

Authors: Joohee Kim, Hyunwook Lee, Duc M. Nguyen, Minjeong Shin, Bum Chul Kwon, Sungahn Ko, Niklas Elmqvist

Abstract: Comics are an effective method for sequential data-driven storytelling, especially for dynamic graphs -- graphs whose vertices and edges change over time. However, manually creating such comics is currently time-consuming, complex, and error-prone. In this paper, we propose DG Comics, a novel comic authoring tool for dynamic graphs that allows users to semi-automatically build and annotate comics.… ▽ More Comics are an effective method for sequential data-driven storytelling, especially for dynamic graphs -- graphs whose vertices and edges change over time. However, manually creating such comics is currently time-consuming, complex, and error-prone. In this paper, we propose DG Comics, a novel comic authoring tool for dynamic graphs that allows users to semi-automatically build and annotate comics. The tool uses a newly developed hierarchical clustering algorithm to segment consecutive snapshots of dynamic graphs while preserving their chronological order. It also presents rich information on both individuals and communities extracted from dynamic graphs in multiple views, where users can explore dynamic graphs and choose what to tell in comics. For evaluation, we provide an example and report the results of a user study and an expert review. △ Less

Submitted 9 August, 2024; originally announced August 2024.

Comments: To appear in IEEE Transactions on Visualization and Computer Graphics

arXiv:2408.02855 [pdf, other]

doi 10.1145/3568294.3580153

Analyzing Data Efficiency and Performance of Machine Learning Algorithms for Assessing Low Back Pain Physical Rehabilitation Exercises

Authors: Aleksa Marusic, Louis Annabi, Sao Msi Nguyen, Adriana Tapus

Abstract: Analyzing human motion is an active research area, with various applications. In this work, we focus on human motion analysis in the context of physical rehabilitation using a robot coach system. Computer-aided assessment of physical rehabilitation entails evaluation of patient performance in completing prescribed rehabilitation exercises, based on processing movement data captured with a sensory… ▽ More Analyzing human motion is an active research area, with various applications. In this work, we focus on human motion analysis in the context of physical rehabilitation using a robot coach system. Computer-aided assessment of physical rehabilitation entails evaluation of patient performance in completing prescribed rehabilitation exercises, based on processing movement data captured with a sensory system, such as RGB and RGB-D cameras. As 2D and 3D human pose estimation from RGB images had made impressive improvements, we aim to compare the assessment of physical rehabilitation exercises using movement data obtained from both RGB-D camera (Microsoft Kinect) and estimation from RGB videos (OpenPose and BlazePose algorithms). A Gaussian Mixture Model (GMM) is employed from position (and orientation) features, with performance metrics defined based on the log-likelihood values from GMM. The evaluation is performed on a medical database of clinical patients carrying out low back-pain rehabilitation exercises, previously coached by robot Poppy. △ Less

Submitted 5 August, 2024; originally announced August 2024.

Comments: European Conference on Mobile Robots (2023)

arXiv:2408.00992 [pdf, ps, other]

Fairness in Large Language Models in Three Hours

Authors: Thang Doan Viet, Zichong Wang, Minh Nhat Nguyen, Wenbin Zhang

Abstract: Large Language Models (LLMs) have demonstrated remarkable success across various domains but often lack fairness considerations, potentially leading to discriminatory outcomes against marginalized populations. Unlike fairness in traditional machine learning, fairness in LLMs involves unique backgrounds, taxonomies, and fulfillment techniques. This tutorial provides a systematic overview of recent… ▽ More Large Language Models (LLMs) have demonstrated remarkable success across various domains but often lack fairness considerations, potentially leading to discriminatory outcomes against marginalized populations. Unlike fairness in traditional machine learning, fairness in LLMs involves unique backgrounds, taxonomies, and fulfillment techniques. This tutorial provides a systematic overview of recent advances in the literature concerning fair LLMs, beginning with real-world case studies to introduce LLMs, followed by an analysis of bias causes therein. The concept of fairness in LLMs is then explored, summarizing the strategies for evaluating bias and the algorithms designed to promote fairness. Additionally, resources for assessing bias in LLMs, including toolkits and datasets, are compiled, and current research challenges and open questions in the field are discussed. The repository is available at \url{https://github.com/LavinWong/Fairness-in-Large-Language-Models}. △ Less

Submitted 7 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

arXiv:2407.16491 [pdf, other]

Canadian Traveller Problems in Temporal Graphs

Authors: Thomas Bellitto, Johanne Cohen, Bruno Escoffier, Minh-Hang Nguyen, Mikael Rabie

Abstract: This paper formalises the Canadian Traveller problem as a positional two-player game on graphs. We consider two variants depending on whether an edge is blocked. In the locally-informed variant, the traveller learns if an edge is blocked upon reaching one of its endpoints, while in the uninformed variant, they discover this only when the edge is supposed to appear. We provide a polynomial algorith… ▽ More This paper formalises the Canadian Traveller problem as a positional two-player game on graphs. We consider two variants depending on whether an edge is blocked. In the locally-informed variant, the traveller learns if an edge is blocked upon reaching one of its endpoints, while in the uninformed variant, they discover this only when the edge is supposed to appear. We provide a polynomial algorithm for each shortest path variant in the uninformed case. This algorithm also solves the case of directed acyclic non-temporal graphs. In the locally-informed case, we prove that finding a winning strategy is PSPACE-complete. Moreover, we establish that the problem is polynomial-time solvable when $k=1$ but NP-hard for $k\geq 2$. Additionally, we show that the standard (non-temporal) Canadian Traveller Problem is NP-hard when there are $k\geq 4$ blocked edges, which is, to the best of our knowledge, the first hardness result for CTP for a constant number of blocked edges. △ Less

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2407.15468 [pdf, ps, other]

Efficient influence functions for Sobol' indices under two designs of experiments

Authors: Thierry Klein, Agnès Lagnoux, Paul Rochet, Thi Mong Ngoc Nguyen

Abstract: In this note, we are interested in the asymptotic efficiency of Sobol' indices esti-mators. After recalling the basis of asymptotic efficiency, we compute the efficientinfluence functions for Sobol' indices in two different contexts: the Pick-Freeze andthe given-data settings. In this note, we are interested in the asymptotic efficiency of Sobol' indices esti-mators. After recalling the basis of asymptotic efficiency, we compute the efficientinfluence functions for Sobol' indices in two different contexts: the Pick-Freeze andthe given-data settings. △ Less

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.15426 [pdf, other]

Resource-Efficient Federated Multimodal Learning via Layer-wise and Progressive Training

Authors: Ye Lin Tun, Chu Myaet Thwal, Minh N. H. Nguyen, Choong Seon Hong

Abstract: Combining different data modalities enables deep neural networks to tackle complex tasks more effectively, making multimodal learning increasingly popular. To harness multimodal data closer to end users, it is essential to integrate multimodal learning with privacy-preserving training approaches such as federated learning (FL). However, compared to conventional unimodal learning, multimodal settin… ▽ More Combining different data modalities enables deep neural networks to tackle complex tasks more effectively, making multimodal learning increasingly popular. To harness multimodal data closer to end users, it is essential to integrate multimodal learning with privacy-preserving training approaches such as federated learning (FL). However, compared to conventional unimodal learning, multimodal setting requires dedicated encoders for each modality, resulting in larger and more complex models that demand significant resources. This presents a substantial challenge for FL clients operating with limited computational resources and communication bandwidth. To address these challenges, we introduce LW-FedMML, a layer-wise federated multimodal learning approach, which decomposes the training process into multiple steps. Each step focuses on training only a portion of the model, thereby significantly reducing the memory and computational requirements. Moreover, FL clients only need to exchange the trained model portion with the central server, lowering the resulting communication cost. We conduct extensive experiments across various FL scenarios and multimodal learning setups to validate the effectiveness of our proposed method. The results demonstrate that LW-FedMML can compete with conventional end-to-end federated multimodal learning (FedMML) while significantly reducing the resource burden on FL clients. Specifically, LW-FedMML reduces memory usage by up to $2.7\times$, computational operations (FLOPs) by $2.4\times$, and total communication cost by $2.3\times$. We also introduce a progressive training approach called Prog-FedMML. While it offers lesser resource efficiency than LW-FedMML, Prog-FedMML has the potential to surpass the performance of end-to-end FedMML, making it a viable option for scenarios with fewer resource constraints. △ Less

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.14728 [pdf, ps, other]

An Integral Equation Approach for the Valuation of Finite-maturity margin-call Stock Loans

Authors: Minh-Quan Nguyen, Nhat-Tan Le, Khuong Nguyen-An, Duc-Thi Luu

Abstract: This paper examines the pricing issue of margin-call stock loans with finite maturities under the Black-Scholes-Merton framework. In particular, using a Fourier Sine transform method, we reduce the partial differential equation governing the price of a margin-call stock loan into an ordinary differential equation, the solution of which can be easily found (in the Fourier Sine space) and analytical… ▽ More This paper examines the pricing issue of margin-call stock loans with finite maturities under the Black-Scholes-Merton framework. In particular, using a Fourier Sine transform method, we reduce the partial differential equation governing the price of a margin-call stock loan into an ordinary differential equation, the solution of which can be easily found (in the Fourier Sine space) and analytically inverted into the original space. As a result, we obtain an integral representation of the value of the stock loan in terms of the unknown optimal exit prices, which are, in turn, governed by a Volterra integral equation. We thus can break the pricing problem of margin-call stock loans into two steps: 1) finding the optimal exit prices by solving numerically the governing Volterra integral equation and 2) calculating the values of margin-call stock loans based on the obtained optimal exit prices. By validating and comparing with other available numerical methods, we show that our proposed numerical scheme offers a reliable and efficient way to calculate the service fee of a margin-call stock loan contract, track the contract value over time, and compute the level of stock price above which it is optimal to exit the contract. The effects of the margin-call feature on the loan contract are also examined and quantified. △ Less

Submitted 19 July, 2024; originally announced July 2024.

arXiv:2407.13694 [pdf, other]

Anticipatory Task and Motion Planning

Authors: Roshan Dhakal, Duc M. Nguyen, Tom Silver, Xuesu Xiao, Gregory J. Stein

Abstract: We consider a sequential task and motion planning (tamp) setting in which a robot is assigned continuous-space rearrangement-style tasks one-at-a-time in an environment that persists between each. Lacking advance knowledge of future tasks, existing (myopic) planning strategies unwittingly introduce side effects that impede completion of subsequent tasks: e.g., by blocking future access or manipula… ▽ More We consider a sequential task and motion planning (tamp) setting in which a robot is assigned continuous-space rearrangement-style tasks one-at-a-time in an environment that persists between each. Lacking advance knowledge of future tasks, existing (myopic) planning strategies unwittingly introduce side effects that impede completion of subsequent tasks: e.g., by blocking future access or manipulation. We present anticipatory task and motion planning, in which estimates of expected future cost from a learned model inform selection of plans generated by a model-based tamp planner so as to avoid such side effects, choosing configurations of the environment that both complete the task and minimize overall cost. Simulated multi-task deployments in navigation-among-movable-obstacles and cabinet-loading domains yield improvements of 32.7% and 16.7% average per-task cost respectively. When given time in advance to prepare the environment, our learning-augmented planning approach yields improvements of 83.1% and 22.3%. Both showcase the value of our approach. Finally, we also demonstrate anticipatory tamp on a real-world Fetch mobile manipulator. △ Less

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.12437 [pdf, other]

Variable-Agnostic Causal Exploration for Reinforcement Learning

Authors: Minh Hoang Nguyen, Hung Le, Svetha Venkatesh

Abstract: Modern reinforcement learning (RL) struggles to capture real-world cause-and-effect dynamics, leading to inefficient exploration due to extensive trial-and-error actions. While recent efforts to improve agent exploration have leveraged causal discovery, they often make unrealistic assumptions of causal variables in the environments. In this paper, we introduce a novel framework, Variable-Agnostic… ▽ More Modern reinforcement learning (RL) struggles to capture real-world cause-and-effect dynamics, leading to inefficient exploration due to extensive trial-and-error actions. While recent efforts to improve agent exploration have leveraged causal discovery, they often make unrealistic assumptions of causal variables in the environments. In this paper, we introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL), incorporating causal relationships to drive exploration in RL without specifying environmental causal variables. Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms. Subsequently, it constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion. This can be leveraged to generate intrinsic rewards or establish a hierarchy of subgoals to enhance exploration efficiency. Experimental results showcase a significant improvement in agent performance in grid-world, 2d games and robotic domains, particularly in scenarios with sparse rewards and noisy actions, such as the notorious Noisy-TV environments. △ Less

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.12311 [pdf, ps, other]

doi 10.1016/j.apnum.2024.07.008

A Finite Difference Scheme for (2+1)D Cubic-Quintic Nonlinear Schrödinger Equations with Nonlinear Damping

Authors: Anh Ha Le, Toan T. Huynh, Quan M. Nguyen

Abstract: Solitons of the purely cubic nonlinear Schrödinger equation in a space dimension of $n \geq 2$ suffer critical and supercritical collapses. These solitons can be stabilized in a cubic-quintic nonlinear medium. In this paper, we analyze the Crank-Nicolson finite difference scheme for the (2+1)D cubic-quintic nonlinear Schrödinger equation with cubic damping. We show that both the discrete solution,… ▽ More Solitons of the purely cubic nonlinear Schrödinger equation in a space dimension of $n \geq 2$ suffer critical and supercritical collapses. These solitons can be stabilized in a cubic-quintic nonlinear medium. In this paper, we analyze the Crank-Nicolson finite difference scheme for the (2+1)D cubic-quintic nonlinear Schrödinger equation with cubic damping. We show that both the discrete solution, in the discrete $L^2$-norm, and discrete energy are bounded. By using appropriate settings and estimations, the existence and the uniqueness of the numerical solution are proved. In addition, the error estimations are established in terms of second order for both space and time in discrete $L^2$-norm and $H^1$-norm. Numerical simulations for the (2+1)D cubic-quintic nonlinear Schrödinger equation with cubic damping are conducted to validate the convergence. △ Less

Submitted 6 August, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

Comments: 43 pages, 4 figures, 5 tables

Journal ref: Appl. Numer. Math. 205 (2024) 215

arXiv:2407.12094 [pdf, other]

Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models

Authors: Minh Nguyen, Franck Dernoncourt, Seunghyun Yoon, Hanieh Deilamsalehy, Hao Tan, Ryan Rossi, Quan Hung Tran, Trung Bui, Thien Huu Nguyen

Abstract: We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives. Despite the advancements in speech recognition, the task of text-based speaker identification (SpeakerID) has received limited attention, lacking large-scale, diverse datasets for effective model training. Addressing these ga… ▽ More We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives. Despite the advancements in speech recognition, the task of text-based speaker identification (SpeakerID) has received limited attention, lacking large-scale, diverse datasets for effective model training. Addressing these gaps, we present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources. We propose novel transformer-based models tailored for SpeakerID, leveraging contextual cues within dialogues to accurately attribute speaker names. Through extensive experiments, our best model achieves a great precision of 80.3\%, setting a new benchmark for SpeakerID. The data and code are publicly available here: \url{https://github.com/adobe-research/speaker-identification} △ Less

Submitted 16 July, 2024; originally announced July 2024.

Comments: accepted to INTERSPEECH 2024

arXiv:2407.04489 [pdf, other]

Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model

Authors: Duy M. H. Nguyen, An T. Le, Trung Q. Nguyen, Nghiem T. Diep, Tai Nguyen, Duy Duong-Tran, Jan Peters, Li Shen, Mathias Niepert, Daniel Sonntag

Abstract: Prompt learning methods are gaining increasing attention due to their ability to customize large vision-language models to new domains using pre-trained contextual knowledge and minimal training data. However, existing works typically rely on optimizing unified prompt inputs, often struggling with fine-grained classification tasks due to insufficient discriminative attributes. To tackle this, we c… ▽ More Prompt learning methods are gaining increasing attention due to their ability to customize large vision-language models to new domains using pre-trained contextual knowledge and minimal training data. However, existing works typically rely on optimizing unified prompt inputs, often struggling with fine-grained classification tasks due to insufficient discriminative attributes. To tackle this, we consider a new framework based on a dual context of both domain-shared and class-specific contexts, where the latter is generated by Large Language Models (LLMs) such as GPTs. Such dual prompt methods enhance the model's feature representation by joining implicit and explicit factors encoded in LLM knowledge. Moreover, we formulate the Unbalanced Optimal Transport (UOT) theory to quantify the relationships between constructed prompts and visual tokens. Through partial matching, UOT can properly align discrete sets of visual tokens and prompt embeddings under different mass distributions, which is particularly valuable for handling irrelevant or noisy elements, ensuring that the preservation of mass does not restrict transport solutions. Furthermore, UOT's characteristics integrate seamlessly with image augmentation, expanding the training sample pool while maintaining a reasonable distance between perturbed images and prompt inputs. Extensive experiments across few-shot classification and adapter settings substantiate the superiority of our model over current state-of-the-art baselines. △ Less

Submitted 5 July, 2024; originally announced July 2024.

Comments: Version 1

arXiv:2407.04279 [pdf, other]

BiosERC: Integrating Biography Speakers Supported by LLMs for ERC Tasks

Authors: Jieying Xue, Minh Phuong Nguyen, Blake Matheny, Le Minh Nguyen

Abstract: In the Emotion Recognition in Conversation task, recent investigations have utilized attention mechanisms exploring relationships among utterances from intra- and inter-speakers for modeling emotional interaction between them. However, attributes such as speaker personality traits remain unexplored and present challenges in terms of their applicability to other tasks or compatibility with diverse… ▽ More In the Emotion Recognition in Conversation task, recent investigations have utilized attention mechanisms exploring relationships among utterances from intra- and inter-speakers for modeling emotional interaction between them. However, attributes such as speaker personality traits remain unexplored and present challenges in terms of their applicability to other tasks or compatibility with diverse model architectures. Therefore, this work introduces a novel framework named BiosERC, which investigates speaker characteristics in a conversation. By employing Large Language Models (LLMs), we extract the "biographical information" of the speaker within a conversation as supplementary knowledge injected into the model to classify emotional labels for each utterance. Our proposed method achieved state-of-the-art (SOTA) results on three famous benchmark datasets: IEMOCAP, MELD, and EmoryNLP, demonstrating the effectiveness and generalization of our model and showcasing its potential for adaptation to various conversation analysis tasks. Our source code is available at https://github.com/yingjie7/BiosERC. △ Less

Submitted 5 July, 2024; originally announced July 2024.

Comments: Accepted in the 33rd International Conference on Artificial Neural Networks (ICANN 2024)

arXiv:2407.03376 [pdf, other]

The Complex Interplay Between Risk Tolerance and the Spread of Infectious Diseases

Authors: Maximilian Nguyen, Ari Freedman, Matthew Cheung, Chadi Saad-Roy, Baltazar Espinoza, Bryan Grenfell, Simon Levin

Abstract: Risk-driven behavior provides a feedback mechanism through which individuals both shape and are collectively affected by an epidemic. We introduce a general and flexible compartmental model to study the effect of heterogeneity in the population with regards to risk tolerance. The interplay between behavior and epidemiology leads to a rich set of possible epidemic dynamics. Depending on the behavio… ▽ More Risk-driven behavior provides a feedback mechanism through which individuals both shape and are collectively affected by an epidemic. We introduce a general and flexible compartmental model to study the effect of heterogeneity in the population with regards to risk tolerance. The interplay between behavior and epidemiology leads to a rich set of possible epidemic dynamics. Depending on the behavioral composition of the population, we find that increasing heterogeneity in risk tolerance can either increase or decrease the epidemic size. We find that multiple waves of infection can arise due to the interplay between transmission and behavior, even without the replenishment of susceptibles. We find that increasing protective mechanisms such as the effectiveness of interventions, the number of risk-averse people in the population, and the duration of intervention usage reduces the epidemic overshoot. When the protection is pushed past a critical threshold, the epidemic dynamics enter an underdamped regime where the epidemic size exactly equals the herd immunity threshold. Lastly, we can find regimes where epidemic size does not monotonically decrease with a population that becomes increasingly risk-averse. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: 12 pages, 7 figures (main text)

arXiv:2407.01082 [pdf, other]

Min P Sampling: Balancing Creativity and Coherence at High Temperature

Authors: Minh Nguyen, Andrew Baker, Andreas Kirsch, Clement Neo

Abstract: Large Language Models (LLMs) generate longform text by successively sampling the next token based on the probability distribution of the token vocabulary at each decoding step. Current popular truncation sampling methods such as top-$p$ sampling, also known as nucleus sampling, often struggle to balance coherence and creativity in generating text, particularly when using higher temperatures. To ad… ▽ More Large Language Models (LLMs) generate longform text by successively sampling the next token based on the probability distribution of the token vocabulary at each decoding step. Current popular truncation sampling methods such as top-$p$ sampling, also known as nucleus sampling, often struggle to balance coherence and creativity in generating text, particularly when using higher temperatures. To address this issue, we propose min-$p$, a dynamic truncation sampling method, that establishes a minimum base percentage threshold for tokens, which the scales according to the probability of the top candidate token. Through experiments on several benchmarks, such as GPQA, GSM8K and AlpacaEval Creative Writing, we demonstrate that min-$p$ improves the coherence and quality of generated text even at high temperatures, while also facilitating more creative and diverse outputs compared to top-$p$ and other sampling methods. As of writing, min-$p$ has been adopted by multiple open-source LLM implementations, and have been independently assessed by members of the open-source LLM community, further validating its practical utility and potential. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: 8 Pages

arXiv:2407.00521 [pdf, other]

A Medical Low-Back Pain Physical Rehabilitation Dataset for Human Body Movement Analysis

Authors: Sao Mai Nguyen, Maxime Devanne, Olivier Remy-Neris, Mathieu Lempereur, André Thepaut

Abstract: While automatic monitoring and coaching of exercises are showing encouraging results in non-medical applications, they still have limitations such as errors and limited use contexts. To allow the development and assessment of physical rehabilitation by an intelligent tutoring system, we identify in this article four challenges to address and propose a medical dataset of clinical patients carrying… ▽ More While automatic monitoring and coaching of exercises are showing encouraging results in non-medical applications, they still have limitations such as errors and limited use contexts. To allow the development and assessment of physical rehabilitation by an intelligent tutoring system, we identify in this article four challenges to address and propose a medical dataset of clinical patients carrying out low back-pain rehabilitation exercises. The dataset includes 3D Kinect skeleton positions and orientations, RGB videos, 2D skeleton data, and medical annotations to assess the correctness, and error classification and localisation of body part and timespan. Along this dataset, we perform a complete research path, from data collection to processing, and finally a small benchmark. We evaluated on the dataset two baseline movement recognition algorithms, pertaining to two different approaches: the probabilistic approach with a Gaussian Mixture Model (GMM), and the deep learning approach with a Long-Short Term Memory (LSTM). This dataset is valuable because it includes rehabilitation relevant motions in a clinical setting with patients in their rehabilitation program, using a cost-effective, portable, and convenient sensor, and because it shows the potential for improvement on these challenges. △ Less

Submitted 29 June, 2024; originally announced July 2024.

ACM Class: I.5.4; I.4.8

Journal ref: IJCNN 2024

arXiv:2406.20043 [pdf, ps, other]

Existence of Solutions to the Seiberg-Witten Vortex Equations with Exponential Decay on the Plane

Authors: William L. Blair, Minh Lam Nguyen

Abstract: Clifford Taubes showed that the moduli space of the variational equation of the Yang-Mills-Higgs functional on the plane is non-empty, and its elements correspond to "vortices". Inspired by this result, in this paper, we show that the moduli space of the Hitchin-type dimensional reduction of the Seiberg-Witten equations on the plane contains both exponentially decayed solutions and polynomial grow… ▽ More Clifford Taubes showed that the moduli space of the variational equation of the Yang-Mills-Higgs functional on the plane is non-empty, and its elements correspond to "vortices". Inspired by this result, in this paper, we show that the moduli space of the Hitchin-type dimensional reduction of the Seiberg-Witten equations on the plane contains both exponentially decayed solutions and polynomial growth solutions. Furthermore, we show that there is correspondence from the moduli space of exponentially decayed and polynomial growth solutions to the symmetric products of complex numbers. The correspondence restricted to the latter is a surjective map. △ Less

Submitted 28 June, 2024; originally announced June 2024.

Comments: 32 pages, comments are welcome!

MSC Class: 53Cxx; 57Rxx; 58Jxx; 57Kxx; 30G20; 30Cxx

arXiv:2406.13997 [pdf, other]

"Global is Good, Local is Bad?": Understanding Brand Bias in LLMs

Authors: Mahammed Kamruzzaman, Hieu Minh Nguyen, Gene Louis Kim

Abstract: Many recent studies have investigated social biases in LLMs but brand bias has received little attention. This research examines the biases exhibited by LLMs towards different brands, a significant concern given the widespread use of LLMs in affected use cases such as product recommendation and market analysis. Biased models may perpetuate societal inequalities, unfairly favoring established globa… ▽ More Many recent studies have investigated social biases in LLMs but brand bias has received little attention. This research examines the biases exhibited by LLMs towards different brands, a significant concern given the widespread use of LLMs in affected use cases such as product recommendation and market analysis. Biased models may perpetuate societal inequalities, unfairly favoring established global brands while marginalizing local ones. Using a curated dataset across four brand categories, we probe the behavior of LLMs in this space. We find a consistent pattern of bias in this space -- both in terms of disproportionately associating global brands with positive attributes and disproportionately recommending luxury gifts for individuals in high-income countries. We also find LLMs are subject to country-of-origin effects which may boost local brand preference in LLM outputs in specific contexts. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13781 [pdf, other]

A Primal-Dual Framework for Transformers and Neural Networks

Authors: Tan M. Nguyen, Tam Nguyen, Nhat Ho, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher

Abstract: Self-attention is key to the remarkable success of transformers in sequence modeling tasks including many applications in natural language processing and computer vision. Like neural network layers, these attention mechanisms are often developed by heuristics and experience. To provide a principled framework for constructing attention layers in transformers, we show that the self-attention corresp… ▽ More Self-attention is key to the remarkable success of transformers in sequence modeling tasks including many applications in natural language processing and computer vision. Like neural network layers, these attention mechanisms are often developed by heuristics and experience. To provide a principled framework for constructing attention layers in transformers, we show that the self-attention corresponds to the support vector expansion derived from a support vector regression problem, whose primal formulation has the form of a neural network layer. Using our framework, we derive popular attention layers used in practice and propose two new attentions: 1) the Batch Normalized Attention (Attention-BN) derived from the batch normalization layer and 2) the Attention with Scaled Head (Attention-SH) derived from using less training data to fit the SVR model. We empirically demonstrate the advantages of the Attention-BN and Attention-SH in reducing head redundancy, increasing the model's accuracy, and improving the model's efficiency in a variety of practical applications including image and time-series classification. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: Accepted to ICLR 2023, 26 pages, 4 figures, 14 tables

arXiv:2406.13770 [pdf, other]

Elliptical Attention

Authors: Stefan K. Nielsen, Laziz U. Abdullaev, Rachel Teo, Tan M. Nguyen

Abstract: Pairwise dot-product self-attention is key to the success of transformers that achieve state-of-the-art performance across a variety of applications in language and vision. This dot-product self-attention computes attention weights among the input tokens using Euclidean distance, which makes the model prone to representation collapse and vulnerable to contaminated samples. In this paper, we propos… ▽ More Pairwise dot-product self-attention is key to the success of transformers that achieve state-of-the-art performance across a variety of applications in language and vision. This dot-product self-attention computes attention weights among the input tokens using Euclidean distance, which makes the model prone to representation collapse and vulnerable to contaminated samples. In this paper, we propose using a Mahalanobis distance metric for computing the attention weights to stretch the underlying feature space in directions of high contextual relevance. In particular, we define a hyper-ellipsoidal neighborhood around each query to increase the attention weights of the tokens lying in the contextually important directions. We term this novel class of attention Elliptical Attention. Our Elliptical Attention provides two benefits: 1) reducing representation collapse and 2) enhancing the model's robustness as the Elliptical Attention pays more attention to contextually relevant information rather than focusing on some small subset of informative features. We empirically demonstrate the advantages of Elliptical Attention over the baseline dot-product attention and state-of-the-art attention methods on various practical tasks, including object classification, image segmentation, and language modeling across different data modalities. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: 38 pages, 7 figures, 12 tables

arXiv:2406.13762 [pdf, other]

Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis

Authors: Rachel S. Y. Teo, Tan M. Nguyen

Abstract: The remarkable success of transformers in sequence modeling tasks, spanning various applications in natural language processing and computer vision, is attributed to the critical role of self-attention. Similar to the development of most deep learning models, the construction of these attention mechanisms rely on heuristics and experience. In our work, we derive self-attention from kernel principa… ▽ More The remarkable success of transformers in sequence modeling tasks, spanning various applications in natural language processing and computer vision, is attributed to the critical role of self-attention. Similar to the development of most deep learning models, the construction of these attention mechanisms rely on heuristics and experience. In our work, we derive self-attention from kernel principal component analysis (kernel PCA) and show that self-attention projects its query vectors onto the principal component axes of its key matrix in a feature space. We then formulate the exact formula for the value matrix in self-attention, theoretically and empirically demonstrating that this value matrix captures the eigenvectors of the Gram matrix of the key vectors in self-attention. Leveraging our kernel PCA framework, we propose Attention with Robust Principal Components (RPC-Attention), a novel class of robust attention that is resilient to data contamination. We empirically demonstrate the advantages of RPC-Attention over softmax attention on the ImageNet-1K object classification, WikiText-103 language modeling, and ADE20K image segmentation task. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: 33 pages, 5 figures, 12 tables

arXiv:2406.13725 [pdf, other]

Tree-Sliced Wasserstein Distance on a System of Lines

Authors: Viet-Hoang Tran, Trang Pham, Tho Tran, Tam Le, Tan M. Nguyen

Abstract: Sliced Wasserstein (SW) distance in Optimal Transport (OT) is widely used in various applications thanks to its statistical effectiveness and computational efficiency. On the other hand, Tree Wassenstein (TW) and Tree-sliced Wassenstein (TSW) are instances of OT for probability measures where its ground cost is a tree metric. TSW also has a low computational complexity, i.e. linear to the number o… ▽ More Sliced Wasserstein (SW) distance in Optimal Transport (OT) is widely used in various applications thanks to its statistical effectiveness and computational efficiency. On the other hand, Tree Wassenstein (TW) and Tree-sliced Wassenstein (TSW) are instances of OT for probability measures where its ground cost is a tree metric. TSW also has a low computational complexity, i.e. linear to the number of edges in the tree. Especially, TSW is identical to SW when the tree is a chain. While SW is prone to loss of topological information of input measures due to relying on one-dimensional projection, TSW is more flexible and has a higher degree of freedom by choosing a tree rather than a line to alleviate the curse of dimensionality in SW. However, for practical applications, popular tree metric sampling methods are heavily built upon given supports, which limits their capacity to adapt to new supports. In this paper, we propose the Tree-Sliced Wasserstein distance on a System of Lines (TSW-SL), which brings a connection between SW and TSW. Compared to SW and TSW, our TSW-SL benefits from the higher degree of freedom of TSW while being suitable to dynamic settings as SW. In TSW-SL, we use a variant of the Radon Transform to project measures onto a system of lines, resulting in measures on a space with a tree metric, then leverage TW to efficiently compute distances between them. We empirically verify the advantages of TSW-SL over the traditional SW by conducting a variety of experiments on gradient flows, image style transfer, and generative models. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: 33 pages, 6 figures, 2 tables, 4 algorithms

arXiv:2406.12389 [pdf, other]

doi 10.18653/v1/2023.findings-emnlp.737

EMO-KNOW: A Large Scale Dataset on Emotion and Emotion-cause

Authors: Mia Huong Nguyen, Yasith Samaradivakara, Prasanth Sasikumar, Chitralekha Gupta, Suranga Nanayakkara

Abstract: Emotion-Cause analysis has attracted the attention of researchers in recent years. However, most existing datasets are limited in size and number of emotion categories. They often focus on extracting parts of the document that contain the emotion cause and fail to provide more abstractive, generalizable root cause. To bridge this gap, we introduce a large-scale dataset of emotion causes, derived f… ▽ More Emotion-Cause analysis has attracted the attention of researchers in recent years. However, most existing datasets are limited in size and number of emotion categories. They often focus on extracting parts of the document that contain the emotion cause and fail to provide more abstractive, generalizable root cause. To bridge this gap, we introduce a large-scale dataset of emotion causes, derived from 9.8 million cleaned tweets over 15 years. We describe our curation process, which includes a comprehensive pipeline for data gathering, cleaning, labeling, and validation, ensuring the dataset's reliability and richness. We extract emotion labels and provide abstractive summarization of the events causing emotions. The final dataset comprises over 700,000 tweets with corresponding emotion-cause pairs spanning 48 emotion classes, validated by human evaluators. The novelty of our dataset stems from its broad spectrum of emotion classes and the abstractive emotion cause that facilitates the development of an emotion-cause knowledge graph for nuanced reasoning. Our dataset will enable the design of emotion-aware systems that account for the diverse emotional responses of different people for the same event. △ Less

Submitted 7 August, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted to Findings of EMNLP 2023

ACM Class: I.2.7

Journal ref: Findings of EMNLP 2023

arXiv:2406.11927 [pdf, other]

On the Impacts of Contexts on Repository-Level Code Generation

Authors: Nam Le Hai, Dung Manh Nguyen, Nghi D. Q. Bui

Abstract: CodeLLMs have gained widespread adoption for code generation tasks, yet their capacity to handle repository-level code generation with complex contextual dependencies remains underexplored. Our work underscores the critical importance of leveraging repository-level contexts to generate executable and functionally correct code. We present \textbf{\methodnamews}, a novel benchmark designed to evalua… ▽ More CodeLLMs have gained widespread adoption for code generation tasks, yet their capacity to handle repository-level code generation with complex contextual dependencies remains underexplored. Our work underscores the critical importance of leveraging repository-level contexts to generate executable and functionally correct code. We present \textbf{\methodnamews}, a novel benchmark designed to evaluate repository-level code generation, with a focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts. Our study examines a controlled scenario where developers specify essential code dependencies (contexts), challenging models to integrate them effectively. Additionally, we introduce an instruction-tuned dataset that enhances CodeLLMs' ability to leverage dependencies, along with a new metric, \textit{Dependency Invocation Rate (DIR)}, to quantify context utilization. Experimental results reveal that while pretrained LLMs demonstrate superior performance in terms of correctness, instruction-tuned models excel in context utilization and debugging capabilities. \methodnamews offers a comprehensive evaluation framework for assessing code functionality and alignment with developer intent, thereby advancing the development of more reliable CodeLLMs for real-world applications. The dataset and source code are available at~\url{https://github.com/FSoft-AI4Code/RepoExec}. △ Less

Submitted 2 September, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11912 [pdf, other]

AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology

Authors: Minh Huynh Nguyen, Thang Phan Chau, Phong X. Nguyen, Nghi D. Q. Bui

Abstract: Software agents have emerged as promising tools for addressing complex software engineering tasks. Existing works, on the other hand, frequently oversimplify software development workflows, despite the fact that such workflows are typically more complex in the real world. Thus, we propose AgileCoder, a multi agent system that integrates Agile Methodology (AM) into the framework. This system assign… ▽ More Software agents have emerged as promising tools for addressing complex software engineering tasks. Existing works, on the other hand, frequently oversimplify software development workflows, despite the fact that such workflows are typically more complex in the real world. Thus, we propose AgileCoder, a multi agent system that integrates Agile Methodology (AM) into the framework. This system assigns specific AM roles - such as Product Manager, Developer, and Tester to different agents, who then collaboratively develop software based on user inputs. AgileCoder enhances development efficiency by organizing work into sprints, focusing on incrementally developing software through sprints. Additionally, we introduce Dynamic Code Graph Generator, a module that creates a Code Dependency Graph dynamically as updates are made to the codebase. This allows agents to better comprehend the codebase, leading to more precise code generation and modifications throughout the software development process. AgileCoder surpasses existing benchmarks, like ChatDev and MetaGPT, establishing a new standard and showcasing the capabilities of multi agent systems in advanced software engineering environments. △ Less

Submitted 14 July, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

Comments: Work in progress

arXiv:2406.10853 [pdf, other]

MV2Cyl: Reconstructing 3D Extrusion Cylinders from Multi-View Images

Authors: Eunji Hong, Minh Hieu Nguyen, Mikaela Angelina Uy, Minhyuk Sung

Abstract: We present MV2Cyl, a novel method for reconstructing 3D from 2D multi-view images, not merely as a field or raw geometry but as a sketch-extrude CAD model. Extracting extrusion cylinders from raw 3D geometry has been extensively researched in computer vision, while the processing of 3D data through neural networks has remained a bottleneck. Since 3D scans are generally accompanied by multi-view im… ▽ More We present MV2Cyl, a novel method for reconstructing 3D from 2D multi-view images, not merely as a field or raw geometry but as a sketch-extrude CAD model. Extracting extrusion cylinders from raw 3D geometry has been extensively researched in computer vision, while the processing of 3D data through neural networks has remained a bottleneck. Since 3D scans are generally accompanied by multi-view images, leveraging 2D convolutional neural networks allows these images to be exploited as a rich source for extracting extrusion cylinder information. However, we observe that extracting only the surface information of the extrudes and utilizing it results in suboptimal outcomes due to the challenges in the occlusion and surface segmentation. By synergizing with the extracted base curve information, we achieve the optimal reconstruction result with the best accuracy in 2D sketch and extrude parameter estimation. Our experiments, comparing our method with previous work that takes a raw 3D point cloud as input, demonstrate the effectiveness of our approach by taking advantage of multi-view images. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: 24 pages

arXiv:2406.09837 [pdf, other]

TabularFM: An Open Framework For Tabular Foundational Models

Authors: Quan M. Tran, Suong N. Hoang, Lam M. Nguyen, Dzung Phan, Hoang Thanh Lam

Abstract: Foundational models (FMs), pretrained on extensive datasets using self-supervised techniques, are capable of learning generalized patterns from large amounts of data. This reduces the need for extensive labeled datasets for each new task, saving both time and resources by leveraging the broad knowledge base established during pretraining. Most research on FMs has primarily focused on unstructured… ▽ More Foundational models (FMs), pretrained on extensive datasets using self-supervised techniques, are capable of learning generalized patterns from large amounts of data. This reduces the need for extensive labeled datasets for each new task, saving both time and resources by leveraging the broad knowledge base established during pretraining. Most research on FMs has primarily focused on unstructured data, such as text and images, or semi-structured data, like time-series. However, there has been limited attention to structured data, such as tabular data, which, despite its prevalence, remains under-studied due to a lack of clean datasets and insufficient research on the transferability of FMs for various tabular data tasks. In response to this gap, we introduce a framework called TabularFM, which incorporates state-of-the-art methods for developing FMs specifically for tabular data. This includes variations of neural architectures such as GANs, VAEs, and Transformers. We have curated a million of tabular datasets and released cleaned versions to facilitate the development of tabular FMs. We pretrained FMs on this curated data, benchmarked various learning methods on these datasets, and released the pretrained models along with leaderboards for future comparative studies. Our fully open-sourced system provides a comprehensive analysis of the transferability of tabular FMs. By releasing these datasets, pretrained models, and leaderboards, we aim to enhance the validity and usability of tabular FMs in the near future. △ Less

Submitted 17 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.08301 [pdf, other]

Jet modification via $π^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, H. Al-Bataineh, J. Alexander, M. Alfred, K. Aoki, N. Apadula, L. Aphecetche, J. Asai, H. Asano, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, G. Baksay, L. Baksay, A. Baldisseri , et al. (510 additional authors not shown)

Abstract: High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is obs… ▽ More High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $Δ_{AA}$, as a function of the trigger-hadron azimuthal separation, $Δφ$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 534 authors from 83 institutions, 12 pages, 7 figures. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2406.06239 [pdf, other]

I-MPN: Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data

Authors: Hoang H. Le, Duy M. H. Nguyen, Omair Shahzad Bhatti, Laszlo Kopacsi, Thinh P. Ngo, Binh T. Nguyen, Michael Barz, Daniel Sonntag

Abstract: Comprehending how humans process visual information in dynamic settings is crucial for psychology and designing user-centered interactions. While mobile eye-tracking systems combining egocentric video and gaze signals can offer valuable insights, manual analysis of these recordings is time-intensive. In this work, we present a novel human-centered learning algorithm designed for automated object r… ▽ More Comprehending how humans process visual information in dynamic settings is crucial for psychology and designing user-centered interactions. While mobile eye-tracking systems combining egocentric video and gaze signals can offer valuable insights, manual analysis of these recordings is time-intensive. In this work, we present a novel human-centered learning algorithm designed for automated object recognition within mobile eye-tracking settings. Our approach seamlessly integrates an object detector with a spatial relation-aware inductive message-passing network (I-MPN), harnessing node profile information and capturing object correlations. Such mechanisms enable us to learn embedding functions capable of generalizing to new object angle views, facilitating rapid adaptation and efficient reasoning in dynamic contexts as users navigate their environment. Through experiments conducted on three distinct video sequences, our interactive-based method showcases significant performance improvements over fixed training/testing algorithms, even when trained on considerably smaller annotated samples collected through user feedback. Furthermore, we demonstrate exceptional efficiency in data annotation processes and surpass prior interactive methods that use complete object detectors, combine detectors with convolutional networks, or employ interactive video segmentation. △ Less

Submitted 7 July, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: Updated version

arXiv:2406.05592 [pdf, other]

Constrained Design of a Binary Instrument in a Partially Linear Model

Authors: Tim Morrison, Minh Nguyen, Jonathan Chen, Michael Baiocchi, Art B. Owen

Abstract: We study the question of how best to assign an encouragement in a randomized encouragement study. In our setting, units arrive with covariates, receive a nudge toward treatment or control, acquire one of those statuses in a way that need not align with the nudge, and finally have a response observed. The nudge can be seen as a binary instrument that affects the response only via the treatment stat… ▽ More We study the question of how best to assign an encouragement in a randomized encouragement study. In our setting, units arrive with covariates, receive a nudge toward treatment or control, acquire one of those statuses in a way that need not align with the nudge, and finally have a response observed. The nudge can be seen as a binary instrument that affects the response only via the treatment status. Our goal is to assign the nudge as a function of covariates in a way that best estimates the local average treatment effect (LATE). We assume a partially linear model, wherein the baseline model is non-parametric and the treatment term is linear in the covariates. Under this model, we outline a two-stage procedure to consistently estimate the LATE. Though the variance of the LATE is intractable, we derive a finite sample approximation and thus a design criterion to minimize. This criterion is convex, allowing for constraints that might arise for budgetary or ethical reasons. We prove conditions under which our solution asymptotically recovers the lowest true variance among all possible nudge propensities. We apply our method to a semi-synthetic example involving triage in an emergency department and find significant gains relative to a regression discontinuity design. △ Less

Submitted 28 August, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

Comments: 31 pages, 6 figures

arXiv:2406.03413 [pdf, other]

UnWave-Net: Unrolled Wavelet Network for Compton Tomography Image Reconstruction

Authors: Ishak Ayad, Cécilia Tarpau, Javier Cebeiro, Maï K. Nguyen

Abstract: Computed tomography (CT) is a widely used medical imaging technique to scan internal structures of a body, typically involving collimation and mechanical rotation. Compton scatter tomography (CST) presents an interesting alternative to conventional CT by leveraging Compton physics instead of collimation to gather information from multiple directions. While CST introduces new imaging opportunities… ▽ More Computed tomography (CT) is a widely used medical imaging technique to scan internal structures of a body, typically involving collimation and mechanical rotation. Compton scatter tomography (CST) presents an interesting alternative to conventional CT by leveraging Compton physics instead of collimation to gather information from multiple directions. While CST introduces new imaging opportunities with several advantages such as high sensitivity, compactness, and entirely fixed systems, image reconstruction remains an open problem due to the mathematical challenges of CST modeling. In contrast, deep unrolling networks have demonstrated potential in CT image reconstruction, despite their computationally intensive nature. In this study, we investigate the efficiency of unrolling networks for CST image reconstruction. To address the important computational cost required for training, we propose UnWave-Net, a novel unrolled wavelet-based reconstruction network. This architecture includes a non-local regularization term based on wavelets, which captures long-range dependencies within images and emphasizes the multi-scale components of the wavelet transform. We evaluate our approach using a CST of circular geometry which stays completely static during data acquisition, where UnWave-Net facilitates image reconstruction in the absence of a specific reconstruction formula. Our method outperforms existing approaches and achieves state-of-the-art performance in terms of SSIM and PSNR, and offers an improved computational efficiency compared to traditional unrolling networks. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: This paper has been early accepted by MICCAI 2024

arXiv:2406.02533 [pdf, other]

SatSplatYOLO: 3D Gaussian Splatting-based Virtual Object Detection Ensembles for Satellite Feature Recognition

Authors: Van Minh Nguyen, Emma Sandidge, Trupti Mahendrakar, Ryan T. White

Abstract: On-orbit servicing (OOS), inspection of spacecraft, and active debris removal (ADR). Such missions require precise rendezvous and proximity operations in the vicinity of non-cooperative, possibly unknown, resident space objects. Safety concerns with manned missions and lag times with ground-based control necessitate complete autonomy. In this article, we present an approach for mapping geometries… ▽ More On-orbit servicing (OOS), inspection of spacecraft, and active debris removal (ADR). Such missions require precise rendezvous and proximity operations in the vicinity of non-cooperative, possibly unknown, resident space objects. Safety concerns with manned missions and lag times with ground-based control necessitate complete autonomy. In this article, we present an approach for mapping geometries and high-confidence detection of components of unknown, non-cooperative satellites on orbit. We implement accelerated 3D Gaussian splatting to learn a 3D representation of the satellite, render virtual views of the target, and ensemble the YOLOv5 object detector over the virtual views, resulting in reliable, accurate, and precise satellite component detections. The full pipeline capable of running on-board and stand to enable downstream machine intelligence tasks necessary for autonomous guidance, navigation, and control tasks. △ Less

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.01007 [pdf, other]

Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay

Authors: Daya Bay collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, J. Cheng, Y. -C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng , et al. (177 additional authors not shown)

Abstract: This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive… ▽ More This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overlineν_{e}$ rates and energy spectra variation among the near and far detectors gives $\mathrm{sin}^22θ_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $Δm^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $Δm^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2θ_{13}$ is consistent with and essentially independent from the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\mathrm{sin}^22θ_{13}= 0.0833\pm0.0022$, which represents an 8% relative improvement in precision regarding the Daya Bay full 3158-day capture-on-gadolinium result. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00162 [pdf, ps, other]

doi 10.1145/3663408.366582

Optical-computing-enabled Network: A New Dawn for Optical-layer Intelligence?

Authors: Dao Thanh Hai, Minh Nguyen, Isaac Woungang

Abstract: Inspired by the renaissance of optical computing recently, this poster presents a disruptive outlook on the possibility of seamless integration between optical communications and optical computing infrastructures, paving the way for achieving optical-layer intelligence and consequently boosting the capacity efficiency. This entails a paradigm shift in optical node architecture from the currently u… ▽ More Inspired by the renaissance of optical computing recently, this poster presents a disruptive outlook on the possibility of seamless integration between optical communications and optical computing infrastructures, paving the way for achieving optical-layer intelligence and consequently boosting the capacity efficiency. This entails a paradigm shift in optical node architecture from the currently used optical-bypass to a novel one, entitled, optical-computing-enabled mode, where in addition to the traditional add-drop and cross-connect functionalities, optical nodes are upgraded to account for optical-computing capabilities between the lightpath entities directly at the optical layer. A preliminary study focusing on the optical aggregation operation is examined and early simulation results indicate a promising spectral saving enabled by the optical-computing-enabled mode compared with the optical-bypass one. △ Less

Submitted 31 May, 2024; originally announced June 2024.

Comments: 2-page poster, to be appeared in Proceedings of the 8th Asia-Pacific Workshop on Networking (APNet 2024)

arXiv:2405.20448 [pdf, other]

Knockout: A simple way to handle missing inputs

Authors: Minh Nguyen, Batuhan K. Karaman, Heejong Kim, Alan Q. Wang, Fengbei Liu, Mert R. Sabuncu

Abstract: Deep learning models can extract predictive and actionable information from complex inputs. The richer the inputs, the better these models usually perform. However, models that leverage rich inputs (e.g., multi-modality) can be difficult to deploy widely, because some inputs may be missing at inference. Current popular solutions to this problem include marginalization, imputation, and training mul… ▽ More Deep learning models can extract predictive and actionable information from complex inputs. The richer the inputs, the better these models usually perform. However, models that leverage rich inputs (e.g., multi-modality) can be difficult to deploy widely, because some inputs may be missing at inference. Current popular solutions to this problem include marginalization, imputation, and training multiple models. Marginalization can obtain calibrated predictions but it is computationally costly and therefore only feasible for low dimensional inputs. Imputation may result in inaccurate predictions because it employs point estimates for missing variables and does not work well for high dimensional inputs (e.g., images). Training multiple models whereby each model takes different subsets of inputs can work well but requires knowing missing input patterns in advance. Furthermore, training and retaining multiple models can be costly. We propose an efficient way to learn both the conditional distribution using full inputs and the marginal distributions. Our method, Knockout, randomly replaces input features with appropriate placeholder values during training. We provide a theoretical justification of Knockout and show that it can be viewed as an implicit marginalization strategy. We evaluate Knockout in a wide range of simulations and real-world datasets and show that it can offer strong empirical performance. △ Less

Submitted 3 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.20024 [pdf, other]

Applications of Generative AI (GAI) for Mobile and Wireless Networking: A Survey

Authors: Thai-Hoc Vu, Senthil Kumar Jagatheesaperumal, Minh-Duong Nguyen, Nguyen Van Huynh, Sunghwan Kim, Quoc-Viet Pham

Abstract: The success of Artificial Intelligence (AI) in multiple disciplines and vertical domains in recent years has promoted the evolution of mobile networking and the future Internet toward an AI-integrated Internet-of-Things (IoT) era. Nevertheless, most AI techniques rely on data generated by physical devices (e.g., mobile devices and network nodes) or specific applications (e.g., fitness trackers and… ▽ More The success of Artificial Intelligence (AI) in multiple disciplines and vertical domains in recent years has promoted the evolution of mobile networking and the future Internet toward an AI-integrated Internet-of-Things (IoT) era. Nevertheless, most AI techniques rely on data generated by physical devices (e.g., mobile devices and network nodes) or specific applications (e.g., fitness trackers and mobile gaming). To bypass this circumvent, Generative AI (GAI), a.k.a. AI-generated content (AIGC), has emerged as a powerful AI paradigm; thanks to its ability to efficiently learn complex data distributions and generate synthetic data to represent the original data in various forms. This impressive feature is projected to transform the management of mobile networking and diversify the current services and applications provided. On this basis, this work presents a concise tutorial on the role of GAIs in mobile and wireless networking. In particular, this survey first provides the fundamentals of GAI and representative GAI models, serving as an essential preliminary to the understanding of the applications of GAI in mobile and wireless networking. Then, this work provides a comprehensive review of state-of-the-art studies and GAI applications in network management, wireless security, semantic communication, and lessons learned from the open literature. Finally, this work summarizes the current research on GAI for mobile and wireless networking by outlining important challenges that need to be resolved to facilitate the development and applicability of GAI in this edge-cutting area. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19612 [pdf, other]

Keyword-driven Retrieval-Augmented Large Language Models for Cold-start User Recommendations

Authors: Hai-Dang Kieu, Minh Duc Nguyen, Thanh-Son Nguyen, Dung D. Le

Abstract: Recent advancements in Large Language Models (LLMs) have shown significant potential in enhancing recommender systems. However, addressing the cold-start recommendation problem, where users lack historical data, remains a considerable challenge. In this paper, we introduce KALM4Rec (Keyword-driven Retrieval-Augmented Large Language Models for Cold-start User Recommendations), a novel framework spe… ▽ More Recent advancements in Large Language Models (LLMs) have shown significant potential in enhancing recommender systems. However, addressing the cold-start recommendation problem, where users lack historical data, remains a considerable challenge. In this paper, we introduce KALM4Rec (Keyword-driven Retrieval-Augmented Large Language Models for Cold-start User Recommendations), a novel framework specifically designed to tackle this problem by requiring only a few input keywords from users in a practical scenario of cold-start user restaurant recommendations. KALM4Rec operates in two main stages: candidates retrieval and LLM-based candidates re-ranking. In the first stage, keyword-driven retrieval models are used to identify potential candidates, addressing LLMs' limitations in processing extensive tokens and reducing the risk of generating misleading information. In the second stage, we employ LLMs with various prompting strategies, including zero-shot and few-shot techniques, to re-rank these candidates by integrating multiple examples directly into the LLM prompts. Our evaluation, using a Yelp restaurant dataset with user reviews from three English-speaking cities, shows that our proposed framework significantly improves recommendation quality. Specifically, the integration of in-context instructions with LLMs for re-ranking markedly enhances the performance of the cold-start user recommender system. △ Less

Submitted 8 September, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: 10 pages, 10 figures, 4 tables

arXiv:2405.18605 [pdf, ps, other]

BioBERT-based Deep Learning and Merged ChemProt-DrugProt for Enhanced Biomedical Relation Extraction

Authors: Bridget T. McInnes, Jiawei Tang, Darshini Mahendran, Mai H. Nguyen

Abstract: This paper presents a methodology for enhancing relation extraction from biomedical texts, focusing specifically on chemical-gene interactions. Leveraging the BioBERT model and a multi-layer fully connected network architecture, our approach integrates the ChemProt and DrugProt datasets using a novel merging strategy. Through extensive experimentation, we demonstrate significant performance improv… ▽ More This paper presents a methodology for enhancing relation extraction from biomedical texts, focusing specifically on chemical-gene interactions. Leveraging the BioBERT model and a multi-layer fully connected network architecture, our approach integrates the ChemProt and DrugProt datasets using a novel merging strategy. Through extensive experimentation, we demonstrate significant performance improvements, particularly in CPR groups shared between the datasets. The findings underscore the importance of dataset merging in augmenting sample counts and improving model accuracy. Moreover, the study highlights the potential of automated information extraction in biomedical research and clinical practice. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17385 [pdf, other]

Thermalization and Criticality on an Analog-Digital Quantum Simulator

Authors: Trond I. Andersen, Nikita Astrakhantsev, Amir H. Karamlou, Julia Berndtsson, Johannes Motruk, Aaron Szasz, Jonathan A. Gross, Alexander Schuckert, Tom Westerhout, Yaxing Zhang, Ebrahim Forati, Dario Rossi, Bryce Kobrin, Agustin Di Paolo, Andrey R. Klots, Ilya Drozdov, Vladislav D. Kurilovich, Andre Petukhov, Lev B. Ioffe, Andreas Elben, Aniket Rath, Vittorio Vitale, Benoit Vermersch, Rajeev Acharya, Laleh Aghababaie Beni , et al. (202 additional authors not shown)

Abstract: Understanding how interacting particles approach thermal equilibrium is a major challenge of quantum simulators. Unlocking the full potential of such systems toward this goal requires flexible initial state preparation, precise time evolution, and extensive probes for final state characterization. We present a quantum simulator comprising 69 superconducting qubits which supports both universal qua… ▽ More Understanding how interacting particles approach thermal equilibrium is a major challenge of quantum simulators. Unlocking the full potential of such systems toward this goal requires flexible initial state preparation, precise time evolution, and extensive probes for final state characterization. We present a quantum simulator comprising 69 superconducting qubits which supports both universal quantum gates and high-fidelity analog evolution, with performance beyond the reach of classical simulation in cross-entropy benchmarking experiments. Emulating a two-dimensional (2D) XY quantum magnet, we leverage a wide range of measurement techniques to study quantum states after ramps from an antiferromagnetic initial state. We observe signatures of the classical Kosterlitz-Thouless phase transition, as well as strong deviations from Kibble-Zurek scaling predictions attributed to the interplay between quantum and classical coarsening of the correlated domains. This interpretation is corroborated by injecting variable energy density into the initial state, which enables studying the effects of the eigenstate thermalization hypothesis (ETH) in targeted parts of the eigenspectrum. Finally, we digitally prepare the system in pairwise-entangled dimer states and image the transport of energy and vorticity during thermalization. These results establish the efficacy of superconducting analog-digital quantum processors for preparing states across many-body spectra and unveiling their thermalization dynamics. △ Less

Submitted 8 July, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

Showing 1–50 of 832 results for author: Nguyen, M