-
Enhancement of the sound absorption of closed-cell mineral foams by perforations: Manufacturing process and model-supported adaptation
Authors:
Bart Van Damme,
Théo Cavalieri,
Cong-Truc Nguyen,
Camille Perrot
Abstract:
Thin low-frequency acoustic absorbers that are economical to produce in large quantities are scarce, and their efficiency is often limited to a narrow frequency range. In this paper, we present opportunities to use highly porous mineral foams, in particular optimally designed gypsum foams, to achieve high absorption levels for layers of less than 1/10 of a wavelength thick. To reach this goal, we perforate a fraction of the initially closed pores using thin needles. Finite element simulations of the fluid flow in a representative volume element show how the combination of foam properties (cell size and wall thickness) and perforation pattern (hole diameter and perforation distance) can be chosen such that sub-wavelength absorption is obtained. In particular, two transport parameters used in the approximate but robust Johnson-Champoux-Allard model for porous media have to be optimized: the flow resistivity and the high-frequency tortuosity. The fluid flow modeling results compare well with sound absorption measurements, showing that the proposed material, once appropriately perforated, yields a remarkable low-frequency sound absorption peak. On a more fundamental level, this paper shows how the multiporosity, the presence of microcracks, and the material's surface roughness can be exploited to enhance its acoustic absorption at very low frequencies.
Submitted 27 August, 2024;
originally announced August 2024.
-
SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding
Authors:
Chuanghao Ding,
Xuejing Liu,
Wei Tang,
Juan Li,
Xiaoliang Wang,
Rui Zhao,
Cam-Tu Nguyen,
Fei Tan
Abstract:
This paper introduces SynthDoc, a novel synthetic document generation pipeline designed to enhance Visual Document Understanding (VDU) by generating high-quality, diverse datasets that include text, images, tables, and charts. Addressing the challenges of data acquisition and the limitations of existing datasets, SynthDoc leverages publicly available corpora and advanced rendering tools to create a comprehensive and versatile dataset. Our experiments, conducted using the Donut model, demonstrate that models trained with SynthDoc's data achieve superior performance in pre-training read tasks and maintain robustness in downstream tasks, despite language inconsistencies. The release of a benchmark dataset comprising 5,000 image-text pairs not only showcases the pipeline's capabilities but also provides a valuable resource for the VDU community to advance research and development in document image recognition. This work significantly contributes to the field by offering a scalable solution to data scarcity and by validating the efficacy of end-to-end models in parsing complex, real-world documents.
Submitted 26 August, 2024;
originally announced August 2024.
-
Variational Autoencoder for Anomaly Detection: A Comparative Study
Authors:
Huy Hoang Nguyen,
Cuong Nhat Nguyen,
Xuan Tung Dao,
Quoc Trung Duong,
Dzung Pham Thi Kim,
Minh-Tan Pham
Abstract:
This paper aims to conduct a comparative analysis of contemporary Variational Autoencoder (VAE) architectures employed in anomaly detection, elucidating their performance and behavioral characteristics within this specific task. The architectural configurations under consideration encompass the original VAE baseline, the VAE with a Gaussian Random Field prior (VAE-GRF), and the VAE incorporating a vision transformer (ViT-VAE). The findings reveal that ViT-VAE exhibits exemplary performance across various scenarios, whereas VAE-GRF may necessitate more intricate hyperparameter tuning to attain its optimal performance state. Additionally, to mitigate the propensity for over-reliance on results derived from the widely used MVTec dataset, this paper leverages the recently released MiAD dataset for benchmarking. This deliberate inclusion seeks to enhance result competitiveness by alleviating the impact of domain-specific models tailored exclusively for MVTec, thereby contributing to a more robust evaluation framework. Code is available at https://github.com/endtheme123/VAE-compare.git.
Submitted 24 August, 2024;
originally announced August 2024.
-
Ada2I: Enhancing Modality Balance for Multimodal Conversational Emotion Recognition
Authors:
Cam-Van Thi Nguyen,
The-Son Le,
Anh-Tuan Mai,
Duc-Trong Le
Abstract:
Multimodal Emotion Recognition in Conversations (ERC) is a typical multimodal learning task in exploiting various data modalities concurrently. Prior studies on effective multimodal ERC encounter challenges in addressing modality imbalances and optimizing learning across modalities. Dealing with these problems, we present a novel framework named Ada2I, which consists of two inseparable modules namely Adaptive Feature Weighting (AFW) and Adaptive Modality Weighting (AMW) for feature-level and modality-level balancing respectively via leveraging both Inter- and Intra-modal interactions. Additionally, we introduce a refined disparity ratio as part of our training optimization strategy, a simple yet effective measure to assess the overall discrepancy of the model's learning process when handling multiple modalities simultaneously. Experimental results validate the effectiveness of Ada2I with state-of-the-art performance compared to baselines on three benchmark datasets, particularly in addressing modality imbalances.
Submitted 23 August, 2024;
originally announced August 2024.
-
Offline RLHF Methods Need More Accurate Supervision Signals
Authors:
Shiqi Wang,
Zhengze Zhang,
Rui Zhao,
Fei Tan,
Cam Tu Nguyen
Abstract:
With the rapid advances in Large Language Models (LLMs), aligning LLMs with human preferences has become increasingly important. Although Reinforcement Learning with Human Feedback (RLHF) proves effective, it is complicated and highly resource-intensive. As such, offline RLHF has been introduced as an alternative solution, which directly optimizes LLMs with ranking losses on a fixed preference dataset. Current offline RLHF only captures the "ordinal relationship" between responses, overlooking the crucial aspect of "how much" one is preferred over the others. To address this issue, we propose a simple yet effective solution called Reward Difference Optimization, abbreviated as RDO. Specifically, we introduce reward difference coefficients to reweigh sample pairs in offline RLHF. We then develop a difference model involving rich interactions between a pair of responses for predicting these difference coefficients. Experiments with 7B LLMs on the HH and TL;DR datasets substantiate the effectiveness of our method in both automatic metrics and human evaluation, thereby highlighting its potential for aligning LLMs with human intent and values.
Submitted 18 August, 2024;
originally announced August 2024.
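The abstract above describes reweighting sample pairs in a ranking loss by a reward difference coefficient. The exact loss is not given in the abstract, so the following is only a minimal sketch under an assumption: a standard pairwise logistic ranking loss, with each pair scaled by a per-pair coefficient. The function name `weighted_pairwise_loss` and the way coefficients are supplied are hypothetical and not taken from the paper.

```python
import math


def weighted_pairwise_loss(score_chosen, score_rejected, coeffs):
    """Mean of coeff * -log(sigmoid(chosen - rejected)) over all pairs.

    Assumption: this is a generic weighted logistic ranking loss used for
    illustration; it is not claimed to be the loss defined in the paper.
    """
    total = 0.0
    for s_w, s_l, c in zip(score_chosen, score_rejected, coeffs):
        margin = s_w - s_l
        # -log(sigmoid(x)) equals log(1 + exp(-x)); branch for numerical stability.
        if margin > 0:
            nll = math.log1p(math.exp(-margin))
        else:
            nll = -margin + math.log1p(math.exp(margin))
        # A larger coefficient makes this pair count more in the total loss.
        total += c * nll
    return total / len(coeffs)


# Two pairs with the same score margin but different (hypothetical) coefficients.
uniform = weighted_pairwise_loss([1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
reweighted = weighted_pairwise_loss([1.0, 1.0], [0.0, 0.0], [1.0, 3.0])
```

With all coefficients set to 1 the sketch reduces to the unweighted ranking loss, which captures only the ordering of each pair; raising a coefficient increases the influence of pairs where one response is preferred by a wider margin.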
-
Bundle Recommendation with Item-level Causation-enhanced Multi-view Learning
Authors:
Huy-Son Nguyen,
Tuan-Nghia Bui,
Long-Hai Nguyen,
Hoang Manh-Hung,
Cam-Van Thi Nguyen,
Hoang-Quynh Le,
Duc-Trong Le
Abstract:
Bundle recommendation aims to enhance business profitability and user convenience by suggesting a set of interconnected items. In real-world scenarios, leveraging the impact of asymmetric item affiliations is crucial for effective bundle modeling and understanding user preferences. To address this, we present BunCa, a novel bundle recommendation approach employing item-level causation-enhanced multi-view learning. BunCa provides comprehensive representations of users and bundles through two views: the Coherent View, leveraging the Multi-Prospect Causation Network for causation-sensitive relations among items, and the Cohesive View, employing LightGCN for information propagation among users and bundles. Modeling user preferences and bundle construction combined from both views ensures rigorous cohesion in direct user-bundle interactions through the Cohesive View and captures explicit intents through the Coherent View. Simultaneously, the integration of concrete and discrete contrastive learning optimizes the consistency and self-discrimination of multi-view representations. Extensive experiments with BunCa on three benchmark datasets demonstrate the effectiveness of this novel research and validate our hypothesis.
Submitted 13 August, 2024;
originally announced August 2024.
-
A combined study of thermohaline mixing and envelope overshooting with PARSEC: Calibration to NGC 6397 and M4
Authors:
C. T. Nguyen,
A. Bressan,
A. J. Korn,
G. Cescutti,
G. Costa,
F. Addari,
L. Girardi,
X. Fu,
Y. Chen,
P. Marigo
Abstract:
Thermohaline mixing is one of the main processes in low-mass red giant stars that affect the transport of chemicals and, thus, the surface abundances along the evolution. The interplay of thermohaline mixing with other processes, such as the downward overshooting from the convective envelope, should be carefully investigated. This study aims to understand the combined effects of thermohaline mixing and envelope overshooting. After implementing the thermohaline mixing process in the PARSEC stellar evolutionary code, we compute tracks and isochrones (with the TRILEGAL code) and compare them with observational data. To constrain the efficiencies of both processes, we perform detailed modelling suitable for the globular clusters NGC 6397 and M4. Our results indicate that an envelope overshooting efficiency parameter, $Λ_\mathrm{e}=0.6$, and a thermohaline efficiency parameter, $α_\mathrm{th}=50$, are necessary to reproduce the RGB bump magnitudes and lithium abundances observed in these clusters. We find that both envelope overshooting and thermohaline mixing have a significant impact on the variation of $^7$Li abundances. We also explore the effects of adopting solar-scaled or $α$-enhanced mixtures on our models. The $^{12}$C abundance and the $^{12}$C/$^{13}$C ratio are also effective indicators to probe extra mixing in RGB stars, although their usefulness is currently limited by the lack of precise and accurate C-isotope abundances.
Submitted 9 August, 2024;
originally announced August 2024.
-
Convergence Speed for Fekete Points on Uniformly Polynomially Cuspidal Sets
Authors:
Hyunsoo Ahn,
Ngoc Cuong Nguyen
Abstract:
We obtain the convergence speed for Fekete points on uniformly polynomially cuspidal compact sets introduced by Pawlucki and Pleśniak. This is done by showing that these sets are $(\mathscr{C}^α, \mathscr{C}^{α'})$-regular in the sense of Dinh, Ma and Nguyen.
Submitted 14 August, 2024; v1 submitted 6 August, 2024;
originally announced August 2024.
-
Joint Design of Probabilistic Constellation Shaping and Precoding for Multi-user VLC Systems
Authors:
Thang K. Nguyen,
Thanh V. Pham,
Hoang D. Le,
Chuyen T. Nguyen,
Anh T. Pham
Abstract:
This paper proposes a joint design of probabilistic constellation shaping (PCS) and precoding to enhance the sum-rate performance of multi-user visible light communications (VLC) broadcast channels subject to a signal amplitude constraint. In the proposed design, the transmission probabilities of bipolar $M$-pulse amplitude modulation ($M$-PAM) symbols for each user and the transmit precoding matrix are jointly optimized to improve the sum-rate performance. The joint design problem is shown to be a complex non-convex problem due to the non-convexity of the objective function. To tackle the problem, the firefly algorithm (FA), a nature-inspired heuristic optimization approach, is employed to find a local optimum of the original non-convex optimization problem. The FA-based approach, however, suffers from high computational complexity. Therefore, we propose a low-complexity design based on zero-forcing (ZF) precoding, which is solved using an alternating optimization (AO) approach. Simulation results reveal that the proposed joint design with PCS significantly improves the sum-rate performance compared to the conventional design with uniform signaling. Some insights into the optimal symbol distributions of the two joint design approaches are also provided.
Submitted 6 August, 2024;
originally announced August 2024.
-
Mastering Agile Jumping Skills from Simple Practices with Iterative Learning Control
Authors:
Chuong Nguyen,
Lingfan Bao,
Quan Nguyen
Abstract:
Achieving precise target jumping with legged robots poses a significant challenge due to the long flight phase and the uncertainties inherent in contact dynamics and hardware. Forcefully attempting these agile motions on hardware could result in severe failures and potential damage. Motivated by these challenging problems, we propose an Iterative Learning Control (ILC) approach that aims to learn and refine jumping skills from easy to difficult, instead of directly learning these challenging tasks. We verify that learning from simplicity can enhance safety and target jumping accuracy over trials. Compared to other ILC approaches for legged locomotion, our method can tackle the problem of a long flight phase where control input is not available. In addition, our approach allows the robot to apply what it learns from a simple jumping task to accomplish more challenging tasks within a few trials directly in hardware, instead of learning from scratch. We validate the method via extensive experiments in the A1 model and hardware for various jumping tasks. Starting from a small jump (e.g., a forward leap of 40cm), our learning approach empowers the robot to accomplish a variety of challenging targets, including jumping onto a 20cm high box, jumping to a greater distance of up to 60cm, as well as performing jumps while carrying an unknown payload of 2kg. Our framework can allow the robot to reach the desired position and orientation targets with approximate errors of 1cm and 1 degree within a few trials.
Submitted 5 August, 2024;
originally announced August 2024.
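The abstract above relies on Iterative Learning Control, where the control input for the next trial is corrected using the error observed in the previous trial. The authors' controller and robot model are not described in the abstract, so the following is only a minimal sketch of a generic P-type ILC update on a hypothetical static plant `y = plant_gain * u`; the function name `run_ilc`, the plant, and all numeric values are illustrative assumptions.

```python
def run_ilc(target, plant_gain, learning_gain, trials):
    """P-type ILC: u_next = u + learning_gain * error, repeated over trials.

    Assumption: the "plant" is a toy scalar map y = plant_gain * u, standing in
    for a full jump; it is not the robot model used in the paper.
    """
    u = 0.0
    errors = []
    for _ in range(trials):
        y = plant_gain * u           # outcome of this trial (e.g. jump distance)
        e = target - y               # error measured after the trial
        errors.append(e)
        u = u + learning_gain * e    # correct the input for the next trial
    return u, errors


# Hypothetical numbers: reach 0.6 on a plant whose true gain (0.5) is unknown to the learner.
u, errors = run_ilc(target=0.6, plant_gain=0.5, learning_gain=1.0, trials=20)
```

In this toy setting each trial scales the error by `1 - plant_gain * learning_gain`, so the error shrinks whenever that factor has magnitude below 1. The learning happens between trials rather than during a trial, which is why this family of methods can cope with phases where no control input is available.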
-
Homomorphic Encryption-Enabled Federated Learning for Privacy-Preserving Intrusion Detection in Resource-Constrained IoV Networks
Authors:
Bui Duc Manh,
Chi-Hieu Nguyen,
Dinh Thai Hoang,
Diep N. Nguyen
Abstract:
This paper proposes a novel framework to address the data privacy issue for Federated Learning (FL)-based Intrusion Detection Systems (IDSs) in Internet-of-Vehicles (IoVs) with limited computational resources. In particular, in conventional FL systems, it is usually assumed that the computing nodes have sufficient computational resources to process the training tasks. However, in practical IoV systems, vehicles usually have limited computational resources to process intensive training tasks, compromising the effectiveness of deploying FL in IDSs. While offloading data from vehicles to the cloud can mitigate this issue, it introduces significant privacy concerns for vehicle users (VUs). To resolve this issue, we first propose a highly effective framework using homomorphic encryption to secure data that requires offloading to a centralized server for processing. Furthermore, we develop an effective training algorithm tailored to handle the challenges of FL-based systems with encrypted data. This algorithm allows the centralized server to directly compute on quantum-secure encrypted ciphertexts without needing decryption. This approach not only safeguards data privacy during the offloading process from VUs to the centralized server but also enhances the efficiency of utilizing FL for IDSs in IoV systems. Our simulation results show that our proposed approach can achieve performance close to that of the solution without encryption, with a gap of less than 0.8%.
Submitted 26 July, 2024;
originally announced July 2024.
-
Adaptive-Frequency Model Learning and Predictive Control for Dynamic Maneuvers on Legged Robots
Authors:
Chuong Nguyen,
Abdullah Altawaitan,
Thai Duong,
Nikolay Atanasov,
Quan Nguyen
Abstract:
Achieving both target accuracy and robustness in dynamic maneuvers with long flight phases, such as high or long jumps, has been a significant challenge for legged robots. To address this challenge, we propose a novel learning-based control approach consisting of model learning and model predictive control (MPC) utilizing an adaptive frequency scheme. Compared to existing MPC techniques, we learn a model directly from experiments, accounting not only for leg dynamics but also for modeling errors and unknown dynamics mismatch in hardware and during contact. Additionally, learning the model with adaptive frequency allows us to cover the entire flight phase and final jumping target, enhancing the prediction accuracy of the jumping trajectory. Using the learned model, we also design an adaptive-frequency MPC to effectively leverage different jumping phases and track the target accurately. In hardware experiments with a Unitree A1 robot, we demonstrate that our approach outperforms baseline MPC using a nominal model, reducing the jumping distance error by up to 8 times. We achieve jumping distance errors of less than 3 percent during continuous jumping on uneven terrain with randomly placed perturbations of random heights (up to 4 cm or 27 percent of the robot's standing height). Our approach obtains distance errors of 1-2 cm on 34 single and continuous jumps with different jumping targets and model uncertainties.
Submitted 20 July, 2024;
originally announced July 2024.
-
MetaAug: Meta-Data Augmentation for Post-Training Quantization
Authors:
Cuong Pham,
Hoang Anh Dung,
Cuong C. Nguyen,
Trung Le,
Dinh Phung,
Gustavo Carneiro,
Thanh-Toan Do
Abstract:
Post-Training Quantization (PTQ) has received significant attention because it requires only a small set of calibration data to quantize a full-precision model, which is more practical in real-world applications in which full access to a large training set is not available. However, it often leads to overfitting on the small calibration dataset. Several methods have been proposed to address this issue, yet they still rely on only the calibration set for the quantization and they do not validate the quantized model due to the lack of a validation set. In this work, we propose a novel meta-learning based approach to enhance the performance of post-training quantization. Specifically, to mitigate the overfitting problem, instead of only training the quantized model using the original calibration set without any validation during the learning process as in previous PTQ works, in our approach, we both train and validate the quantized model using two different sets of images. In particular, we propose a meta-learning based approach to jointly optimize a transformation network and a quantized model through bi-level optimization. The transformation network modifies the original calibration data and the modified data will be used as the training set to learn the quantized model with the objective that the quantized model achieves a good performance on the original calibration data. Extensive experiments on the widely used ImageNet dataset with different neural network architectures demonstrate that our approach outperforms the state-of-the-art PTQ methods.
Submitted 27 July, 2024; v1 submitted 19 July, 2024;
originally announced July 2024.
-
A remark on the Hölder regularity of solutions to the complex Hessian equation
Authors:
Slawomir Kolodziej,
Ngoc Cuong Nguyen
Abstract:
We prove that the Dirichlet problem for the complex Hessian equation has a Hölder continuous solution provided it has a subsolution with this property. Compared to the previous results of Benali-Zeriahi and Charabati-Zeriahi, we remove the assumption on the finite total mass of the measure on the right-hand side.
Submitted 17 July, 2024;
originally announced July 2024.
-
Angular dependent measurement of electron-ion recombination in liquid argon for ionization calorimetry in the ICARUS liquid argon time projection chamber
Authors:
ICARUS collaboration,
P. Abratenko,
N. Abrego-Martinez,
A. Aduszkiewic,
F. Akbar,
L. Aliaga Soplin,
M. Artero Pons,
J. Asaadi,
W. F. Badgett,
B. Baibussinov,
B. Behera,
V. Bellini,
R. Benocci,
J. Berger,
S. Berkman,
S. Bertolucci,
M. Betancourt,
M. Bonesini,
T. Boone,
B. Bottino,
A. Braggiotti,
D. Brailsford,
S. J. Brice,
V. Brio,
C. Brizzolari
, et al. (156 additional authors not shown)
Abstract:
This paper reports on a measurement of electron-ion recombination in liquid argon in the ICARUS liquid argon time projection chamber (LArTPC). A clear dependence of recombination on the angle of the ionizing particle track relative to the drift electric field is observed. An ellipsoid modified box (EMB) model of recombination describes the data across all measured angles. These measurements are used for the calorimetric energy scale calibration of the ICARUS TPC, which is also presented. The impact of the EMB model is studied on calorimetric particle identification, as well as muon and proton energy measurements. Accounting for the angular dependence in EMB recombination improves the accuracy and precision of these measurements.
Submitted 9 August, 2024; v1 submitted 17 July, 2024;
originally announced July 2024.
-
Swift-BAT GUANO follow-up of gravitational-wave triggers in the third LIGO-Virgo-KAGRA observing run
Authors:
Gayathri Raman,
Samuele Ronchini,
James Delaunay,
Aaron Tohuvavohu,
Jamie A. Kennea,
Tyler Parsotan,
Elena Ambrosi,
Maria Grazia Bernardini,
Sergio Campana,
Giancarlo Cusumano,
Antonino D'Ai,
Paolo D'Avanzo,
Valerio D'Elia,
Massimiliano De Pasquale,
Simone Dichiara,
Phil Evans,
Dieter Hartmann,
Paul Kuin,
Andrea Melandri,
Paul O'Brien,
Julian P. Osborne,
Kim Page,
David M. Palmer,
Boris Sbarufatti,
Gianpiero Tagliaferri
, et al. (1797 additional authors not shown)
Abstract:
We present results from a search for X-ray/gamma-ray counterparts of gravitational-wave (GW) candidates from the third observing run (O3) of the LIGO-Virgo-KAGRA (LVK) network using the Swift Burst Alert Telescope (Swift-BAT). The search includes 636 GW candidates received in low latency, 86 of which have been confirmed by the offline analysis and included in the third cumulative Gravitational-Wave Transient Catalog (GWTC-3). Targeted searches were carried out on the entire GW sample using the maximum-likelihood NITRATES pipeline on the BAT data made available via the GUANO infrastructure. We do not detect any significant electromagnetic emission that is temporally and spatially coincident with any of the GW candidates. We report flux upper limits in the 15-350 keV band as a function of sky position for all the catalog candidates. For GW candidates where the Swift-BAT false alarm rate is less than 10$^{-3}$ Hz, we compute the GW-BAT joint false alarm rate. Finally, the derived Swift-BAT upper limits are used to infer constraints on the putative electromagnetic emission associated with binary black hole mergers.
Submitted 13 July, 2024;
originally announced July 2024.
-
Calibration and simulation of ionization signal and electronics noise in the ICARUS liquid argon time projection chamber
Authors:
ICARUS collaboration,
P. Abratenko,
N. Abrego-Martinez,
A. Aduszkiewic,
F. Akbar,
L. Aliaga Soplin,
M. Artero Pons,
J. Asaadi,
W. F. Badgett,
B. Baibussinov,
B. Behera,
V. Bellini,
R. Benocci,
J. Berger,
S. Berkman,
S. Bertolucci,
M. Betancourt,
M. Bonesini,
T. Boone,
B. Bottino,
A. Braggiotti,
D. Brailsford,
S. J. Brice,
V. Brio,
C. Brizzolari
, et al. (156 additional authors not shown)
Abstract:
The ICARUS liquid argon time projection chamber (LArTPC) neutrino detector has been taking physics data since 2022 as part of the Short-Baseline Neutrino (SBN) Program. This paper details the equalization of the response to charge in the ICARUS time projection chamber (TPC), as well as data-driven tuning of the simulation of ionization charge signals and electronics noise. The equalization procedure removes non-uniformities in the ICARUS TPC response to charge in space and time. This work leverages the copious number of cosmic ray muons available to ICARUS at the surface. The ionization signal shape simulation applies a novel procedure that tunes the simulation to match what is measured in data. The end result of the equalization procedure and simulation tuning allows for a comparison of charge measurements in ICARUS between Monte Carlo simulation and data, showing good performance with minimal residual bias between the two.
Submitted 5 August, 2024; v1 submitted 16 July, 2024;
originally announced July 2024.
-
Can virtual staining for high-throughput screening generalize?
Authors:
Samuel Tonks,
Cuong Nguyen,
Steve Hood,
Ryan Musso,
Ceridwen Hopely,
Steve Titus,
Minh Doan,
Iain Styles,
Alexander Krull
Abstract:
The large volume and variety of imaging data from high-throughput screening (HTS) in the pharmaceutical industry present an excellent resource for training virtual staining models. However, the potential of models trained under one set of experimental conditions to generalize to other conditions remains underexplored. This study systematically investigates whether data from three cell types (lung, ovarian, and breast) and two phenotypes (toxic and non-toxic conditions) commonly found in HTS can effectively train virtual staining models to generalize across three typical HTS distribution shifts: unseen phenotypes, unseen cell types, and the combination of both. Utilizing a dataset of 772,416 paired bright-field, cytoplasm, nuclei, and DNA-damage stain images, we evaluate the generalization capabilities of models across pixel-based, instance-wise, and biological-feature-based levels. Our findings indicate that training virtual nuclei and cytoplasm models on non-toxic condition samples not only generalizes to toxic condition samples but leads to improved performance across all evaluation levels compared to training on toxic condition samples. Generalization to unseen cell types shows variability depending on the cell type; models trained on ovarian or lung cell samples often perform well under other conditions, while those trained on breast cell samples consistently show poor generalization. Generalization to unseen cell types and phenotypes shows good generalization across all levels of evaluation compared to addressing unseen cell types alone. This study represents the first large-scale, data-centric analysis of the generalization capability of virtual staining models trained on diverse HTS datasets, providing valuable strategies for experimental training data generation.
Submitted 13 August, 2024; v1 submitted 9 July, 2024;
originally announced July 2024.
-
Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
Authors:
Thong Nguyen,
Yi Bin,
Xiaobao Wu,
Xinshuai Dong,
Zhiyuan Hu,
Khoi Le,
Cong-Duy Nguyen,
See-Kiong Ng,
Luu Anh Tuan
Abstract:
Data quality stands at the forefront of deciding the effectiveness of video-language representation learning. However, video-text pairs in previous data typically do not align perfectly with each other, which might lead to video-language representations that do not accurately reflect cross-modal semantics. Moreover, previous data also possess an uneven distribution of concepts, thereby hampering the downstream performance across unpopular subjects. To address these problems, we propose a contrastive objective with a subtractive angular margin to regularize cross-modal representations in their effort to reach perfect similarity. Furthermore, to adapt to the non-uniform concept distribution, we propose a multi-layer perceptron (MLP)-parameterized weighting function that maps loss values to sample weights, enabling dynamic adjustment of the model's focus throughout training. With the training guided by a small amount of unbiased meta-data and augmented by video-text data generated by a large vision-language model, we improve video-language representations and achieve superior performance on commonly used video question answering and text-video retrieval datasets.
Submitted 19 July, 2024; v1 submitted 4 July, 2024;
originally announced July 2024.
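The subtractive angular margin mentioned in the abstract above can be illustrated with a minimal sketch. This is one plausible reading, not the authors' formulation: the angle between a matched video-text pair is reduced by a margin before the cosine is taken, so the positive pair is not pushed all the way to perfect similarity. The function names, the `margin` and `temperature` values, and the use of a one-directional (video-to-text) InfoNCE loss are all assumptions made for illustration; the MLP-based sample weighting is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def margin_contrastive_loss(videos, texts, margin=0.2, temperature=0.1):
    """Video-to-text InfoNCE loss where the matched pair's angle is reduced by `margin`.

    Hypothetical sketch: videos[i] and texts[i] are assumed to be a matched pair.
    """
    n = len(videos)
    total = 0.0
    for i in range(n):
        logits = []
        for j in range(n):
            cos_sim = max(-1.0, min(1.0, cosine(videos[i], texts[j])))
            if i == j:
                # Subtract the margin from the angle, never going below zero.
                angle = max(math.acos(cos_sim) - margin, 0.0)
                cos_sim = math.cos(angle)
            logits.append(cos_sim / temperature)
        # Numerically stable log-sum-exp over all candidate texts.
        top = max(logits)
        log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_norm - logits[i]
    return total / n

# Toy embeddings (made up for the example).
videos = [(1.0, 0.0), (0.0, 1.0)]
texts = [(0.8, 0.6), (0.3, 0.9)]
loss_with_margin = margin_contrastive_loss(videos, texts, margin=0.2)
loss_without_margin = margin_contrastive_loss(videos, texts, margin=0.0)
```

Under this reading the margin raises the positive logit, so imperfectly aligned pairs contribute a smaller loss than they would without it.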
-
Model and Feature Diversity for Bayesian Neural Networks in Mutual Learning
Authors:
Cuong Pham,
Cuong C. Nguyen,
Trung Le,
Dinh Phung,
Gustavo Carneiro,
Thanh-Toan Do
Abstract:
Bayesian Neural Networks (BNNs) offer probability distributions for model parameters, enabling uncertainty quantification in predictions. However, they often underperform compared to deterministic neural networks. Utilizing mutual learning can effectively enhance the performance of peer BNNs. In this paper, we propose a novel approach to improve BNN performance through deep mutual learning. The proposed approach aims to increase diversity in both network parameter distributions and feature distributions, encouraging peer networks to acquire distinct features that capture different characteristics of the input, which enhances the effectiveness of mutual learning. Experimental results demonstrate significant improvements in the classification accuracy, negative log-likelihood, and expected calibration error when compared to traditional mutual learning for BNNs.
Submitted 2 July, 2024;
originally announced July 2024.
-
Supporters and Skeptics: LLM-based Analysis of Engagement with Mental Health (Mis)Information Content on Video-sharing Platforms
Authors:
Viet Cuong Nguyen,
Mini Jain,
Abhijat Chauhan,
Heather Jaime Soled,
Santiago Alvarez Lesmes,
Zihang Li,
Michael L. Birnbaum,
Sunny X. Tang,
Srijan Kumar,
Munmun De Choudhury
Abstract:
Over one in five adults in the US lives with a mental illness. In the face of a shortage of mental health professionals and offline resources, online short-form video content has grown to serve as a crucial conduit for disseminating mental health help and resources. However, the ease of content creation and access also contributes to the spread of misinformation, posing risks to accurate diagnosis and treatment. Detecting and understanding engagement with such content is crucial to mitigating its harmful effects on public health. We perform the first quantitative study of the phenomenon using YouTube Shorts and Bitchute as the sites of study. We contribute MentalMisinfo, a novel labeled mental health misinformation (MHMisinfo) dataset of 739 videos (639 from YouTube and 100 from Bitchute) and 135,372 comments in total, using an expert-driven annotation schema. We first find that few-shot in-context learning with large language models (LLMs) is effective in detecting MHMisinfo videos. Next, we discover distinct and potentially alarming linguistic patterns in how audiences engage with MHMisinfo videos through commentary on both video-sharing platforms. Across the two platforms, comments could exacerbate prevailing stigma, with some groups showing heightened susceptibility to and alignment with MHMisinfo. We discuss technical and public health-driven adaptive solutions to tackling the "epidemic" of mental health misinformation online.
Submitted 2 July, 2024;
originally announced July 2024.
-
SOAF: Scene Occlusion-aware Neural Acoustic Field
Authors:
Huiyu Gao,
Jiahao Ma,
David Ahmedt-Aristizabal,
Chuong Nguyen,
Miaomiao Liu
Abstract:
This paper tackles the problem of novel view audio-visual synthesis along an arbitrary trajectory in an indoor scene, given the audio-video recordings from other known trajectories of the scene. Existing methods often overlook the effect of room geometry, particularly wall occlusion to sound propagation, making them less accurate in multi-room environments. In this work, we propose a new approach called Scene Occlusion-aware Acoustic Field (SOAF) for accurate sound generation. Our approach derives a prior for sound energy field using distance-aware parametric sound-propagation modelling and then transforms it based on scene transmittance learned from the input video. We extract features from the local acoustic field centred around the receiver using a Fibonacci Sphere to generate binaural audio for novel views with a direction-aware attention mechanism. Extensive experiments on the real dataset RWAVS and the synthetic dataset SoundSpaces demonstrate that our method outperforms previous state-of-the-art techniques in audio generation. Project page: https://github.com/huiyu-gao/SOAF/.
Submitted 2 July, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
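The SOAF abstract above mentions sampling the local acoustic field around the receiver with a Fibonacci sphere. A minimal sketch of the standard Fibonacci-sphere construction follows; it only generates the sampling directions and is not taken from the SOAF code base. The function name `fibonacci_sphere` and the choice of 64 directions are assumptions for illustration.

```python
import math

def fibonacci_sphere(n):
    """Return n approximately evenly spaced unit vectors (x, y, z) on the sphere."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))  # about 2.39996 rad
    points = []
    for i in range(n):
        # Heights are equally spaced, from just below +1 down to just above -1.
        z = 1.0 - (2.0 * i + 1.0) / n
        # Radius of the horizontal circle at height z.
        radius = math.sqrt(max(0.0, 1.0 - z * z))
        # Each successive point is rotated by the golden angle around the z axis.
        theta = golden_angle * i
        points.append((radius * math.cos(theta), radius * math.sin(theta), z))
    return points

# Hypothetical usage: 64 query directions centred on a receiver position.
directions = fibonacci_sphere(64)
```

Equal spacing in height combined with golden-angle rotation gives near-uniform coverage without the pole clustering of a latitude-longitude grid.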
-
AI-powered multimodal modeling of personalized hemodynamics in aortic stenosis
Authors:
Caglar Ozturk,
Daniel H. Pak,
Luca Rosalia,
Debkalpa Goswami,
Mary E. Robakowski,
Raymond McKay,
Christopher T. Nguyen,
James S. Duncan,
Ellen T. Roche
Abstract:
Aortic stenosis (AS) is the most common valvular heart disease in developed countries. High-fidelity preclinical models can improve AS management by enabling therapeutic innovation, early diagnosis, and tailored treatment planning. However, their use is currently limited by complex workflows necessitating lengthy expert-driven manual operations. Here, we propose an AI-powered computational framework for accelerated and democratized patient-specific modeling of AS hemodynamics from computed tomography. First, we demonstrate that our automated meshing algorithms can generate task-ready geometries for both computational and benchtop simulations with higher accuracy and 100 times faster than existing approaches. Then, we show that our approach can be integrated with fluid-structure interaction and soft robotics models to accurately recapitulate a broad spectrum of clinical hemodynamic measurements of diverse AS patients. The efficiency and reliability of these algorithms make them an ideal complementary tool for personalized high-fidelity modeling of AS biomechanics, hemodynamics, and treatment planning.
Submitted 29 June, 2024;
originally announced July 2024.
-
Wafer-Scale Fabrication of InGaP-on-Insulator for Nonlinear and Quantum Photonic Applications
Authors:
Lillian Thiel,
Joshua E. Castro,
Trevor J. Steiner,
Catherine L. Nguyen,
Audrey Pechilis,
Liao Duan,
Nicholas Lewis,
Garrett D. Cole,
John E. Bowers,
Galan Moody
Abstract:
The development of manufacturable and scalable integrated nonlinear photonic materials is driving key technologies in diverse areas such as high-speed communications, signal processing, sensing, and quantum information. Here, we demonstrate a novel nonlinear platform -- InGaP-on-insulator -- optimized for visible-to-telecommunication wavelength $χ^{\left(2\right)}$ nonlinear optical processes. In this work, we detail our 100-mm wafer-scale InGaP-on-insulator fabrication process realized via wafer bonding, optical lithography, and dry-etching techniques. The resulting wafers yield 1000s of components in each fabrication cycle, with initial designs that include chip-to-fiber couplers, 12.5-cm-long nested spiral waveguides, and arrays of microring resonators with free-spectral ranges spanning 400-900 GHz. We demonstrate intrinsic resonator quality factors as high as 324,000 (440,000) for single-resonance (split-resonance) modes near 1550 nm corresponding to 1.56 dB cm$^{-1}$ (1.22 dB cm$^{-1}$) propagation loss. We analyze the loss versus waveguide width and resonator radius to establish the operating regime for optimal 775-to-1550 nm phase matching. By combining the high $χ^{\left(2\right)}$ and $χ^{\left(3\right)}$ optical nonlinearity of InGaP with wafer-scale fabrication and low propagation loss, these results open promising possibilities for entangled-photon, multi-photon, and squeezed light generation.
Submitted 26 June, 2024;
originally announced June 2024.
-
Demonstration of neutron identification in neutrino interactions in the MicroBooNE liquid argon time projection chamber
Authors:
MicroBooNE collaboration,
P. Abratenko,
O. Alterkait,
D. Andrade Aldana,
L. Arellano,
J. Asaadi,
A. Ashkenazi,
S. Balasubramanian,
B. Baller,
A. Barnard,
G. Barr,
D. Barrow,
J. Barrow,
V. Basque,
J. Bateman,
O. Benevides Rodrigues,
S. Berkman,
A. Bhanderi,
A. Bhat,
M. Bhattacharya,
M. Bishai,
A. Blake,
B. Bogart,
T. Bolton,
J. Y. Book
, et al. (165 additional authors not shown)
Abstract:
A significant challenge in measurements of neutrino oscillations is reconstructing the incoming neutrino energies. While modern fully-active tracking calorimeters such as liquid argon time projection chambers in principle allow the measurement of all final state particles above some detection threshold, undetected neutrons remain a considerable source of missing energy with little to no data constraining their production rates and kinematics. We present the first demonstration of tagging neutrino-induced neutrons in liquid argon time projection chambers using secondary protons emitted from neutron-argon interactions in the MicroBooNE detector. We describe the method developed to identify neutrino-induced neutrons and demonstrate its performance using neutrons produced in muon-neutrino charged current interactions. The method is validated using a small subset of MicroBooNE's total dataset. The selection yields a sample with $60\%$ of selected tracks corresponding to neutron-induced secondary protons.
Submitted 15 June, 2024;
originally announced June 2024.
-
Scintillation Light in SBND: Simulation, Reconstruction, and Expected Performance of the Photon Detection System
Authors:
SBND Collaboration,
P. Abratenko,
R. Acciarri,
C. Adams,
L. Aliaga-Soplin,
O. Alterkait,
R. Alvarez-Garrote,
C. Andreopoulos,
A. Antonakis,
L. Arellano,
J. Asaadi,
W. Badgett,
S. Balasubramanian,
V. Basque,
A. Beever,
B. Behera,
E. Belchior,
M. Betancourt,
A. Bhat,
M. Bishai,
A. Blake,
B. Bogart,
J. Bogenschuetz,
D. Brailsford,
A. Brandt
, et al. (158 additional authors not shown)
Abstract:
SBND is the near detector of the Short-Baseline Neutrino program at Fermilab. Its location near the Booster Neutrino Beam source and its relatively large mass will allow the study of neutrino interactions on argon with unprecedented statistics. This paper describes the expected performance of the SBND photon detection system, using a simulated sample of beam neutrinos and cosmogenic particles. Its design is a dual readout concept combining a system of 120 photomultiplier tubes, used for triggering, with a system of 192 X-ARAPUCA devices, located behind the anode wire planes. Furthermore, covering the cathode plane with highly reflective panels coated with a wavelength-shifting compound recovers part of the light emitted towards the cathode, where no optical detectors exist. We show how this new design provides a high light yield and a more uniform detection efficiency, an excellent timing resolution, and an independent 3D-position reconstruction using only the scintillation light. Finally, the whole reconstruction chain is applied to recover the temporal structure of the beam spill, which is resolved with a resolution on the order of nanoseconds.
Submitted 11 June, 2024;
originally announced June 2024.
-
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Authors:
Thong Nguyen,
Yi Bin,
Junbin Xiao,
Leigang Qu,
Yicong Li,
Jay Zhangjie Wu,
Cong-Duy Nguyen,
See-Kiong Ng,
Luu Anh Tuan
Abstract:
Humans use multiple senses to comprehend the environment. Vision and language are two of the most vital senses since they allow us to easily communicate our thoughts and perceive the world around us. There has been a lot of interest in creating video-language understanding systems with human-like senses since a video-language pair can mimic both our linguistic medium and visual environment with temporal dynamics. In this survey, we review the key tasks of these systems and highlight the associated challenges. Based on the challenges, we summarize their methods from model architecture, model training, and data perspectives. We also conduct performance comparison among the methods, and discuss promising directions for future research.
Submitted 1 July, 2024; v1 submitted 8 June, 2024;
originally announced June 2024.
-
A Survey on Intelligent Internet of Things: Applications, Security, Privacy, and Future Directions
Authors:
Ons Aouedi,
Thai-Hoc Vu,
Alessio Sacco,
Dinh C. Nguyen,
Kandaraj Piamrat,
Guido Marchetto,
Quoc-Viet Pham
Abstract:
The rapid advances in the Internet of Things (IoT) have promoted a revolution in communication technology and offered various customer services. Artificial intelligence (AI) techniques have been exploited to facilitate IoT operations and maximize their potential in modern application scenarios. In particular, the convergence of IoT and AI has led to a new networking paradigm called Intelligent IoT (IIoT), which has the potential to significantly transform businesses and industrial domains. This paper presents a comprehensive survey of IIoT by investigating its significant applications in mobile networks, as well as its associated security and privacy issues. Specifically, we explore and discuss the roles of IIoT in a wide range of key application domains, from smart healthcare and smart cities to smart transportation and smart industries. Through such extensive discussions, we investigate important security issues in IIoT networks, where network attacks, confidentiality, integrity, and intrusion are analyzed, along with a discussion of potential countermeasures. Privacy issues in IIoT networks were also surveyed and discussed, including data, location, and model privacy leakage. Finally, we outline several key challenges and highlight potential research directions in this important area.
Submitted 21 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
Encoding and Controlling Global Semantics for Long-form Video Question Answering
Authors:
Thong Thanh Nguyen,
Zhiyuan Hu,
Xiaobao Wu,
Cong-Duy T Nguyen,
See-Kiong Ng,
Anh Tuan Luu
Abstract:
Seeking answers effectively for long videos is essential for building video question answering (videoQA) systems. Previous methods adaptively select frames and regions from long videos to save computation. However, this fails to reason over the whole video sequence, leading to sub-optimal performance. To address this problem, we introduce a state space layer (SSL) into a multi-modal Transformer to efficiently integrate global semantics of the video, which mitigates the video information loss caused by frame and region selection modules. Our SSL includes a gating unit to enable controllability over the flow of global semantics into visual representations. To further enhance the controllability, we introduce a cross-modal compositional congruence (C^3) objective to encourage global semantics to align with the question. To rigorously evaluate long-form videoQA capacity, we construct two new benchmarks, Ego-QA and MAD-QA, featuring videos of considerable length, i.e., 17.5 minutes and 1.9 hours, respectively. Extensive experiments demonstrate the superiority of our framework on these new as well as existing datasets.
Submitted 30 May, 2024;
originally announced May 2024.
-
Fully parallel implementation of digital memcomputing on FPGA
Authors:
Dyk Chung Nguyen,
Yuriy V. Pershin
Abstract:
We present a fully parallel digital memcomputing solver implemented on a field-programmable gate array (FPGA) board. For this purpose, we have designed an FPGA code that solves the ordinary differential equations associated with digital memcomputing in parallel. A feature of the code is the use of only integer-type variables and integer constants to enhance optimization. Consequently, each integration step in our solver is executed in 96 ns. This method was utilized for difficult instances of the Boolean satisfiability (SAT) problem close to a phase transition, involving up to about 150 variables. Our results demonstrate that the parallel implementation reduces the scaling exponent by about 1 compared to a sequential C++ code on a standard computer. Additionally, compared to C++ code, we observed a time-to-solution advantage of about three orders of magnitude. Given the limitations of FPGA resources, the current implementation of digital memcomputing will be especially useful for solving compact but challenging problems.
Submitted 23 May, 2024;
originally announced May 2024.
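The abstract above notes that the solver integrates its differential equations using only integer variables and constants. The idea can be sketched with fixed-point arithmetic on a toy equation. This is not the memcomputing dynamics or the authors' FPGA code: the test equation dx/dt = -x, the 16-bit fractional format, and the names `SCALE_BITS`, `STEP_SHIFT`, and `euler_step` are all assumptions chosen for illustration.

```python
SCALE_BITS = 16   # hypothetical fixed-point format: real value = integer / 2**16
STEP_SHIFT = 6    # time step dt = 2**-6, applied as a right shift instead of a multiply

def euler_step(x_fixed):
    """One forward-Euler step of dx/dt = -x using only integer subtract and shift."""
    return x_fixed - (x_fixed >> STEP_SHIFT)

x = 1 << SCALE_BITS                 # initial condition x(0) = 1.0
for _ in range(1 << STEP_SHIFT):    # 64 steps of size 1/64 integrate to t = 1
    x = euler_step(x)

# Conversion to floating point happens only for inspection, outside the update loop.
x_at_t1 = x / (1 << SCALE_BITS)
```

Choosing a power-of-two step size lets the multiplication by dt become a bit shift, which is the kind of operation that maps cheaply onto FPGA logic.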
-
Quantum-symmetric equivalence is a graded Morita invariant
Authors:
Hongdi Huang,
Van C. Nguyen,
Padmini Veerapen,
Kent B. Vashaw,
Xingting Wang
Abstract:
We show that if two $m$-homogeneous algebras have Morita equivalent graded module categories, then they are quantum-symmetrically equivalent, that is, there is a monoidal equivalence between the categories of comodules for their associated universal quantum groups (in the sense of Manin) which sends one algebra to the other. As a consequence, any Zhang twist of an $m$-homogeneous algebra is a 2-cocycle twist by some 2-cocycle from its Manin's universal quantum group.
Submitted 20 May, 2024;
originally announced May 2024.
-
Industrial Metaverse: Enabling Technologies, Open Problems, and Future Trends
Authors:
Shiying Zhang,
Jun Li,
Long Shi,
Ming Ding,
Dinh C. Nguyen,
Wen Chen,
Zhu Han
Abstract:
As an emerging technology that enables seamless integration between the physical and virtual worlds, the Metaverse has great potential to be deployed in the industrial production field with the development of extended reality (XR) and next-generation communication networks. This deployment, called the Industrial Metaverse, is used for product design, production operations, industrial quality inspection, and product testing. However, there is a lack of in-depth understanding of the enabling technologies associated with the Industrial Metaverse. This encompasses both the precise industrial scenarios targeted by each technology and the potential migration of technologies developed in other domains to the industrial sector. Driven by this issue, in this article, we conduct a comprehensive survey of the state-of-the-art literature on the Industrial Metaverse. Specifically, we first analyze the advantages of the Metaverse for industrial production. Then, we review a collection of key enabling technologies of the Industrial Metaverse, including blockchain (BC), digital twin (DT), 6G, XR, and artificial intelligence (AI), and analyze how these technologies can support different aspects of industrial production. Subsequently, we present numerous formidable challenges encountered within the Industrial Metaverse, including confidentiality and security concerns, resource limitations, and interoperability constraints. Furthermore, we investigate the extant solutions devised to address them. Finally, we briefly outline several open issues and future research directions of the Industrial Metaverse.
Submitted 14 May, 2024;
originally announced May 2024.
-
Extended State Observer for Mismatch Disturbances Using Taylor Approximation of the Integral
Authors:
Cuong Duc Nguyen
Abstract:
The development of disturbance estimators using extended state observers (ESOs) typically assumes that the system is observable. This paper introduces an improved method for systems that are initially unobservable, leveraging Taylor expansion to approximate the integral of disturbance dynamics. A new extended system is formulated based on this approximation, enabling the design of an observer that achieves exponential stability of the error dynamics. The proposed method's efficacy is demonstrated through a practical example, highlighting its potential for robust disturbance estimation in dynamic systems.
Submitted 5 May, 2024;
originally announced May 2024.
-
Environment-adaptive machine learning potentials
Authors:
Ngoc Cuong Nguyen,
Dionysios Sema
Abstract:
The development of interatomic potentials that can accurately capture a wide range of physical phenomena and diverse environments is of significant interest, but it presents a formidable challenge. This challenge arises from the numerous structural forms, multiple phases, complex intramolecular and intermolecular interactions, and varying external conditions. In this paper, we present a method to construct environment-adaptive interatomic potentials by adapting to the local atomic environment of each atom within a system. The collection of atomic environments of interest is partitioned into several clusters of atomic environments. Each cluster represents a distinctive local environment and is used to define a corresponding local potential. We introduce a many-body many-potential expansion to smoothly blend these local potentials to ensure global continuity of the potential energy surface. This is achieved by computing the probability functions that determine the likelihood of an atom belonging to each cluster. We apply the environment-adaptive machine learning potentials to predict observable properties for elemental Ta and the compound InP, and compare them with density functional theory calculations.
Submitted 29 July, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
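The blending step described in the abstract above, where probability functions weight several local potentials, can be sketched in a few lines. This is a toy illustration rather than the authors' many-body many-potential expansion: the one-dimensional descriptor, the softmax over squared distances to cluster centroids, the `sharpness` parameter, and the two quadratic local potentials are all assumptions introduced for the example.

```python
import math

def cluster_probabilities(descriptor, centroids, sharpness=4.0):
    """Smooth membership probabilities of one atomic environment in each cluster.

    Hypothetical choice: softmax over negative squared distance to each centroid.
    """
    scores = [-sharpness * (descriptor - c) ** 2 for c in centroids]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def blended_energy(descriptor, centroids, local_potentials):
    """Probability-weighted sum of the local potentials evaluated at the descriptor."""
    probs = cluster_probabilities(descriptor, centroids)
    return sum(p * potential(descriptor) for p, potential in zip(probs, local_potentials))

# Two made-up clusters, each with its own local potential.
centroids = [0.0, 2.0]
local_potentials = [
    lambda d: d ** 2,
    lambda d: 1.0 + 0.5 * (d - 2.0) ** 2,
]

energy_at_first_centroid = blended_energy(0.0, centroids, local_potentials)
energy_midway = blended_energy(1.0, centroids, local_potentials)
```

Because the probabilities vary smoothly with the descriptor, the blended energy stays continuous as an environment moves from one cluster to another, which is the property the abstract emphasizes.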
-
Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods
Authors:
Steven Shave,
Richard Kasprowicz,
Abdullah M. Athar,
Denise Vlachou,
Neil O. Carragher,
Cuong Q. Nguyen
Abstract:
The Connectivity Map (CMap) is a large publicly available database of cellular transcriptomic responses to chemical and genetic perturbations built using a standardized acquisition protocol known as the L1000 technique. Databases such as CMap provide an exciting opportunity to enrich drug discovery efforts, providing a 'known' phenotypic landscape to explore and enabling the development of state of the art techniques for enhanced information extraction and better informed decisions. Whilst multiple methods for measuring phenotypic similarity and interrogating profiles have been developed, the field is severely lacking standardized benchmarks using appropriate data splitting for training and unbiased evaluation of machine learning methods. To address this, we have developed 'Leak Proof CMap' and exemplified its application to a set of common transcriptomic and generic phenotypic similarity methods along with an exemplar triplet loss-based method. Benchmarking in three critical performance areas (compactness, distinctness, and uniqueness) is conducted using carefully crafted data splits ensuring no similar cell lines or treatments with shared or closely matching responses or mechanisms of action are present in training, validation, or test sets. This enables testing of models with unseen samples akin to exploring treatments with novel modes of action in novel patient derived cell lines. With a carefully crafted benchmark and data splitting regime in place, the tooling now exists to create performant phenotypic similarity methods for use in personalized medicine (novel cell lines) and to better augment high throughput phenotypic screening technologies with the L1000 transcriptomic technology.
Submitted 29 April, 2024;
originally announced April 2024.
-
Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation
Authors:
Hoang Chuong Nguyen,
Tianyu Wang,
Jose M. Alvarez,
Miaomiao Liu
Abstract:
This paper focuses on self-supervised monocular depth estimation in dynamic scenes trained on monocular videos. Existing methods jointly estimate pixel-wise depth and motion, relying mainly on an image reconstruction loss. Dynamic regions remain a critical challenge for these methods due to the inherent ambiguity in depth and motion estimation, resulting in inaccurate depth estimation. This paper proposes a self-supervised training framework exploiting pseudo depth labels for dynamic regions from training data. The key contribution of our framework is to decouple depth estimation for static and dynamic regions of images in the training data. We start with an unsupervised depth estimation approach, which provides reliable depth estimates for static regions and motion cues for dynamic regions and allows us to extract moving object information at the instance level. In the next stage, we use an object network to estimate the depth of those moving objects assuming rigid motions. Then, we propose a new scale alignment module to address the scale ambiguity between estimated depths for static and dynamic regions. We can then use the depth labels generated to train an end-to-end depth estimation network and improve its performance. Extensive experiments on the Cityscapes and KITTI datasets show that our self-training strategy consistently outperforms existing self/unsupervised depth estimation methods.
Submitted 23 April, 2024;
originally announced April 2024.
-
HashPoint: Accelerated Point Searching and Sampling for Neural Rendering
Authors:
Jiahao Ma,
Miaomiao Liu,
David Ahmedt-Aristizaba,
Chuong Nguyen
Abstract:
In this paper, we address the problem of efficient point searching and sampling for volume neural rendering. Within this realm, two typical approaches are employed: rasterization and ray tracing. The rasterization-based methods enable real-time rendering at the cost of increased memory and lower fidelity. In contrast, the ray-tracing-based methods yield superior quality but demand longer rendering time. We solve this problem by our HashPoint method combining these two strategies, leveraging rasterization for efficient point searching and sampling, and ray marching for rendering. Our method optimizes point searching by rasterizing points within the camera's view, organizing them in a hash table, and facilitating rapid searches. Notably, we accelerate the rendering process by adaptive sampling on the primary surface encountered by the ray. Our approach yields substantial speed-up for a range of state-of-the-art ray-tracing-based methods, maintaining equivalent or superior accuracy across synthetic and real test datasets. The code will be available at https://jiahao-ma.github.io/hashpoint/.
Submitted 11 May, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Enhancing Q&A with Domain-Specific Fine-Tuning and Iterative Reasoning: A Comparative Study
Authors:
Zooey Nguyen,
Anthony Annunziata,
Vinh Luong,
Sang Dinh,
Quynh Le,
Anh Hai Ha,
Chanh Le,
Hong An Phan,
Shruti Raghavan,
Christopher Nguyen
Abstract:
This paper investigates the impact of domain-specific model fine-tuning and of reasoning mechanisms on the performance of question-answering (Q&A) systems powered by large language models (LLMs) and Retrieval-Augmented Generation (RAG). Using the FinanceBench SEC financial filings dataset, we observe that, for RAG, combining a fine-tuned embedding model with a fine-tuned LLM achieves better accuracy than generic models, with relatively greater gains attributable to fine-tuned embedding models. Additionally, employing reasoning iterations on top of RAG delivers an even bigger jump in performance, enabling the Q&A systems to get closer to human-expert quality. We discuss the implications of such findings, propose a structured technical design space capturing major technical components of Q&A AI, and provide recommendations for making high-impact technical choices for such components. We plan to follow up on this work with actionable guides for AI teams and further investigations into the impact of domain-specific augmentation in RAG and into agentic AI capabilities such as advanced planning and reasoning.
Submitted 19 April, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
Homography Guided Temporal Fusion for Road Line and Marking Segmentation
Authors:
Shan Wang,
Chuong Nguyen,
Jiawei Liu,
Kaihao Zhang,
Wenhan Luo,
Yanhao Zhang,
Sundaram Muthu,
Fahira Afzal Maken,
Hongdong Li
Abstract:
Reliable segmentation of road lines and markings is critical to autonomous driving. Our work is motivated by the observations that road lines and markings are (1) frequently occluded in the presence of moving vehicles, shadow, and glare and (2) highly structured with low intra-class shape variance and overall high appearance consistency. To solve these issues, we propose a Homography Guided Fusion (HomoFusion) module to exploit temporally-adjacent video frames for complementary cues facilitating the correct classification of the partially occluded road lines or markings. To reduce computational complexity, a novel surface normal estimator is proposed to establish spatial correspondences between the sampled frames, allowing the HomoFusion module to perform a pixel-to-pixel attention mechanism in updating the representation of the occluded road lines or markings. Experiments on ApolloScape, a large-scale lane mark segmentation dataset, and ApolloScape Night with artificial simulated night-time road conditions, demonstrate that our method outperforms other existing SOTA lane mark segmentation models with less than 9% of their parameters and computational complexity. We show that exploiting available camera intrinsic data and ground plane assumption for cross-frame correspondence can lead to a light-weight network with significantly improved performances in speed and accuracy. We also prove the versatility of our HomoFusion approach by applying it to the problem of water puddle segmentation and achieving SOTA performance.
Submitted 11 April, 2024;
originally announced April 2024.
-
Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model
Authors:
Qi Gou,
Cam-Tu Nguyen
Abstract:
Large Language Models (LLMs) have become increasingly popular due to their ability to process and generate natural language. However, as they are trained on massive datasets of text, LLMs can inherit harmful biases and produce outputs that are not aligned with human values. This paper studies two main approaches to LLM alignment: Reinforcement Learning with Human Feedback (RLHF) and contrastive learning-based methods like Direct Preference Optimization (DPO). By analyzing the stability and robustness of RLHF and DPO, we propose MPO (Mixed Preference Optimization), a novel method that mitigates the weaknesses of both approaches. Specifically, we propose a two-stage training procedure: first train DPO on an easy dataset, and then perform RLHF on a difficult set with DPO model being the reference model. Here, the easy and difficult sets are constructed by a well-trained reward model that splits response pairs into those with large gaps of reward (easy), and those with small gaps (difficult). The first stage allows us to obtain a relatively optimal policy (LLM) model quickly, whereas the second stage refines LLM with online RLHF, thus mitigating the distribution shift issue associated with DPO. Experiments are conducted on two public alignment datasets, namely HH-RLHF and TLDR, demonstrating the effectiveness of MPO, both in terms of GPT4 and human evaluation.
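The easy/difficult split described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `split_by_reward_gap`, the `threshold` parameter, and the word-count `toy_reward` stand-in for a trained reward model are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# (prompt, response_a, response_b); this tuple layout is an assumption.
Pair = Tuple[str, str, str]


def split_by_reward_gap(
    pairs: List[Pair],
    reward_fn: Callable[[str, str], float],
    threshold: float,
) -> Tuple[List[Pair], List[Pair]]:
    """Split response pairs into 'easy' (large reward gap) and
    'difficult' (small reward gap) sets.

    `threshold` is a hypothetical cut-off; the abstract does not say
    how the boundary between large and small gaps is chosen.
    """
    easy, difficult = [], []
    for prompt, resp_a, resp_b in pairs:
        gap = abs(reward_fn(prompt, resp_a) - reward_fn(prompt, resp_b))
        if gap >= threshold:
            easy.append((prompt, resp_a, resp_b))
        else:
            difficult.append((prompt, resp_a, resp_b))
    return easy, difficult


def toy_reward(prompt: str, response: str) -> float:
    """Placeholder reward: number of words in the response.
    A real setup would call a trained reward model here."""
    return float(len(response.split()))


pairs = [
    ("q1", "a detailed and complete answer here", "no"),
    ("q2", "short reply", "brief answer"),
]
easy, difficult = split_by_reward_gap(pairs, toy_reward, threshold=2.0)
```

Under the procedure in the abstract, the `easy` set would feed the first (DPO) stage and the `difficult` set the second (RLHF) stage.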
Submitted 28 March, 2024;
originally announced March 2024.
-
KDMCSE: Knowledge Distillation Multimodal Sentence Embeddings with Adaptive Angular margin Contrastive Learning
Authors:
Cong-Duy Nguyen,
Thong Nguyen,
Xiaobao Wu,
Anh Tuan Luu
Abstract:
Previous work on multimodal sentence embedding has proposed multimodal contrastive learning and achieved promising results. However, by taking the rest of the batch as negative samples without reviewing when forming contrastive pairs, those studies encountered many suspicious and noisy negative examples, significantly affecting the methods' overall performance. In this work, we propose KDMCSE (Knowledge Distillation Multimodal contrastive learning of Sentence Embeddings), a novel approach that enhances the discrimination and generalizability of multimodal representation and inherits the knowledge from the teacher model to learn the difference between positive and negative instances and via that, can detect noisy and wrong negative samples effectively before they are calculated in the contrastive objective. Furthermore, to overcome the limitation of modeling the variation within negative pairs, we introduce a new contrastive objective, AdapACSE (Adaptive Angular Margin Supervised Contrastive Learning for Multimodal sentence embeddings), that enhances the discriminative representation by strengthening the margin within the angular space while capturing varying semantics within the negative. Experimental results on widely used Semantic Textual Similarity (STS) benchmarks demonstrate the effectiveness of our approach.
Submitted 26 March, 2024;
originally announced March 2024.
-
Physics mechanisms of fines detachment and migration during CO2-water corefloods
Authors:
C. Nguyen,
G. Loi,
T. Russell,
Y. Yang,
N. N. Zulkifli,
M. I. Mahamad Amir,
A. A. Abdul Manap,
S. R. Mohd Shafian,
A. Badalyan,
P. Bedrikovetsky,
A. Zeinijahromi
Abstract:
One of the key risks for Carbon Capture and Storage (CCS) is injectivity decline. Evaporation of the connate brine in the near-wellbore region during CO2 injection may dry out the rock, mobilising clay particles whose migration reduces rock permeability and, consequently, well injectivity. Influx of the reservoir brine into the dried-up zone yields accumulation of precipitated salt and injectivity decline. This paper presents the results of eight coreflooding experiments aimed at investigating the effects of rock dry-out, fines migration, and salt precipitation during CO2 injection. Pressure drops across the cores, brine saturation, and produced clay fines concentration versus Pore Volume Injected (PVI) have been measured. All lab tests exhibit the following features: intensive fines production at the very beginning of the gas-water production period, followed by reduced-rate fines production during the overall evaporation period and continuous fines disappearance at the late stage; an abrupt increase in gas permeability in the middle of evaporation; and non-monotonic evaporation rate and pressure drop. To explain these phenomena, we distinguished three sequential regimes of fines detachment during two-phase displacement: (i) moving gas-water menisci; (ii) pendular rings of residual water; (iii) dry flux, and found that, for the conditions of our corefloods, detachment is possible in regime (i) only. Fines production during the overall evaporation period is explained by the simultaneous occurrence of the three regimes during unstable displacement of water by gas in micro-heterogeneous rock.
Submitted 21 March, 2024;
originally announced March 2024.
-
Emerging Technologies for 6G Non-Terrestrial-Networks: From Academia to Industrial Applications
Authors:
Cong T. Nguyen,
Yuris Mulya Saputra,
Nguyen Van Huynh,
Tan N. Nguyen,
Dinh Thai Hoang,
Diep N Nguyen,
Van-Quan Pham,
Miroslav Voznak,
Symeon Chatzinotas,
Dinh-Hieu Tran
Abstract:
Terrestrial networks form the fundamental infrastructure of modern communication systems, serving more than 4 billion users globally. However, terrestrial networks are facing a wide range of challenges, from coverage and reliability to interference and congestion. As the demands of the 6G era are expected to be much higher, it is crucial to address these challenges to ensure a robust and efficient communication infrastructure for the future. To address these problems, Non-terrestrial Network (NTN) has emerged to be a promising solution. NTNs are communication networks that leverage airborne (e.g., unmanned aerial vehicles) and spaceborne vehicles (e.g., satellites) to facilitate ultra-reliable communications and connectivity with high data rates and low latency over expansive regions. This article aims to provide a comprehensive survey on the utilization of network slicing, Artificial Intelligence/Machine Learning (AI/ML), and Open Radio Access Network (ORAN) to address diverse challenges of NTNs from the perspectives of both academia and industry. Particularly, we first provide an in-depth tutorial on NTN and the key enabling technologies including network slicing, AI/ML, and ORAN. Then, we provide a comprehensive survey on how network slicing and AI/ML have been leveraged to overcome the challenges that NTNs are facing. Moreover, we present how ORAN can be utilized for NTNs. Finally, we highlight important challenges, open issues, and future research directions of NTN in the 6G era.
Submitted 3 July, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Ultralight vector dark matter search using data from the KAGRA O3GK run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
R. Abbott,
H. Abe,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
O. D. Aguiar,
I. Aguilar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi
, et al. (1778 additional authors not shown)
Abstract:
Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for $U(1)_{B-L}$ gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the $U(1)_{B-L}$ gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is made difficult in this measurement by the short observation time compared to the auto-correlation time scale of DM.
Submitted 5 March, 2024;
originally announced March 2024.
-
The least primary factor of the multiplicative group
Authors:
Greg Martin,
Chau Nguyen
Abstract:
Let $S(n)$ denote the least primary factor in the primary decomposition of the multiplicative group $M_n = (\Bbb Z/n\Bbb Z)^\times$. We give an asymptotic formula, with order of magnitude $x/(\log x)^{1/2}$, for the counting function of those integers $n$ for which $S(n) \ne 2$. We also give an asymptotic formula, for any prime power $q$, for the counting function of those integers $n$ for which $S(n) = q$. This group-theoretic problem can be reduced to problems of counting integers with restrictions on their prime factors, allowing it to be addressed by classical techniques of analytic number theory.
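As a concrete illustration of $S(n)$, the sketch below computes the primary factors of $M_n$ from the standard structure of $(\Bbb Z/p^k\Bbb Z)^\times$ (cyclic of order $p^{k-1}(p-1)$ for odd $p$; trivial, $C_2$, or $C_2 \times C_{2^{k-2}}$ for $p = 2$). The function names and the choice to return `None` when $M_n$ is trivial ($n = 1, 2$) are assumptions made here and are not taken from the paper.

```python
def factorize(m):
    """Return the prime factorization of m as a list of (prime, exponent)."""
    out = []
    p = 2
    while p * p <= m:
        e = 0
        while m % p == 0:
            m //= p
            e += 1
        if e:
            out.append((p, e))
        p += 1
    if m > 1:
        out.append((m, 1))
    return out


def primary_factors(n):
    """Orders of the cyclic prime-power factors of (Z/nZ)^x."""
    orders = []
    for p, k in factorize(n):
        if p == 2:
            # (Z/2^k)^x: trivial for k = 1, C_2 for k = 2,
            # C_2 x C_{2^(k-2)} for k >= 3.
            if k == 2:
                orders.append(2)
            elif k >= 3:
                orders.extend([2, 2 ** (k - 2)])
        else:
            # (Z/p^k)^x is cyclic of order p^(k-1) * (p - 1); a cyclic
            # group splits into one factor per prime power of its order.
            phi = p ** (k - 1) * (p - 1)
            orders.extend(r ** e for r, e in factorize(phi))
    return orders


def least_primary_factor(n):
    """Smallest prime-power order among the primary factors of M_n,
    or None when M_n is trivial (convention assumed here)."""
    orders = primary_factors(n)
    return min(orders) if orders else None
```

For example, $M_5 \cong C_4$ gives `least_primary_factor(5) == 4`, while $M_{13} \cong C_4 \times C_3$ gives `least_primary_factor(13) == 3`; any $n$ divisible by a prime $p \equiv 3 \pmod 4$ yields 2.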
Submitted 4 March, 2024;
originally announced March 2024.
-
COFT-AD: COntrastive Fine-Tuning for Few-Shot Anomaly Detection
Authors:
Jingyi Liao,
Xun Xu,
Manh Cuong Nguyen,
Adam Goodge,
Chuan Sheng Foo
Abstract:
Existing approaches towards anomaly detection (AD) often rely on a substantial amount of anomaly-free data to train representation and density models. However, large anomaly-free datasets may not always be available before the inference stage, in which case an anomaly detection model must be trained with only a handful of normal samples, a.k.a. few-shot anomaly detection (FSAD). In this paper, we propose a novel methodology to address the challenge of FSAD which incorporates two important techniques. Firstly, we employ a model pre-trained on a large source dataset to initialize model weights. Secondly, to ameliorate the covariate shift between source and target domains, we adopt contrastive training to fine-tune on the few-shot target domain data. To learn suitable representations for the downstream AD task, we additionally incorporate cross-instance positive pairs to encourage a tight cluster of the normal samples, and negative pairs for better separation between normal and synthesized negative samples. We evaluate few-shot anomaly detection on 3 controlled AD tasks and 4 real-world AD tasks to demonstrate the effectiveness of the proposed method.
Submitted 29 February, 2024;
originally announced February 2024.
-
Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition
Authors:
Cam-Van Thi Nguyen,
Cao-Bach Nguyen,
Quang-Thuy Ha,
Duc-Trong Le
Abstract:
Emotion recognition in conversation (ERC) is a crucial task in natural language processing and affective computing. This paper proposes MultiDAG+CL, a novel approach for Multimodal Emotion Recognition in Conversation (ERC) that employs Directed Acyclic Graph (DAG) to integrate textual, acoustic, and visual features within a unified framework. The model is enhanced by Curriculum Learning (CL) to address challenges related to emotional shifts and data imbalance. Curriculum learning facilitates the learning process by gradually presenting training samples in a meaningful order, thereby improving the model's performance in handling emotional variations and data imbalance. Experimental results on the IEMOCAP and MELD datasets demonstrate that the MultiDAG+CL models outperform baseline models. We release the code for MultiDAG+CL and experiments: https://github.com/vanntc711/MultiDAG-CL
Submitted 8 March, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Consensus seeking in diffusive multidimensional networks with a repeated interaction pattern and time-delays
Authors:
Hoang Huy Vu,
Quyen Ngoc Nguyen,
Chuong Van Nguyen,
Tuynh Van Pham,
Minh Hoang Trinh
Abstract:
This paper studies a consensus problem in multidimensional networks having the same agent-to-agent interaction pattern under both intra- and cross-layer time delays. Several conditions for the agents to globally asymptotically achieve a consensus are derived, which involve the overall network's structure, the local interacting pattern, and the values of the time delays. The validity of these conditions is proved by direct eigenvalue evaluation and supported by numerical simulations.
Submitted 23 February, 2024;
originally announced February 2024.
-
Q-learning-based Joint Design of Adaptive Modulation and Precoding for Physical Layer Security in Visible Light Communications
Authors:
Duc M. T. Hoang,
Thanh V. Pham,
Anh T. Pham,
Chuyen T Nguyen
Abstract:
There has been an increasing interest in physical layer security (PLS), which, compared with conventional cryptography, offers a unique approach to guaranteeing information confidentiality against eavesdroppers. In this paper, we study a joint design of adaptive $M$-ary pulse amplitude modulation (PAM) and precoding, which aims to optimize wiretap visible-light channels' secrecy capacity and bit error rate (BER) performances. The proposed design is motivated by higher-order modulation, which results in better secrecy capacity at the expense of a higher BER. On the other hand, a proper precoding design, which can manipulate the received signal quality at the legitimate user and the eavesdropper, can also enhance secrecy performance and influence the BER. A reward function that considers the secrecy capacity and the BERs of the legitimate user's (Bob) and the eavesdropper's (Eve) channels is introduced and maximized. Due to the non-linearity and complexity of the reward function, it is challenging to solve the optical design using classical optimization techniques. Therefore, reinforcement learning-based designs using Q-learning and Deep Q-learning are proposed to maximize the reward function. Simulation results verify that compared with the baseline designs, the proposed joint designs achieve better reward values while maintaining the BER of Bob's channel (Eve's channel) well below (above) the pre-FEC (forward error correction) BER threshold.
Submitted 21 February, 2024;
originally announced February 2024.
-
PARCv2: Physics-aware Recurrent Convolutional Neural Networks for Spatiotemporal Dynamics Modeling
Authors:
Phong C. H. Nguyen,
Xinlun Cheng,
Shahab Azarfar,
Pradeep Seshadri,
Yen T. Nguyen,
Munho Kim,
Sanghun Choi,
H. S. Udaykumar,
Stephen Baek
Abstract:
Modeling unsteady, fast transient, and advection-dominated physics problems is a pressing challenge for physics-aware deep learning (PADL). The physics of complex systems is governed by large systems of partial differential equations (PDEs) and ancillary constitutive models with nonlinear structures, as well as evolving state fields exhibiting sharp gradients and rapidly deforming material interfaces. Here, we investigate an inductive bias approach that is versatile and generalizable to model generic nonlinear field evolution problems. Our study focuses on the recent physics-aware recurrent convolutions (PARC), which incorporates a differentiator-integrator architecture that inductively models the spatiotemporal dynamics of generic physical systems. We extend the capabilities of PARC to simulate unsteady, transient, and advection-dominant systems. The extended model, referred to as PARCv2, is equipped with differential operators to model advection-reaction-diffusion equations, as well as a hybrid integral solver for stable, long-time predictions. PARCv2 is tested on both standard benchmark problems in fluid dynamics, namely Burgers and Navier-Stokes equations, and then applied to more complex shock-induced reaction problems in energetic materials. We evaluate the behavior of PARCv2 in comparison to other physics-informed and learning bias models and demonstrate its potential to model unsteady and advection-dominant dynamics regimes.
Submitted 24 May, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.