-
XDC Staking and Tokenomics -- Improvement Proposal: Enhancing Sustainability and Decentralization on the Eve of XDC 2.0
Authors:
Van Khanh Nguyen
Abstract:
As the XDC network celebrates five years of stable mainnet operation and prepares for the highly anticipated launch of XDC 2.0, this research proposes a comprehensive improvement plan for the network's staking and tokenomics mechanisms. Our analysis reveals opportunities to optimize the current model, ensuring a more sustainable, decentralized, and resilient ecosystem. We introduce novel concepts, including validator NFTs, decentralized governance, and utility-based tokenomics, to increase validator node liquidity and promote staking participation. Our proposal aims to establish a robust foundation for XDC 2.0, fostering a thriving ecosystem that rewards validators, stakeholders, and users alike. By addressing the intricacies of staking and tokenomics, this research paves the way for XDC to solidify its position as a leading decentralized network, poised for long-term success and growth.
Submitted 11 September, 2024;
originally announced September 2024.
-
A Pervasive, Efficient and Private Future: Realizing Privacy-Preserving Machine Learning Through Hybrid Homomorphic Encryption
Authors:
Khoa Nguyen,
Mindaugas Budzys,
Eugene Frimpong,
Tanveer Khan,
Antonis Michalas
Abstract:
Machine Learning (ML) has become one of the most impactful fields of data science in recent years. However, a significant concern with ML is its privacy risks due to rising attacks against ML models. Privacy-Preserving Machine Learning (PPML) methods have been proposed to mitigate the privacy and security risks of ML models. A popular approach to achieving PPML uses Homomorphic Encryption (HE). However, the highly publicized inefficiencies of HE make it unsuitable for highly scalable scenarios with resource-constrained devices. Hence, Hybrid Homomorphic Encryption (HHE) -- a modern encryption scheme that combines symmetric cryptography with HE -- has recently been introduced to overcome these challenges. HHE potentially provides a foundation to build new efficient and privacy-preserving services that transfer expensive HE operations to the cloud. This work introduces HHE to the ML field by proposing resource-friendly PPML protocols for edge devices. More precisely, we utilize HHE as the primary building block of our PPML protocols. We assess the performance of our protocols by first extensively evaluating each party's communication and computational cost on a dummy dataset and show the efficiency of our protocols by comparing them with similar protocols implemented using plain BFV. Subsequently, we demonstrate the real-world applicability of our construction by building an actual PPML application that uses HHE as its foundation to classify heart disease based on sensitive ECG data.
Submitted 10 September, 2024;
originally announced September 2024.
-
High-Precision Intelligent Reflecting Surfaces-assisted Positioning Service in 5G Networks with Flexible Numerology
Authors:
Ti Ti Nguyen,
Kim-Khoa Nguyen
Abstract:
Accurate positioning is paramount for a wide array of location-based services (LBS) in fifth-generation (5G) wireless networks. Recent advances in 5G New Radio (NR) technology hold promise for very high-precision positioning services. Yet, challenges arise due to diverse types of numerology and massive numbers of connected devices. This paper presents a novel approach to improve positioning precision within a 5G NR framework with comb patterns on time-frequency resource mapping. We then formulate an optimization problem aimed at minimizing the maximum positioning error among users in an intelligent reflecting surface (IRS)-assisted 5G network by controlling the user-anchor association, numerology-related selection, the IRS's reflecting elements, privacy protection level, and transmit power. To address the non-convex nature of the underlying mixed-integer nonlinear programming (MINLP) problem, we propose an efficient algorithm that combines optimization, matching, and learning techniques. Through extensive numerical experiments, we demonstrate the effectiveness of our proposed algorithm in minimizing positioning errors compared to conventional methods.
Submitted 9 September, 2024;
originally announced September 2024.
-
R2GQA: Retriever-Reader-Generator Question Answering System to Support Students Understanding Legal Regulations in Higher Education
Authors:
Phuc-Tinh Pham Do,
Duy-Ngoc Dinh Cao,
Khanh Quoc Tran,
Kiet Van Nguyen
Abstract:
In this article, we propose the R2GQA system, a Retriever-Reader-Generator Question Answering system, consisting of three main components: Document Retriever, Machine Reader, and Answer Generator. The Retriever module employs advanced information retrieval techniques to extract the context of articles from a dataset of legal regulation documents. The Machine Reader module utilizes state-of-the-art natural language understanding algorithms to comprehend the retrieved documents and extract answers. Finally, the Generator module synthesizes the extracted answers into concise and informative responses to students' questions regarding legal regulations. Furthermore, we built the ViRHE4QA dataset in the domain of university training regulations, comprising 9,758 question-answer pairs with a rigorous construction process. This is the first Vietnamese dataset in the higher education regulations domain with various types of answers, both extractive and abstractive. In addition, the R2GQA system is the first system to offer abstractive answers in Vietnamese. This paper discusses the design and implementation of each module within the R2GQA system on the ViRHE4QA dataset, highlighting their functionalities and interactions. Furthermore, we present experimental results demonstrating the effectiveness and utility of the proposed system in supporting students' comprehension of legal regulations in higher education settings. In general, the R2GQA system and the ViRHE4QA dataset promise to contribute significantly to related research and help students navigate complex legal documents and regulations, empowering them to make informed decisions and adhere to institutional policies effectively. Our dataset is available for research purposes.
Submitted 4 September, 2024;
originally announced September 2024.
-
Improving Electrolyte Performance for Target Cathode Loading Using Interpretable Data-Driven Approach
Authors:
Vidushi Sharma,
Andy Tek,
Khanh Nguyen,
Max Giammona,
Murtaza Zohair,
Linda Sundberg,
Young-Hye La
Abstract:
Higher loading of active electrode materials is desired in batteries, especially those based on conversion reactions, for enhanced energy density and cost efficiency. However, increasing active material loading in electrodes can cause significant performance degradation due to internal resistance, shuttling, and parasitic side reactions, which can be alleviated to a certain extent by a compatible design of electrolytes. In this work, a data-driven approach is leveraged to find a high-performing electrolyte formulation for a novel interhalogen battery customized to the target cathode loading. An electrolyte design consisting of 4 solvents and 4 salts is experimentally devised for a novel interhalogen battery based on a multi-electron redox reaction. The experimental dataset with variable electrolyte compositions and active cathode loading is used to train a graph-based deep learning model mapping changing variables in the battery's material design to its specific capacity. The trained model is used to further optimize the electrolyte formulation for enhancing the battery capacity at a target cathode loading by a two-fold approach: large-scale screening and interpreting electrolyte design principles for different cathode loadings. The data-driven approach is demonstrated to bring about an additional 20% increment in the specific capacity of the battery over capacities obtained from the experimental optimization.
Submitted 3 September, 2024;
originally announced September 2024.
-
Piecewise regular solutions to scalar balance laws with singular nonlocal sources
Authors:
Lorena Bociu,
Evangelia Ftaka,
Khai T. Nguyen,
Jacopo Schino
Abstract:
The present paper establishes a local well-posedness result for piecewise regular solutions with a single shock of scalar balance laws with singular integral sources with convolution-type kernels. In a neighborhood of the shock curve, a detailed description of the solution is provided for a general class of initial data.
Submitted 3 September, 2024;
originally announced September 2024.
-
Counting rational maps on $\mathbb{P}^1$ with prescribed local conditions
Authors:
Khoa D. Nguyen,
Anwesh Ray
Abstract:
We explore distribution questions for rational maps on the projective line $\mathbb{P}^1$ over $\mathbb{Q}$ within the framework of arithmetic dynamics, drawing analogies to elliptic curves. Specifically, we investigate counting problems for rational maps $φ$ of fixed degree $d \geq 2$ with prescribed reduction properties. Our main result establishes that the set of rational maps with minimal resultant has positive density. Additionally, for degree 2 rational maps, we perform explicit computations demonstrating that over $32.7\%$ possess a squarefree, and hence minimal, resultant.
Submitted 28 August, 2024;
originally announced August 2024.
-
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher
Authors:
Trung Dao,
Thuan Hoang Nguyen,
Thanh Le,
Duc Vu,
Khoi Nguyen,
Cuong Pham,
Anh Tran
Abstract:
In this paper, we aim to enhance the performance of SwiftBrush, a prominent one-step text-to-image diffusion model, to be competitive with its multi-step Stable Diffusion counterpart. Initially, we explore the quality-diversity trade-off between SwiftBrush and SD Turbo: the former excels in image diversity, while the latter excels in image quality. This observation motivates our proposed modifications in the training methodology, including better weight initialization and efficient LoRA training. Moreover, our introduction of a novel clamped CLIP loss enhances image-text alignment and results in improved image quality. Remarkably, by combining the weights of models trained with efficient LoRA and full training, we achieve a new state-of-the-art one-step diffusion model, achieving an FID of 8.14 and surpassing all GAN-based and multi-step Stable Diffusion models. The project page is available at https://swiftbrushv2.github.io.
Submitted 27 August, 2024; v1 submitted 26 August, 2024;
originally announced August 2024.
-
Symmetric masking strategy enhances the performance of Masked Image Modeling
Authors:
Khanh-Binh Nguyen,
Chae Jung Park
Abstract:
Masked Image Modeling (MIM) is a technique in self-supervised learning that focuses on acquiring detailed visual representations from unlabeled images by estimating the missing pixels in randomly masked sections. It has proven to be a powerful tool for the preliminary training of Vision Transformers (ViTs), yielding impressive results across various tasks. Nevertheless, most MIM methods heavily depend on the random masking strategy to formulate the pretext task. This strategy necessitates numerous trials to ascertain the optimal dropping ratio, which can be resource-intensive, requiring the model to be pre-trained for between 800 and 1600 epochs. Furthermore, this approach may not be suitable for all datasets. In this work, we propose a new masking strategy that effectively helps the model capture global and local features. Based on this masking strategy, we introduce SymMIM, our proposed training pipeline for MIM. SymMIM achieves a new SOTA accuracy of 85.9% on ImageNet using ViT-Large and surpasses the previous SOTA across downstream tasks such as image classification, semantic segmentation, object detection, and instance segmentation.
Submitted 22 August, 2024;
originally announced August 2024.
-
Machine-Learning-Based Construction of Molecular Potential and Its Application in Exploring the Deep-Lying-Orbital Effect in High-Order Harmonic Generation
Authors:
Duong D. Hoang-Trong,
Khang Tran,
Doan-An Trieu,
Quan-Hao Truong,
Kim-Ngan H. Nguyen,
Cam-Tu Le,
DinhDuy Vu,
Ngoc-Hung Phan,
Ngoc-Ty Nguyen,
Van-Hoang Le,
Ngoc-Loan Phan
Abstract:
Creating a soft-Coulomb-type (SC) molecular potential within the single-active-electron (SAE) approximation is essential since it allows solving time-dependent Schrödinger equations with fewer computational resources compared to other multielectron methods. The currently available SC potentials can accurately reproduce the energy of the highest occupied molecular orbital (HOMO), which is sufficient for analyzing nonlinear effects in laser-molecule interactions like high-order harmonic generation (HHG). However, recent discoveries of significant effects of deep-lying molecular orbitals call for more precise potentials to analyze them. In this study, we present a fast and accurate method based on machine learning to construct SC potentials that simultaneously reproduce various molecular features, including energies, symmetries, and dipole moments of HOMO, HOMO-1, and HOMO-2. We use this ML model to create SC SAE potentials of the HCN molecule and then comprehensively analyze the fingerprints of lower-lying orbitals in HHG spectra emitted during the H-CN stretching. Our findings reveal that HOMO-1 plays a role in forming the second HHG plateau. Additionally, as the H-C distance increases, the plateau structure and the smoothness of HHG spectra are altered due to the redistribution of orbital electron density. These results are in line with other experimental and theoretical studies. Lastly, the machine learning approach using deconvolutional and convolutional neural networks in the present study is general enough to be applied to construct molecular potentials for other molecules and molecular dynamic processes.
Submitted 4 September, 2024; v1 submitted 20 August, 2024;
originally announced August 2024.
-
Open-Ended 3D Point Cloud Instance Segmentation
Authors:
Phuc D. A. Nguyen,
Minh Luu,
Anh Tran,
Cuong Pham,
Khoi Nguyen
Abstract:
Open-Vocab 3D Instance Segmentation methods (OV-3DIS) have recently demonstrated their ability to generalize to unseen objects. However, these methods still depend on predefined class names during testing, restricting the autonomy of agents. To mitigate this constraint, we propose a novel problem termed Open-Ended 3D Instance Segmentation (OE-3DIS), which eliminates the necessity for predefined class names during testing. Moreover, we contribute a comprehensive set of strong baselines, derived from OV-3DIS approaches and leveraging 2D Multimodal Large Language Models. To assess the performance of our OE-3DIS system, we introduce a novel Open-Ended score, evaluating both the semantic and geometric quality of predicted masks and their associated class names, alongside the standard AP score. Our approach demonstrates significant performance improvements over the baselines on the ScanNet200 and ScanNet++ datasets. Remarkably, our method surpasses the performance of Open3DIS, the current state-of-the-art method in OV-3DIS, even in the absence of ground-truth object class names.
Submitted 21 August, 2024;
originally announced August 2024.
-
Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance
Authors:
Duc-Hai Pham,
Duc Dung Nguyen,
Hoang-Anh Pham,
Ho Lai Tuan,
Phong Ha Nguyen,
Khoi Nguyen,
Rang Nguyen
Abstract:
Accurate prediction of 3D semantic occupancy from 2D visual images is vital in enabling autonomous agents to comprehend their surroundings for planning and navigation. State-of-the-art methods typically employ fully supervised approaches, necessitating a huge labeled dataset acquired through expensive LiDAR sensors and meticulous voxel-wise labeling by human annotators. The resource-intensive nature of this annotating process significantly hampers the application and scalability of these methods. We introduce a novel semi-supervised framework to alleviate the dependency on densely annotated data. Our approach leverages 2D foundation models to generate essential 3D scene geometric and semantic cues, facilitating a more efficient training process. Our framework exhibits notable properties: (1) Generalizability, applicable to various 3D semantic scene completion approaches, including 2D-3D lifting and 3D-2D transformer methods. (2) Effectiveness, as demonstrated through experiments on SemanticKITTI and NYUv2, wherein our method achieves up to 85% of the fully-supervised performance using only 10% labeled data. This approach not only reduces the cost and labor associated with data annotation but also demonstrates the potential for broader adoption in camera-based systems for 3D semantic occupancy prediction.
Submitted 21 August, 2024;
originally announced August 2024.
-
Sampling Foundational Transformer: A Theoretical Perspective
Authors:
Viet Anh Nguyen,
Minh Lenhat,
Khoa Nguyen,
Duong Duc Hieu,
Dao Huu Hung,
Truong Son Hy
Abstract:
The versatility of the self-attention mechanism has earned transformers great success in almost all data modalities, with limitations on the quadratic complexity and difficulty of training. To apply transformers across different data modalities, practitioners have to make specific, clever data-modality-dependent constructions. In this paper, we propose the Sampling Foundational Transformer (SFT), which can work on multiple data modalities (e.g., point cloud, graph, and sequence) and constraints (e.g., rotational invariance). The existence of such a model is important as contemporary foundational modeling requires operability on multiple data sources. For efficiency on a large number of tokens, our model relies on our context-aware sampling-without-replacement mechanism for both linear asymptotic computational complexity and real inference time gain. For training efficiency, we rely on our newly discovered pseudoconvex formulation of the transformer layer to increase the model's convergence rate. As a model working on multiple data modalities, SFT has achieved competitive results on many benchmarks, while being faster in inference, compared to other very specialized models.
Submitted 17 August, 2024; v1 submitted 11 August, 2024;
originally announced August 2024.
-
SAMSA: Efficient Transformer for Many Data Modalities
Authors:
Minh Lenhat,
Viet Anh Nguyen,
Khoa Nguyen,
Duong Duc Hieu,
Dao Huu Hung,
Truong Son Hy
Abstract:
The versatility of the self-attention mechanism has earned transformers great success in almost all data modalities, with limitations on the quadratic complexity and difficulty of training. Efficient transformers, on the other hand, often rely on clever data-modality-dependent constructions to get over the quadratic complexity of transformers. This greatly hinders their application to different data modalities, which is one of the pillars of contemporary foundational modeling. In this paper, we lay the groundwork for efficient foundational modeling by proposing SAMSA (SAMpling-Self-Attention), a context-aware, linear-complexity self-attention mechanism that works well on multiple data modalities. Our mechanism is based on a differentiable sampling-without-replacement method we discovered. This enables the self-attention module to attend to the most important token set, where the importance is defined by data. Moreover, as differentiability is not needed in inference, the sparse formulation of our method incurs little time overhead, further lowering computational costs. In short, SAMSA achieved competitive or even SOTA results on many benchmarks, while being faster in inference, compared to other very specialized models. Against full self-attention, real inference time significantly decreases while performance ranges from negligible degradation to outperformance. We release our source code in the repository: https://github.com/HySonLab/SAMSA
Submitted 18 August, 2024; v1 submitted 9 August, 2024;
originally announced August 2024.
-
Large-scale cosmic ray anisotropies with 19 years of data from the Pierre Auger Observatory
Authors:
The Pierre Auger Collaboration,
A. Abdul Halim,
P. Abreu,
M. Aglietta,
I. Allekotte,
K. Almeida Cheminant,
A. Almela,
R. Aloisio,
J. Alvarez-Muñiz,
A. Ambrosone,
J. Ammerman Yebra,
G. A. Anastasi,
L. Anchordoqui,
B. Andrada,
L. Andrade Dourado,
S. Andringa,
L. Apollonio,
C. Aramo,
P. R. Araújo Ferreira,
E. Arnone,
J. C. Arteaga Velázquez,
P. Assis,
G. Avila,
E. Avocone,
A. Bakalova
, et al. (333 additional authors not shown)
Abstract:
Results are presented for the measurement of large-scale anisotropies in the arrival directions of ultra-high-energy cosmic rays detected at the Pierre Auger Observatory during 19 years of operation, prior to AugerPrime, the upgrade of the Observatory. The 3D dipole amplitude and direction are reconstructed above $4\,$EeV in four energy bins. Besides the established dipolar anisotropy in right ascension above $8\,$EeV, the Fourier amplitude of the $8$ to $16\,$EeV energy bin is now also above the $5σ$ discovery level. No time variation of the dipole moment above $8\,$EeV is found, setting an upper limit to the rate of change of such variations of $0.3\%$ per year at the $95\%$ confidence level. Additionally, the results for the angular power spectrum are shown, demonstrating no other statistically significant multipoles. The results for the equatorial dipole component down to $0.03\,$EeV are presented, using for the first time a data set obtained with a trigger that has been optimized for lower energies. Finally, model predictions are discussed and compared with observations, based on two source emission scenarios obtained in the combined fit of spectrum and composition above $0.6\,$EeV.
Submitted 9 August, 2024;
originally announced August 2024.
-
Joint Design of Probabilistic Constellation Shaping and Precoding for Multi-user VLC Systems
Authors:
Thang K. Nguyen,
Thanh V. Pham,
Hoang D. Le,
Chuyen T. Nguyen,
Anh T. Pham
Abstract:
This paper proposes a joint design of probabilistic constellation shaping (PCS) and precoding to enhance the sum-rate performance of multi-user visible light communications (VLC) broadcast channels subject to a signal amplitude constraint. In the proposed design, the transmission probabilities of bipolar $M$-pulse amplitude modulation ($M$-PAM) symbols for each user and the transmit precoding matrix are jointly optimized to improve the sum-rate performance. The joint design problem is shown to be a complex non-convex problem due to the non-convexity of the objective function. To tackle the problem, the firefly algorithm (FA), a nature-inspired heuristic optimization approach, is employed to find a local optimum of the original non-convex optimization problem. The FA-based approach, however, suffers from high computational complexity. Therefore, we propose a low-complexity design based on zero-forcing (ZF) precoding, which is solved using an alternating optimization (AO) approach. Simulation results reveal that the proposed joint design with PCS significantly improves the sum-rate performance compared to the conventional design with uniform signaling. Some insights into the optimal symbol distributions of the two joint design approaches are also provided.
Submitted 6 August, 2024;
originally announced August 2024.
-
Generating variable $\hbar$ and $c$ via Fujii-Wetterich model in curved spacetimes
Authors:
Hoang Ky Nguyen
Abstract:
We revisit the Fujii-Wetterich model [Phys.Rev.D 26, 2580 (1982) and Nucl.Phys.B 302, 645 (1988)] which allows the Higgs doublet to couple with a "cosmon" scalar $χ$ of the background spacetime as $χ^2\,Φ^2$. Upon the SSB of the $SU(2)$ gauge, the VEV of the Higgs doublet is proportional to the field $χ$. Fujii and Wetterich employed this linkage to make particle masses dependent on $χ$. We shall present an $\textit{alternative}$ mechanism: at a given point $x^*$, the prevailing Higgs VEV will be used to $\textit{construct}$ a quantum of action $\hbar_*$ and a speed of light $c_*$ in association with $χ(x^*)$. Specifically, each open set vicinity of a given point $x^*$ on the manifold is equipped with a replica of the Glashow-Weinberg-Salam action operating with its own effective values of $\hbar_*$ and $c_*$, whereas particle masses induced via Higgs SSB remain independent of $χ(x^*)$. Our mechanism unambiguously generates the dependencies $\hbar_* \propto χ^{-1/2}(x^*)$ and $c_* \propto χ^{1/2}(x^*)$, causing these "fundamental constants" to vary along with the dynamical field $χ$ across the manifold. For late-time cosmology, a varying $c$ along the trajectory of light waves from distant supernovae towards Earth renders the classic Lemaître redshift formula $1+z=a^{-1}$ inapplicable. Using the dependency $c_* \propto χ^{1/2}(x^*)$, we derive the new (variable-$c$) Lemaître redshift relation and apply it to analyze the Pantheon Catalog of SneIa $\textit{without}$ invoking the dark energy hypothesis. Key consequences are: (1) Accounting for the Pantheon Catalog with a fit exceeding the quality of the $Λ$CDM model; (2) Explaining the late-time cosmic acceleration based on variable $c$, eliminating the need for dark energy; (3) Revitalizing Blanchard-Douspis-Rowan-Robinson-Sarkar's CMB power spectrum analysis that bypassed dark energy [A&A 412, 35 (2003)].
Submitted 5 August, 2024;
originally announced August 2024.
-
Active Sensing of Knee Osteoarthritis Progression with Reinforcement Learning
Authors:
Khanh Nguyen,
Huy Hoang Nguyen,
Egor Panfilov,
Aleksei Tiulpin
Abstract:
Osteoarthritis (OA) is the most common musculoskeletal disease, and it has no cure. Knee OA (KOA) is one of the leading causes of disability worldwide, and it costs the global community billions of United States dollars. Prediction of KOA progression has been of high interest to the community for years, as it can advance treatment development through more efficient clinical trials and improve patient outcomes through more efficient healthcare utilization. Existing approaches for predicting KOA, however, are predominantly static, i.e. they consider data from a single time point to predict progression many years into the future, and knee-level, i.e. they consider progression in a single joint only. For these and related reasons, these methods fail to deliver a level of predictive performance sufficient to result in cost savings and better patient outcomes. Collecting extensive data from all patients on a regular basis could address the issue, but it is limited by the high cost at a population level. In this work, we propose to go beyond static prediction models in OA, and bring a novel Active Sensing (AS) approach, designed to dynamically follow up patients with the objective of maximizing the number of informative data acquisitions, while minimizing their total cost over a period of time. Our approach is based on Reinforcement Learning (RL), and it leverages a novel reward function designed specifically for AS of disease progression in more than one part of a human body. Our method is end-to-end, relies on multi-modal Deep Learning, and requires no human input at inference time. Through an exhaustive experimental evaluation, we show that using RL can provide a higher monetary benefit when compared to state-of-the-art baselines.
Submitted 22 August, 2024; v1 submitted 5 August, 2024;
originally announced August 2024.
-
Assessing the XDC Network: A Comprehensive Evaluation of its qualitative and technical aspects
Authors:
Atul Khekade,
Omkar Mestry,
Van Khanh Nguyen
Abstract:
This research provides a thorough assessment of the XDC Network, a delegated proof of stake (XDPoS) consensus-based blockchain technology, across its technical, security, and business dimensions. The study evaluates the network's decentralization, scalability, and security features, including its Nakamoto coefficient, validator participation, and client distribution. Additionally, it examines the developer ecosystem, including GitHub metrics, and business aspects such as transaction costs and predictability. The findings of this research will provide valuable insights into the strengths and weaknesses of the XDC Network, informing stakeholders and decision-makers about its suitability for various use cases, particularly in trade finance, asset tokenization, and enterprise blockchain solutions.
Submitted 4 August, 2024;
originally announced August 2024.
-
A Survey and Evaluation of Adversarial Attacks for Object Detection
Authors:
Khoi Nguyen Tiet Nguyen,
Wenyu Zhang,
Kangkang Lu,
Yuhuan Wu,
Xingjian Zheng,
Hui Li Tan,
Liangli Zhen
Abstract:
Deep learning models excel in various computer vision tasks but are susceptible to adversarial examples: subtle perturbations in input data that lead to incorrect predictions. This vulnerability poses significant risks in safety-critical applications such as autonomous vehicles, security surveillance, and aircraft health monitoring. While numerous surveys focus on adversarial attacks in image classification, the literature on such attacks in object detection is limited. This paper offers a comprehensive taxonomy of adversarial attacks specific to object detection, reviews existing adversarial robustness evaluation metrics, and systematically assesses open-source attack methods and model robustness. Key observations are provided to enhance the understanding of attack effectiveness and corresponding countermeasures. Additionally, we identify crucial research challenges to guide future efforts in securing automated object detection systems.
Submitted 5 August, 2024; v1 submitted 4 August, 2024;
originally announced August 2024.
-
PINNs for Medical Image Analysis: A Survey
Authors:
Chayan Banerjee,
Kien Nguyen,
Olivier Salvado,
Truyen Tran,
Clinton Fookes
Abstract:
The incorporation of physical information in machine learning frameworks is transforming medical image analysis (MIA). By integrating fundamental knowledge and governing physical laws, these models achieve enhanced robustness and interpretability. In this work, we explore the utility of physics-informed approaches for MIA (PIMIA) tasks such as registration, generation, classification, and reconstruction. We present a systematic literature review of over 80 papers on physics-informed methods dedicated to MIA. We propose a unified taxonomy to investigate what physics knowledge and processes are modelled, how they are represented, and the strategies to incorporate them into MIA models. We delve deep into a wide range of image analysis tasks, including imaging, generation, prediction, inverse imaging (super-resolution and reconstruction), registration, and image analysis (segmentation and classification). For each task, we thoroughly examine and present in a tabular format the central physics-guided operation, the region of interest (with respect to human anatomy), the corresponding imaging modality, the dataset used for model training, the deep network architecture employed, and the primary physical process, equation, or principle utilized. We also introduce a novel metric to compare the performance of PIMIA methods across different tasks and datasets. Based on this review, we summarize and distil our perspectives on the challenges, open research questions, and directions for future research. We highlight key open challenges in PIMIA, including selecting suitable physics priors and establishing a standardized benchmarking platform.
Submitted 2 August, 2024;
originally announced August 2024.
-
On Certain Polytopes Associated to Products of Algebraic Integer Conjugates
Authors:
Seda Albayrak,
Samprit Ghosh,
Greg Knapp,
Khoa D. Nguyen
Abstract:
Let $d>k$ be positive integers. Motivated by an earlier result of Bugeaud and Nguyen, we let $E_{k,d}$ be the set of $(c_1,\ldots,c_k)\in\mathbb{R}_{\geq 0}^k$ such that $\vertα_0\vert\vertα_1\vert^{c_1}\cdots\vertα_k\vert^{c_k}\geq 1$ for any algebraic integer $α$ of degree $d$, where we label its Galois conjugates as $α_0,\ldots,α_{d-1}$ with $\vertα_0\vert\geq \vertα_1\vert\geq\cdots \geq \vertα_{d-1}\vert$. First, we give an explicit description of $E_{k,d}$ as a polytope with $2^k$ vertices. Then we prove that for $d>3k$, for every $(c_1,\ldots,c_k)\in E_{k,d}$ and for every $α$ that is not a root of unity, the strict inequality $\vertα_0\vert\vertα_1\vert^{c_1}\cdots\vertα_k\vert^{c_k}>1$ holds. We also provide a quantitative version of this inequality in terms of $d$ and the height of the minimal polynomial of $α$.
Submitted 31 July, 2024;
originally announced August 2024.
-
Simulating intermediate black hole mass measurements for a sample of galaxies with nuclear star clusters using ELT/HARMONI high spatial resolution integral-field stellar kinematics
Authors:
Dieu D. Nguyen,
Michele Cappellari,
Hai N. Ngo,
Tinh Q. T. Le,
Khue N. H. Ho,
An K. Nguyen,
Huy G. Tong,
Phong T. On,
Tuan N. Le,
Miguel Pereira-Santaella
Abstract:
The fraction of low-mass galaxies hosting an intermediate-mass black hole (IMBH, with masses $M_{\rm BH} \approx 10^2-10^5$ M$_\odot$) is sensitive to how black hole seeds formed in the early Universe but is observationally still unconstrained. In this paper, we assemble a sample of dwarf galaxies within 10 Mpc hosting bright nuclear star clusters (NSCs) that could host IMBHs. For a subset of them, we use their observed surface brightness from Hubble Space Telescope (HST) images, an assumed synthetic spectrum of their stellar population, Jeans Anisotropic Model (JAM) of the stellar dynamics, and the HSIM simulator software to create mock observations with the High Angular Resolution Monolithic Optical and Near-infrared Integral (HARMONI) field spectrograph for the Extremely Large Telescope (ELT). We analyze the simulated data cube like real data, using JAM to infer the IMBH mass and its error in a Bayesian framework. Our simulations show that the ELT/HARMONI instrument can clearly detect the existence of IMBH demographics in NSCs down to a mass of about 0.5% of the NSC.
Submitted 31 July, 2024;
originally announced August 2024.
-
Gemma 2: Improving Open Language Models at a Practical Size
Authors:
Gemma Team,
Morgane Riviere,
Shreya Pathak,
Pier Giuseppe Sessa,
Cassidy Hardin,
Surya Bhupatiraju,
Léonard Hussenot,
Thomas Mesnard,
Bobak Shahriari,
Alexandre Ramé,
Johan Ferret,
Peter Liu,
Pouya Tafti,
Abe Friesen,
Michelle Casbon,
Sabela Ramos,
Ravin Kumar,
Charline Le Lan,
Sammy Jerome,
Anton Tsitsulin,
Nino Vieillard,
Piotr Stanczyk,
Sertan Girgin,
Nikola Momchev,
Matt Hoffman
, et al. (172 additional authors not shown)
Abstract:
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.
Submitted 2 August, 2024; v1 submitted 31 July, 2024;
originally announced August 2024.
-
Sentiment Reasoning for Healthcare
Authors:
Khai Le-Duc,
Khai-Nguyen Nguyen,
Bach Phan Tat,
Duy Le,
Jerry Ngo,
Long Vo-Dang,
Anh Totti Nguyen,
Truong-Son Hy
Abstract:
Transparency in AI decision-making is crucial in healthcare due to the severe consequences of errors, and it is important for building trust between AI and users in sentiment analysis tasks. Incorporating reasoning capabilities helps Large Language Models (LLMs) understand human emotions within broader contexts, handle nuanced and ambiguous language, and infer underlying sentiments that may not be explicitly stated. In this work, we introduce a new task, Sentiment Reasoning, for both speech and text modalities, along with our proposed multimodal multitask framework and dataset. Our study showed that rationale-augmented training enhances model performance in sentiment classification across both human transcript and ASR settings. We also found that the generated rationales typically use different vocabulary from human-generated rationales but maintain similar semantics. All code, data (English-translated and Vietnamese) and models are published online: https://github.com/leduckhai/MultiMed
Submitted 24 July, 2024;
originally announced July 2024.
-
Investigating the HIV Epidemic in Miami Using a Novel Approach for Bayesian Inference on Partially Observed Networks
Authors:
Ravi Goyal,
Kevin Nguyen,
Victor De Gruttola,
Susan J Little,
Colby Cohen,
Natasha K Martin
Abstract:
Molecular HIV Surveillance (MHS) has been described as key to enabling rapid responses to HIV outbreaks. It operates by linking individuals with genetically similar viral sequences, which forms a network. A major limitation of MHS is that it depends on sequence collection, which very rarely covers the entire population of interest. Ignoring missing data by conducting complete case analysis--which assumes that the observed network is complete--has been shown to result in significantly biased estimates of network properties. We use MHS to investigate disease dynamics of the HIV epidemic in Miami-Dade County (MDC) among men who have sex with men (MSM)--only 30.1% have a reported sequence. To do so, we present an approach for making Bayesian inferences on partially observed networks. Through a simulation study, we demonstrate a reduction in error of 43%-63% between our estimates and complete case analyses. We estimate increased mixing between MSM communities in MDC, defined by race and transmission risk compared to the results based on complete case analysis. Our approach makes use of a flexible network model--congruence class model--to overcome the high computational burden of previously reported Bayesian approaches to estimate network properties from partially observed networks.
Submitted 22 July, 2024;
originally announced July 2024.
-
Adaptive Cascading Network for Continual Test-Time Adaptation
Authors:
Kien X. Nguyen,
Fengchun Qiao,
Xi Peng
Abstract:
We study the problem of continual test-time adaptation where the goal is to adapt a source pre-trained model to a sequence of unlabelled target domains at test time. Existing methods on test-time training suffer from several limitations: (1) Mismatch between the feature extractor and classifier; (2) Interference between the main and self-supervised tasks; (3) Lack of the ability to quickly adapt to the current distribution. In light of these challenges, we propose a cascading paradigm that simultaneously updates the feature extractor and classifier at test time, mitigating the mismatch between them and enabling long-term model adaptation. The pre-training of our model is structured within a meta-learning framework, thereby minimizing the interference between the main and self-supervised tasks and encouraging fast adaptation in the presence of limited unlabelled data. Additionally, we introduce innovative evaluation metrics, average accuracy and forward transfer, to effectively measure the model's adaptation capabilities in dynamic, real-world scenarios. Extensive experiments and ablation studies demonstrate the superiority of our approach in a range of tasks including image classification, text classification, and speech recognition.
Submitted 16 July, 2024;
originally announced July 2024.
-
Enhancing Semantic Segmentation with Adaptive Focal Loss: A Novel Approach
Authors:
Md Rakibul Islam,
Riad Hassan,
Abdullah Nazib,
Kien Nguyen,
Clinton Fookes,
Md Zahidul Islam
Abstract:
Deep learning has achieved outstanding accuracy in medical image segmentation, particularly for objects like organs or tumors with smooth boundaries or large sizes. However, it encounters significant difficulties with objects that have zigzag boundaries or are small in size, leading to a notable decrease in segmentation effectiveness. In this context, using a loss function that incorporates smoothness and volume information into a model's predictions offers a promising solution to these shortcomings. In this work, we introduce an Adaptive Focal Loss (A-FL) function designed to mitigate class imbalance by down-weighting the loss for easy examples, which in effect up-weights the loss for hard examples and gives greater emphasis to challenging examples, such as small and irregularly shaped objects. The proposed A-FL dynamically adjusts a focusing parameter based on an object's surface smoothness and size information, and adjusts the class balancing parameter based on the ratio of targeted area to total area in an image. We evaluated the performance of the A-FL using ResNet50-encoded U-Net architecture on the Picai 2022 and BraTS 2018 datasets. On the Picai 2022 dataset, the A-FL achieved an Intersection over Union (IoU) of 0.696 and a Dice Similarity Coefficient (DSC) of 0.769, outperforming the regular Focal Loss (FL) by 5.5% and 5.4% respectively. It also surpassed the best baseline Dice-Focal by 2.0% and 1.2%. On the BraTS 2018 dataset, A-FL achieved an IoU of 0.883 and a DSC of 0.931. The comparative studies show that the proposed A-FL function surpasses conventional methods, including Dice Loss, Focal Loss, and their hybrid variants, in IoU, DSC, Sensitivity, and Specificity metrics. This work highlights A-FL's potential to improve deep learning models for segmenting clinically significant regions in medical images, leading to more precise and reliable diagnostic tools.
Submitted 13 July, 2024;
originally announced July 2024.
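As a rough illustration of the quantities named in the abstract above, the sketch below implements the regular binary focal loss (the baseline the abstract compares against, not the authors' A-FL) together with the IoU and Dice metrics. The function names, the fixed `gamma` and `alpha` defaults, and the use of NumPy are assumptions made for this example; A-FL, as described, would instead set the focusing and class-balancing parameters adaptively per image.

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Regular binary focal loss, averaged over all pixels.

    probs   : predicted foreground probabilities in [0, 1]
    targets : ground-truth labels (0 or 1), same shape as probs
    gamma   : focusing parameter (fixed here; assumed default)
    alpha   : class-balancing weight for the foreground class (assumed default)
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    targets = np.asarray(targets)
    # p_t is the probability assigned to the true class of each pixel.
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the loss of easy, well-classified pixels.
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

def iou_and_dice(pred_mask, true_mask):
    """Intersection over Union and Dice coefficient for two binary masks."""
    pred = np.asarray(pred_mask).astype(bool)
    true = np.asarray(true_mask).astype(bool)
    inter = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    total = pred.sum() + true.sum()
    # Two empty masks are treated as a perfect match by convention.
    iou = inter / union if union else 1.0
    dice = 2.0 * inter / total if total else 1.0
    return float(iou), float(dice)
```

With `gamma=0` and `alpha=0.5` the loss reduces to half the ordinary binary cross-entropy, which is a convenient sanity check when experimenting with other parameter choices.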
-
The flux of ultra-high-energy cosmic rays along the supergalactic plane measured at the Pierre Auger Observatory
Authors:
The Pierre Auger Collaboration,
A. Abdul Halim,
P. Abreu,
M. Aglietta,
I. Allekotte,
K. Almeida Cheminant,
A. Almela,
R. Aloisio,
J. Alvarez-Muñiz,
J. Ammerman Yebra,
G. A. Anastasi,
L. Anchordoqui,
B. Andrada,
L. Andrade Dourado,
S. Andringa,
L. Apollonio,
C. Aramo,
P. R. Araújo Ferreira,
E. Arnone,
J. C. Arteaga Velázquez,
P. Assis,
G. Avila,
E. Avocone,
A. Bakalova,
F. Barbato
, et al. (342 additional authors not shown)
Abstract:
Ultra-high-energy cosmic rays are known to be mainly of extragalactic origin, and their propagation is limited by energy losses, so their arrival directions are expected to correlate with the large-scale structure of the local Universe. In this work, we investigate the possible presence of intermediate-scale excesses in the flux of the most energetic cosmic rays from the direction of the supergalactic plane region using events with energies above 20 EeV recorded with the surface detector array of the Pierre Auger Observatory up to 31 December 2022, with a total exposure of 135,000 km^2 sr yr. The strongest indication for an excess that we find, with a post-trial significance of 3.1σ, is in the Centaurus region, as in our previous reports, and it extends down to lower energies than previously studied. We do not find any strong hints of excesses from any other region of the supergalactic plane at the same angular scale. In particular, our results do not confirm the reports by the Telescope Array collaboration of excesses from two regions in the Northern Hemisphere at the edge of the field of view of the Pierre Auger Observatory. With a comparable exposure, our results in those regions are in good agreement with the expectations from an isotropic distribution.
Submitted 9 July, 2024;
originally announced July 2024.
-
SAVE: Segment Audio-Visual Easy way using Segment Anything Model
Authors:
Khanh-Binh Nguyen,
Chae Jung Park
Abstract:
The primary aim of Audio-Visual Segmentation (AVS) is to precisely identify and locate auditory elements within visual scenes by accurately predicting segmentation masks at the pixel level. Achieving this involves comprehensively considering data and model aspects to address this task effectively. This study presents a lightweight approach, SAVE, which efficiently adapts the pre-trained segment anything model (SAM) to the AVS task. By incorporating an image encoder adapter into the transformer blocks to better capture the distinct dataset information and proposing a residual audio encoder adapter to encode the audio features as a sparse prompt, our proposed model achieves effective audio-visual fusion and interaction during the encoding stage. Our proposed method accelerates the training and inference speed by reducing the input resolution from 1024 to 256 pixels while achieving higher performance compared with the previous SOTA. Extensive experimentation validates our approach, demonstrating that our proposed model outperforms other SOTA methods significantly. Moreover, leveraging the pre-trained model on synthetic data enhances performance on real AVSBench data, achieving 84.59 mIoU on the S4 (V1S) subset and 70.28 mIoU on the MS3 (V1M) set with only 256 pixels for input images. This increases up to 86.16 mIoU on the S4 (V1S) and 70.83 mIoU on the MS3 (V1M) with inputs of 1024 pixels.
Submitted 3 July, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
-
Towards Unsupervised Speaker Diarization System for Multilingual Telephone Calls Using Pre-trained Whisper Model and Mixture of Sparse Autoencoders
Authors:
Phat Lam,
Lam Pham,
Truong Nguyen,
Dat Ngo,
Thinh Pham,
Tin Nguyen,
Loi Khanh Nguyen,
Alexander Schindler
Abstract:
Existing speaker diarization systems typically rely on large amounts of manually annotated data, which is labor-intensive and difficult to obtain, especially in real-world scenarios. Additionally, language-specific constraints in these systems significantly hinder their effectiveness and scalability in multilingual settings. In this paper, we propose a cluster-based speaker diarization system designed for multilingual telephone call applications. Our proposed system supports multiple languages and eliminates the need for large-scale annotated data during training by utilizing the multilingual Whisper model to extract speaker embeddings. Additionally, we introduce a network architecture called Mixture of Sparse Autoencoders (Mix-SAE) for unsupervised speaker clustering. Experimental results on the evaluation dataset derived from two-speaker subsets of benchmark CALLHOME and CALLFRIEND telephonic speech corpora demonstrate the superior performance of the proposed Mix-SAE network to other autoencoder-based clustering methods. The overall performance of our proposed system also highlights the promising potential for developing unsupervised, multilingual speaker diarization systems within the context of limited annotated data. It also indicates the system's capability for integration into multi-task speech analysis applications based on general-purpose models such as those that combine speech-to-text, language detection, and speaker diarization.
Submitted 12 September, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
-
ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding
Authors:
Quang P. M. Pham,
Khoi T. N. Nguyen,
Lan C. Ngo,
Truong Do,
Truong Son Hy
Abstract:
Scene graphs have been proven to be useful for various scene understanding tasks due to their compact and explicit nature. However, existing approaches often neglect the importance of maintaining the symmetry-preserving property when generating scene graphs from 3D point clouds. This oversight can diminish the accuracy and robustness of the resulting scene graphs, especially when handling noisy, multi-view 3D data. This work, to the best of our knowledge, is the first to implement an Equivariant Graph Neural Network in semantic scene graph generation from 3D point clouds for scene understanding. Our proposed method, ESGNN, outperforms existing state-of-the-art approaches, demonstrating a significant improvement in scene estimation with faster convergence. ESGNN demands low computational resources and is easy to implement from available frameworks, paving the way for real-time applications such as robotics and computer vision.
Submitted 30 June, 2024;
originally announced July 2024.
-
Distinguishing Surface and Bulk Electromagnetism via Their Dynamics in an Intrinsic Magnetic Topological Insulator
Authors:
Khanh Duy Nguyen,
Woojoo Lee,
Jianchen Dang,
Tongyao Wu,
Gabriele Berruto,
Chenhui Yan,
Chi Ian Jess Ip,
Haoran Lin,
Qiang Gao,
Seng Huat Lee,
Binghai Yan,
Chaoxing Liu,
Zhiqiang Mao,
Xiao-Xiao Zhang,
Shuolong Yang
Abstract:
The indirect exchange interaction between local magnetic moments via surface electrons has been long predicted to bolster the surface ferromagnetism in magnetic topological insulators (MTIs), which facilitates the quantum anomalous Hall effect. This unconventional effect is critical to determining the operating temperatures of future topotronic devices. However, the experimental confirmation of this mechanism remains elusive, especially in intrinsic MTIs. Here we combine time-resolved photoemission spectroscopy with time-resolved magneto-optical Kerr effect measurements to elucidate the unique electromagnetism at the surface of an intrinsic MTI MnBi2Te4. Theoretical modeling based on 2D Ruderman-Kittel-Kasuya-Yosida interactions captures the initial quenching of a surface-rooted exchange gap within a factor of two but over-estimates the bulk demagnetization by one order of magnitude. This mechanism directly explains the sizable gap in the quasi-2D electronic state and the nonzero residual magnetization in even-layer MnBi2Te4. Furthermore, it leads to efficient light-induced demagnetization comparable to state-of-the-art magnetophotonic crystals, promising an effective manipulation of magnetism and topological orders for future topotronics.
Submitted 28 June, 2024;
originally announced July 2024.
-
14 New Light Curves and an Updated Ephemeris for the Hot Jupiter HAT-P-54 b
Authors:
Heather B. Hewitt,
Bradley Hutson,
Michael Brockman,
Elizabeth Catogni,
Rosemary Ferreira,
Gary Fussell,
Atea Johnson,
Chris Kight,
Ryan A. Kilinski,
Khatu Nguyen,
Ty Perry,
Elizabeth Quinlan,
Eva Randazzo,
Kellan Reagan,
Kinley Subers,
Federico R. Noguer,
Molly N. Simon,
Robert T. Zellem
Abstract:
Here we present an analysis of 14 transit light curves of the hot Jupiter HAT-P-54 b. Thirteen of our datasets were obtained with the 6-inch MicroObservatory telescope, Cecilia, and one was measured with the 61-inch Kuiper Telescope. We used the EXOplanet Transit Interpretation Code (EXOTIC) to reduce 49 datasets in order to update the planet's ephemeris to a mid-transit time of 2460216.95257 +/- 0.00022 BJD_TDB and an updated orbital period of 3.79985363 +/- 0.00000037 days. These results improve the mid-transit uncertainty by 70.27% from the most recent ephemeris update. The updated mid-transit time can help to ensure the efficient use of expensive, large ground- and space-based telescope missions in the future. This result demonstrates that amateur astronomers and citizen scientists can provide meaningful, cost-efficient, crowd-sourcing observations using ground-based telescopes to further refine current mid-transit times and orbital periods.
Submitted 26 June, 2024;
originally announced June 2024.
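A linear ephemeris such as the one quoted in this abstract can be propagated to future transits. The sketch below is illustrative only: the function name `predict_transit` is hypothetical, and it assumes the errors on the mid-transit time and the period are uncorrelated, which the abstract does not state.

```python
import math

# Values quoted in the abstract (days, barycentric Julian date).
T0 = 2460216.95257        # reference mid-transit time
SIGMA_T0 = 0.00022        # uncertainty on T0
PERIOD = 3.79985363       # orbital period
SIGMA_P = 0.00000037      # uncertainty on the period


def predict_transit(epoch):
    """Return (mid-transit time, 1-sigma uncertainty) for a given epoch.

    Assumes a linear ephemeris T(E) = T0 + E * P and, as a simplifying
    assumption, uncorrelated errors on T0 and P.
    """
    t_mid = T0 + epoch * PERIOD
    sigma = math.sqrt(SIGMA_T0 ** 2 + (epoch * SIGMA_P) ** 2)
    return t_mid, sigma


if __name__ == "__main__":
    for epoch in (0, 100, 1000):
        t_mid, sigma = predict_transit(epoch)
        print(f"epoch {epoch:5d}: {t_mid:.5f} +/- {sigma:.5f}")
```

The period term grows linearly with the epoch number, which is why a small period uncertainty matters most when scheduling observations far from the reference transit.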
-
Violation of $γ$ in Brans-Dicke gravity
Authors:
Hoang Ky Nguyen,
Bertrand Chauvineau
Abstract:
The Brans Class I solution in Brans-Dicke gravity is a staple in the study of gravitational theories beyond General Relativity. Discovered in 1961, it describes the exterior vacuum of a spherical Brans-Dicke star and is characterized by two adjustable parameters. Surprisingly, the relationship between these parameters and the properties of the star has not been rigorously established. In this Proceeding, we bridge this gap by deriving $\textit{the}$ complete exterior solution of Brans Class I, expressed in terms of the total energy and total pressure of the spherically symmetric gravity source. The solution allows for the $\textit{exact}$ derivation of $\textit{all}$ post-Newtonian parameters in Brans-Dicke gravity for far field regions of a spherical source. Particularly for the $γ$ parameter, instead of the conventional result $γ_{\,\text{PPN}}=\frac{ω+1}{ω+2}$, we obtain the analytical expression $γ_{\,\text{exact}}=\frac{ω+1+(ω+2)\,Θ}{ω+2+(ω+1)\,Θ}$ where $Θ$ is the ratio of the total pressure $P_{\parallel}^{*}+2P_{\perp}^{*}$ and total energy $E^{*}$ contained within the mass source. Our $\textit{non-perturbative}$ $γ$ formula is valid for all field strengths and types of matter comprising the mass source. Consequently, observational constraints on $γ$ set $\textit{joint}$ bounds on $ω$ and $Θ$, with the latter representing a global characteristic of the mass source. More broadly, our formula highlights the importance of pressure (when $Θ\neq0$) in spherical Brans-Dicke stars, and potentially in stars within other modified theories of gravitation.
Submitted 25 July, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
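The two expressions for $γ$ quoted in this abstract can be compared numerically. The sketch below only transcribes the formulas as written; the function names are hypothetical, and the sample values of $ω$ and $Θ$ are arbitrary illustrations, not values from the paper.

```python
def gamma_ppn(omega):
    """Conventional result: gamma = (omega + 1) / (omega + 2)."""
    return (omega + 1.0) / (omega + 2.0)


def gamma_exact(omega, theta):
    """Expression from the abstract, with theta the ratio of total
    pressure to total energy of the source."""
    numerator = omega + 1.0 + (omega + 2.0) * theta
    denominator = omega + 2.0 + (omega + 1.0) * theta
    return numerator / denominator


if __name__ == "__main__":
    omega = 10.0  # arbitrary illustrative value
    for theta in (0.0, 0.1, 1.0):
        print(f"theta={theta}: exact={gamma_exact(omega, theta):.6f}, "
              f"ppn={gamma_ppn(omega):.6f}")
```

Two algebraic checks follow directly from the formula: setting the ratio to zero recovers the conventional expression, and setting it to one makes numerator and denominator equal, so the result is 1 for any $ω$ with $2ω+3\neq0$.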
-
ViANLI: Adversarial Natural Language Inference for Vietnamese
Authors:
Tin Van Huynh,
Kiet Van Nguyen,
Ngan Luu-Thuy Nguyen
Abstract:
The development of Natural Language Inference (NLI) datasets and models has been inspired by innovations in annotation design. With the rapid development of machine learning models today, the performance of existing machine learning models has quickly reached state-of-the-art results on a variety of tasks related to natural language processing, including natural language inference tasks. By using a pre-trained model during the annotation process, it is possible to challenge current NLI models by having humans produce premise-hypothesis combinations that the machine model cannot correctly predict. To keep research on natural language inference for Vietnamese attractive and challenging, in this paper we introduce an adversarial NLI dataset, named ViANLI, to the NLP research community. This dataset contains more than 10K premise-hypothesis pairs and is built through a continuously adjusted process to get the most out of the patterns generated by the annotators. The ViANLI dataset poses difficulties for many current SOTA models: the accuracy of the most powerful model on the test set reached only 48.4%. Additionally, the experimental results show that models trained on our dataset significantly improve results on other Vietnamese NLI datasets.
Submitted 1 July, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
Real-time Speech Summarization for Medical Conversations
Authors:
Khai Le-Duc,
Khai-Nguyen Nguyen,
Long Vo-Dang,
Truong-Son Hy
Abstract:
In doctor-patient conversations, identifying medically relevant information is crucial, posing the need for conversation summarization. In this work, we propose the first deployable real-time speech summarization system for real-world applications in industry, which generates a local summary after every N speech utterances within a conversation and a global summary after the end of a conversation. Our system could enhance user experience from a business standpoint, while also reducing computational costs from a technical perspective. Secondly, we present VietMed-Sum which, to our knowledge, is the first speech summarization dataset for medical conversations. Thirdly, we are the first to utilize LLM and human annotators collaboratively to create gold standard and synthetic summaries for medical conversation summarization. Finally, we present baseline results of state-of-the-art models on VietMed-Sum. All code, data (English-translated and Vietnamese) and models are available online: https://github.com/leduckhai/MultiMed
Submitted 22 June, 2024;
originally announced June 2024.
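The local/global scheme described in this abstract (a local summary after every N utterances, a global summary at the end) can be sketched as a small buffer. This is not the authors' system: the class `RollingSummarizer`, the stand-in `join_summary` function, and the choice to build the global summary from the full transcript are all assumptions made for illustration.

```python
from typing import Callable, List, Optional


class RollingSummarizer:
    """Emit a local summary every n utterances and a global one at the end."""

    def __init__(self, n: int, summarize: Callable[[List[str]], str]):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self.summarize = summarize          # hypothetical summarization backend
        self.window: List[str] = []         # utterances since the last local summary
        self.transcript: List[str] = []     # every utterance seen so far

    def add(self, utterance: str) -> Optional[str]:
        """Record an utterance; return a local summary when the window fills."""
        self.window.append(utterance)
        self.transcript.append(utterance)
        if len(self.window) < self.n:
            return None
        local = self.summarize(self.window)
        self.window = []
        return local

    def finish(self) -> str:
        """Return the global summary (assumed here to use the full transcript)."""
        return self.summarize(self.transcript)


def join_summary(utterances: List[str]) -> str:
    """Stand-in for a real model: simply concatenates the utterances."""
    return " | ".join(utterances)


if __name__ == "__main__":
    session = RollingSummarizer(2, join_summary)
    for line in ["I have a headache", "Since when?", "Two days"]:
        local = session.add(line)
        if local is not None:
            print("local:", local)
    print("global:", session.finish())
```

In a real deployment the `summarize` callable would wrap a speech recognition and summarization model; the buffer logic stays the same.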
-
A mechanism for quantum-critical Planckian metal phase in high-temperature cuprate superconductors
Authors:
Yung-Yeh Chang,
Khoe Van Nguyen,
Kim Remund,
Chung-Hou Chung
Abstract:
The mysterious metallic phase showing perfect $T$-linear resistivity, a universal scattering rate $1/τ= α_P k_B T /\hbar$ with a universal prefactor $α_P \sim 1$, and a logarithmic-in-temperature singular specific heat coefficient, the so-called Planckian metal phase, has been observed in various overdoped high-$T_c$ cuprate superconductors over a finite range in doping. Here, we propose a microscopic mechanism for this exotic state based on quantum-critical bosonic charge Kondo fluctuations coupled to both spinon and heavy conduction-electron Fermi surfaces within the heavy-fermion formulation of the slave-boson $t$-$J$ model. Using a controlled perturbative renormalization group (RG) analysis, we examine the competition between the pseudogap phase, characterized by Anderson's Resonating-Valence-Bond spin-liquid, and the Fermi-liquid state, characterized by the electron hopping (effective charge Kondo effect). We find a quantum-critical metallic phase with a universal Planckian $\hbar ω/k_B T$ scaling in scattering rate near a localized-delocalized (pseudogap-to-Fermi liquid) charge Kondo breakdown transition. Our results are in excellent agreement with the recent experimental observations on optical conductivity (without fine-tuning) in Nat. Commun. 14, 3033 (2023), universal doping-independent field-to-temperature scaling in magnetoresistance in Nature 595, 661 (2021), and the marginal Fermi-liquid spectral function observed in ARPES (Science 366, 1099 (2019)) as well as Hall coefficient in various overdoped cuprates in Nature 595, 661 (2021) and Annu. Rev. Condens. Matter Phys. 10, 409 (2019). Our mechanism offers a microscopic understanding of the quantum-critical Planckian metal phase observed in cuprates d-wave superconducting, and Fermi liquid phases.
Submitted 21 June, 2024;
originally announced June 2024.
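To give a sense of scale for the Planckian scattering rate $1/τ= α_P k_B T/\hbar$ quoted in this abstract, the sketch below evaluates the corresponding scattering time. The function name and the sample temperatures are illustrative assumptions; the constants are the SI values of the Boltzmann and reduced Planck constants.

```python
HBAR = 1.054571817e-34   # reduced Planck constant, J s
K_B = 1.380649e-23       # Boltzmann constant, J / K


def planckian_time(temperature_k, alpha=1.0):
    """Scattering time tau = hbar / (alpha * k_B * T), in seconds."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    return HBAR / (alpha * K_B * temperature_k)


if __name__ == "__main__":
    for temperature in (10.0, 100.0, 300.0):
        tau = planckian_time(temperature)
        print(f"T = {temperature:5.0f} K -> tau = {tau:.3e} s")
```

With a prefactor of order one, the scattering time at room temperature comes out at a few tens of femtoseconds and scales as $1/T$.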
-
Bioptic -- A Target-Agnostic Potency-Based Small Molecules Search Engine
Authors:
Vlad Vinogradov,
Ivan Izmailov,
Simon Steshin,
Kong T. Nguyen
Abstract:
Recent successes in virtual screening have been made possible by large models and extensive chemical libraries. However, combining these elements is challenging: the larger the model, the more expensive it is to run, making ultra-large libraries unfeasible. To address this, we developed a target-agnostic, efficacy-based molecule search model, which allows us to find structurally dissimilar molecules with similar biological activities. We used best practices to design a fast retrieval system, based on processor-optimized SIMD instructions, enabling us to screen the ultra-large 40B Enamine REAL library with a 100% recall rate. We extensively benchmarked our model and several state-of-the-art models for both speed performance and retrieval quality of novel molecules.
Submitted 30 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Medical Spoken Named Entity Recognition
Authors:
Khai Le-Duc,
David Thulke,
Hung-Phong Tran,
Long Vo-Dang,
Khai-Nguyen Nguyen,
Truong-Son Hy,
Ralf Schlüter
Abstract:
Spoken Named Entity Recognition (NER) aims to extract named entities from speech and categorize them into types like person, location, organization, etc. In this work, we present VietMed-NER - the first spoken NER dataset in the medical domain. To the best of our knowledge, our real-world dataset is the largest spoken NER dataset in the world in terms of the number of entity types, featuring 18 distinct types. Secondly, we present baseline results using various state-of-the-art pre-trained models: encoder-only and sequence-to-sequence. We found that the pre-trained multilingual model XLM-R outperformed all monolingual models on both reference text and ASR output. Also, in general, encoders perform better than sequence-to-sequence models for the NER task. By simply translating, the transcript is applicable not just to Vietnamese but to other languages as well. All code, data and models are made publicly available here: https://github.com/leduckhai/MultiMed
Submitted 20 July, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
Beyond the Visible: Jointly Attending to Spectral and Spatial Dimensions with HSI-Diffusion for the FINCH Spacecraft
Authors:
Ian Vyse,
Rishit Dagli,
Dav Vrat Chadha,
John P. Ma,
Hector Chen,
Isha Ruparelia,
Prithvi Seran,
Matthew Xie,
Eesa Aamer,
Aidan Armstrong,
Naveen Black,
Ben Borstein,
Kevin Caldwell,
Orrin Dahanaggamaarachchi,
Joe Dai,
Abeer Fatima,
Stephanie Lu,
Maxime Michet,
Anoushka Paul,
Carrie Ann Po,
Shivesh Prakash,
Noa Prosser,
Riddhiman Roy,
Mirai Shinjo,
Iliya Shofman
, et al. (4 additional authors not shown)
Abstract:
Satellite remote sensing missions have gained popularity over the past fifteen years due to their ability to cover large swaths of land at regular intervals, making them ideal for monitoring environmental trends. The FINCH mission, a 3U+ CubeSat equipped with a hyperspectral camera, aims to monitor crop residue cover in agricultural fields. Although hyperspectral imaging captures both spectral and spatial information, it is prone to various types of noise, including random noise, stripe noise, and dead pixels. Effective denoising of these images is crucial for downstream scientific tasks. Traditional methods, including hand-crafted techniques encoding strong priors, learned 2D image denoising methods applied across different hyperspectral bands, or diffusion generative models applied independently on bands, often struggle with varying noise strengths across spectral bands, leading to significant spectral distortion. This paper presents a novel approach to hyperspectral image denoising using latent diffusion models that integrate spatial and spectral information. We particularly do so by building a 3D diffusion model and presenting a 3-stage training approach on real and synthetically crafted datasets. The proposed method preserves image structure while reducing noise. Evaluations on both popular hyperspectral denoising datasets and synthetically crafted datasets for the FINCH mission demonstrate the effectiveness of this approach.
Submitted 15 June, 2024;
originally announced June 2024.
-
Search for photons above 10$^{18}$ eV by simultaneously measuring the atmospheric depth and the muon content of air showers at the Pierre Auger Observatory
Authors:
The Pierre Auger Collaboration,
A. Abdul Halim,
P. Abreu,
M. Aglietta,
I. Allekotte,
K. Almeida Cheminant,
A. Almela,
R. Aloisio,
J. Alvarez-Muñiz,
J. Ammerman Yebra,
G. A. Anastasi,
L. Anchordoqui,
B. Andrada,
L. Andrade Dourado,
S. Andringa,
L. Apollonio,
C. Aramo,
P. R. Araújo Ferreira,
E. Arnone,
J. C. Arteaga Velázquez,
P. Assis,
G. Avila,
E. Avocone,
A. Bakalova,
F. Barbato
, et al. (342 additional authors not shown)
Abstract:
The Pierre Auger Observatory is the most sensitive instrument to detect photons with energies above $10^{17}$ eV. It measures extensive air showers generated by ultra high energy cosmic rays using a hybrid technique that exploits the combination of a fluorescence detector with a ground array of particle detectors. The signatures of a photon-induced air shower are a larger atmospheric depth of the shower maximum ($X_{max}$) and a steeper lateral distribution function, along with a lower number of muons with respect to the bulk of hadron-induced cascades. In this work, a new analysis technique in the energy interval between 1 and 30 EeV (1 EeV = $10^{18}$ eV) has been developed by combining the fluorescence detector-based measurement of $X_{max}$ with the specific features of the surface detector signal through a parameter related to the air shower muon content, derived from the universality of the air shower development. No evidence of a statistically significant signal due to photon primaries was found using data collected in about 12 years of operation. Thus, upper bounds to the integral photon flux have been set using a detailed calculation of the detector exposure, in combination with a data-driven background estimation. The derived 95% confidence level upper limits are 0.0403, 0.01113, 0.0035, 0.0023, and 0.0021 km$^{-2}$ sr$^{-1}$ yr$^{-1}$ above 1, 2, 3, 5, and 10 EeV, respectively, leading to the most stringent upper limits on the photon flux in the EeV range. Compared with past results, the upper limits were improved by about 40% for the lowest energy threshold and by a factor 3 above 3 EeV, where no candidates were found and the expected background is negligible. The presented limits can be used to probe the assumptions on chemical composition of ultra-high energy cosmic rays and allow for the constraint of the mass and lifetime phase space of super-heavy dark matter particles.
Submitted 11 June, 2024;
originally announced June 2024.
-
On the structure of the value function of optimal exit time problems
Authors:
Piermarco Cannarsa,
Marco Mazzola,
Khai T. Nguyen
Abstract:
In this paper, we study an optimal exit time problem with general running and terminal costs and a target $\mathcal{S}\subset\mathbb{R}^d$ having an inner ball property for a nonlinear control system that satisfies mild controllability assumptions. In particular, Petrov's condition at the boundary of $\mathcal{S}$ is not required and the value function $V$ may fail to be locally Lipschitz. In such a weakened set-up, we first establish a representation formula for proximal (horizontal) supergradients of $V$ by using transported proximal normal vectors. This allows us to obtain an external sphere condition for the hypograph of $V$ which yields several regularity properties. In particular, $V$ is almost everywhere twice differentiable and the Hausdorff dimension of its singularities is not greater than $d-1/2$. Furthermore, besides optimality conditions for trajectories of the optimal control problem, we extend the analysis to propagation of singularities and differentiability properties of the value function. An upper bound for the Hausdorff measure of the singular set is also studied, which implies that $V$ is a function of special bounded variation.
Submitted 10 June, 2024;
originally announced June 2024.
-
Measurement of the Depth of Maximum of Air-Shower Profiles with energies between $\mathbf{10^{18.5}}$ and $\mathbf{10^{20}}$ eV using the Surface Detector of the Pierre Auger Observatory and Deep Learning
Authors:
The Pierre Auger Collaboration,
A. Abdul Halim,
P. Abreu,
M. Aglietta,
I. Allekotte,
K. Almeida Cheminant,
A. Almela,
R. Aloisio,
J. Alvarez-Muñiz,
J. Ammerman Yebra,
G. A. Anastasi,
L. Anchordoqui,
B. Andrada,
L. Andrade Dourado,
S. Andringa,
L. Apollonio,
C. Aramo,
P. R. Araújo Ferreira,
E. Arnone,
J. C. Arteaga Velázquez,
P. Assis,
G. Avila,
E. Avocone,
A. Bakalova,
F. Barbato
, et al. (342 additional authors not shown)
Abstract:
We report an investigation of the mass composition of cosmic rays with energies from 3 to 100 EeV (1 EeV=$10^{18}$ eV) using the distributions of the depth of shower maximum $X_\mathrm{max}$. The analysis relies on ${\sim}50,000$ events recorded by the Surface Detector of the Pierre Auger Observatory and a deep-learning-based reconstruction algorithm. Above energies of 5 EeV, the data set offers a 10-fold increase in statistics with respect to fluorescence measurements at the Observatory. After cross-calibration using the Fluorescence Detector, this enables the first measurement of the evolution of the mean and the standard deviation of the $X_\mathrm{max}$ distributions up to 100 EeV. Our findings are threefold:
(1.) The evolution of the mean logarithmic mass towards a heavier composition with increasing energy can be confirmed and is extended to 100 EeV.
(2.) The evolution of the fluctuations of $X_\mathrm{max}$ towards a heavier and purer composition with increasing energy can be confirmed with high statistics. We report a rather heavy composition and small fluctuations in $X_\mathrm{max}$ at the highest energies.
(3.) We find indications for a characteristic structure beyond a constant change in the mean logarithmic mass, featuring three breaks that are observed in proximity to the ankle, instep, and suppression features in the energy spectrum.
Submitted 10 June, 2024;
originally announced June 2024.
-
Inference of the Mass Composition of Cosmic Rays with energies from $\mathbf{10^{18.5}}$ to $\mathbf{10^{20}}$ eV using the Pierre Auger Observatory and Deep Learning
Authors:
The Pierre Auger Collaboration,
A. Abdul Halim,
P. Abreu,
M. Aglietta,
I. Allekotte,
K. Almeida Cheminant,
A. Almela,
R. Aloisio,
J. Alvarez-Muñiz,
J. Ammerman Yebra,
G. A. Anastasi,
L. Anchordoqui,
B. Andrada,
L. Andrade Dourado,
S. Andringa,
L. Apollonio,
C. Aramo,
P. R. Araújo Ferreira,
E. Arnone,
J. C. Arteaga Velázquez,
P. Assis,
G. Avila,
E. Avocone,
A. Bakalova,
F. Barbato
, et al. (342 additional authors not shown)
Abstract:
We present measurements of the atmospheric depth of the shower maximum $X_\mathrm{max}$, inferred for the first time on an event-by-event level using the Surface Detector of the Pierre Auger Observatory. Using deep learning, we were able to extend measurements of the $X_\mathrm{max}$ distributions up to energies of 100 EeV ($10^{20}$ eV), not yet revealed by current measurements, providing new insights into the mass composition of cosmic rays at extreme energies. Gaining a 10-fold increase in statistics compared to the Fluorescence Detector data, we find evidence that the rate of change of the average $X_\mathrm{max}$ with the logarithm of energy features three breaks at $6.5\pm0.6~(\mathrm{stat})\pm1~(\mathrm{sys})$ EeV, $11\pm 2~(\mathrm{stat})\pm1~(\mathrm{sys})$ EeV, and $31\pm5~(\mathrm{stat})\pm3~(\mathrm{sys})$ EeV, in the vicinity of the three prominent features (ankle, instep, suppression) of the cosmic-ray flux. The energy evolution of the mean and standard deviation of the measured $X_\mathrm{max}$ distributions indicates that the mass composition becomes increasingly heavier and purer, thus being incompatible with a large fraction of light nuclei between 50 EeV and 100 EeV.
Submitted 10 June, 2024;
originally announced June 2024.
-
Unguided structure learning of DAGs for count data
Authors:
Thi Kim Hue Nguyen,
Monica Chiogna,
Davide Risso
Abstract:
Mainly motivated by the problem of modelling directional dependence relationships for multivariate count data in high-dimensional settings, we present a new algorithm, called learnDAG, for learning the structure of directed acyclic graphs (DAGs). In particular, the proposed algorithm tackles the problem of learning DAGs from observational data in two main steps: (i) estimation of candidate parent sets; and (ii) feature selection. We experimentally compare learnDAG to several popular competitors in recovering the true structure of the graphs in situations where relatively moderate sample sizes are available. Furthermore, to strengthen the case for our algorithm, a validation of the algorithm is presented through the analysis of real datasets.
Submitted 7 June, 2024;
originally announced June 2024.
-
UnWave-Net: Unrolled Wavelet Network for Compton Tomography Image Reconstruction
Authors:
Ishak Ayad,
Cécilia Tarpau,
Javier Cebeiro,
Maï K. Nguyen
Abstract:
Computed tomography (CT) is a widely used medical imaging technique to scan internal structures of a body, typically involving collimation and mechanical rotation. Compton scatter tomography (CST) presents an interesting alternative to conventional CT by leveraging Compton physics instead of collimation to gather information from multiple directions. While CST introduces new imaging opportunities with several advantages such as high sensitivity, compactness, and entirely fixed systems, image reconstruction remains an open problem due to the mathematical challenges of CST modeling. In contrast, deep unrolling networks have demonstrated potential in CT image reconstruction, despite their computationally intensive nature. In this study, we investigate the efficiency of unrolling networks for CST image reconstruction. To address the significant computational cost required for training, we propose UnWave-Net, a novel unrolled wavelet-based reconstruction network. This architecture includes a non-local regularization term based on wavelets, which captures long-range dependencies within images and emphasizes the multi-scale components of the wavelet transform. We evaluate our approach using a CST of circular geometry which stays completely static during data acquisition, where UnWave-Net facilitates image reconstruction in the absence of a specific reconstruction formula. Our method outperforms existing approaches and achieves state-of-the-art performance in terms of SSIM and PSNR, and offers improved computational efficiency compared to traditional unrolling networks.
Submitted 5 June, 2024;
originally announced June 2024.
-
A sharp quantitative estimate of critical sets
Authors:
Andrew Murdza,
Khai T. Nguyen
Abstract:
The paper establishes a sharp quantitative estimate for the $(d-1)$-Hausdorff measure of the critical set of $\mathcal{C}^1$ vector-valued functions on $\mathbb{R}^d$. Additionally, we prove that for a generic $\mathcal{C}^2$ function, where ``generic'' is understood in the topological sense of Baire category, the critical set has a locally finite $(d-1)$-Hausdorff measure.
Submitted 27 May, 2024;
originally announced May 2024.
-
UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models
Authors:
Quan Van Nguyen,
Huy Quang Pham,
Dan Quang Tran,
Thang Kien-Bao Nguyen,
Nhat-Hao Nguyen-Dang,
Bao-Thien Nguyen-Tat
Abstract:
Purpose: This study focuses on the development of automated text generation from radiology images, termed diagnostic captioning, to assist medical professionals in reducing clinical errors and improving productivity. The aim is to provide tools that enhance report quality and efficiency, which can significantly impact both clinical practice and deep learning research in the biomedical field. Methods: In our participation in the ImageCLEFmedical2024 Caption evaluation campaign, we explored caption prediction tasks using advanced Transformer-based models. We developed methods incorporating Transformer encoder-decoder and Query Transformer architectures. These models were trained and evaluated to generate diagnostic captions from radiology images. Results: Experimental evaluations demonstrated the effectiveness of our models, with the VisionDiagnostor-BioBART model achieving the highest BERTScore of 0.6267. This performance contributed to our team, DarkCow, achieving third place on the leaderboard. Conclusion: Our diagnostic captioning models show great promise in aiding medical professionals by generating high-quality reports efficiently. This approach can facilitate better data processing and performance optimization in medical imaging departments, ultimately benefiting healthcare delivery.
Submitted 27 May, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
Retro: Reusing teacher projection head for efficient embedding distillation on Lightweight Models via Self-supervised Learning
Authors:
Khanh-Binh Nguyen,
Chae Jung Park
Abstract:
Self-supervised learning (SSL) is gaining attention for its ability to learn effective representations with large amounts of unlabeled data. Lightweight models can be distilled from larger self-supervised pre-trained models using contrastive and consistency constraints. Still, the different sizes of the projection heads make it challenging for students to mimic the teacher's embedding accurately. We propose \textsc{Retro}, which reuses the teacher's projection head for students, and our experimental results demonstrate significant improvements over the state-of-the-art on all lightweight models. For instance, when training EfficientNet-B0 using ResNet-50/101/152 as teachers, our approach improves the linear result on ImageNet to $66.9\%$, $69.3\%$, and $69.8\%$, respectively, with significantly fewer parameters.
Submitted 24 August, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.