Showing 1–50 of 64 results for author: Pham, T

Searching in archive eess.
  1. arXiv:2408.04174  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG cs.SD eess.AS

    wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech

    Authors: Khai Le-Duc, Quy-Anh Dang, Tan-Hanh Pham, Truong-Son Hy

    Abstract: Knowledge graphs (KGs) enhance the performance of large language models (LLMs) and search engines by providing structured, interconnected data that improves reasoning and context-awareness. However, KGs only focus on text data, thereby neglecting other modalities such as speech. In this work, we introduce wav2graph, the first framework for supervised learning knowledge graph from speech data. Our…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Preprint, 32 pages

  2. arXiv:2408.02990  [pdf, ps, other]

    eess.SY

    Joint Design of Probabilistic Constellation Shaping and Precoding for Multi-user VLC Systems

    Authors: Thang K. Nguyen, Thanh V. Pham, Hoang D. Le, Chuyen T. Nguyen, Anh T. Pham

    Abstract: This paper proposes a joint design of probabilistic constellation shaping (PCS) and precoding to enhance the sum-rate performance of multi-user visible light communications (VLC) broadcast channels subject to signal amplitude constraint. In the proposed design, the transmission probabilities of bipolar $M$-pulse amplitude modulation ($M$-PAM) symbols for each user and the transmit precoding matrix…

    Submitted 6 August, 2024; originally announced August 2024.

  3. arXiv:2408.02982  [pdf, ps, other]

    eess.SY

    Practical Design of Probabilistic Constellation Shaping for Physical Layer Security in Visible Light Communications

    Authors: Thanh V. Pham, Susumu Ishihara

    Abstract: This paper studies a practical design of probabilistic constellation shaping (PCS) for physical layer security in visible light communications (VLC). In particular, we consider a wiretap VLC channel employing a probabilistically shaped $M$-ary pulse amplitude modulation (PAM) constellation. Considering the requirements for reliability of the legitimate user's channel, flickering-free transmission,…

    Submitted 6 August, 2024; originally announced August 2024.

  4. arXiv:2407.12064  [pdf, other]

    eess.IV cs.CL cs.CV cs.LG cs.MM

    LiteGPT: Large Vision-Language Model for Joint Chest X-ray Localization and Classification Task

    Authors: Khai Le-Duc, Ryan Zhang, Ngoc Son Nguyen, Tan-Hanh Pham, Anh Dao, Ba Hung Ngo, Anh Totti Nguyen, Truong-Son Hy

    Abstract: Vision-language models have been extensively explored across a wide range of tasks, achieving satisfactory performance; however, their application in medical imaging remains underexplored. In this work, we propose a unified framework - LiteGPT - for the medical imaging. We leverage multiple pre-trained visual encoders to enrich information and enhance the performance of vision-language models. To…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Preprint, 19 pages

  5. arXiv:2407.01963  [pdf, other]

    eess.AS

    Towards Unsupervised Speaker Diarization System for Multilingual Telephone Calls Using Pre-trained Whisper Model and Mixture of Sparse Autoencoders

    Authors: Phat Lam, Lam Pham, Tin Nguyen, Hieu Tang, Thinh Pham, Loi Khanh Nguyen, Alexander Schindler

    Abstract: Existing speaker diarization systems heavily rely on large amounts of manually annotated data, which is labor-intensive and challenging to collect in real-world scenarios. Additionally, the language-specific constraint in speaker diarization systems significantly hinders their applicability and scalability in multilingual settings. In this paper, we therefore propose a cluster-based speaker diariz…

    Submitted 7 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: 8 pages, 7 figures

  6. arXiv:2402.15677  [pdf, other]

    eess.SY cs.MA

    Consensus seeking in diffusive multidimensional networks with a repeated interaction pattern and time-delays

    Authors: Hoang Huy Vu, Quyen Ngoc Nguyen, Chuong Van Nguyen, Tuynh Van Pham, Minh Hoang Trinh

    Abstract: This paper studies a consensus problem in multidimensional networks having the same agent-to-agent interaction pattern under both intra- and cross-layer time delays. Several conditions for the agents to globally asymptotically achieve a consensus are derived, which involve the overall network's structure, the local interacting pattern, and the values of the time delays. The validity of these condi…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 6 pages, 7 figures, submitted to a journal

  7. arXiv:2402.13554  [pdf, ps, other]

    cs.IT eess.SP

    Secrecy Performance Analysis of Space-to-Ground Optical Satellite Communications

    Authors: Thang V. Nguyen, Thanh V. Pham, Anh T. Pham, Dang T. Ngoc

    Abstract: Free-space optics (FSO)-based satellite communication systems have recently received considerable attention due to their enhanced capacity compared to their radio frequency (RF) counterparts. This paper analyzes the performance of physical layer security of space-to-ground intensity modulation/direct detection FSO satellite links under the effect of atmospheric loss, misalignment, cloud attenuatio…

    Submitted 21 February, 2024; originally announced February 2024.

  8. arXiv:2402.13549  [pdf, ps, other]

    cs.IT eess.SY

    Q-learning-based Joint Design of Adaptive Modulation and Precoding for Physical Layer Security in Visible Light Communications

    Authors: Duc M. T. Hoang, Thanh V. Pham, Anh T. Pham, Chuyen T Nguyen

    Abstract: There has been an increasing interest in physical layer security (PLS), which, compared with conventional cryptography, offers a unique approach to guaranteeing information confidentiality against eavesdroppers. In this paper, we study a joint design of adaptive $M$-ary pulse amplitude modulation (PAM) and precoding, which aims to optimize wiretap visible-light channels' secrecy capacity and bit e…

    Submitted 21 February, 2024; originally announced February 2024.

  9. arXiv:2402.06226  [pdf]

    eess.SY cs.LG

    N-1 Reduced Optimal Power Flow Using Augmented Hierarchical Graph Neural Network

    Authors: Thuan Pham, Xingpeng Li

    Abstract: Optimal power flow (OPF) is used to perform generation redispatch in power system real-time operations. N-1 OPF can ensure safe grid operations under diverse contingency scenarios. For large and intricate power networks with numerous variables and constraints, achieving an optimal solution for real-time N-1 OPF necessitates substantial computational resources. To mitigate this challenge, machine l…

    Submitted 9 February, 2024; originally announced February 2024.

  10. arXiv:2401.05915  [pdf, other]

    eess.IV

    Neural Implicit Surface Reconstruction of Freehand 3D Ultrasound Volume with Geometric Constraints

    Authors: Hongbo Chen, Logiraj Kumaralingam, Shuhang Zhang, Sheng Song, Fayi Zhang, Haibin Zhang, Thanh-Tu Pham, Edmond H. M. Lou, Kumaradevan Punithakumar, Yuyao Zhang, Lawrence H. Le, Rui Zheng

    Abstract: Three-dimensional (3D) freehand ultrasound (US) is a widely used imaging modality that allows non-invasive imaging of medical anatomy without radiation exposure. Surface reconstruction of US volume is vital to acquire the accurate anatomical structures needed for modeling, registration, and visualization. However, traditional methods cannot produce a high-quality surface due to image noise. Despit…

    Submitted 11 July, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: Preprint

  11. arXiv:2312.12587  [pdf, other]

    eess.SP cs.DC q-bio.TO

    Real-Time Diagnostic Integrity Meets Efficiency: A Novel Platform-Agnostic Architecture for Physiological Signal Compression

    Authors: Neel R Vora, Amir Hajighasemi, Cody T. Reynolds, Amirmohammad Radmehr, Mohamed Mohamed, Jillur Rahman Saurav, Abdul Aziz, Jai Prakash Veerla, Mohammad S Nasr, Hayden Lotspeich, Partha Sai Guttikonda, Thuong Pham, Aarti Darji, Parisa Boodaghi Malidarreh, Helen H Shang, Jay Harvey, Kan Ding, Phuc Nguyen, Jacob M Luber

    Abstract: Head-based signals such as EEG, EMG, EOG, and ECG collected by wearable systems will play a pivotal role in clinical diagnosis, monitoring, and treatment of important brain disorder diseases. However, the real-time transmission of the significant corpus physiological signals over extended periods consumes substantial power and time, limiting the viability of battery-dependent physiological monit…

    Submitted 4 January, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  12. arXiv:2312.09422  [pdf, other]

    eess.SP cs.LG math.FA stat.ME

    Joint Alignment of Multivariate Quasi-Periodic Functional Data Using Deep Learning

    Authors: Vi Thanh Pham, Jonas Bille Nielsen, Klaus Fuglsang Kofoed, Jørgen Tobias Kühl, Andreas Kryger Jensen

    Abstract: The joint alignment of multivariate functional data plays an important role in various fields such as signal processing, neuroscience and medicine, including the statistical analysis of data from wearable devices. Traditional methods often ignore the phase variability and instead focus on the variability in the observed amplitude. We present a novel method for joint alignment of multivariate quasi…

    Submitted 14 November, 2023; originally announced December 2023.

    Comments: 28 pages, 6 figures

  13. arXiv:2312.03196  [pdf, other]

    cs.LG eess.SP

    Domain Invariant Representation Learning and Sleep Dynamics Modeling for Automatic Sleep Staging

    Authors: Seungyeon Lee, Thai-Hoang Pham, Zhao Cheng, Ping Zhang

    Abstract: Sleep staging has become a critical task in diagnosing and treating sleep disorders to prevent sleep related diseases. With growing large scale sleep databases, significant progress has been made toward automatic sleep staging. However, previous studies face critical problems in sleep studies; the heterogeneity of subjects' physiological signals, the inability to extract meaningful information fro…

    Submitted 9 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

  14. arXiv:2311.18508  [pdf, other]

    eess.IV cs.CV

    DifAugGAN: A Practical Diffusion-style Data Augmentation for GAN-based Single Image Super-resolution

    Authors: Axi Niu, Kang Zhang, Joshua Tian Jin Tee, Trung X. Pham, Jinqiu Sun, Chang D. Yoo, In So Kweon, Yanning Zhang

    Abstract: It is well known the adversarial optimization of GAN-based image super-resolution (SR) methods makes the preceding SR model generate unpleasant and undesirable artifacts, leading to large distortion. We attribute the cause of such distortions to the poor calibration of the discriminator, which hampers its ability to provide meaningful feedback to the generator for learning high-quality images. To…

    Submitted 30 November, 2023; originally announced November 2023.

  15. arXiv:2311.11096  [pdf, other]

    eess.IV cs.CV

    On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation

    Authors: Duy Minh Ho Nguyen, Tan Ngoc Pham, Nghiem Tuong Diep, Nghi Quoc Phan, Quang Pham, Vinh Tong, Binh T. Nguyen, Ngan Hoang Le, Nhat Ho, Pengtao Xie, Daniel Sonntag, Mathias Niepert

    Abstract: Constructing a robust model that can effectively generalize to test samples under distribution shifts remains a significant challenge in the field of medical imaging. The foundational models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach. It showcases impressive learning abilities across different tasks with the need for…

    Submitted 18 November, 2023; originally announced November 2023.

    Comments: Advances in Neural Information Processing Systems (NeurIPS) 2023, Workshop on robustness of zero/few-shot learning in foundation models

  16. arXiv:2310.09998  [pdf, other]

    eess.IV cs.CV

    SeUNet-Trans: A Simple yet Effective UNet-Transformer Model for Medical Image Segmentation

    Authors: Tan-Hanh Pham, Xianqi Li, Kim-Doang Nguyen

    Abstract: Automated medical image segmentation is becoming increasingly crucial to modern clinical practice, driven by the growing demand for precise diagnosis, the push towards personalized treatment plans, and the advancements in machine learning algorithms, especially the incorporation of deep learning methods. While convolutional neural networks (CNN) have been prevalent among these methods, the remarka…

    Submitted 10 November, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

  17. arXiv:2309.15483  [pdf, ps, other]

    cs.IT eess.SY

    Energy-Efficient Precoding Designs for Multi-User Visible Light Communication Systems with Confidential Messages

    Authors: Son T. Duong, Thanh V. Pham, Chuyen T. Nguyen, Anh T. Pham

    Abstract: This paper studies energy-efficient precoding designs for multi-user visible light communication (VLC) systems from the perspective of physical layer security where users' messages must be kept mutually confidential. For such systems, we first derive a lower bound on the achievable secrecy rate of each user. Next, the total power consumption for illumination and data transmission is thoroughly ana…

    Submitted 27 September, 2023; originally announced September 2023.

  18. arXiv:2309.14636  [pdf, ps, other]

    cs.IT eess.SY

    Design of Energy-Efficient Artificial Noise for Physical Layer Security in Visible Light Communications

    Authors: Thanh V. Pham, Anh T. Pham, Susumu Ishihara

    Abstract: This paper studies the design of energy-efficient artificial noise (AN) schemes in the context of physical layer security in visible light communications (VLC). Two different transmission schemes termed $\textit{selective AN-aided single-input single-output (SISO)}$ and $\textit{AN-aided multiple-input single-output (MISO)}$ are examined and compared in terms of secrecy energy efficiency (SEE). In…

    Submitted 25 September, 2023; originally announced September 2023.

  19. arXiv:2307.09261  [pdf, other]

    eess.SP physics.optics

    Optical Diffraction Tomography Meets Fluorescence Localization Microscopy

    Authors: Thanh-An Pham, Emmanuel Soubies, Ferréol Soulez, Michael Unser

    Abstract: We show that structural information can be extracted from single molecule localization microscopy (SMLM) data. More precisely, we reinterpret SMLM data as the measures of a phaseless optical diffraction tomography system for which the illumination sources are fluorophores within the sample. Building upon this model, we propose a joint optimization framework to estimate both the refractive index ma…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: Presented in ISCS23

    Report number: ISCS-11

  20. arXiv:2307.02043  [pdf, other]

    math.OC eess.IV

    A Mini-Batch Quasi-Newton Proximal Method for Constrained Total-Variation Nonlinear Image Reconstruction

    Authors: Tao Hong, Thanh-an Pham, Irad Yavneh, Michael Unser

    Abstract: Over the years, computational imaging with accurate nonlinear physical models has drawn considerable interest due to its ability to achieve high-quality reconstructions. However, such nonlinear models are computationally demanding. A popular choice for solving the corresponding inverse problems is accelerated stochastic proximal methods (ASPMs), with the caveat that each iteration is expensive. To…

    Submitted 16 August, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: 12 pages, 12 figures, 2 tables

  21. arXiv:2305.19709  [pdf, other]

    cs.CL cs.SD eess.AS

    XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

    Authors: Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen

    Abstract: We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task. Our XPhoneBERT has the same model architecture as BERT-base, trained using the RoBERTa pre-training approach on 330M phoneme-level sentences from nearly 100 languages and locales. Experimental results show that employing XPhoneBERT as an input phoneme encod…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: In Proceedings of INTERSPEECH 2023 (to appear)

  22. arXiv:2305.19353  [pdf, other]

    eess.SY

    Bearing-Constrained Leader-Follower Formation of Single-Integrators with Disturbance Rejection: Adaptive Variable-Structure Approaches

    Authors: Thanh Truong Nguyen, Dung Van Vu, Tuynh Van Pham, Minh Hoang Trinh

    Abstract: This paper studies the problem of stabilizing a leader-follower formation specified by a set of bearing constraints and being disturbed by some unknown uniformly bounded disturbance{s}. A set of leaders are positioned at their desired positions, while each follower is modeled by a single integrator with an additive time-varying disturbance. Adaptive variable-structure control laws using displaceme…

    Submitted 5 June, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: 19 pages, 6 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  23. arXiv:2303.12337  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    Music-Driven Group Choreography

    Authors: Nhat Le, Thang Pham, Tuong Do, Erman Tjiputra, Quang D. Tran, Anh Nguyen

    Abstract: Music-driven choreography is a challenging problem with a wide variety of industrial applications. Recently, many methods have been proposed to synthesize dance motions from music for a single dancer. However, generating dance motion for a group remains an open problem. In this paper, we present $\rm AIOZ-GDANCE$, a new large-scale dataset for music-driven group dance generation. Unlike existing d…

    Submitted 26 March, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

    Comments: accepted in CVPR 2023

  24. arXiv:2302.12831  [pdf, other]

    eess.IV cs.CV

    CDPMSR: Conditional Diffusion Probabilistic Models for Single Image Super-Resolution

    Authors: Axi Niu, Kang Zhang, Trung X. Pham, Jinqiu Sun, Yu Zhu, In So Kweon, Yanning Zhang

    Abstract: Diffusion probabilistic models (DPM) have been widely adopted in image-to-image translation to generate high-quality images. Prior attempts at applying the DPM to image super-resolution (SR) have shown that iteratively refining a pure Gaussian noise with a conditional image using a U-Net trained on denoising at various-level noises can help obtain a satisfied high-resolution image for the low-reso…

    Submitted 14 February, 2023; originally announced February 2023.

    Comments: 4 pages, 4 figures

  25. arXiv:2302.11125  [pdf, ps, other]

    cs.IT eess.SY

    On the Design of Artificial Noise for Physical Layer Security in Visible Light Communication Channels with Clipping

    Authors: Thanh V. Pham, Steve Hranilovic, Susumu Ishihara

    Abstract: Though visible light communication (VLC) systems are contained to a given room, improving their security is an important criterion in any practical deployment. In this paper, the design of artificial noise (AN) to enhance physical layer security in VLC systems is studied in the context of input signals with no explicit amplitude constraint (e.g., multicarrier systems). In such systems, clipping is…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: arXiv admin note: text overlap with arXiv:2210.00438

  26. arXiv:2210.15876  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition

    Authors: Yist Y. Lin, Tao Han, Haihua Xu, Van Tung Pham, Yerbolat Khassanov, Tze Yuang Chong, Yi He, Lu Lu, Zejun Ma

    Abstract: One of limitations in end-to-end automatic speech recognition (ASR) framework is its performance would be compromised if train-test utterance lengths are mismatched. In this paper, we propose an on-the-fly random utterance concatenation (RUC) based data augmentation method to alleviate train-test utterance length mismatch issue for short-video ASR task. Specifically, we are motivated by observatio…

    Submitted 25 May, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: 5 pages, 3 figures, 4 tables

  27. arXiv:2206.13591  [pdf]

    eess.SY cs.LG

    Reduced Optimal Power Flow Using Graph Neural Network

    Authors: Thuan Pham, Xingpeng Li

    Abstract: OPF problems are formulated and solved for power system operations, especially for determining generation dispatch points in real-time. For large and complex power system networks with large numbers of variables and constraints, finding the optimal solution for real-time OPF in a timely manner requires a massive amount of computing power. This paper presents a new method to reduce the number of co…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: 6 pages, 16 figures, 3 tables, Submitted (under review) to 54th North American Power Symposium (NAPS 2022)

  28. arXiv:2205.03122  [pdf]

    physics.med-ph eess.IV physics.optics

    Ultrathin, high-speed, all-optical photoacoustic endomicroscopy probe for guiding minimally invasive surgery

    Authors: Tianrui Zhao, Truc Thuy Pham, Christian Baker, Michelle T. Ma, Sebastien Ourselin, Tom Vercauteren, Edward Zhang, Paul C. Beard, Wenfeng Xia

    Abstract: Photoacoustic (PA) endoscopy has shown significant potential for clinical diagnosis and surgical guidance. Multimode fibres (MMFs) are becoming increasing attractive for the development of miniature endoscopy probes owing to ultrathin size, low cost and diffraction-limited spatial resolution enabled by wavefront shaping. However, current MMF-based PA endomicroscopy probes are either limited by a b…

    Submitted 6 May, 2022; originally announced May 2022.

  29. arXiv:2203.10078  [pdf, other]

    cs.CV eess.SP

    Bayesian Inversion for Nonlinear Imaging Models using Deep Generative Priors

    Authors: Pakshal Bohra, Thanh-an Pham, Jonathan Dong, Michael Unser

    Abstract: Most modern imaging systems incorporate a computational pipeline to infer the image of interest from acquired measurements. The Bayesian approach to solve such ill-posed inverse problems involves the characterization of the posterior distribution of the image. It depends on the model of the imaging system and on prior knowledge on the image of interest. In this work, we present a Bayesian reconstr…

    Submitted 25 May, 2023; v1 submitted 18 March, 2022; originally announced March 2022.

  30. Neural Network-based Power Flow Model

    Authors: Thuan Pham, Xingpeng Li

    Abstract: Power flow analysis is used to evaluate the flow of electricity in the power system network. Power flow calculation is used to determine the steady-state variables of the system, such as the voltage magnitude/phase angle of each bus and the active/reactive power flow on each branch. The DC power flow model is a popular linear power flow model that is widely used in the power industry. Although it…

    Submitted 12 March, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Journal ref: IEEE Green Technologies Conference 2022

  31. arXiv:2109.09026  [pdf, other]

    cs.SD cs.HC cs.LG eess.AS

    Hybrid Data Augmentation and Deep Attention-based Dilated Convolutional-Recurrent Neural Networks for Speech Emotion Recognition

    Authors: Nhat Truong Pham, Duc Ngoc Minh Dang, Sy Dzung Nguyen

    Abstract: Speech emotion recognition (SER) has been one of the significant tasks in Human-Computer Interaction (HCI) applications. However, it is hard to choose the optimal features and deal with imbalance labeled data. In this article, we investigate hybrid data augmentation (HDA) methods to generate and balance data based on traditional and generative adversarial networks (GAN) methods. To evaluate the ef…

    Submitted 18 September, 2021; originally announced September 2021.

    Comments: 12 pages, 16 figures, 6 tables

  32. arXiv:2109.03219  [pdf, other]

    cs.SD cs.LG cs.NE eess.AS

    Fruit-CoV: An Efficient Vision-based Framework for Speedy Detection and Diagnosis of SARS-CoV-2 Infections Through Recorded Cough Sounds

    Authors: Long H. Nguyen, Nhat Truong Pham, Van Huong Do, Liu Tai Nguyen, Thanh Tin Nguyen, Van Dung Do, Hai Nguyen, Ngoc Duy Nguyen

    Abstract: SARS-CoV-2 is colloquially known as COVID-19 that had an initial outbreak in December 2019. The deadly virus has spread across the world, taking part in the global pandemic disease since March 2020. In addition, a recent variant of SARS-CoV-2 named Delta is intractably contagious and responsible for more than four million deaths over the world. Therefore, it is vital to possess a self-testing serv…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: 4 pages

  33. arXiv:2108.11089  [pdf, other]

    cs.SD eess.AS

    Detecting Drill Failure in the Small Short-sound Drill Dataset

    Authors: Thanh Tran, Nhat Truong Pham, Jan Lundgren

    Abstract: Monitoring the conditions of machines is vital in the manufacturing industry. Early detection of faulty components in machines for stopping and repairing the failed components can minimize the downtime of the machine. This article presents an approach to detect the failure occurring in drill machines based on drill sounds from Valmet AB. The drill dataset includes three classes: anomalous sounds,…

    Submitted 9 November, 2021; v1 submitted 25 August, 2021; originally announced August 2021.

    Comments: 8 pages, 10 figures, journal

  34. arXiv:2108.00475  [pdf, other]

    cs.CV eess.IV

    Self-supervised Learning with Local Attention-Aware Feature

    Authors: Trung X. Pham, Rusty John Lloyd Mina, Dias Issa, Chang D. Yoo

    Abstract: In this work, we propose a novel methodology for self-supervised learning for generating global and local attention-aware visual features. Our approach is based on training a model to differentiate between specific image transformations of an input sample and the patched images. Utilizing this approach, the proposed method is able to outperform the previous best competitor by 1.03% on the Tiny-Ima…

    Submitted 1 August, 2021; originally announced August 2021.

    Comments: 5 pages, 4 figures

  35. arXiv:2107.10701  [pdf, other]

    eess.AS cs.SD

    Multitask-Based Joint Learning Approach To Robust ASR For Radio Communication Speech

    Authors: Duo Ma, Nana Hou, Van Tung Pham, Haihua Xu, Eng Siong Chng

    Abstract: To realize robust end-to-end Automatic Speech Recognition (E2E ASR) under radio communication condition, we propose a multitask-based method to joint train a Speech Enhancement (SE) module as the front-end and an E2E ASR model as the back-end in this paper. One of the advantage of the proposed method is that the entire system can be trained from scratch. Different from prior works, either component…

    Submitted 22 July, 2021; originally announced July 2021.

    Comments: 7 pages, 3 figures, submitted to APSIPA 2021

  36. arXiv:2107.03679  [pdf, other]

    eess.IV

    Diffraction Tomography with Helmholtz Equation: Efficient and Robust Multigrid-Based Solver

    Authors: Tao Hong, Thanh-an Pham, Eran Treister, Michael Unser

    Abstract: Diffraction tomography is a noninvasive technique that estimates the refractive indices of unknown objects and involves an inverse-scattering problem governed by the wave equation. Recent works have shown the benefit of nonlinear models of wave propagation that account for multiple scattering and reflections. In particular, the Lippmann-Schwinger (LiS) model defines an inverse problem to simulate…

    Submitted 8 July, 2021; originally announced July 2021.

    Comments: 12 pages, 13 figures, 2 tables

  37. arXiv:2106.02865  [pdf, ps, other]

    eess.SY

    Consensus Analysis over Clustered Networks of Multi-Agent Systems under External Disturbances

    Authors: Thiem V. Pham, Quynh T. T. Nguyen

    Abstract: This paper studies a consensus problem of multi-agent systems subjected to external disturbances over the clustered network. It considers that the agents are divided into several clusters. They are almost all the time isolated one from another, which has a directed spanning tree. The goal of agents achieves a common value. To support interaction between clusters with a minimum exchange of informat…

    Submitted 5 June, 2021; originally announced June 2021.

  38. arXiv:2106.02822  [pdf, other]

    eess.SY

    $\mathcal{H}_2/\mathcal{H}_{-}$ Distributed Fault Detection and Isolation for Heterogeneous Multi-Agent Systems

    Authors: Thiem V. Pham, Quynh T. T. Nguyen

    Abstract: The paper deals with the problem of distributed fault detection and isolation (FDI) for a group of heterogeneous multi-agent systems. The developed formation for the FDI is taken into account as a distributed observer design methodology, where the interaction between the agent and its neighbors is described as a vector of distributed relative output measurements. Based on two performance indexes…

    Submitted 5 June, 2021; originally announced June 2021.

  39. Robust MAML: Prioritization task buffer with adaptive learning process for model-agnostic meta-learning

    Authors: Thanh Nguyen, Tung Luu, Trung Pham, Sanzhar Rakhimkul, Chang D. Yoo

    Abstract: Model agnostic meta-learning (MAML) is a popular state-of-the-art meta-learning algorithm that provides good weight initialization of a model given a variety of learning tasks. The model initialized by provided weight can be fine-tuned to an unseen task despite only using a small amount of samples and within a few adaptation steps. MAML is simple and versatile but requires costly learning rate tun…

    Submitted 10 June, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

    Journal ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  40. arXiv:2102.10822  [pdf, ps, other]

    cs.IT eess.SP

    Energy-Efficient Precoding for Multi-User Visible Light Communication with Confidential Messages

    Authors: Son T. Duong, Thanh V. Pham, Chuyen T. Nguyen, Anh T. Pham

    Abstract: In this paper, an energy-efficient precoding scheme is designed for multi-user visible light communication (VLC) systems in the context of physical layer security, where users' messages are kept mutually confidential. The design problem is shown to be non-convex fractional programming, therefore Dinkelbach algorithm and convex-concave procedure (CCCP) based on the first-order Taylor approximation…

    Submitted 22 February, 2021; originally announced February 2021.

  41. arXiv:2012.03422  [pdf, ps, other]

    cs.IT eess.SY

    A General Conditional BER Expression of Rectangular QAM in the Presence of Phase Noise

    Authors: Thanh V. Pham, Thang V. Nguyen, Anh T. Pham

    Abstract: In this paper, we newly present a closed-form bit-error rate (BER) expression for an $M$-ary pulse-amplitude modulation ($M$-PAM) over additive white Gaussian noise (AWGN) channels by analytically characterizing the bit decision regions and positions. The obtained expression is then used to derive the conditional BER of a rectangular quadrature amplitude modulation (QAM) for a given value of phase…

    Submitted 4 January, 2021; v1 submitted 6 December, 2020; originally announced December 2020.

  42. arXiv:2010.13423  [pdf, ps, other

    eess.IV cs.LG math.OC physics.optics

    Optimal-transport-based metric for SMLM

    Authors: Quentin Denoyelle, Thanh-an Pham, Pol del Aguila Pla, Daniel Sage, Michael Unser

    Abstract: We propose the use of the Flat Metric to assess the performance of reconstruction methods for single-molecule localization microscopy (SMLM) in scenarios where the ground truth is available. The Flat Metric is intimately related to the concept of optimal transport between measures of different mass, providing solid mathematical foundations for SMLM evaluation and integrating both localization and detectio…

    Submitted 6 February, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

    Comments: Accepted to the 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI 2021); 5 pages, 4 figures

    MSC Class: 90C08; 49Q22; 92C55; 94A12

    ACM Class: I.4; J.3; J.2; I.m

  43. arXiv:2010.12143  [pdf, other

    cs.SD eess.AS

    Enriching Under-Represented Named-Entities To Improve Speech Recognition Performance

    Authors: Tingzhi Mao, Yerbolat Khassanov, Van Tung Pham, Haihua Xu, Hao Huang, Aishan Wumaier, Eng Siong Chng

    Abstract: Automatic speech recognition (ASR) for under-represented named entities (UR-NEs) is challenging because such named entities (NEs) have insufficient instances and poor contextual coverage in the training data to learn reliable estimates and representations. In this paper, we propose approaches to enriching UR-NEs to improve speech recognition performance. Specifically, our first priority is to ensure th…

    Submitted 22 October, 2020; originally announced October 2020.

  44. Distributed two-time-scale methods over clustered networks

    Authors: Thiem V. Pham, Thinh T. Doan, Dinh Hoa Nguyen

    Abstract: In this paper, we consider consensus problems over a network of nodes, where the network is divided into a number of clusters. We are interested in the case where the communication topology within each cluster is dense as compared to the sparse communication across the clusters. Moreover, each cluster has one leader which can communicate with other leaders in different clusters. The goal of the no…

    Submitted 1 October, 2020; originally announced October 2020.

  45. arXiv:2009.11554  [pdf, other

    eess.IV cs.LG eess.SP

    Robust Phase Unwrapping via Deep Image Prior for Quantitative Phase Imaging

    Authors: Fangshu Yang, Thanh-an Pham, Nathalie Brandenberg, Matthias P. Lutolf, Jianwei Ma, Michael Unser

    Abstract: Quantitative phase imaging (QPI) is an emerging label-free technique that produces images containing morphological and dynamical information without contrast agents. Unfortunately, the phase is wrapped in most imaging systems. Phase unwrapping is the computational process that recovers a more informative image. It is particularly challenging with thick and complex samples such as organoids. Recent…

    Submitted 24 September, 2020; originally announced September 2020.

  46. arXiv:2006.07094  [pdf, other

    eess.AS

    Monolingual Data Selection Analysis for English-Mandarin Hybrid Code-switching Speech Recognition

    Authors: Haobo Zhang, Haihua Xu, Van Tung Pham, Hao Huang, Eng Siong Chng

    Abstract: In this paper, we conduct a data selection analysis for building an English-Mandarin code-switching (CS) speech recognition (CSSR) system aimed at a real CSSR contest in China. The overall training data comprise three subsets: a code-switching data set, an English data set (LibriSpeech), and a Mandarin data set. The code-switching data are Mandarin-dominated. First of all, it is found…

    Submitted 13 September, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: 5 pages, conference; accepted by Interspeech 2020

  47. arXiv:2005.10407  [pdf, other

    eess.AS cs.LG cs.SD

    Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning

    Authors: Zhiping Zeng, Van Tung Pham, Haihua Xu, Yerbolat Khassanov, Eng Siong Chng, Chongjia Ni, Bin Ma

    Abstract: In this work, we study leveraging extra text data to improve low-resource end-to-end ASR under a cross-lingual transfer learning setting. To this end, we extend our prior work [1] and propose a hybrid Transformer-LSTM based architecture. This architecture not only takes advantage of the highly effective encoding capacity of the Transformer network but also benefits from extra text data due to the L…

    Submitted 28 May, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

  48. arXiv:2005.08742  [pdf, ps, other

    eess.AS cs.CL cs.SD

    Approaches to Improving Recognition of Underrepresented Named Entities in Hybrid ASR Systems

    Authors: Tingzhi Mao, Yerbolat Khassanov, Van Tung Pham, Haihua Xu, Hao Huang, Eng Siong Chng

    Abstract: In this paper, we present a series of complementary approaches to improve the recognition of underrepresented named entities (NEs) in hybrid ASR systems without compromising overall word error rate performance. The underrepresented words correspond to rare or out-of-vocabulary (OOV) words in the training data and therefore cannot be modeled reliably. We begin with a graphemic lexicon, which allows to dr…

    Submitted 18 May, 2020; originally announced May 2020.

  49. arXiv:2005.07068  [pdf

    cs.CV eess.IV

    Recognition of 26 Degrees of Freedom of Hands Using Model-based approach and Depth-Color Images

    Authors: Cong Hoang Quach, Minh Trien Pham, Anh Viet Dang, Dinh Tuan Pham, Thuan Hoang Tran, Manh Duong Phung

    Abstract: In this study, we present a model-based approach to recognize the full 26 degrees of freedom of a human hand. Input data include RGB-D images acquired from a Kinect camera and a 3D model of the hand constructed from its anatomy and graphical matrices. A cost function is then defined so that its minimum value is achieved when the model and observation images are matched. To solve the optimization prob…

    Submitted 13 May, 2020; originally announced May 2020.

    Comments: In Proceedings of the 2014 National Conference on Electronics, Communications and Information Technology (REV-ECIT). In Vietnamese.

  50. arXiv:2003.02597   

    cs.CV cs.LG eess.IV

    AI outperformed every dermatologist: Improved dermoscopic melanoma diagnosis through customizing batch logic and loss function in an optimized Deep CNN architecture

    Authors: Cong Tri Pham, Mai Chi Luong, Dung Van Hoang, Antoine Doucet

    Abstract: Melanoma, one of the most dangerous types of skin cancer, results in a very high mortality rate. Early detection and resection are two key points for a successful cure. Recent research has used artificial intelligence to classify melanoma and nevus and to compare the assessment of these algorithms to that of dermatologists. However, an imbalance of sensitivity and specificity measures affected the pe…

    Submitted 28 August, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

    Comments: We are submitting the article to a journal and awaiting the review result, so we want to temporarily withdraw the article. When the article is officially accepted, it will be resubmitted.