Search | arXiv e-print repository

Certified Continual Learning for Neural Network Regression

Abstract: On the one hand, there has been considerable progress on neural network verification in recent years, which makes certifying neural networks a possibility. On the other hand, neural networks in practice are often re-trained over time to cope with new data distribution or for solving different tasks (a.k.a. continual learning). Once re-trained, the verified correctness of the neural network is like… ▽ More On the one hand, there has been considerable progress on neural network verification in recent years, which makes certifying neural networks a possibility. On the other hand, neural networks in practice are often re-trained over time to cope with new data distribution or for solving different tasks (a.k.a. continual learning). Once re-trained, the verified correctness of the neural network is likely broken, particularly in the presence of the phenomenon known as catastrophic forgetting. In this work, we propose an approach called certified continual learning which improves existing continual learning methods by preserving, as long as possible, the established correctness properties of a verified network. Our approach is evaluated with multiple neural networks and on two different continual learning methods. The results show that our approach is efficient and the trained models preserve their certified correctness and often maintain high utility. △ Less

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.03110 [pdf, other]

A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection)

Authors: Lam Pham, Phat Lam, Tin Nguyen, Hieu Tang, Alexander Schindler

Abstract: In this paper, we present a toolchain for a comprehensive audio/video analysis by leveraging deep learning based multimodal approach. To this end, different specific tasks of Speech to Text (S2T), Acoustic Scene Classification (ASC), Acoustic Event Detection (AED), Visual Object Detection (VOD), Image Captioning (IC), and Video Captioning (VC) are conducted and integrated into the toolchain. By co… ▽ More In this paper, we present a toolchain for a comprehensive audio/video analysis by leveraging deep learning based multimodal approach. To this end, different specific tasks of Speech to Text (S2T), Acoustic Scene Classification (ASC), Acoustic Event Detection (AED), Visual Object Detection (VOD), Image Captioning (IC), and Video Captioning (VC) are conducted and integrated into the toolchain. By combining individual tasks and analyzing both audio \& visual data extracted from input video, the toolchain offers various audio/video-based applications: Two general applications of audio/video clustering, comprehensive audio/video summary and a specific application of riot or violent context detection. Furthermore, the toolchain presents a flexible and adaptable architecture that is effective to integrate new models for further audio/video-based applications. △ Less

Submitted 2 May, 2024; originally announced July 2024.

arXiv:2407.01777 [pdf, other]

Deepfake Audio Detection Using Spectrogram-based Feature and Ensemble of Deep Learning Models

Authors: Lam Pham, Phat Lam, Truong Nguyen, Huyen Nguyen, Alexander Schindler

Abstract: In this paper, we propose a deep learning based system for the task of deepfake audio detection. In particular, the draw input audio is first transformed into various spectrograms using three transformation methods of Short-time Fourier Transform (STFT), Constant-Q Transform (CQT), Wavelet Transform (WT) combined with different auditory-based filters of Mel, Gammatone, linear filters (LF), and dis… ▽ More In this paper, we propose a deep learning based system for the task of deepfake audio detection. In particular, the draw input audio is first transformed into various spectrograms using three transformation methods of Short-time Fourier Transform (STFT), Constant-Q Transform (CQT), Wavelet Transform (WT) combined with different auditory-based filters of Mel, Gammatone, linear filters (LF), and discrete cosine transform (DCT). Given the spectrograms, we evaluate a wide range of classification models based on three deep learning approaches. The first approach is to train directly the spectrograms using our proposed baseline models of CNN-based model (CNN-baseline), RNN-based model (RNN-baseline), C-RNN model (C-RNN baseline). Meanwhile, the second approach is transfer learning from computer vision models such as ResNet-18, MobileNet-V3, EfficientNet-B0, DenseNet-121, SuffleNet-V2, Swint, Convnext-Tiny, GoogLeNet, MNASsnet, RegNet. In the third approach, we leverage the state-of-the-art audio pre-trained models of Whisper, Seamless, Speechbrain, and Pyannote to extract audio embeddings from the input spectrograms. Then, the audio embeddings are explored by a Multilayer perceptron (MLP) model to detect the fake or real audio samples. Finally, high-performance deep learning models from these approaches are fused to achieve the best performance. We evaluated our proposed models on ASVspoof 2019 benchmark dataset. Our best ensemble model achieved an Equal Error Rate (EER) of 0.03, which is highly competitive to top-performing systems in the ASVspoofing 2019 challenge. Experimental results also highlight the potential of selective spectrograms and deep learning approaches to enhance the task of audio deepfake detection. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2405.14781 [pdf, other]

Unified Neural Backdoor Removal with Only Few Clean Samples through Unlearning and Relearning

Authors: Nay Myat Min, Long H. Pham, Jun Sun

Abstract: The application of deep neural network models in various security-critical applications has raised significant security concerns, particularly the risk of backdoor attacks. Neural backdoors pose a serious security threat as they allow attackers to maliciously alter model behavior. While many defenses have been explored, existing approaches are often bounded by model-specific constraints, or necess… ▽ More The application of deep neural network models in various security-critical applications has raised significant security concerns, particularly the risk of backdoor attacks. Neural backdoors pose a serious security threat as they allow attackers to maliciously alter model behavior. While many defenses have been explored, existing approaches are often bounded by model-specific constraints, or necessitate complex alterations to the training process, or fall short against diverse backdoor attacks. In this work, we introduce a novel method for comprehensive and effective elimination of backdoors, called ULRL (short for UnLearn and ReLearn for backdoor removal). ULRL requires only a small set of clean samples and works effectively against all kinds of backdoors. It first applies unlearning for identifying suspicious neurons and then targeted neural weight tuning for backdoor mitigation (i.e., by promoting significant weight deviation on the suspicious neurons). Evaluated against 12 different types of backdoors, ULRL is shown to significantly outperform state-of-the-art methods in eliminating backdoors whilst preserving the model utility. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.09330 [pdf, other]

BARO: Robust Root Cause Analysis for Microservices via Multivariate Bayesian Online Change Point Detection

Authors: Luan Pham, Huong Ha, Hongyu Zhang

Abstract: Detecting failures and identifying their root causes promptly and accurately is crucial for ensuring the availability of microservice systems. A typical failure troubleshooting pipeline for microservices consists of two phases: anomaly detection and root cause analysis. While various existing works on root cause analysis require accurate anomaly detection, there is no guarantee of accurate estimat… ▽ More Detecting failures and identifying their root causes promptly and accurately is crucial for ensuring the availability of microservice systems. A typical failure troubleshooting pipeline for microservices consists of two phases: anomaly detection and root cause analysis. While various existing works on root cause analysis require accurate anomaly detection, there is no guarantee of accurate estimation with anomaly detection techniques. Inaccurate anomaly detection results can significantly affect the root cause localization results. To address this challenge, we propose BARO, an end-to-end approach that integrates anomaly detection and root cause analysis for effectively troubleshooting failures in microservice systems. BARO leverages the Multivariate Bayesian Online Change Point Detection technique to model the dependency within multivariate time-series metrics data, enabling it to detect anomalies more accurately. BARO also incorporates a novel nonparametric statistical hypothesis testing technique for robustly identifying root causes, which is less sensitive to the accuracy of anomaly detection compared to existing works. Our comprehensive experiments conducted on three popular benchmark microservice systems demonstrate that BARO consistently outperforms state-of-the-art approaches in both anomaly detection and root cause analysis. △ Less

Submitted 15 May, 2024; originally announced May 2024.

Comments: This paper has been accepted to FSE'24

arXiv:2405.06870 [pdf, other]

Noise-Tolerant Codebooks for Semi-Quantitative Group Testing: Application to Spatial Genomics

Authors: Kok Hao Chen, Duc Tu Dao, Han Mao Kiah, Van Long Phuoc Pham, Eitan Yaakobi

Abstract: Motivated by applications in spatial genomics, we revisit group testing (Dorfman~1943) and propose the class of $λ$-{\sf ADD}-codes, studying such codes with certain distance $d$ and codelength $n$. When $d$ is constant, we provide explicit code constructions with rates close to $1/2$. When $d$ is proportional to $n$, we provide a GV-type lower bound whose rates are efficiently computable. Upper b… ▽ More Motivated by applications in spatial genomics, we revisit group testing (Dorfman~1943) and propose the class of $λ$-{\sf ADD}-codes, studying such codes with certain distance $d$ and codelength $n$. When $d$ is constant, we provide explicit code constructions with rates close to $1/2$. When $d$ is proportional to $n$, we provide a GV-type lower bound whose rates are efficiently computable. Upper bounds for such codes are also studied. △ Less

Submitted 10 May, 2024; originally announced May 2024.

Comments: To appear in ISIT 2024 Proceedings

arXiv:2403.00379 [pdf, other]

The Impact of Frequency Bands on Acoustic Anomaly Detection of Machines using Deep Learning Based Model

Authors: Tin Nguyen, Lam Pham, Phat Lam, Dat Ngo, Hieu Tang, Alexander Schindler

Abstract: In this paper, we propose a deep learning based model for Acoustic Anomaly Detection of Machines, the task for detecting abnormal machines by analysing the machine sound. By conducting extensive experiments, we indicate that multiple techniques of pseudo audios, audio segment, data augmentation, Mahalanobis distance, and narrow frequency bands, which mainly focus on feature engineering, are effect… ▽ More In this paper, we propose a deep learning based model for Acoustic Anomaly Detection of Machines, the task for detecting abnormal machines by analysing the machine sound. By conducting extensive experiments, we indicate that multiple techniques of pseudo audios, audio segment, data augmentation, Mahalanobis distance, and narrow frequency bands, which mainly focus on feature engineering, are effective to enhance the system performance. Among the evaluating techniques, the narrow frequency bands presents a significant impact. Indeed, our proposed model, which focuses on the narrow frequency bands, outperforms the DCASE baseline on the benchmark dataset of DCASE 2022 Task 2 Development set. The important role of the narrow frequency bands indicated in this paper inspires the research community on the task of Acoustic Anomaly Detection of Machines to further investigate and propose novel network architectures focusing on the frequency bands. △ Less

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2401.17571 [pdf, other]

Is Registering Raw Tagged-MR Enough for Strain Estimation in the Era of Deep Learning?

Authors: Zhangxing Bian, Ahmed Alshareef, Shuwen Wei, Junyu Chen, Yuli Wang, Jonghye Woo, Dzung L. Pham, Jiachen Zhuo, Aaron Carass, Jerry L. Prince

Abstract: Magnetic Resonance Imaging with tagging (tMRI) has long been utilized for quantifying tissue motion and strain during deformation. However, a phenomenon known as tag fading, a gradual decrease in tag visibility over time, often complicates post-processing. The first contribution of this study is to model tag fading by considering the interplay between $T_1$ relaxation and the repeated application… ▽ More Magnetic Resonance Imaging with tagging (tMRI) has long been utilized for quantifying tissue motion and strain during deformation. However, a phenomenon known as tag fading, a gradual decrease in tag visibility over time, often complicates post-processing. The first contribution of this study is to model tag fading by considering the interplay between $T_1$ relaxation and the repeated application of radio frequency (RF) pulses during serial imaging sequences. This is a factor that has been overlooked in prior research on tMRI post-processing. Further, we have observed an emerging trend of utilizing raw tagged MRI within a deep learning-based (DL) registration framework for motion estimation. In this work, we evaluate and analyze the impact of commonly used image similarity objectives in training DL registrations on raw tMRI. This is then compared with the Harmonic Phase-based approach, a traditional approach which is claimed to be robust to tag fading. Our findings, derived from both simulated images and an actual phantom scan, reveal the limitations of various similarity losses in raw tMRI and emphasize caution in registration tasks where image intensity changes over time. △ Less

Submitted 30 January, 2024; originally announced January 2024.

Comments: Accepted to SPIE Medical Imaging 2024 (oral)

arXiv:2401.15854 [pdf, other]

LSTM-based Deep Neural Network With A Focus on Sentence Representation for Sequential Sentence Classification in Medical Scientific Abstracts

Authors: Phat Lam, Lam Pham, Tin Nguyen, Hieu Tang, Michael Seidl, Medina Andresel, Alexander Schindler

Abstract: The Sequential Sentence Classification task within the domain of medical abstracts, termed as SSC, involves the categorization of sentences into pre-defined headings based on their roles in conveying critical information in the abstract. In the SSC task, sentences are sequentially related to each other. For this reason, the role of sentence embeddings is crucial for capturing both the semantic inf… ▽ More The Sequential Sentence Classification task within the domain of medical abstracts, termed as SSC, involves the categorization of sentences into pre-defined headings based on their roles in conveying critical information in the abstract. In the SSC task, sentences are sequentially related to each other. For this reason, the role of sentence embeddings is crucial for capturing both the semantic information between words in the sentence and the contextual relationship of sentences within the abstract, which then enhances the SSC system performance. In this paper, we propose a LSTM-based deep learning network with a focus on creating comprehensive sentence representation at the sentence level. To demonstrate the efficacy of the created sentence representation, a system utilizing these sentence embeddings is also developed, which consists of a Convolutional-Recurrent neural network (C-RNN) at the abstract level and a multi-layer perception network (MLP) at the segment level. Our proposed system yields highly competitive results compared to state-of-the-art systems and further enhances the F1 scores of the baseline by 1.0%, 2.8%, and 2.6% on the benchmark datasets PudMed 200K RCT, PudMed 20K RCT and NICTA-PIBOSO, respectively. This indicates the significant impact of improving sentence representation on boosting model performance. △ Less

Submitted 31 May, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

Comments: Submitted to FedCSIS 2024

arXiv:2401.11487 [pdf, other]

Towards Better Inclusivity: A Diverse Tweet Corpus of English Varieties

Authors: Nhi Pham, Lachlan Pham, Adam L. Meyers

Abstract: The prevalence of social media presents a growing opportunity to collect and analyse examples of English varieties. Whilst usage of these varieties was - and, in many cases, still is - used only in spoken contexts or hard-to-access private messages, social media sites like Twitter provide a platform for users to communicate informally in a scrapeable format. Notably, Indian English (Hinglish), Sin… ▽ More The prevalence of social media presents a growing opportunity to collect and analyse examples of English varieties. Whilst usage of these varieties was - and, in many cases, still is - used only in spoken contexts or hard-to-access private messages, social media sites like Twitter provide a platform for users to communicate informally in a scrapeable format. Notably, Indian English (Hinglish), Singaporean English (Singlish), and African-American English (AAE) can be commonly found online. These varieties pose a challenge to existing natural language processing (NLP) tools as they often differ orthographically and syntactically from standard English for which the majority of these tools are built. NLP models trained on standard English texts produced biased outcomes for users of underrepresented varieties. Some research has aimed to overcome the inherent biases caused by unrepresentative data through techniques like data augmentation or adjusting training models. We aim to address the issue of bias at its root - the data itself. We curate a dataset of tweets from countries with high proportions of underserved English variety speakers, and propose an annotation framework of six categorical classifications along a pseudo-spectrum that measures the degree of standard English and that thereby indirectly aims to surface the manifestations of English varieties in these tweets. Following best annotation practices, our growing corpus features 170,800 tweets taken from 7 countries, labeled by annotators who are from those countries and can communicate in regionally-dominant varieties of English. Our corpus highlights the accuracy discrepancies in pre-trained language identifiers between western English and non-western (i.e., less standard) English varieties. We hope to contribute to the growing literature identifying and reducing the implicit demographic discrepancies in NLP. △ Less

Submitted 21 January, 2024; originally announced January 2024.

Comments: 10 pages (including limitations, references and appendices), 2 figures

arXiv:2312.16717 [pdf, other]

Landslide Detection and Segmentation Using Remote Sensing Images and Deep Neural Network

Authors: Cam Le, Lam Pham, Jasmin Lampert, Matthias Schlögl, Alexander Schindler

Abstract: Knowledge about historic landslide event occurrence is important for supporting disaster risk reduction strategies. Building upon findings from 2022 Landslide4Sense Competition, we propose a deep neural network based system for landslide detection and segmentation from multisource remote sensing image input. We use a U-Net trained with Cross Entropy loss as baseline model. We then improve the U-Ne… ▽ More Knowledge about historic landslide event occurrence is important for supporting disaster risk reduction strategies. Building upon findings from 2022 Landslide4Sense Competition, we propose a deep neural network based system for landslide detection and segmentation from multisource remote sensing image input. We use a U-Net trained with Cross Entropy loss as baseline model. We then improve the U-Net baseline model by leveraging a wide range of deep learning techniques. In particular, we conduct feature engineering by generating new band data from the original bands, which helps to enhance the quality of remote sensing image input. Regarding the network architecture, we replace traditional convolutional layers in the U-Net baseline by a residual-convolutional layer. We also propose an attention layer which leverages the multi-head attention scheme. Additionally, we generate multiple output masks with three different resolutions, which creates an ensemble of three outputs in the inference process to enhance the performance. Finally, we propose a combined loss function which leverages Focal loss and IoU loss to train the network. Our experiments on the development set of the Landslide4Sense challenge achieve an F1 score and an mIoU score of 84.07 and 76.07, respectively. Our best model setup outperforms the challenge baseline and the proposed U-Net baseline, improving the F1 score/mIoU score by 6.8/7.4 and 10.5/8.8, respectively. △ Less

Submitted 27 December, 2023; originally announced December 2023.

arXiv:2312.12414 [pdf, ps, other]

Translating Natural Language Queries to SQL Using the T5 Model

Authors: Albert Wong, Lien Pham, Young Lee, Shek Chan, Razel Sadaya, Youry Khmelevsky, Mathias Clement, Florence Wing Yau Cheng, Joe Mahony, Michael Ferri

Abstract: This paper presents the development process of a natural language to SQL model using the T5 model as the basis. The models, developed in August 2022 for an online transaction processing system and a data warehouse, have a 73\% and 84\% exact match accuracy respectively. These models, in conjunction with other work completed in the research project, were implemented for several companies and used s… ▽ More This paper presents the development process of a natural language to SQL model using the T5 model as the basis. The models, developed in August 2022 for an online transaction processing system and a data warehouse, have a 73\% and 84\% exact match accuracy respectively. These models, in conjunction with other work completed in the research project, were implemented for several companies and used successfully on a daily basis. The approach used in the model development could be implemented in a similar fashion for other database environments and with a more powerful pre-trained language model. △ Less

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.01460 [pdf, other]

Towards an accurate and generalizable multiple sclerosis lesion segmentation model using self-ensembled lesion fusion

Authors: Jinwei Zhang, Lianrui Zuo, Blake E. Dewey, Samuel W. Remedios, Dzung L. Pham, Aaron Carass, Jerry L. Prince

Abstract: Automatic multiple sclerosis (MS) lesion segmentation using multi-contrast magnetic resonance (MR) images provides improved efficiency and reproducibility compared to manual delineation. Current state-of-the-art automatic MS lesion segmentation methods utilize modified U-Net-like architectures. However, in the literature, dedicated architecture modifications were always required to maximize their… ▽ More Automatic multiple sclerosis (MS) lesion segmentation using multi-contrast magnetic resonance (MR) images provides improved efficiency and reproducibility compared to manual delineation. Current state-of-the-art automatic MS lesion segmentation methods utilize modified U-Net-like architectures. However, in the literature, dedicated architecture modifications were always required to maximize their performance. In addition, the best-performing methods have not proven to be generalizable to diverse test datasets with contrast variations and image artifacts. In this work, we developed an accurate and generalizable MS lesion segmentation model using the well-known U-Net architecture without further modification. A novel test-time self-ensembled lesion fusion strategy is proposed that not only achieved the best performance using the ISBI 2015 MS segmentation challenge data but also demonstrated robustness across various self-ensemble parameter choices. Moreover, equipped with instance normalization rather than batch normalization widely used in literature, the model trained on the ISBI challenge data generalized well on clinical test datasets from different scanners. △ Less

Submitted 3 December, 2023; originally announced December 2023.

arXiv:2311.09231 [pdf, other]

Key Factors Affecting European Reactions to AI in European Full and Flawed Democracies

Authors: Long Pham, Barry O'Sullivan, Tai Tan Mai

Abstract: This study examines the key factors that affect European reactions to artificial intelligence (AI) in the context of both full and flawed democracies in Europe. Analysing a dataset of 4,006 respondents, categorised into full democracies and flawed democracies based on the Democracy Index developed by the Economist Intelligence Unit (EIU), this research identifies crucial factors that shape Europea… ▽ More This study examines the key factors that affect European reactions to artificial intelligence (AI) in the context of both full and flawed democracies in Europe. Analysing a dataset of 4,006 respondents, categorised into full democracies and flawed democracies based on the Democracy Index developed by the Economist Intelligence Unit (EIU), this research identifies crucial factors that shape European attitudes toward AI in these two types of democracies. The analysis reveals noteworthy findings. Firstly, it is observed that flawed democracies tend to exhibit higher levels of trust in government entities compared to their counterparts in full democracies. Additionally, individuals residing in flawed democracies demonstrate a more positive attitude toward AI when compared to respondents from full democracies. However, the study finds no significant difference in AI awareness between the two types of democracies, indicating a similar level of general knowledge about AI technologies among European citizens. Moreover, the study reveals that trust in AI measures, specifically "Trust AI Solution", does not significantly vary between full and flawed democracies. This suggests that despite the differences in democratic quality, both types of democracies have similar levels of confidence in AI solutions. △ Less

Submitted 4 October, 2023; originally announced November 2023.

Comments: IJCAI DemorcAI 2023 - The 2nd International Workshop on Democracy and AI in conjunction with IJCAI 2023, Macau

arXiv:2310.11477 [pdf, other]

Robust-MBFD: A Robust Deep Learning System for Motor Bearing Faults Detection Using Multiple Deep Learning Training Strategies and A Novel Double Loss Function

Authors: Khoa Tran, Lam Pham, Hai-Canh Vu

Abstract: This paper presents a comprehensive analysis of motor bearing fault detection (MBFD), which involves the task of identifying faults in a motor bearing based on its vibration. To this end, we first propose and evaluate various machine learning based systems for the MBFD task. Furthermore, we propose three deep learning based systems for the MBFD task, each of which explores one of the following tra… ▽ More This paper presents a comprehensive analysis of motor bearing fault detection (MBFD), which involves the task of identifying faults in a motor bearing based on its vibration. To this end, we first propose and evaluate various machine learning based systems for the MBFD task. Furthermore, we propose three deep learning based systems for the MBFD task, each of which explores one of the following training strategies: supervised learning, semi-supervised learning, and unsupervised learning. The proposed machine learning based systems and deep learning based systems are evaluated, compared, and then they are used to identify the best model for the MBFD task. We conducted extensive experiments on various benchmark datasets of motor bearing faults, including those from the American Society for Mechanical Failure Prevention Technology (MFPT), Case Western Reserve University Bearing Center (CWRU), and the Condition Monitoring of Bearing Damage in Electromechanical Drive Systems from Paderborn University (PU). The experimental results on different datasets highlight two main contributions of this study. First, we prove that deep learning based systems are more effective than machine learning based systems for the MBFD task. Second, we achieve a robust and general deep learning based system with a novel loss function for the MBFD task on several benchmark datasets, demonstrating its potential for real-life MBFD applications. △ Less

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2309.06157 [pdf, other]

Robust-MBDL: A Robust Multi-branch Deep Learning Based Model for Remaining Useful Life Prediction and Operational Condition Identification of Rotating Machines

Authors: Khoa Tran, Hai-Canh Vu, Lam Pham, Nassim Boudaoud

Abstract: In this paper, a Robust Multi-branch Deep learning-based system for remaining useful life (RUL) prediction and condition operations (CO) identification of rotating machines is proposed. In particular, the proposed system comprises main components: (1) an LSTM-Autoencoder to denoise the vibration data; (2) a feature extraction to generate time-domain, frequency-domain, and time-frequency based feat… ▽ More In this paper, a Robust Multi-branch Deep learning-based system for remaining useful life (RUL) prediction and condition operations (CO) identification of rotating machines is proposed. In particular, the proposed system comprises main components: (1) an LSTM-Autoencoder to denoise the vibration data; (2) a feature extraction to generate time-domain, frequency-domain, and time-frequency based features from the denoised data; (3) a novel and robust multi-branch deep learning network architecture to exploit the multiple features. The performance of our proposed system was evaluated and compared to the state-of-the-art systems on two benchmark datasets of XJTU-SY and PRONOSTIA. The experimental results prove that our proposed system outperforms the state-of-the-art systems and presents potential for real-life applications on bearing machines. △ Less

Submitted 14 December, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

arXiv:2309.01261 [pdf, other]

Worst-Case Input Generation for Concurrent Programs under Non-Monotone Resource Metrics

Authors: Long Pham, Jan Hoffmann

Abstract: Worst-case input generation aims to automatically generate inputs that exhibit the worst-case performance of programs. It has several applications, and can, for example, detect vulnerabilities to denial-of-service attacks. However, it is non-trivial to generate worst-case inputs for concurrent programs, particularly for resources like memory where the peak cost depends on how processes are schedul… ▽ More Worst-case input generation aims to automatically generate inputs that exhibit the worst-case performance of programs. It has several applications, and can, for example, detect vulnerabilities to denial-of-service attacks. However, it is non-trivial to generate worst-case inputs for concurrent programs, particularly for resources like memory where the peak cost depends on how processes are scheduled. This article presents the first sound worst-case input generation algorithm for concurrent programs under non-monotone resource metrics like memory. The key insight is to leverage resource-annotated session types and symbolic execution. Session types describe communication protocols on channels in process calculi. Equipped with resource annotations, resource-annotated session types not only encode cost bounds but also indicate how many resources can be reused and transferred between processes. This information is critical for identifying a worst-case execution path during symbolic execution. The algorithm is sound: if it returns any input, it is guaranteed to be a valid worst-case input. The algorithm is also relatively complete: as long as resource-annotated session types are sufficiently expressive and the background theory for SMT solving is decidable, a worst-case input is guaranteed to be returned. A simple case study of a web server's memory usage demonstrates the utility of the worst-case input generation algorithm. △ Less

Submitted 3 September, 2023; originally announced September 2023.

arXiv:2308.09979 [pdf, ps, other]

Artificial Intelligence across Europe: A Study on Awareness, Attitude and Trust

Authors: Teresa Scantamburlo, Atia Cortés, Francesca Foffano, Cristian Barrué, Veronica Distefano, Long Pham, Alessandro Fabris

Abstract: This paper presents the results of an extensive study investigating the opinions on Artificial Intelligence (AI) of a sample of 4,006 European citizens from eight distinct countries (France, Germany, Italy, Netherlands, Poland, Romania, Spain, and Sweden). The aim of the study is to gain a better understanding of people's views and perceptions within the European context, which is already marked b… ▽ More This paper presents the results of an extensive study investigating the opinions on Artificial Intelligence (AI) of a sample of 4,006 European citizens from eight distinct countries (France, Germany, Italy, Netherlands, Poland, Romania, Spain, and Sweden). The aim of the study is to gain a better understanding of people's views and perceptions within the European context, which is already marked by important policy actions and regulatory processes. To survey the perceptions of the citizens of Europe we design and validate a new questionnaire (PAICE) structured around three dimensions: people's awareness, attitude, and trust. We observe that while awareness is characterized by a low level of self-assessed competency, the attitude toward AI is very positive for more than half of the population. Reflecting upon the collected results, we highlight implicit contradictions and identify trends that may interfere with the creation of an ecosystem of trust and the development of inclusive AI policies. The introduction of rules that ensure legal and ethical standards, along with the activity of high-level educational entities, and the promotion of AI literacy are identified as key factors in supporting a trustworthy AI ecosystem. We make some recommendations for AI governance focused on the European context and conclude with suggestions for future work. △ Less

Submitted 19 August, 2023; originally announced August 2023.

arXiv:2307.02289 [pdf, other]

Fuzzing with Quantitative and Adaptive Hot-Bytes Identification

Authors: Tai D. Nguyen, Long H. Pham, Jun Sun

Abstract: Fuzzing has emerged as a powerful technique for finding security bugs in complicated real-world applications. American fuzzy lop (AFL), a leading fuzzing tool, has demonstrated its powerful bug finding ability through a vast number of reported CVEs. However, its random mutation strategy is unable to generate test inputs that satisfy complicated branching conditions (e.g., magic-byte comparisons, c… ▽ More Fuzzing has emerged as a powerful technique for finding security bugs in complicated real-world applications. American fuzzy lop (AFL), a leading fuzzing tool, has demonstrated its powerful bug finding ability through a vast number of reported CVEs. However, its random mutation strategy is unable to generate test inputs that satisfy complicated branching conditions (e.g., magic-byte comparisons, checksum tests, and nested if-statements), which are commonly used in image decoders/encoders, XML parsers, and checksum tools. Existing approaches (such as Steelix and Neuzz) on addressing this problem assume unrealistic assumptions such as we can satisfy the branch condition byte-to-byte or we can identify and focus on the important bytes in the input (called hot-bytes) once and for all. In this work, we propose an approach called \tool~which is designed based on the following principles. First, there is a complicated relation between inputs and branching conditions and thus we need not only an expressive model to capture such relationship but also an informative measure so that we can learn such relationship effectively. Second, different branching conditions demand different hot-bytes and we must adjust our fuzzing strategy adaptively depending on which branches are the current bottleneck. We implement our approach as an open source project and compare its efficiency with other state-of-the-art fuzzers. Our evaluation results on 10 real-world programs and LAVA-M dataset show that \tool~achieves sustained increases in branch coverage and discovers more bugs than other fuzzers. △ Less

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2306.14929 [pdf, other]

A Deep Learning Architecture with Spatio-Temporal Focusing for Detecting Respiratory Anomalies

Authors: Dat Ngo, Lam Pham, Huy Phan, Minh Tran, Delaram Jarchi

Abstract: This paper presents a deep learning system applied for detecting anomalies from respiratory sound recordings. Our system initially performs audio feature extraction using Continuous Wavelet transformation. This transformation converts the respiratory sound input into a two-dimensional spectrogram where both spectral and temporal features are presented. Then, our proposed deep learning architecture… ▽ More This paper presents a deep learning system applied for detecting anomalies from respiratory sound recordings. Our system initially performs audio feature extraction using Continuous Wavelet transformation. This transformation converts the respiratory sound input into a two-dimensional spectrogram where both spectral and temporal features are presented. Then, our proposed deep learning architecture inspired by the Inception-residual-based backbone performs the spatial-temporal focusing and multi-head attention mechanism to classify respiratory anomalies. In this work, we evaluate our proposed models on the benchmark SPRSound (The Open-Source SJTU Paediatric Respiratory Sound) database proposed by the IEEE BioCAS 2023 challenge. As regards the Score computed by an average between the average score and harmonic score, our robust system has achieved Top-1 performance with Scores of 0.810, 0.667, 0.744, and 0.608 in Tasks 1-1, 1-2, 2-1, and 2-2, respectively. △ Less

Submitted 25 June, 2023; originally announced June 2023.

Comments: arXiv admin note: text overlap with arXiv:2303.04104

arXiv:2305.09463 [pdf, other]

Low-complexity deep learning frameworks for acoustic scene classification using teacher-student scheme and multiple spectrograms

Authors: Lam Pham, Dat Ngo, Cam Le, Anahid Jalali, Alexander Schindler

Abstract: In this technical report, a low-complexity deep learning system for acoustic scene classification (ASC) is presented. The proposed system comprises two main phases: (Phase I) Training a teacher network; and (Phase II) training a student network using distilled knowledge from the teacher. In the first phase, the teacher, which presents a large footprint model, is trained. After training the teacher… ▽ More In this technical report, a low-complexity deep learning system for acoustic scene classification (ASC) is presented. The proposed system comprises two main phases: (Phase I) Training a teacher network; and (Phase II) training a student network using distilled knowledge from the teacher. In the first phase, the teacher, which presents a large footprint model, is trained. After training the teacher, the embeddings, which are the feature map of the second last layer of the teacher, are extracted. In the second phase, the student network, which presents a low complexity model, is trained with the embeddings extracted from the teacher. Our experiments conducted on DCASE 2023 Task 1 Development dataset have fulfilled the requirement of low-complexity and achieved the best classification accuracy of 57.4%, improving DCASE baseline by 14.5%. △ Less

Submitted 16 May, 2023; originally announced May 2023.

Comments: arXiv admin note: text overlap with arXiv:2206.06057

arXiv:2305.07274 [pdf, other]

Deletion Correcting Codes for Efficient DNA Synthesis

Authors: Johan Chrisnata, Han Mao Kiah, Van Long Phuoc Pham

Abstract: The synthesis of DNA strands remains the most costly part of the DNA storage system. Thus, to make DNA storage system more practical, the time and materials used in the synthesis process have to be optimized. We consider the most common type of synthesis process where multiple DNA strands are synthesized in parallel from a common alternating supersequence, one nucleotide at a time. The synthesis t… ▽ More The synthesis of DNA strands remains the most costly part of the DNA storage system. Thus, to make DNA storage system more practical, the time and materials used in the synthesis process have to be optimized. We consider the most common type of synthesis process where multiple DNA strands are synthesized in parallel from a common alternating supersequence, one nucleotide at a time. The synthesis time or the number of synthesis cycles is then determined by the length of this common supersequence. In this model, we design quaternary codes that minimizes synthesis time that can correct deletions or insertions, which are the most prevalent types of error in array-based synthesis. We also propose polynomial-time algorithms that encode binary strings into these codes and show that the rate is close to capacity. △ Less

Submitted 12 May, 2023; originally announced May 2023.

Comments: A shorter version of this paper will be presented in in ISIT 2023

arXiv:2305.01476 [pdf, other]

Deep Learning Based Multimodal with Two-phase Training Strategy for Daily Life Video Classification

Authors: Lam Pham, Trang Le, Cam Le, Dat Ngo, Weissenfeld Axel, Alexander Schindler

Abstract: In this paper, we present a deep learning based multimodal system for classifying daily life videos. To train the system, we propose a two-phase training strategy. In the first training phase (Phase I), we extract the audio and visual (image) data from the original video. We then train the audio data and the visual data with independent deep learning based models. After the training processes, we… ▽ More In this paper, we present a deep learning based multimodal system for classifying daily life videos. To train the system, we propose a two-phase training strategy. In the first training phase (Phase I), we extract the audio and visual (image) data from the original video. We then train the audio data and the visual data with independent deep learning based models. After the training processes, we obtain audio embeddings and visual embeddings by extracting feature maps from the pre-trained deep learning models. In the second training phase (Phase II), we train a fusion layer to combine the audio/visual embeddings and a dense layer to classify the combined embedding into target daily scenes. Our extensive experiments, which were conducted on the benchmark dataset of DCASE (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) 2021 Task 1B Development, achieved the best classification accuracy of 80.5%, 91.8%, and 95.3% with only audio data, with only visual data, both audio and visual data, respectively. The highest classification accuracy of 95.3% presents an improvement of 17.9% compared with DCASE baseline and shows very competitive to the state-of-the-art systems. △ Less

Submitted 30 April, 2023; originally announced May 2023.

arXiv:2303.04104 [pdf, other]

An Inception-Residual-Based Architecture with Multi-Objective Loss for Detecting Respiratory Anomalies

Authors: Dat Ngo, Lam Pham, Huy Phan, Minh Tran, Delaram Jarchi, Sefki Kolozali

Abstract: This paper presents a deep learning system applied for detecting anomalies from respiratory sound recordings. Initially, our system begins with audio feature extraction using Gammatone and Continuous Wavelet transformation. This step aims to transform the respiratory sound input into a two-dimensional spectrogram where both spectral and temporal features are presented. Then, our proposed system in… ▽ More This paper presents a deep learning system applied for detecting anomalies from respiratory sound recordings. Initially, our system begins with audio feature extraction using Gammatone and Continuous Wavelet transformation. This step aims to transform the respiratory sound input into a two-dimensional spectrogram where both spectral and temporal features are presented. Then, our proposed system integrates Inception-residual-based backbone models combined with multi-head attention and multi-objective loss to classify respiratory anomalies. Instead of applying a simple concatenation approach by combining results from various spectrograms, we propose a Linear combination, which has the ability to regulate equally the contribution of each individual spectrogram throughout the training process. To evaluate the performance, we conducted experiments over the benchmark dataset of SPRSound (The Open-Source SJTU Paediatric Respiratory Sound) proposed by the IEEE BioCAS 2022 challenge. As regards the Score computed by an average between the average score and harmonic score, our proposed system gained significant improvements of 9.7%, 15.8%, 17.8%, and 16.1% in Task 1-1, Task 1-2, Task 2-1, and Task 2-2, respectively, compared to the challenge baseline system. Notably, we achieved the Top-1 performance in Task 2-1 and Task 2-2 with the highest Score of 74.5% and 53.9%, respectively. △ Less

Submitted 19 June, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

arXiv:2302.13028 [pdf, other]

A Light-weight Deep Learning Model for Remote Sensing Image Classification

Authors: Lam Pham, Cam Le, Dat Ngo, Anh Nguyen, Jasmin Lampert, Alexander Schindler, Ian McLoughlin

Abstract: In this paper, we present a high-performance and light-weight deep learning model for Remote Sensing Image Classification (RSIC), the task of identifying the aerial scene of a remote sensing image. To this end, we first valuate various benchmark convolutional neural network (CNN) architectures: MobileNet V1/V2, ResNet 50/151V2, InceptionV3/InceptionResNetV2, EfficientNet B0/B7, DenseNet 121/201, C… ▽ More In this paper, we present a high-performance and light-weight deep learning model for Remote Sensing Image Classification (RSIC), the task of identifying the aerial scene of a remote sensing image. To this end, we first valuate various benchmark convolutional neural network (CNN) architectures: MobileNet V1/V2, ResNet 50/151V2, InceptionV3/InceptionResNetV2, EfficientNet B0/B7, DenseNet 121/201, ConNeXt Tiny/Large. Then, the best performing models are selected to train a compact model in a teacher-student arrangement. The knowledge distillation from the teacher aims to achieve high performance with significantly reduced complexity. By conducting extensive experiments on the NWPU-RESISC45 benchmark, our proposed teacher-student models outperforms the state-of-the-art systems, and has potential to be applied on a wide rage of edge devices. △ Less

Submitted 25 February, 2023; originally announced February 2023.

arXiv:2211.02820 [pdf, other]

A Robust and Low Complexity Deep Learning Model for Remote Sensing Image Classification

Authors: Cam Le, Lam Pham, Nghia NVN, Truong Nguyen, Le Hong Trang

Abstract: In this paper, we present a robust and low complexity deep learning model for Remote Sensing Image Classification (RSIC), the task of identifying the scene of a remote sensing image. In particular, we firstly evaluate different low complexity and benchmark deep neural networks: MobileNetV1, MobileNetV2, NASNetMobile, and EfficientNetB0, which present the number of trainable parameters lower than 5… ▽ More In this paper, we present a robust and low complexity deep learning model for Remote Sensing Image Classification (RSIC), the task of identifying the scene of a remote sensing image. In particular, we firstly evaluate different low complexity and benchmark deep neural networks: MobileNetV1, MobileNetV2, NASNetMobile, and EfficientNetB0, which present the number of trainable parameters lower than 5 Million (M). After indicating best network architecture, we further improve the network performance by applying attention schemes to multiple feature maps extracted from middle layers of the network. To deal with the issue of increasing the model footprint as using attention schemes, we apply the quantization technique to satisfy the maximum of 20 MB memory occupation. By conducting extensive experiments on the benchmark datasets NWPU-RESISC45, we achieve a robust and low-complexity model, which is very competitive to the state-of-the-art systems and potential for real-life applications on edge devices. △ Less

Submitted 12 December, 2022; v1 submitted 5 November, 2022; originally announced November 2022.

Comments: 8 pages

arXiv:2210.08610 [pdf, other]

Robust, General, and Low Complexity Acoustic Scene Classification Systems and An Effective Visualization for Presenting a Sound Scene Context

Authors: Lam Pham, Dusan Salovic, Anahid Jalali, Alexander Schindler, Khoa Tran, Canh Vu, Phu X. Nguyen

Abstract: In this paper, we present a comprehensive analysis of Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. In particular, we firstly propose an inception-based and low footprint ASC model, referred to as the ASC baseline. The proposed ASC baseline is then compared with benchmark and high-complexity network architectures of Mobile… ▽ More In this paper, we present a comprehensive analysis of Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. In particular, we firstly propose an inception-based and low footprint ASC model, referred to as the ASC baseline. The proposed ASC baseline is then compared with benchmark and high-complexity network architectures of MobileNetV1, MobileNetV2, VGG16, VGG19, ResNet50V2, ResNet152V2, DenseNet121, DenseNet201, and Xception. Next, we improve the ASC baseline by proposing a novel deep neural network architecture which leverages residual-inception architectures and multiple kernels. Given the novel residual-inception (NRI) model, we further evaluate the trade off between the model complexity and the model accuracy performance. Finally, we evaluate whether sound events occurring in a sound scene recording can help to improve ASC accuracy, then indicate how a sound scene context is well presented by combining both sound scene and sound event information. We conduct extensive experiments on various ASC datasets, including Crowded Scenes, IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Task 1A and 1B, 2019 Task 1A and 1B, 2020 Task 1A, 2021 Task 1A, 2022 Task 1. The experimental results on several different ASC challenges highlight two main achievements; the first is to propose robust, general, and low complexity ASC systems which are suitable for real-life applications on a wide range of edge devices and mobiles; the second is to propose an effective visualization method for comprehensively presenting a sound scene context. △ Less

Submitted 16 October, 2022; originally announced October 2022.

arXiv:2209.09327 [pdf, ps, other]

S2TD: a Separation Logic Verifier that Supports Reasoning of the Absence and Presence of Bugs

Authors: Quang Loc Le, Jun Sun, Long H. Pham, Shengchao Qin

Abstract: Heap-manipulating programs are known to be challenging to reason about. We present a novel verifier for heap-manipulating programs called S2TD, which encodes programs systematically in the form of Constrained Horn Clauses (CHC) using a novel extension of separation logic (SL) with recursive predicates and dangling predicates. S2TD actively explores cyclic proofs to address the path explosion probl… ▽ More Heap-manipulating programs are known to be challenging to reason about. We present a novel verifier for heap-manipulating programs called S2TD, which encodes programs systematically in the form of Constrained Horn Clauses (CHC) using a novel extension of separation logic (SL) with recursive predicates and dangling predicates. S2TD actively explores cyclic proofs to address the path explosion problem. S2TD differentiates itself from existing CHC-based verifiers by focusing on heap-manipulating programs and employing cyclic proof to efficiently verify or falsify them with counterexamples. Compared with existing SL-based verifiers, S2TD precisely specifies the heaps of de-allocated pointers to avoid false positives in reasoning about the presence of bugs. S2TD has been evaluated using a comprehensive set of benchmark programs from the SV-COMP repository. The results show that S2TD is more effective than state-of-art program verifiers and is more efficient than most of them. △ Less

Submitted 19 September, 2022; originally announced September 2022.

Comments: 24 pages

MSC Class: 68N15

arXiv:2209.02611 [pdf, other]

Deep filter bank regression for super-resolution of anisotropic MR brain images

Authors: Samuel W. Remedios, Shuo Han, Yuan Xue, Aaron Carass, Trac D. Tran, Dzung L. Pham, Jerry L. Prince

Abstract: In 2D multi-slice magnetic resonance (MR) acquisition, the through-plane signals are typically of lower resolution than the in-plane signals. While contemporary super-resolution (SR) methods aim to recover the underlying high-resolution volume, the estimated high-frequency information is implicit via end-to-end data-driven training rather than being explicitly stated and sought. To address this, w… ▽ More In 2D multi-slice magnetic resonance (MR) acquisition, the through-plane signals are typically of lower resolution than the in-plane signals. While contemporary super-resolution (SR) methods aim to recover the underlying high-resolution volume, the estimated high-frequency information is implicit via end-to-end data-driven training rather than being explicitly stated and sought. To address this, we reframe the SR problem statement in terms of perfect reconstruction filter banks, enabling us to identify and directly estimate the missing information. In this work, we propose a two-stage approach to approximate the completion of a perfect reconstruction filter bank corresponding to the anisotropic acquisition of a particular scan. In stage 1, we estimate the missing filters using gradient descent and in stage 2, we use deep networks to learn the mapping from coarse coefficients to detail coefficients. In addition, the proposed formulation does not rely on external training data, circumventing the need for domain shift correction. Under our approach, SR performance is improved particularly in "slice gap" scenarios, likely due to the constrained solution space imposed by the framework. △ Less

Submitted 6 September, 2022; originally announced September 2022.

arXiv:2206.13392 [pdf, ps, other]

Remote Sensing Image Classification using Transfer Learning and Attention Based Deep Neural Network

Authors: Lam Pham, Khoa Tran, Dat Ngo, Jasmin Lampert, Alexander Schindler

Abstract: The task of remote sensing image scene classification (RSISC), which aims at classifying remote sensing images into groups of semantic categories based on their contents, has taken the important role in a wide range of applications such as urban planning, natural hazards detection, environment monitoring,vegetation mapping, or geospatial object detection. During the past years, research community… ▽ More The task of remote sensing image scene classification (RSISC), which aims at classifying remote sensing images into groups of semantic categories based on their contents, has taken the important role in a wide range of applications such as urban planning, natural hazards detection, environment monitoring,vegetation mapping, or geospatial object detection. During the past years, research community focusing on RSISC task has shown significant effort to publish diverse datasets as well as propose different approaches to deal with the RSISC challenges. Recently, almost proposed RSISC systems base on deep learning models which prove powerful and outperform traditional approaches using image processing and machine learning. In this paper, we also leverage the power of deep learning technology, evaluate a variety of deep neural network architectures, indicate main factors affecting the performance of a RSISC system. Given the comprehensive analysis, we propose a deep learning based framework for RSISC, which makes use of the transfer learning technique and multihead attention scheme. The proposed deep learning framework is evaluated on the benchmark NWPU-RESISC45 dataset and achieves the best classification accuracy of 94.7% which shows competitive to the state-of-the-art systems and potential for real-life applications. △ Less

Submitted 20 June, 2022; originally announced June 2022.

arXiv:2206.06057 [pdf, ps, other]

Low-complexity deep learning frameworks for acoustic scene classification

Authors: Lam Pham, Dat Ngo, Anahid Jalali, Alexander Schindler

Abstract: In this report, we presents low-complexity deep learning frameworks for acoustic scene classification (ASC). The proposed frameworks can be separated into four main steps: Front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities. In particular, we initially transform audio recordings into Mel, Gammatone, and CQT spectrograms. N… ▽ More In this report, we presents low-complexity deep learning frameworks for acoustic scene classification (ASC). The proposed frameworks can be separated into four main steps: Front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities. In particular, we initially transform audio recordings into Mel, Gammatone, and CQT spectrograms. Next, data augmentation methods of Random Cropping, Specaugment, and Mixup are then applied to generate augmented spectrograms before being fed into deep learning based classifiers. Finally, to achieve the best performance, we fuse probabilities which obtained from three individual classifiers, which are independently-trained with three type of spectrograms. Our experiments conducted on DCASE 2022 Task 1 Development dataset have fullfiled the requirement of low-complexity and achieved the best classification accuracy of 60.1%, improving DCASE baseline by 17.2%. △ Less

Submitted 13 June, 2022; originally announced June 2022.

arXiv:2205.06992 [pdf, other]

Verifying Neural Networks Against Backdoor Attacks

Authors: Long H. Pham, Jun Sun

Abstract: Neural networks have achieved state-of-the-art performance in solving many problems, including many applications in safety/security-critical systems. Researchers also discovered multiple security issues associated with neural networks. One of them is backdoor attacks, i.e., a neural network may be embedded with a backdoor such that a target output is almost always generated in the presence of a tr… ▽ More Neural networks have achieved state-of-the-art performance in solving many problems, including many applications in safety/security-critical systems. Researchers also discovered multiple security issues associated with neural networks. One of them is backdoor attacks, i.e., a neural network may be embedded with a backdoor such that a target output is almost always generated in the presence of a trigger. Existing defense approaches mostly focus on detecting whether a neural network is 'backdoored' based on heuristics, e.g., activation patterns. To the best of our knowledge, the only line of work which certifies the absence of backdoor is based on randomized smoothing, which is known to significantly reduce neural network performance. In this work, we propose an approach to verify whether a given neural network is free of backdoor with a certain level of success rate. Our approach integrates statistical sampling as well as abstract interpretation. The experiment results show that our approach effectively verifies the absence of backdoor or generates backdoor triggers. △ Less

Submitted 14 May, 2022; originally announced May 2022.

arXiv:2204.09274 [pdf, other]

Causality-based Neural Network Repair

Authors: Bing Sun, Jun Sun, Hong Long Pham, Jie Shi

Abstract: Neural networks have had discernible achievements in a wide range of applications. The wide-spread adoption also raises the concern of their dependability and reliability. Similar to traditional decision-making programs, neural networks can have defects that need to be repaired. The defects may cause unsafe behaviors, raise security concerns or unjust societal impacts. In this work, we address the… ▽ More Neural networks have had discernible achievements in a wide range of applications. The wide-spread adoption also raises the concern of their dependability and reliability. Similar to traditional decision-making programs, neural networks can have defects that need to be repaired. The defects may cause unsafe behaviors, raise security concerns or unjust societal impacts. In this work, we address the problem of repairing a neural network for desirable properties such as fairness and the absence of backdoor. The goal is to construct a neural network that satisfies the property by (minimally) adjusting the given neural network's parameters (i.e., weights). Specifically, we propose CARE (\textbf{CA}usality-based \textbf{RE}pair), a causality-based neural network repair technique that 1) performs causality-based fault localization to identify the `guilty' neurons and 2) optimizes the parameters of the identified neurons to reduce the misbehavior. We have empirically evaluated CARE on various tasks such as backdoor removal, neural network repair for fairness and safety properties. Our experiment results show that CARE is able to repair all neural networks efficiently and effectively. For fairness repair tasks, CARE successfully improves fairness by $61.91\%$ on average. For backdoor removal tasks, CARE reduces the attack success rate from over $98\%$ to less than $1\%$. For safety property repair tasks, CARE reduces the property violation rate to less than $1\%$. Results also show that thanks to the causality-based fault localization, CARE's repair focuses on the misbehavior and preserves the accuracy of the neural networks. △ Less

Submitted 7 July, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

arXiv:2203.12314 [pdf, other]

Wider or Deeper Neural Network Architecture for Acoustic Scene Classification with Mismatched Recording Devices

Authors: Lam Pham, Khoa Dinh, Dat Ngo, Hieu Tang, Alexander Schindler

Abstract: In this paper, we present a robust and low complexity system for Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording. We first construct an ASC baseline system in which a novel inception-residual-based network architecture is proposed to deal with the mismatched recording device issue. To further improve the performance but still satisfy the low complexity… ▽ More In this paper, we present a robust and low complexity system for Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording. We first construct an ASC baseline system in which a novel inception-residual-based network architecture is proposed to deal with the mismatched recording device issue. To further improve the performance but still satisfy the low complexity model, we apply two techniques: ensemble of multiple spectrograms and channel reduction on the ASC baseline system. By conducting extensive experiments on the benchmark DCASE 2020 Task 1A Development dataset, we achieve the best model performing an accuracy of 69.9% and a low complexity of 2.4M trainable parameters, which is competitive to the state-of-the-art ASC systems and potential for real-life applications on edge devices. △ Less

Submitted 23 March, 2022; originally announced March 2022.

Comments: This paper was submitted to INTERSPEECH 2022

arXiv:2202.05626 [pdf, other]

Audio-Based Deep Learning Frameworks for Detecting COVID-19

Authors: Dat Ngo, Lam Pham, Truong Hoang, Sefki Kolozali, Delaram Jarchi

Abstract: This paper evaluates a wide range of audio-based deep learning frameworks applied to the breathing, cough, and speech sounds for detecting COVID-19. In general, the audio recording inputs are transformed into low-level spectrogram features, then they are fed into pre-trained deep learning models to extract high-level embedding features. Next, the dimension of these high-level embedding features ar… ▽ More This paper evaluates a wide range of audio-based deep learning frameworks applied to the breathing, cough, and speech sounds for detecting COVID-19. In general, the audio recording inputs are transformed into low-level spectrogram features, then they are fed into pre-trained deep learning models to extract high-level embedding features. Next, the dimension of these high-level embedding features are reduced before finetuning using Light Gradient Boosting Machine (LightGBM) as a back-end classification. Our experiments on the Second DiCOVA Challenge achieved the highest Area Under the Curve (AUC), F1 score, sensitivity score, and specificity score of 89.03%, 64.41%, 63.33%, and 95.13%, respectively. Based on these scores, our method outperforms the state-of-the-art systems, and improves the challenge baseline by 4.33%, 6.00% and 8.33% in terms of AUC, F1 score and sensitivity score, respectively. △ Less

Submitted 2 March, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

arXiv:2201.04742 [pdf, other]

nuReality: A VR environment for research of pedestrian and autonomous vehicle interactions

Authors: Paul Schmitt, Nicholas Britten, JiHyun Jeong, Amelia Coffey, Kevin Clark, Shweta Sunil Kothawade, Elena Corina Grigore, Adam Khaw, Christopher Konopka, Linh Pham, Kim Ryan, Christopher Schmitt, Aryaman Pandya, Emilio Frazzoli

Abstract: We present nuReality, a virtual reality 'VR' environment designed to test the efficacy of vehicular behaviors to communicate intent during interactions between autonomous vehicles 'AVs' and pedestrians at urban intersections. In this project we focus on expressive behaviors as a means for pedestrians to readily recognize the underlying intent of the AV's movements. VR is an ideal tool to use to te… ▽ More We present nuReality, a virtual reality 'VR' environment designed to test the efficacy of vehicular behaviors to communicate intent during interactions between autonomous vehicles 'AVs' and pedestrians at urban intersections. In this project we focus on expressive behaviors as a means for pedestrians to readily recognize the underlying intent of the AV's movements. VR is an ideal tool to use to test these situations as it can be immersive and place subjects into these potentially dangerous scenarios without risk. nuReality provides a novel and immersive virtual reality environment that includes numerous visual details (road and building texturing, parked cars, swaying tree limbs) as well as auditory details (birds chirping, cars honking in the distance, people talking). In these files we present the nuReality environment, its 10 unique vehicle behavior scenarios, and the Unreal Engine and Autodesk Maya source files for each scenario. The files are publicly released as open source at www.nuReality.org, to support the academic community studying the critical AV-pedestrian interaction. △ Less

Submitted 12 January, 2022; originally announced January 2022.

arXiv:2201.03054 [pdf, ps, other]

An Ensemble of Deep Learning Frameworks Applied For Predicting Respiratory Anomalies

Authors: Lam Pham, Dat Ngo, Truong Hoang, Alexander Schindler, Ian McLoughlin

Abstract: In this paper, we evaluate various deep learning frameworks for detecting respiratory anomalies from input audio recordings. To this end, we firstly transform audio respiratory cycles collected from patients into spectrograms where both temporal and spectral features are presented, referred to as the front-end feature extraction. We then feed the spectrograms into back-end deep learning networks f… ▽ More In this paper, we evaluate various deep learning frameworks for detecting respiratory anomalies from input audio recordings. To this end, we firstly transform audio respiratory cycles collected from patients into spectrograms where both temporal and spectral features are presented, referred to as the front-end feature extraction. We then feed the spectrograms into back-end deep learning networks for classifying these respiratory cycles into certain categories. Finally, results from high-performed deep learning frameworks are fused to obtain the best score. Our experiments on ICBHI benchmark dataset achieve the highest ICBHI score of 57.3 from a late fusion of inception based and transfer learning based deep learning frameworks, which outperforms the state-of-the-art systems. △ Less

Submitted 9 January, 2022; originally announced January 2022.

arXiv:2112.09172 [pdf, ps, other]

An Audio-Visual Dataset and Deep Learning Frameworks for Crowded Scene Classification

Authors: Lam Pham, Dat Ngo, Phu X. Nguyen, Truong Hoang, Alexander Schindler

Abstract: This paper presents a task of audio-visual scene classification (SC) where input videos are classified into one of five real-life crowded scenes: 'Riot', 'Noise-Street', 'Firework-Event', 'Music-Event', and 'Sport-Atmosphere'. To this end, we firstly collect an audio-visual dataset (videos) of these five crowded contexts from Youtube (in-the-wild scenes). Then, a wide range of deep learning framew… ▽ More This paper presents a task of audio-visual scene classification (SC) where input videos are classified into one of five real-life crowded scenes: 'Riot', 'Noise-Street', 'Firework-Event', 'Music-Event', and 'Sport-Atmosphere'. To this end, we firstly collect an audio-visual dataset (videos) of these five crowded contexts from Youtube (in-the-wild scenes). Then, a wide range of deep learning frameworks are proposed to deploy either audio or visual input data independently. Finally, results obtained from high-performed deep learning frameworks are fused to achieve the best accuracy score. Our experimental results indicate that audio and visual input factors independently contribute to the SC task's performance. Significantly, an ensemble of deep learning frameworks exploring either audio or visual input data can achieve the best accuracy of 95.7%. △ Less

Submitted 16 December, 2021; originally announced December 2021.

arXiv:2111.08817 [pdf, other]

Compressive Features in Offline Reinforcement Learning for Recommender Systems

Authors: Hung Nguyen, Minh Nguyen, Long Pham, Jennifer Adorno Nieves

Abstract: In this paper, we develop a recommender system for a game that suggests potential items to players based on their interactive behaviors to maximize revenue for the game provider. Our approach is built on a reinforcement learning-based technique and is trained on an offline data set that is publicly available on an IEEE Big Data Cup challenge. The limitation of the offline data set and the curse of… ▽ More In this paper, we develop a recommender system for a game that suggests potential items to players based on their interactive behaviors to maximize revenue for the game provider. Our approach is built on a reinforcement learning-based technique and is trained on an offline data set that is publicly available on an IEEE Big Data Cup challenge. The limitation of the offline data set and the curse of high dimensionality pose significant obstacles to solving this problem. Our proposed method focuses on improving the total rewards and performance by tackling these main difficulties. More specifically, we utilized sparse PCA to extract important features of user behaviors. Our Q-learning-based system is then trained from the processed offline data set. To exploit all possible information from the provided data set, we cluster user features to different groups and build an independent Q-table for each group. Furthermore, to tackle the challenge of unknown formula for evaluation metrics, we design a metric to self-evaluate our system's performance based on the potential value the game provider might achieve and a small collection of actual evaluation metrics that we obtain from the live scoring environment. Our experiments show that our proposed metric is consistent with the results published by the challenge organizers. We have implemented the proposed training pipeline, and the results show that our method outperforms current state-of-the-art methods in terms of both total rewards and training speed. By addressing the main challenges and leveraging the state-of-the-art techniques, we have achieved the best public leaderboard result in the challenge. Furthermore, our proposed method achieved an estimated score of approximately 20% better and can be trained faster by 30 times than the best of the current state-of-the-art methods. △ Less

Submitted 16 November, 2021; originally announced November 2021.

arXiv:2111.04255 [pdf, ps, other]

Sequence Reconstruction Problem for Deletion Channels: A Complete Asymptotic Solution

Authors: Van Long Phuoc Pham, Keshav Goyal, Han Mao Kiah

Abstract: Transmit a codeword $x$, that belongs to an $(\ell-1)$-deletion-correcting code of length $n$, over a $t$-deletion channel for some $1\le \ell\le t<n$. Levenshtein, in 2001, proposed the problem of determining $N(n,\ell,t)+1$, the minimum number of distinct channel outputs required to uniquely reconstruct $x$. Prior to this work, $N(n,\ell,t)$ is known only when $\ell\in\{1,2\}$. Here, we provide… ▽ More Transmit a codeword $x$, that belongs to an $(\ell-1)$-deletion-correcting code of length $n$, over a $t$-deletion channel for some $1\le \ell\le t<n$. Levenshtein, in 2001, proposed the problem of determining $N(n,\ell,t)+1$, the minimum number of distinct channel outputs required to uniquely reconstruct $x$. Prior to this work, $N(n,\ell,t)$ is known only when $\ell\in\{1,2\}$. Here, we provide an asymptotically exact solution for all values of $\ell$ and $t$. Specifically, we show that $N(n,\ell,t)=\binom{2\ell}{\ell}/(t-\ell)! n^{t-\ell} - O(n^{t-\ell-1})$ and in the special instance where $\ell=t$, we show that $N(n,\ell,\ell)=\binom{2\ell}{\ell}$. We also provide a conjecture on the exact value of $N(n,\ell,t)$ for all values of $n$, $\ell$, and $t$. △ Less

Submitted 7 November, 2021; originally announced November 2021.

MSC Class: 94B99

arXiv:2110.06323 [pdf, other]

An Annihilating Filter-Based DOA Estimation for Uniform Linear Array

Authors: Son Phan, Lam Pham

Abstract: In this paper, we propose a new method to design an annihilating filter (AF) for direction-of-arrival (DOA) estimation of multiple snapshots within an uniform linear array. To evaluate the proposed method, we firstly design a DOA estimation using multiple signal classification (MUSIC) algorithm, referred to as the MUSIC baseline. We then compare the proposed method with the MUSIC baseline in two e… ▽ More In this paper, we propose a new method to design an annihilating filter (AF) for direction-of-arrival (DOA) estimation of multiple snapshots within an uniform linear array. To evaluate the proposed method, we firstly design a DOA estimation using multiple signal classification (MUSIC) algorithm, referred to as the MUSIC baseline. We then compare the proposed method with the MUSIC baseline in two environmental noise conditions: Only white noise, or both white noise and diffusion. The experimental results highlight two main contributions; the first is to modify conventional MUSIC algorithm for adapting different noise conditions, and the second is to propose an AF-based method that shows competitive accuracy of arrival angles detected and low complexity compared with the MUSIC baseline. △ Less

Submitted 12 October, 2021; originally announced October 2021.

arXiv:2110.03251 [pdf, other]

doi 10.1109/EMBC48229.2022.9871179

A Cough-based deep learning framework for detecting COVID-19

Authors: Truong Hoang, Lam Pham, Dat Ngo, Hoang D. Nguyen

Abstract: This paper presents a deep learning framework for detecting COVID-19 positive subjects from their cough sounds. In particular, the proposed approach comprises two main steps. In the first step, we generate a feature representing the cough sound by combining an embedding extracted from a pre-trained model and handcrafted features extracted from draw audio recording, referred to as the front-end fea… ▽ More This paper presents a deep learning framework for detecting COVID-19 positive subjects from their cough sounds. In particular, the proposed approach comprises two main steps. In the first step, we generate a feature representing the cough sound by combining an embedding extracted from a pre-trained model and handcrafted features extracted from draw audio recording, referred to as the front-end feature extraction. Then, the combined features are fed into different back-end classification models for detecting COVID-19 positive subjects in the second step. Our experiments on the Track-2 dataset of the Second 2021 DiCOVA Challenge achieved the second top ranking with an AUC score of 81.21 and the top F1 score of 53.21 on a Blind Test set, improving the challenge baseline by 8.43% and 23.4% respectively and showing deployability, robustness and competitiveness with the state-of-the-art systems. △ Less

Submitted 30 September, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

Comments: COVID-19, EMBC-2022, DiCOVA, top 2nd, benchmark on Spec > 0.95%

MSC Class: 92-05; 68Txx ACM Class: J.3; I.5.4; I.5.2; H.5.5; C.3; K.5

Journal ref: EMBC 44 (2022) 3422-3425

arXiv:2107.09268 [pdf, ps, other]

Robust Deep Learning Frameworks for Acoustic Scene and Respiratory Sound Classification

Authors: Lam Pham

Abstract: This thesis focuses on dealing with the task of acoustic scene classification (ASC), and then applied the techniques developed for ASC to a real-life application of detecting respiratory disease. To deal with ASC challenges, this thesis addresses three main factors that directly affect the performance of an ASC system. Firstly, this thesis explores input features by making use of multiple spectrog… ▽ More This thesis focuses on dealing with the task of acoustic scene classification (ASC), and then applied the techniques developed for ASC to a real-life application of detecting respiratory disease. To deal with ASC challenges, this thesis addresses three main factors that directly affect the performance of an ASC system. Firstly, this thesis explores input features by making use of multiple spectrograms (log-mel, Gamma, and CQT) for low-level feature extraction to tackle the issue of insufficiently discriminative or descriptive input features. Next, a novel Encoder network architecture is introduced. The Encoder firstly transforms each low-level spectrogram into high-level intermediate features, or embeddings, and thus combines these high-level features to form a very distinct composite feature. The composite or combined feature is then explored in terms of classification performance, with different Decoders such as Random Forest (RF), Multilayer Perception (MLP), and Mixture of Experts (MoE). By using this Encoder-Decoder framework, it helps to reduce the computation cost of the reference process in ASC systems which make use of multiple spectrogram inputs. Since the proposed techniques applied for general ASC tasks were shown to be highly effective, this inspired an application to a specific real-life application. This was namely the 2017 Internal Conference on Biomedical Health Informatics (ICBHI) respiratory sound dataset. Building upon the proposed ASC framework, the ICBHI tasks were tackled with a deep learning framework, and the resulting system shown to be capable at detecting respiratory anomaly cycles and diseases. △ Less

Submitted 20 July, 2021; originally announced July 2021.

arXiv:2106.06840 [pdf, ps, other]

Deep Learning Frameworks Applied For Audio-Visual Scene Classification

Authors: Lam Pham, Alexander Schindler, Mina Schütz, Jasmin Lampert, Sven Schlarb, Ross King

Abstract: In this paper, we present deep learning frameworks for audio-visual scene classification (SC) and indicate how individual visual and audio features as well as their combination affect SC performance. Our extensive experiments, which are conducted on DCASE (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1B development dataset, achieve the best classification… ▽ More In this paper, we present deep learning frameworks for audio-visual scene classification (SC) and indicate how individual visual and audio features as well as their combination affect SC performance. Our extensive experiments, which are conducted on DCASE (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1B development dataset, achieve the best classification accuracy of 82.2%, 91.1%, and 93.9% with audio input only, visual input only, and both audio-visual input, respectively. The highest classification accuracy of 93.9%, obtained from an ensemble of audio-based and visual-based frameworks, shows an improvement of 16.5% compared with DCASE baseline. △ Less

Submitted 12 June, 2021; originally announced June 2021.

Comments: 6 pages

arXiv:2106.06838 [pdf, ps, other]

A Low-Compexity Deep Learning Framework For Acoustic Scene Classification

Authors: Lam Pham, Hieu Tang, Anahid Jalali, Alexander Schindler, Ross King

Abstract: In this paper, we presents a low-complexity deep learning frameworks for acoustic scene classification (ASC). The proposed framework can be separated into three main steps: Front-end spectrogram extraction, back-end classification, and late fusion of predicted probabilities. First, we use Mel filter, Gammatone filter and Constant Q Transfrom (CQT) to transform raw audio signal into spectrograms, w… ▽ More In this paper, we presents a low-complexity deep learning frameworks for acoustic scene classification (ASC). The proposed framework can be separated into three main steps: Front-end spectrogram extraction, back-end classification, and late fusion of predicted probabilities. First, we use Mel filter, Gammatone filter and Constant Q Transfrom (CQT) to transform raw audio signal into spectrograms, where both frequency and temporal features are presented. Three spectrograms are then fed into three individual back-end convolutional neural networks (CNNs), classifying into ten urban scenes. Finally, a late fusion of three predicted probabilities obtained from three CNNs is conducted to achieve the final classification result. To reduce the complexity of our proposed CNN network, we apply two model compression techniques: model restriction and decomposed convolution. Our extensive experiments, which are conducted on DCASE 2021 (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1A development dataset, achieve a low-complexity CNN based framework with 128 KB trainable parameters and the best classification accuracy of 66.7%, improving DCASE baseline by 19.0% △ Less

Submitted 12 June, 2021; originally announced June 2021.

arXiv:2104.02523 [pdf, other]

An Analysis of State-of-the-art Activation Functions For Supervised Deep Neural Network

Authors: Anh Nguyen, Khoa Pham, Dat Ngo, Thanh Ngo, Lam Pham

Abstract: This paper provides an analysis of state-of-the-art activation functions with respect to supervised classification of deep neural network. These activation functions comprise of Rectified Linear Units (ReLU), Exponential Linear Unit (ELU), Scaled Exponential Linear Unit (SELU), Gaussian Error Linear Unit (GELU), and the Inverse Square Root Linear Unit (ISRLU). To evaluate, experiments over two dee… ▽ More This paper provides an analysis of state-of-the-art activation functions with respect to supervised classification of deep neural network. These activation functions comprise of Rectified Linear Units (ReLU), Exponential Linear Unit (ELU), Scaled Exponential Linear Unit (SELU), Gaussian Error Linear Unit (GELU), and the Inverse Square Root Linear Unit (ISRLU). To evaluate, experiments over two deep learning network architectures integrating these activation functions are conducted. The first model, basing on Multilayer Perceptron (MLP), is evaluated with MNIST dataset to perform these activation functions. Meanwhile, the second model, likely VGGish-based architecture, is applied for Acoustic Scene Classification (ASC) Task 1A in DCASE 2018 challenge, thus evaluate whether these activation functions work well in different datasets as well as different network architectures. △ Less

Submitted 5 April, 2021; originally announced April 2021.

Comments: 6 pages, 5 figures

arXiv:2104.01161 [pdf, ps, other]

An Audio-Based Deep Learning Framework For BBC Television Programme Classification

Authors: Lam Pham, Chris Baume, Qiuqiang Kong, Tassadaq Hussain, Wenwu Wang, Mark Plumbley

Abstract: This paper proposes a deep learning framework for classification of BBC television programmes using audio. The audio is firstly transformed into spectrograms, which are fed into a pre-trained convolutional Neural Network (CNN), obtaining predicted probabilities of sound events occurring in the audio recording. Statistics for the predicted probabilities and detected sound events are then calculated… ▽ More This paper proposes a deep learning framework for classification of BBC television programmes using audio. The audio is firstly transformed into spectrograms, which are fed into a pre-trained convolutional Neural Network (CNN), obtaining predicted probabilities of sound events occurring in the audio recording. Statistics for the predicted probabilities and detected sound events are then calculated to extract discriminative features representing the television programmes. Finally, the embedded features extracted are fed into a classifier for classifying the programmes into different genres. Our experiments are conducted over a dataset of 6,160 programmes belonging to nine genres labelled by the BBC. We achieve an average classification accuracy of 93.7% over 14-fold cross validation. This demonstrates the efficacy of the proposed framework for the task of audio-based classification of television programmes. △ Less

Submitted 11 February, 2022; v1 submitted 2 April, 2021; originally announced April 2021.

arXiv:2103.02767 [pdf, other]

doi 10.1007/978-3-030-59520-3_1

Contrast Adaptive Tissue Classification by Alternating Segmentation and Synthesis

Authors: Dzung L. Pham, Yi-Yu Chou, Blake E. Dewey, Daniel S. Reich, John A. Butman, Snehashis Roy

Abstract: Deep learning approaches to the segmentation of magnetic resonance images have shown significant promise in automating the quantitative analysis of brain images. However, a continuing challenge has been its sensitivity to the variability of acquisition protocols. Attempting to segment images that have different contrast properties from those within the training data generally leads to significantl… ▽ More Deep learning approaches to the segmentation of magnetic resonance images have shown significant promise in automating the quantitative analysis of brain images. However, a continuing challenge has been its sensitivity to the variability of acquisition protocols. Attempting to segment images that have different contrast properties from those within the training data generally leads to significantly reduced performance. Furthermore, heterogeneous data sets cannot be easily evaluated because the quantitative variation due to acquisition differences often dwarfs the variation due to the biological differences that one seeks to measure. In this work, we describe an approach using alternating segmentation and synthesis steps that adapts the contrast properties of the training data to the input image. This allows input images that do not resemble the training data to be more consistently segmented. A notable advantage of this approach is that only a single example of the acquisition protocol is required to adapt to its contrast properties. We demonstrate the efficacy of our approaching using brain images from a set of human subjects scanned with two different T1-weighted volumetric protocols. △ Less

Submitted 3 March, 2021; originally announced March 2021.

Comments: 10 pages. MICCAI SASHIMI Workshop 2021

arXiv:2103.02420 [pdf, ps, other]

Multi-view Audio and Music Classification

Authors: Huy Phan, Huy Le Nguyen, Oliver Y. Chén, Lam Pham, Philipp Koch, Ian McLoughlin, Alfred Mertins

Abstract: We propose in this work a multi-view learning approach for audio and music classification. Considering four typical low-level representations (i.e. different views) commonly used for audio and music recognition tasks, the proposed multi-view network consists of four subnetworks, each handling one input types. The learned embedding in the subnetworks are then concatenated to form the multi-view emb… ▽ More We propose in this work a multi-view learning approach for audio and music classification. Considering four typical low-level representations (i.e. different views) commonly used for audio and music recognition tasks, the proposed multi-view network consists of four subnetworks, each handling one input types. The learned embedding in the subnetworks are then concatenated to form the multi-view embedding for classification similar to a simple concatenation network. However, apart from the joint classification branch, the network also maintains four classification branches on the single-view embedding of the subnetworks. A novel method is then proposed to keep track of the learning behavior on the classification branches and adapt their weights to proportionally blend their gradients for network training. The weights are adapted in such a way that learning on a branch that is generalizing well will be encouraged whereas learning on a branch that is overfitting will be slowed down. Experiments on three different audio and music classification tasks show that the proposed multi-view network not only outperforms the single-view baselines but also is superior to the multi-view baselines based on concatenation and late fusion. △ Less

Submitted 3 March, 2021; originally announced March 2021.

Comments: Accepted to ICASSP 2021

arXiv:2101.01917 [pdf, other]

sGUARD: Towards Fixing Vulnerable Smart Contracts Automatically

Authors: Tai D. Nguyen, Long H. Pham, Jun Sun

Abstract: Smart contracts are distributed, self-enforcing programs executing on top of blockchain networks. They have the potential to revolutionize many industries such as financial institutes and supply chains. However, smart contracts are subject to code-based vulnerabilities, which casts a shadow on its applications. As smart contracts are unpatchable (due to the immutability of blockchain), it is essen… ▽ More Smart contracts are distributed, self-enforcing programs executing on top of blockchain networks. They have the potential to revolutionize many industries such as financial institutes and supply chains. However, smart contracts are subject to code-based vulnerabilities, which casts a shadow on its applications. As smart contracts are unpatchable (due to the immutability of blockchain), it is essential that smart contracts are guaranteed to be free of vulnerabilities. Unfortunately, smart contract languages such as Solidity are Turing-complete, which implies that verifying them statically is infeasible. Thus, alternative approaches must be developed to provide the guarantee. In this work, we develop an approach which automatically transforms smart contracts so that they are provably free of 4 common kinds of vulnerabilities. The key idea is to apply runtime verification in an efficient and provably correct manner. Experiment results with 5000 smart contracts show that our approach incurs minor run-time overhead in terms of time (i.e., 14.79%) and gas (i.e., 0.79%). △ Less

Submitted 6 January, 2021; originally announced January 2021.

Comments: Published in IEEE S&P 2021

Showing 1–50 of 82 results for author: Pham, L