Search | arXiv e-print repository

PRESENT: Zero-Shot Text-to-Prosody Control

Authors: Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman, Dorien Herremans

Abstract: Current strategies for achieving fine-grained prosody control in speech synthesis entail extracting additional style embeddings or adopting more complex architectures. To enable zero-shot application of pretrained text-to-speech (TTS) models, we present PRESENT (PRosody Editing without Style Embeddings or New Training), which exploits explicit prosody prediction in FastSpeech2-based models by modi… ▽ More Current strategies for achieving fine-grained prosody control in speech synthesis entail extracting additional style embeddings or adopting more complex architectures. To enable zero-shot application of pretrained text-to-speech (TTS) models, we present PRESENT (PRosody Editing without Style Embeddings or New Training), which exploits explicit prosody prediction in FastSpeech2-based models by modifying the inference process directly. We apply our text-to-prosody framework to zero-shot language transfer using a JETS model exclusively trained on English LJSpeech data. We obtain character error rates (CER) of 12.8%, 18.7% and 5.9% for German, Hungarian and Spanish respectively, beating the previous state-of-the-art CER by over 2x for all three languages. Furthermore, we allow subphoneme-level control, a first in this field. To evaluate its effectiveness, we show that PRESENT can improve the prosody of questions, and use it to generate Mandarin, a tonal language where vowel pitch varies at subphoneme level. We attain 25.3% hanzi CER and 13.0% pinyin CER with the JETS model. All our code and audio samples are available online. △ Less

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2407.03110 [pdf, other]

A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection)

Authors: Lam Pham, Phat Lam, Tin Nguyen, Hieu Tang, Alexander Schindler

Abstract: In this paper, we present a toolchain for a comprehensive audio/video analysis by leveraging deep learning based multimodal approach. To this end, different specific tasks of Speech to Text (S2T), Acoustic Scene Classification (ASC), Acoustic Event Detection (AED), Visual Object Detection (VOD), Image Captioning (IC), and Video Captioning (VC) are conducted and integrated into the toolchain. By co… ▽ More In this paper, we present a toolchain for a comprehensive audio/video analysis by leveraging deep learning based multimodal approach. To this end, different specific tasks of Speech to Text (S2T), Acoustic Scene Classification (ASC), Acoustic Event Detection (AED), Visual Object Detection (VOD), Image Captioning (IC), and Video Captioning (VC) are conducted and integrated into the toolchain. By combining individual tasks and analyzing both audio \& visual data extracted from input video, the toolchain offers various audio/video-based applications: Two general applications of audio/video clustering, comprehensive audio/video summary and a specific application of riot or violent context detection. Furthermore, the toolchain presents a flexible and adaptable architecture that is effective to integrate new models for further audio/video-based applications. △ Less

Submitted 2 May, 2024; originally announced July 2024.

arXiv:2407.01777 [pdf, other]

Deepfake Audio Detection Using Spectrogram-based Feature and Ensemble of Deep Learning Models

Authors: Lam Pham, Phat Lam, Truong Nguyen, Huyen Nguyen, Alexander Schindler

Abstract: In this paper, we propose a deep learning based system for the task of deepfake audio detection. In particular, the draw input audio is first transformed into various spectrograms using three transformation methods of Short-time Fourier Transform (STFT), Constant-Q Transform (CQT), Wavelet Transform (WT) combined with different auditory-based filters of Mel, Gammatone, linear filters (LF), and dis… ▽ More In this paper, we propose a deep learning based system for the task of deepfake audio detection. In particular, the draw input audio is first transformed into various spectrograms using three transformation methods of Short-time Fourier Transform (STFT), Constant-Q Transform (CQT), Wavelet Transform (WT) combined with different auditory-based filters of Mel, Gammatone, linear filters (LF), and discrete cosine transform (DCT). Given the spectrograms, we evaluate a wide range of classification models based on three deep learning approaches. The first approach is to train directly the spectrograms using our proposed baseline models of CNN-based model (CNN-baseline), RNN-based model (RNN-baseline), C-RNN model (C-RNN baseline). Meanwhile, the second approach is transfer learning from computer vision models such as ResNet-18, MobileNet-V3, EfficientNet-B0, DenseNet-121, SuffleNet-V2, Swint, Convnext-Tiny, GoogLeNet, MNASsnet, RegNet. In the third approach, we leverage the state-of-the-art audio pre-trained models of Whisper, Seamless, Speechbrain, and Pyannote to extract audio embeddings from the input spectrograms. Then, the audio embeddings are explored by a Multilayer perceptron (MLP) model to detect the fake or real audio samples. Finally, high-performance deep learning models from these approaches are fused to achieve the best performance. We evaluated our proposed models on ASVspoof 2019 benchmark dataset. Our best ensemble model achieved an Equal Error Rate (EER) of 0.03, which is highly competitive to top-performing systems in the ASVspoofing 2019 challenge. Experimental results also highlight the potential of selective spectrograms and deep learning approaches to enhance the task of audio deepfake detection. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2403.05146 [pdf, other]

Motion-Guided Dual-Camera Tracker for Low-Cost Skill Evaluation of Gastric Endoscopy

Authors: Yuelin Zhang, Wanquan Yan, Kim Yan, Chun Ping Lam, Yufu Qiu, Pengyu Zheng, Raymond Shing-Yan Tang, Shing Shin Cheng

Abstract: Gastric simulators with objective educational feedback have been proven useful for endoscopy training. Existing electronic simulators with feedback are however not commonly adopted due to their high cost. In this work, a motion-guided dual-camera tracker is proposed to provide reliable endoscope tip position feedback at a low cost inside a mechanical simulator for endoscopy skill evaluation, tackl… ▽ More Gastric simulators with objective educational feedback have been proven useful for endoscopy training. Existing electronic simulators with feedback are however not commonly adopted due to their high cost. In this work, a motion-guided dual-camera tracker is proposed to provide reliable endoscope tip position feedback at a low cost inside a mechanical simulator for endoscopy skill evaluation, tackling several unique challenges. To address the issue of significant appearance variation of the endoscope tip while keeping dual-camera tracking consistency, the cross-camera mutual template strategy (CMT) is proposed to introduce dynamic transient mutual templates to dual-camera tracking. To alleviate disturbance from large occlusion and distortion by the light source from the endoscope tip, the Mamba-based motion-guided prediction head (MMH) is presented to aggregate historical motion with visual tracking. It is the first application of Mamba for object tracking. The proposed tracker was evaluated on datasets captured by low-cost camera pairs during endoscopy procedures performed inside the mechanical simulator. The tracker achieves SOTA performance with robust and consistent tracking on dual cameras. Further downstream evaluation proves that the 3D tip position determined by the proposed tracker enables reliable skill differentiation. The code and dataset are available at https://github.com/PieceZhang/MotionDCTrack △ Less

Submitted 20 April, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.00379 [pdf, other]

The Impact of Frequency Bands on Acoustic Anomaly Detection of Machines using Deep Learning Based Model

Authors: Tin Nguyen, Lam Pham, Phat Lam, Dat Ngo, Hieu Tang, Alexander Schindler

Abstract: In this paper, we propose a deep learning based model for Acoustic Anomaly Detection of Machines, the task for detecting abnormal machines by analysing the machine sound. By conducting extensive experiments, we indicate that multiple techniques of pseudo audios, audio segment, data augmentation, Mahalanobis distance, and narrow frequency bands, which mainly focus on feature engineering, are effect… ▽ More In this paper, we propose a deep learning based model for Acoustic Anomaly Detection of Machines, the task for detecting abnormal machines by analysing the machine sound. By conducting extensive experiments, we indicate that multiple techniques of pseudo audios, audio segment, data augmentation, Mahalanobis distance, and narrow frequency bands, which mainly focus on feature engineering, are effective to enhance the system performance. Among the evaluating techniques, the narrow frequency bands presents a significant impact. Indeed, our proposed model, which focuses on the narrow frequency bands, outperforms the DCASE baseline on the benchmark dataset of DCASE 2022 Task 2 Development set. The important role of the narrow frequency bands indicated in this paper inspires the research community on the task of Acoustic Anomaly Detection of Machines to further investigate and propose novel network architectures focusing on the frequency bands. △ Less

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2401.15854 [pdf, other]

LSTM-based Deep Neural Network With A Focus on Sentence Representation for Sequential Sentence Classification in Medical Scientific Abstracts

Authors: Phat Lam, Lam Pham, Tin Nguyen, Hieu Tang, Michael Seidl, Medina Andresel, Alexander Schindler

Abstract: The Sequential Sentence Classification task within the domain of medical abstracts, termed as SSC, involves the categorization of sentences into pre-defined headings based on their roles in conveying critical information in the abstract. In the SSC task, sentences are sequentially related to each other. For this reason, the role of sentence embeddings is crucial for capturing both the semantic inf… ▽ More The Sequential Sentence Classification task within the domain of medical abstracts, termed as SSC, involves the categorization of sentences into pre-defined headings based on their roles in conveying critical information in the abstract. In the SSC task, sentences are sequentially related to each other. For this reason, the role of sentence embeddings is crucial for capturing both the semantic information between words in the sentence and the contextual relationship of sentences within the abstract, which then enhances the SSC system performance. In this paper, we propose a LSTM-based deep learning network with a focus on creating comprehensive sentence representation at the sentence level. To demonstrate the efficacy of the created sentence representation, a system utilizing these sentence embeddings is also developed, which consists of a Convolutional-Recurrent neural network (C-RNN) at the abstract level and a multi-layer perception network (MLP) at the segment level. Our proposed system yields highly competitive results compared to state-of-the-art systems and further enhances the F1 scores of the baseline by 1.0%, 2.8%, and 2.6% on the benchmark datasets PudMed 200K RCT, PudMed 20K RCT and NICTA-PIBOSO, respectively. This indicates the significant impact of improving sentence representation on boosting model performance. △ Less

Submitted 31 May, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

Comments: Submitted to FedCSIS 2024

arXiv:2308.16122 [pdf, other]

Spatial Graph Coarsening: Weather and Weekday Prediction with London's Bike-Sharing Service using GNN

Authors: Yuta Sato, Pak Hei Lam, Shruti Gupta, Fareesah Hussain

Abstract: This study introduced the use of Graph Neural Network (GNN) for predicting the weather and weekday of a day in London, from the dataset of Santander Cycles bike-sharing system as a graph classification task. The proposed GNN models newly introduced (i) a concatenation operator of graph features with trained node embeddings and (ii) a graph coarsening operator based on geographical contiguity, name… ▽ More This study introduced the use of Graph Neural Network (GNN) for predicting the weather and weekday of a day in London, from the dataset of Santander Cycles bike-sharing system as a graph classification task. The proposed GNN models newly introduced (i) a concatenation operator of graph features with trained node embeddings and (ii) a graph coarsening operator based on geographical contiguity, namely "Spatial Graph Coarsening". With the node features of land-use characteristics and number of households around the bike stations and graph features of temperatures in the city, our proposed models outperformed the baseline model in cross-entropy loss and accuracy of the validation dataset. △ Less

Submitted 30 August, 2023; originally announced August 2023.

arXiv:2211.07283 [pdf, other]

SNIPER Training: Single-Shot Sparse Training for Text-to-Speech

Authors: Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman, Dorien Herremans

Abstract: Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary. Sparse TTS models can improve on dense models via pruning and extra retraining, or converge faster than dense models with some performance loss. Thus, we propose training TTS models using decaying sparsity, i.e. a high initial sparsity to acc… ▽ More Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary. Sparse TTS models can improve on dense models via pruning and extra retraining, or converge faster than dense models with some performance loss. Thus, we propose training TTS models using decaying sparsity, i.e. a high initial sparsity to accelerate training first, followed by a progressive rate reduction to obtain better eventual performance. This decremental approach differs from current methods of incrementing sparsity to a desired target, which costs significantly more time than dense training. We call our method SNIPER training: Single-shot Initialization Pruning Evolving-Rate training. Our experiments on FastSpeech2 show that we were able to obtain better losses in the first few training epochs with SNIPER, and that the final SNIPER-trained models outperformed constant-sparsity models and edged out dense models, with negligible difference in training time. △ Less

Submitted 1 June, 2024; v1 submitted 14 November, 2022; originally announced November 2022.

arXiv:2209.10890 [pdf, other]

doi 10.21437/Interspeech.2022-10626

EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models

Authors: Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman

Abstract: Neural models are known to be over-parameterized, and recent work has shown that sparse text-to-speech (TTS) models can outperform dense models. Although a plethora of sparse methods has been proposed for other domains, such methods have rarely been applied in TTS. In this work, we seek to answer the question: what are the characteristics of selected sparse techniques on the performance and model… ▽ More Neural models are known to be over-parameterized, and recent work has shown that sparse text-to-speech (TTS) models can outperform dense models. Although a plethora of sparse methods has been proposed for other domains, such methods have rarely been applied in TTS. In this work, we seek to answer the question: what are the characteristics of selected sparse techniques on the performance and model complexity? We compare a Tacotron2 baseline and the results of applying five techniques. We then evaluate the performance via the factors of naturalness, intelligibility and prosody, while reporting model size and training time. Complementary to prior research, we find that pruning before or during training can achieve similar performance to pruning after training and can be trained much faster, while removing entire neurons degrades performance much more than removing parameters. To our best knowledge, this is the first work that compares sparsity paradigms in text-to-speech synthesis. △ Less

Submitted 22 September, 2022; originally announced September 2022.

Journal ref: Interspeech 2022, 823-827 (2022)

arXiv:2209.01766 [pdf, ps, other]

Exploring the Verifiability of Code Generated by GitHub Copilot

Authors: Dakota Wong, Austin Kothig, Patrick Lam

Abstract: GitHub's Copilot generates code quickly. We investigate whether it generates good code. Our approach is to identify a set of problems, ask Copilot to generate solutions, and attempt to formally verify these solutions with Dafny. Our formal verification is with respect to hand-crafted specifications. We have carried out this process on 6 problems and succeeded in formally verifying 4 of the created… ▽ More GitHub's Copilot generates code quickly. We investigate whether it generates good code. Our approach is to identify a set of problems, ask Copilot to generate solutions, and attempt to formally verify these solutions with Dafny. Our formal verification is with respect to hand-crafted specifications. We have carried out this process on 6 problems and succeeded in formally verifying 4 of the created solutions. We found evidence which corroborates the current consensus in the literature: Copilot is a powerful tool; however, it should not be "flying the plane" by itself. △ Less

Submitted 27 October, 2022; v1 submitted 5 September, 2022; originally announced September 2022.

Comments: HATRA workshop at SPLASH 2022

arXiv:2111.14973 [pdf, other]

MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction

Authors: Balakrishnan Varadarajan, Ahmed Hefny, Avikalp Srivastava, Khaled S. Refaat, Nigamaa Nayakanti, Andre Cornman, Kan Chen, Bertrand Douillard, Chi Pang Lam, Dragomir Anguelov, Benjamin Sapp

Abstract: Predicting the future behavior of road users is one of the most challenging and important problems in autonomous driving. Applying deep learning to this problem requires fusing heterogeneous world state in the form of rich perception signals and map information, and inferring highly multi-modal distributions over possible futures. In this paper, we present MultiPath++, a future prediction model th… ▽ More Predicting the future behavior of road users is one of the most challenging and important problems in autonomous driving. Applying deep learning to this problem requires fusing heterogeneous world state in the form of rich perception signals and map information, and inferring highly multi-modal distributions over possible futures. In this paper, we present MultiPath++, a future prediction model that achieves state-of-the-art performance on popular benchmarks. MultiPath++ improves the MultiPath architecture by revisiting many design choices. The first key design difference is a departure from dense image-based encoding of the input world state in favor of a sparse encoding of heterogeneous scene elements: MultiPath++ consumes compact and efficient polylines to describe road features, and raw agent state information directly (e.g., position, velocity, acceleration). We propose a context-aware fusion of these elements and develop a reusable multi-context gating fusion component. Second, we reconsider the choice of pre-defined, static anchors, and develop a way to learn latent anchor embeddings end-to-end in the model. Lastly, we explore ensembling and output aggregation techniques -- common in other ML domains -- and find effective variants for our probabilistic multimodal output representation. We perform an extensive ablation on these design choices, and show that our proposed model achieves state-of-the-art performance on the Argoverse Motion Forecasting Competition and the Waymo Open Dataset Motion Prediction Challenge. △ Less

Submitted 21 December, 2021; v1 submitted 29 November, 2021; originally announced November 2021.

arXiv:2108.06384 [pdf, other]

Finding Representative Interpretations on Convolutional Neural Networks

Authors: Peter Cho-Ho Lam, Lingyang Chu, Maxim Torgonskiy, Jian Pei, Yong Zhang, Lanjun Wang

Abstract: Interpreting the decision logic behind effective deep convolutional neural networks (CNN) on images complements the success of deep learning models. However, the existing methods can only interpret some specific decision logic on individual or a small number of images. To facilitate human understandability and generalization ability, it is important to develop representative interpretations that i… ▽ More Interpreting the decision logic behind effective deep convolutional neural networks (CNN) on images complements the success of deep learning models. However, the existing methods can only interpret some specific decision logic on individual or a small number of images. To facilitate human understandability and generalization ability, it is important to develop representative interpretations that interpret common decision logics of a CNN on a large group of similar images, which reveal the common semantics data contributes to many closely related predictions. In this paper, we develop a novel unsupervised approach to produce a highly representative interpretation for a large number of similar images. We formulate the problem of finding representative interpretations as a co-clustering problem, and convert it into a submodular cost submodular cover problem based on a sample of the linear decision boundaries of a CNN. We also present a visualization and similarity ranking method. Our extensive experiments demonstrate the excellent performance of our method. △ Less

Submitted 20 August, 2021; v1 submitted 13 August, 2021; originally announced August 2021.

Comments: Accepted by ICCV 2021 (http://iccv2021.thecvf.com/home) | A python notebook of the proposed method can be found in https://marketplace.huaweicloud.com/markets/aihub/notebook/detail/?id=8e92ea6c-2f4a-4cff-a89f-bf2905bb7ac0

arXiv:2107.04086 [pdf, other]

Robust Counterfactual Explanations on Graph Neural Networks

Authors: Mohit Bajaj, Lingyang Chu, Zi Yu Xue, Jian Pei, Lanjun Wang, Peter Cho-Ho Lam, Yong Zhang

Abstract: Massive deployment of Graph Neural Networks (GNNs) in high-stake applications generates a strong demand for explanations that are robust to noise and align well with human intuition. Most existing methods generate explanations by identifying a subgraph of an input graph that has a strong correlation with the prediction. These explanations are not robust to noise because independently optimizing th… ▽ More Massive deployment of Graph Neural Networks (GNNs) in high-stake applications generates a strong demand for explanations that are robust to noise and align well with human intuition. Most existing methods generate explanations by identifying a subgraph of an input graph that has a strong correlation with the prediction. These explanations are not robust to noise because independently optimizing the correlation for a single input can easily overfit noise. Moreover, they do not align well with human intuition because removing an identified subgraph from an input graph does not necessarily change the prediction result. In this paper, we propose a novel method to generate robust counterfactual explanations on GNNs by explicitly modelling the common decision logic of GNNs on similar input graphs. Our explanations are naturally robust to noise because they are produced from the common decision boundaries of a GNN that govern the predictions of many similar input graphs. The explanations also align well with human intuition because removing the set of edges identified by an explanation from the input graph changes the prediction significantly. Exhaustive experiments on many public datasets demonstrate the superior performance of our method. △ Less

Submitted 12 July, 2022; v1 submitted 8 July, 2021; originally announced July 2021.

arXiv:2105.02866 [pdf, other]

Membership Inference Attacks on Deep Regression Models for Neuroimaging

Authors: Umang Gupta, Dimitris Stripelis, Pradeep K. Lam, Paul M. Thompson, José Luis Ambite, Greg Ver Steeg

Abstract: Ensuring the privacy of research participants is vital, even more so in healthcare environments. Deep learning approaches to neuroimaging require large datasets, and this often necessitates sharing data between multiple sites, which is antithetical to the privacy objectives. Federated learning is a commonly proposed solution to this problem. It circumvents the need for data sharing by sharing para… ▽ More Ensuring the privacy of research participants is vital, even more so in healthcare environments. Deep learning approaches to neuroimaging require large datasets, and this often necessitates sharing data between multiple sites, which is antithetical to the privacy objectives. Federated learning is a commonly proposed solution to this problem. It circumvents the need for data sharing by sharing parameters during the training process. However, we demonstrate that allowing access to parameters may leak private information even if data is never directly shared. In particular, we show that it is possible to infer if a sample was used to train the model given only access to the model prediction (black-box) or access to the model itself (white-box) and some leaked samples from the training data distribution. Such attacks are commonly referred to as Membership Inference attacks. We show realistic Membership Inference attacks on deep learning models trained for 3D neuroimaging tasks in a centralized as well as decentralized setup. We demonstrate feasible attacks on brain age prediction models (deep learning models that predict a person's age from their brain MRI scan). We correctly identified whether an MRI scan was used in model training with a 60% to over 80% success rate depending on model complexity and security assumptions. △ Less

Submitted 3 June, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

Comments: To appear at Medical Imaging with Deep Learning 2021 (MIDL 2021)

arXiv:2102.08440 [pdf, other]

Scaling Neuroscience Research using Federated Learning

Authors: Dimitris Stripelis, Jose Luis Ambite, Pradeep Lam, Paul Thompson

Abstract: The amount of biomedical data continues to grow rapidly. However, the ability to analyze these data is limited due to privacy and regulatory concerns. Machine learning approaches that require data to be copied to a single location are hampered by the challenges of data sharing. Federated Learning is a promising approach to learn a joint model over data silos. This architecture does not share any s… ▽ More The amount of biomedical data continues to grow rapidly. However, the ability to analyze these data is limited due to privacy and regulatory concerns. Machine learning approaches that require data to be copied to a single location are hampered by the challenges of data sharing. Federated Learning is a promising approach to learn a joint model over data silos. This architecture does not share any subject data across sites, only aggregated parameters, often in encrypted environments, thus satisfying privacy and regulatory requirements. Here, we describe our Federated Learning architecture and training policies. We demonstrate our approach on a brain age prediction model on structural MRI scans distributed across multiple sites with diverse amounts of data and subject (age) distributions. In these heterogeneous environments, our Semi-Synchronous protocol provides faster convergence. △ Less

Submitted 16 February, 2021; originally announced February 2021.

Comments: To appear at IEEE International Symposium on Biomedical Imaging 2021 (ISBI 2021)

MSC Class: 68T07 ACM Class: I.5.4

arXiv:2102.04438 [pdf, other]

Improved Brain Age Estimation with Slice-based Set Networks

Authors: Umang Gupta, Pradeep K. Lam, Greg Ver Steeg, Paul M. Thompson

Abstract: Deep Learning for neuroimaging data is a promising but challenging direction. The high dimensionality of 3D MRI scans makes this endeavor compute and data-intensive. Most conventional 3D neuroimaging methods use 3D-CNN-based architectures with a large number of parameters and require more time and data to train. Recently, 2D-slice-based models have received increasing attention as they have fewer… ▽ More Deep Learning for neuroimaging data is a promising but challenging direction. The high dimensionality of 3D MRI scans makes this endeavor compute and data-intensive. Most conventional 3D neuroimaging methods use 3D-CNN-based architectures with a large number of parameters and require more time and data to train. Recently, 2D-slice-based models have received increasing attention as they have fewer parameters and may require fewer samples to achieve comparable performance. In this paper, we propose a new architecture for BrainAGE prediction. The proposed architecture works by encoding each 2D slice in an MRI with a deep 2D-CNN model. Next, it combines the information from these 2D-slice encodings using set networks or permutation invariant layers. Experiments on the BrainAGE prediction problem, using the UK Biobank dataset, showed that the model with the permutation invariant layers trains faster and provides better predictions compared to other state-of-the-art approaches. △ Less

Submitted 9 February, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

Comments: To appear at IEEE International Symposium on Biomedical Imaging 2021 (ISBI 2021). Code is available at https://git.io/JtazG

arXiv:2008.07069 [pdf, other]

doi 10.1145/3426428.3426922

Putting the Semantics into Semantic Versioning

Authors: Patrick Lam, Jens Dietrich, David J. Pearce

Abstract: The long-standing aspiration for software reuse has made astonishing strides in the past few years. Many modern software development ecosystems now come with rich sets of publicly-available components contributed by the community. Downstream developers can leverage these upstream components, boosting their productivity. However, components evolve at their own pace. This imposes obligations on an… ▽ More The long-standing aspiration for software reuse has made astonishing strides in the past few years. Many modern software development ecosystems now come with rich sets of publicly-available components contributed by the community. Downstream developers can leverage these upstream components, boosting their productivity. However, components evolve at their own pace. This imposes obligations on and yields benefits for downstream developers, especially since changes can be breaking, requiring additional downstream work to adapt to. Upgrading too late leaves downstream vulnerable to security issues and missing out on useful improvements; upgrading too early results in excess work. Semantic versioning has been proposed as an elegant mechanism to communicate levels of compatibility, enabling downstream developers to automate dependency upgrades. While it is questionable whether a version number can adequately characterize version compatibility in general, we argue that developers would greatly benefit from tools such as semantic version calculators to help them upgrade safely. The time is now for the research community to develop such tools: large component ecosystems exist and are accessible, component interactions have become observable through automated builds, and recent advances in program analysis make the development of relevant tools feasible. In particular, contracts (both traditional and lightweight) are a promising input to semantic versioning calculators, which can suggest whether an upgrade is likely to be safe. △ Less

Submitted 16 August, 2020; originally announced August 2020.

Comments: to be published as Onward! Essays 2020

ACM Class: D.2.4; D.2.7; D.2.9; D.2.12; D.2.13

arXiv:1912.07224 [pdf, other]

Domain Knowledge Based Brain Tumor Segmentation and Overall Survival Prediction

Authors: Xiaoqing Guo, Chen Yang, Pak Lun Lam, Peter Y. M. Woo, Yixuan Yuan

Abstract: Automatically segmenting sub-regions of gliomas (necrosis, edema and enhancing tumor) and accurately predicting overall survival (OS) time from multimodal MRI sequences have important clinical significance in diagnosis, prognosis and treatment of gliomas. However, due to the high degree variations of heterogeneous appearance and individual physical state, the segmentation of sub-regions and OS pre… ▽ More Automatically segmenting sub-regions of gliomas (necrosis, edema and enhancing tumor) and accurately predicting overall survival (OS) time from multimodal MRI sequences have important clinical significance in diagnosis, prognosis and treatment of gliomas. However, due to the high degree variations of heterogeneous appearance and individual physical state, the segmentation of sub-regions and OS prediction are very challenging. To deal with these challenges, we utilize a 3D dilated multi-fiber network (DMFNet) with weighted dice loss for brain tumor segmentation, which incorporates prior volume statistic knowledge and obtains a balance between small and large objects in MRI scans. For OS prediction, we propose a DenseNet based 3D neural network with position encoding convolutional layer (PECL) to extract meaningful features from T1 contrast MRI, T2 MRI and previously segmented subregions. Both labeled data and unlabeled data are utilized to prevent over-fitting for semi-supervised learning. Those learned deep features along with handcrafted features (such as ages, volume of tumor) and position encoding segmentation features are fed to a Gradient Boosting Decision Tree (GBDT) to predict a specific OS day △ Less

Submitted 16 December, 2019; originally announced December 2019.

Comments: 11 pages, 5 figures, BrainLes 2019

arXiv:1905.02342 [pdf, other]

Machine Learning Cryptanalysis of a Quantum Random Number Generator

Authors: Nhan Duy Truong, Jing Yan Haw, Syed Muhamad Assad, Ping Koy Lam, Omid Kavehei

Abstract: Random number generators (RNGs) that are crucial for cryptographic applications have been the subject of adversarial attacks. These attacks exploit environmental information to predict generated random numbers that are supposed to be truly random and unpredictable. Though quantum random number generators (QRNGs) are based on the intrinsic indeterministic nature of quantum properties, the presence… ▽ More Random number generators (RNGs) that are crucial for cryptographic applications have been the subject of adversarial attacks. These attacks exploit environmental information to predict generated random numbers that are supposed to be truly random and unpredictable. Though quantum random number generators (QRNGs) are based on the intrinsic indeterministic nature of quantum properties, the presence of classical noise in the measurement process compromises the integrity of a QRNG. In this paper, we develop a predictive machine learning (ML) analysis to investigate the impact of deterministic classical noise in different stages of an optical continuous variable QRNG. Our ML model successfully detects inherent correlations when the deterministic noise sources are prominent. After appropriate filtering and randomness extraction processes are introduced, our QRNG system, in turn, demonstrates its robustness against ML. We further demonstrate the robustness of our ML approach by applying it to uniformly distributed random numbers from the QRNG and a congruential RNG. Hence, our result shows that ML has potentials in benchmarking the quality of RNG devices. △ Less

Submitted 12 May, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

Comments: Accepted for publication in IEEE Transactions on Information Forensics and Security. Related code is at https://github.com/Nano-Neuro-Research-Lab/Machine-Learning-Cryptanalysis-of-a-Quantum-Random-Number-Generator

arXiv:1902.11114 [pdf]

Imaging and Classification Techniques for Seagrass Mapping and Monitoring: A Comprehensive Survey

Authors: Md Moniruzzaman, S. M. Shamsul Islam, Paul Lavery, Mohammed Bennamoun, C. Peng Lam

Abstract: Monitoring underwater habitats is a vital part of observing the condition of the environment. The detection and mapping of underwater vegetation, especially seagrass has drawn the attention of the research community as early as the nineteen eighties. Initially, this monitoring relied on in situ observation by experts. Later, advances in remote-sensing technology, satellite-monitoring techniques an… ▽ More Monitoring underwater habitats is a vital part of observing the condition of the environment. The detection and mapping of underwater vegetation, especially seagrass has drawn the attention of the research community as early as the nineteen eighties. Initially, this monitoring relied on in situ observation by experts. Later, advances in remote-sensing technology, satellite-monitoring techniques and, digital photo- and video-based techniques opened a window to quicker, cheaper, and, potentially, more accurate seagrass-monitoring methods. So far, for seagrass detection and mapping, digital images from airborne cameras, spectral images from satellites, acoustic image data using underwater sonar technology, and digital underwater photo and video images have been used to map the seagrass meadows or monitor their condition. In this article, we have reviewed the recent approaches to seagrass detection and mapping to understand the gaps of the present approaches and determine further research scope to monitor the ocean health more easily. We have identified four classes of approach to seagrass mapping and assessment: still image-, video data-, acoustic image-, and spectral image data-based techniques. We have critically analysed the surveyed approaches and found the research gaps including the need for quick, cheap and effective imaging techniques robust to depth, turbidity, location and weather conditions, fully automated seagrass detectors that can work in real-time, accurate techniques for estimating the seagrass density, and the availability of high computation facilities for processing large scale data. For addressing these gaps, future research should focus on developing cheaper image and video data collection techniques, deep learning based automatic annotation and classification, and real-time percentage-cover calculation. △ Less

Submitted 1 March, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

Comments: 36 pages, 14 figures, 8tables

arXiv:1512.08413 [pdf, ps, other]

Outlier Detection In Large-scale Traffic Data By Naïve Bayes Method and Gaussian Mixture Model Method

Authors: Philip Lam, Lili Wang, Henry Y. T. Ngan, Nelson H. C. Yung, Anthony G. O. Yeh

Abstract: It is meaningful to detect outliers in traffic data for traffic management. However, this is a massive task for people from large-scale database to distinguish outliers. In this paper, we present two methods: Kernel Smoothing Naïve Bayes (NB) method and Gaussian Mixture Model (GMM) method to automatically detect any hardware errors as well as abnormal traffic events in traffic data collected at a… ▽ More It is meaningful to detect outliers in traffic data for traffic management. However, this is a massive task for people from large-scale database to distinguish outliers. In this paper, we present two methods: Kernel Smoothing Naïve Bayes (NB) method and Gaussian Mixture Model (GMM) method to automatically detect any hardware errors as well as abnormal traffic events in traffic data collected at a four-arm junction in Hong Kong. Traffic data was recorded in a video format, and converted to spatial-temporal (ST) traffic signals by statistics. The ST signals are then projected to a two-dimensional (2D) (x,y)-coordinate plane by Principal Component Analysis (PCA) for dimension reduction. We assume that inlier data are normal distributed. As such, the NB and GMM methods are successfully applied in outlier detection (OD) for traffic data. The kernel smooth NB method assumes the existence of kernel distributions in traffic data and uses Bayes' Theorem to perform OD. In contrast, the GMM method believes the traffic data is formed by the mixture of Gaussian distributions and exploits confidence region for OD. This paper would address the modeling of each method and evaluate their respective performances. Experimental results show that the NB algorithm with Triangle kernel and GMM method achieve up to 93.78% and 94.50% accuracies, respectively. △ Less

Submitted 28 December, 2015; originally announced December 2015.

Comments: 6 pages, 5 figures

arXiv:0704.2475 [pdf]

Physical Layer Network Coding

Authors: Zhang Shengli, Soung-Chang Liew, Patrick P. K. Lam

Abstract: A main distinguishing feature of a wireless network compared with a wired network is its broadcast nature, in which the signal transmitted by a node may reach several other nodes, and a node may receive signals from several other nodes simultaneously. Rather than a blessing, this feature is treated more as an interference-inducing nuisance in most wireless networks today (e.g., IEEE 802.11). Thi… ▽ More A main distinguishing feature of a wireless network compared with a wired network is its broadcast nature, in which the signal transmitted by a node may reach several other nodes, and a node may receive signals from several other nodes simultaneously. Rather than a blessing, this feature is treated more as an interference-inducing nuisance in most wireless networks today (e.g., IEEE 802.11). This paper shows that the concept of network coding can be applied at the physical layer to turn the broadcast property into a capacity-boosting advantage in wireless ad hoc networks. Specifically, we propose a physical-layer network coding (PNC) scheme to coordinate transmissions among nodes. In contrast to straightforward network coding which performs coding arithmetic on digital bit streams after they have been received, PNC makes use of the additive nature of simultaneously arriving electromagnetic (EM) waves for equivalent coding operation. And in doing so, PNC can potentially achieve 100% and 50% throughput increases compared with traditional transmission and straightforward network coding, respectively, in multi-hop networks. More specifically, the information-theoretic capacity of PNC is almost double that of traditional transmission in the SNR region of practical interest (higher than 0dB). We believe this is a first paper that ventures into EM-wave-based network coding at the physical layer and demonstrates its potential for boosting network capacity. △ Less

Submitted 19 April, 2007; originally announced April 2007.

arXiv:cs/0408013 [pdf, ps, other]

Roles Are Really Great!

Authors: Viktor Kuncak, Patrick Lam, Martin Rinard

Abstract: We present a new role system for specifying changing referencing relationships of heap objects. The role of an object depends, in large part, on its aliasing relationships with other objects, with the role of each object changing as its aliasing relationships change. Roles therefore capture important object and data structure properties and provide useful information about how the actions of the… ▽ More We present a new role system for specifying changing referencing relationships of heap objects. The role of an object depends, in large part, on its aliasing relationships with other objects, with the role of each object changing as its aliasing relationships change. Roles therefore capture important object and data structure properties and provide useful information about how the actions of the program interact with these properties. Our role system enables the programmer to specify the legal aliasing relationships that define the set of roles that objects may play, the roles of procedure parameters and object fields, and the role changes that procedures perform while manipulating objects. We present an interprocedural, compositional, and context-sensitive role analysis algorithm that verifies that a program respects the role constraints. △ Less

Submitted 4 August, 2004; originally announced August 2004.

Comments: 29 pages. A version appeared in POPL 2002

Report number: MIT CSAIL 822 ACM Class: D.2.4; D.3.1; D.3.3; F.3.1; F.3.2

Showing 1–23 of 23 results for author: Lam, P