Showing 1–23 of 23 results for author: Lam, P

Searching in archive cs.
  1. arXiv:2408.06827  [pdf, other]

    eess.AS cs.LG

    PRESENT: Zero-Shot Text-to-Prosody Control

    Authors: Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman, Dorien Herremans

    Abstract: Current strategies for achieving fine-grained prosody control in speech synthesis entail extracting additional style embeddings or adopting more complex architectures. To enable zero-shot application of pretrained text-to-speech (TTS) models, we present PRESENT (PRosody Editing without Style Embeddings or New Training), which exploits explicit prosody prediction in FastSpeech2-based models by modi…

    Submitted 13 August, 2024; originally announced August 2024.

  2. arXiv:2407.03110  [pdf, other]

    cs.SD cs.AI eess.AS

    A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection)

    Authors: Lam Pham, Phat Lam, Tin Nguyen, Hieu Tang, Alexander Schindler

    Abstract: In this paper, we present a toolchain for comprehensive audio/video analysis that leverages a deep learning based multimodal approach. To this end, the specific tasks of Speech to Text (S2T), Acoustic Scene Classification (ASC), Acoustic Event Detection (AED), Visual Object Detection (VOD), Image Captioning (IC), and Video Captioning (VC) are conducted and integrated into the toolchain. By co…

    Submitted 2 May, 2024; originally announced July 2024.

  3. arXiv:2407.01777  [pdf, other]

    cs.SD cs.AI eess.AS

    Deepfake Audio Detection Using Spectrogram-based Feature and Ensemble of Deep Learning Models

    Authors: Lam Pham, Phat Lam, Truong Nguyen, Huyen Nguyen, Alexander Schindler

    Abstract: In this paper, we propose a deep learning based system for the task of deepfake audio detection. In particular, the raw input audio is first transformed into various spectrograms using three transformation methods, Short-time Fourier Transform (STFT), Constant-Q Transform (CQT), and Wavelet Transform (WT), combined with different auditory-based filters of Mel, Gammatone, linear filters (LF), and dis…

    Submitted 1 July, 2024; originally announced July 2024.

  4. arXiv:2403.05146  [pdf, other]

    cs.CV

    Motion-Guided Dual-Camera Tracker for Low-Cost Skill Evaluation of Gastric Endoscopy

    Authors: Yuelin Zhang, Wanquan Yan, Kim Yan, Chun Ping Lam, Yufu Qiu, Pengyu Zheng, Raymond Shing-Yan Tang, Shing Shin Cheng

    Abstract: Gastric simulators with objective educational feedback have been proven useful for endoscopy training. However, existing electronic simulators with feedback are not commonly adopted due to their high cost. In this work, a motion-guided dual-camera tracker is proposed to provide reliable endoscope tip position feedback at a low cost inside a mechanical simulator for endoscopy skill evaluation, tackl…

    Submitted 20 April, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  5. arXiv:2403.00379  [pdf, other]

    eess.AS cs.SD

    The Impact of Frequency Bands on Acoustic Anomaly Detection of Machines using Deep Learning Based Model

    Authors: Tin Nguyen, Lam Pham, Phat Lam, Dat Ngo, Hieu Tang, Alexander Schindler

    Abstract: In this paper, we propose a deep learning based model for Acoustic Anomaly Detection of Machines, the task of detecting abnormal machines by analysing the machine sound. By conducting extensive experiments, we show that multiple techniques, namely pseudo audios, audio segments, data augmentation, Mahalanobis distance, and narrow frequency bands, which mainly focus on feature engineering, are effect…

    Submitted 1 March, 2024; originally announced March 2024.

  6. arXiv:2401.15854  [pdf, other]

    cs.CL

    LSTM-based Deep Neural Network With A Focus on Sentence Representation for Sequential Sentence Classification in Medical Scientific Abstracts

    Authors: Phat Lam, Lam Pham, Tin Nguyen, Hieu Tang, Michael Seidl, Medina Andresel, Alexander Schindler

    Abstract: The Sequential Sentence Classification task within the domain of medical abstracts, termed SSC, involves categorizing sentences into pre-defined headings based on their roles in conveying critical information in the abstract. In the SSC task, sentences are sequentially related to each other. For this reason, the role of sentence embeddings is crucial for capturing both the semantic inf…

    Submitted 31 May, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

    Comments: Submitted to FedCSIS 2024

  7. arXiv:2308.16122  [pdf, other]

    cs.LG

    Spatial Graph Coarsening: Weather and Weekday Prediction with London's Bike-Sharing Service using GNN

    Authors: Yuta Sato, Pak Hei Lam, Shruti Gupta, Fareesah Hussain

    Abstract: This study introduced the use of Graph Neural Networks (GNNs) for predicting the weather and weekday of a day in London from the dataset of the Santander Cycles bike-sharing system, framed as a graph classification task. The proposed GNN models newly introduced (i) a concatenation operator of graph features with trained node embeddings and (ii) a graph coarsening operator based on geographical contiguity, name…

    Submitted 30 August, 2023; originally announced August 2023.

  8. arXiv:2211.07283  [pdf, other]

    eess.AS cs.SD

    SNIPER Training: Single-Shot Sparse Training for Text-to-Speech

    Authors: Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman, Dorien Herremans

    Abstract: Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary. Sparse TTS models can improve on dense models via pruning and extra retraining, or converge faster than dense models with some performance loss. Thus, we propose training TTS models using decaying sparsity, i.e. a high initial sparsity to acc…

    Submitted 1 June, 2024; v1 submitted 14 November, 2022; originally announced November 2022.

  9. EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models

    Authors: Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman

    Abstract: Neural models are known to be over-parameterized, and recent work has shown that sparse text-to-speech (TTS) models can outperform dense models. Although a plethora of sparse methods has been proposed for other domains, such methods have rarely been applied in TTS. In this work, we seek to answer the question: what are the characteristics of selected sparse techniques on the performance and model…

    Submitted 22 September, 2022; originally announced September 2022.

    Journal ref: Interspeech 2022, 823-827 (2022)

  10. arXiv:2209.01766  [pdf, ps, other]

    cs.SE cs.PL

    Exploring the Verifiability of Code Generated by GitHub Copilot

    Authors: Dakota Wong, Austin Kothig, Patrick Lam

    Abstract: GitHub's Copilot generates code quickly. We investigate whether it generates good code. Our approach is to identify a set of problems, ask Copilot to generate solutions, and attempt to formally verify these solutions with Dafny. Our formal verification is with respect to hand-crafted specifications. We have carried out this process on 6 problems and succeeded in formally verifying 4 of the created…

    Submitted 27 October, 2022; v1 submitted 5 September, 2022; originally announced September 2022.

    Comments: HATRA workshop at SPLASH 2022

  11. arXiv:2111.14973  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction

    Authors: Balakrishnan Varadarajan, Ahmed Hefny, Avikalp Srivastava, Khaled S. Refaat, Nigamaa Nayakanti, Andre Cornman, Kan Chen, Bertrand Douillard, Chi Pang Lam, Dragomir Anguelov, Benjamin Sapp

    Abstract: Predicting the future behavior of road users is one of the most challenging and important problems in autonomous driving. Applying deep learning to this problem requires fusing heterogeneous world state in the form of rich perception signals and map information, and inferring highly multi-modal distributions over possible futures. In this paper, we present MultiPath++, a future prediction model th…

    Submitted 21 December, 2021; v1 submitted 29 November, 2021; originally announced November 2021.

  12. arXiv:2108.06384  [pdf, other]

    cs.CV

    Finding Representative Interpretations on Convolutional Neural Networks

    Authors: Peter Cho-Ho Lam, Lingyang Chu, Maxim Torgonskiy, Jian Pei, Yong Zhang, Lanjun Wang

    Abstract: Interpreting the decision logic behind effective deep convolutional neural networks (CNN) on images complements the success of deep learning models. However, the existing methods can only interpret some specific decision logic on individual or a small number of images. To facilitate human understandability and generalization ability, it is important to develop representative interpretations that i…

    Submitted 20 August, 2021; v1 submitted 13 August, 2021; originally announced August 2021.

    Comments: Accepted by ICCV 2021 (http://iccv2021.thecvf.com/home) | A Python notebook of the proposed method can be found at https://marketplace.huaweicloud.com/markets/aihub/notebook/detail/?id=8e92ea6c-2f4a-4cff-a89f-bf2905bb7ac0

  13. arXiv:2107.04086  [pdf, other]

    cs.LG cs.AI

    Robust Counterfactual Explanations on Graph Neural Networks

    Authors: Mohit Bajaj, Lingyang Chu, Zi Yu Xue, Jian Pei, Lanjun Wang, Peter Cho-Ho Lam, Yong Zhang

    Abstract: Massive deployment of Graph Neural Networks (GNNs) in high-stakes applications generates a strong demand for explanations that are robust to noise and align well with human intuition. Most existing methods generate explanations by identifying a subgraph of an input graph that has a strong correlation with the prediction. These explanations are not robust to noise because independently optimizing th…

    Submitted 12 July, 2022; v1 submitted 8 July, 2021; originally announced July 2021.

  14. arXiv:2105.02866  [pdf, other]

    q-bio.QM cs.CR cs.LG eess.IV

    Membership Inference Attacks on Deep Regression Models for Neuroimaging

    Authors: Umang Gupta, Dimitris Stripelis, Pradeep K. Lam, Paul M. Thompson, José Luis Ambite, Greg Ver Steeg

    Abstract: Ensuring the privacy of research participants is vital, even more so in healthcare environments. Deep learning approaches to neuroimaging require large datasets, and this often necessitates sharing data between multiple sites, which is antithetical to the privacy objectives. Federated learning is a commonly proposed solution to this problem. It circumvents the need for data sharing by sharing para…

    Submitted 3 June, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

    Comments: To appear at Medical Imaging with Deep Learning 2021 (MIDL 2021)

  15. arXiv:2102.08440  [pdf, other]

    cs.LG cs.DC

    Scaling Neuroscience Research using Federated Learning

    Authors: Dimitris Stripelis, Jose Luis Ambite, Pradeep Lam, Paul Thompson

    Abstract: The amount of biomedical data continues to grow rapidly. However, the ability to analyze these data is limited due to privacy and regulatory concerns. Machine learning approaches that require data to be copied to a single location are hampered by the challenges of data sharing. Federated Learning is a promising approach to learn a joint model over data silos. This architecture does not share any s…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: To appear at IEEE International Symposium on Biomedical Imaging 2021 (ISBI 2021)

    MSC Class: 68T07 ACM Class: I.5.4

  16. arXiv:2102.04438  [pdf, other]

    eess.IV cs.LG q-bio.QM

    Improved Brain Age Estimation with Slice-based Set Networks

    Authors: Umang Gupta, Pradeep K. Lam, Greg Ver Steeg, Paul M. Thompson

    Abstract: Deep Learning for neuroimaging data is a promising but challenging direction. The high dimensionality of 3D MRI scans makes this endeavor compute- and data-intensive. Most conventional 3D neuroimaging methods use 3D-CNN-based architectures with a large number of parameters and require more time and data to train. Recently, 2D-slice-based models have received increasing attention as they have fewer…

    Submitted 9 February, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

    Comments: To appear at IEEE International Symposium on Biomedical Imaging 2021 (ISBI 2021). Code is available at https://git.io/JtazG

  17. Putting the Semantics into Semantic Versioning

    Authors: Patrick Lam, Jens Dietrich, David J. Pearce

    Abstract: The long-standing aspiration for software reuse has made astonishing strides in the past few years. Many modern software development ecosystems now come with rich sets of publicly-available components contributed by the community. Downstream developers can leverage these upstream components, boosting their productivity. However, components evolve at their own pace. This imposes obligations on an…

    Submitted 16 August, 2020; originally announced August 2020.

    Comments: to be published as Onward! Essays 2020

    ACM Class: D.2.4; D.2.7; D.2.9; D.2.12; D.2.13

  18. arXiv:1912.07224  [pdf, other]

    eess.IV cs.CV

    Domain Knowledge Based Brain Tumor Segmentation and Overall Survival Prediction

    Authors: Xiaoqing Guo, Chen Yang, Pak Lun Lam, Peter Y. M. Woo, Yixuan Yuan

    Abstract: Automatically segmenting sub-regions of gliomas (necrosis, edema and enhancing tumor) and accurately predicting overall survival (OS) time from multimodal MRI sequences have important clinical significance in the diagnosis, prognosis and treatment of gliomas. However, due to the high degree of variation in heterogeneous appearance and individual physical state, the segmentation of sub-regions and OS pre…

    Submitted 16 December, 2019; originally announced December 2019.

    Comments: 11 pages, 5 figures, BrainLes 2019

  19. arXiv:1905.02342  [pdf, other]

    cs.LG cs.CR quant-ph stat.ML

    Machine Learning Cryptanalysis of a Quantum Random Number Generator

    Authors: Nhan Duy Truong, Jing Yan Haw, Syed Muhamad Assad, Ping Koy Lam, Omid Kavehei

    Abstract: Random number generators (RNGs) that are crucial for cryptographic applications have been the subject of adversarial attacks. These attacks exploit environmental information to predict generated random numbers that are supposed to be truly random and unpredictable. Though quantum random number generators (QRNGs) are based on the intrinsic indeterministic nature of quantum properties, the presence…

    Submitted 12 May, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

    Comments: Accepted for publication in IEEE Transactions on Information Forensics and Security. Related code is at https://github.com/Nano-Neuro-Research-Lab/Machine-Learning-Cryptanalysis-of-a-Quantum-Random-Number-Generator

  20. arXiv:1902.11114  [pdf]

    cs.CV

    Imaging and Classification Techniques for Seagrass Mapping and Monitoring: A Comprehensive Survey

    Authors: Md Moniruzzaman, S. M. Shamsul Islam, Paul Lavery, Mohammed Bennamoun, C. Peng Lam

    Abstract: Monitoring underwater habitats is a vital part of observing the condition of the environment. The detection and mapping of underwater vegetation, especially seagrass, has drawn the attention of the research community since as early as the 1980s. Initially, this monitoring relied on in situ observation by experts. Later, advances in remote-sensing technology, satellite-monitoring techniques an…

    Submitted 1 March, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

    Comments: 36 pages, 14 figures, 8 tables

  21. arXiv:1512.08413  [pdf, ps, other]

    cs.CV

    Outlier Detection In Large-scale Traffic Data By Naïve Bayes Method and Gaussian Mixture Model Method

    Authors: Philip Lam, Lili Wang, Henry Y. T. Ngan, Nelson H. C. Yung, Anthony G. O. Yeh

    Abstract: Detecting outliers in traffic data is valuable for traffic management. However, distinguishing outliers in a large-scale database is a massive task for people. In this paper, we present two methods, the Kernel Smoothing Naïve Bayes (NB) method and the Gaussian Mixture Model (GMM) method, to automatically detect any hardware errors as well as abnormal traffic events in traffic data collected at a…

    Submitted 28 December, 2015; originally announced December 2015.

    Comments: 6 pages, 5 figures

  22. arXiv:0704.2475  [pdf]

    cs.IT

    Physical Layer Network Coding

    Authors: Zhang Shengli, Soung-Chang Liew, Patrick P. K. Lam

    Abstract: A main distinguishing feature of a wireless network compared with a wired network is its broadcast nature, in which the signal transmitted by a node may reach several other nodes, and a node may receive signals from several other nodes simultaneously. Rather than a blessing, this feature is treated more as an interference-inducing nuisance in most wireless networks today (e.g., IEEE 802.11). Thi…

    Submitted 19 April, 2007; originally announced April 2007.

  23. arXiv:cs/0408013  [pdf, ps, other]

    cs.PL cs.SE

    Roles Are Really Great!

    Authors: Viktor Kuncak, Patrick Lam, Martin Rinard

    Abstract: We present a new role system for specifying changing referencing relationships of heap objects. The role of an object depends, in large part, on its aliasing relationships with other objects, with the role of each object changing as its aliasing relationships change. Roles therefore capture important object and data structure properties and provide useful information about how the actions of the…

    Submitted 4 August, 2004; originally announced August 2004.

    Comments: 29 pages. A version appeared in POPL 2002

    Report number: MIT CSAIL 822 ACM Class: D.2.4; D.3.1; D.3.3; F.3.1; F.3.2