Search | arXiv e-print repository

MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models

Authors: Lionel Z. Wang, Yiming Ma, Renfei Gao, Beichen Guo, Zhuoran Li, Han Zhu, Wenqi Fan, Zexin Lu, Ka Chung Ng

Abstract: The advent of large language models (LLMs) has revolutionized online content creation, making it much easier to generate high-quality fake news. This misuse threatens the integrity of our digital environment and ethical standards. Therefore, understanding the motivations and mechanisms behind LLM-generated fake news is crucial. In this study, we analyze the creation of fake news from a social psychology perspective and develop a comprehensive LLM-based theoretical framework, LLM-Fake Theory. We introduce a novel pipeline that automates the generation of fake news using LLMs, thereby eliminating the need for manual annotation. Utilizing this pipeline, we create a theoretically informed Machine-generated Fake news dataset, MegaFake, derived from the GossipCop dataset. We conduct comprehensive analyses to evaluate our MegaFake dataset. We believe that our dataset and insights will provide valuable contributions to future research focused on the detection and governance of fake news in the era of LLMs.

Submitted 19 August, 2024; originally announced August 2024.

arXiv:2408.02709 [pdf, other]

Enhancing Medical Learning and Reasoning Systems: A Boxology-Based Comparative Analysis of Design Patterns

Authors: Chi Him Ng

Abstract: This study analyzes hybrid AI systems' design patterns and their effectiveness in clinical decision-making using the boxology framework. It categorizes and compares various architectures combining machine learning and rule-based reasoning to provide insights into their structural foundations and healthcare applications. Addressing two main questions, how to categorize these systems against established design patterns and how to extract insights through comparative analysis, the study uses design patterns from software engineering to understand and optimize healthcare AI systems. Boxology helps identify commonalities and create reusable solutions, enhancing these systems' scalability, reliability, and performance. Five primary architectures are examined: REML, MLRB, RBML, RMLT, and PERML. Each has unique strengths and weaknesses, highlighting the need for tailored approaches in clinical tasks. REML excels in high-accuracy prediction for datasets with limited data; MLRB in handling large datasets and complex data integration; RBML in explainability and trustworthiness; RMLT in managing high-dimensional data; and PERML, though limited in analysis, shows promise in urgent care scenarios. The study introduces four new patterns, creates five abstract categorization patterns, and refines those five further to specific systems. These contributions enhance Boxology's taxonomical organization and offer novel approaches to integrating expert knowledge with machine learning.
Boxology's structured, modular approach offers significant advantages in developing and analyzing hybrid AI systems, revealing commonalities, and promoting reusable solutions. In conclusion, this study underscores hybrid AI systems' crucial role in advancing healthcare and Boxology's potential to drive further innovation in AI integration, ultimately improving clinical decision support and patient outcomes.

Submitted 5 August, 2024; originally announced August 2024.

arXiv:2407.11773 [pdf, other]

Educational Personalized Learning Path Planning with Large Language Models

Authors: Chee Ng, Yuen Fung

Abstract: Educational Personalized Learning Path Planning (PLPP) aims to tailor learning experiences to individual learners' needs, enhancing learning efficiency and engagement. Despite its potential, traditional PLPP systems often lack adaptability, interactivity, and transparency. This paper proposes a novel approach integrating Large Language Models (LLMs) with prompt engineering to address these challenges. By designing prompts that incorporate learner-specific information, our method guides LLMs like LLama-2-70B and GPT-4 to generate personalized, coherent, and pedagogically sound learning paths. We conducted experiments comparing our method with a baseline approach across various metrics, including accuracy, user satisfaction, and the quality of learning paths. The results show significant improvements in all areas, particularly with GPT-4, demonstrating the effectiveness of prompt engineering in enhancing PLPP. Additional long-term impact analysis further validates our method's potential to improve learner performance and retention. This research highlights the promise of LLMs and prompt engineering in advancing personalized education.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 6 pages

arXiv:2406.13434 [pdf, other]

Tactile Aware Dynamic Obstacle Avoidance in Crowded Environment with Deep Reinforcement Learning

Authors: Yung Chuen Ng, Qi Wen Lim, Chun Ye Tan, Zhen Hao Gan, Meng Yee Chuah

Abstract: Mobile robots operating in crowded environments require the ability to navigate among humans and surrounding obstacles efficiently while adhering to safety standards and socially compliant mannerisms. This scale of the robot navigation problem may be classified as both a local path planning and trajectory optimization problem. This work presents an array of force sensors that act as a tactile layer to complement the use of a LiDAR for the purpose of inducing awareness of contact with any surrounding objects within the immediate vicinity of a mobile robot undetected by LiDARs. By incorporating the tactile layer, the robot can take more risks in its movements and possibly go right up to an obstacle or wall, and gently squeeze past it. In addition, we built up a simulation platform via Pybullet which integrates Robot Operating System (ROS) and reinforcement learning (RL) together. A touch-aware neural network model was trained on it to create an RL-based local path planner for dynamic obstacle avoidance. Our proposed method was demonstrated successfully on an omni-directional mobile robot that was able to navigate in a crowded environment with high agility and versatility in movement, while not being overly sensitive to nearby obstacles-not-in-contact.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2405.11622 [pdf, other]

Continuous Predictive Modeling of Clinical Notes and ICD Codes in Patient Health Records

Authors: Mireia Hernandez Caralt, Clarence Boon Liang Ng, Marek Rei

Abstract: Electronic Health Records (EHR) serve as a valuable source of patient information, offering insights into medical histories, treatments, and outcomes. Previous research has developed systems for detecting applicable ICD codes that should be assigned while writing a given EHR document, mainly focusing on discharge summaries written at the end of a hospital stay. In this work, we investigate the potential of predicting these codes for the whole patient stay at different time points during their stay, even before they are officially assigned by clinicians. The development of methods to predict diagnoses and treatments earlier in advance could open opportunities for predictive medicine, such as identifying disease risks sooner, suggesting treatments, and optimizing resource allocation. Our experiments show that predictions regarding final ICD codes can be made already two days after admission and we propose a custom model that improves performance on this early prediction task.

Submitted 5 July, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

ACM Class: I.2.7; J.3

arXiv:2405.04165 [pdf, other]

LingML: Linguistic-Informed Machine Learning for Enhanced Fake News Detection

Authors: Jasraj Singh, Fang Liu, Hong Xu, Bee Chin Ng, Wei Zhang

Abstract: Nowadays, information spreads at an unprecedented pace in social media and discerning truth from misinformation and fake news has become an acute societal challenge. Machine learning (ML) models have been employed to identify fake news but are far from perfect with challenging problems like limited accuracy, interpretability, and generalizability. In this paper, we enhance ML-based solutions with linguistics input and we propose LingML, linguistic-informed ML, for fake news detection. We conducted an experimental study with a popular dataset on fake news during the pandemic. The experiment results show that our proposed solution is highly effective. There are fewer than two errors out of every ten attempts with only linguistic input used in ML and the knowledge is highly explainable. When linguistics input is integrated with advanced large-scale ML models for natural language processing, our solution outperforms existing ones with 1.8% average error rate. LingML creates a new path with linguistics to push the frontier of effective and efficient fake news detection. It also sheds light on real-world multi-disciplinary applications requiring both ML and domain expertise to achieve optimal performance.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 7 pages

arXiv:2405.01842 [pdf, ps, other]

SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore

Authors: Ri Chi Ng, Nirmalendu Prakash, Ming Shan Hee, Kenny Tsu Wei Choo, Roy Ka-Wei Lee

Abstract: To address the limitations of current hate speech detection models, we introduce SGHateCheck, a novel framework designed for the linguistic and cultural context of Singapore and Southeast Asia. It extends the functional testing approach of HateCheck and MHC, employing large language models for translation and paraphrasing into Singapore's main languages, and refining these with native annotators. SGHateCheck reveals critical flaws in state-of-the-art models, highlighting their inadequacy in sensitive content moderation. This work aims to foster the development of more effective hate speech detection tools for diverse linguistic environments, particularly for Singapore and Southeast Asia contexts.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.14135 [pdf, other]

Text in the Dark: Extremely Low-Light Text Image Enhancement

Authors: Che-Tsung Lin, Chun Chet Ng, Zhi Qin Tan, Wan Jun Nah, Xinyu Wang, Jie Long Kew, Pohao Hsu, Shang Hong Lai, Chee Seng Chan, Christopher Zach

Abstract: Extremely low-light text images are common in natural scenes, making scene text detection and recognition challenging. One solution is to enhance these images using low-light image enhancement methods before text extraction. However, previous methods often do not try to particularly address the significance of low-level features, which are crucial for optimal performance on downstream scene text tasks. Further research is also hindered by the lack of extremely low-light text datasets. To address these limitations, we propose a novel encoder-decoder framework with an edge-aware attention module to focus on scene text regions during enhancement. Our proposed method uses novel text detection and edge reconstruction losses to emphasize low-level scene text features, leading to successful text extraction. Additionally, we present a Supervised Deep Curve Estimation (Supervised-DCE) model to synthesize extremely low-light images based on publicly available scene text datasets such as ICDAR15 (IC15). We also labeled texts in the extremely low-light See In the Dark (SID) and ordinary LOw-Light (LOL) datasets to allow for objective assessment of extremely low-light image enhancement through scene text tasks. Extensive experiments show that our model outperforms state-of-the-art methods in terms of both image quality and scene text metrics on the widely-used LOL, SID, and synthetic IC15 datasets. Code and dataset will be released publicly at https://github.com/chunchet-ng/Text-in-the-Dark.

Submitted 22 April, 2024; originally announced April 2024.

Comments: The first two authors contributed equally to this work

arXiv:2404.06224 [pdf, other]

Low-Cost Generation and Evaluation of Dictionary Example Sentences

Authors: Bill Cai, Clarence Boon Liang Ng, Daniel Tan, Shelvia Hotama

Abstract: Dictionary example sentences play an important role in illustrating word definitions and usage, but manually creating quality sentences is challenging. Prior works have demonstrated that language models can be trained to generate example sentences. However, they relied on costly customized models and word sense datasets for generation and evaluation of their work. Rapid advancements in foundational models present the opportunity to create low-cost, zero-shot methods for the generation and evaluation of dictionary example sentences. We introduce a new automatic evaluation metric called OxfordEval that measures the win-rate of generated sentences against existing Oxford Dictionary sentences. OxfordEval shows high alignment with human judgments, enabling large-scale automated quality evaluation. We experiment with various LLMs and configurations to generate dictionary sentences across word classes. We complement this with a novel approach of using masked language models to identify and select sentences that best exemplify word meaning. The eventual model, FM-MLM, achieves over 85.1% win rate against Oxford baseline sentences according to OxfordEval, compared to 39.8% win rate for prior model-generated sentences.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2312.11560 [pdf, other]

Learning from Emergence: A Study on Proactively Inhibiting the Monosemantic Neurons of Artificial Neural Networks

Authors: Jiachuan Wang, Shimin Di, Lei Chen, Charles Wang Wai Ng

Abstract: Recently, emergence has received widespread attention from the research community along with the success of large-scale models. Different from the literature, we hypothesize a key factor that promotes the performance during the increase of scale: the reduction of monosemantic neurons that can only form one-to-one correlations with specific features. Monosemantic neurons tend to be sparser and have negative impacts on the performance in large models. Inspired by this insight, we propose an intuitive idea to identify monosemantic neurons and inhibit them. However, achieving this goal is a non-trivial task as there is no unified quantitative evaluation metric and simply banning monosemantic neurons does not promote polysemanticity in neural networks. Therefore, we first propose a new metric to measure the monosemanticity of neurons with the guarantee of efficiency for online computation, then introduce a theoretically supported method to suppress monosemantic neurons and proactively promote the ratios of polysemantic neurons in training neural networks. We validate our conjecture that monosemanticity brings about performance change at different model scales on a variety of neural networks and benchmark datasets in different areas, including language, image, and physics simulation tasks. Further experiments validate our analysis and theory regarding the inhibition of monosemanticity.

Submitted 19 June, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

Comments: 16 pages, 5 figures, KDD2024

arXiv:2311.15530 [pdf, other]

doi 10.1145/3589321

SSIN: Self-Supervised Learning for Rainfall Spatial Interpolation

Authors: Jia Li, Yanyan Shen, Lei Chen, Charles Wang Wai Ng

Abstract: The acquisition of accurate rainfall distribution in space is an important task in hydrological analysis and natural disaster pre-warning. However, it is impossible to install rain gauges on every corner. Spatial interpolation is a common way to infer rainfall distribution based on available raingauge data. However, the existing works rely on some unrealistic pre-settings to capture spatial correlations, which limits their performance in real scenarios. To tackle this issue, we propose the SSIN, which is a novel data-driven self-supervised learning framework for rainfall spatial interpolation by mining latent spatial patterns from historical observation data. Inspired by the Cloze task and BERT, we fully consider the characteristics of spatial interpolation and design the SpaFormer model based on the Transformer architecture as the core of SSIN. Our main idea is: by constructing rich self-supervision signals via random masking, SpaFormer can learn informative embeddings for raw data and then adaptively model spatial correlations based on rainfall spatial context. Extensive experiments on two real-world raingauge datasets show that our method outperforms the state-of-the-art solutions. In addition, we take traffic spatial interpolation as another use case to further explore the performance of our method, and SpaFormer achieves the best performance on one large real-world traffic dataset, which further confirms the effectiveness and generality of our method.

Submitted 26 November, 2023; originally announced November 2023.

Comments: SIGMOD 2023 Data-intensive Applications (DIA) Track; Code is available at https://github.com/jlidw/SSIN

arXiv:2302.12666 [pdf, other]

Modelling Temporal Document Sequences for Clinical ICD Coding

Authors: Clarence Boon Liang Ng, Diogo Santos, Marek Rei

Abstract: Past studies on the ICD coding problem focus on predicting clinical codes primarily based on the discharge summary. This covers only a small fraction of the notes generated during each hospital stay and leaves potential for improving performance by analysing all the available clinical notes. We propose a hierarchical transformer architecture that uses text across the entire sequence of clinical notes in each hospital stay for ICD coding, and incorporates embeddings for text metadata such as their position, time, and type of note. While using all clinical notes increases the quantity of data substantially, superconvergence can be used to reduce training costs. We evaluate the model on the MIMIC-III dataset. Our model exceeds the prior state-of-the-art when using only discharge summaries as input, and achieves further performance improvements when all clinical notes are used as input.

Submitted 24 February, 2023; originally announced February 2023.

arXiv:2301.12670 [pdf, other]

doi 10.1038/s41550-022-01872-z

A deep-learning search for technosignatures of 820 nearby stars

Authors: Peter Xiangyuan Ma, Cherry Ng, Leandro Rizk, Steve Croft, Andrew P. V. Siemion, Bryan Brzycki, Daniel Czech, Jamie Drew, Vishal Gajjar, John Hoang, Howard Isaacson, Matt Lebofsky, David MacMahon, Imke de Pater, Danny C. Price, Sofia Z. Sheikh, S. Pete Worden

Abstract: The goal of the Search for Extraterrestrial Intelligence (SETI) is to quantify the prevalence of technological life beyond Earth via their "technosignatures". One theorized technosignature is narrowband Doppler drifting radio signals. The principal challenge in conducting SETI in the radio domain is developing a generalized technique to reject human radio frequency interference (RFI). Here, we present the most comprehensive deep-learning based technosignature search to date, returning 8 promising ETI signals of interest for re-observation as part of the Breakthrough Listen initiative. The search comprises 820 unique targets observed with the Robert C. Byrd Green Bank Telescope, totaling over 480 hr of on-sky data. We implement a novel beta-Convolutional Variational Autoencoder to identify technosignature candidates in a semi-unsupervised manner while keeping the false positive rate manageably low. This new approach presents itself as a leading solution in accelerating SETI and other transient research into the age of data-driven astronomy.

Submitted 30 January, 2023; originally announced January 2023.

Comments: 10 pages of main paper followed by 16 pages of methods; 17 figures total and 7 tables; published in Nature Astronomy

arXiv:2212.00581 [pdf]

An enhanced simulation-based multi-objective optimization approach with knowledge discovery for reconfigurable manufacturing systems

Authors: Carlos Alberto Barrera-Diaz, Amir Nourmohammdi, Henrik Smedberg, Tehseen Aslam, Amos H. C. Ng

Abstract: In today's uncertain and competitive market, where enterprises are subjected to increasingly shortened product life-cycles and frequent volume changes, reconfigurable manufacturing systems (RMS) applications play a significant role in the manufacturing industry's success. Despite the advantages offered by RMS, achieving a high-efficiency degree constitutes a challenging task for stakeholders and decision-makers when they face the trade-off decisions inherent in these complex systems. This study addresses work tasks and resource allocations to workstations together with buffer capacity allocation in RMS. The aim is to simultaneously maximize throughput and minimize total buffer capacity under fluctuating production volumes and capacity changes while considering the stochastic behavior of the system. An enhanced simulation-based multi-objective optimization (SMO) approach with customized simulation and optimization components is proposed to address the abovementioned challenges. Apart from presenting the optimal solutions subject to volume and capacity changes, the proposed approach supports decision-makers with discovered knowledge to further understand the RMS design. In particular, this study presents a problem-specific customized SMO combined with a novel flexible pattern mining method for optimizing RMS and conducting post-optimal analyses. To this extent, this study demonstrates the benefits of applying SMO and knowledge discovery methods for fast decision-support and production planning of RMS.

Submitted 30 November, 2022; originally announced December 2022.

arXiv:2211.15322 [pdf, other]

Transductive Kernels for Gaussian Processes on Graphs

Authors: Yin-Cong Zhi, Felix L. Opolka, Yin Cheng Ng, Pietro Liò, Xiaowen Dong

Abstract: Kernels on graphs have had limited options for node-level problems. To address this, we present a novel, generalized kernel for graphs with node feature data for semi-supervised learning. The kernel is derived from a regularization framework by treating the graph and feature data as two Hilbert spaces. We also show how numerous kernel-based models on graphs are instances of our design. A kernel defined this way has transductive properties, and this leads to improved ability to learn on fewer training points, as well as better handling of highly non-Euclidean data. We demonstrate these advantages using synthetic data where the distribution of the whole graph can inform the pattern of the labels. Finally, by utilizing a flexible polynomial of the graph Laplacian within the kernel, the model also performed effectively in semi-supervised classification on graphs of various levels of homophily.

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2211.03057 [pdf, other]

Towards Green Metaverse Networking Technologies, Advancements and Future Directions

Authors: Siyue Zhang, Wei Yang Bryan Lim, Wei Chong Ng, Zehui Xiong, Dusit Niyato, Xuemin Sherman Shen, Chunyan Miao

Abstract: As the Metaverse is iteratively being defined, its potential to unleash the next wave of digital disruption and create real-life value becomes increasingly clear. With distinctive features of immersive experience, simultaneous interactivity, and user agency, the Metaverse has the capability to transform all walks of life. However, the enabling technologies of the Metaverse, i.e., digital twin, artificial intelligence, blockchain, and extended reality, are known to be energy-hungry, therefore raising concerns about the sustainability of its large-scale deployment and development. This article proposes Green Metaverse Networking for the first time to optimize energy efficiencies of all network components for Metaverse sustainable development. We first analyze energy consumption, efficiency, and sustainability of energy-intensive technologies in the Metaverse. Next, focusing on computation and networking, we present major advancements related to energy efficiency and their integration into the Metaverse. A case study of energy conservation by incorporating semantic communication and stochastic resource allocation in the Metaverse is presented. Finally, we outline the critical challenges of Metaverse sustainable development, thereby indicating potential directions of future research towards the green Metaverse.

Submitted 13 April, 2023; v1 submitted 6 November, 2022; originally announced November 2022.

arXiv:2209.09508 [pdf, other]

Real-time Digital Double Framework to Predict Collapsible Terrains for Legged Robots

Authors: Garen Haddeler, Hari P. Palanivelu, Yung Chuen Ng, Fabien Colonnier, Albertus H. Adiwahono, Zhibin Li, Chee-Meng Chew, Meng Yee Chuah

Abstract: Inspired by the digital twinning systems, a novel real-time digital double framework is developed to enhance robot perception of the terrain conditions. Based on the very same physical model and motion control, this work exploits the use of such simulated digital double synchronized with a real robot to capture and extract discrepancy information between the two systems, which provides high dimensional cues in multiple physical quantities to represent differences between the modelled and the real world. Soft, non-rigid terrains cause common failures in legged locomotion, whereby visual perception alone is insufficient in estimating such physical properties of terrains. We used digital double to develop the estimation of the collapsibility, which addressed this issue through physical interactions during dynamic walking. The discrepancy in sensory measurements between the real robot and its digital double is used as input of a learning-based algorithm for terrain collapsibility analysis. Although trained only in simulation, the learned model can perform collapsibility estimation successfully in both simulation and real world. Our evaluation of results showed the generalization to different scenarios and the advantages of the digital double to reliably detect nuances in ground conditions.

Submitted 20 September, 2022; originally announced September 2022.

Comments: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Preprint version. Accepted June 2022

arXiv:2208.14661 [pdf, other]

Stochastic Resource Allocation for Semantic Communication-aided Virtual Transportation Networks in the Metaverse

Authors: Wei Chong Ng, Hongyang Du, Wei Yang Bryan Lim, Zehui Xiong, Dusit Niyato, Chunyan Miao

Abstract: The physical-virtual world synchronization to develop the Metaverse will require a massive transmission and exchange of data. In this paper, we introduce semantic communication for the development of virtual transportation networks in the Metaverse. Leveraging the perception capabilities of edge devices, virtual service providers (VSPs) can subscribe to their preferred edge devices to receive the semantic data of interest. However, the demands of the VSPs are highly dependent on the users that they are serving. To address the resource allocation problem amid stochastic user demand, we propose a stochastic semantic transmission scheme (SSTS) based on two-stage stochastic integer programming. Using real data captured by edge devices we deploy in Singapore, the simulation results show that SSTS can minimize the transmission cost of the VSPs while accounting for the users' demand uncertainties.

Submitted 31 August, 2022; originally announced August 2022.

Comments: 6 pages, 5 figures and 3 tables

arXiv:2204.03724 [pdf, other]

A Kernel Method to Nonlinear Location Estimation with RSS-based Fingerprint

Authors: Pai Chet Ng, Petros Spachos, James She, Konstantinos N. Plataniotis

Abstract: This paper presents a nonlinear location estimation method to infer the position of a user holding a smartphone. We consider a large location with $M$ grid points, each labeled with a unique fingerprint consisting of the received signal strength (RSS) values measured from $N$ Bluetooth Low Energy (BLE) beacons. Given the fingerprint observed by the smartphone, the user's current location can be estimated by finding the top-k similar fingerprints from the list of fingerprints registered in the database. Besides environmental factors, the dynamicity in holding the smartphone is another source of variation in fingerprint measurements, yet few studies address the fingerprint variability due to dynamic smartphone positions held by human hands during online detection. To this end, we propose a nonlinear location estimation using the kernel method. Specifically, our proposed method comprises two steps: 1) a beacon selection strategy to select a subset of beacons that is insensitive to subtle changes in holding position, and 2) a kernel method to compute the similarity between this subset of observed signals and all the fingerprints registered in the database. The experimental results based on large-scale data collected in a complex building indicate a substantial performance gain of our proposed approach in comparison to state-of-the-art methods. The dataset consisting of the signal information collected from the beacons is available online.

Submitted 7 April, 2022; originally announced April 2022.

arXiv:2204.03195 [pdf, other]

3D Perception based Imitation Learning under Limited Demonstration for Laparoscope Control in Robotic Surgery

Authors: Bin Li, Ruofeng Wei, Jiaqi Xu, Bo Lu, Chi-Hang Yee, Chi-Fai Ng, Pheng-Ann Heng, Qi Dou, Yun-Hui Liu

Abstract: Automatic laparoscope motion control is fundamentally important for surgeons to efficiently perform operations. However, traditional control methods based on tool tracking, which do not consider information hidden in surgical scenes, are not intelligent enough, while the latest supervised imitation learning (IL)-based methods require expensive sensor data and suffer from distribution mismatch issues caused by limited demonstrations. In this paper, we propose a novel Imitation Learning framework for Laparoscope Control (ILLC) with reinforcement learning (RL), which can efficiently learn the control policy from limited surgical video clips. Specifically, we first extract surgical laparoscope trajectories from unlabeled videos as the demonstrations and reconstruct the corresponding surgical scenes. To fully learn from limited motion trajectory demonstrations, we propose Shape Preserving Trajectory Augmentation (SPTA) to augment these data, and build a simulation environment that supports parallel RGB-D rendering to reinforce the RL policy for interacting with the environment efficiently. With adversarial training for IL, we obtain the laparoscope control policy based on the generated rollouts and surgical demonstrations. Extensive experiments are conducted in unseen reconstructed surgical scenes, and our method outperforms the previous IL methods, which proves the feasibility of our unified learning-based framework for laparoscope control.

Submitted 7 April, 2022; originally announced April 2022.

Comments: 7 pages, 7 figures, 2022 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2204.00630 [pdf, other]

Extremely Low-light Image Enhancement with Scene Text Restoration

Authors: Pohao Hsu, Che-Tsung Lin, Chun Chet Ng, Jie-Long Kew, Mei Yih Tan, Shang-Hong Lai, Chee Seng Chan, Christopher Zach

Abstract: Deep learning-based methods have made impressive progress in enhancing extremely low-light images - the image quality of the reconstructed images has generally improved. However, we found that most of these methods could not sufficiently recover the image details, for instance, the texts in the scene. In this paper, a novel image enhancement framework is proposed to precisely restore the scene texts, as well as the overall quality of the image, simultaneously under extremely low-light conditions. Mainly, we employed a self-regularised attention map, an edge map, and a novel text detection loss. In addition, leveraging synthetic low-light images is beneficial for image enhancement on the genuine ones in terms of text detection. The quantitative and qualitative experimental results have shown that the proposed model outperforms state-of-the-art methods in image restoration, text detection, and text spotting on See In the Dark and ICDAR15 datasets.

Submitted 1 April, 2022; originally announced April 2022.

arXiv:2203.15405 [pdf, other]

Automatic Detection of Speech Sound Disorder in Child Speech Using Posterior-based Speaker Representations

Authors: Si-Ioi Ng, Cymie Wing-Yee Ng, Jiarui Wang, Tan Lee

Abstract: This paper presents a macroscopic approach to automatic detection of speech sound disorder (SSD) in child speech. Typically, SSD is manifested by persistent articulation and phonological errors on specific phonemes in the language. The disorder can be detected by focally analyzing the phonemes or the words elicited by the child subject. In the present study, instead of attempting to detect individual phone- and word-level errors, we propose to extract a subject-level representation from a long utterance that is constructed by concatenating multiple test words. The speaker verification approach, and posterior features generated by deep neural network models, are applied to derive various types of holistic representations. A linear classifier is trained to differentiate disordered speech from normal speech. On the task of detecting SSD in Cantonese-speaking children, experimental results show that the proposed approach achieves improved detection performance over the previous method that requires fusing phone-level detection results. Using articulatory posterior features to derive i-vectors from multiple-word utterances achieves an unweighted average recall of 78.2% and a macro F1 score of 78.0%.

Submitted 29 June, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

Comments: Accepted to Interspeech 2022

arXiv:2203.05471 [pdf, other]

A Full Dive into Realizing the Edge-enabled Metaverse: Visions, Enabling Technologies, and Challenges

Authors: Minrui Xu, Wei Chong Ng, Wei Yang Bryan Lim, Jiawen Kang, Zehui Xiong, Dusit Niyato, Qiang Yang, Xuemin Sherman Shen, Chunyan Miao

Abstract: Dubbed "the successor to the mobile Internet", the concept of the Metaverse has grown in popularity. While there exist lite versions of the Metaverse today, they are still far from realizing the full vision of an immersive, embodied, and interoperable Metaverse. Without addressing the issues of implementation from the communication and networking, as well as computation perspectives, the Metaverse will struggle to succeed the Internet, especially in terms of its accessibility to billions of users today. In this survey, we focus on the edge-enabled Metaverse to realize its ultimate vision. We first provide readers with a succinct tutorial of the Metaverse, an introduction to the architecture, as well as current developments. To enable ubiquitous, seamless, and embodied access to the Metaverse, we discuss the communication and networking challenges and survey cutting-edge solutions and concepts that leverage next-generation communication systems for users to immerse as and interact with embodied avatars in the Metaverse. Moreover, given the high computation costs required, e.g., to render 3D virtual worlds and run data-hungry artificial intelligence-driven avatars, we discuss the computation challenges and cloud-edge-end computation framework-driven solutions to realize the Metaverse on resource-constrained edge devices. Next, we explore how blockchain technologies can aid in the interoperable development of the Metaverse, not just in terms of empowering the economic circulation of virtual user-generated content but also to manage physical edge resources in a decentralized, transparent, and immutable manner. Finally, we discuss the future research directions towards realizing the true vision of the edge-enabled Metaverse.

Submitted 20 August, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

arXiv:2202.11697 [pdf, other]

Stochastic Coded Offloading Scheme for Unmanned Aerial Vehicle-Assisted Edge Computing

Authors: Wei Chong Ng, Wei Yang Bryan Lim, Zehui Xiong, Dusit Niyato, Chunyan Miao, Zhu Han, Dong In Kim

Abstract: Unmanned aerial vehicles (UAVs) have gained wide research interest due to their technological advancement and high mobility. The UAVs are equipped with increasingly advanced capabilities to run computationally intensive applications enabled by machine learning techniques. However, because of both energy and computation constraints, the UAVs face issues hovering in the sky while performing computation due to weather uncertainty. To overcome the computation constraints, the UAVs can partially or fully offload their computation tasks to the edge servers. In ordinary computation offloading operations, the UAVs can retrieve the result from the returned output. Nevertheless, if the UAVs are unable to retrieve the entire result from the edge servers, i.e., straggling edge servers, this operation will fail. In this paper, we propose a coded distributed computing approach for computation offloading to mitigate straggling edge servers. The UAVs can retrieve the returned result when the number of returned copies is greater than or equal to the recovery threshold. There is a shortfall if the returned copies are fewer than the recovery threshold. To minimize the cost of the network, energy consumption by the UAVs, and prevent over and under subscription of the resources, we devise a two-phase Stochastic Coded Offloading Scheme (SCOS). In the first phase, the appropriate UAVs are allocated to the charging stations amid weather uncertainty. In the second phase, we use the $z$-stage Stochastic Integer Programming (SIP) to optimize the number of computation subtasks offloaded and computed locally, while taking into account the computation shortfall and demand uncertainty. By using a real dataset, the simulation results show that our proposed scheme is fully dynamic, and minimizes the cost of the network and UAV energy consumption amid stochastic uncertainties.

Submitted 10 February, 2022; originally announced February 2022.

Comments: Accepted by IEEE Internet of Things Journal. 20 pages, 18 figures. arXiv admin note: text overlap with arXiv:2110.14873

arXiv:2110.14873 [pdf, other]

Optimal Stochastic Coded Computation Offloading in Unmanned Aerial Vehicles Network

Authors: Wei Chong Ng, Wei Yang Bryan Lim, Jer Shyuan Ng, Suttinee Sawadsitang, Zehui Xiong, Dusit Niyato

Abstract: Today, modern unmanned aerial vehicles (UAVs) are equipped with increasingly advanced capabilities that can run applications enabled by machine learning techniques, which require computationally intensive operations such as matrix multiplications. Due to computation constraints, the UAVs can offload their computation tasks to edge servers. To mitigate stragglers, coded distributed computing (CDC) based offloading can be adopted. In this paper, we propose an Optimal Task Allocation Scheme (OTAS) based on Stochastic Integer Programming with the objective to minimize energy consumption during computation offloading. The simulation results show that amid uncertainty of task completion, the energy consumption in the UAV network is minimized.

Submitted 28 October, 2021; originally announced October 2021.

Comments: To be published in IEEE Global Communications Conference

arXiv:2110.14325 [pdf, other]

Unified Resource Allocation Framework for the Edge Intelligence-Enabled Metaverse

Authors: Wei Chong Ng, Wei Yang Bryan Lim, Jer Shyuan Ng, Zehui Xiong, Dusit Niyato, Chunyan Miao

Abstract: Dubbed as the next-generation Internet, the metaverse is a virtual world that allows users to interact with each other or objects in real-time using their avatars. The metaverse is envisioned to support novel ecosystems of service provision in an immersive environment brought about by an intersection of the virtual and physical worlds. The native AI systems in the metaverse will personalize user experience over time and shape the experience in a scalable, seamless, and synchronous way. However, the metaverse is characterized by diverse resource types amid a highly dynamic demand environment. In this paper, we propose the case study of virtual education in the metaverse and address the unified resource allocation problem amid stochastic user demand. We propose a stochastic optimal resource allocation scheme (SORAS) based on stochastic integer programming with the objective of minimizing the cost of the virtual service provider. The simulation results show that SORAS can minimize the cost of the virtual service provider while accounting for the users' demand uncertainty.

Submitted 27 October, 2021; originally announced October 2021.

Comments: 6 pages, 10 figures

arXiv:2108.02008 [pdf, other]

Personal Devices for Contact Tracing: Smartphones and Wearables to Fight Covid-19

Authors: Pai Chet Ng, Petros Spachos, Stefano Gregori, Konstantinos Plataniotis

Abstract: Digital contact tracing has emerged as a viable tool supplementing manual contact tracing. To date, more than 100 contact tracing applications have been published to slow down the spread of highly contagious Covid-19. Despite subtle variabilities among these applications, all of them achieve contact tracing by manipulating the following three components: a) use a personal device to identify the user while designing a secure protocol to anonymize the user's identity; b) leverage networking technologies to analyze and store the data; c) exploit rich sensing features on the user device to detect the interaction among users and thus estimate the exposure risk. This paper reviews the current digital contact tracing based on these three components. We focus on two personal devices that are intimate to the user: smartphones and wearables. We discuss the centralized and decentralized networking approaches that are used to facilitate the data flow. Lastly, we investigate the sensing features available on smartphones and wearables to detect the proximity between any two users and present experiments comparing the proximity sensing performance between these two personal devices.

Submitted 2 August, 2021; originally announced August 2021.

Comments: Accepted at the IEEE Communications Magazine

arXiv:2107.05279 [pdf, other]

ICDAR 2021 Competition on Integrated Circuit Text Spotting and Aesthetic Assessment

Authors: Chun Chet Ng, Akmalul Khairi Bin Nazaruddin, Yeong Khang Lee, Xinyu Wang, Yuliang Liu, Chee Seng Chan, Lianwen Jin, Yipeng Sun, Lixin Fan

Abstract: With hundreds of thousands of electronic chip components being manufactured every day, chip manufacturers have seen an increasing demand for a more efficient and effective way of inspecting the quality of printed texts on chip components. The major problem that deters this area of research is the lack of realistic text-on-chips datasets to act as a strong foundation. Hence, a text-on-chips dataset, ICText, is used as the main target for the proposed Robust Reading Challenge on Integrated Circuit Text Spotting and Aesthetic Assessment (RRC-ICText) 2021 to encourage research on this problem. Throughout the entire competition, we have received a total of 233 submissions from 10 unique teams/individuals. Details of the competition and submission results are presented in this report.

Submitted 12 July, 2021; originally announced July 2021.

Comments: Technical report of ICDAR 2021 Competition on Integrated Circuit Text Spotting and Aesthetic Assessment

Journal ref: International Conference on Document Analysis and Recognition (ICDAR) 2021

arXiv:2107.00229 [pdf, other]

E-DSSR: Efficient Dynamic Surgical Scene Reconstruction with Transformer-based Stereoscopic Depth Perception

Authors: Yonghao Long, Zhaoshuo Li, Chi Hang Yee, Chi Fai Ng, Russell H. Taylor, Mathias Unberath, Qi Dou

Abstract: Reconstructing the scene of robotic surgery from the stereo endoscopic video is an important and promising topic in surgical data science, which potentially supports many applications such as surgical visual perception, robotic surgery education and intra-operative context awareness. However, current methods are mostly restricted to reconstructing static anatomy assuming no tissue deformation, tool occlusion and de-occlusion, and camera movement. These assumptions are not always satisfied in minimally invasive robotic surgeries. In this work, we present an efficient reconstruction pipeline for highly dynamic surgical scenes that runs at 28 fps. Specifically, we design a transformer-based stereoscopic depth perception for efficient depth estimation and a light-weight tool segmentor to handle tool occlusion. After that, a dynamic reconstruction algorithm which can estimate the tissue deformation and camera movement, and aggregate the information over time is proposed for surgical scene reconstruction. We evaluate the proposed pipeline on two datasets, the public Hamlyn Centre Endoscopic Video Dataset and our in-house DaVinci robotic surgery dataset. The results demonstrate that our method can recover the scene obstructed by the surgical tool and handle the movement of camera in realistic surgical scenarios effectively at real-time speed.

Submitted 1 July, 2021; originally announced July 2021.

Comments: Accepted to MICCAI 2021

arXiv:2106.08536 [pdf, other]

Detection of Consonant Errors in Disordered Speech Based on Consonant-vowel Segment Embedding

Authors: Si-Ioi Ng, Cymie Wing-Yee Ng, Jingyu Li, Tan Lee

Abstract: Speech sound disorder (SSD) refers to a type of developmental disorder in young children who encounter persistent difficulties in producing certain speech sounds at the expected age. Consonant errors are the major indicator of SSD in clinical assessment. Previous studies on automatic assessment of SSD revealed that detection of speech errors concerning short and transitory consonants is less satisfactory. This paper investigates a neural network based approach to detecting consonant errors in disordered speech using consonant-vowel (CV) diphone segment in comparison to using consonant monophone segment. The underlying assumption is that the vowel part of a CV segment carries important information of co-articulation from the consonant. Speech embeddings are extracted from CV segments by a recurrent neural network model. The similarity scores between the embeddings of the test segment and the reference segments are computed to determine if the test segment is the expected consonant or not. Experimental results show that using CV segments achieves improved performance on detecting speech errors concerning those "difficult" consonants reported in the previous studies.

Submitted 15 June, 2021; originally announced June 2021.

Comments: Accepted to INTERSPEECH 2021

arXiv:2106.00795 [pdf]

Classification of MIMO Equalizers

Authors: Wing Chau Ng, Chuandong Li

Abstract: In this theoretical work, the DSP-perceived channel in optical coherent communications is first simplified, based on which we categorize linear MIMO equalizers into four classes according to their reference locations. The entire channel inverse can be represented by a complex conjugate-dependent system, coinciding with the widely linear equalization theory. Suboptimally removing FO dynamics, relatively static channel inverses parameterized with common device and channel parameters are presented for monitoring or calibration purposes.

Submitted 7 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

Comments: This work was submitted to ECOC 2021. This theoretical paper also explains the principle of the experimental demonstration in Joint Transmitter and Receiver IQ Differential Phase Calibration using a single 4x8 MIMO Equalizer, Proc. Advanced Photonics Congress 2021 (SPPCom), SpTh1D.4

arXiv:2103.12988 [pdf, other]

One to Many: Adaptive Instrument Segmentation via Meta Learning and Dynamic Online Adaptation in Robotic Surgical Video

Authors: Zixu Zhao, Yueming Jin, Bo Lu, Chi-Fai Ng, Qi Dou, Yun-Hui Liu, Pheng-Ann Heng

Abstract: Surgical instrument segmentation in robot-assisted surgery (RAS) - especially that using learning-based models - relies on the assumption that training and testing videos are sampled from the same domain. However, it is impractical and expensive to collect and annotate sufficient data from every new domain. To greatly increase the label efficiency, we explore a new problem, i.e., adaptive instrument segmentation, which is to effectively adapt one source model to new robotic surgical videos from multiple target domains, only given the annotated instruments in the first frame. We propose MDAL, a meta-learning based dynamic online adaptive learning scheme with a two-stage framework to fast adapt the model parameters on the first frame and partial subsequent frames while predicting the results. MDAL learns the general knowledge of instruments and the fast adaptation ability through the video-specific meta-learning paradigm. The added gradient gate excludes the noisy supervision from pseudo masks for dynamic online adaptation on target videos. We demonstrate empirically that MDAL outperforms other state-of-the-art methods on two datasets (including a real-world RAS dataset). The promising performance on ex-vivo scenes also benefits the downstream tasks such as robot-assisted suturing and camera control.

Submitted 24 March, 2021; originally announced March 2021.

Comments: Accepted by ICRA 2021

arXiv:2101.12149 [pdf, other]

doi 10.1016/j.parco.2021.102833

Porting WarpX to GPU-accelerated platforms

Authors: A. Myers, A. Almgren, L. D. Amorim, J. Bell, L. Fedeli, L. Ge, K. Gott, D. P. Grote, M. Hogan, A. Huebl, R. Jambunathan, R. Lehe, C. Ng, M. Rowan, O. Shapoval, M. Thévenet, J. -L. Vay, H. Vincenti, E. Yang, N. Zaïm, W. Zhang, Y. Zhao, E. Zoni

Abstract: WarpX is a general purpose electromagnetic particle-in-cell code that was originally designed to run on many-core CPU architectures. We describe the strategy followed to allow WarpX to use the GPU-accelerated nodes on OLCF's Summit supercomputer, a strategy we believe will extend to the upcoming machines Frontier and Aurora. We summarize the challenges encountered, lessons learned, and give current performance results on a series of relevant benchmark problems.

Submitted 2 September, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

Comments: 11 pages, 5 figures, accepted by Parallel Computing. Minor revisions, results unchanged

Journal ref: Parallel Computing, Volume 108, 2021, 102833

arXiv:2012.14309 [pdf, other]

General Mechanism of Evolution Shared by Proteins and Words

Authors: Li-Min Wang, Hsing-Yi Lai, Sun-Ting Tsai, Chen Siang Ng, Shan-Jyun Wu, Meng-Xue Tsai, Yi-Ching Su, Daw-Wei Wang, Tzay-Ming Hong

Abstract: Complex systems, such as life and languages, are governed by principles of evolution. The analogy and comparison between biology and linguistics provide a computational foundation for characterizing and analyzing protein sequences, human corpora, and their evolution. However, no general mathematical formula has been proposed so far to illuminate the origin of quantitative hallmarks shared by life and language. Here we show several new statistical relationships shared by proteins and words, which inspire us to establish a general mechanism of evolution with explicit formulations that can incorporate both old and new characteristics. We found natural selection can be quantified via the entropic formulation by the principle of least effort to determine the sequence variation that survives in evolution. Besides, the origin of power law behavior and how changes in the environment stimulate the emergence of new proteins and words can also be explained via the introduction of function connection network. Our results demonstrate not only the correspondence between genetics and linguistics over their different hierarchies but also new fundamental physical properties for the evolution of complex adaptive systems. We anticipate our statistical tests can function as quantitative criteria to examine whether an evolution theory of sequence is consistent with the regularity of real data. In the meantime, their correspondence broadens the bridge to exchange existing knowledge, spurs new interpretations, and opens Pandora's box to release several potentially revolutionary challenges. For example, does linguistic arbitrariness conflict with the dogma that structure determines function?

Submitted 16 December, 2022; v1 submitted 28 December, 2020; originally announced December 2020.

arXiv:2008.03188 [pdf, other]

CUCHILD: A Large-Scale Cantonese Corpus of Child Speech for Phonology and Articulation Assessment

Authors: Si-Ioi Ng, Cymie Wing-Yee Ng, Jiarui Wang, Tan Lee, Kathy Yuet-Sheung Lee, Michael Chi-Fai Tong

Abstract: This paper describes the design and development of CUCHILD, a large-scale Cantonese corpus of child speech. The corpus contains spoken words collected from 1,986 child speakers aged from 3 to 6 years old. The speech materials include 130 words of 1 to 4 syllables in length. The speakers cover both typically developing (TD) children and children with speech disorder. The intended use of the corpus is to support scientific and clinical research, as well as technology development related to child speech assessment. The design of the corpus, including selection of words, participant recruitment, data acquisition process, and data pre-processing, is described in detail. The results of acoustical analysis are presented to illustrate the properties of child speech. Potential applications of the corpus in automatic speech recognition, phonological error detection and speaker diarization are also discussed.

Submitted 7 August, 2020; originally announced August 2020.

Comments: Accepted to INTERSPEECH 2020, Shanghai, China

arXiv:2007.04399 [pdf, other]

Epidemic Exposure Notification with Smartwatch: A Proximity-Based Privacy-Preserving Approach

Authors: Pai Chet Ng, Petros Spachos, Stefano Gregori, Konstantinos Plataniotis

Abstract: Businesses planning for the post-pandemic world are looking for innovative ways to protect the health and welfare of their employees and customers. Wireless technologies can play a key role in assisting contact tracing to quickly halt a local infection outbreak and prevent further spread. In this work, we present a wearable proximity and exposure notification solution based on a smartwatch that also promotes safe physical distancing in business, hospitality, or recreational facilities. Our proximity-based privacy-preserving contact tracing (P$^3$CT) leverages the Bluetooth Low Energy (BLE) technology for reliable proximity sensing, and an ambient signature protocol for preserving identity. Proximity sensing exploits the received signal strength (RSS) to detect the user's interactions and thus classify them as low- or high-risk with respect to a patient diagnosed with an infectious disease. More precisely, a user is notified of their exposure based on their interactions, in terms of distance and time, with a patient. Our privacy-preserving protocol uses the ambient signatures to ensure that users' identities are anonymized. We demonstrate the feasibility of our proposed solution through extensive experimentation.

Submitted 8 July, 2020; originally announced July 2020.

arXiv:2006.07361 [pdf, other]

Gaussian Processes on Graphs via Spectral Kernel Learning

Authors: Yin-Cong Zhi, Yin Cheng Ng, Xiaowen Dong

Abstract: We propose a graph spectrum-based Gaussian process for prediction of signals defined on nodes of the graph. The model is designed to capture various graph signal structures through a highly adaptive kernel that incorporates a flexible polynomial function in the graph spectral domain. Unlike most existing approaches, we propose to learn such a spectral kernel, where the polynomial setup enables learning without the need for eigen-decomposition of the graph Laplacian. In addition, this kernel has the interpretability of graph filtering achieved by a bespoke maximum likelihood learning algorithm that enforces the positivity of the spectrum. We demonstrate the interpretability of the model in synthetic experiments from which we show the various ground truth spectral filters can be accurately recovered, and the adaptability translates to superior performances in the prediction of real-world graph data of various characteristics.

Submitted 28 October, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

Comments: 13 pages, 5 Figures

arXiv:2005.13754 [pdf, other]

doi 10.1109/JSYST.2021.3055675

COVID-19 and Your Smartphone: BLE-based Smart Contact Tracing

Authors: Pai Chet Ng, Petros Spachos, Konstantinos Plataniotis

Abstract: Contact tracing is of paramount importance when it comes to preventing the spreading of infectious diseases. Contact tracing is usually performed manually by authorized personnel. Manual contact tracing is an inefficient, error-prone, time-consuming process of limited utility to the population at large as those in close contact with infected individuals are informed hours, if not days, later. This paper introduces an alternative to manual contact tracing. The proposed Smart Contact Tracing (SCT) system utilizes the smartphone's Bluetooth Low Energy (BLE) signals and a machine learning classifier to accurately and quickly determine the contact profile. SCT's contribution is two-fold: a) classification of the user's contact as high/low-risk using precise proximity sensing, and b) user anonymity using a privacy-preserving communications protocol. SCT leverages BLE's non-connectable advertising feature to broadcast a signature packet when the user is in a public space. Both broadcasted and observed signatures are stored in the user's smartphone and they are only uploaded to a secure signature database when a user is confirmed by public health authorities to be infected. Using received signal strength (RSS), each smartphone estimates its distance from other users' phones and issues real-time alerts when social distancing rules are violated. The paper includes extensive experimentation utilizing real-life smartphone positions and a comparative evaluation of five machine learning classifiers.
Reported results indicate that a decision tree classifier outperforms other state-of-the-art classification methods in terms of accuracy. Lastly, to facilitate research in this area, and to contribute to the timely development of advanced solutions, the entire data set of six experiments with about 123,000 data points is made publicly available.

Submitted 27 May, 2020; originally announced May 2020.
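
The SCT abstract above states that each smartphone estimates distance from RSS but does not say which model is used. A minimal sketch, assuming the commonly used log-distance path-loss model rather than the authors' actual method; the calibration values (`rss_at_1m_dbm`, `path_loss_exponent`) and the 2 m threshold are hypothetical and would need to be measured per device and environment:

```python
def estimate_distance_m(rss_dbm, rss_at_1m_dbm=-59.0, path_loss_exponent=2.0):
    """Invert the log-distance path-loss model.

    Model: rss = rss_at_1m - 10 * n * log10(d), so
           d   = 10 ** ((rss_at_1m - rss) / (10 * n)).
    Both default calibration values are hypothetical placeholders.
    """
    return 10 ** ((rss_at_1m_dbm - rss_dbm) / (10.0 * path_loss_exponent))


def violates_distancing(rss_dbm, threshold_m=2.0, **calibration):
    """Return True when the estimated distance is below the threshold.

    The 2 m threshold is an assumption, not a value from the abstract.
    """
    return estimate_distance_m(rss_dbm, **calibration) < threshold_m
```

Real RSS readings are noisy, so a practical system would smooth several readings before thresholding; the abstract's use of a machine learning classifier suggests the authors go beyond a single-sample rule like this one.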

arXiv:2005.02780 [pdf, other]

A Large-scale Industrial and Professional Occupation Dataset

Authors: Junhua Liu, Yung Chuen Ng, Kwan Hui Lim

Abstract: There has been growing interest in utilizing occupational data mining and analysis. In today's job market, occupational data mining and analysis is growing in importance as it enables companies to predict employee turnover, model career trajectories, screen through resumes and perform other human resource tasks. A key requirement to facilitate these tasks is the need for an occupation-related dataset. However, most studies use proprietary datasets or do not make their datasets publicly available, thus impeding development in this area. To solve this issue, we present the Industrial and Professional Occupation Dataset (IPOD), which comprises 192k job titles belonging to 56k LinkedIn users. In addition to making IPOD publicly available, we also: (i) manually annotate each job title with its associated level of seniority, domain of work and location; and (ii) provide embeddings for job titles and discuss various use cases. This dataset is publicly available at https://github.com/junhua/ipod.

Submitted 25 April, 2020; originally announced May 2020.

arXiv:2002.10215 [pdf, other]

On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering

Authors: Xinyu Wang, Yuliang Liu, Chunhua Shen, Chun Chet Ng, Canjie Luo, Lianwen Jin, Chee Seng Chan, Anton van den Hengel, Liangwei Wang

Abstract: Visual Question Answering (VQA) methods have made incredible progress, but suffer from a failure to generalize. This is visible in the fact that they are vulnerable to learning coincidental correlations in the data rather than deeper relations between image content and ideas expressed in language. We present a dataset that takes a step towards addressing this problem in that it contains questions expressed in two languages, and an evaluation process that co-opts a well understood image-based metric to reflect the method's ability to reason. Measuring reasoning directly encourages generalization by penalizing answers that are coincidentally correct. The dataset reflects the scene-text version of the VQA problem, and the reasoning evaluation can be seen as a text-based version of a referring expression challenge. Experiments and analysis are provided that show the value of the dataset.

Submitted 25 February, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

Comments: Accepted to Proc. IEEE Conf. Computer Vision and Pattern Recognition 2020

arXiv:1910.10495 [pdf, other]

IPOD: An Industrial and Professional Occupations Dataset and its Applications to Occupational Data Mining and Analysis

Authors: Junhua Liu, Yung Chuen Ng, Kristin L. Wood, Kwan Hui Lim

Abstract: Occupational data mining and analysis is an important task in understanding today's industry and job market. Various machine learning techniques are proposed and gradually deployed to improve companies' operations for upstream tasks, such as employee churn prediction, career trajectory modelling and automated interview. Job titles analysis and embedding, as the fundamental building blocks, are crucial upstream tasks to address these occupational data mining and analysis problems. In this work, we present the Industrial and Professional Occupations Dataset (IPOD), which consists of over 190,000 job titles crawled from over 56,000 profiles from LinkedIn. We also illustrate the usefulness of IPOD by addressing two challenging upstream tasks: (i) proposing Title2vec, a contextual job title vector representation using a bidirectional Language Model (biLM) approach; and (ii) addressing the important occupational Named Entity Recognition problem using Conditional Random Fields (CRF) and bidirectional Long Short-Term Memory with CRF (LSTM-CRF). Both CRF and LSTM-CRF outperform humans and baselines in both exact-match accuracy and F1 scores. The dataset and pre-trained embeddings are available at https://www.github.com/junhua/ipod.

Submitted 26 April, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

arXiv:1909.07741 [pdf, other]

ICDAR 2019 Competition on Large-scale Street View Text with Partial Labeling -- RRC-LSVT

Authors: Yipeng Sun, Zihan Ni, Chee-Kheng Chng, Yuliang Liu, Canjie Luo, Chun Chet Ng, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin

Abstract: Robust text reading from street view images provides valuable information for various applications. Performance improvement of existing methods in such a challenging scenario heavily relies on the amount of fully annotated training data, which is costly and inefficient to obtain. To scale up the amount of training data while keeping the labeling procedure cost-effective, this competition introduces a new challenge on Large-scale Street View Text with Partial Labeling (LSVT), providing 50,000 and 400,000 images in full and weak annotations, respectively. This competition aims to explore the abilities of state-of-the-art methods to detect and recognize text instances from large-scale street view images, closing the gap between research benchmarks and real applications. During the competition period, a total of 41 teams participated in the two proposed tasks, i.e., text detection and end-to-end text spotting, with 132 valid submissions. This paper includes dataset descriptions, task definitions, evaluation protocols and results summaries of the ICDAR 2019-LSVT challenge.

Submitted 17 September, 2019; originally announced September 2019.

Comments: ICDAR 2019 Robust Reading Challenge in IAPR International Conference on Document Analysis and Recognition (ICDAR)

arXiv:1909.07145 [pdf, other]

ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)

Authors: Chee-Kheng Chng, Yuliang Liu, Yipeng Sun, Chun Chet Ng, Canjie Luo, Zihan Ni, ChuanMing Fang, Shuaitao Zhang, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin

Abstract: This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT) that consists of three major challenges: i) scene text detection, ii) scene text recognition, and iii) scene text spotting. A total of 78 submissions from 46 unique teams/individuals were received for this competition. The top performing score of each challenge is as follows: i) T1 - 82.65%, ii) T2.1 - 74.3%, iii) T2.2 - 85.32%, iv) T3.1 - 53.86%, and v) T3.2 - 54.91%. Apart from the results, this paper also details the ArT dataset, task descriptions, evaluation metrics and participants' methods. The dataset, the evaluation kit as well as the results are publicly available at https://rrc.cvc.uab.es/?ch=14

Submitted 16 September, 2019; originally announced September 2019.

Comments: Technical report of ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT) Competition

arXiv:1902.01821 [pdf]

doi 10.25147/ijcsr.2017.001.1.25

A Study of an Agile Methodology with Scrum Approach to the Filipino Company-Sponsored I.T. Capstone Program

Authors: Giuseppe C. Ng

Abstract: Purpose - The research aims to show the relevance of company client sponsored student projects in the University of Asia and the Pacific Information Technology (UA&P IT) Capstone Program through the use of an Agile Methodology with Scrum Approach. Method - The modified program is employed on two batches with content analysis and survey results as benchmarks. Results - Surveys at the end of the sprints for both clients and students revealed that the length of the sprint was a critical factor in the development of the information system, and that students learned from addressing additional challenges such as academic load, team pressure and communication issues. Conclusion - Overall results showed that clients were impressed and keen to adopt the student works. Recommendations - Maintainability aspects of the research can be analyzed for future studies. Increasing the sample size with additional batches could lead to discovery of additional factors not previously seen. Research Implications - The research could help improve other Capstone Programs while improving communication with company clients.

Submitted 3 February, 2019; originally announced February 2019.

Journal ref: International Journal of Computing Sciences Research (ISSN print: 2546-0552; ISSN online: 2546-115X) Vol. 2, No. 2, 2018

arXiv:1811.08933 [pdf, other]

Analyzing Machine Learning Workloads Using a Detailed GPU Simulator

Authors: Jonathan Lew, Deval Shah, Suchita Pati, Shaylin Cattell, Mengchi Zhang, Amruth Sandhupatla, Christopher Ng, Negar Goli, Matthew D. Sinclair, Timothy G. Rogers, Tor Aamodt

Abstract: Most deep neural networks deployed today are trained using GPUs via high-level frameworks such as TensorFlow and PyTorch. This paper describes changes we made to the GPGPU-Sim simulator to enable it to run PyTorch by running PTX kernels included in NVIDIA's cuDNN library. We use the resulting modified simulator, which has been made available publicly with this paper, to study some simple deep learning workloads. With our changes to GPGPU-Sim's functional simulation model, we find that GPGPU-Sim's performance model running a cuDNN-enabled implementation of LeNet for MNIST reports results within 30% of real hardware. Using GPGPU-Sim's AerialVision performance analysis tool, we observe that cuDNN API calls contain many varying phases and appear to include potentially inefficient microarchitecture behaviour such as DRAM partition bank camping, at least when executed on GPGPU-Sim's current performance model.

Submitted 26 January, 2019; v1 submitted 18 November, 2018; originally announced November 2018.

Comments: Source code available at: https://github.com/gpgpu-sim/gpgpu-sim_distribution/tree/dev

arXiv:1809.05210 [pdf, other]

A Time Series Graph Cut Image Segmentation Scheme for Liver Tumors

Authors: Laramie Paxton, Yufeng Cao, Kevin R. Vixie, Yuan Wang, Brian Hobbs, Chaan Ng

Abstract: Tumor detection in biomedical imaging is a time-consuming process for medical professionals and is not without errors. Thus in recent decades, researchers have developed algorithmic techniques for image processing using a wide variety of mathematical methods, such as statistical modeling, variational techniques, and machine learning. In this paper, we propose a semi-automatic method for liver segmentation of 2D CT scans into three labels denoting healthy, vessel, or tumor tissue based on graph cuts. First, we create a feature vector for each pixel in a novel way that consists of the 59 intensity values in the time series data and propose a simplified perimeter cost term in the energy functional. We normalize the data and perimeter terms in the functional to expedite the graph cut without having to optimize the scaling parameter $\lambda$. In place of a training process, predetermined tissue means are computed based on sample regions identified by expert radiologists. The proposed method also has the advantage of being relatively simple to implement computationally. It was evaluated against the ground truth on a clinical CT dataset of 10 tumors and yielded segmentations with a mean Dice similarity coefficient (DSC) of 0.77 and mean volume overlap error (VOE) of 36.7%. The average processing time was 1.25 minutes per slice.

Submitted 13 September, 2018; originally announced September 2018.

Comments: Image processing; image analysis; medical imaging
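
The liver-tumor abstract above reports its results as a Dice similarity coefficient (DSC) and a volume overlap error (VOE). As a rough illustration of how these two metrics are conventionally defined for a binary mask, and not the authors' evaluation code, the sketch below uses DSC = 2|A∩B| / (|A| + |B|) and VOE = 1 − |A∩B| / |A∪B|. The function name and the flat-list mask representation are assumptions made for the example:

```python
def dice_and_voe(predicted, truth):
    """Compute (DSC, VOE) for two binary masks given as flat sequences.

    DSC = 2 * |A & B| / (|A| + |B|)
    VOE = 1 - |A & B| / |A | B|   (one minus the Jaccard index)
    Two empty masks are treated as a perfect match (a convention chosen here).
    """
    if len(predicted) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    a = [bool(p) for p in predicted]
    b = [bool(t) for t in truth]
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    if union == 0:
        return 1.0, 0.0
    dice = 2.0 * intersection / (sum(a) + sum(b))
    voe = 1.0 - intersection / union
    return dice, voe
```

For example, `dice_and_voe([1, 1, 0, 0], [1, 0, 1, 0])` has one overlapping pixel out of a union of three, giving a DSC of 0.5 and a VOE of about 0.667.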

arXiv:1809.04379 [pdf, other]

Bayesian Semi-supervised Learning with Graph Gaussian Processes

Authors: Yin Cheng Ng, Nicolo Colombo, Ricardo Silva

Abstract: We propose a data-efficient Gaussian process-based Bayesian approach to the semi-supervised learning problem on graphs. The proposed model shows extremely competitive performance when compared to the state-of-the-art graph neural networks on semi-supervised learning benchmark experiments, and outperforms the neural networks in active learning experiments where labels are scarce. Furthermore, the model does not require a validation data set for early stopping to control over-fitting. Our model can be viewed as an instance of empirical distribution regression weighted locally by network connectivity. We further motivate the intuitive construction of the model with a Bayesian linear model interpretation where the node features are filtered by an operator related to the graph Laplacian. The method can be easily implemented by adapting off-the-shelf scalable variational inference algorithms for Gaussian processes.

Submitted 12 October, 2018; v1 submitted 12 September, 2018; originally announced September 2018.

Comments: To appear in NIPS 2018. Fixed an error in Figure 2. The previous arXiv version contains two identical sub-figures

arXiv:1801.09029 [pdf, other]

Adaptive Hybrid Beamforming with Massive Phased Arrays in Macro-Cellular Networks

Authors: Shahram Shahsavari, S. Amir Hosseini, Chris Ng, Elza Erkip

Abstract: Hybrid beamforming via large antenna arrays has shown a great potential for increasing data rate in cellular networks by delivering multiple data streams simultaneously. In this paper, several beamforming design algorithms are proposed based on the long-term channel information for macro-cellular environments where the base station is equipped with a massive phased array under per-antenna power constraint. Using an adaptive scheme, beamforming vectors are updated whenever the long-term channel information changes. First, the problem is studied when the base station has a single RF chain (single-beam scenario). Semi-definite relaxation (SDR) with randomization is used to solve the problem. As a second approach, a low-complexity heuristic beam composition algorithm is proposed which performs very close to the upper-bound obtained by SDR. Next, the problem is studied for a generic number of RF chains (multi-beam scenario) where the Gradient Projection method is used to obtain local solutions. Numerical results reveal that using massive antenna arrays with optimized beamforming vectors can lead to 5X network throughput improvement over systems with conventional antennas.

Submitted 3 February, 2018; v1 submitted 26 January, 2018; originally announced January 2018.

arXiv:1710.04008 [pdf, other]

A Dynamic Edge Exchangeable Model for Sparse Temporal Networks

Authors: Yin Cheng Ng, Ricardo Silva

Abstract: We propose a dynamic edge exchangeable network model that can capture sparse connections observed in real temporal networks, in contrast to existing models which are dense. The model achieved superior link prediction accuracy on multiple data sets when compared to a dynamic variant of the blockmodel, and is able to extract interpretable time-varying community structures from the data. In addition to sparsity, the model accounts for the effect of social influence on vertices' future behaviours. Compared to the dynamic blockmodels, our model has a smaller latent space. The compact latent space requires a smaller number of parameters to be estimated in variational inference and results in a computationally friendly inference algorithm.

Submitted 11 October, 2017; originally announced October 2017.

arXiv:1612.05038 [pdf, other]

Objective Micro-Facial Movement Detection Using FACS-Based Regions and Baseline Evaluation

Authors: Adrian K. Davison, Cliff Lansley, Choon Ching Ng, Kevin Tan, Moi Hoon Yap

Abstract: Micro-facial expressions are regarded as an important human behavioural event that can highlight emotional deception. Spotting these movements is difficult for humans and machines; however, research into using computer vision to detect subtle facial expressions is growing in popularity. This paper proposes an individualised baseline micro-movement detection method using a 3D Histogram of Oriented Gradients (3D HOG) temporal difference method. We define a face template consisting of 26 regions based on the Facial Action Coding System (FACS). We extract the temporal features of each region using 3D HOG. Then, we use Chi-square distance to find subtle facial motion in the local regions. Finally, an automatic peak detector is used to detect micro-movements above the newly proposed adaptive baseline threshold. The performance is validated on two FACS coded datasets: SAMM and CASME II. This objective method focuses on the movement of the 26 face regions. When compared with the ground truth, the best result was an AUC of 0.7512 and 0.7261 on SAMM and CASME II, respectively. The results show that, for micro-movement detection, 3D HOG outperformed the state-of-the-art feature representations Local Binary Patterns in Three Orthogonal Planes and Histograms of Oriented Optical Flow.

Submitted 15 December, 2016; originally announced December 2016.
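
The micro-facial movement abstract above mentions using Chi-square distance between temporal features to find subtle motion. The abstract does not give the exact formula, so the sketch below assumes one common symmetric form for comparing two histograms, 0.5 · Σ (a_i − b_i)² / (a_i + b_i); the function name and the choice to skip empty bins are assumptions for illustration only:

```python
def chi_square_distance(hist_a, hist_b):
    """Symmetric chi-square distance between two non-negative histograms.

    Uses 0.5 * sum((a - b)^2 / (a + b)), skipping bins where both
    histograms are zero to avoid division by zero. This is one common
    variant; the paper may use a different normalisation.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(hist_a, hist_b):
        denom = a + b
        if denom > 0:
            total += (a - b) ** 2 / denom
    return 0.5 * total
```

Under this form, identical histograms give a distance of 0, and two normalised histograms with no overlapping mass give a distance of 1, which makes the value easy to threshold when comparing features from consecutive frames.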

Showing 1–50 of 71 results for author: Ng, C