Search | arXiv e-print repository

arXiv:2407.19651 [pdf, other]

ComNeck: Bridging Compressed Image Latents and Multimodal LLMs via Universal Transform-Neck

Authors: Chia-Hao Kao, Cheng Chien, Yu-Jen Tseng, Yi-Hsin Chen, Alessandro Gnutti, Shao-Yuan Lo, Wen-Hsiao Peng, Riccardo Leonardi

Abstract: This paper presents the first-ever study of adapting compressed image latents to suit the needs of downstream vision tasks that adopt Multimodal Large Language Models (MLLMs). MLLMs have extended the success of large language models to modalities (e.g. images) beyond text, but their billion scale hinders deployment on resource-constrained end devices. While cloud-hosted MLLMs could be available, t… ▽ More This paper presents the first-ever study of adapting compressed image latents to suit the needs of downstream vision tasks that adopt Multimodal Large Language Models (MLLMs). MLLMs have extended the success of large language models to modalities (e.g. images) beyond text, but their billion scale hinders deployment on resource-constrained end devices. While cloud-hosted MLLMs could be available, transmitting raw, uncompressed images captured by end devices to the cloud requires an efficient image compression system. To address this, we focus on emerging neural image compression and propose a novel framework with a lightweight transform-neck and a surrogate loss to adapt compressed image latents for MLLM-based vision tasks. The proposed framework is generic and applicable to multiple application scenarios, where the neural image codec can be (1) pre-trained for human perception without updating, (2) fully updated for joint human and machine perception, or (3) fully updated for only machine perception. The transform-neck trained with the surrogate loss is universal, for it can serve various downstream vision tasks enabled by a variety of MLLMs that share the same visual encoder. Our framework has the striking feature of excluding the downstream MLLMs from training the transform-neck, and potentially the neural image codec as well. This stands out from most existing coding for machine approaches that involve downstream networks in training and thus could be impractical when the networks are MLLMs. Extensive experiments on different neural image codecs and various MLLM-based vision tasks show that our method achieves great rate-accuracy performance with much less complexity, demonstrating its effectiveness. △ Less

Submitted 28 July, 2024; originally announced July 2024.

arXiv:2407.13386 [pdf, other]

Time Synchronization of TESLA-enabled GNSS Receivers

Authors: Jason Anderson, Sherman Lo, Todd Walter

Abstract: As TESLA-enabled GNSS for authenticated positioning reaches ubiquity, receivers must use an onboard, GNSS-independent clock and carefully constructed time synchronization algorithms to assert the authenticity afforded. This work provides the necessary checks and synchronization protocols needed in the broadcast-only GNSS context. We provide proof of security for each of our algorithms under a dela… ▽ More As TESLA-enabled GNSS for authenticated positioning reaches ubiquity, receivers must use an onboard, GNSS-independent clock and carefully constructed time synchronization algorithms to assert the authenticity afforded. This work provides the necessary checks and synchronization protocols needed in the broadcast-only GNSS context. We provide proof of security for each of our algorithms under a delay-capable adversary. The algorithms included herein enable a GNSS receiver to use its onboard, GNSS-independent clock to determine whether a message arrived at the correct time, to determine whether its onboard, GNSS-independent clock is safe to use and when the clock will no longer be safe in the future due to predicted clock drift, and to resynchronize its onboard, GNSS-independent clock. Each algorithm is safe to use even when an adversary induces delays within the protocol. Moreover, we discuss the implications of GNSS authentication schemes that use two simultaneous TESLA instances of different authentication cadences. To a receiver implementer or standards author, this work provides the necessary implementation algorithms to assert security and provides a comprehensive guide on why these methods are required. △ Less

Submitted 18 July, 2024; originally announced July 2024.

Comments: 16 pages, 15 figures

arXiv:2407.10299 [pdf, other]

Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models

Authors: Yuchen Yang, Kwonjoon Lee, Behzad Dariush, Yinzhi Cao, Shao-Yuan Lo

Abstract: Video Anomaly Detection (VAD) is crucial for applications such as security surveillance and autonomous driving. However, existing VAD methods provide little rationale behind detection, hindering public trust in real-world deployments. In this paper, we approach VAD with a reasoning framework. Although Large Language Models (LLMs) have shown revolutionary reasoning ability, we find that their direc… ▽ More Video Anomaly Detection (VAD) is crucial for applications such as security surveillance and autonomous driving. However, existing VAD methods provide little rationale behind detection, hindering public trust in real-world deployments. In this paper, we approach VAD with a reasoning framework. Although Large Language Models (LLMs) have shown revolutionary reasoning ability, we find that their direct use falls short of VAD. Specifically, the implicit knowledge pre-trained in LLMs focuses on general context and thus may not apply to every specific real-world VAD scenario, leading to inflexibility and inaccuracy. To address this, we propose AnomalyRuler, a novel rule-based reasoning framework for VAD with LLMs. AnomalyRuler comprises two main stages: induction and deduction. In the induction stage, the LLM is fed with few-shot normal reference samples and then summarizes these normal patterns to induce a set of rules for detecting anomalies. The deduction stage follows the induced rules to spot anomalous frames in test videos. Additionally, we design rule aggregation, perception smoothing, and robust reasoning strategies to further enhance AnomalyRuler's robustness. AnomalyRuler is the first reasoning approach for the one-class VAD task, which requires only few-normal-shot prompting without the need for full-shot training, thereby enabling fast adaption to various VAD scenarios. Comprehensive experiments across four VAD benchmarks demonstrate AnomalyRuler's state-of-the-art detection performance and reasoning ability. AnomalyRuler is open-source and available at: https://github.com/Yuchen413/AnomalyRuler △ Less

Submitted 20 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

Comments: Accepted at European Conference on Computer Vision (ECCV) 2024

arXiv:2405.20305 [pdf, other]

Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models

Authors: Himangi Mittal, Nakul Agarwal, Shao-Yuan Lo, Kwonjoon Lee

Abstract: We introduce PlausiVL, a large video-language model for anticipating action sequences that are plausible in the real-world. While significant efforts have been made towards anticipating future actions, prior approaches do not take into account the aspect of plausibility in an action sequence. To address this limitation, we explore the generative capability of a large video-language model in our wo… ▽ More We introduce PlausiVL, a large video-language model for anticipating action sequences that are plausible in the real-world. While significant efforts have been made towards anticipating future actions, prior approaches do not take into account the aspect of plausibility in an action sequence. To address this limitation, we explore the generative capability of a large video-language model in our work and further, develop the understanding of plausibility in an action sequence by introducing two objective functions, a counterfactual-based plausible action sequence learning loss and a long-horizon action repetition loss. We utilize temporal logical constraints as well as verb-noun action pair logical constraints to create implausible/counterfactual action sequences and use them to train the model with plausible action sequence learning loss. This loss helps the model to differentiate between plausible and not plausible action sequences and also helps the model to learn implicit temporal cues crucial for the task of action anticipation. The long-horizon action repetition loss puts a higher penalty on the actions that are more prone to repetition over a longer temporal window. With this penalization, the model is able to generate diverse, plausible action sequences. We evaluate our approach on two large-scale datasets, Ego4D and EPIC-Kitchens-100, and show improvements on the task of action anticipation. △ Less

Submitted 30 May, 2024; originally announced May 2024.

Comments: CVPR 2024

arXiv:2405.19413 [pdf, other]

VisTA-SR: Improving the Accuracy and Resolution of Low-Cost Thermal Imaging Cameras for Agriculture

Authors: Heesup Yun, Sassoum Lo, Christine H. Diepenbrock, Brian N. Bailey, J. Mason Earles

Abstract: Thermal cameras are an important tool for agricultural research because they allow for non-invasive measurement of plant temperature, which relates to important photochemical, hydraulic, and agronomic traits. Utilizing low-cost thermal cameras can lower the barrier to introducing thermal imaging in agricultural research and production. This paper presents an approach to improve the temperature acc… ▽ More Thermal cameras are an important tool for agricultural research because they allow for non-invasive measurement of plant temperature, which relates to important photochemical, hydraulic, and agronomic traits. Utilizing low-cost thermal cameras can lower the barrier to introducing thermal imaging in agricultural research and production. This paper presents an approach to improve the temperature accuracy and image quality of low-cost thermal imaging cameras for agricultural applications. Leveraging advancements in computer vision techniques, particularly deep learning networks, we propose a method, called $\textbf{VisTA-SR}$ ($\textbf{Vis}$ual \& $\textbf{T}$hermal $\textbf{A}$lignment and $\textbf{S}$uper-$\textbf{R}$esolution Enhancement) that combines RGB and thermal images to enhance the capabilities of low-resolution thermal cameras. The research includes calibration and validation of temperature measurements, acquisition of paired image datasets, and the development of a deep learning network tailored for agricultural thermal imaging. Our study addresses the challenges of image enhancement in the agricultural domain and explores the potential of low-cost thermal cameras to replace high-resolution industrial cameras. Experimental results demonstrate the effectiveness of our approach in enhancing temperature accuracy and image sharpness, paving the way for more accessible and efficient thermal imaging solutions in agriculture. △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.11708 [pdf, other]

Adaptive Batch Normalization Networks for Adversarial Robustness

Authors: Shao-Yuan Lo, Vishal M. Patel

Abstract: Deep networks are vulnerable to adversarial examples. Adversarial Training (AT) has been a standard foundation of modern adversarial defense approaches due to its remarkable effectiveness. However, AT is extremely time-consuming, refraining it from wide deployment in practical applications. In this paper, we aim at a non-AT defense: How to design a defense method that gets rid of AT but is still r… ▽ More Deep networks are vulnerable to adversarial examples. Adversarial Training (AT) has been a standard foundation of modern adversarial defense approaches due to its remarkable effectiveness. However, AT is extremely time-consuming, refraining it from wide deployment in practical applications. In this paper, we aim at a non-AT defense: How to design a defense method that gets rid of AT but is still robust against strong adversarial attacks? To answer this question, we resort to adaptive Batch Normalization (BN), inspired by the recent advances in test-time domain adaptation. We propose a novel defense accordingly, referred to as the Adaptive Batch Normalization Network (ABNN). ABNN employs a pre-trained substitute model to generate clean BN statistics and sends them to the target model. The target model is exclusively trained on clean data and learns to align the substitute model's BN statistics. Experimental results show that ABNN consistently improves adversarial robustness against both digital and physically realizable attacks on both image and video datasets. Furthermore, ABNN can achieve higher clean data performance and significantly lower training time complexity compared to AT-based approaches. △ Less

Submitted 26 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

Comments: Accepted at IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS) 2024

arXiv:2405.10467 [pdf, other]

Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents

Authors: Yue Liu, Sin Kit Lo, Qinghua Lu, Liming Zhu, Dehai Zhao, Xiwei Xu, Stefan Harrer, Jon Whittle

Abstract: Foundation model-enabled generative artificial intelligence facilitates the development and implementation of agents, which can leverage distinguished reasoning and language processing capabilities to takes a proactive, autonomous role to pursue users' goals. Nevertheless, there is a lack of systematic knowledge to guide practitioners in designing the agents considering challenges of goal-seeking… ▽ More Foundation model-enabled generative artificial intelligence facilitates the development and implementation of agents, which can leverage distinguished reasoning and language processing capabilities to takes a proactive, autonomous role to pursue users' goals. Nevertheless, there is a lack of systematic knowledge to guide practitioners in designing the agents considering challenges of goal-seeking (including generating instrumental goals and plans), such as hallucinations inherent in foundation models, explainability of reasoning process, complex accountability, etc. To address this issue, we have performed a systematic literature review to understand the state-of-the-art foundation model-based agents and the broader ecosystem. In this paper, we present a pattern catalogue consisting of 17 architectural patterns with analyses of the context, forces, and trade-offs as the outcomes from the previous literature review. The proposed catalogue can provide holistic guidance for the effective use of patterns, and support the architecture design of foundation model-based agents by facilitating goal-seeking and plan generation. △ Less

Submitted 24 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

arXiv:2404.09290 [pdf, other]

RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion

Authors: Kyle Shih-Huang Lo, Jörg Peters, Eric Spellman

Abstract: Accurate completion and denoising of roof height maps are crucial to reconstructing high-quality 3D buildings. Repairing sparse points can enhance low-cost sensor use and reduce UAV flight overlap. RoofDiffusion is a new end-to-end self-supervised diffusion technique for robustly completing, in particular difficult, roof height maps. RoofDiffusion leverages widely-available curated footprints and… ▽ More Accurate completion and denoising of roof height maps are crucial to reconstructing high-quality 3D buildings. Repairing sparse points can enhance low-cost sensor use and reduce UAV flight overlap. RoofDiffusion is a new end-to-end self-supervised diffusion technique for robustly completing, in particular difficult, roof height maps. RoofDiffusion leverages widely-available curated footprints and can so handle up to 99\% point sparsity and 80\% roof area occlusion (regional incompleteness). A variant, No-FP RoofDiffusion, simultaneously predicts building footprints and heights. Both quantitatively outperform state-of-the-art unguided depth completion and representative inpainting methods for Digital Elevation Models (DEM), on both a roof-specific benchmark and the BuildingNet dataset. Qualitative assessments show the effectiveness of RoofDiffusion for datasets with real-world scans including AHN3, Dales3D, and USGS 3DEP LiDAR. Tested with the leading City3D algorithm, preprocessing height maps with RoofDiffusion noticeably improves 3D building reconstruction. RoofDiffusion is complemented by a new dataset of 13k complex roof geometries, focusing on long-tail issues in remote sensing; a novel simulation of tree occlusion; and a wide variety of large-area roof cut-outs for data augmentation and benchmarking. △ Less

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.05583 [pdf, other]

Towards More General Video-based Deepfake Detection through Facial Feature Guided Adaptation for Foundation Model

Authors: Yue-Hua Han, Tai-Ming Huang, Shu-Tzu Lo, Po-Han Huang, Kai-Lung Hua, Jun-Cheng Chen

Abstract: With the rise of deep learning, generative models have enabled the creation of highly realistic synthetic images, presenting challenges due to their potential misuse. While research in Deepfake detection has grown rapidly in response, many detection methods struggle with unseen Deepfakes generated by new synthesis techniques. To address this generalisation challenge, we propose a novel Deepfake de… ▽ More With the rise of deep learning, generative models have enabled the creation of highly realistic synthetic images, presenting challenges due to their potential misuse. While research in Deepfake detection has grown rapidly in response, many detection methods struggle with unseen Deepfakes generated by new synthesis techniques. To address this generalisation challenge, we propose a novel Deepfake detection approach by adapting the Foundation Models with rich information encoded inside, specifically using the image encoder from CLIP which has demonstrated strong zero-shot capability for downstream tasks. Inspired by the recent advances of parameter efficient fine-tuning, we propose a novel side-network-based decoder to extract spatial and temporal cues from the given video clip, with the promotion of the Facial Component Guidance (FCG) to encourage the spatial feature to include features of key facial parts for more robust and general Deepfake detection. Through extensive cross-dataset evaluations, our approach exhibits superior effectiveness in identifying unseen Deepfake samples, achieving notable performance improvement even with limited training samples and manipulation types. Our model secures an average performance enhancement of 0.9\% AUROC in cross-dataset assessments comparing with state-of-the-art methods, especially a significant lead of achieving 4.4\% improvement on the challenging DFDC dataset. △ Less

Submitted 5 June, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

arXiv:2402.07104 [pdf]

The Aleph & Other Metaphors for Image Generation

Authors: Gonzalo Ramos, Rick Barraza, Victor Dibia, Sharon Lo

Abstract: In this position paper, we reflect on fictional stories dealing with the infinite and how they connect with the current, fast-evolving field of image generation models. We draw attention to how some of these literary constructs can serve as powerful metaphors for guiding human-centered design and technical thinking in the space of these emerging technologies and the experiences we build around the… ▽ More In this position paper, we reflect on fictional stories dealing with the infinite and how they connect with the current, fast-evolving field of image generation models. We draw attention to how some of these literary constructs can serve as powerful metaphors for guiding human-centered design and technical thinking in the space of these emerging technologies and the experiences we build around them. We hope our provocations seed conversations about current and yet-to-be developed interactions with these emerging models in ways that may amplify human agency. △ Less

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2306.08056 [pdf, other]

Distributed Trust Through the Lens of Software Architecture

Authors: Sin Kit Lo, Yue Liu, Guangsheng Yu, Qinghua Lu, Xiwei Xu, Liming Zhu

Abstract: Distributed trust is a nebulous concept that has evolved from different perspectives in recent years. While one can attribute its current prominence to blockchain and cryptocurrency, the distributed trust concept has been cultivating progress in federated learning, trustworthy and responsible AI in an ecosystem setting, data sharing, privacy issues across organizational boundaries, and zero trust… ▽ More Distributed trust is a nebulous concept that has evolved from different perspectives in recent years. While one can attribute its current prominence to blockchain and cryptocurrency, the distributed trust concept has been cultivating progress in federated learning, trustworthy and responsible AI in an ecosystem setting, data sharing, privacy issues across organizational boundaries, and zero trust cybersecurity. This paper will survey the concept of distributed trust in multiple disciplines. It will take a system/software architecture point of view to look at trust redistribution/shift and the associated tradeoffs in systems and applications enabled by distributed trust technologies. △ Less

Submitted 25 May, 2023; originally announced June 2023.

arXiv:2305.12292 [pdf, other]

Optimal Low-Rank Matrix Completion: Semidefinite Relaxations and Eigenvector Disjunctions

Authors: Dimitris Bertsimas, Ryan Cory-Wright, Sean Lo, Jean Pauphilet

Abstract: Low-rank matrix completion consists of computing a matrix of minimal complexity that recovers a given set of observations as accurately as possible. Unfortunately, existing methods for matrix completion are heuristics that, while highly scalable and often identifying high-quality solutions, do not possess any optimality guarantees. We reexamine matrix completion with an optimality-oriented eye. We… ▽ More Low-rank matrix completion consists of computing a matrix of minimal complexity that recovers a given set of observations as accurately as possible. Unfortunately, existing methods for matrix completion are heuristics that, while highly scalable and often identifying high-quality solutions, do not possess any optimality guarantees. We reexamine matrix completion with an optimality-oriented eye. We reformulate these low-rank problems as convex problems over the non-convex set of projection matrices and implement a disjunctive branch-and-bound scheme that solves them to certifiable optimality. Further, we derive a novel and often tight class of convex relaxations by decomposing a low-rank matrix as a sum of rank-one matrices and incentivizing that two-by-two minors in each rank-one matrix have determinant zero. In numerical experiments, our new convex relaxations decrease the optimality gap by two orders of magnitude compared to existing attempts, and our disjunctive branch-and-bound scheme solves nxn rank-r matrix completion problems to certifiable optimality in hours for n<=150 and r<=5. △ Less

Submitted 26 January, 2024; v1 submitted 20 May, 2023; originally announced May 2023.

Comments: Updated version with new numerics showcasing relaxation for rank k>1

arXiv:2303.14361 [pdf, other]

Spatio-Temporal Pixel-Level Contrastive Learning-based Source-Free Domain Adaptation for Video Semantic Segmentation

Authors: Shao-Yuan Lo, Poojan Oza, Sumanth Chennupati, Alejandro Galindo, Vishal M. Patel

Abstract: Unsupervised Domain Adaptation (UDA) of semantic segmentation transfers labeled source knowledge to an unlabeled target domain by relying on accessing both the source and target data. However, the access to source data is often restricted or infeasible in real-world scenarios. Under the source data restrictive circumstances, UDA is less practical. To address this, recent works have explored soluti… ▽ More Unsupervised Domain Adaptation (UDA) of semantic segmentation transfers labeled source knowledge to an unlabeled target domain by relying on accessing both the source and target data. However, the access to source data is often restricted or infeasible in real-world scenarios. Under the source data restrictive circumstances, UDA is less practical. To address this, recent works have explored solutions under the Source-Free Domain Adaptation (SFDA) setup, which aims to adapt a source-trained model to the target domain without accessing source data. Still, existing SFDA approaches use only image-level information for adaptation, making them sub-optimal in video applications. This paper studies SFDA for Video Semantic Segmentation (VSS), where temporal information is leveraged to address video adaptation. Specifically, we propose Spatio-Temporal Pixel-Level (STPL) contrastive learning, a novel method that takes full advantage of spatio-temporal information to tackle the absence of source data better. STPL explicitly learns semantic correlations among pixels in the spatio-temporal space, providing strong self-supervision for adaptation to the unlabeled target domain. Extensive experiments show that STPL achieves state-of-the-art performance on VSS benchmarks compared to current UDA and SFDA approaches. Code is available at: https://github.com/shaoyuanlo/STPL △ Less

Submitted 25 March, 2023; originally announced March 2023.

Comments: Accepted at IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023

arXiv:2302.13172 [pdf]

Deep Learning-based Multi-Organ CT Segmentation with Adversarial Data Augmentation

Authors: Shaoyan Pan, Shao-Yuan Lo, Min Huang, Chaoqiong Ma, Jacob Wynne, Tonghe Wang, Tian Liu, Xiaofeng Yang

Abstract: In this work, we propose an adversarial attack-based data augmentation method to improve the deep-learning-based segmentation algorithm for the delineation of Organs-At-Risk (OAR) in abdominal Computed Tomography (CT) to facilitate radiation therapy. We introduce Adversarial Feature Attack for Medical Image (AFA-MI) augmentation, which forces the segmentation network to learn out-of-distribution s… ▽ More In this work, we propose an adversarial attack-based data augmentation method to improve the deep-learning-based segmentation algorithm for the delineation of Organs-At-Risk (OAR) in abdominal Computed Tomography (CT) to facilitate radiation therapy. We introduce Adversarial Feature Attack for Medical Image (AFA-MI) augmentation, which forces the segmentation network to learn out-of-distribution statistics and improve generalization and robustness to noises. AFA-MI augmentation consists of three steps: 1) generate adversarial noises by Fast Gradient Sign Method (FGSM) on the intermediate features of the segmentation network's encoder; 2) inject the generated adversarial noises into the network, intentionally compromising performance; 3) optimize the network with both clean and adversarial features. Experiments are conducted segmenting the heart, left and right kidney, liver, left and right lung, spinal cord, and stomach. We first evaluate the AFA-MI augmentation using nnUnet and TT-Vnet on the test data from a public abdominal dataset and an institutional dataset. In addition, we validate how AFA-MI affects the networks' robustness to the noisy data by evaluating the networks with added Gaussian noises of varying magnitudes to the institutional dataset. Network performance is quantitatively evaluated using Dice Similarity Coefficient (DSC) for volume-based accuracy. Also, Hausdorff Distance (HD) is applied for surface-based accuracy. On the public dataset, nnUnet with AFA-MI achieves DSC = 0.85 and HD = 6.16 millimeters (mm); and TT-Vnet achieves DSC = 0.86 and HD = 5.62 mm. AFA-MI augmentation further improves all contour accuracies up to 0.217 DSC score when tested on images with Gaussian noises. AFA-MI augmentation is therefore demonstrated to improve segmentation performance and robustness in CT multi-organ segmentation. △ Less

Submitted 25 February, 2023; originally announced February 2023.

Comments: Accepted at SPIE Medical Imaging 2023

arXiv:2211.08105 [pdf, other]

Few hamiltonian cycles in graphs with one or two vertex degrees

Authors: Jan Goedgebeur, Jorik Jooken, On-Hei Solomon Lo, Ben Seamone, Carol T. Zamfirescu

Abstract: We fully disprove a conjecture of Haythorpe on the minimum number of hamiltonian cycles in regular hamiltonian graphs, thereby extending a result of Zamfirescu, as well as correct and complement Haythorpe's computational enumerative results from [Experim. Math. 27 (2018) 426-430]. Thereafter, we use the Lovász Local Lemma to extend Thomassen's independent dominating set method. Regarding the limit… ▽ More We fully disprove a conjecture of Haythorpe on the minimum number of hamiltonian cycles in regular hamiltonian graphs, thereby extending a result of Zamfirescu, as well as correct and complement Haythorpe's computational enumerative results from [Experim. Math. 27 (2018) 426-430]. Thereafter, we use the Lovász Local Lemma to extend Thomassen's independent dominating set method. Regarding the limitations of this method, we answer a question of Haxell, Seamone, and Verstraete, and settle the first open case of a problem of Thomassen. Motivated by an observation of Aldred and Thomassen, we prove that for every $κ\in \{ 2, 3 \}$ and any positive integer $k$, there are infinitely many non-regular graphs of connectivity $κ$ containing exactly one hamiltonian cycle and in which every vertex has degree $3$ or $2k$. △ Less

Submitted 15 November, 2022; originally announced November 2022.

arXiv:2208.00160 [pdf, other]

Learning Feature Decomposition for Domain Adaptive Monocular Depth Estimation

Authors: Shao-Yuan Lo, Wei Wang, Jim Thomas, Jingjing Zheng, Vishal M. Patel, Cheng-Hao Kuo

Abstract: Monocular depth estimation (MDE) has attracted intense study due to its low cost and critical functions for robotic tasks such as localization, mapping and obstacle detection. Supervised approaches have led to great success with the advance of deep learning, but they rely on large quantities of ground-truth depth annotations that are expensive to acquire. Unsupervised domain adaptation (UDA) trans… ▽ More Monocular depth estimation (MDE) has attracted intense study due to its low cost and critical functions for robotic tasks such as localization, mapping and obstacle detection. Supervised approaches have led to great success with the advance of deep learning, but they rely on large quantities of ground-truth depth annotations that are expensive to acquire. Unsupervised domain adaptation (UDA) transfers knowledge from labeled source data to unlabeled target data, so as to relax the constraint of supervised learning. However, existing UDA approaches may not completely align the domain gap across different datasets because of the domain shift problem. We believe better domain alignment can be achieved via well-designed feature decomposition. In this paper, we propose a novel UDA method for MDE, referred to as Learning Feature Decomposition for Adaptation (LFDA), which learns to decompose the feature space into content and style components. LFDA only attempts to align the content component since it has a smaller domain gap. Meanwhile, it excludes the style component which is specific to the source domain from training the primary task. Furthermore, LFDA uses separate feature distribution estimations to further bridge the domain gap. Extensive experiments on three domain adaptative MDE scenarios show that the proposed method achieves superior accuracy and lower computational cost compared to the state-of-the-art approaches. △ Less

Submitted 30 July, 2022; originally announced August 2022.

Comments: Accepted at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022

arXiv:2207.05138 [pdf, other]

Towards Personalized Healthcare in Cardiac Population: The Development of a Wearable ECG Monitoring System, an ECG Lossy Compression Schema, and a ResNet-Based AF Detector

Authors: Wei-Ying Yi, Peng-Fei Liu, Sheung-Lai Lo, Ya-Fen Chan, Yu Zhou, Yee Leung, Kam-Sang Woo, Alex Pui-Wai Lee, Jia-Min Chen, Kwong-Sak Leung

Abstract: Cardiovascular diseases (CVDs) are the number one cause of death worldwide. While there is growing evidence that the atrial fibrillation (AF) has strong associations with various CVDs, this heart arrhythmia is usually diagnosed using electrocardiography (ECG) which is a risk-free, non-intrusive, and cost-efficient tool. Continuously and remotely monitoring the subjects' ECG information unlocks the… ▽ More Cardiovascular diseases (CVDs) are the number one cause of death worldwide. While there is growing evidence that the atrial fibrillation (AF) has strong associations with various CVDs, this heart arrhythmia is usually diagnosed using electrocardiography (ECG) which is a risk-free, non-intrusive, and cost-efficient tool. Continuously and remotely monitoring the subjects' ECG information unlocks the potentials of prompt pre-diagnosis and timely pre-treatment of AF before the development of any life-threatening conditions/diseases. Ultimately, the CVDs associated mortality could be reduced. In this manuscript, the design and implementation of a personalized healthcare system embodying a wearable ECG device, a mobile application, and a back-end server are presented. This system continuously monitors the users' ECG information to provide personalized health warnings/feedbacks. The users are able to communicate with their paired health advisors through this system for remote diagnoses, interventions, etc. The implemented wearable ECG devices have been evaluated and showed excellent intra-consistency (CVRMS=5.5%), acceptable inter-consistency (CVRMS=12.1%), and negligible RR-interval errors (ARE<1.4%). To boost the battery life of the wearable devices, a lossy compression schema utilizing the quasi-periodic feature of ECG signals to achieve compression was proposed. Compared to the recognized schemata, it outperformed the others in terms of compression efficiency and distortion, and achieved at least 2x of CR at a certain PRD or RMSE for ECG signals from the MIT-BIH database. To enable automated AF diagnosis/screening in the proposed system, a ResNet-based AF detector was developed. For the ECG records from the 2017 PhysioNet CinC challenge, this AF detector obtained an average testing F1=85.10% and a best testing F1=87.31%, outperforming the state-of-the-art. △ Less

Submitted 11 July, 2022; originally announced July 2022.

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur… ▽ More Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting. △ Less

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2204.13291 [pdf, other]

Decision Models for Selecting Federated Learning Architecture Patterns

Authors: Sin Kit Lo, Qinghua Lu, Hye-Young Paik, Liming Zhu

Abstract: Federated machine learning is growing fast in academia and industries as a solution to solve data hungriness and privacy issues in machine learning. Being a widely distributed system, federated machine learning requires various system design thinking. To better design a federated machine learning system, researchers have introduced multiple patterns and tactics that cover various system design asp… ▽ More Federated machine learning is growing fast in academia and industries as a solution to solve data hungriness and privacy issues in machine learning. Being a widely distributed system, federated machine learning requires various system design thinking. To better design a federated machine learning system, researchers have introduced multiple patterns and tactics that cover various system design aspects. However, the multitude of patterns leaves the designers confused about when and which pattern to adopt. In this paper, we present a set of decision models for the selection of patterns for federated machine learning architecture design based on a systematic literature review on federated machine learning, to assist designers and architects who have limited knowledge of federated machine learning. Each decision model maps functional and non-functional requirements of federated machine learning systems to a set of patterns. We also clarify the drawbacks of the patterns. We evaluated the decision models by mapping the decision patterns to concrete federated machine learning architectures by big tech firms to assess the models' correctness and usefulness. The evaluation results indicate that the proposed decision models are able to bring structure to the federated machine learning architecture design process and help explicitly articulate the design rationale. △ Less

Submitted 27 April, 2023; v1 submitted 28 April, 2022; originally announced April 2022.

arXiv:2202.09300 [pdf, other]

Exploring Adversarially Robust Training for Unsupervised Domain Adaptation

Authors: Shao-Yuan Lo, Vishal M. Patel

Abstract: Unsupervised Domain Adaptation (UDA) methods aim to transfer knowledge from a labeled source domain to an unlabeled target domain. UDA has been extensively studied in the computer vision literature. Deep networks have been shown to be vulnerable to adversarial attacks. However, very little focus is devoted to improving the adversarial robustness of deep UDA models, causing serious concerns about m… ▽ More Unsupervised Domain Adaptation (UDA) methods aim to transfer knowledge from a labeled source domain to an unlabeled target domain. UDA has been extensively studied in the computer vision literature. Deep networks have been shown to be vulnerable to adversarial attacks. However, very little focus is devoted to improving the adversarial robustness of deep UDA models, causing serious concerns about model reliability. Adversarial Training (AT) has been considered to be the most successful adversarial defense approach. Nevertheless, conventional AT requires ground-truth labels to generate adversarial examples and train models, which limits its effectiveness in the unlabeled target domain. In this paper, we aim to explore AT to robustify UDA models: How to enhance the unlabeled data robustness via AT while learning domain-invariant features for UDA? To answer this question, we provide a systematic study into multiple AT variants that can potentially be applied to UDA. Moreover, we propose a novel Adversarially Robust Training method for UDA accordingly, referred to as ARTUDA. Extensive experiments on multiple adversarial attacks and UDA benchmarks show that ARTUDA consistently improves the adversarial robustness of UDA models. Code is available at https://github.com/shaoyuanlo/ARTUDA △ Less

Submitted 4 October, 2022; v1 submitted 18 February, 2022; originally announced February 2022.

Comments: Accepted at Asian Conference on Computer Vision (ACCV) 2022

arXiv:2202.07983 [pdf, other]

doi 10.1109/TMI.2022.3172773

ADAM Challenge: Detecting Age-related Macular Degeneration from Fundus Images

Authors: Huihui Fang, Fei Li, Huazhu Fu, Xu Sun, Xingxing Cao, Fengbin Lin, Jaemin Son, Sunho Kim, Gwenole Quellec, Sarah Matta, Sharath M Shankaranarayana, Yi-Ting Chen, Chuen-heng Wang, Nisarg A. Shah, Chia-Yen Lee, Chih-Chung Hsu, Hai Xie, Baiying Lei, Ujjwal Baid, Shubham Innani, Kang Dang, Wenxiu Shi, Ravi Kamble, Nitin Singhal, Ching-Wei Wang , et al. (6 additional authors not shown)

Abstract: Age-related macular degeneration (AMD) is the leading cause of visual impairment among elderly in the world. Early detection of AMD is of great importance, as the vision loss caused by this disease is irreversible and permanent. Color fundus photography is the most cost-effective imaging modality to screen for retinal disorders. Cutting edge deep learning based algorithms have been recently develo… ▽ More Age-related macular degeneration (AMD) is the leading cause of visual impairment among elderly in the world. Early detection of AMD is of great importance, as the vision loss caused by this disease is irreversible and permanent. Color fundus photography is the most cost-effective imaging modality to screen for retinal disorders. Cutting edge deep learning based algorithms have been recently developed for automatically detecting AMD from fundus images. However, there are still lack of a comprehensive annotated dataset and standard evaluation benchmarks. To deal with this issue, we set up the Automatic Detection challenge on Age-related Macular degeneration (ADAM), which was held as a satellite event of the ISBI 2020 conference. The ADAM challenge consisted of four tasks which cover the main aspects of detecting and characterizing AMD from fundus images, including detection of AMD, detection and segmentation of optic disc, localization of fovea, and detection and segmentation of lesions. As part of the challenge, we have released a comprehensive dataset of 1200 fundus images with AMD diagnostic labels, pixel-wise segmentation masks for both optic disc and AMD-related lesions (drusen, exudates, hemorrhages and scars, among others), as well as the coordinates corresponding to the location of the macular fovea. A uniform evaluation framework has been built to make a fair comparison of different models using this dataset. During the challenge, 610 results were submitted for online evaluation, with 11 teams finally participating in the onsite challenge. This paper introduces the challenge, the dataset and the evaluation methods, as well as summarizes the participating methods and analyzes their results for each task. In particular, we observed that the ensembling strategy and the incorporation of clinical domain knowledge were the key to improve the performance of the deep learning models. △ Less

Submitted 6 May, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: 31 pages, 17 figures

arXiv:2112.02997 [pdf, other]

Language Semantics Interpretation with an Interaction-based Recurrent Neural Networks

Authors: Shaw-Hwa Lo, Yiqiao Yin

Abstract: Text classification is a fundamental language task in Natural Language Processing. A variety of sequential models is capable making good predictions yet there is lack of connection between language semantics and prediction results. This paper proposes a novel influence score (I-score), a greedy search algorithm called Backward Dropping Algorithm (BDA), and a novel feature engineering technique cal… ▽ More Text classification is a fundamental language task in Natural Language Processing. A variety of sequential models is capable making good predictions yet there is lack of connection between language semantics and prediction results. This paper proposes a novel influence score (I-score), a greedy search algorithm called Backward Dropping Algorithm (BDA), and a novel feature engineering technique called the "dagger technique". First, the paper proposes a novel influence score (I-score) to detect and search for the important language semantics in text document that are useful for making good prediction in text classification tasks. Next, a greedy search algorithm called the Backward Dropping Algorithm is proposed to handle long-term dependencies in the dataset. Moreover, the paper proposes a novel engineering technique called the "dagger technique" that fully preserve the relationship between explanatory variable and response variable. The proposed techniques can be further generalized into any feed-forward Artificial Neural Networks (ANNs) and Convolutional Neural Networks (CNNs), and any neural network. A real-world application on the Internet Movie Database (IMDB) is used and the proposed methods are applied to improve prediction performance with an 81% error reduction comparing with other popular peers if I-score and "dagger technique" are not implemented. △ Less

Submitted 1 November, 2021; originally announced December 2021.

arXiv:2111.04096 [pdf, other]

Online Mutual Adaptation of Deep Depth Prediction and Visual SLAM

Authors: Shing Yan Loo, Moein Shakeri, Sai Hong Tang, Syamsiah Mashohor, Hong Zhang

Abstract: The ability of accurate depth prediction by a convolutional neural network (CNN) is a major challenge for its wide use in practical visual simultaneous localization and mapping (SLAM) applications, such as enhanced camera tracking and dense mapping. This paper is set out to answer the following question: Can we tune a depth prediction CNN with the help of a visual SLAM algorithm even if the CNN is… ▽ More The ability of accurate depth prediction by a convolutional neural network (CNN) is a major challenge for its wide use in practical visual simultaneous localization and mapping (SLAM) applications, such as enhanced camera tracking and dense mapping. This paper is set out to answer the following question: Can we tune a depth prediction CNN with the help of a visual SLAM algorithm even if the CNN is not trained for the current operating environment in order to benefit the SLAM performance? To this end, we propose a novel online adaptation framework consisting of two complementary processes: a SLAM algorithm that is used to generate keyframes to fine-tune the depth prediction and another algorithm that uses the online adapted depth to improve map quality. Once the potential noisy map points are removed, we perform global photometric bundle adjustment (BA) to improve the overall SLAM performance. Experimental results on both benchmark datasets and a real robot in our own experimental environments show that our proposed method improves the overall SLAM accuracy. While regularization has been shown to be effective in multi-task classification problems, we present experimental results and an ablation study to show the effectiveness of regularization in preventing catastrophic forgetting in the online adaptation of depth prediction, a single-task regression problem. In addition, we compare our online adaptation framework against the state-of-the-art pre-trained depth prediction CNNs to show that our online adapted depth prediction CNN outperforms the depth prediction CNNs that have been trained on a large collection of datasets. △ Less

Submitted 1 February, 2022; v1 submitted 7 November, 2021; originally announced November 2021.

Comments: 11 pages, 6 figures

arXiv:2108.11168 [pdf, other]

Adversarially Robust One-class Novelty Detection

Authors: Shao-Yuan Lo, Poojan Oza, Vishal M. Patel

Abstract: One-class novelty detectors are trained with examples of a particular class and are tasked with identifying whether a query example belongs to the same known class. Most recent advances adopt a deep auto-encoder style architecture to compute novelty scores for detecting novel class data. Deep networks have shown to be vulnerable to adversarial attacks, yet little focus is devoted to studying the a… ▽ More One-class novelty detectors are trained with examples of a particular class and are tasked with identifying whether a query example belongs to the same known class. Most recent advances adopt a deep auto-encoder style architecture to compute novelty scores for detecting novel class data. Deep networks have shown to be vulnerable to adversarial attacks, yet little focus is devoted to studying the adversarial robustness of deep novelty detectors. In this paper, we first show that existing novelty detectors are susceptible to adversarial examples. We further demonstrate that commonly-used defense approaches for classification tasks have limited effectiveness in one-class novelty detection. Hence, we need a defense specifically designed for novelty detection. To this end, we propose a defense strategy that manipulates the latent space of novelty detectors to improve the robustness against adversarial examples. The proposed method, referred to as Principal Latent Space (PrincipaLS), learns the incrementally-trained cascade principal components in the latent space to robustify novelty detectors. PrincipaLS can purify latent space against adversarial examples and constrain latent space to exclusively model the known class distribution. We conduct extensive experiments on eight attacks, five datasets and seven novelty detectors, showing that PrincipaLS consistently enhances the adversarial robustness of novelty detection models. Code is available at https://github.com/shaoyuanlo/PrincipaLS △ Less

Submitted 3 October, 2022; v1 submitted 25 August, 2021; originally announced August 2021.

Comments: Accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2022

arXiv:2108.06912 [pdf, other]

Blockchain-based Trustworthy Federated Learning Architecture

Authors: Sin Kit Lo, Yue Liu, Qinghua Lu, Chen Wang, Xiwei Xu, Hye-Young Paik, Liming Zhu

Abstract: Federated learning is an emerging privacy-preserving AI technique where clients (i.e., organisations or devices) train models locally and formulate a global model based on the local model updates without transferring local data externally. However, federated learning systems struggle to achieve trustworthiness and embody responsible AI principles. In particular, federated learning systems face acc… ▽ More Federated learning is an emerging privacy-preserving AI technique where clients (i.e., organisations or devices) train models locally and formulate a global model based on the local model updates without transferring local data externally. However, federated learning systems struggle to achieve trustworthiness and embody responsible AI principles. In particular, federated learning systems face accountability and fairness challenges due to multi-stakeholder involvement and heterogeneity in client data distribution. To enhance the accountability and fairness of federated learning systems, we present a blockchain-based trustworthy federated learning architecture. We first design a smart contract-based data-model provenance registry to enable accountability. Additionally, we propose a weighted fair data sampler algorithm to enhance fairness in training data. We evaluate the proposed approach using a COVID-19 X-ray detection use case. The evaluation results show that the approach is feasible to enable accountability and improve fairness. The proposed algorithm can achieve better performance than the default federated learning setting in terms of the model's generalisation and accuracy. △ Less

Submitted 28 October, 2021; v1 submitted 16 August, 2021; originally announced August 2021.

arXiv:2106.11570 [pdf, other]

FLRA: A Reference Architecture for Federated Learning Systems

Authors: Sin Kit Lo, Qinghua Lu, Hye-Young Paik, Liming Zhu

Abstract: Federated learning is an emerging machine learning paradigm that enables multiple devices to train models locally and formulate a global model, without sharing the clients' local data. A federated learning system can be viewed as a large-scale distributed system, involving different components and stakeholders with diverse requirements and constraints. Hence, developing a federated learning system… ▽ More Federated learning is an emerging machine learning paradigm that enables multiple devices to train models locally and formulate a global model, without sharing the clients' local data. A federated learning system can be viewed as a large-scale distributed system, involving different components and stakeholders with diverse requirements and constraints. Hence, developing a federated learning system requires both software system design thinking and machine learning knowledge. Although much effort has been put into federated learning from the machine learning perspectives, our previous systematic literature review on the area shows that there is a distinct lack of considerations for software architecture design for federated learning. In this paper, we propose FLRA, a reference architecture for federated learning systems, which provides a template design for federated learning-based solutions. The proposed FLRA reference architecture is based on an extensive review of existing patterns of federated learning systems found in the literature and existing industrial implementation. The FLRA reference architecture consists of a pool of architectural patterns that could address the frequently recurring design problems in federated learning architectures. The FLRA reference architecture can serve as a design guideline to assist architects and developers with practical solutions for their problems, which can be further customised. △ Less

Submitted 22 June, 2021; originally announced June 2021.

Comments: Accepted by ECSA 2021

arXiv:2106.06911 [pdf, other]

doi 10.3390/a14110337

An Interaction-based Convolutional Neural Network (ICNN) Towards Better Understanding of COVID-19 X-ray Images

Authors: Shaw-Hwa Lo, Yiqiao Yin

Abstract: The field of Explainable Artificial Intelligence (XAI) aims to build explainable and interpretable machine learning (or deep learning) methods without sacrificing prediction performance. Convolutional Neural Networks (CNNs) have been successful in making predictions, especially in image classification. However, these famous deep learning models use tens of millions of parameters based on a large n… ▽ More The field of Explainable Artificial Intelligence (XAI) aims to build explainable and interpretable machine learning (or deep learning) methods without sacrificing prediction performance. Convolutional Neural Networks (CNNs) have been successful in making predictions, especially in image classification. However, these famous deep learning models use tens of millions of parameters based on a large number of pre-trained filters which have been repurposed from previous data sets. We propose a novel Interaction-based Convolutional Neural Network (ICNN) that does not make assumptions about the relevance of local information. Instead, we use a model-free Influence Score (I-score) to directly extract the influential information from images to form important variable modules. We demonstrate that the proposed method produces state-of-the-art prediction performance of 99.8% on a real-world data set classifying COVID-19 Chest X-ray images without sacrificing the explanatory power of the model. This proposed design can efficiently screen COVID-19 patients before human diagnosis, and will be the benchmark for addressing future XAI problems in large-scale data sets. △ Less

Submitted 13 June, 2021; originally announced June 2021.

Report number: 337

Journal ref: Algorithms 2021

arXiv:2104.12672 [pdf, other]

doi 10.3390/a14110337

A Novel Interaction-based Methodology Towards Explainable AI with Better Understanding of Pneumonia Chest X-ray Images

Authors: Shaw-Hwa Lo, Yiqiao Yin

Abstract: In the field of eXplainable AI (XAI), robust "blackbox" algorithms such as Convolutional Neural Networks (CNNs) are known for making high prediction performance. However, the ability to explain and interpret these algorithms still require innovation in the understanding of influential and, more importantly, explainable features that directly or indirectly impact the performance of predictivity. A… ▽ More In the field of eXplainable AI (XAI), robust "blackbox" algorithms such as Convolutional Neural Networks (CNNs) are known for making high prediction performance. However, the ability to explain and interpret these algorithms still require innovation in the understanding of influential and, more importantly, explainable features that directly or indirectly impact the performance of predictivity. A number of methods existing in literature focus on visualization techniques but the concepts of explainability and interpretability still require rigorous definition. In view of the above needs, this paper proposes an interaction-based methodology -- Influence Score (I-score) -- to screen out the noisy and non-informative variables in the images hence it nourishes an environment with explainable and interpretable features that are directly associated to feature predictivity. We apply the proposed method on a real world application in Pneumonia Chest X-ray Image data set and produced state-of-the-art results. We demonstrate how to apply the proposed approach for more general big data problems by improving the explainability and interpretability without sacrificing the prediction performance. The contribution of this paper opens a novel angle that moves the community closer to the future pipelines of XAI problems. △ Less

Submitted 15 June, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

Journal ref: Algorithms 2021, 14(11), 337

arXiv:2103.05927 [pdf]

Deep Sensing of Urban Waterlogging

Authors: Shi-Wei Lo, Jyh-Horng Wu, Jo-Yu Chang, Chien-Hao Tseng, Meng-Wei Lin, Fang-Pang Lin

Abstract: In the monsoon season, sudden flood events occur frequently in urban areas, which hamper the social and economic activities and may threaten the infrastructure and lives. The use of an efficient large-scale waterlogging sensing and information system can provide valuable real-time disaster information to facilitate disaster management and enhance awareness of the general public to alleviate losses… ▽ More In the monsoon season, sudden flood events occur frequently in urban areas, which hamper the social and economic activities and may threaten the infrastructure and lives. The use of an efficient large-scale waterlogging sensing and information system can provide valuable real-time disaster information to facilitate disaster management and enhance awareness of the general public to alleviate losses during and after flood disasters. Therefore, in this study, a visual sensing approach driven by deep neural networks and information and communication technology was developed to provide an end-to-end mechanism to realize waterlogging sensing and event-location mapping. The use of a deep sensing system in the monsoon season in Taiwan was demonstrated, and waterlogging events were predicted on the island-wide scale. The system could sense approximately 2379 vision sources through an internet of video things framework and transmit the event-location information in 5 min. The proposed approach can sense waterlogging events at a national scale and provide an efficient and highly scalable alternative to conventional waterlogging sensing methods. △ Less

Submitted 15 August, 2021; v1 submitted 10 March, 2021; originally announced March 2021.

Comments: 19 pages, 14 figures, under submitting and patenting

Report number: revise-2021-05-25

arXiv:2102.05212 [pdf, other]

Polarimetric Monocular Dense Mapping Using Relative Deep Depth Prior

Authors: Moein Shakeri, Shing Yan Loo, Hong Zhang

Abstract: This paper is concerned with polarimetric dense map reconstruction based on a polarization camera with the help of relative depth information as a prior. In general, polarization imaging is able to reveal information about surface normal such as azimuth and zenith angles, which can support the development of solutions to the problem of dense reconstruction, especially in texture-poor regions. Howe… ▽ More This paper is concerned with polarimetric dense map reconstruction based on a polarization camera with the help of relative depth information as a prior. In general, polarization imaging is able to reveal information about surface normal such as azimuth and zenith angles, which can support the development of solutions to the problem of dense reconstruction, especially in texture-poor regions. However, polarimetric shape cues are ambiguous due to two types of polarized reflection (specular/diffuse). Although methods have been proposed to address this issue, they either are offline and therefore not practical in robotics applications, or use incomplete polarimetric cues, leading to sub-optimal performance. In this paper, we propose an online reconstruction method that uses full polarimetric cues available from the polarization camera. With our online method, we can propagate sparse depth values both along and perpendicular to iso-depth contours. Through comprehensive experiments on challenging image sequences, we demonstrate that our method is able to significantly improve the accuracy of the depthmap as well as increase its density, specially in regions of poor texture. △ Less

Submitted 9 February, 2021; originally announced February 2021.

Comments: 9 pages, 9 figure

arXiv:2101.09451 [pdf, other]

Error Diffusion Halftoning Against Adversarial Examples

Authors: Shao-Yuan Lo, Vishal M. Patel

Abstract: Adversarial examples contain carefully crafted perturbations that can fool deep neural networks (DNNs) into making wrong predictions. Enhancing the adversarial robustness of DNNs has gained considerable interest in recent years. Although image transformation-based defenses were widely considered at an earlier time, most of them have been defeated by adaptive attacks. In this paper, we propose a ne… ▽ More Adversarial examples contain carefully crafted perturbations that can fool deep neural networks (DNNs) into making wrong predictions. Enhancing the adversarial robustness of DNNs has gained considerable interest in recent years. Although image transformation-based defenses were widely considered at an earlier time, most of them have been defeated by adaptive attacks. In this paper, we propose a new image transformation defense based on error diffusion halftoning, and combine it with adversarial training to defend against adversarial examples. Error diffusion halftoning projects an image into a 1-bit space and diffuses quantization error to neighboring pixels. This process can remove adversarial perturbations from a given image while maintaining acceptable image quality in the meantime in favor of recognition. Experimental results demonstrate that the proposed method is able to improve adversarial robustness even under advanced adaptive attacks, while most of the other image transformation-based defenses do not. We show that a proper image transformation can still be an effective defense approach. Code: https://github.com/shaoyuanlo/Halftoning-Defense △ Less

Submitted 24 July, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

Comments: Accepted at IEEE International Conference on Image Processing (ICIP) 2021

arXiv:2101.02373 [pdf, other]

Architectural Patterns for the Design of Federated Learning Systems

Authors: Sin Kit Lo, Qinghua Lu, Liming Zhu, Hye-young Paik, Xiwei Xu, Chen Wang

Abstract: Federated learning has received fast-growing interests from academia and industry to tackle the challenges of data hungriness and privacy in machine learning. A federated learning system can be viewed as a large-scale distributed system with different components and stakeholders as numerous client devices participate in federated learning. Designing a federated learning system requires software sy… ▽ More Federated learning has received fast-growing interests from academia and industry to tackle the challenges of data hungriness and privacy in machine learning. A federated learning system can be viewed as a large-scale distributed system with different components and stakeholders as numerous client devices participate in federated learning. Designing a federated learning system requires software system design thinking apart from machine learning knowledge. Although much effort has been put into federated learning from the machine learning technique aspects, the software architecture design concerns in building federated learning systems have been largely ignored. Therefore, in this paper, we present a collection of architectural patterns to deal with the design challenges of federated learning systems. Architectural patterns present reusable solutions to a commonly occurring problem within a given context during software architecture design. The presented patterns are based on the results of a systematic literature review and include three client management patterns, four model management patterns, three model training patterns, and four model aggregation patterns. The patterns are associated to the particular state transitions in a federated learning model lifecycle, serving as a guidance for effective use of the patterns in the design of federated learning systems. △ Less

Submitted 18 June, 2021; v1 submitted 7 January, 2021; originally announced January 2021.

Comments: Resubmitted after minor revision to Elsevier's Journal of Systems and Software, Special issue on Software Architecture and Artificial Intelligence

arXiv:2012.04262 [pdf, other]

Overcomplete Representations Against Adversarial Videos

Authors: Shao-Yuan Lo, Jeya Maria Jose Valanarasu, Vishal M. Patel

Abstract: Adversarial robustness of deep neural networks is an extensively studied problem in the literature and various methods have been proposed to defend against adversarial images. However, only a handful of defense methods have been developed for defending against attacked videos. In this paper, we propose a novel Over-and-Under complete restoration network for Defending against adversarial videos (OU… ▽ More Adversarial robustness of deep neural networks is an extensively studied problem in the literature and various methods have been proposed to defend against adversarial images. However, only a handful of defense methods have been developed for defending against attacked videos. In this paper, we propose a novel Over-and-Under complete restoration network for Defending against adversarial videos (OUDefend). Most restoration networks adopt an encoder-decoder architecture that first shrinks spatial dimension then expands it back. This approach learns undercomplete representations, which have large receptive fields to collect global information but overlooks local details. On the other hand, overcomplete representations have opposite properties. Hence, OUDefend is designed to balance local and global features by learning those two representations. We attach OUDefend to target video recognition models as a feature restoration block and train the entire network end-to-end. Experimental results show that the defenses focusing on images may be ineffective to videos, while OUDefend enhances robustness against different types of adversarial videos, ranging from additive attacks, multiplicative attacks to physically realizable attacks. Code: https://github.com/shaoyuanlo/OUDefend △ Less

Submitted 14 June, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: Accepted at IEEE International Conference on Image Processing (ICIP) 2021

arXiv:2009.10401 [pdf, other]

Dynamic Fusion based Federated Learning for COVID-19 Detection

Authors: Weishan Zhang, Tao Zhou, Qinghua Lu, Xiao Wang, Chunsheng Zhu, Haoyun Sun, Zhipeng Wang, Sin Kit Lo, Fei-Yue Wang

Abstract: Medical diagnostic image analysis (e.g., CT scan or X-Ray) using machine learning is an efficient and accurate way to detect COVID-19 infections. However, sharing diagnostic images across medical institutions is usually not allowed due to the concern of patients' privacy. This causes the issue of insufficient datasets for training the image classification model. Federated learning is an emerging p… ▽ More Medical diagnostic image analysis (e.g., CT scan or X-Ray) using machine learning is an efficient and accurate way to detect COVID-19 infections. However, sharing diagnostic images across medical institutions is usually not allowed due to the concern of patients' privacy. This causes the issue of insufficient datasets for training the image classification model. Federated learning is an emerging privacy-preserving machine learning paradigm that produces an unbiased global model based on the received updates of local models trained by clients without exchanging clients' local data. Nevertheless, the default setting of federated learning introduces huge communication cost of transferring model updates and can hardly ensure model performance when data heterogeneity of clients heavily exists. To improve communication efficiency and model performance, in this paper, we propose a novel dynamic fusion-based federated learning approach for medical diagnostic image analysis to detect COVID-19 infections. First, we design an architecture for dynamic fusion-based federated learning systems to analyse medical diagnostic images. Further, we present a dynamic fusion method to dynamically decide the participating clients according to their local model performance and schedule the model fusion-based on participating clients' training time. In addition, we summarise a category of medical diagnostic image datasets for COVID-19 detection, which can be used by the machine learning community for image analysis. The evaluation results show that the proposed approach is feasible and performs better than the default setting of federated learning in terms of model performance, communication efficiency and fault tolerance. △ Less

Submitted 25 October, 2020; v1 submitted 22 September, 2020; originally announced September 2020.

arXiv:2009.08058 [pdf, other]

MultAV: Multiplicative Adversarial Videos

Authors: Shao-Yuan Lo, Vishal M. Patel

Abstract: The majority of adversarial machine learning research focuses on additive attacks, which add adversarial perturbation to input data. On the other hand, unlike image recognition problems, only a handful of attack approaches have been explored in the video domain. In this paper, we propose a novel attack method against video recognition models, Multiplicative Adversarial Videos (MultAV), which impos… ▽ More The majority of adversarial machine learning research focuses on additive attacks, which add adversarial perturbation to input data. On the other hand, unlike image recognition problems, only a handful of attack approaches have been explored in the video domain. In this paper, we propose a novel attack method against video recognition models, Multiplicative Adversarial Videos (MultAV), which imposes perturbation on video data by multiplication. MultAV has different noise distributions to the additive counterparts and thus challenges the defense methods tailored to resisting additive adversarial attacks. Moreover, it can be generalized to not only Lp-norm attacks with a new adversary constraint called ratio bound, but also different types of physically realizable attacks. Experimental results show that the model adversarially trained against additive attack is less robust to MultAV. △ Less

Submitted 10 October, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

Comments: Accepted at IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS) 2021

arXiv:2009.05244 [pdf, other]

Defending Against Multiple and Unforeseen Adversarial Videos

Authors: Shao-Yuan Lo, Vishal M. Patel

Abstract: Adversarial robustness of deep neural networks has been actively investigated. However, most existing defense approaches are limited to a specific type of adversarial perturbations. Specifically, they often fail to offer resistance to multiple attack types simultaneously, i.e., they lack multi-perturbation robustness. Furthermore, compared to image recognition problems, the adversarial robustness… ▽ More Adversarial robustness of deep neural networks has been actively investigated. However, most existing defense approaches are limited to a specific type of adversarial perturbations. Specifically, they often fail to offer resistance to multiple attack types simultaneously, i.e., they lack multi-perturbation robustness. Furthermore, compared to image recognition problems, the adversarial robustness of video recognition models is relatively unexplored. While several studies have proposed how to generate adversarial videos, only a handful of approaches about defense strategies have been published in the literature. In this paper, we propose one of the first defense strategies against multiple types of adversarial videos for video recognition. The proposed method, referred to as MultiBN, performs adversarial training on multiple adversarial video types using multiple independent batch normalization (BN) layers with a learning-based BN selection module. With a multiple BN structure, each BN brach is responsible for learning the distribution of a single perturbation type and thus provides more precise distribution estimations. This mechanism benefits dealing with multiple perturbation types. The BN selection module detects the attack type of an input video and sends it to the corresponding BN branch, making MultiBN fully automatic and allowing end-to-end training. Compared to present adversarial training approaches, the proposed MultiBN exhibits stronger multi-perturbation robustness against different and even unforeseen adversarial video types, ranging from Lp-bounded attacks and physically realizable attacks. This holds true on different datasets and target models. Moreover, we conduct an extensive analysis to study the properties of the multiple BN structure. △ Less

Submitted 14 December, 2021; v1 submitted 11 September, 2020; originally announced September 2020.

Comments: Accepted in IEEE Transactions on Image Processing (TIP)

arXiv:2009.02643 [pdf, other]

Blockchain-based Federated Learning for Device Failure Detection in Industrial IoT

Authors: Weishan Zhang, Qinghua Lu, Qiuyu Yu, Zhaotong Li, Yue Liu, Sin Kit Lo, Shiping Chen, Xiwei Xu, Liming Zhu

Abstract: Device failure detection is one of most essential problems in industrial internet of things (IIoT). However, in conventional IIoT device failure detection, client devices need to upload raw data to the central server for model training, which might lead to disclosure of sensitive business data. Therefore, in this paper, to ensure client data privacy, we propose a blockchain-based federated learnin… ▽ More Device failure detection is one of most essential problems in industrial internet of things (IIoT). However, in conventional IIoT device failure detection, client devices need to upload raw data to the central server for model training, which might lead to disclosure of sensitive business data. Therefore, in this paper, to ensure client data privacy, we propose a blockchain-based federated learning approach for device failure detection in IIoT. First, we present a platform architecture of blockchain-based federated learning systems for failure detection in IIoT, which enables verifiable integrity of client data. In the architecture, each client periodically creates a Merkle tree in which each leaf node represents a client data record, and stores the tree root on a blockchain. Further, to address the data heterogeneity issue in IIoT failure detection, we propose a novel centroid distance weighted federated averaging (CDW\_FedAvg) algorithm taking into account the distance between positive class and negative class of each client dataset. In addition, to motivate clients to participate in federated learning, a smart contact based incentive mechanism is designed depending on the size and the centroid distance of client data used in local model training. A prototype of the proposed architecture is implemented with our industry partner, and evaluated in terms of feasibility, accuracy and performance. The results show that the approach is feasible, and has satisfactory accuracy and performance. △ Less

Submitted 18 October, 2020; v1 submitted 6 September, 2020; originally announced September 2020.

Comments: accepted by IEEE Internet of Things Journal

arXiv:2007.11354 [pdf, other]

doi 10.1145/3450288

A Systematic Literature Review on Federated Machine Learning: From A Software Engineering Perspective

Authors: Sin Kit Lo, Qinghua Lu, Chen Wang, Hye-Young Paik, Liming Zhu

Abstract: Federated learning is an emerging machine learning paradigm where clients train models locally and formulate a global model based on the local model updates. To identify the state-of-the-art in federated learning and explore how to develop federated learning systems, we perform a systematic literature review from a software engineering perspective, based on 231 primary studies. Our data synthesis… ▽ More Federated learning is an emerging machine learning paradigm where clients train models locally and formulate a global model based on the local model updates. To identify the state-of-the-art in federated learning and explore how to develop federated learning systems, we perform a systematic literature review from a software engineering perspective, based on 231 primary studies. Our data synthesis covers the lifecycle of federated learning system development that includes background understanding, requirement analysis, architecture design, implementation, and evaluation. We highlight and summarise the findings from the results, and identify future trends to encourage researchers to advance their current work. △ Less

Submitted 28 May, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

Comments: Published on ACM Computing Survey. Latest version available here: https://dl.acm.org/doi/10.1145/3450288

Journal ref: ACM.CSUR.54.95 (2021) 1-39

arXiv:2006.04047 [pdf, ps, other]

DeepRelativeFusion: Dense Monocular SLAM using Single-Image Relative Depth Prediction

Authors: Shing Yan Loo, Syamsiah Mashohor, Sai Hong Tang, Hong Zhang

Abstract: In this paper, we propose a dense monocular SLAM system, named DeepRelativeFusion, that is capable to recover a globally consistent 3D structure. To this end, we use a visual SLAM algorithm to reliably recover the camera poses and semi-dense depth maps of the keyframes, and then use relative depth prediction to densify the semi-dense depth maps and refine the keyframe pose-graph. To improve the se… ▽ More In this paper, we propose a dense monocular SLAM system, named DeepRelativeFusion, that is capable to recover a globally consistent 3D structure. To this end, we use a visual SLAM algorithm to reliably recover the camera poses and semi-dense depth maps of the keyframes, and then use relative depth prediction to densify the semi-dense depth maps and refine the keyframe pose-graph. To improve the semi-dense depth maps, we propose an adaptive filtering scheme, which is a structure-preserving weighted average smoothing filter that takes into account the pixel intensity and depth of the neighbouring pixels, yielding substantial reconstruction accuracy gain in densification. To perform densification, we introduce two incremental improvements upon the energy minimization framework proposed by DeepFusion: (1) an improved cost function, and (2) the use of single-image relative depth prediction. After densification, we update the keyframes with two-view consistent optimized semi-dense and dense depth maps to improve pose-graph optimization, providing a feedback loop to refine the keyframe poses for accurate scene reconstruction. Our system outperforms the state-of-the-art dense SLAM systems quantitatively in dense reconstruction accuracy by a large margin. △ Less

Submitted 9 July, 2021; v1 submitted 7 June, 2020; originally announced June 2020.

Comments: Accepted to be published in the Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)

arXiv:1907.10015 [pdf]

Exploring Semantic Segmentation on the DCT Representation

Authors: Shao-Yuan Lo, Hsueh-Ming Hang

Abstract: Typical convolutional networks are trained and conducted on RGB images. However, images are often compressed for memory savings and efficient transmission in real-world applications. In this paper, we explore methods for performing semantic segmentation on the discrete cosine transform (DCT) representation defined by the JPEG standard. We first rearrange the DCT coefficients to form a preferred in… ▽ More Typical convolutional networks are trained and conducted on RGB images. However, images are often compressed for memory savings and efficient transmission in real-world applications. In this paper, we explore methods for performing semantic segmentation on the discrete cosine transform (DCT) representation defined by the JPEG standard. We first rearrange the DCT coefficients to form a preferred input type, then we tailor an existing network to the DCT inputs. The proposed method has an accuracy close to the RGB model at about the same network complexity. Moreover, we investigate the impact of selecting different DCT components on segmentation performance. With a proper selection, one can achieve the same level accuracy using only 36% of the DCT coefficients. We further show the robustness of our method under the quantization errors. To our knowledge, this paper is the first to explore semantic segmentation on the DCT representation. △ Less

Submitted 29 December, 2019; v1 submitted 23 July, 2019; originally announced July 2019.

Comments: Accepted in ACM International Conference on Multimedia in Asia (MMAsia) 2019

arXiv:1907.09438 [pdf]

Multi-Class Lane Semantic Segmentation using Efficient Convolutional Networks

Authors: Shao-Yuan Lo, Hsueh-Ming Hang, Sheng-Wei Chan, Jing-Jhih Lin

Abstract: Lane detection plays an important role in a self-driving vehicle. Several studies leverage a semantic segmentation network to extract robust lane features, but few of them can distinguish different types of lanes. In this paper, we focus on the problem of multi-class lane semantic segmentation. Based on the observation that the lane is a small-size and narrow-width object in a road scene image, we… ▽ More Lane detection plays an important role in a self-driving vehicle. Several studies leverage a semantic segmentation network to extract robust lane features, but few of them can distinguish different types of lanes. In this paper, we focus on the problem of multi-class lane semantic segmentation. Based on the observation that the lane is a small-size and narrow-width object in a road scene image, we propose two techniques, Feature Size Selection (FSS) and Degressive Dilation Block (DD Block). The FSS allows a network to extract thin lane features using appropriate feature sizes. To acquire fine-grained spatial information, the DD Block is made of a series of dilated convolutions with degressive dilation rates. Experimental results show that the proposed techniques provide obvious improvement in accuracy, while they achieve the same or faster inference speed compared to the baseline system, and can run at real-time on high-resolution images. △ Less

Submitted 22 July, 2019; originally announced July 2019.

Comments: Accepted in IEEE International Workshop on Multimedia Signal Processing (MMSP) 2019

arXiv:1905.07542 [pdf, other]

Semi-Supervised Monocular Depth Estimation with Left-Right Consistency Using Deep Neural Network

Authors: Ali Jahani Amiri, Shing Yan Loo, Hong Zhang

Abstract: There has been tremendous research progress in estimating the depth of a scene from a monocular camera image. Existing methods for single-image depth prediction are exclusively based on deep neural networks, and their training can be unsupervised using stereo image pairs, supervised using LiDAR point clouds, or semi-supervised using both stereo and LiDAR. In general, semi-supervised training is pr… ▽ More There has been tremendous research progress in estimating the depth of a scene from a monocular camera image. Existing methods for single-image depth prediction are exclusively based on deep neural networks, and their training can be unsupervised using stereo image pairs, supervised using LiDAR point clouds, or semi-supervised using both stereo and LiDAR. In general, semi-supervised training is preferred as it does not suffer from the weaknesses of either supervised training, resulting from the difference in the cameras and the LiDARs field of view, or unsupervised training, resulting from the poor depth accuracy that can be recovered from a stereo pair. In this paper, we present our research in single image depth prediction using semi-supervised training that outperforms the state-of-the-art. We achieve this through a loss function that explicitly exploits left-right consistency in a stereo reconstruction, which has not been adopted in previous semi-supervised training. In addition, we describe the correct use of ground truth depth derived from LiDAR that can significantly reduce prediction error. The performance of our depth prediction model is evaluated on popular datasets, and the importance of each aspect of our semi-supervised training approach is demonstrated through experimental results. Our deep neural network model has been made publicly available. △ Less

Submitted 18 May, 2019; originally announced May 2019.

Comments: Submitted to IROS2019

arXiv:1902.06484 [pdf, other]

Find Subtrees of Specified Weight and Cycles of Specified Length in Linear Time

Authors: On-Hei Solomon Lo

Abstract: We apply the Euler tour technique to find subtrees of specified weight as follows. Let $k, g, N_1, N_2 \in \mathbb{N}$ such that $1 \leq k \leq N_2$, $g + h > 2$ and $2k - 4g - h + 3 \leq N_2 \leq 2k + g + h - 2$, where $h := 2N_1 - N_2$. Let $T$ be a tree of $N_1$ vertices and let $c : V(T) \rightarrow \mathbb{N}$ be vertex weights such that $c(T) := \sum_{v \in V(T)} c(v) = N_2$ and… ▽ More We apply the Euler tour technique to find subtrees of specified weight as follows. Let $k, g, N_1, N_2 \in \mathbb{N}$ such that $1 \leq k \leq N_2$, $g + h > 2$ and $2k - 4g - h + 3 \leq N_2 \leq 2k + g + h - 2$, where $h := 2N_1 - N_2$. Let $T$ be a tree of $N_1$ vertices and let $c : V(T) \rightarrow \mathbb{N}$ be vertex weights such that $c(T) := \sum_{v \in V(T)} c(v) = N_2$ and $c(v) \leq k$ for all $v \in V(T)$. We prove that a subtree $S$ of $T$ of weight $k - g + 1 \leq c(S) \leq k$ exists and can be found in linear time. We apply it to show, among others, the following: (i) Every planar hamiltonian graph $G = (V(G), E(G))$ with minimum degree $δ\geq 4$ has a cycle of length $k$ for every $k \in \{\lfloor \frac{|V(G)|}{2} \rfloor, \dots, \lceil \frac{|V(G)|}{2} \rceil + 3\}$ with $3 \leq k \leq |V(G)|$. (ii) Every $3$-connected planar hamiltonian graph $G$ with $δ\geq 4$ and $|V(G)| \geq 8$ even has a cycle of length $\frac{|V(G)|}{2} - 1$ or $\frac{|V(G)|}{2} - 2$. Each of these cycles can be found in linear time if a Hamilton cycle of the graph is given. This work was partially motivated by conjectures of Bondy and Malkevitch on cycle spectra of 4-connected planar graphs. △ Less

Submitted 12 October, 2019; v1 submitted 18 February, 2019; originally announced February 2019.

arXiv:1810.03865 [pdf, other]

Compact Cactus Representations of all Non-Trivial Min-Cuts

Authors: On-Hei Solomon Lo, Jens M. Schmidt, Mikkel Thorup

Abstract: Recently, Kawarabayashi and Thorup presented the first deterministic edge-connectivity recognition algorithm in near-linear time. A crucial step in their algorithm uses the existence of vertex subsets of a simple graph $G$ on $n$ vertices whose contractions leave a multigraph with $\tilde{O}(n/δ)$ vertices and $\tilde{O}(n)$ edges that preserves all non-trivial min-cuts of $G$, where $δ$ is the mi… ▽ More Recently, Kawarabayashi and Thorup presented the first deterministic edge-connectivity recognition algorithm in near-linear time. A crucial step in their algorithm uses the existence of vertex subsets of a simple graph $G$ on $n$ vertices whose contractions leave a multigraph with $\tilde{O}(n/δ)$ vertices and $\tilde{O}(n)$ edges that preserves all non-trivial min-cuts of $G$, where $δ$ is the minimum degree of $G$ and $\tilde{O}$ hides logarithmic factors. We present a simple argument that improves this contraction-based sparsifier by eliminating the poly-logarithmic factors, that is, we show a contraction-based sparsification that leaves $O(n/δ)$ vertices and $O(n)$ edges, preserves all non-trivial min-cuts and can be computed in near-linear time $\tilde{O}(m)$, where $m$ is the number of edges of $G$. We also obtain that every simple graph has $O((n/δ)^2)$ non-trivial min-cuts. Our approach allows to represent all non-trivial min-cuts of a graph by a cactus representation, whose cactus graph has $O(n/δ)$ vertices. Moreover, this cactus representation can be derived directly from the standard cactus representation of all min-cuts in linear time. We apply this compact structure to show that all min-cuts can be explicitly listed in $\tilde{O}(m) + O(n^2 / δ)$ time for every simple graph, which improves the previous best time bound $O(nm)$ given by Gusfield and Naor. △ Less

Submitted 28 October, 2019; v1 submitted 9 October, 2018; originally announced October 2018.

Comments: 12 pages, 3 figures

arXiv:1810.01011 [pdf, ps, other]

CNN-SVO: Improving the Mapping in Semi-Direct Visual Odometry Using Single-Image Depth Prediction

Authors: Shing Yan Loo, Ali Jahani Amiri, Syamsiah Mashohor, Sai Hong Tang, Hong Zhang

Abstract: Reliable feature correspondence between frames is a critical step in visual odometry (VO) and visual simultaneous localization and mapping (V-SLAM) algorithms. In comparison with existing VO and V-SLAM algorithms, semi-direct visual odometry (SVO) has two main advantages that lead to state-of-the-art frame rate camera motion estimation: direct pixel correspondence and efficient implementation of p… ▽ More Reliable feature correspondence between frames is a critical step in visual odometry (VO) and visual simultaneous localization and mapping (V-SLAM) algorithms. In comparison with existing VO and V-SLAM algorithms, semi-direct visual odometry (SVO) has two main advantages that lead to state-of-the-art frame rate camera motion estimation: direct pixel correspondence and efficient implementation of probabilistic mapping method. This paper improves the SVO mapping by initializing the mean and the variance of the depth at a feature location according to the depth prediction from a single-image depth prediction network. By significantly reducing the depth uncertainty of the initialized map point (i.e., small variance centred about the depth prediction), the benefits are twofold: reliable feature correspondence between views and fast convergence to the true depth in order to create new map points. We evaluate our method with two outdoor datasets: KITTI dataset and Oxford Robotcar dataset. The experimental results indicate that the improved SVO mapping results in increased robustness and camera tracking accuracy. △ Less

Submitted 1 October, 2018; originally announced October 2018.

Comments: 6 pages, 5 figures, submitted to ICRA 2019

arXiv:1809.09077 [pdf]

Incorporating Luminance, Depth and Color Information by a Fusion-based Network for Semantic Segmentation

Authors: Shang-Wei Hung, Shao-Yuan Lo, Hsueh-Ming Hang

Abstract: Semantic segmentation has made encouraging progress due to the success of deep convolutional networks in recent years. Meanwhile, depth sensors become prevalent nowadays, so depth maps can be acquired more easily. However, there are few studies that focus on the RGB-D semantic segmentation task. Exploiting the depth information effectiveness to improve performance is a challenge. In this paper, we… ▽ More Semantic segmentation has made encouraging progress due to the success of deep convolutional networks in recent years. Meanwhile, depth sensors become prevalent nowadays, so depth maps can be acquired more easily. However, there are few studies that focus on the RGB-D semantic segmentation task. Exploiting the depth information effectiveness to improve performance is a challenge. In this paper, we propose a novel solution named LDFNet, which incorporates Luminance, Depth and Color information by a fusion-based network. It includes a sub-network to process depth maps and employs luminance images to assist the depth information in processes. LDFNet outperforms the other state-of-art systems on the Cityscapes dataset, and its inference speed is faster than most of the existing networks. The experimental results show the effectiveness of the proposed multi-modal fusion network and its potential for practical applications. △ Less

Submitted 19 May, 2019; v1 submitted 24 September, 2018; originally announced September 2018.

Comments: Accepted in IEEE International Conference on Image Processing (ICIP) 2019

arXiv:1809.06323 [pdf]

Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation

Authors: Shao-Yuan Lo, Hsueh-Ming Hang, Sheng-Wei Chan, Jing-Jhih Lin

Abstract: Real-time semantic segmentation plays an important role in practical applications such as self-driving and robots. Most semantic segmentation research focuses on improving estimation accuracy with little consideration on efficiency. Several previous studies that emphasize high-speed inference often fail to produce high-accuracy segmentation results. In this paper, we propose a novel convolutional… ▽ More Real-time semantic segmentation plays an important role in practical applications such as self-driving and robots. Most semantic segmentation research focuses on improving estimation accuracy with little consideration on efficiency. Several previous studies that emphasize high-speed inference often fail to produce high-accuracy segmentation results. In this paper, we propose a novel convolutional network named Efficient Dense modules with Asymmetric convolution (EDANet), which employs an asymmetric convolution structure and incorporates dilated convolution and dense connectivity to achieve high efficiency at low computational cost and model size. EDANet is 2.7 times faster than the existing fast segmentation network, ICNet, while it achieves a similar mIoU score without any additional context module, post-processing scheme, and pretrained model. We evaluate EDANet on Cityscapes and CamVid datasets, and compare it with the other state-of-art systems. Our network can run with the high-resolution inputs at the speed of 108 FPS on one GTX 1080Ti. △ Less

Submitted 28 December, 2019; v1 submitted 17 September, 2018; originally announced September 2018.

Comments: Accepted in ACM International Conference on Multimedia in Asia (MMAsia) 2019 [Best Paper Award]

arXiv:1809.03994 [pdf]

Efficient Road Lane Marking Detection with Deep Learning

Authors: Ping-Rong Chen, Shao-Yuan Lo, Hsueh-Ming Hang, Sheng-Wei Chan, Jing-Jhih Lin

Abstract: Lane mark detection is an important element in the road scene analysis for Advanced Driver Assistant System (ADAS). Limited by the onboard computing power, it is still a challenge to reduce system complexity and maintain high accuracy at the same time. In this paper, we propose a Lane Marking Detector (LMD) using a deep convolutional neural network to extract robust lane marking features. To impro… ▽ More Lane mark detection is an important element in the road scene analysis for Advanced Driver Assistant System (ADAS). Limited by the onboard computing power, it is still a challenge to reduce system complexity and maintain high accuracy at the same time. In this paper, we propose a Lane Marking Detector (LMD) using a deep convolutional neural network to extract robust lane marking features. To improve its performance with a target of lower complexity, the dilated convolution is adopted. A shallower and thinner structure is designed to decrease the computational cost. Moreover, we also design post-processing algorithms to construct 3rd-order polynomial models to fit into the curved lanes. Our system shows promising results on the captured road scenes. △ Less

Submitted 11 September, 2018; originally announced September 2018.

Comments: Accepted at International Conference on Digital Signal Processing (DSP) 2018

arXiv:1808.01280 [pdf]

Geared Rotationally Identical and Invariant Convolutional Neural Network Systems

Authors: ShihChung B. Lo, Ph. D., Matthew T. Freedman, M. D., Seong K. Mun, Ph. D., Heang-Ping Chan, Ph. D

Abstract: Theorems and techniques to form different types of transformationally invariant processing and to produce the same output quantitatively based on either transformationally invariant operators or symmetric operations have recently been introduced by the authors. In this study, we further propose to compose a geared rotationally identical CNN system (GRI-CNN) with a small step angle by connecting ne… ▽ More Theorems and techniques to form different types of transformationally invariant processing and to produce the same output quantitatively based on either transformationally invariant operators or symmetric operations have recently been introduced by the authors. In this study, we further propose to compose a geared rotationally identical CNN system (GRI-CNN) with a small step angle by connecting networks of participated processes at the first flatten layer. Using an ordinary CNN structure as a base, requirements for constructing a GRI-CNN include the use of either symmetric input vector or kernels with an angle increment that can form a complete cycle as a "gearwheel". Four basic GRI-CNN structures were studied. Each of them can produce quantitatively identical output results when a rotation angle of the input vector is evenly divisible by the step angle of the gear. Our study showed when an input vector rotated with an angle does not match to a step angle, the GRI-CNN can also produce a highly consistent result. With a design of using an ultra-fine gear-tooth step angle (e.g., 1 degree or 0.1 degree), all four GRI-CNN systems can be constructed virtually isotropically. △ Less

Submitted 10 August, 2018; v1 submitted 2 August, 2018; originally announced August 2018.

Comments: 14 pages, 6 figures, 8 tables

arXiv:1807.11156 [pdf]

Transformationally Identical and Invariant Convolutional Neural Networks by Combining Symmetric Operations or Input Vectors

Authors: ShihChung B. Lo, Matthew T. Freedman, Seong K. Mun

Abstract: Transformationally invariant processors constructed by transformed input vectors or operators have been suggested and applied to many applications. In this study, transformationally identical processing based on combining results of all sub-processes with corresponding transformations at one of the processing steps or at the beginning step were found to be equivalent for a given condition. This pr… ▽ More Transformationally invariant processors constructed by transformed input vectors or operators have been suggested and applied to many applications. In this study, transformationally identical processing based on combining results of all sub-processes with corresponding transformations at one of the processing steps or at the beginning step were found to be equivalent for a given condition. This property can be applied to most convolutional neural network (CNN) systems. Specifically, a transformationally identical CNN can be constructed by arranging internally symmetric operations in parallel with the same transformation family that includes a flatten layer with weights sharing among their corresponding transformation elements. Other transformationally identical CNNs can be constructed by averaging transformed input vectors of the family at the input layer followed by an ordinary CNN process or by a set of symmetric operations. Interestingly, we found that both types of transformationally identical CNN systems are mathematically equivalent by either applying an averaging operation to corresponding elements of all sub-channels before the activation function or without using a non-linear activation function. △ Less

Submitted 20 August, 2018; v1 submitted 29 July, 2018; originally announced July 2018.

Comments: 9 pages, 3 figures

Showing 1–50 of 61 results for author: Lo, S