-
Towards Student Actions in Classroom Scenes: New Dataset and Baseline
Authors:
Zhuolin Tan,
Chenqiang Gao,
Anyong Qin,
Ruixin Chen,
Tiecheng Song,
Feng Yang,
Deyu Meng
Abstract:
Analyzing student actions is an important and challenging task in educational research. Existing efforts have been hampered by the lack of accessible datasets that capture the nuanced action dynamics in classrooms. In this paper, we present a new multi-label student action video (SAV) dataset for complex classroom scenes. The dataset consists of 4,324 carefully trimmed video clips from 758 different classrooms, with labels covering 15 different actions displayed by students in classrooms. Compared to existing behavioral datasets, our dataset stands out by providing a wide range of real classroom scenarios, high-quality video data, and unique challenges, including subtle movement differences, dense object engagement, significant scale differences, varied shooting angles, and visual occlusion. The increased complexity of the dataset brings new opportunities and challenges for benchmarking action detection. We also propose a new baseline method, a visual transformer that enhances attention to key local details in small and dense object regions. Our method achieves a mean Average Precision (mAP) of 67.9% on SAV and 27.4% on AVA. This paper not only provides the dataset but also calls for further research into AI-driven educational tools that may transform teaching methodologies and learning outcomes. The code and dataset will be released at https://github.com/Ritatanz/SAV.
Submitted 1 September, 2024;
originally announced September 2024.
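The mAP figures quoted above average a per-class average precision (AP) over action classes. As a rough, hypothetical illustration (not the SAV or AVA evaluation code), the sketch below computes clip-level, non-interpolated AP per class from multi-label scores and averages it into mAP. AVA's official protocol instead scores frame-level detections at an IoU threshold, so this simplified version is not comparable to the reported numbers; `average_precision` and `mean_average_precision` are illustrative names.

```python
import numpy as np

def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at the rank of each positive clip."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    sorted_labels = np.asarray(labels)[order]
    if sorted_labels.sum() == 0:
        return float("nan")  # AP is undefined for a class with no positives
    hits = np.cumsum(sorted_labels)              # true positives up to each rank
    ranks = np.arange(1, len(sorted_labels) + 1)
    is_pos = sorted_labels == 1
    return float((hits[is_pos] / ranks[is_pos]).mean())

def mean_average_precision(score_matrix, label_matrix):
    """mAP over classes; rows are clips, columns are action classes."""
    score_matrix = np.asarray(score_matrix, dtype=float)
    label_matrix = np.asarray(label_matrix)
    aps = [average_precision(score_matrix[:, c], label_matrix[:, c])
           for c in range(score_matrix.shape[1])]
    return float(np.nanmean(aps))  # classes without positives are skipped
```

For one class with scores [0.9, 0.8, 0.7] and labels [1, 0, 1], the positives sit at ranks 1 and 3, so AP = (1 + 2/3) / 2, roughly 0.833.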
-
FedSPU: Personalized Federated Learning for Resource-constrained Devices with Stochastic Parameter Update
Authors:
Ziru Niu,
Hai Dong,
A. K. Qin
Abstract:
Personalized Federated Learning (PFL) is widely employed in IoT applications to handle high-volume, non-iid client data while ensuring data privacy. However, heterogeneous edge devices owned by clients may impose varying degrees of resource constraints, causing computation and communication bottlenecks for PFL. Federated Dropout has emerged as a popular strategy to address this challenge, wherein only a subset of the global model, i.e., a sub-model, is trained on a client's device, thereby reducing computation and communication overheads. Nevertheless, the dropout-based model-pruning strategy may introduce bias, particularly towards non-iid local data. When biased sub-models absorb highly divergent parameters from other clients, performance degradation becomes inevitable. In response, we propose federated learning with stochastic parameter update (FedSPU). Unlike dropout, which tailors the global model to small local sub-models, FedSPU maintains the full model architecture on each device but randomly freezes a certain percentage of neurons in the local model during training while updating the remaining neurons. This approach ensures that a portion of the local model remains personalized, thereby enhancing the model's robustness against biased parameters from other clients. Experimental results demonstrate that FedSPU outperforms federated dropout by 7.57% on average in terms of accuracy. Furthermore, an early stopping scheme reduces training time by 24.8% to 70.4% while maintaining high accuracy.
Submitted 18 March, 2024;
originally announced March 2024.
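The contrast with federated dropout can be sketched for a single dense layer. In this minimal sketch, assuming each row of the weight matrix is one neuron and that a fresh random subset is frozen on every call (the abstract does not specify how often the frozen set is resampled or how the percentage is chosen), the full layer stays on the device and only the unfrozen rows receive an SGD step. `stochastic_parameter_update` is a hypothetical name, not the FedSPU code.

```python
import numpy as np

def stochastic_parameter_update(weights, grads, freeze_ratio, lr, rng):
    """One local SGD step on a dense layer with a random subset of neurons frozen.

    weights, grads: arrays of shape (num_neurons, fan_in); row i is neuron i.
    freeze_ratio: fraction of neurons whose weights are left unchanged.
    Returns the updated weights and a boolean mask (True = neuron was updated).
    """
    num_neurons = weights.shape[0]
    num_frozen = int(round(freeze_ratio * num_neurons))
    frozen = rng.choice(num_neurons, size=num_frozen, replace=False)
    updated_mask = np.ones(num_neurons, dtype=bool)
    updated_mask[frozen] = False
    new_weights = weights.copy()  # full architecture is kept, unlike a dropout sub-model
    new_weights[updated_mask] -= lr * grads[updated_mask]
    return new_weights, updated_mask
```

With `freeze_ratio=0.3` on a 10-neuron layer, 3 rows keep their current local values and 7 are updated; those untouched rows loosely mirror the abstract's idea of keeping part of the local model personalized.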
-
FAGH: Accelerating Federated Learning with Approximated Global Hessian
Authors:
Mrinmay Sen,
A. K. Qin,
Krishna Mohan C
Abstract:
In federated learning (FL), the significant communication overhead caused by the slow convergence of global model training poses a great challenge. Specifically, a large number of communication rounds are required to achieve convergence in FL. One potential solution is to employ a Newton-based optimization method for training, known for its quadratic convergence rate. However, existing Newton-based FL training methods suffer from either memory inefficiency or high computational costs for local clients or the server. To address this issue, we propose an FL with approximated global Hessian (FAGH) method to accelerate FL training. FAGH leverages the first moment of the approximated global Hessian and the first moment of the global gradient to train the global model. By harnessing the approximated global Hessian curvature, FAGH accelerates the convergence of global model training, leading to fewer communication rounds and thus shorter training time. Experimental results verify FAGH's effectiveness in decreasing the number of communication rounds and the time required to achieve the pre-specified objectives of the global model performance in terms of training and test losses as well as test accuracy. Notably, FAGH outperforms several state-of-the-art FL training methods.
Submitted 16 March, 2024;
originally announced March 2024.
-
SOFIM: Stochastic Optimization Using Regularized Fisher Information Matrix
Authors:
Mrinmay Sen,
A. K. Qin,
Gayathri C,
Raghu Kishore N,
Yen-Wei Chen,
Balasubramanian Raman
Abstract:
This paper introduces a new stochastic optimization method based on the regularized Fisher information matrix (FIM), named SOFIM, which can efficiently utilize the FIM to approximate the Hessian matrix for finding Newton's gradient update in large-scale stochastic optimization of machine learning models. It can be viewed as a variant of natural gradient descent, where the challenge of storing and calculating the full FIM is addressed by making use of the regularized FIM and directly finding the gradient update direction via Sherman-Morrison matrix inversion. Additionally, like the popular Adam method, SOFIM uses the first moment of the gradient to address the issue of non-stationary objectives across mini-batches due to heterogeneous data. The use of the regularized FIM and Sherman-Morrison matrix inversion leads to an improved convergence rate with the same space and time complexities as stochastic gradient descent (SGD) with momentum. Extensive experiments on training deep learning models using several benchmark image classification datasets demonstrate that the proposed SOFIM outperforms SGD with momentum and several state-of-the-art Newton optimization methods in terms of the convergence speed for achieving the pre-specified objectives of training and test losses as well as test accuracy.
Submitted 1 May, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
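Sherman-Morrison is what keeps a regularized FIM affordable: for a rank-one Fisher approximation plus a ridge term, the inverse can be applied to a vector in O(d) time without ever forming a d x d matrix. The sketch below assumes the FIM is approximated as rho * I + u u^T with u taken to be the first moment of the gradient; that construction, the hyperparameter defaults and both function names are illustrative assumptions, and SOFIM's exact formulation may differ.

```python
import numpy as np

def sherman_morrison_solve(rho, u, v):
    """Return (rho * I + u u^T)^{-1} v without forming the d x d matrix.

    Sherman-Morrison gives (rho I + u u^T)^{-1} v = (v - u (u.v) / (rho + u.u)) / rho,
    which costs O(d) time and memory.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return (v - u * (u @ v) / (rho + u @ u)) / rho

def fim_preconditioned_step(params, grad, momentum, lr=0.01, beta=0.9, rho=1.0):
    """Hypothetical update: EMA of the gradient, then solve against rho*I + m m^T."""
    momentum = beta * momentum + (1.0 - beta) * grad  # first moment of the gradient
    direction = sherman_morrison_solve(rho, momentum, momentum)
    return params - lr * direction, momentum
```

Each step stores only `params` and `momentum` and performs a few dot products, which is consistent with the abstract's claim of matching the space and time complexity of SGD with momentum.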
-
Decentralised Traffic Incident Detection via Network Lasso
Authors:
Qiyuan Zhu,
A. K. Qin,
Prabath Abeysekara,
Hussein Dia,
Hanna Grzybowska
Abstract:
Traffic incident detection plays a key role in intelligent transportation systems and has gained great attention in transport engineering. In the past, traditional machine learning (ML) based detection methods achieved good performance under a centralised computing paradigm, where all data are transmitted to a central server for building ML models therein. Nowadays, deep neural network-based federated learning (FL) has become a mainstream detection approach, enabling model training in a decentralised manner while warranting local data governance. Such neural network-centred techniques, however, have overshadowed the utility of well-established ML-based detection methods. In this work, we aim to explore the potential of potent conventional ML-based detection models in modern traffic scenarios characterised by distributed data. We leverage an elegant but less explored distributed optimisation framework named Network Lasso, with guaranteed global convergence for convex problem formulations, integrate a potent convex ML model with it, and compare it with centralised learning, local learning, and federated learning methods on a well-known traffic incident detection dataset. Experimental results show that the proposed Network Lasso-based approach provides a promising alternative to the FL-based approach in data-decentralised traffic scenarios, with a strong convergence guarantee, while rekindling the significance of conventional ML-based detection methods.
Submitted 28 February, 2024;
originally announced February 2024.
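For readers unfamiliar with Network Lasso (Hallac, Leskovec and Boyd), its standard objective couples per-node convex losses f_i through a weighted sum of unsquared Euclidean norms over graph edges, which encourages neighbouring nodes to share identical models. The sketch below only evaluates that objective for given per-node parameters; the ADMM solver normally used to minimise it, the choice of logistic loss as the local model, and all function names are assumptions for illustration, not the paper's code.

```python
import numpy as np

def network_lasso_objective(params, local_losses, edges, lam):
    """Evaluate sum_i f_i(x_i) + lam * sum_{(j,k)} w_jk * ||x_j - x_k||_2.

    params: list of per-node parameter vectors x_i.
    local_losses: list of callables f_i(x), assumed convex, each using only local data.
    edges: iterable of (j, k, w_jk) tuples describing which nodes are coupled.
    """
    local_term = sum(f(x) for f, x in zip(local_losses, params))
    # Unsquared norms let neighbouring models become exactly equal (a clustering effect)
    coupling = sum(w * np.linalg.norm(params[j] - params[k]) for j, k, w in edges)
    return local_term + lam * coupling

def make_logistic_loss(X, y):
    """Hypothetical convex local loss: logistic loss with labels y in {-1, +1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return lambda w: float(np.sum(np.logaddexp(0.0, -y * (X @ w))))
```

Because every term is convex in the stacked parameters, the whole objective is convex, which is the property behind the global convergence guarantee mentioned above.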
-
Expert-Adaptive Medical Image Segmentation
Authors:
Binyan Hu,
A. K. Qin
Abstract:
Medical image segmentation (MIS) plays an instrumental role in medical image analysis, where considerable effort has been devoted to automating the process. Currently, mainstream MIS approaches are based on deep neural networks (DNNs), which are typically trained on a dataset with annotations produced by certain medical experts. In the medical domain, the annotations generated by different experts can be inherently distinct due to the complexity of medical images and variations in expertise and post-segmentation missions. Consequently, a DNN model trained on data annotated by some experts may hardly adapt to a new expert. In this work, we evaluate a customised expert-adaptive method, characterised by multi-expert annotation, multi-task DNN-based model training, and lightweight model fine-tuning, to investigate the model's adaptivity to a new expert in situations where the amount and mobility of training images are limited. Experiments conducted on brain MRI segmentation tasks with limited training data demonstrate its effectiveness and the impact of its key parameters.
Submitted 1 May, 2024; v1 submitted 11 February, 2024;
originally announced February 2024.
-
Two-Stage Multi-task Self-Supervised Learning for Medical Image Segmentation
Authors:
Binyan Hu,
A. K. Qin
Abstract:
Medical image segmentation has been significantly advanced by deep learning (DL) techniques, though the data scarcity inherent in medical applications poses a great challenge to DL-based segmentation methods. Self-supervised learning offers a solution by creating auxiliary learning tasks from the available dataset and then leveraging the knowledge acquired from solving auxiliary tasks to help better solve the target segmentation task. Different auxiliary tasks may have different properties and thus can help the target task to different extents. It is desirable to leverage their complementary advantages to enhance the overall assistance to the target task. To achieve this, existing methods often adopt a joint training paradigm, which co-solves segmentation and auxiliary tasks by integrating their losses or intermediate gradients. However, direct coupling of losses or intermediate gradients risks undesirable interference because the knowledge acquired from solving each auxiliary task at every training step may not always benefit the target task. To address this issue, we propose a two-stage training approach. In the first stage, the target segmentation task is independently co-solved with each auxiliary task in both joint training and pre-training modes, with the better model selected via validation performance. In the second stage, the models obtained with respect to each auxiliary task are converted into a single model using an ensemble knowledge distillation method. Our approach makes the best use of each auxiliary task to create multiple elite segmentation models and then combines them into an even more powerful model. We employed five auxiliary tasks with different properties in our approach and applied it to train the U-Net model on an X-ray pneumothorax segmentation dataset. Experimental results demonstrate the superiority of our approach over several existing methods.
Submitted 11 February, 2024;
originally announced February 2024.
-
HAYATE: Photometric redshift estimation by hybridising machine learning with template fitting
Authors:
Shingo Tanigawa,
Karl Glazebrook,
Colin Jacobs,
Ivo Labbe,
Alex K. Qin
Abstract:
Machine learning photo-z methods, trained directly on spectroscopic redshifts, provide a viable alternative to traditional template fitting methods but may not generalise well to new data that deviate from those in the training set. In this work, we present a Hybrid Algorithm for WI(Y)de-range photo-z estimation with Artificial neural networks and TEmplate fitting (HAYATE), a novel photo-z method that combines template fitting and data-driven approaches and whose training loss is optimised in terms of both redshift point estimates and probability distributions. We produce artificial training data from low-redshift galaxy SEDs at z<1.3, artificially redshifted up to z=5. We test the model on data from the ZFOURGE surveys, demonstrating that HAYATE can function as a reliable emulator of EAZY for the broad redshift range beyond the region of sufficient spectroscopic completeness. The network achieves precise photo-z estimates with smaller errors ($\sigma_{\rm NMAD}$) than EAZY in the initial low-z region (z<1.3), while remaining comparable even in the high-z extrapolated regime (1.3<z<5). Meanwhile, it provides more robust photo-z estimates than EAZY, with a lower outlier rate ($\eta_{0.2}\lesssim 1\%$), and runs $\sim100$ times faster than the original template fitting method. We also demonstrate that HAYATE offers more reliable redshift PDFs, showing a flatter distribution of Probability Integral Transform scores than EAZY. The performance is further improved using transfer learning with spec-z samples. We expect that future large surveys will benefit from our novel methodology, which is applicable to observations over a wide redshift range.
Submitted 31 January, 2024;
originally announced February 2024.
-
FLrce: Resource-Efficient Federated Learning with Early-Stopping Strategy
Authors:
Ziru Niu,
Hai Dong,
A. Kai Qin,
Tao Gu
Abstract:
Federated Learning (FL) has achieved great popularity in the Internet of Things (IoT) as a powerful interface to offer intelligent services to customers while maintaining data privacy. Under the orchestration of a server, edge devices (also called clients in FL) collaboratively train a global deep-learning model without sharing any local data. Nevertheless, the unequal training contributions among clients have made FL vulnerable, as clients with heavily biased datasets can easily compromise FL by sending malicious or heavily biased parameter updates. Furthermore, the resource shortage issue of the network also becomes a bottleneck. Due to overwhelming computation overheads generated by training deep-learning models on edge devices, and significant communication overheads for transmitting deep-learning models across the network, enormous amounts of resources are consumed in the FL process. This encompasses computation resources like energy and communication resources like bandwidth. To comprehensively address these challenges, in this paper, we present FLrce, an efficient FL framework with a relationship-based client selection and early-stopping strategy. FLrce accelerates the FL process by selecting clients with more significant effects, enabling the global model to converge to a high accuracy in fewer rounds. FLrce also leverages an early stopping mechanism that terminates FL in advance to save communication and computation resources. Experimental results show that, compared with existing efficient FL frameworks, FLrce improves computation and communication efficiency by at least 30% and 43%, respectively.
Submitted 22 August, 2024; v1 submitted 15 October, 2023;
originally announced October 2023.
-
One-Nearest Neighborhood Guides Inlier Estimation for Unsupervised Point Cloud Registration
Authors:
Yongzhe Yuan,
Yue Wu,
Maoguo Gong,
Qiguang Miao,
A. K. Qin
Abstract:
The precision of unsupervised point cloud registration methods is typically limited by the lack of reliable inlier estimation and self-supervised signals, especially in partially overlapping scenarios. In this paper, we propose an effective inlier estimation method for unsupervised point cloud registration that captures geometric structure consistency between the source point cloud and its corresponding reference point cloud copy. Specifically, to obtain a high-quality reference point cloud copy, a One-Nearest Neighborhood (1-NN) point cloud is generated from the input point cloud. This facilitates matching map construction and allows for integrating the dual neighborhood matching scores of the 1-NN point cloud and the input point cloud to improve matching confidence. Benefiting from the high-quality reference copy, we argue that the neighborhood graph formed by an inlier and its neighborhood should be consistent between the source point cloud and its corresponding reference copy. Based on this observation, we construct transformation-invariant geometric structure representations and capture geometric structure consistency to score the inlier confidence of the estimated correspondences between the source point cloud and its reference copy. This strategy can simultaneously provide a reliable self-supervised signal for model optimization. Finally, we estimate the transformation with the weighted SVD algorithm using the estimated correspondences and their inlier confidences. We train the proposed model in an unsupervised manner, and extensive experiments on synthetic and real-world datasets illustrate the effectiveness of the proposed method.
Submitted 26 July, 2023;
originally announced July 2023.
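The final step of the abstract, weighted SVD, is the standard weighted Kabsch/Procrustes solution for a rigid transform from weighted correspondences. The sketch below assumes correspondences are already paired row by row and that the inlier confidences are non-negative weights; how the paper produces those weights (1-NN copies, geometric-consistency scores) is not reproduced, and `weighted_svd_transform` is a hypothetical name.

```python
import numpy as np

def weighted_svd_transform(src, ref, weights):
    """Estimate R, t minimising sum_i w_i * ||R @ src_i + t - ref_i||^2.

    src, ref: (N, 3) arrays of corresponding points.
    weights: (N,) non-negative inlier confidences.
    """
    src = np.asarray(src, dtype=float)
    ref = np.asarray(ref, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    src_mean = w @ src  # weighted centroids
    ref_mean = w @ ref
    src_c = src - src_mean
    ref_c = ref - ref_mean
    H = (src_c * w[:, None]).T @ ref_c  # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = ref_mean - R @ src_mean
    return R, t
```

A correspondence with weight zero drops out entirely, so low-confidence outliers have proportionally less influence on R and t.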
-
Training Physics-Informed Neural Networks via Multi-Task Optimization for Traffic Density Prediction
Authors:
Bo Wang,
A. K. Qin,
Sajjad Shafiei,
Hussein Dia,
Adriana-Simona Mihaita,
Hanna Grzybowska
Abstract:
Physics-informed neural networks (PINNs) are a newly emerging research frontier in machine learning, which incorporate certain physical laws that govern a given data set, e.g., those described by partial differential equations (PDEs), into the training of the neural network (NN) based on such a data set. In PINNs, the NN acts as the solution approximator for the PDE while the PDE acts as the prior knowledge to guide the NN training, leading to the desired generalization performance of the NN when training data are limited. However, training PINNs is a non-trivial task, largely due to the complexity of the loss composed of both NN and physical law parts. In this work, we propose a new PINN training framework based on the multi-task optimization (MTO) paradigm. Under this framework, multiple auxiliary tasks are created and solved together with the given (main) task, where useful knowledge from solving one task is adaptively transferred to assist in solving other tasks, aiming to uplift the performance of solving the main task. We implement the proposed framework and apply it to train the PINN for addressing the traffic density prediction problem. Experimental results demonstrate that our proposed training framework leads to significant performance improvement compared with the traditional way of training the PINN.
Submitted 8 July, 2023;
originally announced July 2023.
-
Learning non-Markovian Decision-Making from State-only Sequences
Authors:
Aoyang Qin,
Feng Gao,
Qing Li,
Song-Chun Zhu,
Sirui Xie
Abstract:
Conventional imitation learning assumes access to the actions of demonstrators, but these motor signals are often non-observable in naturalistic settings. Additionally, sequential decision-making behaviors in these settings can deviate from the assumptions of a standard Markov Decision Process (MDP). To address these challenges, we explore deep generative modeling of state-only sequences with a non-Markov Decision Process (nMDP), where the policy is an energy-based prior in the latent space of the state transition generator. We develop maximum likelihood estimation to achieve model-based imitation, which involves short-run MCMC sampling from the prior and importance sampling for the posterior. The learned model enables decision-making as inference: model-free policy execution is equivalent to prior sampling, and model-based planning is posterior sampling initialized from the policy. We demonstrate the efficacy of the proposed method in a prototypical path planning task with non-Markovian constraints and show that the learned model exhibits strong performance in challenging domains from the MuJoCo suite.
Submitted 30 October, 2023; v1 submitted 26 June, 2023;
originally announced June 2023.
-
TrafFormer: A Transformer Model for Predicting Long-term Traffic
Authors:
David Alexander Tedjopurnomo,
Farhana M. Choudhury,
A. K. Qin
Abstract:
Traffic prediction is a flourishing research field due to its importance to human mobility in the urban space. Despite this, existing studies only focus on short-term prediction of up to a few hours in advance, with most being up to one hour only. Long-term traffic prediction can enable more comprehensive, informed, and proactive measures against traffic congestion and is therefore an important task to explore. In this paper, we explore the task of long-term traffic prediction, where we predict traffic up to 24 hours in advance. We note the weaknesses of existing models, which are based on recurrent structures, for long-term traffic prediction and propose a modified Transformer model, "TrafFormer". Experiments comparing our model with existing hybrid neural network models show the superiority of our model.
Submitted 2 March, 2023; v1 submitted 23 February, 2023;
originally announced February 2023.
-
Deep Edge Intelligence: Architecture, Key Features, Enabling Technologies and Challenges
Authors:
Prabath Abeysekara,
Hai Dong,
A. K. Qin
Abstract:
With the breakthroughs in Deep Learning, recent years have witnessed a massive surge in Artificial Intelligence applications and services. Meanwhile, the rapid advances in Mobile Computing and the Internet of Things have also given rise to billions of mobile and smart sensing devices connected to the Internet, generating zettabytes of data at the network edge. The opportunity to combine these two domains of technologies to power interconnected devices with intelligence is likely to pave the way for a new wave of technology revolutions. Embracing this technology revolution, in this article, we present a novel computing vision named Deep Edge Intelligence (DEI). DEI employs Deep Learning, Artificial Intelligence, Cloud and Edge Computing, 5G/6G networks, Internet of Things, Microservices, etc., aiming to provision reliable and secure intelligence services to every person and organisation at any place with a better user experience. The vision, system architecture, key layers and features of DEI are also detailed. Finally, we reveal the key enabling technologies and research challenges associated with it.
Submitted 24 October, 2022;
originally announced October 2022.
-
Traffic disruption modelling with mode shift in multi-modal networks
Authors:
Dong Zhao,
Adriana-Simona Mihaita,
Yuming Ou,
Sajjad Shafiei,
Hanna Grzybowska,
A. K. Qin,
Gary Tan,
Mo Li,
Hussein Dia
Abstract:
A multi-modal transport system is acknowledged to have robust failure tolerance and can effectively relieve urban congestion. However, estimating the impact of disruptions across multiple transport modes is challenging because disaggregated modelling approaches are applied to only individual modes at a time. To fill this gap, this paper proposes a new integrated modelling framework for multi-modal traffic state estimation and evaluation of the disruption impact across all modes under various traffic conditions. First, we propose an iterative trip assignment model to elucidate the association between travel demand and travel behaviour, including a multi-modal origin-to-destination estimation for private and public transport. Second, we provide a practical multi-modal travel demand re-adjustment that takes the mode shift of affected travellers into consideration. The pros and cons of the mode shift strategy are showcased via several scenario-based transport simulation experiments. The results show that a well-balanced mode shift, with flexible routing and early announcements of detours so that travellers can plan ahead, can significantly benefit all travellers by reducing delay time by 46%, while a stable route assignment maintains a higher average traffic flow and the inactive mode-route choice helps relieve density under traffic disruptions.
Submitted 12 October, 2022;
originally announced October 2022.
-
Large-scale Knowledge Distillation with Elastic Heterogeneous Computing Resources
Authors:
Ji Liu,
Daxiang Dong,
Xi Wang,
An Qin,
Xingjian Li,
Patrick Valduriez,
Dejing Dou,
Dianhai Yu
Abstract:
Although more layers and more parameters generally improve model accuracy, such big models generally have high computational complexity and require large memory, which exceeds the capacity of small devices for inference and incurs long training times. In addition, the long training and inference times of big models are difficult to afford even on high-performance servers. As an efficient approach to compressing a large deep model (a teacher model) into a compact model (a student model), knowledge distillation emerges as a promising approach to dealing with big models. Existing knowledge distillation methods cannot exploit elastically available computing resources and suffer from low efficiency. In this paper, we propose an Elastic Deep Learning framework for knowledge Distillation, i.e., EDL-Dist. The advantages of EDL-Dist are three-fold. First, the inference and training processes are separated. Second, elastically available computing resources can be utilized to improve efficiency. Third, fault tolerance of the training and inference processes is supported. Extensive experiments show that the throughput of EDL-Dist is up to 3.125 times higher than that of the baseline method (online knowledge distillation), while the accuracy is similar or higher.
Submitted 14 July, 2022;
originally announced July 2022.
-
Multi-task Optimization Based Co-training for Electricity Consumption Prediction
Authors:
Hui Song,
A. K. Qin,
Chenggang Yan
Abstract:
Real-world electricity consumption prediction may involve different tasks, e.g., prediction for different time steps ahead or different geo-locations. These tasks are often solved independently, without utilizing common problem-solving knowledge that could be extracted and shared among them to improve the performance on each task. In this work, we propose a multi-task optimization (MTO) based co-training (MTO-CT) framework, where the models for different tasks are co-trained via an MTO paradigm in which solving each task may benefit from knowledge gained while solving other tasks. MTO-CT uses a long short-term memory (LSTM) based model as the predictor, where knowledge is represented by connection weights and biases. In MTO-CT, an inter-task knowledge transfer module is designed to transfer knowledge between tasks: the most helpful source tasks are selected using probability matching and stochastic universal selection, and evolutionary operations such as mutation and crossover are performed to reuse knowledge from the selected source tasks in a target task. We use electricity consumption data from five states in Australia to design two sets of tasks at different scales: a) one-step-ahead prediction for each state (five tasks) and b) 6-step, 12-step, 18-step, and 24-step-ahead prediction for each state (20 tasks). MTO-CT is evaluated on each of these two task sets in comparison with solving each task in the set independently, without knowledge sharing, under the same settings; the results demonstrate the superiority of MTO-CT in terms of prediction accuracy.
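The abstract names stochastic universal selection (SUS) as the mechanism for choosing source tasks. The sketch below implements textbook SUS over a list of selection probabilities; it assumes those probabilities are already available (in MTO-CT they come from probability matching, which is not reproduced here), and the function name is hypothetical.

```python
import random


def stochastic_universal_selection(probs: list[float], n_select: int,
                                   rng: random.Random) -> list[int]:
    """Pick n_select indices using equally spaced pointers over the cumulative
    distribution, so each index is chosen close to its expected number of times."""
    total = sum(probs)
    spacing = total / n_select
    start = rng.uniform(0.0, spacing)  # one random offset shared by all pointers
    selected = []
    cumulative, idx = 0.0, 0
    for k in range(n_select):
        pointer = start + k * spacing
        # Advance until the pointer falls inside the current index's segment.
        while idx < len(probs) - 1 and cumulative + probs[idx] < pointer:
            cumulative += probs[idx]
            idx += 1
        selected.append(idx)
    return selected


# Hypothetical usefulness probabilities of three candidate source tasks.
print(stochastic_universal_selection([0.5, 0.25, 0.25], 4, random.Random(0)))
```

Compared with repeated roulette-wheel spins, the single random offset keeps each index's selection count within one of its expected value, which reduces sampling noise when only a few source tasks are picked.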
Submitted 31 May, 2022;
originally announced May 2022.
-
Sample-Efficient, Exploration-Based Policy Optimisation for Routing Problems
Authors:
Nasrin Sultana,
Jeffrey Chan,
Tabinda Sarwar,
A. K. Qin
Abstract:
Model-free deep reinforcement learning algorithms have been applied to a range of combinatorial optimisation problems (COPs)~\cite{bello2016neural}~\cite{kool2018attention}~\cite{nazari2018reinforcement}. However, these approaches face two key challenges when applied to combinatorial problems: insufficient exploration, and the need for many training examples of the search space to achieve reasonable performance. Combinatorial optimisation can be complex, with search spaces that contain many optima and are large to search and learn. Therefore, a more sample-efficient method is needed to find good solutions. This paper presents a new, entropy-based reinforcement learning approach. In addition, we design an off-policy reinforcement learning technique that maximises the expected return and improves sample efficiency to achieve faster learning during training. We systematically evaluate our approach on a range of route optimisation tasks typically used to evaluate learning-based optimisation, such as the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP). We show that our model can generalise to various routing problems, such as the split-delivery VRP (SDVRP), and compare the performance of our method with that of current state-of-the-art approaches. Empirical results show that the proposed method improves on state-of-the-art methods in terms of solution quality and computation time and generalises to problems of different sizes.
Submitted 31 May, 2022;
originally announced May 2022.
-
A General Multiple Data Augmentation Based Framework for Training Deep Neural Networks
Authors:
Binyan Hu,
Yu Sun,
A. K. Qin
Abstract:
Deep neural networks (DNNs) often rely on massive labelled data for training, which is unavailable in many applications. Data augmentation (DA) tackles data scarcity by creating new labelled data from available data. Different DA methods have different mechanisms, so using their generated labelled data for DNN training may improve DNN generalisation to different degrees. Combining multiple DA methods, namely multi-DA, for DNN training provides a way to boost generalisation. Among existing multi-DA based DNN training methods, those relying on knowledge distillation (KD) have received great attention. They leverage knowledge transfer to utilise the labelled data sets created by multiple DA methods instead of directly combining them for training DNNs. However, existing KD-based methods can only utilise certain types of DA methods and cannot exploit the advantages of arbitrary DA methods. We propose a general multi-DA based DNN training framework capable of using arbitrary DA methods. To train a DNN, our framework replicates a certain portion of the latter part of the DNN into multiple copies, leading to multiple DNNs with shared blocks in their former parts and independent blocks in their latter parts. Each of these DNNs is associated with a unique DA method and a newly devised loss that allows comprehensively learning, in an online and adaptive way, from the data generated by all DA methods and the outputs of all DNNs. The overall loss, i.e., the sum of each DNN's loss, is used for training. Eventually, the DNN with the best validation performance is chosen for inference. We implement the proposed framework using three distinct DA methods and apply it to training representative DNNs. Experiments on popular image classification benchmarks demonstrate the superiority of our method over several existing single-DA and multi-DA based training methods.
Submitted 29 May, 2022;
originally announced May 2022.
-
Exploring the interpretability of deep neural networks used for gravitational lens finding with a sensitivity probe
Authors:
C. Jacobs,
K. Glazebrook,
A. K. Qin,
T. Collett
Abstract:
Artificial neural networks are finding increasing use in astronomy, but understanding the limitations of these models can be difficult. We utilize a statistical method, a sensitivity probe, designed to complement established methods for interpreting neural network behavior by quantifying the sensitivity of a model's performance to various properties of the inputs. We apply this method to neural networks trained to classify images of galaxy-galaxy strong lenses in the Dark Energy Survey. We find that the networks are highly sensitive to color, the simulated PSF used in training, and occlusion of light from a lensed source, but are insensitive to Einstein radius, and performance degrades smoothly with source and lens magnitudes. From this we identify weaknesses in the training sets used to constrain the networks, particularly the over-sensitivity to PSF, and constrain the selection function of the lens-finder as a function of galaxy photometric magnitudes, with accuracy decreasing significantly where the g-band magnitude of the lens source is greater than 21.5 and the r-band magnitude of the lens is less than 19.
Submitted 5 December, 2021;
originally announced December 2021.
-
ADDS: Adaptive Differentiable Sampling for Robust Multi-Party Learning
Authors:
Maoguo Gong,
Yuan Gao,
Yue Wu,
A. K. Qin
Abstract:
Distributed multi-party learning provides an effective approach for training a joint model with scattered data under legal and practical constraints. However, due to the skewed distribution of data labels across participants and the computation bottleneck of local devices, how to build smaller customized models for clients in various scenarios while providing updates applicable to the central model remains a challenge. In this paper, we propose a novel adaptive differentiable sampling framework (ADDS) for robust and communication-efficient multi-party learning. Inspired by the idea of dropout in neural networks, we introduce a network sampling strategy in the multi-party setting, which distributes different subnets of the central model to clients for updating, and the differentiable sampling rates allow each client to extract an optimal local architecture from the supernet according to its private data distribution. The approach requires minimal modifications to the existing multi-party learning structure, and it is capable of integrating local updates of all subnets into the supernet, improving the robustness of the central model. As demonstrated through experiments on real-world datasets, the proposed framework significantly reduces local computation and communication costs while speeding up the convergence of the central model.
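The abstract says ADDS learns differentiable sampling rates for choosing subnets but does not state the relaxation it uses. A standard way to make per-unit Bernoulli keep/drop sampling differentiable is the relaxed Bernoulli (binary Concrete) distribution, sketched below under that assumption: it adds logistic noise to the logit of the keep rate and squashes the result with a temperature-scaled sigmoid, so that as the temperature goes to zero, samples approach hard 0/1 masks whose mean is the keep rate. Only forward sampling is shown; in practice an autodiff framework would propagate gradients to the rates. Names and values are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relaxed_bernoulli(keep_rate: float, temperature: float, rng: random.Random) -> float:
    """Binary Concrete sample in [0, 1] approximating Bernoulli(keep_rate).
    keep_rate must lie strictly between 0 and 1."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
    logistic_noise = math.log(u) - math.log(1.0 - u)
    logit = math.log(keep_rate) - math.log(1.0 - keep_rate)
    return sigmoid((logit + logistic_noise) / temperature)


def sample_subnet_mask(keep_rates: list[float], temperature: float,
                       rng: random.Random) -> list[float]:
    """Soft mask over hidden units; a client would train only the units it keeps."""
    return [relaxed_bernoulli(p, temperature, rng) for p in keep_rates]


rng = random.Random(7)
print(sample_subnet_mask([0.9, 0.5, 0.2], temperature=0.1, rng=rng))
```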
Submitted 28 October, 2021;
originally announced October 2021.
-
Learning Enhanced Optimisation for Routing Problems
Authors:
Nasrin Sultana,
Jeffrey Chan,
Tabinda Sarwar,
Babak Abbasi,
A. K. Qin
Abstract:
Deep learning approaches have shown promising results in solving routing problems. However, there is still a substantial gap in solution quality between machine learning and operations research algorithms. Recently, another line of research has been introduced that fuses the strengths of machine learning and operations research algorithms. In particular, search perturbation operators have been used to improve the solution. Nevertheless, using perturbation may not guarantee a quality solution. This paper presents "Learning to Guide Local Search" (L2GLS), a learning-based approach for routing problems that uses a penalty term and reinforcement learning to adaptively adjust search efforts. L2GLS combines the strengths of local search (LS) operators with penalty terms to escape local optima. Routing problems have many practical applications and often present larger instances that are still challenging for many existing algorithms introduced in the learning-to-optimise field. We show that L2GLS achieves new state-of-the-art results among machine learning methods on larger TSP and CVRP instances.
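The abstract does not spell out L2GLS's penalty term. The name points to guided local search (GLS), whose classical form (Voudouris and Tsang) augments the tour cost with penalties on edges and, at each local optimum, penalises the edges with the highest utility cost / (1 + penalty). The sketch below shows only that classical mechanism, as an assumption; the reinforcement-learning component that adaptively adjusts search effort and the local search moves themselves are not shown, and `lam` is a hypothetical weighting.

```python
import math

Edge = tuple[int, int]


def edge(a: int, b: int) -> Edge:
    """Undirected edge key with the smaller city first."""
    return (a, b) if a < b else (b, a)


def tour_edges(tour: list[int]) -> list[Edge]:
    return [edge(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour))]


def augmented_cost(tour: list[int], dist: list[list[float]],
                   penalties: dict[Edge, int], lam: float) -> float:
    """GLS objective: true tour length plus lam times the penalties of used edges."""
    edges = tour_edges(tour)
    length = sum(dist[a][b] for a, b in edges)
    return length + lam * sum(penalties.get(e, 0) for e in edges)


def penalise_local_optimum(tour: list[int], dist: list[list[float]],
                           penalties: dict[Edge, int]) -> dict[Edge, int]:
    """Increment (in place) the penalty of every tour edge with maximal utility c_e / (1 + p_e)."""
    utilities = {e: dist[e[0]][e[1]] / (1 + penalties.get(e, 0)) for e in tour_edges(tour)}
    best = max(utilities.values())
    for e, u in utilities.items():
        if u == best:
            penalties[e] = penalties.get(e, 0) + 1
    return penalties


# Four cities on a 1 x 2 rectangle (hypothetical instance).
coords = [(0, 0), (1, 0), (1, 2), (0, 2)]
dist = [[math.dist(p, q) for q in coords] for p in coords]
tour = [0, 1, 2, 3]
penalties = penalise_local_optimum(tour, dist, {})
print(penalties, augmented_cost(tour, dist, penalties, lam=0.5))
```

Raising penalties on long edges that keep appearing in local optima makes them more expensive under the augmented cost, which pushes subsequent local search away from them; this is how GLS escapes local optima.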
Submitted 17 September, 2021;
originally announced September 2021.
-
Evolutionary Ensemble Learning for Multivariate Time Series Prediction
Authors:
Hui Song,
A. K. Qin,
Flora D. Salim
Abstract:
Multivariate time series (MTS) prediction plays a key role in many fields such as finance, energy and transport, where each individual time series corresponds to the data collected from a certain data source, a so-called channel. A typical pipeline for building an MTS prediction model (PM) consists of selecting a subset of channels among all available ones, extracting features from the selected channels, and building a PM based on the extracted features, where each component involves certain optimization tasks, i.e., the selection of channels, feature extraction (FE) methods, and PMs, as well as the configuration of the selected FE method and PM. Accordingly, pursuing the best prediction performance corresponds to optimizing the pipeline by solving all of its involved optimization problems. This is a non-trivial task due to the vastness of the solution space. Unlike most existing works, which target certain components of the pipeline, we propose a novel evolutionary ensemble learning framework to optimize the entire pipeline in a holistic manner. In this framework, a specific pipeline is encoded as a candidate solution, and a multi-objective evolutionary algorithm is applied under different population sizes to produce multiple Pareto optimal sets (POSs). Finally, selective ensemble learning is designed to choose the optimal subset of solutions from the POSs and combine them to yield the final prediction, using greedy sequential selection and least squares methods. We implement the proposed framework and evaluate our implementation on two real-world applications, i.e., electricity consumption prediction and air quality prediction. Comparison with state-of-the-art techniques demonstrates the superiority of the proposed approach.
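The final ensemble step above combines greedy sequential selection with least squares. The sketch below shows one plausible reading under stated assumptions: given validation predictions from candidate pipelines (hypothetical names `a`, `b`, `c`), it greedily adds whichever candidate most reduces the validation mean squared error once unconstrained least-squares combination weights are refitted, and stops when no addition helps. The multi-objective evolutionary search that produces the candidates is not shown.

```python
Vector = list[float]


def solve_least_squares(columns: list[Vector], target: Vector) -> Vector:
    """Weights w minimising ||sum_j w[j] * columns[j] - target||^2 via the normal
    equations, solved by Gaussian elimination with partial pivoting."""
    k = len(columns)
    a = [[sum(x * y for x, y in zip(columns[i], columns[j])) for j in range(k)] for i in range(k)]
    b = [sum(x * y for x, y in zip(columns[i], target)) for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]  # ZeroDivisionError on an exactly singular system
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    w = [0.0] * k
    for r in reversed(range(k)):
        w[r] = (b[r] - sum(a[r][c] * w[c] for c in range(r + 1, k))) / a[r][r]
    return w


def combine(columns: list[Vector], weights: Vector) -> Vector:
    return [sum(w * col[i] for w, col in zip(weights, columns)) for i in range(len(columns[0]))]


def mse(pred: Vector, target: Vector) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def greedy_ensemble(predictions: dict[str, Vector], target: Vector,
                    tol: float = 1e-12) -> tuple[list[str], Vector]:
    """Forward selection: repeatedly add the member that most lowers validation MSE."""
    selected: list[str] = []
    weights: Vector = []
    best_err = float("inf")
    while True:
        round_best = None
        for name in predictions:
            if name in selected:
                continue
            candidate = selected + [name]
            cols = [predictions[n] for n in candidate]
            try:
                w = solve_least_squares(cols, target)
            except ZeroDivisionError:
                continue  # singular system, e.g. a duplicated predictor
            err = mse(combine(cols, w), target)
            if round_best is None or err < round_best[0]:
                round_best = (err, candidate, w)
        if round_best is None or round_best[0] >= best_err - tol:
            return selected, weights
        best_err, selected, weights = round_best


validation_target = [5.0, 5.0, 5.0, 5.0]
candidates = {"a": [1.0, 2.0, 3.0, 4.0], "b": [1.0, 0.0, 1.0, 0.0], "c": [4.0, 3.0, 2.0, 1.0]}
print(greedy_ensemble(candidates, validation_target))
```

Here `a` and `c` together reproduce the target exactly, so the search stops after selecting them and never adds `b`.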
Submitted 22 August, 2021;
originally announced August 2021.
-
AdvDrop: Adversarial Attack to DNNs by Dropping Information
Authors:
Ranjie Duan,
Yuefeng Chen,
Dantong Niu,
Yun Yang,
A. K. Qin,
Yuan He
Abstract:
Humans can easily recognize visual objects with lost information, even when most details are lost and only the contour is preserved, e.g., in cartoons. However, for the visual perception of Deep Neural Networks (DNNs), recognizing abstract objects (visual objects with lost information) is still a challenge. In this work, we investigate this issue from an adversarial viewpoint: will the performance of DNNs decrease even for images that lose only a little information? To this end, we propose a novel adversarial attack, named AdvDrop, which crafts adversarial examples by dropping existing information from images. Most previous adversarial attacks explicitly add extra disturbing information to clean images. In contrast to previous works, our work explores the adversarial robustness of DNN models from a novel perspective, dropping imperceptible details to craft adversarial examples. We demonstrate the effectiveness of AdvDrop through extensive experiments and show that this new type of adversarial example is more difficult for current defense systems to defend against.
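The abstract says AdvDrop crafts adversarial examples by dropping image information but does not describe the operator. One common way to drop information, used by JPEG compression, is to quantise frequency-domain coefficients. The sketch below applies an orthonormal 2-D DCT-II to a square block, rounds each coefficient to a multiple of a per-frequency step from a quantisation table, and inverts the transform. Treating this as AdvDrop's operator is an assumption, the adversarial optimisation of the table against a target DNN is not shown, and the block and table are hypothetical.

```python
import math

Matrix = list[list[float]]


def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II basis: row k holds the k-th cosine frequency."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def drop_information(block: Matrix, q_table: Matrix) -> Matrix:
    """Quantise the block's 2-D DCT coefficients with step sizes q_table, then invert.
    Larger steps zero out more coefficients, i.e. discard more detail."""
    c = dct_matrix(len(block))
    coeffs = matmul(matmul(c, block), transpose(c))
    quantised = [[q * round(v / q) for v, q in zip(row, q_row)]
                 for row, q_row in zip(coeffs, q_table)]
    return matmul(matmul(transpose(c), quantised), c)


# 8x8 checkerboard texture around a mean of 100; keep DC, coarsely quantise the rest.
block = [[100.0 + 10.0 * (-1) ** (i + j) for j in range(8)] for i in range(8)]
q_table = [[1.0 if (i, j) == (0, 0) else 1000.0 for j in range(8)] for i in range(8)]
flattened = drop_information(block, q_table)
```

The checkerboard texture lives entirely in non-DC coefficients, so the coarse steps remove it and the block collapses to its mean. An attack in this spirit would search for a table that changes the model's prediction while keeping the change imperceptible.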
Submitted 20 August, 2021;
originally announced August 2021.
-
Multi-Party Dual Learning
Authors:
Maoguo Gong,
Yuan Gao,
Yu Xie,
A. K. Qin,
Ke Pan,
Yew-Soon Ong
Abstract:
The performance of machine learning algorithms heavily relies on the availability of a large amount of training data. However, in reality, data usually reside in distributed parties such as different institutions and may not be directly gathered and integrated due to various data policy constraints. As a result, some parties may suffer from insufficient data for training machine learning models. In this paper, we propose a multi-party dual learning (MPDL) framework to alleviate the problem of limited, poor-quality data in an isolated party. Since the knowledge sharing processes among multiple parties always emerge in dual forms, we show that dual learning is naturally suited to handling the challenge of missing data, and it explicitly exploits the probabilistic correlation and structural relationship between dual tasks to regularize the training process. We introduce a feature-oriented differential privacy mechanism, with a mathematical proof, to avoid possible privacy leakage of raw features in the dual inference process. The approach requires minimal modifications to the existing multi-party learning structure, and each party can build flexible and powerful models separately, whose accuracy is no less than that of non-distributed self-learning approaches. As demonstrated through simulations on real-world datasets, the MPDL framework achieves significant improvement compared with state-of-the-art multi-party learning methods.
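The abstract does not state which mechanism realises its feature-oriented differential privacy. As a generic illustration only, the sketch below applies the standard Laplace mechanism: adding Laplace noise with scale sensitivity/epsilon to each coordinate of a released feature vector gives epsilon-differential privacy when `sensitivity` is the L1 sensitivity of that whole vector. Using this mechanism for MPDL's dual inference is an assumption, and the parameter values are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as the difference of two exponential draws with mean scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatise_features(features: list[float], sensitivity: float, epsilon: float,
                       rng: random.Random) -> list[float]:
    """Laplace mechanism: perturb each feature with noise of scale sensitivity/epsilon."""
    scale = sensitivity / epsilon
    return [x + laplace_noise(scale, rng) for x in features]


rng = random.Random(42)
print(privatise_features([0.3, 1.2, -0.7], sensitivity=1.0, epsilon=0.5, rng=rng))
```

A smaller epsilon means a larger noise scale: stronger privacy at the cost of noisier features.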
Submitted 14 April, 2021;
originally announced April 2021.
-
Towards Explainable Multi-Party Learning: A Contrastive Knowledge Sharing Framework
Authors:
Yuan Gao,
Jiawei Li,
Maoguo Gong,
Yu Xie,
A. K. Qin
Abstract:
Multi-party learning provides solutions for training joint models with decentralized data under legal and practical constraints. However, traditional multi-party learning approaches are confronted with obstacles such as system heterogeneity, statistical heterogeneity, and incentive design. How to deal with these challenges and further improve the efficiency and performance of multi-party learning has become an urgent problem. In this paper, we propose a novel contrastive multi-party learning framework for knowledge refinement and sharing with an accountable incentive mechanism. Since the existing naive model parameter averaging method is inconsistent with the learning paradigm of neural networks, we simulate the process of human cognition and communication and cast multi-party learning as a many-to-one knowledge sharing problem. The approach is capable of integrating the explicit knowledge acquired by each client in a transparent manner without privacy disclosure, and it reduces the dependence on data distribution and communication environments. As demonstrated through experiments on several real-world datasets, the proposed scheme achieves significant improvement in model performance in a variety of scenarios.
Submitted 25 May, 2021; v1 submitted 14 April, 2021;
originally announced April 2021.
-
An attention-based unsupervised adversarial model for movie review spam detection
Authors:
Yuan Gao,
Maoguo Gong,
Yu Xie,
A. K. Qin
Abstract:
With the prevalence of the Internet, online reviews have become a valuable information resource. However, the authenticity of online reviews remains a concern, and deceptive reviews have become one of the most urgent network security problems to be solved. Review spam misleads users into making suboptimal choices and undermines their trust in online reviews. Most existing research relies on manually extracted features and labeled training samples, which are usually complicated and time-consuming to produce. This paper focuses primarily on a neglected, emerging domain, movie reviews, and develops a novel unsupervised spam detection model with an attention mechanism. Extracting the statistical features of reviews reveals that users express their sentiments on different aspects of movies in their reviews. An attention mechanism is introduced in the review embedding, and a conditional generative adversarial network is exploited to learn users' review style for different genres of movies. The proposed model is evaluated on movie reviews crawled from Douban, a Chinese online community where people can express their feelings about movies. The experimental results demonstrate the superior performance of the proposed approach.
Submitted 2 April, 2021;
originally announced April 2021.
-
Adversarial Laser Beam: Effective Physical-World Attack to DNNs in a Blink
Authors:
Ranjie Duan,
Xiaofeng Mao,
A. K. Qin,
Yun Yang,
Yuefeng Chen,
Shaokai Ye,
Yuan He
Abstract:
Although it is well known that the performance of deep neural networks (DNNs) degrades under certain light conditions, there has been no study of the threat posed by light beams emitted from a physical source acting as adversarial attackers on DNNs in real-world scenarios. In this work, we show that DNNs are easily fooled simply by using a laser beam. To this end, we propose a novel attack method called Adversarial Laser Beam (AdvLB), which manipulates a laser beam's physical parameters to perform an adversarial attack. Experiments demonstrate the effectiveness of our proposed approach in both digital and physical settings. We further empirically analyze the evaluation results and reveal that the proposed laser beam attack may lead to some interesting prediction errors of state-of-the-art DNNs. We envisage that the proposed AdvLB method enriches the current family of adversarial attacks and builds the foundation for future studies of robustness to light.
Submitted 11 March, 2021;
originally announced March 2021.
-
Learning Vehicle Routing Problems using Policy Optimisation
Authors:
Nasrin Sultana,
Jeffrey Chan,
A. K. Qin,
Tabinda Sarwar
Abstract:
Deep reinforcement learning (DRL) has been used to learn effective heuristics for solving complex combinatorial optimisation problems via policy networks and has demonstrated promising performance. Existing works have focused on solving (vehicle) routing problems, as they strike a good balance between non-triviality and difficulty. State-of-the-art approaches learn a policy using reinforcement learning, and the learnt policy acts as a pseudo-solver. These approaches have demonstrated good performance in some cases, but given the large search space of typical combinatorial/routing problems, they can converge too quickly to a poor policy. To prevent this, in this paper, we propose an approach named entropy regularised reinforcement learning (ERRL) that supports exploration by providing more stochastic policies, which tends to improve optimisation. Empirically, the low-variance ERRL makes RL training fast and stable. We also introduce a combination of local search operators at test time, which significantly improves solutions and complements ERRL. We qualitatively demonstrate that for vehicle routing problems, a policy with higher entropy can smooth the optimisation landscape, making it easier to optimise. The quantitative evaluation shows that the performance of the model is comparable with that of state-of-the-art variants. In our evaluation, we experimentally illustrate that the model produces state-of-the-art performance on variants of Vehicle Routing Problems such as the Capacitated Vehicle Routing Problem (CVRP), Multiple Routing with Fixed Fleet Problems (MRPFF) and the Travelling Salesman Problem.
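The abstract describes ERRL as adding entropy regularisation to encourage more stochastic policies but gives no formula. A common form, assumed here, subtracts a weighted entropy bonus from a REINFORCE-style policy-gradient loss, so that minimising the loss rewards both high-advantage actions and spread-out action probabilities; `beta` and the function names are illustrative.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: list[float]) -> float:
    """Shannon entropy in nats; zero-probability actions contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_regularised_pg_loss(logits: list[float], action: int,
                                advantage: float, beta: float = 0.01) -> float:
    """Loss for one decision step: -advantage * log pi(action) - beta * H(pi)."""
    probs = softmax(logits)
    return -advantage * math.log(probs[action]) - beta * entropy(probs)


# Scores over four candidate next nodes in a routing decoder (hypothetical values).
print(entropy_regularised_pg_loss([2.0, 1.0, 0.5, 0.1], action=0, advantage=1.5, beta=0.1))
```

Because the bonus is largest for a uniform distribution (entropy equal to the log of the number of actions), a larger `beta` slows the collapse onto a single, possibly poor, action sequence.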
Submitted 24 December, 2020;
originally announced December 2020.
-
Learning to Optimise General TSP Instances
Authors:
Nasrin Sultana,
Jeffrey Chan,
A. K. Qin,
Tabinda Sarwar
Abstract:
The Travelling Salesman Problem (TSP) is a classical combinatorial optimisation problem. Deep learning has been successfully extended to meta-learning, where previous solving efforts assist in learning how to optimise future optimisation instances. In recent years, learning-to-optimise approaches have shown success in solving TSP problems. However, they focus on one type of TSP problem, namely ones where the points are uniformly distributed in Euclidean spaces, and have issues generalising to other embedding spaces, e.g., spherical distance spaces, and to TSP instances where the points are distributed non-uniformly. An aim of learning to optimise is to train once and solve across a broad spectrum of (TSP) problems. Although supervised learning approaches have been shown to achieve more optimal solutions than unsupervised approaches, they require generating training data and running a solver to obtain solutions to learn from, which can be time-consuming, and reasonable solutions can be difficult to find for harder TSP instances. Hence, this paper introduces a new learning-based approach to solve a variety of different and common TSP problems, trained on easier instances that are faster to train on and for which better solutions are easier to obtain. We name this approach the non-Euclidean TSP network (NETSP-Net). The approach is evaluated on various TSP instances using the benchmark TSPLIB dataset and a popular instance generator used in the literature. Our extensive experiments indicate that the approach generalises across many types of instances and scales to instances larger than those used during training.
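The abstract mentions TSP instances embedded in spherical distance spaces. To make that concrete, the sketch below scores a tour with great-circle (haversine) distances on a sphere of Earth's mean radius instead of Euclidean distances. The haversine metric, the radius, and the example points are assumptions for illustration, not details taken from NETSP-Net.

```python
import math
from typing import Callable

Point = tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0


def haversine(p: Point, q: Point, radius: float = EARTH_RADIUS_KM) -> float:
    """Great-circle distance between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(min(1.0, math.sqrt(h)))  # clamp guards rounding


def tour_length(tour: list[int], points: list[Point],
                dist: Callable[[Point, Point], float]) -> float:
    """Closed-tour cost under any pairwise distance function."""
    return sum(dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


# Three cities at the corners of one spherical octant.
cities = [(0.0, 0.0), (0.0, 90.0), (90.0, 0.0)]
print(tour_length([0, 1, 2], cities, haversine))
```

Passing `math.dist` instead scores the same tour on planar coordinates, which is the Euclidean setting most learning-to-optimise models are trained on.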
Submitted 3 November, 2020; v1 submitted 23 October, 2020;
originally announced October 2020.
-
Locality Preserving Dense Graph Convolutional Networks with Graph Context-Aware Node Representations
Authors:
Wenfeng Liu,
Maoguo Gong,
Zedong Tang,
A. K. Qin
Abstract:
Graph convolutional networks (GCNs) have been widely used for representation learning on graph data; they can capture structural patterns on a graph via specifically designed convolution and readout operations. In many graph classification applications, GCN-based approaches have outperformed traditional methods. However, most existing GCNs are ineffective at preserving the local information of graphs, a limitation that is especially problematic for graph classification. In this work, we propose a locality-preserving dense GCN with graph context-aware node representations. Specifically, our proposed model incorporates a local node feature reconstruction module to preserve initial node features in node representations, which is realized via a simple but effective encoder-decoder mechanism. To capture local structural patterns in neighbourhoods representing different ranges of locality, dense connectivity is introduced to connect each convolutional layer and its corresponding readout with all previous convolutional layers. To enhance node representativeness, the output of each convolutional layer is concatenated with the output of the previous layer's readout to form a global context-aware node representation. In addition, a self-attention module is introduced to aggregate layer-wise representations to form the final representation. Experiments on benchmark datasets demonstrate the superiority of the proposed model over state-of-the-art methods in terms of classification accuracy.
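The dense connectivity described above can be sketched with the standard GCN propagation rule (Kipf and Welling), ReLU(Â H W), where Â is the symmetrically normalised adjacency with self-loops and each layer's input is the concatenation of the node features and all previous layer outputs. The propagation rule and the tiny weight matrices are assumptions; the feature-reconstruction module, readouts, and self-attention aggregation are not shown.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalised_adjacency(adj: Matrix) -> Matrix:
    """D^{-1/2} (A + I) D^{-1/2}, where D holds the degrees of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def concat_features(blocks: list[Matrix]) -> Matrix:
    """Row-wise concatenation of several node-feature matrices."""
    return [sum(rows, []) for rows in zip(*blocks)]


def dense_gcn(adj: Matrix, x: Matrix, weights: list[Matrix]) -> list[Matrix]:
    """Layer l consumes [x, H_1, ..., H_{l-1}], so weights[l] needs matching input rows."""
    a_norm = normalised_adjacency(adj)
    outputs = [x]
    for w in weights:
        h_in = concat_features(outputs)
        h_out = matmul(matmul(a_norm, h_in), w)
        outputs.append([[max(0.0, v) for v in row] for row in h_out])  # ReLU
    return outputs[1:]


# Path graph 0-1-2 with one input feature per node; layer 2 sees 1 + 1 = 2 input columns.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
x = [[1.0], [1.0], [1.0]]
layers = dense_gcn(adj, x, [[[1.0]], [[1.0], [0.5]]])
```

Feeding every earlier representation forward keeps the original node features and short-range patterns available to deeper layers, which is the locality-preserving motivation given above.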
Submitted 11 October, 2020;
originally announced October 2020.
-
Low-Rank Matrix Recovery from Noise via an MDL Framework-based Atomic Norm
Authors:
Anyong Qin,
Lina Xian,
Yongliang Yang,
Taiping Zhang,
Yuan Yan Tang
Abstract:
The recovery of the underlying low-rank structure of clean data corrupted with sparse noise/outliers is attracting increasing interest. However, in many low-level vision problems, neither the exact target rank of the underlying structure nor the locations and values of the sparse outliers are known. Thus, conventional methods cannot separate the low-rank and sparse components completely, especially in the case of gross outliers or deficient observations. Therefore, in this study, we employ the minimum description length (MDL) principle and the atomic norm for low-rank matrix recovery to overcome these limitations. First, we employ the atomic norm to find all the candidate atoms of the low-rank and sparse terms, and then we minimize the description length of the model to select the appropriate atoms of the low-rank and sparse matrices, respectively. Our experimental analyses show that the proposed approach can obtain a higher success rate than state-of-the-art methods, even when the number of observations is limited or the corruption ratio is high. Experimental results on synthetic data and real sensing applications (high dynamic range imaging, background modeling, and noise and shadow removal) demonstrate the effectiveness, robustness and efficiency of the proposed method.
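As a simplified illustration of MDL-style model selection, the sketch below picks the rank of the low-rank component from a matrix's singular values by minimising a generic two-part code length: a residual term (N/2) log(RSS_k / N) plus an approximate parameter term (k(m + n)/2) log N, where N = mn. This BIC-like formula is an assumption rather than the paper's description length; it ignores the sparse outlier term and the atomic-norm candidate search, and it assumes the singular values were computed elsewhere (for example with an SVD routine).

```python
import math
from typing import Optional


def mdl_rank(singular_values: list[float], m: int, n: int,
             max_rank: Optional[int] = None) -> int:
    """Return the rank k minimising a two-part code length for an m x n matrix.

    RSS_k is the energy left in the discarded singular values; k * (m + n)
    approximates the number of parameters of the rank-k factors.
    """
    total = m * n
    energies = sorted((s * s for s in singular_values), reverse=True)
    if max_rank is None:
        max_rank = min(m, n) - 1  # keep at least one residual value so RSS_k > 0
    best_rank, best_length = 0, math.inf
    for k in range(max_rank + 1):
        rss = sum(energies[k:])
        if rss <= 0.0:
            break  # the remaining components fit exactly; log(0) is undefined
        length = 0.5 * total * math.log(rss / total) + 0.5 * k * (m + n) * math.log(total)
        if length < best_length:
            best_rank, best_length = k, length
    return best_rank


# Two strong components plus a flat noise floor in a 30 x 30 matrix (hypothetical spectrum).
spectrum = [10.0, 5.0] + [0.1] * 28
print(mdl_rank(spectrum, 30, 30))
```

Beyond rank 2, each extra component buys only a tiny drop in residual energy, which the parameter term outweighs.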
Submitted 27 October, 2020; v1 submitted 17 September, 2020;
originally announced September 2020.
-
Learning a Deep Part-based Representation by Preserving Data Distribution
Authors:
Anyong Qin,
Zhaowei Shang,
Zhuolin Tan,
Taiping Zhang,
Yuan Yan Tang
Abstract:
Unsupervised dimensionality reduction is a commonly used technique for high-dimensional data recognition problems. A deep autoencoder network whose weights are constrained to be non-negative can learn a low-dimensional, part-based representation of data. Meanwhile, the inherent structure of each data cluster can be described by the distribution of the intraclass samples. One therefore hopes to learn a new low-dimensional representation that preserves, as faithfully as possible, the intrinsic structure embedded in the original high-dimensional data space. In this paper, a deep part-based representation is learned by preserving the data distribution, yielding a novel algorithm called Distribution Preserving Network Embedding (DPNE). In DPNE, we first estimate the distribution of the original high-dimensional data using $k$-nearest neighbor kernel density estimation, and then seek a part-based representation that respects this distribution. Experimental results on real-world data sets show that the proposed algorithm performs well in terms of clustering accuracy and adjusted mutual information (AMI), indicating that the manifold structure of the raw data is well preserved in the low-dimensional feature space.
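The first DPNE step, estimating the data distribution with $k$-nearest neighbors, can be sketched with the basic $k$-NN density estimator $\hat p(x) = k / (n\,V_d\,r_k(x)^d)$, where $r_k(x)$ is the distance to the $k$-th nearest neighbor and $V_d$ is the volume of the unit $d$-ball. This basic form is an assumption; the exact kernel used in DPNE may differ, and `knn_density` is a hypothetical name.

```python
import math
from typing import Sequence

Point = Sequence[float]


def knn_density(query: Point, data: Sequence[Point], k: int) -> float:
    """k-NN density estimate p(x) = k / (n * V_d * r_k(x)^d).

    If the query is itself in `data`, it counts as its own nearest neighbour.
    """
    n, d = len(data), len(query)
    distances = sorted(math.dist(query, p) for p in data)
    r_k = distances[k - 1]                                   # distance to the k-th neighbour
    if r_k == 0.0:
        raise ValueError("k-th neighbour coincides with the query; increase k")
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)   # volume of the unit d-ball
    return k / (n * unit_ball * r_k ** d)


data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (10.0, 10.0)]
dense = knn_density((0.05, 0.05), data, k=2)   # inside the small cluster
sparse = knn_density((10.0, 10.0), data, k=2)  # at the isolated point
print(dense > sparse)  # True
```

Points inside a tight cluster receive a much higher density than isolated points, which is the intraclass structure DPNE aims to keep in the embedding.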
Submitted 17 September, 2020;
originally announced September 2020.
-
A Novel DNN Training Framework via Data Sampling and Multi-Task Optimization
Authors:
Boyu Zhang,
A. K. Qin,
Hong Pan,
Timos Sellis
Abstract:
Conventional DNN training paradigms typically rely on one training set and one validation set, obtained by partitioning, in a certain way, the annotated dataset used for training (the gross training set). The training set is used to train the model, while the validation set is used to estimate the generalization performance of the trained model as training proceeds, in order to avoid over-fitting. This paradigm has two major issues. First, the validation set can hardly guarantee an unbiased estimate of generalization performance due to a potential mismatch with the test data. Second, training a DNN corresponds to solving a complex optimization problem, which is prone to getting trapped in inferior local optima and thus leads to undesired training results. To address these issues, we propose a novel DNN training framework. It generates multiple pairs of training and validation sets from the gross training set via random splitting, trains a DNN model of a pre-specified structure on each pair while allowing useful knowledge (e.g., promising network parameters) obtained in one model training process to be transferred to the other model training processes via multi-task optimization, and outputs, among all trained models, the one with the best overall performance across the validation sets of all pairs. The knowledge transfer mechanism in this new framework can not only enhance training effectiveness by helping the model training process escape from local optima but also improve generalization performance via the implicit regularization that the other model training processes impose on each one. We implement the proposed framework, parallelize the implementation on a GPU cluster, and apply it to train several widely used DNN models. Experimental results demonstrate the superiority of the proposed framework over the conventional training paradigm.
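The data-sampling and model-selection parts of the framework can be sketched as follows: several random (train, validation) splits are drawn from the gross training set, and the final model is the one with the best mean score over all validation sets. The multi-task knowledge transfer between training processes is not shown, and `random_splits`, `select_best_model`, and the `evaluate` callback are hypothetical names.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple, TypeVar

Model = TypeVar("Model")
Split = Tuple[List[int], List[int]]   # (train indices, validation indices)


def random_splits(n_samples: int, n_pairs: int, val_fraction: float,
                  seed: int = 0) -> List[Split]:
    """Generate n_pairs random (train, validation) splits of the gross training set."""
    rng = random.Random(seed)
    n_val = max(1, round(n_samples * val_fraction))
    splits = []
    for _ in range(n_pairs):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        splits.append((sorted(idx[n_val:]), sorted(idx[:n_val])))
    return splits


def select_best_model(models: Sequence[Model], splits: Sequence[Split],
                      evaluate: Callable[[Model, List[int]], float]) -> Model:
    """Pick the model with the best mean score over the validation sets of all pairs."""
    return max(models, key=lambda m: mean(evaluate(m, val) for _, val in splits))


splits = random_splits(n_samples=10, n_pairs=3, val_fraction=0.2)
toy_scores = {"a": 0.7, "b": 0.9, "c": 0.8}          # stand-in validation accuracies
print(select_best_model(["a", "b", "c"], splits, lambda m, val: toy_scores[m]))  # b
```

Averaging over several validation sets, rather than trusting a single one, is what reduces the dependence on any one potentially unrepresentative split.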
Submitted 2 July, 2020;
originally announced July 2020.
-
PGD-UNet: A Position-Guided Deformable Network for Simultaneous Segmentation of Organs and Tumors
Authors:
Ziqiang Li,
Hong Pan,
Yaping Zhu,
A. K. Qin
Abstract:
Precise segmentation of organs and tumors plays a crucial role in clinical applications. It is a challenging task due to the irregular shapes and varied sizes of organs and tumors, as well as the significant class imbalance between the anatomy of interest (AOI) and the background region. In addition, tumors and normal organs often overlap in medical images, yet current approaches fail to delineate both tumors and organs accurately. To tackle these challenges, we propose a position-guided deformable UNet, namely PGD-UNet, which exploits the spatial deformation capabilities of deformable convolution to deal with the geometric transformation of both organs and tumors. Position information is explicitly encoded into the network to enhance its deformation capabilities. Meanwhile, we introduce a new pooling module to preserve the position information lost in the conventional max-pooling operation. Besides, due to unclear boundaries between different structures and the subjectivity of annotations, labels are not necessarily accurate for medical image segmentation tasks, and such label noise may cause the trained network to overfit. To address this issue, we formulate a novel loss function to suppress the influence of potential label noise on the training process. Our method was evaluated on two challenging segmentation tasks and achieved very promising segmentation accuracy in both.
Submitted 2 July, 2020;
originally announced July 2020.
-
DTG-Net: Differentiated Teachers Guided Self-Supervised Video Action Recognition
Authors:
Ziming Liu,
Guangyu Gao,
A. K. Qin,
Jinyang Li
Abstract:
State-of-the-art video action recognition models with complex network architectures have achieved significant improvements, but these models depend heavily on large-scale, well-labeled datasets. To reduce this dependency, we propose a self-supervised teacher-student architecture, the Differentiated Teachers Guided self-supervised Network (DTG-Net). In DTG-Net, besides reducing the dependency on labeled data through self-supervised learning (SSL), pre-trained action-related models are used as teachers that provide prior knowledge, alleviating the demand for a large number of unlabeled videos in SSL. Specifically, leveraging years of effort on action-related tasks, e.g., image classification and image-based action recognition, DTG-Net learns the self-supervised video representation under the guidance of various teachers, i.e., well-trained models for those action-related tasks. DTG-Net is optimized via contrastive self-supervised learning: two image sequences are randomly sampled from the same video or from different videos as a positive or negative pair, respectively, and are then sent to the teacher and student networks for feature embedding. A contrastive feature consistency is then defined between the feature embeddings of each pair, i.e., consistent for positive pairs and inconsistent for negative pairs. Meanwhile, to reflect the different guidance of the various teacher tasks, we also explore weighting the guidance of each teacher task differently. Finally, DTG-Net is evaluated in two ways: (i) the self-supervised DTG-Net is used to pre-train supervised action recognition models with only unlabeled videos; (ii) the supervised DTG-Net is jointly trained with the supervised action networks in an end-to-end way. It outperforms most pre-training methods and is also highly competitive with supervised action recognition methods.
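The contrastive consistency between positive and negative pairs can be sketched with an InfoNCE-style loss over cosine similarities. The abstract does not state the exact loss, so InfoNCE and the temperature value are assumptions, the vectors stand in for teacher/student embeddings, and `info_nce_loss` is a hypothetical name.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce_loss(anchor: Vector, positive: Vector, negatives: Sequence[Vector],
                  temperature: float = 0.1) -> float:
    """Cross-entropy of identifying the positive among {positive} + negatives."""
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, neg) / temperature for neg in negatives]
    top = max(logits)                                   # numerically stable log-sum-exp
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[0]


student = [1.0, 0.0]                                    # student embedding of one clip
consistent = info_nce_loss(student, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
inconsistent = info_nce_loss(student, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(consistent < inconsistent)  # True
```

The loss is small when the positive pair's embeddings agree and the negatives disagree, and large in the opposite case; a per-teacher weight could scale such a term to realize differentiated guidance.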
Submitted 13 June, 2020;
originally announced June 2020.
-
Distributed Machine Learning for Predictive Analytics in Mobile Edge Computing Based IoT Environments
Authors:
Prabath Abeysekara,
Hai Dong,
A. K. Qin
Abstract:
Predictive analytics in Mobile Edge Computing (MEC) based Internet of Things (IoT) environments is in increasingly high demand in many real-world applications. A prediction problem in an MEC-based IoT environment typically corresponds to a collection of tasks, each solved in a specific MEC environment based on locally accumulated data, and can therefore be regarded as a Multi-task Learning (MTL) problem. However, the heterogeneity of the data (non-IIDness) accumulated across different MEC environments challenges the application of general MTL techniques in such a setting. Federated MTL (FMTL) has recently emerged as an attempt to address this issue. Besides FMTL, there exists another powerful but under-exploited distributed machine learning technique, called Network Lasso (NL), which is inherently related to FMTL but has its own unique features. In this paper, we make an in-depth evaluation and comparison of these two techniques on three distinct IoT datasets representing real-world application scenarios. Experimental results reveal that NL outperforms FMTL in MEC-based IoT environments in terms of both accuracy and computational efficiency.
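For reference, the Network Lasso objective couples per-node models through a graph: $\sum_i f_i(x_i) + \lambda \sum_{(j,k)\in E} w_{jk}\lVert x_j - x_k\rVert_2$. A sketch of evaluating it follows, assuming each node is one MEC environment with its own local loss $f_i$; the function name and the toy squared-norm losses are hypothetical, and no solver (e.g., ADMM) is shown.

```python
import math
from typing import Callable, List, Sequence, Tuple

Params = List[float]
Edge = Tuple[int, int, float]   # (node j, node k, weight w_jk)


def network_lasso_objective(params: Sequence[Params],
                            local_losses: Sequence[Callable[[Params], float]],
                            edges: Sequence[Edge], lam: float) -> float:
    """sum_i f_i(x_i) + lam * sum_{(j,k) in E} w_jk * ||x_j - x_k||_2."""
    local = sum(f(x) for f, x in zip(local_losses, params))
    coupling = sum(w * math.dist(params[j], params[k]) for j, k, w in edges)
    return local + lam * coupling


def squared_norm(x: Params) -> float:   # toy local loss for illustration
    return sum(v * v for v in x)


params = [[0.0, 0.0], [3.0, 4.0]]       # one parameter vector per MEC environment
print(network_lasso_objective(params, [squared_norm, squared_norm], [(0, 1, 1.0)], lam=0.5))  # 27.5
```

Because the coupling uses the unsquared $\ell_2$ norm, it encourages neighbouring models to become exactly equal, so related environments can share a model while dissimilar ones keep their own.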
Submitted 29 July, 2020; v1 submitted 7 June, 2020;
originally announced June 2020.
-
Adversarial Camouflage: Hiding Physical-World Attacks with Natural Styles
Authors:
Ranjie Duan,
Xingjun Ma,
Yisen Wang,
James Bailey,
A. K. Qin,
Yun Yang
Abstract:
Deep neural networks (DNNs) are known to be vulnerable to adversarial examples. Existing works have mostly focused on either digital adversarial examples created via small and imperceptible perturbations, or physical-world adversarial examples created with large and less realistic distortions that are easily identified by human observers. In this paper, we propose a novel approach, called Adversarial Camouflage (\emph{AdvCam}), to craft and camouflage physical-world adversarial examples into natural styles that appear legitimate to human observers. Specifically, \emph{AdvCam} transfers large adversarial perturbations into customized styles, which are then "hidden" on the target object or in the off-target background. Experimental evaluation shows that, in both digital and physical-world scenarios, adversarial examples crafted by \emph{AdvCam} are well camouflaged and highly stealthy, while remaining effective in fooling state-of-the-art DNN image classifiers. Hence, \emph{AdvCam} is a flexible approach that can help craft stealthy attacks to evaluate the robustness of DNNs. \emph{AdvCam} can also be used to protect private information from being detected by deep learning systems.
Submitted 22 June, 2020; v1 submitted 8 March, 2020;
originally announced March 2020.
-
An extended catalog of galaxy-galaxy strong gravitational lenses discovered in DES using convolutional neural networks
Authors:
C. Jacobs,
T. Collett,
K. Glazebrook,
E. Buckley-Geer,
H. T. Diehl,
H. Lin,
C. McCarthy,
A. K. Qin,
C. Odden,
M. Caso Escudero,
P. Dial,
V. J. Yung,
S. Gaitsch,
A. Pellico,
K. A. Lindgren,
T. M. C. Abbott,
J. Annis,
S. Avila,
D. Brooks,
D. L. Burke,
A. Carnero Rosell,
M. Carrasco Kind,
J. Carretero,
L. N. da Costa,
J. De Vicente
, et al. (33 additional authors not shown)
Abstract:
We search Dark Energy Survey (DES) Year 3 imaging for galaxy-galaxy strong gravitational lenses using convolutional neural networks, extending previous work with new training sets and covering a wider range of redshifts and colors. We train two neural networks using images of simulated lenses, then use them to score postage stamp images of 7.9 million sources from the Dark Energy Survey chosen to have plausible lens colors based on simulations. We examine 1175 of the highest-scored candidates and identify 152 probable or definite lenses. Examining an additional 20,000 images with lower scores, we identify a further 247 probable or definite candidates. After including 86 candidates discovered in earlier searches using neural networks and 26 candidates discovered through visual inspection of blue-near-red objects in the DES catalog, we present a catalog of 511 lens candidates.
Submitted 25 May, 2019;
originally announced May 2019.
-
Location-Centered House Price Prediction: A Multi-Task Learning Approach
Authors:
Guangliang Gao,
Zhifeng Bao,
Jie Cao,
A. K. Qin,
Timos Sellis,
Zhiang Wu
Abstract:
Accurate house price prediction is of great significance to various real estate stakeholders such as house owners, buyers, investors, and agents. We propose a location-centered prediction framework that differs from existing work in terms of data profiling and prediction model. Regarding data profiling, we define and capture a fine-grained location profile powered by a diverse range of location data sources, such as a transportation profile (e.g., distance to the nearest train station), an education profile (e.g., school zones and rankings), a suburb profile based on census data, and a facility profile (e.g., nearby hospitals and supermarkets). Regarding the choice of prediction model, we observe that existing approaches either model the entire house dataset as a whole, or split it and model each partition independently. However, such modeling ignores the relatedness between partitions, and, for the latter approach, there may not be sufficient training samples per partition in every prediction scenario. We address this problem through a careful study of Multi-Task Learning (MTL) models. Specifically, we map the strategies for splitting the entire house dataset to the ways tasks are defined in MTL, so that each partition obtained is aligned with a task. Furthermore, we select specific MTL-based methods with different regularization terms to capture and exploit the relatedness between tasks. Based on real-world house transaction data collected in Melbourne, Australia, we design extensive experimental evaluations, and the results indicate a significant superiority of MTL-based methods over state-of-the-art approaches. Meanwhile, we conduct an in-depth analysis of the impact of task definitions and method selections in MTL on prediction performance, and demonstrate that the impact of task definitions far exceeds that of method selections.
Submitted 7 January, 2019;
originally announced January 2019.
-
A Multiscale Image Denoising Algorithm Based On Dilated Residual Convolution Network
Authors:
Chang Liu,
Zhaowei Shang,
Anyong Qin
Abstract:
Image denoising is a classical problem in low-level computer vision. Model-based optimization methods and deep learning approaches have been the two main strategies for solving it. Model-based optimization methods are flexible in handling different inverse problems but are usually time-consuming. In contrast, deep learning methods have fast testing speed, but the performance of these CNNs is still inferior. To address this issue, we propose a novel deep residual learning model that combines dilated residual convolution and multiscale convolution groups. Because images contain complex patterns and structures, the multiscale convolution group is utilized to learn those patterns and enlarge the receptive field. Specifically, residual connections and batch normalization are utilized to speed up training and maintain denoising performance. To reduce gridding artifacts, we integrate the hybrid dilated convolution design into our model. Overall, this paper aims to train a lightweight and effective denoiser based on multiscale convolution groups. Experimental results demonstrate that the enhanced denoiser not only achieves promising denoising results but is also a strong competitor in practical applications.
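Why the hybrid dilated convolution design reduces gridding artifacts can be seen in a simplified 1-D analysis: stacking $3\times3$ convolutions with the same dilation rate samples only every $d$-th input position, while mixed rates such as 1, 2, 3 cover the whole receptive field. The 1-D reduction and the function names are assumptions for illustration, not the paper's architecture.

```python
from typing import Sequence, Set


def sampled_offsets(dilations: Sequence[int], kernel_size: int = 3) -> Set[int]:
    """1-D input offsets that influence one output after stacking dilated convs."""
    half = kernel_size // 2
    offsets = {0}
    for d in dilations:
        taps = [d * t for t in range(-half, half + 1)]   # kernel taps spaced d apart
        offsets = {o + t for o in offsets for t in taps}
    return offsets


def receptive_field(dilations: Sequence[int], kernel_size: int = 3) -> int:
    offs = sampled_offsets(dilations, kernel_size)
    return max(offs) - min(offs) + 1


def has_gridding(dilations: Sequence[int], kernel_size: int = 3) -> bool:
    """True if some input positions inside the receptive field are never sampled."""
    return len(sampled_offsets(dilations, kernel_size)) < receptive_field(dilations, kernel_size)


for rates in ([2, 2, 2], [1, 2, 3]):
    print(rates, receptive_field(rates), has_gridding(rates))
# [2, 2, 2] 13 True
# [1, 2, 3] 13 False
```

Both stacks reach the same 13-pixel receptive field, but only the mixed-rate stack uses every pixel inside it.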
Submitted 21 December, 2018;
originally announced December 2018.
-
Finding high-redshift strong lenses in DES using convolutional neural networks
Authors:
C. Jacobs,
T. Collett,
K. Glazebrook,
C. McCarthy,
A. K. Qin,
T. M. C. Abbott,
F. B. Abdalla,
J. Annis,
S. Avila,
K. Bechtol,
E. Bertin,
D. Brooks,
E. Buckley-Geer,
D. L. Burke,
A. Carnero Rosell,
M. Carrasco Kind,
J. Carretero,
L. N. da Costa,
C. Davis,
J. De Vicente,
S. Desai,
H. T. Diehl,
P. Doel,
T. F. Eifler,
B. Flaugher
, et al. (41 additional authors not shown)
Abstract:
We search Dark Energy Survey (DES) Year 3 imaging data for galaxy-galaxy strong gravitational lenses using convolutional neural networks. We generate 250,000 simulated lenses at redshifts > 0.8 from which we create a data set for training the neural networks with realistic seeing, sky and shot noise. Using the simulations as a guide, we build a catalogue of 1.1 million DES sources with 1.8 < g - i < 5, 0.6 < g - r < 3, r_mag > 19, g_mag > 20 and i_mag > 18.2. We train two ensembles of neural networks on training sets consisting of simulated lenses, simulated non-lenses, and real sources. We use the neural networks to score images of each of the sources in our catalogue with a value from 0 to 1, and select those with scores greater than a chosen threshold for visual inspection, resulting in a candidate set of 7,301 galaxies. During visual inspection we rate 84 as "probably" or "definitely" lenses. Four of these are previously known lenses or lens candidates. We inspect a further 9,428 candidates with a different score threshold, and identify four new candidates. We present 84 new strong lens candidates, selected after a few hours of visual inspection by astronomers. This catalogue contains a comparable number of high-redshift lenses to that predicted by simulations. Based on simulations we estimate our sample to contain most discoverable lenses in this imaging and at this redshift range.
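The catalogue selection and score thresholding described above amount to two simple filters. The sketch below encodes the colour and magnitude cuts exactly as stated in the abstract; which DES magnitude measurement they apply to is not stated and is left open, and `Source`, `passes_catalogue_cuts`, and `select_for_inspection` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Source:
    g: float   # g-band magnitude
    r: float   # r-band magnitude
    i: float   # i-band magnitude


def passes_catalogue_cuts(s: Source) -> bool:
    """Colour and magnitude cuts quoted in the abstract."""
    return (1.8 < s.g - s.i < 5
            and 0.6 < s.g - s.r < 3
            and s.r > 19 and s.g > 20 and s.i > 18.2)


def select_for_inspection(scores: Sequence[float], threshold: float) -> List[int]:
    """Indices of sources whose network score (0 to 1) exceeds the threshold."""
    return [idx for idx, score in enumerate(scores) if score > threshold]


red_source = Source(g=22.5, r=21.0, i=20.0)    # g - i = 2.5, g - r = 1.5
blue_source = Source(g=21.0, r=20.5, i=20.0)   # g - i = 1.0, fails the first cut
print(passes_catalogue_cuts(red_source), passes_catalogue_cuts(blue_source))  # True False
print(select_for_inspection([0.1, 0.95, 0.6, 0.99], threshold=0.9))           # [1, 3]
```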
Submitted 22 January, 2019; v1 submitted 9 November, 2018;
originally announced November 2018.
-
Superconductivity in potassium-doped 2,2$'$-bipyridine
Authors:
Kai Zhang,
Ren-Shu Wang,
An-Jun Qin,
Xiao-Jia Chen
Abstract:
Organic compounds have long been considered promising candidates for superconductors with high transition temperatures. We examine this proposal by choosing 2,2$'$-bipyridine, which is composed solely of C, H, and N atoms. The presence of the Meissner effect with a transition temperature of 7.2 K in this material upon potassium doping is demonstrated by $dc$ magnetic susceptibility measurements. The real part of the $ac$ susceptibility exhibits the same transition temperature as the $dc$ magnetization, and a sharp peak appearing in the imaginary part indicates the formation of the weakly linked superconducting vortex current. The occurrence of superconductivity is further supported by the resistance drop at the transition, together with its suppression by applied magnetic fields. The superconducting phase is identified as K$_3$-2,2$'$-bipyridine from an analysis of Raman scattering spectra. This work not only opens an encouraging window for finding superconductivity, beyond optoelectronics, in 2,2$'$-bipyridine-based materials but also offers an example of realizing superconductivity in conducting polymers and their derivatives.
Submitted 24 January, 2018; v1 submitted 19 January, 2018;
originally announced January 2018.
-
Evolutionary Multitasking for Single-objective Continuous Optimization: Benchmark Problems, Performance Metric, and Baseline Results
Authors:
Bingshui Da,
Yew-Soon Ong,
Liang Feng,
A. K. Qin,
Abhishek Gupta,
Zexuan Zhu,
Chuan-Kang Ting,
Ke Tang,
Xin Yao
Abstract:
In this report, we suggest nine test problems for multi-task single-objective optimization (MTSOO), each consisting of two single-objective optimization tasks that need to be solved simultaneously. The relationship between the tasks varies across test problems, which helps provide a comprehensive evaluation of multifactorial optimization (MFO) algorithms. We expect that the proposed test problems will foster progress in MTSOO research.
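A two-task problem of this kind is typically evaluated through a unified search space: each individual lives in $[0,1]^D$ and is decoded linearly into each task's own box before evaluation. The sketch below assumes both tasks share the dimension $D$; the Sphere/Rastrigin pairing and bounds are illustrative and are not claimed to match any of the report's nine definitions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Task = Tuple[Callable[[List[float]], float], float, float]   # (objective, lower, upper)


def sphere(x: List[float]) -> float:
    return sum(v * v for v in x)


def rastrigin(x: List[float]) -> float:
    return 10.0 * len(x) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) for v in x)


def decode(y: Sequence[float], lower: float, upper: float) -> List[float]:
    """Map a unified-space individual y in [0, 1]^D into a task's own box."""
    return [lower + yi * (upper - lower) for yi in y]


def evaluate_on_all_tasks(y: Sequence[float], tasks: Sequence[Task]) -> List[float]:
    """Factorial costs of one individual on every task of the problem."""
    return [f(decode(y, lo, hi)) for f, lo, hi in tasks]


# Illustrative two-task problem (hypothetical; not one of the report's nine).
TASKS: List[Task] = [(sphere, -100.0, 100.0), (rastrigin, -5.12, 5.12)]

print(evaluate_on_all_tasks([0.5] * 10, TASKS))   # both tasks at their optimum
print(evaluate_on_all_tasks([0.75, 0.75], TASKS))
```

Here the two optima coincide in the unified space, a case of complete intersection; shifting one task's optimum would produce a less related pair, which is the kind of variation the benchmark suite uses.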
Submitted 12 June, 2017;
originally announced June 2017.
-
Evolutionary Multitasking for Multiobjective Continuous Optimization: Benchmark Problems, Performance Metrics and Baseline Results
Authors:
Yuan Yuan,
Yew-Soon Ong,
Liang Feng,
A. K. Qin,
Abhishek Gupta,
Bingshui Da,
Qingfu Zhang,
Kay Chen Tan,
Yaochu Jin,
Hisao Ishibuchi
Abstract:
In this report, we suggest nine test problems for multi-task multi-objective optimization (MTMOO), each consisting of two multiobjective optimization tasks that need to be solved simultaneously. The relationship between the tasks varies across test problems, which helps provide a comprehensive evaluation of multiobjective multifactorial optimization (MO-MFO) algorithms. We expect that the proposed test problems will foster progress in MTMOO research.
Submitted 8 June, 2017;
originally announced June 2017.