Search | arXiv e-print repository

AniFaceDiff: High-Fidelity Face Reenactment via Facial Parametric Conditioned Diffusion Models

Authors: Ken Chen, Sachith Seneviratne, Wei Wang, Dongting Hu, Sanjay Saha, Md. Tarek Hasan, Sanka Rasnayaka, Tamasha Malepathirana, Mingming Gong, Saman Halgamuge

Abstract: Face reenactment refers to the process of transferring the pose and facial expressions from a reference (driving) video onto a static facial (source) image while maintaining the original identity of the source image. Previous research in this domain has made significant progress by training controllable deep generative models to generate faces based on specific identity, pose and expression condit… ▽ More Face reenactment refers to the process of transferring the pose and facial expressions from a reference (driving) video onto a static facial (source) image while maintaining the original identity of the source image. Previous research in this domain has made significant progress by training controllable deep generative models to generate faces based on specific identity, pose and expression conditions. However, the mechanisms used in these methods to control pose and expression often inadvertently introduce identity information from the driving video, while also causing a loss of expression-related details. This paper proposes a new method based on Stable Diffusion, called AniFaceDiff, incorporating a new conditioning module for high-fidelity face reenactment. First, we propose an enhanced 2D facial snapshot conditioning approach by facial shape alignment to prevent the inclusion of identity information from the driving video. Then, we introduce an expression adapter conditioning mechanism to address the potential loss of expression-related information. Our approach effectively preserves pose and expression fidelity from the driving video while retaining the identity and fine details of the source image. Through experiments on the VoxCeleb dataset, we demonstrate that our method achieves state-of-the-art results in face reenactment, showcasing superior image quality, identity preservation, and expression accuracy, especially for cross-identity scenarios. Considering the ethical concerns surrounding potential misuse, we analyze the implications of our method, evaluate current state-of-the-art deepfake detectors, and identify their shortcomings to guide future research. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2403.04492 [pdf, other]

Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning

Authors: Rashindrie Perera, Saman Halgamuge

Abstract: In this paper, we look at cross-domain few-shot classification which presents the challenging task of learning new classes in previously unseen domains with few labelled examples. Existing methods, though somewhat effective, encounter several limitations, which we alleviate through two significant improvements. First, we introduce a lightweight parameter-efficient adaptation strategy to address ov… ▽ More In this paper, we look at cross-domain few-shot classification which presents the challenging task of learning new classes in previously unseen domains with few labelled examples. Existing methods, though somewhat effective, encounter several limitations, which we alleviate through two significant improvements. First, we introduce a lightweight parameter-efficient adaptation strategy to address overfitting associated with fine-tuning a large number of parameters on small datasets. This strategy employs a linear transformation of pre-trained features, significantly reducing the trainable parameter count. Second, we replace the traditional nearest centroid classifier with a discriminative sample-aware loss function, enhancing the model's sensitivity to the inter- and intra-class variances within the training set for improved clustering in feature space. Empirical evaluations on the Meta-Dataset benchmark showcase that our approach not only improves accuracy up to 7.7\% and 5.3\% on previously seen and unseen datasets, respectively, but also achieves the above performance while being at least $\sim3\times$ more parameter-efficient than existing methods, establishing a new state-of-the-art in cross-domain few-shot learning. Our code is available at https://github.com/rashindrie/DIPA. △ Less

Submitted 3 April, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

Comments: Code is available at this link: https://github.com/rashindrie/DIPA

arXiv:2403.01653 [pdf, other]

Day-ahead regional solar power forecasting with hierarchical temporal convolutional neural networks using historical power generation and weather data

Authors: Maneesha Perera, Julian De Hoog, Kasun Bandara, Damith Senanayake, Saman Halgamuge

Abstract: Regional solar power forecasting, which involves predicting the total power generation from all rooftop photovoltaic systems in a region holds significant importance for various stakeholders in the energy sector. However, the vast amount of solar power generation and weather time series from geographically dispersed locations that need to be considered in the forecasting process makes accurate reg… ▽ More Regional solar power forecasting, which involves predicting the total power generation from all rooftop photovoltaic systems in a region holds significant importance for various stakeholders in the energy sector. However, the vast amount of solar power generation and weather time series from geographically dispersed locations that need to be considered in the forecasting process makes accurate regional forecasting challenging. Therefore, previous work has limited the focus to either forecasting a single time series (i.e., aggregated time series) which is the addition of all solar generation time series in a region, disregarding the location-specific weather effects or forecasting solar generation time series of each PV site (i.e., individual time series) independently using location-specific weather data, resulting in a large number of forecasting models. In this work, we propose two deep-learning-based regional forecasting methods that can effectively leverage both types of time series (aggregated and individual) with weather data in a region. We propose two hierarchical temporal convolutional neural network architectures (HTCNN) and two strategies to adapt HTCNNs for regional solar power forecasting. At first, we explore generating a regional forecast using a single HTCNN. Next, we divide the region into multiple sub-regions based on weather information and train separate HTCNNs for each sub-region; the forecasts of each sub-region are then added to generate a regional forecast. The proposed work is evaluated using a large dataset collected over a year from 101 locations across Western Australia to provide a day ahead forecast. We compare our approaches with well-known alternative methods and show that the sub-region HTCNN requires fewer individual networks and achieves a forecast skill score of 40.2% reducing a statistically significant error by 6.5% compared to the best counterpart. △ Less

Submitted 3 March, 2024; originally announced March 2024.

Comments: 37 pages, 16 figures, Accepted to the journal of Applied Energy

arXiv:2401.03104 [pdf, other]

When To Grow? A Fitting Risk-Aware Policy for Layer Growing in Deep Neural Networks

Authors: Haihang Wu, Wei Wang, Tamasha Malepathirana, Damith Senanayake, Denny Oetomo, Saman Halgamuge

Abstract: Neural growth is the process of growing a small neural network to a large network and has been utilized to accelerate the training of deep neural networks. One crucial aspect of neural growth is determining the optimal growth timing. However, few studies investigate this systematically. Our study reveals that neural growth inherently exhibits a regularization effect, whose intensity is influenced… ▽ More Neural growth is the process of growing a small neural network to a large network and has been utilized to accelerate the training of deep neural networks. One crucial aspect of neural growth is determining the optimal growth timing. However, few studies investigate this systematically. Our study reveals that neural growth inherently exhibits a regularization effect, whose intensity is influenced by the chosen policy for growth timing. While this regularization effect may mitigate the overfitting risk of the model, it may lead to a notable accuracy drop when the model underfits. Yet, current approaches have not addressed this issue due to their lack of consideration of the regularization effect from neural growth. Motivated by these findings, we propose an under/over fitting risk-aware growth timing policy, which automatically adjusts the growth timing informed by the level of potential under/overfitting risks to address both risks. Comprehensive experiments conducted using CIFAR-10/100 and ImageNet datasets show that the proposed policy achieves accuracy improvements of up to 1.3% in models prone to underfitting while achieving similar accuracies in models suffering from overfitting compared to the existing methods. △ Less

Submitted 5 January, 2024; originally announced January 2024.

Comments: Accepted by AAAI'24

arXiv:2312.10913 [pdf, other]

GINN-LP: A Growing Interpretable Neural Network for Discovering Multivariate Laurent Polynomial Equations

Authors: Nisal Ranasinghe, Damith Senanayake, Sachith Seneviratne, Malin Premaratne, Saman Halgamuge

Abstract: Traditional machine learning is generally treated as a black-box optimization problem and does not typically produce interpretable functions that connect inputs and outputs. However, the ability to discover such interpretable functions is desirable. In this work, we propose GINN-LP, an interpretable neural network to discover the form and coefficients of the underlying equation of a dataset, when… ▽ More Traditional machine learning is generally treated as a black-box optimization problem and does not typically produce interpretable functions that connect inputs and outputs. However, the ability to discover such interpretable functions is desirable. In this work, we propose GINN-LP, an interpretable neural network to discover the form and coefficients of the underlying equation of a dataset, when the equation is assumed to take the form of a multivariate Laurent Polynomial. This is facilitated by a new type of interpretable neural network block, named the "power-term approximator block", consisting of logarithmic and exponential activation functions. GINN-LP is end-to-end differentiable, making it possible to use backpropagation for training. We propose a neural network growth strategy that will enable finding the suitable number of terms in the Laurent polynomial that represents the data, along with sparsity regularization to promote the discovery of concise equations. To the best of our knowledge, this is the first model that can discover arbitrary multivariate Laurent polynomial terms without any prior information on the order. Our approach is first evaluated on a subset of data used in SRBench, a benchmark for symbolic regression. We first show that GINN-LP outperforms the state-of-the-art symbolic regression methods on datasets generated using 48 real-world equations in the form of multivariate Laurent polynomials. Next, we propose an ensemble method that combines our method with a high-performing symbolic regression method, enabling us to discover non-Laurent polynomial equations. We achieve state-of-the-art results in equation discovery, showing an absolute improvement of 7.1% over the best contender, by applying this ensemble method to 113 datasets within SRBench with known ground-truth equations. △ Less

Submitted 14 February, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

Comments: 14 pages, 12 figures, Accepted by AAAI24

arXiv:2308.09297 [pdf, other]

NAPA-VQ: Neighborhood Aware Prototype Augmentation with Vector Quantization for Continual Learning

Authors: Tamasha Malepathirana, Damith Senanayake, Saman Halgamuge

Abstract: Catastrophic forgetting; the loss of old knowledge upon acquiring new knowledge, is a pitfall faced by deep neural networks in real-world applications. Many prevailing solutions to this problem rely on storing exemplars (previously encountered data), which may not be feasible in applications with memory limitations or privacy constraints. Therefore, the recent focus has been on Non-Exemplar based… ▽ More Catastrophic forgetting; the loss of old knowledge upon acquiring new knowledge, is a pitfall faced by deep neural networks in real-world applications. Many prevailing solutions to this problem rely on storing exemplars (previously encountered data), which may not be feasible in applications with memory limitations or privacy constraints. Therefore, the recent focus has been on Non-Exemplar based Class Incremental Learning (NECIL) where a model incrementally learns about new classes without using any past exemplars. However, due to the lack of old data, NECIL methods struggle to discriminate between old and new classes causing their feature representations to overlap. We propose NAPA-VQ: Neighborhood Aware Prototype Augmentation with Vector Quantization, a framework that reduces this class overlap in NECIL. We draw inspiration from Neural Gas to learn the topological relationships in the feature space, identifying the neighboring classes that are most likely to get confused with each other. This neighborhood information is utilized to enforce strong separation between the neighboring classes as well as to generate old class representative prototypes that can better aid in obtaining a discriminative decision boundary between old and new classes. Our comprehensive experiments on CIFAR-100, TinyImageNet, and ImageNet-Subset demonstrate that NAPA-VQ outperforms the State-of-the-art NECIL methods by an average improvement of 5%, 2%, and 4% in accuracy and 10%, 3%, and 9% in forgetting respectively. Our code can be found in https://github.com/TamashaM/NAPA-VQ.git. △ Less

Submitted 18 August, 2023; originally announced August 2023.

Comments: Accepted to ICCV 2023

arXiv:2305.06564 [pdf, other]

Undercover Deepfakes: Detecting Fake Segments in Videos

Authors: Sanjay Saha, Rashindrie Perera, Sachith Seneviratne, Tamasha Malepathirana, Sanka Rasnayaka, Deshani Geethika, Terence Sim, Saman Halgamuge

Abstract: The recent renaissance in generative models, driven primarily by the advent of diffusion models and iterative improvement in GAN methods, has enabled many creative applications. However, each advancement is also accompanied by a rise in the potential for misuse. In the arena of the deepfake generation, this is a key societal issue. In particular, the ability to modify segments of videos using such… ▽ More The recent renaissance in generative models, driven primarily by the advent of diffusion models and iterative improvement in GAN methods, has enabled many creative applications. However, each advancement is also accompanied by a rise in the potential for misuse. In the arena of the deepfake generation, this is a key societal issue. In particular, the ability to modify segments of videos using such generative techniques creates a new paradigm of deepfakes which are mostly real videos altered slightly to distort the truth. This paradigm has been under-explored by the current deepfake detection methods in the academic literature. In this paper, we present a deepfake detection method that can address this issue by performing deepfake prediction at the frame and video levels. To facilitate testing our method, we prepared a new benchmark dataset where videos have both real and fake frame sequences with very subtle transitions. We provide a benchmark on the proposed dataset with our detection method which utilizes the Vision Transformer based on Scaling and Shifting to learn spatial features, and a Timeseries Transformer to learn temporal features of the videos to help facilitate the interpretation of possible deepfakes. Extensive experiments on a variety of deepfake generation methods show excellent results by the proposed method on temporal segmentation and classical video-level predictions as well. In particular, the paradigm we address will form a powerful tool for the moderation of deepfakes, where human oversight can be better targeted to the parts of videos suspected of being deepfakes. All experiments can be reproduced at: github.com/rgb91/temporal-deepfake-segmentation. △ Less

Submitted 24 August, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: ICCV 2023 Workshop and Challenge on DeepFake Analysis and Detection

arXiv:2211.16699 [pdf]

doi 10.4038/jnsfsr.v50i0.11249

Interpretability and accessibility of machine learning in selected food processing, agriculture and health applications

Authors: N. Ranasinghe, A. Ramanan, S. Fernando, P. N. Hameed, D. Herath, T. Malepathirana, P. Suganthan, M. Niranjan, S. Halgamuge

Abstract: Artificial Intelligence (AI) and its data-centric branch of machine learning (ML) have greatly evolved over the last few decades. However, as AI is used increasingly in real world use cases, the importance of the interpretability of and accessibility to AI systems have become major research areas. The lack of interpretability of ML based systems is a major hindrance to widespread adoption of these… ▽ More Artificial Intelligence (AI) and its data-centric branch of machine learning (ML) have greatly evolved over the last few decades. However, as AI is used increasingly in real world use cases, the importance of the interpretability of and accessibility to AI systems have become major research areas. The lack of interpretability of ML based systems is a major hindrance to widespread adoption of these powerful algorithms. This is due to many reasons including ethical and regulatory concerns, which have resulted in poorer adoption of ML in some areas. The recent past has seen a surge in research on interpretable ML. Generally, designing a ML system requires good domain understanding combined with expert knowledge. New techniques are emerging to improve ML accessibility through automated model design. This paper provides a review of the work done to improve interpretability and accessibility of machine learning in the context of global problems while also being relevant to developing countries. We review work under multiple levels of interpretability including scientific and mathematical interpretation, statistical interpretation and partial semantic interpretation. This review includes applications in three areas, namely food processing, agriculture and health. △ Less

Submitted 29 November, 2022; originally announced November 2022.

Comments: published in the "Journal of the National Science Foundation of Sri Lanka, Volume 50"

Journal ref: Journal of the National Science Foundation of Sri Lanka (2022), Vol 50, 263-276

arXiv:2206.10795 [pdf, other]

doi 10.1016/j.eswa.2022.117690

Multi-Resolution, Multi-Horizon Distributed Solar PV Power Forecasting with Forecast Combinations

Authors: Maneesha Perera, Julian De Hoog, Kasun Bandara, Saman Halgamuge

Abstract: Distributed, small-scale solar photovoltaic (PV) systems are being installed at a rapidly increasing rate. This can cause major impacts on distribution networks and energy markets. As a result, there is a significant need for improved forecasting of the power generation of these systems at different time resolutions and horizons. However, the performance of forecasting models depends on the resolu… ▽ More Distributed, small-scale solar photovoltaic (PV) systems are being installed at a rapidly increasing rate. This can cause major impacts on distribution networks and energy markets. As a result, there is a significant need for improved forecasting of the power generation of these systems at different time resolutions and horizons. However, the performance of forecasting models depends on the resolution and horizon. Forecast combinations (ensembles), that combine the forecasts of multiple models into a single forecast may be robust in such cases. Therefore, in this paper, we provide comparisons and insights into the performance of five state-of-the-art forecast models and existing forecast combinations at multiple resolutions and horizons. We propose a forecast combination approach based on particle swarm optimization (PSO) that will enable a forecaster to produce accurate forecasts for the task at hand by weighting the forecasts produced by individual models. Furthermore, we compare the performance of the proposed combination approach with existing forecast combination approaches. A comprehensive evaluation is conducted using a real-world residential PV power data set measured at 25 houses located in three locations in the United States. The results across four different resolutions and four different horizons show that the PSO-based forecast combination approach outperforms the use of any individual forecast model and other forecast combination counterparts, with an average Mean Absolute Scaled Error reduction by 3.81% compared to the best performing individual model. Our approach enables a solar forecaster to produce accurate forecasts for their application regardless of the forecast resolution or horizon. △ Less

Submitted 21 June, 2022; originally announced June 2022.

Journal ref: Expert Systems with Applications 205 (2022)

arXiv:2008.10384 [pdf, other]

doi 10.1109/TSTE.2019.2895387

An Incentive-compatible Energy Trading Framework for Neighborhood Area Networks with Shared Energy Storage

Authors: Chathurika P. Mediwaththe, Marnie Shaw, Saman Halgamuge, David B. Smith, Paul Scott

Abstract: Here, a novel energy trading system is proposed for demand-side management of a neighborhood area network (NAN) consisting of a shared energy storage (SES) provider, users with non-dispatchable energy generation, and an electricity retailer. In a leader-follower Stackelberg game, the SES provider first maximizes their revenue by setting a price signal and trading energy with the grid. Then, by fol… ▽ More Here, a novel energy trading system is proposed for demand-side management of a neighborhood area network (NAN) consisting of a shared energy storage (SES) provider, users with non-dispatchable energy generation, and an electricity retailer. In a leader-follower Stackelberg game, the SES provider first maximizes their revenue by setting a price signal and trading energy with the grid. Then, by following the SES provider's actions, the retailer minimizes social cost for the users, i.e., the sum of the total users' cost when they interact with the SES and the total cost for supplying grid energy to the users. A pricing strategy, which incorporates mechanism design, is proposed to make the system incentive-compatible by rewarding users who disclose true energy usage information. A unique Stackelberg equilibrium is achieved where the SES provider's revenue is maximized and the user-level social cost is minimized, which also rewards the retailer. A case study with realistic energy demand and generation data demonstrates 28\%~-~45\% peak demand reduction of the NAN, depending on the number of participating users, compared to a system without SES. Simulation results confirm that the retailer can also benefit financially, in addition to the SES provider and the users. △ Less

Submitted 21 August, 2020; originally announced August 2020.

Comments: Accepted in IEEE Transactions on Sustainable Energy

arXiv:1912.04896 [pdf, other]

doi 10.1109/TNNLS.2020.3023941.

Self Organizing Nebulous Growths for Robust and Incremental Data Visualization

Authors: Damith Senanayake, Wei Wang, Shalin H. Naik, Saman Halgamuge

Abstract: Non-parametric dimensionality reduction techniques, such as t-SNE and UMAP, are proficient in providing visualizations for datasets of fixed sizes. However, they cannot incrementally map and insert new data points into an already provided data visualization. We present Self-Organizing Nebulous Growths (SONG), a parametric nonlinear dimensionality reduction technique that supports incremental data… ▽ More Non-parametric dimensionality reduction techniques, such as t-SNE and UMAP, are proficient in providing visualizations for datasets of fixed sizes. However, they cannot incrementally map and insert new data points into an already provided data visualization. We present Self-Organizing Nebulous Growths (SONG), a parametric nonlinear dimensionality reduction technique that supports incremental data visualization, i.e., incremental addition of new data while preserving the structure of the existing visualization. In addition, SONG is capable of handling new data increments, no matter whether they are similar or heterogeneous to the already observed data distribution. We test SONG on a variety of real and simulated datasets. The results show that SONG is superior to Parametric t-SNE, t-SNE and UMAP in incremental data visualization. Specifically, for heterogeneous increments, SONG improves over Parametric t-SNE by 14.98 % on the Fashion MNIST dataset and 49.73% on the MNIST dataset regarding the cluster quality measured by the Adjusted Mutual Information scores. On similar or homogeneous increments, the improvements are 8.36% and 42.26% respectively. Furthermore, even when the above datasets are presented all at once, SONG performs better or comparable to UMAP, and superior to t-SNE. We also demonstrate that the algorithmic foundations of SONG render it more tolerant to noise compared to UMAP and t-SNE, thus providing greater utility for data with high variance, high mixing of clusters, or noise. △ Less

Submitted 1 October, 2020; v1 submitted 9 December, 2019; originally announced December 2019.

Comments: in IEEE Transactions on Neural Networks and Learning Systems

arXiv:1902.01939 [pdf]

doi 10.1109/TNET.2018.2888864

The Fast Heuristic Algorithms and Post-Processing Techniques to Design Large and Low-Cost Communication Networks

Authors: Yahui Sun, Marcus Brazil, Doreen Thomas, Saman Halgamuge

Abstract: It is challenging to design large and low-cost communication networks. In this paper, we formulate this challenge as the prize-collecting Steiner Tree Problem (PCSTP). The objective is to minimize the costs of transmission routes and the disconnected monetary or informational profits. Initially, we note that the PCSTP is MAX SNP-hard. Then, we propose some post-processing techniques to improve sub… ▽ More It is challenging to design large and low-cost communication networks. In this paper, we formulate this challenge as the prize-collecting Steiner Tree Problem (PCSTP). The objective is to minimize the costs of transmission routes and the disconnected monetary or informational profits. Initially, we note that the PCSTP is MAX SNP-hard. Then, we propose some post-processing techniques to improve suboptimal solutions to PCSTP. Based on these techniques, we propose two fast heuristic algorithms: the first one is a quasilinear time heuristic algorithm that is faster and consumes less memory than other algorithms; and the second one is an improvement of a stateof-the-art polynomial time heuristic algorithm that can find high-quality solutions at a speed that is only inferior to the first one. We demonstrate the competitiveness of our heuristic algorithms by comparing them with the state-of-the-art ones on the largest existing benchmark instances (169 800 vertices and 338 551 edges). Moreover, we generate new instances that are even larger (1 000 000 vertices and 10 000 000 edges) to further demonstrate their advantages in large networks. The state-ofthe-art algorithms are too slow to find high-quality solutions for instances of this size, whereas our new heuristic algorithms can do this in around 6 to 45s on a personal computer. Ultimately, we apply our post-processing techniques to update the bestknown solution for a notoriously difficult benchmark instance to show that they can improve near-optimal solutions to PCSTP. In conclusion, we demonstrate the usefulness of our heuristic algorithms and post-processing techniques for designing large and low-cost communication networks. △ Less

Submitted 8 January, 2019; originally announced February 2019.

Journal ref: IEEE/ACM Transactions on Networking; Jan 2019

arXiv:1812.09916 [pdf, other]

Improving MMD-GAN Training with Repulsive Loss Function

Authors: Wei Wang, Yuan Sun, Saman Halgamuge

Abstract: Generative adversarial nets (GANs) are widely used to learn the data sampling process and their performance may heavily depend on the loss functions, given a limited computational budget. This study revisits MMD-GAN that uses the maximum mean discrepancy (MMD) as the loss function for GAN and makes two contributions. First, we argue that the existing MMD loss function may discourage the learning o… ▽ More Generative adversarial nets (GANs) are widely used to learn the data sampling process and their performance may heavily depend on the loss functions, given a limited computational budget. This study revisits MMD-GAN that uses the maximum mean discrepancy (MMD) as the loss function for GAN and makes two contributions. First, we argue that the existing MMD loss function may discourage the learning of fine details in data as it attempts to contract the discriminator outputs of real data. To address this issue, we propose a repulsive loss function to actively learn the difference among the real data by simply rearranging the terms in MMD. Second, inspired by the hinge loss, we propose a bounded Gaussian kernel to stabilize the training of MMD-GAN with the repulsive loss function. The proposed methods are applied to the unsupervised image generation tasks on CIFAR-10, STL-10, CelebA, and LSUN bedroom datasets. Results show that the repulsive loss function significantly improves over the MMD loss at no additional computational cost and outperforms other representative loss functions. The proposed methods achieve an FID score of 16.21 on the CIFAR-10 dataset using a single DCGAN network and spectral normalization. △ Less

Submitted 8 February, 2019; v1 submitted 24 December, 2018; originally announced December 2018.

Comments: Published as a conference paper at ICLR 2019

Showing 1–13 of 13 results for author: Halgamuge, S