Search | arXiv e-print repository

FairSSD: Understanding Bias in Synthetic Speech Detectors

Authors: Amit Kumar Singh Yadav, Kratika Bhagtani, Davide Salvi, Paolo Bestagini, Edward J. Delp

Abstract: Methods that can generate synthetic speech which is perceptually indistinguishable from speech recorded by a human speaker, are easily available. Several incidents report misuse of synthetic speech generated from these methods to commit fraud. To counter such misuse, many methods have been proposed to detect synthetic speech. Some of these detectors are more interpretable, can generalize to detect… ▽ More Methods that can generate synthetic speech which is perceptually indistinguishable from speech recorded by a human speaker, are easily available. Several incidents report misuse of synthetic speech generated from these methods to commit fraud. To counter such misuse, many methods have been proposed to detect synthetic speech. Some of these detectors are more interpretable, can generalize to detect synthetic speech in the wild and are robust to noise. However, limited work has been done on understanding bias in these detectors. In this work, we examine bias in existing synthetic speech detectors to determine if they will unfairly target a particular gender, age and accent group. We also inspect whether these detectors will have a higher misclassification rate for bona fide speech from speech-impaired speakers w.r.t fluent speakers. Extensive experiments on 6 existing synthetic speech detectors using more than 0.9 million speech signals demonstrate that most detectors are gender, age and accent biased, and future work is needed to ensure fairness. To support future research, we release our evaluation dataset, models used in our study and source code at https://gitlab.com/viper-purdue/fairssd. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: Accepted at CVPR 2024 (WMF)

arXiv:2402.14205 [pdf, other]

Compression Robust Synthetic Speech Detection Using Patched Spectrogram Transformer

Authors: Amit Kumar Singh Yadav, Ziyue Xiang, Kratika Bhagtani, Paolo Bestagini, Stefano Tubaro, Edward J. Delp

Abstract: Many deep learning synthetic speech generation tools are readily available. The use of synthetic speech has caused financial fraud, impersonation of people, and misinformation to spread. For this reason forensic methods that can detect synthetic speech have been proposed. Existing methods often overfit on one dataset and their performance reduces substantially in practical scenarios such as detect… ▽ More Many deep learning synthetic speech generation tools are readily available. The use of synthetic speech has caused financial fraud, impersonation of people, and misinformation to spread. For this reason forensic methods that can detect synthetic speech have been proposed. Existing methods often overfit on one dataset and their performance reduces substantially in practical scenarios such as detecting synthetic speech shared on social platforms. In this paper we propose, Patched Spectrogram Synthetic Speech Detection Transformer (PS3DT), a synthetic speech detector that converts a time domain speech signal to a mel-spectrogram and processes it in patches using a transformer neural network. We evaluate the detection performance of PS3DT on ASVspoof2019 dataset. Our experiments show that PS3DT performs well on ASVspoof2019 dataset compared to other approaches using spectrogram for synthetic speech detection. We also investigate generalization performance of PS3DT on In-the-Wild dataset. PS3DT generalizes well than several existing methods on detecting synthetic speech from an out-of-distribution dataset. We also evaluate robustness of PS3DT to detect telephone quality synthetic speech and synthetic speech shared on social platforms (compressed speech). PS3DT is robust to compression and can detect telephone quality synthetic speech better than several existing methods. △ Less

Submitted 21 February, 2024; originally announced February 2024.

Comments: Accepted as long oral paper at ICMLA 2023

arXiv:2310.06945 [pdf]

doi 10.2352/EI.2023.35.16.AVM-111

End-to-end Evaluation of Practical Video Analytics Systems for Face Detection and Recognition

Authors: Praneet Singh, Edward J. Delp, Amy R. Reibman

Abstract: Practical video analytics systems that are deployed in bandwidth constrained environments like autonomous vehicles perform computer vision tasks such as face detection and recognition. In an end-to-end face analytics system, inputs are first compressed using popular video codecs like HEVC and then passed onto modules that perform face detection, alignment, and recognition sequentially. Typically,… ▽ More Practical video analytics systems that are deployed in bandwidth constrained environments like autonomous vehicles perform computer vision tasks such as face detection and recognition. In an end-to-end face analytics system, inputs are first compressed using popular video codecs like HEVC and then passed onto modules that perform face detection, alignment, and recognition sequentially. Typically, the modules of these systems are evaluated independently using task-specific imbalanced datasets that can misconstrue performance estimates. In this paper, we perform a thorough end-to-end evaluation of a face analytics system using a driving-specific dataset, which enables meaningful interpretations. We demonstrate how independent task evaluations, dataset imbalances, and inconsistent annotations can lead to incorrect system performance estimates. We propose strategies to create balanced evaluation subsets of our dataset and to make its annotations consistent across multiple analytics tasks and scenarios. We then evaluate the end-to-end system performance sequentially to account for task interdependencies. Our experiments show that our approach provides consistent, accurate, and interpretable estimates of the system's performance which is critical for real-world applications. △ Less

Submitted 10 October, 2023; originally announced October 2023.

Comments: Accepted to Autonomous Vehicles and Machines 2023 Conference, IS&T Electronic Imaging (EI) Symposium

Journal ref: Electronic Imaging, 2023, pp 111-1 - 111-6

arXiv:2309.12159 [pdf, other]

Information Forensics and Security: A quarter-century-long journey

Authors: Mauro Barni, Patrizio Campisi, Edward J. Delp, Gwenael Doërr, Jessica Fridrich, Nasir Memon, Fernando Pérez-González, Anderson Rocha, Luisa Verdoliva, Min Wu

Abstract: Information Forensics and Security (IFS) is an active R&D area whose goal is to ensure that people use devices, data, and intellectual properties for authorized purposes and to facilitate the gathering of solid evidence to hold perpetrators accountable. For over a quarter century since the 1990s, the IFS research area has grown tremendously to address the societal needs of the digital information… ▽ More Information Forensics and Security (IFS) is an active R&D area whose goal is to ensure that people use devices, data, and intellectual properties for authorized purposes and to facilitate the gathering of solid evidence to hold perpetrators accountable. For over a quarter century since the 1990s, the IFS research area has grown tremendously to address the societal needs of the digital information era. The IEEE Signal Processing Society (SPS) has emerged as an important hub and leader in this area, and the article below celebrates some landmark technical contributions. In particular, we highlight the major technological advances on some selected focus areas in the field developed in the last 25 years from the research community and present future trends. △ Less

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.00199 [pdf, other]

doi 10.1145/3607828.3617796

Diffusion Model with Clustering-based Conditioning for Food Image Generation

Authors: Yue Han, Jiangpeng He, Mridul Gupta, Edward J. Delp, Fengqing Zhu

Abstract: Image-based dietary assessment serves as an efficient and accurate solution for recording and analyzing nutrition intake using eating occasion images as input. Deep learning-based techniques are commonly used to perform image analysis such as food classification, segmentation, and portion size estimation, which rely on large amounts of food images with annotations for training. However, such data… ▽ More Image-based dietary assessment serves as an efficient and accurate solution for recording and analyzing nutrition intake using eating occasion images as input. Deep learning-based techniques are commonly used to perform image analysis such as food classification, segmentation, and portion size estimation, which rely on large amounts of food images with annotations for training. However, such data dependency poses significant barriers to real-world applications, because acquiring a substantial, diverse, and balanced set of food images can be challenging. One potential solution is to use synthetic food images for data augmentation. Although existing work has explored the use of generative adversarial networks (GAN) based structures for generation, the quality of synthetic food images still remains subpar. In addition, while diffusion-based generative models have shown promising results for general image generation tasks, the generation of food images can be challenging due to the substantial intra-class variance. In this paper, we investigate the generation of synthetic food images based on the conditional diffusion model and propose an effective clustering-based training framework, named ClusDiff, for generating high-quality and representative food images. The proposed method is evaluated on the Food-101 dataset and shows improved performance when compared with existing image generation works. We also demonstrate that the synthetic food images generated by ClusDiff can help address the severe class imbalance issue in long-tailed food classification using the VFN-LT dataset. △ Less

Submitted 31 August, 2023; originally announced September 2023.

Comments: Accepted for 31st ACM International Conference on Multimedia: 8th International Workshop on Multimedia Assisted Dietary Management (MADiMa 2023)

arXiv:2305.09810 [pdf, other]

Semi-Supervised Object Detection for Sorghum Panicles in UAV Imagery

Authors: Enyu Cai, Jiaqi Guo, Changye Yang, Edward J. Delp

Abstract: The sorghum panicle is an important trait related to grain yield and plant development. Detecting and counting sorghum panicles can provide significant information for plant phenotyping. Current deep-learning-based object detection methods for panicles require a large amount of training data. The data labeling is time-consuming and not feasible for real application. In this paper, we present an ap… ▽ More The sorghum panicle is an important trait related to grain yield and plant development. Detecting and counting sorghum panicles can provide significant information for plant phenotyping. Current deep-learning-based object detection methods for panicles require a large amount of training data. The data labeling is time-consuming and not feasible for real application. In this paper, we present an approach to reduce the amount of training data for sorghum panicle detection via semi-supervised learning. Results show we can achieve similar performance as supervised methods for sorghum panicle detection by only using 10\% of original training data. △ Less

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2304.03323 [pdf, other]

DSVAE: Interpretable Disentangled Representation for Synthetic Speech Detection

Authors: Amit Kumar Singh Yadav, Kratika Bhagtani, Ziyue Xiang, Paolo Bestagini, Stefano Tubaro, Edward J. Delp

Abstract: Tools to generate high quality synthetic speech signal that is perceptually indistinguishable from speech recorded from human speakers are easily available. Several approaches have been proposed for detecting synthetic speech. Many of these approaches use deep learning methods as a black box without providing reasoning for the decisions they make. This limits the interpretability of these approach… ▽ More Tools to generate high quality synthetic speech signal that is perceptually indistinguishable from speech recorded from human speakers are easily available. Several approaches have been proposed for detecting synthetic speech. Many of these approaches use deep learning methods as a black box without providing reasoning for the decisions they make. This limits the interpretability of these approaches. In this paper, we propose Disentangled Spectrogram Variational Auto Encoder (DSVAE) which is a two staged trained variational autoencoder that processes spectrograms of speech using disentangled representation learning to generate interpretable representations of a speech signal for detecting synthetic speech. DSVAE also creates an activation map to highlight the spectrogram regions that discriminate synthetic and bona fide human speech signals. We evaluated the representations obtained from DSVAE using the ASVspoof2019 dataset. Our experimental results show high accuracy (>98%) on detecting synthetic speech from 6 known and 10 out of 11 unknown speech synthesizers. We also visualize the representation obtained from DSVAE for 17 different speech synthesizers and verify that they are indeed interpretable and discriminate bona fide and synthetic speech from each of the synthesizers. △ Less

Submitted 28 July, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

arXiv:2301.09702 [pdf, other]

Illumination Variation Correction Using Image Synthesis For Unsupervised Domain Adaptive Person Re-Identification

Authors: Jiaqi Guo, Amy R. Reibman, Edward J. Delp

Abstract: Unsupervised domain adaptive (UDA) person re-identification (re-ID) aims to learn identity information from labeled images in source domains and apply it to unlabeled images in a target domain. One major issue with many unsupervised re-identification methods is that they do not perform well relative to large domain variations such as illumination, viewpoint, and occlusions. In this paper, we propo… ▽ More Unsupervised domain adaptive (UDA) person re-identification (re-ID) aims to learn identity information from labeled images in source domains and apply it to unlabeled images in a target domain. One major issue with many unsupervised re-identification methods is that they do not perform well relative to large domain variations such as illumination, viewpoint, and occlusions. In this paper, we propose a Synthesis Model Bank (SMB) to deal with illumination variation in unsupervised person re-ID. The proposed SMB consists of several convolutional neural networks (CNN) for feature extraction and Mahalanobis matrices for distance metrics. They are trained using synthetic data with different illumination conditions such that their synergistic effect makes the SMB robust against illumination variation. To better quantify the illumination intensity and improve the quality of synthetic images, we introduce a new 3D virtual-human dataset for GAN-based image synthesis. From our experiments, the proposed SMB outperforms other synthesis methods on several re-ID benchmarks. △ Less

Submitted 14 November, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

Comments: 10 pages, 5 figures, 5 tables

arXiv:2301.07861 [pdf, other]

doi 10.2352/ISSN.2470-1173.2021.8.IMAWM-286

Improving Food Detection For Images From a Wearable Egocentric Camera

Authors: Yue Han, Sri Kalyan Yarlagadda, Tonmoy Ghosh, Fengqing Zhu, Edward Sazonov, Edward J. Delp

Abstract: Diet is an important aspect of our health. Good dietary habits can contribute to the prevention of many diseases and improve the overall quality of life. To better understand the relationship between diet and health, image-based dietary assessment systems have been developed to collect dietary information. We introduce the Automatic Ingestion Monitor (AIM), a device that can be attached to one's e… ▽ More Diet is an important aspect of our health. Good dietary habits can contribute to the prevention of many diseases and improve the overall quality of life. To better understand the relationship between diet and health, image-based dietary assessment systems have been developed to collect dietary information. We introduce the Automatic Ingestion Monitor (AIM), a device that can be attached to one's eye glasses. It provides an automated hands-free approach to capture eating scene images. While AIM has several advantages, images captured by the AIM are sometimes blurry. Blurry images can significantly degrade the performance of food image analysis such as food detection. In this paper, we propose an approach to pre-process images collected by the AIM imaging sensor by rejecting extremely blurry images to improve the performance of food detection. △ Less

Submitted 18 January, 2023; originally announced January 2023.

Comments: 6 pages, 6 figures, Conference Paper for Imaging and Multimedia Analytics in a Web and Mobile World Conference, IS&T Electronic Imaging Symposium, Burlingame, CA (Virtual), January, 2021

arXiv:2210.11549 [pdf, other]

doi 10.1007/978-3-031-37742-6_24

H4VDM: H.264 Video Device Matching

Authors: Ziyue Xiang, Paolo Bestagini, Stefano Tubaro, Edward J. Delp

Abstract: Methods that can determine if two given video sequences are captured by the same device (e.g., mobile telephone or digital camera) can be used in many forensics tasks. In this paper we refer to this as "video device matching". In open-set video forensics scenarios it is easier to determine if two video sequences were captured with the same device than identifying the specific device. In this paper… ▽ More Methods that can determine if two given video sequences are captured by the same device (e.g., mobile telephone or digital camera) can be used in many forensics tasks. In this paper we refer to this as "video device matching". In open-set video forensics scenarios it is easier to determine if two video sequences were captured with the same device than identifying the specific device. In this paper, we propose a technique for open-set video device matching. Given two H.264 compressed video sequences, our method can determine if they are captured by the same device, even if our method has never encountered the device in training. We denote our proposed technique as H.264 Video Device Matching (H4VDM). H4VDM uses H.264 compression information extracted from video sequences to make decisions. It is more robust against artifacts that alter camera sensor fingerprints, and it can be used to analyze relatively small fragments of the H.264 sequence. We trained and tested our method on a publicly available video forensics dataset consisting of 35 devices, where our proposed method demonstrated good performance. △ Less

Submitted 22 August, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

arXiv:2210.07546 [pdf, other]

Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario

Authors: Emily R. Bartusiak, Edward J. Delp

Abstract: Speech synthesis methods can create realistic-sounding speech, which may be used for fraud, spoofing, and misinformation campaigns. Forensic methods that detect synthesized speech are important for protection against such attacks. Forensic attribution methods provide even more information about the nature of synthesized speech signals because they identify the specific speech synthesis method (i.e… ▽ More Speech synthesis methods can create realistic-sounding speech, which may be used for fraud, spoofing, and misinformation campaigns. Forensic methods that detect synthesized speech are important for protection against such attacks. Forensic attribution methods provide even more information about the nature of synthesized speech signals because they identify the specific speech synthesis method (i.e., speech synthesizer) used to create a speech signal. Due to the increasing number of realistic-sounding speech synthesizers, we propose a speech attribution method that generalizes to new synthesizers not seen during training. To do so, we investigate speech synthesizer attribution in both a closed set scenario and an open set scenario. In other words, we consider some speech synthesizers to be "known" synthesizers (i.e., part of the closed set) and others to be "unknown" synthesizers (i.e., part of the open set). We represent speech signals as spectrograms and train our proposed method, known as compact attribution transformer (CAT), on the closed set for multi-class classification. Then, we extend our analysis to the open set to attribute synthesized speech signals to both known and unknown synthesizers. We utilize a t-distributed stochastic neighbor embedding (tSNE) on the latent space of the trained CAT to differentiate between each unknown synthesizer. Additionally, we explore poly-1 loss formulations to improve attribution results. Our proposed approach successfully attributes synthesized speech signals to their respective speech synthesizers in both closed and open set scenarios. △ Less

Submitted 14 October, 2022; originally announced October 2022.

Comments: Accepted to the 2022 IEEE International Conference on Machine Learning and Applications

Journal ref: IEEE International Conference on Machine Learning and Applications, pp. 1-8, December 2022, Nassau, The Bahamas

arXiv:2205.03947 [pdf, other]

High-Resolution UAV Image Generation for Sorghum Panicle Detection

Authors: Enyu Cai, Zhankun Luo, Sriram Baireddy, Jiaqi Guo, Changye Yang, Edward J. Delp

Abstract: The number of panicles (or heads) of Sorghum plants is an important phenotypic trait for plant development and grain yield estimation. The use of Unmanned Aerial Vehicles (UAVs) enables the capability of collecting and analyzing Sorghum images on a large scale. Deep learning can provide methods for estimating phenotypic traits from UAV images but requires a large amount of labeled data. The lack o… ▽ More The number of panicles (or heads) of Sorghum plants is an important phenotypic trait for plant development and grain yield estimation. The use of Unmanned Aerial Vehicles (UAVs) enables the capability of collecting and analyzing Sorghum images on a large scale. Deep learning can provide methods for estimating phenotypic traits from UAV images but requires a large amount of labeled data. The lack of training data due to the labor-intensive ground truthing of UAV images causes a major bottleneck in developing methods for Sorghum panicle detection and counting. In this paper, we present an approach that uses synthetic training images from generative adversarial networks (GANs) for data augmentation to enhance the performance of Sorghum panicle detection and counting. Our method can generate synthetic high-resolution UAV RGB images with panicle labels by using image-to-image translation GANs with a limited ground truth dataset of real UAV RGB images. The results show the improvements in panicle detection and counting using our data augmentation approach. △ Less

Submitted 8 May, 2022; originally announced May 2022.

arXiv:2205.01806 [pdf, other]

Frequency Domain-Based Detection of Generated Audio

Authors: Emily R. Bartusiak, Edward J. Delp

Abstract: Attackers may manipulate audio with the intent of presenting falsified reports, changing an opinion of a public figure, and winning influence and power. The prevalence of inauthentic multimedia continues to rise, so it is imperative to develop a set of tools that determines the legitimacy of media. We present a method that analyzes audio signals to determine whether they contain real human voices… ▽ More Attackers may manipulate audio with the intent of presenting falsified reports, changing an opinion of a public figure, and winning influence and power. The prevalence of inauthentic multimedia continues to rise, so it is imperative to develop a set of tools that determines the legitimacy of media. We present a method that analyzes audio signals to determine whether they contain real human voices or fake human voices (i.e., voices generated by neural acoustic and waveform models). Instead of analyzing the audio signals directly, the proposed approach converts the audio signals into spectrogram images displaying frequency, intensity, and temporal content and evaluates them with a Convolutional Neural Network (CNN). Trained on both genuine human voice signals and synthesized voice signals, we show our approach achieves high accuracy on this classification task. △ Less

Submitted 3 May, 2022; originally announced May 2022.

Comments: Accepted to the 2021 Media Watermarking, Security, and Forensics Conference, IS&T Electronic Imaging Symposium (EI)

Journal ref: Proceedings of the Media Watermarking, Security, and Forensics Conference, IS&T Electronic Imaging Symposium, pp 273-1 - 273-7, January 2021, Burlingame, CA

arXiv:2205.01805 [pdf, other]

doi 10.1109/MIPR.2019.00024

Splicing Detection and Localization In Satellite Imagery Using Conditional GANs

Authors: Emily R. Bartusiak, Sri Kalyan Yarlagadda, David Güera, Paolo Bestagini, Stefano Tubaro, Fengqing M. Zhu, Edward J. Delp

Abstract: The widespread availability of image editing tools and improvements in image processing techniques allow image manipulation to be very easy. Oftentimes, easy-to-use yet sophisticated image manipulation tools yields distortions/changes imperceptible to the human observer. Distribution of forged images can have drastic ramifications, especially when coupled with the speed and vastness of the Interne… ▽ More The widespread availability of image editing tools and improvements in image processing techniques allow image manipulation to be very easy. Oftentimes, easy-to-use yet sophisticated image manipulation tools yields distortions/changes imperceptible to the human observer. Distribution of forged images can have drastic ramifications, especially when coupled with the speed and vastness of the Internet. Therefore, verifying image integrity poses an immense and important challenge to the digital forensic community. Satellite images specifically can be modified in a number of ways, including the insertion of objects to hide existing scenes and structures. In this paper, we describe the use of a Conditional Generative Adversarial Network (cGAN) to identify the presence of such spliced forgeries within satellite images. Additionally, we identify their locations and shapes. Trained on pristine and falsified images, our method achieves high success on these detection and localization objectives. △ Less

Submitted 3 May, 2022; originally announced May 2022.

Comments: Accepted to the 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)

Journal ref: IEEE Conference on Multimedia Information Processing and Retrieval, pp. 91-96, March 2019, San Jose, CA

arXiv:2205.01800 [pdf, other]

doi 10.1109/IEEECONF53345.2021.9723142

Synthesized Speech Detection Using Convolutional Transformer-Based Spectrogram Analysis

Authors: Emily R. Bartusiak, Edward J. Delp

Abstract: Synthesized speech is common today due to the prevalence of virtual assistants, easy-to-use tools for generating and modifying speech signals, and remote work practices. Synthesized speech can also be used for nefarious purposes, including creating a purported speech signal and attributing it to someone who did not speak the content of the signal. We need methods to detect if a speech signal is sy… ▽ More Synthesized speech is common today due to the prevalence of virtual assistants, easy-to-use tools for generating and modifying speech signals, and remote work practices. Synthesized speech can also be used for nefarious purposes, including creating a purported speech signal and attributing it to someone who did not speak the content of the signal. We need methods to detect if a speech signal is synthesized. In this paper, we analyze speech signals in the form of spectrograms with a Compact Convolutional Transformer (CCT) for synthesized speech detection. A CCT utilizes a convolutional layer that introduces inductive biases and shared weights into a network, allowing a transformer architecture to perform well with fewer data samples used for training. The CCT uses an attention mechanism to incorporate information from all parts of a signal under analysis. Trained on both genuine human voice signals and synthesized human voice signals, we demonstrate that our CCT approach successfully differentiates between genuine and synthesized speech signals. △ Less

Submitted 3 May, 2022; originally announced May 2022.

Comments: Accepted to the 2021 IEEE Asilomar Conference on Signals, Systems, and Computers

Journal ref: IEEE Asilomar Conference on Signals, Systems, and Computers, pp. 1426-1430, October 2021, Asilomar, CA

arXiv:2205.00952 [pdf, other]

Leaf Tar Spot Detection Using RGB Images

Authors: Sriram Baireddy, Da-Young Lee, Carlos Gongora-Canul, Christian D. Cruz, Edward J. Delp

Abstract: Tar spot disease is a fungal disease that appears as a series of black circular spots containing spores on corn leaves. Tar spot has proven to be an impactful disease in terms of reducing crop yield. To quantify disease progression, experts usually have to visually phenotype leaves from the plant. This process is very time-consuming and is difficult to incorporate in any high-throughput phenotypin… ▽ More Tar spot disease is a fungal disease that appears as a series of black circular spots containing spores on corn leaves. Tar spot has proven to be an impactful disease in terms of reducing crop yield. To quantify disease progression, experts usually have to visually phenotype leaves from the plant. This process is very time-consuming and is difficult to incorporate in any high-throughput phenotyping system. Deep neural networks could provide quick, automated tar spot detection with sufficient ground truth. However, manually labeling tar spots in images to serve as ground truth is also tedious and time-consuming. In this paper we first describe an approach that uses automated image analysis tools to generate ground truth images that are then used for training a Mask R-CNN. We show that a Mask R-CNN can be used effectively to detect tar spots in close-up images of leaf surfaces. We additionally show that the Mask R-CNN can also be used for in-field images of whole leaves to capture the number of tar spots and area of the leaf infected by the disease. △ Less

Submitted 2 May, 2022; originally announced May 2022.

arXiv:2204.12067 [pdf, other]

An Overview of Recent Work in Media Forensics: Methods and Threats

Authors: Kratika Bhagtani, Amit Kumar Singh Yadav, Emily R. Bartusiak, Ziyue Xiang, Ruiting Shao, Sriram Baireddy, Edward J. Delp

Abstract: In this paper, we review recent work in media forensics for digital images, video, audio (specifically speech), and documents. For each data modality, we discuss synthesis and manipulation techniques that can be used to create and modify digital media. We then review technological advancements for detecting and quantifying such manipulations. Finally, we consider open issues and suggest directions… ▽ More In this paper, we review recent work in media forensics for digital images, video, audio (specifically speech), and documents. For each data modality, we discuss synthesis and manipulation techniques that can be used to create and modify digital media. We then review technological advancements for detecting and quantifying such manipulations. Finally, we consider open issues and suggest directions for future research. △ Less

Submitted 12 May, 2022; v1 submitted 26 April, 2022; originally announced April 2022.

Comments: This is a longer version of a paper accepted to the 2022 IEEE International Conference on Multimedia Information Processing and Retrieval entitled "An Overview of Recent Work in Multimedia Forensics"

arXiv:2203.16499 [pdf, other]

doi 10.1109/ICASSP43922.2022.9747639

Forensic Analysis and Localization of Multiply Compressed MP3 Audio Using Transformers

Authors: Ziyue Xiang, Paolo Bestagini, Stefano Tubaro, Edward J. Delp

Abstract: Audio signals are often stored and transmitted in compressed formats. Among the many available audio compression schemes, MPEG-1 Audio Layer III (MP3) is very popular and widely used. Since MP3 is lossy it leaves characteristic traces in the compressed audio which can be used forensically to expose the past history of an audio file. In this paper, we consider the scenario of audio signal manipulat… ▽ More Audio signals are often stored and transmitted in compressed formats. Among the many available audio compression schemes, MPEG-1 Audio Layer III (MP3) is very popular and widely used. Since MP3 is lossy it leaves characteristic traces in the compressed audio which can be used forensically to expose the past history of an audio file. In this paper, we consider the scenario of audio signal manipulation done by temporal splicing of compressed and uncompressed audio signals. We propose a method to find the temporal location of the splices based on transformer networks. Our method identifies which temporal portions of a audio signal have undergone single or multiple compression at the temporal frame level, which is the smallest temporal unit of MP3 compression. We tested our method on a dataset of 486,743 MP3 audio clips. Our method achieved higher performance and demonstrated robustness with respect to different MP3 data when compared with existing methods. △ Less

Submitted 28 April, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

arXiv:2112.08739 [pdf, other]

doi 10.1109/ACCESS.2022.3179116

Forensic Analysis of Synthetically Generated Western Blot Images

Authors: Sara Mandelli, Davide Cozzolino, Edoardo D. Cannas, Joao P. Cardenuto, Daniel Moreira, Paolo Bestagini, Walter J. Scheirer, Anderson Rocha, Luisa Verdoliva, Stefano Tubaro, Edward J. Delp

Abstract: The widespread diffusion of synthetically generated content is a serious threat that needs urgent countermeasures. As a matter of fact, the generation of synthetic content is not restricted to multimedia data like videos, photographs or audio sequences, but covers a significantly vast area that can include biological images as well, such as western blot and microscopic images. In this paper, we fo… ▽ More The widespread diffusion of synthetically generated content is a serious threat that needs urgent countermeasures. As a matter of fact, the generation of synthetic content is not restricted to multimedia data like videos, photographs or audio sequences, but covers a significantly vast area that can include biological images as well, such as western blot and microscopic images. In this paper, we focus on the detection of synthetically generated western blot images. These images are largely explored in the biomedical literature and it has been already shown they can be easily counterfeited with few hopes to spot manipulations by visual inspection or by using standard forensics detectors. To overcome the absence of publicly available data for this task, we create a new dataset comprising more than 14K original western blot images and 24K synthetic western blot images, generated using four different state-of-the-art generation methods. We investigate different strategies to detect synthetic western blots, exploring binary classification methods as well as one-class detectors. In both scenarios, we never exploit synthetic western blot images at training stage. The achieved results show that synthetically generated western blot images can be spot with good accuracy, even though the exploited detectors are not optimized over synthetic versions of these scientific images. We also test the robustness of the developed detectors against post-processing operations commonly performed on scientific images, showing that we can be robust to JPEG compression and that some generative models are easily recognizable, despite the application of editing might alter the artifacts they leave. △ Less

Submitted 1 June, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

arXiv:2109.03961 [pdf, other]

Improving Building Segmentation for Off-Nadir Satellite Imagery

Authors: Hanxiang Hao, Sriram Baireddy, Kevin LaTourette, Latisha Konz, Moses Chan, Mary L. Comer, Edward J. Delp

Abstract: Automatic building segmentation is an important task for satellite imagery analysis and scene understanding. Most existing segmentation methods focus on the case where the images are taken from directly overhead (i.e., low off-nadir/viewing angle). These methods often fail to provide accurate results on satellite images with larger off-nadir angles due to the higher noise level and lower spatial r… ▽ More Automatic building segmentation is an important task for satellite imagery analysis and scene understanding. Most existing segmentation methods focus on the case where the images are taken from directly overhead (i.e., low off-nadir/viewing angle). These methods often fail to provide accurate results on satellite images with larger off-nadir angles due to the higher noise level and lower spatial resolution. In this paper, we propose a method that is able to provide accurate building segmentation for satellite imagery captured from a large range of off-nadir angles. Based on Bayesian deep learning, we explicitly design our method to learn the data noise via aleatoric and epistemic uncertainty modeling. Satellite image metadata (e.g., off-nadir angle and ground sample distance) is also used in our model to further improve the result. We show that with uncertainty modeling and metadata injection, our method achieves better performance than the baseline method, especially for noisy images taken from large off-nadir angles. △ Less

Submitted 8 September, 2021; originally announced September 2021.

Comments: This is an extended version of our ACM SIGSPATIAL'21 conference paper

arXiv:2109.00632 [pdf, other]

Field-Based Plot Extraction Using UAV RGB Images

Authors: Changye Yang, Sriram Baireddy, Enyu Cai, Melba Crawford, Edward J. Delp

Abstract: Unmanned Aerial Vehicles (UAVs) have become popular for use in plant phenotyping of field based crops, such as maize and sorghum, due to their ability to acquire high resolution data over field trials. Field experiments, which may comprise thousands of plants, are planted according to experimental designs to evaluate varieties or management practices. For many types of phenotyping analysis, we exa… ▽ More Unmanned Aerial Vehicles (UAVs) have become popular for use in plant phenotyping of field based crops, such as maize and sorghum, due to their ability to acquire high resolution data over field trials. Field experiments, which may comprise thousands of plants, are planted according to experimental designs to evaluate varieties or management practices. For many types of phenotyping analysis, we examine smaller groups of plants known as "plots." In this paper, we propose a new plot extraction method that will segment a UAV image into plots. We will demonstrate that our method achieves higher plot extraction accuracy than existing approaches. △ Less

Submitted 1 September, 2021; originally announced September 2021.

arXiv:2106.15753 [pdf, ps, other]

RCNN-SliceNet: A Slice and Cluster Approach for Nuclei Centroid Detection in Three-Dimensional Fluorescence Microscopy Images

Authors: Liming Wu, Shuo Han, Alain Chen, Paul Salama, Kenneth W. Dunn, Edward J. Delp

Abstract: Robust and accurate nuclei centroid detection is important for the understanding of biological structures in fluorescence microscopy images. Existing automated nuclei localization methods face three main challenges: (1) Most of object detection methods work only on 2D images and are difficult to extend to 3D volumes; (2) Segmentation-based models can be used on 3D volumes but it is computational e… ▽ More Robust and accurate nuclei centroid detection is important for the understanding of biological structures in fluorescence microscopy images. Existing automated nuclei localization methods face three main challenges: (1) Most of object detection methods work only on 2D images and are difficult to extend to 3D volumes; (2) Segmentation-based models can be used on 3D volumes but it is computational expensive for large microscopy volumes and they have difficulty distinguishing different instances of objects; (3) Hand annotated ground truth is limited for 3D microscopy volumes. To address these issues, we present a scalable approach for nuclei centroid detection of 3D microscopy volumes. We describe the RCNN-SliceNet to detect 2D nuclei centroids for each slice of the volume from different directions and 3D agglomerative hierarchical clustering (AHC) is used to estimate the 3D centroids of nuclei in a volume. The model was trained with the synthetic microscopy data generated using Spatially Constrained Cycle-Consistent Adversarial Networks (SpCycleGAN) and tested on different types of real 3D microscopy data. Extensive experimental results demonstrate that our proposed method can accurately count and detect the nuclei centroids in a 3D microscopy volume. △ Less

Submitted 4 November, 2021; v1 submitted 29 June, 2021; originally announced June 2021.

arXiv:2105.12926 [pdf, other]

Image-Based Plant Wilting Estimation

Authors: Changye Yang, Sriram Baireddy, Enyu Cai, Valerian Meline, Denise Caldwell, Anjali S. Iyer-Pascuzzi, Edward J. Delp

Abstract: Many plants become limp or droop through heat, loss of water, or disease. This is also known as wilting. In this paper, we examine plant wilting caused by bacterial infection. In particular, we want to design a metric for wilting based on images acquired of the plant. A quantifiable wilting metric will be useful in studying bacterial wilt and identifying resistance genes. Since there is no standar… ▽ More Many plants become limp or droop through heat, loss of water, or disease. This is also known as wilting. In this paper, we examine plant wilting caused by bacterial infection. In particular, we want to design a metric for wilting based on images acquired of the plant. A quantifiable wilting metric will be useful in studying bacterial wilt and identifying resistance genes. Since there is no standard way to estimate wilting, it is common to use ad hoc visual scores. This is very subjective and requires expert knowledge of the plants and the disease mechanism. Our solution consists of using various wilting metrics acquired from RGB images of the plants. We also designed several experiments to demonstrate that our metrics are effective at estimating wilting in plants. △ Less

Submitted 26 May, 2021; originally announced May 2021.

arXiv:2105.06361 [pdf, other]

doi 10.1109/CVPRW53098.2021.00115

Forensic Analysis of Video Files Using Metadata

Authors: Ziyue Xiang, János Horváth, Sriram Baireddy, Paolo Bestagini, Stefano Tubaro, Edward J. Delp

Abstract: The unprecedented ease and ability to manipulate video content has led to a rapid spread of manipulated media. The availability of video editing tools greatly increased in recent years, allowing one to easily generate photo-realistic alterations. Such manipulations can leave traces in the metadata embedded in video files. This metadata information can be used to determine video manipulations, bran… ▽ More The unprecedented ease and ability to manipulate video content has led to a rapid spread of manipulated media. The availability of video editing tools greatly increased in recent years, allowing one to easily generate photo-realistic alterations. Such manipulations can leave traces in the metadata embedded in video files. This metadata information can be used to determine video manipulations, brand of video recording device, the type of video editing tool, and other important evidence. In this paper, we focus on the metadata contained in the popular MP4 video wrapper/container. We describe our method for metadata extractor that uses the MP4's tree structure. Our approach for analyzing the video metadata produces a more compact representation. We will describe how we construct features from the metadata and then use dimensionality reduction and nearest neighbor classification for forensic analysis of a video file. Our approach allows one to visually inspect the distribution of metadata features and make decisions. The experimental results confirm that the performance of our approach surpasses other methods. △ Less

Submitted 22 January, 2022; v1 submitted 13 May, 2021; originally announced May 2021.

Comments: v2: fixed a typo in Section 3.4; added page number; added IEEE copyright notice

arXiv:2010.03758 [pdf, other]

Generative Autoregressive Ensembles for Satellite Imagery Manipulation Detection

Authors: Daniel Mas Montserrat, János Horváth, S. K. Yarlagadda, Fengqing Zhu, Edward J. Delp

Abstract: Satellite imagery is becoming increasingly accessible due to the growing number of orbiting commercial satellites. Many applications make use of such images: agricultural management, meteorological prediction, damage assessment from natural disasters, or cartography are some of the examples. Unfortunately, these images can be easily tampered and modified with image manipulation tools damaging down… ▽ More Satellite imagery is becoming increasingly accessible due to the growing number of orbiting commercial satellites. Many applications make use of such images: agricultural management, meteorological prediction, damage assessment from natural disasters, or cartography are some of the examples. Unfortunately, these images can be easily tampered and modified with image manipulation tools damaging downstream applications. Because the nature of the manipulation applied to the image is typically unknown, unsupervised methods that don't require prior knowledge of the tampering techniques used are preferred. In this paper, we use ensembles of generative autoregressive models to model the distribution of the pixels of the image in order to detect potential manipulations. We evaluate the performance of the presented approach obtaining accurate localization results compared to previously presented approaches. △ Less

Submitted 8 October, 2020; originally announced October 2020.

arXiv:2005.06402 [pdf, other]

FaR-GAN for One-Shot Face Reenactment

Authors: Hanxiang Hao, Sriram Baireddy, Amy R. Reibman, Edward J. Delp

Abstract: Animating a static face image with target facial expressions and movements is important in the area of image editing and movie production. This face reenactment process is challenging due to the complex geometry and movement of human faces. Previous work usually requires a large set of images from the same person to model the appearance. In this paper, we present a one-shot face reenactment model,… ▽ More Animating a static face image with target facial expressions and movements is important in the area of image editing and movie production. This face reenactment process is challenging due to the complex geometry and movement of human faces. Previous work usually requires a large set of images from the same person to model the appearance. In this paper, we present a one-shot face reenactment model, FaR-GAN, that takes only one face image of any given source identity and a target expression as input, and then produces a face image of the same source identity but with the target expression. The proposed method makes no assumptions about the source identity, facial expression, head pose, or even image background. We evaluate our method on the VoxCeleb1 dataset and show that our method is able to generate a higher quality face image than the compared methods. △ Less

Submitted 13 May, 2020; originally announced May 2020.

Comments: This paper has been accepted to the AI for content creation workshop at CVPR 2020

arXiv:2004.13973 [pdf, other]

Deep Transfer Learning For Plant Center Localization

Authors: Enyu Cai, Sriram Baireddy, Changye Yang, Melba Crawford, Edward J. Delp

Abstract: Plant phenotyping focuses on the measurement of plant characteristics throughout the growing season, typically with the goal of evaluating genotypes for plant breeding. Estimating plant location is important for identifying genotypes which have low emergence, which is also related to the environment and management practices such as fertilizer applications. The goal of this paper is to investigate… ▽ More Plant phenotyping focuses on the measurement of plant characteristics throughout the growing season, typically with the goal of evaluating genotypes for plant breeding. Estimating plant location is important for identifying genotypes which have low emergence, which is also related to the environment and management practices such as fertilizer applications. The goal of this paper is to investigate methods that estimate plant locations for a field-based crop using RGB aerial images captured using Unmanned Aerial Vehicles (UAVs). Deep learning approaches provide promising capability for locating plants observed in RGB images, but they require large quantities of labeled data (ground truth) for training. Using a deep learning architecture fine-tuned on a single field or a single type of crop on fields in other geographic areas or with other crops may not have good results. The problem of generating ground truth for each new field is labor-intensive and tedious. In this paper, we propose a method for estimating plant centers by transferring an existing model to a new scenario using limited ground truth data. We describe the use of transfer learning using a model fine-tuned for a single field or a single type of plant on a varied set of similar crops and fields. We show that transfer learning provides promising results for detecting plant locations. △ Less

Submitted 29 April, 2020; originally announced April 2020.

arXiv:2004.12027 [pdf, other]

Deepfakes Detection with Automatic Face Weighting

Authors: Daniel Mas Montserrat, Hanxiang Hao, S. K. Yarlagadda, Sriram Baireddy, Ruiting Shao, János Horváth, Emily Bartusiak, Justin Yang, David Güera, Fengqing Zhu, Edward J. Delp

Abstract: Altered and manipulated multimedia is increasingly present and widely distributed via social media platforms. Advanced video manipulation tools enable the generation of highly realistic-looking altered multimedia. While many methods have been presented to detect manipulations, most of them fail when evaluated with data outside of the datasets used in research environments. In order to address this… ▽ More Altered and manipulated multimedia is increasingly present and widely distributed via social media platforms. Advanced video manipulation tools enable the generation of highly realistic-looking altered multimedia. While many methods have been presented to detect manipulations, most of them fail when evaluated with data outside of the datasets used in research environments. In order to address this problem, the Deepfake Detection Challenge (DFDC) provides a large dataset of videos containing realistic manipulations and an evaluation system that ensures that methods work quickly and accurately, even when faced with challenging data. In this paper, we introduce a method based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs) that extracts visual and temporal features from faces present in videos to accurately detect manipulations. The method is evaluated with the DFDC dataset, providing competitive results compared to other techniques. △ Less

Submitted 4 May, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

arXiv:2004.06643 [pdf, other]

An Attention-Based System for Damage Assessment Using Satellite Imagery

Authors: Hanxiang Hao, Sriram Baireddy, Emily R. Bartusiak, Latisha Konz, Kevin LaTourette, Michael Gribbons, Moses Chan, Mary L. Comer, Edward J. Delp

Abstract: When disaster strikes, accurate situational information and a fast, effective response are critical to save lives. Widely available, high resolution satellite images enable emergency responders to estimate locations, causes, and severity of damage. Quickly and accurately analyzing the extensive amount of satellite imagery available, though, requires an automatic approach. In this paper, we present… ▽ More When disaster strikes, accurate situational information and a fast, effective response are critical to save lives. Widely available, high resolution satellite images enable emergency responders to estimate locations, causes, and severity of damage. Quickly and accurately analyzing the extensive amount of satellite imagery available, though, requires an automatic approach. In this paper, we present Siam-U-Net-Attn model - a multi-class deep learning model with an attention mechanism - to assess damage levels of buildings given a pair of satellite images depicting a scene before and after a disaster. We evaluate the proposed method on xView2, a large-scale building damage assessment dataset, and demonstrate that the proposed approach achieves accurate damage scale classification and building segmentation results simultaneously. △ Less

Submitted 14 April, 2020; originally announced April 2020.

Comments: 10 pages, 9 figures

arXiv:2002.02079 [pdf, ps, other]

Forensic Scanner Identification Using Machine Learning

Authors: Ruiting Shao, Edward J. Delp

Abstract: Due to the increasing availability and functionality of image editing tools, many forensic techniques such as digital image authentication, source identification and tamper detection are important for forensic image analysis. In this paper, we describe a machine learning based system to address the forensic analysis of scanner devices. The proposed system uses deep-learning to automatically learn… ▽ More Due to the increasing availability and functionality of image editing tools, many forensic techniques such as digital image authentication, source identification and tamper detection are important for forensic image analysis. In this paper, we describe a machine learning based system to address the forensic analysis of scanner devices. The proposed system uses deep-learning to automatically learn the intrinsic features from various scanned images. Our experimental results show that high accuracy can be achieved for source scanner identification. The proposed system can also generate a reliability map that indicates the manipulated regions in an scanned image. △ Less

Submitted 5 February, 2020; originally announced February 2020.

arXiv:2001.08854 [pdf, other]

Plant Stem Segmentation Using Fast Ground Truth Generation

Authors: Changye Yang, Sriram Baireddy, Yuhao Chen, Enyu Cai, Denise Caldwell, Valérian Méline, Anjali S. Iyer-Pascuzzi, Edward J. Delp

Abstract: Accurately phenotyping plant wilting is important for understanding responses to environmental stress. Analysis of the shape of plants can potentially be used to accurately quantify the degree of wilting. Plant shape analysis can be enhanced by locating the stem, which serves as a consistent reference point during wilting. In this paper, we show that deep learning methods can accurately segment to… ▽ More Accurately phenotyping plant wilting is important for understanding responses to environmental stress. Analysis of the shape of plants can potentially be used to accurately quantify the degree of wilting. Plant shape analysis can be enhanced by locating the stem, which serves as a consistent reference point during wilting. In this paper, we show that deep learning methods can accurately segment tomato plant stems. We also propose a control-point-based ground truth method that drastically reduces the resources needed to create a training dataset for a deep learning approach. Experimental results show the viability of both our proposed ground truth approach and deep learning based stem segmentation. △ Less

Submitted 23 January, 2020; originally announced January 2020.

arXiv:1911.12330 [pdf, other]

Multi-View Matching Network for 6D Pose Estimation

Authors: Daniel Mas Montserrat, Jianhang Chen, Qian Lin, Jan P. Allebach, Edward J. Delp

Abstract: Applications that interact with the real world such as augmented reality or robot manipulation require a good understanding of the location and pose of the surrounding objects. In this paper, we present a new approach to estimate the 6 Degree of Freedom (DoF) or 6D pose of objects from a single RGB image. Our approach can be paired with an object detection and segmentation method to estimate, refi… ▽ More Applications that interact with the real world such as augmented reality or robot manipulation require a good understanding of the location and pose of the surrounding objects. In this paper, we present a new approach to estimate the 6 Degree of Freedom (DoF) or 6D pose of objects from a single RGB image. Our approach can be paired with an object detection and segmentation method to estimate, refine and track the pose of the objects by matching the input image with rendered images. △ Less

Submitted 27 November, 2019; originally announced November 2019.

arXiv:1909.05992 [pdf, ps, other]

doi 10.1109/BHI.2019.8834516

Center-Extraction-Based Three Dimensional Nuclei Instance Segmentation of Fluorescence Microscopy Images

Authors: David Joon Ho, Shuo Han, Chichen Fu, Paul Salama, Kenneth W. Dunn, Edward J. Delp

Abstract: Fluorescence microscopy is an essential tool for the analysis of 3D subcellular structures in tissue. An important step in the characterization of tissue involves nuclei segmentation. In this paper, a two-stage method for segmentation of nuclei using convolutional neural networks (CNNs) is described. In particular, since creating labeled volumes manually for training purposes is not practical due… ▽ More Fluorescence microscopy is an essential tool for the analysis of 3D subcellular structures in tissue. An important step in the characterization of tissue involves nuclei segmentation. In this paper, a two-stage method for segmentation of nuclei using convolutional neural networks (CNNs) is described. In particular, since creating labeled volumes manually for training purposes is not practical due to the size and complexity of the 3D data sets, the paper describes a method for generating synthetic microscopy volumes based on a spatially constrained cycle-consistent adversarial network. The proposed method is tested on multiple real microscopy data sets and outperforms other commonly used segmentation techniques. △ Less

Submitted 12 September, 2019; originally announced September 2019.

Comments: Presented at the IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI 2019)

arXiv:1906.11979 [pdf, other]

A Utility-Preserving GAN for Face Obscuration

Authors: Hanxiang Hao, David Güera, Amy R. Reibman, Edward J. Delp

Abstract: From TV news to Google StreetView, face obscuration has been used for privacy protection. Due to recent advances in the field of deep learning, obscuration methods such as Gaussian blurring and pixelation are not guaranteed to conceal identity. In this paper, we propose a utility-preserving generative model, UP-GAN, that is able to provide an effective face obscuration, while preserving facial uti… ▽ More From TV news to Google StreetView, face obscuration has been used for privacy protection. Due to recent advances in the field of deep learning, obscuration methods such as Gaussian blurring and pixelation are not guaranteed to conceal identity. In this paper, we propose a utility-preserving generative model, UP-GAN, that is able to provide an effective face obscuration, while preserving facial utility. By utility-preserving we mean preserving facial features that do not reveal identity, such as age, gender, skin tone, pose, and expression. We show that the proposed method achieves the best performance in terms of obscuration and utility preservation. △ Less

Submitted 27 June, 2019; originally announced June 2019.

Comments: 6 pages, 5 figures, presented at the ICML 2019 Worksop on Synthetic Realities: Deep Learning for Detecting AudioVisual Fakes

arXiv:1906.08743 [pdf, other]

We Need No Pixels: Video Manipulation Detection Using Stream Descriptors

Authors: David Güera, Sriram Baireddy, Paolo Bestagini, Stefano Tubaro, Edward J. Delp

Abstract: Manipulating video content is easier than ever. Due to the misuse potential of manipulated content, multiple detection techniques that analyze the pixel data from the videos have been proposed. However, clever manipulators should also carefully forge the metadata and auxiliary header information, which is harder to do for videos than images. In this paper, we propose to identify forged videos by a… ▽ More Manipulating video content is easier than ever. Due to the misuse potential of manipulated content, multiple detection techniques that analyze the pixel data from the videos have been proposed. However, clever manipulators should also carefully forge the metadata and auxiliary header information, which is harder to do for videos than images. In this paper, we propose to identify forged videos by analyzing their multimedia stream descriptors with simple binary classifiers, completely avoiding the pixel space. Using well-known datasets, our results show that this scalable approach can achieve a high manipulation detection score if the manipulators have not done a careful data sanitization of the multimedia stream descriptors. △ Less

Submitted 20 June, 2019; originally announced June 2019.

Comments: 7 pages, 6 figures, presented at the ICML 2019 Worksop on Synthetic Realities: Deep Learning for Detecting AudioVisual Fakes

arXiv:1905.05243 [pdf, other]

Robustness Analysis of Face Obscuration

Authors: Hanxiang Hao, David Güera, János Horváth, Amy R. Reibman, Edward J. Delp

Abstract: Face obscuration is needed by law enforcement and mass media outlets to guarantee privacy. Sharing sensitive content where obscuration or redaction techniques have failed to completely remove all identifiable traces can lead to many legal and social issues. Hence, we need to be able to systematically measure the face obscuration performance of a given technique. In this paper we propose to measure… ▽ More Face obscuration is needed by law enforcement and mass media outlets to guarantee privacy. Sharing sensitive content where obscuration or redaction techniques have failed to completely remove all identifiable traces can lead to many legal and social issues. Hence, we need to be able to systematically measure the face obscuration performance of a given technique. In this paper we propose to measure the effectiveness of eight obscuration techniques. We do so by attacking the redacted faces in three scenarios: obscured face identification, verification, and reconstruction. Threat modeling is also considered to provide a vulnerability analysis for each studied obscuration technique. Based on our evaluation, we show that the k-same based methods are the most effective. △ Less

Submitted 15 October, 2019; v1 submitted 13 May, 2019; originally announced May 2019.

arXiv:1904.09974 [pdf, other]

doi 10.1109/ISBI.2019.8759250

Three dimensional blind image deconvolution for fluorescence microscopy using generative adversarial networks

Authors: Soonam Lee, Shuo Han, Paul Salama, Kenneth W. Dunn, Edward J. Delp

Abstract: Due to image blurring image deconvolution is often used for studying biological structures in fluorescence microscopy. Fluorescence microscopy image volumes inherently suffer from intensity inhomogeneity, blur, and are corrupted by various types of noise which exacerbate image quality at deeper tissue depth. Therefore, quantitative analysis of fluorescence microscopy in deeper tissue still remains… ▽ More Due to image blurring image deconvolution is often used for studying biological structures in fluorescence microscopy. Fluorescence microscopy image volumes inherently suffer from intensity inhomogeneity, blur, and are corrupted by various types of noise which exacerbate image quality at deeper tissue depth. Therefore, quantitative analysis of fluorescence microscopy in deeper tissue still remains a challenge. This paper presents a three dimensional blind image deconvolution method for fluorescence microscopy using 3-way spatially constrained cycle-consistent adversarial networks. The restored volumes of the proposed deconvolution method and other well-known deconvolution methods, denoising methods, and an inhomogeneity correction method are visually and numerically evaluated. Experimental results indicate that the proposed method can restore and improve the quality of blurred and noisy deep depth microscopy image visually and quantitatively. △ Less

Submitted 18 April, 2019; originally announced April 2019.

Comments: IEEE International Symposium on Biomedical Imaging (ISBI) 2019

arXiv:1807.07149 [pdf, other]

A Hand-Held Multimedia Translation and Interpretation System with Application to Diet Management

Authors: Albert Parra, Andrew W. Haddad, Mireille Boutin, Edward J. Delp

Abstract: We propose a network independent, hand-held system to translate and disambiguate foreign restaurant menu items in real-time. The system is based on the use of a portable multimedia device, such as a smartphones or a PDA. An accurate and fast translation is obtained using a Machine Translation engine and a context-specific corpora to which we apply two pre-processing steps, called translation stand… ▽ More We propose a network independent, hand-held system to translate and disambiguate foreign restaurant menu items in real-time. The system is based on the use of a portable multimedia device, such as a smartphones or a PDA. An accurate and fast translation is obtained using a Machine Translation engine and a context-specific corpora to which we apply two pre-processing steps, called translation standardization and $n$-gram consolidation. The phrase-table generated is orders of magnitude lighter than the ones commonly used in market applications, thus making translations computationally less expensive, and decreasing the battery usage. Translation ambiguities are mitigated using multimedia information including images of dishes and ingredients, along with ingredient lists. We implemented a prototype of our system on an iPod Touch Second Generation for English speakers traveling in Spain. Our tests indicate that our translation method yields higher accuracy than translation engines such as Google Translate, and does so almost instantaneously. The memory requirements of the application, including the database of images, are also well within the limits of the device. By combining it with a database of nutritional information, our proposed system can be used to help individuals who follow a medical diet maintain this diet while traveling. △ Less

Submitted 16 July, 2018; originally announced July 2018.

arXiv:1807.00498 [pdf, other]

Estimating Phenotypic Traits From UAV Based RGB Imagery

Authors: Javier Ribera, Fangning He, Yuhao Chen, Ayman F. Habib, Edward J. Delp

Abstract: In many agricultural applications one wants to characterize physical properties of plants and use the measurements to predict, for example biomass and environmental influence. This process is known as phenotyping. Traditional collection of phenotypic information is labor-intensive and time-consuming. Use of imagery is becoming popular for phenotyping. In this paper, we present methods to estimate… ▽ More In many agricultural applications one wants to characterize physical properties of plants and use the measurements to predict, for example biomass and environmental influence. This process is known as phenotyping. Traditional collection of phenotypic information is labor-intensive and time-consuming. Use of imagery is becoming popular for phenotyping. In this paper, we present methods to estimate traits of sorghum plants from RBG cameras on board of an unmanned aerial vehicle (UAV). The position and orientation of the imagery together with the coordinates of sparse points along the area of interest are derived through a new triangulation method. A rectified orthophoto mosaic is then generated from the imagery. The number of leaves is estimated and a model-based method to analyze the leaf morphology for leaf segmentation is proposed. We present a statistical model to find the location of each individual sorghum plant. △ Less

Submitted 2 July, 2018; originally announced July 2018.

Comments: 8 pages, double-column

arXiv:1806.07564 [pdf, other]

Locating Objects Without Bounding Boxes

Authors: Javier Ribera, David Güera, Yuhao Chen, Edward J. Delp

Abstract: Recent advances in convolutional neural networks (CNN) have achieved remarkable results in locating objects in images. In these networks, the training procedure usually requires providing bounding boxes or the maximum number of expected objects. In this paper, we address the task of estimating object locations without annotated bounding boxes which are typically hand-drawn and time consuming to la… ▽ More Recent advances in convolutional neural networks (CNN) have achieved remarkable results in locating objects in images. In these networks, the training procedure usually requires providing bounding boxes or the maximum number of expected objects. In this paper, we address the task of estimating object locations without annotated bounding boxes which are typically hand-drawn and time consuming to label. We propose a loss function that can be used in any fully convolutional network (FCN) to estimate object locations. This loss function is a modification of the average Hausdorff distance between two unordered sets of points. The proposed method has no notion of bounding boxes, region proposals, or sliding windows. We evaluate our method with three datasets designed to locate people's heads, pupil centers and plant centers. We outperform state-of-the-art generic object detectors and methods fine-tuned for pupil tracking. △ Less

Submitted 3 April, 2019; v1 submitted 20 June, 2018; originally announced June 2018.

Comments: 12 pages, double-column, 8 figures, accepted at Computer Vision and Pattern Recognition (CVPR) 2019

arXiv:1805.02131 [pdf, other]

doi 10.1109/CVPRW.2017.230

A Counter-Forensic Method for CNN-Based Camera Model Identification

Authors: David Güera, Yu Wang, Luca Bondi, Paolo Bestagini, Stefano Tubaro, Edward J. Delp

Abstract: An increasing number of digital images are being shared and accessed through websites, media, and social applications. Many of these images have been modified and are not authentic. Recent advances in the use of deep convolutional neural networks (CNNs) have facilitated the task of analyzing the veracity and authenticity of largely distributed image datasets. We examine in this paper the problem o… ▽ More An increasing number of digital images are being shared and accessed through websites, media, and social applications. Many of these images have been modified and are not authentic. Recent advances in the use of deep convolutional neural networks (CNNs) have facilitated the task of analyzing the veracity and authenticity of largely distributed image datasets. We examine in this paper the problem of identifying the camera model or type that was used to take an image and that can be spoofed. Due to the linear nature of CNNs and the high-dimensionality of images, neural networks are vulnerable to attacks with adversarial examples. These examples are imperceptibly different from correctly classified images but are misclassified with high confidence by CNNs. In this paper, we describe a counter-forensic method capable of subtly altering images to change their estimated camera model when they are analyzed by any CNN-based camera model detector. Our method can use both the Fast Gradient Sign Method (FGSM) or the Jacobian-based Saliency Map Attack (JSMA) to craft these adversarial images and does not require direct access to the CNN. Our results show that even advanced deep learning architectures trained to analyze images and obtain camera model information are still vulnerable to our proposed method. △ Less

Submitted 5 May, 2018; originally announced May 2018.

Comments: Presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Workshop on Media Forensics

arXiv:1805.01946 [pdf, other]

Reliability Map Estimation For CNN-Based Camera Model Attribution

Authors: David Güera, Sri Kalyan Yarlagadda, Paolo Bestagini, Fengqing Zhu, Stefano Tubaro, Edward J. Delp

Abstract: Among the image forensic issues investigated in the last few years, great attention has been devoted to blind camera model attribution. This refers to the problem of detecting which camera model has been used to acquire an image by only exploiting pixel information. Solving this problem has great impact on image integrity assessment as well as on authenticity verification. Recent advancements that… ▽ More Among the image forensic issues investigated in the last few years, great attention has been devoted to blind camera model attribution. This refers to the problem of detecting which camera model has been used to acquire an image by only exploiting pixel information. Solving this problem has great impact on image integrity assessment as well as on authenticity verification. Recent advancements that use convolutional neural networks (CNNs) in the media forensic field have enabled camera model attribution methods to work well even on small image patches. These improvements are also important for determining forgery localization. Some patches of an image may not contain enough information related to the camera model (e.g., saturated patches). In this paper, we propose a CNN-based solution to estimate the camera model attribution reliability of a given image patch. We show that we can estimate a reliability-map indicating which portions of the image contain reliable camera traces. Testing using a well known dataset confirms that by using this information, it is possible to increase small patch camera model attribution accuracy by more than 8% on a single patch. △ Less

Submitted 4 May, 2018; originally announced May 2018.

Comments: Presented at the IEEE Winter Conference on Applications of Computer Vision (WACV18)

arXiv:1802.09670 [pdf, ps, other]

Single-View Food Portion Estimation: Learning Image-to-Energy Mappings Using Generative Adversarial Networks

Authors: Shaobo Fang, Zeman Shao, Runyu Mao, Chichen Fu, Deborah A. Kerr, Carol J. Boushey, Edward J. Delp, Fengqing Zhu

Abstract: Due to the growing concern of chronic diseases and other health problems related to diet, there is a need to develop accurate methods to estimate an individual's food and energy intake. Measuring accurate dietary intake is an open research problem. In particular, accurate food portion estimation is challenging since the process of food preparation and consumption impose large variations on food sh… ▽ More Due to the growing concern of chronic diseases and other health problems related to diet, there is a need to develop accurate methods to estimate an individual's food and energy intake. Measuring accurate dietary intake is an open research problem. In particular, accurate food portion estimation is challenging since the process of food preparation and consumption impose large variations on food shapes and appearances. In this paper, we present a food portion estimation method to estimate food energy (kilocalories) from food images using Generative Adversarial Networks (GAN). We introduce the concept of an "energy distribution" for each food image. To train the GAN, we design a food image dataset based on ground truth food labels and segmentation masks for each food image as well as energy information associated with the food image. Our goal is to learn the mapping of the food image to the food energy. We can then estimate food energy based on the energy distribution. We show that an average energy estimation error rate of 10.89% can be obtained by learning the image-to-energy mapping. △ Less

Submitted 23 May, 2018; v1 submitted 26 February, 2018; originally announced February 2018.

Comments: 2018 IEEE International Conference on Image Processing

arXiv:1802.04881 [pdf, other]

Satellite Image Forgery Detection and Localization Using GAN and One-Class Classifier

Authors: Sri Kalyan Yarlagadda, David Güera, Paolo Bestagini, Fengqing Maggie Zhu, Stefano Tubaro, Edward J. Delp

Abstract: Current satellite imaging technology enables shooting high-resolution pictures of the ground. As any other kind of digital images, overhead pictures can also be easily forged. However, common image forensic techniques are often developed for consumer camera images, which strongly differ in their nature from satellite ones (e.g., compression schemes, post-processing, sensors, etc.). Therefore, many… ▽ More Current satellite imaging technology enables shooting high-resolution pictures of the ground. As any other kind of digital images, overhead pictures can also be easily forged. However, common image forensic techniques are often developed for consumer camera images, which strongly differ in their nature from satellite ones (e.g., compression schemes, post-processing, sensors, etc.). Therefore, many accurate state-of-the-art forensic algorithms are bound to fail if blindly applied to overhead image analysis. Development of novel forensic tools for satellite images is paramount to assess their authenticity and integrity. In this paper, we propose an algorithm for satellite image forgery detection and localization. Specifically, we consider the scenario in which pixels within a region of a satellite image are replaced to add or remove an object from the scene. Our algorithm works under the assumption that no forged images are available for training. Using a generative adversarial network (GAN), we learn a feature representation of pristine satellite images. A one-class support vector machine (SVM) is trained on these features to determine their distribution. Finally, image forgeries are detected as anomalies. The proposed algorithm is validated against different kinds of satellite images containing forgeries of different size and shape. △ Less

Submitted 13 February, 2018; originally announced February 2018.

Comments: Presented at the IS&T International Symposium on Electronic Imaging (EI)

arXiv:1802.03542 [pdf, other]

doi 10.2352/ISSN.2470-1173.2018.15.COIMG-199

Tubule segmentation of fluorescence microscopy images based on convolutional neural networks with inhomogeneity correction

Authors: Soonam Lee, Chichen Fu, Paul Salama, Kenneth W. Dunn, Edward J. Delp

Abstract: Fluorescence microscopy has become a widely used tool for studying various biological structures of in vivo tissue or cells. However, quantitative analysis of these biological structures remains a challenge due to their complexity which is exacerbated by distortions caused by lens aberrations and light scattering. Moreover, manual quantification of such image volumes is an intractable and error-pr… ▽ More Fluorescence microscopy has become a widely used tool for studying various biological structures of in vivo tissue or cells. However, quantitative analysis of these biological structures remains a challenge due to their complexity which is exacerbated by distortions caused by lens aberrations and light scattering. Moreover, manual quantification of such image volumes is an intractable and error-prone process, making the need for automated image analysis methods crucial. This paper describes a segmentation method for tubular structures in fluorescence microscopy images using convolutional neural networks with data augmentation and inhomogeneity correction. The segmentation results of the proposed method are visually and numerically compared with other microscopy segmentation methods. Experimental results indicate that the proposed method has better performance with correctly segmenting and identifying multiple tubular structures compared to other methods. △ Less

Submitted 10 February, 2018; originally announced February 2018.

Comments: IS&T International Symposium on Electronic Imaging 2018

arXiv:1802.02992 [pdf, ps, other]

Texture Segmentation Based Video Compression Using Convolutional Neural Networks

Authors: Chichen Fu, Di Chen, Edward J. Delp, Zoe Liu, Fengqing Zhu

Abstract: There has been a growing interest in using different approaches to improve the coding efficiency of modern video codec in recent years as demand for web-based video consumption increases. In this paper, we propose a model-based approach that uses texture analysis/synthesis to reconstruct blocks in texture regions of a video to achieve potential coding gains using the AV1 codec developed by the All… ▽ More There has been a growing interest in using different approaches to improve the coding efficiency of modern video codec in recent years as demand for web-based video consumption increases. In this paper, we propose a model-based approach that uses texture analysis/synthesis to reconstruct blocks in texture regions of a video to achieve potential coding gains using the AV1 codec developed by the Alliance for Open Media (AOM). The proposed method uses convolutional neural networks to extract texture regions in a frame, which are then reconstructed using a global motion model. Our preliminary results show an increase in coding efficiency while maintaining satisfactory visual quality. △ Less

Submitted 8 February, 2018; originally announced February 2018.

arXiv:1801.07198 [pdf, ps, other]

doi 10.1109/CVPRW.2018.00298

Three Dimensional Fluorescence Microscopy Image Synthesis and Segmentation

Authors: Chichen Fu, Soonam Lee, David Joon Ho, Shuo Han, Paul Salama, Kenneth W. Dunn, Edward J. Delp

Abstract: Advances in fluorescence microscopy enable acquisition of 3D image volumes with better image quality and deeper penetration into tissue. Segmentation is a required step to characterize and analyze biological structures in the images and recent 3D segmentation using deep learning has achieved promising results. One issue is that deep learning techniques require a large set of groundtruth data which… ▽ More Advances in fluorescence microscopy enable acquisition of 3D image volumes with better image quality and deeper penetration into tissue. Segmentation is a required step to characterize and analyze biological structures in the images and recent 3D segmentation using deep learning has achieved promising results. One issue is that deep learning techniques require a large set of groundtruth data which is impractical to annotate manually for large 3D microscopy volumes. This paper describes a 3D deep learning nuclei segmentation method using synthetic 3D volumes for training. A set of synthetic volumes and the corresponding groundtruth are generated using spatially constrained cycle-consistent adversarial networks. Segmentation results demonstrate that our proposed method is capable of segmenting nuclei successfully for various data sets. △ Less

Submitted 20 April, 2018; v1 submitted 22 January, 2018; originally announced January 2018.

Comments: Accepted by CVPR Workshop on Computer Vision for Microscopy Image Analysis (CVMI)

arXiv:1603.01068 [pdf, other]

doi 10.1109/LSP.2016.2641006

First Steps Toward Camera Model Identification with Convolutional Neural Networks

Authors: Luca Bondi, Luca Baroffio, David Güera, Paolo Bestagini, Edward J. Delp, Stefano Tubaro

Abstract: Detecting the camera model used to shoot a picture enables to solve a wide series of forensic problems, from copyright infringement to ownership attribution. For this reason, the forensic community has developed a set of camera model identification algorithms that exploit characteristic traces left on acquired images by the processing pipelines specific of each camera model. In this paper, we inve… ▽ More Detecting the camera model used to shoot a picture enables to solve a wide series of forensic problems, from copyright infringement to ownership attribution. For this reason, the forensic community has developed a set of camera model identification algorithms that exploit characteristic traces left on acquired images by the processing pipelines specific of each camera model. In this paper, we investigate a novel approach to solve camera model identification problem. Specifically, we propose a data-driven algorithm based on convolutional neural networks, which learns features characterizing each camera model directly from the acquired pictures. Results on a well-known dataset of 18 camera models show that: (i) the proposed method outperforms up-to-date state-of-the-art algorithms on classification of 64x64 color image patches; (ii) features learned by the proposed network generalize to camera models never used for training. △ Less

Submitted 26 September, 2017; v1 submitted 3 March, 2016; originally announced March 2016.

arXiv:0801.4807 [pdf, other]

Automatic Text Area Segmentation in Natural Images

Authors: Syed Ali Raza Jafri, Mireille Boutin, Edward J. Delp

Abstract: We present a hierarchical method for segmenting text areas in natural images. The method assumes that the text is written with a contrasting color on a more or less uniform background. But no assumption is made regarding the language or character set used to write the text. In particular, the text can contain simple graphics or symbols. The key feature of our approach is that we first concentrat… ▽ More We present a hierarchical method for segmenting text areas in natural images. The method assumes that the text is written with a contrasting color on a more or less uniform background. But no assumption is made regarding the language or character set used to write the text. In particular, the text can contain simple graphics or symbols. The key feature of our approach is that we first concentrate on finding the background of the text, before testing whether there is actually text on the background. Since uniform areas are easy to find in natural images, and since text backgrounds define areas which contain "holes" (where the text is written) we thus look for uniform areas containing "holes" and label them as text backgrounds candidates. Each candidate area is then further tested for the presence of text within its convex hull. We tested our method on a database of 65 images including English and Urdu text. The method correctly segmented all the text areas in 63 of these images, and in only 4 of these were areas that do not contain text also segmented. △ Less

Submitted 30 January, 2008; originally announced January 2008.

Showing 1–49 of 49 results for author: Delp, E J