Search | arXiv e-print repository

DAVIDE: Depth-Aware Video Deblurring

Authors: German F. Torres, Jussi Kalliola, Soumya Tripathy, Erman Acar, Joni-Kristian Kämäräinen

Abstract: Video deblurring aims at recovering sharp details from a sequence of blurry frames. Despite the proliferation of depth sensors in mobile phones and the potential of depth information to guide deblurring, depth-aware deblurring has received only limited attention. In this work, we introduce the 'Depth-Aware VIdeo DEblurring' (DAVIDE) dataset to study the impact of depth information in video deblurring. The dataset comprises synchronized blurred, sharp, and depth videos. We investigate how the depth information should be injected into the existing deep RGB video deblurring models, and propose a strong baseline for depth-aware video deblurring. Our findings reveal the significance of depth information in video deblurring and provide insights into the use cases where depth cues are beneficial. In addition, our results demonstrate that while the depth improves deblurring performance, this effect diminishes when models are provided with a longer temporal context. Project page: https://germanftv.github.io/DAVIDE.github.io/

Submitted 2 September, 2024; originally announced September 2024.

arXiv:2407.13125 [pdf, other]

On Finding the Closest Zonotope to a Polytope in Hausdorff Distance

Authors: George D. Torres

Abstract: We provide a local theory for the optimization of the Hausdorff distance between a polytope and a zonotope. To do this, we compute explicit local formulae for the Hausdorff function $d(P, -) : Z_n \to \mathbb{R}$, where $P$ is a fixed polytope and $Z_n$ is the space of rank $n$ zonotopes. This local theory is then used to provide an optimization algorithm based on subgradient descent that converges to critical points of $d(P, -)$. We also express the condition of being at a local minimum as a polyhedral feasibility condition.

Submitted 17 July, 2024; originally announced July 2024.

Comments: 26 pages, 9 figures
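
For reference, the two objects named in this abstract have standard definitions, sketched below. The symbols $c$, $g_i$, and $\lambda_i$ are introduced here for illustration, and reading "rank $n$" as the number of generators is an assumption rather than something stated in the listing.

$$
d(P, Z) = \max\left\{ \sup_{p \in P} \inf_{z \in Z} \lVert p - z \rVert,\ \sup_{z \in Z} \inf_{p \in P} \lVert p - z \rVert \right\},
\qquad
Z = \left\{ c + \sum_{i=1}^{n} \lambda_i g_i \;:\; \lambda_i \in [0, 1] \right\}
$$

Here $d(P, Z)$ is the Hausdorff distance between two compact sets, and $Z$ is a zonotope written as a translated Minkowski sum of $n$ line segments.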

arXiv:2406.15946 [pdf, other]

Optimizing LaneSegNet for Real-Time Lane Topology Prediction in Autonomous Vehicles

Authors: William Stevens, Vishal Urs, Karthik Selvaraj, Gabriel Torres, Gaurish Lakhanpal

Abstract: With the increasing prevalence of autonomous vehicles, it is essential for computer vision algorithms to accurately assess road features in real time. This study explores the LaneSegNet architecture, a new approach to lane topology prediction which integrates topological information with lane-line data to provide a more contextual understanding of road environments. The LaneSegNet architecture includes a feature extractor, lane encoder, lane decoder, and prediction head, leveraging components from ResNet-50, BEVFormer, and various attention mechanisms. We experimented with optimizations to the LaneSegNet architecture through feature extractor modification and transformer encoder-decoder stack modification. We found that modifying the encoder and decoder stacks offered an interesting tradeoff between training time and prediction accuracy, with certain combinations showing promising results. Our implementation, trained on a single NVIDIA Tesla A100 GPU, found that a 2:4 ratio reduced training time by 22.3% with only a 7.1% drop in mean average precision, while a 4:8 ratio increased training time by only 11.1% but improved mean average precision by a significant 23.7%. These results indicate that strategic hyperparameter tuning can yield substantial improvements depending on the resources of the user. This study provides valuable insights for optimizing LaneSegNet according to available computation power, making it more accessible for users with limited resources and increasing the capabilities for users with more powerful resources.

Submitted 30 July, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

Comments: 18 pages, 16 figures

arXiv:2307.12970 [pdf, other]

Volcanic ash delimitation using Artificial Intelligence based on Pix2Pix

Authors: Christian Carrillo, Gissela Torres, Christian Mejia-Escobar

Abstract: Volcanic eruptions emit ash that can be harmful to human health and cause damage to infrastructure, economic activities and the environment. Delimiting ash clouds makes it possible to understand their behavior and dispersion, which helps in the prevention and mitigation of this phenomenon. Traditional methods take advantage of specialized software programs to process the bands or channels that compose the satellite images. However, their use is limited to experts and demands a lot of time and significant computational resources. In recent years, Artificial Intelligence has been a milestone in the computational treatment of complex problems in different areas. In particular, Deep Learning techniques allow automatic, fast and accurate processing of digital images. The present work proposes the use of the Pix2Pix model, a type of generative adversarial network that, once trained, learns the mapping of input images to output images. The architecture of such a network, consisting of a generator and a discriminator, provides the versatility needed to produce black and white ash cloud images from multispectral satellite images. The evaluation of the model, based on loss and accuracy plots, a confusion matrix, and visual inspection, indicates a satisfactory solution for accurate ash cloud delineation that is applicable in any area of the world and can serve as a useful tool in risk management.

Submitted 24 July, 2023; originally announced July 2023.

Comments: 18 pages, in Spanish language, 15 figures

arXiv:2307.02924 [pdf, other]

The Emotional Dilemma: Influence of a Human-like Robot on Trust and Cooperation

Authors: Dennis Becker, Diana Rueda, Felix Beese, Brenda Scarleth Gutierrez Torres, Myriem Lafdili, Kyra Ahrens, Di Fu, Erik Strahl, Tom Weber, Stefan Wermter

Abstract: Increasing anthropomorphic robot behavioral design could affect trust and cooperation positively. However, studies have shown contradicting results and suggest a task-dependent relationship between robots that display emotions and trust. Therefore, this study analyzes the effect of robots that display human-like emotions on trust, cooperation, and participants' emotions. In the between-group study, participants play the coin entrustment game with an emotional and a non-emotional robot. The results show that the robot that displays emotions induces more anxiety than the neutral robot. Accordingly, the participants trust the emotional robot less and are less likely to cooperate. Furthermore, the perceived intelligence of a robot increases trust, while a desire to outcompete the robot can reduce trust and cooperation. Thus, the design of robots expressing emotions should be task dependent to avoid adverse effects that reduce trust and cooperation.

Submitted 6 July, 2023; originally announced July 2023.

Comments: Accepted at 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)

arXiv:2303.09334 [pdf, other]

Depth-Aware Image Compositing Model for Parallax Camera Motion Blur

Authors: German F. Torres, Joni-Kristian Kämäräinen

Abstract: Camera motion introduces spatially varying blur due to the depth changes in the 3D world. This work investigates scene configurations where such blur is produced under parallax camera motion. We present a simple, yet accurate, Image Compositing Blur (ICB) model for depth-dependent spatially varying blur. The (forward) model produces realistic motion blur from a single image, depth map, and camera trajectory. Furthermore, we utilize the ICB model, combined with a coordinate-based MLP, to learn a sharp neural representation from the blurred input. Experimental results are reported for synthetic and real examples. The results verify that the ICB forward model is computationally efficient and produces realistic blur, despite the lack of occlusion information. Additionally, our method for restoring a sharp representation proves to be a competitive approach for the deblurring task.

Submitted 30 March, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

arXiv:2208.04711 [pdf, ps, other]

The Transform-o-meter: A method to forecast the transformative impact of innovation

Authors: Hector G. T. Torres

Abstract: With the advent of Transformative Artificial Intelligence, it is now more important than ever to be able to both measure and forecast the transformative impact/potential of innovation. However, current methods fall short when faced with this task. This paper introduces the Transform-o-meter, a methodology that can be used to achieve the aforementioned goal and be applied to any innovation, both material and immaterial. While this method can effectively be used for the mentioned purpose, it should be taken as a first approach, to be iterated on, researched, and expanded further.

Submitted 15 July, 2022; originally announced August 2022.

arXiv:2105.07636 [pdf, other]

DOC3-Deep One Class Classification using Contradictions

Authors: Sauptik Dhar, Bernardo Gonzalez Torres

Abstract: This paper introduces the notion of learning from contradictions (a.k.a. Universum learning) for deep one class classification problems. We formalize this notion for the widely adopted one class large-margin loss, and propose the Deep One Class Classification using Contradictions (DOC3) algorithm. We show that learning from contradictions incurs lower generalization error by comparing the Empirical Rademacher Complexity (ERC) of DOC3 against its traditional inductive learning counterpart. Our empirical results demonstrate the efficacy of DOC3 compared to popular baseline algorithms on several real-life data sets.

Submitted 23 May, 2022; v1 submitted 17 May, 2021; originally announced May 2021.

Comments: Deep Learning, Anomaly Detection, Visual Inspection, Learning from Contradictions, Disjoint Auxiliary, Outlier Exposure, MVTec-AD

arXiv:2103.15596 [pdf, other]

A Shape-Aware Retargeting Approach to Transfer Human Motion and Appearance in Monocular Videos

Authors: Thiago L. Gomes, Renato Martins, João Ferreira, Rafael Azevedo, Guilherme Torres, Erickson R. Nascimento

Abstract: Transferring human motion and appearance between videos of human actors remains one of the key challenges in Computer Vision. Despite the advances from recent image-to-image translation approaches, there are several transferring contexts where most end-to-end learning-based retargeting methods still perform poorly. Transferring human appearance from one actor to another is only ensured when a strict setup has been complied with, which is generally built considering their training regime's specificities. In this work, we propose a shape-aware approach based on a hybrid image-based rendering technique that exhibits competitive visual retargeting quality compared to state-of-the-art neural rendering approaches. The formulation incorporates the user's body shape into the retargeting while considering physical constraints of the motion in 3D and the 2D image domain. We also present a new video retargeting benchmark dataset composed of different videos with annotated human motions to evaluate the task of synthesizing people's videos, which can be used as a common base to improve tracking the progress in the field. The dataset and its evaluation protocols are designed to evaluate retargeting methods in more general and challenging conditions. Our method is validated in several experiments, comprising publicly available videos of actors with different shapes, motion types, and camera setups. The dataset and retargeting code are publicly available to the community at: https://www.verlab.dcc.ufmg.br/retargeting-motion.

Submitted 28 April, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: 19 pages, 13 figures

arXiv:1904.00904 [pdf, other]

An Atomistic Machine Learning Package for Surface Science and Catalysis

Authors: Martin Hangaard Hansen, José A. Garrido Torres, Paul C. Jennings, Ziyun Wang, Jacob R. Boes, Osman G. Mamun, Thomas Bligaard

Abstract: We present workflows and a software module for machine learning model building in surface science and heterogeneous catalysis. This includes fingerprinting atomic structures from 3D structure and/or connectivity information, descriptor selection methods and benchmarks, and active learning frameworks for atomic structure optimization, acceleration of screening studies, and exploration of the structure space of nanoparticles, which are all atomic structure problems relevant for surface science and heterogeneous catalysis. Our overall goal is to provide a repository to ease machine learning model building for catalysis, to advance the models beyond the chemical intuition of the user, and to increase autonomy for exploration of chemical space.

Submitted 1 April, 2019; originally announced April 2019.

arXiv:1212.2669 [pdf]

A Lossless Data Hiding Technique based on AES-DWT

Authors: Francisco Rubén Castillo Soria, Gustavo Fernández Torres, Ignacio Algredo-Badillo

Abstract: In this paper we propose a new data hiding technique. The new technique uses steganography and cryptography on images with a size of 256x256 pixels and an 8-bit grayscale format. There are design restrictions such as a fixed-size cover image, and reconstruction without error of the hidden image. The steganography technique uses a Haar-DWT (Discrete Wavelet Transform) with hard thresholding and the LSB (Least Significant Bit) technique on the cover image. The algorithms used for compressing and ciphering the secret image are lossless JPG and AES, respectively. The proposed technique is used to generate a stego image which provides a double type of security that is robust against attacks. Results are reported for different threshold levels in terms of PSNR.

Submitted 11 December, 2012; originally announced December 2012.

Comments: 9 pages, 15 figures, 2 tables; IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 5, No 3, September 2012. ISSN (Online): 1694-0814. www.IJCSI.org
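
As a rough illustration of the LSB step mentioned in this abstract, the sketch below hides a bit sequence in the least significant bits of 8-bit grayscale values. It is not the authors' scheme: it works on raw pixel values and leaves out the Haar-DWT, hard thresholding, lossless compression, and AES stages, and the function names `embed_lsb` and `extract_lsb` are hypothetical.

```python
def embed_lsb(pixels, bits):
    """Return a copy of `pixels` with `bits` written into the lowest bit of each value."""
    if len(bits) > len(pixels):
        raise ValueError("payload does not fit in the cover")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the least significant bit, then set it to the payload bit.
        stego[i] = (stego[i] & 0xFE) | (bit & 1)
    return stego


def extract_lsb(pixels, n_bits):
    """Read back the first `n_bits` least significant bits."""
    return [p & 1 for p in pixels[:n_bits]]


cover = [200, 13, 255, 0, 97, 128]   # 8-bit grayscale values (illustrative)
payload = [1, 0, 0, 1]               # bits to hide (illustrative)
stego = embed_lsb(cover, payload)
recovered = extract_lsb(stego, len(payload))
```

Each cover value changes by at most 1, which is why plain LSB embedding is visually hard to notice, and the payload is recovered exactly.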

Showing 1–11 of 11 results for author: Torres, G