Showing 51–79 of 79 results for author: Tzimiropoulos, G

  1. arXiv:1910.09469  [pdf, other]

    cs.CV cs.LG eess.IV

    Object landmark discovery through unsupervised adaptation

    Authors: Enrique Sanchez, Georgios Tzimiropoulos

    Abstract: This paper proposes a method to ease the unsupervised learning of object landmark detectors. Similarly to previous methods, our approach is fully unsupervised in the sense that it does not require or make any use of annotated landmarks for the target object category. Contrary to previous works, we do however assume that a landmark detector, which has already learned a structured representation for a…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: NeurIPS 2019. Code is available at https://github.com/ESanchezLozano/SAIC-Unsupervised-landmark-detection-NeurIPS2019

  2. arXiv:1909.13863  [pdf, other]

    cs.CV cs.LG eess.IV

    XNOR-Net++: Improved Binary Neural Networks

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: This paper proposes an improved training algorithm for binary neural networks in which both weights and activations are binary numbers. A key but fairly overlooked feature of the current state-of-the-art method of XNOR-Net is the use of analytically calculated real-valued scaling factors for re-weighting the output of binary convolutions. We argue that analytic calculation of these factors is sub-…

    Submitted 30 September, 2019; originally announced September 2019.

    Comments: Accepted to BMVC 2019

  3. arXiv:1909.04951  [pdf, other]

    cs.CV

    AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces

    Authors: Muhammad Haris Khan, John McDonagh, Salman Khan, Muhammad Shahabuddin, Aditya Arora, Fahad Shahbaz Khan, Ling Shao, Georgios Tzimiropoulos

    Abstract: Being heavily reliant on animals, it is our ethical obligation to improve their well-being by understanding their needs. Several studies show that animal needs are often expressed through their faces. Though remarkable progress has been made towards the automatic understanding of human faces, this has regrettably not been the case with animal faces. There exists significant room and appropriate ne…

    Submitted 11 September, 2019; originally announced September 2019.

    Comments: 15 pages, 14 figures

  4. arXiv:1904.07852  [pdf, other]

    cs.CV cs.AI cs.LG

    Matrix and tensor decompositions for training binary neural networks

    Authors: Adrian Bulat, Jean Kossaifi, Georgios Tzimiropoulos, Maja Pantic

    Abstract: This paper is on improving the training of binary neural networks in which both activations and weights are binary. While prior methods for neural network binarization binarize each filter independently, we propose to instead parametrize the weight tensor of each layer using matrix or tensor decomposition. The binarization process is then performed using this latent parametrization, via a quantiza…

    Submitted 16 April, 2019; originally announced April 2019.

  5. arXiv:1904.06345  [pdf, other]

    cs.CV cs.AI cs.LG

    Incremental multi-domain learning with network latent tensor factorization

    Authors: Adrian Bulat, Jean Kossaifi, Georgios Tzimiropoulos, Maja Pantic

    Abstract: The prominence of deep learning, large amounts of annotated data and increasingly powerful hardware have made it possible to reach remarkable performance for supervised classification tasks, in many cases saturating the training sets. However, the resulting models are specialized to a single very specific task and domain. Adapting the learned classification to new domains is a hard problem due to at leas…

    Submitted 22 November, 2019; v1 submitted 12 April, 2019; originally announced April 2019.

    Comments: AAAI20

  6. arXiv:1904.05868  [pdf, other]

    cs.CV

    Improved training of binary networks for human pose estimation and image recognition

    Authors: Adrian Bulat, Georgios Tzimiropoulos, Jean Kossaifi, Maja Pantic

    Abstract: Big neural networks trained on large datasets have advanced the state-of-the-art for a large variety of challenging problems, improving performance by a large margin. However, under low memory and limited computational power constraints, the accuracy on the same problems drops considerably. In this paper, we propose a series of techniques that significantly improve the accuracy of binarized neural…

    Submitted 11 April, 2019; originally announced April 2019.

  7. arXiv:1904.02698  [pdf, other]

    cs.CV cs.AI cs.LG

    T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor

    Authors: Jean Kossaifi, Adrian Bulat, Georgios Tzimiropoulos, Maja Pantic

    Abstract: Recent findings indicate that over-parametrization, while crucial for successfully training deep neural networks, also introduces large amounts of redundancy. Tensor methods have the potential to efficiently parametrize over-complete representations by leveraging this redundancy. In this paper, we propose to fully parametrize Convolutional Neural Networks (CNNs) with a single high-order, low-rank…

    Submitted 4 April, 2019; originally announced April 2019.

    Comments: CVPR 2019

  8. arXiv:1812.05082  [pdf, other]

    cs.CV cs.LG

    Features Extraction Based on an Origami Representation of 3D Landmarks

    Authors: Juan Manuel Fernandez Montenegro, Mahdi Maktab Dar Oghaz, Athanasios Gkelias, Georgios Tzimiropoulos, Vasileios Argyriou

    Abstract: Feature extraction analysis has been widely investigated during the last decades in the computer vision community due to the large range of possible applications. Significant work has been done in order to improve the performance of emotion detection methods. Classification algorithms have been refined, novel preprocessing techniques have been applied and novel representations from images and vide…

    Submitted 12 December, 2018; originally announced December 2018.

  9. arXiv:1812.02486  [pdf, other]

    cs.CV

    Learning to Infer the Depth Map of a Hand from its Color Image

    Authors: Vassilis C. Nicodemou, Iason Oikonomidis, Georgios Tzimiropoulos, Antonis Argyros

    Abstract: We propose the first approach to the problem of inferring the depth map of a human hand based on a single RGB image. We achieve this with a Convolutional Neural Network (CNN) that employs a stacked hourglass model as its main building block. Intermediate supervision is used in several outputs of the proposed architecture in a staged approach. To aid the process of training and inference, hand segm…

    Submitted 6 December, 2018; originally announced December 2018.

  10. arXiv:1811.01194  [pdf, other]

    cs.CV

    Pushing the boundaries of audiovisual word recognition using Residual Networks and LSTMs

    Authors: Themos Stafylakis, Muhammad Haris Khan, Georgios Tzimiropoulos

    Abstract: Visual and audiovisual speech recognition are witnessing a renaissance which is largely due to the advent of deep learning methods. In this paper, we present a deep learning architecture for lipreading and audiovisual word recognition, which combines Residual Networks equipped with spatiotemporal input layers and Bidirectional LSTMs. The lipreading architecture attains 11.92% misclassification rat…

    Submitted 3 November, 2018; originally announced November 2018.

    Comments: Accepted to Computer Vision and Image Understanding (Elsevier)

  11. arXiv:1810.00108  [pdf, other]

    cs.CV

    Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture

    Authors: Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Georgios Tzimiropoulos, Maja Pantic

    Abstract: Recent works in speech recognition rely either on connectionist temporal classification (CTC) or sequence-to-sequence models for character-level recognition. CTC assumes conditional independence of individual characters, whereas attention-based models can provide nonsequential alignments. Therefore, we could use a CTC loss in combination with an attention-based model in order to force monotonic al…

    Submitted 28 September, 2018; originally announced October 2018.

    Comments: Accepted to IEEE SLT 2018

  12. arXiv:1809.03770  [pdf, other]

    cs.CV

    3D Human Body Reconstruction from a Single Image via Volumetric Regression

    Authors: Aaron S. Jackson, Chris Manafas, Georgios Tzimiropoulos

    Abstract: This paper proposes the use of an end-to-end Convolutional Neural Network for direct reconstruction of the 3D geometry of humans via volumetric regression. The proposed method does not require the fitting of a shape model and can be trained to work from a variety of input types, whether it be landmarks, images or segmentation masks. Additionally, non-visible parts, either self-occluded or otherwis…

    Submitted 11 September, 2018; originally announced September 2018.

    Comments: Accepted to ECCV Workshops (PeopleCap) 2018

  13. Hierarchical binary CNNs for landmark localization with limited resources

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: Our goal is to design architectures that retain the groundbreaking performance of Convolutional Neural Networks (CNNs) for landmark localization and at the same time are lightweight, compact and suitable for applications with limited computational resources. To this end, we make the following contributions: (a) we are the first to study the effect of neural network binarization on localization tas…

    Submitted 14 August, 2018; originally announced August 2018.

    Comments: Accepted to IEEE TPAMI18: Best of ICCV 2017 SI. Previously portions of this work appeared as arXiv:1703.00862, which was the conference version

  14. arXiv:1807.11458  [pdf, other]

    cs.CV

    To learn image super-resolution, use a GAN to learn how to do image degradation first

    Authors: Adrian Bulat, Jing Yang, Georgios Tzimiropoulos

    Abstract: This paper is on image and face super-resolution. The vast majority of prior work for this problem focuses on how to increase the resolution of low-resolution images which are artificially generated by simple bilinear down-sampling (or in a few cases by blurring followed by down-sampling). We show that such methods fail to produce good results when applied to real-world low-resolution, low-quality im…

    Submitted 30 July, 2018; originally announced July 2018.

    Comments: Accepted to ECCV18

  15. arXiv:1807.08469  [pdf, other]

    cs.CV

    Zero-shot keyword spotting for visual speech recognition in-the-wild

    Authors: Themos Stafylakis, Georgios Tzimiropoulos

    Abstract: Visual keyword spotting (KWS) is the problem of estimating whether a text query occurs in a given recording using only video information. This paper focuses on visual KWS for words unseen during training, a real-world, practical setting which so far has received no attention from the community. To this end, we devise an end-to-end architecture comprising (a) a state-of-the-art visual feature extract…

    Submitted 25 July, 2018; v1 submitted 23 July, 2018; originally announced July 2018.

    Comments: Accepted at ECCV-2018

  16. arXiv:1805.03487  [pdf, other]

    cs.CV

    Joint Action Unit localisation and intensity estimation through heatmap regression

    Authors: Enrique Sanchez-Lozano, Georgios Tzimiropoulos, Michel Valstar

    Abstract: This paper proposes a supervised learning approach to jointly perform facial Action Unit (AU) localisation and intensity estimation. Contrary to previous works that try to learn an unsupervised representation of the Action Unit regions, we propose to directly and jointly estimate all AU intensities through heatmap regression, along with the location in the face where they cause visible changes. Ou…

    Submitted 20 July, 2018; v1 submitted 9 May, 2018; originally announced May 2018.

    Comments: BMVC 2018. Code and model will be available to download from https://github.com/ESanchezLozano/Action-Units-Heatmaps

  17. arXiv:1802.06424  [pdf, ps, other]

    cs.CV

    End-to-end Audiovisual Speech Recognition

    Authors: Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Feipeng Cai, Georgios Tzimiropoulos, Maja Pantic

    Abstract: Several end-to-end deep learning approaches have been recently presented which extract either audio or visual features from the input images or audio signals and perform speech recognition. However, research on end-to-end audiovisual models is very limited. In this work, we present an end-to-end audiovisual model based on residual networks and Bidirectional Gated Recurrent Units (BGRUs). To the be…

    Submitted 22 February, 2018; v1 submitted 18 February, 2018; originally announced February 2018.

    Comments: Accepted to ICASSP 2018

  18. arXiv:1712.02765  [pdf, other]

    cs.CV

    Super-FAN: Integrated facial landmark localization and super-resolution of real-world low resolution faces in arbitrary poses with GANs

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: This paper addresses 2 challenging tasks: improving the quality of low-resolution facial images and accurately locating the facial landmarks on such poor-resolution images. To this end, we make the following 5 contributions: (a) we propose Super-FAN: the very first end-to-end system that addresses both tasks simultaneously, i.e. both improves face resolution and detects the facial landmarks. The n…

    Submitted 27 March, 2018; v1 submitted 7 December, 2017; originally announced December 2017.

    Comments: CVPR 2018 SPOTLIGHT

  19. arXiv:1710.11201  [pdf, other]

    cs.CV

    Deep word embeddings for visual speech recognition

    Authors: Themos Stafylakis, Georgios Tzimiropoulos

    Abstract: In this paper we present a deep learning architecture for extracting word embeddings for visual speech recognition. The embeddings summarize the information of the mouth region that is relevant to the problem of word recognition, while suppressing other types of variability such as speaker, pose and illumination. The system is comprised of a spatiotemporal convolutional layer, a Residual Network a…

    Submitted 30 October, 2017; originally announced October 2017.

  20. arXiv:1703.07834  [pdf, other]

    cs.CV

    Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression

    Authors: Aaron S. Jackson, Adrian Bulat, Vasileios Argyriou, Georgios Tzimiropoulos

    Abstract: 3D face reconstruction is a fundamental Computer Vision problem of extraordinary difficulty. Current systems often assume the availability of multiple facial images (sometimes from the same subject) as input, and must address a number of methodological challenges such as establishing dense correspondences across large facial poses, expressions, and non-uniform illumination. In general these method…

    Submitted 8 September, 2017; v1 submitted 22 March, 2017; originally announced March 2017.

    Comments: 10 pages, ICCV 2017

  21. How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: This paper investigates how far a very deep neural network is from attaining close to saturating performance on existing 2D and 3D face alignment datasets. To this end, we make the following 5 contributions: (a) we construct, for the first time, a very strong baseline by combining a state-of-the-art architecture for landmark localization with a state-of-the-art residual block, train it on a very l…

    Submitted 7 September, 2017; v1 submitted 21 March, 2017; originally announced March 2017.

    Comments: accepted to ICCV 2017

  22. arXiv:1703.04105  [pdf, other]

    cs.CV

    Combining Residual Networks with LSTMs for Lipreading

    Authors: Themos Stafylakis, Georgios Tzimiropoulos

    Abstract: We propose an end-to-end deep learning architecture for word-level visual speech recognition. The system is a combination of spatiotemporal convolutional, residual and bidirectional Long Short-Term Memory networks. We train and evaluate it on the Lipreading In-The-Wild benchmark, a challenging database of 500 target words consisting of 1.28 sec video excerpts from BBC TV broadcasts. The propos…

    Submitted 8 September, 2017; v1 submitted 12 March, 2017; originally announced March 2017.

    Comments: Submitted to Interspeech 2017

  23. arXiv:1703.00862  [pdf, other]

    cs.CV cs.LG stat.ML

    Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: Our goal is to design architectures that retain the groundbreaking performance of CNNs for landmark localization and at the same time are lightweight, compact and suitable for applications with limited computational resources. To this end, we make the following contributions: (a) we are the first to study the effect of neural network binarization on localization tasks, namely human pose estimation…

    Submitted 7 August, 2017; v1 submitted 2 March, 2017; originally announced March 2017.

    Comments: ICCV 2017 Oral

  24. A Functional Regression approach to Facial Landmark Tracking

    Authors: Enrique Sánchez-Lozano, Georgios Tzimiropoulos, Brais Martinez, Fernando De la Torre, Michel Valstar

    Abstract: Linear regression is a fundamental building block in many face detection and tracking algorithms, typically used to predict shape displacements from image features through a linear mapping. This paper presents a Functional Regression solution to the least squares problem, which we coin Continuous Regression, resulting in the first real-time incremental face tracker. Contrary to prior work in Funct…

    Submitted 20 September, 2017; v1 submitted 7 December, 2016; originally announced December 2016.

    Comments: Accepted at IEEE TPAMI. This is authors' version. 0162-8828 ©2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017

  25. arXiv:1609.09642  [pdf, other]

    cs.CV

    A CNN Cascade for Landmark Guided Semantic Part Segmentation

    Authors: Aaron Jackson, Michel Valstar, Georgios Tzimiropoulos

    Abstract: This paper proposes a CNN cascade for semantic part segmentation guided by pose-specific information encoded in terms of a set of landmarks (or keypoints). There is a large amount of prior work on each of these tasks separately, yet, to the best of our knowledge, this is the first time in the literature that the interplay between pose estimation and semantic part segmentation is investigated. To address…

    Submitted 30 September, 2016; originally announced September 2016.

    Comments: accepted to Geometry Meets Deep Learning ECCV 2016 Workshop

  26. Two-stage Convolutional Part Heatmap Regression for the 1st 3D Face Alignment in the Wild (3DFAW) Challenge

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: This paper describes our submission to the 1st 3D Face Alignment in the Wild (3DFAW) Challenge. Our method builds upon the idea of convolutional part heatmap regression [1], extending it for 3D face alignment. Our method decomposes the problem into two parts: (a) X,Y (2D) estimation and (b) Z (depth) estimation. At the first stage, our method estimates the X,Y coordinates of the facial landmarks b…

    Submitted 29 September, 2016; originally announced September 2016.

    Comments: Winner of 3D Face Alignment in the Wild (3DFAW) Challenge, ECCV 2016

  27. Human pose estimation via Convolutional Part Heatmap Regression

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: This paper is on human pose estimation using Convolutional Neural Networks. Our main contribution is a CNN cascaded architecture specifically designed for learning part relationships and spatial context, and robustly inferring pose even for the case of severe part occlusions. To this end, we propose a detection-followed-by-regression CNN cascade. The first part of our cascade outputs part detectio…

    Submitted 6 September, 2016; originally announced September 2016.

    Comments: accepted to ECCV 2016

  28. arXiv:1608.01137  [pdf, other]

    cs.CV

    Cascaded Continuous Regression for Real-time Incremental Face Tracking

    Authors: Enrique Sánchez-Lozano, Brais Martinez, Georgios Tzimiropoulos, Michel Valstar

    Abstract: This paper introduces a novel real-time algorithm for facial landmark tracking. Compared to detection, tracking has both additional challenges and opportunities. Arguably the most important aspect in this domain is updating a tracker's models as tracking progresses, also known as incremental (face) tracking. While this should result in more accurate localisation, how to do this online and in real…

    Submitted 6 August, 2016; v1 submitted 3 August, 2016; originally announced August 2016.

    Comments: ECCV 2016 accepted paper, with supplementary material included as appendices. References to Equations fixed

  29. arXiv:1005.2715  [pdf, ps, other]

    cs.CV

    On the Subspace of Image Gradient Orientations

    Authors: Georgios Tzimiropoulos, Stefanos Zafeiriou

    Abstract: We introduce the notion of Principal Component Analysis (PCA) of image gradient orientations. As image data is typically noisy, but noise is substantially different from Gaussian, traditional PCA of pixel intensities very often fails to estimate reliably the low-dimensional subspace of a given data population. We show that replacing intensities with gradient orientations and the $\ell_2$ norm with…

    Submitted 15 May, 2010; originally announced May 2010.