Showing 1–43 of 43 results for author: Bulat, A

Searching in archive cs.
  1. arXiv:2408.10433

    cs.CV

    CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs

    Authors: Yassine Ouali, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

    Abstract: Despite recent successes, LVLMs or Large Vision Language Models are prone to hallucinating details like objects and their properties or relations, limiting their real-world deployment. To address this and improve their robustness, we present CLIP-DPO, a preference optimization method that leverages contrastively pre-trained Vision-Language (VL) embedding models, such as CLIP, for DPO-based optimiz…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Accepted at ECCV 2024

  2. arXiv:2405.10286

    cs.CV cs.AI

    FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models

    Authors: Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos

    Abstract: Despite noise and caption quality having been acknowledged as important factors impacting vision-language contrastive pre-training, in this paper, we show that the full potential of improving the training process by addressing such issues is yet to be realized. Specifically, we firstly study and analyze two issues affecting training: incorrect assignment of negative pairs, and low caption quality…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted at CVPR 2024

  3. arXiv:2401.17258

    cs.CV

    You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation

    Authors: Mehdi Noroozi, Isma Hadji, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

    Abstract: In this paper, we introduce YONOS-SR, a novel stable diffusion-based approach for image super-resolution that yields state-of-the-art results using only a single DDIM step. We propose a novel scale distillation approach to train our SR model. Instead of directly training our SR model on the scale factor of interest, we start by training a teacher model on a smaller magnification scale, thereby mak…

    Submitted 30 January, 2024; originally announced January 2024.

  4. arXiv:2307.15697

    cs.CV

    Aligned Unsupervised Pretraining of Object Detectors with Self-training

    Authors: Ioannis Maniadis Metaxas, Adrian Bulat, Ioannis Patras, Brais Martinez, Georgios Tzimiropoulos

    Abstract: The unsupervised pretraining of object detectors has recently become a key component of object detector training, as it leads to improved performance and faster convergence during the supervised fine-tuning stage. Existing unsupervised pretraining methods, however, typically rely on low-level information to define proposals that are used to train the detector. Furthermore, in the absence of class…

    Submitted 7 July, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

  5. arXiv:2304.01752

    cs.CV cs.CL cs.LG

    Black Box Few-Shot Adaptation for Vision-Language models

    Authors: Yassine Ouali, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

    Abstract: Vision-Language (V-L) models trained with contrastive learning to align the visual and language modalities have been shown to be strong few-shot learners. Soft prompt learning is the method of choice for few-shot downstream adaptation aiming to bridge the modality gap caused by the distribution shift induced by the new domain. While parameter-efficient, prompt learning still requires access to the…

    Submitted 17 August, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Published at ICCV 2023

  6. arXiv:2210.04845

    cs.CV cs.AI

    FS-DETR: Few-Shot DEtection TRansformer with prompting and without re-training

    Authors: Adrian Bulat, Ricardo Guerrero, Brais Martinez, Georgios Tzimiropoulos

    Abstract: This paper is on Few-Shot Object Detection (FSOD), where given a few templates (examples) depicting a novel class (not seen during training), the goal is to detect all of its occurrences within a set of images. From a practical perspective, an FSOD system must fulfil the following desiderata: (a) it must be used as is, without requiring any fine-tuning at test time, (b) it must be able to process…

    Submitted 20 August, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: Accepted at ICCV 2023

  7. arXiv:2210.02390

    cs.CV cs.AI cs.LG

    Bayesian Prompt Learning for Image-Language Model Generalization

    Authors: Mohammad Mahdi Derakhshani, Enrique Sanchez, Adrian Bulat, Victor Guilherme Turrisi da Costa, Cees G. M. Snoek, Georgios Tzimiropoulos, Brais Martinez

    Abstract: Foundational image-language models have generated considerable interest due to their efficient adaptation to downstream tasks by prompt learning. Prompt learning treats part of the language model input as trainable while freezing the rest, and optimizes an Empirical Risk Minimization objective. However, Empirical Risk Minimization is known to suffer from distributional shifts which hurt generaliza…

    Submitted 20 August, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: Accepted at ICCV 2023

  8. arXiv:2210.01115

    cs.CV cs.AI cs.LG

    LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: Soft prompt learning has recently emerged as one of the methods of choice for adapting V&L models to a downstream task using a few training examples. However, current methods significantly overfit the training data, suffering from large accuracy degradation when tested on unseen classes from the same domain. To this end, in this paper, we make the following 4 contributions: (1) To alleviate base c…

    Submitted 2 April, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: Accepted at CVPR 2023

  9. arXiv:2209.15000

    cs.CV cs.AI cs.LG

    REST: REtrieve & Self-Train for generative action recognition

    Authors: Adrian Bulat, Enrique Sanchez, Brais Martinez, Georgios Tzimiropoulos

    Abstract: This work is on training a generative action/video recognition model whose output is a free-form action-specific caption describing the video (rather than an action class label). A generative approach has practical advantages like producing more fine-grained and human-readable output, and being naturally open-world. To this end, we propose to adapt a pre-trained generative Vision & Language (V&L)…

    Submitted 29 September, 2022; originally announced September 2022.

  10. arXiv:2208.11108

    cs.CV cs.LG

    Efficient Attention-free Video Shift Transformers

    Authors: Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

    Abstract: This paper tackles the problem of efficient video recognition. In this area, video transformers have recently dominated the efficiency (top-1 accuracy vs FLOPs) spectrum. At the same time, there have been some attempts in the image domain which challenge the necessity of the self-attention operation within the transformer architecture, advocating the use of simpler approaches for token mixing. How…

    Submitted 23 August, 2022; originally announced August 2022.

  11. arXiv:2206.08339

    cs.CV cs.LG

    iBoot: Image-bootstrapped Self-Supervised Video Representation Learning

    Authors: Fatemeh Saleh, Fuwen Tan, Adrian Bulat, Georgios Tzimiropoulos, Brais Martinez

    Abstract: Learning visual representations through self-supervision is an extremely challenging task as the network needs to sieve relevant patterns from spurious distractors without the active guidance provided by supervision. This is achieved through heavy data augmentation, large-scale datasets and prohibitive amounts of compute. Video self-supervised learning (SSL) suffers from added challenges: video da…

    Submitted 16 June, 2022; originally announced June 2022.

  12. arXiv:2205.06701

    cs.CV

    Knowledge Distillation Meets Open-Set Semi-Supervised Learning

    Authors: Jing Yang, Xiatian Zhu, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

    Abstract: Existing knowledge distillation methods mostly focus on distillation of teacher's prediction and intermediate activation. However, the structured representation, which arguably is one of the most critical ingredients of deep models, is largely overlooked. In this work, we propose a novel method dedicated for distilling representational knowledge semantica…

    Submitted 15 July, 2024; v1 submitted 13 May, 2022; originally announced May 2022.

    Comments: Accepted by IJCV

  13. arXiv:2205.03436

    cs.CV

    EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers

    Authors: Junting Pan, Adrian Bulat, Fuwen Tan, Xiatian Zhu, Lukasz Dudziak, Hongsheng Li, Georgios Tzimiropoulos, Brais Martinez

    Abstract: Self-attention based models such as vision transformers (ViTs) have emerged as a very competitive architecture alternative to convolutional neural networks (CNNs) in computer vision. Despite increasingly stronger variants with ever-higher recognition accuracies, due to the quadratic complexity of self-attention, existing ViTs are typically demanding in computation and model size. Although several…

    Submitted 21 July, 2022; v1 submitted 6 May, 2022; originally announced May 2022.

    Comments: Accepted in ECCV 2022

  14. arXiv:2111.02360

    cs.CV

    Subpixel Heatmap Regression for Facial Landmark Localization

    Authors: Adrian Bulat, Enrique Sanchez, Georgios Tzimiropoulos

    Abstract: Deep Learning models based on heatmap regression have revolutionized the task of facial landmark localization with existing models working robustly under large poses, non-uniform illumination and shadows, occlusions and self-occlusions, low resolution and blur. However, despite their wide adoption, heatmap regression approaches suffer from discretization-induced errors related to both the heatmap…

    Submitted 3 November, 2021; originally announced November 2021.

    Comments: Accepted at BMVC 2021

  15. arXiv:2110.13859

    cs.LG cs.AI cs.CV

    Defensive Tensorization

    Authors: Adrian Bulat, Jean Kossaifi, Sourav Bhattacharya, Yannis Panagakis, Timothy Hospedales, Georgios Tzimiropoulos, Nicholas D Lane, Maja Pantic

    Abstract: We propose defensive tensorization, an adversarial defence technique that leverages a latent high-order factorization of the network. The layers of a network are first expressed as factorized tensor layers. Tensor dropout is then applied in the latent subspace, therefore resulting in dense reconstructed weights, without the sparsity or perturbations typically induced by the randomization. Our appro…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: To be presented at BMVC 2021

  16. arXiv:2110.02902

    cs.CV

    SAIC_Cambridge-HuPBA-FBK Submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021

    Authors: Swathikiran Sudhakaran, Adrian Bulat, Juan-Manuel Perez-Rua, Alex Falcon, Sergio Escalera, Oswald Lanz, Brais Martinez, Georgios Tzimiropoulos

    Abstract: This report presents the technical details of our submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021. To participate in the challenge we deployed spatio-temporal feature extraction and aggregation models we have developed recently: GSF and XViT. GSF is an efficient spatio-temporal feature extracting module that can be plugged into 2D CNNs for video action recognition. XViT is a…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: Ranked third in the EPIC-Kitchens-100 Action Recognition Challenge @ CVPR 2021

  17. arXiv:2106.05968

    cs.CV cs.AI cs.LG

    Space-time Mixing Attention for Video Transformer

    Authors: Adrian Bulat, Juan-Manuel Perez-Rua, Swathikiran Sudhakaran, Brais Martinez, Georgios Tzimiropoulos

    Abstract: This paper is on video recognition using Transformers. Very recent attempts in this area have demonstrated promising results in terms of recognition accuracy, yet they have been also shown to induce, in many cases, significant computational overheads due to the additional modelling of the temporal information. In this work, we propose a Video Transformer model the complexity of which scales linear…

    Submitted 11 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Updated results on SSv2

  18. arXiv:2103.17267

    cs.LG cs.AI cs.CV

    Bit-Mixer: Mixed-precision networks with runtime bit-width selection

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: Mixed-precision networks allow for a variable bit-width quantization for every layer in the network. A major limitation of existing work is that the bit-width for each layer must be predefined during training time. This allows little flexibility if the characteristics of the device on which the network is deployed change during runtime. In this work, we propose Bit-Mixer, the very first method to…

    Submitted 31 March, 2021; originally announced March 2021.

  19. arXiv:2103.16554

    cs.CV cs.LG

    Pre-training strategies and datasets for facial representation learning

    Authors: Adrian Bulat, Shiyang Cheng, Jing Yang, Andrew Garbett, Enrique Sanchez, Georgios Tzimiropoulos

    Abstract: What is the best way to learn a universal face representation? Recent work on Deep Learning in the area of face analysis has focused on supervised learning for specific tasks of interest (e.g. face recognition, facial landmark localization etc.) but has overlooked the overarching question of how to find a facial representation that can be readily adapted to several facial analysis tasks and datase…

    Submitted 20 July, 2022; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: Accepted at ECCV 2022

  20. arXiv:2102.04442

    cs.CV cs.LG

    Improving memory banks for unsupervised learning with large mini-batch, consistency and hard negative mining

    Authors: Adrian Bulat, Enrique Sánchez-Lozano, Georgios Tzimiropoulos

    Abstract: An important component of unsupervised learning by instance-based discrimination is a memory bank for storing a feature representation for each training sample in the dataset. In this paper, we introduce 3 improvements to the vanilla memory bank-based formulation which bring massive accuracy gains: (a) Large mini-batch: we pull multiple augmentations for each sample within the same batch and show…

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: Accepted at ICASSP 2021

  21. arXiv:2011.01864

    cs.CV

    Semi-supervised Facial Action Unit Intensity Estimation with Contrastive Learning

    Authors: Enrique Sanchez, Adrian Bulat, Anestis Zaganidis, Georgios Tzimiropoulos

    Abstract: This paper tackles the challenging problem of estimating the intensity of Facial Action Units with few labeled images. Contrary to previous works, our method does not require manually selecting key frames, and produces state-of-the-art results with as little as 2% of annotated frames, which are randomly chosen. To this end, we propose a semi-supervised learning approach where a spatio-…

    Submitted 4 November, 2020; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: ACCV 2020

  22. arXiv:2010.03558

    cs.CV cs.AI cs.LG

    High-Capacity Expert Binary Networks

    Authors: Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

    Abstract: Network binarization is a promising hardware-aware direction for creating efficient deep models. Despite its memory and computational advantages, reducing the accuracy gap between binary models and their real-valued counterparts remains an unsolved challenging research problem. To this end, we make the following 3 contributions: (a) To increase model capacity, we propose Expert Binary Convolution,…

    Submitted 30 March, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: Accepted at ICLR 2021

  23. arXiv:2004.06657

    cs.CV

    A Transfer Learning approach to Heatmap Regression for Action Unit intensity estimation

    Authors: Ioanna Ntinou, Enrique Sanchez, Adrian Bulat, Michel Valstar, Georgios Tzimiropoulos

    Abstract: Action Units (AUs) are geometrically-based atomic facial muscle movements known to produce appearance changes at specific facial locations. Motivated by this observation we propose a novel AU modelling problem that consists of jointly estimating their localisation and intensity. To this end, we propose a simple yet efficient approach based on Heatmap Regression that merges both problems into a sin…

    Submitted 14 April, 2020; originally announced April 2020.

    Comments: Submitted for review to IEEE Trans. on Affective Computing

  24. arXiv:2003.11535

    cs.CV

    Training Binary Neural Networks with Real-to-Binary Convolutions

    Authors: Brais Martinez, Jing Yang, Adrian Bulat, Georgios Tzimiropoulos

    Abstract: This paper shows how to train binary networks to within a few percentage points (~3-5%) of the full precision counterpart. We first show how to build a strong baseline, which already achieves state-of-the-art accuracy, by combining recently proposed advances and carefully adjusting the optimization procedure. Secondly, we show that by attempting to minimize the discrepancy between the output…

    Submitted 25 March, 2020; originally announced March 2020.

    Comments: ICLR 2020

  25. arXiv:2003.04289

    cs.CV cs.LG

    Knowledge distillation via adaptive instance normalization

    Authors: Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

    Abstract: This paper addresses the problem of model compression via knowledge distillation. To this end, we propose a new knowledge distillation method based on transferring feature statistics, specifically the channel-wise mean and variance, from the teacher to the student. Our method goes beyond the standard way of enforcing the mean and variance of the student to be similar to those of the teacher throug…

    Submitted 9 March, 2020; originally announced March 2020.

  26. arXiv:2003.01711

    cs.CV cs.LG

    BATS: Binary ArchitecTure Search

    Authors: Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

    Abstract: This paper proposes Binary ArchitecTure Search (BATS), a framework that drastically reduces the accuracy gap between binary neural networks and their real-valued counterparts by means of Neural Architecture Search (NAS). We show that directly applying NAS to the binary domain provides very poor results. To alleviate this, we describe, to our knowledge, for the first time, the 3 key ingredients for…

    Submitted 23 July, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

    Comments: accepted to ECCV 2020

  27. arXiv:2002.11098

    cs.CV

    Toward fast and accurate human pose estimation via soft-gated skip connections

    Authors: Adrian Bulat, Jean Kossaifi, Georgios Tzimiropoulos, Maja Pantic

    Abstract: This paper is on highly accurate and highly efficient human pose estimation. Recent works based on Fully Convolutional Networks (FCNs) have demonstrated excellent results for this difficult problem. While residual connections within FCNs have proved to be quintessential for achieving high accuracy, we re-analyze this design choice in the context of improving both the accuracy and the efficiency ov…

    Submitted 25 February, 2020; originally announced February 2020.

    Comments: Accepted to FG 2020 (oral)

  28. arXiv:1911.06095

    cs.CV

    Towards Pose-invariant Lip-Reading

    Authors: Shiyang Cheng, Pingchuan Ma, Georgios Tzimiropoulos, Stavros Petridis, Adrian Bulat, Jie Shen, Maja Pantic

    Abstract: Lip-reading models have been significantly improved recently thanks to powerful deep learning architectures. However, most works focused on frontal or near frontal views of the mouth. As a consequence, lip-reading performance seriously deteriorates in non-frontal mouth views. In this work, we present a framework for training pose-invariant lip-reading models on synthetic data instead of collecting…

    Submitted 14 November, 2019; originally announced November 2019.

    Comments: 6 pages, 2 figures

  29. arXiv:1909.13863

    cs.CV cs.LG eess.IV

    XNOR-Net++: Improved Binary Neural Networks

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: This paper proposes an improved training algorithm for binary neural networks in which both weights and activations are binary numbers. A key but fairly overlooked feature of the current state-of-the-art method of XNOR-Net is the use of analytically calculated real-valued scaling factors for re-weighting the output of binary convolutions. We argue that analytic calculation of these factors is sub-…

    Submitted 30 September, 2019; originally announced September 2019.

    Comments: Accepted to BMVC 2019

  30. arXiv:1906.06196

    cs.LG cs.CV eess.IV stat.ML

    Factorized Higher-Order CNNs with an Application to Spatio-Temporal Emotion Estimation

    Authors: Jean Kossaifi, Antoine Toisoul, Adrian Bulat, Yannis Panagakis, Timothy Hospedales, Maja Pantic

    Abstract: Training deep neural networks with spatio-temporal (i.e., 3D) or multidimensional convolutions of higher-order is computationally challenging due to millions of unknown parameters across dozens of layers. To alleviate this, one approach is to apply low-rank tensor decompositions to convolution kernels in order to compress the network and reduce its number of parameters. Alternatively, new convolut…

    Submitted 31 March, 2020; v1 submitted 14 June, 2019; originally announced June 2019.

    Comments: IEEE CVPR 2020

  31. arXiv:1904.07852

    cs.CV cs.AI cs.LG

    Matrix and tensor decompositions for training binary neural networks

    Authors: Adrian Bulat, Jean Kossaifi, Georgios Tzimiropoulos, Maja Pantic

    Abstract: This paper is on improving the training of binary neural networks in which both activations and weights are binary. While prior methods for neural network binarization binarize each filter independently, we propose to instead parametrize the weight tensor of each layer using matrix or tensor decomposition. The binarization process is then performed using this latent parametrization, via a quantiza…

    Submitted 16 April, 2019; originally announced April 2019.

  32. arXiv:1904.06345

    cs.CV cs.AI cs.LG

    Incremental multi-domain learning with network latent tensor factorization

    Authors: Adrian Bulat, Jean Kossaifi, Georgios Tzimiropoulos, Maja Pantic

    Abstract: The prominence of deep learning, large amounts of annotated data and increasingly powerful hardware made it possible to reach remarkable performance for supervised classification tasks, in many cases saturating the training sets. However, the resulting models are specialized to a single very specific task and domain. Adapting the learned classification to new domains is a hard problem due to at leas…

    Submitted 22 November, 2019; v1 submitted 12 April, 2019; originally announced April 2019.

    Comments: AAAI20

  33. arXiv:1904.05868

    cs.CV

    Improved training of binary networks for human pose estimation and image recognition

    Authors: Adrian Bulat, Georgios Tzimiropoulos, Jean Kossaifi, Maja Pantic

    Abstract: Big neural networks trained on large datasets have advanced the state-of-the-art for a large variety of challenging problems, improving performance by a large margin. However, under low memory and limited computational power constraints, the accuracy on the same problems drops considerably. In this paper, we propose a series of techniques that significantly improve the accuracy of binarized neural…

    Submitted 11 April, 2019; originally announced April 2019.

  34. arXiv:1904.02698

    cs.CV cs.AI cs.LG

    T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor

    Authors: Jean Kossaifi, Adrian Bulat, Georgios Tzimiropoulos, Maja Pantic

    Abstract: Recent findings indicate that over-parametrization, while crucial for successfully training deep neural networks, also introduces large amounts of redundancy. Tensor methods have the potential to efficiently parametrize over-complete representations by leveraging this redundancy. In this paper, we propose to fully parametrize Convolutional Neural Networks (CNNs) with a single high-order, low-rank…

    Submitted 4 April, 2019; originally announced April 2019.

    Comments: CVPR 2019

  35. arXiv:1902.10758

    cs.LG stat.ML

    Tensor Dropout for Robust Learning

    Authors: Arinbjörn Kolbeinsson, Jean Kossaifi, Yannis Panagakis, Adrian Bulat, Anima Anandkumar, Ioanna Tzoulaki, Paul Matthews

    Abstract: CNNs achieve remarkable performance by leveraging deep, over-parametrized architectures, trained on large datasets. However, they have limited generalization ability to data outside the training domain, and a lack of robustness to noise and adversarial attacks. By building better inductive biases, we can improve robustness and also obtain smaller networks that are more memory and computationally e…

    Submitted 11 December, 2020; v1 submitted 27 February, 2019; originally announced February 2019.

  36. Hierarchical binary CNNs for landmark localization with limited resources

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: Our goal is to design architectures that retain the groundbreaking performance of Convolutional Neural Networks (CNNs) for landmark localization and at the same time are lightweight, compact and suitable for applications with limited computational resources. To this end, we make the following contributions: (a) we are the first to study the effect of neural network binarization on localization tas…

    Submitted 14 August, 2018; originally announced August 2018.

    Comments: Accepted to IEEE TPAMI18: Best of ICCV 2017 SI. Previously portions of this work appeared as arXiv:1703.00862, which was the conference version

  37. arXiv:1807.11458

    cs.CV

    To learn image super-resolution, use a GAN to learn how to do image degradation first

    Authors: Adrian Bulat, Jing Yang, Georgios Tzimiropoulos

    Abstract: This paper is on image and face super-resolution. The vast majority of prior work for this problem focuses on how to increase the resolution of low-resolution images which are artificially generated by simple bilinear down-sampling (or in a few cases by blurring followed by down-sampling). We show that such methods fail to produce good results when applied to real-world low-resolution, low quality im…

    Submitted 30 July, 2018; originally announced July 2018.

    Comments: Accepted to ECCV18

  38. arXiv:1712.02765

    cs.CV

    Super-FAN: Integrated facial landmark localization and super-resolution of real-world low resolution faces in arbitrary poses with GANs

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: This paper addresses 2 challenging tasks: improving the quality of low resolution facial images and accurately locating the facial landmarks on such poor resolution images. To this end, we make the following 5 contributions: (a) we propose Super-FAN: the very first end-to-end system that addresses both tasks simultaneously, i.e. both improves face resolution and detects the facial landmarks. The n…

    Submitted 27 March, 2018; v1 submitted 7 December, 2017; originally announced December 2017.

    Comments: CVPR 2018 SPOTLIGHT

  39. arXiv:1703.07834

    cs.CV

    Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression

    Authors: Aaron S. Jackson, Adrian Bulat, Vasileios Argyriou, Georgios Tzimiropoulos

    Abstract: 3D face reconstruction is a fundamental Computer Vision problem of extraordinary difficulty. Current systems often assume the availability of multiple facial images (sometimes from the same subject) as input, and must address a number of methodological challenges such as establishing dense correspondences across large facial poses, expressions, and non-uniform illumination. In general these method…

    Submitted 8 September, 2017; v1 submitted 22 March, 2017; originally announced March 2017.

    Comments: 10 pages, ICCV 2017

  40. How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: This paper investigates how far a very deep neural network is from attaining close to saturating performance on existing 2D and 3D face alignment datasets. To this end, we make the following 5 contributions: (a) we construct, for the first time, a very strong baseline by combining a state-of-the-art architecture for landmark localization with a state-of-the-art residual block, train it on a very l…

    Submitted 7 September, 2017; v1 submitted 21 March, 2017; originally announced March 2017.

    Comments: accepted to ICCV 2017

  41. arXiv:1703.00862

    cs.CV cs.LG stat.ML

    Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: Our goal is to design architectures that retain the groundbreaking performance of CNNs for landmark localization and at the same time are lightweight, compact and suitable for applications with limited computational resources. To this end, we make the following contributions: (a) we are the first to study the effect of neural network binarization on localization tasks, namely human pose estimation…

    Submitted 7 August, 2017; v1 submitted 2 March, 2017; originally announced March 2017.

    Comments: ICCV 2017 Oral

  42. Two-stage Convolutional Part Heatmap Regression for the 1st 3D Face Alignment in the Wild (3DFAW) Challenge

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: This paper describes our submission to the 1st 3D Face Alignment in the Wild (3DFAW) Challenge. Our method builds upon the idea of convolutional part heatmap regression [1], extending it for 3D face alignment. Our method decomposes the problem into two parts: (a) X,Y (2D) estimation and (b) Z (depth) estimation. At the first stage, our method estimates the X,Y coordinates of the facial landmarks b…

    Submitted 29 September, 2016; originally announced September 2016.

    Comments: Winner of 3D Face Alignment in the Wild (3DFAW) Challenge, ECCV 2016

  43. Human pose estimation via Convolutional Part Heatmap Regression

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: This paper is on human pose estimation using Convolutional Neural Networks. Our main contribution is a CNN cascaded architecture specifically designed for learning part relationships and spatial context, and robustly inferring pose even for the case of severe part occlusions. To this end, we propose a detection-followed-by-regression CNN cascade. The first part of our cascade outputs part detectio…

    Submitted 6 September, 2016; originally announced September 2016.

    Comments: accepted to ECCV 2016