Showing 251–300 of 331 results for author: Torr, P

  1. arXiv:1901.08150 [pdf, other]

    cs.LG cs.CV stat.ML

    Hypergraph Convolution and Hypergraph Attention

    Authors: Song Bai, Feihu Zhang, Philip H. S. Torr

    Abstract: Recently, graph neural networks have attracted great attention and achieved prominent performance in various research fields. Most of those algorithms have assumed pairwise relationships of objects of interest. However, in many real applications, the relationships between objects are higher-order, beyond a pairwise formulation. To efficiently learn deep embeddings on the high-order graph-struct…

    Submitted 10 October, 2020; v1 submitted 23 January, 2019; originally announced January 2019.

    Comments: Accepted by Pattern Recognition

  2. arXiv:1812.11276 [pdf, other]

    cs.LG stat.ML

    Learn to Interpret Atari Agents

    Authors: Zhao Yang, Song Bai, Li Zhang, Philip H. S. Torr

    Abstract: Deep reinforcement learning (DeepRL) agents surpass human-level performance in many tasks. However, the direct mapping from states to actions makes it hard to interpret the rationale behind the decision-making of the agents. In contrast to previous a-posteriori methods for visualizing DeepRL policies, in this work, we propose to equip the DeepRL model with an innate visualization ability. Our prop…

    Submitted 5 April, 2023; v1 submitted 28 December, 2018; originally announced December 2018.

    Comments: An old report. Uploaded for archival purposes only

  3. arXiv:1812.06417 [pdf, other]

    cs.CV cs.CL cs.LG

    Visual Dialogue without Vision or Dialogue

    Authors: Daniela Massiceti, Puneet K. Dokania, N. Siddharth, Philip H. S. Torr

    Abstract: We characterise some of the quirks and shortcomings in the exploration of Visual Dialogue - a sequential question-answering task where the questions and corresponding answers are related through given visual stimuli. To do so, we develop an embarrassingly simple method based on Canonical Correlation Analysis (CCA) that, on the standard dataset, achieves near state-of-the-art performance on mean ra…

    Submitted 22 October, 2019; v1 submitted 16 December, 2018; originally announced December 2018.

    Comments: 2018 NeurIPS Workshop on Critiquing and Correcting Trends in Machine Learning

  4. arXiv:1812.05050 [pdf, other]

    cs.CV

    Fast Online Object Tracking and Segmentation: A Unifying Approach

    Authors: Qiang Wang, Li Zhang, Luca Bertinetto, Weiming Hu, Philip H. S. Torr

    Abstract: In this paper we illustrate how to perform both visual object tracking and semi-supervised video object segmentation, in real-time, with a single simple approach. Our method, dubbed SiamMask, improves the offline training procedure of popular fully-convolutional Siamese approaches for object tracking by augmenting their loss with a binary segmentation task. Once trained, SiamMask solely relies on…

    Submitted 4 May, 2019; v1 submitted 12 December, 2018; originally announced December 2018.

    Comments: CVPR 2019 camera ready. Code available at https://github.com/foolwood/SiamMask

  5. arXiv:1812.04353 [pdf, other]

    cs.CV cs.LG

    Proximal Mean-field for Neural Network Quantization

    Authors: Thalaiyasingam Ajanthan, Puneet K. Dokania, Richard Hartley, Philip H. S. Torr

    Abstract: Compressing large Neural Networks (NN) by quantizing the parameters, while maintaining the performance is highly desirable due to reduced memory and time complexity. In this work, we cast NN quantization as a discrete labelling problem, and by examining relaxations, we design an efficient iterative optimization procedure that involves stochastic gradient descent followed by a projection. We prove…

    Submitted 19 August, 2019; v1 submitted 11 December, 2018; originally announced December 2018.

    Journal ref: ICCV, 2019

  6. arXiv:1812.01397 [pdf, other]

    cs.CV

    Meta Learning Deep Visual Words for Fast Video Object Segmentation

    Authors: Harkirat Singh Behl, Mohammad Najafi, Anurag Arnab, Philip H. S. Torr

    Abstract: Personal robots and driverless cars need to be able to operate in novel environments and thus quickly and efficiently learn to recognise new object classes. We address this problem by considering the task of video object segmentation. Previous accurate methods for this task finetune a model using the first annotated frame, and/or use additional inputs such as optical flow and complex post-processi…

    Submitted 16 August, 2020; v1 submitted 4 December, 2018; originally announced December 2018.

    Journal ref: In Proceedings of International Conference on Intelligent Robots and Systems (IROS) 2020

  7. arXiv:1811.07807 [pdf, other]

    cs.CV

    Deeper Interpretability of Deep Networks

    Authors: Tian Xu, Jiayu Zhan, Oliver G. B. Garrod, Philip H. S. Torr, Song-Chun Zhu, Robin A. A. Ince, Philippe G. Schyns

    Abstract: Deep Convolutional Neural Networks (CNNs) have been one of the most influential recent developments in computer vision, particularly for categorization. There is an increasing demand for explainable AI as these systems are deployed in the real world. However, understanding the information represented and processed in CNNs remains in most cases challenging. Within this paper, we explore the use of…

    Submitted 20 November, 2018; v1 submitted 19 November, 2018; originally announced November 2018.

  8. R$^3$SGM: Real-time Raster-Respecting Semi-Global Matching for Power-Constrained Systems

    Authors: Oscar Rahnama, Tommaso Cavallari, Stuart Golodetz, Simon Walker, Philip H. S. Torr

    Abstract: Stereo depth estimation is used for many computer vision applications. Though many popular methods strive solely for depth quality, for real-time mobile applications (e.g. prosthetic glasses or micro-UAVs), speed and power efficiency are equally, if not more, important. Many real-world systems rely on Semi-Global Matching (SGM) to achieve a good accuracy vs. speed balance, but power efficiency is…

    Submitted 30 October, 2018; originally announced October 2018.

    Comments: Accepted in FPT 2018 as Oral presentation, 8 pages, 6 figures, 4 tables

    Journal ref: 2018 International Conference on Field-Programmable Technology (FPT)

  9. Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade

    Authors: Tommaso Cavallari, Stuart Golodetz, Nicholas A. Lord, Julien Valentin, Victor A. Prisacariu, Luigi Di Stefano, Philip H. S. Torr

    Abstract: Camera pose estimation is an important problem in computer vision. Common techniques either match the current image against keyframes with known poses, directly regress the pose, or establish correspondences between keypoints in the image and points in the scene to estimate the pose. In recent years, regression forests have become a popular alternative to establish such correspondences. They achie…

    Submitted 2 July, 2019; v1 submitted 29 October, 2018; originally announced October 2018.

    Comments: Tommaso Cavallari, Stuart Golodetz, Nicholas Lord and Julien Valentin assert joint first authorship

    MSC Class: 68T45

  10. arXiv:1810.11702 [pdf, other]

    cs.MA cs.AI cs.GT cs.LG

    Multi-Agent Common Knowledge Reinforcement Learning

    Authors: Christian A. Schroeder de Witt, Jakob N. Foerster, Gregory Farquhar, Philip H. S. Torr, Wendelin Boehmer, Shimon Whiteson

    Abstract: Cooperative multi-agent reinforcement learning often requires decentralised policies, which severely limit the agents' ability to coordinate their behaviour. In this paper, we show that common knowledge between agents allows for complex decentralised coordination. Common knowledge arises naturally in a large number of decentralised cooperative multi-agent tasks, for example, when agents can recons…

    Submitted 11 January, 2020; v1 submitted 27 October, 2018; originally announced October 2018.

    Comments: Advances in Neural Information Processing Systems, 9924-9935

  11. arXiv:1810.02340 [pdf, ps, other]

    cs.CV cs.LG

    SNIP: Single-shot Network Pruning based on Connection Sensitivity

    Authors: Namhoon Lee, Thalaiyasingam Ajanthan, Philip H. S. Torr

    Abstract: Pruning large neural networks while maintaining their performance is often desirable due to the reduced space and time complexity. In existing methods, pruning is done within an iterative optimization procedure with either heuristically designed pruning schedules or additional hyperparameters, undermining their utility. In this work, we present a new approach that prunes a given network once at in…

    Submitted 23 February, 2019; v1 submitted 4 October, 2018; originally announced October 2018.

    Comments: ICLR 2019

  12. arXiv:1808.03575 [pdf, other]

    cs.CV

    Weakly- and Semi-Supervised Panoptic Segmentation

    Authors: Qizhu Li, Anurag Arnab, Philip H. S. Torr

    Abstract: We present a weakly supervised model that jointly performs both semantic- and instance-segmentation -- a particularly relevant problem given the substantial cost of obtaining pixel-perfect annotation for these tasks. In contrast to many popular instance segmentation approaches based on object detectors, our method does not predict any overlapping instances. Moreover, we are able to segment both "t…

    Submitted 12 January, 2019; v1 submitted 10 August, 2018; originally announced August 2018.

    Comments: ECCV 2018. The first two authors contributed equally

  13. arXiv:1807.07706 [pdf, other]

    cs.LG hep-ph physics.data-an stat.ML

    Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model

    Authors: Atılım Güneş Baydin, Lukas Heinrich, Wahid Bhimji, Lei Shao, Saeid Naderiparizi, Andreas Munk, Jialin Liu, Bradley Gram-Hansen, Gilles Louppe, Lawrence Meadows, Philip Torr, Victor Lee, Prabhat, Kyle Cranmer, Frank Wood

    Abstract: We present a novel probabilistic programming framework that couples directly to existing large-scale simulators through a cross-platform probabilistic execution protocol, which allows general-purpose inference engines to record and control random number draws within simulators in a language-agnostic way. The execution of existing simulators as probabilistic programs enables highly interpretable po…

    Submitted 17 February, 2020; v1 submitted 20 July, 2018; originally announced July 2018.

    Comments: 20 pages, 9 figures

    MSC Class: 68T37; 68T05; 62P35

    ACM Class: G.3; I.2.6; J.2

    Journal ref: In Advances in Neural Information Processing Systems 33 (NeurIPS), Vancouver, Canada, 2019

  14. arXiv:1807.04200 [pdf, other]

    cs.CV

    With Friends Like These, Who Needs Adversaries?

    Authors: Saumya Jetley, Nicholas A. Lord, Philip H. S. Torr

    Abstract: The vulnerability of deep image classification networks to adversarial attack is now well known, but less well understood. Via a novel experimental analysis, we illustrate some facts about deep convolutional networks for image classification that shed new light on their behaviour and how it connects to the problem of adversaries. In short, the celebrated performance of these networks and their vul…

    Submitted 8 January, 2019; v1 submitted 11 July, 2018; originally announced July 2018.

    Comments: Published in this form at NeurIPS 2018

  15. arXiv:1805.11199 [pdf, other]

    cs.AI cs.LG

    Value Propagation Networks

    Authors: Nantas Nardelli, Gabriel Synnaeve, Zeming Lin, Pushmeet Kohli, Philip H. S. Torr, Nicolas Usunier

    Abstract: We present Value Propagation (VProp), a set of parameter-efficient differentiable planning modules built on Value Iteration which can successfully be trained using reinforcement learning to solve unseen tasks, has the capability to generalize to larger map sizes, and can learn to navigate in dynamic environments. We show that the modules enable learning to plan when the environment also includes s…

    Submitted 25 March, 2019; v1 submitted 28 May, 2018; originally announced May 2018.

    Comments: Updated to match ICLR 2019 OpenReview's version

  16. arXiv:1805.09028 [pdf, other]

    cs.CV

    Efficient Relaxations for Dense CRFs with Sparse Higher Order Potentials

    Authors: Thomas Joy, Alban Desmaison, Thalaiyasingam Ajanthan, Rudy Bunel, Mathieu Salzmann, Pushmeet Kohli, Philip H. S. Torr, M. Pawan Kumar

    Abstract: Dense conditional random fields (CRFs) have become a popular framework for modelling several problems in computer vision such as stereo correspondence and multi-class semantic segmentation. By modelling long-range interactions, dense CRFs provide a labelling that captures finer detail than their sparse counterparts. Currently, the state-of-the-art algorithm performs mean-field inference using a fi…

    Submitted 26 October, 2018; v1 submitted 23 May, 2018; originally announced May 2018.

  17. arXiv:1805.08136 [pdf, other]

    cs.CV cs.LG stat.ML

    Meta-learning with differentiable closed-form solvers

    Authors: Luca Bertinetto, João F. Henriques, Philip H. S. Torr, Andrea Vedaldi

    Abstract: Adapting deep networks to new concepts from a few examples is challenging, due to the high computational requirements of standard fine-tuning procedures. Most work on few-shot learning has thus focused on simple learning techniques for adaptation, such as nearest neighbours or gradient descent. Nonetheless, the machine learning literature contains a wealth of methods that learn non-deep models ver…

    Submitted 24 July, 2019; v1 submitted 21 May, 2018; originally announced May 2018.

    Comments: Published at ICLR'19. Code and data available at http://www.robots.ox.ac.uk/~luca/r2d2.html

  18. arXiv:1804.07090 [pdf, other]

    cs.LG cs.AI stat.ML

    Robustness via Deep Low-Rank Representations

    Authors: Amartya Sanyal, Varun Kanade, Philip H. S. Torr, Puneet K. Dokania

    Abstract: We investigate the effect of the dimensionality of the representations learned in Deep Neural Networks (DNNs) on their robustness to input perturbations, both adversarial and random. To achieve low dimensionality of learned representations, we propose an easy-to-use, end-to-end trainable, low-rank regularizer (LR) that can be applied to any intermediate layer representation of a DNN. This regulari…

    Submitted 19 February, 2020; v1 submitted 19 April, 2018; originally announced April 2018.

  19. arXiv:1804.06364 [pdf, other]

    cs.CV stat.ML

    DGPose: Deep Generative Models for Human Body Analysis

    Authors: Rodrigo de Bem, Arnab Ghosh, Thalaiyasingam Ajanthan, Ondrej Miksik, Adnane Boukhayma, N. Siddharth, Philip Torr

    Abstract: Deep generative modelling for human body analysis is an emerging problem with many interesting applications. However, the latent space learned by such approaches is typically not interpretable, resulting in less flexibility. In this work, we present deep generative models for human body analysis in which the body pose and the visual appearance are disentangled. Such a disentanglement allows indepe…

    Submitted 14 February, 2020; v1 submitted 17 April, 2018; originally announced April 2018.

    Comments: IJCV 2020 special issue on 'Generating Realistic Visual Data of Human Behavior' preprint. Keywords: deep generative models, semi-supervised learning, human pose estimation, variational autoencoders, generative adversarial networks

  20. arXiv:1804.02391 [pdf, other]

    cs.CV cs.AI

    Learn To Pay Attention

    Authors: Saumya Jetley, Nicholas A. Lord, Namhoon Lee, Philip H. S. Torr

    Abstract: We propose an end-to-end-trainable attention module for convolutional neural network (CNN) architectures built for image classification. The module takes as input the 2D feature vector maps which form the intermediate representations of the input image at different stages in the CNN pipeline, and outputs a 2D matrix of scores for each map. Standard CNN architectures are modified through the incorp…

    Submitted 26 April, 2018; v1 submitted 6 April, 2018; originally announced April 2018.

    Comments: International Conference on Learning Representations 2018

  21. arXiv:1803.09860 [pdf, other]

    cs.CV

    Three Birds One Stone: A General Architecture for Salient Object Segmentation, Edge Detection and Skeleton Extraction

    Authors: Qibin Hou, Jiang-Jiang Liu, Ming-Ming Cheng, Ali Borji, Philip H. S. Torr

    Abstract: In this paper, we aim at solving pixel-wise binary problems, including salient object segmentation, skeleton extraction, and edge detection, by introducing a unified architecture. Previous works have proposed tailored methods for solving each of the three tasks independently. Here, we show that these tasks share some similarities that can be exploited for developing a unified framework. In particu…

    Submitted 5 April, 2019; v1 submitted 26 March, 2018; originally announced March 2018.

  22. arXiv:1803.09859 [pdf, other]

    cs.CV

    WebSeg: Learning Semantic Segmentation from Web Searches

    Authors: Qibin Hou, Ming-Ming Cheng, Jiangjiang Liu, Philip H. S. Torr

    Abstract: In this paper, we improve semantic segmentation by automatically learning from Flickr images associated with a particular keyword, without relying on any explicit user annotations, thus substantially alleviating the dependence on accurate annotations when compared to previous weakly supervised methods. To solve such a challenging problem, we leverage several low-level cues (such as saliency, edg…

    Submitted 26 March, 2018; originally announced March 2018.

    Comments: Submitted to ECCV2018

  23. arXiv:1803.09502 [pdf, other]

    cs.CV

    Long-term Tracking in the Wild: A Benchmark

    Authors: Jack Valmadre, Luca Bertinetto, João F. Henriques, Ran Tao, Andrea Vedaldi, Arnold Smeulders, Philip Torr, Efstratios Gavves

    Abstract: We introduce the OxUvA dataset and benchmark for evaluating single-object tracking algorithms. Benchmarks have enabled great strides in the field of object tracking by defining standardized evaluations on large sets of diverse videos. However, these works have focused exclusively on sequences that are just tens of seconds in length and in which the target is always visible. Consequently, most rese…

    Submitted 10 August, 2018; v1 submitted 26 March, 2018; originally announced March 2018.

    Comments: To appear at ECCV 2018

  24. arXiv:1802.07351 [pdf, other]

    cs.CV

    Devon: Deformable Volume Network for Learning Optical Flow

    Authors: Yao Lu, Jack Valmadre, Heng Wang, Juho Kannala, Mehrtash Harandi, Philip H. S. Torr

    Abstract: State-of-the-art neural network models estimate large displacement optical flow in multi-resolution and use warping to propagate the estimation between two resolutions. Despite their impressive results, it is known that there are two problems with the approach. First, the multi-resolution estimation of optical flow fails in situations where small objects move fast. Second, warping creates artifact…

    Submitted 4 March, 2019; v1 submitted 20 February, 2018; originally announced February 2018.

  25. Real-Time Dense Stereo Matching With ELAS on FPGA Accelerated Embedded Devices

    Authors: Oscar Rahnama, Duncan Frost, Ondrej Miksik, Philip H. S. Torr

    Abstract: For many applications in low-power real-time robotics, stereo cameras are the sensors of choice for depth perception as they are typically cheaper and more versatile than their active counterparts. Their biggest drawback, however, is that they do not directly sense depth maps; instead, these must be estimated through data-intensive processes. Therefore, appropriate algorithm selection plays an imp…

    Submitted 20 February, 2018; originally announced February 2018.

    Comments: 8 pages, 7 figures, 2 tables

    Journal ref: IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 2008-2015, July 2018

  26. arXiv:1802.03803 [pdf, other]

    cs.CV

    FlipDial: A Generative Model for Two-Way Visual Dialogue

    Authors: Daniela Massiceti, N. Siddharth, Puneet K. Dokania, Philip H. S. Torr

    Abstract: We present FlipDial, a generative model for visual dialogue that simultaneously plays the role of both participants in a visually-grounded dialogue. Given context in the form of an image and an associated caption summarising the contents of the image, FlipDial learns both to answer questions and put forward questions, capable of generating entire sequences of dialogue (question-answer pairs) which…

    Submitted 3 April, 2018; v1 submitted 11 February, 2018; originally announced February 2018.

  27. Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence

    Authors: Arslan Chaudhry, Puneet K. Dokania, Thalaiyasingam Ajanthan, Philip H. S. Torr

    Abstract: Incremental learning (IL) has received a lot of attention recently; however, the literature lacks a precise problem definition, proper evaluation settings, and metrics tailored specifically for the IL problem. One of the main objectives of this work is to fill these gaps so as to provide a common ground for better understanding of IL. The main challenge for an IL algorithm is to update the classif…

    Submitted 14 August, 2018; v1 submitted 30 January, 2018; originally announced January 2018.

  28. Collaborative Large-Scale Dense 3D Reconstruction with Online Inter-Agent Pose Optimisation

    Authors: Stuart Golodetz, Tommaso Cavallari, Nicholas A Lord, Victor A Prisacariu, David W Murray, Philip H S Torr

    Abstract: Reconstructing dense, volumetric models of real-world 3D scenes is important for many tasks, but capturing large scenes can take significant time, and the risk of transient changes to the scene goes up as the capture time increases. These are good reasons to want instead to capture several smaller sub-scenes that can be joined to make the whole scene. Achieving this has traditionally been difficul…

    Submitted 2 July, 2019; v1 submitted 25 January, 2018; originally announced January 2018.

    Comments: Stuart Golodetz, Tommaso Cavallari and Nicholas Lord assert joint first authorship

    MSC Class: 68T45

    Journal ref: IEEE Transactions on Visualization and Computer Graphics 24(11):2895-2905, 2018

  29. arXiv:1711.09856 [pdf, other]

    cs.CV

    On the Robustness of Semantic Segmentation Models to Adversarial Attacks

    Authors: Anurag Arnab, Ondrej Miksik, Philip H. S. Torr

    Abstract: Deep Neural Networks (DNNs) have demonstrated exceptional performance on most recognition tasks such as image classification and segmentation. However, they have also been shown to be vulnerable to adversarial examples. This phenomenon has recently attracted a lot of attention but it has not been extensively studied on multiple, large-scale datasets and structured prediction tasks such as semantic…

    Submitted 8 July, 2018; v1 submitted 27 November, 2017; originally announced November 2017.

    Comments: CVPR 2018 extended version

  30. arXiv:1711.06025 [pdf, other]

    cs.CV

    Learning to Compare: Relation Network for Few-Shot Learning

    Authors: Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip H. S. Torr, Timothy M. Hospedales

    Abstract: We present a conceptually simple, flexible, and general framework for few-shot learning, where a classifier must learn to recognise new classes given only few examples from each. Our method, called the Relation Network (RN), is trained end-to-end from scratch. During meta-learning, it learns to learn a deep distance metric to compare a small number of images within episodes, each of which is desig…

    Submitted 27 March, 2018; v1 submitted 16 November, 2017; originally announced November 2017.

    Comments: To appear in CVPR2018

  31. arXiv:1711.00455 [pdf, ps, other]

    cs.AI cs.LG

    A Unified View of Piecewise Linear Neural Network Verification

    Authors: Rudy Bunel, Ilker Turkaslan, Philip H. S. Torr, Pushmeet Kohli, M. Pawan Kumar

    Abstract: The success of Deep Learning and its potential use in many safety-critical applications has motivated research on formal verification of Neural Network (NN) models. Despite the reputation of learned NN models to behave as black boxes and the theoretical hardness of proving their properties, researchers have been successful in verifying some classes of models by exploiting their piecewise linear st…

    Submitted 22 May, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

    Comments: Updated version of "Piecewise Linear Neural Network verification: A comparative study"

  32. arXiv:1709.03612 [pdf, other]

    cs.CV

    Holistic, Instance-Level Human Parsing

    Authors: Qizhu Li, Anurag Arnab, Philip H. S. Torr

    Abstract: Object parsing -- the task of decomposing an object into its semantic parts -- has traditionally been formulated as a category-level segmentation problem. Consequently, when there are multiple objects in an image, current methods cannot count the number of objects in the scene, nor can they determine which part belongs to which object. We address this problem by segmenting the parts of objects at…

    Submitted 11 September, 2017; originally announced September 2017.

    Comments: Poster at BMVC 2017

  33. arXiv:1708.00783 [pdf, other]

    cs.CV

    InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure

    Authors: Victor Adrian Prisacariu, Olaf Kähler, Stuart Golodetz, Michael Sapienza, Tommaso Cavallari, Philip H S Torr, David W Murray

    Abstract: Volumetric models have become a popular representation for 3D scenes in recent years. One breakthrough leading to their popularity was KinectFusion, which focuses on 3D reconstruction using RGB-D sensors. However, monocular SLAM has since also been tackled with very similar approaches. Representing the reconstruction volumetrically as a TSDF leads to most of the simplicity and efficiency that can…

    Submitted 2 August, 2017; originally announced August 2017.

    Comments: This article largely supersedes arXiv:1410.0925 (it describes version 3 of the InfiniTAM framework)

  34. arXiv:1707.07213 [pdf, other]

    cs.CV

    Spatio-temporal Human Action Localisation and Instance Segmentation in Temporally Untrimmed Videos

    Authors: Suman Saha, Gurkirt Singh, Michael Sapienza, Philip H. S. Torr, Fabio Cuzzolin

    Abstract: Current state-of-the-art human action recognition is focused on the classification of temporally trimmed videos in which only one action occurs per frame. In this work we address the problem of action localisation and instance segmentation in which multiple concurrent actions of the same class may be segmented out of an image sequence. We cast the action tube extraction as an energy maximisation p…

    Submitted 6 August, 2017; v1 submitted 22 July, 2017; originally announced July 2017.

    Comments: Typos corrected

  35. arXiv:1707.05821 [pdf, other]

    cs.CV

    Discovering Class-Specific Pixels for Weakly-Supervised Semantic Segmentation

    Authors: Arslan Chaudhry, Puneet K. Dokania, Philip H. S. Torr

    Abstract: We propose an approach to discover class-specific pixels for the weakly-supervised semantic segmentation task. We show that properly combining saliency and attention maps allows us to obtain reliable cues capable of significantly boosting the performance. First, we propose a simple yet powerful hierarchical approach to discover the class-agnostic salient regions, obtained using a salient object de…

    Submitted 18 July, 2017; originally announced July 2017.

    Journal ref: 28th British Machine Vision Conference (BMVC), 2017

  36. arXiv:1706.00400 [pdf, other]

    stat.ML cs.AI cs.LG

    Learning Disentangled Representations with Semi-Supervised Deep Generative Models

    Authors: N. Siddharth, Brooks Paige, Jan-Willem van de Meent, Alban Desmaison, Noah D. Goodman, Pushmeet Kohli, Frank Wood, Philip H. S. Torr

    Abstract: Variational autoencoders (VAEs) learn representations of data by jointly training a probabilistic encoder and decoder network. Typically these models encode all features of the data into a single variable. Here we are interested in learning disentangled representations that encode distinct aspects of the data into separate variables. We propose to learn such representations using model architectur…

    Submitted 13 November, 2017; v1 submitted 1 June, 2017; originally announced June 2017.

    Comments: Accepted for publication at NIPS 2017

  37. arXiv:1704.06036 [pdf, other]

    cs.CV cs.LG

    End-to-end representation learning for Correlation Filter based tracking

    Authors: Jack Valmadre, Luca Bertinetto, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr

    Abstract: The Correlation Filter is an algorithm that trains a linear template to discriminate between images and their translations. It is well suited to object tracking because its formulation in the Fourier domain provides a fast solution, enabling the detector to be re-trained once per frame. Previous works that use the Correlation Filter, however, have adopted features that were either manually designe…

    Submitted 20 April, 2017; originally announced April 2017.

    Comments: To appear at CVPR 2017

  38. arXiv:1704.04394 [pdf, other]

    cs.CV

    DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents

    Authors: Namhoon Lee, Wongun Choi, Paul Vernaza, Christopher B. Choy, Philip H. S. Torr, Manmohan Chandraker

    Abstract: We introduce a Deep Stochastic IOC RNN Encoder-decoder framework, DESIRE, for the task of future predictions of multiple interacting agents in dynamic scenes. DESIRE effectively predicts future locations of objects in multiple scenes by 1) accounting for the multi-modal nature of the future prediction (i.e., given the same context, future may vary), 2) foreseeing the potential future outcomes and m…

    Submitted 14 April, 2017; originally announced April 2017.

    Comments: Accepted at CVPR 2017

  39. arXiv:1704.02906 [pdf, other]

    cs.CV cs.AI cs.GR cs.LG stat.ML

    Multi-Agent Diverse Generative Adversarial Networks

    Authors: Arnab Ghosh, Viveka Kulharia, Vinay Namboodiri, Philip H. S. Torr, Puneet K. Dokania

    Abstract: We propose MAD-GAN, an intuitive generalization to the Generative Adversarial Networks (GANs) and its conditional variants to address the well known problem of mode collapse. First, MAD-GAN is a multi-agent GAN architecture incorporating multiple generators and one discriminator. Second, to enforce that different generators capture diverse high probability modes, the discriminator of MAD-GAN is de…

    Submitted 16 July, 2018; v1 submitted 10 April, 2017; originally announced April 2017.

    Comments: This is an updated version of our CVPR'18 paper with the same title. In this version, we also introduce MAD-GAN-Sim in Appendix B

  40. arXiv:1704.02386 [pdf, other]

    cs.CV

    Pixelwise Instance Segmentation with a Dynamically Instantiated Network

    Authors: Anurag Arnab, Philip H. S. Torr

    Abstract: Semantic segmentation and object detection research have recently achieved rapid progress. However, the former task has no notion of different instances of the same object, and the latter operates at a coarse, bounding-box level. We propose an Instance Segmentation system that produces a segmentation map where each pixel is assigned an object class and instance identity label. Most approaches adap…

    Submitted 7 April, 2017; originally announced April 2017.

    Comments: CVPR 2017

  41. arXiv:1704.01358 [pdf, other]

    cs.CV

    Incremental Tube Construction for Human Action Detection

    Authors: Harkirat Singh Behl, Michael Sapienza, Gurkirt Singh, Suman Saha, Fabio Cuzzolin, Philip H. S. Torr

    Abstract: Current state-of-the-art action detection systems are tailored for offline batch-processing applications. However, for online applications like human-robot interaction, current systems fall short, either because they only detect one action per video, or because they assume that the entire video is available ahead of time. In this work, we introduce a real-time and online joint-labelling and associ…

    Submitted 23 July, 2018; v1 submitted 5 April, 2017; originally announced April 2017.

    Comments: British Machine Vision Conference (BMVC) 2018

  42. arXiv:1702.08887  [pdf, other]

    cs.AI cs.LG cs.MA

    Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

    Authors: Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip H. S. Torr, Pushmeet Kohli, Shimon Whiteson

    Abstract: Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A major stumbling block is that indep…

    Submitted 21 May, 2018; v1 submitted 28 February, 2017; originally announced February 2017.

    Comments: Camera-ready version, International Conference on Machine Learning 2017; updated to fix print-breaking image

  43. arXiv:1702.02779  [pdf, other]

    cs.CV

    On-the-Fly Adaptation of Regression Forests for Online Camera Relocalisation

    Authors: Tommaso Cavallari, Stuart Golodetz, Nicholas A. Lord, Julien Valentin, Luigi Di Stefano, Philip H. S. Torr

    Abstract: Camera relocalisation is an important problem in computer vision, with applications in simultaneous localisation and mapping, virtual/augmented reality and navigation. Common techniques either match the current image against keyframes with known poses coming from a tracker, or establish 2D-to-3D correspondences between keypoints in the current image and points in the scene in order to estimate the…

    Submitted 26 June, 2017; v1 submitted 9 February, 2017; originally announced February 2017.

    Comments: To appear in the proceedings of CVPR 2017

  44. arXiv:1701.06805  [pdf, other]

    cs.CV

    A Projected Gradient Descent Method for CRF Inference allowing End-To-End Training of Arbitrary Pairwise Potentials

    Authors: Måns Larsson, Anurag Arnab, Fredrik Kahl, Shuai Zheng, Philip Torr

    Abstract: Are we using the right potential functions in the Conditional Random Field models that are popular in the Vision community? Semantic segmentation and other pixel-level labelling tasks have made significant progress recently due to the deep learning paradigm. However, most state-of-the-art structured prediction methods also include a random field model with a hand-crafted Gaussian potential to mode…

    Submitted 2 January, 2018; v1 submitted 24 January, 2017; originally announced January 2017.

    Comments: Presented at EMMCVPR 2017 conference

  45. arXiv:1612.02101  [pdf, other]

    cs.CV

    Bottom-Up Top-Down Cues for Weakly-Supervised Semantic Segmentation

    Authors: Qinbin Hou, Puneet Kumar Dokania, Daniela Massiceti, Yunchao Wei, Ming-Ming Cheng, Philip Torr

    Abstract: We consider the task of learning a classifier for semantic segmentation using weak supervision in the form of image labels which specify the object classes present in the image. Our method uses deep convolutional neural networks (CNNs) and adopts an Expectation-Maximization (EM) based approach. We focus on the following three aspects of EM: (i) initialization; (ii) latent posterior estimation (E-s…

    Submitted 9 April, 2017; v1 submitted 6 December, 2016; originally announced December 2016.

  46. arXiv:1612.01495  [pdf, other]

    cs.CV

    ROAM: a Rich Object Appearance Model with Application to Rotoscoping

    Authors: Ondrej Miksik, Juan-Manuel Pérez-Rúa, Philip H. S. Torr, Patrick Pérez

    Abstract: Rotoscoping, the detailed delineation of scene elements through a video shot, is a painstaking task of tremendous importance in professional post-production pipelines. While pixel-wise segmentation techniques can help with this task, professional rotoscoping tools rely on parametric curves that offer the artists much better interactive control over the definition, editing and manipulation of the se…

    Submitted 5 December, 2016; originally announced December 2016.

  47. arXiv:1612.01094  [pdf, other]

    cs.LG

    Learning to superoptimize programs - Workshop Version

    Authors: Rudy Bunel, Alban Desmaison, M. Pawan Kumar, Philip H. S. Torr, Pushmeet Kohli

    Abstract: Superoptimization requires the estimation of the best program for a given computational task. In order to deal with large programs, superoptimization techniques perform a stochastic search. This involves proposing a modification of the current program, which is accepted or rejected based on the improvement achieved. The state-of-the-art method uses uniform proposal distributions, which fails to ex…

    Submitted 4 December, 2016; originally announced December 2016.

    Comments: Workshop version for the NIPS NAMPI Workshop. Extended version at arXiv:1611.01787

  48. arXiv:1612.00380  [pdf, other]

    cs.AI cs.CV stat.ML

    Playing Doom with SLAM-Augmented Deep Reinforcement Learning

    Authors: Shehroze Bhatti, Alban Desmaison, Ondrej Miksik, Nantas Nardelli, N. Siddharth, Philip H. S. Torr

    Abstract: A number of recent approaches to policy learning in 2D game domains have been successful going directly from raw input images to actions. However, when employed in complex 3D environments, they typically suffer from challenges related to partial observability, combinatorial exploration spaces, path planning, and a scarcity of rewarding scenarios. Inspired by prior work in human cognition that ind…

    Submitted 1 December, 2016; originally announced December 2016.

  49. arXiv:1611.09718  [pdf, other]

    cs.CV

    Efficient Linear Programming for Dense CRFs

    Authors: Thalaiyasingam Ajanthan, Alban Desmaison, Rudy Bunel, Mathieu Salzmann, Philip H. S. Torr, M. Pawan Kumar

    Abstract: The fully connected conditional random field (CRF) with Gaussian pairwise potentials has proven popular and effective for multi-class semantic segmentation. While the energy of a dense CRF can be minimized accurately using a linear programming (LP) relaxation, the state-of-the-art algorithm is too slow to be useful in practice. To alleviate this deficiency, we introduce an efficient LP minimizatio…

    Submitted 14 February, 2017; v1 submitted 29 November, 2016; originally announced November 2016.

    Comments: 24 pages, 10 figures and 4 tables

    ACM Class: G.1.6; I.4.6

  50. arXiv:1611.08563  [pdf, other]

    cs.CV

    Online Real-time Multiple Spatiotemporal Action Localisation and Prediction

    Authors: Gurkirt Singh, Suman Saha, Michael Sapienza, Philip Torr, Fabio Cuzzolin

    Abstract: We present a deep-learning framework for real-time multiple spatio-temporal (S/T) action localisation, classification and early prediction. Current state-of-the-art approaches work offline and are too slow to be useful in real-world settings. To overcome their limitations, we introduce two major developments. Firstly, we adopt real-time SSD (Single Shot MultiBox Detector) convolutional neural netw…

    Submitted 24 August, 2017; v1 submitted 25 November, 2016; originally announced November 2016.

    Comments: 10 pages, 3 figures, ICCV 2017. Added link to new annotations of ucf101-24