
doi 10.1016/j.patcog.2020.107369

Learning Direct Optimization for Scene Understanding

Authors: Lukasz Romaszko, Christopher K. I. Williams, John Winn

Abstract: We develop a Learning Direct Optimization (LiDO) method for the refinement of a latent variable model that describes input image x. Our goal is to explain a single image x with an interpretable 3D computer graphics model having scene graph latent variables z (such as object appearance, camera position). Given a current estimate of z we can render a prediction of the image g(z), which can be compared to the image x. The standard way to proceed is then to measure the error E(x, g(z)) between the two, and use an optimizer to minimize the error. However, it is unknown which error measure E would be most effective for simultaneously addressing issues such as misaligned objects, occlusions, textures, etc. In contrast, the LiDO approach trains a Prediction Network to predict an update directly to correct z, rather than minimizing the error with respect to z. Experiments show that our LiDO method converges rapidly as it does not need to perform a search on the error landscape, produces better solutions than error-based competitors, and is able to handle the mismatch between the data and the fitted scene model. We apply LiDO to a realistic synthetic dataset, and show that the method also transfers to work well with real images.

Submitted 7 May, 2020; v1 submitted 18 December, 2018; originally announced December 2018.

Journal ref: Pattern Recognition, Volume 105, 2020, 107369

arXiv:1611.02266 [pdf, other]

Gaussian Attention Model and Its Application to Knowledge Base Embedding and Question Answering

Authors: Liwen Zhang, John Winn, Ryota Tomioka

Abstract: We propose the Gaussian attention model for content-based neural memory access. With the proposed attention model, a neural network has the additional degree of freedom to control the focus of its attention from a laser sharp attention to a broad attention. It is applicable whenever we can assume that the distance in the latent space reflects some notion of semantics. We use the proposed attention model as a scoring function for the embedding of a knowledge base into a continuous vector space and then train a model that performs question answering about the entities in the knowledge base. The proposed attention model can handle both the propagation of uncertainty when following a series of relations and also the conjunction of conditions in a natural way. On a dataset of soccer players who participated in the FIFA World Cup 2014, we demonstrate that our model can handle both path queries and conjunctive queries well.

Submitted 30 November, 2016; v1 submitted 7 November, 2016; originally announced November 2016.

Comments: 16 pages, 4 figures

arXiv:1410.7452 [pdf, other]

Consensus Message Passing for Layered Graphical Models

Authors: Varun Jampani, S. M. Ali Eslami, Daniel Tarlow, Pushmeet Kohli, John Winn

Abstract: Generative models provide a powerful framework for probabilistic reasoning. However, in many domains their use has been hampered by the practical difficulties of inference. This is particularly the case in computer vision, where models of the imaging process tend to be large, loopy and layered. For this reason bottom-up conditional models have traditionally dominated in such domains. We find that widely-used, general-purpose message passing inference algorithms such as Expectation Propagation (EP) and Variational Message Passing (VMP) fail on the simplest of vision models. With these models in mind, we introduce a modification to message passing that learns to exploit their layered structure by passing 'consensus' messages that guide inference towards good solutions. Experiments on a variety of problems show that the proposed technique leads to significantly more accurate inference results, not only when compared to standard EP and VMP, but also when compared to competitive bottom-up conditional models.

Submitted 26 January, 2015; v1 submitted 27 October, 2014; originally announced October 2014.

Comments: Appearing in Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS) 2015

arXiv:1304.7605 [pdf]

Identifying Participants in the Personal Genome Project by Name (A Re-identification Experiment)

Authors: Latanya Sweeney, Akua Abu, Julia Winn

Abstract: We linked names and contact information to publicly available profiles in the Personal Genome Project. These profiles contain medical and genomic information, including details about medications, procedures and diseases, and demographic information, such as date of birth, gender, and postal code. By linking demographics to public records such as voter lists, and mining for names hidden in attached documents, we correctly identified 84 to 97 percent of the profiles for which we provided names. Our ability to learn their names is based on their demographics, not their DNA, thereby revisiting an old vulnerability that could be easily thwarted with minimal loss of research value. So, we propose technical remedies for people to learn about their demographics to make better decisions.

Submitted 29 April, 2013; originally announced April 2013.

Comments: 4 pages

Report number: Harvard University, Data Privacy Lab 1021-1

ACM Class: K.4.1; K.6.5; J.3; H.1.2; H.2.0; H.2.7; H.3.5

arXiv:1107.3823 [pdf, ps, other]

Weakly Supervised Learning of Foreground-Background Segmentation using Masked RBMs

Authors: Nicolas Heess, Nicolas Le Roux, John Winn

Abstract: We propose an extension of the Restricted Boltzmann Machine (RBM) that allows the joint shape and appearance of foreground objects in cluttered images to be modeled independently of the background. We present a learning scheme that learns this representation directly from cluttered images with only very weak supervision. The model generates plausible samples and performs foreground-background segmentation. We demonstrate that representing foreground objects independently of the background can be beneficial in recognition tasks.

Submitted 19 July, 2011; originally announced July 2011.

Journal ref: International Conference on Artificial Neural Networks (2011)

Showing 1–5 of 5 results for author: Winn, J