Search | arXiv e-print repository

CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation

Authors: Kalliopi Basioti, Mohamed A. Abdelsalam, Federico Fancellu, Vladimir Pavlovic, Afsaneh Fazly

Abstract: Controllable Image Captioning (CIC) aims at generating natural language descriptions for an image, conditioned on information provided by end users, e.g., regions, entities or events of interest. However, available image-language datasets mainly contain captions that describe the entirety of an image, making them ineffective for training CIC models that can potentially attend to any subset of regi… ▽ More Controllable Image Captioning (CIC) aims at generating natural language descriptions for an image, conditioned on information provided by end users, e.g., regions, entities or events of interest. However, available image-language datasets mainly contain captions that describe the entirety of an image, making them ineffective for training CIC models that can potentially attend to any subset of regions or relationships. To tackle this challenge, we propose a novel, fully automatic method to sample additional focused and visually grounded captions using a unified structured semantic representation built on top of the existing set of captions associated with an image. We leverage Abstract Meaning Representation (AMR), a cross-lingual graph-based semantic formalism, to encode all possible spatio-semantic relations between entities, beyond the typical spatial-relations-only focus of current methods. We use this Structured Semantic Augmentation (SSA) framework to augment existing image-caption datasets with the grounded controlled captions, increasing their spatial and semantic diversity and focal coverage. We then develop a new model, CIC-BART-SSA, specifically tailored for the CIC task, that sources its control signals from SSA-diversified datasets. We empirically show that, compared to SOTA CIC models, CIC-BART-SSA generates captions that are superior in diversity and text quality, are competitive in controllability, and, importantly, minimize the gap between broad and highly focused controlled captioning performance by efficiently generalizing to the challenging highly focused scenarios. Code is available at https://github.com/SamsungLabs/CIC-BART-SSA. △ Less

Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV 2024

arXiv:2210.14862 [pdf, other]

Visual Semantic Parsing: From Images to Abstract Meaning Representation

Authors: Mohamed Ashraf Abdelsalam, Zhan Shi, Federico Fancellu, Kalliopi Basioti, Dhaivat J. Bhatt, Vladimir Pavlovic, Afsaneh Fazly

Abstract: The success of scene graphs for visual scene understanding has brought attention to the benefits of abstracting a visual input (e.g., image) into a structured representation, where entities (people and objects) are nodes connected by edges specifying their relations. Building these representations, however, requires expensive manual annotation in the form of images paired with their scene graphs o… ▽ More The success of scene graphs for visual scene understanding has brought attention to the benefits of abstracting a visual input (e.g., image) into a structured representation, where entities (people and objects) are nodes connected by edges specifying their relations. Building these representations, however, requires expensive manual annotation in the form of images paired with their scene graphs or frames. These formalisms remain limited in the nature of entities and relations they can capture. In this paper, we propose to leverage a widely-used meaning representation in the field of natural language processing, the Abstract Meaning Representation (AMR), to address these shortcomings. Compared to scene graphs, which largely emphasize spatial relationships, our visual AMR graphs are more linguistically informed, with a focus on higher-level semantic concepts extrapolated from visual input. Moreover, they allow us to generate meta-AMR graphs to unify information contained in multiple image descriptions under one representation. Through extensive experimentation and analysis, we demonstrate that we can re-purpose an existing text-to-AMR parser to parse images into AMRs. Our findings point to important future research directions for improved scene understanding. △ Less

Submitted 27 October, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

Comments: published in CoNLL 2022

arXiv:2206.09265 [pdf, ps, other]

SAViR-T: Spatially Attentive Visual Reasoning with Transformers

Authors: Pritish Sahu, Kalliopi Basioti, Vladimir Pavlovic

Abstract: We present a novel computational model, "SAViR-T", for the family of visual reasoning problems embodied in the Raven's Progressive Matrices (RPM). Our model considers explicit spatial semantics of visual elements within each image in the puzzle, encoded as spatio-visual tokens, and learns the intra-image as well as the inter-image token dependencies, highly relevant for the visual reasoning task.… ▽ More We present a novel computational model, "SAViR-T", for the family of visual reasoning problems embodied in the Raven's Progressive Matrices (RPM). Our model considers explicit spatial semantics of visual elements within each image in the puzzle, encoded as spatio-visual tokens, and learns the intra-image as well as the inter-image token dependencies, highly relevant for the visual reasoning task. Token-wise relationship, modeled through a transformer-based SAViR-T architecture, extract group (row or column) driven representations by leveraging the group-rule coherence and use this as the inductive bias to extract the underlying rule representations in the top two row (or column) per token in the RPM. We use this relation representations to locate the correct choice image that completes the last row or column for the RPM. Extensive experiments across both synthetic RPM benchmarks, including RAVEN, I-RAVEN, RAVEN-FAIR, and PGM, and the natural image-based "V-PROM" demonstrate that SAViR-T sets a new state-of-the-art for visual reasoning, exceeding prior models' performance by a considerable margin. △ Less

Submitted 21 June, 2022; v1 submitted 18 June, 2022; originally announced June 2022.

arXiv:2109.13156 [pdf, ps, other]

DAReN: A Collaborative Approach Towards Reasoning And Disentangling

Authors: Pritish Sahu, Kalliopi Basioti, Vladimir Pavlovic

Abstract: Computational learning approaches to solving visual reasoning tests, such as Raven's Progressive Matrices (RPM), critically depend on the ability to identify the visual concepts used in the test (i.e., the representation) as well as the latent rules based on those concepts (i.e., the reasoning). However, learning of representation and reasoning is a challenging and ill-posed task, often approached… ▽ More Computational learning approaches to solving visual reasoning tests, such as Raven's Progressive Matrices (RPM), critically depend on the ability to identify the visual concepts used in the test (i.e., the representation) as well as the latent rules based on those concepts (i.e., the reasoning). However, learning of representation and reasoning is a challenging and ill-posed task, often approached in a stage-wise manner (first representation, then reasoning). In this work, we propose an end-to-end joint representation-reasoning learning framework, which leverages a weak form of inductive bias to improve both tasks together. Specifically, we introduce a general generative graphical model for RPMs, GM-RPM, and apply it to solve the reasoning test. We accomplish this using a novel learning framework Disentangling based Abstract Reasoning Network (DAReN) based on the principles of GM-RPM. We perform an empirical evaluation of DAReN over several benchmark datasets. DAReN shows consistent improvement over state-of-the-art (SOTA) models on both the reasoning and the disentanglement tasks. This demonstrates the strong correlation between disentangled latent representation and the ability to solve abstract visual reasoning tasks. △ Less

Submitted 29 June, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

arXiv:2007.07923 [pdf, other]

Image De-Quantization Using Generative Models as Priors

Authors: Kalliopi Basioti, George V. Moustakides

Abstract: Image quantization is used in several applications aiming in reducing the number of available colors in an image and therefore its size. De-quantization is the task of reversing the quantization effect and recovering the original multi-chromatic level image. Existing techniques achieve de-quantization by imposing suitable constraints on the ideal image in order to make the recovery problem feasibl… ▽ More Image quantization is used in several applications aiming in reducing the number of available colors in an image and therefore its size. De-quantization is the task of reversing the quantization effect and recovering the original multi-chromatic level image. Existing techniques achieve de-quantization by imposing suitable constraints on the ideal image in order to make the recovery problem feasible since it is otherwise ill-posed. Our goal in this work is to develop a de-quantization mechanism through a rigorous mathematical analysis which is based on the classical statistical estimation theory. In this effort we incorporate generative modeling of the ideal image as a suitable prior information. The resulting technique is simple and capable of de-quantizing successfully images that have experienced severe quantization effects. Interestingly, our method can recover images even if the quantization process is not exactly known and contains unknown parameters. △ Less

Submitted 17 July, 2020; v1 submitted 15 July, 2020; originally announced July 2020.

arXiv:2005.14036 [pdf, other]

Image Restoration from Parametric Transformations using Generative Models

Authors: Kalliopi Basioti, George V. Moustakides

Abstract: When images are statistically described by a generative model we can use this information to develop optimum techniques for various image restoration problems as inpainting, super-resolution, image coloring, generative model inversion, etc. With the help of the generative model it is possible to formulate, in a natural way, these restoration problems as Statistical estimation problems. Our approac… ▽ More When images are statistically described by a generative model we can use this information to develop optimum techniques for various image restoration problems as inpainting, super-resolution, image coloring, generative model inversion, etc. With the help of the generative model it is possible to formulate, in a natural way, these restoration problems as Statistical estimation problems. Our approach, by combining maximum a-posteriori probability with maximum likelihood estimation, is capable of restoring images that are distorted by transformations even when the latter contain unknown parameters. The resulting optimization is completely defined with no parameters requiring tuning. This must be compared with the current state of the art which requires exact knowledge of the transformations and contains regularizer terms with weights that must be properly defined. Finally, we must mention that we extend our method to accommodate mixtures of multiple images where each image is described by its own generative model and we are able of successfully separating each participating image from a single mixture. △ Less

Submitted 16 June, 2020; v1 submitted 26 May, 2020; originally announced May 2020.

arXiv:2002.00865 [pdf, other]

Designing GANs: A Likelihood Ratio Approach

Authors: Kalliopi Basioti, George V. Moustakides

Abstract: We are interested in the design of generative networks. The training of these mathematical structures is mostly performed with the help of adversarial (min-max) optimization problems. We propose a simple methodology for constructing such problems assuring, at the same time, consistency of the corresponding solution. We give characteristic examples developed by our method, some of which can be reco… ▽ More We are interested in the design of generative networks. The training of these mathematical structures is mostly performed with the help of adversarial (min-max) optimization problems. We propose a simple methodology for constructing such problems assuring, at the same time, consistency of the corresponding solution. We give characteristic examples developed by our method, some of which can be recognized from other applications, and some are introduced here for the first time. We present a new metric, the likelihood ratio, that can be employed online to examine the convergence and stability during the training of different Generative Adversarial Networks (GANs). Finally, we compare various possibilities by applying them to well-known datasets using neural networks of different configurations and sizes. △ Less

Submitted 15 July, 2021; v1 submitted 3 February, 2020; originally announced February 2020.

Comments: Accepted to "The Joint International Conference on Neural Networks (IJCNN 2021)"

arXiv:1911.00405 [pdf, other]

Training Neural Networks for Likelihood/Density Ratio Estimation

Authors: George V. Moustakides, Kalliopi Basioti

Abstract: Various problems in Engineering and Statistics require the computation of the likelihood ratio function of two probability densities. In classical approaches the two densities are assumed known or to belong to some known parametric family. In a data-driven version we replace this requirement with the availability of data sampled from the densities of interest. For most well known problems in Detec… ▽ More Various problems in Engineering and Statistics require the computation of the likelihood ratio function of two probability densities. In classical approaches the two densities are assumed known or to belong to some known parametric family. In a data-driven version we replace this requirement with the availability of data sampled from the densities of interest. For most well known problems in Detection and Hypothesis testing we develop solutions by providing neural network based estimates of the likelihood ratio or its transformations. This task necessitates the definition of proper optimizations which can be used for the training of the network. The main purpose of this work is to offer a simple and unified methodology for defining such optimization problems with guarantees that the solution is indeed the desired function. Our results are extended to cover estimates for likelihood ratios of conditional densities and estimates for statistics encountered in local approaches. △ Less

Submitted 5 November, 2019; v1 submitted 1 November, 2019; originally announced November 2019.

arXiv:1905.10161 [pdf, other]

Optimizing Shallow Networks for Binary Classification

Authors: Kalliopi Basioti, George V. Moustakides

Abstract: Data driven classification that relies on neural networks is based on optimization criteria that involve some form of distance between the output of the network and the desired label. Using the same mathematical analysis, for a multitude of such measures, we can show that their optimum solution matches the ideal likelihood ratio test classifier. In this work we introduce a different family of opti… ▽ More Data driven classification that relies on neural networks is based on optimization criteria that involve some form of distance between the output of the network and the desired label. Using the same mathematical analysis, for a multitude of such measures, we can show that their optimum solution matches the ideal likelihood ratio test classifier. In this work we introduce a different family of optimization problems which is not covered by the existing approaches and, therefore, opens possibilities for new training algorithms for neural network based classification. We give examples that lead to algorithms that are simple in implementation, exhibit stable convergence characteristics and are antagonistic to the most popular existing techniques. △ Less

Submitted 23 June, 2019; v1 submitted 24 May, 2019; originally announced May 2019.

arXiv:1811.09568 [pdf, other]

Kernel-Based Training of Generative Networks

Authors: Kalliopi Basioti, George V. Moustakides, Emmanouil Z. Psarakis

Abstract: Generative adversarial networks (GANs) are designed with the help of min-max optimization problems that are solved with stochastic gradient-type algorithms which are known to be non-robust. In this work we revisit a non-adversarial method based on kernels which relies on a pure minimization problem and propose a simple stochastic gradient algorithm for the computation of its solution. Using simpli… ▽ More Generative adversarial networks (GANs) are designed with the help of min-max optimization problems that are solved with stochastic gradient-type algorithms which are known to be non-robust. In this work we revisit a non-adversarial method based on kernels which relies on a pure minimization problem and propose a simple stochastic gradient algorithm for the computation of its solution. Using simplified tools from Stochastic Approximation theory we demonstrate that batch versions of the algorithm or smoothing of the gradient do not improve convergence. These observations allow for the development of a training algorithm that enjoys reduced computational complexity and increased robustness while exhibiting similar synthesis characteristics as classical GANs. △ Less

Submitted 23 November, 2018; originally announced November 2018.

Showing 1–10 of 10 results for author: Basioti, K