Search | arXiv e-print repository

SMILe: Leveraging Submodular Mutual Information For Robust Few-Shot Object Detection

Authors: Anay Majee, Ryan Sharp, Rishabh Iyer

Abstract: Confusion and forgetting of object classes have been challenges of prime interest in Few-Shot Object Detection (FSOD). To overcome these pitfalls in metric learning based FSOD techniques, we introduce a novel Submodular Mutual Information Learning (SMILe) framework which adopts combinatorial mutual information functions to enforce the creation of tighter and discriminative feature clusters in FSOD. Our proposed approach generalizes to several existing approaches in FSOD, agnostic of the backbone architecture demonstrating elevated performance gains. A paradigm shift from instance based objective functions to combinatorial objectives in SMILe naturally preserves the diversity within an object class resulting in reduced forgetting when subjected to few training examples. Furthermore, the application of mutual information between the already learnt (base) and newly added (novel) objects ensures sufficient separation between base and novel classes, minimizing the effect of class confusion. Experiments on popular FSOD benchmarks, PASCAL-VOC and MS-COCO show that our approach generalizes to State-of-the-Art (SoTA) approaches improving their novel class performance by up to 5.7% (3.3 mAP points) and 5.4% (2.6 mAP points) on the 10-shot setting of VOC (split 3) and 30-shot setting of COCO datasets respectively. Our experiments also demonstrate better retention of base class performance and up to 2x faster convergence over existing approaches agnostic of the underlying architecture.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV 2024, 16 pages, 5 figures, 7 tables

arXiv:2211.17017 [pdf]

Integrating wind variability to modelling wind-ramp events using a non-binary ramp function and deep learning models

Authors: Russell Sharp, Hisham Ihshaish, J. Ignacio Deza

Abstract: The forecasting of large ramps in wind power output known as ramp events is crucial for the incorporation of large volumes of wind energy into national electricity grids. Large variations in wind power supply must be compensated by ancillary energy sources which can include the use of fossil fuels. Improved prediction of wind power will help to reduce dependency on supplemental energy sources along with their associated costs and emissions. In this paper, we discuss limitations of current predictive practices and explore the use of Machine Learning methods to enhance wind ramp event classification and prediction. We additionally outline a design for a novel approach to wind ramp prediction, in which high-resolution wind fields are incorporated into the modelling of wind power.

Submitted 31 August, 2022; originally announced November 2022.

Comments: International Conference for Sustainable Ecological Engineering Design for Society (SEEDS 2022)

arXiv:2208.12782 [pdf, other]

Mel Spectrogram Inversion with Stable Pitch

Authors: Bruno Di Giorgi, Mark Levy, Richard Sharp

Abstract: Vocoders are models capable of transforming a low-dimensional spectral representation of an audio signal, typically the mel spectrogram, to a waveform. Modern speech generation pipelines use a vocoder as their final component. Recent vocoder models developed for speech achieve a high degree of realism, such that it is natural to wonder how they would perform on music signals. Compared to speech, the heterogeneity and structure of the musical sound texture offer new challenges. In this work we focus on one specific artifact that some vocoder models designed for speech tend to exhibit when applied to music: the perceived instability of pitch when synthesizing sustained notes. We argue that the characteristic sound of this artifact is due to the lack of horizontal phase coherence, which is often the result of using a time-domain target space with a model that is invariant to time-shifts, such as a convolutional neural network. We propose a new vocoder model that is specifically designed for music. Key to improving the pitch stability is the choice of a shift-invariant target space that consists of the magnitude spectrum and the phase gradient. We discuss the reasons that inspired us to re-formulate the vocoder task, outline a working example, and evaluate it on musical signals. Our method results in 60% and 10% improved reconstruction of sustained notes and chords with respect to existing models, using a novel harmonic error metric.

Submitted 26 August, 2022; originally announced August 2022.

Comments: 7 pages, 5 figures, Proceedings of the 23rd International Society for Music Information Retrieval Conference, ISMIR 2022

arXiv:2207.04158 [pdf, other]

A Survey of Task-Based Machine Learning Content Extraction Services for VIDINT

Authors: Joshua Brunk, Nathan Jermann, Ryan Sharp, Carl D. Hoover

Abstract: This paper provides a comparison of current video content extraction tools with a focus on comparing commercial task-based machine learning services. Video intelligence (VIDINT) data has become a critical intelligence source in the past decade. The need for AI-based analytics and automation tools to extract and structure content from video has quickly become a priority for organizations needing to search, analyze and exploit video at scale. With rapid growth in machine learning technology, the maturity of machine transcription, machine translation, topic tagging, and object recognition tasks is improving at an exponential rate, breaking performance records in speed and accuracy as new applications evolve. Each section of this paper reviews and compares products, software resources and video analytics capabilities based on tasks relevant to extracting information from video with machine learning techniques.

Submitted 8 July, 2022; originally announced July 2022.

arXiv:2202.00475 [pdf, ps, other]

From Examples to Rules: Neural Guided Rule Synthesis for Information Extraction

Authors: Robert Vacareanu, Marco A. Valenzuela-Escarcega, George C. G. Barbosa, Rebecca Sharp, Mihai Surdeanu

Abstract: While deep learning approaches to information extraction have had many successes, they can be difficult to augment or maintain as needs shift. Rule-based methods, on the other hand, can be more easily modified. However, crafting rules requires expertise in linguistics and the domain of interest, making it infeasible for most users. Here we attempt to combine the advantages of these two directions while mitigating their drawbacks. We adapt recent advances from the adjacent field of program synthesis to information extraction, synthesizing rules from provided examples. We use a transformer-based architecture to guide an enumerative search, and show that this reduces the number of steps that need to be explored before a rule is found. Further, we show that without training the synthesis algorithm on the specific domain, our synthesized rules achieve state-of-the-art performance on the 1-shot scenario of a task that focuses on few-shot learning for relation classification, and competitive performance in the 5-shot scenario.

Submitted 16 January, 2022; originally announced February 2022.

arXiv:2001.07295 [pdf, other]

AutoMATES: Automated Model Assembly from Text, Equations, and Software

Authors: Adarsh Pyarelal, Marco A. Valenzuela-Escarcega, Rebecca Sharp, Paul D. Hein, Jon Stephens, Pratik Bhandari, HeuiChan Lim, Saumya Debray, Clayton T. Morrison

Abstract: Models of complicated systems can be represented in different ways - in scientific papers, they are represented using natural language text as well as equations. But to be of real use, they must also be implemented as software, thus making code a third form of representing models. We introduce the AutoMATES project, which aims to build semantically-rich unified representations of models from scientific code and publications to facilitate the integration of computational models from different domains and allow for modeling large, complicated systems that span multiple domains and levels of abstraction.

Submitted 20 January, 2020; originally announced January 2020.

Comments: 8 pages, 6 figures, accepted to Modeling the World's Systems 2019

ACM Class: D.3.3; D.3.4; H.1.0; I.2.2; I.2.5; I.2.7; I.6.4; I.6.5

arXiv:1910.13872 [pdf, other]

The Game Performance Index for Mobile Phones

Authors: Hesham Dar, James Kwan, Yang Liu, Omiros Pantazis, Robert Sharp

Abstract: With the recent increase in the quantity of high fidelity games appearing on mobile devices and the recent trend of gaming focused mobile devices, there is a new requirement for a clear and comprehensive measure of the quality of gaming performance on the mobile device platform. This paper proposes a conceptual framework for a user-experience and user-perception based set of performance measures for mobile devices. This paper presents a specific implementation and measurement use case which has been beneficial to Samsung Electronics when applied to our own product range, allowing us to better understand and quantify device performance. We believe that the methods outlined are potentially useful to the consumer, by providing an understandable public facing score for device performance to guide consumers with purchasing decisions. The methods may be useful to game developers and could better enable the developer to add new richer game features based on the performance of the device.

Submitted 30 October, 2019; originally announced October 2019.

Comments: 7 pages, 2 figures

arXiv:1909.09868 [pdf, other]

doi 10.18653/v1/D19-1340

On the Importance of Delexicalization for Fact Verification

Authors: Sandeep Suntwal, Mithun Paul, Rebecca Sharp, Mihai Surdeanu

Abstract: In this work we aim to understand and estimate the importance that a neural network assigns to various aspects of the data while learning and making predictions. Here we focus on the recognizing textual entailment (RTE) task and its application to fact verification. In this context, the contributions of this work are as follows. We investigate the attention weights a state of the art RTE method assigns to input tokens in the RTE component of fact verification systems, and confirm that most of the weight is assigned to POS tags of nouns (e.g., NN, NNP etc.) or their phrases. To verify that these lexicalized models transfer poorly, we implement a domain transfer experiment where an RTE component is trained on the FEVER data, and tested on the Fake News Challenge (FNC) dataset. As expected, even though this method achieves high accuracy when evaluated in the same domain, the performance in the target domain is poor, marginally above chance. To mitigate this dependence on lexicalized information, we experiment with several strategies for masking out names by replacing them with their semantic category, coupled with a unique identifier to mark that the same or new entities are referenced between claim and evidence. The results show that, while the performance on the FEVER dataset remains at par with that of the model trained on lexicalized data, it improves significantly when tested in the FNC dataset. Thus our experiments demonstrate that our strategy is successful in mitigating the dependency on lexical information.

Submitted 23 April, 2020; v1 submitted 21 September, 2019; originally announced September 2019.

Comments: published in the proceedings at EMNLP2019

arXiv:1807.01836 [pdf, other]

Sanity Check: A Strong Alignment and Information Retrieval Baseline for Question Answering

Authors: Vikas Yadav, Rebecca Sharp, Mihai Surdeanu

Abstract: While increasingly complex approaches to question answering (QA) have been proposed, the true gain of these systems, particularly with respect to their expensive training requirements, can be inflated when they are not compared to adequate baselines. Here we propose an unsupervised, simple, and fast alignment and information retrieval baseline that incorporates two novel contributions: a one-to-many alignment between query and document terms and negative alignment as a proxy for discriminative information. Our approach not only outperforms all conventional baselines as well as many supervised recurrent neural networks, but also approaches the state of the art for supervised systems on three QA datasets. With only three hyperparameters, we achieve 47% P@1 on an 8th grade Science QA dataset, 32.9% P@1 on a Yahoo! answers QA dataset and 64% MAP on WikiQA. We also achieve 26.56% and 58.36% on ARC challenge and easy dataset respectively. In addition to including the additional ARC results in this version of the paper, for the ARC easy set only we also experimented with one additional parameter -- number of justifications retrieved.

Submitted 4 July, 2018; originally announced July 2018.

Comments: SIGIR 2018

arXiv:1609.08097 [pdf, other]

Creating Causal Embeddings for Question Answering with Minimal Supervision

Authors: Rebecca Sharp, Mihai Surdeanu, Peter Jansen, Peter Clark, Michael Hammond

Abstract: A common model for question answering (QA) is that a good answer is one that is closely related to the question, where relatedness is often determined using general-purpose lexical models such as word embeddings. We argue that a better approach is to look for answers that are related to the question in a relevant way, according to the information need of the question, which may be determined through task-specific embeddings. With causality as a use case, we implement this insight in three steps. First, we generate causal embeddings cost-effectively by bootstrapping cause-effect pairs extracted from free text using a small set of seed patterns. Second, we train dedicated embeddings over this data, by using task-specific contexts, i.e., the context of a cause is its effect. Finally, we extend a state-of-the-art reranking approach for QA to incorporate these causal embeddings. We evaluate the causal embedding models both directly with a causal implication task, and indirectly, in a downstream causal QA task using data from Yahoo! Answers. We show that explicitly modeling causality improves performance in both tasks. In the QA task our best model achieves 37.3% P@1, significantly outperforming a strong baseline by 7.7% (relative).

Submitted 26 September, 2016; originally announced September 2016.

Comments: To appear in EMNLP 2016

Showing 1–10 of 10 results for author: Sharp, R