Search | arXiv e-print repository

Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning

Authors: Xin Xing, Zhexiao Xiong, Abby Stylianou, Srikumar Sastry, Liyu Gong, Nathan Jacobs

Abstract: This paper presents a novel approach to Single-Positive Multi-label Learning. In general multi-label learning, a model learns to predict multiple labels or categories for a single input image. This is in contrast with standard multi-class image classification, where the task is predicting a single label from many possible labels for an image. Single-Positive Multi-label Learning (SPML) specificall… ▽ More This paper presents a novel approach to Single-Positive Multi-label Learning. In general multi-label learning, a model learns to predict multiple labels or categories for a single input image. This is in contrast with standard multi-class image classification, where the task is predicting a single label from many possible labels for an image. Single-Positive Multi-label Learning (SPML) specifically considers learning to predict multiple labels when there is only a single annotation per image in the training data. Multi-label learning is in many ways a more realistic task than single-label learning as real-world data often involves instances belonging to multiple categories simultaneously; however, most common computer vision datasets predominantly contain single labels due to the inherent complexity and cost of collecting multiple high quality annotations for each instance. We propose a novel approach called Vision-Language Pseudo-Labeling (VLPL), which uses a vision-language model to suggest strong positive and negative pseudo-labels, and outperforms the current SOTA methods by 5.5% on Pascal VOC, 18.4% on MS-COCO, 15.2% on NUS-WIDE, and 8.4% on CUB-Birds. Our code and data are available at https://github.com/mvrl/VLPL. △ Less

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2108.04090 [pdf, other]

Classification and Visualization of Genotype x Phenotype Interactions in Biomass Sorghum

Authors: Abby Stylianou, Robert Pless, Nadia Shakoor, Todd Mockler

Abstract: We introduce a simple approach to understanding the relationship between single nucleotide polymorphisms (SNPs), or groups of related SNPs, and the phenotypes they control. The pipeline involves training deep convolutional neural networks (CNNs) to differentiate between images of plants with reference and alternate versions of various SNPs, and then using visualization approaches to highlight what… ▽ More We introduce a simple approach to understanding the relationship between single nucleotide polymorphisms (SNPs), or groups of related SNPs, and the phenotypes they control. The pipeline involves training deep convolutional neural networks (CNNs) to differentiate between images of plants with reference and alternate versions of various SNPs, and then using visualization approaches to highlight what the classification networks key on. We demonstrate the capacity of deep CNNs at performing this classification task, and show the utility of these visualizations on RGB imagery of biomass sorghum captured by the TERRA-REF gantry. We focus on several different genetic markers with known phenotypic expression, and discuss the possibilities of using this approach to uncover genotype x phenotype relationships. △ Less

Submitted 9 August, 2021; originally announced August 2021.

Comments: ICCV 2021 Workshop on Computer Vision Problems in Plant Phenotyping and Agriculture (CVPPA)

arXiv:2107.14072 [pdf, other]

What Does TERRA-REF's High Resolution, Multi Sensor Plant Sensing Public Domain Data Offer the Computer Vision Community?

Authors: David LeBauer, Max Burnette, Noah Fahlgren, Rob Kooper, Kenton McHenry, Abby Stylianou

Abstract: A core objective of the TERRA-REF project was to generate an open-access reference dataset for the evaluation of sensing technologies to study plants under field conditions. The TERRA-REF program deployed a suite of high-resolution, cutting edge technology sensors on a gantry system with the aim of scanning 1 hectare (10$^4$) at around 1 mm$^2$ spatial resolution multiple times per week. The syste… ▽ More A core objective of the TERRA-REF project was to generate an open-access reference dataset for the evaluation of sensing technologies to study plants under field conditions. The TERRA-REF program deployed a suite of high-resolution, cutting edge technology sensors on a gantry system with the aim of scanning 1 hectare (10$^4$) at around 1 mm$^2$ spatial resolution multiple times per week. The system contains co-located sensors including a stereo-pair RGB camera, a thermal imager, a laser scanner to capture 3D structure, and two hyperspectral cameras covering wavelengths of 300-2500nm. This sensor data is provided alongside over sixty types of traditional plant phenotype measurements that can be used to train new machine learning models. Associated weather and environmental measurements, information about agronomic management and experimental design, and the genomic sequences of hundreds of plant varieties have been collected and are available alongside the sensor and plant phenotype data. Over the course of four years and ten growing seasons, the TERRA-REF system generated over 1 PB of sensor data and almost 45 million files. The subset that has been released to the public domain accounts for two seasons and about half of the total data volume. This provides an unprecedented opportunity for investigations far beyond the core biological scope of the project. The focus of this paper is to provide the Computer Vision and Machine Learning communities an overview of the available data and some potential applications of this one of a kind data. △ Less

Submitted 17 August, 2021; v1 submitted 29 July, 2021; originally announced July 2021.

Comments: 7 pages, 4 figures, ICCV 2021 Workshop on Computer Vision Problems in Plant Phenotyping and Agriculture (CVPPA)

Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021, pp. 1409-1415

arXiv:2106.05748 [pdf, other]

Multi-resolution Outlier Pooling for Sorghum Classification

Authors: Chao Ren, Justin Dulay, Gregory Rolwes, Duke Pauli, Nadia Shakoor, Abby Stylianou

Abstract: Automated high throughput plant phenotyping involves leveraging sensors, such as RGB, thermal and hyperspectral cameras (among others), to make large scale and rapid measurements of the physical properties of plants for the purpose of better understanding the difference between crops and facilitating rapid plant breeding programs. One of the most basic phenotyping tasks is to determine the cultiva… ▽ More Automated high throughput plant phenotyping involves leveraging sensors, such as RGB, thermal and hyperspectral cameras (among others), to make large scale and rapid measurements of the physical properties of plants for the purpose of better understanding the difference between crops and facilitating rapid plant breeding programs. One of the most basic phenotyping tasks is to determine the cultivar, or species, in a particular sensor product. This simple phenotype can be used to detect errors in planting and to learn the most differentiating features between cultivars. It is also a challenging visual recognition task, as a large number of highly related crops are grown simultaneously, leading to a classification problem with low inter-class variance. In this paper, we introduce the Sorghum-100 dataset, a large dataset of RGB imagery of sorghum captured by a state-of-the-art gantry system, a multi-resolution network architecture that learns both global and fine-grained features on the crops, and a new global pooling strategy called Dynamic Outlier Pooling which outperforms standard global pooling strategies on this task. △ Less

Submitted 22 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

Comments: CVPR 2021 Agriculture-Vision Workshop

arXiv:2106.05746 [pdf, other]

The 2021 Hotel-ID to Combat Human Trafficking Competition Dataset

Authors: Rashmi Kamath, Gregory Rolwes, Samuel Black, Abby Stylianou

Abstract: Hotel recognition is an important task for human trafficking investigations since victims are often photographed in hotel rooms. Identifying these hotels is vital to trafficking investigations since they can help track down current and future victims who might be taken to the same places. Hotel recognition is a challenging fine grained visual classification task as there can be little similarity b… ▽ More Hotel recognition is an important task for human trafficking investigations since victims are often photographed in hotel rooms. Identifying these hotels is vital to trafficking investigations since they can help track down current and future victims who might be taken to the same places. Hotel recognition is a challenging fine grained visual classification task as there can be little similarity between different rooms within the same hotel, and high similarity between rooms from different hotels (especially if they are from the same chain). Hotel recognition to combat human trafficking poses additional challenges as investigative images are often low quality, contain uncommon camera angles and are highly occluded. Here, we present the 2021 Hotel-ID dataset to help raise awareness for this problem and generate novel approaches. The dataset consists of hotel room images that have been crowd-sourced and uploaded through the TraffickCam mobile application. The quality of these images is similar to investigative images and hence models trained on these images have good chances of accurately narrowing down on the correct hotel. △ Less

Submitted 14 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

Comments: CVPR 2021 Workshop on Fine-Grained Visual Categorization (FGVC)

arXiv:2007.12749 [pdf, other]

Hard negative examples are hard, but useful

Authors: Hong Xuan, Abby Stylianou, Xiaotong Liu, Robert Pless

Abstract: Triplet loss is an extremely common approach to distance metric learning. Representations of images from the same class are optimized to be mapped closer together in an embedding space than representations of images from different classes. Much work on triplet losses focuses on selecting the most useful triplets of images to consider, with strategies that select dissimilar examples from the same c… ▽ More Triplet loss is an extremely common approach to distance metric learning. Representations of images from the same class are optimized to be mapped closer together in an embedding space than representations of images from different classes. Much work on triplet losses focuses on selecting the most useful triplets of images to consider, with strategies that select dissimilar examples from the same class or similar examples from different classes. The consensus of previous research is that optimizing with the \textit{hardest} negative examples leads to bad training behavior. That's a problem -- these hardest negatives are literally the cases where the distance metric fails to capture semantic similarity. In this paper, we characterize the space of triplets and derive why hard negatives make triplet loss training fail. We offer a simple fix to the loss function and show that, with this fix, optimizing with hard negative examples becomes feasible. This leads to more generalizable features, and image retrieval results that outperform state of the art for datasets with high intra-class variance. △ Less

Submitted 25 February, 2021; v1 submitted 24 July, 2020; originally announced July 2020.

Comments: CV, Triplet loss, Image embedding, 14 pages, 9 figures, ECCV 2020

arXiv:1910.03455 [pdf, other]

TraffickCam: Explainable Image Matching For Sex Trafficking Investigations

Authors: Abby Stylianou, Richard Souvenir, Robert Pless

Abstract: Investigations of sex trafficking sometimes have access to photographs of victims in hotel rooms. These images directly link victims to places, which can help verify where victims have been trafficked or where traffickers might operate in the future. Current machine learning approaches give promising results in image search to find the matching hotel. This paper explores approaches to make this en… ▽ More Investigations of sex trafficking sometimes have access to photographs of victims in hotel rooms. These images directly link victims to places, which can help verify where victims have been trafficked or where traffickers might operate in the future. Current machine learning approaches give promising results in image search to find the matching hotel. This paper explores approaches to make this end-to-end system better support government and law enforcement requirements, including improved performance, visualization approaches that explain what parts of the image led to a match, and infrastructure to support exporting the results of a query. △ Less

Submitted 8 October, 2019; originally announced October 2019.

Comments: Presented at AAAI FSS-19: Artificial Intelligence in Government and Public Sector, Arlington, Virginia, USA

arXiv:1909.07464 [pdf, other]

Visualizing How Embeddings Generalize

Authors: Xiaotong Liu, Hong Xuan, Zeyu Zhang, Abby Stylianou, Robert Pless

Abstract: Deep metric learning is often used to learn an embedding function that captures the semantic differences within a dataset. A key factor in many problem domains is how this embedding generalizes to new classes of data. In observing many triplet selection strategies for Metric Learning, we find that the best performance consistently arises from approaches that focus on a few, well selected triplets.… ▽ More Deep metric learning is often used to learn an embedding function that captures the semantic differences within a dataset. A key factor in many problem domains is how this embedding generalizes to new classes of data. In observing many triplet selection strategies for Metric Learning, we find that the best performance consistently arises from approaches that focus on a few, well selected triplets.We introduce visualization tools to illustrate how an embedding generalizes beyond measuring accuracy on validation data, and we illustrate the behavior of a range of triplet selection strategies. △ Less

Submitted 16 September, 2019; originally announced September 2019.

Comments: 8 pages,4 figures, published in ICML workshop:Understanding and Improving Generalization in Deep Learning

arXiv:1904.04370 [pdf, other]

Improved Embeddings with Easy Positive Triplet Mining

Authors: Hong Xuan, Abby Stylianou, Robert Pless

Abstract: Deep metric learning seeks to define an embedding where semantically similar images are embedded to nearby locations, and semantically dissimilar images are embedded to distant locations. Substantial work has focused on loss functions and strategies to learn these embeddings by pushing images from the same class as close together in the embedding space as possible. In this paper, we propose an alt… ▽ More Deep metric learning seeks to define an embedding where semantically similar images are embedded to nearby locations, and semantically dissimilar images are embedded to distant locations. Substantial work has focused on loss functions and strategies to learn these embeddings by pushing images from the same class as close together in the embedding space as possible. In this paper, we propose an alternative, loosened embedding strategy that requires the embedding function only map each training image to the most similar examples from the same class, an approach we call "Easy Positive" mining. We provide a collection of experiments and visualizations that highlight that this Easy Positive mining leads to embeddings that are more flexible and generalize better to new unseen data. This simple mining strategy yields recall performance that exceeds state of the art approaches (including those with complicated loss functions and ensemble methods) on image retrieval datasets including CUB, Stanford Online Products, In-Shop Clothes and Hotels-50K. △ Less

Submitted 18 March, 2020; v1 submitted 8 April, 2019; originally announced April 2019.

arXiv:1901.11397 [pdf, other]

Hotels-50K: A Global Hotel Recognition Dataset

Authors: Abby Stylianou, Hong Xuan, Maya Shende, Jonathan Brandt, Richard Souvenir, Robert Pless

Abstract: Recognizing a hotel from an image of a hotel room is important for human trafficking investigations. Images directly link victims to places and can help verify where victims have been trafficked, and where their traffickers might move them or others in the future. Recognizing the hotel from images is challenging because of low image quality, uncommon camera perspectives, large occlusions (often th… ▽ More Recognizing a hotel from an image of a hotel room is important for human trafficking investigations. Images directly link victims to places and can help verify where victims have been trafficked, and where their traffickers might move them or others in the future. Recognizing the hotel from images is challenging because of low image quality, uncommon camera perspectives, large occlusions (often the victim), and the similarity of objects (e.g., furniture, art, bedding) across different hotel rooms. To support efforts towards this hotel recognition task, we have curated a dataset of over 1 million annotated hotel room images from 50,000 hotels. These images include professionally captured photographs from travel websites and crowd-sourced images from a mobile application, which are more similar to the types of images analyzed in real-world investigations. We present a baseline approach based on a standard network architecture and a collection of data-augmentation approaches tuned to this problem domain. △ Less

Submitted 26 January, 2019; originally announced January 2019.

arXiv:1901.00536 [pdf, other]

Visualizing Deep Similarity Networks

Authors: Abby Stylianou, Richard Souvenir, Robert Pless

Abstract: For convolutional neural network models that optimize an image embedding, we propose a method to highlight the regions of images that contribute most to pairwise similarity. This work is a corollary to the visualization tools developed for classification networks, but applicable to the problem domains better suited to similarity learning. The visualization shows how similarity networks that are fi… ▽ More For convolutional neural network models that optimize an image embedding, we propose a method to highlight the regions of images that contribute most to pairwise similarity. This work is a corollary to the visualization tools developed for classification networks, but applicable to the problem domains better suited to similarity learning. The visualization shows how similarity networks that are fine-tuned learn to focus on different features. We also generalize our approach to embedding networks that use different pooling strategies and provide a simple mechanism to support image similarity searches on objects or sub-regions in the query image. △ Less

Submitted 2 January, 2019; originally announced January 2019.

Showing 1–11 of 11 results for author: Stylianou, A