Search | arXiv e-print repository

arXiv:2404.19729 [pdf]

A Framework for Leveraging Human Computation Gaming to Enhance Knowledge Graphs for Accuracy Critical Generative AI Applications

Authors: Steph Buongiorno, Corey Clark

Abstract: External knowledge graphs (KGs) can be used to augment large language models (LLMs), while simultaneously providing an explainable knowledge base of facts that can be inspected by a human. This approach may be particularly valuable in domains where explainability is critical, like human trafficking data analysis. However, creating KGs can pose challenges. KGs parsed from documents may comprise exp… ▽ More External knowledge graphs (KGs) can be used to augment large language models (LLMs), while simultaneously providing an explainable knowledge base of facts that can be inspected by a human. This approach may be particularly valuable in domains where explainability is critical, like human trafficking data analysis. However, creating KGs can pose challenges. KGs parsed from documents may comprise explicit connections (those directly stated by a document) but miss implicit connections (those obvious to a human although not directly stated). To address these challenges, this preliminary research introduces the GAME-KG framework, standing for "Gaming for Augmenting Metadata and Enhancing Knowledge Graphs." GAME-KG is a federated approach to modifying explicit as well as implicit connections in KGs by using crowdsourced feedback collected through video games. GAME-KG is shown through two demonstrations: a Unity test scenario from Dark Shadows, a video game that collects feedback on KGs parsed from US Department of Justice (DOJ) Press Releases on human trafficking, and a following experiment where OpenAI's GPT-4 is prompted to answer questions based on a modified and unmodified KG. Initial results suggest that GAME-KG can be an effective framework for enhancing KGs, while simultaneously providing an explainable set of structured facts verified by humans. △ Less

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.19721 [pdf]

PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games

Authors: Steph Buongiorno, Lawrence Jake Klinkert, Tanishq Chawla, Zixin Zhuang, Corey Clark

Abstract: This research introduces Procedural Artificial Narrative using Generative AI (PANGeA), a structured approach for leveraging large language models (LLMs), guided by a game designer's high-level criteria, to generate narrative content for turn-based role-playing video games (RPGs). Distinct from prior applications of LLMs used for video game design, PANGeA innovates by not only generating game level… ▽ More This research introduces Procedural Artificial Narrative using Generative AI (PANGeA), a structured approach for leveraging large language models (LLMs), guided by a game designer's high-level criteria, to generate narrative content for turn-based role-playing video games (RPGs). Distinct from prior applications of LLMs used for video game design, PANGeA innovates by not only generating game level data (which includes, but is not limited to, setting, key items, and non-playable characters (NPCs)), but by also fostering dynamic, free-form interactions between the player and the environment that align with the procedural game narrative. The NPCs generated by PANGeA are personality-biased and express traits from the Big 5 Personality Model in their generated responses. PANGeA addresses challenges behind ingesting free-form text input, which can prompt LLM responses beyond the scope of the game narrative. A novel validation system that uses the LLM's intelligence evaluates text input and aligns generated responses with the unfolding narrative. Making these interactions possible, PANGeA is supported by a server that hosts a custom memory system that supplies context for augmenting generated responses thus aligning them with the procedural narrative. For its broad application, the server has a REST interface enabling any game engine to integrate directly with PANGeA, as well as an LLM interface adaptable with local or private LLMs. PANGeA's ability to foster dynamic narrative generation by aligning responses with the procedural narrative is demonstrated through an empirical study and ablation test of two versions of a demo game. These are, a custom, browser-based GPT and a Unity demo. As the results show, PANGeA holds potential to assist game designers in using LLMs to generate narrative-consistent content even when provided varied and unpredictable, free-form text input. △ Less

Submitted 9 July, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

arXiv:2402.14879 [pdf, other]

Driving Generative Agents With Their Personality

Authors: Lawrence J. Klinkert, Stephanie Buongiorno, Corey Clark

Abstract: This research explores the potential of Large Language Models (LLMs) to utilize psychometric values, specifically personality information, within the context of video game character development. Affective Computing (AC) systems quantify a Non-Player character's (NPC) psyche, and an LLM can take advantage of the system's information by using the values for prompt generation. The research shows an L… ▽ More This research explores the potential of Large Language Models (LLMs) to utilize psychometric values, specifically personality information, within the context of video game character development. Affective Computing (AC) systems quantify a Non-Player character's (NPC) psyche, and an LLM can take advantage of the system's information by using the values for prompt generation. The research shows an LLM can consistently represent a given personality profile, thereby enhancing the human-like characteristics of game characters. Repurposing a human examination, the International Personality Item Pool (IPIP) questionnaire, to evaluate an LLM shows that the model can accurately generate content concerning the personality provided. Results show that the improvement of LLM, such as the latest GPT-4 model, can consistently utilize and interpret a personality to represent behavior. △ Less

Submitted 21 February, 2024; originally announced February 2024.

Comments: 9 Pages, 4 figures, Draft

arXiv:2312.17172 [pdf, other]

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Authors: Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi

Abstract: We present Unified-IO 2, the first autoregressive multimodal model that is capable of understanding and generating image, text, audio, and action. To unify different modalities, we tokenize inputs and outputs -- images, text, audio, action, bounding boxes, etc., into a shared semantic space and then process them with a single encoder-decoder transformer model. Since training with such diverse moda… ▽ More We present Unified-IO 2, the first autoregressive multimodal model that is capable of understanding and generating image, text, audio, and action. To unify different modalities, we tokenize inputs and outputs -- images, text, audio, action, bounding boxes, etc., into a shared semantic space and then process them with a single encoder-decoder transformer model. Since training with such diverse modalities is challenging, we propose various architectural improvements to stabilize model training. We train our model from scratch on a large multimodal pre-training corpus from diverse sources with a multimodal mixture of denoisers objective. To learn an expansive set of skills, such as following multimodal instructions, we construct and finetune on an ensemble of 120 datasets with prompts and augmentations. With a single unified model, Unified-IO 2 achieves state-of-the-art performance on the GRIT benchmark and strong results in more than 35 benchmarks, including image generation and understanding, natural language understanding, video and audio understanding, and robotic manipulation. We release all our models to the research community. △ Less

Submitted 28 December, 2023; originally announced December 2023.

Comments: 38 pages, 20 figures

arXiv:2312.09067 [pdf, other]

Holodeck: Language Guided Generation of 3D Embodied AI Environments

Authors: Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, Christopher Clark

Abstract: 3D simulated environments play a critical role in Embodied AI, but their creation requires expertise and extensive manual effort, restricting their diversity and scope. To mitigate this limitation, we present Holodeck, a system that generates 3D environments to match a user-supplied prompt fully automatedly. Holodeck can generate diverse scenes, e.g., arcades, spas, and museums, adjust the designs… ▽ More 3D simulated environments play a critical role in Embodied AI, but their creation requires expertise and extensive manual effort, restricting their diversity and scope. To mitigate this limitation, we present Holodeck, a system that generates 3D environments to match a user-supplied prompt fully automatedly. Holodeck can generate diverse scenes, e.g., arcades, spas, and museums, adjust the designs for styles, and can capture the semantics of complex queries such as "apartment for a researcher with a cat" and "office of a professor who is a fan of Star Wars". Holodeck leverages a large language model (i.e., GPT-4) for common sense knowledge about what the scene might look like and uses a large collection of 3D assets from Objaverse to populate the scene with diverse objects. To address the challenge of positioning objects correctly, we prompt GPT-4 to generate spatial relational constraints between objects and then optimize the layout to satisfy those constraints. Our large-scale human evaluation shows that annotators prefer Holodeck over manually designed procedural baselines in residential scenes and that Holodeck can produce high-quality outputs for diverse scene types. We also demonstrate an exciting application of Holodeck in Embodied AI, training agents to navigate in novel scenes like music rooms and daycares without human-constructed data, which is a significant step forward in developing general-purpose embodied agents. △ Less

Submitted 22 April, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Published in CVPR 2024, 21 pages, 27 figures, 2 tables

arXiv:2311.03026 [pdf, other]

Detecting Agreement in Multi-party Conversational AI

Authors: Laura Schauer, Jason Sweeney, Charlie Lyttle, Zein Said, Aron Szeles, Cale Clark, Katie McAskill, Xander Wickham, Tom Byars, Daniel Hernández Garcia, Nancie Gunson, Angus Addlesee, Oliver Lemon

Abstract: Today, conversational systems are expected to handle conversations in multi-party settings, especially within Socially Assistive Robots (SARs). However, practical usability remains difficult as there are additional challenges to overcome, such as speaker recognition, addressee recognition, and complex turn-taking. In this paper, we present our work on a multi-party conversational system, which inv… ▽ More Today, conversational systems are expected to handle conversations in multi-party settings, especially within Socially Assistive Robots (SARs). However, practical usability remains difficult as there are additional challenges to overcome, such as speaker recognition, addressee recognition, and complex turn-taking. In this paper, we present our work on a multi-party conversational system, which invites two users to play a trivia quiz game. The system detects users' agreement or disagreement on a final answer and responds accordingly. Our evaluation includes both performance and user assessment results, with a focus on detecting user agreement. Our annotated transcripts and the code for the proposed system have been released open-source on GitHub. △ Less

Submitted 6 November, 2023; originally announced November 2023.

Comments: Proceedings of the workshop on advancing GROup UNderstanding and robots aDaptive behaviour (GROUND), 2023

arXiv:2310.16941 [pdf, other]

Exploring Behavior Discovery Methods for Heterogeneous Swarms of Limited-Capability Robots

Authors: Connor Mattson, Jeremy C. Clark, Daniel S. Brown

Abstract: We study the problem of determining the emergent behaviors that are possible given a functionally heterogeneous swarm of robots with limited capabilities. Prior work has considered behavior search for homogeneous swarms and proposed the use of novelty search over either a hand-specified or learned behavior space followed by clustering to return a taxonomy of emergent behaviors to the user. In this… ▽ More We study the problem of determining the emergent behaviors that are possible given a functionally heterogeneous swarm of robots with limited capabilities. Prior work has considered behavior search for homogeneous swarms and proposed the use of novelty search over either a hand-specified or learned behavior space followed by clustering to return a taxonomy of emergent behaviors to the user. In this paper, we seek to better understand the role of novelty search and the efficacy of using clustering to discover novel emergent behaviors. Through a large set of experiments and ablations, we analyze the effect of representations, evolutionary search, and various clustering methods in the search for novel behaviors in a heterogeneous swarm. Our results indicate that prior methods fail to discover many interesting behaviors and that an iterative human-in-the-loop discovery process discovers more behaviors than random search, swarm chemistry, and automated behavior discovery. The combined discoveries of our experiments uncover 23 emergent behaviors, 18 of which are novel discoveries. To the best of our knowledge, these are the first known emergent behaviors for heterogeneous swarms of computation-free agents. Videos, code, and appendix are available at the project website: https://sites.google.com/view/heterogeneous-bd-methods △ Less

Submitted 25 October, 2023; originally announced October 2023.

Comments: 11 pages, 9 figures, To be published in Proceedings IEEE International Symposium on Multi-Robot & Multi-Agent Systems (MRS 2023)

arXiv:2310.13140 [pdf, other]

Blind Evaluation Framework for Fully Homomorphic Encryption and Privacy-Preserving Machine Learning

Authors: Hunjae "Timothy" Lee, Corey Clark

Abstract: In the domain of Privacy-Preserving Machine Learning (PPML), Fully Homomorphic Encryption (FHE) is often used for encrypted computation to allow secure and privacy-preserving outsourcing of machine learning modeling. While FHE enables encrypted arithmetic operations, execution of programmatic logic such as control structures or conditional programming have remained a challenge. As a result, progre… ▽ More In the domain of Privacy-Preserving Machine Learning (PPML), Fully Homomorphic Encryption (FHE) is often used for encrypted computation to allow secure and privacy-preserving outsourcing of machine learning modeling. While FHE enables encrypted arithmetic operations, execution of programmatic logic such as control structures or conditional programming have remained a challenge. As a result, progress in encrypted training of PPML with FHE has been relatively stagnant compared to encrypted inference owing to the considerably higher logical complexity required in training. In addition, prior works that have demonstrated encrypted training use Interactive Rounds of Decryption and Evaluation (IRDE), where certain operations are decrypted and evaluated in plaintext using interactive rounds between the untrusted computing party (server) and the trusted private-key owner (client). In decision tree training for example, the current state-of-the-art requires d-rounds of IRDE for tree-depth of d. To address this issue in PPML and FHE, we introduce the Blind Evaluation Framework (BEF), a cryptographically secure programming framework that enables blind, but correct, execution of programming logic without IRDE. This is achieved by deconstructing programming logic into binary circuits and binary arithmetic to find alternative representations of logical statements, and adopting them to FHE for secure logical programming. To the best of our knowledge, this is the first framework to enable both training and inference of PPML models with FHE without decryption rounds. By advancing the state-of-the-art in IRDE efficiency by eliminating IRDE entirely, BEF enables adoption of FHE in use-cases where large amounts of computing services are available without the ability to have trusted clients available to perform decryption rounds. △ Less

Submitted 27 August, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: changes made from previous version include change in the name of the title and restructuring/re-organization of the original paper

arXiv:2306.04791 [pdf, other]

XInsight: Revealing Model Insights for GNNs with Flow-based Explanations

Authors: Eli Laird, Ayesh Madushanka, Elfi Kraka, Corey Clark

Abstract: Progress in graph neural networks has grown rapidly in recent years, with many new developments in drug discovery, medical diagnosis, and recommender systems. While this progress is significant, many networks are `black boxes' with little understanding of the `what' exactly the network is learning. Many high-stakes applications, such as drug discovery, require human-intelligible explanations from… ▽ More Progress in graph neural networks has grown rapidly in recent years, with many new developments in drug discovery, medical diagnosis, and recommender systems. While this progress is significant, many networks are `black boxes' with little understanding of the `what' exactly the network is learning. Many high-stakes applications, such as drug discovery, require human-intelligible explanations from the models so that users can recognize errors and discover new knowledge. Therefore, the development of explainable AI algorithms is essential for us to reap the benefits of AI. We propose an explainability algorithm for GNNs called eXplainable Insight (XInsight) that generates a distribution of model explanations using GFlowNets. Since GFlowNets generate objects with probabilities proportional to a reward, XInsight can generate a diverse set of explanations, compared to previous methods that only learn the maximum reward sample. We demonstrate XInsight by generating explanations for GNNs trained on two graph classification tasks: classifying mutagenic compounds with the MUTAG dataset and classifying acyclic graphs with a synthetic dataset that we have open-sourced. We show the utility of XInsight's explanations by analyzing the generated compounds using QSAR modeling, and we find that XInsight generates compounds that cluster by lipophilicity, a known correlate of mutagenicity. Our results show that XInsight generates a distribution of explanations that uncovers the underlying relationships demonstrated by the model. They also highlight the importance of generating a diverse set of explanations, as it enables us to discover hidden relationships in the model and provides valuable guidance for further analysis. △ Less

Submitted 7 June, 2023; originally announced June 2023.

Comments: eXplainable Artificial Intelligence. 1st World Conference on eXplainable Artificial Intelligence, xAI-2023, Lisbon, Portugal

arXiv:2303.16133 [pdf, other]

Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models

Authors: Adyasha Maharana, Amita Kamath, Christopher Clark, Mohit Bansal, Aniruddha Kembhavi

Abstract: As general purpose vision models get increasingly effective at a wide set of tasks, it is imperative that they be consistent across the tasks they support. Inconsistent AI models are considered brittle and untrustworthy by human users and are more challenging to incorporate into larger systems that take dependencies on their outputs. Measuring consistency between very heterogeneous tasks that migh… ▽ More As general purpose vision models get increasingly effective at a wide set of tasks, it is imperative that they be consistent across the tasks they support. Inconsistent AI models are considered brittle and untrustworthy by human users and are more challenging to incorporate into larger systems that take dependencies on their outputs. Measuring consistency between very heterogeneous tasks that might include outputs in different modalities is challenging since it is difficult to determine if the predictions are consistent with one another. As a solution, we introduce a benchmark dataset, CocoCon, where we create contrast sets by modifying test instances for multiple tasks in small but semantically meaningful ways to change the gold label and outline metrics for measuring if a model is consistent by ranking the original and perturbed instances across tasks. We find that state-of-the-art vision-language models suffer from a surprisingly high degree of inconsistent behavior across tasks, especially for more heterogeneous tasks. To alleviate this issue, we propose a rank correlation-based auxiliary training objective, computed over large automatically created cross-task contrast sets, that improves the multi-task consistency of large unified models while retaining their original accuracy on downstream tasks. △ Less

Submitted 21 February, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

Comments: TMLR 2024; Project Website: https://adymaharana.github.io/cococon/

arXiv:2211.09887 [pdf, other]

Spherical convolutional neural networks can improve brain microstructure estimation from diffusion MRI data

Authors: Leevi Kerkelä, Kiran Seunarine, Filip Szczepankiewicz, Chris A. Clark

Abstract: Diffusion magnetic resonance imaging is sensitive to the microstructural properties of brain tissue. However, estimating clinically and scientifically relevant microstructural properties from the measured signals remains a highly challenging inverse problem that machine learning may help solve. This study investigated if recently developed rotationally invariant spherical convolutional neural netw… ▽ More Diffusion magnetic resonance imaging is sensitive to the microstructural properties of brain tissue. However, estimating clinically and scientifically relevant microstructural properties from the measured signals remains a highly challenging inverse problem that machine learning may help solve. This study investigated if recently developed rotationally invariant spherical convolutional neural networks can improve microstructural parameter estimation. We trained a spherical convolutional neural network to predict the ground-truth parameter values from efficiently simulated noisy data and applied the trained network to imaging data acquired in a clinical setting to generate microstructural parameter maps. Our network performed better than the spherical mean technique and multi-layer perceptron, achieving higher prediction accuracy than the spherical mean technique with less rotational variance than the multi-layer perceptron. Although we focused on a constrained two-compartment model of neuronal tissue, the network and training pipeline are generalizable and can be used to estimate the parameters of any Gaussian compartment model. To highlight this, we also trained the network to predict the parameters of a three-compartment model that enables the estimation of apparent neural soma density using tensor-valued diffusion encoding. △ Less

Submitted 26 February, 2024; v1 submitted 17 November, 2022; originally announced November 2022.

arXiv:2211.09778 [pdf, other]

I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

Authors: Sophia Gu, Christopher Clark, Aniruddha Kembhavi

Abstract: Many high-level skills that are required for computer vision tasks, such as parsing questions, comparing and contrasting semantics, and writing descriptions, are also required in other domains such as natural language processing. In this paper, we ask whether it is possible to learn those skills from text data and then transfer them to vision tasks without ever training on visual training data. Ke… ▽ More Many high-level skills that are required for computer vision tasks, such as parsing questions, comparing and contrasting semantics, and writing descriptions, are also required in other domains such as natural language processing. In this paper, we ask whether it is possible to learn those skills from text data and then transfer them to vision tasks without ever training on visual training data. Key to our approach is exploiting the joint embedding space of contrastively trained vision and language encoders. In practice, there can be systematic differences between embedding spaces for different modalities in contrastive models, and we analyze how these differences affect our approach and study strategies to mitigate this concern. We produce models using only text training data on four representative tasks: image captioning, visual entailment, visual question answering and visual news captioning, and evaluate them on standard benchmarks using images. We find these models perform close to models trained on images, while surpassing prior work for captioning and visual entailment in this text-only setting by over 9 points, and outperforming all prior work on visual news by over 30 points. We also showcase a variety of stylistic image captioning models that are trained using no image data and no human-curated language data, but instead using readily-available text data from books, the web, or language models. △ Less

Submitted 18 August, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

Comments: website (https://prior.allenai.org/projects/close), code (https://github.com/allenai/close)

arXiv:2206.08916 [pdf, other]

Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

Authors: Jiasen Lu, Christopher Clark, Rowan Zellers, Roozbeh Mottaghi, Aniruddha Kembhavi

Abstract: We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical computer vision tasks, including pose estimation, object detection, depth estimation and image generation, vision-and-language tasks such as region captioning and referring expression, to natural language processing tasks such as question answering and paraphrasing. Developing a single unified model for suc… ▽ More We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical computer vision tasks, including pose estimation, object detection, depth estimation and image generation, vision-and-language tasks such as region captioning and referring expression, to natural language processing tasks such as question answering and paraphrasing. Developing a single unified model for such a large variety of tasks poses unique challenges due to the heterogeneous inputs and outputs pertaining to each task, including RGB images, per-pixel maps, binary masks, bounding boxes, and language. We achieve this unification by homogenizing every supported input and output into a sequence of discrete vocabulary tokens. This common representation across all tasks allows us to train a single transformer-based architecture, jointly on over 90 diverse datasets in the vision and language fields. Unified-IO is the first model capable of performing all 7 tasks on the GRIT benchmark and produces strong results across 16 diverse benchmarks like NYUv2-Depth, ImageNet, VQA2.0, OK-VQA, Swig, VizWizGround, BoolQ, and SciTail, with no task-specific fine-tuning. Code and demos for Unified-IO are available at: https://unified-io.allenai.org. △ Less

Submitted 4 October, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

arXiv:2206.01718 [pdf, other]

A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge

Authors: Dustin Schwenk, Apoorv Khandelwal, Christopher Clark, Kenneth Marino, Roozbeh Mottaghi

Abstract: The Visual Question Answering (VQA) task aspires to provide a meaningful testbed for the development of AI models that can jointly reason over visual and natural language inputs. Despite a proliferation of VQA datasets, this goal is hindered by a set of common limitations. These include a reliance on relatively simplistic questions that are repetitive in both concepts and linguistic structure, lit… ▽ More The Visual Question Answering (VQA) task aspires to provide a meaningful testbed for the development of AI models that can jointly reason over visual and natural language inputs. Despite a proliferation of VQA datasets, this goal is hindered by a set of common limitations. These include a reliance on relatively simplistic questions that are repetitive in both concepts and linguistic structure, little world knowledge needed outside of the paired image, and limited reasoning required to arrive at the correct answer. We introduce A-OKVQA, a crowdsourced dataset composed of a diverse set of about 25K questions requiring a broad base of commonsense and world knowledge to answer. In contrast to the existing knowledge-based VQA datasets, the questions generally cannot be answered by simply querying a knowledge base, and instead require some form of commonsense reasoning about the scene depicted in the image. We demonstrate the potential of this new dataset through a detailed analysis of its contents and baseline performance measurements over a variety of state-of-the-art vision-language models. Project page: http://a-okvqa.allenai.org/ △ Less

Submitted 3 June, 2022; originally announced June 2022.

arXiv:2203.11447 [pdf, other]

Manipulating UAV Imagery for Satellite Model Training, Calibration and Testing

Authors: Jasper Brown, Cameron Clark, Sabrina Lomax, Khalid Rafique, Salah Sukkarieh

Abstract: Modern livestock farming is increasingly data driven and frequently relies on efficient remote sensing to gather data over wide areas. High resolution satellite imagery is one such data source, which is becoming more accessible for farmers as coverage increases and cost falls. Such images can be used to detect and track animals, monitor pasture changes, and understand land use. Many of the data dr… ▽ More Modern livestock farming is increasingly data driven and frequently relies on efficient remote sensing to gather data over wide areas. High resolution satellite imagery is one such data source, which is becoming more accessible for farmers as coverage increases and cost falls. Such images can be used to detect and track animals, monitor pasture changes, and understand land use. Many of the data driven models being applied to these tasks require ground truthing at resolutions higher than satellites can provide. Simultaneously, there is a lack of available aerial imagery focused on farmland changes that occur over days or weeks, such as herd movement. With this goal in mind, we present a new multi-temporal dataset of high resolution UAV imagery which is artificially degraded to match satellite data quality. An empirical blurring metric is used to calibrate the degradation process against actual satellite imagery of the area. UAV surveys were flown repeatedly over several weeks, for specific farm locations. This 5cm/pixel data is sufficiently high resolution to accurately ground truth cattle locations, and other factors such as grass cover. From 33 wide area UAV surveys, 1869 patches were extracted and artificially degraded using an accurate satellite optical model to simulate satellite data. Geographic patches from multiple time periods are aligned and presented as sets, providing a multi-temporal dataset that can be used for detecting changes on farms. The geo-referenced images and 27,853 manually annotated cattle labels are made publicly available. △ Less

Submitted 21 March, 2022; originally announced March 2022.

Comments: 16 pages, 7 figures, 2 tables

arXiv:2202.02317 [pdf, other]

Webly Supervised Concept Expansion for General Purpose Vision Models

Authors: Amita Kamath, Christopher Clark, Tanmay Gupta, Eric Kolve, Derek Hoiem, Aniruddha Kembhavi

Abstract: General Purpose Vision (GPV) systems are models that are designed to solve a wide array of visual tasks without requiring architectural changes. Today, GPVs primarily learn both skills and concepts from large fully supervised datasets. Scaling GPVs to tens of thousands of concepts by acquiring data to learn each concept for every skill quickly becomes prohibitive. This work presents an effective a… ▽ More General Purpose Vision (GPV) systems are models that are designed to solve a wide array of visual tasks without requiring architectural changes. Today, GPVs primarily learn both skills and concepts from large fully supervised datasets. Scaling GPVs to tens of thousands of concepts by acquiring data to learn each concept for every skill quickly becomes prohibitive. This work presents an effective and inexpensive alternative: learn skills from supervised datasets, learn concepts from web image search, and leverage a key characteristic of GPVs: the ability to transfer visual knowledge across skills. We use a dataset of 1M+ images spanning 10k+ visual concepts to demonstrate webly-supervised concept expansion for two existing GPVs (GPV-1 and VL-T5) on 3 benchmarks: 5 COCO-based datasets (80 primary concepts), a newly curated series of 5 datasets based on the OpenImages and VisualGenome repositories (~500 concepts), and the Web-derived dataset (10k+ concepts). We also propose a new architecture, GPV-2 that supports a variety of tasks -- from vision tasks like classification and localization to vision+language tasks like QA and captioning, to more niche ones like human-object interaction detection. GPV-2 benefits hugely from web data and outperforms GPV-1 and VL-T5 across these benchmarks. Our data, code, and web demo are available at https://prior.allenai.org/projects/gpv2. △ Less

Submitted 20 July, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

Comments: ECCV 2022

arXiv:2112.00800 [pdf, other]

Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text

Authors: Christopher Clark, Jordi Salvador, Dustin Schwenk, Derrick Bonafilia, Mark Yatskar, Eric Kolve, Alvaro Herrasti, Jonghyun Choi, Sachin Mehta, Sam Skjonsberg, Carissa Schoenick, Aaron Sarnat, Hannaneh Hajishirzi, Aniruddha Kembhavi, Oren Etzioni, Ali Farhadi

Abstract: Communicating with humans is challenging for AIs because it requires a shared understanding of the world, complex semantics (e.g., metaphors or analogies), and at times multi-modal gestures (e.g., pointing with a finger, or an arrow in a diagram). We investigate these challenges in the context of Iconary, a collaborative game of drawing and guessing based on Pictionary, that poses a novel challeng… ▽ More Communicating with humans is challenging for AIs because it requires a shared understanding of the world, complex semantics (e.g., metaphors or analogies), and at times multi-modal gestures (e.g., pointing with a finger, or an arrow in a diagram). We investigate these challenges in the context of Iconary, a collaborative game of drawing and guessing based on Pictionary, that poses a novel challenge for the research community. In Iconary, a Guesser tries to identify a phrase that a Drawer is drawing by composing icons, and the Drawer iteratively revises the drawing to help the Guesser in response. This back-and-forth often uses canonical scenes, visual metaphor, or icon compositions to express challenging words, making it an ideal test for mixing language and visual/symbolic communication in AI. We propose models to play Iconary and train them on over 55,000 games between human players. Our models are skillful players and are able to employ world knowledge in language models to play with words unseen during training. Elite human players outperform our models, particularly at the drawing task, leaving an important gap for future research to address. We release our dataset, code, and evaluation setup as a challenge to the community at http://www.github.com/allenai/iconary. △ Less

Submitted 1 December, 2021; originally announced December 2021.

Comments: In EMNLP 2021

arXiv:2110.01329 [pdf, other]

doi 10.1016/j.compag.2022.106689

Automated Aerial Animal Detection When Spatial Resolution Conditions Are Varied

Authors: Jasper Brown, Yongliang Qiao, Cameron Clark, Sabrina Lomax, Khalid Rafique, Salah Sukkarieh

Abstract: Knowing where livestock are located enables optimized management and mustering. However, Australian farms are large meaning that many of Australia's livestock are unmonitored which impacts farm profit, animal welfare and the environment. Effective animal localisation and counting by analysing satellite imagery overcomes this management hurdle however, high resolution satellite imagery is expensive… ▽ More Knowing where livestock are located enables optimized management and mustering. However, Australian farms are large meaning that many of Australia's livestock are unmonitored which impacts farm profit, animal welfare and the environment. Effective animal localisation and counting by analysing satellite imagery overcomes this management hurdle however, high resolution satellite imagery is expensive. Thus, to minimise cost the lowest spatial resolution data that enables accurate livestock detection should be selected. In our work, we determine the association between object detector performance and spatial degradation for cattle, sheep and dogs. Accurate ground truth was established using high resolution drone images which were then downsampled to various ground sample distances (GSDs). Both circular and cassegrain aperture optics were simulated to generate point spread functions (PSFs) corresponding to various optical qualities. By simulating the PSF, rather than approximating it as a Gaussian, the images were accurately degraded to match the spatial resolution and blurring structure of satellite imagery. Two existing datasets were combined and used to train and test a YoloV5 object detection network. Detector performance was found to drop steeply around a GSD of 0.5m/px and was associated with PSF matrix structure within this GSD region. Detector mAP performance fell by 52 percent when a cassegrain, rather than circular, aperture was used at a 0.5m/px GSD. Overall blurring magnitude also had a small impact when matched to GSD, as did the internal network resolution. Our results here inform the selection of remote sensing data requirements for animal detection tasks, allowing farmers and ecologists to use more accessible medium resolution imagery with confidence. △ Less

Submitted 4 October, 2021; originally announced October 2021.

Comments: 20 pages, 9 figures, 4 tables in appendix

Journal ref: Computers and Electronics in Agriculture, Volume 193, 2022

arXiv:2102.04958 [pdf]

Hallmarks of Human-Machine Collaboration: A framework for assessment in the DARPA Communicating with Computers Program

Authors: Robyn Kozierok, John Aberdeen, Cheryl Clark, Christopher Garay, Bradley Goodman, Tonia Korves, Lynette Hirschman, Patricia L. McDermott, Matthew W. Peterson

Abstract: There is a growing desire to create computer systems that can communicate effectively to collaborate with humans on complex, open-ended activities. Assessing these systems presents significant challenges. We describe a framework for evaluating systems engaged in open-ended complex scenarios where evaluators do not have the luxury of comparing performance to a single right answer. This framework ha… ▽ More There is a growing desire to create computer systems that can communicate effectively to collaborate with humans on complex, open-ended activities. Assessing these systems presents significant challenges. We describe a framework for evaluating systems engaged in open-ended complex scenarios where evaluators do not have the luxury of comparing performance to a single right answer. This framework has been used to evaluate human-machine creative collaborations across story and music generation, interactive block building, and exploration of molecular mechanisms in cancer. These activities are fundamentally different from the more constrained tasks performed by most contemporary personal assistants as they are generally open-ended, with no single correct solution, and often no obvious completion criteria. We identified the Key Properties that must be exhibited by successful systems. From there we identified "Hallmarks" of success -- capabilities and features that evaluators can observe that would be indicative of progress toward achieving a Key Property. In addition to being a framework for assessment, the Key Properties and Hallmarks are intended to serve as goals in guiding research direction. △ Less

Submitted 9 February, 2021; originally announced February 2021.

Comments: 20 pages, 21 figures

Report number: MITRE Document Number: MTR210002 ACM Class: I.2.11; I.2.7; H.5.2

arXiv:2011.03856 [pdf, other]

Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles

Authors: Christopher Clark, Mark Yatskar, Luke Zettlemoyer

Abstract: Many datasets have been shown to contain incidental correlations created by idiosyncrasies in the data collection process. For example, sentence entailment datasets can have spurious word-class correlations if nearly all contradiction sentences contain the word "not", and image recognition datasets can have tell-tale object-background correlations if dogs are always indoors. In this paper, we prop… ▽ More Many datasets have been shown to contain incidental correlations created by idiosyncrasies in the data collection process. For example, sentence entailment datasets can have spurious word-class correlations if nearly all contradiction sentences contain the word "not", and image recognition datasets can have tell-tale object-background correlations if dogs are always indoors. In this paper, we propose a method that can automatically detect and ignore these kinds of dataset-specific patterns, which we call dataset biases. Our method trains a lower capacity model in an ensemble with a higher capacity model. During training, the lower capacity model learns to capture relatively shallow correlations, which we hypothesize are likely to reflect dataset bias. This frees the higher capacity model to focus on patterns that should generalize better. We ensure the models learn non-overlapping approaches by introducing a novel method to make them conditionally independent. Importantly, our approach does not require the bias to be known in advance. We evaluate performance on synthetic datasets, and four datasets built to penalize models that exploit known biases on textual entailment, visual question answering, and image recognition tasks. We show improvement in all settings, including a 10 point gain on the visual question answering dataset. △ Less

Submitted 7 November, 2020; originally announced November 2020.

Comments: In EMNLP Findings

arXiv:1912.09503 [pdf, other]

Multi-Robot Path Planning Via Genetic Programming

Authors: Alexandre Trudeau, Christopher M. Clark

Abstract: This paper presents a Genetic Programming (GP) approach to solving multi-robot path planning (MRPP) problems in single-lane workspaces, specifically those easily mapped to graph representations. GP's versatility enables this approach to produce programs optimizing for multiple attributes rather than a single attribute such as path length or completeness. When optimizing for the number of time step… ▽ More This paper presents a Genetic Programming (GP) approach to solving multi-robot path planning (MRPP) problems in single-lane workspaces, specifically those easily mapped to graph representations. GP's versatility enables this approach to produce programs optimizing for multiple attributes rather than a single attribute such as path length or completeness. When optimizing for the number of time steps needed to solve individual MRPP problems, the GP constructed programs outperformed complete MRPP algorithms, i.e. Push-Swap-Wait (PSW), by $54.1\%$. The GP constructed programs also consistently outperformed PSW in solving problems that did not meet PSW's completeness conditions. Furthermore, the GP constructed programs exhibited a greater capacity for scaling than PSW as the number of robots navigating within an MRPP environment increased. This research illustrates the benefits of using Genetic Programming for solving individual MRPP problems, including instances in which the number of robots exceeds the number of leaves in the tree-modeled workspace. △ Less

Submitted 19 December, 2019; originally announced December 2019.

Comments: ARMS 2019 Workshop (AAMAS)

arXiv:1911.00417 [pdf, other]

doi 10.33682/ts6e-sn53

Long-distance Detection of Bioacoustic Events with Per-channel Energy Normalization

Authors: Vincent Lostanlen, Kaitlin Palmer, Elly Knight, Christopher Clark, Holger Klinck, Andrew Farnsworth, Tina Wong, Jason Cramer, Juan Pablo Bello

Abstract: This paper proposes to perform unsupervised detection of bioacoustic events by pooling the magnitudes of spectrogram frames after per-channel energy normalization (PCEN). Although PCEN was originally developed for speech recognition, it also has beneficial effects in enhancing animal vocalizations, despite the presence of atmospheric absorption and intermittent noise. We prove that PCEN generalize… ▽ More This paper proposes to perform unsupervised detection of bioacoustic events by pooling the magnitudes of spectrogram frames after per-channel energy normalization (PCEN). Although PCEN was originally developed for speech recognition, it also has beneficial effects in enhancing animal vocalizations, despite the presence of atmospheric absorption and intermittent noise. We prove that PCEN generalizes logarithm-based spectral flux, yet with a tunable time scale for background noise estimation. In comparison with pointwise logarithm, PCEN reduces false alarm rate by 50x in the near field and 5x in the far field, both on avian and marine bioacoustic datasets. Such improvements come at moderate computational cost and require no human intervention, thus heralding a promising future for PCEN in bioacoustics. △ Less

Submitted 1 November, 2019; originally announced November 2019.

Comments: 5 pages, 3 figures. Presented at the 3rd International Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE). 25--26 October 2019, New York, NY, USA

arXiv:1909.03683 [pdf, other]

Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases

Authors: Christopher Clark, Mark Yatskar, Luke Zettlemoyer

Abstract: State-of-the-art models often make use of superficial patterns in the data that do not generalize well to out-of-domain or adversarial settings. For example, textual entailment models often learn that particular key words imply entailment, irrespective of context, and visual question answering models learn to predict prototypical answers, without considering evidence in the image. In this paper, w… ▽ More State-of-the-art models often make use of superficial patterns in the data that do not generalize well to out-of-domain or adversarial settings. For example, textual entailment models often learn that particular key words imply entailment, irrespective of context, and visual question answering models learn to predict prototypical answers, without considering evidence in the image. In this paper, we show that if we have prior knowledge of such biases, we can train a model to be more robust to domain shift. Our method has two stages: we (1) train a naive model that makes predictions exclusively based on dataset biases, and (2) train a robust model as part of an ensemble with the naive one in order to encourage it to focus on other patterns in the data that are more likely to generalize. Experiments on five datasets with out-of-domain test sets show significantly improved robustness in all settings, including a 12 point gain on a changing priors visual question answering dataset and a 9 point gain on an adversarial question answering test set. △ Less

Submitted 9 September, 2019; originally announced September 2019.

Comments: In EMNLP 2019

arXiv:1908.09183 [pdf, other]

Deriving a Quantitative Relationship Between Resolution and Human Classification Error

Authors: Josiah I. Clark, Caroline A. Clark

Abstract: For machine learning perception problems, human-level classification performance is used as an estimate of top algorithm performance. Thus, it is important to understand as precisely as possible the factors that impact human-level performance. Knowing this 1) provides a benchmark for model performance, 2) tells a project manager what type of data to obtain for human labelers in order to get accura… ▽ More For machine learning perception problems, human-level classification performance is used as an estimate of top algorithm performance. Thus, it is important to understand as precisely as possible the factors that impact human-level performance. Knowing this 1) provides a benchmark for model performance, 2) tells a project manager what type of data to obtain for human labelers in order to get accurate labels, and 3) enables ground-truth analysis--largely conducted by humans--to be carried out smoothly. In this empirical study, we explored the relationship between resolution and human classification performance using the MNIST data set down-sampled to various resolutions. The quantitative heuristic we derived could prove useful for predicting machine model performance, predicting data storage requirements, and saving valuable resources in the deployment of machine learning projects. It also has the potential to be used in a wide variety of fields such as remote sensing, medical imaging, scientific imaging, and astronomy. △ Less

Submitted 24 August, 2019; originally announced August 2019.

arXiv:1905.10044 [pdf, ps, other]

BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions

Authors: Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, Kristina Toutanova

Abstract: In this paper we study yes/no questions that are naturally occurring --- meaning that they are generated in unprompted and unconstrained settings. We build a reading comprehension dataset, BoolQ, of such questions, and show that they are unexpectedly challenging. They often query for complex, non-factoid information, and require difficult entailment-like inference to solve. We also explore the eff… ▽ More In this paper we study yes/no questions that are naturally occurring --- meaning that they are generated in unprompted and unconstrained settings. We build a reading comprehension dataset, BoolQ, of such questions, and show that they are unexpectedly challenging. They often query for complex, non-factoid information, and require difficult entailment-like inference to solve. We also explore the effectiveness of a range of transfer learning baselines. We find that transferring from entailment data is more effective than transferring from paraphrase or extractive QA data, and that it, surprisingly, continues to be very beneficial even when starting from massive pre-trained language models such as BERT. Our best method trains BERT on MultiNLI and then re-trains it on our train set. It achieves 80.4% accuracy compared to 90% accuracy of human annotators (and 62% majority-baseline), leaving a significant gap for future work. △ Less

Submitted 24 May, 2019; originally announced May 2019.

Comments: In NAACL 2019

arXiv:1810.02320 [pdf, other]

doi 10.1080/01431161.2019.1674462

Computer vision-based framework for extracting geological lineaments from optical remote sensing data

Authors: Ehsan Farahbakhsh, Rohitash Chandra, Hugo K. H. Olierook, Richard Scalzo, Chris Clark, Steven M. Reddy, R. Dietmar Muller

Abstract: The extraction of geological lineaments from digital satellite data is a fundamental application in remote sensing. The location of geological lineaments such as faults and dykes are of interest for a range of applications, particularly because of their association with hydrothermal mineralization. Although a wide range of applications have utilized computer vision techniques, a standard workflow… ▽ More The extraction of geological lineaments from digital satellite data is a fundamental application in remote sensing. The location of geological lineaments such as faults and dykes are of interest for a range of applications, particularly because of their association with hydrothermal mineralization. Although a wide range of applications have utilized computer vision techniques, a standard workflow for application of these techniques to mineral exploration is lacking. We present a framework for extracting geological lineaments using computer vision techniques which is a combination of edge detection and line extraction algorithms for extracting geological lineaments using optical remote sensing data. It features ancillary computer vision techniques for reducing data dimensionality, removing noise and enhancing the expression of lineaments. We test the proposed framework on Landsat 8 data of a mineral-rich portion of the Gascoyne Province in Western Australia using different dimension reduction techniques and convolutional filters. To validate the results, the extracted lineaments are compared to our manual photointerpretation and geologically mapped structures by the Geological Survey of Western Australia (GSWA). The results show that the best correlation between our extracted geological lineaments and the GSWA geological lineament map is achieved by applying a minimum noise fraction transformation and a Laplacian filter. Application of a directional filter instead shows a stronger correlation with the output of our manual photointerpretation and known sites of hydrothermal mineralization. Hence, our framework using either filter can be used for mineral prospectivity mapping in other regions where faults are exposed and observable in optical remote sensing data. △ Less

Submitted 4 October, 2018; originally announced October 2018.

Comments: 17 pages, 10 figures, 2 tables

arXiv:1802.08872 [pdf]

Deep learning for conifer/deciduous classification of airborne LiDAR 3D point clouds representing individual trees

Authors: Hamid Hamraz, Nathan B. Jacobs, Marco A. Contreras, Chase H. Clark

Abstract: The purpose of this study was to investigate the use of deep learning for coniferous/deciduous classification of individual trees from airborne LiDAR data. To enable efficient processing by a deep convolutional neural network (CNN), we designed two discrete representations using leaf-off and leaf-on LiDAR data: a digital surface model with four channels (DSMx4) and a set of four 2D views (4x2D). A… ▽ More The purpose of this study was to investigate the use of deep learning for coniferous/deciduous classification of individual trees from airborne LiDAR data. To enable efficient processing by a deep convolutional neural network (CNN), we designed two discrete representations using leaf-off and leaf-on LiDAR data: a digital surface model with four channels (DSMx4) and a set of four 2D views (4x2D). A training dataset of labeled tree crowns was generated via segmentation of tree crowns, followed by co-registration with field data. Potential mislabels due to GPS error or tree leaning were corrected using a statistical ensemble filtering procedure. Because the training data was heavily unbalanced (~8% conifers), we trained an ensemble of CNNs on random balanced sub-samples of augmented data (180 rotational variations per instance). The 4x2D representation yielded similar classification accuracies to the DSMx4 representation (~82% coniferous and ~90% deciduous) while converging faster. The data augmentation improved the classification accuracies, but more real training instances (especially coniferous) likely results in much stronger improvements. Leaf-off LiDAR data were the primary source of useful information, which is likely due to the perennial nature of coniferous foliage. LiDAR intensity values also proved to be useful, but normalization yielded no significant improvements. Lastly, the classification accuracies of overstory trees (~90%) were more balanced than those of understory trees (~90% deciduous and ~65% coniferous), which is likely due to the incomplete capture of understory tree crowns via airborne LiDAR. Automatic derivation of optimal features via deep learning provide the opportunity for remarkable improvements in prediction tasks where captured data are not friendly to human visual system - likely yielding sub-optimal human-designed features. △ Less

Submitted 24 February, 2018; originally announced February 2018.

Comments: Under review as of the date of submission

arXiv:1802.05365 [pdf, other]

Deep contextualized word representations

Authors: Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer

Abstract: We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show th… ▽ More We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals. △ Less

Submitted 22 March, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

Comments: NAACL 2018. Originally posted to openreview 27 Oct 2017. v2 updated for NAACL camera ready

arXiv:1710.10723 [pdf, other]

Simple and Effective Multi-Paragraph Reading Comprehension

Authors: Christopher Clark, Matt Gardner

Abstract: We consider the problem of adapting neural paragraph-level question answering models to the case where entire documents are given as input. Our proposed solution trains models to produce well calibrated confidence scores for their results on individual paragraphs. We sample multiple paragraphs from the documents during training, and use a shared-normalization training objective that encourages the… ▽ More We consider the problem of adapting neural paragraph-level question answering models to the case where entire documents are given as input. Our proposed solution trains models to produce well calibrated confidence scores for their results on individual paragraphs. We sample multiple paragraphs from the documents during training, and use a shared-normalization training objective that encourages the model to produce globally correct output. We combine this method with a state-of-the-art pipeline for training models on document QA data. Experiments demonstrate strong performance on several document QA datasets. Overall, we are able to achieve a score of 71.3 F1 on the web portion of TriviaQA, a large improvement from the 56.7 F1 of the previous best system. △ Less

Submitted 7 November, 2017; v1 submitted 29 October, 2017; originally announced October 2017.

Comments: 11 pages, updated a reference

arXiv:1704.04760 [pdf]

In-Datacenter Performance Analysis of a Tensor Processing Unit

Authors: Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg , et al. (50 additional authors not shown)

Abstract: Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOp… ▽ More Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters' NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X - 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X - 80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU. △ Less

Submitted 16 April, 2017; originally announced April 2017.

Comments: 17 pages, 11 figures, 8 tables. To appear at the 44th International Symposium on Computer Architecture (ISCA), Toronto, Canada, June 24-28, 2017

arXiv:1607.08482 [pdf]

Early and Late Time Acoustic Measures for Underwater Seismic Airgun Signals In Long-Term Acoustic Data Sets

Authors: Peter Dugan, Melania Guerra, Dimitri Ponirakis, Holger Klinck, Christopher W. Clark

Abstract: This work presents a new toolkit for describing the acoustic properties of the ocean environment before, during and after a sound event caused by an underwater seismic air-gun. The toolkit uses existing sound measures, but uniquely applies these to capture the early time period (actual pulse) and late time period (reverberation and multiple arrivals). In total, 183 features are produced for each a… ▽ More This work presents a new toolkit for describing the acoustic properties of the ocean environment before, during and after a sound event caused by an underwater seismic air-gun. The toolkit uses existing sound measures, but uniquely applies these to capture the early time period (actual pulse) and late time period (reverberation and multiple arrivals). In total, 183 features are produced for each air-gun sound. This toolkit was utilized on data retrieved from a field deployment encompassing five marine autonomous recording units during a 46-day seismic air-gun survey in Baffin Bay, Greenland. Using this toolkit, a total of 147 million data points were identified from the Greenland deployment recordings. The feasibility of extracting a large number of features was then evaluated using two separate methods: a serial computer and a high performance system. Results indicate that data extraction performance took an estimated 216 hours for the serial system, and 18 hours for the high performance computer. This paper provides an analytical description of the new toolkit along with details for using it to identify relevant data. △ Less

Submitted 5 May, 2016; originally announced July 2016.

Comments: Camera copy version of the paper for publication in IEEE explore. Paper was withdrawn by the co-authors for submission to JASA Express Letters

arXiv:1605.00983 [pdf]

Phase 3: DCL System Using Deep Learning Approaches for Land-based or Ship-based Real-Time Recognition and Localization of Marine Mammals - Bioacoustic Applicaitons

Authors: Peter J. Dugan, Christopher W. Clark, Yann André LeCun, Sofie M. Van Parijs

Abstract: Goals of this research phase is to investigate advanced detection and classification pardims useful for data-mining passive large passive acoustic archives. Technical objectives are to develop and refine a High Performance Computing, Acoustic Data Accelerator (HPC-ADA) along with MATLAB based software based on time series acoustic signal Detection cLassification using Machine learning Algorithms,… ▽ More Goals of this research phase is to investigate advanced detection and classification pardims useful for data-mining passive large passive acoustic archives. Technical objectives are to develop and refine a High Performance Computing, Acoustic Data Accelerator (HPC-ADA) along with MATLAB based software based on time series acoustic signal Detection cLassification using Machine learning Algorithms, called DeLMA. Data scientists and biologists integrate to use the HPC-ADA and DeLMA technologies to explore data using newly developed techniques aimed at inspection of data extracted at large spatial and temporal scales. △ Less

Submitted 5 May, 2016; v1 submitted 3 May, 2016; originally announced May 2016.

Comments: National Oceanic Partnership Program (NOPP) sponsored by ONR and NFWF

Report number: N000141210585

arXiv:1605.00982 [pdf]

Phase 4: DCL System Using Deep Learning Approaches for Land-Based or Ship-Based Real-Time Recognition and Localization of Marine Mammals - Distributed Processing and Big Data Applications

Authors: Peter J. Dugan, Christopher W. Clark, Yann André LeCun, Sofie M. Van Parijs

Abstract: While the animal bioacoustics community at large is collecting huge amounts of acoustic data at an unprecedented pace, processing these data is problematic. Currently in bioacoustics, there is no effective way to achieve high performance computing using commericial off the shelf (COTS) or government off the shelf (GOTS) tools. Although several advances have been made in the open source and commerc… ▽ More While the animal bioacoustics community at large is collecting huge amounts of acoustic data at an unprecedented pace, processing these data is problematic. Currently in bioacoustics, there is no effective way to achieve high performance computing using commericial off the shelf (COTS) or government off the shelf (GOTS) tools. Although several advances have been made in the open source and commercial software community, these offerings either support specific applications that do not integrate well with data formats in bioacoustics or they are too general. Furthermore, complex algorithms that use deep learning strategies require special considerations, such as very large libraiers of exemplars (whale sounds) readily available for algorithm training and testing. Detection-classification for passive acoustics is a data-mining strategy and our goals are aligned with best practices that appeal to the general data mining and machine learning communities where the problem of processing large data is common. Therefore, the objective of this work is to advance the state-of-the art for data-mining large passive acoustic datasets as they pertain to bioacoustics. With this basic deficiency recognized at the forefront, portions of the grant were dedicated to fostering deep-learning by way of international competitions (kaggle.com) meant to attract deep-learning solutions. The focus of this early work was targeted to make significant progress in addressing big data systems and advanced algorithms over the duration of the grant from 2012 to 2015. This early work provided simulataneous advances in systems-algorithms research while supporting various collaborations and projects. △ Less

Submitted 5 May, 2016; v1 submitted 3 May, 2016; originally announced May 2016.

Comments: National Oceanic Partnership Program (NOPP) sponsored by ONR and NFWF

Report number: N000141210585

arXiv:1605.00972 [pdf]

Phase 2: DCL System Using Deep Learning Approaches for Land-based or Ship-based Real-Time Recognition and Localization of Marine Mammals - Machine Learning Detection Algorithms

Authors: Peter J. Dugan, Christopher W. Clark, Yann André LeCun, Sofie M. Van Parijs

Abstract: Overarching goals for this work aim to advance the state of the art for detection, classification and localization (DCL) in the field of bioacoustics. This goal is primarily achieved by building a generic framework for detection-classification (DC) using a fast, efficient and scalable architecture, demonstrating the capabilities of this system using on a variety of low-frequency mid-frequency ceta… ▽ More Overarching goals for this work aim to advance the state of the art for detection, classification and localization (DCL) in the field of bioacoustics. This goal is primarily achieved by building a generic framework for detection-classification (DC) using a fast, efficient and scalable architecture, demonstrating the capabilities of this system using on a variety of low-frequency mid-frequency cetacean sounds. Two primary goals are to develop transferable technologies for detection and classification in, one: the area of advanced algorithms, such as deep learning and other methods; and two: advanced systems, capable of real-time and archival processing. For each key area, we will focus on producing publications from this work and providing tools and software to the community where/when possible. Currently massive amounts of acoustic data are being collected by various institutions, corporations and national defense agencies. The long-term goal is to provide technical capability to analyze the data using automatic algorithms for (DC) based on machine intelligence. The goal of the automation is to provide effective and efficient mechanisms by which to process large acoustic datasets for understanding the bioacoustic behaviors of marine mammals. This capability will provide insights into the potential ecological impacts and influences of anthropogenic ocean sounds. This work focuses on building technologies using a maturity model based on DARPA 6.1 and 6.2 processes, for basic and applied research, respectively. △ Less

Submitted 5 May, 2016; v1 submitted 3 May, 2016; originally announced May 2016.

Comments: National Oceanic Partnership Program (NOPP) sponsored by ONR and NFWF: N000141210585

Report number: N000141210585

arXiv:1605.00971 [pdf]

Phase 1: DCL System Research Using Advanced Approaches for Land-based or Ship-based Real-Time Recognition and Localization of Marine Mammals - HPC System Implementation

Authors: Peter J. Dugan, Christopher W. Clark, Yann André LeCun, Sofie M. Van Parijs

Abstract: We aim to investigate advancing the state of the art of detection, classification and localization (DCL) in the field of bioacoustics. The two primary goals are to develop transferable technologies for detection and classification in: (1) the area of advanced algorithms, such as deep learning and other methods; and (2) advanced systems, capable of real-time and archival and processing. This projec… ▽ More We aim to investigate advancing the state of the art of detection, classification and localization (DCL) in the field of bioacoustics. The two primary goals are to develop transferable technologies for detection and classification in: (1) the area of advanced algorithms, such as deep learning and other methods; and (2) advanced systems, capable of real-time and archival and processing. This project will focus on long-term, continuous datasets to provide automatic recognition, minimizing human time to annotate the signals. Effort will begin by focusing on several years of multi-channel acoustic data collected in the Stellwagen Bank National Marine Sanctuary (SBNMS) between 2006 and 2010. Our efforts will incorporate existing technologies in the bioacoustics signal processing community, advanced high performance computing (HPC) systems, and new approaches aimed at automatically detecting-classifying and measuring features for species-specific marine mammal sounds within passive acoustic data. △ Less

Submitted 5 May, 2016; v1 submitted 3 May, 2016; originally announced May 2016.

Comments: Year 1 National Oceanic Partnership Program Report, sponsored ONR, NFWF. N000141210585

Report number: N000141210585

arXiv:1512.06141 [pdf, other]

The Role of Race, Ethnicity, and Gender in the Congressional Cosponsorship Network

Authors: Alison Craig, Skyler J. Cranmer, Bruce A. Desmarais, Christopher J. Clark, Vincent G. Moscardelli

Abstract: Previous research indicates that race, ethnicity, and gender influence legislative behavior in important ways. The bulk of this research, however, focuses on the way these characteristics shape an individual legislator's behavior, making it less clear how they account for relationships between legislators. We study the cosponsorship process in order to understand the race and gender based dynamics… ▽ More Previous research indicates that race, ethnicity, and gender influence legislative behavior in important ways. The bulk of this research, however, focuses on the way these characteristics shape an individual legislator's behavior, making it less clear how they account for relationships between legislators. We study the cosponsorship process in order to understand the race and gender based dynamics underlying the relational component of representation. Using a temporal exponential random graph model, we examine the U.S. House cosponsorship network from 1981 through 2004. We find that Black and Latino members of Congress are at a comparative disadvantage as a result of race-based assortative mixing in the cosponsorship process, yet this disadvantage is mitigated by the electoral pressures that all members face. Members representing districts with significant racial and ethnic minority populations are more likely to support their minority colleagues. We also find that women members do not appear to face a similar disadvantage as a result of their minority status. We argue that these race and gender dynamics in the cosponsorship network are the result of both the inherent tendency towards intra-group homophily in social networks and the electoral connection, which is manifested here as members supporting minority colleagues to broaden their own electoral base of support among minority constituencies. △ Less

Submitted 18 December, 2015; originally announced December 2015.

arXiv:1509.03591 [pdf]

High Performance Computer Acoustic Data Accelerator: A New System for Exploring Marine Mammal Acoustics for Big Data Applications

Authors: Peter Dugan, John Zollweg, Marian Popescu, Denise Risch, Herve Glotin, Yann LeCun, and Christopher Clark

Abstract: This paper presents a new software model designed for distributed sonic signal detection runtime using machine learning algorithms called DeLMA. A new algorithm--Acoustic Data-mining Accelerator (ADA)--is also presented. ADA is a robust yet scalable solution for efficiently processing big sound archives using distributing computing technologies. Together, DeLMA and the ADA algorithm provide a powe… ▽ More This paper presents a new software model designed for distributed sonic signal detection runtime using machine learning algorithms called DeLMA. A new algorithm--Acoustic Data-mining Accelerator (ADA)--is also presented. ADA is a robust yet scalable solution for efficiently processing big sound archives using distributing computing technologies. Together, DeLMA and the ADA algorithm provide a powerful tool currently being used by the Bioacoustics Research Program (BRP) at the Cornell Lab of Ornithology, Cornell University. This paper provides a high level technical overview of the system, and discusses various aspects of the design. Basic runtime performance and project summary are presented. The DeLMA-ADA baseline performance comparing desktop serial configuration to a 64 core distributed HPC system shows as much as a 44 times faster increase in runtime execution. Performance tests using 48 cores on the HPC shows a 9x to 12x efficiency over a 4 core desktop solution. Project summary results for 19 east coast deployments show that the DeLMA-ADA solution has processed over three million channel hours of sound to date. △ Less

Submitted 11 September, 2015; originally announced September 2015.

Comments: Seven pages, submitted at International Conference on Machine Learning 2014, Workshop uLearnBio, unsupervised learning for bioacoustic applications

MSC Class: 68-04

arXiv:1412.3409 [pdf, other]

Teaching Deep Convolutional Neural Networks to Play Go

Authors: Christopher Clark, Amos Storkey

Abstract: Mastering the game of Go has remained a long standing challenge to the field of AI. Modern computer Go systems rely on processing millions of possible future positions to play well, but intuitively a stronger and more 'humanlike' way to play the game would be to rely on pattern recognition abilities rather then brute force computation. Following this sentiment, we train deep convolutional neural n… ▽ More Mastering the game of Go has remained a long standing challenge to the field of AI. Modern computer Go systems rely on processing millions of possible future positions to play well, but intuitively a stronger and more 'humanlike' way to play the game would be to rely on pattern recognition abilities rather then brute force computation. Following this sentiment, we train deep convolutional neural networks to play Go by training them to predict the moves made by expert Go players. To solve this problem we introduce a number of novel techniques, including a method of tying weights in the network to 'hard code' symmetries that are expect to exist in the target function, and demonstrate in an ablation study they considerably improve performance. Our final networks are able to achieve move prediction accuracies of 41.1% and 44.4% on two different Go datasets, surpassing previous state of the art on this task by significant margins. Additionally, while previous move prediction programs have not yielded strong Go playing programs, we show that the networks trained in this work acquired high levels of skill. Our convolutional neural networks can consistently defeat the well known Go program GNU Go, indicating it is state of the art among programs that do not use Monte Carlo Tree Search. It is also able to win some games against state of the art Go playing program Fuego while using a fraction of the play time. This success at playing Go indicates high level principles of the game were learned. △ Less

Submitted 27 January, 2015; v1 submitted 10 December, 2014; originally announced December 2014.

Comments: 9 pages, 8 figures, 5 tables. Corrected typos, minor adjustment to table format

arXiv:1305.3635 [pdf]

Bioacoustic Signal Classification Based on Continuous Region Processing, Grid Masking and Artificial Neural Network

Authors: Mohammad Pourhomayoun, Peter Dugan, Marian Popescu, Christopher Clark

Abstract: In this paper, we develop a novel method based on machine-learning and image processing to identify North Atlantic right whale (NARW) up-calls in the presence of high levels of ambient and interfering noise. We apply a continuous region algorithm on the spectrogram to extract the regions of interest, and then use grid masking techniques to generate a small feature set that is then used in an artif… ▽ More In this paper, we develop a novel method based on machine-learning and image processing to identify North Atlantic right whale (NARW) up-calls in the presence of high levels of ambient and interfering noise. We apply a continuous region algorithm on the spectrogram to extract the regions of interest, and then use grid masking techniques to generate a small feature set that is then used in an artificial neural network classifier to identify the NARW up-calls. It is shown that the proposed technique is effective in detecting and capturing even very faint up-calls, in the presence of ambient and interfering noises. The method is evaluated on a dataset recorded in Massachusetts Bay, United States. The dataset includes 20000 sound clips for training, and 10000 sound clips for testing. The results show that the proposed technique can achieve an error rate of less than FPR = 4.5% for a 90% true positive rate. △ Less

Submitted 17 June, 2013; v1 submitted 15 May, 2013; originally announced May 2013.

Comments: To be Submitted to "ICML 2013 Workshop on Machine Learning for Bioacoustics", 6 pages, 8 figures

arXiv:1305.3633 [pdf]

Classification for Big Dataset of Bioacoustic Signals Based on Human Scoring System and Artificial Neural Network

Authors: Mohammad Pourhomayoun, Peter Dugan, Marian Popescu, Denise Risch, Hal Lewis, Christopher Clark

Abstract: In this paper, we propose a method to improve sound classification performance by combining signal features, derived from the time-frequency spectrogram, with human perception. The method presented herein exploits an artificial neural network (ANN) and learns the signal features based on the human perception knowledge. The proposed method is applied to a large acoustic dataset containing 24 months… ▽ More In this paper, we propose a method to improve sound classification performance by combining signal features, derived from the time-frequency spectrogram, with human perception. The method presented herein exploits an artificial neural network (ANN) and learns the signal features based on the human perception knowledge. The proposed method is applied to a large acoustic dataset containing 24 months of nearly continuous recordings. The results show a significant improvement in performance of the detection-classification system; yielding as much as 20% improvement in true positive rate for a given false positive rate. △ Less

Submitted 17 June, 2013; v1 submitted 15 May, 2013; originally announced May 2013.

Comments: To be Submitted to "ICML 2013 Workshop on Machine Learning for Bioacoustics", 6 pages, 4 figures

arXiv:1305.3250 [pdf]

Bioacoustical Periodic Pulse Train Signal Detection and Classification using Spectrogram Intensity Binarization and Energy Projection

Authors: Marian Popescu, Peter J. Dugan, Mohammad Pourhomayoun, Denise Risch, Harold W. Lewis III, Christopher W. Clark

Abstract: The following work outlines an approach for automatic detection and recognition of periodic pulse train signals using a multi-stage process based on spectrogram edge detection, energy projection and classification. The method has been implemented to automatically detect and recognize pulse train songs of minke whales. While the long term goal of this work is to properly identify and detect minke s… ▽ More The following work outlines an approach for automatic detection and recognition of periodic pulse train signals using a multi-stage process based on spectrogram edge detection, energy projection and classification. The method has been implemented to automatically detect and recognize pulse train songs of minke whales. While the long term goal of this work is to properly identify and detect minke songs from large multi-year datasets, this effort was developed using sounds off the coast of Massachusetts, in the Stellwagen Bank National Marine Sanctuary. The detection methodology is presented and evaluated on 232 continuous hours of acoustic recordings and a qualitative analysis of machine learning classifiers and their performance is described. The trained automatic detection and classification system is applied to 120 continuous hours, comprised of various challenges such as broadband and narrowband noises, low SNR, and other pulse train signatures. This automatic system achieves a TPR of 63% for FPR of 0.6% (or 0.87 FP/h), at a Precision (PPV) of 84% and an F1 score of 71%. △ Less

Submitted 28 June, 2013; v1 submitted 14 May, 2013; originally announced May 2013.

Comments: ICML 2013 Workshop on Machine Learning for Bioacoustics, 2013, 6 pages

Showing 1–41 of 41 results for author: Clark, C