-
Optimized Deletion From an AVL Tree
Authors:
Russell A. Brown
Abstract:
An AVL tree is a binary search tree that guarantees $O(\log n)$ search. The guarantee is obtained at the cost of rebalancing the AVL tree, potentially after every insertion or deletion. This article proposes a deletion algorithm that reduces the rebalancing required after deletion compared to a previously reported algorithm.
Submitted 1 July, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
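For readers unfamiliar with AVL deletion, the following is a minimal Python sketch of the conventional recursive algorithm: remove the key (replacing a node that has two children by its in-order successor), then recompute heights and apply single or double rotations at every ancestor on the way back up. It is a textbook baseline shown only to make the rebalancing step concrete; it is not the optimized deletion algorithm proposed in the article, and all names (`Node`, `insert`, `delete`, `rebalance`, `is_avl`) are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 1  # height of the subtree rooted here (a leaf has height 1)


def height(n):
    return n.height if n else 0


def update(n):
    n.height = 1 + max(height(n.left), height(n.right))


def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def rebalance(n):
    """Restore |balance| <= 1 at n with a single or double rotation; return the new subtree root."""
    update(n)
    balance = height(n.left) - height(n.right)
    if balance > 1:  # left-heavy
        if height(n.left.left) < height(n.left.right):
            n.left = rotate_left(n.left)  # left-right case: convert to left-left first
        return rotate_right(n)
    if balance < -1:  # right-heavy
        if height(n.right.right) < height(n.right.left):
            n.right = rotate_right(n.right)  # right-left case: convert to right-right first
        return rotate_left(n)
    return n


def insert(n, key):
    if n is None:
        return Node(key)
    if key < n.key:
        n.left = insert(n.left, key)
    elif key > n.key:
        n.right = insert(n.right, key)
    else:
        return n  # duplicate keys are ignored
    return rebalance(n)


def delete(n, key):
    if n is None:
        return None  # key not present
    if key < n.key:
        n.left = delete(n.left, key)
    elif key > n.key:
        n.right = delete(n.right, key)
    else:
        if n.left is None:
            return n.right
        if n.right is None:
            return n.left
        # Two children: copy the in-order successor's key here, then delete the successor.
        succ = n.right
        while succ.left:
            succ = succ.left
        n.key = succ.key
        n.right = delete(n.right, succ.key)
    return rebalance(n)  # rebalancing happens at every ancestor of the removed node


def inorder(n):
    return inorder(n.left) + [n.key] + inorder(n.right) if n else []


def is_avl(n, lo=float("-inf"), hi=float("inf")):
    """Check BST ordering, stored heights, and the AVL balance condition."""
    if n is None:
        return True
    ok = lo < n.key < hi and abs(height(n.left) - height(n.right)) <= 1
    ok = ok and n.height == 1 + max(height(n.left), height(n.right))
    return ok and is_avl(n.left, lo, n.key) and is_avl(n.right, n.key, hi)
```

For example, inserting the keys 1 to 31 and then deleting every odd key leaves `inorder(root) == [2, 4, ..., 30]` with `is_avl(root)` true. Because `rebalance` runs at each ancestor of the removed node, one deletion can cause rotations all the way up to the root; this per-deletion rebalancing is the cost the article's algorithm targets.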
-
Testing the simplicity of strategy-proof mechanisms
Authors:
Alexander L. Brown,
Daniel G. Stephenson,
Rodrigo A. Velez
Abstract:
This paper experimentally evaluates four mechanisms intended to achieve the Uniform outcome in rationing problems (Sprumont, 1991). Our benchmark is the dominant-strategy, direct-revelation mechanism of the Uniform rule. A strategically equivalent mechanism that provides non-binding feedback during the reporting period greatly improves performance. A sequential revelation mechanism produces modest improvements despite not possessing dominant strategies. A novel, obviously strategy-proof mechanism, devised by Arribillaga et al. (2023), does not improve performance. We characterize each alternative to the direct mechanism, finding general lessons about the advantages of real-time feedback and sequentiality of play as well as the potential shortcomings of an obviously strategy-proof mechanism.
Submitted 17 April, 2024;
originally announced April 2024.
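The benchmark outcome in this study is the Uniform rule for rationing problems (Sprumont, 1991). As a minimal sketch, assuming each agent's single-peaked preference is summarized by its reported peak, the rule gives each agent min(peak, λ) when total demand exceeds the amount available and max(peak, λ) otherwise, with the common value λ chosen so that the allocations sum exactly to the amount. The function name `uniform_rule` is hypothetical, and the code computes only the rule's outcome, not any of the four mechanisms compared in the experiment.

```python
def uniform_rule(peaks, amount):
    """Uniform rule allocation given each agent's reported peak and the amount to divide."""
    n = len(peaks)
    excess_demand = sum(peaks) > amount
    # Excess demand: satiate the smallest peaks first; excess supply: the largest peaks first.
    order = sorted(range(n), key=lambda i: peaks[i], reverse=not excess_demand)
    alloc = [0.0] * n
    remaining = amount
    for pos, i in enumerate(order):
        share = remaining / (n - pos)  # equal split of what is left among unserved agents
        satiated = peaks[i] <= share if excess_demand else peaks[i] >= share
        if satiated:
            alloc[i] = peaks[i]  # this agent receives exactly its peak
            remaining -= peaks[i]
        else:
            for j in order[pos:]:
                alloc[j] = share  # every remaining agent receives the common amount lambda
            break
    return alloc
```

For example, `uniform_rule([1, 3, 5], 6)` returns `[1, 2.5, 2.5]`: the agent with peak 1 is satiated, and the other two split the remaining 5 equally. With a larger amount, `uniform_rule([1, 3, 5], 12)` returns `[3.5, 3.5, 5]`.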
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks, achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Generation, Distillation and Evaluation of Motivational Interviewing-Style Reflections with a Foundational Language Model
Authors:
Andrew Brown,
Jiading Zhu,
Mohamed Abdelwahab,
Alec Dong,
Cindy Wang,
Jonathan Rose
Abstract:
Large Foundational Language Models are capable of performing many tasks at a high level but are difficult to deploy in many applications because of their size and proprietary ownership. Many will be motivated to distill specific capabilities of foundational models into smaller models that can be owned and controlled. In the development of a therapeutic chatbot, we wish to distill a capability known as reflective listening, in which a therapist produces reflections of client speech. These reflections either restate what a client has said, or connect what was said to a relevant observation, idea or guess that encourages and guides the client to continue contemplation. In this paper, we present a method for distilling the generation of reflections from a Foundational Language Model (GPT-4) into smaller models. We first show that GPT-4, using zero-shot prompting, can generate reflections at a near 100% success rate, superior to all previous methods. Using reflections generated by GPT-4, we fine-tune different sizes of the GPT-2 family. The GPT-2-small model achieves 83% success on a hold-out test set and the GPT-2 XL achieves 90% success. We also show that GPT-4 can help in the labor-intensive task of evaluating the quality of the distilled models, using it as a zero-shot classifier. Using triple-human review as a guide, the classifier achieves a Cohen's kappa of 0.66, a substantial inter-rater reliability figure.
Submitted 1 February, 2024;
originally announced February 2024.
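The classifier's agreement with human review is reported as Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. A minimal two-rater sketch follows; the function name is illustrative and the label lists in the example are made-up toy data, not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(rater_a)
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n  # observed agreement
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both raters independently pick the same label.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined, so report 1.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

With the toy labels `a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]` and `b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]`, the raters agree on 8 of 10 items (p_o = 0.8), chance agreement is p_e = 0.5, and kappa is approximately 0.6.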
-
Maximizing the Minimum Eigenvalue in Constant Dimension
Authors:
Adam Brown,
Aditi Laddha,
Mohit Singh
Abstract:
In an instance of the minimum eigenvalue problem, we are given a collection of $n$ vectors $v_1,\ldots, v_n \in \mathbb{R}^d$, and the goal is to pick a subset $B\subseteq [n]$ of given vectors to maximize the minimum eigenvalue of the matrix $\sum_{i\in B} v_i v_i^{\top}$. Often, additional combinatorial constraints such as a cardinality constraint $\left(|B|\leq k\right)$ or a matroid constraint ($B$ is a basis of a matroid defined on $[n]$) must be satisfied by the chosen set of vectors. The minimum eigenvalue problem with matroid constraints models a wide variety of problems including the Santa Claus problem, the E-design problem, and the constructive Kadison-Singer problem.
In this paper, we give a randomized algorithm that, with high probability, finds a set $B\subseteq [n]$ satisfying any given matroid constraint whose minimum eigenvalue is at least $(1-\varepsilon)$ times the optimum. The running time of the algorithm is $O\left( n^{O(d\log(d)/\varepsilon^2)}\right)$. In particular, our results give a polynomial-time approximation scheme when the dimension of the vectors is constant. Our algorithm uses a convex programming relaxation of the problem after guessing a rescaling, which allows us to apply pipage rounding and matrix Chernoff inequalities to round to a good solution. The key new component is a structural lemma that enables us to "guess" the appropriate rescaling, which could be of independent interest. Our approach generalizes the approximation guarantee to monotone, homogeneous functions, and as such we can maximize $\det(\sum_{i\in B} v_i v_i^\top)^{1/d}$, or minimize any norm of the eigenvalues of the matrix $\left(\sum_{i\in B} v_i v_i^\top\right)^{-1}$, with the same running time under some mild assumptions. As a byproduct, we also get a simple algorithm for an algorithmic version of the Kadison-Singer problem.
Submitted 25 January, 2024;
originally announced January 2024.
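To make the objective concrete, the sketch below solves tiny instances under a cardinality constraint by exhaustive search over all subsets of exactly k vectors (adding a vector adds a positive semidefinite term, which never decreases the minimum eigenvalue, so |B| = k suffices), using NumPy's `eigvalsh`. This brute force takes time exponential in k and is not the paper's randomized rounding algorithm; the function name and example vectors are illustrative assumptions.

```python
import itertools

import numpy as np


def best_min_eigenvalue_subset(vectors, k):
    """Exhaustively pick k of the given vectors to maximize lambda_min(sum v_i v_i^T)."""
    V = np.asarray(vectors, dtype=float)  # shape (n, d), one vector per row
    best_set, best_val = None, -np.inf
    for B in itertools.combinations(range(len(V)), k):
        rows = V[list(B)]
        M = rows.T @ rows  # equals sum_{i in B} v_i v_i^T, a d x d PSD matrix
        val = np.linalg.eigvalsh(M)[0]  # eigvalsh returns eigenvalues in ascending order
        if val > best_val:
            best_set, best_val = B, val
    return best_set, best_val
```

For example, `best_min_eigenvalue_subset([(1, 0), (0, 1), (2, 0), (0, 3)], k=2)` selects the indices `(2, 3)`, whose matrix is diag(4, 9) with minimum eigenvalue 4 (up to floating-point rounding).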
-
Approximation Algorithms for the Weighted Nash Social Welfare via Convex and Non-Convex Programs
Authors:
Adam Brown,
Aditi Laddha,
Madhusudhan Reddy Pittu,
Mohit Singh
Abstract:
In an instance of the weighted Nash Social Welfare problem, we are given a set of $m$ indivisible items, $\mathscr{G}$, and $n$ agents, $\mathscr{A}$, where each agent $i \in \mathscr{A}$ has a valuation $v_{ij}\geq 0$ for each item $j\in \mathscr{G}$. In addition, every agent $i$ has a non-negative weight $w_i$ such that the weights collectively sum up to $1$. The goal is to find an assignment $σ:\mathscr{G}\rightarrow \mathscr{A}$ that maximizes $\prod_{i\in \mathscr{A}} \left(\sum_{j\in σ^{-1}(i)} v_{ij}\right)^{w_i}$, the product of the weighted valuations of the players. When all the weights equal $\frac1n$, the problem reduces to the classical Nash Social Welfare problem, which has recently received much attention. In this work, we present a $5\cdot\exp\left(2\cdot D_{\text{KL}}(\mathbf{w}\, ||\, \frac{\vec{\mathbf{1}}}{n})\right) = 5\cdot\exp\left(2\log{n} + 2\sum_{i=1}^n w_i \log{w_i}\right)$-approximation algorithm for the weighted Nash Social Welfare problem, where $D_{\text{KL}}(\mathbf{w}\, ||\, \frac{\vec{\mathbf{1}}}{n})$ denotes the KL-divergence between the distribution induced by $\mathbf{w}$ and the uniform distribution on $[n]$.
We show a novel connection between the convex programming relaxations for the unweighted variant of Nash Social Welfare presented in \cite{cole2017convex, anari2017nash}, and generalize the programs to two different mathematical programs for the weighted case. The first program is convex and is necessary for computational efficiency, while the second program is a non-convex relaxation that can be rounded efficiently. The approximation factor derives from the difference in the objective values of the convex and non-convex relaxation.
Submitted 5 January, 2024;
originally announced January 2024.
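The objective and the approximation factor in the weighted Nash Social Welfare abstract can both be evaluated directly. The sketch below computes the weighted Nash Social Welfare of a given assignment and the factor 5·exp(2·D_KL(w || 1/n)), using D_KL(w || 1/n) = log n + Σ_i w_i log w_i, which holds because the weights sum to 1 and is the identity behind the two equal forms quoted in the abstract. Function names and the toy instance are hypothetical; nothing here implements the paper's convex or non-convex programs.

```python
import math


def weighted_nsw(values, weights, assignment):
    """prod_i (sum of v_ij over items j assigned to agent i) ** w_i.

    values[i][j]: agent i's valuation of item j; assignment[j]: index of the agent receiving item j.
    """
    bundle = [0.0] * len(weights)
    for item, agent in enumerate(assignment):
        bundle[agent] += values[agent][item]
    return math.prod(u ** w for u, w in zip(bundle, weights))


def approximation_factor(weights):
    """5 * exp(2 * KL(w || uniform on [n])), with 0 * log 0 taken as 0."""
    n = len(weights)
    kl = sum(w * math.log(w * n) for w in weights if w > 0)
    return 5 * math.exp(2 * kl)
```

With uniform weights the KL term vanishes and the factor is 5, matching the unweighted case; putting all weight on one of n agents gives KL = log n and a factor of 5n^2.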
-
Motion-Conditioned Image Animation for Video Editing
Authors:
Wilson Yan,
Andrew Brown,
Pieter Abbeel,
Rohit Girdhar,
Samaneh Azadi
Abstract:
We introduce MoCA, a Motion-Conditioned Image Animation approach for video editing. It leverages a simple decomposition of the video editing problem into image editing followed by motion-conditioned image animation. Furthermore, given the lack of robust evaluation datasets for video editing, we introduce a new benchmark that measures edit capability across a wide variety of tasks, such as object replacement, background changes, style changes, and motion edits. We present a comprehensive human evaluation of the latest video editing methods along with MoCA, on our proposed benchmark. MoCA establishes a new state of the art, achieving a higher human-preference win rate and outperforming notable recent approaches including Dreamix (63%), MasaCtrl (75%), and Tune-A-Video (72%), with especially significant improvements for motion edits.
Submitted 30 November, 2023;
originally announced November 2023.
-
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
Authors:
Rohit Girdhar,
Mannat Singh,
Andrew Brown,
Quentin Duval,
Samaneh Azadi,
Sai Saketh Rambhatla,
Akbar Shah,
Xi Yin,
Devi Parikh,
Ishan Misra
Abstract:
We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image. We identify critical design decisions--adjusted noise schedules for diffusion, and multi-stage training--that enable us to directly generate high quality and high resolution videos, without requiring a deep cascade of models as in prior work. In human evaluations, our generated videos are strongly preferred in quality compared to all prior work--81% vs. Google's Imagen Video, 90% vs. Nvidia's PYOCO, and 96% vs. Meta's Make-A-Video. Our model outperforms commercial solutions such as RunwayML's Gen2 and Pika Labs. Finally, our factorizing approach naturally lends itself to animating images based on a user's text prompt, where our generations are preferred 96% over prior work.
Submitted 17 November, 2023;
originally announced November 2023.
-
Preference-conditioned Pixel-based AI Agent For Game Testing
Authors:
Sherif Abdelfattah,
Adrian Brown,
Pushi Zhang
Abstract:
The game industry is challenged to cope with increasing growth in demand and game complexity while maintaining acceptable quality standards for released games. Classic approaches solely depending on human efforts for quality assurance and game testing do not scale effectively in terms of time and cost. Game-testing AI agents that learn by interaction with the environment have the potential to mitigate these challenges with good scalability in time and cost. However, most recent work in this direction depends on game state information for the agent's state representation, which limits generalization across different game scenarios. Moreover, game test engineers usually prefer exploring a game in a specific style, such as exploring the golden path. However, current game testing AI agents do not provide an explicit way to satisfy such a preference. This paper addresses these limitations by proposing an agent design that mainly depends on pixel-based state observations while exploring the environment conditioned on a user's preference specified by demonstration trajectories. In addition, we propose an imitation learning method that couples self-supervised and supervised learning objectives to enhance the quality of imitation behaviors. Our agent significantly outperforms state-of-the-art pixel-based game testing agents over exploration coverage and test execution quality when evaluated on a complex open-world environment resembling many aspects of real AAA games.
Submitted 10 November, 2023; v1 submitted 18 August, 2023;
originally announced August 2023.
-
Causal Video Summarizer for Video Exploration
Authors:
Jia-Hong Huang,
Chao-Han Huck Yang,
Pin-Yu Chen,
Andrew Brown,
Marcel Worring
Abstract:
Recently, video summarization has been proposed as a method to help video exploration. However, traditional video summarization models only generate a fixed video summary which is usually independent of user-specific needs and hence limits the effectiveness of video exploration. Multi-modal video summarization is one of the approaches utilized to address this issue. Multi-modal video summarization has a video input and a text-based query input. Hence, effective modeling of the interaction between a video input and text-based query is essential to multi-modal video summarization. In this work, a new causality-based method named Causal Video Summarizer (CVS) is proposed to effectively capture the interactive information between the video and query to tackle the task of multi-modal video summarization. The proposed method consists of a probabilistic encoder and a probabilistic decoder. Evaluated on an existing multi-modal video summarization dataset, the proposed approach proves effective, improving accuracy by +5.4% and F1-score by +4.92% compared with the state-of-the-art method.
Submitted 4 July, 2023;
originally announced July 2023.
-
Using Logs Data to Identify When Software Engineers Experience Flow or Focused Work
Authors:
Adam Brown,
Sarah D'Angelo,
Ben Holtz,
Ciera Jaspen,
Collin Green
Abstract:
Beyond self-report data, we lack reliable and non-intrusive methods for identifying flow. However, taking a step back and acknowledging that flow occurs during periods of focus gives us the opportunity to make progress towards measuring flow by isolating focused work. Here, we take a mixed-methods approach to design a logs-based metric that leverages machine learning and a comprehensive collection of logs data to identify periods of related actions (indicating focus), and validate this metric against self-reported time in focus or flow using diary data and quarterly survey data. Our results indicate that we can determine when software engineers at a large technology company experience focused work which includes instances of flow. This metric speaks to engineering work, but can be leveraged in other domains to non-disruptively measure when people experience focus. Future research can build upon this work to identify signals associated with other facets of flow.
Submitted 10 April, 2023;
originally announced April 2023.
-
Reproducing the results for NICER observation of PSR J0030+0451
Authors:
Chaitanya Afle,
Patrick R. Miles,
Silvina Caino-Lores,
Collin D. Capano,
Ingo Tews,
Karan Vahi,
Ewa Deelman,
Michela Taufer,
Duncan A. Brown
Abstract:
NASA's Neutron Star Interior Composition Explorer (NICER) observed X-ray emission from the pulsar PSR J0030+0451 in 2018. Riley et al. reported Bayesian parameter measurements of the star's mass and radius using pulse-profile modeling of the X-ray data. This paper reproduces their result using the open-source software X-PSI and publicly available data within expected statistical errors. We note the challenges we faced in reproducing the results and demonstrate that the analysis can be reproduced and reused in future works by changing the prior distribution for the radius and the sampler configuration. We find no significant change in the measurement of the mass and radius, demonstrating that the original result is robust to these changes. Finally, we provide a containerized working environment that facilitates third-party reproduction of the measurements of mass and radius of PSR J0030+0451 using the NICER observations.
Submitted 31 January, 2024; v1 submitted 3 April, 2023;
originally announced April 2023.
-
Automated Machine Learning for Deep Learning based Malware Detection
Authors:
Austin Brown,
Maanak Gupta,
Mahmoud Abdelsalam
Abstract:
Deep learning (DL) has proven to be effective in detecting sophisticated malware that is constantly evolving. Even though deep learning has alleviated the feature engineering problem, finding the optimal DL model, in terms of neural architecture search (NAS) and the model's optimal set of hyper-parameters, remains a challenge that requires domain expertise. In addition, many of the proposed state-of-the-art models are very complex and may not be the best fit for different datasets. A promising approach, known as Automated Machine Learning (AutoML), can reduce the domain expertise required to implement a custom DL model. AutoML reduces the amount of human trial-and-error involved in designing DL models, and in more recent implementations can find new model architectures with relatively low computational overhead.
This work provides a comprehensive analysis of, and insights into, using AutoML for static and online malware detection. For static detection, our analysis is performed on two widely used malware datasets: SOREL-20M, to demonstrate efficacy on large datasets, and EMBER-2018, a smaller dataset specifically curated to hinder the performance of machine learning models. In addition, we show the effects of tuning the NAS process parameters on finding a better malware detection model on these static analysis datasets. Further, we demonstrate that AutoML is performant in online malware detection scenarios using Convolutional Neural Networks (CNNs) for cloud IaaS. We compare an AutoML technique to six existing state-of-the-art CNNs using a newly generated online malware dataset with and without other applications running in the background during malware execution. In general, our experimental results show that the performance of AutoML-based static and online malware detection models is on par with, or even better than, that of state-of-the-art or hand-designed models presented in the literature.
Submitted 3 November, 2023; v1 submitted 2 March, 2023;
originally announced March 2023.
-
VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge
Authors:
Jaesung Huh,
Andrew Brown,
Jee-weon Jung,
Joon Son Chung,
Arsha Nagrani,
Daniel Garcia-Romero,
Andrew Zisserman
Abstract:
This paper summarises the findings from the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22), which was held in conjunction with INTERSPEECH 2022. The goal of this challenge was to evaluate how well state-of-the-art speaker recognition systems can diarise and recognise speakers from speech obtained "in the wild". The challenge consisted of: (i) the provision of publicly available speaker recognition and diarisation data from YouTube videos together with ground truth annotation and standardised evaluation software; and (ii) a public challenge and hybrid workshop held at INTERSPEECH 2022. We describe the four tracks of our challenge along with the baselines, methods, and results. We conclude with a discussion on the new domain-transfer focus of VoxSRC-22, and on the progression of the challenge from the previous three editions.
Submitted 6 March, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
-
An Event-Driven Approach To Genotype Imputation On A Custom RISC-V FPGA Cluster
Authors:
Jordan Morris,
Ashur Rafiev,
Graeme Bragg,
Mark Vousden,
David Thomas,
Alex Yakovlev,
Andrew Brown
Abstract:
This paper proposes an event-driven solution to genotype imputation, a technique used to statistically infer missing genetic markers in DNA. The work implements the widely accepted Li and Stephens model, the primary contributor to the computational complexity of modern x86 solutions, in an attempt to determine whether further investigation of the application is warranted in the event-driven domain. The model is implemented using graph-based Hidden Markov Modeling and executed as a customized forward/backward dynamic programming algorithm. The solution uses an event-driven paradigm to map the algorithm to thousands of concurrent cores, where events are small messages that carry both control and data within the algorithm. The design of a single processing element is discussed. This is then extended across multiple FPGAs and executed on a custom RISC-V NoC FPGA cluster called POETS. Results demonstrate how the algorithm scales over increasing hardware resources, and a 48-FPGA run demonstrates a 270X reduction in wall-clock processing time when compared to a single-threaded x86 solution. Optimisation of the algorithm via linear interpolation is then introduced and tested, with results demonstrating a reduction in wall-clock time of approx. 5 orders of magnitude when compared to a similarly optimised x86 solution.
Submitted 22 January, 2023;
originally announced January 2023.
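To show the recurrence that the genotype-imputation work accelerates, here is a minimal CPU sketch of a haploid Li and Stephens copying model run as a scaled forward/backward pass: the hidden state at each marker is the reference haplotype being copied, copying switches with probability `rho`, and an observed allele mismatches the copied one with probability `eps`. A constant `rho` and `eps` are simplifying assumptions (practical imputation tools derive switch rates from genetic distances), untyped markers are marked `None`, and the function name is hypothetical. It does not model the event-driven POETS mapping or the linear-interpolation optimisation.

```python
import numpy as np


def li_stephens_impute(ref, target, rho=0.01, eps=0.01):
    """Posterior probability of allele 1 at each marker for one target haplotype.

    ref:    (K, M) array of 0/1 alleles for K reference haplotypes at M markers.
    target: length-M sequence of 0, 1, or None (None = untyped marker to impute).
    rho:    hypothetical constant switch probability between adjacent markers.
    eps:    hypothetical per-marker mismatch probability.
    """
    ref = np.asarray(ref)
    K, M = ref.shape

    def emission(m):
        # Untyped markers carry no information, so every hidden state is equally likely.
        if target[m] is None:
            return np.ones(K)
        return np.where(ref[:, m] == target[m], 1.0 - eps, eps)

    def transition(p):
        # Stay on the same haplotype with prob. 1 - rho, or jump uniformly to any of the K.
        # The transition matrix is symmetric, so the same update serves the backward pass.
        return (1.0 - rho) * p + rho * p.sum() / K

    fwd = np.empty((M, K))
    bwd = np.ones((M, K))
    f = emission(0) / K
    fwd[0] = f / f.sum()
    for m in range(1, M):
        f = transition(fwd[m - 1]) * emission(m)
        fwd[m] = f / f.sum()  # rescale each step to avoid underflow
    for m in range(M - 2, -1, -1):
        b = transition(bwd[m + 1] * emission(m + 1))
        bwd[m] = b / b.sum()
    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)  # P(copying haplotype k at marker m | data)
    return (post * ref.T).sum(axis=1)  # expected allele (dosage) at each marker
```

For a target that matches reference haplotype 1 at every typed marker, the imputed dosage at the untyped marker is close to that haplotype's allele. Each marker's update depends only on the previous marker's vector, which is the data dependency an event-driven design passes between processing elements as messages.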
-
Relating Regularization and Generalization through the Intrinsic Dimension of Activations
Authors:
Bradley C. A. Brown,
Jordan Juravsky,
Anthony L. Caterini,
Gabriel Loaiza-Ganem
Abstract:
Given a pair of models with similar training set performance, it is natural to assume that the model that possesses simpler internal representations would exhibit better generalization. In this work, we provide empirical evidence for this intuition through an analysis of the intrinsic dimension (ID) of model activations, which can be thought of as the minimal number of factors of variation in the model's representation of the data. First, we show that common regularization techniques uniformly decrease the last-layer ID (LLID) of validation set activations for image classification models and show how this strongly affects generalization performance. We also investigate how excessive regularization decreases a model's ability to extract features from data in earlier layers, leading to a negative effect on validation accuracy even while LLID continues to decrease and training accuracy remains near-perfect. Finally, we examine the LLID over the course of training of models that exhibit grokking. We observe that well after training accuracy saturates, when models "grok" and validation accuracy suddenly improves from random to perfect, there is a co-occurring sudden drop in LLID, thus providing more insight into the dynamics of sudden generalization.
Submitted 23 November, 2022;
originally announced November 2022.
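The intrinsic-dimension abstract does not say how ID is estimated. As one common choice (an assumption here), the sketch below implements a maximum-likelihood version of the TwoNN estimator, which uses only the ratio of each point's second- to first-nearest-neighbour distance; applied to a matrix of last-layer activations with one row per validation example, it would give an LLID-style estimate. The function name is illustrative, duplicate rows are assumed absent (they would make a first-neighbour distance zero), and the O(N^2) distance matrix limits it to a few thousand points.

```python
import numpy as np


def twonn_intrinsic_dimension(X):
    """Maximum-likelihood TwoNN estimate of the intrinsic dimension of the rows of X."""
    X = np.asarray(X, dtype=float)
    sq_norms = (X ** 2).sum(axis=1)
    # Squared pairwise distances via the Gram-matrix identity; clip tiny negatives from rounding.
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude each point from its own neighbour list
    nearest_two = np.sort(d2, axis=1)[:, :2]
    mu = np.sqrt(nearest_two[:, 1] / nearest_two[:, 0])  # r2 / r1 for every point
    # Under the TwoNN model mu is Pareto-distributed with exponent d; the MLE is N / sum(log mu).
    return len(X) / np.log(mu).sum()
```

As a sanity check, points sampled uniformly on a two-dimensional plane embedded in a ten-dimensional space give an estimate close to 2, even though each row has ten coordinates.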
-
Efficient Determinant Maximization for All Matroids
Authors:
Adam Brown,
Aditi Laddha,
Madhusudhan Pittu,
Mohit Singh
Abstract:
Determinant maximization provides an elegant generalization of problems in many areas, including convex geometry, statistics, machine learning, fair allocation of goods, and network design. In an instance of the determinant maximization problem, we are given a collection of vectors $v_1,\ldots, v_n \in \mathbb{R}^d$, and the goal is to pick a subset $S\subseteq [n]$ of given vectors to maximize the determinant of the matrix $\sum_{i \in S} v_iv_i^\top$, where the picked set of vectors $S$ must satisfy some combinatorial constraint such as cardinality constraint ($|S| \leq k$) or matroid constraint ($S$ is a basis of a matroid defined on $[n]$).
In this work, we give a combinatorial algorithm for the determinant maximization problem under a matroid constraint that achieves an $O(d^{O(d)})$-approximation for any matroid of rank $r\geq d$. This complements the recent result of \cite{BrownLPST22} that achieves a similar bound for matroids of rank $r\leq d$, relying on a geometric interpretation of the determinant. Our result matches the best-known estimation algorithms \cite{madan2020maximizing} for the problem, which could estimate the objective value but could not give an approximate solution with a similar guarantee. Our work follows the framework developed by \cite{BrownLPST22} of using matroid-intersection-based algorithms for determinant maximization. To overcome the lack of a simple geometric interpretation of the objective when $r \geq d$, our approach combines ideas from combinatorial optimization with algebraic properties of the determinant. We also critically use the properties of a convex programming relaxation of the problem introduced by \cite{madan2020maximizing}.
Submitted 18 November, 2022;
originally announced November 2022.
-
In search of strong embedding extractors for speaker diarisation
Authors:
Jee-weon Jung,
Hee-Soo Heo,
Bong-Jin Lee,
Jaesung Huh,
Andrew Brown,
Youngki Kwon,
Shinji Watanabe,
Joon Son Chung
Abstract:
Speaker embedding extractors (EEs), which map input audio to a speaker discriminant latent space, are of paramount importance in speaker diarisation. However, there are several challenges when adopting EEs for diarisation, from which we tackle two key problems. First, the evaluation is not straightforward because the features required for better performance differ between speaker verification and diarisation. We show that better performance on widely adopted speaker verification evaluation protocols does not lead to better diarisation performance. Second, embedding extractors have not seen utterances in which multiple speakers exist. These inputs are inevitably present in speaker diarisation because of overlapped speech and speaker changes; they degrade the performance. To mitigate the first problem, we generate speaker verification evaluation protocols that mimic the diarisation scenario better. We propose two data augmentation techniques to alleviate the second problem, making embedding extractors aware of overlapped speech or speaker change input. One technique generates overlapped speech segments, and the other generates segments where two speakers utter sequentially. Extensive experimental results using three state-of-the-art speaker embedding extractors demonstrate that both proposed approaches are effective.
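The two augmentation ideas can be sketched on raw waveforms represented as lists of samples. This is a minimal illustration under stated assumptions: the function names, the energy-ratio parameter `ratio_db`, and the single cut point `split` are hypothetical. The abstract does not specify the actual segment lengths, mixing ratios, or data sources.

```python
import math
from typing import List, Sequence


def rms(x: Sequence[float]) -> float:
    """Root-mean-square level of a waveform (assumed non-silent)."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def overlap_mix(a: Sequence[float], b: Sequence[float], ratio_db: float = 0.0) -> List[float]:
    """Overlapped speech: add speaker b on top of speaker a.

    b is rescaled so that the a-to-b level ratio equals ratio_db decibels.
    """
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    gain = rms(a) / (rms(b) * 10 ** (ratio_db / 20))
    return [x + gain * y for x, y in zip(a, b)]


def sequential_concat(a: Sequence[float], b: Sequence[float], split: float = 0.5) -> List[float]:
    """Speaker change: speaker a talks first, speaker b takes over at `split`."""
    n = min(len(a), len(b))
    cut = int(n * split)
    return list(a[:cut]) + list(b[cut:n])
```

Both functions keep the output length equal to the shorter input. This way, the augmented segments can be fed to the embedding extractor exactly like ordinary single-speaker segments.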
Submitted 26 October, 2022;
originally announced October 2022.
-
Discrete Microlocal Morse Theory
Authors:
Adam Brown,
Ondrej Draganov
Abstract:
We establish several results combining discrete Morse theory and microlocal sheaf theory in the setting of finite posets and simplicial complexes. Our primary tool is a computationally tractable description of the bounded derived category of sheaves on a poset with the Alexandrov topology. We prove that each bounded complex of sheaves on a finite poset admits a unique (up to isomorphism of complexes) minimal injective resolution, and we provide algorithms for computing the minimal injective resolution of an injective complex, as well as several useful functors between derived categories of sheaves. For the constant sheaf on a simplicial complex, we give asymptotically tight bounds on the complexity of computing the minimal injective resolution using those algorithms. Our main result is a novel definition of the discrete microsupport of a bounded complex of sheaves on a finite poset. We detail several foundational properties of the discrete microsupport, as well as a microlocal generalization of the discrete homological Morse theorem and Morse inequalities.
Submitted 9 June, 2024; v1 submitted 29 September, 2022;
originally announced September 2022.
-
Online Malware Classification with System-Wide System Calls in Cloud IaaS
Authors:
Phillip Brown,
Austin Brown,
Maanak Gupta,
Mahmoud Abdelsalam
Abstract:
Accurately classifying malware in an environment allows cyber analysts to create better response and remediation strategies. However, classifying malware in a live environment is a difficult task due to the large number of system data sources. Collecting statistics from these separate sources and processing them together in a form that can be used by a machine learning model is difficult. Fortunately, all of these resources are mediated by the operating system's kernel. User programs, malware included, interact with system resources by making requests to the kernel with system calls. Collecting these system calls provides insight into the interaction with many system resources in a single location. Feeding these system calls into a performant model such as a random forest allows fast, accurate classification in certain situations. In this paper, we evaluate the feasibility of using system call sequences for online malware classification in both low-activity and heavy-use Cloud IaaS. We collect system calls as they are received by the kernel and take n-gram sequences of calls to use as features for tree-based machine learning models. We discuss the performance of the models on baseline systems with no extra running services and on systems under heavy load, and the performance gap between them.
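As a rough illustration of "n-gram sequences of calls as features", the sketch below turns a trace of system-call names into fixed-length count vectors. A tree-based classifier such as a random forest (not shown) could then consume these vectors. The function names, the use of call names rather than numbers, and the fixed `vocab` list are assumptions for illustration. The abstract does not describe how the authors build their vocabulary or window their traces.

```python
from collections import Counter
from typing import Iterator, List, Sequence, Tuple


def ngrams(calls: Sequence[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every contiguous window of n system-call names."""
    for i in range(len(calls) - n + 1):
        yield tuple(calls[i:i + n])


def ngram_features(calls: Sequence[str], n: int, vocab: List[Tuple[str, ...]]) -> List[int]:
    """Count n-grams in one trace and project them onto a fixed vocabulary,
    so every trace yields a vector of the same length."""
    counts = Counter(ngrams(calls, n))
    return [counts[g] for g in vocab]  # Counter returns 0 for unseen n-grams


trace = ["open", "read", "read", "close"]
vocab = [("open", "read"), ("read", "close"), ("write", "close")]
print(ngram_features(trace, 2, vocab))  # [1, 1, 0]
```

Fixing the vocabulary up front is what makes the vectors comparable across traces. N-grams outside the vocabulary are simply ignored.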
Submitted 9 August, 2022;
originally announced August 2022.
-
Detecting Shortcut Learning for Fair Medical AI using Shortcut Testing
Authors:
Alexander Brown,
Nenad Tomasev,
Jan Freyberg,
Yuan Liu,
Alan Karthikesalingam,
Jessica Schrouff
Abstract:
Machine learning (ML) holds great promise for improving healthcare, but it is critical to ensure that its use will not propagate or amplify health disparities. An important step is to characterize the (un)fairness of ML models - their tendency to perform differently across subgroups of the population - and to understand its underlying mechanisms. One potential driver of algorithmic unfairness, shortcut learning, arises when ML models base predictions on improper correlations in the training data. However, diagnosing this phenomenon is difficult, especially when sensitive attributes are causally linked with disease. Using multi-task learning, we propose the first method to assess and mitigate shortcut learning as a part of the fairness assessment of clinical ML systems, and demonstrate its application to clinical tasks in radiology and dermatology. Finally, our approach reveals instances when shortcutting is not responsible for unfairness, highlighting the need for a holistic approach to fairness mitigation in medical AI.
Submitted 16 June, 2023; v1 submitted 21 July, 2022;
originally announced July 2022.
-
Inspector: Pixel-Based Automated Game Testing via Exploration, Detection, and Investigation
Authors:
Guoqing Liu,
Mengzhang Cai,
Li Zhao,
Tao Qin,
Adrian Brown,
Jimmy Bischoff,
Tie-Yan Liu
Abstract:
Deep reinforcement learning (DRL) has attracted much attention in automated game testing. Early attempts rely on game internal information for game space exploration, thus requiring deep integration with games, which is inconvenient for practical applications. In this work, we propose using only screenshots/pixels as input for automated game testing and build a general game testing agent, Inspector, that can be easily applied to different games without deep integration with games. In addition to covering the whole game space for testing, our agent tries to take human-like behaviors to interact with key objects in a game, since bugs commonly occur in player-object interactions. Inspector is based on purely pixel inputs and comprises three key modules: game space explorer, key object detector, and human-like object investigator. Game space explorer aims to explore the whole game space by using a curiosity-based reward function with pixel inputs. Key object detector aims to detect key objects in a game, based on a small number of labeled screenshots. Human-like object investigator aims to mimic human behaviors for investigating key objects via imitation learning. We conduct experiments on two popular video games: Shooter Game and Action RPG Game. Experimental results demonstrate the effectiveness of Inspector in exploring game space, detecting key objects, and investigating objects. Moreover, Inspector successfully discovers two potential bugs in those two games. The demo video of Inspector is available at https://github.com/Inspector-GameTesting/Inspector-GameTesting.
Submitted 18 July, 2022;
originally announced July 2022.
-
Determinant Maximization via Matroid Intersection Algorithms
Authors:
Adam Brown,
Aditi Laddha,
Madhusudhan Pittu,
Mohit Singh,
Prasad Tetali
Abstract:
The determinant maximization problem gives a general framework that models problems arising in fields as diverse as statistics \cite{pukelsheim2006optimal}, convex geometry \cite{Khachiyan1996}, fair allocations \cite{anari2016nash}, combinatorics \cite{AnariGV18}, spectral graph theory \cite{nikolov2019proportional}, network design, and random processes \cite{kulesza2012determinantal}. In an instance of a determinant maximization problem, we are given a collection of vectors $U=\{v_1,\ldots, v_n\} \subset \mathbb{R}^d$, and the goal is to pick a subset $S\subseteq U$ of the given vectors to maximize the determinant of the matrix $\sum_{i\in S} v_i v_i^\top$. Often, the set $S$ of picked vectors must satisfy additional combinatorial constraints such as a cardinality constraint ($|S|\leq k$) or a matroid constraint ($S$ is a basis of a matroid defined on the vectors).
In this paper, we give a polynomial-time deterministic algorithm that returns an $r^{O(r)}$-approximation for any matroid of rank $r\leq d$. This improves previous results that give $e^{O(r^2)}$-approximation algorithms relying on $e^{O(r)}$-approximate \emph{estimation} algorithms \cite{NikolovS16,anari2017generalization,AnariGV18,madan2020maximizing} for any $r\leq d$. All previous results use convex relaxations and their relationship to stable polynomials and strongly log-concave polynomials. In contrast, our algorithm builds on combinatorial algorithms for matroid intersection, which iteratively improve any solution by finding an \emph{alternating negative cycle} in the \emph{exchange graph} defined by the matroids. While the $\det(\cdot)$ function is not linear, we show that taking appropriate linear approximations at each iteration suffices to give the improved approximation algorithm.
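The idea of "iteratively improving a solution by exchanges" can be illustrated in its simplest form: single-swap local search on the uniform matroid (all $k$-subsets). This is explicitly **not** the paper's algorithm. The paper searches for alternating negative cycles in an exchange graph built from linear approximations of $\det$, whereas this hypothetical `swap_local_search` just tries every one-in/one-out swap. It also carries no approximation guarantee here; it only shows what an improving exchange looks like.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def det(matrix: List[List[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def objective(vectors: Sequence[Vector], subset: Sequence[int]) -> float:
    """det(sum_{i in S} v_i v_i^T)."""
    d = len(vectors[0])
    m = [[0.0] * d for _ in range(d)]
    for i in subset:
        for r in range(d):
            for c in range(d):
                m[r][c] += vectors[i][r] * vectors[i][c]
    return det(m)


def swap_local_search(vectors: Sequence[Vector], k: int,
                      start: Optional[Sequence[int]] = None,
                      tol: float = 1e-9) -> Tuple[float, List[int]]:
    """Apply improving single swaps until none remains (a local optimum)."""
    n = len(vectors)
    current = list(start) if start is not None else list(range(k))
    best = objective(vectors, current)
    improved = True
    while improved:
        improved = False
        outside = [j for j in range(n) if j not in current]
        for pos, j in product(range(len(current)), outside):
            candidate = current[:pos] + [j] + current[pos + 1:]  # swap in j
            value = objective(vectors, candidate)
            if value > best + tol:  # accept strict improvements only
                current, best, improved = candidate, value, True
                break  # rescan from the improved solution
    return best, sorted(current)


vectors = [(1.0, 0.0), (0.0, 1.0), (1.0, 2.0), (3.0, 0.0)]
print(swap_local_search(vectors, k=2))  # reaches [2, 3] with value ≈ 36
```

Each accepted swap strictly increases the determinant, and there are finitely many $k$-subsets, so the loop terminates. On larger instances, though, it may stop at a poor local optimum.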
Submitted 9 July, 2022;
originally announced July 2022.
-
Reservoir Computing with 3D Nanowire Networks
Authors:
R. K. Daniels,
J. B. Mallinson,
Z. E. Heywood,
P. J. Bones,
M. D. Arnold,
S. A. Brown
Abstract:
Networks of nanowires are currently being explored for a range of applications in brain-like (or neuromorphic) computing, and especially in reservoir computing (RC). Fabrication of real-world computing devices requires that the nanowires are deposited sequentially, leading to stacking of the wires on top of each other. However, most simulations of computational tasks using these systems treat the nanowires as 1D objects lying in a perfectly 2D plane - the effect of stacking on RC performance has not yet been established. Here we use detailed simulations to compare the performance of perfectly 2D and quasi-3D (stacked) networks of nanowires in two tasks: memory capacity and nonlinear transformation. We also show that our model of the junctions between nanowires is general enough to describe a wide range of memristive networks, and consider the impact of physically realistic electrode configurations on performance. We show that the various networks and configurations have a strikingly similar performance in RC tasks, which is surprising given their radically different topologies. Our results show that networks with an experimentally achievable number of electrodes perform close to the upper bounds achievable when using the information from every wire. However, we also show important differences, in particular that the quasi-3D networks are more resilient to changes in the input parameters, generalizing better to noisy training data. Since previous literature suggests that topology plays an important role in computing performance, these results may have important implications for future applications of nanowire networks in neuromorphic computing.
Submitted 6 July, 2022;
originally announced July 2022.
-
Verifying the Union of Manifolds Hypothesis for Image Data
Authors:
Bradley C. A. Brown,
Anthony L. Caterini,
Brendan Leigh Ross,
Jesse C. Cresswell,
Gabriel Loaiza-Ganem
Abstract:
Deep learning has had tremendous success at learning low-dimensional representations of high-dimensional data. This success would be impossible if there were no hidden low-dimensional structure in data of interest; this existence is posited by the manifold hypothesis, which states that the data lies on an unknown manifold of low intrinsic dimension. In this paper, we argue that this hypothesis does not properly capture the low-dimensional structure typically present in image data. Assuming that data lies on a single manifold implies intrinsic dimension is identical across the entire data space, and does not allow for subregions of this space to have a different number of factors of variation. To address this deficiency, we consider the union of manifolds hypothesis, which states that data lies on a disjoint union of manifolds of varying intrinsic dimensions. We empirically verify this hypothesis on commonly-used image datasets, finding that indeed, observed data lies on a disconnected set and that intrinsic dimension is not constant. We also provide insights into the implications of the union of manifolds hypothesis in deep learning, both supervised and unsupervised, showing that designing models with an inductive bias for this structure improves performance across classification and generative modelling tasks. Our code is available at https://github.com/layer6ai-labs/UoMH.
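One way to check whether intrinsic dimension differs across parts of a dataset is to estimate it separately for each part. The sketch below uses the Levina–Bickel maximum-likelihood estimator, a standard nearest-neighbour estimator. For a point with sorted neighbour distances $T_1 \le \dots \le T_k$, the local estimate is $\hat m = (k-1) / \sum_{j<k} \log(T_k/T_j)$. This sketch is not claimed to be the exact estimator or settings used in the paper. The synthetic "segment" and "square" point sets are hypothetical stand-ins for two classes living on manifolds of different dimension.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def mle_intrinsic_dimension(points: Sequence[Point], k: int = 10) -> float:
    """Levina-Bickel MLE of intrinsic dimension, averaged over all points."""
    estimates: List[float] = []
    for i, x in enumerate(points):
        # Distances from x to its k nearest neighbours, in increasing order.
        nearest = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)[:k]
        t_k = nearest[-1]
        log_ratios = sum(math.log(t_k / t_j) for t_j in nearest[:-1])
        estimates.append((k - 1) / log_ratios)
    return sum(estimates) / len(estimates)


rng = random.Random(0)
# A 1-D segment and a 2-D square patch, both embedded in R^3.
line = [(t, 2 * t, -t) for t in (rng.random() for _ in range(400))]
square = [(u, v, u + v) for u, v in ((rng.random(), rng.random()) for _ in range(400))]

d_line = mle_intrinsic_dimension(line)
d_square = mle_intrinsic_dimension(square)
print(f"segment: {d_line:.2f}, square: {d_square:.2f}")
```

Both point sets live in the same ambient space $\mathbb{R}^3$, yet the estimates come out near 1 and near 2. That is the kind of per-region difference a single-manifold assumption cannot express. Averaging per-point estimates biases the result slightly upward for small $k$.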
Submitted 2 March, 2023; v1 submitted 6 July, 2022;
originally announced July 2022.
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza
, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
End-to-End Visual Editing with a Generatively Pre-Trained Artist
Authors:
Andrew Brown,
Cheng-Yang Fu,
Omkar Parkhi,
Tamara L. Berg,
Andrea Vedaldi
Abstract:
We consider the targeted image editing problem: blending a region in a source image with a driver image that specifies the desired change. Unlike prior work, we solve this problem by learning a conditional probability distribution of the edits, end-to-end. Training such a model requires addressing a fundamental technical challenge: the lack of example edits for training. To this end, we propose a self-supervised approach that simulates edits by augmenting off-the-shelf images in a target domain. The benefits are remarkable: implemented as a state-of-the-art auto-regressive transformer, our approach is simple, sidesteps difficulties with previous methods based on GAN-like priors, obtains significantly better edits, and is efficient. Furthermore, we show that different blending effects can be learned by an intuitive control of the augmentation process, with no other changes required to the model architecture. We demonstrate the superiority of this approach across several datasets in extensive quantitative and qualitative experiments, including human studies, significantly outperforming prior work.
Submitted 3 May, 2022;
originally announced May 2022.
-
Diagnosing failures of fairness transfer across distribution shift in real-world medical settings
Authors:
Jessica Schrouff,
Natalie Harris,
Oluwasanmi Koyejo,
Ibrahim Alabdulmohsin,
Eva Schnider,
Krista Opsahl-Ong,
Alex Brown,
Subhrajit Roy,
Diana Mincu,
Christina Chen,
Awa Dieng,
Yuan Liu,
Vivek Natarajan,
Alan Karthikesalingam,
Katherine Heller,
Silvia Chiappa,
Alexander D'Amour
Abstract:
Diagnosing and mitigating changes in model fairness under distribution shift is an important component of the safe deployment of machine learning in healthcare settings. Importantly, the success of any mitigation strategy strongly depends on the structure of the shift. Despite this, there has been little discussion of how to empirically assess the structure of a distribution shift that one is encountering in practice. In this work, we adopt a causal framing to motivate conditional independence tests as a key tool for characterizing distribution shifts. Using our approach in two medical applications, we show that this knowledge can help diagnose failures of fairness transfer, including cases where real-world shifts are more complex than is often assumed in the literature. Based on these results, we discuss potential remedies at each step of the machine learning pipeline.
Submitted 10 February, 2023; v1 submitted 2 February, 2022;
originally announced February 2022.
-
VoxSRC 2021: The Third VoxCeleb Speaker Recognition Challenge
Authors:
Andrew Brown,
Jaesung Huh,
Joon Son Chung,
Arsha Nagrani,
Daniel Garcia-Romero,
Andrew Zisserman
Abstract:
The third instalment of the VoxCeleb Speaker Recognition Challenge was held in conjunction with Interspeech 2021. The aim of this challenge was to assess how well current speaker recognition technology is able to diarise and recognise speakers in unconstrained or 'in the wild' data. The challenge consisted of: (i) the provision of publicly available speaker recognition and diarisation data from YouTube videos together with ground truth annotation and standardised evaluation software; and (ii) a virtual public challenge and workshop held at Interspeech 2021. This paper outlines the challenge, and describes the baselines, methods and results. We conclude with a discussion on the new multi-lingual focus of VoxSRC 2021, and on the progression of the challenge since the previous two editions.
Submitted 16 November, 2022; v1 submitted 12 January, 2022;
originally announced January 2022.
-
Computing Minimal Injective Resolutions of Sheaves on Finite Posets
Authors:
Adam Brown,
Ondrej Draganov
Abstract:
In this paper we introduce two new methods for constructing injective resolutions of sheaves of finite-dimensional vector spaces on finite posets. Our main result is the existence and uniqueness of a minimal injective resolution of a given sheaf and an algorithm for its construction. For the constant sheaf on a simplicial complex, we give a topological interpretation of the multiplicities of indecomposable injective sheaves in the minimal injective resolution, and give asymptotically tight bounds on the complexity of computing the minimal injective resolution with our algorithm.
Submitted 8 December, 2021; v1 submitted 5 December, 2021;
originally announced December 2021.
-
Designing Composites with Target Effective Young's Modulus using Reinforcement Learning
Authors:
Aldair E. Gongora,
Siddharth Mysore,
Beichen Li,
Wan Shou,
Wojciech Matusik,
Elise F. Morgan,
Keith A. Brown,
Emily Whiting
Abstract:
Advancements in additive manufacturing have enabled the design and fabrication of materials and structures not previously realizable. In particular, the design space of composite materials and structures has vastly expanded, and the resulting size and complexity have challenged traditional design methodologies, such as brute-force exploration and one-factor-at-a-time (OFAT) exploration, to find optimum or tailored designs. To address this challenge, supervised machine learning approaches have emerged to model the design space using curated training data; however, the selection of the training data is often determined by the user. In this work, we develop and utilize a reinforcement learning (RL)-based framework for the design of composite structures that avoids the need for user-selected training data. For a 5 $\times$ 5 composite design space composed of soft and compliant blocks of constituent material, we find that, using this approach, the model can be trained using 2.78% of the total design space, which consists of $2^{25}$ design possibilities. Additionally, the developed RL-based framework is capable of finding designs at a success rate exceeding 90%. The success of this approach motivates future learning frameworks to utilize RL for the design of composites and other material systems.
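The size of the design space follows directly from the grid. Assuming each of the $5 \times 5 = 25$ cells independently holds one of two constituent materials, there are $2^{25} = 33{,}554{,}432$ designs. Each design can be encoded as a 25-bit integer. The sketch below (hypothetical names; material labels 0/1 are placeholders) shows that encoding and what "2.78% of the design space" amounts to in absolute terms.

```python
from typing import List

GRID = 5                    # 5 x 5 design grid
CELLS = GRID * GRID         # 25 cells
DESIGN_SPACE = 2 ** CELLS   # two materials per cell -> 2^25 designs


def decode(design_id: int) -> List[List[int]]:
    """Map an integer in [0, 2^25) to a 5x5 grid of material labels 0/1.

    Bit i (least significant first) gives the material of cell i,
    filled row by row.
    """
    if not 0 <= design_id < DESIGN_SPACE:
        raise ValueError("design_id out of range")
    bits = [(design_id >> i) & 1 for i in range(CELLS)]
    return [bits[r * GRID:(r + 1) * GRID] for r in range(GRID)]


# Number of designs corresponding to 2.78% of the space.
trained_designs = round(0.0278 * DESIGN_SPACE)
print(DESIGN_SPACE, trained_designs)  # 33554432 932813
```

Under these assumptions, 2.78% of the space is roughly 933,000 designs. That is still a large number in absolute terms, but far fewer than exhaustive evaluation of all $2^{25}$.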
Submitted 7 October, 2021;
originally announced October 2021.
-
Robust Temporal Ensembling for Learning with Noisy Labels
Authors:
Abel Brown,
Benedikt Schifferer,
Robert DiPietro
Abstract:
Successful training of deep neural networks with noisy labels is an essential capability as most real-world datasets contain some amount of mislabeled data. Left unmitigated, label noise can sharply degrade typical supervised learning approaches. In this paper, we present robust temporal ensembling (RTE), which combines robust loss with semi-supervised regularization methods to achieve noise-robust learning. We demonstrate that RTE achieves state-of-the-art performance across the CIFAR-10, CIFAR-100, ImageNet, WebVision, and Food-101N datasets, while forgoing the recent trend of label filtering and/or fixing. Finally, we show that RTE also retains competitive corruption robustness to unforeseen input noise using CIFAR-10-C, obtaining a mean corruption error (mCE) of 13.50% even in the presence of an 80% noise ratio, versus 26.9% mCE with standard methods on clean data.
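RTE builds on temporal ensembling, in which each sample keeps an exponential moving average $Z \leftarrow \alpha Z + (1-\alpha) z$ of the network's predictions across epochs. The bias-corrected average $Z / (1 - \alpha^t)$ serves as a consistency target (Laine & Aila, 2017). The sketch below shows only that generic target computation in plain Python, with a hypothetical class name and an illustrative default $\alpha = 0.6$. It does not reproduce RTE's specific robust loss or how the two terms are combined, which the abstract does not detail.

```python
from typing import List, Sequence


class TemporalEnsemble:
    """Per-sample EMA of predictions with start-up bias correction."""

    def __init__(self, num_samples: int, num_classes: int, alpha: float = 0.6):
        self.alpha = alpha
        self.epoch = 0
        self.ensemble = [[0.0] * num_classes for _ in range(num_samples)]

    def update(self, preds: Sequence[Sequence[float]]) -> List[List[float]]:
        """Fold in this epoch's predictions (one row per sample) and
        return the bias-corrected ensemble targets."""
        self.epoch += 1
        a = self.alpha
        correction = 1.0 - a ** self.epoch  # undoes the zero initialisation
        targets = []
        for i, p in enumerate(preds):
            self.ensemble[i] = [a * z + (1.0 - a) * q for z, q in zip(self.ensemble[i], p)]
            targets.append([z / correction for z in self.ensemble[i]])
        return targets
```

Because the ensemble changes slowly, a few epochs of predictions pulled toward a wrong label have limited effect on the target. This is one intuition for why such regularizers help under label noise.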
Submitted 29 September, 2021;
originally announced September 2021.
-
Snakes AI Competition 2020 and 2021 Report
Authors:
Joseph Alexander Brown,
Luiz Jonata Pires de Araujo,
Alexandr Grichshenko
Abstract:
The Snakes AI Competition was held by Innopolis University and was part of the 2020 and 2021 editions of the IEEE Conference on Games. It aimed to create a sandbox for learning and implementing artificial intelligence algorithms in agents in a ludic manner. Competitors from several countries participated in both editions of the competition, which was streamed to create a synergy between the organizers and the community. The high-quality submissions and the enthusiasm around the developed framework create an exciting scenario for future extensions.
Submitted 11 August, 2021;
originally announced August 2021.
-
Using Interaction Data to Predict Engagement with Interactive Media
Authors:
Jonathan Carlton,
Andy Brown,
Caroline Jay,
John Keane
Abstract:
Media is evolving from traditional linear narratives to personalised experiences, where control over information (or how it is presented) is given to individual audience members. Measuring and understanding audience engagement with this media is important in at least two ways: (1) a post-hoc understanding of how engaged audiences are with the content will help production teams learn from experience and improve future productions; (2) this type of media has the potential for real-time measures of engagement to be used to enhance the user experience by adapting content on the fly. Engagement is typically measured by asking samples of users to self-report, which is time-consuming and expensive. In some domains, however, interaction data have been used to infer engagement. Fortuitously, the nature of interactive media facilitates a much richer set of interaction data than traditional media; our research aims to understand if these data can be used to infer audience engagement. In this paper, we report a study using data captured from audience interactions with an interactive TV show to model and predict engagement. We find that temporal metrics, including overall time spent on the experience and the interval between events, are predictive of engagement. The results demonstrate that interaction data can be used to infer users' engagement during and after an experience, and the proposed techniques are relevant to better understand audience preference and responses.
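The two temporal metrics named in the abstract, overall time spent and the interval between events, are simple to derive from a timestamped interaction log. The sketch below computes them for one session. The function name, the metric keys, and the assumption that events arrive as timestamps in seconds are hypothetical. The paper's full feature set and predictive model are not reproduced here.

```python
from statistics import mean, median
from typing import Dict, Sequence


def temporal_metrics(timestamps: Sequence[float]) -> Dict[str, float]:
    """Summarise one session's interaction log (event times in seconds)."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]  # inter-event intervals
    return {
        "total_time": ts[-1] - ts[0],
        "num_events": float(len(ts)),
        "mean_interval": mean(gaps) if gaps else 0.0,
        "median_interval": median(gaps) if gaps else 0.0,
    }


print(temporal_metrics([0.0, 10.0, 15.0, 45.0]))
```

Per-session dictionaries like this could be stacked into a feature table and regressed against self-reported engagement scores.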
Submitted 4 August, 2021;
originally announced August 2021.
-
Benchmarking the Performance of Bayesian Optimization across Multiple Experimental Materials Science Domains
Authors:
Qiaohao Liang,
Aldair E. Gongora,
Zekun Ren,
Armi Tiihonen,
Zhe Liu,
Shijing Sun,
James R. Deneault,
Daniil Bash,
Flore Mekki-Berrada,
Saif A. Khan,
Kedar Hippalgaonkar,
Benji Maruyama,
Keith A. Brown,
John Fisher III,
Tonio Buonassisi
Abstract:
In the field of machine learning (ML) for materials optimization, active learning algorithms, such as Bayesian Optimization (BO), have been leveraged for guiding autonomous and high-throughput experimentation systems. However, very few studies have evaluated the efficiency of BO as a general optimization algorithm across a broad range of experimental materials science domains. In this work, we evaluate the performance of BO algorithms with a collection of surrogate model and acquisition function pairs across five diverse experimental materials systems, namely carbon nanotube polymer blends, silver nanoparticles, lead-halide perovskites, as well as additively manufactured polymer structures and shapes. By defining acceleration and enhancement metrics for general materials optimization objectives, we find that for surrogate model selection, Gaussian Process (GP) with anisotropic kernels (automatic relevance determination, ARD) and Random Forests (RF) have comparable performance and both outperform the commonly used GP without ARD. We discuss the implicit distributional assumptions of RF and GP, and the benefits of using GP with anisotropic kernels in detail. We provide practical insights for experimentalists on surrogate model selection for BO during materials optimization campaigns.
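The abstract contrasts GPs with anisotropic (ARD) kernels against the isotropic GPs in common use. The plain-Python sketch below shows what "anisotropic" means: a squared-exponential kernel with one lengthscale per input dimension. It pairs the kernel with expected improvement as one example acquisition function. The function names and the choice of expected improvement are illustrative assumptions. The paper's specific surrogate/acquisition pairs are not reproduced here.

```python
import math


def ard_rbf(x, y, lengthscales, variance=1.0):
    """Anisotropic (ARD) squared-exponential kernel.

    Each input dimension d has its own lengthscale l_d. A very large l_d makes
    the kernel almost insensitive to that dimension, which is how ARD can
    effectively switch off irrelevant inputs. An isotropic kernel would use
    a single shared lengthscale instead.
    """
    sq = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, lengthscales))
    return variance * math.exp(-0.5 * sq)


def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement (maximisation) from a surrogate's mean and std."""
    if sigma <= 0.0:
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))        # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (mu - best - xi) * cdf + sigma * pdf


if __name__ == "__main__":
    # Second input has a huge lengthscale, so moving along it barely matters.
    print(ard_rbf([0.0, 0.0], [0.0, 5.0], [1.0, 1e6]))
    print(expected_improvement(mu=1.2, sigma=0.3, best=1.0))
```

In a full BO loop, the GP posterior mean and standard deviation at each candidate would be computed from this kernel and then scored by the acquisition function. The lengthscales would typically be fitted by maximising the marginal likelihood.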
Submitted 23 May, 2021;
originally announced June 2021.
-
Face, Body, Voice: Video Person-Clustering with Multiple Modalities
Authors:
Andrew Brown,
Vicky Kalogeiton,
Andrew Zisserman
Abstract:
The objective of this work is person-clustering in videos -- grouping characters according to their identity. Previous methods focus on the narrower task of face-clustering, and for the most part ignore other cues such as the person's voice, their overall appearance (hair, clothes, posture), and the editing structure of the videos. Similarly, most current datasets evaluate only the task of face-clustering, rather than person-clustering. This limits their applicability to downstream applications such as story understanding which require person-level, rather than only face-level, reasoning. In this paper we make contributions to address both these deficiencies: first, we introduce a Multi-Modal High-Precision Clustering algorithm for person-clustering in videos using cues from several modalities (face, body, and voice). Second, we introduce a Video Person-Clustering dataset, for evaluating multi-modal person-clustering. It contains body-tracks for each annotated character, face-tracks when visible, and voice-tracks when speaking, with their associated features. The dataset is by far the largest of its kind, and covers films and TV-shows representing a wide range of demographics. Finally, we show the effectiveness of using multiple modalities for person-clustering, explore the use of this new broad task for story understanding through character co-occurrences, and achieve a new state of the art on all available datasets for face and person-clustering.
Submitted 20 May, 2021;
originally announced May 2021.
-
Thinking Through and Writing About Research Ethics Beyond "Broader Impact"
Authors:
Kate Sim,
Andrew Brown,
Amelia Hassoun
Abstract:
In March 2021, we held the first instalment of the tutorial on thinking through and writing about research ethics beyond 'Broader Impact' in conjunction with the ACM Conference on Fairness, Accountability, and Transparency (FAccT '21). The goal of this tutorial was to offer a conceptual and practical starting point for engineers and social scientists interested in thinking more expansively, holistically, and critically about research ethics. This report provides an outline of the tutorial, and contains our 'lifecourse checklist'. This was presented as part of the tutorial, and provides a practical starting point for researchers when thinking about research ethics before a project's start. We provide this to the research community, with the hope that researchers use it when considering the ethics of their research.
Submitted 16 April, 2021;
originally announced April 2021.
-
Machine Learning-Based Optimal Mesh Generation in Computational Fluid Dynamics
Authors:
Keefe Huang,
Moritz Krügener,
Alistair Brown,
Friedrich Menhorn,
Hans-Joachim Bungartz,
Dirk Hartmann
Abstract:
Computational Fluid Dynamics (CFD) is a major sub-field of engineering. Corresponding flow simulations are typically characterized by heavy computational resource requirements. Often, very fine and complex meshes are required to resolve physical effects in an appropriate manner. Since all CFD algorithms scale at least linearly with the size of the underlying mesh discretization, finding an optimal mesh is key for computational efficiency.
One methodology used to find optimal meshes is goal-oriented adaptive mesh refinement. However, this is typically computationally demanding and only available in a limited number of tools. Within this contribution, we adopt a machine learning approach to identify optimal mesh densities. We generate optimized meshes using classical methodologies and propose training a convolutional network to predict optimal mesh densities for arbitrary geometries. The proposed concept is validated on 2D wind-tunnel simulations, with more than 60,000 simulations in total. Using a training set of 20,000 simulations, we achieve accuracies of more than 98.7%.
Corresponding predictions of optimal meshes can be used as input for any mesh generation and CFD tool. Thus, without complex computations, any CFD engineer can start their simulations from a high-quality mesh.
Submitted 25 February, 2021;
originally announced February 2021.
-
Automated Video Labelling: Identifying Faces by Corroborative Evidence
Authors:
Andrew Brown,
Ernesto Coto,
Andrew Zisserman
Abstract:
We present a method for automatically labelling all faces in video archives, such as TV broadcasts, by combining multiple evidence sources and multiple modalities (visual and audio). We target the problem of ever-growing online video archives, where an effective, scalable indexing solution cannot require a user to provide manual annotation or supervision. To this end, we make three key contributions: (1) We provide a novel, simple method for determining whether a person is famous using image-search engines. In turn, this enables a face-identity model to be built reliably and robustly, and used for high-precision automatic labelling; (2) We show that even for less-famous people, image-search engines can then be used for corroborative evidence to accurately label faces that are named in the scene or the speech; (3) Finally, we quantitatively demonstrate the benefits of our approach on different video domains and test settings, such as TV shows and news broadcasts. Our method works across three disparate datasets without any explicit domain adaptation, and sets new state-of-the-art results on all the public benchmarks.
Submitted 10 February, 2021;
originally announced February 2021.
-
Towards integrated tactile sensorimotor control in anthropomorphic soft robotic hands
Authors:
Nathan F. Lepora,
Andrew Stinchcombe,
Chris Ford,
Alfred Brown,
John Lloyd,
Manuel G. Catalano,
Matteo Bianchi,
Benjamin Ward-Cherrier
Abstract:
In this work, we report on the integrated sensorimotor control of the Pisa/IIT SoftHand, an anthropomorphic soft robot hand designed around the principle of adaptive synergies, with the BRL tactile fingertip (TacTip), a soft biomimetic optical tactile sensor based on the human sense of touch. Our focus is how a sense of touch can be used to control an anthropomorphic hand with one degree of actuation, based on an integration that respects the hand's mechanical functionality. We consider: (i) closed-loop tactile control to establish a light contact on an unknown held object, based on the structural similarity with an undeformed tactile image; and (ii) controlling the estimated pose of an edge feature of a held object, using a convolutional neural network approach developed for controlling other sensors in the TacTip family. Overall, this gives a foundation to endow soft robotic hands with human-like touch, with implications for autonomous grasping, manipulation, human-robot interaction and prosthetics. Supplemental video: https://youtu.be/ndsxj659bkQ
Submitted 5 February, 2021;
originally announced February 2021.
-
VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge
Authors:
Arsha Nagrani,
Joon Son Chung,
Jaesung Huh,
Andrew Brown,
Ernesto Coto,
Weidi Xie,
Mitchell McLaren,
Douglas A Reynolds,
Andrew Zisserman
Abstract:
We held the second installment of the VoxCeleb Speaker Recognition Challenge in conjunction with Interspeech 2020. The goal of this challenge was to assess how well current speaker recognition technology is able to diarise and recognize speakers in unconstrained or `in the wild' data. It consisted of: (i) a publicly available speaker recognition and diarisation dataset from YouTube videos together with ground truth annotation and standardised evaluation software; and (ii) a virtual public challenge and workshop held at Interspeech 2020. This paper outlines the challenge, and describes the baselines, methods used, and results. We conclude with a discussion of the progress over the first installment of the challenge.
Submitted 12 December, 2020;
originally announced December 2020.
-
SophiaPop: Experiments in Human-AI Collaboration on Popular Music
Authors:
David Hanson,
Frankie Storm,
Wenwei Huang,
Vytas Krisciunas,
Tiger Darrow,
Audrey Brown,
Mengna Lei,
Matthew Aylett,
Adam Pickrell,
Sophia the Robot
Abstract:
A diverse team of engineers, artists, and algorithms collaborated to create songs for SophiaPop using various neural networks, robotics technologies, and artistic tools, and animated the results on Sophia the Robot, a robotic celebrity and animated character. Sophia is a platform for arts, research, and other uses. To advance the art and technology of Sophia, we combine various AI techniques with a fictional narrative of her burgeoning career as a pop star. The project includes her actual AI-generated pop lyrics, music, and paintings, as well as animated conversations in which she interacts with humans in real time in narratives that discuss her experiences. To compose the music, the SophiaPop team built corpora from human- and AI-generated Sophia character personality content, along with pop music song forms. These corpora were used to train and provide seeds for a number of AI algorithms, including expert models and custom-trained transformer neural networks, which then generated original pop-song lyrics and melodies. Our musicians, including Frankie Storm, Adam Pickrell, and Tiger Darrow, then performed interpretations of the AI-generated musical content, including singing and instrumentation. The human-performed singing data were then processed by a neural-network-based Sophia voice, which was custom-trained from human performances by Cereproc. This AI then generated the unique Sophia voice singing the songs. We then animated Sophia to sing the songs in music videos, using a variety of animation generators and human-generated animations. By bringing algorithms and humans together, SophiaPop represents a human-AI collaboration that aspires toward human-AI symbiosis. We believe that such a creative convergence of multiple disciplines, with humans and AI working together, can make AI relevant to human culture in new and exciting ways and lead to a hopeful vision for the future of human-AI relations.
Submitted 20 November, 2020;
originally announced November 2020.
-
Towards a perceptual distance metric for auditory stimuli
Authors:
Sarah Oh,
Elijah FW Bowen,
Antonio Rodriguez,
Damian Sowinski,
Eva Childers,
Annemarie Brown,
Laura Ray,
Richard Granger
Abstract:
Although perceptual (dis)similarity between sensory stimuli seems akin to distance, measuring the Euclidean distance between vector representations of auditory stimuli is a poor estimator of subjective dissimilarity. In hearing, nonlinear response patterns, interactions between stimulus components, temporal effects, and top-down modulation transform the information contained in incoming frequency-domain stimuli in a way that seems to preserve some notion of distance, but not that of familiar Euclidean space. This work proposes that transformations applied to auditory stimuli during hearing can be modeled as a function mapping stimulus points to their representations in a perceptual space, inducing a Riemannian distance metric. A dataset was collected in a subjective listening experiment, and the results were used to explore approaches to approximating the perceptual map (biologically inspired, data-driven, and combinations thereof). Each of the proposed measures achieved correlations with subjective ratings (r ≈ 0.8) comparable to or stronger than those of state-of-the-art audio quality measures.
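Under the view described above, the distance between two stimuli is the length, measured in perceptual space, of the shortest path between them (a pullback Riemannian metric). The sketch below approximates the perceptual length of one particular path, the straight segment between two stimuli, by sampling points along it and pushing them through a map `f`. Because it measures a single path rather than the shortest one, the result is an upper bound on the induced distance. The map `f` and the log-compression example are hypothetical stand-ins, not the authors' biologically inspired or data-driven models.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def path_length_under_map(
    f: Callable[[Vector], Vector], x: Vector, y: Vector, steps: int = 100
) -> float:
    """Length of the straight segment x -> y after mapping it through f.

    The induced Riemannian distance is the infimum of such lengths over all
    paths from x to y; the straight segment is just one path, so this value
    is an upper bound on that distance.
    """
    def lerp(t: float) -> list[float]:
        return [a + t * (b - a) for a, b in zip(x, y)]

    total = 0.0
    prev = f(lerp(0.0))
    for k in range(1, steps + 1):
        cur = f(lerp(k / steps))
        total += math.dist(prev, cur)  # Euclidean step length in perceptual space
        prev = cur
    return total


if __name__ == "__main__":
    # Hypothetical perceptual map: log-compress the first coordinate only.
    compress = lambda v: [math.log1p(v[0]), v[1]]
    print(path_length_under_map(compress, [0.0, 0.0], [10.0, 1.0]))
```

With the identity map this reduces to ordinary Euclidean distance. A nonlinear map shrinks or stretches distances depending on where in stimulus space the path runs.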
Submitted 30 October, 2020;
originally announced November 2020.
-
Playing a Part: Speaker Verification at the Movies
Authors:
Andrew Brown,
Jaesung Huh,
Arsha Nagrani,
Joon Son Chung,
Andrew Zisserman
Abstract:
The goal of this work is to investigate the performance of popular speaker recognition models on speech segments from movies, where actors often intentionally disguise their voices to play a character. We make the following three contributions: (i) We collect a novel, challenging speaker recognition dataset called VoxMovies, with speech for 856 identities from almost 4000 movie clips. VoxMovies contains utterances with varying emotion, accents and background noise, and therefore comprises an entirely different domain from the interview-style, emotionally calm utterances in current speaker recognition datasets such as VoxCeleb; (ii) We provide a number of domain adaptation evaluation sets, and benchmark the performance of state-of-the-art speaker recognition models on these evaluation pairs. We demonstrate that both speaker verification and identification performance drop steeply on this new data, showing the challenge in transferring models across domains; and finally (iii) We show that simple domain adaptation paradigms improve performance, but there is still considerable room for improvement.
Submitted 11 February, 2021; v1 submitted 29 October, 2020;
originally announced October 2020.
-
Reproducing GW150914: the first observation of gravitational waves from a binary black hole merger
Authors:
Duncan A. Brown,
Karan Vahi,
Michela Taufer,
Von Welch,
Ewa Deelman
Abstract:
In 2016, LIGO and Virgo announced the first observation of gravitational waves from a binary black hole merger, known as GW150914. To establish the confidence of this detection, large-scale scientific workflows were used to measure the event's statistical significance. These workflows used code written by the LIGO/Virgo collaboration and were executed on the LIGO Data Grid. The code is publicly available, but there has not yet been an attempt to directly reproduce the results, although several analyses have replicated the analysis, confirming the detection. We attempt to reproduce the result presented in the GW150914 discovery paper using publicly available code on the Open Science Grid. We show that we can reproduce the main result, but we cannot exactly reproduce the LIGO analysis because the original data set used is not public. We discuss the challenges we encountered and make recommendations for scientists who wish to make their work reproducible.
Submitted 2 March, 2021; v1 submitted 14 October, 2020;
originally announced October 2020.
-
Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval
Authors:
Andrew Brown,
Weidi Xie,
Vicky Kalogeiton,
Andrew Zisserman
Abstract:
Optimising a ranking-based metric, such as Average Precision (AP), is notoriously challenging because it is non-differentiable, and hence cannot be optimised directly using gradient-descent methods. To this end, we introduce an objective that instead optimises a smoothed approximation of AP, coined Smooth-AP. Smooth-AP is a plug-and-play objective function that allows for end-to-end training of deep networks with a simple and elegant implementation. We also present an analysis of why directly optimising the ranking-based metric of AP offers benefits over other deep metric learning losses. We apply Smooth-AP to standard retrieval benchmarks, Stanford Online Products and VehicleID, and also evaluate on larger-scale datasets: iNaturalist for fine-grained category retrieval, and VGGFace2 and IJB-C for face retrieval. In all cases, we improve on the state-of-the-art performance, especially for larger-scale datasets, thus demonstrating the effectiveness and scalability of Smooth-AP in real-world scenarios.
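The sketch below computes exact and smoothed AP for a single query and shows the smoothing idea described above. It assumes the formulation in which each positive's rank among positives and among all items is built from pairwise score differences, with the step function 1[s_j > s_i] replaced by a sigmoid of temperature `tau`. It is plain Python and shows only the arithmetic. For training, the same expression would be written in an autodiff framework so gradients reach the scores. Function names and the toy scores are hypothetical.

```python
import math


def sigmoid(x: float, tau: float) -> float:
    """Logistic function of x / tau, written to avoid overflow for large |x / tau|."""
    z = x / tau
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def exact_ap(pos_scores, neg_scores):
    """AP for one query: mean over positives of (rank among positives) / (rank among all).

    Ranks use the indicator 1[s_j > s_i]; ties are assumed absent.
    """
    ap = 0.0
    for i, si in enumerate(pos_scores):
        rank_pos = 1 + sum(1 for j, sj in enumerate(pos_scores) if j != i and sj > si)
        rank_all = rank_pos + sum(1 for sj in neg_scores if sj > si)
        ap += rank_pos / rank_all
    return ap / len(pos_scores)


def smooth_ap(pos_scores, neg_scores, tau=0.01):
    """Same ratio, with 1[s_j > s_i] replaced by sigmoid((s_j - s_i) / tau)."""
    ap = 0.0
    for i, si in enumerate(pos_scores):
        rank_pos = 1 + sum(sigmoid(sj - si, tau) for j, sj in enumerate(pos_scores) if j != i)
        rank_all = rank_pos + sum(sigmoid(sj - si, tau) for sj in neg_scores)
        ap += rank_pos / rank_all
    return ap / len(pos_scores)


if __name__ == "__main__":
    pos, neg = [0.9, 0.6], [0.8, 0.3]  # similarity scores to the query
    print(exact_ap(pos, neg), smooth_ap(pos, neg, tau=0.01), smooth_ap(pos, neg, tau=1.0))
```

A small `tau` keeps the smoothed value close to the true AP. A larger `tau` gives smoother but less faithful gradients.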
Submitted 8 September, 2020; v1 submitted 23 July, 2020;
originally announced July 2020.
-
Using Tabu Search Algorithm for Map Generation in the Terra Mystica Tabletop Game
Authors:
Alexandr Grichshenko,
Luiz Jonata Pires de Araujo,
Susanna Gimaeva,
Joseph Alexander Brown
Abstract:
The Tabu Search (TS) metaheuristic improves simple local search algorithms (e.g. steepest-ascent hill climbing) by enabling the algorithm to escape local optima. It has been shown to be useful for addressing several combinatorial optimization problems. This paper investigates the performance of TS and considers the effects of the size of the tabu list and the size of the neighbourhood for procedural content generation, specifically the generation of maps for a popular tabletop game called Terra Mystica. The results validate the feasibility of the proposed method and show how it can be used to generate maps that improve on existing maps for the game.
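The sketch below is a generic tabu search over bit strings, illustrating the two parameters studied above. It is not the paper's Terra Mystica map representation or map-quality function. Here `tabu_size` and `neighbourhood_size` stand in for the tabu-list size and neighbourhood size. The aspiration rule, which allows a tabu move when it beats the best solution found so far, is a common textbook addition assumed for this sketch.

```python
import random
from collections import deque
from typing import Callable

Solution = tuple[int, ...]


def tabu_search(
    objective: Callable[[Solution], float],
    start: Solution,
    iterations: int = 200,
    tabu_size: int = 10,
    neighbourhood_size: int = 8,
    seed: int = 0,
) -> tuple[Solution, float]:
    """Maximise `objective` over bit strings with a basic tabu search.

    Each step samples up to `neighbourhood_size` single-bit flips, skips flips
    of recently changed positions (the tabu list), and moves to the best
    remaining neighbour even if it is worse than the current solution; this is
    what lets the search walk out of local optima.
    """
    rng = random.Random(seed)
    current = start
    best, best_val = current, objective(current)
    tabu = deque(maxlen=tabu_size)  # recently flipped positions; oldest drop off
    n = len(start)
    for _ in range(iterations):
        candidates = []
        for p in rng.sample(range(n), min(neighbourhood_size, n)):
            neighbour = current[:p] + (1 - current[p],) + current[p + 1:]
            val = objective(neighbour)
            if p in tabu and val <= best_val:  # tabu, and no aspiration override
                continue
            candidates.append((val, p, neighbour))
        if not candidates:
            continue
        val, p, current = max(candidates)  # best admissible neighbour
        tabu.append(p)
        if val > best_val:
            best, best_val = current, val
    return best, best_val


if __name__ == "__main__":
    # Toy objective (hypothetical): count of ones in a 12-bit string.
    print(tabu_search(sum, (0,) * 12, iterations=50, tabu_size=3, neighbourhood_size=12))
```

A longer tabu list pushes the search further from recently visited solutions. A larger neighbourhood makes each step greedier but more expensive to evaluate.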
Submitted 4 June, 2020;
originally announced June 2020.
-
Condensed Movies: Story Based Retrieval with Contextual Embeddings
Authors:
Max Bain,
Arsha Nagrani,
Andrew Brown,
Andrew Zisserman
Abstract:
Our objective in this work is long-range understanding of the narrative structure of movies. Instead of considering the entire movie, we propose to learn from the `key scenes' of the movie, providing a condensed look at the full storyline. To this end, we make the following three contributions: (i) We create the Condensed Movies Dataset (CMD) consisting of the key scenes from over 3K movies: each key scene is accompanied by a high-level semantic description of the scene, character face-tracks, and metadata about the movie. The dataset is scalable, obtained automatically from YouTube, and is freely available for anybody to download and use. It is also an order of magnitude larger than existing movie datasets in the number of movies; (ii) We provide a deep network baseline for text-to-video retrieval on our dataset, combining character, speech and visual cues into a single video embedding; and finally (iii) We demonstrate how the addition of context from other video clips improves retrieval performance.
Submitted 22 October, 2020; v1 submitted 8 May, 2020;
originally announced May 2020.
-
Automated Inline Analysis of Myocardial Perfusion MRI with Deep Learning
Authors:
Hui Xue,
Rhodri Davies,
Louis AE Brown,
Kristopher D Knott,
Tushar Kotecha,
Marianna Fontana,
Sven Plein,
James C Moon,
Peter Kellman
Abstract:
Recent development of quantitative myocardial blood flow (MBF) mapping allows direct evaluation of absolute myocardial perfusion by computing pixel-wise flow maps. Clinical studies suggest that quantitative evaluation would be more desirable for objectivity and efficiency. Objective assessment can be further facilitated by segmenting the myocardium and automatically generating reports following the AHA model. This removes the need for user interaction during analysis and leads to a 'one-click' solution that improves workflow. This paper proposes a deep-neural-network-based computational workflow for inline myocardial perfusion analysis. Adenosine stress and rest perfusion scans were acquired from three hospitals. The training set included N=1,825 perfusion series from 1,034 patients. The independent test set included 200 scans from 105 patients. Data were consecutively acquired at each site. A convolutional neural network (CNN) model was trained to segment the LV cavity, myocardium, and right ventricle by processing incoming 2D+T perfusion Gd series. Model outputs were compared to manual ground truth for accuracy of segmentation and of flow measures derived on a global and per-sector basis. The trained models were integrated onto MR scanners for effective inference. Segmentation accuracy and myocardial flow measures were compared between CNN models and manual ground truth. The mean Dice ratio of the CNN-derived myocardium was 0.93 ± 0.04. Neither global flow nor per-sector values differed significantly from manual results. The AHA 16-segment model was automatically generated and reported on the MR scanner. As a result, fully automated analysis of perfusion flow mapping was achieved. This solution was integrated on the MR scanner, enabling 'one-click' analysis and reporting of myocardial blood flow.
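The reported mean Dice ratio measures the overlap between the CNN segmentation and the manual ground truth. For reference, here is a minimal sketch of the Dice ratio, 2|A ∩ B| / (|A| + |B|), on flattened binary masks. Returning 1.0 when both masks are empty is an assumed convention, and the toy masks are hypothetical.

```python
from typing import Sequence


def dice(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice ratio 2|A ∩ B| / (|A| + |B|) for two flattened binary masks.

    Elements are treated as truthy/falsy (1/0 or True/False). Two empty masks
    are treated as a perfect match (an assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("masks must have the same length")
    inter = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    return 1.0 if total == 0 else 2.0 * inter / total


if __name__ == "__main__":
    predicted = [1, 1, 1, 0, 0, 0]  # hypothetical CNN myocardium mask
    manual = [0, 1, 1, 1, 0, 0]     # hypothetical manual ground truth
    print(dice(predicted, manual))
```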
Submitted 29 May, 2020; v1 submitted 1 November, 2019;
originally announced November 2019.
-
4-Connected Shift Residual Networks
Authors:
Andrew Brown,
Pascal Mettes,
Marcel Worring
Abstract:
The shift operation was recently introduced as an alternative to spatial convolutions. The operation moves subsets of activations horizontally and/or vertically. Spatial convolutions are then replaced with shift operations followed by point-wise convolutions, significantly reducing computational costs. In this work, we investigate how shifts should best be applied to high-accuracy CNNs. We apply shifts of two different neighbourhood groups to ResNet on ImageNet: the originally introduced 8-connected (8C) neighbourhood shift and the less well studied 4-connected (4C) neighbourhood shift. We find that when replacing ResNet's spatial convolutions with shifts, both shift neighbourhoods give equal ImageNet accuracy, showing the sufficiency of small neighbourhoods for large images. Interestingly, when incorporating shifts into all point-wise convolutions in residual networks, 4-connected shifts outperform 8-connected shifts. Such a 4-connected shift setup gives the same accuracy as full residual networks while reducing the number of parameters and FLOPs by over 40%. We then highlight that without spatial convolutions, ResNet's downsampling/upsampling bottleneck channel structure is no longer needed. We show a new, 4C shift-based residual network, much shorter than the original ResNet yet with higher accuracy for the same computational cost. This network is the highest-accuracy shift-based network yet shown, demonstrating the potential of shifting in deep neural networks.
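To make the shift operation concrete, the sketch below shifts each channel of a feature map by one pixel in a direction drawn from either a 4-connected or an 8-connected neighbourhood, zero-filling vacated positions. It uses plain nested lists for clarity. Three details are assumptions of this sketch rather than the paper's exact configuration: including a zero (identity) offset, assigning channels to offsets round-robin, and all names.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Offset = Tuple[int, int]  # (dy, dx); positive dy moves content down, positive dx right

# Assumed offset sets: both include the identity offset (0, 0).
SHIFTS_4C: List[Offset] = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
SHIFTS_8C: List[Offset] = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def shift_channel(grid: Grid, dy: int, dx: int) -> Grid:
    """Move every activation by (dy, dx); positions left empty become 0.0."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source pixel that lands on (y, x)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = grid[sy][sx]
    return out


def shift(features: List[Grid], offsets: Sequence[Offset]) -> List[Grid]:
    """Assign channels to offsets round-robin and shift each channel.

    The shift has no weights and no multiplications; in a shift network a
    point-wise (1x1) convolution would follow to mix the shifted channels.
    """
    return [shift_channel(ch, *offsets[c % len(offsets)]) for c, ch in enumerate(features)]


if __name__ == "__main__":
    fmap = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(5)]  # 5 channels of 2x2
    for ch in shift(fmap, SHIFTS_4C):
        print(ch)
```

Because the shift itself is free of parameters and FLOPs, the cost of a shift block sits entirely in the point-wise convolutions. This is why the choice of neighbourhood affects accuracy without changing the parameter count.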
Submitted 22 October, 2019;
originally announced October 2019.