-
Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation
Authors:
Ben Harwood,
Amir Dezfouli,
Iadine Chades,
Conrad Sanderson
Abstract:
Approximate k-Nearest Neighbour (ANN) methods are often used for mining information and aiding machine learning on large scale high-dimensional datasets. ANN methods typically differ in the index structure used for accelerating searches, resulting in various recall/runtime trade-off points. For applications with static datasets, runtime constraints and dataset properties can be used to empirically select an ANN method with suitable operating characteristics. However, for applications with dynamic datasets, which are subject to frequent online changes (like addition of new samples), there is currently no consensus as to which ANN methods are most suitable. Traditional evaluation approaches do not consider the computational costs of updating the index structure, as well as the rate and size of index updates. To address this, we empirically evaluate 5 popular ANN methods on two main applications (online data collection and online feature learning) while taking into account these considerations. Two dynamic datasets are used, derived from the SIFT1M dataset with 1 million samples and the DEEP1B dataset with 1 billion samples. The results indicate that the often used k-d trees method is not suitable on dynamic datasets as it is slower than a straightforward baseline exhaustive search method. For online data collection, the Hierarchical Navigable Small World Graphs method achieves a consistent speedup over baseline across a wide range of recall rates. For online feature learning, the Scalable Nearest Neighbours method is faster than baseline for recall rates below 75%.
Submitted 5 June, 2024; v1 submitted 30 April, 2024;
originally announced April 2024.
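For readers unfamiliar with the terms in the abstract above, the following is a minimal sketch of an exhaustive-search baseline and a recall measurement on a dataset that receives an online addition. It is not the authors' evaluation code: the toy data, the dimensionality, and the helper names (`exhaustive_knn`, `recall_at_k`) are assumptions made for illustration only.

```python
import heapq
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exhaustive_knn(dataset, query, k):
    """Return indices of the k nearest samples by scanning every sample."""
    return heapq.nsmallest(
        k, range(len(dataset)), key=lambda i: squared_distance(dataset[i], query)
    )


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbours that the approximate result found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Hypothetical toy data: 200 random 8-dimensional samples.
rng = random.Random(0)
dataset = [[rng.random() for _ in range(8)] for _ in range(200)]
query = [rng.random() for _ in range(8)]

# Neighbours before the dataset changes.
exact_before = exhaustive_knn(dataset, query, k=5)

# Online addition: the baseline has no index, so appending is the whole update.
dataset.append(list(query))
exact_after = exhaustive_knn(dataset, query, k=5)

# A result computed before the update misses the newly added sample.
stale_recall = recall_at_k(exact_before, exact_after)
print(stale_recall)
```

The baseline pays no index-update cost but scans every sample per query, whereas an indexed ANN method trades update cost for faster queries. The stale result here stands in for an index that has not yet absorbed the new sample.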
-
Statistically Efficient Bayesian Sequential Experiment Design via Reinforcement Learning with Cross-Entropy Estimators
Authors:
Tom Blau,
Iadine Chades,
Amir Dezfouli,
Daniel Steinberg,
Edwin V. Bonilla
Abstract:
Reinforcement learning can learn amortised design policies for designing sequences of experiments. However, current amortised methods rely on estimators of expected information gain (EIG) that require a number of samples exponential in the magnitude of the EIG to achieve an unbiased estimation. We propose the use of an alternative estimator based on the cross-entropy of the joint model distribution and a flexible proposal distribution. This proposal distribution approximates the true posterior of the model parameters given the experimental history and the design policy. Our method overcomes the exponential-sample complexity of previous approaches and provides more accurate estimates of high EIG values. More importantly, it allows learning of superior design policies, and is compatible with continuous and discrete design spaces, non-differentiable likelihoods and even implicit probabilistic models.
Submitted 4 February, 2024; v1 submitted 28 May, 2023;
originally announced May 2023.
-
Bayesian Optimisation for Mixed-Variable Inputs using Value Proposals
Authors:
Yan Zuo,
Amir Dezfouli,
Iadine Chades,
David Alexander,
Benjamin Ward Muir
Abstract:
Many real-world optimisation problems are defined over both categorical and continuous variables, yet efficient optimisation methods such as Bayesian Optimisation (BO) are not designed to handle such mixed-variable search spaces. Recent approaches to this problem cast the selection of the categorical variables as a bandit problem, operating independently alongside a BO component which optimises the continuous variables. In this paper, we adopt a holistic view and aim to consolidate optimisation of the categorical and continuous sub-spaces under a single acquisition metric. We derive candidates from the Expected Improvement criterion, which we call value proposals, and use these proposals to make selections on both the categorical and continuous components of the input. We show that this unified approach significantly outperforms existing mixed-variable optimisation approaches across several mixed-variable black-box optimisation tasks.
Submitted 16 February, 2022; v1 submitted 9 February, 2022;
originally announced February 2022.
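As background for the abstract above, the following is a minimal sketch of the standard Expected Improvement criterion for maximisation under a Gaussian posterior, used to rank a few mixed categorical/continuous candidates with one score. It is not the paper's value-proposal method: the candidate tuples and their posterior means and standard deviations are made-up values, and `expected_improvement` is a hypothetical helper name.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Standard EI for maximisation, assuming a posterior N(mean, std**2)."""
    if std <= 0.0:
        # With no uncertainty, the improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf


# Hypothetical candidates: (category, continuous value, posterior mean, posterior std).
candidates = [
    ("A", 0.2, 1.0, 0.1),
    ("B", 0.7, 0.9, 0.5),
    ("C", 0.4, 0.5, 0.05),
]
best_so_far = 1.0

# Score every candidate with the same acquisition value and pick the largest.
best_candidate = max(
    candidates, key=lambda c: expected_improvement(c[2], c[3], best_so_far)
)
print(best_candidate)
```

Candidate B has the largest score despite its lower mean because its larger posterior uncertainty leaves more room for improvement over the incumbent. Scoring the categorical and continuous parts together is the sense in which a single acquisition metric covers both.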
-
Optimizing Sequential Experimental Design with Deep Reinforcement Learning
Authors:
Tom Blau,
Edwin V. Bonilla,
Iadine Chades,
Amir Dezfouli
Abstract:
Bayesian approaches developed to solve the optimal design of sequential experiments are mathematically elegant but computationally challenging. Recently, techniques using amortization have been proposed to make these Bayesian approaches practical, by training a parameterized policy that proposes designs efficiently at deployment time. However, these methods may not sufficiently explore the design space, require access to a differentiable probabilistic model and can only optimize over continuous design spaces. Here, we address these limitations by showing that the problem of optimizing policies can be reduced to solving a Markov decision process (MDP). We solve the equivalent MDP with modern deep reinforcement learning techniques. Our experiments show that our approach is also computationally efficient at deployment time and exhibits state-of-the-art performance on both continuous and discrete design spaces, even when the probabilistic model is a black box.
Submitted 17 June, 2022; v1 submitted 1 February, 2022;
originally announced February 2022.
-
From climate change to pandemics: decision science can help scientists have impact
Authors:
Christopher M. Baker,
Patricia T. Campbell,
Iadine Chades,
Angela J. Dean,
Susan M. Hester,
Matthew H. Holden,
James M. McCaw,
Jodie McVernon,
Robert Moss,
Freya M. Shearer,
Hugh P. Possingham
Abstract:
Scientific knowledge and advances are a cornerstone of modern society. They improve our understanding of the world we live in and help us navigate global challenges including emerging infectious diseases, climate change and the biodiversity crisis. For any scientist, whether they work primarily in fundamental knowledge generation or in the applied sciences, it is important to understand how science fits into a decision-making framework. Decision science is a field that aims to pinpoint evidence-based management strategies. It provides a framework for scientists to directly impact decisions or to understand how their work will fit into a decision process. Decision science is more than undertaking targeted and relevant scientific research or providing tools to assist policy makers; it is an approach to problem formulation, bringing together mathematical modelling, stakeholder values and logistical constraints to support decision making. In this paper we describe decision science and its use in different contexts, and highlight current gaps in methodology and application. The COVID-19 pandemic has thrust mathematical models into the public spotlight, but it is one of innumerable examples in which modelling informs decision making. Other examples include models of storm systems (e.g. cyclones, hurricanes) and climate change. Although the decision timescale in these examples differs enormously (from hours to decades), the underlying decision science approach is common across all problems. Bridging communication gaps between different groups is one of the greatest challenges for scientists. However, by better understanding and engaging with the decision-making processes, scientists will have greater impact and make stronger contributions to important societal problems.
Submitted 21 October, 2021; v1 submitted 26 July, 2020;
originally announced July 2020.