Search | arXiv e-print repository

CMA-ES for Discrete and Mixed-Variable Optimization on Sets of Points

Authors: Kento Uchida, Ryoki Hamano, Masahiro Nomura, Shota Saito, Shinichi Shirakawa

Abstract: Discrete and mixed-variable optimization problems have appeared in several real-world applications. Most of the research on mixed-variable optimization considers a mixture of integer and continuous variables, and several integer handlings have been developed to inherit the optimization performance of the continuous optimization methods to mixed-integer optimization. In some applications, acceptabl… ▽ More Discrete and mixed-variable optimization problems have appeared in several real-world applications. Most of the research on mixed-variable optimization considers a mixture of integer and continuous variables, and several integer handlings have been developed to inherit the optimization performance of the continuous optimization methods to mixed-integer optimization. In some applications, acceptable solutions are given by selecting possible points in the disjoint subspaces. This paper focuses on the optimization on sets of points and proposes an optimization method by extending the covariance matrix adaptation evolution strategy (CMA-ES), termed the CMA-ES on sets of points (CMA-ES-SoP). The CMA-ES-SoP incorporates margin correction that maintains the generation probability of neighboring points to prevent premature convergence to a specific non-optimal point, which is an effective integer-handling technique for CMA-ES. In addition, because margin correction with a fixed margin value tends to increase the marginal probabilities for a portion of neighboring points more than necessary, the CMA-ES-SoP updates the target margin value adaptively to make the average of the marginal probabilities close to a predefined target probability. Numerical simulations demonstrated that the CMA-ES-SoP successfully optimized the optimization problems on sets of points, whereas the naive CMA-ES failed to optimize them due to premature convergence. △ Less

Submitted 23 August, 2024; originally announced August 2024.

Comments: This paper has been accepted for presentation at PPSN2024

arXiv:2408.11202 [pdf, other]

Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits

Authors: Tatsuhiro Shimizu, Koichi Tanaka, Ren Kishimoto, Haruka Kiyohara, Masahiro Nomura, Yuta Saito

Abstract: We explore off-policy evaluation and learning (OPE/L) in contextual combinatorial bandits (CCB), where a policy selects a subset in the action space. For example, it might choose a set of furniture pieces (a bed and a drawer) from available items (bed, drawer, chair, etc.) for interior design sales. This setting is widespread in fields such as recommender systems and healthcare, yet OPE/L of CCB r… ▽ More We explore off-policy evaluation and learning (OPE/L) in contextual combinatorial bandits (CCB), where a policy selects a subset in the action space. For example, it might choose a set of furniture pieces (a bed and a drawer) from available items (bed, drawer, chair, etc.) for interior design sales. This setting is widespread in fields such as recommender systems and healthcare, yet OPE/L of CCB remains unexplored in the relevant literature. Typical OPE/L methods such as regression and importance sampling can be applied to the CCB problem, however, they face significant challenges due to high bias or variance, exacerbated by the exponential growth in the number of available subsets. To address these challenges, we introduce a concept of factored action space, which allows us to decompose each subset into binary indicators. This formulation allows us to distinguish between the ''main effect'' derived from the main actions, and the ''residual effect'', originating from the supplemental actions, facilitating more effective OPE. Specifically, our estimator, called OPCB, leverages an importance sampling-based approach to unbiasedly estimate the main effect, while employing regression-based approach to deal with the residual effect with low variance. OPCB achieves substantial variance reduction compared to conventional importance sampling methods and bias reduction relative to regression methods under certain conditions, as illustrated in our theoretical analysis. Experiments demonstrate OPCB's superior performance over typical methods in both OPE and OPL. △ Less

Submitted 20 August, 2024; originally announced August 2024.

Comments: accepted at RecSys2024

arXiv:2406.16506 [pdf, other]

Natural Gradient Interpretation of Rank-One Update in CMA-ES

Authors: Ryoki Hamano, Shinichi Shirakawa, Masahiro Nomura

Abstract: The covariance matrix adaptation evolution strategy (CMA-ES) is a stochastic search algorithm using a multivariate normal distribution for continuous black-box optimization. In addition to strong empirical results, part of the CMA-ES can be described by a stochastic natural gradient method and can be derived from information geometric optimization (IGO) framework. However, there are some component… ▽ More The covariance matrix adaptation evolution strategy (CMA-ES) is a stochastic search algorithm using a multivariate normal distribution for continuous black-box optimization. In addition to strong empirical results, part of the CMA-ES can be described by a stochastic natural gradient method and can be derived from information geometric optimization (IGO) framework. However, there are some components of the CMA-ES, such as the rank-one update, for which the theoretical understanding is limited. While the rank-one update makes the covariance matrix to increase the likelihood of generating a solution in the direction of the evolution path, this idea has been difficult to formulate and interpret as a natural gradient method unlike the rank-$μ$ update. In this work, we provide a new interpretation of the rank-one update in the CMA-ES from the perspective of the natural gradient with prior distribution. First, we propose maximum a posteriori IGO (MAP-IGO), which is the IGO framework extended to incorporate a prior distribution. Then, we derive the rank-one update from the MAP-IGO by setting the prior distribution based on the idea that the promising mean vector should exist in the direction of the evolution path. Moreover, the newly derived rank-one update is extensible, where an additional term appears in the update for the mean vector. We empirically investigate the properties of the additional term using various benchmark functions. △ Less

Submitted 9 August, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: This paper has been accepted for presentation at PPSN2024. Updated experimental results and related sentences in Sections 6 and 7

arXiv:2405.10534 [pdf, other]

CMA-ES for Safe Optimization

Authors: Kento Uchida, Ryoki Hamano, Masahiro Nomura, Shota Saito, Shinichi Shirakawa

Abstract: In several real-world applications in medical and control engineering, there are unsafe solutions whose evaluations involve inherent risk. This optimization setting is known as safe optimization and formulated as a specialized type of constrained optimization problem with constraints for safety functions. Safe optimization requires performing efficient optimization without evaluating unsafe soluti… ▽ More In several real-world applications in medical and control engineering, there are unsafe solutions whose evaluations involve inherent risk. This optimization setting is known as safe optimization and formulated as a specialized type of constrained optimization problem with constraints for safety functions. Safe optimization requires performing efficient optimization without evaluating unsafe solutions. A few studies have proposed the optimization methods for safe optimization based on Bayesian optimization and the evolutionary algorithm. However, Bayesian optimization-based methods often struggle to achieve superior solutions, and the evolutionary algorithm-based method fails to effectively reduce unsafe evaluations. This study focuses on CMA-ES as an efficient evolutionary algorithm and proposes an optimization method termed safe CMA-ES. The safe CMA-ES is designed to achieve both safety and efficiency in safe optimization. The safe CMA-ES estimates the Lipschitz constants of safety functions transformed with the distribution parameters using the maximum norm of the gradient in Gaussian process regression. Subsequently, the safe CMA-ES projects the samples to the nearest point in the safe region constructed with the estimated Lipschitz constants. The numerical simulation using the benchmark functions shows that the safe CMA-ES successfully performs optimization, suppressing the unsafe evaluations, while the existing methods struggle to significantly reduce the unsafe evaluations. △ Less

Submitted 17 May, 2024; originally announced May 2024.

Comments: This paper has been accepted as a full paper at GECCO2024

arXiv:2405.09962 [pdf, other]

doi 10.1145/3638529.3654198

CatCMA : Stochastic Optimization for Mixed-Category Problems

Authors: Ryoki Hamano, Shota Saito, Masahiro Nomura, Kento Uchida, Shinichi Shirakawa

Abstract: Black-box optimization problems often require simultaneously optimizing different types of variables, such as continuous, integer, and categorical variables. Unlike integer variables, categorical variables do not necessarily have a meaningful order, and the discretization approach of continuous variables does not work well. Although several Bayesian optimization methods can deal with mixed-categor… ▽ More Black-box optimization problems often require simultaneously optimizing different types of variables, such as continuous, integer, and categorical variables. Unlike integer variables, categorical variables do not necessarily have a meaningful order, and the discretization approach of continuous variables does not work well. Although several Bayesian optimization methods can deal with mixed-category black-box optimization (MC-BBO), they suffer from a lack of scalability to high-dimensional problems and internal computational cost. This paper proposes CatCMA, a stochastic optimization method for MC-BBO problems, which employs the joint probability distribution of multivariate Gaussian and categorical distributions as the search distribution. CatCMA updates the parameters of the joint probability distribution in the natural gradient direction. CatCMA also incorporates the acceleration techniques used in the covariance matrix adaptation evolution strategy (CMA-ES) and the stochastic natural gradient method, such as step-size adaptation and learning rate adaptation. In addition, we restrict the ranges of the categorical distribution parameters by margin to prevent premature convergence and analytically derive a promising margin setting. Numerical experiments show that the performance of CatCMA is superior and more robust to problem dimensions compared to state-of-the-art Bayesian optimization algorithms. △ Less

Submitted 7 August, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

Comments: This paper has been accepted for presentation at GECCO2024. We have modified the equations in Table 1

arXiv:2404.15084 [pdf, other]

Hyperparameter Optimization Can Even be Harmful in Off-Policy Learning and How to Deal with It

Authors: Yuta Saito, Masahiro Nomura

Abstract: There has been a growing interest in off-policy evaluation in the literature such as recommender systems and personalized medicine. We have so far seen significant progress in developing estimators aimed at accurately estimating the effectiveness of counterfactual policies based on biased logged data. However, there are many cases where those estimators are used not only to evaluate the value of d… ▽ More There has been a growing interest in off-policy evaluation in the literature such as recommender systems and personalized medicine. We have so far seen significant progress in developing estimators aimed at accurately estimating the effectiveness of counterfactual policies based on biased logged data. However, there are many cases where those estimators are used not only to evaluate the value of decision making policies but also to search for the best hyperparameters from a large candidate space. This work explores the latter hyperparameter optimization (HPO) task for off-policy learning. We empirically show that naively applying an unbiased estimator of the generalization performance as a surrogate objective in HPO can cause an unexpected failure, merely pursuing hyperparameters whose generalization performance is greatly overestimated. We then propose simple and computationally efficient corrections to the typical HPO procedure to deal with the aforementioned issues simultaneously. Empirical investigations demonstrate the effectiveness of our proposed HPO algorithm in situations where the typical procedure fails severely. △ Less

Submitted 23 April, 2024; originally announced April 2024.

Comments: IJCAI'24

arXiv:2402.02171 [pdf, other]

doi 10.1145/3589334.3645343

Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction

Authors: Haruka Kiyohara, Masahiro Nomura, Yuta Saito

Abstract: We study off-policy evaluation (OPE) in the problem of slate contextual bandits where a policy selects multi-dimensional actions known as slates. This problem is widespread in recommender systems, search engines, marketing, to medical applications, however, the typical Inverse Propensity Scoring (IPS) estimator suffers from substantial variance due to large action spaces, making effective OPE a si… ▽ More We study off-policy evaluation (OPE) in the problem of slate contextual bandits where a policy selects multi-dimensional actions known as slates. This problem is widespread in recommender systems, search engines, marketing, to medical applications, however, the typical Inverse Propensity Scoring (IPS) estimator suffers from substantial variance due to large action spaces, making effective OPE a significant challenge. The PseudoInverse (PI) estimator has been introduced to mitigate the variance issue by assuming linearity in the reward function, but this can result in significant bias as this assumption is hard-to-verify from observed data and is often substantially violated. To address the limitations of previous estimators, we develop a novel estimator for OPE of slate bandits, called Latent IPS (LIPS), which defines importance weights in a low-dimensional slate abstraction space where we optimize slate abstractions to minimize the bias and variance of LIPS in a data-driven way. By doing so, LIPS can substantially reduce the variance of IPS without imposing restrictive assumptions on the reward function structure like linearity. Through empirical evaluation, we demonstrate that LIPS substantially outperforms existing estimators, particularly in scenarios with non-linear rewards and large slate spaces. △ Less

Submitted 17 February, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

Comments: WWW2024

arXiv:2402.01373 [pdf, other]

cmaes : A Simple yet Practical Python Library for CMA-ES

Authors: Masahiro Nomura, Masashi Shibata

Abstract: The covariance matrix adaptation evolution strategy (CMA-ES) has been highly effective in black-box continuous optimization, as demonstrated by its success in both benchmark problems and various real-world applications. To address the need for an accessible yet potent tool in this domain, we developed cmaes, a simple and practical Python library for CMA-ES. cmaes is characterized by its simplicity… ▽ More The covariance matrix adaptation evolution strategy (CMA-ES) has been highly effective in black-box continuous optimization, as demonstrated by its success in both benchmark problems and various real-world applications. To address the need for an accessible yet potent tool in this domain, we developed cmaes, a simple and practical Python library for CMA-ES. cmaes is characterized by its simplicity, offering intuitive use and high code readability. This makes it suitable for quickly using CMA-ES, as well as for educational purposes and seamless integration into other libraries. Despite its simplistic design, cmaes maintains enhanced functionality. It incorporates recent advancements in CMA-ES, such as learning rate adaptation for challenging scenarios, transfer learning, and mixed-integer optimization capabilities. These advanced features are accessible through a user-friendly API, ensuring that cmaes can be easily adopted in practical applications. We regard cmaes as the first choice for a Python CMA-ES library among practitioners. The software is available under the MIT license at https://github.com/CyberAgentAILab/cmaes. △ Less

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.15876 [pdf, other]

CMA-ES with Learning Rate Adaptation

Authors: Masahiro Nomura, Youhei Akimoto, Isao Ono

Abstract: The covariance matrix adaptation evolution strategy (CMA-ES) is one of the most successful methods for solving continuous black-box optimization problems. A practically useful aspect of the CMA-ES is that it can be used without hyperparameter tuning. However, the hyperparameter settings still have a considerable impact on performance, especially for difficult tasks, such as solving multimodal or n… ▽ More The covariance matrix adaptation evolution strategy (CMA-ES) is one of the most successful methods for solving continuous black-box optimization problems. A practically useful aspect of the CMA-ES is that it can be used without hyperparameter tuning. However, the hyperparameter settings still have a considerable impact on performance, especially for difficult tasks, such as solving multimodal or noisy problems. This study comprehensively explores the impact of learning rate on the CMA-ES performance and demonstrates the necessity of a small learning rate by considering ordinary differential equations. Thereafter, it discusses the setting of an ideal learning rate. Based on these discussions, we develop a novel learning rate adaptation mechanism for the CMA-ES that maintains a constant signal-to-noise ratio. Additionally, we investigate the behavior of the CMA-ES with the proposed learning rate adaptation mechanism through numerical experiments, and compare the results with those obtained for the CMA-ES with a fixed learning rate and with population size adaptation. The results show that the CMA-ES with the proposed learning rate adaptation works well for multimodal and/or noisy problems without extremely expensive learning rate tuning. △ Less

Submitted 28 January, 2024; originally announced January 2024.

Comments: Under review for ACM TELO

arXiv:2305.00849 [pdf, other]

(1+1)-CMA-ES with Margin for Discrete and Mixed-Integer Problems

Authors: Yohei Watanabe, Kento Uchida, Ryoki Hamano, Shota Saito, Masahiro Nomura, Shinichi Shirakawa

Abstract: The covariance matrix adaptation evolution strategy (CMA-ES) is an efficient continuous black-box optimization method. The CMA-ES possesses many attractive features, including invariance properties and a well-tuned default hyperparameter setting. Moreover, several components to specialize the CMA-ES have been proposed, such as noise handling and constraint handling. To utilize these advantages in… ▽ More The covariance matrix adaptation evolution strategy (CMA-ES) is an efficient continuous black-box optimization method. The CMA-ES possesses many attractive features, including invariance properties and a well-tuned default hyperparameter setting. Moreover, several components to specialize the CMA-ES have been proposed, such as noise handling and constraint handling. To utilize these advantages in mixed-integer optimization problems, the CMA-ES with margin has been proposed. The CMA-ES with margin prevents the premature convergence of discrete variables by the margin correction, in which the distribution parameters are modified to leave the generation probability for changing the discrete variable. The margin correction has been applied to ($μ/μ_\mathrm{w}$,$λ$)-CMA-ES, while this paper introduces the margin correction into (1+1)-CMA-ES, an elitist version of CMA-ES. The (1+1)-CMA-ES is often advantageous for unimodal functions and can be computationally less expensive. To tackle the performance deterioration on mixed-integer optimization, we use the discretized elitist solution as the mean of the sampling distribution and modify the margin correction not to move the elitist solution. The numerical simulation using benchmark functions on mixed-integer, integer, and binary domains shows that (1+1)-CMA-ES with margin outperforms the CMA-ES with margin and is better than or comparable with several specialized methods to a particular search domain. △ Less

Submitted 1 May, 2023; originally announced May 2023.

arXiv:2304.03473 [pdf, other]

doi 10.1145/3583131.3590358

CMA-ES with Learning Rate Adaptation: Can CMA-ES with Default Population Size Solve Multimodal and Noisy Problems?

Authors: Masahiro Nomura, Youhei Akimoto, Isao Ono

Abstract: The covariance matrix adaptation evolution strategy (CMA-ES) is one of the most successful methods for solving black-box continuous optimization problems. One practically useful aspect of the CMA-ES is that it can be used without hyperparameter tuning. However, the hyperparameter settings still have a considerable impact, especially for difficult tasks such as solving multimodal or noisy problems.… ▽ More The covariance matrix adaptation evolution strategy (CMA-ES) is one of the most successful methods for solving black-box continuous optimization problems. One practically useful aspect of the CMA-ES is that it can be used without hyperparameter tuning. However, the hyperparameter settings still have a considerable impact, especially for difficult tasks such as solving multimodal or noisy problems. In this study, we investigate whether the CMA-ES with default population size can solve multimodal and noisy problems. To perform this investigation, we develop a novel learning rate adaptation mechanism for the CMA-ES, such that the learning rate is adapted so as to maintain a constant signal-to-noise ratio. We investigate the behavior of the CMA-ES with the proposed learning rate adaptation mechanism through numerical experiments, and compare the results with those obtained for the CMA-ES with a fixed learning rate. The results demonstrate that, when the proposed learning rate adaptation is used, the CMA-ES with default population size works well on multimodal and/or noisy problems, without the need for extremely expensive learning rate tuning. △ Less

Submitted 14 September, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

Comments: Nominated for the best paper of GECCO'23 ENUM Track. We have corrected the error of Eq.(7)

arXiv:2302.01513 [pdf, other]

Towards Practical Preferential Bayesian Optimization with Skew Gaussian Processes

Authors: Shion Takeno, Masahiro Nomura, Masayuki Karasuyama

Abstract: We study preferential Bayesian optimization (BO) where reliable feedback is limited to pairwise comparison called duels. An important challenge in preferential BO, which uses the preferential Gaussian process (GP) model to represent flexible preference structure, is that the posterior distribution is a computationally intractable skew GP. The most widely used approach for preferential BO is Gaussi… ▽ More We study preferential Bayesian optimization (BO) where reliable feedback is limited to pairwise comparison called duels. An important challenge in preferential BO, which uses the preferential Gaussian process (GP) model to represent flexible preference structure, is that the posterior distribution is a computationally intractable skew GP. The most widely used approach for preferential BO is Gaussian approximation, which ignores the skewness of the true posterior. Alternatively, Markov chain Monte Carlo (MCMC) based preferential BO is also proposed. In this work, we first verify the accuracy of Gaussian approximation, from which we reveal the critical problem that the predictive probability of duels can be inaccurate. This observation motivates us to improve the MCMC-based estimation for skew GP, for which we show the practical efficiency of Gibbs sampling and derive the low variance MC estimator. However, the computational time of MCMC can still be a bottleneck in practice. Towards building a more practical preferential BO, we develop a new method that achieves both high computational efficiency and low sample complexity, and then demonstrate its effectiveness through extensive numerical experiments. △ Less

Submitted 11 June, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

Comments: 25 pages, 9 figures, Accepted to ICML2023

arXiv:2212.09260 [pdf, other]

Marginal Probability-Based Integer Handling for CMA-ES Tackling Single-and Multi-Objective Mixed-Integer Black-Box Optimization

Authors: Ryoki Hamano, Shota Saito, Masahiro Nomura, Shinichi Shirakawa

Abstract: This study targets the mixed-integer black-box optimization (MI-BBO) problem where continuous and integer variables should be optimized simultaneously. The CMA-ES, our focus in this study, is a population-based stochastic search method that samples solution candidates from a multivariate Gaussian distribution (MGD), which shows excellent performance in continuous BBO. The parameters of MGD, mean a… ▽ More This study targets the mixed-integer black-box optimization (MI-BBO) problem where continuous and integer variables should be optimized simultaneously. The CMA-ES, our focus in this study, is a population-based stochastic search method that samples solution candidates from a multivariate Gaussian distribution (MGD), which shows excellent performance in continuous BBO. The parameters of MGD, mean and (co)variance, are updated based on the evaluation value of candidate solutions in the CMA-ES. If the CMA-ES is applied to the MI-BBO with straightforward discretization, however, the variance corresponding to the integer variables becomes much smaller than the granularity of the discretization before reaching the optimal solution, which leads to the stagnation of the optimization. In particular, when binary variables are included in the problem, this stagnation more likely occurs because the granularity of the discretization becomes wider, and the existing integer handling for the CMA-ES does not address this stagnation. To overcome these limitations, we propose a simple integer handling for the CMA-ES based on lower-bounding the marginal probabilities associated with the generation of integer variables in the MGD. The numerical experiments on the MI-BBO benchmark problems demonstrate the efficiency and robustness of the proposed method. Furthermore, in order to demonstrate the generality of the idea of the proposed method, in addition to the single-objective optimization case, we incorporate it into multi-objective CMA-ES and verify its performance on bi-objective mixed-integer benchmark problems. △ Less

Submitted 11 January, 2024; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: Camera-ready version for ACM Transactions on Evolutionary Learning and Optimization (TELO). This paper is an extended version of the work presented in arXiv:2205.13482

arXiv:2205.13482 [pdf, other]

doi 10.1145/3512290.3528827

CMA-ES with Margin: Lower-Bounding Marginal Probability for Mixed-Integer Black-Box Optimization

Authors: Ryoki Hamano, Shota Saito, Masahiro Nomura, Shinichi Shirakawa

Abstract: This study targets the mixed-integer black-box optimization (MI-BBO) problem where continuous and integer variables should be optimized simultaneously. The CMA-ES, our focus in this study, is a population-based stochastic search method that samples solution candidates from a multivariate Gaussian distribution (MGD), which shows excellent performance in continuous BBO. The parameters of MGD, mean a… ▽ More This study targets the mixed-integer black-box optimization (MI-BBO) problem where continuous and integer variables should be optimized simultaneously. The CMA-ES, our focus in this study, is a population-based stochastic search method that samples solution candidates from a multivariate Gaussian distribution (MGD), which shows excellent performance in continuous BBO. The parameters of MGD, mean and (co)variance, are updated based on the evaluation value of candidate solutions in the CMA-ES. If the CMA-ES is applied to the MI-BBO with straightforward discretization, however, the variance corresponding to the integer variables becomes much smaller than the granularity of the discretization before reaching the optimal solution, which leads to the stagnation of the optimization. In particular, when binary variables are included in the problem, this stagnation more likely occurs because the granularity of the discretization becomes wider, and the existing modification to the CMA-ES does not address this stagnation. To overcome these limitations, we propose a simple modification of the CMA-ES based on lower-bounding the marginal probabilities associated with the generation of integer variables in the MGD. The numerical experiments on the MI-BBO benchmark problems demonstrate the efficiency and robustness of the proposed method. △ Less

Submitted 12 January, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: Nominated for the best paper of GECCO'22 ENUM Track. We have corrected the error of Algorithm 1 in the Appendix. In addition, an extended version is published at arXiv:2212.09260 that describes support for the multi-objective MI-BBO

arXiv:2201.11422 [pdf, other]

Fast Moving Natural Evolution Strategy for High-Dimensional Problems

Authors: Masahiro Nomura, Isao Ono

Abstract: In this work, we propose a new variant of natural evolution strategies (NES) for high-dimensional black-box optimization problems. The proposed method, CR-FM-NES, extends a recently proposed state-of-the-art NES, Fast Moving Natural Evolution Strategy (FM-NES), in order to be applicable in high-dimensional problems. CR-FM-NES builds on an idea using a restricted representation of a covariance matr… ▽ More In this work, we propose a new variant of natural evolution strategies (NES) for high-dimensional black-box optimization problems. The proposed method, CR-FM-NES, extends a recently proposed state-of-the-art NES, Fast Moving Natural Evolution Strategy (FM-NES), in order to be applicable in high-dimensional problems. CR-FM-NES builds on an idea using a restricted representation of a covariance matrix instead of using a full covariance matrix, while inheriting an efficiency of FM-NES. The restricted representation of the covariance matrix enables CR-FM-NES to update parameters of a multivariate normal distribution in linear time and space complexity, which can be applied to high-dimensional problems. Our experimental results reveal that CR-FM-NES does not lose the efficiency of FM-NES, and on the contrary, CR-FM-NES has achieved significant speedup compared to FM-NES on some benchmark problems. Furthermore, our numerical experiments using 200, 600, and 1000-dimensional benchmark problems demonstrate that CR-FM-NES is effective over scalable baseline methods, VD-CMA and Sep-CMA. △ Less

Submitted 8 May, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

Comments: Accepted for CEC 2022

arXiv:2201.04469 [pdf, other]

Optimal Best Arm Identification in Two-Armed Bandits with a Fixed Budget under a Small Gap

Authors: Masahiro Kato, Kaito Ariu, Masaaki Imaizumi, Masahiro Nomura, Chao Qin

Abstract: We consider fixed-budget best-arm identification in two-armed Gaussian bandit problems. One of the longstanding open questions is the existence of an optimal strategy under which the probability of misidentification matches a lower bound. We show that a strategy following the Neyman allocation rule (Neyman, 1934) is asymptotically optimal when the gap between the expected rewards is small. First,… ▽ More We consider fixed-budget best-arm identification in two-armed Gaussian bandit problems. One of the longstanding open questions is the existence of an optimal strategy under which the probability of misidentification matches a lower bound. We show that a strategy following the Neyman allocation rule (Neyman, 1934) is asymptotically optimal when the gap between the expected rewards is small. First, we review a lower bound derived by Kaufmann et al. (2016). Then, we propose the "Neyman Allocation (NA)-Augmented Inverse Probability weighting (AIPW)" strategy, which consists of the sampling rule using the Neyman allocation with an estimated standard deviation and the recommendation rule using an AIPW estimator. Our proposed strategy is optimal because the upper bound matches the lower bound when the budget goes to infinity and the gap goes to zero. △ Less

Submitted 28 December, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

arXiv:2112.10680 [pdf, other]

Towards a Principled Learning Rate Adaptation for Natural Evolution Strategies

Authors: Masahiro Nomura, Isao Ono

Abstract: Natural Evolution Strategies (NES) is a promising framework for black-box continuous optimization problems. NES optimizes the parameters of a probability distribution based on the estimated natural gradient, and one of the key parameters affecting the performance is the learning rate. We argue that from the viewpoint of the natural gradient method, the learning rate should be determined according… ▽ More Natural Evolution Strategies (NES) is a promising framework for black-box continuous optimization problems. NES optimizes the parameters of a probability distribution based on the estimated natural gradient, and one of the key parameters affecting the performance is the learning rate. We argue that from the viewpoint of the natural gradient method, the learning rate should be determined according to the estimation accuracy of the natural gradient. To do so, we propose a new learning rate adaptation mechanism for NES. The proposed mechanism makes it possible to set a high learning rate for problems that are relatively easy to optimize, which results in speeding up the search. On the other hand, in problems that are difficult to optimize (e.g., multimodal functions), the proposed mechanism makes it possible to set a conservative learning rate when the estimation accuracy of the natural gradient seems to be low, which results in the robust and stable search. The experimental evaluations on unimodal and multimodal functions demonstrate that the proposed mechanism works properly depending on a search situation and is effective over the existing method, i.e., using the fixed learning rate. △ Less

Submitted 4 February, 2022; v1 submitted 22 November, 2021; originally announced December 2021.

Comments: accepted at EvoApplications2022

arXiv:2108.09455 [pdf, other]

Natural Evolution Strategy for Unconstrained and Implicitly Constrained Problems with Ridge Structure

Authors: Masahiro Nomura, Isao Ono

Abstract: In this paper, we propose a new natural evolution strategy for unconstrained black-box function optimization (BBFO) problems and implicitly constrained BBFO problems. BBFO problems are known to be difficult because explicit representations of objective functions are not available. Implicit constraints make the problems more difficult because whether or not a solution is feasible is revealed when t… ▽ More In this paper, we propose a new natural evolution strategy for unconstrained black-box function optimization (BBFO) problems and implicitly constrained BBFO problems. BBFO problems are known to be difficult because explicit representations of objective functions are not available. Implicit constraints make the problems more difficult because whether or not a solution is feasible is revealed when the solution is evaluated with the objective function. DX-NES-IC is one of the promising methods for implicitly constrained BBFO problems. DX-NES-IC has shown better performance than conventional methods on implicitly constrained benchmark problems. However, DX-NES-IC has a problem in that the moving speed of the probability distribution is slow on ridge structure. To address the problem, we propose the Fast Moving Natural Evolution Strategy (FM-NES) that accelerates the movement of the probability distribution on ridge structure by introducing the rank-one update into DX-NES-IC. The rank-one update is utilized in CMA-ES. Since naively introducing the rank-one update makes the search performance deteriorate on implicitly constrained problems, we propose a condition of performing the rank-one update. We also propose to reset the shape of the probability distribution when an infeasible solution is sampled at the first time. In numerical experiments using unconstrained and implicitly constrained benchmark problems, FM-NES showed better performance than DX-NES-IC on problems with ridge structure and almost the same performance as DX-NES-IC on the others. Furthermore, FM-NES outperformed xNES, CMA-ES, xNES with the resampling technique, and CMA-ES with the resampling technique. △ Less

Submitted 16 October, 2021; v1 submitted 21 August, 2021; originally announced August 2021.

Comments: accepted at IEEE Symposium Series on Computational Intelligence (IEEE SSCI 2021)

arXiv:2012.06932 [pdf, other]

Warm Starting CMA-ES for Hyperparameter Optimization

Authors: Masahiro Nomura, Shuhei Watanabe, Youhei Akimoto, Yoshihiko Ozaki, Masaki Onishi

Abstract: Hyperparameter optimization (HPO), formulated as black-box optimization (BBO), is recognized as essential for automation and high performance of machine learning approaches. The CMA-ES is a promising BBO approach with a high degree of parallelism, and has been applied to HPO tasks, often under parallel implementation, and shown superior performance to other approaches including Bayesian optimizati… ▽ More Hyperparameter optimization (HPO), formulated as black-box optimization (BBO), is recognized as essential for automation and high performance of machine learning approaches. The CMA-ES is a promising BBO approach with a high degree of parallelism, and has been applied to HPO tasks, often under parallel implementation, and shown superior performance to other approaches including Bayesian optimization (BO). However, if the budget of hyperparameter evaluations is severely limited, which is often the case for end users who do not deserve parallel computing, the CMA-ES exhausts the budget without improving the performance due to its long adaptation phase, resulting in being outperformed by BO approaches. To address this issue, we propose to transfer prior knowledge on similar HPO tasks through the initialization of the CMA-ES, leading to significantly shortening the adaptation time. The knowledge transfer is designed based on the novel definition of task similarity, with which the correlation of the performance of the proposed approach is confirmed on synthetic problems. The proposed warm starting CMA-ES, called WS-CMA-ES, is applied to different HPO tasks where some prior knowledge is available, showing its superior performance over the original CMA-ES as well as BO approaches with or without using the prior knowledge. △ Less

Submitted 12 December, 2020; originally announced December 2020.

Comments: accepted at AAAI2021

arXiv:2006.13600 [pdf, other]

Simple and Scalable Parallelized Bayesian Optimization

Authors: Masahiro Nomura

Abstract: In recent years, leveraging parallel and distributed computational resources has become essential to solve problems of high computational cost. Bayesian optimization (BO) has shown attractive results in those expensive-to-evaluate problems such as hyperparameter optimization of machine learning algorithms. While many parallel BO methods have been developed to search efficiently utilizing these com… ▽ More In recent years, leveraging parallel and distributed computational resources has become essential to solve problems of high computational cost. Bayesian optimization (BO) has shown attractive results in those expensive-to-evaluate problems such as hyperparameter optimization of machine learning algorithms. While many parallel BO methods have been developed to search efficiently utilizing these computational resources, these methods assumed synchronous settings or were not scalable. In this paper, we propose a simple and scalable BO method for asynchronous parallel settings. Experiments are carried out with a benchmark function and hyperparameter optimization of multi-layer perceptrons, which demonstrate the promising performance of the proposed method. △ Less

Submitted 24 June, 2020; originally announced June 2020.

Comments: accepted to the NewInML forum (co-located with NeurIPS 2019)

arXiv:2006.10600 [pdf, other]

doi 10.1145/3459637.3482336

Efficient Hyperparameter Optimization under Multi-Source Covariate Shift

Authors: Masahiro Nomura, Yuta Saito

Abstract: A typical assumption in supervised machine learning is that the train (source) and test (target) datasets follow completely the same distribution. This assumption is, however, often violated in uncertain real-world applications, which motivates the study of learning under covariate shift. In this setting, the naive use of adaptive hyperparameter optimization methods such as Bayesian optimization d… ▽ More A typical assumption in supervised machine learning is that the train (source) and test (target) datasets follow completely the same distribution. This assumption is, however, often violated in uncertain real-world applications, which motivates the study of learning under covariate shift. In this setting, the naive use of adaptive hyperparameter optimization methods such as Bayesian optimization does not work as desired since it does not address the distributional shift among different datasets. In this work, we consider a novel hyperparameter optimization problem under the multi-source covariate shift whose goal is to find the optimal hyperparameters for a target task of interest using only unlabeled data in a target task and labeled data in multiple source tasks. To conduct efficient hyperparameter optimization for the target task, it is essential to estimate the target objective using only the available information. To this end, we construct the variance reduced estimator that unbiasedly approximates the target objective with a desirable variance property. Building on the proposed estimator, we provide a general and tractable hyperparameter optimization procedure, which works preferably in our setting with a no-regret guarantee. The experiments demonstrate that the proposed framework broadens the applications of automated hyperparameter optimization. △ Less

Submitted 16 August, 2021; v1 submitted 18 June, 2020; originally announced June 2020.

Comments: equal contribution

arXiv:1911.07790 [pdf, ps, other]

A Simple Heuristic for Bayesian Optimization with A Low Budget

Authors: Masahiro Nomura, Kenshi Abe

Abstract: The aim of black-box optimization is to optimize an objective function within the constraints of a given evaluation budget. In this problem, it is generally assumed that the computational cost for evaluating a point is large; thus, it is important to search efficiently with as low budget as possible. Bayesian optimization is an efficient method for black-box optimization and provides exploration-e… ▽ More The aim of black-box optimization is to optimize an objective function within the constraints of a given evaluation budget. In this problem, it is generally assumed that the computational cost for evaluating a point is large; thus, it is important to search efficiently with as low budget as possible. Bayesian optimization is an efficient method for black-box optimization and provides exploration-exploitation trade-off by constructing a surrogate model that considers uncertainty of the objective function. However, because Bayesian optimization should construct the surrogate model for the entire search space, it does not exhibit good performance when points are not sampled sufficiently. In this study, we develop a heuristic method refining the search space for Bayesian optimization when the available evaluation budget is low. The proposed method refines a promising region by dividing the original region so that Bayesian optimization can be executed with the promising region as the initial search space. We confirm that Bayesian optimization with the proposed method outperforms Bayesian optimization alone and shows equal or better performance to two search-space division algorithms through experiments on the benchmark functions and the hyperparameter optimization of machine learning algorithms. △ Less

Submitted 30 November, 2019; v1 submitted 18 November, 2019; originally announced November 2019.

arXiv:1910.07295 [pdf, other]

Towards Resolving Propensity Contradiction in Offline Recommender Learning

Authors: Yuta Saito, Masahiro Nomura

Abstract: We study offline recommender learning from explicit rating feedback in the presence of selection bias. A current promising solution for the bias is the inverse propensity score (IPS) estimation. However, the performance of existing propensity-based methods can suffer significantly from the propensity estimation bias. In fact, most of the previous IPS-based methods require some amount of missing-co… ▽ More We study offline recommender learning from explicit rating feedback in the presence of selection bias. A current promising solution for the bias is the inverse propensity score (IPS) estimation. However, the performance of existing propensity-based methods can suffer significantly from the propensity estimation bias. In fact, most of the previous IPS-based methods require some amount of missing-completely-at-random (MCAR) data to accurately estimate the propensity. This leads to a critical self-contradiction; IPS is ineffective without MCAR data, even though it originally aims to learn recommenders from only missing-not-at-random feedback. To resolve this propensity contradiction, we derive a propensity-independent generalization error bound and propose a novel algorithm to minimize the theoretical bound via adversarial learning. Our theory and algorithm do not require a propensity estimation procedure, thereby leading to a well-performing rating predictor without the true propensity information. Extensive experiments demonstrate that the proposed approach is superior to a range of existing methods both in rating prediction and ranking metrics in practical settings without MCAR data. △ Less

Submitted 20 April, 2022; v1 submitted 16 October, 2019; originally announced October 2019.

Comments: IJCAI2022

Showing 1–23 of 23 results for author: Nomura, M