Optimization and Control
Showing new listings for Friday, 27 September 2024
- [1] arXiv:2409.17189
Title: Decentralized Federated Learning with Gradient Tracking over Time-Varying Directed Networks
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
We investigate the problem of agent-to-agent interaction in decentralized (federated) learning over time-varying directed graphs and propose a consensus-based algorithm called DSGTm-TV. The proposed algorithm incorporates gradient tracking and heavy-ball momentum to distributively optimize a global objective function while preserving local data privacy. Under DSGTm-TV, agents update their local model parameters and gradient estimates by exchanging information with neighboring agents through row- and column-stochastic mixing matrices, which we show guarantee both consensus and optimality. Our analysis establishes that DSGTm-TV converges linearly to the exact global optimum when exact gradient information is available, and converges in expectation to a neighborhood of the global optimum when stochastic gradients are used. Moreover, in contrast to existing methods, DSGTm-TV preserves convergence for networks with uncoordinated stepsizes and momentum parameters, for which we provide explicit bounds. These results enable agents to operate in a fully decentralized manner, independently optimizing their local hyper-parameters. We demonstrate the efficacy of our approach via comparisons with state-of-the-art baselines on real-world image classification and natural language processing tasks.
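A minimal sketch of the two ingredients named above, gradient tracking with heavy-ball momentum over a directed graph, assuming deterministic gradients, a static graph, and scalar quadratic local objectives $f_i(x) = \frac{a_i}{2}(x - b_i)^2$. Parameters are mixed with a row-stochastic matrix R and gradient trackers with a column-stochastic matrix C, in the style of push-pull (AB) methods; this is not DSGTm-TV itself (no time-varying graphs, no uncoordinated stepsizes), and `build_mixing`, `gradient_tracking_momentum`, `alpha`, and `beta` are hypothetical names and values.

```python
import numpy as np


def build_mixing(n, edges):
    """Build row- and column-stochastic mixing matrices from directed links.

    edges: list of (src, dst) pairs meaning dst receives from src.
    Every node also keeps a self-loop.
    """
    A = np.eye(n)
    for src, dst in edges:
        A[dst, src] = 1.0
    R = A / A.sum(axis=1, keepdims=True)  # rows sum to 1 (in-degree + 1)
    C = A / A.sum(axis=0, keepdims=True)  # columns sum to 1 (out-degree + 1)
    return R, C


def gradient_tracking_momentum(a, b, edges, alpha=0.02, beta=0.2, iters=3000):
    """Deterministic-gradient tracking with heavy-ball momentum (illustrative only)."""
    n = len(a)
    R, C = build_mixing(n, edges)

    def grad(x):
        # agent-wise gradients of f_i(x) = a_i / 2 * (x - b_i)^2
        return a * (x - b)

    x_prev = np.zeros(n)
    x = np.zeros(n)
    y = grad(x)  # each tracker starts at its own local gradient
    for _ in range(iters):
        # row-stochastic consensus step, tracked-gradient descent, heavy-ball term
        x_new = R @ x - alpha * y + beta * (x - x_prev)
        # column-stochastic C keeps sum(y) equal to the sum of current local gradients
        y = C @ y + grad(x_new) - grad(x)
        x_prev, x = x, x_new
    return x


a = np.array([1.0, 2.0, 3.0, 1.5])
b = np.array([4.0, -1.0, 2.0, 0.5])
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # directed ring plus one chord
x = gradient_tracking_momentum(a, b, edges)
x_star = (a * b).sum() / a.sum()  # minimizer of sum_i f_i
print(x, x_star)
```

Using a row-stochastic matrix for parameters and a column-stochastic one for trackers avoids the need for doubly stochastic weights, which are hard to build on directed graphs; every agent should end up at the global minimizer `x_star`.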
- [2] arXiv:2409.17320
Title: Accelerating Multi-Block Constrained Optimization Through Learning to Optimize
Comments: 15 pages, 2 figures
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
Learning to Optimize (L2O) approaches, including algorithm unrolling, plug-and-play methods, and hyperparameter learning, have garnered significant attention and have been successfully applied to the Alternating Direction Method of Multipliers (ADMM) and its variants. However, the natural extension of L2O to multi-block ADMM-type methods remains largely unexplored. Such an extension is critical, as multi-block methods leverage the separable structure of optimization problems, offering substantial reductions in per-iteration complexity. Given that classical multi-block ADMM does not guarantee convergence, the Majorized Proximal Augmented Lagrangian Method (MPALM), which shares a similar form with multi-block ADMM and ensures convergence, is more suitable in this setting. Despite its theoretical advantages, MPALM's performance is highly sensitive to the choice of penalty parameters. To address this limitation, we propose a novel L2O approach that adaptively selects this hyperparameter using supervised learning. We demonstrate the versatility and effectiveness of our method by applying it to the Lasso problem and the optimal transport problem. Our numerical results show that the proposed framework outperforms popular alternatives. Given its applicability to generic linearly constrained composite optimization problems, this work opens the door to a wide range of potential real-world applications.
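MPALM and the learned penalty selection are not reproduced here. As a simpler stand-in, the sketch below runs classical two-block ADMM on the Lasso problem to show where a penalty parameter ρ enters every iteration (the linear system, the shrinkage threshold, and the dual residual) and to let one observe how the iteration count changes with ρ; `admm_lasso` and its defaults are hypothetical.

```python
import numpy as np


def soft_threshold(v, k):
    """Proximal operator of k * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)


def admm_lasso(A, b, lam, rho, iters=5000, tol=1e-10):
    """Two-block ADMM for min 0.5*||Ax - b||^2 + lam*||z||_1  s.t.  x = z."""
    n = A.shape[1]
    x = z = u = np.zeros(n)
    # The x-update solves (A^T A + rho I) x = A^T b + rho (z - u); the matrix depends on rho.
    M = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for k in range(1, iters + 1):
        x = np.linalg.solve(M, Atb + rho * (z - u))
        z_old = z
        z = soft_threshold(x + u, lam / rho)  # the threshold also depends on rho
        u = u + x - z  # scaled dual update
        # stop when primal and dual residuals are both small
        if np.linalg.norm(x - z) < tol and rho * np.linalg.norm(z - z_old) < tol:
            return z, k
    return z, iters


# With A = I the Lasso solution is soft_threshold(b, lam) = [2, 0, 0].
b = np.array([3.0, -0.5, 0.8])
for rho in (0.1, 1.0, 10.0, 100.0):
    z, k = admm_lasso(np.eye(3), b, lam=1.0, rho=rho)
    print(rho, k, np.round(z, 6))
```

Every ρ reaches the same solution, but the number of iterations differs, which is the sensitivity that motivates choosing the penalty adaptively.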
- [3] arXiv:2409.17413
Title: Setpoint Tracking and Disturbance Attenuation for Gas Pipeline Flow Subject to Uncertainties using Backstepping
Subjects: Optimization and Control (math.OC)
In this paper, we consider the problem of regulating the outlet pressure of gas flowing through a pipeline subject to uncertain and variable outlet flow. Gas flow through a pipe is modeled using the coupled isothermal Euler equations, with the Darcy-Weisbach friction model used to account for the loss of gas flow momentum. The outlet flow variation is generated by a periodic linear dynamic system, which we use as a model of load fluctuations caused by varying consumer demands. We first linearize the nonlinear equations around the equilibrium point and obtain a 2-by-2 coupled hyperbolic partial differential equation (PDE) system expressed in canonical form. Using an observer-based PDE backstepping controller, we demonstrate that the inlet pressure can be manipulated to regulate the outlet pressure to a setpoint, thus compensating for fluctuations in the outlet flow. Furthermore, we extend the observer-based controller to the case when the outlet flow variation is uncertain within a bounded set. In this case, the controller is also capable of regulating the outlet pressure to a neighborhood of the setpoint by manipulating the inlet pressure, even in the presence of uncertain fluctuations in the outlet flow. We provide numerical simulations to demonstrate the performance of the controller.
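For reference, a common textbook form of the isothermal Euler equations with Darcy-Weisbach friction for a horizontal pipe is shown below, with density $\rho$, velocity $v$, pressure $p$, isothermal speed of sound $c$, Darcy friction factor $\lambda$, and pipe diameter $D$. This is a hedged reminder of the standard model, not the paper's exact statement; its notation, scaling, and any gravity terms may differ.

```latex
\partial_t \rho + \partial_x(\rho v) = 0, \qquad
\partial_t(\rho v) + \partial_x\left(\rho v^2 + p\right) = -\frac{\lambda}{2D}\,\rho\, v\,|v|, \qquad
p = c^2 \rho .
```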
- [4] arXiv:2409.17450
Title: On Strong Quasiconvexity of Functions in Infinite Dimensions
Subjects: Optimization and Control (math.OC)
In this paper, we explore the concept of $\sigma$-quasiconvexity for functions defined on normed vector spaces. This notion encompasses two important and well-established concepts: quasiconvexity and strong quasiconvexity. We start by analyzing certain operations on functions that preserve $\sigma$-quasiconvexity. Next, we present new results concerning the strong quasiconvexity of norm and Minkowski functions in infinite dimensions. Furthermore, we extend a recent result by F. Lara [16] on the supercoercive properties of strongly quasiconvex functions, with applications to the existence and uniqueness of minima, from finite dimensions to infinite dimensions. Finally, we address counterexamples related to strong quasiconvexity.
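For reference, a function $f$ on a convex set is strongly quasiconvex with modulus $\gamma > 0$ (in Polyak's sense) if $f(\lambda x + (1-\lambda) y) \le \max\{f(x), f(y)\} - \lambda(1-\lambda)\frac{\gamma}{2}\|x - y\|^2$ for all $x, y$ and $\lambda \in [0, 1]$; setting $\gamma = 0$ recovers quasiconvexity. The paper's $\sigma$-quasiconvexity generalizes this and is not reproduced here. The sketch below, with a hypothetical helper `violates_strong_qcvx`, only searches a finite one-dimensional grid for a violating triple, so it can refute the property but never prove it.

```python
import itertools

import numpy as np


def violates_strong_qcvx(f, gamma, points, lambdas, tol=1e-9):
    """Return a violating (x, y, lam) triple on the grid, or None if none is found."""
    for x, y in itertools.combinations(points, 2):
        for lam in lambdas:
            lhs = f(lam * x + (1 - lam) * y)
            rhs = max(f(x), f(y)) - lam * (1 - lam) * gamma / 2 * (x - y) ** 2
            if lhs > rhs + tol:
                return x, y, lam
    return None


points = np.linspace(-5.0, 5.0, 41)
lambdas = np.linspace(0.0, 1.0, 11)
# t**2 is 2-strongly convex, hence strongly quasiconvex with modulus 2
print(violates_strong_qcvx(lambda t: t**2, 2.0, points, lambdas))  # None
# the identity is quasiconvex but, on an unbounded domain, not strongly quasiconvex;
# the grid already exposes a violation for gamma = 1
print(violates_strong_qcvx(lambda t: t, 1.0, points, lambdas) is not None)  # True
```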
- [5] arXiv:2409.17493
Title: Tikhonov regularized mixed-order primal-dual dynamical system for convex optimization problems with linear equality constraints
Comments: 26 pages, 10 figures
Subjects: Optimization and Control (math.OC)
In Hilbert spaces, we consider a Tikhonov regularized mixed-order primal-dual dynamical system for a convex optimization problem with linear equality constraints. The dynamical system involves general time-dependent parameters (viscous damping and temporal scaling) and recovers certain existing systems for special parameter choices. When these parameters satisfy appropriate conditions and the Tikhonov regularization parameter $\epsilon(t)$ approaches zero at an appropriate rate, we analyze the asymptotic convergence properties of the proposed system by constructing suitable Lyapunov functions, and we show that the objective function error enjoys an $O(1/(t^2\beta(t)))$ convergence rate, which under suitable conditions can be better than $O(1/t^2)$. In addition, we use Lyapunov analysis to obtain strong convergence of the trajectory generated by the Tikhonov regularized dynamical system. In particular, when $\epsilon(t)$ vanishes at a suitable rate, the convergence rate of the primal-dual gap can be $o(1/\beta(t))$. Numerical experiments demonstrate the effectiveness of our theoretical results.
- [6] arXiv:2409.17944
Title: Filtering-Linearization: A First-Order Method for Nonconvex Trajectory Optimization with Filter-Based Warm-Starting
Subjects: Optimization and Control (math.OC)
Nonconvex trajectory optimization is at the core of designing trajectories for complex autonomous systems. A challenge for nonconvex trajectory optimization methods, such as sequential convex programming, is to find an effective warm-starting point to approximate the nonconvex optimization with a sequence of convex ones. We introduce a first-order method with filter-based warm-starting for nonconvex trajectory optimization. The idea is to first generate sampled trajectories using constraint-aware particle filtering, which solves the problem as an estimation problem. We then identify different locally optimal trajectories through agglomerative hierarchical clustering. Finally, we choose the best locally optimal trajectory to warm-start the prox-linear method, a first-order method with guaranteed convergence. We demonstrate the proposed method on a multi-agent trajectory optimization problem with linear dynamics and nonconvex collision avoidance. Compared with sequential quadratic programming and the interior-point method, the proposed method reduces the objective function value by up to approximately 96% within the same amount of time for a two-agent problem, and by up to approximately 98% for a six-agent problem.
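The particle-filtering and clustering stages are not reproduced here. The sketch below shows only the final stage, a prox-linear iteration for $\min_x h(c(x))$, assuming the smooth choice $h(v) = \frac{1}{2}\|v\|^2$ so that each convex subproblem reduces to a linear solve; with a nonsmooth $h$ (for example, constraint penalties) the subproblem becomes a general convex program instead. The toy residual `c`, the prox parameter `t`, and the starting point, which stands in for the warm start the earlier stages would supply, are all hypothetical.

```python
import numpy as np


def prox_linear(c, jac, x0, t=10.0, iters=100):
    """Prox-linear method for min_x h(c(x)) with h(v) = 0.5 * ||v||^2 (illustrative choice of h).

    Each step minimizes the convex model
        0.5 * ||c(x_k) + J(x_k) d||^2 + ||d||^2 / (2 t)
    over the step d, which reduces to a linear solve.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r, J = c(x), jac(x)
        d = np.linalg.solve(J.T @ J + np.eye(x.size) / t, -J.T @ r)
        x = x + d
    return x


def c(x):
    # residuals whose zeros are the intersections of the unit circle with the line x0 = x1
    return np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]])


def jac(x):
    return np.array([[2 * x[0], 2 * x[1]], [1.0, -1.0]])


x0 = [1.0, 0.5]  # hypothetical warm start, in place of the filtering/clustering output
print(prox_linear(c, jac, x0))
```

The warm start decides which local solution the iteration reaches: from this start it converges to $(1/\sqrt{2}, 1/\sqrt{2})$ rather than the other intersection point, which is why the choice of starting trajectory matters.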
- [7] arXiv:2409.17962
Title: Distribution-free expectation operators for robust pricing and stocking with heavy-tailed demand
Subjects: Optimization and Control (math.OC); Probability (math.PR)
We obtain distribution-free bounds for various fundamental quantities used in probability theory by solving optimization problems that search for extreme distributions among all distributions with the same mean and dispersion. These sharpest possible bounds depend only on the mean and dispersion of the driving random variable. We solve the optimization problems by a novel yet elementary technique that reduces the set of all candidate solutions to two-point distributions. We consider a general dispersion measure, with variance, mean absolute deviation and power moments as special cases. We apply the bounds to robust newsvendor stocking and monopoly pricing, generalizing foundational mean-variance works. This shows how pricing and order decisions respond to increased demand uncertainty, including scenarios where dispersion information allows for heavy-tailed demand distributions.
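In the variance case, the sharp bound on the expected shortfall $E[(D - q)^+]$ over all demand distributions with mean $\mu$ and standard deviation $\sigma$ has a classical closed form (Scarf's newsvendor bound), $\frac{1}{2}\left(\sqrt{\sigma^2 + (q - \mu)^2} - (q - \mu)\right)$, and it is attained by a two-point distribution, consistent with the two-point reduction described above. The sketch covers only this classical special case, not the paper's general dispersion measures; the function names are hypothetical.

```python
import math


def scarf_bound(q, mu, sigma):
    """Sharp upper bound on E[(D - q)^+] over all demand laws with mean mu and std sigma."""
    return 0.5 * (math.sqrt(sigma**2 + (q - mu) ** 2) - (q - mu))


def extremal_two_point(q, mu, sigma):
    """Two-point law supported on q - r and q + r with mean mu and std sigma; it attains the bound."""
    r = math.sqrt(sigma**2 + (q - mu) ** 2)
    p_high = (mu - q + r) / (2 * r)
    return [(q - r, 1 - p_high), (q + r, p_high)]


mu, sigma, q = 1.0, 1.0, 1.5
law = extremal_two_point(q, mu, sigma)
mean = sum(x * p for x, p in law)
var = sum((x - mean) ** 2 * p for x, p in law)
shortfall = sum(max(x - q, 0.0) * p for x, p in law)
print(mean, var, shortfall, scarf_bound(q, mu, sigma))
# Exponential demand with mean 1 (hence sigma = 1) has E[(D - q)^+] = exp(-q), below the bound
print(math.exp(-q) <= scarf_bound(q, mu, sigma))  # True
```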
- [8] arXiv:2409.17998
Title: A Decision-Making Method in Polyhedral Convex Set Optimization
Comments: 19 pages
Subjects: Optimization and Control (math.OC)
Optimization problems with set-valued objective functions arise in contexts such as multi-stage optimization with vector-valued objectives. The aim is to identify an optimizer -- a feasible point with an optimal objective value -- based on an ordering relation on a family of sets. When faced with multiple optimizers, a decision maker must choose one. Visualizing the values associated with these optimizers could provide a solid basis for decision-making. However, these values are sets, making it challenging to visualize many of them. Therefore, we propose a method where an optimizer is selected by designing the respective outcome set through a trial-and-error process. In a polyhedral convex setting, we discuss an implementation and prove that an optimizer can be found using this method after a finite number of design steps. We motivate the problem setting and illustrate the process using an example: a two-stage bi-objective network flow problem.
New submissions (showing 8 of 8 entries)
- [9] arXiv:2409.17446 (cross-list from cs.DC)
Title: Efficient Federated Learning against Heterogeneous and Non-stationary Client Unavailability
Comments: Accepted to NeurIPS 2024
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Optimization and Control (math.OC)
Addressing intermittent client availability is critical for the real-world deployment of federated learning algorithms. Most prior work either overlooks the potential non-stationarity in the dynamics of client unavailability or requires substantial memory/computation overhead. We study federated learning in the presence of heterogeneous and non-stationary client availability, which may occur when the deployment environments are uncertain or the clients are mobile. The impacts of the heterogeneity and non-stationarity in client unavailability can be significant, as we illustrate using FedAvg, the most widely adopted federated learning algorithm. We propose FedAPM, which includes novel algorithmic structures that (i) compensate for missed computations due to unavailability with only $O(1)$ additional memory and computation with respect to standard FedAvg, and (ii) evenly diffuse local updates within the federated learning system through implicit gossiping, despite being agnostic to non-stationary dynamics. We show that FedAPM converges to a stationary point of even non-convex objectives while achieving the desired linear speedup property. We corroborate our analysis with numerical experiments over diversified client unavailability dynamics on real-world data sets.
- [10] arXiv:2409.17488 (cross-list from q-bio.PE)
Title: Optimal control of stochastic reaction networks with entropic control cost and emergence of mode-switching strategies
Comments: 12 pages, 4 figures
Subjects: Populations and Evolution (q-bio.PE); Systems and Control (eess.SY); Optimization and Control (math.OC); Biological Physics (physics.bio-ph); Molecular Networks (q-bio.MN)
Controlling the stochastic dynamics of biological populations is a challenge that arises across various biological contexts. However, these dynamics are inherently nonlinear and involve a discrete state space, i.e., the number of molecules, cells, or organisms. Additionally, the possibility of extinction has a significant impact on both the dynamics and control strategies, particularly when the population size is small. These factors hamper the direct application of conventional control theories to biological systems. To address these challenges, we formulate the optimal control problem for stochastic population dynamics by utilizing a control cost function based on the Kullback-Leibler divergence. This approach naturally accounts for population-specific factors and simplifies the complex nonlinear Hamilton-Jacobi-Bellman equation into a linear form, facilitating efficient computation of optimal solutions. We demonstrate the effectiveness of our approach by applying it to the control of interacting random walkers, Moran processes, and SIR models, and observe the mode-switching phenomena in the control strategies. Our approach provides new opportunities for applying control theory to a wide range of biological problems.
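The linearization described above rests on a well-known mechanism: with a Kullback-Leibler control cost, the exponential transform $z = e^{-V}$ of the value function makes the Bellman equation linear. As a hedged, discrete-time illustration (Todorov-style linearly solvable MDP on a small random walk with an absorbing goal state, not the paper's continuous-time reaction-network formulation), the sketch solves the linear equation for $z$ and reads off the optimal controlled transitions $u^*(x' \mid x) \propto p(x' \mid x)\, z(x')$. All names, costs, and the walk itself are hypothetical.

```python
import numpy as np


def desirability(P, q, terminal):
    """Solve the linear Bellman equation z = exp(-q) * (P z) of a first-exit
    linearly solvable MDP; terminal states are fixed at z = exp(-q)."""
    n = len(q)
    interior = [i for i in range(n) if i not in terminal]
    z = np.exp(-q).astype(float)
    G = np.diag(np.exp(-q[interior]))
    P_II = P[np.ix_(interior, interior)]
    P_IT = P[np.ix_(interior, terminal)]
    lhs = np.eye(len(interior)) - G @ P_II
    rhs = G @ P_IT @ z[terminal]
    z[interior] = np.linalg.solve(lhs, rhs)
    return z


def optimal_policy(P, z):
    """Optimal controlled transitions u*(x'|x) proportional to p(x'|x) z(x')."""
    U = P * z[None, :]
    return U / U.sum(axis=1, keepdims=True)


N = 6
P = np.zeros((N, N))  # passive dynamics: symmetric random walk, reflecting at state 0
P[0, 0] = P[0, 1] = 0.5
for i in range(1, N - 1):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[N - 1, N - 1] = 1.0  # absorbing goal state
q = np.full(N, 0.1)  # running state cost
q[N - 1] = 0.0  # zero cost at the goal
z = desirability(P, q, terminal=[N - 1])
U = optimal_policy(P, z)
print(np.round(U[1:N - 1], 3))  # the controlled walk drifts toward the goal
```

No nonlinear minimization over controls is needed: one linear solve gives $z$, and the policy is a reweighting of the passive dynamics.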
- [11] arXiv:2409.17499 (cross-list from cs.LG)
Title: Does Worst-Performing Agent Lead the Pack? Analyzing Agent Dynamics in Unified Distributed SGD
Comments: To appear in NeurIPS 2024
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Distributed learning is essential to train machine learning algorithms across heterogeneous agents while maintaining data privacy. We conduct an asymptotic analysis of Unified Distributed SGD (UD-SGD), exploring a variety of communication patterns, including decentralized SGD and local SGD within Federated Learning (FL), as well as the increasing communication interval in the FL setting. In this study, we assess how different sampling strategies, such as i.i.d. sampling, shuffling, and Markovian sampling, affect the convergence speed of UD-SGD by considering the impact of agent dynamics on the limiting covariance matrix as described in the Central Limit Theorem (CLT). Our findings not only support existing theories on linear speedup and asymptotic network independence, but also theoretically and empirically show how efficient sampling strategies employed by individual agents contribute to overall convergence in UD-SGD. Simulations reveal that a few agents using highly efficient sampling can achieve or surpass the performance of the majority employing moderately improved strategies, providing new insights beyond traditional analyses focusing on the worst-performing agent.
- [12] arXiv:2409.17500 (cross-list from cs.AI)
Title: GLinSAT: The General Linear Satisfiability Neural Network Layer By Accelerated Gradient Descent
Subjects: Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Optimization and Control (math.OC)
Ensuring that the outputs of neural networks satisfy specific constraints is crucial for applying neural networks to real-life decision-making problems. In this paper, we consider making a batch of neural network outputs satisfy bounded and general linear constraints. We first reformulate the neural network output projection problem as an entropy-regularized linear programming problem. We show that, by duality, such a problem can be equivalently transformed into an unconstrained convex optimization problem with a Lipschitz continuous gradient. Then, based on an accelerated gradient descent algorithm with numerical performance enhancements, we present our architecture, GLinSAT, to solve the problem. To the best of our knowledge, this is the first general linear satisfiability layer in which all operations are differentiable and matrix-factorization-free. Although backpropagation can be performed explicitly through automatic differentiation, GLinSAT also provides an alternative way to compute the derivatives based on implicit differentiation of the optimality condition. Experimental results on constrained traveling salesman problems, partial graph matching with outliers, predictive portfolio allocation, and power system unit commitment demonstrate the advantages of GLinSAT over existing satisfiability layers.
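GLinSAT handles general linear constraints by running accelerated gradient descent on a dual problem. In the special case where the only constraints are row and column sums, the entropy-regularized projection can be computed with the classical Sinkhorn iteration, sketched below as a hedged illustration of the entropy-regularization idea; this is not the GLinSAT algorithm, and `sinkhorn_projection` and the temperature `tau` are hypothetical.

```python
import numpy as np


def sinkhorn_projection(scores, tau=1.0, iters=1000):
    """Entropy-regularized projection of a square score matrix onto the set of
    doubly stochastic matrices via Sinkhorn's alternating normalizations."""
    K = np.exp((scores - scores.max()) / tau)  # subtract the max for numerical stability
    for _ in range(iters):
        K = K / K.sum(axis=1, keepdims=True)  # enforce row sums = 1
        K = K / K.sum(axis=0, keepdims=True)  # enforce column sums = 1
    return K


rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))  # stand-in for a batch item of raw network outputs
X = sinkhorn_projection(scores)
print(np.round(X, 3))
print(X.sum(axis=1), X.sum(axis=0))
```

Every operation here is differentiable in `scores`, so such a layer can be trained end to end by backpropagating through the unrolled iterations; smaller `tau` moves the output closer to a permutation matrix but slows convergence.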
- [13] arXiv:2409.17903 (cross-list from math.AP)
Title: Analysis of a Radiotherapy Model for Brain Tumors
Comments: 37 pages, 3 figures
Subjects: Analysis of PDEs (math.AP); Optimization and Control (math.OC); Medical Physics (physics.med-ph)
In this work, we focus on the analytical and numerical study of a mathematical model for brain tumors under the influence of radiotherapy. Under certain assumptions on the given data in the model, we prove existence and uniqueness of a weak nonnegative (biologically relevant) solution. Then, assuming only more regular initial data, we obtain additional regularity of this solution. In addition, we analyze the optimal control of the advection coefficient responsible for the radiotherapy effect on the tumor cell population. Finally, we provide numerical illustrations of all the analytical results obtained.
- [14] arXiv:2409.18000 (cross-list from cs.LG)
Title: Safe Time-Varying Optimization based on Gaussian Processes with Spatio-Temporal Kernel
Comments: Accepted to NeurIPS 2024
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Ensuring safety is a key aspect of sequential decision-making problems, such as robotics or process control. The complexity of the underlying systems often makes finding the optimal decision challenging, especially when the safety-critical system is time-varying. To address the problem of optimizing an unknown time-varying reward subject to unknown time-varying safety constraints, we propose TVSafeOpt, a new algorithm built on Bayesian optimization with a spatio-temporal kernel. The algorithm is capable of safely tracking a time-varying safe region without the need for explicit change detection. Optimality guarantees are also provided for the algorithm when the optimization problem becomes stationary. We show that TVSafeOpt compares favorably against SafeOpt on synthetic data, both regarding safety and optimality. Evaluation on a realistic case study with gas compressors confirms that TVSafeOpt ensures safety when solving time-varying optimization problems with unknown reward and safety functions.
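The abstract does not specify TVSafeOpt's kernel, confidence scaling, or safe-set expansion rules. The sketch below assumes a product of squared-exponential kernels in space and time and a fixed confidence multiplier `beta` (all hypothetical) and only shows the core effect of a temporal kernel: a safety measurement loses influence as time passes, so a point certified as safe shortly after it was measured is no longer certified later unless it is measured again.

```python
import numpy as np


def st_kernel(X1, T1, X2, T2, ell_x=1.0, ell_t=5.0):
    """Product spatio-temporal kernel: squared-exponential in x times squared-exponential in t."""
    dx = X1[:, None] - X2[None, :]
    dt = T1[:, None] - T2[None, :]
    return np.exp(-dx**2 / (2 * ell_x**2)) * np.exp(-dt**2 / (2 * ell_t**2))


def gp_posterior(X, T, y, Xq, Tq, noise=1e-2):
    """GP posterior mean and standard deviation at query points (Xq, Tq); prior variance is 1."""
    K = st_kernel(X, T, X, T) + noise * np.eye(len(X))
    Ks = st_kernel(Xq, Tq, X, T)
    mean = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.sqrt(np.maximum(var, 0.0))


def safe_set(candidates, t_now, X, T, g, h=0.0, beta=2.0):
    """Candidates whose lower confidence bound on the safety function g is at least h."""
    mu, sd = gp_posterior(X, T, g, candidates, np.full(len(candidates), t_now))
    return candidates[mu - beta * sd >= h]


X, T, g = np.array([0.0]), np.array([0.0]), np.array([1.0])  # one safe measurement at x=0, t=0
cand = np.array([0.0])
# certified safe shortly after the measurement, but not long after it
print(safe_set(cand, 1.0, X, T, g), safe_set(cand, 20.0, X, T, g))
```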
- [15] arXiv:2409.18010 (cross-list from eess.SY)
Title: End-to-end guarantees for indirect data-driven control of bilinear systems with finite stochastic data
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)
In this paper we propose an end-to-end algorithm for indirect data-driven control for bilinear systems with stability guarantees. We consider the case where the collected i.i.d. data is affected by probabilistic noise with possibly unbounded support and leverage tools from statistical learning theory to derive finite sample identification error bounds. To this end, we solve the bilinear identification problem by solving a set of linear and affine identification problems, by a particular choice of a control input during the data collection phase. We provide a priori as well as data-dependent finite sample identification error bounds on the individual matrices as well as ellipsoidal bounds, both of which are structurally suitable for control. Further, we integrate the structure of the derived identification error bounds in a robust controller design to obtain an exponentially stable closed-loop. By means of an extensive numerical study we showcase the interplay between the controller design and the derived identification error bounds. Moreover, we note appealing connections of our results to indirect data-driven control of general nonlinear systems through Koopman operator theory and discuss how our results may be applied in this setup.
- [16] arXiv:2409.18077 (cross-list from q-bio.PE)
Title: A 2-approximation algorithm for the softwired parsimony problem on binary, tree-child phylogenetic networks
Subjects: Populations and Evolution (q-bio.PE); Optimization and Control (math.OC)
Finding the most parsimonious tree inside a phylogenetic network with respect to a given character is an NP-hard combinatorial optimization problem that for many network topologies is essentially inapproximable. In contrast, if the network is a rooted tree, then Fitch's well-known algorithm calculates an optimal parsimony score for that character in polynomial time. Drawing inspiration from this, we introduce a new extension of Fitch's algorithm which runs in polynomial time and ensures an approximation factor of 2 on binary, tree-child phylogenetic networks, a popular topologically-restricted subclass of phylogenetic networks in the literature. Specifically, we show that Fitch's algorithm can be seen as a primal-dual algorithm, how it can be extended to binary, tree-child networks, and that the approximation guarantee of this extension is tight. These results for a classic problem in phylogenetics strengthen the link between polyhedral methods and phylogenetics and can aid in the study of other related optimization problems on phylogenetic networks.
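For reference, the tree case that the new method extends, Fitch's bottom-up pass on a rooted binary tree, fits in a few lines. The encoding of trees as nested 2-tuples with string leaves and the name `fitch` are assumptions for illustration; the network extension and the primal-dual interpretation from the paper are not reproduced.

```python
def fitch(tree):
    """Fitch's small-parsimony pass on a rooted binary tree.

    A leaf is a character state (str); an internal node is a 2-tuple of subtrees.
    Returns (candidate state set at this node, number of state changes below it).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    s_left, c_left = fitch(left)
    s_right, c_right = fitch(right)
    common = s_left & s_right
    if common:
        return common, c_left + c_right
    # empty intersection: take the union and pay for one state change
    return s_left | s_right, c_left + c_right + 1


print(fitch((("A", "A"), ("C", "G")))[1])  # 2
```

The returned count is the optimal parsimony score of the character on that tree, computed in time linear in the number of nodes.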
Cross submissions (showing 8 of 8 entries)
- [17] arXiv:1711.04650 (replaced)
Title: Bundle methods with quadratic cuts for deterministic and stochastic strongly convex optimization problems
Comments: arXiv admin note: text overlap with arXiv:1707.00812
Subjects: Optimization and Control (math.OC)
We introduce two new methods for deterministic convex optimization problems: QCC (Quadratic Cuts for Convex optimization) and QB (Quadratic Bundle method). We prove the complexity of these methods for composite optimization problems which are the sum of a convex function $\tilde h$ and of a strongly convex function $\tilde f$ with parameter $\mu$. These methods use as building blocks quadratic approximations of the strongly convex function $\tilde f$ where the quadratic terms are of form $\frac{\mu}{2}\|\cdot-x_i\|^2$ for trial points $x_i$ computed along iterations (when $\mu=0$ the building blocks are linear approximations). We extend the idea of using quadratic approximations to pieces of the objective for some multistage stochastic optimization problems which have strongly convex recourse functions that we approximate as a maximum of quadratic cuts. We call DASC (Dynamic Approximation for Strongly Convex optimization) the corresponding optimization method. When the cuts are linear, the method boils down to the popular Stochastic Dual Dynamic Programming (SDDP) method. We provide conditions ensuring strong convexity of the recourse functions and prove the convergence of DASC. Numerical experiments illustrate the performance and correctness of DASC, with DASC being much quicker than SDDP for large values of the constants of strong convexity.
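The building block described above can be checked numerically: for a $\mu$-strongly convex $f$ with subgradient $g_i$ at $x_i$, every quadratic cut $f(x_i) + g_i(x - x_i) + \frac{\mu}{2}(x - x_i)^2$ is a valid lower bound that dominates the corresponding linear cut. The sketch below compares the max-of-cuts models for $\mu = 2$ and $\mu = 0$ on a hypothetical one-dimensional test function; it illustrates only the cut models, not QCC, QB, or DASC.

```python
import numpy as np


def f(x):
    # 2-strongly convex test function (illustrative): x^2 plus a convex kink at x = 1
    return x**2 + np.abs(x - 1.0)


def subgrad(x):
    return 2 * x + np.sign(x - 1.0)


def cut_model(trial_points, xs, mu):
    """Max over cuts f(x_i) + g_i (x - x_i) + mu/2 (x - x_i)^2; mu = 0 gives linear cuts."""
    cuts = [f(xi) + subgrad(xi) * (xs - xi) + 0.5 * mu * (xs - xi) ** 2 for xi in trial_points]
    return np.max(cuts, axis=0)


xs = np.linspace(-2.0, 3.0, 501)
trial = [-1.5, 0.0, 2.5]
linear = cut_model(trial, xs, mu=0.0)
quadratic = cut_model(trial, xs, mu=2.0)
print(np.all(linear <= quadratic), np.all(quadratic <= f(xs) + 1e-12))  # True True
print(np.max(f(xs) - linear), np.max(f(xs) - quadratic))  # worst-case model gaps
```

With the same three trial points, the quadratic model is much tighter; for this particular piecewise-quadratic $f$, whose curvature equals $\mu$ exactly, it even reproduces $f$ on the grid.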
- [18] arXiv:2309.03408 (replaced)
Title: Subtransversality and Strong CHIP of Closed Sets in Asplund Spaces
Journal-ref: Set-valued and Variational Analysis 2024
Subjects: Optimization and Control (math.OC)
In this paper, we mainly study subtransversality and two types of strong CHIP (given via Fréchet and limiting normal cones) for a collection of finitely many closed sets. We first prove characterizations of Asplund spaces in terms of subtransversality and intersection formulae of Fréchet normal cones. Several necessary conditions for subtransversality of closed sets are obtained via Fréchet/limiting normal cones in Asplund spaces. Then, we consider subtransversality for some special closed sets in convex-composite optimization. In this frame we prove an equivalence result on subtransversality, strong Fréchet CHIP and property (G) so as to extend a duality characterization of subtransversality of finitely many closed convex sets via strong CHIP and property (G) to the possibly non-convex case. As applications, we use these results on subtransversality and strong CHIP to study error bounds of inequality systems and give several dual criteria for error bounds via Fréchet normal cones and subdifferentials.
- [19] arXiv:2310.06602 (replaced)
Title: A solution method for arbitrary polyhedral convex set optimization problems
Comments: reference [11] added, minor changes (typos)
Subjects: Optimization and Control (math.OC)
We provide a solution method for the polyhedral convex set optimization problem, that is, the problem to minimize a set-valued mapping with polyhedral convex graph with respect to a set ordering relation which is generated by a polyhedral convex cone. The method is proven to be correct and finite without any further assumptions on the problem.
- [20] arXiv:2310.17265 (replaced)
Title: Forward Primal-Dual Half-Forward Algorithm for Splitting Four Operators
Subjects: Optimization and Control (math.OC)
In this article, we propose a splitting algorithm to find zeros of the sum of four maximally monotone operators in real Hilbert spaces. In particular, we consider a Lipschitzian operator, a cocoercive operator, and a linear composite term. In the case when the Lipschitzian operator is absent, our method reduces to the Condat-Vũ algorithm. On the other hand, when the linear composite term is absent, the algorithm reduces to the Forward-Backward-Half-Forward algorithm (FBHF). Additionally, in each case, the set of step-sizes that guarantee the weak convergence of those methods are recovered. Therefore, our algorithm can be seen as a generalization of Condat-Vũ and FBHF. Moreover, we propose extensions and applications of our method in multivariate monotone inclusions and saddle point problems. Finally, we present a numerical experiment in image deblurring problems.
- [21] arXiv:2404.00635 (replaced)
Title: Popov Mirror-Prox for solving Variational Inequalities
Subjects: Optimization and Control (math.OC)
We consider the mirror-prox algorithm for solving monotone Variational Inequality (VI) problems. As the mirror-prox algorithm is not practically implementable, except in special instances of VIs (such as affine VIs), we consider its implementation with Popov method updates. We provide convergence rate analysis of our proposed method for a monotone VI with a Lipschitz continuous mapping. We establish a convergence rate of $O(1/t)$, in terms of the number $t$ of iterations, for the dual gap function. Simulations on a two player matrix game corroborate our findings.
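A minimal sketch of a Popov-style mirror-prox iteration on a two-player zero-sum matrix game, assuming the entropic mirror map on each probability simplex (so every prox step is a multiplicative update). Popov's modification reuses the operator value from the previous leading point, so each iteration needs one fresh evaluation of the operator instead of two. The game, the step size `gamma`, and the name `popov_mirror_prox` are hypothetical, and the sketch reports the duality gap of the averaged leading points rather than reproducing the paper's analysis.

```python
import numpy as np


def popov_mirror_prox(A, gamma=0.1, iters=2000):
    """Popov-style mirror-prox with the entropic mirror map for min_x max_y x^T A y.

    The operator is F(x, y) = (A y, -A^T x); x and y live on probability simplices.
    Returns the averaged leading points and their duality gap.
    """
    m, n = A.shape
    x, y = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    gx, gy = A @ y, -A.T @ x  # stale operator value, initialized at the starting point
    avg_x, avg_y = np.zeros(m), np.zeros(n)
    for _ in range(iters):
        # leading point: entropic prox step from the current iterate with the stale value
        wx = x * np.exp(-gamma * gx)
        wx /= wx.sum()
        wy = y * np.exp(-gamma * gy)
        wy /= wy.sum()
        # the only fresh operator evaluation in this iteration
        gx, gy = A @ wy, -A.T @ wx
        # main update from the current iterate using the fresh value
        x = x * np.exp(-gamma * gx)
        x /= x.sum()
        y = y * np.exp(-gamma * gy)
        y /= y.sum()
        avg_x += wx
        avg_y += wy
    avg_x /= iters
    avg_y /= iters
    gap = (A.T @ avg_x).max() - (A @ avg_y).min()
    return avg_x, avg_y, gap


A = np.array([[2.0, -1.0], [-1.0, 1.0]])  # equilibrium: x* = y* = (0.4, 0.6), game value 0.2
x_bar, y_bar, gap = popov_mirror_prox(A)
print(x_bar, y_bar, gap)
```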
- [22] arXiv:2404.17541 (replaced)
Title: Applications of Lifted Nonlinear Cuts to Convex Relaxations of the AC Power Flow Equations
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
We demonstrate that valid inequalities, or lifted nonlinear cuts (LNC), can be projected to tighten the Second Order Cone (SOC), Convex DistFlow (CDF), and Network Flow (NF) relaxations of the AC Optimal Power Flow (AC-OPF) problem. We conduct experiments on 36 cases from the PGLib-OPF library for two objective functions, (1) power generation maximization and (2) generation cost minimization. Significant optimality gap improvements are shown for the maximization problem, where the LNC strengthen the SOC and CDF relaxations in 100% of the test cases, with average and maximum differences in the optimality gaps of 23.1% and 93.5% respectively. The NF relaxation is strengthened in 79.2% of test cases, with average and maximum differences in the optimality gaps of 3.45% and 21.2% respectively. We also study the trade-off between relaxation quality and solve time, demonstrating that the strengthened CDF relaxation outperforms the strengthened SOC formulation in terms of runtime and number of iterations needed, while the strengthened NF formulation is the most scalable with the lowest relaxation quality provided by these LNC.
- [23] arXiv:2407.06726 (replaced)
Title: Optimal control of a non-smooth elliptic PDE with non-linear term acting on the control
Comments: 22 pages, just minor modifications, added the reference to arXiv:2409.15039, related to the preprints arXiv:2406.15146 (version 3) and arXiv:2409.15039
Subjects: Optimization and Control (math.OC)
This paper continues the investigations from [7] and is concerned with the derivation of first-order conditions for a control constrained optimization problem governed by a non-smooth elliptic PDE. The control enters the state equation not only linearly but also as the argument of a regularization of the Heaviside function. The non-linearity which acts on the state is locally Lipschitz-continuous and not necessarily differentiable, i.e., non-smooth. This excludes the application of standard adjoint calculus. We derive conditions under which a strong stationary optimality system can be established, i.e., a system that is equivalent to the purely primal optimality condition saying that the directional derivative of the reduced objective in feasible directions is nonnegative. For this, two assumptions are made on the unknown optimizer. These are fulfilled if the non-smoothness is locally convex around its non-differentiable points and if an estimate involving only the given data is true. Some of the presented findings are employed in the recent contribution [8], where limit optimality systems for non-smooth shape optimization problems [7] are established.
- [24] arXiv:2407.10065 (replaced)
Title: An Efficient High-Dimensional Gradient Estimator for Stochastic Differential Equations
Subjects: Optimization and Control (math.OC)
Overparameterized stochastic differential equation (SDE) models have achieved remarkable success in various complex environments, such as PDE-constrained optimization, stochastic control and reinforcement learning, financial engineering, and neural SDEs. These models often feature system evolution coefficients that are parameterized by a high-dimensional vector $\theta \in \mathbb{R}^n$, aiming to optimize expectations of the SDE, such as a value function, through stochastic gradient ascent. Consequently, designing efficient gradient estimators for which the computational complexity scales well with $n$ is of significant interest. This paper introduces a novel unbiased stochastic gradient estimator--the generator gradient estimator--for which the computation time remains stable in $n$. In addition to establishing the validity of our methodology for general SDEs with jumps, we also perform numerical experiments that test our estimator in linear-quadratic control problems parameterized by high-dimensional neural networks. The results show a significant improvement in efficiency compared to the widely used pathwise differentiation method: Our estimator achieves near-constant computation times, increasingly outperforms its counterpart as $n$ increases, and does so without compromising estimation variance. These empirical findings highlight the potential of our proposed methodology for optimizing SDEs in contemporary applications.
- [25] arXiv:2409.09375 (replaced)
Title: Initial Error Affection and Error Correction in Linear Quadratic Mean Field Games under Erroneous Initial Information
Subjects: Optimization and Control (math.OC)
In this paper, we investigate how initial errors affect linear quadratic mean field games (MPLQMFGs) under erroneous initial distribution information, and how such errors can be corrected. First, an LQMFG model is developed in which agents are coupled through both their dynamics and their cost functions. Next, by studying the evolution of LQMFGs under erroneous initial distribution information, we characterize the effect of the initial error on the game and on the agents' strategies. Furthermore, in the deterministic setting, we provide a sufficient condition under which agents can correct the initial error, and we give their optimal strategies when agents are allowed to change their strategies at an intermediate time. We also consider the situation in which agents are allowed to predict the mean field (MF) and adjust their strategies in real time. Finally, simulations are performed to verify the above conclusions.
- [26] arXiv:2409.15734 (replaced)
Title: Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models
Comments: 41 pages, 3 figures
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Numerical Analysis (math.NA); Computation (stat.CO); Machine Learning (stat.ML)
In this work, we consider solving optimization problems with a stochastic objective and deterministic equality constraints. We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points. Our method utilizes a random model to represent the objective function, which is constructed from stochastic observations of the objective and is designed to satisfy proper adaptive accuracy conditions with a high but fixed probability. To converge to first-order stationary points, our method computes a gradient step in each iteration defined by minimizing a quadratic approximation of the objective subject to a (relaxed) linear approximation of the problem constraints and a trust-region constraint. To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix, as well as a second-order correction step to address the potential Maratos effect, which arises due to the nonlinearity of the problem constraints. Such an effect may impede the method from moving away from saddle points. Both gradient and eigen step computations leverage a novel parameter-free decomposition of the step and the trust-region radius, accounting for the proportions among the feasibility residual, optimality residual, and negative curvature. We establish global almost sure first- and second-order convergence guarantees for our method, and present computational results on CUTEst problems, regression problems, and saddle-point problems to demonstrate its superiority over existing line-search-based stochastic methods.
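The idea behind an eigen step can be pictured with a small dense example. The sketch below is an illustration only. It assumes the constraint Jacobian J and a Hessian H (in SQP, typically of the Lagrangian) are available as small NumPy arrays, and it does not reproduce the paper's parameter-free decomposition of the step and trust-region radius. It restricts H to the null space of J, takes the eigenvector of the most negative eigenvalue of this reduced Hessian, and scales it to the trust-region radius.

```python
import numpy as np


def null_space_basis(J, tol=1e-12):
    """Orthonormal basis Z of ker(J) from the trailing right singular vectors of J."""
    _, s, vt = np.linalg.svd(J)
    rank = int(np.sum(s > tol * s.max()))
    return vt[rank:].T


def eigen_step(H, J, g, radius):
    """Negative-curvature step of length `radius` inside the null space of J.

    Returns (step, lam_min), where lam_min is the smallest eigenvalue of the reduced
    Hessian Z^T H Z; step is None when there is no negative curvature to exploit.
    """
    Z = null_space_basis(J)
    if Z.shape[1] == 0:
        return None, None                    # J has full column rank: no tangent directions
    eigvals, eigvecs = np.linalg.eigh(Z.T @ H @ Z)   # eigenvalues in ascending order
    lam_min = eigvals[0]
    if lam_min >= 0:
        return None, lam_min
    d = Z @ eigvecs[:, 0]                    # unit norm because Z has orthonormal columns
    if g @ d > 0:                            # pick the sign that does not increase the linear model
        d = -d
    return radius * d, lam_min
```

The returned step keeps the linearized constraints satisfied (J d = 0) while decreasing the quadratic model along the direction of negative curvature, which is what lets a method leave a saddle point.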
- [27] arXiv:2106.12060 (replaced)
Title: Faster Randomized Methods for Orthogonality Constrained Problems
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Recent literature has advocated the use of randomized methods for accelerating the solution of various matrix problems arising throughout data science and computational science. One popular strategy for leveraging randomization is to use it as a way to reduce problem size. However, methods based on this strategy lack sufficient accuracy for some applications. Randomized preconditioning is another way to leverage randomization, and it provides higher accuracy. The main challenge in using randomized preconditioning is the need for an underlying iterative method; thus, randomized preconditioning has so far been applied almost exclusively to solving regression problems and linear systems. In this article, we show how to expand the application of randomized preconditioning to another important set of problems prevalent across data science: optimization problems with (generalized) orthogonality constraints. We demonstrate our approach, which is based on the framework of Riemannian optimization and Riemannian preconditioning, on the problem of computing the dominant canonical correlations and on the Fisher linear discriminant analysis problem. For both problems, we evaluate the effect of preconditioning on the computational costs and asymptotic convergence, and demonstrate empirically the utility of our approach.
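For context, the regression setting that the abstract mentions can be sketched as sketch-and-precondition: multiply A by a random Gaussian matrix S, take R from a QR factorization of SA, and run an iterative least-squares solver (here CGLS) on the better-conditioned matrix AR⁻¹. This is a generic illustration under assumed choices (a Gaussian sketch with 4n rows, a fixed iteration cap), not the paper's Riemannian preconditioning for orthogonality constraints.

```python
import numpy as np


def sketch_preconditioner(A, sketch_rows, rng):
    """R factor from a QR of the Gaussian sketch S A, where S has sketch_rows rows."""
    m = A.shape[0]
    S = rng.standard_normal((sketch_rows, m)) / np.sqrt(sketch_rows)
    _, R = np.linalg.qr(S @ A)            # reduced QR: R is n x n upper triangular
    return R


def preconditioned_lstsq(A, b, sketch_rows=None, iters=50, seed=0):
    """Solve min ||A x - b|| by CGLS on the preconditioned matrix A R^{-1}.

    Returns the solution and the condition number of A R^{-1} (for inspection).
    """
    n = A.shape[1]
    rng = np.random.default_rng(seed)
    R = sketch_preconditioner(A, sketch_rows or 4 * n, rng)
    AR = np.linalg.solve(R.T, A.T).T      # A @ inv(R) without forming inv(R)

    # CGLS (conjugate gradients on the normal equations) for min ||AR y - b||
    y = np.zeros(n)
    r = b.astype(float).copy()            # residual b - AR y
    s = AR.T @ r
    p = s.copy()
    gamma = s @ s
    for _ in range(iters):
        if gamma < 1e-30:
            break
        q = AR @ p
        alpha = gamma / (q @ q)
        y += alpha * p
        r -= alpha * q
        s = AR.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return np.linalg.solve(R, y), np.linalg.cond(AR)
```

Because AR⁻¹ is well conditioned regardless of how badly scaled A is, the iteration count needed by CGLS stays small; this is the accuracy advantage over simply solving the smaller sketched problem.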
- [28] arXiv:2304.09340 (replaced)
Title: Propagation of chaos for mean field Schrödinger problems
Subjects: Probability (math.PR); Optimization and Control (math.OC)
In this work, we study the mean field Schrödinger problem from a purely probabilistic point of view by exploiting its connection to stochastic control theory for McKean-Vlasov diffusions. Our main result shows that the mean field Schrödinger problem arises as the limit of "standard" Schrödinger problems over interacting particles. Using the stochastic maximum principle and a suitable penalization procedure, we obtain this result as a consequence of novel (quantitative) propagation of chaos results for forward-backward particle systems. The approach described in the paper seems flexible enough to address other questions in the theory. For instance, our stochastic control technique further allows us to solve the mean field Schrödinger problem and characterize its solution, the mean field Schrödinger bridge, by a forward-backward planning equation.
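For intuition about the "standard" problem that appears in the limit statement: in a static, discrete setting, the Schrödinger problem with a Gaussian-type reference kernel reduces to an entropic optimal transport problem, which Sinkhorn iterations solve by alternately rescaling the rows and columns of the kernel. The sketch below assumes a 1-D grid, a squared-distance cost, and a temperature eps; it has no mean field interaction and is not the paper's probabilistic method.

```python
import numpy as np


def sinkhorn(mu, nu, cost, eps, iters=1000):
    """Entropic OT (static Schrödinger) coupling between discrete marginals mu and nu.

    Returns P = diag(u) K diag(v) with K = exp(-cost / eps), scaled so that P has
    row sums mu and column sums nu (column sums are exact after each sweep).
    """
    K = np.exp(-cost / eps)          # Gibbs kernel playing the role of the reference law
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(iters):
        u = mu / (K @ v)             # match the first marginal
        v = nu / (K.T @ u)           # match the second marginal
    return u[:, None] * K * v[None, :]
```

Smaller eps corresponds to less diffusion and a coupling closer to unregularized optimal transport, at the price of slower and numerically more delicate iterations.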
- [29] arXiv:2310.17392 (replaced)
Title: The Power of Simple Menus in Robust Selling Mechanisms
Subjects: Theoretical Economics (econ.TH); Optimization and Control (math.OC)
We study a robust selling problem where a seller attempts to sell one item to a buyer but is uncertain about the buyer's valuation distribution. Existing literature shows that robust screening provides a stronger theoretical guarantee than robust deterministic pricing, but at the expense of implementation complexity, as it requires a menu with infinitely many options. Our research aims to find simple mechanisms to hedge against market ambiguity effectively. We develop a general framework for robust selling mechanisms with a finite menu (or randomization across finite prices). We propose a tractable reformulation that addresses various ambiguity sets of the buyer's valuation distribution, including support, mean, and quantile ambiguity sets. We derive optimal selling mechanisms and corresponding performance ratios for different menu sizes, showing that even a modest menu size can deliver benefits similar to those achieved by the optimal robust mechanism with infinitely many options, establishing a favorable trade-off between theoretical performance and implementation simplicity. Remarkably, a menu size of merely two can significantly enhance the performance ratio compared to deterministic pricing.
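To see why even a two-item menu can help, consider a simplified version of the setting, assumed here for illustration: only the support [1, H] of the buyer's value is known, the seller posts a price drawn at random from a finite menu, and performance is the worst-case ratio of expected revenue to the buyer's value. The helper names are hypothetical, and this is not the paper's reformulation, which also covers mean and quantile ambiguity sets.

```python
def worst_case_ratio(prices, probs, high):
    """Worst-case ratio of expected revenue to buyer value, over values v in [1, high].

    A price is drawn from `prices` with probabilities `probs`; the buyer purchases
    when the drawn price is at most v. Assumes prices[0] == 1, the lowest possible value.
    """
    assert prices[0] == 1.0 and all(a < b for a, b in zip(prices, prices[1:]))
    ratio, revenue = float("inf"), 0.0
    upper_ends = list(prices[1:]) + [high]
    for p, q, upper in zip(prices, probs, upper_ends):
        revenue += q * p                      # expected revenue from a value in [p, upper)
        ratio = min(ratio, revenue / upper)   # revenue / v is smallest as v -> upper
    return ratio


def best_two_price_menu(high, grid=300):
    """Grid search over menus {1, p} with probability q on the low price 1."""
    best = (0.0, None, None)
    for i in range(1, grid):
        p = 1.0 + (high - 1.0) * i / grid
        for j in range(grid + 1):
            q = j / grid
            r = worst_case_ratio([1.0, p], [q, 1.0 - q], high)
            if r > best[0]:
                best = (r, p, q)
    return best
```

With H = 4, the single price 1 guarantees 1/4, while the grid search finds a two-price menu guaranteeing about 1/3. Under these same assumptions, equalizing the two worst cases gives the closed form 1/(2√H - 1), attained with the high price at √H.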
- [30] arXiv:2312.05250 (replaced)
Title: TaskMet: Task-Driven Metric Learning for Model Learning
Comments: NeurIPS 2023
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Deep learning models are often deployed in downstream tasks that the training procedure may not be aware of. For example, models solely trained to achieve accurate predictions may struggle to perform well on downstream tasks because seemingly small prediction errors may incur drastic task errors. The standard end-to-end learning approach is to make the task loss differentiable or to introduce a differentiable surrogate that the model can be trained on. In these settings, the task loss needs to be carefully balanced with the prediction loss because they may have conflicting objectives. We propose to take the task loss signal one level deeper than the parameters of the model and use it to learn the parameters of the loss function the model is trained on, which can be done by learning a metric in the prediction space. This approach does not alter the optimal prediction model itself, but rather changes the model learning to emphasize the information important for the downstream task. This enables us to achieve the best of both worlds: a prediction model trained in the original prediction space while also being valuable for the desired downstream task. We validate our approach through experiments conducted in two main settings: 1) decision-focused model learning scenarios involving portfolio optimization and budget allocation, and 2) reinforcement learning in noisy environments with distracting states. The source code to reproduce our experiments is available at this https URL
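A toy calculation shows how a metric in the prediction space changes what a misspecified model emphasizes. Assume, hypothetically, a two-dimensional target y = (x, 2x) but a model ŷ = θ(x, x) with a single shared parameter, fit by minimizing Σᵢ (ŷᵢ - yᵢ)ᵀ M (ŷᵢ - yᵢ). Here the metric M is fixed by hand; in TaskMet it would be learned from the task loss, which this sketch does not attempt.

```python
import numpy as np


def fit_shared_scale(x, Y, M):
    """Minimize sum_i (theta x_i 1 - y_i)^T M (theta x_i 1 - y_i) over the scalar theta.

    Setting the derivative to zero gives
    theta = sum_i x_i 1^T M y_i / (sum_i x_i^2 * 1^T M 1), for symmetric M.
    """
    ones = np.ones(Y.shape[1])
    num = np.sum(x * (Y @ M @ ones))
    den = np.sum(x ** 2) * (ones @ M @ ones)
    return num / den
```

With M = I the fit splits the difference (θ = 1.5); with M = diag(1, 9), which stresses the second output, θ moves to 1.9. The model class is unchanged; only the weighting of prediction errors differs.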
- [31] arXiv:2403.15244 (replaced)
Title: A Stochastic Quasi-Newton Method for Non-convex Optimization with Non-uniform Smoothness
Comments: Paper accepted by CDC 2024
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Classical convergence analyses for optimization algorithms rely on the widely adopted uniform smoothness assumption. However, recent experimental studies have demonstrated that many machine learning problems exhibit non-uniform smoothness, meaning the smoothness factor is a function of the model parameter instead of a universal constant. In particular, it has been observed that the smoothness grows with respect to the gradient norm along the training trajectory. Motivated by this phenomenon, the recently introduced $(L_0, L_1)$-smoothness is a more general notion than traditional $L$-smoothness that captures such a positive relationship between smoothness and gradient norm. Under this type of non-uniform smoothness, existing literature has designed stochastic first-order algorithms that use gradient clipping to obtain the optimal $\mathcal{O}(\epsilon^{-3})$ sample complexity for finding an $\epsilon$-approximate first-order stationary solution. Nevertheless, quasi-Newton methods have not yet been studied in this setting. Given the higher accuracy and greater robustness of quasi-Newton methods, in this paper we propose a fast stochastic quasi-Newton method for problems with non-uniform smoothness. Leveraging gradient clipping and variance reduction, our algorithm achieves the best-known $\mathcal{O}(\epsilon^{-3})$ sample complexity and enjoys convergence speedup with simple hyperparameter tuning. Our numerical experiments show that our proposed algorithm outperforms the state-of-the-art approaches.
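A one-dimensional example illustrates the smoothness notion and why clipping matters. The function f(x) = x⁴ is not L-smooth for any constant L, but it is (L0, L1)-smooth with (L0, L1) = (12, 3), since f''(x) = 12x² ≤ 12 + 3|f'(x)|. The sketch below compares plain gradient descent with a clipped variant that caps the step length; it shows only the clipping ingredient, not the proposed quasi-Newton method, and the step sizes are assumptions.

```python
def run_gd(x0, eta, clip=None, iters=1000, blowup=1e6):
    """Gradient descent on f(x) = x**4, optionally with gradient clipping.

    With clipping, the step size is min(eta, clip / |g|), so each move has length
    at most clip. Iteration stops early if |x| exceeds `blowup`.
    """
    x = x0
    for _ in range(iters):
        g = 4.0 * x * x * x                      # f'(x)
        step = eta
        if clip is not None and g != 0.0:
            step = min(eta, clip / abs(g))
        x -= step * g
        if abs(x) > blowup:
            break
    return x
```

From x = 10 with step size 0.01, plain gradient descent overshoots and blows up within a few iterations, while the clipped version moves by at most 1 per step until the gradient is small and then decreases steadily toward 0.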
- [32] arXiv:2408.10147 (replaced)
Title: In-Context Learning with Representations: Contextual Generalization of Trained Transformers
Comments: Accepted by NeurIPS 2024
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Information Theory (cs.IT); Optimization and Control (math.OC); Machine Learning (stat.ML)
In-context learning (ICL) refers to a remarkable capability of pretrained large language models, which can learn a new task given a few examples during inference. However, theoretical understanding of ICL is largely under-explored, particularly whether transformers can be trained to generalize to unseen examples in a prompt, which requires the model to acquire contextual knowledge of the prompt for generalization. This paper investigates the training dynamics of transformers trained by gradient descent through the lens of non-linear regression tasks. The contextual generalization here can be attained via learning the template function for each task in-context, where all template functions lie in a linear space with $m$ basis functions. We analyze the training dynamics of one-layer multi-head transformers that predict unlabeled inputs in context given partially labeled prompts, where the labels contain Gaussian noise and the number of examples in each prompt is not sufficient to determine the template. Under mild assumptions, we show that the training loss for a one-layer multi-head transformer converges linearly to a global minimum. Moreover, the transformer effectively learns to perform ridge regression over the basis functions. To our knowledge, this study is the first provable demonstration that transformers can learn contextual (i.e., template) information to generalize to both unseen examples and tasks when prompts contain only a small number of query-answer pairs.
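The predictor that the trained transformer is shown to emulate, ridge regression over m basis functions fit on the labeled part of the prompt, can be written in a few lines. The cosine basis, the regularization weight lam, and the function names are assumptions for illustration; the sketch says nothing about the transformer's training dynamics.

```python
import numpy as np


def cosine_features(x, m):
    """m basis functions phi_k(x) = cos(k x), k = 0..m-1 (an illustrative choice)."""
    return np.cos(np.outer(x, np.arange(m)))


def ridge_predict(x_lab, y_lab, x_query, m, lam):
    """Ridge regression over the basis, fit on the labeled prompt examples.

    Uses the dual form w = Phi^T (Phi Phi^T + lam I)^{-1} y, which is convenient
    when the prompt has fewer labeled examples than basis functions.
    """
    Phi = cosine_features(x_lab, m)
    w = Phi.T @ np.linalg.solve(Phi @ Phi.T + lam * np.eye(len(x_lab)), y_lab)
    return cosine_features(x_query, m) @ w, w
```

When the prompt has fewer labeled examples than basis functions, the ridge penalty selects one template among the many that fit the labels, which is why a predictor of this kind remains well defined in the under-determined regime the abstract describes.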
- [33] arXiv:2408.11527 (replaced)
Title: The Vizier Gaussian Process Bandit Algorithm
Authors: Xingyou Song, Qiuyi Zhang, Chansoo Lee, Emily Fertig, Tzu-Kuo Huang, Lior Belenki, Greg Kochanski, Setareh Ariafar, Srinivas Vasudevan, Sagi Perel, Daniel Golovin
Comments: Google DeepMind Technical Report. Code can be found in this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Google Vizier has performed millions of optimizations and accelerated numerous research and production systems at Google, demonstrating the success of Bayesian optimization as a large-scale service. Over multiple years, its algorithm has been improved considerably, through the collective experiences of numerous research efforts and user feedback. In this technical report, we discuss the implementation details and design choices of the current default algorithm provided by Open Source Vizier. Our experiments on standardized benchmarks reveal its robustness and versatility against well-established industry baselines on multiple practical modes.
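For readers new to Gaussian process bandits, the basic loop is: fit a GP posterior to the observed trials, then suggest the candidate that maximizes an acquisition function such as the upper confidence bound mean + β·std. The sketch below is a generic textbook GP-UCB with an assumed fixed RBF kernel and a candidate grid; it is not Vizier's implementation, and none of the names are Open Source Vizier APIs.

```python
import numpy as np


def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit prior variance."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-6, length=0.2):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    K = rbf(x_obs, x_obs, length) + noise * np.eye(len(x_obs))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))   # K^{-1} y
    K_star = rbf(x_query, x_obs, length)
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 0.0, None)
    return mean, np.sqrt(var)


def ucb_suggest(x_obs, y_obs, candidates, beta=2.0):
    """Return the candidate maximizing mean + beta * std (maximization convention)."""
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    return candidates[np.argmax(mean + beta * std)]
```

A usage loop appends each suggestion and its observed value to x_obs and y_obs and calls ucb_suggest again; larger beta explores more, smaller beta exploits the current posterior mean.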
- [34] arXiv:2408.11974 (replaced)
Title: Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization
Comments: A preliminary version [arXiv:1906.00331] of this paper, with a subset of the results that are presented here, was presented at ICML 2020; 44 Pages, 10 Figures
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems in the form of $\min_{\textbf{x}} \max_{\textbf{y} \in Y} f(\textbf{x}, \textbf{y})$, where the objective function $f(\textbf{x}, \textbf{y})$ is nonconvex in $\textbf{x}$ and concave in $\textbf{y}$, and the constraint set $Y \subseteq \mathbb{R}^n$ is convex and bounded. In the convex-concave setting, the single-timescale gradient descent ascent (GDA) algorithm is widely used in applications and has been shown to have strong convergence guarantees. In more general settings, however, it can fail to converge. Our contribution is to design TTGDA algorithms that are effective beyond the convex-concave setting, efficiently finding a stationary point of the function $\Phi(\cdot) := \max_{\textbf{y} \in Y} f(\cdot, \textbf{y})$. We also establish theoretical bounds on the complexity of solving both smooth and nonsmooth nonconvex-concave minimax optimization problems. To the best of our knowledge, this is the first systematic analysis of TTGDA for nonconvex minimax optimization, shedding light on its superior performance in training generative adversarial networks (GANs) and in other real-world application problems.
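A toy run makes the two timescales concrete. The objective below, f(x, y) = cos x + (y sin x)/2 - y²/2 with y ∈ [-1, 1], is an assumed example that is nonconvex in x and concave in y; its envelope is Φ(x) = cos x + sin²x/8, whose stationary points satisfy sin x = 0. The sketch uses simultaneous updates with a small step for x and a larger one for y; the step sizes are illustrative, not the ones analyzed in the paper.

```python
import math


def ttgda(x, y, eta_x=0.01, eta_y=0.5, iters=3000):
    """Two-timescale GDA on f(x, y) = cos(x) + y sin(x) / 2 - y^2 / 2 over y in [-1, 1].

    x takes small descent steps (slow timescale); y takes larger projected ascent
    steps (fast timescale) so that it tracks argmax_y f(x, y) = sin(x) / 2.
    """
    for _ in range(iters):
        gx = -math.sin(x) + 0.5 * y * math.cos(x)     # partial derivative in x
        gy = 0.5 * math.sin(x) - y                    # partial derivative in y
        x = x - eta_x * gx
        y = min(1.0, max(-1.0, y + eta_y * gy))       # projection onto Y = [-1, 1]
    return x, y
```

Started from (x, y) = (1, 0), the iterates settle near x = π, a stationary point (here a local minimum) of Φ, with y close to its best response sin(x)/2.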
- [35] arXiv:2408.16899 (replaced)
Title: Network-aware Recommender System via Online Feedback Optimization
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Personalized content on social platforms can exacerbate negative phenomena such as polarization, partly due to the feedback interactions between recommendations and the users. In this paper, we present a control-theoretic recommender system that explicitly accounts for this feedback loop to mitigate polarization. Our approach extends online feedback optimization (a control paradigm for steady-state optimization of dynamical systems) to develop a recommender system that trades off user engagement and polarization reduction, while relying solely on online click data. We establish theoretical guarantees for optimality and stability of the proposed design and validate its effectiveness via numerical experiments with a user population governed by Friedkin-Johnsen dynamics. Our results show these "network-aware" recommendations can significantly reduce polarization while maintaining high levels of user engagement.
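The Friedkin-Johnsen model used for the user population can be simulated directly: each user i mixes the opinions of neighbours, weighted by a row-stochastic matrix W, with their own initial opinion, according to a susceptibility λᵢ. The sketch below assumes the standard form x(k+1) = ΛWx(k) + (I - Λ)x(0) and a simple variance-based polarization index; it omits the recommendation input and the online feedback optimization controller, whose exact form is not given here.

```python
import numpy as np


def fj_simulate(W, lam, x0, iters=500):
    """Friedkin-Johnsen dynamics x(k+1) = diag(lam) W x(k) + (I - diag(lam)) x(0)."""
    x = x0.copy()
    for _ in range(iters):
        x = lam * (W @ x) + (1.0 - lam) * x0
    return x


def fj_steady_state(W, lam, x0):
    """Closed-form limit (I - diag(lam) W)^{-1} (I - diag(lam)) x(0), valid when rho(diag(lam) W) < 1."""
    n = len(x0)
    return np.linalg.solve(np.eye(n) - lam[:, None] * W, (1.0 - lam) * x0)


def polarization(x):
    """Mean squared deviation of opinions from their average (one common polarization index)."""
    return float(np.mean((x - x.mean()) ** 2))
```

Because every susceptibility is below 1 and W is row-stochastic, the iteration contracts and converges to the closed-form steady state; a steady-state optimizer of the kind the abstract describes would act on this limit rather than on the transient.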