Search | arXiv e-print repository

Tackling Decision Processes with Non-Cumulative Objectives using Reinforcement Learning

Authors: Maximilian Nägele, Jan Olle, Thomas Fösel, Remmy Zen, Florian Marquardt

Abstract: Markov decision processes (MDPs) are used to model a wide variety of applications ranging from game playing and robotics to finance. Their optimal policy typically maximizes the expected sum of rewards given at each step of the decision process. However, a large class of problems does not fit straightforwardly into this framework: Non-cumulative Markov decision processes (NCMDPs), where instead of the expected sum of rewards, the expected value of an arbitrary function of the rewards is maximized. Example functions include the maximum of the rewards or their mean divided by their standard deviation. In this work, we introduce a general mapping of NCMDPs to standard MDPs. This allows all techniques developed to find optimal policies for MDPs, such as reinforcement learning or dynamic programming, to be directly applied to the larger class of NCMDPs. Focusing on reinforcement learning, we show applications in a diverse set of tasks, including classical control, portfolio optimization in finance, and discrete optimization problems. With our approach, we can improve both final performance and training time compared to relying on standard MDPs.

Submitted 22 May, 2024; originally announced May 2024.

ACM Class: I.2.8; I.2.6
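
As a rough illustration of the kind of mapping the abstract above describes, the sketch below handles only the maximum-of-rewards objective in the undiscounted, finite-episode case. It is an assumption about how such a mapping could look, not the construction from the paper: the function name `max_objective_rewards` and the choice to carry the running maximum as extra state are hypothetical.

```python
from typing import Iterable, List, Optional


def max_objective_rewards(rewards: Iterable[float]) -> List[float]:
    """Turn per-step rewards into increments of the running maximum.

    Assumption (not from the abstract): the augmented MDP stores the
    running maximum as extra state and pays out only its increase.
    """
    adapted: List[float] = []
    running_max: Optional[float] = None  # extra state carried along the episode
    for r in rewards:
        if running_max is None:
            # First step: the running maximum starts at the first reward.
            adapted.append(r)
            running_max = r
        else:
            new_max = max(running_max, r)
            adapted.append(new_max - running_max)
            running_max = new_max
    return adapted


episode = [1.0, 3.0, 2.0, 5.0, 4.0]
adapted = max_objective_rewards(episode)
# The adapted rewards telescope, so their ordinary sum is the maximum.
assert sum(adapted) == max(episode)
```

Because the adapted rewards sum to the maximum of the original rewards, a standard sum-maximizing method applied to the augmented process would, under these assumptions, optimize the non-cumulative objective.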

arXiv:2405.06244

A $(\frac32+\frac1{\mathrm{e}})$-Approximation Algorithm for Ordered TSP

Authors: Susanne Armbruster, Matthias Mnich, Martin Nägele

Abstract: We present a new $(\frac32+\frac1{\mathrm{e}})$-approximation algorithm for the Ordered Traveling Salesperson Problem (Ordered TSP). Ordered TSP is a variant of the classical metric Traveling Salesperson Problem (TSP) where a specified subset of vertices needs to appear on the output Hamiltonian cycle in a given order, and the task is to compute a cheapest such cycle. Our approximation guarantee of approximately $1.868$ holds with respect to the value of a natural new linear programming (LP) relaxation for Ordered TSP. Our result significantly improves upon the previously best known guarantee of $\frac52$ for this problem and thereby considerably reduces the gap between approximability of Ordered TSP and metric TSP. Our algorithm is based on a decomposition of the LP solution into weighted trees that serve as building blocks in our tour construction.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2311.18588

Optimizing ZX-Diagrams with Deep Reinforcement Learning

Authors: Maximilian Nägele, Florian Marquardt

Abstract: ZX-diagrams are a powerful graphical language for the description of quantum processes with applications in fundamental quantum mechanics, quantum circuit optimization, tensor network simulation, and many more. The utility of ZX-diagrams relies on a set of local transformation rules that can be applied to them without changing the underlying quantum process they describe. These rules can be exploited to optimize the structure of ZX-diagrams for a range of applications. However, finding an optimal sequence of transformation rules is generally an open problem. In this work, we bring together ZX-diagrams with reinforcement learning, a machine learning technique designed to discover an optimal sequence of actions in a decision-making problem, and show that a trained reinforcement learning agent can significantly outperform other optimization techniques like a greedy strategy or simulated annealing. The use of graph neural networks to encode the policy of the agent enables generalization to diagrams much bigger than seen during the training phase.

Submitted 26 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: 12 pages, 7 figures - Revision on 26.04.2024: Fixed bug in training algorithm to give quantitatively better results (qualitative results unchanged)

arXiv:2308.06254

A Better-Than-1.6-Approximation for Prize-Collecting TSP

Authors: Jannis Blauth, Nathan Klein, Martin Nägele

Abstract: Prize-Collecting TSP is a variant of the traveling salesperson problem where one may drop vertices from the tour at the cost of vertex-dependent penalties. The quality of a solution is then measured by adding the length of the tour and the sum of all penalties of vertices that are not visited. We present a polynomial-time approximation algorithm with an approximation guarantee slightly below $1.6$, where the guarantee is with respect to the natural linear programming relaxation of the problem. This improves upon the previous best-known approximation ratio of $1.774$. Our approach is based on a known decomposition for solutions of this linear relaxation into rooted trees. Our algorithm takes a tree from this decomposition and then performs a pruning step before doing parity correction on the remainder. Using a simple analysis, we bound the approximation guarantee of the proposed algorithm by $(1+\sqrt{5})/2 \approx 1.618$, the golden ratio. With some additional technical care we further improve it to $1.599$.

Submitted 14 February, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

arXiv:2302.07029

Advances on Strictly $Δ$-Modular IPs

Authors: Martin Nägele, Christian Nöbel, Richard Santiago, Rico Zenklusen

Abstract: There has been significant work recently on integer programs (IPs) $\min\{c^\top x \colon Ax\leq b,\,x\in \mathbb{Z}^n\}$ with a constraint matrix $A$ with bounded subdeterminants. This is motivated by a well-known conjecture claiming that, for any constant $Δ\in \mathbb{Z}_{>0}$, $Δ$-modular IPs are efficiently solvable, which are IPs where the constraint matrix $A\in \mathbb{Z}^{m\times n}$ has full column rank and all $n\times n$ minors of $A$ are within $\{-Δ, \dots, Δ\}$. Previous progress on this question, in particular for $Δ=2$, relies on algorithms that solve an important special case, namely strictly $Δ$-modular IPs, which further restrict the $n\times n$ minors of $A$ to be within $\{-Δ, 0, Δ\}$. Even for $Δ=2$, such problems include well-known combinatorial optimization problems like the minimum odd/even cut problem. The conjecture remains open even for strictly $Δ$-modular IPs. Prior advances were restricted to prime $Δ$, which allows for employing strong number-theoretic results. In this work, we make first progress beyond the prime case by presenting techniques not relying on such strong number-theoretic prime results. In particular, our approach implies that there is a randomized algorithm to check feasibility of strictly $Δ$-modular IPs in strongly polynomial time if $Δ\leq4$.

Submitted 14 February, 2023; originally announced February 2023.
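
The two matrix classes defined in the abstract above can be checked by brute force on small examples. The sketch below is only an illustration of the definitions (full column rank, all $n\times n$ minors in $\{-Δ, \dots, Δ\}$, or in $\{-Δ, 0, Δ\}$ for the strict case). The helper names are hypothetical, the enumeration of all row subsets is exponential, and none of it reflects the algorithms of the paper.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Sequence


def det(rows: Sequence[Sequence[int]]) -> Fraction:
    """Exact determinant via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def nxn_minors(a: Sequence[Sequence[int]]) -> List[Fraction]:
    """All n x n minors of an m x n matrix (one per choice of n rows)."""
    n = len(a[0])
    return [det([a[i] for i in rows]) for rows in combinations(range(len(a)), n)]


def is_delta_modular(a: Sequence[Sequence[int]], delta: int) -> bool:
    minors = nxn_minors(a)
    # Full column rank is equivalent to some n x n minor being nonzero.
    return any(d != 0 for d in minors) and all(abs(d) <= delta for d in minors)


def is_strictly_delta_modular(a: Sequence[Sequence[int]], delta: int) -> bool:
    minors = nxn_minors(a)
    return any(d != 0 for d in minors) and all(abs(d) in (0, delta) for d in minors)


strict = [[1, 0], [0, 2], [1, 2]]  # 2 x 2 minors: 2, 2, -2
loose = [[1, 0], [0, 1], [1, 2]]   # 2 x 2 minors: 1, 2, -1
assert is_strictly_delta_modular(strict, 2)
assert is_delta_modular(loose, 2) and not is_strictly_delta_modular(loose, 2)
```

Exact rational arithmetic is used so that the membership tests on the minors are not affected by floating-point rounding.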

arXiv:2301.09340

A New Dynamic Programming Approach for Spanning Trees with Chain Constraints and Beyond

Authors: Martin Nägele, Rico Zenklusen

Abstract: Short spanning trees subject to additional constraints are important building blocks in various approximation algorithms. Especially in the context of the Traveling Salesman Problem (TSP), new techniques for finding spanning trees with well-defined properties have been crucial in recent progress. We consider the problem of finding a spanning tree subject to constraints on the edges in cuts forming a laminar family of small width. Our main contribution is a new dynamic programming approach where the value of a table entry does not only depend on the values of previous table entries, as is usually the case, but also on a specific representative solution saved together with each table entry. This allows for handling a broad range of constraint types. In combination with other techniques -- including negatively correlated rounding and a polyhedral approach that, in the problems we consider, allows for avoiding potential losses in the objective through the randomized rounding -- we obtain several new results. We first present a quasi-polynomial time algorithm for the Minimum Chain-Constrained Spanning Tree Problem with an essentially optimal guarantee. More precisely, each chain constraint is violated by a factor of at most $1+\varepsilon$, and the cost is no larger than that of an optimal solution not violating any chain constraint. The best previous procedure is a bicriteria approximation violating each chain constraint by up to a constant factor and losing another factor in the objective.
Moreover, our approach can naturally handle lower bounds on the chain constraints, and it can be extended to constraints on cuts forming a laminar family of constant width. Furthermore, we show how our approach can also handle parity constraints (or, more precisely, a proxy thereof) as used in the context of (Path) TSP and one of its generalizations, and discuss implications in this context.

Submitted 12 September, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

Comments: A short version of this work appeared in the proceedings of the 30th annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2019)

arXiv:2212.03776

An improved approximation guarantee for Prize-Collecting TSP

Authors: Jannis Blauth, Martin Nägele

Abstract: We present a new approximation algorithm for the (metric) prize-collecting traveling salesperson problem (PCTSP). In PCTSP, as opposed to the classical traveling salesperson problem (TSP), one may not include a vertex of the input graph in the returned tour at the cost of a given vertex-dependent penalty, and the objective is to balance the length of the tour and the incurred penalties for omitted vertices by minimizing the sum of the two. We present an algorithm that achieves an approximation guarantee of $1.774$ with respect to the natural linear programming relaxation of the problem. This significantly reduces the gap between the approximability of classical TSP and PCTSP, beating the previously best known approximation factor of $1.915$. As a key ingredient of our improvement, we present a refined decomposition technique for solutions of the LP relaxation, and show how to leverage components of that decomposition as building blocks for our tours.

Submitted 12 April, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

arXiv:2109.03148

Congruency-Constrained TU Problems Beyond the Bimodular Case

Authors: Martin Nägele, Richard Santiago, Rico Zenklusen

Abstract: A long-standing open question in Integer Programming is whether integer programs with constraint matrices with bounded subdeterminants are efficiently solvable. An important special case thereof are congruency-constrained integer programs $\min\{c^\top x\colon\ Tx\leq b,\ γ^\top x\equiv r\pmod{m},\ x\in\mathbb{Z}^n\}$ with a totally unimodular constraint matrix $T$. Such problems have been shown to be polynomial-time solvable for $m=2$, which led to an efficient algorithm for integer programs with bimodular constraint matrices, i.e., full-rank matrices whose $n\times n$ subdeterminants are bounded by two in absolute value. Whereas these advances heavily relied on existing results on well-known combinatorial problems with parity constraints, new approaches are needed beyond the bimodular case, i.e., for $m>2$. We make first progress in this direction through several new techniques. In particular, we show how to efficiently decide feasibility of congruency-constrained integer programs with a totally unimodular constraint matrix for $m=3$. Furthermore, for general $m$, our techniques also allow for identifying flat directions of infeasible problems, and deducing bounds on the proximity between solutions of the problem and its relaxation.

Submitted 25 April, 2023; v1 submitted 7 September, 2021; originally announced September 2021.

arXiv:1707.06212

Submodular Minimization Under Congruency Constraints

Authors: Martin Nägele, Benny Sudakov, Rico Zenklusen

Abstract: Submodular function minimization (SFM) is a fundamental and efficiently solvable problem class in combinatorial optimization with a multitude of applications in various fields. Surprisingly, very little is known about constraint types under which SFM remains efficiently solvable. The arguably most relevant non-trivial constraint class for which polynomial SFM algorithms are known are parity constraints, i.e., optimizing only over sets of odd (or even) cardinality. Parity constraints capture classical combinatorial optimization problems like the odd-cut problem, and they are a key tool in a recent technique to efficiently solve integer programs with a constraint matrix whose subdeterminants are bounded by two in absolute value. We show that efficient SFM is possible even for a significantly larger class than parity constraints, by introducing a new approach that combines techniques from Combinatorial Optimization, Combinatorics, and Number Theory. In particular, we can show that efficient SFM is possible over all sets (of any given lattice) of cardinality r mod m, as long as m is a constant prime power. This covers generalizations of the odd-cut problem with open complexity status, and with relevance in the context of integer programming with higher subdeterminants. To obtain our results, we establish a connection between the correctness of a natural algorithm and the nonexistence of set systems with specific combinatorial properties.
We introduce a general technique to disprove the existence of such set systems, which allows for obtaining extensions of our results beyond the above-mentioned setting. These extensions settle two open questions raised by Geelen and Kapadia [Combinatorica, 2017] in the context of computing the girth and cogirth of certain types of binary matroids.

Submitted 23 November, 2018; v1 submitted 19 July, 2017; originally announced July 2017.

Showing 1–9 of 9 results for author: Nägele, M