-
Control of Large Swarms via Random Finite Set Theory
Authors:
Bryce Doerr,
Richard Linares
Abstract:
Controlling large swarms of robotic agents has many challenges including, but not limited to, computational complexity due to the number of agents, uncertainty in the functionality of each agent in the swarm, and uncertainty in the swarm's configuration. This work generalizes the swarm state using Random Finite Set (RFS) theory and solves the control problem using model predictive control which naturally handles the challenges. This work uses information divergence to define the distance between swarm RFS and a desired distribution. A stochastic optimal control problem is formulated using a modified L2 distance. Simulation results show that swarm densities converge to a target destination, and the RFS control formulation can vary in the number of target destinations.
Submitted 11 April, 2018; v1 submitted 22 January, 2018;
originally announced January 2018.
-
Probabilistic Tools for the Analysis of Randomized Optimization Heuristics
Authors:
Benjamin Doerr
Abstract:
This chapter collects several probabilistic tools that proved to be useful in the analysis of randomized search heuristics. This includes classic material like Markov, Chebyshev and Chernoff inequalities, but also lesser known topics like stochastic domination and coupling or Chernoff bounds for geometrically distributed random variables and for negatively correlated random variables. Most of the results presented here have appeared previously, some, however, only in recent conference publications. While the focus is on collecting tools for the analysis of randomized search heuristics, many of these may be useful as well in the analysis of classic randomized algorithms or discrete random structures.
Submitted 21 September, 2021; v1 submitted 20 January, 2018;
originally announced January 2018.
-
Better Runtime Guarantees Via Stochastic Domination
Authors:
Benjamin Doerr
Abstract:
Apart from few exceptions, the mathematical runtime analysis of evolutionary algorithms is mostly concerned with expected runtimes. In this work, we argue that stochastic domination is a notion that should be used more frequently in this area. Stochastic domination allows to formulate much more informative performance guarantees, it allows to decouple the algorithm analysis into the true algorithmic part of detecting a domination statement and the probability-theoretical part of deriving the desired probabilistic guarantees from this statement, and it helps finding simpler and more natural proofs. As particular results, we prove a fitness level theorem which shows that the runtime is dominated by a sum of independent geometric random variables, we prove the first tail bounds for several classic runtime problems, and we give a short and natural proof for Witt's result that the runtime of any $(μ,p)$ mutation-based algorithm on any function with unique optimum is subdominated by the runtime of a variant of the (1+1) EA on the OneMax function. As side-products, we determine the fastest unbiased (1+1) algorithm for the LeadingOnes benchmark problem, both in the general case and when restricted to static mutation operators, and we prove a Chernoff-type tail bound for sums of independent coupon collector distributions.
Submitted 23 August, 2018; v1 submitted 13 January, 2018;
originally announced January 2018.
-
An Elementary Analysis of the Probability That a Binomial Random Variable Exceeds Its Expectation
Authors:
Benjamin Doerr
Abstract:
We give an elementary proof of the fact that a binomial random variable $X$ with parameters $n$ and $0.29/n \le p < 1$ with probability at least $1/4$ strictly exceeds its expectation. We also show that for $1/n \le p < 1 - 1/n$, $X$ exceeds its expectation by more than one with probability at least $0.0370$. Both probabilities approach $1/2$ when $np$ and $n(1-p)$ tend to infinity.
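The first bound in this abstract can be checked numerically for small cases. The following is a minimal sketch, not taken from the paper: the function name `prob_exceeds_expectation` is hypothetical, and exact rational arithmetic is used so that the strict comparison with $np$ is not affected by floating-point rounding.

```python
from fractions import Fraction
from math import comb


def prob_exceeds_expectation(n, p):
    """Exact Pr[X > n*p] for X ~ Bin(n, p), with p given as a Fraction."""
    mean = n * p
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if k > mean  # strict inequality, as in the abstract
    )


# At the lower end of the range, p = 0.29/n, exceeding the expectation
# means observing at least one success.
for n in (1, 5, 30):
    p = Fraction(29, 100 * n)
    print(n, float(prob_exceeds_expectation(n, p)))
```

With $p = 0.29/n$ the probability equals $1 - (1 - 0.29/n)^n$, which decreases towards $1 - e^{-0.29} \approx 0.2517$ as $n$ grows. This is consistent with the stated bound of $1/4$ and suggests why the constant cannot be pushed much below $0.29$.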
Submitted 4 January, 2018; v1 submitted 1 December, 2017;
originally announced December 2017.
-
Probabilistic Lower Bounds for the Discrepancy of Latin Hypercube Samples
Authors:
Benjamin Doerr,
Carola Doerr,
Michael Gnewuch
Abstract:
We provide probabilistic lower bounds for the star discrepancy of Latin hypercube samples. These bounds are sharp in the sense that they match the recent probabilistic upper bounds for the star discrepancy of Latin hypercube samples proved in [M. Gnewuch, N. Hebbinghaus. "Discrepancy bounds for a class of negatively dependent random points including Latin hypercube samples". Preprint 2016.]. Together, this result and our work imply that the discrepancy of Latin hypercube samples differs at most by constant factors from the discrepancy of uniformly sampled point sets.
Submitted 26 July, 2017;
originally announced July 2017.
-
Runtime Analysis of the $(1+(λ,λ))$ Genetic Algorithm on Random Satisfiable 3-CNF Formulas
Authors:
Maxim Buzdalov,
Benjamin Doerr
Abstract:
The $(1+(λ,λ))$ genetic algorithm, first proposed at GECCO 2013, showed a surprisingly good performance on some optimization problems. The theoretical analysis so far was restricted to the OneMax test function, where this GA profited from the perfect fitness-distance correlation. In this work, we conduct a rigorous runtime analysis of this GA on random 3-SAT instances in the planted solution model having at least logarithmic average degree, which are known to have a weaker fitness-distance correlation.
We prove that this GA with fixed not too large population size again obtains runtimes better than $Θ(n \log n)$, which is a lower bound for most evolutionary algorithms on pseudo-Boolean problems with unique optimum. However, the self-adjusting version of the GA risks reaching population sizes at which the intermediate selection of the GA, due to the weaker fitness-distance correlation, is not able to distinguish a profitable offspring from others. We show that this problem can be overcome by equipping the self-adjusting GA with an upper limit for the population size. Apart from sparse instances, this limit can be chosen in a way that the asymptotic performance does not worsen compared to the idealistic OneMax case. Overall, this work shows that the $(1+(λ,λ))$ GA can provably have a good performance on combinatorial search and optimization problems also in the presence of a weaker fitness-distance correlation.
Submitted 14 April, 2017;
originally announced April 2017.
-
The (1+$λ$) Evolutionary Algorithm with Self-Adjusting Mutation Rate
Authors:
Benjamin Doerr,
Christian Gießen,
Carsten Witt,
Jing Yang
Abstract:
We propose a new way to self-adjust the mutation rate in population-based evolutionary algorithms in discrete search spaces. Roughly speaking, it consists of creating half the offspring with a mutation rate that is twice the current mutation rate and the other half with half the current rate. The mutation rate is then updated to the rate used in that subpopulation which contains the best offspring.
We analyze how the $(1+λ)$ evolutionary algorithm with this self-adjusting mutation rate optimizes the OneMax test function. We prove that this dynamic version of the $(1+λ)$ EA finds the optimum in an expected optimization time (number of fitness evaluations) of $O(nλ/\logλ+n\log n)$. This time is asymptotically smaller than the optimization time of the classic $(1+λ)$ EA. Previous work shows that this performance is best-possible among all $λ$-parallel mutation-based unbiased black-box algorithms.
This result shows that the new way of adjusting the mutation rate can find optimal dynamic parameter values on the fly. Since our adjustment mechanism is simpler than the ones previously used for adjusting the mutation rate and does not have parameters itself, we are optimistic that it will find other applications.
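The two-rate idea described in this abstract can be sketched as follows. This is an illustrative simplification rather than the authors' algorithm: the function names, the clamping of the rate to $[1/n, 1/2]$, and the deterministic tie-breaking are assumptions made to keep the example short.

```python
import random


def onemax(bits):
    """OneMax fitness: the number of one-bits."""
    return sum(bits)


def mutate(bits, rate, rng):
    """Standard bit mutation: flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


def two_rate_step(parent, rate, lam, rng):
    """One generation: half the offspring use rate/2, the other half 2*rate."""
    n = len(parent)
    low, high = rate / 2, min(2 * rate, 0.5)
    best, best_rate = None, rate
    for i in range(lam):
        r = low if i < lam // 2 else high
        child = mutate(parent, r, rng)
        # Assumption: ties keep the earlier offspring.
        if best is None or onemax(child) > onemax(best):
            best, best_rate = child, r
    if onemax(best) >= onemax(parent):  # elitist selection
        parent = best
    # Assumption: keep the rate inside [1/n, 1/2] so it cannot collapse.
    new_rate = min(max(best_rate, 1 / n), 0.5)
    return parent, new_rate


rng = random.Random(1)
parent, rate = [0] * 30, 1 / 30
for _ in range(200):
    parent, rate = two_rate_step(parent, rate, lam=8, rng=rng)
print(onemax(parent), rate)
```

The rate adopted for the next generation is the one that produced the best offspring, which is the update rule the abstract describes. The sketch introduces no further tunable parameters beyond the clamp.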
Submitted 25 May, 2018; v1 submitted 7 April, 2017;
originally announced April 2017.
-
Fast Genetic Algorithms
Authors:
Benjamin Doerr,
Huu Phuoc Le,
Régis Makhmara,
Ta Duy Nguyen
Abstract:
For genetic algorithms using a bit-string representation of length $n$, the general recommendation is to take $1/n$ as mutation rate. In this work, we discuss whether this is really justified for multimodal functions. Taking jump functions and the $(1+1)$ evolutionary algorithm as the simplest example, we observe that larger mutation rates give significantly better runtimes. For the $\mathrm{Jump}_{m,n}$ function, any mutation rate between $2/n$ and $m/n$ leads to a speed-up at least exponential in $m$ compared to the standard choice.
The asymptotically best runtime, obtained from using the mutation rate $m/n$ and leading to a speed-up super-exponential in $m$, is very sensitive to small changes of the mutation rate. Any deviation by a small $(1 \pm \varepsilon)$ factor leads to a slow-down exponential in $m$. Consequently, any fixed mutation rate gives strongly sub-optimal results for most jump functions.
Building on this observation, we propose to use a random mutation rate $α/n$, where $α$ is chosen from a power-law distribution. We prove that the $(1+1)$ EA with this heavy-tailed mutation rate optimizes any $\mathrm{Jump}_{m,n}$ function in a time that is only a small polynomial (in $m$) factor above the one stemming from the optimal rate for this $m$.
Our heavy-tailed mutation operator yields similar speed-ups (over the best known performance guarantees) for the vertex cover problem in bipartite graphs and the matching problem in general graphs.
Following the example of fast simulated annealing, fast evolution strategies, and fast evolutionary programming, we propose to call genetic algorithms using a heavy-tailed mutation operator \emph{fast genetic algorithms}.
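A heavy-tailed mutation operator of the kind described above might look like the following minimal sketch. The abstract does not specify the distribution's parameters, so the exponent `beta = 1.5`, the support $\{1, \dots, \lfloor n/2 \rfloor\}$ for $α$, and the function names are assumptions made for illustration.

```python
import random


def power_law_weights(upper, beta=1.5):
    """Unnormalised weights proportional to alpha**(-beta) for alpha = 1..upper."""
    return [alpha ** -beta for alpha in range(1, upper + 1)]


def heavy_tailed_mutation(bits, rng, beta=1.5):
    """Draw alpha from a power law, then flip each bit with probability alpha/n."""
    n = len(bits)
    upper = max(1, n // 2)  # assumed upper end of the support
    alpha = rng.choices(
        range(1, upper + 1), weights=power_law_weights(upper, beta)
    )[0]
    rate = alpha / n
    return [b ^ 1 if rng.random() < rate else b for b in bits]


rng = random.Random(0)
child = heavy_tailed_mutation([0] * 20, rng)
print(sum(child), "bits flipped")
```

Most draws give a small $α$, so the operator usually behaves like standard bit mutation. Occasionally a large $α$ is drawn, which makes the multi-bit jumps needed on jump functions far more likely than under a fixed rate of $1/n$.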
Submitted 15 March, 2017; v1 submitted 9 March, 2017;
originally announced March 2017.
-
The Right Mutation Strength for Multi-Valued Decision Variables
Authors:
Benjamin Doerr,
Carola Doerr,
Timo Kötzing
Abstract:
The most common representation in evolutionary computation is the bit string. This is ideal to model binary decision variables, but less useful for variables taking more values. With very little theoretical work existing on how to use evolutionary algorithms for such optimization problems, we study the run time of simple evolutionary algorithms on some OneMax-like functions defined over $Ω= \{0, 1, \dots, r-1\}^n$. More precisely, we regard a variety of problem classes requesting the component-wise minimization of the distance to an unknown target vector $z \in Ω$. For such problems we see a crucial difference in how we extend the standard-bit mutation operator to these multi-valued domains. While it is natural to select each position of the solution vector to be changed independently with probability $1/n$, there are various ways to then change such a position. If we change each selected position to a random value different from the original one, we obtain an expected run time of $Θ(nr \log n)$. If we change each selected position by either $+1$ or $-1$ (random choice), the optimization time reduces to $Θ(nr + n\log n)$. If we use a random mutation strength $i \in \{0,1,\ldots,r-1\}^n$ with probability inversely proportional to $i$ and change the selected position by either $+i$ or $-i$ (random choice), then the optimization time becomes $Θ(n \log(r)(\log(n)+\log(r)))$, bringing down the dependence on $r$ from linear to polylogarithmic. One of our results depends on a new variant of the lower bounding multiplicative drift theorem.
Submitted 12 April, 2016;
originally announced April 2016.
-
Optimal Parameter Settings for the $(1+(λ, λ))$ Genetic Algorithm
Authors:
Benjamin Doerr
Abstract:
The $(1+(λ,λ))$ genetic algorithm is one of the few algorithms for which a super-constant speed-up through the use of crossover could be proven. So far, this algorithm has been used with parameters based also on intuitive considerations. In this work, we rigorously regard the whole parameter space and show that the asymptotic time complexity proven by Doerr and Doerr (GECCO 2015) for the intuitive choice is best possible among all settings for population size, mutation probability, and crossover bias.
Submitted 28 July, 2016; v1 submitted 4 April, 2016;
originally announced April 2016.
-
Improved Protocols and Hardness Results for the Two-Player Cryptogenography Problem
Authors:
Benjamin Doerr,
Marvin Künnemann
Abstract:
The cryptogenography problem, introduced by Brody, Jakobsen, Scheder, and Winkler (ITCS 2014), is to collaboratively leak a piece of information known to only one member of a group (i) without revealing who was the origin of this information and (ii) without any private communication, neither during the process nor before. Despite several deep structural results, even the smallest case of leaking one bit of information present at one of two players is not well understood. Brody et al. gave a 2-round protocol enabling the two players to succeed with probability $1/3$ and showed the hardness result that no protocol can give a success probability of more than $3/8$.
In this work, we show that neither bound is tight. Our new hardness result, obtained by a different application of the concavity method used also in the previous work, states that a success probability better than 0.3672 is not possible. Using both theoretical and numerical approaches, we improve the lower bound to $0.3384$, that is, give a protocol leading to this success probability. To ease the design of new protocols, we prove an equivalent formulation of the cryptogenography problem as solitaire vector splitting game. Via an automated game tree search, we find good strategies for this game. We then translate the splits that occurred in this strategy into inequalities relating position values and use an LP solver to find an optimal solution for these inequalities. This gives slightly better game values, but more importantly, it gives a more compact representation of the protocol and a way to easily verify the claimed quality of the protocol.
These improved bounds, as well as the large sizes and depths of the improved protocols we find, suggest that finding good protocols for the cryptogenography problem as well as understanding their structure are harder than what the simple problem formulation suggests.
Submitted 19 March, 2016;
originally announced March 2016.
-
A Tight Runtime Analysis of the $(1+(λ, λ))$ Genetic Algorithm on OneMax
Authors:
Benjamin Doerr,
Carola Doerr
Abstract:
Understanding how crossover works is still one of the big challenges in evolutionary computation research, and making our understanding precise and proven by mathematical means might be an even bigger one. As one of few examples where crossover provably is useful, the $(1+(λ, λ))$ Genetic Algorithm (GA) was proposed recently in [Doerr, Doerr, Ebel: TCS 2015]. Using the fitness level method, the expected optimization time on general OneMax functions was analyzed and a $O(\max\{n\log(n)/λ, λn\})$ bound was proven for any offspring population size $λ\in [1..n]$.
We improve this work in several ways, leading to sharper bounds and a better understanding of how the use of crossover speeds up the runtime in this algorithm. We first improve the upper bound on the runtime to $O(\max\{n\log(n)/λ, nλ\log\log(λ)/\log(λ)\})$. This improvement is made possible from observing that in the parallel generation of $λ$ offspring via crossover (but not mutation), the best of these often is better than the expected value, and hence several fitness levels can be gained in one iteration.
We then present the first lower bound for this problem. It matches our upper bound for all values of $λ$. This allows to determine the asymptotically optimal value for the population size. It is $λ= Θ(\sqrt{\log(n)\log\log(n)/\log\log\log(n)})$, which gives an optimization time of $Θ(n \sqrt{\log(n)\log\log\log(n)/\log\log(n)})$. Hence the improved runtime analysis gives a better runtime guarantee along with a better suggestion for the parameter $λ$.
We finally give a tail bound for the upper tail of the runtime distribution, which shows that the actual runtime exceeds our runtime guarantee by a factor of $(1+δ)$ with probability $O((n/λ^2)^{-δ})$ only.
Submitted 19 June, 2015;
originally announced June 2015.
-
Solving Problems with Unknown Solution Length at (Almost) No Extra Cost
Authors:
Benjamin Doerr,
Carola Doerr,
Timo Kötzing
Abstract:
Most research in the theory of evolutionary computation assumes that the problem at hand has a fixed problem size. This assumption does not always apply to real-world optimization challenges, where the length of an optimal solution may be unknown a priori.
Following up on previous work of Cathabard, Lehre, and Yao [FOGA 2011] we analyze variants of the (1+1) evolutionary algorithm for problems with unknown solution length. For their setting, in which the solution length is sampled from a geometric distribution, we provide mutation rates that yield an expected optimization time that is of the same order as that of the (1+1) EA knowing the solution length.
We then show that almost the same run times can be achieved even if no a priori information on the solution length is available.
Finally, we provide mutation rates suitable for settings in which neither the solution length nor the positions of the relevant bits are known. Again we obtain almost optimal run times for the OneMax and LeadingOnes test functions, thus solving an open problem from Cathabard et al.
Submitted 19 June, 2015;
originally announced June 2015.
-
Optimising Spatial and Tonal Data for PDE-based Inpainting
Authors:
Laurent Hoeltgen,
Markus Mainberger,
Sebastian Hoffmann,
Joachim Weickert,
Ching Hoo Tang,
Simon Setzer,
Daniel Johannsen,
Frank Neumann,
Benjamin Doerr
Abstract:
Some recent methods for lossy signal and image compression store only a few selected pixels and fill in the missing structures by inpainting with a partial differential equation (PDE). Suitable operators include the Laplacian, the biharmonic operator, and edge-enhancing anisotropic diffusion (EED). The quality of such approaches depends substantially on the selection of the data that is kept. Optimising this data in the domain and codomain gives rise to challenging mathematical problems that shall be addressed in our work.
In the 1D case, we prove results that provide insights into the difficulty of this problem, and we give evidence that a splitting into spatial and tonal (i.e. function value) optimisation hardly deteriorates the results. In the 2D setting, we present generic algorithms that achieve a high reconstruction quality even if the specified data is very sparse. To optimise the spatial data, we use a probabilistic sparsification, followed by a nonlocal pixel exchange that avoids getting trapped in bad local optima. After this spatial optimisation we perform a tonal optimisation that modifies the function values in order to reduce the global reconstruction error. For homogeneous diffusion inpainting, this comes down to a least squares problem for which we prove that it has a unique solution. We demonstrate that it can be found efficiently with a gradient descent approach that is accelerated with fast explicit diffusion (FED) cycles. Our framework allows to specify the desired density of the inpainting mask a priori. Moreover, it is more generic than other data optimisation approaches for the sparse inpainting problem, since it can also be extended to nonlinear inpainting operators such as EED. This is exploited to achieve reconstructions with state-of-the-art quality.
We also give an extensive literature survey on PDE-based image compression methods.
Submitted 15 June, 2015;
originally announced June 2015.
-
Optimal Parameter Choices Through Self-Adjustment: Applying the 1/5-th Rule in Discrete Settings
Authors:
Benjamin Doerr,
Carola Doerr
Abstract:
While evolutionary algorithms are known to be very successful for a broad range of applications, the algorithm designer is often left with many algorithmic choices, for example, the size of the population, the mutation rates, and the crossover rates of the algorithm. These parameters are known to have a crucial influence on the optimization time, and thus need to be chosen carefully, a task that often requires substantial efforts. Moreover, the optimal parameters can change during the optimization process. It is therefore of great interest to design mechanisms that dynamically choose best-possible parameters. An example for such an update mechanism is the one-fifth success rule for step-size adaption in evolutionary strategies. While in continuous domains this principle is well understood also from a mathematical point of view, no comparable theory is available for problems in discrete domains.
In this work we show that the one-fifth success rule can be effective also in discrete settings. We regard the $(1+(λ,λ))$ GA proposed in [Doerr/Doerr/Ebel: From black-box complexity to designing new genetic algorithms, TCS 2015]. We prove that if its population size is chosen according to the one-fifth success rule then the expected optimization time on OneMax is linear. This is better than what any static population size $λ$ can achieve and is asymptotically optimal also among all adaptive parameter choices.
Submitted 13 April, 2015;
originally announced April 2015.
-
Statistical modality tagging from rule-based annotations and crowdsourcing
Authors:
Vinodkumar Prabhakaran,
Michael Bloodgood,
Mona Diab,
Bonnie Dorr,
Lori Levin,
Christine D. Piatko,
Owen Rambow,
Benjamin Van Durme
Abstract:
We explore training an automatic modality tagger. Modality is the attitude that a speaker might have toward an event or state. One of the main hurdles for training a linguistic tagger is gathering training data. This is particularly problematic for training a tagger for modality because modality triggers are sparse for the overwhelming majority of sentences. We investigate an approach to automatically training a modality tagger where we first gathered sentences based on a high-recall simple rule-based modality tagger and then provided these sentences to Mechanical Turk annotators for further annotation. We used the resulting set of training data to train a precise modality tagger using a multi-class SVM that delivers good performance.
Submitted 3 March, 2015;
originally announced March 2015.
-
Use of Modality and Negation in Semantically-Informed Syntactic MT
Authors:
Kathryn Baker,
Michael Bloodgood,
Bonnie J. Dorr,
Chris Callison-Burch,
Nathaniel W. Filardo,
Christine Piatko,
Lori Levin,
Scott Miller
Abstract:
This paper describes the resource- and system-building efforts of an eight-week Johns Hopkins University Human Language Technology Center of Excellence Summer Camp for Applied Language Exploration (SCALE-2009) on Semantically-Informed Machine Translation (SIMT). We describe a new modality/negation (MN) annotation scheme, the creation of a (publicly available) MN lexicon, and two automated MN taggers that we built using the annotation scheme and lexicon. Our annotation scheme isolates three components of modality and negation: a trigger (a word that conveys modality or negation), a target (an action associated with modality or negation) and a holder (an experiencer of modality). We describe how our MN lexicon was semi-automatically produced and we demonstrate that a structure-based MN tagger results in precision around 86% (depending on genre) for tagging of a standard LDC data set.
We apply our MN annotation scheme to statistical machine translation using a syntactic framework that supports the inclusion of semantic annotations. Syntactic tags enriched with semantic annotations are assigned to parse trees in the target-language training texts through a process of tree grafting. While the focus of our work is modality and negation, the tree grafting procedure is general and supports other types of semantic information. We exploit this capability by including named entities, produced by a pre-existing tagger, in addition to the MN elements produced by the taggers described in this paper. The resulting system significantly outperformed a linguistically naive baseline model (Hiero), and reached the highest scores yet reported on the NIST 2009 Urdu-English test set. This finding supports the hypothesis that both syntactic and semantic information can improve translation quality.
Submitted 5 February, 2015;
originally announced February 2015.
-
A Modality Lexicon and its use in Automatic Tagging
Authors:
Kathryn Baker,
Michael Bloodgood,
Bonnie J. Dorr,
Nathaniel W. Filardo,
Lori Levin,
Christine Piatko
Abstract:
This paper describes our resource-building results for an eight-week JHU Human Language Technology Center of Excellence Summer Camp for Applied Language Exploration (SCALE-2009) on Semantically-Informed Machine Translation. Specifically, we describe the construction of a modality annotation scheme, a modality lexicon, and two automated modality taggers that were built using the lexicon and annotation scheme. Our annotation scheme is based on identifying three components of modality: a trigger, a target and a holder. We describe how our modality lexicon was produced semi-automatically, expanding from an initial hand-selected list of modality trigger words and phrases. The resulting expanded modality lexicon is being made publicly available. We demonstrate that one tagger, a structure-based tagger, achieves precision of around 86% (depending on genre) when tagging a standard LDC data set. In a machine translation application, using the structure-based tagger to annotate English modalities on an English-Urdu training corpus improved the translation quality score for Urdu by 0.3 BLEU points in the face of sparse training data.
Submitted 17 October, 2014;
originally announced October 2014.
-
Semantically-Informed Syntactic Machine Translation: A Tree-Grafting Approach
Authors:
Kathryn Baker,
Michael Bloodgood,
Chris Callison-Burch,
Bonnie J. Dorr,
Nathaniel W. Filardo,
Lori Levin,
Scott Miller,
Christine Piatko
Abstract:
We describe a unified and coherent syntactic framework for supporting a semantically-informed syntactic approach to statistical machine translation. Semantically enriched syntactic tags assigned to the target-language training texts improved translation quality. The resulting system significantly outperformed a linguistically naive baseline model (Hiero), and reached the highest scores yet reported on the NIST 2009 Urdu-English translation task. This finding supports the hypothesis (posed by many researchers in the MT community, e.g., in DARPA GALE) that both syntactic and semantic information are critical for improving translation quality, and further demonstrates that large gains can be achieved for low-resource languages with different word order than English.
Submitted 24 September, 2014;
originally announced September 2014.
-
Unbiased Black-Box Complexities of Jump Functions
Authors:
Benjamin Doerr,
Carola Doerr,
Timo Kötzing
Abstract:
We analyze the unbiased black-box complexity of jump functions with small, medium, and large sizes of the fitness plateau surrounding the optimal solution.
Among other results, we show that when the jump size is $(1/2 - \varepsilon)n$, that is, only a small constant fraction of the fitness values is visible, then the unbiased black-box complexities for arities $3$ and higher are of the same order as those for the simple \textsc{OneMax} function. Even for the extreme jump function, in which all but the two fitness values $n/2$ and $n$ are blanked out, polynomial-time mutation-based (i.e., unary unbiased) black-box optimization algorithms exist. This is quite surprising given that for the extreme jump function almost the whole search space (all but a $Θ(n^{-1/2})$ fraction) is a plateau of constant fitness.
To prove these results, we introduce new tools for the analysis of unbiased black-box complexities, for example, selecting the new parent individual not only by comparing the fitnesses of the competing search points, but also by taking into account the (empirical) expected fitnesses of their offspring.
Submitted 16 October, 2014; v1 submitted 30 March, 2014;
originally announced March 2014.
-
Generating Extractive Summaries of Scientific Paradigms
Authors:
Vahed Qazvinian,
Dragomir R. Radev,
Saif M. Mohammad,
Bonnie Dorr,
David Zajic,
Michael Whidby,
Taesun Moon
Abstract:
Researchers and scientists increasingly find themselves in the position of having to quickly understand large amounts of technical material. Our goal is to effectively serve this need by using bibliometric text mining and summarization techniques to generate summaries of scientific literature. We show how we can use citations to produce automatically generated, readily consumable, technical extractive summaries. We first propose C-LexRank, a model for summarizing single scientific articles based on citations, which employs community detection and extracts salient information-rich sentences. Next, we further extend our experiments to summarize a set of papers, which cover the same scientific topic. We generate extractive summaries of a set of Question Answering (QA) and Dependency Parsing (DP) papers, their abstracts, and their citation sentences and show that citations have unique information amenable to creating a summary.
Submitted 3 February, 2014;
originally announced February 2014.
-
Collecting Coupons with Random Initial Stake
Authors:
Benjamin Doerr,
Carola Doerr
Abstract:
Motivated by a problem in the theory of randomized search heuristics, we give a very precise analysis for the coupon collector problem where the collector starts with a random set of coupons (chosen uniformly from all sets).
We show that the expected number of rounds until we have a coupon of each type is $nH_{n/2} - 1/2 \pm o(1)$, where $H_{n/2}$ denotes the $(n/2)$th harmonic number when $n$ is even, and $H_{n/2}:= (1/2) H_{\lfloor n/2 \rfloor} + (1/2) H_{\lceil n/2 \rceil}$ when $n$ is odd. Consequently, the coupon collector with random initial stake is half a round faster than the one starting with exactly $n/2$ coupons (apart from additive $o(1)$ terms).
This result implies that the classic simple heuristic called \emph{randomized local search} needs an expected number of $nH_{n/2} - 1/2 \pm o(1)$ iterations to find the optimum of any monotonic function defined on bit-strings of length $n$.
Submitted 29 August, 2013;
originally announced August 2013.
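The closed form stated in the abstract above can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes that a uniformly random initial set means each coupon type is owned independently with probability 1/2, so the number of missing types is binomially distributed, and that each round draws one coupon type uniformly at random. The function names are hypothetical.

```python
from math import comb


def harmonic(m: int) -> float:
    """Return the m-th harmonic number H_m (with H_0 = 0)."""
    return sum(1.0 / i for i in range(1, m + 1))


def expected_rounds_random_stake(n: int) -> float:
    """Exact expected number of rounds for a uniformly random initial set.

    Assumption: the number of missing types is Binomial(n, 1/2), and with
    m types missing the expected remaining time is n * H_m.
    """
    total = 0.0
    for missing in range(n + 1):
        prob = comb(n, missing) / 2**n
        total += prob * n * harmonic(missing)
    return total


def closed_form(n: int) -> float:
    """n * H_{n/2} - 1/2, using the abstract's definition of H_{n/2} for odd n."""
    if n % 2 == 0:
        h = harmonic(n // 2)
    else:
        h = 0.5 * harmonic(n // 2) + 0.5 * harmonic(n // 2 + 1)
    return n * h - 0.5


if __name__ == "__main__":
    for n in (100, 101):
        print(n, expected_rounds_random_stake(n), closed_form(n))
```

For moderate $n$ the two values should agree up to a small additive term, in line with the $\pm o(1)$ in the stated result.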
-
Computing Lexical Contrast
Authors:
Saif M. Mohammad,
Bonnie J. Dorr,
Graeme Hirst,
Peter D. Turney
Abstract:
Knowing the degree of semantic contrast between words has widespread application in natural language processing, including machine translation, information retrieval, and dialogue systems. Manually-created lexicons focus on opposites, such as {\rm hot} and {\rm cold}. Opposites are of many kinds such as antipodals, complementaries, and gradable. However, existing lexicons often do not classify opposites into the different kinds. They also do not explicitly list word pairs that are not opposites but yet have some degree of contrast in meaning, such as {\rm warm} and {\rm cold} or {\rm tropical} and {\rm freezing}. We propose an automatic method to identify contrasting word pairs that is based on the hypothesis that if a pair of words, $A$ and $B$, are contrasting, then there is a pair of opposites, $C$ and $D$, such that $A$ and $C$ are strongly related and $B$ and $D$ are strongly related. (For example, there exists the pair of opposites {\rm hot} and {\rm cold} such that {\rm tropical} is related to {\rm hot,} and {\rm freezing} is related to {\rm cold}.) We will call this the contrast hypothesis. We begin with a large crowdsourcing experiment to determine the amount of human agreement on the concept of oppositeness and its different kinds. In the process, we flesh out key features of different kinds of opposites. We then present an automatic and empirical measure of lexical contrast that relies on the contrast hypothesis, corpus statistics, and the structure of a {\it Roget}-like thesaurus. We show that the proposed measure of lexical contrast obtains high precision and large coverage, outperforming existing methods.
Submitted 28 August, 2013;
originally announced August 2013.
-
Improved Approximation Algorithms for the Min-Max Selecting Items Problem
Authors:
Benjamin Doerr
Abstract:
We give a simple deterministic $O(\log K / \log\log K)$ approximation algorithm for the Min-Max Selecting Items problem, where $K$ is the number of scenarios. While our main goal is simplicity, this result also improves over the previous best approximation ratio of $O(\log K)$ due to Kasperski, Kurpisz, and Zieliński (Information Processing Letters (2013)). Despite using the method of pessimistic estimators, the algorithm has a polynomial runtime also in the RAM model of computation. We also show that the LP formulation for this problem by Kasperski and Zieliński (Annals of Operations Research (2009)), which is the basis for the previous work and ours, has an integrality gap of at least $Ω(\log K / \log\log K)$.
Submitted 27 April, 2013;
originally announced April 2013.
-
Winkler's Hat Guessing Game: Better Results for Imbalanced Hat Distributions
Authors:
Benjamin Doerr
Abstract:
In this note, we give an explicit polynomial-time executable strategy for Peter Winkler's hat guessing game that gives superior results if the distribution of hats is imbalanced. While Winkler's strategy guarantees in any case that $\lfloor n/2 \rfloor$ of the $n$ players guess their hat color correctly, our strategy ensures that the players produce $\max\{r,b\} - 1.2 n^{2/3} -2$ correct guesses for any distribution of $r$ red and $b = n - r$ blue hats. We also show that any strategy ensuring $\max\{r,b\} - f(n)$ correct guesses necessarily has $f(n) = Ω(\sqrt n)$.
Submitted 28 March, 2013;
originally announced March 2013.
-
Online Checkpointing with Improved Worst-Case Guarantees
Authors:
Karl Bringmann,
Benjamin Doerr,
Adrian Neumann,
Jakub Sliacan
Abstract:
In the online checkpointing problem, the task is to continuously maintain a set of k checkpoints that allow rewinding an ongoing computation faster than by a full restart. The only operation allowed is to replace an old checkpoint by the current state. Our aim is to find checkpoint placement strategies that minimize rewinding cost, i.e., such that at all times T, when requested to rewind to some time t <= T, the number of computation steps that need to be redone to get to t from a checkpoint before t is as small as possible. In particular, we want the closest checkpoint earlier than t to be no further away from t than q_k times the ideal distance T / (k+1), where q_k is a small constant.
Improving over earlier work showing 1 + 1/k <= q_k <= 2, we show that q_k can be chosen asymptotically less than 2. We present algorithms with asymptotic discrepancy q_k <= 1.59 + o(1) valid for all k and q_k <= ln(4) + o(1) <= 1.39 + o(1) valid for k being a power of two. Experiments indicate the uniform bound q_k <= 1.7 for all k. For small k, we show how to use a linear programming approach to compute good checkpointing algorithms. This gives discrepancies of less than 1.55 for all k < 60.
We prove the first lower bound that is asymptotically more than one, namely q_k >= 1.30 - o(1). We also show that optimal algorithms (yielding the infimum discrepancy) exist for all k.
Submitted 30 April, 2013; v1 submitted 18 February, 2013;
originally announced February 2013.
-
Evolutionary Algorithms and Dynamic Programming
Authors:
Benjamin Doerr,
Anton Eremeev,
Frank Neumann,
Madeleine Theile,
Christian Thyssen
Abstract:
Recently, it has been proven that evolutionary algorithms produce good results for a wide range of combinatorial optimization problems. Some of the considered problems are tackled by evolutionary algorithms that use a representation which enables them to construct solutions in a dynamic programming fashion. We take a general approach and relate the construction of such algorithms to the development of algorithms using dynamic programming techniques. Thereby, we give general guidelines on how to develop evolutionary algorithms that have the additional ability of carrying out dynamic programming steps. Finally, we show that for a wide class of the so-called DP-benevolent problems (which are known to admit FPTAS) there exists a fully polynomial-time randomized approximation scheme based on an evolutionary algorithm.
Submitted 17 January, 2013;
originally announced January 2013.
-
Black-Box Complexity: Breaking the $O(n \log n)$ Barrier of LeadingOnes
Authors:
Benjamin Doerr,
Carola Winzen
Abstract:
We show that the unrestricted black-box complexity of the $n$-dimensional XOR- and permutation-invariant LeadingOnes function class is $O(n \log (n) / \log \log n)$. This shows that the recent natural looking $O(n\log n)$ bound is not tight.
The black-box optimization algorithm leading to this bound can be implemented in a way that only 3-ary unbiased variation operators are used. Hence our bound is also valid for the unbiased black-box complexity recently introduced by Lehre and Witt (GECCO 2010). The bound also remains valid if we impose the additional restriction that the black-box algorithm does not have access to the objective values but only to their relative order (ranking-based black-box complexity).
Submitted 24 October, 2012;
originally announced October 2012.
-
A Lower Bound for the Discrepancy of a Random Point Set
Authors:
Benjamin Doerr
Abstract:
We show that there is a constant $K > 0$ such that for all $N, s \in \mathbb{N}$, $s \le N$, the point set consisting of $N$ points chosen uniformly at random in the $s$-dimensional unit cube $[0,1]^s$ with probability at least $1-\exp(-Θ(s))$ admits an axis-parallel rectangle $[0,x] \subseteq [0,1]^s$ containing $K \sqrt{sN}$ points more than expected. Consequently, the expected star discrepancy of a random point set is of order $\sqrt{s/N}$.
Submitted 5 October, 2013; v1 submitted 1 October, 2012;
originally announced October 2012.
-
The Price of Anarchy for Selfish Ring Routing is Two
Authors:
Xujin Chen,
Benjamin Doerr,
Xiaodong Hu,
Weidong Ma,
Rob van Stee,
Carola Winzen
Abstract:
We analyze the network congestion game with atomic players, asymmetric strategies, and the maximum latency among all players as social cost. This important social cost function is much less understood than the average latency. We show that the price of anarchy is at most two, when the network is a ring and the link latencies are linear. Our bound is tight. This is the first sharp bound for the maximum latency objective.
Submitted 30 September, 2012;
originally announced October 2012.
-
Simple and Optimal Randomized Fault-Tolerant Rumor Spreading
Authors:
Benjamin Doerr,
Carola Doerr,
Shay Moran,
Shlomo Moran
Abstract:
We revisit the classic problem of spreading a piece of information in a group of $n$ fully connected processors. By suitably adding a small dose of randomness to the protocol of Gasieniec and Pelc (1996), we derive for the first time protocols that (i) use a linear number of messages, (ii) are correct even when an arbitrary number of adversarially chosen processors does not participate in the process, and (iii) with high probability have the asymptotically optimal runtime of $O(\log n)$ when at least an arbitrarily small constant fraction of the processors are working. In addition, our protocols require neither that the system is synchronized nor that all processors are simultaneously woken up at time zero; they are fully based on push-operations, and they do not need an a priori estimate of the number of failed nodes.
Our protocols thus overcome the typical disadvantages of the two known approaches, algorithms based on random gossip (typically needing a large number of messages due to their unorganized nature) and algorithms based on fair workload splitting (which are either not time-efficient or require intricate preprocessing steps plus synchronization).
Submitted 5 January, 2015; v1 submitted 27 September, 2012;
originally announced September 2012.
-
Playing Mastermind with Many Colors
Authors:
Benjamin Doerr,
Carola Doerr,
Reto Spöhel,
Henning Thomas
Abstract:
We analyze the general version of the classic guessing game Mastermind with $n$ positions and $k$ colors. Since the case $k \le n^{1-\varepsilon}$, $\varepsilon>0$ a constant, is well understood, we concentrate on larger numbers of colors. For the most prominent case $k = n$, our results imply that Codebreaker can find the secret code with $O(n \log \log n)$ guesses. This bound is valid also when only black answer-pegs are used. It improves the $O(n \log n)$ bound first proven by Chvátal (Combinatorica 3 (1983), 325--329). We also show that if both black and white answer-pegs are used, then the $O(n \log\log n)$ bound holds for up to $n^2 \log\log n$ colors. These bounds are almost tight as the known lower bound of $Ω(n)$ shows. Unlike for $k \le n^{1-\varepsilon}$, simply guessing at random until the secret code is determined is not sufficient. In fact, we show that an optimal non-adaptive strategy (deterministic or randomized) needs $Θ(n \log n)$ guesses.
Submitted 17 January, 2013; v1 submitted 3 July, 2012;
originally announced July 2012.
-
More Effective Crossover Operators for the All-Pairs Shortest Path Problem
Authors:
Benjamin Doerr,
Daniel Johannsen,
Timo Kötzing,
Frank Neumann,
Madeleine Theile
Abstract:
The all-pairs shortest path problem is the first non-artificial problem for which it was shown that adding crossover can significantly speed up a mutation-only evolutionary algorithm. Recently, the analysis of this algorithm was refined and it was shown to have an expected optimization time (w.r.t. the number of fitness evaluations) of $Θ(n^{3.25}(\log n)^{0.25})$.
In contrast to this simple algorithm, evolutionary algorithms used in practice usually employ refined recombination strategies in order to avoid the creation of infeasible offspring. We study extensions of the basic algorithm by two such concepts which are central in recombination, namely \emph{repair mechanisms} and \emph{parent selection}. We show that repairing infeasible offspring leads to an improved expected optimization time of $\mathord{O}(n^{3.2}(\log n)^{0.2})$. As a second part of our study we prove that choosing parents that guarantee feasible offspring results in an even better optimization time of $\mathord{O}(n^{3}\log n)$.
Both results show that already simple adjustments of the recombination operator can asymptotically improve the runtime of evolutionary algorithms.
Submitted 2 July, 2012;
originally announced July 2012.
-
Reducing the Arity in Unbiased Black-Box Complexity
Authors:
Benjamin Doerr,
Carola Winzen
Abstract:
We show that for all $1<k \leq \log n$ the $k$-ary unbiased black-box complexity of the $n$-dimensional \textsc{OneMax} function class is $O(n/k)$. This indicates that the power of higher arity operators is much stronger than what the previous $O(n/\log k)$ bound by Doerr et al. (Faster black-box algorithms through higher arity operators, Proc. of FOGA 2011, pp. 163--172, ACM, 2011) suggests.
The key to this result is an encoding strategy, which might be of independent interest. We show that, using $k$-ary unbiased variation operators only, we may simulate an unrestricted memory of size $O(2^k)$ bits.
Submitted 19 March, 2012;
originally announced March 2012.
-
Playing Mastermind With Constant-Size Memory
Authors:
Benjamin Doerr,
Carola Winzen
Abstract:
We analyze the classic board game of Mastermind with $n$ holes and a constant number of colors. A result of Chvátal (Combinatorica 3 (1983), 325-329) states that the codebreaker can find the secret code with $Θ(n / \log n)$ questions. We show that this bound remains valid if the codebreaker may only store a constant number of guesses and answers. In addition to an intrinsic interest in this question, our result also disproves a conjecture of Droste, Jansen, and Wegener (Theory of Computing Systems 39 (2006), 525-544) on the memory-restricted black-box complexity of the OneMax function class.
Submitted 17 October, 2011;
originally announced October 2011.
-
Black-Box Complexities of Combinatorial Problems
Authors:
Benjamin Doerr,
Timo Kötzing,
Johannes Lengler,
Carola Winzen
Abstract:
Black-box complexity is a complexity-theoretic measure of how difficult a problem is to optimize by a general-purpose optimization algorithm. It is thus one of the few means of understanding which problems are tractable for genetic algorithms and other randomized search heuristics.
Most previous work on black-box complexity is on artificial test functions. In this paper, we move a step forward and give a detailed analysis for the two combinatorial problems minimum spanning tree and single-source shortest paths. Besides giving interesting bounds for their black-box complexities, our work reveals that the choice of how to model the optimization problem is non-trivial here. This is particularly true where the search space does not consist of bit strings and where a reasonable definition of unbiasedness has to be agreed on.
Submitted 1 August, 2011;
originally announced August 2011.
-
Adaptive Drift Analysis
Authors:
Benjamin Doerr,
Leslie Ann Goldberg
Abstract:
We show that, for any c>0, the (1+1) evolutionary algorithm using an arbitrary mutation rate p_n = c/n finds the optimum of a linear objective function over bit strings of length n in expected time Theta(n log n). Previously, this was only known for c at most 1. Since previous work also shows that universal drift functions cannot exist for c larger than a certain constant, we instead define drift functions which depend crucially on the relevant objective functions (and also on c itself). Using these carefully-constructed drift functions, we prove that the expected optimisation time is Theta(n log n). By giving an alternative proof of the multiplicative drift theorem, we also show that our optimisation-time bound holds with high probability.
Submitted 27 September, 2011; v1 submitted 1 August, 2011;
originally announced August 2011.
-
Ranking-Based Black-Box Complexity
Authors:
Benjamin Doerr,
Carola Winzen
Abstract:
Randomized search heuristics such as evolutionary algorithms, simulated annealing, and ant colony optimization are a broadly used class of general-purpose algorithms. Analyzing them via classical methods of theoretical computer science is a growing field. While several strong runtime analysis results have appeared in the last 20 years, a powerful complexity theory for such algorithms is yet to be developed. We enrich the existing notions of black-box complexity by the additional restriction that not the actual objective values, but only the relative quality of the previously evaluated solutions may be taken into account by the black-box algorithm. Many randomized search heuristics belong to this class of algorithms.
We show that the new ranking-based model gives more realistic complexity estimates for some problems. For example, the class of all binary-value functions has a black-box complexity of $O(\log n)$ in the previous black-box models, but has a ranking-based complexity of $Θ(n)$.
For the class of all OneMax functions, we present a ranking-based black-box algorithm that has a runtime of $Θ(n / \log n)$, which shows that the OneMax problem does not become harder with the additional ranking-basedness restriction.
Submitted 3 September, 2012; v1 submitted 6 February, 2011;
originally announced February 2011.
-
Multiplicative Drift Analysis
Authors:
Benjamin Doerr,
Daniel Johannsen,
Carola Winzen
Abstract:
In this work, we introduce multiplicative drift analysis as a suitable way to analyze the runtime of randomized search heuristics such as evolutionary algorithms.
We give a multiplicative version of the classical drift theorem. This allows easier analyses in those settings where the optimization progress is roughly proportional to the current distance to the optimum.
To display the strength of this tool, we regard the classical problem of how the (1+1) Evolutionary Algorithm optimizes an arbitrary linear pseudo-Boolean function. Here, we first give a relatively simple proof for the fact that any linear function is optimized in expected time $O(n \log n)$, where $n$ is the length of the bit string. Afterwards, we show that in fact any such function is optimized in expected time at most $(1+o(1)) 1.39 e n \ln(n)$, again using multiplicative drift analysis. We also prove a corresponding lower bound of $(1-o(1)) e n \ln(n)$ which actually holds for all functions with a unique global optimum.
We further demonstrate how our drift theorem immediately gives natural proofs (with better constants) for the best known runtime bounds for the (1+1) Evolutionary Algorithm on combinatorial problems like finding minimum spanning trees, shortest paths, or Euler tours.
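As a rough illustration of the setting analyzed in this abstract, the following minimal Python sketch simulates the (1+1) Evolutionary Algorithm with the standard mutation probability $1/n$ on a linear function with positive integer weights. The function name `one_plus_one_ea`, the restriction to positive integer weights, and the stopping rule are assumptions made for illustration and are not taken from the paper.

```python
import random


def one_plus_one_ea(weights, rng):
    """Run the (1+1) EA on f(x) = sum(w_i * x_i) and return the number of
    iterations until the all-ones optimum is reached.

    Assumes all weights are positive integers, so the unique optimum is the
    all-ones string and fitness comparisons are exact.
    """
    n = len(weights)
    # Start from a uniformly random bit string.
    x = [rng.randrange(2) for _ in range(n)]
    fx = sum(w * b for w, b in zip(weights, x))
    optimum = sum(weights)
    iterations = 0
    while fx < optimum:
        # Standard bit mutation: flip each bit independently with prob. 1/n.
        y = [b ^ (rng.random() < 1.0 / n) for b in x]
        fy = sum(w * b for w, b in zip(weights, y))
        # Elitist selection: keep the offspring if it is at least as good.
        if fy >= fx:
            x, fx = y, fy
        iterations += 1
    return iterations
```

Averaging the returned iteration counts over many runs and comparing them with $e n \ln(n)$ gives an informal feel for the bounds stated above; a single run can deviate substantially from the expectation.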
Submitted 4 January, 2011;
originally announced January 2011.
-
Quasirandom Rumor Spreading: An Experimental Analysis
Authors:
Benjamin Doerr,
Tobias Friedrich,
Marvin Künnemann,
Thomas Sauerwald
Abstract:
We empirically analyze two versions of the well-known "randomized rumor spreading" protocol to disseminate a piece of information in networks. In the classical model, in each round each informed node informs a random neighbor. In the recently proposed quasirandom variant, each node has a (cyclic) list of its neighbors. Once informed, it starts at a random position of the list, but from then on informs its neighbors in the order of the list. While for sparse random graphs a better performance of the quasirandom model could be proven, all other results show that, independent of the structure of the lists, the same asymptotic performance guarantees hold as for the classical model. In this work, we compare the two models experimentally. This not only shows that the quasirandom model generally is faster, but also that the runtime is more concentrated around the mean. This is surprising given that far fewer random bits are used in the quasirandom process. These advantages are also observed in a lossy communication model, where each transmission fails to reach its target with a certain probability, and in an asynchronous model, where nodes send at random times drawn from an exponential distribution. We also show that typically the particular structure of the lists has little influence on the efficiency.
Submitted 24 December, 2010;
originally announced December 2010.
-
Quasirandom Rumor Spreading
Authors:
Benjamin Doerr,
Tobias Friedrich,
Thomas Sauerwald
Abstract:
We propose and analyze a quasirandom analogue of the classical push model for disseminating information in networks ("randomized rumor spreading"). In the classical model, in each round each informed vertex chooses a neighbor at random and informs it, if it was not informed before. It is known that this simple protocol succeeds in spreading a rumor from one vertex to all others within $O(\log n)$ rounds on complete graphs, hypercubes, random regular graphs, Erdős-Rényi random graphs and Ramanujan graphs with probability $1-o(1)$. In the quasirandom model, we assume that each vertex has a (cyclic) list of its neighbors. Once informed, it starts at a random position on the list, but from then on informs its neighbors in the order of the list. Surprisingly, irrespective of the orders of the lists, the above-mentioned bounds still hold. In some cases, even better bounds than for the classical model can be shown.
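The two protocols described in this abstract can be sketched side by side on the complete graph. This is a minimal simulation under stated assumptions: the function name `push_rounds` is hypothetical, the rumor starts at vertex 0, and every vertex `v` uses the cyclic neighbor list `v+1, v+2, ..., v-1` (mod `n`), which is only one of the many list orders the quasirandom model allows.

```python
import random


def push_rounds(n, rng, quasirandom):
    """Return the number of rounds until all n vertices of the complete
    graph are informed, starting with the rumor at vertex 0.

    quasirandom=False: each informed vertex calls a uniformly random
    neighbor in every round (classical push model).
    quasirandom=True: each vertex picks a random start position in its
    cyclic neighbor list once, then follows the list order.
    """
    informed = {0}
    pointer = {}  # vertex -> index of the next list entry to contact
    rounds = 0
    while len(informed) < n:
        newly_informed = set()
        for v in informed:
            if quasirandom:
                if v not in pointer:
                    # First active round: choose the random start position.
                    pointer[v] = rng.randrange(n - 1)
                idx = pointer[v]
                pointer[v] = (idx + 1) % (n - 1)
            else:
                idx = rng.randrange(n - 1)
            # Assumed list of v: v+1, v+2, ..., v-1 (mod n), so never v itself.
            newly_informed.add((v + 1 + idx) % n)
        # Vertices informed in this round start calling in the next round.
        informed |= newly_informed
        rounds += 1
    return rounds
```

Calling `push_rounds(n, random.Random(seed), quasirandom=True)` and the same with `quasirandom=False` over many seeds allows an informal comparison of the two round counts.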
Submitted 7 August, 2013; v1 submitted 24 December, 2010;
originally announced December 2010.
-
Faster Black-Box Algorithms Through Higher Arity Operators
Authors:
Benjamin Doerr,
Daniel Johannsen,
Timo Kötzing,
Per Kristian Lehre,
Markus Wagner,
Carola Winzen
Abstract:
We extend the work of Lehre and Witt (GECCO 2010) on the unbiased black-box model by considering higher arity variation operators. In particular, we show that already for binary operators the black-box complexity of LeadingOnes drops from $Θ(n^2)$ for unary operators to $O(n \log n)$. For OneMax, the $Ω(n \log n)$ unary black-box complexity drops to $O(n)$ in the binary case. For $k$-ary operators, $k \leq n$, the OneMax complexity further decreases to $O(n/\log k)$.
Submitted 4 December, 2010;
originally announced December 2010.
-
Non-Existence of Linear Universal Drift Functions
Authors:
Benjamin Doerr,
Daniel Johannsen,
Carola Winzen
Abstract:
Drift analysis has become a powerful tool to prove bounds on the runtime of randomized search heuristics. It allows, for example, fairly simple proofs for the classical problem of how the (1+1) Evolutionary Algorithm (EA) optimizes an arbitrary pseudo-Boolean linear function. The key idea of drift analysis is to measure the progress via another pseudo-Boolean function (called drift function) and use deeper results from probability theory to derive from this a good bound for the runtime of the EA. Surprisingly, all these results manage to use the same drift function for all linear objective functions.
In this work, we show that such universal drift functions only exist if the mutation probability is close to the standard value of $1/n$.
Submitted 15 November, 2010;
originally announced November 2010.
-
Asymptotically Optimal Randomized Rumor Spreading
Authors:
Benjamin Doerr,
Mahmoud Fouz
Abstract:
We propose a new protocol solving the fundamental problem of disseminating a piece of information to all members of a group of n players. It builds upon the classical randomized rumor spreading protocol and several extensions. The main achievements are the following:
Our protocol spreads the rumor to all other nodes in the asymptotically optimal time of $(1 + o(1)) \log_2 n$. The whole process can be implemented in a way such that only $O(n f(n))$ calls are made, where $f(n) = ω(1)$ can be arbitrary.
In contrast to other protocols suggested in the literature, our algorithm only uses push operations, i.e., only informed nodes take active actions in the network. To the best of our knowledge, this is the first randomized push algorithm that achieves an asymptotically optimal running time.
Submitted 17 November, 2010; v1 submitted 8 November, 2010;
originally announced November 2010.
-
Optimizing Monotone Functions Can Be Difficult
Authors:
Benjamin Doerr,
Thomas Jansen,
Dirk Sudholt,
Carola Winzen,
Christine Zarges
Abstract:
Extending previous analyses on function classes like linear functions, we analyze how the simple (1+1) evolutionary algorithm optimizes pseudo-Boolean functions that are strictly monotone. Contrary to what one would expect, not all of these functions are easy to optimize. The choice of the constant $c$ in the mutation probability $p(n) = c/n$ can make a decisive difference.
We show that if $c < 1$, then the (1+1) evolutionary algorithm finds the optimum of every such function in $Θ(n \log n)$ iterations. For $c=1$, we can still prove an upper bound of $O(n^{3/2})$. However, for $c > 33$, we present a strictly monotone function such that the (1+1) evolutionary algorithm with overwhelming probability does not find the optimum within $2^{Ω(n)}$ iterations. This is the first time that we observe that a constant factor change of the mutation probability changes the run-time by more than constant factors.
Submitted 14 October, 2010; v1 submitted 7 October, 2010;
originally announced October 2010.
-
Quasi-Random Rumor Spreading: Reducing Randomness Can Be Costly
Authors:
Benjamin Doerr,
Mahmoud Fouz
Abstract:
We give a time-randomness tradeoff for the quasi-random rumor spreading protocol proposed by Doerr, Friedrich and Sauerwald [SODA 2008] on complete graphs. In this protocol, the goal is to spread a piece of information originating from one vertex throughout the network. Each vertex is assumed to have a (cyclic) list of its neighbors. Once a vertex is informed by one of its neighbors, it chooses a position in its list uniformly at random and then informs its neighbors starting from that position and proceeding in order of the list. Angelopoulos, Doerr, Huber and Panagiotou [Electron. J. Combin. 2009] showed that after $(1+o(1))(\log_2 n + \ln n)$ rounds, the rumor will have been broadcast to all nodes with probability $1 - o(1)$.
We study the broadcast time when the amount of randomness available at each node is reduced in a natural way. In particular, we prove that if each node can only make its initial random selection from every $\ell$-th node on its list, then there exist lists such that $(1-\varepsilon) (\log_2 n + \ln n - \log_2 \ell - \ln \ell)+\ell-1$ steps are needed to inform every vertex with probability at least $1-O\bigl(\exp\bigl(-\frac{n^\varepsilon}{2\ln n}\bigr)\bigr)$. This shows that a further reduction of the amount of randomness used in a simple quasi-random protocol comes at a loss of efficiency.
Submitted 3 August, 2010;
originally announced August 2010.
-
Randomized Rounding for Routing and Covering Problems: Experiments and Improvements
Authors:
Benjamin Doerr,
Marvin Künnemann,
Magnus Wahlström
Abstract:
Following previous theoretical work by Srinivasan (FOCS 2001) and the first author (STACS 2006) and a first experimental evaluation on random instances (ALENEX 2009), we investigate how the recently developed different approaches to generate randomized roundings satisfying disjoint cardinality constraints behave when used in two classical algorithmic problems, namely low-congestion routing in networks and max-coverage problems in hypergraphs.
We generally find that all randomized rounding algorithms work well, much better than what is guaranteed by existing theoretical work. The derandomized versions again produce significantly better rounding errors, with running times still negligible compared to the one for solving the corresponding LP. It thus seems worth preferring them over the randomized variants.
The data created in these experiments lets us propose and investigate the following new ideas. For the low-congestion routing problems, we suggest solving a second LP, which yields the same congestion, but aims at producing a solution that is easier to round. Experiments show that this reduces the rounding errors considerably, both in combination with randomized and derandomized rounding.
For the max-coverage instances, we generally observe that the greedy heuristic also performs very well. We develop a strengthened method of derandomized rounding, and a simple greedy/rounding hybrid approach using greedy and LP-based rounding elements, and observe that both these improvements again yield better solutions than both earlier approaches on their own.
For unit disk max-domination, we also develop a PTAS. Contrary to all other algorithms investigated, it does not perform much better in experiments than in theory; thus, unless extremely good solutions are to be obtained with huge computational resources, greedy, LP-based rounding or hybrid approaches are preferable.
Submitted 2 July, 2010;
originally announced July 2010.
-
Deterministic Random Walks on Regular Trees
Authors:
Joshua Cooper,
Benjamin Doerr,
Tobias Friedrich,
Joel Spencer
Abstract:
Jim Propp's rotor router model is a deterministic analogue of a random walk on a graph. Instead of distributing chips randomly, each vertex serves its neighbors in a fixed order.
Cooper and Spencer (Comb. Probab. Comput. (2006)) show a remarkable similarity of both models. If an (almost) arbitrary population of chips is placed on the vertices of a grid $\mathbb{Z}^d$ and does a simultaneous walk in the Propp model, then at all times and on each vertex, the number of chips on this vertex deviates from the expected number the random walk would have gotten there by at most a constant. This constant is independent of the starting configuration and the order in which each vertex serves its neighbors.
This result raises the question of whether all graphs have this property. With quite some effort, we are now able to answer this question negatively. For the graph being an infinite $k$-ary tree ($k \ge 3$), we show that for any deviation $D$ there is an initial configuration of chips such that after running the Propp model for a certain time there is a vertex with at least $D$ more chips than expected in the random walk model. However, to achieve a deviation of $D$ it is necessary that at least $\exp(Ω(D^2))$ vertices contribute by being occupied by a number of chips not divisible by $k$ at a certain time.
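To make the comparison between the Propp model and the random walk concrete, here is a minimal Python sketch on the one-dimensional grid $\mathbb{Z}$, the simplest case of the grid setting mentioned above (it does not cover the $k$-ary trees studied in the paper). The function name `propp_vs_random_walk`, the initial rotor direction (every rotor first points to the right), and the dictionary-based representation are assumptions made for illustration.

```python
from collections import defaultdict


def propp_vs_random_walk(initial_chips, steps):
    """Simulate the rotor-router (Propp) model on the integer line next to
    the expected chip counts of a simple random walk.

    initial_chips maps positions to integer chip counts. Returns the final
    chip configuration and the largest single-vertex deviation observed.
    """
    chips = dict(initial_chips)
    expected = {x: float(c) for x, c in initial_chips.items()}
    rotor = defaultdict(lambda: 1)  # direction (+1 or -1) of the next chip
    max_deviation = 0.0
    for _ in range(steps):
        new_chips = defaultdict(int)
        for x, k in chips.items():
            first = rotor[x]
            # Chips leave alternately, starting in the rotor direction.
            new_chips[x + first] += (k + 1) // 2
            new_chips[x - first] += k // 2
            if k % 2 == 1:
                rotor[x] = -first  # an odd number of chips flips the rotor
        new_expected = defaultdict(float)
        for x, mass in expected.items():
            # Random walk: half of the expected mass moves each way.
            new_expected[x - 1] += mass / 2
            new_expected[x + 1] += mass / 2
        chips, expected = dict(new_chips), dict(new_expected)
        for x in set(chips) | set(expected):
            deviation = abs(chips.get(x, 0) - expected.get(x, 0.0))
            max_deviation = max(max_deviation, deviation)
    return chips, max_deviation
```

For instance, `propp_vs_random_walk({0: 100}, 50)` places 100 chips on the origin and reports the largest deviation seen during 50 simultaneous steps.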
Submitted 7 June, 2010;
originally announced June 2010.
-
Strong Robustness of Randomized Rumor Spreading Protocols
Authors:
Benjamin Doerr,
Anna Huber,
Ariel Levavi
Abstract:
Randomized rumor spreading is a classical protocol to disseminate information across a network. At SODA 2008, a quasirandom version of this protocol was proposed and competitive bounds for its run-time were proven. This prompts the question: to what extent does the quasirandom protocol inherit the second principal advantage of randomized rumor spreading, namely robustness against transmission failures?
In this paper, we present a result precise up to $(1 \pm o(1))$ factors. We limit ourselves to the network in which every two vertices are connected by a direct link. Run-times accurate to their leading constants are unknown for all other non-trivial networks.
We show that if each transmission reaches its destination with a probability of $p \in (0,1]$, after $(1+\varepsilon)(\frac{1}{\log_2(1+p)}\log_2 n+\frac{1}{p}\ln n)$ rounds the quasirandom protocol has informed all $n$ nodes in the network with probability at least $1-n^{-p\varepsilon/40}$. Note that this is faster than the intuitively natural $1/p$ factor increase over the run-time of approximately $\log_2 n + \ln n$ for the non-corrupted case.
We also provide a corresponding lower bound for the classical model. This demonstrates that the quasirandom model is at least as robust as the fully random model despite the greatly reduced degree of independent randomness.
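The remark that the stated bound beats a plain $1/p$ slowdown can be checked numerically. The short Python sketch below evaluates both expressions; the function names `robust_bound` and `naive_bound` are hypothetical, and "naive" here simply means scaling $(1+\varepsilon)(\log_2 n + \ln n)$ by $1/p$. The comparison rests on the fact that $\log_2(1+p) \ge p$ for $p \in (0,1]$, with equality only at $p = 1$.

```python
import math


def robust_bound(n, p, eps):
    """Round bound from the abstract:
    (1 + eps) * (log2(n) / log2(1 + p) + ln(n) / p)."""
    return (1 + eps) * (math.log2(n) / math.log2(1 + p) + math.log(n) / p)


def naive_bound(n, p, eps):
    """Failure-free run-time (1 + eps) * (log2(n) + ln(n)) scaled by 1/p."""
    return (1 + eps) * (math.log2(n) + math.log(n)) / p
```

Only the $\log_2 n$ term benefits: the $\frac{1}{p}\ln n$ term is identical in both expressions, so the gap comes entirely from replacing $1/p$ by $1/\log_2(1+p)$.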
Submitted 3 October, 2012; v1 submitted 18 January, 2010;
originally announced January 2010.
-
Deterministic Random Walks on the Two-Dimensional Grid
Authors:
Benjamin Doerr,
Tobias Friedrich
Abstract:
Jim Propp's rotor router model is a deterministic analogue of a random walk on a graph. Instead of distributing chips randomly, each vertex serves its neighbors in a fixed order. We analyze the difference between the Propp machine and the random walk on the infinite two-dimensional grid. It is known that, apart from a technicality, independent of the starting configuration, at each time, the number of chips on each vertex in the Propp model deviates from the expected number of chips in the random walk model by at most a constant. We show that this constant is approximately 7.8 if all vertices serve their neighbors in clockwise or counterclockwise order, and 7.3 otherwise. This result in particular shows that the order in which the neighbors are served makes a difference. Our analysis also reveals a number of further unexpected properties of the two-dimensional Propp machine.
Submitted 15 March, 2007;
originally announced March 2007.