Analyzing Query Optimizer Performance in the Presence and Absence of Cardinality Estimates

A Datta, B Tsan, Y Izenov, F Rusu - arXiv preprint arXiv:2311.17293, 2023 - arxiv.org
Most query optimizers rely on cardinality estimates to determine optimal execution plans. While traditional databases such as PostgreSQL, Oracle, and Db2 utilize many types of synopses -- including histograms, samples, and sketches -- recent main-memory databases like DuckDB and Heavy.AI often operate with minimal or no estimates, yet their performance does not necessarily suffer. To the best of our knowledge, no analytical comparison has been conducted between optimizers with and without cardinality estimates to understand their performance characteristics in different settings, such as indexed, non-indexed, and multi-threaded. In this paper, we present a comparative analysis between optimizers that use cardinality estimates and those that do not. We use the Join Order Benchmark (JOB) for our evaluation and true cardinalities as the baseline. Our investigation reveals that cardinality estimates have only a marginal impact in non-indexed settings. Meanwhile, when indexes are available, inaccurate estimates may lead to sub-optimal physical operators -- even with an optimal join order. Furthermore, the impact of cardinality estimates is less significant in highly parallel main-memory databases.
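To make the indexed-setting claim concrete, the following is a minimal sketch of how an underestimated cardinality could push an optimizer toward the wrong physical join operator even when the join order is fixed. It is not taken from the paper: the cost formulas, the function names (hash_join_cost, index_nl_join_cost, pick_operator), and the row counts are all hypothetical and chosen only for illustration.

```python
import math

# Toy cost model (an assumption for illustration, not the paper's model).

def hash_join_cost(outer_rows, inner_rows):
    # Build a hash table on the inner input and probe it with the outer
    # input: each input is scanned once.
    return outer_rows + inner_rows

def index_nl_join_cost(outer_rows, inner_rows):
    # One index lookup per outer row, each costing about log2(inner_rows).
    return outer_rows * math.log2(max(inner_rows, 2))

COST_FUNCTIONS = {
    "hash_join": hash_join_cost,
    "index_nested_loop": index_nl_join_cost,
}

def pick_operator(outer_rows, inner_rows):
    # Choose the operator with the lowest modeled cost.
    return min(COST_FUNCTIONS,
               key=lambda name: COST_FUNCTIONS[name](outer_rows, inner_rows))

# Hypothetical join: the optimizer badly underestimates the outer input.
inner_rows = 1_000_000
true_outer_rows = 500_000
estimated_outer_rows = 100

chosen = pick_operator(estimated_outer_rows, inner_rows)  # what the optimizer picks
best = pick_operator(true_outer_rows, inner_rows)         # what true cardinalities favor

# Modeled slowdown of the chosen operator when run on the true input size.
slowdown = (COST_FUNCTIONS[chosen](true_outer_rows, inner_rows)
            / COST_FUNCTIONS[best](true_outer_rows, inner_rows))

print(f"chosen={chosen}, best={best}, modeled slowdown={slowdown:.1f}x")
```

Under this toy model the join order never changes; only the operator choice does, which is the kind of effect the abstract attributes to inaccurate estimates when indexes are available. Without an index, the index nested-loop option would not exist and the estimate would have no operator to mislead.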