Showing 1–15 of 15 results for author: Riedewald, M

Searching in archive cs.
  1. arXiv:2406.11797  [pdf, other]

    cs.DB

    Finding Linear Explanations for a Given Ranking

    Authors: Zixuan Chen, Panagiotis Manolios, Mirek Riedewald

    Abstract: Given a relation and a ranking of its tuples, but no information about the ranking function, we propose RankExplain to solve 2 types of problems: SAT asks if any linear scoring function can exactly reproduce the given ranking. OPT identifies the linear scoring function that minimizes position-based error, i.e., the total of the ranking-position differences over all tuples in the top-k. Our solutio…

    Submitted 17 June, 2024; originally announced June 2024.

  2. Efficient Computation of Quantiles over Joins

    Authors: Nikolaos Tziavelis, Nofar Carmeli, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald

    Abstract: We present efficient algorithms for Quantile Join Queries, abbreviated as %JQ. A %JQ asks for the answer at a specified relative position (e.g., 50% for the median) under some ordering over the answers to a Join Query (JQ). Our goal is to avoid materializing the set of all join answers, and to achieve quasilinear time in the size of the database, regardless of the total number of answers. A recent…

    Submitted 25 May, 2023; originally announced May 2023.

  3. arXiv:2209.13589  [pdf, other]

    cs.DB

    SANTOS: Relationship-based Semantic Table Union Search

    Authors: Aamod Khatiwada, Grace Fan, Roee Shraga, Zixuan Chen, Wolfgang Gatterbauer, Renée J. Miller, Mirek Riedewald

    Abstract: Existing techniques for unionable table search define unionability using metadata (tables must have the same or similar schemas) or column-based metrics (for example, the values in a table should be drawn from the same domain). In this work, we introduce the use of semantic relationships between pairs of columns in a table to improve the accuracy of union search. Consequently, we introduce a new n…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: 15 pages, 10 figures, to appear at SIGMOD 2023

  4. arXiv:2208.01613  [pdf, other]

    cs.DB cs.HC

    Principles of Query Visualization

    Authors: Wolfgang Gatterbauer, Cody Dunne, H. V. Jagadish, Mirek Riedewald

    Abstract: Query Visualization (QV) is the problem of transforming a given query into a graphical representation that helps humans understand its meaning. This task is notably different from designing a Visual Query Language (VQL) that helps a user compose a query. This article discusses the principles of relational query visualization and its potential for simplifying user interactions with relational data.

    Submitted 2 August, 2022; originally announced August 2022.

    Comments: 20 pages, 12 figures, preprint for IEEE Data Engineering Bulletin

  5. arXiv:2205.05649  [pdf, other]

    cs.DB cs.DS cs.LO

    Any-k Algorithms for Enumerating Ranked Answers to Conjunctive Queries

    Authors: Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

    Abstract: We study ranked enumeration for Conjunctive Queries (CQs) where the answers are ordered by a given ranking function (e.g., an ORDER BY clause in SQL). We develop "any-k" algorithms, which, without knowing the number k of desired answers, push down the ranking into joins by carefully ordering the computation of intermediate tuples and avoiding materialization of join answers until they are needed.…

    Submitted 12 October, 2023; v1 submitted 11 May, 2022; originally announced May 2022.

  6. arXiv:2203.07284  [pdf, other]

    cs.DB cs.LO cs.PL

    Relational Diagrams: a pattern-preserving diagrammatic representation of non-disjunctive Relational Queries

    Authors: Wolfgang Gatterbauer, Cody Dunne, Mirek Riedewald

    Abstract: Analyzing relational languages by their logical expressiveness is well understood. Something not well understood or even formalized is the vague concept of relational query patterns. What are query patterns? And how can we reason about query patterns across different relational languages, irrespective of their syntax and their procedural or declarative nature? In this paper, we formalize the conce…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: 23 pages, 29 figures

  7. arXiv:2103.09940  [pdf, other]

    cs.DB

    DomainNet: Homograph Detection for Data Lake Disambiguation

    Authors: Aristotelis Leventidis, Laura Di Rocco, Wolfgang Gatterbauer, Renée J. Miller, Mirek Riedewald

    Abstract: Modern data lakes are deeply heterogeneous in the vocabulary that is used to describe data. We study a problem of disambiguation in data lakes: how can we determine if a data value occurring more than once in the lake has different meanings and is therefore a homograph? While word and entity disambiguation have been well studied in computational linguistics, data management and data science, we sh…

    Submitted 22 March, 2021; v1 submitted 17 March, 2021; originally announced March 2021.

    Comments: Full version of paper appearing in EDBT 2021

  8. Beyond Equi-joins: Ranking, Enumeration and Factorization

    Authors: Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

    Abstract: We study theta-joins in general and join predicates with conjunctions and disjunctions of inequalities in particular, focusing on ranked enumeration where the answers are returned incrementally in an order dictated by a given ranking function. Our approach achieves strong time and space complexity properties: with $n$ denoting the number of tuples in the database, we guarantee for acyclic full joi…

    Submitted 30 August, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

    Comments: 21 pages

    Journal ref: PVLDB, 14(11):2599-2612, 2021

  9. arXiv:2012.11965  [pdf, other]

    cs.DB cs.DS

    Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries

    Authors: Nofar Carmeli, Nikolaos Tziavelis, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald

    Abstract: We study the question of when we can provide direct access to the k-th answer to a Conjunctive Query (CQ) according to a specified order over the answers in time logarithmic in the size of the database, following a preprocessing step that constructs a data structure in time quasilinear in database size. Specifically, we embark on the challenge of identifying the tractable answer orderings, that is…

    Submitted 28 November, 2022; v1 submitted 22 December, 2020; originally announced December 2020.

    Comments: 44 pages

  10. Optimal Join Algorithms Meet Top-k

    Authors: Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

    Abstract: Top-k queries have been studied intensively in the database community and they are an important means to reduce query cost when only the "best" or "most interesting" results are needed instead of the full output. While some optimality results exist, e.g., the famous Threshold Algorithm, they hold only in a fairly limited model of computation that does not account for the cost incurred by large int…

    Submitted 1 May, 2020; originally announced May 2020.

    Comments: To be published in Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD'20), June 14-19, 2020, Portland, OR, USA, 7 pages

  11. arXiv:2004.11375  [pdf]

    cs.DB cs.HC cs.LO

    QueryVis: Logic-based diagrams help users understand complicated SQL queries faster

    Authors: Aristotelis Leventidis, Jiahui Zhang, Cody Dunne, Wolfgang Gatterbauer, H. V. Jagadish, Mirek Riedewald

    Abstract: Understanding the meaning of existing SQL queries is critical for code maintenance and reuse. Yet SQL can be hard to read, even for expert users or the original creator of a query. We conjecture that it is possible to capture the logical intent of queries in automatically-generated visual diagrams that can help users understand the meaning of queries faster and more accurately than SQL text…

    Submitted 23 April, 2020; originally announced April 2020.

    Comments: Full version of paper appearing in SIGMOD 2020

  12. Near-Optimal Distributed Band-Joins through Recursive Partitioning

    Authors: Rundong Li, Wolfgang Gatterbauer, Mirek Riedewald

    Abstract: We consider running-time optimization for band-joins in a distributed system, e.g., the cloud. To balance load across worker machines, input has to be partitioned, which causes duplication. We explore how to resolve this tension between maximum load per worker and input duplication for band-joins between two relations. Previous work suffered from high optimization cost or considered partitionings…

    Submitted 13 April, 2020; originally announced April 2020.

  13. arXiv:1911.05582  [pdf, other]

    cs.DB cs.DS

    Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries

    Authors: Nikolaos Tziavelis, Deepak Ajwani, Wolfgang Gatterbauer, Mirek Riedewald, Xiaofeng Yang

    Abstract: We study ranked enumeration of join-query results according to very general orders defined by selective dioids. Our main contribution is a framework for ranked enumeration over a class of dynamic programming problems that generalizes seemingly different problems that had been studied in isolation. To this end, we extend classic algorithms that find the k-shortest paths in a weighted graph. For ful…

    Submitted 11 September, 2020; v1 submitted 13 November, 2019; originally announced November 2019.

    Comments: 50 pages, 19 figures

  14. arXiv:1802.06060  [pdf, other]

    cs.SI cs.DB cs.DS

    Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs

    Authors: Xiaofeng Yang, Deepak Ajwani, Wolfgang Gatterbauer, Patrick K. Nicholson, Mirek Riedewald, Alessandra Sala

    Abstract: Many problems in areas as diverse as recommendation systems, social network analysis, semantic search, and distributed root cause analysis can be modeled as pattern search on labeled graphs (also called "heterogeneous information networks" or HINs). Given a large graph and a query pattern with node and edge label constraints, a fundamental challenge is to find the top-k matches according to a rank…

    Submitted 10 April, 2018; v1 submitted 16 February, 2018; originally announced February 2018.

    Comments: To appear in WWW 2018

  15. arXiv:1404.5665  [pdf, other]

    cs.LO

    ILP Modulo Data

    Authors: Panagiotis Manolios, Vasilis Papavasileiou, Mirek Riedewald

    Abstract: The vast quantity of data generated and captured every day has led to a pressing need for tools and processes to organize, analyze and interrelate this data. Automated reasoning and optimization tools with inherent support for data could enable advancements in a variety of contexts, from data-backed decision making to data-intensive scientific research. To this end, we introduce a decidable logic…

    Submitted 15 September, 2014; v1 submitted 22 April, 2014; originally announced April 2014.

    Comments: FMCAD 2014 final version plus proofs