Showing 1–7 of 7 results for author: Gruenheid, A

Searching in archive cs.
  1. Towards Building Autonomous Data Services on Azure

    Authors: Yiwen Zhu, Yuanyuan Tian, Joyce Cahoon, Subru Krishnan, Ankita Agarwal, Rana Alotaibi, Jesús Camacho-Rodríguez, Bibin Chundatt, Andrew Chung, Niharika Dutta, Andrew Fogarty, Anja Gruenheid, Brandon Haynes, Matteo Interlandi, Minu Iyer, Nick Jurgens, Sumeet Khushalani, Brian Kroth, Manoj Kumar, Jyoti Leeka, Sergiy Matusevych, Minni Mittal, Andreas Mueller, Kartheek Muthyala, Harsha Nagulapalli, et al. (13 additional authors not shown)

    Abstract: Modern cloud has turned data services into easily accessible commodities. With just a few clicks, users are now able to access a catalog of data processing systems for a wide range of tasks. However, the cloud brings in both complexity and opportunity. While cloud users can quickly start an application by using various data services, it can be difficult to configure and optimize these services to…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: SIGMOD Companion of the 2023 International Conference on Management of Data. 2023

  2. LST-Bench: Benchmarking Log-Structured Tables in the Cloud

    Authors: Jesús Camacho-Rodríguez, Ashvin Agrawal, Anja Gruenheid, Ashit Gosalia, Cristian Petculescu, Josep Aguilar-Saborit, Avrilia Floratou, Carlo Curino, Raghu Ramakrishnan

    Abstract: Data processing engines increasingly leverage distributed file systems for scalable, cost-effective storage. While the Apache Parquet columnar format has become a popular choice for data storage and retrieval, the immutability of Parquet files renders it impractical to meet the demands of frequent updates in contemporary analytical workloads. Log-Structured Tables (LSTs), such as Delta Lake, Apach…

    Submitted 19 January, 2024; v1 submitted 1 May, 2023; originally announced May 2023.

    Journal ref: Proceedings of the ACM on Management of Data (2024) Volume 2 Issue 1

  3. arXiv:2011.05549  [pdf, other]

    cs.DB

    Comprehensive and Efficient Workload Compression

    Authors: Shaleen Deep, Anja Gruenheid, Paraschos Koutris, Jeffrey Naughton, Stratis Viglas

    Abstract: This work studies the problem of constructing a representative workload from a given input analytical query workload where the former serves as an approximation with guarantees of the latter. We discuss our work in the context of workload analysis and monitoring. As an example, evolving system usage patterns in a database system can cause load imbalance and performance regressions which can be con…

    Submitted 3 February, 2021; v1 submitted 11 November, 2020; originally announced November 2020.

  4. Investigating Rumor News Using Agreement-Aware Search

    Authors: Jingbo Shang, Tianhang Sun, Jiaming Shen, Xingbang Liu, Anja Gruenheid, Flip Korn, Adam Lelkes, Cong Yu, Jiawei Han

    Abstract: Recent years have witnessed a widespread increase of rumor news generated by humans and machines. Therefore, tools for investigating rumor news have become an urgent necessity. One useful function of such tools is to see ways a specific topic or event is represented by presenting different points of view from multiple sources. In this paper, we propose Maester, a novel agreement-aware search fra…

    Submitted 16 September, 2018; v1 submitted 20 February, 2018; originally announced February 2018.

  5. arXiv:1610.07732  [pdf, other]

    cs.DB

    Online Event Integration with StoryPivot

    Authors: Anja Gruenheid, Donald Kossmann, Divesh Srivastava

    Abstract: Modern data integration systems need to process large amounts of data from a variety of data sources and with real-time integration constraints. They are not only employed in enterprises for managing internal data but are also used for a variety of web services that use techniques such as entity resolution or data cleaning in live systems. In this work, we discuss a new generation of data integrat…

    Submitted 25 October, 2016; originally announced October 2016.

  6. arXiv:1512.00537  [pdf, other]

    cs.DB

    Fault-Tolerant Entity Resolution with the Crowd

    Authors: Anja Gruenheid, Besmira Nushi, Tim Kraska, Wolfgang Gatterbauer, Donald Kossmann

    Abstract: In recent years, crowdsourcing has been increasingly applied as a means to enhance data quality. Although the crowd generates insightful information especially for complex problems such as entity resolution (ER), the output quality of crowd workers is often noisy. That is, workers may unintentionally generate false or contradicting data even for simple tasks. The challenge that we address in this paper…

    Submitted 1 December, 2015; originally announced December 2015.

  7. arXiv:1508.01951  [pdf, other]

    cs.LG cs.DB

    Crowd Access Path Optimization: Diversity Matters

    Authors: Besmira Nushi, Adish Singla, Anja Gruenheid, Erfan Zamanian, Andreas Krause, Donald Kossmann

    Abstract: Quality assurance is one of the most important challenges in crowdsourcing. Assigning tasks to several workers to increase quality through redundant answers can be expensive if asking homogeneous sources. This limitation has been overlooked by current crowdsourcing platforms resulting therefore in costly solutions. In order to achieve desirable cost-quality tradeoffs it is essential to apply efficien…

    Submitted 11 August, 2015; v1 submitted 8 August, 2015; originally announced August 2015.

    Comments: 10 pages, 3rd AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2015)

    ACM Class: H.1.2; I.2.6; H.2.5