Showing 1–33 of 33 results for author: Gehrke, J

Searching in archive cs.

  1. arXiv:2303.12712  [pdf, other]

    cs.CL cs.AI

    Sparks of Artificial General Intelligence: Early experiments with GPT-4

    Authors: Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang

    Abstract: Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an earl…

    Submitted 13 April, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

  2. Meeting Effectiveness and Inclusiveness in Remote Collaboration

    Authors: Ross Cutler, Yasaman Hosseinkashi, Jamie Pool, Senja Filipi, Robert Aichner, Yuan Tu, Johannes Gehrke

    Abstract: A primary goal of remote collaboration tools is to provide effective and inclusive meetings for all participants. To study meeting effectiveness and meeting inclusiveness, we first conducted a large-scale email survey (N=4,425; after filtering N=3,290) at a large technology company (pre-COVID-19); using this data we derived a multivariate model of meeting effectiveness and show how it correlates w…

    Submitted 19 February, 2021; originally announced February 2021.

  3. arXiv:2011.12715  [pdf, other]

    cs.AI cs.LG cs.NI cs.SE

    Resonance: Replacing Software Constants with Context-Aware Models in Real-time Communication

    Authors: Jayant Gupchup, Ashkan Aazami, Yaran Fan, Senja Filipi, Tom Finley, Scott Inglis, Marcus Asteborg, Luke Caroll, Rajan Chari, Markus Cozowicz, Vishak Gopal, Vinod Prakash, Sasikanth Bendapudi, Jack Gerrits, Eric Lau, Huazhou Liu, Marco Rossi, Dima Slobodianyk, Dmitri Birjukov, Matty Cooper, Nilesh Javar, Dmitriy Perednya, Sriram Srinivasan, John Langford, Ross Cutler, et al. (1 additional author not shown)

    Abstract: Large software systems tune hundreds of 'constants' to optimize their runtime performance. These values are commonly derived through intuition, lab tests, or A/B tests. A 'one-size-fits-all' approach is often sub-optimal as the best value depends on runtime context. In this paper, we provide an experimental approach to replace constants with learned contextual functions for Skype - a widely used r…

    Submitted 22 November, 2020; originally announced November 2020.

    Comments: Workshop on ML for Systems at NeurIPS 2020, Accepted

    Journal ref: ML for Systems, NeurIPS 2020

  4. arXiv:2007.06835  [pdf, other]

    cs.LG cs.AI cs.PL cs.SE stat.ML

    Programming by Rewards

    Authors: Nagarajan Natarajan, Ajaykrishna Karthikeyan, Prateek Jain, Ivan Radicek, Sriram Rajamani, Sumit Gulwani, Johannes Gehrke

    Abstract: We formalize and study "programming by rewards" (PBR), a new approach for specifying and synthesizing subroutines for optimizing some quantitative metric such as performance, resource utilization, or correctness over a benchmark. A PBR specification consists of (1) input features $x$, and (2) a reward function $r$, modeled as a black-box component (which we can only run), that assigns a reward f…

    Submitted 14 July, 2020; originally announced July 2020.

  5. arXiv:2006.12793  [pdf, other]

    cs.CY cs.AI cs.SI

    Lumos: A Library for Diagnosing Metric Regressions in Web-Scale Applications

    Authors: Jamie Pool, Ebrahim Beyrami, Vishak Gopal, Ashkan Aazami, Jayant Gupchup, Jeff Rowland, Binlong Li, Pritesh Kanani, Ross Cutler, Johannes Gehrke

    Abstract: Web-scale applications can ship code on a daily to weekly cadence. These applications rely on online metrics to monitor the health of new releases. Regressions in metric values need to be detected and diagnosed as early as possible to reduce the disruption to users and product owners. Regressions in metrics can surface due to a variety of reasons: genuine product regressions, changes in user popul…

    Submitted 23 June, 2020; originally announced June 2020.

  6. arXiv:2005.13981  [pdf]

    eess.AS cs.LG cs.SD

    The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results

    Authors: Chandan K. A. Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke

    Abstract: The INTERSPEECH 2020 Deep Noise Suppression (DNS) Challenge is intended to promote collaborative research in real-time single-channel Speech Enhancement aimed to maximize the subjective (perceptual) quality of the enhanced speech. A typical approach to evaluate the noise suppression methods is to use objective metrics on the test set obtained by splitting the original dataset. While the performanc…

    Submitted 18 October, 2020; v1 submitted 16 May, 2020; originally announced May 2020.

    Comments: Interspeech 2020. arXiv admin note: substantial text overlap with arXiv:2001.08662

  7. arXiv:2004.10898  [pdf, other]

    cs.DB cs.DS cs.LG

    Qd-tree: Learning Data Layouts for Big Data Analytics

    Authors: Zongheng Yang, Badrish Chandramouli, Chi Wang, Johannes Gehrke, Yinan Li, Umar Farooq Minhas, Per-Åke Larson, Donald Kossmann, Rajeev Acharya

    Abstract: Corporations today collect data at an unprecedented and accelerating scale, making the need to run queries on large datasets increasingly important. Technologies such as columnar block-based data organization and compression have become standard practice in most commercial database systems. However, the problem of best assigning records to data blocks on storage is still open. For example, today's…

    Submitted 22 April, 2020; originally announced April 2020.

    Comments: ACM SIGMOD 2020

  8. arXiv:2003.04150  [pdf, other]

    cs.DC

    Lightweight Inter-transaction Caching with Precise Clocks and Dynamic Self-invalidation

    Authors: Pulkit A. Misra, Srihari Radhakrishnan, Jeffrey S. Chase, Johannes Gehrke, Alvin R. Lebeck

    Abstract: Distributed, transactional storage systems scale by sharding data across servers. However, workload-induced hotspots result in contention, leading to higher abort rates and performance degradation. We present KAIROS, a transactional key-value storage system that leverages client-side inter-transaction caching and sharded transaction validation to balance the dynamic load and alleviate workload-i…

    Submitted 9 March, 2020; originally announced March 2020.

  9. arXiv:2001.08662  [pdf]

    cs.SD cs.LG eess.AS

    The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Speech Quality and Testing Framework

    Authors: Chandan K. A. Reddy, Ebrahim Beyrami, Harishchandra Dubey, Vishak Gopal, Roger Cheng, Ross Cutler, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke

    Abstract: The INTERSPEECH 2020 Deep Noise Suppression Challenge is intended to promote collaborative research in real-time single-channel Speech Enhancement aimed to maximize the subjective (perceptual) quality of the enhanced speech. A typical approach to evaluate the noise suppression methods is to use objective metrics on the test set obtained by splitting the original dataset. Many publications report r…

    Submitted 19 April, 2020; v1 submitted 23 January, 2020; originally announced January 2020.

    Comments: Details about Deep Noise Suppression Challenge

  10. arXiv:1912.02222  [pdf, other]

    cs.NI cs.LG

    Reinforcement learning for bandwidth estimation and congestion control in real-time communications

    Authors: Joyce Fang, Martin Ellis, Bin Li, Siyao Liu, Yasaman Hosseinkashi, Michael Revow, Albert Sadovnikov, Ziyuan Liu, Peng Cheng, Sachin Ashok, David Zhao, Ross Cutler, Yan Lu, Johannes Gehrke

    Abstract: Bandwidth estimation and congestion control for real-time communications (i.e., audio and video conferencing) remains a difficult problem, despite many years of research. Achieving high quality of experience (QoE) for end users requires continual updates due to changing network architectures and technologies. In this paper, we apply reinforcement learning for the first time to the problem of real-…

    Submitted 4 December, 2019; originally announced December 2019.

    Comments: Workshop on ML for Systems at NeurIPS 2019

  11. arXiv:1912.00580  [pdf, other]

    cs.DB cs.OS

    Multi-version Indexing in Flash-based Key-Value Stores

    Authors: Pulkit A. Misra, Jeffrey S. Chase, Johannes Gehrke, Alvin R. Lebeck

    Abstract: Maintaining multiple versions of data is popular in key-value stores since it increases concurrency and improves performance. However, designing a multi-version key-value store entails several challenges, such as additional capacity for storing extra versions and an indexing mechanism for mapping versions of a key to their values. We present SkimpyFTL, an FTL-integrated multi-version key-value stor…

    Submitted 2 December, 2019; originally announced December 2019.

    Comments: 7 pages, 6 figures

  12. arXiv:1909.08050  [pdf]

    cs.SD cs.LG eess.AS

    A scalable noisy speech dataset and online subjective test framework

    Authors: Chandan K. A. Reddy, Ebrahim Beyrami, Jamie Pool, Ross Cutler, Sriram Srinivasan, Johannes Gehrke

    Abstract: Background noise is a major source of quality impairments in Voice over Internet Protocol (VoIP) and Public Switched Telephone Network (PSTN) calls. Recent work shows the efficacy of deep learning for noise suppression, but the datasets have been relatively small compared to those used in other domains (e.g., ImageNet) and the associated evaluations have been more focused. In order to better facil…

    Submitted 17 September, 2019; originally announced September 2019.

    Comments: InterSpeech 2019

  13. arXiv:1907.01742  [pdf]

    cs.SD cs.LG eess.AS

    Supervised Classifiers for Audio Impairments with Noisy Labels

    Authors: Chandan K A Reddy, Ross Cutler, Johannes Gehrke

    Abstract: Voice-over-Internet-Protocol (VoIP) calls are prone to various speech impairments due to environmental and network conditions resulting in bad user experience. A reliable audio impairment classifier helps to identify the cause for bad audio quality. The user feedback after the call can act as the ground truth labels for training a supervised classifier on a large audio dataset. However, the labels…

    Submitted 3 July, 2019; originally announced July 2019.

    Comments: To appear in INTERSPEECH 2019

  14. arXiv:1905.08898  [pdf, other]

    cs.DB cs.DS cs.LG

    ALEX: An Updatable Adaptive Learned Index

    Authors: Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet, Tim Kraska

    Abstract: Recent work on "learned indexes" has changed the way we look at the decades-old field of DBMS indexing. The key idea is that indexes can be thought of as "models" that predict the position of a key in a dataset. Indexes can, thus, be learned. The original work by Kraska et al. shows that a learned index beats a B+Tree by a factor of up to three in search time and by an order of magnitude in memory…

    Submitted 20 May, 2020; v1 submitted 21 May, 2019; originally announced May 2019.

    Report number: MSR-TR-2020-12

  15. arXiv:1905.06425  [pdf, other]

    cs.DB

    An Empirical Analysis of Deep Learning for Cardinality Estimation

    Authors: Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, S. Sathiya Keerthi

    Abstract: We implement and evaluate deep learning for cardinality estimation by studying the accuracy, space and time trade-offs across several architectures. We find that simple deep learning models can learn cardinality estimations across a variety of datasets (reducing the error by 72% - 98% on average compared to PostgreSQL). In addition, we empirically evaluate the impact of injecting cardinality estim…

    Submitted 11 September, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

  16. arXiv:1903.06908  [pdf, other]

    eess.AS cs.SD

    Non-intrusive speech quality assessment using neural networks

    Authors: Anderson R. Avila, Hannes Gamper, Chandan Reddy, Ross Cutler, Ivan Tashev, Johannes Gehrke

    Abstract: Estimating the perceived quality of an audio signal is critical for many multimedia and audio processing systems. Providers strive to offer optimal and reliable services in order to increase the user quality of experience (QoE). In this work, we present an investigation of the applicability of neural networks for non-intrusive audio quality assessment. We propose three neural network-based approac…

    Submitted 16 March, 2019; originally announced March 2019.

    Comments: Accepted at ICASSP 2019

  17. arXiv:1803.08604  [pdf, other]

    cs.DB cs.AI cs.LG

    Learning State Representations for Query Optimization with Deep Reinforcement Learning

    Authors: Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, S. Sathiya Keerthi

    Abstract: Deep reinforcement learning is quickly changing the field of artificial intelligence. These models are able to capture a high level understanding of their environment, enabling them to learn difficult dynamic tasks in a variety of domains. In the database field, query optimization remains a difficult problem. Our goal in this work is to explore the capabilities of deep reinforcement learning in th…

    Submitted 22 March, 2018; originally announced March 2018.

  18. arXiv:1803.04562  [pdf, other]

    cs.DB

    Bias in OLAP Queries: Detection, Explanation, and Removal

    Authors: Babak Salimi, Johannes Gehrke, Dan Suciu

    Abstract: Online analytical processing (OLAP) is an essential element of decision-support systems. OLAP tools provide insights and understanding needed for improved decision making. However, the answers to OLAP queries can be biased and lead to perplexing and incorrect insights. In this paper, we propose HypDB, a system to detect, explain, and resolve bias in decision-support queries. We give a simple d…

    Submitted 24 July, 2018; v1 submitted 12 March, 2018; originally announced March 2018.

    Comments: This paper is an extended version of a paper presented at SIGMOD 2018

  19. arXiv:1802.09180  [pdf, other]

    cs.DB cs.DC

    Cuttlefish: A Lightweight Primitive for Adaptive Query Processing

    Authors: Tomer Kaftan, Magdalena Balazinska, Alvin Cheung, Johannes Gehrke

    Abstract: Modern data processing applications execute increasingly sophisticated analysis that requires operations beyond traditional relational algebra. As a result, operators in query plans grow in diversity and complexity. Designing query optimizer rules and cost models to choose physical operators for all of these novel logical operators is impractical. To address this challenge, we develop Cuttlefish,…

    Submitted 26 February, 2018; originally announced February 2018.

  20. arXiv:1508.05347  [pdf, ps, other]

    cs.GT cs.DB

    Pricing Queries Approximately Optimally

    Authors: Vasilis Syrgkanis, Johannes Gehrke

    Abstract: Data as a commodity has always been purchased and sold. Recently, web services that are data marketplaces have emerged that match data buyers with data sellers. So far there are no guidelines on how to price queries against a database. We consider the recently proposed query-based pricing framework of Koutris et al. and ask the question of computing optimal input prices in this framework by formulatin…

    Submitted 25 August, 2015; v1 submitted 21 August, 2015; originally announced August 2015.

  21. arXiv:1412.7641  [pdf, ps, other]

    cs.CR

    Balancing Isolation and Sharing of Data for Third-Party Extensible App Ecosystems

    Authors: Florian Schröder, Raphael M. Reischuk, Johannes Gehrke

    Abstract: In the landscape of application ecosystems, today's cloud users wish to personalize not only their browsers with various extensions or their smartphones with various applications, but also the various extensions and applications themselves. The resulting personalization significantly raises the attractiveness for typical Web 2.0 users, but gives rise to various security risks and privacy concerns,…

    Submitted 10 April, 2015; v1 submitted 24 December, 2014; originally announced December 2014.

  22. arXiv:1407.4729  [pdf, other]

    stat.ME cs.LG stat.ML

    Sparse Partially Linear Additive Models

    Authors: Yin Lou, Jacob Bien, Rich Caruana, Johannes Gehrke

    Abstract: The generalized partially linear additive model (GPLAM) is a flexible and interpretable approach to building predictive models. It combines features in an additive manner, allowing each to have either a linear or nonlinear effect on the response. However, the choice of which features to treat as linear or nonlinear is typically assumed known. Thus, to make a GPLAM a viable approach in situations i…

    Submitted 27 March, 2018; v1 submitted 17 July, 2014; originally announced July 2014.

    Comments: Corrected typos

  23. arXiv:1403.2307  [pdf, other]

    cs.DB

    The Homeostasis Protocol: Avoiding Transaction Coordination Through Program Analysis

    Authors: Sudip Roy, Lucja Kot, Gabriel Bender, Bailu Ding, Hossein Hojjat, Christoph Koch, Nate Foster, Johannes Gehrke

    Abstract: Datastores today rely on distribution and replication to achieve improved performance and fault-tolerance. But correctness of many applications depends on strong consistency properties - something that can impose substantial overheads, since it requires coordinating the behavior of multiple nodes. This paper describes a new approach to achieving strong consistency in distributed systems while mini…

    Submitted 19 January, 2015; v1 submitted 10 March, 2014; originally announced March 2014.

  24. arXiv:1311.2276  [pdf, ps, other]

    cs.LG

    A Quantitative Evaluation Framework for Missing Value Imputation Algorithms

    Authors: Vinod Nair, Rahul Kidambi, Sundararajan Sellamanickam, S. Sathiya Keerthi, Johannes Gehrke, Vijay Narayanan

    Abstract: We consider the problem of quantitatively evaluating missing value imputation algorithms. Given a dataset with missing values and a choice of several imputation algorithms to fill them in, there is currently no principled way to rank the algorithms using a quantitative metric. We develop a framework based on treating imputation evaluation as a problem of comparing two distributions and show how it…

    Submitted 10 November, 2013; originally announced November 2013.

    Comments: 9 pages

  25. arXiv:1208.0080  [pdf, other]

    cs.DB

    The Complexity of Social Coordination

    Authors: Konstantinos Mamouras, Sigal Oren, Lior Seeman, Lucja Kot, Johannes Gehrke

    Abstract: Coordination is a challenging everyday task; just think of the last time you organized a party or a meeting involving several people. As a growing part of our social and professional life goes online, an opportunity for an improved coordination process arises. Recently, Gupta et al. proposed entangled queries as a declarative abstraction for data-driven coordination, where the difficulty of the co…

    Submitted 31 July, 2012; originally announced August 2012.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 11, pp. 1172-1183 (2012)

  26. arXiv:1109.5111  [pdf, other]

    cs.DC

    Nerio: Leader Election and Edict Ordering

    Authors: Robbert van Renesse, Fred B. Schneider, Johannes Gehrke

    Abstract: Coordination in a distributed system is facilitated if there is a unique process, the leader, to manage the other processes. The leader creates edicts and sends them to other processes for execution or forwarding to other processes. The leader may fail, and when this occurs a leader election protocol selects a replacement. This paper describes Nerio, a class of such leader election protocols.

    Submitted 26 September, 2011; v1 submitted 23 September, 2011; originally announced September 2011.

  27. arXiv:1005.3773  [pdf, other]

    cs.DB cs.DC

    Behavioral Simulations in MapReduce

    Authors: Guozhang Wang, Marcos Vaz Salles, Benjamin Sowell, Xun Wang, Tuan Cao, Alan Demers, Johannes Gehrke, Walker White

    Abstract: In many scientific domains, researchers are turning to large-scale behavioral simulations to better understand important real-world phenomena. While there has been a great deal of work on simulation tools from the high-performance computing community, behavioral simulations remain challenging to program and automatically scale in parallel environments. In this paper we present BRACE (Big Red Agent…

    Submitted 20 May, 2010; originally announced May 2010.

  28. arXiv:0909.5530  [pdf, ps, other]

    cs.DB

    Differential Privacy via Wavelet Transforms

    Authors: Xiaokui Xiao, Guozhang Wang, Johannes Gehrke

    Abstract: Privacy preserving data publishing has attracted considerable research interest in recent years. Among the existing solutions, $ε$-differential privacy provides one of the strongest privacy guarantees. Existing data publishing methods that achieve $ε$-differential privacy, however, offer little data utility. In particular, if the output dataset is used to answer count queries, the noise in…

    Submitted 30 September, 2009; originally announced September 2009.

  29. arXiv:0909.1770  [pdf]

    cs.DB cs.MA

    From Declarative Languages to Declarative Processing in Computer Games

    Authors: Benjamin Sowell, Alan Demers, Johannes Gehrke, Nitin Gupta, Haoyuan Li, Walker White

    Abstract: Recent work has shown that we can dramatically improve the performance of computer games and simulations through declarative processing: Character AI can be written in an imperative scripting language which is then compiled to relational algebra and executed by a special games engine with features similar to a main memory database system. In this paper we lay out a challenging research agenda bu…

    Submitted 9 September, 2009; originally announced September 2009.

    Comments: CIDR 2009

  30. arXiv:0904.0682  [pdf, ps, other]

    cs.DB cs.IR

    Privacy in Search Logs

    Authors: Michaela Goetz, Ashwin Machanavajjhala, Guozhang Wang, Xiaokui Xiao, Johannes Gehrke

    Abstract: Search engine companies collect the "database of intentions", the histories of their users' search queries. These search logs are a gold mine for researchers. Search engine companies, however, are wary of publishing search logs in order not to disclose sensitive information. In this paper we analyze algorithms for publishing frequent keywords, queries and clicks of a search log. We first show how…

    Submitted 11 May, 2011; v1 submitted 4 April, 2009; originally announced April 2009.

  31. Toward Expressive and Scalable Sponsored Search Auctions

    Authors: David J. Martin, Johannes Gehrke, Joseph Y. Halpern

    Abstract: Internet search results are a growing and highly profitable advertising platform. Search providers auction advertising slots to advertisers on their search result pages. Due to the high volume of searches and the users' low tolerance for search result latency, it is imperative to resolve these auctions fast. Current approaches restrict the expressiveness of bids in order to achieve fast winner d…

    Submitted 31 August, 2008; originally announced September 2008.

    Comments: 10 pages, 13 figures, ICDE 2008

    ACM Class: K.4.4

    Journal ref: David J. Martin, Johannes Gehrke, and Joseph Y. Halpern. Toward Expressive and Scalable Sponsored Search Auctions. In Proceedings of the 24th IEEE International Conference on Data Engineering, pages 237--246. April 2008

  32. arXiv:0705.2787  [pdf, ps, other]

    cs.DB

    Worst-Case Background Knowledge for Privacy-Preserving Data Publishing

    Authors: David J. Martin, Daniel Kifer, Ashwin Machanavajjhala, Johannes Gehrke, Joseph Y. Halpern

    Abstract: Recent work has shown the necessity of considering an attacker's background knowledge when reasoning about privacy in data publishing. However, in practice, the data publisher does not know what background knowledge the attacker possesses. Thus, it is important to consider the worst case. In this paper, we initiate a formal study of worst-case background knowledge. We propose a language that can…

    Submitted 18 May, 2007; originally announced May 2007.

    Comments: 10 pages

  33. arXiv:cs/0702012  [pdf]

    cs.DB cs.DL cs.IR

    Plagiarism Detection in arXiv

    Authors: Daria Sorokina, Johannes Gehrke, Simeon Warner, Paul Ginsparg

    Abstract: We describe a large-scale application of methods for finding plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14 year period, covering a few different research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false…

    Submitted 1 February, 2007; originally announced February 2007.

    Comments: Sixth International Conference on Data Mining (ICDM'06), Dec 2006