Showing 1–47 of 47 results for author: Venkataraman, S

Searching in archive cs.

  1. arXiv:2408.11919 [pdf, other]

    cs.DC

    PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters

    Authors: Rutwik Jain, Brandon Tran, Keting Chen, Matthew D. Sinclair, Shivaram Venkataraman

    Abstract: Large-scale computing systems are increasingly using accelerators such as GPUs to enable peta- and exa-scale levels of compute to meet the needs of Machine Learning (ML) and scientific computing applications. Given the widespread and growing use of ML, including in some scientific applications, optimizing these clusters for ML workloads is particularly important. However, recent work has demonstra…

    Submitted 21 August, 2024; originally announced August 2024.

  2. arXiv:2406.17918 [pdf, other]

    cs.LG cs.DC cs.SI

    GraphSnapShot: Graph Machine Learning Acceleration with Fast Storage and Retrieval

    Authors: Dong Liu, Roger Waleffe, Meng Jiang, Shivaram Venkataraman

    Abstract: In our recent research, we have developed a framework called GraphSnapShot, which has proven to be a useful tool for graph learning acceleration. GraphSnapShot is a framework for fast cache, storage, retrieval and computation for graph learning. It can quickly store and update the local topology of graph structure and allows us to track patterns in the structure of graph networks, just like take s…

    Submitted 2 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2403.08058 [pdf, other]

    cs.LG cs.CL

    CHAI: Clustered Head Attention for Efficient LLM Inference

    Authors: Saurabh Agarwal, Bilge Acun, Basil Hosmer, Mostafa Elhoushi, Yejin Lee, Shivaram Venkataraman, Dimitris Papailiopoulos, Carole-Jean Wu

    Abstract: Large Language Models (LLMs) with hundreds of billions of parameters have transformed the field of machine learning. However, serving these models at inference time is both compute and memory intensive, where a single request can require multiple GPUs and tens of Gigabytes of memory. Multi-Head Attention is one of the key components of LLMs, which can account for over 50% of LLMs' memory and comput…

    Submitted 27 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  4. arXiv:2403.05308 [pdf, ps, other]

    cs.HC cs.RO eess.SP

    Sparse Wearable Sonomyography Sensor-based Proprioceptive Proportional Control Across Multiple Gestures

    Authors: Anne Tryphosa Kamatham, Kavita Sharma, Srikumar Venkataraman, Biswarup Mukherjee

    Abstract: Sonomyography (SMG) is a non-invasive technique that uses ultrasound imaging to detect the dynamic activity of muscles. Wearable SMG systems have recently gained popularity due to their potential as human-computer interfaces for their superior performance compared to conventional methods. This paper demonstrates real-time positional proportional control of multiple gestures using a multiplexed 8-c…

    Submitted 8 March, 2024; originally announced March 2024.

  5. arXiv:2402.01528 [pdf, other]

    cs.LG cs.CL

    Decoding Speculative Decoding

    Authors: Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman

    Abstract: Speculative Decoding is a widely used technique to speed up inference for Large Language Models (LLMs) without sacrificing quality. When performing inference, speculative decoding uses a smaller draft model to generate speculative tokens and then uses the target LLM to verify those draft tokens. The speedup provided by speculative decoding heavily depends on the choice of the draft model. In this…

    Submitted 11 August, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  6. arXiv:2401.09243 [pdf, other]

    cs.RO cs.AI cs.LG

    DiffClone: Enhanced Behaviour Cloning in Robotics with Diffusion-Driven Policy Learning

    Authors: Sabariswaran Mani, Sreyas Venkataraman, Abhranil Chandra, Adyan Rizvi, Yash Sirvi, Soumojit Bhattacharya, Aritra Hazra

    Abstract: Robot learning tasks are extremely compute-intensive and hardware-specific. Thus, tackling these challenges using a diverse dataset of offline demonstrations that can be used to train robot manipulation agents is a very appealing avenue. The Train-Offline-Test-Online (TOTO) Benchmark provides a well-curated open-source dataset for offline training comprised mostly of expert data and also be…

    Submitted 23 May, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: NeurIPS 2023 Train Offline Test Online Workshop and Competition (Best Paper Oral Presentation / Winning Competition Submission)

  7. arXiv:2312.12621 [pdf, other]

    cs.DC

    Blox: A Modular Toolkit for Deep Learning Schedulers

    Authors: Saurabh Agarwal, Amar Phanishayee, Shivaram Venkataraman

    Abstract: Deep Learning (DL) workloads have rapidly increased in popularity in enterprise clusters and several new cluster schedulers have been proposed in recent years to support these workloads. With rapidly evolving DL workloads, it is challenging to quickly prototype and compare scheduling policies across workloads. Further, as prior systems target different aspects of scheduling (resource allocation, p…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: To be presented at Eurosys'24

  8. arXiv:2310.19991 [pdf, other]

    cs.LG cs.AR

    PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices

    Authors: Minghao Yan, Hongyi Wang, Shivaram Venkataraman

    Abstract: As neural networks (NN) are deployed across diverse sectors, their energy demand correspondingly grows. While several prior works have focused on reducing energy consumption during training, the continuous operation of ML-powered systems leads to significant energy use during inference. This paper investigates how the configuration of on-device hardware elements such as GPU, memory, and CPU freque…

    Submitted 9 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

  9. arXiv:2306.14086 [pdf, other]

    cs.DC

    Mirage: Towards Low-interruption Services on Batch GPU Clusters with Reinforcement Learning

    Authors: Qiyang Ding, Pengfei Zheng, Shreyas Kudari, Shivaram Venkataraman, Zhao Zhang

    Abstract: Accommodating long-running deep learning (DL) training and inference jobs is challenging on GPU clusters that use traditional batch schedulers, such as Slurm. Given fixed wall clock time limits, DL researchers usually need to run a sequence of batch jobs and experience long interruptions on overloaded machines. Such interruptions significantly lower the research productivity and QoS for services t…

    Submitted 24 June, 2023; originally announced June 2023.

  10. arXiv:2305.01516 [pdf, other]

    cs.DB

    F2: Designing a Key-Value Store for Large Skewed Workloads

    Authors: Konstantinos Kanellis, Badrish Chandramouli, Shivaram Venkataraman

    Abstract: Today's key-value stores are either disk-optimized, focusing on large data and saturating device IOPS, or memory-optimized, focusing on high throughput with linear thread scaling assuming plenty of main memory. However, many practical workloads demand high performance for read and write working sets that are much larger than main memory, over a total data size that is even larger. They require jud…

    Submitted 2 May, 2023; originally announced May 2023.

  11. arXiv:2302.14601 [pdf, other]

    cs.SE eess.SY

    SAFR-AV: Safety Analysis of Autonomous Vehicles using Real World Data -- An end-to-end solution for real world data driven scenario-based testing for pre-certification of AV stacks

    Authors: Sagar Pathrudkar, Saadhana Venkataraman, Deepika Kanade, Aswin Ajayan, Palash Gupta, Shehzaman Khatib, Vijaya Sarathi Indla, Saikat Mukherjee

    Abstract: One of the major impediments in deployment of Autonomous Driving Systems (ADS) is their safety and reliability. The primary reason for the complexity of testing ADS is that it operates in an open world characterized by its non-deterministic, high-dimensional and non-stationary nature where the actions of other actors in the environment are uncontrollable from the ADS's perspective. This leads to a…

    Submitted 27 February, 2023; originally announced February 2023.

  12. arXiv:2301.02654 [pdf, other]

    cs.LG

    Does compressing activations help model parallel training?

    Authors: Song Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman

    Abstract: Large-scale Transformer models are known for their exceptional performance in a range of tasks, but training them can be difficult due to the requirement for communication-intensive model parallelism. One way to improve training speed is to compress the message size in communication. Previous approaches have primarily focused on compressing gradients in a data parallelism setting, but compression…

    Submitted 6 January, 2023; originally announced January 2023.

    Comments: 16 pages, 5 figures

  13. arXiv:2210.11095 [pdf]

    cs.CV

    Iterative collaborative routing among equivariant capsules for transformation-robust capsule networks

    Authors: Sai Raam Venkataraman, S. Balasubramanian, R. Raghunatha Sarma

    Abstract: Transformation-robustness is an important feature for machine learning models that perform image classification. Many methods aim to bestow this property on models by the use of data augmentation strategies, while more formal guarantees are obtained via the use of equivariant models. We recognise that compositional, or part-whole structure is also an important aspect of images that has to be consi…

    Submitted 20 October, 2022; originally announced October 2022.

  14. arXiv:2210.11092 [pdf]

    cs.CV

    Robustcaps: a transformation-robust capsule network for image classification

    Authors: Sai Raam Venkataraman, S. Balasubramanian, R. Raghunatha Sarma

    Abstract: Geometric transformations of the training data as well as the test data present challenges to the use of deep neural networks in vision-based learning tasks. In order to address this issue, we present a deep neural network model that exhibits the desirable property of transformation-robustness. Our model, termed RobustCaps, uses group-equivariant convolutions in an improved capsule network model.…

    Submitted 20 October, 2022; originally announced October 2022.

  15. arXiv:2210.00093 [pdf, other]

    cs.DC

    Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning

    Authors: Pengfei Zheng, Rui Pan, Tarannum Khan, Shivaram Venkataraman, Aditya Akella

    Abstract: Dynamic adaptation has become an essential technique in accelerating distributed machine learning (ML) training. Recent studies have shown that dynamically adjusting model structure (e.g., lottery ticket hypothesis) or hyperparameters (e.g., batch size) can significantly accelerate training without sacrificing accuracy. However, existing ML cluster schedulers are not designed to handle dynamic ada…

    Submitted 30 September, 2022; originally announced October 2022.

    Comments: Accepted at the 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI '23)

  16. arXiv:2208.11035 [pdf, other]

    cs.DC

    Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale, Accelerator-Rich Systems

    Authors: Prasoon Sinha, Akhil Guliani, Rutwik Jain, Brandon Tran, Matthew D. Sinclair, Shivaram Venkataraman

    Abstract: Scientists are increasingly exploring and utilizing the massive parallelism of general-purpose accelerators such as GPUs for scientific breakthroughs. As a result, datacenters, hyperscalers, national computing centers, and supercomputers have procured hardware to support this evolving application paradigm. These systems contain hundreds to tens of thousands of accelerators, enabling peta- and exa-…

    Submitted 8 November, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

    Comments: 14 pages, 18 figures, to appear at The 34th International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '22)

  17. arXiv:2207.09442 [pdf, other]

    cs.RO cs.CV cs.LG math.OC

    Theseus: A Library for Differentiable Nonlinear Optimization

    Authors: Luis Pineda, Taosha Fan, Maurizio Monge, Shobha Venkataraman, Paloma Sodhi, Ricky T. Q. Chen, Joseph Ortiz, Daniel DeTone, Austin Wang, Stuart Anderson, Jing Dong, Brandon Amos, Mustafa Mukadam

    Abstract: We present Theseus, an efficient application-agnostic open source library for differentiable nonlinear least squares (DNLS) optimization built on PyTorch, providing a common framework for end-to-end structured learning in robotics and vision. Existing DNLS implementations are application specific and do not always incorporate many ingredients important for efficiency. Theseus is application-agnost…

    Submitted 18 January, 2023; v1 submitted 19 July, 2022; originally announced July 2022.

    Comments: Advances in Neural Information Processing Systems (NeurIPS), 2022

  18. arXiv:2203.05128 [pdf, other]

    cs.DB

    LlamaTune: Sample-Efficient DBMS Configuration Tuning

    Authors: Konstantinos Kanellis, Cong Ding, Brian Kroth, Andreas Müller, Carlo Curino, Shivaram Venkataraman

    Abstract: Tuning a database system to achieve optimal performance on a given workload is a long-standing problem in the database community. A number of recent works have leveraged ML-based approaches to guide the sampling of large parameter spaces (hundreds of tuning knobs) in search for high performance configurations. Looking at Microsoft production services operating millions of databases, sample efficie…

    Submitted 23 August, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: Proceedings of the VLDB Endowment 15 (VLDB'22)

  19. arXiv:2202.12429 [pdf, other]

    cs.DC cs.LG

    BagPipe: Accelerating Deep Recommendation Model Training

    Authors: Saurabh Agarwal, Chengpo Yan, Ziyi Zhang, Shivaram Venkataraman

    Abstract: Deep learning based recommendation models (DLRM) are widely used in several business critical applications. Training such recommendation models efficiently is challenging because they contain billions of embedding-based parameters, leading to significant overheads from embedding access. By profiling existing systems for DLRM training, we observe that around 75% of the iteration time is spent on e…

    Submitted 1 November, 2023; v1 submitted 24 February, 2022; originally announced February 2022.

  20. arXiv:2202.02365 [pdf, other]

    cs.LG cs.DB

    MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks

    Authors: Roger Waleffe, Jason Mohoney, Theodoros Rekatsinas, Shivaram Venkataraman

    Abstract: We study training of Graph Neural Networks (GNNs) for large-scale graphs. We revisit the premise of using distributed training for billion-scale graphs and show that for graphs that fit in main memory or the SSD of a single machine, out-of-core pipelined training with a single GPU can outperform state-of-the-art (SoTA) multi-GPU solutions. We introduce MariusGNN, the first system that utilizes the…

    Submitted 11 October, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

  21. arXiv:2111.10672 [pdf, other]

    cs.DC cs.LG

    Doing More by Doing Less: How Structured Partial Backpropagation Improves Deep Learning Clusters

    Authors: Adarsh Kumar, Kausik Subramanian, Shivaram Venkataraman, Aditya Akella

    Abstract: Many organizations employ compute clusters equipped with accelerators such as GPUs and TPUs for training deep learning models in a distributed fashion. Training is resource-intensive, consuming significant compute, memory, and network resources. Many prior works explore how to reduce training resource footprint without impacting quality, but their focus on a subset of the bottlenecks (typically on…

    Submitted 20 November, 2021; originally announced November 2021.

    Comments: Accepted at DistributedML-2021

  22. arXiv:2109.00984 [pdf, other]

    cs.LG cs.CR

    CrypTen: Secure Multi-Party Computation Meets Machine Learning

    Authors: Brian Knott, Shobha Venkataraman, Awni Hannun, Shubho Sengupta, Mark Ibrahim, Laurens van der Maaten

    Abstract: Secure multi-party computation (MPC) allows parties to perform computations on data while keeping that data private. This capability has great potential for machine-learning applications: it facilitates training of machine-learning models on private data sets owned by different parties, evaluation of one party's private model using another party's private data, etc. Although a range of studies imp…

    Submitted 15 September, 2022; v1 submitted 2 September, 2021; originally announced September 2021.

  23. arXiv:2107.10254 [pdf, other]

    cs.LG cs.AI math.OC

    Neural Fixed-Point Acceleration for Convex Optimization

    Authors: Shobha Venkataraman, Brandon Amos

    Abstract: Fixed-point iterations are at the heart of numerical computing and are often a computational bottleneck in real-time applications that typically need a fast solution of moderate accuracy. We present neural fixed-point acceleration which combines ideas from meta-learning and classical acceleration methods to automatically learn to accelerate fixed-point problems that are drawn from a distribution.…

    Submitted 23 July, 2021; v1 submitted 21 July, 2021; originally announced July 2021.

    Comments: AutoML@ICML2021

  24. KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks

    Authors: J. Gregory Pauloski, Qi Huang, Lei Huang, Shivaram Venkataraman, Kyle Chard, Ian Foster, Zhao Zhang

    Abstract: Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge faster in deep neural network (DNN) training than stochastic gradient descent (SGD); however, K-FAC's larger memory footprint hinders its applicability to large models. We present KAISA, a K-FAC-enabled, Adaptable, Improved, and ScAlable second-order optimizer framework that adapts the memory footprint, communicat…

    Submitted 20 September, 2021; v1 submitted 4 July, 2021; originally announced July 2021.

    Comments: Accepted for publication at the International Conference for High Performance Computing, Networking, Storage and Analysis (SC21)

  25. arXiv:2103.00543 [pdf, other]

    cs.DC cs.LG

    On the Utility of Gradient Compression in Distributed Training Systems

    Authors: Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos

    Abstract: A rich body of prior work has highlighted the existence of communication bottlenecks in synchronous data-parallel training. To alleviate these bottlenecks, a long line of recent work proposes gradient and model compression methods. In this work, we evaluate the efficacy of gradient compression methods and compare their scalability with optimized implementations of synchronous data-parallel SGD acr…

    Submitted 29 June, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

  26. arXiv:2102.01386 [pdf, other]

    cs.LG

    AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning

    Authors: Yuhan Liu, Saurabh Agarwal, Shivaram Venkataraman

    Abstract: With the rapid adoption of machine learning (ML), a number of domains now use the approach of fine tuning models which were pre-trained on a large corpus of data. However, our experiments show that even fine-tuning on models like BERT can take many hours even when using modern accelerators like GPUs. While prior work proposes limiting the number of layers that are fine-tuned, e.g., freezing all la…

    Submitted 3 April, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

  27. arXiv:2101.08358 [pdf, other]

    cs.LG cs.DB cs.DC

    Marius: Learning Massive Graph Embeddings on a Single Machine

    Authors: Jason Mohoney, Roger Waleffe, Yiheng Xu, Theodoros Rekatsinas, Shivaram Venkataraman

    Abstract: We propose a new framework for computing the embeddings of large-scale graphs on a single machine. A graph embedding is a fixed length vector representation for each node (and/or edge-type) in a graph and has emerged as the de-facto approach to apply modern machine learning on graphs. We identify that current systems for learning the embeddings of large-scale graphs are bottlenecked by data moveme…

    Submitted 25 May, 2021; v1 submitted 20 January, 2021; originally announced January 2021.

    Comments: Accepted into OSDI '21

  28. arXiv:2101.07344 [pdf, other]

    cs.LG cs.DC cs.PF

    Accelerating Deep Learning Inference via Learned Caches

    Authors: Arjun Balasubramanian, Adarsh Kumar, Yuhan Liu, Han Cao, Shivaram Venkataraman, Aditya Akella

    Abstract: Deep Neural Networks (DNNs) are witnessing increased adoption in multiple domains owing to their high accuracy in solving real-world problems. However, this high accuracy has been achieved by building deeper networks, posing a fundamental challenge to the low latency inference desired by user-facing applications. Current low latency solutions trade-off on accuracy or fail to exploit the inherent t…

    Submitted 18 January, 2021; originally announced January 2021.

  29. arXiv:2010.16248 [pdf, other]

    cs.LG

    Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification

    Authors: Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos

    Abstract: Distributed model training suffers from communication bottlenecks due to frequent model updates transmitted across compute nodes. To alleviate these bottlenecks, practitioners use gradient compression techniques like sparsification, quantization, or low-rank updates. The techniques usually require choosing a static compression ratio, often requiring users to balance the trade-off between model acc…

    Submitted 29 October, 2020; originally announced October 2020.

  30. arXiv:2010.03035 [pdf, other]

    cs.DC

    Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo

    Authors: Le Xu, Shivaram Venkataraman, Indranil Gupta, Luo Mai, Rahul Potharaju

    Abstract: Resource provisioning in multi-tenant stream processing systems faces the dual challenges of keeping resource utilization high (without over-provisioning), and ensuring performance isolation. In our common production use cases, where streaming workloads have to meet latency targets and avoid breaching service-level agreements, existing solutions are incapable of handling the wide variability of us…

    Submitted 6 October, 2020; originally announced October 2020.

  31. arXiv:2002.02645 [pdf, other]

    cs.LG stat.ML

    Accelerating Deep Learning Inference via Freezing

    Authors: Adarsh Kumar, Arjun Balasubramanian, Shivaram Venkataraman, Aditya Akella

    Abstract: Over the last few years, Deep Neural Networks (DNNs) have become ubiquitous owing to their high accuracy on real-world tasks. However, this increase in accuracy comes at the cost of computationally expensive models leading to higher prediction latencies. Prior efforts to reduce this latency such as quantization, model distillation, and any-time prediction models typically trade-off accuracy for pe…

    Submitted 7 February, 2020; originally announced February 2020.

    Comments: 11th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2019

  32. arXiv:1911.09849 [pdf, other]

    cs.DC

    Archipelago: A Scalable Low-Latency Serverless Platform

    Authors: Arjun Singhvi, Kevin Houck, Arjun Balasubramanian, Mohammed Danish Shaikh, Shivaram Venkataraman, Aditya Akella

    Abstract: The increased use of micro-services to build web applications has spurred the rapid growth of Function-as-a-Service (FaaS) or serverless computing platforms. While FaaS simplifies provisioning and scaling for application developers, it introduces new challenges in resource management that need to be handled by the cloud provider. Our analysis of popular serverless workloads indicates that schedule…

    Submitted 21 November, 2019; originally announced November 2019.

    Comments: 14 pages

  33. arXiv:1910.04940 [pdf, other]

    cs.DC cs.LG

    Blink: Fast and Generic Collectives for Distributed ML

    Authors: Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, Jorgen Thelin, Nikhil Devanur, Ion Stoica

    Abstract: Model parameter synchronization across GPUs introduces high overheads for data-parallel training at scale. Existing parameter synchronization protocols cannot effectively leverage available network resources in the face of ever increasing hardware heterogeneity. To address this, we propose Blink, a collective communication library that dynamically generates optimal communication primitives by pack…

    Submitted 10 October, 2019; originally announced October 2019.

  34. arXiv:1907.01484 [pdf, other]

    cs.DC

    Themis: Fair and Efficient GPU Cluster Scheduling

    Authors: Kshiteej Mahajan, Arjun Balasubramanian, Arjun Singhvi, Shivaram Venkataraman, Aditya Akella, Amar Phanishayee, Shuchi Chawla

    Abstract: Modern distributed machine learning (ML) training workloads benefit significantly from leveraging GPUs. However, significant contention ensues when multiple such workloads are run atop a shared cluster of GPUs. A key question is how to fairly apportion GPUs across workloads. We find that established cluster scheduling disciplines are a poor fit because of ML workloads' unique attributes: ML jobs h…

    Submitted 29 October, 2019; v1 submitted 2 July, 2019; originally announced July 2019.

  35. arXiv:1905.00863 [pdf, other]

    cs.DC cs.IT cs.LG

    Parity Models: A General Framework for Coding-Based Resilience in ML Inference

    Authors: Jack Kosaian, K. V. Rashmi, Shivaram Venkataraman

    Abstract: Machine learning models are becoming the primary workhorses for many applications. Production services deploy models through prediction serving systems that take in queries and return predictions by performing inference on machine learning models. In order to scale to high query rates, prediction serving systems are run on many machines in cluster settings, and thus are prone to slowdowns and fail…

    Submitted 16 September, 2019; v1 submitted 2 May, 2019; originally announced May 2019.

    Comments: This paper is superseded by the ACM SOSP 2019 paper "Parity Models: Erasure-Coded Resilience for Prediction Serving Systems"

  36. arXiv:1904.03257 [pdf, ps, other]

    cs.LG cs.DB cs.DC cs.SE stat.ML

    MLSys: The New Frontier of Machine Learning Systems

    Authors: Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, et al. (44 additional authors not shown)

    Abstract: Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a ne…

    Submitted 1 December, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

  37. arXiv:1901.05758 [pdf, other]

    cs.DC

    Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads

    Authors: Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, Fan Yang

    Abstract: With widespread advances in machine learning, a number of large enterprises are beginning to incorporate machine learning models across a number of products. These models are typically trained on shared, multi-tenant GPU clusters. Similar to existing cluster computing workloads, scheduling frameworks aim to provide features like high efficiency, resource isolation, fair sharing across users, etc.…

    Submitted 8 August, 2019; v1 submitted 17 January, 2019; originally announced January 2019.

  38. arXiv:1810.09679 [pdf, other]

    cs.DC

    numpywren: serverless linear algebra

    Authors: Vaishaal Shankar, Karl Krauth, Qifan Pu, Eric Jonas, Shivaram Venkataraman, Ion Stoica, Benjamin Recht, Jonathan Ragan-Kelley

    Abstract: Linear algebra operations are widely used in scientific computing and machine learning applications. However, it is challenging for scientists and data analysts to run linear algebra at scales beyond a single machine. Traditional approaches either require access to supercomputing clusters, or impose configuration and cluster management challenges. In this paper we show how the disaggregation of st…

    Submitted 23 October, 2018; originally announced October 2018.

  39. arXiv:1806.01259 [pdf, other]

    cs.LG cs.IT stat.ML

    Learning a Code: Machine Learning for Approximate Non-Linear Coded Computation

    Authors: Jack Kosaian, K. V. Rashmi, Shivaram Venkataraman

    Abstract: Machine learning algorithms are typically run on large scale, distributed compute infrastructure that routinely faces a number of unavailabilities such as failures and temporary slowdowns. Adding redundant computations using coding-theoretic tools called "codes" is an emerging technique to alleviate the adverse effects of such unavailabilities. A code consists of an encoding function that proactive…

    Submitted 4 June, 2018; originally announced June 2018.

  40. arXiv:1702.05865 [pdf, other]

    cs.DC cs.AI cs.LG

    Hemingway: Modeling Distributed Optimization Algorithms

    Authors: Xinghao Pan, Shivaram Venkataraman, Zizheng Tai, Joseph Gonzalez

    Abstract: Distributed optimization algorithms are widely used in many industrial machine learning applications. However, choosing the appropriate algorithm and cluster size is often difficult for users as the performance and convergence rate of optimization algorithms vary with the size of the cluster. In this paper we make the case for an ML-optimizer that can select the appropriate algorithm and cluster si…

    Submitted 20 February, 2017; originally announced February 2017.

    Comments: Presented at ML Systems Workshop at NIPS, Dec 2016

  41. arXiv:1702.04024 [pdf, other]

    cs.DC

    Occupy the Cloud: Distributed Computing for the 99%

    Authors: Eric Jonas, Qifan Pu, Shivaram Venkataraman, Ion Stoica, Benjamin Recht

    Abstract: Distributed computing remains inaccessible to a large number of users, in spite of many open source platforms and extensive commercial offerings. While distributed computation frameworks have moved beyond a simple map-reduce model, many users are still left to struggle with complex cluster management and configuration tools, even for running simple embarrassingly parallel jobs. We argue that state…

    Submitted 7 June, 2017; v1 submitted 13 February, 2017; originally announced February 2017.

  42. arXiv:1610.09451 [pdf, other]

    cs.LG cs.DC

    KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics

    Authors: Evan R. Sparks, Shivaram Venkataraman, Tomer Kaftan, Michael J. Franklin, Benjamin Recht

    Abstract: Modern advanced analytics applications make use of machine learning techniques and contain multiple steps of domain-specific and general-purpose processing with high resource requirements. We present KeystoneML, a system that captures and optimizes the end-to-end large-scale machine learning applications for high-throughput training in a distributed environment with a high-level API. This approach…

    Submitted 29 October, 2016; originally announced October 2016.

  43. arXiv:1602.05310  [pdf, other]

    cs.LG math.OC stat.ML

    Large Scale Kernel Learning using Block Coordinate Descent

    Authors: Stephen Tu, Rebecca Roelofs, Shivaram Venkataraman, Benjamin Recht

    Abstract: We demonstrate that distributed block coordinate descent can quickly solve kernel regression and classification problems with millions of data points. Armed with this capability, we conduct a thorough comparison between the full kernel, the Nyström method, and random features on three large classification tasks from various domains. Our results suggest that the Nyström method generally achieves be…

    Submitted 17 February, 2016; originally announced February 2016.

  44. arXiv:1509.02256  [pdf, other]

    cs.DC

    Matrix Computations and Optimization in Apache Spark

    Authors: Reza Bosagh Zadeh, Xiangrui Meng, Aaron Staple, Burak Yavuz, Li Pu, Shivaram Venkataraman, Evan Sparks, Alexander Ulanov, Matei Zaharia

    Abstract: We describe matrix computations available in the cluster programming framework, Apache Spark. Out of the box, Spark provides abstractions and implementations for distributed matrices and optimization routines using these matrices. When translating single-node algorithms to run on a distributed cluster, we observe that often a simple idea is enough: separating matrix operations from vector operatio…

    Submitted 12 July, 2016; v1 submitted 8 September, 2015; originally announced September 2015.

  45. arXiv:1505.06807  [pdf, other]

    cs.LG cs.DC cs.MS stat.ML

    MLlib: Machine Learning in Apache Spark

    Authors: Xiangrui Meng, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, Doris Xin, Reynold Xin, Michael J. Franklin, Reza Zadeh, Matei Zaharia, Ameet Talwalkar

    Abstract: Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark's open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shippe…

    Submitted 26 May, 2015; originally announced May 2015.

  46. arXiv:1204.6082  [pdf, other]

    cs.DB cs.DC

    Probabilistically Bounded Staleness for Practical Partial Quorums

    Authors: Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, Ion Stoica

    Abstract: Data store replication results in a fundamental trade-off between operation latency and data consistency. In this paper, we examine this trade-off in the context of quorum-replicated data stores. Under partial, or non-strict quorum replication, a data store waits for responses from a subset of replicas before answering a query, without guaranteeing that read and write replica sets intersect. As de…

    Submitted 26 April, 2012; originally announced April 2012.

    Comments: VLDB 2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 8, pp. 776-787 (2012)

  47. Efficient Solution Algorithms for Factored MDPs

    Authors: C. Guestrin, D. Koller, R. Parr, S. Venkataraman

    Abstract: This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs…

    Submitted 9 June, 2011; originally announced June 2011.

    Journal ref: Journal of Artificial Intelligence Research, Volume 19, pages 399-468, 2003