Showing 1–14 of 14 results for author: Yu, C H

  1. arXiv:2405.03067

    cs.SE

    Automated Deep Learning Optimization via DSL-Based Source Code Transformation

    Authors: Ruixin Wang, Minghai Lu, Cody Hao Yu, Yi-Hsiang Lai, Tianyi Zhang

    Abstract: As deep learning models become increasingly bigger and more complex, it is critical to improve model training and inference efficiency. Though a variety of highly optimized libraries and packages (known as DL kernels) have been developed, it is tedious and time-consuming to figure out which kernel to use, where to use, and how to use them correctly. To address this challenge, we propose an Automat…

    Submitted 21 August, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

    Comments: 12 pages, 6 figures, Accepted to ISSTA 2024

    ACM Class: D.2.11; I.2.0

    Journal ref: In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2024)

  2. arXiv:2312.07104

    cs.AI cs.PL

    SGLang: Efficient Execution of Structured Language Model Programs

    Authors: Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng

    Abstract: Large language models (LLMs) are increasingly used for complex tasks that require multiple generation calls, advanced prompting techniques, control flow, and structured inputs/outputs. However, efficient systems are lacking for programming and executing these applications. We introduce SGLang, a system for efficient execution of complex language model programs. SGLang consists of a frontend langua…

    Submitted 5 June, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

  3. arXiv:2309.06180

    cs.LG cs.DC

    Efficient Memory Management for Large Language Model Serving with PagedAttention

    Authors: Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica

    Abstract: High throughput serving of large language models (LLMs) requires batching sufficiently many requests at a time. However, existing systems struggle because the key-value cache (KV cache) memory for each request is huge and grows and shrinks dynamically. When managed inefficiently, this memory can be significantly wasted by fragmentation and redundant duplication, limiting the batch size. To address…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: SOSP 2023

  4. arXiv:2303.04759

    cs.LG cs.DC cs.PL

    RAF: Holistic Compilation for Deep Learning Model Training

    Authors: Cody Hao Yu, Haozheng Fan, Guangtai Huang, Zhen Jia, Yizhi Liu, Jie Wang, Zach Zheng, Yuan Zhou, Haichen Shen, Junru Shao, Mu Li, Yida Wang

    Abstract: As deep learning is pervasive in modern applications, many deep learning frameworks are presented for deep learning practitioners to develop and train DNN models rapidly. Meanwhile, as training large deep learning models becomes a trend in recent years, the training throughput and memory footprint are getting crucial. Accordingly, optimizing training workloads with compiler optimizations is inevit…

    Submitted 8 March, 2023; originally announced March 2023.

  5. arXiv:2302.08005

    cs.LG

    Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training

    Authors: Hongzheng Chen, Cody Hao Yu, Shuai Zheng, Zhen Zhang, Zhiru Zhang, Yida Wang

    Abstract: Recent years have seen an increase in the development of large deep learning (DL) models, which makes training efficiency crucial. Common practice is struggling with the trade-off between usability and performance. On one hand, DL frameworks such as PyTorch use dynamic graphs to facilitate model developers at a price of sub-optimal model training performance. On the other hand, practitioners propo…

    Submitted 22 December, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

    Comments: Accepted to ASPLOS'24

  6. arXiv:2210.09603

    cs.LG cs.AI cs.PL

    Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs

    Authors: Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, Gennady Pekhimenko

    Abstract: As deep learning models nowadays are widely adopted by both cloud services and edge devices, reducing the latency of deep learning model inferences becomes crucial to provide efficient model serving. However, it is challenging to develop efficient tensor programs for deep learning operators due to the high complexity of modern accelerators and the rapidly growing number of operators. Deep learning…

    Submitted 15 February, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: 15 pages, 22 figures, 1 table

    Journal ref: ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, January 2023, Pages 370-384

  7. arXiv:2207.04296

    cs.LG cs.AI cs.PL

    TensorIR: An Abstraction for Automatic Tensorized Program Optimization

    Authors: Siyuan Feng, Bohan Hou, Hongyi Jin, Wuwei Lin, Junru Shao, Ruihang Lai, Zihao Ye, Lianmin Zheng, Cody Hao Yu, Yong Yu, Tianqi Chen

    Abstract: Deploying deep learning models on various devices has become an important topic. The wave of hardware specialization brings a diverse set of acceleration primitives for multi-dimensional tensor computations. These new acceleration primitives, along with the emerging machine learning models, bring tremendous engineering challenges. In this paper, we present TensorIR, a compiler abstraction for opti…

    Submitted 27 October, 2022; v1 submitted 9 July, 2022; originally announced July 2022.

    Comments: Accepted to ASPLOS 2023

  8. arXiv:2207.03105

    q-bio.TO cs.CV eess.IV physics.med-ph

    Uncertainty-Aware Self-supervised Neural Network for Liver $T_{1ρ}$ Mapping with Relaxation Constraint

    Authors: Chaoxing Huang, Yurui Qian, Simon Chun Ho Yu, Jian Hou, Baiyan Jiang, Queenie Chan, Vincent Wai-Sun Wong, Winnie Chiu-Wing Chu, Weitian Chen

    Abstract: $T_{1ρ}$ mapping is a promising quantitative MRI technique for the non-invasive assessment of tissue properties. Learning-based approaches can map $T_{1ρ}$ from a reduced number of $T_{1ρ}$ weighted images, but requires significant amounts of high quality training data. Moreover, existing methods do not provide the confidence level of the $T_{1ρ}…

    Submitted 25 October, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

    Comments: Provisionally accepted by Physics in Medicine and Biology

  9. arXiv:2205.13603

    cs.LG

    Tensor Program Optimization with Probabilistic Programs

    Authors: Junru Shao, Xiyou Zhou, Siyuan Feng, Bohan Hou, Ruihang Lai, Hongyi Jin, Wuwei Lin, Masahiro Masuda, Cody Hao Yu, Tianqi Chen

    Abstract: Automatic optimization for tensor programs becomes increasingly important as we deploy deep learning in various environments, and efficient optimization relies on a rich search space and effective search. Most existing efforts adopt a search space which lacks the ability to efficiently enable domain experts to grow the search space. This paper introduces MetaSchedule, a domain-specific probabilist…

    Submitted 9 October, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: Accepted to NeurIPS 2022

  10. arXiv:2105.03215

    cs.LG cs.PF cs.PL

    Bring Your Own Codegen to Deep Learning Compiler

    Authors: Zhi Chen, Cody Hao Yu, Trevor Morris, Jorn Tuyls, Yi-Hsiang Lai, Jared Roesch, Elliott Delaye, Vin Sharma, Yida Wang

    Abstract: Deep neural networks (DNNs) have been ubiquitously applied in many applications, and accelerators are emerged as an enabler to support the fast and efficient inference tasks of these applications. However, to achieve high model coverage with high performance, each accelerator vendor has to develop a full compiler stack to ingest, optimize, and execute the DNNs. This poses significant challenges in…

    Submitted 3 May, 2021; originally announced May 2021.

  11. arXiv:2009.14381

    cs.AR cs.PL

    AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators

    Authors: Atefeh Sohrabizadeh, Cody Hao Yu, Min Gao, Jason Cong

    Abstract: Adopting FPGA as an accelerator in datacenters is becoming mainstream for customized computing, but the fact that FPGAs are hard to program creates a steep learning curve for software programmers. Even with the help of high-level synthesis (HLS), accelerator designers still have to manually perform code reconstruction and cumbersome parameter tuning to achieve the optimal performance. While many l…

    Submitted 31 August, 2021; v1 submitted 29 September, 2020; originally announced September 2020.

    Comments: 25 pages

  12. arXiv:2006.06762

    cs.LG cs.NE cs.PF cs.PL stat.ML

    Ansor: Generating High-Performance Tensor Programs for Deep Learning

    Authors: Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, Ion Stoica

    Abstract: High-performance tensor programs are crucial to guarantee efficient execution of deep neural networks. However, obtaining performant tensor programs for different operators on various hardware platforms is notoriously challenging. Currently, deep learning systems rely on vendor-provided kernel libraries or various search strategies to get performant tensor programs. These approaches either require…

    Submitted 15 October, 2023; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: OSDI 2020

  13. AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

    Authors: Jason Cong, Peng Wei, Cody Hao Yu, Peng Zhang

    Abstract: CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs whose progra…

    Submitted 30 July, 2018; originally announced September 2018.

  14. arXiv:1807.01340

    cs.AR cs.DC cs.PF

    Best-Effort FPGA Programming: A Few Steps Can Go a Long Way

    Authors: Jason Cong, Zhenman Fang, Yuchen Hao, Peng Wei, Cody Hao Yu, Chen Zhang, Peipei Zhou

    Abstract: FPGA-based heterogeneous architectures provide programmers with the ability to customize their hardware accelerators for flexible acceleration of many workloads. Nonetheless, such advantages come at the cost of sacrificing programmability. FPGA vendors and researchers attempt to improve the programmability through high-level synthesis (HLS) technologies that can directly generate hardware circuits…

    Submitted 3 July, 2018; originally announced July 2018.