
Showing 1–35 of 35 results for author: Ceze, L

Searching in archive cs.
  1. arXiv:2407.21118  [pdf, other]

    cs.AI cs.LG

    Palu: Compressing KV-Cache with Low-Rank Projection

    Authors: Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Chong-Yan Chen, Yu-Fang Hu, Pei-Shuo Wang, Ning-Chi Huang, Luis Ceze, Kai-Chiang Wu

    Abstract: KV-Cache compression methods generally sample a KV-Cache of effectual tokens or quantize it into lower bits. However, these methods cannot exploit the redundancy of the hidden dimension of KV tensors. This paper investigates a unique hidden dimension approach called Palu, a novel KV-Cache compression framework that utilizes low-rank projection. Palu decomposes the linear layers into low-rank matri…

    Submitted 30 July, 2024; originally announced July 2024.

  2. arXiv:2406.06542  [pdf, other]

    cs.AR cs.LG

    vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs

    Authors: Size Zheng, Renze Chen, Meng Li, Zihao Ye, Luis Ceze, Yun Liang

    Abstract: IoT devices based on microcontroller units (MCU) provide ultra-low power consumption and ubiquitous computation for near-sensor deep learning models (DNN). However, the memory of MCU is usually 2-3 orders of magnitude smaller than mobile devices, which makes it challenging to map DNNs onto MCUs. Previous work separates memory management and kernel implementation for MCU and relies on coarse-graine…

    Submitted 1 May, 2024; originally announced June 2024.

  3. arXiv:2310.19102  [pdf, other]

    cs.LG

    Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

    Authors: Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci

    Abstract: The growing demand for Large Language Models (LLMs) in applications such as content generation, intelligent chatbots, and sentiment analysis poses considerable challenges for LLM service providers. To efficiently use GPU resources and boost throughput, batching multiple requests has emerged as a popular paradigm; to further speed up batching, LLM quantization techniques reduce memory consumption a…

    Submitted 16 April, 2024; v1 submitted 29 October, 2023; originally announced October 2023.

  4. arXiv:2310.18547  [pdf, other]

    cs.DC cs.LG

    Punica: Multi-Tenant LoRA Serving

    Authors: Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, Arvind Krishnamurthy

    Abstract: Low-rank adaptation (LoRA) has become an important and popular method to adapt pre-trained models to specific domains. We present Punica, a system to serve multiple LoRA models in a shared GPU cluster. Punica contains a new CUDA kernel design that allows batching of GPU operations for different LoRA models. This allows a GPU to hold only a single copy of the underlying pre-trained model when servi…

    Submitted 27 October, 2023; originally announced October 2023.

  5. arXiv:2207.04606  [pdf, other]

    cs.LG cs.AI cs.PL

    SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning

    Authors: Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, Luis Ceze

    Abstract: Sparse tensors are rapidly becoming critical components of modern deep learning workloads. However, developing high-performance sparse operators can be difficult and tedious, and existing vendor libraries cannot satisfy the escalating demands from new operators. Sparse tensor compilers simplify the development of operators, but efficient sparse compilation for deep learning remains challenging bec…

    Submitted 21 February, 2023; v1 submitted 10 July, 2022; originally announced July 2022.

    Comments: To appear at ASPLOS 2023 (19 pages, 23 figures), source code available at https://github.com/uwsampl/sparsetir, artifact available at https://github.com/uwsampl/sparsetir-artifact

  6. arXiv:2110.14819  [pdf, other]

    cs.CV cs.LG

    Characterizing and Taming Resolution in Convolutional Neural Networks

    Authors: Eddie Yan, Liang Luo, Luis Ceze

    Abstract: Image resolution has a significant effect on the accuracy and computational, storage, and bandwidth costs of computer vision model inference. These costs are exacerbated when scaling out models to large inference serving systems and make image resolution an attractive target for optimization. However, the choice of resolution inherently introduces additional tightly coupled choices, such as image…

    Submitted 27 October, 2021; originally announced October 2021.

  7. arXiv:2105.14088  [pdf, other]

    cs.DC cs.AI cs.NI

    Cloud Collectives: Towards Cloud-aware Collectives for ML Workloads with Rank Reordering

    Authors: Liang Luo, Jacob Nelson, Arvind Krishnamurthy, Luis Ceze

    Abstract: ML workloads are becoming increasingly popular in the cloud. Good cloud training performance is contingent on efficient parameter exchange among VMs. We find that Collectives, the widely used distributed communication algorithms, cannot perform optimally out of the box due to the hierarchical topology of datacenter networks and multi-tenancy nature of the cloud environment. In this paper, we present…

    Submitted 28 May, 2021; originally announced May 2021.

  8. Pure Tensor Program Rewriting via Access Patterns (Representation Pearl)

    Authors: Gus Henry Smith, Andrew Liu, Steven Lyubomirsky, Scott Davidson, Joseph McMahan, Michael Taylor, Luis Ceze, Zachary Tatlock

    Abstract: Tensor kernels in machine learning (ML) often correspond to pure mathematical expressions, making term rewriting an attractive strategy for optimization and mapping to specialized hardware accelerators. However, existing ML intermediate representations (IRs) tend to either be pure but high-level, making low-level rewrites to hardware targets inexpressible, or low-level but impure…

    Submitted 19 May, 2021; originally announced May 2021.

    Comments: To be published at MAPS 2021

  9. arXiv:2104.10716  [pdf, other]

    cs.LG cs.DC

    Accelerating SpMM Kernel with Cache-First Edge Sampling for Graph Neural Networks

    Authors: Chien-Yu Lin, Liang Luo, Luis Ceze

    Abstract: Graph neural networks (GNNs), an emerging deep learning model class, can extract meaningful representations from highly expressive graph-structured data and are therefore gaining popularity for wider ranges of applications. However, current GNNs suffer from the poor performance of their sparse-dense matrix multiplication (SpMM) operator, even when using powerful GPUs. Our analysis shows that 95% o…

    Submitted 23 April, 2021; v1 submitted 21 April, 2021; originally announced April 2021.

  10. arXiv:2103.16604  [pdf, other]

    cs.DB

    VSS: A Storage System for Video Analytics [Technical Report]

    Authors: Brandon Haynes, Maureen Daum, Dong He, Amrita Mazumdar, Magdalena Balazinska, Alvin Cheung, Luis Ceze

    Abstract: We present a new video storage system (VSS) designed to decouple high-level video operations from the low-level details required to store and efficiently retrieve video data. VSS is designed to be the storage subsystem of a video data management system (VDBMS) and is responsible for: (1) transparently and automatically arranging the data on disk in an efficient, granular format; (2) caching freque…

    Submitted 30 March, 2021; originally announced March 2021.

  11. arXiv:2103.14949  [pdf, other]

    cs.CV

    Automated Backend-Aware Post-Training Quantization

    Authors: Ziheng Jiang, Animesh Jain, Andrew Liu, Josh Fromm, Chengqian Ma, Tianqi Chen, Luis Ceze

    Abstract: Quantization is a key technique to reduce the resource requirement and improve the performance of neural network deployment. However, different hardware backends such as x86 CPU, NVIDIA GPU, ARM CPU, and accelerators may demand different implementations for quantized networks. This diversity calls for specialized post-training quantization pipelines to be built for each hardware target, an engineerin…

    Submitted 27 March, 2021; originally announced March 2021.

  12. arXiv:2011.14243  [pdf, other]

    cs.DC

    Srifty: Swift and Thrifty Distributed Training on the Cloud

    Authors: Liang Luo, Peter West, Arvind Krishnamurthy, Luis Ceze

    Abstract: Finding the best VM configuration is key to achieving lower cost and higher throughput, two primary concerns in cloud-based distributed neural network (NN) training today. Optimal VM selection that meets user constraints requires efficiently navigating a large search space while controlling for the performance variance associated with sharing cloud instances and networks. In this work, we characteri…

    Submitted 1 July, 2022; v1 submitted 28 November, 2020; originally announced November 2020.

  13. arXiv:2003.00290  [pdf, other]

    cs.DC cs.PL

    Enumerating Hardware-Software Splits with Program Rewriting

    Authors: Gus Smith, Zachary Tatlock, Luis Ceze

    Abstract: A core problem in hardware-software codesign is in the sheer size of the design space. Without a set ISA to constrain the hardware-software interface, the design space explodes. This work presents a strategy for managing the massive hardware-software design space within the domain of machine learning inference workloads and accelerators. We first propose EngineIR, a new language for representing m…

    Submitted 29 February, 2020; originally announced March 2020.

    Comments: Accepted in the Second Young Architect Workshop, in conjunction with ASPLOS 2020

  14. arXiv:1902.05971  [pdf, other]

    cs.ET

    Synthesizing Number Generators for Stochastic Computing using Mixed Integer Programming

    Authors: Vincent T. Lee, Samuel Archibald Elliot, Armin Alaghi, Luis Ceze

    Abstract: Stochastic computing (SC) is a high density, low-power computation technique which encodes values as unary bitstreams instead of binary-encoded (BE) values. Practical SC implementations require deterministic or pseudo-random number sequences which are optimally correlated to generate bitstreams and achieve accurate results. Unfortunately, the size of the search space makes manually designing optim…

    Submitted 26 February, 2019; v1 submitted 15 February, 2019; originally announced February 2019.

    Comments: 6 pages, 5 figures, 3 tables

  15. arXiv:1902.01372  [pdf, other]

    cs.MM cs.DB

    Vignette: Perceptual Compression for Video Storage and Processing Systems

    Authors: Amrita Mazumdar, Brandon Haynes, Magdalena Balazinska, Luis Ceze, Alvin Cheung, Mark Oskin

    Abstract: Compressed videos constitute 70% of Internet traffic, and video upload growth rates far outpace compute and storage improvement trends. Past work in leveraging perceptual cues like saliency, i.e., regions where viewers focus their perceptual attention, reduces compressed video size while maintaining perceptual quality, but requires significant changes to video codecs and ignores the data managemen…

    Submitted 4 February, 2019; originally announced February 2019.

  16. arXiv:1810.11066  [pdf, other]

    cs.LG stat.ML

    Automating Generation of Low Precision Deep Learning Operators

    Authors: Meghan Cowan, Thierry Moreau, Tianqi Chen, Luis Ceze

    Abstract: State of the art deep learning models have made steady progress in the fields of computer vision and natural language processing, at the expense of growing model sizes and computational complexity. Deploying these models on low power and mobile devices poses a challenge due to their limited compute capabilities and strict energy budgets. One solution that has generated significant research interes…

    Submitted 25 October, 2018; originally announced October 2018.

    Comments: 10 pages, 11 figures

  17. arXiv:1810.04756  [pdf, other]

    cs.ET

    Stochastic Synthesis for Stochastic Computing

    Authors: Vincent T. Lee, Armin Alaghi, Luis Ceze, Mark Oskin

    Abstract: Stochastic computing (SC) is an emerging computing technique which offers higher computational density, and lower power over binary-encoded (BE) computation. Unlike BE computation, SC encodes values as probabilistic bitstreams which makes designing new circuits unintuitive. Existing techniques for synthesizing SC circuits are limited to specific classes of functions such as polynomial evaluation o…

    Submitted 10 October, 2018; originally announced October 2018.

    Comments: 7 pages, 4 figures, 3 tables

  18. arXiv:1810.02895  [pdf, ps, other]

    cs.CR

    Computer Security Risks of Distant Relative Matching in Consumer Genetic Databases

    Authors: Peter M. Ney, Luis Ceze, Tadayoshi Kohno

    Abstract: Consumer genetic testing has become immensely popular in recent years and has led to the creation of large scale genetic databases containing millions of dense autosomal genotype profiles. One of the most used features offered by genetic databases is the ability to find distant relatives using a technique called relative matching (or DNA matching). Recently, novel uses of relative matching were d…

    Submitted 5 October, 2018; originally announced October 2018.

  19. arXiv:1807.04188  [pdf, other]

    cs.LG cs.DC stat.ML

    A Hardware-Software Blueprint for Flexible Deep Learning Specialization

    Authors: Thierry Moreau, Tianqi Chen, Luis Vega, Jared Roesch, Eddie Yan, Lianmin Zheng, Josh Fromm, Ziheng Jiang, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

    Abstract: Specialized Deep Learning (DL) acceleration stacks, designed for a specific set of frameworks, model architectures, operators, and data types, offer the allure of high performance while sacrificing flexibility. Changes in algorithms, models, operators, or numerical systems threaten the viability of specialized hardware accelerators. We propose VTA, a programmable deep learning architecture templat…

    Submitted 22 April, 2019; v1 submitted 11 July, 2018; originally announced July 2018.

    Comments: 6 pages plus references, 8 figures

  20. arXiv:1805.08166  [pdf, other]

    cs.LG stat.ML

    Learning to Optimize Tensor Programs

    Authors: Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

    Abstract: We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective deep learning systems. However, existing systems rely on manually optimized libraries such as cuDNN where only a narrow range of server class GPUs are well-suppor…

    Submitted 8 January, 2019; v1 submitted 21 May, 2018; originally announced May 2018.

    Comments: NeurIPS 2018

  21. arXiv:1805.07891  [pdf, other]

    cs.DC cs.LG cs.NE

    Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training

    Authors: Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy

    Abstract: Distributed deep neural network (DDNN) training constitutes an increasingly important workload that frequently runs in the cloud. Larger DNN models and faster compute engines are shifting DDNN training bottlenecks from computation to communication. This paper characterizes DDNN training to precisely pinpoint these bottlenecks. We found that timely training requires high performance parameter serve…

    Submitted 17 January, 2020; v1 submitted 21 May, 2018; originally announced May 2018.

  22. arXiv:1803.04862  [pdf, other]

    eess.SP cs.AR

    Correlation Manipulating Circuits for Stochastic Computing

    Authors: Vincent T. Lee, Armin Alaghi, Luis Ceze

    Abstract: Stochastic computing (SC) is an emerging computing technique that promises high density, low power, and error tolerant solutions. In SC, values are encoded as unary bitstreams and SC arithmetic circuits operate on one or more bitstreams. In many cases, the input bitstreams must be correlated or uncorrelated for SC arithmetic to produce accurate results. As a result, a key challenge for designing S…

    Submitted 1 March, 2018; originally announced March 2018.

    Comments: 6 pages, 5 figures, 4 tables, Design, Automation and Test in Europe Conference and Exhibition (2018)

  23. arXiv:1802.04799  [pdf, other]

    cs.LG cs.AI cs.PL

    TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

    Authors: Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

    Abstract: There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms -- such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs) -- requires significant manual effort. We propose TVM, a compiler that…

    Submitted 5 October, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

    Comments: Significantly improved version, add automated optimization

  24. arXiv:1801.09805  [pdf, other]

    cs.DC

    Parameter Box: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training

    Authors: Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy

    Abstract: Most work in the deep learning systems community has focused on faster inference, but arriving at a trained model requires lengthy experiments. Accelerating training lets developers iterate faster and come up with better models. DNN training is often seen as a compute-bound problem, best done in a single large compute node with many GPUs. As DNNs get bigger, training requires going distributed. Di…

    Submitted 17 January, 2020; v1 submitted 29 January, 2018; originally announced January 2018.

    Journal ref: SysML 2018

  25. arXiv:1706.08597  [pdf]

    cs.CY

    Democratizing Design for Future Computing Platforms

    Authors: Luis Ceze, Mark D. Hill, Karthikeyan Sankaralingam, Thomas F. Wenisch

    Abstract: Information and communications technology can continue to change our world. These advances will partially depend upon designs that synergistically combine software with specialized hardware. Today open-source software incubates rapid software-only innovation. The government can unleash software-hardware innovation with programs to develop open hardware components, tools, and design flows that simp…

    Submitted 26 June, 2017; originally announced June 2017.

    Comments: A Computing Community Consortium (CCC) white paper, 4 pages

  26. arXiv:1706.04332  [pdf, other]

    cs.NE

    MATIC: Learning Around Errors for Efficient Low-Voltage Neural Network Accelerators

    Authors: Sung Kim, Patrick Howe, Thierry Moreau, Armin Alaghi, Luis Ceze, Visvesh Sathe

    Abstract: As a result of the increasing demand for deep neural network (DNN)-based services, efforts to develop dedicated hardware accelerators for DNNs are growing rapidly. However, while accelerators with high performance and efficiency on convolutional deep neural networks (Conv-DNNs) have been developed, less progress has been made with regards to fully-connected DNNs (FC-DNNs). In this paper, we propose…

    Submitted 23 March, 2018; v1 submitted 14 June, 2017; originally announced June 2017.

    Comments: 6 pages, 12 figures, 3 tables. Published at Design, Automation and Test in Europe Conference and Exhibition (DATE) 2018

  27. arXiv:1706.03864  [pdf, other]

    cs.AR

    Exploring Computation-Communication Tradeoffs in Camera Systems

    Authors: Amrita Mazumdar, Thierry Moreau, Sung Kim, Meghan Cowan, Armin Alaghi, Luis Ceze, Mark Oskin, Visvesh Sathe

    Abstract: Cameras are the de facto sensor. The growing demand for real-time and low-power computer vision, coupled with trends towards high-efficiency heterogeneous systems, has given rise to a wide range of image processing acceleration techniques at the camera node and in the cloud. In this paper, we characterize two novel camera systems that use acceleration techniques to push the extremes of energy and p…

    Submitted 16 October, 2017; v1 submitted 12 June, 2017; originally announced June 2017.

    Journal ref: 2017 IEEE International Symposium on Workload Characterization (IISWC)

  28. arXiv:1706.02344  [pdf]

    cs.AR

    Energy-Efficient Hybrid Stochastic-Binary Neural Networks for Near-Sensor Computing

    Authors: Vincent T. Lee, Armin Alaghi, John P. Hayes, Visvesh Sathe, Luis Ceze

    Abstract: Recent advances in neural networks (NNs) exhibit unprecedented success at transforming large, unstructured data streams into compact higher-level semantic information for tasks such as handwriting recognition, image classification, and speech recognition. Ideally, systems would employ near-sensor computation to execute these tasks at sensor endpoints to maximize data reduction and minimize data mo…

    Submitted 7 June, 2017; originally announced June 2017.

    Comments: 6 pages, 3 figures, Design, Automation and Test in Europe (DATE) 2017

  29. arXiv:1704.05112  [pdf, ps, other]

    cs.DC

    Making data center computations fast, but not so furious

    Authors: Daniel Porto, João Loff, Rui Duarte, Luis Ceze, Rodrigo Rodrigues

    Abstract: We propose an aggressive computational sprinting variant for data center environments. While most previous work on computational sprinting focuses on maximizing the sprinting process while ensuring non-faulty conditions, we take advantage of the existing replication in data centers to push the system beyond its safety limits. In this paper we outline this vision, we survey existing techniques f…

    Submitted 17 April, 2017; originally announced April 2017.

    Comments: The 7th Workshop on Multi-core and Rack Scale Systems - MARS'17

  30. arXiv:1612.03182  [pdf]

    cs.AR cs.CY

    Arch2030: A Vision of Computer Architecture Research over the Next 15 Years

    Authors: Luis Ceze, Mark D. Hill, Thomas F. Wenisch

    Abstract: Application trends, device technologies and the architecture of systems drive progress in information technologies. However, the former engines of such progress - Moore's Law and Dennard Scaling - are rapidly reaching the point of diminishing returns. The time has come for the computing community to boldly confront a new challenge: how to secure a foundational future for information technology's c…

    Submitted 9 December, 2016; originally announced December 2016.

    Comments: A Computing Community Consortium (CCC) white paper, 7 pages

  31. arXiv:1609.06756  [pdf]

    cs.CY

    21st Century Computer Architecture

    Authors: Mark D. Hill, Sarita Adve, Luis Ceze, Mary Jane Irwin, David Kaeli, Margaret Martonosi, Josep Torrellas, Thomas F. Wenisch, David Wood, Katherine Yelick

    Abstract: Because most technology and computer architecture innovations were (intentionally) invisible to higher layers, application and other software developers could reap the benefits of this progress without engaging in it. Higher performance has both made more computationally demanding applications feasible (e.g., virtual assistants, computer vision) and made less demanding applications easier to devel…

    Submitted 21 September, 2016; originally announced September 2016.

    Comments: A Computing Community Consortium (CCC) white paper, 16 pages

  32. arXiv:1608.03175  [pdf, other]

    cs.DC

    Similarity Search on Automata Processors

    Authors: Vincent T. Lee, Justin Kotalik, Carlo C. Del Mundo, Armin Alaghi, Luis Ceze, Mark Oskin

    Abstract: Similarity search is a critical primitive for a wide variety of applications including natural language processing, content-based search, machine learning, computer vision, databases, robotics, and recommendation systems. At its core, similarity search is implemented using the k-nearest neighbors (kNN) algorithm, where computation consists of highly parallel distance calculations and a global top-…

    Submitted 7 June, 2017; v1 submitted 9 August, 2016; originally announced August 2016.

    Comments: 12 pages, 11 figures, accepted to International Parallel and Distributed Processing Symposium (IPDPS) 2017

  33. arXiv:1606.03742  [pdf, other]

    cs.DC cs.AR

    Application-Driven Near-Data Processing for Similarity Search

    Authors: Vincent T. Lee, Amrita Mazumdar, Carlo C. del Mundo, Armin Alaghi, Luis Ceze, Mark Oskin

    Abstract: Similarity search is a key to a variety of applications including content-based search for images and video, recommendation systems, data deduplication, natural language processing, computer vision, databases, computational biology, and computer graphics. At its core, similarity search manifests as k-nearest neighbors (kNN), a computationally simple primitive consisting of highly parallel distance…

    Submitted 10 July, 2017; v1 submitted 12 June, 2016; originally announced June 2016.

    Comments: 15 pages, 8 figures, 7 tables

  34. arXiv:1510.03955  [pdf, other]

    cs.NI

    SAP: an Architecture for Selectively Approximate Wireless Communication

    Authors: Benjamin Ransford, Luis Ceze

    Abstract: Integrity checking is ubiquitous in data networks, but not all network traffic needs integrity protection. Many applications can tolerate slightly damaged data while still working acceptably, trading accuracy versus efficiency to save time and energy. Such applications should be able to receive damaged data if they so desire. In today's network stacks, lower-layer integrity checks discard damaged…

    Submitted 13 October, 2015; originally announced October 2015.

  35. arXiv:1103.6114  [pdf, ps, other]

    cs.DC

    The Impact of Memory Models on Software Reliability in Multiprocessors

    Authors: Alexander Jaffe, Thomas Moscibroda, Laura Effinger-Dean, Luis Ceze, Karin Strauss

    Abstract: The memory consistency model is a fundamental system property characterizing a multiprocessor. The relative merits of strict versus relaxed memory models have been widely debated in terms of their impact on performance, hardware complexity and programmability. This paper adds a new dimension to this discussion: the impact of memory models on software reliability. By allowing some instructions to r…

    Submitted 6 April, 2011; v1 submitted 31 March, 2011; originally announced March 2011.

    Comments: 15 pages, 2 figures, conference