-
Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix Multiplication
Authors:
Isuru Ranawaka,
Md Taufique Hussain,
Charles Block,
Gerasimos Gerogiannis,
Josep Torrellas,
Ariful Azad
Abstract:
We consider a sparse matrix-matrix multiplication (SpGEMM) setting where one matrix is square and the other is tall and skinny. This special variant, called TS-SpGEMM, has important applications in multi-source breadth-first search, influence maximization, sparse graph embedding, and algebraic multigrid solvers. Unfortunately, popular distributed algorithms like sparse SUMMA deliver suboptimal performance for TS-SpGEMM. To address this limitation, we develop a novel distributed-memory algorithm tailored for TS-SpGEMM. Our approach employs customized 1D partitioning for all matrices involved and leverages sparsity-aware tiling for efficient data transfers. In addition, it minimizes communication overhead by incorporating both local and remote computations. On average, our TS-SpGEMM algorithm attains 5x performance gains over 2D and 3D SUMMA. Furthermore, we use our algorithm to implement multi-source breadth-first search and sparse graph embedding algorithms and demonstrate their scalability up to 512 Nodes (or 65,536 cores) on NERSC Perlmutter.
Submitted 21 August, 2024;
originally announced August 2024.
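The 1D partitioning and sparsity-aware data transfer described in the abstract above can be illustrated with a small single-process sketch. This is not the authors' algorithm: the dict-of-dicts storage, the helper names (`partition_rows`, `needed_remote_rows`, `local_spgemm`), and the row-wise multiplication are all assumptions chosen for illustration. Each simulated process owns a contiguous block of rows of A and B, and fetches only the rows of B that its local nonzeros in A actually reference.

```python
from collections import defaultdict

def partition_rows(n_rows, n_procs):
    """Split rows 0..n_rows-1 into contiguous 1D blocks, one per (simulated) process."""
    base, extra = divmod(n_rows, n_procs)
    blocks, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

def needed_remote_rows(A, local_rows):
    """Rows of B this process must fetch: column indices of its local
    nonzeros in A that fall outside its own row block."""
    local = set(local_rows)
    needed = set()
    for i in local_rows:
        for k in A.get(i, {}):
            if k not in local:
                needed.add(k)
    return needed

def local_spgemm(A, B, local_rows):
    """Row-wise sparse product restricted to the rows this process owns."""
    C = {}
    for i in local_rows:
        acc = defaultdict(float)
        for k, a_ik in A.get(i, {}).items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:
            C[i] = dict(acc)
    return C

# Hypothetical data: A is 4x4 and sparse, B is 4x2 (tall and skinny).
# Both are stored as {row: {col: value}}.
A = {0: {0: 1.0, 3: 2.0}, 1: {1: 3.0}, 2: {0: 4.0}, 3: {3: 5.0}}
B = {0: {0: 1.0}, 1: {1: 1.0}, 3: {0: 2.0, 1: 1.0}}

blocks = partition_rows(4, 2)
C = {}
for rows in blocks:
    # A real distributed run would communicate only needed_remote_rows(A, rows) here.
    C.update(local_spgemm(A, B, rows))
```

In this toy input, the first block needs only row 3 of B from its peer and the second needs only row 0, which is the kind of saving a sparsity-aware transfer scheme aims for.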
-
DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency
Authors:
Jovan Stojkovic,
Chaojie Zhang,
Íñigo Goiri,
Josep Torrellas,
Esha Choukse
Abstract:
The rapid evolution and widespread adoption of generative large language models (LLMs) have made them a pivotal workload in various applications. Today, LLM inference clusters receive a large number of queries with strict Service Level Objectives (SLOs). To achieve the desired performance, these models execute on power-hungry GPUs, causing the inference clusters to consume a large amount of energy and, consequently, produce excessive carbon emissions. Fortunately, we find that there is a great opportunity to exploit the heterogeneity in inference compute properties and fluctuations in inference workloads to significantly improve energy efficiency. However, such a diverse and dynamic environment creates a large search space where different system configurations (e.g., number of instances, model parallelism, and GPU frequency) translate into different energy-performance trade-offs. To address these challenges, we propose DynamoLLM, the first energy-management framework for LLM inference environments. DynamoLLM automatically and dynamically reconfigures the inference cluster to optimize for energy and cost of LLM serving under the service's performance SLOs. We show that, at the service level, DynamoLLM conserves 53% energy and 38% operational carbon emissions, and reduces cost to the customer by 61%, while meeting the latency SLOs.
Submitted 1 August, 2024;
originally announced August 2024.
-
Last-Level Cache Side-Channel Attacks Are Feasible in the Modern Public Cloud (Extended Version)
Authors:
Zirui Neil Zhao,
Adam Morrison,
Christopher W. Fletcher,
Josep Torrellas
Abstract:
Last-level cache side-channel attacks have been mostly demonstrated in highly controlled, quiescent local environments. Hence, it is unclear whether such attacks are feasible in a production cloud environment. In the cloud, side channels are flooded with noise from activities of other tenants and, in Function-as-a-Service (FaaS) workloads, the attacker has a very limited time window to mount the attack. In this paper, we show that such attacks are feasible in practice, although they require new techniques. We present an end-to-end, cross-tenant attack on a vulnerable ECDSA implementation in the public FaaS Google Cloud Run environment. We introduce several new techniques to improve every step of the attack. First, to speed up the generation of eviction sets, we introduce L2-driven candidate address filtering and a Binary Search-based algorithm for address pruning. Second, to monitor victim memory accesses with high time resolution, we introduce Parallel Probing. Finally, we leverage power spectral density from signal processing to easily identify the victim's target cache set in the frequency domain. Overall, using these mechanisms, we extract a median value of 81% of the secret ECDSA nonce bits from a victim container in 19 seconds on average.
Submitted 20 May, 2024;
originally announced May 2024.
-
Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference
Authors:
Jovan Stojkovic,
Esha Choukse,
Chaojie Zhang,
Inigo Goiri,
Josep Torrellas
Abstract:
With the ubiquitous use of modern large language models (LLMs) across industries, the inference serving for these models is ever expanding. Given the high compute and memory requirements of modern LLMs, more and more top-of-the-line GPUs are being deployed to serve these models. Energy availability has come to the forefront as the biggest challenge for data center expansion to serve these models. In this paper, we present the trade-offs brought up by making energy efficiency the primary goal of LLM serving under performance SLOs. We show that depending on the inputs, the model, and the service-level agreements, there are several knobs available to the LLM inference provider to use for being energy efficient. We characterize the impact of these knobs on the latency, throughput, as well as the energy. By exploring these trade-offs, we offer valuable insights into optimizing energy usage without compromising on performance, thereby paving the way for sustainable and cost-effective LLM deployment in data center environments.
Submitted 29 March, 2024;
originally announced March 2024.
-
SENSEi: Input-Sensitive Compilation for Accelerating GNNs
Authors:
Damitha Lenadora,
Vimarsh Sathia,
Gerasimos Gerogiannis,
Serif Yesil,
Josep Torrellas,
Charith Mendis
Abstract:
Over the years, many frameworks and optimization techniques have been proposed to accelerate graph neural networks (GNNs). Compared to the optimizations explored in these systems, we observe that different matrix re-associations of GNN computations lead to novel input-sensitive performance behavior. We leverage this observation to propose SENSEi, a system that exposes different sparse and dense matrix primitive compositions based on different matrix re-associations of GNN computations and selects the best among them based on input attributes. SENSEi executes in two stages: (1) an offline compilation stage that enumerates all valid re-associations leading to different sparse-dense matrix compositions and uses input-oblivious pruning techniques to prune away clearly unprofitable candidates and (2) an online runtime system that explores the remaining candidates and uses light-weight cost models to select the best re-association based on the input graph and the embedding sizes on a given hardware platform. On a wide range of configurations, SENSEi achieves speedups of up to $2.012\times$ and $1.85\times$ on graph convolutional networks and up to $6.294\times$ and $16.274\times$ on graph attention networks, on GPUs and CPUs respectively. We also show that its technique generalizes to GNN variants, including those that require sampling. Furthermore, we show that SENSEi's techniques are agnostic to the underlying GNN system, and can be used to yield synergistic improvements across a diverse set of implementations.
Submitted 8 March, 2024; v1 submitted 26 June, 2023;
originally announced June 2023.
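To see why matrix re-association can be input-sensitive, as the SENSEi abstract above argues, consider the simplest case of a layer that computes A·X·W with a sparse adjacency matrix A (n×n with nnz nonzeros), dense features X (n×f_in), and dense weights W (f_in×f_out). The sketch below compares the two association orders using a crude FLOP count. The function names and the FLOP-based cost proxy are assumptions for illustration only; the abstract says SENSEi uses light-weight cost models, and this is not a reproduction of them.

```python
def flops_aggregate_first(n, nnz, f_in, f_out):
    """Cost proxy for (A @ X) @ W: sparse-dense product first, then dense GEMM."""
    spmm = 2 * nnz * f_in          # A (sparse) times X (n x f_in)
    gemm = 2 * n * f_in * f_out    # result (n x f_in) times W
    return spmm + gemm

def flops_update_first(n, nnz, f_in, f_out):
    """Cost proxy for A @ (X @ W): dense GEMM first, then sparse-dense product."""
    gemm = 2 * n * f_in * f_out    # X times W
    spmm = 2 * nnz * f_out         # A (sparse) times result (n x f_out)
    return gemm + spmm

def pick_association(n, nnz, f_in, f_out):
    """Return the cheaper order under the FLOP proxy (ties go to aggregate-first)."""
    a = flops_aggregate_first(n, nnz, f_in, f_out)
    u = flops_update_first(n, nnz, f_in, f_out)
    return "aggregate_first" if a <= u else "update_first"

# Hypothetical graph: 1,000 nodes and 50,000 edges.
shrinking = pick_association(1000, 50000, 512, 16)   # embedding size shrinks
growing = pick_association(1000, 50000, 16, 512)     # embedding size grows
```

Under this proxy the dense GEMM costs the same in both orders, so the choice reduces to whether the sparse product runs over the narrower of the two embedding sizes. Real hardware behavior depends on much more than FLOPs, which is why a measured or modeled cost is needed in practice.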
-
Defensive ML: Defending Architectural Side-channels with Adversarial Obfuscation
Authors:
Hyoungwook Nam,
Raghavendra Pradyumna Pothukuchi,
Bo Li,
Nam Sung Kim,
Josep Torrellas
Abstract:
Side-channel attacks that use machine learning (ML) for signal analysis have become prominent threats to computer security, as ML models easily find patterns in signals. To address this problem, this paper explores using Adversarial Machine Learning (AML) methods as a defense at the computer architecture layer to obfuscate side channels. We call this approach Defensive ML, and the generator that obfuscates signals, the defender. Defensive ML is a workflow to design, implement, train, and deploy defenders for different environments. First, we design a defender architecture given the physical characteristics and hardware constraints of the side channel. Next, we use our DefenderGAN structure to train the defender. Finally, we apply Defensive ML to thwart two side-channel attacks: one based on memory contention and the other on application power. The former uses a hardware defender with ns-level response time that attains a high level of security with half the performance impact of a traditional scheme; the latter uses a software defender with ms-level response time that provides better security than a traditional scheme with only 70% of its power overhead.
Submitted 14 October, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
-
UniHeap: Managing Persistent Objects Across Managed Runtimes for Non-Volatile Memory
Authors:
Daixuan Li,
Benjamin Reidys,
Jinghan Sun,
Thomas Shull,
Josep Torrellas,
Jian Huang
Abstract:
Byte-addressable, non-volatile memory (NVM) is emerging as a promising technology. To facilitate its wide adoption, employing NVM in managed runtimes like JVM has proven to be an effective approach (i.e., managed NVM). However, such an approach is runtime specific and lacks a generic abstraction across different managed languages. Similar to the well-known filesystem primitives that allow diverse programs to access the same files via the block I/O interface, managed NVM deserves the same system-wide property for persistent objects across managed runtimes with low overhead.
In this paper, we present UniHeap, a new NVM framework for managing persistent objects. It proposes a unified persistent object model that supports various managed languages, and manages NVM within a shared heap that enables cross-language persistent object sharing. UniHeap reduces the object persistence overhead by managing the shared heap in a log-structured manner and coalescing object updates during the garbage collection. We implement UniHeap as a generic framework and extend it to different managed runtimes that include HotSpot JVM, cPython, and JavaScript engine SpiderMonkey. We evaluate UniHeap with a variety of applications, such as key-value store and transactional database. Our evaluation shows that UniHeap significantly outperforms state-of-the-art object sharing approaches, while introducing negligible overhead to the managed runtimes.
Submitted 13 May, 2022;
originally announced May 2022.
-
A Method for Hiding the Increased Non-Volatile Cache Read Latency
Authors:
Apostolos Kokolis,
Namrata Mantri,
Shrikanth Ganapathy,
Josep Torrellas,
John Kalamatianos
Abstract:
The increased memory demands of workloads are putting high pressure on Last Level Caches (LLCs). Unfortunately, there is limited opportunity to increase the capacity of LLCs due to the area and power requirements of the underlying SRAM technology. Interestingly, emerging Non-Volatile Memory (NVM) technologies promise a feasible alternative to SRAM for LLCs due to their higher area density. However, NVMs have substantially higher read and write latencies, which offset their area density benefit. Although researchers have proposed methods to tolerate NVM's increased write latency, little emphasis has been placed on reducing the critical NVM read latency.
To address this problem, this paper proposes Cloak. Cloak exploits data reuse in the LLC at the page level, to hide NVM read latency. Specifically, on certain L1 TLB misses to a page, Cloak transfers LLC-resident data belonging to the page from the LLC NVM array to a set of small SRAM Page Buffers that will service subsequent requests to this page. Further, to enable the high-bandwidth, low-latency transfer of lines of a page to the page buffers, Cloak uses an LLC layout that accelerates the discovery of LLC-resident cache lines from the page. We evaluate Cloak with full-system simulations of a 4-core processor across 14 workloads. We find that, on average, Cloak outperforms an SRAM LLC by 23.8% and an NVM-only LLC by 8.9% -- in both cases, with negligible additional area. Further, Cloak's ED^2 is 39.9% and 17.5% lower, respectively, than these designs.
Submitted 20 December, 2021;
originally announced December 2021.
-
Speculative Interference Attacks: Breaking Invisible Speculation Schemes
Authors:
Mohammad Behnia,
Prateek Sahu,
Riccardo Paccagnella,
Jiyong Yu,
Zirui Zhao,
Xiang Zou,
Thomas Unterluggauer,
Josep Torrellas,
Carlos Rozas,
Adam Morrison,
Frank Mckeen,
Fangfei Liu,
Ron Gabor,
Christopher W. Fletcher,
Abhishek Basak,
Alaa Alameldeen
Abstract:
Recent security vulnerabilities that target speculative execution (e.g., Spectre) present a significant challenge for processor design. The highly publicized vulnerability uses speculative execution to learn victim secrets by changing cache state. As a result, recent computer architecture research has focused on invisible speculation mechanisms that attempt to block changes in cache state due to speculative execution. Prior work has shown significant success in preventing Spectre and other vulnerabilities at modest performance costs. In this paper, we introduce speculative interference attacks, which show that prior invisible speculation mechanisms do not fully block these speculation-based attacks. We make two key observations. First, misspeculated younger instructions can change the timing of older, bound-to-retire instructions, including memory operations. Second, changing the timing of a memory operation can change the order of that memory operation relative to other memory operations, resulting in persistent changes to the cache state. Using these observations, we demonstrate (among other attack variants) that secret information accessed by mis-speculated instructions can change the order of bound-to-retire loads. Load timing changes can therefore leave secret-dependent changes in the cache, even in the presence of invisible speculation mechanisms. We show that this problem is not easy to fix: Speculative interference converts timing changes to persistent cache-state changes, and timing is typically ignored by many cache-based defenses. We develop a framework to understand the attack and demonstrate concrete proof-of-concept attacks against invisible speculation mechanisms. We provide security definitions sufficient to block speculative interference attacks; describe a simple defense mechanism with a high performance cost; and discuss how future research can improve its performance.
Submitted 23 April, 2021; v1 submitted 23 July, 2020;
originally announced July 2020.
-
SparseTrain: Leveraging Dynamic Sparsity in Training DNNs on General-Purpose SIMD Processors
Authors:
Zhangxiaowen Gong,
Houxiang Ji,
Christopher Fletcher,
Christopher Hughes,
Josep Torrellas
Abstract:
Our community has greatly improved the efficiency of deep learning applications, including by exploiting sparsity in inputs. Most of that work, though, is for inference, where weight sparsity is known statically, and/or for specialized hardware. We propose a scheme to leverage dynamic sparsity during training. In particular, we exploit zeros introduced by the ReLU activation function to both feature maps and their gradients. This is challenging because the sparsity degree is moderate and the locations of zeros change over time. We also rely purely on software. We identify zeros in a dense data representation without transforming the data and perform conventional vectorized computation. Variations of the scheme are applicable to all major components of training: forward propagation, backward propagation by inputs, and backward propagation by weights. Our method significantly outperforms a highly-optimized dense direct convolution on several popular deep neural networks. At realistic sparsity, we speed up the training of the non-initial convolutional layers in VGG16, ResNet-34, ResNet-50, and Fixup ResNet-50 by 2.19x, 1.37x, 1.31x, and 1.51x respectively on an Intel Skylake-X CPU.
Submitted 22 November, 2019;
originally announced November 2019.
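The core idea in the SparseTrain abstract above, detecting zeros in a dense layout and skipping the work they would trigger, can be shown with a deliberately simplified sketch. This is not the paper's convolution kernel: it uses a plain matrix-vector product in scalar Python, and the function names (`relu`, `matvec_skip_zeros`) are hypothetical. The inner loop stands in for one vectorized multiply-accumulate across output lanes.

```python
def relu(xs):
    """ReLU keeps positive values and replaces the rest with zero."""
    return [x if x > 0.0 else 0.0 for x in xs]

def matvec_skip_zeros(weights, activations):
    """Compute out[j] = sum_i activations[i] * weights[i][j].

    Activations stay in their dense layout; each one is tested for zero,
    and the whole inner loop is skipped when the test succeeds.
    Returns the output vector and the number of skipped inner loops.
    """
    n_out = len(weights[0])
    out = [0.0] * n_out
    skipped = 0
    for i, a in enumerate(activations):
        if a == 0.0:
            skipped += 1
            continue
        row = weights[i]
        for j in range(n_out):  # stands in for one vectorized multiply-accumulate
            out[j] += a * row[j]
    return out, skipped

# Hypothetical inputs: four pre-activation values and a 4x2 weight matrix.
activations = relu([-1.0, 2.0, 0.5, -3.0])
weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
out, skipped = matvec_skip_zeros(weights, activations)
```

Here half of the activations are zero after ReLU, so half of the inner loops are skipped while the data layout and the remaining arithmetic are unchanged.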
-
Maya: Falsifying Power Sidechannels with Dynamic Control
Authors:
Raghavendra Pradyumna Pothukuchi,
Sweta Yamini Pothukuchi,
Petros Voulgaris,
Alexander Schwing,
Josep Torrellas
Abstract:
The security of computers is at risk because of information leaking through physical outputs such as power, temperature, or electromagnetic (EM) emissions. Attackers can use advanced signal measurement and analysis to recover sensitive data from these sidechannels. To address this problem, this paper presents Maya, a simple and effective solution against power side-channels. The idea is to re-shape the power dissipated by an application in an application-transparent manner using control theory techniques - preventing attackers from learning any information. With control theory, a controller can reliably keep power close to a desired target value even when runtime conditions change unpredictably. Then, by changing these targets intelligently, power can be made to appear in any desired form, appearing to carry activity information which, in reality, is unrelated to the application. Maya can be implemented in privileged software or in simple hardware. In this paper, we implement Maya on two multiprocessor machines using Operating System (OS) threads, and show its effectiveness and ease of deployment.
Submitted 18 August, 2019; v1 submitted 22 July, 2019;
originally announced July 2019.
-
Opportunistic Beamforming in Wireless Network-on-Chip
Authors:
S. Abadal,
A. Marruedo,
A. Franques,
H. Taghvaee,
A. Cabellos-Aparicio,
J. Zhou,
J. Torrellas,
E. Alarcón
Abstract:
Wireless Network-on-Chip (WNoC) has emerged as a promising alternative to conventional interconnect fabrics at the chip scale. Since WNoCs may imply the close integration of antennas, one of the salient challenges in this scenario is the management of coupling and interference. This paper, instead of combating coupling, aims to take advantage of close integration to create arrays within a WNoC. The proposed solution is opportunistic as it attempts to exploit the existing infrastructure to build a simple reconfigurable beamforming scheme. Full-wave simulations show that, despite the effects of lossy silicon and nearby antennas, within-package arrays achieve moderate gains and beamwidths below 90°, a figure which is already relevant in the multiprocessor context.
Submitted 12 June, 2019;
originally announced June 2019.
-
Engineer the Channel and Adapt to it: Enabling Wireless Intra-Chip Communication
Authors:
Xavier Timoneda,
Sergi Abadal,
Antonio Franques,
Dionysios Manessis,
Jin Zhou,
Josep Torrellas,
Eduard Alarcón,
Albert Cabellos-Aparicio
Abstract:
Ubiquitous multicore processors nowadays rely on an integrated packet-switched network for cores to exchange and share data. The performance of these intra-chip networks is a key determinant of the processor speed and, at high core counts, becomes an important bottleneck due to scalability issues. To address this, several works propose the use of mm-wave wireless interconnects for intra-chip communication and demonstrate that, thanks to their low-latency broadcast and system-level flexibility, this new paradigm could break the scalability barriers of current multicore architectures. However, these same works assume 10+ Gb/s speeds and efficiencies close to 1 pJ/bit without a proper understanding of the wireless intra-chip channel. This paper first demonstrates that such assumptions do not hold in the context of commercial chips by evaluating losses and dispersion in them. Then, we leverage the system's monolithic nature to engineer the channel, that is, to optimize its frequency response by carefully choosing the chip package dimensions. Finally, we exploit the static nature of the channel to adapt to it, pushing efficiency-speed limits with simple tweaks at the physical layer. Our methods reduce the path loss and delay spread of a simulated commercial chip by 47 dB and 7.3x, respectively, enabling intra-chip wireless communications over 10 Gb/s and only 3.1 dB away from the dispersion-free case.
Submitted 12 February, 2020; v1 submitted 23 December, 2018;
originally announced January 2019.
-
Cache Telepathy: Leveraging Shared Resource Attacks to Learn DNN Architectures
Authors:
Mengjia Yan,
Christopher Fletcher,
Josep Torrellas
Abstract:
Deep Neural Networks (DNNs) are fast becoming ubiquitous for their ability to attain good accuracy in various machine learning tasks. A DNN's architecture (i.e., its hyper-parameters) broadly determines the DNN's accuracy and performance, and is often confidential. Attacking a DNN in the cloud to obtain its architecture can potentially provide major commercial value. Further, attaining a DNN's architecture facilitates other, existing DNN attacks.
This paper presents Cache Telepathy: a fast and accurate mechanism to steal a DNN's architecture using the cache side channel. Our attack is based on the insight that DNN inference relies heavily on tiled GEMM (Generalized Matrix Multiply), and that DNN architecture parameters determine the number of GEMM calls and the dimensions of the matrices used in the GEMM functions. Such information can be leaked through the cache side channel.
This paper uses Prime+Probe and Flush+Reload to attack VGG and ResNet DNNs running OpenBLAS and Intel MKL libraries. Our attack is effective in helping obtain the architectures by very substantially reducing the search space of target DNN architectures. For example, for VGG using OpenBLAS, it reduces the search space from more than $10^{35}$ architectures to just 16.
Submitted 14 August, 2018;
originally announced August 2018.
-
Millimeter-Wave Propagation within a Computer Chip Package
Authors:
X. Timoneda,
S. Abadal,
A. Cabellos-Aparicio,
D. Manessis,
J. Zhou,
A. Franques,
J. Torrellas,
E. Alarcón
Abstract:
Wireless Network-on-Chip (WNoC) appears as a promising alternative to conventional interconnect fabrics for chip-scale communications. The WNoC paradigm has been extensively analyzed from the physical, network and architecture perspectives assuming mmWave band operation. However, there has not been a comprehensive study at this band for realistic chip packages and, thus, the characteristics of such a wireless channel remain not fully understood. This work addresses this issue by accurately modeling a flip-chip package and investigating the wave propagation inside it. Through parametric studies, a locally optimal configuration for 60 GHz WNoC is obtained, showing that chip-wide attenuation below 32.6 dB could be achieved with standard processes. Finally, the applicability of the methodology is discussed for higher bands and other integrated environments such as a Software-Defined Metamaterial (SDM).
Submitted 25 July, 2018;
originally announced July 2018.
-
Medium Access Control in Wireless Network-on-Chip: A Context Analysis
Authors:
Sergi Abadal,
Albert Mestres,
Josep Torrellas,
Eduard Alarcón,
Albert Cabellos-Aparicio
Abstract:
Wireless on-chip communication is a promising candidate to address the performance and efficiency issues that arise when scaling current Network-on-Chip (NoC) techniques to manycore processors. A Wireless Network-on-Chip (WNoC) can serve global and broadcast traffic with ultra-low latency even in thousand-core chips, thus acting as a natural complement of conventional and throughput-oriented wireline NoCs. However, the development of Medium Access Control (MAC) strategies needed to efficiently share the wireless medium among the increasing number of cores remains a considerable challenge given the singularities of the environment and the novelty of the research area. In this position paper, we present a context analysis describing the physical constraints, performance objectives, and traffic characteristics of the on-chip communication paradigm. We summarize the main differences with respect to traditional wireless scenarios, to then discuss their implications on the design of MAC protocols for manycore WNoCs, with the ultimate goal of kickstarting this arguably unexplored research area.
Submitted 16 June, 2018;
originally announced June 2018.
-
21st Century Computer Architecture
Authors:
Mark D. Hill,
Sarita Adve,
Luis Ceze,
Mary Jane Irwin,
David Kaeli,
Margaret Martonosi,
Josep Torrellas,
Thomas F. Wenisch,
David Wood,
Katherine Yelick
Abstract:
Because most technology and computer architecture innovations were (intentionally) invisible to higher layers, application and other software developers could reap the benefits of this progress without engaging in it. Higher performance has both made more computationally demanding applications feasible (e.g., virtual assistants, computer vision) and made less demanding applications easier to develop by enabling higher-level programming abstractions (e.g., scripting languages and reusable components). Improvements in computer system cost-effectiveness enabled value creation that could never have been imagined by the field's founders (e.g., distributed web search sufficiently inexpensive so as to be covered by advertising links).
The wide benefits of computer performance growth are clear. Recently, Danowitz et al. apportioned computer performance growth roughly equally between technology and architecture, with architecture credited with ~80x improvement since 1985. As semiconductor technology approaches its "end-of-the-road" (see below), computer architecture will need to play an increasing role in enabling future ICT innovation. But instead of asking, "How can I make my chip run faster?," architects must now ask, "How can I enable the 21st century infrastructure, from sensors to clouds, adding value from performance to privacy, but without the benefit of near-perfect technology scaling?". The challenges are many, but with appropriate investment, opportunities abound. Underlying these opportunities is a common theme that future architecture innovations will require the engagement of and investments from innovators in other ICT layers.
Submitted 21 September, 2016;
originally announced September 2016.