Showing 1–8 of 8 results for author: Cristal, A

  1. Adaptable Register File Organization for Vector Processors

    Authors: Cristóbal Ramírez Lazo, Enrico Reggiani, Carlos Rojas Morales, Roger Figueras Bagué, Luis Alfonso Villa Vargas, Marco Antonio Ramírez Salinas, Mateo Valero Cortés, Osman Sabri Unsal, Adrián Cristal

    Abstract: Modern scientific applications are getting more diverse, and the vector lengths in those applications vary widely. Contemporary Vector Processors (VPs) are designed either for short vector lengths, e.g., Fujitsu A64FX with 512-bit ARM SVE vector support, or long vectors, e.g., NEC Aurora Tsubasa with 16Kbits Maximum Vector Length (MVL). Unfortunately, both approaches have drawbacks. On the one han…

    Submitted 29 May, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

    Comments: 28th IEEE International Symposium on High-Performance Computer Architecture (HPCA 2022)

  2. A RISC-V Simulator and Benchmark Suite for Designing and Evaluating Vector Architectures

    Authors: Cristóbal Ramírez Lazo, César Alejandro Hernández, Oscar Palomar, Osman Sabri Unsal, Marco Antonio Ramírez, Adrián Cristal

    Abstract: Vector architectures lack tools for research. Consider the gem5 simulator, which is possibly the leading platform for computer-system architecture research. Unfortunately, gem5 does not have an available distribution that includes a flexible and customizable vector architecture model. In consequence, researchers have to develop their own simulation platform to test their ideas, which consumes much…

    Submitted 29 October, 2021; originally announced November 2021.

    Comments: ACM Transactions on Architecture and Code Optimization, Volume 17, Issue 4, December 2020, Article No.38

  3. arXiv:2005.04737

    eess.SP cs.AR

    Power and Accuracy of Multi-Layer Perceptrons (MLPs) under Reduced-voltage FPGA BRAMs Operation

    Authors: Behzad Salami, Osman Unsal, Adrian Cristal

    Abstract: In this paper, we exploit the aggressive supply voltage underscaling technique in Block RAMs (BRAMs) of Field Programmable Gate Arrays (FPGAs) to improve the energy efficiency of Multi-Layer Perceptrons (MLPs). Additionally, we evaluate and improve the resilience of this accelerator. Through experiments on several representative FPGA fabrics, we observe that until a minimum safe voltage level, i.e…

    Submitted 10 May, 2020; originally announced May 2020.

  4. arXiv:2001.00053

    cs.LG cs.NE

    On the Resilience of Deep Learning for Reduced-voltage FPGAs

    Authors: Kamyar Givaki, Behzad Salami, Reza Hojabr, S. M. Reza Tayaranian, Ahmad Khonsari, Dara Rahmati, Saeid Gorgin, Adrian Cristal, Osman S. Unsal

    Abstract: Deep Neural Networks (DNNs) are inherently computation-intensive and also power-hungry. Hardware accelerators such as Field Programmable Gate Arrays (FPGAs) are a promising solution that can satisfy these requirements for both embedded and High-Performance Computing (HPC) systems. In FPGAs, as well as CPUs and GPUs, aggressive voltage scaling below the nominal level is an effective technique for p…

    Submitted 26 December, 2019; originally announced January 2020.

  5. arXiv:1912.01563

    cs.DC

    LEGaTO: Low-Energy, Secure, and Resilient Toolset for Heterogeneous Computing

    Authors: B. Salami, K. Parasyris, A. Cristal, O. Unsal, X. Martorell, P. Carpenter, R. De La Cruz, L. Bautista, D. Jimenez, C. Alvarez, S. Nabavi, S. Madonar, M. Pericas, P. Trancoso, M. Abduljabbar, J. Chen, P. N. Soomro, M. Manivannan, M. Berge, S. Krupop, F. Klawonn, Al Mekhlafi, S. May, T. Becker, G. Gaydadjiev, et al. (20 additional authors not shown)

    Abstract: The LEGaTO project leverages task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC, balanced with the security and resilience challenges. LEGaTO is an ongoing three-year EU H2020 project started in…

    Submitted 1 December, 2019; originally announced December 2019.

    Comments: 6 pages, 9 figures

  6. arXiv:1912.01556

    cs.DC

    A Novel FPGA-Based High Throughput Accelerator For Binary Search Trees

    Authors: Oyku Melikoglu, Oguz Ergin, Behzad Salami, Julian Pavon, Osman Unsal, Adrian Cristal

    Abstract: This paper presents a deeply pipelined and massively parallel Binary Search Tree (BST) accelerator for Field Programmable Gate Arrays (FPGAs). Our design relies on the extremely parallel on-chip memory, or Block RAMs (BRAMs) architecture of FPGAs. To achieve significant throughput for the search operation on BST, we present several novel mechanisms including tree duplication as well as horizontal,…

    Submitted 1 December, 2019; originally announced December 2019.

    Comments: 8 pages, 9 figures

  7. arXiv:1806.09679

    cs.LG cs.AR stat.ML

    On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation

    Authors: Behzad Salami, Osman Unsal, Adrian Cristal

    Abstract: Machine Learning (ML) is making a strong resurgence in tune with the massive generation of unstructured data which in turn requires massive computational resources. Due to the inherently compute- and power-intensive structure of Neural Networks (NNs), hardware accelerators emerge as a promising solution. However, with technology node scaling below 10nm, hardware accelerators become more susceptibl…

    Submitted 14 June, 2018; originally announced June 2018.

    Comments: 8 pages, 6 figures

    MSC Class: 68T01

  8. arXiv:1804.05267

    cs.LG cs.NE stat.ML

    Low-Precision Floating-Point Schemes for Neural Network Training

    Authors: Marc Ortiz, Adrián Cristal, Eduard Ayguadé, Marc Casas

    Abstract: The use of low-precision fixed-point arithmetic along with stochastic rounding has been proposed as a promising alternative to the commonly used 32-bit floating point arithmetic to enhance neural network training in terms of performance and energy efficiency. In the first part of this paper, the behaviour of the 12-bit fixed-point arithmetic when training a convolutional neural network w…

    Submitted 14 April, 2018; originally announced April 2018.

    Comments: 16 pages, 9 figures and 4 tables

    ACM Class: I.2.6; I.5