Showing 1–7 of 7 results for author: Hemani, A

Searching in archive cs.
  1. arXiv:2407.00207  [pdf, other]

    cs.AR

    CIS: Composable Instruction Set for Streaming Applications: Design, Modeling, and Scheduling

    Authors: Yu Yang, Jordi Altayó González, Ahmed Hemani

    Abstract: The efficiency improvement of hardware accelerators such as single-instruction-multiple-data (SIMD) and coarse-grained reconfigurable architecture (CGRA) empowers the rapid advancement of AI and machine learning applications. These streaming applications consist of numerous vector operations that can be naturally parallelized. Despite the outstanding achievements of today's hardware accelerators,…

    Submitted 28 June, 2024; originally announced July 2024.

  2. arXiv:2206.07984  [pdf, other]

    cs.AR

    Vesyla-II: An Algorithm Library Development Tool for Synchoros VLSI Design Style

    Authors: Yu Yang, Ahmed Hemani

    Abstract: High-level synthesis (HLS) has been researched for decades and is still limited to fast FPGA prototyping and algorithmic RTL generation. A feasible end-to-end system-level synthesis solution has never been rigorously proven. Modularity and composability are the keys to enabling such a system-level synthesis framework that bridges the huge gap between system-level specification and physical level d…

    Submitted 7 September, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

  3. arXiv:2108.12213  [pdf]

    cs.AR

    Synthesis of Predictable Global NoC by Abutment in Synchoros VLSI Design

    Authors: Jordi Altayó González, Dimitrios Stathis, Ahmed Hemani

    Abstract: Synchoros VLSI design style has been proposed as an alternative to the standard cell-based design style; the word synchoros is derived from the Greek word choros for space. Synchoricity discretises space with a virtual grid, the way synchronicity discretises time with clock ticks. SiLago (Silicon Lego) blocks are atomic synchoros building blocks like Lego bricks. SiLago blocks absorb all metal laye…

    Submitted 27 August, 2021; originally announced August 2021.

    ACM Class: B.4

  4. arXiv:2108.01192  [pdf, other]

    cs.LG cs.AR

    MOHAQ: Multi-Objective Hardware-Aware Quantization of Recurrent Neural Networks

    Authors: Nesma M. Rezk, Tomas Nordström, Dimitrios Stathis, Zain Ul-Abdin, Eren Erdal Aksoy, Ahmed Hemani

    Abstract: The compression of deep learning models is of fundamental importance in deploying such models to edge devices. The selection of compression parameters can be automated to meet changes in the hardware platform and application using optimization algorithms. This article introduces a Multi-Objective Hardware-Aware Quantization (MOHAQ) method, which considers hardware efficiency and inference error as…

    Submitted 20 January, 2022; v1 submitted 2 August, 2021; originally announced August 2021.

  5. eBrainII: A 3 kW Realtime Custom 3D DRAM integrated ASIC implementation of a Biologically Plausible Model of a Human Scale Cortex

    Authors: Dimitrios Stathis, Chirag Sudarshan, Yu Yang, Matthias Jung, Syed Asad Mohamad Hasan Jafri, Christian Weis, Ahmed Hemani, Anders Lansner, Norbert Wehn

    Abstract: Artificial Neural Networks (ANNs) like CNN/DNN and LSTM are not biologically plausible and, in spite of their initial success, they cannot attain the cognitive capabilities enabled by the dynamic hierarchical associative memory systems of biological brains. The biologically plausible spiking brain models, e.g. cortex, basal ganglia and amygdala, have a greater potential to achieve biological…

    Submitted 3 November, 2019; originally announced November 2019.

  6. arXiv:1910.11253  [pdf]

    cs.AR

    Clock Tree Generation by Abutment in Synchoros VLSI Design

    Authors: Dimitrios Stathis, Panagiotis Chaourani, Syed M. A. H. Jafri, Ahmed Hemani

    Abstract: Synchoros VLSI design style has been proposed as an alternative to standard cell-based design. Standard cells are replaced by synchoros, large grain, VLSI design objects called SiLago (Silicon Lego) blocks. This new design style eliminates the need to synthesise ad hoc wires of any type: functional and infrastructural. SiLago blocks are organised into region instances. In a region instance, commun…

    Submitted 29 April, 2022; v1 submitted 24 October, 2019; originally announced October 2019.

  7. arXiv:1910.06672  [pdf, other]

    cs.AR

    Refresh Triggered Computation: Improving the Energy Efficiency of Convolutional Neural Network Accelerators

    Authors: Syed M. A. H. Jafri, Hasan Hassan, Ahmed Hemani, Onur Mutlu

    Abstract: To employ a Convolutional Neural Network (CNN) in an energy-constrained embedded system, it is critical for the CNN implementation to be highly energy efficient. Many recent studies propose CNN accelerator architectures with custom computation units that try to improve energy-efficiency and performance of CNNs by minimizing data transfers from DRAM-based main memory. However, in these architecture…

    Submitted 7 October, 2020; v1 submitted 15 October, 2019; originally announced October 2019.