AxLaM: energy-efficient accelerator design for language models for edge computing

Tom Glint; Bhumika Mittal; Santripta Sharma; Abdul Qadir Ronak; Abhinav Goud; Neerja Kasture; Zaqi Momin; Aravind Krishna; Joycee Mekie

doi:10.1098/rsta.2023.0395

Philos Trans A Math Phys Eng Sci. 2025 Jan;383(2288):20230395. doi: 10.1098/rsta.2023.0395. Epub 2025 Jan 16.

Authors

Tom Glint¹, Bhumika Mittal², Santripta Sharma², Abdul Qadir Ronak³, Abhinav Goud³, Neerja Kasture³, Zaqi Momin³, Aravind Krishna³, Joycee Mekie³

Affiliations

¹ Forschungszentrum Jülich, Jülich, Germany.
² Ashoka University, Sonipat, Haryana, India.
³ Indian Institute of Technology Gandhinagar, Gandhinagar, Gujarat, India.

PMID: 39815979
DOI: 10.1098/rsta.2023.0395

Abstract

Modern language models such as bidirectional encoder representations from transformers (BERT) have revolutionized natural language processing (NLP) tasks but are computationally intensive, limiting their deployment on edge devices. This paper presents an energy-efficient accelerator design tailored for encoder-based language models, enabling their integration into mobile and edge computing environments. The proposed design, AxLaM, is a data-flow-aware hardware accelerator for language models inspired by Simba. It uses approximate fixed-point POSIT-based multipliers and high-bandwidth memory (HBM) to achieve significant improvements in computational efficiency, power consumption, area and latency compared to Simba, a hardware-realized scalable accelerator. Compared to Simba, AxLaM achieves a ninefold energy reduction, a 58% area reduction and 1.2 times lower latency, making it suitable for deployment in edge devices. The energy efficiency of AxLaM is 1.8 TOPS/W, 65% higher than that of FACT, which requires pre-processing of the language model before implementing it on the hardware. This article is part of the theme issue 'Emerging technologies for future secure computing platforms'.
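
The headline figures in the abstract are ratios, so it can help to see what they imply in absolute or relative terms. The following is a minimal arithmetic sketch, not taken from the paper: it assumes "65% higher" means a ratio of 1.65, "ninefold energy reduction" means one ninth of the baseline energy, and it uses the identity that 1 TOPS/W corresponds to 1 pJ per operation. The function and variable names are hypothetical, and the derived FACT value is only what the abstract's numbers imply, not a figure reported by either paper.

```python
# Back-of-the-envelope reading of the abstract's figures.
# All derived values are implications of the stated ratios, not reported results.

def implied_baseline(value: float, percent_higher: float) -> float:
    """Baseline b such that `value` is `percent_higher` percent above b."""
    return value / (1.0 + percent_higher / 100.0)

def picojoules_per_op(tops_per_watt: float) -> float:
    """1 TOPS/W = 1e12 ops per joule, i.e. 1 pJ per operation."""
    return 1.0 / tops_per_watt

# Figures stated in the abstract.
axlam_tops_per_watt = 1.8      # AxLaM energy efficiency
percent_above_fact = 65.0      # "65% higher than FACT"
area_reduction = 0.58          # "58% area reduction" vs. Simba
energy_reduction_factor = 9.0  # "ninefold energy reduction" vs. Simba

# Derived quantities (assumptions as described in the lead-in).
fact_tops_per_watt = implied_baseline(axlam_tops_per_watt, percent_above_fact)
axlam_pj_per_op = picojoules_per_op(axlam_tops_per_watt)
area_fraction_of_simba = 1.0 - area_reduction
energy_fraction_of_simba = 1.0 / energy_reduction_factor

print(f"Implied FACT efficiency: {fact_tops_per_watt:.2f} TOPS/W")
print(f"AxLaM energy per operation: {axlam_pj_per_op:.3f} pJ")
print(f"AxLaM area relative to Simba: {area_fraction_of_simba:.0%}")
print(f"AxLaM energy relative to Simba: {energy_fraction_of_simba:.1%}")
```

Under these assumptions, AxLaM occupies a bit under half of Simba's area and spends roughly one ninth of its energy, while the implied FACT efficiency sits a little above 1 TOPS/W.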

Keywords: hardware accelerator; language model BERT; transformer accelerator.
