Showing 1–28 of 28 results for author: Constantinides, G A

Searching in archive cs.
  1. arXiv:2407.01475  [pdf, other]

    cs.AR cs.LG

    Exploring FPGA designs for MX and beyond

    Authors: Ebby Samson, Naveen Mellempudi, Wayne Luk, George A. Constantinides

    Abstract: A number of companies recently worked together to release the new Open Compute Project MX standard for low-precision computation, aimed at efficient neural network implementation. In this paper, we describe and evaluate the first open-source FPGA implementation of the arithmetic defined in the standard. Our designs fully support all the standard's concrete formats for conversion into and out of MX…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures

  2. arXiv:2406.14963  [pdf, other]

    cs.LG

    Optimised Grouped-Query Attention Mechanism for Transformers

    Authors: Yuang Chen, Cheng Zhang, Xitong Gao, Robert D. Mullins, George A. Constantinides, Yiren Zhao

    Abstract: Grouped-query attention (GQA) has been widely adopted in LLMs to mitigate the complexity of multi-head attention (MHA). To transform an MHA to a GQA, neighbour queries in MHA are evenly split into groups where each group shares the value and key layers. In this work, we propose AsymGQA, an activation-informed approach to asymmetrically grouping an MHA to a GQA for better model performance. Our Asy…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML2024 ES-FoMo-II Workshop

  3. arXiv:2406.14956  [pdf, other]

    cs.LG cs.CL

    Unlocking the Global Synergies in Low-Rank Adapters

    Authors: Zixi Zhang, Cheng Zhang, Xitong Gao, Robert D. Mullins, George A. Constantinides, Yiren Zhao

    Abstract: Low-rank Adaptation (LoRA) has been the de-facto parameter-efficient fine-tuning technique for large language models. We present HeteroLoRA, a light-weight search algorithm that leverages zero-cost proxies to allocate the limited LoRA trainable parameters across the model for better fine-tuned performance. In addition to the allocation for the standard LoRA-adapted models, we also demonstrate the ef…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML2024 ES-FoMo-II Workshop

  4. arXiv:2406.12421  [pdf, other]

    cs.AR

    ROVER: RTL Optimization via Verified E-Graph Rewriting

    Authors: Samuel Coward, Theo Drane, George A. Constantinides

    Abstract: Manual RTL design and optimization remains prevalent across the semiconductor industry because commercial logic and high-level synthesis tools are unable to match human designs. Our experience in industrial datapath design demonstrates that manual optimization can typically be decomposed into a sequence of local equivalence preserving transformations. By formulating datapath optimization as a grap…

    Submitted 18 June, 2024; originally announced June 2024.

  5. arXiv:2406.03227  [pdf, other]

    cs.AR

    Soft GPGPU versus IP cores: Quantifying and Reducing the Performance Gap

    Authors: Martin Langhammer, George A. Constantinides

    Abstract: eGPU, a recently-reported soft GPGPU for FPGAs, has demonstrated very high clock frequencies (more than 750 MHz) and small footprint. This means that for the first time, commercial soft processors may be competitive for the kind of heavy numerical computations common in FPGA-based digital signal processing. In this paper we take a deep dive into the performance of the eGPU family on FFT computatio…

    Submitted 5 June, 2024; originally announced June 2024.

  6. arXiv:2403.00849  [pdf, other]

    cs.AR cs.LG stat.ML

    NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions

    Authors: Marta Andronic, George A. Constantinides

    Abstract: Field-Programmable Gate Array (FPGA) accelerators have proven successful in handling latency- and resource-critical deep neural network (DNN) inference tasks. Among the most computationally intensive operations in a neural network (NN) is the dot product between the feature and weight vectors. Thus, some previous FPGA acceleration works have proposed mapping neurons with quantized inputs and outpu…

    Submitted 3 July, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

  7. arXiv:2402.02446  [pdf, other]

    cs.LG cs.CL

    LQER: Low-Rank Quantization Error Reconstruction for LLMs

    Authors: Cheng Zhang, Jianyi Cheng, George A. Constantinides, Yiren Zhao

    Abstract: Post-training quantization of Large Language Models (LLMs) is challenging. In this work, we introduce Low-rank Quantization Error Reduction (LQER), which combines quantization and low-rank approximation to recover the model capability. LQER leverages an activation-induced scale matrix to drive the singular value distribution of quantization error towards a desirable distribution, which enables nea…

    Submitted 30 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: Accepted at ICML2024

  8. arXiv:2401.04261  [pdf, other]

    cs.AR

    A Statically and Dynamically Scalable Soft GPGPU

    Authors: Martin Langhammer, George A. Constantinides

    Abstract: Current soft processor architectures for FPGAs do not utilize the potential of the massive parallelism available. FPGAs now support many thousands of embedded floating point operators, and have similar computational densities to GPGPUs. Several soft GPGPU or SIMT processors have been published, but the reported large areas and modest Fmax make their widespread use unlikely for commercial designs.…

    Submitted 8 January, 2024; originally announced January 2024.

  9. arXiv:2312.06004  [pdf, other]

    cs.AR

    Multiplier Optimization via E-Graph Rewriting

    Authors: Andy Wanna, Samuel Coward, Theo Drane, George A. Constantinides, Miloš D. Ercegovac

    Abstract: Multiplier circuits account for significant resource usage in datapath-dominated circuit designs, and RTL designers continue to build bespoke hand-crafted multiplication arrays for their particular application. The construction of an optimized multiplier presents trade-offs between pre-processing to generate a smaller array and array reduction. A data structure known as an e-graph has recently bee…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: Preprint for work presented at the 2023 Asilomar Conference on Signals, Systems and Computers

  10. Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?

    Authors: Cheng Zhang, Jianyi Cheng, Ilia Shumailov, George A. Constantinides, Yiren Zhao

    Abstract: The inference of Large language models (LLMs) requires immense computation and memory resources. To curtail these costs, quantisation has emerged as a promising solution, but existing LLM quantisation mainly focuses on 8-bit. In this work, we explore the statistical and learning properties of the LLM layer and attribute the bottleneck of LLM quantisation to numerical scaling offsets. To address thi…

    Submitted 21 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted by EMNLP2023

  11. PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA LUT-based Inference

    Authors: Marta Andronic, George A. Constantinides

    Abstract: Field-programmable gate arrays (FPGAs) are widely used to implement deep learning inference. Standard deep neural network inference involves the computation of interleaved linear maps and nonlinear activation functions. Prior work for ultra-low latency implementations has hardcoded the combination of linear maps and nonlinear activations inside FPGA lookup tables (LUTs). Our work is motivated by t…

    Submitted 6 November, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Journal ref: 2023 International Conference on Field Programmable Technology (ICFPT), Yokohama, Japan, 2023, pp. 60-68

  12. arXiv:2308.05170  [pdf, other]

    cs.AR cs.AI

    FPGA Resource-aware Structured Pruning for Real-Time Neural Networks

    Authors: Benjamin Ramhorst, Vladimir Loncar, George A. Constantinides

    Abstract: Neural networks achieve state-of-the-art performance in image classification, speech recognition, scientific analysis and many more application areas. Due to the high computational complexity and memory footprint of neural networks, various compression techniques, such as pruning and quantization, have been proposed in the literature. Pruning sparsifies a neural network, reducing the number of multipl…

    Submitted 12 December, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

  13. arXiv:2307.15517  [pdf, other]

    cs.AR

    A Dataflow Compiler for Efficient LLM Inference using Custom Microscaling Formats

    Authors: Jianyi Cheng, Cheng Zhang, Zhewen Yu, Christos-Savvas Bouganis, George A. Constantinides, Yiren Zhao

    Abstract: Model quantization represents both parameters (weights) and intermediate values (activations) in a more compact format, thereby directly reducing both computational and memory cost in hardware. The quantization of recent large language models (LLMs) faces challenges to achieve competitive memory density compared to other models such as convolutional neural networks, since values in LLMs require la…

    Submitted 19 April, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

  14. arXiv:2304.08400  [pdf, other]

    cs.AR cs.LG

    ATHEENA: A Toolflow for Hardware Early-Exit Network Automation

    Authors: Benjamin Biggs, Christos-Savvas Bouganis, George A. Constantinides

    Abstract: The continued need for improvements in accuracy, throughput, and efficiency of Deep Neural Networks has resulted in a multitude of methods that make the most of custom architectures on FPGAs. These include the creation of hand-crafted networks and the use of quantization and pruning to reduce extraneous network parameters. However, with the potential of static solutions already well exploited, we…

    Submitted 17 April, 2023; originally announced April 2023.

  15. arXiv:2303.01839  [pdf, other]

    cs.AR

    Automating Constraint-Aware Datapath Optimization using E-Graphs

    Authors: Samuel Coward, George A. Constantinides, Theo Drane

    Abstract: Numerical hardware design requires aggressive optimization, where designers exploit branch constraints, creating optimization opportunities that are valid only on a sub-domain of input space. We developed an RTL optimization tool that automatically learns the consequences of conditional branches and exploits that knowledge to enable deep optimization. The tool deploys custom built program analysis…

    Submitted 3 March, 2023; originally announced March 2023.

  16. arXiv:2205.14989  [pdf, other]

    cs.DS cs.PL

    Combining E-Graphs with Abstract Interpretation

    Authors: Samuel Coward, George A. Constantinides, Theo Drane

    Abstract: E-graphs are a data structure that compactly represents equivalent expressions. They are constructed via the repeated application of rewrite rules. Often in practical applications, conditional rewrite rules are crucial, but their application requires the detection - at the time the e-graph is being built - that a condition is valid in the domain of application. Detecting condition validity amounts…

    Submitted 15 August, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

  17. arXiv:2204.11478  [pdf, other]

    cs.AR

    Automatic Datapath Optimization using E-Graphs

    Authors: Samuel Coward, George A. Constantinides, Theo Drane

    Abstract: Manual optimization of Register Transfer Level (RTL) datapath is commonplace in industry but holds back development as it can be very time consuming. We utilize the fact that a complex transformation of one RTL into another equivalent RTL can be broken down into a sequence of smaller, localized transformations. By representing RTL as a graph and deploying modern graph rewriting techniques we can a…

    Submitted 26 July, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

  18. arXiv:2203.09191  [pdf, other]

    cs.LO cs.CL

    Abstract Interpretation on E-Graphs

    Authors: Samuel Coward, George A. Constantinides, Theo Drane

    Abstract: Recent e-graph applications have typically considered concrete semantics of expressions, where the notion of equivalence stems from concrete interpretation of expressions. However, equivalences that hold over one interpretation may not hold in an alternative interpretation. Such an observation can be exploited. We consider the application of abstract interpretation to e-graphs, and show that withi…

    Submitted 17 March, 2022; originally announced March 2022.

  19. arXiv:2201.11522  [pdf, other]

    cs.SE cs.AR

    High-level Synthesis using the Julia Language

    Authors: Benjamin Biggs, Ian McInerney, Eric C. Kerrigan, George A. Constantinides

    Abstract: The growing proliferation of FPGAs and High-level Synthesis (HLS) tools has led to a large interest in designing hardware accelerators for complex operations and algorithms. However, existing HLS toolflows typically require a significant amount of user knowledge or training to be effective in both industrial and research applications. In this paper, we propose using the Julia language as the basis…

    Submitted 17 February, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

    Comments: Presented at the 2nd Workshop on Languages, Tools, and Techniques for Accelerator Design (LATTE'22)

  20. Nonideality-Aware Training for Accurate and Robust Low-Power Memristive Neural Networks

    Authors: Dovydas Joksas, Erwei Wang, Nikolaos Barmpatsalos, Wing H. Ng, Anthony J. Kenyon, George A. Constantinides, Adnan Mehonic

    Abstract: Recent years have seen a rapid rise of artificial neural networks being employed in a number of cognitive tasks. The ever-increasing computing requirements of these structures have contributed to a desire for novel technologies and paradigms, including memristor-based hardware accelerators. Solutions based on memristive crossbars and analog data processing promise to improve the overall energy eff…

    Submitted 5 May, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: 32 pages, 20 figures, 4 tables

    Journal ref: Adv. Sci. 2022, 2105784

  21. Logic Shrinkage: Learned FPGA Netlist Sparsity for Efficient Neural Network Inference

    Authors: Erwei Wang, James J. Davis, Georgios-Ilias Stavrou, Peter Y. K. Cheung, George A. Constantinides, Mohamed S. Abdelfattah

    Abstract: FPGA-specific DNN architectures using the native LUTs as independently trainable inference operators have been shown to achieve favorable area-accuracy and energy-accuracy tradeoffs. The first work in this area, LUTNet, exhibited state-of-the-art performance for standard DNN benchmarks. In this paper, we propose the learned optimization of such LUT-based topologies, resulting in higher-efficiency…

    Submitted 2 January, 2022; v1 submitted 4 December, 2021; originally announced December 2021.

    Comments: Accepted manuscript uploaded 04/12/21. DOA 22/11/21

  22. arXiv:2102.04270  [pdf, other]

    cs.LG cs.AR

    Enabling Binary Neural Network Training on the Edge

    Authors: Erwei Wang, James J. Davis, Daniele Moro, Piotr Zielinski, Jia Jie Lim, Claudionor Coelho, Satrajit Chatterjee, Peter Y. K. Cheung, George A. Constantinides

    Abstract: The ever-growing computational demands of increasingly complex machine learning models frequently necessitate the use of powerful cloud-based infrastructure for their training. Binary neural networks are known to be promising candidates for on-device inference due to their extreme compute and memory savings over higher-precision alternatives. However, their existing training methods require the co…

    Submitted 24 September, 2023; v1 submitted 8 February, 2021; originally announced February 2021.

  23. arXiv:1912.00867  [pdf, other]

    math.NA cs.PL

    A Probabilistic Approach to Floating-Point Arithmetic

    Authors: Fredrik Dahlqvist, Rocco Salvia, George A. Constantinides

    Abstract: Finite-precision floating point arithmetic unavoidably introduces rounding errors which are traditionally bounded using a worst-case analysis. However, worst-case analysis might be overly conservative because worst-case errors can be extremely rare events in practice. Here we develop a probabilistic model of rounding errors with which it becomes possible to estimate the likelihood that the roundin…

    Submitted 10 December, 2019; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: 9 pages, 6 figures

  24. arXiv:1910.12625  [pdf, other]

    cs.LG cs.CV eess.SP stat.ML

    LUTNet: Learning FPGA Configurations for Highly Efficient Neural Network Inference

    Authors: Erwei Wang, James J. Davis, Peter Y. K. Cheung, George A. Constantinides

    Abstract: Research has shown that deep neural networks contain significant redundancy, and thus that high classification accuracy can be achieved even when weights and activations are quantized down to binary values. Network binarization on FPGAs greatly increases area efficiency by replacing resource-hungry multipliers with lightweight XNOR gates. However, an FPGA's fundamental building block, the K-LUT, i…

    Submitted 2 March, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1904.00938. Accepted manuscript uploaded 02/03/20. DOA 01/03/20

  25. arXiv:1910.00271  [pdf, other]

    cs.AR

    ARCHITECT: Arbitrary-precision Hardware with Digit Elision for Efficient Iterative Compute

    Authors: He Li, James J. Davis, John Wickerson, George A. Constantinides

    Abstract: Many algorithms feature an iterative loop that converges to the result of interest. The numerical operations in such algorithms are generally implemented using finite-precision arithmetic, either fixed- or floating-point, most of which operate least-significant digit first. This results in a fundamental problem: if, after some time, the result has not converged, is this because we have not run the…

    Submitted 1 October, 2019; originally announced October 2019.

  26. arXiv:1905.02438  [pdf, other]

    cs.LG cs.AR cs.NE stat.ML

    Rethinking Arithmetic for Deep Neural Networks

    Authors: George A. Constantinides

    Abstract: We consider efficiency in the implementation of deep neural networks. Hardware accelerators are gaining interest as machine learning becomes one of the drivers of high-performance computing. In these accelerators, the directed graph describing a neural network can be implemented as a directed graph describing a Boolean circuit. We make this observation precise, leading naturally to an understandin…

    Submitted 17 September, 2019; v1 submitted 7 May, 2019; originally announced May 2019.

  27. arXiv:1904.00938  [pdf, other]

    cs.LG stat.ML

    LUTNet: Rethinking Inference in FPGA Soft Logic

    Authors: Erwei Wang, James J. Davis, Peter Y. K. Cheung, George A. Constantinides

    Abstract: Research has shown that deep neural networks contain significant redundancy, and that high classification accuracies can be achieved even when weights and activations are quantised down to binary values. Network binarisation on FPGAs greatly increases area efficiency by replacing resource-hungry multipliers with lightweight XNOR gates. However, an FPGA's fundamental building block, the K-LUT, is c…

    Submitted 1 April, 2019; originally announced April 2019.

    Comments: Accepted manuscript uploaded 01/04/19. DOA 03/03/19

  28. Deep Neural Network Approximation for Custom Hardware: Where We've Been, Where We're Going

    Authors: Erwei Wang, James J. Davis, Ruizhe Zhao, Ho-Cheung Ng, Xinyu Niu, Wayne Luk, Peter Y. K. Cheung, George A. Constantinides

    Abstract: Deep neural networks have proven to be particularly effective in visual and audio recognition tasks. Existing models tend to be computationally expensive and memory intensive, however, and so methods for hardware-oriented approximation have become a hot topic. Research has shown that custom hardware-based neural network accelerators can surpass their general-purpose processor equivalents in terms…

    Submitted 8 July, 2019; v1 submitted 21 January, 2019; originally announced January 2019.

    Comments: Accepted manuscript uploaded 21/01/19. DOA 15/01/19

    Journal ref: ACM Comput. Surv. 52, 2, Article 40 (May 2019), 39 pages