-
QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models
Authors:
Tommaso Pegolotti,
Elias Frantar,
Dan Alistarh,
Markus Püschel
Abstract:
We present ongoing work on a new automatic code generation approach for supporting quantized generative inference on LLMs such as LLaMA or OPT on off-the-shelf CPUs. Our approach is informed by the target architecture and a performance model, including both hardware characteristics and method-specific accuracy constraints. Results on CPU-based inference for LLaMA models show that our approach can lead to high performance and high accuracy, comparing favorably to the best existing open-source solution. A preliminary implementation is available at https://github.com/IST-DASLab/QIGen.
Submitted 7 July, 2023;
originally announced July 2023.
-
SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks
Authors:
Mahdi Nikdan,
Tommaso Pegolotti,
Eugenia Iofinova,
Eldar Kurtic,
Dan Alistarh
Abstract:
We provide a new efficient version of the backpropagation algorithm, specialized to the case where the weights of the neural network being trained are sparse. Our algorithm is general, as it applies to arbitrary (unstructured) sparsity and common layer types (e.g., convolutional or linear). We provide a fast vectorized implementation on commodity CPUs, and show that it can yield speedups in end-to-end runtime experiments, both in transfer learning using already-sparsified networks, and in training sparse networks from scratch. Thus, our results provide the first support for sparse training on commodity hardware.
Submitted 9 February, 2023;
originally announced February 2023.
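
The idea of backpropagating through sparse weights can be illustrated on a single linear layer. The following is a minimal sketch, not the vectorized implementation described in the abstract: it assumes a CSR-style storage of the weight matrix, and the helper names (`to_csr`, `sparse_forward`, `sparse_backward`) are hypothetical. It only shows that both the forward and backward pass need to touch the nonzero weights alone.

```python
def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    indptr, indices, values = [0], [], []
    for row in dense:
        for j, w in enumerate(row):
            if w != 0.0:
                indices.append(j)
                values.append(w)
        indptr.append(len(indices))
    return indptr, indices, values


def sparse_forward(csr, x):
    """Compute y = W x, visiting only the stored nonzeros of W."""
    indptr, indices, values = csr
    return [
        sum(values[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
        for i in range(len(indptr) - 1)
    ]


def sparse_backward(csr, x, grad_y):
    """Return (grad_x, grad_values) for y = W x.

    grad_x is W^T grad_y; grad_values holds the gradient of each stored
    nonzero weight, in the same order as the CSR values array. Zero
    weights receive no gradient, so the sparsity pattern stays fixed.
    """
    indptr, indices, values = csr
    grad_x = [0.0] * len(x)
    grad_values = [0.0] * len(values)
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            grad_x[j] += values[k] * grad_y[i]
            grad_values[k] = grad_y[i] * x[j]
    return grad_x, grad_values


# Small example: a 2x3 weight matrix with three nonzeros.
W = [[0.0, 2.0, 0.0],
     [1.0, 0.0, 3.0]]
csr = to_csr(W)
x = [1.0, 2.0, 3.0]
y = sparse_forward(csr, x)
grad_x, grad_values = sparse_backward(csr, x, [1.0, -1.0])
```

The work per pass in this sketch is proportional to the number of nonzero weights rather than to the full matrix size, which is where a sparse backward pass can save time over a dense one.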
-
Fast Möbius and Zeta Transforms
Authors:
Tommaso Pegolotti,
Bastian Seifert,
Markus Püschel
Abstract:
Möbius inversion of functions on partially ordered sets (posets) $\mathcal{P}$ is a classical tool in combinatorics. For finite posets it consists of two mutually inverse linear transformations, called the zeta and Möbius transforms. In this paper we provide novel fast algorithms for both that require $O(nk)$ time and space, where $n = |\mathcal{P}|$ and $k$ is the width (size of the largest antichain) of $\mathcal{P}$, compared to $O(n^2)$ for a direct computation. Our approach assumes that $\mathcal{P}$ is given as a directed acyclic graph (DAG) $(\mathcal{E}, \mathcal{P})$. The algorithms are then constructed using a chain decomposition for a one-time cost of $O(|\mathcal{E}| + |\mathcal{E}_\text{red}| k)$, where $|\mathcal{E}_\text{red}|$ is the number of edges in the DAG's transitive reduction. We show benchmarks with implementations of all algorithms, including parallelized versions. The results show that our algorithms enable Möbius inversion on posets with millions of nodes in seconds if the defining DAGs are sufficiently sparse.
Submitted 24 November, 2022;
originally announced November 2022.
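
For orientation, the direct $O(n^2)$ computation that the abstract uses as a baseline can be sketched as follows. This is not the paper's chain-decomposition algorithm. It assumes that the poset elements are numbered $0, \dots, n-1$, that a DAG edge `(u, v)` means $u < v$, and that the zeta transform sums over all elements below or equal to $x$; the function names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def down_sets(n, edges):
    """Return a topological order and, for each x, the set {y : y <= x}."""
    preds = {v: set() for v in range(n)}
    for u, v in edges:
        preds[v].add(u)
    # static_order() yields every node after all of its predecessors.
    order = list(TopologicalSorter(preds).static_order())
    below = {}
    for x in order:
        s = {x}
        for p in preds[x]:
            s |= below[p]
        below[x] = s
    return order, below


def zeta_transform(f, below):
    """Direct zeta transform: g(x) = sum of f(y) over all y <= x."""
    return [sum(f[y] for y in below[x]) for x in range(len(f))]


def moebius_transform(g, order, below):
    """Invert the zeta transform by forward substitution.

    Processing in topological order guarantees that f(y) is already
    known for every y < x when f(x) = g(x) - sum_{y < x} f(y) is formed.
    """
    f = [0] * len(g)
    for x in order:
        f[x] = g[x] - sum(f[y] for y in below[x] if y != x)
    return f


# Diamond poset: 0 < 1, 0 < 2, 1 < 3, 2 < 3.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
order, below = down_sets(4, edges)
f = [1, 2, 3, 4]
g = zeta_transform(f, below)
f_back = moebius_transform(g, order, below)
```

Both transforms visit every comparable pair $(y, x)$ once, which gives the quadratic worst case that the $O(nk)$ algorithms in the abstract improve on.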