Block-Wise Mixed-Precision Quantization: Enabling High Efficiency for Practical ReRAM-based DNN Accelerators

Authors: Xueying Wu, Edward Hanson, Nansu Wang, Qilin Zheng, Xiaoxuan Yang, Huanrui Yang, Shiyu Li, Feng Cheng, Partha Pratim Pande, Janardhan Rao Doppa, Krishnendu Chakrabarty, Hai Li

Abstract: Resistive random access memory (ReRAM)-based processing-in-memory (PIM) architectures have demonstrated great potential to accelerate Deep Neural Network (DNN) training/inference. However, the computational accuracy of analog PIM is compromised due to the non-idealities, such as the conductance variation of ReRAM cells. The impact of these non-idealities worsens as the number of concurrently activated wordlines and bitlines increases. To guarantee computational accuracy, only a limited number of wordlines and bitlines of the crossbar array can be turned on concurrently, significantly reducing the achievable parallelism of the architecture. While the constraints on parallelism limit the efficiency of the accelerators, they also provide a new opportunity for fine-grained mixed-precision quantization. To enable efficient DNN inference on practical ReRAM-based accelerators, we propose an algorithm-architecture co-design framework called Block-Wise mixed-precision Quantization (BWQ). At the algorithm level, BWQ-A introduces a mixed-precision quantization scheme at the block level, which achieves a high weight and activation compression ratio with negligible accuracy degradation. We also present the hardware architecture design BWQ-H, which leverages the low-bit-width models achieved by BWQ-A to perform high-efficiency DNN inference on ReRAM devices. BWQ-H also adopts a novel precision-aware weight mapping method to increase the ReRAM crossbar's throughput. Our evaluation demonstrates the effectiveness of BWQ, which achieves a 6.08x speedup and a 17.47x energy saving on average compared to existing ReRAM-based architectures.

Submitted 27 October, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: 12 pages, 13 figures

arXiv:2308.01790 [pdf, ps, other]

Exact structures for persistence modules

Authors: Benjamin Blanchette, Thomas Brüstle, Eric J. Hanson

Abstract: We discuss applications of exact structures and relative homological algebra to the study of invariants of multiparameter persistence modules. This paper is mostly expository, but does contain a pair of novel results. Over finite posets, classical arguments about the relative projective modules of an exact structure make use of Auslander-Reiten theory. One of our results establishes a new adjunction which allows us to "lift" these arguments to certain infinite posets over which Auslander-Reiten theory is not available. We give several examples of this lifting, in particular highlighting the non-existence and existence of resolutions by upsets when working with finitely presentable representations of the plane and of the closure of the positive quadrant, respectively. We then restrict our attention to finite posets. In this setting, we discuss the relationship between the global dimension of an exact structure and the representation dimension of the incidence algebra of the poset. We conclude with our second novel contribution. This is an explicit description of the irreducible morphisms between relative projective modules for several exact structures which have appeared previously in the literature.

Submitted 16 August, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: v2: corrected typos and minor errors, 25 pages

MSC Class: 55N31; 16G20; 18G25 (primary); 16E10; 16E20; 16S50; 19A49 (secondary)

arXiv:2302.06417 [pdf, other]

Analog, In-memory Compute Architectures for Artificial Intelligence

Authors: Patrick Bowen, Guy Regev, Nir Regev, Bruno Pedroni, Edward Hanson, Yiran Chen

Abstract: This paper presents an analysis of the fundamental limits on energy efficiency in both digital and analog in-memory computing architectures, and compares their performance to single instruction, single data (scalar) machines specifically in the context of machine inference. The focus of the analysis is on how efficiency scales with the size, arithmetic intensity, and bit precision of the computation to be performed. It is shown that analog, in-memory computing architectures can approach arbitrarily high energy efficiency as both the problem size and processor size scales.

Submitted 13 January, 2023; originally announced February 2023.

Comments: 17 pages, 10 figures

arXiv:2212.14337 [pdf, other]

Biologically Plausible Learning on Neuromorphic Hardware Architectures

Authors: Christopher Wolters, Brady Taylor, Edward Hanson, Xiaoxuan Yang, Ulf Schlichtmann, Yiran Chen

Abstract: With an ever-growing number of parameters defining increasingly complex networks, Deep Learning has led to several breakthroughs surpassing human performance. As a result, data movement for these millions of model parameters causes a growing imbalance known as the memory wall. Neuromorphic computing is an emerging paradigm that confronts this imbalance by performing computations directly in analog memories. On the software side, the sequential Backpropagation algorithm prevents efficient parallelization and thus fast convergence. A novel method, Direct Feedback Alignment, resolves inherent layer dependencies by directly passing the error from the output to each layer. At the intersection of hardware/software co-design, there is a demand for developing algorithms that are tolerant of hardware nonidealities. Therefore, this work explores the interrelationship of implementing bio-plausible learning in-situ on neuromorphic hardware, emphasizing energy, area, and latency constraints. Using the benchmarking framework DNN+NeuroSim, we investigate the impact of hardware nonidealities and quantization on algorithm performance, as well as how network topologies and algorithm-level design choices can scale latency, energy and area consumption of a chip. To the best of our knowledge, this work is the first to compare the impact of different learning algorithms on Compute-In-Memory-based hardware and vice versa. The best results achieved for accuracy remain Backpropagation-based, notably when facing hardware imperfections. Direct Feedback Alignment, on the other hand, allows for significant speedup due to parallelization, reducing training time by a factor approaching N for N-layered networks.

Submitted 11 April, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

arXiv:2112.07632 [pdf, ps, other]

Homological approximations in persistence theory

Authors: Benjamin Blanchette, Thomas Brüstle, Eric J. Hanson

Abstract: We define a class of invariants, which we call homological invariants, for persistence modules over a finite poset. Informally, a homological invariant is one that respects some homological data and takes values in the free abelian group generated by a finite set of indecomposable modules. We focus in particular on groups generated by "spread modules", which are sometimes called "interval modules" in the persistence theory literature. We show that both the dimension vector and rank invariant are equivalent to homological invariants taking values in groups generated by spread modules. We also show that the free abelian group generated by the "single-source" spread modules gives rise to a new invariant which is finer than the rank invariant.

Submitted 26 May, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: v2 (sizable update): added numerous references, reorganized paper, added new section on motivation and related work (Section 3), expanded upon the relationship between homological invariants and dimensions of hom-spaces (Theorem 1.1), extended main Theorem 1.2 (formerly Theorem 1.1), corrected errors in comparisons to other invariants (Section 7). 23 pages, comments welcome!

MSC Class: 55N31; 16E20 (primary); 16Z05; 18G35 (secondary)

arXiv:2005.07133 [pdf, other]

PENNI: Pruned Kernel Sharing for Efficient CNN Inference

Authors: Shiyu Li, Edward Hanson, Hai Li, Yiran Chen

Abstract: Although state-of-the-art (SOTA) CNNs achieve outstanding performance on various tasks, their high computation demand and massive number of parameters make it difficult to deploy these SOTA CNNs onto resource-constrained devices. Previous works on CNN acceleration utilize low-rank approximation of the original convolution layers to reduce computation cost. However, these methods are very difficult to conduct upon sparse models, which limits execution speedup since redundancies within the CNN model are not fully exploited. We argue that kernel granularity decomposition can be conducted with low-rank assumption while exploiting the redundancy within the remaining compact coefficients. Based on this observation, we propose PENNI, a CNN model compression framework that is able to achieve model compactness and hardware efficiency simultaneously by (1) implementing kernel sharing in convolution layers via a small number of basis kernels and (2) alternately adjusting bases and coefficients with sparse constraints. Experiments show that we can prune 97% parameters and 92% FLOPs on ResNet18 CIFAR10 with no accuracy loss, and achieve 44% reduction in run-time memory consumption and a 53% reduction in inference latency.

Submitted 24 June, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

Comments: 9 pages, 5 figures, to appear at ICML 2020

arXiv:1909.06981 [pdf, ps, other]

doi 10.4171/90-1/20

Universal proofs of entropic continuity bounds via majorization flow

Authors: Eric P. Hanson, Nilanjana Datta

Abstract: We introduce a notion of majorization flow, and demonstrate it to be a powerful tool for deriving simple and universal proofs of continuity bounds for entropic functions relevant in information theory. In particular, for the case of the alpha-Rényi entropy, whose connections to thermodynamics are discussed in this article, majorization flow yields a Lipschitz continuity bound for the case alpha > 1, thus resolving an open problem and providing a substantial improvement over previously known bounds.

Submitted 22 July, 2021; v1 submitted 16 September, 2019; originally announced September 2019.

Comments: 29 pages; v2: added Cor. 3.2, Section 7, shortened some proofs, minor fixes; v3: added Section 6.2, minor fixes

arXiv:1809.11143 [pdf, ps, other]

Duality between source coding with quantum side information and c-q channel coding

Authors: Hao-Chung Cheng, Eric P. Hanson, Nilanjana Datta, Min-Hsiu Hsieh

Abstract: In this paper, we establish an interesting duality between two different quantum information-processing tasks, namely, classical source coding with quantum side information, and channel coding over c-q channels. The duality relates the optimal error exponents of these two tasks, generalizing the classical results of Ahlswede and Dueck. We establish duality both at the operational level and at the level of the entropic quantities characterizing these exponents. For the latter, the duality is given by an exact relation, whereas for the former, duality manifests itself in the following sense: an optimal coding strategy for one task can be used to construct an optimal coding strategy for the other task. Along the way, we derive a bound on the error exponent for c-q channel coding with constant composition codes which might be of independent interest.

Submitted 28 September, 2018; originally announced September 2018.

Comments: 35 pages

arXiv:1803.07505 [pdf, other]

doi 10.1109/TIT.2020.3038517

Non-Asymptotic Classical Data Compression with Quantum Side Information

Authors: Hao-Chung Cheng, Eric P. Hanson, Nilanjana Datta, Min-Hsiu Hsieh

Abstract: In this paper, we analyze classical data compression with quantum side information (also known as the classical-quantum Slepian-Wolf protocol) in the so-called large and moderate deviation regimes. In the non-asymptotic setting, the protocol involves compressing classical sequences of finite length $n$ and decoding them with the assistance of quantum side information. In the large deviation regime, the compression rate is fixed, and we obtain bounds on the error exponent function, which characterizes the minimal probability of error as a function of the rate. Devetak and Winter showed that the asymptotic data compression limit for this protocol is given by a conditional entropy. For any protocol with a rate below this quantity, the probability of error converges to one asymptotically and its speed of convergence is given by the strong converse exponent function. We obtain finite blocklength bounds on this function, and determine exactly its asymptotic value. In the moderate deviation regime for the compression rate, the latter is no longer considered to be fixed. It is allowed to depend on the blocklength $n$, but assumed to decay slowly to the asymptotic data compression limit. Starting from a rate above this limit, we determine the speed of convergence of the error probability to zero and show that it is given in terms of the conditional information variance. Our results complement earlier results obtained by Tomamichel and Hayashi, in which they analyzed the so-called small deviation regime of this protocol.

Submitted 26 March, 2018; v1 submitted 20 March, 2018; originally announced March 2018.

Comments: 45 pages, 3 figures; v2 added reference [23] (prior work on strong converse exponent lower bounds); v3 fixed typos and added comparisons with reference [23]

arXiv:1707.04249 [pdf, other]

Tight uniform continuity bound for a family of entropies

Authors: Eric P. Hanson, Nilanjana Datta

Abstract: We prove a tight uniform continuity bound for a family of entropies which includes the von Neumann entropy, the Tsallis entropy and the $α$-Rényi entropy, $S_α$, for $α\in (0,1)$. We establish necessary and sufficient conditions for equality in the continuity bound and prove that these conditions are the same for every member of the family. Our result builds on recent work in which we constructed a state which was majorized by every state in a neighbourhood ($\varepsilon$-ball) of a given state, and thus was the minimal state in majorization order in the $\varepsilon$-ball. This minimal state satisfies a particular semigroup property, which we exploit to prove our bound.

Submitted 20 July, 2017; v1 submitted 13 July, 2017; originally announced July 2017.

Comments: 16 pages, 4 figures. v2: added missing definition of Tsallis entropy, corrected minor typos

arXiv:1507.06217 [pdf, other]

Persistence Images: A Stable Vector Representation of Persistent Homology

Authors: Henry Adams, Sofya Chepushtanova, Tegan Emerson, Eric Hanson, Michael Kirby, Francis Motta, Rachel Neville, Chris Peterson, Patrick Shipman, Lori Ziegelmeier

Abstract: Many datasets can be viewed as a noisy sampling of an underlying space, and tools from topological data analysis can characterize this structure for the purpose of knowledge discovery. One such tool is persistent homology, which provides a multiscale description of the homological features within a dataset. A useful representation of this homological information is a persistence diagram (PD). Efforts have been made to map PDs into spaces with additional structure valuable to machine learning tasks. We convert a PD to a finite-dimensional vector representation which we call a persistence image (PI), and prove the stability of this transformation with respect to small perturbations in the inputs. The discriminatory power of PIs is compared against existing methods, showing significant performance gains. We explore the use of PIs with vector-based machine learning tools, such as linear sparse support vector machines, which identify features containing discriminating topological information. Finally, high accuracy inference of parameter values from the dynamic output of a discrete dynamical system (the linked twist map) and a partial differential equation (the anisotropic Kuramoto-Sivashinsky equation) provide a novel application of the discriminatory power of PIs.

Submitted 11 July, 2016; v1 submitted 22 July, 2015; originally announced July 2015.

Comments: Version 3 contains updated theoretical results supporting methodology; expanded discussion of related works; extended list of references; extended applications section; additional experimental results and new figures

ACM Class: F.2.2; I.5.2

Journal ref: Journal of Machine Learning Research 18 (2017), Number 8, 1-35

Showing 1–11 of 11 results for author: Hanson, E