Search | arXiv e-print repository

Layer Ensemble Averaging for Improving Memristor-Based Artificial Neural Network Performance

Authors: Osama Yousuf, Brian Hoskins, Karthick Ramu, Mitchell Fream, William A. Borders, Advait Madhavan, Matthew W. Daniels, Andrew Dienstfrey, Jabez J. McClelland, Martin Lueker-Boden, Gina C. Adam

Abstract: Artificial neural networks have advanced due to scaling dimensions, but conventional computing faces inefficiency due to the von Neumann bottleneck. In-memory computation architectures, like memristors, offer promise but face challenges due to hardware non-idealities. This work proposes and experimentally demonstrates layer ensemble averaging, a technique to map pre-trained neural network solution… ▽ More Artificial neural networks have advanced due to scaling dimensions, but conventional computing faces inefficiency due to the von Neumann bottleneck. In-memory computation architectures, like memristors, offer promise but face challenges due to hardware non-idealities. This work proposes and experimentally demonstrates layer ensemble averaging, a technique to map pre-trained neural network solutions from software to defective hardware crossbars of emerging memory devices and reliably attain near-software performance on inference. The approach is investigated using a custom 20,000-device hardware prototyping platform on a continual learning problem where a network must learn new tasks without catastrophically forgetting previously learned information. Results demonstrate that by trading off the number of devices required for layer mapping, layer ensemble averaging can reliably boost defective memristive network performance up to the software baseline. For the investigated problem, the average multi-task classification accuracy improves from 61 % to 72 % (< 1 % of software baseline) using the proposed approach. △ Less

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2312.13171 [pdf, other]

Programmable electrical coupling between stochastic magnetic tunnel junctions

Authors: Sidra Gibeault, Temitayo N. Adeyeye, Liam A. Pocher, Daniel P. Lathrop, Matthew W. Daniels, Mark D. Stiles, Jabez J. McClelland, William A. Borders, Jason T. Ryan, Philippe Talatchian, Ursula Ebels, Advait Madhavan

Abstract: Superparamagnetic tunnel junctions (SMTJs) are promising sources of randomness for compact and energy efficient implementations of probabilistic computing techniques. Augmenting an SMTJ with electronic circuits, to convert the random telegraph fluctuations of its resistance state to stochastic digital signals, gives a basic building block known as a probabilistic bit or $p$-bit. Though scalable pr… ▽ More Superparamagnetic tunnel junctions (SMTJs) are promising sources of randomness for compact and energy efficient implementations of probabilistic computing techniques. Augmenting an SMTJ with electronic circuits, to convert the random telegraph fluctuations of its resistance state to stochastic digital signals, gives a basic building block known as a probabilistic bit or $p$-bit. Though scalable probabilistic computing methods connecting $p$-bits have been proposed, practical implementations are limited by either minimal tunability or energy inefficient microprocessors-in-the-loop. In this work, we experimentally demonstrate the functionality of a scalable analog unit cell, namely a pair of $p$-bits with programmable electrical coupling. This tunable coupling is implemented with operational amplifier circuits that have a time constant of approximately 1us, which is faster than the mean dwell times of the SMTJs over most of the operating range. Programmability enables flexibility, allowing both positive and negative couplings, as well as coupling devices with widely varying device properties. These tunable coupling circuits can achieve the whole range of correlations from $-1$ to $1$, for both devices with similar timescales, and devices whose time scales vary by an order of magnitude. This range of correlation allows such circuits to be used for scalable implementations of simulated annealing with probabilistic computing. △ Less

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.06446 [pdf, other]

Measurement-driven neural-network training for integrated magnetic tunnel junction arrays

Authors: William A. Borders, Advait Madhavan, Matthew W. Daniels, Vasileia Georgiou, Martin Lueker-Boden, Tiffany S. Santos, Patrick M. Braganca, Mark D. Stiles, Jabez J. McClelland, Brian D. Hoskins

Abstract: The increasing scale of neural networks needed to support more complex applications has led to an increasing requirement for area- and energy-efficient hardware. One route to meeting the budget for these applications is to circumvent the von Neumann bottleneck by performing computation in or near memory. An inevitability of transferring neural networks onto hardware is that non-idealities such as… ▽ More The increasing scale of neural networks needed to support more complex applications has led to an increasing requirement for area- and energy-efficient hardware. One route to meeting the budget for these applications is to circumvent the von Neumann bottleneck by performing computation in or near memory. An inevitability of transferring neural networks onto hardware is that non-idealities such as device-to-device variations or poor device yield impact performance. Methods such as hardware-aware training, where substrate non-idealities are incorporated during network training, are one way to recover performance at the cost of solution generality. In this work, we demonstrate inference on hardware neural networks consisting of 20,000 magnetic tunnel junction arrays integrated on a complementary metal-oxide-semiconductor chips that closely resembles market-ready spin transfer-torque magnetoresistive random access memory technology. Using 36 dies, each containing a crossbar array with its own non-idealities, we show that even a small number of defects in physically mapped networks significantly degrades the performance of networks trained without defects and show that, at the cost of generality, hardware-aware training accounting for specific defects on each die can recover to comparable performance with ideal networks. We then demonstrate a robust training method that extends hardware-aware training to statistics-aware training, producing network weights that perform well on most defective dies regardless of their specific defect locations. When evaluated on the 36 physical dies, statistics-aware trained solutions can achieve a mean misclassification error on the MNIST dataset that differs from the software-baseline by only 2 %. This statistics-aware training method could be generalized to networks with many layers that are mapped to hardware suited for industry-ready applications. △ Less

Submitted 14 May, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: 17 pages, 9 figures

arXiv:2211.15925 [pdf]

doi 10.1109/JETCAS.2023.3238295

Device Modeling Bias in ReRAM-based Neural Network Simulations

Authors: Osama Yousuf, Imtiaz Hossen, Matthew W. Daniels, Martin Lueker-Boden, Andrew Dienstfrey, Gina C. Adam

Abstract: Data-driven modeling approaches such as jump tables are promising techniques to model populations of resistive random-access memory (ReRAM) or other emerging memory devices for hardware neural network simulations. As these tables rely on data interpolation, this work explores the open questions about their fidelity in relation to the stochastic device behavior they model. We study how various jump… ▽ More Data-driven modeling approaches such as jump tables are promising techniques to model populations of resistive random-access memory (ReRAM) or other emerging memory devices for hardware neural network simulations. As these tables rely on data interpolation, this work explores the open questions about their fidelity in relation to the stochastic device behavior they model. We study how various jump table device models impact the attained network performance estimates, a concept we define as modeling bias. Two methods of jump table device modeling, binning and Optuna-optimized binning, are explored using synthetic data with known distributions for benchmarking purposes, as well as experimental data obtained from TiOx ReRAM devices. Results on a multi-layer perceptron trained on MNIST show that device models based on binning can behave unpredictably particularly at low number of points in the device dataset, sometimes over-promising, sometimes under-promising target network accuracy. This paper also proposes device level metrics that indicate similar trends with the modeling bias metric at the network level. The proposed approach opens the possibility for future investigations into statistical device models with better performance, as well as experimentally verified modeling bias in different in-memory computing and neural network architectures. △ Less

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2112.09159 [pdf]

doi 10.1103/PhysRevApplied.18.014039

Implementation of a Binary Neural Network on a Passive Array of Magnetic Tunnel Junctions

Authors: Jonathan M. Goodwill, Nitin Prasad, Brian D. Hoskins, Matthew W. Daniels, Advait Madhavan, Lei Wan, Tiffany S. Santos, Michael Tran, Jordan A. Katine, Patrick M. Braganca, Mark D. Stiles, Jabez J. McClelland

Abstract: The increasing scale of neural networks and their growing application space have produced demand for more energy- and memory-efficient artificial-intelligence-specific hardware. Avenues to mitigate the main issue, the von Neumann bottleneck, include in-memory and near-memory architectures, as well as algorithmic approaches. Here we leverage the low-power and the inherently binary operation of magn… ▽ More The increasing scale of neural networks and their growing application space have produced demand for more energy- and memory-efficient artificial-intelligence-specific hardware. Avenues to mitigate the main issue, the von Neumann bottleneck, include in-memory and near-memory architectures, as well as algorithmic approaches. Here we leverage the low-power and the inherently binary operation of magnetic tunnel junctions (MTJs) to demonstrate neural network hardware inference based on passive arrays of MTJs. In general, transferring a trained network model to hardware for inference is confronted by degradation in performance due to device-to-device variations, write errors, parasitic resistance, and nonidealities in the substrate. To quantify the effect of these hardware realities, we benchmark 300 unique weight matrix solutions of a 2-layer perceptron to classify the Wine dataset for both classification accuracy and write fidelity. Despite device imperfections, we achieve software-equivalent accuracy of up to 95.3 % with proper tuning of network parameters in 15 x 15 MTJ arrays having a range of device sizes. The success of this tuning process shows that new metrics are needed to characterize the performance and quality of networks reproduced in mixed signal hardware. △ Less

Submitted 6 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

Comments: 22 pages plus 8 pages supplemental material; 7 figures plus 7 supplemental figures

Journal ref: Physical Review Applied, 18(1) 014039 (2022)

arXiv:2110.06737 [pdf, other]

doi 10.1103/PhysRevB.105.014411

Easy-plane spin Hall nano-oscillators as spiking neurons for neuromorphic computing

Authors: Danijela Marković, Matthew W. Daniels, Pankaj Sethi, Andrew D. Kent, Mark D. Stiles, Julie Grollier

Abstract: We show analytically using a macrospin approximation that easy-plane spin Hall nano-oscillators excited by a spin-current polarized perpendicularly to the easy-plane have phase dynamics analogous to that of Josephson junctions. Similarly to Josephson junctions, they can reproduce the spiking behavior of biological neurons that is appropriate for neuromorphic computing. We perform micromagnetic sim… ▽ More We show analytically using a macrospin approximation that easy-plane spin Hall nano-oscillators excited by a spin-current polarized perpendicularly to the easy-plane have phase dynamics analogous to that of Josephson junctions. Similarly to Josephson junctions, they can reproduce the spiking behavior of biological neurons that is appropriate for neuromorphic computing. We perform micromagnetic simulations of such oscillators realized in the nano-constriction geometry and show that the easy-plane spiking dynamics is preserved in an experimentally feasible architecture. Finally we simulate two elementary neural network blocks that implement operations essential for neuromorphic computing. First, we show that output spikes energies from two neurons can be summed and injected into a following layer neuron and second, we demonstrate that outputs can be multiplied by synaptic weights implemented by locally modifying the anisotropy. △ Less

Submitted 13 October, 2021; originally announced October 2021.

Comments: 9 pages, 11 figures

arXiv:2106.03604 [pdf, other]

doi 10.1103/PhysRevB.104.054427

Mutual control of stochastic switching for two electrically coupled superparamagnetic tunnel junctions

Authors: Philippe Talatchian, Matthew W. Daniels, Advait Madhavan, Matthew R. Pufall, Emilie Jué, William H. Rippard, Jabez J. McClelland, Mark D. Stiles

Abstract: Superparamagnetic tunnel junctions (SMTJs) are promising sources for the randomness required by some compact and energy-efficient computing schemes. Coupling SMTJs gives rise to collective behavior that could be useful for cognitive computing. We use a simple linear electrical circuit to mutually couple two SMTJs through their stochastic electrical transitions. When one SMTJ makes a thermally indu… ▽ More Superparamagnetic tunnel junctions (SMTJs) are promising sources for the randomness required by some compact and energy-efficient computing schemes. Coupling SMTJs gives rise to collective behavior that could be useful for cognitive computing. We use a simple linear electrical circuit to mutually couple two SMTJs through their stochastic electrical transitions. When one SMTJ makes a thermally induced transition, the voltage across both SMTJs changes, modifying the transition rates of both. This coupling leads to significant correlation between the states of the two devices. Using fits to a generalized Néel-Brown model for the individual thermally bistable magnetic devices, we can accurately reproduce the behavior of the coupled devices with a Markov model. △ Less

Submitted 19 August, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: 12 pages, 11 figures

Journal ref: Phys. Rev. B 104, 054427 (2021)

arXiv:2005.10704 [pdf, other]

doi 10.1109/JXCDC.2020.3022381

Temporal Memory with Magnetic Racetracks

Authors: Hamed Vakili, Mohammad Nazmus Sakib, Samiran Ganguly, Mircea Stan, Matthew W. Daniels, Advait Madhavan, Mark D. Stiles, Avik W. Ghosh

Abstract: Race logic is a relative timing code that represents information in a wavefront of digital edges on a set of wires in order to accelerate dynamic programming and machine learning algorithms. Skyrmions, bubbles, and domain walls are mobile magnetic configurations (solitons) with applications for Boolean data storage. We propose to use current-induced displacement of these solitons on magnetic racet… ▽ More Race logic is a relative timing code that represents information in a wavefront of digital edges on a set of wires in order to accelerate dynamic programming and machine learning algorithms. Skyrmions, bubbles, and domain walls are mobile magnetic configurations (solitons) with applications for Boolean data storage. We propose to use current-induced displacement of these solitons on magnetic racetracks as a native temporal memory for race logic computing. Locally synchronized racetracks can spatially store relative timings of digital edges and provide non-destructive read-out. The linear kinematics of skyrmion motion, the tunability and low-voltage asynchronous operation of the proposed device, and the elimination of any need for constant skyrmion nucleation make these magnetic racetracks a natural memory for low-power, high-throughput race logic applications. △ Less

Submitted 21 May, 2020; originally announced May 2020.

Comments: 9 pages, 3 figures, submitted for review

arXiv:2004.12041 [pdf, other]

Memory-efficient training with streaming dimensionality reduction

Authors: Siyuan Huang, Brian D. Hoskins, Matthew W. Daniels, Mark D. Stiles, Gina C. Adam

Abstract: The movement of large quantities of data during the training of a Deep Neural Network presents immense challenges for machine learning workloads. To minimize this overhead, especially on the movement and calculation of gradient information, we introduce streaming batch principal component analysis as an update algorithm. Streaming batch principal component analysis uses stochastic power iterations… ▽ More The movement of large quantities of data during the training of a Deep Neural Network presents immense challenges for machine learning workloads. To minimize this overhead, especially on the movement and calculation of gradient information, we introduce streaming batch principal component analysis as an update algorithm. Streaming batch principal component analysis uses stochastic power iterations to generate a stochastic k-rank approximation of the network gradient. We demonstrate that the low rank updates produced by streaming batch principal component analysis can effectively train convolutional neural networks on a variety of common datasets, with performance comparable to standard mini batch gradient descent. These results can lead to both improvements in the design of application specific integrated circuits for deep learning and in the speed of synchronization of machine learning models trained with data parallelism. △ Less

Submitted 24 April, 2020; originally announced April 2020.

arXiv:1911.11204 [pdf, other]

doi 10.1103/PhysRevApplied.13.034016

Energy-efficient stochastic computing with superparamagnetic tunnel junctions

Authors: Matthew W. Daniels, Advait Madhavan, Philippe Talatchian, Alice Mizrahi, Mark D. Stiles

Abstract: Superparamagnetic tunnel junctions (SMTJs) have emerged as a competitive, realistic nanotechnology to support novel forms of stochastic computation in CMOS-compatible platforms. One of their applications is to generate random bitstreams suitable for use in stochastic computing implementations. We describe a method for digitally programmable bitstream generation based on pre-charge sense amplifiers… ▽ More Superparamagnetic tunnel junctions (SMTJs) have emerged as a competitive, realistic nanotechnology to support novel forms of stochastic computation in CMOS-compatible platforms. One of their applications is to generate random bitstreams suitable for use in stochastic computing implementations. We describe a method for digitally programmable bitstream generation based on pre-charge sense amplifiers. This generator is significantly more energy efficient than SMTJ-based bitstream generators that tune probabilities with spin currents and a factor of two more efficient than related CMOS-based implementations. The true randomness of this bitstream generator allows us to use them as the fundamental units of a novel neural network architecture. To take advantage of the potential savings, we codesign the algorithm with the circuit, rather than directly transcribing a classical neural network into hardware. The flexibility of the neural network mathematics allows us to adapt the network to the explicitly energy efficient choices we make at the device level. The result is a convolutional neural network design operating at $\approx$ 150 nJ per inference with 97 % performance on MNIST -- a factor of 1.4 to 7.7 improvement in energy efficiency over comparable proposals in the recent literature. △ Less

Submitted 6 March, 2020; v1 submitted 25 November, 2019; originally announced November 2019.

Comments: 20 pages (12 pages main text), 12 figures

Journal ref: Phys. Rev. Applied 13, 034016 (2020)

arXiv:1903.01635 [pdf]

doi 10.3389/fnins.2019.00793

Streaming Batch Eigenupdates for Hardware Neuromorphic Networks

Authors: Brian D. Hoskins, Matthew W. Daniels, Siyuan Huang, Advait Madhavan, Gina C. Adam, Nikolai Zhitenev, Jabez J. McClelland, Mark D. Stiles

Abstract: Neuromorphic networks based on nanodevices, such as metal oxide memristors, phase change memories, and flash memory cells, have generated considerable interest for their increased energy efficiency and density in comparison to graphics processing units (GPUs) and central processing units (CPUs). Though immense acceleration of the training process can be achieved by leveraging the fact that the tim… ▽ More Neuromorphic networks based on nanodevices, such as metal oxide memristors, phase change memories, and flash memory cells, have generated considerable interest for their increased energy efficiency and density in comparison to graphics processing units (GPUs) and central processing units (CPUs). Though immense acceleration of the training process can be achieved by leveraging the fact that the time complexity of training does not scale with the network size, it is limited by the space complexity of stochastic gradient descent, which grows quadratically. The main objective of this work is to reduce this space complexity by using low-rank approximations of stochastic gradient descent. This low spatial complexity combined with streaming methods allows for significant reductions in memory and compute overhead, opening the doors for improvements in area, time and energy efficiency of training. We refer to this algorithm and architecture to implement it as the streaming batch eigenupdate (SBE) approach. △ Less

Submitted 4 March, 2019; originally announced March 2019.

Comments: 13 pages, 5 figures

Journal ref: Frontiers in Neuroscience 13 (2019): 793

Showing 1–11 of 11 results for author: Daniels, M W