-
Employing Vector Field Techniques on the Analysis of Memristor Cellular Nonlinear Networks Cell Dynamics
Authors:
Chandan Singh,
Vasileios Ntinas,
Dimitrios Prousalis,
Yongmin Wang,
Ahmet Samil Demirkol,
Ioannis Messaris,
Vikas Rana,
Stephan Menzel,
Alon Ascoli,
Ronald Tetzlaff
Abstract:
This paper introduces an innovative graphical analysis tool for investigating the dynamics of Memristor Cellular Nonlinear Networks (M-CNNs) featuring 2nd-order processing elements, known as M-CNN cells. In the era of specialized hardware catering to the demands of intelligent autonomous systems, the integration of memristors within Cellular Nonlinear Networks (CNNs) has emerged as a promising paradigm due to their exceptional characteristics. However, the standard Dynamic Route Map (DRM) analysis, applicable to 1st-order systems, fails to address the intricacies of 2nd-order M-CNN cell dynamics, and the 2nd-order DRM (DRM2) is limited in its graphical illustration of local dynamical properties of the M-CNN cells, e.g. the magnitude of the state derivative. To address this limitation, we propose a novel integration of the M-CNN cell vector field into the cell's phase portrait, enhancing the analysis efficacy and enabling efficient M-CNN cell design. A comprehensive exploration of M-CNN cell dynamics is presented, showcasing the utility of the proposed graphical tool for various scenarios, including bistable and monostable behavior, and demonstrating its superior ability to reveal subtle variations in cell behavior. Through this work, we offer a refined perspective on the analysis and design of M-CNNs, paving the way for advanced applications in edge computing and specialized hardware.
Submitted 6 August, 2024;
originally announced August 2024.
-
In-Memory Mirroring: Cloning Without Reading
Authors:
Simranjeet Singh,
Ankit Bende,
Chandan Kumar Jha,
Vikas Rana,
Rolf Drechsler,
Sachin Patkar,
Farhad Merchant
Abstract:
In-memory computing (IMC) has gained significant attention recently as it attempts to reduce the impact of memory bottlenecks. Numerous schemes for digital IMC are presented in the literature, focusing on logic operations. Often, an application's description has data dependencies that must be resolved. Contemporary IMC architectures perform read followed by write operations for this purpose, which results in performance and energy penalties. To solve this fundamental problem, this paper presents in-memory mirroring (IMM). IMM eliminates the need for read and write-back steps, thus avoiding energy and performance penalties. Instead, we perform data movement within memory, involving row-wise and column-wise data transfers. Additionally, the IMM scheme enables parallel cloning of an entire row (word) with a complexity of $\mathcal{O}(1)$. Moreover, we analyze the energy consumption of the proposed technique using a resistive random-access memory crossbar and the experimentally validated JART VCM v1b model. IMM increases energy efficiency and shows a 2$\times$ performance improvement compared to conventional data movement methods.
Submitted 4 July, 2024; v1 submitted 3 July, 2024;
originally announced July 2024.
-
Error Detection and Correction Codes for Safe In-Memory Computations
Authors:
Luca Parrini,
Taha Soliman,
Benjamin Hettwer,
Jan Micha Borrmann,
Simranjeet Singh,
Ankit Bende,
Vikas Rana,
Farhad Merchant,
Norbert Wehn
Abstract:
In-Memory Computing (IMC) introduces a new paradigm of computation that offers high efficiency in terms of latency and power consumption for AI accelerators. However, the non-idealities and defects of emerging technologies used in advanced IMC can severely degrade the accuracy of inferred Neural Networks (NN) and lead to malfunctions in safety-critical applications. In this paper, we investigate an architectural-level mitigation technique based on the coordinated action of multiple checksum codes, to detect and correct errors at run-time. This implementation demonstrates higher efficiency in recovering accuracy across different AI algorithms and technologies compared to more traditional methods such as Triple Modular Redundancy (TMR). The results show that several configurations of our implementation recover more than 91% of the original accuracy with less than half of the area required by TMR and less than 40% of latency overhead.
Submitted 15 April, 2024;
originally announced April 2024.
-
Experimental Validation of Memristor-Aided Logic Using 1T1R TaOx RRAM Crossbar Array
Authors:
Ankit Bende,
Simranjeet Singh,
Chandan Kumar Jha,
Tim Kempen,
Felix Cüppers,
Christopher Bengel,
Andre Zambanini,
Dennis Nielinger,
Sachin Patkar,
Rolf Drechsler,
Rainer Waser,
Farhad Merchant,
Vikas Rana
Abstract:
The memristor-aided logic (MAGIC) design style holds high promise for realizing digital logic-in-memory functionality. The ability to implement a specific gate in the MAGIC design style hinges on the SET-to-RESET threshold ratio. The TaOx memristive devices exhibit distinct SET-to-RESET ratios, enabling the implementation of OR and NOT operations. As the adoption of the MAGIC design style gains momentum, it becomes crucial to understand the breakdown of energy consumption in the various phases of its operation. This paper presents experimental demonstrations of the OR and NOT gates on a 1T1R crossbar array. Additionally, it provides insights into the energy distribution for performing these operations at different stages. Through our experiments across different gates, we found that the energy consumption is dominated by initialization in the MAGIC design style. The energy breakdown is 14.8%, 85%, and 0.2% for execution, initialization, and read operations, respectively.
Submitted 16 October, 2023;
originally announced October 2023.
-
MemSPICE: Automated Simulation and Energy Estimation Framework for MAGIC-Based Logic-in-Memory
Authors:
Simranjeet Singh,
Chandan Kumar Jha,
Ankit Bende,
Vikas Rana,
Sachin Patkar,
Rolf Drechsler,
Farhad Merchant
Abstract:
Existing logic-in-memory (LiM) research is limited to generating mappings and micro-operations. In this paper, we present MemSPICE, a novel framework that addresses this gap by automatically generating both the netlist and testbench needed to evaluate the LiM on a memristive crossbar. MemSPICE goes beyond conventional approaches by providing energy estimation scripts to calculate the precise energy consumption of the testbench at the SPICE level. We propose an automated framework that utilizes the mapping obtained from the SIMPLER tool to perform accurate energy estimation through SPICE simulations. To the best of our knowledge, no existing framework is capable of generating a SPICE netlist from a hardware description language. By offering a comprehensive solution for SPICE-based netlist generation, testbench creation, and accurate energy estimation, MemSPICE empowers researchers and engineers working on memristor-based LiM to enhance their understanding and optimization of energy usage in these systems. Finally, we tested the circuits from the ISCAS'85 benchmark on MemSPICE and conducted a detailed energy analysis.
Submitted 9 September, 2023;
originally announced September 2023.
-
Semi-Quantitative Group Testing for Efficient and Accurate qPCR Screening of Pathogens with a Wide Range of Loads
Authors:
Ananthan Nambiar,
Chao Pan,
Vishal Rana,
Mahdi Cheraghchi,
João Ribeiro,
Sergei Maslov,
Olgica Milenkovic
Abstract:
Pathogenic infections pose a significant threat to global health, affecting millions of people every year and presenting substantial challenges to healthcare systems worldwide. Efficient and timely testing plays a critical role in disease control and transmission prevention. Group testing is a well-established method for reducing the number of tests needed to screen large populations when the disease prevalence is low. However, it does not fully utilize the quantitative information provided by qPCR methods, nor is it able to accommodate a wide range of pathogen loads. To address these issues, we introduce a novel adaptive semi-quantitative group testing (SQGT) scheme to efficiently screen populations via two-stage qPCR testing. The SQGT method quantizes cycle threshold ($Ct$) values into multiple bins, leveraging the information from the first stage of screening to improve the detection sensitivity. Dynamic $Ct$ threshold adjustments mitigate dilution effects and enhance test accuracy. Comparisons with traditional binary outcome GT methods show that SQGT reduces the number of tests by 24% while maintaining a negligible false negative rate.
Submitted 2 August, 2023; v1 submitted 30 July, 2023;
originally announced July 2023.
-
Should We Even Optimize for Execution Energy? Rethinking Mapping for MAGIC Design Style
Authors:
Simranjeet Singh,
Chandan Kumar Jha,
Ankit Bende,
Phrangboklang Lyngton Thangkhiew,
Vikas Rana,
Sachin Patkar,
Rolf Drechsler,
Farhad Merchant
Abstract:
Memristor-based logic-in-memory (LiM) has become popular as a means to overcome the von Neumann bottleneck in traditional data-intensive computing. Recently, the memristor-aided logic (MAGIC) design style has gained immense traction for LiM due to its simplicity. However, understanding the energy distribution during the design of logic operations within the memristive memory is crucial in assessing such an implementation's significance. The current energy estimation methods rely on coarse-grained techniques, which underestimate the energy consumption of MAGIC-styled operations performed on a memristor crossbar. To address this issue, we analyze the energy breakdown in MAGIC operations and propose a solution that utilizes mapping from the SIMPLER MAGIC tool to achieve accurate energy estimation through SPICE simulations. In contrast to existing research that primarily focuses on optimizing execution energy, our findings reveal that the memristor's initialization energy in the MAGIC design style is, on average, 68x higher than the execution energy. We demonstrate that this initialization energy dominates the overall energy consumption. By highlighting this aspect, we aim to redirect the attention of designers towards developing algorithms and strategies that prioritize optimizations in initialization rather than execution for more effective energy savings.
Submitted 7 July, 2023;
originally announced July 2023.
-
Finite State Automata Design using 1T1R ReRAM Crossbar
Authors:
Simranjeet Singh,
Omar Ghazal,
Chandan Kumar Jha,
Vikas Rana,
Rolf Drechsler,
Rishad Shafik,
Alex Yakovlev,
Sachin Patkar,
Farhad Merchant
Abstract:
Data movement costs constitute a significant bottleneck in modern machine learning (ML) systems. When combined with the computational complexity of algorithms, such as neural networks, designing hardware accelerators with a low energy footprint remains challenging. Finite state automata (FSA) constitute a type of computation model used as a low-complexity learning unit in ML systems. The implementation of an FSA consists of a number of memory states. However, an FSA can be in only one of these states at a given time. It switches to another state based on the present state and the input to the FSA. Due to its natural synergy with memory, it is a promising candidate for in-memory computing for reduced data movement costs. This work focuses on a novel FSA implementation using resistive RAM (ReRAM) for state storage in series with a CMOS transistor for biasing controls. We propose using multi-level ReRAM technology capable of transitioning between states depending on bias pulse amplitude and duration. We use an asynchronous control circuit for writing each ReRAM-transistor cell for the on-demand switching of the FSA. We investigate the impact of the device-to-device and cycle-to-cycle variations on the cell and show that FSA transitions can be seamlessly achieved without degradation of performance. Through extensive experimental evaluation, we demonstrate the implementation of FSA on a 1T1R ReRAM crossbar.
Submitted 30 June, 2023; v1 submitted 26 April, 2023;
originally announced April 2023.
-
Integrated Architecture for Neural Networks and Security Primitives using RRAM Crossbar
Authors:
Simranjeet Singh,
Furqan Zahoor,
Gokulnath Rajendran,
Vikas Rana,
Sachin Patkar,
Anupam Chattopadhyay,
Farhad Merchant
Abstract:
This paper proposes an architecture that integrates neural networks (NNs) and hardware security modules using a single resistive random access memory (RRAM) crossbar. The proposed architecture enables using a single crossbar to implement NN, true random number generator (TRNG), and physical unclonable function (PUF) applications while exploiting the multi-state storage characteristic of the RRAM crossbar for the vector-matrix multiplication operation required for the implementation of NN. The TRNG is implemented by utilizing the crossbar's variation in device switching thresholds to generate random bits. The PUF is implemented using the same crossbar initialized as an entropy source for the TRNG. Additionally, the weights locking concept is introduced to enhance the security of NNs by preventing unauthorized access to the NN weights. The proposed architecture provides flexibility to configure the RRAM device in multiple modes to suit different applications. It shows promise in achieving a more efficient and compact design for the hardware implementation of NNs and security primitives.
Submitted 1 May, 2023; v1 submitted 26 April, 2023;
originally announced April 2023.
-
Machine Unlearning of Federated Clusters
Authors:
Chao Pan,
Jin Sima,
Saurav Prakash,
Vishal Rana,
Olgica Milenkovic
Abstract:
Federated clustering (FC) is an unsupervised learning problem that arises in a number of practical applications, including personalized recommender and healthcare systems. With the adoption of recent laws ensuring the "right to be forgotten", the problem of machine unlearning for FC methods has become of significant importance. We introduce, for the first time, the problem of machine unlearning for FC, and propose an efficient unlearning mechanism for a customized secure FC framework. Our FC framework utilizes special initialization procedures that we show are well-suited for unlearning. To protect client data privacy, we develop the secure compressed multiset aggregation (SCMA) framework that addresses sparse secure federated learning (FL) problems encountered during clustering as well as more general problems. To simultaneously facilitate low communication complexity and secret sharing protocols, we integrate Reed-Solomon encoding with special evaluation points into our SCMA pipeline, and prove that the client communication cost is logarithmic in the vector dimension. Additionally, to demonstrate the benefits of our unlearning mechanism over complete retraining, we provide a theoretical analysis for the unlearning performance of our approach. Simulation results show that the new FC framework exhibits superior clustering performance compared to previously reported FC baselines when the cluster sizes are highly imbalanced. Compared to completely retraining K-means++ locally and globally for each removal request, our unlearning procedure offers an average speed-up of roughly 84x across seven datasets. Our implementation for the proposed method is available at https://github.com/thupchnsky/mufc.
Submitted 30 June, 2023; v1 submitted 28 October, 2022;
originally announced October 2022.
-
Short Blocklength Wiretap Channel Codes via Deep Learning: Design and Performance Evaluation
Authors:
Vidhi Rana,
Remi A. Chou
Abstract:
We design short blocklength codes for the Gaussian wiretap channel under information-theoretic security guarantees. Our approach consists in decoupling the reliability and secrecy constraints in our code design. Specifically, we handle the reliability constraint via an autoencoder, and handle the secrecy constraint with hash functions. For blocklengths smaller than or equal to 128, we evaluate through simulations the probability of error at the legitimate receiver and the leakage at the eavesdropper for our code construction. This leakage is defined as the mutual information between the confidential message and the eavesdropper's channel observations, and is empirically measured via a neural network-based mutual information estimator. Our simulation results provide examples of codes with positive secrecy rates that outperform the best known achievable secrecy rates obtained non-constructively for the Gaussian wiretap channel. Additionally, we show that our code design is suitable for the compound and arbitrarily varying Gaussian wiretap channels, for which the channel statistics are not perfectly known but only known to belong to a pre-specified uncertainty set. These models not only capture uncertainty related to channel statistics estimation, but also scenarios where the eavesdropper jams the legitimate transmission or influences its own channel statistics by changing its location.
Submitted 23 January, 2023; v1 submitted 7 June, 2022;
originally announced June 2022.
-
NeuroHammer: Inducing Bit-Flips in Memristive Crossbar Memories
Authors:
Felix Staudigl,
Hazem Al Indari,
Daniel Schön,
Dominik Sisejkovic,
Farhad Merchant,
Jan Moritz Joseph,
Vikas Rana,
Stephan Menzel,
Rainer Leupers
Abstract:
Emerging non-volatile memory (NVM) technologies offer unique advantages in energy efficiency, latency, and features such as computing-in-memory. Consequently, emerging NVM technologies are considered an ideal substrate for computation and storage in future-generation neuromorphic platforms. These technologies need to be evaluated for fundamental reliability and security issues. In this paper, we present NeuroHammer, a security threat in ReRAM crossbars caused by thermal crosstalk between memory cells. We demonstrate that bit-flips can be deliberately induced in ReRAM devices in a crossbar by systematically writing adjacent memory cells. A simulation flow is developed to evaluate NeuroHammer and the impact of physical parameters on the effectiveness of the attack. Finally, we discuss the security implications in the context of possible attack scenarios.
Submitted 6 December, 2021; v1 submitted 2 December, 2021;
originally announced December 2021.
-
Secret Sharing from Correlated Gaussian Random Variables and Public Communication
Authors:
Vidhi Rana,
Remi A. Chou,
Hyuck Kwon
Abstract:
In this paper, we study an information-theoretic secret sharing problem, where a dealer distributes shares of a secret among a set of participants under the following constraints: (i) authorized sets of users can recover the secret by pooling their shares, and (ii) non-authorized sets of colluding users cannot learn any information about the secret. We assume that the dealer and participants observe the realizations of correlated Gaussian random variables and that the dealer can communicate with participants through a one-way, authenticated, rate-limited, and public channel. Unlike traditional secret sharing protocols, in our setting, no perfectly secure channel is needed between the dealer and the participants. Our main result is a closed-form characterization of the fundamental trade-off between secret rate and public communication rate.
Submitted 11 November, 2021; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Performance Portability Strategies for Grid C++ Expression Templates
Authors:
Peter A. Boyle,
M. A. Clark,
Carleton DeTar,
Meifeng Lin,
Verinder Rana,
Alejandro Vaquero Avilés-Casco
Abstract:
One of the key requirements for the Lattice QCD Application Development as part of the US Exascale Computing Project is performance portability across multiple architectures. Using the Grid C++ expression template as a starting point, we report on the progress made with regard to the Grid GPU offloading strategies. We present both the successes and issues encountered in using CUDA, OpenACC and Just-In-Time compilation. Experimentation and performance on GPUs with an SU(3)$\times$SU(3) streaming test will be reported. We will also report on the challenges of using current OpenMP 4.x for GPU offloading in the same code.
△ Less
Submitted 25 October, 2017;
originally announced October 2017.