Search | arXiv e-print repository

How to Parameterize Asymmetric Quantization Ranges for Quantization-Aware Training

Authors: Jaeseong You, Minseop Park, Kyunggeun Lee, Seokjun An, Chirag Patel, Markus Nagel

Abstract: This paper investigates three different parameterizations of asymmetric uniform quantization for quantization-aware training: (1) scale and offset, (2) minimum and maximum, and (3) beta and gamma. We perform a comprehensive comparative analysis of these parameterizations' influence on quantization-aware training, using both controlled experiments and real-world large language models. Our particular focus is on their changing behavior in response to critical training hyperparameters, bit width and learning rate. Based on our investigation, we propose best practices to stabilize and accelerate quantization-aware training with learnable asymmetric quantization ranges.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2401.11002 [pdf, other]

Fast Registration of Photorealistic Avatars for VR Facial Animation

Authors: Chaitanya Patel, Shaojie Bai, Te-Li Wang, Jason Saragih, Shih-En Wei

Abstract: Virtual Reality (VR) bears the promise of social interactions that can feel more immersive than other media. Key to this is the ability to accurately animate a photorealistic avatar of one's likeness while wearing a VR headset. Although high quality registration of person-specific avatars to headset-mounted camera (HMC) images is possible in an offline setting, the performance of generic realtime models is significantly degraded. Online registration is also challenging due to oblique camera views and differences in modality. In this work, we first show that the domain gap between the avatar and headset-camera images is one of the primary sources of difficulty, where a transformer-based architecture achieves high accuracy on domain-consistent data, but degrades when the domain-gap is re-introduced. Building on this finding, we develop a system design that decouples the problem into two parts: 1) an iterative refinement module that takes in-domain inputs, and 2) a generic avatar-guided image-to-image style transfer module that is conditioned on current estimation of expression and head pose. These two modules reinforce each other, as image style transfer becomes easier when close-to-ground-truth examples are shown, and better domain-gap removal helps registration. Our system produces high-quality results efficiently, obviating the need for costly offline registration to generate personalized labels. We validate the accuracy and efficiency of our approach through extensive experiments on a commodity headset, demonstrating significant improvements over direct regression methods as well as offline registration.

Submitted 19 January, 2024; originally announced January 2024.

Comments: Project page: https://chaitanya100100.github.io/FastRegistration/

arXiv:2309.14670 [pdf, other]

DONNAv2 -- Lightweight Neural Architecture Search for Vision tasks

Authors: Sweta Priyadarshi, Tianyu Jiang, Hsin-Pai Cheng, Sendil Krishna, Viswanath Ganapathy, Chirag Patel

Abstract: With the growing demand for vision applications and deployment across edge devices, the development of hardware-friendly architectures that maintain performance during device deployment becomes crucial. Neural architecture search (NAS) techniques explore various approaches to discover efficient architectures for diverse learning tasks in a computationally efficient manner. In this paper, we present the next-generation neural architecture design for computationally efficient neural architecture distillation - DONNAv2. Conventional NAS algorithms rely on a computationally extensive stage where an accuracy predictor is learned to estimate model performance within search space. This building of accuracy predictors helps them predict the performance of models that are not being finetuned. Here, we have developed an elegant approach to eliminate building the accuracy predictor and extend DONNA to a computationally efficient setting. The loss metric of individual blocks forming the network serves as the surrogate performance measure for the sampled models in the NAS search stage. To validate the performance of DONNAv2 we have performed extensive experiments involving a range of diverse vision tasks including classification, object detection, image denoising, super-resolution, and panoptic perception network (YOLOP). The hardware-in-the-loop experiments were carried out using the Samsung Galaxy S10 mobile platform. Notably, DONNAv2 reduces the computational cost of DONNA by 10x for the larger datasets. Furthermore, to improve the quality of NAS search space, DONNAv2 leverages a block knowledge distillation filter to remove blocks with high inference costs.

Submitted 26 September, 2023; originally announced September 2023.

Comments: Accepted at ICCV-Workshop on Resource-Efficient Deep Learning, 2023

arXiv:2309.07730 [pdf, other]

AIDPS:Adaptive Intrusion Detection and Prevention System for Underwater Acoustic Sensor Networks

Authors: Soumadeep Das, Aryan Mohammadi Pasikhani, Prosanta Gope, John A. Clark, Chintan Patel, Biplab Sikdar

Abstract: Underwater Acoustic Sensor Networks (UW-ASNs) are predominantly used for underwater environments and find applications in many areas. However, a lack of security considerations, the unstable and challenging nature of the underwater environment, and the resource-constrained nature of the sensor nodes used for UW-ASNs (which makes them incapable of adopting security primitives) make the UW-ASN prone to vulnerabilities. This paper proposes an Adaptive decentralised Intrusion Detection and Prevention System called AIDPS for UW-ASNs. The proposed AIDPS can improve the security of the UW-ASNs so that they can efficiently detect underwater-related attacks (e.g., blackhole, grayhole and flooding attacks). To determine the most effective configuration of the proposed construction, we conduct a number of experiments using several state-of-the-art machine learning algorithms (e.g., Adaptive Random Forest (ARF), light gradient-boosting machine, and K-nearest neighbours) and concept drift detection algorithms (e.g., ADWIN, kdqTree, and Page-Hinkley). Our experimental results show that incremental ARF using ADWIN provides optimal performance when implemented with One-class support vector machine (SVM) anomaly-based detectors. Furthermore, our extensive evaluation results also show that the proposed scheme outperforms state-of-the-art bench-marking methods while providing a wider range of desirable features such as scalability and complexity.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2309.01729 [pdf, other]

Softmax Bias Correction for Quantized Generative Models

Authors: Nilesh Prasad Pandey, Marios Fournarakis, Chirag Patel, Markus Nagel

Abstract: Post-training quantization (PTQ) is the go-to compression technique for large generative models, such as stable diffusion or large language models. PTQ methods commonly keep the softmax activation in higher precision as it has been shown to be very sensitive to quantization noise. However, this can lead to a significant runtime and power overhead during inference on resource-constrained edge devices. In this work, we investigate the source of the softmax sensitivity to quantization and show that the quantization operation leads to a large bias in the softmax output, causing accuracy degradation. To overcome this issue, we propose an offline bias correction technique that improves the quantizability of softmax without additional compute during deployment, as it can be readily absorbed into the quantization parameters. We demonstrate the effectiveness of our method on stable diffusion v1.5 and 125M-size OPT language model, achieving significant accuracy improvement for 8-bit quantized softmax.

Submitted 4 September, 2023; originally announced September 2023.

arXiv:2305.06640 [pdf, other]

Speaker Diaphragm Excursion Prediction: deep attention and online adaptation

Authors: Yuwei Ren, Matt Zivney, Yin Huang, Eddie Choy, Chirag Patel, Hao Xu

Abstract: A speaker protection algorithm leverages the playback signal properties to prevent over-excursion while maintaining maximum loudness, especially for mobile phones with tiny loudspeakers. This paper proposes efficient DL solutions to accurately model and predict the nonlinear excursion, which is challenging for conventional solutions. Firstly, we build the experiment and pre-processing pipeline, where the feedback current and voltage are sampled as input, and a laser is employed to measure the excursion as ground truth. Secondly, one FFTNet model is proposed to explore the dominant low-frequency and other unknown harmonics, and is compared to a baseline ConvNet model. In addition, BN re-estimation is designed to explore the online adaptation; and INT8 quantization based on AI Model efficiency toolkit (AIMET; AIMET is a product of Qualcomm Innovation Center, Inc.) is applied to further reduce the complexity. The proposed algorithm is verified in two speakers and 3 typical deployment scenarios, and >99% residual DC is less than 0.1 mm, much better than traditional solutions.

Submitted 11 May, 2023; originally announced May 2023.

Comments: 5 pages, 4 figures, ICASSP 2023

arXiv:2305.03266 [pdf, other]

RARES: Runtime Attack Resilient Embedded System Design Using Verified Proof-of-Execution

Authors: Avani Dave, Nilanjan Banerjee, Chintan Patel

Abstract: Modern society is getting accustomed to the Internet of Things (IoT) and Cyber-Physical Systems (CPS) for a variety of applications that involve security-critical user data and information transfers. In the lower end of the spectrum, these devices are resource-constrained with no attack protection. They become a soft target for malicious code modification attacks that steal and misuse device data in malicious activities. The resilient system requires continuous detection, prevention, and/or recovery and correct code execution (including in degraded mode). By and large, existing security primitives (e.g., secure-boot, Remote Attestation (RA), Control Flow Attestation (CFA) and Data Flow Attestation (DFA)) focus on detection and prevention, leaving the proof of code execution and recovery unanswered. To this end, the proposed work presents lightweight RARES -- Runtime Attack Resilient Embedded System design using verified Proof-of-Execution. It presents the first custom hardware control register (Ctrl_register) based runtime memory modification attacks classification and detection technique. It further demonstrates the Proof Of Concept (POC) implementation of use-case-specific attacks prevention and onboard recovery techniques. The prototype implementation on Artix 7 Field Programmable Gate Array (FPGA) and state-of-the-art comparison demonstrates very low (2.3%) resource overhead and efficacy of the proposed solution.

Submitted 4 May, 2023; originally announced May 2023.

arXiv:2304.11489 [pdf, other]

FVCARE:Formal Verification of Security Primitives in Resilient Embedded SoCs

Authors: Avani Dave, Nilanjan Banerjee, Chintan Patel

Abstract: With the increased utilization, the small embedded and IoT devices have become an attractive target for sophisticated attacks that can exploit the devices' security-critical information and data in malevolent activities. Secure boot and Remote Attestation (RA) techniques verify the integrity of the device's software state at boot-time and runtime. Correct implementation and formal verification of these security primitives provide strong security guarantees and enhance user confidence. The formal verification of these security primitives is considered challenging, as it involves complex hardware-software interactions and semantic gaps, and requires bit-precise reasoning. To address these challenges, this paper presents FVCARE, an end-to-end system co-verification framework. It also defines the security properties for resilient small embedded systems. FVCARE divides the end-to-end system co-verification problem into two modules: 1) verifying the (bit-precise) initial system settings, registers, and access control policies by hardware verification techniques, and 2) verifying the system specification, security properties, and functional correctness using source-level software abstraction of the hardware. The evaluation of proposed techniques on SRACARE based systems demonstrates its efficacy in security co-verification.

Submitted 22 April, 2023; originally announced April 2023.

arXiv:2303.17951 [pdf, other]

FP8 versus INT8 for efficient deep learning inference

Authors: Mart van Baalen, Andrey Kuzmin, Suparna S Nair, Yuwei Ren, Eric Mahurin, Chirag Patel, Sundar Subramanian, Sanghyuk Lee, Markus Nagel, Joseph Soriaga, Tijmen Blankevoort

Abstract: Recently, the idea of using FP8 as a number format for neural network training has been floating around the deep learning world. Given that most training is currently conducted with entire networks in FP32, or sometimes FP16 with mixed-precision, the step to having some parts of a network run in FP8 with 8-bit weights is an appealing potential speed-up for the generally costly and time-intensive training procedures in deep learning. A natural question arises regarding what this development means for efficient inference on edge devices. In the efficient inference device world, workloads are frequently executed in INT8. Sometimes going even as low as INT4 when efficiency calls for it. In this whitepaper, we compare the performance for both the FP8 and INT formats for efficient on-device inference. We theoretically show the difference between the INT and FP formats for neural networks and present a plethora of post-training quantization and quantization-aware-training results to show how this theory translates to practice. We also provide a hardware analysis showing that the FP formats are somewhere between 50-180% less efficient in terms of compute in dedicated hardware than the INT format. Based on our research and a read of the research field, we conclude that although the proposed FP8 format could be good for training, the results for inference do not warrant a dedicated implementation of FP8 in favor of INT8 for efficient inference. We show that our results are mostly consistent with previous findings but that important comparisons between the formats have thus far been lacking. Finally, we discuss what happens when FP8-trained networks are converted to INT8 and conclude with a brief discussion on the most efficient way for on-device deployment and an extensive suite of INT8 results for many models.

Submitted 15 June, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

arXiv:2302.05397 [pdf, other]

A Practical Mixed Precision Algorithm for Post-Training Quantization

Authors: Nilesh Prasad Pandey, Markus Nagel, Mart van Baalen, Yin Huang, Chirag Patel, Tijmen Blankevoort

Abstract: Neural network quantization is frequently used to optimize model size, latency and power consumption for on-device deployment of neural networks. In many cases, a target bit-width is set for an entire network, meaning every layer gets quantized to the same number of bits. However, for many networks some layers are significantly more robust to quantization noise than others, leaving an important axis of improvement unused. As many hardware solutions provide multiple different bit-width settings, mixed-precision quantization has emerged as a promising solution to find a better performance-efficiency trade-off than homogeneous quantization. However, most existing mixed precision algorithms are rather difficult to use for practitioners as they require access to the training data, have many hyper-parameters to tune or even depend on end-to-end retraining of the entire model. In this work, we present a simple post-training mixed precision algorithm that only requires a small unlabeled calibration dataset to automatically select suitable bit-widths for each layer for desirable on-device performance. Our algorithm requires no hyper-parameter tuning, is robust to data variation and takes into account practical hardware deployment constraints making it a great candidate for practical use. We experimentally validate our proposed method on several computer vision tasks, natural language processing tasks and many different networks, and show that we can find mixed precision networks that provide a better trade-off between accuracy and efficiency than their homogeneous bit-width equivalents.

Submitted 10 February, 2023; originally announced February 2023.

arXiv:2211.07090 [pdf, other]

Hand gesture recognition using 802.11ad mmWave sensor in the mobile device

Authors: Yuwei Ren, Jiuyuan Lu, Andrian Beletchi, Yin Huang, Ilia Karmanov, Daniel Fontijne, Chirag Patel, Hao Xu

Abstract: We explore the feasibility of AI assisted hand-gesture recognition using 802.11ad 60GHz (mmWave) technology in smartphones. Range-Doppler information (RDI) is obtained by using pulse Doppler radar for gesture recognition. We built a prototype system, where radar sensing and WLAN communication waveform can coexist by time-division duplex (TDD), to demonstrate the real-time hand-gesture inference. It can gather sensing data and predict gestures within 100 milliseconds. First, we build the pipeline for the real-time feature processing, which is robust to occasional frame drops in the data stream. RDI sequence restoration is implemented to handle the frame dropping in the continuous data stream, and also applied to data augmentation. Second, different gestures RDI are analyzed, where finger and hand motions can clearly show distinctive features. Third, five typical gestures (swipe, palm-holding, pull-push, finger-sliding and noise) are experimented with, and a classification framework is explored to segment the different gestures in the continuous gesture sequence with arbitrary inputs. We evaluate our architecture on a large multi-person dataset and report > 95% accuracy with one CNN + LSTM model. Further, a pure CNN model is developed to fit to on-device implementation, which minimizes the inference latency, power consumption and computation cost. And the accuracy of this CNN model is more than 93% with only 2.29K parameters.

Submitted 13 November, 2022; originally announced November 2022.

Comments: 6 pages, 12 figures

Journal ref: 2021 IEEE Wireless Communications and Networking Conference Workshops (WCNCW)

arXiv:2208.06165 [pdf, other]

Customer Empowered Privacy-Preserving Secure Verification using Decentralized Identifier and Verifiable Credentials For Product Delivery Using Robots

Authors: Chintan Patel

Abstract: In the age of respiratory illnesses like COVID-19, we understand the necessity for a robot-based delivery system to ensure safe and contact-free courier delivery. A blockchain based Dynamic IDentifier gives people total power over their identities while preserving auditability and anonymity. A human mobile phone and a robot are machines created with a chip, making it simple to deploy a physical unclonable function based verification system between the robot and the customer. This article presents a novel framework and a first customer verification scheme for verified courier delivery utilizing the blockchain enabled DID and PUF enabled robots. We employ DID for customer authentication between a robot (a service provider) and a customer and PUF for robot verification by the customer. We've also put the proposed work into practice and demonstrated its capabilities in terms of throughput, latency, computing cost, and communication cost. We also show formal security proof for the proposed user verification scheme based on the tamarin prover.

Submitted 12 August, 2022; originally announced August 2022.

arXiv:2208.02592 [pdf, other]

Resilient Risk based Adaptive Authentication and Authorization (RAD-AA) Framework

Authors: Jaimandeep Singh, Chintan Patel, Naveen Kumar Chaudhary

Abstract: In recent cyber attacks, credential theft has emerged as one of the primary vectors of gaining entry into the system. Once attacker(s) have a foothold in the system, they use various techniques including token manipulation to elevate the privileges and access protected resources. This makes authentication and token based authorization a critical component for a secure and resilient cyber system. In this paper we discuss the design considerations for such a secure and resilient authentication and authorization framework capable of self-adapting based on the risk scores and trust profiles. We compare this design with the existing standards such as OAuth 2.0, OpenID Connect and SAML 2.0. We then study popular threat models such as STRIDE and PASTA and summarize the resilience of the proposed architecture against common and relevant threat vectors. We call this framework Resilient Risk based Adaptive Authentication and Authorization (RAD-AA). The proposed framework excessively increases the cost for an adversary to launch and sustain any cyber attack and provides much-needed strength to critical infrastructure. We also discuss the machine learning (ML) approach for the adaptive engine to accurately classify transactions and arrive at risk scores.

Submitted 29 November, 2022; v1 submitted 4 August, 2022; originally announced August 2022.

arXiv:2207.10353 [pdf, other]

Secure Lightweight Authentication for Multi User IoT Environment

Authors: Chintan Patel

Abstract: The Internet of Things (IoT) is giving a boost to a plethora of new opportunities for the robust and sustainable deployment of cyber physical systems. The cornerstone of any IoT system is the sensing devices. These sensing devices have considerable resource constraints, including insufficient battery capacity, CPU capability, and physical security. Because of such resource constraints, designing lightweight cryptographic protocols is an opportunity. Remote User Authentication ensures that two parties establish a secure and durable session key. This study presents a lightweight and safe authentication strategy for the user-gateway (U GW) IoT network model. The proposed system is designed leveraging Elliptic Curve Cryptography (ECC). We undertake a formal security analysis with both the Automated Validation of Internet Security Protocols (AVISPA) and Burrows Abadi Needham (BAN) logic tools and an information security assessment with the Dolev Yao channel. We use the publish-subscribe based Message Queuing Telemetry Transport (MQTT) protocol for communication. Additionally, the performance analysis and comparison of security features show that the proposed scheme is resilient to well known cryptographic threats.

Submitted 21 July, 2022; originally announced July 2022.

arXiv:2207.02706 [pdf, other]

LDA-2IoT : A Level Dependent Authentication using Two Factor for IoT Paradigm

Authors: Chintan Patel, Nishant Doshi

Abstract: The widespread expansion of IoT based services is changing people's living habits. With vast data generation and intelligent decision support systems, the IoT is supporting many industries to improve their products and services. The major challenge for IoT developers is to design a secure data transmission system and a trustworthy inter device and user device communication system. The data starts its journey from the sensing devices and reaches the user dashboard through a different medium. Authentication between two IoT devices provides a reliable and lightweight key generation system. In this paper, we put forward a novel authentication approach for the IoT paradigm. We postulate an ECC based two factor Level Dependent Authentication for Generic IoT (LDA 2IoT) in which users at a particular level in the hierarchy can access the sensors deployed at below or the equal level of the hierarchy. We impart the security analysis for the proposed LDA 2IoT based on the Dolev Yao channel and widely accepted random oracle based ROR model. We provide the implementation of the proposed scheme using the MQTT protocol. Finally, we set forth a performance analysis for the proposed LDA 2IoT system by comparing it with the other existing scheme.

Submitted 6 July, 2022; originally announced July 2022.

arXiv:2201.08442 [pdf, other]

Neural Network Quantization with AI Model Efficiency Toolkit (AIMET)

Authors: Sangeetha Siddegowda, Marios Fournarakis, Markus Nagel, Tijmen Blankevoort, Chirag Patel, Abhijit Khobare

Abstract: While neural networks have advanced the frontiers in many machine learning applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is vital to integrating modern networks into edge devices with strict power and compute requirements. Neural network quantization is one of the most effective ways of achieving these savings, but the additional noise it induces can lead to accuracy degradation. In this white paper, we present an overview of neural network quantization using AI Model Efficiency Toolkit (AIMET). AIMET is a library of state-of-the-art quantization and compression algorithms designed to ease the effort required for model optimization and thus drive the broader AI ecosystem towards low latency and energy-efficient inference. AIMET provides users with the ability to simulate as well as optimize PyTorch and TensorFlow models. Specifically for quantization, AIMET includes various post-training quantization (PTQ, cf. chapter 4) and quantization-aware training (QAT, cf. chapter 5) techniques that guarantee near floating-point accuracy for 8-bit fixed-point inference. We provide a practical guide to quantization via AIMET by covering PTQ and QAT workflows, code examples and practical tips that enable users to efficiently and effectively quantize models using AIMET and reap the benefits of low-bit integer inference.

Submitted 20 January, 2022; originally announced January 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2106.08295

arXiv:2111.10079 [pdf, other]

Evaluating Self and Semi-Supervised Methods for Remote Sensing Segmentation Tasks

Authors: Chaitanya Patel, Shashank Sharma, Valerie J. Pasquarella, Varun Gulshan

Abstract: Self- and semi-supervised machine learning techniques leverage unlabeled data for improving downstream task performance. These methods are especially valuable for remote sensing tasks where producing labeled ground truth datasets can be prohibitively expensive but there is easy access to a wealth of unlabeled imagery. We perform a rigorous evaluation of SimCLR, a self-supervised method, and FixMatch, a semi-supervised method, on three remote sensing tasks: riverbed segmentation, land cover mapping, and flood mapping. We quantify performance improvements on these remote sensing segmentation tasks when additional imagery outside of the original supervised dataset is made available for training. We also design experiments to test the effectiveness of these techniques when the test set is domain shifted to sample different geographic areas compared to the training and validation sets. We find that such techniques significantly improve generalization performance when labeled data is limited and there are geographic domain shifts between the training data and the validation/test data.

Submitted 19 June, 2022; v1 submitted 19 November, 2021; originally announced November 2021.

arXiv:2101.06300 [pdf, other]

CARE: Lightweight Attack Resilient Secure Boot Architecture with Onboard Recovery for RISC-V based SOC

Authors: Avani Dave, Nilanjan Banerjee, Chintan Patel

Abstract: Recent technological advancements have proliferated the use of small embedded devices for collecting, processing, and transferring security-critical information. The Internet of Things (IoT) has enabled remote access and control of these network-connected devices. Consequently, an attacker can exploit security vulnerabilities and compromise these devices. In this context, secure boot becomes a useful security mechanism to verify the integrity and authenticity of the software state of the devices. However, current secure boot schemes focus on detecting the presence of potential malware on the device but not on disinfecting and restoring the software to a benign state. This manuscript presents CARE, the first secure boot framework that provides detection, resilience, and an onboard recovery mechanism for compromised devices. The framework uses a prototype hybrid Code Authentication and Resilience Engine (CARE) to verify the software state and restore it to a benign state. It uses Physical Memory Protection (PMP) and other security-enhancing techniques of the RISC-V processor to provide resilience from modern attacks. The state-of-the-art comparison and performance analysis results indicate that the proposed secure boot framework provides a promising resilience and recovery mechanism with very little (8%) performance and resource overhead.

Submitted 15 January, 2021; originally announced January 2021.

arXiv:2101.06148 [pdf, other]

SRACARE: Secure Remote Attestation with Code Authentication and Resilience Engine

Authors: Avani Dave, Nilanjan Banerjee, Chintan Patel

Abstract: Recent technological advancements have enabled the proliferation of small embedded and IoT devices for collecting, processing, and transferring security-critical information and user data. This exponential use has acted as a catalyst in the recent growth of sophisticated attacks such as replay, man-in-the-middle, and malicious code modification to slink, leak, tweak or exploit the security-critical information in malevolent activities. Therefore, secure communication and software state assurance (at run-time and boot-time) of the device have emerged as open security problems. Furthermore, these devices need to have an appropriate recovery mechanism to bring them back to the known-good operational state. Previous researchers have demonstrated independent methods for attack detection and safeguard. However, the majority of them lack onboard system recovery and secure communication techniques. To bridge this gap, this manuscript proposes SRACARE, a framework that utilizes a custom lightweight, secure communication protocol that performs remote/local attestation, and secure boot with an onboard resilience recovery mechanism to protect the devices from the above-mentioned attacks. The prototype employs an efficient lightweight, low-power 32-bit RISC-V processor, secure communication protocol, code authentication, and resilience engine running on the Artix 7 Field Programmable Gate Array (FPGA) board. This work presents the performance evaluation and state-of-the-art comparison results, which show promising resilience to attacks and demonstrate the novel protection mechanism with onboard recovery. The framework achieves these with only 8% performance overhead and a very small increase in hardware-software footprint.

Submitted 15 January, 2021; originally announced January 2021.

arXiv:2003.04583 [pdf, other]

TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style

Authors: Chaitanya Patel, Zhouyingcheng Liao, Gerard Pons-Moll

Abstract: In this paper, we present TailorNet, a neural model which predicts clothing deformation in 3D as a function of three factors: pose, shape and style (garment geometry), while retaining wrinkle detail. This goes beyond prior models, which are either specific to one style and shape, or generalize to different shapes producing smooth results, despite being style specific. Our hypothesis is that (even non-linear) combinations of examples smooth out high frequency components such as fine wrinkles, which makes learning the three factors jointly hard. At the heart of our technique is a decomposition of deformation into a high frequency and a low frequency component. While the low-frequency component is predicted from pose, shape and style parameters with an MLP, the high-frequency component is predicted with a mixture of shape-style specific pose models. The weights of the mixture are computed with a narrow bandwidth kernel to guarantee that only predictions with similar high-frequency patterns are combined. The style variation is obtained by computing, in a canonical pose, a subspace of deformation, which satisfies physical constraints such as inter-penetration, and draping on the body. TailorNet delivers 3D garments which retain the wrinkles from the physics-based simulations (PBS) it is learned from, while running more than 1000 times faster. In contrast to PBS, TailorNet is easy to use and fully differentiable, which is crucial for computer vision algorithms. Several experiments demonstrate TailorNet produces more realistic results than prior work, and even generates temporally coherent deformations on sequences of the AMASS dataset, despite being trained on static poses from a different dataset. To stimulate further research in this direction, we will make a dataset consisting of 55800 frames, as well as our model publicly available at https://virtualhumans.mpi-inf.mpg.de/tailornet.

Submitted 15 March, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

Comments: Accepted to CVPR 2020. Chaitanya Patel and Zhouyingcheng Liao contributed equally

arXiv:1911.12491 [pdf, other]

QKD: Quantization-aware Knowledge Distillation

Authors: Jangho Kim, Yash Bhalgat, Jinwon Lee, Chirag Patel, Nojun Kwak

Abstract: Quantization and Knowledge distillation (KD) methods are widely used to reduce memory and power consumption of deep neural networks (DNNs), especially for resource-constrained edge devices. Although their combination is quite promising to meet these requirements, it may not work as desired. It is mainly because the regularization effect of KD further diminishes the already reduced representation power of a quantized model. To address this shortcoming, we propose Quantization-aware Knowledge Distillation (QKD) wherein quantization and KD are carefully coordinated in three phases. First, the Self-studying (SS) phase fine-tunes a quantized low-precision student network without KD to obtain a good initialization. Second, the Co-studying (CS) phase tries to train a teacher to make it more quantization-friendly and powerful than a fixed teacher. Finally, the Tutoring (TU) phase transfers knowledge from the trained teacher to the student. We extensively evaluate our method on ImageNet and CIFAR-10/100 datasets and show an ablation study on networks with both standard and depthwise-separable convolutions. The proposed QKD outperformed existing state-of-the-art methods (e.g., 1.3% improvement on ResNet-18 with W4A4, 2.6% on MobileNetV2 with W4A4). Additionally, QKD could recover the full-precision accuracy at as low as W3A3 quantization on ResNet and W6A6 quantization on MobileNetV2.

Submitted 27 November, 2019; originally announced November 2019.

arXiv:1908.06544 [pdf, other]

HumanMeshNet: Polygonal Mesh Recovery of Humans

Authors: Abbhinav Venkat, Chaitanya Patel, Yudhik Agrawal, Avinash Sharma

Abstract: 3D Human Body Reconstruction from a monocular image is an important problem in computer vision with applications in virtual and augmented reality platforms, the animation industry, the e-commerce domain, etc. While several of the existing works formulate it as a volumetric or parametric learning with complex and indirect reliance on re-projections of the mesh, we would like to focus on implicitly learning the mesh representation. To that end, we propose a novel model, HumanMeshNet, that regresses a template mesh's vertices, as well as receives a regularization by the 3D skeletal locations in a multi-branch, multi-task setup. The image to mesh vertex regression is further regularized by the neighborhood constraint imposed by mesh topology ensuring smooth surface reconstruction. The proposed paradigm can theoretically learn local surface deformations induced by body shape variations and can therefore learn high-resolution meshes going ahead. We show comparable performance with SoA (in terms of surface and joint error) with far lower computational complexity and modeling cost, and therefore real-time reconstructions, on three publicly available datasets. We also show the generalizability of the proposed paradigm for a similar task of predicting hand mesh models. Given these initial results, we would like to exploit the mesh topology in an explicit manner going ahead.

Submitted 18 August, 2019; originally announced August 2019.

Comments: to appear in ICCV-W, 2019. Project: https://github.com/yudhik11/HumanMeshNet

arXiv:1603.02393 [pdf, other]

doi 10.1109/TCAD.2017.2717782

Microprocessor Optimizations for the Internet of Things: A Survey

Authors: Tosiron Adegbija, Anita Rogacs, Chandrakant Patel, Ann Gordon-Ross

Abstract: The Internet of Things (IoT) refers to a pervasive presence of interconnected and uniquely identifiable physical devices. These devices' goal is to gather data and drive actions in order to improve productivity, and ultimately reduce or eliminate reliance on human intervention for data acquisition, interpretation, and use. The proliferation of these connected low-power devices will result in a data explosion that will significantly increase data transmission costs with respect to energy consumption and latency. Edge computing reduces these costs by performing computations at the edge nodes, prior to data transmission, to interpret and/or utilize the data. While much research has focused on the IoT's connected nature and communication challenges, the challenges of IoT embedded computing with respect to device microprocessors have received much less attention. This paper explores IoT applications' execution characteristics from a microarchitectural perspective and the microarchitectural characteristics that will enable efficient and effective edge computing. To tractably represent a wide variety of next-generation IoT applications, we present a broad IoT application classification methodology based on application functions, to enable quicker workload characterizations for IoT microprocessors. We then survey and discuss potential microarchitectural optimizations and computing paradigms that will enable the design of right-provisioned microprocessors that are efficient, configurable, extensible, and scalable. This paper provides a foundation for the analysis and design of a diverse set of microprocessor architectures for next-generation IoT devices.

Submitted 20 February, 2018; v1 submitted 8 March, 2016; originally announced March 2016.

Comments: Published at IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD); Special Issue on Circuit and System Design for Internet of Things

Showing 1–23 of 23 results for author: Patel, C