Search | arXiv e-print repository

An Evaluation of Continual Learning for Advanced Node Semiconductor Defect Inspection

Authors: Amit Prasad, Bappaditya Dey, Victor Blanco, Sandip Halder

Abstract: Deep learning-based semiconductor defect inspection has gained traction in recent years, offering a powerful and versatile approach that provides high accuracy, adaptability, and efficiency in detecting and classifying nano-scale defects. However, semiconductor manufacturing processes are continually evolving, leading to the emergence of new types of defects over time. This presents a significant challenge for conventional supervised defect detectors, as they may suffer from catastrophic forgetting when trained on new defect datasets, potentially compromising performance on previously learned tasks. An alternative approach involves the constant storage of previously trained datasets alongside pre-trained model versions, which can be utilized for (re-)training from scratch or fine-tuning whenever encountering a new defect dataset. However, adhering to such a storage template is impractical in terms of size, particularly when considering High-Volume Manufacturing (HVM). Additionally, semiconductor defect datasets, especially those encompassing stochastic defects, are often limited and expensive to obtain, thus lacking sufficient representation of the entire universal set of defectivity. This work introduces a task-agnostic, meta-learning approach aimed at addressing this challenge, which enables the incremental addition of new defect classes and scales to create a more robust and generalized model for semiconductor defect inspection. We have benchmarked our approach using real resist-wafer SEM (Scanning Electron Microscopy) datasets for two process steps, ADI and AEI, demonstrating its superior performance compared to conventional supervised training methods.

Submitted 17 July, 2024; originally announced July 2024.

Comments: Accepted for presentation at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2024 Industry Track

arXiv:2407.10348 [pdf, other]

Addressing Class Imbalance and Data Limitations in Advanced Node Semiconductor Defect Inspection: A Generative Approach for SEM Images

Authors: Bappaditya Dey, Vic De Ridder, Victor Blanco, Sandip Halder, Bartel Van Waeyenberge

Abstract: Precision in identifying nanometer-scale device-killer defects is crucial in both semiconductor research and development as well as in production processes. The effectiveness of existing ML-based approaches in this context is largely limited by the scarcity of data, as the production of real semiconductor wafer data for training these models involves high financial and time costs. Moreover, the existing simulation methods fall short of replicating images with identical noise characteristics, surface roughness and stochastic variations at advanced nodes. We propose a method for generating synthetic semiconductor SEM images using a diffusion model within a limited data regime. In contrast to images generated through conventional simulation methods, SEM images generated through our proposed DL method closely resemble real SEM images, replicating their noise characteristics and surface roughness adaptively. Our main contributions, which are validated on three different real semiconductor datasets, are: i) proposing a patch-based generative framework utilizing DDPM to create SEM images with intended defect classes, addressing challenges related to class-imbalance and data insufficiency, ii) demonstrating generated synthetic images closely resemble real SEM images acquired from the tool, preserving all imaging conditions and metrology characteristics without any metadata supervision, iii) demonstrating a defect detector trained on generated defect dataset, either independently or combined with a limited real dataset, can achieve similar or improved performance on real wafer SEM images during validation/testing compared to exclusive training on a real defect dataset, iv) demonstrating the ability of the proposed approach to transfer defect types, critical dimensions, and imaging conditions from one specified CD/Pitch and metrology specifications to another, thereby highlighting its versatility.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 8 pages, 11 figures, to be presented at 2024 International Symposium ELMAR, and published by IEEE in the conference proceedings

arXiv:2405.09309 [pdf, ps, other]

Identification via Permutation Channels

Authors: Abhishek Sarkar, Bikash Kumar Dey

Abstract: We study message identification over a $q$-ary uniform permutation channel, where the transmitted vector is permuted by a permutation chosen uniformly at random. For discrete memoryless channels (DMCs), the number of identifiable messages grows doubly exponentially. Identification capacity, the maximum second-order exponent, is known to be the same as the Shannon capacity of the DMC. Permutation channels support reliable communication of only polynomially many messages. A simple achievability result shows that message sizes growing as $2^{c_nn^{q-1}}$ are identifiable for any $c_n\rightarrow 0$. We prove two converse results. A ``soft'' converse shows that for any $R>0$, there is no sequence of identification codes with message size growing as $2^{Rn^{q-1}}$ with a power-law decay ($n^{-\mu}$) of the error probability. We also prove a ``strong'' converse showing that for any sequence of identification codes with message size $2^{Rn^{q-1}\log n}$ ($R>0$), the sum of type I and type II error probabilities approaches at least $1$ as $n\rightarrow \infty$. To prove the soft converse, we use a sequence of steps to construct a new identification code with a simpler structure which relates to a set system, and then use a lower bound on the normalized maximum pairwise intersection of a set system. To prove the strong converse, we use results on approximation of distributions.

Submitted 4 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

Comments: 9 pages. Extended and generalized version of submission to ITW 2024

MSC Class: 68P30; 94A15

arXiv:2404.05862 [pdf, other]

Towards Improved Semiconductor Defect Inspection for high-NA EUVL based on SEMI-SuperYOLO-NAS

Authors: Ying-Lin Chen, Jacob Deforce, Vic De Ridder, Bappaditya Dey, Victor Blanco, Sandip Halder, Philippe Leray

Abstract: Due to potential pitch reduction, the semiconductor industry is adopting High-NA EUVL technology. However, its low depth of focus presents challenges for High Volume Manufacturing. To address this, suppliers are exploring thinner photoresists and new underlayers/hardmasks. These may suffer from poor SNR, complicating defect detection. Vision-based ML algorithms offer a promising solution for semiconductor defect inspection. However, developing a robust ML model across various image resolutions without explicit training remains a challenge for nano-scale defect inspection. This research's goal is to propose a scale-invariant ADCD framework capable of upscaling images, addressing this issue. We propose an improvised ADCD framework as SEMI-SuperYOLO-NAS, which builds upon the baseline YOLO-NAS architecture. This framework integrates a SR assisted branch to aid in learning HR features by the defect detection backbone, particularly for detecting nano-scale defect instances from LR images. Additionally, the SR-assisted branch can recursively generate upscaled images from their corresponding downscaled counterparts, enabling defect detection inference across various image resolutions without requiring explicit training. Moreover, we investigate an improved data augmentation strategy aimed at generating diverse and realistic training datasets to enhance model performance. We have evaluated our proposed approach using two original FAB datasets obtained from two distinct processes and captured using two different imaging tools. Finally, we demonstrate zero-shot inference for our model on a new dataset, originating from a process condition distinct from the training dataset and possessing different Pitch characteristics. Experimental validation demonstrates that our proposed ADCD framework aids in increasing the throughput of imaging tools for defect inspection by reducing the required image pixel resolutions.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2401.08733 [pdf, other]

In the Eyes of the Bystander: Are the Stances on Different Conflicts Correlated?

Authors: Yiyao Tao, Hengyu Zhang, Babli Dey, Selenge Tulga, Hanjia Lyu, Jiebo Luo

Abstract: Public opinion on international conflicts, such as the concurrent Russia-Ukraine and Israel-Palestine crises, often reflects a society's values, beliefs, and history. These simultaneous conflicts have sparked heated global online discussions, offering a unique opportunity to explore the dynamics of public opinion in multiple international crises. This study investigates how public opinions toward one conflict might influence or relate to another, a relatively unexplored area in contemporary research. Focusing on Chinese netizens, who represent a significant online population, this study examines their perspectives, which are increasingly influential in global discourse due to China's unique cultural and political landscape. The research finds a range of opinions, including neutral stances towards both conflicts and a statistical correlation between attitudes towards each, indicating interconnected or mutually influenced viewpoints. The study also highlights the significant role of news media, particularly in China, where state policies and global politics shape conflict portrayal, in impacting public opinion.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2312.09462 [pdf, other]

Applying Machine Learning Models on Metrology Data for Predicting Device Electrical Performance

Authors: Bappaditya Dey, Anh Tuan Ngo, Sara Sacchi, Victor Blanco, Philippe Leray, Sandip Halder

Abstract: Moore's Law states that transistor density will double every two years, which is sustained until today due to continuous multi-directional innovations, such as extreme ultraviolet lithography, novel patterning techniques etc., leading the semiconductor industry towards 3nm node and beyond. For any patterning scheme, the most important metric to evaluate the quality of printed patterns is EPE, with overlay being its largest contribution. Overlay errors can lead to fatal failures of IC devices such as short circuits or broken connections in terms of P2P electrical contacts. Therefore, it is essential to develop effective overlay analysis and control techniques to ensure good functionality of fabricated semiconductor devices. In this work we have used an imec N14 BEOL process flow using LELE patterning technique to print metal layers with minimum pitch of 48nm with 193i lithography. FF structures are decomposed into two mask layers (M1A and M1B) and then the LELE flow is carried out to make the final patterns. Since a single M1 layer is decomposed into two masks, control of overlay between the two masks is critical. The goal of this work is two-fold: (a) to quantify the impact of overlay on capacitance and (b) to see if we can predict the final capacitance measurements with selected machine learning models at an early stage. To do so, scatterometry spectra are collected on these electrical test structures at (a) post litho, (b) post TiN hardmask etch, and (c) post Cu plating and CMP. Critical Dimension and overlay measurements for line-space pattern are done with SEM post litho, post etch and post Cu CMP. Various machine learning models are applied to do the capacitance prediction with multiple metrology inputs at different steps of wafer processing. Finally, we demonstrate that by using appropriate machine learning models we are able to do better prediction of electrical results.

Submitted 20 November, 2023; originally announced December 2023.

arXiv:2312.01921 [pdf, other]

A Machine Learning Approach Towards SKILL Code Autocompletion

Authors: Enrique Dehaerne, Bappaditya Dey, Wannes Meert

Abstract: As Moore's Law continues to increase the complexity of electronic systems, Electronic Design Automation (EDA) must advance to meet global demand. An important example of an EDA technology is SKILL, a scripting language used to customize and extend EDA software. Recently, code generation models using the transformer architecture have achieved impressive results in academic settings and have even been used in commercial developer tools to improve developer productivity. To the best of our knowledge, this study is the first to apply transformers to SKILL code autocompletion towards improving the productivity of hardware design engineers. In this study, a novel, data-efficient methodology for generating SKILL code is proposed and experimentally validated. More specifically, we propose a novel methodology for (i) creating a high-quality SKILL dataset with both unlabeled and labeled data, (ii) a training strategy where T5 models pre-trained on general programming language code are fine-tuned on our custom SKILL dataset using unsupervised and supervised learning, and (iii) evaluating synthesized SKILL code. We show that models trained using the proposed methodology outperform baselines in terms of human-judgment score and BLEU score. A major challenge faced was the extremely small amount of available SKILL code data that can be used to train a transformer model to generate SKILL code. Despite our validated improvements, the extremely small dataset available to us was still not enough to train a model that can reliably autocomplete SKILL code. We discuss this and other limitations as well as future work that could address these limitations.

Submitted 24 February, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: Accepted for SPIE Advanced Lithography + Patterning, 2024

ACM Class: I.2.2

arXiv:2311.11439 [pdf, other]

Improved Defect Detection and Classification Method for Advanced IC Nodes by Using Slicing Aided Hyper Inference with Refinement Strategy

Authors: Vic De Ridder, Bappaditya Dey, Victor Blanco, Sandip Halder, Bartel Van Waeyenberge

Abstract: In semiconductor manufacturing, lithography has often been the manufacturing step defining the smallest possible pattern dimensions. In recent years, progress has been made towards high-NA (Numerical Aperture) EUVL (Extreme-Ultraviolet-Lithography) paradigm, which promises to advance pattern shrinking (2 nm node and beyond). However, a significant increase in stochastic defects and the complexity of defect detection becomes more pronounced with high-NA. Present defect inspection techniques (both non-machine learning and machine learning based) fail to achieve satisfactory performance at high-NA dimensions. In this work, we investigate the use of the Slicing Aided Hyper Inference (SAHI) framework for improving upon current techniques. Using SAHI, inference is performed on size-increased slices of the SEM images. This leads to the object detector's receptive field being more effective in capturing small defect instances. First, the performance on previously investigated semiconductor datasets is benchmarked across various configurations, and the SAHI approach is demonstrated to substantially enhance the detection of small defects, by approx. 2x. Afterwards, we also demonstrate that application of SAHI leads to flawless detection rates on a new test dataset, with scenarios not encountered during training, whereas previously trained models failed. Finally, we formulate an extension of SAHI that does not significantly reduce true-positive predictions while eliminating false-positive predictions.

Submitted 21 November, 2023; v1 submitted 19 November, 2023; originally announced November 2023.

Comments: 12 pages, 9 figures, to be presented at International Conference on Machine Intelligence with Applications (ICMIA), and to be published in conference proceedings by AIP

arXiv:2311.11145 [pdf, other]

doi 10.1109/ELMAR59410.2023.10253916

Benchmarking Feature Extractors for Reinforcement Learning-Based Semiconductor Defect Localization

Authors: Enrique Dehaerne, Bappaditya Dey, Sandip Halder, Stefan De Gendt

Abstract: As semiconductor patterning dimensions shrink, more advanced Scanning Electron Microscopy (SEM) image-based defect inspection techniques are needed. Recently, many Machine Learning (ML)-based approaches have been proposed for defect localization and have shown impressive results. These methods often rely on feature extraction from a full SEM image and possibly a number of regions of interest. In this study, we propose a deep Reinforcement Learning (RL)-based approach to defect localization which iteratively extracts features from increasingly smaller regions of the input image. We compare the results of 18 agents trained with different feature extractors. We discuss the advantages and disadvantages of different feature extractors as well as the RL-based framework in general for semiconductor defect localization.

Submitted 18 November, 2023; originally announced November 2023.

Comments: 5 pages, 5 figures, 3 tables

ACM Class: I.4.9

Journal ref: 2023 International Symposium ELMAR, Zadar, Croatia, 2023, pp. 49-53

arXiv:2310.14815 [pdf, other]

Deep learning denoiser assisted roughness measurements extraction from thin resists with low Signal-to-Noise Ratio(SNR) SEM images: analysis with SMILE

Authors: Sara Sacchi, Bappaditya Dey, Iacopo Mochi, Sandip Halder, Philippe Leray

Abstract: The technological advance of High Numerical Aperture Extreme Ultraviolet Lithography (High NA EUVL) has opened the gates to extensive research on thinner photoresists (below 30nm), necessary for the industrial implementation of High NA EUVL. Consequently, images from Scanning Electron Microscopy (SEM) suffer from reduced imaging contrast and low Signal-to-Noise Ratio (SNR), impacting the measurement of unbiased Line Edge Roughness (uLER) and Line Width Roughness (uLWR). Thus, the aim of this work is to enhance the SNR of SEM images by using a Deep Learning denoiser and enable robust roughness extraction of the thin resist. For this study, we acquired SEM images of Line-Space (L/S) patterns with a Chemically Amplified Resist (CAR) with different thicknesses (15nm, 20nm, 25nm, 30nm), underlayers (Spin-On-Glass-SOG, Organic Underlayer-OUL) and frames of averaging (4, 8, 16, 32, and 64 Fr). After denoising, a systematic analysis has been carried out on both noisy and denoised images using an open-source metrology software, SMILE 2.3.2, for investigating mean CD, SNR improvement factor, biased and unbiased LWR/LER Power Spectral Density (PSD). Denoised images with lower number of frames present unaltered Critical Dimensions (CDs), enhanced SNR (especially for low number of integration frames), and accurate measurements of uLER and uLWR, with the same accuracy as for noisy images with a consistently higher number of frames. Therefore, images with a small number of integration frames and with SNR < 2 can be successfully denoised, and advantageously used in improving metrology throughput while maintaining reliable roughness measurements for the thin resist.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2309.11174 [pdf, ps, other]

Byzantine Multiple Access Channels -- Part II: Communication With Adversary Identification

Authors: Neha Sangwan, Mayank Bakshi, Bikash Kumar Dey, Vinod M. Prabhakaran

Abstract: We introduce the problem of determining the identity of a byzantine user (internal adversary) in a communication system. We consider a two-user discrete memoryless multiple access channel where either user may deviate from the prescribed behaviour. Owing to the noisy nature of the channel, it may be overly restrictive to attempt to detect all deviations. In our formulation, we only require detecting deviations which impede the decoding of the non-deviating user's message. When neither user deviates, correct decoding is required. When one user deviates, the decoder must either output a pair of messages of which the message of the non-deviating user is correct or identify the deviating user. The users and the receiver do not share any randomness. The results include a characterization of the set of channels where communication is feasible, and an inner and outer bound on the capacity region. We also show that whenever the rate region has non-empty interior, the capacity region is the same as the capacity region under randomized encoding, where each user shares independent randomness with the receiver. We also give an outer bound for this randomized coding capacity region.

Submitted 20 September, 2023; originally announced September 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2105.03380

arXiv:2308.08376 [pdf, other]

Automated Semiconductor Defect Inspection in Scanning Electron Microscope Images: a Systematic Review

Authors: Thibault Lechien, Enrique Dehaerne, Bappaditya Dey, Victor Blanco, Sandip Halder, Stefan De Gendt, Wannes Meert

Abstract: A growing need exists for efficient and accurate methods for detecting defects in semiconductor materials and devices. These defects can have a detrimental impact on the efficiency of the manufacturing process, because they cause critical failures and wafer-yield limitations. As nodes and patterns get smaller, even high-resolution imaging techniques such as Scanning Electron Microscopy (SEM) produce noisy images due to operating close to sensitivity levels and due to varying physical properties of different underlayers or resist materials. This inherent noise is one of the main challenges for defect inspection. One promising approach is the use of machine learning algorithms, which can be trained to accurately classify and locate defects in semiconductor samples. Recently, convolutional neural networks have proved to be particularly useful in this regard. This systematic review provides a comprehensive overview of the state of automated semiconductor defect inspection on SEM images, including the most recent innovations and developments. 38 publications were selected on this topic, indexed in IEEE Xplore and SPIE databases. For each of these, the application, methodology, dataset, results, limitations and future work were summarized. A comprehensive overview and analysis of their methods is provided. Finally, promising avenues for future work in the field of SEM-based defect inspection are suggested.

Submitted 18 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: 16 pages, 12 figures, 3 tables

arXiv:2308.07180 [pdf, other]

SEMI-CenterNet: A Machine Learning Facilitated Approach for Semiconductor Defect Inspection

Authors: Vic De Ridder, Bappaditya Dey, Enrique Dehaerne, Sandip Halder, Stefan De Gendt, Bartel Van Waeyenberge

Abstract: Continual shrinking of pattern dimensions in the semiconductor domain is making it increasingly difficult to inspect defects due to factors such as the presence of stochastic noise and the dynamic behavior of defect patterns and types. Conventional rule-based methods and non-parametric supervised machine learning algorithms like KNN mostly fail at the requirements of semiconductor defect inspection at these advanced nodes. Deep Learning (DL)-based methods have gained popularity in the semiconductor defect inspection domain because they have been proven robust towards these challenging scenarios. In this research work, we have presented an automated DL-based approach for efficient localization and classification of defects in SEM images. We have proposed SEMI-CenterNet (SEMI-CN), a customized CN architecture trained on SEM images of semiconductor wafer defects. The use of the proposed CN approach allows improved computational efficiency compared to previously studied DL models. SEMI-CN gets trained to output the center, class, size, and offset of a defect instance. This is different from the approach of most object detection models that use anchors for bounding box prediction. Previous methods predict redundant bounding boxes, most of which are discarded in postprocessing. CN mitigates this by only predicting boxes for likely defect center points. We train SEMI-CN on two datasets and benchmark two ResNet backbones for the framework. Initially, ResNet models pretrained on the COCO dataset undergo training using two datasets separately. Primarily, SEMI-CN shows significant improvement in inference time against previous research works. Finally, transfer learning (using weights of custom SEM dataset) is applied from ADI dataset to AEI dataset and vice-versa, which reduces the required training time for both backbones to reach the best mAP against conventional training method.

Submitted 15 August, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

arXiv:2307.15516 [pdf, other]

doi 10.1117/12.2675573

YOLOv8 for Defect Inspection of Hexagonal Directed Self-Assembly Patterns: A Data-Centric Approach

Authors: Enrique Dehaerne, Bappaditya Dey, Hossein Esfandiar, Lander Verstraete, Hyo Seon Suh, Sandip Halder, Stefan De Gendt

Abstract: Shrinking pattern dimensions leads to an increased variety of defect types in semiconductor devices. This has spurred innovation in patterning approaches such as Directed self-assembly (DSA) for which no traditional, automatic defect inspection software exists. Machine Learning-based SEM image analysis has become an increasingly popular research topic for defect inspection with supervised ML models often showing the best performance. However, little research has been done on obtaining a dataset with high-quality labels for these supervised models. In this work, we propose a method for obtaining coherent and complete labels for a dataset of hexagonal contact hole DSA patterns while requiring minimal quality control effort from a DSA expert. We show that YOLOv8, a state-of-the-art neural network, achieves defect detection precisions of more than 0.9 mAP on our final dataset which best reflects DSA expert defect labeling expectations. We discuss the strengths and limitations of our proposed labeling approach and suggest directions for future work in data-centric ML-based defect inspection.

Submitted 28 July, 2023; originally announced July 2023.

Comments: 8 pages, 10 figures, accepted for the 38th EMLC Conference 2023

ACM Class: I.4.9

Journal ref: Proceedings Volume 12802, 38th European Mask and Lithography Conference (EMLC 2023); 128020S (2023)

arXiv:2307.08693 [pdf, other]

SEMI-DiffusionInst: A Diffusion Model Based Approach for Semiconductor Defect Classification and Segmentation

Authors: Vic De Ridder, Bappaditya Dey, Sandip Halder, Bartel Van Waeyenberge

Abstract: With continuous progression of Moore's Law, integrated circuit (IC) device complexity is also increasing. Scanning Electron Microscope (SEM) image based extensive defect inspection and accurate metrology extraction are two main challenges in advanced node (2 nm and beyond) technology. Deep learning (DL) algorithm based computer vision approaches gained popularity in semiconductor defect inspection over the last few years. In this research work, a new semiconductor defect inspection framework "SEMI-DiffusionInst" is investigated and compared to previous frameworks. To the best of the authors' knowledge, this work is the first demonstration to accurately detect and precisely segment semiconductor defect patterns by using a diffusion model. Different feature extractor networks as backbones and data sampling strategies are investigated towards achieving a balanced trade-off between precision and computing efficiency. Our proposed approach outperforms previous work on overall mAP and performs comparatively better or on par for almost all defect classes (per class APs). The bounding box and segmentation mAPs achieved by the proposed SEMI-DiffusionInst model are improved by 3.83% and 2.10%, respectively. Among individual defect types, precision on line collapse and thin bridge defects is improved by approximately 15% on the detection task for both defect types. It has also been shown that by tuning inference hyperparameters, inference time can be improved significantly without compromising model precision. Finally, certain limitations and future work strategy to overcome them are discussed.

Submitted 16 August, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: 6 pages, 5 figures, To be published by IEEE in the proceedings of the 2023 ELMAR conference

arXiv:2306.13867 [pdf, other]

Physics-Informed Machine Learning for Modeling and Control of Dynamical Systems

Authors: Truong X. Nghiem, Ján Drgoňa, Colin Jones, Zoltan Nagy, Roland Schwan, Biswadip Dey, Ankush Chakrabarty, Stefano Di Cairano, Joel A. Paulson, Andrea Carron, Melanie N. Zeilinger, Wenceslao Shaw Cortez, Draguna L. Vrabie

Abstract: Physics-informed machine learning (PIML) is a set of methods and tools that systematically integrate machine learning (ML) algorithms with physical constraints and abstract mathematical models developed in scientific and engineering domains. As opposed to purely data-driven methods, PIML models can be trained from additional information obtained by enforcing physical laws such as energy and mass conservation. More broadly, PIML models can include abstract properties and conditions such as stability, convexity, or invariance. The basic premise of PIML is that the integration of ML and physics can yield more effective, physically consistent, and data-efficient models. This paper aims to provide a tutorial-like overview of the recent advances in PIML for dynamical system modeling and control. Specifically, the paper covers an overview of the theory, fundamental concepts and methods, tools, and applications on topics of: 1) physics-informed learning for system identification; 2) physics-informed learning for control; 3) analysis and verification of PIML models; and 4) physics-informed digital twins. The paper is concluded with a perspective on open challenges and future research opportunities.

Submitted 24 June, 2023; originally announced June 2023.

Comments: 16 pages, 4 figures, to be published in 2023 American Control Conference (ACC)

arXiv:2306.06034 [pdf, other]

RANS-PINN based Simulation Surrogates for Predicting Turbulent Flows

Authors: Shinjan Ghosh, Amit Chakraborty, Georgia Olympia Brikis, Biswadip Dey

Abstract: Physics-informed neural networks (PINNs) provide a framework to build surrogate models for dynamical systems governed by differential equations. During the learning process, PINNs incorporate a physics-based regularization term within the loss function to enhance generalization performance. Since simulating dynamics controlled by partial differential equations (PDEs) can be computationally expensive, PINNs have gained popularity in learning parametric surrogates for fluid flow problems governed by Navier-Stokes equations. In this work, we introduce RANS-PINN, a modified PINN framework, to predict flow fields (i.e., velocity and pressure) in high Reynolds number turbulent flow regimes. To account for the additional complexity introduced by turbulence, RANS-PINN employs a 2-equation eddy viscosity model based on a Reynolds-averaged Navier-Stokes (RANS) formulation. Furthermore, we adopt a novel training approach that ensures effective initialization and balance among the various components of the loss function. The effectiveness of the RANS-PINN framework is then demonstrated using a parametric PINN.

Submitted 11 August, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

Journal ref: Published at the 1st workshop on Synergy of Scientific and Machine Learning Modeling, ICML 2023

arXiv:2305.00092 [pdf, other]

Improving Gradient Computation for Differentiable Physics Simulation with Contacts

Authors: Yaofeng Desmond Zhong, Jiequn Han, Biswadip Dey, Georgia Olympia Brikis

Abstract: Differentiable simulation enables gradients to be back-propagated through physics simulations. In this way, one can learn the dynamics and properties of a physics system by gradient-based optimization or embed the whole differentiable simulation as a layer in a deep learning model for downstream tasks, such as planning and control. However, differentiable simulation at its current stage is not perfect and might provide wrong gradients that deteriorate its performance in learning tasks. In this paper, we study differentiable rigid-body simulation with contacts. We find that existing differentiable simulation methods provide inaccurate gradients when the contact normal direction is not fixed - a general situation when the contacts are between two moving objects. We propose to improve gradient computation by continuous collision detection and leverage the time-of-impact (TOI) to calculate the post-collision velocities. We demonstrate our proposed method, referred to as TOI-Velocity, on two optimal control problems. We show that with TOI-Velocity, we are able to learn an optimal control sequence that matches the analytical solution, while without TOI-Velocity, existing differentiable simulation methods fail to do so.

Submitted 28 April, 2023; originally announced May 2023.

Comments: 5th Annual Conference on Learning for Dynamics and Control

Journal ref: Proceedings of Machine Learning Research vol 211, 2023

arXiv:2304.14166 [pdf, ps, other]

Hypothesis Testing for Adversarial Channels: Chernoff-Stein Exponents

Authors: Eeshan Modak, Neha Sangwan, Mayank Bakshi, Bikash Kumar Dey, Vinod M. Prabhakaran

Abstract: We study the Chernoff-Stein exponent of the following binary hypothesis testing problem: Associated with each hypothesis is a set of channels. A transmitter, without knowledge of the hypothesis, chooses the vector of inputs to the channel. Given the hypothesis, from the set associated with the hypothesis, an adversary chooses channels, one for each element of the input vector. Based on the channel outputs, a detector attempts to distinguish between the hypotheses. We study the Chernoff-Stein exponent for the cases where the transmitter (i) is deterministic, (ii) may privately randomize, and (iii) shares randomness with the detector that is unavailable to the adversary. It turns out that while a memoryless transmission strategy is optimal under shared randomness, it may be strictly suboptimal when the transmitter only has private randomness.

Submitted 27 April, 2023; originally announced April 2023.

Comments: This is a slightly edited version of the extended manuscript submitted to ISIT 2023 for review on February 5, 2023; the paper has been accepted for presentation

arXiv:2304.13840 [pdf, other]

A Deep Learning Framework for Verilog Autocompletion Towards Design and Verification Automation

Authors: Enrique Dehaerne, Bappaditya Dey, Sandip Halder, Stefan De Gendt

Abstract: Innovative Electronic Design Automation (EDA) solutions are important to meet the design requirements for increasingly complex electronic devices. Verilog, a hardware description language, is widely used for the design and verification of digital circuits and is synthesized using specific EDA tools. However, writing code is a repetitive and time-intensive task. This paper proposes, primarily, a novel deep learning framework for training a Verilog autocompletion model and, secondarily, a Verilog dataset of files and snippets obtained from open-source repositories. The framework involves integrating models pretrained on general programming language data and finetuning them on a dataset curated to be similar to a target downstream task. This is validated by comparing different pretrained models trained on different subsets of the proposed Verilog dataset using multiple evaluation metrics. These experiments demonstrate that the proposed framework achieves better BLEU, ROUGE-L, and chrF scores by 9.5%, 6.7%, and 6.9%, respectively, compared to a model trained from scratch. Code and data are made available at: https://github.com/99EnriqueD/verilog_autocompletion .

Submitted 7 June, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

Comments: Updated text to correct language errors and added a link to supplementary code and data (https://github.com/99EnriqueD/verilog_autocompletion). 6 pages, 3 figures, 4 tables. To be presented as a WIP poster at DAC 2023

ACM Class: I.2.2

arXiv:2302.09569 [pdf]

doi 10.1117/12.2657555

SEMI-PointRend: Improved Semiconductor Wafer Defect Classification and Segmentation as Rendering

Authors: MinJin Hwang, Bappaditya Dey, Enrique Dehaerne, Sandip Halder, Young-han Shin

Abstract: In this study, we applied the PointRend (Point-based Rendering) method to semiconductor defect segmentation. PointRend is an iterative segmentation algorithm inspired by image rendering in computer graphics, a new image segmentation method that can generate high-resolution segmentation masks. It can also be flexibly integrated into common instance segmentation meta-architectures such as Mask R-CNN and semantic meta-architectures such as FCN. We implemented a model, termed SEMI-PointRend, to generate precise segmentation masks by applying the PointRend neural network module. In this paper, we focus on comparing the defect segmentation predictions of SEMI-PointRend and Mask R-CNN for various defect types (line-collapse, single bridge, thin bridge, multi bridge non-horizontal). We show that SEMI-PointRend can outperform Mask R-CNN by up to 18.8% in terms of segmentation mean average precision.

Submitted 19 February, 2023; originally announced February 2023.

Comments: 7 pages, 6 figures, 5 tables. To be published by SPIE in the proceedings of Metrology, Inspection, and Process Control XXXVII

ACM Class: I.4.9

Journal ref: Proc. SPIE 12496, Metrology, Inspection, and Process Control XXXVII, 1249608 (27 April 2023)

arXiv:2302.09565 [pdf, other]

doi 10.1117/12.2657564

Optimizing YOLOv7 for Semiconductor Defect Detection

Authors: Enrique Dehaerne, Bappaditya Dey, Sandip Halder, Stefan De Gendt

Abstract: The field of object detection using Deep Learning (DL) is constantly evolving with many new techniques and models being proposed. YOLOv7 is a state-of-the-art object detector based on the YOLO family of models which have become popular for industrial applications. One such possible application domain can be semiconductor defect inspection. The performance of any machine learning model depends on its hyperparameters. Furthermore, combining predictions of one or more models in different ways can also affect performance. In this research, we experiment with YOLOv7, a recently proposed, state-of-the-art object detector, by training and evaluating models with different hyperparameters to investigate which ones improve performance in terms of detection precision for semiconductor line space pattern defects. The base YOLOv7 model with default hyperparameters and Non Maximum Suppression (NMS) prediction combining outperforms all RetinaNet models from previous work in terms of mean Average Precision (mAP). We find that vertically flipping images randomly during training yields a 3% improvement in the mean AP of all defect classes. Other hyperparameter values improved AP only for certain classes compared to the default model. Combining models that achieve the best AP for different defect classes was found to be an effective ensembling strategy. Combining predictions from ensembles using Weighted Box Fusion (WBF) prediction gave the best performance. The best ensemble with WBF improved on the mAP of the default model by 10%.

Submitted 19 February, 2023; originally announced February 2023.

Comments: 8 pages, 4 figures, 5 tables. To be published by SPIE in the proceedings of Metrology, Inspection, and Process Control XXXVII

ACM Class: I.4.9

Journal ref: Proc. SPIE 12496, Metrology, Inspection, and Process Control XXXVII, 124962D (27 April 2023)

arXiv:2212.06011 [pdf, other]

A Neural ODE Interpretation of Transformer Layers

Authors: Yaofeng Desmond Zhong, Tongtao Zhang, Amit Chakraborty, Biswadip Dey

Abstract: Transformer layers, which use an alternating pattern of multi-head attention and multi-layer perceptron (MLP) layers, provide an effective tool for a variety of machine learning problems. As the transformer layers use residual connections to avoid the problem of vanishing gradients, they can be viewed as the numerical integration of a differential equation. In this extended abstract, we build upon this connection and propose a modification of the internal architecture of a transformer layer. The proposed model places the multi-head attention sublayer and the MLP sublayer parallel to each other. Our experiments show that this simple modification improves the performance of transformer networks in multiple tasks. Moreover, for the image classification task, we show that using neural ODE solvers with a sophisticated integration scheme further improves performance.

Submitted 12 December, 2022; originally announced December 2022.

Journal ref: Published at the DLDE Workshop in NeurIPS 2022

arXiv:2211.14748 [pdf, ps, other]

Safe Human Robot-Interaction using Switched Model Reference Admittance Control

Authors: Chayan Kumar Paul, Bhabani Shankar Dey, Udayan Banerjee, Indra Narayan Kar

Abstract: Physical Human-Robot Interaction (pHRI) task involves tight coupling between safety constraints and compliance with human intentions. In this paper, a novel switched model reference admittance controller is developed to maintain compliance with the external force while upholding safety constraints in the workspace for an n-link manipulator involved in pHRI. A switched reference model is designed for the admittance controller to generate the reference trajectory within the safe workspace. The stability analysis of the switched reference model is carried out by an appropriate selection of the Common Quadratic Lyapunov Function (CQLF) so that asymptotic convergence of the trajectory tracking error is ensured. The efficacy of the proposed controller is validated in simulation on a two-link robot manipulator.

Submitted 27 November, 2022; originally announced November 2022.

arXiv:2211.12769 [pdf, ps, other]

Byzantine Multiple Access Channels -- Part I: Reliable Communication

Authors: Neha Sangwan, Mayank Bakshi, Bikash Kumar Dey, Vinod M. Prabhakaran

Abstract: We study communication over a Multiple Access Channel (MAC) where users can possibly be adversarial. The receiver is unaware of the identity of the adversarial users (if any). When all users are non-adversarial, we want their messages to be decoded reliably. When a user behaves adversarially, we require that the honest users' messages be decoded reliably. An adversarial user can mount an attack by sending any input into the channel rather than following the protocol. It turns out that the $2$-user MAC capacity region follows from the point-to-point Arbitrarily Varying Channel (AVC) capacity. For the $3$-user MAC in which at most one user may be malicious, we characterize the capacity region for deterministic codes and randomized codes (where each user shares an independent random secret key with the receiver). These results are then generalized for the $k$-user MAC where the adversary may control all users in one out of a collection of given subsets.

Submitted 11 September, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

Comments: This supersedes Part I of arXiv:1904.11925

arXiv:2211.02185 [pdf, other]

Deep Learning based Defect classification and detection in SEM images: A Mask R-CNN approach

Authors: Bappaditya Dey, Enrique Dehaerne, Kasem Khalil, Sandip Halder, Philippe Leray, Magdy A. Bayoumi

Abstract: In this research work, we have demonstrated the application of Mask-RCNN (Regional Convolutional Neural Network), a deep-learning algorithm for computer vision and specifically object detection, to the semiconductor defect inspection domain. Stochastic defect detection and classification during semiconductor manufacturing has grown to be a challenging task as we continuously shrink circuit pattern dimensions (e.g., for pitches less than 32 nm). Defect inspection and analysis by state-of-the-art optical and e-beam inspection tools is generally driven by some rule-based techniques, which in turn often leads to misclassification, thereby necessitating human expert intervention. In this work, we have revisited and extended our previous deep learning-based defect classification and detection method towards improved defect instance segmentation in SEM images with precise extent of defect as well as generating a mask for each defect category/instance. This also enables us to extract and calibrate each segmented mask and quantify the pixels that make up each mask, which in turn enables us to count each categorical defect instance as well as to calculate the surface area in terms of pixels. We are aiming at detecting and segmenting different types of inter-class stochastic defect patterns such as bridge, break, and line collapse as well as to differentiate accurately between intra-class multi-categorical defect bridge scenarios (as thin/single/multi-line/horizontal/non-horizontal) for aggressive pitches as well as thin resists (High NA applications). Our proposed approach demonstrates its effectiveness both quantitatively and qualitatively.

Submitted 3 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2206.13505

arXiv:2208.08873 [pdf, ps, other]

Robust Artificial Delay based Impedance Control of Robotic Manipulators with Uncertain Dynamics

Authors: Udayan Banerjee, Bhabani Shankar Dey, Indra Narayan Kar, Subir Kumar Saha

Abstract: In this paper an artificial delay based impedance controller is proposed for robotic manipulators with uncertainty in dynamics. The control law unites the time delayed estimation (TDE) framework with a second order switching controller of super twisting algorithm (STA) type via a novel generalized filtered tracking error (GFTE). While time delayed estimation framework eliminates the need for accurate modelling of robot dynamics by estimating the uncertain robot dynamics and interaction forces from immediate past data of state and control effort, the second order switching control law in the outer loop provides robustness against the time delayed estimation (TDE) error that arises due to approximation of the manipulator dynamics. Thus, the proposed control law tries to establish a desired impedance model between the robot end effector variables i.e. force and motion in presence of uncertainties, both when it is encountering smooth contact forces and during free motion. Simulation results for a two link manipulator using the proposed controller along with convergence analysis are shown to validate the proposition.

Submitted 20 August, 2022; v1 submitted 18 August, 2022; originally announced August 2022.

arXiv:2208.03284 [pdf, ps, other]

doi 10.2172/1886020

Interpretable Uncertainty Quantification in AI for HEP

Authors: Thomas Y. Chen, Biprateep Dey, Aishik Ghosh, Michael Kagan, Brian Nord, Nesar Ramachandra

Abstract: Estimating uncertainty is at the core of performing scientific measurements in HEP: a measurement is not useful without an estimate of its uncertainty. The goal of uncertainty quantification (UQ) is inextricably linked to the question, "how do we physically and statistically interpret these uncertainties?" The answer to this question depends not only on the computational task we aim to undertake, but also on the methods we use for that task. For artificial intelligence (AI) applications in HEP, there are several areas where interpretable methods for UQ are essential, including inference, simulation, and control/decision-making. There exist some methods for each of these areas, but they have not yet been demonstrated to be as trustworthy as more traditional approaches currently employed in physics (e.g., non-AI frequentist and Bayesian methods). Shedding light on the questions above requires additional understanding of the interplay of AI systems and uncertainty quantification. We briefly discuss the existing methods in each area and relate them to tasks across HEP. We then discuss recommendations for avenues to pursue to develop the necessary techniques for reliable widespread usage of AI with UQ over the next decade.

Submitted 6 September, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

Comments: Submitted to the Proceedings of the US Community Study on the Future of Particle Physics (Snowmass 2021)

Report number: FERMILAB-FN-1179-SCD; arXiv:2208.03284 oai:inspirehep.net:2132723

arXiv:2206.13441 [pdf, other]

doi 10.1016/j.trc.2022.103955

EMVLight: a Multi-agent Reinforcement Learning Framework for an Emergency Vehicle Decentralized Routing and Traffic Signal Control System

Authors: Haoran Su, Yaofeng D. Zhong, Joseph Y. J. Chow, Biswadip Dey, Li Jin

Abstract: Emergency vehicles (EMVs) play a crucial role in responding to time-critical calls such as medical emergencies and fire outbreaks in urban areas. Existing methods for EMV dispatch typically optimize routes based on historical traffic-flow data and design traffic signal pre-emption accordingly; however, we still lack a systematic methodology to address the coupling between EMV routing and traffic signal control. In this paper, we propose EMVLight, a decentralized reinforcement learning (RL) framework for joint dynamic EMV routing and traffic signal pre-emption. We adopt the multi-agent advantage actor-critic method with policy sharing and spatial discounted factor. This framework addresses the coupling between EMV navigation and traffic signal control via an innovative design of multi-class RL agents and a novel pressure-based reward function. The proposed methodology enables EMVLight to learn network-level cooperative traffic signal phasing strategies that not only reduce EMV travel time but also shorten the travel time of non-EMVs. Simulation-based experiments indicate that EMVLight enables up to a 42.6% reduction in EMV travel time as well as a 23.5% shorter average travel time compared with existing approaches.

Submitted 29 June, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: 19 figures, 10 tables. Manuscript extended on previous work arXiv:2109.05429, arXiv:2111.00278

Journal ref: Transportation Research Part C: Emerging Technologies Volume 146, January 2023, 103955

arXiv:2205.14568 [pdf, other]

Conditionally Calibrated Predictive Distributions by Probability-Probability Map: Application to Galaxy Redshift Estimation and Probabilistic Forecasting

Authors: Biprateep Dey, David Zhao, Jeffrey A. Newman, Brett H. Andrews, Rafael Izbicki, Ann B. Lee

Abstract: Uncertainty quantification is crucial for assessing the predictive ability of AI algorithms. Much research has been devoted to describing the predictive distribution (PD) $F(y|\mathbf{x})$ of a target variable $y \in \mathbb{R}$ given complex input features $\mathbf{x} \in \mathcal{X}$. However, off-the-shelf PDs (from, e.g., normalizing flows and Bayesian neural networks) often lack conditional calibration with the probability of occurrence of an event given input $\mathbf{x}$ being significantly different from the predicted probability. Current calibration methods do not fully assess and enforce conditionally calibrated PDs. Here we propose Cal-PIT, a method that addresses both PD diagnostics and recalibration by learning a single probability-probability map from calibration data. The key idea is to regress probability integral transform scores against $\mathbf{x}$. The estimated regression provides interpretable diagnostics of conditional coverage across the feature space. The same regression function morphs the misspecified PD to a re-calibrated PD for all $\mathbf{x}$. We benchmark our corrected prediction bands (a by-product of corrected PDs) against oracle bands and state-of-the-art predictive inference algorithms for synthetic data. We also provide results for two applications: (i) probabilistic nowcasting given sequences of satellite images, and (ii) conditional density estimation of galaxy distances given imaging data (so-called photometric redshift estimation). Our code is available as a Python package https://github.com/lee-group-cmu/Cal-PIT .

Submitted 17 July, 2023; v1 submitted 28 May, 2022; originally announced May 2022.

Comments: 21 pages, 11 figures. Under review. Code available as a Python package https://github.com/lee-group-cmu/Cal-PIT

arXiv:2205.08355 [pdf, other]

Demystifying the Data Need of ML-surrogates for CFD Simulations

Authors: Tongtao Zhang, Biswadip Dey, Krishna Veeraraghavan, Harshad Kulkarni, Amit Chakraborty

Abstract: Computational fluid dynamics (CFD) simulations, a critical tool in various engineering applications, often require significant time and compute power to predict flow properties. The high computational cost associated with CFD simulations significantly restricts the scope of design space exploration and limits their use in planning and operational control. To address this issue, machine learning (ML) based surrogate models have been proposed as a computationally efficient tool to accelerate CFD simulations. However, a lack of clarity about CFD data requirements often challenges the widespread adoption of ML-based surrogates among design engineers and CFD practitioners. In this work, we propose an ML-based surrogate model to predict the temperature distribution inside the cabin of a passenger vehicle under various operating conditions and use it to demonstrate the trade-off between prediction performance and training dataset size. Our results show that the prediction accuracy is high and stable even when the training size is gradually reduced from 2000 to 200. The ML-based surrogates also reduce the compute time from ~30 minutes to ~9 milliseconds. Moreover, even when only 50 CFD simulations are used for training, the temperature trend (e.g., locations of hot/cold regions) predicted by the ML-surrogate matches quite well with the results from CFD simulations.

Submitted 5 May, 2022; originally announced May 2022.

Comments: Published on AI2ASE AAAI2022

MSC Class: I.6

arXiv:2201.08363 [pdf, other]

Physics-informed neural networks for modeling rate- and temperature-dependent plasticity

Authors: Rajat Arora, Pratik Kakkar, Biswadip Dey, Amit Chakraborty

Abstract: This work presents a physics-informed neural network (PINN) based framework to model the strain-rate and temperature dependence of the deformation fields in elastic-viscoplastic solids. To avoid unbalanced back-propagated gradients during training, the proposed framework uses a simple strategy with no added computational complexity for selecting scalar weights that balance the interplay between different terms in the physics-based loss function. In addition, we highlight a fundamental challenge involving the selection of appropriate model outputs so that the mechanical problem can be faithfully solved using a PINN-based approach. We demonstrate the effectiveness of this approach by studying two test problems modeling the elastic-viscoplastic deformation in solids at different strain rates and temperatures, respectively. Our results show that the proposed PINN-based approach can accurately predict the spatio-temporal evolution of deformation in elastic-viscoplastic materials.

Submitted 22 November, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: 11 pages, 7 figures; Accepted in NeurIPS 2022, Machine Learning and the Physical Sciences workshop

arXiv:2111.00278 [pdf, other]

A Decentralized Reinforcement Learning Framework for Efficient Passage of Emergency Vehicles

Authors: Haoran Su, Yaofeng Desmond Zhong, Biswadip Dey, Amit Chakraborty

Abstract: Emergency vehicles (EMVs) play a critical role in a city's response to time-critical events such as medical emergencies and fire outbreaks. The existing approaches to reduce EMV travel time employ route optimization and traffic signal pre-emption without accounting for the coupling between these two subproblems. As a result, the planned route often becomes suboptimal. In addition, these approaches also do not focus on minimizing disruption to the overall traffic flow. To address these issues, we introduce EMVLight in this paper. This is a decentralized reinforcement learning (RL) framework for simultaneous dynamic routing and traffic signal control. EMVLight extends Dijkstra's algorithm to efficiently update the optimal route for an EMV in real-time as it travels through the traffic network. Consequently, the decentralized RL agents learn network-level cooperative traffic signal phase strategies that reduce EMV travel time and the average travel time of non-EMVs in the network. We have carried out comprehensive experiments with synthetic and real-world maps to demonstrate this benefit. Our results show that EMVLight outperforms benchmark transportation engineering techniques as well as existing RL-based traffic signal control methods.

Submitted 20 February, 2022; v1 submitted 30 October, 2021; originally announced November 2021.

Comments: Artificial Intelligence and Humanitarian Assistance and Disaster Recovery (AI + HADR) workshop, NeurIPS 2021. arXiv admin note: substantial text overlap with arXiv:2109.05429

arXiv:2110.15209 [pdf, other]

Re-calibrating Photometric Redshift Probability Distributions Using Feature-space Regression

Authors: Biprateep Dey, Jeffrey A. Newman, Brett H. Andrews, Rafael Izbicki, Ann B. Lee, David Zhao, Markus Michael Rau, Alex I. Malz

Abstract: Many astrophysical analyses depend on estimates of redshifts (a proxy for distance) determined from photometric (i.e., imaging) data alone. Inaccurate estimates of photometric redshift uncertainties can result in large systematic errors. However, probability distribution outputs from many photometric redshift methods do not follow the frequentist definition of a Probability Density Function (PDF) for redshift -- i.e., the fraction of times the true redshift falls between two limits $z_{1}$ and $z_{2}$ should be equal to the integral of the PDF between these limits. Previous works have used the global distribution of Probability Integral Transform (PIT) values to re-calibrate PDFs, but offsetting inaccuracies in different regions of feature space can conspire to limit the efficacy of the method. We leverage a recently developed regression technique that characterizes the local PIT distribution at any location in feature space to perform a local re-calibration of photometric redshift PDFs. Though we focus on an example from astrophysics, our method can produce PDFs which are calibrated at all locations in feature space for any use case.

Submitted 27 January, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

Comments: Fourth Workshop on Machine Learning and the Physical Sciences (NeurIPS 2021)
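
Illustrative note (not part of the listing): the abstract above relies on Probability Integral Transform (PIT) values, i.e. the predicted CDF evaluated at the true value, which should be uniformly distributed when the predicted distribution is calibrated. The following is a minimal sketch of the global PIT diagnostic only, not the authors' local re-calibration method. It assumes Gaussian predictive distributions and synthetic data; the helper names `gaussian_cdf`, `pit_values`, and `empirical_coverage` are hypothetical.

```python
import math
import random


def gaussian_cdf(y, mu, sigma):
    """CDF of a normal distribution N(mu, sigma^2) evaluated at y."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def pit_values(y_true, mus, sigmas):
    """PIT value of each observation: the predicted CDF at the true value."""
    return [gaussian_cdf(y, m, s) for y, m, s in zip(y_true, mus, sigmas)]


def empirical_coverage(pits, alpha):
    """Fraction of PIT values <= alpha; close to alpha when calibrated."""
    return sum(p <= alpha for p in pits) / len(pits)


# Synthetic data (assumption): the true spread around each mean is 1.0.
rng = random.Random(0)
n = 20000
mus = [rng.uniform(0.0, 2.0) for _ in range(n)]
y_true = [rng.gauss(m, 1.0) for m in mus]

# A well-specified predictor versus one that reports half the true spread.
pits_good = pit_values(y_true, mus, [1.0] * n)
pits_overconfident = pit_values(y_true, mus, [0.5] * n)

cov_good = empirical_coverage(pits_good, 0.1)
cov_bad = empirical_coverage(pits_overconfident, 0.1)
print(f"coverage at 0.1: calibrated={cov_good:.3f}, overconfident={cov_bad:.3f}")
```

In this sketch the overconfident predictor places too many true values in its lower tail, so its coverage at the 0.1 level is well above 0.1. A local method of the kind described in the abstract would estimate this coverage as a function of the input features rather than as one global number.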

arXiv:2109.05429 [pdf, other]

EMVLight: A Decentralized Reinforcement Learning Framework for Efficient Passage of Emergency Vehicles

Authors: Haoran Su, Yaofeng Desmond Zhong, Biswadip Dey, Amit Chakraborty

Abstract: Emergency vehicles (EMVs) play a crucial role in responding to time-critical events such as medical emergencies and fire outbreaks in an urban area. The less time EMVs spend traveling through traffic, the more likely they are to help save people's lives and reduce property loss. To reduce the travel time of EMVs, prior work has used route optimization based on historical traffic-flow data and traffic signal pre-emption based on the optimal route. However, traffic signal pre-emption dynamically changes the traffic flow which, in turn, modifies the optimal route of an EMV. In addition, traffic signal pre-emption practices usually lead to significant disturbances in traffic flow and subsequently increase the travel time for non-EMVs. In this paper, we propose EMVLight, a decentralized reinforcement learning (RL) framework for simultaneous dynamic routing and traffic signal control. EMVLight extends Dijkstra's algorithm to efficiently update the optimal route for an EMV in real time as it travels through the traffic network. The decentralized RL agents learn network-level cooperative traffic signal phase strategies that not only reduce EMV travel time but also reduce the average travel time of non-EMVs in the network. This benefit has been demonstrated through comprehensive experiments with synthetic and real-world maps. These experiments show that EMVLight outperforms benchmark transportation engineering techniques and existing RL-based signal control methods.

Submitted 28 June, 2022; v1 submitted 12 September, 2021; originally announced September 2021.

Comments: Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI-22)
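
Illustrative note (not part of the listing): the abstract above says EMVLight extends Dijkstra's algorithm to update an EMV's route as traffic changes. The sketch below shows only the textbook Dijkstra algorithm, re-run after an edge travel time changes; it is not EMVLight's extension. The road network `roads`, its travel times, and the function name `dijkstra` are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Shortest travel time from source to every reachable node.

    graph maps each node to a dict {neighbor: non-negative travel time}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical four-intersection network with travel times in minutes.
roads = {
    "A": {"B": 4.0, "C": 1.0},
    "B": {"D": 1.0},
    "C": {"B": 2.0, "D": 6.0},
    "D": {},
}
before = dijkstra(roads, "A")

# Congestion on link C -> B raises its travel time; recompute from scratch.
roads["C"]["B"] = 10.0
after = dijkstra(roads, "A")
print(before["D"], after["D"])
```

Recomputing from scratch after every change is the simplest baseline; the abstract indicates that EMVLight instead updates the route efficiently in real time, the details of which are not given in this listing.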

arXiv:2105.03420 [pdf, other]

Compound Arbitrarily Varying Channels

Authors: Syomantak Chaudhuri, Neha Sangwan, Mayank Bakshi, Bikash Kumar Dey, Vinod M. Prabhakaran

Abstract: We propose a communication model that we call compound arbitrarily varying channels (CAVC), which unifies and generalizes compound channels and arbitrarily varying channels (AVC). A CAVC can be viewed as a noisy channel with a fixed, but unknown, compound-state and an AVC-state which may vary with every channel use. The AVC-state is controlled by an adversary who is aware of the compound-state. We study three problems in this setting: 'communication', 'communication and compound-state identification', and 'communication or compound-state identification'. For these problems, we study conditions for feasibility and capacity under deterministic coding and random coding.

Submitted 7 May, 2021; originally announced May 2021.

arXiv:2105.03380 [pdf, ps, other]

Communication With Adversary Identification in Byzantine Multiple Access Channels

Authors: Neha Sangwan, Mayank Bakshi, Bikash Kumar Dey, Vinod M. Prabhakaran

Abstract: We introduce the problem of determining the identity of a byzantine user (internal adversary) in a communication system. We consider a two-user discrete memoryless multiple access channel where either user may deviate from the prescribed behaviour. Owing to the noisy nature of the channel, it may be overly restrictive to attempt to detect all deviations. In our formulation, we only require detecting deviations which impede the decoding of the non-deviating user's message. When neither user deviates, correct decoding is required. When one user deviates, the decoder must either output a pair of messages of which the message of the non-deviating user is correct or identify the deviating user. The users and the receiver do not share any randomness. The results include a characterization of the set of channels where communication is feasible, and an inner and outer bound on the capacity region.

Submitted 7 May, 2021; originally announced May 2021.

arXiv:2102.06794 [pdf, other]

Extending Lagrangian and Hamiltonian Neural Networks with Differentiable Contact Models

Authors: Yaofeng Desmond Zhong, Biswadip Dey, Amit Chakraborty

Abstract: The incorporation of appropriate inductive bias plays a critical role in learning dynamics from data. A growing body of work has been exploring ways to enforce energy conservation in the learned dynamics by encoding Lagrangian or Hamiltonian dynamics into the neural network architecture. These existing approaches are based on differential equations, which do not allow discontinuity in the states and thereby limit the class of systems one can learn. However, in reality, most physical systems, such as legged robots and robotic manipulators, involve contacts and collisions, which introduce discontinuities in the states. In this paper, we introduce a differentiable contact model, which can capture contact mechanics: frictionless/frictional, as well as elastic/inelastic. This model can also accommodate inequality constraints, such as limits on the joint angles. The proposed contact model extends the scope of Lagrangian and Hamiltonian neural networks by allowing simultaneous learning of contact and system properties. We demonstrate this framework on a series of challenging 2D and 3D physical systems with different coefficients of restitution and friction. The learned dynamics can be used as a differentiable physics simulator for downstream gradient-based optimization tasks, such as planning and control.

Submitted 12 November, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

arXiv:2101.07127 [pdf, other]

Fundamental Limits of Demand-Private Coded Caching

Authors: Chinmay Gurjarpadhye, Jithin Ravi, Sneha Kamath, Bikash Kumar Dey, Nikhil Karamchandani

Abstract: We consider the coded caching problem with an additional privacy constraint that a user should not get any information about the demands of the other users. We first show that a demand-private scheme for $N$ files and $K$ users can be obtained from a non-private scheme that serves only a subset of the demands for the $N$ files and $NK$ users problem. We further use this fact to construct a demand-private scheme for $N$ files and $K$ users from a particular known non-private scheme for $N$ files and $NK-K+1$ users. It is then demonstrated that the memory-rate pair $(M,\min \{N,K\}(1-M/N))$, which is achievable for non-private schemes with uncoded transmissions, is also achievable under demand privacy. We further propose a scheme that improves on these ideas by removing some redundant transmissions. The memory-rate trade-off achieved using our schemes is shown to be within a multiplicative factor of 3 from the optimal when $K < N$ and of 8 when $N\leq K$. Finally, we give the exact memory-rate trade-off for demand-private coded caching problems with $N\geq K=2$.

Submitted 18 January, 2021; originally announced January 2021.

Comments: 43 pages, 6 figures
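
Illustrative note (not part of the listing): the abstract above states that the memory-rate pair $(M,\min \{N,K\}(1-M/N))$ is achievable under demand privacy. The sketch below only evaluates that expression for given $N$ files, $K$ users, and cache size $M$ (measured in files); it does not construct a caching scheme. The function name `uncoded_rate` and the range check $0 \le M \le N$ are assumptions made for this example.

```python
def uncoded_rate(n_files, n_users, cache_size):
    """Rate R = min(N, K) * (1 - M / N) from the memory-rate pair (M, R)."""
    if n_files <= 0 or n_users <= 0:
        raise ValueError("N and K must be positive")
    if not 0 <= cache_size <= n_files:
        raise ValueError("cache size M is assumed to satisfy 0 <= M <= N")
    return min(n_files, n_users) * (1 - cache_size / n_files)


# N = 4 files, K = 2 users: the rate falls linearly from min(N, K) to 0.
for m in (0, 1, 2, 4):
    print(m, uncoded_rate(4, 2, m))
```

With no cache ($M=0$) the rate is $\min\{N,K\}$, and with every file cached ($M=N$) it is zero, matching the endpoints of the expression in the abstract.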

arXiv:2012.02334 [pdf, other]

Benchmarking Energy-Conserving Neural Networks for Learning Dynamics from Data

Authors: Yaofeng Desmond Zhong, Biswadip Dey, Amit Chakraborty

Abstract: The last few years have witnessed an increased interest in incorporating physics-informed inductive bias in deep learning frameworks. In particular, a growing volume of literature has been exploring ways to enforce energy conservation while using neural networks for learning dynamics from observed time-series data. In this work, we survey ten recently proposed energy-conserving neural network models, including HNN, LNN, DeLaN, SymODEN, CHNN, CLNN and their variants. We provide a compact derivation of the theory behind these models and explain their similarities and differences. Their performance is compared on 4 physical systems. We point out the possibility of leveraging some of these energy-conserving models to design energy-based controllers.

Submitted 28 April, 2023; v1 submitted 3 December, 2020; originally announced December 2020.

arXiv:2011.05927 [pdf, other]

On Using Hamiltonian Monte Carlo Sampling for Reinforcement Learning Problems in High-dimension

Authors: Udari Madhushani, Biswadip Dey, Naomi Ehrich Leonard, Amit Chakraborty

Abstract: Value function based reinforcement learning (RL) algorithms, for example, $Q$-learning, learn optimal policies from datasets of actions, rewards, and state transitions. However, when the underlying state transition dynamics are stochastic and evolve on a high-dimensional space, generating independent and identically distributed (IID) data samples for creating these datasets poses a significant challenge due to the intractability of the associated normalizing integral. In these scenarios, Hamiltonian Monte Carlo (HMC) sampling offers a computationally tractable way to generate data for training RL algorithms. In this paper, we introduce a framework, called \textit{Hamiltonian $Q$-Learning}, that demonstrates, both theoretically and empirically, that $Q$ values can be learned from a dataset generated by HMC samples of actions, rewards, and state transitions. Furthermore, to exploit the underlying low-rank structure of the $Q$ function, Hamiltonian $Q$-Learning uses a matrix completion algorithm for reconstructing the updated $Q$ function from $Q$ value updates over a much smaller subset of state-action pairs. Thus, by providing an efficient way to apply $Q$-learning in stochastic, high-dimensional settings, the proposed approach broadens the scope of RL algorithms for real-world applications.

Submitted 28 March, 2022; v1 submitted 11 November, 2020; originally announced November 2020.

arXiv:2011.01456 [pdf, other]

Frequency-compensated PINNs for Fluid-dynamic Design Problems

Authors: Tongtao Zhang, Biswadip Dey, Pratik Kakkar, Arindam Dasgupta, Amit Chakraborty

Abstract: Incompressible fluid flow around a cylinder is one of the classical problems in fluid dynamics, with strong relevance to many real-world engineering problems, for example, the design of offshore structures or of a pin-fin heat exchanger. Thus, learning a high-accuracy surrogate for this problem can demonstrate the efficacy of a novel machine learning approach. In this work, we propose a physics-informed neural network (PINN) architecture for learning the relationship between simulation output and the underlying geometry and boundary conditions. In addition to using a physics-based regularization term, the proposed approach also exploits the underlying physics to learn a set of Fourier features, i.e. frequency and phase offset parameters, and then uses them for predicting flow velocity and pressure over the spatio-temporal domain. We demonstrate this approach by predicting simulation results over out-of-range time intervals and for novel design conditions. Our results show that incorporation of Fourier features improves the generalization performance over both the temporal domain and the design space.

Submitted 2 November, 2020; originally announced November 2020.

Comments: Machine Learning for Engineering Modeling, Simulation, and Design (ML4Eng) Workshop, NeurIPS 2020

arXiv:2006.00257 [pdf, other]

Private Index Coding

Authors: Varun Narayanan, Jithin Ravi, Vivek K. Mishra, Bikash Kumar Dey, Nikhil Karamchandani, Vinod M. Prabhakaran

Abstract: We study the fundamental problem of index coding under an additional privacy constraint that requires each receiver to learn nothing more about the collection of messages beyond its demanded messages from the server and what is available to it as side information. To enable such private communication, we allow the use of a collection of independent secret keys, each of which is shared amongst a subset of users and is known to the server. The goal is to study properties of the key access structures which make the problem feasible and then design encoding and decoding schemes efficient in the size of the server transmission as well as the sizes of the secret keys. We call this the private index coding problem. We begin by characterizing the key access structures that make private index coding feasible. We also give conditions to check if a given linear scheme is a valid private index code. For up to three users, we characterize the rate region of feasible server transmission and key rates, and show that all feasible rates can be achieved using scalar linear coding and time sharing; we also show that scalar linear codes are sub-optimal for four receivers. The outer bounds used in the case of three users are extended to an arbitrary number of users and seen as a generalized version of the well-known polymatroidal bounds for the standard non-private index coding. We also show that the presence of common randomness and private randomness does not change the rate region.
Furthermore, we study the case where no keys are shared among the users and provide some necessary and sufficient conditions for feasibility in this setting under a weaker notion of privacy. If the server has the ability to multicast to any subset of users, we demonstrate how this flexibility can be used to provide privacy and characterize the minimum number of server multicasts required.

Submitted 30 May, 2020; originally announced June 2020.

Comments: 46 pages, 10 figures

arXiv:2002.08860 [pdf, other]

Dissipative SymODEN: Encoding Hamiltonian Dynamics with Dissipation and Control into Deep Learning

Authors: Yaofeng Desmond Zhong, Biswadip Dey, Amit Chakraborty

Abstract: In this work, we introduce Dissipative SymODEN, a deep learning architecture which can infer the dynamics of a physical system with dissipation from observed state trajectories. To improve prediction accuracy while reducing network size, Dissipative SymODEN encodes the port-Hamiltonian dynamics with energy dissipation and external input into the design of its computation graph and learns the dynamics in a structured way. The learned model, by revealing key aspects of the system, such as the inertia, dissipation, and potential energy, paves the way for energy-based controllers.

Submitted 29 April, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

Comments: Published at ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations (DeepDiffEq)

arXiv:1911.06995 [pdf, other]

Demand-Private Coded Caching and the Exact Trade-off for N=K=2

Authors: Sneha Kamath, Jithin Ravi, Bikash Kumar Dey

Abstract: The distributed coded caching problem has been studied extensively in the recent past. While the known coded caching schemes achieve an improved transmission rate, they violate the privacy of the users since in these schemes the demand of one user is revealed to others in the delivery phase. In this paper, we consider the coded caching problem under the constraint that the demands of the other users remain information theoretically secret from each user. We first show that the memory-rate pair $(M,\min \{N,K\}(1-M/N))$ is achievable under information theoretic demand privacy, while using broadcast transmissions. We then show that a demand-private scheme for $N$ files and $K$ users can be obtained from a non-private scheme that satisfies only a restricted subset of demands of $NK$ users for $N$ files. We then focus on the demand-private coded caching problem for $K=2$ users, $N=2$ files. We characterize the exact memory-rate trade-off for this case. To show the achievability, we use our first result to construct a demand-private scheme from a non-private scheme satisfying a restricted demand subset that is known from an earlier work by Tian. Further, by giving a converse based on the extra requirement of privacy, we show that the obtained achievable region is the exact memory-rate trade-off.

Submitted 18 February, 2020; v1 submitted 16 November, 2019; originally announced November 2019.

Comments: 8 pages, 2 figures

arXiv:1911.02895 [pdf, other]

doi 10.1098/rspa.2019.0585

Beacon-referenced Pursuit for Collective Motions in Three Dimensions

Authors: Kevin S. Galloway, Biswadip Dey

Abstract: Motivated by real-world applications of unmanned aerial vehicles, this paper introduces a decentralized control mechanism to guide steering control of autonomous agents maneuvering in the vicinity of multiple moving entities (e.g. other autonomous agents) and stationary entities (e.g. fixed beacons or points of reference) in a three-dimensional environment. The proposed control law, which can be perceived as a modification of the three-dimensional constant bearing (CB) pursuit law, provides a means to allocate simultaneous attention to multiple entities. We investigate the behavior of the closed-loop dynamics for a system with one agent referencing two beacons, as well as a two-agent mutual pursuit system wherein each agent employs the beacon-referenced CB pursuit law with regard to the other agent and a stationary beacon. Under certain assumptions on the associated control parameters, we demonstrate that this problem admits circling equilibria with agents moving on circular orbits with a common radius, in planes perpendicular to a common axis passing through the beacons. As the common radius and distances from the beacon are determined by the choice of parameters in the pursuit law, this approach provides a means to engineer desired formations in a 3-dimensional setting.

Submitted 7 November, 2019; originally announced November 2019.

arXiv:1910.02133 [pdf, other]

A Conditional Generative Model for Predicting Material Microstructures from Processing Methods

Authors: Akshay Iyer, Biswadip Dey, Arindam Dasgupta, Wei Chen, Amit Chakraborty

Abstract: Microstructures of a material form the bridge linking processing conditions, which can be controlled, to the material property, which is the primary interest in engineering applications. Thus a critical task in material design is establishing the processing-structure relationship, which requires domain expertise and techniques that can model the high-dimensional material microstructure. This work proposes a deep learning based approach that models the processing-structure relationship as a conditional image synthesis problem. In particular, we develop an auxiliary classifier Wasserstein GAN with gradient penalty (ACWGAN-GP) to synthesize microstructures under a given processing condition. This approach is free of feature engineering, requires modest domain knowledge and is applicable to a wide range of material systems. We demonstrate this approach using the ultra high carbon steel (UHCS) database, where each microstructure is annotated with a label describing the cooling method it was subjected to. Our results show that ACWGAN-GP can synthesize high-quality multiphase microstructures for a given cooling method.

Submitted 4 October, 2019; originally announced October 2019.

arXiv:1909.12077 [pdf, other]

Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control

Authors: Yaofeng Desmond Zhong, Biswadip Dey, Amit Chakraborty

Abstract: In this paper, we introduce Symplectic ODE-Net (SymODEN), a deep learning framework which can infer the dynamics of a physical system, given by an ordinary differential equation (ODE), from observed state trajectories. To achieve better generalization with fewer training samples, SymODEN incorporates appropriate inductive bias by designing the associated computation graph in a physics-informed manner. In particular, we enforce Hamiltonian dynamics with control to learn the underlying dynamics in a transparent way, which can then be leveraged to draw insight about relevant physical aspects of the system, such as mass and potential energy. In addition, we propose a parametrization which can enforce this Hamiltonian formalism even when the generalized coordinate data is embedded in a high-dimensional space or we can only access velocity data instead of generalized momentum. This framework, by offering interpretable, physically-consistent models for physical systems, opens up new possibilities for synthesizing model-based control strategies.

Submitted 29 February, 2024; v1 submitted 26 September, 2019; originally announced September 2019.

Comments: Published as a Conference Paper at ICLR 2020

Journal ref: International Conference on Learning Representations (ICLR 2020); https://openreview.net/forum?id=ryxmb1rKDS

arXiv:1904.11925 [pdf, ps, other]

Byzantine Multiple Access

Authors: Neha Sangwan, Mayank Bakshi, Bikash Kumar Dey, Vinod M. Prabhakaran

Abstract: We study communication over multiple access channels (MAC) where one of the users is possibly adversarial. When all users are non-adversarial, we want their messages to be decoded reliably. When an adversary is present, we consider two different decoding guarantees. In part I, we require that the honest users' messages be decoded reliably. We study the 3-user MAC; 2-user MAC capacity follows from point-to-point AVC capacity. We characterize the capacity region for randomized codes. We also study the capacity region for deterministic codes. We obtain necessary conditions including a new non-symmetrizability condition for the capacity region to be non-trivial. We show that when none of the users are symmetrizable, the randomized coding capacity region is also achievable with deterministic codes. In part II, we consider the weaker goal of authenticated communication where we only require that an adversarial user must not be able to cause an undetected error on the honest users' messages. For the 2-user MAC, we show that the following 3-phase scheme is rate-optimal: a standard MAC code is first used to achieve unauthenticated communication followed by two authentication phases where each user authenticates their message treating the other user as a possible adversary. We show that the authentication phases can be very short since this form of authentication itself, when possible, can be achieved for message sets whose size grows doubly exponentially in blocklength. This leads to our result that the authenticated communication capacity region of a discrete memoryless MAC is either zero or the (unauthenticated) MAC capacity region itself. This also, arguably, explains the similar nature of authenticated communication capacity of a discrete memoryless point-to-point adversarial channel recently found by Kosut and Kliewer (ITW, 2018).

Submitted 26 April, 2019; originally announced April 2019.

Comments: Part II is an extended version of the paper titled "Multiple Access Channels with Adversarial Users" to be presented at IEEE International Symposium on Information Theory 2019

arXiv:1809.04464 [pdf, other]

Arbitrarily Varying Remote Sources

Authors: Amitalok J. Budkuley, Bikash Kumar Dey, Sidharth Jaggi, Vinod M. Prabhakaran

Abstract: We study a lossy source coding problem for an arbitrarily varying remote source (AVRS) which was proposed in a prior work. An AVRS transmits symbols, each generated in an independent and identically distributed manner, which are sought to be estimated at the decoder. These symbols are remotely generated, and the encoder and decoder observe noise-corrupted versions received through a two-output noisy channel. This channel is an arbitrarily varying channel controlled by a jamming adversary. We assume that the adversary knows the coding scheme as well as the source data non-causally, and hence, can employ malicious jamming strategies correlated to them. Our interest lies in studying the rate distortion function for codes with a stochastic encoder, i.e., when the encoder can privately randomize while the decoder is deterministic. We provide upper and lower bounds on this rate distortion function.

Submitted 11 September, 2018; originally announced September 2018.

Comments: 10 pages. arXiv admin note: substantial text overlap with arXiv:1704.07693

Showing 1–50 of 91 results for author: Dey, B