Search | arXiv e-print repository

arXiv:2407.00950 [pdf, other]

Causal Bandits: The Pareto Optimal Frontier of Adaptivity, a Reduction to Linear Bandits, and Limitations around Unknown Marginals

Authors: Ziyi Liu, Idan Attias, Daniel M. Roy

Abstract: In this work, we investigate the problem of adapting to the presence or absence of causal structure in multi-armed bandit problems. In addition to the usual reward signal, we assume the learner has access to additional variables, observed in each round after acting. When these variables $d$-separate the action from the reward, existing work in causal bandits demonstrates that one can achieve stric… ▽ More In this work, we investigate the problem of adapting to the presence or absence of causal structure in multi-armed bandit problems. In addition to the usual reward signal, we assume the learner has access to additional variables, observed in each round after acting. When these variables $d$-separate the action from the reward, existing work in causal bandits demonstrates that one can achieve strictly better (minimax) rates of regret (Lu et al., 2020). Our goal is to adapt to this favorable "conditionally benign" structure, if it is present in the environment, while simultaneously recovering worst-case minimax regret, if it is not. Notably, the learner has no prior knowledge of whether the favorable structure holds. In this paper, we establish the Pareto optimal frontier of adaptive rates. We prove upper and matching lower bounds on the possible trade-offs in the performance of learning in conditionally benign and arbitrary environments, resolving an open question raised by Bilodeau et al. (2022). Furthermore, we are the first to obtain instance-dependent bounds for causal bandits, by reducing the problem to the linear bandit setting. Finally, we examine the common assumption that the marginal distributions of the post-action contexts are known and show that a nontrivial estimate is necessary for better-than-worst-case minimax rates. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: Accepted to ICML 2024

arXiv:2404.06498 [pdf, other]

Simultaneous linear connectivity of neural networks modulo permutation

Authors: Ekansh Sharma, Devin Kwok, Tom Denton, Daniel M. Roy, David Rolnick, Gintare Karolina Dziugaite

Abstract: Neural networks typically exhibit permutation symmetries which contribute to the non-convexity of the networks' loss landscapes, since linearly interpolating between two permuted versions of a trained network tends to encounter a high loss barrier. Recent work has argued that permutation symmetries are the only sources of non-convexity, meaning there are essentially no such barriers between traine… ▽ More Neural networks typically exhibit permutation symmetries which contribute to the non-convexity of the networks' loss landscapes, since linearly interpolating between two permuted versions of a trained network tends to encounter a high loss barrier. Recent work has argued that permutation symmetries are the only sources of non-convexity, meaning there are essentially no such barriers between trained networks if they are permuted appropriately. In this work, we refine these arguments into three distinct claims of increasing strength. We show that existing evidence only supports "weak linear connectivity"-that for each pair of networks belonging to a set of SGD solutions, there exist (multiple) permutations that linearly connect it with the other networks. In contrast, the claim "strong linear connectivity"-that for each network, there exists one permutation that simultaneously connects it with the other networks-is both intuitively and practically more desirable. This stronger claim would imply that the loss landscape is convex after accounting for permutation, and enable linear interpolation between three or more independently trained models without increased loss. In this work, we introduce an intermediate claim-that for certain sequences of networks, there exists one permutation that simultaneously aligns matching pairs of networks from these sequences. Specifically, we discover that a single permutation aligns sequences of iteratively trained as well as iteratively pruned networks, meaning that two networks exhibit low loss barriers at each step of their optimization and sparsification trajectories respectively. Finally, we provide the first evidence that strong linear connectivity may be possible under certain conditions, by showing that barriers decrease with increasing network width when interpolating among three networks. △ Less

Submitted 9 April, 2024; originally announced April 2024.

Comments: 11 pages, 6 figures

arXiv:2403.17218 [pdf, other]

A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection

Authors: Benjamin Steenhoek, Md Mahbubur Rahman, Monoshi Kumar Roy, Mirza Sanjida Alam, Earl T. Barr, Wei Le

Abstract: Large Language Models (LLMs) have demonstrated great potential for code generation and other software engineering tasks. Vulnerability detection is of crucial importance to maintaining the security, integrity, and trustworthiness of software systems. Precise vulnerability detection requires reasoning about the code, making it a good case study for exploring the limits of LLMs' reasoning capabiliti… ▽ More Large Language Models (LLMs) have demonstrated great potential for code generation and other software engineering tasks. Vulnerability detection is of crucial importance to maintaining the security, integrity, and trustworthiness of software systems. Precise vulnerability detection requires reasoning about the code, making it a good case study for exploring the limits of LLMs' reasoning capabilities. Although recent work has applied LLMs to vulnerability detection using generic prompting techniques, their full capabilities for this task and the types of errors they make when explaining identified vulnerabilities remain unclear. In this paper, we surveyed eleven LLMs that are state-of-the-art in code generation and commonly used as coding assistants, and evaluated their capabilities for vulnerability detection. We systematically searched for the best-performing prompts, incorporating techniques such as in-context learning and chain-of-thought, and proposed three of our own prompting methods. Our results show that while our prompting methods improved the models' performance, LLMs generally struggled with vulnerability detection. They reported 0.5-0.63 Balanced Accuracy and failed to distinguish between buggy and fixed versions of programs in 76% of cases on average. By comprehensively analyzing and categorizing 287 instances of model reasoning, we found that 57% of LLM responses contained errors, and the models frequently predicted incorrect locations of buggy code and misidentified bug types. LLMs only correctly localized 6 out of 27 bugs in DbgBench, and these 6 bugs were predicted correctly by 70-100% of human participants. These findings suggest that despite their potential for other tasks, LLMs may fail to properly comprehend critical code structures and security-related concepts. Our data and code are available at https://figshare.com/s/78fe02e56e09ec49300b. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2402.09327 [pdf, other]

Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization

Authors: Idan Attias, Gintare Karolina Dziugaite, Mahdi Haghifam, Roi Livni, Daniel M. Roy

Abstract: In this work, we investigate the interplay between memorization and learning in the context of \emph{stochastic convex optimization} (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is… ▽ More In this work, we investigate the interplay between memorization and learning in the context of \emph{stochastic convex optimization} (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is a precise characterization of the tradeoff between the accuracy of a learning algorithm and its CMI, answering an open question posed by Livni (2023). We show that, in the $L^2$ Lipschitz--bounded setting and under strong convexity, every learner with an excess error $\varepsilon$ has CMI bounded below by $Ω(1/\varepsilon^2)$ and $Ω(1/\varepsilon)$, respectively. We further demonstrate the essential role of memorization in learning problems in SCO by designing an adversary capable of accurately identifying a significant fraction of the training samples in specific SCO problems. Finally, we enumerate several implications of our results, such as a limitation of generalization bounds based on CMI and the incompressibility of samples in SCO problems. △ Less

Submitted 18 July, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: 41 Pages, To appear in ICML 2024

arXiv:2312.17127 [pdf, other]

doi 10.1145/3632903

Probabilistic programming interfaces for random graphs: Markov categories, graphons, and nominal sets

Authors: Nathanael L. Ackerman, Cameron E. Freer, Younesse Kaddar, Jacek Karwowski, Sean K. Moss, Daniel M. Roy, Sam Staton, Hongseok Yang

Abstract: We study semantic models of probabilistic programming languages over graphs, and establish a connection to graphons from graph theory and combinatorics. We show that every well-behaved equational theory for our graph probabilistic programming language corresponds to a graphon, and conversely, every graphon arises in this way. We provide three constructions for showing that every graphon arises f… ▽ More We study semantic models of probabilistic programming languages over graphs, and establish a connection to graphons from graph theory and combinatorics. We show that every well-behaved equational theory for our graph probabilistic programming language corresponds to a graphon, and conversely, every graphon arises in this way. We provide three constructions for showing that every graphon arises from an equational theory. The first is an abstract construction, using Markov categories and monoidal indeterminates. The second and third are more concrete. The second is in terms of traditional measure theoretic probability, which covers 'black-and-white' graphons. The third is in terms of probability monads on the nominal sets of Gabbay and Pitts. Specifically, we use a variation of nominal sets induced by the theory of graphs, which covers Erdős-Rényi graphons. In this way, we build new models of graph probabilistic programming from graphons. △ Less

Submitted 28 December, 2023; originally announced December 2023.

Comments: Accepted for POPL 2024

Journal ref: Proc. ACM Program. Lang. 8, POPL, Article 61 (2024), pp 1819-1849

arXiv:2308.02796 [pdf]

OBESEYE: Interpretable Diet Recommender for Obesity Management using Machine Learning and Explainable AI

Authors: Mrinmoy Roy, Srabonti Das, Anica Tasnim Protity

Abstract: Obesity, the leading cause of many non-communicable diseases, occurs mainly for eating more than our body requirements and lack of proper activity. So, being healthy requires heathy diet plans, especially for patients with comorbidities. But it is difficult to figure out the exact quantity of each nutrient because nutrients requirement varies based on physical and disease conditions. In our study… ▽ More Obesity, the leading cause of many non-communicable diseases, occurs mainly for eating more than our body requirements and lack of proper activity. So, being healthy requires heathy diet plans, especially for patients with comorbidities. But it is difficult to figure out the exact quantity of each nutrient because nutrients requirement varies based on physical and disease conditions. In our study we proposed a novel machine learning based system to predict the amount of nutrients one individual requires for being healthy. We applied different machine learning algorithms: linear regression, support vector machine (SVM), decision tree, random forest, XGBoost, LightGBM on fluid and 3 other major micronutrients: carbohydrate, protein, fat consumption prediction. We achieved high accuracy with low root mean square error (RMSE) by using linear regression in fluid prediction, random forest in carbohydrate prediction and LightGBM in protein and fat prediction. We believe our diet recommender system, OBESEYE, is the only of its kind which recommends diet with the consideration of comorbidities and physical conditions and promote encouragement to get rid of obesity. △ Less

Submitted 5 August, 2023; originally announced August 2023.

Report number: Roy, M.(2023).OBESEYE: Interpretable Diet Recommender for Obesity Management using Machine Learning and Explainable AI.IJRAMT, 4(6), 1-7. https://journal.ijramt.com/ijramt/article/view/2733

arXiv:2308.02662 [pdf, ps, other]

Linear isomorphism testing of Boolean functions with small approximate spectral norm

Authors: Arijit Ghosh, Chandrima Kayal, Manaswi Paraashar, Manmatha Roy

Abstract: Two Boolean functions f, g : F_2^{n} \to {-1, 1} are called linearly isomorphic if there exists an invertible matrix M \in F_2^{n\times n} such that f\circ M = g. Testing linear isomorphism is a generalization of, now classical in the context of property testing, isomorphism testing between Boolean functions. Linear-invariance of Boolean functions has also been extensively studied in other areas l… ▽ More Two Boolean functions f, g : F_2^{n} \to {-1, 1} are called linearly isomorphic if there exists an invertible matrix M \in F_2^{n\times n} such that f\circ M = g. Testing linear isomorphism is a generalization of, now classical in the context of property testing, isomorphism testing between Boolean functions. Linear-invariance of Boolean functions has also been extensively studied in other areas like coding theory and cryptography, and mathematics in general. In this paper, we will study the following two variants of this problem: [1] [Communication complexity: ] Assume that Boolean functions f and g on F_2^{n} are given to Alice and Bob respectively, and the goal is to test linear isomorphism between f and g by exchanging a minimum amount of communication, measured in bits, between Alice and Bob. Our main result is an efficient two-party communication protocol for testing linear isomorphism in terms of the approximate spectral norm of the functions. We will crucially exploit the connection between approximate spectral norm and sign-approximating polynomials. [2] [Query complexity: ] If f: F_2^{n} \to { -1, 1 } is a known function and g : F_2^{n} \to { -1, 1 } be an unknown function with a query access. We study the query complexity of testing linear isomorphism between f and g in terms of the approximate spectral norm of f. As in the case of communication complexity, we will use properties of the approximate spectral norm. △ Less

Submitted 4 August, 2023; originally announced August 2023.

arXiv:2307.14067 [pdf]

Machine Learning Applications In Healthcare: The State Of Knowledge and Future Directions

Authors: Mrinmoy Roy, Sarwar J. Minar, Porarthi Dhar, A T M Omor Faruq

Abstract: Detection of easily missed hidden patterns with fast processing power makes machine learning (ML) indispensable to today's healthcare system. Though many ML applications have already been discovered and many are still under investigation, only a few have been adopted by current healthcare systems. As a result, there exists an enormous opportunity in healthcare system for ML but distributed informa… ▽ More Detection of easily missed hidden patterns with fast processing power makes machine learning (ML) indispensable to today's healthcare system. Though many ML applications have already been discovered and many are still under investigation, only a few have been adopted by current healthcare systems. As a result, there exists an enormous opportunity in healthcare system for ML but distributed information, scarcity of properly arranged and easily explainable documentation in related sector are major impede which are making ML applications difficult to healthcare professionals. This study aimed to gather ML applications in different areas of healthcare concisely and more effectively so that necessary information can be accessed immediately with relevant references. We divided our study into five major groups: community level work, risk management/ preventive care, healthcare operation management, remote care, and early detection. Dividing these groups into subgroups, we provided relevant references with description in tabular form for quick access. Our objective is to inform people about ML applicability in healthcare industry, reduce the knowledge gap of clinicians about the ML applications and motivate healthcare professionals towards more machine learning based healthcare system. △ Less

Submitted 26 July, 2023; originally announced July 2023.

Journal ref: BJMHR, 10(6), 24-54 (2023)

arXiv:2307.10060 [pdf, other]

Accurate deep learning sub-grid scale models for large eddy simulations

Authors: Rikhi Bose, Arunabha M. Roy

Abstract: We present two families of sub-grid scale (SGS) turbulence models developed for large-eddy simulation (LES) purposes. Their development required the formulation of physics-informed robust and efficient Deep Learning (DL) algorithms which, unlike state-of-the-art analytical modeling techniques can produce high-order complex non-linear relations between inputs and outputs. Explicit filtering of data… ▽ More We present two families of sub-grid scale (SGS) turbulence models developed for large-eddy simulation (LES) purposes. Their development required the formulation of physics-informed robust and efficient Deep Learning (DL) algorithms which, unlike state-of-the-art analytical modeling techniques can produce high-order complex non-linear relations between inputs and outputs. Explicit filtering of data from direct simulations of the canonical channel flow at two friction Reynolds numbers $Re_τ\approx 395$ and 590 provided accurate data for training and testing. The two sets of models use different network architectures. One of the architectures uses tensor basis neural networks (TBNN) and embeds the simplified analytical model form of the general effective-viscosity hypothesis, thus incorporating the Galilean, rotational and reflectional invariances. The other architecture is that of a relatively simple network, that is able to incorporate the Galilean invariance only. However, this simpler architecture has better feature extraction capacity owing to its ability to establish relations between and extract information from cross-components of the integrity basis tensors and the SGS stresses. Both sets of models are used to predict the SGS stresses for feature datasets generated with different filter widths, and at different Reynolds numbers. It is shown that due to the simpler model's better feature learning capabilities, it outperforms the invariance embedded model in statistical performance metrics. In a priori tests, both sets of models provide similar levels of dissipation and backscatter. Based on the test results, both sets of models should be usable in a posteriori actual LESs. △ Less

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2306.17759 [pdf, other]

The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit

Authors: Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris Maddison, Daniel M. Roy

Abstract: In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width. We show that at initialization the limiting distribution can be described by a… ▽ More In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width. We show that at initialization the limiting distribution can be described by a stochastic differential equation (SDE) indexed by the depth-to-width ratio. To achieve a well-defined stochastic limit, the Transformer's attention mechanism is modified by centering the Softmax output at identity, and scaling the Softmax logits by a width-dependent temperature parameter. We examine the stability of the network through the corresponding SDE, showing how the scale of both the drift and diffusion can be elegantly controlled with the aid of residual connections. The existence of a stable SDE implies that the covariance structure is well-behaved, even for very large depth and width, thus preventing the notorious issues of rank degeneracy in deep attention models. Finally, we show, through simulations, that the SDE provides a surprisingly good description of the corresponding finite-size model. We coin the name shaped Transformer for these architectural modifications. △ Less

Submitted 9 December, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

arXiv:2303.14697 [pdf, ps, other]

The central tree property and algorithmic problems on subgroups of free groups

Authors: Mallika Roy, Enric Ventura, Pascal Weil

Abstract: We study the average case complexity of the uniform membership problem for subgroups of free groups, and we show that it is orders of magnitude smaller than the worst case complexity of the best known algorithms. This applies to subgroups given by a fixed number of generators as well as to subgroups given by an exponential number of generators. The main idea behind this result is to exploit a gene… ▽ More We study the average case complexity of the uniform membership problem for subgroups of free groups, and we show that it is orders of magnitude smaller than the worst case complexity of the best known algorithms. This applies to subgroups given by a fixed number of generators as well as to subgroups given by an exponential number of generators. The main idea behind this result is to exploit a generic property of tuples of words, called the central tree property. An application is given to the average case complexity of the relative primitivity problem, using Shpilrain's recent algorithm to decide primitivity, whose average case complexity is a constant depending only on the rank of the ambient free group. △ Less

Submitted 19 October, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

Comments: 28 pages. Inaccuracies corrected. To appear in Journal of Group Theory

MSC Class: 20E05; 20F10; 68Q17

arXiv:2303.04808 [pdf]

Prevalence and Major Risk Factors of Non-communicable Diseases: A Machine Learning based Cross-Sectional Study

Authors: Mrinmoy Roy, Anica Tasnim Protity, Srabonti Das, Porarthi Dhar

Abstract: Objective: The study aimed to determine the prevalence of several non-communicable diseases (NCD) and analyze risk factors among adult patients seeking nutritional guidance in Dhaka, Bangladesh. Result: Our study observed the relationships between gender, age groups, obesity, and NCDs (DM, CKD, IBS, CVD, CRD, thyroid). The most frequently reported NCD was cardiovascular issues (CVD), which was pre… ▽ More Objective: The study aimed to determine the prevalence of several non-communicable diseases (NCD) and analyze risk factors among adult patients seeking nutritional guidance in Dhaka, Bangladesh. Result: Our study observed the relationships between gender, age groups, obesity, and NCDs (DM, CKD, IBS, CVD, CRD, thyroid). The most frequently reported NCD was cardiovascular issues (CVD), which was present in 83.56% of all participants. CVD was more common in male participants. Consequently, male participants had a higher blood pressure distribution than females. Diabetes mellitus (DM), on the other hand, did not have a gender-based inclination. Both CVD and DM had an age-based progression. Our study showed that chronic respiratory illness was more frequent in middle-aged participants than in younger or elderly individuals. Based on the data, every one in five hospitalized patients was obese. We analyzed the co-morbidities and found that 31.5% of the population has only one NCD, 30.1% has two NCDs, and 38.3% has more than two NCDs. Besides, 86.25% of all diabetic patients had cardiovascular issues. All thyroid patients in our study had CVD. Using a t-test, we found a relationship between CKD and thyroid (p-value 0.061). Males under 35 years have a statistically significant relationship between thyroid and chronic respiratory diseases (p-value 0.018). We also found an association between DM and CKD among patients over 65 (p-value 0.038). Moreover, there has been a statistically significant relationship between CKD and Thyroid (P < 0.05) for those below 35 and 35-65. We used a two-way ANOVA test to find the statistically significant interaction of heart issues and chronic respiratory illness, in combination with diabetes. The combination of DM and RTI also affected CKD in male patients over 65 years old. △ Less

Submitted 18 May, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

Comments: 25 pages, 10 figures, 3 tables

arXiv:2303.04275 [pdf, other]

A Computer Vision Enabled damage detection model with improved YOLOv5 based on Transformer Prediction Head

Authors: Arunabha M. Roy, Jayabrata Bhaduri

Abstract: Objective:Computer vision-based up-to-date accurate damage classification and localization are of decisive importance for infrastructure monitoring, safety, and the serviceability of civil infrastructure. Current state-of-the-art deep learning (DL)-based damage detection models, however, often lack superior feature extraction capability in complex and noisy environments, limiting the development o… ▽ More Objective:Computer vision-based up-to-date accurate damage classification and localization are of decisive importance for infrastructure monitoring, safety, and the serviceability of civil infrastructure. Current state-of-the-art deep learning (DL)-based damage detection models, however, often lack superior feature extraction capability in complex and noisy environments, limiting the development of accurate and reliable object distinction. Method: To this end, we present DenseSPH-YOLOv5, a real-time DL-based high-performance damage detection model where DenseNet blocks have been integrated with the backbone to improve in preserving and reusing critical feature information. Additionally, convolutional block attention modules (CBAM) have been implemented to improve attention performance mechanisms for strong and discriminating deep spatial feature extraction that results in superior detection under various challenging environments. Moreover, additional feature fusion layers and a Swin-Transformer Prediction Head (SPH) have been added leveraging advanced self-attention mechanism for more efficient detection of multiscale object sizes and simultaneously reducing the computational complexity. Results: Evaluating the model performance in large-scale Road Damage Dataset (RDD-2018), at a detection rate of 62.4 FPS, DenseSPH-YOLOv5 obtains a mean average precision (mAP) value of 85.25 %, F1-score of 81.18 %, and precision (P) value of 89.51 % outperforming current state-of-the-art models. Significance: The present research provides an effective and efficient damage localization model addressing the shortcoming of existing DL-based damage detection models by providing highly accurate localized bounding box prediction. Current work constitutes a step towards an accurate and robust automated damage detection system in real-time in-field applications. △ Less

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2302.09668 [pdf, other]

Physics-aware deep learning framework for linear elasticity

Authors: Arunabha M. Roy, Rikhi Bose

Abstract: The paper presents an efficient and robust data-driven deep learning (DL) computational framework developed for linear continuum elasticity problems. The methodology is based on the fundamentals of the Physics Informed Neural Networks (PINNs). For an accurate representation of the field variables, a multi-objective loss function is proposed. It consists of terms corresponding to the residual of th… ▽ More The paper presents an efficient and robust data-driven deep learning (DL) computational framework developed for linear continuum elasticity problems. The methodology is based on the fundamentals of the Physics Informed Neural Networks (PINNs). For an accurate representation of the field variables, a multi-objective loss function is proposed. It consists of terms corresponding to the residual of the governing partial differential equations (PDE), constitutive relations derived from the governing physics, various boundary conditions, and data-driven physical knowledge fitting terms across randomly selected collocation points in the problem domain. To this end, multiple densely connected independent artificial neural networks (ANNs), each approximating a field variable, are trained to obtain accurate solutions. Several benchmark problems including the Airy solution to elasticity and the Kirchhoff-Love plate problem are solved. Performance in terms of accuracy and robustness illustrates the superiority of the current framework showing excellent agreement with analytical solutions. The present work combines the benefits of the classical methods depending on the physical information available in analytical relations with the superior capabilities of the DL techniques in the data-driven construction of lightweight, yet accurate and robust neural networks. The models developed herein can significantly boost computational speed using minimal network parameters with easy adaptability in different computational platforms. △ Less

Submitted 19 February, 2023; originally announced February 2023.

arXiv:2301.00948 [pdf, other]

Understanding EEG signals for subject-wise Definition of Armoni Activities

Authors: Kislay Raj, Aditya Singh, Abhishek Mandal, Teerath Kumar, Arunabha M. Roy

Abstract: In a growing world of technology, psychological disorders became a challenge to be solved. The methods used for cognitive stimulation are very conventional and based on one-way communication, which only relies on the material or method used for training of an individual. It doesn't use any kind of feedback from the individual to analyze the progress of the training process. We have proposed a clos… ▽ More In a growing world of technology, psychological disorders became a challenge to be solved. The methods used for cognitive stimulation are very conventional and based on one-way communication, which only relies on the material or method used for training of an individual. It doesn't use any kind of feedback from the individual to analyze the progress of the training process. We have proposed a closed-loop methodology to improve the cognitive state of a person with ID (Intellectual disability). We have used a platform named 'Armoni', for providing training to the intellectually disabled individuals. The learning is performed in a closed-loop by using feedback in the form of change in affective state. For feedback to the Armoni, an EEG (Electroencephalograph) headband is used. All the changes in EEG are observed and classified against the change in the mean and standard deviation value of all frequency bands of signal. This comparison is being helpful in defining every activity with respect to change in brain signals. In this paper, we have discussed the process of treatment of EEG signal and its definition against the different activities of Armoni. We have tested it on 6 different systems with different age groups and cognitive levels. △ Less

Submitted 26 April, 2023; v1 submitted 3 January, 2023; originally announced January 2023.

Comments: Submitted to SN Computer Science journal

arXiv:2301.00122 [pdf]

doi 10.24018/ejcompute.2023.3.1.85

Hair and Scalp Disease Detection using Machine Learning and Image Processing

Authors: Mrinmoy Roy, Anica Tasnim Protity

Abstract: Almost 80 million Americans suffer from hair loss due to aging, stress, medication, or genetic makeup. Hair and scalp-related diseases often go unnoticed in the beginning. Sometimes, a patient cannot differentiate between hair loss and regular hair fall. Diagnosing hair-related diseases is time-consuming as it requires professional dermatologists to perform visual and medical tests. Because of tha… ▽ More Almost 80 million Americans suffer from hair loss due to aging, stress, medication, or genetic makeup. Hair and scalp-related diseases often go unnoticed in the beginning. Sometimes, a patient cannot differentiate between hair loss and regular hair fall. Diagnosing hair-related diseases is time-consuming as it requires professional dermatologists to perform visual and medical tests. Because of that, the overall diagnosis gets delayed, which worsens the severity of the illness. Due to the image-processing ability, neural network-based applications are used in various sectors, especially healthcare and health informatics, to predict deadly diseases like cancers and tumors. These applications assist clinicians and patients and provide an initial insight into early-stage symptoms. In this study, we used a deep learning approach that successfully predicts three main types of hair loss and scalp-related diseases: alopecia, psoriasis, and folliculitis. However, limited study in this area, unavailability of a proper dataset, and degree of variety among the images scattered over the internet made the task challenging. 150 images were obtained from various sources and then preprocessed by denoising, image equalization, enhancement, and data balancing, thereby minimizing the error rate. After feeding the processed data into the 2D convolutional neural network (CNN) model, we obtained overall training accuracy of 96.2%, with a validation accuracy of 91.1%. The precision and recall score of alopecia, psoriasis, and folliculitis are 0.895, 0.846, and 1.0, respectively. We also created a dataset of the scalp images for future prospective researchers. △ Less

Submitted 30 May, 2023; v1 submitted 30 December, 2022; originally announced January 2023.

Journal ref: EJ-Compute.2023;3(1):7-13

arXiv:2212.13556 [pdf, other]

Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization

Authors: Mahdi Haghifam, Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund, Daniel M. Roy, Gintare Karolina Dziugaite

Abstract: To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds… ▽ More To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds are able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates. Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques. △ Less

Submitted 13 July, 2023; v1 submitted 27 December, 2022; originally announced December 2022.

Comments: 49 pages, 2 figures. This version corrects a mistake in the proof of Theorem 17. Proc. International Conference on Algorithmic Learning Theory (ALT), 2023

arXiv:2210.13738 [pdf, other]

Pruning's Effect on Generalization Through the Lens of Training and Regularization

Authors: Tian Jin, Michael Carbin, Daniel M. Roy, Jonathan Frankle, Gintare Karolina Dziugaite

Abstract: Practitioners frequently observe that pruning improves model generalization. A long-standing hypothesis based on bias-variance trade-off attributes this generalization improvement to model size reduction. However, recent studies on over-parameterization characterize a new model size regime, in which larger models achieve better generalization. Pruning models in this over-parameterized regime leads… ▽ More Practitioners frequently observe that pruning improves model generalization. A long-standing hypothesis based on bias-variance trade-off attributes this generalization improvement to model size reduction. However, recent studies on over-parameterization characterize a new model size regime, in which larger models achieve better generalization. Pruning models in this over-parameterized regime leads to a contradiction -- while theory predicts that reducing model size harms generalization, pruning to a range of sparsities nonetheless improves it. Motivated by this contradiction, we re-examine pruning's effect on generalization empirically. We show that size reduction cannot fully account for the generalization-improving effect of standard pruning algorithms. Instead, we find that pruning leads to better training at specific sparsities, improving the training loss over the dense model. We find that pruning also leads to additional regularization at other sparsities, reducing the accuracy degradation due to noisy examples over the dense model. Pruning extends model training time and reduces model size. These two factors improve training and add regularization respectively. We empirically demonstrate that both factors are essential to fully explaining pruning's impact on generalization. △ Less

Submitted 24 October, 2022; originally announced October 2022.

Comments: 49 pages, 20 figures

Journal ref: Advances in Neural Information Processing Systems 2022

arXiv:2210.04252 [pdf, other]

Precise Single-stage Detector

Authors: Aisha Chandio, Gong Gui, Teerath Kumar, Irfan Ullah, Ramin Ranjbarzadeh, Arunabha M Roy, Akhtar Hussain, Yao Shen

Abstract: There are still two problems in SDD causing some inaccurate results: (1) In the process of feature extraction, with the layer-by-layer acquisition of semantic information, local information is gradually lost, resulting into less representative feature maps; (2) During the Non-Maximum Suppression (NMS) algorithm due to inconsistency in classification and regression tasks, the classification confide… ▽ More There are still two problems in SDD causing some inaccurate results: (1) In the process of feature extraction, with the layer-by-layer acquisition of semantic information, local information is gradually lost, resulting into less representative feature maps; (2) During the Non-Maximum Suppression (NMS) algorithm due to inconsistency in classification and regression tasks, the classification confidence and predicted detection position cannot accurately indicate the position of the prediction boxes. Methods: In order to address these aforementioned issues, we propose a new architecture, a modified version of Single Shot Multibox Detector (SSD), named Precise Single Stage Detector (PSSD). Firstly, we improve the features by adding extra layers to SSD. Secondly, we construct a simple and effective feature enhancement module to expand the receptive field step by step for each layer and enhance its local and semantic information. Finally, we design a more efficient loss function to predict the IOU between the prediction boxes and ground truth boxes, and the threshold IOU guides classification training and attenuates the scores, which are used by the NMS algorithm. Main Results: Benefiting from the above optimization, the proposed model PSSD achieves exciting performance in real-time. Specifically, with the hardware of Titan Xp and the input size of 320 pix, PSSD achieves 33.8 mAP at 45 FPS speed on MS COCO benchmark and 81.28 mAP at 66 FPS speed on Pascal VOC 2007 outperforming state-of-the-art object detection models. Besides, the proposed model performs significantly well with larger input size. Under 512 pix, PSSD can obtain 37.2 mAP with 27 FPS on MS COCO and 82.82 mAP with 40 FPS on Pascal VOC 2007. The experiment results prove that the proposed model has a better trade-off between speed and accuracy. △ Less

Submitted 9 October, 2022; originally announced October 2022.

Comments: We will submit it soon to the IEEE transaction. Due to characters limitation, we can not upload the full abstract. Please read the pdf file for more detail

arXiv:2209.11234 [pdf, other]

doi 10.1002/adem.202300104

Artificial Intelligence in Material Engineering: A review on applications of AI in Material Engineering

Authors: Lipichanda Goswami, Manoj Deka, Mohendra Roy

Abstract: The role of artificial intelligence (AI) in material science and engineering (MSE) is becoming increasingly important as AI technology advances. The development of high-performance computing has made it possible to test deep learning (DL) models with significant parameters, providing an opportunity to overcome the limitation of traditional computational methods, such as density functional theory (… ▽ More The role of artificial intelligence (AI) in material science and engineering (MSE) is becoming increasingly important as AI technology advances. The development of high-performance computing has made it possible to test deep learning (DL) models with significant parameters, providing an opportunity to overcome the limitation of traditional computational methods, such as density functional theory (DFT), in property prediction. Machine learning (ML)-based methods are faster and more accurate than DFT-based methods. Furthermore, the generative adversarial networks (GANs) have facilitated the generation of chemical compositions of inorganic materials without using crystal structure information. These developments have significantly impacted material engineering (ME) and research. Some of the latest developments in AI in ME herein are reviewed. First, the development of AI in the critical areas of ME, such as in material processing, the study of structure and material property, and measuring the performance of materials in various aspects, is discussed. Then, the significant methods of AI and their uses in MSE, such as graph neural network, generative models, transfer of learning, etc. are discussed. The use of AI to analyze the results from existing analytical instruments is also discussed. Finally, AI's advantages, disadvantages, and future in ME are discussed. △ Less

Submitted 27 April, 2023; v1 submitted 15 September, 2022; originally announced September 2022.

Comments: V3

arXiv:2209.06977 [pdf]

SQL and NoSQL Databases Software architectures performance analysis and assessments -- A Systematic Literature review

Authors: Wisal Khan, Teerath Kumar, Zhang Cheng, Kislay Raj, Arunabha M Roy, Bin Luo

Abstract: Context: The efficient processing of Big Data is a challenging task for SQL and NoSQL Databases, where competent software architecture plays a vital role. The SQL Databases are designed for structuring data and supporting vertical scalability. In contrast, horizontal scalability is backed by NoSQL Databases and can process sizeable unstructured Data efficiently. One can choose the right paradigm a… ▽ More Context: The efficient processing of Big Data is a challenging task for SQL and NoSQL Databases, where competent software architecture plays a vital role. The SQL Databases are designed for structuring data and supporting vertical scalability. In contrast, horizontal scalability is backed by NoSQL Databases and can process sizeable unstructured Data efficiently. One can choose the right paradigm according to the organisation's needs; however, making the correct choice can often be challenging. The SQL and NoSQL Databases follow different architectures. Also, the mixed model is followed by each category of NoSQL Databases. Hence, data movement becomes difficult for cloud consumers across multiple cloud service providers (CSPs). In addition, each cloud platform IaaS, PaaS, SaaS, and DBaaS also monitors various paradigms. Objective: This systematic literature review (SLR) aims to study the related articles associated with SQL and NoSQL Database software architectures and tackle data portability and Interoperability among various cloud platforms. State of the art presented many performance comparison studies of SQL and NoSQL Databases by observing scaling, performance, availability, consistency and sharding characteristics. According to the research studies, NoSQL Database designed structures can be the right choice for big data analytics, while SQL Databases are suitable for OLTP Databases. The researcher proposes numerous approaches associated with data movement in the cloud. Platform-based APIs are developed, which makes users' data movement difficult. Therefore, data portability and Interoperability issues are noticed during data movement across multiple CSPs. To minimize developer efforts and Interoperability, Unified APIs are demanded to make data movement relatively more accessible among various cloud platforms. △ Less

Submitted 14 September, 2022; originally announced September 2022.

Comments: 57 pages systematic literature review, already submitted to Big Data Research; More importantly, we can not add method, result and conclusion section in the abstract here due to characters limitations. Please check pdf file

arXiv:2208.00788 [pdf, other]

A Hybrid CNN-LSTM model for Video Deepfake Detection by Leveraging Optical Flow Features

Authors: Pallabi Saikia, Dhwani Dholaria, Priyanka Yadav, Vaidehi Patel, Mohendra Roy

Abstract: Deepfakes are the synthesized digital media in order to create ultra-realistic fake videos to trick the spectator. Deep generative algorithms, such as, Generative Adversarial Networks(GAN) are widely used to accomplish such tasks. This approach synthesizes pseudo-realistic contents that are very difficult to distinguish by traditional detection methods. In most cases, Convolutional Neural Network(… ▽ More Deepfakes are the synthesized digital media in order to create ultra-realistic fake videos to trick the spectator. Deep generative algorithms, such as, Generative Adversarial Networks(GAN) are widely used to accomplish such tasks. This approach synthesizes pseudo-realistic contents that are very difficult to distinguish by traditional detection methods. In most cases, Convolutional Neural Network(CNN) based discriminators are being used for detecting such synthesized media. However, it emphasise primarily on the spatial attributes of individual video frames, thereby fail to learn the temporal information from their inter-frame relations. In this paper, we leveraged an optical flow based feature extraction approach to extract the temporal features, which are then fed to a hybrid model for classification. This hybrid model is based on the combination of CNN and recurrent neural network (RNN) architectures. The hybrid model provides effective performance on open source data-sets such as, DFDC, FF++ and Celeb-DF. This proposed method shows an accuracy of 66.26%, 91.21% and 79.49% in DFDC, FF++, and Celeb-DF respectively with a very reduced No of sample size of approx 100 samples(frames). This promises early detection of fake contents compared to existing modalities. △ Less

Submitted 28 July, 2022; originally announced August 2022.

Journal ref: Copyright is with IEEE, Paper No: 832, IJCNN, 2022 IEEE World Congress on Computational Intelligence

arXiv:2207.13500 [pdf, other]

Modelling Social Context for Fake News Detection: A Graph Neural Network Based Approach

Authors: Pallabi Saikia, Kshitij Gundale, Ankit Jain, Dev Jadeja, Harvi Patel, Mohendra Roy

Abstract: Detection of fake news is crucial to ensure the authenticity of information and maintain the news ecosystems reliability. Recently, there has been an increase in fake news content due to the recent proliferation of social media and fake content generation techniques such as Deep Fake. The majority of the existing modalities of fake news detection focus on content based approaches. However, most of… ▽ More Detection of fake news is crucial to ensure the authenticity of information and maintain the news ecosystems reliability. Recently, there has been an increase in fake news content due to the recent proliferation of social media and fake content generation techniques such as Deep Fake. The majority of the existing modalities of fake news detection focus on content based approaches. However, most of these techniques fail to deal with ultra realistic synthesized media produced by generative models. Our recent studies find that the propagation characteristics of authentic and fake news are distinguishable, irrespective of their modalities. In this regard, we have investigated the auxiliary information based on social context to detect fake news. This paper has analyzed the social context of fake news detection with a hybrid graph neural network based approach. This hybrid model is based on integrating a graph neural network on the propagation of news and bi directional encoder representations from the transformers model on news content to learn the text features. Thus this proposed approach learns the content as well as the context features and hence able to outperform the baseline models with an f1 score of 0.91 on PolitiFact and 0.93 on the Gossipcop dataset, respectively △ Less

Submitted 27 July, 2022; originally announced July 2022.

Journal ref: copyright with IEEE, Paper No: 834, IJCNN, 2022 IEEE World Congress on Computational Intelligence

arXiv:2207.12395 [pdf, other]

Tuning Stochastic Gradient Algorithms for Statistical Inference via Large-Sample Asymptotics

Authors: Jeffrey Negrea, Jun Yang, Haoyue Feng, Daniel M. Roy, Jonathan H. Huggins

Abstract: The tuning of stochastic gradient algorithms (SGAs) for optimization and sampling is often based on heuristics and trial-and-error rather than generalizable theory. We address this theory--practice gap by characterizing the large-sample statistical asymptotics of SGAs via a joint step-size--sample-size scaling limit. We show that iterate averaging with a large fixed step size is robust to the choi… ▽ More The tuning of stochastic gradient algorithms (SGAs) for optimization and sampling is often based on heuristics and trial-and-error rather than generalizable theory. We address this theory--practice gap by characterizing the large-sample statistical asymptotics of SGAs via a joint step-size--sample-size scaling limit. We show that iterate averaging with a large fixed step size is robust to the choice of tuning parameters and asymptotically has covariance proportional to that of the MLE sampling distribution. We also prove a Bernstein--von Mises-like theorem to guide tuning, including for generalized posteriors that are robust to model misspecification. Numerical experiments validate our results and recommendations in realistic finite-sample regimes. Our work lays the foundation for a systematic analysis of other stochastic gradient Markov chain Monte Carlo algorithms for a wide range of models. △ Less

Submitted 20 July, 2023; v1 submitted 25 July, 2022; originally announced July 2022.

Comments: 42 pgs

arXiv:2206.14800 [pdf, other]

Understanding Generalization via Leave-One-Out Conditional Mutual Information

Authors: Mahdi Haghifam, Shay Moran, Daniel M. Roy, Gintare Karolina Dziugaite

Abstract: We study the mutual information between (certain summaries of) the output of a learning algorithm and its $n$ training data, conditional on a supersample of $n+1$ i.i.d. data from which the training data is chosen at random without replacement. These leave-one-out variants of the conditional mutual information (CMI) of an algorithm (Steinke and Zakynthinou, 2020) are also seen to control the mean… ▽ More We study the mutual information between (certain summaries of) the output of a learning algorithm and its $n$ training data, conditional on a supersample of $n+1$ i.i.d. data from which the training data is chosen at random without replacement. These leave-one-out variants of the conditional mutual information (CMI) of an algorithm (Steinke and Zakynthinou, 2020) are also seen to control the mean generalization error of learning algorithms with bounded loss functions. For learning algorithms achieving zero empirical risk under 0-1 loss (i.e., interpolating algorithms), we provide an explicit connection between leave-one-out CMI and the classical leave-one-out error estimate of the risk. Using this connection, we obtain upper and lower bounds on risk in terms of the (evaluated) leave-one-out CMI. When the limiting risk is constant or decays polynomially, the bounds converge to within a constant factor of two. As an application, we analyze the population risk of the one-inclusion graph algorithm, a general-purpose transductive learning algorithm for VC classes in the realizable setting. Using leave-one-out CMI, we match the optimal bound for learning VC classes in the realizable setting, answering an open challenge raised by Steinke and Zakynthinou (2020). Finally, in order to understand the role of leave-one-out CMI in studying generalization, we place leave-one-out CMI in a hierarchy of measures, with a novel unconditional mutual information at the root. For 0-1 loss and interpolating learning algorithms, this mutual information is observed to be precisely the risk. △ Less

Submitted 29 June, 2022; originally announced June 2022.

Comments: 18 pages

arXiv:2206.02768 [pdf, other]

The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization

Authors: Mufan Bill Li, Mihai Nica, Daniel M. Roy

Abstract: The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current inf… ▽ More The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers. To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function. △ Less

Submitted 14 June, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: 48 pages, 10 figures. Advances in Neural Information Processing Systems (2022)

arXiv:2203.00899 [pdf, other]

doi 10.48550/arXiv.2203.00899

Machine learning based lens-free imaging technique for field-portable cytometry

Authors: Rajkumar Vaghashiya, Sanghoon Shin, Varun Chauhan, Kaushal Kapadiya, Smit Sanghavi, Sungkyu Seo, Mohendra Roy

Abstract: Lens-free Shadow Imaging Technique (LSIT) is a well-established technique for the characterization of microparticles and biological cells. Due to its simplicity and cost-effectiveness, various low-cost solutions have been evolved, such as automatic analysis of complete blood count (CBC), cell viability, 2D cell morphology, 3D cell tomography, etc. The developed auto characterization algorithm so f… ▽ More Lens-free Shadow Imaging Technique (LSIT) is a well-established technique for the characterization of microparticles and biological cells. Due to its simplicity and cost-effectiveness, various low-cost solutions have been evolved, such as automatic analysis of complete blood count (CBC), cell viability, 2D cell morphology, 3D cell tomography, etc. The developed auto characterization algorithm so far for this custom-developed LSIT cytometer was based on the hand-crafted features of the cell diffraction patterns from the LSIT cytometer, that were determined from our empirical findings on thousands of samples of individual cell types, which limit the system in terms of induction of a new cell type for auto classification or characterization. Further, its performance is suffering from poor image (cell diffraction pattern) signatures due to its small signal or background noise. In this work, we address these issues by leveraging the artificial intelligence-powered auto signal enhancing scheme such as denoising autoencoder and adaptive cell characterization technique based on the transfer of learning in deep neural networks. The performance of our proposed method shows an increase in accuracy >98% along with the signal enhancement of >5 dB for most of the cell types, such as Red Blood Cell (RBC) and White Blood Cell (WBC). Furthermore, the model is adaptive to learn new type of samples within a few learning iterations and able to successfully classify the newly introduced sample along with the existing other sample types. △ Less

Submitted 2 March, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

Comments: Published in Biosensors Journal

Journal ref: https://www.mdpi.com/2079-6374/12/3/144

arXiv:2202.05100 [pdf, other]

Adaptively Exploiting d-Separators with Causal Bandits

Authors: Blair Bilodeau, Linbo Wang, Daniel M. Roy

Abstract: Multi-armed bandit problems provide a framework to identify the optimal intervention over a sequence of repeated experiments. Without additional assumptions, minimax optimal performance (measured by cumulative regret) is well-understood. With access to additional observed variables that d-separate the intervention from the outcome (i.e., they are a d-separator), recent "causal bandit" algorithms p… ▽ More Multi-armed bandit problems provide a framework to identify the optimal intervention over a sequence of repeated experiments. Without additional assumptions, minimax optimal performance (measured by cumulative regret) is well-understood. With access to additional observed variables that d-separate the intervention from the outcome (i.e., they are a d-separator), recent "causal bandit" algorithms provably incur less regret. However, in practice it is desirable to be agnostic to whether observed variables are a d-separator. Ideally, an algorithm should be adaptive; that is, perform nearly as well as an algorithm with oracle knowledge of the presence or absence of a d-separator. In this work, we formalize and study this notion of adaptivity, and provide a novel algorithm that simultaneously achieves (a) optimal regret when a d-separator is observed, improving on classical minimax algorithms, and (b) significantly smaller regret than recent causal bandit algorithms when the observed variables are not a d-separator. Crucially, our algorithm does not require any oracle knowledge of whether a d-separator is observed. We also generalize this adaptivity to other conditions, such as the front-door criterion. △ Less

Submitted 26 October, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

Comments: 29 pages, 3 figures. Camera ready version

Journal ref: NeurIPS 2022

arXiv:2202.01916 [pdf, other]

doi 10.1016/j.matpr.2022.01.120

Artificial Intelligence Powered Material Search Engine

Authors: Mohendra Roy

Abstract: Many data-driven applications in material science have been made possible because of recent breakthroughs in artificial intelligence(AI). The use of AI in material engineering is becoming more viable as the number of material data such as X-Ray diffraction, various spectroscopy, and microscope data grows. In this work, we have reported a material search engine that uses the interatomic space (d va… ▽ More Many data-driven applications in material science have been made possible because of recent breakthroughs in artificial intelligence(AI). The use of AI in material engineering is becoming more viable as the number of material data such as X-Ray diffraction, various spectroscopy, and microscope data grows. In this work, we have reported a material search engine that uses the interatomic space (d value) from X-ray diffraction to provide material information. We have investigated various techniques for predicting prospective material using X-ray diffraction data. We used the Random Forest, Naive Bayes (Gaussian), and Neural Network algorithms to achieve this. These algorithms have an average accuracy of 88.50\%, 100.0\%, and 88.89\%, respectively. Finally, we combined all these techniques into an ensemble approach to make the prediction more generic. This ensemble method has a ~100\% accuracy rate. Furthermore, we are designing a graph neural network (GNN)-based architecture to improve interpretability and accuracy. Thus, we want to solve the computational and time complexity of traditional dictionary-based and metadata-based material search engines and to provide a more generic prediction. △ Less

Submitted 19 January, 2022; originally announced February 2022.

Comments: 4 pages

Report number: Article reference No: MATPR29663

Journal ref: Materials Today Journal, Elsevier; Article reference No: MATPR29663, 2022

arXiv:2112.00131 [pdf, other]

doi 10.1007/978-3-030-99194-4_30

CovidAlert -- A Wristwatch-based System to Alert Users from Face Touching

Authors: Mrinmoy Roy, Venkata Devesh Reddy Seethi, Rami Lake, Pratool Bharti

Abstract: Worldwide 2019 million people have been infected and 4.5 million have lost their lives in the ongoing Covid-19 pandemic. Until vaccines became widely available, precautions and safety measures like wearing masks, physical distancing, avoiding face touching were some of the primary means to curb the spread of virus. Face touching is a compulsive human begavior that can not be prevented without maki… ▽ More Worldwide 2019 million people have been infected and 4.5 million have lost their lives in the ongoing Covid-19 pandemic. Until vaccines became widely available, precautions and safety measures like wearing masks, physical distancing, avoiding face touching were some of the primary means to curb the spread of virus. Face touching is a compulsive human begavior that can not be prevented without making a continuous consious effort, even then it is inevitable. To address this problem, we have designed a smartwatch-based solution, CovidAlert, that leverages Random Forest algorithm trained on accelerometer and gyroscope data from the smartwatch to detects hand transition to face and sends a quick haptic alert to the users. CovidALert is highly energy efficient as it employs STA/LTA algorithm as a gatekeeper to curtail the usage of Random Forest model on the watch when user is inactive. The overall accuracy of our system is 88.4% with low false negatives and false positives. We also demonstrated the system viability by implementing it on a commercial Fossil Gen 5 smartwatch. △ Less

Submitted 11 April, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

Comments: 17 pages, 9 figures, PervasiveHealth2021 conference

arXiv:2111.09109 [pdf, other]

Physics-guided Loss Functions Improve Deep Learning Performance in Inverse Scattering

Authors: Zicheng Liu, Mayank Roy, Dilip K. Prasad, Krishna Agarwal

Abstract: Solving electromagnetic inverse scattering problems (ISPs) is challenging due to the intrinsic nonlinearity, ill-posedness, and expensive computational cost. Recently, deep neural network (DNN) techniques have been successfully applied on ISPs and shown potential of superior imaging over conventional methods. In this paper, we analyse the analogy between DNN solvers and traditional iterative algor… ▽ More Solving electromagnetic inverse scattering problems (ISPs) is challenging due to the intrinsic nonlinearity, ill-posedness, and expensive computational cost. Recently, deep neural network (DNN) techniques have been successfully applied on ISPs and shown potential of superior imaging over conventional methods. In this paper, we analyse the analogy between DNN solvers and traditional iterative algorithms and discuss how important physical phenomena cannot be effectively incorporated in the training process. We show the importance of including near-field priors in the learning process of DNNs. To this end, we propose new designs of loss functions which incorporate multiple-scattering based near-field quantities (such as scattered fields or induced currents within domain of interest). Effects of physics-guided loss functions are studied using a variety of numerical experiments. Pros and cons of the investigated ISP solvers with different loss functions are summarized. △ Less

Submitted 13 November, 2021; originally announced November 2021.

arXiv:2111.05275 [pdf, other]

Towards a Unified Information-Theoretic Framework for Generalization

Authors: Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, Daniel M. Roy

Abstract: In this work, we investigate the expressiveness of the "conditional mutual information" (CMI) framework of Steinke and Zakynthinou (2020) and the prospect of using it to provide a unified framework for proving generalization bounds in the realizable setting. We first demonstrate that one can use this framework to express non-trivial (but sub-optimal) bounds for any learning algorithm that outputs… ▽ More In this work, we investigate the expressiveness of the "conditional mutual information" (CMI) framework of Steinke and Zakynthinou (2020) and the prospect of using it to provide a unified framework for proving generalization bounds in the realizable setting. We first demonstrate that one can use this framework to express non-trivial (but sub-optimal) bounds for any learning algorithm that outputs hypotheses from a class of bounded VC dimension. We prove that the CMI framework yields the optimal bound on the expected risk of Support Vector Machines (SVMs) for learning halfspaces. This result is an application of our general result showing that stable compression schemes Bousquet al. (2020) of size $k$ have uniformly bounded CMI of order $O(k)$. We further show that an inherent limitation of proper learning of VC classes contradicts the existence of a proper learner with constant CMI, and it implies a negative resolution to an open problem of Steinke and Zakynthinou (2020). We further study the CMI of empirical risk minimizers (ERMs) of class $H$ and show that it is possible to output all consistent classifiers (version space) with bounded CMI if and only if $H$ has a bounded star number (Hanneke and Yang (2015)). Moreover, we prove a general reduction showing that "leave-one-out" analysis is expressible via the CMI framework. As a corollary we investigate the CMI of the one-inclusion-graph algorithm proposed by Haussler et al. (1994). More generally, we show that the CMI framework is universal in the sense that for every consistent algorithm and data distribution, the expected risk vanishes as the number of samples diverges if and only if its evaluated CMI has sublinear growth with the number of samples. △ Less

Submitted 17 November, 2021; v1 submitted 9 November, 2021; originally announced November 2021.

Comments: 22 Pages, NeurIPS 2021, This submission subsumes [arXiv:2011.02970] ("On the Information Complexity of Proper Learners for VC Classes in the Realizable Case")

arXiv:2111.00298 [pdf, other]

A fast accurate fine-grain object detection model based on YOLOv4 deep neural network

Authors: Arunabha M. Roy, Rikhi Bose, Jayabrata Bhaduri

Abstract: Early identification and prevention of various plant diseases in commercial farms and orchards is a key feature of precision agriculture technology. This paper presents a high-performance real-time fine-grain object detection framework that addresses several obstacles in plant disease detection that hinder the performance of traditional methods, such as, dense distribution, irregular morphology, m… ▽ More Early identification and prevention of various plant diseases in commercial farms and orchards is a key feature of precision agriculture technology. This paper presents a high-performance real-time fine-grain object detection framework that addresses several obstacles in plant disease detection that hinder the performance of traditional methods, such as, dense distribution, irregular morphology, multi-scale object classes, textural similarity, etc. The proposed model is built on an improved version of the You Only Look Once (YOLOv4) algorithm. The modified network architecture maximizes both detection accuracy and speed by including the DenseNet in the back-bone to optimize feature transfer and reuse, two new residual blocks in the backbone and neck enhance feature extraction and reduce computing cost; the Spatial Pyramid Pooling (SPP) enhances receptive field, and a modified Path Aggregation Network (PANet) preserves fine-grain localized information and improve feature fusion. Additionally, the use of the Hard-Swish function as the primary activation improved the model's accuracy due to better nonlinear feature extraction. The proposed model is tested in detecting four different diseases in tomato plants under various challenging environments. The model outperforms the existing state-of-the-art detection models in detection accuracy and speed. At a detection rate of 70.19 FPS, the proposed model obtained a precision value of $90.33 \%$, F1-score of $93.64 \%$, and a mean average precision ($mAP$) value of $96.29 \%$. Current work provides an effective and efficient method for detecting different plant diseases in complex scenarios that can be extended to different fruit and crop detection, generic disease detection, and various automated agricultural detection processes. △ Less

Submitted 30 October, 2021; originally announced November 2021.

MSC Class: 68T01; 68T05; 68T07; 68T10; 68T40; 68T45; 68U10; ACM Class: I.4.9; I.5.2; I.5.4; I.2.1; I.2.9; I.2.m; J.7

arXiv:2110.14804 [pdf, other]

Minimax Optimal Quantile and Semi-Adversarial Regret via Root-Logarithmic Regularizers

Authors: Jeffrey Negrea, Blair Bilodeau, Nicolò Campolongo, Francesco Orabona, Daniel M. Roy

Abstract: Quantile (and, more generally, KL) regret bounds, such as those achieved by NormalHedge (Chaudhuri, Freund, and Hsu 2009) and its variants, relax the goal of competing against the best individual expert to only competing against a majority of experts on adversarial data. More recently, the semi-adversarial paradigm (Bilodeau, Negrea, and Roy 2020) provides an alternative relaxation of adversarial… ▽ More Quantile (and, more generally, KL) regret bounds, such as those achieved by NormalHedge (Chaudhuri, Freund, and Hsu 2009) and its variants, relax the goal of competing against the best individual expert to only competing against a majority of experts on adversarial data. More recently, the semi-adversarial paradigm (Bilodeau, Negrea, and Roy 2020) provides an alternative relaxation of adversarial online learning by considering data that may be neither fully adversarial nor stochastic (i.i.d.). We achieve the minimax optimal regret in both paradigms using FTRL with separate, novel, root-logarithmic regularizers, both of which can be interpreted as yielding variants of NormalHedge. We extend existing KL regret upper bounds, which hold uniformly over target distributions, to possibly uncountable expert classes with arbitrary priors; provide the first full-information lower bounds for quantile regret on finite expert classes (which are tight); and provide an adaptively minimax optimal algorithm for the semi-adversarial paradigm that adapts to the true, unknown constraint faster, leading to uniformly improved regret bounds over existing methods. △ Less

Submitted 7 November, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: 30 pages, 2 figures. Jeffrey Negrea and Blair Bilodeau are equal-contribution authors. Updated citations

Journal ref: NeurIPS 2021

arXiv:2110.13967 [pdf, other]

Evaluating Serverless Architecture for Big Data Enterprise Applications

Authors: Aimer Bhat, Madhumonti Roy, Heeki Park

Abstract: In this paper, we investigate serverless computing for performing large scale data processing with cloudnative primitives. In this paper, we investigate serverless computing for performing large scale data processing with cloudnative primitives. △ Less

Submitted 26 October, 2021; originally announced October 2021.

Comments: 8 pages

Journal ref: BDCAT 2021

arXiv:2107.10653 [pdf, other]

Dialogue Object Search

Authors: Monica Roy, Kaiyu Zheng, Jason Liu, Stefanie Tellex

Abstract: We envision robots that can collaborate and communicate seamlessly with humans. It is necessary for such robots to decide both what to say and how to act, while interacting with humans. To this end, we introduce a new task, dialogue object search: A robot is tasked to search for a target object (e.g. fork) in a human environment (e.g., kitchen), while engaging in a "video call" with a remote human… ▽ More We envision robots that can collaborate and communicate seamlessly with humans. It is necessary for such robots to decide both what to say and how to act, while interacting with humans. To this end, we introduce a new task, dialogue object search: A robot is tasked to search for a target object (e.g. fork) in a human environment (e.g., kitchen), while engaging in a "video call" with a remote human who has additional but inexact knowledge about the target's location. That is, the robot conducts speech-based dialogue with the human, while sharing the image from its mounted camera. This task is challenging at multiple levels, from data collection, algorithm and system development,to evaluation. Despite these challenges, we believe such a task blocks the path towards more intelligent and collaborative robots. In this extended abstract, we motivate and introduce the dialogue object search task and analyze examples collected from a pilot study. We then discuss our next steps and conclude with several challenges on which we hope to receive feedback. △ Less

Submitted 22 July, 2021; originally announced July 2021.

Comments: 3 pages, 1 figure. Robotics: Science and Systems (RSS) 2021 Workshop on Robotics for People (R4P): Perspectives on Interaction, Learning and Safety. Extended Abstract

arXiv:2106.04013 [pdf, other]

The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization

Authors: Mufan Bill Li, Mihai Nica, Daniel M. Roy

Abstract: Theoretical results show that neural networks can be approximated by Gaussian processes in the infinite-width limit. However, for fully connected networks, it has been previously shown that for any fixed network width, $n$, the Gaussian approximation gets worse as the network depth, $d$, increases. Given that modern networks are deep, this raises the question of how well modern architectures, like… ▽ More Theoretical results show that neural networks can be approximated by Gaussian processes in the infinite-width limit. However, for fully connected networks, it has been previously shown that for any fixed network width, $n$, the Gaussian approximation gets worse as the network depth, $d$, increases. Given that modern networks are deep, this raises the question of how well modern architectures, like ResNets, are captured by the infinite-width limit. To provide a better approximation, we study ReLU ResNets in the infinite-depth-and-width limit, where both depth and width tend to infinity as their ratio, $d/n$, remains constant. In contrast to the Gaussian infinite-width limit, we show theoretically that the network exhibits log-Gaussian behaviour at initialization in the infinite-depth-and-width limit, with parameters depending on the ratio $d/n$. Using Monte Carlo simulations, we demonstrate that even basic properties of standard ResNet architectures are poorly captured by the Gaussian limit, but remarkably well captured by our log-Gaussian limit. Moreover, our analysis reveals that ReLU ResNets at initialization are hypoactivated: fewer than half of the ReLUs are activated. Additionally, we calculate the interlayer correlations, which have the effect of exponentially increasing the variance of the network output. Based on our analysis, we introduce Balanced ResNets, a simple architecture modification, which eliminates hypoactivation and interlayer correlations and is more amenable to theoretical analysis. △ Less

Submitted 27 October, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

arXiv:2104.13818

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

Authors: Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

Abstract: As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD prov… ▽ More As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD provides strong theoretical guarantees, however, for practical purposes, the authors proposed a heuristic variant which we call QSGDinf, which demonstrated impressive empirical gains for distributed training of large neural networks. In this paper, we build on this work to propose a new gradient quantization scheme, and show that it has both stronger theoretical guarantees than QSGD, and matches and exceeds the empirical performance of the QSGDinf heuristic and of other compression methods. △ Less

Submitted 1 May, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

Comments: This entry is redundant and was created in error. See arXiv:1908.06077 for the latest version

arXiv:2104.03252 [pdf]

Leaving Goals on the Pitch: Evaluating Decision Making in Soccer

Authors: Maaike Van Roy, Pieter Robberechts, Wen-Chi Yang, Luc De Raedt, Jesse Davis

Abstract: Analysis of the popular expected goals (xG) metric in soccer has determined that a (slightly) smaller number of high-quality attempts will likely yield more goals than a slew of low-quality ones. This observation has driven a change in shooting behavior. Teams are passing up on shots from outside the penalty box, in the hopes of generating a better shot closer to goal later on. This paper evaluate… ▽ More Analysis of the popular expected goals (xG) metric in soccer has determined that a (slightly) smaller number of high-quality attempts will likely yield more goals than a slew of low-quality ones. This observation has driven a change in shooting behavior. Teams are passing up on shots from outside the penalty box, in the hopes of generating a better shot closer to goal later on. This paper evaluates whether this decrease in long-distance shots is warranted. Therefore, we propose a novel generic framework to reason about decision-making in soccer by combining techniques from machine learning and artificial intelligence (AI). First, we model how a team has behaved offensively over the course of two seasons by learning a Markov Decision Process (MDP) from event stream data. Second, we use reasoning techniques arising from the AI literature on verification to each team's MDP. This allows us to reason about the efficacy of certain potential decisions by posing counterfactual questions to the MDP. Our key conclusion is that teams would score more goals if they shot more often from outside the penalty box in a small number of team-specific locations. The proposed framework can easily be extended and applied to analyze other aspects of the game. △ Less

Submitted 16 February, 2023; v1 submitted 7 April, 2021; originally announced April 2021.

Comments: Add missing funding

Journal ref: 2021 MIT Sloan Sports Analytics Conference

arXiv:2102.00931 [pdf, other]

Information-Theoretic Generalization Bounds for Stochastic Gradient Descent

Authors: Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, Daniel M. Roy

Abstract: We study the generalization properties of the popular stochastic optimization method known as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our main contribution is providing upper bounds on the generalization error that depend on local statistics of the stochastic gradients evaluated along the path of iterates calculated by SGD. The key factors our bounds dep… ▽ More We study the generalization properties of the popular stochastic optimization method known as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our main contribution is providing upper bounds on the generalization error that depend on local statistics of the stochastic gradients evaluated along the path of iterates calculated by SGD. The key factors our bounds depend on are the variance of the gradients (with respect to the data distribution) and the local smoothness of the objective function along the SGD path, and the sensitivity of the loss function to perturbations to the final output. Our key technical tool is combining the information-theoretic generalization bounds previously used for analyzing randomized variants of SGD with a perturbation analysis of the iterates. △ Less

Submitted 15 August, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

Comments: COLT 2021

arXiv:2101.10589 [pdf, other]

Glioblastoma Multiforme Patient Survival Prediction

Authors: Snehal Rajput, Rupal Agravat, Mohendra Roy, Mehul S Raval

Abstract: Glioblastoma Multiforme is a very aggressive type of brain tumor. Due to spatial and temporal intra-tissue inhomogeneity, location and the extent of the cancer tissue, it is difficult to detect and dissect the tumor regions. In this paper, we propose survival prognosis models using four regressors operating on handcrafted image-based and radiomics features. We hypothesize that the radiomics shape… ▽ More Glioblastoma Multiforme is a very aggressive type of brain tumor. Due to spatial and temporal intra-tissue inhomogeneity, location and the extent of the cancer tissue, it is difficult to detect and dissect the tumor regions. In this paper, we propose survival prognosis models using four regressors operating on handcrafted image-based and radiomics features. We hypothesize that the radiomics shape features have the highest correlation with survival prediction. The proposed approaches were assessed on the Brain Tumor Segmentation (BraTS-2020) challenge dataset. The highest accuracy of image features with random forest regressor approach was 51.5\% for the training and 51.7\% for the validation dataset. The gradient boosting regressor with shape features gave an accuracy of 91.5\% and 62.1\% on training and validation datasets respectively. It is better than the BraTS 2020 survival prediction challenge winners on the training and validation datasets. Our work shows that handcrafted features exhibit a strong correlation with survival prediction. The consensus based regressor with gradient boosting and radiomics shape features is the best combination for survival prediction. △ Less

Submitted 26 January, 2021; originally announced January 2021.

Comments: 10 pages, 9 figures

Journal ref: 2021 International Conference on Medical Imaging and Computer-Aided Diagnosis (MICAD 2021)

arXiv:2012.07976 [pdf, other]

NeurIPS 2020 Competition: Predicting Generalization in Deep Learning

Authors: Yiding Jiang, Pierre Foret, Scott Yak, Daniel M. Roy, Hossein Mobahi, Gintare Karolina Dziugaite, Samy Bengio, Suriya Gunasekar, Isabelle Guyon, Behnam Neyshabur

Abstract: Understanding generalization in deep learning is arguably one of the most important questions in deep learning. Deep learning has been successfully adopted to a large number of problems ranging from pattern recognition to complex decision making, but many recent researchers have raised many concerns about deep learning, among which the most important is generalization. Despite numerous attempts, c… ▽ More Understanding generalization in deep learning is arguably one of the most important questions in deep learning. Deep learning has been successfully adopted to a large number of problems ranging from pattern recognition to complex decision making, but many recent researchers have raised many concerns about deep learning, among which the most important is generalization. Despite numerous attempts, conventional statistical learning approaches have yet been able to provide a satisfactory explanation on why deep learning works. A recent line of works aims to address the problem by trying to predict the generalization performance through complexity measures. In this competition, we invite the community to propose complexity measures that can accurately predict generalization of models. A robust and general complexity measure would potentially lead to a better understanding of deep learning's underlying mechanism and behavior of deep models on unseen data, or shed light on better generalization bounds. All these outcomes will be important for making deep learning more robust and reliable. △ Less

Submitted 14 December, 2020; originally announced December 2020.

Comments: 20 pages, 2 figures. Accepted for NeurIPS 2020 Competitions Track. Lead organizer: Yiding Jiang

arXiv:2011.02970 [pdf, other]

On the Information Complexity of Proper Learners for VC Classes in the Realizable Case

Authors: Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, Daniel M. Roy

Abstract: We provide a negative resolution to a conjecture of Steinke and Zakynthinou (2020a), by showing that their bound on the conditional mutual information (CMI) of proper learners of Vapnik--Chervonenkis (VC) classes cannot be improved from $d \log n +2$ to $O(d)$, where $n$ is the number of i.i.d. training examples. In fact, we exhibit VC classes for which the CMI of any proper learner cannot be boun… ▽ More We provide a negative resolution to a conjecture of Steinke and Zakynthinou (2020a), by showing that their bound on the conditional mutual information (CMI) of proper learners of Vapnik--Chervonenkis (VC) classes cannot be improved from $d \log n +2$ to $O(d)$, where $n$ is the number of i.i.d. training examples. In fact, we exhibit VC classes for which the CMI of any proper learner cannot be bounded by any real-valued function of the VC dimension only. △ Less

Submitted 5 November, 2020; originally announced November 2020.

Comments: 5 Pages

arXiv:2010.15110 [pdf, other]

Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel

Authors: Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli

Abstract: In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well-approximated by a linear weight expansion of the network at initialization. Standard training, however, diverges from its linearization in ways that are poorly understood. We study the relationship between the training dynamics… ▽ More In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well-approximated by a linear weight expansion of the network at initialization. Standard training, however, diverges from its linearization in ways that are poorly understood. We study the relationship between the training dynamics of nonlinear deep networks, the geometry of the loss landscape, and the time evolution of a data-dependent NTK. We do so through a large-scale phenomenological analysis of training, synthesizing diverse measures characterizing loss landscape geometry and NTK dynamics. In multiple neural architectures and datasets, we find these diverse measures evolve in a highly correlated manner, revealing a universal picture of the deep learning process. In this picture, deep network training exhibits a highly chaotic rapid initial transient that within 2 to 3 epochs determines the final linearly connected basin of low loss containing the end point of training. During this chaotic transient, the NTK changes rapidly, learning useful features from the training data that enables it to outperform the standard initial NTK by a factor of 3 in less than 3 to 4 epochs. After this rapid chaotic transient, the NTK changes at constant velocity, and its performance matches that of full network training in 15% to 45% of training time. Overall, our analysis reveals a striking correlation between a diverse set of metrics over training time, governed by a rapid chaotic to stable transition in the first few epochs, that together poses challenges and opportunities for the development of more accurate theories of deep learning. △ Less

Submitted 28 October, 2020; originally announced October 2020.

Comments: 19 pages, 19 figures, In Advances in Neural Information Processing Systems 34 (NeurIPS 2020)

arXiv:2010.13764 [pdf, other]

Enforcing Interpretability and its Statistical Impacts: Trade-offs between Accuracy and Interpretability

Authors: Gintare Karolina Dziugaite, Shai Ben-David, Daniel M. Roy

Abstract: To date, there has been no formal study of the statistical cost of interpretability in machine learning. As such, the discourse around potential trade-offs is often informal and misconceptions abound. In this work, we aim to initiate a formal study of these trade-offs. A seemingly insurmountable roadblock is the lack of any agreed upon definition of interpretability. Instead, we propose a shift in… ▽ More To date, there has been no formal study of the statistical cost of interpretability in machine learning. As such, the discourse around potential trade-offs is often informal and misconceptions abound. In this work, we aim to initiate a formal study of these trade-offs. A seemingly insurmountable roadblock is the lack of any agreed upon definition of interpretability. Instead, we propose a shift in perspective. Rather than attempt to define interpretability, we propose to model the \emph{act} of \emph{enforcing} interpretability. As a starting point, we focus on the setting of empirical risk minimization for binary classification, and view interpretability as a constraint placed on learning. That is, we assume we are given a subset of hypothesis that are deemed to be interpretable, possibly depending on the data distribution and other aspects of the context. We then model the act of enforcing interpretability as that of performing empirical risk minimization over the set of interpretable hypotheses. This model allows us to reason about the statistical implications of enforcing interpretability, using known results in statistical learning theory. Focusing on accuracy, we perform a case analysis, explaining why one may or may not observe a trade-off between accuracy and interpretability when the restriction to interpretable classifiers does or does not come at the cost of some excess statistical risk. We close with some worked examples and some open problems, which we hope will spur further theoretical development around the tradeoffs involved in interpretability. △ Less

Submitted 28 October, 2020; v1 submitted 26 October, 2020; originally announced October 2020.

Comments: 12 pages; minor edits

arXiv:2010.11924 [pdf, other]

In Search of Robust Measures of Generalization

Authors: Gintare Karolina Dziugaite, Alexandre Drouin, Brady Neal, Nitarshan Rajkumar, Ethan Caballero, Linbo Wang, Ioannis Mitliagkas, Daniel M. Roy

Abstract: One of the principal scientific challenges in deep learning is explaining generalization, i.e., why the particular way the community now trains networks to achieve small training error also leads to small error on held-out data from the same population. It is widely appreciated that some worst-case theories -- such as those based on the VC dimension of the class of predictors induced by modern neu… ▽ More One of the principal scientific challenges in deep learning is explaining generalization, i.e., why the particular way the community now trains networks to achieve small training error also leads to small error on held-out data from the same population. It is widely appreciated that some worst-case theories -- such as those based on the VC dimension of the class of predictors induced by modern neural network architectures -- are unable to explain empirical performance. A large volume of work aims to close this gap, primarily by developing bounds on generalization error, optimization error, and excess risk. When evaluated empirically, however, most of these bounds are numerically vacuous. Focusing on generalization bounds, this work addresses the question of how to evaluate such bounds empirically. Jiang et al. (2020) recently described a large-scale empirical study aimed at uncovering potential causal relationships between bounds/measures and generalization. Building on their study, we highlight where their proposed methods can obscure failures and successes of generalization measures in explaining generalization. We argue that generalization measures should instead be evaluated within the framework of distributional robustness. △ Less

Submitted 20 January, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

Comments: 27 pages, 11 figures, 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

arXiv:2009.08576 [pdf, other]

Pruning Neural Networks at Initialization: Why are We Missing the Mark?

Authors: Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin

Abstract: Recent work has explored the possibility of pruning neural networks at initialization. We assess proposals for doing so: SNIP (Lee et al., 2019), GraSP (Wang et al., 2020), SynFlow (Tanaka et al., 2020), and magnitude pruning. Although these methods surpass the trivial baseline of random pruning, they remain below the accuracy of magnitude pruning after training, and we endeavor to understand why.… ▽ More Recent work has explored the possibility of pruning neural networks at initialization. We assess proposals for doing so: SNIP (Lee et al., 2019), GraSP (Wang et al., 2020), SynFlow (Tanaka et al., 2020), and magnitude pruning. Although these methods surpass the trivial baseline of random pruning, they remain below the accuracy of magnitude pruning after training, and we endeavor to understand why. We show that, unlike pruning after training, randomly shuffling the weights these methods prune within each layer or sampling new initial values preserves or improves accuracy. As such, the per-weight pruning decisions made by these methods can be replaced by a per-layer choice of the fraction of weights to prune. This property suggests broader challenges with the underlying pruning heuristics, the desire to prune at initialization, or both. △ Less

Submitted 21 March, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

Comments: Published in ICLR 2021

arXiv:2008.09442 [pdf, ps, other]

doi 10.1109/TVLSI.2020.3016939

ADIC: Anomaly Detection Integrated Circuit in 65nm CMOS utilizing Approximate Computing

Authors: Bapi Kar, Pradeep Kumar Gopalakrishnan, Sumon Kumar Bose, Mohendra Roy, Arindam Basu

Abstract: In this paper, we present a low-power anomaly detection integrated circuit (ADIC) based on a one-class classifier (OCC) neural network. The ADIC achieves low-power operation through a combination of (a) careful choice of algorithm for online learning and (b) approximate computing techniques to lower average energy. In particular, online pseudoinverse update method (OPIUM) is used to train a random… ▽ More In this paper, we present a low-power anomaly detection integrated circuit (ADIC) based on a one-class classifier (OCC) neural network. The ADIC achieves low-power operation through a combination of (a) careful choice of algorithm for online learning and (b) approximate computing techniques to lower average energy. In particular, online pseudoinverse update method (OPIUM) is used to train a randomized neural network for quick and resource efficient learning. An additional 42% energy saving can be achieved when a lighter version of OPIUM method is used for training with the same number of data samples lead to no significant compromise on the quality of inference. Instead of a single classifier with large number of neurons, an ensemble of K base learner approach is chosen to reduce learning memory by a factor of K. This also enables approximate computing by dynamically varying the neural network size based on anomaly detection. Fabricated in 65nm CMOS, the ADIC has K = 7 Base Learners (BL) with 32 neurons in each BL and dissipates 11.87pJ/OP and 3.35pJ/OP during learning and inference respectively at Vdd = 0.75V when all 7 BLs are enabled. Further, evaluated on the NASA bearing dataset, approximately 80% of the chip can be shut down for 99% of the lifetime leading to an energy efficiency of 0.48pJ/OP, an 18.5 times reduction over full-precision computing running at Vdd = 1.2V throughout the lifetime. △ Less

Submitted 21 August, 2020; originally announced August 2020.

Comments: 12

arXiv:2007.06552 [pdf, other]

Relaxing the I.I.D. Assumption: Adaptively Minimax Optimal Regret via Root-Entropic Regularization

Authors: Blair Bilodeau, Jeffrey Negrea, Daniel M. Roy

Abstract: We consider prediction with expert advice when data are generated from distributions varying arbitrarily within an unknown constraint set. This semi-adversarial setting includes (at the extremes) the classical i.i.d. setting, when the unknown constraint set is restricted to be a singleton, and the unconstrained adversarial setting, when the constraint set is the set of all distributions. The Hedge… ▽ More We consider prediction with expert advice when data are generated from distributions varying arbitrarily within an unknown constraint set. This semi-adversarial setting includes (at the extremes) the classical i.i.d. setting, when the unknown constraint set is restricted to be a singleton, and the unconstrained adversarial setting, when the constraint set is the set of all distributions. The Hedge algorithm -- long known to be minimax (rate) optimal in the adversarial regime -- was recently shown to be simultaneously minimax optimal for i.i.d. data. In this work, we propose to relax the i.i.d. assumption by seeking adaptivity at all levels of a natural ordering on constraint sets. We provide matching upper and lower bounds on the minimax regret at all levels, show that Hedge with deterministic learning rates is suboptimal outside of the extremes, and prove that one can adaptively obtain minimax regret at all levels. We achieve this optimal adaptivity using the follow-the-regularized-leader (FTRL) framework, with a novel adaptive regularization scheme that implicitly scales as the square root of the entropy of the current predictive distribution, rather than the entropy of the initial predictive distribution. Finally, we provide novel technical tools to study the statistical performance of FTRL along the semi-adversarial spectrum. △ Less

Submitted 21 July, 2022; v1 submitted 13 July, 2020; originally announced July 2020.

Comments: 71 pages, 3 figures. Blair Bilodeau and Jeffrey Negrea are equal-contribution authors; order was determined randomly

arXiv:2007.01160 [pdf, ps, other]

Tight Bounds on Minimax Regret under Logarithmic Loss via Self-Concordance

Authors: Blair Bilodeau, Dylan J. Foster, Daniel M. Roy

Abstract: We consider the classical problem of sequential probability assignment under logarithmic loss while competing against an arbitrary, potentially nonparametric class of experts. We obtain tight bounds on the minimax regret via a new approach that exploits the self-concordance property of the logarithmic loss. We show that for any expert class with (sequential) metric entropy $\mathcal{O}(γ^{-p})$ at… ▽ More We consider the classical problem of sequential probability assignment under logarithmic loss while competing against an arbitrary, potentially nonparametric class of experts. We obtain tight bounds on the minimax regret via a new approach that exploits the self-concordance property of the logarithmic loss. We show that for any expert class with (sequential) metric entropy $\mathcal{O}(γ^{-p})$ at scale $γ$, the minimax regret is $\mathcal{O}(n^{p/(p+1)})$, and that this rate cannot be improved without additional assumptions on the expert class under consideration. As an application of our techniques, we resolve the minimax regret for nonparametric Lipschitz classes of experts. △ Less

Submitted 3 August, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

Comments: 25 pages

Journal ref: Proceedings of the 37th International Conference on Machine Learning, ICML 2020

Showing 1–50 of 104 results for author: Roy, M