Search | arXiv e-print repository

Product and Ratio of Two $α-κ-μ$ Shadowed Random Variables and its Application to Wireless Communication

Authors: Shashank Shekhar, Sheetal Kalyani

Abstract: This work studies the product and ratio statistics of independent and non-identically distributed (i.n.i.d) $α-κ-μ$ shadowed random variables. We derive the series expression for the probability density function (PDF), cumulative distribution function (CDF), and moment generating function (MGF) of the product and ratio of i.n.i.d $α-κ-μ$ shadowed random variables. We then give the single integral representation for the derived PDF expressions. Further, as application examples, 1) outage probability has been derived for cascaded wireless systems, and 2) physical-layer security metrics like secrecy outage probability and strictly positive secrecy capacity are derived for the classic three-node model with $α-κ-μ$ shadowed fading. Next, we discuss an intelligent reflecting surface-assisted communication system over $α-κ-μ$ shadowed fading.

Submitted 14 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2203.15760

arXiv:2407.02536 [pdf, other]

doi 10.4230/LIPIcs.GIScience.2023.3

Reducing False Discoveries in Statistically-Significant Regional-Colocation Mining: A Summary of Results

Authors: Subhankar Ghosh, Jayant Gupta, Arun Sharma, Shuai An, Shashi Shekhar

Abstract: Given a set \emph{S} of spatial feature types, its feature instances, a study area, and a neighbor relationship, the goal is to find pairs $<$a region ($r_{g}$), a subset \emph{C} of \emph{S}$>$ such that \emph{C} is a statistically significant regional-colocation pattern in $r_{g}$. This problem is important for applications in various domains including ecology, economics, and sociology. The problem is computationally challenging due to the exponential number of regional colocation patterns and candidate regions. Previously, we proposed a miner \cite{10.1145/3557989.3566158} that finds statistically significant regional colocation patterns. However, the numerous simultaneous statistical inferences raise the risk of false discoveries (also known as the multiple comparisons problem) and carry a high computational cost. We propose a novel algorithm, namely, multiple comparisons regional colocation miner (MultComp-RCM) which uses a Bonferroni correction. Theoretical analysis, experimental evaluation, and case study results show that the proposed method reduces both the false discovery rate and computational cost.

Submitted 1 July, 2024; originally announced July 2024.

ACM Class: E.m; F.2; E.1; H.3; I.5; J.0
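
The Bonferroni correction named in the abstract above can be illustrated with a minimal sketch. This is only the generic textbook correction, not the MultComp-RCM miner itself; the function name `bonferroni_reject` and the sample p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if it is rejected.

    Generic Bonferroni correction: with m simultaneous tests, each
    p-value is compared against alpha / m instead of alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # per-test significance level
    return [p <= threshold for p in p_values]


# Hypothetical p-values for three candidate patterns.
decisions = bonferroni_reject([0.001, 0.02, 0.04], alpha=0.05)
```

With three tests the per-test threshold is 0.05 / 3, so only the first hypothetical p-value is rejected, although all three fall below the uncorrected level of 0.05.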

arXiv:2407.00890 [pdf, other]

Macroeconomic Forecasting with Large Language Models

Authors: Andrea Carriero, Davide Pettenuzzo, Shubhranshu Shekhar

Abstract: This paper presents a comparative analysis evaluating the accuracy of Large Language Models (LLMs) against traditional macro time series forecasting approaches. In recent times, LLMs have surged in popularity for forecasting due to their ability to capture intricate patterns in data and quickly adapt across very different domains. However, their effectiveness in forecasting macroeconomic time series data compared to conventional methods remains an area of interest. To address this, we conduct a rigorous evaluation of LLMs against traditional macro forecasting methods, using as common ground the FRED-MD database. Our findings provide valuable insights into the strengths and limitations of LLMs in forecasting macroeconomic time series, shedding light on their applicability in real-world scenarios.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00317 [pdf, other]

Towards Statistically Significant Taxonomy Aware Co-location Pattern Detection

Authors: Subhankar Ghosh, Arun Sharma, Jayant Gupta, Shashi Shekhar

Abstract: Given a collection of Boolean spatial feature types, their instances, a neighborhood relation (e.g., proximity), and a hierarchical taxonomy of the feature types, the goal is to find the subsets of feature types or their parents whose spatial interaction is statistically significant. This problem is for taxonomy-reliant applications such as ecology (e.g., finding new symbiotic relationships across the food chain), spatial pathology (e.g., immunotherapy for cancer), retail, etc. The problem is computationally challenging due to the exponential number of candidate co-location patterns generated by the taxonomy. Most approaches for co-location pattern detection overlook the hierarchical relationships among spatial features, and the statistical significance of the detected patterns is not always considered, leading to potential false discoveries. This paper introduces two methods for incorporating taxonomies and assessing the statistical significance of co-location patterns. The baseline approach iteratively checks the significance of co-locations between leaf nodes or their ancestors in the taxonomy. Using the Benjamini-Hochberg procedure, an advanced approach is proposed to control the false discovery rate. This approach effectively reduces the risk of false discoveries while maintaining the power to detect true co-location patterns. Experimental evaluation and case study results show the effectiveness of the approach.

Submitted 4 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

Comments: Accepted in The 16th Conference on Spatial Information Theory (COSIT) 2024

ACM Class: E.m; H.3.3; I.5; J.4; J.4
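
The Benjamini-Hochberg procedure that the abstract above relies on can be sketched as follows. This is the standard step-up procedure applied to a plain list of p-values, not the paper's taxonomy-aware method; the function name `benjamini_hochberg` and the sample p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per hypothesis: True if it is rejected.

    Standard step-up procedure: sort the m p-values, find the largest
    rank k with p_(k) <= k * q / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank  # keep the largest rank that passes its threshold
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Hypothetical p-values for four candidate co-location patterns.
decisions = benjamini_hochberg([0.04, 0.001, 0.03, 0.5], q=0.05)
```

Because the procedure is step-up, a hypothesis that misses its own rank threshold is still rejected whenever a larger p-value passes its threshold. This is what gives it more power than a per-test Bonferroni cut at the same nominal level.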

arXiv:2406.04886 [pdf, other]

Seeing the Unseen: Visual Metaphor Captioning for Videos

Authors: Abisek Rajakumar Kalarani, Pushpak Bhattacharyya, Sumit Shekhar

Abstract: Metaphors are a common communication tool used in our day-to-day life. The detection and generation of metaphors in textual form have been studied extensively but metaphors in other forms have been under-explored. Recent studies have shown that Vision-Language (VL) models cannot understand visual metaphors in memes and adverts. As of now, no probing studies have been done that involve complex language phenomena like metaphors with videos. Hence, we introduce a new VL task of describing the metaphors present in the videos in our work. To facilitate this novel task, we construct and release a manually created dataset with 705 videos and 2115 human-written captions, along with a new metric called Average Concept Distance (ACD), to automatically evaluate the creativity of the metaphors generated. We also propose a novel low-resource video metaphor captioning system: GIT-LLaVA, which obtains comparable performance to SoTA video language models on the proposed task. We perform a comprehensive analysis of existing video language models on this task and publish our dataset, models, and benchmark results to enable further research.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2405.17839 [pdf, other]

PeerFL: A Simulator for Peer-to-Peer Federated Learning at Scale

Authors: Alka Luqman, Shivanshu Shekhar, Anupam Chattopadhyay

Abstract: This work integrates peer-to-peer federated learning tools with NS3, a widely used network simulator, to create a novel simulator designed to allow heterogeneous device experiments in federated learning. This cross-platform adaptability addresses a critical gap in existing simulation tools, enhancing the overall utility and user experience. NS3 is leveraged to simulate WiFi dynamics to facilitate federated learning experiments with participants that move around physically during training, leading to dynamic network characteristics. Our experiments showcase the simulator's efficiency in computational resource utilization at scale, with a maximum of 450 heterogeneous devices modelled as participants in federated learning. This positions it as a valuable tool for simulation-based investigations in peer-to-peer federated learning. The framework is open source and available for use and extension to the community.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2403.06268 [pdf, other]

Physics-Guided Abnormal Trajectory Gap Detection

Authors: Arun Sharma, Shashi Shekhar

Abstract: Given trajectories with gaps (i.e., missing data), we investigate algorithms to identify abnormal gaps in trajectories which occur when a given moving object did not report its location, but other moving objects in the same geographic region periodically did. The problem is important due to its societal applications, such as improving maritime safety and regulatory enforcement for global security concerns such as illegal fishing, illegal oil transfers, and trans-shipments. The problem is challenging due to the difficulty of bounding the possible locations of the moving object during a trajectory gap, and the very high computational cost of detecting gaps in such a large volume of location data. The current literature on anomalous trajectory detection assumes linear interpolation within gaps, which may not be able to detect abnormal gaps since objects within a given region may have traveled away from their shortest path. In preliminary work, we introduced an abnormal gap measure that uses a classical space-time prism model to bound an object's possible movement during the trajectory gap and provided a scalable memoized gap detection algorithm (Memo-AGD). In this paper, we propose a Space Time-Aware Gap Detection (STAGD) approach to leverage space-time indexing and merging of trajectory gaps. We also incorporate a Dynamic Region Merge-based (DRM) approach to efficiently compute gap abnormality scores. We provide theoretical proofs that both algorithms are correct and complete and also provide analysis of asymptotic time complexity. Experimental results on synthetic and real-world maritime trajectory data show that the proposed approach substantially improves computation time over the baseline technique.

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2402.14974 [pdf, other]

Towards Spatially-Lucid AI Classification in Non-Euclidean Space: An Application for MxIF Oncology Data

Authors: Majid Farhadloo, Arun Sharma, Jayant Gupta, Alexey Leontovich, Svetomir N. Markovic, Shashi Shekhar

Abstract: Given multi-category point sets from different place-types, our goal is to develop a spatially-lucid classifier that can distinguish between two classes based on the arrangements of their points. This problem is important for many applications, such as oncology, for analyzing immune-tumor relationships and designing new immunotherapies. It is challenging due to spatial variability and interpretability needs. Previously proposed techniques require dense training data or have limited ability to handle significant spatial variability within a single place-type. Most importantly, these deep neural network (DNN) approaches are not designed to work in non-Euclidean space, particularly point sets. Existing non-Euclidean DNN methods are limited to one-size-fits-all approaches. We explore a spatial ensemble framework that explicitly uses different training strategies, including weighted-distance learning rate and spatial domain adaptation, on various place-types for spatially-lucid classification. Experimental results on real-world datasets (e.g., MxIF oncology data) show that the proposed framework provides higher prediction accuracy than baseline methods.

Submitted 22 February, 2024; originally announced February 2024.

Comments: SIAM International Conference on Data Mining (SDM24)

arXiv:2402.10971 [pdf, other]

Enabling data-driven and bidirectional model development in Verilog-A for photonic devices

Authors: Dias Azhigulov, Zeqin Lu, James Pond, Lukas Chrostowski, Sudip Shekhar

Abstract: We present a method to model photonic components in Verilog-A by introducing bidirectional signaling through a single port. To achieve this, the concept of power waves and scattering parameters from electromagnetism are employed. As a consequence, one can simultaneously transmit forward and backward propagating waves on a single wire while also capturing realistic, measurement-backed response of photonic components in Verilog-A. We demonstrate examples to show the efficacy of the proposed technique in accounting for critical effects in photonic integrated circuits such as Fabry-Perot cavity resonance, reflections to lasers, etc. Our solution makes electronic-photonic co-simulation more intuitive and accurate.

Submitted 3 July, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.01742 [pdf, other]

Towards Optimizing the Costs of LLM Usage

Authors: Shivanshu Shekhar, Tanishq Dubey, Koyel Mukherjee, Apoorv Saxena, Atharv Tyagi, Nishanth Kotla

Abstract: Generative AI and LLMs in particular are heavily used nowadays for various document processing tasks such as question answering and summarization. However, different LLMs come with different capabilities for different tasks as well as with different costs, tokenization, and latency. In fact, enterprises are already incurring huge costs of operating or using LLMs for their respective use cases. In this work, we propose optimizing the usage costs of LLMs by estimating their output quality (without actually invoking the LLMs), and then solving an optimization routine for the LLM selection to either keep costs under a budget, or minimize the costs, in a quality and latency aware manner. We propose a model to predict the output quality of LLMs on document processing tasks like summarization, followed by an LP rounding algorithm to optimize the selection of LLMs. We study optimization problems trading off the quality and costs, both theoretically and empirically. We further propose a sentence simplification model for reducing the number of tokens in a controlled manner. Additionally, we propose several deterministic heuristics for reducing tokens in a quality aware manner, and study the related optimization problem of applying the heuristics optimizing the quality and cost trade-off. We perform extensive empirical validation of our methods on not only enterprise datasets but also on open-source datasets, annotated by us, and show that we perform much better compared to closest baselines. Our methods reduce costs by 40%-90% while improving quality by 4%-7%. We will release the annotated open source datasets to the community for further research and exploration.

Submitted 29 January, 2024; originally announced February 2024.

Comments: 8 pages + Appendix, Total 12 pages

arXiv:2401.16515 [pdf, other]

Dynamic Electro-Optic Analog Memory for Neuromorphic Photonic Computing

Authors: Sean Lam, Ahmed Khaled, Simon Bilodeau, Bicky A. Marquez, Paul R. Prucnal, Lukas Chrostowski, Bhavin J. Shastri, Sudip Shekhar

Abstract: Artificial intelligence (AI) has seen remarkable advancements across various domains, including natural language processing, computer vision, autonomous vehicles, and biology. However, the rapid expansion of AI technologies has escalated the demand for more powerful computing resources. As digital computing approaches fundamental limits, neuromorphic photonics emerges as a promising platform to complement existing digital systems. In neuromorphic photonic computing, photonic devices are controlled using analog signals. This necessitates the use of digital-to-analog converters (DAC) and analog-to-digital converters (ADC) for interfacing with these devices during inference and training. However, data movement between memory and these converters in conventional von Neumann computing architectures consumes energy. To address this, analog memory co-located with photonic computing devices is proposed. This approach aims to reduce the reliance on DACs and ADCs and minimize data movement to enhance compute efficiency. This paper demonstrates a monolithically integrated neuromorphic photonic circuit with co-located capacitive analog memory and compares various analog memory technologies for neuromorphic photonic computing using the MNIST dataset as a benchmark.

Submitted 29 January, 2024; originally announced January 2024.

Comments: 22 pages, 10 figures

arXiv:2311.16896 [pdf, other]

65 GOPS/neuron Photonic Tensor Core with Thin-film Lithium Niobate Photonics

Authors: Zhongjin Lin, Bhavin J. Shastri, Shangxuan Yu, Jingxiang Song, Yuntao Zhu, Arman Safarnejadian, Wangning Cai, Yanmei Lin, Wei Ke, Mustafa Hammood, Tianye Wang, Mengyue Xu, Zibo Zheng, Mohammed Al-Qadasi, Omid Esmaeeli, Mohamed Rahim, Grzegorz Pakulski, Jens Schmid, Pedro Barrios, Weihong Jiang, Hugh Morison, Matthew Mitchell, Xiaogang Qiang, Xun Guan, Nicolas A. F. Jaeger, et al. (6 additional authors not shown)

Abstract: Photonics offers a transformative approach to artificial intelligence (AI) and neuromorphic computing by providing low latency, high bandwidth, and energy-efficient computations. Here, we introduce a photonic tensor core processor enabled by time-multiplexed inputs and charge-integrated outputs. This fully integrated processor, comprising only two thin-film lithium niobate (TFLN) modulators, a III-V laser, and a charge-integration photoreceiver, can implement an entire layer of a neural network. It can execute 65 billion operations per second (GOPS) per neuron, including simultaneous weight updates, a hitherto unachieved speed. Our processor stands out from conventional photonic processors, which have static weights set during training, as it supports fast "hardware-in-the-loop" training, and can dynamically adjust the inputs (fan-in) and outputs (fan-out) within a layer, thereby enhancing its versatility. Our processor can perform large-scale dot-product operations with vector dimensions up to 131,072. Furthermore, it successfully classifies (supervised learning) and clusters (unsupervised learning) 112*112-pixel images after "hardware-in-the-loop" training. To handle "hardware-in-the-loop" training for clustering AI tasks, we provide a solution for multiplications involving two negative numbers based on our processor.

Submitted 30 November, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: 19 pages, 6 figures

MSC Class: 78A05

arXiv:2311.12825 [pdf, ps, other]

A PSO Based Method to Generate Actionable Counterfactuals for High Dimensional Data

Authors: Shashank Shekhar, Asif Salim, Adesh Bansode, Vivaswan Jinturkar, Anirudha Nayak

Abstract: Counterfactual explanations (CFE) are methods that explain a machine learning model by giving an alternate class prediction of a data point with some minimal changes in its features. They help users identify the data attributes that caused an undesirable prediction like a loan or credit card rejection. We describe an efficient and actionable counterfactual (CF) generation method based on particle swarm optimization (PSO). We propose a simple objective function for the optimization of the instance-centric CF generation problem. The PSO brings in a lot of flexibility in terms of carrying out multi-objective optimization in large dimensions, capability for multiple CF generation, and setting box constraints or immutability of data attributes. An algorithm is proposed that incorporates these features and enables greater control over the proximity and sparsity properties of the generated CFs. The proposed algorithm is evaluated with a set of actionability metrics on real-world datasets, and the results were superior to those of state-of-the-art methods.

Submitted 30 November, 2023; v1 submitted 30 September, 2023; originally announced November 2023.

Comments: Accepted in IEEE CSDE 2023

arXiv:2310.19384 [pdf, other]

Deep anytime-valid hypothesis testing

Authors: Teodora Pandeva, Patrick Forré, Aaditya Ramdas, Shubhanshu Shekhar

Abstract: We propose a general framework for constructing powerful, sequential hypothesis tests for a large class of nonparametric testing problems. The null hypothesis for these problems is defined in an abstract form using the action of two known operators on the data distribution. This abstraction allows for a unified treatment of several classical tasks, such as two-sample testing, independence testing, and conditional-independence testing, as well as modern problems, such as testing for adversarial robustness of machine learning (ML) models. Our proposed framework has the following advantages over classical batch tests: 1) it continuously monitors online data streams and efficiently aggregates evidence against the null, 2) it provides tight control over the type I error without the need for multiple testing correction, 3) it adapts the sample size requirement to the unknown hardness of the problem. We develop a principled approach of leveraging the representation capability of ML models within the testing-by-betting framework, a game-theoretic approach for designing sequential tests. Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines on several tasks.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.15179 [pdf, other]

Reducing Uncertainty in Sea-level Rise Prediction: A Spatial-variability-aware Approach

Authors: Subhankar Ghosh, Shuai An, Arun Sharma, Jayant Gupta, Shashi Shekhar, Aneesh Subramanian

Abstract: Given multi-model ensemble climate projections, the goal is to accurately and reliably predict future sea-level rise while lowering the uncertainty. This problem is important because sea-level rise affects millions of people in coastal communities and beyond due to climate change's impacts on polar ice sheets and the ocean. This problem is challenging due to spatial variability and unknowns such as possible tipping points (e.g., collapse of Greenland or West Antarctic ice-shelf), climate feedback loops (e.g., clouds, permafrost thawing), future policy decisions, and human actions. Most existing climate modeling approaches use the same set of weights globally, during either regression or deep learning to combine different climate projections. Such approaches are inadequate when different regions require different weighting schemes for accurate and reliable sea-level rise predictions. This paper proposes a zonal regression model which addresses spatial variability and model inter-dependency. Experimental results show more reliable predictions using the weights learned via this approach on a regional scale.

Submitted 18 October, 2023; originally announced October 2023.

Comments: 6 pages, 5 figures, I-GUIDE 2023 conference

ACM Class: J.2; I.2.m; I.2.6; I.2.1; I.2

arXiv:2310.01547 [pdf, other]

On the near-optimality of betting confidence sets for bounded means

Authors: Shubhanshu Shekhar, Aaditya Ramdas

Abstract: Constructing nonasymptotic confidence intervals (CIs) for the mean of a univariate distribution from independent and identically distributed (i.i.d.) observations is a fundamental task in statistics. For bounded observations, a classical nonparametric approach proceeds by inverting standard concentration bounds, such as Hoeffding's or Bernstein's inequalities. Recently, an alternative betting-based approach for defining CIs and their time-uniform variants called confidence sequences (CSs), has been shown to be empirically superior to the classical methods. In this paper, we provide theoretical justification for this improved empirical performance of betting CIs and CSs. Our main contributions are as follows: (i) We first compare CIs using the values of their first-order asymptotic widths (scaled by $\sqrt{n}$), and show that the betting CI of Waudby-Smith and Ramdas (2023) has a smaller limiting width than existing empirical Bernstein (EB)-CIs. (ii) Next, we establish two lower bounds that characterize the minimum width achievable by any method for constructing CIs/CSs in terms of certain inverse information projections. (iii) Finally, we show that the betting CI and CS match the fundamental limits, modulo an additive logarithmic term and a multiplicative constant. Overall these results imply that the betting CI~(and CS) admit stronger theoretical guarantees than the existing state-of-the-art EB-CI~(and CS); both in the asymptotic and finite-sample regimes.

Submitted 24 November, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: 53 pages, 2 figures

arXiv:2309.09111 [pdf, ps, other]

Reducing sequential change detection to sequential estimation

Authors: Shubhanshu Shekhar, Aaditya Ramdas

Abstract: We consider the problem of sequential change detection, where the goal is to design a scheme for detecting any changes in a parameter or functional $θ$ of the data stream distribution that has small detection delay, but guarantees control on the frequency of false alarms in the absence of changes. In this paper, we describe a simple reduction from sequential change detection to sequential estimation using confidence sequences: we begin a new $(1-α)$-confidence sequence at each time step, and proclaim a change when the intersection of all active confidence sequences becomes empty. We prove that the average run length is at least $1/α$, resulting in a change detection scheme with minimal structural assumptions~(thus allowing for possibly dependent observations, and nonparametric distribution classes), but strong guarantees. Our approach bears an interesting parallel with the reduction from change detection to sequential testing of Lorden (1971) and the e-detector of Shin et al. (2022).

Submitted 24 November, 2023; v1 submitted 16 September, 2023; originally announced September 2023.

Comments: 11 pages

arXiv:2308.03977 [pdf, other]

PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning

Authors: Florian Bordes, Shashank Shekhar, Mark Ibrahim, Diane Bouchacourt, Pascal Vincent, Ari S. Morcos

Abstract: Synthetic image datasets offer unmatched advantages for designing and evaluating deep neural networks: they make it possible to (i) render as many data samples as needed, (ii) precisely control each scene and yield granular ground truth labels (and captions), (iii) precisely control distribution shifts between training and testing to isolate variables of interest for sound experimentation. Despite such promise, the use of synthetic image data is still limited -- and often played down -- mainly due to their lack of realism. Most works therefore rely on datasets of real images, which have often been scraped from public images on the internet, and may have issues with regards to privacy, bias, and copyright, while offering little control over how objects precisely appear. In this work, we present a path to democratize the use of photorealistic synthetic data: we develop a new generation of interactive environments for representation learning research, that offer both controllability and realism. We use the Unreal Engine, a powerful game engine well known in the entertainment industry, to produce PUG (Photorealistic Unreal Graphics) environments and datasets for representation learning. In this paper, we demonstrate the potential of PUG to enable more rigorous evaluations of vision models.

Submitted 12 December, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

arXiv:2308.03360 [pdf]

Coupling Symbolic Reasoning with Language Modeling for Efficient Longitudinal Understanding of Unstructured Electronic Medical Records

Authors: Shivani Shekhar, Simran Tiwari, T. C. Rensink, Ramy Eskander, Wael Salloum

Abstract: The application of Artificial Intelligence (AI) in healthcare has been revolutionary, especially with the recent advancements in transformer-based Large Language Models (LLMs). However, the task of understanding unstructured electronic medical records remains a challenge given the nature of the records (e.g., disorganization, inconsistency, and redundancy) and the inability of LLMs to derive reasoning paradigms that allow for comprehensive understanding of medical variables. In this work, we examine the power of coupling symbolic reasoning with language modeling toward improved understanding of unstructured clinical texts. We show that such a combination improves the extraction of several medical variables from unstructured records. In addition, we show that the state-of-the-art commercially-free LLMs enjoy retrieval capabilities comparable to those provided by their commercial counterparts. Finally, we elaborate on the need for LLM steering through the application of symbolic reasoning as the exclusive use of LLMs results in the lowest performance.

Submitted 7 August, 2023; originally announced August 2023.

arXiv:2307.01401 [pdf, other]

Multi-Task Learning Improves Performance In Deep Argument Mining Models

Authors: Amirhossein Farzam, Shashank Shekhar, Isaac Mehlhaff, Marco Morucci

Abstract: The successful analysis of argumentative techniques from user-generated text is central to many downstream tasks such as political and market analysis. Recent argument mining tools use state-of-the-art deep learning methods to extract and annotate argumentative techniques from various online text corpora; however, each task is treated as separate and different bespoke models are fine-tuned for each dataset. We show that different argument mining tasks share common semantic and logical structure by implementing a multi-task approach to argument mining that achieves better performance than state-of-the-art methods for the same problems. Our model builds a shared representation of the input text that is common to all tasks and exploits similarities between tasks in order to further boost performance via parameter-sharing. Our results are important for argument mining as they show that different tasks share substantial similarities and suggest a holistic approach to the extraction of argumentative techniques from text.

Submitted 3 July, 2023; originally announced July 2023.

arXiv:2306.00931 [pdf, other]

"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning

Authors: Abisek Rajakumar Kalarani, Pushpak Bhattacharyya, Niyati Chhaya, Sumit Shekhar

Abstract: Well-formed context aware image captions and tags in enterprise content such as marketing material are critical to ensure their brand presence and content recall. Manual creation and updates to ensure the same is non trivial given the scale and the tedium towards this task. We propose a new unified Vision-Language (VL) model based on the One For All (OFA) model, with a focus on context-assisted image captioning where the caption is generated based on both the image and its context. Our approach aims to overcome the context-independent (image and text are treated independently) nature of the existing approaches. We exploit context by pretraining our model with datasets of three tasks: news image captioning where the news article is the context, contextual visual entailment, and keyword extraction from the context. The second pretraining task is a new VL task, and we construct and release two datasets for the task with 1.1M and 2.2K data instances. Our system achieves state-of-the-art results with an improvement of up to 8.34 CIDEr score on the benchmark news image captioning datasets. To the best of our knowledge, ours is the first effort at incorporating contextual information in pretraining the models for the VL tasks.

Submitted 1 June, 2023; originally announced June 2023.

arXiv:2305.09675 [pdf, other]

Spatial Computing Opportunities in Biomedical Decision Support: The Atlas-EHR Vision

Authors: Majid Farhadloo, Arun Sharma, Shashi Shekhar, Svetomir N. Markovic

Abstract: We consider the problem of reducing the time needed by healthcare professionals to understand patient medical history via the next generation of biomedical decision support. This problem is societally important because it has the potential to improve healthcare quality and patient outcomes. However, navigating electronic health records is challenging due to the high patient-doctor ratios, potentially long medical histories, the urgency of treatment for some medical conditions, and patient variability. Current electronic health record systems provide only a longitudinal view of patient medical history, which is time-consuming to browse, and doctors often need to engage nurses, residents, and others for initial analysis. To overcome this limitation, we envision an alternative spatial representation of patients' histories (e.g., electronic health records (EHRs)) and other biomedical data in the form of Atlas-EHR. Just like Google Maps allows a global, national, regional, and local view, the Atlas-EHR may start with an overview of the patient's anatomy and history before drilling down to spatially anatomical sub-systems, their individual components, or sub-components. Atlas-EHR presents a compelling opportunity for spatial computing since healthcare is almost a fifth of the US economy. However, the traditional spatial computing designed for geographic use cases (e.g., navigation, land-surveys, mapping) faces many hurdles in the biomedical domain. This paper presents a number of open research questions under this theme in five broad areas of spatial computing.

Submitted 28 February, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

arXiv:2305.06884 [pdf, ps, other]

Risk-limiting Financial Audits via Weighted Sampling without Replacement

Authors: Shubhanshu Shekhar, Ziyu Xu, Zachary C. Lipton, Pierre J. Liang, Aaditya Ramdas

Abstract: We introduce the notion of risk-limiting financial auditing (RLFA): given $N$ transactions, the goal is to estimate the total misstated monetary fraction ($m^*$) to a given accuracy $ε$, with confidence $1-δ$. We do this by constructing new confidence sequences (CSs) for the weighted average of $N$ unknown values, based on samples drawn without replacement according to a (randomized) weighted sampling scheme. Using the idea of importance weighting to construct test martingales, we first develop a framework to construct CSs for arbitrary sampling strategies. Next, we develop methods to improve the quality of CSs by incorporating side information about the unknown values associated with each item. We show that when the side information is sufficiently predictive, it can directly drive the sampling. Addressing the case where the accuracy is unknown a priori, we introduce a method that incorporates side information via control variates. Crucially, our construction is adaptive: if the side information is highly predictive of the unknown misstated amounts, then the benefits of incorporating it are significant; but if the side information is uncorrelated, our methods learn to ignore it. Our methods recover state-of-the-art bounds for the special case when the weights are equal, which has already found applications in election auditing. The harder weighted case solves our more challenging problem of AI-assisted financial auditing.

Submitted 8 May, 2023; originally announced May 2023.

Comments: 23 pages, 8 figures, to appear in the Proceedings of Uncertainty in Artificial Intelligence (UAI) 2023

arXiv:2304.13807 [pdf, other]

A Survey on Solving and Discovering Differential Equations Using Deep Neural Networks

Authors: Hyeonjung Jung, Jayant Gupta, Bharat Jayaprakash, Matthew Eagon, Harish Panneer Selvam, Carl Molnar, William Northrop, Shashi Shekhar

Abstract: Ordinary and partial differential equations (DE) are used extensively in scientific and mathematical domains to model physical systems. Current literature has focused primarily on deep neural network (DNN) based methods for solving a specific DE or a family of DEs. Research communities with a history of using DE models may view DNN-based differential equation solvers (DNN-DEs) as a faster and transferable alternative to current numerical methods. However, there is a lack of systematic surveys detailing the use of DNN-DE methods across physical application domains and a generalized taxonomy to guide future research. This paper surveys and classifies previous works and provides an educational tutorial for senior practitioners, professionals, and graduate students in engineering and computer science. First, we propose a taxonomy to navigate domains of DE systems studied under the umbrella of DNN-DE. Second, we examine the theory and performance of the Physics Informed Neural Network (PINN) to demonstrate how the influential DNN-DE architecture mathematically solves a system of equations. Third, to reinforce the key ideas of solving and discovery of DEs using DNN, we provide a tutorial using DeepXDE, a Python package for developing PINNs, to develop DNN-DEs for solving and discovering a classic DE, the linear transport equation.

Submitted 19 June, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

Comments: Under review for ACM Computing Surveys journal. 29 pages

arXiv:2304.13089 [pdf, other]

Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations

Authors: Shashank Shekhar, Florian Bordes, Pascal Vincent, Ari Morcos

Abstract: Joint-embedding based learning (e.g., SimCLR, MoCo, DINO) and reconstruction-based learning (e.g., BEiT, SimMIM, MAE) are the two leading paradigms for self-supervised learning of vision transformers, but they differ substantially in their transfer performance. Here, we aim to explain these differences by analyzing the impact of these objectives on the structure and transferability of the learned representations. Our analysis reveals that reconstruction-based learning features are significantly dissimilar to joint-embedding based learning features and that models trained with similar objectives learn similar features even across architectures. These differences arise early in the network and are primarily driven by attention and normalization layers. We find that joint-embedding features yield better linear probe transfer for classification because the different objectives drive different distributions of information and invariances in the learned representation. These differences explain opposite trends in transfer performance for downstream tasks that require spatial specificity in features. Finally, we address how fine-tuning changes reconstructive representations to enable better transfer, showing that fine-tuning re-organizes the information to be more similar to pre-trained joint embedding models.

Submitted 25 April, 2023; originally announced April 2023.

arXiv:2304.12210 [pdf, other]

A Cookbook of Self-Supervised Learning

Authors: Randall Balestriero, Mark Ibrahim, Vlad Sobal, Ari Morcos, Shashank Shekhar, Tom Goldstein, Florian Bordes, Adrien Bardes, Gregoire Mialon, Yuandong Tian, Avi Schwarzschild, Andrew Gordon Wilson, Jonas Geiping, Quentin Garrido, Pierre Fernandez, Amir Bar, Hamed Pirsiavash, Yann LeCun, Micah Goldblum

Abstract: Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training an SSL method involves a dizzying set of choices from the pretext tasks to training hyper-parameters. Our goal is to lower the barrier to entry into SSL research by laying the foundations and latest SSL recipes in the style of a cookbook. We hope to empower the curious researcher to navigate the terrain of methods, understand the role of the various knobs, and gain the know-how required to explore how delicious SSL can be.

Submitted 28 June, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

arXiv:2302.14757 [pdf, other]

Audio Retrieval for Multimodal Design Documents: A New Dataset and Algorithms

Authors: Prachi Singh, Srikrishna Karanam, Sumit Shekhar

Abstract: We consider and propose a new problem of retrieving audio files relevant to multimodal design document inputs comprising both textual elements and visual imagery, e.g., birthday/greeting cards. In addition to enhancing user experience, integrating audio that matches the theme/style of these inputs also helps improve the accessibility of these documents (e.g., visually impaired people can listen to the audio instead). While recent work in audio retrieval exists, these methods and datasets are targeted explicitly towards natural images. However, our problem considers multimodal design documents (created by users using creative software) substantially different from a naturally clicked photograph. To this end, our first contribution is collecting and curating a new large-scale dataset called Melodic-Design (or MELON), comprising design documents representing various styles, themes, templates, illustrations, etc., paired with music audio. Given our paired image-text-audio dataset, our next contribution is a novel multimodal cross-attention audio retrieval (MMCAR) algorithm that enables training neural networks to learn a common shared feature space across image, text, and audio dimensions. We use these learned features to demonstrate that our method outperforms existing state-of-the-art methods and produce a new reference benchmark for the research community on our new dataset.

Submitted 28 February, 2023; originally announced February 2023.

Comments: 5 pages including references

arXiv:2302.02544 [pdf, other]

Sequential change detection via backward confidence sequences

Authors: Shubhanshu Shekhar, Aaditya Ramdas

Abstract: We present a simple reduction from sequential estimation to sequential changepoint detection (SCD). In short, suppose we are interested in detecting changepoints in some parameter or functional $θ$ of the underlying distribution. We demonstrate that if we can construct a confidence sequence (CS) for $θ$, then we can also successfully perform SCD for $θ$. This is accomplished by checking if two CSs -- one forwards and the other backwards -- ever fail to intersect. Since the literature on CSs has been rapidly evolving recently, the reduction provided in this paper immediately solves several old and new change detection problems. Further, our "backward CS", constructed by reversing time, is new and potentially of independent interest. We provide strong nonasymptotic guarantees on the frequency of false alarms and detection delay, and demonstrate numerical effectiveness on several problems.

Submitted 5 February, 2023; originally announced February 2023.

Comments: 24 pages, 10 figures

arXiv:2301.05739 [pdf, other]

Eco-PiNN: A Physics-informed Neural Network for Eco-toll Estimation

Authors: Yan Li, Mingzhou Yang, Matthew Eagon, Majid Farhadloo, Yiqun Xie, William F. Northrop, Shashi Shekhar

Abstract: The eco-toll estimation problem quantifies the expected environmental cost (e.g., energy consumption, exhaust emissions) for a vehicle to travel along a path. This problem is important for societal applications such as eco-routing, which aims to find paths with the lowest exhaust emissions or energy need. The challenges of this problem are three-fold: (1) the dependence of a vehicle's eco-toll on its physical parameters; (2) the lack of access to data with eco-toll information; and (3) the influence of contextual information (i.e. the connections of adjacent segments in the path) on the eco-toll of road segments. Prior work on eco-toll estimation has mostly relied on pure data-driven approaches and has high estimation errors given the limited training data. To address these limitations, we propose a novel Eco-toll estimation Physics-informed Neural Network framework (Eco-PiNN) using three novel ideas, namely, (1) a physics-informed decoder that integrates the physical laws of the vehicle engine into the network, (2) an attention-based contextual information encoder, and (3) a physics-informed regularization to reduce overfitting. Experiments on real-world heavy-duty truck data show that the proposed method can greatly improve the accuracy of eco-toll estimation compared with state-of-the-art methods.

Submitted 18 January, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

Comments: Full version of the paper accepted for the SDM23 conference; Yan Li and Mingzhou Yang contributed equally to this paper

arXiv:2301.00750 [pdf, other]

Interactive Control over Temporal Consistency while Stylizing Video Streams

Authors: Sumit Shekhar, Max Reimann, Moritz Hilscher, Amir Semmo, Jürgen Döllner, Matthias Trapp

Abstract: Image stylization has seen significant advancement and widespread interest over the years, leading to the development of a multitude of techniques. Extending these stylization techniques, such as Neural Style Transfer (NST), to videos is often achieved by applying them on a per-frame basis. However, per-frame stylization usually lacks temporal consistency, expressed by undesirable flickering artifacts. Most of the existing approaches for enforcing temporal consistency suffer from one or more of the following drawbacks: They (1) are only suitable for a limited range of techniques, (2) do not support online processing as they require the complete video as input, (3) cannot provide consistency for the task of stylization, or (4) do not provide interactive consistency control. Domain-agnostic techniques for temporal consistency aim to eradicate flickering completely but typically disregard aesthetic aspects. For stylization tasks, however, consistency control is an essential requirement as a certain amount of flickering adds to the artistic look and feel. Moreover, making this control interactive is paramount from a usability perspective. To achieve the above requirements, we propose an approach that stylizes video streams in real-time at full HD resolutions while providing interactive consistency control. We develop a lite optical-flow network that operates at 80 FPS on desktop systems with sufficient accuracy. Further, we employ an adaptive combination of local and global consistency features and enable interactive selection between them. Objective and subjective evaluations demonstrate that our method is superior to state-of-the-art video consistency approaches.

Submitted 29 June, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

arXiv:2301.00270 [pdf, other]

NetEffect: Discovery and Exploitation of Generalized Network Effects

Authors: Meng-Chieh Lee, Shubhranshu Shekhar, Jaemin Yoo, Christos Faloutsos

Abstract: Given a large graph with few node labels, how can we (a) identify whether there are generalized network-effects (GNE) or not, (b) estimate GNE to explain the interrelations among node classes, and (c) exploit GNE efficiently to improve the performance on downstream tasks? The knowledge of GNE is valuable for various tasks like node classification and targeted advertising. However, identifying GNE such as homophily, heterophily or their combination is challenging in real-world graphs due to limited availability of node labels and noisy edges. We propose NetEffect, a graph mining approach to address the above issues, enjoying the following properties: (i) Principled: a statistical test to determine the presence of GNE in a graph with few node labels; (ii) General and Explainable: a closed-form solution to estimate the specific type of GNE observed; and (iii) Accurate and Scalable: the integration of GNE for accurate and fast node classification. Applied on real-world graphs, NetEffect discovers the unexpected absence of GNE in numerous graphs, which were recognized to exhibit heterophily. Further, we show that incorporating GNE is effective on node classification. On a million-scale real-world graph, NetEffect achieves over 7 times speedup (14 minutes vs. 2 hours) compared to most competitors.

Submitted 12 February, 2024; v1 submitted 31 December, 2022; originally announced January 2023.

Comments: Accepted to PAKDD 2024

arXiv:2212.09108 [pdf, ps, other]

A Permutation-Free Kernel Independence Test

Authors: Shubhanshu Shekhar, Ilmun Kim, Aaditya Ramdas

Abstract: In nonparametric independence testing, we observe i.i.d. data $\{(X_i,Y_i)\}_{i=1}^n$, where $X \in \mathcal{X}, Y \in \mathcal{Y}$ lie in any general spaces, and we wish to test the null that $X$ is independent of $Y$. Modern test statistics such as the kernel Hilbert-Schmidt Independence Criterion (HSIC) and Distance Covariance (dCov) have intractable null distributions due to the degeneracy of the underlying U-statistics. Thus, in practice, one often resorts to using permutation testing, which provides a nonasymptotic guarantee at the expense of recalculating the quadratic-time statistics (say) a few hundred times. This paper provides a simple but nontrivial modification of HSIC and dCov (called xHSIC and xdCov, pronounced "cross" HSIC/dCov) so that they have a limiting Gaussian distribution under the null, and thus do not require permutations. This requires building on the newly developed theory of cross U-statistics by Kim and Ramdas (2020), and in particular developing several nontrivial extensions of the theory in Shekhar et al. (2022), which developed an analogous permutation-free kernel two-sample test. We show that our new tests, like the originals, are consistent against fixed alternatives, and minimax rate optimal against smooth local alternatives. Numerical simulations demonstrate that compared to the full dCov or HSIC, our variants have the same power up to a $\sqrt 2$ factor, giving practitioners a new option for large problems or data-analysis pipelines where computation, not sample size, could be the bottleneck.

Submitted 18 December, 2022; originally announced December 2022.

Comments: 52 pages, 4 figures

arXiv:2212.04617 [pdf, other]

UNet Based Pipeline for Lung Segmentation from Chest X-Ray Images

Authors: Shashank Shekhar, Ritika Nandi, H Srikanth Kamath

Abstract: Biomedical image segmentation is one of the fastest growing fields which has seen extensive automation through the use of Artificial Intelligence. This has enabled widespread adoption of accurate techniques to expedite the screening and diagnostic processes which would otherwise take several days to finalize. In this paper, we present an end-to-end pipeline to segment lungs from chest X-ray images, training the neural network model on the Japanese Society of Radiological Technology (JSRT) dataset, using UNet to enable faster processing of initial screening for various lung disorders. The pipeline developed can be readily used by medical centers with just the provision of X-Ray images as input. The model will perform the preprocessing, and provide a segmented image as the final output. It is expected that this will drastically reduce the manual effort involved and lead to greater accessibility in resource-constrained locations.

Submitted 8 December, 2022; originally announced December 2022.

Comments: 6 Pages

arXiv:2211.14908 [pdf, other]

A Permutation-free Kernel Two-Sample Test

Authors: Shubhanshu Shekhar, Ilmun Kim, Aaditya Ramdas

Abstract: The kernel Maximum Mean Discrepancy (MMD) is a popular multivariate distance metric between distributions that has found utility in two-sample testing. The usual kernel-MMD test statistic is a degenerate U-statistic under the null, and thus it has an intractable limiting distribution. Hence, to design a level-$α$ test, one usually selects the rejection threshold as the $(1-α)$-quantile of the permutation distribution. The resulting nonparametric test has finite-sample validity but suffers from large computational cost, since every permutation takes quadratic time. We propose the cross-MMD, a new quadratic-time MMD test statistic based on sample-splitting and studentization. We prove that under mild assumptions, the cross-MMD has a limiting standard Gaussian distribution under the null. Importantly, we also show that the resulting test is consistent against any fixed alternative, and when using the Gaussian kernel, it has minimax rate-optimal power against local alternatives. For large sample sizes, our new cross-MMD provides a significant speedup over the MMD, for only a slight loss in power.

Submitted 4 February, 2023; v1 submitted 27 November, 2022; originally announced November 2022.

Comments: Published at the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS), with an oral presentation

arXiv:2211.03317 [pdf, ps, other]

Instantaneous Channel Oblivious Phase Shift Design for an IRS-Assisted SIMO System with Quantized Phase Shift

Authors: Shashank Shekhar, Athira Subhash, Tejesh Kella, Sheetal Kalyani

Abstract: We design the phase shifts of an intelligent reflecting surface (IRS)-assisted single-input-multiple-output communication system to minimize the outage probability (OP) and to maximize the ergodic rate. Our phase shifts design uses only statistical channel state information since these depend only on the large-scale fading coefficients; the obtained phase shift design remains valid for a longer time frame. We further assume that one has access to only quantized phase values. The closed-form expressions for OP and ergodic rate are derived for the considered system. Next, two optimization problems are formulated to choose the phase shifts of IRS such that (i) OP is minimized and (ii) the ergodic rate is maximized. We used the multi-valued particle swarm optimization (MPSO) and particle swarm optimization (PSO) algorithms to solve the optimization problems. Numerical simulations are performed to study the impact of various parameters on the OP and ergodic rate. We also discuss signaling overhead between BS and IRS controller. It is shown that the overhead can be reduced up to $99.69 \%$ by using statistical CSI for phase shift design and $5$ bits to represent the phase shifts without significantly compromising on the performance.

Submitted 7 November, 2022; originally announced November 2022.

arXiv:2211.02927 [pdf, other]

Unsupervised Machine Learning for Explainable Health Care Fraud Detection

Authors: Shubhranshu Shekhar, Jetson Leder-Luis, Leman Akoglu

Abstract: The US federal government spends more than a trillion dollars per year on health care, largely provided by private third parties and reimbursed by the government. A major concern in this system is overbilling, waste and fraud by providers, who face incentives to misreport on their claims in order to receive higher payments. In this paper, we develop novel machine learning tools to identify providers that overbill Medicare, the US federal health insurance program for elderly adults and the disabled. Using large-scale Medicare claims data, we identify patterns consistent with fraud or overbilling among inpatient hospitalizations. Our proposed approach for Medicare fraud detection is fully unsupervised, not relying on any labeled training data, and is explainable to end users, providing reasoning and interpretable insights into the potentially suspicious behavior of the flagged providers. Data from the Department of Justice on providers facing anti-fraud lawsuits and several case studies validate our approach and findings both quantitatively and qualitatively.

Submitted 23 February, 2023; v1 submitted 5 November, 2022; originally announced November 2022.

Comments: NBER Working paper #30946

arXiv:2210.08879 [pdf, other]

Robust Planning for Human-Robot Joint Tasks with Explicit Reasoning on Human Mental State

Authors: Anthony Favier, Shashank Shekhar, Rachid Alami

Abstract: We consider the human-aware task planning problem where a human-robot team is given a shared task with a known objective to achieve. Recent approaches tackle it by modeling it as a team of independent, rational agents, where the robot plans for both agents' (shared) tasks. However, the robot knows that humans cannot be administered like artificial agents, so it emulates and predicts the human's decisions, actions, and reactions. Based on earlier approaches, we describe a novel approach to solve such problems, which models and uses execution-time observability conventions. Abstractly, this modeling is based on situation assessment, which helps our approach capture the evolution of individual agents' beliefs and anticipate belief divergences that arise in practice. It decides if and when belief alignment is needed and achieves it with communication. These changes improve the solver's performance: (a) communication is used effectively, and (b) the solver is robust on more realistic and challenging problems.

Submitted 17 October, 2022; originally announced October 2022.

Comments: 10 pages, 2 figures, 1 table, AI-HRI AAAI 2022 Fall Symposium Series

Report number: AIHRI/2022/8188

arXiv:2210.04081 [pdf, other]

Less is More: SlimG for Accurate, Robust, and Interpretable Graph Mining

Authors: Jaemin Yoo, Meng-Chieh Lee, Shubhranshu Shekhar, Christos Faloutsos

Abstract: How can we solve semi-supervised node classification in various graphs possibly with noisy features and structures? Graph neural networks (GNNs) have succeeded in many graph mining tasks, but their generalizability to various graph scenarios is limited due to the difficulty of training, hyperparameter tuning, and the selection of a model itself. Einstein said that we should "make everything as simple as possible, but not simpler." We rephrase it into the careful simplicity principle: a carefully-designed simple model can surpass sophisticated ones in real-world graphs. Based on the principle, we propose SlimG for semi-supervised node classification, which exhibits four desirable properties: It is (a) accurate, winning or tying on 10 out of 13 real-world datasets; (b) robust, being the only one that handles all scenarios of graph data (homophily, heterophily, random structure, noisy features, etc.); (c) fast and scalable, showing up to 18 times faster training in million-scale graphs; and (d) interpretable, thanks to the linearity and sparsity. We explain the success of SlimG through a systematic study of the designs of existing GNNs, sanity checks, and comprehensive ablation studies.

Submitted 16 June, 2023; v1 submitted 8 October, 2022; originally announced October 2022.

Comments: Accepted to KDD 2023

arXiv:2209.09207 [pdf, other]

Table Detection in the Wild: A Novel Diverse Table Detection Dataset and Method

Authors: Mrinal Haloi, Shashank Shekhar, Nikhil Fande, Siddhant Swaroop Dash, Sanjay G

Abstract: Recent deep learning approaches in table detection achieved outstanding performance and proved to be effective in identifying document layouts. Currently available table detection benchmarks have many limitations, including a lack of sample diversity, simple table structures, a lack of training cases, and poor sample quality. In this paper, we introduce a diverse large-scale dataset for table detection with more than seven thousand samples containing a wide variety of table structures collected from many diverse sources. In addition, we present baseline results using a convolutional neural network-based method to detect table structure in documents. Experimental results show the superiority of applying convolutional deep learning methods over classical computer vision-based methods. The introduction of this diverse table detection dataset will enable the community to develop high throughput deep learning methods for understanding document layout and tabular data processing. Dataset is available at: 1. https://www.kaggle.com/datasets/mrinalim/stdw-dataset 2. https://huggingface.co/datasets/n3011/STDW

Submitted 30 November, 2023; v1 submitted 31 August, 2022; originally announced September 2022.

Comments: Open source Table detection dataset and baseline results

MSC Class: 68T45

arXiv:2208.03664 [pdf, ps, other]

SINR Analysis of an IRS Assisted MU-MISO System

Authors: Lakshmi Jayalal, Shashank Shekhar, Athira Subhash, Sheetal Kalyani

Abstract: In this work, we characterize the outage probability (OP) of an intelligent reflecting surface (IRS) assisted multi-user multiple-input-single-output (MU-MISO) communication system. Using a two-step approximation method, we approximate the signal-to-interference-plus-noise ratio (SINR) for any downlink user by a Log-Normal random variable. The impact of various system parameters is studied using the closed-form expression of OP. It is concluded that the position of IRS has a critical role, but an appropriate increase in the number of IRS elements would help to compensate for the loss in performance if the position of IRS is suboptimal.

Submitted 7 August, 2022; originally announced August 2022.

arXiv:2207.07219 [pdf, other]

Software-defined Dynamic 5G Network Slice Management for Industrial Internet of Things

Authors: Ziran Min, Shashank Shekhar, Charif Mahmoudi, Valerio Formicola, Swapna Gokhale, Aniruddha Gokhale

Abstract: This paper addresses the challenges of delivering fine-grained Quality of Service (QoS) and communication determinism over 5G wireless networks for real-time and autonomous needs of Industrial Internet of Things (IIoT) applications while effectively sharing network resources. Specifically, this work presents DANSM, a software-defined, dynamic and autonomous network slice management middleware for 5G-based IIoT use cases, such as adaptive robotic repair. Empirical studies evaluating DANSM on our testbed comprising a Free5GC-based core and UERANSIM-based simulations reveal that the software-defined DANSM solution can efficiently balance the traffic load in the data plane, thereby reducing the end-to-end response time and improving the service performance by completing 34% more subtasks than a Modified Greedy Algorithm (MGA), 64% more subtasks than First Fit Descending (FFD) and 22% more subtasks than Best Fit Descending (BFD) approaches, all while minimizing operational costs.

Submitted 11 November, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

Comments: 8 pages, 8 figures, conference

arXiv:2206.14486 [pdf, other]

Beyond neural scaling laws: beating power law scaling via data pruning

Authors: Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari S. Morcos

Abstract: Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show how in theory we can break beyond power law scaling and potentially even reduce it to exponential scaling instead if we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this improved scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling in practice on ResNets trained on CIFAR-10, SVHN, and ImageNet. Next, given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of ten different data pruning metrics on ImageNet. We find most existing high performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore developed a new simple, cheap and scalable self-supervised pruning metric that demonstrates comparable performance to the best supervised metrics. Overall, our work suggests that the discovery of good data-pruning metrics may provide a viable path forward to substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning.

Submitted 21 April, 2023; v1 submitted 29 June, 2022; originally announced June 2022.

Comments: Outstanding Paper Award @ NeurIPS 2022. Added github link to metric scores

arXiv:2206.12753 [pdf, other]

Spatiotemporal Data Mining: A Survey

Authors: Arun Sharma, Zhe Jiang, Shashi Shekhar

Abstract: Spatiotemporal data mining aims to discover interesting, useful but non-trivial patterns in big spatial and spatiotemporal data. It is used in various application domains such as public safety, ecology, epidemiology, earth science, etc. This problem is challenging because of the high societal cost of spurious patterns and exorbitant computational cost. Recent surveys of spatiotemporal data mining need updating due to the rapid growth of the field. In addition, they did not adequately survey parallel techniques for spatiotemporal data mining. This paper provides a more up-to-date survey of spatiotemporal data mining methods. Furthermore, it has a detailed survey of parallel formulations of spatiotemporal data mining.

Submitted 25 June, 2022; originally announced June 2022.

arXiv:2205.12840 [pdf, other]

SALAD: Source-free Active Label-Agnostic Domain Adaptation for Classification, Segmentation and Detection

Authors: Divya Kothandaraman, Sumit Shekhar, Abhilasha Sancheti, Manoj Ghuhan, Tripti Shukla, Dinesh Manocha

Abstract: We present a novel method, SALAD, for the challenging vision task of adapting a pre-trained "source" domain network to a "target" domain, with a small budget for annotation in the "target" domain and a shift in the label space. Further, the task assumes that the source data is not available for adaptation, due to privacy concerns or otherwise. We postulate that such systems need to jointly optimize the dual task of (i) selecting a fixed number of samples from the target domain for annotation and (ii) transferring knowledge from the pre-trained network to the target domain. To do this, SALAD consists of a novel Guided Attention Transfer Network (GATN) and an active learning function, HAL. The GATN enables feature distillation from the pre-trained network to the target network, complemented with the target samples mined by HAL using transfer-ability and uncertainty criteria. SALAD has three key benefits: (i) it is task-agnostic, and can be applied across various visual tasks such as classification, segmentation and detection; (ii) it can handle shifts in output label space from the pre-trained source network to the target domain; (iii) it does not require access to source data for adaptation. We conduct extensive experiments across 3 visual tasks, viz. digits classification (MNIST, SVHN, VISDA), synthetic (GTA5) to real (CityScapes) image segmentation, and document layout detection (PubLayNet to DSSE). We show that our source-free approach, SALAD, results in an improvement of 0.5%-31.3% (across datasets and tasks) over prior adaptation methods that assume access to large amounts of annotated source data for adaptation.

Submitted 22 October, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

arXiv:2203.15760 [pdf, ps, other]

A New Expression for the Product of Two $κ-μ$ Shadowed Random Variables and its Application to Wireless Communication

Authors: Shashank Shekhar, Sheetal Kalyani

Abstract: In this work, the product of two independent and non-identically distributed (i.n.i.d) $κ-μ$ shadowed random variables is studied. We derive the series expression for the probability density function (PDF), cumulative distribution function (CDF), and moment generating function (MGF) of the product of two i.n.i.d $κ-μ$ shadowed random variables. The derived formulations are quite general, as they incorporate most of the typically used fading channels. As an application example, outage probability (OP) has been derived for cascaded wireless systems and relay-assisted communications with a variable gain relay. Extensive Monte-Carlo simulations have also been carried out.

Submitted 29 March, 2022; originally announced March 2022.

arXiv:2203.06297 [pdf, other]

Instance-Dependent Regret Analysis of Kernelized Bandits

Authors: Shubhanshu Shekhar, Tara Javidi

Abstract: We study the kernelized bandit problem, that involves designing an adaptive strategy for querying a noisy zeroth-order-oracle to efficiently learn about the optimizer of an unknown function $f$ with a norm bounded by $M<\infty$ in a Reproducing Kernel Hilbert Space (RKHS) associated with a positive definite kernel $K$. Prior results, working in a minimax framework, have characterized the worst-case (over all functions in the problem class) limits on regret achievable by any algorithm, and have constructed algorithms with matching (modulo polylogarithmic factors) worst-case performance for the Matérn family of kernels. These results suffer from two drawbacks. First, the minimax lower bound gives no information about the limits of regret achievable by the commonly used algorithms on specific problem instances. Second, due to their worst-case nature, the existing upper bound analysis fails to adapt to easier problem instances within the function class. Our work takes steps to address both these issues. First, we derive instance-dependent regret lower bounds for algorithms with uniformly (over the function class) vanishing normalized cumulative regret. Our result, valid for all the practically relevant kernelized bandits algorithms, such as, GP-UCB, GP-TS and SupKernelUCB, identifies a fundamental complexity measure associated with every problem instance. We then address the second issue, by proposing a new minimax near-optimal algorithm which also adapts to easier problem instances.

Submitted 11 March, 2022; originally announced March 2022.

Comments: 26 pages, 1 figure

arXiv:2203.04889 [pdf, other]

Low-light Image and Video Enhancement via Selective Manipulation of Chromaticity

Authors: Sumit Shekhar, Max Reimann, Amir Semmo, Sebastian Pasewaldt, Jürgen Döllner, Matthias Trapp

Abstract: Image acquisition in low-light conditions suffers from poor quality and significant degradation in visual aesthetics. This affects the visual perception of the acquired image and the performance of various computer vision and image processing algorithms applied after acquisition. Especially for videos, the additional temporal domain makes it more challenging, wherein we need to preserve quality in a temporally coherent manner. We present a simple yet effective approach for low-light image and video enhancement. To this end, we introduce "Adaptive Chromaticity", which refers to an adaptive computation of image chromaticity. The above adaptivity allows us to avoid the costly step of low-light image decomposition into illumination and reflectance, employed by many existing techniques. All stages in our method consist of only point-based operations and high-pass or low-pass filtering, thereby ensuring that the amount of temporal incoherence is negligible when applied on a per-frame basis for videos. Our results on standard low-light image datasets show the efficacy of our algorithm and its qualitative and quantitative superiority over several state-of-the-art techniques. For videos captured in the wild, we perform a user study to demonstrate the preference for our method in comparison to state-of-the-art approaches.

Submitted 9 March, 2022; originally announced March 2022.

arXiv:2201.08901 [pdf]

An Ensemble Model for Face Liveness Detection

Authors: Shashank Shekhar, Avinash Patel, Mrinal Haloi, Asif Salim

Abstract: In this paper, we present a passive method to detect face presentation attacks, a.k.a. face liveness detection, using an ensemble deep learning technique. Face liveness detection is one of the key steps involved in user identity verification of customers during the online onboarding/transaction processes. During identity verification, an unauthenticated user tries to bypass the verification system by several means; for example, they can capture a user photo from social media and mount an imposter attack using printouts of users' faces or a digital photo on a mobile device, or even create a more sophisticated attack such as a video replay attack. We have tried to understand the different methods of attack and created an in-house large-scale dataset covering all these kinds of attacks to train a robust deep learning model. We propose an ensemble method where multiple features of the face and background regions are learned to predict whether the user is bona fide or an attacker.

Submitted 19 January, 2022; originally announced January 2022.

Comments: Accepted and presented at MLDM 2022. To be published in Lattice journal

arXiv:2201.06955 [pdf, other]

Understanding COVID-19 Effects on Mobility: A Community-Engaged Approach

Authors: Arun Sharma, Majid Farhadloo, Yan Li, Aditya Kulkarni, Jayant Gupta, Shashi Shekhar

Abstract: Given aggregated mobile device data, the goal is to understand the impact of COVID-19 policy interventions on mobility. This problem is vital due to important societal use cases, such as safely reopening the economy. Challenges include understanding and interpreting questions of interest to policymakers, cross-jurisdictional variability in choice and time of interventions, the large data volume, and unknown sampling bias. The related work has explored the COVID-19 impact on travel distance, time spent at home, and the number of visitors at different points of interest. However, many policymakers are interested in long-duration visits to high-risk business categories and understanding the spatial selection bias to interpret summary reports. We provide an Entity Relationship diagram, system architecture, and implementation to support queries on long-duration visits in addition to fine resolution device count maps to understand spatial bias. We closely collaborated with policymakers to derive the system requirements and evaluate the system components, the summary reports, and visualizations.

Submitted 10 January, 2022; originally announced January 2022.

arXiv:2201.06433 [pdf, other]

A Comparative study of Hyper-Parameter Optimization Tools

Authors: Shashank Shekhar, Adesh Bansode, Asif Salim

Abstract: Most machine learning models have associated hyper-parameters along with their parameters. While the algorithm gives the solution for parameters, its utility for model performance is highly dependent on the choice of hyper-parameters. For a robust performance of a model, it is necessary to find the right hyper-parameter combination. Hyper-parameter optimization (HPO) is a systematic process that helps in finding the right values for them. The conventional methods for this purpose are grid search and random search, and both methods create issues in industrial-scale applications. Hence a set of strategies has recently been proposed, based on Bayesian optimization and evolutionary algorithm principles, that helps with runtime issues in a production environment and with robust performance. In this paper, we compare the performance of four Python libraries that have been proposed for hyper-parameter optimization, namely Optuna, Hyperopt, Optunity, and sequential model-based algorithm configuration (SMAC). The performance of these tools is tested using two benchmarks. The first one is to solve a combined algorithm selection and hyper-parameter optimization (CASH) problem. The second one is the NeurIPS black-box optimization challenge, in which a multilayer perceptron (MLP) architecture has to be chosen from a set of related architecture constraints and hyper-parameters. The benchmarking is done with six real-world datasets. From the experiments, we found that Optuna has better performance for the CASH problem and Hyperopt for the MLP problem.

Submitted 17 January, 2022; originally announced January 2022.

Comments: Selected and presented at IEEE CSDE 2021. To be published in Proceedings of IEEE CSDE 2021

Showing 1–50 of 102 results for author: Shekhar, S