Search | arXiv e-print repository

Enhancing MOTION2NX for Efficient, Scalable and Secure Image Inference using Convolutional Neural Networks

Authors: Haritha K, Ramya Burra, Srishti Mittal, Sarthak Sharma, Abhilash Venkatesh, Anshoo Tandon

Abstract: This work contributes towards the development of an efficient and scalable open-source Secure Multi-Party Computation (SMPC) protocol on machines with moderate computational resources. We use the ABY2.0 SMPC protocol implemented on the C++ based MOTION2NX framework for a secure convolutional neural network (CNN) inference application with semi-honest security. Our contributions are as follows. Firstly, we enhance MOTION2NX by providing a tensorized version of several primitive functions including the Hadamard product, indicator function and argmax function. Our design of the secure indicator function is based on a novel approach that uses the secure ReLU function available in the baseline MOTION2NX implementation. The secure indicator function is used, in turn, as a building block for a novel implementation of secure argmax. Secondly, we also develop a novel splitting of the computations at each CNN layer into multiple configurable chunks, thereby resulting in a significant reduction in RAM usage. Thirdly, we adapt an existing Helper node algorithm, working in tandem with the ABY2.0 protocol, for efficient convolution computation. This algorithm reduces both the execution time and the RAM usage required to execute CNN models, at the cost of an additional compute server. Moreover, the ideas presented in this paper can also be applied to secure neural network training.

Submitted 29 August, 2024; originally announced August 2024.

Comments: 20 pages, 1 figure. arXiv admin note: text overlap with arXiv:2310.10133

arXiv:2408.11935 [pdf, other]

Explainable Anomaly Detection: Counterfactual driven What-If Analysis

Authors: Logan Cummins, Alexander Sommers, Sudip Mittal, Shahram Rahimi, Maria Seale, Joseph Jaboure, Thomas Arnold

Abstract: There exist three main areas of study within the field of predictive maintenance: anomaly detection, fault diagnosis, and remaining useful life prediction. Notably, anomaly detection alerts the stakeholder that an anomaly is occurring. This raises two fundamental questions: what is causing the fault and how can we fix it? Within the field of explainable artificial intelligence, counterfactual explanations can give that information in the form of what changes to make to put the data point into the opposing class, in this case "healthy". The suggestions are not always actionable, which may raise interest in asking "what if we do this instead?" In this work, we provide a proof of concept for utilizing counterfactual explanations as what-if analysis. We perform this on the PRONOSTIA dataset with a temporal convolutional network as the anomaly detector. Our method presents the counterfactuals in the form of a what-if analysis for this base problem to inspire future work for more complex systems and scenarios.

Submitted 21 August, 2024; originally announced August 2024.

Comments: 8 pages, 6 figures, 3 tables

arXiv:2408.03558 [pdf, other]

D2Styler: Advancing Arbitrary Style Transfer with Discrete Diffusion Methods

Authors: Onkar Susladkar, Gayatri Deshmukh, Sparsh Mittal, Parth Shastri

Abstract: In image processing, one of the most challenging tasks is to render an image's semantic meaning using a variety of artistic approaches. Existing techniques for arbitrary style transfer (AST) frequently experience mode-collapse, over-stylization, or under-stylization due to a disparity between the style and content images. We propose a novel framework called D$^2$Styler (Discrete Diffusion Styler) that leverages the discrete representational capability of VQ-GANs and the advantages of discrete diffusion, including stable training and avoidance of mode collapse. Our method uses Adaptive Instance Normalization (AdaIN) features as a context guide for the reverse diffusion process. This makes it easy to move features from the style image to the content image without bias. The proposed method substantially enhances the visual quality of style-transferred images, allowing the combination of content and style in a visually appealing manner. We take style images from the WikiArt dataset and content images from the COCO dataset. Experimental results demonstrate that D$^2$Styler produces high-quality style-transferred images and outperforms twelve existing methods on nearly all the metrics. The qualitative results and ablation studies provide further insights into the efficacy of our technique. The code is available at https://github.com/Onkarsus13/D2Styler.

Submitted 7 August, 2024; originally announced August 2024.

Comments: Paper accepted at 27th International Conference on Pattern Recognition (ICPR), 2024

arXiv:2408.00283 [pdf, other]

Navigating Text-to-Image Generative Bias across Indic Languages

Authors: Surbhi Mittal, Arnav Sudan, Mayank Vatsa, Richa Singh, Tamar Glaser, Tal Hassner

Abstract: This research investigates biases in text-to-image (TTI) models for the Indic languages widely spoken across India. It evaluates and compares the generative performance and cultural relevance of leading TTI models in these languages against their performance in English. Using the proposed IndicTTI benchmark, we comprehensively assess the performance of 30 Indic languages with two open-source diffusion models and two commercial generation APIs. The primary objective of this benchmark is to evaluate the support for Indic languages in these models and identify areas needing improvement. Given the linguistic diversity of 30 languages spoken by over 1.4 billion people, this benchmark aims to provide a detailed and insightful analysis of TTI models' effectiveness within the Indic linguistic landscape. The data and code for the IndicTTI benchmark can be accessed at https://iab-rubric.org/resources/other-databases/indictti.

Submitted 1 August, 2024; originally announced August 2024.

Comments: Accepted in ECCV 2024

arXiv:2407.20696 [pdf]

Implementation of Formal Standard for Interoperability in M&S/System of Systems Integration with DEVS/SOA

Authors: Saurabh Mittal, Bernard P. Zeigler, José L. Risco-Martín

Abstract: Modeling and Simulation (M&S) is finding increasing application in development and testing of command and control systems comprised of information-intensive component systems. Achieving interoperability is one of the chief System of Systems (SoS) engineering objectives in the development of command and control (C2) capabilities for joint and coalition warfare. In this paper, we apply an SoS perspective on the integration of M&S with such systems. We employ recently developed interoperability concepts based on linguistic categories along with the Discrete Event System Specification (DEVS) formalism to implement a standard for interoperability. We will show how the developed standard is implemented in DEVS/SOA net-centric modeling and simulation framework that uses XML-based Service Oriented Architecture (SOA). We will discuss the simulator interfaces and the design issues in their implementation in DEVS/SOA. We will illustrate the application of DEVS/SOA in a multi-agent test instrumentation system that is deployable as a SOA.

Submitted 30 July, 2024; originally announced July 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2407.03686

Journal ref: The International C2 Journal, 3(1), pp. 1-61, 2009

arXiv:2407.08281 [pdf]

doi 10.1177/0037549709104727

eUDEVS: Executable UML with DEVS Theory of Modeling and Simulation

Authors: José L. Risco-Martín, J. M. Cruz, Saurabh Mittal, Bernard P. Zeigler

Abstract: Modeling and Simulation (M&S) for system design and prototyping is practiced today both in the industry and academia. M&S are two different areas altogether and have specific objectives. However, most of the time these two separate areas are taken together. The developed code is tightly woven around both the model and the underlying simulator that executes it. This constrains both the model development and the simulation engine, which impacts the scalability of the developed code. Furthermore, a lot of time is spent in development of a model because it needs both domain knowledge and simulation techniques, which also requires communication among users and developers. Unified Modeling Language (UML) is widely accepted in the industry, whereas Discrete Event Specification (DEVS) based modeling, which separates the model and the simulator, provides a cleaner methodology to develop models and is much used in academia. DEVS today is used by engineers who understand discrete event modeling at a much more detailed level and are able to translate requirements to DEVS modeling code. There have been earlier efforts to integrate UML and DEVS but they haven't succeeded in providing a transformation mechanism due to inherent differences in these two modeling paradigms. This paper presents an integrated approach towards cross-transformations between UML and DEVS using the proposed eUDEVS, which stands for executable UML based on DEVS. Further, we will also show that the obtained DEVS models belong to a specific class of DEVS models called Finite Deterministic DEVS (FD-DEVS) that is available as a W3C XML Schema in XFD-DEVS. We also put the proposed eUDEVS in a much larger unifying framework called DEVS Unified Process that allows bifurcated model-continuity based lifecycle methodology for systems M&S. Finally, we demonstrate the laid concepts with a complete example.

Submitted 11 July, 2024; originally announced July 2024.

Journal ref: SIMULATION: Transactions of the SCS, 85(11-12), pp. 750-777, 2009

arXiv:2407.03686 [pdf]

doi 10.1177/0037549709340968

DEVS/SOA: A Cross-Platform Framework for Net-centric Modeling and Simulation in DEVS Unified Process

Authors: Saurabh Mittal, José L. Risco-Martín, Bernard P. Zeigler

Abstract: Discrete EVent Specification (DEVS) environments are known to be implemented over middleware systems such as HLA, RMI, CORBA and others. DEVS exhibits concepts of systems theory and modeling and supports capturing the system behavior from the physical and behavioral perspectives. Further, they are implemented using object-oriented languages like Java and C++. This research work uses the Java platform to implement DEVS over a Service Oriented Architecture (SOA) framework. Called DEVS/SOA, the framework supports a development and testing environment known as DEVS Unified Process that is built on a model-continuity-based life cycle methodology. DEVS Unified Process allows DEVS-based Modeling and Simulation (M&S) over net-centric platforms using DEVS/SOA. This framework also provides the crucial feature of run-time composability of coupled systems using SOA. We describe the architecture and designs of both the server and the client. The client application communicates with multiple servers hosting DEVS simulation services. These simulation services are developed using the proposed symmetrical services architecture wherein the server can act as both a service provider and a service consumer, contrary to the unidirectional client-server paradigm. We also discuss how this services-based architecture provides solutions for cross-platform distributed M&S. We demonstrate the DEVS/SOA framework with a scenario of Joint Close Air Support specified in Business Process Modeling Notation (BPMN). We also provide a real-world application of network health monitoring using the DEVS/SOA layered architectural framework.

Submitted 4 July, 2024; originally announced July 2024.

Journal ref: SIMULATION, 85(7), pp. 419-450, 2009

arXiv:2406.18812 [pdf, other]

A Survey on Privacy Attacks Against Digital Twin Systems in AI-Robotics

Authors: Ivan A. Fernandez, Subash Neupane, Trisha Chakraborty, Shaswata Mitra, Sudip Mittal, Nisha Pillai, Jingdao Chen, Shahram Rahimi

Abstract: Industry 4.0 has witnessed the rise of complex robots fueled by the integration of Artificial Intelligence/Machine Learning (AI/ML) and Digital Twin (DT) technologies. While these technologies offer numerous benefits, they also introduce potential privacy and security risks. This paper surveys privacy attacks targeting robots enabled by AI and DT models. Exfiltration and data leakage of ML models are discussed in addition to the potential extraction of models derived from first-principles (e.g., physics-based). We also discuss design considerations with DT-integrated robotics touching on the impact of ML model training, responsible AI and DT safeguards, data governance and ethical considerations on the effectiveness of these attacks. We advocate for a trusted autonomy approach, emphasizing the need to combine robotics, AI, and DT technologies with robust ethical frameworks and trustworthiness principles for secure and reliable AI robotic systems.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 10 pages, 3 figures, 1 table

arXiv:2406.17031 [pdf, other]

Impact of extragalactic point sources on the low-frequency sky spectrum and cosmic dawn global 21-cm measurements

Authors: Shikhar Mittal, Girish Kulkarni, Dominic Anstey, Eloy de Lera Acedo

Abstract: The contribution of resolved and unresolved extragalactic point sources to the low-frequency sky spectrum is a potentially non-negligible part of the astrophysical foregrounds for cosmic dawn 21-cm experiments. The clustering of such point sources on the sky, combined with the frequency-dependence of the antenna beam, can also make this contribution chromatic. By combining low-frequency measurements of the luminosity function and the angular correlation function of extragalactic point sources, we develop a model for the contribution of these sources to the low-frequency sky spectrum. Using this model, we find that the contribution of sources with flux density $>10^{-6}\,$Jy to the sky-averaged spectrum is smooth and of the order of a few kelvins at $50$-$200\,$MHz. We combine this model with measurements of the galactic foreground spectrum and convolve the result with the beam of the conical log-spiral antenna planned as part of the Radio Experiment for the Analysis of Cosmic Hydrogen (REACH) project. We find that the contribution of point sources to the resultant spectrum is $\sim0.4\%$ of the total foregrounds, but still larger by at least an order of magnitude than the standard predictions for the cosmological 21-cm signal. As a result, not accounting for the point-source contribution leads to a systematic bias in 21-cm signal recovery. We show, however, that in the REACH case, this reconstruction bias can be removed by modelling the point-source contribution as a power law with a running spectral index. We make our code publicly available as a Python package labelled epspy.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 13 pages, 14 figures, and 1 appendix. Submitted to MNRAS. Comments are welcome

arXiv:2406.02322 [pdf, other]

A Survey of Transformer Enabled Time Series Synthesis

Authors: Alexander Sommers, Logan Cummins, Sudip Mittal, Shahram Rahimi, Maria Seale, Joseph Jaboure, Thomas Arnold

Abstract: Generative AI has received much attention in the image and language domains, with the transformer neural network continuing to dominate the state of the art. Application of these models to time series generation is less explored, however, and is of great utility to machine learning, privacy preservation, and explainability research. The present survey identifies this gap at the intersection of the transformer, generative AI, and time series data, and reviews works in this sparsely populated subdomain. The reviewed works show great variety in approach, and have not yet converged on a conclusive answer to the problems the domain poses. GANs, diffusion models, state space models, and autoencoders were all encountered alongside or surrounding the transformers which originally motivated the survey. While too open a domain to offer conclusive insights, the works surveyed are quite suggestive, and several recommendations for best practice, and suggestions of valuable future work, are provided.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.20971 [pdf, other]

Amortizing intractable inference in diffusion models for vision, language, and control

Authors: Siddarth Venkatraman, Moksh Jain, Luca Scimeca, Minsu Kim, Marcin Sendera, Mohsin Hasan, Luke Rowe, Sarthak Mittal, Pablo Lemos, Emmanuel Bengio, Alexandre Adam, Jarrid Rector-Brooks, Yoshua Bengio, Glen Berseth, Nikolay Malkin

Abstract: Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data, $\mathbf{x}\sim p^{\rm post}(\mathbf{x})\propto p(\mathbf{x})r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or likelihood function $r(\mathbf{x})$. We state and prove the asymptotic correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from this posterior, a problem that existing methods solve only approximately or in restricted cases. Relative trajectory balance arises from the generative flow network perspective on diffusion models, which allows the use of deep reinforcement learning techniques to improve mode coverage. Experiments illustrate the broad potential of unbiased inference of arbitrary posteriors under diffusion priors: in vision (classifier guidance), language (infilling under a discrete diffusion LLM), and multimodal data (text-to-image generation). Beyond generative modeling, we apply relative trajectory balance to the problem of continuous control with a score-based behavior prior, achieving state-of-the-art results on benchmarks in offline reinforcement learning.

Submitted 31 May, 2024; originally announced May 2024.

Comments: Code: https://github.com/GFNOrg/diffusion-finetuning

arXiv:2405.19162 [pdf, other]

Does learning the right latent variables necessarily improve in-context learning?

Authors: Sarthak Mittal, Eric Elmoznino, Leo Gagnon, Sangnie Bhardwaj, Dhanya Sridhar, Guillaume Lajoie

Abstract: Large autoregressive models like Transformers can solve tasks through in-context learning (ICL) without learning new weights, suggesting avenues for efficiently solving new tasks. For many tasks, e.g., linear regression, the data factorizes: examples are independent given a task latent that generates the data, e.g., linear coefficients. While an optimal predictor leverages this factorization by inferring task latents, it is unclear if Transformers implicitly do so or if they instead exploit heuristics and statistical shortcuts enabled by attention layers. Both scenarios have inspired active ongoing work. In this paper, we systematically investigate the effect of explicitly inferring task latents. We minimally modify the Transformer architecture with a bottleneck designed to prevent shortcuts in favor of more structured solutions, and then compare performance against standard Transformers across various ICL tasks. Contrary to intuition and some recent works, we find little discernible difference between the two; biasing towards task-relevant latent variables does not lead to better out-of-distribution performance, in general. Curiously, we find that while the bottleneck effectively learns to extract latent task variables from context, downstream processing struggles to utilize them for robust prediction. Our study highlights the intrinsic limitations of Transformers in achieving structured ICL solutions that generalize, and shows that while inferring the right latents aids interpretability, it is not sufficient to alleviate this problem.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.16388 [pdf, other]

Multi-Reference Preference Optimization for Large Language Models

Authors: Hung Le, Quan Tran, Dung Nguyen, Kien Do, Saloni Mittal, Kelechi Ogueji, Svetha Venkatesh

Abstract: How can Large Language Models (LLMs) be aligned with human intentions and values? A typical solution is to gather human preference on model outputs and finetune the LLMs accordingly while ensuring that updates do not deviate too far from a reference model. Recent approaches, such as direct preference optimization (DPO), have eliminated the need for unstable and sluggish reinforcement learning optimization by introducing closed-form supervised losses. However, a significant limitation of the current approach is its design for a single reference model only, neglecting to leverage the collective power of numerous pretrained LLMs. To overcome this limitation, we introduce a novel closed-form formulation for direct preference optimization using multiple reference models. The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models, substantially enhancing preference learning capabilities compared to the single-reference DPO. Our experiments demonstrate that LLMs finetuned with MRPO generalize better in various preference data, regardless of data scarcity or abundance. Furthermore, MRPO effectively finetunes LLMs to exhibit superior performance in several downstream natural language processing tasks such as GSM8K and TruthfulQA.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 20 pages

arXiv:2405.15262 [pdf, other]

doi 10.1051/0004-6361/202450310

Gamma-ray line emission from the Local Bubble

Authors: Thomas Siegert, Michael M. Schulreich, Niklas Bauer, Rudi Reinhardt, Saurabh Mittal, Hiroki Yoneda

Abstract: Deep-sea archives that include intermediate-lived radioactive $^{60}\mathrm{Fe}$ particles suggest the occurrence of several recent supernovae inside the present-day volume of the Local Bubble during the last $\sim 10$ Myr. The isotope $^{60}\mathrm{Fe}$ is mainly produced in massive stars and ejected in supernova explosions, which should always result in a sizeable yield of $^{26}\mathrm{Al}$ from the same objects. $^{60}\mathrm{Fe}$ and $^{26}\mathrm{Al}$ decay with lifetimes of 3.82 and 1.05 Myr, and emit $\gamma$-rays at 1332 and 1809 keV, respectively. These $\gamma$-rays have been measured as a diffuse glow of the Milky Way, and would also be expected from inside the Local Bubble as foreground emission. Based on two scenarios, one employing a geometrical model and the other state-of-the-art hydrodynamics simulations, we estimate the expected fluxes of the 1332 and 1809 keV $\gamma$-ray lines, as well as the resulting 511 keV line from positron annihilation due to the $^{26}\mathrm{Al}$ $\beta^+$-decay. We find fluxes in the range of $10^{-6}$-$10^{-5}\,\mathrm{ph\,cm^{-2}\,s^{-1}}$ for all three lines with isotropic contributions of 10-50%. We show that these fluxes are within reach for the upcoming COSI-SMEX $\gamma$-ray telescope over its nominal satellite mission duration of 2 yr. Given the Local Bubble models considered, we conclude that in the case of 10-20 Myr-old superbubbles, the distributions of $^{60}\mathrm{Fe}$ and $^{26}\mathrm{Al}$ are not co-spatial, contrary to an assumption usually made in $\gamma$-ray data analyses. This should be taken into account when analysing individual nearby targets for their $^{60}\mathrm{Fe}$ to $^{26}\mathrm{Al}$ flux ratio, as this gauges the stellar evolution models and the age of the superbubbles. A flux ratio measured for the Local Bubble could further constrain models of $^{60}\mathrm{Fe}$ deposition on Earth and its moon.

Submitted 24 May, 2024; originally announced May 2024.

Comments: accepted in A&A, 18 pages, 13 figures, 2 tables

Journal ref: A&A 689, A2 (2024)

arXiv:2405.14820 [pdf, other]

doi 10.3847/1538-4357/ad4be4

Identification of hot gas around low-mass protostars

Authors: Merel L. R. van 't Hoff, Edwin A. Bergin, Penelope Riley, Sanil Mittal, Jes K. Jørgensen, John J. Tobin

Abstract: The low carbon content of Earth and primitive meteorites compared to the Sun and interstellar grains suggests that carbon-rich grains were destroyed in the inner few astronomical units of the young solar system. A promising mechanism to selectively destroy carbonaceous grains is thermal sublimation within the soot line at $\gtrsim$ 300 K. To address whether such hot conditions are common amongst low-mass protostars, we observe CH$_3$CN transitions at 1, 2 and 3 mm with the NOrthern Extended Millimeter Array (NOEMA) toward seven low-mass and one intermediate-mass protostar ($L_{\rm{bol}} \sim2-300 L_\odot$), as CH$_3$CN is an excellent temperature tracer. We find $>$ 300 K gas toward all sources, indicating that hot gas may be prevalent. Moreover, the excitation temperature for CH$_3$OH obtained with the same observations is always lower ($\sim$135-250 K), suggesting that CH$_3$CN and CH$_3$OH have a different spatial distribution. A comparison of the column densities at 1 and 3 mm shows a stronger increase at 3 mm for CH$_3$CN than for CH$_3$OH. Since the dust opacity is lower at longer wavelengths, this indicates that CH$_3$CN is enhanced in the hot gas compared to CH$_3$OH. If this CH$_3$CN enhancement is the result of carbon-grain sublimation, these results suggest that Earth's initial formation conditions may not be rare.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 19 pages, 9 figures, 4 tables (plus 13 pages appendix with 10 figures, 4 tables). Accepted for publication in ApJ

arXiv:2405.08120 [pdf, other]

From Questions to Insightful Answers: Building an Informed Chatbot for University Resources

Authors: Subash Neupane, Elias Hossain, Jason Keith, Himanshu Tripathi, Farbod Ghiasi, Noorbakhsh Amiri Golilarz, Amin Amirlatifi, Sudip Mittal, Shahram Rahimi

Abstract: This paper presents BARKPLUG V.2, a Large Language Model (LLM)-based chatbot system built using Retrieval Augmented Generation (RAG) pipelines to enhance the user experience and access to information within academic settings. The objective of BARKPLUG V.2 is to provide information to users about various campus resources, including academic departments, programs, campus facilities, and student resources in a university setting in an interactive fashion. Our system leverages university data as an external data corpus and ingests it into our RAG pipelines for domain-specific question-answering tasks. We evaluate the effectiveness of our system in generating accurate and pertinent responses for Mississippi State University, as a case study, using quantitative measures, employing frameworks such as Retrieval Augmented Generation Assessment (RAGAS). Furthermore, we evaluate the usability of this system via subjective satisfaction surveys using the System Usability Scale (SUS). Our system demonstrates impressive quantitative performance, with a mean RAGAS score of 0.96, and experience, as validated by usability assessments.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2404.18228 [pdf, other]

doi 10.1007/978-3-031-58495-4_12

TextGram: Towards a better domain-adaptive pretraining

Authors: Sharayu Hiwarkhedkar, Saloni Mittal, Vidula Magdum, Omkar Dhekane, Raviraj Joshi, Geetanjali Kale, Arnav Ladkat

Abstract: For green AI, it is crucial to measure and reduce the carbon footprint emitted during the training of large language models. In NLP, performing pre-training on Transformer models requires significant computational resources. This pre-training involves using a large amount of text data to gain prior knowledge for performing downstream tasks. Thus, it is important that we select the correct data in the form of domain-specific data from this vast corpus to achieve optimum results aligned with our domain-specific tasks. While training on large unsupervised data is expensive, it can be optimized by performing a data selection step before pretraining. Selecting important data reduces the space overhead and the substantial amount of time required to pre-train the model while maintaining constant accuracy. We investigate the existing selection strategies and propose our own domain-adaptive data selection method - TextGram - that effectively selects essential data from large corpora. We compare and evaluate the results of finetuned models for text classification task with and without data selection. We show that the proposed strategy works better compared to other selection methods.

Submitted 28 April, 2024; originally announced April 2024.

Comments: Accepted at SPELLL 2023

arXiv:2404.18216 [pdf, other]

doi 10.1007/978-3-031-58495-4_4

L3Cube-MahaNews: News-based Short Text and Long Document Classification Datasets in Marathi

Authors: Saloni Mittal, Vidula Magdum, Omkar Dhekane, Sharayu Hiwarkhedkar, Raviraj Joshi

Abstract: The availability of text or topic classification datasets in the low-resource Marathi language is limited, typically consisting of fewer than 4 target labels, with some achieving nearly perfect accuracy. In this work, we introduce L3Cube-MahaNews, a Marathi text classification corpus that focuses on News headlines and articles. This corpus stands out as the largest supervised Marathi Corpus, containing over 1.05L records classified into a diverse range of 12 categories. To accommodate different document lengths, MahaNews comprises three supervised datasets specifically designed for short text, long documents, and medium paragraphs. The consistent labeling across these datasets facilitates document length-based analysis. We provide detailed data statistics and baseline results on these datasets using state-of-the-art pre-trained BERT models. We conduct a comparative analysis between monolingual and multilingual BERT models, including MahaBERT, IndicBERT, and MuRIL. The monolingual MahaBERT model outperforms all others on every dataset. These resources also serve as Marathi topic classification datasets or models and are publicly available at https://github.com/l3cube-pune/MarathiNLP .

Submitted 28 April, 2024; originally announced April 2024.

Comments: Accepted at SPELLL 2023

arXiv:2404.08601 [pdf, ps, other]

Generating Synthetic Time Series Data for Cyber-Physical Systems

Authors: Alexander Sommers, Somayeh Bakhtiari Ramezani, Logan Cummins, Sudip Mittal, Shahram Rahimi, Maria Seale, Joseph Jaboure

Abstract: Data augmentation is an important facilitator of deep learning applications in the time series domain. A gap is identified in the literature, demonstrating sparse exploration of the transformer, the dominant sequence model, for data augmentation in time series. An architecture hybridizing several successful priors is put forth and tested using a powerful time domain similarity metric. Results suggest the challenge of this domain, and several valuable directions for future work.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.04985 [pdf]

Towards a generalized accessibility measure for transportation equity and efficiency

Authors: Rajat Verma, Mithun Debnath, Shagun Mittal, Satish V. Ukkusuri

Abstract: Locational measures of accessibility are widely used in urban and transportation planning to understand the impact of the transportation system on influencing people's access to places. However, there is a considerable lack of measurement standards and publicly available data. We propose a generalized measure of locational accessibility that has a comprehensible form for transportation planning analysis. This metric combines the cumulative opportunities approach with gravity-based measures and is capable of catering to multiple trip purposes, travel modes, cost thresholds, and scales of analysis. Using data from multiple publicly available datasets, this metric is computed by trip purpose and travel time threshold for all block groups in the United States, and the data is made publicly accessible. Further, case studies of three large metropolitan areas reveal substantial inefficiencies in transportation infrastructure, with the most inefficiency observed in sprawling and non-core urban areas, especially for bicycling. Subsequently, it is shown that targeted investment in facilities can contribute to a more equitable distribution of accessibility to essential shopping and service facilities. By assigning greater weights to socioeconomically disadvantaged neighborhoods, the proposed metric formally incorporates equity considerations into transportation planning, contributing to a more equitable distribution of accessibility to essential services and facilities.

Submitted 7 April, 2024; originally announced April 2024.

Comments: Article originally submitted to the Journal of Transport Geography. Article contains 5 main figures. Supplementary Material is prepared but not uploaded in this submission

arXiv:2403.14681 [pdf]

doi 10.4018/IJBAN.338367

AI Ethics: A Bibliometric Analysis, Critical Issues, and Key Gaps

Authors: Di Kevin Gao, Andrew Haverly, Sudip Mittal, Jiming Wu, Jingdao Chen

Abstract: Artificial intelligence (AI) ethics has emerged as a burgeoning yet pivotal area of scholarly research. This study conducts a comprehensive bibliometric analysis of the AI ethics literature over the past two decades. The analysis reveals a discernible tripartite progression, characterized by an incubation phase, followed by a subsequent phase focused on imbuing AI with human-like attributes, culminating in a third phase emphasizing the development of human-centric AI systems. After that, they present seven key AI ethics issues, encompassing the Collingridge dilemma, the AI status debate, challenges associated with AI transparency and explainability, privacy protection complications, considerations of justice and fairness, concerns about algocracy and human enfeeblement, and the issue of superintelligence. Finally, they identify two notable research gaps in AI ethics regarding the large ethics model (LEM) and AI identification and extend an invitation for further scholarly research.

Submitted 12 March, 2024; originally announced March 2024.

Journal ref: International Journal of Business Analytics (IJBAN), 2024, 11(1), 1-19

arXiv:2403.08607 [pdf, other]

MedInsight: A Multi-Source Context Augmentation Framework for Generating Patient-Centric Medical Responses using Large Language Models

Authors: Subash Neupane, Shaswata Mitra, Sudip Mittal, Noorbakhsh Amiri Golilarz, Shahram Rahimi, Amin Amirlatifi

Abstract: Large Language Models (LLMs) have shown impressive capabilities in generating human-like responses. However, their lack of domain-specific knowledge limits their applicability in healthcare settings, where contextual and comprehensive responses are vital. To address this challenge and enable the generation of patient-centric responses that are contextually relevant and comprehensive, we propose MedInsight: a novel retrieval augmented framework that augments LLM inputs (prompts) with relevant background information from multiple sources. MedInsight extracts pertinent details from the patient's medical record or consultation transcript. It then integrates information from authoritative medical textbooks and curated web resources based on the patient's health history and condition. By constructing an augmented context combining the patient's record with relevant medical knowledge, MedInsight generates enriched, patient-specific responses tailored for healthcare applications such as diagnosis, treatment recommendations, or patient education. Experiments on the MTSamples dataset validate MedInsight's effectiveness in generating contextually appropriate medical responses. Quantitative evaluation using the Ragas metric and TruLens for answer similarity and answer correctness demonstrates the model's efficacy. Furthermore, human evaluation studies involving Subject Matter Experts (SMEs) confirm MedInsight's utility, with moderate inter-rater agreement on the relevance and correctness of the generated responses.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.05971 [pdf, other]

doi 10.1063/5.0207942

Harmonic Balance for Differential Constitutive Models under Oscillatory Shear

Authors: Shivangi Mittal, Yogesh M. Joshi, Sachin Shanbhag

Abstract: Harmonic balance (HB) is a popular Fourier-Galerkin method used in the analysis of nonlinear vibration problems where dynamical systems are subjected to periodic forcing. We adapt HB to find the periodic steady-state response of nonlinear differential constitutive models subjected to large amplitude oscillatory shear flow. By incorporating the alternating-frequency-time scheme into HB, we develop a computer program called FLASH (acronym for Fast Large Amplitude Simulation using Harmonic balance), which makes it convenient to apply HB to any differential constitutive model. We validate FLASH by considering two representative constitutive models, viz., the exponential Phan-Thien Tanner model and a nonlinear temporary network model. In terms of accuracy and speed, FLASH outperforms the conventional approach of solving initial value problems by numerical integration via time-stepping methods often by several orders of magnitude. We discuss how FLASH can be conveniently extended for other nonlinear constitutive models, which opens up potential applications in model calibration and selection, and stability analysis.

Submitted 9 March, 2024; originally announced March 2024.

Comments: 41 pages, 9 figures

Journal ref: Physics of Fluids 36, 053104 (2024)

arXiv:2403.05551 [pdf]

A Bibliometric View of AI Ethics Development

Authors: Di Kevin Gao, Andrew Haverly, Sudip Mittal, Jingdao Chen

Abstract: Artificial Intelligence (AI) Ethics is a nascent yet critical research field. Recent developments in generative AI and foundational models necessitate a renewed look at the problem of AI Ethics. In this study, we perform a bibliometric analysis of AI Ethics literature for the last 20 years based on keyword search. Our study reveals a three-phase development in AI Ethics, namely an incubation phase, making AI human-like machines phase, and making AI human-centric machines phase. We conjecture that the next phase of AI ethics is likely to focus on making AI more machine-like as AI matches or surpasses humans intellectually, a term we coin as "machine-like human".

Submitted 8 February, 2024; originally announced March 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.14946 [pdf, other]

Dynamic control of 2D non-Hermitian photonic corner states in synthetic dimensions

Authors: Xinyuan Zheng, Mahmoud Jalali Mehrabad, Jonathan Vannucci, Kevin Li, Avik Dutt, Mohammad Hafezi, Sunil Mittal, Edo Waks

Abstract: Non-Hermitian models describe the physics of ubiquitous open systems with gain and loss. One intriguing aspect of non-Hermitian models is their inherent topology that can produce intriguing boundary phenomena like resilient higher-order topological insulators (HOTIs) and non-Hermitian skin effects (NHSE). Recently, time-multiplexed lattices in synthetic dimensions have emerged as a versatile platform for the investigation of these effects free of geometric restrictions. Despite holding broad applications, studies of these effects have been limited to static cases so far, and full dynamical control over the non-Hermitian effects has remained elusive. Here, we demonstrate the emergence of topological non-Hermitian corner states with remarkable temporal controllability and robustness in a two-dimensional photonic synthetic time lattice. Specifically, we showcase various dynamic control mechanisms for light confinement and flow, including spatial mode tapering, sequential non-Hermiticity on-off switching, dynamical corner state relocation, and light steering. Moreover, we establish the corner state's robustness in the presence of intensity modulation randomness and quantitatively determine its breakdown regime. Our findings extend non-Hermitian and topological photonic effects into higher synthetic dimensions, offering remarkable flexibility and real-time control possibilities. This opens avenues for topological classification, quantum walk simulations of many-body dynamics, and robust Floquet engineering, free from the limitations of physical geometries.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.12608 [pdf, other]

Patient-Centric Knowledge Graphs: A Survey of Current Methods, Challenges, and Applications

Authors: Hassan S. Al Khatib, Subash Neupane, Harish Kumar Manchukonda, Noorbakhsh Amiri Golilarz, Sudip Mittal, Amin Amirlatifi, Shahram Rahimi

Abstract: Patient-Centric Knowledge Graphs (PCKGs) represent an important shift in healthcare that focuses on individualized patient care by mapping the patient's health information in a holistic and multi-dimensional way. PCKGs integrate various types of health data to provide healthcare professionals with a comprehensive understanding of a patient's health, enabling more personalized and effective care. This literature review explores the methodologies, challenges, and opportunities associated with PCKGs, focusing on their role in integrating disparate healthcare data and enhancing patient care through a unified health perspective. In addition, this review also discusses the complexities of PCKG development, including ontology design, data integration techniques, knowledge extraction, and structured representation of knowledge. It highlights advanced techniques such as reasoning, semantic search, and inference mechanisms essential in constructing and evaluating PCKGs for actionable healthcare insights. We further explore the practical applications of PCKGs in personalized medicine, emphasizing their significance in improving disease prediction and formulating effective treatment plans. Overall, this review provides a foundational perspective on the current state-of-the-art and best practices of PCKGs, guiding future research and applications in this dynamic field.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.06121 [pdf, other]

Iterated Denoising Energy Matching for Sampling from Boltzmann Densities

Authors: Tara Akhound-Sadegh, Jarrid Rector-Brooks, Avishek Joey Bose, Sarthak Mittal, Pablo Lemos, Cheng-Hao Liu, Marcin Sendera, Siamak Ravanbakhsh, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Alexander Tong

Abstract: Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient -- and no data samples -- to train a diffusion-based sampler. Specifically, iDEM alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our stochastic matching objective to further improve the sampler. iDEM is scalable to high dimensions as the inner matching objective is simulation-free and requires no MCMC samples. Moreover, by leveraging the fast mode mixing behavior of diffusion, iDEM smooths out the energy landscape enabling efficient exploration and learning of an amortized sampler. We evaluate iDEM on a suite of tasks ranging from standard synthetic energy functions to invariant $n$-body particle systems. We show that the proposed approach achieves state-of-the-art performance on all metrics and trains $2-5\times$ faster, which allows it to be the first method to train using energy on the challenging $55$-particle Lennard-Jones system.

Submitted 26 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: Published at ICML 2024. Code for iDEM is available at https://github.com/jarridrb/dem

arXiv:2402.05483 [pdf, other]

doi 10.1177/0037549717690447

Reconsidering the performance of DEVS modeling and simulation environments using the DEVStone benchmark

Authors: José L. Risco-Martín, Saurabh Mittal, Juan Carlos Fabero, Marina Zapater, Román Hermida

Abstract: The Discrete Event System Specification formalism (DEVS), which supports hierarchical and modular model composition, has been widely used to understand, analyze and develop a variety of systems. DEVS has been implemented in various languages and platforms over the years. The DEVStone benchmark was conceived to generate a set of models with varied structure and behavior, and to automate the evaluation of the performance of DEVS-based simulators. However, DEVStone is still in a preliminary phase and more model analysis is required. In this paper, we revisit DEVStone introducing new equations to compute the number of events triggered. We also introduce a new benchmark, called HOmem, designed as an alternative version of HOmod, with similar CPU and memory requirements, but with an easier implementation and analytically more manageable. Finally, we compare both the performance and memory footprint of five different DEVS simulators on two different hardware platforms.

Submitted 8 February, 2024; originally announced February 2024.

Journal ref: SIMULATION, 93(6), 2017

arXiv:2402.05098 [pdf, other]

Improved off-policy training of diffusion samplers

Authors: Marcin Sendera, Minsu Kim, Sarthak Mittal, Pablo Lemos, Luca Scimeca, Jarrid Rector-Brooks, Alexandre Adam, Yoshua Bengio, Nikolay Malkin

Abstract: We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods (continuous generative flow networks). Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work. We also propose a novel exploration strategy for off-policy methods, based on local search in the target space with the use of a replay buffer, and show that it improves the quality of samples on a variety of target distributions. Our code for the sampling methods and benchmarks studied is made public at https://github.com/GFNOrg/gfn-diffusion as a base for future work on diffusion models for amortized inference.

Submitted 26 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: 24 pages; changed title from v2; code: https://github.com/GFNOrg/gfn-diffusion

arXiv:2402.00890 [pdf, other]

Utilizing Large Language Models to Translate RFC Protocol Specifications to CPSA Definitions

Authors: Martin Duclos, Ivan A. Fernandez, Kaneesha Moore, Sudip Mittal, Edward Zieglar

Abstract: This paper proposes the use of Large Language Models (LLMs) for translating Request for Comments (RFC) protocol specifications into a format compatible with the Cryptographic Protocol Shapes Analyzer (CPSA). This novel approach aims to reduce the complexities and efforts involved in protocol analysis, by offering an automated method for translating protocol specifications into structured models suitable for CPSA. In this paper we discuss the implementation of an RFC Protocol Translator, its impact on enhancing the accessibility of formal methods analysis, and its potential for improving the security of internet protocols.

Submitted 30 January, 2024; originally announced February 2024.

arXiv:2401.15547 [pdf, other]

doi 10.1126/science.ado0053

Observation of topological frequency combs

Authors: Christopher J. Flower, Mahmoud Jalali Mehrabad, Lida Xu, Gregory Moille, Daniel G. Suarez-Forero, Ogulcan Orsel, Gaurav Bahl, Yanne Chembo, Kartik Srinivasan, Sunil Mittal, Mohammad Hafezi

Abstract: On-chip generation of optical frequency combs using nonlinear ring resonators has opened the route to numerous novel applications of combs that were otherwise limited to mode-locked laser systems. Nevertheless, even after more than a decade of development, on-chip nonlinear combs still predominantly rely on the use of single-ring resonators. Recent theoretical investigations have shown that generating combs in a topological array of resonators can provide a new avenue to engineer comb spectra. Here, we experimentally demonstrate the generation of such a novel class of frequency combs, topological frequency combs, in a two-dimensional (2D) lattice of hundreds of nonlinear ring resonators. Specifically, the lattice hosts topological edge states that exhibit fabrication-robust linear dispersion and spatial confinement at the boundary of the lattice. Upon optical pumping of the topological edge band, these unique properties of the edge states lead to the generation of a nested frequency comb that is spectrally confined within the edge bands across $\approx$40 longitudinal modes. Moreover, using spatial imaging of our topological lattice, we confirm that light generated in the comb teeth is indeed spatially confined at the lattice edge, characteristic of linear topological systems. Our results bring together the fields of topological photonics and optical frequency combs, providing an opportunity to explore the interplay between topology and nonlinear systems in a platform compatible with commercially available nanofabrication processes.

Submitted 8 April, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

Comments: 9 pages, 5 figures (SI: 7 pages, 9 figures)

arXiv:2401.10373 [pdf, other]

Harmonized Spatial and Spectral Learning for Robust and Generalized Medical Image Segmentation

Authors: Vandan Gorade, Sparsh Mittal, Debesh Jha, Rekha Singhal, Ulas Bagci

Abstract: Deep learning has demonstrated remarkable achievements in medical image segmentation. However, prevailing deep learning models struggle with poor generalization due to (i) intra-class variations, where the same class appears differently in different samples, and (ii) inter-class independence, resulting in difficulties capturing intricate relationships between distinct objects, leading to higher false negative cases. This paper presents a novel approach that synergizes spatial and spectral representations to enhance domain-generalized medical image segmentation. We introduce the innovative Spectral Correlation Coefficient objective to improve the model's capacity to capture middle-order features and contextual long-range dependencies. This objective complements traditional spatial objectives by incorporating valuable spectral information. Extensive experiments reveal that optimizing this objective with existing architectures like UNet and TransUNet significantly enhances generalization, interpretability, and noise robustness, producing more confident predictions. For instance, in cardiac segmentation, we observe a 0.81 pp and 1.63 pp (pp = percentage point) improvement in DSC over UNet and TransUNet, respectively. Our interpretability study demonstrates that, in most tasks, objectives optimized with UNet outperform even TransUNet by introducing global contextual information alongside local details. These findings underscore the versatility and effectiveness of our proposed method across diverse imaging modalities and medical domains.

Submitted 8 August, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Comments: Early Accepted at ICPR-2024 for Oral Presentation

arXiv:2401.10207 [pdf, other]

Eclectic Rule Extraction for Explainability of Deep Neural Network based Intrusion Detection Systems

Authors: Jesse Ables, Nathaniel Childers, William Anderson, Sudip Mittal, Shahram Rahimi, Ioana Banicescu, Maria Seale

Abstract: This paper addresses trust issues created from the ubiquity of black box algorithms and surrogate explainers in Explainable Intrusion Detection Systems (X-IDS). While Explainable Artificial Intelligence (XAI) aims to enhance transparency, black box surrogate explainers, such as Local Interpretable Model-Agnostic Explanation (LIME) and SHapley Additive exPlanation (SHAP), are difficult to trust. The black box nature of these surrogate explainers makes the process behind explanation generation opaque and difficult to understand. To avoid this problem, one can use transparent white box algorithms such as Rule Extraction (RE). There are three types of RE algorithms: pedagogical, decompositional, and eclectic. Pedagogical methods offer fast but untrustworthy white-box explanations, while decompositional RE provides trustworthy explanations with poor scalability. This work explores eclectic rule extraction, which strikes a balance between scalability and trustworthiness. By combining techniques from pedagogical and decompositional approaches, eclectic rule extraction leverages the advantages of both, while mitigating some of their drawbacks. The proposed Hybrid X-IDS architecture features eclectic RE as a white box surrogate explainer for black box Deep Neural Networks (DNN). The presented eclectic RE algorithm extracts human-readable rules from hidden layers, facilitating explainable and trustworthy rulesets. Evaluations on UNSW-NB15 and CIC-IDS-2017 datasets demonstrate the algorithm's ability to generate rulesets with 99.9% accuracy, mimicking DNN outputs. The contributions of this work include the hybrid X-IDS architecture, the eclectic rule extraction algorithm applicable to intrusion detection datasets, and a thorough analysis of performance and explainability, demonstrating the trade-offs involved in rule extraction speed and accuracy.

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.10036 [pdf, other]

LOCALINTEL: Generating Organizational Threat Intelligence from Global and Local Cyber Knowledge

Authors: Shaswata Mitra, Subash Neupane, Trisha Chakraborty, Sudip Mittal, Aritran Piplai, Manas Gaur, Shahram Rahimi

Abstract: Security Operations Center (SoC) analysts gather threat reports from openly accessible global threat databases and customize them manually to suit a particular organization's needs. These analysts also depend on internal repositories, which act as a private local knowledge database for an organization. Credible cyber intelligence, critical operational details, and relevant organizational information are all stored in these local knowledge databases. Analysts undertake a labor-intensive task utilizing these global and local knowledge databases to manually create an organization's unique threat response and mitigation strategies. Recently, Large Language Models (LLMs) have shown the capability to efficiently process large diverse knowledge sources. We leverage this ability to process global and local knowledge databases to automate the generation of organization-specific threat intelligence. In this work, we present LOCALINTEL, a novel automated knowledge contextualization system that, upon prompting, retrieves threat reports from the global threat repositories and uses its local knowledge database to contextualize them for a specific organization. LOCALINTEL comprises three key phases: global threat intelligence retrieval, local knowledge retrieval, and contextualized completion generation. The first retrieves intelligence from global threat repositories, while the second retrieves pertinent knowledge from the local knowledge database. Finally, the fusion of these knowledge sources is orchestrated through a generator to produce a contextualized completion.

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.07871 [pdf, other]

Explainable Predictive Maintenance: A Survey of Current Methods, Challenges and Opportunities

Authors: Logan Cummins, Alex Sommers, Somayeh Bakhtiari Ramezani, Sudip Mittal, Joseph Jabour, Maria Seale, Shahram Rahimi

Abstract: Predictive maintenance is a well-studied collection of techniques that aims to prolong the life of a mechanical system by using artificial intelligence and machine learning to predict the optimal time to perform maintenance. The methods allow maintainers of systems and hardware to reduce financial and time costs of upkeep. As these methods are adopted for more serious and potentially life-threatening applications, the human operators need to trust the predictive system. This attracts the field of Explainable AI (XAI) to introduce explainability and interpretability into the predictive system. XAI brings methods to the field of predictive maintenance that can amplify trust in the users while maintaining well-performing systems. This survey on explainable predictive maintenance (XPM) discusses and presents the current methods of XAI as applied to predictive maintenance while following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines. We categorize the different XPM methods into groups that follow the XAI literature. Additionally, we include current challenges and a discussion on future research directions in XPM.

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2401.06154 [pdf, other]

Comparison of home detection algorithms using smartphone GPS data

Authors: Rajat Verma, Shagun Mittal, Zengxiang Lei, Xiaowei Chen, Satish V. Ukkusuri

Abstract: Estimation of people's home locations using location-based services data from smartphones is a common task in human mobility assessment. However, commonly used home detection algorithms (HDAs) are often arbitrary and unexamined. In this study, we review existing HDAs and examine five HDAs using eight high-quality mobile phone geolocation datasets. These include four commonly used HDAs as well as an HDA proposed in this work. To make quantitative comparisons, we propose three novel metrics to assess the quality of detected home locations and test them on eight datasets across four U.S. cities. We find that all three metrics show a consistent rank of HDAs' performances, with the proposed HDA outperforming the others. We infer that the temporal and spatial continuity of the geolocation data points matters more than the overall size of the data for accurate home detection. We also find that HDAs with high (and similar) performance metrics tend to create results with better consistency and closer to common expectations. Further, the performance deteriorates with decreasing data quality of the devices, though the patterns of relative performance persist. Finally, we show how the differences in home detection can lead to substantial differences in subsequent inferences using two case studies - (i) hurricane evacuation estimation, and (ii) correlation of mobility patterns with socioeconomic status. Our work contributes to improving the transparency of large-scale human mobility assessment applications.

Submitted 21 December, 2023; originally announced January 2024.

Comments: Paper currently under review in the journal "EPJ Data Science" (ISSN: 2193-1127); Manuscript: 24 pages (including 68 references, 7 figures, 3 tables); Supplementary material document not included

arXiv:2401.05680 [pdf, other]

Use of Graph Neural Networks in Aiding Defensive Cyber Operations

Authors: Shaswata Mitra, Trisha Chakraborty, Subash Neupane, Aritran Piplai, Sudip Mittal

Abstract: In an increasingly interconnected world, where information is the lifeblood of modern society, regular cyber-attacks sabotage the confidentiality, integrity, and availability of digital systems and information. Additionally, cyber-attacks differ depending on the objective and evolve rapidly to disguise defensive systems. However, a typical cyber-attack demonstrates a series of stages from attack initiation to final resolution, called an attack life cycle. These diverse characteristics and the relentless evolution of cyber attacks have led cyber defense to adopt modern approaches like Machine Learning to bolster defensive measures and break the attack life cycle. Among the adopted ML approaches, Graph Neural Networks have emerged as a promising approach for enhancing the effectiveness of defensive measures due to their ability to process and learn from heterogeneous cyber threat data. In this paper, we look into the application of GNNs in aiding to break each stage of one of the most renowned attack life cycles, the Lockheed Martin Cyber Kill Chain. We address each phase of CKC and discuss how GNNs contribute to preparing and preventing an attack from a defensive standpoint. Furthermore, we also discuss open research areas and further improvement scopes.

Submitted 11 January, 2024; originally announced January 2024.

Comments: 35 pages, 9 figures, 8 tables

arXiv:2312.13504 [pdf, other]

doi 10.1103/PhysRevApplied.21.054044

Annealing reduces Si$_3$N$_4$ microwave-frequency dielectric loss in superconducting resonators

Authors: Sarang Mittal, Kazemi Adachi, Nicholas E. Frattini, Maxwell D. Urmey, Sheng-Xiang Lin, Alec L. Emser, Cyril Metzger, Luca Talamo, Sarah Dickson, David Carlson, Scott B. Papp, Cindy A. Regal, Konrad W. Lehnert

Abstract: The dielectric loss of silicon nitride (Si$_3$N$_4$) limits the performance of microwave-frequency devices that rely on this material for sensing, signal processing, and quantum communication. Using superconducting resonant circuits, we measure the cryogenic loss tangent of either as-deposited or high-temperature annealed stoichiometric Si$_3$N$_4$ as a function of drive strength and temperature. The internal loss behavior of the electrical resonators is largely consistent with the standard tunneling model of two-level systems (TLS), including damping caused by resonant energy exchange with TLS and by the relaxation of non-resonant TLS. We further supplement the TLS model with a self-heating effect to explain an increase in the loss observed in as-deposited films at large drive powers. Critically, we demonstrate that annealing remedies this anomalous power-induced loss, reduces the relaxation-type damping by more than two orders of magnitude, and reduces the resonant-type damping by a factor of three. Employing infrared absorption spectroscopy, we find that annealing reduces the concentration of hydrogen in the Si$_3$N$_4$, suggesting that hydrogen impurities cause substantial dissipation.

Submitted 16 May, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: 11 pages, 7 figures

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.10534 [pdf, other]

Rethinking Robustness of Model Attributions

Authors: Sandesh Kamath, Sankalp Mittal, Amit Deshpande, Vineeth N Balasubramanian

Abstract: For machine learning models to be reliable and trustworthy, their decisions must be interpretable. As these models find increasing use in safety-critical applications, it is important that not just the model predictions but also their explanations (as feature attributions) be robust to small human-imperceptible input perturbations. Recent works have shown that many attribution methods are fragile and have proposed improvements in either these methods or the model training. We observe two main causes for fragile attributions: first, the existing metrics of robustness (e.g., top-k intersection) over-penalize even reasonable local shifts in attribution, thereby making random perturbations appear as a strong attack, and second, the attribution can be concentrated in a small region even when there are multiple important parts in an image. To rectify this, we propose simple ways to strengthen existing metrics and attribution methods that incorporate locality of pixels in robustness metrics and diversity of pixel locations in attributions. Towards the role of model training in attributional robustness, we empirically observe that adversarially trained models have more robust attributions on smaller datasets; however, this advantage disappears in larger datasets. Code is available at https://github.com/ksandeshk/LENS.

Submitted 16 December, 2023; originally announced December 2023.

Comments: Accepted AAAI 2024

arXiv:2312.01128 [pdf, other]

SPEEDNet: Salient Pyramidal Enhancement Encoder-Decoder Network for Colonoscopy Images

Authors: Tushir Sahu, Vidhi Bhatt, Sai Chandra Teja R, Sparsh Mittal, Nagesh Kumar S

Abstract: Accurate identification and precise delineation of regions of significance, such as tumors or lesions, is a pivotal goal in medical imaging analysis. This paper proposes SPEEDNet, a novel architecture for precisely segmenting lesions within colonoscopy images. SPEEDNet uses a novel block named Dilated-Involutional Pyramidal Convolution Fusion (DIPC). A DIPC block combines the dilated involution layers pairwise into a pyramidal structure to convert the feature maps into a compact space. This lowers the total number of parameters while improving the learning of representations across an optimal receptive field, thereby reducing the blurring effect. On the EBHISeg dataset, SPEEDNet outperforms three previous networks: UNet, FeedNet, and AttesResDUNet. Specifically, SPEEDNet attains an average dice score of 0.952 and a recall of 0.971. Qualitative results and ablation studies provide additional insights into the effectiveness of SPEEDNet. The model size of SPEEDNet is 9.81 MB, significantly smaller than that of UNet (22.84 MB), FeedNet (185.58 MB), and AttesResDUNet (140.09 MB).

Submitted 2 December, 2023; originally announced December 2023.

Comments: 5 pages, 3 figures

arXiv:2311.16700 [pdf, other]

Rethinking Intermediate Layers design in Knowledge Distillation for Kidney and Liver Tumor Segmentation

Authors: Vandan Gorade, Sparsh Mittal, Debesh Jha, Ulas Bagci

Abstract: Knowledge distillation (KD) has demonstrated remarkable success across various domains, but its application to medical imaging tasks, such as kidney and liver tumor segmentation, has encountered challenges. Many existing KD methods are not specifically tailored for these tasks. Moreover, prevalent KD methods often lack a careful consideration of 'what' and 'from where' to distill knowledge from the teacher to the student. This oversight may lead to issues like the accumulation of training bias within shallower student layers, potentially compromising the effectiveness of KD. To address these challenges, we propose Hierarchical Layer-selective Feedback Distillation (HLFD). HLFD strategically distills knowledge from a combination of middle layers to earlier layers and transfers final layer knowledge to intermediate layers at both the feature and pixel levels. This design allows the model to learn higher-quality representations from earlier layers, resulting in a robust and compact student model. Extensive quantitative evaluations reveal that HLFD outperforms existing methods by a significant margin. For example, in the kidney segmentation task, HLFD surpasses the student model (without KD) by over 10%, significantly improving its focus on tumor-specific features. From a qualitative standpoint, the student model trained using HLFD excels at suppressing irrelevant information and can focus sharply on tumor-specific details, which opens a new pathway for more efficient and accurate diagnostic tools. Code is available at https://github.com/vangorade/RethinkingKD_ISBI24.

Submitted 27 May, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: Accepted at ISBI-2024 for Oral Presentation

arXiv:2311.08085 [pdf, other]

Optimizing Electric Vehicle Efficiency with Real-Time Telemetry using Machine Learning

Authors: Aryaman Rao, Harshit Gupta, Parth Singh, Shivam Mittal, Utkrash Singh, Dinesh Kumar Vishwakarma

Abstract: In the contemporary world with degrading natural resources, energy efficiency has become imperative for conservation and environmental safeguarding. Therefore, it's crucial to look for advanced technology to minimize energy consumption. This research focuses on the optimization of battery-electric city style vehicles through the use of a real-time in-car telemetry system that communicates between components through the robust Controller Area Network (CAN) protocol. By harnessing real-time data from various sensors embedded within vehicles, our driving assistance system provides the driver with visual and haptic actionable feedback that guides the driver on using the optimum driving style to minimize power consumed by the vehicle. To develop the pace feedback mechanism for the driver, real-time data is collected through a Shell Eco Marathon Urban Concept vehicle platform and, after pre-processing, it is analyzed using the novel machine learning algorithm TEMSL, which outperforms the existing baseline approaches across various performance metrics. After numerous experiments, this innovative method has proven effective in enhancing energy efficiency, guiding the driver along the track, and reducing human errors. The driving-assistance system offers a range of utilities, from cost savings and extended vehicle lifespan to significant contributions to environmental conservation and sustainable driving practices.

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.03447 [pdf, other]

Radiative transfer of Lyman-$α$ photons at cosmic dawn with realistic gas physics

Authors: Shikhar Mittal, Girish Kulkarni, Thibault Garel

Abstract: The cosmic dawn 21-cm signal is enabled by Ly $α$ photons through a process called the Wouthuysen-Field effect. An accurate model of the signal in this epoch hinges on the accuracy of the computation of the Ly $α$ coupling, which requires one to calculate the specific intensity of UV radiation from sources such as the first stars. Most traditional calculations of the Ly $α$ coupling assume a delta-function scattering cross-section, as the resonant nature of the Ly $α$ scattering makes an accurate radiative transfer solution computationally expensive. Attempts to improve upon this traditional approach using numerical radiative transfer have recently emerged. However, the radiative transfer computation in these treatments suffers from assumptions such as a uniform density of intergalactic gas, zero gas temperature, and absence of gas bulk motion, or numerical approximations such as core skipping. We investigate the role played by these approximations in setting the value of the Ly $α$ coupling and the 21-cm signal at cosmic dawn. We present results of Monte Carlo radiative transfer simulations, without core skipping, and show that neglecting gas temperature in the radiative transfer significantly underestimates the scattering rate and hence the Ly $α$ coupling and the 21-cm signal. We also discuss the effect of these processes on the 21-cm power spectrum from the cosmic dawn. This work points the way towards higher-accuracy models to enable better inferences from future measurements.

Submitted 6 November, 2023; originally announced November 2023.

Comments: 20 pages, 14 figures, 2 appendices. Submitted to MNRAS. Comments are welcome

arXiv:2311.02579 [pdf, other]

mahaNLP: A Marathi Natural Language Processing Library

Authors: Vidula Magdum, Omkar Dhekane, Sharayu Hiwarkhedkar, Saloni Mittal, Raviraj Joshi

Abstract: We present mahaNLP, an open-source natural language processing (NLP) library specifically built for the Marathi language. It aims to enhance the support for the low-resource Indian language Marathi in the field of NLP. It is an easy-to-use, extensible, and modular toolkit for Marathi text analysis built on state-of-the-art MahaBERT-based transformer models. Our work holds significant importance as other existing Indic NLP libraries provide basic Marathi processing support and rely on older models with restricted performance. Our toolkit stands out by offering a comprehensive array of NLP tasks, encompassing both fundamental preprocessing tasks and advanced NLP tasks like sentiment analysis, NER, hate speech detection, and sentence completion. This paper focuses on an overview of the mahaNLP framework, its features, and its usage. This work is a part of the L3Cube MahaNLP initiative; more information about it can be found at https://github.com/l3cube-pune/MarathiNLP.

Submitted 5 November, 2023; originally announced November 2023.

Comments: Accepted at IJCNLP-AACL 2023

arXiv:2311.01247 [pdf, other]

Emergent (In)Security of Multi-Cloud Environments

Authors: Morgan Reece, Theodore Lander Jr., Sudip Mittal, Nidhi Rastogi, Josiah Dykstra, Andy Sampson

Abstract: As organizations increasingly use cloud services to host their IT infrastructure, there is a need to share data among these cloud hosted services and systems. A majority of IT organizations have workloads spread across different cloud service providers, growing their multi-cloud environments. When an organization grows their multi-cloud environment, the threat vectors and vulnerabilities for their cloud systems and services grow as well. The increase in the number of attack vectors creates a challenge of how to prioritize mitigations and countermeasures to best defend a multi-cloud environment against attacks. Utilizing multiple industry standard risk analysis tools, we conducted an analysis of multi-cloud threat vectors enabling calculation and prioritization for the identified mitigations and countermeasures. The prioritizations from the analysis showed that authentication and architecture are the highest risk areas of threat vectors. Armed with this data, IT managers are able to more appropriately budget cybersecurity expenditure to implement the most impactful mitigations and countermeasures.

Submitted 2 November, 2023; originally announced November 2023.

Journal ref: 39th ACM Annual Computer Security Applications Conference 2023 (ACM ACSAC 2023)

arXiv:2311.00203 [pdf, other]

Modeling subjectivity (by Mimicking Annotator Annotation) in toxic comment identification across diverse communities

Authors: Senjuti Dutta, Sid Mittal, Sherol Chen, Deepak Ramachandran, Ravi Rajakumar, Ian Kivlichan, Sunny Mak, Alena Butryna, Praveen Paritosh

Abstract: The prevalence and impact of toxic discussions online have made content moderation crucial. Automated systems can play a vital role in identifying toxicity and reducing the reliance on human moderation. Nevertheless, identifying toxic comments for diverse communities continues to present challenges that are addressed in this paper. The two-part goal of this study is to (1) identify intuitive variances from annotator disagreement using quantitative analysis and (2) model the subjectivity of these viewpoints. To achieve our goal, we published a new dataset (https://github.com/XXX) with expert annotators' annotations and used two other public datasets to identify the subjectivity of toxicity. Then, leveraging a large language model (LLM), we evaluate the model's ability to mimic diverse viewpoints on toxicity by varying the size of the training data and by testing both on the same set of annotators used during model training and on a separate set of annotators. We conclude that subjectivity is evident across all annotator groups, demonstrating the shortcomings of majority-rule voting. Moving forward, subjective annotations should serve as ground truth labels for training models for domains like toxicity in diverse communities.

Submitted 31 October, 2023; originally announced November 2023.

arXiv:2310.19355 [pdf, other]

Local random quantum circuits form approximate designs on arbitrary architectures

Authors: Shivan Mittal, Nicholas Hunter-Jones

Abstract: We consider random quantum circuits (RQC) on arbitrary connected graphs whose edges determine the allowed $2$-qudit interactions. Prior work has established that such $n$-qudit circuits with local dimension $q$ on 1D, complete, and $D$-dimensional graphs form approximate unitary designs, that is, they generate unitaries from distributions close to the Haar measure on the unitary group $U(q^n)$ after polynomially many gates. Here, we extend those results by proving that RQCs comprised of $O(\mathrm{poly}(n,k))$ gates on a wide class of graphs form approximate unitary $k$-designs. We prove that RQCs on graphs with spanning trees of bounded degree and height form $k$-designs after $O(|E|n\,\mathrm{poly}(k))$ gates, where $|E|$ is the number of edges in the graph. Furthermore, we identify larger classes of graphs for which RQCs generate approximate designs in polynomial circuit size. For $k \leq 4$, we show that RQCs on graphs of certain maximum degrees form designs after $O(|E|n)$ gates, providing explicit constants. We determine our circuit size bounds from the spectral gaps of local Hamiltonians. To that end, we extend the finite-size (or Knabe) method for bounding gaps of frustration-free Hamiltonians on regular graphs to arbitrary connected graphs. We further introduce a new method based on the Detectability Lemma for determining the spectral gaps of Hamiltonians on arbitrary graphs. Our methods have wider applicability as the first method provides a succinct alternative proof of [Commun. Math. Phys. 291, 257 (2009)] and the second method proves that RQCs on any connected architecture form approximate designs in quasi-polynomial circuit size.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.18205 [pdf, other]

Lost in Translation, Found in Spans: Identifying Claims in Multilingual Social Media

Authors: Shubham Mittal, Megha Sundriyal, Preslav Nakov

Abstract: Claim span identification (CSI) is an important step in fact-checking pipelines, aiming to identify text segments that contain a checkworthy claim or assertion in a social media post. Despite its importance to journalists and human fact-checkers, it remains a severely understudied problem, and the scarce research on this topic so far has only focused on English. Here we aim to bridge this gap by creating a novel dataset, X-CLAIM, consisting of 7K real-world claims collected from numerous social media platforms in five Indian languages and English. We report strong baselines with state-of-the-art encoder-only language models (e.g., XLM-R) and we demonstrate the benefits of training on multiple languages over alternative cross-lingual transfer methods such as zero-shot transfer, or training on translated data, from a high-resource language such as English. We evaluate generative large language models from the GPT series using prompting methods on the X-CLAIM dataset and we find that they underperform the smaller encoder-only language models for low-resource languages.

Submitted 27 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 (main)
