-
Reimagining Communities through Transnational Bengali Decolonial Discourse with YouTube Content Creators
Authors:
Dipto Das,
Dhwani Gandhi,
Bryan Semaan
Abstract:
Colonialism--the policies and practices wherein a foreign body imposes its ways of life on local communities--has historically impacted how collectives perceive themselves in relation to others. One way colonialism has impacted how people see themselves is through nationalism, where nationalism is often understood through shared language, culture, religion, and geopolitical borders. The way coloni…
▽ More
Colonialism--the policies and practices wherein a foreign body imposes its ways of life on local communities--has historically impacted how collectives perceive themselves in relation to others. One way colonialism has impacted how people see themselves is through nationalism, where nationalism is often understood through shared language, culture, religion, and geopolitical borders. The way colonialism has shaped people's experiences with nationalism has shaped historical conflicts between members of different nation-states for a long time. While recent social computing research has studied how colonially marginalized people can engage in discourse to decolonize or re-imagine and reclaim themselves and their communities on their own terms--what is less understood is how technology can better support decolonial discourses in an effort to re-imagine nationalism. To understand this phenomenon, this research draws on a semi-structured interview study with YouTubers who make videos about culturally Bengali people whose lives were upended as a product of colonization and are now dispersed across Bangladesh, India, and Pakistan. This research seeks to understand people's motivations and strategies for engaging in video-mediated decolonial discourse in transnational contexts. We discuss how our work demonstrates the potential of the sociomateriality of decolonial discourse online and extends an invitation to foreground complexities of nationalism in social computing research.
△ Less
Submitted 17 July, 2024;
originally announced July 2024.
-
Multi-Goal Motion Memory
Authors:
Yuanjie Lu,
Dibyendu Das,
Erion Plaku,
Xuesu Xiao
Abstract:
Autonomous mobile robots (e.g., warehouse logistics robots) often need to traverse complex, obstacle-rich, and changing environments to reach multiple fixed goals (e.g., warehouse shelves). Traditional motion planners need to calculate the entire multi-goal path from scratch in response to changes in the environment, which result in a large consumption of computing resources. This process is not o…
▽ More
Autonomous mobile robots (e.g., warehouse logistics robots) often need to traverse complex, obstacle-rich, and changing environments to reach multiple fixed goals (e.g., warehouse shelves). Traditional motion planners need to calculate the entire multi-goal path from scratch in response to changes in the environment, which result in a large consumption of computing resources. This process is not only time-consuming but also may not meet real-time requirements in application scenarios that require rapid response to environmental changes. In this paper, we provide a novel Multi-Goal Motion Memory technique that allows robots to use previous planning experiences to accelerate future multi-goal planning in changing environments. Specifically, our technique predicts collision-free and dynamically-feasible trajectories and distances between goal pairs to guide the sampling process to build a roadmap, to inform a Traveling Salesman Problem (TSP) solver to compute a tour, and to efficiently produce motion plans. Experiments conducted with a vehicle and a snake-like robot in obstacle-rich environments show that the proposed Motion Memory technique can substantially accelerate planning speed by up to 90\%. Furthermore, the solution quality is comparable to state-of-the-art algorithms and even better in some environments.
△ Less
Submitted 16 July, 2024;
originally announced July 2024.
-
Segmentation-Free Guidance for Text-to-Image Diffusion Models
Authors:
Kambiz Azarian,
Debasmit Das,
Qiqi Hou,
Fatih Porikli
Abstract:
We introduce segmentation-free guidance, a novel method designed for text-to-image diffusion models like Stable Diffusion. Our method does not require retraining of the diffusion model. At no additional compute cost, it uses the diffusion model itself as an implied segmentation network, hence named segmentation-free guidance, to dynamically adjust the negative prompt for each patch of the generate…
▽ More
We introduce segmentation-free guidance, a novel method designed for text-to-image diffusion models like Stable Diffusion. Our method does not require retraining of the diffusion model. At no additional compute cost, it uses the diffusion model itself as an implied segmentation network, hence named segmentation-free guidance, to dynamically adjust the negative prompt for each patch of the generated image, based on the patch's relevance to concepts in the prompt. We evaluate segmentation-free guidance both objectively, using FID, CLIP, IS, and PickScore, and subjectively, through human evaluators. For the subjective evaluation, we also propose a methodology for subsampling the prompts in a dataset like MS COCO-30K to keep the number of human evaluations manageable while ensuring that the selected subset is both representative in terms of content and fair in terms of model performance. The results demonstrate the superiority of our segmentation-free guidance to the widely used classifier-free method. Human evaluators preferred segmentation-free guidance over classifier-free 60% to 19%, with 18% of occasions showing a strong preference. Additionally, PickScore win-rate, a recently proposed metric mimicking human preference, also indicates a preference for our method over classifier-free.
△ Less
Submitted 3 June, 2024;
originally announced July 2024.
-
Demand Analysis and Customized Product Offering Design on E-Commerce Platform
Authors:
Dipankar Das
Abstract:
It can be observed that the purchasing decision of an individual consumer in an electronic marketplace is determined by a set of factors, such as personal characteristics of the consumer, product pricing, minimum price-quantity combination offered, decision-making space, and underlying motivation of the consumer. These factors are combined to form a consumer's choice problem domain, which plays a…
▽ More
It can be observed that the purchasing decision of an individual consumer in an electronic marketplace is determined by a set of factors, such as personal characteristics of the consumer, product pricing, minimum price-quantity combination offered, decision-making space, and underlying motivation of the consumer. These factors are combined to form a consumer's choice problem domain, which plays a pivotal role in the product offering. In this study, we attempt to focus on how the products? Offered can be customized by incorporating the quantity and pack size of the products along with the factors above to form a more extensive domain for examining the combined effects of all of these factors on demand. Accordingly, the demand function is defined by a novel method invoking the extended domain of choice problem in the electronic marketplace. Consequently, the predictable uncertainty associated with the consumer's demand function may disappear, increase the likelihood of earning optimum revenue through customized combinations of the components of the extended domain of choice problem, and improve the understanding of the fluctuations in consumer demand. Finally, we propose a generalized price response function with standard properties applicable to E-Commerce.
△ Less
Submitted 29 April, 2024;
originally announced June 2024.
-
Blind Baselines Beat Membership Inference Attacks for Foundation Models
Authors:
Debeshee Das,
Jie Zhang,
Florian Tramèr
Abstract:
Membership inference (MI) attacks try to determine if a data sample was used to train a machine learning model. For foundation models trained on unknown Web data, MI attacks can be used to detect copyrighted training materials, measure test set contamination, or audit machine unlearning. Unfortunately, we find that evaluations of MI attacks for foundation models are flawed, because they sample mem…
▽ More
Membership inference (MI) attacks try to determine if a data sample was used to train a machine learning model. For foundation models trained on unknown Web data, MI attacks can be used to detect copyrighted training materials, measure test set contamination, or audit machine unlearning. Unfortunately, we find that evaluations of MI attacks for foundation models are flawed, because they sample members and non-members from different distributions. For 8 published MI evaluation datasets, we show that blind attacks -- that distinguish the member and non-member distributions without looking at any trained model -- outperform state-of-the-art MI attacks. Existing evaluations thus tell us nothing about membership leakage of a foundation model's training data.
△ Less
Submitted 23 June, 2024;
originally announced June 2024.
-
Molecule Graph Networks with Many-body Equivariant Interactions
Authors:
Zetian Mao,
Jiawen Li,
Chen Liang,
Diptesh Das,
Masato Sumita,
Koji Tsuda
Abstract:
Message passing neural networks have demonstrated significant efficacy in predicting molecular interactions. Introducing equivariant vectorial representations augments expressivity by capturing geometric data symmetries, thereby improving model accuracy. However, two-body bond vectors in opposition may cancel each other out during message passing, leading to the loss of directional information on…
▽ More
Message passing neural networks have demonstrated significant efficacy in predicting molecular interactions. Introducing equivariant vectorial representations augments expressivity by capturing geometric data symmetries, thereby improving model accuracy. However, two-body bond vectors in opposition may cancel each other out during message passing, leading to the loss of directional information on their shared node. In this study, we develop Equivariant N-body Interaction Networks (ENINet) that explicitly integrates equivariant many-body interactions to preserve directional information in the message passing scheme. Experiments indicate that integrating many-body equivariant representations enhances prediction accuracy across diverse scalar and tensorial quantum chemical properties. Ablation studies show an average performance improvement of 7.9% across 11 out of 12 properties in QM9, 27.9% in forces in MD17, and 11.3% in polarizabilities (CCSD) in QM7b.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Task Planning for Object Rearrangement in Multi-room Environments
Authors:
Karan Mirakhor,
Sourav Ghosh,
Dipanjan Das,
Brojeshwar Bhowmick
Abstract:
Object rearrangement in a multi-room setup should produce a reasonable plan that reduces the agent's overall travel and the number of steps. Recent state-of-the-art methods fail to produce such plans because they rely on explicit exploration for discovering unseen objects due to partial observability and a heuristic planner to sequence the actions for rearrangement. This paper proposes a novel hie…
▽ More
Object rearrangement in a multi-room setup should produce a reasonable plan that reduces the agent's overall travel and the number of steps. Recent state-of-the-art methods fail to produce such plans because they rely on explicit exploration for discovering unseen objects due to partial observability and a heuristic planner to sequence the actions for rearrangement. This paper proposes a novel hierarchical task planner to efficiently plan a sequence of actions to discover unseen objects and rearrange misplaced objects within an untidy house to achieve a desired tidy state. The proposed method introduces several novel techniques, including (i) a method for discovering unseen objects using commonsense knowledge from large language models, (ii) a collision resolution and buffer prediction method based on Cross-Entropy Method to handle blocked goal and swap cases, (iii) a directed spatial graph-based state space for scalability, and (iv) deep reinforcement learning (RL) for producing an efficient planner. The planner interleaves the discovery of unseen objects and rearrangement to minimize the number of steps taken and overall traversal of the agent. The paper also presents new metrics and a benchmark dataset called MoPOR to evaluate the effectiveness of the rearrangement planning in a multi-room setting. The experimental results demonstrate that the proposed method effectively addresses the multi-room rearrangement problem.
△ Less
Submitted 1 June, 2024;
originally announced June 2024.
-
A Stochastic Incentive-based Demand Response Program for Virtual Power Plant with Solar, Battery, Electric Vehicles, and Controllable Loads
Authors:
Pratik Harsh,
Hongjian Sun,
Debapriya Das,
Goyal Awagan,
Jing Jiang
Abstract:
The growing integration of distributed energy resources (DERs) into the power grid necessitates an effective coordination strategy to maximize their benefits. Acting as an aggregator of DERs, a virtual power plant (VPP) facilitates this coordination, thereby amplifying their impact on the transmission level of the power grid. Further, a demand response program enhances the scheduling approach by m…
▽ More
The growing integration of distributed energy resources (DERs) into the power grid necessitates an effective coordination strategy to maximize their benefits. Acting as an aggregator of DERs, a virtual power plant (VPP) facilitates this coordination, thereby amplifying their impact on the transmission level of the power grid. Further, a demand response program enhances the scheduling approach by managing the energy demands in parallel with the uncertain energy outputs of the DERs. This work presents a stochastic incentive-based demand response model for the scheduling operation of VPP comprising solar-powered generating stations, battery swapping stations, electric vehicle charging stations, and consumers with controllable loads. The work also proposes a priority mechanism to consider the individual preferences of electric vehicle users and consumers with controllable loads. The scheduling approach for the VPP is framed as a multi-objective optimization problem, normalized using the utopia-tracking method. Subsequently, the normalized optimization problem is transformed into a stochastic formulation to address uncertainties in energy demand from charging stations and controllable loads. The proposed VPP scheduling approach is addressed on a 33-node distribution system simulated using MATLAB software, which is further validated using a real-time digital simulator.
△ Less
Submitted 31 May, 2024;
originally announced June 2024.
-
QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge
Authors:
Hongwei Bran Li,
Fernando Navarro,
Ivan Ezhov,
Amirhossein Bayat,
Dhritiman Das,
Florian Kofler,
Suprosanna Shit,
Diana Waldmannstetter,
Johannes C. Paetzold,
Xiaobin Hu,
Benedikt Wiestler,
Lucas Zimmer,
Tamaz Amiranashvili,
Chinmay Prabhakar,
Christoph Berger,
Jonas Weidner,
Michelle Alonso-Basant,
Arif Rashid,
Ujjwal Baid,
Wesam Adel,
Deniz Ali,
Bhakti Baheti,
Yingbin Bai,
Ishaan Bhatt,
Sabri Can Cetindag
, et al. (55 additional authors not shown)
Abstract:
Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…
▽ More
Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on the uncertainty quantification of medical image segmentation which considers the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions 2D-vs-3D. A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of the ensemble models, as well as the need for further research to develop efficient 3D methods for uncertainty quantification methods in 3D segmentation tasks.
△ Less
Submitted 24 June, 2024; v1 submitted 19 March, 2024;
originally announced May 2024.
-
DOLOMITES: Domain-Specific Long-Form Methodical Tasks
Authors:
Chaitanya Malaviya,
Priyanka Agrawal,
Kuzman Ganchev,
Pranesh Srinivasan,
Fantine Huot,
Jonathan Berant,
Mark Yatskar,
Dipanjan Das,
Mirella Lapata,
Chris Alberti
Abstract:
Experts in various fields routinely perform methodical writing tasks to plan, organize, and report their work. From a clinician writing a differential diagnosis for a patient, to a teacher writing a lesson plan for students, these tasks are pervasive, requiring to methodically generate structured long-form output for a given input. We develop a typology of methodical tasks structured in the form o…
▽ More
Experts in various fields routinely perform methodical writing tasks to plan, organize, and report their work. From a clinician writing a differential diagnosis for a patient, to a teacher writing a lesson plan for students, these tasks are pervasive, requiring to methodically generate structured long-form output for a given input. We develop a typology of methodical tasks structured in the form of a task objective, procedure, input, and output, and introduce DoLoMiTes, a novel benchmark with specifications for 519 such tasks elicited from hundreds of experts from across 25 fields. Our benchmark further contains specific instantiations of methodical tasks with concrete input and output examples (1,857 in total) which we obtain by collecting expert revisions of up to 10 model-generated examples of each task. We use these examples to evaluate contemporary language models highlighting that automating methodical tasks is a challenging long-form generation problem, as it requires performing complex inferences, while drawing upon the given context as well as domain knowledge.
△ Less
Submitted 28 May, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
Probabilistic Interval Analysis of Unreliable Programs
Authors:
Dibyendu Das,
Soumyajit Dey
Abstract:
Advancement of chip technology will make future computer chips faster. Power consumption of such chips shall also decrease. But this speed gain shall not come free of cost, there is going to be a trade-off between speed and efficiency, i.e accuracy of the computation. In order to achieve this extra speed we will simply have to let our computers make more mistakes in computations. Consequently, sys…
▽ More
Advancement of chip technology will make future computer chips faster. Power consumption of such chips shall also decrease. But this speed gain shall not come free of cost, there is going to be a trade-off between speed and efficiency, i.e accuracy of the computation. In order to achieve this extra speed we will simply have to let our computers make more mistakes in computations. Consequently, systems built with these type of chips will possess an innate unreliability lying within. Programs written for these systems will also have to incorporate this unreliability. Researchers have already started developing programming frameworks for unreliable architectures as such.
In the present work, we use a restricted version of C-type languages to model the programs written for unreliable architectures. We propose a technique for statically analyzing codes written for these kind of architectures. Our technique, which primarily focuses on Interval/Range Analysis of this type of programs, uses the well established theory of abstract interpretation. While discussing unreliability of hardware, there comes scope of failure of the hardware components implicitly. There are two types of failure models, namely: 1) permanent failure model, where the hardware stops execution on failure and 2) transient failure model, where on failure, the hardware continues subsequent operations with wrong operand values. In this paper, we've only taken transient failure model into consideration. The goal of this analysis is to predict the probability with which a program variable assumes values from a given range at a given program point.
△ Less
Submitted 25 April, 2024;
originally announced April 2024.
-
Transformer-based Joint Modelling for Automatic Essay Scoring and Off-Topic Detection
Authors:
Sourya Dipta Das,
Yash Vadi,
Kuldeep Yadav
Abstract:
Automated Essay Scoring (AES) systems are widely popular in the market as they constitute a cost-effective and time-effective option for grading systems. Nevertheless, many studies have demonstrated that the AES system fails to assign lower grades to irrelevant responses. Thus, detecting the off-topic response in automated essay scoring is crucial in practical tasks where candidates write unrelate…
▽ More
Automated Essay Scoring (AES) systems are widely popular in the market as they constitute a cost-effective and time-effective option for grading systems. Nevertheless, many studies have demonstrated that the AES system fails to assign lower grades to irrelevant responses. Thus, detecting the off-topic response in automated essay scoring is crucial in practical tasks where candidates write unrelated text responses to the given task in the question. In this paper, we are proposing an unsupervised technique that jointly scores essays and detects off-topic essays. The proposed Automated Open Essay Scoring (AOES) model uses a novel topic regularization module (TRM), which can be attached on top of a transformer model, and is trained using a proposed hybrid loss function. After training, the AOES model is further used to calculate the Mahalanobis distance score for off-topic essay detection. Our proposed method outperforms the baseline we created and earlier conventional methods on two essay-scoring datasets in off-topic detection as well as on-topic scoring. Experimental evaluation results on different adversarial strategies also show how the suggested method is robust for detecting possible human-level perturbations.
△ Less
Submitted 24 March, 2024;
originally announced April 2024.
-
Anticipate & Collab: Data-driven Task Anticipation and Knowledge-driven Planning for Human-robot Collaboration
Authors:
Shivam Singh,
Karthik Swaminathan,
Raghav Arora,
Ramandeep Singh,
Ahana Datta,
Dipanjan Das,
Snehasis Banerjee,
Mohan Sridharan,
Madhava Krishna
Abstract:
An agent assisting humans in daily living activities can collaborate more effectively by anticipating upcoming tasks. Data-driven methods represent the state of the art in task anticipation, planning, and related problems, but these methods are resource-hungry and opaque. Our prior work introduced a proof of concept framework that used an LLM to anticipate 3 high-level tasks that served as goals f…
▽ More
An agent assisting humans in daily living activities can collaborate more effectively by anticipating upcoming tasks. Data-driven methods represent the state of the art in task anticipation, planning, and related problems, but these methods are resource-hungry and opaque. Our prior work introduced a proof of concept framework that used an LLM to anticipate 3 high-level tasks that served as goals for a classical planning system that computed a sequence of low-level actions for the agent to achieve these goals. This paper describes DaTAPlan, our framework that significantly extends our prior work toward human-robot collaboration. Specifically, DaTAPlan planner computes actions for an agent and a human to collaboratively and jointly achieve the tasks anticipated by the LLM, and the agent automatically adapts to unexpected changes in human action outcomes and preferences. We evaluate DaTAPlan capabilities in a realistic simulation environment, demonstrating accurate task anticipation, effective human-robot collaboration, and the ability to adapt to unexpected changes. Project website: https://dataplan-hrc.github.io
△ Less
Submitted 4 April, 2024;
originally announced April 2024.
-
qIoV: A Quantum-Driven Internet-of-Vehicles-Based Approach for Environmental Monitoring and Rapid Response Systems
Authors:
Ankur Nahar,
Koustav Kumar Mondal,
Debasis Das,
Rajkumar Buyya
Abstract:
This research addresses the critical necessity for advanced rapid response operations in managing a spectrum of environmental hazards. We propose a novel framework, qIoV that integrates quantum computing with the Internet-of-Vehicles (IoV) to leverage the computational efficiency, parallelism, and entanglement properties of quantum mechanics. Our approach involves the use of environmental sensors…
▽ More
This research addresses the critical necessity for advanced rapid response operations in managing a spectrum of environmental hazards. We propose a novel framework, qIoV that integrates quantum computing with the Internet-of-Vehicles (IoV) to leverage the computational efficiency, parallelism, and entanglement properties of quantum mechanics. Our approach involves the use of environmental sensors mounted on vehicles for precise air quality assessment. These sensors are designed to be highly sensitive and accurate, leveraging the principles of quantum mechanics to detect and measure environmental parameters. A salient feature of our proposal is the Quantum Mesh Network Fabric (QMF), a system designed to dynamically adjust the quantum network topology in accordance with vehicular movements. This capability is critical to maintaining the integrity of quantum states against environmental and vehicular disturbances, thereby ensuring reliable data transmission and processing. Moreover, our methodology is further augmented by the incorporation of a variational quantum classifier (VQC) with advanced quantum entanglement techniques. This integration offers a significant reduction in latency for hazard alert transmission, thus enabling expedited communication of crucial data to emergency response teams and the public. Our study on the IBM OpenQSAM 3 platform, utilizing a 127 Qubit system, revealed significant advancements in pair plot analysis, achieving over 90% in precision, recall, and F1-Score metrics and an 83% increase in the speed of toxic gas detection compared to conventional methods.Additionally, theoretical analyses validate the efficiency of quantum rotation, teleportation protocols, and the fidelity of quantum entanglement, further underscoring the potential of quantum computing in enhancing analytical performance.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
eRST: A Signaled Graph Theory of Discourse Relations and Organization
Authors:
Amir Zeldes,
Tatsuya Aoyama,
Yang Janet Liu,
Siyao Peng,
Debopam Das,
Luke Gessler
Abstract:
In this article we present Enhanced Rhetorical Structure Theory (eRST), a new theoretical framework for computational discourse analysis, based on an expansion of Rhetorical Structure Theory (RST). The framework encompasses discourse relation graphs with tree-breaking, nonprojective and concurrent relations, as well as implicit and explicit signals which give explainable rationales to our analyses…
▽ More
In this article we present Enhanced Rhetorical Structure Theory (eRST), a new theoretical framework for computational discourse analysis, based on an expansion of Rhetorical Structure Theory (RST). The framework encompasses discourse relation graphs with tree-breaking, nonprojective and concurrent relations, as well as implicit and explicit signals which give explainable rationales to our analyses. We survey shortcomings of RST and other existing frameworks, such as Segmented Discourse Representation Theory (SDRT), the Penn Discourse Treebank (PDTB) and Discourse Dependencies, and address these using constructs in the proposed theory. We provide annotation, search and visualization tools for data, and present and evaluate a freely available corpus of English annotated according to our framework, encompassing 12 spoken and written genres with over 200K tokens. Finally, we discuss automatic parsing, evaluation metrics and applications for data in our framework.
△ Less
Submitted 20 March, 2024;
originally announced March 2024.
-
PosSAM: Panoptic Open-vocabulary Segment Anything
Authors:
Vibashan VS,
Shubhankar Borse,
Hyojin Park,
Debasmit Das,
Vishal Patel,
Munawar Hayat,
Fatih Porikli
Abstract:
In this paper, we introduce an open-vocabulary panoptic segmentation model that effectively unifies the strengths of the Segment Anything Model (SAM) with the vision-language CLIP model in an end-to-end framework. While SAM excels in generating spatially-aware masks, it's decoder falls short in recognizing object class information and tends to oversegment without additional guidance. Existing appr…
▽ More
In this paper, we introduce an open-vocabulary panoptic segmentation model that effectively unifies the strengths of the Segment Anything Model (SAM) with the vision-language CLIP model in an end-to-end framework. While SAM excels in generating spatially-aware masks, it's decoder falls short in recognizing object class information and tends to oversegment without additional guidance. Existing approaches address this limitation by using multi-stage techniques and employing separate models to generate class-aware prompts, such as bounding boxes or segmentation masks. Our proposed method, PosSAM is an end-to-end model which leverages SAM's spatially rich features to produce instance-aware masks and harnesses CLIP's semantically discriminative features for effective instance classification. Specifically, we address the limitations of SAM and propose a novel Local Discriminative Pooling (LDP) module leveraging class-agnostic SAM and class-aware CLIP features for unbiased open-vocabulary classification. Furthermore, we introduce a Mask-Aware Selective Ensembling (MASE) algorithm that adaptively enhances the quality of generated masks and boosts the performance of open-vocabulary classification during inference for each image. We conducted extensive experiments to demonstrate our methods strong generalization properties across multiple datasets, achieving state-of-the-art performance with substantial improvements over SOTA open-vocabulary panoptic segmentation methods. In both COCO to ADE20K and ADE20K to COCO settings, PosSAM outperforms the previous state-of-the-art methods by a large margin, 2.4 PQ and 4.6 PQ, respectively. Project Website: https://vibashan.github.io/possam-web/.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
SNOW-SCA: ML-assisted Side-Channel Attack on SNOW-V
Authors:
Harshit Saurabh,
Anupam Golder,
Samarth Shivakumar Titti,
Suparna Kundu,
Chaoyun Li,
Angshuman Karmakar,
Debayan Das
Abstract:
This paper presents SNOW-SCA, the first power side-channel analysis (SCA) attack of a 5G mobile communication security standard candidate, SNOW-V, running on a 32-bit ARM Cortex-M4 microcontroller. First, we perform a generic known-key correlation (KKC) analysis to identify the leakage points. Next, a correlation power analysis (CPA) attack is performed, which reduces the attack complexity to two…
▽ More
This paper presents SNOW-SCA, the first power side-channel analysis (SCA) attack of a 5G mobile communication security standard candidate, SNOW-V, running on a 32-bit ARM Cortex-M4 microcontroller. First, we perform a generic known-key correlation (KKC) analysis to identify the leakage points. Next, a correlation power analysis (CPA) attack is performed, which reduces the attack complexity to two key guesses for each key byte. The correct secret key is then uniquely identified utilizing linear discriminant analysis (LDA). The profiled SCA attack with LDA achieves 100% accuracy after training with $<200$ traces, which means the attack succeeds with just a single trace. Overall, using the \textit{combined CPA and LDA attack} model, the correct secret key byte is recovered with <50 traces collected using the ChipWhisperer platform. The entire 256-bit secret key of SNOW-V can be recovered incrementally using the proposed SCA attack. Finally, we suggest low-overhead countermeasures that can be used to prevent these SCA attacks.
△ Less
Submitted 13 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…
▽ More
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
△ Less
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning
Authors:
Debrup Das,
Debopriyo Banerjee,
Somak Aditya,
Ashish Kulkarni
Abstract:
Tool-augmented Large Language Models (TALMs) are known to enhance the skillset of large language models (LLMs), thereby, leading to their improved reasoning abilities across many tasks. While, TALMs have been successfully employed in different question-answering benchmarks, their efficacy on complex mathematical reasoning benchmarks, and the potential complementary benefits offered by tools for kn…
▽ More
Tool-augmented Large Language Models (TALMs) are known to enhance the skillset of large language models (LLMs), thereby, leading to their improved reasoning abilities across many tasks. While, TALMs have been successfully employed in different question-answering benchmarks, their efficacy on complex mathematical reasoning benchmarks, and the potential complementary benefits offered by tools for knowledge retrieval and mathematical equation solving are open research questions. In this work, we present MathSensei, a tool-augmented large language model for mathematical reasoning. We study the complementary benefits of the tools - knowledge retriever (Bing Web Search), program generator + executor (Python), and symbolic equation solver (Wolfram-Alpha API) through evaluations on mathematical reasoning datasets. We perform exhaustive ablations on MATH, a popular dataset for evaluating mathematical reasoning on diverse mathematical disciplines. We also conduct experiments involving well-known tool planners to study the impact of tool sequencing on the model performance. MathSensei achieves 13.5% better accuracy over gpt-3.5-turbo with Chain-of-Thought on the MATH dataset. We further observe that TALMs are not as effective for simpler math word problems (in GSM-8K), and the benefit increases as the complexity and required knowledge increases (progressively over AQuA, MMLU-Math, and higher level complex questions in MATH). The code and data are available at https://github.com/Debrup-61/MathSensei.
△ Less
Submitted 3 April, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Research status of the Mendeleev Periodic Table: a bibliometric analysis
Authors:
Kamna Sharma,
Deepak Kumar Das,
Saibal Ray
Abstract:
In this paper, we present a bibliometric analysis of the Mendeleev Periodic Table. We have conducted a comprehensive analysis of the Scopus-based database using the keyword "Mendeleev Periodic Table". Our findings suggest that the Mendeleev Periodic Table is an influential topic in the field of Inorganic as well as Organic Chemistry. Future researchers may focus on expanding our analysis to includ…
▽ More
In this paper, we present a bibliometric analysis of the Mendeleev Periodic Table. We have conducted a comprehensive analysis of the Scopus-based database using the keyword "Mendeleev Periodic Table". Our findings suggest that the Mendeleev Periodic Table is an influential topic in the field of Inorganic as well as Organic Chemistry. Future researchers may focus on expanding our analysis to include other bibliometric indicators to gain a more comprehensive understanding of the impact of the Mendeleev Periodic Table in chemistry-based scientific investigations and even in the field of astrochemistry.
△ Less
Submitted 18 February, 2024;
originally announced February 2024.
-
Under the Surface: Tracking the Artifactuality of LLM-Generated Data
Authors:
Debarati Das,
Karin De Langis,
Anna Martin-Boyle,
Jaehyung Kim,
Minhwa Lee,
Zae Myung Kim,
Shirley Anugrah Hayati,
Risako Owan,
Bin Hu,
Ritik Parkar,
Ryan Koo,
Jonginn Park,
Aahan Tyagi,
Libby Ferland,
Sanjali Roy,
Vincent Liu,
Dongyeop Kang
Abstract:
This work delves into the expanding role of large language models (LLMs) in generating artificial data. LLMs are increasingly employed to create a variety of outputs, including annotations, preferences, instruction prompts, simulated dialogues, and free text. As these forms of LLM-generated data often intersect in their application, they exert mutual influence on each other and raise significant c…
▽ More
This work delves into the expanding role of large language models (LLMs) in generating artificial data. LLMs are increasingly employed to create a variety of outputs, including annotations, preferences, instruction prompts, simulated dialogues, and free text. As these forms of LLM-generated data often intersect in their application, they exert mutual influence on each other and raise significant concerns about the quality and diversity of the artificial data incorporated into training cycles, leading to an artificial data ecosystem. To the best of our knowledge, this is the first study to aggregate various types of LLM-generated text data, from more tightly constrained data like "task labels" to more lightly constrained "free-form text". We then stress test the quality and implications of LLM-generated artificial data, comparing it with human data across various existing benchmarks. Despite artificial data's capability to match human performance, this paper reveals significant hidden disparities, especially in complex tasks where LLMs often miss the nuanced understanding of intrinsic human-generated content. This study critically examines diverse LLM-generated data and emphasizes the need for ethical practices in data creation and when using LLMs. It highlights the LLMs' shortcomings in replicating human traits and behaviors, underscoring the importance of addressing biases and artifacts produced in LLM-generated content for future research and development. All data and code are available on our project page.
△ Less
Submitted 30 January, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
The "Colonial Impulse" of Natural Language Processing: An Audit of Bengali Sentiment Analysis Tools and Their Identity-based Biases
Authors:
Dipto Das,
Shion Guha,
Jed Brubaker,
Bryan Semaan
Abstract:
While colonization has sociohistorically impacted people's identities across various dimensions, those colonial values and biases continue to be perpetuated by sociotechnical systems. One category of sociotechnical systems--sentiment analysis tools--can also perpetuate colonial values and bias, yet less attention has been paid to how such tools may be complicit in perpetuating coloniality, althoug…
▽ More
While colonization has sociohistorically impacted people's identities across various dimensions, those colonial values and biases continue to be perpetuated by sociotechnical systems. One category of sociotechnical systems--sentiment analysis tools--can also perpetuate colonial values and bias, yet less attention has been paid to how such tools may be complicit in perpetuating coloniality, although they are often used to guide various practices (e.g., content moderation). In this paper, we explore potential bias in sentiment analysis tools in the context of Bengali communities that have experienced and continue to experience the impacts of colonialism. Drawing on identity categories most impacted by colonialism amongst local Bengali communities, we focused our analytic attention on gender, religion, and nationality. We conducted an algorithmic audit of all sentiment analysis tools for Bengali, available on the Python package index (PyPI) and GitHub. Despite similar semantic content and structure, our analyses showed that in addition to inconsistencies in output from different tools, Bengali sentiment analysis tools exhibit bias between different identity categories and respond differently to different ways of identity expression. Connecting our findings with colonially shaped sociocultural structures of Bengali communities, we discuss the implications of downstream bias of sentiment analysis tools.
△ Less
Submitted 19 January, 2024;
originally announced January 2024.
-
CrisisKAN: Knowledge-infused and Explainable Multimodal Attention Network for Crisis Event Classification
Authors:
Shubham Gupta,
Nandini Saini,
Suman Kundu,
Debasis Das
Abstract:
Pervasive use of social media has become the emerging source for real-time information (like images, text, or both) to identify various events. Despite the rapid growth of image and text-based event classification, the state-of-the-art (SOTA) models find it challenging to bridge the semantic gap between features of image and text modalities due to inconsistent encoding. Also, the black-box nature…
▽ More
Pervasive use of social media has become the emerging source for real-time information (like images, text, or both) to identify various events. Despite the rapid growth of image and text-based event classification, the state-of-the-art (SOTA) models find it challenging to bridge the semantic gap between features of image and text modalities due to inconsistent encoding. Also, the black-box nature of models fails to explain the model's outcomes for building trust in high-stakes situations such as disasters, pandemic. Additionally, the word limit imposed on social media posts can potentially introduce bias towards specific events. To address these issues, we proposed CrisisKAN, a novel Knowledge-infused and Explainable Multimodal Attention Network that entails images and texts in conjunction with external knowledge from Wikipedia to classify crisis events. To enrich the context-specific understanding of textual information, we integrated Wikipedia knowledge using proposed wiki extraction algorithm. Along with this, a guided cross-attention module is implemented to fill the semantic gap in integrating visual and textual data. In order to ensure reliability, we employ a model-specific approach called Gradient-weighted Class Activation Mapping (Grad-CAM) that provides a robust explanation of the predictions of the proposed model. The comprehensive experiments conducted on the CrisisMMD dataset yield in-depth analysis across various crisis-specific tasks and settings. As a result, CrisisKAN outperforms existing SOTA methodologies and provides a novel view in the domain of explainable multimodal event classification.
△ Less
Submitted 11 January, 2024;
originally announced January 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…
▽ More
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
△ Less
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction
Authors:
Devikalyan Das,
Christopher Wewer,
Raza Yunus,
Eddy Ilg,
Jan Eric Lenssen
Abstract:
Reconstructing dynamic objects from monocular videos is a severely underconstrained and challenging problem, and recent work has approached it in various directions. However, owing to the ill-posed nature of this problem, there has been no solution that can provide consistent, high-quality novel views from camera positions that are significantly different from the training views. In this work, we…
▽ More
Reconstructing dynamic objects from monocular videos is a severely underconstrained and challenging problem, and recent work has approached it in various directions. However, owing to the ill-posed nature of this problem, there has been no solution that can provide consistent, high-quality novel views from camera positions that are significantly different from the training views. In this work, we introduce Neural Parametric Gaussians (NPGs) to take on this challenge by imposing a two-stage approach: first, we fit a low-rank neural deformation model, which then is used as regularization for non-rigid reconstruction in the second stage. The first stage learns the object's deformations such that it preserves consistency in novel views. The second stage obtains high reconstruction quality by optimizing 3D Gaussians that are driven by the coarse model. To this end, we introduce a local 3D Gaussian representation, where temporally shared Gaussians are anchored in and deformed by local oriented volumes. The resulting combined model can be rendered as radiance fields, resulting in high-quality photo-realistic reconstructions of the non-rigidly deforming objects. We demonstrate that NPGs achieve superior results compared to previous works, especially in challenging scenarios with few multi-view cues.
△ Less
Submitted 31 March, 2024; v1 submitted 2 December, 2023;
originally announced December 2023.
-
Which Modality should I use -- Text, Motif, or Image? : Understanding Graphs with Large Language Models
Authors:
Debarati Das,
Ishaan Gupta,
Jaideep Srivastava,
Dongyeop Kang
Abstract:
Our research integrates graph data with Large Language Models (LLMs), which, despite their advancements in various fields using large text corpora, face limitations in encoding entire graphs due to context size constraints. This paper introduces a new approach to encoding a graph with diverse modalities, such as text, image, and motif, coupled with prompts to approximate a graph's global connectiv…
▽ More
Our research integrates graph data with Large Language Models (LLMs), which, despite their advancements in various fields using large text corpora, face limitations in encoding entire graphs due to context size constraints. This paper introduces a new approach to encoding a graph with diverse modalities, such as text, image, and motif, coupled with prompts to approximate a graph's global connectivity, thereby enhancing LLMs' efficiency in processing complex graph structures. The study also presents GraphTMI, a novel benchmark for evaluating LLMs in graph structure analysis, focusing on homophily, motif presence, and graph difficulty. Key findings indicate that the image modality, especially with vision-language models like GPT-4V, is superior to text in balancing token limits and preserving essential information and outperforms prior graph neural net (GNN) encoders. Furthermore, the research assesses how various factors affect the performance of each encoding modality and outlines the existing challenges and potential future developments for LLMs in graph understanding and reasoning tasks. All data will be publicly available upon acceptance.
△ Less
Submitted 13 March, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Several Consequences of Optimality
Authors:
Dibakar Das
Abstract:
Rationality is frequently associated with making the best possible decisions. It's widely acknowledged that humans, as rational beings, have limitations in their decision-making capabilities. Nevertheless, recent advancements in fields, such as, computing, science and technology, combined with the availability of vast amounts of data, have sparked optimism that these developments could potentially…
▽ More
Rationality is frequently associated with making the best possible decisions. It's widely acknowledged that humans, as rational beings, have limitations in their decision-making capabilities. Nevertheless, recent advancements in fields, such as, computing, science and technology, combined with the availability of vast amounts of data, have sparked optimism that these developments could potentially expand the boundaries of human bounded rationality through the augmentation of machine intelligence. In this paper, findings from a computational model demonstrated that when an increasing number of agents independently strive to achieve global optimality, facilitated by improved computing power, etc., they indirectly accelerated the occurrence of the "tragedy of the commons" by depleting shared resources at a faster rate. Further, as agents achieve optimality, there is a drop in information entropy among the solutions of the agents. Also, clear economic divide emerges among agents. Considering, two groups, one as producer and the other (the group agents searching for optimality) as consumer of the highest consumed resource, the consumers seem to gain more than the producers. Thus, bounded rationality could be seen as boon to sustainability.
△ Less
Submitted 2 November, 2023;
originally announced November 2023.
-
DEFT: Data Efficient Fine-Tuning for Pre-Trained Language Models via Unsupervised Core-Set Selection
Authors:
Devleena Das,
Vivek Khetan
Abstract:
Recent advances have led to the availability of many pre-trained language models (PLMs); however, a question that remains is how much data is truly needed to fine-tune PLMs for downstream tasks? In this work, we introduce DEFT-UCS, a data-efficient fine-tuning framework that leverages unsupervised core-set selection to identify a smaller, representative dataset that reduces the amount of data need…
▽ More
Recent advances have led to the availability of many pre-trained language models (PLMs); however, a question that remains is how much data is truly needed to fine-tune PLMs for downstream tasks? In this work, we introduce DEFT-UCS, a data-efficient fine-tuning framework that leverages unsupervised core-set selection to identify a smaller, representative dataset that reduces the amount of data needed to fine-tune PLMs for downstream tasks. We examine the efficacy of DEFT-UCS in the context of text-editing LMs, and compare to the state-of-the art text-editing model, CoEDIT. Our results demonstrate that DEFT-UCS models are just as accurate as CoEDIT, across eight different datasets consisting of six different editing tasks, while finetuned on 70% less data.
△ Less
Submitted 12 June, 2024; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Exploiting Unfair Advantages: Investigating Opportunistic Trading in the NFT Market
Authors:
Priyanka Bose,
Dipanjan Das,
Fabio Gritti,
Nicola Ruaro,
Christopher Kruegel,
Giovanni Vigna
Abstract:
As cryptocurrency evolved, new financial instruments, such as lending and borrowing protocols, currency exchanges, fungible and non-fungible tokens (NFT), staking and mining protocols have emerged. A financial ecosystem built on top of a blockchain is supposed to be fair and transparent for each participating actor. Yet, there are sophisticated actors who turn their domain knowledge and market ine…
▽ More
As cryptocurrency evolved, new financial instruments, such as lending and borrowing protocols, currency exchanges, fungible and non-fungible tokens (NFT), staking and mining protocols have emerged. A financial ecosystem built on top of a blockchain is supposed to be fair and transparent for each participating actor. Yet, there are sophisticated actors who turn their domain knowledge and market inefficiencies to their strategic advantage; thus extracting value from trades not accessible to others. This situation is further exacerbated by the fact that blockchain-based markets and decentralized finance (DeFi) instruments are mostly unregulated. Though a large body of work has already studied the unfairness of different aspects of DeFi and cryptocurrency trading, the economic intricacies of non-fungible token (NFT) trades necessitate further analysis and academic scrutiny.
The trading volume of NFTs has skyrocketed in recent years. A single NFT trade worth over a million US dollars, or marketplaces making billions in revenue is not uncommon nowadays. While previous research indicated the presence of wrongdoings in the NFT market, to our knowledge, we are the first to study predatory trading practices, what we call opportunistic trading, in depth. Opportunistic traders are sophisticated actors who employ automated, high-frequency NFT trading strategies, which, oftentimes, are malicious, deceptive, or, at the very least, unfair. Such attackers weaponize their advanced technical knowledge and superior understanding of DeFi protocols to disrupt trades of unsuspecting users, and collect profits from economic situations that are inaccessible to ordinary users, in a "supposedly" fair market. In this paper, we explore three such broad classes of opportunistic strategies aiming to realize three distinct trading objectives, viz., acquire, instant profit generation, and loss minimization.
△ Less
Submitted 5 September, 2023;
originally announced October 2023.
-
Motion Memory: Leveraging Past Experiences to Accelerate Future Motion Planning
Authors:
Dibyendu Das,
Yuanjie Lu,
Erion Plaku,
Xuesu Xiao
Abstract:
When facing a new motion-planning problem, most motion planners solve it from scratch, e.g., via sampling and exploration or starting optimization from a straight-line path. However, most motion planners have to experience a variety of planning problems throughout their lifetimes, which are yet to be leveraged for future planning. In this paper, we present a simple but efficient method called Moti…
▽ More
When facing a new motion-planning problem, most motion planners solve it from scratch, e.g., via sampling and exploration or starting optimization from a straight-line path. However, most motion planners have to experience a variety of planning problems throughout their lifetimes, which are yet to be leveraged for future planning. In this paper, we present a simple but efficient method called Motion Memory, which allows different motion planners to accelerate future planning using past experiences. Treating existing motion planners as either a closed or open box, we present a variety of ways that Motion Memory can contribute to reduce the planning time when facing a new planning problem. We provide extensive experiment results with three different motion planners on three classes of planning problems with over 30,000 problem instances and show that planning speed can be significantly reduced by up to 89% with the proposed Motion Memory technique and with increasing past planning experiences.
△ Less
Submitted 13 March, 2024; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Mitigating the Effect of Incidental Correlations on Part-based Learning
Authors:
Gaurav Bhatt,
Deepayan Das,
Leonid Sigal,
Vineeth N Balasubramanian
Abstract:
Intelligent systems possess a crucial characteristic of breaking complicated problems into smaller reusable components or parts and adjusting to new tasks using these part representations. However, current part-learners encounter difficulties in dealing with incidental correlations resulting from the limited observations of objects that may appear only in specific arrangements or with specific bac…
▽ More
Intelligent systems possess a crucial characteristic of breaking complicated problems into smaller reusable components or parts and adjusting to new tasks using these part representations. However, current part-learners encounter difficulties in dealing with incidental correlations resulting from the limited observations of objects that may appear only in specific arrangements or with specific backgrounds. These incidental correlations may have a detrimental impact on the generalization and interpretability of learned part representations. This study asserts that part-based representations could be more interpretable and generalize better with limited data, employing two innovative regularization methods. The first regularization separates foreground and background information's generative process via a unique mixture-of-parts formulation. Structural constraints are imposed on the parts using a weakly-supervised loss, guaranteeing that the mixture-of-parts for foreground and background entails soft, object-agnostic masks. The second regularization assumes the form of a distillation loss, ensuring the invariance of the learned parts to the incidental background correlations. Furthermore, we incorporate sparse and orthogonal constraints to facilitate learning high-quality part representations. By reducing the impact of incidental background correlations on the learned parts, we exhibit state-of-the-art (SoTA) performance on few-shot learning tasks on benchmark datasets, including MiniImagenet, TieredImageNet, and FC100. We also demonstrate that the part-based representations acquired through our approach generalize better than existing techniques, even under domain shifts of the background and common data corruption on the ImageNet-9 dataset. The implementation is available on GitHub: https://github.com/GauravBh1010tt/DPViT.git
△ Less
Submitted 30 September, 2023;
originally announced October 2023.
-
State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding
Authors:
Devleena Das,
Sonia Chernova,
Been Kim
Abstract:
As more non-AI experts use complex AI systems for daily tasks, there has been an increasing effort to develop methods that produce explanations of AI decision making that are understandable by non-AI experts. Towards this effort, leveraging higher-level concepts and producing concept-based explanations have become a popular method. Most concept-based explanations have been developed for classifica…
▽ More
As more non-AI experts use complex AI systems for daily tasks, there has been an increasing effort to develop methods that produce explanations of AI decision making that are understandable by non-AI experts. Towards this effort, leveraging higher-level concepts and producing concept-based explanations have become a popular method. Most concept-based explanations have been developed for classification techniques, and we posit that the few existing methods for sequential decision making are limited in scope. In this work, we first contribute a desiderata for defining concepts in sequential decision making settings. Additionally, inspired by the Protege Effect which states explaining knowledge often reinforces one's self-learning, we explore how concept-based explanations of an RL agent's decision making can in turn improve the agent's learning rate, as well as improve end-user understanding of the agent's decision making. To this end, we contribute a unified framework, State2Explanation (S2E), that involves learning a joint embedding model between state-action pairs and concept-based explanations, and leveraging such learned model to both (1) inform reward shaping during an agent's training, and (2) provide explanations to end-users at deployment for improved task performance. Our experimental validations, in Connect 4 and Lunar Lander, demonstrate the success of S2E in providing a dual-benefit, successfully informing reward shaping and improving agent learning rate, as well as significantly improving end user task performance at deployment time.
△ Less
Submitted 10 November, 2023; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Energy Efficient UAV-Assisted Emergency Communication with Reliable Connectivity and Collision Avoidance
Authors:
Sharvari N P,
Dibakar Das,
Jyotsna Bapat,
Debabrata Das
Abstract:
Emergency communication is vital for search and rescue operations following natural disasters. Unmanned Aerial Vehicles (UAVs) can significantly assist emergency communication by agile positioning, maintaining connectivity during rapid motion, and relaying critical disaster-related information to Ground Control Stations (GCS). Designing effective routing protocols for relaying crucial data in UAV…
▽ More
Emergency communication is vital for search and rescue operations following natural disasters. Unmanned Aerial Vehicles (UAVs) can significantly assist emergency communication by agile positioning, maintaining connectivity during rapid motion, and relaying critical disaster-related information to Ground Control Stations (GCS). Designing effective routing protocols for relaying crucial data in UAV networks is challenging due to dynamic topology, rapid mobility, and limited UAV resources. This paper presents a novel energy-constrained routing mechanism that ensures connectivity, inter-UAV collision avoidance, and network restoration post-UAV fragmentation while adapting without a predefined UAV path. The proposed method employs improved Q learning to optimize the next-hop node selection. Considering these factors, the paper proposes a novel, Improved Q-learning-based Multi-hop Routing (IQMR) protocol. Simulation results validate IQMRs adaptability to changing system conditions and superiority over QMR, QTAR, and QFANET in energy efficiency and data throughput. IQMR achieves energy consumption efficiency improvements of 32.27%, 36.35%, and 36.35% over QMR, Q-FANET, and QTAR, along with significantly higher data throughput enhancements of 53.3%, 80.35%, and 93.36% over Q-FANET, QMR, and QTAR.
△ Less
Submitted 31 August, 2023;
originally announced August 2023.
-
Network Centralities in Quantum Entanglement Distribution due to User Preferences
Authors:
Dibakar Das,
Shiva Kumar Malapaka,
Jyotsna Bapat,
Debabrata Das
Abstract:
Quantum networks are of great interest of late which apply quantum mechanics to transfer information securely. One of the key properties which are exploited is entanglement to transfer information from one network node to another. Applications like quantum teleportation rely on the entanglement between the concerned nodes. Thus, efficient entanglement distribution among network nodes is of utmost…
▽ More
Quantum networks are of great interest of late which apply quantum mechanics to transfer information securely. One of the key properties which are exploited is entanglement to transfer information from one network node to another. Applications like quantum teleportation rely on the entanglement between the concerned nodes. Thus, efficient entanglement distribution among network nodes is of utmost importance. Several entanglement distribution methods have been proposed in the literature which primarily rely on attributes, such as, fidelities, link layer network topologies, proactive distribution, etc. This paper studies the centralities of the network when the link layer topology of entanglements (referred to as entangled graph) is driven by usage patterns of peer-to-peer connections between remote nodes (referred to as connection graph) with different characteristics. Three different distributions (uniform, gaussian, and power law) are considered for the connection graph where the two nodes are selected from the same distribution. For the entangled graph, both reactive and proactive entanglements are employed to form a random graph. Results show that the edge centralities (measured as usage frequencies of individual edges during entanglement distribution) of the entangled graph follow power law distributions whereas the growth in entanglements with connections and node centralities (degrees of nodes) are monomolecularly distributed for most of the scenarios. These findings will help in quantum resource management, e.g., quantum technology with high reliability and lower decoherence time may be allocated to edges with high centralities.
△ Less
Submitted 16 August, 2023;
originally announced August 2023.
-
Towards Open-Set Test-Time Adaptation Utilizing the Wisdom of Crowds in Entropy Minimization
Authors:
Jungsoo Lee,
Debasmit Das,
Jaegul Choo,
Sungha Choi
Abstract:
Test-time adaptation (TTA) methods, which generally rely on the model's predictions (e.g., entropy minimization) to adapt the source pretrained model to the unlabeled target domain, suffer from noisy signals originating from 1) incorrect or 2) open-set predictions. Long-term stable adaptation is hampered by such noisy signals, so training models without such error accumulation is crucial for pract…
▽ More
Test-time adaptation (TTA) methods, which generally rely on the model's predictions (e.g., entropy minimization) to adapt the source pretrained model to the unlabeled target domain, suffer from noisy signals originating from 1) incorrect or 2) open-set predictions. Long-term stable adaptation is hampered by such noisy signals, so training models without such error accumulation is crucial for practical TTA. To address these issues, including open-set TTA, we propose a simple yet effective sample selection method inspired by the following crucial empirical finding. While entropy minimization compels the model to increase the probability of its predicted label (i.e., confidence values), we found that noisy samples rather show decreased confidence values. To be more specific, entropy minimization attempts to raise the confidence values of an individual sample's prediction, but individual confidence values may rise or fall due to the influence of signals from numerous other predictions (i.e., wisdom of crowds). Due to this fact, noisy signals misaligned with such 'wisdom of crowds', generally found in the correct signals, fail to raise the individual confidence values of wrong samples, despite attempts to increase them. Based on such findings, we filter out the samples whose confidence values are lower in the adapted model than in the original model, as they are likely to be noisy. Our method is widely applicable to existing TTA methods and improves their long-term adaptation performance in both image classification (e.g., 49.4% reduced error rates with TENT) and semantic segmentation (e.g., 11.7% gain in mIoU with TENT).
△ Less
Submitted 4 September, 2023; v1 submitted 13 August, 2023;
originally announced August 2023.
-
Unsupervised Out-of-Distribution Dialect Detection with Mahalanobis Distance
Authors:
Sourya Dipta Das,
Yash Vadi,
Abhishek Unnam,
Kuldeep Yadav
Abstract:
Dialect classification is used in a variety of applications, such as machine translation and speech recognition, to improve the overall performance of the system. In a real-world scenario, a deployed dialect classification model can encounter anomalous inputs that differ from the training data distribution, also called out-of-distribution (OOD) samples. Those OOD samples can lead to unexpected out…
▽ More
Dialect classification is used in a variety of applications, such as machine translation and speech recognition, to improve the overall performance of the system. In a real-world scenario, a deployed dialect classification model can encounter anomalous inputs that differ from the training data distribution, also called out-of-distribution (OOD) samples. Those OOD samples can lead to unexpected outputs, as dialects of those samples are unseen during model training. Out-of-distribution detection is a new research area that has received little attention in the context of dialect classification. Towards this, we proposed a simple yet effective unsupervised Mahalanobis distance feature-based method to detect out-of-distribution samples. We utilize the latent embeddings from all intermediate layers of a wav2vec 2.0 transformer-based dialect classifier model for multi-task learning. Our proposed approach outperforms other state-of-the-art OOD detection methods significantly.
△ Less
Submitted 9 August, 2023;
originally announced August 2023.
-
Advancing Smart Malnutrition Monitoring: A Multi-Modal Learning Approach for Vital Health Parameter Estimation
Authors:
Ashish Marisetty,
Prathistith Raj M,
Praneeth Nemani,
Venkanna Udutalapally,
Debanjan Das
Abstract:
Malnutrition poses a significant threat to global health, resulting from an inadequate intake of essential nutrients that adversely impacts vital organs and overall bodily functioning. Periodic examinations and mass screenings, incorporating both conventional and non-invasive techniques, have been employed to combat this challenge. However, these approaches suffer from critical limitations, such a…
▽ More
Malnutrition poses a significant threat to global health, resulting from an inadequate intake of essential nutrients that adversely impacts vital organs and overall bodily functioning. Periodic examinations and mass screenings, incorporating both conventional and non-invasive techniques, have been employed to combat this challenge. However, these approaches suffer from critical limitations, such as the need for additional equipment, lack of comprehensive feature representation, absence of suitable health indicators, and the unavailability of smartphone implementations for precise estimations of Body Fat Percentage (BFP), Basal Metabolic Rate (BMR), and Body Mass Index (BMI) to enable efficient smart-malnutrition monitoring. To address these constraints, this study presents a groundbreaking, scalable, and robust smart malnutrition-monitoring system that leverages a single full-body image of an individual to estimate height, weight, and other crucial health parameters within a multi-modal learning framework. Our proposed methodology involves the reconstruction of a highly precise 3D point cloud, from which 512-dimensional feature embeddings are extracted using a headless-3D classification network. Concurrently, facial and body embeddings are also extracted, and through the application of learnable parameters, these features are then utilized to estimate weight accurately. Furthermore, essential health metrics, including BMR, BFP, and BMI, are computed to conduct a comprehensive analysis of the subject's health, subsequently facilitating the provision of personalized nutrition plans. While being robust to a wide range of lighting conditions across multiple devices, our model achieves a low Mean Absolute Error (MAE) of $\pm$ 4.7 cm and $\pm$ 5.3 kg in estimating height and weight.
△ Less
Submitted 31 July, 2023;
originally announced July 2023.
-
Feature Importance Measurement based on Decision Tree Sampling
Authors:
Chao Huang,
Diptesh Das,
Koji Tsuda
Abstract:
Random forest is effective for prediction tasks but the randomness of tree generation hinders interpretability in feature importance analysis. To address this, we proposed DT-Sampler, a SAT-based method for measuring feature importance in tree-based model. Our method has fewer parameters than random forest and provides higher interpretability and stability for the analysis in real-world problems.…
▽ More
Random forest is effective for prediction tasks but the randomness of tree generation hinders interpretability in feature importance analysis. To address this, we proposed DT-Sampler, a SAT-based method for measuring feature importance in tree-based model. Our method has fewer parameters than random forest and provides higher interpretability and stability for the analysis in real-world problems. An implementation of DT-Sampler is available at https://github.com/tsudalab/DT-sampler.
△ Less
Submitted 25 July, 2023;
originally announced July 2023.
-
A Model of Competitive Assortment Planning Algorithm
Authors:
Dipankar Das
Abstract:
With a novel search algorithm or assortment planning or assortment optimization algorithm that takes into account a Bayesian approach to information updating and two-stage assortment optimization techniques, the current research provides a novel concept of competitiveness in the digital marketplace. Via the search algorithm, there is competition between the platform, vendors, and private brands of…
▽ More
With a novel search algorithm or assortment planning or assortment optimization algorithm that takes into account a Bayesian approach to information updating and two-stage assortment optimization techniques, the current research provides a novel concept of competitiveness in the digital marketplace. Via the search algorithm, there is competition between the platform, vendors, and private brands of the platform. The current paper suggests a model and discusses how competition and collusion arise in the digital marketplace through assortment planning or assortment optimization algorithm. Furthermore, it suggests a model of an assortment algorithm free from collusion between the platform and the large vendors. The paper's major conclusions are that collusive assortment may raise a product's purchase likelihood but fail to maximize expected revenue. The proposed assortment planning, on the other hand, maintains competitiveness while maximizing expected revenue.
△ Less
Submitted 16 July, 2023;
originally announced July 2023.
-
Deep ANN-based Touch-less 3D Pad for Digit Recognition
Authors:
Pramit Kumar Pal,
Debarshi Dutta,
Attreyee Mandal,
Dipshika Das
Abstract:
The Covid-19 pandemic has changed the way humans interact with their environment. Common touch surfaces such as elevator switches and ATM switches are hazardous to touch as they are used by countless people every day, increasing the chance of getting infected. So, a need for touch-less interaction with machines arises. In this paper, we propose a method of recognizing the ten decimal digits (0-9)…
▽ More
The Covid-19 pandemic has changed the way humans interact with their environment. Common touch surfaces such as elevator switches and ATM switches are hazardous to touch as they are used by countless people every day, increasing the chance of getting infected. So, a need for touch-less interaction with machines arises. In this paper, we propose a method of recognizing the ten decimal digits (0-9) by writing the digits in the air near a sensing printed circuit board using a human hand. We captured the movement of the hand by a sensor based on projective capacitance and classified it into digits using an Artificial Neural Network. Our method does not use pictures, which significantly reduces the computational requirements and preserves users' privacy. Thus, the proposed method can be easily implemented in public places.
△ Less
Submitted 15 July, 2023;
originally announced July 2023.
-
COMEX: A Tool for Generating Customized Source Code Representations
Authors:
Debeshee Das,
Noble Saji Mathews,
Alex Mathai,
Srikanth Tamilselvam,
Kranthi Sedamaki,
Sridhar Chimalakonda,
Atul Kumar
Abstract:
Learning effective representations of source code is critical for any Machine Learning for Software Engineering (ML4SE) system. Inspired by natural language processing, large language models (LLMs) like Codex and CodeGen treat code as generic sequences of text and are trained on huge corpora of code data, achieving state of the art performance on several software engineering (SE) tasks. However, v…
▽ More
Learning effective representations of source code is critical for any Machine Learning for Software Engineering (ML4SE) system. Inspired by natural language processing, large language models (LLMs) like Codex and CodeGen treat code as generic sequences of text and are trained on huge corpora of code data, achieving state of the art performance on several software engineering (SE) tasks. However, valid source code, unlike natural language, follows a strict structure and pattern governed by the underlying grammar of the programming language. Current LLMs do not exploit this property of the source code as they treat code like a sequence of tokens and overlook key structural and semantic properties of code that can be extracted from code-views like the Control Flow Graph (CFG), Data Flow Graph (DFG), Abstract Syntax Tree (AST), etc. Unfortunately, the process of generating and integrating code-views for every programming language is cumbersome and time consuming. To overcome this barrier, we propose our tool COMEX - a framework that allows researchers and developers to create and combine multiple code-views which can be used by machine learning (ML) models for various SE tasks. Some salient features of our tool are: (i) it works directly on source code (which need not be compilable), (ii) it currently supports Java and C#, (iii) it can analyze both method-level snippets and program-level snippets by using both intra-procedural and inter-procedural analysis, and (iv) it is easily extendable to other languages as it is built on tree-sitter - a widely used incremental parser that supports over 40 languages. We believe this easy-to-use code-view generation and customization tool will give impetus to research in source code representation learning methods and ML4SE.
Tool: https://pypi.org/project/comex - GitHub: https://github.com/IBM/tree-sitter-codeviews - Demo: https://youtu.be/GER6U87FVbU
△ Less
Submitted 10 July, 2023;
originally announced July 2023.
-
Action-conditioned Deep Visual Prediction with RoAM, a new Indoor Human Motion Dataset for Autonomous Robots
Authors:
Meenakshi Sarkar,
Vinayak Honkote,
Dibyendu Das,
Debasish Ghose
Abstract:
With the increasing adoption of robots across industries, it is crucial to focus on developing advanced algorithms that enable robots to anticipate, comprehend, and plan their actions effectively in collaboration with humans. We introduce the Robot Autonomous Motion (RoAM) video dataset, which is collected with a custom-made turtlebot3 Burger robot in a variety of indoor environments recording var…
▽ More
With the increasing adoption of robots across industries, it is crucial to focus on developing advanced algorithms that enable robots to anticipate, comprehend, and plan their actions effectively in collaboration with humans. We introduce the Robot Autonomous Motion (RoAM) video dataset, which is collected with a custom-made turtlebot3 Burger robot in a variety of indoor environments recording various human motions from the robot's ego-vision. The dataset also includes synchronized records of the LiDAR scan and all control actions taken by the robot as it navigates around static and moving human agents. The unique dataset provides an opportunity to develop and benchmark new visual prediction frameworks that can predict future image frames based on the action taken by the recording agent in partially observable scenarios or cases where the imaging sensor is mounted on a moving platform. We have benchmarked the dataset on our novel deep visual prediction framework called ACPNet where the approximated future image frames are also conditioned on action taken by the robot and demonstrated its potential for incorporating robot dynamics into the video prediction paradigm for mobile robotics and autonomous navigation research.
△ Less
Submitted 27 June, 2023;
originally announced June 2023.
-
Balancing Effect of Training Dataset Distribution of Multiple Styles for Multi-Style Text Transfer
Authors:
Debarati Das,
David Ma,
Dongyeop Kang
Abstract:
Text style transfer is an exciting task within the field of natural language generation that is often plagued by the need for high-quality paired datasets. Furthermore, training a model for multi-attribute text style transfer requires datasets with sufficient support across all combinations of the considered stylistic attributes, adding to the challenges of training a style transfer model. This pa…
▽ More
Text style transfer is an exciting task within the field of natural language generation that is often plagued by the need for high-quality paired datasets. Furthermore, training a model for multi-attribute text style transfer requires datasets with sufficient support across all combinations of the considered stylistic attributes, adding to the challenges of training a style transfer model. This paper explores the impact of training data input diversity on the quality of the generated text from the multi-style transfer model. We construct a pseudo-parallel dataset by devising heuristics to adjust the style distribution in the training samples. We balance our training dataset using marginal and joint distributions to train our style transfer models. We observe that a balanced dataset produces more effective control effects over multiple styles than an imbalanced or skewed one. Through quantitative analysis, we explore the impact of multiple style distributions in training data on style-transferred output. These findings will better inform the design of style-transfer datasets.
△ Less
Submitted 24 May, 2023;
originally announced May 2023.
-
SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation
Authors:
Elizabeth Clark,
Shruti Rijhwani,
Sebastian Gehrmann,
Joshua Maynez,
Roee Aharoni,
Vitaly Nikolaev,
Thibault Sellam,
Aditya Siddhant,
Dipanjan Das,
Ankur P. Parikh
Abstract:
Reliable automatic evaluation of summarization systems is challenging due to the multifaceted and subjective nature of the task. This is especially the case for languages other than English, where human evaluations are scarce. In this work, we introduce SEAHORSE, a dataset for multilingual, multifaceted summarization evaluation. SEAHORSE consists of 96K summaries with human ratings along 6 dimensi…
▽ More
Reliable automatic evaluation of summarization systems is challenging due to the multifaceted and subjective nature of the task. This is especially the case for languages other than English, where human evaluations are scarce. In this work, we introduce SEAHORSE, a dataset for multilingual, multifaceted summarization evaluation. SEAHORSE consists of 96K summaries with human ratings along 6 dimensions of text quality: comprehensibility, repetition, grammar, attribution, main ideas, and conciseness, covering 6 languages, 9 systems and 4 datasets. As a result of its size and scope, SEAHORSE can serve both as a benchmark to evaluate learnt metrics, as well as a large-scale resource for training such metrics. We show that metrics trained with SEAHORSE achieve strong performance on the out-of-domain meta-evaluation benchmarks TRUE (Honovich et al., 2022) and mFACE (Aharoni et al., 2022). We make the SEAHORSE dataset and metrics publicly available for future research on multilingual and multifaceted summarization evaluation.
△ Less
Submitted 1 November, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
A 334$μ$W 0.158mm$^2$ ASIC for Post-Quantum Key-Encapsulation Mechanism Saber with Low-latency Striding Toom-Cook Multiplication Authors Version
Authors:
Archisman Ghosh,
Jose Maria Bermudo Mera,
Angshuman Karmakar,
Debayan Das,
Santosh Ghosh,
Ingrid Verbauwhede,
Shreyas Sen
Abstract:
The hard mathematical problems that assure the security of our current public-key cryptography (RSA, ECC) are broken if and when a quantum computer appears rendering them ineffective for use in the quantum era. Lattice based cryptography is a novel approach to public key cryptography, of which the mathematical investigation (so far) resists attacks from quantum computers. By choosing a module lear…
▽ More
The hard mathematical problems that assure the security of our current public-key cryptography (RSA, ECC) are broken if and when a quantum computer appears rendering them ineffective for use in the quantum era. Lattice based cryptography is a novel approach to public key cryptography, of which the mathematical investigation (so far) resists attacks from quantum computers. By choosing a module learning with errors (MLWE) algorithm as the next standard, National Institute of Standard & Technology (NIST) follows this approach. The multiplication of polynomials is the central bottleneck in the computation of lattice based cryptography. Because public key cryptography is mostly used to establish common secret keys, focus is on compact area, power and energy budget and to a lesser extent on throughput or latency. While most other work focuses on optimizing number theoretic transform (NTT) based multiplications, in this paper we highly optimize a Toom-Cook based multiplier. We demonstrate that a memory-efficient striding Toom-Cook with lazy interpolation, results in a highly compact, low power implementation, which on top enables a very regular memory access scheme. To demonstrate the efficiency, we integrate this multiplier into a Saber post-quantum accelerator, one of the four NIST finalists. Algorithmic innovation to reduce active memory, timely clock gating and shift-add multiplier has helped to achieve 38% less power than state-of-the art PQC core, 4x less memory, 36.8% reduction in multiplier energy and 118x reduction in active power with respect to state-of-the-art Saber accelerator (not silicon verified). This accelerator consumes 0.158mm2 active area which is lowest reported till date despite process disadvantages of the state-of-the-art designs.
△ Less
Submitted 17 May, 2023;
originally announced May 2023.
-
Scalable orthogonal delay-division multiplexed OEO artificial neural network trained for TI-ADC equalization
Authors:
Andrea Zazzi,
Arka Dipta Das,
Lukas Hüssen,
Renato Negra,
Jeremy Witzens
Abstract:
We propose a new signaling scheme for on-chip optical-electrical-optical artificial neural networks that utilizes orthogonal delay-division multiplexing and pilot-tone based self-homodyne detection. This scheme offers a more efficient scaling of the optical power budget with increasing network complexity. Our simulations, based on a 220 nm SOI silicon photonics technology, suggest that the network…
▽ More
We propose a new signaling scheme for on-chip optical-electrical-optical artificial neural networks that utilizes orthogonal delay-division multiplexing and pilot-tone based self-homodyne detection. This scheme offers a more efficient scaling of the optical power budget with increasing network complexity. Our simulations, based on a 220 nm SOI silicon photonics technology, suggest that the network can support 31 x 31 neurons, with 961 links and freely programmable weights, using a single 500 mW optical comb and an SNR of 21.3 dB per neuron. Moreover, it features a low sensitivity to temperature fluctuations, ensuring that it can be operated outside of a laboratory environment. We demonstrate the network's effectiveness in nonlinear equalization tasks by training it to equalize a time-interleaved ADC architecture, achieving an ENOB over 4 over the entire 75 GHz ADC bandwidth. We anticipate that this network architecture will enable broadband and low latency nonlinear signal processing in practical settings such as ultra-broadband data converters and real-time control systems.
△ Less
Submitted 19 October, 2023; v1 submitted 10 May, 2023;
originally announced May 2023.
-
Text-Blueprint: An Interactive Platform for Plan-based Conditional Generation
Authors:
Fantine Huot,
Joshua Maynez,
Shashi Narayan,
Reinald Kim Amplayo,
Kuzman Ganchev,
Annie Louis,
Anders Sandholm,
Dipanjan Das,
Mirella Lapata
Abstract:
While conditional generation models can now generate natural language well enough to create fluent text, it is still difficult to control the generation process, leading to irrelevant, repetitive, and hallucinated content. Recent work shows that planning can be a useful intermediate step to render conditional generation less opaque and more grounded. We present a web browser-based demonstration fo…
▽ More
While conditional generation models can now generate natural language well enough to create fluent text, it is still difficult to control the generation process, leading to irrelevant, repetitive, and hallucinated content. Recent work shows that planning can be a useful intermediate step to render conditional generation less opaque and more grounded. We present a web browser-based demonstration for query-focused summarization that uses a sequence of question-answer pairs, as a blueprint plan for guiding text generation (i.e., what to say and in what order). We illustrate how users may interact with the generated text and associated plan visualizations, e.g., by editing and modifying the blueprint in order to improve or control the generated output.
A short video demonstrating our system is available at https://goo.gle/text-blueprint-demo.
△ Less
Submitted 28 April, 2023;
originally announced May 2023.
-
Progressive Random Convolutions for Single Domain Generalization
Authors:
Seokeon Choi,
Debasmit Das,
Sungha Choi,
Seunghan Yang,
Hyunsin Park,
Sungrack Yun
Abstract:
Single domain generalization aims to train a generalizable model with only one source domain to perform well on arbitrary unseen target domains. Image augmentation based on Random Convolutions (RandConv), consisting of one convolution layer randomly initialized for each mini-batch, enables the model to learn generalizable visual representations by distorting local textures despite its simple and l…
▽ More
Single domain generalization aims to train a generalizable model with only one source domain to perform well on arbitrary unseen target domains. Image augmentation based on Random Convolutions (RandConv), consisting of one convolution layer randomly initialized for each mini-batch, enables the model to learn generalizable visual representations by distorting local textures despite its simple and lightweight structure. However, RandConv has structural limitations in that the generated image easily loses semantics as the kernel size increases, and lacks the inherent diversity of a single convolution operation. To solve the problem, we propose a Progressive Random Convolution (Pro-RandConv) method that recursively stacks random convolution layers with a small kernel size instead of increasing the kernel size. This progressive approach can not only mitigate semantic distortions by reducing the influence of pixels away from the center in the theoretical receptive field, but also create more effective virtual domains by gradually increasing the style diversity. In addition, we develop a basic random convolution layer into a random convolution block including deformable offsets and affine transformation to support texture and contrast diversification, both of which are also randomly initialized. Without complex generators or adversarial learning, we demonstrate that our simple yet effective augmentation strategy outperforms state-of-the-art methods on single domain generalization benchmarks.
△ Less
Submitted 1 April, 2023;
originally announced April 2023.
-
Use of Federated Learning and Blockchain towards Securing Financial Services
Authors:
Pushpita Chatterjee,
Debashis Das,
Danda B Rawat
Abstract:
In recent days, the proliferation of several existing and new cyber-attacks pose an axiomatic threat to the stability of financial services. It is hard to predict the nature of attacks that can trigger a serious financial crisis. The unprecedented digital transformation to financial services has been accelerated during the COVID-19 pandemic and it is still ongoing. Attackers are taking advantage o…
▽ More
In recent days, the proliferation of several existing and new cyber-attacks pose an axiomatic threat to the stability of financial services. It is hard to predict the nature of attacks that can trigger a serious financial crisis. The unprecedented digital transformation to financial services has been accelerated during the COVID-19 pandemic and it is still ongoing. Attackers are taking advantage of this transformation and pose a new global threat to financial stability and integrity. Many large organizations are switching from centralized finance (CeFi) to decentralized finance (DeFi) because decentralized finance has many advantages. Blockchain can bring big and far-reaching effects on the trustworthiness, safety, accessibility, cost-effectiveness, and openness of the financial sector. The present paper gives an in-depth look at how blockchain and federated learning (FL) are used in financial services. It starts with an overview of recent developments in both use cases. This paper explores and discusses existing financial service vulnerabilities, potential threats, and consequent risks. So, we explain the problems that can be fixed in financial services and how blockchain and FL could help solve them. These problems include data protection, storage optimization, and making more money in financial services. We looked at many blockchain-enabled FL methods and came up with some possible solutions that could be used in financial services to solve several challenges like cost-effectiveness, automation, and security control. Finally, we point out some future directions at the end of this study.
△ Less
Submitted 4 February, 2023;
originally announced March 2023.
-
DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction
Authors:
Shubhankar Borse,
Debasmit Das,
Hyojin Park,
Hong Cai,
Risheek Garrepalli,
Fatih Porikli
Abstract:
We present DejaVu, a novel framework which leverages conditional image regeneration as additional supervision during training to improve deep networks for dense prediction tasks such as segmentation, depth estimation, and surface normal prediction. First, we apply redaction to the input image, which removes certain structural information by sparse sampling or selective frequency removal. Next, we…
▽ More
We present DejaVu, a novel framework which leverages conditional image regeneration as additional supervision during training to improve deep networks for dense prediction tasks such as segmentation, depth estimation, and surface normal prediction. First, we apply redaction to the input image, which removes certain structural information by sparse sampling or selective frequency removal. Next, we use a conditional regenerator, which takes the redacted image and the dense predictions as inputs, and reconstructs the original image by filling in the missing structural information. In the redacted image, structural attributes like boundaries are broken while semantic context is largely preserved. In order to make the regeneration feasible, the conditional generator will then require the structure information from the other input source, i.e., the dense predictions. As such, by including this conditional regeneration objective during training, DejaVu encourages the base network to learn to embed accurate scene structure in its dense prediction. This leads to more accurate predictions with clearer boundaries and better spatial consistency. When it is feasible to leverage additional computation, DejaVu can be extended to incorporate an attention-based regeneration module within the dense prediction network, which further improves accuracy. Through extensive experiments on multiple dense prediction benchmarks such as Cityscapes, COCO, ADE20K, NYUD-v2, and KITTI, we demonstrate the efficacy of employing DejaVu during training, as it outperforms SOTA methods at no added computation cost.
△ Less
Submitted 29 March, 2023; v1 submitted 2 March, 2023;
originally announced March 2023.