-
Code-mixed Sentiment and Hate-speech Prediction
Authors:
Anjali Yadav,
Tanya Garg,
Matej Klemen,
Matej Ulcar,
Basant Agarwal,
Marko Robnik Sikonja
Abstract:
Code-mixed discourse combines multiple languages in a single text. It is commonly used in informal discourse in countries with several official languages, but also in many other countries in combination with English or neighboring languages. As recently large language models have dominated most natural language processing tasks, we investigated their performance in code-mixed settings for relevant…
▽ More
Code-mixed discourse combines multiple languages in a single text. It is commonly used in informal discourse in countries with several official languages, but also in many other countries in combination with English or neighboring languages. As recently large language models have dominated most natural language processing tasks, we investigated their performance in code-mixed settings for relevant tasks. We first created four new bilingual pre-trained masked language models for English-Hindi and English-Slovene languages, specifically aimed to support informal language. Then we performed an evaluation of monolingual, bilingual, few-lingual, and massively multilingual models on several languages, using two tasks that frequently contain code-mixed text, in particular, sentiment analysis and offensive language detection in social media texts. The results show that the most successful classifiers are fine-tuned bilingual models and multilingual models, specialized for social media texts, followed by non-specialized massively multilingual and monolingual models, while huge generative models are not competitive. For our affective problems, the models mostly perform slightly better on code-mixed data compared to non-code-mixed data.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
Spatially-resolved hyperlocal weather prediction and anomaly detection using IoT sensor networks and machine learning techniques
Authors:
Anita B. Agarwal,
Rohit Rajesh,
Nitin Arul
Abstract:
Accurate and timely hyperlocal weather predictions are essential for various applications, ranging from agriculture to disaster management. In this paper, we propose a novel approach that combines hyperlocal weather prediction and anomaly detection using IoT sensor networks and advanced machine learning techniques. Our approach leverages data from multiple spatially-distributed yet relatively clos…
▽ More
Accurate and timely hyperlocal weather predictions are essential for various applications, ranging from agriculture to disaster management. In this paper, we propose a novel approach that combines hyperlocal weather prediction and anomaly detection using IoT sensor networks and advanced machine learning techniques. Our approach leverages data from multiple spatially-distributed yet relatively close locations and IoT sensors to create high-resolution weather models capable of predicting short-term, localized weather conditions such as temperature, pressure, and humidity. By monitoring changes in weather parameters across these locations, our system is able to enhance the spatial resolution of predictions and effectively detect anomalies in real-time. Additionally, our system employs unsupervised learning algorithms to identify unusual weather patterns, providing timely alerts. Our findings indicate that this system has the potential to enhance decision-making.
△ Less
Submitted 17 October, 2023;
originally announced October 2023.
-
Deep Learning Image Recognition for Non-images
Authors:
Boris Kovalerchuk,
Divya Chandrika Kalla,
Bedant Agarwal
Abstract:
Powerful deep learning algorithms open an opportunity for solving non-image Machine Learning (ML) problems by transforming these problems to into the image recognition problems. The CPC-R algorithm presented in this chapter converts non-image data into images by visualizing non-image data. Then deep learning CNN algorithms solve the learning problems on these images. The design of the CPC-R algori…
▽ More
Powerful deep learning algorithms open an opportunity for solving non-image Machine Learning (ML) problems by transforming these problems to into the image recognition problems. The CPC-R algorithm presented in this chapter converts non-image data into images by visualizing non-image data. Then deep learning CNN algorithms solve the learning problems on these images. The design of the CPC-R algorithm allows preserving all high-dimensional information in 2-D images. The use of pair values mapping instead of single value mapping used in the alternative approaches allows encoding each n-D point with 2 times fewer visual elements. The attributes of an n-D point are divided into pairs of its values and each pair is visualized as 2-D points in the same 2-D Cartesian coordinates. Next, grey scale or color intensity values are assigned to each pair to encode the order of pairs. This is resulted in the heatmap image. The computational experiments with CPC-R are conducted for different CNN architectures, and methods to optimize the CPC-R images showing that the combined CPC-R and deep learning CNN algorithms are able to solve non-image ML problems reaching high accuracy on the benchmark datasets. This chapter expands our prior work by adding more experiments to test accuracy of classification, exploring saliency and informativeness of discovered features to test their interpretability, and generalizing the approach.
△ Less
Submitted 9 February, 2022; v1 submitted 27 June, 2021;
originally announced June 2021.
-
Tracking Peaceful Tractors on Social Media -- XAI-enabled analysis of Red Fort Riots 2021
Authors:
Ajay Agarwal,
Basant Agarwal
Abstract:
On 26 January 2021, India witnessed a national embarrassment from the demographic least expected from - farmers. People across the nation watched in horror as a pseudo-patriotic mob of farmers stormed capital Delhi and vandalized the national pride- Red Fort. Investigations that followed the event revealed the existence of a social media trail that led to the likes of such an event. Consequently,…
▽ More
On 26 January 2021, India witnessed a national embarrassment from the demographic least expected from - farmers. People across the nation watched in horror as a pseudo-patriotic mob of farmers stormed capital Delhi and vandalized the national pride- Red Fort. Investigations that followed the event revealed the existence of a social media trail that led to the likes of such an event. Consequently, it became essential and necessary to archive this trail for social media analysis - not only to understand the bread-crumbs that are dispersed across the trail but also to visualize the role played by misinformation and fake news in this event. In this paper, we propose the tractor2twitter dataset which contains around 0.05 million tweets that were posted before, during, and after this event. Also, we benchmark our dataset with an Explainable AI ML model for classification of each tweet into either of the three categories - disinformation, misinformation, and opinion.
△ Less
Submitted 13 June, 2021; v1 submitted 24 April, 2021;
originally announced April 2021.
-
Physics of eccentric binary black hole mergers: A numerical relativity perspective
Authors:
E. A. Huerta,
Roland Haas,
Sarah Habib,
Anushri Gupta,
Adam Rebei,
Vishnu Chavva,
Daniel Johnson,
Shawn Rosofsky,
Erik Wessel,
Bhanu Agarwal,
Diyu Luo,
Wei Ren
Abstract:
Gravitational wave observations of eccentric binary black hole mergers will provide unequivocal evidence for the formation of these systems through dynamical assembly in dense stellar environments. The study of these astrophysically motivated sources is timely in view of electromagnetic observations, consistent with the existence of stellar mass black holes in the globular cluster M22 and in the G…
▽ More
Gravitational wave observations of eccentric binary black hole mergers will provide unequivocal evidence for the formation of these systems through dynamical assembly in dense stellar environments. The study of these astrophysically motivated sources is timely in view of electromagnetic observations, consistent with the existence of stellar mass black holes in the globular cluster M22 and in the Galactic center, and the proven detection capabilities of ground-based gravitational wave detectors. In order to get insights into the physics of these objects in the dynamical, strong-field gravity regime, we present a catalog of 89 numerical relativity waveforms that describe binary systems of non-spinning black holes with mass-ratios $1\leq q \leq 10$, and initial eccentricities as high as $e_0=0.18$ fifteen cycles before merger. We use this catalog to quantify the loss of energy and angular momentum through gravitational radiation, and the astrophysical properties of the black hole remnant, including its final mass and spin, and recoil velocity. We discuss the implications of these results for gravitational wave source modeling, and the design of algorithms to search for and identify eccentric binary black hole mergers in realistic detection scenarios.
△ Less
Submitted 5 September, 2019; v1 submitted 21 January, 2019;
originally announced January 2019.
-
A Deep Network Model for Paraphrase Detection in Short Text Messages
Authors:
Basant Agarwal,
Heri Ramampiaro,
Helge Langseth,
Massimiliano Ruocco
Abstract:
This paper is concerned with paraphrase detection. The ability to detect similar sentences written in natural language is crucial for several applications, such as text mining, text summarization, plagiarism detection, authorship authentication and question answering. Given two sentences, the objective is to detect whether they are semantically identical. An important insight from this work is tha…
▽ More
This paper is concerned with paraphrase detection. The ability to detect similar sentences written in natural language is crucial for several applications, such as text mining, text summarization, plagiarism detection, authorship authentication and question answering. Given two sentences, the objective is to detect whether they are semantically identical. An important insight from this work is that existing paraphrase systems perform well when applied on clean texts, but they do not necessarily deliver good performance against noisy texts. Challenges with paraphrase detection on user generated short texts, such as Twitter, include language irregularity and noise. To cope with these challenges, we propose a novel deep neural network-based approach that relies on coarse-grained sentence modeling using a convolutional neural network and a long short-term memory model, combined with a specific fine-grained word-level similarity matching model. Our experimental results show that the proposed approach outperforms existing state-of-the-art approaches on user-generated noisy social media data, such as Twitter texts, and achieves highly competitive performance on a cleaner corpus.
△ Less
Submitted 7 December, 2017;
originally announced December 2017.
-
Icon Based Information Retrieval and Disease Identification in Agriculture
Authors:
Namita Mittal,
Basant Agarwal,
Ajay Gupta,
Hemant Madhur
Abstract:
Recent developments in the ICT industry in past few decades has enabled the quick and easy access to the information available on the internet. But, digital literacy is the pre-requisite for its use. The main purpose of this paper is to provide an interface for digitally illiterate users, especially farmers to efficiently and effectively retrieve information through Internet. In addition, to enabl…
▽ More
Recent developments in the ICT industry in past few decades has enabled the quick and easy access to the information available on the internet. But, digital literacy is the pre-requisite for its use. The main purpose of this paper is to provide an interface for digitally illiterate users, especially farmers to efficiently and effectively retrieve information through Internet. In addition, to enable the farmers to identify the disease in their crop, its cause and symptoms using digital image processing and pattern recognition instantly without waiting for an expert to visit the farms and identify the disease.
△ Less
Submitted 7 April, 2014;
originally announced April 2014.