-
Quantum Machine Learning with Application to Progressive Supranuclear Palsy Network Classification
Authors:
Papri Saha
Abstract:
Machine learning and quantum computing are being progressively explored to shed light on possible computational approaches to deal with hitherto unsolvable problems. Classical methods for machine learning are ubiquitous in pattern recognition, with support vector machines (SVMs) being a prominent technique for network classification. However, there are limitations to the successful resolution of such classification instances when the input feature space becomes large and the successive evaluation of so-called kernel functions becomes computationally exorbitant. The use of principal component analysis (PCA) substantially reduces the dimensionality of the feature space, thereby enabling computational speed-ups of supervised learning, that is, the creation of a classifier. Further, the application of quantum-based learning to the PCA-reduced input feature space might offer an exponential speedup with fewer parameters. The present learning model is evaluated on a real clinical application: the diagnosis of Progressive Supranuclear Palsy (PSP) disorder. The results suggest that quantum machine learning yields a noticeable improvement over classical frameworks. The optimized variational quantum classifier classifies the PSP dataset with 86% accuracy, higher than that of a conventional SVM. The other technique, a quantum kernel estimator, approximates the kernel function on the quantum machine and optimizes a classical SVM. In particular, we have demonstrated the successful application of the present model on both a quantum simulator and real chips of the IBM quantum platform.
Submitted 6 July, 2024;
originally announced July 2024.
-
Demarked: A Strategy for Enhanced Abusive Speech Moderation through Counterspeech, Detoxification, and Message Management
Authors:
Seid Muhie Yimam,
Daryna Dementieva,
Tim Fischer,
Daniil Moskovskiy,
Naquee Rizwan,
Punyajoy Saha,
Sarthak Roy,
Martin Semmann,
Alexander Panchenko,
Chris Biemann,
Animesh Mukherjee
Abstract:
Despite regulations imposed by nations and social media platforms, such as recent EU regulations targeting digital violence, abusive content persists as a significant challenge. Existing approaches primarily rely on binary solutions, such as outright blocking or banning, yet fail to address the complex nature of abusive speech. In this work, we propose a more comprehensive approach called Demarcation, which scores abusive speech based on four aspects: (i) severity scale; (ii) presence of a target; (iii) context scale; and (iv) legal scale. It then suggests further options for action, such as detoxification, counterspeech generation, blocking, or, as a final measure, human intervention. Through a thorough analysis of abusive speech regulations across diverse jurisdictions, platforms, and research papers, we highlight the gap in preventive measures and advocate for tailored proactive steps to combat its multifaceted manifestations. Our work aims to inform future strategies for effectively addressing abusive speech online.
Submitted 27 June, 2024;
originally announced June 2024.
-
The Promise of Analog Deep Learning: Recent Advances, Challenges and Opportunities
Authors:
Aditya Datar,
Pramit Saha
Abstract:
Much of present-day Artificial Intelligence (AI) utilizes artificial neural networks, which are sophisticated computational models designed to recognize patterns and solve complex problems by learning from data. However, a major bottleneck occurs during a device's calculation of weighted sums for forward propagation and of the optimization procedure for backpropagation, especially for deep neural networks, that is, networks with numerous layers. Exploration of different methods of implementing neural networks is necessary for further advancement of the area. While a great deal of research exists on AI hardware in both directions, analog and digital, much of the existing survey work lacks discussion of the progress of analog deep learning. To this end, we attempt to evaluate and specify the advantages and disadvantages of analog implementations, along with their current progress with regard to deep learning. In this paper, we focus on a comprehensive examination of eight distinct analog deep learning methodologies across multiple key parameters. These parameters include attained accuracy levels, application domains, algorithmic advancements, computational speed, and considerations of energy efficiency and power consumption. We also identify the neural network-based experiments implemented using these hardware devices and discuss the comparative performance achieved by the different analog deep learning methods, along with an analysis of their current limitations. Overall, we find that analog deep learning has great potential for future consumer-level applications, but there is still a long road ahead in terms of scalability. Most of the current implementations are proofs of concept and are not yet practically deployable for large-scale models.
Submitted 13 June, 2024;
originally announced June 2024.
-
Feasibility of Federated Learning from Client Databases with Different Brain Diseases and MRI Modalities
Authors:
Felix Wagner,
Wentian Xu,
Pramit Saha,
Ziyun Liang,
Daniel Whitehouse,
David Menon,
Natalie Voets,
J. Alison Noble,
Konstantinos Kamnitsas
Abstract:
Segmentation models for brain lesions in MRI are commonly developed for a specific disease and trained on data with a predefined set of MRI modalities. Each such model cannot segment the disease using data with a different set of MRI modalities, nor can it segment any other type of disease. Moreover, this training paradigm does not allow a model to benefit from learning from heterogeneous databases that may contain scans and segmentation labels for different types of brain pathologies and diverse sets of MRI modalities. Is it feasible to use Federated Learning (FL) for training a single model on client databases that contain scans and labels of different brain pathologies and diverse sets of MRI modalities? We demonstrate promising results by combining appropriate, simple, and practical modifications to the model and training strategy: designing a model with input channels that cover the whole set of modalities available across clients, training with random modality drop, and exploring the effects of feature normalization methods. Evaluation on 7 brain MRI databases with 5 different diseases shows that such an FL framework can train a single model that is very promising in segmenting all disease types seen during training. Importantly, it is able to segment these diseases in new databases that contain sets of modalities different from those in training clients. These results demonstrate, for the first time, the feasibility and effectiveness of using FL to train a single segmentation model on decentralised data with diverse brain diseases and MRI modalities, a necessary step towards leveraging heterogeneous real-world databases. Code will be made available at: https://github.com/FelixWag/FL-MultiDisease-MRI
Submitted 17 June, 2024;
originally announced June 2024.
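The "random modality drop" step named in the abstract above can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the modality names, the drop probability, and the function name `random_modality_drop` are all assumptions made for the example, and each scan is simplified to a flat list of intensities.

```python
import random

# Hypothetical union of MRI modalities across all clients (an assumption).
ALL_MODALITIES = ["T1", "T1c", "T2", "FLAIR"]


def random_modality_drop(sample, drop_prob=0.5, rng=random):
    """Return a copy of `sample` with some available modalities zeroed out.

    `sample` maps a modality name to a list of voxel intensities.
    Modalities the client does not have are filled with zeros, so the
    output always has one channel per entry of ALL_MODALITIES.
    At least one available modality is always kept.
    """
    available = [m for m in ALL_MODALITIES if m in sample]
    if not available:
        raise ValueError("sample contains no known modality")
    length = len(sample[available[0]])

    # Keep each available modality with probability 1 - drop_prob.
    keep = [m for m in available if rng.random() >= drop_prob]
    if not keep:
        # Never drop everything: fall back to one random modality.
        keep = [rng.choice(available)]

    return {
        m: list(sample[m]) if m in keep else [0.0] * length
        for m in ALL_MODALITIES
    }
```

A client holding only T1 and T2 scans would call `random_modality_drop({"T1": [...], "T2": [...]})` once per training sample; the missing channels stay at zero, so a single model can accept inputs from every client.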
-
Video-based Exercise Classification and Activated Muscle Group Prediction with Hybrid X3D-SlowFast Network
Authors:
Manvik Pasula,
Pramit Saha
Abstract:
This paper introduces a simple yet effective strategy for exercise classification and muscle group activation prediction (MGAP). These tasks have significant implications for personal fitness, facilitating more affordable, accessible, safer, and simpler exercise routines. This is particularly relevant for novices and individuals with disabilities. Previous research in the field is mostly dominated by the reliance on mounted sensors and a limited scope of exercises, reducing practicality for everyday use. Furthermore, existing MGAP methodologies suffer from a similar dependency on sensors and a restricted range of muscle groups, often excluding strength training exercises, which are pivotal for a comprehensive fitness regimen. Addressing these limitations, our research employs a video-based deep learning framework that encompasses a broad spectrum of exercises and muscle groups, including those vital for strength training. Utilizing the "Workout/Exercises Video" dataset, our approach integrates the X3D and SlowFast video activity recognition models in an effective way to enhance exercise classification and MGAP performance. Our findings demonstrate that this hybrid method obtained via weighted ensemble outperforms existing baseline models in accuracy. Pretrained models play a crucial role in enhancing overall performance, with optimal channel reduction values for the SlowFast model identified near 10. Through an ablation study that explores fine-tuning, we further elucidate the interrelation between the two tasks. Our composite model, a weighted-average ensemble of X3D and SlowFast, sets a new benchmark in both exercise classification and MGAP across all evaluated categories, offering a robust solution to the limitations of previous approaches.
Submitted 10 June, 2024;
originally announced June 2024.
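The weighted-average ensemble described in the abstract above can be illustrated with a short sketch. It assumes each model outputs a list of class probabilities for the same clip; the function names, the example probabilities, and the weight value are hypothetical and are not taken from the paper.

```python
def weighted_ensemble(probs_a, probs_b, weight_a=0.5):
    """Blend two per-class probability lists with a convex combination.

    `weight_a` is the weight given to the first model; the second model
    receives 1 - weight_a.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same classes")
    return [
        weight_a * a + (1.0 - weight_a) * b
        for a, b in zip(probs_a, probs_b)
    ]


def predict(probs):
    """Return the index of the highest-scoring class."""
    return max(range(len(probs)), key=probs.__getitem__)


# Illustrative scores only (not results from the paper).
x3d_probs = [0.6, 0.3, 0.1]
slowfast_probs = [0.2, 0.7, 0.1]
blended = weighted_ensemble(x3d_probs, slowfast_probs, weight_a=0.5)
print(predict(blended))
```

In practice the weight would be tuned on a validation set; a convex combination keeps the blended scores a valid probability distribution whenever both inputs are.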
-
Lumbar Spine Tumor Segmentation and Localization in T2 MRI Images Using AI
Authors:
Rikathi Pal,
Sudeshna Mondal,
Aditi Gupta,
Priya Saha,
Somoballi Ghoshal,
Amlan Chakrabarti,
Susmita Sur-Kolay
Abstract:
In medical imaging, segmentation and localization of spinal tumors in three-dimensional (3D) space pose significant computational challenges, primarily stemming from limited data availability. In response, this study introduces a novel data augmentation technique, aimed at automating spine tumor segmentation and localization through AI approaches. Leveraging a fusion of fuzzy c-means clustering and Random Forest algorithms, the proposed method achieves successful spine tumor segmentation based on predefined masks initially delineated by domain experts in medical imaging. Subsequently, a Convolutional Neural Network (CNN) architecture is employed for tumor classification. Moreover, 3D vertebral segmentation and labeling techniques are used to help pinpoint the exact location of the tumors in the lumbar spine. Results indicate a remarkable performance, with 99% accuracy for tumor segmentation, 98% accuracy for tumor classification, and 99% accuracy for tumor localization achieved with the proposed approach. These metrics surpass the efficacy of existing state-of-the-art techniques, as evidenced by superior Dice Score, Class Accuracy, and Intersection over Union (IOU) on class accuracy metrics. This innovative methodology holds promise for enhancing the diagnostic capabilities in detecting and characterizing spinal tumors, thereby facilitating more effective clinical decision-making.
Submitted 7 May, 2024;
originally announced May 2024.
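For readers unfamiliar with the overlap metrics named in the abstract above, the Dice score and Intersection over Union can be computed as in the sketch below. It uses the standard textbook definitions on flattened binary masks; the function names and the convention of returning 1.0 for two empty masks are assumptions of this example, not details from the paper.

```python
def dice_score(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def iou_score(pred, truth):
    """IoU = |A ∩ B| / |A ∪ B| for two binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return 1.0 if union == 0 else intersection / union
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank segmentations identically, but Dice is always at least as large as IoU.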
-
Panoptic Segmentation and Labelling of Lumbar Spine Vertebrae using Modified Attention Unet
Authors:
Rikathi Pal,
Priya Saha,
Somoballi Ghoshal,
Amlan Chakrabarti,
Susmita Sur-Kolay
Abstract:
Segmentation and labeling of vertebrae in MRI images of the spine are critical for the diagnosis of illnesses and abnormalities. These steps are indispensable as MRI technology provides detailed information about the tissue structure of the spine. Both supervised and unsupervised segmentation methods exist, yet acquiring sufficient data remains challenging for achieving high accuracy. In this study, we propose an enhanced approach based on a modified attention U-Net architecture for panoptic segmentation of 3D sliced MRI data of the lumbar spine. Our method achieves an accuracy of 99.5% by incorporating novel masking logic, thus significantly advancing the state of the art in vertebral segmentation and labeling. This contributes to more precise and reliable diagnosis and treatment planning.
Submitted 28 April, 2024;
originally announced April 2024.
-
Translation-based Video-to-Video Synthesis
Authors:
Pratim Saha,
Chengcui Zhang
Abstract:
Translation-based Video Synthesis (TVS) has emerged as a vital research area in computer vision, aiming to facilitate the transformation of videos between distinct domains while preserving both temporal continuity and underlying content features. This technique has found wide-ranging applications, encompassing video super-resolution, colorization, segmentation, and more, by extending the capabilities of traditional image-to-image translation to the temporal domain. One of the principal challenges faced in TVS is the inherent risk of introducing flickering artifacts and inconsistencies between frames during the synthesis process. This is particularly challenging due to the necessity of ensuring smooth and coherent transitions between video frames. Efforts to tackle this challenge have led to the creation of diverse strategies and algorithms aimed at mitigating these unwanted consequences. This comprehensive review examines the latest progress in the realm of TVS. It thoroughly investigates emerging methodologies, shedding light on the fundamental concepts and mechanisms utilized for proficient video synthesis. This survey also illuminates their inherent strengths, limitations, appropriate applications, and potential avenues for future development.
Submitted 3 April, 2024;
originally announced April 2024.
-
On Zero-Shot Counterspeech Generation by LLMs
Authors:
Punyajoy Saha,
Aalok Agrawal,
Abhik Jana,
Chris Biemann,
Animesh Mukherjee
Abstract:
With the emergence of numerous Large Language Models (LLMs), the usage of such models in various Natural Language Processing (NLP) applications is increasing extensively. Counterspeech generation is one such key task, where efforts are made to develop generative models by fine-tuning LLMs with hate speech-counterspeech pairs, but none of these attempts explores the intrinsic properties of large language models in zero-shot settings. In this work, we present a comprehensive analysis of the performance of four LLMs, namely GPT-2, DialoGPT, ChatGPT and FlanT5, in zero-shot settings for counterspeech generation, which is the first of its kind. For GPT-2 and DialoGPT, we further investigate the deviation in performance with respect to the sizes (small, medium, large) of the models. In addition, we propose three different prompting strategies for generating different types of counterspeech and analyse the impact of such strategies on the performance of the models. Our analysis shows that, as model size increases, generation quality improves for two datasets (17%), but toxicity also increases (25%). Considering the type of model, GPT-2 and FlanT5 are significantly better in terms of counterspeech quality but also have high toxicity compared to DialoGPT. ChatGPT is much better at generating counterspeech than the other models across all metrics. In terms of prompting, we find that our proposed strategies help in improving counterspeech generation across all the models.
Submitted 22 March, 2024;
originally announced March 2024.
-
Effect of Leaders Voice on Financial Market: An Empirical Deep Learning Expedition on NASDAQ, NSE, and Beyond
Authors:
Arijit Das,
Tanmoy Nandi,
Prasanta Saha,
Suman Das,
Saronyo Mukherjee,
Sudip Kumar Naskar,
Diganta Saha
Abstract:
Financial markets, such as the prices of stocks, shares, gold, oil, and mutual funds, are affected by news and posts on social media. In this work, deep learning-based models are proposed to predict the trend of a financial market based on NLP analysis of the Twitter handles of leaders in different fields. Many models are available that predict financial markets based only on the historical data of the financial component; the main objective of the present work is to combine historical data with news and social media posts, such as those on Twitter. Substantial improvement is shown in the results. The main features of the present work are: a) proposing a completely generalized algorithm that is able to generate models for any Twitter handle and any financial component, b) predicting the time window for a tweet's effect on a stock price, and c) analyzing the effect of multiple Twitter handles for predicting the trend. A detailed survey is done to find the latest work in recent years in similar fields, identify the research gap, and collect the required data for analysis and prediction. A state-of-the-art algorithm is proposed, and a complete implementation with its environment is given. An insightful trend of improvement in the results when considering the NLP analysis of Twitter data on financial market components is shown. The Indian and USA financial markets are explored in the present work, whereas other markets can be taken up in the future. The socio-economic impact of the present work is discussed in the conclusion.
Submitted 18 March, 2024;
originally announced March 2024.
-
InfFeed: Influence Functions as a Feedback to Improve the Performance of Subjective Tasks
Authors:
Somnath Banerjee,
Maulindu Sarkar,
Punyajoy Saha,
Binny Mathew,
Animesh Mukherjee
Abstract:
Recently, influence functions have been presented as an apparatus for achieving explainability for deep neural models by quantifying how perturbing individual training instances might impact a test prediction. Our objectives in this paper are twofold. First, we incorporate influence functions as feedback into the model to improve its performance. Second, in a dataset extension exercise, we use influence functions to automatically identify data points that have been initially 'silver' annotated by some existing method and need to be cross-checked (and corrected) by annotators to improve the model performance. To meet these objectives, we introduce InfFeed, which uses influence functions to compute the influential instances for a target instance. Toward the first objective, we adjust the label of the target instance based on the labels of its influencer(s). In doing this, InfFeed outperforms the state-of-the-art baselines (including LLMs) by a maximum macro F1-score margin of almost 4% for hate speech classification, 3.5% for stance classification, 3% for irony detection, and 2% for sarcasm detection. Toward the second objective, we show that manually re-annotating only those silver-annotated data points in the extension set that have a negative influence can immensely improve the model performance, bringing it very close to the scenario where all the data points in the extension set have gold labels. This allows for a huge reduction in the number of data points that need to be manually annotated, since, out of the silver-annotated extension dataset, the influence function scheme picks up ~1/1000 of the points as needing manual correction.
Submitted 9 March, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Zero shot VLMs for hate meme detection: Are we there yet?
Authors:
Naquee Rizwan,
Paramananda Bhaskar,
Mithun Das,
Swadhin Satyaprakash Majhi,
Punyajoy Saha,
Animesh Mukherjee
Abstract:
Multimedia content on social media is rapidly evolving, with memes gaining prominence as a distinctive form. Unfortunately, some malicious users exploit memes to target individuals or vulnerable communities, making it imperative to identify and address such instances of hateful memes. Extensive research has been conducted to address this issue by developing hate meme detection models. However, a notable limitation of traditional machine/deep learning models is the requirement for labeled datasets for accurate classification. Recently, the research community has witnessed the emergence of several visual language models (VLMs) that have exhibited outstanding performance across various tasks. In this study, we aim to investigate the efficacy of these visual language models in handling intricate tasks such as hate meme detection. We use various prompt settings to focus on zero-shot classification of hateful/harmful memes. Through our analysis, we observe that large VLMs are still vulnerable in zero-shot hate meme detection.
Submitted 19 February, 2024;
originally announced February 2024.
-
Low-Resource Counterspeech Generation for Indic Languages: The Case of Bengali and Hindi
Authors:
Mithun Das,
Saurabh Kumar Pandey,
Shivansh Sethi,
Punyajoy Saha,
Animesh Mukherjee
Abstract:
With the rise of online abuse, the NLP community has begun investigating the use of neural architectures to generate counterspeech that can "counter" the vicious tone of such abusive speech and dilute/ameliorate their rippling effect over the social network. However, most of the efforts so far have been primarily focused on English. To bridge the gap for low-resource languages such as Bengali and Hindi, we create a benchmark dataset of 5,062 abusive speech/counterspeech pairs, of which 2,460 pairs are in Bengali and 2,602 pairs are in Hindi. We implement several baseline models considering various interlingual transfer mechanisms with different configurations to generate suitable counterspeech to set up an effective benchmark. We observe that the monolingual setup yields the best performance. Further, using synthetic transfer, language models can generate counterspeech to some extent; specifically, we notice that transferability is better when languages belong to the same language family.
Submitted 11 February, 2024;
originally announced February 2024.
-
Examining Modality Incongruity in Multimodal Federated Learning for Medical Vision and Language-based Disease Detection
Authors:
Pramit Saha,
Divyanshu Mishra,
Felix Wagner,
Konstantinos Kamnitsas,
J. Alison Noble
Abstract:
Multimodal Federated Learning (MMFL) utilizes multiple modalities in each client to build a more powerful Federated Learning (FL) model than its unimodal counterpart. However, the impact of missing modality in different clients, also called modality incongruity, has been greatly overlooked. This paper, for the first time, analyses the impact of modality incongruity and reveals its connection with data heterogeneity across participating clients. We particularly inspect whether incongruent MMFL with unimodal and multimodal clients is more beneficial than unimodal FL. Furthermore, we examine three potential routes of addressing this issue. Firstly, we study the effectiveness of various self-attention mechanisms towards incongruity-agnostic information fusion in MMFL. Secondly, we introduce a modality imputation network (MIN) pre-trained in a multimodal client for modality translation in unimodal clients and investigate its potential towards mitigating the missing modality problem. Thirdly, we assess the capability of client-level and server-level regularization techniques towards mitigating modality incongruity effects. Experiments are conducted under several MMFL settings on two publicly available real-world datasets, MIMIC-CXR and Open-I, with Chest X-Ray and radiology reports.
Submitted 7 February, 2024;
originally announced February 2024.
-
Investigating YOLO Models Towards Outdoor Obstacle Detection For Visually Impaired People
Authors:
Chenhao He,
Pramit Saha
Abstract:
The utilization of deep learning-based object detection is an effective approach to assist visually impaired individuals in avoiding obstacles. In this paper, we implemented seven different YOLO object detection models, viz., YOLO-NAS (small, medium, large), YOLOv8, YOLOv7, YOLOv6, and YOLOv5, and performed a comprehensive evaluation with carefully tuned hyperparameters to analyze how these models performed on images containing common daily-life objects present on roads and sidewalks. After a systematic investigation, YOLOv8 was found to be the best model, reaching a precision of 80% and a recall of 68.2% on a well-known Obstacle Dataset, which includes images from the VOC, COCO, and TT100K datasets along with images collected by the researchers in the field. Despite being the latest model and demonstrating better performance in many other applications, YOLO-NAS was found to be suboptimal for the obstacle detection task.
Submitted 10 December, 2023;
originally announced December 2023.
-
Forecasting Lithium-Ion Battery Longevity with Limited Data Availability: Benchmarking Different Machine Learning Algorithms
Authors:
Hudson Hilal,
Pramit Saha
Abstract:
As the use of Lithium-ion batteries continues to grow, it becomes increasingly important to be able to predict their remaining useful life. This work aims to compare the relative performance of different machine learning algorithms, both traditional machine learning and deep learning, in order to determine the best-performing algorithms for battery cycle life prediction based on minimal data. We investigated 14 different machine learning models that were fed handcrafted features based on statistical data and split into 3 feature groups for testing. For deep learning models, we tested a variety of neural network models including different configurations of standard Recurrent Neural Networks, Gated Recurrent Units, and Long Short Term Memory with and without attention mechanism. Deep learning models were fed multivariate time series signals based on the raw data for each battery across the first 100 cycles. Our experiments revealed that the machine learning algorithms on handcrafted features performed particularly well, resulting in 10-20% average mean absolute percentage error. The best-performing algorithm was the Random Forest Regressor, which gave a minimum 9.8% mean absolute percentage error. Traditional machine learning models excelled due to their capability to comprehend general data set trends. In comparison, deep learning models were observed to perform particularly poorly on raw, limited data. Algorithms like GRU and RNNs that focused on capturing medium-range data dependencies were less adept at recognizing the gradual, slow trends critical for this task. Our investigation reveals that implementing machine learning models with hand-crafted features proves to be more effective than advanced deep learning models for predicting the remaining useful Lithium-ion battery life with limited data availability.
Submitted 9 December, 2023;
originally announced December 2023.
-
How Hard Is Squash? -- Towards Information Theoretic Analysis of Motor Behavior in Squash
Authors:
Kavya Anand,
Pramit Saha
Abstract:
Fitts' law has been widely employed as a research method for analyzing tasks within the domain of Human-Computer Interaction (HCI). However, its application to non-computer tasks has remained limited. This study aims to extend the application of Fitts' law to the realm of sports, specifically focusing on squash. Squash is a high-intensity sport that requires quick movements and precise shots. Our research investigates the effectiveness of utilizing Fitts' law to evaluate the task difficulty and effort level associated with executing and responding to various squash shots. By understanding the effort/information rate required for each shot, we can determine which shots are more effective in making the opponent work harder. Additionally, this knowledge can be valuable for coaches in designing training programs. However, since Fitts' law was primarily developed for human-computer interaction, we adapted it to fit the squash scenario. This paper provides an overview of Fitts' law and its relevance to sports, elucidates the motivation driving this investigation, outlines the methodology employed to explore this novel avenue, and presents the obtained results, concluding with key insights. We conducted experiments with different shots and players, collecting data on shot speed, player movement time, and distance traveled. Using this data, we formulated a modified version of Fitts' law specifically for squash. The results provide insights into the difficulty and effectiveness of various shots, offering valuable information for both players and coaches in the sport of squash.
Submitted 1 November, 2023;
originally announced November 2023.
-
Dual Conditioned Diffusion Models for Out-Of-Distribution Detection: Application to Fetal Ultrasound Videos
Authors:
Divyanshu Mishra,
He Zhao,
Pramit Saha,
Aris T. Papageorghiou,
J. Alison Noble
Abstract:
Out-of-distribution (OOD) detection is essential to improve the reliability of machine learning models by detecting samples that do not belong to the training distribution. Detecting OOD samples effectively in certain tasks can pose a challenge because of the substantial heterogeneity within the in-distribution (ID), and the high structural similarity between ID and OOD classes. For instance, when detecting heart views in fetal ultrasound videos there is a high structural similarity between the heart and other anatomies such as the abdomen, and large in-distribution variance as a heart has 5 distinct views and structural variations within each view. To detect OOD samples in this context, the resulting model should generalise to the intra-anatomy variations while rejecting similar OOD samples. In this paper, we introduce dual-conditioned diffusion models (DCDM) where we condition the model on in-distribution class information and latent features of the input image for reconstruction-based OOD detection. This constrains the generative manifold of the model to generate images structurally and semantically similar to those within the in-distribution. The proposed model outperforms reference methods with a 12% improvement in accuracy, 22% higher precision, and an 8% better F1 score.
Submitted 1 November, 2023;
originally announced November 2023.
-
Rethinking Semi-Supervised Federated Learning: How to co-train fully-labeled and fully-unlabeled client imaging data
Authors:
Pramit Saha,
Divyanshu Mishra,
J. Alison Noble
Abstract:
The most challenging, yet practical, setting of semi-supervised federated learning (SSFL) is where a few clients have fully labeled data whereas the other clients have fully unlabeled data. This is particularly common in healthcare settings where collaborating partners (typically hospitals) may have images but not annotations. The bottleneck in this setting is the joint training of labeled and unlabeled clients as the objective function for each client varies based on the availability of labels. This paper investigates an alternative way for effective training with labeled and unlabeled clients in a federated setting. We propose a novel learning scheme specifically designed for SSFL which we call Isolated Federated Learning (IsoFed) that circumvents the problem by avoiding simple averaging of supervised and semi-supervised models together. In particular, our training approach consists of two parts - (a) isolated aggregation of labeled and unlabeled client models, and (b) local self-supervised pretraining of isolated global models in all clients. We evaluate our model performance on medical image datasets of four different modalities publicly available within the biomedical image classification benchmark MedMNIST. We further vary the proportion of labeled clients and the degree of heterogeneity to demonstrate the effectiveness of the proposed method under varied experimental settings.
Submitted 28 October, 2023;
originally announced October 2023.
-
Probing LLMs for hate speech detection: strengths and vulnerabilities
Authors:
Sarthak Roy,
Ashish Harshavardhan,
Animesh Mukherjee,
Punyajoy Saha
Abstract:
Recently, efforts have been made by social media platforms as well as researchers to detect hateful or toxic language using large language models. However, none of these works aim to use explanations, additional context, and victim community information in the detection process. We utilise different prompt variations and input information, and evaluate large language models in a zero-shot setting (without adding any in-context examples). We select three large language models (GPT-3.5, text-davinci and Flan-T5) and three datasets - HateXplain, implicit hate and ToxicSpans. We find that, on average, including the target information in the pipeline improves the model performance substantially (~20-30%) over the baseline across the datasets. There is also a considerable effect of adding the rationales/explanations into the pipeline (~10-20%) over the baseline across the datasets. In addition, we provide a typology of the error cases where these large language models fail to (i) classify and (ii) explain the reason for the decisions they take. Such vulnerable points automatically constitute 'jailbreak' prompts for these models, and industry-scale safeguard techniques need to be developed to make the models robust against such prompts.
Submitted 28 October, 2023; v1 submitted 19 October, 2023;
originally announced October 2023.
-
Using ChatGPT in HCI Research -- A Trioethnography
Authors:
Smit Desai,
Tanusree Sharma,
Pratyasha Saha
Abstract:
This paper explores the lived experience of using ChatGPT in HCI research through a month-long trioethnography. Our approach combines the expertise of three HCI researchers with diverse research interests to reflect on our daily experience of living and working with ChatGPT. Our findings are presented as three provocations grounded in our collective experiences and HCI theories. Specifically, we examine (1) the emotional impact of using ChatGPT, with a focus on frustration and embarrassment, (2) the absence of accountability and consideration of future implications in design, and raise (3) questions around bias from a Global South perspective. Our work aims to inspire critical discussions about utilizing ChatGPT in HCI research and advance equitable and inclusive technological development.
Submitted 21 September, 2023;
originally announced September 2023.
-
An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder
Authors:
Rownak Ara Rasul,
Promy Saha,
Diponkor Bala,
S M Rakib Ul Karim,
Md. Ibrahim Abdullah,
Bishwajit Saha
Abstract:
Autistic Spectrum Disorder (ASD) is a neurological disease characterized by difficulties with social interaction, communication, and repetitive activities. While its primary origin lies in genetics, early detection is crucial, and leveraging machine learning offers a promising avenue for a faster and more cost-effective diagnosis. This study employs diverse machine learning methods to identify crucial ASD traits, aiming to enhance and automate the diagnostic process. We study eight state-of-the-art classification models to determine their effectiveness in ASD detection. We evaluate the models using accuracy, precision, recall, specificity, F1-score, area under the curve (AUC), kappa, and log loss metrics to find the best classifier for these binary datasets. Among all the classification models, for the children dataset, the SVM and LR models achieve the highest accuracy of 100%, and for the adult dataset, the LR model produces the highest accuracy of 97.14%. Our proposed ANN model provides the highest accuracy of 94.24% for the new combined dataset when hyperparameters are precisely tuned for each model. As almost all classification models, which utilize true labels, achieve high accuracy, we also examine five popular clustering algorithms to understand model behavior in scenarios without true labels. We calculate Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and Silhouette Coefficient (SC) metrics to select the best clustering models. Our evaluation finds that spectral clustering outperforms all other benchmarking clustering models in terms of NMI and ARI metrics while demonstrating comparability to the optimal SC achieved by k-means. The implemented code is available on GitHub.
Submitted 28 December, 2023; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Post-Deployment Adaptation with Access to Source Data via Federated Learning and Source-Target Remote Gradient Alignment
Authors:
Felix Wagner,
Zeju Li,
Pramit Saha,
Konstantinos Kamnitsas
Abstract:
Deployment of Deep Neural Networks in medical imaging is hindered by distribution shift between training data and data processed after deployment, causing performance degradation. Post-Deployment Adaptation (PDA) addresses this by tailoring a pre-trained, deployed model to the target data distribution using limited labelled or entirely unlabelled target data, while assuming no access to source training data as they cannot be deployed with the model due to privacy concerns and their large size. This makes reliable adaptation challenging due to limited learning signal. This paper challenges this assumption and introduces FedPDA, a novel adaptation framework that brings the utility of learning from remote data from Federated Learning into PDA. FedPDA enables a deployed model to obtain information from source data via remote gradient exchange, while aiming to optimize the model specifically for the target domain. Tailored for FedPDA, we introduce a novel optimization method StarAlign (Source-Target Remote Gradient Alignment) that aligns gradients between source-target domain pairs by maximizing their inner product, to facilitate learning a target-specific model. We demonstrate the method's effectiveness using multi-center databases for the tasks of cancer metastases detection and skin lesion classification, where our method compares favourably to previous work. Code is available at: https://github.com/FelixWag/StarAlign
Submitted 31 August, 2023;
originally announced August 2023.
-
HateMM: A Multi-Modal Dataset for Hate Video Classification
Authors:
Mithun Das,
Rohit Raj,
Punyajoy Saha,
Binny Mathew,
Manish Gupta,
Animesh Mukherjee
Abstract:
Hate speech has become one of the most significant issues in modern society, having implications in both the online and the offline world. Due to this, hate speech research has recently gained a lot of traction. However, most of the work has primarily focused on text media with relatively little work on images and even less on videos. Thus, early-stage automated video moderation techniques are needed to handle the videos that are being uploaded to keep the platform safe and healthy. With a view to detecting and removing hateful content from video sharing platforms, our work focuses on hate video detection using multiple modalities. To this end, we curate ~43 hours of videos from BitChute and manually annotate them as hate or non-hate, along with the frame spans which could explain the labelling decision. To collect the relevant videos, we harnessed search keywords from hate lexicons. We observe various cues in images and audio of hateful videos. Further, we build deep learning multi-modal models to classify the hate videos and observe that using all the modalities of the videos improves the overall hate speech detection performance (accuracy=0.798, macro F1-score=0.790) by ~5.7% compared to the best uni-modal model in terms of macro F1 score. In summary, our work takes the first step toward understanding and modeling hateful videos on video hosting platforms such as BitChute.
Submitted 5 May, 2023;
originally announced May 2023.
-
On the rise of fear speech in online social media
Authors:
Punyajoy Saha,
Kiran Garimella,
Narla Komal Kalyan,
Saurabh Kumar Pandey,
Pauras Mangesh Meher,
Binny Mathew,
Animesh Mukherjee
Abstract:
Recently, social media platforms have been heavily moderated to prevent the spread of online hate speech, which is usually rich in toxic words and is directed toward an individual or a community. Owing to such heavy moderation, newer and more subtle techniques are being deployed. One of the most striking among these is fear speech. Fear speech, as the name suggests, attempts to incite fear about a target community. Although subtle, it might be highly effective, often pushing communities toward a physical conflict. Therefore, understanding its prevalence in social media is of paramount importance. This article presents a large-scale study to understand the prevalence of 400K fear speech and over 700K hate speech posts collected from Gab.com. Remarkably, users posting a large number of fear speech posts accrue more followers and occupy more central positions in social networks than users posting a large number of hate speech posts. They can also reach out to benign users more effectively than hate speech users through replies, reposts, and mentions. This connects to the fact that, unlike hate speech, fear speech has almost zero toxic content, making it look plausible. Moreover, while fear speech topics mostly portray a community as a perpetrator using a (fake) chain of argumentation, hate speech topics hurl direct multitarget insults, thus pointing to why general users could be more gullible to fear speech. Our findings extend even to other platforms (Twitter and Facebook) and thus necessitate using sophisticated moderation policies and mass awareness to combat fear speech.
Submitted 17 March, 2023;
originally announced March 2023.
-
HateProof: Are Hateful Meme Detection Systems really Robust?
Authors:
Piush Aggarwal,
Pranit Chawla,
Mithun Das,
Punyajoy Saha,
Binny Mathew,
Torsten Zesch,
Animesh Mukherjee
Abstract:
Exploiting social media to spread hate has tremendously increased over the years. Lately, multi-modal hateful content such as memes has drawn relatively more traction than uni-modal content. Moreover, the availability of implicit content payloads makes them fairly challenging to detect by existing hateful meme detection systems. In this paper, we present a use case study to analyze such systems' vulnerabilities against external adversarial attacks. We find that even very simple perturbations in uni-modal and multi-modal settings performed by humans with little knowledge about the model can make the existing detection models highly vulnerable. Empirically, we find a noticeable performance drop of as high as 10% in the macro-F1 score for certain attacks. As a remedy, we attempt to boost the model's robustness using contrastive learning as well as an adversarial training-based method - VILLA. Using an ensemble of the above two approaches, in two of our high resolution datasets, we are able to regain the performance to a large extent for certain attacks. We believe that ours is a first step toward addressing this crucial problem in an adversarial setting and would inspire more such investigations in the future.
Submitted 11 February, 2023;
originally announced February 2023.
-
Rationale-Guided Few-Shot Classification to Detect Abusive Language
Authors:
Punyajoy Saha,
Divyanshu Sheth,
Kushal Kedia,
Binny Mathew,
Animesh Mukherjee
Abstract:
Abusive language is a concerning problem in online social media. Past research on detecting abusive language covers different platforms, languages, demographies, etc. However, models trained using these datasets do not perform well in cross-domain evaluation settings. To overcome this, a common strategy is to use a few samples from the target domain to train models to get better performance in that domain (cross-domain few-shot training). However, this might cause the models to overfit the artefacts of those samples. A compelling solution could be to guide the models toward rationales, i.e., spans of text that justify the text's label. This method has been found to improve model performance in the in-domain setting across various NLP tasks. In this paper, we propose RGFS (Rationale-Guided Few-Shot Classification) for abusive language detection. We first build a multitask learning setup to jointly learn rationales, targets, and labels, and find a significant improvement of 6% macro F1 on the rationale detection task over training solely rationale classifiers. We introduce two rationale-integrated BERT-based architectures (the RGFS models) and evaluate our systems over five different abusive language datasets, finding that in the few-shot classification setting, RGFS-based models outperform baseline models by about 7% in macro F1 scores and perform competitively to models finetuned on other source domains. Furthermore, RGFS-based models outperform LIME/SHAP-based approaches in terms of plausibility and are close in performance in terms of faithfulness.
Submitted 27 July, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
Hate Speech and Offensive Language Detection in Bengali
Authors:
Mithun Das,
Somnath Banerjee,
Punyajoy Saha,
Animesh Mukherjee
Abstract:
Social media often serves as a breeding ground for various hateful and offensive content. Identifying such content on social media is crucial due to its impact on the race, gender, or religion in an unprejudiced society. However, while there is extensive research in hate speech detection in English, there is a gap in hateful content detection in low-resource languages like Bengali. Besides, a current trend on social media is the use of Romanized Bengali for regular interactions. To overcome the existing research's limitations, in this study, we develop an annotated dataset of 10K Bengali posts consisting of 5K actual and 5K Romanized Bengali tweets. We implement several baseline models for the classification of such hateful posts. We further explore the interlingual transfer mechanism to boost classification performance. Finally, we perform an in-depth error analysis by looking into the misclassified posts by the models. While training actual and Romanized datasets separately, we observe that XLM-Roberta performs the best. Further, we witness that on joint training and few-shot training, MuRIL outperforms other models by interpreting the semantic expressions better. We make our code and dataset public for others.
Submitted 7 October, 2022;
originally announced October 2022.
-
Road Rutting Detection using Deep Learning on Images
Authors:
Poonam Kumari Saha,
Deeksha Arya,
Ashutosh Kumar,
Hiroya Maeda,
Yoshihide Sekimoto
Abstract:
Road rutting is a severe road distress that can cause premature road failure, incurring early and costly maintenance. Research on road damage detection using image processing techniques and deep learning has been actively conducted in the past few years. However, this research has mostly focused on the detection of cracks, potholes, and their variants. Very little research has been done on the detection of road rutting. This paper proposes a novel road rutting dataset comprising 949 images and provides both object-level and pixel-level annotations. Object detection models and semantic segmentation models were deployed to detect road rutting on the proposed dataset, and quantitative and qualitative analyses of model predictions were done to evaluate model performance and identify challenges faced in the detection of road rutting using the proposed method. The object detection model YOLOX-s achieves mAP@IoU=0.5 of 61.6% and the semantic segmentation model PSPNet (Resnet-50) achieves IoU of 54.69 and accuracy of 72.67, thus providing a benchmark accuracy for similar work in the future. The proposed road rutting dataset and the results of our research study will help accelerate the research on detection of road rutting using deep learning.
Submitted 28 September, 2022;
originally announced September 2022.
-
Exploration of Parameter Spaces Assisted by Machine Learning
Authors:
A. Hammad,
Myeonghun Park,
Raymundo Ramos,
Pankaj Saha
Abstract:
We demonstrate two sampling procedures assisted by machine learning models via regression and classification. The main objective is the use of a neural network to suggest points likely inside regions of interest, reducing the number of evaluations of time consuming calculations. We compare results from this approach with results from other sampling methods, namely Markov chain Monte Carlo and MultiNest, obtaining results that range from comparably similar to arguably better. In particular, we augment our classifier method with a boosting technique that rapidly increases the efficiency within a few iterations. We show results from our methods applied to a toy model and the type II 2HDM, using 3 and 7 free parameters, respectively. The code used for this paper and instructions are publicly available on the web.
Submitted 11 January, 2023; v1 submitted 20 July, 2022;
originally announced July 2022.
-
Which one is more toxic? Findings from Jigsaw Rate Severity of Toxic Comments
Authors:
Millon Madhur Das,
Punyajoy Saha,
Mithun Das
Abstract:
The proliferation of online hate speech has necessitated the creation of algorithms which can detect toxicity. Most of the past research focuses on this detection as a classification task, but assigning an absolute toxicity label is often tricky. Hence, a few past works transform the same task into a regression problem. This paper presents a comparative evaluation of different transformers and traditional machine learning models on a recently released toxicity severity measurement dataset by Jigsaw. We further demonstrate the issues with the model predictions using explainability analysis.
Submitted 27 June, 2022;
originally announced June 2022.
-
Rethinking Task-Incremental Learning Baselines
Authors:
Md Sazzad Hossain,
Pritom Saha,
Townim Faisal Chowdhury,
Shafin Rahman,
Fuad Rahman,
Nabeel Mohammed
Abstract:
It is common to have continuous streams of new data that need to be introduced in the system in real-world applications. The model needs to learn newly added capabilities (future tasks) while retaining the old knowledge (past tasks). Incremental learning has recently become increasingly appealing for this problem. Task-incremental learning is a kind of incremental learning where task identity of newly included task (a set of classes) remains known during inference. A common goal of task-incremental methods is to design a network that can operate on minimal size, maintaining decent performance. To manage the stability-plasticity dilemma, different methods utilize replay memory of past tasks, specialized hardware, regularization monitoring etc. However, these methods are still less memory efficient in terms of architecture growth or input data costs. In this study, we present a simple yet effective adjustment network (SAN) for task incremental learning that achieves near state-of-the-art performance while using minimal architectural size without using memory instances compared to previous state-of-the-art approaches. We investigate this approach on both 3D point cloud object (ModelNet40) and 2D image (CIFAR10, CIFAR100, MiniImageNet, MNIST, PermutedMNIST, notMNIST, SVHN, and FashionMNIST) recognition tasks and establish a strong baseline result for a fair comparison with existing methods. On both 2D and 3D domains, we also observe that SAN is primarily unaffected by different task orders in a task-incremental setting.
Submitted 23 May, 2022;
originally announced May 2022.
-
CounterGeDi: A controllable approach to generate polite, detoxified and emotional counterspeech
Authors:
Punyajoy Saha,
Kanishk Singh,
Adarsh Kumar,
Binny Mathew,
Animesh Mukherjee
Abstract:
Recently, many studies have tried to create generation models to assist counter speakers by providing counterspeech suggestions for combating the explosive proliferation of online hate. However, since these suggestions are from a vanilla generation model, they might not include the appropriate properties required to counter a particular hate speech instance. In this paper, we propose CounterGeDi - an ensemble of generative discriminators (GeDi) to guide the generation of a DialoGPT model toward more polite, detoxified, and emotionally laden counterspeech. We generate counterspeech using three datasets and observe significant improvement across different attribute scores. The politeness and detoxification scores increased by around 15% and 6% respectively, while the emotion in the counterspeech increased by at least 10% across all the datasets. We also experiment with triple-attribute control and observe significant improvement over single-attribute results when combining complementary attributes, e.g., politeness, joyfulness and detoxification. In all these experiments, the relevancy of the generated text does not deteriorate due to the application of these controls.
Submitted 9 May, 2022;
originally announced May 2022.
-
RADNet: A Deep Neural Network Model for Robust Perception in Moving Autonomous Systems
Authors:
Burhan A. Mudassar,
Sho Ko,
Maojingjing Li,
Priyabrata Saha,
Saibal Mukhopadhyay
Abstract:
Interactive autonomous applications require robustness of the perception engine to artifacts in unconstrained videos. In this paper, we examine the effect of camera motion on the task of action detection. We develop a novel ranking method to rank videos based on the degree of global camera motion. For the high ranking camera videos we show that the accuracy of action detection is decreased. We propose an action detection pipeline that is robust to the camera motion effect and verify it empirically. Specifically, we do actor feature alignment across frames and couple global scene features with local actor-specific features. We do feature alignment using a novel formulation of the Spatio-temporal Sampling Network (STSN) but with multi-scale offset prediction and refinement using a pyramid structure. We also propose a novel input dependent weighted averaging strategy for fusing local and global features. We show the applicability of our network on our dataset of moving camera videos with high camera motion (MOVE dataset) with a 4.1% increase in frame mAP and 17% increase in video mAP.
Submitted 30 April, 2022;
originally announced May 2022.
-
HateCheckHIn: Evaluating Hindi Hate Speech Detection Models
Authors:
Mithun Das,
Punyajoy Saha,
Binny Mathew,
Animesh Mukherjee
Abstract:
Due to the sheer volume of online hate, the AI and NLP communities have started building models to detect such hateful content. Recently, multilingual hate has become a major emerging challenge for automated detection, where code-mixing or more than one language is used for conversation on social media. Typically, hate speech detection models are evaluated by measuring their performance on the held-out test data using metrics such as accuracy and F1-score. While these metrics are useful, they make it difficult to identify where the model is failing and how to resolve it. To enable more targeted diagnostic insights into such multilingual hate speech models, we introduce a set of functionalities for the purpose of evaluation. The design of these functionalities is inspired by real-world conversations on social media. Considering Hindi as a base language, we craft test cases for each functionality. We name our evaluation dataset HateCheckHIn. To illustrate the utility of these functionalities, we test the state-of-the-art transformer-based m-BERT model and the Perspective API.
Submitted 30 April, 2022;
originally announced May 2022.
-
Unraveled Multilevel Transformation Networks for Predicting Sparsely-Observed Spatiotemporal Dynamics
Authors:
Priyabrata Saha,
Saibal Mukhopadhyay
Abstract:
In this paper, we address the problem of predicting complex, nonlinear spatiotemporal dynamics when available data is recorded at irregularly-spaced sparse spatial locations. Most of the existing deep learning models for modeling spatiotemporal dynamics are either designed for data in a regular grid or struggle to uncover the spatial relations from sparse and irregularly-spaced data sites. We propose a deep learning model that learns to predict unknown spatiotemporal dynamics using data from sparsely-distributed data sites. We base our approach on Radial Basis Function (RBF) collocation method which is often used for meshfree solution of partial differential equations (PDEs). The RBF framework allows us to unravel the observed spatiotemporal function and learn the spatial interactions among data sites on the RBF-space. The learned spatial features are then used to compose multilevel transformations of the raw observations and predict its evolution in future time steps. We demonstrate the advantage of our approach using both synthetic and real-world climate data.
Submitted 16 March, 2022;
originally announced March 2022.
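As a rough illustration of the RBF collocation idea mentioned in the abstract above, the following sketch fits Gaussian RBF weights to values observed at irregularly spaced sites and evaluates the resulting interpolant. It is not the authors' model: the Gaussian kernel, the shape parameter `eps`, the synthetic test function, and the helper names `fit_rbf` and `eval_rbf` are all assumptions chosen for the example.

```python
import numpy as np

def gaussian_rbf(r, eps):
    # Gaussian radial basis function of the distance r (assumed kernel choice).
    return np.exp(-(eps * r) ** 2)

def fit_rbf(sites, values, eps):
    # Solve the collocation system Phi w = values, where
    # Phi[i, j] = phi(||x_i - x_j||) over all pairs of data sites.
    dist = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    return np.linalg.solve(gaussian_rbf(dist, eps), values)

def eval_rbf(sites, weights, query, eps):
    # Evaluate the interpolant sum_j w_j * phi(||q - x_j||) at the query points.
    dist = np.linalg.norm(query[:, None, :] - sites[None, :, :], axis=-1)
    return gaussian_rbf(dist, eps) @ weights

# Irregular sites: a 5x5 grid on the unit square with small random offsets.
rng = np.random.default_rng(0)
g = np.linspace(0.0, 1.0, 5)
grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
sites = grid + rng.uniform(-0.05, 0.05, size=grid.shape)

# Synthetic observations at the sites (a hypothetical smooth field).
values = np.sin(2 * np.pi * sites[:, 0]) * np.cos(2 * np.pi * sites[:, 1])

weights = fit_rbf(sites, values, eps=4.0)
recon = eval_rbf(sites, weights, sites, eps=4.0)  # reproduces values at the sites
```

By construction the interpolant matches the observations at the data sites; how well it behaves between sites depends on the kernel and on `eps`, which would need tuning for real data.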
-
Abusive and Threatening Language Detection in Urdu using Boosting based and BERT based models: A Comparative Approach
Authors:
Mithun Das,
Somnath Banerjee,
Punyajoy Saha
Abstract:
Online hatred is a growing concern on many social media platforms. To address this issue, different social media platforms have introduced moderation policies for such content. They also employ moderators who can check the posts violating moderation policies and take appropriate action. Academicians in the abusive language research domain also perform various studies to detect such content better. Although there is extensive research in abusive language detection in English, there is a lacuna in abusive language detection in low-resource languages such as Hindi and Urdu. In the FIRE 2021 shared task "HASOC- Abusive and Threatening language detection in Urdu", the organizers propose an abusive language detection dataset in Urdu along with threatening language detection. In this paper, we explored several machine learning models such as XGBoost, LGBM, and m-BERT based models for abusive and threatening content detection in Urdu based on the shared task. We observed that a Transformer model specifically trained on an abusive language dataset in Arabic helps in getting the best performance. Our model came first for both abusive and threatening content detection, with F1 scores of 0.88 and 0.54, respectively.
Submitted 27 November, 2021;
originally announced November 2021.
-
Exploring Transformer Based Models to Identify Hate Speech and Offensive Content in English and Indo-Aryan Languages
Authors:
Somnath Banerjee,
Maulindu Sarkar,
Nancy Agrawal,
Punyajoy Saha,
Mithun Das
Abstract:
Hate speech is considered to be one of the major issues currently plaguing online social media. Repeated and repetitive exposure to hate speech has been shown to create physiological effects on the target users. Thus, hate speech, in all its forms, should be addressed on these platforms in order to maintain good health. In this paper, we explored several Transformer based machine learning models for the detection of hate speech and offensive content in English and Indo-Aryan languages at FIRE 2021. We explore several models such as mBERT, XLMR-large, and XLMR-base under the team name "Super Mario". Our models placed 2nd in the Code-Mixed Data set (Macro F1: 0.7107), 2nd in Hindi two-class classification (Macro F1: 0.7797), 4th in the English four-class category (Macro F1: 0.8006), and 12th in the English two-class category (Macro F1: 0.6447).
Submitted 27 November, 2021;
originally announced November 2021.
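The entry above reports results as Macro F1. For readers unfamiliar with the metric, here is a minimal sketch of how it is computed: the F1 score of each class is calculated separately and the per-class scores are averaged with equal weight. The function name `macro_f1` and the toy labels are illustrative assumptions, not taken from the shared task.

```python
def macro_f1(y_true, y_pred):
    # Unweighted mean of per-class F1 scores over all classes seen in either list.
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # true positives for class c
        fp = sum(t != c and p == c for t, p in pairs)  # false positives for class c
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives for class c
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy two-class example (hypothetical labels).
y_true = ["hate", "hate", "normal", "normal"]
y_pred = ["hate", "normal", "normal", "normal"]
score = macro_f1(y_true, y_pred)  # per-class F1: hate = 2/3, normal = 4/5
```

Because every class counts equally, macro averaging penalizes poor performance on rare classes more than accuracy does, which is why it is commonly reported for imbalanced hate speech datasets.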
-
You too Brutus! Trapping Hateful Users in Social Media: Challenges, Solutions & Insights
Authors:
Mithun Das,
Punyajoy Saha,
Ritam Dutt,
Pawan Goyal,
Animesh Mukherjee,
Binny Mathew
Abstract:
Hate speech is regarded as one of the crucial issues plaguing the online social media. The current literature on hate speech detection leverages primarily the textual content to find hateful posts and subsequently identify hateful users. However, this methodology disregards the social connections between users. In this paper, we run a detailed exploration of the problem space and investigate an array of models ranging from purely textual to graph based to finally semi-supervised techniques using Graph Neural Networks (GNN) that utilize both textual and graph-based features. We run exhaustive experiments on two datasets -- Gab, which is loosely moderated and Twitter, which is strictly moderated. Overall the AGNN model achieves 0.791 macro F1-score on the Gab dataset and 0.780 macro F1-score on the Twitter dataset using only 5% of the labeled instances, considerably outperforming all the other models including the fully supervised ones. We perform detailed error analysis on the best performing text and graph based models and observe that hateful users have unique network neighborhood signatures and the AGNN model benefits by paying attention to these signatures. This property, as we observe, also allows the model to generalize well across platforms in a zero-shot setting. Lastly, we utilize the best performing GNN model to analyze the evolution of hateful users and their targets over time in Gab.
Submitted 1 August, 2021;
originally announced August 2021.
-
Hate-Alert@DravidianLangTech-EACL2021: Ensembling strategies for Transformer-based Offensive language Detection
Authors:
Debjoy Saha,
Naman Paharia,
Debajit Chakraborty,
Punyajoy Saha,
Animesh Mukherjee
Abstract:
Social media often acts as a breeding ground for different forms of offensive content. For low-resource languages like Tamil, the situation is more complex due to the poor performance of multilingual or language-specific models and lack of proper benchmark datasets. Based on this shared task, Offensive Language Identification in Dravidian Languages at EACL 2021, we present an exhaustive exploration of different transformer models. We also provide a genetic algorithm technique for ensembling different models. Our ensembled models trained separately for each language secured the first position in Tamil, the second position in Kannada, and the first position in Malayalam sub-tasks. The models and codes are provided.
Submitted 19 February, 2021;
originally announced February 2021.
-
"Short is the Road that Leads from Fear to Hate": Fear Speech in Indian WhatsApp Groups
Authors:
Punyajoy Saha,
Binny Mathew,
Kiran Garimella,
Animesh Mukherjee
Abstract:
WhatsApp is the most popular messaging app in the world. Due to its popularity, WhatsApp has become a powerful and cheap tool for political campaigning being widely used during the 2019 Indian general election, where it was used to connect to the voters on a large scale. Along with the campaigning, there have been reports that WhatsApp has also become a breeding ground for harmful speech against various protected groups and religious minorities. Many such messages attempt to instil fear among the population about a specific (minority) community. According to research on inter-group conflict, such `fear speech' messages could have a lasting impact and might lead to real offline violence. In this paper, we perform the first large scale study on fear speech across thousands of public WhatsApp groups discussing politics in India. We curate a new dataset and try to characterize fear speech from this dataset. We observe that users writing fear speech messages use various events and symbols to create the illusion of fear among the reader about a target community. We build models to classify fear speech and observe that current state-of-the-art NLP models do not perform well at this task. Fear speech messages tend to spread faster and could potentially go undetected by classifiers built to detect traditional toxic speech due to their low toxic nature. Finally, using a novel methodology to target users with Facebook ads, we conduct a survey among the users of these WhatsApp groups to understand the types of users who consume and share fear speech. We believe that this work opens up new research questions that are very different from tackling hate speech which the research community has been traditionally involved in.
Submitted 7 February, 2021;
originally announced February 2021.
-
"Facebook Promotes More Harassment": Social Media Ecosystem, Skill and Marginalized Hijra Identity in Bangladesh
Authors:
Fayika Farhat Nova,
Michael Ann Devito,
Pratyasha Saha,
Kazi Shohanur Rashid,
Shashwata Roy Turzo,
Sadia Afrin,
Shion Guha
Abstract:
Social interaction across multiple online platforms is a challenge for gender and sexual minorities (GSM) due to the stigmatization they face, which increases the complexity of their self-presentation decisions. These online interactions and identity disclosures can be more complicated for GSM in non-Western contexts due to consequentially different audiences and perceived affordances by the users, and limited baseline understanding of the conflation of these two with local norms and the opportunities they practically represent. Using focus group discussions and semi-structured interviews, we engaged with 61 Hijra individuals from Bangladesh, a severely stigmatized GSM from South Asia, to understand their overall online participation and disclosure behaviors through the lens of personal social media ecosystems. We find that along with platform audiences, affordances, and norms, participant skill/knowledge and cultural influences also impact navigation through multiple platforms, resulting in differential benefits from privacy features. This impacts how Hijra perceive online spaces, and shapes their self-presentation and disclosure behaviors over time.
Content Warning: This paper discusses graphic contents (e.g. rape and sexual harassment) related to Hijra.
Submitted 4 February, 2021;
originally announced February 2021.
-
SPEAK WITH YOUR HANDS Using Continuous Hand Gestures to control Articulatory Speech Synthesizer
Authors:
Pramit Saha,
Debasish Ray Mohapatra,
Sidney Fels
Abstract:
This work presents our advancements in controlling an articulatory speech synthesis engine, viz., Pink Trombone, with hand gestures. Our interface translates continuous finger movements and wrist flexion into continuous speech using vocal tract area-function based articulatory speech synthesis. We use Cyberglove II with 18 sensors to capture the kinematic information of the wrist and the individual fingers, in order to control a virtual tongue. The coordinates and the bending values of the sensors are then utilized to fit a spline tongue model that smooths out the noisy values and outliers. Considering the upper palate as fixed and the spline model as the dynamically moving lower surface (tongue) of the vocal tract, we compute 1D area functional values that are fed to the Pink Trombone, generating continuous speech sounds. Therefore, by learning to manipulate one's wrist and fingers, one can learn to produce speech sounds just through one's hands, without the need for using the vocal tract.
Submitted 2 February, 2021;
originally announced February 2021.
-
Mining the online infosphere: A survey
Authors:
Sayantan Adak,
Souvic Chakraborty,
Paramtia Das,
Mithun Das,
Abhisek Dash,
Rima Hazra,
Binny Mathew,
Punyajoy Saha,
Soumya Sarkar,
Animesh Mukherjee
Abstract:
The evolution of AI-based systems and applications has pervaded everyday life, making decisions that have momentous impact on individuals and society. With the staggering growth of online data, often termed the Online Infosphere, it has become paramount to monitor the infosphere to ensure social good, as AI-based decisions are severely dependent on it. The goal of this survey is to provide a comprehensive review of some of the most important research areas related to the infosphere, focusing on the technical challenges and potential solutions. The survey also outlines some of the important future directions. We begin with discussions focused on the collaborative systems that have emerged within the infosphere, with a special thrust on Wikipedia. Next, we demonstrate how the infosphere has been instrumental in the growth of scientific citations and collaborations, thus fueling interdisciplinary research. Finally, we illustrate the issues related to the governance of the infosphere, such as the tackling of (a) rising hateful and abusive behavior and (b) bias and discrimination in different online platforms and news reporting.
Submitted 2 January, 2021;
originally announced January 2021.
-
HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection
Authors:
Binny Mathew,
Punyajoy Saha,
Seid Muhie Yimam,
Chris Biemann,
Pawan Goyal,
Animesh Mukherjee
Abstract:
Hate speech is a challenging issue plaguing the online social media. While better models for hate speech detection are continuously being developed, there is little research on the bias and interpretability aspects of hate speech. In this paper, we introduce HateXplain, the first benchmark hate speech dataset covering multiple aspects of the issue. Each post in our dataset is annotated from three different perspectives: the basic, commonly used 3-class classification (i.e., hate, offensive or normal), the target community (i.e., the community that has been the victim of hate speech/offensive speech in the post), and the rationales, i.e., the portions of the post on which their labelling decision (as hate, offensive or normal) is based. We utilize existing state-of-the-art models and observe that even models that perform very well in classification do not score high on explainability metrics like model plausibility and faithfulness. We also observe that models, which utilize the human rationales for training, perform better in reducing unintended bias towards target communities. We have made our code and dataset public at https://github.com/punyajoy/HateXplain
Submitted 12 April, 2022; v1 submitted 18 December, 2020;
originally announced December 2020.
-
A Deep Learning Approach for Predicting Spatiotemporal Dynamics From Sparsely Observed Data
Authors:
Priyabrata Saha,
Saibal Mukhopadhyay
Abstract:
In this paper, we consider the problem of learning prediction models for spatiotemporal physical processes driven by unknown partial differential equations (PDEs). We propose a deep learning framework that learns the underlying dynamics and predicts its evolution using sparsely distributed data sites. Deep learning has shown promising results in modeling physical dynamics in recent years. However, most of the existing deep learning methods for modeling physical dynamics either focus on solving known PDEs or require data in a dense grid when the governing PDEs are unknown. In contrast, our method focuses on learning prediction models for unknown PDE-driven dynamics only from sparsely observed data. The proposed method is spatial dimension-independent and geometrically flexible. We demonstrate our method in the forecasting task for the two-dimensional wave equation and the Burgers-Fisher equation in multiple geometries with different boundary conditions, and the ten-dimensional heat equation.
Submitted 1 May, 2021; v1 submitted 30 November, 2020;
originally announced November 2020.
-
Neural Identification for Control
Authors:
Priyabrata Saha,
Magnus Egerstedt,
Saibal Mukhopadhyay
Abstract:
We present a new method for learning a control law that stabilizes an unknown nonlinear dynamical system at an equilibrium point. We formulate a system identification task in a self-supervised learning setting that jointly learns a controller and a corresponding stable closed-loop dynamics hypothesis. The input-output behavior of the unknown dynamical system under random control inputs is used as the supervising signal to train the neural network-based system model and the controller. The proposed method relies on Lyapunov stability theory to generate a stable closed-loop dynamics hypothesis and the corresponding control law. We demonstrate our method on various nonlinear control problems such as n-link pendulum balancing and trajectory tracking, pendulum on cart balancing, and wheeled vehicle path following.
Submitted 15 March, 2022; v1 submitted 24 September, 2020;
originally announced September 2020.
-
Combating Misinformation in Bangladesh: Roles and Responsibilities as Perceived by Journalists, Fact-checkers, and Users
Authors:
Md Mahfuzul Haque,
Mohammad Yousuf,
Ahmed Shatil Alam,
Pratyasha Saha,
Syed Ishtiaque Ahmed,
Naeemul Hassan
Abstract:
There has been a growing interest within the CSCW community in understanding the characteristics of misinformation propagated through computational media, and in devising techniques to address the associated challenges. However, most work in this area has concentrated on cases in the Western world, leaving unaddressed a major portion of this problem that is situated in the Global South. This paper aims to broaden the scope of this discourse by focusing on this problem in the context of Bangladesh, a country in the Global South. The spread of misinformation on Facebook in Bangladesh, a country with a population over 163 million, has resulted in chaos, hate attacks, and killings. By interviewing journalists and fact-checkers, in addition to surveying the general public, we analyzed the current state of verifying misinformation in Bangladesh. Our findings show that most people in the 'news audience' want the news media to verify the authenticity of the information that they see online. However, the newspaper journalists say that fact-checking online information is not a part of their job, and it is also beyond their capacity given the amount of information being published online every day. We further find that the voluntary fact-checkers in Bangladesh are not equipped with sufficient infrastructural support to fill this gap. We show how our findings are connected to some of the core concerns of the CSCW community around social media, collaboration, infrastructural politics, and information inequality. From our analysis, we also suggest several pathways to increase the impact of fact-checking efforts through collaboration, technology design, and infrastructure development.
Submitted 27 August, 2020; v1 submitted 24 July, 2020;
originally announced July 2020.
-
Ultra2Speech -- A Deep Learning Framework for Formant Frequency Estimation and Tracking from Ultrasound Tongue Images
Authors:
Pramit Saha,
Yadong Liu,
Bryan Gick,
Sidney Fels
Abstract:
Thousands of individuals need surgical removal of their larynx due to critical diseases every year and therefore require an alternative form of communication to articulate speech sounds after the loss of their voice box. This work addresses the articulatory-to-acoustic mapping problem based on ultrasound (US) tongue images for the development of a silent-speech interface (SSI) that can provide them with assistance in their daily interactions. Our approach targets automatically extracting tongue movement information by selecting an optimal feature set from US images and mapping these features to the acoustic space. We use a novel deep learning architecture, which we call Ultrasound2Formant (U2F) Net, to map US tongue images from the US probe placed beneath a subject's chin to formants. It uses hybrid spatio-temporal 3D convolutions followed by feature shuffling, for the estimation and tracking of vowel formants from US images. The formant values are then utilized to synthesize continuous time-varying vowel trajectories via the Klatt Synthesizer. Our best model achieves an R-squared (R^2) measure of 99.96% for the regression task. Our network lays the foundation for an SSI as it successfully tracks the tongue contour automatically as an internal representation without any explicit annotation.
Submitted 29 June, 2020;
originally announced June 2020.
-
Machine learning dynamics of phase separation in correlated electron magnets
Authors:
Puhan Zhang,
Preetha Saha,
Gia-Wei Chern
Abstract:
We demonstrate machine-learning enabled large-scale dynamical simulations of electronic phase separation in the double-exchange system. This model, also known as the ferromagnetic Kondo lattice model, is believed to be relevant for the colossal magnetoresistance phenomenon. Real-space simulations of such inhomogeneous states with exchange forces computed from the electron Hamiltonian can be prohibitively expensive for large systems. Here we show that linear-scaling exchange field computation can be achieved using neural networks trained on datasets from exact calculations on small lattices. Our Landau-Lifshitz dynamics simulations based on machine-learning potentials nicely reproduce not only the nonequilibrium relaxation process, but also correlation functions that agree quantitatively with exact simulations. Our work paves the way for large-scale dynamical simulations of correlated electron systems using machine-learning models.
Submitted 7 June, 2020;
originally announced June 2020.