-
On Achievable Rates for the Shotgun Sequencing Channel with Erasures
Authors:
Hrishi Narayanan,
Prasad Krishnan,
Nita Parekh
Abstract:
In shotgun sequencing, the input string (typically, a long DNA sequence composed of nucleotide bases) is sequenced as multiple overlapping fragments of much shorter lengths (called \textit{reads}). Modelling the shotgun sequencing pipeline as a communication channel for DNA data storage, the capacity of this channel was identified in a recent work, assuming that the reads themselves are noiseless substrings of the original sequence. Modern shotgun sequencers, however, also output quality scores for each base read, indicating the confidence in its identification. Bases with low quality scores can be considered to be erased. Motivated by this, we consider the \textit{shotgun sequencing channel with erasures}, where each symbol in any read can be independently erased with some probability $\delta$. We identify achievable rates for this channel, using a random code construction and a decoder that uses typicality-like arguments to merge the reads.
Submitted 12 May, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
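The channel model described in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' construction: the uniform choice of read start positions, the fixed read length, the `?` erasure symbol, and the function names are all hypothetical choices made for illustration.

```python
import random

ERASURE = "?"  # hypothetical symbol standing in for an erased base


def shotgun_reads_with_erasures(sequence, num_reads, read_len, delta, rng=None):
    """Sample reads from `sequence` and erase each symbol independently.

    Assumptions (not from the abstract): every read has the same length and
    its start position is drawn uniformly at random.
    """
    rng = rng or random.Random()
    if not 0 < read_len <= len(sequence):
        raise ValueError("read_len must be between 1 and len(sequence)")
    reads = []
    for _ in range(num_reads):
        start = rng.randrange(len(sequence) - read_len + 1)
        fragment = sequence[start:start + read_len]
        # Each symbol is erased independently with probability delta.
        noisy = "".join(
            ERASURE if rng.random() < delta else base for base in fragment
        )
        reads.append(noisy)
    return reads


def is_consistent(read, sequence, start):
    """Check whether a read with erasures could have come from `start`."""
    window = sequence[start:start + len(read)]
    return len(window) == len(read) and all(
        r == ERASURE or r == s for r, s in zip(read, window)
    )
```

With `delta = 0` the reads are noiseless substrings, which matches the earlier noiseless setting; larger `delta` leaves fewer unerased symbols with which to align overlapping reads.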
-
Using Audio Data to Facilitate Depression Risk Assessment in Primary Health Care
Authors:
Adam Valen Levinson,
Abhay Goyal,
Roger Ho Chun Man,
Roy Ka-Wei Lee,
Koustuv Saha,
Nimay Parekh,
Frederick L. Altice,
Lam Yin Cheung,
Munmun De Choudhury,
Navin Kumar
Abstract:
Telehealth is a valuable tool for primary health care (PHC), where depression is a common condition. PHC is the first point of contact for most people with depression, but about 25% of diagnoses made by PHC physicians are inaccurate. Many other barriers also hinder depression detection and treatment in PHC. Artificial intelligence (AI) may help reduce depression misdiagnosis in PHC and improve overall diagnosis and treatment outcomes. Telehealth consultations often have video issues, such as poor connectivity or dropped calls. Audio-only telehealth is often more practical for lower-income patients who may lack stable internet connections. Thus, our study focused on using audio data to predict depression risk. The objectives were to: 1) Collect audio data from 24 people (12 with depression and 12 without mental health or major health condition diagnoses); 2) Build a machine learning model to predict depression risk. TPOT, an autoML tool, was used to select the best machine learning algorithm, which was the K-nearest neighbors classifier. The selected model had high performance in classifying depression risk (Precision: 0.98, Recall: 0.93, F1-Score: 0.96). These findings may lead to a range of tools to help screen for and treat depression. By developing tools to detect depression risk, patients can be routed to AI-driven chatbots for initial screenings. Partnerships with a range of stakeholders are crucial to implementing these solutions. Moreover, ethical considerations, especially around data privacy and potential biases in AI models, need to be at the forefront of any AI-driven intervention in mental health care.
Submitted 16 October, 2023;
originally announced October 2023.
-
ChatGPT and Bard Responses to Polarizing Questions
Authors:
Abhay Goyal,
Muhammad Siddique,
Nimay Parekh,
Zach Schwitzky,
Clara Broekaert,
Connor Michelotti,
Allie Wong,
Lam Yin Cheung,
Robin O Hanlon,
Munmun De Choudhury,
Roy Ka-Wei Lee,
Navin Kumar
Abstract:
Recent developments in natural language processing have demonstrated the potential of large language models (LLMs) to improve a range of educational and learning outcomes. Of recent chatbots based on LLMs, ChatGPT and Bard have made it clear that artificial intelligence (AI) technology will have significant implications for the way we obtain and search for information. However, these tools sometimes produce text that is convincing but incorrect, known as hallucinations. As such, their use can distort scientific facts and spread misinformation. To counter polarizing responses on these tools, it is critical to provide an overview of such responses so stakeholders can determine which topics tend to produce more contentious responses -- key to developing targeted regulatory policy and interventions. In addition, there currently exists no annotated dataset of ChatGPT and Bard responses around possibly polarizing topics, central to the above aims. We address the indicated issues through the following contribution: Focusing on highly polarizing topics in the US, we created and described a dataset of ChatGPT and Bard responses. Broadly, our results indicated a left-leaning bias for both ChatGPT and Bard, with Bard more likely to provide responses around polarizing topics. Bard seemed to have fewer guardrails around controversial topics, and appeared more willing to provide comprehensive and somewhat human-like responses. Bard may thus be more likely to be abused by malicious actors. Stakeholders may utilize our findings to mitigate misinformative and/or polarizing responses from LLMs.
Submitted 13 July, 2023;
originally announced July 2023.
-
How is Fatherhood Framed Online in Singapore?
Authors:
Tran Hien Van,
Abhay Goyal,
Muhammad Siddique,
Lam Yin Cheung,
Nimay Parekh,
Jonathan Y Huang,
Keri McCrickerd,
Edson C Tandoc Jr.,
Gerard Chung,
Navin Kumar
Abstract:
The proliferation of discussion about fatherhood in Singapore attests to its significance, indicating the need for an exploration of how fatherhood is framed, aiding policy-making around fatherhood in Singapore. Sound and holistic policy around fatherhood in Singapore may reduce stigma and apprehension around being a parent, critical to improving the nation's flagging birth rate. We analyzed 15,705 articles and 56,221 posts to study how fatherhood is framed in Singapore across a range of online platforms (news outlets, parenting forums, Twitter). We used NLP techniques to understand these differences. While fatherhood was framed in a range of ways in the Singaporean online environment, it did not seem that fathers were framed as central to the Singaporean family unit. A strength of our work is how the different techniques we have applied validate each other.
Submitted 8 July, 2023;
originally announced July 2023.
-
Predicting Opioid Use Outcomes in Minoritized Communities
Authors:
Abhay Goyal,
Nimay Parekh,
Lam Yin Cheung,
Koustuv Saha,
Frederick L Altice,
Robin O'hanlon,
Roger Ho Chun Man,
Christian Poellabauer,
Honoria Guarino,
Pedro Mateu Gelabert,
Navin Kumar
Abstract:
Machine learning algorithms can sometimes exacerbate health disparities based on ethnicity, gender, and other factors. There has been limited work exploring potential biases within algorithms deployed on a small scale, and/or within minoritized communities. Understanding the nature of potential biases may improve the prediction of various health outcomes. As a case study, we used data from a sample of 539 young adults from minoritized communities who engaged in nonmedical use of prescription opioids and/or heroin. We addressed the indicated issues through the following contributions: 1) Using machine learning techniques, we predicted a range of opioid use outcomes for participants in our dataset; 2) We assessed whether algorithms trained only on a majority sub-sample (e.g., Non-Hispanic/Latino, male) could accurately predict opioid use outcomes for a minoritized sub-sample (e.g., Latino, female). Results indicated that models trained on a random sample of our data could predict a range of opioid use outcomes with high precision. However, we noted a decrease in precision when we trained our models on data from a majority sub-sample and tested these models on a minoritized sub-sample. We posit that a range of cultural factors and systemic forms of discrimination are not captured by data from majority sub-samples. Broadly, for predictions to be valid, models should be trained on data that includes adequate representation of the groups of people about whom predictions will be made. Stakeholders may utilize our findings to mitigate biases in models for predicting opioid use outcomes within minoritized communities.
Submitted 6 July, 2023;
originally announced July 2023.
-
Evaluating Generalizability of Deep Learning Models Using Indian-COVID-19 CT Dataset
Authors:
Suba S,
Nita Parekh,
Ramesh Loganathan,
Vikram Pudi,
Chinnababu Sunkavalli
Abstract:
Computed tomography (CT) has been routinely used for the diagnosis of lung diseases and recently, during the pandemic, for detecting the infectivity and severity of COVID-19 disease. One of the major concerns in using machine learning (ML) approaches for automatic processing of CT scan images in a clinical setting is that these methods are trained on limited and biased subsets of publicly available COVID-19 data. This has raised concerns regarding the generalizability of these models on external datasets not seen by the model during training. To address some of these issues, in this work CT scan images from confirmed COVID-19 data obtained from one of the largest public repositories, COVIDx CT 2A, were used for training and internal validation of machine learning models. For the external validation we generated the Indian-COVID-19 CT dataset, an open-source repository containing 3D CT volumes and 12096 chest CT images from 288 COVID-19 patients from India. Comparative performance evaluation of four state-of-the-art machine learning models, viz., a lightweight convolutional neural network (CNN), and three other CNN-based deep learning (DL) models such as VGG-16, ResNet-50 and Inception-v3, in classifying CT images into three classes, viz., normal, non-COVID pneumonia, and COVID-19, is carried out on these two datasets. Our analysis showed that the performance of all the models is comparable on the hold-out COVIDx CT 2A test set with 90% - 99% accuracies (96% for CNN), while on the external Indian-COVID-19 CT dataset a drop in performance is observed for all the models (8% - 19%). The traditional machine learning model, CNN, performed the best on the external dataset (accuracy 88%) in comparison to the deep learning models, indicating that a lightweight CNN generalizes better to unseen data. The data and code are made available at https://github.com/aleesuss/c19.
Submitted 28 December, 2022;
originally announced December 2022.
-
Explainable and Lightweight Model for COVID-19 Detection Using Chest Radiology Images
Authors:
Suba S,
Nita Parekh
Abstract:
Deep learning (DL) analysis of chest X-ray (CXR) and computed tomography (CT) images has garnered a lot of attention in recent times due to the COVID-19 pandemic. Convolutional Neural Networks (CNNs) are well suited for image analysis tasks when trained on very large amounts of data. Applications developed for medical image analysis require high sensitivity and precision compared to other fields. Most of the tools proposed for detection of COVID-19 claim to have high sensitivity and recall but have failed to generalize and perform when tested on unseen datasets. This encouraged us to develop a CNN model and to analyze and understand its performance by visualizing the predictions of the model using class activation maps generated with the Gradient-weighted Class Activation Mapping (Grad-CAM) technique. This study provides a detailed discussion of the success and failure of the proposed model at an image level. Performance of the model is compared with state-of-the-art DL models and shown to be comparable. The data and code used are available at https://github.com/aleesuss/c19.
Submitted 28 December, 2022;
originally announced December 2022.
-
A Federated Approach to Predicting Emojis in Hindi Tweets
Authors:
Deep Gandhi,
Jash Mehta,
Nirali Parekh,
Karan Waghela,
Lynette D'Mello,
Zeerak Talat
Abstract:
The use of emojis affords a visual modality to, often private, textual communication. The task of predicting emojis however provides a challenge for machine learning as emoji use tends to cluster into the frequently used and the rarely used emojis. Much of the machine learning research on emoji use has focused on high resource languages and has conceptualised the task of predicting emojis around traditional server-side machine learning approaches. However, traditional machine learning approaches for private communication can introduce privacy concerns, as these approaches require all data to be transmitted to a central storage. In this paper, we seek to address the dual concerns of emphasising high resource languages for emoji prediction and risking the privacy of people's data. We introduce a new dataset of $118$k tweets (augmented from $25$k unique tweets) for emoji prediction in Hindi, and propose a modification to the federated learning algorithm, CausalFedGSD, which aims to strike a balance between model performance and user privacy. We show that our approach obtains comparative scores with more complex centralised models while reducing the amount of data required to optimise the models and minimising risks to user privacy.
Submitted 11 November, 2022;
originally announced November 2022.
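The privacy argument in the abstract above rests on federated learning, where clients train locally and only model parameters reach the server. The following is a minimal sketch of plain federated averaging on a one-parameter linear model; it is not CausalFedGSD or the authors' modification, and the model, learning rate, and function names are hypothetical choices for illustration.

```python
def local_update(w, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one client's private data.

    The model is y = w * x with squared-error loss (an illustrative choice).
    """
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(client_datasets, rounds=20, w0=0.0):
    """Plain federated averaging: raw data never leaves a client.

    Each round, every client starts from the shared weight, trains locally,
    and the server averages the returned weights by client data size.
    """
    w = w0
    total = sum(len(d) for d in client_datasets)
    for _ in range(rounds):
        local_weights = [local_update(w, d) for d in client_datasets]
        w = sum(
            len(d) * wk for d, wk in zip(client_datasets, local_weights)
        ) / total
    return w


# Two clients whose data follow y = 3x; only weights are exchanged.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (1.0, 3.0)],
]
shared_weight = federated_average(clients)
```

The server sees only the per-client weights, which is the property that reduces the privacy risk of transmitting all data to central storage.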
-
Synchronization and Control of Spatiotemporal Chaos using Time-Series Data from Local Regions
Authors:
Nita Parekh,
V. Ravi Kumar,
B. D. Kulkarni
Abstract:
In this paper we show that the analysis of the dynamics in localized regions, i.e., sub-systems, can be used to characterize the chaotic dynamics and the synchronization ability of spatiotemporal systems. Using noisy scalar time-series data for driving, along with simultaneous self-adaptation of the control parameter, representative control goals such as suppressing spatiotemporal chaos and synchronizing spatiotemporally chaotic dynamics are discussed.
Submitted 31 October, 1997;
originally announced November 1997.
-
Analysis and Characterization of Complex Spatio-temporal Patterns in Nonlinear Reaction-Diffusion Systems
Authors:
Nita Parekh,
V. Ravi Kumar,
B. D. Kulkarni
Abstract:
Two important classes of spatio-temporal patterns, namely, spatio-temporal chaos and self-replicating patterns, for a representative three-variable autocatalytic reaction mechanism coupled with diffusion, have been studied. The characterization of these patterns has been carried out in terms of Lyapunov exponents and dimension density. The results show a linear scaling as a function of sub-system size for the Lyapunov dimension and entropy, while the Lyapunov dimension density was found to rapidly saturate. The possibility of synchronizing the spatio-temporal dynamics by analyzing the conditional Lyapunov exponents of sub-systems was also observed.
Submitted 17 June, 1996;
originally announced June 1996.
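The Lyapunov dimension and dimension density mentioned in the abstract above can be computed from a Lyapunov spectrum. The sketch below uses the standard Kaplan-Yorke formula; whether the paper uses exactly this definition is an assumption, and the function names and the sample spectrum are illustrative only.

```python
def kaplan_yorke_dimension(exponents):
    """Lyapunov (Kaplan-Yorke) dimension from a Lyapunov spectrum.

    With exponents sorted in decreasing order, let j be the largest index
    such that the sum of the first j exponents is non-negative. Then
    D = j + (sum of the first j exponents) / |exponent j+1|.
    """
    ordered = sorted(exponents, reverse=True)
    partial = 0.0
    for j, value in enumerate(ordered):
        if partial + value < 0:
            return j + partial / abs(value)
        partial += value
    # The full sum is non-negative: the dimension equals the phase-space size.
    return float(len(ordered))


def dimension_density(exponents, subsystem_size):
    """Lyapunov dimension per unit sub-system size (hypothetical helper)."""
    return kaplan_yorke_dimension(exponents) / subsystem_size


# Illustrative spectrum (commonly quoted values for the Lorenz system).
example_dimension = kaplan_yorke_dimension([0.906, 0.0, -14.572])
```

A dimension that grows linearly with sub-system size gives a constant density, which is consistent with the saturation described in the abstract.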