Tree-Based Learning on Amperometric Time Series Data Demonstrates High Accuracy for Classification
Authors:
Jeyashree Krishnan,
Zeyu Lian,
Pieter E. Oomen,
Xiulan He,
Soodabeh Majdi,
Andreas Schuppert,
Andrew Ewing
Abstract:
Elucidating exocytosis processes provides insights into cellular neurotransmission mechanisms and may have potential in neurodegenerative disease research. Amperometry is an established electrochemical method for the detection of neurotransmitters released from and stored inside cells. An important aspect of the amperometry method is the sub-millisecond temporal resolution of the current recordings, which leads to several hundreds of gigabytes of high-quality data. In this study, we present a universal method for classifying diverse amperometric datasets using data-driven approaches in computational science. We demonstrate a very high prediction accuracy (greater than or equal to 95%). This includes an end-to-end systematic machine learning workflow for amperometric time series datasets consisting of pre-processing, feature extraction, model identification, training and testing, followed by feature importance evaluation, all implemented. We tested the method on heterogeneous amperometric time series datasets generated using different experimental approaches, chemical stimulations, electrode types, and varying recording times. We identified an overarching set of common features across these datasets that enables accurate predictions. Further, we showed that the information relevant for the classification of amperometric traces is neither in the spiky segments alone, nor can it be retrieved from just the temporal structure of spikes. In fact, the transients between spikes and the trace baselines carry essential information for a successful classification, thereby strongly demonstrating that an effective feature representation of amperometric time series requires the full time series. To our knowledge, this is one of the first studies to propose a scheme for machine learning, and in particular supervised learning, on full amperometry time series data.
Submitted 6 February, 2023;
originally announced February 2023.
Risks of Using Non-verified Open Data: A case study on using Machine Learning techniques for predicting Pregnancy Outcomes in India
Authors:
Anusua Trivedi,
Sumit Mukherjee,
Edmund Tse,
Anne Ewing,
Juan Lavista Ferres
Abstract:
Artificial intelligence (AI) has evolved considerably in the last few years. While applications of AI are now becoming more common in fields like retail and marketing, the application of AI to problems related to developing countries is still an emerging topic. In particular, AI applications in resource-poor settings remain relatively nascent. There is considerable scope for using AI in such settings. For example, researchers have started exploring AI applications to reduce poverty and deliver a broad range of critical public services. However, despite many promising use cases, there are many dataset-related challenges that one has to overcome in such projects. These challenges often take the form of missing data, incorrectly collected data, and improperly labeled variables, among other factors. As a result, we can often end up using data that is not representative of the problem we are trying to solve. In this case study, we explore the challenges of using such an open dataset from India to predict an important health outcome. We highlight how the use of AI without a proper understanding of reporting metrics can lead to erroneous conclusions.
Submitted 21 October, 2019; v1 submitted 4 October, 2019;
originally announced October 2019.