Using a decision tree to predict the number of COVID cases: a tutorial for beginners

bioRxiv [Preprint]. 2024 May 9:2023.12.19.572463. doi: 10.1101/2023.12.19.572463.

Abstract

This manuscript describes the development of a module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox at the beginning of this Supplement. This module delivers learning materials on machine learning and decision tree concepts in an interactive format that uses appropriate cloud resources for data access and analyses. Machine learning (ML) is an important tool in biomedical research and can lead to improvements in the diagnosis, treatment, and prevention of diseases. During the COVID pandemic, ML was used for predictions at both the patient and community levels. Given its ubiquity, it is important that future doctors, researchers, and teachers become acquainted with ML and its contributions to research. Our goal is to make it easier for everyone to learn about machine learning. The learning module we present here is based on a small COVID dataset, videos, annotated code, and the use of Google Colab or the Google Cloud Platform (GCP). The benefit of these platforms is that students do not have to set up a programming environment on their own computers, which saves time and is also an important democratizing factor. The module focuses on learning the basics of decision trees by applying them to COVID data. It introduces the basic terminology used in supervised machine learning and its relevance to research. Our experience with biology students at San Francisco State University suggests that the material increases interest in ML.
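The core idea the module teaches can be illustrated with a small sketch. This is not the module's annotated code or dataset. It is a minimal, dependency-free Python example of a regression tree, a supervised model that learns from labeled examples (feature rows plus a known target) and then predicts a count for new rows. At each node it tries every threshold on every feature and keeps the split that most reduces the squared error; each leaf predicts the mean target of its training rows. The features (population in thousands, a mobility index), the case counts, and all function names are hypothetical and made up purely for illustration. In practice a library implementation such as scikit-learn's `DecisionTreeRegressor` would typically be used instead.

```python
from statistics import mean


def sse(ys):
    """Sum of squared errors of ys around their mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(rows, ys):
    """Return (feature_index, threshold) that minimizes left SSE + right SSE.

    Returns None if no split reduces the error of the unsplit node.
    """
    best, best_cost = None, sse(ys)  # only accept splits that improve on no split
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best, best_cost = (f, t), cost
    return best


def build_tree(rows, ys, depth=0, max_depth=3, min_samples=2):
    """Grow a regression tree recursively; each leaf stores the mean target."""
    split = None
    if depth < max_depth and len(ys) >= min_samples:
        split = best_split(rows, ys)
    if split is None:
        return {"value": mean(ys)}
    f, t = split
    left = [(r, y) for r, y in zip(rows, ys) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, ys) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth, min_samples),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth, min_samples),
    }


def predict(tree, row):
    """Follow the learned thresholds from the root down to a leaf."""
    while "value" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["value"]


# Hypothetical toy data: (population in thousands, mobility index) -> weekly cases.
rows = [(10, 0.2), (12, 0.3), (11, 0.8), (50, 0.2), (55, 0.9), (60, 0.7)]
cases = [5, 6, 7, 40, 45, 50]

tree = build_tree(rows, cases, max_depth=2)
print("root split: feature", tree["feature"], "at", tree["threshold"])
print("prediction for (12, 0.5):", predict(tree, (12, 0.5)))
print("prediction for (58, 0.5):", predict(tree, (58, 0.5)))
```

On this toy data the root splits on the population feature, since that separates the low-count and high-count rows; `max_depth` limits how finely the tree can carve up the data, which is the usual guard against overfitting a small dataset.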
