-
Heliophysics Discovery Tools for the 21st Century: Data Science and Machine Learning Structures and Recommendations for 2020-2050
Authors:
R. M. McGranaghan,
B. Thompson,
E. Camporeale,
J. Bortnik,
M. Bobra,
G. Lapenta,
S. Wing,
B. Poduval,
S. Lotz,
S. Murray,
M. Kirk,
T. Y. Chen,
H. M. Bain,
P. Riley,
B. Tremblay,
M. Cheung,
V. Delouille
Abstract:
Three main points: 1. Data Science (DS) will be increasingly important to heliophysics; 2. Methods of heliophysics science discovery will continually evolve, requiring the use of learning technologies [e.g., machine learning (ML)] that are applied rigorously and that are capable of supporting discovery; and 3. To grow with the pace of data, technology, and workforce changes, heliophysics requires a new approach to the representation of knowledge.
Submitted 26 December, 2022;
originally announced December 2022.
-
A Machine-Learning-Ready Dataset Prepared from the Solar and Heliospheric Observatory Mission
Authors:
Carl Shneider,
Andong Hu,
Ajay K. Tiwari,
Monica G. Bobra,
Karl Battams,
Jannis Teunissen,
Enrico Camporeale
Abstract:
We present a Python tool to generate a standard dataset from solar images that allows for user-defined selection criteria and a range of pre-processing steps. Our Python tool works with all image products from both the Solar and Heliospheric Observatory (SoHO) and Solar Dynamics Observatory (SDO) missions. We discuss a dataset produced from the SoHO mission's multi-spectral images that is free of missing or corrupt data as well as planetary transits in coronagraph images, and is temporally synced, making it ready for input to a machine learning system. Machine-learning-ready images are a valuable resource for the community because they can be used, for example, for forecasting space weather parameters. We illustrate the use of this data with a 3-5 day-ahead forecast of the north-south component of the interplanetary magnetic field (IMF) observed at Lagrange point one (L1). For this use case, we apply a deep convolutional neural network (CNN) to a subset of the full SoHO dataset and compare with baseline results from a Gaussian Naive Bayes classifier.
Submitted 4 August, 2021;
originally announced August 2021.
-
Machine Learning in Heliophysics and Space Weather Forecasting: A White Paper of Findings and Recommendations
Authors:
Gelu Nita,
Manolis Georgoulis,
Irina Kitiashvili,
Viacheslav Sadykov,
Enrico Camporeale,
Alexander Kosovichev,
Haimin Wang,
Vincent Oria,
Jason Wang,
Rafal Angryk,
Berkay Aydin,
Azim Ahmadzadeh,
Xiaoli Bai,
Timothy Bastian,
Soukaina Filali Boubrahimi,
Bin Chen,
Alisdair Davey,
Sheldon Fereira,
Gregory Fleishman,
Dale Gary,
Andrew Gerrard,
Gregory Hellbourg,
Katherine Herbert,
Jack Ireland,
Egor Illarionov,
et al. (16 additional authors not shown)
Abstract:
The authors of this white paper met on 16-17 January 2020 at the New Jersey Institute of Technology, Newark, NJ, for a 2-day workshop that brought together a group of heliophysicists, data providers, expert modelers, and computer/data scientists. Their objective was to discuss critical developments and prospects of the application of machine and/or deep learning techniques for data analysis, modeling and forecasting in Heliophysics, and to shape a strategy for further developments in the field. The workshop combined a set of plenary sessions featuring invited introductory talks interleaved with a set of open discussion sessions. The outcome of the discussion is encapsulated in this white paper, which also features a top-level list of recommendations agreed upon by the participants.
Submitted 22 June, 2020;
originally announced June 2020.
-
Estimation of Accurate and Calibrated Uncertainties in Deterministic models
Authors:
Enrico Camporeale,
Algo Carè
Abstract:
In this paper we focus on the problem of assigning uncertainties to single-point predictions generated by a deterministic model that outputs a continuous variable. This problem applies to any state-of-the-art physics or engineering model whose computational cost does not readily allow running ensembles to estimate the uncertainty associated with single-point predictions. Essentially, we devise a method to easily transform a deterministic prediction into a probabilistic one. We show that to do so, one has to compromise between the accuracy and the reliability (calibration) of such a probabilistic model. Hence, we introduce a cost function that encodes their trade-off. We use the Continuous Ranked Probability Score to measure accuracy and we derive an analytic formula for the reliability, in the case of forecasts of continuous scalar variables expressed in terms of Gaussian distributions. The new Accuracy-Reliability cost function is then used to estimate the input-dependent variance, given a black-box mean function, by solving a two-objective optimization problem. The simple philosophy behind this strategy is that predictions based on the estimated variances should be not only accurate but also reliable (i.e., statistically consistent with observations). Conversely, early works based on the minimization of classical cost functions, such as the negative log probability density, cannot simultaneously enforce both accuracy and reliability. We show several examples both with synthetic data, where the underlying hidden noise can accurately be recovered, and with large real-world datasets.
Submitted 11 March, 2020;
originally announced March 2020.
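The abstract above names the Continuous Ranked Probability Score as its accuracy measure for Gaussian forecasts. As a rough illustration only, the sketch below evaluates the standard closed-form CRPS of a Gaussian forecast against a single observation. The function names `gaussian_crps` and `mean_crps` are hypothetical and are not taken from the paper, and the reliability term and the two-objective optimization described in the abstract are not reproduced here.

```python
import math


def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of a Gaussian forecast N(mu, sigma^2) for observation y.

    Hypothetical helper for illustration; lower values mean a better forecast.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma  # standardized forecast error
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density at z
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def mean_crps(mus, sigmas, ys):
    """Average CRPS over paired forecasts (mus, sigmas) and observations ys."""
    scores = [gaussian_crps(m, s, y) for m, s, y in zip(mus, sigmas, ys)]
    return sum(scores) / len(scores)
```

For a fixed error, the score penalizes both overconfident (too small) and underconfident (too large) values of `sigma`, which is why it can serve as an accuracy term when the variance is the quantity being estimated.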
-
Accuracy-Reliability Cost Function for Empirical Variance Estimation
Authors:
Enrico Camporeale
Abstract:
In this paper we focus on the problem of assigning uncertainties to single-point predictions. We introduce a cost function that encodes the trade-off between accuracy and reliability in probabilistic forecasts. We derive an analytic formula for the case of forecasts of continuous scalar variables expressed in terms of Gaussian distributions. The Accuracy-Reliability cost function can be used to empirically estimate the variance in heteroskedastic regression problems (input-dependent noise) by solving a two-objective optimization problem. The simple philosophy behind this strategy is that predictions based on the estimated variances should be both accurate and reliable (i.e., statistically consistent with observations). We show several examples with synthetic data, where the underlying hidden noise function can be accurately recovered, in both one- and multi-dimensional problems. The method has been implemented using a Neural Network and, in the one-dimensional case, a simple polynomial fit.
Submitted 12 March, 2018;
originally announced March 2018.