Search | arXiv e-print repository

arXiv:2212.13325 [pdf]

Heliophysics Discovery Tools for the 21st Century: Data Science and Machine Learning Structures and Recommendations for 2020-2050

Authors: R. M. McGranaghan, B. Thompson, E. Camporeale, J. Bortnik, M. Bobra, G. Lapenta, S. Wing, B. Poduval, S. Lotz, S. Murray, M. Kirk, T. Y. Chen, H. M. Bain, P. Riley, B. Tremblay, M. Cheung, V. Delouille

Abstract: Three main points: 1. Data Science (DS) will be increasingly important to heliophysics; 2. Methods of heliophysics science discovery will continually evolve, requiring the use of learning technologies [e.g., machine learning (ML)] that are applied rigorously and that are capable of supporting discovery; and 3. To grow with the pace of data, technology, and workforce changes, heliophysics requires… ▽ More Three main points: 1. Data Science (DS) will be increasingly important to heliophysics; 2. Methods of heliophysics science discovery will continually evolve, requiring the use of learning technologies [e.g., machine learning (ML)] that are applied rigorously and that are capable of supporting discovery; and 3. To grow with the pace of data, technology, and workforce changes, heliophysics requires a new approach to the representation of knowledge. △ Less

Submitted 26 December, 2022; originally announced December 2022.

Comments: 4 pages; Heliophysics 2050 White Paper

arXiv:2108.06394 [pdf, other]

A Machine-Learning-Ready Dataset Prepared from the Solar and Heliospheric Observatory Mission

Authors: Carl Shneider, Andong Hu, Ajay K. Tiwari, Monica G. Bobra, Karl Battams, Jannis Teunissen, Enrico Camporeale

Abstract: We present a Python tool to generate a standard dataset from solar images that allows for user-defined selection criteria and a range of pre-processing steps. Our Python tool works with all image products from both the Solar and Heliospheric Observatory (SoHO) and Solar Dynamics Observatory (SDO) missions. We discuss a dataset produced from the SoHO mission's multi-spectral images which is free of… ▽ More We present a Python tool to generate a standard dataset from solar images that allows for user-defined selection criteria and a range of pre-processing steps. Our Python tool works with all image products from both the Solar and Heliospheric Observatory (SoHO) and Solar Dynamics Observatory (SDO) missions. We discuss a dataset produced from the SoHO mission's multi-spectral images which is free of missing or corrupt data as well as planetary transits in coronagraph images, and is temporally synced making it ready for input to a machine learning system. Machine-learning-ready images are a valuable resource for the community because they can be used, for example, for forecasting space weather parameters. We illustrate the use of this data with a 3-5 day-ahead forecast of the north-south component of the interplanetary magnetic field (IMF) observed at Lagrange point one (L1). For this use case, we apply a deep convolutional neural network (CNN) to a subset of the full SoHO dataset and compare with baseline results from a Gaussian Naive Bayes classifier. △ Less

Submitted 4 August, 2021; originally announced August 2021.

Comments: under review

arXiv:2006.12224 [pdf, other]

Machine Learning in Heliophysics and Space Weather Forecasting: A White Paper of Findings and Recommendations

Authors: Gelu Nita, Manolis Georgoulis, Irina Kitiashvili, Viacheslav Sadykov, Enrico Camporeale, Alexander Kosovichev, Haimin Wang, Vincent Oria, Jason Wang, Rafal Angryk, Berkay Aydin, Azim Ahmadzadeh, Xiaoli Bai, Timothy Bastian, Soukaina Filali Boubrahimi, Bin Chen, Alisdair Davey, Sheldon Fereira, Gregory Fleishman, Dale Gary, Andrew Gerrard, Gregory Hellbourg, Katherine Herbert, Jack Ireland, Egor Illarionov , et al. (16 additional authors not shown)

Abstract: The authors of this white paper met on 16-17 January 2020 at the New Jersey Institute of Technology, Newark, NJ, for a 2-day workshop that brought together a group of heliophysicists, data providers, expert modelers, and computer/data scientists. Their objective was to discuss critical developments and prospects of the application of machine and/or deep learning techniques for data analysis, model… ▽ More The authors of this white paper met on 16-17 January 2020 at the New Jersey Institute of Technology, Newark, NJ, for a 2-day workshop that brought together a group of heliophysicists, data providers, expert modelers, and computer/data scientists. Their objective was to discuss critical developments and prospects of the application of machine and/or deep learning techniques for data analysis, modeling and forecasting in Heliophysics, and to shape a strategy for further developments in the field. The workshop combined a set of plenary sessions featuring invited introductory talks interleaved with a set of open discussion sessions. The outcome of the discussion is encapsulated in this white paper that also features a top-level list of recommendations agreed by participants. △ Less

Submitted 22 June, 2020; originally announced June 2020.

Comments: Workshop Report

arXiv:2003.05103 [pdf, other]

Estimation of Accurate and Calibrated Uncertainties in Deterministic models

Authors: Enrico Camporeale, Algo Carè

Abstract: In this paper we focus on the problem of assigning uncertainties to single-point predictions generated by a deterministic model that outputs a continuous variable. This problem applies to any state-of-the-art physics or engineering models that have a computational cost that does not readily allow to run ensembles and to estimate the uncertainty associated to single-point predictions. Essentially,… ▽ More In this paper we focus on the problem of assigning uncertainties to single-point predictions generated by a deterministic model that outputs a continuous variable. This problem applies to any state-of-the-art physics or engineering models that have a computational cost that does not readily allow to run ensembles and to estimate the uncertainty associated to single-point predictions. Essentially, we devise a method to easily transform a deterministic prediction into a probabilistic one. We show that for doing so, one has to compromise between the accuracy and the reliability (calibration) of such a probabilistic model. Hence, we introduce a cost function that encodes their trade-off. We use the Continuous Rank Probability Score to measure accuracy and we derive an analytic formula for the reliability, in the case of forecasts of continuous scalar variables expressed in terms of Gaussian distributions. The new Accuracy-Reliability cost function is then used to estimate the input-dependent variance, given a black-box mean function, by solving a two-objective optimization problem. The simple philosophy behind this strategy is that predictions based on the estimated variances should not only be accurate, but also reliable (i.e. statistical consistent with observations). Conversely, early works based on the minimization of classical cost functions, such as the negative log probability density, cannot simultaneously enforce both accuracy and reliability. We show several examples both with synthetic data, where the underlying hidden noise can accurately be recovered, and with large real-world datasets. △ Less

Submitted 11 March, 2020; originally announced March 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1803.04475

arXiv:1803.04475 [pdf, ps, other]

Accuracy-Reliability Cost Function for Empirical Variance Estimation

Authors: Enrico Camporeale

Abstract: In this paper we focus on the problem of assigning uncertainties to single-point predictions. We introduce a cost function that encodes the trade-off between accuracy and reliability in probabilistic forecast. We derive analytic formula for the case of forecasts of continuous scalar variables expressed in terms of Gaussian distributions. The Accuracy-Reliability cost function can be used to empiri… ▽ More In this paper we focus on the problem of assigning uncertainties to single-point predictions. We introduce a cost function that encodes the trade-off between accuracy and reliability in probabilistic forecast. We derive analytic formula for the case of forecasts of continuous scalar variables expressed in terms of Gaussian distributions. The Accuracy-Reliability cost function can be used to empirically estimate the variance in heteroskedastic regression problems (input dependent noise), by solving a two-objective optimization problem. The simple philosophy behind this strategy is that predictions based on the estimated variances should be both accurate and reliable (i.e. statistical consistent with observations). We show several examples with synthetic data, where the underlying hidden noise function can be accurately recovered, both in one and multi-dimensional problems. The practical implementation of the method has been done using a Neural Network and, in the one-dimensional case, with a simple polynomial fit. △ Less

Submitted 12 March, 2018; originally announced March 2018.

Comments: under review for ICML 2018

Showing 1–5 of 5 results for author: Camporeale, E