-
Self-adaptive weights based on balanced residual decay rate for physics-informed neural networks and deep operator networks
Authors:
Wenqian Chen,
Amanda A. Howard,
Panos Stinis
Abstract:
Physics-informed deep learning has emerged as a promising alternative for solving partial differential equations. However, for complex problems, training these networks can still be challenging, often resulting in unsatisfactory accuracy and efficiency. In this work, we demonstrate that the failure of plain physics-informed neural networks arises from the significant discrepancy in the convergence…
▽ More
Physics-informed deep learning has emerged as a promising alternative for solving partial differential equations. However, for complex problems, training these networks can still be challenging, often resulting in unsatisfactory accuracy and efficiency. In this work, we demonstrate that the failure of plain physics-informed neural networks arises from the significant discrepancy in the convergence speed of residuals at different training points, where the slowest convergence speed dominates the overall solution convergence. Based on these observations, we propose a point-wise adaptive weighting method that balances the residual decay rate across different training points. The performance of our proposed adaptive weighting method is compared with current state-of-the-art adaptive weighting methods on benchmark problems for both physics-informed neural networks and physics-informed deep operator networks. Through extensive numerical results we demonstrate that our proposed approach of balanced residual decay rates offers several advantages, including bounded weights, high prediction accuracy, fast convergence speed, low training uncertainty, low computational cost and ease of hyperparameter tuning.
△ Less
Submitted 27 June, 2024;
originally announced July 2024.
-
Finite basis Kolmogorov-Arnold networks: domain decomposition for data-driven and physics-informed problems
Authors:
Amanda A. Howard,
Bruno Jacob,
Sarah H. Murphy,
Alexander Heinlein,
Panos Stinis
Abstract:
Kolmogorov-Arnold networks (KANs) have attracted attention recently as an alternative to multilayer perceptrons (MLPs) for scientific machine learning. However, KANs can be expensive to train, even for relatively small networks. Inspired by finite basis physics-informed neural networks (FBPINNs), in this work, we develop a domain decomposition method for KANs that allows for several small KANs to…
▽ More
Kolmogorov-Arnold networks (KANs) have attracted attention recently as an alternative to multilayer perceptrons (MLPs) for scientific machine learning. However, KANs can be expensive to train, even for relatively small networks. Inspired by finite basis physics-informed neural networks (FBPINNs), in this work, we develop a domain decomposition method for KANs that allows for several small KANs to be trained in parallel to give accurate solutions for multiscale problems. We show that finite basis KANs (FBKANs) can provide accurate results with noisy data and for physics-informed training.
△ Less
Submitted 28 June, 2024;
originally announced June 2024.
-
Multifidelity domain decomposition-based physics-informed neural networks and operators for time-dependent problems
Authors:
Alexander Heinlein,
Amanda A. Howard,
Damien Beecroft,
Panos Stinis
Abstract:
Multiscale problems are challenging for neural network-based discretizations of differential equations, such as physics-informed neural networks (PINNs). This can be (partly) attributed to the so-called spectral bias of neural networks. To improve the performance of PINNs for time-dependent problems, a combination of multifidelity stacking PINNs and domain decomposition-based finite basis PINNs is…
▽ More
Multiscale problems are challenging for neural network-based discretizations of differential equations, such as physics-informed neural networks (PINNs). This can be (partly) attributed to the so-called spectral bias of neural networks. To improve the performance of PINNs for time-dependent problems, a combination of multifidelity stacking PINNs and domain decomposition-based finite basis PINNs is employed. In particular, to learn the high-fidelity part of the multifidelity model, a domain decomposition in time is employed. The performance is investigated for a pendulum and a two-frequency problem as well as the Allen-Cahn equation. It can be observed that the domain decomposition approach clearly improves the PINN and stacking PINN approaches. Finally, it is demonstrated that the FBPINN approach can be extended to multifidelity physics-informed deep operator networks.
△ Less
Submitted 6 June, 2024; v1 submitted 15 January, 2024;
originally announced January 2024.
-
Stacked networks improve physics-informed training: applications to neural networks and deep operator networks
Authors:
Amanda A Howard,
Sarah H Murphy,
Shady E Ahmed,
Panos Stinis
Abstract:
Physics-informed neural networks and operator networks have shown promise for effectively solving equations modeling physical systems. However, these networks can be difficult or impossible to train accurately for some systems of equations. We present a novel multifidelity framework for stacking physics-informed neural networks and operator networks that facilitates training. We successively build…
▽ More
Physics-informed neural networks and operator networks have shown promise for effectively solving equations modeling physical systems. However, these networks can be difficult or impossible to train accurately for some systems of equations. We present a novel multifidelity framework for stacking physics-informed neural networks and operator networks that facilitates training. We successively build a chain of networks, where the output at one step can act as a low-fidelity input for training the next step, gradually increasing the expressivity of the learned model. The equations imposed at each step of the iterative process can be the same or different (akin to simulated annealing). The iterative (stacking) nature of the proposed method allows us to progressively learn features of a solution that are hard to learn directly. Through benchmark problems including a nonlinear pendulum, the wave equation, and the viscous Burgers equation, we show how stacking can be used to improve the accuracy and reduce the required size of physics-informed neural networks and operator networks.
△ Less
Submitted 20 November, 2023; v1 submitted 11 November, 2023;
originally announced November 2023.
-
Efficient kernel surrogates for neural network-based regression
Authors:
Saad Qadeer,
Andrew Engel,
Amanda Howard,
Adam Tsou,
Max Vargas,
Panos Stinis,
Tony Chiang
Abstract:
Despite their immense promise in performing a variety of learning tasks, a theoretical understanding of the limitations of Deep Neural Networks (DNNs) has so far eluded practitioners. This is partly due to the inability to determine the closed forms of the learned functions, making it harder to study their generalization properties on unseen datasets. Recent work has shown that randomly initialize…
▽ More
Despite their immense promise in performing a variety of learning tasks, a theoretical understanding of the limitations of Deep Neural Networks (DNNs) has so far eluded practitioners. This is partly due to the inability to determine the closed forms of the learned functions, making it harder to study their generalization properties on unseen datasets. Recent work has shown that randomly initialized DNNs in the infinite width limit converge to kernel machines relying on a Neural Tangent Kernel (NTK) with known closed form. These results suggest, and experimental evidence corroborates, that empirical kernel machines can also act as surrogates for finite width DNNs. The high computational cost of assembling the full NTK, however, makes this approach infeasible in practice, motivating the need for low-cost approximations. In the current work, we study the performance of the Conjugate Kernel (CK), an efficient approximation to the NTK that has been observed to yield fairly similar results. For the regression problem of smooth functions and logistic regression classification, we show that the CK performance is only marginally worse than that of the NTK and, in certain cases, is shown to be superior. In particular, we establish bounds for the relative test losses, verify them with numerical tests, and identify the regularity of the kernel as the key determinant of performance. In addition to providing a theoretical grounding for using CKs instead of NTKs, our framework suggests a recipe for improving DNN accuracy inexpensively. We present a demonstration of this on the foundation model GPT-2 by comparing its performance on a classification task using a conventional approach and our prescription. We also show how our approach can be used to improve physics-informed operator network training for regression tasks as well as convolutional neural network training for vision classification tasks.
△ Less
Submitted 24 January, 2024; v1 submitted 28 October, 2023;
originally announced October 2023.
-
Exploring Learned Representations of Neural Networks with Principal Component Analysis
Authors:
Amit Harlev,
Andrew Engel,
Panos Stinis,
Tony Chiang
Abstract:
Understanding feature representation for deep neural networks (DNNs) remains an open question within the general field of explainable AI. We use principal component analysis (PCA) to study the performance of a k-nearest neighbors classifier (k-NN), nearest class-centers classifier (NCC), and support vector machines on the learned layer-wise representations of a ResNet-18 trained on CIFAR-10. We sh…
▽ More
Understanding feature representation for deep neural networks (DNNs) remains an open question within the general field of explainable AI. We use principal component analysis (PCA) to study the performance of a k-nearest neighbors classifier (k-NN), nearest class-centers classifier (NCC), and support vector machines on the learned layer-wise representations of a ResNet-18 trained on CIFAR-10. We show that in certain layers, as little as 20% of the intermediate feature-space variance is necessary for high-accuracy classification and that across all layers, the first ~100 PCs completely determine the performance of the k-NN and NCC classifiers. We relate our findings to neural collapse and provide partial evidence for the related phenomenon of intermediate neural collapse. Our preliminary work provides three distinct yet interpretable surrogate models for feature representation with an affine linear model the best performing. We also show that leveraging several surrogate models affords us a clever method to estimate where neural collapse may initially occur within the DNN.
△ Less
Submitted 26 September, 2023;
originally announced September 2023.
-
Physics-informed machine learning of the correlation functions in bulk fluids
Authors:
Wenqian Chen,
Peiyuan Gao,
Panos Stinis
Abstract:
The Ornstein-Zernike (OZ) equation is the fundamental equation for pair correlation function computations in the modern integral equation theory for liquids. In this work, machine learning models, notably physics-informed neural networks and physics-informed neural operator networks, are explored to solve the OZ equation. The physics-informed machine learning models demonstrate great accuracy and…
▽ More
The Ornstein-Zernike (OZ) equation is the fundamental equation for pair correlation function computations in the modern integral equation theory for liquids. In this work, machine learning models, notably physics-informed neural networks and physics-informed neural operator networks, are explored to solve the OZ equation. The physics-informed machine learning models demonstrate great accuracy and high efficiency in solving the forward and inverse OZ problems of various bulk fluids. The results highlight the significant potential of physics-informed machine learning for applications in thermodynamic state theory.
△ Less
Submitted 1 September, 2023;
originally announced September 2023.
-
Physics-informed machine learning of redox flow battery based on a two-dimensional unit cell model
Authors:
Wenqian Chen,
Yucheng Fu,
Panos Stinis
Abstract:
In this paper, we present a physics-informed neural network (PINN) approach for predicting the performance of an all-vanadium redox flow battery, with its physics constraints enforced by a two-dimensional (2D) mathematical model. The 2D model, which includes 6 governing equations and 24 boundary conditions, provides a detailed representation of the electrochemical reactions, mass transport and hyd…
▽ More
In this paper, we present a physics-informed neural network (PINN) approach for predicting the performance of an all-vanadium redox flow battery, with its physics constraints enforced by a two-dimensional (2D) mathematical model. The 2D model, which includes 6 governing equations and 24 boundary conditions, provides a detailed representation of the electrochemical reactions, mass transport and hydrodynamics occurring inside the redox flow battery. To solve the 2D model with the PINN approach, a composite neural network is employed to approximate species concentration and potentials; the input and output are normalized according to prior knowledge of the battery system; the governing equations and boundary conditions are first scaled to an order of magnitude around 1, and then further balanced with a self-weighting method. Our numerical results show that the PINN is able to predict cell voltage correctly, but the prediction of potentials shows a constant-like shift. To fix the shift, the PINN is enhanced by further constrains derived from the current collector boundary. Finally, we show that the enhanced PINN can be even further improved if a small number of labeled data is available.
△ Less
Submitted 7 September, 2023; v1 submitted 31 May, 2023;
originally announced June 2023.
-
A multifidelity approach to continual learning for physical systems
Authors:
Amanda Howard,
Yucheng Fu,
Panos Stinis
Abstract:
We introduce a novel continual learning method based on multifidelity deep neural networks. This method learns the correlation between the output of previously trained models and the desired output of the model on the current training dataset, limiting catastrophic forgetting. On its own the multifidelity continual learning method shows robust results that limit forgetting across several datasets.…
▽ More
We introduce a novel continual learning method based on multifidelity deep neural networks. This method learns the correlation between the output of previously trained models and the desired output of the model on the current training dataset, limiting catastrophic forgetting. On its own the multifidelity continual learning method shows robust results that limit forgetting across several datasets. Additionally, we show that the multifidelity method can be combined with existing continual learning methods, including replay and memory aware synapses, to further limit catastrophic forgetting. The proposed continual learning method is especially suited for physical problems where the data satisfy the same physical laws on each domain, or for physics-informed neural networks, because in these cases we expect there to be a strong correlation between the output of the previous model and the model on the current training domain.
△ Less
Submitted 9 February, 2024; v1 submitted 7 April, 2023;
originally announced April 2023.
-
Feature-adjacent multi-fidelity physics-informed machine learning for partial differential equations
Authors:
Wenqian Chen,
Panos Stinis
Abstract:
Physics-informed neural networks have emerged as an alternative method for solving partial differential equations. However, for complex problems, the training of such networks can still require high-fidelity data which can be expensive to generate. To reduce or even eliminate the dependency on high-fidelity data, we propose a novel multi-fidelity architecture which is based on a feature space shar…
▽ More
Physics-informed neural networks have emerged as an alternative method for solving partial differential equations. However, for complex problems, the training of such networks can still require high-fidelity data which can be expensive to generate. To reduce or even eliminate the dependency on high-fidelity data, we propose a novel multi-fidelity architecture which is based on a feature space shared by the low- and high-fidelity solutions. In the feature space, the projections of the low-fidelity and high-fidelity solutions are adjacent by constraining their relative distance. The feature space is represented with an encoder and its mapping to the original solution space is effected through a decoder. The proposed multi-fidelity approach is validated on forward and inverse problems for steady and unsteady problems described by partial differential equations.
△ Less
Submitted 27 March, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
-
A Multifidelity deep operator network approach to closure for multiscale systems
Authors:
Shady E. Ahmed,
Panos Stinis
Abstract:
Projection-based reduced order models (PROMs) have shown promise in representing the behavior of multiscale systems using a small set of generalized (or latent) variables. Despite their success, PROMs can be susceptible to inaccuracies, even instabilities, due to the improper accounting of the interaction between the resolved and unresolved scales of the multiscale system (known as the closure pro…
▽ More
Projection-based reduced order models (PROMs) have shown promise in representing the behavior of multiscale systems using a small set of generalized (or latent) variables. Despite their success, PROMs can be susceptible to inaccuracies, even instabilities, due to the improper accounting of the interaction between the resolved and unresolved scales of the multiscale system (known as the closure problem). In the current work, we interpret closure as a multifidelity problem and use a multifidelity deep operator network (DeepONet) framework to address it. In addition, to enhance the stability and accuracy of the multifidelity-based closure, we employ the recently developed "in-the-loop" training approach from the literature on coupling physics and machine learning models. The resulting approach is tested on shock advection for the one-dimensional viscous Burgers equation and vortex merging using the two-dimensional Navier-Stokes equations. The numerical experiments show significant improvement of the predictive ability of the closure-corrected PROM over the un-corrected one both in the interpolative and the extrapolative regimes.
△ Less
Submitted 1 June, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
ViTO: Vision Transformer-Operator
Authors:
Oded Ovadia,
Adar Kahana,
Panos Stinis,
Eli Turkel,
George Em Karniadakis
Abstract:
We combine vision transformers with operator learning to solve diverse inverse problems described by partial differential equations (PDEs). Our approach, named ViTO, combines a U-Net based architecture with a vision transformer. We apply ViTO to solve inverse PDE problems of increasing complexity, namely for the wave equation, the Navier-Stokes equations and the Darcy equation. We focus on the mor…
▽ More
We combine vision transformers with operator learning to solve diverse inverse problems described by partial differential equations (PDEs). Our approach, named ViTO, combines a U-Net based architecture with a vision transformer. We apply ViTO to solve inverse PDE problems of increasing complexity, namely for the wave equation, the Navier-Stokes equations and the Darcy equation. We focus on the more challenging case of super-resolution, where the input dataset for the inverse problem is at a significantly coarser resolution than the output. The results we obtain are comparable or exceed the leading operator network benchmarks in terms of accuracy. Furthermore, ViTO`s architecture has a small number of trainable parameters (less than 10% of the leading competitor), resulting in a performance speed-up of over 5x when averaged over the various test cases.
△ Less
Submitted 15 March, 2023;
originally announced March 2023.
-
SDYN-GANs: Adversarial Learning Methods for Multistep Generative Models for General Order Stochastic Dynamics
Authors:
Panos Stinis,
Constantinos Daskalakis,
Paul J. Atzberger
Abstract:
We introduce adversarial learning methods for data-driven generative modeling of the dynamics of $n^{th}$-order stochastic systems. Our approach builds on Generative Adversarial Networks (GANs) with generative model classes based on stable $m$-step stochastic numerical integrators. We introduce different formulations and training methods for learning models of stochastic dynamics based on observat…
▽ More
We introduce adversarial learning methods for data-driven generative modeling of the dynamics of $n^{th}$-order stochastic systems. Our approach builds on Generative Adversarial Networks (GANs) with generative model classes based on stable $m$-step stochastic numerical integrators. We introduce different formulations and training methods for learning models of stochastic dynamics based on observation of trajectory samples. We develop approaches using discriminators based on Maximum Mean Discrepancy (MMD), training protocols using conditional and marginal distributions, and methods for learning dynamic responses over different time-scales. We show how our approaches can be used for modeling physical systems to learn force-laws, damping coefficients, and noise-related parameters. The adversarial learning approaches provide methods for obtaining stable generative models for dynamic tasks including long-time prediction and developing simulations for stochastic systems.
△ Less
Submitted 7 February, 2023;
originally announced February 2023.
-
A Hybrid Deep Neural Operator/Finite Element Method for Ice-Sheet Modeling
Authors:
QiZhi He,
Mauro Perego,
Amanda A. Howard,
George Em Karniadakis,
Panos Stinis
Abstract:
One of the most challenging and consequential problems in climate modeling is to provide probabilistic projections of sea level rise. A large part of the uncertainty of sea level projections is due to uncertainty in ice sheet dynamics. At the moment, accurate quantification of the uncertainty is hindered by the cost of ice sheet computational models. In this work, we develop a hybrid approach to a…
▽ More
One of the most challenging and consequential problems in climate modeling is to provide probabilistic projections of sea level rise. A large part of the uncertainty of sea level projections is due to uncertainty in ice sheet dynamics. At the moment, accurate quantification of the uncertainty is hindered by the cost of ice sheet computational models. In this work, we develop a hybrid approach to approximate existing ice sheet computational models at a fraction of their cost. Our approach consists of replacing the finite element model for the momentum equations for the ice velocity, the most expensive part of an ice sheet model, with a Deep Operator Network, while retaining a classic finite element discretization for the evolution of the ice thickness. We show that the resulting hybrid model is very accurate and it is an order of magnitude faster than the traditional finite element model. Further, a distinctive feature of the proposed model compared to other neural network approaches, is that it can handle high-dimensional parameter spaces (parameter fields) such as the basal friction at the bed of the glacier, and can therefore be used for generating samples for uncertainty quantification. We study the impact of hyper-parameters, number of unknowns and correlation length of the parameter distribution on the training and accuracy of the Deep Operator Network on a synthetic ice sheet model. We then target the evolution of the Humboldt glacier in Greenland and show that our hybrid model can provide accurate statistics of the glacier mass loss and can be effectively used to accelerate the quantification of uncertainty.
△ Less
Submitted 26 January, 2023;
originally announced January 2023.
-
SMS: Spiking Marching Scheme for Efficient Long Time Integration of Differential Equations
Authors:
Qian Zhang,
Adar Kahana,
George Em Karniadakis,
Panos Stinis
Abstract:
We propose a Spiking Neural Network (SNN)-based explicit numerical scheme for long time integration of time-dependent Ordinary and Partial Differential Equations (ODEs, PDEs). The core element of the method is a SNN, trained to use spike-encoded information about the solution at previous timesteps to predict spike-encoded information at the next timestep. After the network has been trained, it ope…
▽ More
We propose a Spiking Neural Network (SNN)-based explicit numerical scheme for long time integration of time-dependent Ordinary and Partial Differential Equations (ODEs, PDEs). The core element of the method is a SNN, trained to use spike-encoded information about the solution at previous timesteps to predict spike-encoded information at the next timestep. After the network has been trained, it operates as an explicit numerical scheme that can be used to compute the solution at future timesteps, given a spike-encoded initial condition. A decoder is used to transform the evolved spiking-encoded solution back to function values. We present results from numerical experiments of using the proposed method for ODEs and PDEs of varying complexity.
△ Less
Submitted 17 November, 2022;
originally announced November 2022.
-
Multifidelity Deep Operator Networks For Data-Driven and Physics-Informed Problems
Authors:
Amanda A. Howard,
Mauro Perego,
George E. Karniadakis,
Panos Stinis
Abstract:
Operator learning for complex nonlinear systems is increasingly common in modeling multi-physics and multi-scale systems. However, training such high-dimensional operators requires a large amount of expensive, high-fidelity data, either from experiments or simulations. In this work, we present a composite Deep Operator Network (DeepONet) for learning using two datasets with different levels of fid…
▽ More
Operator learning for complex nonlinear systems is increasingly common in modeling multi-physics and multi-scale systems. However, training such high-dimensional operators requires a large amount of expensive, high-fidelity data, either from experiments or simulations. In this work, we present a composite Deep Operator Network (DeepONet) for learning using two datasets with different levels of fidelity to accurately learn complex operators when sufficient high-fidelity data is not available. Additionally, we demonstrate that the presence of low-fidelity data can improve the predictions of physics-informed learning with DeepONets. We demonstrate the new multi-fidelity training in diverse examples, including modeling of the ice-sheet dynamics of the Humboldt glacier, Greenland, using two different fidelity models and also using the same physical model at two different resolutions.
△ Less
Submitted 21 November, 2023; v1 submitted 19 April, 2022;
originally announced April 2022.
-
Enhanced physics-constrained deep neural networks for modeling vanadium redox flow battery
Authors:
QiZhi He,
Yucheng Fu,
Panos Stinis,
Alexandre Tartakovsky
Abstract:
Numerical modeling and simulation have become indispensable tools for advancing a comprehensive understanding of the underlying mechanisms and cost-effective process optimization and control of flow batteries. In this study, we propose an enhanced version of the physics-constrained deep neural network (PCDNN) approach [1] to provide high-accuracy voltage predictions in the vanadium redox flow batt…
▽ More
Numerical modeling and simulation have become indispensable tools for advancing a comprehensive understanding of the underlying mechanisms and cost-effective process optimization and control of flow batteries. In this study, we propose an enhanced version of the physics-constrained deep neural network (PCDNN) approach [1] to provide high-accuracy voltage predictions in the vanadium redox flow batteries (VRFBs). The purpose of the PCDNN approach is to enforce the physics-based zero-dimensional (0D) VRFB model in a neural network to assure model generalization for various battery operation conditions. Limited by the simplifications of the 0D model, the PCDNN cannot capture sharp voltage changes in the extreme SOC regions. To improve the accuracy of voltage prediction at extreme ranges, we introduce a second (enhanced) DNN to mitigate the prediction errors carried from the 0D model itself and call the resulting approach enhanced PCDNN (ePCDNN). By comparing the model prediction with experimental data, we demonstrate that the ePCDNN approach can accurately capture the voltage response throughout the charge--discharge cycle, including the tail region of the voltage discharge curve. Compared to the standard PCDNN, the prediction accuracy of the ePCDNN is significantly improved. The loss function for training the ePCDNN is designed to be flexible by adjusting the weights of the physics-constrained DNN and the enhanced DNN. This allows the ePCDNN framework to be transferable to battery systems with variable physical model fidelity.
△ Less
Submitted 3 March, 2022;
originally announced March 2022.
-
Machine-learning custom-made basis functions for partial differential equations
Authors:
Brek Meuris,
Saad Qadeer,
Panos Stinis
Abstract:
Spectral methods are an important part of scientific computing's arsenal for solving partial differential equations (PDEs). However, their applicability and effectiveness depend crucially on the choice of basis functions used to expand the solution of a PDE. The last decade has seen the emergence of deep learning as a strong contender in providing efficient representations of complex functions. In…
▽ More
Spectral methods are an important part of scientific computing's arsenal for solving partial differential equations (PDEs). However, their applicability and effectiveness depend crucially on the choice of basis functions used to expand the solution of a PDE. The last decade has seen the emergence of deep learning as a strong contender in providing efficient representations of complex functions. In the current work, we present an approach for combining deep neural networks with spectral methods to solve PDEs. In particular, we use a deep learning technique known as the Deep Operator Network (DeepONet), to identify candidate functions on which to expand the solution of PDEs. We have devised an approach which uses the candidate functions provided by the DeepONet as a starting point to construct a set of functions which have the following properties: i) they constitute a basis, 2) they are orthonormal, and 3) they are hierarchical i.e., akin to Fourier series or orthogonal polynomials. We have exploited the favorable properties of our custom-made basis functions to both study their approximation capability and use them to expand the solution of linear and nonlinear time-dependent PDEs.
△ Less
Submitted 9 November, 2021;
originally announced November 2021.
-
Structure-preserving Sparse Identification of Nonlinear Dynamics for Data-driven Modeling
Authors:
Kookjin Lee,
Nathaniel Trask,
Panos Stinis
Abstract:
Discovery of dynamical systems from data forms the foundation for data-driven modeling and recently, structure-preserving geometric perspectives have been shown to provide improved forecasting, stability, and physical realizability guarantees. We present here a unification of the Sparse Identification of Nonlinear Dynamics (SINDy) formalism with neural ordinary differential equations. The resultin…
▽ More
Discovery of dynamical systems from data forms the foundation for data-driven modeling and recently, structure-preserving geometric perspectives have been shown to provide improved forecasting, stability, and physical realizability guarantees. We present here a unification of the Sparse Identification of Nonlinear Dynamics (SINDy) formalism with neural ordinary differential equations. The resulting framework allows learning of both "black-box" dynamics and learning of structure preserving bracket formalisms for both reversible and irreversible dynamics. We present a suite of benchmarks demonstrating effectiveness and structure preservation, including for chaotic systems.
△ Less
Submitted 11 September, 2021;
originally announced September 2021.
-
Machine learning structure preserving brackets for forecasting irreversible processes
Authors:
Kookjin Lee,
Nathaniel A. Trask,
Panos Stinis
Abstract:
Forecasting of time-series data requires imposition of inductive biases to obtain predictive extrapolation, and recent works have imposed Hamiltonian/Lagrangian form to preserve structure for systems with reversible dynamics. In this work we present a novel parameterization of dissipative brackets from metriplectic dynamical systems appropriate for learning irreversible dynamics with unknown a pri…
▽ More
Forecasting of time-series data requires imposition of inductive biases to obtain predictive extrapolation, and recent works have imposed Hamiltonian/Lagrangian form to preserve structure for systems with reversible dynamics. In this work we present a novel parameterization of dissipative brackets from metriplectic dynamical systems appropriate for learning irreversible dynamics with unknown a priori model form. The process learns generalized Casimirs for energy and entropy guaranteed to be conserved and nondecreasing, respectively. Furthermore, for the case of added thermal noise, we guarantee exact preservation of a fluctuation-dissipation theorem, ensuring thermodynamic consistency. We provide benchmarks for dissipative systems demonstrating learned dynamics are more robust and generalize better than either "black-box" or penalty-based approaches.
△ Less
Submitted 23 June, 2021;
originally announced June 2021.
-
Physics-constrained deep neural network method for estimating parameters in a redox flow battery
Authors:
QiZhi He,
Panos Stinis,
Alexandre Tartakovsky
Abstract:
In this paper, we present a physics-constrained deep neural network (PCDNN) method for parameter estimation in the zero-dimensional (0D) model of the vanadium redox flow battery (VRFB). In this approach, we use deep neural networks (DNNs) to approximate the model parameters as functions of the operating conditions. This method allows the integration of the VRFB computational models as the physical…
▽ More
In this paper, we present a physics-constrained deep neural network (PCDNN) method for parameter estimation in the zero-dimensional (0D) model of the vanadium redox flow battery (VRFB). In this approach, we use deep neural networks (DNNs) to approximate the model parameters as functions of the operating conditions. This method allows the integration of the VRFB computational models as the physical constraints in the parameter learning process, leading to enhanced accuracy of parameter estimation and cell voltage prediction. Using an experimental dataset, we demonstrate that the PCDNN method can estimate model parameters for a range of operating conditions and improve the 0D model prediction of voltage compared to the 0D model prediction with constant operation-condition-independent parameters estimated with traditional inverse methods. We also demonstrate that the PCDNN approach has an improved generalization ability for estimating parameter values for operating conditions not used in the DNN training.
△ Less
Submitted 4 March, 2022; v1 submitted 21 June, 2021;
originally announced June 2021.
-
Enforcing constraints for time series prediction in supervised, unsupervised and reinforcement learning
Authors:
Panos Stinis
Abstract:
We assume that we are given a time series of data from a dynamical system and our task is to learn the flow map of the dynamical system. We present a collection of results on how to enforce constraints coming from the dynamical system in order to accelerate the training of deep neural networks to represent the flow map of the system as well as increase their predictive ability. In particular, we p…
▽ More
We assume that we are given a time series of data from a dynamical system and our task is to learn the flow map of the dynamical system. We present a collection of results on how to enforce constraints coming from the dynamical system in order to accelerate the training of deep neural networks to represent the flow map of the system as well as increase their predictive ability. In particular, we provide ways to enforce constraints during training for all three major modes of learning, namely supervised, unsupervised and reinforcement learning. In general, the dynamic constraints need to include terms which are analogous to memory terms in model reduction formalisms. Such memory terms act as a restoring force which corrects the errors committed by the learned flow map during prediction.
For supervised learning, the constraints are added to the objective function. For the case of unsupervised learning, in particular generative adversarial networks, the constraints are introduced by augmenting the input of the discriminator. Finally, for the case of reinforcement learning and in particular actor-critic methods, the constraints are added to the reward function. In addition, for the reinforcement learning case, we present a novel approach based on homotopy of the action-value function in order to stabilize and accelerate training. We use numerical results for the Lorenz system to illustrate the various constructions.
△ Less
Submitted 17 May, 2019;
originally announced May 2019.
-
A comparative study of physics-informed neural network models for learning unknown dynamics and constitutive relations
Authors:
Ramakrishna Tipireddy,
Paris Perdikaris,
Panos Stinis,
Alexandre Tartakovsky
Abstract:
We investigate the use of discrete and continuous versions of physics-informed neural network methods for learning unknown dynamics or constitutive relations of a dynamical system. For the case of unknown dynamics, we represent all the dynamics with a deep neural network (DNN). When the dynamics of the system are known up to the specification of constitutive relations (that can depend on the state…
▽ More
We investigate the use of discrete and continuous versions of physics-informed neural network methods for learning unknown dynamics or constitutive relations of a dynamical system. For the case of unknown dynamics, we represent all the dynamics with a deep neural network (DNN). When the dynamics of the system are known up to the specification of constitutive relations (that can depend on the state of the system), we represent these constitutive relations with a DNN. The discrete versions combine classical multistep discretization methods for dynamical systems with neural network based machine learning methods. On the other hand, the continuous versions utilize deep neural networks to minimize the residual function for the continuous governing equations. We use the case of a fedbatch bioreactor system to study the effectiveness of these approaches and discuss conditions for their applicability. Our results indicate that the accuracy of the trained neural network models is much higher for the cases where we only have to learn a constitutive relation instead of the whole dynamics. This finding corroborates the well-known fact from scientific computing that building as much structural information is available into an algorithm can enhance its efficiency and/or accuracy.
△ Less
Submitted 2 April, 2019;
originally announced April 2019.
-
Doing the impossible: Why neural networks can be trained at all
Authors:
Nathan O. Hodas,
Panos Stinis
Abstract:
As deep neural networks grow in size, from thousands to millions to billions of weights, the performance of those networks becomes limited by our ability to accurately train them. A common naive question arises: if we have a system with billions of degrees of freedom, don't we also need billions of samples to train it? Of course, the success of deep learning indicates that reliable models can be l…
▽ More
As deep neural networks grow in size, from thousands to millions to billions of weights, the performance of those networks becomes limited by our ability to accurately train them. A common naive question arises: if we have a system with billions of degrees of freedom, don't we also need billions of samples to train it? Of course, the success of deep learning indicates that reliable models can be learned with reasonable amounts of data. Similar questions arise in protein folding, spin glasses and biological neural networks. With effectively infinite potential folding/spin/wiring configurations, how does the system find the precise arrangement that leads to useful and robust results? Simple sampling of the possible configurations until an optimal one is reached is not a viable option even if one waited for the age of the universe. On the contrary, there appears to be a mechanism in the above phenomena that forces them to achieve configurations that live on a low-dimensional manifold, avoiding the curse of dimensionality. In the current work we use the concept of mutual information between successive layers of a deep neural network to elucidate this mechanism and suggest possible ways of exploiting it to accelerate training. We show that adding structure to the neural network that enforces higher mutual information between layers speeds training and leads to more accurate results. High mutual information between layers implies that the effective number of free parameters is exponentially smaller than the raw number of tunable weights.
△ Less
Submitted 27 May, 2018; v1 submitted 13 May, 2018;
originally announced May 2018.
-
Enforcing constraints for interpolation and extrapolation in Generative Adversarial Networks
Authors:
Panos Stinis,
Tobias Hagge,
Alexandre M. Tartakovsky,
Enoch Yeung
Abstract:
We suggest ways to enforce given constraints in the output of a Generative Adversarial Network (GAN) generator both for interpolation and extrapolation (prediction). For the case of dynamical systems, given a time series, we wish to train GAN generators that can be used to predict trajectories starting from a given initial condition. In this setting, the constraints can be in algebraic and/or diff…
▽ More
We suggest ways to enforce given constraints in the output of a Generative Adversarial Network (GAN) generator both for interpolation and extrapolation (prediction). For the case of dynamical systems, given a time series, we wish to train GAN generators that can be used to predict trajectories starting from a given initial condition. In this setting, the constraints can be in algebraic and/or differential form. Even though we are predominantly interested in the case of extrapolation, we will see that the tasks of interpolation and extrapolation are related. However, they need to be treated differently.
For the case of interpolation, the incorporation of constraints is built into the training of the GAN. The incorporation of the constraints respects the primary game-theoretic setup of a GAN so it can be combined with existing algorithms. However, it can exacerbate the problem of instability during training that is well-known for GANs. We suggest adding small noise to the constraints as a simple remedy that has performed well in our numerical experiments.
The case of extrapolation (prediction) is more involved. During training, the GAN generator learns to interpolate a noisy version of the data and we enforce the constraints. This approach has connections with model reduction that we can utilize to improve the efficiency and accuracy of the training. Depending on the form of the constraints, we may enforce them also during prediction through a projection step. We provide examples of linear and nonlinear systems of differential equations to illustrate the various constructions.
△ Less
Submitted 19 June, 2019; v1 submitted 21 March, 2018;
originally announced March 2018.
-
Solving differential equations with unknown constitutive relations as recurrent neural networks
Authors:
Tobias Hagge,
Panos Stinis,
Enoch Yeung,
Alexandre M. Tartakovsky
Abstract:
We solve a system of ordinary differential equations with an unknown functional form of a sink (reaction rate) term. We assume that the measurements (time series) of state variables are partially available, and we use recurrent neural network to "learn" the reaction rate from this data. This is achieved by including a discretized ordinary differential equations as part of a recurrent neural networ…
▽ More
We solve a system of ordinary differential equations with an unknown functional form of a sink (reaction rate) term. We assume that the measurements (time series) of state variables are partially available, and we use recurrent neural network to "learn" the reaction rate from this data. This is achieved by including a discretized ordinary differential equations as part of a recurrent neural network training problem. We extend TensorFlow's recurrent neural network architecture to create a simple but scalable and effective solver for the unknown functions, and apply it to a fedbatch bioreactor simulation problem. Use of techniques from recent deep learning literature enables training of functions with behavior manifesting over thousands of time steps. Our networks are structurally similar to recurrent neural networks, but differences in design and function require modifications to the conventional wisdom about training such networks.
△ Less
Submitted 5 October, 2017;
originally announced October 2017.