-
On the importance of learning non-local dynamics for stable data-driven climate modeling: A 1D gravity wave-QBO testbed
Authors:
Hamid A. Pahlavan,
Pedram Hassanzadeh,
M. Joan Alexander
Abstract:
Machine learning (ML) techniques, especially neural networks (NNs), have shown promise in learning subgrid-scale parameterizations for climate models. However, a major problem with data-driven parameterizations, particularly those learned with supervised algorithms, is model instability. Current remedies are often ad-hoc and lack a theoretical foundation. Here, we combine ML theory and climate physics to address a source of instability in NN-based parameterization. We demonstrate the importance of learning spatially $\textit{non-local}$ dynamics using a 1D model of the quasi-biennial oscillation (QBO) with gravity wave (GW) parameterization as a testbed. While common offline metrics fail to identify shortcomings in learning non-local dynamics, we show that the concept of receptive field (RF) can identify instability a-priori. We find that NN-based parameterizations that seem to accurately predict GW forcings from wind profiles ($\mathbf{R^2 \approx 0.99}$) cause unstable simulations when RF is too small to capture the non-local dynamics, while NNs of the same size but large-enough RF are stable. We examine three broad classes of architectures, namely convolutional NNs, Fourier neural operators, and fully-connected NNs; the latter two have inherently large RFs. We also demonstrate that learning non-local dynamics is crucial for the stability and accuracy of a data-driven spatiotemporal emulator of the zonal wind field. Given the ubiquity of non-local dynamics in the climate system, we expect the use of effective RF, which can be computed for any NN architecture, to be important for many applications. This work highlights the necessity of integrating ML theory with physics to design and analyze data-driven algorithms for weather and climate modeling.
Submitted 15 July, 2024; v1 submitted 6 July, 2024;
originally announced July 2024.
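The receptive-field idea in the abstract above can be illustrated with a minimal sketch. It computes the theoretical receptive field of a stack of 1D convolutional layers from kernel size, stride, and dilation. This is the standard textbook formula, not the authors' code, and it is not the "effective RF" the abstract refers to; the function name and the example layer configurations are hypothetical.

```python
from typing import Sequence, Tuple


def receptive_field(layers: Sequence[Tuple[int, int, int]]) -> int:
    """Theoretical receptive field (in input grid points) of stacked 1D convolutions.

    Each layer is given as (kernel_size, stride, dilation).
    Hypothetical helper for illustration only.
    """
    rf = 1    # a single output point sees one input point before any layer
    jump = 1  # spacing, in input grid points, between adjacent features
    for kernel, stride, dilation in layers:
        # Each layer widens the view by (kernel - 1) dilated steps,
        # measured in the spacing accumulated from earlier strides.
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


# Two networks with the same number of layers and kernel size (illustrative values):
narrow = receptive_field([(3, 1, 1)] * 3)                # plain convolutions
wide = receptive_field([(3, 1, d) for d in (1, 2, 4)])   # dilated convolutions
print(narrow, wide)
```

Comparing such a number against the vertical extent over which the forcing depends on the wind profile is one way to read the abstract's claim that networks of the same size can differ in stability depending on RF.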
-
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
Authors:
JoonHo Lee,
Jae Oh Woo,
Juree Seok,
Parisa Hassanzadeh,
Wooseok Jang,
JuYoun Son,
Sima Didari,
Baruch Gutow,
Heng Hao,
Hankyu Moon,
Wenjun Hu,
Yeong-Dae Kwon,
Taehee Lee,
Seungjai Min
Abstract:
Assessing response quality to instructions in language models is vital but challenging due to the complexity of human language across different contexts. This complexity often results in ambiguous or inconsistent interpretations, making accurate assessment difficult. To address this issue, we propose a novel Uncertainty-aware Reward Model (URM) that introduces a robust uncertainty estimation for the quality of paired responses based on Bayesian approximation. Trained with preference datasets, our uncertainty-enabled proxy not only scores rewards for responses but also evaluates their inherent uncertainty. Empirical results demonstrate significant benefits of incorporating the proposed proxy into language model training. Our method boosts the instruction following capability of language models by refining data curation for training and improving policy optimization objectives, thereby surpassing existing methods by a large margin on benchmarks such as Vicuna and MT-bench. These findings highlight that our proposed approach substantially advances language model training and paves a new way of harnessing uncertainty within language models.
Submitted 19 May, 2024; v1 submitted 10 May, 2024;
originally announced May 2024.
-
Extreme Event Prediction with Multi-agent Reinforcement Learning-based Parametrization of Atmospheric and Oceanic Turbulence
Authors:
Rambod Mojgani,
Daniel Waelchli,
Yifei Guan,
Petros Koumoutsakos,
Pedram Hassanzadeh
Abstract:
Global climate models (GCMs) are the main tools for understanding and predicting climate change. However, due to limited numerical resolutions, these models suffer from major structural uncertainties; e.g., they cannot resolve critical processes such as small-scale eddies in atmospheric and oceanic turbulence. Thus, such small-scale processes have to be represented as a function of the resolved scales via closures (parametrization). The accuracy of these closures is particularly important for capturing climate extremes. Traditionally, such closures are based on heuristics and simplifying assumptions about the unresolved physics. Recently, supervised-learned closures, trained offline on high-fidelity data, have been shown to outperform the classical physics-based closures. However, this approach requires a significant amount of high-fidelity training data and can also lead to instabilities. Reinforcement learning is emerging as a potent alternative for developing such closures as it requires only low-order statistics and leads to stable closures. In Scientific Multi-Agent Reinforcement Learning (SMARL) computational elements serve a dual role of discretization points and learning agents. We leverage SMARL and fundamentals of turbulence physics to learn closures for prototypes of atmospheric and oceanic turbulence. The policy is trained using only the enstrophy spectrum, which is nearly invariant and can be estimated from a few high-fidelity samples (these few samples are far from enough for supervised/offline learning). We show that these closures lead to stable low-resolution simulations that, at a fraction of the cost, can reproduce the high-fidelity simulations' statistics, including the tails of the probability density functions. The results demonstrate the high potential of SMARL for closure modeling for GCMs, especially in the regime of scarce data and indirect observations.
Submitted 1 December, 2023;
originally announced December 2023.
-
Learning Payment-Free Resource Allocation Mechanisms
Authors:
Sihan Zeng,
Sujay Bhatt,
Eleonora Kreacic,
Parisa Hassanzadeh,
Alec Koppel,
Sumitra Ganesh
Abstract:
We consider the design of mechanisms that allocate limited resources among self-interested agents using neural networks. Unlike recent works that leverage machine learning for revenue maximization in auctions, we consider welfare maximization as the key objective in the payment-free setting. Without payment exchange, it is unclear how we can align agents' incentives to achieve the desired objectives of truthfulness and social welfare simultaneously, without resorting to approximations. Our work makes novel contributions by designing an approximate mechanism that desirably trades off social welfare with truthfulness. Specifically, (i) we contribute a new end-to-end neural network architecture, ExS-Net, that accommodates the idea of "money-burning" for mechanism design without payments; (ii) we provide a generalization bound that guarantees the mechanism performance when trained under finite samples; and (iii) we provide an experimental demonstration of the merits of the proposed mechanism.
Submitted 14 August, 2024; v1 submitted 17 November, 2023;
originally announced November 2023.
-
Learning Closed-form Equations for Subgrid-scale Closures from High-fidelity Data: Promises and Challenges
Authors:
Karan Jakhar,
Yifei Guan,
Rambod Mojgani,
Ashesh Chattopadhyay,
Pedram Hassanzadeh
Abstract:
There is growing interest in discovering interpretable, closed-form equations for subgrid-scale (SGS) closures/parameterizations of complex processes in Earth systems. Here, we apply a common equation-discovery technique with expansive libraries to learn closures from filtered direct numerical simulations of 2D turbulence and Rayleigh-Bénard convection (RBC). Across common filters (e.g., Gaussian, box), we robustly discover closures of the same form for momentum and heat fluxes. These closures depend on nonlinear combinations of gradients of filtered variables, with constants that are independent of the fluid/flow properties and only depend on filter type/size. We show that these closures are the nonlinear gradient model (NGM), which is derivable analytically using Taylor series. Indeed, we suggest that with common (physics-free) equation-discovery algorithms, for many common systems/physics, discovered closures are consistent with the leading term of the Taylor series (except when cutoff filters are used). Like previous studies, we find that large-eddy simulations with NGM closures are unstable, despite significant similarities between the true and NGM-predicted fluxes (correlations $> 0.95$). We identify two shortcomings as reasons for these instabilities: in 2D, NGM produces zero kinetic energy transfer between resolved and subgrid scales, lacking both diffusion and backscattering. In RBC, potential energy backscattering is poorly predicted. Moreover, we show that SGS fluxes diagnosed from data, presumed the "truth" for discovery, depend on filtering procedures and are not unique. Accordingly, to learn accurate, stable closures in future work, we propose several ideas around using physics-informed libraries, loss functions, and metrics. These findings are relevant to closure modeling of any multi-scale system.
Submitted 7 July, 2024; v1 submitted 8 June, 2023;
originally announced June 2023.
-
Long-term instabilities of deep learning-based digital twins of the climate system: The cause and a solution
Authors:
Ashesh Chattopadhyay,
Pedram Hassanzadeh
Abstract:
Long-term stability is a critical property for deep learning-based data-driven digital twins of the Earth system. Such data-driven digital twins enable sub-seasonal and seasonal predictions of extreme environmental events, probabilistic forecasts that require a large number of ensemble members, and computationally tractable high-resolution Earth system models where expensive components of the models can be replaced with cheaper data-driven surrogates. Owing to computational cost, physics-based digital twins, though long-term stable, are intractable for real-time decision-making. Data-driven digital twins offer a cheaper alternative to them and can provide real-time predictions. However, such digital twins can only provide short-term forecasts accurately since they become unstable when time-integrated beyond 20 days. Currently, the cause of the instabilities is unknown, and the methods that are used to improve their stability horizons are ad-hoc and lack rigorous theory. In this paper, we reveal that the universal causal mechanism for these instabilities in any turbulent flow is spectral bias, wherein any deep learning architecture is biased to learn only the large-scale dynamics and ignores the small scales completely. We further elucidate how turbulence physics and the absence of convergence in deep learning-based time-integrators amplify this bias, leading to unstable error propagation. Finally, using the quasigeostrophic flow and ECMWF Reanalysis data as test cases, we bridge the gap between deep learning theory and fundamental numerical analysis to propose one mitigative solution to such instabilities. We develop long-term stable data-driven digital twins for the climate system and demonstrate accurate short-term forecasts, and hundreds of years of long-term stable time-integration with accurate mean and variability.
Submitted 14 April, 2023;
originally announced April 2023.
-
Sequential Fair Resource Allocation under a Markov Decision Process Framework
Authors:
Parisa Hassanzadeh,
Eleonora Kreacic,
Sihan Zeng,
Yuchen Xiao,
Sumitra Ganesh
Abstract:
We study the sequential decision-making problem of allocating a limited resource to agents that reveal their stochastic demands on arrival over a finite horizon. Our goal is to design fair allocation algorithms that exhaust the available resource budget. This is challenging in sequential settings where information on future demands is not available at the time of decision-making. We formulate the problem as a discrete time Markov decision process (MDP). We propose a new algorithm, SAFFE, that makes fair allocations with respect to the entire demands revealed over the horizon by accounting for expected future demands at each arrival time. The algorithm introduces regularization which enables the prioritization of current revealed demands over future potential demands depending on the uncertainty in agents' future demands. Using the MDP formulation, we show that SAFFE optimizes allocations based on an upper bound on the Nash Social Welfare fairness objective, and we bound its gap to optimality with the use of concentration bounds on total future demands. Using synthetic and real data, we compare the performance of SAFFE against existing approaches and a reinforcement learning policy trained on the MDP. We show that SAFFE leads to more fair and efficient allocations and achieves close-to-optimal performance in settings with dense arrivals.
Submitted 16 June, 2023; v1 submitted 9 January, 2023;
originally announced January 2023.
-
Certifiably Robust Policy Learning against Adversarial Communication in Multi-agent Systems
Authors:
Yanchao Sun,
Ruijie Zheng,
Parisa Hassanzadeh,
Yongyuan Liang,
Soheil Feizi,
Sumitra Ganesh,
Furong Huang
Abstract:
Communication is important in many multi-agent reinforcement learning (MARL) problems for agents to share information and make good decisions. However, when deploying trained communicative agents in a real-world application where noise and potential attackers exist, the safety of communication-based policies becomes a severe issue that is underexplored. Specifically, if communication messages are manipulated by malicious attackers, agents relying on untrustworthy communication may take unsafe actions that lead to catastrophic consequences. Therefore, it is crucial to ensure that agents will not be misled by corrupted communication, while still benefiting from benign communication. In this work, we consider an environment with $N$ agents, where the attacker may arbitrarily change the communication from any $C<\frac{N-1}{2}$ agents to a victim agent. For this strong threat model, we propose a certifiable defense by constructing a message-ensemble policy that aggregates multiple randomly ablated message sets. Theoretical analysis shows that this message-ensemble policy can utilize benign communication while being certifiably robust to adversarial communication, regardless of the attacking algorithm. Experiments in multiple environments verify that our defense significantly improves the robustness of trained policies against various types of attacks.
Submitted 2 July, 2022; v1 submitted 21 June, 2022;
originally announced June 2022.
-
Deep learning-enhanced ensemble-based data assimilation for high-dimensional nonlinear dynamical systems
Authors:
Ashesh Chattopadhyay,
Ebrahim Nabizadeh,
Eviatar Bach,
Pedram Hassanzadeh
Abstract:
Data assimilation (DA) is a key component of many forecasting models in science and engineering. DA allows one to estimate better initial conditions using an imperfect dynamical model of the system and noisy/sparse observations available from the system. Ensemble Kalman filter (EnKF) is a DA algorithm that is widely used in applications involving high-dimensional nonlinear dynamical systems. However, EnKF requires evolving large ensembles of forecasts using the dynamical model of the system. This often becomes computationally intractable, especially when the number of states of the system is very large, e.g., for weather prediction. With small ensembles, the estimated background error covariance matrix in the EnKF algorithm suffers from sampling error, leading to an erroneous estimate of the analysis state (initial condition for the next forecast cycle). In this work, we propose hybrid ensemble Kalman filter (H-EnKF), which is applied to a two-layer quasi-geostrophic flow system as a test case. This framework utilizes a pre-trained deep learning-based data-driven surrogate that inexpensively generates and evolves a large data-driven ensemble of the states of the system to accurately compute the background error covariance matrix with less sampling error. The H-EnKF framework estimates a better initial condition without the need for any ad-hoc localization strategies. H-EnKF can be extended to any ensemble-based DA algorithm, e.g., particle filters, which are currently difficult to use for high dimensional systems.
Submitted 9 June, 2022;
originally announced June 2022.
-
Explaining the physics of transfer learning a data-driven subgrid-scale closure to a different turbulent flow
Authors:
Adam Subel,
Yifei Guan,
Ashesh Chattopadhyay,
Pedram Hassanzadeh
Abstract:
Transfer learning (TL) is becoming a powerful tool in scientific applications of neural networks (NNs), such as weather/climate prediction and turbulence modeling. TL enables out-of-distribution generalization (e.g., extrapolation in parameters) and effective blending of disparate training sets (e.g., simulations and observations). In TL, selected layers of a NN, already trained for a base system, are re-trained using a small dataset from a target system. For effective TL, we need to know 1) what are the best layers to re-train? and 2) what physics are learned during TL? Here, we present novel analyses and a new framework to address (1)-(2) for a broad range of multi-scale, nonlinear systems. Our approach combines spectral analyses of the systems' data with spectral analyses of convolutional NN's activations and kernels, explaining the inner-workings of TL in terms of the system's nonlinear physics. Using subgrid-scale modeling of several setups of 2D turbulence as test cases, we show that the learned kernels are combinations of low-, band-, and high-pass filters, and that TL learns new filters whose nature is consistent with the spectral differences of base and target systems. We also find the shallowest layers are the best to re-train in these cases, which is against the common wisdom guiding TL in machine learning literature. Our framework identifies the best layer(s) to re-train beforehand, based on physics and NN theory. Together, these analyses explain the physics learned in TL and provide a framework to guide TL for wide-ranging applications in science and engineering, such as climate change modeling.
Submitted 7 June, 2022;
originally announced June 2022.
-
Generative Models with Information-Theoretic Protection Against Membership Inference Attacks
Authors:
Parisa Hassanzadeh,
Robert E. Tillman
Abstract:
Deep generative models, such as Generative Adversarial Networks (GANs), synthesize diverse high-fidelity data samples by estimating the underlying distribution of high dimensional data. Despite their success, GANs may disclose private information from the data they are trained on, making them susceptible to adversarial attacks such as membership inference attacks, in which an adversary aims to determine if a record was part of the training set. We propose an information-theoretically motivated regularization term that prevents the generative model from overfitting to training data and encourages generalizability. We show that this penalty minimizes the Jensen-Shannon divergence between components of the generator trained on data with different membership, and that it can be implemented at low cost using an additional classifier. Our experiments on image datasets demonstrate that with the proposed regularization, which comes at only a small added computational cost, GANs are able to preserve privacy and generate high-quality samples that achieve better downstream classification performance compared to non-private and differentially private generative models.
Submitted 31 May, 2022;
originally announced June 2022.
-
Long-term stability and generalization of observationally-constrained stochastic data-driven models for geophysical turbulence
Authors:
Ashesh Chattopadhyay,
Jaideep Pathak,
Ebrahim Nabizadeh,
Wahid Bhimji,
Pedram Hassanzadeh
Abstract:
Recent years have seen a surge in interest in building deep learning-based fully data-driven models for weather prediction. Such deep learning models if trained on observations can mitigate certain biases in current state-of-the-art weather models, some of which stem from inaccurate representation of subgrid-scale processes. However, these data-driven models, being over-parameterized, require a lot of training data which may not be available from reanalysis (observational data) products. Moreover, an accurate, noise-free, initial condition to start forecasting with a data-driven weather model is not available in realistic scenarios. Finally, deterministic data-driven forecasting models suffer from issues with long-term stability and unphysical climate drift, which makes these data-driven models unsuitable for computing climate statistics. Given these challenges, previous studies have tried to pre-train deep learning-based weather forecasting models on a large amount of imperfect long-term climate model simulations and then re-train them on available observational data. In this paper, we propose a convolutional variational autoencoder-based stochastic data-driven model that is pre-trained on an imperfect climate model simulation from a 2-layer quasi-geostrophic flow and re-trained, using transfer learning, on a small number of noisy observations from a perfect simulation. This re-trained model then performs stochastic forecasting with a noisy initial condition sampled from the perfect simulation. We show that our ensemble-based stochastic data-driven model outperforms a baseline deterministic encoder-decoder-based convolutional model in terms of short-term skills while remaining stable for long-term climate simulations yielding accurate climatology.
Submitted 9 May, 2022;
originally announced May 2022.
-
Lagrangian PINNs: A causality-conforming solution to failure modes of physics-informed neural networks
Authors:
Rambod Mojgani,
Maciej Balajewicz,
Pedram Hassanzadeh
Abstract:
Physics-informed neural networks (PINNs) leverage neural-networks to find the solutions of partial differential equation (PDE)-constrained optimization problems with initial conditions and boundary conditions as soft constraints. These soft constraints are often considered to be the sources of the complexity in the training phase of PINNs. Here, we demonstrate that the challenge of training (i) persists even when the boundary conditions are strictly enforced, and (ii) is closely related to the Kolmogorov n-width associated with problems demonstrating transport, convection, traveling waves, or moving fronts. Given this realization, we describe the mechanism underlying the training schemes such as those used in eXtended PINNs (XPINN), curriculum regularization, and sequence-to-sequence learning. For an important category of PDEs, i.e., governed by non-linear convection-diffusion equation, we propose reformulating PINNs on a Lagrangian frame of reference, i.e., LPINNs, as a PDE-informed solution. A parallel architecture with two branches is proposed. One branch solves for the state variables on the characteristics, and the second branch solves for the low-dimensional characteristics curves. The proposed architecture conforms to the causality innate to the convection, and leverages the direction of travel of the information in the domain. Finally, we demonstrate that the loss landscapes of LPINNs are less sensitive to the so-called "complexity" of the problems, compared to those in the traditional PINNs in the Eulerian framework.
Submitted 5 May, 2022;
originally announced May 2022.
-
Optimal Admission Control for Multiclass Queues with Time-Varying Arrival Rates via State Abstraction
Authors:
Marc Rigter,
Danial Dervovic,
Parisa Hassanzadeh,
Jason Long,
Parisa Zehtabi,
Daniele Magazzeni
Abstract:
We consider a novel queuing problem where the decision-maker must choose to accept or reject randomly arriving tasks into a no-buffer queue, where they are processed by $N$ identical servers. Each task has a price, which is a positive real number, and a class. Each class of task has a different price distribution and service rate, and arrives according to an inhomogeneous Poisson process. The objective is to decide which tasks to accept so that the total price of tasks processed is maximised over a finite horizon. We formulate the problem as a discrete time Markov Decision Process (MDP) with a hybrid state space. We show that the optimal value function has a specific structure, which enables us to solve the hybrid MDP exactly. Moreover, we prove that as the time step is reduced, the discrete time solution approaches the optimal solution to the original continuous time problem. To improve the scalability of our approach to a greater number of task classes, we present an approximation based on state abstraction. We validate our approach on synthetic data, as well as a real financial fraud data set, which is the motivating application for this work.
Submitted 14 March, 2022;
originally announced March 2022.
-
FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators
Authors:
Jaideep Pathak,
Shashank Subramanian,
Peter Harrington,
Sanjeev Raja,
Ashesh Chattopadhyay,
Morteza Mardani,
Thorsten Kurth,
David Hall,
Zongyi Li,
Kamyar Azizzadenesheli,
Pedram Hassanzadeh,
Karthik Kashinath,
Animashree Anandkumar
Abstract:
FourCastNet, short for Fourier Forecasting Neural Network, is a global data-driven weather forecasting model that provides accurate short to medium-range global predictions at $0.25^{\circ}$ resolution. FourCastNet accurately forecasts high-resolution, fast-timescale variables such as the surface wind speed, precipitation, and atmospheric water vapor. It has important implications for planning wind energy resources, predicting extreme weather events such as tropical cyclones, extra-tropical cyclones, and atmospheric rivers. FourCastNet matches the forecasting accuracy of the ECMWF Integrated Forecasting System (IFS), a state-of-the-art Numerical Weather Prediction (NWP) model, at short lead times for large-scale variables, while outperforming IFS for variables with complex fine-scale structure, including precipitation. FourCastNet generates a week-long forecast in less than 2 seconds, orders of magnitude faster than IFS. The speed of FourCastNet enables the creation of rapid and inexpensive large-ensemble forecasts with thousands of ensemble-members for improving probabilistic forecasting. We discuss how data-driven deep learning models such as FourCastNet are a valuable addition to the meteorology toolkit to aid and augment NWP models.
Submitted 22 February, 2022;
originally announced February 2022.
-
Tradeoffs in Streaming Binary Classification under Limited Inspection Resources
Authors:
Parisa Hassanzadeh,
Danial Dervovic,
Samuel Assefa,
Prashant Reddy,
Manuela Veloso
Abstract:
Institutions are increasingly relying on machine learning models to identify and alert on abnormal events, such as fraud, cyber attacks and system failures. These alerts often need to be manually investigated by specialists. Given the operational cost of manual inspections, the suspicious events are selected by alerting systems with carefully designed thresholds. In this paper, we consider an imbalanced binary classification problem, where events arrive sequentially and only a limited number of suspicious events can be inspected. We model the event arrivals as a non-homogeneous Poisson process, and compare various suspicious event selection methods including those based on static and adaptive thresholds. For each method, we analytically characterize the tradeoff between the minority-class detection rate and the inspection capacity as a function of the data class imbalance and the classifier confidence score densities. We implement the selection methods on a real public fraud detection dataset and compare the empirical results with analytical bounds. Finally, we investigate how class imbalance and the choice of classifier impact the tradeoff.
Submitted 29 October, 2021; v1 submitted 5 October, 2021;
originally announced October 2021.
-
Non-Parametric Stochastic Sequential Assignment With Random Arrival Times
Authors:
Danial Dervovic,
Parisa Hassanzadeh,
Samuel Assefa,
Prashant Reddy
Abstract:
We consider a problem wherein jobs arrive at random times and assume random values. Upon each job arrival, the decision-maker must decide immediately whether or not to accept the job and gain the value on offer as a reward, with the constraint that they may only accept at most $n$ jobs over some reference time period. The decision-maker only has access to $M$ independent realisations of the job arrival process. We propose an algorithm, Non-Parametric Sequential Allocation (NPSA), for solving this problem. Moreover, we prove that the expected reward returned by the NPSA algorithm converges in probability to optimality as $M$ grows large. We demonstrate the effectiveness of the algorithm empirically on synthetic data and on public fraud-detection datasets, from where the motivation for this work is derived.
Submitted 9 June, 2021;
originally announced June 2021.
-
Multi-point Coordination in Massive MIMO Systems with Sectorized Antennas
Authors:
Shahram Shahsavari,
Mehrdad Nosrati,
Parisa Hassanzadeh,
Alexei Ashikhmin,
Thomas L. Marzetta,
Elza Erkip
Abstract:
Non-cooperative cellular massive MIMO, combined with power control, is known to lead to significant improvements in per-user throughput compared with conventional LTE technology. In this paper, we investigate further refinements to massive MIMO, first, in the form of three-fold sectorization, and second, coordinated multi-point operation (with and without sectorization), in which the three base stations cooperate in the joint service of their users. For these scenarios, we analyze the downlink performance for both maximum-ratio and zero-forcing precoding and derive closed-form lower-bound expressions on the achievable rate of the users. These expressions are then used to formulate power optimization problems with two throughput fairness criteria: i) network-wide max-min fairness, and ii) per-cell max-min fairness. Furthermore, we provide centralized and decentralized power control strategies to optimize the transmit powers in the network. We demonstrate that employing sectorized antenna elements mitigates the detrimental effects of pilot contamination by rejecting a portion of interfering pilots in the spatial domain during channel estimation phase. Simulation results with practical sectorized antennas reveal that sectorization and multi-point coordination combined with sectorization lead to more than 1.7x and 2.6x improvements in the 95%-likely per-user throughput, respectively.
Submitted 21 April, 2021;
originally announced April 2021.
-
Towards physically consistent data-driven weather forecasting: Integrating data assimilation with equivariance-preserving deep spatial transformers
Authors:
Ashesh Chattopadhyay,
Mustafa Mustafa,
Pedram Hassanzadeh,
Eviatar Bach,
Karthik Kashinath
Abstract:
There is growing interest in data-driven weather prediction (DDWP), for example using convolutional neural networks such as U-NETs that are trained on data from models or reanalysis. Here, we propose 3 components to integrate with commonly used DDWP models in order to improve their physical consistency and forecast accuracy. These components are 1) a deep spatial transformer added to the latent space of the U-NETs to preserve a property called equivariance, which is related to correctly capturing rotations and scalings of features in spatio-temporal data, 2) a data-assimilation (DA) algorithm to ingest noisy observations and improve the initial conditions for next forecasts, and 3) a multi-time-step algorithm, which combines forecasts from DDWP models with different time steps through DA, improving the accuracy of forecasts at short intervals. To show the benefit/feasibility of each component, we use geopotential height at 500 hPa (Z500) from ERA5 reanalysis and examine the short-term forecast accuracy of specific setups of the DDWP framework. Results show that the equivariance-preserving networks (U-STNs) clearly outperform the U-NETs, for example improving the forecast skill by $45\%$. Using a sigma-point ensemble Kalman (SPEnKF) algorithm for DA and U-STN as the forward model, we show that stable, accurate DA cycles are achieved even with high observation noise. The DDWP+DA framework substantially benefits from large ($O(1000)$) ensembles that are inexpensively generated with the data-driven forward model in each DA cycle. The multi-time-step DDWP+DA framework also shows promise, e.g., it reduces the average error by factors of 2-3.
Submitted 16 March, 2021;
originally announced March 2021.
-
Analog forecasting of extreme-causing weather patterns using deep learning
Authors:
Ashesh Chattopadhyay,
Ebrahim Nabizadeh,
Pedram Hassanzadeh
Abstract:
Numerical weather prediction (NWP) models require ever-growing computing time/resources, but still have difficulties with predicting weather extremes. Here we introduce a data-driven framework that is based on analog forecasting (prediction using past similar patterns) and employs a novel deep learning pattern-recognition technique (capsule neural networks, CapsNets) and an impact-based auto-labeling strategy. CapsNets are trained on mid-tropospheric large-scale circulation patterns (Z500) labeled $0-4$ depending on the existence and geographical region of surface temperature extremes over North America several days ahead. The trained networks predict the occurrence/region of cold or heat waves, only using Z500, with accuracies (recalls) of $69\%-45\%$ $(77\%-48\%)$ or $62\%-41\%$ $(73\%-47\%)$ $1-5$ days ahead. CapsNets outperform simpler techniques such as convolutional neural networks and logistic regression. Using both temperature and Z500, accuracies (recalls) with CapsNets increase to $\sim 80\%$ $(88\%)$, showing the promise of multi-modal data-driven frameworks for accurate/fast extreme weather predictions, which can augment NWP efforts in providing early warnings.
Submitted 12 January, 2020; v1 submitted 26 July, 2019;
originally announced July 2019.
-
Centralized Caching and Delivery of Correlated Contents over Gaussian Broadcast Channels
Authors:
Qianqian Yang,
Parisa Hassanzadeh,
Deniz Gündüz,
Elza Erkip
Abstract:
Content delivery in a multi-user cache-aided broadcast network is studied, where a server holding a database of correlated contents communicates with the users over a Gaussian broadcast channel (BC). The minimum transmission power required to satisfy all possible demand combinations is studied, when the users are equipped with caches of equal size. Assuming uncoded cache placement, a lower bound on the required transmit power as a function of the cache capacity is derived. An achievable centralized caching scheme is proposed, which not only utilizes the users' local caches, but also exploits the correlation among the contents in the database. The performance of the scheme, which provides an upper bound on the required transmit power for a given cache capacity, is characterized. Our results indicate that exploiting the correlations among the contents in a cache-aided Gaussian BC can provide significant energy savings.
Submitted 21 June, 2019;
originally announced June 2019.
-
Data-driven prediction of a multi-scale Lorenz 96 chaotic system using deep learning methods: Reservoir computing, ANN, and RNN-LSTM
Authors:
Ashesh Chattopadhyay,
Pedram Hassanzadeh,
Devika Subramanian
Abstract:
In this paper, the performance of three deep learning methods for predicting short-term evolution and for reproducing the long-term statistics of a multi-scale spatio-temporal Lorenz 96 system is examined. The methods are: echo state network (a type of reservoir computing, RC-ESN), deep feed-forward artificial neural network (ANN), and recurrent neural network with long short-term memory (RNN-LSTM). This Lorenz 96 system has three tiers of nonlinearly interacting variables representing slow/large-scale ($X$), intermediate ($Y$), and fast/small-scale ($Z$) processes. For training or testing, only $X$ is available; $Y$ and $Z$ are never known or used. We show that RC-ESN substantially outperforms ANN and RNN-LSTM for short-term prediction, e.g., accurately forecasting the chaotic trajectories for hundreds of the numerical solver's time steps, equivalent to several Lyapunov timescales. The RNN-LSTM and ANN show some prediction skill as well; RNN-LSTM bests ANN. Furthermore, even after losing the trajectory, data predicted by RC-ESN and RNN-LSTM have probability density functions (PDFs) that closely match the true PDF, even at the tails. The PDF of the data predicted using ANN, however, deviates from the true PDF. Implications, caveats, and applications to data-driven and data-assisted surrogate modeling of complex nonlinear dynamical systems such as weather/climate are discussed.
Submitted 5 December, 2019; v1 submitted 20 June, 2019;
originally announced June 2019.
-
Rate-Distortion-Memory Trade-offs in Heterogeneous Caching Networks
Authors:
Parisa Hassanzadeh,
Antonia M. Tulino,
Jaime Llorca,
Elza Erkip
Abstract:
Caching at the wireless edge can be used to keep up with the increasing demand for high-definition wireless video streaming. By prefetching popular content into memory at wireless access points or end-user devices, requests can be served locally, relieving strain on expensive backhaul. In addition, using network coding allows the simultaneous serving of distinct cache misses via common coded multicast transmissions, resulting in significantly larger load reductions compared to those achieved with traditional delivery schemes. Most prior works simply treat video content as fixed-size files that users would like to fully download. This work is motivated by the fact that video can be coded in a scalable fashion and that the decoded video quality depends on the number of layers a user receives in sequence. Using a Gaussian source model, caching and coded delivery methods are designed to minimize the squared error distortion at end-user devices in a rate-limited caching network. The framework is very general and accounts for heterogeneous cache sizes, video popularities and user-file play-back qualities. As part of the solution, a new decentralized scheme for lossy cache-aided delivery subject to preset user distortion targets is proposed, which further generalizes prior literature to a setting with file heterogeneity.
Submitted 1 December, 2019; v1 submitted 22 May, 2019;
originally announced May 2019.
-
A test case for application of convolutional neural networks to spatio-temporal climate data: Re-identifying clustered weather patterns
Authors:
Ashesh Chattopadhyay,
Pedram Hassanzadeh,
Saba Pasha
Abstract:
Convolutional neural networks (CNNs) can potentially provide powerful tools for classifying and identifying patterns in climate and environmental data. However, because of the inherent complexities of such data, which are often spatio-temporal, chaotic, and non-stationary, the CNN algorithms must be designed/evaluated for each specific dataset and application. Yet to start, CNN, a supervised technique, requires a large labeled dataset. Labeling demands (human) expert time, which combined with the limited number of relevant examples in this area, can discourage using CNNs for new problems. To address these challenges, here we (1) Propose an effective auto-labeling strategy based on using an unsupervised clustering algorithm and evaluating the performance of CNNs in re-identifying these clusters; (2) Use this approach to label thousands of daily large-scale weather patterns over North America in the outputs of a fully-coupled climate model and show the capabilities of CNNs in re-identifying the 4 clustered regimes. The deep CNN trained with $1000$ samples or more per cluster has an accuracy of $90\%$ or better. Accuracy scales monotonically but nonlinearly with the size of the training set, e.g. reaching $94\%$ with $3000$ training samples per cluster. Effects of architecture and hyperparameters on the performance of CNNs are examined and discussed.
Submitted 12 November, 2018;
originally announced November 2018.
-
Rate-Memory Trade-Off for Caching and Delivery of Correlated Sources
Authors:
Parisa Hassanzadeh,
Antonia M. Tulino,
Jaime Llorca,
Elza Erkip
Abstract:
This paper studies the fundamental limits of content delivery in a cache-aided broadcast network for correlated content generated by a discrete memoryless source with arbitrary joint distribution. Each receiver is equipped with a cache of equal capacity, and the requested files are delivered over a shared error-free broadcast link. A class of achievable correlation-aware schemes based on a two-step source coding approach is proposed. Library files are first compressed, and then cached and delivered using a combination of correlation-unaware multiple-request cache-aided coded multicast schemes. The first step uses Gray-Wyner source coding to represent the library via private descriptions and descriptions that are common to more than one file. The second step then becomes a multiple-request caching problem, where the demand structure is dictated by the configuration of the compressed library, and it is interesting in its own right. The performance of the proposed two-step scheme is evaluated by comparing its achievable rate with a lower bound on the optimal peak and average rate-memory tradeoffs in a two-file multiple-receiver network, and in a three-file two-receiver network. Specifically, in a network with two files and two receivers, the achievable rate matches the lower bound for a significant memory regime and it is within half of the conditional entropy of files for all other memory values. In the three-file two-receiver network, the two-step strategy achieves the lower bound for large cache capacities, and it is within half of the joint entropy of two of the sources conditioned on the third one for all other cache sizes.
Submitted 19 June, 2018;
originally announced June 2018.
-
On Coding for Cache-Aided Delivery of Dynamic Correlated Content
Authors:
Parisa Hassanzadeh,
Antonia M. Tulino,
Jaime Llorca,
Elza Erkip
Abstract:
Cache-aided coded multicast leverages side information at wireless edge caches to efficiently serve multiple unicast demands via common multicast transmissions, leading to load reductions that are proportional to the aggregate cache size. However, the increasingly dynamic, unpredictable, and personalized nature of the content that users consume challenges the efficiency of existing caching-based solutions in which only exact content reuse is explored. This paper generalizes the cache-aided coded multicast problem to specifically account for the correlation among content files, such as, for example, the one between updated versions of dynamic data. It is shown that (i) caching content pieces based on their correlation with the rest of the library, and (ii) jointly compressing requested files using cached information as references during delivery, can provide load reductions that go beyond those achieved with existing schemes. This is accomplished via the design of a class of correlation-aware achievable schemes, shown to significantly outperform state-of-the-art correlation-unaware solutions. Our results show that as we move towards real-time and/or personalized media dominated services, where exact cache hits are almost non-existent but updates can exhibit high levels of correlation, network cached information can still be useful as references for network compression.
Submitted 12 June, 2018;
originally announced June 2018.
-
Centralized Caching and Delivery of Correlated Contents over a Gaussian Broadcast Channel
Authors:
Qianqian Yang,
Parisa Hassanzadeh,
Deniz Gündüz,
Elza Erkip
Abstract:
Content delivery in a multi-user cache-aided broadcast network is studied, where a server holding a database of correlated contents communicates with the users over a Gaussian broadcast channel (BC). The minimum transmission power required to satisfy all possible demand combinations is studied, when the users are equipped with caches of equal size. A lower bound on the required transmit power is derived, assuming uncoded cache placement, as a function of the cache capacity. A centralized joint cache and channel coding scheme is proposed, which not only utilizes the users' local caches, but also exploits the correlation among the contents in the database. This scheme provides an upper bound on the minimum required transmit power for a given cache capacity. Our results indicate that exploiting the correlations among the contents in a cache-aided Gaussian BC can provide significant energy savings.
Submitted 26 April, 2018;
originally announced April 2018.
-
Broadcast Caching Networks with Two Receivers and Multiple Correlated Sources
Authors:
Parisa Hassanzadeh,
Antonia M. Tulino,
Jaime Llorca,
Elza Erkip
Abstract:
The correlation among the content distributed across a cache-aided broadcast network can be exploited to reduce the delivery load on the shared wireless link. This paper considers a two-user three-file network with correlated content, and studies its fundamental limits for the worst-case demand. A class of achievable schemes based on a two-step source coding approach is proposed. Library files are first compressed using Gray-Wyner source coding, and then cached and delivered using a combination of correlation-unaware cache-aided coded multicast schemes. The second step is interesting in its own right and considers a multiple-request caching problem, whose solution requires coding in the placement phase. A lower bound on the optimal peak rate-memory trade-off is derived, which is used to evaluate the performance of the proposed scheme. It is shown that for symmetric sources the two-step strategy achieves the lower bound for large cache capacities, and it is within half of the joint entropy of two of the sources conditioned on the third source for all other cache sizes.
Submitted 4 December, 2017;
originally announced December 2017.
-
Sectoring in Multi-cell Massive MIMO Systems
Authors:
Shahram Shahsavari,
Parisa Hassanzadeh,
Alexei Ashikhmin,
Elza Erkip
Abstract:
In this paper, the downlink of a typical massive MIMO system is studied when each base station is composed of three antenna arrays with directional antenna elements serving 120 degrees of the two-dimensional space. A lower bound for the achievable rate is provided. Furthermore, a power optimization problem is formulated and as a result, centralized and decentralized power allocation schemes are proposed. The simulation results reveal that using directional antennas at base stations along with sectoring can lead to a notable increase in the achievable rates by increasing the received signal power and decreasing 'pilot contamination' interference in multicell massive MIMO systems. Moreover, it is shown that using optimized power allocation can significantly increase the 0.95-likely rate in the system.
Submitted 27 July, 2017;
originally announced July 2017.
-
Rate-Memory Trade-off for the Two-User Broadcast Caching Network with Correlated Sources
Authors:
Parisa Hassanzadeh,
Antonia Tulino,
Jaime Llorca,
Elza Erkip
Abstract:
This paper studies the fundamental limits of caching in a network with two receivers and two files generated by a two-component discrete memoryless source with arbitrary joint distribution. Each receiver is equipped with a cache of equal capacity, and the requested files are delivered over a shared error-free broadcast link. First, a lower bound on the optimal peak rate-memory trade-off is provided. Then, in order to leverage the correlation among the library files to alleviate the load over the shared link, a two-step correlation-aware cache-aided coded multicast (CACM) scheme is proposed. The first step uses Gray-Wyner source coding to represent the library via one common and two private descriptions, such that a second correlation-unaware multiple-request CACM step can exploit the additional coded multicast opportunities that arise. It is shown that the rate achieved by the proposed two-step scheme matches the lower bound for a significant memory regime and it is within half of the conditional entropy for all other memory values.
Submitted 12 May, 2017;
originally announced May 2017.
-
Correlation-Aware Distributed Caching and Coded Delivery
Authors:
Parisa Hassanzadeh,
Antonia Tulino,
Jaime Llorca,
Elza Erkip
Abstract:
Cache-aided coded multicast leverages side information at wireless edge caches to efficiently serve multiple groupcast demands via common multicast transmissions, leading to load reductions that are proportional to the aggregate cache size. However, the increasingly unpredictable and personalized nature of the content that users consume challenges the efficiency of existing caching-based solutions in which only exact content reuse is explored. This paper generalizes the cache-aided coded multicast problem to a source compression with distributed side information problem that specifically accounts for the correlation among the content files. It is shown how joint file compression during the caching and delivery phases can provide load reductions that go beyond those achieved with existing schemes. This is accomplished through a lower bound on the fundamental rate-memory trade-off as well as a correlation-aware achievable scheme, shown to significantly outperform state-of-the-art correlation-unaware solutions, while approaching the limiting rate-memory trade-off.
Submitted 19 September, 2016;
originally announced September 2016.
-
Cache-Aided Coded Multicast for Correlated Sources
Authors:
Parisa Hassanzadeh,
Antonia Tulino,
Jaime Llorca,
Elza Erkip
Abstract:
The combination of edge caching and coded multicasting is a promising approach to improve the efficiency of content delivery over cache-aided networks. The global caching gain resulting from content overlap distributed across the network in current solutions is limited due to the increasingly personalized nature of the content consumed by users. In this paper, the cache-aided coded multicast problem is generalized to account for the correlation among the network content by formulating a source compression problem with distributed side information. A correlation-aware achievable scheme is proposed and an upper bound on its performance is derived. It is shown that considerable load reductions can be achieved, compared to state-of-the-art correlation-unaware schemes, when caching and delivery phases specifically account for the correlation among the content files.
Submitted 19 September, 2016;
originally announced September 2016.
-
Distortion-Memory Tradeoffs in Cache-Aided Wireless Video Delivery
Authors:
P. Hassanzadeh,
E. Erkip,
J. Llorca,
A. Tulino
Abstract:
Mobile network operators are considering caching as one of the strategies to keep up with the increasing demand for high-definition wireless video streaming. By prefetching popular content into memory at wireless access points or end user devices, requests can be served locally, relieving strain on expensive backhaul. In addition, using network coding allows the simultaneous serving of distinct cache misses via common coded multicast transmissions, resulting in significantly larger load reductions compared to those achieved with conventional delivery schemes. However, prior work does not exploit the properties of video and simply treats content as fixed-size files that users would like to fully download. Our work is motivated by the fact that video can be coded in a scalable fashion and that the decoded video quality depends on the number of layers a user is able to receive. Using a Gaussian source model, caching and coded delivery methods are designed to minimize the squared error distortion at end user devices. Our work is general enough to consider heterogeneous cache sizes and video popularity distributions.
Submitted 12 November, 2015;
originally announced November 2015.