-
Field-level simulation-based inference with galaxy catalogs: the impact of systematic effects
Authors:
Natalí S. M. de Santi,
Francisco Villaescusa-Navarro,
L. Raul Abramo,
Helen Shao,
Lucia A. Perez,
Tiago Castro,
Yueying Ni,
Christopher C. Lovell,
Elena Hernandez-Martinez,
Federico Marinacci,
David N. Spergel,
Klaus Dolag,
Lars Hernquist,
Mark Vogelsberger
Abstract:
It has been recently shown that a powerful way to constrain cosmological parameters from galaxy redshift surveys is to train graph neural networks to perform field-level likelihood-free inference without imposing cuts on scale. In particular, de Santi et al. (2023) developed models that are robust to uncertainties in astrophysics and subgrid models and can accurately infer the value of $Ω_{\rm m}$ from catalogs that only contain the positions and radial velocities of galaxies. However, observations are affected by many effects, including 1) masking, 2) uncertainties in peculiar velocities and radial distances, and 3) different galaxy selections. Moreover, observations only allow us to measure redshift, intertwining galaxies' radial positions and velocities. In this paper we train and test our models on galaxy catalogs, created from thousands of state-of-the-art hydrodynamic simulations run with different codes from the CAMELS project, that incorporate these observational effects. We find that, although the presence of these effects degrades the precision and accuracy of the models, and increases the fraction of catalogs where the model breaks down, the fraction of galaxy catalogs where the model performs well is over 90%, demonstrating the potential of these models to constrain cosmological parameters even when applied to real data.
Submitted 9 May, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Robust Field-level Likelihood-free Inference with Galaxies
Authors:
Natalí S. M. de Santi,
Helen Shao,
Francisco Villaescusa-Navarro,
L. Raul Abramo,
Romain Teyssier,
Pablo Villanueva-Domingo,
Yueying Ni,
Daniel Anglés-Alcázar,
Shy Genel,
Elena Hernandez-Martinez,
Ulrich P. Steinwandel,
Christopher C. Lovell,
Klaus Dolag,
Tiago Castro,
Mark Vogelsberger
Abstract:
We train graph neural networks to perform field-level likelihood-free inference using galaxy catalogs from state-of-the-art hydrodynamic simulations of the CAMELS project. Our models are rotational, translational, and permutation invariant and do not impose any cut on scale. From galaxy catalogs that only contain $3$D positions and radial velocities of $\sim 1,000$ galaxies in tiny $(25~h^{-1}{\rm Mpc})^3$ volumes our models can infer the value of $Ω_{\rm m}$ with approximately $12$% precision. More importantly, by testing the models on galaxy catalogs from thousands of hydrodynamic simulations, each having a different efficiency of supernova and AGN feedback, run with five different codes and subgrid models (IllustrisTNG, SIMBA, Astrid, Magneticum, SWIFT-EAGLE), we find that our models are robust to changes in astrophysics, subgrid physics, and subhalo/galaxy finder. Furthermore, we test our models on $1,024$ simulations that cover a vast region in parameter space (variations in $5$ cosmological and $23$ astrophysical parameters), finding that the model extrapolates very well. Our results indicate that the key to building a robust model is the use of both galaxy positions and velocities, suggesting that the networks have likely learned an underlying physical relation that does not depend on galaxy formation and is valid on scales larger than $\sim10~h^{-1}{\rm kpc}$.
Submitted 18 July, 2023; v1 submitted 27 February, 2023;
originally announced February 2023.
-
The CAMELS project: public data release
Authors:
Francisco Villaescusa-Navarro,
Shy Genel,
Daniel Anglés-Alcázar,
Lucia A. Perez,
Pablo Villanueva-Domingo,
Digvijay Wadekar,
Helen Shao,
Faizan G. Mohammad,
Sultan Hassan,
Emily Moser,
Erwin T. Lau,
Luis Fernando Machado Poletti Valle,
Andrina Nicola,
Leander Thiele,
Yongseok Jo,
Oliver H. E. Philcox,
Benjamin D. Oppenheimer,
Megan Tillman,
ChangHoon Hahn,
Neerav Kaushal,
Alice Pisani,
Matthew Gebhardt,
Ana Maria Delgado,
Joyce Caliendo,
Christina Kreisch
, et al. (22 additional authors not shown)
Abstract:
The Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project was developed to combine cosmology with astrophysics through thousands of cosmological hydrodynamic simulations and machine learning. CAMELS contains 4,233 cosmological simulations, 2,049 N-body and 2,184 state-of-the-art hydrodynamic simulations that sample a vast volume in parameter space. In this paper we present the CAMELS public data release, describing the characteristics of the CAMELS simulations and a variety of data products generated from them, including halo, subhalo, galaxy, and void catalogues, power spectra, bispectra, Lyman-$α$ spectra, probability distribution functions, halo radial profiles, and X-ray photon lists. We also release over one thousand catalogues that contain billions of galaxies from CAMELS-SAM: a large collection of N-body simulations that have been combined with the Santa Cruz Semi-Analytic Model. We release all the data, comprising more than 350 terabytes and containing 143,922 snapshots, millions of halos, galaxies and summary statistics. We provide further technical details on how to access, download, read, and process the data at https://camels.readthedocs.io.
Submitted 4 January, 2022;
originally announced January 2022.
-
Weighing the Milky Way and Andromeda with Artificial Intelligence
Authors:
Pablo Villanueva-Domingo,
Francisco Villaescusa-Navarro,
Shy Genel,
Daniel Anglés-Alcázar,
Lars Hernquist,
Federico Marinacci,
David N. Spergel,
Mark Vogelsberger,
Desika Narayanan
Abstract:
We present new constraints on the masses of the halos hosting the Milky Way and Andromeda galaxies derived using graph neural networks. Our models, trained on thousands of state-of-the-art hydrodynamic simulations of the CAMELS project, only make use of the positions, velocities and stellar masses of the galaxies belonging to the halos, and are able to perform likelihood-free inference on halo masses while accounting for both cosmological and astrophysical uncertainties. Our constraints are in agreement with estimates from other traditional methods.
Submitted 29 November, 2021;
originally announced November 2021.
-
Inferring halo masses with Graph Neural Networks
Authors:
Pablo Villanueva-Domingo,
Francisco Villaescusa-Navarro,
Daniel Anglés-Alcázar,
Shy Genel,
Federico Marinacci,
David N. Spergel,
Lars Hernquist,
Mark Vogelsberger,
Romeel Dave,
Desika Narayanan
Abstract:
Understanding the halo-galaxy connection is fundamental to improving our knowledge of the nature and properties of dark matter. In this work we build a model that infers the mass of a halo given the positions, velocities, stellar masses, and radii of the galaxies it hosts. In order to capture information from correlations among galaxy properties and their phase-space, we use Graph Neural Networks (GNNs), which are designed to work with irregular and sparse data. We train our models on galaxies from more than 2,000 state-of-the-art simulations from the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project. Our model, which accounts for cosmological and astrophysical uncertainties, is able to constrain the masses of the halos with a $\sim$0.2 dex accuracy. Furthermore, a GNN trained on a suite of simulations is able to preserve part of its accuracy when tested on simulations run with a different code that utilizes a distinct subgrid physics model, showing the robustness of our method. The PyTorch Geometric implementation of the GNN is publicly available on GitHub at https://github.com/PabloVD/HaloGraphNet
Submitted 8 February, 2023; v1 submitted 16 November, 2021;
originally announced November 2021.
-
The CAMELS Multifield Dataset: Learning the Universe's Fundamental Parameters with Artificial Intelligence
Authors:
Francisco Villaescusa-Navarro,
Shy Genel,
Daniel Angles-Alcazar,
Leander Thiele,
Romeel Dave,
Desika Narayanan,
Andrina Nicola,
Yin Li,
Pablo Villanueva-Domingo,
Benjamin Wandelt,
David N. Spergel,
Rachel S. Somerville,
Jose Manuel Zorrilla Matilla,
Faizan G. Mohammad,
Sultan Hassan,
Helen Shao,
Digvijay Wadekar,
Michael Eickenberg,
Kaze W. K. Wong,
Gabriella Contardo,
Yongseok Jo,
Emily Moser,
Erwin T. Lau,
Luis Fernando Machado Poletti Valle,
Lucia A. Perez
, et al. (3 additional authors not shown)
Abstract:
We present the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) Multifield Dataset, CMD, a collection of hundreds of thousands of 2D maps and 3D grids containing many different properties of cosmic gas, dark matter, and stars from 2,000 distinct simulated universes at several cosmic times. The 2D maps and 3D grids represent cosmic regions that span $\sim$100 million light years and have been generated from thousands of state-of-the-art hydrodynamic and gravity-only N-body simulations from the CAMELS project. Designed to train machine learning models, CMD is the largest dataset of its kind containing more than 70 Terabytes of data. In this paper we describe CMD in detail and outline a few of its applications. We focus our attention on one such task, parameter inference, formulating the problems we face as a challenge to the community. We release all data and provide further technical details at https://camels-multifield-dataset.readthedocs.io.
Submitted 22 September, 2021;
originally announced September 2021.
-
Robust marginalization of baryonic effects for cosmological inference at the field level
Authors:
Francisco Villaescusa-Navarro,
Shy Genel,
Daniel Angles-Alcazar,
David N. Spergel,
Yin Li,
Benjamin Wandelt,
Leander Thiele,
Andrina Nicola,
Jose Manuel Zorrilla Matilla,
Helen Shao,
Sultan Hassan,
Desika Narayanan,
Romeel Dave,
Mark Vogelsberger
Abstract:
We train neural networks to perform likelihood-free inference from $(25\,h^{-1}{\rm Mpc})^2$ 2D maps containing the total mass surface density from thousands of hydrodynamic simulations of the CAMELS project. We show that the networks can extract information beyond one-point functions and power spectra from all resolved scales ($\gtrsim 100\,h^{-1}{\rm kpc}$) while performing a robust marginalization over baryonic physics at the field level: the model can infer the value of $Ω_{\rm m} (\pm 4\%)$ and $σ_8 (\pm 2.5\%)$ from simulations completely different to the ones used to train it.
Submitted 21 September, 2021;
originally announced September 2021.
-
Multifield Cosmology with Artificial Intelligence
Authors:
Francisco Villaescusa-Navarro,
Daniel Anglés-Alcázar,
Shy Genel,
David N. Spergel,
Yin Li,
Benjamin Wandelt,
Andrina Nicola,
Leander Thiele,
Sultan Hassan,
Jose Manuel Zorrilla Matilla,
Desika Narayanan,
Romeel Dave,
Mark Vogelsberger
Abstract:
Astrophysical processes such as feedback from supernovae and active galactic nuclei modify the properties and spatial distribution of dark matter, gas, and galaxies in a poorly understood way. This uncertainty is one of the main theoretical obstacles to extracting information from cosmological surveys. We use 2,000 state-of-the-art hydrodynamic simulations from the CAMELS project spanning a wide variety of cosmological and astrophysical models and generate hundreds of thousands of 2-dimensional maps for 13 different fields: from dark matter to gas and stellar properties. We use these maps to train convolutional neural networks to extract the maximum amount of cosmological information while marginalizing over astrophysical effects at the field level. Although our maps only cover a small area of $(25~h^{-1}{\rm Mpc})^2$, and the different fields are contaminated by astrophysical effects in very different ways, our networks can infer the values of $Ω_{\rm m}$ and $σ_8$ with a few percent level precision for most of the fields. We find that the marginalization performed by the network retains a wealth of cosmological information compared to a model trained on maps from gravity-only N-body simulations that are not contaminated by astrophysical effects. Finally, we train our networks on multifields -- 2D maps that contain several fields as different colors or channels -- and find that not only can they infer the value of all parameters with higher accuracy than networks trained on individual fields, but they can also constrain the value of $Ω_{\rm m}$ with higher accuracy than the maps from the N-body simulations.
Submitted 20 September, 2021;
originally announced September 2021.