Improved Tomographic Binning of 3 × 2 pt Lens Samples: Neural Network Classifiers and Optimal Bin Assignments

Irene Moskowitz; Eric Gawiser; Abby Bault; Adam Broussard; Jeffrey A. Newman; Joe Zuntz; The LSST Dark Energy Science Collaboration

doi:10.3847/1538-4357/accc88

1. Introduction

The ΛCDM paradigm has been highly successful in explaining the evolution and accelerated expansion of the universe, with approximately 30% of the energy density of the universe present in visible and dark matter and the remaining 70% in dark energy (Planck Collaboration et al. 2020). The nature of this dark energy is not currently well understood, but the simplest model is a cosmological constant, Λ, in which the energy density of the dark energy does not change as the universe expands. A proposed alternative, the wCDM model, allows the energy density to evolve with time. The nature of the dark energy has been investigated by Stage III dark energy experiments, including the Dark Energy Survey (DES; Abbott et al. 2018), the Hyper Suprime-Cam Subaru Strategic Program (HSC; Aihara et al. 2017), the Kilo-Degree Survey (KiDS; Heymans et al. 2021), and the Extended Baryonic Oscillation Spectroscopic Survey (eBOSS; Dawson et al. 2016). Stage IV surveys have begun (Moustakas 2020), with the Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST; Ivezic et al. 2019), Euclid (Laureijs et al. 2011), and Roman (Akeson et al. 2019) scheduled to come online in the next few years.

Two powerful cosmological probes are weak lensing (see, e.g., Heymans et al. 2012; Hildebrandt et al. 2016; Hamana et al. 2020; Asgari et al. 2021; Secco et al. 2022) and large-scale structure (see, for example, Press & Schechter 1974; Davis et al. 1985; Cooray & Sheth 2002; Zehavi et al. 2005; Desjacques et al. 2018; Rodríguez-Monroy et al. 2022). Two-point autocorrelation functions are computed between the shears of background (source) galaxies (shear–shear correlations), the locations of foreground (lens) galaxies (galaxy clustering), and the cross-correlation between the shears of the source galaxies and the locations of the lens galaxies (galaxy–galaxy lensing). This combination of three types of two-point correlation functions is called the 3 × 2 point (3 × 2 pt) method and has been used previously by DES (Abbott et al. 2018) and KiDS (Heymans et al. 2021).

If very accurate distances (i.e., spectroscopic redshifts) could be obtained for all the galaxies in a survey, the two-point correlation functions could be computed in 3D. However, many Stage III and IV surveys, including LSST, are large imaging surveys, and it is not possible to obtain spectroscopic redshifts for most of the galaxies that will be observed. Data for the 3 × 2 pt method are therefore analyzed tomographically, with galaxies sorted into bins of estimated redshift. After sorting, the exact redshift of each galaxy is no longer important, as angular two-point correlations are computed within and between each bin and only the redshift distribution of each bin is needed to interpret the results.

One way to sort galaxies into redshift bins is to estimate their photometric redshifts (photo-z's), which are less precise than spectroscopic redshifts but can be obtained for a much larger sample of galaxies without prohibitive observing time requirements. There are two main categories of photo-z estimation codes: template fitting and machine learning. Template-fitting codes use either observed or theoretical galaxy template spectra and apply filter transmission curves to predict the photometry for a galaxy at a given redshift. The observed photometry is then matched to this library to find the best-fit combination of template and redshift (e.g., Bolzonella et al. 2000; Ilbert et al. 2006; Brammer et al. 2008). In contrast, machine-learning codes use a variety of techniques, including self-organized maps (e.g., Masters et al. 2015; Wright et al. 2020), neural networks (e.g., Collister & Lahav 2004; Schuldt et al. 2021), and random forests (e.g., Carrasco Kind & Brunner 2013), to associate input colors and magnitudes with a redshift estimate. A method for improving photo-z estimates using machine-learning post-processing in the form of a neural network classifier (NNC) was recently introduced by Broussard & Gawiser (2021).

Methods for optimizing the tomographic binning for the source sample in LSST have been addressed in Zuntz et al. (2021). In this paper, we present a method for optimizing the tomographic binning for the lens sample. Where the methods for optimizing the source sample in Zuntz et al. (2021) are restricted to riz photometry owing to the use of the metacalibration (Sheldon & Huff 2017; Sheldon et al. 2020) technique for computing shears, the lens sample is under no such restriction, and we are free to use all the photometric information available for optimization. We make use of data simulated to resemble expected LSST observations, and we use a realistically nonrepresentative training sample of galaxies with spectroscopic redshifts. Next, we investigate extensions of three different tomographic binning methods to identify the optimal combination. We then use two variations of the NNC developed by Broussard & Gawiser (2021) to select galaxies with the most accurate photo-z estimates and calculate the resulting dark energy figure of merit (FOM) of the galaxy clustering signal.

In Section 2, we present the simulated galaxy catalogs used in this work, which have been split into training, validation, and application samples. In Section 3, we describe how the nonrepresentative training sample is built and introduce the photo-z estimation pipeline, along with two different ways to train the NNC used for selecting the final application sample; the different choices of binning; and how the FOM of each binning choice is evaluated. Section 4 presents the results, and we conclude in Section 5.

2. Simulated Data

We make use of two simulated galaxy catalogs, CosmoDC2 (Korytov et al. 2019) and Buzzard (DeRose et al. 2019).

2.1. CosmoDC2

The CosmoDC2 simulation covers 440 deg² on the sky and is complete to a magnitude r = 28 and contains galaxies up to a redshift of 3. The CosmoDC2 catalog was created using dark matter particles from the Outer Rim simulations (Heitmann et al. 2019). Galaxies with a limited set of specified properties were then assigned to dark matter halos using the UniverseMachine simulations (Behroozi et al. 2019) and the GalSampler technique (Hearin et al. 2020). These objects were matched to the outputs of the Galacticus model (Benson 2012), which generated complete galaxy properties.

2.2. Buzzard

The Buzzard catalog was developed to simulate DES Year 1 data and contains galaxies up to a redshift of 2.3 over an 1120 deg² area and complete to r ∼ 26. The Buzzard catalog is based on dark matter simulations from L-GADGET2 (Springel 2005). Galaxies were added to dark matter halos based on abundance matching using AddGals (Wechsler et al. 2022). Galaxies are drawn from a distribution of luminosities and overdensities and matched to halos of the same overdensity. Spectral energy distributions (SEDs) are assigned to match the SED−luminosity−density relationship measured from Sloan Digital Sky Survey (SDSS) data.

2.3. The DESC Tomographic Challenge

The LSST Dark Energy Science Collaboration (DESC) Tomographic Challenge (Zuntz et al. 2021) was designed as a first step toward optimizing the tomographic binning for the source sample of galaxies in a 3 × 2 analysis. Participants in the challenge were invited to use any method they liked for determining the binning but were limited to using (g)riz photometry. This choice was made to reflect the fact that the DESC plans to compute shears using the metacalibration technique (Sheldon & Huff 2017; Sheldon et al. 2020), which requires selections to be performed only in bands in which the point-spread function (PSF) is well measured.

For use in the Tomographic Challenge, the CosmoDC2 and Buzzard catalogs had noise added to simulate real LSST Year 1 observations using the DESC TXPipe ⁶ framework and following the methodology of Ivezic et al. (2019). The catalogs were further cut based on signal-to-noise ratio (S/N) and galaxy size to simulate the selection criteria used for real lensing catalogs. Galaxies in the full simulated catalogs were kept for the final sample of galaxies if the combined riz S/N satisfied S/N > 10 and the size satisfied T/T_psf > 0.5. Here T measures the squared, deconvolved radius of the galaxy and is the trace I_xx + I_yy of the moments matrix, and T_psf is derived from the assumed FWHM of the Rubin PSF, T_fwhm = 0 farcs 75. After these cuts, the CosmoDC sample contained 36 million objects and the Buzzard sample contained 20 million objects. From these samples, 25% were randomly assigned to the training sample, 50% to the validation sample, and the remaining 25% to the application sample, making the training sample fully representative. For this work, we have made use of the Tomographic Challenge training samples, which we have split into training and validation samples, and the Tomographic Challenge validation samples as our application sample.

3. Methods

The general process for optimizing the tomographic binning method consists of six steps:

1.
To simulate realistic training conditions, use Hyper Suprime-Cam (HSC) photometric and spectroscopic galaxies to create a realistically nonrepresentative training sample.
2.
Use the training sample to train a method for photometric redshift estimation. In the work, we use Trees for photo-z (TPZ; Carrasco Kind & Brunner 2013). This training is then applied to estimate photometric redshifts for the application sample.
3.
Select bin edges based on photo-z estimates for the sample using extensions on three simple approaches.
4.
Train an NNC to obtain a confidence in the accuracy of the photo-z estimates from TPZ.
5.
Sort galaxies into the chosen bins both with and without a confidence cut based on the NNC.
6.
Compute the FOM of the resulting bins for comparison with methods produced as part of the DESC Tomographic Challenge (Zuntz et al. 2021).

The following subsections describe this process in detail.

3.1. Nonrepresentative Training Sample

While the Tomographic Challenge methods were built and run using a training sample that was fully representative of the application sample, this will not be the case with LSST data. Previous attempts to account for the effects of a nonrepresentative training sample can be found in Beck et al. (2017) and Stylianou et al. (2022).

To divide the CosmoDC2 and Buzzard data sets into training and application samples that are nonrepresentative, we make use of the second data release (HSC DR2) of the Hyper Suprime-Cam Subaru Strategic Program (HSC SSP; Aihara et al. 2018). From the HSC Wide (300 deg² at i < 26.2; Aihara et al. 2019) catalog, we selected objects with i ≤ 25 and either g ≤ 27.5, r ≤ 27.7, or z ≤26.2, which mimics early LSST observations, as noted in Broussard & Gawiser (2021). After this photometric cut, there are 3,469,800 remaining objects; a color–magnitude diagram of a randomly selected subset of these objects is shown in the middle panel of Figure 1 of Broussard & Gawiser (2021). This sample is named the HSC phot sample.

**Figure 1.** Partitioning the original CosmoDC2 sample into nonrepresentative training and application samples. The left panel shows the original color–magnitude distribution of the catalog, while the middle and right panels show the selected training and application samples, respectively. The color bar shows the relative density per pixel within each panel on an arbitrary scale.
Download figure:
Standard image High-resolution image

In addition, some galaxies in the HSC Wide field have been matched to spectroscopic catalogs from zCOSMOS (Lilly et al. 2009), WiggleZ DR1 (Drinkwater et al. 2010), DEEP3 (Cooper et al. 2011), PRIMUS DR1 (Coil et al. 2011; Cool et al. 2013), DEEP2 DR4 (Newman et al. 2013), UDSz (Bradshaw et al. 2013; McLure et al. 2013), VVDS (Le Fèvre et al. 2013), 3D-HST (Skelton et al. 2014; Momcheva et al. 2016), VIPERS PDR1 (Garilli et al. 2014), FMOS-COSMOS (Silverman et al. 2015; Kashino et al. 2019), GAMA DR2 (Liske et al. 2015), SDSS DR12 (Alam et al. 2015), and the SDSS IV QSO catalog (Pâris et al. 2018). There are 307,193 objects meeting the same photometric requirements as above that also have matched spectroscopic redshifts. A color–magnitude diagram of this sample, named HSC spec, is shown in Broussard & Gawiser (2021) in the left panel of their Figure 1.

To build the nonrepresentative training sample, we divide the space of i-band magnitude and (g-z) color into 100 × 100 bins. In each bin, we compute the ratio between HSC spec objects with spectroscopic redshifts and HSC phot objects. Even at fixed brightness and color, galaxy spectral features can vary significantly, and spectroscopic confirmation rates are often significantly lower for higher-redshift galaxies in a given color–magnitude bin. To model this, we find the 99th percentile in spectroscopic redshift within each bin and assign that as ${z}_{\max }$ for that color–magnitude bin.

Objects in the CosmoDC2 and Buzzard catalogs are then sorted into the same bins in color–magnitude space. In each bin, objects with photometric redshifts greater than ${z}_{\max }$ are automatically assigned to the application sample. The remaining objects are then randomly assigned to the training or application sample based on the ratio of HSC spec to HSC phot objects. If there were no HSC spec objects in a given color–magnitude bin, all of the objects were assigned to the application sample. This method of producing a nonrepresentative training sample has been incorporated into the DESC RAIL⁷ framework as the GridSelection degrader. Inevitably, this is still an approximation of true nonrepresentativeness, which can get worse than what is presented here. As an example, a spectroscopic survey might use B-band selection, which is not directly replicable with the ugrizy photometry available for LSST. The nonrepresentative sample presented here may still prove to be somewhat optimistic, unless efforts are made to obtain spectroscopic samples with a greater coverage of color–magnitude–redshift space.

Figure 1 shows the partitioning of the CosmoDC2 sample into training and application samples. Critically, the training sample is brighter (the training sample has median i-band magnitude of 21.1) than the application sample (median i-band magnitude of 23.9), as is typical for spectroscopic follow-up of deep photometric surveys. The mean color of the training sample is also redder, as is expected. The median (g − z) color of the training sample is 2.20, while the median color of the application sample is 1.55. The training sample shown in Figure 1 is used for training TPZ and the NNCs and represents a spectroscopic sample, while the application sample is used for estimating photo-z's, NNC confidences, and the binning results and represents LSST data.

A further illustration of the nonrepresentative nature of the training sample we have built can be seen in Figure 2, which shows the normalized true redshift distribution of the selected application sample compared to that of the selected training sample. The redshift distribution of the training sample falls off above a redshift of 1.0 with a tail to z = 1.5, while the application sample falls off more gradually and extends out to a redshift of 3. In the application sample 34% of the objects are at z > 1.0, while in the training sample only 0.085% of objects are at z > 1.0. Additionally, the peak of the redshift distribution of the training sample occurs at lower redshifts than for the application sample. Note that the heights of the peaks in Figure 2 are not directly comparable; the training sample is much smaller overall.

3.1.1. Training Sample Size

We also use the HSC DR2 to construct a realistically sized training sample. There are 3,469,800 objects in the HSC DR2 catalog that meet the photometric cuts described in Section 3.1, as well as 307,193 objects with spectroscopic redshifts. This translates to a training sample that is ∼8% of the application sample. After splitting the Tomographic Challenge data sets into our nonrepresentative training and application samples, we select approximately 3.4M for the application sample and a training sample that's ∼7.6% the size of the application sample, a smaller fraction than the original Tomographic Challenge sample. This overall training sample is split into equally sized training and validation samples for TPZ. The TPZ validation sample is then further split into 2/3 to be used for the NNC training sample and 1/3 used for the NNC validation sample.

3.2. Photometric Redshifts from TPZ

TPZ is a machine-learning method of photo-z estimation that uses random forests, and we used the regression mode for this work. We used 100 trees for the random forest, with a minimum leaf size of 30. TPZ also allows us to choose m_*, the number of features that are randomly selected for splitting a given tree node; we provide TPZ with 11 features for training and choose m_* = 6.

We provide the following 11 features as inputs to TPZ for training: the apparent magnitudes in the ugrizy filters and the colors (u–g), (g–r), (r–i), (i–z), and (z–y). Galaxies with a nondetection in a given band are assigned a magnitude of 30 in that band. We additionally tried adding the band triples, defined as the difference between neighboring colors in Broussard & Gawiser (2021). In analogy to the manner in which colors act as a derivative of the magnitudes, band triples act as a second derivative. From the set of features we provide, the available band triples are [(u–g) – (g–r)], [(g–r) – (r–i)], [(r–i) – (i–z)], and [(i–z) – (z–y)]. We find here that including the triples in the training does improve the photo-z estimation by a marginal amount, but the NNCs do just as well without them, and it is too computationally expensive to be worth including them.

TPZ provides three outputs for each galaxy in the application sample: z_phot, the mean of the probability distribution function (pdf); σ_TPZ, the associated Gaussian uncertainty in the z_phot estimate; and z_conf, the integrated probability between ${z}_{\mathrm{phot}}\pm {\sigma }_{\mathrm{TPZ}}\left(1+{z}_{\mathrm{phot}}\right)$ .

3.3. Figure-of-merit Calculation

Tomographic Challenge methods were evaluated using two metrics: the S/N of the angular power spectra derived from the n_i(z) distribution of each bin i, and the FOM. The FOM comes in two flavors: the S₈ − Ω_m FOM, and the w₀ − w_a FOM, which constrains the dark energy equation-of-state parameters (Albrecht et al. 2006). We choose to focus on the w₀ − w_a FOM (hereafter simply FOM) as our metric for optimizing the binning choices since this is specific to the DESC goal of measuring the dark energy equation-of-state parameters. The binning optimization procedure we outline in the rest of the paper can be used for other science goals, but a different metric may be preferable in those cases.

Each metric has contributions from galaxy clustering and weak-lensing components, which are combined to form the 3 × 2pt FOM. Since assumptions made about galaxy bias in the Tomographic Challenge lead to the total 3 × 2pt FOM being dominated by the clustering signal, we continue to use the 3 × 2pt FOM despite focusing on the choice of lens sample. In doing so, we have assumed a source sample that is equivalent to the lens sample, even though the source sample cannot be selected using all of the bands in this work. This makes the exact FOM values reported here somewhat optimistic, but it is the relative changes in the FOM between binning choices that are most important. In order to directly compare the tomographic binning method described here to the results of the Tomographic Challenge, we use the same method for calculating the FOM.

The FOM is calculated using the inverse area of a Fisher matrix ellipse representing the area of the posterior pdf of the two parameters we want to measure. The FOM is then related to the constraining power of the chosen binning scheme: a higher FOM corresponds to a smaller area, meaning tighter constraints on w₀ and w_a. The Fisher matrix is given by

$\begin{eqnarray}&&F={\left(\displaystyle \frac{\partial \mu }{\partial \theta }\right)}^{T}{C}^{-1}\left(\displaystyle \frac{\partial \mu }{\partial \theta }\right),\end{eqnarray} \tag{ 1 }$

where μ is the vector of theoretical predictions for the C_ℓ spectra for each pair of tomographic bins, θ are the list of cosmological parameters included in the theory, and C is a Gaussian estimate of the covariance between them following Takada & Jain (2004). To compute the C_ℓ, we use the true redshift distribution, n_i(z), which is the distribution of the actual redshifts of the galaxies in each bin, rather than the distribution of photometric redshifts. The C_ℓ are calculated for 100 values of ℓ between 100 and 2000. More details can be found in the Appendix of Zuntz et al. (2021).

The FOM is then calculated as

$\begin{eqnarray}&&\mathrm{FOM}=\frac{1}{2\pi \sqrt{\det \left({\left[{F}^{-1}\right]}_{{p}_{1},{p}_{2}}\right)}},\end{eqnarray} \tag{ 2 }$

where p₁ = w₀ and p₂ = w_a extracts the 2 × 2 submatrix of F⁻¹ corresponding to the dark energy equation-of-state parameters. The list of parameters used for calculating the Fisher matrix is the native combination in the DESC core cosmology library (Chisari et al. 2019): Ω_c, Ω_b, H₀, σ₈, n_s, w₀, and w_a. The Tomographic Challenge FOM calculation does not include any galaxy bias values as nuisance parameters, which leads to the overall 3 × 2 pt FOM being dominated by the galaxy clustering signal.

Our analysis required some changes to the FOM calculation from the Tomographic Challenge. The Challenge used a hard-coded density of galaxies on the sky, which we updated to reflect the actual galaxy density of the CosmoDC2 and Buzzard simulations. The CosmoDC2 simulation covers 440 deg² on the sky, and the final CosmoDC2 sample contained 36M objects, so this density was used as the starting density. Although we did not use the entire Challenge catalog, we used the appropriate sky density for the sample to obtain comparable FOM values. Instead of hard coding a sky density, this ensures that sample cuts using the NNCs will become susceptible to shot noise in bins with few galaxies, thereby reducing the FOM as appropriate. We also updated the hard-coded fraction of the sky covered by the survey. We implemented two options: scaling the survey area to match the fraction of the catalog we used to maintain a comparable survey density, or assuming the sky density for a given simulation but scaled to the LSST year 10 survey area. The rest of this paper reports results for the assumed year 10 area.

3.4. Selecting Bin Edges

Once photo-z estimates are made, galaxies can be binned. Based on the optimization of galaxy clustering S/N explored in Broussard & Gawiser (2021), we utilize 12 tomographic bins covering 0 < z < 3. We start with three possible binning options: bin edges equally spaced in redshift (equal Δz binning), bin edges equally spaced in comoving distance (equal Δχ binning), and bins with equal numbers of galaxies in each bin (equal number binning). Examples of the three binning methods, along with the computed FOM values, are shown for the CosmoDC2 sample in Figure 3. Note that for binning choices related to distance, like equal Δz and equal Δχ, this results in a couple of sparsely populated bins at the high-redshift tail of the sample. In these cases, there are ∼10 effective bins up to z ∼ 1.2 that contain most of the information. This is not the case for equal number binning.

Figure 3 illustrates that the equal Δχ binning case produces the highest FOM value, while equal number binning performs the worst out of our three base binning choices. Equal Δχ binning has not been explored much in the literature, but the Tomographic Challenge method that achieved the highest FOM also utilized equal Δχ bins (Zuntz et al. 2021). Although they did not include equal Δχ bins in their analysis, Pocino et al. (2021) also found that equally spaced redshift bins performed better than equal number bins for galaxy clustering and galaxy–galaxy lensing when using 13 bins, and the two performed about evenly with 10 bins, which is consistent with our findings. This is in contrast with Sipp et al. (2021), which found that equal number binning outperformed equal Δz binning for a Euclid-like survey, although we find agreement with these results when the S/N is used as the evaluation metric. Taylor et al. (2018) and Kitching et al. (2019) found that bins equally spaced in redshift performed better than bins with equal numbers of galaxies for cosmic shear analysis, although neither were for the same number of bins used here. Taylor et al. (2018) concluded that while having a large number of bins equally spaced in redshift provides the best constraints on cosmological parameters, bins with equal numbers of galaxies were better for N_bin < 20. Meanwhile, Kitching et al. (2019) only tested on three, four, and five tomographic bins.

In addition to the three common binning choices above, there are many more ways that the bin edges could be chosen, so we introduce a new parameter, ${ \mathcal M }$ , that allows us to investigate an extended family of possible bin definitions. We define ${ \mathcal M }$ as

$\begin{eqnarray}&&{ \mathcal M }={\int }_{0}^{{z}_{\max }}{\left(\displaystyle \frac{{dN}}{{dz}}\right)}^{\alpha }{\left(\displaystyle \frac{d\chi }{{dz}}\right)}^{\beta }{dz},\end{eqnarray} \tag{ 3 }$

where α and β will be chosen to maximize the final S/N of the resulting bins. Once ${ \mathcal M }$ is calculated for a given α and β, ${ \mathcal M }$ is divided into equal intervals based on the number of bins desired, in this case 12, and then those values of ${ \mathcal M }$ are converted back into redshift values for the bin edges by interpolation.

There are three sets of α and β values that can recover the previous binning choices. When α = β = 0, ${ \mathcal M }={z}_{\max }$ , and we recover equal Δz binning. When α = 0 and β = 1, ${ \mathcal M }={\chi }_{\max }$ , and we recover equal Δχ binning. When α = 1 and β = 0, ${ \mathcal M }=N$ , the total number of galaxies in the sample, and we recover equal number binning. Other values for α and β represent variations on these approaches.

3.5. The "Outlier" Neural Network Classifier

Since LSST is expected to observe enough galaxies that it will not be shot noise limited, it should be possible to improve the FOM of the bins shown in Figure 3, or determined by Equation (3), by removing galaxies with poor photo-z estimates. After training and applying TPZ, we implement the NNC as described in Broussard & Gawiser (2021), which we refer to here as the "Outlier" NNC.

The NNC takes as its inputs the three outputs from TPZ, z_phot, σ_TPZ, and z_conf, along with the size of the galaxies, the i-band magnitude, and the same colors that were used for training TPZ. The NNC consists of four fully connected hidden layers, with [100, 200, 100, 50] neurons using a Scaled Exponential Linear Unit (SELU) activation function. The output neuron produces a value between 0 and 1, indicating redshift accuracy confidence, by using a sigmoid function with a binary cross-entropy loss function. An NNC confidence output close to 1 indicates high confidence in the accuracy of the photo-z estimate, while a confidence output near 0 indicates a probable outlier. The output NNC confidence values are then used to make a sample cut.

To define what counts as an accurate photo-z estimate when training the NNC, we must choose a maximum acceptable value for the "accuracy parameter" $\left|{\rm{\Delta }}z\right|/\left(1+z\right)$ . Broussard & Gawiser (2021) tested accuracy parameters between 0.04 and 0.15. They found that while the value of the accuracy parameter has an impact on the fit, if the goal is to cut a certain fraction of the sample, e.g., the worst 10%, the value of the accuracy parameter does not change which galaxies are in that worst 10%, only what the threshold confidence value is. In other words, we are free to choose the accuracy parameter to obtain reasonable-looking confidence values, since the accuracy parameter changes the confidence but not the ordering. We choose to use $\left|{\rm{\Delta }}z\right|/\left(1+z\right)\lt 0.07$ .

Figure 4 shows the results of implementing a confidence cut based on the NNC confidence levels for the CosmoDC2 sample. The confidence cut of 0.782 was chosen to retain 75% of the original sample, as this will be shown later to produce the highest FOM with this NNC. Since TPZ was trained on a nonrepresentative training sample with a maximum redshift lower than the maximum redshift of the application sample, the high-redshift portion of the application consists entirely of outlier photo-z estimates. The NNC successfully excludes a large portion of these major photo-z outliers, reducing the outlier fractions (defined as Δz/(1 + z) > 0.15) by about a third from f_out = 0.30 for the full sample to f_out = 0.20 for the NNC-selected sample. The Outlier NNC selects galaxies close to the one-to-one line representing accurate photo-z estimates when trained on a representative sample.

We can also see in Figure 4, and later in Figure 5, that TPZ produces concentrations near z_phot ∼ 0.1 and z_phot ∼ 0.5. We would expect bands arising from degeneracies in galaxy colors to form diagonal bands, not horizontal ones, so this is likely not an effect of degeneracies in galaxy SEDs at different redshifts. It appears that TPZ has found some preferred redshifts. Investigation of the cause and mitigation of this effect is left for future work.

3.6. The "Misclassification" NNC

One of the benefits of tomographic binning is that once a galaxy has been sorted into a bin, its individual redshift value is no longer important; angular correlation functions are computed for the population of galaxies within the bin. In this sense, it does not matter whether the photo-z estimate is highly accurate as long as it is good enough for most galaxies to be sorted into the correct tomographic bin. Following this logic, we implement a new method for training a "Misclassification" NNC. Instead of training it to assign low-confidence values to galaxies likely to be outliers, we trained this NNC to assign low-confidence values to galaxies likely to be sorted into the incorrect bin.

The Misclassification NNC has a more complicated training routine than the outlier NNC. With the Outlier NNC, we are free to choose the bin edges after applying the NNC and making the confidence cuts. However, since the definition of the correct bin will depend on the choice of bin edges, we must now choose the bin edges before training the NNC. Once photo-z estimates are determined for the application sample, we use the entire sample to define the bin edges, using equal Δz, equal Δχ, equal number binning, or the optimized binning determined by Equation (3). Then, objects in the NNC training sample are sorted into those bins by their estimated photo-z, and we determine whether the galaxy was sorted into the correct bin by comparing to the bin its true redshift would have placed it in. Galaxies in the correct bin are assigned an accuracy value of 1, while galaxies placed in any incorrect bin are assigned an accuracy value of 0.

We also train a separate Misclassification NNC for each bin, which is then applied only to objects sorted into that bin. As an example, a "bin 3 NNC" is trained on objects in the training sample that have been sorted into bin 3 and is applied only to objects in the application sample that are sorted into bin 3. Instead of identifying objects across the whole sample that are likely to be misclassified, this version of the Misclassification NNC identifies galaxies that are likely to be incorrectly sorted into each bin. This is an attempt to make it easier for the NNC to learn the photometric features associated with each bin, since the photometries associated with being sorted into the wrong bin are not necessarily consistent over all bins. This version of the Misclassification NNC performs better than the version trained on the entire sample and is the version we use going forward.

Figure 5 shows the sample selection using the Misclassification NNC. The NNC was trained on 12 equal Δχ bins, since this was the best performing of the base binning choices. The confidence cut was selected to retain 75% of the original sample, as in Figure 4. The sample selected by the Misclassification NNC looks quite different from the sample selected by the Outlier NNC. Boxy horizontal features are caused by the NNC only caring if the photo-z places the galaxy into the correct bin and correspond to the bin edges. In particular, in the equal Δχ binning case, the bins get progressively wider as the redshift increases; the highest redshift bin is quite wide, since there are not as many galaxies at those redshifts; and a modestly larger fraction of the outliers in the upper right of Figure 5 are kept in the sample compared to the Outlier NNC selection; this is consistent with the Misclassification NNC only caring about objects that shuffle between bins, as the highest redshift bin is broad.

4. Results

We first determine the optimal choice of bin edges and the optimal fraction of the sample to retain with the NNCs. After the sample selections are made with the NNCs, the remaining galaxies are sorted into the optimized bin edges, which are determined using the redshift distribution for the full sample. We compare the optimized binning choice to the previous best binning choice, equal number binning. Results in this section are shown for the CosmoDC2 simulation, and we leave the results for Buzzard to Appendix A, except where we discuss differences between the two simulations.

4.1. Optimal Binning Approaches

We calculate the FOM resulting from the bins produced by various combinations of α and β in Equation (3). In order to do these calculations, we must make a choice for the redshift distribution, n(z) (equivalent to $\tfrac{{dN}}{{dz}}$ in Equation (3)). We have done these calculations for three versions of n(z): the photo-z n(z), made up of the point estimates of the photo-z for each galaxy, a version of the photo-z n(z) that has been smoothed with a Gaussian filter, or the true n(z). In all three cases, we find that slightly different combinations of α and β produce the optimal binning choice, but they all find that equal number binning is not the optimal choice. We show results for the true n(z) going forward.

Figure 6 shows the FOM values for a grid of different values of α and β for CosmoDC2. The maximum FOM of 124 is achieved at α = 0.25 and β = 2.0. We take this to be our optimized binning choice. The bin edges resulting from this choice of α and β are listed in Table 1. Buzzard reaches the maximum FOM at α and β corresponding to equal Δχ binning; see Appendix A.

Table 1. Summary of the Bin Choices and Sample Selections

Binning Type	Bin Edges (photo-z)	Sample Selection	Retained Fraction	Contamination Fraction	FOM
	[0, 0.0758, 0.1548, 0.2372, 0.3234,	Full	1	0.5370	123
Equal Δχ	0.4142, 0.5102, 0.6120, 0.7206,	Outlier NNC	0.75	0.4374	126
	0.8365, 0.9614, 1.096, 1.242]	Misclassification NNC	0.45	0.1710	133

	[0, 0.0853, 0.1568, 0.2275, 0.3015,	Full	1	0.5310	124
Optimized	0.3792, 0.4634, 0.5550, 0.6557,	Outlier NNC	0.75	0.4261	128
	0.7662, 0.8897, 1.040, 1.242]	Misclassification NNC	0.45	0.1748	140

Representative sample	⋯	⋯	⋯	⋯	186
Optimized

Download table as: ASCII Typeset image

4.2. Figure of Merit versus Retained Fraction

Since LSST will not be shot noise limited, removing some of the galaxies with poor photo-z estimates should improve the FOM. However, if too much of the sample is removed, we will reach the limit where shot noise increases sufficiently to drop the FOM below what it would be if we kept the entire sample.

We test the range over which the NNC sample cuts will improve the FOM and find the optimal fraction of the sample to retain (the retained fraction). Figure 7 compares the performance of the FOM as a function of the retained fraction for each type of NNC and binning choice for CosmoDC2. The results for Buzzard are shown in Appendix A. It can be seen that the Outlier NNC performs better at slightly higher retained fractions, while the Misclassification NNC obtains a higher FOM at lower retained fractions. Neither NNC improves the FOM for the Buzzard sample, but there are a range of retained fractions for which the Misclassification NNC does not hurt the FOM either. See Appendix A for further discussion. Figure 8 shows the sample selection by the Misclassification NNC with the optimized bin edges and retained fraction. The retained fractions achieving the highest FOM in each combination of bin edges and NNC sample selection are listed for CosmoDC2 in Table 1.

**Figure 7.** The FOM as a function of the amount of the sample retained after cuts for CosmoDC2. Solid lines represent the FOM for the optimized binning choice with α = 0.25 and β = 2.0. Dashed lines represent equal Δχ binning. Solid and dashed horizontal lines are the FOM for the full sample for the optimized bins and equal Δχ bins, respectively.
Download figure:
Standard image High-resolution image

**Figure 8.** The sample selected by the Misclassification NNC trained on the optimized bin choice, defined by α = 0.25 and β = 2.0. Here we have retained 45% of the original sample, which achieves the highest FOM for the Misclassification NNC.
Download figure:
Standard image High-resolution image

Both CosmoDC2 and Buzzard obtain a higher FOM with the Misclassification NNC than with the Outlier NNC, suggesting that the Misclassification NNC is preferred. However, as stated in Appendix A, the highest FOM obtained with the Misclassification NNC is the same as the full sample FOM for Buzzard. Interestingly, although there is not a large improvement in the FOM when the full CosmoDC2 sample of galaxies is sorted into equal Δχ bins versus the optimized bins, the Misclassification NNC greatly increases this improvement at the optimal retained fraction.

4.3. Final Binning Results

The Outlier NNC–selected sample shown in Figure 4, which retains 75% of the original sample, is sorted into equal Δχ bins, which is the best performing of the original three binning choices. The same Outlier NNC–selected sample is sorted into the optimized binning choice defined by α = 0.25 and β = 2.0. The Misclassification NNC–selected samples for equal Δχ binning (shown in Figure 5 with a retained fraction of 75%) and for the optimized binning choice, shown in Figure 8, each with a retained fraction of 45%, are sorted into equal Δχ bins and the optimized bins, respectively.

The top row of Figure 9 shows the binning results for equal Δχ binning, while the bottom row shows the results for the optimized binning choice. In the left column, we have binned the full sample, while the middle and right columns show the Outlier NNC and Misclassification NNC selections. Switching from equal Δχ binning to the optimized binning increases the FOM by ∼1%. The sample selection with the Outlier NNC improves the FOM by a further 3.3%, while the Misclassification NNC improves the FOM by 13.2% over using the full sample. The NNC sample selection process boosts the FOMs in the equal Δχ binning case by a similar amount: 3.0% for the Outlier NNC, and 8.5% for the Misclassification NNC. The overall improvement between the Misclassification NNC–selected sample sorted into the optimized bins and the full sample sorted into the equal Δχ bins is 14.2%. The Misclassification NNC slightly outperforms the Outlier NNC, and in Figure 9 there is visibly less overlap between the bins when the Misclassification NNC is used, particularly when compared to the full sample bins. A summary of the different binning methods and sample selections, along with the achieved FOMs, is included in Table 1.

To compare how well the NNC sample selection can recover the ideal case of a representative training sample, we also test the binning method when TPZ and the NNCs have been trained with a fully representative training sample. When the galaxies are sorted into the optimized bin edges, we achieve an FOM of 186, an improvement of ∼50% over the nonrepresentative training sample. When the Misclassification NNC is applied in the nonrepresentative case, it is able to recover ∼25% of this difference. This indicates that the Misclassification NNC will be useful in attempting to re-create the ideal scenario of a fully representative spectroscopic training sample for LSST.

Given that the Tomographic Challenge already proposed methods for optimizing the tomographic binning strategy for LSST, albeit for the source sample of galaxies instead of the lens sample of galaxies as we have done here, we also compare how the Tomographic Challenge methods compare to the method described in this work. These results are described in Appendix B.

4.3.1. Difference between Binning Choices

The bin edges for both the optimized bins and equal Δχ bins are listed in Table 1. In the optimized case, the highest redshift bin is ∼40% wider than in the equal Δχ case, while most of the rest of the bins are slightly narrower than in the equal Δχ case. This could be a result of the highest redshift bin covering a range that is not well represented in our training sample, so the photo-z estimates are poorer. We gain constraining power by containing a larger fraction of those galaxies with poor photo-z's in one bin, allowing us to have narrower bins in regions with better photo-z's.

As can be seen in Figures 4, 5, and 8, galaxies at high true redshifts are assigned primarily lower photo-z values. In particular, there are bands at z ∼ 0.1 and z ∼ 0.5 where many higher-redshift galaxies are assigned. The lower band corresponds to bins 0 and 1, while the z ∼ 0.5 band falls around bin 5. In Figure 10, we show a closer look at this subset of the bins to illustrate some differences between the binning choices and sample selections.

In the top rows of the upper left and right quadrants of Figure 10, the two lowest redshift bins have significant numbers of contaminating galaxies, where the true redshift places them well outside the bounds of the bin. In the middle and bottom rows of these quadrants, the Outlier and Misclassification NNC are able to remove many of these contaminating galaxies, with the Outlier NNC working better in the lowest redshift bin 0, and the Misclassification NNC doing a better job in bin 1. Additionally, the Misclassification NNC does a slightly better job when applied to the optimized bin 1 than the equal Δχ bin 1.

In the lower left quadrant of Figure 10, we show bin 5, which contains many of the galaxies assigned to the z ∼ 0.5 band. Similar to bin 1, the Misclassification NNC is better able to clean up the contaminating galaxies than the Outlier NNC, although in this case the Misclassification NNC performs about equally for both choices of bin edges.

In general, the Misclassification NNC removes contaminating galaxies better than the Outlier NNC. The two exceptions are the lowest and highest redshift bins, shown in the upper left and lower right quadrants of Figure 10, respectively. In the case of the highest redshift bin, this is likely because not many galaxies are truly misclassified into the highest redshift bin. In general, photo-z estimates are lower than the true redshift, not higher (see, e.g., Figure 8), so there are not a significant number of galaxies misclassified into the highest redshift bin from below. On the other hand, galaxies are not being misclassified from above either, since galaxies assigned into bin 11 that have true redshifts larger than their photo-z would have been assigned to the highest redshift bin anyway. It is not as clear why the lowest redshift bin should also do better with the Outlier NNC as opposed to the Misclassification NNC.

In the remaining bins, the Misclassification NNC performs better at cleaning up contaminating galaxies than the Outlier NNC. In 4 of the 10 remaining bins, the Misclassification NNC cleans up more contaminating galaxies in the optimized binning choice than in the equal Δχ choice. In another 4 bins it cleans up contaminating galaxies equally well in both bin choices, while in the remaining 2 it cleans up contaminating galaxies better in the equal Δχ case. Given that the Misclassification NNC performs better on the optimized case in a larger percentage of bins, this explains why the optimized bins are able to achieve more constraining power than the equal Δχ bins.

The visual indication that the Misclassification NNC is better able to clean up the contaminating galaxies is quantified by the contamination fraction. Bin edges are defined in photo-z space; we define the contamination fraction as the fraction of galaxies with z_true outside the photo-z range of the bin. When the full sample of galaxies is sorted into equal Δχ bins, the contamination fraction is 0.54; over half the galaxies have z_true outside the bin they are sorted into. This is not surprising; the maximum photo-z estimate is z ∼ 1.2, while the true redshifts extend to z ∼ 3. With the Outlier NNC–selected sample, the contamination fraction is reduced to 0.44. It is further reduced to 0.17 with the Misclassification NNC selection. In contrast, the contamination fraction for the full sample sorted into the optimized bins is 0.53. This is reduced to 0.43 with the Outlier NNC and to 0.17 with the Misclassification NNC. In both choices of bin edges, the contamination fractions are much lower when the Misclassification NNC is used to select the sample than when the Outlier NNC or no NNC is used. This can be seen visually in the upper right and lower left quadrants of Figure 10.

4.3.2. Impact of NNCs on n(z)

The NNCs both produce changes in the overall n(z) distribution of the final selected sample compared to the full sample. Since galaxies with true redshifts above z ∼ 1.2 are not represented in the TPZ training sample, photometric redshifts for those galaxies are primarily outliers. These lead to the majority of galaxies with z_true ≳ 1.2 being removed when the Outlier NNC is used for sample selection. In the middle column of Figure 9, we can see that this produces a sharp cutoff in the n(z) distribution above z ∼ 1.2 in both the equal Δχ and optimized bins.

When the Misclassification NNC is used for sample selection, the n(z) distribution is less smooth over the bin edges, and there are notable decreases in the n(z) distribution right at the bin edges. This arises because galaxies with redshifts near the bin edges are more likely to be sorted into the wrong bin, even if the difference between their photo-z estimate and true redshift is small, meaning that the Misclassification NNC preferentially removes galaxies near the bin edges, which is not true for the Outlier NNC. In the rightmost column of Figure 9, it can be seen that the n(z) distribution falls off much more gradually above z ∼ 1.2 than when the Outlier NNC is used, but there are sharp dips around the bin edges.

This modification to the n(z) resulting from the NNCs could introduce biases in the estimated redshift distribution of the sample. Throughout this work, we have assumed that the true n(z) can be perfectly known via spectroscopic cross-correlation, but in practice this will be imperfect. The impact of the binning choices and sample selection with the NNCs on the estimated n(z), as well as any biases in cosmological parameters that may arise from systematic misestimation of that distribution, is the subject of future work.

4.3.3. Combining NNCs

As seen in Figure 10, the Outlier NNC performs better at removing contaminating galaxies in the highest and lowest redshift bins, while the Misclassification NNC is better at removing contaminating galaxies in the intermediate redshift bins. Given this, we test whether combining the two NNCs can improve the FOM even further. To do this, we sort the full sample of galaxies into the optimized bin edges and then apply the Outlier NNC to galaxies in the highest and lowest redshift bins, using the optimized retained fraction for the Outlier NNC. We apply the Misclassification NNC, with the appropriate retained fraction, to all other bins. This method achieves an FOM of 105, lower than the FOM achieved while using either NNC alone or using no NNC at all. Although there is room for attempting to optimize this combination further by adjusting the retained fraction within each bin, we leave this to future work.

5. Summary and Conclusions

We have explored two ways to optimize the tomographic binning of the lens sample of galaxies for 3 × 2 pt analysis: how to optimize the choice of bin edges, and how to optimize the choice of galaxies to sort into those bins. We used HSC DR2 to build a realistically nonrepresentative training sample of galaxies from the CosmoDC2 and Buzzard simulated catalogs. In particular, the training set (with spectroscopic redshifts) is brighter than the application set (with only photometric redshifts), which is expected for LSST data. We then used this sample of galaxies to train TPZ and obtain photometric redshift estimates for the application sample. Galaxies were initially binned by redshift into bins with edges equally spaced in either redshift or comoving distance, or with an equal number of objects in each bin. We introduced the new parameter ${ \mathcal M }$ to explore other possible binning choices and found that although equal Δχ binning performs the best out of the three initial binning choices, it is not the overall optimal choice of bin edges in all cases. For CosmoDC2, the optimized binning choice corresponds to α = 0.25 and β = 2.0, while Buzzard prefers α = 1.0 and β = 0, corresponding to equal Δχ binning. This difference in the optimized binning choice indicates that the optimized binning strategy depends on the underlying n(z) distribution of the galaxies. If this method was to be applied to real data, the optimized bin edges will have to be determined separately for a given galaxy survey, as the choice of α and β depends on the underlying n(z) distribution of the galaxy population observed in the survey. However, given that both CosmoDC2 and Buzzard prefer binning choices close to (or exactly) equal Δχ bins, this is a good starting choice.

After obtaining photo-z estimates, we trained the "Outlier" NNC introduced in Broussard & Gawiser (2021) to estimate the confidence that each photo-z estimate is an outlier, with an NNC confidence near 0 likely to be an outlier. Since galaxies are binned tomographically and the individual redshift estimates will no longer matter once the binning has been performed, we also trained the NNC in a new way: to estimate the confidence that each photo-z estimate reflects the galaxy being sorted into the correct bin. For this Misclassification NNC, a confidence value near 0 indicates a high probability that the galaxy has been misclassified and sorted into an incorrect redshift bin. We used the NNC confidence values to make sample cuts, selecting only galaxies with high confidence values in each case. The remaining galaxies were then sorted into the same bins defined before.

Figure 7 showed that the improvement to the FOM depends on the amount of the sample that is cut. The CosmoDC2 sample reaches the highest FOM when the Misclassification NNC is used to remove the worst 55% of the sample when galaxies are sorted into the optimized binning choice. In general, the Misclassification NNC performs better than the Outlier NNC when larger fractions of the sample are removed, while the Outlier NNC performs better when smaller fractions are removed. Overall, the Misclassification NNC produces the highest FOM in both CosmoDC2 and Buzzard.

One consequence of the NNC sample selection is the modification to the n(z) distribution. With the Outlier NNC, galaxies with true redshift greater than z ∼ 1.2 are mostly removed, leading to a sharp cutoff in the n(z) distribution at redshifts higher than this. When using the Misclassification NNC, galaxies with redshifts near the bin edges are more likely to be sorted into the incorrect bin, so the Misclassification NNC preferentially removes galaxies near the bin edges. This results in an n(z) distribution that is much less smooth than the n(z) for the full sample. Future work will investigate any bias in cosmological parameter estimation resulting from using samples selected with the NNCs, as well as any bias that may arise from the different binning choices.

The methodology developed here for optimizing lens galaxy samples is designed to be broadly applicable, offering independent steps of optimizing the bins and using an NNC to remove galaxies most likely to be photo-z outliers. The true optimized choices for a given survey will depend on the redshift distribution and photo-z probability distributions of its galaxies. While we have only utilized the TPZ code for photometric redshift estimation, Broussard & Gawiser (2021) found similar improvements from the application of NNCs for the template-based code BPZ. For both the CosmoDC2 and Buzzard examples, we found that among the three simplified binning choices, equal Δχ binning leads to higher FOMs, with the opportunity to further improve the FOM by shifting to the more flexible binning arrangement we introduced. The NNCs offer a significant further reduction in both the outlier and contamination fractions, and the Misclassification NNC can improve the FOM by over 13%. Further investigation of biases in the subsequent measurement of cosmological parameters will be needed to determine whether this reduction in outlier fraction yields benefits at that crucial stage of the analysis.

This paper has undergone internal review by the LSST Dark Energy Science Collaboration. The internal reviewers were Boris Leistedt, Jan Luca van den Busch, and Angus Wright. The authors would like to thank François Lanusse for his contribution to the metrics code.

I.M., E.G., and A.Br. acknowledge support for this research from the LSST Corporation via grant No. 2021–42. I.M., E.G. and A.Br. also acknowledge support from the U.S. Department of Energy, Office of Science, Office of High Energy Physics Cosmic Frontier Research program under award No. DE-SC0010008. J.N. acknowledges support from the U.S. Department of Energy, Office of Science, Office of High Energy Physics Cosmic Frontier Research program under award No. DE-SC0007914.

Author contributions are as follows. I.M. updated the NNC framework to include the misclassification training, performed the analysis, and wrote the majority of the paper. E.G. designed the project and supervised the analysis, wrote small sections, and provided feedback on the text. A.Ba wrote the FunBins tomographic binning method. A.Br. wrote the original Outlier NNC. J.N. suggested the color-based redshift cut used in creating the nonrepresentative training sample. J.Z. compiled mock catalogs and contributed to the metrics code.

The DESC acknowledges ongoing support from the Institut National de Physique Nucléaire et de Physique des Particules in France; the Science & Technology Facilities Council in the United Kingdom; and the Department of Energy, the National Science Foundation, and the LSST Corporation in the United States. DESC uses resources of the IN2P3 Computing Center (CC-IN2P3–Lyon/Villeurbanne—France) funded by the Centre National de la Recherche Scientifique; the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under contract No. DE-AC02-05CH11231; STFC DiRAC HPC Facilities, funded by UK BEIS National E-infrastructure capital grants; and the UK particle physics grid, supported by the GridPP Collaboration. This work was performed in part under DOE contract DE-AC02-76SF00515.

Appendix A: Buzzard Results

We use the same process to divide the Buzzard data set into nonrepresentative training and application samples. The resulting samples are shown in Figure 11. Similar to the partitioning of the CosmoDC2 sample, the resulting training sample has a median i-band magnitude of 21.3, brighter than the median i-band magnitude of the application sample at 24.3. The training sample is again redder, with a median (g−z) color of 2.00 compared to 1.51 for the application sample.

**Figure 11.** Training and application sample selection for Buzzard. The training sample is brighter and slightly redder than the application sample, as is expected for real LSST data.
Download figure:
Standard image High-resolution image

We determine the optimal values of α and β for the choice of bin edges using the true n(z), as we did for CosmoDC2. The results are shown in Figure 12. Similar to the CosmoDC2 sample, lower values of α and β are preferred. However, whereas CosmoDC2 preferred α and β close to, but not exactly, equal Δχ binning, Buzzard attains the maximum FOM at α = 0, β = 1.0, corresponding to equal Δχ binning. This difference is mostly likely due to the difference in n(z) distributions between the CosmoDC2 and Buzzard simulations. In particular, the Buzzard sample has a lower maximum redshift than CosmoDC2. The optimal values of α and β are not universal, but both simulations agree that equal Δχ binning is at least a close approximation of the optimal binning choice.

We also investigated the optimal retained fraction when using the Outlier and Misclassification NNCs with equal Δχ bins. Figure 13 shows the results. Unlike CosmoDC2, neither NNC improves the FOM for the Buzzard sample. Although the FOM drops for all retained fractions when the Outlier NNC is used, when the Misclassification NNC is applied, retained fractions above 0.75 still attain the same FOM as the full sample. Similarly to CosmoDC2, the Misclassification NNC achieves higher FOM than the Outlier NNC, so although the Misclassification NNC does not improve the FOM overall, it still performs better than the Outlier NNC. The difference in NNC behavior between CosmoDC2 and Buzzard further shows that the binning optimization process must be repeated for different surveys with different n(z) distributions.

Appendix B: Comparison to Tomographic Challenge Results

The Tomographic Challenge was conducted to determine a method of optimizing the source sample binning, as opposed to the lens sample binning we have optimized in this work. The winning method of the Tomographic Challenge was FunBins (Zuntz et al. 2021), which uses bin edges equally spaced in comoving distance χ. Galaxies are assigned to bins using a random forest classifier trained with some combination of colors and magnitudes, but they are not assigned specific photo-z values.

Although the Tomographic Challenge was conducted with a fully representative training set that was large compared to the application set, we can train FunBins using our nonrepresentative training set to compare the method outlined in this paper with the results of the Tomographic Challenge. We train FunBins using the same colors and magnitudes as were provided to TPZ for training. The results produced by FunBins using this training set for CosmoDC2 are shown in Figure 14. FunBins is able to achieve an FOM of ∼115 under these training conditions, equivalent to our equal Δz bins and a 6.5% decrease from our full sample sorted into equal Δχ bins. Compared to our NNC-selected samples, the FunBins FOM is a 14% decrease from the maximum FOM achieved with equal Δχ bins and an 18% decrease from the maximum FOM achieved with our optimized bins for CosmoDC2 (see Table 1). To achieve the same FOM as our optimized tomography and NNC selection method, a more conventional method, such as FunBins, would be required to observe an additional ∼18% of the total LSST Year 1 area.

**Figure 14.** Binning results when the FunBins Tomographic Challenge method is trained using the CosmoDC2 training set, colors, and magnitudes as described in this work. The FOM is lower than can be achieved with the methods described in this work (see Figure 9 and Table 1).
Download figure:
Standard image High-resolution image

Improved Tomographic Binning of 3 × 2 pt Lens Samples: Neural Network Classifiers and Optimal Bin Assignments

Article metrics

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Abstract

1. Introduction