\copyyear{2023} \startpage{1} \authormark{Tang et al.} \titlemark{Bounds for the average treatment effect} \corres{Satoshi Hattori, Department of Biomedical Statistics, Graduate School of Medicine, Osaka University, Osaka, Japan.}

A simple sensitivity analysis method for unmeasured confounders via linear programming with estimating equation constraints

Chengyao Tang, Yi Zhou, Ao Huang, Satoshi Hattori \orgdiv{Department of Biomedical Statistics, Graduate School of Medicine}, \orgname{Osaka University}, \orgaddress{\state{Osaka}, \country{Japan}} \orgdiv{Beijing International Center for Mathematical Research}, \orgname{Peking University}, \orgaddress{\state{Beijing}, \country{China}} \orgdiv{Department of Medical Statistics}, \orgname{University Medical Center Göttingen}, \orgaddress{\state{Göttingen}, \country{Germany}} \orgdiv{Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives (OTRI)}, \orgname{Osaka University}, \orgaddress{\state{Osaka}, \country{Japan}} [email protected]

Abstract

In estimating the average treatment effect in observational studies, the influence of confounders should be appropriately addressed. To this end, the propensity score is widely used. If the propensity scores are known for all subjects, bias due to confounding can be removed by inverse probability weighting (IPW) with the propensity score. Since the propensity score is generally unknown, it is usually estimated with a parametric logistic regression model whose parameters are obtained by solving the score equation under the strongly ignorable treatment assignment (SITA) assumption. Violation of the SITA assumption and/or misspecification of the propensity score model can cause serious bias in estimating the average treatment effect. To relax the SITA assumption, the IPW estimator based on the outcome-dependent propensity score has been successfully introduced. However, it still depends on a correctly specified parametric model and on its identification. In this paper, we propose a simple sensitivity analysis method for unmeasured confounders. In standard practice, an estimating equation is used to estimate the unknown parameters of the parametric propensity score model. Our idea is to make inference on the average causal effect by removing restrictive parametric model assumptions while still utilizing the estimating equation. Using as constraints the estimating equations, which the true propensity scores asymptotically satisfy, we construct worst-case bounds for the average treatment effect by linear programming. Unlike existing sensitivity analysis methods, we construct the worst-case bounds under minimal assumptions. We illustrate our proposal with simulation studies and a real-world example.

keywords:

Unmeasured confounders, sensitivity analysis, average treatment effect, linear programming


1 Introduction

In observational studies, it is crucial to adjust for the influence of confounders when estimating the average treatment effect (ATE). If all the confounders are observed and the strongly ignorable treatment assignment (SITA) assumption holds,^{1, 2} one can adjust for the confounders by using the propensity score. With the propensity score, inverse probability weighting (IPW)^{3, 4} is a popular approach. The IPW method assigns a weight to the observation of each subject, and the ATE can then be identified by comparing the weighted outcomes of the two groups.⁵ In practice, the propensity score is unknown and is usually estimated with a parametric model, such as logistic regression, under the SITA assumption. In most observational studies, the absence of unmeasured confounders is untestable and often implausible, so the SITA assumption may fail to hold. Using the outcome-dependent propensity score is an option to make inference without the SITA assumption.^{6, 7} By incorporating the outcome variable into the model of the propensity score, we can make inference on the ATE without the SITA assumption. In general, the outcome-dependent propensity score is estimated by a parametric logistic regression model with the observed confounders and the outcome as explanatory variables. Thus, model misspecification is still a concern in the estimation of the outcome-dependent propensity score. Moreover, it suffers from an identifiability issue;^{8, 9} that is, the estimating equation cannot uniquely determine the unknown parameters of the outcome-dependent propensity score. Hence, the outcome-dependent propensity score cannot completely resolve the issue of unmeasured confounders.

Sensitivity analysis is a useful tool to assess the potential impact of unmeasured confounders, and many sensitivity analysis methods have been developed. With the substantially increasing application of propensity score methods to observational studies, there is growing interest in employing sensitivity analysis methods in real-data analyses. Typical sensitivity analysis approaches involve formulating additional assumptions with regard to the relationships among unmeasured confounders, treatment assignments, and outcomes. These assumptions often take the form of plausible values for parameters that cannot be directly estimated from the observed data and must be set by analysts. Rosenbaum and Rubin ¹⁰ and Lin et al. ¹¹ modeled the mechanism of confounding with both the measured and unmeasured confounders and then estimated the treatment effect parameter of interest. Alternatively, Cornfield et al. ¹² and Ding and Vanderweele ¹³ developed methods to construct bounds for the treatment effects that quantify the magnitude of unmeasured confounding. These bounds were designed to elucidate the extent to which unmeasured confounders could influence observed causal estimates. In particular, when the sensitivity parameters are expressed as risk ratios, the E-value ¹⁴ was introduced and has become a pivotal quantity for causal inference in observational studies. While the E-value can provide a bound without any model specification, the estimand is restrictive and the bound is likely to be wide, which can make the sensitivity analysis inefficient.

Among sensitivity analysis approaches based on the IPW method for estimating the ATE, Li et al. ¹⁵ modeled the mean between-group differences of potential outcomes to correct bias in the presence of unmeasured confounders. Shen et al.¹⁶ proposed an IPW-based sensitivity analysis method using two parameters, the variance of the multiplicative errors in the estimated propensity score and their correlation with the potential outcomes, to quantify the bias due to unmeasured confounders. Lu and Ding ¹⁷ extended the method of Li et al.¹⁵ into a more flexible sensitivity analysis framework that can handle the IPW, outcome regression, and doubly robust estimators. In addition, Zhao et al. ¹⁸ constructed bounds for the ATE based on the IPW estimators by incorporating a marginal sensitivity model.¹⁹ Dorn and Guo ²⁰ further refined this method and gave sharper bounds. These sensitivity analysis methods can address the impact of violating the SITA assumption by quantifying potential biases; however, they rely on untestable parametric assumptions about the departure from the SITA assumption, and it is practically difficult to set a relevant magnitude of the departure.

In this paper, we propose a simple sensitivity analysis framework for unmeasured confounders. In the standard confounder adjustment with the outcome-dependent propensity score, a parametric model for the propensity score is assumed, and an estimating equation is introduced to estimate its unknown parameters. Instead of determining a unique model for the outcome-dependent propensity score, we construct bounds for the ATE by considering all propensity scores compatible with the estimating equation. We realize this by removing the parametric model for the propensity score while still relying on the estimating equation. We introduce an optimization problem constrained by the estimating equation, which the true propensity score asymptotically satisfies. The worst-case bounds for the ATE are then obtained by solving a linear programming problem. By increasing the dimension of the estimating equations to involve more functions of the covariates, one can narrow the bounds further. Compared with existing sensitivity analysis methods, the proposed method offers the following advantages. First, it provides worst-case bounds under minimal assumptions. Second, since it does not use the propensity score estimated under the SITA assumption, misspecification of that model does not matter. Finally, it is computationally efficient because the optimization problem is a linear program.

The rest of this paper is organized as follows. In Section 2, we introduce the basic notations and the standard methods with the parametric propensity score. In Section 3, some existing sensitivity analysis methods for the IPW estimator are reviewed. In Section 4, the proposed method for sensitivity analysis is introduced. We investigate the performance of the proposed method on simulated datasets in Section 5, and illustrate it on a real-world example in Section 6. In Section 7, we provide a concluding discussion to summarize the main findings and contributions of this paper.

2 Estimation with the parametric propensity score

2.1 Notations and the standard propensity score analysis

In this paper, we consider estimating the ATE over the whole population in an observational study with two treatment groups. Let $Z$ be the treatment assignment: $Z=1$ if the subject is in the treated (exposed) group and $Z=0$ if in the control group. Let $X$ be a vector of baseline covariates and $Y$ be the observed outcome. We follow Rubin’s causal model framework.²¹ Let $Y^{(1)}$ and $Y^{(0)}$ be the potential outcomes if the subject were assigned to the treated group ( $Z=1$ ) and the control group ( $Z=0$ ), respectively. Suppose the observational study enrolls $n$ subjects and the observed data $(Y_{i},Z_{i},X_{i})$ for subject $i$ ( $i=1,2,\dots,n$ ) are available, which are independent and identically distributed copies of $(Y,Z,X)$ . Denote $\mu_{1}=E[Y^{(1)}]$ and $\mu_{0}=E[Y^{(0)}]$ . The ATE, which is our primary estimand, is defined by

\psi=\mu_{1}-\mu_{0}=E[Y^{(1)}]-E[Y^{(0)}].

In observational studies, owing to the absence of randomization, the potential influence of confounders should be carefully handled in estimating the ATE. The propensity score is widely used to adjust for confounding bias. The propensity score is defined by $e(X_{i})=P(Z_{i}=1\mid X_{i})$ . Various methods, such as stratification, matching, and IPW,^{3, 4, 22} can be employed to adjust for confounding with the propensity score. The standard propensity score analysis is conducted under the following assumptions: \begin{assumption} Consistency: $Y_{i}=Z_{i}Y_{i}^{(1)}+(1-Z_{i})Y_{i}^{(0)}$ . \end{assumption} \begin{assumption} Positivity: there exists a small positive constant $\delta$ such that $0<\delta\leq e(X_{i})\leq 1-\delta$ for all subjects $i$ . \end{assumption} \begin{assumption} SITA: $(Y^{(1)}_{i},Y^{(0)}_{i})\perp\!\!\!\perp Z_{i}\mid X_{i}$ . \end{assumption} The SITA assumption implies that, in principle, the bias due to confounding can be adjusted for by using $X$ . The SITA assumption corresponds to Missing At Random (MAR) in the missing data context. In this paper, we handle situations in which SITA is violated, which corresponds to Missing Not At Random (MNAR) in the missing data context. We use the terms SITA and MAR interchangeably. In practice, the propensity score is unknown, and a parametric model such as the logistic regression model is usually assumed. Let $\mbox{logit}(e(X_{i};\theta,\alpha))=\theta+\alpha^{\top}X_{i}$ . The unknown parameters are usually estimated by solving the following score equation:

\sum_{i=1}^{n}\left(\begin{array}{c}1\\ X_{i}\end{array}\right)\left(Z_{i}-\frac{\exp(\theta+\alpha^{\top}X_{i})}{1+\exp(\theta+\alpha^{\top}X_{i})}\right)=0.

(1)

Let the solution to the score equation for $(\theta,\alpha)$ be denoted by $(\hat{\theta},\hat{\alpha})$ , and let $\hat{e}(X_{i})=e(X_{i};\hat{\theta},\hat{\alpha})$ . This determines a unique set of estimated propensity scores for all subjects. In this paper, the propensity score estimated under the SITA assumption is called the MAR-based propensity score, to avoid confusion with another type of propensity score, the outcome-dependent propensity score, introduced in Section 2.2. The IPW estimator for $\mu_{1}$ is defined by

\hat{\mu}_{1}=\frac{1}{n}\sum_{i=1}^{n}\frac{Z_{i}Y_{i}}{\hat{e}(X_{i})}.

(2)

Similarly, we can estimate $\mu_{0}$ with

\hat{\mu}_{0}=\frac{1}{n}\sum_{i=1}^{n}\frac{(1-Z_{i})Y_{i}}{1-\hat{e}(X_{i})},

(3)

and then the ATE $\psi$ is estimated with

\hat{\psi}=\hat{\mu}_{1}-\hat{\mu}_{0}.

The IPW estimator above is unstabilized: its weights can become extremely large when some propensity scores are very close to zero or one, which can make the estimate unstable. The stabilized IPW (SIPW) estimator normalizes the weights by their sum within each group, which mitigates the impact of extreme weights. Specifically, the SIPW estimator for $\mu_{1}$ is defined by

\hat{\mu}_{1,SIPW}=\left(\frac{1}{n}\sum_{i=1}^{n}\frac{Z_{i}}{\hat{e}(X_{i})}\right)^{-1}\frac{1}{n}\sum_{i=1}^{n}\frac{Z_{i}Y_{i}}{\hat{e}(X_{i})}.

(4)

Similarly, we can estimate $\mu_{0}$ with

\hat{\mu}_{0,SIPW}=\left(\frac{1}{n}\sum_{i=1}^{n}\frac{1-Z_{i}}{1-\hat{e}(X_{i})}\right)^{-1}\frac{1}{n}\sum_{i=1}^{n}\frac{(1-Z_{i})Y_{i}}{1-\hat{e}(X_{i})},

(5)

and then the ATE $\psi$ is estimated with

\hat{\psi}_{SIPW}=\hat{\mu}_{1,SIPW}-\hat{\mu}_{0,SIPW}.

In this paper, we focus on the SIPW estimator.
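The two estimators can be compared numerically. The following minimal Python sketch assumes NumPy and a covariate matrix `X`; the function names `fit_logistic_score` and `ipw_and_sipw` are hypothetical. It solves the score equation (1) by Newton-Raphson and then evaluates the IPW form (2)-(3) and the SIPW form (4)-(5) of the ATE with the fitted MAR-based propensity scores.

```python
import numpy as np

# Sketch only: function names are hypothetical, not from the paper.


def fit_logistic_score(X, Z, n_iter=50, tol=1e-10):
    """Solve the logistic score equation (1) by Newton-Raphson.

    X: (n, p) covariate matrix; Z: (n,) 0/1 treatment indicator.
    Returns (theta, alpha) stacked into one vector of length p + 1.
    """
    D = np.column_stack([np.ones(len(Z)), X])  # rows are (1, X_i^T)
    coef = np.zeros(D.shape[1])
    for _ in range(n_iter):
        e = 1.0 / (1.0 + np.exp(-D @ coef))          # current e(X_i; theta, alpha)
        score = D.T @ (Z - e)                        # left-hand side of (1)
        info = D.T @ (D * (e * (1.0 - e))[:, None])  # Fisher information
        step = np.linalg.solve(info, score)
        coef = coef + step
        if np.max(np.abs(step)) < tol:
            break
    return coef


def ipw_and_sipw(Y, Z, e_hat):
    """ATE estimates from the IPW form (2)-(3) and the SIPW form (4)-(5)."""
    n = len(Y)
    w1 = Z / e_hat               # weights for the treated group
    w0 = (1 - Z) / (1 - e_hat)   # weights for the control group
    psi_ipw = np.sum(w1 * Y) / n - np.sum(w0 * Y) / n
    # SIPW divides by the sum of the weights in each group instead of by n
    psi_sipw = np.sum(w1 * Y) / np.sum(w1) - np.sum(w0 * Y) / np.sum(w0)
    return psi_ipw, psi_sipw
```

Newton-Raphson is used here only because (1) is the gradient of the concave logistic log-likelihood; any standard logistic regression routine should give the same $\hat{e}(X_{i})$ .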

If the SITA assumption holds and the model for the propensity score is correctly specified, the SIPW estimator consistently estimates the ATE. However, the SITA assumption does not hold in the presence of unmeasured confounders.

2.2 Estimation with the outcome-dependent propensity score

In this section, suppose that the SITA assumption does not necessarily hold because of an unmeasured confounder $U$ . Then the estimation of the ATE using the method in Section 2.1 is no longer valid. To address unmeasured confounders, the outcome-dependent propensity score approach^{23, 24, 25} has been successfully introduced. We define the outcome-dependent propensity scores by $o^{1}(X_{i},Y_{i}^{(1)})=P(Z_{i}=1\mid X_{i},Y_{i}^{(1)})$ and $o^{0}(X_{i},Y_{i}^{(0)})=P(Z_{i}=1\mid X_{i},Y_{i}^{(0)})$ , which are used to weight subjects in the treated and control groups, respectively.

One may consider logistic regression models for $o^{1}(X_{i},Y_{i}^{(1)})$ and $o^{0}(X_{i},Y_{i}^{(0)})$ . Let us consider the models $\mbox{logit}(o^{1}(X_{i},Y_{i}^{(1)};\theta^{1},\alpha^{1},\beta^{1}))=\theta^{1}+\alpha^{1\top}X_{i}+\beta^{1}Y_{i}^{(1)}$ and $\mbox{logit}(o^{0}(X_{i},Y_{i}^{(0)};\theta^{0},\alpha^{0},\beta^{0}))=\theta^{0}+\alpha^{0\top}X_{i}+\beta^{0}Y_{i}^{(0)}$ . The score equation (1) cannot be used to estimate the unknown parameters in these models, since $Y_{i}^{(z)}$ is observed only for subjects with $Z_{i}=z$ . Instead, the unknown parameters in the model of $o^{1}(X_{i},Y_{i}^{(1)};\theta^{1},\alpha^{1},\beta^{1})$ can be estimated by solving the following estimating equation:

\sum_{i=1}^{n}g(X_{i})\left(1-\frac{Z_{i}}{o^{1}(X_{i},Y_{i};\theta^{1},\alpha^{1},\beta^{1})}\right)=0,

(6)

where $g(X)$ is a vector-valued function of the same dimension as $(\theta^{1},\alpha^{1},\beta^{1})$ , and the solution to the estimating equation (6) is denoted by $(\hat{\theta}^{1},\hat{\alpha}^{1},\hat{\beta}^{1})$ . Because the summand vanishes when $Z_{i}=0$ and $Y_{i}=Y_{i}^{(1)}$ when $Z_{i}=1$ , the equation uses $Y_{i}^{(1)}$ only where it is observed. Similarly, the unknown parameters in the model of $o^{0}(X_{i},Y_{i}^{(0)};\theta^{0},\alpha^{0},\beta^{0})$ can be estimated by solving the following estimating equation:

\sum_{i=1}^{n}g(X_{i})\left(1-\frac{1-Z_{i}}{1-o^{0}(X_{i},Y_{i};\theta^{0},\alpha^{0},\beta^{0})}\right)=0.

(7)

The dimension of $g(X)$ should be equal to that of $(\theta^{0},\alpha^{0},\beta^{0})$ to obtain a solution. The solution to the estimating equation (7) is denoted by $(\hat{\theta}^{0},\hat{\alpha}^{0},\hat{\beta}^{0})$ . Denote $\hat{o}^{1}(X_{i},Y^{(1)}_{i})=o^{1}(X_{i},Y_{i};\hat{\theta}^{1},\hat{\alpha}^{1},\hat{\beta}^{1})$ and $\hat{o}^{0}(X_{i},Y^{(0)}_{i})=o^{0}(X_{i},Y_{i};\hat{\theta}^{0},\hat{\alpha}^{0},\hat{\beta}^{0})$ . We can then estimate $\mu_{1}$ under MNAR with

\hat{\mu}^{MNAR}_{1,SIPW}=\left(\frac{1}{n}\sum_{i=1}^{n}\frac{Z_{i}}{\hat{o}^{1}(X_{i},Y^{(1)}_{i})}\right)^{-1}\frac{1}{n}\sum_{i=1}^{n}\frac{Z_{i}Y_{i}}{\hat{o}^{1}(X_{i},Y^{(1)}_{i})}.

Similarly, we can estimate $\mu_{0}$ under MNAR with

\hat{\mu}^{MNAR}_{0,SIPW}=\left(\frac{1}{n}\sum_{i=1}^{n}\frac{1-Z_{i}}{1-\hat{o}^{0}(X_{i},Y^{(0)}_{i})}\right)^{-1}\frac{1}{n}\sum_{i=1}^{n}\frac{(1-Z_{i})Y_{i}}{1-\hat{o}^{0}(X_{i},Y^{(0)}_{i})},

and then the ATE is estimated with

\hat{\psi}^{MNAR}_{SIPW}=\hat{\mu}^{MNAR}_{1,SIPW}-\hat{\mu}^{MNAR}_{0,SIPW}.

The SIPW estimator with the outcome-dependent propensity score consistently estimates the ATE without the SITA assumption as long as the parametric models for the outcome-dependent propensity score are correctly specified. However, estimation with (6) and (7) often encounters an identifiability problem: the model coefficients obtained by solving the estimating equations may not be uniquely determined. Miao et al.⁹ pointed out that even if the model for the propensity score has a known parametric form, the model is not identifiable without specifying a parametric outcome distribution. A unique solution to the estimating equations is achieved only when both the outcome model and the propensity score model are appropriately specified. Specifically, without additional restrictions or assumptions, solving the estimating equation (6) alone is not sufficient to determine the coefficients $(\hat{\theta}^{1},\hat{\alpha}^{1},\hat{\beta}^{1})$ uniquely. Therefore, the outcome-dependent propensity score cannot completely resolve the issue of unmeasured confounders.
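The estimating equation (6) and the corresponding SIPW estimator of $\mu_{1}$ can be sketched as follows, assuming NumPy/SciPy, a scalar covariate $X$ , and the hypothetical choice $g(X)=(1,X,X^{2})^{\top}$ , which has the same dimension as $(\theta^{1},\alpha^{1},\beta^{1})$ . All helper names are hypothetical. Root finding with `scipy.optimize.root` is shown only as one possible solver; in line with the identifiability problem described above, it may fail to converge or return different roots from different starting values.

```python
import numpy as np
from scipy.optimize import root

# Sketch only: helper names and g(X) = (1, X, X^2) are hypothetical choices.


def expit(t):
    return 1.0 / (1.0 + np.exp(-t))


def inverse_o1(params, X, Y, Z):
    """Z_i / o^1(X_i, Y_i) under the logistic model; zero for untreated subjects."""
    theta, alpha, beta = params
    o1 = expit(theta + alpha * X + beta * Y)
    # For Z_i = 1 we have Y_i = Y_i^(1); untreated rows never use o^1.
    with np.errstate(divide="ignore"):
        return np.where(Z == 1, 1.0 / o1, 0.0)


def ee_treated(params, X, Y, Z, G):
    """(1/n) times the left-hand side of (6); row i of G is g(X_i)."""
    return G.T @ (1.0 - inverse_o1(params, X, Y, Z)) / len(Z)


def solve_ee(X, Y, Z, G, x0):
    """Try to solve (6); success and the root found may depend on x0."""
    sol = root(ee_treated, x0, args=(X, Y, Z, G), method="hybr")
    return sol.x, sol.success


def sipw_mnar_mu1(params, X, Y, Z):
    """SIPW estimate of mu_1 with the outcome-dependent propensity score."""
    w = inverse_o1(params, X, Y, Z)
    return np.sum(w * Y) / np.sum(w)
```

Trying several starting values `x0` and comparing the resulting roots and $\hat{\mu}^{MNAR}_{1,SIPW}$ values is one informal way to see whether the solution is well determined in a given dataset.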

3 Existing sensitivity analysis methods

In this section, we briefly review some existing sensitivity analysis methods for the IPW estimator.

3.1 Modeling the mean difference of the potential outcomes

Along the lines of the work by Robins et al.,^{26, 27} Brumback et al. ²⁸ proposed to quantify the impact of unmeasured confounders by modeling the mean between-group difference of the potential outcomes, conditional on all observed covariates. The sensitivity function is defined by $c(z,X)=E[Y^{(z)}\mid Z=1,X]-E[Y^{(z)}\mid Z=0,X]$ . If the SITA assumption holds, $c(z,X)$ equals zero. Thus, the sensitivity function describes the magnitude of the departure from the SITA assumption, that is, the impact of the unmeasured confounders. Once the sensitivity function $c(z,X)$ is specified, one can predict the mean of the counterfactual outcomes conditional on $X$ and then estimate the ATE without the SITA assumption. Li et al. ¹⁵ pointed out a technical difficulty in defining the sensitivity function when the covariates $X$ are multi-dimensional: in practical sensitivity analysis, not only the functional form but also the specific coefficient of each covariate must be specified. Such specifications were criticized as unlikely to reflect accurately the relationship between the departure from the SITA assumption and the potential outcomes. Li et al. ¹⁵ proposed a refinement by defining the sensitivity function as a function of the MAR-based propensity score: $c(z,e(X))=E[Y^{(z)}\mid Z=1,e(X)]-E[Y^{(z)}\mid Z=0,e(X)]$ . The MAR-based propensity score is a one-dimensional summary of the observed covariates, and this refinement makes the specification of the sensitivity function much simpler.

However, in reality, even with the simplification by Li et al.¹⁵, it is not an easy task to define a plausible range of sensitivity functions. Furthermore, their method still relies on the estimation of the MAR-based propensity score. Misspecification of the parametric model for the MAR-based propensity score may result in difficulty in interpreting the results of the sensitivity analysis.

3.2 The marginal sensitivity model

Tan ¹⁹ proposed the marginal sensitivity model, which describes a relaxation of the SITA assumption. The model uses a single sensitivity parameter, which permits the presence of the unmeasured confounders $U$ but restricts the extent of selection bias that can be attributed to them. The analyst specifies a parameter $\lambda\geq 1$ , and the following inequality is assumed to hold:

1/\lambda\leq\frac{e(X_{i},U_{i})/(1-e(X_{i},U_{i}))}{\hat{e}(X_{i})/(1-\hat{e}(X_{i}))}\leq\lambda,

where $e(X_{i},U_{i})$ refers to the true propensity score given both the measured covariates and the unmeasured confounders, and $\hat{e}(X_{i})$ refers to the estimated MAR-based propensity score. The single parameter $\lambda$ bounds the odds ratio (OR) between the true propensity score and the estimated propensity score, and thus controls the degree of departure from unconfoundedness. When $\lambda=1$ , the inclusion of the unmeasured confounders has no effect on the treatment odds; that is, the treatment assignment is not influenced by the unmeasured confounders, and the SITA assumption holds. A larger $\lambda$ allows a stronger violation of the SITA assumption. Tan ¹⁹ proposed a sensitivity analysis method to assess how estimates based on the nonparametric likelihood change under violation of the SITA assumption.

By introducing the marginal sensitivity model, sensitivity analysis for unmeasured confounders can be applied to the IPW estimator under MNAR. If $U$ were observed, one could estimate $\mu_{1}$ with

\left(\frac{1}{n}\sum_{i=1}^{n}\frac{Z_{i}}{e(X_{i},U_{i})}\right)^{-1}\frac{1}{n}\sum_{i=1}^{n}\frac{Z_{i}Y_{i}}{e(X_{i},U_{i})}.

(8)

In practice, $U$ is unobserved, so the estimator (8) cannot be computed. However, under the marginal sensitivity model, $\lambda$ links the unobserved true propensity score to the estimated MAR-based propensity score, so that bounds for (8) can be obtained by optimizing it over all propensity scores satisfying this constraint. That is,

\begin{split}\max\mbox{ or }\min\quad&\left(\frac{1}{n}\sum_{i=1}^{n}\frac{Z_{i}}{e(X_{i},U_{i})}\right)^{-1}\frac{1}{n}\sum_{i=1}^{n}\frac{Z_{i}Y_{i}}{e(X_{i},U_{i})}\\ \mbox{subject to}\quad&1/\lambda^{1}\leq\frac{e(X_{i},U_{i})/(1-e(X_{i},U_{i}))}{\hat{e}(X_{i})/(1-\hat{e}(X_{i}))}\leq\lambda^{1},\end{split}

(9)

where $\lambda^{1}$ is a pre-specified constant that describes the upper and lower bounds of the discrepancy of the true propensity score from the estimated MAR-based propensity score for the estimation of $\mu_{1}$ . As long as the true propensity scores of all subjects satisfy the constraint, the true $\mu_{1}$ should asymptotically lie between the minimum and maximum of (9). An interval for $\mu_{0}$ can be obtained in a similar way. This method under the marginal sensitivity model was first proposed by Zhao et al.¹⁸ In this method, the sensitivity parameter $\lambda$ quantifies the extent to which the SITA assumption is violated. However, it still suffers from the difficulty of defining a plausible range for the sensitivity parameter and relies on correct specification of the MAR-based propensity score model.
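For a fixed $\lambda^{1}$ , the constraint in (9) is a box on each inverse weight, $1+r_{i}/\lambda^{1}\leq 1/e(X_{i},U_{i})\leq 1+\lambda^{1}r_{i}$ with $r_{i}=(1-\hat{e}(X_{i}))/\hat{e}(X_{i})$ , and the objective is a ratio of two linear functions of these weights. The sketch below (NumPy; the function names are hypothetical) uses the fact that the maximum of such a ratio over a box puts the largest admissible weights on the largest treated outcomes and the smallest weights on the rest, so only the $n+1$ threshold rules on the sorted outcomes need to be checked. It illustrates this idea and is not Zhao et al.'s code.

```python
import numpy as np

# Sketch only: function names are hypothetical.


def msm_weight_box(e_hat, lam):
    """Box [a_i, b_i] for w_i = 1/e(X_i, U_i) implied by the OR constraint in (9)."""
    r = (1.0 - e_hat) / e_hat          # inverse odds of the MAR-based score
    return 1.0 + r / lam, 1.0 + lam * r


def msm_bounds_mu1(y, e_hat, lam):
    """Minimum and maximum of the normalized objective in (9).

    y, e_hat: outcomes and MAR-based scores of the treated subjects only.
    The maximizer puts w_i = b_i on outcomes above a threshold and w_i = a_i
    below it (the reverse for the minimizer), so scanning the n + 1 split
    points of the sorted outcomes with prefix sums is enough.
    """
    order = np.argsort(y)
    y = y[order]
    a, b = msm_weight_box(e_hat[order], lam)

    def prefix(v):
        return np.concatenate([[0.0], np.cumsum(v)])

    sa_y, sb_y, sa, sb = prefix(a * y), prefix(b * y), prefix(a), prefix(b)
    # Split k: the k smallest outcomes get a (for the max) or b (for the min).
    upper = (sa_y + sb_y[-1] - sb_y) / (sa + sb[-1] - sb)
    lower = (sb_y + sa_y[-1] - sa_y) / (sb + sa[-1] - sa)
    return lower.min(), upper.max()
```

With $\lambda^{1}=1$ the box collapses to $w_{i}=1/\hat{e}(X_{i})$ and both bounds reduce to the SIPW estimate with the MAR-based propensity score.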

Dorn and Guo ²⁰ pointed out that the interval obtained by (9) may not be tight and is asymptotically conservative, and they proposed the quantile balancing method, a refinement based on the marginal sensitivity model. Let $F(y\mid x,z)=P(Y\leq y\mid X=x,Z=z)$ and define the quantile function by $Q_{t}(x,z)=\inf\{q:F(q\mid x,z)\geq t\}$ . For bounding $\mu_{1}$ , the quantile balancing method solves the following optimization problem:

\begin{align}
\max\mbox{ or }\min\quad&\left(\frac{1}{n}\sum_{i=1}^{n}\frac{Z_{i}}{e(X_{i},U_{i})}\right)^{-1}\frac{1}{n}\sum_{i=1}^{n}\frac{Z_{i}Y_{i}}{e(X_{i},U_{i})}\tag{10}\\
\mbox{subject to}\quad&\sum_{i=1}^{n}\left(\begin{array}{c}1\\ \hat{Q}_{\tau}(X_{i},1)\end{array}\right)\left(\frac{Z_{i}}{e(X_{i},U_{i})}-\frac{Z_{i}}{\hat{e}(X_{i})}\right)=0\tag{13}\\
&1/\lambda^{1}\leq\frac{e(X_{i},U_{i})/(1-e(X_{i},U_{i}))}{\hat{e}(X_{i})/(1-\hat{e}(X_{i}))}\leq\lambda^{1},\tag{14}
\end{align}

where $\tau=\lambda^{1}/(1+\lambda^{1})$ and $\hat{Q}_{\tau}(X_{i},1)$ is estimated with a quantile regression model.²⁰ Bounds for $\mu_{0}$ and the ATE can be obtained in a similar way. The quantile balancing method refines Zhao's sensitivity analysis method ¹⁸ by adding a constraint based on the quantile function that balances the treatment assignment $Z$ over the true propensity score at the population level. This additional constraint, based on the estimated quantile function, ensures the asymptotic optimality (sharpness) of the interval obtained by solving (10). Although it resolves the asymptotic conservativeness of Zhao's method,¹⁸ it is still vulnerable to misspecification of the MAR-based propensity score model. Moreover, estimating the quantile function also requires a parametric model or a machine learning method.

4 The proposed sensitivity analysis method

We begin with the bound for $\mu_{1}$ . Let $e^{1}(X_{i},U_{i})$ denote the true propensity score for subjects in the treated group. We construct the upper bound of $\mu_{1}$ by solving the following optimization problem:

\begin{align}
\bar{\mu}_{1}^{+}=\quad\max\quad&\frac{1}{n}\sum_{i=1}^{n}\frac{Z_{i}Y_{i}}{e^{1}(X_{i},U_{i})}\tag{15}\\
\mbox{subject to}\quad&\delta\leq e^{1}(X_{i},U_{i})\leq 1-\delta\tag{16}\\
&\sum_{i=1}^{n}g(X_{i})\left(1-\frac{Z_{i}}{e^{1}(X_{i},U_{i})}\right)=0.\tag{17}
\end{align}

In a similar way, to obtain the lower bound of $\mu_{1}$ , let us consider the following problem:

\begin{align}
\bar{\mu}_{1}^{-}=\quad\min\quad&\frac{1}{n}\sum_{i=1}^{n}\frac{Z_{i}Y_{i}}{e^{1}(X_{i},U_{i})}\tag{18}\\
\mbox{subject to}\quad&\delta\leq e^{1}(X_{i},U_{i})\leq 1-\delta\tag{19}\\
&\sum_{i=1}^{n}g(X_{i})\left(1-\frac{Z_{i}}{e^{1}(X_{i},U_{i})}\right)=0.\tag{20}
\end{align}

The constraints (16) and (19) come from the positivity assumption, which is fundamental in causal inference. We regard $\delta$ in (16) and (19) as a sensitivity parameter. The constraints (17) and (20) come from the estimating equation (6) for the outcome-dependent propensity score. As mentioned, the estimating equation (6) cannot necessarily identify the true propensity score uniquely within a parametric model. However, since $E[Z\mid X,U]=e^{1}(X,U)$ , the law of large numbers gives

\frac{1}{n}\sum_{i=1}^{n}g(X_{i})\left(1-\frac{Z_{i}}{e^{1}(X_{i},U_{i})}\right)\xrightarrow{p}E\left[g(X)\left(1-\frac{Z}{e^{1}(X,U)}\right)\right]=0.

(21)

Hence, the true propensity scores satisfy the constraints (17) and (20) asymptotically, and therefore the true $\mu_{1}$ is contained in the interval $[\bar{\mu}_{1}^{-},\bar{\mu}_{1}^{+}]$ asymptotically.
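The convergence in (21) can be checked by simulation. The sketch below assumes NumPy and a hypothetical data-generating model in which $Z$ depends on an unmeasured $U$ as well as on $X$ ; the function name is also hypothetical. The sample estimating function evaluated at the true $e^{1}(X_{i},U_{i})$ should shrink toward zero as $n$ grows, even though no model for $e^{1}$ is fitted.

```python
import numpy as np

# Sketch only: the data-generating model and function name are hypothetical.


def mean_ee_at_true_score(n, rng):
    """(1/n) sum_i g(X_i)(1 - Z_i / e^1(X_i, U_i)) evaluated at the true score."""
    X = rng.normal(size=n)
    U = rng.normal(size=n)                               # unmeasured confounder
    e1 = 1.0 / (1.0 + np.exp(-(0.3 * X + 0.8 * U)))      # hypothetical P(Z = 1 | X, U)
    Z = rng.binomial(1, e1)
    G = np.column_stack([np.ones(n), X, X ** 2])         # g(X) = (1, X, X^2)
    return G.T @ (1.0 - Z / e1) / n


rng = np.random.default_rng(3)
for n in (1_000, 100_000, 1_000_000):
    print(n, np.round(mean_ee_at_true_score(n, rng), 4))
```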

Let us take the inverse of the true propensity score, $w_{i}^{1}=(e^{1}(X_{i},U_{i}))^{-1}$ , as the decision variable. Then the optimization problems (15) and (18) become linear programming problems with linear constraints:

\begin{align}
\min\mbox{ or }\max\quad&\frac{1}{n}\sum_{i=1}^{n}Z_{i}Y_{i}w_{i}^{1}\tag{22}\\
\mbox{subject to}\quad&\frac{1}{1-\delta}\leq w_{i}^{1}\leq\frac{1}{\delta}\tag{23}\\
&\sum_{i=1}^{n}g(X_{i})(1-Z_{i}w_{i}^{1})=0.\tag{24}
\end{align}

Unlike the quantile balancing method, which requires nonlinear optimization and estimation of the quantile functions, our proposal is a linear program that can be solved efficiently by the interior-point method or the simplex algorithm, and it is therefore tractable with standard mathematical programming software.
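A minimal sketch of (22)-(24) with SciPy's `linprog` (HiGHS backend) follows; the function name `mu1_bounds_lp` and its argument layout are hypothetical. Because $Z_{i}w_{i}^{1}=0$ for untreated subjects, only the treated weights appear in the objective and in (24), so they are the only decision variables, and (24) becomes $\sum_{i:Z_{i}=1}g(X_{i})w_{i}^{1}=\sum_{i=1}^{n}g(X_{i})$ .

```python
import numpy as np
from scipy.optimize import linprog

# Sketch only: the function name and argument layout are hypothetical.


def mu1_bounds_lp(Y, Z, G, delta):
    """Lower and upper bounds for mu_1 from the linear programs (22)-(24).

    Y, Z: (n,) outcomes and 0/1 treatment indicators.
    G: (n, K) matrix whose i-th row is g(X_i).
    Returns (lower, upper, weights_at_upper), or None if infeasible.
    """
    n = len(Y)
    treated = Z == 1
    c = Y[treated] / n                   # objective (22) restricted to treated
    A_eq = G[treated].T                  # (24): sum_{Z_i=1} g(X_i) w_i ...
    b_eq = G.sum(axis=0)                 # ... = sum_i g(X_i)
    box = [(1.0 / (1.0 - delta), 1.0 / delta)] * int(treated.sum())  # (23)
    lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=box, method="highs")
    hi = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=box, method="highs")
    if lo.status != 0 or hi.status != 0:
        return None
    return lo.fun, -hi.fun, hi.x
```

If the first element of $g(X_{i})$ is 1, the returned weights sum to $n$ , so each bound is a weighted mean of the treated outcomes. An infeasible program is reported as `None` rather than as a bound.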

The bound for $\mu_{0}$ can be constructed in a similar way as follows. Let $e^{0}(X_{i},U_{i})$ denote the true propensity score for subjects in the control group and similarly consider the weight $w_{i}^{0}=(1-e^{0}(X_{i},U_{i}))^{-1}$ as the decision variable. Then the interval $[\bar{\mu}_{0}^{-},\bar{\mu}_{0}^{+}]$ can be obtained by solving the following linear programming problem:

\begin{align}
\min\mbox{ or }\max\quad&\frac{1}{n}\sum_{i=1}^{n}(1-Z_{i})Y_{i}w_{i}^{0}\tag{25}\\
\mbox{subject to}\quad&\frac{1}{1-\delta}\leq w_{i}^{0}\leq\frac{1}{\delta}\tag{26}\\
&\sum_{i=1}^{n}g(X_{i})(1-(1-Z_{i})w_{i}^{0})=0.\tag{27}
\end{align}

We obtain bounds for $\psi$ by $[\bar{\mu}_{1}^{-}-\bar{\mu}_{0}^{+},\bar{\mu}_{1}^{+}-\bar{\mu}_{0}^{-}]$ .

In the usual estimation of the propensity score, the dimension of $g(X_{i})$ should equal the number of unknown parameters in the parametric model. In the proposed sensitivity analysis method, by contrast, one can impose more constraints by increasing the dimension of $g(X_{i})$ , which can narrow the bounds obtained by the linear programming problems (22) and (25). $g(X_{i})$ can be any function of $X_{i}$ . Suppose that there are $K$ covariates: $X_{i}^{\top}=(X_{i,1},X_{i,2},\dots,X_{i,K})$ . Then, $g(X_{i})$ can be chosen, for example, as

g(X_{i})=\left(\begin{array}{c}1\\ X_{i,1}\\ \vdots\\ X_{i,K}\end{array}\right)\quad\mbox{or}\quad g(X_{i})=\left(\begin{array}{c}1\\ X_{i,1}\\ \vdots\\ X_{i,K}\\ X_{i,1}^{2}\\ \vdots\\ X_{i,K}^{2}\end{array}\right)\quad\mbox{or}\quad g(X_{i})=\left(\begin{array}{c}1\\ X_{i,1}\\ \vdots\\ X_{i,K}\\ X_{i,1}^{2}\\ \vdots\\ X_{i,K}^{2}\\ \vdots\end{array}\right).

(28)

As long as the resulting constraints admit feasible solutions to the optimization problems, simply increasing the dimension of $g(X_{i})$ is expected to narrow the bounds: every additional constraint shrinks the feasible set of the linear program, so the maximum cannot increase and the minimum cannot decrease.
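The bound for $\psi$ and the effect of enlarging $g(X_{i})$ can be illustrated together. The sketch below (SciPy `linprog`; the helper names `arm_bounds`, `basis` and `ate_bounds` are hypothetical) builds $g(X_{i})$ as an intercept plus powers of each covariate, as in (28), solves (22)-(24) and (25)-(27), and combines the results into $[\bar{\mu}_{1}^{-}-\bar{\mu}_{0}^{+},\bar{\mu}_{1}^{+}-\bar{\mu}_{0}^{-}]$ .

```python
import numpy as np
from scipy.optimize import linprog

# Sketch only: helper names are hypothetical, not the authors' implementation.


def arm_bounds(Y, in_arm, G, delta):
    """(min, max) of (1/n) sum_{i in arm} Y_i w_i under the box and g-constraints.

    in_arm = (Z == 1) gives (22)-(24); in_arm = (Z == 0) gives (25)-(27).
    Returns None if either linear program is infeasible or fails.
    """
    n = len(Y)
    c = Y[in_arm] / n
    A_eq, b_eq = G[in_arm].T, G.sum(axis=0)
    box = [(1.0 / (1.0 - delta), 1.0 / delta)] * int(in_arm.sum())
    res = [linprog(s * c, A_eq=A_eq, b_eq=b_eq, bounds=box, method="highs")
           for s in (1.0, -1.0)]
    if any(r.status != 0 for r in res):
        return None
    return res[0].fun, -res[1].fun


def basis(X, degree):
    """g(X_i) as in (28): an intercept and X_{i,k}^d for d = 1..degree, k = 1..K."""
    return np.column_stack([np.ones(len(X))] + [X ** d for d in range(1, degree + 1)])


def ate_bounds(Y, Z, X, delta, degree):
    """[mu1^- - mu0^+, mu1^+ - mu0^-] for a polynomial g of the given degree."""
    G = basis(X, degree)
    b1 = arm_bounds(Y, Z == 1, G, delta)
    b0 = arm_bounds(Y, Z == 0, G, delta)
    if b1 is None or b0 is None:
        return None
    return b1[0] - b0[1], b1[1] - b0[0]
```

On the same data, `ate_bounds(..., degree=2)` should return an interval no wider than `ate_bounds(..., degree=1)` whenever both programs are feasible, because the quadratic basis contains the linear one.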

The IPW estimators (2) and (3) do not satisfy the population boundedness property: an IPW estimate can fall outside the range of the outcome.^{29, 30} In contrast, the SIPW estimators (4) and (5) do satisfy it. The objective functions (15) and (18) have the form of the IPW estimator. If the first element of $g(X_{i})$ is set to 1, as in (28), the IPW estimator agrees with the SIPW estimator. It therefore suffices to use the computationally more tractable, unstabilized form as the objective function, and we recommend including 1 as the first element of $g(X_{i})$. Dorn and Guo²⁰ also considered the condition (21), but pointed out that $g(X_{i})$ would then have to involve infinitely many moment conditions. Combined with the constraint (14), they showed that these infinitely many constraints can be replaced by the single quantile balancing constraint (13). This simplification to quantile balancing relies on the OR-based constraint (14). In practical sensitivity analyses of observational studies, the bounds for the ATE obtained by optimizing (15) and (18) are usually compared with a specific threshold, such as zero, to assess the robustness of the results. There is therefore no need to introduce infinitely many constraints; it is sufficient to increase the dimension of $g(X_{i})$ until the robustness of the causal conclusion can be judged. In addition, the quantile balancing method requires the quantile function to be estimated, which brings its own modeling and estimation assumptions, although Dorn and Guo²⁰ tried to minimize the risk of misspecification by introducing flexible models. The authors provide several machine-learning-based estimation methods, which, in their simulation study, might yield bounds that differ notably from one another.²⁰ One advantage of the proposed method is that it does not rely on auxiliary estimation by quantile regression: simply by increasing the dimension of $g(X_{i})$, we can try to make the bounds narrower.
More importantly, our method can provide bounds without relying on the condition (14), whose sensitivity parameter $\lambda$ is hard to specify.
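The agreement between the IPW and SIPW forms noted above follows in one line. When the first element of $g(X_{i})$ is 1, the first row of the balancing constraint $\sum_{i=1}^{n}g(X_{i})(1-Z_{i}w_{i}^{1})=0$ fixes the total weight in the treated group. Every feasible $w^{1}$ therefore turns the unstabilized objective into its self-normalized counterpart:

```latex
\sum_{i=1}^{n}(1-Z_{i}w_{i}^{1})=0
\;\Longleftrightarrow\;
\sum_{i=1}^{n}Z_{i}w_{i}^{1}=n
\;\Longrightarrow\;
\frac{1}{n}\sum_{i=1}^{n}Z_{i}Y_{i}w_{i}^{1}
=\frac{\sum_{i=1}^{n}Z_{i}Y_{i}w_{i}^{1}}{\sum_{i=1}^{n}Z_{i}w_{i}^{1}}.
```

The same argument applies to $\mu_{0}$ with $1-Z_{i}$ and $w_{i}^{0}$. Because the two objectives coincide on the whole feasible set, their minima and maxima also coincide. The ratio form is linear-fractional rather than linear, so it is the less convenient one to optimize.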

Without (14), however, the bounds may be too wide to be informative. In that case, the constraint (14) can be incorporated into the optimization problems (22) and (25) as follows:

\begin{align}
\min\mbox{ or }\max\quad&\frac{1}{n}\sum_{i=1}^{n}Z_{i}Y_{i}w_{i}^{1}\tag{29}\\
\mbox{subject to}\quad&\frac{1}{1-\delta}\leq w_{i}^{1}\leq\frac{1}{\delta}\notag\\
&\sum_{i=1}^{n}g(X_{i})(1-Z_{i}w_{i}^{1})=0\notag\\
&\frac{1+(\lambda^{1}-1)\hat{e}(X_{i})}{\lambda^{1}\hat{e}(X_{i})}\leq w_{i}^{1}\leq\frac{1+\hat{e}(X_{i})(1/\lambda^{1}-1)}{\hat{e}(X_{i})(1/\lambda^{1})}.\tag{30}
\end{align}

The additional constraint (30) comes from the marginal sensitivity model (9). In (30), $\hat{e}(X_{i})$ denotes the estimated MAR-based propensity score, which depends only on measured covariates, and $\lambda^{1}$ bounds the OR between the true propensity score and the estimated MAR-based propensity score, which is assumed to lie in $[1/\lambda^{1},\lambda^{1}]$. Because (30) is a box constraint on each $w_{i}^{1}$, the optimization problem remains a linear programming problem after it is introduced. With $g(X_{i})$ fixed, adding (30) shrinks the feasible set, so the bound obtained by solving the linear programming (29) is no wider than that from (22). The bound for $\mu_{0}$ can be obtained in a similar way:

\begin{split}\quad\min\mbox{ or }\max\quad&\frac{1}{n}\sum_{i=1}^{n}(1-Z_{i})Y_{i}w_{i}^{0}\\ \mbox{subject to}\quad&\frac{1}{1-\delta}\leq w_{i}^{0}\leq\frac{1}{\delta}\\ &\sum_{i=1}^{n}g(X_{i})(1-(1-Z_{i})w_{i}^{0})=0\\ &\frac{\hat{e}(X_{i})}{\lambda^{0}(1-\hat{e}(X_{i}))}+1\leq w_{i}^{0}\leq\frac{\lambda^{0}\hat{e}(X_{i})}{1-\hat{e}(X_{i})}+1.\end{split}

(31)
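Both the OR-based constraint (30) and the last constraint of (31) are per-unit intervals for the weights. The following is a minimal sketch with hypothetical function names. It computes these intervals through the odds, using $1/e=1+1/\mathrm{odds}(e)$ and $1/(1-e)=1+\mathrm{odds}(e)$, with the odds of the true score confined to $[\mathrm{odds}(\hat{e})/\lambda,\,\lambda\,\mathrm{odds}(\hat{e})]$. It then intersects them with the $\delta$-box $1/(1-\delta)\leq w_{i}\leq 1/\delta$.

```python
import numpy as np


def or_weight_interval(e_hat, lam, arm=1):
    """Per-unit weight interval implied by an odds ratio in [1/lam, lam]
    between the true and the estimated MAR-based propensity score.

    arm=1: bounds for w_i^1 = 1/e_i, as in constraint (30).
    arm=0: bounds for w_i^0 = 1/(1 - e_i), as in the last line of (31).
    """
    e_hat = np.asarray(e_hat, dtype=float)
    odds = e_hat / (1.0 - e_hat)           # odds of the estimated score
    if arm == 1:
        lower = 1.0 + 1.0 / (lam * odds)   # largest admissible odds -> smallest 1/e
        upper = 1.0 + lam / odds
    else:
        lower = 1.0 + odds / lam           # smallest admissible odds -> smallest 1/(1-e)
        upper = 1.0 + lam * odds
    return lower, upper


def combined_weight_box(e_hat, lam, delta, arm=1):
    """Intersect the OR-based interval with 1/(1-delta) <= w_i <= 1/delta."""
    lower, upper = or_weight_interval(e_hat, lam, arm)
    lower = np.maximum(lower, 1.0 / (1.0 - delta))
    upper = np.minimum(upper, 1.0 / delta)
    return lower, upper


# Usage with illustrative estimated scores (not from the paper).
e_hat = np.array([0.05, 0.2, 0.5, 0.8])
lo, hi = combined_weight_box(e_hat, lam=2.0, delta=0.1, arm=1)
print(np.round(lo, 3), np.round(hi, 3))
print("empty intervals:", np.flatnonzero(lo > hi))   # such units make the LP infeasible
```

The result is again a pair of per-unit bounds, so it can be passed directly to an LP solver as variable bounds. A unit whose intersected interval is empty (lower above upper) makes the linear program infeasible. This is one way in which the OR-based constraint and the $\delta$-box can conflict.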

As done by Dorn and Guo,²⁰ estimation error in the bounds can be accounted for by bootstrap confidence intervals for the lower and upper bounds. One may hope to tighten the bounds by introducing $g(X_{i})$ of higher dimension. A concern is that adding more elements may make the bounds less stable and hence less reliable. The bootstrap samples are also useful for evaluating the stability of the resulting bounds: if only a small number of bootstrap samples yield a feasible solution of the linear programming, the resulting bounds should be interpreted with caution.
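The bootstrap described above can be organized so that it also records how many resamples are feasible. This is a minimal sketch under two assumptions. First, it uses a percentile bootstrap, because the text does not state which bootstrap interval was used. Second, it relies on a hypothetical user-supplied function `bound_fn` that re-solves the linear programs on the resampled subjects and returns `None` when no feasible solution exists. The feasible count corresponds to the Feasibility column reported in Table 3.

```python
import numpy as np


def bootstrap_bounds(bound_fn, n, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap for worst-case bounds, with a feasibility count.

    bound_fn(idx) is a hypothetical user-supplied function that re-solves the
    linear programs on the subjects idx (drawn with replacement) and returns
    (lower, upper), or None when no feasible solution exists.
    Returns (boot_lower, boot_upper, n_feasible).
    """
    rng = np.random.default_rng(seed)
    lowers, uppers = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample subjects
        res = bound_fn(idx)
        if res is None:                         # infeasible resample: skip it
            continue
        lowers.append(res[0])
        uppers.append(res[1])
    n_feasible = len(lowers)
    if n_feasible == 0:
        return None, None, 0
    alpha = 1.0 - level
    boot_lower = np.quantile(lowers, alpha / 2)       # lower end, for the lower bound
    boot_upper = np.quantile(uppers, 1 - alpha / 2)   # upper end, for the upper bound
    return boot_lower, boot_upper, n_feasible
```

In an application, `bound_fn` would subset the outcome, treatment, and $g(X)$ arrays by `idx` and solve the two linear programs again. A small `n_feasible` relative to `n_boot` is the warning sign described above.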

5 Simulation Study

In this section, we investigate the performance of the proposed method on simulated datasets. We also compare it with the method of Dorn and Guo,²⁰ which is based on the marginal sensitivity model and quantile balancing. The simulation settings followed the framework of Morikawa and Kim,³¹ which allows us to evaluate the performance of the proposed method under unidentifiability. We generated five covariates $\bar{X}_{i}^{\top}=(X_{i,1},X_{i,2},X_{i,3},X_{i,4},X_{i,5})$ from the normal distributions

\begin{split}X_{i,1}&\sim\mathcal{N}\left(0,1\right)\\ X_{i,k+1}\mid X_{i,k}=x_{i,k}&\sim\mathcal{N}\left(\frac{-x_{i,k}}{3},1\right),\quad k=1,2,3,4.\end{split}

Here, $\{X_{i,1},X_{i,2},X_{i,3},X_{i,4}\}$ were regarded as measured covariates, while $X_{i,5}$ was regarded as an unmeasured confounder. The potential outcomes were generated as follows. The intercept $a_{1}^{[s]}$ differed between the two scenarios $(s=1,2)$, with $a_{1}^{[1]}=0.0775$ and $a_{1}^{[2]}=0.998$:

\mu^{(1)}(\bar{x}_{i})=a_{1}^{[s]}+0.4x_{i,1}+0.4x_{i,2}+0.6x_{i,1}x_{i,2}+0.5x_{i,3}-0.7x_{i,4}+0.2x_{i,5},

\mu^{(0)}(\bar{x}_{i})=0.0654+0.2x_{i,1}+0.1x_{i,2}+1.2x_{i,1}x_{i,2}+0.2x_{i,3}-0.3x_{i,4}+0.6x_{i,5},

Y^{(1)}_{i}\mid(\bar{X}_{i}=\bar{x}_{i})\sim\mathcal{N}\left(\mu^{(1)}(\bar{x}_{i}),\frac{1}{4}\right),

Y^{(0)}_{i}\mid(\bar{X}_{i}=\bar{x}_{i})\sim\mathcal{N}\left(\mu^{(0)}(\bar{x}_{i}),\frac{1}{4}\right).
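All covariates have mean zero, and only the interaction term has a nonzero expectation. The population ATE implied by these outcome models can therefore be checked by hand:

```latex
\begin{aligned}
&E(X_{i,k})=0\ (k=1,\dots,5),\qquad
E(X_{i,1}X_{i,2})=E\{X_{i,1}E(X_{i,2}\mid X_{i,1})\}=-\tfrac{1}{3}E(X_{i,1}^{2})=-\tfrac{1}{3},\\
&E\{\mu^{(1)}(\bar{X}_{i})\}=a_{1}^{[s]}+0.6\left(-\tfrac{1}{3}\right)=a_{1}^{[s]}-0.2,\qquad
E\{\mu^{(0)}(\bar{X}_{i})\}=0.0654+1.2\left(-\tfrac{1}{3}\right)=-0.3346,\\
&\mathrm{ATE}=E\{\mu^{(1)}(\bar{X}_{i})\}-E\{\mu^{(0)}(\bar{X}_{i})\}=a_{1}^{[s]}+0.1346=
\begin{cases}0.2121, & s=1,\\ 1.1326, & s=2.\end{cases}
\end{aligned}
```

This agrees with the true ATEs of 0.21 and 1.13 reported in Tables 1 and 2, which were obtained by averaging simulated potential outcomes.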

The treatment assignment $Z_{i}\in\{0,1\}$ was generated by the Bernoulli distribution with

P(Z_{i}=1\mid\bar{X}_{i}=\bar{x}_{i},Y^{(1)}_{i}=y^{(1)}_{i})=\frac{1}{1+\exp(-0.904+0.5x_{i,1}+0.5x_{i,2}+0.5x_{i,3}-0.2x_{i,4}-x_{i,5}+0.3y^{(1)}_{i})}.
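For readers who wish to reproduce the design, one dataset can be drawn as follows. This is a sketch, not the authors' code (the paper used SAS and R). It assumes that the noise terms of $Y^{(1)}$ and $Y^{(0)}$ are independent, which the text does not specify. It also returns the true propensity score, so that the proportions of scores outside $[\delta,1-\delta]$ discussed below can be recomputed. The function name `simulate` is hypothetical.

```python
import numpy as np


def simulate(n, a1, rng):
    """Draw one dataset from the simulation design of this section.

    a1 = 0.0775 gives Scenario 1 and a1 = 0.998 gives Scenario 2. The noise
    terms of Y(1) and Y(0) are drawn independently, which is an assumption:
    the text specifies only their conditional marginal distributions.
    """
    X = np.empty((n, 5))
    X[:, 0] = rng.normal(0.0, 1.0, size=n)
    for k in range(4):                          # X_{k+1} | X_k ~ N(-X_k / 3, 1)
        X[:, k + 1] = rng.normal(-X[:, k] / 3.0, 1.0)
    x1, x2, x3, x4, x5 = X.T
    mu1 = a1 + 0.4 * x1 + 0.4 * x2 + 0.6 * x1 * x2 + 0.5 * x3 - 0.7 * x4 + 0.2 * x5
    mu0 = 0.0654 + 0.2 * x1 + 0.1 * x2 + 1.2 * x1 * x2 + 0.2 * x3 - 0.3 * x4 + 0.6 * x5
    y1 = rng.normal(mu1, 0.5)                   # variance 1/4, i.e. sd 1/2
    y0 = rng.normal(mu0, 0.5)
    # Treatment model exactly as displayed: P(Z = 1 | X, Y(1)) = 1 / (1 + exp(eta))
    eta = -0.904 + 0.5 * x1 + 0.5 * x2 + 0.5 * x3 - 0.2 * x4 - x5 + 0.3 * y1
    e_true = 1.0 / (1.0 + np.exp(eta))
    z = rng.binomial(1, e_true)
    y = np.where(z == 1, y1, y0)                # observed outcome
    return {"X_obs": X[:, :4], "z": z, "y": y,  # X_5 is the unmeasured confounder
            "y1": y1, "y0": y0, "e_true": e_true}


rng = np.random.default_rng(2023)
data = simulate(1000, a1=0.0775, rng=rng)       # Scenario 1, n = 1,000
for delta in (0.1, 0.01, 0.001):
    e = data["e_true"]
    share = np.mean((e < delta) | (e > 1 - delta))   # violates delta <= e <= 1 - delta
    print(f"delta={delta}: {100 * share:.2f}% outside [delta, 1 - delta]")
```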

We simulated 1,000 observational studies with $n=1,000$ subjects for each scenario. For each simulated study, the bounds for the ATE were calculated by our proposed method and by the quantile balancing method.²⁰ For the quantile balancing method, linear quantile regression on $\{X_{i,1},X_{i,2},X_{i,3},X_{i,4}\}$ was used, with 5-fold cross-fitting in the estimation of the quantile function. For the constraint (14), the MAR-based propensity score $\hat{e}(X_{i})$ was estimated by the logistic regression model with $\{X_{i,1},X_{i,2},X_{i,3},X_{i,4}\}$. The quantile balancing method was implemented with the R package provided by Dorn and Guo.²⁰ The proposed method with (22) and (25) did not involve any estimation of the MAR-based propensity score. The OPTMODEL procedure of SAS (SAS Institute Inc, Cary, North Carolina) was used to solve the linear programming problems in the proposed method. We considered four specifications of $g(X)$:

D1

includes $1$ , and the linear and quadratic terms for all the observed covariates;
D2

includes D1 plus all two-variable interactions;
D3

includes D1 plus the cubic terms for all the observed covariates and all interactions;
D4

includes D3 plus the quartic and quintic terms for all the observed covariates, $X_{i,4}^{3}(X_{i,3}-X_{i,2})(X_{i,1}+3X_{i,2}/2)$ , and $X_{i,4}^{3}(X_{i,3}-X_{i,2})/(X_{i,1}-X_{i,3})(X_{i,1}+3X_{i,2}/2)$ .

In the proposed method, we estimated the bounds for $\mu_{1}$ under the constraints (23) and (24) and those for $\mu_{0}$ under the constraints (26) and (27). The validity of the proposed method therefore depends on the choice of $\delta$. To examine this point, we checked the distributions of the true propensity scores in the simulated datasets. In the datasets under Scenario 1, with $\delta=0.1$, $0.01$, and $0.001$, 18.39%, 0.29%, and 0.01% of the true propensity scores, respectively, did not satisfy the conditions (23) and (26). The corresponding proportions for the datasets under Scenario 2 were 2.47%, 0.01%, and 0.00%. Thus, with $\delta=0.1$ the constraints did not appear to hold, whereas $\delta=0.01$ or smaller was appropriate. Table 1 shows the averages of the lower and upper bounds and the coverage probability of the bounds for the ATE under the proposed method with several settings of $\delta$ and $g(X_{i})$. The coverage probability was defined as the proportion of datasets in which the lower and upper bounds covered the true ATE. The left panels of Figure 1 and Figure 2 show the boxplots of the lower and upper bounds for the ATE with $\delta=0.01$ in the two scenarios, respectively. We computed the averages of $Y^{(1)}$ and $Y^{(0)}$ over all simulated datasets and regarded their difference as the true ATE, which is depicted by a solid horizontal line in Figure 1 and Figure 2. In Scenario 1, as shown in Table 1 and the left panel of Figure 1, the proposed method showed excellent coverage probability when $\delta$ was set to 0.01 or smaller. In Scenario 2, as shown in Table 1 and the left panel of Figure 2, the coverage probability remained excellent even with $\delta=0.1$. In addition, the average bounds with D2, D3, and D4 in Scenario 2 excluded the null for every $\delta$ considered.

For the quantile balancing method, the right panels of Figure 1 and Figure 2 present the boxplots of the bounds for the ATE under different ORs in the two scenarios, respectively; the detailed estimates are summarized in Table 2. As the OR decreased, the quantile balancing method gave narrower bounds at the cost of coverage probability. The proposed method provided feasible solutions for all 1,000 simulated datasets with every setting of $\delta$ and $g(X_{i})$ except one (Scenario 2, D4, and $\delta=0.1$). In that setting, feasible solutions were obtained for 952 simulated datasets, among which the coverage probability was almost 1, and the bounds for the ATE were the narrowest. In this case, the proposed method gave worst-case bounds with an average length of 1.09. This is shorter than the length of 1.17 obtained with the quantile balancing method when the OR is assumed to be 2. In other words, increasing the dimension of $g(X_{i})$ alone brings the length of the bound from the proposed method down to the level of the quantile balancing method with the OR specified as 2. The results are consistent with our expectations that (1) the worst-case bound obtained by our proposal can cover the true ATE without any additional assumptions and (2) the bound can be narrowed by increasing the dimension of $g(X_{i})$.
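The summaries in Tables 1 and 2 follow the definitions in their footnotes: the average lower and upper bounds, the length as the difference of those averages, and the coverage as the proportion of datasets whose bounds contain the true ATE. The following small sketch uses a hypothetical function name. It excludes datasets without a feasible solution, as in the D4, $\delta=0.1$ setting above, where coverage was computed among the 952 feasible datasets.

```python
import numpy as np


def summarize_bounds(bounds, true_ate):
    """Summaries in the style of Tables 1 and 2.

    bounds: list of (lower, upper) pairs, one per simulated dataset, with None
    for datasets whose linear program was infeasible (excluded throughout).
    """
    ok = np.array([b for b in bounds if b is not None], dtype=float)
    avg_lower, avg_upper = ok.mean(axis=0)
    covered = (ok[:, 0] <= true_ate) & (true_ate <= ok[:, 1])
    return {
        "bound": (avg_lower, avg_upper),      # averages of lower and upper bounds
        "length": avg_upper - avg_lower,      # difference between the two averages
        "coverage": covered.mean(),           # share of bounds containing the true ATE
        "n_feasible": len(ok),
    }


print(summarize_bounds([(0.0, 1.0), (0.5, 2.0), None, (-1.0, 0.1)], true_ate=0.21))
```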

6 Application

In this section, we apply the proposed method to real-world data from the TONE study.³² This study aimed to evaluate the effectiveness of a designated exercise program in preventing dementia among the elderly. Scores in five cognitive domains (attention, memory, visuospatial function, language, and reasoning) were used to quantify the level of cognition. We estimated the effect of the exercise program on the attention domain score, which was treated as a continuous variable. The confounders included age, sex, education level ($1/0$: high/low), and the attention score at baseline.

In the primary analysis, a total of 935 participants were included, of whom 234 were in the exercise program group and 701 in the control group. We used the IPW estimator to adjust for covariate imbalances between the exercise program and control groups. The MAR-based propensity scores were estimated by logistic regression, with the unknown parameters estimated by maximum likelihood. Before weighting by the inverse of the estimated MAR-based propensity scores, substantial between-group imbalances were observed in age, baseline attention score, and education level. These imbalances were effectively eliminated after weighting, indicating that IPW substantially improved the balance between the two groups. The IPW point estimate of the ATE was 4.09 with a 95% confidence interval of $[2.97,5.22]$. The primary analysis therefore indicated a statistically significant positive effect of the exercise program on the attention level. This finding was consistent with some previous randomized controlled trials.^{33, 34} However, a meta-analysis of observational studies³⁵ reported a positive but nonsignificant result, suggesting that the robustness of the positive effect needs further confirmation. A sensitivity analysis was therefore warranted. In the sensitivity analysis, we applied our proposed method with $\delta=0.01$ and four settings of $g(X_{i})$:

D1

includes $1$ and the linear term of all the covariates;
D2

includes $1$ and the linear and quadratic terms of all the covariates;
D3

includes D2 plus the interaction between age and attention score at baseline;
D4

includes D2 plus all two-variable interactions.
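The four specifications above can be assembled as matrices whose rows are $g(X_{i})$. This is a minimal sketch with a hypothetical function name and a hypothetical column order (age, sex, education, baseline attention). It skips the squares of the two binary covariates: for a 0/1 variable $x^{2}=x$, so the squared column would only duplicate an equality constraint without changing the feasible set.

```python
import itertools

import numpy as np


def g_design(X, binary, setting):
    """Rows are g(X_i) for the settings D1-D4 of the TONE analysis.

    X: (n, p) covariates in a hypothetical column order
       (age, sex, education, baseline attention score).
    binary: boolean mask of 0/1 columns; x**2 == x for these, so their squares
       would only repeat an equality constraint and are skipped.
    """
    n, p = X.shape
    cols = [np.ones(n)]                                      # constant term 1
    cols += [X[:, j] for j in range(p)]                      # linear terms (D1)
    if setting in ("D2", "D3", "D4"):
        cols += [X[:, j] ** 2 for j in range(p) if not binary[j]]   # quadratic
    if setting == "D3":
        cols.append(X[:, 0] * X[:, 3])                       # age x baseline attention
    if setting == "D4":
        cols += [X[:, j] * X[:, k]                           # all two-variable interactions
                 for j, k in itertools.combinations(range(p), 2)]
    return np.column_stack(cols)


# Illustrative covariates only (not the TONE data).
rng = np.random.default_rng(1)
n = 20
X = np.column_stack([rng.normal(75, 5, n), rng.integers(0, 2, n),
                     rng.integers(0, 2, n), rng.normal(50, 10, n)])
binary = np.array([False, True, True, False])
for s in ("D1", "D2", "D3", "D4"):
    print(s, g_design(X, binary, s).shape)
```

Every setting contains the constant and the linear terms. Centering or rescaling the covariates before building the columns is therefore an invertible linear change of basis that leaves the feasible set unchanged, which can help numerically.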

For the quantile balancing method, the quantile function was estimated by linear quantile regression with 5-fold cross-fitting, and the MAR-based propensity scores were estimated by logistic regression. The results of the proposed method are given in Table 3. In addition to the plain bounds, we calculated confidence intervals for the lower and upper bounds with 1,000 bootstrap samples. BootLower is the lower limit of the 95% bootstrap confidence interval for the lower bound of the ATE, and BootUpper is the upper limit of that for the upper bound. Feasibility denotes the number of resampled datasets for which the linear programming in the proposed method has a feasible solution. First, we examined the worst-case bounds from the proposed method (22) without the OR-based constraints. The resulting bounds with the four settings of $g(X_{i})$ are presented in the rows with $/$ in the $\lambda$ column of Table 3. Even with the highest-dimensional $g(X_{i})$ (D4), the lower bound was below the null value 0. The proposed method therefore could not rule out concerns about unmeasured confounders. Since the bounds were wide and not necessarily informative, the OR-based constraints were added, and the bounds were calculated with (29) and (31). The results with $\lambda=2$, $3$, and $5$ are also shown in Table 3. Our proposal indicated that the worst-case bounds could exclude the null if the OR between the true propensity score and the estimated MAR-based propensity score was assumed to be at most 2. For reference, we also applied the quantile balancing method; the corresponding bounds from the quantile balancing method (10) under the OR-based constraints are presented in Table 4. The bounds were similar to ours. Of note, under the same OR-based constraint, our bounds were as tight as or tighter than those of the quantile balancing method.
We noticed that a smaller OR and a more complex $g(X_{i})$ led to fewer resampled datasets in which the linear programming had a feasible solution. Without the additional OR-based constraints, the bootstrap results remained stable and almost always feasible. However, when a small OR was assumed, the number of resampled datasets with feasible solutions decreased drastically. Thus, the estimating equation constraints are not necessarily compatible with the OR-based constraints, particularly when a small $\lambda$ is set.

By incorporating more covariates, we tried to make the bounds tighter. We further included the baseline scores of the other four cognitive domains (memory, visuospatial function, language, and reasoning) as confounders in the sensitivity analysis. The results of the proposed method without the OR-based constraint (22) are shown in Table 5. The worst-case bounds for the ATE in Table 5 are considerably tighter than those without the OR-based constraint in Table 3. In some settings (D3 and D4), the worst-case bounds even excluded the null without any additional OR-based constraints, suggesting that the primary analysis is robust. Nevertheless, only small numbers of bootstrap samples yielded feasible solutions (323 and 220 of 1,000 for D3 and D4, respectively), and the lower limits of the bootstrap confidence intervals remained negative. Thus, the exclusion of the null with D3 and D4 may be unstable, and it does not completely eliminate concerns about unmeasured confounders.

7 Discussion

Interest in drawing medical evidence from real-world data has been growing rapidly, and the number of papers reporting real-world data analyses with confounder adjustment has increased substantially. Propensity score analysis is now routinely applied to observational studies. However, almost all such papers report only the results of propensity score matching and/or IPW by the propensity score and do not address the important issue of unmeasured confounders. Residual confounding is always left as a limitation in the analysis of observational studies. It is therefore very important to develop sensitivity analysis methods that are easily applicable and rely on fewer assumptions. In this paper, we proposed a simple sensitivity analysis method based on the IPW method. To the best of our knowledge, all existing sensitivity analysis methods for the IPW estimator rely on some untestable assumptions about the departure from the SITA assumption. Although they provide very useful tools for addressing the potential impact of unmeasured confounders, concerns remain about potential violations of those assumptions. Our method requires only minimal assumptions and constructs bounds for the ATE without any quantification of the departure from the SITA assumption. The resulting bounds may be too wide to be informative. Even so, presenting bounds based on minimal assumptions is a useful starting point for addressing the potential impact of residual confounding. Our method can easily incorporate OR-based constraints on the departure from the SITA assumption to give tighter bounds. The resulting method with the additional constraints corresponds to the quantile balancing method of Dorn and Guo.²⁰ Our method is easily implemented as a linear program, avoids estimating quantile functions, and empirically gave slightly tighter bounds in our numerical examples.
We propose a strategy in which analysts begin with the bounds under minimal assumptions, without the OR-based constraints, and then incorporate additional OR-based constraints if needed. Compared with the elegant theory of the quantile balancing method by Dorn and Guo,²⁰ our method is based on a very simple idea. We believe that our method is practical and that this strategy is useful for addressing the issue of residual confounding.

By incorporating more constraints with $g(X)$ of higher dimension, one may obtain tighter bounds with the proposed method. This motivates collecting as many potential confounders as possible when conducting observational studies. On the other hand, we do not have clear guidance on how to define $g(X)$. Adding more constraints is desirable for tighter bounds. However, whether the linear program has a feasible solution depends on both the sample size and the choice of $g(X)$, and no bounds can be obtained when it does not. Establishing practical guidance is an important topic for future research.

Our idea is simple: we remove any parametric model for the propensity score but still rely on the estimating equation for the propensity score. This simplicity should make it easy to extend the idea to more complicated problems in causal inference and missing data analysis, which also warrants future research.

Acknowledgments

This research was partly supported by Grant-in-Aid for Challenging Exploratory Research (16K12403) and for Scientific Research (16H06299, 18H03208) from the Ministry of Education, Science, Sports and Technology of Japan.

Conflict of interest

The authors declare no conflict of interest.

References

1 Rubin DB. Bayesian inference for causal effects: The role of randomization. Ann Stat. 1978;6(1):34–58.
2 Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55.
3 Hirano K, Imbens GW. Estimation of causal effects using propensity score weighting: An application to data on right heart catheterization. Health Serv Outcomes Res Methodol. 2001;2:259–278.
4 Austin PC, Stuart EA. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Stat Med. 2015;34(28):3661–3679.
5 Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat Med. 2004;23(19):2937–2960.
6 Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15(5):615–625.
7 Petersen ML, van der Laan MJ. Causal models and learning from data: integrating causal modeling and statistical estimation. Epidemiology (Cambridge, Mass.). 2014;25(3):418.
8 Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable drop-out using semiparametric nonresponse models. J Am Stat Assoc. 1999;94(448):1096–1120.
9 Miao W, Ding P, Geng Z. Identifiability of normal and normal mixture models with nonignorable missing data. J Am Stat Assoc. 2016;111(516):1673–1683.
10 Rosenbaum PR, Rubin DB. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J R Stat Soc Series B Stat Methodol. 1983;45(2):212–218.
11 Lin DY, Psaty BM, Kronmal RA. Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics. 1998;54(3):948–963.
12 Cornfield J, Haenszel W, Hammond EC, Lilienfeld AM, Shimkin MB, Wynder EL. Smoking and lung cancer: recent evidence and a discussion of some questions. J Natl Cancer Inst. 1959;22(1):173–203.
13 Ding P, VanderWeele TJ. Sensitivity analysis without assumptions. Epidemiology (Cambridge, Mass.). 2016;27(3):368–377.
14 VanderWeele TJ, Ding P. Sensitivity analysis in observational research: introducing the E-value. Ann Intern Med. 2017;167(4):268–274.
15 Li L, Shen C, Wu AC, Li X. Propensity score-based sensitivity analysis method for uncontrolled confounding. Am J Epidemiol. 2011;174(3):345–353.
16 Shen C, Li X, Li L, Were MC. Sensitivity analysis for causal inference using inverse probability weighting. Biom J. 2011;53(5):822–837.
17 Lu S, Ding P. Flexible sensitivity analysis for causal inference in observational studies subject to unmeasured confounding. arXiv. Preprint posted online Sun, 28 May 2023. doi: https://doi.org/10.48550/arXiv.2305.1764
18 Zhao Q, Small DS, Bhattacharya BB. Sensitivity analysis for inverse probability weighting estimators via the percentile bootstrap. J R Stat Soc Series B Stat Methodol. 2019;81(4):735–761.
19 Tan Z. A distributional approach for causal inference using propensity scores. J Am Stat Assoc. 2006;101(476):1619–1637.
20 Dorn J, Guo K. Sharp Sensitivity Analysis for Inverse Propensity Weighting via Quantile Balancing. J Am Stat Assoc. Published online: 31 May 2022. doi: 10.1080/01621459.2022.2069572
21 Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66(5):688–701.
22 Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc. 1994;89(427):846–866.
23 Greenlees JS, Reece WS, Zieschang KD. Imputation of missing values when the probability of response depends on the variable being imputed. J Am Stat Assoc. 1982;77(378):251–261.
24 Rotnitzky A, Scharfstein D, Su TL, Robins J. Methods for conducting sensitivity analysis of trials with potentially nonignorable competing causes of censoring. Biometrics. 2001;57(1):103–113.
25 Verbeke G, Molenberghs G, Thijs H, Lesaffre E, Kenward MG. Sensitivity analysis for nonrandom dropout: a local influence approach. Biometrics. 2001;57(1):7–14.
26 Robins JM. Association, causation, and marginal structural models. Synthese. 1999;121(1/2):151–179.
27 Robins JM, Rotnitzky A, Scharfstein DO. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In: Halloran ME, Berry D, eds. Statistical models in epidemiology, the environment, and clinical trials. New York, NY: Springer; 2000:1–94.
28 Brumback BA, Hernán MA, Haneuse SJ, Robins JM. Sensitivity analyses for unmeasured confounding assuming a marginal structural model for repeated measures. Stat Med. 2004;23(5):749–767.
29 Tan Z. Bounded, efficient and doubly robust estimation with inverse weighting. Biometrika. 2010;97(3):661–682.
30 Robins J, Sued M, Lei-Gomez Q, Rotnitzky A. Comment: Performance of double-robust estimators when "inverse probability" weights are highly variable. Stat Sci. 2007;22(4):544–559.
31 Morikawa K, Kim JK. Semiparametric optimal estimation with nonignorable nonresponse data. Ann Stat. 2021;49(5):2991–3014.
32 Sasaki M, Kodama C, Hidaka S, et al. Prevalence of four subtypes of mild cognitive impairment and APOE in a Japanese community. Int J Geriatr Psychiatry. 2009;24(10):1119–1126.
33 Sanders L, Hortobágyi T, Karssemeijer E, van der Zee E, Scherder E, van Heuvelen M. Effects of low- and high-intensity physical exercise on physical and cognitive function in older persons with dementia: a randomized controlled trial. Alzheimers Res Ther. 2020;12(1):28.
34 Lamb SE, Sheehan B, Atherton N, et al. Dementia And Physical Activity (DAPA) trial of moderate to high intensity exercise training for people with dementia: randomised controlled trial. BMJ. 2018;361:k1675. doi: 10.1136/bmj.k1675
35 Demurtas J, Schoene D, Torbahn G, et al. Physical activity and exercise in mild cognitive impairment and dementia: an umbrella review of intervention and observational studies. J Am Med Dir Assoc. 2020;21(10):1415–1422.

Table 1: Summary of the proposed method with several settings of $\delta$ and $g(X)$ over 1,000 simulated datasets in two scenarios.

		Scenario 1 (True ATE: 0.21)			Scenario 2 (True ATE: 1.13)
$g(X)$	$\delta$	Bound ${}^{*}$	Length ${}^{**}$	Coverage ${}^{***}$	Bound	Length	Coverage
D1	0.1 ${}^{a}$	[-0.44,1.41]	1.85	1.00	[0.32,2.41]	2.09	1.00
	0.01 ${}^{b}$	[-1.36,2.12]	3.48	1.00	[-0.60,3.09]	3.68	1.00
	0.001 ${}^{c}$	[-1.43,2.18]	3.61	1.00	[-0.93,3.18]	4.11	1.00
D2	0.1	[0.10,1.08]	0.97	0.93	[0.90,2.07]	1.17	1.00
	0.01	[-0.38,1.56]	1.94	1.00	[0.32,2.53]	2.21	1.00
	0.001	[-0.41,1.59]	2.00	1.00	[0.05,2.57]	2.53	1.00
D3	0.1	[0.14,1.04]	0.91	0.69	[0.91,2.05]	1.14	1.00
	0.01	[-0.33,1.50]	1.83	1.00	[0.34,2.47]	2.13	1.00
	0.001	[-0.35,1.53]	1.88	1.00	[0.09,2.50]	2.41	1.00
D4	0.1	[0.21,0.97]	0.75	0.45	[0.93,2.02]	1.09	0.99
	0.01	[-0.19,1.37]	1.56	0.95	[0.36,2.43]	2.07	1.00
	0.001	[-0.21,1.38]	1.59	0.97	[0.13,2.45]	2.32	1.00

*

the averages of the lower and upper bounds;
**

the difference between the averages of the lower and upper bounds;
***

the proportion of inclusion of the true ATE between the lower and upper bounds;
a

18.39% and 2.47% of the true propensity scores did not satisfy the constraints (23) or (26), respectively, in scenarios 1 and 2;
b

0.29% and 0.01% of the true propensity scores did not satisfy the constraints (23) or (26), respectively, in scenarios 1 and 2;
c

0.01% and 0.00% of the true propensity scores did not satisfy the constraints (23) or (26), respectively, in scenarios 1 and 2.

Table 2: Summary of the quantile balancing method with several settings of OR over 1,000 simulated datasets in two scenarios.

	Scenario 1 (True ATE: 0.21)			Scenario 2 (True ATE: 1.13)
$\lambda$	Bound ${}^{*}$	Length ${}^{**}$	Coverage ${}^{***}$	Bound	Length	Coverage
1	[0.01,0.01]	/	/	[1.31,1.31]	/	/
1.2	[-0.13,0.14]	0.27	0.48	[1.16,1.47]	0.31	0.36
1.5	[-0.29,0.31]	0.61	0.89	[0.98,1.66]	0.68	0.96
2	[-0.51,0.54]	1.04	1.00	[0.74,1.90]	1.17	1.00
3	[-0.81,0.87]	1.68	1.00	[0.40,2.26]	1.85	1.00
5	[-1.22,1.35]	2.57	1.00	[-0.03,2.74]	2.77	1.00

*

the averages of the lower and upper bounds;
**

the difference between the averages of the lower and upper bounds;
***

the proportion of inclusion of the true ATE between the lower and upper bounds.

Table 3: Bounds of the ATE for the attention score in TONE study by the proposed method.

$g(X)$	$\lambda$	Lower bound	Upper bound	Length	BootLower ${}^{*}$	BootUpper ${}^{*}$	Feasibility ${}^{**}$
D1	/ ${}^{***}$	-10.57	18.30	28.87	-11.50	19.27	1000
	2	0.57	6.61	6.04	-0.36	7.59	987
	3	-1.18	8.59	9.77	-2.39	9.48	1000
	5	-3.36	10.77	14.14	-4.69	11.72	1000
D2	/	-9.81	17.75	27.56	-10.83	18.39	1000
	2	1.66	5.50	3.84	0.05	6.93	386
	3	-0.92	8.18	9.10	-1.82	8.92	892
	5	-3.19	10.38	13.57	-4.12	11.12	997
D3	/	-9.70	17.23	26.93	-10.46	17.49	1000
	2	/	/	/	0.42	6.73	66
	3	-0.40	7.45	7.85	-1.41	8.43	593
	5	-3.08	9.91	13.00	-3.67	10.59	957
D4	/	-8.74	16.54	25.28	-8.26	16.17	991
	2	/	/	/	0.85	6.39	55
	3	-0.25	7.16	7.41	-0.86	7.85	540
	5	-2.66	9.48	12.15	-2.92	9.93	920

*

the lower and upper bounds of 95% bootstrap confidence interval for the lower and upper bounds of ATE, respectively;
**

the number of the resampled datasets in the proposed method, in which feasible solution of the linear programming can be obtained;
***

the rows with $/$ in the $\lambda$ column show the results without OR-based constraint.

Table 4: Bounds of the ATE for the attention score in TONE study by the quantile balancing method.

$\lambda$	Lower bound	Upper bound	Length	BootLower ${}^{*}$	BootUpper ${}^{*}$
1	4.09	4.09	/	3.04	5.30
1.2	3.29	4.92	1.63	2.21	6.14
1.5	2.32	5.97	3.64	1.20	7.21
2	1.13	7.29	6.17	-0.01	8.59
3	-0.61	9.15	9.77	-1.85	10.58
5	-2.80	11.56	14.36	-4.25	13.27

*

the lower and upper bounds of 95% bootstrap confidence interval for the lower and upper bounds of ATE, respectively.

Table 5: Bounds of the ATE for the attention score in TONE study by the proposed method with $g(X)$ including the baseline scores of the four additional cognitive domains.

$g(X)$	Lower bound	Upper bound	Length	BootLower ${}^{*}$	BootUpper ${}^{*}$	Feasibility ${}^{**}$
D1	-6.75	14.93	21.68	-8.84	15.84	1000
D2	-2.08	11.14	13.22	-4.29	12.44	657
D3	0.31	8.32	8.01	-3.14	11.33	323
D4	0.70	7.82	7.12	-2.45	10.51	220

*

the lower and upper bounds of 95% bootstrap confidence interval for the lower and upper bounds of ATE, respectively;
**

the number of the resampled datasets in the proposed method, in which feasible solution of the linear programming can be obtained.

(a) The proposed method ($\delta=0.01$)