Non-parametric inference about mean functionals of non-ignorable non-response data without identifying the joint distribution

J R Stat Soc Series B Stat Methodol. 2023 May 8;85(3):913-935. doi: 10.1093/jrsssb/qkad047. eCollection 2023 Jul.

Abstract

We consider identification and inference about mean functionals of observed covariates and an outcome variable subject to non-ignorable missingness. By leveraging a shadow variable, we establish a necessary and sufficient condition for identification of the mean functional even if the full data distribution is not identified. We further characterize a necessary condition for n-estimability of the mean functional. This condition naturally strengthens the identifying condition, and it requires the existence of a function as a solution to a representer equation that connects the shadow variable to the mean functional. Solutions to the representer equation may not be unique, which presents substantial challenges for non-parametric estimation, and standard theories for non-parametric sieve estimators are not applicable here. We construct a consistent estimator of the solution set and then adapt the theory of extremum estimators to find from the estimated set a consistent estimator of an appropriately chosen solution. The estimator is asymptotically normal, locally efficient and attains the semi-parametric efficiency bound under certain regularity conditions. We illustrate the proposed approach via simulations and a real data application on home pricing.

Keywords: identification; model-free estimation; non-ignorable missingness; shadow variable.