We study the regression f_β(Y|X,Z), where Y is the response, Z ∈ R^d is a vector of fully observed regressors and X is a regressor that is incompletely observed. To handle missing data, maximum likelihood estimation via expectation-maximisation (EM) is the most efficient approach but is sensitive to the specification of the distribution of X. Under a missing at random assumption, we propose an EM-type estimation via a semiparametric pseudoscore. As in EM, we derive the conditional expectation of the score function given Y and Z, or the mean score, over the incompletely observed units under a postulated distribution of X. Instead of using the mean score directly in the estimating equation, we use it as a working index to construct the semiparametric pseudoscore via nonparametric regression. Introducing the semiparametric pseudoscore into the EM framework reduces sensitivity to the specified distribution of X. It also avoids the curse of dimensionality when Z is multidimensional. The resulting regression estimator is more than doubly robust: it is consistent if either the pattern of missingness in X is correctly specified or the working index is appropriately, but not necessarily correctly, specified. It attains optimal efficiency when both conditions are satisfied. Numerical performance is explored through Monte Carlo simulations and a study on treating hepatitis C patients with HIV coinfection. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
Keywords: curse of dimensionality; expectation-maximisation; missing at random; nonparametric regression; pseudoscore.
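As a hedged illustration of the mean score described in the abstract, write S_β(Y,X,Z) for the score of f_β(Y|X,Z) with respect to β, and let g(x|Z) denote a postulated conditional density of X given Z. The symbols S_β, \bar{S}_β and g are notation assumed here for illustration, not taken from the article. Under that postulated density, Bayes' rule gives the conditional expectation of the score given (Y, Z) as:

```latex
\[
  S_\beta(Y,X,Z) = \frac{\partial}{\partial\beta}\,\log f_\beta(Y \mid X, Z),
\]
\[
  \bar{S}_\beta(Y,Z)
  = E_g\!\left[\, S_\beta(Y,X,Z) \mid Y, Z \,\right]
  = \frac{\int S_\beta(Y,x,Z)\, f_\beta(Y \mid x, Z)\, g(x \mid Z)\, dx}
         {\int f_\beta(Y \mid x, Z)\, g(x \mid Z)\, dx}.
\]
```

If X is discrete, the integrals become sums over its support. In the approach summarised above, this quantity serves only as a working index for a nonparametric regression, rather than entering the estimating equation directly.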