-
Computationally efficient likelihood inference in exponential families when the maximum likelihood estimator does not exist
Authors:
Daniel J. Eck,
Charles J. Geyer
Abstract:
In a regular full exponential family, the maximum likelihood estimator (MLE) need not exist in the traditional sense. However, the MLE may exist in the completion of the exponential family. Existing algorithms for finding the MLE in the completion solve many linear programs; they are slow in small problems and too slow for large problems. We provide new, fast, and scalable methodology for finding the MLE in the completion of the exponential family. This methodology is based on conventional maximum likelihood computations which come close, in a sense, to finding the MLE in the completion of the exponential family. These conventional computations construct a likelihood maximizing sequence of canonical parameter values that goes uphill on the likelihood function until it meets a convergence criterion. Nonexistence of the MLE in this context results from a degeneracy of the canonical statistic of the exponential family: the canonical statistic is on the boundary of its support. There is a correspondence between this boundary and the null eigenvectors of the Fisher information matrix. Convergence of Fisher information along a likelihood maximizing sequence follows from cumulant generating function (CGF) convergence along a likelihood maximizing sequence, conditions for which are given. This allows for the construction of necessarily one-sided confidence intervals for mean value parameters when the MLE exists in the completion. We demonstrate our methodology on three examples in the main text and three additional examples in the Appendix. We show that when the MLE exists in the completion of the exponential family, our methodology provides statistical inference that is much faster than existing techniques.
Submitted 25 November, 2020; v1 submitted 29 March, 2018;
originally announced March 2018.
-
Automatic Response Category Combination in Multinomial Logistic Regression
Authors:
Bradley S. Price,
Charles J. Geyer,
Adam J. Rothman
Abstract:
We propose a penalized likelihood method that simultaneously fits the multinomial logistic regression model and combines subsets of the response categories. The penalty is nondifferentiable when pairs of columns in the optimization variable are equal. This encourages pairwise equality of these columns in the estimator, which corresponds to response category combination. We use an alternating direction method of multipliers algorithm to compute the estimator, and we discuss the algorithm's convergence. Prediction and model selection are also addressed.
Submitted 9 May, 2017;
originally announced May 2017.
-
Combining Envelope Methodology and Aster Models for Variance Reduction in Life History Analyses
Authors:
Daniel J. Eck,
Charles J. Geyer,
R. Dennis Cook
Abstract:
Precise estimation of expected Darwinian fitness, the expected lifetime number of offspring of an organism, is a central component of life history analysis. The aster model serves as a defensible statistical model for distributions of Darwinian fitness. The aster model is equipped to incorporate the major life stages an organism travels through, which separately may affect Darwinian fitness. Envelope methodology reduces asymptotic variability by establishing a link between unknown parameters of interest and the asymptotic covariance matrices of their estimators. It is known both theoretically and in applications that incorporation of envelope methodology reduces asymptotic variability. We develop an envelope framework, including a new envelope estimator, that is appropriate for aster analyses. The level of precision provided by our methods allows researchers to draw stronger conclusions about the driving forces of Darwinian fitness from their life history analyses than they could with the aster model alone. Our methods are illustrated on a simulated dataset, and a life history analysis of Mimulus guttatus flowers is provided. Useful variance reduction is obtained in both analyses.
Submitted 27 February, 2018; v1 submitted 26 January, 2017;
originally announced January 2017.
-
Local adaptation and genetic effects on fitness: Calculations for exponential family models with random effects
Authors:
Charles J. Geyer,
Caroline E. Ridley,
Robert G. Latta,
Julie R. Etterson,
Ruth G. Shaw
Abstract:
Random effects are implemented for aster models using two approximations taken from Breslow and Clayton [J. Amer. Statist. Assoc. 88 (1993) 9-25]. Random effects are analytically integrated out of the Laplace approximation to the complete data log likelihood, giving a closed-form expression for an approximate missing data log likelihood. Third and higher derivatives of the complete data log likelihood with respect to the random effects are ignored, giving a closed-form expression for second derivatives of the approximate missing data log likelihood, hence approximate observed Fisher information. This method is applicable to any exponential family random effects model. It is implemented in the CRAN package aster (R Core Team [R: A Language and Environment for Statistical Computing (2012) R Foundation for Statistical Computing], Geyer [R package aster (2012) http://cran.r-project.org/package=aster]). Applications are analyses of local adaptation in the invasive California wild radish (Raphanus sativus) and the slender wild oat (Avena barbata) and of additive genetic variance for fitness in the partridge pea (Chamaecrista fasciculata).
Submitted 29 November, 2013;
originally announced November 2013.
-
Ridge Fusion in Statistical Learning
Authors:
Bradley S. Price,
Charles J. Geyer,
Adam J. Rothman
Abstract:
We propose a penalized likelihood method to jointly estimate multiple precision matrices for use in quadratic discriminant analysis and model-based clustering. A ridge penalty and a ridge fusion penalty are used to introduce shrinkage and promote similarity between precision matrix estimates. Block-wise coordinate descent is used for optimization, and validation likelihood is used for tuning parameter selection. Our method is applied in quadratic discriminant analysis and semi-supervised model-based clustering.
Submitted 5 May, 2014; v1 submitted 14 October, 2013;
originally announced October 2013.
-
Variable transformation to obtain geometric ergodicity in the random-walk Metropolis algorithm
Authors:
Leif T. Johnson,
Charles J. Geyer
Abstract:
A random-walk Metropolis sampler is geometrically ergodic if its equilibrium density is super-exponentially light and satisfies a curvature condition [Stochastic Process. Appl. 85 (2000) 341-361]. Many applications, including Bayesian analysis with conjugate priors of logistic and Poisson regression and of log-linear models for categorical data, result in posterior distributions that are not super-exponentially light. We show how to apply the change-of-variable formula for diffeomorphisms to obtain new densities that do satisfy the conditions for geometric ergodicity. Sampling the new variable and mapping the results back to the old gives a geometrically ergodic sampler for the original variable. This method of obtaining geometric ergodicity has very wide applicability.
Submitted 11 December, 2013; v1 submitted 27 February, 2013;
originally announced February 2013.
-
Asymptotics of Maximum Likelihood without the LLN or CLT or Sample Size Going to Infinity
Authors:
Charles J. Geyer
Abstract:
If the log likelihood is approximately quadratic with constant Hessian, then the maximum likelihood estimator (MLE) is approximately normally distributed. No other assumptions are required. We do not need independent and identically distributed data. We do not need the law of large numbers (LLN) or the central limit theorem (CLT). We do not need sample size going to infinity or anything going to infinity. Presented here is a combination of Le Cam style theory involving local asymptotic normality (LAN) and local asymptotic mixed normality (LAMN) and Cramér style theory involving derivatives and Fisher information. The main tool is convergence in law of the log likelihood function and its derivatives considered as random elements of a Polish space of continuous functions with the metric of uniform convergence on compact sets. We obtain results for both one-step-Newton estimators and Newton-iterated-to-convergence estimators.
Submitted 4 July, 2012; v1 submitted 20 June, 2012;
originally announced June 2012.
-
Likelihood Inference in Exponential Families and Directions of Recession
Authors:
Charles J. Geyer
Abstract:
When in a full exponential family the maximum likelihood estimate (MLE) does not exist, the MLE may exist in the Barndorff-Nielsen completion of the family. We propose a practical algorithm for finding the MLE in the completion based on repeated linear programming using the R contributed package rcdd and illustrate it with two generalized linear model examples. When the MLE for the null hypothesis lies in the completion, likelihood ratio tests of model comparison are almost unchanged from the usual case. Only the degrees of freedom need to be adjusted. When the MLE lies in the completion, confidence intervals are changed much more from the usual case. The MLE of the natural parameter can be thought of as having gone to infinity in a certain direction, which we call a generic direction of recession. We propose a new one-sided confidence interval which says how close to infinity the natural parameter may be. This maps to one-sided confidence intervals for mean values showing how close to the boundary of their support they may be.
Submitted 5 January, 2009;
originally announced January 2009.
-
Monte Carlo likelihood inference for missing data models
Authors:
Yun Ju Sung,
Charles J. Geyer
Abstract:
We describe a Monte Carlo method to approximate the maximum likelihood estimate (MLE) when there are missing data and the observed data likelihood is not available in closed form. This method uses simulated missing data that are independent and identically distributed and independent of the observed data. Our Monte Carlo approximation to the MLE is a consistent and asymptotically normal estimate of the minimizer $θ^*$ of the Kullback–Leibler information, as both Monte Carlo and observed data sample sizes go to infinity simultaneously. Plug-in estimates of the asymptotic variance are provided for constructing confidence regions for $θ^*$. We give Logit–Normal generalized linear mixed model examples, calculated using an R package.
Submitted 16 August, 2007;
originally announced August 2007.