Modeling Pregnancy Outcomes through Sequentially Nested Regression Models

J Am Stat Assoc. 2022;117(538):602-616. doi: 10.1080/01621459.2021.2006666. Epub 2022 Jan 5.

Abstract

Polycystic ovary syndrome (PCOS) is one of the most common causes of infertility among women of reproductive age. Unfortunately, the etiology of PCOS is poorly understood. The large-scale Pregnancy in Polycystic Ovary Syndrome (PPCOS) clinical trials were conducted to evaluate the effectiveness of treatments. Ovulation, pregnancy, and live birth are three sequentially nested binary outcomes that are typically analyzed separately. However, separate models may lose power to detect treatment effects and influential variables for live birth because of decreasing sample sizes and unbalanced event counts at later stages. Clinicians have long hypothesized that some variables important for early pregnancy outcomes continue to influence live birth. To account for this possibility, we develop an ℓ0-norm based regularization method that favors variables identified at an earlier stage. Our approach explicitly links the nested outcomes through computationally simple algorithms and enjoys theoretical guarantees for estimation and variable selection. By analyzing the PPCOS data, we uncover hidden influences of risk factors on live birth, which confirms clinical experience. Moreover, we provide novel infertility treatment recommendations (e.g., letrozole vs clomiphene citrate) for women with PCOS to improve their chances of live birth.

Keywords: Infertility study; Sequentially nested binary outcome; Variable selection; ℓ0 penalization.
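
The nesting described in the abstract (pregnancy is only possible after ovulation, live birth only after pregnancy) can be illustrated with a minimal Python sketch. It is not the authors' estimator: the function names `nested_stage_masks` and `weighted_l0_penalty`, the `discount` parameter, and the toy data are all hypothetical, and the discounted ℓ0 count is only one assumed way to "favor" variables selected at an earlier stage.

```python
import numpy as np

def nested_stage_masks(ovulation, pregnancy, live_birth):
    """Return boolean masks of subjects at risk at each of the three stages."""
    ovulation = np.asarray(ovulation, dtype=bool)
    pregnancy = np.asarray(pregnancy, dtype=bool)
    live_birth = np.asarray(live_birth, dtype=bool)
    # Sequential nesting: a later event cannot occur without the earlier one.
    if np.any(pregnancy & ~ovulation) or np.any(live_birth & ~pregnancy):
        raise ValueError("outcomes are not sequentially nested")
    everyone = np.ones(ovulation.shape, dtype=bool)
    # At risk for ovulation: everyone; for pregnancy: those who ovulated;
    # for live birth: those who became pregnant.
    return everyone, ovulation, pregnancy

def weighted_l0_penalty(beta, previously_selected, lam=1.0, discount=0.5):
    """Assumed illustration: count nonzero coefficients, charging less
    for variables already selected at an earlier stage."""
    beta = np.asarray(beta, dtype=float)
    previously_selected = np.asarray(previously_selected, dtype=bool)
    weights = np.where(previously_selected, discount, 1.0)
    return lam * float(np.sum(weights * (beta != 0)))

# Synthetic toy data (not from the PPCOS trials).
ovulation = [1, 1, 1, 0, 1, 0]
pregnancy = [1, 1, 0, 0, 0, 0]
live_birth = [1, 0, 0, 0, 0, 0]
masks = nested_stage_masks(ovulation, pregnancy, live_birth)
stage_sizes = [int(m.sum()) for m in masks]
penalty = weighted_l0_penalty([0.5, 0.0, -1.2, 0.3], [True, False, False, True])
```

In the toy data the at-risk sample shrinks at each stage, which is the loss of power the abstract attributes to fitting separate models for later outcomes.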