-
Addressing Confounding and Continuous Exposure Measurement Error Using Corrected Score Functions
Authors:
Brian D. Richardson,
Bryan S. Blette,
Peter B. Gilbert,
Michael G. Hudgens
Abstract:
Confounding and exposure measurement error can introduce bias when drawing inference about the marginal effect of an exposure on an outcome of interest. While there are broad methodologies for addressing each source of bias individually, confounding and exposure measurement error frequently co-occur and there is a need for methods that address them simultaneously. In this paper, corrected score methods are derived under classical additive measurement error to draw inference about marginal exposure effects using only measured variables. Three estimators are proposed based on g-formula, inverse probability weighting, and doubly-robust estimation techniques. The estimators are shown to be consistent and asymptotically normal, and the doubly-robust estimator is shown to exhibit its namesake property. The methods, which are implemented in the R package mismex, perform well in finite samples under both confounding and measurement error as demonstrated by simulation studies. The proposed doubly-robust estimator is applied to study the effects of two biomarkers on HIV-1 infection using data from the HVTN 505 preventative vaccine trial.
Submitted 12 July, 2024;
originally announced July 2024.
-
Assessing COVID-19 Vaccine Effectiveness in Observational Studies via Nested Trial Emulation
Authors:
Justin B. DeMonte,
Bonnie E. Shook-Sa,
Michael G. Hudgens
Abstract:
Observational data are often used to estimate real-world effectiveness and durability of coronavirus disease 2019 (COVID-19) vaccines. A sequence of nested trials can be emulated to draw inference from such data while minimizing selection bias, immortal time bias, and confounding. Typically, when nested trial emulation (NTE) is employed, effect estimates are pooled across trials to increase statistical efficiency. However, such pooled estimates may lack a clear interpretation when the treatment effect is heterogeneous across trials. In the context of COVID-19, vaccine effectiveness quite plausibly will vary over calendar time due to newly emerging variants of the virus. This manuscript considers an NTE inverse probability weighted estimator of vaccine effectiveness that may vary over calendar time, time since vaccination, or both. Statistical testing of the trial effect homogeneity assumption is considered. Simulation studies are presented examining the finite-sample performance of these methods under a variety of scenarios. The methods are used to estimate vaccine effectiveness against COVID-19 outcomes using observational data on over 120,000 residents of Abruzzo, Italy during 2021.
Submitted 26 March, 2024;
originally announced March 2024.
-
Finite sample performance of optimal treatment rule estimators with right-censored outcomes
Authors:
Michael Jetsupphasuk,
Michael G. Hudgens,
Jessie K. Edwards,
Stephen R. Cole
Abstract:
Patient care may be improved by recommending treatments based on patient characteristics when there is treatment effect heterogeneity. Recently, there has been a great deal of attention focused on the estimation of optimal treatment rules that maximize expected outcomes. However, there has been comparatively less attention given to settings where the outcome is right-censored, especially with regard to the practical use of estimators. In this study, simulations were undertaken to assess the finite-sample performance of estimators for optimal treatment rules and estimators for the expected outcome under treatment rules. The simulations were motivated by the common setting in biomedical and public health research where the data are observational, survival times may be right-censored, and there is interest in estimating baseline treatment decisions to maximize survival probability. A variety of outcome regression and direct search estimation methods were compared for optimal treatment rule estimation across a range of simulation scenarios. Methods that flexibly model the outcome performed comparatively well, including in settings where the treatment rule was non-linear. R code to reproduce this study's results is available on GitHub.
Submitted 25 January, 2024; v1 submitted 5 January, 2024;
originally announced January 2024.
-
Fusing Trial Data for Treatment Comparisons: Single versus Multi-Span Bridging
Authors:
Bonnie E. Shook-Sa,
Paul N. Zivich,
Samuel P. Rosin,
Jessie K. Edwards,
Adaora A. Adimora,
Michael G. Hudgens,
Stephen R. Cole
Abstract:
While randomized controlled trials (RCTs) are critical for establishing the efficacy of new therapies, there are limitations regarding what comparisons can be made directly from trial data. RCTs are limited to a small number of comparator arms and often compare a new therapeutic to a standard of care which has already proven efficacious. It is sometimes of interest to estimate the efficacy of the new therapy relative to a treatment that was not evaluated in the same trial, such as a placebo or an alternative therapy that was evaluated in a different trial. Such multi-study comparisons are challenging because of potential differences between trial populations that can affect the outcome. In this paper, two bridging estimators are considered that allow for comparisons of treatments evaluated in different trials using data fusion methods to account for measured differences in trial populations. A "multi-span" estimator leverages a shared arm between two trials, while a "single-span" estimator does not require a shared arm. A diagnostic statistic that compares the outcome in the standardized shared arms is provided. The two estimators are compared in simulations, where both estimators demonstrate minimal empirical bias and nominal confidence interval coverage when the identification assumptions are met. The estimators are applied to data from the AIDS Clinical Trials Group 320 and 388 to compare the efficacy of two-drug versus four-drug antiretroviral therapy on CD4 cell counts among persons with advanced HIV. The single-span approach requires fewer identification assumptions and was more efficient in simulations and the application.
Submitted 1 May, 2023;
originally announced May 2023.
-
Quantifying the HIV reservoir with dilution assays and deep viral sequencing
Authors:
Sarah C. Lotspeich,
Brian D. Richardson,
Pedro L. Baldoni,
Kimberly P. Enders,
Michael G. Hudgens
Abstract:
People living with HIV on antiretroviral therapy often have undetectable virus levels by standard assays, but "latent" HIV still persists in viral reservoirs. Eliminating these reservoirs is the goal of HIV cure research. The quantitative viral outgrowth assay (QVOA) is commonly used to estimate the reservoir size, i.e., the infectious units per million (IUPM) of HIV-persistent resting CD4+ T cells. A new variation of the QVOA, the Ultra Deep Sequencing Assay of the outgrowth virus (UDSA), was recently developed that further quantifies the number of viral lineages within a subset of infected wells. Performing the UDSA on a subset of wells provides additional information that can improve IUPM estimation. This paper considers statistical inference about the IUPM from combined dilution assay (QVOA) and deep viral sequencing (UDSA) data, even when some deep sequencing data are missing. Methods are proposed to accommodate assays with wells sequenced at multiple dilution levels and with imperfect sensitivity and specificity, and a novel bias-corrected estimator is included for small samples. The proposed methods are evaluated in a simulation study, applied to data from the University of North Carolina HIV Cure Center, and implemented in the open-source R package SLDeepAssay.
Submitted 26 September, 2023; v1 submitted 1 February, 2023;
originally announced February 2023.
-
Efficient Nonparametric Estimation of Stochastic Policy Effects with Clustered Interference
Authors:
Chanhwa Lee,
Donglin Zeng,
Michael G. Hudgens
Abstract:
Interference occurs when a unit's treatment (or exposure) affects another unit's outcome. In some settings, units may be grouped into clusters such that it is reasonable to assume that interference, if present, only occurs between individuals in the same cluster, i.e., there is clustered interference. Various causal estimands have been proposed to quantify treatment effects under clustered interference from observational data, but these estimands either entail treatment policies lacking real-world relevance or are based on parametric propensity score models. Here, we propose new causal estimands based on modification of the propensity score distribution which may be more relevant in many contexts and are not based on parametric models. Nonparametric sample splitting estimators of the new estimands are constructed, which allow for flexible data-adaptive estimation of nuisance functions and are consistent, asymptotically normal, and efficient, converging at the usual parametric rate. Simulations show the finite sample performance of the proposed estimators. The proposed methods are applied to evaluate the effect of water, sanitation, and hygiene facilities on diarrhea among children in Senegal.
Submitted 23 August, 2023; v1 submitted 21 December, 2022;
originally announced December 2022.
-
Bridged treatment comparisons: an illustrative application in HIV treatment
Authors:
Paul N Zivich,
Stephen R Cole,
Jessie K Edwards,
Bonnie E Shook-Sa,
Alexander Breskin,
Michael G Hudgens
Abstract:
Comparisons of treatments, interventions, or exposures are of central interest in epidemiology, but direct comparisons are not always possible due to practical or ethical reasons. Here, we detail a fusion approach to compare treatments across studies. The motivating example entails comparing the risk of the composite outcome of death, AIDS, or greater than a 50% CD4 cell count decline in people with HIV when assigned triple versus mono antiretroviral therapy, using data from the AIDS Clinical Trial Group (ACTG) 175 (mono versus dual therapy) and ACTG 320 (dual versus triple therapy). We review a set of identification assumptions and estimate the risk difference using an inverse probability weighting estimator that leverages the shared trial arms (dual therapy). A fusion diagnostic based on comparing the shared arms is proposed that may indicate violation of the identification assumptions. Application of the data fusion estimator and diagnostic to the ACTG trials indicates triple therapy results in a reduction in risk compared to monotherapy in individuals with baseline CD4 counts between 50 and 300 cells/mm$^3$. Bridged treatment comparisons address questions that none of the constituent data sources could address alone, but valid fusion-based inference requires careful consideration of the underlying assumptions.
Submitted 22 August, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
Exposure Effects on Count Outcomes with Observational Data, with Application to Incarcerated Women
Authors:
Bonnie E. Shook-Sa,
Michael G. Hudgens,
Andrea K. Knittel,
Andrew Edmonds,
Catalina Ramirez,
Stephen R. Cole,
Mardge Cohen,
Adebola Adedimeji,
Tonya Taylor,
Katherine G. Michel,
Andrea Kovacs,
Jennifer Cohen,
Jessica Donohue,
Antonina Foster,
Margaret A. Fischl,
Dustin Long,
Adaora A. Adimora
Abstract:
Causal inference methods can be applied to estimate the effect of a point exposure or treatment on an outcome of interest using data from observational studies. For example, in the Women's Interagency HIV Study, it is of interest to understand the effects of incarceration on the number of sexual partners and the number of cigarettes smoked after incarceration. In settings like this where the outcome is a count, the estimand is often the causal mean ratio, i.e., the ratio of the counterfactual mean count under exposure to the counterfactual mean count under no exposure. This paper considers estimators of the causal mean ratio based on inverse probability of treatment weights, the parametric g-formula, and doubly robust estimation, each of which can account for overdispersion, zero-inflation, and heaping in the measured outcome. Methods are compared in simulations and are applied to data from the Women's Interagency HIV Study.
Submitted 6 November, 2023; v1 submitted 3 February, 2022;
originally announced February 2022.
-
Estimating SARS-CoV-2 Seroprevalence
Authors:
Samuel P. Rosin,
Bonnie E. Shook-Sa,
Stephen R. Cole,
Michael G. Hudgens
Abstract:
Governments and public health authorities use seroprevalence studies to guide responses to the COVID-19 pandemic. Seroprevalence surveys estimate the proportion of individuals who have detectable SARS-CoV-2 antibodies. However, serologic assays are prone to misclassification error, and non-probability sampling may induce selection bias. In this paper, nonparametric and parametric seroprevalence estimators are considered that address both challenges by leveraging validation data and assuming equal probabilities of sample inclusion within covariate-defined strata. Both estimators are shown to be consistent and asymptotically normal, and consistent variance estimators are derived. Simulation studies are presented comparing the estimators over a range of scenarios. The methods are used to estimate SARS-CoV-2 seroprevalence in New York City, Belgium, and North Carolina.
Submitted 9 November, 2022; v1 submitted 4 November, 2021;
originally announced November 2021.
-
G-Formula for Observational Studies under Stratified Interference, with Application to Bed Net Use on Malaria
Authors:
Kayla W. Kilpatrick,
Chanhwa Lee,
Michael G. Hudgens
Abstract:
Assessing population-level effects of vaccines and other infectious disease prevention measures is important to the field of public health. In infectious disease studies, one person's treatment may affect another individual's outcome, i.e., there may be interference between units. For example, the use of bed nets to prevent malaria by one individual may have an indirect effect on other individuals living in close proximity. In some settings, individuals may form groups or clusters where interference only occurs within groups, i.e., there is partial interference. Inverse probability weighted estimators have previously been developed for observational studies with partial interference. Unfortunately, these estimators are not well suited for studies with large clusters. Therefore, in this paper, the parametric g-formula is extended to allow for partial interference. G-formula estimators are proposed for overall effects, effects when treated, and effects when untreated. The proposed estimators can accommodate large clusters and do not suffer from the g-null paradox that may occur in the absence of interference. The large sample properties of the proposed estimators are derived assuming no unmeasured confounders and that the partial interference takes a particular form (referred to as 'weak stratified interference'). Simulation studies are presented demonstrating the finite-sample performance of the proposed estimators. The Demographic and Health Survey from the Democratic Republic of the Congo is then analyzed using the proposed g-formula estimators to assess the effects of bed net use on malaria.
Submitted 25 March, 2024; v1 submitted 1 February, 2021;
originally announced February 2021.
-
On variance of the treatment effect in the treated using inverse probability weighting
Authors:
Sarah A. Reifeis,
Michael G. Hudgens
Abstract:
In the analysis of observational studies, inverse probability weighting (IPW) is commonly used to consistently estimate the average treatment effect (ATE) or the average treatment effect in the treated (ATT). The variance of the IPW ATE estimator is often estimated by assuming the weights are known and then using the so-called "robust" (Huber-White) sandwich estimator, which results in conservative standard error (SE) estimation. Here it is shown that using such an approach when estimating the variance of the IPW ATT estimator does not necessarily result in conservative SE estimates. That is, assuming the weights are known, the robust sandwich estimator may be conservative or anti-conservative. Thus confidence intervals of the ATT using the robust SE estimate will not be valid in general. Instead, stacked estimating equations which account for the weight estimation can be used to compute a consistent, closed-form variance estimator for the IPW ATT estimator. The two variance estimators are compared via simulation studies and in a data analysis of the effect of smoking on gene expression.
Submitted 23 November, 2020;
originally announced November 2020.
-
diproperm: An R Package for the DiProPerm Test
Authors:
Andrew G. Allmon,
J. S. Marron,
Michael G. Hudgens
Abstract:
High-dimensional low sample size (HDLSS) data sets emerge frequently in many biomedical applications. A common task for analyzing HDLSS data is to assign data to the correct class using a classifier. Classifiers which use two labels and a linear combination of features are known as binary linear classifiers. The direction-projection-permutation (DiProPerm) test was developed for testing the difference of two high-dimensional distributions induced by a binary linear classifier. This paper discusses the key components of the DiProPerm test, introduces the diproperm R package, and demonstrates the package on a real-world data set.
Submitted 30 August, 2020;
originally announced September 2020.
-
Power and Sample Size for Marginal Structural Models
Authors:
Bonnie E. Shook-Sa,
Michael G. Hudgens
Abstract:
Marginal structural models fit via inverse probability of treatment weighting are commonly used to control for confounding when estimating causal effects from observational data. When planning a study that will be analyzed with marginal structural modeling, determining the required sample size for a given level of statistical power is challenging because of the effect of weighting on the variance of the estimated causal means. This paper considers the utility of the design effect to quantify the effect of weighting on the precision of causal estimates. The design effect is defined as the ratio of the variance of the causal mean estimator to the variance of a naive estimator if, counter to fact, no confounding had been present and weights were not needed. A simple, closed-form approximation of the design effect is derived that is outcome invariant and can be estimated during the study design phase. Once the design effect is approximated for each treatment group, sample size calculations are conducted as for a randomized trial, but with variances inflated by the design effects to account for weighting. Simulations demonstrate the accuracy of the design effect approximation, and practical considerations are discussed.
Submitted 12 March, 2020;
originally announced March 2020.
-
Balanced Policy Evaluation and Learning for Right Censored Data
Authors:
Owen E. Leete,
Nathan Kallus,
Michael G. Hudgens,
Sonia Napravnik,
Michael R. Kosorok
Abstract:
Individualized treatment rules can lead to better health outcomes when patients have heterogeneous responses to treatment. Very few individualized treatment rule estimation methods are compatible with a multi-treatment observational study with right censored survival outcomes. In this paper we extend policy evaluation methods to the right censored data setting. Existing approaches either make restrictive assumptions about the structure of the data, or use inverse weighting methods that increase the variance of the estimator resulting in decreased performance. We propose a method which uses balanced policy evaluation combined with an imputation approach to remove right censoring. We show that the proposed imputation approach is compatible with a large number of existing survival models and can be used to extend any individualized treatment rule estimation method to the right censored data setting. We establish the rate at which the imputed values converge to the conditional expected survival times, as well as consistency guarantees and regret bounds for the combined balanced policy with imputation approach. In simulation studies, we demonstrate the improved performance of our approach compared to existing methods. We also apply our method to data from the University of North Carolina Center for AIDS Research HIV Clinical Cohort.
Submitted 13 November, 2019;
originally announced November 2019.
-
Estimands and Inference in Cluster-Randomized Vaccine Trials
Authors:
Kayla W. Kilpatrick,
Michael G. Hudgens,
M. Elizabeth Halloran
Abstract:
Cluster-randomized trials are often conducted to assess vaccine effects. Defining estimands of interest before conducting a trial is integral to the alignment between a study's objectives and the data to be collected and analyzed. This paper considers estimands and estimators for overall, indirect, and total vaccine effects in trials where clusters of individuals are randomized to vaccine or control. The scenario is considered where individuals self-select whether to participate in the trial and the outcome of interest is measured on all individuals in each cluster. Unlike the overall, indirect, and total effects, the direct effect of vaccination is shown in general not to be estimable without further assumptions, such as no unmeasured confounding. An illustrative example motivated by a cluster-randomized typhoid vaccine trial is provided.
Submitted 8 October, 2019;
originally announced October 2019.
-
Inverse Probability Weighted Estimators of Vaccine Effects Accommodating Partial Interference and Censoring
Authors:
Sujatro Chakladar,
Michael G. Hudgens,
M. Elizabeth Halloran,
John D. Clemens,
Mohammad Ali,
Michael E. Emch
Abstract:
Estimating population-level effects of a vaccine is challenging because there may be interference, i.e., the outcome of one individual may depend on the vaccination status of another individual. Partial interference occurs when individuals can be partitioned into groups such that interference occurs only within groups. In the absence of interference, inverse probability weighted (IPW) estimators are commonly used to draw inference about causal effects of an exposure or treatment. Tchetgen Tchetgen and VanderWeele (2012) proposed a modified IPW estimator for causal effects in the presence of partial interference. Motivated by a cholera vaccine study in Bangladesh, this paper considers an extension of the Tchetgen Tchetgen and VanderWeele IPW estimator to the setting where the outcome is subject to right censoring using inverse probability of censoring weights (IPCW). Censoring weights are estimated using proportional hazards frailty models. The large sample properties of the IPCW estimators are derived, and simulation studies are presented demonstrating the estimators' performance in finite samples. The methods are then used to analyze data from the cholera vaccine study.
Submitted 8 October, 2019;
originally announced October 2019.
-
Exact Power of the Rank-Sum Test for a Continuous Variable
Authors:
Katie R. Mollan,
Ilana M. Trumble,
Sarah A. Reifeis,
Orlando Ferrer,
Camden P. Bay,
Pedro L. Baldoni,
Michael G. Hudgens
Abstract:
Accurate power calculations are essential in small studies containing expensive experimental units or high-stakes exposures. Herein, exact power of the Wilcoxon Mann-Whitney rank-sum test of a continuous variable is formulated using a Monte Carlo approach and defining P(X < Y) = p as a measure of effect size, where X and Y denote random observations from two distributions hypothesized to be equal under the null. Effect size p fosters productive communications because researchers understand p = 0.5 is analogous to a fair coin toss, and p near 0 or 1 represents a large effect. This approach is feasible even without background data. Simulations were conducted comparing the exact power approach to existing approaches by Rosner & Glynn (2009), Shieh et al. (2006), Noether (1987), and O'Brien-Castelloe (2006). Approximations by Noether and O'Brien-Castelloe are shown to be inaccurate for small sample sizes. The Rosner & Glynn and Shieh et al. approaches performed well in many small sample scenarios, though both are restricted to location-shift alternatives and neither approach is theoretically justified for small samples. The exact method is recommended and available in the R package wmwpow.
KEYWORDS: Mann-Whitney test, Monte Carlo simulation, non-parametric, power analysis, Wilcoxon rank-sum test
Submitted 14 January, 2019;
originally announced January 2019.
-
Post-randomization Biomarker Effect Modification in an HIV Vaccine Clinical Trial
Authors:
Peter B. Gilbert,
Bryan S. Blette,
Bryan E. Shepherd,
Michael G. Hudgens
Abstract:
While the HVTN 505 trial showed no overall efficacy of the tested vaccine to prevent HIV infection over placebo, previous studies, biological theories, and the finding that immune response markers strongly correlated with infection in vaccine recipients generated the hypothesis that a qualitative interaction occurred. This hypothesis can be assessed with statistical methods for studying treatment effect modification by an intermediate response variable (i.e., principal stratification effect modification (PSEM) methods). However, available PSEM methods make untestable structural risk assumptions, such that assumption-lean versions of PSEM methods are needed in order to surpass the high bar of evidence to demonstrate a qualitative interaction. Fortunately, the survivor average causal effect (SACE) literature is replete with assumption-lean methods that can be readily adapted to the PSEM application for the special case of a binary intermediate response variable. We map this adaptation, opening up a host of new PSEM methods for a binary intermediate variable measured via two-phase sampling, for a dichotomous or failure time final outcome and including or excluding the SACE monotonicity assumption. The new methods support that the vaccine partially protected vaccine recipients with a high polyfunctional CD8+ T cell response, an important new insight for the HIV vaccine field.
Submitted 9 November, 2018;
originally announced November 2018.
-
Doubly Robust Estimation in Observational Studies with Partial Interference
Authors:
Lan Liu,
Michael G. Hudgens,
Bradley Saul,
John D. Clemens,
Mohammad Ali,
Michael E. Emch
Abstract:
Interference occurs when the treatment (or exposure) of one individual affects the outcomes of others. In some settings it may be reasonable to assume individuals can be partitioned into clusters such that there is no interference between individuals in different clusters, i.e., there is partial interference. In observational studies with partial interference, inverse probability weighted (IPW) estimators have been proposed for different possible treatment effects. However, the validity of IPW estimators depends on the propensity score being known or correctly modeled. Alternatively, one can estimate the treatment effect using an outcome regression model. In this paper, we propose doubly robust (DR) estimators which utilize both models and are consistent and asymptotically normal if either model, but not necessarily both, is correctly specified. Empirical results are presented to demonstrate the DR property of the proposed estimators, as well as the efficiency gain of DR over IPW estimators when both models are correctly specified. The different estimators are illustrated using data from a study examining the effects of cholera vaccination in Bangladesh.
Submitted 19 June, 2018;
originally announced June 2018.
-
Randomization inference with general interference and censoring
Authors:
Wen Wei Loh,
Michael G. Hudgens,
John D. Clemens,
Mohammad Ali,
Michael E. Emch
Abstract:
Interference occurs between individuals when the treatment (or exposure) of one individual affects the outcome of another individual. Previous work on causal inference methods in the presence of interference has focused on the setting where a priori it is assumed there is 'partial interference,' in the sense that individuals can be partitioned into groups wherein there is no interference between individuals in different groups. Bowers, Fredrickson, and Panagopoulos (2012) and Bowers, Fredrickson, and Aronow (2016) consider randomization-based inferential methods that allow for more general interference structures in the context of randomized experiments. In this paper, extensions of Bowers et al. which allow for failure time outcomes subject to right censoring are proposed. Permitting right censored outcomes is challenging because standard randomization-based tests of the null hypothesis of no treatment effect assume that whether an individual is censored does not depend on treatment. The proposed extension of Bowers et al. to allow for censoring entails adapting the method of Wang, Lagakos, and Gray (2010) for two sample survival comparisons in the presence of unequal censoring. The methods are examined via simulation studies and utilized to assess the effects of cholera vaccination in an individually-randomized trial of 73,000 children and women in Matlab, Bangladesh.
Submitted 19 July, 2019; v1 submitted 6 March, 2018;
originally announced March 2018.
-
Causal Inference from Observational Studies with Clustered Interference
Authors:
Brian G. Barkley,
Michael G. Hudgens,
John D. Clemens,
Mohammad Ali,
Michael E. Emch
Abstract:
Inferring causal effects from an observational study is challenging because participants are not randomized to treatment. Observational studies in infectious disease research present the additional challenge that one participant's treatment may affect another participant's outcome, i.e., there may be interference. In this paper recent approaches to defining causal effects in the presence of interference are considered, and new causal estimands designed specifically for use with observational studies are proposed. Previously defined estimands target counterfactual scenarios in which individuals independently select treatment with equal probability. However, in settings where there is interference between individuals within clusters, it may be unlikely that treatment selection is independent between individuals in the same cluster. The proposed causal estimands instead describe counterfactual scenarios in which the treatment selection correlation structure is the same as in the observed data distribution, allowing for within-cluster dependence in the individual treatment selections. These estimands may be more relevant for policy-makers or public health officials who desire to quantify the effect of increasing the proportion of treated individuals in a population. Inverse probability-weighted estimators for these estimands are proposed. The large-sample properties of the estimators are derived, and a simulation study demonstrating the finite-sample performance of the estimators is presented. The proposed methods are illustrated by analyzing data from a study of cholera vaccination in over 100,000 individuals in Bangladesh.
Submitted 13 November, 2017;
originally announced November 2017.
-
The Calculus of M-estimation in R with geex
Authors:
Bradley C. Saul,
Michael G. Hudgens
Abstract:
M-estimation, or estimating equation, methods are widely applicable for point estimation and asymptotic inference. In this paper, we present an R package that can find roots and compute the empirical sandwich variance estimator for any set of user-specified, unbiased estimating equations. Examples from the M-estimation primer by Stefanski and Boos (2002) demonstrate use of the software. The package also includes a framework for finite sample variance corrections and a website with an extensive collection of tutorials.
Submitted 5 January, 2019; v1 submitted 5 September, 2017;
originally announced September 2017.
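The two computations named in the abstract above, finding the root of a set of estimating equations and forming the empirical sandwich variance estimator, can be sketched for a single scalar parameter. The Python below is an illustration only and is not the geex interface: the function names, the bisection root finder, and the finite-difference derivative are all assumptions chosen to keep the example self-contained.

```python
def m_estimate(psi, data, lo, hi, tol=1e-10):
    """Solve sum_i psi(y_i, theta) = 0 for scalar theta by bisection.

    Assumes the summed estimating function changes sign on [lo, hi].
    """
    def g(theta):
        return sum(psi(y, theta) for y in data)

    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("estimating function does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return (lo + hi) / 2


def sandwich_variance(psi, data, theta, h=1e-5):
    """Empirical sandwich variance estimate B / (A^2 * n) for scalar theta.

    A is the average negative derivative of psi (central difference here);
    B is the average squared value of psi at theta.
    """
    n = len(data)
    a = -sum((psi(y, theta + h) - psi(y, theta - h)) / (2 * h) for y in data) / n
    b = sum(psi(y, theta) ** 2 for y in data) / n
    return b / (a * a) / n


def mean_psi(y, theta):
    """Estimating function whose root is the sample mean."""
    return y - theta
```

With mean_psi, the root is the sample mean and the sandwich estimate reduces to the average squared deviation divided by n, which provides a simple check of the two functions.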
-
Upstream Causes of Downstream Effects
Authors:
Bradley C. Saul,
Michael G. Hudgens,
Michael A. Mallin
Abstract:
The United States Environmental Protection Agency considers nutrient pollution in stream ecosystems one of the U.S.'s most pressing environmental challenges. But limited independent replicates, lack of experimental randomization, and space- and time-varying confounding handicap causal inference on effects of nutrient pollution. In this paper the causal g-methods developed by Robins and colleagues are extended to allow for exposures to vary in time and space in order to assess the effects of nutrient pollution on chlorophyll a, a proxy for algal production. Publicly available data from the North Carolina Cape Fear River and a simulation study are used to show how causal effects of upstream nutrient concentrations on downstream chlorophyll a levels may be estimated from typical water quality monitoring data. Estimates obtained from the parametric g-formula, a marginal structural model, and a structural nested model indicate that chlorophyll a concentrations at Lock and Dam 1 were influenced by nitrate concentrations measured 86 to 109 km upstream, an area where four major industrial and municipal point sources discharge wastewater.
Submitted 18 March, 2018; v1 submitted 22 May, 2017;
originally announced May 2017.
-
Nonparametric Bounds and Sensitivity Analysis of Treatment Effects
Authors:
Amy Richardson,
Michael G. Hudgens,
Peter B. Gilbert,
Jason P. Fine
Abstract:
This paper considers conducting inference about the effect of a treatment (or exposure) on an outcome of interest. In the ideal setting where treatment is assigned randomly, under certain assumptions the treatment effect is identifiable from the observable data and inference is straightforward. However, in other settings such as observational studies or randomized trials with noncompliance, the treatment effect is no longer identifiable without relying on untestable assumptions. Nonetheless, the observable data often do provide some information about the effect of treatment, that is, the parameter of interest is partially identifiable. Two approaches are often employed in this setting: (i) bounds are derived for the treatment effect under minimal assumptions, or (ii) additional untestable assumptions are invoked that render the treatment effect identifiable and then sensitivity analysis is conducted to assess how inference about the treatment effect changes as the untestable assumptions are varied. Approaches (i) and (ii) are considered in various settings, including assessing principal strata effects, direct and indirect effects and effects of time-varying exposures. Methods for drawing formal inference about partially identified parameters are also discussed.
Submitted 5 March, 2015;
originally announced March 2015.
-
Nonparametric inference for competing risks current status data with continuous, discrete or grouped observation times
Authors:
Marloes H. Maathuis,
Michael G. Hudgens
Abstract:
New methods and theory have recently been developed to nonparametrically estimate cumulative incidence functions for competing risks survival data subject to current status censoring. In particular, the limiting distribution of the nonparametric maximum likelihood estimator and a simplified "naive estimator" have been established under certain smoothness conditions. In this paper, we establish the large-sample behavior of these estimators in two additional models, namely when the observation time distribution has discrete support and when the observation times are grouped. These asymptotic results are applied to the construction of confidence intervals in the three different models. The methods are illustrated on two data sets regarding the cumulative incidence of (i) different types of menopause from a cross-sectional sample of women in the United States and (ii) subtype-specific HIV infection from a sero-prevalence study in injecting drug users in Thailand.
Submitted 20 December, 2010; v1 submitted 26 September, 2009;
originally announced September 2009.