Objective: To compare the results of different methods for handling missing HIV viral load (VL) data under different missing-data mechanisms. Methods: Using SPSS 17.0, we simulated complete data and missing data with different missing-data mechanisms from HIV viral load data collected from men who have sex with men (MSM) in 16 cities in China in 2013. Maximum likelihood estimation using the expectation-maximization (EM) algorithm, regression imputation, mean imputation, deletion, and Markov chain Monte Carlo (MCMC) were each used to fill in the missing data. The results of the different methods were compared in terms of distribution characteristics, accuracy, and precision. Results: The HIV VL data could not be transformed to a normal distribution. All methods performed well when imputing data that were missing completely at random (MCAR). For the other types of missing data, regression imputation and MCMC preserved the main characteristics of the original data. The means of the datasets imputed with the different methods were all close to that of the original dataset. EM, regression imputation, mean imputation, and deletion underestimated VL, whereas MCMC overestimated it. Conclusion: MCMC can be used as the main imputation method for missing HIV viral load data. The imputed data can be used as a reference for estimating mean HIV VL in the investigated population.
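Objective: To evaluate how well different imputation methods fill in missing viral load (VL) data among HIV-infected men who have sex with men (MSM). Methods: Based on VL sampling and testing data from HIV-infected MSM in 16 large cities in China in 2013, SPSS 17.0 was used to simulate a complete dataset and five types of missing datasets. The expectation-maximization (EM) method, regression imputation, mean imputation, deletion, and Markov chain Monte Carlo (MCMC) were applied to the five missing VL datasets, and imputation performance was compared in terms of data distribution, accuracy, and precision. Results: The VL data followed a skewed, non-continuous distribution and could not be effectively transformed to a normal distribution. All imputation methods performed well on data missing completely at random. For the other types of missing data, regression imputation and MCMC better preserved the main distributional characteristics of the complete data. EM, regression imputation, mean imputation, and deletion generally underestimated the mean, whereas MCMC mostly overestimated it. Conclusion: MCMC can serve as the preferred imputation method for missing VL data after logarithmic transformation. The imputed data can serve as a reference for estimating the mean VL level in the surveyed population.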
Keywords: HIV; Markov Chain Monte Carlo; Missing data; Multiple imputation; Viral load.
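The comparison described in the abstract (simulate missingness in a complete dataset, fill it in, then compare the result with the original) can be illustrated with a minimal Python sketch. This is not the authors' SPSS procedure: the data are synthetic normal draws standing in for log-transformed VL, the "MNAR" rule (higher values more likely to be missing) is an illustrative assumption, the function names are hypothetical, and only deletion and mean imputation are shown.

```python
import numpy as np

def make_missing(values, rate, rng, mechanism="MCAR"):
    """Return a copy of `values` with NaN where data are set missing."""
    values = np.asarray(values, dtype=float)
    if mechanism == "MCAR":
        # Every observation has the same chance of being missing.
        prob = np.full(values.shape, rate)
    elif mechanism == "MNAR":
        # Illustrative assumption: missingness probability grows with the
        # rank of the value, averaging `rate` over the whole sample.
        ranks = values.argsort().argsort() / (len(values) - 1)
        prob = np.clip(2 * rate * ranks, 0, 1)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    out = values.copy()
    out[rng.random(values.shape) < prob] = np.nan
    return out

def deletion(values):
    """Drop missing observations."""
    return values[~np.isnan(values)]

def mean_imputation(values):
    """Replace missing observations with the mean of the observed ones."""
    filled = values.copy()
    filled[np.isnan(filled)] = np.nanmean(values)
    return filled

rng = np.random.default_rng(0)
# Synthetic stand-in for log10 VL; real VL data are skewed, per the abstract.
complete = rng.normal(4.0, 1.0, size=2000)

for mechanism in ("MCAR", "MNAR"):
    incomplete = make_missing(complete, 0.3, rng, mechanism)
    for name, method in (("deletion", deletion),
                         ("mean imputation", mean_imputation)):
        result = method(incomplete)
        bias = result.mean() - complete.mean()   # accuracy of the mean
        spread = result.std(ddof=1)              # how much variability survives
        print(f"{mechanism:5s} {name:16s} bias={bias:+.3f} sd={spread:.3f}")
```

In this sketch, mean imputation reproduces the mean of the observed values exactly but shrinks the standard deviation, and both methods underestimate the mean when high values are preferentially missing. This mirrors the direction of bias the abstract reports for these two methods, not its magnitudes.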