Paper title
Three-phase generalized raking and multiple imputation estimators to address error-prone data
Paper authors
Abstract
Validation studies are often used to obtain more reliable information in settings with error-prone data. Validated data on a subsample of subjects can be used together with error-prone data on all subjects to improve estimation. In practice, more than one round of data validation may be required, and direct application of standard approaches for combining validation data into analyses may lead to inefficient estimators, since the information available from intermediate validation steps is only partially considered or even completely ignored. In this paper, we present two novel extensions of multiple imputation and generalized raking estimators that make full use of all available data. We show through simulations that incorporating information from intermediate steps can lead to substantial gains in efficiency. This work is motivated by and illustrated in a study of contraceptive effectiveness among 82,957 women living with HIV whose data were originally extracted from electronic medical records, of whom 4,855 had their charts reviewed, and a subsequent 1,203 also had a telephone interview to validate key study variables.