Paper title
Targeted-BEHRT: Deep learning for observational causal inference on longitudinal electronic health records
Authors
Abstract
Observational causal inference is useful for decision making in medicine when randomized clinical trials (RCTs) are infeasible or non-generalizable. However, traditional approaches fail to deliver unconfounded causal conclusions in practice. The rise of "doubly robust" non-parametric tools, coupled with the growth of deep learning for capturing rich representations of multimodal data, offers a unique opportunity to develop and test such models for causal inference on comprehensive electronic health records (EHR). In this paper, we investigate causal modelling of an RCT-established null causal association: the effect of antihypertensive use on incident cancer risk. We develop a dataset for our observational study and a Transformer-based model, Targeted-BEHRT, and couple it with doubly robust estimation to estimate the average risk ratio (RR). We compare our model to benchmark statistical and deep learning models for causal inference in multiple experiments on semi-synthetic derivations of our dataset with various types and intensities of confounding. To further test the reliability of our approach, we evaluate our model in limited-data settings. Across experiments, we find that our model provides more accurate estimates of RR (lowest sum of absolute errors from ground truth) than the benchmarks for risk ratio estimation on high-dimensional EHR. Finally, we apply our model to the original case study, the effect of antihypertensives on cancer, and demonstrate that our model generally captures the validated null association.
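To make the notion of a doubly robust risk ratio estimate concrete, the following is a minimal sketch of a generic augmented inverse-probability-weighting (AIPW) estimator. It is an illustration under stated assumptions and not necessarily the estimator used with Targeted-BEHRT: the function names, the clipping threshold, and the simulated data are all hypothetical, and the propensity and outcome predictions are assumed to come from some already-fitted model.

```python
import numpy as np


def aipw_risk_ratio(y, t, propensity, mu1, mu0, clip=0.01):
    """Generic AIPW estimate of the average risk ratio E[Y(1)] / E[Y(0)].

    y          : binary outcomes (0/1)
    t          : binary treatment indicators (0/1)
    propensity : predicted P(T=1 | X) from any fitted model (assumed given)
    mu1, mu0   : predicted P(Y=1 | X, T=1) and P(Y=1 | X, T=0) (assumed given)
    clip       : hypothetical threshold keeping propensities away from 0 and 1
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    mu1 = np.asarray(mu1, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    # Clip propensities so the inverse-probability weights stay bounded.
    e = np.clip(np.asarray(propensity, dtype=float), clip, 1.0 - clip)
    # Outcome-model prediction plus a weighted residual correction per arm.
    risk_treated = np.mean(mu1 + t * (y - mu1) / e)
    risk_control = np.mean(mu0 + (1.0 - t) * (y - mu0) / (1.0 - e))
    return risk_treated / risk_control


def simulate_null_effect(n=200_000, seed=0):
    """Toy confounded data in which treatment has no effect (true RR = 1)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    e = 1.0 / (1.0 + np.exp(-0.8 * x))    # treatment depends on confounder x
    p = 1.0 / (1.0 + np.exp(2.0 - x))     # outcome risk depends on x only
    t = rng.binomial(1, e)
    y = rng.binomial(1, p)
    return y, t, e, p


if __name__ == "__main__":
    y, t, e, p = simulate_null_effect()
    # The true nuisance functions are plugged in here purely for illustration.
    print("AIPW risk ratio:", aipw_risk_ratio(y, t, e, p, p))
    # A naive comparison of raw event rates is confounded by x.
    print("Naive risk ratio:", y[t == 1].mean() / y[t == 0].mean())
```

The "doubly robust" property is visible in the structure of each arm's estimate: the correction term has mean zero when the outcome model is right, and it repairs a wrong outcome model when the propensity model is right, so the estimate remains consistent if either model is correctly specified.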