Deep Stable Representation Learning on Electronic Health Records

Paper Title

Deep Stable Representation Learning on Electronic Health Records

Authors

Yingtao Luo, Zhaocheng Liu, Qiang Liu

Abstract

Deep learning models have achieved promising disease prediction performance on the Electronic Health Records (EHR) of patients. However, most models developed under the I.I.D. hypothesis fail to account for agnostic distribution shifts, which diminishes their generalization ability on Out-Of-Distribution (OOD) data. In this setting, the models exploit spurious statistical correlations that may change across environments, which can lead to sub-optimal performance. In particular, the unstable correlation between procedures and diagnoses that exists in the training distribution can cause a spurious correlation between historical EHR and future diagnoses. To address this problem, we propose a causal representation learning method called Causal Healthcare Embedding (CHE). CHE aims to eliminate the spurious statistical relationship by removing the dependencies between diagnoses and procedures. We introduce the Hilbert-Schmidt Independence Criterion (HSIC) to measure the degree of independence between the embedded diagnosis and procedure features. Based on causal view analyses, we apply a sample weighting technique to remove such spurious relationships, enabling stable learning on EHR across different environments. Moreover, the proposed CHE method can be used as a flexible plug-and-play module that enhances existing deep learning models on EHR. Extensive experiments on two public datasets and five state-of-the-art baselines unequivocally show that CHE improves the prediction accuracy of deep learning models on out-of-distribution data by a large margin. In addition, the interpretability study shows that CHE can successfully leverage causal structures to reflect a more reasonable contribution of historical records to predictions.
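To make the independence measure mentioned in the abstract more concrete, the following is a minimal sketch of the standard biased empirical HSIC estimator, $\mathrm{HSIC}(X, Y) = \frac{1}{(n-1)^2}\operatorname{tr}(KHLH)$, computed with Gaussian kernels. It is not the authors' implementation: the function names `rbf_kernel` and `hsic`, the fixed bandwidth `sigma`, and the synthetic "diagnosis" and "procedure" embeddings are all assumptions made for illustration, and the sample weighting step of CHE is not shown.

```python
import numpy as np


def rbf_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of x (hypothetical helper)."""
    sq_norms = np.sum(x ** 2, axis=1)
    # Pairwise squared Euclidean distances; clip tiny negatives from rounding.
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between two samples with the same number of rows."""
    n = x.shape[0]
    K = rbf_kernel(x, sigma)
    L = rbf_kernel(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


# Synthetic stand-ins for embedded features (assumed data, not from the paper).
rng = np.random.default_rng(0)
diag_emb = rng.normal(size=(200, 4))                        # "diagnosis" embeddings
proc_dependent = diag_emb + 0.1 * rng.normal(size=(200, 4)) # strongly tied to diag_emb
proc_independent = rng.normal(size=(200, 4))                # drawn independently

hsic_dependent = hsic(diag_emb, proc_dependent)
hsic_independent = hsic(diag_emb, proc_independent)
print(hsic_dependent, hsic_independent)
```

In this sketch the dependent pair yields a larger HSIC value than the independent pair, which is the property a method like CHE relies on when it drives the dependence between diagnosis and procedure features toward zero.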
