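Causality-aware counterfactual confounding adjustment as an alternative to linear residualization in anticausal prediction tasks based on linear learners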

Paper title

Causality-aware counterfactual confounding adjustment as an alternative to linear residualization in anticausal prediction tasks based on linear learners

Author

Neto, Elias Chaibub

Abstract

Linear residualization is a common practice for confounding adjustment in machine learning (ML) applications. Recently, causality-aware predictive modeling has been proposed as an alternative, causality-inspired approach for adjusting for confounders. The basic idea is to simulate counterfactual data that is free from the spurious associations generated by the observed confounders. In this paper, we compare the linear residualization approach against the causality-aware confounding adjustment in anticausal prediction tasks, and show that the causality-aware approach tends to (asymptotically) outperform the residualization adjustment in terms of predictive performance in linear learners. Importantly, our results still hold even when the true model is not linear. We illustrate our results in both regression and classification tasks, where we compare the causality-aware and residualization approaches using mean squared errors and classification accuracy in synthetic data experiments where the linear regression model is misspecified, as well as when the linear model is correctly specified. Furthermore, we illustrate how the causality-aware approach is more stable than residualization with respect to dataset shifts in the joint distribution of the confounders and outcome variables.
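The abstract does not spell out the two adjustment procedures, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes a single feature X, outcome Y, and confounder C in an anticausal setup (Y and C both cause X, and C also affects Y). Residualization is taken to mean keeping the residuals of a regression of X on C, while the causality-aware adjustment is taken to mean regressing X on both Y and C and subtracting only the estimated C contribution. The data-generating coefficients, sample sizes, and helper names (`simulate`, `ols`, `with_intercept`, `holdout_mse`) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n):
    """Hypothetical anticausal data: Y and C both cause X, and C also affects Y."""
    c = rng.normal(size=n)
    y = 0.8 * c + rng.normal(size=n)
    x = 1.0 * y + 1.5 * c + rng.normal(size=n)
    return x, y, c

def with_intercept(*cols):
    """Stack the given columns behind a column of ones."""
    return np.column_stack([np.ones(len(cols[0])), *cols])

def ols(design, target):
    """Least-squares coefficients for target ~ design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

x_tr, y_tr, c_tr = simulate(20_000)
x_ts, y_ts, c_ts = simulate(20_000)

# Residualization (assumed form): fit X ~ C on the training set and keep
# the residuals; the same fitted coefficients are applied to the test set.
res_coef = ols(with_intercept(c_tr), x_tr)
xr_tr = x_tr - with_intercept(c_tr) @ res_coef
xr_ts = x_ts - with_intercept(c_ts) @ res_coef

# Causality-aware adjustment (assumed form): fit X ~ Y + C on the training
# set, then subtract only the estimated confounder term, leaving the part
# of X attributed to Y (plus noise) intact. Test labels are not used.
ca_coef = ols(with_intercept(y_tr, c_tr), x_tr)
xc_tr = x_tr - ca_coef[2] * c_tr
xc_ts = x_ts - ca_coef[2] * c_ts

def holdout_mse(f_tr, f_ts):
    """Train a linear learner of Y on the adjusted feature; return test MSE."""
    coef = ols(with_intercept(f_tr), y_tr)
    pred = with_intercept(f_ts) @ coef
    return float(np.mean((y_ts - pred) ** 2))

mse_res = holdout_mse(xr_tr, xr_ts)
mse_ca = holdout_mse(xc_tr, xc_ts)
print(f"residualization test MSE: {mse_res:.3f}")
print(f"causality-aware test MSE: {mse_ca:.3f}")
```

In this toy setup the residualized feature also discards the part of Y that is correlated with C, which is one intuition for why the causality-aware feature can predict Y better; the abstract's actual claims rest on the paper's own analysis and experiments.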
