Paper Title
Benign-Overfitting in Conditional Average Treatment Effect Prediction with Linear Regression
Paper Authors
Paper Abstract
We study the theory of benign overfitting in the prediction of the conditional average treatment effect (CATE) with linear regression models. With the development of machine learning for causal inference, a wide range of large-scale models for causality is gaining attention. One concern is that large-scale models are suspected to be prone to overfitting on observations subject to sample selection, and hence that large models may be unsuitable for causal prediction. In this study, to resolve this suspicion, we investigate the validity of causal inference methods for overparameterized models by applying the recent theory of benign overfitting (Bartlett et al., 2020). Specifically, we consider samples whose distribution switches depending on an assignment rule, and study the prediction of CATE with linear models whose dimension diverges to infinity. We focus on two methods: the T-learner, which is based on the difference between estimators constructed separately for each treatment group, and the inverse probability weight (IPW)-learner, which solves a single regression problem constructed via the propensity score. In both methods, the estimator consists of interpolators that fit the samples perfectly. As a result, we show that the T-learner fails to achieve consistency except under random assignment, whereas the risk of the IPW-learner converges to zero if the propensity score is known. This difference stems from the fact that the T-learner cannot preserve the eigenspaces of the covariances, which is necessary for benign overfitting in the overparameterized setting. Our result provides new insights into the use of causal inference methods in the overparameterized setting, in particular doubly robust estimators.
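The two learners contrasted in the abstract can be illustrated with minimum-norm interpolators in an overparameterized linear model. The following is a minimal sketch under assumptions of our own (a toy Gaussian design with `d > n` and a constant, known propensity score); it is not the paper's experimental setup, and all variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy overparameterized setup: dimension d exceeds sample size n.
n, d = 50, 200
X = rng.normal(size=(n, d))
theta1 = rng.normal(size=d) / np.sqrt(d)   # treated-outcome parameter (illustrative)
theta0 = rng.normal(size=d) / np.sqrt(d)   # control-outcome parameter (illustrative)
e = 0.5 * np.ones(n)                       # known propensity score (random assignment here)
A = rng.binomial(1, e)                     # treatment assignment
y = np.where(A == 1, X @ theta1, X @ theta0) + 0.1 * rng.normal(size=n)

def min_norm_interpolator(X, y):
    """Minimum-l2-norm interpolator: fits the samples perfectly when d > n."""
    return np.linalg.pinv(X) @ y

# T-learner: fit a separate interpolator on each treatment group, then take the difference.
t1 = min_norm_interpolator(X[A == 1], y[A == 1])
t0 = min_norm_interpolator(X[A == 0], y[A == 0])
cate_T = lambda x: x @ (t1 - t0)

# IPW-learner: regress a single IPW-transformed outcome on all samples;
# E[z | x] equals the CATE when e is the true propensity score.
z = (A / e - (1 - A) / (1 - e)) * y
t_ipw = min_norm_interpolator(X, z)
cate_IPW = lambda x: x @ t_ipw
```

The design difference the abstract highlights is visible here: the T-learner interpolates two group-specific subsamples with distinct covariance structure, while the IPW-learner interpolates one reweighted regression problem over all samples.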