Paper title
Model selection for estimation of causal parameters
Paper authors
Paper abstract
A popular technique for selecting and tuning machine learning estimators is cross-validation. Cross-validation evaluates overall model fit, usually in terms of predictive accuracy. In causal inference, the optimal choice of estimator depends not only on the fitted models but also on assumptions the statistician is willing to make. In this case, the performance of different (potentially biased) estimators cannot be evaluated by checking overall model fit. We propose a model selection procedure that estimates the squared l2-deviation of a finite-dimensional estimator from its target. The procedure relies on knowing an asymptotically unbiased "benchmark estimator" of the parameter of interest. Under regularity conditions, we investigate bias and variance of the proposed criterion compared to competing procedures and derive a finite-sample bound for the excess risk compared to an oracle procedure. The resulting estimator is discontinuous and does not have a Gaussian limit distribution. Thus, standard asymptotic expansions do not apply. We derive asymptotically valid confidence intervals that take into account the model selection step. The performance of the approach for estimation and inference for average treatment effects is evaluated on simulated data sets, including experimental data, instrumental variables settings, and observational data with selection on observables.
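To illustrate the kind of criterion the abstract describes, here is a minimal Python sketch of selecting among scalar estimators by their estimated squared deviation from the target, using an unbiased benchmark. It is not the paper's implementation. The function name `select_estimator`, the use of per-observation influence-function values to estimate variances and covariances, and the simulated numbers are all assumptions made for illustration. The sketch relies on the generic identity E[(θ̂_k − θ)²] = E[(θ̂_k − θ̂_0)²] − Var(θ̂_k − θ̂_0) + Var(θ̂_k), which holds when the benchmark θ̂_0 is unbiased for θ.

```python
import numpy as np


def select_estimator(estimates, influence):
    """Pick the candidate with the smallest estimated squared error.

    estimates : length-K array of point estimates; index 0 is the
                (assumed unbiased) benchmark estimator.
    influence : (n, K) array of per-observation influence-function
                values, one column per estimator (hypothetical input).
    Returns (index of selected estimator, array of estimated risks).
    """
    estimates = np.asarray(estimates, dtype=float)
    influence = np.asarray(influence, dtype=float)
    n = influence.shape[0]

    # Estimated covariance matrix of the K estimators.
    sigma = np.atleast_2d(np.cov(influence, rowvar=False)) / n
    var = np.diag(sigma)

    # Var(theta_k - theta_0) = Var_k + Var_0 - 2 Cov(k, 0).
    var_diff = var + var[0] - 2.0 * sigma[:, 0]

    # Bias-corrected squared distance to the benchmark plus own variance.
    # For k = 0 this reduces to the benchmark's variance.
    risk = (estimates - estimates[0]) ** 2 - var_diff + var
    return int(np.argmin(risk)), risk


# Usage with made-up numbers: a noisy benchmark, a low-variance
# candidate with a small bias, and a badly biased candidate.
rng = np.random.default_rng(0)
n = 500
infl = np.column_stack([
    rng.normal(scale=2.0, size=n),   # benchmark
    rng.normal(scale=0.5, size=n),   # candidate 1
    rng.normal(scale=0.5, size=n),   # candidate 2
])
best, risk = select_estimator([1.0, 1.02, 2.0], infl)
print("selected estimator:", best)
print("estimated risks:", risk)
```

Because the squared-bias term is corrected by subtracting a variance estimate, individual risk estimates can be negative in finite samples. The sketch leaves them untruncated.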