Provable Domain Generalization via Invariant-Feature Subspace Recovery

Paper Title

Provable Domain Generalization via Invariant-Feature Subspace Recovery

Authors

Haoxiang Wang, Haozhe Si, Bo Li, Han Zhao

Abstract

Domain generalization asks for models trained over a set of training environments to perform well in unseen test environments. Recently, a series of algorithms such as Invariant Risk Minimization (IRM) has been proposed for domain generalization. However, Rosenfeld et al. (2021) show that in a simple linear data model, even if non-convexity issues are ignored, IRM and its extensions cannot generalize to unseen environments with fewer than $d_s+1$ training environments, where $d_s$ is the dimension of the spurious-feature subspace. In this paper, we propose to achieve domain generalization with Invariant-feature Subspace Recovery (ISR). Our first algorithm, ISR-Mean, can identify the subspace spanned by invariant features from the first-order moments of the class-conditional distributions, and achieves provable domain generalization with $d_s+1$ training environments under the data model of Rosenfeld et al. (2021). Our second algorithm, ISR-Cov, further reduces the required number of training environments to $O(1)$ by using second-order moment information. Notably, unlike IRM, our algorithms bypass non-convexity issues and enjoy global convergence guarantees. Empirically, our ISRs obtain superior performance compared with IRM on synthetic benchmarks. In addition, on three real-world image and text datasets, we show that both ISRs can be used as simple yet effective post-processing methods to improve the worst-case accuracy of (pre-)trained models against spurious correlations and group shifts.
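
The idea behind ISR-Mean can be illustrated with a small NumPy sketch. It is not the authors' implementation; it is a minimal reading of the abstract under a toy linear model where each input is a fixed mixing of invariant features (same class-conditional mean in every environment) and spurious features (mean changes per environment). The function names `isr_mean_projection` and `make_toy_envs`, the label convention {-1, +1}, the orthogonal mixing matrix, and all numeric values are assumptions made for illustration.

```python
import numpy as np


def isr_mean_projection(xs, ys, d_spurious):
    """Estimate an orthonormal basis, shape (d, d - d_spurious), that removes spurious directions.

    xs, ys: lists with one (n_e, d) feature array and one (n_e,) label array
    per training environment; labels are assumed to be in {-1, +1}.
    """
    # First-order moment of the class-conditional distribution (class y = +1), per environment.
    means = np.stack([x[y == 1].mean(axis=0) for x, y in zip(xs, ys)])
    # Centre across environments: invariant features contribute the same mean everywhere,
    # so the remaining variation lies (up to sampling noise) along spurious directions.
    centered = means - means.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal directions sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=True)
    # Drop the d_spurious directions with the most cross-environment variation.
    return vt[d_spurious:].T


def make_toy_envs(spurious_means, mu_c, mixing, n=2000, noise=0.1, seed=0):
    """Sample toy environments x = mixing @ [z_c; z_s] (a hypothetical generator)."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for mu_e in spurious_means:
        y = rng.choice([-1, 1], size=n)
        z_c = y[:, None] * mu_c + noise * rng.standard_normal((n, len(mu_c)))
        z_s = y[:, None] * mu_e + noise * rng.standard_normal((n, len(mu_e)))
        xs.append(np.hstack([z_c, z_s]) @ mixing.T)
        ys.append(y)
    return xs, ys


d_c, d_s = 2, 2
# Assumed orthogonal mixing matrix; its last d_s columns are the true spurious directions.
mixing, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((d_c + d_s, d_c + d_s)))
mu_c = np.array([1.0, -1.0])
# d_s + 1 = 3 training environments with distinct spurious means.
spurious_means = [np.array([2.0, 0.0]), np.array([0.0, 2.0]), np.array([-2.0, -2.0])]
xs, ys = make_toy_envs(spurious_means, mu_c, mixing)

P = isr_mean_projection(xs, ys, d_spurious=d_s)
B = mixing[:, d_c:]
# Close to 0 when the recovered basis is orthogonal to the spurious directions.
print("max |P^T B| =", np.abs(P.T @ B).max())
```

In this sketch a downstream classifier would be fit on the projected features `x @ P` rather than on `x`, which is presumably how such a projection serves as a post-processing step on top of a trained feature extractor. The sketch also assumes `d_spurious` is known; choosing it in practice is outside what the abstract specifies.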
