Paper Title

For self-supervised learning, Rationality implies generalization, provably

Paper Authors

Yamini Bansal, Gal Kaplun, Boaz Barak

Paper Abstract

We prove a new upper bound on the generalization gap of classifiers that are obtained by first using self-supervision to learn a representation $r$ of the training data, and then fitting a simple (e.g., linear) classifier $g$ to the labels. Specifically, we show that (under the assumptions described below) the generalization gap of such classifiers tends to zero if $\mathsf{C}(g) \ll n$, where $\mathsf{C}(g)$ is an appropriately-defined measure of the simple classifier $g$'s complexity, and $n$ is the number of training samples. We stress that our bound is independent of the complexity of the representation $r$. We do not make any structural or conditional-independence assumptions on the representation-learning task, which can use the same training dataset that is later used for classification. Rather, we assume that the training procedure satisfies certain natural noise-robustness (adding a small amount of label noise causes only a small degradation in performance) and rationality (getting the wrong label is no better than getting no label at all) conditions that hold widely across many standard architectures. We show that our bound is non-vacuous for many popular representation-learning-based classifiers on CIFAR-10 and ImageNet, including SimCLR, AMDIM, and MoCo.
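To make the setup concrete, below is a minimal sketch in Python (not the authors' code) of the pipeline and the two conditions the abstract names. Everything here is a hypothetical stand-in: the data is synthetic, `encode` is a fixed random feature map playing the role of a self-supervised representation $r$ (in the paper, $r$ would come from SimCLR, AMDIM, or MoCo), and the printed numbers only show where the generalization gap, noise-robustness, and rationality quantities would be measured.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical synthetic data: a stand-in for CIFAR-10 images and labels.
n, d, k = 2000, 256, 10
X_train = rng.normal(size=(n, d))
y_train = rng.integers(0, k, size=n)
X_test = rng.normal(size=(n, d))
y_test = rng.integers(0, k, size=n)

# Stand-in representation r. Since the bound is independent of r's
# complexity, a fixed random feature map suffices for illustration.
W = rng.normal(size=(d, 64)) / np.sqrt(d)
def encode(X):
    return np.maximum(X @ W, 0.0)  # random ReLU features

# Simple (linear) classifier g fitted to the labels on top of r.
g = LogisticRegression(max_iter=2000).fit(encode(X_train), y_train)
gap = g.score(encode(X_train), y_train) - g.score(encode(X_test), y_test)
print(f"generalization gap: {gap:.3f}")

# Noise-robustness check: corrupt a small fraction eta of training labels,
# retrain, and measure the drop in clean test accuracy (should be small).
eta = 0.05
noisy = rng.random(n) < eta
y_corrupt = y_train.copy()
y_corrupt[noisy] = rng.integers(0, k, size=int(noisy.sum()))
g_noisy = LogisticRegression(max_iter=2000).fit(encode(X_train), y_corrupt)
drop = g.score(encode(X_test), y_test) - g_noisy.score(encode(X_test), y_test)
print(f"test-accuracy drop under {eta:.0%} label noise: {drop:.3f}")

# Rationality check: accuracy (w.r.t. the true labels) on the corrupted
# training points should be no better than accuracy on unseen points.
acc_seen_corrupted = g_noisy.score(encode(X_train[noisy]), y_train[noisy])
acc_unseen = g_noisy.score(encode(X_test), y_test)
print(f"corrupted-but-seen: {acc_seen_corrupted:.3f}  unseen: {acc_unseen:.3f}")
```

Note that with random synthetic labels the printed values carry no empirical meaning; the sketch only fixes where each quantity lives: the gap is train accuracy minus test accuracy of $g$, and rationality compares performance on mislabeled-but-seen samples against never-seen samples.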
