Paper Title
Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization
Paper Authors
Paper Abstract
Machine learning is predicated on the concept of generalization: a model achieving low error on a sufficiently large training set should also perform well on novel samples from the same distribution. We show that both data whitening and second order optimization can harm or entirely prevent generalization. In general, model training harnesses information contained in the sample-sample second moment matrix of a dataset. For a general class of models, namely models with a fully connected first layer, we prove that the information contained in this matrix is the only information which can be used to generalize. Models trained using whitened data, or with certain second order optimization schemes, have less access to this information, resulting in reduced or nonexistent generalization ability. We experimentally verify these predictions for several architectures, and further demonstrate that generalization continues to be harmed even when theoretical requirements are relaxed. However, we also show experimentally that regularized second order optimization can provide a practical tradeoff, where training is accelerated but less information is lost, and generalization can in some circumstances even improve.
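To make the whitening claim concrete, the following is a minimal sketch (not taken from the paper's code) of ZCA whitening in feature space: after whitening, the data's second moment matrix becomes the identity, so any information it carried is no longer available to a model trained on the whitened inputs. The paper's theoretical result concerns the sample-sample second moment matrix; this example uses the feature-feature matrix purely for illustration.

```python
import numpy as np

# Illustrative only: ZCA whitening destroys the information in the
# data's second moment matrix by mapping it to the identity.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated data

X = X - X.mean(axis=0)                 # center the data
C = X.T @ X / X.shape[0]               # feature-feature second moment matrix
eigvals, eigvecs = np.linalg.eigh(C)
W = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T  # ZCA whitening matrix
Xw = X @ W                             # whitened data

C_w = Xw.T @ Xw / Xw.shape[0]
print(np.allclose(C_w, np.eye(5), atol=1e-6))  # second moment is now identity
```

Because `C_w` is (numerically) the identity regardless of the original correlations in `X`, two datasets with very different second moment structure become indistinguishable through this statistic after whitening, which is the mechanism by which the paper argues generalization is reduced.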