Paper title
Partial local entropy and anisotropy in deep weight spaces
Paper authors
Paper abstract
We refine a recently proposed class of local entropic loss functions by restricting the smoothening regularization to only a subset of weights. The new loss functions are referred to as partial local entropies. They can adapt to the weight-space anisotropy, thus outperforming their isotropic counterparts. We support the theoretical analysis with experiments on image classification tasks performed with multi-layer, fully connected and convolutional neural networks. The present study suggests how to better exploit the anisotropic nature of deep landscapes and provides direct probes of the shape of the minima encountered by stochastic gradient descent algorithms. As a by-product, we observe an asymptotic dynamical regime at late training times where the temperature of all the layers obeys a common cooling behavior.