Paper Title
Side Effects of Learning from Low-dimensional Data Embedded in a Euclidean Space
Paper Authors
Paper Abstract
The low-dimensional manifold hypothesis posits that the data found in many applications, such as those involving natural images, lie (approximately) on low-dimensional manifolds embedded in a high-dimensional Euclidean space. In this setting, a typical neural network defines a function that takes a finite number of vectors in the embedding space as input. However, one often needs to consider evaluating the optimized network at points outside the training distribution. This paper considers the case in which the training data is distributed in a linear subspace of $\mathbb R^d$. We derive estimates on the variation of the learning function, defined by a neural network, in the direction transversal to the subspace. We study the potential regularization effects associated with the network's depth and noise in the codimension of the data manifold. We also present additional side effects in training due to the presence of noise.
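To make the setting concrete, below is a minimal sketch (not from the paper; the dimensions, network architecture, toy target, and probing procedure are all illustrative assumptions). It trains a small MLP on data confined to a k-dimensional linear subspace of $\mathbb R^d$, then measures how the learned function varies along a direction orthogonal, i.e. transversal, to that subspace.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d, k, n = 20, 3, 512  # ambient dim, subspace dim, sample count (hypothetical values)

# Training inputs lie exactly in a k-dimensional linear subspace of R^d.
basis = torch.linalg.qr(torch.randn(d, k)).Q           # orthonormal basis of the subspace
coords = torch.randn(n, k)                             # intrinsic coordinates
x_train = coords @ basis.T                             # embedded points in R^d
y_train = torch.sin(coords.sum(dim=1, keepdim=True))   # toy target, depends only on subspace coords

model = nn.Sequential(
    nn.Linear(d, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x_train), y_train)
    loss.backward()
    opt.step()

# Probe the optimized network along a unit direction transversal to the subspace:
# points x + t*v lie outside the training distribution for t != 0.
v = torch.randn(d)
v = v - basis @ (basis.T @ v)   # project out the subspace component
v = v / v.norm()

with torch.no_grad():
    base = model(x_train)
    for t in (0.1, 0.5, 1.0, 2.0):
        shifted = model(x_train + t * v)
        print(f"t={t:.1f}  mean |f(x+tv) - f(x)| = {(shifted - base).abs().mean():.4f}")
```

The printed quantity is a crude empirical proxy for the variation the paper estimates analytically; repeating the experiment with deeper networks or with noise added to the transversal coordinates of the training data would be one way to probe the regularization effects the abstract mentions.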