Paper Title
Neural Network Approximation
Paper Authors
Paper Abstract
Neural Networks (NNs) are the method of choice for building learning algorithms. Their popularity stems from their empirical success on several challenging learning problems. However, most scholars agree that a convincing theoretical explanation for this success is still lacking. This article surveys the known approximation properties of the outputs of NNs with the aim of uncovering the properties that are not present in the more traditional methods of approximation used in numerical analysis. Comparisons are made with traditional approximation methods from the viewpoint of rate distortion. Another major component in the analysis of numerical approximation is the computational time needed to construct the approximation, and this in turn is intimately connected with the stability of the approximation algorithm. So the stability of numerical approximation using NNs is a large part of the analysis put forward. The survey, for the most part, is concerned with NNs using the popular ReLU activation function. In this case, the outputs of the NNs are piecewise linear functions on rather complicated partitions of the domain of the target function $f$ into cells that are convex polytopes. When the architecture of the NN is fixed and the parameters are allowed to vary, the set of output functions of the NN is a parameterized nonlinear manifold. It is shown that this manifold has certain space filling properties leading to an increased ability to approximate (better rate distortion) but at the expense of numerical stability. The space filling creates a challenge for numerical methods in finding best or good parameter choices when trying to approximate.
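To make the piecewise linear picture concrete, the following is a minimal sketch (not code from the paper) of a one-hidden-layer ReLU network on a scalar input. For a fixed width, letting the parameters `(W1, b1, w2, b2)` vary traces out the parameterized nonlinear manifold of output functions described in the abstract; all parameter values below are hypothetical and chosen only for illustration.

```python
import numpy as np

def relu(x):
    """ReLU activation: max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

def shallow_relu_net(x, W1, b1, w2, b2):
    """Output of a width-n, one-hidden-layer ReLU network at scalar x.

    The output is a continuous piecewise linear function of x, with
    breakpoints where an entry of W1 * x + b1 changes sign. Fixing the
    architecture (the width n) and varying (W1, b1, w2, b2) sweeps out
    a nonlinear manifold of such piecewise linear functions.
    """
    return w2 @ relu(W1 * x + b1) + b2

# A width-3 example: at most 3 breakpoints (hypothetical parameters).
W1 = np.array([1.0, -1.0, 2.0])
b1 = np.array([0.0, 0.5, -1.0])
w2 = np.array([1.0, 2.0, -0.5])
b2 = 0.3

xs = np.linspace(-2.0, 2.0, 9)
print([round(float(shallow_relu_net(x, W1, b1, w2, b2)), 3) for x in xs])
```

Evaluating on a grid, as above, makes the linear pieces visible: between consecutive breakpoints the sampled values lie on a single affine segment.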