Paper Title
Tensor Programs III: Neural Matrix Laws
Paper Authors
Paper Abstract
In a neural network (NN), *weight matrices* linearly transform inputs into *preactivations*, which are then transformed nonlinearly into *activations*. A typical NN interleaves many such linear and nonlinear transforms to express complex functions. Thus, the (pre-)activations depend on the weights in an intricate manner. We show that, surprisingly, the (pre-)activations of a randomly initialized NN become *independent* of the weights as the NN's widths tend to infinity, in the sense of asymptotic freeness in random matrix theory. We call this the Free Independence Principle (FIP), which has these consequences: 1) It rigorously justifies the calculation of the asymptotic Jacobian singular value distribution of an NN in Pennington et al. [36,37], essential for training ultra-deep NNs [48]. 2) It gives a new justification of the gradient independence assumption used in calculating the Neural Tangent Kernel of a neural network. FIP and these results hold for any neural architecture. We show FIP by proving a Master Theorem for any Tensor Program, as introduced in Yang [50,51], generalizing the Master Theorems proved in those works. As warmup demonstrations of this new Master Theorem, we give new proofs of the semicircle and Marchenko-Pastur laws, benchmarking our framework against these fundamental mathematical results.
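Since the abstract cites the semicircle and Marchenko-Pastur laws as warmup benchmarks, the following is a minimal NumPy sketch, not from the paper and independent of the Tensor Programs framework, that empirically checks both limiting laws by comparing eigenvalue histograms of large random matrices against their closed-form densities. The function names and matrix sizes (`wigner_eigenvalues`, `marchenko_pastur_eigenvalues`, `n`, `p`) are illustrative choices, not anything defined in the source.

```python
# Empirical check of the semicircle and Marchenko-Pastur laws (illustrative sketch only).
import numpy as np

def wigner_eigenvalues(n, rng):
    """Eigenvalues of a symmetric Gaussian (Wigner) matrix, normalized so the
    limiting spectral distribution is the semicircle law on [-2, 2]."""
    A = rng.standard_normal((n, n))
    W = (A + A.T) / np.sqrt(2 * n)
    return np.linalg.eigvalsh(W)

def marchenko_pastur_eigenvalues(n, p, rng):
    """Eigenvalues of the sample covariance X X^T / p for an n x p Gaussian matrix X."""
    X = rng.standard_normal((n, p))
    return np.linalg.eigvalsh(X @ X.T / p)

def semicircle_density(x):
    """Semicircle density sqrt(4 - x^2) / (2*pi) on [-2, 2]."""
    return np.where(np.abs(x) <= 2, np.sqrt(np.maximum(4 - x**2, 0)) / (2 * np.pi), 0.0)

def marchenko_pastur_density(x, ratio):
    """Marchenko-Pastur density with aspect ratio n/p = ratio (assumed <= 1)."""
    lam_minus, lam_plus = (1 - np.sqrt(ratio)) ** 2, (1 + np.sqrt(ratio)) ** 2
    dens = np.zeros_like(x)
    inside = (x >= lam_minus) & (x <= lam_plus)
    dens[inside] = np.sqrt((lam_plus - x[inside]) * (x[inside] - lam_minus)) / (
        2 * np.pi * ratio * x[inside]
    )
    return dens

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 2000, 4000  # aspect ratio n/p = 0.5

    checks = [
        ("semicircle", wigner_eigenvalues(n, rng), semicircle_density),
        ("Marchenko-Pastur", marchenko_pastur_eigenvalues(n, p, rng),
         lambda x: marchenko_pastur_density(x, n / p)),
    ]
    # Compare the empirical eigenvalue histogram with the limiting density on a coarse grid.
    for name, eig, density in checks:
        hist, edges = np.histogram(eig, bins=40, density=True)
        centers = (edges[:-1] + edges[1:]) / 2
        max_err = np.max(np.abs(hist - density(centers)))
        print(f"{name}: max |empirical - limiting density| over bins = {max_err:.3f}")
```

For large `n` the reported deviations should be small, reflecting the convergence of the empirical spectra to the two limiting laws; the paper's contribution is a proof route to such laws (and to the Free Independence Principle) via its Master Theorem, which this sketch does not attempt to reproduce.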