Paper Title

LU decomposition and Toeplitz decomposition of a neural network

Paper Authors

Yucong Liu, Simiao Jiao, Lek-Heng Lim

Paper Abstract

It is well-known that any matrix $A$ has an LU decomposition. Less well-known is the fact that it has a 'Toeplitz decomposition' $A = T_1 T_2 \cdots T_r$ where $T_i$'s are Toeplitz matrices. We will prove that any continuous function $f : \mathbb{R}^n \to \mathbb{R}^m$ has an approximation to arbitrary accuracy by a neural network that takes the form $L_1 σ_1 U_1 σ_2 L_2 σ_3 U_2 \cdots L_r σ_{2r-1} U_r$, i.e., where the weight matrices alternate between lower and upper triangular matrices, $σ_i(x) := σ(x - b_i)$ for some bias vector $b_i$, and the activation $σ$ may be chosen to be essentially any uniformly continuous nonpolynomial function. The same result also holds with Toeplitz matrices, i.e., $f \approx T_1 σ_1 T_2 σ_2 \cdots σ_{r-1} T_r$ to arbitrary accuracy, and likewise for Hankel matrices. A consequence of our Toeplitz result is a fixed-width universal approximation theorem for convolutional neural networks, which so far have only arbitrary width versions. Since our results apply in particular to the case when $f$ is a general neural network, we may regard them as LU and Toeplitz decompositions of a neural network. The practical implication of our results is that one may vastly reduce the number of weight parameters in a neural network without sacrificing its power of universal approximation. We will present several experiments on real data sets to show that imposing such structures on the weight matrices sharply reduces the number of training parameters with almost no noticeable effect on test accuracy.
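
To make the practical implication concrete, below is a minimal sketch (assuming PyTorch) of layers whose weight matrices are constrained to the Toeplitz or triangular structures described in the abstract. The class names ToeplitzLinear and TriangularLinear, the initialization, and the choice of ReLU as the activation are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn


class ToeplitzLinear(nn.Module):
    """Linear layer whose n x n weight matrix is Toeplitz, so it is
    parameterized by 2n - 1 numbers (one per diagonal) instead of n^2."""

    def __init__(self, n):
        super().__init__()
        self.n = n
        self.diags = nn.Parameter(torch.randn(2 * n - 1) / n ** 0.5)
        # The abstract uses sigma_i(x) := sigma(x - b_i); here the bias b_i
        # is simply subtracted before the next activation.
        self.b = nn.Parameter(torch.zeros(n))

    def forward(self, x):
        i = torch.arange(self.n)
        # Entry (i, j) of a Toeplitz matrix depends only on i - j.
        T = self.diags[i.unsqueeze(1) - i.unsqueeze(0) + self.n - 1]
        return x @ T.T - self.b


class TriangularLinear(nn.Module):
    """Linear layer whose weight is masked to be lower or upper triangular,
    mimicking the alternating L_i / U_i factors of the LU-type network."""

    def __init__(self, n, lower=True):
        super().__init__()
        self.w = nn.Parameter(torch.randn(n, n) / n ** 0.5)
        tri = torch.tril if lower else torch.triu
        self.register_buffer("mask", tri(torch.ones(n, n)))
        self.b = nn.Parameter(torch.zeros(n))

    def forward(self, x):
        return x @ (self.w * self.mask).T - self.b


# Example: a network of the form T_1 sigma_1 T_2 sigma_2 T_3 (r = 3).
n, r = 64, 3
layers = []
for k in range(r):
    layers.append(ToeplitzLinear(n))
    if k < r - 1:
        layers.append(nn.ReLU())
net = nn.Sequential(*layers)

x = torch.randn(8, n)
print(net(x).shape)                               # torch.Size([8, 64])
print(sum(p.numel() for p in net.parameters()))   # O(r n) weights instead of O(r n^2)

The parameter count printed at the end illustrates the reduction claimed in the abstract: each Toeplitz layer stores roughly 3n values rather than a dense n-by-n weight matrix.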
