Paper Title
Topological Insights into Sparse Neural Networks
Paper Authors
Paper Abstract
Sparse neural networks are an effective approach to reducing the resource requirements for deploying deep neural networks. Recently, the concept of adaptive sparse connectivity has emerged, allowing sparse neural networks to be trained from scratch by optimizing the sparse structure during training. However, comparing different sparse topologies and determining how sparse topologies evolve during training, especially when sparse structure optimization is involved, remain challenging open questions. This comparison becomes increasingly complex because the number of possible topological comparisons grows exponentially with the size of the network. In this work, we introduce an approach to understanding and comparing sparse neural network topologies from the perspective of graph theory. We first propose the Neural Network Sparse Topology Distance (NNSTD) to measure the distance between different sparse neural networks. Further, we demonstrate that sparse neural networks can outperform over-parameterized models, even without any further structure optimization. Finally, by quantifying and comparing their topological evolutionary processes, we show that adaptive sparse connectivity can consistently unveil a plenitude of sparse sub-networks with very different topologies that outperform the dense model. The latter finding complements the Lottery Ticket Hypothesis by showing that there is a much more efficient and robust way to find "winning tickets". Altogether, our results begin to enable a better theoretical understanding of sparse neural networks and demonstrate the utility of using graph theory to analyze them.
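For intuition, here is a minimal sketch of the kind of layer-wise, graph-theoretic comparison that NNSTD performs. It assumes each layer is given as a binary connectivity mask of shape (inputs, neurons), measures neuron-to-neuron dissimilarity as the Jaccard distance between their sets of incoming connections, and aligns neurons with an optimal matching so that permutation-equivalent layers compare as equal. The paper's exact NNSTD definition may differ (e.g., in its matching strategy); the function names here are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def layer_distance(mask_a, mask_b):
    """Distance between two sparse layers given as binary masks of shape
    (n_inputs, n_neurons). Each neuron is described by the set of its
    incoming connections; neuron-to-neuron distance is the Jaccard
    distance of those sets, and neurons are aligned one-to-one before
    averaging, so the measure is invariant to neuron permutations."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    # Pairwise intersection sizes between columns (neurons) of the masks.
    inter = a.T.astype(int) @ b.astype(int)
    union = a.sum(axis=0)[:, None] + b.sum(axis=0)[None, :] - inter
    # Empty-vs-empty neuron pairs get similarity 1 (distance 0).
    sim = np.divide(inter, union, out=np.ones_like(union, dtype=float),
                    where=union > 0)
    dist = 1.0 - sim
    # Optimal neuron matching (Hungarian algorithm).
    rows, cols = linear_sum_assignment(dist)
    return dist[rows, cols].mean()

def topology_distance(masks_a, masks_b):
    """Average layer-wise distance between two sparse networks."""
    return float(np.mean([layer_distance(a, b)
                          for a, b in zip(masks_a, masks_b)]))
```

Under these assumptions, two identical masks yield distance 0.0 even after permuting neurons, while networks with disjoint connection sets approach 1.0.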
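The adaptive sparse connectivity mentioned above follows a prune-regrow paradigm in the spirit of Sparse Evolutionary Training (SET). Below is a hedged sketch of a single such step, assuming magnitude-based pruning and random regrowth; the fraction `zeta`, the re-initialization scale, and the function name are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def prune_and_regrow(weights, mask, zeta=0.3, rng=None):
    """One adaptive-sparse-connectivity step (SET-style sketch): drop the
    fraction `zeta` of active connections with the smallest magnitudes,
    then regrow the same number at random inactive positions, keeping
    the total number of connections constant."""
    rng = np.random.default_rng() if rng is None else rng
    active = np.flatnonzero(mask)
    n_drop = int(zeta * active.size)
    # Prune: deactivate the weakest active connections.
    weakest = active[np.argsort(np.abs(weights.flat[active]))[:n_drop]]
    mask.flat[weakest] = 0
    weights.flat[weakest] = 0.0
    # Regrow: activate random positions that are inactive and were not
    # just pruned, re-initializing their weights with small values.
    candidates = np.setdiff1d(np.flatnonzero(mask == 0), weakest)
    grown = rng.choice(candidates, size=n_drop, replace=False)
    mask.flat[grown] = 1
    weights.flat[grown] = rng.normal(0.0, 0.01, size=n_drop)
    return weights, mask
```

Repeating such a step throughout training lets the sparse topology evolve, and it is precisely this topological evolution that a distance measure like NNSTD can quantify.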