Paper Title

Towards Scalable Hyperbolic Neural Networks using Taylor Series Approximations

Paper Authors

Nurendra Choudhary, Chandan K. Reddy

Paper Abstract

Hyperbolic networks have shown prominent improvements over their Euclidean counterparts in several areas involving hierarchical datasets in various domains such as computer vision, graph analysis, and natural language processing. However, their adoption in practice remains restricted due to (i) non-scalability on accelerated deep learning hardware, (ii) vanishing gradients due to the closure of hyperbolic space, and (iii) information loss due to frequent mapping between the local tangent space and the fully hyperbolic space. To tackle these issues, we propose approximating hyperbolic operators using Taylor series expansions, which allows us to reformulate the computationally expensive hyperbolic tangent and hyperbolic cosine functions into more efficient polynomial equivalents. This allows us to retain the benefits of preserving the hierarchical anatomy of hyperbolic space, while maintaining scalability on current accelerated deep learning infrastructure. The polynomial formulation also enables us to utilize advancements in Euclidean networks, such as gradient clipping and ReLU activation, to avoid vanishing gradients and remove errors due to frequent switching between the tangent space and the hyperbolic space. Our empirical evaluation on standard benchmarks in the domains of graph analysis and computer vision shows that our polynomial formulation is as scalable as Euclidean architectures, in terms of both memory and time complexity, while providing results as effective as hyperbolic models. Moreover, our formulation also shows a considerable improvement over its baselines due to our solution to the vanishing gradients and information loss problems.
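To make the reformulation concrete, below is a minimal Python/PyTorch sketch, not the authors' implementation; the truncation orders and the tested input range are illustrative assumptions. It replaces the transcendental tanh and cosh operators with truncated Maclaurin (Taylor at zero) polynomials, which use only additions and multiplications and therefore map well onto accelerated deep learning hardware:

    import torch

    # Truncated Maclaurin series for tanh(x), accurate for small |x|
    # (the full series converges for |x| < pi/2):
    #   tanh(x) ~ x - x^3/3 + 2x^5/15 - 17x^7/315
    def tanh_taylor(x: torch.Tensor) -> torch.Tensor:
        x2 = x * x
        return x * (1 - x2 / 3 + 2 * x2 ** 2 / 15 - 17 * x2 ** 3 / 315)

    # Truncated Maclaurin series for cosh(x):
    #   cosh(x) ~ 1 + x^2/2 + x^4/24 + x^6/720
    def cosh_taylor(x: torch.Tensor) -> torch.Tensor:
        x2 = x * x
        return 1 + x2 / 2 + x2 ** 2 / 24 + x2 ** 3 / 720

    # Sanity check against the exact operators on a small input range.
    x = torch.linspace(-0.5, 0.5, 101)
    assert torch.allclose(tanh_taylor(x), torch.tanh(x), atol=1e-4)
    assert torch.allclose(cosh_taylor(x), torch.cosh(x), atol=1e-6)

Since the polynomial form stays entirely in Euclidean operations, standard Euclidean tools such as torch.nn.utils.clip_grad_norm_ and ReLU activations can then be applied directly to counter vanishing gradients, as the abstract describes.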
