Paper Title

OLLIE: Derivation-based Tensor Program Optimizer

Authors

Liyan Zheng, Haojie Wang, Jidong Zhai, Muyan Hu, Zixuan Ma, Tuowei Wang, Shizhi Tang, Lei Xie, Kezhao Huang, Zhihao Jia

Abstract

Boosting the runtime performance of deep neural networks (DNNs) is critical due to their wide adoption in real-world tasks. Existing approaches to optimizing the tensor algebra expression of a DNN only consider expressions representable by a fixed set of predefined operators, missing possible optimization opportunities between general expressions. We propose OLLIE, the first derivation-based tensor program optimizer. OLLIE optimizes tensor programs by leveraging transformations between general tensor algebra expressions, enabling a significantly larger expression search space that includes those supported by prior work as special cases. OLLIE uses a hybrid derivation-based optimizer that effectively combines explorative and guided derivations to quickly discover highly optimized expressions. Evaluation on seven DNNs shows that OLLIE can outperform existing optimizers by up to 2.73$\times$ (1.46$\times$ on average) on an A100 GPU and up to 2.68$\times$ (1.51$\times$) on a V100 GPU, respectively.
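To make the abstract's core idea concrete, the following is a minimal, hypothetical Python sketch of derivation-based optimization: expressions are represented as small trees, "derivation rules" rewrite them into equivalent forms, and a hybrid search combines a breadth-limited explorative phase with a cost-guided greedy phase. The rules, the cost model, and the `fused_matmul_add` operator are illustrative assumptions for exposition only, not OLLIE's actual rules, cost model, or API.

```python
"""Illustrative sketch of derivation-based expression optimization (not OLLIE's code)."""
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """A tiny expression tree: op is e.g. 'matmul', 'add', 'transpose', or a tensor name."""
    op: str
    args: tuple = ()

    def __repr__(self):
        return f"{self.op}({', '.join(map(repr, self.args))})" if self.args else self.op


def cost(e: Expr) -> int:
    """Toy cost model weighting each operator kind (an assumption, not a real model)."""
    weights = {"matmul": 10, "add": 1, "transpose": 2, "fused_matmul_add": 8}
    return weights.get(e.op, 1) + sum(cost(a) for a in e.args)


def derivations(e: Expr):
    """Yield expressions derivable from `e` in one rewrite step (illustrative rules only)."""
    # Rule 1: a double transpose cancels.
    if e.op == "transpose" and e.args and e.args[0].op == "transpose":
        yield e.args[0].args[0]
    # Rule 2: fuse matmul followed by add into a single (hypothetical) fused operator.
    if e.op == "add" and e.args and e.args[0].op == "matmul":
        yield Expr("fused_matmul_add", (*e.args[0].args, *e.args[1:]))
    # Recurse: any sub-expression may be rewritten in place.
    for i, a in enumerate(e.args):
        for a2 in derivations(a):
            yield Expr(e.op, e.args[:i] + (a2,) + e.args[i + 1:])


def optimize(e: Expr, explore_steps: int = 3) -> Expr:
    """Hybrid search sketch: breadth-limited exploration, then cost-guided greedy descent."""
    frontier, seen = {e}, {e}
    for _ in range(explore_steps):          # explorative derivation
        frontier = {d for x in frontier for d in derivations(x)} - seen
        seen |= frontier
        if not frontier:
            break
    best = min(seen, key=cost)
    while True:                              # guided derivation: follow the cheapest rewrite
        nxt = min(list(derivations(best)) + [best], key=cost)
        if cost(nxt) >= cost(best):
            return best
        best = nxt


if __name__ == "__main__":
    x = Expr("transpose", (Expr("transpose", (Expr("X"),)),))
    prog = Expr("add", (Expr("matmul", (x, Expr("W"))), Expr("b")))
    print("before:", prog, "cost", cost(prog))
    print("after: ", optimize(prog), "cost", cost(optimize(prog)))
```

On the toy input `add(matmul(transpose(transpose(X)), W), b)`, the explorative phase discovers both the transpose cancellation and the fused form, and the guided phase settles on `fused_matmul_add(X, W, b)`, the cheapest expression under the toy cost model.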
