Paper Title

You Only Linearize Once: Tangents Transpose to Gradients

Paper Authors

Alexey Radul, Adam Paszke, Roy Frostig, Matthew Johnson, Dougal Maclaurin

Paper Abstract

Automatic differentiation (AD) is conventionally understood as a family of distinct algorithms, rooted in two "modes" -- forward and reverse -- which are typically presented (and implemented) separately. Can there be only one? Following up on the AD systems developed in the JAX and Dex projects, we formalize a decomposition of reverse-mode AD into (i) forward-mode AD followed by (ii) unzipping the linear and non-linear parts and then (iii) transposition of the linear part. To that end, we define a (substructurally) linear type system that can prove a class of functions are (algebraically) linear. Our main results are that forward-mode AD produces such linear functions, and that we can unzip and transpose any such linear function, conserving cost, size, and linearity. Composing these three transformations recovers reverse-mode AD. This decomposition also sheds light on checkpointing, which emerges naturally from a free choice in unzipping `let` expressions. As a corollary, checkpointing techniques are applicable to general-purpose partial evaluation, not just AD. We hope that our formalization will lead to a deeper understanding of automatic differentiation and that it will simplify implementations, by separating the concerns of differentiation proper from the concerns of gaining efficiency (namely, separating the derivative computation from the act of running it backward).
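
The decomposition described above can be sketched with JAX's public API (an illustrative sketch, not the paper's formal development): `jax.linearize` performs the forward-mode linearization and the unzipping of the primal from the tangent computation, and `jax.linear_transpose` transposes the resulting linear map. The example function `f` and input `x` below are arbitrary choices for illustration.

```python
import jax
import jax.numpy as jnp

def f(x):
    # An arbitrary nonlinear, differentiable example function.
    return jnp.sin(x) * x

x = jnp.array(1.5)

# (i) + (ii): forward-mode AD plus unzipping -- jax.linearize returns the
# primal output together with a (structurally) linear tangent map, i.e. the
# JVP of f at x.
y, jvp_at_x = jax.linearize(f, x)

# (iii): transposition of the linear part -- jax.linear_transpose turns the
# JVP into a VJP without re-running the nonlinear computation; x is passed
# only to supply the input shape/dtype.
vjp_at_x = jax.linear_transpose(jvp_at_x, x)

# Applying the transposed map to the output cotangent 1.0 yields the
# gradient, agreeing with jax.grad on the same function.
(grad_via_transpose,) = vjp_at_x(jnp.ones_like(y))
print(grad_via_transpose)   # d/dx [x * sin(x)] at x = 1.5
print(jax.grad(f)(x))       # same value
```

Composing the two calls above is exactly the "forward-mode, unzip, transpose" pipeline that the paper formalizes as recovering reverse-mode AD.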
