Paper Title
On the detrimental effect of invariances in the likelihood for variational inference
Paper Authors
Paper Abstract
Variational Bayesian posterior inference often requires simplifying approximations such as mean-field parametrisation to ensure tractability. However, prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes. In this work, we show that invariances in the likelihood function of over-parametrised models contribute to this phenomenon because these invariances complicate the structure of the posterior by introducing discrete and/or continuous modes which cannot be well approximated by Gaussian mean-field distributions. In particular, we show that the mean-field approximation has an additional gap in the evidence lower bound compared to a purpose-built posterior that takes into account the known invariances. Importantly, this invariance gap is not constant; it vanishes as the approximation reverts to the prior. We proceed by first considering translation invariances in a linear model with a single data point in detail. We show that, while the true posterior can be constructed from a mean-field parametrisation, this is achieved only if the objective function takes into account the invariance gap. Then, we transfer our analysis of the linear model to neural networks. Our analysis provides a framework for future work to explore solutions to the invariance problem.
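
The mechanism the abstract describes can be made concrete in a two-parameter toy model. The sketch below is a minimal numpy illustration of our own, not the paper's construction: the model y = (w1 + w2)x + noise, the parameter names, and the closed-form mean-field optimum are all assumptions. It shows how a translation invariance along the direction (1, -1) produces a correlated exact posterior, and how the resulting ELBO gap of the best diagonal Gaussian grows when the likelihood dominates and vanishes as the posterior reverts to the prior.

import numpy as np

# Illustrative toy model, not taken from the paper.
# Over-parametrised linear model with a translation invariance:
# y = (w1 + w2) * x + noise. The likelihood depends on w only through
# w1 + w2, so it is constant along the direction (1, -1). With a
# standard-normal prior and Gaussian noise, the exact posterior is a
# correlated Gaussian; a mean-field (diagonal) Gaussian cannot represent
# that correlation, and its KL to the posterior is exactly the extra
# ELBO gap, since ELBO(q) = log Z - KL(q || posterior).

def mean_field_gap(x=1.0, sigma=0.1):
    a = np.array([1.0, 1.0])                  # y depends on w via a @ w only
    precision = np.eye(2) + np.outer(a, a) * (x ** 2 / sigma ** 2)
    cov = np.linalg.inv(precision)            # exact posterior covariance
    # The ELBO-optimal mean-field q matches the posterior mean, and its
    # variances are the reciprocals of the diagonal of the precision
    # matrix; its KL to the posterior then reduces to the closed form
    # 0.5 * (log det cov + sum_i log precision_ii).
    return 0.5 * (np.linalg.slogdet(cov)[1] + np.log(np.diag(precision)).sum())

for sigma in (0.1, 1.0, 10.0, 100.0):
    print(f"sigma={sigma:6.1f}  ELBO gap = {mean_field_gap(sigma=sigma):.6f} nats")

Running this prints a gap of roughly 1.96 nats at sigma=0.1, about 0.14 nats at sigma=1.0, and essentially zero at sigma=100, where the posterior has reverted to the factorised prior, mirroring the abstract's point that the invariance gap is not constant.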