Paper Title
A Mathematical Framework for Learning Probability Distributions
Paper Authors
Paper Abstract
The modeling of probability distributions, specifically generative modeling and density estimation, has become an immensely popular subject in recent years by virtue of its outstanding performance on sophisticated data such as images and text. Nevertheless, a theoretical understanding of its success is still incomplete. One mystery is the paradox between memorization and generalization: in theory, the model is trained to be exactly the same as the empirical distribution of the finite samples, whereas in practice, the trained model can generate new samples or estimate the likelihood of unseen samples. Likewise, the overwhelming diversity of distribution learning models calls for a unified perspective on this subject. This paper provides a mathematical framework such that all the well-known models can be derived based on simple principles. To demonstrate its efficacy, we present a survey of our results on the approximation error, training error, and generalization error of these models, all of which can be established based on this framework. In particular, the aforementioned paradox is resolved by proving that these models enjoy implicit regularization during training, so that the generalization error at early stopping avoids the curse of dimensionality. Furthermore, we provide some new results on landscape analysis and the mode collapse phenomenon.
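The memorization-generalization tension described in the abstract can be illustrated with a toy experiment that is not from the paper itself: a Gaussian kernel density estimator whose bandwidth shrinks toward zero converges to the empirical distribution of the training samples (pure memorization), while an intermediate bandwidth, playing the role of early stopping, scores best on unseen samples. All names, sample sizes, and bandwidth values below are illustrative assumptions; this is a minimal sketch of the phenomenon, not the paper's models or proofs.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(size=50)    # finite samples from the true distribution N(0, 1)
test = rng.normal(size=1000)   # "unseen" samples from the same distribution

def kde_loglik(x, data, h):
    # Mean log-likelihood of points x under a Gaussian KDE fit to `data`
    # with bandwidth h; densities are floored to avoid log(0) underflow.
    d = x[:, None] - data[None, :]
    dens = np.exp(-0.5 * (d / h) ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))
    return np.log(np.maximum(dens, 1e-300)).mean()

# Shrinking h is the analogue of training longer: as h -> 0 the KDE
# collapses onto the empirical distribution, so the training likelihood
# keeps improving while the held-out likelihood eventually deteriorates.
for h in [1.0, 0.5, 0.1, 0.01]:
    print(f"h={h:5.2f}  held-out log-likelihood={kde_loglik(test, train, h):9.3f}")
```

The held-out log-likelihood peaks at a moderate bandwidth and collapses as the bandwidth shrinks, which is the one-dimensional analogue of the claim that generalization error is controlled at early stopping but not at the end of training.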