Paper title
Unifying supervised learning and VAEs -- coverage, systematics and goodness-of-fit in normalizing-flow based neural network models for astro-particle reconstructions
Paper authors
Paper abstract
Neural-network based predictions of event properties in astro-particle physics are becoming more and more common. However, in many cases the result is only used as a point prediction. Statistical uncertainties, coverage, systematic uncertainties, and goodness-of-fit measures are often not calculated. Here we describe a choice of training scheme and network architecture that allows all of these properties to be incorporated into a single network model. We show that a KL-divergence objective on the joint distribution of data and labels unifies supervised learning and variational autoencoders (VAEs) under the single umbrella of stochastic variational inference. This unification motivates an extended supervised learning scheme which makes it possible to calculate a goodness-of-fit p-value for the neural network model. Conditional normalizing flows amortized with a neural network are crucial in this construction. We discuss how coverage probabilities can be calculated without numerical integration for specific "base-ordered" contours that are unique to normalizing flows. Furthermore, we show how systematic uncertainties can be included via effective marginalization during training. The proposed extended supervised training thus incorporates (1) coverage calculation, (2) systematics, and (3) a goodness-of-fit measure in a single machine-learning model. In principle there are no constraints on the shape of the involved distributions; in practice the machinery works with complex multi-modal distributions defined on product spaces like $\mathbb{R}^n \times \mathbb{S}^m$. The coverage calculation, however, requires care in its interpretation when the distributions are too degenerate. We see great potential for exploiting this per-event information in event selections or for fast astronomical alerts that require uncertainty guarantees.
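To make the construction concrete, below is a minimal, hypothetical Python/PyTorch sketch of three ingredients mentioned in the abstract: a conditional normalizing flow amortized by a neural network (reduced here to a single affine transform for brevity; the paper places no such restriction), a "base-ordered" coverage check without numerical integration (assuming a standard-normal base distribution, where the mass enclosed by a base-space radius follows a chi-square CDF), and a loss that marginalizes over a toy systematic variation during training. All names, layer sizes, and the specific form of the systematic are illustrative assumptions, not taken from the paper.

```python
import math
import torch
import torch.nn as nn
from scipy.stats import chi2


class ConditionalAffineFlow(nn.Module):
    """q(y|x): a standard-normal base distribution pushed through an affine
    transform whose parameters are predicted from the observed data x
    (amortization). A real application would stack more expressive flow layers."""

    def __init__(self, data_dim: int, label_dim: int):
        super().__init__()
        self.label_dim = label_dim
        self.net = nn.Sequential(
            nn.Linear(data_dim, 64), nn.ReLU(),
            nn.Linear(64, 2 * label_dim),
        )

    def base_coords(self, x, y):
        mu, log_sigma = self.net(x).chunk(2, dim=-1)
        z = (y - mu) * torch.exp(-log_sigma)   # inverse flow y -> z
        log_det = -log_sigma.sum(-1)           # log |det dz/dy|
        return z, log_det

    def log_prob(self, x, y):
        # Change of variables: q(y|x) = N(z; 0, I) * |det dz/dy|
        z, log_det = self.base_coords(x, y)
        log_base = -0.5 * (z ** 2).sum(-1) - 0.5 * self.label_dim * math.log(2 * math.pi)
        return log_base + log_det


def nll_loss(model, x, y):
    # The supervised part of the joint KL objective reduces to the
    # negative log-likelihood of the labels under the flow.
    return -model.log_prob(x, y).mean()


def marginalized_nll(model, x, y, sys_sigma=0.1, n_samples=4):
    # Toy stand-in for "effective marginalization during training":
    # average the likelihood over sampled systematic variations of the input.
    log_ps = torch.stack([
        model.log_prob(x * (1.0 + sys_sigma * torch.randn_like(x)), y)
        for _ in range(n_samples)
    ])
    return -(torch.logsumexp(log_ps, dim=0) - math.log(n_samples)).mean()


def base_ordered_covered(model, x, y_true, alpha=0.683):
    # A contour ordered by base-space density encloses exactly the mass
    # chi2.cdf(r^2, label_dim), so checking coverage needs no integration:
    # the true label lies inside the alpha-contour iff its base-space
    # radius falls below the corresponding chi-square quantile.
    with torch.no_grad():
        z, _ = model.base_coords(x, y_true)
        r2 = (z ** 2).sum(-1)
    return r2 <= chi2.ppf(alpha, df=model.label_dim)
```

Averaged over many events, `base_ordered_covered(..., alpha)` should return `True` with frequency `alpha` if the model is calibrated, which is the kind of per-event coverage diagnostic the abstract advertises; the chi-square shortcut relies on the flow being a bijection, so probability mass computed in base space carries over to label space unchanged.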