Title
Model selection and signal extraction using Gaussian Process regression
Authors
Abstract
We present a novel computational approach for extracting weak signals, whose exact location and width may be unknown, from complex background distributions of arbitrary functional form. We focus on datasets that can be naturally represented as binned integer counts, and demonstrate our approach on the CERN open dataset from the ATLAS collaboration at the Large Hadron Collider, which contains the Higgs boson signature. Our approach is based on Gaussian Process (GP) regression, a powerful and flexible machine learning technique that allows us to model the background without specifying its functional form explicitly, and to separate the background and signal contributions in a robust and reproducible manner. Unlike functional fits, our GP-regression-based approach does not need to be constantly updated as more data become available. We discuss how to select the GP kernel type, considering the trade-off between kernel complexity and the kernel's ability to capture the features of the background distribution. We show that our GP framework can be used to detect the Higgs boson resonance in the data with greater statistical significance than a polynomial fit specifically tailored to the dataset. Finally, we use Markov Chain Monte Carlo (MCMC) sampling to confirm the statistical significance of the extracted Higgs signature.
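To make the idea of a GP background model on binned counts concrete, here is a minimal sketch. It is not the authors' implementation, and everything in it is an assumption chosen for illustration: the synthetic falling spectrum with a bump near 125, the squared-exponential (RBF) kernel, the fixed hyperparameters (`amplitude`, `length_scale`), the constant prior mean, and the Gaussian approximation that sets each bin's noise variance equal to its count. It also masks a window around a known bump position, whereas the approach described in the abstract does not require the signal location to be known in advance.

```python
import numpy as np


def rbf_kernel(xa, xb, amplitude, length_scale):
    """Squared-exponential covariance between two sets of 1D points."""
    diff = xa[:, None] - xb[None, :]
    return amplitude**2 * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior_mean(x_train, y_train, noise_var, x_test, amplitude, length_scale):
    """GP posterior mean at x_test, given noisy observations at x_train.

    Uses a constant prior mean (the average of y_train), a simplification.
    """
    prior_mean = y_train.mean()
    K = rbf_kernel(x_train, x_train, amplitude, length_scale) + np.diag(noise_var)
    K_star = rbf_kernel(x_test, x_train, amplitude, length_scale)
    weights = np.linalg.solve(K, y_train - prior_mean)
    return prior_mean + K_star @ weights


# Synthetic binned spectrum (all numbers are illustrative, not ATLAS data).
rng = np.random.default_rng(0)
x = np.arange(105.5, 160.0, 1.0)  # bin centres, 1-unit-wide bins
true_background = 10000.0 * np.exp(-(x - 105.0) / 30.0)
true_signal = 400.0 * np.exp(-0.5 * ((x - 125.0) / 2.0) ** 2)
counts = rng.poisson(true_background + true_signal).astype(float)

# Fit the GP to the sidebands only, then predict the background in every bin.
sideband = np.abs(x - 125.0) > 6.0
background = gp_posterior_mean(
    x[sideband],
    counts[sideband],
    counts[sideband],  # Poisson variance approximated by the observed count
    x,
    amplitude=counts[sideband].std(),
    length_scale=20.0,
)

# Signal estimate: observed counts minus predicted background in the window.
excess = float(np.sum((counts - background)[~sideband]))
print(f"Excess events in the masked window: {excess:.0f}")
```

In practice the kernel type and its hyperparameters would be selected from the data, for example by maximizing the marginal likelihood or by sampling them, rather than fixed by hand as in this sketch.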