Paper Title
Learning Kernel Tests Without Data Splitting
Paper Authors
Paper Abstract
Modern large-scale kernel-based tests such as maximum mean discrepancy (MMD) and kernelized Stein discrepancy (KSD) optimize kernel hyperparameters on a held-out sample via data splitting to obtain the most powerful test statistics. While data splitting results in a tractable null distribution, it suffers from a reduction in test power due to smaller test sample size. Inspired by the selective inference framework, we propose an approach that enables learning the hyperparameters and testing on the full sample without data splitting. Our approach can correctly calibrate the test in the presence of such dependency, and yield a test threshold in closed form. At the same significance level, our approach's test power is empirically larger than that of the data-splitting approach, regardless of its split proportion.
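For context, the sketch below illustrates the data-splitting baseline that the abstract describes (and that the paper improves upon), not the paper's own selective-inference method. It is a minimal sketch under stated assumptions: a Gaussian kernel, a biased (V-statistic) MMD estimate, and selection of the bandwidth that maximizes the statistic on the training half, whereas the baseline in the literature typically maximizes an asymptotic power criterion. All function and variable names (gaussian_kernel, mmd2_biased, split_mmd_test, the candidate bandwidths) are illustrative, not from the paper.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth):
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dists = (np.sum(X**2, 1)[:, None]
                + np.sum(Y**2, 1)[None, :]
                - 2 * X @ Y.T)
    return np.exp(-sq_dists / (2 * bandwidth**2))

def mmd2_biased(X, Y, bandwidth):
    """Biased (V-statistic) estimate of the squared MMD between samples X and Y."""
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

def split_mmd_test(X, Y, bandwidths, split=0.5, n_perms=200, alpha=0.05, seed=0):
    """Data-splitting baseline: choose the bandwidth on the training half,
    then run a permutation MMD test on the held-out half only.

    The held-out half is what makes the null distribution tractable, and also
    what costs test power: only (1 - split) of the sample is used for testing.
    """
    rng = np.random.default_rng(seed)
    n = min(len(X), len(Y))
    m = int(split * n)
    # Select the bandwidth on the training portion. Maximizing the raw MMD^2
    # is a crude proxy; the standard baseline maximizes an estimate of
    # asymptotic test power instead (assumption made here for brevity).
    bw = max(bandwidths, key=lambda b: mmd2_biased(X[:m], Y[:m], b))
    Xte, Yte = X[m:n], Y[m:n]
    stat = mmd2_biased(Xte, Yte, bw)
    # Permutation null: pool the held-out samples and reshuffle the labels.
    pooled = np.vstack([Xte, Yte])
    null = []
    for _ in range(n_perms):
        perm = rng.permutation(len(pooled))
        null.append(mmd2_biased(pooled[perm[:len(Xte)]],
                                pooled[perm[len(Xte):]], bw))
    threshold = np.quantile(null, 1 - alpha)
    return stat, threshold, stat > threshold
```

The paper's approach removes the split entirely: hyperparameters are learned and the test is run on the full sample, with the null distribution adjusted for the resulting selection dependency so that the threshold is available in closed form rather than by permutation.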