超越最佳：估计无限武器匪徒中的分布功能

论文标题

超越最佳：估计无限武器匪徒中的分布功能

Beyond the Best: Estimating Distribution Functionals in Infinite-Armed Bandits

论文作者

Wang, Yifei, Baharav, Tavor, Han, Yanjun, Jiao, Jiantao, Tse, David

论文摘要

在无限武器的强盗问题中，每个臂的平均奖励都是从未知分布中取样的，并且可以进一步采样每个手臂，以获取该臂平均奖励的嘈杂估计。先前的工作着重于确定最佳臂，即估计平均奖励分布的最大值。我们考虑了超出最大值的一般分配功能，并为离线和在线设置提出了统一的元算法，从而达到了最佳样本复杂性。我们表明，在线估计中，学习者可以依次选择采样新的或现有的ARM，在离线设置中没有任何优势来估计平均功能，但大大降低了其他功能的样品复杂性，例如中位数，最大和修剪平均值。匹配的下边界利用了几个不同的Wasserstein距离。对于中位数估计的特殊情况，我们确定了关于高斯卷积在噪声水平之间的不可区分性的奇怪阈值现象，这可能具有独立的兴趣。

In the infinite-armed bandit problem, each arm's average reward is sampled from an unknown distribution, and each arm can be sampled further to obtain noisy estimates of the average reward of that arm. Prior work focuses on identifying the best arm, i.e., estimating the maximum of the average reward distribution. We consider a general class of distribution functionals beyond the maximum, and propose unified meta algorithms for both the offline and online settings, achieving optimal sample complexities. We show that online estimation, where the learner can sequentially choose whether to sample a new or existing arm, offers no advantage over the offline setting for estimating the mean functional, but significantly reduces the sample complexity for other functionals such as the median, maximum, and trimmed mean. The matching lower bounds utilize several different Wasserstein distances. For the special case of median estimation, we identify a curious thresholding phenomenon on the indistinguishability between Gaussian convolutions with respect to the noise level, which may be of independent interest.

下载PDF全文

下载文献需遵守相关版权规定

论文标题