Adaptive Sampling for Discovery

Paper Title

Adaptive Sampling for Discovery

Authors

Ziping Xu, Eunjae Shim, Ambuj Tewari, Paul Zimmerman

Abstract

In this paper, we study a sequential decision-making problem called Adaptive Sampling for Discovery (ASD). Starting with a large unlabeled dataset, algorithms for ASD adaptively label points with the goal of maximizing the sum of responses. This problem has wide applications to real-world discovery problems, for example, drug discovery with the help of machine learning models. ASD algorithms face the well-known exploration-exploitation dilemma: the algorithm needs to choose points that yield information to improve model estimates, but it also needs to exploit the model. We rigorously formulate the problem and propose a general information-directed sampling (IDS) algorithm. We provide theoretical guarantees for the performance of IDS in linear, graph, and low-rank models. The benefits of IDS are shown in both simulation experiments and real-data experiments for discovering chemical reaction conditions.
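
The labeling loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a Bayesian linear model with a Gaussian prior and known noise level, and it uses a simplified deterministic, variance-based stand-in for information-directed sampling: at each step it labels the unlabeled point with the smallest ratio of squared estimated regret to information gain about the parameter. The names run_asd, oracle, noise_sd, prior_var, and n_post are hypothetical and chosen only for illustration.

```python
import numpy as np

def run_asd(X, oracle, budget, noise_sd=0.1, prior_var=1.0, n_post=200, seed=0):
    """Label `budget` distinct points from the pool X (n x d).

    oracle(i) returns the noisy response of pool point i (hypothetical interface).
    Returns the chosen indices and the observed responses.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    precision = np.eye(d) / prior_var   # posterior precision of theta
    b = np.zeros(d)                     # accumulates x * y / noise_sd^2
    unlabeled = np.ones(n, dtype=bool)
    chosen, responses = [], []
    for _ in range(budget):
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2         # symmetrize against round-off
        mean = cov @ b
        idx = np.flatnonzero(unlabeled)
        Xu = X[idx]
        # Posterior samples of the response at every unlabeled point.
        thetas = rng.multivariate_normal(mean, cov, size=n_post)
        sampled = thetas @ Xu.T         # shape (n_post, len(idx))
        # Estimated regret: E[best response] minus E[response at x].
        regret = sampled.max(axis=1).mean() - sampled.mean(axis=0)
        # Information gain about theta from one noisy observation at x.
        var = np.einsum("ij,jk,ik->i", Xu, cov, Xu)
        info = 0.5 * np.log1p(var / noise_sd**2)
        ratio = regret**2 / np.maximum(info, 1e-12)
        pick = int(idx[np.argmin(ratio)])
        y = float(oracle(pick))
        # Conjugate Gaussian update, then remove the point from the pool.
        precision += np.outer(X[pick], X[pick]) / noise_sd**2
        b += X[pick] * y / noise_sd**2
        unlabeled[pick] = False
        chosen.append(pick)
        responses.append(y)
    return chosen, responses
```

The quantity ASD aims to maximize is sum(responses). A point with high posterior uncertainty has a large information gain and therefore a small ratio even when its estimated regret is not the lowest, which is how the sketch trades exploration against exploitation.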
