Paper Title

Adaptive Sampling for Discovery

Authors

Ziping Xu, Eunjae Shim, Ambuj Tewari, Paul Zimmerman

Abstract

In this paper, we study a sequential decision-making problem called Adaptive Sampling for Discovery (ASD). Starting with a large unlabeled dataset, algorithms for ASD adaptively label points with the goal of maximizing the sum of responses. This problem has wide applications to real-world discovery problems, for example, drug discovery with the help of machine learning models. ASD algorithms face the well-known exploration-exploitation dilemma: the algorithm needs to choose points that yield information to improve model estimates, but it also needs to exploit the model. We rigorously formulate the problem and propose a general information-directed sampling (IDS) algorithm. We provide theoretical guarantees for the performance of IDS in linear, graph, and low-rank models. The benefits of IDS are shown in both simulation experiments and real-data experiments for discovering chemical reaction conditions.
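
To make the setting in the abstract concrete, here is a minimal sketch of an ASD loop with an IDS-style selection rule. It is not the paper's algorithm. It assumes a Bayesian linear model with a Gaussian prior and Gaussian noise, approximates regret with an upper/lower confidence bound gap, and measures information as the mutual information between the response and the model parameters. The function name `asd_ids_sketch` and the parameters `noise_var`, `prior_var`, and `beta` are hypothetical choices made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def asd_ids_sketch(pool, oracle, budget, noise_var=1.0, prior_var=1.0, beta=2.0):
    """Label up to `budget` distinct pool points, aiming for a large response sum.

    pool   : list of feature vectors (the unlabeled dataset)
    oracle : callable index -> observed response (the labeling step)
    Returns (chosen indices in order, sum of observed responses).
    All modeling choices here are assumptions, not the paper's method.
    """
    d = len(pool[0])
    mu = [0.0] * d  # posterior mean of the linear parameter
    S = [[prior_var if i == j else 0.0 for j in range(d)] for i in range(d)]
    unlabeled = set(range(len(pool)))
    chosen, total = [], 0.0

    for _ in range(min(budget, len(pool))):
        # Posterior mean and variance of the response at each unlabeled point.
        stats = {}
        for i in unlabeled:
            x = pool[i]
            Sx = [dot(row, x) for row in S]
            stats[i] = (dot(x, mu), max(dot(x, Sx), 0.0))

        best_ucb = max(m + beta * math.sqrt(v) for m, v in stats.values())

        def info_ratio(i):
            m, v = stats[i]
            gap = best_ucb - (m - beta * math.sqrt(v))  # regret surrogate
            gain = 0.5 * math.log1p(v / noise_var)      # Gaussian information gain
            return gap * gap / max(gain, 1e-12)

        # Sorting makes tie-breaking deterministic.
        pick = min(sorted(unlabeled), key=info_ratio)
        y = oracle(pick)
        unlabeled.remove(pick)  # each point can be labeled only once
        chosen.append(pick)
        total += y

        # Rank-one (Sherman-Morrison) update of the Gaussian posterior.
        x = pool[pick]
        Sx = [dot(row, x) for row in S]
        denom = noise_var + dot(x, Sx)
        resid = y - dot(x, mu)
        mu = [m + s * resid / denom for m, s in zip(mu, Sx)]
        S = [[S[i][j] - Sx[i] * Sx[j] / denom for j in range(d)] for i in range(d)]

    return chosen, total


if __name__ == "__main__":
    # Toy pool with a hidden linear response; values are made up for illustration.
    pool = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.5, -1.0]]
    hidden = [1.0, -0.5]
    picks, reward = asd_ids_sketch(pool, lambda i: dot(pool[i], hidden), budget=3)
    print(picks, reward)
```

The ratio of squared regret surrogate to information gain is what trades off exploitation against exploration here: a point is attractive when it either looks rewarding or is cheap to learn from. Removing each labeled point from the pool reflects the discovery setting, where a point cannot be labeled twice.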
