Paper Title

Offline Evaluation of Reward-Optimizing Recommender Systems: The Case of Simulation

Paper Authors

Aouali, Imad, Benhalloum, Amine, Bompaire, Martin, Heymann, Benjamin, Jeunen, Olivier, Rohde, David, Sakhi, Otmane, Vasile, Flavian

Paper Abstract

Both in academic and industry-based research, online evaluation methods are seen as the gold standard for interactive applications like recommendation systems. Naturally, the reason for this is that we can directly measure utility metrics that rely on interventions, namely the recommendations that are shown to users. Nevertheless, online evaluation methods are costly for a number of reasons, and a clear need remains for reliable offline evaluation procedures. In industry, offline metrics are often used as a first-line evaluation to generate promising candidate models to evaluate online. In academic work, limited access to online systems makes offline metrics the de facto approach to validating novel methods. Two classes of offline metrics exist: proxy-based methods and counterfactual methods. The first class is often poorly correlated with the online metrics we care about, and the latter class only provides theoretical guarantees under assumptions that cannot be fulfilled in real-world environments. Here, we make the case that simulation-based comparisons provide ways forward beyond offline metrics, and argue that they are a preferable means of evaluation.
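The counterfactual methods mentioned in the abstract typically reweight logged feedback to estimate how a new policy would have performed. A minimal sketch of the canonical Inverse Propensity Scoring (IPS) estimator is shown below; the function name and toy inputs are illustrative, not taken from the paper, and real off-policy evaluation additionally has to contend with the assumptions (e.g. full support of the logging policy) that the abstract argues rarely hold in practice.

```python
import numpy as np

def ips_estimate(rewards, logging_propensities, target_propensities):
    """Illustrative IPS estimator of a target policy's expected reward.

    Each logged interaction carries: the observed reward, the probability
    the logging policy assigned to the logged action, and the probability
    the target policy would assign to that same action. IPS reweights each
    reward by the ratio of those probabilities and averages the result.
    """
    rewards = np.asarray(rewards, dtype=float)
    log_p = np.asarray(logging_propensities, dtype=float)
    tgt_p = np.asarray(target_propensities, dtype=float)
    # Importance weights: how much more (or less) likely the target
    # policy is to take the logged action than the logging policy was.
    weights = tgt_p / log_p
    return float(np.mean(weights * rewards))

# Toy usage: three logged clicks/non-clicks under a uniform logging
# policy (propensity 0.5); the target policy deterministically picks
# the first and third logged actions.
value = ips_estimate(
    rewards=[1, 0, 1],
    logging_propensities=[0.5, 0.5, 0.5],
    target_propensities=[1.0, 0.0, 1.0],
)
```

The estimator is unbiased under the stated support assumption, but its variance grows with the mismatch between the two policies, which is one reason the abstract treats such guarantees as fragile in real-world settings.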
