KuaiRec: A Fully-observed Dataset and Insights for Evaluating Recommender Systems

Paper title

KuaiRec: A Fully-observed Dataset and Insights for Evaluating Recommender Systems

Authors

Chongming Gao, Shijun Li, Wenqiang Lei, Jiawei Chen, Biao Li, Peng Jiang, Xiangnan He, Jiaxin Mao, Tat-Seng Chua

Abstract

The progress of recommender systems is hampered mainly by evaluation as it requires real-time interactions between humans and systems, which is too laborious and expensive. This issue is usually approached by utilizing the interaction history to conduct offline evaluation. However, existing datasets of user-item interactions are partially observed, leaving it unclear how and to what extent the missing interactions will influence the evaluation. To answer this question, we collect a fully-observed dataset from Kuaishou's online environment, where almost all 1,411 users have been exposed to all 3,327 items. To the best of our knowledge, this is the first real-world fully-observed data with millions of user-item interactions. With this unique dataset, we conduct a preliminary analysis of how the two factors - data density and exposure bias - affect the evaluation results of multi-round conversational recommendation. Our main discoveries are that the performance ranking of different methods varies with the two factors, and this effect can only be alleviated in certain cases by estimating missing interactions for user simulation. This demonstrates the necessity of the fully-observed dataset. We release the dataset and the pipeline implementation for evaluation at https://kuairec.com
