Paper Title
Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents? (CORRECTED VERSION)
Paper Authors
Paper Abstract
In the context of depth-$k$ pooling for constructing web search test collections, we compare two approaches to ordering pooled documents for relevance assessors: the prioritisation strategy (PRI) used widely at NTCIR, and the simple randomisation strategy (RND). To address research questions regarding PRI and RND, we have constructed and released the WWW3E8 data set, which contains eight independent relevance labels for 32,375 topic-document pairs, i.e., a total of 259,000 labels. Four of the eight relevance labels were obtained from PRI-based pools; the other four were obtained from RND-based pools. Using WWW3E8, we compare PRI and RND in terms of inter-assessor agreement, system ranking agreement, and robustness to new systems that did not contribute to the pools. We also utilise an assessor activity log, obtained as a byproduct of building WWW3E8, to compare the two strategies in terms of assessment efficiency.
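To make the two ordering strategies concrete, the following is a minimal Python sketch of depth-$k$ pooling followed by RND and PRI ordering. The abstract does not specify how PRI sorts documents; the sort key used here (number of runs that returned the document within the top $k$, then the sum of its ranks in those runs) is an assumption made for illustration, and the function names `depth_k_pool`, `order_rnd`, and `order_pri` are hypothetical.

```python
import random
from collections import defaultdict


def depth_k_pool(runs, k):
    """Return the union of the top-k documents of every run.

    `runs` maps a run name to a ranked list of document IDs.
    """
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


def order_rnd(pool, seed=0):
    """RND: present the pooled documents to the assessor in random order."""
    docs = sorted(pool)  # sort first so the shuffle is reproducible for a given seed
    random.Random(seed).shuffle(docs)
    return docs


def order_pri(runs, k):
    """PRI (assumed sort key): documents returned by more runs come first;
    ties are broken by the smaller sum of ranks, then by document ID."""
    votes = defaultdict(int)     # number of runs with the document in their top k
    rank_sum = defaultdict(int)  # sum of the document's ranks across those runs
    for ranking in runs.values():
        for rank, doc in enumerate(ranking[:k], start=1):
            votes[doc] += 1
            rank_sum[doc] += rank
    return sorted(votes, key=lambda d: (-votes[d], rank_sum[d], d))


# Toy example with three hypothetical runs and pool depth k = 2.
runs = {
    "A": ["d1", "d2", "d3"],
    "B": ["d2", "d4", "d1"],
    "C": ["d2", "d5", "d6"],
}
pool = depth_k_pool(runs, k=2)
pri_order = order_pri(runs, k=2)
rnd_order = order_rnd(pool, seed=42)
```

Both orderings cover exactly the same pooled documents; only the order in which an assessor sees them differs, which is the variable the paper studies.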