Paper Title
Two-Sample Testing on Ranked Preference Data and the Role of Modeling Assumptions
Authors
Abstract
A number of applications require two-sample testing on ranked preference data. For instance, in crowdsourcing, there is a long-standing question of whether pairwise comparison data provided by people is distributed similarly to ratings converted to comparisons. Other examples include sports data analysis and peer grading. In this paper, we design two-sample tests for pairwise comparison data and ranking data. For our two-sample test for pairwise comparison data, we establish an upper bound on the sample complexity required to correctly distinguish between the distributions of the two sets of samples. Our test requires essentially no assumptions on the distributions. We then prove complementary lower bounds showing that our results are tight (in the minimax sense) up to constant factors. We investigate the role of modeling assumptions by proving lower bounds for a range of pairwise comparison models: weak, moderate and strong stochastic transitivity (WST, MST and SST), and parameter-based models such as Bradley-Terry-Luce (BTL) and Thurstone. We also provide testing algorithms and associated sample complexity bounds for the problem of two-sample testing with partial (or total) ranking data. Furthermore, we empirically evaluate our results via extensive simulations as well as two real-world datasets consisting of pairwise comparisons. By applying our two-sample test to real-world pairwise comparison data, we conclude that ratings and rankings provided by people are indeed distributed differently. On the other hand, our test detects no significant difference in the relative performance of European football teams across two seasons. Finally, we apply our two-sample test to a real-world partial and total ranking dataset and find a statistically significant difference in sushi preferences across demographic divisions based on gender, age and region of residence.
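The abstract does not spell out the test statistic, so the following Python sketch is only a hedged illustration of what a two-sample test on pairwise comparison data can look like. It assumes both samples are generated by a BTL model with every pair (i, j), i < j, compared a fixed number of times. Its statistic is an unbiased estimate of the sum of squared differences between the two samples' pairwise win probabilities (a squared Frobenius distance restricted to pairs i < j), calibrated by a permutation test. The function names (`btl_prob`, `sample_comparisons`, `frobenius_stat`, `permutation_test`) and all parameter values are hypothetical, and this is not presented as the paper's exact procedure.

```python
import math
import random
from collections import defaultdict


def btl_prob(w_i, w_j):
    """BTL model: probability that item i beats item j given scores w_i, w_j."""
    return 1.0 / (1.0 + math.exp(w_j - w_i))


def sample_comparisons(weights, n_per_pair, rng):
    """Hypothetical helper: compare every pair (i, j) with i < j n_per_pair
    times under BTL; each record is (i, j, 1 if i won else 0)."""
    data = []
    for i in range(len(weights)):
        for j in range(i + 1, len(weights)):
            p = btl_prob(weights[i], weights[j])
            data.extend((i, j, int(rng.random() < p)) for _ in range(n_per_pair))
    return data


def frobenius_stat(sample_p, sample_q):
    """Unbiased estimate of sum_{i<j} (p_ij - q_ij)^2 from win counts.
    Pairs with fewer than two comparisons in either sample are skipped."""
    counts = defaultdict(lambda: [0, 0, 0, 0])  # wins_p, n_p, wins_q, n_q
    for i, j, y in sample_p:
        counts[(i, j)][0] += y
        counts[(i, j)][1] += 1
    for i, j, y in sample_q:
        counts[(i, j)][2] += y
        counts[(i, j)][3] += 1
    total = 0.0
    for xp, n_p, xq, n_q in counts.values():
        if n_p < 2 or n_q < 2:
            continue
        # For X ~ Binomial(n, p): E[X(X-1)] = n(n-1)p^2, so each term is unbiased.
        p_sq = xp * (xp - 1) / (n_p * (n_p - 1))
        q_sq = xq * (xq - 1) / (n_q * (n_q - 1))
        cross = (xp / n_p) * (xq / n_q)
        total += p_sq + q_sq - 2.0 * cross
    return total


def permutation_test(sample_p, sample_q, n_perm=200, seed=0):
    """Calibrate frobenius_stat by shuffling P/Q labels within each pair;
    under the null P == Q, outcomes for the same pair are exchangeable."""
    rng = random.Random(seed)
    observed = frobenius_stat(sample_p, sample_q)
    by_pair = defaultdict(lambda: ([], []))
    for i, j, y in sample_p:
        by_pair[(i, j)][0].append(y)
    for i, j, y in sample_q:
        by_pair[(i, j)][1].append(y)
    exceed = 0
    for _ in range(n_perm):
        perm_p, perm_q = [], []
        for (i, j), (ys_p, ys_q) in by_pair.items():
            pooled = ys_p + ys_q
            rng.shuffle(pooled)
            perm_p.extend((i, j, y) for y in pooled[: len(ys_p)])
            perm_q.extend((i, j, y) for y in pooled[len(ys_p):])
        if frobenius_stat(perm_p, perm_q) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value valid with finitely many permutations.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    scores = [0.0, 0.5, 1.0, 1.5, 2.0]
    p_data = sample_comparisons(scores, 40, rng)
    q_same = sample_comparisons(scores, 40, rng)
    q_flip = sample_comparisons(scores[::-1], 40, rng)  # reversed preferences
    print("same distribution:", permutation_test(p_data, q_same))
    print("reversed scores:  ", permutation_test(p_data, q_flip))
```

The permutation is done within each pair rather than across the pooled samples because, in this sketch, each pair is compared a fixed number of times; permuting whole samples would also mix up which pairs each sample contains.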