A number of applications require two-sample testing on ranked preference data. For instance, in crowdsourcing, there is a long-standing question of whether pairwise comparison data provided by people is distributed similarly to ratings converted to comparisons. Other examples include sports data analysis and peer grading. In this paper, we design two-sample tests for pairwise comparison data and ranking data. For our two-sample test for pairwise comparison data, we establish an upper bound on the sample complexity required to correctly distinguish between the distributions of the two sets of samples. Our test requires essentially no assumptions on the distributions. We then prove complementary lower bounds showing that our results are tight (in the minimax sense) up to constant factors. We investigate the role of modeling assumptions by proving lower bounds for a range of pairwise comparison models (WST, MST, SST, and parameter-based models such as BTL and Thurstone). We also provide testing algorithms and associated sample complexity bounds for the problem of two-sample testing with partial (or total) ranking data. Furthermore, we empirically evaluate our results via extensive simulations as well as two real-world datasets consisting of pairwise comparisons. By applying our two-sample test on real-world pairwise comparison data, we conclude that ratings and rankings provided by people are indeed distributed differently. On the other hand, our test recognizes no significant difference in the relative performance of European football teams across two seasons. Finally, we apply our two-sample test on a real-world partial and total ranking dataset and find a statistically significant difference in sushi preferences across demographic divisions based on gender, age, and region of residence.
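To make the setting concrete, the following is a minimal sketch of a generic permutation-based two-sample test on pairwise comparison data. It is an illustration under stated assumptions, not the test analyzed in the paper: the statistic (the sum over item pairs of squared differences in empirical win rates), the within-pair label shuffling, and the names `statistic` and `permutation_test` are all hypothetical choices made here. Each sample is assumed to be a dictionary mapping an item pair `(i, j)` to a non-empty array of 0/1 outcomes, where 1 means item `i` beat item `j`.

```python
import numpy as np

def statistic(sample_a, sample_b):
    """Sum, over pairs present in both samples, of the squared
    difference between the empirical win rates (illustrative choice)."""
    return sum((np.mean(sample_a[p]) - np.mean(sample_b[p])) ** 2
               for p in sample_a if p in sample_b)

def permutation_test(sample_a, sample_b, n_perm=1000, seed=0):
    """Return (observed statistic, permutation p-value).

    Under the null hypothesis that both samples share the same outcome
    distribution for every pair, the group labels of the outcomes for a
    given pair are exchangeable, so we shuffle them within each pair.
    """
    rng = np.random.default_rng(seed)
    pairs = [p for p in sample_a if p in sample_b]
    observed = statistic(sample_a, sample_b)
    exceed = 0
    for _ in range(n_perm):
        perm_a, perm_b = {}, {}
        for p in pairs:
            pooled = rng.permutation(np.concatenate([sample_a[p], sample_b[p]]))
            k = len(sample_a[p])  # keep the original group sizes
            perm_a[p], perm_b[p] = pooled[:k], pooled[k:]
        if statistic(perm_a, perm_b) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical data: 3 items, 50 comparisons per pair in each sample.
    a = {(0, 1): rng.binomial(1, 0.7, 50), (0, 2): rng.binomial(1, 0.6, 50),
         (1, 2): rng.binomial(1, 0.5, 50)}
    b = {(0, 1): rng.binomial(1, 0.3, 50), (0, 2): rng.binomial(1, 0.6, 50),
         (1, 2): rng.binomial(1, 0.5, 50)}
    stat, p_value = permutation_test(a, b, n_perm=500)
    print(f"statistic={stat:.3f}, p-value={p_value:.3f}")
```

A small p-value suggests that the two sets of comparisons were drawn from different distributions. Shuffling within each pair, rather than across the whole pooled dataset, keeps the number of comparisons per pair fixed, which is one simple way to avoid assuming anything about how pairs were selected.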