论文标题

独立性:快速排名测试

independence: Fast Rank Tests

论文作者

Even-Zohar, Chaim

论文摘要

1948年,Hoeffding设计了一个非参数测试,该测试基于N配对样品的排名(XI,YI),检测两个连续的随机变量X和Y之间的依赖性。此常用测试统计量的计算需要O(n log n)时间。 Hoeffding的测试与任何因概率密度F(x,y)的测试是一致的,但可以被其他具有连续边缘的双变量分布所欺骗。 Blum,Kiefer和Rosenblatt(1961),Yanagimoto(1970),Bergsma和Dassios(2010)考虑了该测试的变体。迄今为止最著名的计算这些更强独立性测试的算法需要二次时间。在这里,我们从作者和Leng(Soda'21)的最新论文中详细阐述了计算排名模式的新方法,从而提高了O(n log n)的运行时间。因此,在所有情况下,适用经典的Hoeffding独立测试,我们提供了新颖的竞争算法,以对所有替代方案进行一致的测试。我们的R软件包独立性提供了这些基于等级测试的高度优化实现。我们在大规模数据集上演示了其功能。

In 1948 Hoeffding devised a nonparametric test that detects dependence between two continuous random variables X and Y, based on the ranking of n paired samples (Xi,Yi). The computation of this commonly-used test statistic takes O(n log n) time. Hoeffding's test is consistent against any dependent probability density f(x,y), but can be fooled by other bivariate distributions with continuous margins. Variants of this test with full consistency have been considered by Blum, Kiefer, and Rosenblatt (1961), Yanagimoto (1970), Bergsma and Dassios (2010). The so far best known algorithms to compute these stronger independence tests have required quadratic time. Here we improve their run time to O(n log n), by elaborating on new methods for counting ranking patterns, from a recent paper by the author and Leng (SODA'21). Therefore, in all circumstances under which the classical Hoeffding independence test is applicable, we provide novel competitive algorithms for consistent testing against all alternatives. Our R package, independence, offers a highly optimized implementation of these rank-based tests. We demonstrate its capabilities on large-scale datasets.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源