Two-Dimensional Kolmogorov-Smirnov Test

A collection of resources on the 2D KS test.


The algorithm was first developed in two papers, Peacock (1983) and Fasano & Franceschini (1987); both are linked further down.

You can find a good introduction and C/C++/Fortran implementations in Section 14.7 (or 14.8, depending on the edition) of the book Numerical Recipes (Press, W.H. et al.).

This notebook provides a Python implementation of the two-sample 2D K-S test. The .py file can be downloaded here. The code appears to be a direct translation of the C code, so efficiency may be a problem when the sample size is large.

A post titled "Beware the Kolmogorov-Smirnov test" is also related to the subject; you may want to have a look.


https://asaip.psu.edu/Articles/beware-the-kolmogorov-smirnov-test
A reminder to be careful when using the KS test.

http://adsabs.harvard.edu/abs/1983MNRAS.202..615P
http://adsabs.harvard.edu/abs/1987MNRAS.225..155F
The two papers on the 2D KS test.

http://www.subcortex.net/research/code/testing-for-differences-in-multidimensional-distributions
https://github.com/brian-lau/multdist
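2-sided. Provides a Matlab implementation of the Fasano & Franceschini algorithm.
The project has since moved to
https://github.com/brian-lau/highdim
which contains many routines for high-dimensional statistics. It is worth a look for other ways of comparing distributions.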

http://cs.marlboro.edu/courses/spring2014/jims_tutorials/ahernandez/Apr_25.attachments/
http://nbviewer.ipython.org/url/cs.marlboro.edu/courses/spring2014/jims_tutorials/ahernandez/Apr_25.attachments/scic_stat_tests.ipynb
http://nbviewer.ipython.org/url/cs.marlboro.edu/courses/spring2014/jims_tutorials/ahernandez/Apr_25.attachments/scic_dist_functions.ipynb
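2-sided. Provides a Python implementation that is a straight translation of the C code in NR, so it is probably slow. Modifying the Matlab version might be the better option.
That said, it uses the algorithm from the third edition of NR, which may be somewhat more up to date.
It also implements many other NR functions, which can serve as a reference.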
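
Because a loop-by-loop translation of the C code is likely to be slow, here is a minimal vectorized NumPy sketch of the two-sample 2D statistic. It follows the quadrant-counting scheme as I understand it from NR: take each data point as an origin, compare the fractions of the two samples in the four quadrants around it, and average the maxima obtained with sample-1 and sample-2 origins. The names ks2d2s_sketch and _quadrant_fractions are made up for this note, and the significance formula is the NR approximation as I recall it, so treat the p-value as an assumption to check against the book.

```python
import numpy as np
from scipy.special import kolmogorov


def _quadrant_fractions(ox, oy, x, y):
    """Fractions of the points (x, y) in the four quadrants around each origin."""
    gx = x[None, :] > ox[:, None]   # shape (n_origins, n_points), strict '>'
    gy = y[None, :] > oy[:, None]
    return np.stack([
        (gx & gy).mean(axis=1),     # upper right
        (~gx & gy).mean(axis=1),    # upper left
        (~gx & ~gy).mean(axis=1),   # lower left
        (gx & ~gy).mean(axis=1),    # lower right
    ])


def ks2d2s_sketch(x1, y1, x2, y2):
    """Two-sample 2D K-S statistic and approximate significance (hypothetical helper)."""
    x1, y1, x2, y2 = (np.asarray(a, dtype=float) for a in (x1, y1, x2, y2))
    n1, n2 = x1.size, x2.size
    # Largest quadrant difference with origins at sample-1 points ...
    d1 = np.abs(_quadrant_fractions(x1, y1, x1, y1)
                - _quadrant_fractions(x1, y1, x2, y2)).max()
    # ... and with origins at sample-2 points.
    d2 = np.abs(_quadrant_fractions(x2, y2, x1, y1)
                - _quadrant_fractions(x2, y2, x2, y2)).max()
    d = 0.5 * (d1 + d2)
    # Assumed NR-style significance: correct for the correlation in each sample.
    r1 = np.corrcoef(x1, y1)[0, 1]
    r2 = np.corrcoef(x2, y2)[0, 1]
    rr = np.sqrt(max(0.0, 1.0 - 0.5 * (r1 ** 2 + r2 ** 2)))
    sqen = np.sqrt(n1 * n2 / (n1 + n2))
    prob = kolmogorov(d * sqen / (1.0 + rr * (0.25 - 0.75 / sqen)))
    return d, prob


# Toy data: two clearly separated Gaussian clouds.
rng = np.random.default_rng(0)
a = rng.normal(size=(50, 2))
b = a + 100.0
print(ks2d2s_sketch(a[:, 0], a[:, 1], b[:, 0], b[:, 1]))
```

The boolean masks take memory proportional to n_origins * n_points, so for very large samples the origins would have to be processed in chunks.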

Whether the test is 2-sided or 1-sided can be an issue.
See: http://www.ciphersbyritter.com/JAVASCRP/NORMCHIK.HTM#KolSmir

The "one-sided" statistics are:
   Dn+ = MAX( S(x[j]) - F(x[j]) )
       = MAX( ((j+1)/n) - F(x[j]) )

   Dn- = MAX( F(x[j]) - S(x[j]) )
       = MAX( F(x[j]) - (j/n) )
where "MAX" is computed over all j from 0 to n-1. Both of these statistics have the same distribution.
The "two-sided" K-S statistic is:
   Dn* = MAX( ABS( S(x[j]) - F(x[j]) ) )
       = MAX( Dn+, Dn- )
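
The three statistics above can be computed directly from a sorted sample. This is only a small sketch: the function name ks_one_sample_stats is made up here, and cdf stands for whatever hypothesized distribution function F is being tested.

```python
import numpy as np


def ks_one_sample_stats(sample, cdf):
    """Return (Dn+, Dn-, Dn*) for a sample against a hypothesized CDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    j = np.arange(n)                      # 0-based index, as in the formulas above
    f = cdf(x)                            # F(x[j]) at the sorted sample points
    d_plus = np.max((j + 1) / n - f)      # empirical CDF just after each point
    d_minus = np.max(f - j / n)           # empirical CDF just before each point
    return d_plus, d_minus, max(d_plus, d_minus)


# Toy example: three points tested against the uniform distribution on [0, 1].
d_plus, d_minus, d_two = ks_one_sample_stats([0.5, 0.1, 0.9], lambda x: x)
print(d_plus, d_minus, d_two)
```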

The NR routine is 2-sided:
probks_one(20,0.3) = 0.043134911214870418

The scipy function is 1-sided:
smirnov(20, 0.3) = 0.021533604376327054

So in the tail, the 2-sided probability is about twice the 1-sided one.
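
The two numbers can be reproduced with scipy alone. This is a sketch under one assumption: that probks_one(n, d) evaluates the asymptotic Kolmogorov tail at the NR small-sample argument (sqrt(n) + 0.12 + 0.11/sqrt(n)) * d. I have not checked the notebook's source for this, but the value it gives agrees with the one quoted above.

```python
import math
from scipy.special import kolmogorov, smirnov

n, d = 20, 0.3

# Exact 1-sided tail probability P(Dn+ >= d).
p_one = smirnov(n, d)

# 2-sided tail probability from the asymptotic Kolmogorov distribution,
# with the NR small-sample correction (assumed to be what probks_one does).
lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
p_two = kolmogorov(lam)

print(p_one, p_two, p_two / p_one)
```

The ratio comes out very close to, but not exactly, 2, since the 2-sided value here is an approximation while the 1-sided one is exact.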

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/ks.test.html
It lists several references explaining how each sidedness is computed. R's documentation is, after all, more detailed...

In scipy specifically: for 2-sided use scipy.stats.kstwobign or scipy.special.kolmogorov; for 1-sided use scipy.stats.ksone or scipy.special.smirnov.
The source file scipy/special/cephes/kolmogorov.c also explains this.
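
A quick check of how the scipy.stats distributions line up with the scipy.special functions. Note that kstwobign is the large-n limiting distribution of sqrt(n) * D, so it takes the scaled statistic rather than D itself; the values of n and d are just the ones used earlier in these notes.

```python
import numpy as np
from scipy import special, stats

n, d = 20, 0.3

# 1-sided: ksone takes the statistic d and the sample size n.
one_sided = stats.ksone.sf(d, n)
print(one_sided, special.smirnov(n, d))

# 2-sided, asymptotic: kstwobign takes the scaled statistic sqrt(n) * d.
x = np.sqrt(n) * d
two_sided_asym = stats.kstwobign.sf(x)
print(two_sided_asym, special.kolmogorov(x))
```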

Another lesson: always read the latest documentation! scipy changes quickly.

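from numpy.random import rand, randn
from scipy.stats import ks_2samp
from kstest2d import ks2d2s  # the .py file from the notebook mentioned above

x1 = randn(200) + 0.1 * rand(200)  # normal sample plus a small uniform perturbation
x2 = randn(1000)                   # plain standard normal sample
ks_2samp(x1, x2)                   # 1D two-sample K-S test
ks2d2s(x1, x1, x2, x2)             # 2D two-sample test on the points (x1, x1) and (x2, x2)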
