quantify difference between discrete distributions

Hello,
I am trying to quantify the difference between two discrete distributions. I have been reading online and there seems to be a few different ways such as a Kolmogorov-Smirnov test and a chi squared test.
My first question is which of these is the correct method for comparing the distributions below?
The distributions are discrete distributions with 24 bins.
My second question: it is pretty obvious from looking at the distributions that they will be statistically significantly different, but is there a method to quantify how different they are? A percentage or a distance, perhaps?
I appreciate any help and comments.
Kind Regards

Accepted Answer

Thorsten on 14 Feb 2013


Use the Two-sample Kolmogorov-Smirnov test from the Statistics Toolbox.

8 comments

José-Luis on 14 Feb 2013
Edited: Image Analyst on 14 Feb 2013
doc kstest2
A measure of how different they are will be the p-value. Note that you would need the sample data, not the histograms you mentioned here.
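For readers who want to try this outside MATLAB, here is a minimal equivalent sketch using SciPy's `ks_2samp` (the samples `a` and `b` below are synthetic stand-ins for the raw data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 1000)  # first raw sample
b = rng.normal(0.5, 1.0, 1000)  # second raw sample, shifted mean

# Two-sample Kolmogorov-Smirnov test on the raw samples,
# not on binned histogram counts
result = stats.ks_2samp(a, b)
print(result.statistic, result.pvalue)

# kstest2's h output corresponds to rejecting at the 5% level:
h = result.pvalue < 0.05
```

As with kstest2, the test needs the underlying observations; feeding it bin counts would test something different.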
John on 14 Feb 2013
Edited: John on 14 Feb 2013
Thanks. I meant to say above that the raw data itself is discrete, so "binning" it as I described is probably incorrect. Is the Kolmogorov-Smirnov test still suitable?
Yes.
John on 14 Feb 2013
Thank you
Hello José,
Sorry to bother you again, but may I ask one further question about kstest2?
I ran the test on the raw data behind the histograms above. This is the output I got:
[h,p,k] = kstest2(a,b)
h = 1
p = 4.9903e-113
k = 0.2948
Because h = 1, the test rejects the null hypothesis at the 5% significance level, so the samples are not from the same distribution.
I'm just wondering: how would you interpret p?
Thank you for your help.
Kind Regards
The null hypothesis (that the two samples come from the same distribution) is rejected when p falls below the chosen significance level, 5% by default. Here p is vanishingly small, so the null hypothesis is rejected very firmly. The statistic k is the maximum distance between the two empirical CDFs, which is a direct measure of how different the distributions are. In any case, I would recommend reading any basic statistics book, or Wikipedia, for more details on p-values.
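To make the "distance" interpretation of k concrete, here is a sketch (in Python, with SciPy's `ks_2samp` standing in for kstest2) that computes the KS statistic by hand as the largest gap between the two empirical CDFs and checks it against the library value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 500)
b = rng.normal(1.0, 1.0, 500)

# Evaluate both empirical CDFs on the pooled sample points
grid = np.sort(np.concatenate([a, b]))
ecdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
ecdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)

# KS statistic: maximum vertical distance between the ECDFs
k_manual = np.max(np.abs(ecdf_a - ecdf_b))
k_library = stats.ks_2samp(a, b).statistic
```

Because k lies between 0 (identical samples) and 1 (completely separated samples), it reads naturally as a distance.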
John on 16 Feb 2013
Edited: John on 16 Feb 2013
In your previous post you said "A measure of how different they are will be the p-value", but in my case p is extremely small. The documentation example (doc kstest2) also has a p smaller than k, yet the hypothesis is not rejected there. How should I interpret such a small p?
José-Luis on 16 Feb 2013
Edited: José-Luis on 16 Feb 2013
The smaller p is, the stronger the evidence against the two samples coming from the same distribution, so a tiny p is consistent with what I said before. But p is judged against the significance level (5% by default), not against k: the test rejects when p falls below that level. The statistic k itself is the maximum vertical distance between the two empirical CDFs, so k, rather than p, is the natural measure of how far apart the distributions are. For how the rejection threshold for k is derived, you would need to read the original papers behind the statistic, or any basic statistics text.
Also, you should be careful how you phrase the results of a hypothesis test. Strictly, when p is above the significance level one says "the hypothesis that both samples come from the same distribution cannot be rejected", not "the samples come from the same distribution".
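One caveat worth illustrating (a sketch, again using SciPy's `ks_2samp` in place of kstest2): for a fixed true difference between two distributions, p shrinks rapidly as the sample size grows, while the statistic k settles near the true CDF distance. This is another reason k is the better "how different" number:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
shift = 0.2  # same true difference in both experiments

small = stats.ks_2samp(rng.normal(0, 1, 200), rng.normal(shift, 1, 200))
large = stats.ks_2samp(rng.normal(0, 1, 20000), rng.normal(shift, 1, 20000))

# p collapses as n grows; the statistic stays near the true distance
print(small.pvalue, small.statistic)
print(large.pvalue, large.statistic)
```

With n in the tens of thousands, even a modest difference yields an astronomically small p, much like the 4.9903e-113 above.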
