One-tailed test using the ranksum function

Iasonas on 28 Nov 2017
In the ‘Increase in the Median’ example for the ranksum function (see https://uk.mathworks.com/help/stats/ranksum.html), the documentation says: “The weather data shows the daily high temperatures taken in the same month in two consecutive years. Perform a left-sided test to assess the increase in the median at the 1% significance level.” It then calls:
[p,h,stats] = ranksum(year1,year2,'alpha',0.01,'tail','left')
The function returns h=0, i.e. the null hypothesis cannot be rejected.
My question is the following: I have calculated that the median temperature for Year 1 [median(year1)] is 60.5 and the median temperature for Year 2 [median(year2)] is 62. Consequently, if in the previous call we swap year1 and year2, i.e. [p,h,stats] = ranksum(year2,year1,'alpha',0.01,'tail','left'), then we should always get h=0, irrespective of the value of ‘alpha’, given that the median temperature for Year 1 is lower than that for Year 2. However, if we select any value of ‘alpha’ greater than 0.87 (e.g. 0.88), we get h=1 instead of h=0. I know that such a value of ‘alpha’ is ridiculously high; however, the value of h should be equal to 0 for any value of ‘alpha’ between 0 and 1 given that the Median value of temperatures for Year 1 is less than the Median value of temperatures for Year 2.
Is there an explanation for this strange result, which occurs for values of alpha greater than 0.87?
Thank you very much,
Iasonas

Accepted Answer

Jeff Miller on 28 Nov 2017
Yes, there is an explanation. The flaw is in your intuition (which I admit is very compelling) that "the value of h should be equal to 0 for any value of ‘alpha’ between 0 and 1 given that the Median value of temperatures for Year 1 is less than the Median value of temperatures for Year 2". Although this seems reasonable, it is not the way hypothesis testing works.
What you are seeing could easily arise with any one-tailed test, and it is much easier to see with a one-tailed Z test, so let's consider that. The null hypothesis (Ho) is that a score comes from a normal distribution with mean 0 and variance 1, and let's say we want to reject this Ho, thereby concluding that the mean is actually larger than 0, only if the observed score is "too big". It would be typical to set alpha=.05, which determines a z-score cutoff of 1.645, because only 5% of the standard normal distribution is above 1.645. So, if we observe an actual score greater than 1.645, we reject Ho and conclude that the mean is larger than 0.
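The cutoff arithmetic in the Z-test example can be checked with a minimal Python sketch (standard library only; the function names upper_cutoff and reject are hypothetical, chosen for illustration). It assumes Ho: the score is drawn from N(0, 1), and rejects Ho in favour of "mean > 0" when the observed score exceeds the upper-tail cutoff for the chosen alpha.

```python
from statistics import NormalDist

# Ho: the observed score comes from a standard normal, N(0, 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)


def upper_cutoff(alpha):
    """z value with probability alpha above it under Ho (one-tailed, upper)."""
    return std_normal.inv_cdf(1.0 - alpha)


def reject(score, alpha):
    """Reject Ho (conclude mean > 0) only when the score is 'too big'."""
    return score > upper_cutoff(alpha)


print(round(upper_cutoff(0.05), 3))
print(reject(2.0, 0.05), reject(1.0, 0.05))
```

At alpha = .05 the cutoff comes out at about 1.645, so a score of 2.0 rejects Ho while a score of 1.0 does not.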
Now suppose we choose a really large alpha instead, say alpha=.95 for convenience. That determines a z-score cutoff of -1.645, and we are now in a position of concluding that the mean is greater than zero any time we observe a data value larger than that, which of course includes a lot of data values below the hypothesized mean of 0. This is analogous to the strange reversal you are seeing with the ranksum test.
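The reversal can be seen directly by sweeping alpha for a single score that lies below the hypothesized mean. This is a minimal Python sketch under the same assumptions as the Z-test example (Ho: score ~ N(0, 1), upper-tailed test); the function name cutoff_for and the score -1.0 are hypothetical choices for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # Ho: score ~ N(0, 1)


def cutoff_for(alpha):
    """Upper-tailed cutoff: Ho is rejected when the observed score exceeds it."""
    return std_normal.inv_cdf(1.0 - alpha)


score = -1.0  # an observation *below* the hypothesized mean of 0
for alpha in (0.05, 0.50, 0.95):
    c = cutoff_for(alpha)
    print(f"alpha={alpha:.2f}  cutoff={c:+.3f}  reject Ho: {score > c}")
```

Only the alpha = 0.95 cutoff (about -1.645) lies below the score, so only that run "concludes" the mean is greater than 0, even though the observation points the other way.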
Remember, one-tailed alpha is defined as the probability of rejecting Ho in a certain direction given that Ho is true. If you let alpha grow large, then there will be a lot of observations where you will reject Ho even though the evidence is quite consistent with it or even points to a reversal of it. That doesn't come up in practice, though, because people always choose alpha to be rather small.
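The definition of one-tailed alpha can be checked by simulation: draw scores with Ho true and count how often the upper-tailed test rejects. This is a minimal Monte Carlo sketch in Python (standard library only; simulate_under_h0, the sample count and the seed are hypothetical choices). It also reports what share of the rejections came from scores below 0, i.e. evidence that points the opposite way.

```python
import random
from statistics import NormalDist


def simulate_under_h0(alpha, n_sims=50_000, seed=1):
    """Draw scores from N(0, 1), i.e. with Ho true, and apply the upper-tailed z test.

    Returns (rejection rate, share of rejections whose score was below 0).
    """
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1.0 - alpha)
    draws = (rng.gauss(0.0, 1.0) for _ in range(n_sims))
    rejected = [z for z in draws if z > cutoff]
    rate = len(rejected) / n_sims
    below_zero = sum(1 for z in rejected if z < 0) / len(rejected) if rejected else 0.0
    return rate, below_zero


for alpha in (0.05, 0.95):
    rate, below_zero = simulate_under_h0(alpha)
    print(f"alpha={alpha:.2f}  rejection rate={rate:.3f}  rejections below 0={below_zero:.1%}")
```

The rejection rate tracks alpha in both cases. At alpha = 0.05 every rejection comes from a score above 1.645, whereas at alpha = 0.95 close to half of the rejections (about 0.45/0.95) come from scores below the hypothesized mean.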

More Answers (2)

Kaushik Lakshminarasimhan on 28 Nov 2017
Your premise that "we should always get h=0, irrespective of the value of ‘alpha’" is not correct. Please see the definition of the p-value and rethink your argument: https://en.wikipedia.org/wiki/P-value
The p-value is the probability of obtaining data at least as extreme as what you observed, assuming the null hypothesis is true. It depends on your entire data, not just the medians, and if that probability is smaller than alpha, the null hypothesis is rejected. For your data the probability was around 0.87, so any alpha greater than that gives h=1. There is no reason to expect the probability to be 1 simply because one median is less than the other.
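The same effect can be reproduced with a small exact rank-sum calculation. The following is a minimal Python sketch, not MATLAB's ranksum implementation: it enumerates every way of assigning the pooled ranks (midranks for ties) to the first sample, an exact permutation test, and reports the left-tailed p-value P(W <= w_obs), where W is the rank sum of the first argument. The samples x and y are hypothetical toy data, not the weather data from the documentation example, and midranks and ranksum_left_exact are illustrative names. The decision rule assumed here is "reject when p < alpha", as described in this answer.

```python
from itertools import combinations
from statistics import median


def midranks(values):
    """Assign ranks 1..n to values, averaging ranks over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def ranksum_left_exact(x, y):
    """Exact left-tailed p-value P(W <= w_obs) under Ho, W = rank sum of x."""
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    w_obs = sum(ranks[: len(x)])
    # Under Ho every choice of len(x) positions for x is equally likely.
    sums = [sum(ranks[i] for i in c) for c in combinations(range(len(pooled)), len(x))]
    return sum(1 for s in sums if s <= w_obs + 1e-9) / len(sums)


x = [3, 5, 6, 9]  # hypothetical sample with the LARGER median
y = [1, 2, 4, 7]  # hypothetical sample with the smaller median
p = ranksum_left_exact(x, y)  # H1: x tends to be smaller than y
print(median(x), median(y), round(p, 4))
for alpha in (0.01, 0.05, 0.95):
    print(alpha, p < alpha)  # h = 1 when p < alpha
```

Here x has the larger median, so the left-tailed p-value is large (63/70 = 0.9): h stays 0 at alpha = 0.01 or 0.05 but becomes 1 at alpha = 0.95. This mirrors the thread, where the swapped call gave p of about 0.87 and any alpha above that produced h=1.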

Iasonas on 2 Dec 2017
Thank you very much for your answers. They were really helpful.
