Ticket #1593 (new defect)

Opened 3 months ago

Last modified 3 months ago

Mann-Whitney statistic returns incorrect results

Reported by: rgommers
Owned by: somebody
Priority: normal
Milestone: Unscheduled
Component: scipy.stats
Version: 0.10.0
Keywords: mannwhitneyu
Cc:

Description

This bug report and enhancement request was submitted as a PR by Sebastian Pölsterl. He also sent a fix, which unfortunately was based on GPL'ed R code, so the PR was rejected. The report and the tests not derived from R code are given here.

The mannwhitneyu function does not always return the correct U; see the following example:

from scipy.stats import mannwhitneyu

x = [19.8958398126694,19.5452691647182,19.0577309166425,21.716543054589,20.3269502208702,20.0009273294025,19.3440043632957,20.4216806548105,19.0649894736528,18.7808043120398,19.3680942943298,19.4848044069953,20.7514611265663,19.0894948874598,19.4975522356628,18.9971170734274,20.3239606288208,20.6921298083835,19.0724259532507,18.9825187935021,19.5144462609601,19.8256857844223,20.5174677102032,21.1122407995892,17.9490854922535,18.2847521114727,20.1072217648826,18.6439891962179,20.4970638083542,19.5567594734914]
y = [19.2790668029091,16.993808441865,18.5416338448258,17.2634018833575,19.1577183624616,18.5119655377495,18.6068455037221,18.8358343362655,19.0366413269742,18.1135025515417,19.2201873866958,17.8344909022841,18.2894380745856,18.6661374133922,19.9688601693252,16.0672254617636,19.00596360572,19.201561539032,19.0487501090183,19.0847908674356]

u, p = mannwhitneyu(x, y)
print(u, p)

In the example above, u is 102, but it should be 498.

Additionally, it would be useful to be able to specify the alternative hypothesis (less/greater/two-sided); the current default is less.
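To illustrate what the three alternatives would mean for the p-value, here is a minimal sketch using the normal approximation. The helper name mwu_pvalue and its alternative argument are hypothetical and are not the scipy 0.10.0 API. It assumes that u1 is the statistic for x (the number of (x, y) pairs with x > y), takes the value 498 from the report above, and applies no tie or continuity correction.

```python
import math

from scipy.stats import norm


def mwu_pvalue(u1, n1, n2, alternative="two-sided"):
    """Normal-approximation p-value for U1 (hypothetical helper, not scipy API).

    u1 is assumed to count the (x, y) pairs with x > y.
    No tie correction and no continuity correction are applied.
    """
    mean_u = n1 * n2 / 2.0
    std_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / std_u
    if alternative == "less":        # x tends to be smaller than y
        return norm.cdf(z)
    if alternative == "greater":     # x tends to be larger than y
        return norm.sf(z)
    if alternative == "two-sided":   # x and y differ in either direction
        return 2.0 * norm.sf(abs(z))
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# Sample sizes from the example above: len(x) == 30, len(y) == 20.
p_less = mwu_pvalue(498, 30, 20, "less")
p_greater = mwu_pvalue(498, 30, 20, "greater")
p_two_sided = mwu_pvalue(498, 30, 20, "two-sided")
print(p_less, p_greater, p_two_sided)
```

Under these assumptions the two one-sided p-values sum to 1, and the two-sided value is twice the smaller of them, which is why the choice of U matters once one-sided alternatives are supported.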

Tests at https://github.com/rgommers/scipy/tree/mannwhitneyu-tests

Change History

Changed 3 months ago by rgommers

The relevant PR is https://github.com/scipy/scipy/pull/144. Don't look at the code there if you intend to work on this!

Changed 3 months ago by warren.weckesser

The value 102 in the above example is correct. See the description of the calculation in the Wikipedia article (http://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U#Calculations), in particular: "The smaller value of U1 and U2 is the one used when consulting significance tables." An elementary statistics text that I happen to have handy (Probability and Statistics for Engineers, 7th ed., by Richard A. Johnson) says the same thing: "...the statistic U, which always equals the smaller of the two." mannwhitneyu() is returning the smaller value.

Because of the symmetry of the calculation, either U1 or U2 can be used to compute Z (since U1 + U2 = n1*n2).
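The symmetry can be checked with a small sketch that computes both statistics from the rank sums. The function names u_statistics and z_score and the sample data are made up for illustration, and the standard deviation used here ignores ties.

```python
import numpy as np
from scipy.stats import rankdata


def u_statistics(x, y):
    """Return (U1, U2) computed from the ranks of the pooled sample."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    ranks = rankdata(np.concatenate([x, y]))  # average ranks for ties
    r1 = ranks[:n1].sum()                     # rank sum of the x sample
    u1 = r1 - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1                         # from U1 + U2 = n1*n2
    return u1, u2


def z_score(u, n1, n2):
    """Normal-approximation z for a U value (no tie correction)."""
    mean_u = n1 * n2 / 2.0
    std_u = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mean_u) / std_u


# Illustrative data, not taken from the ticket.
x = [4.0, 5.2, 6.1, 7.3]
y = [0.5, 3.1, 5.0]

u1, u2 = u_statistics(x, y)
z1 = z_score(u1, len(x), len(y))
z2 = z_score(u2, len(x), len(y))
print(u1, u2)  # 11.0 1.0
print(z1, z2)  # equal magnitude, opposite sign
```

Because U1 and U2 are mirrored around n1*n2/2, the two z values differ only in sign. That makes no difference for a two-sided test, but the sign is exactly what a one-sided test depends on.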

Changed 3 months ago by josefpktd

Warren,

this sounds like the two-sided tests. Does it say anything about one-sided tests?
