Albert Schweitzer said: “Example is not the main thing in influencing others. It is the only thing.” So I start with one.

Generate two random samples, each from a normal distribution with unit variance.

Test hypothesis number one, $H_{01}: \mu_1 = \mu_2 = 0$, do not reject it, then test the more general hypothesis number two, $H_{02}: \mu_1 = \mu_2$.

Logic suggests that since there is at least one common value for which we do not reject equality of the means (in this case, 0), the more general statement should not be rejected either.

This is not always the case.

```r
PVgen = function(mu1, mu2) {
  n = 100
  x1 = rnorm(n, mu1, 1)
  x2 = rnorm(n, mu2, 1)
  X = cbind(x1, x2)
  mx = apply(X, 2, mean)  # vector of the two sample means

  ### LR test for NULL mu1 = mu2 = 0
  lambda1 = exp((-n/2) * (t(mx) %*% mx))
  Tstat1 = -2 * log(lambda1)

  ### LR test for NULL mu1 = mu2
  l = c(1, 1)
  II = diag(2)
  lambda2 = exp((-n/2) * (t(mx) %*% (II - .5 * (l %*% t(l))) %*% mx))
  Tstat2 = -2 * log(lambda2)

  # p-values: twice the smaller tail probability of the chi-square statistic
  LT1 = ifelse(Tstat1 > qchisq(.5, 2), T, F)
  LT2 = ifelse(Tstat2 > qchisq(.5, 1), T, F)
  pv1 <- 2 * (1 - pchisq(Tstat1, 2, lower.tail = LT1))
  pv2 <- 2 * (1 - pchisq(Tstat2, 1, lower.tail = LT2))
  return(cbind(pv1, pv2))
}

s0 <- s1 <- s2 <- pvalues <- NULL
for (i in 1:500) {
  pvalues <- rbind(pvalues, PVgen(.16, -.16))
  s1[i] <- sign(pvalues[i, 1] - .05)
  s2[i] <- sign(pvalues[i, 2] - .05)
  # which was rejected when the tests do not agree?
  s0[i] <- ifelse(s1[i] != s2[i], which.min(pvalues[i, ]), 0)
}
sum(s0 == 2)  # when the second was rejected but the first was not
# Result: 70
```
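
For reference, the two test statistics computed in the code can be written out explicitly. This is only a sketch under the code's own setup (independent samples of equal size $n$ with known unit variance); the symbols $T_1$, $T_2$, $\bar{x} = (\bar{x}_1, \bar{x}_2)^\top$ and $\mathbf{1} = (1, 1)^\top$ are notation introduced here for illustration.

$$
T_1 = -2\log\lambda_1 = n\,\bar{x}^\top \bar{x} = n\left(\bar{x}_1^2 + \bar{x}_2^2\right),
\qquad T_1 \sim \chi^2_2 \ \text{under } \mu_1 = \mu_2 = 0
$$

$$
T_2 = -2\log\lambda_2 = n\,\bar{x}^\top \left(I - \tfrac{1}{2}\mathbf{1}\mathbf{1}^\top\right) \bar{x} = \frac{n}{2}\left(\bar{x}_1 - \bar{x}_2\right)^2,
\qquad T_2 \sim \chi^2_1 \ \text{under } \mu_1 = \mu_2
$$

Since $T_1 = T_2 + \frac{n}{2}\left(\bar{x}_1 + \bar{x}_2\right)^2$, the second statistic is never larger than the first, but it is compared with a chi-square distribution on one degree of freedom rather than two, so its p-value can still be the smaller one.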

Since the code involves simulation and sets no random seed, results will differ somewhat between runs, despite the 500 repetitions.

So in about 14% of cases (70/500), we cannot reject hypothesis number one, but we do reject the more general hypothesis number two. Note: *using the same data*.
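
To see how a single data set can split the two tests, here is a small deterministic sketch. It is not part of the simulation: it assumes, purely for illustration, that the observed sample means happen to equal the population means used above (0.16 and -0.16), and it uses conventional upper-tail chi-square p-values rather than the doubled tail probability computed in `PVgen`.

```r
# Hypothetical illustration: treat the population means from the post
# as if they were the observed sample means (nothing is simulated here).
n  <- 100
mx <- c(0.16, -0.16)  # assumed sample means

# -2 log(lambda) for the null mu1 = mu2 = 0 (chi-square, 2 df under the null)
Tstat1 <- n * sum(mx^2)

# -2 log(lambda) for the null mu1 = mu2 (chi-square, 1 df under the null)
Tstat2 <- n * (mx[1] - mx[2])^2 / 2

# Conventional upper-tail p-values (an assumption; the post's code
# doubles the smaller tail probability instead)
pv1 <- pchisq(Tstat1, df = 2, lower.tail = FALSE)
pv2 <- pchisq(Tstat2, df = 1, lower.tail = FALSE)

round(c(Tstat1 = Tstat1, Tstat2 = Tstat2, pv1 = pv1, pv2 = pv2), 4)
```

Under these assumed means the two statistics coincide, yet the first is referred to a chi-square distribution with two degrees of freedom and the second to one with a single degree of freedom, so only the second p-value falls below 0.05.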

What is the point to take away from this?

Well, there is no logical contradiction here; **we just need to be careful with the interpretation of p-values**:

“one agent that does not believe in a proposition does NOT imply that she believes in its negation” (Dubois and Prade, 2001).

The idea for this post came from reading the paper “A classical measure of evidence for general null hypotheses”. The author suggests an alternative measure to the p-value.


Dear Eran,

Just to let you know that the paper “A classical measure of evidence for general null hypotheses” that you cited above was accepted for publication in Fuzzy Sets and Systems: http://dx.doi.org/10.1016/j.fss.2013.03.007

Thank you for providing the example and the respective R-code to reproduce it.

Best regards,

Alexandre Patriota

Well done!