Consider a random sample \(X_1, \dots, X_n\) from a multivariate normal distribution \(\mathcal{N}_2 \left( \mu,\,\Sigma \right)\), where \[\begin{gather} \mu = \begin{pmatrix} 1\\ 1 \end{pmatrix}, \Sigma = \begin{pmatrix} 1 & -0.5\\ -0.5 & 2 \end{pmatrix}. \end{gather}\] Our aim is to explore the empirical type I error and the power of a test for a mean vector from a multivariate normal distribution.
We test a null hypothesis \[ H_0: \mu_1 + \mu_2 = 2 \] against an alternative hypothesis \[ H_1: \mu_1 + \mu_2 \neq 2 .\]
We consider two variations of the test: with a known and with an unknown variance-covariance matrix.
Let us consider a significance level of 0.05 and take sample sizes \(n = 50,\,100,\,500,\,1000,\,10000.\) For each sample size, the test was repeated 500 times. The estimated type I errors for the test with a known variance-covariance matrix can be found in the following table.
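The text does not spell out the test statistics, so the following is only one standard construction, assumed here for illustration. Write \(a = (1, 1)^\top\), so that the null hypothesis reads \(a^\top \mu = 2\), and let \(\bar{X}\) and \(S\) denote the sample mean and the sample variance-covariance matrix (these symbols are not used in the original). For the given \(\Sigma\) we have \(a^\top \Sigma a = 1 + 2 - 2 \cdot 0.5 = 2\). Under \(H_0\), \[ Z = \frac{\sqrt{n}\,\left(a^\top \bar{X} - 2\right)}{\sqrt{a^\top \Sigma a}} \sim \mathcal{N}(0, 1), \qquad T = \frac{\sqrt{n}\,\left(a^\top \bar{X} - 2\right)}{\sqrt{a^\top S a}} \sim t_{n-1}. \] With known \(\Sigma\) we would reject \(H_0\) when \(|Z|\) exceeds the 0.975 quantile of the standard normal distribution; with unknown \(\Sigma\) we would reject when \(|T|\) exceeds the 0.975 quantile of the \(t_{n-1}\) distribution. Squaring the statistics gives the equivalent \(\chi^2_1\) and \(F_{1,\,n-1}\) forms.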
Sample size | 50 | 100 | 500 | 1000 | 10000 |
Type I error | 0.052 | 0.038 | 0.040 | 0.058 | 0.048 |
Let us consider the same situation as before, but now we do not know the variance-covariance matrix beforehand.
Sample size | 50 | 100 | 500 | 1000 | 10000 |
Type I error | 0.056 | 0.036 | 0.046 | 0.060 | 0.048 |
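As we can see, for both tests and all considered sample sizes, the estimated type I error is close to the significance level 0.05: the estimates range from 0.036 to 0.060, and with 500 replications the Monte Carlo standard error of each estimate is about \(\sqrt{0.05 \cdot 0.95 / 500} \approx 0.01\).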
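The simulation behind such tables can be sketched as follows. This is a minimal sketch, not the original code: it assumes a two-sided z-test (known \(\Sigma\)) and a two-sided t-test (unknown \(\Sigma\)) on the sum of the two components, and the names `rejection_rate`, `MU`, `SIGMA`, `A` and `C` are hypothetical.

```python
import numpy as np
from scipy import stats

MU = np.array([1.0, 1.0])                     # mean vector under H0
SIGMA = np.array([[1.0, -0.5], [-0.5, 2.0]])  # variance-covariance matrix
A = np.array([1.0, 1.0])                      # H0: A @ mu = C
C = 2.0


def rejection_rate(mu, n, reps=500, alpha=0.05, known_sigma=True, seed=0):
    """Fraction of `reps` simulated samples of size `n` in which H0 is rejected.

    Assumption: z-test when SIGMA is treated as known, t-test otherwise.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.multivariate_normal(mu, SIGMA, size=n)  # shape (n, 2)
        y = x @ A                                       # y_i = X_i1 + X_i2
        if known_sigma:
            se = np.sqrt(A @ SIGMA @ A / n)
            crit = stats.norm.ppf(1 - alpha / 2)
        else:
            # The sample variance of y equals a' S a.
            se = np.sqrt(np.var(y, ddof=1) / n)
            crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
        if abs((y.mean() - C) / se) > crit:
            rejections += 1
    return rejections / reps
```

Calling `rejection_rate(MU, n, known_sigma=True)` and `rejection_rate(MU, n, known_sigma=False)` for each of the sample sizes above estimates the type I error of the two variants; passing a mean vector that violates \(H_0\) estimates the power instead. The exact figures depend on the seed, so they will not reproduce the tables digit for digit.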
Because the considered sample sizes span several orders of magnitude, the following graph is plotted against the logarithm of the sample size.
Let us now assume that the null hypothesis does not hold. Consider a random sample from \(\mathcal{N}_2 \left( \mu,\,\Sigma \right)\), where \[\begin{gather} \mu = \begin{pmatrix} 0.9\\ 1 \end{pmatrix}, \Sigma = \begin{pmatrix} 1 & -0.5\\ -0.5 & 2 \end{pmatrix}, \end{gather}\] hence \(\mu_1 + \mu_2 = 1.9 \neq 2\) and the null hypothesis is not true. We again start with the test for a known variance-covariance matrix.
Sample size | 50 | 100 | 500 | 1000 | 10000 |
Power | 0.080 | 0.098 | 0.358 | 0.606 | 1.000 |
Let us consider the same situation as before, but now we do not know the variance-covariance matrix beforehand.
Sample size | 50 | 100 | 500 | 1000 | 10000 |
Power | 0.076 | 0.114 | 0.348 | 0.596 | 1.000 |
As we can see, the power rises with sample size, and at a sample size of 10000 the estimated power equals 1 in both cases. The test with a known variance-covariance matrix appears to have slightly higher power at most sample sizes (\(n = 100\) is the exception), although the differences are small.
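The simulated power for the known-variance case can be checked against a closed-form value. The sketch below assumes the test is the two-sided z-test on \(\bar{X}_1 + \bar{X}_2\), whose variance is \((\sigma_{11} + \sigma_{22} + 2\sigma_{12})/n = 2/n\); the function name `z_test_power` and the parameter `delta` (the true value of \(\mu_1 + \mu_2 - 2\)) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(delta, n, var_sum=2.0, alpha=0.05):
    """Power of the two-sided z-test of H0: mu_1 + mu_2 = 2.

    delta   : true value of mu_1 + mu_2 - 2 (0 means H0 holds)
    var_sum : variance of X_1 + X_2, here 1 + 2 - 2 * 0.5 = 2
    """
    std_normal = NormalDist()
    crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n / var_sum)  # mean of the z-statistic under the alternative
    # P(Z < -crit) + P(Z > crit) for Z ~ N(shift, 1)
    return std_normal.cdf(-crit - shift) + std_normal.cdf(-crit + shift)


for n in (50, 100, 500, 1000, 10000):
    print(n, round(z_test_power(-0.1, n), 3))
```

Under these assumptions the value for `delta = -0.1` and `n = 1000` comes out at roughly 0.61, in line with the simulated 0.606, and `delta = 0` returns the significance level itself. For the unknown-variance test the exact power involves a noncentral t distribution, which this sketch does not cover.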
We can also look at a situation in which the null hypothesis is violated more strongly.
Consider a random sample from \(\mathcal{N}_2 \left( \mu,\,\Sigma \right)\), where \[\begin{gather} \mu = \begin{pmatrix} 1\\ 0.5 \end{pmatrix}, \Sigma = \begin{pmatrix} 1 & -0.5\\ -0.5 & 2 \end{pmatrix}, \end{gather}\] hence \(\mu_1 + \mu_2 = 1.5 \neq 2\) and the null hypothesis is not true. As before, we start with the test for a known variance-covariance matrix.
Sample size | 50 | 100 | 500 | 1000 | 10000 |
Power | 0.736 | 0.940 | 1.000 | 1.000 | 1.000 |
Let us consider the same situation as before, but now we do not know the variance-covariance matrix beforehand.
Sample size | 50 | 100 | 500 | 1000 | 10000 |
Power | 0.718 | 0.948 | 1.000 | 1.000 | 1.000 |
As we can see, the power rises very quickly: the estimated power exceeds 0.9 at \(n = 100\) and equals 1 from \(n = 500\) onwards. The values are fairly similar for both tests.