
TASK 5

Consider a random sample \(X_1, \dots, X_n\) from a bivariate normal distribution \(\mathcal{N}_2 \left( \mu,\,\Sigma \right)\), where \[\begin{gather} \mu = \begin{pmatrix} 1\\ 1 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & -0.5\\ -0.5 & 2 \end{pmatrix}. \end{gather}\] Our aim is to explore the empirical type I error and the power of a test on the mean vector of this distribution.

1. Type I error

We test the null hypothesis \[ H_0: \mu_1 + \mu_2 = 2 \] against the alternative hypothesis \[ H_1: \mu_1 + \mu_2 \neq 2 .\]

We consider two variants of the test: with a known and with an unknown variance-covariance matrix.
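The test statistics are not written out here. One natural choice, assumed for illustration rather than taken from the original analysis, treats the hypothesis as a statement about the linear combination \(a^\top \mu\) with \(a = (1, 1)^\top\). The symbols \(a\), \(\bar{X}\) (sample mean vector) and \(S\) (sample variance-covariance matrix) are notation introduced for this sketch. With known \(\Sigma\): \[ Z = \frac{\sqrt{n}\,\left( a^\top \bar{X} - 2 \right)}{\sqrt{a^\top \Sigma\, a}} \sim \mathcal{N}(0, 1) \quad \text{under } H_0, \qquad a^\top \Sigma\, a = 1 + 2 + 2 \cdot (-0.5) = 2 . \] With unknown \(\Sigma\), replacing \(\Sigma\) by \(S\) gives \[ T = \frac{\sqrt{n}\,\left( a^\top \bar{X} - 2 \right)}{\sqrt{a^\top S\, a}} \sim t_{n-1} \quad \text{under } H_0 . \] In this sketch, \(H_0\) is rejected at level 0.05 when \(|Z|\) exceeds the 0.975 quantile of \(\mathcal{N}(0, 1)\), or when \(|T|\) exceeds the 0.975 quantile of \(t_{n-1}\).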

With known variance-covariance matrix

Let us consider a significance level of 0.05 and take sample sizes \(n = 50,\,100,\,500,\,1000,\,10000.\) The estimated type I errors can be found in the following table. For each sample size, the test was repeated 500 times.

Sample size	50	100	500	1000	10000
Type I error	0.052	0.038	0.040	0.058	0.048
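The simulation behind such a table can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code that produced the numbers above: it assumes a two-sided z-test on \(\mu_1 + \mu_2\) with known \(\Sigma\), and the function names (`draw_sample`, `rejects_h0`, `rejection_rate`) and the seed are hypothetical choices made here.

```python
import math
import random
from statistics import NormalDist

# Cholesky factor of Sigma = [[1, -0.5], [-0.5, 2]] (lower triangular).
L11 = 1.0
L21 = -0.5
L22 = math.sqrt(2.0 - L21 ** 2)  # sqrt(1.75)

# Variance of X_1 + X_2, i.e. a' Sigma a for a = (1, 1): 1 + 2 + 2 * (-0.5) = 2.
A_SIGMA_A = 1.0 + 2.0 + 2.0 * (-0.5)


def draw_sample(n, mu, rng):
    """Draw n observations from N_2(mu, Sigma) via the Cholesky factor."""
    sample = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        sample.append((mu[0] + L11 * z1, mu[1] + L21 * z1 + L22 * z2))
    return sample


def rejects_h0(sample, alpha=0.05):
    """Two-sided z-test of H0: mu_1 + mu_2 = 2 with known Sigma (assumed test)."""
    n = len(sample)
    mean_sum = sum(x1 + x2 for x1, x2 in sample) / n
    z = math.sqrt(n) * (mean_sum - 2.0) / math.sqrt(A_SIGMA_A)
    return abs(z) > NormalDist().inv_cdf(1.0 - alpha / 2.0)


def rejection_rate(n, mu, reps=500, alpha=0.05, seed=0):
    """Share of simulated samples in which H0 is rejected."""
    rng = random.Random(seed)
    hits = sum(rejects_h0(draw_sample(n, mu, rng), alpha) for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    # Under H0 (mu = (1, 1)) the rejection rate estimates the type I error.
    for n in (50, 100, 500):
        print(n, rejection_rate(n, mu=(1.0, 1.0)))
```

Calling `rejection_rate` with a mean vector that violates \(H_0\) gives an estimate of the power instead. The larger sample sizes are left out of the demo loop only because pure Python is slow for them.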

With unknown variance-covariance matrix

Let us consider the same situation as before, but now the variance-covariance matrix is not known beforehand.

Sample size	50	100	500	1000	10000
Type I error	0.056	0.036	0.046	0.060	0.048

As we can see, in both situations and for all considered sample sizes, the estimated type I error lies between 0.036 and 0.060, close to the significance level 0.05.
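How close the estimates should be can be judged from the Monte Carlo error of a proportion. As a rough check, assuming the 500 repetitions are independent and the true rejection probability is 0.05, the standard error of the estimated type I error is \[ \sqrt{\frac{0.05 \cdot 0.95}{500}} \approx 0.0097, \qquad 0.05 \pm 1.96 \cdot 0.0097 \approx [0.031,\ 0.069] . \] All values in the two tables above fall inside this approximate 95% range, so the deviations from 0.05 are compatible with simulation noise alone.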

Because the considered sample sizes span several orders of magnitude, the following graph was plotted against the logarithm of the sample size.

2. Power

Let us now assume that the null hypothesis does not hold. Consider a random sample from \(\mathcal{N}_2 \left( \mu,\,\Sigma \right)\), where \[\begin{gather} \mu = \begin{pmatrix} 0.9\\ 1 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & -0.5\\ -0.5 & 2 \end{pmatrix}, \end{gather}\] so that \(\mu_1 + \mu_2 = 1.9 \neq 2\) and the null hypothesis is not true.

With known variance-covariance matrix

Sample size	50	100	500	1000	10000
Power	0.080	0.098	0.358	0.606	1.000
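For the known-\(\Sigma\) case, simulated power can be compared with a closed-form value. The sketch below assumes the two-sided z-test on \(\mu_1 + \mu_2\), whose statistic is \(\mathcal{N}(\delta, 1)\) under the alternative with \(\delta = \sqrt{n}\,(\mu_1 + \mu_2 - 2)/\sqrt{2}\). The function name `power_known_sigma` and its parameters are hypothetical and not part of the original analysis.

```python
import math
from statistics import NormalDist


def power_known_sigma(n, mu_sum, alpha=0.05, null_value=2.0, var_sum=2.0):
    """Theoretical power of a two-sided z-test of H0: mu_1 + mu_2 = null_value.

    var_sum is the known variance of X_1 + X_2 (a' Sigma a = 2 for this Sigma).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Mean of the test statistic under the alternative.
    delta = math.sqrt(n) * (mu_sum - null_value) / math.sqrt(var_sum)
    # P(Z < -z_crit) + P(Z > z_crit) for Z ~ N(delta, 1).
    return std_normal.cdf(-z_crit - delta) + 1.0 - std_normal.cdf(z_crit - delta)


if __name__ == "__main__":
    for n in (50, 100, 500, 1000, 10000):
        print(n, round(power_known_sigma(n, mu_sum=1.9), 3))
```

Under these assumptions the formula gives roughly 0.08, 0.11, 0.35, 0.61 and 1.00 for the five sample sizes, which agrees with the simulated values in the table up to Monte Carlo error.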

With unknown variance-covariance matrix

Let us consider the same situation as before, but now the variance-covariance matrix is not known beforehand.

Sample size	50	100	500	1000	10000
Power	0.076	0.114	0.348	0.596	1.000

As we can see, the power rises with the sample size, and at a sample size of 10000 it equals 1 in both cases. The test with a known variance-covariance matrix has slightly higher estimated power at most sample sizes (\(n = 100\) is the exception), although the differences are small compared with the simulation noise from 500 repetitions.

We can also look at a situation in which the null hypothesis is violated more strongly.

Consider a random sample from \(\mathcal{N}_2 \left( \mu,\,\Sigma \right)\), where \[\begin{gather} \mu = \begin{pmatrix} 1\\ 0.5 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & -0.5\\ -0.5 & 2 \end{pmatrix}, \end{gather}\] so that \(\mu_1 + \mu_2 = 1.5 \neq 2\) and the null hypothesis is not true.

With known variance-covariance matrix

Sample size	50	100	500	1000	10000
Power	0.736	0.940	1.000	1.000	1.000
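A short worked calculation shows why the power is so much higher here. Assuming a two-sided z-test on \(\mu_1 + \mu_2\) with known variance \(\operatorname{Var}(X_1 + X_2) = 2\) (an assumption about the test, with \(\delta\) and \(\Phi\), the standard normal distribution function, as notation introduced for this sketch), the standardized shift for \(n = 50\) is \[ |\delta| = \frac{\sqrt{50}\,\left| 1.5 - 2 \right|}{\sqrt{2}} = 2.5, \qquad \text{power} \approx \Phi(2.5 - 1.96) = \Phi(0.54) \approx 0.71 . \] For \(n = 100\) the same calculation gives \(|\delta| \approx 3.54\) and a power of about \(\Phi(1.58) \approx 0.94\). Both values are in line with the simulated entries in the table, given the noise from 500 repetitions.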

With unknown variance-covariance matrix

Let us consider the same situation as before, but now the variance-covariance matrix is not known beforehand.

Sample size	50	100	500	1000	10000
Power	0.718	0.948	1.000	1.000	1.000

As we can see, the power rises very quickly and reaches 1 at a sample size of 500. The values are fairly similar for both tests.