Multivariate Analyses

Consider a uniform (two dimensional) distribution of a random vector \((X,Y)^\top\) over the set \(M=\{(x,y)\in R^2; 0<x<y<1\}\). Find the explicit expression for the joint distribution – the density function \(f_{(X,Y)}(x,y)\) of the random vector \((X,Y)^\top\). Derive also the corresponding marginal densities \(f_X(x)\) and \(f_Y(y)\). Can you say something relevant about the dependence/independence of the random variables X and Y (elements of the random vector \((X,Y)^\top\))?

To find the explicit expression for the joint distribution – the density function \(f_{(X,Y)}(x,y)\) of the random vector \((X,Y)^\top\) we need to find constant \(c\) such that

\(\iint c\cdot \mathbb{I}_M dx dy=1\)

From this equation we get result \(c=2\). So the density function \(f_{(X,Y)}(x,y)=2 \cdot \mathbb{I}_M\). To derive the corresponding marginal densities f\(f_X(x)\) and \(f_Y(y)\) we need to integrate joint density function with respect to one of variables.

\(f_Y(y)=\int f_{(X,Y)}(x,y)dx=\int 2\cdot \mathbb{I}_M dx =\int_0^y 2\cdot \mathbb{I}_{y \in (0,1)} dx =2y\cdot \mathbb{I}_{y \in (0,1)}.\)

Similarly

\(f_X(x)=\int f_{(X,Y)}(x,y)dy=\int 2\cdot \mathbb{I}_M dy =\int_x^1 2\cdot \mathbb{I}_{x \in (0,1)} dy =2(1-x)\cdot \mathbb{I}_{x \in (0,1)}.\)

If the elements of the random vector \((X,Y)^\top\) were independent, it should hold \(f_{(X,Y)}(x,y)=f_X(x)\cdot f_Y(y)\) but in this case

\(f_{(X,Y)}(x,y)=2\cdot \mathbb{I}_M\neq 2y\cdot \mathbb{I}_{y \in (0,1)}\cdot 2(1-x)\cdot \mathbb{I}_{x \in (0,1)}=f_X(x)\cdot f_Y(y)\)

Because of that \(X\) and \(Y\) are not independent.

Use the expressions above to simulate a two dimensional random sample \((X_1,Y_1)^\top,…,(X_n,Y_n)^\top\) from the joint distribution given by the two-dimensional density function \(f_{(X,Y)}(x,y)\) (from the previous example). Use some proper graphical tools to verify that the simulated random sample indeed follows the given joint distribution. Calculate important sample characteristics (sample means and the sample variance-covariance matrix).

We will use the fact that:

\(f_{(X,Y)}(x,y)=f_{Y|X}(y|x)\cdot f_X(x)\)

For that we need to compute the conditional density, which is given by

\(f_{Y|X}(y|x)=\dfrac{f_{(X,Y)}(x,y)}{f_X(x)}=\dfrac{2\cdot \mathbb{I}_{M}}{2(1-x)\cdot \mathbb{I}_{x \in (0,1)}}=\dfrac{\mathbb{I}_{M}}{1-x}\)

At first we will generate a sample from uniform distribution on (0,1) and then use quantile function \(F_X^{-1}\). At first we compute the distribution function:

\(F_X=\int_{-\infty}^{x}2(1-u)\mathbb{I}_{u \in (0,1)}du= \begin{cases} 0 & \text{if } x < 0,\\ 2x-x^2 & \text{if } x \in (0,1),\\ 1 & \text{if } x > 1. \end{cases}\)

Corresponding quantile function is

\(F_X^{-1}(y)=1-\sqrt{1-y},\quad y \in (0,1).\)

Next we will generate another sample from uniform distribution on (0,1) and then use quantile function \(F_{Y|X}^{-1}\). We need to compute the distribution function:

\(F_{Y|X}=\int_{-\infty}^{y}\dfrac{\mathbb{I}_{u \in (x,1)}\mathbb{I}_{x \in (0,1)}}{1-x}du= \begin{cases} 0 & \text{if } y < x,\\ \dfrac{y-x}{1-x} & \text{if } y \in (x,1),\\ 1 & \text{if } y > 1. \end{cases}\)

Corresponding quantile function is

\(F_{Y|X}^{-1}(z)=(1-x)z+x,\quad z \in (0,1).\)

set.seed(98810)
n=1000
Z1=runif(n)
X=1-sqrt(1-Z1)
Z2=runif(n)
Y=(1-X)*Z2+X
plot(X,Y, pch=16, col="skyblue", main = "Generated points")

We have plotted scatterplot of 1000 generated points. We can see that they are in the expected triangle approximately uniformly distributed.

We will also plot 3D histograms for \(n=1000,10000,100000\) to verify whether generated distribution is uniform, we expect to see columns of the same height in the triangle and the columns of the half oh the height on the diagonal. We also plot the same results using just 2D models and color scale.

library(plot3D)
library(RColorBrewer)
options("scipen"=10)
for (a in c(1:3)){
n=100*10^a
Z1=runif(n)
X=1-sqrt(1-Z1)
Z2=runif(n)
Y=(1-X)*Z2+X
X_c <- cut(X, 20)
Y_c <- cut(Y, 20)
##  Calculate joint counts at cut levels:
z <- table(X_c, Y_c)

##  Plot as a 3D histogram:
hist3D(z=z, border="black", main=substitute(paste('n= ', round(n))), col = ramp.col (col = c("skyblue", "darkblue"), n = 100, alpha = 1))

##  Plot as a 2D heatmap:
image2D(z=z, border="black",main=substitute(paste('n= ', round(n))), col = ramp.col (col = c("skyblue", "darkblue"), n = 100, alpha = 1))
}

The plots look like what we expected. With higher number of n it is more uniform.

We need to calculate the sample means and the sample covariance matrix. From computation we get

\[ \begin{pmatrix} \overline{X_n} \\ \overline{Y_n} \end{pmatrix}= \begin{pmatrix} 0.33 \\ 0.67 \end{pmatrix} \]

\[ \overline{\Sigma}= \begin{pmatrix} 0.06 & 0.03 \\ 0.03 & 0.06 \end{pmatrix} \]

Multivariate Analyses

Veronika Roubínová

24.2.2022