Individual assignment II

Task 1

We consider a random vector \((X,Y)\) with a uniform distribution on the set \(M = \{(x,y) \in \mathbf{R}^2: 0 < x < y < 1 \}\).

First, let us compute the joint density \(f_{X,Y}(x,y) = c \mathbf{1}_M(x,y)\); that is, we have to find the constant \(c\): \[\begin{align*} 1 = \int\limits_{\mathbf{R}^2} c \mathbf{1}_M (x,y) ~ d(x,y) = \int\limits_0^1 \int\limits_0^y c ~ dx ~ dy = \int\limits_0^1 cy ~ dy = c/2, \end{align*}\] hence \(c = 2\).

Thus, \(f_{X,Y}(x,y) = 2 \mathbf{1}_M (x,y)\).

Now we are supposed to derive the marginal densities \(f_X(x), ~ f_Y(y)\): \[\begin{align*} f_X(x) = \int\limits_{\mathbf{R}} f_{X,Y}(x,y) ~ dy = \int\limits_x^1 2 ~ dy = 2 - 2x, ~ x \in (0,1), \end{align*}\]

\[\begin{align*} f_Y(y) = \int\limits_{\mathbf{R}} f_{X,Y}(x,y) ~ dx = \int\limits_0^y 2 ~ dx = 2y, ~ y \in (0,1). \end{align*}\]

We know that \(X,Y\) are stochastically independent if and only if the joint density factorizes into the product of the marginals. This clearly fails here: for instance, at the point \((x,y) = (0.8, 0.2) \notin M\) we have \(f_{X,Y}(0.8, 0.2) = 0\), while \(f_X(0.8) f_Y(0.2) = 0.4 \cdot 0.4 = 0.16 > 0\). Hence \(X,Y\) are not independent.

Task 2

Now our task is to simulate a random sample \((X_1, Y_1)^T, \ldots, (X_n, Y_n)^T\) from the joint distribution of Task 1.

First, we simulate a random sample \((X_1, \ldots, X_n)^T\) from the distribution with density \(f_X(x)\). Since \(f_X(x) = 2 - 2x, ~ x \in (0,1)\), we obtain the distribution function \(F_X(x) = 2x - x^2, ~ x \in (0,1).\)

We know that if \(Z_i \sim R(0,1)\), then \(F_X^{-1}(Z_i)\) has distribution function \(F_X\) (the inverse transform method). That is exactly how we get the random sample: first we generate a random sample from the uniform distribution on \((0,1)\) and then transform it by the inverse of \(F_X\), which has the form \(F_X^{-1}(q) = 1 - \sqrt{1-q}.\)
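The formula for \(F_X^{-1}\) follows by solving \(q = F_X(x)\) for \(x\): \[\begin{align*} q = 2x - x^2 \iff x^2 - 2x + q = 0 \iff x = 1 \pm \sqrt{1-q}, \end{align*}\] and since \(x \in (0,1)\), only the root with the minus sign is admissible, giving \(F_X^{-1}(q) = 1 - \sqrt{1-q}\).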

Next, we simulate a random sample \((Y_1, \ldots, Y_n)^T\) from the conditional distribution given by the conditional density \(f_{Y|X}(y|x)\), which we compute first: \[\begin{align*} \text{For} ~ x \in (0,1), ~ f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \begin{cases} \frac{1}{1-x}, & x < y < 1 \\ 0, & \text{otherwise} \end{cases} \end{align*}\]

Thus, for \(x \in (0,1)\) and \(x < y < 1\), we get \[\begin{align*} F_{Y|X}(y|x) = \int\limits_{- \infty}^y f_{Y|X} (t|x) ~ dt = \int\limits_x^y \frac{1}{1-x} ~ dt = \frac{y-x}{1-x}. \end{align*}\]

The inverse then takes the form \[\begin{align*} F_{Y|X}^{-1}(q|x) = (1-x)q + x. \end{align*}\]
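This follows by solving \(q = F_{Y|X}(y|x)\) for \(y\): \[\begin{align*} q = \frac{y-x}{1-x} \iff q(1-x) = y - x \iff y = (1-x)q + x. \end{align*}\]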

Here we can take a look at a visualisation of our random sample \((X_1, Y_1)^T, \ldots, (X_n, Y_n)^T:\)

n <- 1000
set.seed(1234)
Z1 <- runif(n)
X <- 1 - sqrt(1 - Z1)   ## random sample from the distribution with density f_X

Z2 <- runif(n)

Y <- (1 - X)*Z2 + X     ## conditional sample from f_{Y|X}(.|X_i)

plot(Y ~ X)

Indeed, all the points are concentrated in the upper triangle \(M\) above the diagonal \(y = x\).
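This can also be verified numerically; a minimal sketch (regenerating the sample with the same seed so the snippet is self-contained):

```r
set.seed(1234)
n <- 1000
X <- 1 - sqrt(1 - runif(n))   ## inverse transform: sample from f_X
Y <- (1 - X)*runif(n) + X     ## conditional sample from f_{Y|X}
mean(0 < X & X < Y & Y < 1)   ## proportion of simulated points inside M
## -> 1
```

Since `runif` never returns the endpoints 0 and 1, every simulated pair lies strictly inside \(M\).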

Now, if we would like to check whether the sample \(X_1, \ldots, X_n\) really comes from this distribution, we can compare its empirical distribution function with the theoretical one, or its histogram with the density:

xSeq <- seq(0,1, length = 1000)
par(mfrow = c(1,2))
plot(ecdf(X), main = "Empirical d.f. of X")
lines(2*xSeq - xSeq^2 ~ xSeq, col = "red", lty = 2, lwd = 2)
legend("bottomright", legend = c("Empirical d.f.", "Theoretical d.f."), col = c("black", "red"), lty = c(1,2), lwd = c(2,2))


hist(X, freq = FALSE, col = "gray", breaks = ceiling(sqrt(n)))
lines(2 - 2*xSeq ~ xSeq, col = "red", lty = 2, lwd = 2)
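As a further check, one could run a Kolmogorov–Smirnov test of the sample against the theoretical distribution function; a minimal sketch (regenerating the sample so the snippet is self-contained):

```r
set.seed(1234)
n  <- 1000
X  <- 1 - sqrt(1 - runif(n))   ## inverse transform: sample from f_X
FX <- function(x) 2*x - x^2    ## theoretical distribution function of X
ks.test(X, FX)                 ## a large p-value is consistent with X ~ F_X
```

`ks.test` accepts a cumulative distribution function directly as its second argument.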

The sample mean of X is 0.341, for Y it is 0.660.
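These values are close to the theoretical means computed from the marginal densities: \[\begin{align*} \mathrm{E}X = \int\limits_0^1 x(2-2x) ~ dx = 1 - \frac{2}{3} = \frac{1}{3}, \qquad \mathrm{E}Y = \int\limits_0^1 y \cdot 2y ~ dy = \frac{2}{3}. \end{align*}\]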