I chose the dataset “carc” from the package SMSdata. My main interest is whether American cars can be distinguished from European and Japanese cars just by looking at their coordinates in the space spanned by the first two principal components of the continuous observed variables.

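Principal component analysis is based on the eigenvalue decomposition of the sample variance-covariance matrix. Here the variables are standardized first, so the decomposition is applied to the sample correlation matrix.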

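library(SMSdata)   # provides the carc data
library(corrplot)
data("carc")
# correlation matrix of the continuous variables (C, R77 and R78 are excluded)
corc <- cor(subset(carc, select = -c(C, R77, R78)))
corrplot(corc, method = "ellipse")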


The eigenvalues of the correlation matrix, divided by their sum, give the “Proportion of Variance” reported by the PCA, and the eigenvectors are called loadings.

eigen(corc)$values
##  [1] 6.54413115 1.02605233 0.84204930 0.44121113 0.39999794 0.28222556
##  [7] 0.24984044 0.11256924 0.07983019 0.02209273
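
As a quick worked check of how these eigenvalues turn into shares of variance (the symbols $\lambda_j$ for the eigenvalues and $p$ for the number of variables are my notation, not part of the output): the eigenvalues of a correlation matrix sum to its trace, which is $p = 10$ here, so

$$
\frac{\lambda_j}{\sum_{k=1}^{p} \lambda_k} = \frac{\lambda_j}{p},
\qquad
\frac{\lambda_1}{10} = \frac{6.54413115}{10} \approx 0.654,
\qquad
\frac{\lambda_1 + \lambda_2}{10} = \frac{6.54413115 + 1.02605233}{10} \approx 0.757 .
$$

So the first component alone carries roughly 65% of the total variance of the standardized variables, and the first two together about 76%.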


PC <- princomp(subset(carc, select = -c(C, R77, R78)), cor = TRUE)
summary(PC)
## Importance of components:
##                           Comp.1    Comp.2     Comp.3     Comp.4     Comp.5
## Standard deviation     2.5581499 1.0129424 0.91763244 0.66423725 0.63245390
## Proportion of Variance 0.6544131 0.1026052 0.08420493 0.04412111 0.03999979
## Cumulative Proportion  0.6544131 0.7570183 0.84122328 0.88534439 0.92534418
##                            Comp.6     Comp.7     Comp.8      Comp.9     Comp.10
## Standard deviation     0.53124906 0.49984041 0.33551340 0.282542362 0.148636249
## Proportion of Variance 0.02822256 0.02498404 0.01125692 0.007983019 0.002209273
## Cumulative Proportion  0.95356674 0.97855078 0.98980771 0.997790727 1.000000000
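
The link between `eigen()` and the `princomp()` summary can also be checked numerically. The following is only a minimal sketch: it uses the built-in `mtcars` data as a stand-in (an assumption, so that it runs without SMSdata), and the object names are made up for illustration.

```r
# Stand-in data (assumption): built-in mtcars, not the carc data used in the text.
x <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]

ev <- eigen(cor(x))            # eigendecomposition of the correlation matrix
pc <- princomp(x, cor = TRUE)  # PCA based on the same correlation matrix

# Proportion of variance: each eigenvalue divided by the sum of all eigenvalues
# (for a correlation matrix the sum equals the number of variables).
prop_var <- ev$values / sum(ev$values)

# The standard deviations in summary(pc) are the square roots of the eigenvalues.
same_sdev <- isTRUE(all.equal(unname(pc$sdev), sqrt(ev$values)))

# The loadings are the eigenvectors, which are determined only up to sign,
# so absolute values are compared.
same_loadings <- isTRUE(all.equal(
  abs(unname(unclass(pc$loadings))),
  abs(unname(ev$vectors)),
  check.attributes = FALSE
))

print(round(prop_var, 3))
print(c(same_sdev = same_sdev, same_loadings = same_loadings))
```

With the report's own objects, the same comparison would be `sqrt(eigen(corc)$values)` against `PC$sdev`.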


plot(PC, col = "lightblue")


The biplot shows the projections of the observed variables onto the plane spanned by the first two principal components, that is, by the first two eigenvectors of the sample correlation matrix.

library(ggbiplot)
ggbiplot(PC, ellipse = TRUE, circle = TRUE, groups = carc$C)


We can see that if a car’s coordinate with respect to the first principal component is positive, the car was most likely made in the United States of America. I don’t see any other clear distinction of country of origin in the plot.
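
A visual impression like this could be backed up by cross-tabulating the sign of the first-component score against the group label. The sketch below is illustrative only: `sign_by_group()` is a hypothetical helper, and the built-in `iris` data stand in for carc (an assumption, so that the code runs without SMSdata).

```r
# Hypothetical helper: count, per group, how many observations fall on each
# side of zero along the first principal component.
sign_by_group <- function(pc, groups) {
  side <- ifelse(pc$scores[, 1] > 0, "Comp.1 > 0", "Comp.1 <= 0")
  table(group = groups, side = side)
}

# Stand-in data (assumption): built-in iris instead of carc.
pc_demo <- princomp(iris[, 1:4], cor = TRUE)

# The plotted coordinates are the scores: standardized observations
# projected onto the loadings.
z <- scale(iris[, 1:4], center = pc_demo$center, scale = pc_demo$scale)
manual_scores <- z %*% unclass(pc_demo$loadings)

tab <- sign_by_group(pc_demo, iris$Species)
print(tab)
```

For the analysis in this report, the corresponding call would be `sign_by_group(PC, carc$C)`.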

library(knitr)
kable(round(PC$loadings[1:10, 1:10], digits = 3), format = "pipe")

|   | Comp.1 | Comp.2 | Comp.3 | Comp.4 | Comp.5 | Comp.6 | Comp.7 | Comp.8 | Comp.9 | Comp.10 |
|:--|-------:|-------:|-------:|-------:|-------:|-------:|-------:|-------:|-------:|--------:|
| P  | 0.210 | 0.427 | 0.738 | 0.304 | 0.160 | 0.090 | 0.134 | 0.251 | 0.062 | 0.132 |
| M  | -0.327 | -0.166 | 0.014 | 0.299 | -0.519 | 0.324 | 0.608 | 0.104 | -0.137 | -0.048 |
| H  | 0.238 | -0.655 | 0.005 | 0.472 | 0.324 | -0.369 | 0.172 | 0.135 | 0.002 | 0.048 |
| R  | 0.275 | -0.304 | 0.454 | -0.306 | -0.632 | -0.321 | -0.156 | -0.048 | 0.016 | -0.057 |
| Tr | 0.308 | -0.402 | 0.110 | -0.035 | 0.089 | 0.802 | -0.251 | -0.062 | 0.098 | -0.048 |
| W  | 0.377 | 0.151 | -0.056 | -0.059 | 0.098 | -0.018 | 0.246 | -0.022 | -0.264 | -0.829 |
| L  | 0.371 | 0.031 | -0.092 | -0.264 | 0.058 | 0.056 | 0.257 | -0.060 | -0.677 | 0.498 |
| T  | 0.349 | 0.056 | -0.282 | -0.332 | -0.051 | 0.017 | 0.374 | 0.469 | 0.553 | 0.128 |
| D  | 0.354 | 0.187 | -0.130 | 0.298 | -0.156 | -0.025 | 0.189 | -0.740 | 0.326 | 0.137 |
| G  | -0.307 | -0.212 | 0.359 | -0.477 | 0.391 | -0.014 | 0.440 | -0.360 | 0.164 | -0.035 |