mwahw7.knit

I chose the “carc” data set from the package “SMSdata”.

library(SMSdata)
data("carc")
D <- subset(carc, select = -c(C, R77, R78))

Firsly I will look at how many factors should I use in the factor analysis.

library(nFactors)
ev <- eigen(cor(D)) 
ap <- parallel(subject=nrow(D),var=ncol(D), rep=100, cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)

It seems like one is enough, which is weird but interesting so lets try it.

(fa1 <- factanal(D, factors = 1, rotation = "varimax", scores = "regression"))

## 
## Call:
## factanal(x = D, factors = 1, scores = "regression", rotation = "varimax")
## 
## Uniquenesses:
##     P     M     H     R    Tr     W     L     T     D     G 
## 0.719 0.319 0.738 0.635 0.514 0.014 0.074 0.179 0.186 0.418 
## 
## Loadings:
##    Factor1
## P   0.530 
## M  -0.825 
## H   0.512 
## R   0.604 
## Tr  0.697 
## W   0.993 
## L   0.962 
## T   0.906 
## D   0.902 
## G  -0.763 
## 
##                Factor1
## SS loadings      6.204
## Proportion Var   0.620
## 
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 118.21 on 35 degrees of freedom.
## The p-value is 5.87e-11

The loadings from factor analysis seems similar to loadings from principal component analysis, but on bigger scale.

rbPal <- colorRampPalette(colors = c("red","blue"))
par(mar = c(3,3,3,1), mfrow = c(2,1))
barplot(abs(fa1$loadings[,1]), ylim = c(0,1), ylab = "Factor 1", cex.names = 0.8,
        names.arg = names(D), col = rbPal(20)[as.numeric(cut(fa1$loadings[,1],breaks = seq(-1,1,length = 20)))], main = "Factor")

PC <- princomp(subset(carc, select = -c(C, R77, R78)), cor = TRUE)
barplot(abs(PC$loadings[,1]), ylim = c(0,1), ylab = "Factor 1", cex.names = 0.8,
        names.arg = names(D), col = rbPal(20)[as.numeric(cut(PC$loadings[,1],breaks = seq(-1,1,length = 20)))], main = "PCA")

Lets plot the coordinates of observed data in in the space generated by our only factor.

par(mar = c(3,5,1,1), col = "black")
plot(fa1$scores, rep(0, nrow(D)), type = "n", yaxt = "n", 
     xlab = "", ylab = "")

points(fa1$loadings, rep(0.5, 10), pch = 16, col = rainbow(10))
text(fa1$loadings, rep(0.5, 10), labels = names(D), pos = c(1, 3, 3, 3, 1, 3, 1, 1, 3, 1), col = rainbow(10))

points(fa1$scores[carc$C == "US"], rep(0, length(fa1$scores[carc$C == "US"])), col = "darkorange", pch = 16)
points(fa1$scores[carc$C == "Europe"], rep(-0.25, length(fa1$scores[carc$C == "Europe"])), col = "lightblue", pch = 16)
points(fa1$scores[carc$C == "Japan"], rep(-0.5, length(fa1$scores[carc$C == "Japan"])), col = "lightgreen", pch = 16)

axis(side = 2, at = c(0.5, 0, -0.25, -0.5), labels = c("Loadings", "US", "EU", "JP"), las = 1)

abline(v = 0, lty = "dotted")

We can make same conclusion as in the previous task: If the coordinate of the observed car in the space generated by our only factor is positive, the car was most likely made in the United States of America.

library("ggbiplot")
ggbiplot(PC, elipse = T, circle = T, groups = carc$C)