I will be using code from the exercise class (https://www2.karlin.mff.cuni.cz/~maciak/NMST539/Lab12_2022.html).
library(SMSdata)  # carc data set
library(MASS)     # lda(), ldahist()
library(klaR)     # partimat()
data(carc)
Looking back at task 1, I will try to predict the variable C (country of origin of the car) from W (weight), P (price), H (headroom) and R (rear seat clearance).
lda1 <- lda(C ~ W + P + H + R, data = carc)
pred <- predict(lda1)
ldahist(data = pred$x[,1], g=carc$C)
The histograms of the first discriminant show some overlap between all three categories.
partimat(C ~ W + P + H + R, data = carc, method = "lda")
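The overlap seen in the plots can also be put into numbers. The following is only a rough sketch of one way to do it: a resubstitution confusion matrix and a leave-one-out error rate for the same three-class model. The object names (fit3, conf3, err3, cv3, err3_cv) are my own and do not come from the lab code; CV = TRUE is an argument of MASS::lda.

```r
library(MASS)     # lda()
library(SMSdata)  # carc data set
data(carc)

# Refit the three-class model so that this block runs on its own
fit3 <- lda(C ~ W + P + H + R, data = carc)

# Resubstitution confusion matrix: rows = true class, columns = predicted class
conf3 <- table(true = carc$C, predicted = predict(fit3)$class)
conf3

# Apparent error rate; optimistic, because the same data are used
# for fitting and for evaluation
err3 <- 1 - sum(diag(conf3)) / sum(conf3)
err3

# Leave-one-out cross-validated error rate
cv3 <- lda(C ~ W + P + H + R, data = carc, CV = TRUE)
err3_cv <- mean(cv3$class != carc$C)
err3_cv
```

The cross-validated rate is usually the more honest of the two numbers, since each car is classified by a model that did not see it.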
Let's try merging Europe and Japan, as we did in the first task.
library(Epi)  # Relevel()
carc <- transform(carc, C2 = Relevel(C, list("EJ" = c(2, 3), "US" = 1)))
lda1 <- lda(C2 ~ W + P + H + R, data = carc)
pred <- predict(lda1)
ldahist(data = pred$x[,1], g=carc$C2)
partimat(C2 ~ W + P + H + R, data = carc, method = "lda")
There is some improvement. The question is whether the cost of this improvement (no longer distinguishing between EU and JP) is too high.
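One possible way to look at this trade-off is to compare leave-one-out error rates of the two LDA models. The sketch below is an illustration, not part of the lab code. It works on a separate copy of the data (cars) so that carc is left unchanged, and it builds the merged factor with base R instead of Relevel(), assuming, as in the Relevel() call, that level 1 of C is the US and levels 2 and 3 are Japan and Europe. All object names are my own.

```r
library(MASS)  # lda()

# Load a fresh copy of carc into its own environment
env <- new.env()
data(carc, package = "SMSdata", envir = env)
cars <- env$carc

# Merged response; assumes level 1 of C is the US (as in the Relevel() call)
cars$C2 <- factor(ifelse(as.integer(cars$C) == 1, "US", "EJ"))

# Leave-one-out predictions for the three-class and the two-class model
cv3 <- lda(C  ~ W + P + H + R, data = cars, CV = TRUE)
cv2 <- lda(C2 ~ W + P + H + R, data = cars, CV = TRUE)

err3 <- mean(cv3$class != cars$C)
err2 <- mean(cv2$class != cars$C2)

# Three-class predictions collapsed to US vs. EJ, so that both models
# are judged on the same two-class task
cv3_merged <- factor(ifelse(as.integer(cv3$class) == 1, "US", "EJ"),
                     levels = levels(cars$C2))
err3_merged <- mean(cv3_merged != cars$C2)

c(three_class = err3, three_class_collapsed = err3_merged, two_class = err2)
```

If the collapsed three-class error is close to the two-class error, most of the gain comes simply from no longer having to separate Europe from Japan, and not from a better US vs. rest boundary.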
library("party")
plot(ctree(C ~ P + W,data=carc))
plot(ctree(C2 ~ P + W,data=carc))
Here I think it makes sense not to distinguish between EU and JP.
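To back this impression with numbers, the in-sample accuracy of the two trees can be compared. This is only a sketch under the same assumptions as before: a separate copy of the data (cars), the merged factor built with base R, and level 1 of C taken to be the US. The object names are mine, and the accuracies are in-sample, so they are likely to be optimistic.

```r
library(party)  # ctree()

# Load a fresh copy of carc into its own environment
env <- new.env()
data(carc, package = "SMSdata", envir = env)
cars <- env$carc

# Merged response; assumes level 1 of C is the US
cars$C2 <- factor(ifelse(as.integer(cars$C) == 1, "US", "EJ"))

# Same two trees as above, fitted on the copy
tree3 <- ctree(C  ~ P + W, data = cars)
tree2 <- ctree(C2 ~ P + W, data = cars)

# Predicted classes for the training data
pred3 <- predict(tree3)
pred2 <- predict(tree2)

# Confusion tables: rows = true class, columns = predicted class
table(true = cars$C,  predicted = pred3)
table(true = cars$C2, predicted = pred2)

# Share of correctly classified cars (in-sample)
acc3 <- mean(as.character(pred3) == as.character(cars$C))
acc2 <- mean(as.character(pred2) == as.character(cars$C2))
c(three_class = acc3, two_class = acc2)
```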