I will be using code from the exercise class (https://www2.karlin.mff.cuni.cz/~maciak/NMST539/Lab12_2022.html).

library(SMSdata)  # carc data set
library(MASS)     # lda(), ldahist()
library(klaR)     # partimat()
data(carc)

Looking back at task 1, I will try to predict the variable C (country of origin of the car) from W (weight), P (price), H (headroom) and R (rear seat clearance).

lda1 <- lda(C ~ W + P + H + R, data = carc)
pred <- predict(lda1)
ldahist(data = pred$x[,1], g=carc$C)

The histograms of the first linear discriminant overlap for all three categories.
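
To put a number on this overlap, I could tabulate the observed classes against the predicted ones. This is a minimal sketch, not part of the lab code: the names `fit3`, `conf3` and `err3` are introduced here only for illustration, and the error rate is computed on the training data, so it is likely optimistic.

```r
library(MASS)     # lda()
library(SMSdata)  # carc data set
data(carc)

# Three-class LDA with the same predictors as above
fit3 <- lda(C ~ W + P + H + R, data = carc)

# Resubstitution confusion matrix: rows = observed origin, columns = predicted origin
conf3 <- table(observed = carc$C, predicted = predict(fit3)$class)
print(conf3)

# Apparent (in-sample) misclassification rate
err3 <- 1 - sum(diag(conf3)) / sum(conf3)
print(err3)
```

Large off-diagonal counts in `conf3` would correspond to the overlap seen in the histograms.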

partimat(C ~ W + P + H + R, data = carc, method = "lda")

Let's try merging Europe and Japan, as we did in the first task.

library(Epi)  # Relevel() with a list argument is assumed to come from the Epi package
carc <- transform(carc, C2 = Relevel(C, list("EJ" = c(2, 3),
                                             "US" = 1)))
lda1 <- lda(C2 ~ W + P + H + R, data = carc)
pred <- predict(lda1)
ldahist(data = pred$x[,1], g=carc$C2)

partimat(C2 ~ W + P + H + R, data = carc, method = "lda")

There is some improvement. The question is whether the cost of this improvement (no longer distinguishing between EU and JP) is too high.
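
One rough way to weigh this trade-off is to compare leave-one-out error rates of the three-class and the two-class model. The sketch below is an illustration under assumptions, not part of the lab code: it works on a copy called `carc_cv` (a hypothetical name), rebuilds the merged response with base R instead of `Relevel`, and assumes that the first level of `C` is the US.

```r
library(MASS)     # lda() with CV = TRUE
library(SMSdata)  # carc data set
data(carc)

# Work on a copy so the original data frame is left untouched
origin  <- as.factor(carc$C)
carc_cv <- transform(carc,
                     C  = origin,
                     C2 = factor(ifelse(origin == levels(origin)[1], "US", "EJ")))

# CV = TRUE makes lda() return leave-one-out predictions in $class
cv3 <- lda(C  ~ W + P + H + R, data = carc_cv, CV = TRUE)
cv2 <- lda(C2 ~ W + P + H + R, data = carc_cv, CV = TRUE)

# Leave-one-out misclassification rates (compared as character to avoid level mismatches)
err_cv3 <- mean(as.character(cv3$class) != as.character(carc_cv$C))
err_cv2 <- mean(as.character(cv2$class) != as.character(carc_cv$C2))
print(c(three_class = err_cv3, two_class = err_cv2))
```

The two rates are not directly comparable, because a two-class problem is easier by construction, so a lower two-class error alone does not show that merging is worthwhile.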

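library("party")  # ctree()
# Conditional inference tree for the three-class response
plot(ctree(C ~ P + W, data = carc))

# The same tree for the merged EJ/US response
plot(ctree(C2 ~ P + W, data = carc))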

Here I think it makes sense not to distinguish between EU and JP.
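
To back this impression with counts, the fitted classes of both trees could be tabulated against the observed ones. Again this is only a sketch: `carc_tree`, `tree3`, `tree2` and the two tables are names introduced here for illustration, the merged response is rebuilt with base R under the assumption that the first level of `C` is the US, and the tables are in-sample.

```r
library(party)    # ctree()
library(SMSdata)  # carc data set
data(carc)

# Copy of the data with both the original and the merged response as factors
origin    <- as.factor(carc$C)
carc_tree <- transform(carc,
                       C  = origin,
                       C2 = factor(ifelse(origin == levels(origin)[1], "US", "EJ")))

tree3 <- ctree(C  ~ P + W, data = carc_tree)
tree2 <- ctree(C2 ~ P + W, data = carc_tree)

# In-sample confusion tables: rows = observed, columns = predicted by the tree
conf_tree3 <- table(observed = carc_tree$C,  predicted = predict(tree3))
conf_tree2 <- table(observed = carc_tree$C2, predicted = predict(tree2))
print(conf_tree3)
print(conf_tree2)
```

If the three-class tree rarely or never predicts one of the EU or JP classes, that would support merging them.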