I chose the “carc” data set from the “SMSdata” package.

library(SMSdata)
data("carc")
# Drop the categorical variable C and the repair-record ratings R77 and R78,
# keeping only the numeric measurements
Data <- subset(carc, select = -c(C, R77, R78))
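
A quick sanity check of what remains (a minimal sketch using base R only):

str(Data)      # all remaining columns should now be numeric
summary(Data)  # the variables live on very different scales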

First, I perform the K-means algorithm with \(K = 3\).

set.seed(1)                  # kmeans uses random starting centres
Kmeans <- kmeans(Data, centers = 3)
colVector <- Kmeans$cluster  # cluster labels, reused as plotting colours

# Price against weight, points filled by K-means cluster
plot(carc$P ~ carc$W, bg = colVector, xlab = "Weight", ylab = "Price", pch = 21, col = "black")
# Mark the three cluster centres
points(Kmeans$centers[, "P"] ~ Kmeans$centers[, "W"], col = 1:3, pch = 8, cex = 2)
# Label each car, offset slightly above its point
text((carc$P + 1) ~ carc$W, labels = rownames(carc), col = colVector, cex = 0.5)
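
The choice \(K = 3\) above is just a starting assumption. One common check is an elbow plot of the total within-cluster sum of squares over candidate values of \(K\); the sketch below assumes the range 1 to 8 and uses nstart = 25 to guard against poor random starts.

# Elbow plot: total within-cluster SS for K = 1..8
set.seed(1)
wss <- sapply(1:8, function(k) kmeans(Data, centers = k, nstart = 25)$tot.withinss)
plot(1:8, wss, type = "b", xlab = "Number of clusters K", ylab = "Total within-cluster SS")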

Then I apply the hierarchical agglomerative clustering algorithm. I use the Euclidean distance for now, as a reasonable general-purpose default; I believe distances better suited to specific questions about the data could be chosen instead.

D <- dist(Data)   # Euclidean distances between cars (the default metric)
HC1 <- hclust(D)  # agglomerative clustering, complete linkage by default
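
To illustrate the point about more suitable distances: since the variables are on very different scales, one option (a sketch only, not used in what follows) is to standardise the variables first, or to switch dist to the Manhattan metric; hclust then applies unchanged.

Ds <- dist(scale(Data))                 # Euclidean distance on standardised variables
Dm <- dist(Data, method = "manhattan")  # L1 distance on the raw variables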

cutHC1 <- cutree(HC1, k = 3)  # cut the dendrogram into three groups
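
The three-group cut can also be checked visually on the dendrogram; rect.hclust draws a box around each group.

plot(HC1, labels = rownames(carc), cex = 0.5)  # dendrogram of the cars
rect.hclust(HC1, k = 3, border = 1:3)          # outline the three groups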

K-means cluster labels, split by the groups from the hierarchical agglomerative clustering:

Data$col <- colVector               # attach the K-means labels
Data1 <- subset(Data, cutHC1 == 1)  # cars in hierarchical group 1
Data2 <- subset(Data, cutHC1 == 2)  # cars in hierarchical group 2
Data3 <- subset(Data, cutHC1 == 3)  # cars in hierarchical group 3
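
To make the comparison concrete, a contingency table of the two label vectors shows how the partitions overlap; cluster numbers are arbitrary, so agreement shows up as one dominant cell per row.

table(kmeans = colVector, hclust = cutHC1)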

We can see that the two clusterings recover groups of broadly similar shape, but the sizes of the groups differ.