Individual assignment IX

We work with the \(\mathsf{vocabulary}\) dataset, which contains 64 observations of 5 variables. We first apply the K-means algorithm. For clarity, we consider only the first 30 observations, which we split into 6 clusters. Incidentally, the plot also shows that Grade 8 increases with Grade 9. The clusters are chosen so that the variability within each cluster is small. The mean of each cluster can be read from the plot as well.
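
To make the K-means step concrete, here is a minimal sketch of Lloyd's algorithm in plain Python. It is not the code used to produce the plot. The data are synthetic stand-ins (30 points in two positively related variables), not the \(\mathsf{vocabulary}\) observations, and the names `kmeans`, `sq_dist` and `within_ss` are hypothetical helpers introduced only for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct observations
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(pt, centers[j]))
            for pt in points
        ]
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [pt for pt, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centers


def within_ss(points, labels, centers):
    """Total within-cluster sum of squares, the quantity K-means keeps small."""
    return sum(sq_dist(pt, centers[lab]) for pt, lab in zip(points, labels))


# ASSUMPTION: synthetic data standing in for the first 30 observations.
rng = random.Random(1)
first = [rng.gauss(0, 2) for _ in range(30)]
points = [(g, g + rng.gauss(0, 1)) for g in first]

labels, centers = kmeans(points, k=6)
print("cluster means:", centers)
print("within-cluster sum of squares:", within_ss(points, labels, centers))
```

The returned `centers` correspond to the cluster means read off the plot, and `within_ss` measures the within-cluster variability that the algorithm tries to keep small. The result depends on the random initialisation, so different seeds can give different partitions.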

For comparison, we can use the hierarchical agglomerative clustering algorithm: