Individual assignment VI

For the principal component analysis, I chose the dataset . We will investigate the relationships between different kinds of crime committed in the 50 US states and compare the numbers of crimes across American regions. First, let us look at the correlation structure of the data:

First, we notice that the numbers of all kinds of crime are positively (and in some cases strongly) correlated with the population, which is understandable. Among the crimes, increases the most with the population. An especially strong correlation can be seen between and , and or between and . This also makes sense, since rape can be considered a form of assault.
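To make the notion of correlation used here concrete, the following is a minimal sketch of the Pearson correlation coefficient in plain Python. It assumes the correlation structure above refers to Pearson correlations, which the report does not state. The function name `pearson`, the variable names and all numbers are hypothetical and are not taken from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of the deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations of each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical toy data: population and a crime count for five states.
population = [500, 1200, 3000, 4500, 8000]
crime_count = [40, 110, 250, 400, 690]

r = pearson(population, crime_count)
print(f"correlation between population and crime count: {r:.3f}")
```

A value close to 1 means that the crime count grows almost linearly with the population, which is the kind of relationship described above.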

Now let us apply the PCA:

## Importance of components:
##                              Comp.1       Comp.2       Comp.3       Comp.4
## Standard deviation     5024.0406404 741.63107324 2.014138e+02 1.272909e+02
## Proportion of Variance    0.9762887   0.02127393 1.569099e-03 6.267104e-04
## Cumulative Proportion     0.9762887   0.99756261 9.991317e-01 9.997584e-01
##                              Comp.5       Comp.6       Comp.7       Comp.8
## Standard deviation     6.392995e+01 4.622322e+01 4.250024e+00 2.023751e+00
## Proportion of Variance 1.580814e-04 8.264039e-05 6.986420e-07 1.584112e-07
## Cumulative Proportion  9.999165e-01 9.999991e-01 9.999998e-01 1.000000e+00

The first component explains almost all of the variability (about 97.6%), so we would retain the first component and perhaps also the second; together they account for about 99.8%. We can visualise the variability of the components as well:

The variances of the other principal components are negligible compared with the variance of the first principal component.
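The proportions in the summary table follow directly from the component standard deviations: each variance is a squared standard deviation, and each proportion is that variance divided by the total. The short sketch below, in plain Python and independent of the software that produced the table, recomputes them from the printed standard deviations. The variable names are only illustrative.

```python
# Component standard deviations copied from the summary table above.
std_devs = [
    5024.0406404, 741.63107324, 2.014138e+02, 1.272909e+02,
    6.392995e+01, 4.622322e+01, 4.250024e+00, 2.023751e+00,
]

# The variance of each component is its squared standard deviation.
variances = [s ** 2 for s in std_devs]
total_variance = sum(variances)

# Proportion of variance: the share of the total carried by each component.
proportions = [v / total_variance for v in variances]

# Cumulative proportion: the running sum of those shares.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

for i, (p, c) in enumerate(zip(proportions, cumulative), start=1):
    print(f"Comp.{i}: proportion = {p:.7f}, cumulative = {c:.7f}")
```

A standard deviation of about 5024 could not arise from eight standardised variables, whose total variance would be 8, so the analysis was evidently run on the raw variables. The dominance of the first component may therefore largely reflect the variable with the largest scale, presumably the population, rather than a common pattern across the crimes.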

Next, let us compare the numbers of different crimes across the US regions. We will compare two triplets: together with and then and since the two groups appear to be measured in different units.

In the first plot, which compares the four main regions (named after their location), we can see the highest rate of murder in the South and the West. Similarly, the highest rate of rape is in the West and the South. Along the robbery axis, we cannot see any clear pattern, that is, no “cloud” formed by any of the regions, which suggests no significant difference between the regions. This can be confirmed by the following boxplot:

Turning to the second plot, which uses a finer division into regions, we observe the lowest rates of all three crimes in New England and WN Central. The highest rate of robbery can be found in EN Central and Mid Atlantic. The highest rate of murder is in ES and WS Central.

Finally, we can compare the second group:

The interpretation is similar to that in the first case. For example, we observe a high number of larcenies in the West and high numbers of burglaries and auto thefts in the Northeast. Overall, the lowest crime rate seems to be in the Midwest.