Eigenvalues: a short introduction



Eigenvalues for 2 x 2 matrices

One matrix algebra operation plays a very important role in many of the multivariate methods: the calculation of eigenvalues (also known as latent roots) and eigenvectors. Eigen analysis is closely related to singular value decomposition (SVD). It is a technique that provides us with a summary of the data structure represented by a symmetrical matrix, such as would be obtained from correlations, covariances or distances.

An understanding of the underlying principles is essential for many multivariate methods. The relevance of these values will be shown graphically. (See the book Statistics and Data Analysis in Geology by Davis for a more complete description of this and of matrix methods in general.) This demonstration of an eigen analysis is restricted to a simple 2 x 2 matrix, which makes the graphical description easier.

Consider the following matrix in which the rows are the coordinates of a pair of points in 2-D space.

4 8

8 4

Graphically these points would be positioned as shown opposite.

Using the 0,0 coordinate as its centre it is possible to construct an ellipse, such that the two points fall on its perimeter.

A 2 x 2 matrix has two eigenvalues. In the above example they are 12 and -4, whose magnitudes (12 and 4) happen to be the lengths of the major and minor axes of the ellipse that encloses the points. Note that eigenvalues can only be found for a square matrix. The matrices considered here are also symmetrical (all correlation and covariance matrices are symmetrical, as are most distance matrices), which guarantees that the eigenvalues are real numbers. There will be as many eigenvalues as there are rows in the matrix, and there is an eigenvector associated with each eigenvalue.
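The two eigenvalues quoted above can be checked numerically. The following is a minimal sketch, assuming Python with NumPy is available (neither is used in the original text); the variable names are illustrative only.

```python
import numpy as np

# The 2 x 2 matrix from the text; each row is one point in 2-D space.
A = np.array([[4.0, 8.0],
              [8.0, 4.0]])

# eigh is intended for symmetric matrices and returns the
# eigenvalues in ascending order: -4 first, then 12.
eigenvalues, eigenvectors = np.linalg.eigh(A)

print(eigenvalues)        # one eigenvalue per row of the matrix
print(eigenvalues.sum())  # equals the sum of the diagonal of A (4 + 4)
```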

If you wish to draw the axes of the ellipse you will need, in addition to their lengths, information about their orientation, i.e. their coordinates. The eigenvectors are the coordinates that define the orientation of the axes, whose lengths are given by the eigenvalues. However, the eigenvectors, which are centred at 0,0, do not have unique values: any coordinate on an axis would be appropriate, so each eigenvector has an infinite number of possible values.
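To illustrate why the values are not unique, the sketch below (again assuming NumPy) tests three arbitrarily chosen points on the major axis of the example matrix. Each one satisfies the defining relationship, matrix times vector equals eigenvalue times vector.

```python
import numpy as np

A = np.array([[4.0, 8.0],
              [8.0, 4.0]])
eigenvalue = 12.0

# Three points on the same axis through the origin (the direction x = y).
# The particular points are arbitrary choices for illustration.
candidates = [np.array([1.0, 1.0]),
              np.array([0.5, 0.5]),
              np.array([-3.0, -3.0])]

for v in candidates:
    # An eigenvector v satisfies A v = eigenvalue * v.
    print(v, np.allclose(A @ v, eigenvalue * v))
```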

Although the justification will not be given here, it is possible to represent correlations as vectors. Again, for simplicity, we will restrict ourselves to two dimensions initially. The following plots show two variables that become increasingly correlated. The format of the correlation matrices is:

correlation of x with x correlation of x with y
correlation of y with x correlation of y with y

Note that a variable correlated with itself always has a correlation coefficient of 1.00, and that the correlation of x with y is the same as that of y with x. Hence the matrices are symmetrical. The rows in these matrices form the coordinates for two points. Also shown on the plots are the major and minor axes of the enclosing ellipses. The lengths of these axes are the eigenvalues of the correlation matrices. Thus, for the first matrix they are 1 and 1, for the second 1.25 and 0.75, etc. Note that the eigenvalues sum to 2, which is of course the number of variables.

1.00 0.00
0.00 1.00

1.00 0.25
0.25 1.00

1.00 0.50
0.50 1.00

1.00 0.75
0.75 1.00

1.00 1.00
1.00 1.00

An obvious question, but what trends do you notice in the above plots?

As the variables become more correlated the major axis becomes longer whilst the minor axis becomes proportionately shorter. The limit is reached when the two variables are perfectly correlated. Under these conditions the major axis has a length of 2.0, whilst the minor axis has a length of 0.
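The trend can be reproduced numerically. This is a minimal sketch, assuming NumPy; the helper name correlation_eigenvalues is hypothetical and not part of any package.

```python
import numpy as np

def correlation_eigenvalues(r):
    """Eigenvalues (largest first) of the 2 x 2 correlation matrix [[1, r], [r, 1]]."""
    R = np.array([[1.0, r],
                  [r, 1.0]])
    # eigvalsh returns ascending order, so reverse it.
    return np.linalg.eigvalsh(R)[::-1]

# The five correlations used in the plots above.
for r in [0.0, 0.25, 0.5, 0.75, 1.0]:
    major, minor = correlation_eigenvalues(r)
    print(f"r = {r:.2f}: major = {major:.2f}, minor = {minor:.2f}, "
          f"sum = {major + minor:.2f}")
```

For a 2 x 2 correlation matrix the two eigenvalues work out to 1 + r and 1 - r, which is why they always sum to 2.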

The eigenvectors for the major (first) axes have the same values in every plot, because the axes share the same orientation. These eigenvectors have the values 0.707 (on the x axis) and 0.707 (on the y axis). Why these two values when any pair of coordinates on the axis, such as 0.5 and 0.5, would also be applicable? The answer is that the chosen coordinates share a special relationship such that the sum of their squared values equals 1, i.e.

0.707² + 0.707² = 0.4998 + 0.4998 = 1.00 (within the limits of the significant figures employed).

This is a commonly applied scaling for eigenvectors, and it is certainly used in many multivariate statistical packages.
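The scaling amounts to dividing a vector by its length. A minimal sketch, assuming NumPy, with a hypothetical helper named to_unit_length:

```python
import numpy as np

def to_unit_length(v):
    """Rescale a vector so that its squared elements sum to 1."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Start from the alternative coordinates mentioned in the text.
u = to_unit_length([0.5, 0.5])

print(np.round(u, 3))   # both elements equal 1 / sqrt(2)
print(np.sum(u ** 2))   # sum of the squared elements
```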

Note that the eigen vectors for the minor axis also share the same coordinates: 0.707 and -0.707. The equality of these eigen vectors is an artefact imposed by the two dimensional limit.

Extending beyond 2 dimensions

Similar relationships apply for any symmetrical matrix. For example, in a 3 x 3 matrix each point is now defined by an x, y and z value. An ellipsoid could be drawn around these data (think of the ellipsoid as a rugby football). There would now be 3 eigenvalues, and their associated eigenvectors, which correspond to the three axes of the ellipsoid.

Consider the following 3 by 3 correlation matrices. Only the lower triangles are shown.

Matrix  Correlations   Eigenvalues  Vector 1  Vector 2  Vector 3
1       1.0            1.0           0.000     0.000     1.000
        0.0 1.0        1.0           0.000     1.000     0.000
        0.0 0.0 1.0    1.0           1.000     0.000     0.000

2       1.0            2.0          -0.577     0.085    -0.812
        0.5 1.0        0.5          -0.577    -0.746     0.332
        0.5 0.5 1.0    0.5          -0.577     0.660     0.480

3       1.0            3.0          -0.577     0.000     0.000
        1.0 1.0        0.0          -0.577     0.000     0.000
        1.0 1.0 1.0    0.0          -0.577     0.000     0.000

4       1.0            2.23          0.593    -0.525     0.611
        0.9 1.0        0.73          0.658    -0.121    -0.743
        0.3 0.6 1.0    0.04          0.464     0.842     0.273

Again, note how the correlation structure affects the eigenvalues. As the variables become more correlated, the size of the first eigenvalue increases. Note also that the squared elements of each eigenvector sum to 1.0, e.g. (-0.577)² + (-0.577)² + (-0.577)² = 1.00.
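The fourth matrix in the table can be analysed in the same way. The sketch below assumes NumPy and writes the matrix out in full from its lower triangle. Because an eigenvector and its negative describe the same axis, the signs printed by a given package may be reversed relative to the table.

```python
import numpy as np

# Matrix 4 from the table, written out in full from its lower triangle.
R = np.array([[1.0, 0.9, 0.3],
              [0.9, 1.0, 0.6],
              [0.3, 0.6, 1.0]])

eigenvalues, eigenvectors = np.linalg.eigh(R)

# eigh returns ascending order; reorder so the largest eigenvalue is first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]   # each column is one eigenvector

print(np.round(eigenvalues, 2))
print(np.round(eigenvectors, 3))
print(eigenvalues.sum())                   # the number of variables
print(np.sum(eigenvectors ** 2, axis=0))   # squared elements of each column
```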

Try to guess the approximate sizes of the eigen values for the next three 4 by 4 correlation matrices. Recall that the sum of the eigen values will be 4.0.

1   1.00      
    0.00 1.00    
    0.00 0.00 1.00  
    0.00 0.00 0.00 1.00
           
2   1.00      
    0.90 1.00    
    0.00 0.00 1.00  
    0.00 0.00 0.90 1.00
           
3   1.00      
    0.90 1.00    
    0.20 0.30 1.00  
    0.15 0.10 0.80 1.00

Hope you haven't cheated!

The answers are:

  1. 1.00, 1.00, 1.00, 1.00
  2. 1.90, 1.90, 0.10, 0.10
  3. 2.23, 1.48, 0.23, 0.06
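The answers can be checked numerically. This is a sketch only, assuming NumPy; from_lower_triangle is a hypothetical helper written for this example, which rebuilds each full symmetric matrix from the lower triangle shown above.

```python
import numpy as np

def from_lower_triangle(rows):
    """Build a full symmetric matrix from the rows of its lower triangle."""
    n = len(rows)
    R = np.zeros((n, n))
    for i, row in enumerate(rows):
        R[i, :i + 1] = row
    # Mirror the part below the diagonal into the upper triangle.
    return R + np.tril(R, -1).T

matrices = {
    1: [[1.00], [0.00, 1.00], [0.00, 0.00, 1.00], [0.00, 0.00, 0.00, 1.00]],
    2: [[1.00], [0.90, 1.00], [0.00, 0.00, 1.00], [0.00, 0.00, 0.90, 1.00]],
    3: [[1.00], [0.90, 1.00], [0.20, 0.30, 1.00], [0.15, 0.10, 0.80, 1.00]],
}

results = {}
for label, rows in matrices.items():
    R = from_lower_triangle(rows)
    results[label] = np.linalg.eigvalsh(R)[::-1]   # largest first
    print(label, np.round(results[label], 2),
          "sum =", round(float(results[label].sum()), 2))
```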

Why are the eigen values and vectors useful for multivariate analyses?

Imagine a set of data in 3-D space, i.e. where each point is defined by an x, y and z coordinate. Let us assume that these points are arranged in a cloud that resembles a rugby football.

[Figures: an animated 3-D plot and a matrix of equivalent 2 x 2 plots]

As this cloud of points rotates you should notice that it is very flat in one plane; this should be reflected in a small eigenvalue. The correlation matrix for these three variables is:

  x y z
x 1.00    
y 0.86 1.00  
z 0.02 0.06 1.00

giving 3 eigen values of 1.865, 1.003 and 0.132. Re-examine the animated plot and the 2-D matrix plots now that you know the eigenvalues of the correlation matrix. Can you understand how these values relate to this set of multivariate data?

Two more examples are presented below.

  x y z
x 1.00    
y 0.03 1.00  
z 0.02 0.04 1.00

Eigen values:

  1. 1.065
  2. 0.974
  3. 0.962

  x y z
x 1.00    
y 0.81 1.00  
z 0.72 0.91 1.00

Eigen values:

  1. 2.631
  2. 0.297
  3. 0.076
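One way to compare the three examples is to express each eigenvalue as a share of the total, which equals the number of variables. The sketch below assumes NumPy; the helper name and the descriptive labels are hypothetical. The correlations in the text are rounded to two decimal places, so the computed eigenvalues may differ slightly from those quoted.

```python
import numpy as np

def eigenvalue_shares(r_xy, r_xz, r_yz):
    """Eigenvalues (largest first) of a 3 x 3 correlation matrix,
    and each eigenvalue as a share of their total."""
    R = np.array([[1.0,  r_xy, r_xz],
                  [r_xy, 1.0,  r_yz],
                  [r_xz, r_yz, 1.0]])
    values = np.linalg.eigvalsh(R)[::-1]
    return values, values / values.sum()

# Labels are descriptive only; correlations are taken from the three tables.
examples = {
    "flat cloud":          (0.86, 0.02, 0.06),
    "nearly uncorrelated": (0.03, 0.02, 0.04),
    "strongly correlated": (0.81, 0.72, 0.91),
}

for name, correlations in examples.items():
    values, shares = eigenvalue_shares(*correlations)
    print(f"{name:20s}", np.round(values, 3), np.round(shares, 2))
```

When the variables are nearly uncorrelated the shares are close to one third each, whereas strong correlations concentrate most of the total in the first eigenvalue.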

As we have seen, these axes are defined by the eigenvalues and eigenvectors of a matrix derived from the original data. They give us information about the dimensionality of the data, about how the variables are related to each other, and about the main axes through the 'data cloud'.

Another way of understanding these new axes is to consider that when an ellipsoid's axes are drawn they are, in effect, new variables derived from the existing ones. This is a different approach to understanding multivariate data, and it forms the basis of PCA. It is, however, important to remember that it is eigen analysis that provides the method for defining these new variables.
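The idea of axes as new variables can be sketched as follows. This is an illustration under stated assumptions rather than a full PCA: it assumes NumPy, and the data are simulated (two correlated variables invented for this example), not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable

# Hypothetical data: 500 observations of two correlated variables.
x = rng.normal(size=500)
y = 0.8 * x + 0.6 * rng.normal(size=500)
data = np.column_stack([x, y])

# Standardise each variable, then form the correlation matrix.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
R = np.corrcoef(z, rowvar=False)

eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Each new variable is a weighted sum of the standardised originals,
# with the weights taken from one eigenvector.
new_variables = z @ eigenvectors

print(np.round(eigenvalues, 3))
print(np.round(new_variables.var(axis=0, ddof=1), 3))         # variances of the new variables
print(np.round(np.corrcoef(new_variables, rowvar=False), 3))  # correlations between them
```

In this sketch the variance of each new variable equals the corresponding eigenvalue, and the new variables are uncorrelated with each other.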

A simple way of creating a new variable is to make it a sum, i.e. a linear combination, of existing variables. This is explored in the next section.