###### Introduction

Principal component analysis projects the data in the directions of maximum variance. However, in some cases, maximizing the variance of the data may not achieve highly discriminative features. Fisher’s Linear discriminant analysis (LDA) is also a projection-based method that seeks to find a vector that defines the projection direction of the data [1]. Unlike principal component analysis which maximizes the variance of the data [2], linear discriminant analysis uses the labels of the data and projects the data in the direction that maximizes the separation of the classes as shown in Figure 1.

How do we find the projection that maximizes the class separation of the data? LDA states that maximum separation is achieved when the distance between the two clusters is maximum (or far away from each other). We consider the classification of two classes first, then generalize to more than two classes.

###### Binary Class LDA

Let and denote the means of data after projection which belong to class RED and BLUE respectively.

Furthermore, let and denote the scatters (variances) of data after projection which belong to class RED and BLUE respectively.

To ensure maximum class separation, we want the means to be far away from each other and the projected data in the clusters to be close to each other. In other words, we want the mean difference to be large and the summation of the scatter to be small. Therefore, the objective is to maximize

The numerator can be expressed as

where is between-class scatter matrix.

The denominator can be expressed as

where is within-class scatter matrix.

Therefore, the objective function is

Taking the derivative and setting it equal to zero.

Given that is a scalar, the expression becomes

where is the eigenvector of and is the corresponding eigenvalue.

###### Multi-class LDA

When there are more than two classes, the between-class scatter matrix is defined as

where and is the global (overall) mean.

The within-class scatter matrix is defined as

where .

The number of components is defined as follows:

where is the number of classes.

Solving for Eigenvector and Eigenvalue

Compute and .

Compute .

where is an identity matrix.

Solve to obtain .

Substitute into to obtain .

Numerical Example

Consider the following dataset

Find by solving where is an identity matrix.

and

Substitute

Let , thus

###### References

[1] R. A. Fisher, “The Use of Multiple Measurements in Taxonomic Problems,” *Annals of Eugenics*, vol. 7, no. 2, pp. 179–188, 1936, doi: 10.1111/j.1469-1809.1936.tb02137.x.

[2] A. M. Martinez and A. C. Kak, “PCA versus LDA,” *IEEE Transactions on Pattern Analysis and Machine Intelligence*, vol. 23, no. 2, pp. 228–233, Feb. 2001, doi: 10.1109/34.908974.