# Pearson correlation coefficient

By definition, the Pearson correlation coefficient measures the strength of the linear relationship between two variables, and its value always lies between -1 and 1. It is useful when you are trying to measure the similarity between entities.

Let’s say that you have a list of movie ratings:
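As a stand-in for such a list, here is a small made-up data set. The user names, movie titles, and scores are all hypothetical and chosen only for illustration; the sketch also shows one way to turn the ratings into aligned arrays that can be compared.

```ruby
# Hypothetical sample data: user => { movie => rating on a 1-5 scale }.
ratings = {
  "Ann"  => { "Movie A" => 5, "Movie B" => 4, "Movie C" => 3, "Movie D" => 5 },
  "Ben"  => { "Movie A" => 3, "Movie B" => 2, "Movie C" => 1, "Movie D" => 3 },
  "Cleo" => { "Movie A" => 1, "Movie B" => 2, "Movie C" => 3, "Movie D" => 1 }
}

# Use the first user's movie list as the common ordering.
movies = ratings.values.first.keys

# Ratings as plain arrays, in the same movie order for every user.
vectors = ratings.transform_values { |scores| scores.values_at(*movies) }
```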

Now, the easiest way to measure the difference between two users’ ratings is the Euclidean distance.

*Formula for the Euclidean distance*
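For reference, the standard definition can be written as follows, assuming two rating vectors $p$ and $q$ of the same length $n$ (the symbol names are chosen here for illustration):

$$
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
$$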

There are a few problems with the Euclidean distance:

- it ranges from 0 to ∞, where 0 means the entities are identical (but this could easily be rescaled to [0, 1])
- it doesn’t quantify how well two data objects fit a line
- it is sensitive to scale, so normalized and unnormalized data give different results
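The points above can be illustrated with a minimal Ruby sketch. The method names, the sample vectors, and the `1 / (1 + d)` rescaling are assumptions made for this example; that rescaling is just one common choice, and it maps identical vectors to 1 rather than 0.

```ruby
# Euclidean distance between two equally long rating vectors.
def euclidean_distance(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.size == b.size

  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# One possible rescaling (an assumption, not the only option):
# identical vectors score 1, very distant ones approach 0.
def euclidean_similarity(a, b)
  1.0 / (1.0 + euclidean_distance(a, b))
end

generous = [5, 4, 3] # hypothetical user who uses the top of the scale
strict   = [3, 2, 1] # same preferences, every rating shifted down by 2

puts euclidean_distance(generous, generous)   # identical vectors: distance 0
puts euclidean_distance(generous, strict)     # non-zero, although the two agree on the ordering
puts euclidean_similarity(generous, strict)
```

The two hypothetical users rank the movies identically, yet the distance between them is clearly non-zero because one of them simply rates lower across the board.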

On the other hand, the Pearson correlation coefficient handles these issues pretty well. Let’s say that you have a user who rates almost all good movies with a 3 (and other movies with a 1 or a 2). You could easily say that this user’s 3 is actually a 5. The Pearson correlation coefficient does that normalization, because it measures each rating relative to the user’s own average and spread.

*Formula for the Pearson correlation coefficient*
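For reference, the usual sample form can be written as follows, assuming paired ratings $x_i$ and $y_i$ for $n$ movies and writing $\bar{x}$ and $\bar{y}$ for each user's mean rating (the symbol names are chosen here for illustration):

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

Subtracting the means removes each user's overall rating level, and dividing by the two square roots removes the difference in spread.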

Pearson correlation coefficient written in Ruby:
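A minimal sketch of how this could look, using the mean-centered form of the formula. The method name `pearson` and the decision to return `nil` when the coefficient is undefined are assumptions made for this example.

```ruby
# Pearson correlation coefficient of two equally long numeric arrays.
# Returns a Float between -1 and 1, or nil when the coefficient is undefined
# (fewer than two points, or one of the arrays is constant).
def pearson(xs, ys)
  raise ArgumentError, "arrays must have the same length" unless xs.size == ys.size

  n = xs.size
  return nil if n < 2

  mean_x = xs.sum.to_f / n
  mean_y = ys.sum.to_f / n

  # Deviations from each user's own mean: this is the normalization step.
  dx = xs.map { |x| x - mean_x }
  dy = ys.map { |y| y - mean_y }

  numerator   = dx.zip(dy).sum { |a, b| a * b }
  denominator = Math.sqrt(dx.sum { |a| a * a }) * Math.sqrt(dy.sum { |b| b * b })
  return nil if denominator.zero?

  numerator / denominator
end
```

Returning `nil` for a constant array is a design choice: a user who gives every movie the same rating has no spread, so the division would otherwise be by zero.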

Let’s try it out:
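A small try-out under made-up data: the three rating vectors below are hypothetical, and the snippet carries its own compact `pearson` method (an assumed name, with no handling of constant or empty arrays) so that it runs on its own.

```ruby
# Compact mean-centered Pearson coefficient; assumes equally long,
# non-constant arrays with at least two elements.
def pearson(xs, ys)
  n  = xs.size.to_f
  mx = xs.sum / n
  my = ys.sum / n
  dx = xs.map { |x| x - mx }
  dy = ys.map { |y| y - my }
  dx.zip(dy).sum { |a, b| a * b } /
    Math.sqrt(dx.sum { |a| a**2 } * dy.sum { |b| b**2 })
end

# Hypothetical ratings for the same four movies.
generous = [5, 4, 3, 5] # uses the top of the scale
strict   = [3, 2, 1, 3] # same taste, but a 3 is this user's best rating
opposite = [1, 2, 3, 1] # likes exactly what the others dislike

puts pearson(generous, strict).round(3)   # close to 1: same taste despite the lower scores
puts pearson(generous, opposite).round(3) # close to -1: opposite taste
```

The strict user never rates above 3, yet the coefficient treats the two users as being in full agreement, which is the normalization described earlier.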