The correlation coefficient, or more precisely the Pearson correlation coefficient, is a bivariate descriptive statistic that measures the linear dependency between two vectors of observations. Because its value lies in the range [-1, 1], it behaves much like a trigonometric function, in particular the cosine. In this article from Arman Computer Magazine, we therefore look at the correlation coefficient alongside the inner product of two vectors and arrive at the formula for the correlation coefficient through the inner product.
Correlation Coefficient
Suppose we want to investigate the (linear) relationship between two variables or random vectors (a set of paired values). According to a formula developed by statisticians such as Galton and Pearson, we must first standardize the observations so that their mean is zero and their variance is 1. The mean of the products of the paired standardized values is then equal to the correlation coefficient. The correlation coefficient is usually represented by the symbol R or r.
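$$r = \frac{\sum (X_i-\bar{X})(Y_i-\bar{Y})}{n\cdot\sigma_X\cdot\sigma_Y}$$
Here n is the number of paired observations, and the standard deviations of X and Y are the population versions (computed with divisor n). Assuming both means are zero, we will have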
$$r = \frac{\sum (X_i\cdot Y_i)}{\sqrt{\sum X_i^2\cdot\sum Y_i^2}}$$
For example, suppose we have two vectors of observations as follows. There are six observations, given as pairs of X and Y values.
X: 1, 5, -5, -3, -1, 3
Y: 2, 5, -5, -2, -1, 1
Here we have chosen values that have a mean of zero to simplify the calculations. According to the formula above, the correlation coefficient is r = 62/√(70·60) ≈ 0.9567. Note that the population standard deviation (divisor n = 6) is 3.41565 for the variable X and 3.16228 for the variable Y.
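The arithmetic for this example can be checked with a few lines of Python. This is only a minimal sketch that applies the zero-mean formula to the six pairs listed above. The variable names (`sum_xy`, `sum_xx`, `sum_yy`) are chosen here for illustration and do not come from the original article.

```python
import math
import statistics

# The six paired observations from the example (both have mean zero).
X = [1, 5, -5, -3, -1, 3]
Y = [2, 5, -5, -2, -1, 1]

# Because both means are zero, no centering is needed before summing.
sum_xy = sum(x * y for x, y in zip(X, Y))  # sum of products of pairs
sum_xx = sum(x * x for x in X)             # sum of squares of X
sum_yy = sum(y * y for y in Y)             # sum of squares of Y

# Zero-mean form of the Pearson correlation coefficient.
r = sum_xy / math.sqrt(sum_xx * sum_yy)

print(round(r, 4))                     # 0.9567
print(round(statistics.pstdev(X), 5))  # 3.41565 (population SD of X)
print(round(statistics.pstdev(Y), 5))  # 3.16228 (population SD of Y)
```

Note that `statistics.pstdev` divides by n, which matches the standard deviations quoted in the example; `statistics.stdev` divides by n - 1 and would give slightly larger values.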
Inner Product and Correlation Coefficient
Consider the data from the previous example in Cartesian coordinates. Six points are plotted in two-dimensional space, representing the observed values. The key point is that we consider the first coordinates of these points as one vector and the second coordinates as another vector, in order to measure the dependency between them.
Note: The purpose of the correlation coefficient is to gauge the strength of the relationship between two variables, which indicates how well one variable can be predicted from the other. We aim to measure the intensity of the relationship between the first and second components of the points. Here, we define vector X as consisting of the first components of the points and vector Y as consisting of the second components. The inner product of these two vectors, divided by the product of their lengths, gives the cosine of the angle between them, which indicates how similar their directions are. The inner product formula for vectors X and Y is given by:
$$\langle X,Y\rangle = X \cdot Y = \sum_i x_i\times y_i$$
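A short worked step may help connect the inner product to the angle between the vectors. The angle symbol θ and the Euclidean length ‖·‖ are notation introduced here for illustration and are not part of the original text:

$$\cos\theta = \frac{\langle X,Y\rangle}{\|X\|\cdot\|Y\|} = \frac{\sum_i x_i\cdot y_i}{\sqrt{\sum_i x_i^2\cdot\sum_i y_i^2}}$$

The right-hand side is the zero-mean form of the correlation coefficient. For the example vectors, the inner product is 62 and the squared lengths are 70 and 60, so the cosine is 62/√(70·60) ≈ 0.9567, the same value as r.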
In this context, instead of considering points in two-dimensional space, we look at the vectors formed by the values of the first and second components. The Pearson correlation coefficient formula can then be derived from the inner product of two vectors whose values have a mean of zero. When the data do not already have a mean of zero, we subtract the mean from each value to obtain zero-mean vectors. The following is an example of calculations based on the inner product of vectors and the calculation of the correlation coefficient using Excel functions. You can download this file and modify the calculations as needed.
Here is the file link.
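For readers who prefer code to a spreadsheet, the centering step described above can be sketched in Python as follows. This is an illustrative sketch, not a reproduction of the Excel file. The helper names (`inner`, `center`, `pearson_via_inner_product`) and the shifted data are hypothetical, introduced only to show that subtracting the mean first leaves the correlation unchanged.

```python
import math

def inner(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def center(v):
    """Subtract the mean from every value so the vector has mean zero."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]

def pearson_via_inner_product(x, y):
    """Correlation as the cosine of the angle between the centered vectors."""
    xc, yc = center(x), center(y)
    return inner(xc, yc) / math.sqrt(inner(xc, xc) * inner(yc, yc))

# Hypothetical data: the example vectors shifted by constants (10 and 20),
# so that their means are no longer zero.
X = [1, 5, -5, -3, -1, 3]
Y = [2, 5, -5, -2, -1, 1]
X_shifted = [x + 10 for x in X]
Y_shifted = [y + 20 for y in Y]

r = pearson_via_inner_product(X_shifted, Y_shifted)
print(round(r, 4))  # 0.9567, the same as for the original zero-mean vectors
```

Centering inside the function means the caller does not have to check whether the data already have a mean of zero.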