Covariance and Correlation
Covariance and correlation both describe the relationship between two continuous variables. Covariance tells us whether the two variables tend to vary in the same direction (positive covariance) or in opposite directions (negative covariance). Its numerical value depends on the units of the variables, so it is hard to interpret on its own; in practice only the sign is read directly. Correlation, in contrast, measures how strong the linear relationship between the two variables is, on a fixed scale from -1 to +1. A correlation of 0 means there is no linear relationship between the variables; some other functional relationship may still exist.
Let’s understand these terms in detail:
With covariance, it is mainly the sign that matters. A positive value shows that both variables tend to vary in the same direction, and a negative value shows that they tend to vary in opposite directions.
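The sample covariance between two variables x and y can be calculated as follows (n - 1 is the usual denominator for a sample; some texts divide by n instead):

$$S_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

where: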
- x̄ is the sample mean of x
- ȳ is the sample mean of y
- x_i and y_i are the values of x and y for the i-th record in the sample
- n is the number of records in the sample
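The definition above can be sketched in a few lines of plain Python. This is only an illustration: it assumes the usual sample form with n - 1 in the denominator, and the function name `sample_covariance` and the data are made up for the example.

```python
def sample_covariance(x, y):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two records are needed")
    x_bar = sum(x) / n  # sample mean of x
    y_bar = sum(y) / n  # sample mean of y
    # Sum of products of deviations from the means
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return total / (n - 1)


# Made-up data: one pair moves together, the other moves in opposite directions
hours_studied = [1, 2, 3, 4, 5]
exam_score = [2, 4, 6, 8, 10]
hours_of_tv = [10, 8, 6, 4, 2]

cov_same = sample_covariance(hours_studied, exam_score)
cov_opposite = sample_covariance(hours_studied, hours_of_tv)

print(cov_same)      # 5.0  (positive: same direction)
print(cov_opposite)  # -5.0 (negative: opposite direction)
```

Only the signs of the two results carry a direct interpretation here; the magnitudes depend on the units chosen for the data.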
Significance of the formula:
- Numerator: the sum, over all records, of the deviation of x from its mean multiplied by the deviation of y from its mean. Each product is positive when x and y lie on the same side of their means and negative when they lie on opposite sides.
- Unit of covariance: unit of x multiplied by unit of y.
- Hence, if we change the unit of a variable (for example, kilometres to metres), the covariance takes a new value, but its sign stays the same.
- Therefore the numerical value of covariance is hard to interpret on its own. A positive value means the variables tend to vary in the same direction; a negative value means they tend to vary in opposite directions.
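The effect of a unit change can be checked with a small sketch. The helper `sample_covariance` and the distance and time figures are hypothetical, and the n - 1 denominator is assumed. Converting x from kilometres to metres should multiply the covariance by 1000 while leaving its sign unchanged.

```python
def sample_covariance(x, y):
    """Sample covariance with an n - 1 denominator (assumed convention)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


# Made-up data: distance travelled and time taken
distance_km = [1, 2, 3, 4]
time_min = [3, 5, 4, 8]

# Same distances expressed in metres
distance_m = [d * 1000 for d in distance_km]

cov_km = sample_covariance(distance_km, time_min)
cov_m = sample_covariance(distance_m, time_min)

print(cov_km)          # covariance in km * min
print(cov_m)           # covariance in m * min, about 1000 times larger
print(cov_m / cov_km)  # ratio is approximately 1000
```

Both values are positive, so the conclusion about direction is the same in either unit even though the numbers differ by a factor of about 1000.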
Because covariance only tells us the direction, which is not enough to understand the relationship completely, we divide the covariance by the product of the standard deviations of x and y. The result is the correlation coefficient, which always lies between -1 and +1.
- A value of -1 or +1 means that the two variables have a perfect linear relationship.
- A negative value means that one variable tends to decrease as the other increases; the closer the value is to -1, the stronger this linear relationship.
- A positive value means that the variables tend to vary in the same direction; the closer the value is to +1, the stronger this linear relationship.
- If the correlation coefficient is 0, there is no linear relationship between the variables, but some other functional relationship may still exist.
- If the two variables are independent, the correlation between them is 0 (a sample estimate will be close to 0 rather than exactly 0). The converse does not hold: a correlation of 0 rules out only a linear relationship.
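The last two points can be illustrated with a small sketch in which y is completely determined by x (y = x²) and yet the correlation coefficient is 0. The function `pearson_r` is a hypothetical helper that divides the sample covariance by the product of the sample standard deviations; the data are made up.

```python
import math


def pearson_r(x, y):
    """Correlation coefficient: covariance / (std of x * std of y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)


# y depends on x exactly, but not linearly
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]  # [4, 1, 0, 1, 4]

r = pearson_r(x, y)
print(r)  # 0.0: no linear relationship, although y = x**2 holds exactly
```

The positive and negative products of deviations cancel out because the data are symmetric around x = 0, so the coefficient misses the quadratic relationship entirely.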
The correlation between x and y can be calculated as follows:

$$r_{xy} = \frac{S_{xy}}{S_x \, S_y}$$

where:
- S_xy is the covariance between x and y.
- S_x and S_y are the standard deviations of x and y respectively.
- r_xy is the correlation coefficient.
- The correlation coefficient is a dimensionless quantity, so its value stays the same if we change the units of x and y.
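As a rough check of these properties, the sketch below computes the coefficient for the same made-up data in two temperature units, and for two perfectly linear relationships. The helper `pearson_r` and all figures are assumptions for illustration, with n - 1 used in both the covariance and the standard deviations.

```python
import math


def pearson_r(x, y):
    """Correlation coefficient r_xy = S_xy / (S_x * S_y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)


# Made-up data: temperature and sales
temp_c = [10, 15, 20, 25, 30]
sales = [12, 18, 21, 30, 33]

# Same temperatures in Fahrenheit
temp_f = [c * 9 / 5 + 32 for c in temp_c]

r_c = pearson_r(temp_c, sales)
r_f = pearson_r(temp_f, sales)
print(round(r_c, 4), round(r_f, 4))  # the two values agree

# Perfect linear relationships give approximately +1 and -1
r_up = pearson_r(temp_c, [2 * c + 1 for c in temp_c])
r_down = pearson_r(temp_c, [-3 * c for c in temp_c])
print(r_up, r_down)
```

Because the units cancel in the ratio, switching from Celsius to Fahrenheit leaves the coefficient unchanged up to floating-point rounding, whereas the covariance itself would change.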
[Figure: graph illustrating the significance of the correlation coefficient]