Data Analysis

What is the Coefficient of Determination | R Square

The coefficient of determination is a direct indicator of how well a regression model fits the data. It plays a role for regression similar to the one accuracy, precision or recall play for classification, but it applies only when the response is numeric. In more technical terms, the coefficient of determination is the proportion of the variance in the response variable ‘y’ that can be predicted from the predictor variable ‘x’. It is one of the most common ways to measure the strength of a linear model.

Note: It helps to understand covariance and correlation first, because the coefficient of determination is closely related to both.

The coefficient of determination (R Square) for a linear regression model with one independent variable can be calculated as below:

R Square = { ( 1 / N ) * Σ [ (x_i – xbar) * (y_i – ybar) ] / (σx * σy ) }^2

  • where N is the number of observations used to fit the model
  • Σ is the summation symbol
  • x_i is the x value for observation i
  • xbar is the mean of x values
  • y_i is the y value for observation i
  • ybar is the mean of y values
  • σx is the standard deviation of x (population form, dividing by N)
  • σy is the standard deviation of y (population form, dividing by N)

If you look carefully, this is just the square of the correlation coefficient between x and y.
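As a minimal sketch, the formula above can be written in plain Python. The function name and the small data set (`hours`, `score`) are made up for illustration and are not from any real study.

```python
import math

def r_squared_from_correlation(x, y):
    """R Square for one predictor, computed as the squared correlation coefficient."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Population standard deviations (divide by N), matching the 1/N factor in the formula.
    sigma_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
    sigma_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)
    # (1 / N) * sum of products of deviations, i.e. the covariance of x and y.
    covariance = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    r = covariance / (sigma_x * sigma_y)  # correlation coefficient
    return r ** 2

# Hypothetical illustrative data
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r2 = r_squared_from_correlation(hours, score)
print(round(r2, 2))  # 0.6
```

Here the correlation coefficient is about 0.77, so R Square is 0.6: the predictor accounts for 60% of the variance in the response.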

There is one more way to calculate the R Square value:

[Figure: R Square formula]
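The original formula image is not reproduced here. A widely used alternative form, which is assumed (not confirmed) to be the one intended, is R Square = 1 – (SS_res / SS_tot), where SS_res is the sum of squared residuals of the fitted line and SS_tot is the sum of squared deviations of y from ybar. A minimal sketch in plain Python, with hypothetical function names and made-up data:

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    return slope, intercept

def r_squared_from_residuals(x, y):
    """R Square = 1 - SS_res / SS_tot for a one-predictor least squares fit."""
    slope, intercept = fit_line(x, y)
    y_bar = sum(y) / len(y)
    # Variation left unexplained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation of y around its mean.
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical illustrative data
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r2_residuals = r_squared_from_residuals(hours, score)
print(round(r2_residuals, 2))  # 0.6
```

For a one-predictor least squares fit with an intercept, this form gives the same value as squaring the correlation coefficient.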

However, we do not need to remember the formula. A linear regression model calculates everything for us and displays it in the output summary; we just need to know what each term tells us about the model. Here is a sample summary of a linear regression model's output:

For a linear regression fitted with an intercept, the value of the coefficient of determination ranges from 0 to 1 on the data used to fit the model. 0 means there is no linear relationship between the predictor variable ‘x’ and the response variable ‘y’, and 1 means there is a perfect linear relationship between ‘x’ and ‘y’. However, getting exactly 0 or 1 is nearly impossible with a real data set. If we get exactly 0 or 1, we should re-check our code, because it usually points to an error.
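The two extremes can be reproduced with small synthetic data sets. The sketch below uses a hypothetical helper, `r_squared`, and made-up values; it also shows that an R Square of 0 only rules out a linear relationship, since y = x^2 is fully determined by x.

```python
def r_squared(x, y):
    """Squared correlation coefficient of x and y (the 1/N factors cancel)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy ** 2 / (s_xx * s_yy)

x = [-2, -1, 0, 1, 2]
perfect_line = [2 * xi + 1 for xi in x]  # y = 2x + 1, an exact linear relationship
parabola = [xi ** 2 for xi in x]         # y = x^2, symmetric around x = 0

print(r_squared(x, perfect_line))  # 1.0
print(r_squared(x, parabola))      # 0.0
```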

A high value of R Square indicates that the model is able to predict the response variable with less error.

Suppose you are working on a real-world problem and you are asked to predict how large a loan a person will take. After working for hours or even months, you come up with a model. Now the question is how you will describe the performance or strength of your model. This is where the coefficient of determination comes in. If its value is, say, 0.85, then you can say that your model fits well: it explains about 85% of the variance in your response variable.

So that’s all about the coefficient of determination. Please feel free to share your ideas and thoughts in the comments section below.
