What is the Significance of the ROC AUC Curve?

The ROC AUC curve helps you determine the right classification threshold for binary classification problems in machine learning. Classifiers output a probability value, and it is not always correct to use 0.5 as the threshold; the right choice depends on the type and domain of the problem. For example, in a legal case you want the false positive rate to be as low as possible, so the threshold in this case would be set very high. The term AUC, Area Under the Curve, measures the model's goodness of fit. It is also used for comparative analysis between different classifiers, to identify which one performs better.

Important points regarding the ROC AUC curve

  • It is useful in predicting the probability of a binary outcome.
  • The ROC curve is constructed by plotting the true positive rate (TPR) against the false positive rate (FPR).
  • In other words, it plots the false alarm rate (FPR) against the hit rate (TPR).
  • In summary, the ROC curve helps you determine the right threshold value for your problem by showing how the FPR and TPR values vary together.
  • The ROC curve is also used to compare the strength of different classifiers trained for a binary classification problem.
  • The area under the ROC curve is known as the AUC. In general, the greater the AUC value, the better the model.
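The points above can be sketched with scikit-learn, whose `roc_curve` and `roc_auc_score` functions compute the FPR/TPR pairs and the AUC directly. The synthetic data set and logistic regression model below are illustrative assumptions, not from the original post:

```python
# Minimal sketch: compute an ROC curve and its AUC with scikit-learn.
# The data set and model are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# Use predicted probabilities of the positive class, not hard 0/1 labels.
scores = model.predict_proba(X_test)[:, 1]

# fpr and tpr give one point on the ROC curve per candidate threshold.
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)
print(f"AUC = {auc:.3f}")
```

Passing probabilities rather than predicted labels is what lets the curve sweep over every possible threshold.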

The true positive rate is calculated as the number of true positives divided by the sum of the number of true positives and the number of false negatives.

True Positive Rate = True Positives / (True Positives + False Negatives)

The false positive rate is calculated as the number of false positives divided by the sum of the number of false positives and the number of true negatives.

False Positive Rate = False Positives / (False Positives + True Negatives)
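The two formulas above are plain ratios of confusion-matrix counts. A small worked example, using made-up counts for illustration:

```python
# Illustrative counts from a hypothetical confusion matrix.
tp, fn = 80, 20   # actual positives: 100
fp, tn = 10, 90   # actual negatives: 100

tpr = tp / (tp + fn)  # True Positive Rate (hit rate)
fpr = fp / (fp + tn)  # False Positive Rate (false alarm rate)
print(tpr, fpr)  # → 0.8 0.1
```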

[Figure: ROC AUC curve]

The 45-degree diagonal line represents random guessing, i.e., a 50-50 chance for each class.

As we move up the y-axis, the true positive rate increases, which is the desired outcome, but the false positive rate increases as well. The key question is whether the gain in true positive rate (y-axis) is greater than the gain in false positive rate (x-axis). So here is the catch: if the true positive rate is rising much faster than the false positive rate, we keep lowering the threshold until the false positive rate starts to climb rapidly.

The choice also varies from problem to problem; the type of problem determines where we fix the threshold.

In the above graph, up to the point 0.9 on the y-axis the gain in true positive rate is much larger than the gain in false positive rate. If we are dealing with a very sensitive problem, such as a legal case where we cannot take the risk of convicting an innocent defendant, we can set a high threshold of 0.9: the model should declare a person guilty only if the predicted probability is greater than 0.9 (a binary classification problem: guilty or not guilty).

Point to remember: a classifier with a higher AUC can occasionally perform worse in a specific region of the curve than a classifier with a lower AUC, but this is a rare situation.

In the above curve, the area under the curve is 0.87 (87%), which is known as the AUC score of the classifier.
