AUC-ROC Curve: Visually Explained
In any machine learning problem, evaluating a model is as important as building one. Without evaluation, the whole process of formulating the business problem and building a machine learning model for it is in vain. In a previous article, I discussed the critical aspects of the Confusion Matrix. The ROC (Receiver Operating Characteristic) curve is another such tool for judging the suitability of a classification model. It helps us understand how well a model distinguishes between two classes of responses (e.g., whether an email is spam or not). What these responses are depends on the business problem we are trying to solve. A proper understanding of the ROC curve can therefore make our work considerably easier.
“ROC curve represents the extent of …”
Better models can accurately distinguish between the two responses, whereas a poor model will have difficulty doing so. We can see this behavior in the ROC curve and quantify it with the Area Under the ROC curve (AUC-ROC). Let us dive into the explanation with a simple example.
Probability Distribution of Responses:
Let’s assume we have a model that predicts whether an email is spam or not. Most machine learning classifiers give us a propensity (probability) for each email being spam. The probability distributions of these scores for the two classes can be drawn as shown below.
- Red distribution represents all the emails which are not spam
- Green distribution represents all the emails which are spam
- Shaded region represents the extent of the model’s inability to distinguish between the classes
- Optimum probability cutoff is chosen to maximize the business objective (detecting spam emails in this case).
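To make the picture concrete, here is a minimal sketch that draws hypothetical scores for the two classes and puts a number on how much their distributions overlap. The Beta-distributed scores and the `overlap_coefficient` helper are illustrative assumptions, not real email data or a library function. The overlap coefficient (the shared area of the two normalized histograms) is just one simple way to quantify the shaded region.

```python
import numpy as np

def overlap_coefficient(neg_scores, pos_scores, n_bins=20):
    """Shared area of the two normalized score histograms on [0, 1].

    0 means the classes never overlap; 1 means identical distributions.
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    neg_hist, _ = np.histogram(neg_scores, bins=bins)
    pos_hist, _ = np.histogram(pos_scores, bins=bins)
    neg_frac = neg_hist / neg_hist.sum()
    pos_frac = pos_hist / pos_hist.sum()
    # In each bin, only the smaller of the two class fractions is "shared".
    return float(np.minimum(neg_frac, pos_frac).sum())

rng = np.random.default_rng(seed=0)
# Hypothetical scores: "red" (not spam) skews toward 0, "green" (spam) toward 1.
not_spam_scores = rng.beta(a=2, b=5, size=1000)
spam_scores = rng.beta(a=5, b=2, size=1000)

print(f"overlap: {overlap_coefficient(not_spam_scores, spam_scores):.2f}")
```

A model that separates the classes better would push the two histograms apart and shrink this number toward 0.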
The optimum probability cutoff is used to separate the two responses (Spam or Not Spam): emails scoring above the cutoff are predicted as Spam, and the rest as Not Spam. This decision-making process is represented in the diagram below for a cutoff of 0.5.
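In code, applying a cutoff is a single comparison. This sketch assumes that a score exactly equal to the cutoff counts as Spam (label 1); that tie-breaking convention is a choice, not something fixed by the method. `predict_labels` is a hypothetical helper.

```python
import numpy as np

def predict_labels(probabilities, cutoff=0.5):
    """Return 1 (Spam) where probability >= cutoff, else 0 (Not Spam)."""
    return (np.asarray(probabilities) >= cutoff).astype(int)

probs = np.array([0.05, 0.40, 0.55, 0.90])
print(predict_labels(probs))              # [0 0 1 1]
print(predict_labels(probs, cutoff=0.3))  # [0 1 1 1]
```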
Theoretically, we can place this probability cutoff anywhere between 0 and 1. Each cutoff value yields a different set of predicted responses, and therefore a different confusion matrix and different evaluation metrics. This is presented in the diagram below for a cutoff of 0.5.
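Note that, when the cutoff lies within the region of overlap, all the wrong predictions (FP and FN) come from that region: outside it, every email falls on the correct side of the cutoff. The overlap can still contain correct predictions too.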
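The sketch below rebuilds the confusion matrix for a few cutoffs on a small hypothetical dataset (labels and probabilities are made up for illustration; 1 = Spam). `confusion_counts` is a hypothetical helper, and it again treats a score equal to the cutoff as Spam.

```python
import numpy as np

def confusion_counts(y_true, probabilities, cutoff):
    """Return (TP, FP, TN, FN) for one cutoff; label 1 = Spam, 0 = Not Spam."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(probabilities) >= cutoff).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return tp, fp, tn, fn

# Hypothetical toy data: 1 = Spam, 0 = Not Spam
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
probs = [0.10, 0.30, 0.45, 0.48, 0.60, 0.70, 0.80, 0.95]

for cutoff in (0.3, 0.5, 0.7):
    print(cutoff, confusion_counts(y_true, probs, cutoff))  # (TP, FP, TN, FN)
```

At a cutoff of 0.5 this gives (TP, FP, TN, FN) = (3, 1, 3, 1). The two mistakes are the spam email scored 0.48 and the legitimate email scored 0.60, both inside the score range [0.48, 0.60] where the two classes overlap.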
So far, we have seen how a probability cutoff turns the model's probabilities into predictions. Let us now see how this idea is used to plot a ROC curve and how to interpret it.
In the above section, we saw how the predictions and the confusion matrix vary with the cutoff we choose for our problem. To study how the evaluation metrics vary with the cutoff, we plot the ROC curve: the True Positive Rate (TPR) against the False Positive Rate (FPR) across different cutoff values, where
True Positive Rate = Sensitivity (Recall) = TP / (TP + FN)
False Positive Rate = 1 - Specificity = FP / (FP + TN)
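Both rates come straight from the confusion-matrix counts. A minimal sketch (the function name is hypothetical), using the counts (TP, FP, TN, FN) = (3, 1, 3, 1) as an example:

```python
def tpr_fpr(tp, fp, tn, fn):
    """Return (TPR, FPR) from confusion-matrix counts."""
    tpr = tp / (tp + fn)          # sensitivity, a.k.a. recall
    specificity = tn / (tn + fp)  # true negative rate
    fpr = 1 - specificity         # same value as fp / (fp + tn)
    return tpr, fpr

print(tpr_fpr(tp=3, fp=1, tn=3, fn=1))  # (0.75, 0.25)
```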
As we lower the probability cutoff, more emails are predicted as positive, so the sensitivity (TPR) of the model increases (or, at worst, stays the same). At the same time, more not-spam emails get flagged as spam, so specificity decreases and FPR (1 - specificity) increases. Conversely, as we raise the cutoff, we get more negative predictions, which increases specificity (FPR decreases) while decreasing sensitivity.
In short, TPR and FPR rise and fall together as the cutoff moves: neither can increase while the other decreases.
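The sketch below sweeps the cutoff from above the highest score down through every score on a small hypothetical dataset and records (FPR, TPR) at each step; these are the points an ROC curve connects. `roc_points` is a hypothetical helper; in practice, scikit-learn's `sklearn.metrics.roc_curve` traces the same curve for you.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (fpr, tpr, cutoffs), lowering the cutoff from +inf through every distinct score.

    A score >= cutoff is predicted Spam (1).
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    # Start above every score (nothing predicted Spam), then step down through the scores.
    cutoffs = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    fpr, tpr = [], []
    for c in cutoffs:
        predicted_spam = scores >= c
        tpr.append(np.sum(predicted_spam & (y_true == 1)) / n_pos)
        fpr.append(np.sum(predicted_spam & (y_true == 0)) / n_neg)
    return np.array(fpr), np.array(tpr), cutoffs

# Hypothetical toy data: 1 = Spam, 0 = Not Spam
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
probs = [0.10, 0.30, 0.45, 0.48, 0.60, 0.70, 0.80, 0.95]

fpr, tpr, cutoffs = roc_points(y_true, probs)
for c, f, t in zip(cutoffs, fpr, tpr):
    print(f"cutoff={c:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```

Neither array ever decreases as the cutoff drops, which is the "TPR and FPR move together" behavior in code form.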
AUC stands for Area Under the Curve; here it means the Area Under the ROC Curve (AUC-ROC). It is a metric that tells you how well separated your positive and negative responses are. Its value depends on the model we choose for the problem at hand.
In the above figure, note that the AUC depends only on the model we choose, not on any particular cutoff. It summarizes a model’s performance across all possible thresholds.
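Because the AUC summarizes all thresholds at once, it can also be computed without picking any cutoff. A known equivalence: the AUC equals the probability that a randomly chosen positive (spam) email receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way by comparing every spam/not-spam pair; the data are hypothetical and `auc_by_ranking` is an illustrative name, not a library function.

```python
import numpy as np

def auc_by_ranking(y_true, scores):
    """AUC as P(random spam score > random not-spam score), counting ties as 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Every (spam, not-spam) pair: a positive difference is a correctly ranked pair.
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

# Hypothetical toy data: 1 = Spam, 0 = Not Spam
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
probs = [0.10, 0.30, 0.45, 0.48, 0.60, 0.70, 0.80, 0.95]
print(auc_by_ranking(y_true, probs))  # 0.9375
```

Only the spam email scored 0.48 is outranked (by the legitimate email scored 0.60), so 15 of the 16 pairs are ordered correctly. This pairwise version needs memory proportional to (number of spam × number of not-spam emails), so for large datasets a sort-based implementation such as scikit-learn's `sklearn.metrics.roc_auc_score` is the practical choice.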
The greater the area under the curve (AUC), the better the model separates the responses (e.g., Spam and Not Spam). The figures below show the shape of the curve for different AUC values. Notice how the curve flattens toward the diagonal as the AUC decreases; the diagonal itself (AUC = 0.5) corresponds to a model no better than random guessing. In machine learning, we therefore aim for as high an AUC as possible, which we do by choosing the model that maximizes the area under its ROC curve.
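To compare models by AUC, the sketch below simulates scores from two hypothetical models: one whose spam and not-spam scores are well separated, and one with heavy overlap. The normally distributed scores (squashed into (0, 1) with a sigmoid) and the `roc_auc` helper are assumptions for illustration; the area is computed with the trapezoidal rule over the ROC points.

```python
import numpy as np

def roc_auc(y_true, scores):
    """Area under the ROC curve using the trapezoidal rule.

    Cutoffs run from +inf down through every distinct score;
    a score >= cutoff is predicted Spam (1).
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos, n_neg = np.sum(y_true == 1), np.sum(y_true == 0)
    cutoffs = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    tpr = np.array([np.sum((scores >= c) & (y_true == 1)) / n_pos for c in cutoffs])
    fpr = np.array([np.sum((scores >= c) & (y_true == 0)) / n_neg for c in cutoffs])
    # Sum of trapezoids between consecutive ROC points.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(seed=42)
n = 2000
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])  # 0 = Not Spam, 1 = Spam

# Hypothetical model A: spam scores sit far from not-spam scores.
probs_a = sigmoid(np.concatenate([rng.normal(-1.0, 1.0, n), rng.normal(1.0, 1.0, n)]))
# Hypothetical model B: the two score distributions overlap heavily.
probs_b = sigmoid(np.concatenate([rng.normal(-0.25, 1.0, n), rng.normal(0.25, 1.0, n)]))

print(f"model A AUC: {roc_auc(y, probs_a):.3f}")
print(f"model B AUC: {roc_auc(y, probs_b):.3f}")
```

Because the AUC depends only on how the scores rank the emails, the sigmoid (a monotonic transform) does not change it. Model A should come out well above model B, whose AUC sits much closer to the 0.5 of random guessing.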
How the probability distributions, the cutoff, and the ROC curve relate to one another is summarized in the animation below.
This animation shows how the different probability distributions produced by different models lead to different overlapping regions and, in turn, different ROC curves.
I hope this has given you a basic understanding of the AUC-ROC curve. We also learned how to use it to evaluate a model’s performance and compare it with other similar models.