Confusion Matrix in Machine Learning

Have you ever expected great results from your machine learning model, only to get poor accuracy? You’ve put in the effort, so what went wrong, and how can you fix it? There are many ways to assess a classification model, but the confusion matrix is one of the most reliable options. It shows how well your model performed and where it made errors, which helps you improve it. Beginners often find the confusion matrix confusing, but it’s actually simple and powerful. This tutorial explains what a confusion matrix in machine learning is and how it provides a complete view of your model’s performance.

Despite its name, you’ll see that a confusion matrix is straightforward and effective. Let’s explore the confusion matrix together!


What is a Confusion Matrix?

A confusion matrix is a performance evaluation tool in machine learning that summarizes how a classification model’s predictions compare with the actual labels. It displays the number of true positives, true negatives, false positives, and false negatives. This matrix helps you analyze model performance, identify misclassifications, and improve predictive accuracy.

A confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the total number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives us a holistic view of how well our classification model is performing and what kinds of errors it is making.
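To make the N x N definition concrete, here is a minimal pure-Python sketch that tallies actual-versus-predicted pairs into a matrix. It needs no libraries. The function name `build_confusion_matrix` and the toy pet labels are hypothetical and chosen only for illustration. The sketch puts actual classes on the rows and predicted classes on the columns, which is the orientation scikit-learn uses. The transposed layout is also common, so check which one a given tool follows.

```python
from typing import Hashable, List, Sequence


def build_confusion_matrix(
    actual: Sequence[Hashable],
    predicted: Sequence[Hashable],
    labels: Sequence[Hashable],
) -> List[List[int]]:
    """Return an N x N matrix where cell [i][j] counts instances whose
    actual class is labels[i] and whose predicted class is labels[j]."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    index = {label: i for i, label in enumerate(labels)}  # label -> row/column position
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1  # one instance lands in exactly one cell
    return matrix


actual = ["cat", "dog", "dog", "bird", "cat", "dog"]
predicted = ["cat", "dog", "cat", "bird", "cat", "bird"]
cm = build_confusion_matrix(actual, predicted, labels=["cat", "dog", "bird"])
for row in cm:
    print(row)
# [2, 0, 0]
# [1, 1, 1]
# [0, 0, 1]
```

The diagonal holds the correct predictions (2 + 1 + 1 = 4 of the 6 instances). Every off-diagonal cell is a specific kind of error, such as the one actual dog that was predicted as a cat.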

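For a binary classification problem, we would have a 2 x 2 matrix with 4 values:

| Predicted \ Actual | Positive | Negative |
|---|---|---|
| Positive | TP | FP |
| Negative | FN | TN |

Let’s decipher the matrix. The target variable has two values, Positive and Negative. The columns represent the actual values of the target variable, and the rows represent the values predicted by the model.

But wait, what are TP, FP, FN, and TN here? That’s the crucial part of a confusion matrix. Let’s understand each term below.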

Important Terms in a Confusion Matrix

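True Positive (TP)

The actual value was positive, and the model predicted a positive value.

True Negative (TN)

The actual value was negative, and the model predicted a negative value.

False Positive (FP) – Type I Error

The actual value was negative, but the model predicted a positive value. This is also known as a Type I error.

False Negative (FN) – Type II Error

The actual value was positive, but the model predicted a negative value. This is also known as a Type II error.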
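The four outcomes named above can be counted directly from paired lists of actual and predicted labels. The sketch below assumes a binary problem where the label `1` marks the positive class, which is a parameter you can change. The function `count_outcomes` and the sample labels are hypothetical.

```python
def count_outcomes(actual, predicted, positive=1):
    """Count TP, TN, FP, FN for a binary problem.

    `positive` is the label treated as the positive class; every other
    label is treated as negative.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1  # predicted negative, actually negative
        elif p == positive:
            fp += 1  # predicted positive, actually negative (Type I error)
        else:
            fn += 1  # predicted negative, actually positive (Type II error)
    return tp, tn, fp, fn


actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
print(count_outcomes(actual, predicted))  # (3, 3, 1, 1)
```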

Let me give you an example to better understand this. Suppose we had a classification dataset with 1000 data points. We fit a classifier (say, logistic regression or a decision tree) on it and get the confusion matrix below:

[Figure: confusion matrix of the classifier on the 1,000-point dataset, with its TP, TN, FP, and FN counts]

This turned out to be a pretty decent classifier for our dataset, considering the relatively larger number of true positive and true negative values.

Remember the Type I and Type II errors. Interviewers love to ask the difference between these two!

Why Do We Need a Confusion Matrix?

Before we answer this question, let’s think about a hypothetical classification problem.

Let’s say you want to predict which people are infected with a contagious virus before they show symptoms, so that you can isolate them from the healthy population (ringing any bells yet?). The two values for our target variable would be Sick and Not Sick.

Now, you must be wondering why we need a confusion matrix when we have our all-weather friend – Accuracy. Well, let’s see where classification accuracy falters.

Our dataset is an example of an imbalanced dataset. There are 960 data points for the negative class and 40 data points for the positive class. This is how we’ll calculate the accuracy:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Let’s see how our model performed:

[Figure: confusion matrix of the virus model’s predictions]

The total outcome values are:

TP = 30, TN = 930, FP = 30, FN = 10

So, the accuracy of our model turns out to be:

$$\text{Accuracy} = \frac{30 + 930}{30 + 930 + 30 + 10} = \frac{960}{1000} = 0.96$$

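But this number gives the wrong idea about the result. Think about it.

Our model seems to be saying, “I can predict sick people 96% of the time.” That is not what the number means. Because 960 of the 1,000 people are not sick, the accuracy is dominated by the negative class. The model correctly clears 930 healthy people, yet it misses 10 of the 40 sick people, who remain free to spread the virus!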
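A quick way to see why 96% is misleading is to compare the model with a trivial baseline. The sketch below uses a hypothetical classifier that labels everyone “Not Sick.” With 40 sick and 960 healthy people (the totals implied by the counts above), that baseline scores exactly the same accuracy while catching no one.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Counts for the virus model in this example
model_acc = accuracy(tp=30, tn=930, fp=30, fn=10)

# Hypothetical baseline that predicts "Not Sick" for everyone:
# all 40 sick people become false negatives, all 960 healthy people true negatives
baseline_acc = accuracy(tp=0, tn=960, fp=0, fn=40)

print(f"model accuracy:    {model_acc:.2f}")     # 0.96
print(f"baseline accuracy: {baseline_acc:.2f}")  # 0.96, without detecting a single sick person
```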

Do you think this is a correct metric for our model, given the seriousness of the issue? Shouldn’t we be measuring how many positive cases we can predict correctly to arrest the spread of the contagious virus? Or maybe, out of the cases predicted as positive, how many are actually positive, to check the reliability of our model?

This is where we come across the dual concept of Precision and Recall.

How to Calculate Confusion Matrix for a 2-class Classification Problem?

To calculate the confusion matrix for a 2-class classification problem, you will need to know the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

Once you have these values, you can arrange them into the confusion matrix using the following table:

| Predicted \ Actual | TRUE | FALSE |
|---|---|---|
| Positive | True positives (TP) | False positives (FP) |
| Negative | False negatives (FN) | True negatives (TN) |

Here is an example of how to calculate the confusion matrix for a 2-class classification problem:

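# Counts of each outcome (example values)
TP = 100  # true positives
TN = 200  # true negatives
FP = 50   # false positives
FN = 150  # false negatives

# Rows = predicted class (positive, negative), columns = actual class (TRUE, FALSE),
# matching the table above
confusion_matrix = [[TP, FP],
                    [FN, TN]]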

The confusion matrix can be used to calculate a variety of metrics, such as accuracy, precision, recall, and F1 score.
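As a sketch of how those metrics follow from the four counts, the function below computes accuracy, precision, recall, and F1 from the example values above (TP = 100, TN = 200, FP = 50, FN = 150). The function name is hypothetical. Returning 0.0 when a denominator is zero is one common convention, not the only option; scikit-learn exposes a similar choice through its `zero_division` parameter.

```python
def classification_metrics(tp, tn, fp, fn):
    """Derive common metrics from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Convention used here: report 0.0 when a denominator is zero
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts from the 2-class example above
metrics = classification_metrics(tp=100, tn=200, fp=50, fn=150)
for name, value in metrics.items():
    print(f"{name:>9}: {value:.3f}")
# prints one per line: accuracy 0.600, precision 0.667, recall 0.400, f1 0.500
```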

Precision vs. Recall

Precision tells us how many of the cases predicted as positive actually turned out to be positive.

Here’s how to calculate Precision:

$$\text{Precision} = \frac{TP}{TP + FP}$$

This tells us how reliable the model’s positive predictions are.

Recall tells us how many of the actual positive cases we were able to predict correctly with our model.

And here’s how we can calculate Recall:

$$\text{Recall} = \frac{TP}{TP + FN}$$


We can easily calculate Precision and Recall for our model by plugging the values into the above equations:

$$\text{Precision} = \frac{30}{30 + 30} = 0.5, \qquad \text{Recall} = \frac{30}{30 + 10} = 0.75$$

50% of the cases predicted as positive turned out to be actual positive cases, whereas 75% of the actual positives were successfully predicted by our model. Awesome!
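Plugging the same counts into code confirms these numbers. It also shows what the two metrics reveal about a hypothetical baseline that predicts “Not Sick” for everyone, even though that baseline reaches the same 96% accuracy. The helper names are illustrative. Returning `None` for an undefined precision (when there are no positive predictions at all) is a choice made here for clarity.

```python
def precision(tp, fp):
    """Share of positive predictions that are correct; None if nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else None


def recall(tp, fn):
    """Share of actual positives that the model found; None if there are no positives."""
    return tp / (tp + fn) if (tp + fn) else None


# Virus model from this example (TP = 30, FP = 30, FN = 10)
print(precision(tp=30, fp=30))  # 0.5
print(recall(tp=30, fn=10))     # 0.75

# Hypothetical baseline that predicts "Not Sick" for everyone (TP = 0, FP = 0, FN = 40)
print(precision(tp=0, fp=0))    # None: it never predicts "Sick"
print(recall(tp=0, fn=40))      # 0.0: every sick person is missed
```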

Precision is a useful metric in cases where False Positives are a bigger concern than False Negatives.

Precision is important in music or video recommendation systems, e-commerce websites, and similar applications, where wrong recommendations could lead to customer churn and harm the business.

Recall is a useful metric in cases where False Negatives are a bigger concern than False Positives.

Recall is important in medical cases, where raising a false alarm is acceptable but actual positive cases must not go undetected!

In our example of a contagious virus, Recall, which measures the ability to capture all actual positives, is the better metric. We want to avoid mistakenly releasing an infected person into the healthy population, where they could spread the virus. This is why accuracy is inadequate for evaluating our model. The confusion matrix, and recall in particular, gives a more insightful measure in such critical scenarios.

But there will be cases where there is no clear distinction between whether Precision or Recall is more important. What should we do in those cases? We combine them!

What is F1-Score?

In practice, when we try to increase the precision of our model, the recall usually goes down, and vice versa. The F1-score captures both trends in a single value:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

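F1-score is the harmonic mean of Precision and Recall, so it gives a combined idea of these two metrics. Because the harmonic mean is pulled toward the smaller value, the F1-score is high only when both Precision and Recall are high. For a fixed average of the two, it is largest when Precision equals Recall.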
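Here is the F1-score for the virus example (Precision = 0.5, Recall = 0.75). It is computed from the formula and, as a cross-check, with `statistics.harmonic_mean` from Python’s standard library. The second pair of numbers uses made-up values to show how the harmonic mean penalizes a large gap between Precision and Recall.

```python
from statistics import harmonic_mean

# Values from the virus example
precision, recall = 0.5, 0.75

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))                                  # 0.6
print(round(harmonic_mean([precision, recall]), 3))  # 0.6, same value
print(round((precision + recall) / 2, 3))            # 0.625, arithmetic mean for comparison

# Hypothetical extreme case: perfect precision, very poor recall
p, r = 1.0, 0.1
print(round(2 * p * r / (p + r), 3))  # 0.182, dragged toward the low recall
print(round((p + r) / 2, 3))          # 0.55, the arithmetic mean hides the problem
```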

But there is a catch here. The interpretability of the F1-score is poor. This means that we don’t know what our classifier is maximizing – precision or recall. So, we use it in combination with other evaluation metrics, giving us a complete picture of the result.
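The precision/recall trade-off is easiest to see by moving a decision threshold. The sketch assumes a hypothetical model that outputs a score per instance and predicts positive when that score is at or above the threshold. The scores and labels are made up. The sketch reports all three metrics together rather than F1 alone.

```python
# Hypothetical model scores (higher = more likely positive) and true labels
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]


def precision_recall_f1(threshold):
    """Predict positive when score >= threshold, then score the predictions."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


for t in (0.3, 0.5, 0.85):
    p, r, f = precision_recall_f1(t)
    print(f"threshold={t:.2f}  precision={p:.3f}  recall={r:.3f}  f1={f:.3f}")
# threshold=0.30  precision=0.625  recall=1.000  f1=0.769
# threshold=0.50  precision=0.667  recall=0.800  f1=0.727
# threshold=0.85  precision=1.000  recall=0.400  f1=0.571
```

Raising the threshold makes the model more conservative: precision climbs while recall falls, and F1 alone does not tell you which of the two moved.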

Confusion Matrix Using Scikit-learn in Python

You know the theory – now let’s put it into practice. Let’s code a confusion matrix with the Scikit-learn (sklearn) library in Python.


Sklearn has two great functions: confusion_matrix() and classification_report().
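Here is a minimal sketch of both functions on a small, made-up binary example; it requires scikit-learn. `confusion_matrix()` puts actual labels on the rows and predicted labels on the columns, with labels in sorted order. For 0/1 labels it is therefore laid out as [[TN, FP], [FN, TP]], which is transposed and reordered relative to the 2 x 2 table earlier in this article. `.ravel()` flattens it into the four counts.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical binary labels: 1 = positive, 0 = negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 1]
#  [1 3]]

# Rows = actual (0, 1), columns = predicted (0, 1), so the layout is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = cm.ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")

# Per-class precision, recall, F1-score and support, plus averages
print(classification_report(y_true, y_pred))
```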

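Micro average computes precision/recall/f1-score globally, by pooling the TP, FP, and FN counts of all the classes before dividing.


Macro average is the unweighted mean of the per-class precision/recall/f1-score, so every class counts equally regardless of its size.


Weighted average is the mean of the per-class precision/recall/f1-score, weighted by each class’s support (the number of actual instances of that class).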
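To see how the three averages differ, the sketch below scores a made-up 3-class example with unequal class sizes. It uses scikit-learn’s `precision_score` and its `average` parameter (`None`, `"micro"`, `"macro"`, `"weighted"`). The comments show the hand calculation behind each value.

```python
from sklearn.metrics import classification_report, precision_score

# Hypothetical 3-class data; support (true count) is a = 5, b = 2, c = 1
y_true = ["a", "a", "a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "a", "a", "b", "b", "c", "c"]

# Per-class precision, in sorted label order: a = 4/4, b = 1/2, c = 1/2
per_class = precision_score(y_true, y_pred, average=None)

micro = precision_score(y_true, y_pred, average="micro")        # pooled: 6 correct / 8 predicted
macro = precision_score(y_true, y_pred, average="macro")        # (1.0 + 0.5 + 0.5) / 3
weighted = precision_score(y_true, y_pred, average="weighted")  # (5*1.0 + 2*0.5 + 1*0.5) / 8

print(per_class)
print(round(micro, 4), round(macro, 4), round(weighted, 4))  # 0.75 0.6667 0.8125
print(classification_report(y_true, y_pred))
```

For single-label multiclass data like this, the micro average equals plain accuracy (6 of 8 correct). That is why `classification_report()` prints it as the “accuracy” row instead of a separate “micro avg” row.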

Confusion Matrix for Multi-Class Classification

How would a confusion matrix in machine learning work for a multi-class classification problem? Well, don’t scratch your head! We will have a look at that here.

Let’s draw a confusion matrix for a multiclass problem where we have to predict whether a person loves Facebook, Instagram, or Snapchat. The confusion matrix would be a 3 x 3 matrix like this:

[Figure: 3 x 3 confusion matrix for the Facebook, Instagram, and Snapchat classes]

The true positive, true negative, false positive, and false negative for each class would be calculated by adding the cell values as follows:

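For a given class, the true positives (TP) are the diagonal cell where both the actual and the predicted class are that class. The false positives (FP) are the sum of the other cells where the model predicted that class but the actual class was different. The false negatives (FN) are the sum of the other cells where the actual class was that class but the model predicted a different one. The true negatives (TN) are the sum of all remaining cells, where neither the actual nor the predicted class is that class.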
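The per-class counts can also be computed with a short function. This sketch assumes the matrix has actual classes on the rows and predicted classes on the columns; if yours is transposed, FP and FN swap. The 3 x 3 counts below are hypothetical, not the values from the figure, and the function name is illustrative.

```python
def per_class_counts(cm, labels):
    """Return TP, FP, FN, TN for each class of an N x N confusion matrix.

    Assumes cm[i][j] counts instances whose actual class is labels[i]
    and whose predicted class is labels[j].
    """
    n = len(labels)
    total = sum(sum(row) for row in cm)
    counts = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]                              # diagonal cell
        fn = sum(cm[k]) - tp                       # rest of row k: actual k, predicted other
        fp = sum(cm[i][k] for i in range(n)) - tp  # rest of column k: predicted k, actual other
        tn = total - tp - fn - fp                  # every cell outside row k and column k
        counts[label] = {"TP": tp, "FP": fp, "FN": fn, "TN": tn}
    return counts


labels = ["Facebook", "Instagram", "Snapchat"]
cm = [
    [7, 1, 2],  # actual Facebook
    [3, 8, 1],  # actual Instagram
    [0, 2, 6],  # actual Snapchat
]
for label, c in per_class_counts(cm, labels).items():
    print(label, c)
# Facebook {'TP': 7, 'FP': 3, 'FN': 3, 'TN': 17}
# Instagram {'TP': 8, 'FP': 3, 'FN': 4, 'TN': 15}
# Snapchat {'TP': 6, 'FP': 3, 'FN': 2, 'TN': 19}
```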

That’s it! You are ready to decipher any N x N confusion matrix!

Conclusion

The Confusion matrix is not so confusing anymore, is it?

I hope this article gave you a solid foundation for interpreting and using a confusion matrix with classification algorithms in machine learning. The matrix shows where the model has gone wrong and guides you toward correcting it, which makes it a powerful and commonly used tool for evaluating the performance of a classification model.

We will soon come out with an article on the AUC-ROC curve and continue our discussion there. Until next time, don’t lose hope in your classification model; you just might be using the wrong evaluation metric!


Frequently Asked Questions

Q1. Which confusion matrix is good?

A. A good confusion matrix is one that exhibits clear diagonal dominance, indicating that the majority of instances are correctly classified. Additionally, minimal off-diagonal values suggest that misclassifications are relatively rare. However, the interpretation of what constitutes a “good” confusion matrix may vary depending on the specific context and goals of the classification task.

Q2. What is the goal of a confusion matrix?

A. The goal of a confusion matrix is to provide a clear summary of the performance of a classification model. It helps in understanding how well the model is classifying instances into different categories by comparing the predicted labels with the actual labels.

Q3. What is the F1 score in a confusion matrix?

A. The F1 score is a measure of a model’s accuracy that takes both precision and recall into account. It is the harmonic mean of precision and recall.

Q4. How do you draw a confusion matrix?

A. Drawing a confusion matrix involves creating a table whose rows and columns represent the actual and predicted classes, respectively. The cells of the table contain the counts or percentages of instances that fall into each combination of actual and predicted classes. Actual classes along the rows and predicted classes along the columns is the layout used by scikit-learn’s confusion_matrix(). The transposed layout, with predicted classes on the rows as in the 2 x 2 table earlier in this article, is also common, so always check which convention a tool or figure follows.
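For a quick text version, the hypothetical helper below prints a labeled table with actual classes on the rows and predicted classes on the columns, using the virus example’s counts. If you want a plotted heatmap instead, scikit-learn (version 1.0 and later) provides `ConfusionMatrixDisplay.from_predictions(y_true, y_pred)`, which draws the matrix with matplotlib.

```python
def format_confusion_matrix(cm, labels, cell_width=10, label_width=20):
    """Return a plain-text table with actual classes on the rows
    and predicted classes on the columns."""
    header = "actual \\ predicted".ljust(label_width) + "".join(
        str(label).rjust(cell_width) for label in labels
    )
    rows = [
        str(label).ljust(label_width) + "".join(str(v).rjust(cell_width) for v in row)
        for label, row in zip(labels, cm)
    ]
    return "\n".join([header] + rows)


labels = ["Sick", "Not Sick"]
cm = [
    [30, 10],   # actual Sick:     30 predicted Sick (TP), 10 predicted Not Sick (FN)
    [30, 930],  # actual Not Sick: 30 predicted Sick (FP), 930 predicted Not Sick (TN)
]
print(format_confusion_matrix(cm, labels))
```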

Q5. How to use the Confusion Matrix in Machine Learning?

You can use a confusion matrix in machine learning in 4 steps:

Train a machine learning model. This can be done using any machine learning algorithm, such as logistic regression, decision tree, or random forest.

Make predictions on a test dataset. This is data that the model has not been trained on.

Construct a confusion matrix. This can be done using a Python library such as Scikit-learn.

Analyze the confusion matrix. Look at the diagonal elements of the matrix to see how many instances the model predicted correctly. Look at the off-diagonal elements of the matrix to see how many instances the model predicted incorrectly.
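The four steps map onto a few lines of scikit-learn. This minimal sketch rests on two assumptions. First, a synthetic, imbalanced dataset from `make_classification` stands in for real data. Second, logistic regression is just one choice of model. The exact counts depend on the data, so no output is shown.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Assumption: a synthetic, imbalanced binary dataset stands in for real data
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=42)

# Step 1: train a model on the training split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Step 2: predict on data the model has not seen
y_pred = model.predict(X_test)

# Step 3: build the confusion matrix (rows = actual, columns = predicted)
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Step 4: analyze it; the diagonal holds correct predictions, the rest are errors
correct = cm.trace()
errors = cm.sum() - correct
print(f"correct: {correct}, errors: {errors}")
print(f"accuracy from the matrix: {correct / cm.sum():.3f}")
```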