7 Matplotlib Tricks to Better Visualize Your Machine Learning Models


Image by Author | ChatGPT

Introduction

Visualizing model performance is an essential piece of the machine learning workflow puzzle. While many practitioners can create basic plots, elevating these simple charts into insightful visualizations that clearly tell the story of your machine learning model’s behavior and predictions is a skill that sets great professionals apart. The Matplotlib library, the foundational plotting tool in the scientific Python ecosystem, is packed with features that can help you achieve this.

This tutorial provides 7 practical Matplotlib tricks that will help you better understand, evaluate, and present your machine learning models. We’ll move beyond the default settings to create visualizations that are not only aesthetically pleasing but also rich in information. These techniques are designed to integrate smoothly into your workflow with libraries like NumPy and Scikit-learn.

The assumption here is that you are already familiar with Matplotlib and its general usage, as we won’t be covering that here. Instead, we will focus on how to improve your skills with code for 7 specific machine learning task-related scenarios.

Since we will take the approach of treating each of our code solutions independently, get ready to see import matplotlib.pyplot as plt quite a bit today 🙂

1. Applying Professional Styles for Instant Polish

The default look of Matplotlib can sometimes feel a bit… dated. A simple yet effective trick is to use Matplotlib’s built-in style sheets. With a single line of code, you can apply professional themes that mimic the aesthetics of popular tools like R’s ggplot or the Seaborn library. This instantly improves readability and visual appeal.

Let’s see the difference a style sheet can make. We’ll start with a basic scatter plot and then apply the 'seaborn-v0_8-whitegrid' style.
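A minimal sketch of the idea is below; the data here is synthetic and purely illustrative:

```python
import matplotlib.pyplot as plt
import numpy as np

# Generate some illustrative data (a hypothetical feature vs. model output)
rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(scale=0.5, size=100)

# Apply a built-in style sheet before plotting (requires Matplotlib >= 3.6
# for the "seaborn-v0_8-*" style names)
plt.style.use("seaborn-v0_8-whitegrid")

fig, ax = plt.subplots(figsize=(7, 5))
ax.scatter(x, y, alpha=0.7, edgecolor="k")
ax.set_title("Styled Scatter Plot")
ax.set_xlabel("Feature value")
ax.set_ylabel("Model output")
plt.tight_layout()
plt.show()
```

You can list every available theme with `plt.style.available` and experiment until one fits your report or presentation.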

Here is the generated visualization:

[Figure: Applying professional styles for instant polish]

As you can see, applying a style adds a grid, changes the font, and adjusts the overall color scheme, making the plot much easier to interpret.

2. Visualizing Classifier Decision Boundaries

Understanding how a classification model separates data is a must. A decision boundary plot shows the regions of the feature space that a model associates with each class. This visualization is an invaluable tool for diagnosing how a model generalizes and where it might be making errors.

We’ll train a Support Vector Machine (SVM) on the classic Iris dataset and plot its decision boundaries. To make it visible in 2D, we’ll only use two features. The trick is to create a mesh grid of points and have the model predict the class for each point, then use plt.contourf() to draw the colored regions.
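A sketch of this mesh-grid approach follows; the RBF kernel and grid step size are illustrative choices:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

# Use only the first two Iris features so the boundary is plottable in 2D
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

clf = SVC(kernel="rbf", gamma="scale").fit(X, y)

# Build a mesh grid covering the feature space with a small margin
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))

# Predict the class for every grid point, then reshape for contourf
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, edgecolor="k")
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title("SVM Decision Boundaries on Iris")
plt.show()
```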

And here are our classifier decision boundaries, visualized:

[Figure: Visualizing classifier decision boundaries]

This plot shows how the SVM classifier divides the feature space, separating the three species of iris.

3. Plotting a Clear Receiver Operating Characteristic Curve

The Receiver Operating Characteristic (ROC) curve is a standard tool for evaluating binary classifiers. It plots the true positive rate against the false positive rate at various threshold settings, while the Area Under the Curve (AUC) summarizes the model’s performance in a single number. A good ROC plot should include the AUC score and a baseline for comparison.

Let’s use Scikit-learn to calculate the ROC curve points and AUC, then use Matplotlib to plot them cleanly. Adding a label with the AUC score makes the plot self-contained and easy to understand.
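One way to put this together is sketched below; the synthetic dataset and logistic regression model are stand-ins for your own classifier:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem (illustrative)
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_scores = model.predict_proba(X_test)[:, 1]

# Compute the ROC curve points and the area under the curve
fpr, tpr, _ = roc_curve(y_test, y_scores)
roc_auc = auc(fpr, tpr)

# Embedding the AUC in the legend makes the plot self-contained
plt.plot(fpr, tpr, lw=2, label=f"ROC curve (AUC = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", color="gray", label="Chance baseline")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver Operating Characteristic")
plt.legend(loc="lower right")
plt.show()
```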

And here is the resulting robust ROC curve plot:

[Figure: Plotting a clear receiver operating characteristic curve]

4. Building an Annotated Confusion Matrix Heatmap

A confusion matrix is a table summarizing the performance of a classification model. Raw numbers are useful here, but a heatmap visualization makes it much faster to spot patterns, such as which classes are commonly confused. Annotating the heatmap with the actual numbers provides both a quick visual summary and precise details.

We’ll use Matplotlib’s imshow() function to create the heatmap and then loop through the matrix to add text labels to each cell.
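A sketch of this pattern is below; the Iris data and logistic regression classifier are illustrative placeholders for your own model:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42)

y_pred = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict(X_test)
cm = confusion_matrix(y_test, y_pred)

fig, ax = plt.subplots()
im = ax.imshow(cm, cmap="Blues")
fig.colorbar(im)

# Annotate each cell with its count, switching text color on dark cells
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        ax.text(j, i, cm[i, j], ha="center", va="center",
                color="white" if cm[i, j] > cm.max() / 2 else "black")

ax.set_xticks(range(len(iris.target_names)), labels=iris.target_names)
ax.set_yticks(range(len(iris.target_names)), labels=iris.target_names)
ax.set_xlabel("Predicted label")
ax.set_ylabel("True label")
ax.set_title("Confusion Matrix")
plt.show()
```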

Here is the resulting annotated confusion matrix:

[Figure: Building an annotated confusion matrix heatmap]

5. Highlighting Feature Importance

For many models, especially tree-based ensembles like random forests or gradient boosting, we can extract a measure of how important each feature was in making predictions. Visualizing these scores helps in understanding the model’s behavior and guiding feature selection efforts. A horizontal bar chart is often the best choice for this task.

We’ll train a RandomForestClassifier, extract the feature importances, and display them in a sorted horizontal bar chart for easy comparison.
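This can be sketched as follows; the breast cancer dataset and the choice to show only the top 10 features are illustrative:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(data.data, data.target)

# Sort importances so the most influential features appear at the top
importances = clf.feature_importances_
order = np.argsort(importances)[-10:]  # keep the top 10 for readability

plt.barh(range(len(order)), importances[order])
plt.yticks(range(len(order)), np.array(data.feature_names)[order])
plt.xlabel("Importance")
plt.title("Top 10 Feature Importances (Random Forest)")
plt.tight_layout()
plt.show()
```

Because `barh()` draws from the bottom up, sorting in ascending order places the most important feature at the top of the chart.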

Let’s take a look at the feature importances plotted:

[Figure: Highlighting feature importance]

6. Plotting Diagnostic Learning Curves

Learning curves are a powerful tool for diagnosing whether a model is suffering from a bias problem (underfitting) or a variance problem (overfitting). They show the model’s performance on the training set and the validation set as a function of the number of training samples.

We’ll use Scikit-learn’s learning_curve utility to generate the scores and then plot them. A key trick here is to also plot the standard deviation of the scores to understand the stability of the model’s performance.
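A sketch of this approach follows; the digits dataset and Gaussian Naive Bayes estimator are illustrative choices:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)

# Score the model at 5 increasing training-set sizes, with 5-fold CV
train_sizes, train_scores, val_scores = learning_curve(
    GaussianNB(), X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5))

train_mean, train_std = train_scores.mean(axis=1), train_scores.std(axis=1)
val_mean, val_std = val_scores.mean(axis=1), val_scores.std(axis=1)

plt.plot(train_sizes, train_mean, "o-", label="Training score")
plt.plot(train_sizes, val_mean, "o-", label="Cross-validation score")

# Shaded bands show one standard deviation around each mean score
plt.fill_between(train_sizes, train_mean - train_std,
                 train_mean + train_std, alpha=0.15)
plt.fill_between(train_sizes, val_mean - val_std,
                 val_mean + val_std, alpha=0.15)

plt.xlabel("Training examples")
plt.ylabel("Accuracy")
plt.title("Learning Curves (Gaussian Naive Bayes)")
plt.legend(loc="best")
plt.show()
```

A persistent gap between the two curves suggests overfitting, while two low, converged curves suggest underfitting.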

This is the resulting learning curve plot:

[Figure: Plotting diagnostic learning curves]

7. Creating a Gallery of Models with Subplots

There are times when you will want to compare the performance of several different models. Placing their visualizations side-by-side in a single figure makes this comparison direct and efficient. Matplotlib’s subplot functionality is perfect for creating this kind of “model gallery.”

We’ll create a grid of plots, with each subplot showing the decision boundary for a different classifier on the same dataset.
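A sketch of such a gallery is below; the moons dataset and the four classifiers chosen here are illustrative:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# One shared dataset so the comparison across models is fair
X, y = make_moons(n_samples=200, noise=0.25, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(),
    "k-NN": KNeighborsClassifier(),
    "SVM (RBF)": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

# One shared mesh grid reused by every subplot
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, 200),
    np.linspace(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, 200))

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, (name, model) in zip(axes.ravel(), models.items()):
    model.fit(X, y)
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm,
               edgecolor="k", s=20)
    ax.set_title(name)

fig.suptitle("Decision Boundaries Across Classifiers")
plt.tight_layout()
plt.show()
```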

Here is the gallery of decision boundaries for the various classifiers:

[Figure: Creating a gallery of models with subplots]

Wrapping Up

Mastering these 7 Matplotlib tricks will significantly enhance your ability to analyze, diagnose, and communicate the results of your machine learning models. Effective visualization is not only about creating pretty pictures; it’s about crafting and presenting a deeper intuition for how models work and conveying complex findings in a clear, impactful way. By moving beyond default plots and thoughtfully crafting your visualizations, you can accelerate your own understanding and more effectively share your insights with others.
