
10 Essential Machine Learning Terms Explained
Introduction
Artificial intelligence (AI) is an umbrella computer science discipline focused on building software systems capable of mimicking human or animal intelligence to solve tasks. Most AI systems nowadays are based on constructing models that learn by themselves from data to solve problems like making predictions, classifying images, or generating text, to name a few. These models that learn from data are part of what we call machine learning.
From a simple model that predicts the price of a house based on its size and location to advanced solutions that identify objects in video or generate human language responses to users, machine learning models are everywhere in our daily lives. Therefore, understanding a series of key terms surrounding it, some of which are often heard not only in tech discussions but in industry and business talks as a whole, is key to comprehending and staying on top of this massive subdomain of AI.
This article examines 10 essential machine learning terms and concepts that are worth understanding whether you are an engineer, user, or consumer of machine learning systems.
1. Supervised Learning
Definition: Many machine learning models learn to make predictions by being exposed to labeled examples — that is, observations that have associated output values or labels. This is called supervised learning, and it entails tasks such as regression, classification, and time series forecasting. A key requirement for supervised learning is the availability of high-quality labeled data.
Why it’s key: Having a model learn to classify images of animals or people, or to accurately classify a financial transaction as legitimate or fraudulent, requires a sufficient amount of representative, labeled examples of historical data with known outputs. By giving a supervised learning model a set of houses with diverse features alongside their associated prices, the model can learn the (sometimes complex) relationships between a house’s attributes and its price, becoming capable of accurately estimating the price of new, unseen houses.
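To make this concrete, here is a minimal sketch using scikit-learn, with a handful of made-up house records (size in square meters and number of rooms) standing in for real labeled training data:

```python
from sklearn.linear_model import LinearRegression

# Labeled examples: [size in m2, number of rooms] -> known price (synthetic values)
X_train = [[50, 2], [80, 3], [120, 4], [150, 5], [200, 6]]
y_train = [150_000, 220_000, 330_000, 400_000, 520_000]

# Fit a regression model on the labeled data
model = LinearRegression()
model.fit(X_train, y_train)

# Estimate the unknown price of a new, unseen house
print(model.predict([[100, 3]]))
```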
2. Unsupervised Learning
Definition: Labeled data (i.e. data examples with known outputs) are not always available. Still, there is a family of machine learning models suitable for learning patterns and underlying structure from unlabeled data. Unsupervised learning encompasses techniques to discover hidden relationships and groups of similar data, including tasks like clustering, anomaly detection, dimensionality reduction, and more.
Why it’s key: Since real-world data often lacks labels, unsupervised learning is sometimes as important as supervised learning, if not more so. Techniques like clustering and anomaly detection can reveal structure, identify outliers, and generate insights about data such as a body of customers, thereby turning raw, unlabeled data into actionable information.
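Here is a minimal clustering sketch using scikit-learn’s KMeans, with made-up customer records (annual spend and monthly visits) standing in for a real dataset:

```python
from sklearn.cluster import KMeans

# Unlabeled data: [annual spend, visits per month] per customer (synthetic values)
X = [[500, 2], [520, 3], [480, 2], [2000, 10], [2100, 12], [1950, 9]]

# Discover two groups of similar customers without any labels
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster assignment for each customer
print(kmeans.cluster_centers_)  # the "average" customer of each cluster
```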
3. Reinforcement Learning
Definition: Can you remember the last time you saw a small child learning to do something like using a toy by trial and error? This “trial and error” principle, or learning from experience, is the essence of reinforcement learning: a subarea of machine learning focused on training agents to make sequential decisions while pursuing a goal by maximizing (cumulative) rewards based on interactions with an environment.
Why it’s key: Supervised and unsupervised learning fall short in application domains where sequential decisions must be made over time based on feedback. This is where reinforcement learning shines, for instance in robotics, game playing, recommender systems, and autonomous vehicles.
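As an illustrative sketch, the following minimal Q-learning agent learns by trial and error in a tiny invented environment: a five-state corridor where reaching the rightmost state yields a reward:

```python
import random

# Tiny environment: states 0..4 in a corridor; reaching state 4 gives reward 1
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # step left or step right

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # value table: Q[state][action]
alpha, gamma, epsilon = 0.1, 0.9, 0.2      # learning rate, discount, exploration rate

for episode in range(500):
    state = 0
    while state != GOAL:
        # Epsilon-greedy: explore occasionally, otherwise exploit the best known action
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda i: Q[state][i])
        next_state = max(0, min(GOAL, state + ACTIONS[a]))
        reward = 1.0 if next_state == GOAL else 0.0
        # Q-learning update: nudge the estimate toward reward + discounted future value
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state

print(Q)  # the learned values favor the "step right" action in every state
```

After enough episodes, no labels or examples are needed: the reward signal alone shapes the agent’s behavior.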
4. Overfitting and Underfitting
Definition: Two very common issues found when training a machine learning model, particularly supervised learning models like classifiers and regressors, are the failure to make accurate predictions and the failure to generalize to new, unseen data. Underfitting occurs when a model is too simple to capture the underlying structure of the data. At the opposite end of the spectrum, overfitting occurs when a trained model “memorizes” (or learns excessively from) the training data but fails to generalize to new inputs.
Why it’s key: Understanding overfitting and underfitting is essential to detect and deal with these common problems in machine learning systems. Becoming familiar with strategies like regularization, cross-validation (see below), and reducing model complexity is the first step to tackling these problems and building systems that perform well in real-world scenarios.
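A quick way to see both problems is to fit models of increasing flexibility to the same noisy data and compare the training error with the test error; this sketch uses scikit-learn and synthetic data following a cubic curve:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic noisy data generated from a cubic curve
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = X[:, 0] ** 3 - 2 * X[:, 0] + rng.normal(scale=2.0, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 3, 15):  # too simple, about right, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_err:7.2f}  test MSE={test_err:7.2f}")
```

The degree-1 model underfits (high error everywhere), while the degree-15 model overfits (low training error, noticeably higher test error).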
5. Bias-Variance Tradeoff
Definition: The bias-variance tradeoff is an important consideration when building machine learning models. High bias happens when a model is too simplistic and fails to capture the true patterns in the data, whereas high variance occurs when a model is too sensitive to the training data and captures noise (irrelevant information for predictions or inference) as if it were important information. As one increases, the other tends to decrease; therefore, finding the right balance between bias and variance is crucial for achieving optimal model performance.
Why it’s key: Having a machine learning model with high bias or variance can have a negative impact in practice. For example, an e-mail classifier that is too biased might miss many spam messages, while one with high variance might mistakenly flag relevant messages as spam due to overfitting on training data.
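One way to observe the tradeoff directly is to retrain a simple and a flexible model many times on fresh samples of the same underlying process, then compare the average prediction (bias) and its spread (variance); this simulation uses synthetic sine-shaped data and scikit-learn decision trees:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)

def sample_data():
    # A fresh noisy sample from the same underlying process: y = sin(x) + noise
    X = rng.uniform(0, 6, size=(50, 1))
    return X, np.sin(X[:, 0]) + rng.normal(scale=0.3, size=50)

x_query = [[3.0]]  # a single point at which we compare predictions across retrainings

for depth, name in ((1, "shallow tree (high bias)"), (None, "deep tree (high variance)")):
    preds = []
    for _ in range(200):  # retrain on 200 independent samples of the data
        X, y = sample_data()
        preds.append(DecisionTreeRegressor(max_depth=depth).fit(X, y).predict(x_query)[0])
    preds = np.array(preds)
    # Bias: how far the average prediction is from the truth; variance: its spread
    print(f"{name}: mean={preds.mean():.2f} (true={np.sin(3.0):.2f}), variance={preds.var():.3f}")
```

The shallow tree gives stable but systematically off predictions, while the deep tree is accurate on average but fluctuates heavily from one training set to the next.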
6. Loss Function
Definition: The process of training a machine learning model on (typically labeled) data involves applying an optimization algorithm (see concept number 7 below) that iteratively learns from initially large prediction errors made by the model, gradually adjusting it so that its prediction errors become smaller and smaller. The loss function is the mathematical way to quantify these errors made during training: it measures the discrepancy between the model’s predicted outputs and the actual target values in data examples with known outputs.
Why it’s key: The loss function is the compass that guides the optimization process, providing feedback on how well the model is performing. Hence, convergence towards an accurate model takes place thanks to minimizing this loss function.
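For example, the mean squared error (MSE), one of the most common loss functions for regression, averages the squared differences between predictions and targets. A minimal implementation with made-up values:

```python
import numpy as np

def mse_loss(y_true, y_pred):
    # Mean squared error: the average squared difference between targets and predictions
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

y_true = [3.0, 5.0, 7.0]   # actual target values
y_pred = [2.5, 5.5, 8.0]   # made-up model predictions
print(mse_loss(y_true, y_pred))  # 0.5 = (0.5**2 + 0.5**2 + 1.0**2) / 3
```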
7. Gradient Descent
Definition: If the loss function is the compass that guides the machine learning model toward a high-quality version of itself, the gradient descent algorithm is like the hiking boots that help the model (just like a hiker) navigate through the space of possible solutions (model versions) toward one that minimizes the loss function. The principle is similar to that of a hiker who is in the middle of a mountain slope and tries to find the lowest altitude point one step at a time: put simply, it boils down to repeatedly stepping in the steepest downward direction, i.e. opposite to the gradient.
Why it’s key: The gradient descent algorithm and its improved variants are the key elements to reducing a model’s loss function, thereby making the model effectively learn from data. Even the most sophisticated and state-of-the-art machine learning solutions today resort to this family of algorithms to optimize model performance.
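Here is a minimal gradient descent sketch that fits a simple linear model (y = w·x + b) to synthetic data by repeatedly computing the gradient of the MSE loss and stepping against it:

```python
import numpy as np

# Toy data following y = 2x + 1 (synthetic, for illustration)
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * X + 1

w, b = 0.0, 0.0  # model parameters, starting from an arbitrary point
lr = 0.05        # learning rate: the size of each downhill step

for step in range(1000):
    y_hat = w * X + b
    # Gradients of the MSE loss with respect to w and b
    grad_w = np.mean(2 * (y_hat - y) * X)
    grad_b = np.mean(2 * (y_hat - y))
    # Step against the gradient: the steepest downward direction
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # converges close to the true values 2 and 1
```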
8. Cross-Validation
Definition: Cross-validation is a well-established methodology for measuring a machine learning model’s performance while it is being trained and for estimating how well it will generalize to future data once deployed. It partitions the training set into multiple subsets or folds, using some for training and others for validation in a rotating and iterative manner. The process is repeated multiple times, and the results are averaged to obtain a robust estimate of model performance.
Why it’s key: Cross-validation is a more reliable approach than using a standalone validation split, because it reduces the risk of biased evaluation results and helps prevent overfitting the model to a single validation subset.
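A minimal example with scikit-learn, using the built-in Iris dataset and 5-fold cross-validation:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: train on 4 folds, validate on the remaining one, rotating
scores = cross_val_score(model, X, y, cv=5)
print(scores)         # one accuracy score per fold
print(scores.mean())  # averaged estimate of generalization performance
```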
9. Feature Engineering
Definition: A high-quality machine learning model requires high-quality data to be properly trained. Preparing and preprocessing the data is an essential step in every machine learning project, and feature engineering, in which raw data is transformed into meaningful input features, becomes vital. Examples of feature engineering processes for machine learning modeling include encoding categorical variables numerically, scaling numerical values that may take disparate ranges, creating new attributes based on the interaction between existing ones, and extracting date or text features.
Why it’s key: Feature engineering sometimes helps discover useful features that can have a positive impact on the resulting model’s performance, for instance by improving accuracy, reducing training time, and fostering interpretability.
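The sketch below, using pandas on a made-up three-row house table, illustrates three of these transformations: one-hot encoding a categorical column, min-max scaling a numerical one, and extracting a date feature:

```python
import pandas as pd

# Made-up raw house data for illustration
df = pd.DataFrame({
    "size_m2": [50, 80, 120],
    "city": ["Valencia", "Madrid", "Valencia"],
    "listed_on": pd.to_datetime(["2024-01-15", "2024-06-01", "2024-03-20"]),
})

# Encode the categorical variable numerically (one-hot encoding)
df = pd.get_dummies(df, columns=["city"])

# Scale the numerical feature to the [0, 1] range (min-max scaling)
size = df["size_m2"]
df["size_scaled"] = (size - size.min()) / (size.max() - size.min())

# Extract a date feature: the month each house was listed
df["listed_month"] = df["listed_on"].dt.month

print(df)
```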
10. Model Evaluation Metrics
Definition: Whether you are building a machine learning model for classification, regression, clustering, or natural language processing tasks, you need to evaluate how well the model performs at solving that task, particularly on new or diverse data examples. Model evaluation metrics like accuracy, precision, recall, F1-score, or mean squared error are therefore essential to quantify and compare model performance.
Why it’s key: Without well-defined model evaluation metrics, it would be difficult to assess whether a model is genuinely effective, not to mention comparing different models or model versions, or making informed decisions about deploying a seemingly promising model in real-world applications.
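A minimal sketch with scikit-learn, computing several classification metrics on made-up spam labels and predictions:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Made-up ground-truth labels and model predictions (1 = spam, 0 = legitimate)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))  # of the flagged messages, how many are truly spam
print("recall   :", recall_score(y_true, y_pred))     # of the actual spam, how much was caught
print("f1-score :", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```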
Wrapping Up
This article described and underscored the significance of ten key concepts in machine learning, the largest and most widely used AI subdomain today. Familiarity with these concepts gives you a solid foundation for better understanding machine learning techniques, models, and trends, from classic methods to the newest solutions.