

Why is Probability Important for Machine Learning?

Last Updated on July 25, 2023 by Editorial Team

Author(s): Mateo

Originally published on Towards AI.

Among many fields of mathematics, probability theory is often considered the bedrock of Machine Learning. But why not just rely on classical software engineering methods?

Image by Edge2Edge Media on Unsplash

Probability, Statistics, and Information Theory are topics you are guaranteed to encounter on your path to becoming a Machine Learning superstar. Nevertheless, if you come from a Computer Science or Software Engineering background, you might question the importance of these skills when it comes to solving actual problems. The reason might be that in software development, we typically work under the assumption that the world is deterministic and certain: we can safely assume the computer will do precisely what it is told, i.e., execute our code line by line, flawlessly. Issues such as bit flips and hardware failures are rare enough that we do not have to account for them.
So why do we care that much about probability when it comes to computers learning on their own?

To answer this question and gain a better appreciation for the use of probability and information theory in Artificial Intelligence, Machine Learning, and Deep Learning, let’s look at a few examples of why working with uncertainty is necessary.

Why probability?

One explanation you might have come across is that in the real world, we need to make decisions even when the information is incomplete. While this is certainly the case, it’s likely not sufficient to convince you to dive into textbooks on probability straight away.

Machine Learning works in a completely different environment, where everything is inherently uncertain, stochastic, and just… messy. Machine Learning — and, more generally, Artificial Intelligence — works with models that are designed, trained, tuned, and evaluated with a probabilistic framework. Such frameworks for handling uncertainty tell us how the system should reason and provide us with tools for analyzing proposed systems’ behavior.
While there are cases where we could spend a lot of time coming up with a rigid, certain system, a quote from Ian Goodfellow’s Deep Learning book perfectly summarizes the justification for modeling under uncertainty:

“In many cases, it is more practical to use a simple but uncertain rule rather than a complex but certain one, even if the true rule is deterministic and our modeling system has the fidelity to accommodate a complex rule.”

Deep Learning by Goodfellow, Bengio, and Courville

The book also provides a good example — consider an AI system working with the rule “Most birds fly.” Such a rule is cheap to develop and is broadly applicable, unlike a more certain rule of the form, “Birds fly, except for very young birds that have not yet learnt to fly, sick or injured birds that have lost the ability to fly, flightless species of birds including the cassowary, ostrich, and kiwi…”, which is expensive to develop, maintain, and communicate, and, after all this effort, still brittle and prone to failure.
Now, let’s have a look at the other reasons why working with uncertainty is often inevitable.

The system’s information is incomplete

In many applications, we have to trade off the completeness of the environment's information for the efficiency and effectiveness of the algorithm. For example, in autonomous driving or robotics, the agent may use occupancy grid mapping to locate obstacles, discretizing the space when predicting the future location of each object. The discretization immediately creates uncertainty about the precise position, because an object assigned to a cell could be anywhere within that cell.
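
To make the effect of discretization concrete, here is a minimal sketch of a 2D grid. The cell size of 0.5 m and the helper names (to_cell, cell_bounds, max_position_error) are assumptions made for illustration; they are not taken from any particular robotics library.

```python
import math

CELL_SIZE = 0.5  # metres per grid cell (hypothetical value)

def to_cell(x, y, cell_size=CELL_SIZE):
    """Map a continuous position to the index of the grid cell containing it."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))

def cell_bounds(cell, cell_size=CELL_SIZE):
    """Return ((x_min, x_max), (y_min, y_max)) for a grid cell."""
    i, j = cell
    return ((i * cell_size, (i + 1) * cell_size),
            (j * cell_size, (j + 1) * cell_size))

def max_position_error(cell_size=CELL_SIZE):
    """Worst-case distance between the true position and the cell centre."""
    return math.hypot(cell_size / 2, cell_size / 2)

# Two different true positions collapse into the same cell, so the grid
# alone can no longer tell them apart.
first = to_cell(1.02, 2.48)
second = to_cell(1.49, 2.01)
print(first == second)           # True: both land in cell (2, 4)
print(cell_bounds(first))        # the region the object could actually occupy
print(max_position_error())      # upper bound on the error introduced
```

A smaller cell size reduces this error but increases the number of cells to store and update, which is the trade-off between completeness and efficiency mentioned above.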

The system is not fully observable

Even a deterministic system might appear stochastic if some of its variables cannot be observed. For example, in many card games the outcome is fully determined by the order of the cards and the players' choices, yet each player sees only part of that state (their own hand, but not the opponents' hands or the remaining deck). From the player's point of view, the "system" isn't fully observable; hence the result is uncertain.
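
As a small illustration, the sketch below computes how likely it is, from one player's point of view, that the opponent holds a particular card. It assumes a uniformly shuffled 52-card deck and five-card hands; the function name prob_opponent_holds and the card labels are hypothetical.

```python
from fractions import Fraction

def prob_opponent_holds(card, my_hand, deck_size=52, opp_hand_size=5):
    """Probability that the opponent holds `card`, given only my own hand.

    Assumes the deck was shuffled uniformly at random, so the opponent's
    hand is equally likely to be any subset of the cards I cannot see.
    """
    if card in my_hand:
        return Fraction(0)  # I hold it myself, so the opponent cannot
    unseen = deck_size - len(my_hand)
    return Fraction(opp_hand_size, unseen)

my_hand = ["KH", "2D", "7C", "9S", "JD"]
print(prob_opponent_holds("AS", my_hand))  # 5/47
print(prob_opponent_holds("KH", my_hand))  # 0
```

The deal itself is already fixed, so nothing random happens after the shuffle. The probability only describes what the player does not know.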

The system is inherently stochastic

Many real-world problems are not deterministic to begin with and therefore need to be described in probabilistic terms. Examples include Poisson processes (such as radioactive decay) and the dynamics of subatomic particles in quantum mechanics; Monte Carlo simulations likewise rely on random sampling by design. There is no option other than to represent these problems within a probabilistic framework.
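
A Poisson process can be simulated in a few lines, which gives a feel for what "inherently stochastic" means in practice. This is a minimal sketch using only Python's standard library; the rate, duration, number of trials, and function names are assumptions chosen for illustration.

```python
import random

def simulate_poisson_process(rate, duration, rng):
    """Return the event times of a homogeneous Poisson process on [0, duration).

    Waiting times between consecutive events are drawn from an
    exponential distribution with the given rate.
    """
    times = []
    t = rng.expovariate(rate)
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times

def mean_event_count(rate, duration, trials, seed=0):
    """Monte Carlo estimate of the expected number of events in [0, duration)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = sum(len(simulate_poisson_process(rate, duration, rng))
                for _ in range(trials))
    return total / trials

# Individual runs differ, but the average count approaches rate * duration.
estimate = mean_event_count(rate=2.0, duration=10.0, trials=2000)
print(estimate, "vs. theoretical mean", 2.0 * 10.0)
```

No single run can be predicted exactly, yet the distribution of outcomes is well defined, and that distribution is what a probabilistic model captures.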

So, do I have to learn probability?

Well, that's a different question and, more importantly, one you need to answer yourself. While this article focuses on the undeniable importance of probability theory to the field, it shouldn't discourage you from continuing with your current plan. It doesn't mean you should stop whatever you are doing and not even think about touching PyTorch or TensorFlow until you obtain an advanced degree in probability and mathematical statistics. There are, in fact, several reasons why you might hold off on investing time in studying probability, depending on your current goals and needs.

First, it might simply be unnecessary.
While it’s always beneficial to appreciate the underlying, abstract theory, it shouldn’t discourage you from getting your hands dirty with solving real-world problems and diving into the aforementioned libraries. You can get really far with those without profound knowledge of probability theory.

Second, probability, statistics, and information theory are huge fields.
Just like with other math domains typically mentioned as prerequisites for Machine Learning, not all of the theory is directly relevant to Machine Learning, especially when it comes to applied scenarios. It’s likely worthwhile to first identify the topics relevant to your needs.

Third, it takes a lot of time and effort to master it.
This really depends on your current goal. For example, if you are a student and know you want to do Machine Learning in the future, I recommend truly learning the basics and underlying theory. On the other hand, if you are looking for business opportunities or want to switch careers, taking months to study these math prerequisites might be counterproductive and will delay you from achieving your goals.


Probability, statistics, and information theory are of great importance to Machine Learning, because Machine Learning always deals with uncertain (and sometimes non-deterministic) quantities. This article described the specific reasons for focusing on probabilistic modeling in Machine Learning systems, in contrast to the more traditional software development process, where we generally don't have to worry about uncertainty.

Nevertheless, you have to decide on your own whether diving into probability theory fits your current goals, as it takes a lot of time to master the various concepts. To help you with that, I will be publishing more articles on probability and statistics topics in the future, so don’t forget to follow if you are interested!
