Why is Probability Important for Machine Learning?
Author(s): Mateo
Originally published on Towards AI.
Among many fields of mathematics, probability theory is often considered the bedrock of Machine Learning. But why not just rely on classical software engineering methods?
Probability, Statistics, and Information Theory are topics you are guaranteed to encounter on your path to becoming a Machine Learning superstar. Nevertheless, if you come from a Computer Science or Software Engineering background, you might question the importance of these skills when it comes to applying your knowledge to solving actual problems. The reason might be that in software development, we typically work under the assumption that the world is deterministic and certain: we can safely assume the computer will do precisely what it is told, i.e., execute our code line by line flawlessly. Issues such as bit flips and hardware failures are rare enough that we do not have to account for them.
So why do we care that much about probability when it comes to computers learning on their own?
To answer this question and gain a better appreciation for the use of probability and information theory in Artificial Intelligence, Machine Learning, and Deep Learning, let's look at a few examples of why working with uncertainty is necessary.
Why probability?
One explanation you might have come across is that in the real world, we need to make decisions even when the information is incomplete. While this is certainly the case, it's likely not sufficient to convince you to dive into textbooks on probability straight away.
Machine Learning works in a completely different environment, one where everything is inherently uncertain, stochastic, and just... messy. Machine Learning, and more generally Artificial Intelligence, works with models that are designed, trained, tuned, and evaluated within a probabilistic framework. Such frameworks for handling uncertainty tell us how the system should reason and provide us with tools for analyzing a proposed system's behavior.
While there are cases where we could spend a lot of time coming up with a rigid, certain system, a quote from Ian Goodfellow's Deep Learning book perfectly summarizes the justification for modeling under uncertainty:
"In many cases, it is more practical to use a simple but uncertain rule rather than a complex but certain one, even if the true rule is deterministic and our modeling system has the fidelity to accommodate a complex rule."
- Deep Learning by Goodfellow, Bengio, and Courville
The book also provides a good example: consider an AI system working with the rule "Most birds fly." Such a rule is cheap to develop and is broadly applicable, unlike a more certain rule of the form, "Birds fly, except for very young birds that have not yet learnt to fly, sick or injured birds that have lost the ability to fly, flightless species of birds including the cassowary, ostrich, and kiwi...", which is expensive to develop, maintain, and communicate, and, after all this effort, still brittle and prone to failure.
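To make this concrete, here is a minimal Python sketch (not from the book) contrasting the two styles of rule. The probability value and the exception list are made up purely for illustration:

```python
# Simple but uncertain rule: "Most birds fly", expressed as one probability.
# P_FLIES is a made-up illustrative number, not a real statistic.
P_FLIES = 0.9

def can_fly_probabilistic(bird: str) -> float:
    """Return the probability that a given bird can fly under the simple rule."""
    return P_FLIES

# Complex but "certain" rule: a hand-maintained list of exceptions,
# brittle for any case we forgot to encode.
FLIGHTLESS_SPECIES = {"cassowary", "ostrich", "kiwi", "penguin"}

def can_fly_deterministic(bird: str, is_chick: bool = False, is_injured: bool = False) -> bool:
    """Hand-crafted exception logic that must be extended for every new edge case."""
    if bird in FLIGHTLESS_SPECIES:
        return False
    if is_chick or is_injured:
        return False
    return True

print(can_fly_probabilistic("sparrow"))    # 0.9
print(can_fly_deterministic("cassowary"))  # False
```

The probabilistic rule stays one line no matter how many edge cases exist; the deterministic one grows (and breaks) with every exception we discover.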
Now, let's have a look at the other reasons why working with uncertainty is often inevitable.
The system's information is incomplete
In many applications, we have to trade off the completeness of the environment's information for the efficiency and effectiveness of the algorithm. For example, in autonomous driving or robotics, the agent may use occupancy grid mapping to scan the environment for obstacles, discretizing the space when predicting the future location of each object. The discretization immediately creates uncertainty about the precise position, as each object can occupy any part of its discrete cell.
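As a small illustrative sketch (the 0.5 m cell size and the coordinates are arbitrary, made-up values), the following Python snippet shows how discretization throws away positional precision:

```python
# Hypothetical occupancy-grid discretization; cell size chosen arbitrarily.
CELL_SIZE = 0.5  # metres per grid cell

def to_cell(x: float, y: float) -> tuple[int, int]:
    """Map a continuous position (in metres) to a discrete grid cell."""
    return int(x // CELL_SIZE), int(y // CELL_SIZE)

# Two distinct true positions collapse into the same cell:
print(to_cell(3.10, 7.45))  # (6, 14)
print(to_cell(3.49, 7.26))  # (6, 14)

# After discretization, all we know is that the object lies somewhere inside
# a 0.5 m x 0.5 m square, i.e. its position is uncertain by up to half a cell
# in each dimension.
```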
The system is not fully observable
Even systems that are deterministic might appear stochastic if the observability of variables within the system is incomplete. For example, many card games are deterministic in their outcome given the players' choices, yet from each player's view the "system" isn't fully observable; hence the result is uncertain.
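As a simplified illustration (a made-up "high card wins" game, not any particular real card game), the following sketch shows how a player facing a hidden opponent card has no choice but to reason with probabilities, even though the outcome is fully determined once both cards are revealed:

```python
# Toy "high card wins" game: the outcome is fully determined once both cards
# are known, but the opponent's card is hidden from the player.
DECK = list(range(2, 15))  # card ranks 2..14 (ace high), one of each

def win_probability(my_card: int) -> float:
    """Marginalize over the unseen opponent card to get a win probability."""
    remaining = [c for c in DECK if c != my_card]
    wins = sum(1 for c in remaining if my_card > c)
    return wins / len(remaining)

print(win_probability(10))  # 8 of the 12 remaining ranks are lower -> ~0.667
```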
The system is inherently stochastic
Many problems and scenarios in the real world are not deterministic by default and therefore need to be described in a probabilistic manner. Examples include Monte Carlo simulations, Poisson processes (such as radioactive decay), or the dynamics of subatomic particles in quantum mechanics. In these cases, there's no other option than to represent the problem within a probabilistic framework.
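As a minimal sketch (with a made-up decay rate), here is how one might simulate such a Poisson process with Monte Carlo sampling in Python:

```python
import random

# Monte Carlo sketch of a Poisson process (e.g. radioactive decay):
# waiting times between events follow an exponential distribution, so the
# system can only be described probabilistically. RATE is an illustrative value.
RATE = 2.0      # events per second
N_EVENTS = 5

random.seed(0)  # for reproducibility
t = 0.0
for i in range(N_EVENTS):
    t += random.expovariate(RATE)  # exponential waiting time to the next event
    print(f"event {i + 1} at t = {t:.3f} s")
```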
So, do I have to learn probability?
Well, that's a different question, and, more importantly, one you need to answer yourself. While this article focuses on the undeniable importance of probability theory to the field, it shouldn't discourage you from continuing with your current plan. It doesn't mean you should stop whatever you are doing and not even think about touching PyTorch or TensorFlow until you obtain an advanced degree in probability and mathematical statistics. There are, in fact, several reasons why you might hold off on investing time in studying probability, depending on your current goals and needs.
First, it might simply be unnecessary.
While it's always beneficial to appreciate the underlying, abstract theory, that shouldn't stop you from getting your hands dirty solving real-world problems and diving into the aforementioned libraries. You can get really far with those without profound knowledge of probability theory.
Second, probability, statistics, and information theory are huge fields.
Just like with other math domains typically mentioned as prerequisites for Machine Learning, not all of the theory is directly relevant, especially when it comes to applied scenarios. It's likely worthwhile to first identify the topics relevant to your needs.
Third, it takes a lot of time and effort to master them.
This really depends on your current goal. For example, if you are a student and know you want to do Machine Learning in the future, I recommend truly learning the basics and the underlying theory. On the other hand, if you are looking for business opportunities or want to switch careers, taking months to study these math prerequisites might be counterproductive and delay you in achieving your goals.
Conclusion
Probability, statistics, and information theory are of great importance to Machine Learning, which constantly deals with uncertain (and sometimes non-deterministic) quantities. This article described the specific reasons for focusing on probabilistic modeling in Machine Learning systems, in contrast to the more traditional software development process, where we generally don't have to worry about uncertainty.
Nevertheless, you have to decide on your own whether diving into probability theory fits your current goals, as it takes a lot of time to master the various concepts. To help you with that, I will be publishing more articles on probability and statistics topics in the future, so don't forget to follow if you are interested!