Understanding Why There Is No Such Thing as ‘Correct Probability’ in Data Science

Last Updated on January 25, 2024 by Editorial Team

Author(s): Peyman Kor

Originally published on Towards AI.

Probability Should be Conditioned on your Current State of Information

Intro

In the field of data science, individuals frequently use the term “Probability”. However, a fundamental concept needs to be emphasized:

There is no such thing as a Correct Probability in Data Science.

Let’s go a little deeper. We heavily rely on building machine learning models that give the probability of future events.

For example: what is the probability of the next person defaulting on a bank loan? Or what is the probability of a particular transaction being fraudulent?

Uncertainty Vs. Variability:

One common confusion I see is the failure to distinguish between two terms: uncertainty and variability.

The confusion arises because both uncertainty and variability can be expressed in terms of probability, yet they are different concepts.

Variability

Variability is a state of things: It is quantified by the frequency of observed actual values.

For example, you can quantify the variability of the heights of 100 students at a high school by plotting the empirical distribution (for example, a histogram of relative frequencies) of the 100 data points.
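As a minimal sketch of this idea, the snippet below computes relative frequencies for 100 heights grouped into 5 cm bins. The heights are synthetic (drawn from an assumed normal distribution with mean 170 cm and standard deviation 8 cm), and the bin width is an arbitrary choice; neither comes from real measurements.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the synthetic data is reproducible

# Hypothetical data: 100 student heights in cm (synthetic, not real measurements)
heights = [round(random.gauss(170, 8)) for _ in range(100)]

# Group the heights into 5 cm bins and compute the relative frequency of each bin
bin_width = 5
counts = Counter((h // bin_width) * bin_width for h in heights)
frequencies = {b: c / len(heights) for b, c in sorted(counts.items())}

for bin_start, freq in frequencies.items():
    print(f"{bin_start}-{bin_start + bin_width - 1} cm: {freq:.2f}")
```

The frequencies describe what was actually observed; they sum to 1 and change only if the measured data change.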

Uncertainty

Uncertainty is a state of mind: It is quantified by the probability of a future event being true or not.

In the previous example, when we want to assign a probability to the height of the next student (the 101st), who hasn’t been measured yet, we are in the realm of uncertainty.
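One simple way to express such a belief, sketched below, is to take the fraction of already-measured heights that fall in a range as the probability assigned to the unmeasured student. This is only one possible choice made for illustration; the data are synthetic and the helper `belief_in_range` is hypothetical.

```python
import random

random.seed(1)  # fixed seed so the synthetic data is reproducible

# Hypothetical measurements in cm for the 100 students already measured (synthetic)
measured = [random.gauss(170, 8) for _ in range(100)]

def belief_in_range(sample, low_cm, high_cm):
    """Degree of belief that the next height falls in [low_cm, high_cm),
    taken here as the fraction of already-observed heights in that range."""
    return sum(low_cm <= h < high_cm for h in sample) / len(sample)

belief_101 = belief_in_range(measured, 165, 175)
print(f"Belief that student 101 is 165-175 cm tall: {belief_101:.2f}")
```

A different analyst who has seen a different set of measurements would, by the same procedure, arrive at a different number for the same student.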

Uncertainty Comes from the Person

In probability classes, you might have heard the term “fair coin.” In reality, the factory producing the coin is not necessarily concerned with ensuring that the coin is perfectly fair.

It is our assumption, our belief, that the probabilities of heads and tails are equal.

Probability is nothing more than our degree of belief, and it is much more useful to think about it as a measure that is conditioned on the current state of information we have.

Imagine two analysts sharing their weather predictions for tomorrow. One suggests that there is a 70% chance of rain tomorrow, while the other predicts a 50% chance.

Now, suppose tomorrow it rains. Who was correct? They both were. If it does not rain, they are still both equally correct.

There’s no such thing as an “actual probability” of rain; each person presented a belief about the chance of rain, conditioned on the state of information they had.

Bayesian Example: What is the Probability of Rain Tomorrow?

Imagine that I want to forecast the probability of rain in my city tomorrow.

I can simply look at the previous year and see how many days of January were rainy. Say 20 of the 30 days on record were rainy. With this information, I can assign a prior probability of rain for tomorrow: P(Rain) = 20/30 = 2/3 ≈ 0.67.

Now suppose you check the weather forecast, and it predicts heavy clouds and high humidity. Historically, such conditions have been seen on 70% of the days that turned out rainy and on 30% of the days that stayed dry.

This is new information. As we said, probability reflects our state of information, and it changes when the information we have changes.

We can do simple Bayesian flipping to update our belief:
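The update can be written out with Bayes' rule. The worked equation below is a reconstruction that uses the values from the Python code in this article: a prior of 20/30, and likelihoods of 0.7 and 0.3 for observing heavy clouds given rain and given no rain. The event labels "Clouds" and "No rain" are notation introduced here for illustration.

```latex
P(\text{Rain} \mid \text{Clouds})
  = \frac{P(\text{Clouds} \mid \text{Rain})\, P(\text{Rain})}
         {P(\text{Clouds} \mid \text{Rain})\, P(\text{Rain})
          + P(\text{Clouds} \mid \text{No rain})\, P(\text{No rain})}
  = \frac{0.7 \times \tfrac{2}{3}}{0.7 \times \tfrac{2}{3} + 0.3 \times \tfrac{1}{3}}
  = \frac{14}{17} \approx 0.82
```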

Now the probability of rain (with the weather forecast information) is around 0.82, which is different from the 2/3 (about 0.67) we started with.

As we receive more information, the probability we assign to the uncertain event changes, which makes probability a measure of our state of information.

Here is some simple Python code to reproduce the example:

# Prior: 20 rainy days out of 30 observed days
prior_prob_rain = 20 / 30
print(f"Prior Probability of Rain: {prior_prob_rain:.2f}")

# Likelihoods: probability of observing heavy clouds given rain / given no rain
prob_heavycloud_rain = 0.7
prob_heavycloud_norain = 0.3

# Total probability of heavy clouds (law of total probability)
prob_heavycloud = (
    prior_prob_rain * prob_heavycloud_rain
    + (1 - prior_prob_rain) * prob_heavycloud_norain
)

# Updated (posterior) probability of rain via Bayes' rule
updated_prob_rain = prior_prob_rain * prob_heavycloud_rain / prob_heavycloud

# Print the results
print(f"Total Probability of Heavy Cloud: {prob_heavycloud:.2f}")
print(f"Updated Probability of Rain: {updated_prob_rain:.2f}")
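The point that the assigned probability keeps moving as information arrives can be sketched by applying the same update repeatedly. In the snippet below, the first piece of evidence uses the article's numbers; the second one ("falling pressure", with likelihoods 0.6 and 0.4) is purely hypothetical. The sketch also assumes the two observations are conditionally independent given rain or no rain, which is a simplification.

```python
def bayes_update(prior, p_evidence_if_rain, p_evidence_if_no_rain):
    """Return P(rain | evidence) from a prior and the two likelihoods."""
    numerator = prior * p_evidence_if_rain
    return numerator / (numerator + (1 - prior) * p_evidence_if_no_rain)

belief = 20 / 30  # prior from the article's example
history = [belief]

# Each entry: (description, P(evidence | rain), P(evidence | no rain))
evidence_stream = [
    ("heavy clouds", 0.7, 0.3),      # values from the article's example
    ("falling pressure", 0.6, 0.4),  # hypothetical second observation
]

for name, p_if_rain, p_if_no_rain in evidence_stream:
    # Yesterday's posterior becomes today's prior
    belief = bayes_update(belief, p_if_rain, p_if_no_rain)
    history.append(belief)
    print(f"After observing {name}: P(rain) = {belief:.3f}")
```

Each posterior becomes the prior for the next update, so the number reported at any moment reflects only the information processed so far.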

Main Message:

The main message I wish to convey is about assigning probabilities to uncertain events.

Two data scientists can, legitimately, assign different probabilities to an uncertain event if they have different information or process the same information differently.
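A small sketch of this situation: two analysts see the same forecast evidence but start from different historical records, so they arrive at different probabilities. Analyst A's prior (20/30) and the likelihoods (0.7 and 0.3) come from the rain example; analyst B's record of 150 rainy days out of 300 is a hypothetical figure chosen for illustration.

```python
def posterior(prior, likelihood_if_rain, likelihood_if_no_rain):
    """Bayes' rule for a binary event: P(rain | evidence)."""
    numerator = prior * likelihood_if_rain
    return numerator / (numerator + (1 - prior) * likelihood_if_no_rain)

# Shared evidence: heavy clouds, with the likelihoods from the rain example
p_clouds_if_rain, p_clouds_if_no_rain = 0.7, 0.3

# Different background information leads to different priors
prior_a = 20 / 30    # analyst A: last January only (from the article)
prior_b = 150 / 300  # analyst B: hypothetical ten-year record of January days

posterior_a = posterior(prior_a, p_clouds_if_rain, p_clouds_if_no_rain)
posterior_b = posterior(prior_b, p_clouds_if_rain, p_clouds_if_no_rain)

print(f"Analyst A: {posterior_a:.2f}")
print(f"Analyst B: {posterior_b:.2f}")
```

Both analysts apply the same rule correctly; the difference in their answers comes entirely from the information each one conditions on.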

Probability is subjective and personal. There is not “the” probability but rather “a probability”, and probability is conditioned on the current state of information we have.

References:

[1] Foundations of Decision Analysis, Ronald Howard (Author), Ali Abbas (Author)
[2] I appreciate the insightful dialog I had with Professor Reidar Brumer Bratvold and Professor Steve Begg. Their book on making good decisions will be a valuable resource to explore.
