

35 Words About Uncertainty, Every AI-Savvy Leader Must Know

Last Updated on July 24, 2023 by Editorial Team

Author(s): Yannique Hecht

Originally published on Towards AI.


Think you can explain these? Put your knowledge to the test!

[This is the 3rd part of a series. Make sure you read about Search and Knowledge before continuing. Future topics include Optimization, Machine Learning, Neural Networks, and Language.]

Uncertainty around artificial intelligence is twofold.

First, we still know little about how to apply AI practically. Which techniques are best suited for which problems? Which parts of the value chain benefit the most from AI? Which technical skills will be relevant in five years?

To get an initial idea about potential answers to these three questions, consider following the rabbit hole in this McKinsey resource.

Second, computers often have to deal with imperfect, incomplete, even uncertain information. This constraint requires AIs to ‘believe’ something only with a certain probability. That’s the type of uncertainty we are concerned with. To get you started, this article briefly defines the main concepts and terms.


uncertainty: a situation involving imperfect or unknown information

probability: a numerical description of how likely an event is going to happen or that a proposition is true

possible world: one possible outcome of a situation, e.g., getting a ‘1’ when rolling a die; notated with the letter ω


set of all possible worlds: all possible worlds combined, whose probabilities add up to one; e.g., getting a ‘1, 2, 3, 4, 5, or 6’ when rolling a die; notated with the letter Ω


range of possibilities: ‘0’ means an event is certain not to happen, whereas ‘1’ means an event is absolutely certain to happen, notated as:

0 ≤ P(ω) ≤ 1

unconditional probability: the degree of belief in a proposition in the absence of any other evidence

conditional probability: the degree of belief in a proposition given some evidence that has already been revealed; the probability of ‘rain today’ given ‘rain yesterday’:

P(a|b) (probability of a given b)
P(rain today|rain yesterday)
P(a|b) = P(a ∧ b) / P(b)
P(a ∧ b) = P(b) P(a|b)
P(a ∧ b) = P(a) P(b|a)
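To make the formula concrete, here is a minimal Python sketch. The die and the two events (a = "the roll is even", b = "the roll is greater than 3") are assumptions chosen for illustration, not part of the original definitions.

```python
from fractions import Fraction

# Possible worlds for one fair six-sided die, each with probability 1/6.
worlds = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """Sum the probabilities of all worlds in which the event holds."""
    return sum(p for face, p in worlds.items() if event(face))

def is_even(face):            # event a (illustrative)
    return face % 2 == 0

def greater_than_three(face):  # event b (illustrative)
    return face > 3

p_b = prob(greater_than_three)                                    # P(b)
p_a_and_b = prob(lambda f: is_even(f) and greater_than_three(f))  # P(a ∧ b)
p_a_given_b = p_a_and_b / p_b                                     # P(a|b)

print(p_a_given_b)  # 2/3
```

Knowing that the roll is greater than 3 raises the belief in "even" from 1/2 to 2/3, because two of the three remaining worlds (4 and 6) are even.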

random variable: a variable in probability theory with a domain of possible values it can take on, for example:

{sun, cloud, rain, wind, snow}

probability distribution: a mathematical function that provides the probabilities of occurrence of different possible outcomes, for example:

P(Flight = on time) = 0.6
P(Flight = delayed) = 0.3
P(Flight = cancelled) = 0.1
or: P(Flight) = ⟨0.6, 0.3, 0.1⟩
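In code, a distribution like this could be kept as a simple mapping from values to probabilities. The sketch below reuses the flight numbers above; the helper names `is_valid_distribution` and `sample` are made up for this example.

```python
import random

# The flight distribution from the example above.
flight = {"on time": 0.6, "delayed": 0.3, "cancelled": 0.1}

def is_valid_distribution(dist, tol=1e-9):
    """Every probability lies in [0, 1] and together they sum to 1."""
    in_range = all(0 <= p <= 1 for p in dist.values())
    return in_range and abs(sum(dist.values()) - 1) < tol

def sample(dist, rng):
    """Draw one value of the random variable according to its distribution."""
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]

rng = random.Random(42)  # fixed seed so runs are repeatable
print(is_valid_distribution(flight))
print([sample(flight, rng) for _ in range(5)])
```

The tolerance in the validity check is there because floating-point sums such as 0.6 + 0.3 + 0.1 are not always exactly 1.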

independence: the knowledge that one event occurs does not affect the probability of the other event

In general, P(a ∧ b) = P(a)P(b|a). If a and b are independent, P(b|a) = P(b), so:
P(a ∧ b) = P(a)P(b)
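One way to check independence is to compare P(a ∧ b) with P(a)P(b) directly. The following sketch does that for two fair dice; the dice and the three events are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
worlds = {roll: Fraction(1, 36) for roll in product(range(1, 7), repeat=2)}

def prob(event):
    """Sum the probabilities of all worlds in which the event holds."""
    return sum(p for roll, p in worlds.items() if event(roll))

def independent(a, b):
    """True when P(a ∧ b) equals P(a) * P(b)."""
    return prob(lambda r: a(r) and b(r)) == prob(a) * prob(b)

def first_is_six(roll):
    return roll[0] == 6

def second_is_even(roll):
    return roll[1] % 2 == 0

def sum_at_least_ten(roll):
    return sum(roll) >= 10

print(independent(first_is_six, second_is_even))    # True
print(independent(first_is_six, sum_at_least_ten))  # False
```

The second die knows nothing about the first, so those two events are independent. A high total, however, becomes more likely once the first die shows a six, so those events are not.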

Bayes' rule: (or Bayes’ theorem) one of probability theory’s most important rules, describing the probability of an event based on prior knowledge of conditions that might be related:

P(b|a) = [P(b) P(a|b)] / P(a)

Thus, knowing…

P(cloudy morning | rainy afternoon)

… we can calculate:

P(rainy afternoon | cloudy morning)
P(rain|clouds) = [P(clouds|rain) P(rain)] / P(clouds)
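A quick numeric sketch of this calculation follows. The three input probabilities are made-up values for illustration only, and `bayes` is a hypothetical helper name.

```python
def bayes(p_a_given_b, p_b, p_a):
    """Return P(b|a) = P(b) * P(a|b) / P(a)."""
    if p_a == 0:
        raise ValueError("P(a) must be non-zero")
    return p_b * p_a_given_b / p_a

# Assumed numbers, not taken from any real weather data.
p_clouds_given_rain = 0.8  # P(clouds|rain)
p_rain = 0.1               # P(rain)
p_clouds = 0.4             # P(clouds)

p_rain_given_clouds = bayes(p_clouds_given_rain, p_rain, p_clouds)
print(round(p_rain_given_clouds, 3))  # 0.2
```

Under these assumptions, a cloudy morning doubles the belief in a rainy afternoon, from 10% to 20%.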

joint probability: the likelihood that two events will happen at the same time

P(a, b) = P(a) P(b|a), which for independent events simplifies to P(a, b) = P(a) P(b)

probability rules: a number of algebraic manipulations useful to calculate different probabilities, including negation, inclusion-exclusion, marginalization, or conditioning

negation: a handy probability rule to figure out the probability of an event not happening, for example:

P(¬cloud) = 1 − P(cloud)

inclusion-exclusion: another probability rule, which excludes double-counts to calculate the probability of event a or b:

P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
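Both negation and inclusion-exclusion can be verified by brute force on a small example. The die and the events below (a = "even", b = "greater than 3") are assumptions picked for this sketch.

```python
from fractions import Fraction

# One fair six-sided die.
worlds = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """Sum the probabilities of all worlds in which the event holds."""
    return sum(p for face, p in worlds.items() if event(face))

def a(face):
    return face % 2 == 0  # illustrative event a: the roll is even

def b(face):
    return face > 3       # illustrative event b: the roll is greater than 3

# Negation: P(¬a) = 1 − P(a)
p_not_a = 1 - prob(a)

# Inclusion-exclusion: P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
p_a_or_b = prob(a) + prob(b) - prob(lambda f: a(f) and b(f))

print(p_not_a)   # 1/2
print(p_a_or_b)  # 2/3
```

Counting the worlds directly gives the same answers: three faces are odd, and four faces (2, 4, 5, 6) are even or greater than 3.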

marginalization: a very useful probability rule (much more detail here by Jonny Brooks-Bartlett):

P(a) = P(a, b) + P(a, ¬b)

conditioning: our final probability rule, which computes the probability of an event a from conditional probabilities, for cases where we have access to those rather than to the joint probabilities of a and b:

P(a) = P(a|b)P(b) + P(a|¬b)P(¬b)
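Marginalization and conditioning should give the same answer for P(a). The sketch below checks this on a small joint table over rain and cloud; every number in the table is a hypothetical value chosen for illustration.

```python
from fractions import Fraction

# Hypothetical joint distribution P(Rain, Cloud); keys are (rain, cloud).
joint = {
    (True, True): Fraction(8, 100),
    (True, False): Fraction(2, 100),
    (False, True): Fraction(32, 100),
    (False, False): Fraction(58, 100),
}

# Marginalization: P(rain) = P(rain, cloud) + P(rain, ¬cloud)
p_rain = joint[(True, True)] + joint[(True, False)]

# Conditioning: P(rain) = P(rain|cloud)P(cloud) + P(rain|¬cloud)P(¬cloud)
p_cloud = joint[(True, True)] + joint[(False, True)]
p_not_cloud = 1 - p_cloud
p_rain_given_cloud = joint[(True, True)] / p_cloud
p_rain_given_not_cloud = joint[(True, False)] / p_not_cloud
p_rain_by_conditioning = (
    p_rain_given_cloud * p_cloud + p_rain_given_not_cloud * p_not_cloud
)

print(p_rain, p_rain_by_conditioning)  # 1/10 1/10
```

Here the conditional probabilities are derived from the joint table, so the agreement is guaranteed. In practice, conditioning is the rule to reach for when only the conditionals are known.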

Bayesian network: a data structure that represents the dependencies among random variables

inference: the process of using data analysis to deduce properties of an underlying distribution of probability

query: variable for which to compute the distribution

evidence variable: a variable that has been observed for event e

hidden variable: non-evidence, non-query variable

inference by enumeration: a process for solving inference queries given a joint distribution and conditional probabilities
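As a rough sketch of inference by enumeration, consider a tiny, hypothetical network Rain → Traffic → Late. Rain is the query, Late is the evidence variable, and Traffic is hidden. All probabilities are invented for this example.

```python
# Hypothetical network: Rain -> Traffic -> Late (all numbers assumed).
p_rain = {True: 0.2, False: 0.8}
p_traffic_given_rain = {          # outer key: rain, inner key: traffic
    True: {True: 0.8, False: 0.2},
    False: {True: 0.3, False: 0.7},
}
p_late_given_traffic = {          # outer key: traffic, inner key: late
    True: {True: 0.6, False: 0.4},
    False: {True: 0.1, False: 0.9},
}

def joint(rain, traffic, late):
    """P(rain, traffic, late) as a product of the network's local tables."""
    return (
        p_rain[rain]
        * p_traffic_given_rain[rain][traffic]
        * p_late_given_traffic[traffic][late]
    )

def query_rain_given_late(late):
    """Sum out the hidden variable (traffic), then normalize."""
    unnormalized = {
        rain: sum(joint(rain, traffic, late) for traffic in (True, False))
        for rain in (True, False)
    }
    total = sum(unnormalized.values())
    return {rain: p / total for rain, p in unnormalized.items()}

posterior = query_rain_given_late(late=True)
print(round(posterior[True], 3))  # 0.333
```

Enumeration is exact, but the number of terms to sum grows exponentially with the number of hidden variables, which is why approximate methods exist.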

approximate inference: a systematic iterative method to estimate solutions, such as a Monte Carlo simulation

sampling: a technique in which samples from a larger population are chosen using various probability methods

rejection sampling: (or acceptance-rejection method) a basic technique used to generate observations from a given distribution
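In the context of Bayesian networks, rejection sampling is often used to estimate a conditional probability: sample complete worlds, throw away those that contradict the evidence, and count the rest. Here is a minimal sketch on a hypothetical Rain → Traffic → Late network with invented probabilities, estimating P(rain | late).

```python
import random

# Hypothetical network: Rain -> Traffic -> Late (all numbers assumed).
P_RAIN = 0.2
P_TRAFFIC = {True: 0.8, False: 0.3}  # P(traffic | rain)
P_LATE = {True: 0.6, False: 0.1}     # P(late | traffic)

def sample_world(rng):
    """Sample every variable in order, parents before children."""
    rain = rng.random() < P_RAIN
    traffic = rng.random() < P_TRAFFIC[rain]
    late = rng.random() < P_LATE[traffic]
    return rain, traffic, late

def estimate_rain_given_late(n, rng):
    """Estimate P(rain | late) by discarding samples where late is False."""
    accepted = 0
    rainy = 0
    for _ in range(n):
        rain, _traffic, late = sample_world(rng)
        if not late:
            continue  # reject: sample contradicts the evidence
        accepted += 1
        rainy += rain
    return rainy / accepted if accepted else float("nan")

estimate = estimate_rain_given_late(20_000, random.Random(0))
print(estimate)  # should land near the exact answer of 1/3
```

The drawback is waste: in this setup roughly 70% of the samples are rejected, and the rarer the evidence, the worse this gets.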

likelihood weighting: a form of importance sampling where various variables are sampled in a predefined order and where evidence is used to update the weights
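A minimal sketch of likelihood weighting follows, again on a hypothetical Rain → Traffic → Late network with invented probabilities. The evidence (late = true) is never sampled; each sample is instead weighted by how likely that evidence is given the sampled parents.

```python
import random

# Hypothetical network: Rain -> Traffic -> Late (all numbers assumed).
P_RAIN = 0.2
P_TRAFFIC = {True: 0.8, False: 0.3}  # P(traffic | rain)
P_LATE = {True: 0.6, False: 0.1}     # P(late | traffic)

def estimate_rain_given_late(n, rng):
    """Estimate P(rain | late) without rejecting any sample."""
    weight_rain = 0.0
    weight_total = 0.0
    for _ in range(n):
        # Sample the non-evidence variables in a fixed order.
        rain = rng.random() < P_RAIN
        traffic = rng.random() < P_TRAFFIC[rain]
        # Fix the evidence late=True and weight by its likelihood.
        weight = P_LATE[traffic]
        weight_total += weight
        if rain:
            weight_rain += weight
    return weight_rain / weight_total

estimate = estimate_rain_given_late(20_000, random.Random(0))
print(estimate)  # should land near the exact answer of 1/3
```

Every sample contributes, which makes this approach less wasteful than discarding samples when the evidence is unlikely.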

Markov assumption: the assumption that the current state depends on only a finite fixed number of previous states

Markov chain: a sequence of random variables where the distribution of each variable follows the Markov assumption
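A small sketch of a Markov chain with two weather states is shown below. The transition probabilities are assumptions for illustration; each day depends only on the day before.

```python
import random

# Assumed transition model: TRANSITION[today][tomorrow].
TRANSITION = {
    "sun": {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def step(dist):
    """Push a distribution over today's weather one day forward."""
    return {
        nxt: sum(dist[cur] * TRANSITION[cur][nxt] for cur in TRANSITION)
        for nxt in TRANSITION
    }

def sample_chain(start, length, rng):
    """Sample a sequence in which each state depends only on the previous one."""
    chain = [start]
    while len(chain) < length:
        options = TRANSITION[chain[-1]]
        nxt = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        chain.append(nxt)
    return chain

two_days_ahead = step(step({"sun": 1.0, "rain": 0.0}))
print(two_days_ahead)
print(sample_chain("sun", 7, random.Random(3)))
```

Starting from a sunny day, the chance of sun two days later works out to about 0.7 under these numbers (0.8 × 0.8 + 0.2 × 0.3).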

hidden Markov model: a Markov model for a system with hidden states that generate some observed event

sensor Markov assumption: the assumption that the evidence variable depends only on the corresponding state

filtering: a practical application of probability information: given observations from start until now, calculate a distribution for the current state

prediction: a practical application of probability information: given observations from start until now, calculate a distribution for a future state

smoothing: a practical application of probability information: given observations from start until now, calculate a distribution for a past state

most likely explanation: a practical application of probability information: given observations from start until now, calculate the most likely sequence of states
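Two of the four tasks above, filtering and most likely explanation, can be sketched on a tiny hidden Markov model. The weather states, the umbrella observations, and all probabilities are assumptions made for this example. The most likely sequence is found with the Viterbi algorithm, a standard technique that the definitions above do not name. Both functions expect at least one observation.

```python
STATES = ("sun", "rain")
PRIOR = {"sun": 0.5, "rain": 0.5}  # assumed belief about the first day
TRANSITION = {                      # TRANSITION[today][tomorrow]
    "sun": {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.3, "rain": 0.7},
}
EMISSION = {                        # EMISSION[state][observation]
    "sun": {"umbrella": 0.2, "no umbrella": 0.8},
    "rain": {"umbrella": 0.9, "no umbrella": 0.1},
}

def normalize(weights):
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

def filtering(observations):
    """Return P(current state | all observations so far)."""
    belief = normalize({s: PRIOR[s] * EMISSION[s][observations[0]] for s in STATES})
    for obs in observations[1:]:
        # Predict one step ahead, then weight by the new evidence.
        predicted = {
            s: sum(belief[p] * TRANSITION[p][s] for p in STATES) for s in STATES
        }
        belief = normalize({s: predicted[s] * EMISSION[s][obs] for s in STATES})
    return belief

def most_likely_explanation(observations):
    """Viterbi: return the most likely sequence of hidden states."""
    # best[s] is the probability of the best path that ends in state s.
    best = {s: PRIOR[s] * EMISSION[s][observations[0]] for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] * TRANSITION[p][s])
            pointers[s] = prev
            new_best[s] = best[prev] * TRANSITION[prev][s] * EMISSION[s][obs]
        best = new_best
        backpointers.append(pointers)
    # Walk the backpointers from the best final state to the start.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return list(reversed(path))

observed = ["umbrella", "umbrella", "no umbrella"]
print(filtering(observed))
print(most_likely_explanation(observed))  # ['rain', 'rain', 'sun']
```

Prediction would reuse the "predict one step ahead" line without an evidence update, and smoothing would additionally pass information backward from later observations.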

“One thing is certain on the path toward grasping and applying artificial intelligence: uncertainty.”

Now that you’re able to explain the most essential uncertainty-related terms, you’re hopefully more comfortable exploring these concepts further on your own.

This puts you on the third stage of your journey to becoming a fully-fledged AI-savvy leader. Explore similar AI-related topics, including Search, Knowledge, Optimization, Machine Learning, Neural Networks, and Language.

Like What You Read? Eager to Learn More?
Follow me on Medium or LinkedIn.

About the author:
Yannique Hecht works at the intersection of strategy, customer insights, data, and innovation. While his career has spanned the aviation, travel, finance, and technology industries, he is passionate about management. Yannique specializes in developing strategies for commercializing AI & machine learning products.


Published via Towards AI
