Bernoulli Distribution — Probability Tutorial with Python

Last Updated on October 21, 2021 by Editorial Team

Author(s): Pratik Shukla, Roberto Iriondo

Source: Unsplash

Probability, Statistics

Bernoulli distribution tutorial — diving into the discrete probability distribution of a random variable with examples in Python

In this series of tutorials, we will dive into probability distributions in detail. We will not just showcase formulas; we will also see how each formula derives from its basic definition (as it is essential to understand the math behind the derivations), and we will illustrate the results with examples in Python.

This tutorial’s code is available on GitHub, and its full implementation is available on Google Colab.

Table of Contents:

  1. What is a Random Variable?
  2. Discrete Random Variable.
  3. Continuous Random Variable.
  4. Probability Distributions.
  5. Bernoulli Distribution.
  6. Probability Mass Function (PMF).
  7. Mean of Bernoulli Distribution.
  8. The variance of a Bernoulli Distribution.
  9. Standard Deviation of Bernoulli Distribution.
  10. Mean Deviation of Bernoulli Distribution.
  11. Moment Generating Function for a Bernoulli Distribution.
  12. Cumulative Distribution Function (CDF) for a Bernoulli Distribution.
  13. Python Implementation.
  14. Summary of the Bernoulli Distribution.
  15. Resources.
  16. References.


Before diving deep into probability distributions, let’s first understand some basic terminology about a random variable.

Figure 1: Basic types of data

What is a Random Variable?

A variable is called a random variable if its value is not known in advance. In other words, a variable is random if we cannot determine its value from other known quantities using any kind of function.

A random variable is a variable whose possible values are numerical outcomes of a random phenomenon.

Properties of a random variable:

  1. We denote random variables with a capital letter.
  2. Random variables can be discrete or continuous.

Examples:

  1. Tossing a fair coin:
Figure 2: Randomly tossing a coin.

In figure 2, we show that the outcome does not depend on any other variable. So the outcome of tossing a coin is random.

2. Rolling a fair die:

Figure 3: Rolling a die.

In figure 3, we can notice that the outcome of rolling a die cannot be predicted in advance, and it does not depend on any other variable. So we can say that the outcome is random.

Now let’s have a brief look at non-random variables.

Figure 4: Non-random variables.

In the figure above, the first example lets us quickly find the value of variable x by subtracting one from both sides. Therefore, the value of x is not random; it is fixed. In the second example, the value of variable y depends on the value of variable x and changes as x changes. We get the same output y whenever we plug in the same value of x, so variable y is not random at all. In probability distributions, we will work with random variables.
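As a rough sketch of the contrast described above, the snippet below compares a deterministic rule with a random experiment. The rule y = 2x + 1 and the function names are hypothetical stand-ins of our own, not the equations shown in Figure 4.

```python
import random

def y_from_x(x):
    # Deterministic: the same x always produces the same y.
    # The rule y = 2x + 1 is a hypothetical example, not the article's figure.
    return 2 * x + 1

def toss_coin(rng):
    # Random: the outcome is not determined by any input we control.
    return rng.choice(["heads", "tails"])

rng = random.Random()  # unseeded, so results vary between runs
print([y_from_x(3) for _ in range(3)])     # the same value three times
print([toss_coin(rng) for _ in range(3)])  # may differ on each run
```

Calling y_from_x with the same argument always returns the same number, while repeated calls to toss_coin can return different outcomes.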

Discrete Random Variable:

A random variable is called a discrete random variable if its values can be obtained by counting. Discrete variables can be counted in a finite amount of time. The critical thing to note here is that the values of a discrete variable need not be integers: a discrete random variable can also take values from a finite set of decimal (float) numbers.

Examples:

  1. The number of students present on a school bus.
  2. The number of cookies on a plate.
  3. The number of heads while flipping a coin.
  4. The number of planets around a star.
  5. The net income of family members.

Continuous Random Variable:

A random variable is called a continuous random variable if its values can be obtained by measuring. We cannot count continuous variables in a finite amount of time. In other words, it would take an infinite amount of time to count continuous variables.

Examples:

  1. The exact weight of a random animal in the universe.
  2. The exact height of a randomly selected student.
  3. The exact distance traveled in an hour.
  4. The exact amount of food eaten yesterday.
  5. The exact winning time of an athlete.

The vital thing to notice is that we are mentioning the word “Exact” here. It means that all the measurements we take are up to absolute precision.

Figure 5: Completion time for a race.

For example, if we measure the completion time of a race for an athlete, we can say that he completed the race in 9.5 seconds. To be more precise, we can say that he completed the race in 9.52 seconds. To be more precise, we can say that the athlete completed the race in 9.523 seconds. To add more precision to the time taken, we can also say that he completed the race in 9.5238 seconds. If we keep on doing this, we can take this thing to an infinite level of precision, and it will take us an infinite amount of time to measure it. That is why it is called a continuous variable.

Main Difference Between Discrete and Continuous Variable:

Example: What is your current age?

What do you think about this? Is it a continuous variable or discrete variable? Please take a moment to think about it.

The example is classified into the group of continuous variables. As discussed above, we can say the following about your age:

Figure 6: Current age with precisions.

Notice that we can continue writing age with more and more precision. Therefore, we cannot count the exact age of a person in a finite amount of time. That is why it is a continuous variable.

On the other hand, if the question were “What is your current age in years?”, the variable would be classified as discrete, since the answer is simply a whole number of years (“X years”).

Next, let’s discuss probability distributions. Probability distributions are based on data types, and they can be either discrete or continuous.

Probability Distribution:

A probability distribution is a mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment.[1]

Figure 7: Types of probability distributions.

Bernoulli Distribution:

Conditions for the Bernoulli Distribution

  1. There must be only one trial.
  2. There must be only two possible outcomes of the trial, one is called a success, and the other is called failure.
  3. P(Success) = p
  4. P(Failure) = 1 - p = q
  5. Conventionally, we assign the value of 1 to the event with probability p and a value of 0 to the event with probability 1 - p.
  6. Conventionally, we choose the labels so that p > 1 - p. In other words, we take the probability of success (1) as p and the probability of failure (0) as 1 - p so that P(Success) > P(Failure).
  7. We must have the probability of one of the events (Success or Failure) or some past data that indicates experimental probability.

If our data satisfies the conditions above, then:

Figure 8: A discrete random variable X follows a Bernoulli distribution with probability of success p.

Visual representation of Bernoulli distribution:

Figure 9: A visual representation of the Bernoulli distribution.

Examples:

Figure 10: Examples of the Bernoulli distribution.

For instance:

There are only two candidates in an election: Patrick and Gary, and we can either vote for Patrick or Gary.

  • P(Success) = P(1) = Vote for Patrick = 0.7
  • P(Failure) = P(0) = Vote for Gary = 0.3

Here we have only one trial and only two possible outcomes. So we can say that the data follows a Bernoulli distribution. To visualize it:

Figure 11: Bernoulli distribution example graph.
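To make the election example concrete, here is a minimal simulation sketch using only Python's standard library. It assumes p = 0.7 from the example; the helper name bernoulli_trial, the seed, and the sample size are our own illustrative choices.

```python
import random

def bernoulli_trial(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

rng = random.Random(42)  # fixed seed so the run is repeatable
p = 0.7                  # P(vote for Patrick), taken from the example above

# Each voter is one independent Bernoulli trial: 1 = Patrick, 0 = Gary.
votes = [bernoulli_trial(p, rng) for _ in range(10_000)]
share_patrick = sum(votes) / len(votes)
print(share_patrick)  # should land close to 0.7
```

Each individual vote is a single Bernoulli trial; repeating the trial many times only serves to show that the observed share of successes approaches p.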

Probability Mass Function (PMF):

A probability mass function of a discrete random variable X assigns a probability to each of the possible values of the random variable. By using the PMF, we can get the probability of each value that the random variable can take.

Let X be a discrete random variable with its possible values denoted by x1, x2, x3, …, xn. The probability mass function (PMF) must satisfy the following conditions:

Properties of PMF:

  1. The sum of all the probabilities in a given PMF must be 1.
Figure 12: Sum of probabilities in the PMF.

  2. All the possible probability values must be greater than or equal to 0.

Figure 13: Probability of a random variable.
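The two properties above can be checked mechanically. The sketch below is one possible way to do so, assuming the PMF is stored as a dictionary that maps each value to its probability; the function name is_valid_pmf and the tolerance are hypothetical choices of ours.

```python
import math

def is_valid_pmf(pmf, tol=1e-9):
    """Check the two PMF properties for a dict {value: probability}."""
    non_negative = all(prob >= 0 for prob in pmf.values())
    sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return non_negative and sums_to_one

print(is_valid_pmf({0: 0.3, 1: 0.7}))   # both properties hold
print(is_valid_pmf({0: 0.5, 1: 0.6}))   # probabilities sum to 1.1
print(is_valid_pmf({0: -0.2, 1: 1.2}))  # contains a negative probability
```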

Probability Mass Function (PMF) for Bernoulli Distribution:

Figure 14: Probability mass function (PMF).

Let’s visualize the function:

Figure 15: Bernoulli distribution visualization.
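For readers who prefer code to figures, the Bernoulli PMF can be sketched as a small function. We assume here the standard closed form P(X = x) = p^x (1 - p)^(1 - x) for x in {0, 1}, which we take to be what Figure 14 shows; the function name is our own.

```python
def bernoulli_pmf(x, p):
    """P(X = x) for X ~ Bernoulli(p); zero outside {0, 1}."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    if x not in (0, 1):
        return 0.0
    # p^x * (1 - p)^(1 - x) picks p when x = 1 and 1 - p when x = 0
    return p**x * (1 - p) ** (1 - x)

p = 0.7
print(bernoulli_pmf(1, p))  # probability of success
print(bernoulli_pmf(0, p))  # probability of failure
```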

Mean for Bernoulli Distribution:

The mean of a discrete random variable X is a weighted average: each value of X is weighted by its probability. In the Bernoulli distribution, the random variable X can take only two values, 0 and 1, and we can quickly get the weights by using the Probability Mass Function (PMF).

Mean: The mean of a probability distribution is the long-run arithmetic average value of a random variable having that distribution.

For a Bernoulli distribution, the expected value E[X] equals the probability of the favored event (success).

Figure 16: Expected value E[X].

The expected value or the mean of Bernoulli Distribution is given by:

Figure 17: Mean of the Bernoulli distribution.

Mean of Bernoulli Distribution:

Figure 18: Proof of mean for Bernoulli Distribution.
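Written out as text, the derivation of the mean is short. The following is our own transcription of the standard argument, applying the weighted-average definition to the two outcomes 0 and 1:

```latex
\begin{aligned}
E[X] &= \sum_{x \in \{0, 1\}} x \, P(X = x) \\
     &= 0 \cdot (1 - p) + 1 \cdot p \\
     &= p
\end{aligned}
```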

Variance for Bernoulli Distribution:

Variance (σ²) measures how far the values of a random variable are spread out from the mean. The square root of the variance is called the standard deviation.

Based on its definition:

Figure 19: Variance formula.

The variance of a discrete probability distribution:

Figure 20: The variance for a discrete probability distribution.

In our case, variable x can take only two values: 0 and 1.

Figure 21: The variance for the Bernoulli distribution.

The variance of Bernoulli Distribution:

Figure 22: Proof of the variance of the Bernoulli distribution.
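As a text version of this derivation, the steps can be sketched as follows. This is our own transcription of the standard argument, using the mean p and writing q = 1 - p:

```latex
\begin{aligned}
\operatorname{Var}(X) &= \sum_{x \in \{0, 1\}} (x - \mu)^2 \, P(X = x), \qquad \mu = p \\
  &= (0 - p)^2 (1 - p) + (1 - p)^2 \, p \\
  &= p (1 - p) \bigl[ p + (1 - p) \bigr] \\
  &= p (1 - p) = pq
\end{aligned}
```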

There is a more popular form to find variance in statistics:

Figure 23: Popular form of the variance formula.

Let’s see how this came into existence.

Basically, the variance is the expected value of the squared difference between each value and the mean of the distribution.

From the definition of variance, we can then derive:

Figure 24: Proof of the variance formula.

Finding the variance using this formula:

Figure 25: Proof of variance for the Bernoulli distribution.

In figure 25, we can see that the Bernoulli distribution variance is the same regardless of which formula we use.
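The agreement between the two formulas is easy to check numerically. The sketch below assumes the "popular form" is Var(X) = E[X²] - (E[X])², which is the usual shortcut; the function names are hypothetical.

```python
import math

def variance_by_definition(p):
    # E[(X - mu)^2] with mu = p, summed over the outcomes 0 and 1
    mu = p
    return (0 - mu) ** 2 * (1 - p) + (1 - mu) ** 2 * p

def variance_by_shortcut(p):
    # E[X^2] - (E[X])^2; for a 0/1 variable X^2 = X, so E[X^2] = p
    second_raw_moment = 0**2 * (1 - p) + 1**2 * p
    return second_raw_moment - p**2

for p in (0.1, 0.5, 0.7):
    a, b = variance_by_definition(p), variance_by_shortcut(p)
    print(p, a, b, math.isclose(a, b))
```

Both functions reduce algebraically to p(1 - p), so any difference in the printed values is floating-point rounding only.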

Standard Deviation for Bernoulli Distribution:

A standard deviation is a number used to tell how measurements for a group are spread out from the average (mean or expected value).

A low standard deviation means that most of the numbers are close to the average, while a high standard deviation means that the numbers are more spread out.

Figure 26: Standard deviation for the Bernoulli distribution.
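Since the standard deviation is the square root of the variance p(1 - p), a short sketch can show how the spread changes with p. The function name and the chosen values of p are illustrative assumptions.

```python
import math

def bernoulli_std(p):
    """Standard deviation sqrt(p * (1 - p)) of a Bernoulli(p) variable."""
    return math.sqrt(p * (1 - p))

for p in (0.1, 0.5, 0.7, 0.9):
    print(p, round(bernoulli_std(p), 4))
```

The spread is largest at p = 0.5, where both outcomes are equally likely, and shrinks toward 0 as p approaches 0 or 1.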

Mean Deviation for Bernoulli Distribution:

The mean deviation is the mean of the absolute deviations of a data set about the data’s mean.

Based on the definition:

Figure 27: Mean deviation formula.

For a discrete probability distribution:

Figure 28: Mean deviation for the probability distribution.

Finding the mean deviation for the Bernoulli distribution:

Figure 29: The mean deviation for the Bernoulli distribution.
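In text form, the mean deviation can be worked out as follows. This is our own transcription of the standard calculation, taking absolute deviations about the mean p and writing q = 1 - p:

```latex
\begin{aligned}
E\bigl[\,|X - \mu|\,\bigr] &= \sum_{x \in \{0, 1\}} |x - p| \, P(X = x) \\
  &= |0 - p| \, (1 - p) + |1 - p| \, p \\
  &= p (1 - p) + (1 - p) \, p \\
  &= 2 p (1 - p) = 2pq
\end{aligned}
```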

Moment Generating Function For Bernoulli Distribution:

Figure 30: Summary of the relationship between central and raw moments.

For the following derivations, we will use the formulas we derived in our previous tutorial, so we recommend checking out our tutorial on the Moment Generating Function first.

Figure 31: Definition of a moment generating function.

Moment Generating Function:

Figure 32: Moment generating function for a Bernoulli distribution.
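As a text sketch of this step, the moment generating function follows directly from the definition M_X(t) = E[e^{tX}] applied to the two outcomes. The notation q = 1 - p is the one introduced earlier; the layout is our own:

```latex
\begin{aligned}
M_X(t) &= E\bigl[e^{tX}\bigr] = \sum_{x \in \{0, 1\}} e^{tx} \, P(X = x) \\
       &= e^{0} (1 - p) + e^{t} p \\
       &= q + p \, e^{t} \\[4pt]
M_X'(t) &= p \, e^{t} \quad\Longrightarrow\quad M_X'(0) = p = E[X]
\end{aligned}
```

Every higher derivative of M_X(t) is also p e^t, which is why all raw moments of a Bernoulli variable equal p.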

Finding Raw Moments:

1. First Moment:

a. First Raw Moment:

Figure 33: First raw moment.

2. Second Moment:

a. Second Raw Moment:

Figure 34: Second raw moment.

b. Second Central Moment (Variance):

Figure 35: Second central moment.

3. Third Moment:

a. Third Raw Moment:

Figure 36: Third raw moment.

b. Third Central Moment:

Figure 37: Third central moment.

c. Third Standardized Moment (Skewness):

Figure 38: Third standardized moment (skewness).

4. Fourth Moment:

a. Fourth Raw Moment:

Figure 39: Fourth raw moment.

b. Fourth Central Moment:

Figure 40: Fourth central moment.

c. Fourth Standardized Moment (Kurtosis):

Figure 41: Fourth standardized moment (Kurtosis).
Figure 42: Fourth standardized moment (excess kurtosis).
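The moments listed above can be cross-checked numerically by summing over the two outcomes instead of differentiating the moment generating function. The sketch below does this for p = 0.7; the function names are hypothetical, and the printed values are subject to ordinary floating-point rounding.

```python
def raw_moment(k, p):
    # E[X^k]; since 0^k = 0 and 1^k = 1 for k >= 1, every raw moment equals p
    return 0**k * (1 - p) + 1**k * p

def central_moment(k, p):
    # E[(X - p)^k], summed over the two outcomes 0 and 1
    return (0 - p) ** k * (1 - p) + (1 - p) ** k * p

p = 0.7
q = 1 - p
variance = central_moment(2, p)
skewness = central_moment(3, p) / variance**1.5  # third standardized moment
kurtosis = central_moment(4, p) / variance**2    # fourth standardized moment
excess_kurtosis = kurtosis - 3

print([raw_moment(k, p) for k in range(1, 5)])
print(variance, skewness, kurtosis, excess_kurtosis)
```

These values agree with the closed forms (q - p) / sqrt(pq) for the skewness and (1 - 6pq) / (pq) for the excess kurtosis.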

Cumulative Distribution Function (CDF):

Figure 43: Cumulative distribution function definition.

Based on the Probability Mass Function (PMF), we can write the Cumulative Distribution Function (CDF) for the Bernoulli distribution as follows:

Figure 44: Cumulative distribution function for a Bernoulli distribution.
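Because the PMF puts mass 1 - p at 0 and p at 1, the CDF is a step function with jumps at those two points. A minimal sketch, with a function name of our own choosing:

```python
def bernoulli_cdf(x, p):
    """P(X <= x) for X ~ Bernoulli(p)."""
    if x < 0:
        return 0.0    # no outcome is below 0
    if x < 1:
        return 1 - p  # only the outcome 0 is at or below x
    return 1.0        # both outcomes 0 and 1 are at or below x

p = 0.7
for x in (-0.5, 0, 0.5, 1, 2):
    print(x, bernoulli_cdf(x, p))
```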

Now for the fun part: let’s move on to the implementation in Python.

Python Implementation:

  1. Import required libraries:
Figure 45: Importing the required libraries.

2. Find the moments:

Figure 46: Finding the moments for the Bernoulli distribution with p=0.7.

3. Get the mean value:

Figure 47: Mean for p=0.7.

4. Get median value:

Figure 48: Median for p=0.7.

5. Get variance value:

Figure 49: Variance for p=0.7.

6. Get standard deviation value:

Figure 50: Standard deviation for p=0.7.

7. Probability Mass Function (PMF):

Figure 51: Probability mass function for p=0.7.

8. Plotting the PMF:

Figure 52: Scatter plot of the PMF for p=0.7.

9. Cumulative Distribution Function (CDF):

Figure 53: Cumulative distribution function for p=0.7.

10. Plot the CDF:

Figure 54: Scatter plot of the CDF for p=0.7.

11. Plot the bar graph for PMF:

Figure 55: Bar graph of the PMF for p=0.7.

12. Plot the bar graph for CDF:

Figure 56: Bar graph of the CDF for p=0.7.

13. Output for different experiments:

Figure 57: Generating the output for different Bernoulli experiments.
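The code for the steps above appears to have been shown as figures, so it is not reproduced here. As a hedged sketch of how the non-plotting steps could be written, the snippet below assumes SciPy is installed and uses scipy.stats.bernoulli, the class documented in reference [5]. The variable names, sample size, and seed are our own choices and are not taken from the original notebook.

```python
from scipy.stats import bernoulli

p = 0.7

# Mean, variance, skewness, and excess kurtosis in one call
mean, var, skew, kurt = bernoulli.stats(p, moments="mvsk")

median = bernoulli.median(p)
std = bernoulli.std(p)

# PMF and CDF evaluated at the two possible outcomes, 0 and 1
pmf_values = bernoulli.pmf([0, 1], p)
cdf_values = bernoulli.cdf([0, 1], p)

# Ten simulated Bernoulli experiments; random_state is fixed for repeatability
samples = bernoulli.rvs(p, size=10, random_state=0)

print(mean, var, skew, kurt)
print(median, std)
print(pmf_values, cdf_values)
print(samples)
```

The PMF and CDF arrays produced here are the kind of values that the scatter plots and bar graphs in the steps above would display.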

Summary of the Bernoulli Distribution:

Figure 58: Summary of the Bernoulli distribution.

That is it for the Bernoulli distribution tutorial. We hope you enjoyed reading it and learned something new. We will try to cover more probability distributions in depth in the future. Suggestions and feedback are crucial for us to keep improving, so please let us know in the comments if you have any.


DISCLAIMER: The views expressed in this article are those of the author(s) and do not represent the views of Carnegie Mellon University, nor other companies (directly or indirectly) associated with the author(s). These writings do not intend to be final products, yet rather a reflection of current thinking, along with being a catalyst for discussion and improvement.

Published via Towards AI

Resources:

Google Colab implementation.

GitHub repository.

References:

[1] https://en.wikipedia.org/wiki/Probability_distribution

[2] https://www.statlect.com/probability-distributions/Bernoulli-distribution

[3] https://en.wikipedia.org/wiki/Variance

[4] https://en.wikipedia.org/wiki/Bernoulli_distribution

[5] https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.bernoulli.html
