

Inventory Optimization with Data Science: Hands-On Tutorial with Python

Last Updated on November 5, 2023 by Editorial Team

Author(s): Peyman Kor

Originally published on Towards AI.

Photo by Christin Hume on Unsplash


Inventory optimization is like solving a tricky puzzle. It is a broad problem that arises across many domains, and it is all about figuring out how many products to order for your store.

Think about a bike shop owner who orders new bikes for sale. There is a tricky trade-off here. If she orders too many bikes, she will spend too much on storing and maintaining them. On the other hand, if she orders too few, she might not have enough bikes to meet customer demand, which means lost profit (and reputation).

What she needs is an optimal “strategy” that helps her make the best daily decision about how much to order, ensuring long-term profit for her inventory.

So, in the context of this problem, data scientists with programming, data, and modeling knowledge can play an important role in finding this best “strategy.” However, we need some foundational knowledge to reach that objective. We need a basic understanding of:

  • Markov Processes,
  • Markov Reward Processes,
  • and Markov Decision Processes.

Finally, we will combine these three concepts and connect them with

  • Dynamic Programming and
  • Reinforcement Learning

to arrive at the optimal “strategy” I raised earlier. This blog aims to understand and model “Markov Processes” in Python, which will be a building block for the next steps.

Inventory Optimization: Can Out-of-the-Box ML Models Solve It?

Honestly, I wrote this blog “out of frustration” with how inventory optimization is modeled and solved in online resources. In my Ph.D., I had to deal with inventory optimization (as my topic is Sequential Decision Making). I did some research, went through papers and some books, and was able to find the right modeling approach to solve it “consistently.”

The point is that “Inventory Optimization” type problems are dynamic problems: depending on the state (inventory situation) you are in, you need an adaptive policy that responds to that situation.

Inventory Optimization isn’t a static problem that can be addressed with a static analytical method, nor can it be solved using off-the-shelf machine learning or deep learning models. It’s a dynamic process whose components must be understood and modeled, enabling you to adapt your daily decisions dynamically.

Markov Process

If you are an analytics professional (whether a Data Scientist, Analyst, etc.), you will often be dealing with Processes that are time-indexed and follow an uncertain path. Think about a data scientist who works in an energy company. Her task involves keeping track of the uncertain path of commodity prices.

The commodity price (oil, for instance) follows a path over time steps t=0,1,2,⋯. This is an uncertain process indexed by time. However, to analyze it, we need to represent the process internally.

Think of the State as a tool to represent the uncertain process internally. Let's go back to the example of oil prices. Let’s say today’s oil price is $100. I can represent that information by saying S_0=100. Tomorrow the price will be different and can be represented as S_1, and this sequence continues: S_0, S_1, S_2, … The process is therefore a sequence of random states S_t ∈ S at times t=0,1,2,3,…

Markov Property

Here, in this short blog, I want to talk about the Markov Process. We call a process Markovian if it has the Markov Property, meaning the state transitions satisfy:

P[S_{t+1} | S_t] = P[S_{t+1} | S_0, S_1, …, S_t]

What does this equation mean in simpler terms?

We can think about a simple example of weather. Assume that a day can have three possible weather conditions, “Sunny”, “Rainy,” and “Snowy.” Then, if today is “Sunny”,

  • There is a 70% chance that tomorrow will be “Sunny” too,
  • 20% chance that the weather will turn to “Rainy” tomorrow, and
  • 10% chance tomorrow's weather will be “Snowy”.

In this weather example, we can see that the probability of tomorrow’s weather condition depends “only” on “today”, so the history of the weather conditions is “irrelevant”. As long as the weather today is sunny (S_t=“Sunny”), the probabilities for tomorrow’s weather are P(S_{t+1}=“Sunny” | S_t=“Sunny”)=0.7, P(S_{t+1}=“Rainy” | S_t=“Sunny”)=0.2 and P(S_{t+1}=“Snowy” | S_t=“Sunny”)=0.1.
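As a minimal sketch of this property, the snippet below stores transition probabilities in a dictionary keyed by today's weather and samples tomorrow's weather from today's state alone. Only the “Sunny” row comes from the example above; the “Rainy” and “Snowy” rows, and the names WEATHER_TRANSITIONS and sample_tomorrow, are made-up assumptions for illustration.

```python
import random

# Transition probabilities keyed by today's weather.
# The "Sunny" row is from the text; the other two rows are hypothetical.
WEATHER_TRANSITIONS = {
    "Sunny": {"Sunny": 0.7, "Rainy": 0.2, "Snowy": 0.1},
    "Rainy": {"Sunny": 0.4, "Rainy": 0.5, "Snowy": 0.1},  # assumed values
    "Snowy": {"Sunny": 0.3, "Rainy": 0.2, "Snowy": 0.5},  # assumed values
}


def sample_tomorrow(today, rng):
    """Sample tomorrow's weather using only today's weather (Markov property)."""
    row = WEATHER_TRANSITIONS[today]
    return rng.choices(list(row.keys()), weights=list(row.values()), k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
weather = "Sunny"
path = [weather]
for _ in range(7):
    # the sampler never looks at earlier entries of `path`, only at `weather`
    weather = sample_tomorrow(weather, rng)
    path.append(weather)
print(path)
```

Note that sample_tomorrow receives no history argument at all, which is exactly what the Markov Property allows.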

Let’s work on a simple real-world example to understand the Markov Process better. Here, the example is about managing the inventory of a bike shop, and it will follow up with some hands-on Python coding.

A Simple Example of Bike Shop Inventory Management


Back to the bike shop example: Imagine you own a bike shop with an inventory that can only hold a certain number of bicycles (limited capacity). For example, suppose your shop can hold a maximum of 5 bikes.

Every day, some customers come to buy bikes from your shop. Assume that today (Wednesday), you have three bikes in your shop. However, you also know that tomorrow (Thursday), there will be some demand at your store, meaning some of your bikes will be sold. You are not sure how many customers you will have tomorrow, so the exact demand for bikes tomorrow is uncertain. We need to model this process more precisely: how the state of the inventory changes and how it evolves. [1]

Problem Description

In this blog, our main focus is on the Markov Process. This means that we will assume a fixed policy (in this case, the policy is how many bikes to order every day). To model the Markov Process of this example, we first need to frame the problem (what is the state?). Second, we need to build the model of the state transitions.

The state (the internal representation of the process) can be described by two components:

  • α : is the number of bikes you already have in store
  • β : is the number of bikes that you ordered the previous day and will arrive tomorrow morning (these bikes are on a truck)

Sequence of 24-hr cycle

The sequence of events in a 24-hour cycle for this bike shop is as follows:

  1. At 6 PM: you observe the state S_t = (α, β)
  2. At 6 PM: you order new bikes, equal to max(C−(α+β), 0), where C is the capacity of the store
  3. At 6 AM: you receive the bikes that you ordered 36 hours ago
  4. At 8 AM: you open the store
  5. From 8 AM to 6 PM: you experience demand i during the day (modeled using a Poisson distribution, more below)
  6. At 6 PM: you close the store
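One way to read this cycle is as a single update rule from one 6 PM observation to the next. The sketch below assumes a hypothetical helper next_state and takes the demand realized during the following day as a given number (in the real process it is random).

```python
def next_state(alpha, beta, demand, capacity):
    """Return the state (alpha', beta') observed at 6 PM on the following day."""
    # Step 2: the order placed at 6 PM today is still on the truck
    # at the next 6 PM observation, so it becomes the next beta
    order = max(capacity - (alpha + beta), 0)
    # Step 3: the bikes on the truck (beta) arrive at 6 AM and join the stock
    available = alpha + beta
    # Step 5: demand beyond the available stock is lost; stock cannot go negative
    remaining = max(available - demand, 0)
    return (remaining, order)


print(next_state(2, 1, demand=1, capacity=5))  # → (2, 2)
print(next_state(2, 1, demand=4, capacity=5))  # → (0, 2)
```

In the second call the demand of 4 exceeds the 3 available bikes, so the store ends the day empty.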

The diagram below visualizes this sequence:

The sequence of events in a 24-hr cycle for Bike shop store — Image Source: Author

Similar to the explanation above, S_t=(α_t, β_t) is the internal representation of this stochastic process, and with the Markov Process, we want to model how this stochastic process evolves.

An Example demonstrating a 24-hour sequence of inventory.

Let me give an example. Let’s say David is the owner of a bike shop. On Wednesday at 6 PM, he has 2 bikes in his store and one bike that he ordered on Tuesday at 6 PM, which will arrive on Thursday morning. So his state on Wednesday at 6 PM is S=(α=2, β=1).

Then, each day, there is a random (non-negative integer) demand for bikes, modeled with a Poisson distribution (with Poisson parameter λ ∈ R_{>0}). A demand of i bikes, for each i=0,1,2,⋯, occurs with probability:

f(i) = e^{−λ} λ^i / i!
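The Poisson probability of a demand of i bikes, e^{−λ} λ^i / i!, can be computed directly with the standard library. This is a small sketch; the function name poisson_pmf is an assumption for illustration, and scipy.stats.poisson.pmf gives the same values.

```python
import math


def poisson_pmf(i, lam):
    """Probability of exactly i arrivals under a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**i / math.factorial(i)


lam = 2.0
for i in range(6):
    print(i, round(poisson_pmf(i, lam), 4))

# The probabilities over all possible demands add up to 1
# (the tail beyond 60 is negligible for lam = 2)
print(round(sum(poisson_pmf(i, lam) for i in range(60)), 6))
```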

We can visualize this distribution to see the probability of experiencing different demands, given that we pick λ=2:

# import os to create the output folder for the figure
import os

# import matplotlib and apply some styling
import matplotlib.pyplot as plt
# Jupyter notebook magic; remove this line when running as a plain script
%matplotlib inline
plt.style.use("ggplot")
plt.rcParams["figure.figsize"] = [10, 6]

# need numpy to do some numeric calculation
import numpy as np

# poisson is used to find the pmf of the Poisson distribution
from scipy.stats import poisson
x_values = np.arange(0, 10)

# pmf of the Poisson distribution with lambda = 2
pdf_x_values = poisson.pmf(k=x_values, mu=2)
plt.bar(x_values, pdf_x_values, edgecolor='black')
plt.xticks(np.arange(0, 10, step=1))
plt.xlabel("Customer Demand")
plt.ylabel("Probability of Customer Demand")
plt.title(r"Probability Distribution of Customer Demand with $\lambda = 2$")

# make sure the output folder exists before saving
os.makedirs("fig", exist_ok=True)
plt.savefig("fig/poisson_lambda_2.png", dpi=300)
Probability of experiencing different demands, Source: Author

Constructing the Probability Transition of the Markov Process:

Now that we have an initial understanding of what a state is, S_t=(α,β), note that the uncertainty that makes this a stochastic process is the customer demand (i). We can now build up how the transition probabilities work in this problem. I will code it, but first, let’s explain it in simpler terms.

The probability transition of this problem has two cases, Case 1 and Case 2.

  • Case 1)

If the demand i is smaller than the total inventory available on that day (initial inventory = α+β), that is, 0 ≤ i ≤ α+β−1:

P[(α, β) → (α+β−i, C−(α+β))] = f(i)

  • Case 2)

If the demand i is greater than or equal to the total inventory available on that day (initial inventory = α+β), that is, i ≥ α+β:

P[(α, β) → (0, C−(α+β))] = Σ_{j=α+β}^{∞} f(j) = 1 − F(α+β−1)

Here f(i) is the Poisson probability of a demand of i bikes, and F is the CDF of the Poisson distribution.
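To make the two cases concrete, here is a minimal sketch that builds the outgoing transitions of a single state using only the standard library. The helper names poisson_pmf, poisson_cdf, and transition_row are assumptions for illustration, not part of any library.

```python
import math


def poisson_pmf(i, lam):
    """Probability of a demand of exactly i bikes."""
    return math.exp(-lam) * lam**i / math.factorial(i)


def poisson_cdf(k, lam):
    """Probability of a demand of at most k bikes (0 when k is negative)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


def transition_row(alpha, beta, capacity, lam):
    """Map each reachable next state to its probability, starting from (alpha, beta)."""
    inventory = alpha + beta
    next_beta = max(capacity - inventory, 0)
    row = {}
    # Case 1: demand i is smaller than the inventory, so inventory - i bikes remain
    for i in range(inventory):
        row[(inventory - i, next_beta)] = poisson_pmf(i, lam)
    # Case 2: demand is at least the inventory, so the store sells out
    row[(0, next_beta)] = 1.0 - poisson_cdf(inventory - 1, lam)
    return row


row = transition_row(alpha=1, beta=0, capacity=2, lam=2.0)
for state, prob in row.items():
    print(state, round(prob, 2))
# (1, 1) 0.14
# (0, 1) 0.86
```

Because Case 2 collects all the probability mass that Case 1 leaves over, the probabilities in each row add up to 1.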

Having understood the background of this problem, we can now write some Python code to understand it better. The crucial aspect of this Python code is designing the data structure that stores the transitions of the Markov Process. To do that, I am designing the data structure as a dictionary that I call “MarkovProcessDict”. The keys of this dictionary are the current states, and each value is itself a dictionary that maps the possible next states to their probabilities. Here is a sample of what the MarkovProcessDict data structure looks like:

from typing import Dict

MarkovProcessDict = {"Current State A":{"Next State 1, from A": "Probability of Next State 1, from A",
"Next State 2, from A": "Probability of Next State 2, from A"},

"Current State B":{"Next State 1, from B": "Probability of Next State 1, from B",
"Next State 2, from B": "Probability of Next State 2, from B" }}

Image Source: Author

Let’s unpack what the MarkovProcessDict data structure means. For example, if the current state is “Current State A”, it can move to two new states:

  • 1) “Next State 1, from A”, with probability “Probability of Next State 1, from A”
  • 2) “Next State 2, from A”, with probability “Probability of Next State 2, from A”.
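A nested dictionary like this is easy to sanity-check. The sketch below uses a hypothetical two-state process (toy_process, with made-up probabilities) and a hypothetical helper validate_process to confirm that each row is a valid probability distribution and that every next state is itself a key.

```python
# A hypothetical two-state process in the same nested-dictionary shape
toy_process = {
    "Current State A": {"Current State A": 0.3, "Current State B": 0.7},
    "Current State B": {"Current State A": 1.0},
}


def validate_process(process, tol=1e-9):
    """Return a list of problems found; an empty list means the structure is consistent."""
    problems = []
    for state, row in process.items():
        total = sum(row.values())
        # probabilities leaving a state must add up to 1
        if abs(total - 1.0) > tol:
            problems.append("probabilities from {} sum to {}".format(state, total))
        for next_state, prob in row.items():
            # each entry must be a valid probability
            if not 0.0 <= prob <= 1.0:
                problems.append("invalid probability {} for {}".format(prob, next_state))
            # each next state must also appear as a current state
            if next_state not in process:
                problems.append("{} has no outgoing transitions".format(next_state))
    return problems


print(validate_process(toy_process))  # → []
print(toy_process["Current State A"]["Current State B"])  # → 0.7
```

Looking up a transition probability is then just two dictionary accesses: first the current state, then the next state.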

Hands-On Coding

Let’s write code to build our MarkovProcessDict data structure, given two different cases of this process, explained earlier.

from typing import Dict
from scipy.stats import poisson

MarkovProcessDict: Dict[tuple, Dict[tuple, float]] = {}

user_capacity = 2
user_poisson_lambda = 2.0

# We are considering all possible states
# that we can face in running this bike shop
for alpha in range(user_capacity + 1):

    for beta in range(user_capacity + 1 - alpha):

        # This is S_t, the current state
        state = (alpha, beta)

        # This is the initial inventory, the total bikes you have at 8 AM
        initial_inventory = alpha + beta

        # beta1 is the beta of the next state; it depends only on the
        # initial inventory (as the decision policy is fixed)
        beta1 = user_capacity - initial_inventory

        # Loop over the possible demands; i == initial_inventory stands
        # for every demand that is at least the initial inventory
        for i in range(initial_inventory + 1):

            # Case 1: the initial inventory can meet the demand
            if i <= (initial_inventory - 1):

                # probability that this specific demand happens
                transition_prob = poisson.pmf(i, user_poisson_lambda)

                # If we already defined the state in our data
                # (MarkovProcessDict)
                if state in MarkovProcessDict:
                    MarkovProcessDict[state][(initial_inventory - i, beta1)] = transition_prob
                else:
                    MarkovProcessDict[state] = {(initial_inventory - i, beta1): transition_prob}

            # Case 2: the initial inventory cannot meet the demand
            else:

                # probability of not meeting the demand
                transition_prob = 1 - poisson.cdf(initial_inventory - 1, user_poisson_lambda)

                if state in MarkovProcessDict:
                    MarkovProcessDict[state][(0, beta1)] = transition_prob
                else:
                    MarkovProcessDict[state] = {(0, beta1): transition_prob}

In the above code, the for loop iterates over all possible combinations of states, and each state (St) moves to the next state (S_{t+1}) with a probability of “transition_prob”. We can print the dynamic of the system with the following code:

for (state, value) in MarkovProcessDict.items():

    print("The Current state is: {}".format(state))

    for (next_state, trans_prob) in value.items():
        print("The Next State is {} with Probability of {:.2f}".format(next_state, trans_prob))
Printing Python code — Image Source: Author

Visualizing the Final Data Structure

One way to understand these dynamics is using the graphviz python package, where each node represents the Current State, and the edges show the probability of moving from that state to another.

# import the package
import graphviz

# define the initial structure of the graphviz Digraph, with nodes in light blue and filled style
d = graphviz.Digraph(node_attr={'color': 'lightblue2', 'style': 'filled'})

d.attr(layout="circo")

for s, v in MarkovProcessDict.items():

    for s1, p in v.items():

        # represent alpha and beta with s[0] and s[1]
        d.edge("(\u03B1={}, \u03B2={})".format(s[0], s[1]),
               # represent p as the probability of moving from the current state to the new state
               "(\u03B1={}, \u03B2={})".format(s1[0], s1[1]), label=str(round(p, 2)), color="red")

# in a Jupyter notebook, evaluating d as the last expression of a cell renders the graph
d

Printing Python code — Image Source: Author

In the plot below, I visualize these transitions as a graph. For example, if we are in state S_t=(α=1, β=0) (the circle in the top left), there is an 86% chance that the next state will be S_{t+1}=(α=0, β=1) and a 14% chance that the next state will be S_{t+1}=(α=1, β=1).

Visualization of Markov Process and State Transitions — Image Source: Author

Final Notes

  • Inventory optimization is not a static optimization problem; rather, it is a sequential decision-making problem that needs an adaptive policy to make the best decision given the uncertainty at each time step.
  • In this blog, I tried to build a mathematical model to keep track of the Inventory Process (using State and Markov Process) and to visualize the dynamics of the process with hands-on Python coding.
  • This blog builds a foundation for dealing with inventory optimization problems. In the next steps, we will work on the Markov Reward Process and the Markov Decision Process.

[1] You can read about this example in more depth in “Foundations of Reinforcement Learning with Applications in Finance”. However, I have rewritten the Python code in this blog to make it easier to understand.

Thanks for reading so far!

I hope this article has provided an easy-to-understand tutorial on how to do inventory optimization with Python.

If you think this article helped you to learn more about inventory optimization and the Markov Process, please give it a 👏 and follow!
