Solving Facility Location
Last Updated on December 15, 2024 by Editorial Team
Author(s): Saif Ali Kheraj
Originally published on Towards AI.
We will start with one important use case to solve a facility location problem. Suppose we have a service-based company named MMM that sells different products to customers across various regions. Every year, MMM faces tough decisions about where to set up new offices and how many engineers to host in each office. The goal is to make these choices in a way that reduces service costs, office rent, and utilities expenses while still meeting all customer demands. By using data science and prescriptive analytics, MMM can carefully analyze costs, capacities, and customer locations to find the best facility setup that saves money and delivers services efficiently.
For example, when the CEO sees these costs, he may want to optimize them, reducing total expenses while still meeting all demand constraints and making other necessary decisions.
Introduction
Service Cost: This is the direct cost associated with providing after-sales services, such as labor, maintenance, and support services.
- New York has the highest service cost at $12,500,000. This is likely due to higher wages and operational costs in a metropolitan area where skilled labor demands a premium.
- Other cities, like Baltimore and Richmond, have significantly lower service costs because of lower wage scales and potentially less demand for specialized services.
Office Rent: This cost reflects the expense of leasing office space needed to support operations. New York again stands out with an office rent of $45,000,000, highlighting the high real estate costs in urban centers, particularly in prime locations. Boston and Washington D.C. also show considerable office rent, but significantly lower than New York. In contrast, cities like Virginia Beach and Raleigh have much lower rent costs, making them more affordable locations for businesses.
Utility Fee: Utility costs encompass expenses for electricity, water, heating, and other essential services required to maintain the facility.
- City Variations:
- The utility fees are generally lower compared to office rents, but still reflect the local cost of utilities. For example, New York has a utility fee of $10,000,000, while Virginia Beach has the lowest at $200,000, indicating that utility costs vary with local energy rates and usage levels.
As a manager or CEO, one must find ways to reduce these high costs. Maybe closing a certain location would save on office rent, but then the service costs might increase. Thus, thereβs a natural trade-off between these two factors.
- If we have fewer facilities then we will have higher service cost
- If we have more facilities, then we will have lower service cost.
Not only that, but there are many other factors involved. For example, if we close one facility, the customers using that facility need to be reassigned elsewhere, and engineers must also be reassigned. The situation also depends on an environment that keeps changing. For instance, in the future a customer might need fewer services, or overall demand could shift. When these changes occur, we simply update our input data to reflect the new conditions.
Mathematical Optimization Process
Main goal is to minimize the total operational cost of setting up and maintaining facilities to serve customers across various locations. This total cost includes:
- Office Rent Costs for facilities at different scale levels (small, medium, large).
- Traveling Costs for engineers to serve customers from facilities.
- Engineer Hosting Costs at each facility, representing the cost of maintaining engineers at these locations.
Decision Variables
Decision Variable depends on:
- Facility Presence: Whether to establish a facility in a given location.
- Facility Scale: The size and capacity of the facility if established.
- Engineer Allocation: How many engineers will be assigned to each facility.
- Customer Assignment: Which customers are served by which facilities.
- Facility Location and Scale Decision (
x_jk
): Whether to establish a facility at a particular locationj
and scale levelk
. If yes, this variable will be set to 1, otherwise 0.
2. Customer Assignment Decision (y_ij
): Whether a customer i
is assigned to be served by a facility at location j
. This binary variable is 1 if the customer is assigned to the facility, and 0 otherwise. We will use linking constraint to ensure a customer i can only be assigned to a facility location j if that facility has been established.
3. Engineer Allocation Decision (w_j
): The number of engineers allocated to each facility location j
. In the constraints part, we will add capacity constraint as well.
Code to define Decision Variables
import pulp
# Step 1: Initialize the LP problem
problem = pulp.LpProblem("Facility_Location_Problem", pulp.LpMinimize)
# Step 2: Define Decision Variables
# Binary decision variable x_jk: 1 if a facility j is built at scale k, 0 otherwise
x_jk = pulp.LpVariable.dicts("x_jk", ((loc, scale) for loc, scale in f_jk), cat='Binary')
# Binary decision variable y_ij: 1 if customer i is served by facility j, 0 otherwise
y_ij = pulp.LpVariable.dicts("y_ij", ((cust, loc) for cust in a_ij for loc in a_ij[cust]), cat='Binary')
# Engineer hosting variable w_j (number of engineers assigned to location j)
w_j = pulp.LpVariable.dicts("w_j", (loc for loc in c_j), lowBound=0, cat='Integer')
Before discussing about objective function, we need to discuss input parameters. These parameters provide the necessary data on which the optimization is based, such as costs, capacities, and demands.
Key Input Parameters
Facility-Scale Rent Cost (f_jk
):
- Annual rent cost for each facility location
j
at scale levelk
(e.g., small, medium, large). - This will represent the cost component in the objective function related to choosing a facility and its scale.
As shown in the diagram, we have various scale options for rent, each offering a different combination of rent.
Customer Service Demand (h_i
):
- The annual number of services required for each customer
i
. - This parameter determines the amount of service demand that needs to be met by the facilities.
Travel Cost (d_ij
β):
- The cost per service for an engineer to travel from facility location
j
to customeri
. - This parameter influences the travel cost component in the objective function, affecting which facilities serve which customers.
Engineer Hosting Cost ( cj
):
- The cost for hosting one engineer at each facility location
j
. - This component of the objective function will capture the expense of having engineers stationed at a particular facility.
- Because labor costs vary from one location to another, we need to factor this into our decisions as well.
Maximum Number of Engineers (m_jk
):
- The maximum capacity (number of engineers) that can be hosted at facility
j
depending on its scale levelk
. - This parameter ensures that facilities are not over-allocated beyond their capacity.
Service Capacity per Engineer (s
):
- The number of services that a single engineer can complete in a year.
- This value is crucial for understanding how many engineers are required to meet the customer demand allocated to each facility.
- Let us set s=200
Customer-Facility Reachability (a_ij
β):
- A binary parameter that indicates whether a facility location
j
can serve customeri
(1 if reachable, 0 otherwise). - This parameter adds practical constraints, restricting which facilities can realistically serve each customer.
This matrix provides:
- Regional clustering for flexibility in customer assignment.
- Coverage redundancy where multiple facilities can serve the same customer, allowing for reassignment if needed.
Code for loading input data
import pandas as pd
def clean_numeric(value):
if isinstance(value, str):
try:
return float(value.replace(',', ''))
except ValueError:
return value
return value
# Load CSV files into pandas dataframes
cost_to_host_engineer_location = pd.read_csv('cost_to_host_engineer_location.csv', skipinitialspace=True)
customer_location_reachability = pd.read_csv('customer_location_reachability.csv', skipinitialspace=True)
customer_service_required = pd.read_csv('customer_service_required.csv', skipinitialspace=True)
location_scale_rent = pd.read_csv('location_scale_rent.csv', skipinitialspace=True)
location_to_customer_cost = pd.read_csv('location_to_customer_cost.csv', skipinitialspace=True)
max_num_engineer_hosted = pd.read_csv('max_num_engineer_hosted.csv', skipinitialspace=True)
# Apply the cleaning function to numeric columns only
cost_to_host_engineer_location['Cost for Hosting One Engineer'] = cost_to_host_engineer_location['Cost for Hosting One Engineer'].apply(clean_numeric)
location_scale_rent['Rent'] = location_scale_rent['Rent'].apply(clean_numeric)
numeric_cols = location_to_customer_cost.columns[1:] # Skip the first column (Customer)
location_to_customer_cost[numeric_cols] = location_to_customer_cost[numeric_cols].applymap(clean_numeric)
# Constant: Number of services an engineer can complete per year
s = 200
# Display each dataframe to ensure they are loaded correctly
print("Cost to Host Engineer Location:\n", cost_to_host_engineer_location.head(), "\n")
print("Customer Location Reachability:\n", customer_location_reachability.head(), "\n")
print("Customer Service Required:\n", customer_service_required.head(), "\n")
print("Location Scale Rent:\n", location_scale_rent.head(), "\n")
print("Location to Customer Cost:\n", location_to_customer_cost.head(), "\n")
print("Max Number of Engineers Hosted:\n", max_num_engineer_hosted.head(), "\n")
# Convert each dataframe into a dictionary to use as parameters in the model
# 1. Cost to Host Engineer in each Location (c_j)
# c_j represents the annual cost for hosting one engineer at location j
c_j = dict(zip(cost_to_host_engineer_location['Location'], cost_to_host_engineer_location['Cost for Hosting One Engineer']))
# 2. Customer Location Reachability (a_ij)
# a_ij represents whether customer i can be served by facility j (1 if possible, 0 otherwise)
a_ij = customer_location_reachability.set_index('Customer').T.to_dict()
# 3. Customer Service Requirement (h_i)
# h_i represents the annual number of services required for customer i
h_i = dict(zip(customer_service_required['Customer'], customer_service_required['Annual # of Services Required']))
# 4. Location Scale Rent (f_jk)
# f_jk represents the annual office rent for facility j at scale k (small, medium, large)
f_jk = location_scale_rent.set_index(['Location', 'Scale'])['Rent'].to_dict()
# 5. Location to Customer Cost (d_ij)
# d_ij represents the cost per service for an engineer to travel between facility j and customer i
d_ij = location_to_customer_cost.set_index('Customer').T.to_dict()
# 6. Maximum Number of Engineers Hosted at a location (m_jk)
# m_jk represents the maximum number of engineers that can be hosted at location j with scale level k (small, medium, large)
m_jk = max_num_engineer_hosted.set_index('Location').T.to_dict()
print("Cost to Host Engineer Dictionary (c_j):\n", c_j, "\n")
print("Reachability Dictionary (a_ij):\n", a_ij, "\n")
print("Customer Service Requirement Dictionary (h_i):\n", h_i, "\n")
print("Location Scale Rent Dictionary (f_jk):\n", f_jk, "\n")
print("Location to Customer Cost Dictionary (d_ij):\n", d_ij, "\n")
print("Max Number of Engineers Hosted Dictionary (m_jk):\n", m_jk, "\n")
Summary of input Variables:
1. Facility Costs
- f_jkβ: Annual office rent for facility j at scale k (small, medium, large).
- c_jβ: Cost for hosting one engineer at facility j.
2. Customer Requirements
- h_i: Annual number of services required for customer i(this is demand).
- d_ijβ: Cost per service for an engineer to travel between facility j and customer i.
3. Facility Capacity
- m_jkβ: Maximum number of engineers that can be hosted at facility j with scale k (small, medium, large).
- s: Number of services an engineer can complete per year.
4. Reachability
- a_ij: Reachability indicator for whether customer i can be served by facility j (1 if reachable, 0 otherwise).
Objective Function
The objective is to minimize cost, which can be divided into three main components, each with its own trade-offs as we discussed:
- Office Rent
- Travelling Cost
- Utility Cost
The goal is to minimize the sum of these costs. The first part represents the total office rent cost across all facility locations. The second part represents the total traveling cost for engineers to reach customers. In this second part, we use the cost to serve a customer from location j (dijβ), the number of services required by that customer (hiβ), and yij (a binary variable indicating if facility location j is assigned to that customer). The third part represents the cost of hosting engineers, which depends on the cost per engineer at location j (cjβ) and the number of engineers assigned there (wj).
Objective function Code
# 1. Office rent cost: sum(f_jk * x_jk)
office_rent_cost = pulp.lpSum([x_jk[(loc, scale)] * f_jk[(loc, scale)] for loc, scale in f_jk])
# 2. Traveling cost: sum(h_i * d_ij * y_ij)
traveling_cost = pulp.lpSum([h_i[cust] * d_ij[cust][loc] * y_ij[(cust, loc)]
for cust in h_i for loc in d_ij[cust]])
# 3. Engineer hosting cost: sum(c_j * w_j)
engineer_hosting_cost = pulp.lpSum([c_j[loc] * w_j[loc] for loc in c_j])
# Total cost (objective function)
total_cost = office_rent_cost + traveling_cost + engineer_hosting_cost
# Add the objective function to the problem
problem += total_cost, "Minimize Total Cost"
Constraints and Rules
- Facility Scale Constraint (At Most One Scale)
- This constraint ensures that at each location j, only one facility scale (Small, Medium, or Large) can be chosen. If no scale is chosen, it indicates that the facility is not built.
- Purpose: This is an at most constraint that limits the number of scale levels selected per location. It helps in deciding whether to open a facility and, if so, at which scale.
2. Customer Service Constraint (Linking Constraint)
For each customer i, they can only be served by a facility at location j if that facility has been built. This is a linking constraint. It connects the customer assignment variable yijyijβ with the facility decision variable xjkxjkβ. This constraint enforces that customers can only be assigned to locations where a facility is present.
3. Customer Assignment Constraint
Each customer i must be assigned to exactly one facility that is close enough according to the reachability parameter Aij.
4. Facility Capacity Constraint
The number of engineers wjβ allocated to a facility at location jj must not exceed the facilityβs capacity. This constraint limits the number of engineers at each facility based on the scale chosen for that facility. It prevents over-allocation of resources, ensuring each facility operates within its designed capacity.
5. Service Requirement Constraint
The number of engineers allocated at location j, multiplied by the number of services they can complete in a year, must meet or exceed the total service demand assigned to that facility. This constraint ensures that the facility has enough engineers to handle the customer demand.
Code to implement constraints
# Constraint 1: At most one scale level for each location
for loc in c_j:
problem += pulp.lpSum([x_jk[(loc, scale)] for scale in ['Small', 'Medium', 'Large']]) <= 1, f"Max_One_Scale_{loc}"
# Constraint 2: Only a built facility may serve customers
for cust in h_i:
for loc in a_ij[cust]:
problem += y_ij[(cust, loc)] <= pulp.lpSum([x_jk[(loc, scale)] for scale in ['Small', 'Medium', 'Large']]), f"Serve_Only_Built_{cust}_{loc}"
# Constraint 3: Each customer must be served by one facility
for cust in h_i:
problem += pulp.lpSum([a_ij[cust][loc] * y_ij[(cust, loc)] for loc in a_ij[cust]]) == 1, f"Serve_Customer_{cust}"
# Ensures that the number of engineers allocated to a facility does not exceed its capacity based on the chosen scale.
for loc in m_jk:
problem += w_j[loc] <= pulp.lpSum([m_jk[loc][scale] * x_jk[(loc, scale)] for scale in ['Small', 'Medium', 'Large']]), f"Capacity_Constraint_{loc}"
# Ensures that the allocated engineers can fulfill the service demands assigned to a facility.
for loc in c_j:
problem += s * w_j[loc] >= pulp.lpSum([h_i[cust] * y_ij[(cust, loc)] for cust in h_i]), f"Service_Requirement_Constraint_{loc}"
# Solve the problem
problem.solve()
Output and Optimization Result
In the figure above, we see the outcome after running the optimization model. Each row represents a potential facility location, showing which ones end up operating (and at what scale), as well as how many engineers and customers are allocated there. For example, Boston is chosen at a Small scale with 10 engineers assigned, serving enough customers to deliver 1,900 services. In contrast, some locations, like New York and Philadelphia, are not selected to host any facilities β indicating that shutting them down or not expanding them further is more cost-effective. This reassignment of customers ensures demand is still met, but now with a lower total cost.
Moreover, the modelβs result clarifies how many engineers the company should hire or retain for the coming year. Since we know exactly how many engineers each chosen location needs, the management can make informed hiring decisions, rather than guessing or overstaffing. Ultimately, this data-driven approach aligns facility choices, workforce planning, and customer assignments to minimize overall expenses while still ensuring that the companyβs service obligations are fulfilled.
Code for report
import pandas as pd
# List to store results for each location
results = []
for loc in w_j:
engineers_allocated = int(w_j[loc].varValue) if w_j[loc].varValue is not None else 0
customers_assigned = sum([y_ij[(cust, loc)].varValue for cust in h_i if (cust, loc) in y_ij])
services_assigned = sum([h_i[cust] * y_ij[(cust, loc)].varValue for cust in h_i if (cust, loc) in y_ij])
chosen_scale = "None" # Default value
for scale in ['Small', 'Medium', 'Large']:
if (loc, scale) in x_jk and x_jk[(loc, scale)].varValue == 1:
chosen_scale = scale.capitalize()
break
# Append results to the list
results.append([loc, chosen_scale, engineers_allocated, int(customers_assigned), int(services_assigned)])
results_df = pd.DataFrame(results, columns=["Location", "Scale", "Engineers Allocated", "Customers Assigned", "Services Assigned"])
# Calculate totals and create a DataFrame for the totals row
totals = pd.DataFrame({
"Location": ["Total"],
"Scale": [f"Large {sum(results_df['Scale'] == 'Large')}, Small {sum(results_df['Scale'] == 'Small')}"],
"Engineers Allocated": [results_df["Engineers Allocated"].sum()],
"Customers Assigned": [results_df["Customers Assigned"].sum()],
"Services Assigned": [results_df["Services Assigned"].sum()]
})
results_df = pd.concat([results_df, totals], ignore_index=True)
# Print the results in table form
print(results_df)
Complete code on my repo: https://github.com/saifkheraj/d_s/blob/main/operationsresearch/facility_location.ipynb
References
Facility location problems – Mathematical Optimization: Solving Problems using Gurobi and Python
Todo Adapt figures, check maths Computational experiment comparing formulations Adapt kmedian – seems to be stillβ¦
scipbook.readthedocs.io
Facility Location Problem – Gurobi Optimization
Facility location problems can be commonly found in many industries, including logistics and telecommunications. Inβ¦
www.gurobi.com
Facility location problem – Wikipedia
From Wikipedia, the free encyclopedia A facility location problem is the problem of deciding where a given publicβ¦
en.wikipedia.org
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming aΒ sponsor.
Published via Towards AI