Solving Facility Location

Last Updated on December 15, 2024 by Editorial Team

Author(s): Saif Ali Kheraj

Originally published on Towards AI.

We will start with one important use case to solve a facility location problem. Suppose we have a service-based company named MMM that sells different products to customers across various regions. Every year, MMM faces tough decisions about where to set up new offices and how many engineers to host in each office. The goal is to make these choices in a way that reduces service costs, office rent, and utilities expenses while still meeting all customer demands. By using data science and prescriptive analytics, MMM can carefully analyze costs, capacities, and customer locations to find the best facility setup that saves money and delivers services efficiently.

Figure by Author: Locations where company currently operate

Figure by Author: Cost Breakup by location

For example, when the CEO sees these costs, he may want to optimize them, reducing total expenses while still meeting all demand constraints and making other necessary decisions.

Introduction

Service Cost: This is the direct cost associated with providing after-sales services, such as labor, maintenance, and support services.

New York has the highest service cost at $12,500,000. This is likely due to higher wages and operational costs in a metropolitan area where skilled labor demands a premium.
Other cities, like Baltimore and Richmond, have significantly lower service costs because of lower wage scales and potentially less demand for specialized services.

Office Rent: This cost reflects the expense of leasing office space needed to support operations. New York again stands out with an office rent of $45,000,000, highlighting the high real estate costs in urban centers, particularly in prime locations. Boston and Washington D.C. also show considerable office rent, but significantly lower than New York. In contrast, cities like Virginia Beach and Raleigh have much lower rent costs, making them more affordable locations for businesses.

Utility Fee: Utility costs encompass expenses for electricity, water, heating, and other essential services required to maintain the facility.

City Variations:
The utility fees are generally lower compared to office rents, but still reflect the local cost of utilities. For example, New York has a utility fee of $10,000,000, while Virginia Beach has the lowest at $200,000, indicating that utility costs vary with local energy rates and usage levels.

As a manager or CEO, one must find ways to reduce these high costs. Maybe closing a certain location would save on office rent, but then the service costs might increase. Thus, there’s a natural trade-off between these two factors.

If we have fewer facilities then we will have higher service cost
If we have more facilities, then we will have lower service cost.

Not only that, but there are many other factors involved. For example, if we close one facility, the customers using that facility need to be reassigned elsewhere, and engineers must also be reassigned. The situation also depends on an environment that keeps changing. For instance, in the future a customer might need fewer services, or overall demand could shift. When these changes occur, we simply update our input data to reflect the new conditions.

Mathematical Optimization Process

Main goal is to minimize the total operational cost of setting up and maintaining facilities to serve customers across various locations. This total cost includes:

Office Rent Costs for facilities at different scale levels (small, medium, large).
Traveling Costs for engineers to serve customers from facilities.
Engineer Hosting Costs at each facility, representing the cost of maintaining engineers at these locations.

Decision Variables

Decision Variable depends on:

Facility Presence: Whether to establish a facility in a given location.
Facility Scale: The size and capacity of the facility if established.
Engineer Allocation: How many engineers will be assigned to each facility.
Customer Assignment: Which customers are served by which facilities.

Facility Location and Scale Decision (x_jk): Whether to establish a facility at a particular location j and scale level k. If yes, this variable will be set to 1, otherwise 0.

Figure by Author: Explaining assignment or selection of scale within location

2. Customer Assignment Decision (y_ij): Whether a customer i is assigned to be served by a facility at location j. This binary variable is 1 if the customer is assigned to the facility, and 0 otherwise. We will use linking constraint to ensure a customer i can only be assigned to a facility location j if that facility has been established.

3. Engineer Allocation Decision (w_j): The number of engineers allocated to each facility location j. In the constraints part, we will add capacity constraint as well.

Code to define Decision Variables

import pulp

# Step 1: Initialize the LP problem
problem = pulp.LpProblem("Facility_Location_Problem", pulp.LpMinimize)

# Step 2: Define Decision Variables
# Binary decision variable x_jk: 1 if a facility j is built at scale k, 0 otherwise
x_jk = pulp.LpVariable.dicts("x_jk", ((loc, scale) for loc, scale in f_jk), cat='Binary')

# Binary decision variable y_ij: 1 if customer i is served by facility j, 0 otherwise
y_ij = pulp.LpVariable.dicts("y_ij", ((cust, loc) for cust in a_ij for loc in a_ij[cust]), cat='Binary')

# Engineer hosting variable w_j (number of engineers assigned to location j)
w_j = pulp.LpVariable.dicts("w_j", (loc for loc in c_j), lowBound=0, cat='Integer')

Before discussing about objective function, we need to discuss input parameters. These parameters provide the necessary data on which the optimization is based, such as costs, capacities, and demands.

Key Input Parameters

Facility-Scale Rent Cost (f_jk):

Annual rent cost for each facility location j at scale level k (e.g., small, medium, large).
This will represent the cost component in the objective function related to choosing a facility and its scale.

Figure by Author: All Possible Rent available based on Location and scale (Note: The data shown is partial; the actual dataset contains additional rows and details.)

As shown in the diagram, we have various scale options for rent, each offering a different combination of rent.

Customer Service Demand (h_i):

The annual number of services required for each customer i.
This parameter determines the amount of service demand that needs to be met by the facilities.

Figure by Author: Annual Demand by customers (Note: The data shown is partial; the actual dataset contains additional rows and details.)

Travel Cost (d_ij):

The cost per service for an engineer to travel from facility location j to customer i.
This parameter influences the travel cost component in the objective function, affecting which facilities serve which customers.

Figure by Author: Travel Cost (Note: The data shown is partial; the actual dataset contains additional rows and details.)

Engineer Hosting Cost ( cj):

The cost for hosting one engineer at each facility location j.
This component of the objective function will capture the expense of having engineers stationed at a particular facility.
Because labor costs vary from one location to another, we need to factor this into our decisions as well.

Figure by Author: Cost to host one engineer in a particular location (Note: The data shown is partial; the actual dataset contains additional rows and details.)

Maximum Number of Engineers (m_jk):

The maximum capacity (number of engineers) that can be hosted at facility j depending on its scale level k.
This parameter ensures that facilities are not over-allocated beyond their capacity.

Figure by Author: Maximum number of engineers that can be hosted based on location and scale (Note: The data shown is partial; the actual dataset contains additional rows and details.)

Service Capacity per Engineer (s):

The number of services that a single engineer can complete in a year.
This value is crucial for understanding how many engineers are required to meet the customer demand allocated to each facility.
Let us set s=200

Customer-Facility Reachability (a_ij):

A binary parameter that indicates whether a facility location j can serve customer i (1 if reachable, 0 otherwise).
This parameter adds practical constraints, restricting which facilities can realistically serve each customer.

Figure by Author: Where customer i can by reached by location j (Note: The data shown is partial; the actual dataset contains additional rows and details.)

This matrix provides:

Regional clustering for flexibility in customer assignment.
Coverage redundancy where multiple facilities can serve the same customer, allowing for reassignment if needed.

Code for loading input data

import pandas as pd

def clean_numeric(value):
 if isinstance(value, str):
 try:
 return float(value.replace(',', ''))
 except ValueError:
 return value 
 return value

# Load CSV files into pandas dataframes
cost_to_host_engineer_location = pd.read_csv('cost_to_host_engineer_location.csv', skipinitialspace=True)
customer_location_reachability = pd.read_csv('customer_location_reachability.csv', skipinitialspace=True)
customer_service_required = pd.read_csv('customer_service_required.csv', skipinitialspace=True)
location_scale_rent = pd.read_csv('location_scale_rent.csv', skipinitialspace=True)
location_to_customer_cost = pd.read_csv('location_to_customer_cost.csv', skipinitialspace=True)
max_num_engineer_hosted = pd.read_csv('max_num_engineer_hosted.csv', skipinitialspace=True)

# Apply the cleaning function to numeric columns only
cost_to_host_engineer_location['Cost for Hosting One Engineer'] = cost_to_host_engineer_location['Cost for Hosting One Engineer'].apply(clean_numeric)
location_scale_rent['Rent'] = location_scale_rent['Rent'].apply(clean_numeric)
numeric_cols = location_to_customer_cost.columns[1:] # Skip the first column (Customer)
location_to_customer_cost[numeric_cols] = location_to_customer_cost[numeric_cols].applymap(clean_numeric)

# Constant: Number of services an engineer can complete per year
s = 200 

# Display each dataframe to ensure they are loaded correctly
print("Cost to Host Engineer Location:\n", cost_to_host_engineer_location.head(), "\n")
print("Customer Location Reachability:\n", customer_location_reachability.head(), "\n")
print("Customer Service Required:\n", customer_service_required.head(), "\n")
print("Location Scale Rent:\n", location_scale_rent.head(), "\n")
print("Location to Customer Cost:\n", location_to_customer_cost.head(), "\n")
print("Max Number of Engineers Hosted:\n", max_num_engineer_hosted.head(), "\n")

# Convert each dataframe into a dictionary to use as parameters in the model

# 1. Cost to Host Engineer in each Location (c_j)
# c_j represents the annual cost for hosting one engineer at location j
c_j = dict(zip(cost_to_host_engineer_location['Location'], cost_to_host_engineer_location['Cost for Hosting One Engineer']))

# 2. Customer Location Reachability (a_ij)
# a_ij represents whether customer i can be served by facility j (1 if possible, 0 otherwise)
a_ij = customer_location_reachability.set_index('Customer').T.to_dict()

# 3. Customer Service Requirement (h_i)
# h_i represents the annual number of services required for customer i
h_i = dict(zip(customer_service_required['Customer'], customer_service_required['Annual # of Services Required']))

# 4. Location Scale Rent (f_jk)
# f_jk represents the annual office rent for facility j at scale k (small, medium, large)
f_jk = location_scale_rent.set_index(['Location', 'Scale'])['Rent'].to_dict()

# 5. Location to Customer Cost (d_ij)
# d_ij represents the cost per service for an engineer to travel between facility j and customer i
d_ij = location_to_customer_cost.set_index('Customer').T.to_dict()

# 6. Maximum Number of Engineers Hosted at a location (m_jk)
# m_jk represents the maximum number of engineers that can be hosted at location j with scale level k (small, medium, large)
m_jk = max_num_engineer_hosted.set_index('Location').T.to_dict()

print("Cost to Host Engineer Dictionary (c_j):\n", c_j, "\n")
print("Reachability Dictionary (a_ij):\n", a_ij, "\n")
print("Customer Service Requirement Dictionary (h_i):\n", h_i, "\n")
print("Location Scale Rent Dictionary (f_jk):\n", f_jk, "\n")
print("Location to Customer Cost Dictionary (d_ij):\n", d_ij, "\n")
print("Max Number of Engineers Hosted Dictionary (m_jk):\n", m_jk, "\n")

Summary of input Variables:

1. Facility Costs

f_jk: Annual office rent for facility j at scale k (small, medium, large).
c_j: Cost for hosting one engineer at facility j.

2. Customer Requirements

h_i: Annual number of services required for customer i(this is demand).
d_ij: Cost per service for an engineer to travel between facility j and customer i.

3. Facility Capacity

m_jk: Maximum number of engineers that can be hosted at facility j with scale k (small, medium, large).
s: Number of services an engineer can complete per year.

4. Reachability

a_ij: Reachability indicator for whether customer i can be served by facility j (1 if reachable, 0 otherwise).

Objective Function

The objective is to minimize cost, which can be divided into three main components, each with its own trade-offs as we discussed:

Office Rent
Travelling Cost
Utility Cost

Figure by Author: Objective Function equation to represent total cost

The goal is to minimize the sum of these costs. The first part represents the total office rent cost across all facility locations. The second part represents the total traveling cost for engineers to reach customers. In this second part, we use the cost to serve a customer from location j (dij), the number of services required by that customer (hi), and yij (a binary variable indicating if facility location j is assigned to that customer). The third part represents the cost of hosting engineers, which depends on the cost per engineer at location j (cj) and the number of engineers assigned there (wj).

Objective function Code


# 1. Office rent cost: sum(f_jk * x_jk)
office_rent_cost = pulp.lpSum([x_jk[(loc, scale)] * f_jk[(loc, scale)] for loc, scale in f_jk])

# 2. Traveling cost: sum(h_i * d_ij * y_ij)
traveling_cost = pulp.lpSum([h_i[cust] * d_ij[cust][loc] * y_ij[(cust, loc)]
 for cust in h_i for loc in d_ij[cust]])

# 3. Engineer hosting cost: sum(c_j * w_j)
engineer_hosting_cost = pulp.lpSum([c_j[loc] * w_j[loc] for loc in c_j])

# Total cost (objective function)
total_cost = office_rent_cost + traveling_cost + engineer_hosting_cost

# Add the objective function to the problem
problem += total_cost, "Minimize Total Cost"

Constraints and Rules

Facility Scale Constraint (At Most One Scale)

This constraint ensures that at each location j, only one facility scale (Small, Medium, or Large) can be chosen. If no scale is chosen, it indicates that the facility is not built.
Purpose: This is an at most constraint that limits the number of scale levels selected per location. It helps in deciding whether to open a facility and, if so, at which scale.

Figure by Author: At most one scale constraint

2. Customer Service Constraint (Linking Constraint)

For each customer i, they can only be served by a facility at location j if that facility has been built. This is a linking constraint. It connects the customer assignment variable yijyij with the facility decision variable xjkxjk. This constraint enforces that customers can only be assigned to locations where a facility is present.

Figure by Author: Customer Service Linking Constraint

3. Customer Assignment Constraint

Each customer i must be assigned to exactly one facility that is close enough according to the reachability parameter Aij.

Figure by Author: Customer Assignment Constraint

4. Facility Capacity Constraint

The number of engineers wj allocated to a facility at location jj must not exceed the facility’s capacity. This constraint limits the number of engineers at each facility based on the scale chosen for that facility. It prevents over-allocation of resources, ensuring each facility operates within its designed capacity.

Figure by Author: Facility Capacity Constraint

5. Service Requirement Constraint

The number of engineers allocated at location j, multiplied by the number of services they can complete in a year, must meet or exceed the total service demand assigned to that facility. This constraint ensures that the facility has enough engineers to handle the customer demand.

Figure by Author: Service Requirement Constraint

Code to implement constraints


# Constraint 1: At most one scale level for each location
for loc in c_j:
 problem += pulp.lpSum([x_jk[(loc, scale)] for scale in ['Small', 'Medium', 'Large']]) <= 1, f"Max_One_Scale_{loc}"

# Constraint 2: Only a built facility may serve customers
for cust in h_i:
 for loc in a_ij[cust]:
 problem += y_ij[(cust, loc)] <= pulp.lpSum([x_jk[(loc, scale)] for scale in ['Small', 'Medium', 'Large']]), f"Serve_Only_Built_{cust}_{loc}"

# Constraint 3: Each customer must be served by one facility
for cust in h_i:
 problem += pulp.lpSum([a_ij[cust][loc] * y_ij[(cust, loc)] for loc in a_ij[cust]]) == 1, f"Serve_Customer_{cust}"

# Ensures that the number of engineers allocated to a facility does not exceed its capacity based on the chosen scale.
for loc in m_jk:
 problem += w_j[loc] <= pulp.lpSum([m_jk[loc][scale] * x_jk[(loc, scale)] for scale in ['Small', 'Medium', 'Large']]), f"Capacity_Constraint_{loc}"

# Ensures that the allocated engineers can fulfill the service demands assigned to a facility.
for loc in c_j:
 problem += s * w_j[loc] >= pulp.lpSum([h_i[cust] * y_ij[(cust, loc)] for cust in h_i]), f"Service_Requirement_Constraint_{loc}"

# Solve the problem
problem.solve()

Output and Optimization Result

Figure by Author: Report after performing optimization

In the figure above, we see the outcome after running the optimization model. Each row represents a potential facility location, showing which ones end up operating (and at what scale), as well as how many engineers and customers are allocated there. For example, Boston is chosen at a Small scale with 10 engineers assigned, serving enough customers to deliver 1,900 services. In contrast, some locations, like New York and Philadelphia, are not selected to host any facilities — indicating that shutting them down or not expanding them further is more cost-effective. This reassignment of customers ensures demand is still met, but now with a lower total cost.

Moreover, the model’s result clarifies how many engineers the company should hire or retain for the coming year. Since we know exactly how many engineers each chosen location needs, the management can make informed hiring decisions, rather than guessing or overstaffing. Ultimately, this data-driven approach aligns facility choices, workforce planning, and customer assignments to minimize overall expenses while still ensuring that the company’s service obligations are fulfilled.

Code for report

import pandas as pd

# List to store results for each location
results = []

for loc in w_j:
 engineers_allocated = int(w_j[loc].varValue) if w_j[loc].varValue is not None else 0
 customers_assigned = sum([y_ij[(cust, loc)].varValue for cust in h_i if (cust, loc) in y_ij])
 services_assigned = sum([h_i[cust] * y_ij[(cust, loc)].varValue for cust in h_i if (cust, loc) in y_ij])

 chosen_scale = "None" # Default value
 for scale in ['Small', 'Medium', 'Large']:
 if (loc, scale) in x_jk and x_jk[(loc, scale)].varValue == 1:
 chosen_scale = scale.capitalize()
 break

 # Append results to the list
 results.append([loc, chosen_scale, engineers_allocated, int(customers_assigned), int(services_assigned)])

results_df = pd.DataFrame(results, columns=["Location", "Scale", "Engineers Allocated", "Customers Assigned", "Services Assigned"])

# Calculate totals and create a DataFrame for the totals row
totals = pd.DataFrame({
 "Location": ["Total"],
 "Scale": [f"Large {sum(results_df['Scale'] == 'Large')}, Small {sum(results_df['Scale'] == 'Small')}"],
 "Engineers Allocated": [results_df["Engineers Allocated"].sum()],
 "Customers Assigned": [results_df["Customers Assigned"].sum()],
 "Services Assigned": [results_df["Services Assigned"].sum()]
})


results_df = pd.concat([results_df, totals], ignore_index=True)

# Print the results in table form
print(results_df)

Complete code on my repo: https://github.com/saifkheraj/d_s/blob/main/operationsresearch/facility_location.ipynb

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Frequently Used, Contextual References

Resources

Publication

Solving Facility Location

Author(s): Saif Ali Kheraj

Introduction

Decision Variables

Code to define Decision Variables

Key Input Parameters

Code for loading input data

Summary of input Variables:

Objective Function

Objective function Code

Constraints and Rules

Code to implement constraints

Output and Optimization Result

Code for report

References

Facility location problems – Mathematical Optimization: Solving Problems using Gurobi and Python

Todo Adapt figures, check maths Computational experiment comparing formulations Adapt kmedian – seems to be still…

Facility Location Problem – Gurobi Optimization

Facility location problems can be commonly found in many industries, including logistics and telecommunications. In…

Facility location problem – Wikipedia

From Wikipedia, the free encyclopedia A facility location problem is the problem of deciding where a given public…

Related posts

Feedback ↓ Cancel reply

Popular posts

Updates

Recent Posts

The World’s Leading AI and Technology Publication.

Company

CONTACT US

GDPR CCPA Statement