Geolocation Data Analysis of Lagos.

Last Updated on July 20, 2023 by Editorial Team

Author(s): Lawrence Alaso Krukrubo

Originally published on Towards AI.

Using EDA and Machine Learning to find prime office locations in Lagos

The problem we’re going to solve, using Geolocation data analysis and Machine Learning, is helping a new Tech Start-Up find the ideal location for its office in the city of Lagos, Nigeria.

Geolocation Data Analysis | Img_Credit

A. Introduction

A.1 Background:

I’ve worked in Lagos off and on; the longest stretch was three years, from 2006 to 2009. No doubt, Lagos is an amazing place. I call it the real School-of-Hard-Knocks… a place where competition is fiercest and survival is for the fittest.

I’ve done some traveling across Europe and Africa. My last destination was Mauritius, where I went primarily to walk free with the lions at Casela… By the way, Mauritius is the wealthiest country in Africa, with average wealth per person increasing from $21,700 to $25,700 within one year recently (link).

This article explores Lagos, which is the 4th wealthiest city in Africa, behind Johannesburg, Cairo and Cape Town (link).
Lagos is home to over 6,800 dollar millionaires, 370 multi-millionaires and 4 billionaires.
It is also the commercial capital of Nigeria (the most populous black nation) and the commercial hub of West Africa.
I have only recently returned to Lagos, enrolled in and graduated from The Founder Institute accelerator program, and I am currently building an InsureTech Start-up with a Co-founder.
Therefore, I want to use this opportunity to help future Start-ups find the ideal office location in Lagos.

Amongst other attributes, the ideal office location for a Start-up should have:

  1. Proximity to Tech-Hubs and talents.
  2. High foot traffic for easy interaction with potential customers.
  3. Proximity to educational institutions for research and development.
  4. Nearness to cafes and restaurants for business meetings or lunch-meets.
  5. Nearness to bus-stops, seaports, and airports.
  6. Security and safety.
  7. A cluster of economic activities and complementing businesses.

A.2 Data Description

The Data | Img_credit

The Dataset is the Wikipedia page of Lagos state, see the link.

We shall explore Lagos city through its respective Local Government Areas (LGAs), or Boroughs. The above link is a web page that lists the LGAs in Lagos State and the population of each.
This data will be analyzed through the following steps:-

  1. We shall scrape the web page using the Beautiful Soup library.
  2. We shall use Foursquare API calls to retrieve geolocation data.
  3. We shall fetch the text data using the requests library.
  4. We shall convert it from JSON to a Pandas DataFrame using the json_normalize function.
  5. We shall use the folium library to render the maps, alongside the Matplotlib library for plotting.
  6. We shall cluster the venue categories per LGA using the KMeans algorithm.
  7. Then we’ll explore the respective LGAs to find the top LGA for a Start-up to site an office.
  8. Finally, to add some fun, we shall use the word-cloud library to display the names of the top categories of venues in Lagos.

B. Methodology

The Methodology | Img_credit

First, let’s import the required libraries.

from bs4 import BeautifulSoup
import requests  # library to handle requests.
import pandas as pd
import json  # library to handle JSON files.
# transform JSON files to pandas DataFrames
# (pandas.io.json.json_normalize is deprecated since pandas 1.0).
from pandas import json_normalize
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values.
# Matplotlib and associated plotting modules.
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means for clustering stage.
from sklearn.cluster import KMeans
# !conda install -c conda-forge folium=0.5.0 --yes
import folium  # map rendering library
import numpy as np
import csv
print('All modules imported')

B.1 Scraping The Web Page Data:

Let’s save the weblink for The Lagos Data.

lagos_link = 'https://en.wikipedia.org/wiki/List_of_Lagos_State_local_government_areas_by_population'

Get the source code HTML data from the website.

source = requests.get(lagos_link).text

Let’s use BeautifulSoup to parse it.

soup = BeautifulSoup(source, 'lxml')
# Let's pretty-print it; prettify() prints the web page with the tag hierarchy maintained.
print(soup.prettify())

Next, let’s get the table that contains the data we want to scrape from.

my_table = soup.find_all('td')
# The table data sits between the <td> tags of the web page.

Next, let’s iterate through each cell of my_table and append its text.

table_text = []
for data in my_table:
    table_text.append(data.text)

So let’s extract only the relevant data from the table.

relevant_table_data = table_text[4:-3]
# The relevant data runs from index 4 (the fifth element) up to, but not including, the last three elements of table_text.
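To make the scraping steps above concrete, here is a minimal sketch that runs the same find_all('td') logic on a tiny inline HTML table instead of the live Wikipedia page. The HTML string and its values are made up for illustration, and the sketch uses Python's built-in 'html.parser' backend rather than 'lxml' so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page source.
sample_html = """
<table>
  <tr><th>Rank</th><th>LGA</th><th>Population</th></tr>
  <tr><td>1</td><td>Alpha</td><td>1,200</td></tr>
  <tr><td>2</td><td>Beta</td><td>950</td></tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# Header cells are <th>, so find_all('td') returns only the data cells.
sample_cells = [cell.text for cell in sample_soup.find_all("td")]
print(sample_cells)
```

On a real page, find_all('td') also picks up cells from any other tables, which is presumably why the list is sliced before use.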

B.2 Creating The Lagos DataFrame:

Img_Credit

First, Let's create a dictionary and append the LGA and corresponding Population data to it.

table_dict = {'LGA': [], 'POP': []}
count = 0

for item in relevant_table_data:
    # First let's strip off the \n at the end.
    item = item.strip('\n')
    try:
        # A cell that converts cleanly to an int is the rank column, which we skip.
        item = int(item)
    except ValueError:
        # if second item after the int, append to POP.
        if count > 0:
            # let's remove the commas.
            item = item.replace(',', '')
            # Next let's convert to an integer so we can use it for calculations.
            item = int(item)
            # Finally, let's append it to the Population list of the dictionary.
            table_dict['POP'].append(item)
            count = 0
        else:
            # if first item after the int, append to LGA.
            table_dict['LGA'].append(item)
            count += 1
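The loop above appears to tell rank, name and population apart by whether int() succeeds, which only works while every population figure contains a comma. A sketch of an alternative, assuming each table row contributes exactly three cells in the order rank, name, population, is to step through the flat list in threes. The helper name rows_from_cells and the sample values are hypothetical.

```python
def rows_from_cells(cells, width=3):
    """Group a flat list of table cells into rows of `width` cells each."""
    cleaned = [cell.strip() for cell in cells]
    return [cleaned[i:i + width] for i in range(0, len(cleaned), width)]


# Made-up cells in rank / name / population order.
sample_cells = ["1\n", "Alpha\n", "1,200\n", "2\n", "Beta\n", "950\n"]

sample_dict = {"LGA": [], "POP": []}
for rank, name, population in rows_from_cells(sample_cells):
    sample_dict["LGA"].append(name)
    # Remove thousands separators before converting to an integer.
    sample_dict["POP"].append(int(population.replace(",", "")))

print(sample_dict)
```

Note that "950" has no comma, so the int()-based test would have mistaken it for a rank; grouping by position avoids that, at the cost of assuming a fixed row width.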

Let’s create a DataFrame of each LGA and its respective Population from the table_dict dictionary above.

lagos_df = pd.DataFrame(table_dict)
# Let's see the first five rows.
lagos_df.head()
The first five rows of Lagos_df

Appending the Latitude and Longitude Values:
Let’s define a simple method to extract each LGA Lat and Long data.

import time

def latitude_longitude(Borough):
    """Take an LGA name and return a list of its latitude and
    corresponding longitude, using the geopy library.
    This method also prints out the coordinate data."""
    address = str(Borough)
    # We must define a geolocator user agent.
    geolocator = Nominatim(user_agent="NG_explorer")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    print('The geographical coordinates of {} are lat {} and long {}.'.format(address, latitude, longitude))
    # We let 2 secs pass after each lookup so that the geocoding service is not called too quickly.
    time.sleep(2)
    return [latitude, longitude]

Let’s append the latitude and longitude data as a column using apply

lagos_df['latitude'] = lagos_df['LGA'].apply(latitude_longitude)

Let’s see the DataFrame again.

The latitude column contains a list of both latitude and longitude values, so let’s loop through the DataFrame and separate these into two columns.

lon_list = []
for i, j in lagos_df.iterrows():
    lon_list.append(j.latitude[1])
    # Column 2 is 'latitude'; overwrite the list with the latitude value alone.
    lagos_df.iat[i, 2] = j.latitude[0]

# Next let's assign lon_list as the value of the longitude column.
lagos_df['longitude'] = lon_list

Let’s see the first five rows again.

Now both latitude and longitude are distinct columns of numbers, not list items.
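As a possible alternative to looping with iterrows, the list column can be expanded in one step. This is only a sketch on made-up coordinates; it assumes every entry is a two-item [latitude, longitude] list, as returned by the helper above.

```python
import pandas as pd

# Made-up coordinates, only to show the reshaping.
sample_df = pd.DataFrame({
    "LGA": ["Alpha", "Beta"],
    "latitude": [[6.5, 3.3], [6.6, 3.4]],  # each entry is [lat, lon]
})

# Expand the lists into a two-column frame aligned on the same index.
coords = pd.DataFrame(
    sample_df["latitude"].tolist(),
    columns=["latitude", "longitude"],
    index=sample_df.index,
)

# Overwrite the list column and add the new longitude column.
sample_df["latitude"] = coords["latitude"]
sample_df["longitude"] = coords["longitude"]

print(sample_df)
```

This avoids positional indexing with iat, so it keeps working if the column order of the DataFrame changes.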

B.3 Visualizing Lagos State LGAs:

Map of Lagos State, showing the LGAs in blue-red circles.

Let’s Create the above map of Lagos State with LGAs super-imposed on top.

address = 'Lagos'
geolocator = Nominatim(user_agent="NG_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# Render map using the Folium library.
map_lagos_state = folium.Map(location=[latitude, longitude], zoom_start=10)
# Add markers to the map.
for lat, lng, LGA in zip(lagos_df['latitude'], lagos_df['longitude'], lagos_df['LGA']):
    label = '{}'.format(LGA)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng], radius=6, popup=label, color='blue',
        fill=True, fill_color='red', fill_opacity=0.7,
        parse_html=False).add_to(map_lagos_state)
map_lagos_state

B.4 Using The Foursquare API calls:

We shall use the Foursquare API calls to retrieve geolocation info about the venues in each LGA. First, we sign up at https://developer.foursquare.com/ and create a free account complete with CLIENT_ID and CLIENT_SECRET parameters for making valid API calls.

Next, let’s make a copy of the original lagos_df DataFrame, but this time we will set its index to be the names of The LGAs to make indexing easier.

# First let's make a copy of lagos_df.
copy_lagos_df = lagos_df.copy(deep=True)
# deep=True means if we make adjustments to this copy, the original won't be affected and vice versa.
# Then let's make LGAs the index.
copy_lagos_df.index = copy_lagos_df.LGA
# Let's drop the LGA column.
copy_lagos_df = copy_lagos_df.drop('LGA', axis=1)
# Let's view the effect.
copy_lagos_df.head()
copy_lagos_df DataFrame with LGA as index items.

The next thing we need to do is create a method called return_venues. It takes each LGA address and converts it to a latitude and longitude list using the Nominatim module. Then it uses the Foursquare API calls to retrieve the top 200 categories of venues available in a 10 km radius around each LGA.
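The body of return_venues is not shown in this article, so the following is only a sketch of the two pieces such a method would need: building the request URL and pulling venue names and categories out of the JSON reply. It assumes the legacy Foursquare v2 venues/explore endpoint and its response layout; Foursquare's API has changed since, so treat the endpoint, the parameter names, the version date and the helper names build_explore_url and extract_venues as assumptions. No network request is sent here.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, lat, lng,
                      radius=10000, limit=200, version="20180605"):
    """Build a legacy v2 'venues/explore' URL (assumed endpoint)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,        # API version date, an arbitrary example value
        "ll": f"{lat},{lng}",
        "radius": radius,    # metres, so 10000 is the 10 km from the text
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


def extract_venues(payload):
    """Pull (name, category) pairs out of an explore-style JSON payload."""
    items = payload["response"]["groups"][0]["items"]
    return [(item["venue"]["name"], item["venue"]["categories"][0]["name"])
            for item in items]


# Hand-written payload mimicking the assumed response shape.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Sample Cafe", "categories": [{"name": "Cafe"}]}},
    {"venue": {"name": "Sample Hotel", "categories": [{"name": "Hotel"}]}},
]}]}}

url = build_explore_url("MY_ID", "MY_SECRET", 6.5, 3.3)
print(url)
print(extract_venues(sample_payload))
```

In a full version, the URL would be fetched with requests.get(url).json() and the items flattened with json_normalize, which is where column names such as 'venue.categories' come from.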

We repeat for each LGA and save the unique list of venue categories to a variable, using a simple method called store_venue_categories, defined below.

def store_venue_categories(df):
    venue_set = set()
    for lga in df.index:
        df_lga = return_venues(lga)
        # Converting each category name to lower case using a list comprehension.
        df_lga['venue.categories'] = [x.lower() for x in list(df_lga['venue.categories'])]
        # Add each converted name to a set to avoid any possible duplicates.
        for category in df_lga['venue.categories']:
            venue_set.add(category)

    # Finally return a list of all venue categories in our DataFrame.
    return list(venue_set)

Next, we pass the copy_lagos_df DataFrame as a parameter to the method above and assign the result to a variable that stores all existing venue categories in Lagos State.

all_venue_categories = store_venue_categories(copy_lagos_df)
# Let's see how many venue categories we have altogether.
print(len(all_venue_categories))
>>
101
# so we have 101 different categories of venues altogether.

C. The Results

The Results… | Img_credit

C.1 Clustering Lagos State Venues Using the KMeans algorithm:

Let’s create about 5 Clusters from Lagos State and choose the most viable cluster in terms of available venue categories. Then out of this cluster, we shall select the top LGA based on certain parameters we would define.

First, we create a method called getNearbyVenues, that returns a DataFrame consisting of every venue detail in each LGA, all attached together. We pass this method the list of each LGA name and corresponding latitude and longitude data as parameters and save the output Dataframe in a variable called lagos_state_venues.

lagos_state_venues = getNearbyVenues(names=lagos_df['LGA'], latitudes=lagos_df['latitude'], longitudes=lagos_df['longitude'])
# Let's see the shape and first ten rows of lagos_state_venues
Showing first 5 and last 5 rows of all 1089 returned venues in Lagos State across all LGAs.

Next, we one-hot encode the venue_category column of the above DataFrame, and then we group by LGA, taking the mean frequency occurrence of each venue category per LGA.

lagos_grouped = lagos_onehot.groupby('LGA').mean().reset_index()
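The one-hot encoding step is not shown above, so here is a minimal sketch of how it might look with pd.get_dummies, followed by the same groupby and mean. The sample venues and the column names LGA and venue_category are assumptions made for illustration.

```python
import pandas as pd

# Made-up venues; the column names here are assumptions.
sample_venues = pd.DataFrame({
    "LGA": ["Alpha", "Alpha", "Alpha", "Beta"],
    "venue_category": ["hotel", "bar", "hotel", "bar"],
})

# One column per category, holding 1 where the venue belongs to it.
sample_onehot = pd.get_dummies(sample_venues["venue_category"], dtype=int)
sample_onehot.insert(0, "LGA", sample_venues["LGA"])

# The mean of the 0/1 columns is each category's share of an LGA's venues.
sample_grouped = sample_onehot.groupby("LGA").mean().reset_index()
print(sample_grouped)
```

Because each venue contributes a single 1, every row of the grouped frame sums to 1, which puts LGAs with many venues and LGAs with few on the same scale for clustering.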

With this grouped DataFrame showing the mean occurrence of each venue category per LGA, we would create our clustering algorithm.

# set number of clusters
kclusters = 5
lagos_grouped_clustering = lagos_grouped.drop('LGA', axis=1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(lagos_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_
The various clusters of Lagos venues. Clearly the cluster of red circles is the dominant or ideal cluster.
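To illustrate what the clustering step does with the grouped frequencies, here is a small sketch on made-up data with two obvious groups. The LGA names and frequencies are invented, and n_init is set explicitly because its default has changed across scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up category frequencies per LGA (each row sums to 1).
sample_grouped = pd.DataFrame({
    "LGA": ["Alpha", "Beta", "Gamma", "Delta"],
    "bar": [0.9, 0.8, 0.1, 0.2],
    "hotel": [0.1, 0.2, 0.9, 0.8],
})

# Drop the name column so only numeric features reach the model.
features = sample_grouped.drop(columns="LGA")

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# Attach each row's cluster label back to its LGA.
sample_grouped["Cluster Labels"] = model.labels_
print(sample_grouped[["LGA", "Cluster Labels"]])
```

The label numbers themselves are arbitrary; what matters is which LGAs share a label.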

Let’s inspect the clusters by checking the shape of each one.

cluster1 = lagos_merged[lagos_merged['Cluster Labels'] == 0]
cluster2 = lagos_merged[lagos_merged['Cluster Labels'] == 1]
cluster3 = lagos_merged[lagos_merged['Cluster Labels'] == 2]
cluster4 = lagos_merged[lagos_merged['Cluster Labels'] == 3]
cluster5 = lagos_merged[lagos_merged['Cluster Labels'] == 4]
for i in range(5):
    x = lagos_merged[lagos_merged['Cluster Labels'] == i]
    print('Cluster' + str(i + 1) + ' shape is {}'.format(x.shape))
>>
Cluster1 shape is (16, 14)
Cluster2 shape is (1, 14)
Cluster3 shape is (1, 14)
Cluster4 shape is (1, 14)
Cluster5 shape is (1, 14)
# cluster1 is the dominant cluster with 16 LGAs

Let’s see the LGAs listed in cluster1:-

cluster1_lgas = list(cluster1['LGA'])
print(cluster1_lgas)
>>
['Alimosho', 'Ajeromi-Ifelodun', 'Kosofe', 'Mushin, Lagos', 'Oshodi-Isolo', 'Ikorodu', 'Surulere', 'Agege', 'Ifako-Ijaiye', 'Somolu', 'Amuwo-Odofin', 'Lagos Mainland', 'Ikeja', 'Eti-Osa', 'Apapa', 'Lagos Island']

C.2 Finding The Most-Ideal LGAs to Site an Office in Lagos:

Let’s create a master DataFrame with all venue categories as the index and the names of each LGA as the column headers.

It will make sense to group similar venue categories into the same list. For example ‘eateries’ and ‘fast-food-restaurants’ should be on one list.

super_list = [continental_restaurants, eateries, coffee_shops, gas_stations,
              ice_cream_n_confectionery, hotels_resorts_spas, bars_n_lounges,
              airport, bus_stations, heliport, it_services_n_hubs,
              shopping_malls_n_stores, gym_sports_facilities_games, markets,
              fashion_n_clothing, arts_studios_galleries, convention_centers,
              halls_events_venues, night_clubs, beach, parks, cinemas,
              auto_services, electronics_shop, residences, other_services]

Let’s confirm how many list categories we now have:

len(super_list)
>>
26

# We have reduced 101 categories to 26 categories of similar venues
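The individual lists inside super_list are not shown here, so the following sketch only illustrates the idea of folding raw venue categories into broader groups. The group memberships are hypothetical, and falling back to other_services for unknown categories is an assumption.

```python
# Hypothetical groupings; the lists used in the analysis are longer.
category_groups = {
    "eateries": ["fast food restaurant", "food court"],
    "coffee_shops": ["cafe", "coffee shop"],
    "hotels_resorts_spas": ["hotel", "resort", "spa"],
}

# Invert the mapping so each raw category points at its broad group.
group_of = {raw: group
            for group, raws in category_groups.items()
            for raw in raws}


def broad_category(raw_name, default="other_services"):
    """Return the broad group for a raw venue category name."""
    return group_of.get(raw_name.lower(), default)


print(broad_category("Hotel"))
print(broad_category("Car Wash"))
```

Lower-casing the input mirrors the earlier step where every category name was converted to lower case before being stored.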

So Let’s continue creating a master DataFrame with the venue categories as the index and the names of LGAs as the column headers.

# Let's make the venue categories the index of the new DataFrame
data_index = ['continental_restaurants', 'eateries', 'coffee_shops', 'gas_stations',
              'ice_cream_n_confectionery', 'hotels_resorts_spas', 'bars_n_lounges',
              'airport', 'bus_stations', 'heliport', 'it_services_n_hubs',
              'shopping_malls_n_stores', 'gym_sports_facilities_games', 'markets',
              'fashion_n_clothing', 'arts_studios_galleries', 'convention_centers',
              'halls_events_venues', 'night_clubs', 'beach', 'parks', 'cinemas',
              'auto_services', 'electronics_shop', 'residences', 'other_services']
# Let's create a list of LGAs to be the columns of our new DataFrame
data_columns = list(copy_lagos_df.index)
# Let's create a new DataFrame
summary_df = pd.DataFrame(index=data_index, columns=data_columns)
# Next let's replace any possible NaN values with 0
summary_df.fillna(0, inplace=True)
# Let's see the first 5 rows
summary_df.head()
summary_df DataFrame with venue categories as index and LGA names as columns.

Now let’s create a method called update_lga_category_values that iterates through the venue categories of each LGA and adds the number of venues in each category, per LGA, to the summary_df DataFrame above.

# Let's update the summary_df DataFrame with the number of venues per category, per LGA, using the update_lga_category_values method described above.
summary_df = update_lga_category_values(copy_lagos_df, summary_df)
# Let's see the first five rows of the updated summary_df DataFrame
summary_df.head()
Showing the first five rows of number of venue categories per LGA
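The body of update_lga_category_values is not shown in this article. As a rough sketch of the counting it describes, the hypothetical helper below tallies venues per broad category and LGA into a zero-filled DataFrame. The sample venues, the group_of mapping and the column names are all made up.

```python
import pandas as pd


def count_by_group(venues, group_of, groups, lgas, default="other_services"):
    """Count venues per broad group (rows) and LGA (columns)."""
    counts = pd.DataFrame(0, index=groups, columns=lgas)
    for lga, raw in zip(venues["LGA"], venues["venue_category"]):
        # Unknown raw categories fall into the default group.
        counts.loc[group_of.get(raw, default), lga] += 1
    return counts


# Made-up venues and raw-to-group mapping.
sample_venues = pd.DataFrame({
    "LGA": ["Alpha", "Alpha", "Beta", "Beta"],
    "venue_category": ["hotel", "spa", "hotel", "car wash"],
})
group_of = {"hotel": "hotels_resorts_spas", "spa": "hotels_resorts_spas"}

sample_summary = count_by_group(
    sample_venues, group_of,
    groups=["hotels_resorts_spas", "other_services"],
    lgas=["Alpha", "Beta"],
)
print(sample_summary)
```

Starting from a frame filled with integer zeros keeps the counts numeric, so later sums and comparisons against 0 behave as expected.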

Now, out of these 26 broad categories of venues per LGA, let’s select our top 20 categories and sort the LGAs, to see which one has the most widespread categories of venues and the highest number of venues combined. We shall save these in a list called top_criteria, seen below.

top_criteria = ['hotels_resorts_spas', 'airport', 'it_services_n_hubs',
                'gym_sports_facilities_games', 'shopping_malls_n_stores',
                'coffee_shops', 'markets', 'bars_n_lounges', 'arts_studios_galleries',
                'halls_events_venues', 'continental_restaurants', 'eateries',
                'gas_stations', 'cinemas', 'residences', 'auto_services',
                'convention_centers', 'parks', 'electronics_shop', 'night_clubs']
# Let's slice out these top-criteria only to a new DataFrame
top_criteria_df = summary_df.loc[top_criteria, :].copy()
# Let's add a total or sum row at the base
top_criteria_df.loc['Total Venues'] = top_criteria_df.sum()
# Let's view the first five rows
top_criteria_df.head()
First five rows of the top_criteria DataFrame

From the result of our analysis, Lagos Mainland LGA has the fewest absent top categories of venues (4) and the highest number of available venues (95). Next to it is Ikeja LGA, which has the same number of absent top categories of venues (4).
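The two criteria in the paragraph above, absent categories and total venues, can be computed column by column. The sketch below uses made-up counts, and the way it combines the criteria (fewest absent categories first, ties broken by most venues) is an assumption rather than the exact rule used in the analysis.

```python
import pandas as pd

# Made-up counts: rows are venue categories, columns are LGAs.
sample_counts = pd.DataFrame(
    {"Alpha": [5, 0, 3], "Beta": [2, 1, 4], "Gamma": [0, 0, 9]},
    index=["hotels_resorts_spas", "airport", "coffee_shops"],
)

ranking = pd.DataFrame({
    # A category is absent in an LGA when its count is zero.
    "absent_categories": (sample_counts == 0).sum(),
    "total_venues": sample_counts.sum(),
})

# Fewest absent categories first; ties broken by most venues.
ranking = ranking.sort_values(
    by=["absent_categories", "total_venues"],
    ascending=[True, False],
)
print(ranking)
```

Here Gamma has the most venues overall but ranks last because two of the three categories are missing, which is the trade-off the ranking is meant to expose.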

See table below showing the Top 5 LGAs to site an office in Lagos:-

Top 5 LGAs to site an office in Lagos State.

It is also no coincidence that from the clustering exercise we did above, all Top 5 LGAs are from the dominant cluster1.

These are Lagos Mainland, Ikeja, Apapa, Oshodi-Isolo and Surulere.

C.3 Top 10 Most-Common Venue Categories per LGA in Lagos State:

With a few simple lines of code, we can show the Top 10 most common venue types per LGA in Lagos State. This can be very useful for event planning or building a solution for a cluster of similar businesses. You can see the code in my notebook on GitHub.

Below is a snapshot of the top 10 most common venue categories per LGA.

Top 10 most common venue categories per LGA in Lagos State.
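The notebook code is not reproduced here, but the idea can be sketched as sorting each LGA's row of category frequencies and keeping the first few names. The data and the helper name most_common are made up, and the example keeps the top 2 rather than the top 10 to stay small.

```python
import pandas as pd

# Made-up mean frequencies per LGA; columns after 'LGA' are categories.
sample_grouped = pd.DataFrame({
    "LGA": ["Alpha", "Beta"],
    "bar": [0.5, 0.1],
    "hotel": [0.3, 0.6],
    "market": [0.2, 0.3],
})


def most_common(row, top_n):
    """Return the top_n category names of one row, most frequent first."""
    return row.sort_values(ascending=False).index[:top_n].tolist()


freqs = sample_grouped.set_index("LGA").astype(float)
top_two = {lga: most_common(freqs.loc[lga], 2) for lga in freqs.index}
print(top_two)
```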

D. Discussion

All models are wrong, but some are useful… George Edward Pelham Box

Taking a cue from George Box, this is by no means an exhaustive or unerring analysis of Lagos State. We simply used the data from the Wikipedia page of Lagos. We scraped this data and applied Machine Learning and Exploratory Data Analysis (EDA) in line with certain parameters we created, to arrive at a plausible result concerning the most ideal location to site an office in Lagos, which is the Lagos Mainland LGA.

D.1 Lagos Mainland LGA:

Lagos Mainland was founded by Chief Olofin and is peopled by the same groups found in Lagos, especially the Egbas and Aworis.
Lagos Mainland developed from settlements such as Ebute-Metta, Iddo-Otto, and Ijora (see link).

The communities of Lagos Mainland include:

Yaba,
Ebute-Metta,
Iddo-Otto,
Iwaya,
Akoka,
Makoko,
Abule-nla.

Yaba is a beehive of Tech activities, with the presence of Tech partners and co-working services such as Co-creation hub, NG Hub from Facebook, Hub One, Yaba ICT Hub, Pancake Hub, LeadSpace, Mindthegap Incubation Hub, and a bunch of others.

Lagos Mainland LGA, showing some venues.

E. The Conclusion

In this article, we’ve explored a bit of Lagos City and we’ve helped any Start-up looking to relocate or expand to Lagos to find the ideal locations for its offices.

The entire analysis backing this article is available in my GitHub repo.

Finally, let’s see a word cloud object showing the most common venue categories all across Lagos State.

Ladies and Gentlemen, the most common venues in Lagos City are… You guessed right!… Hotels and Bars!!

Cheers…

About Me:

Lawrence is a Data Specialist at Tech Layer, passionate about fair and explainable AI and Data Science. I hold both the Data Science Professional and Advanced Data Science Professional certifications from IBM. I have conducted several projects using ML and DL libraries, I love to code up my functions as much as possible even when existing libraries abound. Finally, I never stop learning and experimenting and yes, I hold several Data Science and AI certifications and I have written several highly recommended articles.

Feel free to find me on:-

Github

Linkedin

Twitter

Published via Towards AI
