Mumbai vs. Delhi, What is Your Choice? Let the Data Decide!

Last Updated on July 25, 2023 by Editorial Team

Author(s): Sreelatha S

Originally published on Towards AI.

Data Science Storytelling for comparing two cities | Towards AI

Data analysis of various factors for two cities like food, venues to visit, etc

Photo by Parth Vyas on Unsplash

Mumbai and Delhi are the two most important metro cities in India. They have long competed for supremacy in the quality of life, jobs, education, entertainment, and recreational facilities they offer their residents. This post describes a data science project that analyzes the neighborhoods in each of these two cities to understand what is popular in them and what they offer to someone deciding which of the two to live in.

For most people, the deciding factor would be how lively, supportive, vibrant, and unique each city is compared with the other. The business problem in this study assumes an audience that wants a picture of what life and activities would look like in these neighborhoods if they moved to one of the two cities. In this study, the choice between the cities is based on the popular venues in their neighborhoods.

For any “data science project”, data is of paramount importance. For this study, we needed data about the neighborhoods in each of these metro cities. The postal code (Pincode) data published by the government for all of India serves us well. Specifically, we download the CSV provided at https://data.gov.in/resources/all-india-pincode-directory-contact-details-along-latitude-and-longitude.

In this study, we will download the CSV, read it into a pandas DataFrame, and filter out every city, town, and place other than Mumbai and Delhi, since we are only interested in comparing these two biggest metro cities in India.

We shall then drop the columns in the CSV that are not relevant to the current study. Post office names (office name) will be used as the neighborhood names in each region, Mumbai or Delhi.

Neighborhood names with the same Pincode will be combined as a single row.
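
As a rough illustration of the filtering and column cleanup described above, here is a minimal pandas sketch. The column names (officename, pincode, districtname, Telephone) and the sample rows are assumptions made for the example; the headers in the real CSV may differ, and the actual file would be loaded with pd.read_csv rather than built by hand.

```python
import pandas as pd

# Toy stand-in for the downloaded CSV. Column names are hypothetical.
raw = pd.DataFrame({
    "officename": ["Bazargate", "Town Hall (Mumbai)", "Sansad Marg", "Anna Nagar"],
    "pincode": [400001, 400001, 110001, 600040],
    "districtname": ["Mumbai", "Mumbai", "New Delhi", "Chennai"],
    "Telephone": [None, None, None, None],
})


def keep_metros(df, district_col="districtname"):
    """Keep only rows whose district mentions Mumbai or Delhi and tag the city."""
    is_mumbai = df[district_col].str.contains("Mumbai", case=False, na=False)
    is_delhi = df[district_col].str.contains("Delhi", case=False, na=False)
    out = df[is_mumbai | is_delhi].copy()
    # Every remaining row is either Mumbai (True) or Delhi (False).
    out["City"] = is_mumbai.loc[out.index].map({True: "Mumbai", False: "Delhi"})
    return out


metros = keep_metros(raw)
# Use the post office name as the neighborhood name and drop unused columns.
metros = metros.rename(columns={"officename": "Neighborhood", "pincode": "Pincode"})
metros = metros[["City", "Pincode", "Neighborhood"]].reset_index(drop=True)
print(metros)
```

Matching on a district column is only one possible way to pick out the two cities; the original project may have filtered on a different field.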

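The longitude and latitude of each neighborhood in Mumbai and Delhi will then be looked up with a geocoder (Nominatim, from the geopy package), and the Foursquare API will be used to find the venues near each neighborhood. Together, this forms the dataset we use for this study.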

The first few records of the dataset we now have after cleanup and curation appear as below.

Dataset after clean up and curation

We now see that different neighborhoods share the same Pincode value. The next step is to combine the rows that have the same Pincode: we replace the neighborhood value with a comma-separated concatenation of the neighborhood names for that Pincode.

We also notice that the longitude and latitude values in the CSV data are NaN, which means the file carries no usable coordinates, so we drop these columns from the dataset as well. We now have the neighborhoods for both metro cities.

First 5 rows of the dataset
Last 5 rows of the dataset
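
The two cleanup steps above (dropping the all-NaN coordinate columns and merging rows that share a Pincode) could look roughly like this in pandas. The data is a toy example and the column names are assumptions.

```python
import pandas as pd

# Toy data: two Mumbai post offices share one Pincode, coordinates are all NaN.
metros = pd.DataFrame({
    "City": ["Mumbai", "Mumbai", "Delhi", "Delhi"],
    "Pincode": [400001, 400001, 110001, 110002],
    "Neighborhood": ["Bazargate", "Town Hall (Mumbai)", "Sansad Marg", "Darya Ganj"],
    "Latitude": [float("nan")] * 4,
    "Longitude": [float("nan")] * 4,
})

# Drop columns in which every value is missing.
metros = metros.dropna(axis="columns", how="all")

# One row per (City, Pincode); neighborhood names joined with ", ".
combined = (
    metros.groupby(["City", "Pincode"])["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)
print(combined)
```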

The next step is to enrich the dataset with the required information. We need the longitude and latitude values for the neighborhoods. We will use the Nominatim geocoder from the geopy.geocoders module to find the longitude and latitude of each neighborhood, and eventually create a dataset that has all the columns needed for our analysis.

Longitude and Latitude values added to neighborhoods in the dataset
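
A geocoding pass of this kind might be sketched as follows. The helper names (make_nominatim_geocode, add_coordinates), the user agent string, and the query format are all assumptions for illustration; the original project may have built its queries differently. The geocode function is passed in as a parameter so the logic can be tried with a stub, without network access.

```python
import pandas as pd


def make_nominatim_geocode(user_agent="metro_compare_example"):
    """Return a rate-limited Nominatim geocode function (requires geopy and network)."""
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    geolocator = Nominatim(user_agent=user_agent)
    # Nominatim's public service expects at most about one request per second.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


def add_coordinates(df, geocode, city_col="City", name_col="Neighborhood"):
    """Add Latitude/Longitude columns using the first name in each combined row."""
    lats, lons = [], []
    for city, names in zip(df[city_col], df[name_col]):
        first_name = names.split(",")[0].strip()
        location = geocode(f"{first_name}, {city}, India")
        # geocode returns None when nothing is found; record NaN in that case.
        lats.append(location.latitude if location else float("nan"))
        lons.append(location.longitude if location else float("nan"))
    out = df.copy()
    out["Latitude"] = lats
    out["Longitude"] = lons
    return out


# Example use against the live service (commented out because it needs network):
# located = add_coordinates(combined_df, make_nominatim_geocode())
```

Geocoding only the first name of a combined row is a simplification; rows for which no location is found keep NaN coordinates and can be dropped or retried later.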

We now have the necessary information to visualize the neighborhoods for both the cities on a folium map.

Neighborhoods in Mumbai and Delhi plotted on a map
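
Plotting the neighborhoods on a folium map could be sketched like this. The colors, the map center, and the helper names are arbitrary choices for the example, not taken from the original project.

```python
import pandas as pd

# Arbitrary color per city, chosen only for this example.
CITY_COLORS = {"Mumbai": "blue", "Delhi": "red"}


def marker_specs(df):
    """Turn each geocoded row into a plain dict describing one map marker."""
    specs = []
    located = df.dropna(subset=["Latitude", "Longitude"])
    for row in located.itertuples(index=False):
        specs.append({
            "location": [row.Latitude, row.Longitude],
            "popup": f"{row.Neighborhood} ({row.City})",
            "color": CITY_COLORS.get(row.City, "gray"),
        })
    return specs


def build_map(specs, center=(23.0, 77.0), zoom_start=5):
    """Draw the markers on a folium map (requires folium to be installed).

    The default center is a rough point in central India, an assumption
    so that both cities are visible at the default zoom.
    """
    import folium

    fmap = folium.Map(location=list(center), zoom_start=zoom_start)
    for spec in specs:
        folium.CircleMarker(
            location=spec["location"],
            radius=5,
            popup=spec["popup"],
            color=spec["color"],
            fill=True,
        ).add_to(fmap)
    return fmap


# Example: build_map(marker_specs(located_df)).save("metros.html")
```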

Analyzing the neighborhoods

Finding top venues near Mumbai neighborhoods

We will use the Foursquare API to find the top venues in the neighborhoods of Mumbai. This will help us understand the kind of life Mumbai neighborhoods have to offer. We will iteratively make Foursquare API calls for each of the Mumbai neighborhoods in our dataset. For illustration, we will look at venues close to one Mumbai neighborhood: Bazargate, Elephanta Caves Po, M.P.T., Stock Exchange, Tajmahal, Town Hall (Mumbai). The Foursquare API returns the popular venues within a 500 m radius of this neighborhood.

Top venues close to one of the Mumbai neighborhoods
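
The per-neighborhood venue lookup might be sketched as below. This assumes the legacy Foursquare v2 "venues/explore" endpoint and its response layout (response, groups, items, venue); the article does not say which endpoint or API version was used, and newer Foursquare APIs use different URLs and authentication. The function names, the version date, and the limit are hypothetical.

```python
# Assumed legacy endpoint; check the current Foursquare documentation before use.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def explore_params(lat, lng, client_id, client_secret,
                   radius=500, limit=100, version="20180605"):
    """Build query parameters for one neighborhood (500 m radius as in the text)."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }


def parse_venues(payload, neighborhood):
    """Flatten the assumed response layout into one dict per venue."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        location = venue.get("location", {})
        categories = venue.get("categories") or [{}]
        rows.append({
            "Neighborhood": neighborhood,
            "Venue": venue.get("name"),
            "Venue Latitude": location.get("lat"),
            "Venue Longitude": location.get("lng"),
            "Venue Category": categories[0].get("name"),
        })
    return rows


def fetch_venues(neighborhood, lat, lng, client_id, client_secret):
    """Call the API for one neighborhood (requires requests, network, credentials)."""
    import requests

    params = explore_params(lat, lng, client_id, client_secret)
    resp = requests.get(EXPLORE_URL, params=params, timeout=10)
    resp.raise_for_status()
    return parse_venues(resp.json(), neighborhood)
```

Calling fetch_venues in a loop over the dataset rows and collecting the returned dicts into a DataFrame gives one row per venue, which is the shape the later steps work from.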

Next, we will use statistical and analytical methods to find the unique venue categories in the Mumbai neighborhoods, and we will build a DataFrame that records, for each neighborhood, the frequency of occurrence of each venue category.

From our analysis, we see that there are 116 unique venue categories in Mumbai neighborhoods. Yoga studios, Indian, Chinese, Thai, American, Spanish, Mediterranean, and Deli restaurants, Burger joints, Tea shops, Cafes, Concert halls, Theatres, Boutiques, Bowling Alleys, Bars, Flea markets, Harbors, Gourmet shops, Nightclubs, Pubs, Bagel shops, Pharmacies, and Spas are some of them.
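
One common way to build such a frequency table is to one-hot encode the venue categories and average them per neighborhood. The sketch below uses toy data and assumed column names; it is not necessarily the exact method used in the project.

```python
import pandas as pd

# Toy venue list: one row per venue returned for a neighborhood.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Cafe", "Cafe", "Indian Restaurant", "Bar",
                       "Indian Restaurant", "Indian Restaurant"],
})

# Number of distinct venue categories across all neighborhoods.
n_categories = venues["Venue Category"].nunique()

# One column per category, 1.0 where the venue belongs to that category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the indicator columns is the share of each category per neighborhood.
freq = onehot.groupby("Neighborhood").mean().reset_index()
print(n_categories)
print(freq)
```

If a venue category happened to be named "Neighborhood", the insert call would raise an error because the column already exists, so real data may need the category column renamed first.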

We then create a dataset that lists the top 5 common venues against each of the neighbourhoods in Mumbai. We get a representation such as below for all the neighbourhoods in Mumbai.

Top 5 common venues around each of the Mumbai neighbourhoods
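
Ranking the categories for each neighborhood can be sketched as follows, assuming a frequency table with one row per neighborhood and one column per category. The function name and output column labels are hypothetical.

```python
import pandas as pd


def top_venues(freq, n=5, name_col="Neighborhood"):
    """Return the n most frequent venue categories for each neighborhood."""
    records = []
    for _, row in freq.iterrows():
        # Rank categories by frequency, highest first (stable sort for ties).
        ranked = (
            row.drop(name_col)
            .astype(float)
            .sort_values(ascending=False, kind="mergesort")
        )
        record = {name_col: row[name_col]}
        for rank, category in enumerate(ranked.index[:n], start=1):
            record[f"Most Common Venue #{rank}"] = category
        records.append(record)
    return pd.DataFrame(records)


# Toy frequency table for two neighborhoods.
toy_freq = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Bar": [0.1, 0.7],
    "Cafe": [0.6, 0.1],
    "Indian Restaurant": [0.3, 0.2],
})
print(top_venues(toy_freq, n=2))
```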

Cluster the neighbourhoods in Mumbai based on the similarity of top common venues

Now that we have the top venues for each of the neighborhoods in Mumbai, let us apply a clustering algorithm to group the neighborhoods by the similarity of the types of venues they have. Clustering also tells users which types of neighborhood are common in Mumbai. We will use the k-means clustering approach, with k set to 5, which means we group the neighborhoods into 5 clusters. Each neighborhood is assigned a Cluster Label.

Neighborhoods with Cluster Labels assigned
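
A clustering step along these lines could be written with scikit-learn's KMeans. The sketch assumes the features are the per-neighborhood category frequencies; the seed, n_init value, and function name are choices made for the example, and the article does not state which settings were actually used.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_neighborhoods(freq, k=5, name_col="Neighborhood", seed=0):
    """Assign a k-means cluster label to each neighborhood."""
    features = freq.drop(columns=[name_col])
    model = KMeans(n_clusters=k, random_state=seed, n_init=10)
    labels = model.fit_predict(features)
    out = freq[[name_col]].copy()
    out.insert(0, "Cluster Label", labels)
    return out


# Toy frequency table with three obvious groups; k=3 is used here only
# because the toy data is tiny (the article uses k=5 on the full data).
toy_freq = pd.DataFrame({
    "Neighborhood": ["A1", "A2", "B1", "B2", "C1", "C2"],
    "Cafe": [0.9, 0.8, 0.0, 0.1, 0.0, 0.0],
    "Indian Restaurant": [0.1, 0.2, 0.9, 0.8, 0.1, 0.2],
    "Museum": [0.0, 0.0, 0.1, 0.1, 0.9, 0.8],
})
print(cluster_neighborhoods(toy_freq, k=3))
```

The numeric labels themselves are arbitrary: only which neighborhoods share a label is meaningful, and the label numbering can change with a different seed.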

We will then use the dataset with cluster labels assigned to visualize the clusters in a folium map.

Clusters of neighbourhoods in Mumbai

An important piece of information this map provides is that many neighborhoods in Mumbai are similar in terms of the venues around them, as indicated by the cluster marked in blue.

Let us now dig a little deeper into how the neighborhoods are clustered and what characterizes the cluster that is most common across neighborhoods in Mumbai.

Cluster Label 0

The neighborhoods belonging to this cluster are popular for Indian restaurants, cafes, markets, and vegetarian joints. These neighborhoods would suit a subsection of Indians who prefer a scaled-down lifestyle with vegetarian food close to home.

Cluster Label 1

The neighborhoods belonging to this cluster are popular for Indian restaurants, Irani cafes, cafes, seafood, and fast food joints. These neighborhoods would interest those who like seafood and fast food. They are probably also of interest to visitors from Iran who would like to find places serving their kind of food.

Cluster Label 2

The neighborhoods belonging to this cluster are popular for a mix of Indian and Chinese restaurants, train stations, pubs, bus stations, bakeries, etc. These neighborhoods would interest those who depend more on public transport, since they are closer to train and bus stations. They may also interest people with diverse food preferences, ranging from Indian, Asian, Chinese, and Afghan food to snack, sandwich, and ice-cream shops. These neighborhoods also offer some recreational places such as gyms, parks, bowling alleys, theatres, and harbours.

Cluster Label 3

Very few neighborhoods belong to this cluster, which makes it unique. The main attraction of these neighborhoods seems to be their proximity to theme parks, pizza places, and cocktail bars.

Cluster Label 4

Again, very few neighborhoods belong to this cluster, which makes it unique. The main attraction of these neighborhoods seems to be their proximity to a ferry and a college auditorium.

Since the objective of this study is to compare the neighborhoods of the two metro cities, Mumbai and Delhi, and not to compare neighborhoods within Mumbai, we will present our conclusion after doing a similar analysis of the neighborhoods in Delhi.

Finding top venues near Delhi neighbourhoods

We will use the Foursquare API to find the top venues in the neighborhoods of Delhi. This will help us understand the kind of life Delhi neighborhoods have to offer. We will iteratively make Foursquare API calls for each of the Delhi neighborhoods in our dataset. For illustration, we will look at venues close to one Delhi neighborhood: Sansad Marg, Sansadiya South, Secretariat North, Shastri Bhawan, Supreme Court, New Delhi G.P.O. The Foursquare API returns the following response as the popular venues within a 500 m radius of this neighborhood.

Top venues closest to one of the neighborhoods in Delhi

Next, we will use statistical and analytical methods to find the unique venue categories in the Delhi neighborhoods, and we will build a DataFrame that records, for each neighborhood, the frequency of occurrence of each venue category.

From our analysis, we see that there are 14 unique venue categories in Delhi neighborhoods. ATMs, Arts and Crafts stores, Burger Joints, Cafes, Gardens, Gyms, Multiplexes, Museums, Pizza places, Indian restaurants, Shopping malls, Water Parks, and Hotels are some of them.

We then create a dataset that lists the top 5 common venues against each of the neighborhoods in Delhi. We get a representation such as below for all the neighborhoods in Delhi.

Top 5 common venues in the neighbourhoods of Delhi

Cluster the neighborhoods in Delhi based on the similarity of top common venues

Now that we also have the top venues for each of the neighborhoods in Delhi, let us apply a clustering algorithm to group the neighborhoods by the similarity of the types of venues they have. Clustering also tells users which types of neighborhood are common in Delhi. We will use the k-means clustering approach, with k set to 5, which means we group the neighborhoods into 5 clusters. Each neighborhood is assigned a Cluster Label.

Delhi neighborhoods and venues with Cluster Label assigned

We will then use the dataset with cluster labels assigned to visualize the clusters in the folium map.

Delhi neighborhoods clustered

An important piece of information this map provides is that the neighborhoods in Delhi are diverse in terms of the venues around them, as indicated by the clusters marked in different colors. Also, as we saw earlier, not many venue categories were returned for the neighborhoods in Delhi.

Let us now dig a little deeper into how the neighborhoods are clustered and what characterizes each cluster.

Cluster Label 0

Close to 93 neighborhoods belong to this cluster. It is popular for Arts and Crafts stores, Water Parks, Shopping malls, and Museums. These neighborhoods are not good for foodies. However, they should suit families with children, since the nearby venues are great for keeping children engaged.

Cluster Label 1

Not many neighborhoods belong to this cluster. Multiplexes, department stores, and Gyms seem to be the popular venues close to the neighborhoods in this cluster.

Cluster Label 2

Not many neighborhoods belong to this cluster. ATMs, Water Parks, and Museums seem to be the popular venues close to the neighborhoods in this cluster.

Cluster Label 3

Not many neighborhoods belong to this cluster. Pizza places, Water Parks, and Museums seem to be the popular venues close to the neighborhoods in this cluster.

Cluster Label 4

Not many neighborhoods belong to this cluster. Museums, Shopping malls, and Gardens seem to be the popular venues close to the neighborhoods in this cluster.

Study findings & conclusion

In this project, we loaded the dataset for two of India’s prime metro cities and analyzed the neighborhoods in these cities based on the types of popular venues they have. We clustered the neighborhoods based on the most common top venues in each neighborhood. Our intention was to analyze and understand how the type of life differs between these metros, which can offer decision points for anybody who is considering settling in either city and wants a peek into the experience and facilities they would find there.

Given our cluster information for both Mumbai and Delhi, we see that Mumbai and its neighborhoods are a great place for a foodie. There are a lot of restaurants, cafes, bars, etc. in Mumbai neighborhoods. Also, because Mumbai is close to the seashore, its neighborhoods offer harbors, seafood, and boat and ferry rides. On the other hand, we see how dissimilar life in Delhi neighborhoods would be compared with Mumbai neighborhoods. Delhi is inland, and its neighborhoods are good for those who like Arts and Crafts stores, Museums, Water Parks, and Pizza places. There is very little in terms of foreign cuisine restaurants in Delhi. Mumbai, in contrast, is great for international visitors, expats, etc., because of the variety and types of food outlets it has.

Thus with this project, we have analyzed the kind of life each of these big metro cities has to offer based on the popular venues in their neighborhood.

Mumbai would be the choice if you are a foodie!

Another important aspect the study reveals is that Mumbai offers far more categories of venues than Delhi (116 versus 14 in our data). This means that Delhi is more restrictive in terms of variety and convenience. With the data we have studied, Mumbai wins this battle of metros!


Published via Towards AI
