Master LLMs with our FREE course in collaboration with Activeloop & Intel Disruptor Initiative. Join now!

Publication

Top 10 Python libraries to work with geo-spatial data in 2022
Latest   Machine Learning

Top 10 Python libraries to work with geo-spatial data in 2022

Last Updated on July 26, 2023 by Editorial Team

Author(s): Diletta Goglia

Originally published on Towards AI.

Image from TDS Editors

Geographic Data Science is a new emerging field in computational disciplines and applications. Nowadays, a good part of the data in the real world has to do with a spatial component (e.g. population stocks, trading data, climate-changing, viruses spread, etc…). Knowing how to retrieve, gather, clean and integrate geographic features into datasets is therefore necessary.

Whether or not you have some skills in geoprocessing or geocomputation, today I suggest you the best Python libraries to work, manipulate, and interact with geo-spatial data. I will also include all the links needed to find both the packages and their documentation, as well as the code to run for the installation.

Let’s get started!

1.” PyCountry”, for countries standard notation.

pycountry

pycountry provides the ISO databases for the standards: The package includes a copy from Debian's pkg-isocodes and…

pypi.org

GitHub – flyingcircusio/pycountry: A Python library to access ISO country, subdivision, language…

pycountry provides the ISO databases for the standards: The package includes a copy from Debian's pkg-isocodes and…

github.com

PyCountry module provides the following databases for the ISO standards:

  • ISO 3166 Countries, including:
    – ISO 3166–3 Deleted countries
    – ISO 3166–2 Subdivisions of countries
  • ISO 4217 Currencies
  • ISO 639–3 Languages (codes for natural languages)
  • ISO 15924 Scripts (codes for writing systems)

I have found this package particularly useful to retrieve countries names from alpha 2 and alpha 3 ISO codes (and viceversa).

I personally find it very useful also for those who want to produce new data (or clean up existing ones) following an official notation, so that it is easier for other developers to replicate and integrate both scripts and datasets.

Installation:

pip install pycountry 

Moreover, I mark an available pycountry extension which is pycountry-convert, which provides quick and easy conversion functions between country names and codes, and also continents.

Installation:

pip install pycountry-convert

In case you have not experience with geocoding and you want to try this fantastic package, I leave you here an embed of my example made in JuPyter notebook. I hope you will find it useful!

PyCountry example of usage — script by Author

2. “GeoPy”, for geographic coordinates.

geopy

geopy is a Python client for several popular geocoding web services. geopy makes it easy for Python developers to…

pypi.org

Welcome to GeoPy's documentation! – GeoPy 2.2.0 documentation

Documentation https://geopy.readthedocs.io/ Source Code https://github.com/geopy/geopy Stack Overflow…

geopy.readthedocs.io

GeoPy makes it easy for Python developers to locate the coordinates of addresses, cities, countries, and landmarks across the globe.

This module allows you to retrieve latitude and longitude points, but also getting postal codes, distances, and even more! Among the various libraries to work with geo-data, this is definitely one of the most popular and famous.

It also includes OpenStreetMap Nominatim and Google Geocoding API (V3) as available geocoding services .

Installation:

pip install geopy

3. “Geo-Py”, to work with points in space.

(Yes, it is different from the former!)

geo-py

Functions onto sphere geo.sphere.approximate_distance Approximate calculation distance (expanding the trigonometric…

pypi.org

Geo-Py module provides algorithms and structures related to geodesy and geospatial data useful for computing distances (by choosing among different formulas), destinations and bearing, given some points in space.

Installation:

pip install geo-py

4. “Reverse Geocoder”, for… guess what? Reverse geocoding!

reverse_geocoder

Reverse Geocoder takes a latitude / longitude coordinate and returns the nearest town/city. This library improves on an…

pypi.org

As you can guess from the name, reverse_geocoder package takes a (latitude, longitude) coordinate and returns the nearest city, together with its respective country and administrative 1 and 2 regions.

Hence it could be very helpful for you to deal with different NUTS level.

Installation:

pip install reverse_geocoder

5. “GeoPandas”, for geometric operations.

geopandas

GeoPandas is a project to add support for geographic data to pandas objects. The goal of GeoPandas is to make working…

pypi.org

GeoPandas 0.10.2+0.g04d377f.dirty – GeoPandas 0.10.2+0.g04d377f.dirty documentation

GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas extends the…

geopandas.org

GeoPandas extends the datatypes used by Pandas to allow spatial operations on geometric types.

It is a popular open source project built to simplify work with geospatial data in Python by including some new structures that perfectly resembles those of Pandas, such as GeoDataFrame and GeoSeries.

Example of geoplot (image form GeoPandas)

Generally, it is very famous and used for plotting, as it allows to easily generate (even complex and interactive) charts and maps from geometric data. This is possible through specifically designed tools and data structures for spatial manipulations .

Installation:

pip install geopandas

6. “DataPrep”, for cleaning and integration.

DataPrep – The easiest way to prepare data in Python

DataPrep is designed and optimized for computational notebooks, the most popular environment among data scientists

dataprep.ai

This library has not been created specifically to work with geo-spatial data, but more in general to collect, integrate, clean and standardize any kind of data. The idea is that you can use a single module like this one to prepare your dataset with a few lines of code, and start working on it in no time.

However, I found it very useful to be exploited on geographic data especially for some helpful functions to validate coordinates, and a very nice module to retrieve information about each country, clean them and standardize w.r.t. ISO 3166.

It’s such a nice toy, I recommend it to every Data Scientist!

Installation:

pip install -U dataprep

7. “CountryInfo”, for anything you need!

countryinfo

A python module for returning data about countries, ISO info and states/provinces within them. To access one of the…

pypi.org

This module can help you getting literally any information you may need about countries! It has more than 20 APIs that returns data about area (in square kilometers), capital cities, regions, list of bordering countries and much more!

Unfortunately, I found it to be very (very!) slow.

I tried to use it many times but my code ended up taking tens of minutes to run, so in the end I discarded it.

However, if you don’t work with Big Data and you have quite small datasets (a few hundred of records) I highly recommend it! It is undoubtedly one of the most complete and integrated packages.

Also for this library I provide an example, run on a very small dataframe: you can find it in the GitHub gist here below.

CountryInfo example of usage — script by Author

Installation:

pip install countryinfo

8. “PyPopulation”, for population stocks.

pypopulation

Lightweight population lookup using ISO 3166 alpha-1/2 country codes for Python 3.6.1 and higher. The aim is to provide…

pypi.org

This library was purposely built to be “minimalist”, so that it does one thing only, as best as possible: its aim is to provide the number of persons permanently residing in a given country.

The API works whether you give it the alpha 2 or alpha 3 code of the country.

Although the package usage is pretty straightforward, I offer a brief snippet for it too.

PyPopulation example of usage — script by Author

The population supplied by this module is sourced from The World Bank.

Installation:

pip install pypopulation

9. “GeoText”, to extract geo info form text.

geotext

Geotext extracts country and city mentions from text pip install https://github.com/elyase/geotext/archive/master.zip…

pypi.org

Welcome to geotext's documentation! – geotext 0.1.0 documentation

Edit description

geotext.readthedocs.io

This is a really useful (and also quite fast!) library if you work with Natural Language Processing (whether if you built a simple project with nltk, or if you manage to design some complex Human Language technologies or applications).

Indeed, its function is to extract information about countries and cities from mentions contained in textual data.

As a result, one of the main advantages of the module is to extract structured data from unstructured one.

Installation:

pip install geotext

10. “Haversine”, to compute distances.

haversine

Calculate the distance (in various units) between two points on Earth using their latitude and longitude. pip install…

pypi.org

Finally, if the calculation of the geodesic distance available in the previous libraries did not satisfy you, this last package allows you to use a different formulation to calculate the distance between two points on Earth using their latitude and longitude.

Specifically, the haversine formula is as follows:

Haversine distance formula (form Wikipedia)

It is expressed as a function of the central angle θ (simply, the angle between any two points -coordinates- on a sphere) and it is computed directly from the latitude (represented by φ) and longitude (represented by λ) of the two points.

With this module, the output is available in different units of measure.

Installation:

pip install haversine

Conclusion

I hope this article provides you some useful tools to explore GeoData Science world, as well as possible solutions to work with your geospatial data.

Hi there, I’m Diletta Goglia, a junior researcher in Machine Learning and Big Data Science for migration understanding and prediction.

Follow me at your interest 🙂

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Feedback ↓