Top 10 Python libraries to work with geo-spatial data in 2022
Last Updated on July 26, 2023 by Editorial Team
Author(s): Diletta Goglia
Originally published on Towards AI.
Geographic Data Science is a new emerging field in computational disciplines and applications. Nowadays, a good part of the data in the real world has to do with a spatial component (e.g. population stocks, trading data, climate-changing, viruses spread, etcβ¦). Knowing how to retrieve, gather, clean and integrate geographic features into datasets is therefore necessary.
Whether or not you have some skills in geoprocessing or geocomputation, today I suggest you the best Python libraries to work, manipulate, and interact with geo-spatial data. I will also include all the links needed to find both the packages and their documentation, as well as the code to run for the installation.
Letβs get started!
1.β PyCountryβ, for countries standard notation.
pycountry
pycountry provides the ISO databases for the standards: The package includes a copy from Debian's pkg-isocodes andβ¦
pypi.org
GitHub – flyingcircusio/pycountry: A Python library to access ISO country, subdivision, languageβ¦
pycountry provides the ISO databases for the standards: The package includes a copy from Debian's pkg-isocodes andβ¦
github.com
PyCountry module provides the following databases for the ISO standards:
- ISO 3166 Countries, including:
β ISO 3166β3 Deleted countries
β ISO 3166β2 Subdivisions of countries - ISO 4217 Currencies
- ISO 639β3 Languages (codes for natural languages)
- ISO 15924 Scripts (codes for writing systems)
I have found this package particularly useful to retrieve countries names from alpha 2 and alpha 3 ISO codes (and viceversa).
I personally find it very useful also for those who want to produce new data (or clean up existing ones) following an official notation, so that it is easier for other developers to replicate and integrate both scripts and datasets.
Installation:
pip install pycountry
Moreover, I mark an available pycountry extension which is pycountry-convert, which provides quick and easy conversion functions between country names and codes, and also continents.
Installation:
pip install pycountry-convert
In case you have not experience with geocoding and you want to try this fantastic package, I leave you here an embed of my example made in JuPyter notebook. I hope you will find it useful!
2. βGeoPyβ, for geographic coordinates.
geopy
geopy is a Python client for several popular geocoding web services. geopy makes it easy for Python developers toβ¦
pypi.org
Welcome to GeoPy's documentation! – GeoPy 2.2.0 documentation
Documentation https://geopy.readthedocs.io/ Source Code https://github.com/geopy/geopy Stack Overflowβ¦
geopy.readthedocs.io
GeoPy makes it easy for Python developers to locate the coordinates of addresses, cities, countries, and landmarks across the globe.
This module allows you to retrieve latitude and longitude points, but also getting postal codes, distances, and even more! Among the various libraries to work with geo-data, this is definitely one of the most popular and famous.
It also includes OpenStreetMap Nominatim and Google Geocoding API (V3) as available geocoding services .
Installation:
pip install geopy
3. βGeo-Pyβ, to work with points in space.
(Yes, it is different from the former!)
geo-py
Functions onto sphere geo.sphere.approximate_distance Approximate calculation distance (expanding the trigonometricβ¦
pypi.org
Geo-Py module provides algorithms and structures related to geodesy and geospatial data useful for computing distances (by choosing among different formulas), destinations and bearing, given some points in space.
Installation:
pip install geo-py
4. βReverse Geocoderβ, forβ¦ guess what? Reverse geocoding!
reverse_geocoder
Reverse Geocoder takes a latitude / longitude coordinate and returns the nearest town/city. This library improves on anβ¦
pypi.org
As you can guess from the name, reverse_geocoder package takes a (latitude, longitude) coordinate and returns the nearest city, together with its respective country and administrative 1 and 2 regions.
Hence it could be very helpful for you to deal with different NUTS level.
Installation:
pip install reverse_geocoder
5. βGeoPandasβ, for geometric operations.
geopandas
GeoPandas is a project to add support for geographic data to pandas objects. The goal of GeoPandas is to make workingβ¦
pypi.org
GeoPandas 0.10.2+0.g04d377f.dirty – GeoPandas 0.10.2+0.g04d377f.dirty documentation
GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas extends theβ¦
geopandas.org
GeoPandas extends the datatypes used by Pandas to allow spatial operations on geometric types.
It is a popular open source project built to simplify work with geospatial data in Python by including some new structures that perfectly resembles those of Pandas, such as GeoDataFrame and GeoSeries.
Generally, it is very famous and used for plotting, as it allows to easily generate (even complex and interactive) charts and maps from geometric data. This is possible through specifically designed tools and data structures for spatial manipulations .
Installation:
pip install geopandas
6. βDataPrepβ, for cleaning and integration.
DataPrep – The easiest way to prepare data in Python
DataPrep is designed and optimized for computational notebooks, the most popular environment among data scientists
dataprep.ai
This library has not been created specifically to work with geo-spatial data, but more in general to collect, integrate, clean and standardize any kind of data. The idea is that you can use a single module like this one to prepare your dataset with a few lines of code, and start working on it in no time.
However, I found it very useful to be exploited on geographic data especially for some helpful functions to validate coordinates, and a very nice module to retrieve information about each country, clean them and standardize w.r.t. ISO 3166.
Itβs such a nice toy, I recommend it to every Data Scientist!
Installation:
pip install -U dataprep
7. βCountryInfoβ, for anything you need!
countryinfo
A python module for returning data about countries, ISO info and states/provinces within them. To access one of theβ¦
pypi.org
This module can help you getting literally any information you may need about countries! It has more than 20 APIs that returns data about area (in square kilometers), capital cities, regions, list of bordering countries and much more!
Unfortunately, I found it to be very (very!) slow.
I tried to use it many times but my code ended up taking tens of minutes to run, so in the end I discarded it.
However, if you donβt work with Big Data and you have quite small datasets (a few hundred of records) I highly recommend it! It is undoubtedly one of the most complete and integrated packages.
Also for this library I provide an example, run on a very small dataframe: you can find it in the GitHub gist here below.
Installation:
pip install countryinfo
8. βPyPopulationβ, for population stocks.
pypopulation
Lightweight population lookup using ISO 3166 alpha-1/2 country codes for Python 3.6.1 and higher. The aim is to provideβ¦
pypi.org
This library was purposely built to be βminimalistβ, so that it does one thing only, as best as possible: its aim is to provide the number of persons permanently residing in a given country.
The API works whether you give it the alpha 2 or alpha 3 code of the country.
Although the package usage is pretty straightforward, I offer a brief snippet for it too.
The population supplied by this module is sourced from The World Bank.
Installation:
pip install pypopulation
9. βGeoTextβ, to extract geo info form text.
geotext
Geotext extracts country and city mentions from text pip install https://github.com/elyase/geotext/archive/master.zipβ¦
pypi.org
Welcome to geotext's documentation! – geotext 0.1.0 documentation
Edit description
geotext.readthedocs.io
This is a really useful (and also quite fast!) library if you work with Natural Language Processing (whether if you built a simple project with nltk, or if you manage to design some complex Human Language technologies or applications).
Indeed, its function is to extract information about countries and cities from mentions contained in textual data.
As a result, one of the main advantages of the module is to extract structured data from unstructured one.
Installation:
pip install geotext
10. βHaversineβ, to compute distances.
haversine
Calculate the distance (in various units) between two points on Earth using their latitude and longitude. pip installβ¦
pypi.org
Finally, if the calculation of the geodesic distance available in the previous libraries did not satisfy you, this last package allows you to use a different formulation to calculate the distance between two points on Earth using their latitude and longitude.
Specifically, the haversine formula is as follows:
It is expressed as a function of the central angle ΞΈ (simply, the angle between any two points -coordinates- on a sphere) and it is computed directly from the latitude (represented by Ο) and longitude (represented by Ξ») of the two points.
With this module, the output is available in different units of measure.
Installation:
pip install haversine
Conclusion
I hope this article provides you some useful tools to explore GeoData Science world, as well as possible solutions to work with your geospatial data.
Hi there, Iβm Diletta Goglia, a junior researcher in Machine Learning and Big Data Science for migration understanding and prediction.
Follow me at your interest π
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming aΒ sponsor.
Published via Towards AI