How To Create a Python Package for Fetching Weather Data
Last Updated on January 12, 2023 by Editorial Team
Author(s): Stavros Theocharis
Originally published on Towards AI.
An easy implementation of a Python package for fetching weather data from any location
Markets across the board rely on weather reports for a wide variety of purposes. Investment forecasts, load demand planning, supply chain management, business analytics applications, transportation distribution demands, and time-series enhanced analysis are some of the many areas of business that might benefit from weather data.
Up until now, there have been a lot of occasions on which I needed to have access to certain weather data in order to make use of it. The majority of the time, I want to integrate it into some tasks involving time series (e.g., for seasonality reasons, etc.). In the past, I would look for weather data on the Internet and either download it by hand, make a quick script to scrape it from websites, or use an open application programming interface (API).
I have sometimes used Open-Meteo, an open-source weather API (free for non-commercial use), and I recently learned about another open API that the National Aeronautics and Space Administration (NASA) provides for meteorological data as part of the "POWER" project (NASA's larc-power project). In contrast to NASA's API, which provides only historical data, Open-Meteo offers endpoints for both historical and forecast data.
NASA has long supported satellite systems and research that provide data critical to the study of climate and climatic processes through its Earth Science research program. This funding continues today. Estimates of surface solar energy fluxes and long-term climatological averages of meteorological parameters are included in these data sets. In addition, mean daily values of the primary solar and meteorological data are supplied in a manner suitable for time series.
To avoid rewriting such a script for every project, I created a GitHub repository called "weather data retriever" and made several enhancements so that it can be installed and used quickly as a package. My goal is a robust, fully functional package that can be used whenever weather data is needed. Through this package, one can choose between Open-Meteo's functionality and NASA's. To learn how to install and use it directly, follow the instructions in "README.md" and the quick-start Jupyter notebooks. In this article, I mainly explain the logic behind the functionality and the implementation of such a package.
So, let's dive into the code…
Implementation
Helpful functions
As stated in the API guides of NASA's LaRC POWER Project and Open-Meteo, the latitude and longitude of an area are needed in order to get this area's weather data. To keep this simple, we will construct a function using the "geopy" package that returns the latitude and longitude for a given area name:
from geopy.geocoders import Nominatim
from typing import Tuple, List, Dict, Union, Literal


def get_location_from_name(
    name: str, use_bound_box: bool = False
) -> Tuple[str, Tuple[float, float]]:
    # Nominatim needs a user agent that identifies the application
    nom_loc = Nominatim(user_agent="weather_data_retriever")
    try:
        location = nom_loc.geocode(name)
        if use_bound_box:
            coordinates = location.raw["boundingbox"]
            # The bounding box is ordered as [latmin, latmax, lonmin, lonmax]
            return location[0], tuple(coordinates)
        else:
            # location[0] is the address, location[1] is (latitude, longitude)
            return location[0], location[1]
    except Exception as e:
        # Also reached when geocode() finds nothing and returns None
        raise ValueError("Error in finding Area & Coordinates.", e)
Since the API call for NASA's weather data can also take the bounding box of an area, our custom function can optionally return the bounding box as well.
So now, we can get the latitude and longitude values simply by entering the name of the area.
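Nominatim returns the bounding box values as strings, so it can help to convert them to floats before doing any arithmetic on them. The following is a minimal sketch under that assumption; `parse_bounding_box` is a hypothetical helper that is not part of the package, and the sample values are illustrative rather than real geocoder output.

```python
from typing import Sequence, Tuple


def parse_bounding_box(
    raw_box: Sequence[str],
) -> Tuple[float, float, float, float]:
    """Convert a [latmin, latmax, lonmin, lonmax] box of strings to floats."""
    if len(raw_box) != 4:
        raise ValueError("A bounding box needs exactly four values.")
    lat_min, lat_max, lon_min, lon_max = (float(value) for value in raw_box)
    return lat_min, lat_max, lon_min, lon_max


# Illustrative values only, not real geocoder output
example_box = parse_bounding_box(["37.8", "38.1", "23.5", "23.9"])
```

Keeping the order identical to the one used by the geocoder means the result can be indexed the same way as the raw bounding box.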
NASA's API
The bounding boxes have a limitation for our final request in NASA's case. From what I have seen, the maximum longitude must not be more than 10 degrees higher than the minimum:
def adjust_coordinates_on_limitations(
    longitude_max: Union[float, str], longitude_min: Union[float, str]
) -> str:
    """
    Check if the max and min values differ by more than 10 degrees.
    If yes, adjust the max because the weather API has limitations.
    """
    if float(longitude_max) > float(longitude_min) + 10:
        longitude_max = float(longitude_min) + 10
    return str(longitude_max)
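As a quick check of the rule above, here is a self-contained sketch with illustrative numbers. `clamp_longitude_max` is a hypothetical stand-in that mirrors the logic of the function above but works on floats; the 10-degree limit is taken from the text and has not been verified against NASA's documentation.

```python
MAX_LONGITUDE_SPAN = 10.0  # assumed limit, as described in the text


def clamp_longitude_max(longitude_min: float, longitude_max: float) -> float:
    """Return longitude_max, reduced so that the span is at most 10 degrees."""
    return min(longitude_max, longitude_min + MAX_LONGITUDE_SPAN)


# A 17-degree-wide box is cut down to 10 degrees
wide_box_max = clamp_longitude_max(longitude_min=-5.0, longitude_max=12.0)
# A 4.5-degree-wide box is left untouched
narrow_box_max = clamp_longitude_max(longitude_min=20.0, longitude_max=24.5)
```

Note that the adjustment silently shrinks the requested region from the eastern side, so the returned data may not cover the whole area that was asked for.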
NASA's API provides only historical data and supports specific values for the aggregation of the data ("hourly", "daily", "monthly", "climatology"), as well as specific date formatting. For daily and hourly data, the date "2022-05-05" has to be given as "20220505"; for monthly and climatology data, only the year part "2022" is supported.
def format_date_for_larc_power(
    start_date: str,
    end_date: str,
    aggregation: Literal["hourly", "daily", "monthly", "climatology"],
) -> Tuple[str, str]:
    """
    Formats the dates based on the aggregation in order to be ready to be used in the NASA weather request
    """
    if aggregation not in ["hourly", "daily", "monthly", "climatology"]:
        raise ValueError("Invalid aggregation value")
    if aggregation == "monthly" or aggregation == "climatology":
        # Only the year part is supported
        mod_start_date = start_date[0:4]
        mod_end_date = end_date[0:4]
    else:
        # "2022-05-05" becomes "20220505"
        mod_start_date = start_date.replace("-", "")
        mod_end_date = end_date.replace("-", "")
    return mod_start_date, mod_end_date
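The slicing approach above assumes that the input is already in "YYYY-MM-DD" form. A possible variant, sketched here with the standard library only, parses the date first so that malformed input fails early. `to_power_date` is a hypothetical name and is not part of the package.

```python
from datetime import datetime


def to_power_date(date_str: str, aggregation: str) -> str:
    """Validate a "YYYY-MM-DD" date and format it for the given aggregation."""
    # strptime raises ValueError when the string does not match the format
    parsed = datetime.strptime(date_str, "%Y-%m-%d")
    if aggregation in ("monthly", "climatology"):
        return parsed.strftime("%Y")  # year only
    if aggregation in ("hourly", "daily"):
        return parsed.strftime("%Y%m%d")
    raise ValueError("Invalid aggregation value")


daily_date = to_power_date("2022-05-05", "daily")
monthly_date = to_power_date("2022-05-05", "monthly")
```

The trade-off is a slightly stricter interface: a string such as "05/05/2022" is rejected here, whereas plain string slicing would pass it through and produce an invalid request.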
For each one of the aggregation types, a specific output comes as a response. For example, a request with "hourly" aggregation returns, for each variable, a mapping from hour strings such as "2022050501" to values.
This has to be formatted based on the type of aggregation used in order to construct our pd.DataFrame object. So, let's define some more functions:
from datetime import datetime
import pandas as pd


def convert_str_hour_date_to_datetime(str_hour_date: str) -> datetime:
    """Converts str hour (e.g. "2022050501") to datetime"""
    return datetime.strptime(str_hour_date, "%Y%m%d%H")


def convert_response_larc_power_dict_to_dataframe(
    response_dict: Dict[str, Dict[str, float]], aggregation: str
) -> pd.DataFrame:
    """
    Converts the dict coming from the response to a dataframe based on aggregation
    """
    # Outer keys (variables) become columns, inner keys (dates) become the index
    weather_df = pd.DataFrame(response_dict).reset_index()
    coming_columns = list(response_dict.keys())
    weather_df.columns = ["date"] + coming_columns
    if aggregation == "daily":
        weather_df["date"] = pd.to_datetime(weather_df["date"])
    elif aggregation == "hourly":
        weather_df["date"] = weather_df.apply(
            lambda x: convert_str_hour_date_to_datetime(x["date"]), axis=1
        )
        weather_df["date"] = pd.to_datetime(weather_df["date"])
    return weather_df
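To make the reshaping step concrete, the sketch below applies the same idea to a small hand-written dictionary. The payload is hypothetical: it only imitates the shape implied by the type hints above (variable name, then hour string, then value), and the numbers are made up.

```python
from datetime import datetime

import pandas as pd

# Hypothetical payload shaped like content["properties"]["parameter"];
# the values are made up for illustration.
sample_response = {
    "T2M": {"2022050500": 14.2, "2022050501": 13.9},
    "RH2M": {"2022050500": 71.0, "2022050501": 73.5},
}

# Outer keys become columns, inner keys (the hour strings) become the index
sample_df = pd.DataFrame(sample_response).reset_index()
sample_df.columns = ["date"] + list(sample_response.keys())

# Parse the hour strings, e.g. "2022050501" is 01:00 on 5 May 2022
sample_df["date"] = sample_df["date"].map(
    lambda value: datetime.strptime(value, "%Y%m%d%H")
)
```

The result has one row per timestamp and one column per weather variable, which is the layout most time-series tools expect.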
Now, let's define the main function:
import requests
import json


def get_larc_power_weather_data(
    start_date: str,
    end_date: str,
    coordinates: Union[Tuple[float, float], Tuple[float, float, float, float]],
    aggregation: Literal["hourly", "daily", "monthly", "climatology"] = "daily",
    community: Literal["AG", "RE", "SB"] = "RE",
    regional: bool = False,
    variables: List[str] = [
        "T2M",
        "T2MDEW",
        "T2MWET",
        "TS",
        "T2M_RANGE",
        "T2M_MAX",
        "T2M_MIN",
        "RH2M",
        "PRECTOT",
        "WS2M",
        "ALLSKY_SFC_SW_DWN",
    ],
) -> Union[pd.DataFrame, Dict[str, Dict[str, float]]]:
    """
    This function retrieves NASA's historical weather data
    """
    if aggregation not in ["hourly", "daily", "monthly", "climatology"]:
        raise ValueError("Invalid aggregation value")
    if community not in ["AG", "RE", "SB"]:
        raise ValueError("Invalid community value")
    # Basic modifications
    formatted_variables = ",".join(variables)
    mod_start_date, mod_end_date = format_date_for_larc_power(
        start_date, end_date, aggregation
    )
    if regional:
        base_url = r"https://power.larc.nasa.gov/api/temporal/{aggregation}/regional?parameters={parameters}&community={community}&latitude-min={latitude_min}&latitude-max={latitude_max}&longitude-min={longitude_min}&longitude-max={longitude_max}&start={start}&end={end}&format=JSON"
        # Bounding box order: [latmin, latmax, lonmin, lonmax]
        latitude_min = coordinates[0]
        longitude_min = coordinates[2]
        latitude_max = coordinates[1]
        longitude_max = coordinates[3]
        longitude_max = adjust_coordinates_on_limitations(longitude_max, longitude_min)
        api_request_url = base_url.format(
            latitude_min=latitude_min,
            longitude_min=longitude_min,
            latitude_max=latitude_max,
            longitude_max=longitude_max,
            start=mod_start_date,
            end=mod_end_date,
            aggregation=aggregation,
            community=community,
            parameters=formatted_variables,
        )
    else:
        base_url = r"https://power.larc.nasa.gov/api/temporal/{aggregation}/point?parameters={parameters}&community={community}&longitude={longitude}&latitude={latitude}&start={start}&end={end}&format=JSON"
        latitude = coordinates[0]
        longitude = coordinates[1]
        api_request_url = base_url.format(
            longitude=longitude,
            latitude=latitude,
            start=mod_start_date,
            end=mod_end_date,
            aggregation=aggregation,
            community=community,
            parameters=formatted_variables,
        )
    try:
        response = requests.get(url=api_request_url, verify=True, timeout=30.00)
    except requests.RequestException as e:
        # Re-raise instead of printing: without a response there is nothing to parse
        raise ConnectionError(
            f"There is an error with the NASA weather API. The error is: {e}"
        ) from e
    content = json.loads(response.content.decode("utf-8"))
    # The API reports problems with the request in the "messages" field
    if len(content["messages"]) > 0:
        raise InterruptedError(content["messages"])
    if regional:
        # Regional responses are returned as the raw dict
        return content
    else:
        selected_content_dict = content["properties"]["parameter"]
        weather_df = convert_response_larc_power_dict_to_dataframe(
            selected_content_dict, aggregation
        )
        return weather_df
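The query string can also be assembled from a dictionary instead of one long format string, which keeps each parameter on its own line. This is a sketch using only the standard library; `build_power_point_url` is a hypothetical helper, the endpoint and parameter names are copied from the format string above, and the coordinates are illustrative.

```python
from typing import List
from urllib.parse import urlencode

POWER_BASE = "https://power.larc.nasa.gov/api/temporal"


def build_power_point_url(
    aggregation: str,
    variables: List[str],
    community: str,
    latitude: float,
    longitude: float,
    start: str,
    end: str,
) -> str:
    """Build a point-request URL with the same parameters as the code above."""
    query = {
        "parameters": ",".join(variables),
        "community": community,
        "longitude": longitude,
        "latitude": latitude,
        "start": start,
        "end": end,
        "format": "JSON",
    }
    # safe="," keeps the comma-separated variable list unescaped
    return f"{POWER_BASE}/{aggregation}/point?{urlencode(query, safe=',')}"


example_url = build_power_point_url(
    "daily", ["T2M", "RH2M"], "RE", 37.98, 23.73, "20220505", "20220506"
)
```

Building the URL in a separate function also makes it possible to inspect or test the request without contacting the API.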
And the entire pipeline:
def fetch_larc_power_historical_weather_data(
    location_name: str,
    start_date,
    end_date,
    aggregation: Literal["hourly", "daily", "monthly", "climatology"] = "daily",
    community: Literal["AG", "RE", "SB"] = "RE",
    regional: bool = False,
    use_bound_box: bool = False,
    variables_to_fetch: List[str] = ["default"],
) -> Union[pd.DataFrame, Dict[str, Dict[str, float]]]:
    location, coordinates = get_location_from_name(location_name, use_bound_box)
    # l_power_base_vars_to_fetch and l_power_additional_vars_to_fetch are lists
    # of variable names defined in the repository (not shown in this article)
    if variables_to_fetch == ["default"]:
        if aggregation == "hourly":
            variables_to_fetch = l_power_base_vars_to_fetch
        else:
            variables_to_fetch = (
                l_power_base_vars_to_fetch + l_power_additional_vars_to_fetch
            )
    return get_larc_power_weather_data(
        start_date=start_date,
        end_date=end_date,
        aggregation=aggregation,
        community=community,
        regional=regional,
        coordinates=coordinates,
        variables=variables_to_fetch,
    )
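The names `l_power_base_vars_to_fetch` and `l_power_additional_vars_to_fetch` are defined in the repository, not in this article. The sketch below shows the selection logic in isolation with placeholder lists. The variable names come from the default list shown earlier, but how they are split between the two lists is an assumption made only for illustration, and `resolve_power_variables` is a hypothetical helper. It uses `None` instead of a list as the default argument, which sidesteps Python's shared mutable default pitfall.

```python
from typing import List, Optional

# Placeholder lists: the real ones live in the repository
BASE_VARS = ["T2M", "RH2M", "WS2M"]
ADDITIONAL_VARS = ["T2M_MAX", "T2M_MIN"]


def resolve_power_variables(
    aggregation: str, variables_to_fetch: Optional[List[str]] = None
) -> List[str]:
    """Return the requested variables, or defaults that depend on aggregation."""
    if variables_to_fetch is not None:
        return list(variables_to_fetch)
    if aggregation == "hourly":
        # Hourly requests use only the base variables
        return list(BASE_VARS)
    return BASE_VARS + ADDITIONAL_VARS


hourly_vars = resolve_power_variables("hourly")
daily_vars = resolve_power_variables("daily")
custom_vars = resolve_power_variables("daily", ["TS"])
```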
For more information about the returned weather variables and the different possible combinations, have a look at the repository's README file and the corresponding notebook.
The data was obtained from the National Aeronautics and Space Administration (NASA) Langley Research Center (LaRC) Prediction of Worldwide Energy Resource (POWER) Project funded through the NASA Earth Science/Applied Science Program.
Open-Meteo's API
Open-Meteo offers a wide variety of meteorological variables. Here, we will use the most common ones. Some variables are available only for historical data, whereas others are available only for forecasts. In this case, the aggregation can be either "hourly" or "daily":
def choose_meteo_default_variables(
    aggregation: Literal["hourly", "daily"], case: Literal["forecast", "historical"]
) -> List[str]:
    if aggregation == "hourly":
        default_variables = [
            "temperature_2m",
            "relativehumidity_2m",
            "dewpoint_2m",
            "apparent_temperature",
            "precipitation",
            "rain",
            "snowfall",
        ]
        # Added only for forecast requests
        if case == "forecast":
            default_variables.append("showers")
    else:
        default_variables = [
            "temperature_2m_max",
            "temperature_2m_min",
            "apparent_temperature_max",
            "apparent_temperature_min",
            "sunrise",
            "precipitation_sum",
            "rain_sum",
        ]
        # Added only for forecast requests
        if case == "forecast":
            default_variables.extend(["showers_sum", "snowfall_sum"])
    return default_variables
Let's also define the function that builds the request URL (based on historical or forecast requests):
def build_meteo_request_url(
    aggregation: Literal["hourly", "daily"],
    parameters_str: str,
    longitude: float,
    latitude: float,
    case: Literal["forecast", "historical"],
    start_date: Union[str, None],
    end_date: Union[str, None],
) -> str:
    if case == "historical":
        # The archive endpoint needs a start and an end date
        base_forecast_url = r"https://archive-api.open-meteo.com/v1/archive?latitude={latitude}&longitude={longitude}&start_date={start_date}&end_date={end_date}&{aggregation}={parameters_str}&timeformat=unixtime&timezone=auto"
        api_forecast_request_url = base_forecast_url.format(
            aggregation=aggregation,
            parameters_str=parameters_str,
            longitude=longitude,
            latitude=latitude,
            start_date=start_date,
            end_date=end_date,
        )
    else:
        base_forecast_url = r"https://api.open-meteo.com/v1/forecast?latitude={latitude}&longitude={longitude}&{aggregation}={parameters_str}&timeformat=unixtime&timezone=auto"
        api_forecast_request_url = base_forecast_url.format(
            aggregation=aggregation,
            parameters_str=parameters_str,
            longitude=longitude,
            latitude=latitude,
        )
    return api_forecast_request_url
Here we also have the main function:
def get_open_meteo_weather_data(
    coordinates: Tuple[float, float],
    aggregation: Literal["hourly", "daily"],
    case: Literal["forecast", "historical"],
    parameters: List[str] = ["default"],
    start_date: Union[str, None] = None,
    end_date: Union[str, None] = None,
) -> Tuple[
    pd.DataFrame, Dict[str, Union[str, float, Dict[str, Union[List[str], List[float]]]]]
]:
    """
    This function retrieves open-meteo historical or forecasted weather data at a location point
    """
    parameters_str = ",".join(parameters)
    latitude = coordinates[0]
    longitude = coordinates[1]
    api_forecast_request_url = build_meteo_request_url(
        aggregation=aggregation,
        parameters_str=parameters_str,
        longitude=longitude,
        latitude=latitude,
        case=case,
        start_date=start_date,
        end_date=end_date,
    )
    try:
        response = requests.get(
            url=api_forecast_request_url, verify=True, timeout=30.00
        )
    except requests.RequestException as e:
        # Catch only request errors and keep the original cause attached
        raise ConnectionAbortedError("Failed to establish connection") from e
    content = json.loads(response.content.decode("utf-8"))
    # The values are stored under the key named after the aggregation
    weather_data_df = pd.DataFrame(content[aggregation])
    # timeformat=unixtime returns the time column in seconds
    weather_data_df["time"] = weather_data_df.apply(
        lambda x: pd.to_datetime(x["time"], unit="s"), axis=1
    )
    return weather_data_df, content
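With `timeformat=unixtime`, the timestamps arrive as seconds, and the whole column can be converted in one vectorized call instead of a row-wise `apply`. The sketch below uses a made-up payload shaped like an hourly response. It also assumes that the unix timestamps are in GMT+0 and that the response carries a `utc_offset_seconds` field that can be used to shift them to local time; this is worth checking against a real response and the Open-Meteo documentation before relying on it.

```python
import pandas as pd

# Hypothetical payload; values are illustrative, not real API output
sample_content = {
    "utc_offset_seconds": 10800,
    "hourly": {
        "time": [1651708800, 1651712400],
        "temperature_2m": [15.1, 14.8],
    },
}

hourly_df = pd.DataFrame(sample_content["hourly"])
# Vectorized conversion from unix seconds to timestamps (UTC)
hourly_df["time"] = pd.to_datetime(hourly_df["time"], unit="s")
# Assumed: shifting by utc_offset_seconds yields local time
hourly_df["local_time"] = hourly_df["time"] + pd.to_timedelta(
    sample_content["utc_offset_seconds"], unit="s"
)
```

On large hourly ranges, the vectorized conversion should be noticeably faster than calling `pd.to_datetime` once per row.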
More information about the use of the arguments can be found inside the repository.
The final main function that connects everything together is:
def fetch_open_meteo_weather_data(
    location_name: str,
    aggregation: Literal["hourly", "daily"],
    case: Literal["forecast", "historical"],
    variables_to_fetch: List[str] = ["default"],
    start_date: Union[str, None] = None,
    end_date: Union[str, None] = None,
) -> Tuple[
    pd.DataFrame, Dict[str, Union[str, float, Dict[str, Union[List[str], List[float]]]]]
]:
    location, coordinates = get_location_from_name(location_name, use_bound_box=False)
    if "default" in variables_to_fetch:
        parameters = choose_meteo_default_variables(aggregation=aggregation, case=case)
    else:
        parameters = variables_to_fetch
    return get_open_meteo_weather_data(
        start_date=start_date,
        end_date=end_date,
        aggregation=aggregation,
        coordinates=coordinates,
        parameters=parameters,
        case=case,
    )
Please look at the repository's README file and the accompanying notebook for more information about the returned meteorological variables and the many ways these variables can be combined.
Final parts of the package
In order to be able to call the appropriate functions as modules, it is necessary to include an "__init__.py" file inside the same folder as the ".py" files that contain the above functions and pipelines.
This file needs to import the two main pipelines so that they can be called directly:
from weather_data_retriever.pipelines import (
    fetch_larc_power_historical_weather_data,
    fetch_open_meteo_weather_data,
)
You can also include, if you wish, a "LICENSE" file that governs the distribution and use of your package.
Finally, create a "setup.py" file:
from setuptools import setup, find_packages

# The license text is read from the LICENSE file next to setup.py
with open("LICENSE") as f:
    license = f.read()

setup(
    name="weather_data_retriever",
    version="1.0",
    author="Stavros Theocharis",
    description="Weather data retriever",
    long_description="Multiple sources weather data retriever",
    url="https://github.com/stavrostheocharis/weather_data_retriever.git",
    # exclude expects a list of package names, not a single string
    packages=find_packages(exclude=["tests"]),
    install_requires=[
        "pandas",
        "geopy",
        "requests",
    ],
    license=license,
)
And that's it! Your new weather data package is ready to be used. More enhancements can be made, but this is a huge step, and it saved me a lot of time searching and adjusting scripts to get weather data.
Conclusion
When it comes to making use of weather data or predictions, there is a significant gap between getting them from a variety of sources and combining them. Even for very simple projects, I had to log in to accounts or obtain tokens for different services several times. In many cases, these services also did not provide well-designed APIs.
In this article, I presented an easy implementation for creating a basic package for fetching weather data from two stable APIs provided by Open-Meteo and NASA. More features can be added in order to enhance functionality in the future.