Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Read by thought-leaders and decision-makers around the world. Phone Number: +1-650-246-9381 Email: [email protected]
228 Park Avenue South New York, NY 10003 United States
Website: Publisher: https://towardsai.net/#publisher Diversity Policy: https://towardsai.net/about Ethics Policy: https://towardsai.net/about Masthead: https://towardsai.net/about
Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Founders: Roberto Iriondo, , Job Title: Co-founder and Advisor Works for: Towards AI, Inc. Follow Roberto: X, LinkedIn, GitHub, Google Scholar, Towards AI Profile, Medium, ML@CMU, FreeCodeCamp, Crunchbase, Bloomberg, Roberto Iriondo, Generative AI Lab, Generative AI Lab Denis Piffaretti, Job Title: Co-founder Works for: Towards AI, Inc. Louie Peters, Job Title: Co-founder Works for: Towards AI, Inc. Louis-François Bouchard, Job Title: Co-founder Works for: Towards AI, Inc. Cover:
Towards AI Cover
Logo:
Towards AI Logo
Areas Served: Worldwide Alternate Name: Towards AI, Inc. Alternate Name: Towards AI Co. Alternate Name: towards ai Alternate Name: towardsai Alternate Name: towards.ai Alternate Name: tai Alternate Name: toward ai Alternate Name: toward.ai Alternate Name: Towards AI, Inc. Alternate Name: towardsai.net Alternate Name: pub.towardsai.net
5 stars – based on 497 reviews

Frequently Used, Contextual References

TODO: Remember to copy unique IDs whenever it needs used. i.e., URL: 304b2e42315e

Resources

Unlock the full potential of AI with Building LLMs for Productionβ€”our 470+ page guide to mastering LLMs with practical projects and expert insights!

Publication

How To Create a Python Package for Fetching Weather Data
Latest

How To Create a Python Package for Fetching Weather Data

Last Updated on January 12, 2023 by Editorial Team

Author(s): Stavros Theocharis

Originally published on Towards AI.

An easy implementation of a python package for fetching weather data from anyΒ location

Image generated byΒ DALL-E

Markets across the board rely on weather reports for a wide variety of purposes. Investment forecasts, load demand planning, supply chain management, business analytics applications, transportation distribution demands, and time-series enhanced analysis are some of the many areas of business that might benefit from weatherΒ data.

Up until now, there have been a lot of occasions on which I needed to have access to certain weather data in order to make use of it. The majority of the time, I want to integrate it into some tasks involving time series (e.g., for seasonality reasons, etc.). In the past, I would look for weather data on the Internet and either download it by hand, make a quick script to scrape it from websites, or use an open application programming interface (API).

I have sometimes utilized Open-Meteo, an open-source weather API (for non-commercial use), and I have just learned about another open API that the National Aeronautics and Space Administration (NASA) provides as part of the β€œPOWER” project (NASA β€˜s larc-power project) in order to obtain meteorological data. In contrast to NASA’s API, which only provides historical data, Open-Meteo offers a wide variety of endpoints for both types ofΒ data.

NASA has long supported satellite systems and research that provide data critical to the study of climate and climatic processes through its Earth Science research program. This funding continues today. Estimates of surface solar energy fluxes and long-term climatological averages of meteorological parameters are included in these data sets. In addition, mean daily values of the primary solar and meteorological data are supplied in a manner suitable for timeΒ series.

In order to have a robust and fully functional script for using it each time at the corresponding project, I created a GitHub repository called β€œweather data retriever”. In this repository, I have made several enhancements so that it can be used easily and quickly as a package. My goal is to have a robust, fully functional package that can be used every time weather data is needed. Through this package, one can select to use Open-Meteo’s functionality or NASA’s functionality. In order to learn how to install and directly use it, you can follow the instructions inside β€œREADME.md” and also the quick-start Jupyter notebooks. In this article, I mainly explain the logic behind the functionality and the implementation of such aΒ package.

So, let’s dive into theΒ code…

Implementation

Helpful functions

As stated in the corresponding guides of NASA’S Larc Power Project and Open-Meteo for API use, it is necessary to use the longitude and latitude of an area in order to get this area’s weather data. In order to have an easy approach, we will construct a function using the β€œgeopy package” that returns the longitude and latitude by inserting the name of theΒ area:

from geopy.geocoders import Nominatim
from typing import Tuple, List, Dict, Union, Literal

def get_location_from_name(
name: str, use_bound_box: bool = False
) -> Tuple[str, Tuple[float, float]]:
nom_loc = Nominatim(user_agent="weather_data_retriever")
try:
location = nom_loc.geocode(name)
if use_bound_box:
coordinates = location.raw["boundingbox"]
# Coordinates has sorted the values as [latmin, latmax, lonmin, lonmax]
return location[0], tuple(coordinates)

else:
return location[0], location[1]

except Exception as e:
raise ValueError("Error in finding Area & Coordinates.", e)

Since we can also use bounding boxes of an area at our API call for NASA’s weather data, we also include the functionality of getting bounding boxes in our custom function.

So now, we can get the longitude and latitude values simply by entering the name of theΒ area.

NASA’s API

The used bounding boxes have a limitation for our final request in NASA’s case. As I’ve seen, the maximum longitude should not be more than 10 points higher than theΒ minimum:

def adjust_coordinates_on_limitations(
longitude_max: Union[float, str], longitude_min: Union[float, str]
) -> str:
"""
Check if a the max and min values have more than 10 points diff.
If yes adjust it beacuse the weather API has limitations.
"""
if float(longitude_max) > float(longitude_min) + 10:
longitude_max = float(longitude_min) + 10

return str(longitude_max)

NASA’s API provides only historical data and supports specific inputs for aggregation of the data, like β€œhourly”, β€œdaily”, β€œmonthly”, β€œclimatology” and also specific date formatting (e.g., for using daily and hourly data, the date β€œ2022–05–05” has to be given as β€œ20220505”, and for monthly, climatology data, only the year part β€œ2022” is supported).

def format_date_for_larc_power(
start_date: str,
end_date: str,
aggregation: Literal["hourly", "daily", "monthly", "climatology"],
) -> Tuple[str, str]:
"""
Formats the dates based on the aggregation in order to be ready to be used in Nasa weather request
"""

if aggregation not in ["hourly", "daily", "monthly", "climatology"]:
raise ValueError("Invalid aggregation value")

if (aggregation == "monthly") | (aggregation == "climatology"):
mod_start_date = start_date[0:4]
mod_end_date = end_date[0:4]
else:
mod_start_date = start_date.replace("-", "")
mod_end_date = end_date.replace("-", "")

return mod_start_date, mod_end_date

For each one of the aggregation types, a specific output comes as a response. For example, for a random request for β€œhourly” type aggregation, weΒ get:

This has to be formatted based on the type of aggregation used in order to construct our pd.DataFrame object. So, let’s define some more functions:

from datetime import datetime
import pandas as pd

def convert_str_hour_date_to_datetime(str_hour_date: str) -> datetime:
"""Converts str hour (eg. "2022050501) to datetime"""
return datetime.strptime(str_hour_date, "%Y%m%d%H")

def convert_response_larc_power_dict_to_dataframe(
response_dict: Dict[str, Dict[str, float]], aggregation: str
) -> pd.DataFrame:
"""
Converts coming dict from response to dataframe based on aggregation
"""

weather_df = pd.DataFrame(response_dict).reset_index()
coming_columns = list(response_dict.keys())
weather_df.columns = ["date"] + coming_columns

if aggregation == "daily":
weather_df["date"] = pd.to_datetime(weather_df["date"])
elif aggregation == "hourly":
weather_df["date"] = weather_df.apply(
lambda x: convert_str_hour_date_to_datetime(x["date"]), axis=1
)
weather_df["date"] = pd.to_datetime(weather_df["date"])

return weather_df

Now, let’s define the main function:

import requests
import json

def get_larc_power_weather_data(
start_date: str,
end_date: str,
coordinates: Union[Tuple[float, float], Tuple[float, float, float, float]],
aggregation: Literal["hourly", "daily", "monthly", "climatology"] = "daily",
community: Literal["AG", "RE", "SB"] = "RE",
regional: bool = False,
variables: List[str] = [
"T2M",
"T2MDEW",
"T2MWET",
"TS",
"T2M_RANGE",
"T2M_MAX",
"T2M_MIN",
"RH2M",
"PRECTOT",
"WS2M",
"ALLSKY_SFC_SW_DWN",
],
) -> Union[pd.DataFrame, Dict[str, Dict[str, float]]]:
"""
This function retrieves NASA's historical weather data
"""

if aggregation not in ["hourly", "daily", "monthly", "climatology"]:
raise ValueError("Invalid aggregation value")

if community not in ["AG", "RE", "SB"]:
raise ValueError("Invalid community value")

# Basic modifications
formatted_variables = ",".join(variables)
mod_start_date, mod_end_date = format_date_for_larc_power(
start_date, end_date, aggregation
)

if regional:
base_url = r"https://power.larc.nasa.gov/api/temporal/{aggregation}/regional?parameters={parameters}&community={community}&latitude-min={latitude_min}&latitude-max={latitude_max}&longitude-min={longitude_min}&longitude-max={longitude_max}&start={start}&end={end}&format=JSON"
latitude_min = coordinates[0]
longitude_min = coordinates[2]
latitude_max = coordinates[1]
longitude_max = coordinates[3]
longitude_max = adjust_coordinates_on_limitations(longitude_max, longitude_min)

api_request_url = base_url.format(
latitude_min=latitude_min,
longitude_min=longitude_min,
latitude_max=latitude_max,
longitude_max=longitude_max,
start=mod_start_date,
end=mod_end_date,
aggregation=aggregation,
community=community,
parameters=formatted_variables,
)
else:
base_url = r"https://power.larc.nasa.gov/api/temporal/{aggregation}/point?parameters={parameters}&community={community}&longitude={longitude}&latitude={latitude}&start={start}&end={end}&format=JSON"
latitude = coordinates[0]
longitude = coordinates[1]
api_request_url = base_url.format(
longitude=longitude,
latitude=latitude,
start=mod_start_date,
end=mod_end_date,
aggregation=aggregation,
community=community,
parameters=formatted_variables,
)

try:
response = requests.get(url=api_request_url, verify=True, timeout=30.00)
except Exception as e:
print("There is an error with the Nasa weather API. The error is: ", e)

content = json.loads(response.content.decode("utf-8"))
if len(content["messages"]) > 0:
raise InterruptedError(content["messages"])

if regional:
return content
else:
selected_content_dict = content["properties"]["parameter"]
weather_df = convert_response_larc_power_dict_to_dataframe(
selected_content_dict, aggregation
)
return weather_df

And the entire pipeline:

def fetch_larc_power_historical_weather_data(
location_name: str,
start_date,
end_date,
aggregation: Literal["hourly", "daily", "monthly", "climatology"] = "daily",
community: Literal["AG", "RE", "SB"] = "RE",
regional: bool = False,
use_bound_box: bool = False,
variables_to_fetch: List[str] = ["default"],
) -> Union[pd.DataFrame, Dict[str, Dict[str, float]]]:

location, coordinates = get_location_from_name(location_name, use_bound_box)
if variables_to_fetch == ["default"]:
if aggregation == "hourly":
variables_to_fetch = l_power_base_vars_to_fetch
else:
variables_to_fetch = (
l_power_base_vars_to_fetch + l_power_additional_vars_to_fetch
)

return get_larc_power_weather_data(
start_date=start_date,
end_date=end_date,
aggregation=aggregation,
community=community,
regional=regional,
coordinates=coordinates,
variables=variables_to_fetch,
)

For more information about the returned weather variables and the different possible combinations, have a look at the repository’s README file and the corresponding notebook.

The data was obtained from the National Aeronautics and Space Administration (NASA) Langley Research Center (LaRC) Prediction of Worldwide Energy Resource (POWER) Project funded through the NASA Earth Science/Applied ScienceΒ Program.

Open-Meteo’s API

When utilizing Open-Meteo, one may receive a wide variety of meteorological variables. In this instance, we will be making use of the most common ones. Furthermore, certain factors are only applicable to the analysis of historical data, whereas others may be discovered in the research of forecasted data. In this case, the aggregation may be β€œhourly,” or it could beΒ β€œdaily”:

def choose_meteo_default_variables(
aggregation: Literal["hourly", "daily"], case: Literal["forecast", "historical"]
) -> List[str]:
if aggregation == "hourly":
default_variables = [
"temperature_2m",
"relativehumidity_2m",
"dewpoint_2m",
"apparent_temperature",
"precipitation",
"rain",
"snowfall",
]

if case == "forecast":
default_variables.append("showers")
else:
default_variables = [
"temperature_2m_max",
"temperature_2m_min",
"apparent_temperature_max",
"apparent_temperature_min",
"sunrise",
"precipitation_sum",
"rain_sum",
]

if case == "forecast":
default_variables.extend(["showers_sum", "snowfall_sum"])

return default_variables

Let’s also define the function for adjusting the API use (based on historical or forecasted requests):

def build_meteo_request_url(
aggregation: Literal["hourly", "daily"],
parameters_str: str,
longitude: float,
latitude: float,
case: Literal["forecast", "historical"],
start_date: Union[str, None],
end_date: Union[str, None],
) -> str:

if case == "historical":
base_forecast_url = r"https://archive-api.open-meteo.com/v1/archive?latitude={latitude}&longitude={longitude}&start_date={start_date}&end_date={end_date}&{aggregation}={parameters_str}&timeformat=unixtime&timezone=auto"
api_forecast_request_url = base_forecast_url.format(
aggregation=aggregation,
parameters_str=parameters_str,
longitude=longitude,
latitude=latitude,
start_date=start_date,
end_date=end_date,
)
else:
base_forecast_url = r"https://api.open-meteo.com/v1/forecast?latitude={latitude}&longitude={longitude}&{aggregation}={parameters_str}&timeformat=unixtime&timezone=auto"
api_forecast_request_url = base_forecast_url.format(
aggregation=aggregation,
parameters_str=parameters_str,
longitude=longitude,
latitude=latitude,
)

return api_forecast_request_url

Here we also have the main function:

def get_open_meteo_weather_data(
coordinates: Tuple[float, float],
aggregation: Literal["hourly", "daily"],
case: Literal["forecast", "historical"],
parameters: List[str] = ["default"],
start_date: Union[str, None] = None,
end_date: Union[str, None] = None,
) -> Tuple[
pd.DataFrame, Dict[str, Union[str, float, Dict[str, Union[List[str], List[float]]]]]
]:
"""
This function retrieves open-meteo historical or forecasted weather data at a location point
"""

parameters_str = ",".join(parameters)
latitude = coordinates[0]
longitude = coordinates[1]
api_forecast_request_url = build_meteo_request_url(
aggregation=aggregation,
parameters_str=parameters_str,
longitude=longitude,
latitude=latitude,
case=case,
start_date=start_date,
end_date=end_date,
)

try:
response = requests.get(
url=api_forecast_request_url, verify=True, timeout=30.00
)
except:
raise ConnectionAbortedError("Failed to establish connection")

content = json.loads(response.content.decode("utf-8"))
weather_data_df = pd.DataFrame(content[aggregation])
weather_data_df["time"] = weather_data_df.apply(
lambda x: pd.to_datetime(x["time"], unit="s"), axis=1
)

return weather_data_df, content

More information about the use of the arguments can be found inside the repository.

The final main function that connects everything togetherΒ is:

def fetch_open_meteo_weather_data(
location_name: str,
aggregation: Literal["hourly", "daily"],
case: Literal["forecast", "historical"],
variables_to_fetch: List[str] = ["default"],
start_date: Union[str, None] = None,
end_date: Union[str, None] = None,
) -> Tuple[
pd.DataFrame, Dict[str, Union[str, float, Dict[str, Union[List[str], List[float]]]]]
]:
location, coordinates = get_location_from_name(location_name, use_bound_box=False)
if "default" in variables_to_fetch:
parameters = choose_meteo_default_variables(aggregation=aggregation, case=case)
else:
parameters = variables_to_fetch

return get_open_meteo_weather_data(
start_date=start_date,
end_date=end_date,
aggregation=aggregation,
coordinates=coordinates,
parameters=parameters,
case=case,
)

Please look at the repository’s README file and the accompanying notebook for more information about the returned meteorological variables and the many ways these variables can be put together.

Final parts of theΒ package

In order to be able to call the appropriate functions as modules, it is necessary to include an β€œ__init__.py” file inside the same folder as the β€œ.py” files that will include the above functions and pipelines.

This file needs to import the two main pipelines in order to be able to be called directly:

from weather_data_retriever.pipelines import (
fetch_larc_power_historical_weather_data,
fetch_open_meteo_weather_data,
)

You can also include if you wish, a β€œLICENSE” file for arranging the distribution and use of yourΒ package.

Finally, create a β€œsetup.py” file:

from setuptools import setup, find_packages

with open("LICENSE") as f:
license = f.read()

setup(
name="weather_data_retriever",
version="1.0",
author="Stavros Theocharis",
description="Weather data retriever",
long_description="Multiple sources weather data retriever",
url="https://github.com/stavrostheocharis/weather_data_retriever.git",
packages=find_packages(exclude="tests"),
install_requires=[
"pandas",
"geopy",
"requests",
],
license=license,
)

And that’s it! Your new weather data package is ready to be used. More enhancements can be done, but this is a huge step, and it saved me a lot of time searching and adjusting scripts to get weatherΒ data.

Conclusion

When it comes to making use of weather data or predictions, there is a significant gap between getting them from a variety of sources and combining them together! Even for very simple projects, I had to connect several times to accounts or get tokens for the different apps, etc. Many times, all these apps did not also provide APIs that had been developed correctly.

In this article, I presented an easy implementation for creating a basic package for fetching weather data from two stable APIs provided by Open-Meteo and NASA. More features can be added in order to enhance functionality in theΒ future.


How To Create a Python Package for Fetching Weather Data was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming aΒ sponsor.

Published via Towards AI

Feedback ↓