Last Updated on December 23, 2020 by Editorial Team
Author(s): Louis (What’s AI) Bouchard
What if we could replace traditional weather forecasting with artificial intelligence that analyzes the weather patterns of the past 40 years to predict the future?
The traditional approach to weather forecasting relies on what we call “numerical weather prediction” models. These use mathematical models of the atmosphere and oceans to predict the weather from the current conditions. The approach was first introduced in the 1920s and produced realistic results in the 1950s thanks to computer simulations. These mathematical models work for both short- and long-term forecasts, but they are computationally heavy and cannot base their predictions on as much data as a deep neural network can. This is partly why deep learning is so promising here. Numerical weather prediction systems already use machine learning as a post-processing tool to improve their forecasts, and weather forecasting is receiving more and more attention from machine learning researchers, with promising results already.
“Improving Data-Driven Global Weather Prediction Using Deep Convolutional Neural Networks on a Cubed Sphere” is a recent paper published by researchers from the University of Washington in collaboration with Microsoft Research. They propose a new weather forecasting framework based on convolutional neural networks (CNNs) that produces stable forecasts and realistic weather patterns at lead times of several weeks and longer. The model even significantly outperforms many other techniques for short- and medium-range forecasting. As the authors acknowledge, it does not yet compete with operational numerical weather prediction systems, but this data-driven CNN is much faster and keeps improving. This shows that machine learning is a valuable tool in weather forecasting and could eventually replace current approaches, giving both faster and more accurate predictions. If you are not familiar with the concept of CNNs, I strongly invite you to check out the video I made explaining what they are and how they work (linked at the end of this article).
They called their method Deep Learning Weather Prediction (DLWP). It takes an initial atmospheric state as input and predicts the state of the atmosphere at a given future time. It does so by learning from historical weather observations: these are the data fed to the network during training, which lets it “use” that knowledge when making its predictions.
More details about the DLWP method
This is achieved in three steps. The first step is all about mapping the data onto a grid. As you may know, the most used coordinate system for the Earth is a latitude and longitude grid. But this coordinate system is a problem for neural networks because it has singularities at the poles: all the lines of longitude converge to a single point there, so the grid cells shrink and distort as you approach them. This makes it very difficult to use deep learning networks with this grid.
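One way to see why the poles are troublesome on a latitude and longitude grid is to compute how wide one degree of longitude actually is at different latitudes. The snippet below is only an illustrative sketch: the function name `cell_width_km` and the use of a spherical Earth with a mean radius of 6371 km are my own assumptions, not something taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate mean Earth radius (assumption: perfect sphere)

def cell_width_km(lat_deg, dlon_deg=1.0):
    """East-west width of a grid cell spanning dlon_deg of longitude at latitude lat_deg."""
    return EARTH_RADIUS_KM * math.cos(math.radians(lat_deg)) * math.radians(dlon_deg)

# The same one-degree step in longitude covers less and less ground toward the pole.
for lat in (0, 45, 80, 89, 90):
    print(f"latitude {lat:2d}: {cell_width_km(lat):8.3f} km per degree of longitude")
```

A convolution filter sliding over such a grid would see roughly 111 km wide cells at the equator and cells of almost zero width at the pole, so the same filter would be looking at very different physical scales depending on where it is applied.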
Instead, they approximate the data on the globe using, as the title of the paper says, a “cubed sphere” approach. This is what the National Oceanic and Atmospheric Administration uses in its global forecasting model. The figure shows an example displaying the air temperature 2 meters above ground level. The panel on the left shows the cubed-sphere grid, with the blue lines marking the boundaries between the faces of this “cube”. Each face is further divided into grid cells (48 in this setup) for more precision. Just beside it, you can see a visual representation of this cube being flattened. This is where it gets interesting for neural networks. This technique allows them to work on each cube face individually, enabling them to use two-dimensional convolutions just like the normal CNN architectures found everywhere. You might think this means that the model learns different weights and biases for each face of the cube, but that is not completely true. They used the same network for the four faces centered on the equator, and another one for the two polar faces. Here, by the “same network” I mean that the same weights are shared across these cube faces. Of course, since atmospheric motions are clockwise in the Antarctic and counterclockwise in the Arctic, the data on the Arctic face is flipped before being sent to the network, and then flipped back.
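To make the cubed-sphere idea a bit more tangible, here is a minimal sketch of how a point on the globe can be assigned to one of the six cube faces, and how one network can be shared by both polar faces with a flip. Everything here is a simplification under my own assumptions: the face labels, the local `(u, v)` coordinates, the choice of flip axis, and the helper names are hypothetical and do not come from the authors' code.

```python
import math

def latlon_to_cube_face(lat_deg, lon_deg):
    """Return (face, u, v): the cube face a point projects onto and local coords in [-1, 1]."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # Point on the unit sphere.
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    ax, ay, az = abs(x), abs(y), abs(z)
    # The largest coordinate decides which face of the enclosing cube is hit.
    if az >= ax and az >= ay:
        face = "north" if z > 0 else "south"
        u, v = x / az, y / az
    elif ax >= ay:
        face = "equator_0" if x > 0 else "equator_180"
        u, v = y / ax, z / ax
    else:
        face = "equator_90E" if y > 0 else "equator_90W"
        u, v = x / ay, z / ay
    return face, u, v

def flip_face(face_grid):
    """Mirror a 2-D face (list of rows) left to right; flipping twice restores it."""
    return [row[::-1] for row in face_grid]

def apply_polar_network(network, south_face, north_face):
    """Share one network between both poles: flip the Arctic face in, flip its output back."""
    south_out = network(south_face)
    north_out = flip_face(network(flip_face(north_face)))
    return south_out, north_out

print(latlon_to_cube_face(0, 0)[0])    # a point on the equator at longitude 0
print(latlon_to_cube_face(85, 30)[0])  # a point close to the North Pole
```

Once every grid point lives on a flat square face, an ordinary 2-D convolution can slide over that face, which is exactly what makes this representation convenient for CNNs.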
The neural network’s architecture is the second step of their approach. The specific model they used is a popular type of network in computer vision, especially for image segmentation tasks: the U-Net architecture, shown in the figure. It is basically two convolutional neural networks put together, with the second one working in reverse, in an encoder-decoder style. Here, each reddish arrow represents a 2-D convolution operating on each cubed-sphere face. The green and purple arrows indicate average pooling, which downsamples the image in the first network so that there is less data to process, and upsampling, which brings the outputs of the second network back to the original size. The blue-to-yellow lines represent skip connections. They are frequently used in the U-Net architecture to skip some layers and feed the output of an earlier layer directly as input to a later one. This mainly gives the gradient an alternative path during training, which mitigates the vanishing gradient problem: when a network is too deep, the gradient can shrink toward zero as it passes back through all the layers during backpropagation. The early layers of the network then stop updating, so they stop learning from the data and the network never converges to a good solution.
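The three non-convolutional ingredients of this U shape (average pooling, upsampling, and a skip connection) are simple enough to sketch in a few lines of plain Python. This is a toy illustration and not the authors' model: the nearest-neighbour upsampling, the single-level "U", and the names `avg_pool_2x2`, `upsample_2x` and `tiny_u_pass` are all assumptions of mine, and the `encode`, `bottleneck` and `decode` callables stand in for the learned convolutions.

```python
def avg_pool_2x2(grid):
    """Downsample by averaging each non-overlapping 2x2 block (needs even dimensions)."""
    return [
        [(grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
         for c in range(0, len(grid[0]), 2)]
        for r in range(0, len(grid), 2)
    ]

def upsample_2x(grid):
    """Upsample by repeating each value into a 2x2 block (nearest neighbour, an assumption)."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def tiny_u_pass(grid, encode, bottleneck, decode):
    """One-level U: encode, pool, process, upsample, then merge through the skip connection."""
    skip = encode(grid)                          # full-resolution features, kept for later
    coarse = bottleneck(avg_pool_2x2(skip))      # work at half resolution
    restored = upsample_2x(coarse)               # back to the original size
    # A U-Net concatenates skip and restored features along the channel axis;
    # here each cell simply carries the pair (skip_value, restored_value).
    merged = [list(zip(s_row, u_row)) for s_row, u_row in zip(skip, restored)]
    return decode(merged)

# Toy usage: identity "layers" and a decoder that averages the two channels.
field = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, 10.0, 11.0, 12.0],
         [13.0, 14.0, 15.0, 16.0]]
identity = lambda g: g
blend = lambda merged: [[(s + u) / 2.0 for s, u in row] for row in merged]
print(tiny_u_pass(field, identity, identity, blend))
```

Notice that the decoder sees both the fine detail from `skip` and the coarse context from `restored`. That second path is also what gives the gradient a shortcut around the deeper layers during training.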
Finally, they combine these predictions with sequence prediction techniques to improve and stabilize the medium- and long-range forecasts. Here, the input fields at both the current time and the current time minus 6 hours are fed into the CNN discussed in the previous step, which yields a 12-hour prediction. In other words, two outputs are produced: one at the current time plus 6 hours and one at the current time plus 12 hours. These are then fed back into the same network to predict the next two steps, and so on. The model improves by computing the mean squared error between the known data and its predictions at each step, shown in red in the figure. The mean squared error is the average of the squared differences between predicted and expected values, telling us how far the prediction is from the expected output. The total error is just the sum of all these errors, and it is minimized during training to produce the best possible output for a medium- or long-range prediction.
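The feedback loop described above can be sketched as follows. This is a hedged illustration rather than the authors' training code: states are reduced to flat lists of numbers, `model` is any callable taking two states and returning two states, and the names `rollout_loss`, `n_passes`, `persistence` and `extrapolate` are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equally sized flat lists of values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def rollout_loss(model, states, n_passes):
    """Feed the model its own outputs n_passes times and sum the MSE of every predicted step.

    states: known atmospheric states 6 hours apart; states[0] is t-6h, states[1] is t.
    model:  callable (prev, curr) -> (next1, next2), i.e. two inputs and two outputs.
    """
    prev, curr = states[0], states[1]
    total = 0.0
    for k in range(n_passes):
        next1, next2 = model(prev, curr)                 # t+6h and t+12h of this pass
        total += mse(next1, states[2 + 2 * k]) + mse(next2, states[3 + 2 * k])
        prev, curr = next1, next2                        # predictions become the next inputs
    return total

# Toy data: a two-variable "atmosphere" that changes linearly over time.
states = [[float(i), 2.0 * i] for i in range(6)]

def persistence(prev, curr):
    """Naive baseline: tomorrow looks exactly like today."""
    return list(curr), list(curr)

def extrapolate(prev, curr):
    """Continue the trend observed between the two input states."""
    step = [c - p for p, c in zip(prev, curr)]
    return ([c + s for c, s in zip(curr, step)],
            [c + 2 * s for c, s in zip(curr, step)])

print(rollout_loss(persistence, states, n_passes=2))
print(rollout_loss(extrapolate, states, n_passes=2))
```

Because the loss is summed over every step of the rollout, a model that drifts when it consumes its own predictions is penalized during training, which is the intuition behind the more stable long-range forecasts.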
And here we have our final weather forecasts. If you have any questions, please leave them in the comments and I will be sure to answer them. Their code is publicly available on GitHub, and the paper is listed in the references at the end of this article. I definitely invite you to read it for a better and more in-depth understanding of their technique. The data used in this paper is also publicly available. This could be a very interesting project to jump into if you are looking for one, as you can start from their results and try to improve them!
The paper covered: J. A. Weyn, D. R. Durran, and R. Caruana, “Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere”, Journal of Advances in Modeling Earth Systems, vol. 12, no. 9, Sep. 2020, ISSN: 1942-2466. doi: 10.1029/2020ms002109. [Online]. Available: http://dx.doi.org/10.1029/2020MS002109.
CNN explanation video: https://youtu.be/YUyec4eCEiY
AI is Predicting Faster and More Accurate Weather Forecasts was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.