Unlock the full potential of AI with Building LLMs for Production—our 470+ page guide to mastering LLMs with practical projects and expert insights!


Artificial Intelligence   Latest   Machine Learning

Towards AI newsletter #102: GenAI advances beginning to benefit weather forecasting?

Author(s): Towards AI Editorial Team

Originally published on Towards AI.

Microsoft’s Aurora, Codestral, MoRA, xAI’s raise & more.

What happened this week in AI by Louie

While there was plenty of newsflow in the LLM world again this week, we are also interested in how the LLM-fueled boom in AI research and AI compute capacity can accelerate other AI models. For several months, we have been seeing rapid progress in machine learning-based weather models, and this week Microsoft released Aurora, a 1.3 billion parameter foundation model for the atmosphere. The model can be fine-tuned for specific forecasting scenarios using techniques familiar from the LLM world, such as Low-Rank Adaptation (LoRA). It was trained on over a million hours of weather and climate simulations, and Microsoft estimates a roughly 5,000x computational speed-up over the state-of-the-art numerical Integrated Forecasting System (IFS).
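For readers less familiar with LoRA, the core idea is easy to sketch: freeze the pretrained weight and learn only a small low-rank update. The toy NumPy snippet below is illustrative only (the dimensions, rank, and scaling are made-up examples, not Aurora's actual fine-tuning setup):

```python
import numpy as np

rng = np.random.default_rng(0)

d = 512          # layer width of the (frozen) pretrained weight
r = 8            # LoRA rank, much smaller than d

W = rng.normal(size=(d, d))           # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01    # trainable low-rank factor
B = np.zeros((d, r))                  # zero-init so the update starts at 0
alpha = 16                            # LoRA scaling hyperparameter

def adapted_forward(x):
    # Effective weight is W + (alpha / r) * B @ A; only A and B are trained.
    return x @ (W + (alpha / r) * (B @ A)).T

full_params = W.size
lora_params = A.size + B.size
print(f"trainable params: {lora_params} vs {full_params} "
      f"({100 * lora_params / full_params:.1f}%)")
```

Here only about 3% of the layer's parameters are trainable, which is why LoRA-style fine-tuning of a large foundation model for a specific forecasting scenario is so much cheaper than retraining it.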

We have been watching many other exciting developments in AI weather models over the past six months, including news this week from Windborne, which is also building a large constellation of small weather balloons to gather more data than is currently available. In March, Nvidia announced its Earth-2 climate digital twin cloud platform, whose APIs include a new generative AI model called CorrDiff, a diffusion model that generates images at 12.5x higher resolution than current numerical models, 1,000x faster. Nvidia CEO Jensen Huang discussed aiming for prediction at 3-kilometer weather resolution. Google DeepMind is also working on AI weather models, including last year's GraphCast, which is based on graph neural networks.

Why should you care?

The weather forecast may be taken for granted, but better weather prediction can still hugely impact people’s lives and the economy. More accurate hurricane prediction can make all the difference in saving lives, better longer-term forecasting can boost crop production, and better local forecasting can help you plan your day out! We are pleased that the AI compute rollout and progress on models and techniques from the generative AI boom (such as diffusion models and LoRA fine-tuning) are beginning to make a difference in weather forecasting and, more broadly, leading to progress and new foundation models across different industries.

— Louie Peters — Towards AI Co-founder and CEO

This issue is brought to you thanks to Ai4:

Ai4, the world’s largest gathering of artificial intelligence leaders in business, is coming to Las Vegas — August 12–14, 2024.

Join 4500+ attendees, 350+ speakers, and 150+ AI exhibitors from 75+ countries at the epicenter of AI innovation.

Don’t wait — prices increase on May 31st. Apply today for a complimentary pass.

Register now for 41% off final prices!

Hottest News

1. Gemini 1.5 Pro and Advanced Ranks Second on the LMSYS Leaderboard

The latest LMSYS leaderboard shows that Gemini 1.5 Pro/Advanced ranks second, right behind GPT-4o, while Gemini 1.5 Flash holds the ninth position, surpassing Llama-3-70B and closely competing with GPT-4-0120.

2. Mistral Introduces Codestral, an Open-Weight Code Model

Codestral is Mistral AI’s new generative AI model, focused on coding. It has a large 32k context window and boasts proficiency in over 80 programming languages. It helps developers write and interact with code through a shared instruction and completion API endpoint.

3. xAI Raised a $6 Billion Series B Funding Round

xAI has raised $6 billion in a Series B round to expand deployment of its AI technology, including the Grok series, and to develop new products, building on a year of significant AI advancements and the Grok-1 open-source release.

4. China Invests $47 Billion in Largest Ever Chip Fund

China allocated $47.48 billion to a new chip fund to advance domestic semiconductor production, a critical step toward self-sufficiency and competitiveness in technology sectors, including AI.

5. ElevenLabs Moves Beyond Speech With AI-Generated Sound Effects

ElevenLabs introduced their newest AI Audio model, which can generate sound effects. It can generate short instrumental tracks, soundscapes, and character voices, all from a text prompt. It is now available to all users.

Five 5-minute reads/videos to keep you learning

1. Training and Finetuning Embedding Models with Sentence Transformers v3

The article discusses the release of Sentence Transformers v3.0, highlighting enhanced capabilities for training and finetuning embedding models to boost task-specific performance, and showcases the updated components, including datasets, loss functions, evaluators, and an improved trainer.
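As a rough illustration of what one of those loss functions computes, here is the math behind in-batch-negatives contrastive training (the idea behind Sentence Transformers' MultipleNegativesRankingLoss) in plain NumPy. This is a simplified sketch of the objective, not the library's implementation:

```python
import numpy as np

def in_batch_negatives_loss(anchors, positives, scale=20.0):
    """Cross-entropy over cosine similarities: each anchor should score
    its own positive higher than every other positive in the batch."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                   # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct "label" for row i is column i (its paired positive).
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 16))
# Identical anchor/positive pairs should give a near-zero loss.
print(in_batch_negatives_loss(emb, emb))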

2. Accelerating Transformers with NVIDIA cuDNN 9

The NVIDIA cuDNN is a GPU-accelerated library for accelerating deep learning primitives with state-of-the-art performance. This article details the achievable performance of cuDNN SDPA, walks through how to use it, and briefly summarizes some other notable new features of cuDNN 9.

3. Exploring Linear Regression for Spatial Analysis

Linear regression provides insightful information about spatial relationships, patterns, and trends and is a flexible and essential tool in Geographic Information Systems (GIS). This article explains linear regression in the context of spatial analysis and shows a practical example of its use in GIS.

4. LLMs Aren’t Just “Trained On the Internet” Anymore

This essay examines LLMs’ capabilities, particularly the extent to which future improvement is expected. It primarily discusses the shift in trends regarding training data and how using custom data in LLM training can lead to better output.

5. LlamaIndex Launches a Framework for Building Knowledge Graphs with LLMs

LlamaIndex recently launched Property Graphs. They can categorize nodes and relationships into types with associated metadata, treat your graph as a superset of a vector database for hybrid search, and express complex queries using the Cypher graph query language. This article introduces its capabilities in more detail.

Repositories & Tools

  1. Tarsier provides webpage perception for a minimalistic GPT-4 LangChain web agent.
  2. FlashRAG is a Python toolkit for the reproduction and development of RAG research.
  3. LaVague is an open-source large action model framework for developing AI web agents.
  4. Llmware is a unified framework for building enterprise RAG pipelines with small, specialized models.
  5. The llama3-from-scratch repository is an implementation of llama3, one matrix multiplication at a time.

Top Papers of The Week

1. Transformers Can Do Arithmetic with the Right Embeddings

The paper highlights that the addition of positional encodings to transformer models significantly enhances their ability to perform arithmetic operations, achieving up to 99% accuracy on adding 100-digit numbers and boosting performance on other reasoning tasks.

2. MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

This research observes that LoRA may limit the ability of LLMs to learn and memorize new knowledge effectively. It introduces MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters to overcome this limitation. It outperforms LoRA on memory-intensive tasks.

3. Yuan 2.0-M32: Mixture of Experts with Attention Router

This paper introduces Yuan 2.0-M32. It uses a mixture-of-experts architecture with 32 experts, of which two are active. It uses a new router network, Attention Router, for a more efficient selection of experts. Yuan 2.0-M32 surpasses Llama3–70B on MATH and ARC-Challenge benchmarks.

4. An Introduction to Vision-Language Modeling

This paper provides an overview of Vision-Language Models (VLMs), discussing their fundamentals, functioning, training techniques, and assessment strategies. It also addresses challenges related to the complex nature of visual data and the incorporation of video content for individuals new to this area of artificial intelligence research.

5. Matryoshka Multimodal Models

The paper presents Matryoshka Multimodal Models (M3), which improve the efficiency of Large Multimodal Models (LMMs) such as LLaVA by offering adjustable visual token granularity to match the complexity of images during inference.

Quick Links

  1. OpenAI introduces ChatGPT Edu, a version of ChatGPT built for universities to deploy AI to students, faculty, researchers, and campus operations.
  2. Enveda introduces PRISM, a foundation model for life’s chemistry. The model was trained on 1.2 billion small molecule mass spectra, the largest training set of small molecule mass spectra ever assembled.
  3. Perplexity introduces Pages, a powerful AI-driven content creation platform. Pages let you create, organize, and share information. You can search for any topic and instantly receive an article.

Who’s Hiring in AI

Data Engineer — Azure @Rackspace Technology (Vietnam/Remote)

Mechanical Design Engineer, Data Center Design Engineering @Amazon (Freelance/Seattle, WA, USA)

Data Engineer @AdTheorent (Remote)

Data Scientist @Homa (Paris, France)

Solutions Architect, Generative AI Specialist @NVIDIA (USA/Remote)

Principal growth data analyst — NYC @Aircall (New York, NY, USA)

Interested in sharing a job opportunity here? Contact [email protected].

If you are preparing your next machine learning interview, don’t hesitate to check out our leading interview preparation website, confetti!

Think a friend would enjoy this too? Share the newsletter and let them join the conversation.

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Feedback ↓