


TAI #118: Open source LLMs progress with Qwen 2.5 and Pixtral 12B

Last Updated on September 27, 2024 by Editorial Team

Author(s): Towards AI Editorial Team

Originally published on Towards AI.

What happened this week in AI by Louie

This week, several strong new open-source LLMs were released. Following OpenAI's major progress with its o1 "reasoning" model family last week, it was positive to see progress in open source again, albeit still behind the leading closed-source LLMs. Qwen 2.5 takes the lead in the open-source world for general language tasks. Pixtral 12B is a very powerful new small open-source multimodal model, while GRIN-MoE is now a competitor in the lowest-inference-compute LLM category.

Qwen2.5 is the latest release in the Qwen family of foundation models from Alibaba in China. The models generally lead language benchmarks among open-source models in their size categories (up to 72B parameters) and even beat the much larger Llama 3.1 405B in some cases. The new models, including Qwen2.5, Qwen2.5-Coder, and Qwen2.5-Math, feature significant improvements in areas like instruction following, coding, and mathematics, outperforming many comparable models on key benchmarks. Trained on an 18-trillion-token dataset, they support multilingual capabilities in over 29 languages and handle long-text generation of up to 8K tokens. The models are open-source, with most available under the Apache 2.0 license. There is also a stronger model, Qwen-Plus, available via API.
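
For readers who want to try the new models, here is a minimal sketch of chatting with a Qwen2.5 instruct checkpoint locally via Hugging Face transformers. The model id Qwen/Qwen2.5-7B-Instruct and the generation settings are assumptions for illustration; check the model card for the recommended usage.

```python
# Minimal sketch: chat with a Qwen2.5 instruct model via Hugging Face transformers.
# Assumes the "Qwen/Qwen2.5-7B-Instruct" checkpoint id; adjust to the size you need.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the key strengths of Qwen2.5 in two sentences."},
]
# Build the prompt with the model's own chat template, then generate.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```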

Pixtral 12B is Mistral AI's first multimodal model, featuring a 12-billion-parameter decoder and a 400-million-parameter vision encoder designed to process both images and text natively. The model now competes with the larger LLaVA-OneVision 72B for the title of strongest open-source multimodal model, while generally beating open models in its price category. It excels at multimodal tasks like chart understanding, document question answering, and reasoning, while also maintaining strong performance on text-only benchmarks such as coding and math. It can handle variable image sizes and process multiple images within a long context window of 128K tokens. It is open-sourced under the Apache 2.0 license and available through various platforms.
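
As a rough sketch of what a multimodal request looks like, the snippet below sends an image plus a text prompt to Pixtral through Mistral's hosted API using the mistralai Python client. The pixtral-12b-2409 model id, the placeholder image URL, and the exact client method are assumptions based on the v1 SDK; verify them against the current documentation before use.

```python
# Rough sketch: sending an image + text prompt to Pixtral via Mistral's hosted API.
# Assumes the mistralai v1.x Python SDK and the "pixtral-12b-2409" model id (verify both).
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="pixtral-12b-2409",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this chart show? Summarize the trend."},
                {"type": "image_url", "image_url": "https://example.com/chart.png"},  # placeholder URL
            ],
        }
    ],
)
print(response.choices[0].message.content)
```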

GRIN MoE is a 16×3.8B-parameter mixture-of-experts (MoE) LLM from Microsoft with 6.6B active parameters. It uses SparseMixer-v2 to estimate gradients for expert routing and avoids the conventional need for expert parallelism or token dropping, allowing for efficient scaling in memory- and compute-constrained environments. The model performs very well across various benchmarks, particularly given its low active parameter count, but we haven't yet seen feedback from real-world usage. The model is available under an MIT license.
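
To make the active-parameter math concrete: an MoE layer only runs the few experts its router selects per token (roughly top-2 of 16 here, which is how a 16×3.8B model ends up with about 6.6B active parameters). The toy PyTorch layer below illustrates generic top-k routing; it is not GRIN's SparseMixer-v2 estimator, just the standard pattern.

```python
# Toy top-k mixture-of-experts layer showing why only a fraction of parameters is
# active per token. Generic routing only; GRIN's SparseMixer-v2 gradient estimation
# is not reproduced here.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                                   # x: (tokens, d_model)
        weights = F.softmax(self.router(x), dim=-1)
        top_w, top_idx = weights.topk(self.top_k, dim=-1)   # pick k experts per token
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)     # renormalize their weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = top_idx[:, slot] == e
                if mask.any():                               # only selected experts do any work
                    out[mask] += top_w[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

print(TinyMoE()(torch.randn(8, 64)).shape)                   # torch.Size([8, 64])
```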

Why should you care?

We think OpenAI's new o1 models and these strong new open-source LLMs only increase the need to carefully choose the best LLM for your task. Key factors include cost, latency, capabilities on different categories of tasks, and flexibility for further adaptation via fine-tuning or techniques such as model distillation. These all vary significantly between models, and it is very easy to use a far too expensive model when a cheaper one is sufficient or even better for your specific problem category. Generally, we expect this to lead to heavy use of model routers in most advanced LLM pipelines. For example, you might direct queries that need advanced general reasoning and planning to o1 models, queries with very long input context to Gemini 1.5 Pro, and general advanced coding and multimodal tasks to Claude 3.5 Sonnet. You will likely also use open-source models in your stack, whether for cost, adaptability, or privacy and security reasons; here, you might use fine-tuned Qwen 2.5 models for specialized language tasks or Pixtral 12B for specialized multimodal tasks.
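
As a concrete illustration of the routing idea, here is a minimal sketch of a rule-based model router. The route names, model ids, and classify_query helper are hypothetical placeholders; in practice the classification step is often a small LLM call or a learned classifier sitting in front of whichever provider SDKs you use.

```python
# Minimal sketch of a rule-based model router (hypothetical model ids and helper).
# Real routers typically classify queries with a small LLM or a trained classifier.
from dataclasses import dataclass

@dataclass
class Route:
    model: str
    reason: str

ROUTES = {
    "complex_reasoning": Route("o1-preview", "multi-step planning and hard reasoning"),
    "long_context":      Route("gemini-1.5-pro", "inputs approaching millions of tokens"),
    "coding_multimodal": Route("claude-3-5-sonnet", "general advanced coding and multimodal work"),
    "specialized_text":  Route("qwen2.5-72b-finetune", "fine-tuned open-source model for a niche domain"),
    "default":           Route("gpt-4o-mini", "cheap fallback for simple queries"),
}

def classify_query(query: str) -> str:
    """Hypothetical classifier; swap in an LLM call or trained classifier here."""
    q = query.lower()
    if "prove" in q or "plan" in q:
        return "complex_reasoning"
    if len(query) > 50_000:
        return "long_context"
    if "code" in q or "image" in q:
        return "coding_multimodal"
    return "default"

def route(query: str) -> Route:
    return ROUTES.get(classify_query(query), ROUTES["default"])

print(route("Plan a migration of our billing service to a new database."))
```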

We think there is room for multiple foundation model families to provide value across open and closed-source business models. However, pre-training and post-training these foundation models is getting extremely expensive, and we expect only a limited number of companies to compete here. Most of the LLM ecosystem will likely focus on additional post-training steps and on building advanced LLM pipelines on top of foundation LLMs.

- Louie Peters, Towards AI Co-founder and CEO

In collaboration with BrightData:

The Future of AI is Powered by Web Data 🌍

As AI continues to evolve, the need for dynamic, real-time web data has never been more critical. Traditional static datasets can’t keep pace with the nuanced, ever-changing data requirements of today’s advanced AI models, particularly LLMs.

Access to real-time, unstructured web data is key to helping these models stay relevant, improve contextual understanding, and deliver more accurate insights.

Bright Data enables:

  • Seamless data access: providing businesses with organized, real-time insights from a vast array of sources.
  • Flexibility: a scalable, adaptive platform that evolves with your data needs.
  • Transparency: adhering to strict ethical and compliance standards for responsible data collection.

Learn how real-time web data is shaping the future of AI and LLMs

Hottest News

1. Microsoft Wants Three Mile Island To Fuel Its AI Power Needs

Microsoft just signed a 20-year deal for exclusive access to 835 megawatts of power from the shuttered Three Mile Island nuclear power plant. If approved by regulators, the software maker would have exclusive rights to 100 percent of the output for its AI data center needs.

2. Anthropic Introduced Contextual Retrieval

Anthropic introduced a method called "Contextual Retrieval" that uses two sub-techniques: Contextual Embeddings and Contextual BM25. The method can reduce the number of failed retrievals by 49% and, when combined with reranking, by 67%. These are significant improvements in retrieval accuracy that translate directly into better performance on downstream tasks. Users can now deploy a Contextual Retrieval solution with Claude.
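
The core idea is to have an LLM write a short, document-aware context for each chunk and prepend it before embedding and BM25 indexing. Below is a rough sketch of that step using the Anthropic Python SDK; the prompt wording, model id, and surrounding index code are illustrative assumptions, not Anthropic's exact implementation.

```python
# Rough sketch of the "contextual embeddings" step: prepend an LLM-written,
# document-aware context to each chunk before indexing. The prompt wording and the
# claude-3-haiku model id are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def contextualize_chunk(document: str, chunk: str) -> str:
    prompt = (
        "<document>\n" + document + "\n</document>\n\n"
        "Here is a chunk from the document:\n<chunk>\n" + chunk + "\n</chunk>\n\n"
        "Write a short context situating this chunk within the overall document, "
        "to improve search retrieval of the chunk. Answer with only the context."
    )
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=150,
        messages=[{"role": "user", "content": prompt}],
    )
    context = response.content[0].text.strip()
    return context + "\n\n" + chunk  # index this string with embeddings and BM25
```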

3. Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries

This paper introduced Michelangelo, a new approach for evaluating how well language models can understand and reason over long context windows. Michelangelo aims to move beyond simple "needle in the haystack" evaluations and design more challenging evaluation tasks that require the model to extract and leverage the latent semantic relationships within the text.

4. We Are in the Intelligence Age by Sam Altman

In a victory-lap blog post, Sam Altman declares that deep learning works and gets predictably better with scale. As AI progresses, we will soon be able to work with highly capable AI, which will help us accomplish much more than we ever could without it. However, the dawn of the Intelligence Age is a momentous development with very complex and extremely high-stakes challenges.

Five 5-minute reads/videos to keep you learning

1. The Open Source Project Maintainer’s Guide

This post shares a list of mistakes to avoid if you are looking for contributors for your project. It also highlights how making it easier for people to contribute makes them more likely to do so.

2. AI vs. Human Engineers: Benchmarking Coding Skills Head-to-Head

CodeSignal’s latest report compares top AI models with human engineers using real-world coding assessments. These assessments evaluate general coding abilities and edge-case thinking, providing practical insights that help inform the design of AI-co-piloted assessments.

3. How Streaming LLM APIs Work

This guide explains how the HTTP streaming APIs from various hosted LLM providers work, investigating three in particular: OpenAI, Anthropic Claude, and Google Gemini.
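
For a flavor of what these APIs look like from the client side, here is a minimal sketch of consuming OpenAI's streamed chat completions (server-sent events under the hood) with the official Python SDK. The model id is an assumption; Anthropic and Gemini expose analogous but differently shaped streams.

```python
# Minimal sketch: consuming a streamed chat completion with the OpenAI Python SDK.
# Under the hood the API returns server-sent events; the SDK yields parsed chunks.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model id for illustration
    messages=[{"role": "user", "content": "Explain server-sent events in one paragraph."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:                      # the final chunk carries no content
        print(delta, end="", flush=True)
print()
```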

4. How I Deal With Hallucinations at an AI Startup

This article shares the key principles to focus on when designing solutions that may be prone to hallucinations. It also highlights the difference between weak and strong grounding.

5. Fine-Tuning LLMs to 1.58bit: Extreme Quantization Made Easy

BitNet is a special transformer architecture that offers extreme quantization of just 1.58 bits per parameter. However, it requires training a model from scratch. While the results are impressive, not everybody has the budget to pre-train an LLM. This article explores a few techniques for fine-tuning an existing model down to 1.58 bits.
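
For intuition, 1.58 bits corresponds to each weight taking one of three values {-1, 0, +1} (log2(3) ≈ 1.58). BitNet b1.58 typically uses an absmean scheme: scale the weight matrix by its mean absolute value, round, and clip to that ternary set. Here is a small PyTorch sketch of that quantizer; it illustrates the idea rather than the article's exact fine-tuning recipe.

```python
# Sketch of absmean ternary (1.58-bit) weight quantization as used in BitNet b1.58.
# Each weight is mapped to {-1, 0, +1} with a single per-tensor scale.
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    scale = w.abs().mean().clamp(min=eps)          # per-tensor scale
    w_q = (w / scale).round().clamp(-1, 1)         # ternary values in {-1, 0, +1}
    return w_q, scale                              # dequantize as w_q * scale

w = torch.randn(4, 4)
w_q, scale = absmean_ternary_quantize(w)
print(w_q)                                         # entries are only -1, 0, or 1
print((w - w_q * scale).abs().mean())              # quantization error
```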

Repositories & Tools

  1. Javascript Algorithms contains JavaScript-based examples of many popular algorithms and data structures.
  2. optillm is an OpenAI API-compatible optimizing inference proxy that implements several techniques to improve LLMs’ accuracy and performance.
  3. Solidroad is an AI-first training and assessment platform.
  4. Agent Zero is a personal and organic AI framework for tasks.

Top Papers of The Week

1. Training Language Models to Self-Correct via Reinforcement Learning

This paper developed SCoRe, a multi-turn online reinforcement learning approach that significantly improves an LLM’s self-correction ability using entirely self-generated data. When applied to Gemini 1.0 Pro and 1.5 Flash models, SCoRe improved the base models’ self-correction by 15.6% and 9.1%, respectively, on the MATH and HumanEval benchmarks.

2. OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs

This paper introduces the One-pass Generation and retrieval framework (OneGen), designed to improve LLMs' performance on tasks requiring both generation and retrieval. The framework incorporates retrieval tokens generated autoregressively, enabling a single LLM to handle both tasks within a unified forward pass.

3. Eureka: Evaluating and understanding progress in AI

This paper presents Eureka, an open-source framework for standardizing evaluations of large foundation models beyond single-score reporting and rankings. It also introduces Eureka-Bench, an extensible collection of benchmarks for testing capabilities. The authors analyze 12 state-of-the-art models, providing in-depth insights into failure understanding and model comparison that can be leveraged to plan targeted improvements.

4. Symbolic Prompt Program Search: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization

This paper introduces SAMMO, a framework to perform symbolic prompt program searches for compile-time optimizations of prompt programs. SAMMO generalizes previous methods and improves the performance of complex prompts on instruction tuning, RAG pipeline tuning, and prompt compression across several different LLMs.

5. Neptune: The Long Orbit to Benchmarking Long Video Understanding

This paper introduces Neptune, an evaluation benchmark that includes tough multiple-choice and open-ended questions for videos of variable lengths up to 15 minutes long. Neptune’s questions are designed to require reasoning over multiple modalities (visual and spoken content) and long time horizons, challenging the abilities of current large multimodal models.

6. Wings: Learning Multimodal LLMs without Text-only Forgetting

This paper presents Wings, an MLLM that excels in text-only dialogues and multimodal comprehension. The experimental results demonstrate that Wings outperforms equally-scaled MLLMs in text-only and visual question-answering tasks.

Quick Links

1. Google Quantum AI demonstrates a quantum memory system that greatly reduces error rates. The quantum computer uses multiple physical qubits to create one logical qubit, and errors are corrected using an error-correction scheme known as the "surface code".

2. Former Apple design chief Jony Ive has confirmed that he’s working with OpenAI CEO Sam Altman on an AI hardware project. There aren’t many details on the project. Ive reportedly met Altman through Brian Chesky, the CEO of Airbnb, and the venture is being funded by Ive and Laurene Powell Jobs’ company.

Who’s Hiring in AI

PhD Research Intern, Generalist Embodied Agents Research - Fall 2024 @NVIDIA (Santa Clara, CA, USA)

AI Market Lead - Defense & Intel @Accenture (Arlington, TX, USA)

Lead Research Engineer - Prompt Engineering @GE Vernova (Bangalore, India)

Software Engineer III, Machine Learning, Google Research @Google (Zurich, Switzerland)

Senior AI Research Scientist - LLM Agent @Bosch Group (Sunnyvale, CA, USA)

Senior Technical Support Engineer @Salesforce (Japan/Remote)

Data Analytics Manager @Sei Foundation (Remote)

Interested in sharing a job opportunity here? Contact [email protected].

Think a friend would enjoy this too? Share the newsletter and let them join the conversation.

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI, from research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI
