
Qwen 2.5 Coder 32B: Is the Best Open-Weight Model Better Than GPT-4o and Claude 3.5 Sonnet?

Last Updated on November 18, 2024 by Editorial Team

Author(s): Barhoumi Mosbeh

Originally published on Towards AI.


On November 11, Alibaba announced its most advanced coding model to date: Qwen 2.5-Coder-32B-Instruct. But that's not all: it's part of a whole family of coding models. In addition to the 32B model, there are versions with 0.5B, 1.5B, 3B, 7B, and 14B parameters. Before writing this article, I reviewed feedback from many YouTubers, writers, and technical experts, and the consensus has been overwhelmingly positive. Today, we'll explore whether it really lives up to the hype.

I've been using ChatGPT, Gemini, and Claude for a while, and I can confidently say that Claude is the best at coding and understanding complex tasks. To be honest, Gemini is in a different league altogether. Fun fact: two years ago I won a machine learning competition using Bard (since rebranded as Gemini) because everyone else was using ChatGPT, and Bard's performance gave me a competitive edge back then. Today, I believe Gemini might be the best vision model available, but overall, I'm not a big fan.

To evaluate the model's coding performance, we'll cover three key aspects: code generation, code repair, and code reasoning. We also need to consider how many programming languages the model supports and how well it aligns with human preferences.

Overview

Before testing out Alibaba’s model, let’s take a look at the performance results they have announced.

Benchmark results announced by the Qwen team (source)

Based on its internal benchmarks, the model even outperforms GPT-4o and comes very close to Claude 3.5 Sonnet. I personally don't place too much weight on internal benchmarks alone, but the results do seem to hold up according to those who have tried it.

You can find more details about the performance here:

Qwen2.5-Coder-32B-Instruct – Qwen2

Now it is the time, Nov. 11 10:24! The perfect time for our best coder model ever! Qwen2.5-Coder-32B-Instruct!

qwen2.org

I’m testing the model locally using Ollama:

qwen2.5-coder:32b

The latest series of Code-Specific Qwen models, with significant improvements in code generation, code reasoning, and…

ollama.com
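
As a quick aside (my own sketch, not from the original post), this is roughly how you can query the locally served model from Python. It assumes the official ollama Python client (pip install ollama), a running Ollama server, and that the weights have already been pulled with ollama pull qwen2.5-coder:32b:

import ollama  # assumes `pip install ollama` and a running Ollama server

# Send a quick coding prompt to the locally served 32B model
response = ollama.chat(
    model="qwen2.5-coder:32b",
    messages=[
        {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}
    ],
)
print(response["message"]["content"])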

You can also try out the model on the demo page of Hugging Face:

Qwen2.5 Coder Demo – a Hugging Face Space by Qwen

Discover amazing ML apps made by the community

huggingface.co

Code Generation

Many people have already covered how well it performs on web development and design tasks, so we will focus on testing it with some Python challenges. Let's start with a medium-level LeetCode problem:

Prompt

The problem description: Longest Substring Without Repeating Characters

Answer

class Solution(object):
    def lengthOfLongestSubstring(self, s):
        """
        :type s: str
        :rtype: int
        """
        # Dictionary to store the last seen index of each character
        char_index_map = {}
        # Initialize the start of the current window and the maximum length
        start = 0
        max_length = 0

        for end in range(len(s)):
            # If the character is already in the map and its index is within the current window
            if s[end] in char_index_map and char_index_map[s[end]] >= start:
                # Move the start to the right of the last occurrence of the current character
                start = char_index_map[s[end]] + 1

            # Update the last seen index of the current character
            char_index_map[s[end]] = end

            # Update the maximum length of the substring found so far
            max_length = max(max_length, end - start + 1)

        return max_length
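
As a quick sanity check of my own (not part of the model's output), the canonical examples from the problem statement return the expected answers:

# Sanity check with the examples from the LeetCode problem statement
solver = Solution()
print(solver.lengthOfLongestSubstring("abcabcbb"))  # 3 ("abc")
print(solver.lengthOfLongestSubstring("bbbbb"))     # 1 ("b")
print(solver.lengthOfLongestSubstring("pwwkew"))    # 3 ("wke")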

Results

Image from the author

Not the best solution, but it gets the job done and passes all the tests. Can it solve a hard problem? (Note that the model may have been trained on these problems already, but it's still impressive.)

Prompt

Problem description: Regular Expression Matching

Answer

class Solution(object):
    def isMatch(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: bool
        """
        m, n = len(s), len(p)

        # Create a 2D array to store the results of subproblems
        dp = [[False] * (n + 1) for _ in range(m + 1)]

        # Base case: empty string matches empty pattern
        dp[0][0] = True

        # Handle patterns like a*, a*b*, etc., which can match an empty string
        for j in range(1, n + 1):
            if p[j - 1] == '*':
                dp[0][j] = dp[0][j - 2]

        # Fill the dp table
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if p[j - 1] == '.' or p[j - 1] == s[i - 1]:
                    dp[i][j] = dp[i - 1][j - 1]
                elif p[j - 1] == '*':
                    dp[i][j] = dp[i][j - 2]  # '*' matches zero occurrences
                    if p[j - 2] == '.' or p[j - 2] == s[i - 1]:
                        dp[i][j] = dp[i][j] or dp[i - 1][j]  # '*' matches one or more occurrences

        # The result is in dp[m][n]
        return dp[m][n]
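
Again, a quick sanity check of my own against the examples from the problem statement:

# Sanity check with the examples from the LeetCode problem statement
solver = Solution()
print(solver.isMatch("aa", "a"))                    # False
print(solver.isMatch("aa", "a*"))                   # True ('a*' matches "aa")
print(solver.isMatch("aab", "c*a*b"))               # True ('c*' matches zero characters)
print(solver.isMatch("mississippi", "mis*is*p*."))  # False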

Results

Image from the author

After testing it on 10–20 LeetCode problems, my impression is that it was trained on all of them, so this isn't the best way to evaluate it. It's still worth sharing, though. It also always provides a Python solution, even when not specifically asked for one. Let's try something different: I asked the model to optimize the code to minimize resource consumption as much as possible.

Image from the author

I didn't expect that: when I asked the model to improve performance, it failed even on the simplest cases.

Let's now move to the next test, code repair. I'll give the model some buggy code generated by Claude (I asked Claude to produce some hard-to-fix cases) and see how it performs.

Code Repair

Prompt

The following code contains several bugs. Please identify and fix them,
and provide an explanation for each bug you find.

class AsyncTaskQueue:
    def __init__(self):
        self.tasks = []
        self.running = False
        self.lock = threading.Lock()
        self.workers = []

    async def add_task(self, task):
        async with self.lock:
            self.tasks.append(task)
            if not self.running:
                self.start_processing()

    async def start_processing(self):
        self.running = True
        while self.tasks:
            task = self.tasks.pop(0)
            worker = asyncio.create_task(self.process_task(task))
            self.workers.append(worker)
            if len(self.workers) >= 3:
                done, pending = await asyncio.wait(self.workers)
                self.workers = list(pending)
        self.running = False

    async def process_task(self, task):
        result = await task()
        return result

Answer

Certainly! Let's go through the provided code and identify the issues, then correct them. Here are the main problems and their fixes:

Issues Identified

1. Threading lock usage with asyncio: threading.Lock is not suitable for use with asyncio. Instead, you should use asyncio.Lock.
2. Async context manager for the lock: the async with statement is used incorrectly with threading.Lock. It should be used with asyncio.Lock.
3. Starting processing: the start_processing method should be called in a non-blocking manner. Since start_processing is an async method, it should be awaited properly.
4. Task management: the start_processing method should handle task processing more efficiently. The current implementation might not handle task completion correctly.
5. Worker management: the workers list should be managed more carefully to ensure that tasks are completed before marking the queue as not running.

Corrected Code

Here's the corrected version of the code:

import asyncio
import threading

class AsyncTaskQueue:
    def __init__(self):
        self.tasks = []
        self.running = False
        self.lock = asyncio.Lock()
        self.workers = []

    async def add_task(self, task):
        async with self.lock:
            self.tasks.append(task)
            if not self.running:
                self.running = True
                asyncio.create_task(self.start_processing())

    async def start_processing(self):
        while True:
            async with self.lock:
                if not self.tasks:
                    self.running = False
                    break
                task = self.tasks.pop(0)

            worker = asyncio.create_task(self.process_task(task))
            self.workers.append(worker)

            if len(self.workers) >= 3:
                done, pending = await asyncio.wait(self.workers, return_when=asyncio.FIRST_COMPLETED)
                self.workers = list(pending)

    async def process_task(self, task):
        result = await task()
        return result

# Example usage
async def example_task():
    await asyncio.sleep(1)
    print("Task completed")
    return "Result"

async def main():
    queue = AsyncTaskQueue()
    await queue.add_task(example_task)
    await queue.add_task(example_task)
    await queue.add_task(example_task)
    await queue.add_task(example_task)
    await queue.add_task(example_task)
    await queue.add_task(example_task)

asyncio.run(main())
Explanation of Fixes

1. Asyncio lock: replaced threading.Lock with asyncio.Lock to ensure compatibility with asynchronous operations.
2. Async context manager for the lock: used async with self.lock: to properly manage the lock in an asynchronous context.
3. Starting processing: changed the logic to start processing in a non-blocking manner by using asyncio.create_task(self.start_processing()) inside add_task.
4. Task management: modified start_processing to continuously check for tasks and process them until no tasks are left. It breaks out of the loop when there are no more tasks and sets self.running to False.
5. Worker management: ensured that workers are managed correctly by using asyncio.wait with return_when=asyncio.FIRST_COMPLETED to handle task completion efficiently.

Example Usage

The example usage demonstrates how to add tasks to the AsyncTaskQueue and run them asynchronously. The main function adds several tasks to the queue, and they are processed concurrently with a maximum of 3 workers at a time.

This corrected and optimized code should perform better and be more reliable.

Results and Feedback

Claude rated the model's response 7.5/10, with the following feedback:

While the model provided a functionally correct solution, it delivered a "tutorial-grade" fix rather than a "production-grade" implementation. This indicates that for complex concurrent programming tasks, human expertise is still crucial for robust system design.

After experimenting with the model, I find it gets the job done but often requires follow-up prompts to improve its responses. It doesn't struggle to understand the task at hand and can correct most critical bugs, but it still lacks some of the strengths of models like ChatGPT and Claude, particularly in error handling, resource management, and optimization. I believe these are the model's key weaknesses.
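
To make that gap concrete, here is a rough sketch of my own (an illustration, not output from Qwen or Claude) of the kind of hardening a production version of this queue would typically add on top of the generated fix: guarding each task against exceptions and draining in-flight workers on shutdown.

# Hypothetical hardening, written as methods inside AsyncTaskQueue
# (my illustration, not model output)

async def process_task(self, task):
    # Guard each task so one failure isn't lost and doesn't kill the loop;
    # a real system would log, retry, or dead-letter the failure here.
    try:
        return await task()
    except Exception as exc:
        print(f"task failed: {exc!r}")
        return None

async def shutdown(self):
    # Wait for in-flight workers before tearing the queue down,
    # something the generated fix never does.
    if self.workers:
        await asyncio.gather(*self.workers, return_exceptions=True)
        self.workers = []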

Specialized LLMs: A New Era in AI

It’s fascinating to witness the emergence of relatively small, domain-specific language models that can run locally. I believe we’re entering an exciting new era of specialized LLMs, and this is just the beginning. While the concept isn’t new, we’re finally seeing models that truly excel in specific domains.

The combination of RAG (retrieval-augmented generation) and specialist LLMs could define the AI landscape in the coming months and years. We're likely to see more powerful coding-focused models from industry leaders like OpenAI and Anthropic. Programming is perhaps one of the most natural domains for specialized AI, and we might soon see even more focused models. Imagine LLMs specifically optimized for DevOps or front-end development!

Don’t feel overwhelmed by these rapid advances. Yes, it can be somewhat daunting to see LLMs mastering skills that traditionally took years to develop. Coding, which has challenged humanity for decades, is being transformed before our eyes. But rather than seeing this as an endpoint, we should view it as an opportunity for growth and innovation.

Whether the current wave of LLM advances slows down in the coming years or this is merely the beginning of a longer journey, our response should remain the same: stay curious, keep learning, and never stop innovating. The future of technology is being written right now, and we all have a part to play in shaping it.


Published via Towards AI
