Grok 3's DeepSearch with Google's new AI Mode (Search)
Author(s): Nehdiii Originally published on Towards AI. Generative AI is reshaping the way we search, and it's no longer limited to tools like Perplexity or ChatGPT. Many advanced AI users I speak with regularly rely on xAI's Grok 3 for everyday search …
What is Vibe Coding?
Author(s): Nehdiii Originally published on Towards AI. Image Source I've observed two intriguing trends that I believe will develop in parallel as the future of work unfolds. One reflects AI reasoning models leveraging agentic workflows to rethink traditional scientific methods like Google's …
Is AGI merely a Silicon Valley illusion?
Author(s): Nehdiii Originally published on Towards AI. From OpenAI to DeepSeek, everyone now claims to be an AGI startup, but by 2025, the explosion of such companies is becoming overwhelming. On 14 April 2023, High-Flyer announced the start of an artificial general …
DeepSeek Explained Part 5: DeepSeek-V3-Base
Author(s): Nehdiii Originally published on Towards AI. Vegapunk No.05 One Piece Character Generated with ChatGPT This article is the fifth installment of our DeepSeek series and the first to specifically highlight the training methodology of DeepSeek-V3 [1, 2]. As illustrated in the …
Llama 4: Is Meta Sounding the Alarm?
Author(s): Nehdiii Originally published on Towards AI. Image Generated by ChatGPT Llama 2 and Llama 3 marked major milestones in AI during their release years, but Llama 4 feels like a misstep. Despite bold shifts in scale, design, and tone, Meta hasn't …
OpenAI's o3: Over-Optimization Returns Stranger Than Ever
Author(s): Nehdiii Originally published on Towards AI. Over-optimization is a well-known issue in reinforcement learning (RL), including RL from human feedback (RLHF), which powers models like ChatGPT, and now in emerging reasoning models. Each context presents its own flavor of the problem …
DeepSeek-V3 Explained Part 4: Multi-Token Prediction
Author(s): Nehdiii Originally published on Towards AI. Vegapunk No.04 One Piece Character Generated with ChatGPT This is the fourth article in our DeepSeek-V3 series, where we explain the final major architectural innovation in DeepSeek [1, 2] models: multi-token prediction. In previous articles, …
DeepSeek R1: Pioneering Research and Engineering as a Competitor to Pure Scaling Approaches
Author(s): Nehdiii Originally published on Towards AI. Dr Vegapunk from One Piece anime image generated with ChatGPT DeepSeek-R1 landed unexpectedly just as many researchers, myself included, were attempting to reverse-engineer OpenAI's o1 model. It revealed the inner workings of o1 and dispelled …
Have o1 Models Solved Human Reasoning?
Author(s): Nehdiii Originally published on Towards AI. Image Generated By ChatGPT OpenAI made waves in the AI community with the release of their o1 models. As the excitement settles, I feel it's the perfect time to share my thoughts on LLMs' reasoning …
DeepSeek-V3 Part 3: Auxiliary-Loss-Free Load Balancing
Author(s): Nehdiii Originally published on Towards AI. This is the third article in our DeepSeek-V3 series, where we explore another key architectural breakthrough in DeepSeek [1, 2, 3] models related to Mixture-of-Experts (MoE): Auxiliary-Loss-Free Load Balancing [5]. Vegapunk No.03 One Piece Character …
DeepSeek-V3 Part 2: DeepSeekMoE
Author(s): Nehdiii Originally published on Towards AI. This article marks the second entry in our DeepSeek-V3 series, focusing on a pivotal architectural breakthrough in the DeepSeek models [1, 2, 3]: DeepSeekMoE [4]. Vegapunk No.02 One Piece Character Generated with ChatGPT In this …
DeepSeek-V3 Explained, Part 1: Understanding Multi-Head Latent Attention
Author(s): Nehdiii Originally published on Towards AI. Vegapunk No.01 One Piece Character Generated with ChatGPT This is the first article of our new series "DeepSeek-V3 Explained", where we will try to demystify DeepSeek-V3 [1, 2], the latest model open-sourced by DeepSeek. In …
Extracting Actionable Rules from Raw Data
Author(s): Nehdiii Originally published on Towards AI. Image by DALL-E 3 When working with products, we often encounter situations where introducing certain "rules" becomes necessary. Let me clarify what I mean by "rules" through some practical examples: Imagine we're facing a surge …
🧠 From CLIP to the Future: A Deep Dive into Vision-Language Models for Vision Tasks
Author(s): Nehdiii Originally published on Towards AI. From recognizing faces in photos to detecting objects in real-time videos, computer vision has revolutionized the way machines "see" the world. Tasks like image classification, object detection, segmentation, and even person re-identification (ReID) have seen …