Ollama vs vLLM vs Unsloth: A Detailed Comparison from an AI Engineer’s Perspective
Author(s): Neel Shah Originally published on Towards AI. As an AI engineer, choosing the right tool for deploying or fine-tuning large language models (LLMs) is crucial for balancing performance, ease of use, and hardware constraints. Among the many options, Ollama, vLLM, and …
Mastering Authentication in MCP: An AI Engineer’s Comprehensive Guide
Author(s): Neel Shah Originally published on Towards AI. As an AI engineer working with the Model Context Protocol (MCP), I’ve implemented and evaluated three authentication methods to secure client-server communication: API-key-based, JWT-based with a custom implementation, and JWT-based with FastMCP’s built-in authentication. …
Concurrent vs. Parallel Execution in LLM API Calls: From an AI Engineer’s Perspective
Author(s): Neel Shah Originally published on Towards AI. As an AI engineer, designing systems that interact with Large Language Models (LLMs) like Google’s Gemini is a daily challenge. LLM API calls are inherently I/O-bound — waiting for responses from remote servers — …
From Simple RAG to Agentic RAG: Unlocking Smarter AI Workflows as an AI Engineer
Author(s): Neel Shah Originally published on Towards AI. As an AI engineer who’s spent countless hours tweaking retrieval systems and wrestling with hallucinations in large language models (LLMs), I’ve seen firsthand how Retrieval-Augmented Generation (RAG) has evolved from a straightforward tool into …
Kafka’s Role in MLOps: Scalable and Reliable Data Streams
Author(s): Neel Shah Originally published on Towards AI. Kafka: The Unified Event Streaming Platform. 1. Kafka’s Core Value Proposition: A Unified Event Streaming Platform. Apache Kafka is frequently compared to message brokers like RabbitMQ or ActiveMQ, but this comparison is incomplete. Kafka …