
Running LLM Models locally with Docker

Last Updated on March 3, 2026 by Editorial Team

Author(s): Kushal Banda

Originally published on Towards AI.

If you’ve been building AI applications recently, you’re likely familiar with the friction of managing API keys, tracking usage costs, and relying on cloud endpoints like OpenAI or Anthropic. But what if you could spin up Large Language Models (LLMs) locally with the exact same workflow you use for your databases and web apps?

Enter Docker Model Runner (DMR).

Just as Docker revolutionized how we pull, manage, and run application containers, it is now bringing that exact same simplicity to AI models. In this guide, we will explore how you can use Docker to pull LLMs, chat with them in your terminal, and seamlessly integrate them into your existing Node.js applications as a drop-in replacement for OpenAI APIs.

Enable Docker Model Runner and host-side TCP support (OpenAI-compatible APIs)


Why Run Models Locally?

Before we dive into the “how,” let’s quickly address the “why.” Running models locally on your own hardware offers several massive benefits:

  1. Zero Cost: No need to keep topping up API credits just to test your application logic.
  2. Privacy: Your prompts and data never leave your local machine.
  3. Offline Development: Code from planes, trains, or anywhere without an internet connection.

With Docker’s new capabilities, getting a model up and running is now as easy as pulling a PostgreSQL image.

Step 1: Enabling Docker Model Runner

To get started, you need to have Docker Desktop installed on your machine. Once installed, we need to flip a couple of switches to enable the AI features.

  1. Open Docker Desktop and navigate to Settings.
  2. Look for the AI tab on the sidebar.
  3. Check the box for Enable Docker Model Runner.
  4. Check the box for Enable host-side TCP support.

Note: Enabling TCP support is crucial. This exposes a local REST API on port 12434, which we will use later to connect our code to the model.

Step 2: Pulling and Running Models via CLI

Docker has introduced a new command namespace: docker model. If you know how to use standard Docker commands, you already know how to manage LLMs.

You can browse available models on Docker Hub’s Model Repository. For this tutorial, let’s use a lightweight model like Gemma 3 (around 571MB), which runs smoothly even without a massive GPU.

Open your terminal and pull the model:

docker model pull ai/gemma3-qat

Once downloaded, you can view your available local models using:

docker model list


To start interacting with the model immediately, simply run:

docker model run ai/gemma3-qat

Boom! You are now in an interactive chat environment directly in your terminal. You can say “Hi” or ask it math questions, and it will respond entirely locally.


Step 3: The Magic of OpenAI Compatible APIs

Chatting in the terminal is cool, but as developers, we want to integrate these models into our code. This is where Docker Model Runner truly shines.

Instead of forcing you to learn a completely new SDK to talk to your local models, Docker exposes an OpenAI-compatible REST API. This means the Docker runner expects the exact same JSON request body as OpenAI and returns the exact same response structure.

If you have an existing application built with the openai npm package, you can point it to your local Docker model without rewriting your application logic!

Node.js Integration Example

Let’s set up a quick Node.js playground to see this in action.

Initialize a new project and install the official OpenAI SDK:

mkdir docker-AI
cd docker-AI
pnpm init
pnpm install openai

(Make sure to add "type": "module" to your package.json to use ES modules).
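For reference, a minimal package.json for this playground might look like the following (the version number is illustrative; use whatever pnpm installs):

```json
{
  "name": "docker-ai",
  "type": "module",
  "dependencies": {
    "openai": "^4.0.0"
  }
}
```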

Now, create an index.js file. Notice how we use the standard OpenAI SDK, but we override the baseURL to point to Docker's local TCP port (http://localhost:12434/v1):

import OpenAI from 'openai';

// 1. Initialize the client
// Point the baseURL to the Docker Model Runner port
const client = new OpenAI({
  baseURL: 'http://localhost:12434/v1',
  apiKey: 'local-docker', // The API key is not validated locally; any string works
});

async function runLocalModel() {
  try {
    // 2. Use the exact same standard OpenAI syntax
    const response = await client.chat.completions.create({
      model: 'ai/gemma3-qat', // Use your local Docker model name here
      messages: [
        { role: 'user', content: 'Write Python code to search for a node in a binary tree.' }
      ],
    });
    console.log('Response:', response.choices[0].message.content);
  } catch (error) {
    console.error('Error connecting to local model:', error);
  }
}

runLocalModel();

When you run this script (node index.js), the request is routed to your locally running Docker model instead of OpenAI's servers.

Conclusion

Docker has successfully merged the worlds of containerization and AI development. By treating AI models as pull-able, runnable artifacts that expose industry-standard APIs, Docker Model Runner acts as the perfect bridge for developers wanting to experiment locally without opening their wallets.

While you might still deploy to production using hosted endpoints (like OpenAI or Gemini) for heavy lifting, building and testing locally with Docker is an absolute game-changer. Give it a try on your machine today!
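One way to keep that local-versus-hosted split manageable is a small configuration switch. The sketch below is only one approach; the environment variable names (USE_LOCAL_LLM, OPENAI_API_KEY beyond its conventional use) and the hosted model name are illustrative, not part of any SDK:

```javascript
// Sketch: choose the local Docker Model Runner endpoint during development
// and a hosted endpoint in production. Env var names are illustrative.
function resolveClientConfig(env) {
  if (env.USE_LOCAL_LLM === 'true') {
    return {
      baseURL: 'http://localhost:12434/v1',
      apiKey: 'local-docker', // not validated by the local runner
      model: 'ai/gemma3-qat',
    };
  }
  return {
    baseURL: 'https://api.openai.com/v1',
    apiKey: env.OPENAI_API_KEY,
    model: 'gpt-4o-mini', // whichever hosted model you deploy with
  };
}

const cfg = resolveClientConfig(process.env);
console.log(cfg.baseURL);
```

Because both endpoints speak the same API, `new OpenAI({ baseURL: cfg.baseURL, apiKey: cfg.apiKey })` works unchanged in either mode.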


Published via Towards AI

