Running LLM Models locally with Docker

Last Updated on March 3, 2026 by Editorial Team

Author(s): Kushal Banda

Originally published on Towards AI.

If you’ve been building AI applications recently, you’re likely familiar with the friction of managing API keys, tracking usage costs, and relying on cloud endpoints like OpenAI or Anthropic. But what if you could spin up Large Language Models (LLMs) locally with the exact same workflow you use for your databases and web apps?

Enter Docker Model Runner (DMR).

Just as Docker revolutionized how we pull, manage, and run application containers, it is now bringing that exact same simplicity to AI models. In this guide, we will explore how you can use Docker to pull LLMs, chat with them in your terminal, and seamlessly integrate them into your existing Node.js applications as a drop-in replacement for OpenAI APIs.

Figure: Enable Docker Model Runner and host-side TCP support (OpenAI-compatible APIs)

Why Run Models Locally?

Before we dive into the “how,” let’s quickly address the “why.” Running models locally on your own hardware offers several massive benefits:

Zero API Cost: No need to keep topping up API credits just to test your application logic.
Privacy: Your prompts and data never leave your local machine.
Offline Development: Code from planes, trains, or anywhere without an internet connection.

With Docker’s new capabilities, getting a model up and running is now as easy as pulling a PostgreSQL image.

Step 1: Enabling Docker Model Runner

To get started, you need Docker Desktop installed on your machine. Once it is installed, flip a couple of switches to enable the AI features.

Open Docker Desktop and navigate to Settings.
Look for the AI tab on the sidebar.
Check the box for Enable Docker Model Runner.
Check the box for Enable host-side TCP support.

Note: Enabling the TCP support is crucial. This exposes a local REST API on port 12434 which we will use later to connect our code to the model.
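Once both boxes are checked, one quick way to confirm the API is reachable is to ask it which models it knows about. The sketch below is only an illustration: it assumes Node.js 18+ (for the built-in fetch) and that the runner serves an OpenAI-style model listing at /v1/models on the port above. The function names are made up for this example, and the exact path may differ between Docker Desktop versions.

```javascript
// Assumption: the runner exposes an OpenAI-style listing at <baseURL>/models.
const BASE_URL = 'http://localhost:12434/v1';

// Pull the model ids out of an OpenAI-style list response:
// { object: 'list', data: [{ id: '...' }, ...] }
function extractModelIds(listResponse) {
  const data = Array.isArray(listResponse?.data) ? listResponse.data : [];
  return data.map((model) => model.id).filter((id) => typeof id === 'string');
}

// Resolves to the ids of the locally available models.
// Rejects if the API is unreachable or answers with a non-2xx status.
async function listLocalModels(baseURL = BASE_URL) {
  const res = await fetch(`${baseURL}/models`);
  if (!res.ok) {
    throw new Error(`Model Runner responded with HTTP ${res.status}`);
  }
  return extractModelIds(await res.json());
}
```

Calling listLocalModels().then(console.log) should print an array of model names if the runner is up. A rejected promise usually means host-side TCP support is not enabled or Docker Desktop is not running.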

Step 2: Pulling and Running Models via CLI

Docker has introduced a new command namespace: docker model. If you know how to use standard Docker commands, you already know how to manage LLMs.

You can browse available models on Docker Hub’s Model Repository. For this tutorial, let’s use a lightweight model like Gemma 3 (around 571MB), which runs smoothly even without a massive GPU.

Open your terminal and pull the model:

docker model pull ai/gemma3-qat

Once downloaded, you can view your available local models using:

docker model list

To start interacting with the model immediately, simply run:

docker model run ai/gemma3-qat

Boom! You are now in an interactive chat environment directly in your terminal. You can say “Hi” or ask it math questions, and it will respond entirely locally.

Step 3: The Magic of OpenAI-Compatible APIs

Chatting in the terminal is cool, but as developers, we want to integrate these models into our code. This is where Docker Model Runner truly shines.

Instead of forcing you to learn a completely new SDK to talk to your local models, Docker exposes an OpenAI-compatible REST API. This means Docker Model Runner accepts the same JSON request body as OpenAI's Chat Completions API and returns a response with the same structure.
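To make the shared wire format concrete, here is a minimal sketch that talks to the endpoint with plain fetch instead of an SDK. It assumes Node.js 18+ (built-in fetch), the port enabled in Step 1 with a /v1 path prefix, and a model pulled as ai/gemma3-qat. The helper names are illustrative only.

```javascript
// Assumptions: Docker Model Runner listens on port 12434 with a /v1 prefix,
// and a model named 'ai/gemma3-qat' has been pulled.
const BASE_URL = 'http://localhost:12434/v1';

// Build the same JSON body you would send to OpenAI's chat completions endpoint.
function buildChatRequest(model, userPrompt) {
  return {
    model,
    messages: [{ role: 'user', content: userPrompt }],
  };
}

// Read the assistant's reply from an OpenAI-style chat completion response.
// Returns null when the expected fields are missing.
function extractReply(completion) {
  return completion?.choices?.[0]?.message?.content ?? null;
}

// Send one user prompt to the local model and resolve to the reply text.
async function askLocalModel(userPrompt, model = 'ai/gemma3-qat') {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildChatRequest(model, userPrompt)),
  });
  if (!res.ok) {
    throw new Error(`Request failed with HTTP ${res.status}`);
  }
  return extractReply(await res.json());
}
```

Calling askLocalModel('Hi') should resolve to the model's reply as a string, provided Docker Model Runner is running and the model has been pulled.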

If you have an existing application built with the openai npm package, you can point it to your local Docker model without rewriting your application logic!
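One way to keep a single code path for both backends is to pick the client settings from environment variables. The helper below is a hypothetical sketch: USE_LOCAL_MODEL, LOCAL_MODEL_NAME, and OPENAI_MODEL are names invented for this example, the local URL assumes the port enabled in Step 1, and gpt-4o-mini is just a stand-in for whichever hosted model you use.

```javascript
// Hypothetical helper: choose client settings from environment variables.
// USE_LOCAL_MODEL, LOCAL_MODEL_NAME and OPENAI_MODEL are illustrative names,
// not Docker or OpenAI conventions.
function resolveClientConfig(env) {
  if (env.USE_LOCAL_MODEL === '1') {
    return {
      baseURL: 'http://localhost:12434/v1', // Docker Model Runner (assumed port and path)
      apiKey: 'local-docker', // placeholder string for the local endpoint
      model: env.LOCAL_MODEL_NAME || 'ai/gemma3-qat',
    };
  }
  return {
    baseURL: 'https://api.openai.com/v1', // OpenAI's hosted API
    apiKey: env.OPENAI_API_KEY,
    model: env.OPENAI_MODEL || 'gpt-4o-mini', // example default, swap in your own
  };
}
```

In an application you could write const { model, ...options } = resolveClientConfig(process.env), pass options to new OpenAI(options), and use model in each request. Switching between local and hosted then becomes a matter of setting one variable rather than editing code.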

Node.js Integration Example

Let’s set up a quick Node.js playground to see this in action.

Initialize a new project and install the official OpenAI SDK:

mkdir docker-AI
cd docker-AI
pnpm init
pnpm install openai

(Make sure to add "type": "module" to your package.json so Node.js treats index.js as an ES module.)

Now, create an index.js file. Notice how we use the standard OpenAI SDK, but we override the baseURL to point to Docker's local TCP port (http://localhost:12434/v1):

import OpenAI from 'openai';

// 1. Initialize the client
// Point the baseURL to the Docker Model Runner port
const client = new OpenAI({
  baseURL: 'http://localhost:12434/v1',
  apiKey: 'local-docker', // API key is not validated locally, any string works
});

async function runLocalModel() {
  try {
    // 2. Use the exact same standard OpenAI syntax
    const response = await client.chat.completions.create({
      model: 'ai/gemma3-qat', // Must match a name shown by `docker model list`
      messages: [
        { role: 'user', content: 'Write Python code to search for a node in a binary tree.' },
      ],
    });
    console.log('Response:', response.choices[0].message.content);
  } catch (error) {
    console.error('Error connecting to local model:', error);
  }
}

runLocalModel();

When you run this script (node index.js), the request is routed to your locally running Docker model instead of OpenAI's servers.

Conclusion

Docker has successfully merged the worlds of containerization and AI development. By treating AI models as pull-able, runnable artifacts that expose industry-standard APIs, Docker Model Runner acts as the perfect bridge for developers wanting to experiment locally without opening their wallets.

While you might still deploy to production using hosted endpoints (like OpenAI or Gemini) for heavy lifting, building and testing locally with Docker is an absolute game-changer. Give it a try on your machine today!


Note: Article content contains the views of the contributing authors and not Towards AI.
