LLM & AI Agent Applications with LangChain and LangGraph — Part 8 — Temperature, Top-p, Top-k and Max Tokens: How to Shape Model Behavior
Last Updated on January 2, 2026 by Editorial Team
Author(s): Michalzarnecki
Originally published on Towards AI.

Welcome back to another article in this series on building LLM-driven applications.
In this part of the course I want to focus on something very practical: the main generation parameters you can control when working with large language models.
These settings decide a lot about the style, quality and predictability of the answers you get. If you understand them well, you can tune the model depending on what you need: a precise and repeatable assistant, or a more creative partner that explores different possibilities.
Temperature — from analyst to poet
The first parameter is temperature.
You can think of it as a kind of entropy control, a measure of “chaos” in the model’s choices. Technically it is a number, usually between 0 and 1, sometimes up to 2 depending on the API.
At low values, for example temperature = 0, the model becomes very predictable. It always picks the most likely next token. If you send the same prompt ten times, you will get almost identical answers. That is perfect for tasks where you want stability and reproducibility: many business applications, structured data extraction, deterministic analysis.
When you increase the temperature, say to 1.0, the model starts to explore more of the probability distribution. It becomes more creative. Answers are more varied, less repetitive, occasionally a bit surprising. This is useful for brainstorming, creative writing, idea generation.
You can imagine temperature as a slider running from “strict analyst” on one side to “inspired poet” on the other.
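To make the mechanics concrete, here is a toy sketch in plain Python (not the model's actual implementation): temperature divides the raw next-token scores (logits) before the softmax, so a low temperature sharpens the distribution toward the top candidate and a high temperature flattens it.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize to probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, 0.1]  # toy next-token scores

cold = softmax_with_temperature(logits, 0.1)  # near-greedy, "strict analyst"
warm = softmax_with_temperature(logits, 1.5)  # flatter, "inspired poet"

print([round(p, 3) for p in cold])
print([round(p, 3) for p in warm])
```

With temperature 0.1 the top token gets essentially all of the probability mass; with 1.5 the other candidates get a realistic chance of being sampled.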
Top-p — focusing on the main part of the distribution
The next parameter is top-p, also known as nucleus sampling.
Top-p is an alternative way of controlling randomness. Instead of stretching or compressing the whole distribution like temperature does, it focuses on a subset of tokens that together cover a certain share of the probability mass.
For example, if top_p = 0.9, the model will look at the list of possible next tokens sorted by probability, take just enough of them so that their cumulative probability reaches 90 percent, and only then sample from that reduced group.
The effect is that the model ignores the very long tail of unlikely options, but still has some freedom within the “reasonable” candidates. You control diversity, but in a way that stays focused on the most plausible region. It is a bit like saying: “Stay within this main group of candidates, but do not always pick the same one.”
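The selection step can be sketched in a few lines of plain Python (the token names and probabilities below are made up for illustration): sort the candidates by probability and keep just enough of them to cover the top_p mass.

```python
def nucleus_candidates(token_probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        if cumulative >= top_p:
            break  # the nucleus is complete; ignore the long tail
    return kept

probs = {"cake": 0.50, "pie": 0.25, "tart": 0.15, "soup": 0.07, "brick": 0.03}

print(nucleus_candidates(probs, 0.9))  # ['cake', 'pie', 'tart'] covers 90% of the mass
print(nucleus_candidates(probs, 0.5))  # ['cake'] alone already covers 50%
```

The model then samples only from the returned nucleus, with the kept probabilities renormalized.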
Top-k — choosing from the best k candidates
A related parameter is top-k.
Here we do not look at percentages, but at a fixed number. We tell the model: “Only consider the top k most probable tokens at each step.”
If top_k = 1, the model will always pick the single most likely token, which makes it extremely deterministic. If top_k = 50, it has a much larger pool of good candidates, so generations will be more diverse.
Top-k is especially common in open source models and in libraries that expose lower level sampling options. It gives you a simple mental model: small k means safe and predictable, large k means more variety and more room for creativity.
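A minimal top-k sketch in plain Python (again with made-up tokens and probabilities): truncate the ranked candidate list to k entries, then sample from what remains.

```python
import random

def top_k_sample(token_probs, k, rng=random):
    """Sample one token from the k most probable candidates."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens = [t for t, _ in ranked]
    weights = [p for _, p in ranked]  # relative weights; renormalization is implicit
    return rng.choices(tokens, weights=weights, k=1)[0]

probs = {"cake": 0.50, "pie": 0.25, "tart": 0.15, "soup": 0.07, "brick": 0.03}

print(top_k_sample(probs, k=1))  # always 'cake': greedy decoding
print(top_k_sample(probs, k=3))  # one of 'cake', 'pie', 'tart'
```

Note that the hosted OpenAI chat API does not expose top_k; you will mostly meet this parameter in open source inference libraries.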
You can also combine temperature, top-p and top-k, but often it is enough to pick one approach and tune it for your use case.

Max tokens — how much the model is allowed to say
The last parameter I will mention here is max tokens.
This one is conceptually simple: it is the limit on how long the model’s answer can be in a single call. The value is expressed in tokens, not in words, but the idea is straightforward. Once the model reaches that limit, it has to stop, even if it would happily continue the explanation or story.
If you set a low max_tokens, you will get short, concise answers. That is useful when you want summaries, brief status messages or when you need to keep costs and latency under tight control.
If you set a high value, the model is allowed to write longer narratives, detailed analyses and multi-step reasoning. For some tasks that is exactly what you want: enough space for the model to lay out its thinking and cover edge cases.
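The cutoff behaves like a hard budget on the generation loop. Here is a toy sketch (a list of words standing in for real tokens) that shows the truncation:

```python
def generate(token_stream, max_tokens):
    """Emit tokens until the stream ends or the max_tokens budget is hit."""
    out = []
    for token in token_stream:
        if len(out) >= max_tokens:
            break  # hard stop: the answer is cut off, possibly mid-thought
        out.append(token)
    return out

answer = "Preheat the oven to 160 C and line the pan with parchment".split()

print(generate(answer, max_tokens=5))    # truncated after 5 tokens
print(generate(answer, max_tokens=100))  # budget larger than the answer: no truncation
```

Real tokenizers split text into subword units rather than whole words, so the same character count can cost a different number of tokens depending on the model.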
Putting it together
As you can see, with just a few parameters you can dramatically change the character of the model:
- temperature and top-p / top-k steer how predictable or creative it is
- max tokens controls how much it can say in a single run
In real projects there is rarely a single “perfect” setting. You experiment, look at the outputs, adjust, and eventually settle on a configuration that fits your specific task.
In the next step we will move to a notebook and see how these parameters behave in practice. We will call the same model with different temperature, top-p and top-k values and compare the answers side by side, so you can directly feel the impact of each setting.
Install libraries and load environment variables
!pip install -q langchain-openai python-dotenv
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()
Helper function that calls the LLM API with the parameter under test
def test_generation(param_name, values, prompt="Generate a new, unknown recipe for cheesecake."):
    for v in values:
        print("=" * 60)
        print(f"{param_name} = {v}")
        # build the model with only the parameter under test; everything else stays at defaults
        llm = ChatOpenAI(model="gpt-4", **{param_name: v})
        response = llm.invoke(prompt)
        print(response.content, "\n")
Experiment with temperature
test_generation("temperature", [0.0, 0.7, 1.2])
output:
============================================================
temperature = 0.0
Recipe Name: Tropical Coconut Mango Cheesecake
Ingredients:
For the crust:
1. 1 1/2 cups graham cracker crumbs
With temperature = 0 the responses are nearly identical across runs. With temperature = 1.2 they become noticeably more varied and occasionally erratic.
Experiment with top_p
test_generation("top_p", [0.2, 0.7, 1.0])
output:
============================================================
top_p = 0.2
Recipe Name: Tropical Coconut Mango Cheesecake
Ingredients:
For the crust:
1. 1 1/2 cups graham cracker crumbs
With a low top_p the model samples only from the few most probable tokens, so the output stays conservative and repetitive.
Experiment with max_tokens
test_generation("max_tokens", [30, 100, 300])
output:
============================================================
max_tokens = 30
Recipe: Tropical Passion Fruit Cheesecake
Ingredients:
For the crust:
1. 2 cups graham cracker crumbs
The value of max_tokens decides whether the answer is two sentences or a whole page.
That is all for this chapter. In the next one I will show how to build a conversation with memory, so the model keeps the context of previous messages.
see next chapter
see previous chapter
see the full code from this article in the GitHub repository
Published via Towards AI
Note: Article content contains the views of the contributing authors and not Towards AI.