How to Achieve Structured Output in Claude 3.7: Three Practical Approaches

Last Updated on May 13, 2025 by Editorial Team

Author(s): Omri Eliyahu Levy

Originally published on Towards AI.

TL;DR: Full code can be found here

At Baz, we're building an AI code review agent that combines static analysis with LLMs to reason deeply about code. Along the way, structured output has become essential, especially when chaining model outputs or integrating into larger systems. As part of that journey, we constantly evaluate and integrate new models to see how they perform in complex, real-world pipelines.

Anthropic's recently released Claude Sonnet 3.7 has attracted a lot of attention for its enhanced reasoning and code understanding. But if you're building GenAI apps in the wild, there's a catch: structured output doesn't work the way you might expect when using its powerful "extended thinking" mode.

Structured output, the ability to get consistent, parseable responses that adhere to a defined schema, is critical in production-grade LLM systems. Whether you're routing LLM output to downstream systems or building modular agent workflows, you need more than free-form text: you need structure you can rely on.

This post walks through what structured output is, why it matters in production, and three practical ways to achieve it with Sonnet 3.7. We include end-to-end examples using Langchain and AWS Bedrock to help others shipping GenAI apps navigate this tradeoff.

[Image: photo by Tamilazhagan on Unsplash]

What is "structured output"?

Structured output refers to an LLM response that adheres to a predefined schema. Instead of arriving as free text, the response is returned as JSON that conforms to that schema, which makes it easy to parse and work with. For production environments, structured JSON responses are crucial, as this is how systems interact today.

How does structured output work?

Generally speaking, there are three different ways to achieve structured output:

  1. Ask the model to adhere to a predefined schema.
  2. Use constrained generation.
  3. Utilize function (tool) calling.

Let's discuss each.

Prompt the model to do so

This is the simplest approach but also the least reliable. You simply instruct the model to return its response in a specific format, such as JSON. You can add few-shot examples, of course, and then try to parse the response directly as JSON.

The drawback here is that the model might not perfectly follow your instructions, or the schema might be too complex, requiring additional parsing or error handling.
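To make the parsing and error handling concrete, here is a minimal sketch using only the Python standard library. The `parse_story` helper, the `REQUIRED_FIELDS` mapping, and the sample reply are hypothetical names and data introduced for illustration; they are not part of the original code.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type
REQUIRED_FIELDS = {"content": str, "genre": str}


def parse_story(raw: str) -> dict:
    """Extract the outermost JSON object from a model reply and check its fields."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    # json.JSONDecodeError is a subclass of ValueError, so callers can catch one type
    data = json.loads(raw[start : end + 1])
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            raise ValueError(f"field {name!r} is missing or not a {expected_type.__name__}")
    return data


# Made-up reply: models often wrap the JSON in extra chatter
reply = 'Sure! Here is your story:\n{"content": "A wizard lost his wand.", "genre": "fantasy"}'
story = parse_story(reply)
```

Every failure mode surfaces as a `ValueError`, so the caller can decide whether to retry, repair the reply, or give up.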

Constrained decoding

Also sometimes called "JSON mode." In this approach, the set of tokens the model may generate is restricted at each step. The desired output schema is translated into a context-free grammar that represents it [1]. Then, when generating the n-th token, the model chooses only among the tokens the grammar allows at that position.

For example, if the model has already produced:

{"key": "val

It will not be allowed to close the object with } at this point: it must first close the string with a quote " (or keep generating characters of the value) for the output to remain valid JSON.

This approach works well, although some research suggests it may harm model performance and quality [2]. Also, Anthropic's Sonnet 3.7 does not support this option.
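The token masking described above can be illustrated with a toy decoder. This is a sketch under strong simplifying assumptions: the "grammar" is just a finite set of valid outputs, the vocabulary has seven made-up tokens, and `FAKE_SCORES` stands in for the model's preferences. Real implementations compile the schema into a grammar and mask logits inside the inference engine.

```python
import json

# Toy "grammar": the complete set of strings the schema allows
VALID_OUTPUTS = ['{"genre": "fantasy"}', '{"genre": "horror"}']

# Made-up vocabulary, including tokens that would break the schema
VOCAB = ['{"genre": "', 'fantasy', 'horror', 'comedy', '"}', '}', 'Sure! ']

# Stand-in for model scores: the unconstrained favorite is chatter, not JSON
FAKE_SCORES = {'Sure! ': 0.9, 'comedy': 0.8, '}': 0.7, 'horror': 0.6,
               'fantasy': 0.5, '{"genre": "': 0.1, '"}': 0.1}


def allowed_tokens(prefix: str) -> list[str]:
    """Tokens that keep prefix + token a prefix of at least one valid output."""
    return [t for t in VOCAB if any(v.startswith(prefix + t) for v in VALID_OUTPUTS)]


def constrained_decode(scores: dict[str, float]) -> str:
    """Greedy decoding that only ever picks among grammar-approved tokens."""
    out = ''
    while out not in VALID_OUTPUTS:
        candidates = allowed_tokens(out)
        out += max(candidates, key=lambda t: scores.get(t, 0.0))
    return out


decoded = constrained_decode(FAKE_SCORES)
parsed = json.loads(decoded)
```

Although the fake scores favor 'Sure! ', 'comedy', and a premature '}', none of them survives the mask, so the decoder ends on the valid object with genre "horror".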

Function/tool calling

In this approach, models are fine-tuned to generate structured outputs that specify which function to call and with what arguments. Function calling is learned through fine-tuning, allowing the model to decide when and how to invoke external functions based on the context of the conversation.

For example, we can inform the model that it has a web_search function available, which takes a query as input; the query should be a valid string. The model is then expected to produce a valid function call (tool use) with the following schema:

{"function": "web_search", "arguments": {"query": "when did Claude Sonnet 3.7 come out?"}}

This is the mode supported by Anthropic's models!

Now, if you think about it, if a model knows how to produce high-quality function calls by adhering to their schema, it can actually be utilized to produce any schema. Indeed, this is what Anthropic recommends [4]:

"Tools do not necessarily need to be client-side functions - you can use tools anytime you want the model to return JSON output that follows a provided schema."

By the way, if you're a Langchain user, when you provide structured output for Anthropic models, it is "translated" behind the scenes into "function calling."
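As a rough sketch of what that translation amounts to, the snippet below builds a tool definition and a forced `tool_choice` in the shape used by Anthropic's Messages API, then reads the structured result out of a `tool_use` content block. The tool name `record_story`, the helper `extract_tool_input`, and the simulated response are assumptions made for illustration; no API call is made here.

```python
# Tool definition: the JSON schema we want the output to follow.
# "record_story" is a hypothetical name; nothing is actually executed client-side.
story_tool = {
    "name": "record_story",
    "description": "Record a short story and its genre.",
    "input_schema": {
        "type": "object",
        "properties": {
            "content": {"type": "string", "description": "A very short story"},
            "genre": {"type": "string", "description": "The genre of the story"},
        },
        "required": ["content", "genre"],
    },
}

# Forcing this specific tool is what turns tool calling into structured output
request_fragment = {
    "tools": [story_tool],
    "tool_choice": {"type": "tool", "name": story_tool["name"]},
}


def extract_tool_input(content_blocks: list[dict], tool_name: str) -> dict:
    """Return the arguments of the first tool_use block for the given tool."""
    for block in content_blocks:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            return block["input"]
    raise ValueError(f"model did not call {tool_name!r}")


# Simulated response content, made up for this example
simulated_content = [
    {
        "type": "tool_use",
        "id": "toolu_example",
        "name": "record_story",
        "input": {"content": "A wizard lost his wand.", "genre": "fantasy"},
    }
]
story = extract_tool_input(simulated_content, "record_story")
```

The "function" never has to exist: its arguments are the structured output.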

Claude Sonnet 3.7

Sonnet 3.7 is the latest model by Anthropic, and it includes enhanced reasoning capabilities. You can run it in one of two modes: with or without extended thinking. According to Anthropic [5], without thinking it acts as an improved version of the 3.5 model we know and love; with thinking enabled, it thinks and reflects (up to a token budget you set) and only then answers. This approach seems to pay off, as the model scores very high on the Aider benchmark [6].

Ok... what's the catch?

When extended thinking is on, Sonnet 3.7 does not support some features we were used to, in particular forced tool calling. In other words (recall our intro), the structured output mechanism described above is unavailable.

Indeed, quoting the source code of the Langchain wrapper for Anthropic [7]:

"Anthropic structured output relies on forced tool calling, which is not supported when `thinking` is enabled".

To be precise, the Anthropic docs state that, besides forced tool use, setting temperature, top_p, or top_k is also not supported when thinking is enabled.
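One way to catch these conflicts before sending a request is a small pre-flight check. The sketch below is only an illustration of the restrictions listed above: `thinking_conflicts` is a hypothetical helper, the request is a plain dict in the shape of Anthropic's request parameters, and treating every explicit sampling parameter as a conflict is an assumption that may be stricter than the API itself.

```python
def thinking_conflicts(request: dict) -> list[str]:
    """List request parameters that clash with extended thinking."""
    thinking = request.get("thinking", {})
    if thinking.get("type") != "enabled":
        return []  # nothing to check when thinking is off

    # Assumption: any explicit sampling parameter counts as a conflict
    conflicts = [p for p in ("temperature", "top_p", "top_k") if p in request]

    # Forced tool use: a specific tool ("tool") or any tool ("any")
    tool_choice = request.get("tool_choice", {})
    if tool_choice.get("type") in ("tool", "any"):
        conflicts.append("tool_choice")
    return conflicts


# Made-up request that mixes thinking with forced tool calling
request = {
    "thinking": {"type": "enabled", "budget_tokens": 2000},
    "temperature": 0.2,
    "tool_choice": {"type": "tool", "name": "record_story"},
}
problems = thinking_conflicts(request)
```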

What can we do?

I'll introduce three different ways to achieve structured output with Sonnet 3.7.

To explore them, I'll be using Langchain with Anthropic models via AWS Bedrock, but it'll work the same with Anthropic as the provider.

Let's assume we want the model to create a short story on a provided topic and also determine its genre.

from typing import Any

from dotenv import load_dotenv
from langchain_aws import ChatBedrockConverse
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda, RunnableParallel, RunnablePassthrough
from pydantic import BaseModel, Field

# Load AWS credentials and other settings from a local .env file
load_dotenv('.env')


class Story(BaseModel):
    """A short story with its genre"""

    content: str = Field(description='A very short story')
    genre: str = Field(description='The genre of the story')

"No thinking" mode

In this approach, we will not use the extended thinking mode of Sonnet 3.7. Without thinking enabled we can get reliable structured outputs through tool calling.

def no_thinking_mode() -> Story:
    """
    Example of structured output without extended thinking mode.
    This approach disables Claude's extended thinking capabilities but allows
    for direct structured output via forced tool calling.
    """

    prompt = PromptTemplate.from_template('Create a very short story about: {topic} and determine its genre')

    llm = ChatBedrockConverse(
        model_id='us.anthropic.claude-3-7-sonnet-20250219-v1:0',
        region_name='us-east-2',
        additional_model_request_fields={'thinking': {'type': 'disabled'}},
    )
    # Forces a tool call whose schema matches the Story model
    structured_llm = llm.with_structured_output(Story)
    chain = prompt | structured_llm

    res = chain.invoke({'topic': 'Harry Potter'})
    assert isinstance(res, Story)
    return res

Remember, Langchain does the "work" for us here and "translates" the schema into a function call.

"Hopefully structured" mode

In this approach, we'll use thinking mode and "ask nicely" for structured output.

This approach leverages the enhanced reasoning but relies on careful prompting to get structured results. We then try to parse the response into the required schema.

In the following example, Langchain will do that for us, but will raise OutputParserException if it fails to do so.

def hopefully_structured_mode() -> Story:
    """
    Example of attempting structured output with extended thinking enabled.
    It'll not use forced tool calling and will try to parse the response into the provided schema.
    Will raise `OutputParserException` if it fails.
    """

    prompt = PromptTemplate.from_template(
        """Create a very short story about: {topic} and determine its genre.
        IMPORTANT: Your response must be formatted as valid JSON with two fields:
        1. content: That is, the story content
        2. genre: The genre of the story
        """
    )
    llm = ChatBedrockConverse(
        model_id='us.anthropic.claude-3-7-sonnet-20250219-v1:0',
        region_name='us-east-2',
        additional_model_request_fields={'thinking': {'type': 'enabled', 'budget_tokens': 2000}},
    )
    structured_llm = llm.with_structured_output(Story)  # will try to parse the result according to the provided schema
    chain = prompt | structured_llm

    res = chain.invoke({'topic': 'Harry Potter'})
    assert isinstance(res, Story)
    return res
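Because this mode can fail at parse time, it is usually paired with some form of retry. The helper below is a minimal, library-free sketch: `call_with_retries` and the flaky stub are hypothetical names for illustration, and `ValueError` stands in for the parsing exception.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], attempts: int = 3, retry_on: tuple = (ValueError,)) -> T:
    """Call fn until it succeeds or the attempts run out."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except retry_on as err:
            last_error = err  # remember the failure and try again
    raise RuntimeError(f"no parseable output after {attempts} attempts") from last_error


# Stub standing in for a chain that returns free text twice before complying
calls = {"n": 0}


def flaky_chain() -> dict:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ValueError("model replied with free text")
    return {"content": "A wizard lost his wand.", "genre": "fantasy"}


result = call_with_retries(flaky_chain)
```

With the chain from the example above, one would presumably pass `lambda: chain.invoke({'topic': 'Harry Potter'})` as `fn` and `retry_on=(OutputParserException,)`, importing the exception from `langchain_core.exceptions`. Each retry is a full model call, so the attempt count trades reliability against cost and latency.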

"Reason and Structure" mode

In this approach we let Sonnet 3.7 do what it is good at, reasoning, and then use another LLM (say Haiku) to structure its output. This two-step process gives you the best of both worlds but at the cost of additional complexity and latency.

def reason_and_structure_mode(inputs: dict[str, Any] | None = None) -> Story:
    """
    Example of a two-stage approach: reasoning with Sonnet-3.7 followed by structuring with Haiku.
    This approach leverages Sonnet's extended thinking for content generation, then
    uses Haiku to transform the output into a structured format.
    """
    # Fall back to the demo inputs when the caller provides none
    if inputs is None:
        inputs = {'topic': 'Harry Potter', 'genre': 'fantasy'}

    reasoning_prompt = PromptTemplate.from_template('Create a very short story about: {topic}')
    reasoning_llm = ChatBedrockConverse(
        model_id='us.anthropic.claude-3-7-sonnet-20250219-v1:0',
        region_name='us-east-2',
        additional_model_request_fields={'thinking': {'type': 'enabled', 'budget_tokens': 2000}},
    )
    reasoning_chain = reasoning_prompt | reasoning_llm

    structuring_prompt = PromptTemplate.from_template(
        'Structure the provided story into the requested schema and assign "genre" to be {genre}. Story: {reasoning_output}'
    )
    structuring_llm = ChatBedrockConverse(
        model_id='us.anthropic.claude-3-5-haiku-20241022-v1:0',
        region_name='us-east-2',
    )
    structuring_llm = structuring_llm.with_structured_output(Story)
    structuring_chain = structuring_prompt | structuring_llm

    # Sometimes, we'll want to pass some of the input params directly to the "structuring model",
    # not only the output of the reasoning model.
    # To support that, we add a pass-through step (RunnablePassthrough) that just returns its inputs.
    # Then, we can run both the reasoning chain and the pass-through in parallel, and feed the structuring llm both:
    #
    #               /-> reasoning_chain     -> reasoning_output \
    # input_params                                                -> merge_inputs -> structuring_llm
    #               \-> RunnablePassthrough -> original_inputs  /
    #
    # RunnableLambda is imported from langchain_core.runnables.
    reason_then_structure_chain = (
        RunnableParallel(
            reasoning_output=reasoning_chain,
            original_inputs=RunnablePassthrough(),
        )
        | RunnableLambda(lambda x: prepare_structuring_inputs(x['original_inputs'], x['reasoning_output']))
        | structuring_chain
    )

    res = reason_then_structure_chain.invoke(inputs)
    assert isinstance(res, Story)
    return res


def prepare_structuring_inputs(original_inputs: dict[str, Any], reasoning_output: Any) -> dict[str, Any]:
    """
    Prepares inputs for the structuring model by combining original inputs with reasoning output.
    `reasoning_output` is the message object returned by the reasoning chain; it is
    converted to text when formatted into the structuring prompt.
    """

    return {
        **original_inputs,  # Pass original inputs as-is
        'reasoning_output': reasoning_output,  # Add reasoning chain output
    }
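To see the data flow without any model calls, here is a plain-Python sketch of the same fan-out-and-merge pattern. The stub functions `fake_reasoning` and `fake_structuring` are hypothetical stand-ins for the two LLM chains, and `run_parallel` runs its branches one after another, so it only mimics the shape of the result, not actual parallelism.

```python
from typing import Any, Callable


def run_parallel(branches: dict[str, Callable[[dict], Any]], inputs: dict) -> dict:
    """Feed the same inputs to every branch and collect the results by name."""
    return {name: branch(inputs) for name, branch in branches.items()}


def fake_reasoning(inputs: dict) -> str:
    # Stand-in for the reasoning model: only uses the topic
    return f"Once upon a time, there was a story about {inputs['topic']}."


def fake_structuring(inputs: dict) -> dict:
    # Stand-in for the structuring model: needs the story and the original genre
    return {"content": inputs["reasoning_output"], "genre": inputs["genre"]}


def reason_then_structure(inputs: dict) -> dict:
    fanned_out = run_parallel(
        {"reasoning_output": fake_reasoning, "original_inputs": lambda x: x},
        inputs,
    )
    # Merge step: original inputs plus the reasoning output under its own key
    merged = {**fanned_out["original_inputs"], "reasoning_output": fanned_out["reasoning_output"]}
    return fake_structuring(merged)


story = reason_then_structure({"topic": "Harry Potter", "genre": "fantasy"})
```

The pass-through branch is what lets `genre` reach the second stage even though the first stage never sees or returns it.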

Full code can be found here.

In summary, while Sonnet 3.7 offers strong reasoning capabilities, achieving reliable structured output in production can be tricky, especially with extended thinking mode. By using approaches like no thinking mode, careful prompting, or combining reasoning with a separate structuring model, you can effectively work around these limitations and integrate structured outputs into your GenAI applications. These methods provide a practical way to ensure seamless integration in real-world pipelines.

References

[1] https://openai.com/index/introducing-structured-outputs-in-the-api/
[2] Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models: https://arxiv.org/abs/2408.02442
[3] https://huggingface.co/learn/agents-course/bonus-unit1/what-is-function-calling
[4] https://docs.anthropic.com/en/docs/build-with-claude/tool-use/overview#json-output
[5] https://www.anthropic.com/news/claude-3-7-sonnet
[6] https://aider.chat/docs/leaderboards/
[7] https://github.com/langchain-ai/langchain/blob/47ded80b64fa8bd5d3d2f8cab0fe17fd66689019/libs/partners/anthropic/langchain_anthropic/chat_models.py#L1187
