How to Achieve Structured Output in Claude 3.7: Three Practical Approaches

Last Updated on May 13, 2025 by Editorial Team

Author(s): Omri Eliyahu Levy

Originally published on Towards AI.


TL;DR — Full code can be found here

At Baz, we’re building an AI code review agent that combines static analysis with LLMs to reason deeply about code. Along the way, structured output has become essential, especially when chaining model outputs or integrating into larger systems. As part of that journey, we constantly evaluate and integrate new models to see how they perform in complex, real-world pipelines.

Anthropic’s recently released Claude Sonnet 3.7 has attracted a lot of attention for its enhanced reasoning and code understanding. But if you’re building GenAI apps in the wild, there’s a catch: structured output doesn’t work the way you might expect when using its powerful “extended thinking” mode.

Structured output is critical in production-grade LLM systems — the ability to get consistent, parseable responses that adhere to a defined schema. Whether you’re routing LLM output to downstream systems or building modular agent workflows, you need more than free-form text — you need structure you can rely on.

This post walks through what structured output is, why it matters in production, and three practical ways to achieve it with Sonnet 3.7. We include end-to-end examples using Langchain and AWS Bedrock to help others shipping GenAI apps navigate this tradeoff.

Photo by Tamilazhagan on Unsplash

What is “structured output”?

Structured output refers to an LLM response that adheres to a predefined schema. Instead of arriving as free text, the response is returned as JSON that conforms to the schema, which makes it easy to parse and work with. In production environments, structured JSON responses are crucial because JSON is how systems exchange data today.

How does structured output work?

Generally speaking, there are three different ways to achieve structured output:

Ask the model to adhere to a predefined schema.
Use constrained generation.
Utilize function (tool) calling.

Let’s discuss each.

Prompt the model to do so

This is the simplest approach but also the least reliable. You simply instruct the model to return its response in a specific format, such as JSON. You can add few-shot examples, of course, and then try to parse the response directly as JSON.

The drawback here is that the model might not perfectly follow your instructions, or the schema might be too complex, requiring additional parsing or error handling.
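To make that extra parsing and error handling concrete, here is a minimal sketch that uses only the standard library. The helper name `parse_json_reply` and the sample reply in the usage note are made up for illustration. It assumes the model may wrap the JSON in prose or code fences, or may leave out a field.

```python
import json


def parse_json_reply(text: str, required_keys: set[str]) -> dict:
    """Extract the first JSON object from a model reply and check its keys.

    Raises ValueError if no object can be decoded or a required key is missing.
    """
    decoder = json.JSONDecoder()
    # Try every "{" as a possible start, since the model may add prose
    # or markdown fences around the JSON.
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        missing = required_keys - set(obj)
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
        return obj
    raise ValueError("no JSON object found in reply")
```

For a reply such as `Sure! {"content": "Once upon a time", "genre": "fantasy"}` the helper returns the decoded dictionary; a reply without the `genre` field raises `ValueError`, which the caller can turn into a retry or a fallback.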

Constrained decoding

This is also sometimes called “JSON mode.” In this approach, the tokens the model is allowed to generate are restricted at every step. The desired output schema is translated into a context-free grammar that represents it [1]. Then, when generating the n-th token, the model may choose only from the tokens the grammar permits at that point.

For example, if the model has already produced:

{"key": "val

It cannot end the object at this point: it must first close the quote " on val (or generate more “letters” for the string) for the output to be valid JSON. A } emitted here would be read as one more character inside the string, not as the end of the object.

This approach is effective, although some research shows it may harm model performance and quality [2]. Also, Anthropic’s Sonnet 3.7 does not support this option.
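The masking idea can be sketched with a toy example. This is not how any production decoder is implemented: the “grammar” here is just a finite list of valid outputs, the token scores are fixed numbers that ignore context, and all names (`allowed_tokens`, `constrained_greedy_decode`, `VALID`, `SCORES`) are invented for illustration.

```python
def allowed_tokens(prefix: str, vocab: list[str], valid_outputs: list[str]) -> list[str]:
    """Return the tokens that keep `prefix` extendable to some valid output."""
    return [
        tok for tok in vocab
        if any(out.startswith(prefix + tok) for out in valid_outputs)
    ]


def constrained_greedy_decode(scores: dict[str, float], valid_outputs: list[str]) -> str:
    """Greedy decoding where disallowed tokens are masked out at every step."""
    output = ""
    while output not in valid_outputs:
        candidates = allowed_tokens(output, list(scores), valid_outputs)
        if not candidates:
            raise ValueError(f"no valid continuation after {output!r}")
        # Pick the best-scoring token among those the "grammar" permits.
        output += max(candidates, key=scores.__getitem__)
    return output


# Toy setup: the only valid outputs, and a context-free "model" preference.
VALID = ['{"genre": "fantasy"}', '{"genre": "horror"}']
SCORES = {
    '}': 0.9, 'sci-fi': 0.8, 'hor': 0.5, 'fan': 0.4,
    'tasy': 0.3, 'ror': 0.3, '"}': 0.2, '{"genre": "': 0.1,
}

print(constrained_greedy_decode(SCORES, VALID))  # {"genre": "horror"}
```

Without the mask, this toy model would start with `}` because it has the highest score; with the mask, only tokens that can still lead to a valid output are ever considered.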

Function/tool calling

In this approach, models are fine-tuned to generate structured outputs that specify which function to call and with what arguments [3]. Because the behavior is learned through fine-tuning, the model can decide when and how to invoke external functions based on the context of the conversation.

For example, we can inform the model that it has a web_search function available, which takes a query as input, and that the query should be a valid string. The model is then expected to produce a valid function call (tool use) with the following schema:

{"function": "web_search", "arguments": {"query": "when did Claude Sonnet 3.7 come out?"}}

This is the mode supported by Anthropic’s models!
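On the application side, a call like the one above is parsed and routed to real code. The sketch below follows the simplified JSON shape used in this post, not the exact wire format of any provider. The `web_search` body is a stand-in that only echoes its query, and `TOOLS` and `dispatch_tool_call` are hypothetical names.

```python
import json
from typing import Any, Callable


def web_search(query: str) -> str:
    """Stand-in for a real search function; it only echoes the query."""
    return f"results for: {query}"


# Registry mapping tool names the model may emit to Python callables.
TOOLS: dict[str, Callable[..., Any]] = {"web_search": web_search}


def dispatch_tool_call(raw_call: str) -> Any:
    """Parse a JSON tool call and invoke the matching Python function."""
    call = json.loads(raw_call)
    name, arguments = call["function"], call["arguments"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)
```

Because the arguments arrive as a JSON object matching the declared parameters, they can be passed straight to the function as keyword arguments.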

Now, if you think about it, if a model knows how to produce high-quality function calls by adhering to their schema, it can actually be utilized to produce any schema. Indeed, this is what Anthropic recommends [4]:

“Tools do not necessarily need to be client-side functions — you can use tools anytime you want the model to return JSON output that follows a provided schema.”

By the way, if you’re a Langchain user, when you request structured output from Anthropic models, the request is “translated” behind the scenes into function calling.

Claude Sonnet 3.7

Sonnet 3.7 is Anthropic’s latest model, and it includes enhanced reasoning capabilities. You can operate it in one of two modes: with or without extended thinking. According to Anthropic [5], without thinking it acts as an improved version of the 3.5 model we know and love; with thinking enabled, it thinks and reflects (up to a token budget you set) and only then answers. This approach seems to pay off, as the model scores very high on the Aider benchmark [6].

Ok… what’s the catch?

When extended thinking is on, Sonnet 3.7 does not support some features we were used to, in particular forced tool calling. In other words (recall our intro), the usual route to structured output is unavailable.

Indeed, quoting the source code of Langchain’s Anthropic wrapper [7]:

“Anthropic structured output relies on forced tool calling, which is not supported when `thinking` is enabled”.

To be precise, as the Anthropic docs state, besides “forced tool use,” temperature, top_p, and top_k are also not supported.
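A small pre-flight check can catch these combinations before a request is sent. This is a sketch under assumptions: the request is taken to be a plain dictionary using Anthropic-style parameter names (`thinking`, `tool_choice`, `temperature`, `top_p`, `top_k`), and `find_thinking_conflicts` is a hypothetical helper, not part of any SDK.

```python
from typing import Any

# Parameters the post lists as unsupported together with extended thinking.
SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def find_thinking_conflicts(request: dict[str, Any]) -> list[str]:
    """Return the settings in `request` that clash with extended thinking."""
    if request.get("thinking", {}).get("type") != "enabled":
        return []
    conflicts = [name for name in SAMPLING_PARAMS if name in request]
    # Forced tool use: tool_choice names a specific tool or demands "any" tool.
    if request.get("tool_choice", {}).get("type") in ("tool", "any"):
        conflicts.append("tool_choice")
    return conflicts
```

For a request with thinking enabled, a `temperature` value, and a `tool_choice` that names a specific tool, the helper reports `["temperature", "tool_choice"]`; with thinking disabled it reports nothing.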

What can we do?

I’ll introduce 3 different ways to achieve structured output with Sonnet 3.7.

To explore the different options, I’ll use Langchain with Anthropic models via AWS Bedrock, but it works the same with Anthropic as the provider.

Let’s assume we want the model to create a short story on a provided topic and also determine its genre.

from typing import Any
from dotenv import load_dotenv
from langchain_aws import ChatBedrockConverse
from langchain_core.prompts import PromptTemplate
# RunnableLambda is used by the "Reason and Structure" example further down.
from langchain_core.runnables import RunnableLambda, RunnableParallel, RunnablePassthrough
from pydantic import BaseModel, Field

# Load AWS credentials and other settings from a local .env file.
load_dotenv('.env')

class Story(BaseModel):
    """A short story with its genre"""
    content: str = Field(description='A very short story')
    genre: str = Field(description='The genre of the story')

“No thinking” mode

In this approach, we will not use the extended thinking mode of Sonnet 3.7. Without thinking enabled we can get reliable structured outputs through tool calling.

def no_thinking_mode() -> Story:
    """
    Example of structured output without extended thinking mode.
    This approach disables Claude's extended thinking capabilities but allows
    for direct structured output via forced tool calling.
    """
    prompt = PromptTemplate.from_template('Create a very short story about: {topic} and determine its genre')

    llm = ChatBedrockConverse(
        model_id='us.anthropic.claude-3-7-sonnet-20250219-v1:0',
        region_name='us-east-2',
        additional_model_request_fields={'thinking': {'type': 'disabled'}},
    )
    structured_llm = llm.with_structured_output(Story)
    chain = prompt | structured_llm

    res = chain.invoke({'topic': 'Harry Potter'})
    assert isinstance(res, Story)
    return res

Remember, Langchain is doing the “work” for us here and “translates” this into a function call with the desired schema.
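As a rough picture of what that translation amounts to, the sketch below writes the Story schema as a tool definition in the shape used by Anthropic’s Messages API (`name`, `description`, `input_schema`) together with a forced `tool_choice`. It is not Langchain’s internal code, and Bedrock’s Converse API wraps the same information in a different envelope. The schema is hand-written here, and `extract_tool_input` is a hypothetical helper.

```python
from typing import Any

# Hand-written JSON Schema mirroring the Story model's two fields.
STORY_TOOL: dict[str, Any] = {
    "name": "Story",
    "description": "A short story with its genre",
    "input_schema": {
        "type": "object",
        "properties": {
            "content": {"type": "string", "description": "A very short story"},
            "genre": {"type": "string", "description": "The genre of the story"},
        },
        "required": ["content", "genre"],
    },
}

# Forcing this specific tool is what makes the reply follow the schema.
FORCED_TOOL_CHOICE = {"type": "tool", "name": "Story"}


def extract_tool_input(response: dict[str, Any], tool_name: str) -> dict[str, Any]:
    """Return the arguments of the first matching tool_use block."""
    for block in response.get("content", []):
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            return block["input"]
    raise ValueError(f"no tool_use block for {tool_name!r}")
```

The `input` of the returned `tool_use` block is already a dictionary with `content` and `genre`, so it can be handed to `Story(**...)` for validation.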

“Hopefully structured” mode

In this approach, we’ll use thinking mode and “ask nicely” for structured output.

This approach leverages the enhanced reasoning but relies on careful prompting to get structured results. We then try to parse the response into the required schema.

In the following example, Langchain will do that for us, but it will raise OutputParserException if it fails to do so.

def hopefully_structured_mode() -> Story:
    """
    Example of attempting structured output with extended thinking enabled.
    It'll not use forced tool calling and will try to parse the response into the provided schema.
    Will raise `OutputParserException` if it fails.
    """
    prompt = PromptTemplate.from_template(
        """Create a very short story about: {topic} and determine its genre.
        IMPORTANT: Your response must be formatted as valid JSON with two fields:
        1. content: That is, the story content
        2. genre: The genre of the story
        """
    )
    llm = ChatBedrockConverse(
        model_id='us.anthropic.claude-3-7-sonnet-20250219-v1:0',
        region_name='us-east-2',
        additional_model_request_fields={'thinking': {'type': 'enabled', 'budget_tokens': 2000}},
    )
    structured_llm = llm.with_structured_output(Story)  # will try to parse the result according to the provided schema
    chain = prompt | structured_llm

    res = chain.invoke({'topic': 'Harry Potter'})
    assert isinstance(res, Story)
    return res
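Since this mode can fail, callers usually want a bounded retry around it. Below is a minimal, library-agnostic sketch; `invoke_with_retries` is a hypothetical helper, and the default of retrying on `ValueError` is an assumption chosen to keep the example self-contained.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def invoke_with_retries(
    call: Callable[[], T],
    retry_on: tuple[type[Exception], ...] = (ValueError,),
    max_attempts: int = 3,
) -> T:
    """Call `call` until it succeeds or `max_attempts` is exhausted."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return call()
        except retry_on as err:
            # Remember the failure and try again with a fresh model call.
            last_error = err
    raise RuntimeError(f"no parseable output after {max_attempts} attempts") from last_error
```

With the example above, this could be used as `invoke_with_retries(hopefully_structured_mode, retry_on=(OutputParserException,))`, importing `OutputParserException` from `langchain_core.exceptions`. Each retry is a full model call, so the attempt limit also caps the added cost and latency.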

“Reason and Structure” mode

In this approach we let Sonnet 3.7 do what it is good at, reasoning, and then use another LLM (say Haiku) to structure its output. This two-step process gives you the best of both worlds but at the cost of additional complexity and latency.

def reason_and_structure_mode(inputs: dict[str, Any] | None = None) -> Story:
    """
    Example of a two-stage approach: reasoning with Sonnet-3.7 followed by structuring with Haiku.
    This approach leverages Sonnet's extended thinking for content generation, then
    uses Haiku to transform the output into a structured format.
    """
    # Fall back to the demo inputs only when the caller did not provide any.
    if inputs is None:
        inputs = {'topic': 'Harry Potter', 'genre': 'fantasy'}

    reasoning_prompt = PromptTemplate.from_template('Create a very short story about: {topic}')
    reasoning_llm = ChatBedrockConverse(
        model_id='us.anthropic.claude-3-7-sonnet-20250219-v1:0',
        region_name='us-east-2',
        additional_model_request_fields={'thinking': {'type': 'enabled', 'budget_tokens': 2000}},
    )
    reasoning_chain = reasoning_prompt | reasoning_llm

    structuring_prompt = PromptTemplate.from_template(
        'Structure the provided story into the requested schema and assign "genre" to be {genre}. Story: {reasoning_output}'
    )
    structuring_llm = ChatBedrockConverse(
        model_id='us.anthropic.claude-3-5-haiku-20241022-v1:0',
        region_name='us-east-2',
    )
    structuring_llm = structuring_llm.with_structured_output(Story)
    structuring_chain = structuring_prompt | structuring_llm

    # Sometimes, we'll want to pass some of the input params directly to the "structuring model", not only the output of the reasoning model.
    # To support that, we use a "dummy" step (RunnablePassthrough) that just receives the inputs and returns them.
    # Then, we can run both the reasoning chain and the passthrough in parallel, and feed the structuring llm both:
    #
    #               /-> reasoning_chain -> reasoning_output \
    # input_params -                                          -> merge_inputs -> structuring_llm
    #               \-> passthrough ----> original_params   /
    reason_then_structure_chain = (
        RunnableParallel(
            reasoning_output=reasoning_chain,
            original_inputs=RunnablePassthrough(),
        )
        | RunnableLambda(lambda x: prepare_structuring_inputs(x['original_inputs'], x['reasoning_output']))
        | structuring_chain
    )

    res = reason_then_structure_chain.invoke(inputs)
    assert isinstance(res, Story)
    return res


def prepare_structuring_inputs(original_inputs: dict[str, Any], reasoning_output: Any) -> dict[str, Any]:
    """
    Prepares inputs for the structuring model by combining original inputs with reasoning output.
    `reasoning_output` is the message returned by the reasoning chain; it is converted to text
    when the structuring prompt is formatted.
    """
    return {
        **original_inputs,  # Pass original inputs as-is
        'reasoning_output': reasoning_output,  # Add reasoning chain output
    }
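One optional refinement, sketched under assumptions: with thinking enabled, the reasoning model’s message content may arrive as a list of blocks (thinking blocks plus text blocks) rather than a plain string. If you want to hand only the final answer to the structuring model, a helper like the hypothetical `text_from_content` below can keep just the text. The block shape `{"type": "text", "text": ...}` is assumed here and should be checked against the messages you actually receive.

```python
from typing import Any


def text_from_content(content: Any) -> str:
    """Join the text parts of a message's content, skipping other block types."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "".join(parts)
```

In the chain above, this would be applied to the reasoning message’s `content` inside `prepare_structuring_inputs`, so the structuring prompt receives only the story text and not the thinking trace.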

Full code can be found here.

In summary, while Sonnet 3.7 offers strong reasoning capabilities, achieving reliable structured output in production can be tricky, especially with extended thinking mode. By using approaches like no thinking mode, careful prompting, or combining reasoning with a separate structuring model, you can effectively work around these limitations and integrate structured outputs into your GenAI applications. These methods provide a practical way to ensure seamless integration in real-world pipelines.

References

[1] https://openai.com/index/introducing-structured-outputs-in-the-api/
[2] Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models: https://arxiv.org/abs/2408.02442
[3] https://huggingface.co/learn/agents-course/bonus-unit1/what-is-function-calling
[4] https://docs.anthropic.com/en/docs/build-with-claude/tool-use/overview#json-output
[5] https://www.anthropic.com/news/claude-3-7-sonnet
[6] https://aider.chat/docs/leaderboards/
[7] https://github.com/langchain-ai/langchain/blob/47ded80b64fa8bd5d3d2f8cab0fe17fd66689019/libs/partners/anthropic/langchain_anthropic/chat_models.py#L1187


Published via Towards AI
