Allow configuration for passing previous thinking tokens #2203

@ihower

Description

Background

In #1744 and #2158 we added support for Claude 4 and Gemini 3, where thinking tokens must be returned in the same turn during function calling.

For previous turns, most reasoning models drop thinking tokens server-side even if they are sent back, similar to the OpenAI API; the main reason is that these tokens increase context window usage. However, starting with Claude Opus 4.5+ and Gemini 3, previous-turn thinking tokens are no longer always dropped: if included, the server preserves them in the context.

For Claude Opus 4.5+ and Gemini, including previous thinking blocks is a developer choice with clear trade-offs.

Pros: better reasoning and improved prompt caching
Cons: increased context window usage and higher token cost

References

In #2195, there is a proposal to always attach previous thinking tokens for Claude. But I don’t think we should attach thinking tokens to assistant messages by default when they are not strictly required.

I think the default should remain to not include previous thinking blocks, which keeps behavior consistent with previous versions. Since this behavior is optional and has meaningful trade-offs, I suggest adding an explicit configuration option.

Proposed Design

A simple approach would be an agent-level and RunConfig option, for example:

agent = Agent(
    ...,
    include_previous_reasoning=True,
)

# and/or

Runner.run(
    agent,
    ...,
    include_previous_reasoning=True,
)

However, this is still not enough, because there is another important decision.

Should previous thinking tokens be stored internally (when converting to raw items in to_input_list()), even if they are not sent, in order to keep flexibility for sending them in future turns?
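To make that distinction concrete, here is a minimal sketch of what the "keep internally" decision could mean when building the raw item history. The function name and the keep_previous_reasoning flag are hypothetical, not existing SDK API:

def build_history(raw_items: list[dict], keep_previous_reasoning: bool) -> list[dict]:
    """Illustrative sketch only: whether previous reasoning items survive the
    conversion to raw input items (e.g. in to_input_list()) is a separate
    decision from whether they are later sent to the model."""
    if keep_previous_reasoning:
        return list(raw_items)
    # Drop previous-turn reasoning items entirely.
    return [item for item in raw_items if item.get("type") != "reasoning"]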

An alternative, more refined design is a small enum-based policy:

previous_reasoning_policy:
  KEEP            # keep internally, but do not send
  KEEP_AND_SEND   # keep internally and send
  DROP            # do not keep, do not send

This gives developers explicit control, instead of silently increasing context usage and token cost.
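As a rough sketch of how this could look in Python (the enum name, its values, and the previous_reasoning_policy parameter are all placeholders, not existing SDK API):

from enum import Enum

class PreviousReasoningPolicy(Enum):
    """Hypothetical policy for previous-turn thinking tokens."""
    KEEP = "keep"                    # keep internally, but do not send
    KEEP_AND_SEND = "keep_and_send"  # keep internally and send
    DROP = "drop"                    # do not keep, do not send

# Possible usage, assuming the agent-level option proposed above:
# agent = Agent(..., previous_reasoning_policy=PreviousReasoningPolicy.KEEP_AND_SEND)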

Design Questions

  1. The OpenAI API does not currently fully support this behavior, though I believe this option should be added to give developers explicit control in the future.

    For the OpenAI API, previous reasoning tokens are still dropped server-side. When store=True is used, clients must continue to pass back the reasoning item IDs to avoid API errors, even though the actual reasoning content is not preserved.

    When store=False, the API allows developers to skip sending previous encrypted reasoning content, which helps reduce network bandwidth usage. So the proposed configuration still makes sense and provides real value by allowing explicit control over whether previous encrypted reasoning items are sent (see the sketch after this list).

  2. Because this is context management logic, not a model server-side parameter, I am unsure whether this belongs in ModelSettings, especially if it applies to LiteLLM only. I think it may be more appropriate as a top-level agent and run configuration as demonstrated in the code proposal above.
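To make the OpenAI case in point 1 concrete, here is a rough sketch of how a provider converter might decide what to send back, assuming Responses-API-style item dicts with a "type" field. The function name and the policy strings are hypothetical:

def filter_previous_reasoning(items: list[dict], policy: str, store: bool) -> list[dict]:
    """Illustrative only; not existing SDK behavior."""
    filtered = []
    for item in items:
        if item.get("type") != "reasoning":
            filtered.append(item)
        elif store:
            # store=True: reasoning item IDs must still be passed back to
            # avoid API errors, even though the content is not preserved.
            filtered.append(item)
        elif policy == "keep_and_send":
            # store=False: send previous encrypted reasoning content only
            # when the developer has opted in.
            filtered.append(item)
        # Otherwise skip the item to reduce network bandwidth usage.
    return filtered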

I’d like to get feedback or suggestions on the API design. If this approach makes sense, I can work on an implementation after #2158 is merged.
