OpenAI Responses API Pricing & Plans 2026

Complete pricing guide for OpenAI Responses API. Compare all plans, analyze costs, and find the perfect tier for your needs.

🆓 Free Tier Available

💎 4 Usage-Priced Model Tiers

⚡ No Setup Fees

Choose Your Plan

Low-cost model tier

GPT-5 nano: $0.05 / 1M input tokens, $0.005 / 1M cached input tokens, $0.40 / 1M output tokens

✓Lower-cost model option for high-volume lightweight tasks
✓Input tokens billed by usage
✓Output tokens billed by usage
✓Built-in tools billed separately

Mini model tier

GPT-5 mini: $0.25 / 1M input tokens, $0.025 / 1M cached input tokens, $2.00 / 1M output tokens

✓Lower-cost model tier for general production workloads
✓Input tokens billed by usage
✓Output tokens billed by usage
✓Built-in tools billed separately

Flagship model tier

GPT-5.4: $2.50 / 1M input tokens, $0.25 / 1M cached input tokens, $15.00 / 1M output tokens

✓More capable model tier for agent workflows
✓Input tokens billed by usage
✓Cached input may be billed at a lower rate where supported
✓Output tokens billed by usage

Higher-capability model tier

GPT-5.5: $5.00 / 1M input tokens, $0.50 / 1M cached input tokens, $30.00 / 1M output tokens

✓Higher-capability model tier where available
✓Input tokens billed by usage
✓Cached input may be billed at a lower rate where supported
✓Output tokens billed by usage

Built-in Tools

Web search: $10.00 / 1K calls
File search tool calls: $2.50 / 1K calls
File search storage: $0.10 / GB-day, first GB free
Containers: $0.03 (1 GB) or $1.92 (64 GB) per 20-minute session per container

✓Web search billed separately where used
✓File search tool calls billed separately where used
✓File search storage billed separately where retained
✓Code interpreter and hosted shell containers billed by session or container usage where used

Pricing sourced from OpenAI Responses API · Last verified March 2026

Feature Comparison

Features	Low-cost model tier	Mini model tier	Flagship model tier	Higher-capability model tier	Built-in Tools
Lower-cost model option for high-volume lightweight tasks	✓	✓	✓	✓	✓
Input tokens billed by usage	✓	✓	✓	✓	✓
Output tokens billed by usage	✓	✓	✓	✓	✓
Built-in tools billed separately	✓	✓	✓	✓	✓
Lower-cost model tier for general production workloads	—	✓	✓	✓	✓
More capable model tier for agent workflows	—	—	✓	✓	✓
Cached input may be billed at a lower rate where supported	✓	✓	✓	✓	✓
Higher-capability model tier where available	—	—	—	✓	✓
Web search billed separately where used	—	—	—	—	✓
File search tool calls billed separately where used	—	—	—	—	✓
File search storage billed separately where retained	—	—	—	—	✓
Code interpreter and hosted shell containers billed by session or container usage where used	—	—	—	—	✓

Is OpenAI Responses API Worth It?

✅ Why Choose OpenAI Responses API

• Single endpoint supports text, image, and file inputs plus text or JSON outputs, reducing integration surface for teams already building on OpenAI.
• Built-in tool support covers web search, file search, computer use, code interpreter, MCP tools, and custom function calls, so many agent workflows can run without separate search, retrieval, and execution services.
• The API includes production controls such as max_tool_calls, parallel_tool_calls defaulting to true, stream control, truncation behavior, and conversation state through previous_response_id or conversation.
• Usage pricing is documented at the model and tool level, including separate billing for model tokens, cached input where supported, tool calls, storage, and container sessions.
• Prompt caching can materially lower repeated-prefix costs where supported by the selected model and pricing tier.
• The same API can be used for simple prompts, structured JSON extraction, streaming chat, retrieval-augmented answers, and multi-step tool use, which is useful for teams consolidating older Chat Completions or Assistants-style workflows.
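
To make the prompt-caching point concrete, here is a minimal sketch of the arithmetic using the GPT-5.4 rates listed on this page ($2.50 / 1M input tokens, $0.25 / 1M cached input tokens). The function name and the 8M-token cached prefix are illustrative assumptions; real cache hit rates depend on the model and on how requests are structured.

```python
# Rates copied from the GPT-5.4 row on this page (USD per 1M tokens).
INPUT_RATE = 2.50
CACHED_INPUT_RATE = 0.25


def input_cost(total_input_tokens: int, cached_tokens: int = 0) -> float:
    """Return the input-side cost in USD, billing cached tokens at the cached rate."""
    if not 0 <= cached_tokens <= total_input_tokens:
        raise ValueError("cached_tokens must be between 0 and total_input_tokens")
    uncached = total_input_tokens - cached_tokens
    return (uncached * INPUT_RATE + cached_tokens * CACHED_INPUT_RATE) / 1_000_000


# Hypothetical workload: 10M input tokens, 8M of them a repeated, cached prefix.
without_cache = input_cost(10_000_000)
with_cache = input_cost(10_000_000, cached_tokens=8_000_000)
print(f"no cache: ${without_cache:.2f}, with cache: ${with_cache:.2f}")
```

Output tokens are left out on purpose, since caching only affects the input side of the bill.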

⚠️ Consider This

• It is OpenAI-specific; teams that need model portability across Anthropic, Google, or open-source models will need an abstraction layer or separate implementations.
• Costs can become hard to forecast when agents are allowed to call tools repeatedly, especially because tool usage and model tokens may be billed separately.
• Computer use is a specialized automation capability and may require more validation than conventional API integrations because it depends on screen-level actions rather than stable application APIs.
• File search can have separate cost drivers for tool calls and retained storage, so large document collections require active cost management.
• The documentation page requires JavaScript/cookies in some contexts, which can make automated scraping or offline inspection less straightforward than static API documentation.
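
One way to put a ceiling on the forecasting problem is to bound tool spend per request. The sketch below pessimistically assumes every call allowed by max_tool_calls is a web search at the $10.00 / 1K rate listed above; the function name and the traffic figure are illustrative, not measured values.

```python
WEB_SEARCH_PER_1K_CALLS = 10.00  # USD, from the Built-in Tools pricing on this page


def worst_case_tool_spend(requests: int, max_tool_calls: int) -> float:
    """Upper bound on tool spend if every request uses its full tool-call cap
    and every call is billed at the web search rate."""
    if requests < 0 or max_tool_calls < 0:
        raise ValueError("requests and max_tool_calls must be non-negative")
    return requests * max_tool_calls * WEB_SEARCH_PER_1K_CALLS / 1000


# Hypothetical traffic: 50,000 requests per month, compared across three caps.
for cap in (1, 3, 10):
    print(f"cap={cap}: up to ${worst_case_tool_spend(50_000, cap):,.2f}")
```

This bounds tool charges only; model tokens consumed while the agent reads tool results are billed on top and need their own estimate.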

Pricing FAQ

How is the Responses API different from Chat Completions?

The Responses API is OpenAI's more general interface for generating model responses with stateful interactions, structured JSON outputs, and built-in tools. It supports text and image inputs, file inputs, streaming, function calling, and tools such as web search and file search from the same endpoint. Chat Completions is still a familiar pattern for chat-style generation, but Responses is better suited when the application needs tool calls, retrieval, conversation state, or structured outputs in one workflow.
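
As a rough illustration of the stateful pattern, the sketch below builds two request bodies as plain dictionaries, with the second turn pointing at the first through previous_response_id. No network call is made. The helper name, the "gpt-5-mini" model string, and the "resp_123" placeholder ID are assumptions for illustration; check the exact request schema against OpenAI's API reference.

```python
import json
from typing import Optional


def build_turn(model: str, user_text: str, previous_response_id: Optional[str] = None) -> dict:
    """Build a Responses-style request body; chaining to a prior turn is optional."""
    body = {"model": model, "input": user_text}
    if previous_response_id is not None:
        # Links this turn to earlier context instead of resending the history.
        body["previous_response_id"] = previous_response_id
    return body


first = build_turn("gpt-5-mini", "Summarize our refund policy.")
# "resp_123" is a placeholder; a real ID would come from the first response.
second = build_turn("gpt-5-mini", "Now shorten that to one sentence.", "resp_123")
print(json.dumps(second, indent=2))
```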

Does the Responses API have a monthly subscription price?

OpenAI's API pricing documentation does not list a monthly subscription tier for the Responses API. It is priced as pay-per-use: tokens are billed at the selected model's input, cached input (where supported), and output rates, and built-in tools carry their own usage charges. Verify the current OpenAI pricing page before estimating production cost, because model names, availability, and rates can change.
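
A small sketch of what pay-per-use means per request, using the four model rates listed on this page. The request size (2,000 input tokens, 500 output tokens) is a made-up example, and the names Rates and token_cost are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens, as listed on this page."""
    input: float
    cached_input: float
    output: float


RATES = {
    "GPT-5 nano": Rates(0.05, 0.005, 0.40),
    "GPT-5 mini": Rates(0.25, 0.025, 2.00),
    "GPT-5.4": Rates(2.50, 0.25, 15.00),
    "GPT-5.5": Rates(5.00, 0.50, 30.00),
}


def token_cost(model: str, input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Cost in USD for one request; cached_tokens is the cached share of input_tokens."""
    r = RATES[model]
    uncached = input_tokens - cached_tokens
    total = uncached * r.input + cached_tokens * r.cached_input + output_tokens * r.output
    return total / 1_000_000


# Hypothetical request: 2,000 input tokens, 500 output tokens, nothing cached.
for name in RATES:
    print(f"{name}: ${token_cost(name, 2_000, 500):.6f}")
```

Multiplying the per-request figure by expected monthly request volume gives a first-pass token budget, before any tool charges.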

What built-in tools can the Responses API use?

OpenAI documents built-in tools and tool categories including web search, file search, code interpreter, computer use, MCP tools, and custom function calls. The tools parameter lets developers specify which tools the model may call while generating a response, and tool_choice can guide how the model selects tools. The max_tool_calls parameter is important in production because it caps total built-in tool calls across a response, helping control latency and cost.
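
The parameters named above can be sketched as a request body. This example only builds and validates a dictionary; it does not call the API. The "web_search" tool type, the "auto" value for tool_choice, and the "gpt-5-mini" model string are assumptions to confirm against OpenAI's API reference, and build_tool_request is a hypothetical helper.

```python
def build_tool_request(model: str, prompt: str, max_tool_calls: int) -> dict:
    """Build a Responses-style request body that allows web search with a hard cap."""
    if max_tool_calls < 1:
        raise ValueError("max_tool_calls must be at least 1 when tools are enabled")
    return {
        "model": model,
        "input": prompt,
        "tools": [{"type": "web_search"}],  # which tools the model may call
        "tool_choice": "auto",              # let the model decide whether to call one
        "max_tool_calls": max_tool_calls,   # caps built-in tool calls for this response
    }


body = build_tool_request("gpt-5-mini", "What changed in the latest release notes?", 3)
print(body)
```

A body like this could then be passed to the official SDK, for example as client.responses.create(**body), assuming the installed SDK version accepts these fields.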

How should teams estimate Responses API costs?

Teams should estimate both model tokens and tool usage, because the API itself is not priced separately but tools can add meaningful cost. Start with the selected model's input, cached input, and output token rates, then add web search at $10.00 per 1K calls, file search tool calls at $2.50 per 1K calls, retained file search storage at $0.10 per GB-day after the first free GB, and container usage at $0.03 for 1 GB or $1.92 for 64 GB per 20-minute session per container. Production deployments should enforce max_tool_calls, prefer cheaper mini or nano models for routine steps, use prompt caching and Batch API where supported, clean up stored files, and set project-level budgets or alerts.
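
The tool side of that estimate can be sketched as follows, using the rates quoted above. The workload numbers are hypothetical, and two simplifications are assumed: the free first GB is applied to total stored data, and every container session uses the 1 GB size and fits within one 20-minute billing window.

```python
WEB_SEARCH_PER_1K = 10.00         # USD per 1K web search calls
FILE_SEARCH_PER_1K = 2.50         # USD per 1K file search tool calls
STORAGE_PER_GB_DAY = 0.10         # USD per GB-day of file search storage
FREE_STORAGE_GB = 1.0             # first GB is free
CONTAINER_1GB_PER_SESSION = 0.03  # USD per 20-minute session, 1 GB container


def monthly_tool_cost(web_searches: int, file_searches: int, stored_gb: float,
                      days: int, container_sessions: int) -> float:
    """Estimate monthly built-in tool charges in USD (model tokens not included)."""
    billable_gb = max(0.0, stored_gb - FREE_STORAGE_GB)
    return (
        web_searches / 1000 * WEB_SEARCH_PER_1K
        + file_searches / 1000 * FILE_SEARCH_PER_1K
        + billable_gb * STORAGE_PER_GB_DAY * days
        + container_sessions * CONTAINER_1GB_PER_SESSION
    )


# Hypothetical month: 20K web searches, 40K file searches,
# 11 GB stored for 30 days, 1,000 container sessions.
estimate = monthly_tool_cost(20_000, 40_000, 11.0, 30, 1_000)
print(f"estimated tool spend: ${estimate:.2f}")
```

With these inputs the four components come to $200, $100, $30, and $30 respectively, which shows how quickly web search can dominate the tool bill.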

Who is the Responses API best for compared with other AI model APIs?

The Responses API is best for teams that want a managed OpenAI endpoint with built-in search, retrieval, code execution, structured output, and function calling. It is especially useful for product teams building agents, data extraction systems, research assistants, and internal automation tools. Teams that need vendor-neutral model routing may prefer an orchestration layer above the model APIs, while teams deeply invested in Google Cloud or Anthropic-specific behavior may compare Gemini or Anthropic directly.

Compare OpenAI Responses API Pricing with Alternatives

Google Gemini Pricing

Google Gemini is an AI assistant tool for teams evaluating real workflows, pricing limits, strengths, drawbacks, and alternatives before committing.

OpenAI Agents SDK Pricing

OpenAI Agents SDK is an open-source Python framework for building agentic apps with handoffs, guardrails, sessions, tracing, MCP tools, sandbox agents, and realtime voice agents.
