OpenAI's primary API for building AI agents — combines text generation, built-in web search, file search, code interpreter, and computer use in a single endpoint with server-side tool orchestration.
The OpenAI Responses API is OpenAI's recommended API for building AI-powered applications, designed to succeed the Chat Completions API with native support for agentic workflows. The key innovation is server-side tool orchestration: when tools are enabled, the model autonomously decides when to search the web, read files, execute code, or chain multiple steps within a single API call — eliminating the need for client-side tool-loop management.
Built-in tools include web search (powered by OpenAI's own search infrastructure with real-time results), file search (vector-based retrieval over uploaded documents with automatic chunking and ranking), code interpreter via containers (sandboxed Python execution for data analysis and visualization), and computer use (in preview — screen-based desktop automation via screenshots and mouse/keyboard control).
The API introduces guaranteed structured outputs via JSON Schema enforcement, meaning responses always conform to developer-specified formats without hallucinating invalid values. For cost optimization, the API offers prompt caching (up to 90% discount on repeated input prefixes), batch processing (50% discount on async workloads), and model distillation for creating fine-tuned cheaper variants.
Current model pricing on the Responses API ranges from GPT-5.4-nano at $0.20/$1.25 per million tokens (input/output) to GPT-5.4-pro at $30/$180 per million tokens. Web search tool calls cost $25 per 1,000 calls for GPT-4o/4.1 models and $10 per 1,000 calls for reasoning models (gpt-5 and newer, with free search content tokens). File search costs $2.50 per 1,000 calls plus $0.10/GB/day storage (1 GB free). Container pricing for code interpreter is $0.03 per session (1 GB) up to $1.92 per session (64 GB), transitioning to 20-minute session billing on March 31, 2026.
The Responses API works with all current OpenAI models and supports streaming, function calling for custom tools alongside built-in ones, MCP protocol integration for external tool access, and parallel tool execution. Responses API, Chat Completions API, Realtime API, Batch API, and Assistants API are not priced separately — tokens are billed at the chosen model's per-token rates.
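Since streaming is one of the supported modes, here is a minimal sketch of a streaming request payload. The model name is taken from this page's pricing table; the `input` and `stream` keys follow current openai-python SDK conventions, so treat the exact shape as an assumption to verify against the API reference.

```python
# Sketch of a streaming Responses API request payload (an illustration,
# not a definitive implementation of the SDK's interface).
def build_streaming_request(model: str, prompt: str) -> dict:
    """Assemble keyword arguments for client.responses.create(**payload)."""
    return {
        "model": model,
        "input": prompt,
        "stream": True,  # ask the server for incremental events, not one blob
    }

payload = build_streaming_request("gpt-5.4-mini", "Summarize this changelog.")
# With the official SDK, the payload would be unpacked as:
#   for event in client.responses.create(**payload): ...
```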
Was this helpful?
The API handles multi-step tool use internally — the model chains web searches, file reads, code execution, and custom function calls within a single request without client-side orchestration loops.
Use Case:
A single API call where the model searches the web for competitor pricing, processes results with code interpreter to build a comparison table, and returns structured JSON — no client-side loop management needed.
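The single-call workflow above can be sketched as one request payload that enables both built-in tools and a structured-output schema. The tool type strings (`web_search`, `code_interpreter`) and the `text.format` shape are assumptions based on current SDK conventions; the schema itself is a hypothetical comparison-table format.

```python
# Hedged sketch: one request combining web search, code execution, and a
# JSON Schema output format, orchestrated server-side.
COMPARISON_SCHEMA = {
    "type": "object",
    "properties": {
        "competitors": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "monthly_price_usd": {"type": "number"},
                },
                "required": ["name", "monthly_price_usd"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["competitors"],
    "additionalProperties": False,
}

def build_agent_request() -> dict:
    return {
        "model": "gpt-5.4",
        "input": "Find current pricing for our top three competitors "
                 "and return a comparison table.",
        "tools": [
            {"type": "web_search"},  # live results, billed per call
            {"type": "code_interpreter", "container": {"type": "auto"}},
        ],
        "text": {
            "format": {
                "type": "json_schema",
                "name": "comparison",
                "schema": COMPARISON_SCHEMA,
                "strict": True,
            }
        },
    }

request = build_agent_request()
```

The client sends this once and receives the final structured answer; the intermediate search-then-analyze chaining happens server-side.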
Native web search powered by OpenAI's search infrastructure providing real-time information access. Verified pricing: $25/1K calls for GPT-4o/4.1 models, $10/1K calls for reasoning models (gpt-5 and newer) with free search content tokens.
Use Case:
Building a research agent that answers questions about current events, recent product launches, or live pricing data without integrating a separate search API.
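A research agent like the one described needs only the built-in tool enabled; a minimal sketch, assuming the `web_search` tool type string from current SDK docs:

```python
# Minimal research-agent request using only the built-in web search tool.
def build_research_request(question: str) -> dict:
    return {
        "model": "gpt-5.4-mini",
        "input": question,
        "tools": [{"type": "web_search"}],  # billed per call, per the pricing above
    }

req = build_research_request("What AI models launched this month?")
```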
JSON Schema enforcement ensures every response conforms exactly to a developer-specified format — no invalid values, missing fields, or schema violations. This eliminates a whole class of JSON parsing errors.
Use Case:
An extraction pipeline that processes invoices and always returns {vendor, amount, date, line_items[]} in exactly the right format, without ever producing malformed JSON or hallucinated field names.
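The invoice format from the use case can be written as the JSON Schema the API would enforce. Field names mirror the example ({vendor, amount, date, line_items[]}); the `text.format` wrapper and the `strict` flag are assumptions about the current SDK shape.

```python
# The invoice schema from the use case above, as enforceable JSON Schema.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "amount": {"type": "number"},
        "date": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "total": {"type": "number"},
                },
                "required": ["description", "total"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["vendor", "amount", "date", "line_items"],
    "additionalProperties": False,
}

def build_invoice_request(invoice_text: str) -> dict:
    return {
        "model": "gpt-5.4-mini",
        "input": invoice_text,
        "text": {
            "format": {
                "type": "json_schema",
                "name": "invoice",
                "schema": INVOICE_SCHEMA,
                "strict": True,  # assumed flag for exact enforcement
            }
        },
    }

req = build_invoice_request("ACME Corp, $120.00, 2026-01-15, 2 line items")
```

With `additionalProperties: False` and every field listed in `required`, the enforced output cannot contain hallucinated field names or omit expected ones.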
Sandboxed Python execution environment for data analysis, calculations, file processing, and chart generation. Pricing: $0.03 (1 GB) to $1.92 (64 GB) per session, with 20-minute session billing starting March 31, 2026.
Use Case:
An analyst uploads a CSV of 100K sales records and asks for quarterly trends. The model writes Python code, executes it in a container, generates charts, and returns both the analysis and downloadable visualizations.
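A sketch of that CSV-analysis request: the uploaded file is attached to the code interpreter container so the generated Python can read it. The `container`/`file_ids` shape is an assumption from current SDK examples, and `file-abc123` is a placeholder ID.

```python
# Code-interpreter request with an uploaded file attached to the container.
def build_analysis_request(file_id: str) -> dict:
    return {
        "model": "gpt-5.4",
        "input": "Load the attached CSV and chart quarterly sales trends.",
        "tools": [
            {
                "type": "code_interpreter",
                "container": {"type": "auto", "file_ids": [file_id]},
            }
        ],
    }

req = build_analysis_request("file-abc123")  # placeholder upload ID
```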
Screen-based interaction with desktop applications via screenshots and mouse/keyboard control — enabling automation of workflows in applications that lack APIs. Priced at $3 input / $12 output per 1M tokens.
Use Case:
Automating data entry in a legacy ERP system by having the agent navigate the application UI, fill in forms, and verify submissions — useful for enterprise apps with no API access.
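A hedged sketch of what a computer-use request might look like. The tool type (`computer_use_preview`), the environment values, the dedicated preview model name, and the `truncation` flag are all assumptions based on the public preview documentation; the real agent loop also feeds screenshots back each turn, which is omitted here.

```python
# Illustrative computer-use request payload (preview feature; names assumed).
def build_computer_use_request(task: str) -> dict:
    return {
        "model": "computer-use-preview",  # assumed preview model name
        "input": task,
        "tools": [
            {
                "type": "computer_use_preview",
                "display_width": 1280,
                "display_height": 800,
                "environment": "windows",  # the legacy ERP is a desktop app
            }
        ],
        "truncation": "auto",  # assumed requirement for computer use
    }

req = build_computer_use_request("Enter the pending orders into the ERP form.")
```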
Prompt caching gives up to 90% discount on repeated input prefixes (e.g., GPT-5.4 cached input at $0.25/1M vs $2.50/1M standard). Batch API processes async workloads at 50% discount over 24-hour windows.
Use Case:
A content moderation system processes 1 million user posts daily using Batch API at half the per-token cost, with a shared system prompt that benefits from prompt caching across all requests.
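The moderation pipeline above would submit its million posts as a JSONL file, one request per line. A sketch of building one such line, where the `/v1/responses` endpoint path and the role-based input list are assumptions to verify against the Batch API docs; the shared system prompt is the cache-friendly prefix the use case describes.

```python
import json

# Shared prefix across all requests: eligible for prompt caching.
MODERATION_PROMPT = "Classify the following post as SAFE or UNSAFE."

def batch_line(post_id: str, post_text: str) -> str:
    """One JSONL line for a Batch API input file (endpoint path assumed)."""
    return json.dumps({
        "custom_id": post_id,   # your ID, echoed back in the results file
        "method": "POST",
        "url": "/v1/responses",
        "body": {
            "model": "gpt-5.4-mini",
            "input": [
                {"role": "system", "content": MODERATION_PROMPT},  # cached prefix
                {"role": "user", "content": post_text},
            ],
        },
    })

line = batch_line("post-001", "Check out this great recipe!")
```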
GPT-5.4: $2.50 input / $15.00 output per 1M tokens
GPT-5.4-mini: $0.75 input / $4.50 output per 1M tokens
GPT-5.4-nano: $0.20 input / $1.25 output per 1M tokens
GPT-5.4-pro: $30.00 input / $180.00 output per 1M tokens
Other models: Varies
Ready to get started with OpenAI Responses API?
View Pricing Options →
Production systems that need guaranteed-format outputs — invoice processing, form extraction, content classification — where JSON Schema enforcement eliminates parsing failures entirely.
Agents that combine web search, document analysis, and code execution in a single workflow — market research, competitive analysis, or due diligence tasks that need current information.
Companies processing millions of items (moderation, classification, extraction) who benefit from Batch API's 50% discount and prompt caching's 90% input reduction.
Developers who want to build functional agents quickly without managing tool orchestration infrastructure — the API handles search, code execution, and file analysis server-side.
The Responses API adds built-in tools (web search, file search, code interpreter, computer use), server-side tool orchestration (the model chains multiple tool calls in one request), guaranteed structured outputs, and a richer conversation model. It's designed for agent workflows. Chat Completions still works but new features focus on Responses.
No. There is no API surcharge — you pay the same per-token rates regardless of which API you use (Responses, Chat Completions, Realtime, Batch, or Assistants). The only additional costs are for built-in tool usage: web search calls, file search calls, and container sessions.
Yes. Custom function definitions work alongside web search, file search, and code interpreter in the same request. The model can decide to use any combination of built-in and custom tools within a single orchestration loop.
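Mixing a custom function with a built-in tool in one request can be sketched as below. In the Responses API the function definition is assumed to be flat (`name` and `parameters` at the top level, unlike Chat Completions' nested form), and `get_internal_price` is a hypothetical helper for illustration.

```python
# One tools array mixing a server-side built-in with a client-executed function.
TOOLS = [
    {"type": "web_search"},  # built-in: executed server-side
    {
        "type": "function",  # custom: your code runs this when the model calls it
        "name": "get_internal_price",
        "description": "Look up our own list price for a SKU.",
        "parameters": {
            "type": "object",
            "properties": {"sku": {"type": "string"}},
            "required": ["sku"],
            "additionalProperties": False,
        },
    },
]

def build_mixed_tool_request(prompt: str) -> dict:
    return {"model": "gpt-5.4", "input": prompt, "tools": TOOLS}

req = build_mixed_tool_request("Compare the market price of SKU A-100 to ours.")
```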
MCP (Model Context Protocol) is a standard for connecting AI models to external tools and data sources. The Responses API supports MCP, meaning agents can invoke any MCP-compatible tool server — accessing databases, APIs, or custom services through a standardized interface.
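Attaching an MCP server looks roughly like adding one more entry to the tools array. The server label and URL below are hypothetical placeholders, and the field names (`server_label`, `server_url`, `require_approval`) are assumptions from current documentation.

```python
# Sketch of pointing the Responses API at an MCP tool server.
def build_mcp_request(question: str) -> dict:
    return {
        "model": "gpt-5.4",
        "input": question,
        "tools": [
            {
                "type": "mcp",
                "server_label": "internal_db",            # hypothetical label
                "server_url": "https://example.com/mcp",  # hypothetical server
                "require_approval": "never",  # or require sign-off per call
            }
        ],
    }

req = build_mcp_request("What were last month's totals?")
```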
All current OpenAI models including GPT-5.4, GPT-5.4-mini, GPT-5.4-nano, GPT-5.4-pro, reasoning models (o3, o4-mini), and legacy GPT-4o/4.1 series. Each model has different pricing and capability tradeoffs.
People who use this tool also find these helpful
Anthropic's AI assistant with advanced reasoning, extended thinking, coding tools, and context windows up to 1M tokens — available as a consumer product and developer API.
Google's multimodal AI assistant with deep integration into Google services, web search, and advanced reasoning capabilities.
AI-powered translation service with superior accuracy and context understanding
Anthropic's developer platform for building with Claude AI models via API, featuring the Workbench for prompt engineering, usage analytics, and team management.
AI writing assistant for content creation with multiple formats and tones.
All-in-one AI design and content creation platform for marketing teams.
See how OpenAI Responses API compares to Gemini and other alternatives
View Full Comparison →
AI Models
Google's multimodal AI assistant with deep integration into Google services, web search, and advanced reasoning capabilities.
AI Agent Builders
OpenAI's official open-source framework for building agentic AI applications with minimal abstractions. Production-ready successor to Swarm, providing agents, handoffs, guardrails, and tracing primitives that work with Python and TypeScript.
Get started with OpenAI Responses API and see if it's the right fit for your needs.
Get Started →
Take our 60-second quiz to get personalized tool recommendations
Find Your Perfect AI Stack →
Explore 20 ready-to-deploy AI agent templates for sales, support, dev, research, and operations.
Browse Agent Templates →