OpenAI's primary API for building AI agents — combines text generation, built-in web search, file search, code interpreter, and computer use in a single endpoint with server-side tool orchestration.
The OpenAI Responses API is OpenAI's recommended API for building AI-powered applications, designed to succeed the Chat Completions API with native support for agentic workflows. The key innovation is server-side tool orchestration: when tools are enabled, the model autonomously decides when to search the web, read files, execute code, or chain multiple steps within a single API call — eliminating the need for client-side tool-loop management.
Built-in tools include web search (powered by OpenAI's own search infrastructure with real-time results), file search (vector-based retrieval over uploaded documents with automatic chunking and ranking), code interpreter via containers (sandboxed Python execution for data analysis and visualization), and computer use (in preview — screen-based desktop automation via screenshots and mouse/keyboard control).
The API introduces guaranteed structured outputs via JSON Schema enforcement, meaning responses always conform to developer-specified formats without hallucinating invalid values. For cost optimization, the API offers prompt caching (up to 90% discount on repeated input prefixes), batch processing (50% discount on async workloads), and model distillation for creating fine-tuned cheaper variants.
Current model pricing on the Responses API ranges from GPT-5.4-nano at $0.20/$1.25 per million tokens (input/output) to GPT-5.4-pro at $30/$180 per million tokens. Web search tool calls cost $25 per 1,000 calls for GPT-4o/4.1 models and $10 per 1,000 calls for reasoning models (gpt-5 and newer, with free search content tokens). File search costs $2.50 per 1,000 calls plus $0.10/GB/day storage (1 GB free). Container pricing for code interpreter is $0.03 per session (1 GB) up to $1.92 per session (64 GB), transitioning to 20-minute session billing on March 31, 2026.
The Responses API works with all current OpenAI models and supports streaming, function calling for custom tools alongside built-in ones, MCP protocol integration for external tool access, and parallel tool execution. Responses API, Chat Completions API, Realtime API, Batch API, and Assistants API are not priced separately — tokens are billed at the chosen model's per-token rates.
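Since streaming is one of the supported modes, here is a minimal sketch of a streaming request payload. The model name is taken from this page's pricing table; the `input` and `stream` keys follow current openai-python SDK conventions, so treat the exact shape as an assumption to verify against the API reference.

```python
# Sketch of a streaming Responses API request payload (an illustration,
# not a definitive implementation of the SDK's interface).
def build_streaming_request(model: str, prompt: str) -> dict:
    """Assemble keyword arguments for client.responses.create(**payload)."""
    return {
        "model": model,
        "input": prompt,
        "stream": True,  # ask the server for incremental events, not one blob
    }

payload = build_streaming_request("gpt-5.4-mini", "Summarize this changelog.")
# With the official SDK, the payload would be unpacked as:
#   for event in client.responses.create(**payload): ...
```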
Was this helpful?
The API handles multi-step tool use internally — the model chains web searches, file reads, code execution, and custom function calls within a single request without client-side orchestration loops.
Use Case:
A single API call where the model searches the web for competitor pricing, processes results with code interpreter to build a comparison table, and returns structured JSON — no client-side loop management needed.
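The single-call workflow above can be sketched as one request payload that enables both built-in tools and a structured-output schema. The tool type strings (`web_search`, `code_interpreter`) and the `text.format` shape are assumptions based on current SDK conventions; the schema itself is a hypothetical comparison-table format.

```python
# Hedged sketch: one request combining web search, code execution, and a
# JSON Schema output format, orchestrated server-side.
COMPARISON_SCHEMA = {
    "type": "object",
    "properties": {
        "competitors": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "monthly_price_usd": {"type": "number"},
                },
                "required": ["name", "monthly_price_usd"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["competitors"],
    "additionalProperties": False,
}

def build_agent_request() -> dict:
    return {
        "model": "gpt-5.4",
        "input": "Find current pricing for our top three competitors "
                 "and return a comparison table.",
        "tools": [
            {"type": "web_search"},  # live results, billed per call
            {"type": "code_interpreter", "container": {"type": "auto"}},
        ],
        "text": {
            "format": {
                "type": "json_schema",
                "name": "comparison",
                "schema": COMPARISON_SCHEMA,
                "strict": True,
            }
        },
    }

request = build_agent_request()
```

The client sends this once and receives the final structured answer; the intermediate search-then-analyze chaining happens server-side.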
Native web search powered by OpenAI's search infrastructure providing real-time information access. Verified pricing: $25/1K calls for GPT-4o/4.1 models, $10/1K calls for reasoning models (gpt-5 and newer) with free search content tokens.
Use Case:
Building a research agent that answers questions about current events, recent product launches, or live pricing data without integrating a separate search API.
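A research agent like the one described needs only the built-in tool enabled; a minimal sketch, assuming the `web_search` tool type string from current SDK docs:

```python
# Minimal research-agent request using only the built-in web search tool.
def build_research_request(question: str) -> dict:
    return {
        "model": "gpt-5.4-mini",
        "input": question,
        "tools": [{"type": "web_search"}],  # billed per call, per the pricing above
    }

req = build_research_request("What AI models launched this month?")
```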
JSON Schema enforcement ensures every response conforms exactly to a developer-specified format — no invalid values, missing fields, or schema violations. This eliminates a whole class of JSON parsing errors.
Use Case:
An extraction pipeline that processes invoices and always returns {vendor, amount, date, line_items[]} in exactly the right format, without ever producing malformed JSON or hallucinated field names.
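The invoice format from the use case can be written as the JSON Schema the API would enforce. Field names mirror the example ({vendor, amount, date, line_items[]}); the `text.format` wrapper and the `strict` flag are assumptions about the current SDK shape.

```python
# The invoice schema from the use case above, as enforceable JSON Schema.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "amount": {"type": "number"},
        "date": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "total": {"type": "number"},
                },
                "required": ["description", "total"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["vendor", "amount", "date", "line_items"],
    "additionalProperties": False,
}

def build_invoice_request(invoice_text: str) -> dict:
    return {
        "model": "gpt-5.4-mini",
        "input": invoice_text,
        "text": {
            "format": {
                "type": "json_schema",
                "name": "invoice",
                "schema": INVOICE_SCHEMA,
                "strict": True,  # assumed flag for exact enforcement
            }
        },
    }

req = build_invoice_request("ACME Corp, $120.00, 2026-01-15, 2 line items")
```

With `additionalProperties: False` and every field listed in `required`, the enforced output cannot contain hallucinated field names or omit expected ones.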
Sandboxed Python execution environment for data analysis, calculations, file processing, and chart generation. Pricing: $0.03 (1 GB) to $1.92 (64 GB) per session, with 20-minute session billing starting March 31, 2026.
Use Case:
An analyst uploads a CSV of 100K sales records and asks for quarterly trends. The model writes Python code, executes it in a container, generates charts, and returns both the analysis and downloadable visualizations.
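A sketch of that CSV-analysis request: the uploaded file is attached to the code interpreter container so the generated Python can read it. The `container`/`file_ids` shape is an assumption from current SDK examples, and `file-abc123` is a placeholder ID.

```python
# Code-interpreter request with an uploaded file attached to the container.
def build_analysis_request(file_id: str) -> dict:
    return {
        "model": "gpt-5.4",
        "input": "Load the attached CSV and chart quarterly sales trends.",
        "tools": [
            {
                "type": "code_interpreter",
                "container": {"type": "auto", "file_ids": [file_id]},
            }
        ],
    }

req = build_analysis_request("file-abc123")  # placeholder upload ID
```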
Screen-based interaction with desktop applications via screenshots and mouse/keyboard control — enabling automation of workflows in applications that lack APIs. Priced at $3 input / $12 output per 1M tokens.
Use Case:
Automating data entry in a legacy ERP system by having the agent navigate the application UI, fill in forms, and verify submissions — useful for enterprise apps with no API access.
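A hedged sketch of what a computer-use request might look like. The tool type (`computer_use_preview`), the environment values, the dedicated preview model name, and the `truncation` flag are all assumptions based on the public preview documentation; the real agent loop also feeds screenshots back each turn, which is omitted here.

```python
# Illustrative computer-use request payload (preview feature; names assumed).
def build_computer_use_request(task: str) -> dict:
    return {
        "model": "computer-use-preview",  # assumed preview model name
        "input": task,
        "tools": [
            {
                "type": "computer_use_preview",
                "display_width": 1280,
                "display_height": 800,
                "environment": "windows",  # the legacy ERP is a desktop app
            }
        ],
        "truncation": "auto",  # assumed requirement for computer use
    }

req = build_computer_use_request("Enter the pending orders into the ERP form.")
```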
Prompt caching gives up to 90% discount on repeated input prefixes (e.g., GPT-5.4 cached input at $0.25/1M vs $2.50/1M standard). Batch API processes async workloads at 50% discount over 24-hour windows.
Use Case:
A content moderation system processes 1 million user posts daily using Batch API at half the per-token cost, with a shared system prompt that benefits from prompt caching across all requests.
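The moderation pipeline above would submit its million posts as a JSONL file, one request per line. A sketch of building one such line, where the `/v1/responses` endpoint path and the role-based input list are assumptions to verify against the Batch API docs; the shared system prompt is the cache-friendly prefix the use case describes.

```python
import json

# Shared prefix across all requests: eligible for prompt caching.
MODERATION_PROMPT = "Classify the following post as SAFE or UNSAFE."

def batch_line(post_id: str, post_text: str) -> str:
    """One JSONL line for a Batch API input file (endpoint path assumed)."""
    return json.dumps({
        "custom_id": post_id,   # your ID, echoed back in the results file
        "method": "POST",
        "url": "/v1/responses",
        "body": {
            "model": "gpt-5.4-mini",
            "input": [
                {"role": "system", "content": MODERATION_PROMPT},  # cached prefix
                {"role": "user", "content": post_text},
            ],
        },
    })

line = batch_line("post-001", "Check out this great recipe!")
```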
GPT-5.4: $2.50 input / $15.00 output per 1M tokens
GPT-5.4-mini: $0.75 input / $4.50 output per 1M tokens
GPT-5.4-nano: $0.20 input / $1.25 output per 1M tokens
GPT-5.4-pro: $30.00 input / $180.00 output per 1M tokens
Other models: Varies
Ready to get started with OpenAI Responses API?
View Pricing Options →
Production systems that need guaranteed-format outputs — invoice processing, form extraction, content classification — where JSON Schema enforcement eliminates parsing failures entirely.
Agents that combine web search, document analysis, and code execution in a single workflow — market research, competitive analysis, or due diligence tasks that need current information.
Companies processing millions of items (moderation, classification, extraction) who benefit from Batch API's 50% discount and prompt caching's 90% input reduction.
Developers who want to build functional agents quickly without managing tool orchestration infrastructure — the API handles search, code execution, and file analysis server-side.
The Responses API adds built-in tools (web search, file search, code interpreter, computer use), server-side tool orchestration (the model chains multiple tool calls in one request), guaranteed structured outputs, and a richer conversation model. It's designed for agent workflows. Chat Completions still works but new features focus on Responses.
No. There is no API surcharge — you pay the same per-token rates regardless of which API you use (Responses, Chat Completions, Realtime, Batch, or Assistants). The only additional costs are for built-in tool usage: web search calls, file search calls, and container sessions.
Yes. Custom function definitions work alongside web search, file search, and code interpreter in the same request. The model can decide to use any combination of built-in and custom tools within a single orchestration loop.
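Mixing a custom function with a built-in tool in one request can be sketched as below. In the Responses API the function definition is assumed to be flat (`name` and `parameters` at the top level, unlike Chat Completions' nested form), and `get_internal_price` is a hypothetical helper for illustration.

```python
# One tools array mixing a server-side built-in with a client-executed function.
TOOLS = [
    {"type": "web_search"},  # built-in: executed server-side
    {
        "type": "function",  # custom: your code runs this when the model calls it
        "name": "get_internal_price",
        "description": "Look up our own list price for a SKU.",
        "parameters": {
            "type": "object",
            "properties": {"sku": {"type": "string"}},
            "required": ["sku"],
            "additionalProperties": False,
        },
    },
]

def build_mixed_tool_request(prompt: str) -> dict:
    return {"model": "gpt-5.4", "input": prompt, "tools": TOOLS}

req = build_mixed_tool_request("Compare the market price of SKU A-100 to ours.")
```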
MCP (Model Context Protocol) is a standard for connecting AI models to external tools and data sources. The Responses API supports MCP, meaning agents can invoke any MCP-compatible tool server — accessing databases, APIs, or custom services through a standardized interface.
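Attaching an MCP server looks roughly like adding one more entry to the tools array. The server label and URL below are hypothetical placeholders, and the field names (`server_label`, `server_url`, `require_approval`) are assumptions from current documentation.

```python
# Sketch of pointing the Responses API at an MCP tool server.
def build_mcp_request(question: str) -> dict:
    return {
        "model": "gpt-5.4",
        "input": question,
        "tools": [
            {
                "type": "mcp",
                "server_label": "internal_db",            # hypothetical label
                "server_url": "https://example.com/mcp",  # hypothetical server
                "require_approval": "never",  # or require sign-off per call
            }
        ],
    }

req = build_mcp_request("What were last month's totals?")
```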
All current OpenAI models including GPT-5.4, GPT-5.4-mini, GPT-5.4-nano, GPT-5.4-pro, reasoning models (o3, o4-mini), and legacy GPT-4o/4.1 series. Each model has different pricing and capability tradeoffs.
People who use this tool also find these helpful
Anthropic's AI assistant with advanced reasoning, extended thinking, coding tools, and context windows up to 1M tokens — available as a consumer product and developer API.
Google's multimodal AI assistant with deep integration into Google services, web search, and advanced reasoning capabilities.
AI-powered translation service with superior accuracy and context understanding
Anthropic's developer platform for building with Claude AI models via API, featuring the Workbench for prompt engineering, usage analytics, and team management.
AI writing assistant for content creation with multiple formats and tones.
All-in-one AI design and content creation platform for marketing teams.
See how OpenAI Responses API compares to Gemini and other alternatives
View Full Comparison →
AI Models
Google's multimodal AI assistant with deep integration into Google services, web search, and advanced reasoning capabilities.
AI Agent Builders
OpenAI's official open-source framework for building agentic AI applications with minimal abstractions. Production-ready successor to Swarm, providing agents, handoffs, guardrails, and tracing primitives that work with Python and TypeScript.
Get started with OpenAI Responses API and see if it's the right fit for your needs.
Get Started →
Take our 60-second quiz to get personalized tool recommendations
Find Your Perfect AI Stack →
Explore 20 ready-to-deploy AI agent templates for sales, support, dev, research, and operations.
Browse Agent Templates →