No free plan. The cheapest way in is Low-cost model tier at GPT-5 nano: $0.05 / 1M input tokens, $0.005 / 1M cached input tokens, $0.40 / 1M output tokens. Consider free alternatives in the ai models category if budget is tight.
The Responses API is OpenAI's more general interface for generating model responses with stateful interactions, structured JSON outputs, and built-in tools. It supports text and image inputs, file inputs, streaming, function calling, and tools such as web search and file search from the same endpoint. Chat Completions is still a familiar pattern for chat-style generation, but Responses is better suited when the application needs tool calls, retrieval, conversation state, or structured outputs in one workflow.
No monthly subscription tier is visible in the provided OpenAI API pricing documentation for the Responses API. It is priced as pay-per-use: tokens are billed at the selected model's input, cached input where supported, and output rates, and built-in tools have their own usage charges. Teams should verify the current OpenAI pricing page before estimating production cost because model names, availability, and rates can change.
OpenAI documents built-in tools and tool categories including web search, file search, code interpreter, computer use, MCP tools, and custom function calls. The tools parameter lets developers specify which tools the model may call while generating a response, and tool_choice can guide how the model selects tools. The max_tool_calls parameter is important in production because it caps total built-in tool calls across a response, helping control latency and cost.
Teams should estimate both model tokens and tool usage, because the API itself is not priced separately but tools can add meaningful cost. Start with the selected model's input, cached input, and output token rates, then add web search at $10.00 per 1K calls, file search tool calls at $2.50 per 1K calls, retained file search storage at $0.10 per GB-day after the first free GB, and container usage at $0.03 for 1 GB or $1.92 for 64 GB per 20-minute session per container. Production deployments should enforce max_tool_calls, prefer cheaper mini or nano models for routine steps, use prompt caching and Batch API where supported, clean up stored files, and set project-level budgets or alerts.
The Responses API is best for teams that want a managed OpenAI endpoint with built-in search, retrieval, code execution, structured output, and function calling. It is especially useful for product teams building agents, data extraction systems, research assistants, and internal automation tools. Teams that need vendor-neutral model routing may prefer an orchestration layer above the model APIs, while teams deeply invested in Google Cloud or Anthropic-specific behavior may compare Gemini or Anthropic directly.
See OpenAI Responses API plans and find the right tier for your needs.
See Pricing Plans →Still not sure? Read our full verdict →
Last verified March 2026