GPT-5 Python API Documentation
This document provides a comprehensive guide to using the GPT-5 model via the Python API. It includes model details, endpoints, pricing, and code examples for various features.
Model: GPT-5
GPT-5 is our flagship model for coding, reasoning, and agentic tasks across domains.
- Context Window: 400,000 tokens
- Max Output Tokens: 128,000 tokens
- Knowledge Cutoff: Sep 30, 2024
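The limits above can be turned into a quick budget check before sending a request. The following is a minimal sketch, assuming input and output tokens count against the same context window; the helper name `max_input_tokens` is hypothetical and not part of the SDK.

```python
# Limits copied from the model details above.
CONTEXT_WINDOW = 400_000
MAX_OUTPUT_TOKENS = 128_000


def max_input_tokens(requested_output_tokens: int) -> int:
    """Return how many input tokens remain once output space is reserved.

    Assumption: input and output tokens share the same context window.
    """
    if not 0 < requested_output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("requested_output_tokens must be between 1 and 128,000")
    return CONTEXT_WINDOW - requested_output_tokens


print(max_input_tokens(128_000))  # 272000
print(max_input_tokens(8_000))    # 392000
```

Reserving less output space leaves more room for the prompt, which matters mostly for very long inputs.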
Pricing
| Type | Per 1M tokens |
|---|---|
| Input | $1.25 |
| Cached Input | $0.125 |
| Output | $10.00 |
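To make the table concrete, here is a rough cost estimator built from the three prices above. It is a sketch only: the function name `estimate_cost` is hypothetical, and it assumes cached tokens are a subset of the input tokens and are billed at the cached rate instead of the full input rate.

```python
# Prices in USD per 1M tokens, copied from the pricing table above.
PRICE_PER_MILLION = {"input": 1.25, "cached_input": 0.125, "output": 10.00}


def estimate_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached_tokens is the part of input_tokens served from cache,
    so only the remainder is billed at the full input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * PRICE_PER_MILLION["input"]
        + cached_tokens * PRICE_PER_MILLION["cached_input"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000


cost = estimate_cost(input_tokens=10_000, cached_tokens=4_000, output_tokens=2_000)
print(f"${cost:.4f}")  # $0.0280
```

In this example the 2,000 output tokens account for most of the cost, since output is priced at eight times the uncached input rate.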
Modalities
| Modality | Support |
|---|---|
| Text | Input and Output |
| Image | Input only |
| Audio | Not supported |
Endpoints
- Chat Completions: v1/chat/completions
- Responses: v1/responses
- Realtime: v1/realtime
- Assistants: v1/assistants
- Batch: v1/batch
- Fine-tuning: v1/fine-tuning
- Embeddings: v1/embeddings
- Image Generation: v1/images/generations
- Image Edit: v1/images/edits
- Speech Generation: v1/audio/speech
- Transcription: v1/audio/transcriptions
- Translation: v1/audio/translations
- Moderation: v1/moderations
- Completions (legacy): v1/completions
Features & Tools
| Feature | Supported | Tool | Supported |
|---|---|---|---|
| Streaming | ✅ | Web search | ✅ |
| Function calling | ✅ | File search | ✅ |
| Structured outputs | ✅ | Image generation | ✅ |
| Fine-tuning | ✅ | Code interpreter | ✅ |
| Distillation | ✅ | Computer use | ❌ |
| Predicted outputs | ✅ | MCP | ✅ |
Snapshots
Snapshots let you lock in a specific version of the model. Available snapshots and aliases for GPT-5:
- gpt-5
- gpt-5-2025-08-07
Text Generation
Use a large language model to generate text from a prompt. Models can generate code, mathematical equations, structured JSON data, or human-like prose.
Generate text from a simple prompt
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {
            "role": "user",
            "content": "Write a one-sentence bedtime story about a unicorn."
        }
    ]
)
print(completion.choices[0].message.content)
Prompt Engineering
Prompt engineering is the process of writing effective instructions for a model. It's recommended to pin production applications to specific model snapshots (like gpt-5-2025-08-07) and build evaluations to monitor prompt performance.
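One way to picture the evaluation step is a small pass-rate check that can be rerun whenever the prompt or the pinned snapshot changes. This is only an illustrative sketch: the cases, the `pass_rate` helper, and the `fake_generate` stand-in are all hypothetical, and a real setup would replace `fake_generate` with an actual model call.

```python
from typing import Callable

# Hypothetical evaluation cases: a prompt and a substring the answer must contain.
CASES = [
    {"prompt": "What is 2 + 2?", "must_contain": "4"},
    {"prompt": "Name the capital of France.", "must_contain": "Paris"},
]


def pass_rate(generate: Callable[[str], str], cases: list) -> float:
    """Return the fraction of cases whose output contains the expected text."""
    passed = sum(1 for case in cases if case["must_contain"] in generate(case["prompt"]))
    return passed / len(cases)


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "The answer is 4." if "2 + 2" in prompt else "I am not sure."


print(pass_rate(fake_generate, CASES))  # 0.5
```

Tracking this number per snapshot makes regressions visible when a prompt is edited or the model version is changed.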
Message Roles and Instruction Following
You can provide instructions with differing levels of authority using the instructions API parameter and message roles.
Generate text with instructions
from openai import OpenAI
client = OpenAI()
response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "low"},
    instructions="Talk like a pirate.",
    input="Are semicolons optional in JavaScript?",
)
print(response.output_text)
Generate text with messages using different roles
from openai import OpenAI
client = OpenAI()
response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "low"},
    input=[
        {
            "role": "developer",
            "content": "Talk like a pirate."
        },
        {
            "role": "user",
            "content": "Are semicolons optional in JavaScript?"
        }
    ]
)
print(response.output_text)
Reusable Prompts
In the OpenAI dashboard, you can develop reusable prompts with placeholders that you can use in API requests.
Generate text with a prompt template
from openai import OpenAI
client = OpenAI()
response = client.responses.create(
    model="gpt-5",
    prompt={
        "id": "pmpt_abc123",
        "version": "2",
        "variables": {
            "customer_name": "Jane Doe",
            "product": "40oz juice box"
        }
    }
)
print(response.output_text)
Images and Vision
Use models to understand or generate images. Vision is the ability for a model to "see" and understand images, including text within them.
Analyze Images
You can provide images as input via a URL or a Base64 encoded string.
Analyze an image from a URL
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            },
        ],
    }],
)
print(response.choices[0].message.content)
Analyze a Base64 encoded image
import base64
from openai import OpenAI
client = OpenAI()
# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
# Path to your image
image_path = "path_to_your_image.jpg"
# Getting the Base64 string
base64_image = encode_image(image_path)
completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "what's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}",
                    },
                },
            ],
        }
    ],
)
print(completion.choices[0].message.content)
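The Base64 example above hardcodes `image/jpeg` in the data URL. If inputs may be PNG or other formats, a small helper can pick the MIME type from the file name. This is a sketch under that assumption; `to_data_url` is a hypothetical helper, not an SDK function.

```python
import base64
import mimetypes


def to_data_url(image_bytes: bytes, filename: str) -> str:
    """Build a data URL, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {filename!r}")
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


print(to_data_url(b"abc", "photo.png"))  # data:image/png;base64,YWJj
```

The returned string can be used as the `url` value of an `image_url` content part in the same way as the hardcoded f-string above.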
Calculating Costs
Image inputs are metered and charged in tokens. The cost depends on image dimensions and the model used. For detailed formulas and examples, refer to the official pricing page and its FAQ section.
Structured Model Outputs
Ensure model responses adhere to a JSON schema you define using Pydantic in Python. This provides reliable type-safety and simpler prompting.
Getting a structured response with Pydantic
from pydantic import BaseModel
from openai import OpenAI
client = OpenAI()
class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]
completion = client.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)
event = completion.choices[0].message.parsed
print(event)
Example: Chain-of-thought Math Tutoring
from pydantic import BaseModel
from openai import OpenAI
client = OpenAI()
class Step(BaseModel):
    explanation: str
    output: str
class MathReasoning(BaseModel):
    steps: list[Step]
    final_answer: str
completion = client.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ],
    response_format=MathReasoning,
)
math_reasoning = completion.choices[0].message.parsed
print(math_reasoning.model_dump_json(indent=2))
Example: Structured Data Extraction
from pydantic import BaseModel
from openai import OpenAI
client = OpenAI()
class ResearchPaperExtraction(BaseModel):
    title: str
    authors: list[str]
    abstract: str
    keywords: list[str]
completion = client.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are an expert at structured data extraction. You will be given unstructured text from a research paper and should convert it into the given structure."},
        {"role": "user", "content": "..."}  # Add research paper text here
    ],
    response_format=ResearchPaperExtraction,
)
research_paper = completion.choices[0].message.parsed
print(research_paper)
Example: Moderation
from enum import Enum
from typing import Optional
from pydantic import BaseModel
from openai import OpenAI
client = OpenAI()
class Category(str, Enum):
    violence = "violence"
    sexual = "sexual"
    self_harm = "self_harm"
class ContentCompliance(BaseModel):
    is_violating: bool
    category: Optional[Category]
    explanation_if_violating: Optional[str]
completion = client.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Determine if the user input violates specific guidelines and explain if they do."},
        {"role": "user", "content": "How do I prepare for a job interview?"}
    ],
    response_format=ContentCompliance,
)
compliance = completion.choices[0].message.parsed
print(compliance)
Refusals with Structured Outputs
If the model refuses to respond for safety reasons, the response will contain a refusal field instead of the parsed content.
# ... Pydantic class definition ...
completion = client.chat.completions.parse(...)
message = completion.choices[0].message
if message.refusal:
    print(f"Request refused: {message.refusal}")
else:
    print("Parsed content:", message.parsed)
Streaming Structured Outputs
Use streaming to process parts of the structured response as they are generated.
from typing import List
from pydantic import BaseModel
from openai import OpenAI
class EntitiesModel(BaseModel):
    attributes: List[str]
    colors: List[str]
    animals: List[str]
client = OpenAI()
with client.beta.chat.completions.stream(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "Extract entities from the input text"},
        {"role": "user", "content": "The quick brown fox jumps over the lazy dog with piercing blue eyes"},
    ],
    response_format=EntitiesModel,
) as stream:
    for event in stream:
        if event.type == "content.delta":
            if event.parsed is not None:
                print("content.delta parsed:", event.parsed)
        elif event.type == "content.done":
            print("content.done")
        elif event.type == "error":
            print("Error in stream:", event.error)
    # Still inside the `with` block: fetch the fully assembled completion
    final_completion = stream.get_final_completion()
    print("Final completion:", final_completion)
JSON Mode (Legacy)
For older models, you can use JSON mode by setting response_format={"type": "json_object"}. This ensures the output is valid JSON but does not guarantee it adheres to a specific schema. You must instruct the model to produce JSON in the prompt and handle edge cases like incomplete responses.
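The edge cases mentioned above can be handled with a small guard around `json.loads`. The sketch below assumes `content` and `finish_reason` are taken from `choices[0].message.content` and `choices[0].finish_reason` of a Chat Completions response; the helper name `parse_json_output` is hypothetical.

```python
import json


def parse_json_output(content, finish_reason):
    """Parse JSON-mode output, rejecting truncated or malformed responses."""
    if finish_reason == "length":
        # The model hit the token limit, so the JSON is probably cut off.
        raise ValueError("Response was cut off at the token limit; JSON may be incomplete")
    if not content:
        raise ValueError("Response contained no content")
    try:
        return json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Response was not valid JSON: {exc}") from exc


print(parse_json_output('{"city": "Paris"}', "stop"))  # {'city': 'Paris'}
```

Raising a single exception type keeps the calling code simple: it can retry, raise the output limit, or fall back, without inspecting the raw text.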
Function Calling & Tools
Give models access to external functionality and data. This involves defining tools (functions), letting the model request to call them, executing the tool's logic, and sending the results back to the model.
Function Tool Example
End-to-end example of a multi-step tool-calling flow.
from openai import OpenAI
import json
client = OpenAI()
# 1. Define a list of callable tools for the model
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_horoscope",
            "description": "Get today's horoscope for an astrological sign.",
            "parameters": {
                "type": "object",
                "properties": {
                    "sign": {
                        "type": "string",
                        "description": "An astrological sign like Taurus or Aquarius",
                    },
                },
                "required": ["sign"],
            },
        }
    },
]
# Create a running list of messages
messages = [
    {"role": "user", "content": "What is my horoscope? I am an Aquarius."}
]
# 2. First request to the model
response = client.chat.completions.create(
    model="gpt-5",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)
response_message = response.choices[0].message
messages.append(response_message) # extend conversation with assistant's reply
# 3. Check if the model wants to call a function
if response_message.tool_calls:
    # 4. Call the function
    def get_horoscope(sign):
        return f"{sign}: Next Tuesday you will befriend a baby otter."
    available_functions = {"get_horoscope": get_horoscope}
    for tool_call in response_message.tool_calls:
        function_name = tool_call.function.name
        function_to_call = available_functions[function_name]
        function_args = json.loads(tool_call.function.arguments)
        function_response = function_to_call(sign=function_args.get("sign"))
        # 5. Send the info back to the model
        messages.append(
            {
                "tool_call_id": tool_call.id,
                "role": "tool",
                "name": function_name,
                "content": function_response,
            }
        )
    # 6. Get a new response from the model where it can use the function output
    second_response = client.chat.completions.create(
        model="gpt-5",
        messages=messages,
    )
    print(second_response.choices[0].message.content)
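The example above assumes the model always names a known function and sends well-formed arguments. A more defensive dispatcher might look like the following sketch, which runs offline; `run_tool_call` and `AVAILABLE_FUNCTIONS` are hypothetical names, and returning an error string to the model is one possible design choice rather than a documented requirement.

```python
import json


def get_horoscope(sign):
    return f"{sign}: Next Tuesday you will befriend a baby otter."


# Registry of functions the model is allowed to call.
AVAILABLE_FUNCTIONS = {"get_horoscope": get_horoscope}


def run_tool_call(name: str, arguments: str) -> str:
    """Execute one tool call and always return a string for the tool message."""
    function = AVAILABLE_FUNCTIONS.get(name)
    if function is None:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments)
        return str(function(**kwargs))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the function signature.
        return json.dumps({"error": f"Bad arguments for {name}: {exc}"})


print(run_tool_call("get_horoscope", '{"sign": "Aquarius"}'))
```

Here `name` and `arguments` correspond to `tool_call.function.name` and `tool_call.function.arguments` in the loop above, and the return value would go into the `content` field of the tool message.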
Defining Functions with Pydantic
The SDK includes helpers to convert Pydantic models into the required JSON schema.
from openai import OpenAI, pydantic_function_tool
from pydantic import BaseModel, Field
client = OpenAI()
class GetWeather(BaseModel):
    location: str = Field(
        ...,
        description="City and country e.g. Bogotá, Colombia"
    )
tools = [pydantic_function_tool(GetWeather)]
completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What's the weather like in Paris today?"}],
    tools=tools
)
print(completion.choices[0].message.tool_calls)
Streaming Function Calls
You can stream function calls to receive partial progress and argument fragments in real time.
from openai import OpenAI
client = OpenAI()
# ... (tools definition) ...
stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What's the weather like in Paris today?"}],
    tools=tools,
    stream=True
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        print(delta.tool_calls)
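Each streamed chunk carries only a fragment of a tool call, so the fragments usually need to be merged before the arguments can be parsed. The sketch below shows one way to do that. It uses plain dictionaries as hypothetical stand-ins for the entries of `delta.tool_calls` so it can run offline; with the SDK these are objects with attributes rather than dictionaries.

```python
import json


def accumulate_tool_calls(deltas):
    """Merge streamed tool-call fragments into complete calls, keyed by index."""
    calls = {}
    for delta in deltas:
        call = calls.setdefault(delta["index"], {"id": None, "name": "", "arguments": ""})
        if delta.get("id"):
            call["id"] = delta["id"]
        function = delta.get("function", {})
        call["name"] += function.get("name") or ""
        call["arguments"] += function.get("arguments") or ""
    return [calls[index] for index in sorted(calls)]


# Hypothetical fragments shaped like the entries of delta.tool_calls.
fragments = [
    {"index": 0, "id": "call_1", "function": {"name": "GetWeather", "arguments": ""}},
    {"index": 0, "function": {"arguments": '{"loc'}},
    {"index": 0, "function": {"arguments": 'ation": "Paris"}'}},
]

for call in accumulate_tool_calls(fragments):
    print(call["name"], json.loads(call["arguments"]))  # GetWeather {'location': 'Paris'}
```

The arguments string is only valid JSON once the stream has finished, so parsing is deferred until all fragments for an index have been collected.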
Custom Tools with Grammars
For more control, custom tools can use a context-free grammar (CFG) to constrain the text input that the model passes to the tool. Both Lark and Regex syntaxes are supported.
Lark CFG Example
from openai import OpenAI
client = OpenAI()
grammar = """
start: expr
expr: term (SP ADD SP term)* -> add
term: factor (SP MUL SP factor)* -> mul
factor: INT
SP: " "
ADD: "+"
MUL: "*"
%import common.INT
"""
response = client.responses.create(
    model="gpt-5",
    input="Use the math_exp tool to add four plus four.",
    tools=[
        {
            "type": "custom",
            "name": "math_exp",
            "description": "Creates valid mathematical expressions",
            "format": {
                "type": "grammar",
                "syntax": "lark",
                "definition": grammar,
            },
        }
    ]
)
print(response.output)
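Before executing whatever the tool receives, it can be useful to re-check the input locally. The grammar above has no recursion, so it can be approximated with a regular expression from the standard library. The pattern below is hand-derived and is an assumption about what the grammar accepts (integers joined by " * " and " + " with single spaces); `is_valid_math_exp` is a hypothetical helper.

```python
import re

# Hand-derived pattern intended to accept the same strings as the Lark grammar:
# a term is integers joined by " * ", an expression is terms joined by " + ".
TERM = r"[0-9]+(?: \* [0-9]+)*"
MATH_EXP = re.compile(rf"{TERM}(?: \+ {TERM})*")


def is_valid_math_exp(text: str) -> bool:
    """Return True if the whole string matches the expression format."""
    return MATH_EXP.fullmatch(text) is not None


print(is_valid_math_exp("4 + 4"))      # True
print(is_valid_math_exp("2 * 3 + 4"))  # True
print(is_valid_math_exp("4+4"))        # False
```

A check like this is a second line of defense in application code, not a replacement for the grammar passed to the API.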
Using Built-in Tools
Extend model capabilities using built-in tools like web search or file search.
Web Search
from openai import OpenAI
client = OpenAI()
response = client.responses.create(
    model="gpt-5",
    tools=[{"type": "web_search_preview"}],
    input="What was a positive news story from today?"
)
print(response.output_text)
File Search
from openai import OpenAI
client = OpenAI()
response = client.responses.create(
    model="gpt-4.1",
    input="What is deep research by OpenAI?",
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["<vector_store_id>"]
    }]
)
print(response)
Remote MCP
from openai import OpenAI
client = OpenAI()
resp = client.responses.create(
    model="gpt-4.1",
    tools=[
        {
            "type": "mcp",
            "server_label": "deepwiki",
            "server_url": "https://mcp.deepwiki.com/mcp",
            "require_approval": "never",
        },
    ],
    input="What transport protocols are supported in the 2025-03-26 version of the MCP spec?",
)
print(resp.output_text)
Rate Limits
Your usage tier determines your rate limits for requests per minute (RPM) and tokens per minute (TPM).
| Tier | RPM | TPM | Batch Queue Limit |
|---|---|---|---|
| Free | Not supported | Not supported | Not supported |
| Tier 1 | 500 | 30,000 | 90,000 |
| Tier 2 | 5,000 | 450,000 | 1,350,000 |
| Tier 3 | 5,000 | 800,000 | 100,000,000 |
| Tier 4 | 10,000 | 2,000,000 | 200,000,000 |
| Tier 5 | 15,000 | 40,000,000 | 15,000,000,000 |
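The RPM and TPM columns interact: whichever limit is reached first caps throughput. The sketch below estimates the sustainable request rate for a given request size, assuming both limits apply at the same time and that `tokens_per_request` covers every token counted against TPM. The helper name `max_requests_per_minute` is hypothetical.

```python
# RPM and TPM limits copied from the rate limit table above.
RATE_LIMITS = {
    "Tier 1": {"rpm": 500, "tpm": 30_000},
    "Tier 2": {"rpm": 5_000, "tpm": 450_000},
    "Tier 3": {"rpm": 5_000, "tpm": 800_000},
    "Tier 4": {"rpm": 10_000, "tpm": 2_000_000},
    "Tier 5": {"rpm": 15_000, "tpm": 40_000_000},
}


def max_requests_per_minute(tier: str, tokens_per_request: int) -> int:
    """Return the request rate allowed by the tighter of the RPM and TPM limits."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    limits = RATE_LIMITS[tier]
    return min(limits["rpm"], limits["tpm"] // tokens_per_request)


print(max_requests_per_minute("Tier 1", 1_000))  # 30
print(max_requests_per_minute("Tier 5", 1_000))  # 15000
```

On Tier 1, requests of about 1,000 tokens are bounded by TPM well before the 500 RPM limit, while on Tier 5 the same requests are bounded by RPM.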