GPT-5 Python API Documentation

This document provides a comprehensive guide to using the GPT-5 model via the Python API. It includes model details, endpoints, pricing, and code examples for various features.

Model: GPT-5

GPT-5 is our flagship model for coding, reasoning, and agentic tasks across domains.

Context Window: 400,000 tokens
Max Output Tokens: 128,000 tokens
Knowledge Cutoff: Sep 30, 2024
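
As a rough budgeting aid, the two limits above can be combined to estimate how much room is left for the prompt. This is a minimal sketch that assumes output tokens are reserved from the same 400,000-token window as the input; that interaction is an assumption here, not something this document states, and the helper name is hypothetical.

```python
# Limits copied from the model details above.
CONTEXT_WINDOW = 400_000
MAX_OUTPUT_TOKENS = 128_000


def max_prompt_tokens(requested_output_tokens: int) -> int:
    """Return how many prompt tokens remain after reserving output tokens.

    Assumption: input and output tokens share a single context window.
    """
    if not 0 < requested_output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(
            f"requested_output_tokens must be between 1 and {MAX_OUTPUT_TOKENS}"
        )
    return CONTEXT_WINDOW - requested_output_tokens


# Reserving the full output allowance leaves this much room for the prompt.
print(max_prompt_tokens(MAX_OUTPUT_TOKENS))
```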

Pricing

Type	Per 1M tokens
Input	$1.25
Cached Input	$0.125
Output	$10.00
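
To make the table concrete, the sketch below estimates the cost of a single request from its token counts. The prices are the ones listed above; the function name is hypothetical, and it assumes that `input_tokens` counts only uncached input tokens.

```python
from decimal import Decimal

# USD prices per 1M tokens, copied from the pricing table above.
PRICE_PER_MILLION = {
    "input": Decimal("1.25"),
    "cached_input": Decimal("0.125"),
    "output": Decimal("10.00"),
}


def estimate_cost(input_tokens: int, cached_input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request.

    Assumption: input_tokens excludes the tokens billed at the cached rate.
    """
    total = (
        PRICE_PER_MILLION["input"] * input_tokens
        + PRICE_PER_MILLION["cached_input"] * cached_input_tokens
        + PRICE_PER_MILLION["output"] * output_tokens
    )
    return total / Decimal(1_000_000)


cost = estimate_cost(input_tokens=10_000, cached_input_tokens=2_000, output_tokens=1_000)
print(f"Estimated cost: ${cost}")
```

Decimal arithmetic is used instead of floats so that small per-request amounts add up without rounding drift.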

Modalities

Modality	Support
Text	Input and Output
Image	Input only
Audio	Not supported

Endpoints

Chat Completions: v1/chat/completions
Responses: v1/responses
Realtime: v1/realtime
Assistants: v1/assistants
Batch: v1/batch
Fine-tuning: v1/fine-tuning
Embeddings: v1/embeddings
Image Generation: v1/images/generations
Image Edit: v1/images/edits
Speech Generation: v1/audio/speech
Transcription: v1/audio/transcriptions
Translation: v1/audio/translations
Moderation: v1/moderations
Completions (legacy): v1/completions

Features & Tools

Feature	Supported
Streaming	✅
Function calling	✅
Structured outputs	✅
Fine-tuning	✅
Distillation	✅
Predicted outputs	✅

Tool	Supported
Web search	✅
File search	✅
Image generation	✅
Code interpreter	✅
Computer use	❌
MCP	✅

Snapshots

Snapshots let you lock in a specific version of the model. Available snapshots and aliases for GPT-5:

gpt-5
gpt-5-2025-08-07
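
A deployment check can flag configurations that use a moving alias instead of a pinned snapshot. The sketch below assumes that snapshot names end in a YYYY-MM-DD date, a convention inferred only from the two names listed above; the helper name is hypothetical.

```python
import re

# Assumption: a pinned snapshot is "<alias>-YYYY-MM-DD", as in gpt-5-2025-08-07.
SNAPSHOT_PATTERN = re.compile(r"^.+-\d{4}-\d{2}-\d{2}$")


def is_pinned_snapshot(model: str) -> bool:
    """Return True if the model name looks like a dated snapshot."""
    return SNAPSHOT_PATTERN.match(model) is not None


for name in ("gpt-5", "gpt-5-2025-08-07"):
    label = "pinned snapshot" if is_pinned_snapshot(name) else "alias"
    print(f"{name}: {label}")
```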

Text Generation

Use a large language model to generate text from a prompt. Models can generate code, mathematical equations, structured JSON data, or human-like prose.

Generate text from a simple prompt

from openai import OpenAI
client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {
            "role": "user",
            "content": "Write a one-sentence bedtime story about a unicorn."
        }
    ]
)

print(completion.choices[0].message.content)

Prompt Engineering

Prompt engineering is the process of writing effective instructions for a model. For production applications, pin to a specific model snapshot (such as gpt-5-2025-08-07) and build evaluations that monitor prompt performance over time.

Message Roles and Instruction Following

You can provide instructions with differing levels of authority using the instructions API parameter and message roles.

Generate text with instructions

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "low"},
    instructions="Talk like a pirate.",
    input="Are semicolons optional in JavaScript?",
)

print(response.output_text)

Generate text with messages using different roles

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "low"},
    input=[
        {
            "role": "developer",
            "content": "Talk like a pirate."
        },
        {
            "role": "user",
            "content": "Are semicolons optional in JavaScript?"
        }
    ]
)

print(response.output_text)

Reusable Prompts

In the OpenAI dashboard, you can develop reusable prompts with placeholders that you can use in API requests.

Generate text with a prompt template

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    prompt={
        "id": "pmpt_abc123",
        "version": "2",
        "variables": {
            "customer_name": "Jane Doe",
            "product": "40oz juice box"
        }
    }
)

print(response.output_text)

Images and Vision

Use models to understand or generate images. Vision is the ability for a model to "see" and understand images, including text within them.

Analyze Images

You can provide images as input via a URL or a Base64-encoded string.

Analyze an image from a URL

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            },
        ],
    }],
)

print(response.choices[0].message.content)

Analyze a Base64-encoded image

import base64
from openai import OpenAI

client = OpenAI()

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the Base64 string
base64_image = encode_image(image_path)

completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "what's in this image?" },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}",
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
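
The example above hardcodes image/jpeg in the data URL. When the image may be a PNG, GIF, or WebP file, the media type can be derived from the file extension instead. This is a minimal sketch using only the standard library; the helper name is hypothetical, and it assumes the file extension matches the actual image format.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(image_path: str) -> str:
    """Encode an image file as a data URL suitable for an image_url entry."""
    # Guess the media type from the extension (assumes the extension is accurate).
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image media type for {image_path!r}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be passed as the "url" value in place of the f-string used above.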

Calculating Costs

Image inputs are metered and charged in tokens. The cost depends on image dimensions and the model used. For detailed formulas and examples, refer to the official pricing page and its FAQ section.

Structured Model Outputs

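Structured Outputs ensures that model responses adhere to a JSON schema you supply. In Python, you can define that schema as a Pydantic model and let the SDK parse the response into it, which gives you typed objects and removes the need to describe the output format in the prompt.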

Getting a structured response with Pydantic

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

event = completion.choices[0].message.parsed
print(event)
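
For comparison, the same CalendarEvent shape can be written by hand as a JSON schema and passed as response_format to client.chat.completions.create. The sketch below builds that dictionary and runs a small local check; the check reflects our understanding that strict mode expects every property to be required and additionalProperties to be false, and the helper name is hypothetical.

```python
# Hand-written equivalent of the CalendarEvent Pydantic model.
calendar_event_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "calendar_event",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "date": {"type": "string"},
                "participants": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["name", "date", "participants"],
            "additionalProperties": False,
        },
    },
}


def strict_schema_problems(schema: dict) -> list[str]:
    """Return a list of problems that would likely be rejected in strict mode."""
    problems = []
    if schema.get("additionalProperties") is not False:
        problems.append("additionalProperties must be false")
    missing = set(schema.get("properties", {})) - set(schema.get("required", []))
    if missing:
        problems.append(f"properties not listed as required: {sorted(missing)}")
    return problems


print(strict_schema_problems(calendar_event_format["json_schema"]["schema"]))
```

The check only inspects the top-level object; nested objects would need the same treatment.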

Example: Chain-of-thought Math Tutoring

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class Step(BaseModel):
    explanation: str
    output: str

class MathReasoning(BaseModel):
    steps: list[Step]
    final_answer: str

completion = client.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ],
    response_format=MathReasoning,
)

math_reasoning = completion.choices[0].message.parsed
print(math_reasoning.model_dump_json(indent=2))

Example: Structured Data Extraction

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class ResearchPaperExtraction(BaseModel):
    title: str
    authors: list[str]
    abstract: str
    keywords: list[str]

completion = client.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are an expert at structured data extraction. You will be given unstructured text from a research paper and should convert it into the given structure."},
        {"role": "user", "content": "..."} # Add research paper text here
    ],
    response_format=ResearchPaperExtraction,
)

research_paper = completion.choices[0].message.parsed
print(research_paper)

Example: Moderation

from enum import Enum
from typing import Optional
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class Category(str, Enum):
    violence = "violence"
    sexual = "sexual"
    self_harm = "self_harm"

class ContentCompliance(BaseModel):
    is_violating: bool
    category: Optional[Category]
    explanation_if_violating: Optional[str]

completion = client.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Determine if the user input violates specific guidelines and explain if they do."},
        {"role": "user", "content": "How do I prepare for a job interview?"}
    ],
    response_format=ContentCompliance,
)

compliance = completion.choices[0].message.parsed
print(compliance)

Refusals with Structured Outputs

If the model refuses to respond for safety reasons, the message carries a refusal string and no parsed content. Check refusal before reading parsed.

# ... Pydantic class definition ...
completion = client.chat.completions.parse(...)
message = completion.choices[0].message

if message.refusal:
    print(f"Request refused: {message.refusal}")
else:
    print("Parsed content:", message.parsed)
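
In application code it can be convenient to turn a refusal into an exception so that callers never receive an empty result by accident. This is a minimal sketch; RefusalError and unwrap_parsed are hypothetical names, and the SimpleNamespace objects are stand-ins for the message returned by the SDK.

```python
from types import SimpleNamespace


class RefusalError(RuntimeError):
    """Raised when the model refuses to answer (hypothetical helper class)."""


def unwrap_parsed(message):
    """Return message.parsed, or raise if the model refused or nothing was parsed."""
    if getattr(message, "refusal", None):
        raise RefusalError(message.refusal)
    if message.parsed is None:
        raise ValueError("No parsed content and no refusal on the message")
    return message.parsed


# Stand-ins for completion.choices[0].message in the two possible outcomes.
answered = SimpleNamespace(refusal=None, parsed={"name": "Science fair"})
refused = SimpleNamespace(refusal="I can't help with that.", parsed=None)

print(unwrap_parsed(answered))
try:
    unwrap_parsed(refused)
except RefusalError as exc:
    print(f"Request refused: {exc}")
```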

Streaming Structured Outputs

Use streaming to process parts of the structured response as they are generated.

from typing import List
from pydantic import BaseModel
from openai import OpenAI

class EntitiesModel(BaseModel):
    attributes: List[str]
    colors: List[str]
    animals: List[str]

client = OpenAI()

with client.chat.completions.stream(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "Extract entities from the input text"},
        {"role": "user", "content": "The quick brown fox jumps over the lazy dog with piercing blue eyes"},
    ],
    response_format=EntitiesModel,
) as stream:
    for event in stream:
        if event.type == "content.delta":
            if event.parsed is not None:
                print("content.delta parsed:", event.parsed)
        elif event.type == "content.done":
            print("content.done")
        elif event.type == "error":
            print("Error in stream:", event.error)

    # Read the final completion while the stream context is still open
    final_completion = stream.get_final_completion()
    print("Final completion:", final_completion)

JSON Mode (Legacy)

For older models, you can use JSON mode by setting response_format={"type": "json_object"}. This ensures the output is valid JSON but does not guarantee it adheres to a specific schema. You must instruct the model to produce JSON in the prompt and handle edge cases like incomplete responses.
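
The edge cases mentioned above can be handled in one place. The sketch below takes the message content and the choice's finish_reason and either returns a dictionary or raises; the function name is hypothetical, and it assumes that a finish_reason of "length" indicates the output was cut off at the token limit.

```python
import json


def parse_json_mode_output(content, finish_reason):
    """Parse JSON-mode output, rejecting truncated or malformed responses."""
    # Assumption: "length" means the response stopped at the token limit.
    if finish_reason == "length":
        raise ValueError("Response was truncated; the JSON is likely incomplete")
    try:
        data = json.loads(content)
    except (json.JSONDecodeError, TypeError) as exc:
        raise ValueError(f"Response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    return data


print(parse_json_mode_output('{"city": "Paris"}', "stop"))
```

With a real response, the arguments would be completion.choices[0].message.content and completion.choices[0].finish_reason.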

Function Calling & Tools

Give models access to external functionality and data. This involves defining tools (functions), letting the model request to call them, executing the tool's logic, and sending the results back to the model.

Function Tool Example

End-to-end example of a multi-step tool-calling flow.

from openai import OpenAI
import json

client = OpenAI()

# 1. Define a list of callable tools for the model
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_horoscope",
            "description": "Get today's horoscope for an astrological sign.",
            "parameters": {
                "type": "object",
                "properties": {
                    "sign": {
                        "type": "string",
                        "description": "An astrological sign like Taurus or Aquarius",
                    },
                },
                "required": ["sign"],
            },
        }
    },
]

# Create a running list of messages
messages = [
    {"role": "user", "content": "What is my horoscope? I am an Aquarius."}
]

# 2. First request to the model
response = client.chat.completions.create(
    model="gpt-5",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)
response_message = response.choices[0].message
messages.append(response_message) # extend conversation with assistant's reply

# 3. Define the local function that backs the tool
def get_horoscope(sign):
    return f"{sign}: Next Tuesday you will befriend a baby otter."

available_functions = {"get_horoscope": get_horoscope}

# 4. Check whether the model wants to call a function
if response_message.tool_calls:
    for tool_call in response_message.tool_calls:
        function_name = tool_call.function.name
        function_to_call = available_functions[function_name]
        function_args = json.loads(tool_call.function.arguments)
        function_response = function_to_call(sign=function_args.get("sign"))

        # 5. Send the function result back to the model
        messages.append(
            {
                "tool_call_id": tool_call.id,
                "role": "tool",
                "content": function_response,
            }
        )

    # 6. Get a new response from the model where it can use the function output
    second_response = client.chat.completions.create(
        model="gpt-5",
        messages=messages,
    )
    print(second_response.choices[0].message.content)
else:
    # The model answered directly without calling a tool
    print(response_message.content)
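
The loop in the example assumes that the model always names a known function and sends valid JSON arguments. A more defensive dispatcher might look like the sketch below. The run_tool_call helper is hypothetical, and the SimpleNamespace object is a stand-in for an SDK tool call so the snippet runs without an API request.

```python
import json
from types import SimpleNamespace


def get_horoscope(sign: str) -> str:
    return f"{sign}: Next Tuesday you will befriend a baby otter."


AVAILABLE_FUNCTIONS = {"get_horoscope": get_horoscope}


def run_tool_call(tool_call) -> dict:
    """Execute one tool call and return the tool message to append."""
    name = tool_call.function.name
    function = AVAILABLE_FUNCTIONS.get(name)
    if function is None:
        content = f"Error: unknown tool {name!r}"
    else:
        try:
            arguments = json.loads(tool_call.function.arguments)
            content = function(**arguments)
        except (json.JSONDecodeError, TypeError) as exc:
            # Covers malformed JSON and arguments that do not match the signature.
            content = f"Error: invalid arguments for {name!r}: {exc}"
    return {"role": "tool", "tool_call_id": tool_call.id, "content": content}


# Stand-in for an entry of response_message.tool_calls.
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_horoscope", arguments='{"sign": "Aquarius"}'),
)
print(run_tool_call(fake_call))
```

Returning the error text as the tool result, instead of raising, is a design choice: it gives the model a chance to correct the call on the next turn.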

Defining Functions with Pydantic

The SDK includes helpers to convert Pydantic models into the required JSON schema.

from openai import OpenAI, pydantic_function_tool
from pydantic import BaseModel, Field

client = OpenAI()

class GetWeather(BaseModel):
    location: str = Field(
        ...,
        description="City and country e.g. Bogotá, Colombia"
    )

tools = [pydantic_function_tool(GetWeather)]

completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What's the weather like in Paris today?"}],
    tools=tools
)

print(completion.choices[0].message.tool_calls)

Streaming Function Calls

You can stream function calls to receive the function name and partial arguments in real time, as the model generates them.

from openai import OpenAI

client = OpenAI()

# ... (tools definition) ...

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What's the weather like in Paris today?"}],
    tools=tools,
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        print(delta.tool_calls)
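
Printing each delta shows only fragments. To execute a streamed call, the fragments have to be merged into complete calls first. The sketch below assumes that each fragment carries an index identifying its call, that the id and function name arrive with the first fragment, and that the arguments arrive as string pieces to concatenate. The accumulate_tool_calls helper is hypothetical, and the SimpleNamespace objects stand in for chunk.choices[0].delta.

```python
from types import SimpleNamespace


def accumulate_tool_calls(deltas):
    """Merge streamed tool-call fragments into a list of complete calls."""
    calls = {}
    for delta in deltas:
        for fragment in delta.tool_calls or []:
            entry = calls.setdefault(
                fragment.index, {"id": None, "name": None, "arguments": ""}
            )
            if fragment.id:
                entry["id"] = fragment.id
            if fragment.function is not None:
                if fragment.function.name:
                    entry["name"] = fragment.function.name
                if fragment.function.arguments:
                    entry["arguments"] += fragment.function.arguments
    return [calls[index] for index in sorted(calls)]


def _fake_delta(index, call_id=None, name=None, arguments=None):
    """Build a stand-in for one streamed delta (illustration only)."""
    function = SimpleNamespace(name=name, arguments=arguments)
    fragment = SimpleNamespace(index=index, id=call_id, function=function)
    return SimpleNamespace(tool_calls=[fragment])


deltas = [
    _fake_delta(0, call_id="call_1", name="get_weather", arguments=""),
    _fake_delta(0, arguments='{"location": '),
    _fake_delta(0, arguments='"Paris, France"}'),
    SimpleNamespace(tool_calls=None),  # a delta without tool-call data
]
print(accumulate_tool_calls(deltas))
```

Once the stream ends, each entry's arguments string can be passed to json.loads.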

Custom Tools with Grammars

For more control, a custom tool can use a context-free grammar (CFG) to constrain the text the model passes to the tool. Both Lark and regex syntaxes are supported.

Lark CFG Example

from openai import OpenAI
client = OpenAI()
grammar = """
start: expr
expr: term (SP ADD SP term)* -> add
term: factor (SP MUL SP factor)* -> mul
factor: INT
SP: " "
ADD: "+"
MUL: "*"
%import common.INT
"""
response = client.responses.create(
    model="gpt-5",
    input="Use the math_exp tool to add four plus four.",
    tools=[
        {
            "type": "custom",
            "name": "math_exp",
            "description": "Creates valid mathematical expressions",
            "format": {
                "type": "grammar",
                "syntax": "lark",
                "definition": grammar,
            },
        }
    ]
)
print(response.output)
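
A regex-based tool follows the same shape as the Lark example, with "syntax" set to "regex". The sketch below defines a hypothetical set_date tool and checks sample inputs locally with Python's re module. The local check is only an approximation, since the API's regex dialect may differ from Python's; the tool name, description, and pattern are illustrative assumptions.

```python
import re

# Illustrative pattern: a calendar date written as YYYY-MM-DD.
DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"

# Hypothetical custom tool; pass it in tools=[date_tool] to client.responses.create.
date_tool = {
    "type": "custom",
    "name": "set_date",
    "description": "Accepts a calendar date formatted as YYYY-MM-DD.",
    "format": {
        "type": "grammar",
        "syntax": "regex",
        "definition": DATE_PATTERN,
    },
}


def matches_pattern(tool_input: str) -> bool:
    """Approximate local check that an input satisfies the pattern."""
    return re.fullmatch(DATE_PATTERN, tool_input) is not None


for sample in ("2025-08-07", "Aug 7, 2025"):
    print(sample, matches_pattern(sample))
```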

Using Built-in Tools

Extend model capabilities using built-in tools like web search or file search.

Web Search

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    tools=[{"type": "web_search_preview"}],
    input="What was a positive news story from today?"
)

print(response.output_text)

File Search

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",
    input="What is deep research by OpenAI?",
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["<vector_store_id>"]
    }]
)
print(response)

Remote MCP

from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-4.1",
    tools=[
        {
            "type": "mcp",
            "server_label": "deepwiki",
            "server_url": "https://mcp.deepwiki.com/mcp",
            "require_approval": "never",
        },
    ],
    input="What transport protocols are supported in the 2025-03-26 version of the MCP spec?",
)

print(resp.output_text)

Rate Limits

Your usage tier determines your rate limits: requests per minute (RPM), tokens per minute (TPM), and the batch queue limit.

Tier	RPM	TPM	Batch Queue Limit
Free	Not supported	Not supported	Not supported
Tier 1	500	30,000	90,000
Tier 2	5,000	450,000	1,350,000
Tier 3	5,000	800,000	100,000,000
Tier 4	10,000	2,000,000	200,000,000
Tier 5	15,000	40,000,000	15,000,000,000
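
When a request exceeds these limits, a common client-side response is to retry with exponential backoff and jitter. The sketch below is one way to do that, not a prescribed approach. The call_with_backoff helper and its parameters are hypothetical. With the SDK you would typically pass retry_on=(openai.RateLimitError,) and wrap the API call in a lambda.

```python
import random
import time


def call_with_backoff(
    fn,
    retry_on=(Exception,),
    max_attempts=5,
    base_delay=1.0,
    max_delay=60.0,
    sleep=time.sleep,
):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Double the delay cap each attempt, bounded by max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads retries out so clients do not retry in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))


# Example usage (requires the openai package and an API key):
# response = call_with_backoff(
#     lambda: client.responses.create(model="gpt-5", input="Hello"),
#     retry_on=(openai.RateLimitError,),
# )
```

The sleep parameter is injectable so the retry logic can be tested without waiting.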