Friday, July 3, 2026

Getting Started with the Claude API in Python


 

Introduction

 
You want to add Claude to a Python application. Creating an account and making your first API call is simple. The official documentation can get you from zero to a working request in a few minutes. The questions that follow are usually more practical:

  • What does the response object include?
  • How do you stream responses so users can see output as it's generated?
  • How do you structure prompts and handle responses in a production application?

The Claude Python SDK takes care of most of the underlying API interaction. It provides typed response objects, built-in retry handling, and a simple interface for working with the Messages API.

This article walks you through setup, your first API call, reading the response, system prompts, and streaming. By the end, you'll have a working foundation.

 

Prerequisites and Installation

 
You need Python 3.9 or higher, a free Claude Console account, and an API key from the Console's Settings > API Keys page. You can add $5 in credits and work through everything in this article.

With these in place, install the SDK:

pip install anthropic

 

Never hardcode your API key in source files. Store it as an environment variable instead:

export ANTHROPIC_API_KEY="YOUR-API-KEY-HERE"

 

Or add it to a .env file at the project root if you're using python-dotenv. The SDK reads ANTHROPIC_API_KEY from your environment, so you don't need to pass it anywhere in your code.
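If the variable is missing, it can help to fail early with a clear message instead of waiting for the first request to be rejected. The following is a minimal sketch using only the standard library. The helper name require_api_key is hypothetical and is not part of the SDK.

```python
import os


def require_api_key(env=None, name="ANTHROPIC_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration only: the SDK reads the
    variable for you when you construct the client.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return key
```

Calling require_api_key() once at startup is one way to surface a configuration problem before any request is made.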

 

Making Your First API Call

 
The entry point for every interaction is client.messages.create(). Let's ask Claude to explain what a context window is, something you'll actually need to understand as you use the API.

You pass three things: the model ID, a max_tokens limit, and a messages list. The messages list is always a list of dicts, each with a "role" and "content" key.

import anthropic

# The client reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=256,
    messages=[
        {
            "role": "user",
            "content": "In one sentence, what is a context window?"
        }
    ]
)

print(response.content[0].text)

 

The model field takes the exact model ID string. max_tokens is a hard ceiling on how many output tokens Claude will produce; the response stops there even if the thought isn't complete, so set it high enough for open-ended requests. The messages list must always start with a "user" turn.

Sample output:

A context window is the maximum amount of text (measured in tokens) that a language
model can process and consider at one time, encompassing both your input and its output.

 

Understanding the Response Object

 
The response from messages.create() is a typed Message object. It's worth inspecting the full structure before building anything on top of it.

Replace the print line in the previous example with:

print(response)

Running that gives you the full object:

Message(
  id='msg_01XFDUDYJgAACzvnptvVoYEL',
  type="message",
  role="assistant",
  content=[TextBlock(text="A context window is...", type="text")],
  model="claude-sonnet-5",
  stop_reason='end_turn',
  stop_sequence=None,
  usage=Usage(input_tokens=19, output_tokens=42)
)

 

A few fields here matter more than they first appear. stop_reason tells you why Claude stopped generating. end_turn means Claude finished on its own terms. If you see max_tokens, the response was cut off by your limit, and you may need to raise it or rethink the prompt.

The usage field tracks both input and output tokens for the request. This is how Anthropic calculates billing, and it's also how you detect when a prompt is creeping too close to the model's context limit. content is a list (in standard text responses it typically has one item, a TextBlock), so response.content[0].text is the idiomatic way to pull the text out.
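The checks described above can be gathered into one small function. This is a sketch, not SDK functionality: summarize_response is a hypothetical helper, and the object passed to it here is a stand-in built with SimpleNamespace to mirror the fields shown above, so the example runs without a network call.

```python
from types import SimpleNamespace


def summarize_response(response):
    """Pull the text, a truncation flag, and the token total from a Message-like object."""
    # Join every text block rather than assuming exactly one.
    text = "".join(
        block.text for block in response.content if getattr(block, "type", None) == "text"
    )
    return {
        "text": text,
        "truncated": response.stop_reason == "max_tokens",
        "total_tokens": response.usage.input_tokens + response.usage.output_tokens,
    }


# Stand-in shaped like the Message shown above (not a real API response).
fake = SimpleNamespace(
    content=[SimpleNamespace(type="text", text="A context window is...")],
    stop_reason="end_turn",
    usage=SimpleNamespace(input_tokens=19, output_tokens=42),
)
print(summarize_response(fake))
```

With a real response, a True value for truncated is the signal to raise max_tokens or shorten the prompt.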

 

Using System Prompts

 
A system prompt lets you give Claude a persistent role, set constraints, or provide context that should apply across the whole conversation. You pass it as a top-level system parameter, separate from the messages list, not as a message itself.

Here we configure Claude to act as a code reviewer who only responds in Python and avoids general explanations:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=512,
    # The system prompt is a top-level parameter, not a message.
    system=(
        "You are a Python code reviewer. "
        "Respond only with corrected or improved Python code. "
        "Do not explain changes unless the user explicitly asks."
    ),
    messages=[
        {
            "role": "user",
            "content": (
                "def get_user(id):\n"
                "    db = connect()\n"
                "    return db.query('SELECT * FROM users WHERE id=' + id)"
            )
        }
    ]
)

print(response.content[0].text)

 

The system prompt sits above the conversation in Claude's context. It carries the same authority throughout all turns, so role instructions, formatting rules, and domain constraints you set here persist without you repeating them in every message.

 

Streaming Responses

 
For requests where Claude may take several seconds to respond, streaming lets you display text as it arrives instead of waiting for the full response. The SDK exposes this through client.messages.stream(), used as a context manager.

The text_stream iterator yields individual text chunks in real time. Each chunk is a string fragment, not a full sentence. You pass end="" and flush=True to print() so output appears continuously rather than buffering:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-5",
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": "Walk me through what happens when a Python list grows beyond its initial capacity."
        }
    ]
) as stream:
    # Print each fragment immediately, without a trailing newline.
    for chunk in stream.text_stream:
        print(chunk, end="", flush=True)

print()  # newline after stream ends

 

The context manager ensures the HTTP connection is closed cleanly when the block exits, even if an exception is raised mid-stream. If you need the complete Message object after streaming, including token usage counts, call stream.get_final_message() before the block closes.

Sample output:

Python lists are dynamic arrays. When you append an element and the list has no
room, Python allocates a new, larger block of memory (typically 1.125x the current
size), copies all existing elements into it, and releases the old block. This
operation is O(n) in the worst case, but because it happens infrequently relative to
the number of appends, the amortized cost per append stays O(1). You can pre-allocate
capacity with a list comprehension or by passing an iterable to the list constructor
if you know the final size upfront.
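To keep both the streamed text and the final Message, as described above, the loop can be wrapped in a small function. This is a sketch under assumptions: stream_and_collect is a hypothetical helper, and it relies only on the text_stream iterator and the get_final_message() method already mentioned in this section.

```python
def stream_and_collect(stream, write=print):
    """Echo each text chunk as it arrives and return (full_text, final_message)."""
    parts = []
    for chunk in stream.text_stream:
        write(chunk, end="", flush=True)
        parts.append(chunk)
    # Fetch the final message while the stream is still open.
    return "".join(parts), stream.get_final_message()


# Intended usage inside the context manager (requires a real client and API key):
#
# with client.messages.stream(model=..., max_tokens=..., messages=[...]) as stream:
#     text, message = stream_and_collect(stream)
# print(message.usage)
```

Returning the message from inside the block keeps the call to get_final_message() before the context manager exits.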

 

Next Steps

 
You now have the core building blocks: requests, structured responses, system prompts, and streaming.

Next, you can learn error handling, token usage, and multi-turn conversations. Because the API is stateless, you must send the conversation history with each request. The SDK documentation shows the recommended approach.
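As a rough illustration of what sending the history means, the sketch below builds up a messages list turn by turn. The add_turn helper is hypothetical, and the assistant text is a placeholder. The SDK documentation remains the reference for the recommended approach.

```python
def add_turn(history, role, content):
    """Return a new history list with one more turn appended."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unexpected role: {role!r}")
    return history + [{"role": role, "content": content}]


history = []
history = add_turn(history, "user", "In one sentence, what is a context window?")
# In a real app, this text would come from response.content[0].text.
history = add_turn(history, "assistant", "A context window is...")
history = add_turn(history, "user", "How is it measured?")
# Pass `history` as the messages argument on the next request.
```

Each request then carries the full list, which is why long conversations use more input tokens over time.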

The API reference also includes features like structured outputs and tool use. Happy exploring!
 
 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.


