5 Powerful Python Decorators to Optimize LLM Applications

March 7, 2026

Image by Editor

# Introduction

Python decorators are tailor-made solutions designed to help simplify complex software logic in a variety of applications, including LLM-based ones. Dealing with LLMs often involves handling unpredictable, slow, and frequently expensive third-party APIs, and decorators have a lot to offer for making this task cleaner by wrapping, for instance, API calls with optimized logic.

Let's take a look at five useful Python decorators that can help you optimize your LLM-based applications without noticeable extra burden.

The accompanying examples illustrate the syntax and the approach to using each decorator. They are sometimes shown without actual LLM use, but they are code excerpts ultimately designed to be part of larger applications.

# 1. In-memory Caching

This solution comes from functools, a module in Python's standard library, and it is useful for expensive functions like those that call LLMs. If we had an LLM API call in the function defined below, wrapping it in an LRU (Least Recently Used) decorator adds a cache mechanism that prevents redundant requests containing identical inputs (prompts) within the same execution or session. This is an elegant way to reduce latency.

This example illustrates its use:

from functools import lru_cache
import time

@lru_cache(maxsize=100)
def summarize_text(text: str) -> str:
    print("Sending text to LLM...")
    time.sleep(1)  # A simulation of network delay
    return f"Summary of {len(text)} characters."

print(summarize_text("The quick brown fox."))  # Takes one second
print(summarize_text("The quick brown fox."))  # Instant
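
To check that the cache is actually being hit, functions wrapped with lru_cache expose cache_info() and cache_clear(). The sketch below uses a hypothetical stand-in function, fake_summarize, rather than a real LLM call. It also illustrates that the cache keys on exact argument values, so prompts that differ only in whitespace count as separate entries unless they are normalized first; the helper summarize_normalized is an assumption added for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=100)
def fake_summarize(text: str) -> str:
    # Hypothetical stand-in for an LLM call; no network access involved
    return f"Summary of {len(text)} characters."

def summarize_normalized(text: str) -> str:
    # Collapse whitespace so trivially different prompts share one cache entry
    return fake_summarize(" ".join(text.split()))

summarize_normalized("The quick brown fox.")
summarize_normalized("The  quick brown fox. ")  # same prompt after normalization
summarize_normalized("A different prompt.")

stats = fake_summarize.cache_info()
print(stats)  # reports hits, misses, maxsize, and current cache size

fake_summarize.cache_clear()  # drop all entries, e.g. after switching models
```

Whether normalizing prompts is safe depends on the application: it is only appropriate when whitespace differences should not change the model's answer.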

# 2. Caching On Persistent Disk

Speaking of caching, the external library diskcache takes it a step further by implementing a persistent cache on disk, namely via a SQLite database: very useful for storing the results of time-consuming functions such as LLM API calls. This way, results can be quickly retrieved in later calls when needed. Consider using this decorator pattern when in-memory caching is not sufficient because the script or application may stop and restart between calls.

import time
from diskcache import Cache

# Creating a lightweight local SQLite database directory
cache = Cache(".local_llm_cache")

@cache.memoize(expire=86400)  # Cached for 24 hours
def fetch_llm_response(prompt: str) -> str:
    print("Calling expensive LLM API...")  # Replace this with an actual LLM API call
    time.sleep(2)  # API latency simulation
    return f"Response to: {prompt}"

print(fetch_llm_response("What is quantum computing?"))  # 1st function call
print(fetch_llm_response("What is quantum computing?"))  # Instant load from disk happens here!
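
To make the idea of a persistent, expiring cache more concrete, here is a minimal sketch built only on the standard library's sqlite3 module. It is not how diskcache is implemented; the decorator name disk_memoize, the table layout, and the JSON-based cache key are all assumptions chosen for illustration, and the sketch only handles JSON-serializable arguments and string results.

```python
import functools
import json
import os
import sqlite3
import tempfile
import time

def disk_memoize(db_path: str, expire: float):
    """Hypothetical decorator: cache string results in SQLite for `expire` seconds."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache "
        "(key TEXT PRIMARY KEY, value TEXT, stored_at REAL)"
    )
    conn.commit()

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Stable key built from the function name and its arguments
            key = json.dumps([func.__name__, args, kwargs], sort_keys=True)
            row = conn.execute(
                "SELECT value, stored_at FROM cache WHERE key = ?", (key,)
            ).fetchone()
            if row is not None and time.time() - row[1] < expire:
                return row[0]  # fresh entry found on disk
            value = func(*args, **kwargs)
            conn.execute(
                "INSERT OR REPLACE INTO cache (key, value, stored_at) VALUES (?, ?, ?)",
                (key, value, time.time()),
            )
            conn.commit()
            return value
        return wrapper
    return decorator

calls = []  # records real executions so cache hits are visible
db_file = os.path.join(tempfile.mkdtemp(), "llm_cache.sqlite")

@disk_memoize(db_file, expire=86400)
def fake_llm_response(prompt: str) -> str:
    calls.append(prompt)  # stand-in for an expensive API call
    return f"Response to: {prompt}"

first = fake_llm_response("What is quantum computing?")
second = fake_llm_response("What is quantum computing?")  # served from SQLite
print(first == second, len(calls))
```

Because the entries live in a file, a later run pointing at the same database path would find them again until they expire, which is the property that in-memory caching lacks.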

# 3. Network-resilient Apps

Since calls to LLMs may often fail due to transient errors, timeouts, and "502 Bad Gateway" responses, using a network resilience library like tenacity along with its @retry decorator can help intercept these common network failures.

The example below illustrates this resilient behavior by randomly simulating a 70% chance of a network error on each attempt. Try it several times, and eventually you will see the error surface once all four attempts fail (roughly a 24% chance per run, since 0.7^4 ≈ 0.24): totally expected and intended!

import random
from tenacity import retry, wait_exponential, stop_after_attempt, retry_if_exception_type

class RateLimitError(Exception):
    pass

# Up to 4 attempts, waiting about 2, 4, and 8 seconds between them
@retry(
    wait=wait_exponential(multiplier=2, min=2, max=10),
    stop=stop_after_attempt(4),
    retry=retry_if_exception_type(RateLimitError)
)
def call_flaky_llm_api(prompt: str):
    print("Attempting to call API...")
    if random.random() < 0.7:  # Simulating a 70% chance of API failure
        raise RateLimitError("Rate limit exceeded! Backing off.")
    return "Text has been successfully generated!"

print(call_flaky_llm_api("Write a haiku"))
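
For readers curious about what a retry decorator does under the hood, the following is a minimal standard-library sketch of retrying with exponential backoff. It is not tenacity's implementation: retry_with_backoff and its parameters are hypothetical names, the delays are shortened so the example runs quickly, and the failure pattern is deterministic (fail twice, then succeed) instead of random.

```python
import functools
import time

class RateLimitError(Exception):
    pass

def retry_with_backoff(max_attempts: int, base_delay: float,
                       max_delay: float, retry_on=Exception):
    """Hypothetical decorator: retry on `retry_on`, doubling the wait each time."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Wait base_delay, 2x, 4x, ... capped at max_delay
                    delay = min(base_delay * 2 ** (attempt - 1), max_delay)
                    time.sleep(delay)
        return wrapper
    return decorator

attempts = {"count": 0}

@retry_with_backoff(max_attempts=4, base_delay=0.01, max_delay=0.1,
                    retry_on=RateLimitError)
def flaky_call(prompt: str) -> str:
    # Deterministic stand-in for a flaky API: fail twice, then succeed
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("Rate limit exceeded")
    return f"Generated text for: {prompt}"

result = flaky_call("Write a haiku")
print(result, "after", attempts["count"], "attempts")
```

One difference worth noting: this sketch re-raises the last exception when it runs out of attempts, whereas tenacity by default wraps the failure in its own RetryError unless reraise=True is passed to @retry.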

# 4. Client-side Throttling

This combination of decorators uses the ratelimit library to control the frequency of calls to a (usually highly demanded) function: useful to avoid exceeding a provider's rate limits when using external APIs. The following example does so by capping the number of calls per time window, in the same spirit as the Requests Per Minute (RPM) limits that providers define. A provider will typically reject prompts from a client application that sends too many requests in a short period.

from ratelimit import limits, sleep_and_retry
import time

# Strictly enforcing a 3-call limit per 10-second window
@sleep_and_retry
@limits(calls=3, period=10)
def generate_text(prompt: str) -> str:
    print(f"[{time.strftime('%X')}] Processing: {prompt}")
    return f"Processed: {prompt}"

# First 3 print immediately, the 4th pauses, thereby respecting the limit
for i in range(5):
    generate_text(f"Prompt {i}")
# 5. Structured Output Binding

The fifth decorator on the list uses the magentic library together with Pydantic to provide an efficient mechanism for interacting with LLMs via API and obtaining structured responses. It simplifies the process of calling LLM APIs, which is important for coaxing LLMs to return formatted data like JSON objects in a reliable fashion. The decorator handles the underlying system prompts and Pydantic-led parsing, which can reduce token usage and helps keep the codebase cleaner.

To try this example out, you will need an OpenAI API key.

# IMPORTANT: The OPENAI_API_KEY environment variable must be set to run this example
from magentic import prompt
from pydantic import BaseModel

class CapitalInfo(BaseModel):
    capital: str
    population: int

# A decorator that maps the prompt template to the Pydantic return type
@prompt("What is the capital and population of {country}?")
def get_capital_info(country: str) -> CapitalInfo:
    ...  # No function body needed here!

info = get_capital_info("France")
print(f"Capital: {info.capital}, Population: {info.population}")
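
The core idea of binding output to a return type can be illustrated without any API key. The sketch below is not magentic's mechanism: structured is a hypothetical decorator that reads a function's return annotation, parses the JSON text the function returns, and coerces each field into a plain dataclass (used here in place of Pydantic to stay within the standard library). The canned reply and its values are placeholders, not real data, and the coercion only works for simple types such as str and int.

```python
import functools
import json
from dataclasses import dataclass, fields
from typing import get_type_hints

@dataclass
class CapitalInfo:
    capital: str
    population: int

def structured(func):
    """Hypothetical decorator: parse JSON text from `func` into its annotated return type."""
    return_type = get_type_hints(func)["return"]
    field_types = get_type_hints(return_type)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        raw = func(*args, **kwargs)  # raw JSON text, as a model might return it
        data = json.loads(raw)
        values = {}
        for field in fields(return_type):
            if field.name not in data:
                raise ValueError(f"Missing field in model output: {field.name}")
            # Coerce to the declared type, e.g. the string "12345" becomes an int
            values[field.name] = field_types[field.name](data[field.name])
        return return_type(**values)
    return wrapper

@structured
def get_capital_info(country: str) -> CapitalInfo:
    # Canned reply standing in for a real model response (placeholder values)
    return '{"capital": "Exampleville", "population": "12345"}'

info = get_capital_info("Exampleland")
print(info)
```

A missing or malformed field fails loudly here, which is the main practical benefit of validating model output against a schema instead of passing raw text downstream.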

# Wrapping Up

In this article, we listed and illustrated five Python decorators, based on diverse libraries, that take on particular significance in the context of LLM-based applications: they help simplify logic, make processes more efficient, or improve network resilience, among other aspects.

Iván Palomares Carrascosa is a leader, writer, speaker, and adviser in AI, machine learning, deep learning & LLMs. He trains and guides others in harnessing AI in the real world.
