In this tutorial, we build a workflow that uses Outlines to generate structured, type-safe outputs from language models. We work with typed constraints such as Literal, int, and bool, design prompt templates with outlines.Template, and enforce strict schema validation with Pydantic models. We also implement robust JSON recovery and a function-calling style that generates validated arguments and executes Python functions safely. Throughout the tutorial, we focus on reliability, constraint enforcement, and production-grade structured generation.
```python
import os, sys, subprocess, json, textwrap, re

# Install dependencies quietly into the current interpreter
subprocess.check_call([sys.executable, "-m", "pip", "install", "-q",
                       "outlines", "transformers", "accelerate", "sentencepiece", "pydantic"])

import torch
import outlines
from transformers import AutoTokenizer, AutoModelForCausalLM
from typing import Literal, List, Union, Annotated
from pydantic import BaseModel, Field
from enum import Enum

print("Torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("Outlines:", getattr(outlines, "__version__", "unknown"))

# Use the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)

MODEL_NAME = "HuggingFaceTB/SmolLM2-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
hf_model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    device_map="auto" if device == "cuda" else None,
)
if device == "cpu":
    hf_model = hf_model.to(device)

# Wrap the Hugging Face model so Outlines can constrain its generations
model = outlines.from_transformers(hf_model, tokenizer)

def build_chat(user_text: str, system_text: str = "You are a precise assistant. Follow instructions exactly.") -> str:
    try:
        msgs = [{"role": "system", "content": system_text}, {"role": "user", "content": user_text}]
        return tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    except Exception:
        # Plain-text fallback for tokenizers without a chat template
        return f"{system_text}\n\nUser: {user_text}\nAssistant:"

def banner(title: str):
    print("\n" + "=" * 90)
    print(title)
    print("=" * 90)
```
We install all required dependencies and initialize the Outlines pipeline with a lightweight instruct model (SmolLM2-135M-Instruct). We configure device handling so that the code uses the GPU when one is available and falls back to the CPU otherwise. We also build two reusable helpers: build_chat, which formats prompts with the tokenizer's chat template (or a plain-text fallback), and banner, which prints section headers to structure the output.
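To see what the fallback branch of build_chat produces, here is a self-contained sketch that swaps in a hypothetical stub tokenizer (NoTemplateTokenizer) whose apply_chat_template always raises, which is the situation the except branch is meant to cover. Unlike the tutorial's helper, this version takes the tokenizer as a parameter so it runs without downloading a model.

```python
class NoTemplateTokenizer:
    """Hypothetical stub: behaves like a tokenizer that has no chat template."""

    def apply_chat_template(self, msgs, tokenize=False, add_generation_prompt=True):
        raise ValueError("tokenizer has no chat template")


def build_chat(user_text: str,
               system_text: str = "You are a precise assistant. Follow instructions exactly.",
               tokenizer=None) -> str:
    # Same logic as the tutorial's helper, with the tokenizer passed in explicitly
    try:
        msgs = [{"role": "system", "content": system_text},
                {"role": "user", "content": user_text}]
        return tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    except Exception:
        # Plain-text fallback: system prompt, blank line, then a User/Assistant turn
        return f"{system_text}\n\nUser: {user_text}\nAssistant:"


prompt = build_chat("Is 29 a prime number?", tokenizer=NoTemplateTokenizer())
print(prompt)
```

The fallback ends with "Assistant:" so that whatever the model generates next reads as the assistant's reply.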
```python
def extract_json_object(s: str) -> str:
    # Return the first balanced {...} object, ignoring braces inside JSON strings
    s = s.strip()
    start = s.find("{")
    if start == -1:
        return s
    depth = 0
    in_str = False
    esc = False
    for i in range(start, len(s)):
        ch = s[i]
        if in_str:
            if esc:
                esc = False
            elif ch == "\\":
                esc = True
            elif ch == '"':
                in_str = False
        else:
            if ch == '"':
                in_str = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return s[start:i + 1]
    return s[start:]

def json_repair_minimal(bad: str) -> str:
    # Last resort: drop anything after the final closing brace
    bad = bad.strip()
    last = bad.rfind("}")
    if last != -1:
        return bad[:last + 1]
    return bad

def safe_validate(model_cls, raw_text: str):
    # Extract the JSON object, validate it, and retry once after the minimal repair
    raw = extract_json_object(raw_text)
    try:
        return model_cls.model_validate_json(raw)
    except Exception:
        raw2 = json_repair_minimal(raw)
        return model_cls.model_validate_json(raw2)

banner("2) Typed outputs (Literal / int / bool)")
sentiment = model(
    build_chat("Analyze the sentiment: 'This product completely changed my life!'. Return one label only."),
    Literal["Positive", "Negative", "Neutral"],
    max_new_tokens=8,
)
print("Sentiment:", sentiment)
bp = model(build_chat("What is the boiling point of water in Celsius? Return integer only."), int, max_new_tokens=8)
print("Boiling point (int):", bp)
prime = model(build_chat("Is 29 a prime number? Return true or false only."), bool, max_new_tokens=6)
print("Is prime (bool):", prime)
```
We implement JSON extraction and a minimal repair step to recover structured outputs from imperfect generations: extract_json_object pulls out the first balanced {...} object while ignoring braces inside strings, and json_repair_minimal trims anything after the last closing brace as a last resort. We then demonstrate typed generation with Literal, int, and bool. Because Outlines restricts which tokens the model may emit at each step, the output is constrained to the requested type at generation time rather than checked afterwards.
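The brace-matching scan in extract_json_object is what lets safe_validate cope with chatty output. The sketch below reuses that function and feeds it a hypothetical model reply that wraps the JSON in prose and puts braces and escaped quotes inside a string value. The standard-library json module stands in for Pydantic's model_validate_json so the example runs without extra dependencies.

```python
import json


def extract_json_object(s: str) -> str:
    # Return the first balanced {...} object, ignoring braces inside JSON strings
    s = s.strip()
    start = s.find("{")
    if start == -1:
        return s
    depth = 0
    in_str = False
    esc = False
    for i in range(start, len(s)):
        ch = s[i]
        if in_str:
            if esc:
                esc = False          # character after a backslash is taken literally
            elif ch == "\\":
                esc = True
            elif ch == '"':
                in_str = False
        else:
            if ch == '"':
                in_str = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return s[start:i + 1]
    return s[start:]                 # unbalanced: return everything after the first brace


# Hypothetical chatty model reply
noisy = 'Sure! Here it is: {"a": 3, "note": "has {braces} and \\"quotes\\""} Hope that helps.'
obj_text = extract_json_object(noisy)
print(obj_text)
print(json.loads(obj_text))
```

The braces inside the "note" string do not change the depth counter, so the scan stops at the real closing brace and the trailing "Hope that helps." is discarded.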
```python
banner("3) Prompt templating (outlines.Template)")
# Jinja-style template; {{ text }} is filled in when the template is called
tmpl = outlines.Template.from_string(textwrap.dedent("""
<|system|>
You are a strict classifier. Return ONLY one label.
<|user|>
Classify the sentiment of this text:
{{ text }}
Labels: Positive, Negative, Neutral
<|assistant|>
""").strip())
templated = model(
    tmpl(text="The food was cold but the staff were kind."),
    Literal["Positive", "Negative", "Neutral"],
    max_new_tokens=8,
)
print("Template sentiment:", templated)
```
We use outlines.Template to build reusable prompt templates. The user's text is injected into the {{ text }} placeholder at call time, and the Literal output type restricts the answer to one of the three labels. Note that the <|system|>, <|user|>, and <|assistant|> markers here are written by hand and may not match the model's native chat format; the label constraint holds regardless.
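Why a Literal output type always yields one of the listed labels can be illustrated with a character-level sketch of constrained greedy decoding. Outlines itself works on tokens and masks the model's logits using a precompiled index, so this is a simplified illustration rather than its implementation; toy_score is a hypothetical stand-in for a model that, left unconstrained, would write "Neutral-ish".

```python
from typing import Callable, List, Set

LABELS = ["Positive", "Negative", "Neutral"]


def allowed_next(prefix: str, labels: List[str]) -> Set[str]:
    # Characters that keep `prefix` on the way to at least one label
    return {lab[len(prefix)] for lab in labels
            if lab.startswith(prefix) and len(lab) > len(prefix)}


def constrained_greedy(labels: List[str], score: Callable[[str, str], float]) -> str:
    # Assumes no label is a prefix of another, so decoding stops at the first full label
    prefix = ""
    while prefix not in labels:
        options = allowed_next(prefix, labels)
        # Mask: only characters in `options` are considered; ties go to sorted order
        prefix += max(sorted(options), key=lambda ch: score(prefix, ch))
    return prefix


def toy_score(prefix: str, ch: str) -> float:
    # Hypothetical "model" that would like to emit "Neutral-ish" if it were free
    wanted = "Neutral-ish"
    pos = len(prefix)
    return 1.0 if pos < len(wanted) and wanted[pos] == ch else 0.0


print(constrained_greedy(LABELS, toy_score))
```

The "-ish" never appears: once "Neutral" is complete, no further character is allowed, so decoding stops on a valid label even though the toy model wanted to keep going.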
```python
banner("4) Pydantic structured output (advanced constraints)")

class TicketPriority(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    urgent = "urgent"

# Reusable constrained string types (regex-checked)
IPv4 = Annotated[str, Field(pattern=r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$")]
ISODate = Annotated[str, Field(pattern=r"^\d{4}-\d{2}-\d{2}$")]

class ServiceTicket(BaseModel):
    priority: TicketPriority
    category: Literal["billing", "login", "bug", "feature_request", "other"]
    requires_manager: bool
    summary: str = Field(min_length=10, max_length=220)
    action_items: List[str] = Field(min_length=1, max_length=6)

class NetworkIncident(BaseModel):
    affected_service: Literal["dns", "vpn", "api", "website", "database"]
    severity: Literal["sev1", "sev2", "sev3"]
    public_ip: IPv4
    start_date: ISODate
    mitigation: List[str] = Field(min_length=2, max_length=6)

email = """
Subject: URGENT - Cannot access my account after payment
I paid for the premium plan 3 hours ago and still can't access any features.
I have a client presentation in an hour and need the analytics dashboard.
Please fix this immediately or refund my payment.
""".strip()

ticket_text = model(
    build_chat(
        "Extract a ServiceTicket from this message.\n"
        "Return JSON ONLY matching the ServiceTicket schema.\n"
        "Action items must be distinct.\n\nMESSAGE:\n" + email
    ),
    ServiceTicket,
    max_new_tokens=240,
)
# Generation may return a JSON string; validate it into a ServiceTicket if so
ticket = safe_validate(ServiceTicket, ticket_text) if isinstance(ticket_text, str) else ticket_text
print("ServiceTicket JSON:\n", ticket.model_dump_json(indent=2))
```
We define Pydantic schemas with enums, regex constraints, length limits, and bounded lists. We extract a ServiceTicket from the raw email text using schema-driven decoding, then pass the result through safe_validate, which extracts and validates the JSON when generation returns a string. The NetworkIncident schema, with its IPv4 and ISO-date pattern fields, is defined here but not used in this section. Note also that "Action items must be distinct" is a prompt instruction only; the schema does not enforce uniqueness.
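Since NetworkIncident is not exercised above, its IPv4 and ISODate types can be checked on their own: they are plain regular expressions, so Python's re module can test them directly. The sketch copies the two patterns verbatim and shows a limitation worth knowing, namely that the date pattern checks shape only, so 2024-13-45 passes. The extra datetime.date.fromisoformat check is an addition for illustration, not part of the tutorial's schema.

```python
import re
from datetime import date

IPV4_PATTERN = r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$"
ISO_DATE_PATTERN = r"^\d{4}-\d{2}-\d{2}$"


def is_ipv4(s: str) -> bool:
    return re.fullmatch(IPV4_PATTERN, s) is not None


def is_iso_date_shape(s: str) -> bool:
    return re.fullmatch(ISO_DATE_PATTERN, s) is not None


def is_real_date(s: str) -> bool:
    # Stricter than the regex: the calendar date must actually exist
    if not is_iso_date_shape(s):
        return False
    try:
        date.fromisoformat(s)
        return True
    except ValueError:
        return False


for ip in ["203.0.113.7", "256.1.1.1", "10.0.0"]:
    print(ip, is_ipv4(ip))
for d in ["2024-03-15", "2024-13-45"]:
    print(d, is_iso_date_shape(d), is_real_date(d))
```

If calendar validity matters, a Pydantic date-typed field or a validator would be the more robust choice than a shape-only pattern.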
```python
banner("5) Function-calling style (schema -> args -> call)")

class AddArgs(BaseModel):
    a: int = Field(ge=-1000, le=1000)
    b: int = Field(ge=-1000, le=1000)

def add(a: int, b: int) -> int:
    return a + b

# Generate arguments constrained to the AddArgs schema
args_text = model(
    build_chat("Return JSON ONLY with two integers a and b. Make a odd and b even."),
    AddArgs,
    max_new_tokens=80,
)
args = safe_validate(AddArgs, args_text) if isinstance(args_text, str) else args_text
print("Args:", args.model_dump())
# Only call the function once the arguments have been validated
print("add(a,b) =", add(args.a, args.b))
print("Tip: For best speed and fewer truncations, switch the Colab runtime → GPU.")
```
We implement a function-calling style workflow by generating arguments that conform to the AddArgs schema. We validate those arguments and only then call an ordinary Python function with them. The schema guarantees two integers between -1000 and 1000; the "a odd, b even" request lives only in the prompt, so it is not enforced and should be checked separately if it matters.
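The same schema-then-call pattern extends naturally to a small tool registry. The sketch below is a stdlib-only version under stated assumptions: TOOLS, call_tool, check_add_args, and meets_prompt_rule are hypothetical names, and check_add_args is a stricter, hand-written stand-in for the bounds AddArgs expresses with Field(ge=..., le=...). Because the odd/even request exists only in the prompt, it gets its own post-validation check.

```python
import json
from typing import Any, Callable, Dict, Tuple


def add(a: int, b: int) -> int:
    return a + b


def check_add_args(args: Dict[str, Any]) -> Dict[str, int]:
    # Hand-written stand-in for AddArgs: two ints in [-1000, 1000]
    out = {}
    for key in ("a", "b"):
        v = args.get(key)
        # bool is a subclass of int in Python, so reject it explicitly
        if not isinstance(v, int) or isinstance(v, bool) or not -1000 <= v <= 1000:
            raise ValueError(f"{key} must be an int in [-1000, 1000], got {v!r}")
        out[key] = v
    return out


# Hypothetical registry: tool name -> (argument validator, function to call)
TOOLS: Dict[str, Tuple[Callable[[Dict[str, Any]], Dict[str, int]], Callable[..., int]]] = {
    "add": (check_add_args, add),
}


def call_tool(name: str, raw_json: str) -> int:
    validate, fn = TOOLS[name]
    args = validate(json.loads(raw_json))  # parse and validate before executing anything
    return fn(**args)


def meets_prompt_rule(raw_json: str) -> bool:
    # The "a odd, b even" request is not in the schema, so check it after the fact
    args = json.loads(raw_json)
    return args["a"] % 2 == 1 and args["b"] % 2 == 0


print(call_tool("add", '{"a": 7, "b": 4}'))
print(meets_prompt_rule('{"a": 7, "b": 4}'), meets_prompt_rule('{"a": 6, "b": 4}'))
```

Keeping validation inside call_tool means no registered function can run on unchecked model output.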
In conclusion, we implemented a fully structured generation pipeline using Outlines with strong typing, schema validation, and constrained decoding. We showed how to move from simple typed outputs to Pydantic-based extraction and function-style execution patterns. We also added resilience through JSON salvage and validation, making the system more robust against imperfect model outputs. Overall, we built a practical, production-oriented framework for safe, schema-driven LLM applications.
