A Coding Guide to Implement a pgvector-Powered Semantic, Hybrid, Sparse, and Quantized Vector Search System

May 28, 2026

In this tutorial, we build a complete pgvector playground inside Google Colab and explore how PostgreSQL can work as a robust vector database for modern AI applications. We begin by installing PostgreSQL, compiling the pgvector extension, connecting through Psycopg, and registering vector types for smooth Python integration. Then we create embeddings with SentenceTransformers, store them in PostgreSQL, build HNSW indexes, and run semantic search, filtered search, distance metric comparisons, half-precision storage, binary quantization, sparse vector search, hybrid retrieval, and vector aggregation. Through this workflow, we learn how pgvector supports practical retrieval-augmented generation, recommendation, similarity search, and hybrid search systems using only open-source tools.

import os
import subprocess
import sys
import time
def sh(cmd: str, check: bool = True):
    """Run a shell command, printing only the command itself (output is discarded)."""
    print(f"  $ {cmd}")
    return subprocess.run(cmd, shell=True, check=check,
                          stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
print("[0/10] Installing PostgreSQL + building pgvector (≈1–2 min)...")
sh("apt-get -qq update")
sh("apt-get -qq install -y postgresql postgresql-contrib "
   "postgresql-server-dev-all build-essential git")
if not os.path.exists("/tmp/pgvector"):
    sh("git clone --depth 1 https://github.com/pgvector/pgvector.git /tmp/pgvector")
sh("cd /tmp/pgvector && make && make install")
sh("service postgresql start")
time.sleep(3)  # give the server a moment to start accepting connections
sh("""sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'postgres';" """)
print("[0/10] Installing Python packages...")
# Quote the extras spec so the shell does not treat [binary] as a glob pattern.
sh(f"{sys.executable} -m pip install -q pgvector 'psycopg[binary]' "
   f"sentence-transformers numpy")

We set up the complete PostgreSQL and pgvector environment. We install the required system packages, clone and build pgvector from source, start the PostgreSQL service, and configure the database password. We also install the Python dependencies needed to connect to PostgreSQL and work with vector embeddings.
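
The setup code waits a fixed three seconds after starting the service. As a hedged alternative, the sketch below polls until a probe succeeds. The helper names `wait_until` and `postgres_ready` are hypothetical and not part of the original notebook; the probe assumes PostgreSQL's `pg_isready` utility is on the PATH and that the server listens on 127.0.0.1:5432, as in the connection string used later.

```python
import subprocess
import time
from typing import Callable


def wait_until(probe: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Call probe() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def postgres_ready() -> bool:
    """Hypothetical probe: pg_isready exits with status 0 once the server accepts connections."""
    try:
        result = subprocess.run(
            ["pg_isready", "-q", "-h", "127.0.0.1", "-p", "5432"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # pg_isready is not installed or not on PATH, so report "not ready".
        return False
    return result.returncode == 0
```

Under these assumptions, `wait_until(postgres_ready)` could stand in for `time.sleep(3)`: it returns as soon as the server is reachable and gives up after the timeout instead of hanging.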

import numpy as np
import psycopg
from pgvector import HalfVector, SparseVector
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer
print("\n[1/10] Connecting and enabling the 'vector' extension...")
conn = psycopg.connect(
    "host=127.0.0.1 port=5432 dbname=postgres user=postgres password=postgres",
    autocommit=True,
)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # lets Psycopg send and receive NumPy arrays as vectors
ver = conn.execute("SELECT extversion FROM pg_extension WHERE extname = 'vector'").fetchone()[0]
print(f"      pgvector version: {ver}")
print("\n[2/10] Loading embedding model + encoding corpus...")
model = SentenceTransformer("all-MiniLM-L6-v2")
DIM = model.get_sentence_embedding_dimension()
corpus = [
    ("Octopuses have three hearts and blue blood.",             "animals"),
    ("Transformers revolutionized natural language processing.","technology"),
    ("Quantum computers exploit superposition and entanglement.","technology"),
    ("GPUs accelerate deep learning by parallelizing matrix math.","technology"),
    ("Sourdough bread relies on wild yeast and lactobacilli.",  "food"),
    ("Dark chocolate contains flavonoid antioxidants.",         "food"),
    ("A black hole's gravity is so strong light cannot escape.","space")
]
contents   = [c for c, _ in corpus]
categories = [k for _, k in corpus]
# Unit-length embeddings, so cosine and inner-product rankings agree.
embeddings = model.encode(contents, normalize_embeddings=True)
conn.execute("DROP TABLE IF EXISTS documents")
conn.execute(f"""
    CREATE TABLE documents (
        id        bigserial PRIMARY KEY,
        content   text,
        category  text,
        embedding vector({DIM})
    )
""")
with conn.cursor() as cur:
    cur.executemany(
        "INSERT INTO documents (content, category, embedding) VALUES (%s, %s, %s)",
        list(zip(contents, categories, [np.asarray(e) for e in embeddings])),
    )
print(f"      Inserted {len(corpus)} documents with {DIM}-d embeddings.")

We connect to PostgreSQL, enable the pgvector extension, and register vector support with Psycopg. We load the SentenceTransformers model, define a small text corpus, generate normalized embeddings, and create a PostgreSQL table for storing documents. We then insert each document with its category and vector representation so that we can perform semantic search later.
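
Registering the vector type is what lets NumPy arrays be passed directly as query parameters. For context, pgvector also accepts a vector in text form as a bracketed, comma-separated list such as `[1,2,3]`, which can be cast with `::vector`. The sketch below shows that text form with two hypothetical helpers, `to_vector_literal` and `from_vector_literal`; they are illustrative only and not part of pgvector's Python package.

```python
from typing import List, Sequence


def to_vector_literal(values: Sequence[float]) -> str:
    """Format numbers in pgvector's text form, e.g. '[1.0,0.5,-2.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def from_vector_literal(text: str) -> List[float]:
    """Parse the bracketed text form back into a list of floats."""
    inner = text.strip()[1:-1]
    return [float(part) for part in inner.split(",")] if inner.strip() else []


literal = to_vector_literal([1, 0.5, -2])
print(literal)                       # prints [1.0,0.5,-2.0]
print(from_vector_literal(literal))  # prints [1.0, 0.5, -2.0]
```

With the adapter registered, none of this is needed in the tutorial's code; the sketch only makes visible what a vector value looks like on the SQL side.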

print("\n[3/10] Building HNSW index and running semantic search...")
conn.execute(
    "CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops) "
    "WITH (m = 16, ef_construction = 64)"
)
conn.execute("SET hnsw.ef_search = 100")
def semantic_search(query: str, k: int = 4):
    """Return the k documents closest to the query by cosine distance."""
    q = np.asarray(model.encode(query, normalize_embeddings=True))
    return conn.execute(
        "SELECT content, category, embedding <=> %s AS distance "
        "FROM documents ORDER BY distance LIMIT %s",
        (q, k),
    ).fetchall()
for content, cat, dist in semantic_search("animals that are unusually fast"):
    print(f"      {dist:.3f}  [{cat:<10}] {content}")
print("\n[4/10] Filtered search (only category = 'space')...")
q = np.asarray(model.encode("objects with extreme gravity", normalize_embeddings=True))
rows = conn.execute(
    "SELECT content, embedding <=> %s AS distance "
    "FROM documents WHERE category = %s ORDER BY distance LIMIT 3",
    (q, "space"),
).fetchall()
for content, dist in rows:
    print(f"      {dist:.3f}  {content}")
print("\n[5/10] Same query under different distance metrics (top hit each)...")
q = np.asarray(model.encode("brewing a hot caffeinated drink", normalize_embeddings=True))
for op, label in [("<->", "L2"), ("<=>", "cosine"), ("<#>", "neg-inner"), ("<+>", "L1")]:
    content, score = conn.execute(
        f"SELECT content, embedding {op} %s AS s FROM documents ORDER BY s LIMIT 1", (q,)
    ).fetchone()
    print(f"      {label:<10} {score:+.3f}  {content}")

We build an HNSW index on the embedding column to enable faster, more efficient vector search. We define a semantic search function that converts a query into an embedding and retrieves the most similar documents by cosine distance. We also perform metadata-filtered search and compare different pgvector distance operators: L2 (<->), cosine (<=>), negative inner product (<#>), and L1 (<+>).
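
To make the four operators concrete, here is a minimal pure-Python sketch of the quantities they compute. The function names are illustrative and the two sample vectors are made up; nothing here touches the database. Because the tutorial's embeddings are normalized to unit length, the sketch also checks two identities that hold in that case: squared L2 distance equals twice the cosine distance, and cosine distance equals one plus the negative inner product.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, the quantity behind <->."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus cosine similarity, the quantity behind <=>."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def negative_inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Negated dot product, the quantity behind <#> (smaller means more similar)."""
    return -dot(a, b)


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences, the quantity behind <+>."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Two made-up unit-length vectors.
a = [0.6, 0.8]
b = [1.0, 0.0]

# For unit vectors: L2^2 = 2 * cosine distance, and cosine distance = 1 + (negative inner product).
assert math.isclose(l2_distance(a, b) ** 2, 2 * cosine_distance(a, b))
assert math.isclose(cosine_distance(a, b), 1 + negative_inner_product(a, b))
```

One consequence is that L2, cosine, and inner-product ordering agree on normalized embeddings, so only L1 can produce a different top hit in step 5.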

print("\n[6/10] Half-precision storage with halfvec...")
conn.execute(f"ALTER TABLE documents ADD COLUMN IF NOT EXISTS embedding_half halfvec({DIM})")
conn.execute("UPDATE documents SET embedding_half = embedding::halfvec")
conn.execute(
    "CREATE INDEX ON documents USING hnsw (embedding_half halfvec_cosine_ops)"
)
q_half = HalfVector(model.encode("the galaxy we live in", normalize_embeddings=True))
rows = conn.execute(
    "SELECT content, embedding_half <=> %s AS d FROM documents ORDER BY d LIMIT 2",
    (q_half,),
).fetchall()
for content, d in rows:
    print(f"      {d:.3f}  {content}")
print("\n[7/10] Binary quantization (Hamming) + exact re-rank...")
conn.execute(
    f"CREATE INDEX ON documents "
    f"USING hnsw ((binary_quantize(embedding)::bit({DIM})) bit_hamming_ops)"
)
q = np.asarray(model.encode("parallel hardware for AI training", normalize_embeddings=True))
# Inner query: cheap Hamming shortlist. Outer query: exact cosine re-rank.
rerank_sql = f"""
    SELECT content, candidates.embedding <=> %(q)s AS exact_distance
    FROM (
        SELECT content, embedding
        FROM documents
        ORDER BY binary_quantize(embedding)::bit({DIM})
              <~> binary_quantize(%(q)s)::bit({DIM})
        LIMIT 8
    ) AS candidates
    ORDER BY exact_distance
    LIMIT 3
"""
for content, d in conn.execute(rerank_sql, {"q": q}).fetchall():
    print(f"      {d:.3f}  {content}")
print("\n[8/10] Native sparse vectors...")
conn.execute("DROP TABLE IF EXISTS sparse_items")
conn.execute("CREATE TABLE sparse_items (id bigserial PRIMARY KEY, embedding sparsevec(10))")
sparse_data = [
    SparseVector({0: 1.0, 3: 2.0, 7: 1.5}, 10),
    SparseVector({1: 0.5, 3: 1.0, 9: 3.0}, 10),
    SparseVector({0: 0.2, 4: 2.5, 7: 0.8}, 10),
]
with conn.cursor() as cur:
    cur.executemany("INSERT INTO sparse_items (embedding) VALUES (%s)",
                    [(v,) for v in sparse_data])
query_sparse = SparseVector({0: 1.0, 7: 1.0}, 10)
rows = conn.execute(
    "SELECT id, embedding, embedding <#> %s AS neg_ip "
    "FROM sparse_items ORDER BY neg_ip LIMIT 3",
    (query_sparse,),
).fetchall()
for _id, vec, neg_ip in rows:
    print(f"      id={_id}  inner_product={-neg_ip:.2f}  nnz_indices={vec.indices()}")
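
As a cross-check on the sparse query in step 8, the inner products can be worked out by hand. The sketch below uses plain dictionaries of `{index: value}` in place of `SparseVector`, with the same numbers as `sparse_data`; the function name is illustrative, and the keys 1 to 3 assume the `bigserial` ids follow insertion order.

```python
from typing import Dict


def sparse_inner_product(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Inner product of two sparse vectors stored as {index: value} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the vector with fewer non-zero entries
    return float(sum(value * b[i] for i, value in a.items() if i in b))


# Same values as sparse_data; keys 1..3 assume ids were assigned in insert order.
items = {
    1: {0: 1.0, 3: 2.0, 7: 1.5},
    2: {1: 0.5, 3: 1.0, 9: 3.0},
    3: {0: 0.2, 4: 2.5, 7: 0.8},
}
query = {0: 1.0, 7: 1.0}

scores = {item_id: sparse_inner_product(vec, query) for item_id, vec in items.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # prints [1, 3, 2]
```

Only indices shared by both vectors contribute, so the second item, which has no overlap with the query at indices 0 or 7, scores zero and ranks last.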

We explore advanced pgvector storage and retrieval techniques beyond standard dense vectors. We convert embeddings into half-precision vectors to reduce storage, use binary quantization with Hamming distance for fast candidate retrieval, and then re-rank the candidates with full-precision vectors. We also create sparse vectors and query them using inner-product similarity, which is useful for keyword-weighted or SPLADE-style retrieval.
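
The two-stage pattern from step 7 can be sketched without a database. The code below is an illustration under simple assumptions, not pgvector's implementation: it maps each positive component to a 1 bit and everything else to 0, shortlists by Hamming distance, then re-ranks the shortlist by exact cosine distance. All names and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def binary_quantize(v: Sequence[float]) -> List[int]:
    """One bit per dimension: 1 where the component is positive, else 0."""
    return [1 if x > 0 else 0 for x in v]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where the two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def two_stage_search(query: Sequence[float], docs: Sequence[Sequence[float]],
                     shortlist: int, k: int) -> List[Tuple[int, float]]:
    """Coarse Hamming shortlist, then exact cosine re-rank. Returns (index, distance)."""
    q_bits = binary_quantize(query)
    coarse = sorted(range(len(docs)),
                    key=lambda i: hamming(binary_quantize(docs[i]), q_bits))[:shortlist]
    exact = [(i, cosine_distance(docs[i], query)) for i in coarse]
    return sorted(exact, key=lambda pair: pair[1])[:k]


# Toy 3-d vectors, made up for illustration.
docs = [[0.9, 0.1, -0.2], [-0.5, 0.5, 0.5], [0.7, 0.6, -0.1], [-0.9, -0.1, 0.3]]
query = [1.0, 0.2, -0.1]
print([i for i, _ in two_stage_search(query, docs, shortlist=2, k=2)])  # prints [0, 2]
```

The shortlist size plays the role of `LIMIT 8` in the SQL: a larger shortlist costs more exact comparisons but makes it less likely that quantization drops a true neighbor.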

print("\n[9/10] Hybrid search (vector + full-text) via RRF...")
user_query = "fast animal"
qvec = np.asarray(model.encode(user_query, normalize_embeddings=True))
hybrid_sql = """
WITH semantic AS (
    SELECT id, RANK() OVER (ORDER BY embedding <=> %(qvec)s) AS rank
    FROM documents
    ORDER BY embedding <=> %(qvec)s
    LIMIT 20
),
keyword AS (
    SELECT d.id,
           RANK() OVER (ORDER BY ts_rank_cd(to_tsvector('english', d.content), q) DESC) AS rank
    FROM documents d, plainto_tsquery('english', %(qtext)s) AS q
    WHERE to_tsvector('english', d.content) @@ q
    ORDER BY rank
    LIMIT 20
)
SELECT d.content,
       COALESCE(1.0 / (60 + semantic.rank), 0.0)
     + COALESCE(1.0 / (60 + keyword.rank),  0.0) AS rrf_score
FROM documents d
LEFT JOIN semantic ON d.id = semantic.id
LEFT JOIN keyword  ON d.id = keyword.id
WHERE semantic.id IS NOT NULL OR keyword.id IS NOT NULL
ORDER BY rrf_score DESC
LIMIT 4
"""
for content, score in conn.execute(hybrid_sql, {"qvec": qvec, "qtext": user_query}).fetchall():
    print(f"      {score:.5f}  {content}")
print("\n[10/10] Aggregating vectors with AVG (category centroid)...")
centroid = conn.execute(
    "SELECT AVG(embedding) FROM documents WHERE category = %s", ("food",)
).fetchone()[0]
typical = conn.execute(
    "SELECT content, embedding <=> %s AS d FROM documents "
    "WHERE category = %s ORDER BY d LIMIT 1",
    (np.asarray(centroid), "food"),
).fetchone()
print(f"      Centroid dim = {len(centroid)}")
print(f"      Most representative 'food' doc: {typical[0]}")
print("\nDone. You now have a working pgvector playground inside Colab.")
print("   Try editing `corpus`, the queries, or swap in your own embedding model.")
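
The centroid step can also be followed by hand. Below is a minimal sketch, assuming plain Python lists in place of database rows: it averages the vectors element-wise, as `AVG(embedding)` does, and then picks the vector with the smallest cosine distance to that average. Function names and the sample vectors are illustrative.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def most_representative(vectors: Sequence[Sequence[float]]) -> int:
    """Index of the vector closest to the centroid by cosine distance."""
    c = centroid(vectors)
    return min(range(len(vectors)), key=lambda i: cosine_distance(vectors[i], c))


# Three made-up unit vectors; the third lies between the other two.
group = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
print(most_representative(group))  # prints 2
```

The mean of unit vectors is usually shorter than unit length. That does not matter here because cosine distance ignores magnitude, but it would matter if the centroid were compared with L2 or inner product.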

We combine semantic vector search with PostgreSQL full-text search using Reciprocal Rank Fusion (RRF). We retrieve results from both the semantic and keyword rankings, merge their scores, and produce a stronger hybrid search output. Finally, we compute the average embedding for a category and use it as a centroid to find the most representative document in that group.
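
The fusion formula in the SQL is short enough to restate in Python. The sketch below assumes each ranking is a list of document ids in best-first order and uses the same constant 60 that appears in the query; the function name `rrf` and the sample ids are illustrative. Unlike `RANK()` in SQL, it uses list positions and does not give tied documents the same rank.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def rrf(rankings: Sequence[Sequence[Hashable]], k: int = 60) -> List[Tuple[Hashable, float]]:
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per document."""
    scores: Dict[Hashable, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


semantic_ids = [3, 1, 2]  # made-up semantic ranking, best first
keyword_ids = [1]         # made-up keyword ranking: only document 1 matched

fused = rrf([semantic_ids, keyword_ids])
print([doc_id for doc_id, _ in fused])  # prints [1, 3, 2]
```

Document 1 is only second in the semantic list, but it is the sole keyword match, so its two contributions (1/62 + 1/61) outweigh the single 1/61 of the semantic leader. A document missing from one list simply gets no contribution from it, which is what the `COALESCE(..., 0.0)` terms express in the SQL.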

In conclusion, we have a working pgvector-based retrieval system that runs entirely in Google Colab, without external services or API keys. We used PostgreSQL not just as a traditional relational database, but as a flexible vector search engine that supports dense vectors, half-precision vectors, binary-quantized retrieval, sparse vectors, full-text search, and aggregation. We also saw how metadata filtering, HNSW indexing, Reciprocal Rank Fusion, and centroid-based analysis make pgvector useful for real-world AI search pipelines.

The post A Coding Guide to Implement a pgvector-Powered Semantic, Hybrid, Sparse, and Quantized Vector Search System appeared first on MarkTechPost.
