Saturday, November 29, 2025

OceanBase Releases seekdb: An Open Source AI Native Hybrid Search Database for Multi-model RAG and AI Agents


AI applications rarely deal with one clean table. They mix user profiles, chat logs, JSON metadata, embeddings, and sometimes spatial data. Most teams answer this with a patchwork of an OLTP database, a vector store, and a search engine. OceanBase has released seekdb, an open source, AI focused database under the Apache 2.0 license. seekdb is described as an AI native search database that unifies relational data, vector data, text, JSON, and GIS in a single engine and exposes hybrid search and in database AI workflows.

What is seekdb?

seekdb is positioned as the lightweight, embedded version of the OceanBase engine, aimed at AI applications rather than general purpose distributed deployments. It runs as a single node database, supports both embedded mode and client/server mode, and stays compatible with MySQL drivers and SQL syntax.

In the capability matrix, seekdb is marked as:

  • Embedded database supported
  • Standalone database supported
  • Distributed database not supported

while the full OceanBase product covers the distributed case.

From a data model perspective, seekdb supports:

  • Relational data with standard SQL
  • Vector search
  • Full text search
  • JSON data
  • Spatial GIS data

all inside one storage and indexing layer.

Hybrid search as the core feature

The main feature OceanBase pushes is hybrid search: search that combines vector based semantic retrieval, full text keyword retrieval, and scalar filters in a single query and a single ranking step.

seekdb implements hybrid search through a system package named DBMS_HYBRID_SEARCH with two entry points:

  • DBMS_HYBRID_SEARCH.SEARCH which returns results as JSON, sorted by relevance
  • DBMS_HYBRID_SEARCH.GET_SQL which returns the concrete SQL string used for execution

The hybrid search path can run:

  • pure vector search
  • pure full text search
  • combined hybrid search

and can push relational filters and joins down into storage. It also supports query reranking strategies such as weighted scores and reciprocal rank fusion, and can plug in large language model based re-rankers.
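
The two reranking strategies named above can be sketched in plain Python. This is a minimal illustration of the general techniques, not seekdb's implementation: the function names, the smoothing constant k=60, the min-max normalisation, and the sample document ids are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


def weighted_score_fusion(vector_scores, text_scores, vector_weight=0.7):
    """Blend two {doc_id: score} dicts after min-max normalising each one."""

    def normalise(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        # If every score is identical, treat them all as maximally relevant.
        return {d: (s - lo) / span if span else 1.0 for d, s in scores.items()}

    v, t = normalise(vector_scores), normalise(text_scores)
    fused = {
        d: vector_weight * v.get(d, 0.0) + (1 - vector_weight) * t.get(d, 0.0)
        for d in set(v) | set(t)
    }
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical result lists from a vector search and a full text search.
vector_hits = ["doc3", "doc1", "doc7"]
keyword_hits = ["doc1", "doc9", "doc3"]
fused_ranking = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Reciprocal rank fusion only needs rank positions, so it works even when the vector and full text scores are on incomparable scales. Weighted fusion needs the scores themselves, which is why the sketch normalises them first.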

For retrieval augmented generation (RAG) and agent memory, this means you can write a single SQL query that does semantic matching on embeddings, exact matching on product codes or proper nouns, and relational filtering on user or tenant scopes.

Vector and full text engine details

At its core, seekdb exposes a modern vector and full text stack.

For vectors, seekdb:

  • supports dense vectors and sparse vectors
  • supports Manhattan, Euclidean, inner product, and cosine distance metrics
  • provides in memory index types such as HNSW, HNSW SQ, HNSW BQ
  • provides disk based index types including IVF and IVF PQ
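
For reference, the four metrics in the list above can be written out directly. This is a minimal pure Python sketch of the standard definitions. How seekdb normalises or signs these values internally (for example, whether inner product is negated so that smaller means closer) is not stated in the article, so the conventions below are assumptions.

```python
import math


def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """L2 distance: straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product: larger means more similar, so it is a similarity score."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for same direction, 1 for orthogonal vectors."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - inner_product(a, b) / norm
```

Cosine distance ignores vector length and compares direction only, which is why it is a common choice for text embeddings. Manhattan and Euclidean distance are sensitive to magnitude.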

Hybrid vector indexes let you store raw text, have seekdb call an embedding model automatically, and have the system maintain the corresponding vector index without a separate preprocessing pipeline.
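
The idea of an index that embeds text on write can be illustrated with a toy in-memory class. Everything here is hypothetical and only mimics the behaviour described above: `toy_embed` is a hashing stand-in for a real embedding model, and `AutoEmbeddingIndex` is not a seekdb or pyseekdb API.

```python
import hashlib
import math


def toy_embed(text, dim=8):
    """Stand-in for an embedding model: hashes tokens into a unit vector."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class AutoEmbeddingIndex:
    """Stores raw text and keeps the matching vector up to date on every write."""

    def __init__(self, embed=toy_embed):
        self._embed = embed
        self._texts = {}
        self._vectors = {}

    def upsert(self, doc_id, text):
        # The caller supplies text only; the vector is derived here.
        self._texts[doc_id] = text
        self._vectors[doc_id] = self._embed(text)

    def delete(self, doc_id):
        self._texts.pop(doc_id, None)
        self._vectors.pop(doc_id, None)

    def vector_for(self, doc_id):
        return self._vectors[doc_id]

    def search(self, query, top_k=3):
        # Brute force inner product search over unit vectors.
        q = self._embed(query)
        ranked = sorted(
            self._vectors.items(),
            key=lambda item: -sum(a * b for a, b in zip(q, item[1])),
        )
        return [doc_id for doc_id, _ in ranked[:top_k]]


index = AutoEmbeddingIndex()
index.upsert("a", "hybrid search database")
index.upsert("b", "weather forecast today")
best_match = index.search("hybrid search database", top_k=1)
```

The design point is that text and vector can never drift apart, because the only write path recomputes the embedding. A real system would swap the brute force scan for an index such as HNSW or IVF.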

For text, seekdb offers full text search with:

  • keyword, phrase, and Boolean queries
  • BM25 ranking for relevance
  • multiple tokenizer modes
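
To make the BM25 ranking mentioned above concrete, here is a compact sketch of the Okapi BM25 formula over whitespace-tokenized documents. The parameter defaults (k1=1.5, b=0.75), the smoothed IDF variant, and the naive tokenizer are assumptions for illustration. The article does not say which variant or tokenizers seekdb uses.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one BM25 relevance score per document for the given query."""
    if not documents:
        return []
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Smoothed inverse document frequency that never goes negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Longer-than-average documents are penalised through b.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "vector search database",
    "full text search engine",
    "spatial gis data",
]
search_scores = bm25_scores("search", docs)
```

Rare terms contribute more through the IDF factor, repeated terms saturate through k1, and b controls how strongly long documents are discounted.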

The key point is that full text and vector indexes are first class and are integrated into the same query planner as scalar indexes and GIS indexes, so hybrid search does not need external orchestration.

AI functions inside the database

seekdb includes built in AI function expressions that let you call models directly from SQL, without a separate application service mediating every call. The main functions are:

  • AI_EMBED to transform textual content into embeddings
  • AI_COMPLETE for text generation using a chat or completion model
  • AI_RERANK to rerank a list of candidates
  • AI_PROMPT to assemble prompt templates and dynamic values into a JSON object for AI_COMPLETE

Model metadata and endpoints are managed by the DBMS_AI_SERVICE package, which lets you register external providers, set URLs, and configure keys, all on the database side.

Multimodal data and workloads

seekdb is built to handle multiple data modalities in a single node. It has a multimodal data and indexing layer that covers vectors, text, JSON, and GIS, and a multi-model compute layer for hybrid workloads across vector, full text, and scalar conditions.

It also provides JSON indexes for metadata queries and GIS indexes for spatial conditions. This allows queries like:

  • find semantically similar documents
  • filter by JSON metadata like tenant, region, or category
  • constrain by spatial range or polygon

without leaving the same engine.
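
The three kinds of condition listed above can be combined in one pass, as this small in-memory Python sketch shows. It only illustrates the query shape: the row layout, the `search` function, and the sample rows are hypothetical, and a real engine would use JSON, GIS, and vector indexes rather than a linear scan.

```python
import json
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def in_bounding_box(point, box):
    """point is (lon, lat); box is (min_lon, min_lat, max_lon, max_lat)."""
    lon, lat = point
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def search(rows, query_vec, tenant, box, top_k=2):
    """Filter by JSON metadata and spatial range, then rank by similarity."""
    candidates = []
    for row in rows:
        meta = json.loads(row["meta"])
        if meta.get("tenant") != tenant:
            continue  # JSON metadata filter
        if not in_bounding_box(row["location"], box):
            continue  # spatial range filter
        similarity = cosine_similarity(row["embedding"], query_vec)
        candidates.append((similarity, row["id"]))
    candidates.sort(key=lambda c: -c[0])  # most similar first
    return [doc_id for _, doc_id in candidates[:top_k]]


# Hypothetical rows: JSON metadata, a (lon, lat) point, and a 2-d embedding.
rows = [
    {"id": 1, "meta": '{"tenant": "acme", "category": "faq"}',
     "location": (116.4, 39.9), "embedding": [0.9, 0.1]},
    {"id": 2, "meta": '{"tenant": "acme", "category": "faq"}',
     "location": (121.5, 31.2), "embedding": [1.0, 0.0]},  # outside the box
    {"id": 3, "meta": '{"tenant": "other", "category": "faq"}',
     "location": (116.3, 39.8), "embedding": [1.0, 0.0]},  # wrong tenant
    {"id": 4, "meta": '{"tenant": "acme", "category": "blog"}',
     "location": (116.5, 40.0), "embedding": [0.1, 0.9]},
]
box = (116.0, 39.5, 117.0, 40.5)
results = search(rows, [1.0, 0.0], tenant="acme", box=box)
```

Applying the cheap filters before the similarity computation mirrors the benefit of pushing filters down into storage: fewer candidates reach the more expensive ranking step.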

Because seekdb is derived from the OceanBase engine, it inherits ACID transactions, hybrid row and column storage, and vectorized execution, although high scale distributed deployments remain a job for the full OceanBase database.

Key Takeaways

  1. AI native hybrid search: seekdb unifies vector search, full text search and relational filtering in a single SQL and DBMS_HYBRID_SEARCH interface, so RAG and agent workloads can run multi signal retrieval in one query instead of stitching together multiple engines.
  2. Multimodal data in one engine: seekdb stores and indexes relational data, vectors, text, JSON and GIS in the same engine, which lets AI applications keep documents, embeddings and metadata consistent without maintaining separate databases.
  3. In database AI functions for RAG: With AI_EMBED, AI_COMPLETE, AI_RERANK and AI_PROMPT, seekdb can call embedding models, LLMs and rerankers directly from SQL, which simplifies RAG pipelines and moves more orchestration logic into the database layer.
  4. Single node, embedded friendly design: seekdb is a single node, MySQL compatible engine that supports embedded and standalone modes, while distributed, large scale deployments remain the role of full OceanBase, which makes seekdb suitable for local, edge and service embedded AI workloads.
  5. Open source and tool ecosystem: seekdb is open sourced under Apache 2.0 and integrates with a growing ecosystem of AI tools and frameworks, with Python support via pyseekdb and MCP based integration for code assistants and agents, so it can act as a unified data plane for AI applications.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
