LlamaIndex has published legal-kb, a public reference application on GitHub. It is described as a knowledge base for legal documents, powered by LlamaIndex Index v2 (the LlamaParse Platform). The project demonstrates a pattern the team calls a Retrieval Harness for agentic retrieval.
The approach differs from single-shot retrieval. Instead of one embedding search per query, an agent is given filesystem-style tools. It can then crawl a large, evolving knowledge base to solve a task. The tools mirror operations engineers already know: semantic and keyword search, regex grep, file search, and read.
What’s legal-kb?
legal-kb is a working TanStack Start web app, not a library. You sign in, create a project, add files, and chat with an agent. Each project is mirrored as a managed LlamaCloud Index v2. Uploaded files are parsed and indexed automatically in the background. The chat agent then queries that index live during each turn.
The Retrieval Harness, in plain terms
The harness provides a persistent data pipeline over your documents. It connects to a data source, indexes it, and keeps it updated. On top of that pipeline, it exposes a set of tools to the agent.
These tools are deliberately close to filesystem operations. An agent can list files, read a file, grep within a file, or run hybrid search. Because the tools are generic, you can plug the harness into your own agents.
The agent in src/lib/agent.ts is given four tools. Each maps to an Index v2 retrieval API. The table below lists them as implemented.
| Tool | Backing API | Key parameters | What it does |
|---|---|---|---|
| retrieve | beta.retrieval.retrieve | query, top_k, score_threshold, rerank_top_n, file_name, file_version | Runs hybrid semantic search with optional reranking; returns chunks plus citations |
| findFiles | beta.retrieval.find | file_name, file_name_contains | Searches files by exact name or substring; paginates automatically |
| readFile | beta.retrieval.read | file_id, offset, max_length | Reads raw file content, with offset and length windows |
| grepFile | beta.retrieval.grep | file_id, pattern, context_chars, limit | Matches a pattern in a single file; returns character positions |
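To make the grepFile row concrete, here is a minimal local sketch of what "match a pattern and return character positions" could look like. It runs over an in-memory string and does not call beta.retrieval.grep. The function name grepText, the GrepMatch shape, and the default values are assumptions for illustration, not the Index v2 response format.

```typescript
// Hypothetical local stand-in for a grep-style tool; not the Index v2 API.
interface GrepMatch {
  start: number   // character offset where the match begins
  end: number     // character offset just past the match
  context: string // the match plus up to contextChars on each side
}

function grepText(
  text: string,
  pattern: string,
  contextChars = 40,
  limit = 10,
): GrepMatch[] {
  const re = new RegExp(pattern, 'gi')
  const matches: GrepMatch[] = []
  let m: RegExpExecArray | null
  while (matches.length < limit && (m = re.exec(text)) !== null) {
    // Skip zero-length matches, which would otherwise never advance.
    if (m[0].length === 0) {
      re.lastIndex++
      continue
    }
    const start = m.index
    const end = start + m[0].length
    matches.push({
      start,
      end,
      context: text.slice(
        Math.max(0, start - contextChars),
        Math.min(text.length, end + contextChars),
      ),
    })
  }
  return matches
}
```

Returning offsets, not only the matched text, is what would let an agent follow a grep hit with a windowed read around the same position.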
The system prompt enforces an order. The agent must call findFiles first to establish the document inventory. It then narrows with retrieve, and confirms exact wording with readFile or grepFile before citing.
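The article describes this ordering as enforced by the prompt. As a complementary sketch, a test suite could check a recorded trace of tool calls against the same rules after the fact. The function checkToolOrder and the OrderCheck type below are hypothetical and are not part of legal-kb.

```typescript
// Hypothetical trace checker; the tool names come from the table above.
type ToolName = 'findFiles' | 'retrieve' | 'readFile' | 'grepFile'

interface OrderCheck {
  ok: boolean
  reason?: string
}

function checkToolOrder(calls: ToolName[]): OrderCheck {
  // Rule 1: the document inventory must be established first.
  if (calls.length === 0 || calls[0] !== 'findFiles') {
    return { ok: false, reason: 'findFiles must be the first tool call' }
  }
  // Rule 2: exact wording must be confirmed by a read or a grep before citing.
  const confirmed = calls.some((c) => c === 'readFile' || c === 'grepFile')
  if (!confirmed) {
    return { ok: false, reason: 'no readFile or grepFile call confirmed the wording' }
  }
  return { ok: true }
}
```

A check like this only validates the shape of a finished run; it does not replace the prompt-level instruction that steers the agent during the run.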
How it works under the hood
Uploads follow a clear pipeline in src/lib/files.ts. Bytes are pushed to the project's LlamaCloud source directory. A File and ProjectFile row are written to PostgreSQL via Prisma. An index sync is triggered but not awaited; the UI polls status until ready.
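The "triggered but not awaited, then polled" pattern can be sketched as follows. This is a generic illustration under stated assumptions: the status values, the callback signatures, and the names handleUpload and pollUntilReady are hypothetical and do not come from src/lib/files.ts.

```typescript
// Hypothetical status values; the real index may report different states.
type SyncStatus = 'pending' | 'syncing' | 'ready' | 'error'

// Server side: persist the upload, then start the sync without waiting for it.
async function handleUpload(
  saveFile: () => Promise<void>,
  triggerSync: () => Promise<void>,
): Promise<void> {
  await saveFile()
  // Fire and forget: failures are logged instead of failing the upload.
  void triggerSync().catch((err) => console.error('index sync failed', err))
}

// Client side: poll a status callback until it settles or attempts run out.
async function pollUntilReady(
  getStatus: () => Promise<SyncStatus>,
  intervalMs = 1000,
  maxAttempts = 30,
): Promise<SyncStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus()
    if (status === 'ready' || status === 'error') return status
    await new Promise((resolve) => setTimeout(resolve, intervalMs))
  }
  throw new Error(`index not ready after ${maxAttempts} attempts`)
}
```

Not awaiting the sync keeps the upload response fast; the cost is that the client needs a status endpoint and a bounded polling loop like the one above.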
Versioning is scoped to the (project, filename) pair. Re-uploading nda.pdf to the same project produces v1, v2, v3 side by side. The retrieval layer filters on the version metadata field. This gives version control over the knowledge base itself.
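A small in-memory sketch shows what scoping versions to the (project, filename) pair implies. The StoredFile shape and the helpers nextVersion and selectVersion are hypothetical; the actual app stores these rows in PostgreSQL via Prisma and filters on index metadata.

```typescript
// Hypothetical record shape for illustration only.
interface StoredFile {
  projectId: string
  fileName: string
  version: number
}

// The next version number depends only on rows with the same project and name.
function nextVersion(files: StoredFile[], projectId: string, fileName: string): number {
  const versions = files
    .filter((f) => f.projectId === projectId && f.fileName === fileName)
    .map((f) => f.version)
  return versions.length === 0 ? 1 : Math.max(...versions) + 1
}

// Pick an explicit version, or fall back to the latest one for that pair.
function selectVersion(
  files: StoredFile[],
  projectId: string,
  fileName: string,
  version?: number,
): StoredFile | undefined {
  const candidates = files.filter(
    (f) => f.projectId === projectId && f.fileName === fileName,
  )
  if (version != null) return candidates.find((f) => f.version === version)
  return candidates.reduce<StoredFile | undefined>(
    (latest, f) => (!latest || f.version > latest.version ? f : latest),
    undefined,
  )
}
```

Because the counter is per pair, the same filename in two projects versions independently, and older versions stay queryable alongside newer ones.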
The agent uses the ToolLoopAgent from Vercel AI SDK 6. You pick OpenAI or Anthropic per turn and bring your own keys. Reasoning is streamed: Claude models use extended thinking; OpenAI reasoning models use a medium reasoning effort.
Here is a condensed but faithful view of the retrieve tool and the agent.
```typescript
import { LlamaCloud } from '@llamaindex/llama-cloud'
import { tool, ToolLoopAgent } from 'ai'
import { z } from 'zod'
import { makeCitationId } from './citations'

// One tool closure per index. Wraps Index v2 retrieval APIs.
function createLlamaParseTools(apiKey: string, projectId: string, indexId: string) {
  const client = new LlamaCloud({ apiKey })

  const retrieve = tool({
    description: 'Run a semantic retrieval query against an index.',
    inputSchema: z.object({
      query: z.string(),
      top_k: z.number().nullable(),
      score_threshold: z.number().nullable(),
      rerank_top_n: z.number().nullable(), // set to enable reranking
      file_name: z.string().nullable(), // metadata filter
      file_version: z.number().nullable(),
    }),
    execute: async ({ query, top_k, score_threshold, rerank_top_n, file_name }) => {
      const custom_filters = file_name
        ? { file_name: { operator: 'eq' as const, value: file_name } }
        : undefined

      const response = await client.beta.retrieval.retrieve({
        index_id: indexId,
        project_id: projectId,
        query,
        top_k,
        score_threshold,
        rerank: rerank_top_n != null ? { enabled: true, top_n: rerank_top_n } : undefined,
        custom_filters,
      })

      // Return a model-readable list plus citations that drive the UI chips.
      const citations = response.results.map((r) => ({
        id: makeCitationId(), // e.g. "c7f2qa"
        fileName: r.metadata?.file_name,
        score: r.rerank_score ?? r.score ?? null,
        preview: r.content.slice(0, 500),
      }))

      const formatted = response.results
        .map((r, i) => `### Result #${i + 1}\n\n${r.content.slice(0, 600)}`)
        .join('\n\n---\n\n')

      return { formatted, citations }
    },
  })

  // findFiles / readFile / grepFile follow the same shape, backed by
  // client.beta.retrieval.find / .read / .grep
  return { retrieve /* , findFiles, readFile, grepFile */ }
}

export function buildAgent(model, apiKey: string, projectId: string, indexId: string) {
  return new ToolLoopAgent({
    model,
    tools: createLlamaParseTools(apiKey, projectId, indexId),
    instructions:
      'Always call findFiles first, ground every answer in the documents, ' +
      'and cite ids inline as `cite:`.',
  })
}
```
Answers carry visual citations. Each retrieved chunk gets a short id, such as cite:c7f2qa. The agent references that id inline, and the UI renders a clickable citation chip. Clicking it opens the source page screenshot with bounding-box rectangles over the cited text.
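One way to connect inline ids back to retrieved chunks is to scan the answer text for cite: markers and look each id up in the citations returned by the tools. The sketch below assumes ids are lowercase alphanumerics, based only on the c7f2qa example; the Citation shape and the resolveCitations helper are hypothetical and may differ from what the legal-kb UI does.

```typescript
// Hypothetical citation record, loosely modeled on the fields shown earlier.
interface Citation {
  id: string
  fileName: string
  preview: string
}

interface ResolvedCitations {
  cited: Citation[]    // known citations, in order of first appearance
  unknownIds: string[] // ids the model wrote that match no retrieved chunk
}

function resolveCitations(answer: string, known: Citation[]): ResolvedCitations {
  const byId = new Map<string, Citation>()
  known.forEach((c) => byId.set(c.id, c))

  const cited: Citation[] = []
  const unknownIds: string[] = []
  const seen = new Set<string>()

  // Assumed id format: one or more lowercase letters or digits after "cite:".
  const re = /cite:([a-z0-9]+)/g
  let m: RegExpExecArray | null
  while ((m = re.exec(answer)) !== null) {
    const id = m[1]
    if (seen.has(id)) continue // render each chip once per answer
    seen.add(id)
    const hit = byId.get(id)
    if (hit) cited.push(hit)
    else unknownIds.push(id)
  }
  return { cited, unknownIds }
}
```

Tracking unknown ids separately gives the UI a way to avoid rendering a chip for a citation the model invented.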
Naive RAG vs the agentic Retrieval Harness
The harness is a different execution model from single-shot RAG. The comparison below focuses on behavior.
| Dimension | Naive / single-shot RAG | Agentic Retrieval Harness (Index v2) |
|---|---|---|
| Retrieval flow | One vector search per query | Multi-step tool loop: find → retrieve → read/grep |
| Search modes | Vector similarity only | Hybrid semantic search, keyword, and regex grep |
| Context | Fixed top-k chunks | Agent reads full files or windows on demand |
| Freshness | Static index | Persistent pipeline with sync and versioning |
| Precision control | Mostly hidden | top_k, score_threshold, rerank_top_n exposed |
| Citations | Chunk ids | Visual citations with page screenshots and bboxes |
| Best fit | Short question answering | Long-horizon document tasks |
Use cases, with examples
The design targets domains where agents navigate large document sets. Legal and fintech are the stated examples.
- Consider a contract question: 'What notice is required to terminate the MSA?' The agent lists files, runs retrieve, then greps the exact clause. It answers with a citation to the exact page.
- Consider due diligence across a data room: An agent can findFiles by name, then readFile each candidate. It cross-checks clauses without a human opening every PDF.
- Consider a versioned policy base: Because retrieve accepts a file_version filter, an agent can query a specific version. This supports change tracking over time.
