LlamaIndex ‘legal-kb’: Agentic Retrieval over Index v2 with retrieve, find, read, and grep Tools

July 5, 2026


LlamaIndex has released legal-kb, a public reference application on GitHub. It is described as a knowledge base for legal documents, powered by LlamaIndex Index v2 (the LlamaParse Platform). The project demonstrates a pattern the team calls a Retrieval Harness for agentic retrieval.

The approach differs from single-shot retrieval. Instead of one embedding search per query, an agent is given filesystem-style tools. It can then crawl a large, evolving knowledge base to solve a task. The tools mirror operations engineers already know: semantic and keyword search, regex grep, file search, and read.

What is legal-kb?

legal-kb is a working TanStack Start web app, not a library. You sign in, create a project, add files, and chat with an agent. Each project is mirrored as a managed LlamaCloud Index v2. Uploaded files are parsed and indexed automatically in the background. The chat agent then queries that index live during each turn.

The Retrieval Harness, in plain terms

The harness provides a persistent data pipeline over your documents. It connects to a data source, indexes it, and keeps it updated. On top of that pipeline, it exposes a set of tools to the agent.

These tools are deliberately close to filesystem operations. An agent can list files, read a file, grep within a file, or run hybrid search. Because the tools are generic, you can plug the harness into your own agents.

The agent in src/lib/agent.ts is given four tools. Each maps to an Index v2 retrieval API. The table below lists them as implemented.

Tool	Backing API	Key parameters	What it does
`retrieve`	`beta.retrieval.retrieve`	`query`, `top_k`, `score_threshold`, `rerank_top_n`, `file_name`, `file_version`	Runs hybrid semantic search with optional reranking; returns chunks plus citations
`findFiles`	`beta.retrieval.find`	`file_name`, `file_name_contains`	Searches files by exact name or substring; paginates automatically
`readFile`	`beta.retrieval.read`	`file_id`, `offset`, `max_length`	Reads raw file content, with offset and length windows
`grepFile`	`beta.retrieval.grep`	`file_id`, `pattern`, `context_chars`, `limit`	Matches a pattern in a single file; returns character positions
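To make the readFile and grepFile rows concrete, here is a minimal local sketch of offset/length windows and pattern matches with surrounding context, run over a plain string. It does not call LlamaCloud. The names `readWindow`, `grepText`, and `GrepMatch`, the default values, and the case-insensitive matching are assumptions for illustration only.

```typescript
// Hypothetical local stand-ins for readFile and grepFile; not the LlamaCloud API.
interface GrepMatch {
  start: number   // character offset where the match begins
  end: number     // character offset just past the match
  context: string // the match plus up to contextChars on each side
}

// Return a window of text, mirroring the offset / max_length parameters.
function readWindow(text: string, offset = 0, maxLength = 1000): string {
  return text.slice(offset, offset + maxLength)
}

// Find regex matches and report character positions with surrounding context.
function grepText(text: string, pattern: string, contextChars = 20, limit = 10): GrepMatch[] {
  const re = new RegExp(pattern, 'gi')
  const matches: GrepMatch[] = []
  let m: RegExpExecArray | null
  while (matches.length < limit && (m = re.exec(text)) !== null) {
    if (m[0].length === 0) {
      re.lastIndex++ // avoid looping forever on empty matches
      continue
    }
    const start = m.index
    const end = start + m[0].length
    matches.push({
      start,
      end,
      context: text.slice(Math.max(0, start - contextChars), Math.min(text.length, end + contextChars)),
    })
  }
  return matches
}

const clause = 'Either party may terminate this Agreement upon thirty (30) days written notice.'
const hits = grepText(clause, 'thirty \\(30\\) days', 10)
const windowText = readWindow(clause, hits[0].start, 16)
```

In this sketch, a grep hit supplies the character position, and a follow-up window read at that position returns the exact wording to quote.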

The system prompt enforces an order. The agent must call findFiles first to establish the document inventory. It then narrows with retrieve, and confirms exact wording with readFile or grepFile before citing.
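The article says this ordering lives in the system prompt. If you wanted to verify it after the fact, one option is to check a recorded trace of tool calls. The sketch below is a hypothetical checker, not part of legal-kb; the `checkToolOrder` name and the two rules it encodes are assumptions drawn from the paragraph above.

```typescript
type ToolName = 'findFiles' | 'retrieve' | 'readFile' | 'grepFile'

// Return a list of problems found in a recorded sequence of tool calls.
// An empty list means the trace follows the order described in the prompt.
function checkToolOrder(calls: ToolName[]): string[] {
  const problems: string[] = []
  if (calls[0] !== 'findFiles') {
    problems.push('findFiles must be the first call')
  }
  const confirmed = calls.some((c) => c === 'readFile' || c === 'grepFile')
  if (!confirmed) {
    problems.push('confirm wording with readFile or grepFile before citing')
  }
  return problems
}

const goodTrace: ToolName[] = ['findFiles', 'retrieve', 'grepFile']
const goodProblems = checkToolOrder(goodTrace)
```

A check like this could run in an evaluation suite to flag turns where the model skipped the inventory step or cited without confirming the text.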

How it works under the hood

Uploads follow a clear pipeline in src/lib/files.ts. Bytes are pushed to the project’s LlamaCloud source directory. A File and ProjectFile row are written to PostgreSQL via Prisma. An index sync is triggered but not awaited; the UI polls status until ready.
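The "trigger, then poll" step can be sketched generically. The version below assumes a caller-supplied `getStatus` function and a small set of status strings. Both are hypothetical, since the article does not show the actual status values or the endpoint the UI polls.

```typescript
// Hypothetical status values; the real ones are not given in the article.
type SyncStatus = 'pending' | 'syncing' | 'ready' | 'failed'

// Poll a status source until it reports a terminal state or attempts run out.
async function pollUntilReady(
  getStatus: () => Promise<SyncStatus>,
  intervalMs = 1000,
  maxAttempts = 30,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<SyncStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus()
    if (status === 'ready' || status === 'failed') return status
    await sleep(intervalMs)
  }
  throw new Error('index sync did not finish in time')
}
```

The upload handler can return as soon as the sync is triggered, while the client calls something like `pollUntilReady` to decide when to enable chat for the new file.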

Versioning is scoped to the (project, filename) pair. Re-uploading nda.pdf to the same project produces v1, v2, v3 side by side. The retrieval layer filters on the version metadata field. This gives version control over the knowledge base itself.
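As a rough illustration of versioning scoped to the (project, filename) pair, here is an in-memory sketch. legal-kb stores this in PostgreSQL via Prisma; the `VersionRegistry` class, the `IndexedChunk` shape, and the `filterByVersion` helper are hypothetical names used only to show the idea.

```typescript
// Hypothetical in-memory version registry keyed by (projectId, fileName).
class VersionRegistry {
  private latest = new Map<string, number>()

  // Record an upload and return the version number it receives.
  recordUpload(projectId: string, fileName: string): number {
    const key = JSON.stringify([projectId, fileName])
    const next = (this.latest.get(key) ?? 0) + 1
    this.latest.set(key, next)
    return next
  }
}

interface IndexedChunk {
  fileName: string
  version: number
  content: string
}

// Keep only chunks from one file at one version, as a metadata filter would.
function filterByVersion(chunks: IndexedChunk[], fileName: string, version: number): IndexedChunk[] {
  return chunks.filter((c) => c.fileName === fileName && c.version === version)
}

const registry = new VersionRegistry()
const v1 = registry.recordUpload('project-a', 'nda.pdf')
const v2 = registry.recordUpload('project-a', 'nda.pdf')
const otherProject = registry.recordUpload('project-b', 'nda.pdf')
```

Because the counter is keyed on both values, the same filename in a different project starts again at version 1.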

The agent uses the ToolLoopAgent from Vercel AI SDK 6. You choose OpenAI or Anthropic per turn and bring your own keys. Reasoning is streamed: Claude models use extended thinking; OpenAI reasoning models use a medium reasoning effort.

Here is a condensed but faithful view of the retrieve tool and the agent.

import { LlamaCloud } from '@llamaindex/llama-cloud'
import { tool, ToolLoopAgent } from 'ai'
import { z } from 'zod'
import { makeCitationId } from './citations'

// One tool closure per index. Wraps Index v2 retrieval APIs.
function createLlamaParseTools(apiKey: string, projectId: string, indexId: string) {
  const client = new LlamaCloud({ apiKey })

  const retrieve = tool({
    description: 'Run a semantic retrieval query against an index.',
    inputSchema: z.object({
      query: z.string(),
      top_k: z.number().nullable(),
      score_threshold: z.number().nullable(),
      rerank_top_n: z.number().nullable(),   // set to enable reranking
      file_name: z.string().nullable(),      // metadata filter
      file_version: z.number().nullable(),   // declared in the schema; not applied in this condensed view
    }),
    execute: async ({ query, top_k, score_threshold, rerank_top_n, file_name }) => {
      const custom_filters = file_name
        ? { file_name: { operator: 'eq' as const, value: file_name } }
        : undefined

      const response = await client.beta.retrieval.retrieve({
        index_id: indexId,
        project_id: projectId,
        query,
        top_k,
        score_threshold,
        rerank: rerank_top_n != null ? { enabled: true, top_n: rerank_top_n } : undefined,
        custom_filters,
      })

      // Return a model-readable list plus citations that drive the UI chips.
      const citations = response.results.map((r) => ({
        id: makeCitationId(),                    // e.g. "c7f2qa"
        fileName: r.metadata?.file_name,
        score: r.rerank_score ?? r.score ?? null,
        preview: r.content.slice(0, 500),
      }))
      const formatted = response.results
        .map((r, i) => `### Result #${i + 1}\n\n${r.content.slice(0, 600)}`)
        .join('\n\n---\n\n')
      return { formatted, citations }
    },
  })

  // findFiles / readFile / grepFile follow the same shape, backed by
  // client.beta.retrieval.find / .read / .grep
  return { retrieve /* , findFiles, readFile, grepFile */ }
}

export function buildAgent(model, apiKey: string, projectId: string, indexId: string) {
  return new ToolLoopAgent({
    model,
    tools: createLlamaParseTools(apiKey, projectId, indexId),
    instructions:
      'Always call findFiles first, ground every answer in the documents, ' +
      'and cite ids inline as `cite:`.',
  })
}

Answers carry visual citations. Each retrieved chunk gets a short id, such as cite:c7f2qa. The agent references that id inline, and the UI renders a clickable citation chip. Clicking it opens the source page screenshot with bounding-box rectangles over the cited text.
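One way a UI could turn inline ids into chips is to split the answer text on `cite:` tokens and look each id up in the citations returned by the tool. The sketch below is an assumption about how that might work, not the legal-kb implementation. `makeShortId`, `splitCitations`, the `Segment` type, and the lowercase alphanumeric id format are all hypothetical.

```typescript
interface Citation {
  id: string
  fileName: string
  preview: string
}

// Hypothetical id generator: lowercase alphanumeric, six characters by default.
function makeShortId(length = 6): string {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789'
  let id = ''
  for (let i = 0; i < length; i++) {
    id += alphabet[Math.floor(Math.random() * alphabet.length)]
  }
  return id
}

type Segment =
  | { kind: 'text'; text: string }
  | { kind: 'cite'; citation: Citation }

// Split an answer into plain-text and citation segments for rendering.
function splitCitations(answer: string, citations: Citation[]): Segment[] {
  const byId = new Map<string, Citation>(citations.map((c): [string, Citation] => [c.id, c]))
  const segments: Segment[] = []
  const re = /cite:([a-z0-9]+)/g
  let last = 0
  let m: RegExpExecArray | null
  while ((m = re.exec(answer)) !== null) {
    const citation = byId.get(m[1])
    if (!citation) continue // unknown id: leave the token as plain text
    if (m.index > last) segments.push({ kind: 'text', text: answer.slice(last, m.index) })
    segments.push({ kind: 'cite', citation })
    last = m.index + m[0].length
  }
  if (last < answer.length) segments.push({ kind: 'text', text: answer.slice(last) })
  return segments
}
```

Ids the model invents never resolve to a chip in this sketch, because only ids present in the tool's citation list are rendered.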

Naive RAG vs the agentic Retrieval Harness

The harness is a different execution model from single-shot RAG. The comparison below focuses on behavior.

Dimension	Naive / single-shot RAG	Agentic Retrieval Harness (Index v2)
Retrieval flow	One vector search per query	Multi-step tool loop: find → retrieve → read/grep
Search modes	Vector similarity only	Hybrid semantic search, keyword, and regex grep
Context	Fixed top-k chunks	Agent reads full files or windows on demand
Freshness	Static index	Persistent pipeline with sync and versioning
Precision control	Mostly hidden	`top_k`, `score_threshold`, `rerank_top_n` exposed
Citations	Chunk ids	Visual citations with page screenshots and bboxes
Best fit	Short question answering	Long-horizon document tasks
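To give a feel for how the three precision controls in the table interact, here is a local sketch over already-scored chunks. The order of operations (threshold, then top-k, then rerank cut) is an assumption made for illustration. The article does not describe how Index v2 applies them internally, and `selectChunks` and `ScoredChunk` are hypothetical names.

```typescript
interface ScoredChunk {
  id: string
  score: number        // first-stage retrieval score
  rerankScore?: number // second-stage score, when a reranker has run
}

interface SelectOptions {
  topK: number
  scoreThreshold?: number
  rerankTopN?: number
}

// Apply threshold, then top-k by score, then an optional rerank cut.
function selectChunks(chunks: ScoredChunk[], opts: SelectOptions): ScoredChunk[] {
  const { topK, scoreThreshold, rerankTopN } = opts
  let kept = chunks.filter((c) => scoreThreshold === undefined || c.score >= scoreThreshold)
  kept = [...kept].sort((a, b) => b.score - a.score).slice(0, topK)
  if (rerankTopN !== undefined) {
    kept = [...kept]
      .sort((a, b) => (b.rerankScore ?? b.score) - (a.rerankScore ?? a.score))
      .slice(0, rerankTopN)
  }
  return kept
}

const candidates: ScoredChunk[] = [
  { id: 'a', score: 0.9, rerankScore: 0.2 },
  { id: 'b', score: 0.8, rerankScore: 0.95 },
  { id: 'c', score: 0.4 },
  { id: 'd', score: 0.7, rerankScore: 0.5 },
]
const selected = selectChunks(candidates, { topK: 3, scoreThreshold: 0.5, rerankTopN: 2 })
```

With these sample values, chunk `c` falls below the threshold, and the rerank step promotes `b` over `a` before cutting to two results.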

Use cases, with examples

The design targets domains where agents navigate large document sets. Legal and fintech are the stated examples.

Consider a contract question: ‘What notice is required to terminate the MSA?’ The agent lists files, runs retrieve, then greps the exact clause. It answers with a citation to the exact page.
Consider due diligence across a data room: An agent can findFiles by name, then readFile each candidate. It cross-checks clauses without a human opening every PDF.
Consider a versioned policy base: Because retrieve accepts a file_version filter, an agent can query a specific version. This supports change tracking over time.

