Discovering “Silver Bullet” Agentic AI Flows with syftr

October 26, 2025


TL;DR

The fastest way to stall an agentic AI project is to reuse a workflow that no longer fits. Using syftr, we identified “silver bullet” flows for both low-latency and high-accuracy priorities that consistently perform well across multiple datasets. These flows outperform random seeding and transfer learning early in optimization. They recover about 75% of the performance of a full syftr run at a fraction of the cost, which makes them a fast starting point but still leaves room to improve.

If you have ever tried to reuse an agentic workflow from one project in another, you know how often it falls flat. The model’s context length might not be enough. The new use case might require deeper reasoning. Or latency requirements might have changed.

Even when the old setup works, it may be overbuilt, and overpriced, for the new problem. In those cases, a simpler, faster setup might be all you need.

We set out to answer a simple question: Are there agentic flows that perform well across many use cases, so you can choose one based on your priorities and move forward?

Our research suggests the answer is yes, and we call them “silver bullets.”

We identified silver bullets for both low-latency and high-accuracy goals. In early optimization, they consistently beat transfer learning and random seeding, while avoiding the cost of a full syftr run.

In the sections that follow, we explain how we found them and how they stack up against other seeding strategies.

A quick primer on Pareto-frontiers

You don’t need a math degree to follow along, but understanding the Pareto-frontier will make the rest of this post much easier to grasp.

Figure 1 is an illustrative scatter plot (not from our experiments) showing completed syftr optimization trials. Sub-plot A and Sub-plot B are identical, but B highlights the first three Pareto-frontiers: P1 (red), P2 (green), and P3 (blue).

Each trial: A specific flow configuration is evaluated on accuracy and average latency (higher accuracy and lower latency are better).
Pareto-frontier (P1): No other flow beats them on both accuracy and latency. These flows are non-dominated.
Non-Pareto flows: At least one Pareto flow beats them on both metrics. These are dominated.
P2, P3: If you remove P1, P2 becomes the next-best frontier, then P3, and so on.

You can choose between Pareto flows depending on your priorities (e.g., favoring low latency over maximum accuracy), but there’s no reason to choose a dominated flow: there’s always a better option on the frontier.
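To make these definitions concrete, here is a minimal Python sketch that peels off successive frontiers from a list of (accuracy, latency) trials. It is our own illustration of the definitions above, not code from syftr; the function names are hypothetical.

```python
from typing import List, Tuple

# Each trial is (accuracy, latency_seconds): higher accuracy and lower latency are better.
Trial = Tuple[float, float]


def dominates(a: Trial, b: Trial) -> bool:
    """True if a is no worse than b on both metrics and strictly better on at least one."""
    acc_a, lat_a = a
    acc_b, lat_b = b
    no_worse = acc_a >= acc_b and lat_a <= lat_b
    strictly_better = acc_a > acc_b or lat_a < lat_b
    return no_worse and strictly_better


def pareto_frontiers(trials: List[Trial], depth: int = 3) -> List[List[Trial]]:
    """Return up to `depth` frontiers: P1, then P2 once P1 is removed, and so on."""
    remaining = list(trials)
    frontiers: List[List[Trial]] = []
    while remaining and len(frontiers) < depth:
        # A trial is on the current frontier if nothing left over dominates it.
        front = [t for t in remaining if not any(dominates(o, t) for o in remaining)]
        frontiers.append(sorted(front))
        remaining = [t for t in remaining if t not in front]
    return frontiers
```

For example, `pareto_frontiers([(0.9, 8.0), (0.7, 2.0), (0.8, 5.0), (0.6, 3.0)])` returns `[[(0.7, 2.0), (0.8, 5.0), (0.9, 8.0)], [(0.6, 3.0)]]`: the point (0.6, 3.0) is dominated only by (0.7, 2.0), so it lands on P2.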

Optimizing agentic AI flows with syftr

Throughout our experiments, we used syftr to optimize agentic flows for accuracy and latency.

This approach allows you to:

Select datasets containing question–answer (QA) pairs
Define a search space for flow parameters
Set objectives such as accuracy and cost, or in this case, accuracy and latency

In short, syftr automates the exploration of flow configurations against your chosen objectives.
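As a rough picture of what a search space and the random-sampling baseline look like, the sketch below uses a hypothetical three-dimensional space built from some of the components listed later in this post. The dictionary layout and the snake_case flow-type names are our assumptions, not syftr's configuration format, and syftr's real space has far more parameters.

```python
import random
from typing import Dict, List

# Hypothetical search space: option values come from this post's component lists,
# but the layout and flow-type identifiers are illustrative, NOT syftr's config format.
SEARCH_SPACE: Dict[str, List[str]] = {
    "llm": [
        "microsoft/Phi-4-multimodal-instruct",
        "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
        "Qwen/Qwen3-32B",
        "google/gemma-3-27b-it",
    ],
    "embedding_model": [
        "BAAI/bge-small-en-v1.5",
        "BAAI/bge-large-en-v1.5",
        "thenlper/gte-large",
        "sentence-transformers/all-MiniLM-L12-v2",
    ],
    "flow_type": ["vanilla_rag", "react_rag_agent", "critique_rag_agent", "subquestion_rag"],
}


def sample_flows(space: Dict[str, List[str]], n: int, seed: int = 0) -> List[Dict[str, str]]:
    """Draw n flow configurations uniformly at random (a random-seeding baseline)."""
    rng = random.Random(seed)  # local RNG so the same seed reproduces the same flows
    return [{param: rng.choice(options) for param, options in space.items()} for _ in range(n)]
```

Calling `sample_flows(SEARCH_SPACE, n=23)` would produce a random seed set the same size as the silver bullet set; an optimizer would then take over from those starting points.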

Figure 2 shows the high-level syftr architecture.

Figure 2: High-level syftr architecture. For a set of QA pairs, syftr can automatically explore agentic flows using multi-objective Bayesian optimization by comparing flow responses with actual answers.

Given the almost unlimited number of possible agentic flow parametrizations, syftr relies on two key techniques:

Multi-objective Bayesian optimization to navigate the search space efficiently.
ParetoPruner to stop evaluating likely suboptimal flows early, saving time and compute while still surfacing the most effective configurations.
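The post doesn't describe how ParetoPruner decides when to stop, so the following is only a hypothetical rule in the same spirit, not syftr's implementation. After a flow has answered some of its QA pairs, it assumes every remaining answer will be correct and checks whether even that best case is already dominated by the current frontier. Treating the latency observed so far as the final mean latency is also an assumption.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (accuracy, mean latency in seconds)


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse on both metrics and strictly better on at least one."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def should_prune(n_correct: int, n_seen: int, n_total: int,
                 mean_latency: float, frontier: List[Point]) -> bool:
    """Hypothetical early-stopping check for a partially evaluated flow.

    Best case: every QA pair not yet evaluated is answered correctly.
    Assumption: the latency observed so far is the flow's final mean latency.
    If even this optimistic point is dominated by the current frontier,
    finishing the evaluation cannot put the flow on it (under these assumptions).
    """
    best_case_accuracy = (n_correct + (n_total - n_seen)) / n_total
    candidate = (best_case_accuracy, mean_latency)
    return any(dominates(point, candidate) for point in frontier)
```

With a frontier of (0.7, 2.0), (0.8, 5.0), and (0.9, 8.0), a flow that has 10 of its first 30 answers right (out of 50) at 6 seconds can reach at most 0.6 accuracy, so (0.8, 5.0) dominates it and it would be stopped.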

Silver bullet experiments

Our experiments followed a four-part process (Figure 3).

Figure 3: The workflow starts with a two-step data generation phase:
A: Run syftr using simple random sampling for seeding.
B: Run all finished flows on all other experiments. The resulting data then feeds into the next step.
C: Identify silver bullets and conduct transfer learning.
D: Run syftr on four held-out datasets three times, using three different seeding strategies.

Step 1: Optimize flows per dataset

We ran several hundred trials on each of the following datasets:

CRAG Task 3 Music
FinanceBench
HotpotQA
MultihopRAG

For each dataset, syftr searched for Pareto-optimal flows, optimizing for accuracy and latency (Figure 4).

Figure 4: Optimization results for four datasets. Each dot represents a parameter combination evaluated on 50 QA pairs. Red lines mark Pareto-frontiers with the best accuracy–latency tradeoffs found by the TPE estimator.

Step 3: Identify silver bullets

Once we had identical flows across all training datasets, we could pinpoint the silver bullets: the flows that are Pareto-optimal on average across all datasets.

Figure 5: Silver bullet generation process, detailing the “Identify Silver Bullets” step from Figure 3.

Process:

Normalize results per dataset. For each dataset, we normalize accuracy and latency scores by the highest values in that dataset.
Group identical flows. We then group matching flows across datasets and calculate their average accuracy and latency.
Identify the Pareto-frontier. Using this averaged dataset (see Figure 6), we select the flows that make up the Pareto-frontier.

These 23 flows are our silver bullets: the ones that perform well across all training datasets.

Figure 6: Normalized and averaged scores across datasets. The 23 flows on the Pareto-frontier perform well across all training datasets.
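The three steps can be sketched in plain Python. This is a minimal illustration, not syftr's code: results are assumed to be a mapping from dataset name to `{flow_id: (accuracy, latency)}`, the flow IDs are hypothetical, and only flows evaluated on every dataset are grouped and averaged.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Assumed layout: dataset name -> {flow_id: (accuracy, latency_seconds)}
Results = Dict[str, Dict[str, Tuple[float, float]]]


def normalize(flows: Dict[str, Tuple[float, float]]) -> Dict[str, Tuple[float, float]]:
    """Step 1: divide accuracy and latency by the highest value of each in one dataset."""
    max_acc = max(acc for acc, _ in flows.values()) or 1.0  # guard against all-zero accuracy
    max_lat = max(lat for _, lat in flows.values())
    return {f: (acc / max_acc, lat / max_lat) for f, (acc, lat) in flows.items()}


def silver_bullets(results: Results) -> List[str]:
    normalized = {ds: normalize(flows) for ds, flows in results.items()}
    # Step 2: group identical flows; only flows present in every dataset can be averaged.
    shared = set.intersection(*(set(flows) for flows in normalized.values()))
    averaged = {
        f: (mean(n[f][0] for n in normalized.values()),
            mean(n[f][1] for n in normalized.values()))
        for f in shared
    }

    def is_dominated(f: str) -> bool:
        acc, lat = averaged[f]
        return any(
            other_acc >= acc and other_lat <= lat and (other_acc > acc or other_lat < lat)
            for g, (other_acc, other_lat) in averaged.items() if g != f
        )

    # Step 3: the Pareto-frontier of the averaged scores (higher accuracy, lower latency).
    return sorted(f for f in averaged if not is_dominated(f))
```

Averaging after normalization keeps a dataset with long absolute latencies from dominating the comparison; each dataset contributes scores on the same 0-to-1 scale.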

Step 4: Seed with transfer learning

In our original syftr paper, we explored transfer learning as a way to seed optimizations. Here, we compared it directly against silver bullet seeding.

In this context, transfer learning simply means selecting specific high-performing flows from historical (training) studies and evaluating them on held-out datasets. The data we use here is the same as for silver bullets (Figure 3).

Process:

Select candidates. From each training dataset, we took the top-performing flows from the top two Pareto-frontiers (P1 and P2).
Embed and cluster. Using the embedding model BAAI/bge-large-en-v1.5, we converted each flow’s parameters into numerical vectors. We then applied K-means clustering (K = 23) to group similar flows (Figure 7).
Match experiment constraints. We limited each seeding strategy (silver bullets, transfer learning, random sampling) to 23 flows for a fair comparison, since that’s how many silver bullets we identified.

Note: Transfer learning for seeding isn’t yet fully optimized. We could use more Pareto-frontiers, select more flows, or try different embedding models.

Figure 7: Clustered trials from Pareto-frontiers P1 and P2 across the training datasets.
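The embed-and-cluster step might look roughly like the sketch below. To stay self-contained, it swaps the BAAI/bge-large-en-v1.5 embeddings for precomputed toy vectors and uses plain Lloyd's-algorithm K-means with random initialization. Picking the member closest to each centroid as that cluster's seed flow is our assumption; the post doesn't say how one flow per cluster was chosen.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def kmeans(vectors: List[Vector], k: int, iters: int = 50,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's K-means; returns (cluster label per vector, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # k distinct starting points
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


def pick_seed_flows(flow_ids: List[str], vectors: List[Vector], k: int,
                    seed: int = 0) -> List[str]:
    """Cluster flow vectors; return the flow closest to each non-empty cluster's centroid."""
    labels, centroids = kmeans(vectors, k, seed=seed)
    seeds = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            closest = min(members, key=lambda i: math.dist(vectors[i], centroids[c]))
            seeds.append(flow_ids[closest])
    return seeds
```

With real data, `vectors` would hold the embeddings of each candidate flow's serialized parameters, and `k=23` would match the silver bullet count.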

Step 5: Testing it all

In the final evaluation phase (Step D in Figure 3), we ran ~1,000 optimization trials on four test datasets (Bright Biology, DRDocs, InfiniteBench, and PhantomWiki), repeating the process three times for each of the following seeding strategies:

Silver bullet seeding
Transfer learning seeding
Random sampling

For each trial, GPT-4o-mini served as the judge, verifying an agent’s response against the ground-truth answer.
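With an LLM judge, accuracy reduces to the fraction of responses the judge accepts. The sketch below keeps the model call abstract: `call_judge` is any function that sends a prompt to a judge model such as GPT-4o-mini and returns its text reply. The prompt wording and the CORRECT/INCORRECT protocol are hypothetical, not syftr's actual judge prompt.

```python
from typing import Callable, List, Tuple

# Hypothetical judge prompt; syftr's real prompt may differ.
JUDGE_TEMPLATE = (
    "You are grading a question-answering system.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Candidate answer: {candidate}\n"
    "Reply with exactly one word: CORRECT or INCORRECT."
)


def build_judge_prompt(question: str, reference: str, candidate: str) -> str:
    return JUDGE_TEMPLATE.format(question=question, reference=reference, candidate=candidate)


def parse_verdict(reply: str) -> bool:
    """Accept only replies that start with CORRECT ('INCORRECT' does not match)."""
    return reply.strip().upper().startswith("CORRECT")


def judged_accuracy(examples: List[Tuple[str, str, str]],
                    call_judge: Callable[[str], str]) -> float:
    """examples: (question, ground-truth answer, agent response) triples."""
    verdicts = [parse_verdict(call_judge(build_judge_prompt(q, ref, cand)))
                for q, ref, cand in examples]
    return sum(verdicts) / len(verdicts)
```

In practice `call_judge` would wrap your LLM client; passing a stub instead makes the scoring logic easy to test offline.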

Results

We set out to answer:

Which seeding approach (random sampling, transfer learning, or silver bullets) delivers the best performance for a new dataset in the fewest trials?

For each of the four held-out test datasets (Bright Biology, DRDocs, InfiniteBench, and PhantomWiki), we plotted:

Accuracy
Latency
Cost
Pareto-area: a measure of how close results are to the optimal result
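The post doesn't give a formula for Pareto-area. One common way to measure it, assumed here, is the two-dimensional hypervolume: the share of the unit square (accuracy in [0, 1], latency normalized to [0, 1]) dominated by the frontier, measured from the worst corner (accuracy 0, latency 1). A perfect flow with accuracy 1 and latency 0 would score 1.0.

```python
from typing import Iterable, Tuple


def pareto_area(points: Iterable[Tuple[float, float]]) -> float:
    """Area dominated by (accuracy, normalized_latency) points inside the unit square.

    Sweeps from the lowest latency upward, tracking the best accuracy reachable
    at or below each latency. Dominated points add nothing, so the input need not
    be pre-filtered to the frontier. Reference corner: accuracy 0, latency 1.
    """
    by_latency = sorted((lat, acc) for acc, lat in points if 0.0 <= lat <= 1.0)
    area, best_acc = 0.0, 0.0
    for i, (lat, acc) in enumerate(by_latency):
        best_acc = max(best_acc, acc)
        next_lat = by_latency[i + 1][0] if i + 1 < len(by_latency) else 1.0
        area += best_acc * (next_lat - lat)  # strip between lat and next_lat
    return area
```

For example, a frontier of (0.5, 0.2) and (0.8, 0.6) covers 0.5 × 0.4 + 0.8 × 0.4 = 0.52 of the square, and adding a dominated flow such as (0.4, 0.7) leaves that unchanged.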

In each plot, the vertical dotted line marks the point when all seeding trials have completed. After seeding, silver bullets showed, on average:

9% higher maximum accuracy
84% lower minimum latency
28% larger Pareto-area

compared to the other strategies.

Bright Biology

Silver bullets had the highest accuracy, lowest latency, and largest Pareto-area after seeding. Some random seeding trials did not finish. Pareto-areas for all methods increased over time, but the gap between them narrowed as optimization progressed.

Figure 08 bright biology — ***Determine 8:*** *Vibrant Biology outcomes*

DRDocs

Similar to Bright Biology, silver bullets reached an 88% Pareto-area after seeding, vs. 71% (transfer learning) and 62% (random).
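Because these figures are themselves percentages, the gap can be read two ways: in percentage points or as a relative change. The snippet below computes both from the DRDocs numbers. It is plain arithmetic on the reported values and says nothing about how the 28% average above was calculated.

```python
# Pareto-areas after seeding on DRDocs, as reported above (in percent).
PARETO_AREA = {"silver bullets": 88.0, "transfer learning": 71.0, "random": 62.0}


def point_gap(a: float, b: float) -> float:
    """Absolute difference in percentage points."""
    return a - b


def relative_gain(a: float, b: float) -> float:
    """Relative improvement of a over b, in percent."""
    return (a - b) / b * 100.0


for baseline in ("transfer learning", "random"):
    sb, base = PARETO_AREA["silver bullets"], PARETO_AREA[baseline]
    print(f"vs {baseline}: +{point_gap(sb, base):.0f} points, "
          f"+{relative_gain(sb, base):.1f}% relative")
# vs transfer learning: +17 points, +23.9% relative
# vs random: +26 points, +41.9% relative
```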

InfiniteBench

Other methods needed ~100 more trials to match the silver bullet Pareto-area, and still hadn’t matched the fastest flows found via silver bullets by the end of ~1,000 trials.

PhantomWiki

Silver bullets again performed best after seeding. This dataset showed the widest cost divergence. After ~70 trials, the silver bullet run briefly focused on more expensive flows.

Pareto-fraction analysis

In runs seeded with silver bullets, the 23 silver bullet flows accounted for ~75% of the final Pareto-area after 1,000 trials, on average.

Red area: Gains from optimization over initial silver bullet performance.
Blue area: Silver bullet flows still dominating at the end.

Figure 12: Pareto-fraction for silver bullet seeding across all datasets
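Under the same hypervolume reading of Pareto-area, the Pareto-fraction could be computed as the area covered by the seed flows alone divided by the area of the final frontier, with the remainder being the optimization gain. This is our interpretation of the figure, not a formula stated in the post; the sketch repeats a small area helper so it runs on its own.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (accuracy, latency normalized to [0, 1])


def pareto_area(points: List[Point]) -> float:
    """2-D hypervolume from the corner (accuracy 0, latency 1); see assumptions above."""
    by_latency = sorted((lat, acc) for acc, lat in points if 0.0 <= lat <= 1.0)
    area, best_acc = 0.0, 0.0
    for i, (lat, acc) in enumerate(by_latency):
        best_acc = max(best_acc, acc)
        next_lat = by_latency[i + 1][0] if i + 1 < len(by_latency) else 1.0
        area += best_acc * (next_lat - lat)
    return area


def pareto_fraction(seed_points: List[Point], all_points: List[Point]) -> Tuple[float, float]:
    """Return (seed share, optimization share) of the final Pareto-area.

    `all_points` must include the seed flows, since they are trials of the same run.
    """
    final_area = pareto_area(all_points)
    if final_area == 0.0:
        return 0.0, 0.0
    seed_share = pareto_area(seed_points) / final_area
    return seed_share, 1.0 - seed_share
```

For instance, if seeds at (0.5, 0.2) and (0.8, 0.6) are later joined by a flow at (0.9, 0.5), the area grows from 0.52 to 0.60, so the seeds account for about 87% of it.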

Our takeaway

Seeding with silver bullets delivers consistently strong results and even outperforms transfer learning, despite that method drawing from a diverse set of historical Pareto-frontier flows.

For our two objectives (accuracy and latency), silver bullets always start with higher accuracy and lower latency than flows from other strategies.

In the long run, the TPE sampler erodes this initial advantage. Within a few hundred trials, results from all strategies often converge, which is expected since each should eventually find optimal flows.

So, do agentic flows exist that work well across many use cases? Yes, to a point:

On average, a small set of silver bullets recovers about 75% of the Pareto-area from a full optimization.
Performance varies by dataset, from 92% recovery for Bright Biology to 46% for PhantomWiki.

Bottom line: silver bullets are a cheap and efficient way to approximate a full syftr run, but they are not a replacement for one. Their impact may grow with more training datasets or longer training optimizations.

Silver bullet parametrizations

We used the following:

LLMs

microsoft/Phi-4-multimodal-instruct
deepseek-ai/DeepSeek-R1-Distill-Llama-70B
Qwen/Qwen2.5
Qwen/Qwen3-32B
google/gemma-3-27b-it
nvidia/Llama-3_3-Nemotron-Super-49B

Embedding models

BAAI/bge-small-en-v1.5
thenlper/gte-large
mixedbread-ai/mxbai-embed-large-v1
sentence-transformers/all-MiniLM-L12-v2
sentence-transformers/paraphrase-multilingual-mpnet-base-v2
BAAI/bge-base-en-v1.5
BAAI/bge-large-en-v1.5
TencentBAC/Conan-embedding-v1
Linq-AI-Research/Linq-Embed-Mistral
Snowflake/snowflake-arctic-embed-l-v2.0
BAAI/bge-multilingual-gemma2

Flow types

vanilla RAG
ReAct RAG agent
Critique RAG agent
Subquestion RAG

Here’s the full list of all 23 silver bullets, sorted from low accuracy / low latency to high accuracy / high latency: silver_bullets.json.

Try it yourself

Want to experiment with these parametrizations? Use the running_flows.ipynb notebook in our syftr repository; just make sure you have access to the models listed above.

For a deeper dive into syftr’s architecture and parameters, check out our technical paper or explore the codebase.

We’ll also be presenting this work at the International Conference on Automated Machine Learning (AutoML) in September 2025 in New York City.
