Sunday, March 1, 2026

Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments


Large-scale commercial search systems optimize for relevance to drive successful sessions that help users find what they are looking for. To maximize relevance, we leverage two complementary objectives: behavioral relevance (results users tend to click or download) and textual relevance (a result's semantic fit to the query). A persistent challenge is the scarcity of expert-provided textual relevance labels relative to abundant behavioral relevance labels. We first address this by systematically evaluating LLM configurations, finding that a specialized, fine-tuned model significantly outperforms a much larger pre-trained one in providing highly relevant labels. Using this optimal model as a force multiplier, we generate millions of textual relevance labels to overcome the data scarcity. We show that augmenting our production ranker with these textual relevance labels leads to a significant outward shift of the Pareto frontier: offline NDCG improves for behavioral relevance while simultaneously increasing for textual relevance. These offline gains were validated by a worldwide A/B test on the App Store ranker, which demonstrated a statistically significant +0.24% increase in conversion rate, with the most substantial performance gains occurring in tail queries, where the new textual relevance labels provide a robust signal in the absence of reliable behavioral relevance labels.
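To make the offline metric concrete, here is a minimal sketch of how NDCG could be computed for one ranked result list under each of the two label types. The exponential gain formula is one common NDCG variant and is an assumption here, as are the function names and the example label values; the source does not specify which NDCG formulation or label scales were used.

```python
import math


def dcg(gains):
    """Discounted cumulative gain using the (2^rel - 1) / log2(rank + 1) form."""
    # enumerate() is 0-based, so rank = i + 1 and the discount is log2(i + 2).
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(labels_in_ranked_order, k=None):
    """NDCG of a ranking, given the relevance label of each result in ranked order."""
    ranked = labels_in_ranked_order[:k] if k else labels_in_ranked_order
    # The ideal ranking places the highest labels first.
    ideal = sorted(labels_in_ranked_order, reverse=True)[:len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical labels for the same four ranked results of one query.
behavioral_labels = [1, 0, 1, 0]  # assumed: 1 = clicked or downloaded
textual_labels = [2, 0, 1, 2]     # assumed: graded 0-2 textual relevance

print("behavioral NDCG:", ndcg(behavioral_labels))
print("textual NDCG:", ndcg(textual_labels))
```

Scoring the same ranking against both label sets is what lets a change be judged on both objectives at once: a ranker moves the Pareto frontier outward only if neither value drops while at least one rises.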
