PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning

May 5, 2026

Multi-tool-integrated reasoning enables LLM-empowered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents with outcome-only rewards suffers from credit-assignment ambiguity, obscuring which intermediate steps (or tool-use decisions) lead to success or failure. In this paper, we propose PORTool, an importance-aware policy-optimization algorithm that reinforces agents' tool-use competence from outcome-level supervision while assigning reward at the step level. Specifically, PORTool generates a rewarded rollout tree in which trajectories share prefixes before branching, enabling direct comparisons among alternative tool-use decisions within the same context. It then estimates each step's importance by a correctness-dominant signal, i.e., whether descendants of that step can eventually produce a correct final answer, plus an auxiliary term indicating whether the step's tool calls execute successfully. Using these step-wise importance estimates, PORTool updates the policy to generate efficient tool-call steps, guided by both local comparisons within each branching decision and the overall quality of entire trajectories. Experiments show that PORTool improves final-answer accuracy while reducing tool-call steps compared with state-of-the-art baselines, and ablation studies confirm the robustness of the proposed step-wise importance estimates.
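The step-importance idea described above can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no exact formula, so the `Step` class, the `aux_weight` value of 0.1, the additive combination, and the mean-centred sibling comparison are all assumptions chosen only to show a correctness-dominant signal plus a smaller tool-execution term on a tree with shared prefixes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Step:
    """One node in a hypothetical rollout tree (names are illustrative)."""
    name: str
    tool_ok: bool = True                  # did this step's tool calls execute?
    final_correct: Optional[bool] = None  # set only on leaf steps
    children: List["Step"] = field(default_factory=list)


def can_reach_correct(step: Step) -> bool:
    """True if the step or any descendant leaf ends in a correct final answer."""
    if not step.children:
        return bool(step.final_correct)
    return any(can_reach_correct(child) for child in step.children)


def importance(step: Step, aux_weight: float = 0.1) -> float:
    """Assumed form: correctness term (0 or 1) plus a small tool-execution term."""
    return float(can_reach_correct(step)) + aux_weight * float(step.tool_ok)


def sibling_advantages(parent: Step, aux_weight: float = 0.1) -> Dict[str, float]:
    """Compare alternative steps that share the same prefix (the parent)."""
    if not parent.children:
        return {}
    scores = {c.name: importance(c, aux_weight) for c in parent.children}
    mean = sum(scores.values()) / len(scores)
    return {name: score - mean for name, score in scores.items()}


# Two alternative first steps branching from the same context.
root = Step("root", children=[
    Step("a", tool_ok=True, children=[
        Step("a1", final_correct=True),
        Step("a2", final_correct=False),
    ]),
    Step("b", tool_ok=False, children=[
        Step("b1", final_correct=False),
    ]),
])

advantages = sibling_advantages(root)
```

In this toy tree, step `a` scores higher than step `b` because one of its descendants reaches a correct answer and its tool call succeeded. Keeping `aux_weight` well below 1 is what makes the correctness term dominate in this sketch.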

† Purdue University
** Work done while at Apple