Monday, May 11, 2026

PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning


Multi-tool-integrated reasoning enables LLM-powered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents with outcome-only rewards suffers from credit-assignment ambiguity, obscuring which intermediate steps (or tool-use decisions) lead to success or failure. In this paper, we propose PORTool, an importance-aware policy-optimization algorithm that strengthens agents' tool-use competence from outcome-level supervision while assigning reward at the step level. Specifically, PORTool generates a rewarded rollout tree in which trajectories share prefixes before branching, enabling direct comparisons among different tool-use decisions within the same context. It then estimates each step's importance through a correctness-dominant signal, i.e., whether descendants of that step can eventually produce a correct final answer, plus an auxiliary term indicating whether the step's tool calls execute successfully. Using these step-wise importance estimates, PORTool updates the policy to generate efficient tool-call steps, guided by both local comparisons within each branching decision and the overall quality of entire trajectories. Experiments show that PORTool improves final-answer accuracy while reducing tool-call steps compared with state-of-the-art baselines, and ablation studies confirm the robustness of the proposed step-wise importance estimates.
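To make the step-importance idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes an indicator for the correctness signal (1 if any descendant leaf is correct), a small hypothetical weight `AUX_WEIGHT` on the tool-execution term so that correctness dominates, and a sibling-mean baseline for the local comparison at a branching point. The names `Step`, `reaches_correct`, `importance`, and `local_advantages` are illustrative and do not come from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical weight: kept small so the correctness signal dominates.
AUX_WEIGHT = 0.1


@dataclass
class Step:
    name: str
    tool_ok: bool = True              # did this step's tool calls execute?
    correct: Optional[bool] = None    # set only on leaves (final answers)
    children: List["Step"] = field(default_factory=list)


def reaches_correct(step: Step) -> bool:
    """True if any leaf under this step holds a correct final answer."""
    if not step.children:
        return bool(step.correct)
    return any(reaches_correct(child) for child in step.children)


def importance(step: Step) -> float:
    """Correctness indicator plus a small tool-execution bonus (assumed form)."""
    return float(reaches_correct(step)) + AUX_WEIGHT * float(step.tool_ok)


def local_advantages(parent: Step) -> Dict[str, float]:
    """Center each child's importance on the mean over its siblings."""
    if not parent.children:
        return {}
    scores = {child.name: importance(child) for child in parent.children}
    mean = sum(scores.values()) / len(scores)
    return {name: score - mean for name, score in scores.items()}


# Toy rollout tree: two branches share the same prefix (the question).
root = Step("question", children=[
    Step("search", tool_ok=True, children=[
        Step("answer_a", correct=True),
        Step("answer_b", correct=False),
    ]),
    Step("calculator", tool_ok=False, children=[
        Step("answer_c", correct=False),
    ]),
])

adv = local_advantages(root)
```

In this toy tree the `search` branch receives a positive local advantage and the `calculator` branch a negative one, because only `search` leads to a correct answer and only its tool call succeeds. How these local terms are combined with trajectory-level quality in the policy update is not specified in the abstract and is left out of the sketch.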
