MiniMax has officially open-sourced MiniMax M2.7, making the model weights publicly available on Hugging Face. Initially announced on March 18, 2026, MiniMax M2.7 is MiniMax's most capable open-source model to date, and its first model to actively participate in its own development cycle, a significant shift in how large language models are built and iterated.
What’s MiniMax M2.7?
MiniMax M2.7 is part of MiniMax's M2 series of Mixture-of-Experts (MoE) models. MoE is an architectural design in which only a subset of the total parameters is activated during any inference pass, which makes the model significantly faster and cheaper to serve than a dense model of comparable output quality.
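A minimal NumPy sketch of the MoE routing idea described above. The expert count, top-k value, and hidden size here are illustrative assumptions, not MiniMax M2.7's actual configuration.

```python
# Toy Mixture-of-Experts layer: a router scores all experts, but only the
# top-k highest-scoring experts actually run for each token.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # total expert FFNs in the layer (assumed)
TOP_K = 2         # experts activated per token (assumed)
D_MODEL = 16      # hidden size (assumed)

# Each expert is stood in for by one small weight matrix.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS))

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a single token vector to its top-k experts."""
    logits = x @ router_w                      # router score per expert
    top = np.argsort(logits)[-TOP_K:]          # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the chosen experts only
    # Only TOP_K of NUM_EXPERTS experts execute, so per-token compute
    # scales with k rather than with the total parameter count.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
out = moe_layer(token)
```

The efficiency claim in the paragraph above falls out of the last line: the dense alternative would multiply `x` through all eight expert matrices, while the routed version touches only two.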
MiniMax M2.7 is built around three core capability areas: expert software engineering, expert office work, and what MiniMax calls Agent Teams, i.e. native multi-agent collaboration. MiniMax M2.7 can build complex agent harnesses and complete highly elaborate productivity tasks, leveraging capabilities such as Agent Teams, complex Skills, and dynamic tool search.
SOTA Benchmark Performance: SWE-Pro and Terminal Bench 2
On SWE-Pro, which covers multiple programming languages, MiniMax M2.7 achieved a 56.22% accuracy rate, matching GPT-5.3-Codex. SWE-Pro tasks span log analysis, bug troubleshooting, code security review, and machine-learning workflow debugging, much closer to the messy reality of production systems than standard algorithmic coding tests.
On Terminal Bench 2 (57.0%) and NL2Repo (39.8%), both of which demand a high degree of system-level comprehension, MiniMax M2.7 performs solidly. The model excels not only at code generation but can also deeply understand the operational logic and collaborative dynamics of software systems.
On the repo-level code generation benchmark VIBE-Pro, MiniMax M2.7 scored 55.6%, nearly on par with Opus 4.6, meaning that whether a requirement involves Web, Android, iOS, or simulation tasks, it can be handed off to MiniMax M2.7 to complete. It also demonstrates a strong advantage on benchmarks closer to real-world engineering scenarios: SWE Multilingual (76.5) and Multi SWE Bench (52.7).
Production Debugging: Under Three Minutes
When faced with alerts in production, MiniMax M2.7 can correlate monitoring metrics with deployment timelines to perform causal reasoning, run statistical analysis on trace samples and propose precise hypotheses, proactively connect to databases to verify root causes, pinpoint missing index migration files in the code repository, and use non-blocking index creation to stop the bleeding before submitting a merge request. The MiniMax team reports that on several occasions this reduced recovery time for live production incidents to under three minutes. From observability analysis and database expertise to SRE-level decision-making, this positions MiniMax M2.7 as something beyond a code-generation model.
The Self-Evolution Architecture
To test the limits of autonomous improvement, MiniMax M2.7 was tasked with optimizing a model's programming performance on an internal scaffold. It ran fully autonomously, executing an iterative loop of 'analyze failure trajectories → plan changes → modify scaffold code → run evaluations → compare results → decide to keep or revert changes' for over 100 rounds. During this process, MiniMax M2.7 discovered effective optimizations on its own: systematically searching for the optimal combination of sampling parameters such as temperature, frequency penalty, and presence penalty; designing more specific workflow guidelines (such as automatically searching for the same bug pattern in other files after a fix); and adding loop detection to the scaffold's agent loop. This achieved a 30% performance improvement on internal evaluation sets.
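The keep-or-revert loop can be sketched as simple hill climbing over sampling parameters. Every name here (`evaluate`, `propose_patch`, the parameter targets) is a hypothetical stand-in, since the article does not describe MiniMax's actual scaffold API.

```python
# Sketch of an 'evaluate -> propose change -> keep or revert' loop,
# the structure described in the article, with invented internals.
import copy
import random

random.seed(0)

def evaluate(config: dict) -> float:
    # Stand-in for running the evaluation suite; higher is better.
    # (A real harness would run benchmarks against the scaffold.)
    return -abs(config["temperature"] - 0.7) - abs(config["presence_penalty"] - 0.4)

def propose_patch(config: dict) -> dict:
    # Stand-in for 'plan modifications': perturb one sampling parameter.
    patched = copy.deepcopy(config)
    key = random.choice(["temperature", "presence_penalty"])
    patched[key] = round(patched[key] + random.uniform(-0.1, 0.1), 3)
    return patched

config = {"temperature": 1.0, "presence_penalty": 0.0}
best_score = evaluate(config)
for round_no in range(100):                    # the article reports 100+ rounds
    candidate = propose_patch(config)          # plan + modify scaffold config
    score = evaluate(candidate)                # run evaluations
    if score > best_score:                     # compare results
        config, best_score = candidate, score  # keep the change
    # otherwise the candidate is discarded, i.e. the change is reverted
```

Because a change is kept only when the score improves, `best_score` is monotone non-decreasing across rounds, which is what makes long unattended runs safe.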
Within MiniMax's own reinforcement learning team workflows, M2.7 now handles 30%–50% of the workflow end-to-end, with human researchers stepping in only for critical decisions and discussions.
MLE Bench Lite: Testing Autonomous ML Experimentation
The MiniMax team also tested MiniMax M2.7 on MLE Bench Lite, OpenAI's open-sourced suite of 22 machine learning competitions runnable on a single A30 GPU, covering virtually all stages of the ML workflow.
For this evaluation, the MiniMax team designed a simple three-component harness: short-term memory, self-feedback, and self-optimization. After each iteration round, the agent generates a short-term memory markdown file, performs self-criticism on the current results, and provides optimization directions for the next round. Three trials were run, each with a 24-hour window for iterative evolution.
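A toy sketch of how such a per-round memory file might be written. The file name `short_term_memory.md` and the entry layout are invented for illustration; the article does not specify the harness's actual format.

```python
# One round of the three-component harness: append the round's results,
# a self-criticism note, and directions for the next round to a markdown file.
from pathlib import Path

MEMORY_FILE = Path("short_term_memory.md")

def write_round_memory(round_no: int, score: float,
                       critique: str, next_steps: str) -> None:
    """Append one round's short-term memory entry."""
    entry = (
        f"## Round {round_no}\n"
        f"- score: {score:.3f}\n"
        f"- critique: {critique}\n"        # self-feedback on current results
        f"- next: {next_steps}\n\n"        # self-optimization directions
    )
    with MEMORY_FILE.open("a", encoding="utf-8") as f:
        f.write(entry)

write_round_memory(1, 0.612, "validation loss plateaued early",
                   "try a lower learning rate")
```

Appending rather than overwriting keeps the full trial history available, so later rounds can read earlier critiques within the 24-hour window.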
The best run achieved nine gold medals, five silver medals, and one bronze medal. The average medal rate across the three runs was 66.6%, a result second only to Opus-4.6 (75.7%) and GPT-5.4 (71.2%), tying with Gemini-3.1 (66.6%).
Expert Office Work and Finance
Beyond software engineering, MiniMax M2.7 targets expert office tasks. In the GDPval-AA evaluation, which measures domain expertise and task delivery capability across 45 models, MiniMax M2.7 achieved an ELO score of 1495, the highest among open-source models, second only to Opus 4.6, Sonnet 4.6, and GPT-5.4, and surpassing GPT-5.3.
On Toolathon, MiniMax M2.7 achieved an accuracy of 46.3%, reaching the global top tier. In MM Claw testing, an evaluation MiniMax built from real-world usage patterns on the OpenClaw personal agent platform, MiniMax M2.7 maintained a 97% skill compliance rate across 40 complex skills (each exceeding 2,000 tokens) and achieved an overall accuracy of 62.7%, approaching Sonnet 4.6.
In finance, MiniMax M2.7 can autonomously read a company's annual reports and earnings call transcripts, cross-reference multiple research reports, independently design assumptions and build a revenue forecast model, and produce PPT and Word research reports from templates, understanding, making judgments, and producing output like a junior analyst.
Key Takeaways
- MiniMax M2.7 is now officially open source, with weights available on Hugging Face, making a frontier-grade agentic model freely available for developers to deploy and build on.
- MiniMax M2.7 achieves SOTA performance on real-world software engineering benchmarks, scoring 56.22% on SWE-Pro (matching GPT-5.3-Codex) and 57.0% on Terminal Bench 2, tests that measure production-level reasoning, not just code generation.
- MiniMax M2.7 is the first model to actively participate in its own development, running over 100 autonomous rounds of scaffold optimization and achieving a 30% performance improvement, an early, concrete example of AI-assisted AI development in practice.
- The model is built for real agentic deployments, maintaining 97% skill adherence across 40 complex skills (each exceeding 2,000 tokens), supporting native Agent Teams with stable role boundaries, and handling 30–50% of MiniMax's internal RL team workflows autonomously.
- MiniMax M2.7 is the highest-ranked open-source model on GDPval-AA, with an ELO score of 1495 across 45 models, demonstrating strong professional-work capabilities spanning office document editing, financial analysis, and multi-round high-fidelity task delivery.
Check out the technical details and model weights on Hugging Face.
