MiniMax M3: Open-Weight 1M-Context Coding Rival

Editor J Jun 2, 2026

MiniMax launched M3, a coding model with a 1M-token context via MSA, priced at a fraction of frontier rivals. Open-weight, but the training code stays closed.

Shanghai-based artificial intelligence startup MiniMax unveiled M3, its latest multimodal model, on June 1. Tailored for coding and agentic workflows, M3 leverages a proprietary MSA architecture (MiniMax Sparse Attention) to process context windows of up to one million tokens. The model accepts text, image, and video inputs natively.

Its aggressive pricing structure has drawn immediate industry attention. The API is priced at $0.30 per million input tokens and $1.20 per million output tokens—approximately 5% to 10% of the rates charged by leading proprietary models. Alongside the release of the model's weights, MiniMax claimed that M3 outperforms GPT-5.5 and Gemini 3.1 Pro on critical software engineering benchmarks, including SWE-bench Pro.

The Economics of One-Million-Token Contexts Powered by MSA

Sparse attention index branch and sparse branch block diagram — MiniMax Sparse Attention (MSA) architecture diagram

Underpinning this cost efficiency is the new MSA architecture. By processing only high-salience segments of an input rather than executing quadratic computations across all tokens, this MSA architecture reduces compute costs within the one-million-token range to approximately one-twentieth of the previous generation's requirements. The architecture reads each key-value block only once, keeping memory access contiguous. Furthermore, under the MSA architecture, decoding speeds have reportedly increased by up to 15.6 times.

While the model supports a full one-million-token context window, it guarantees a minimum threshold of 512,000 tokens depending on deployment configurations. This capacity is particularly significant for ingestion of entire codebases or executing multi-step agentic workflows. Native multimodality—trained on interleaved text, image, and video data—remains a rare offering among open-weight models within this price tier.

The convergence of low inference costs and expanded context windows continues to dominate the AI landscape. While the price reductions initiated by DeepSeek disrupted standard token pricing for proprietary models, M3 intensifies the competition by introducing an open-weight alternative.

Evaluating Performance Across Coding and Agentic Benchmarks

MiniMax M3 benchmark comparison SWE-bench Pro BrowseComp Terminal-Bench — M3 versus rival models across key benchmarks

MiniMax's performance claims focus heavily on coding and autonomous agent applications. On SWE-bench Pro, an evaluation measuring real-world software engineering capabilities, M3 recorded a score of 59.0%, exceeding reported metrics for GPT-5.5 and Gemini 3.1 Pro while approaching those of Claude 4.7 Opus. The model also posted a 66.0% success rate on TerminalBench 2.1 and scored 83.5 on BrowseComp for autonomous browsing, marginally outperforming Opus 4.7, which scored 79.3.

However, these benchmarks remain self-reported by the developer. As noted by VentureBeat, while M3 achieves leading results in targeted evaluations, it does not consistently outperform premium proprietary systems, such as Opus 4.8, across a broader spectrum of capabilities. Until independent, third-party evaluations are conducted, industry analysts suggest viewing M3 primarily as a coding AI model specialized for agentic workflows.

The Limits of the Open-Weight Release Model

Upon launch, M3 was made immediately available via API, with support integrated across platforms including OpenRouter and Ollama. The provider also introduced OpenAI- and Anthropic-compatible endpoints to facilitate seamless integration into existing software pipelines. MiniMax is distributing the model's weights through Hugging Face.

However, the openness of the release is constrained. By withholding the training dataset, training code, and proprietary inference operators, the release has drawn criticism for being an open-weight distribution rather than a fully open-source project. While developers can deploy and run M3 locally, replicating the model's training from scratch is impossible.

Nonetheless, the combination of frontier-class coding performance, a one-million-token context window, and native multimodal support at this price point represents a significant market entry. This release increases pressure on proprietary developers to lower their pricing, though the long-term adoption of this coding AI model will depend on how its performance claims hold up under independent, third-party evaluation.