Frontier intelligence, delivered.
A foundation model that fuses production-grade coding, a 1M-token context window, and native multimodality into one coherent system, designed end-to-end and deployed in hours.
- 12h unattended
- 70T tokens
- 0 → 1 production
- 3 frontiers
- 1 system
- 0 trade-offs
One model. Six frontiers.
From-scratch sparse attention that holds up at 1M context.
MiniMax Sparse Attention (MSA) is engineered from the very first pretraining step — not retrofitted afterward. It keeps M3 sharp across long contexts and unlocks efficient inference at frontier scale.
# MSA block forward pass (illustrative pseudocode)
def msa_forward(x, k_idx, l_idx):
    q = x.linear(d, d)
    k = x.gather(k_idx).linear(d, d)
    v = x.gather(l_idx).linear(d, d)
    return sdp_attn(q, k, v)
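As a rough illustration of the gather-then-attend pattern in the snippet above, here is a minimal pure-Python sketch. It is not MSA itself: the page does not say how MSA selects which positions to attend to, so the function name `sparse_attention`, the caller-supplied `key_idx` list, and the omission of learned projections are all assumptions made for illustration.

```python
import math


def sparse_attention(queries, keys, values, key_idx):
    """Each query attends only to the positions listed in key_idx.

    Hypothetical sketch: index selection is supplied by the caller, and
    the learned q/k/v projections of a real model are omitted.
    """
    d = len(queries[0])
    sel_k = [keys[i] for i in key_idx]    # gather the selected keys
    sel_v = [values[i] for i in key_idx]  # gather the matching values
    out = []
    for q in queries:
        # Scaled dot-product scores against the selected keys only.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in sel_k]
        # Numerically stable softmax over the selected positions.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the selected values.
        out.append([sum(w * v[j] for w, v in zip(weights, sel_v))
                    for j in range(len(sel_v[0]))])
    return out


keys = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
values = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
# Only position 1 is selected, so the output is exactly values[1].
single = sparse_attention([[1.0, 0.0]], keys, values, [1])
```

In this sketch the work per query grows with `len(key_idx)` rather than with the full sequence length, which is the general reason sparse attention is attractive at long context.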
- From-scratch sparse attention
- ~100% GPU utilization
- 9.7× Prefill vs M2
- 15.6× Decode vs M2
- 70T pretraining tokens
- Step 0 multimodal from start
- 1M context window
- 512K guaranteed usable
- 83.5 BrowseComp · > Opus 4.7 (79.3)
- 59.0 SWE-Bench Pro
- 66.0 Terminal Bench 2.1
- 37.1 PostTrainBench · rank 3
Three frontiers, one model.
- 01 Coding / Agentic SOTA
- 02 Native Multimodal
- 03 1M Long Context
- Long-horizon tasks
- Producer + Verifier loop
- Computer Use
- 1M context window
const m3 = await MiniMax.agent({
  model: "M3",
  context: "1M",
  tools: ["shell", "browser", "computer_use"],
  team: true,
});
// runs unattended for days
await m3.run("reproduce-paper-iclr-2025");
- 1-hour native video
- Unified token space
- Bidirectional reasoning
- Screen-recording ready
- 1,000,000 tokens
- Sustained at length
- Concurrent processing
- Code + paper + logs in-window
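To make "code + paper + logs in-window" concrete, the sketch below sums per-document token counts and reports which window they fit in. The counts are made-up inputs, the function name `classify_context` is hypothetical, and reading "512K" as 512,000 tokens is an assumption; a real check would take its counts from the model's tokenizer.

```python
MAX_CONTEXT_TOKENS = 1_000_000  # "1,000,000 tokens" as stated on the page
GUARANTEED_TOKENS = 512_000     # assumption: "512K" read as 512,000 tokens


def classify_context(token_counts):
    """Sum per-document token counts and report which window they fit in."""
    total = sum(token_counts.values())
    if total <= GUARANTEED_TOKENS:
        status = "within guaranteed window"
    elif total <= MAX_CONTEXT_TOKENS:
        status = "within 1M window"
    else:
        status = "exceeds context window"
    return total, status


# Hypothetical token counts for one working set.
working_set = {"code": 300_000, "paper": 40_000, "logs": 100_000}
total, status = classify_context(working_set)
```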
Every layer of M3, in one view
From the control plane to the model core — explore the capabilities that ship in production.
An agent trained with M3, for M3.
MiniMax Code is an agent product designed for M3 and trained alongside it, tuned to take full advantage of M3's long context, coding, and native multimodal capabilities. It's the recommended agent for working with MiniMax-M3. It is built on the open-source OpenCode and Pi Agent harnesses, and the team plans to open-source the project after launch.
- 01 Agent Team workflow: Producer + Verifier loops decompose, parallelize, and self-correct, running unattended for days on long-horizon tasks.
- 02 Deep reflection & correction: The agent re-aligns plan and priority based on live task progress. You can step in, add requirements, and redirect at any time.
- 03 Computer Use: Native multimodality lets Code operate across applications, files, and systems. Say what you need, and Code does the rest.
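The Producer + Verifier loop mentioned above can be sketched in a few lines. This is a generic illustration of the pattern, not MiniMax Code's implementation: the function `producer_verifier_loop`, its callbacks, and the toy producer and verifier are all hypothetical names introduced here.

```python
from typing import Callable, Optional, Tuple


def producer_verifier_loop(
    task: str,
    produce: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Alternate between producing a candidate and verifying it.

    `verify` returns None when the candidate is accepted, or a feedback
    string that is passed to the next `produce` call.
    """
    feedback = None
    for round_no in range(1, max_rounds + 1):
        candidate = produce(task, feedback)
        feedback = verify(candidate)
        if feedback is None:
            return candidate, round_no
    raise RuntimeError(f"no accepted candidate after {max_rounds} rounds")


def toy_produce(task, feedback):
    # Toy producer: the first attempt is wrong, the retry uses the feedback.
    return "2 + 2 = 5" if feedback is None else "2 + 2 = 4"


def toy_verify(candidate):
    # Toy verifier: accept only the correct sum.
    return None if candidate.endswith("= 4") else "arithmetic is wrong"


result, rounds = producer_verifier_loop("add 2 and 2", toy_produce, toy_verify)
```

Bounding the loop with `max_rounds` is a design choice for the sketch: an unattended run needs some stopping rule when the verifier never accepts.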
Three tiers. Pick the one that fits your runway.
- ~1.7B tokens / month of M3 usage
- Full access to the MiniMax model family (M3 / M2.7 / image / speech / music)
- Run 3–4 concurrent agents
- Integrates with popular coding tools, with more on the way
- 1M context window — built for long documents and large codebases
- Native multimodal understanding: image and video input
- Text, image, speech, and music share one quota
- ~5.1B tokens / month of M3 usage
- Full access to the MiniMax model family (M3 / M2.7 / image / speech / music)
- Run 4–5 concurrent agents
- Integrates with popular coding tools, with more on the way
- 1M context window — built for long documents and large codebases
- Native multimodal understanding: image and video input
- Text, image, speech, and music share one quota
- Video generation: 3 clips / day
- ~12.5B tokens / month of M3 usage
- Full access to the MiniMax model family (M3 / M2.7 / image / speech / music)
- Run 6–7 concurrent agents
- Integrates with popular coding tools, with more on the way
- 1M context window — built for long documents and large codebases
- Native multimodal understanding: image and video input
- Text, image, speech, and music share one quota
- Video generation: 5 clips / day
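For a feel of what the three monthly quotas mean in practice, the arithmetic below converts each one into an approximate daily budget and into the number of full 1M-token contexts it could fill. The tier labels are placeholders (this text does not name the tiers), the quotas are the approximate figures listed above, and the 30-day month and input-only accounting are simplifying assumptions.

```python
# Approximate monthly M3 token quotas from the three tiers, in listed order.
# The labels are placeholders; the tiers are not named in this text.
TIER_QUOTAS = {
    "tier_1": 1_700_000_000,
    "tier_2": 5_100_000_000,
    "tier_3": 12_500_000_000,
}
CONTEXT_TOKENS = 1_000_000  # one completely filled 1M-token context
DAYS_PER_MONTH = 30         # assumption for the daily estimate


def budget(monthly_tokens):
    """Return (million tokens per day, full 1M contexts per month)."""
    per_day_millions = round(monthly_tokens / DAYS_PER_MONTH / 1e6, 1)
    full_contexts = monthly_tokens // CONTEXT_TOKENS
    return per_day_millions, full_contexts


summary = {name: budget(quota) for name, quota in TIER_QUOTAS.items()}
# summary["tier_1"] is (56.7, 1700): about 56.7M tokens a day,
# or 1,700 completely filled 1M contexts a month, ignoring output tokens.
```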
curl https://api.minimaxi.com/v1/text/chatcompletion_v2 \
  -H "Authorization: Bearer $M3_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMax-M3",
    "context_window": "1M",
    "messages": [
      { "role": "user",
        "content": [
          { "type": "text", "text": "reproduce this ICLR 2025 paper" },
          { "type": "file", "file_id": "iclr2025-oa.pdf" },
          { "type": "image_url", "image_url": "fig-1.png" }
        ]
      }
    ]
  }'
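The same request can be assembled from Python with only the standard library. This sketch mirrors the curl example field for field; the payload shape (including `context_window` and the `file` content part) is copied from that example and has not been checked against the API reference, and the helper name `build_request` is hypothetical. The request is built but not sent, so the sketch runs offline.

```python
import json
import os
import urllib.request

API_URL = "https://api.minimaxi.com/v1/text/chatcompletion_v2"


def build_request(api_key):
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "model": "MiniMax-M3",
        "context_window": "1M",  # copied from the curl example, unverified
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "reproduce this ICLR 2025 paper"},
                    {"type": "file", "file_id": "iclr2025-oa.pdf"},
                    {"type": "image_url", "image_url": "fig-1.png"},
                ],
            }
        ],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key is used when M3_API_KEY is not set in the environment.
req = build_request(os.environ.get("M3_API_KEY", "<your-api-key>"))
# urllib.request.urlopen(req) would send it; omitted here on purpose.
```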
Drop-in. Multi-modal. Priced to scale.
Open the MiniMax platform, pick a Token Plan or top up for usage-based billing, and integrate M3 through a single API key — for any coding agent, IDE, or your own stack.
- Any coding agent · IDE plugins · Custom harness · SDK & REST
- 1M context · 512K guaranteed
- 83.5 BrowseComp
- v2 chatcompletion endpoint
- MSA sparse attention
Two unattended tasks. Two domains. One model.
M3 was given two open-ended tasks with no human in the loop: one in research, one in systems engineering. Each ran for half a day or more, with M3 planning, debugging, and self-correcting on its own.
We handed M3 an ICLR 2025 Outstanding Paper Award winner — Learning Dynamics of LLM Finetuning. M3 ran unattended for nearly 12 hours, produced 18 autonomous commits and 23 experiment figures, and reproduced the paper's core results.
- ~12h unattended runtime
- 18 autonomous commits
- 23 experiment figures
- ✓ SFT trend matched
- ✓ DPO squeezing reproduced
- ✓ Extend method validated
Given only a task description, a benchmark script and a non-runnable Triton skeleton — no reference implementation — M3 spent ~24 hours optimizing an FP8 GEMM kernel on Hopper GPUs. Through 6 optimization rounds and a long plateau, it pushed peak FP8 utilization from 7.6% to 71.3% — a 9.4× speedup vs. the initial baseline.
- ~24h unattended runtime
- 147 benchmark submissions
- 1,959 tool calls
- 6 optimization rounds
- 9.4× vs. baseline
- 7.6% → 71.3% FP8 peak utilization
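The headline numbers for the kernel task are easy to cross-check with a little arithmetic. The sketch below uses only the figures quoted above; treating the utilization ratio as the speedup assumes both measurements use the same hardware peak and problem size, and the per-hour rate treats the "~24 hours" runtime as exactly 24.

```python
start_util, end_util = 7.6, 71.3  # FP8 peak utilization, percent
hours, submissions, tool_calls = 24, 147, 1959

utilization_ratio = end_util / start_util
calls_per_submission = tool_calls / submissions
calls_per_hour = tool_calls / hours

print(f"utilization ratio: {utilization_ratio:.1f}x")            # 9.4x
print(f"tool calls per submission: {calls_per_submission:.1f}")  # 13.3
print(f"tool calls per hour: {calls_per_hour:.1f}")              # 81.6
```

The ratio of the two utilization figures rounds to the quoted 9.4× speedup.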
Where M3 fits.
A selection of tasks where M3 ships in production today, from autonomous research to Computer Use. Each card is a real working scenario.
Reproducing an ICLR Outstanding Paper
M3 took an ICLR 2025 paper, ran autonomously for nearly 12 hours, and produced 18 commits plus 23 figures that reproduced the paper's core results.
Long-horizon coding agents
Producer + Verifier loops decompose, parallelize and self-correct — running unattended for days on multi-stage engineering work.
Multimodal screen-recording → actions
Say what you need on your phone. M3 watches the screen, understands the UI, and drives the desktop through Computer Use.
Sparse attention R&D on 70T tokens
Engineering and training pipeline for from-scratch sparse attention — 9.7× prefill and 15.6× decode speedup vs. M2.