Mistral’s new AI coding agent bets everything on “vibe coding”

According to Ars Technica, French AI startup Mistral AI released Devstral 2 on Tuesday, a 123 billion parameter open-weights coding model designed as an autonomous software engineering agent. The model scores 72.2% on the SWE-bench Verified benchmark and is released alongside a new command-line interface app called Mistral Vibe, which is licensed under Apache 2.0. The company also released a smaller 24 billion parameter version, Devstral Small 2, which scores 68% and can run locally on a laptop. After a free period, pricing for the API will be $0.40 per million input tokens and $2.00 per million output tokens for Devstral 2, and $0.10/$0.30 for the small version, which Mistral claims is about 7x more cost-efficient than Claude Sonnet. The “Vibe” name directly references “vibe coding,” a term coined by Andrej Karpathy in February 2025 that Collins Dictionary named Word of the Year.

The benchmark and the bet

Here’s the thing about that 72.2% score on SWE-bench Verified: everyone in the AI coding space is watching it. It’s basically the closest thing we have to a standardized test for seeing if an AI can actually fix real bugs in real Python projects on GitHub. Now, is it perfect? No. A lot of the tasks are apparently simple fixes. But it’s the benchmark that matters right now. Mistral isn’t just releasing another model; they’re releasing a whole CLI agent that’s supposed to scan your entire project, make changes across files, and even run shell commands on its own. That’s a huge bet on autonomy. They’re saying Devstral 2 can maintain coherency across a whole codebase, track dependencies, and modernize legacy systems. That’s a far cry from just writing a function snippet.
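
To make the “agent” part of that concrete, here’s a minimal sketch of the loop this class of CLI tools typically runs: the model proposes an action (read a file, edit a file, run a shell command), the harness executes it and feeds the result back, and the cycle repeats until the model says it’s done. This is a generic illustration of the pattern, not Mistral Vibe’s actual implementation; call_model is a hypothetical stand-in for whatever API call the real agent makes.

```python
import subprocess
from pathlib import Path

def call_model(history):
    """Hypothetical stand-in for a Devstral 2 API call.

    Expected to return an action dict such as
    {"type": "run", "cmd": "pytest"} or {"type": "done", "summary": "..."}.
    """
    raise NotImplementedError

def run_agent(task, max_steps=20):
    """Generic agent loop: ask the model for an action, execute it, feed back the result."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = call_model(history)
        if action["type"] == "done":
            return action.get("summary", "")
        if action["type"] == "read":
            result = Path(action["path"]).read_text()
        elif action["type"] == "write":
            Path(action["path"]).write_text(action["content"])
            result = f"wrote {action['path']}"
        elif action["type"] == "run":
            proc = subprocess.run(action["cmd"], shell=True,
                                  capture_output=True, text=True)
            result = proc.stdout + proc.stderr
        else:
            result = f"unknown action type: {action['type']}"
        history.append({"role": "tool", "content": result})
    return "step limit reached"
```

The autonomy question lives in that “run” branch: once the agent can execute shell commands and rewrite files across the repo on its own, it stops being a supervised assistant and starts being an unattended contributor, which is exactly the leap Mistral is asking people to trust.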

The vibe coding philosophy

This is where it gets really interesting. By naming the tool “Mistral Vibe,” they’re fully leaning into the controversial trend of “vibe coding.” The idea, as researchers have described it, is that you just describe what you want in plain English and accept the AI’s output without deep scrutiny. You “give in to the vibes.” It’s incredibly fun for prototyping, as developer Simon Willison noted. But is it responsible for building actual, maintainable software that teams have to work on for years? Probably not. Mistral’s bet seems to be that their agent is *so good* at understanding context and correcting itself that it transcends the risks of pure vibe coding. That’s a massive claim. We’ll need to see some serious, independent testing before anyone should trust an autonomous agent with a production codebase.

The developer and market impact

For developers, the immediate appeal is twofold: a powerful, open-weights model and a brutally competitive price. Being able to run the 24B parameter model locally with a 256K token context is a big deal for offline work or proprietary code. And that pricing? It’s a direct shot across the bow of Anthropic and OpenAI. At $0.40/$2.00 for their top model versus Claude Sonnet’s $3/$15, Mistral is competing hard on value. They’re betting developers and companies will flock to a cheaper, “good enough” option that’s integrated into a slick CLI. For enterprises, the promise of automating bug fixes and legacy modernization at scale is tantalizing, but the risks of unleashing an autonomous agent are huge. It could reshape how engineering teams are built if it works. If it doesn’t, it could lead to a lot of messy, AI-generated technical debt. It’s a high-stakes gamble on the future of software development itself.
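
To put that pricing gap in concrete terms, here’s a quick back-of-the-envelope comparison using the per-million-token prices quoted above. The prices come from the article; the workload (an input-heavy agentic bug-fix task that reads a lot of code and writes a small patch) is a hypothetical illustration, not a figure from Mistral or Anthropic.

```python
# Per-million-token API prices quoted in the article: (input, output) in dollars.
PRICES = {
    "Devstral 2": (0.40, 2.00),
    "Devstral Small 2": (0.10, 0.30),
    "Claude Sonnet": (3.00, 15.00),
}

def task_cost(model, input_tokens, output_tokens):
    """Dollar cost of one task at the quoted prices."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical agentic bug fix: 200K tokens of code read in, 20K tokens written out.
for model in PRICES:
    print(f"{model}: ${task_cost(model, 200_000, 20_000):.3f} per task")
```

At these list prices, Devstral 2 works out to roughly a seventh of Claude Sonnet’s cost on any input/output mix (both rates are 7.5x lower), which is presumably the kind of comparison behind the cost-efficiency claim.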
