Microsoft's MDASH Swarm Finds 16 Windows Bugs, Including Four Critical RCEs in the Networking Stack
Microsoft's new multi-model agentic scanning harness orchestrates 100+ AI agents to discover, debate, and prove exploitable vulnerabilities, and its findings account for 16 new CVEs in this month's Patch Tuesday release.
Microsoft disclosed on May 12 that an internal AI system codenamed MDASH discovered 16 new vulnerabilities across the Windows networking and authentication stack, including four Critical-severity remote code execution flaws in components as sensitive as the Windows kernel TCP/IP driver and the IKEv2 IPsec service. The fixes shipped in that day's Patch Tuesday updates, and Microsoft describes the batch as the first time a fully agentic AI pipeline has produced this many high-impact CVEs in a single release.
MDASH, short for multi-model agentic scanning harness, orchestrates more than 100 specialized AI agents drawn from an ensemble of frontier and distilled models. The system runs a five-stage pipeline: Prepare (code indexing and threat modeling), Scan (auditor agents hunt for candidate bugs), Validate (debater agents argue for and against each finding), Dedup (equivalent reports are collapsed into one), and Prove (the system constructs and executes the actual triggering inputs, with domain-expert plugins for tasks such as network protocol fuzzing). Crucially, the harness does not stop at "this looks suspicious": it produces a working proof-of-concept exploit before a human reviewer ever sees the finding.
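The later stages can be pictured as a chain of filters over candidate findings. The Python below is a minimal sketch under loud assumptions, not Microsoft's implementation: the `Finding` record, the vote-margin rule standing in for the Validate debate, the `(component, function, bug_class)` key used by Dedup, and the `run_poc` callback standing in for Prove are all hypothetical. Prepare and Scan are assumed to have already produced the candidate list.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A candidate bug report (hypothetical schema, for illustration only)."""
    component: str          # binary the auditor flagged, e.g. "tcpip.sys"
    function: str           # made-up function name
    bug_class: str          # e.g. "out-of-bounds write"
    votes_for: int = 0      # debater agents arguing the bug is real
    votes_against: int = 0  # debater agents arguing it is a false positive


def validate(findings: Iterable[Finding], min_margin: int = 1) -> list[Finding]:
    """Validate (assumed rule): keep findings whose 'for' votes beat 'against' by min_margin."""
    return [f for f in findings if f.votes_for - f.votes_against >= min_margin]


def dedup(findings: Iterable[Finding]) -> list[Finding]:
    """Dedup (assumed key): reports sharing (component, function, bug_class) are one; keep the first."""
    unique: dict[tuple[str, str, str], Finding] = {}
    for f in findings:
        unique.setdefault((f.component, f.function, f.bug_class), f)
    return list(unique.values())


def prove(findings: Iterable[Finding], run_poc: Callable[[Finding], bool]) -> list[Finding]:
    """Prove: keep only findings whose proof-of-concept actually triggers the bug."""
    return [f for f in findings if run_poc(f)]


def run_pipeline(candidates: Iterable[Finding], run_poc: Callable[[Finding], bool]) -> list[Finding]:
    # Prepare and Scan are assumed to have already produced `candidates`.
    return prove(dedup(validate(candidates)), run_poc)


# Made-up candidate reports; function names and bug classes are invented.
candidates = [
    Finding("tcpip.sys", "parse_option", "out-of-bounds write", votes_for=3, votes_against=1),
    Finding("tcpip.sys", "parse_option", "out-of-bounds write", votes_for=2),  # duplicate report
    Finding("http.sys", "parse_header", "integer overflow", votes_for=1, votes_against=3),  # loses debate
    Finding("dnsapi.dll", "decode_name", "use-after-free", votes_for=4),
]

# Stand-in PoC runner: pretend only the tcpip.sys report reproduces.
confirmed = run_pipeline(candidates, run_poc=lambda f: f.component == "tcpip.sys")
print([f.component for f in confirmed])  # ['tcpip.sys']
```

In this sketch the cheap filters (debate, dedup) run before the expensive one (executing a proof-of-concept), so only surviving candidates pay the cost of the Prove stage; that matches the stage order Microsoft describes, though the rationale here is an assumption.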
The 16 vulnerabilities span tcpip.sys, ikeext.dll, http.sys, netlogon.dll, dnsapi.dll, and telnet.exe, a remarkably broad sweep of the Windows networking and authentication surface. The four Critical CVEs (CVE-2026-33827, CVE-2026-33824, CVE-2026-41089, and CVE-2026-41096) include flaws that Microsoft classifies as unauthenticated remote code execution. That bug class has historically ranked among the most dangerous because it is the building block of wormable attacks. The remaining 12 are rated Important.
Benchmarks released alongside the disclosure put MDASH at the top of CyberGym, a public leaderboard, with an 88.45% success rate against 1,507 real-world vulnerabilities. On Microsoft's internal StorageDrive test, the system caught all 21 planted bugs with zero false positives, and on historical MSRC cases it achieved 96% recall on the clfs.sys driver and 100% on tcpip.sys. Those numbers matter because vulnerability discovery has historically been bottlenecked on false-positive triage, not raw bug count.
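The distinction between recall and false positives can be made concrete with a few lines of arithmetic. The Python below is an illustrative sketch: `precision_recall` is a hypothetical helper using the standard definitions, the StorageDrive inputs come from the figures above (21 true positives, zero false positives, no missed bugs), and the 400-false-alarm contrast is an invented scenario, not a reported result.

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int) -> tuple[float, float]:
    """Return (precision, recall) for a bug-finding run.

    precision = TP / (TP + FP): share of reported bugs that are real.
    recall    = TP / (TP + FN): share of real bugs that were reported.
    """
    reported = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / reported if reported else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# StorageDrive figures as reported: all 21 planted bugs found, zero false positives.
print(precision_recall(21, 0, 0))  # (1.0, 1.0)

# Invented contrast: the same 21 real bugs buried among 400 false alarms.
p, r = precision_recall(21, 400, 0)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.050 recall=1.000
```

Both runs have perfect recall, but the second hands reviewers 421 reports to triage in order to surface 21 real bugs, which is the bottleneck a zero-false-positive result speaks to.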
The bigger story is that defensive security may be entering the same agentic step-change that coding has been going through for the past 18 months. If MDASH-class systems generalize beyond Microsoft's own codebase (the obvious next step), every major software vendor will face the same question: do you run an agentic auditor against your own product before an attacker does? Anthropic opened a public beta of Claude Security this month, and Google's Project Zero has been experimenting with LLM-assisted fuzzing for over a year, so the race is on.