Commerce's CAISI Signs Pre-Deployment Frontier AI Testing Pacts With Google DeepMind, Microsoft, and xAI
The Center for AI Standards and Innovation will get pre-release access to frontier models from Google DeepMind, Microsoft, and xAI — including versions with safeguards stripped — to evaluate cyber, bio, and chemical risk.
The Center for AI Standards and Innovation (CAISI), the AI evaluation arm housed inside the Commerce Department's National Institute of Standards and Technology, announced new frontier AI testing agreements with Google DeepMind, Microsoft, and xAI on May 5, 2026. Under the deals, all three labs will give the U.S. government pre-deployment access to their highest-capability models for security and capability evaluations, with additional assessments continuing after public release.
The agreements broaden the scope of what CAISI can probe. Evaluators will assess "demonstrable risks" tied to national security — explicitly including cybersecurity, biosecurity, and chemical-weapons capabilities — and will be permitted to test in classified environments. To make those tests meaningful, the developers have agreed to provide model variants with reduced or removed safety guardrails — the approach government red-teamers have argued is necessary to surface ceiling-level capabilities rather than the post-mitigation behavior end users see.
"These expanded industry collaborations help us scale our work in the public interest at a critical moment," CAISI Director Chris Fall said in a statement. The center has now completed more than 40 evaluations, and its remit was reset earlier this year to align with Commerce Secretary Howard Lutnick's directives and the administration's AI Action Plan, which leans more heavily on national security framing than the framework CAISI's predecessor body operated under in 2024.
The new arrangement updates partnerships that the U.S. AI Safety Institute, CAISI's predecessor, signed with OpenAI and Anthropic in August 2024. Those original deals focused on voluntary safety testing of frontier models. The terms with Google DeepMind, Microsoft, and xAI fold in security and capability assessment work that CAISI now treats as a formal pre-deployment gate, even though compliance remains voluntary in the absence of federal AI legislation.
For the labs, the deals are a hedge. Pre-deployment evaluations let them get ahead of likely regulatory scrutiny without conceding to a binding licensing regime, while also giving CAISI a clearer picture of which capabilities — autonomous cyber operations, dual-use biology, agentic tool use — are crossing thresholds that warrant new policy. Anthropic, notably, has been pushing for stronger formal oversight, and OpenAI's existing arrangement remains in place; the May 5 announcement explicitly positions the new agreements as additions to, not replacements for, the broader frontier testing program.