Research·4 min read·NVIDIA / The Robot Report

Nvidia Open-Sources Cosmos 3, Its “Think-Then-Act” Physical-AI Omnimodel — and a 6-Foot GR00T Humanoid to Run It

Beyond the Vera Rubin headlines at COMPUTEX 2026, Nvidia released Cosmos 3 — what it calls the first fully open physical-AI “omnimodel,” a mixture-of-transformers design that reasons about a scene before generating physically grounded actions like joint angles and gripper positions. It ships as Nano (16B) and Super (64B), tops open leaderboards including Physics-IQ and R-Bench, and arrives with the open Isaac GR00T reference humanoid — a 6-foot, 31-degree-of-freedom robot running on Nvidia Thor.

[Infographic: Nvidia’s open physical-AI stack at GTC Taipei, COMPUTEX 2026, covering the Cosmos 3 omnimodel (Nano 16B, Super 64B, Edge 2B to follow) and the Isaac GR00T reference humanoid. Source: NVIDIA blog, The Robot Report]

Lost in the Vera Rubin and RTX Spark headlines from Jensen Huang’s GTC Taipei keynote at COMPUTEX 2026 was arguably Nvidia’s most consequential bet on the next decade: Cosmos 3, which the company calls the first fully open physical-AI “omnimodel,” released alongside the open Isaac GR00T reference humanoid. Together they are Nvidia’s attempt to be for robots what open-weight large language models became for chatbots — the foundation layer everyone else builds on.

Cosmos 3’s pitch is that it learns to think before it acts. It uses a mixture-of-transformers architecture that splits the work in two: a reasoning block first interprets what is happening in a scene, and a generation block then uses that context to produce physically grounded outputs. Crucially, those outputs are not just text or video — Cosmos 3 has native action generation, emitting the numerical signals a robot actually needs, such as joint angles and gripper positions. In a single model it spans vision reasoning and multimodal generation across text, video, images, ambient sound and action.
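The two-stage flow described above can be illustrated with a toy sketch. Nothing here is Nvidia’s code or API: the names `reason`, `act`, `SceneContext` and `Action` are hypothetical, the observation format is invented, and both stages are simple hand-written rules standing in for the learned reasoning and generation blocks. It only shows the shape of the idea: interpret the scene first, then emit numbers a robot controller could consume.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SceneContext:
    """Output of the (hypothetical) reasoning stage."""
    target: str
    distance_m: float


@dataclass
class Action:
    """Numerical command: joint angles in radians, gripper opening in [0, 1]."""
    joint_angles: List[float]
    gripper: float


def reason(observation: Dict[str, Dict[str, float]]) -> SceneContext:
    """Stage 1: interpret the scene. Toy rule: pick the nearest object."""
    name, dist = min(observation["objects"].items(), key=lambda kv: kv[1])
    return SceneContext(target=name, distance_m=dist)


def act(ctx: SceneContext, num_joints: int = 7) -> Action:
    """Stage 2: turn the context into numbers. Toy rule, not a real controller."""
    reach = min(ctx.distance_m, 1.0)  # clamp to an assumed 1 m workspace
    joint_angles = [round(reach * 0.1 * (i + 1), 3) for i in range(num_joints)]
    gripper = 1.0 if ctx.distance_m > 0.05 else 0.0  # stay open until close
    return Action(joint_angles=joint_angles, gripper=gripper)


def think_then_act(observation: Dict[str, Dict[str, float]]) -> Action:
    """Run the reasoning stage, then feed its context to the action stage."""
    return act(reason(observation))


if __name__ == "__main__":
    obs = {"objects": {"cup": 0.5, "box": 1.2}}  # distances in meters (made up)
    print(think_then_act(obs))
```

The point of the split is that the action stage never sees raw observations, only the context the reasoning stage hands it, which mirrors the article’s description of a generation block that uses the reasoning block’s output.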

Nvidia shipped two sizes immediately: Cosmos 3 Nano at 16B parameters (an 8B reasoner paired with an 8B generator) and Cosmos 3 Super at 64B (a 32B reasoner and a 32B generator). A 2B Cosmos 3 Edge variant is promised for on-robot deployment later. The weights are open. Nvidia says the family ranks first on several open-weights leaderboards, including Physics-IQ, R-Bench and PAI-Bench, and leads VANTAGE-Bench for smart-infrastructure understanding and the TAR challenge for traffic-anomaly reasoning.
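For a rough sense of what those parameter counts mean in practice, the back-of-envelope sketch below converts them into weight storage. The source does not say what precision the weights ship in, so the byte widths are assumptions, and the figures cover weights only, not activations, caches or runtime overhead. The parameter counts are the nominal ones quoted above.

```python
# Assumed storage widths; the article does not state the released precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}

# Nominal (reasoner, generator) parameter counts as reported in the article.
MODELS = {
    "Cosmos 3 Nano": (8_000_000_000, 8_000_000_000),
    "Cosmos 3 Super": (32_000_000_000, 32_000_000_000),
}


def total_params(model: str) -> int:
    """Sum the reasoner and generator parameter counts."""
    reasoner, generator = MODELS[model]
    return reasoner + generator


def weight_gb(model: str, dtype: str) -> float:
    """Weights-only footprint in gigabytes (1 GB = 1e9 bytes)."""
    return total_params(model) * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name in MODELS:
        for dtype in BYTES_PER_PARAM:
            print(f"{name} @ {dtype}: {weight_gb(name, dtype):.0f} GB")
```

Under the 2-bytes-per-parameter assumption, the 16B model needs about 32 GB for weights and the 64B model about 128 GB, which helps explain why a much smaller 2B variant is the one earmarked for on-robot use.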

The point of all this is to collapse the cost of teaching machines to operate in the real world. Nvidia frames Cosmos 3 as a single model that can act as a vision-language model, a world model, a simulator or a robot policy, generating synthetic data for the long-tail edge cases that are expensive or dangerous to capture in reality — and, the company says, cutting robot training cycles from months to days. The use cases it demoed run from pick-and-place manipulation and humanoid policy development to autonomous-vehicle scenario prediction, factory forklift-trajectory safety and smart-city video analysis across thousands of feeds.

To give that software a body, Nvidia introduced the Isaac GR00T reference humanoid — billed as the industry’s first fully open reference design, meant to clean up a badly fragmented robot-hardware and simulation landscape. It is a six-foot, roughly 150-pound humanoid with 31 degrees of freedom, running on Nvidia’s Thor onboard computer, and it ships as an open hardware-and-software blueprint aimed squarely at university labs. Nvidia paired it with an open-source toolkit of agents and skills and named robotics partners across the U.S., Europe, South Korea and China — including Unitree.

Strategically, the release rounds out the end-to-end stack Huang spent the rest of the keynote describing. Nvidia now provides the training silicon (Vera Rubin), the onboard compute (Thor), the simulation environment (Omniverse and Isaac), the foundation model (Cosmos 3) and a reference robot to run it all. By open-sourcing the model and the humanoid blueprint, it is attempting for robotics what open-weight LLMs did for the agent boom: seeding a generation of developers on its platform before any rival can. The question is no longer whether Nvidia supplies the robotics industry’s chips, but whether it also defines its software.
