Majestic Labs Unveils Prometheus: AI Server With 128TB Memory and 1,000x More Capacity Than GPUs
Founded by ex-Google and Meta chip veterans, Majestic Labs has revealed a server purpose-built to break AI's memory wall — packing up to 128TB of unified memory to run multi-trillion-parameter models in a single box.
Majestic Labs on April 28, 2026, unveiled Prometheus, a new server architecture designed from the ground up to attack what the company calls the most critical bottleneck in modern AI: the memory wall. Prometheus is configurable with up to 128 terabytes of high-speed memory in a standard-size server — roughly 1,000 times the high-bandwidth memory connected to each processor in leading GPU systems — and exposes the entire pool as a uniform, contiguous address space at full bandwidth.
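The headline multiplier is easy to sanity-check: assuming roughly 128 GB of high-bandwidth memory per accelerator, in the range of current flagship data-center GPUs (about 80 to 192 GB), the ratio falls out directly.

```python
# Sanity check of the "roughly 1,000x" figure.
# The per-GPU HBM number is an assumption, not a quoted spec.
prometheus_gb = 128 * 1024  # 128 TB of pooled memory, expressed in GB
hbm_per_gpu_gb = 128        # assumed HBM per flagship GPU (real parts span ~80 to 192 GB)

print(f"{prometheus_gb / hbm_per_gpu_gb:.0f}x")  # prints "1024x", i.e. roughly 1,000x
```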
The startup was founded by Ofer Shacham (CEO), Sha Rabii (President), and Masumi Reynders (COO), who previously designed custom chips at Google and Meta that shipped in the hundreds of millions of units. The company emerged from stealth in late 2025 with $100 million in funding to rebuild AI servers around memory rather than raw compute. "Prometheus represents the first ground-up reimagining of AI infrastructure with memory as a first-class citizen," Shacham said in the launch announcement.
According to Majestic Labs, Prometheus is the first commercial system capable of running today's most advanced multi-trillion-parameter models in a single server, and is targeted at workloads that struggle on conventional GPU clusters: LLMs with context windows running into the hundreds of millions of tokens, large mixture-of-experts models, agentic AI systems, graph neural networks, and very large tabular models. The platform supports industry-standard frameworks including PyTorch, vLLM, and OpenAI's Triton compiler, aiming to slot into existing AI software stacks rather than force teams to retool.
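Rough arithmetic shows why such workloads overwhelm per-GPU memory. The model figures below are illustrative assumptions, not Majestic Labs or vendor specifications:

```python
# Back-of-envelope memory math for the workloads named above.
# All model figures are assumptions chosen for illustration.

TIB = 1024**4  # bytes per tebibyte

# Weights: a hypothetical 2-trillion-parameter model stored in 16-bit precision.
params = 2e12
weights_tib = params * 2 / TIB  # ~3.6 TiB, already beyond any single GPU's HBM

# KV cache: roughly 0.4 MB per token is plausible for a large model using
# grouped-query attention (2 tensors x ~100 layers x ~1,024 KV dims x 2 bytes).
# At a 100-million-token context window:
kv_cache_tib = 0.4e6 * 100e6 / TIB  # ~36 TiB for a single such context

print(f"weights ~{weights_tib:.1f} TiB, KV cache ~{kv_cache_tib:.0f} TiB")
# Together these fit comfortably inside 128 TB, while a single GPU's HBM
# (on the order of 0.1 to 0.2 TB) holds only a small fraction of either.
```

On a conventional cluster, those tens of terabytes must be sharded across dozens of GPUs and shuttled over interconnects; in a single 128TB pool they are simply resident.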
The system is in development with early customers; broader availability is planned for 2027. Prometheus arrives at a moment when hyperscalers and frontier labs are increasingly bottlenecked not by FLOPs but by the cost and complexity of stitching huge models across racks of GPUs over high-speed interconnects. By collapsing those racks into a single memory-coherent box, Majestic Labs is making an explicit bet that the next phase of AI scaling will be won by architectures that move memory closer to compute — and is positioning itself as a challenger to the GPU-centric infrastructure that has powered the AI boom so far.