The market for cutting-edge ML compute is broken. Startups, researchers and even big AI labs are scrambling to buy or rent access to the latest chips for ML training. But demand far outstrips supply, and what’s available is only accessible to the well-resourced, placing an artificial damper on innovation.
Today, we’re launching Voltage Park, and we’re on a mission to make machine learning infrastructure accessible to all, from large enterprises and research universities to seed-stage startups and nonprofits.
“Not enough people understand how much the compute shortage is affecting AI innovators,” said Voltage Park CEO Eric Park. “ML teams and AI founders have to wait months or pay exorbitant sums to access the latest hardware to train their models. We hope to redress this imbalance and accelerate cutting-edge work in AI.”
The ML compute scramble
The current market for ML compute is causing three huge headaches across the ecosystem:
- Long-Term Contracts: Many providers have rigid contracts that force companies to lease large compute clusters for several years. Smaller companies need much more agility — and often only a few machines.
- Availability: Companies that can afford to buy are facing long lead times, forcing them to wait while their competition passes them by.
- Cost: GPU rental rates from large cloud providers are often out of reach for startups and research labs. For teams working on larger models, every cent counts, and higher per-hour rates can add millions of dollars to training costs.
Unveiling one of the largest ML compute clouds in the world
With around 24,000 NVIDIA H100 GPUs, the Voltage Park cloud is one of the most powerful collections of cutting-edge ML compute in the world. Our clusters consist of 80GB H100 SXM5 GPUs fully interconnected with 3.2 Tb/s InfiniBand. We currently offer bare-metal access for large-scale users who need peak performance. As we spin up our infrastructure, we will soon add short-term leases and hourly billing, along with support for familiar tools like Slurm, Kubernetes, and Mosaic for easy integration into existing training frameworks.
Voltage Park clusters are already serving exciting AI companies, including Imbue, and we are finalizing clusters for other AI leaders like Character.ai and Atomic AI. We expect our remaining compute to come online by early next year.
“Voltage Park helped us access critical compute much more quickly than other providers could have,” said Kanjun Qiu, CEO of Imbue. “Our training requirements are demanding, and their team helped us get the very best performance out of our models (and with super responsive support, too). I hope their infrastructure helps more ML teams train and deploy state-of-the-art models fast.”
As we build out our infrastructure, we want to hear from you about how we can engineer our clusters to support the use cases you need most, whether for experimentation, training, fine-tuning, or inference.
If you’re interested in compute, grab a spot on our waitlist by signing up on our website. Supply is first come, first served.
If building a cloud infrastructure business from the ground up piques your interest, we’re hiring!