Nebius AI Cloud provides infrastructure designed for AI developers and practitioners to build and deploy generative AI applications. The platform addresses the need for running and training machine learning models by offering resources for developing AI solutions in a secure and performance-oriented cloud environment. Its services aim to support efficient handling of workloads associated with AI development across various organizational contexts.
The quality of the GPUs is excellent, with consistently strong and predictable performance, and perfect all-to-all connectivity. I also really value the managed services, which are easy to set up and which we use constantly (e.g. managed k8s and SkyPilot). The platform has been very reliable and we have not experienced any issues on the hardware side.
Strong CLI support, managed Kubernetes, and smooth integration with existing workflows.
The cluster works well and it's robust. We rarely had problems with failing nodes during training (we were using more than 15 nodes for post-training).
Capacity planning is less predictable, and sometimes no on-demand resources are available. There are occasional small bugs in the UI, e.g. a missing pay button, and the resolution took longer than expected. It didn't cause us problems, though; we just couldn't pay them.
- Capacity issues in certain regions / GPU types
- Unexpected disruptions when using on-demand instances (due to preemption)
- Limited predictability in resource availability
It's hard to find some GPUs. For example, there was limited availability for H200s, so we had to switch to B300s to increase our compute.