
Built for AI workloads
Swiss cloud with L40S GPUs

Run large language models, train deep learning systems and accelerate inference.
With access to NVIDIA L40S GPUs – directly from our datacenters in Switzerland.

AI & LLM
Ready

Optimized for training and inference of transformer models and deep neural networks.

Fast
Storage

Blazing-fast access to your training data, with local NVMe SSD scratch disks scalable from 200 GB up to 1'600 GB.

Scalable
Infrastructure

Easily spin up virtual machines with 1–4 L40S GPUs. Add or remove capacity as your project evolves. From less than 2.20 CHF per hour.

MLOps and
DevOps Ready

Fully automate the provisioning of your GPUs. Get your GPUs as a Service.

Ideal for

  • Training and fine-tuning large language models (LLMs)
  • Running inference pipelines for GenAI and vision models
  • GPU acceleration for NLP, CV and deep learning
  • Academic and commercial AI research
  • Rendering or simulation with CUDA-based software

Performance that scales

Optimized for multi-GPU scalability – you can get up to four GPUs per instance. GPU servers are based on our “Dedicated CPU Cores” offering with the full performance of the selected number of CPU cores. You rent the full performance and can scale the memory or CPU at any time. Check out the Pricing page for all available configurations.

Dedicated L40S GPUs

  • 48 GB VRAM per GPU
  • Ada Lovelace architecture
  • Full support for CUDA, TensorRT, PyTorch and TensorFlow
Discover Pricing Details
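To get a feel for what 48 GB of VRAM per GPU means in practice, here is a rough back-of-the-envelope sketch of how many L40S cards a model's weights alone would occupy. The bytes-per-parameter figures are common rules of thumb (fp16 = 2 bytes), not cloudscale specifications, and real deployments also need headroom for the KV cache and activations.

```python
import math

# Rough check: do a model's weights fit in L40S VRAM?
# Illustrative rule-of-thumb numbers, not cloudscale specifications.

VRAM_PER_L40S_GIB = 48  # VRAM per NVIDIA L40S GPU


def weights_gib(params_billion: float, bytes_per_param: float) -> float:
    """GiB required for the model weights alone (no KV cache, no activations)."""
    return params_billion * 1e9 * bytes_per_param / 2**30


for params, precision, nbytes in [(7, "fp16", 2), (13, "fp16", 2), (70, "fp16", 2)]:
    need = weights_gib(params, nbytes)
    gpus = math.ceil(need / VRAM_PER_L40S_GIB)
    print(f"{params}B @ {precision}: ~{need:.0f} GiB -> {gpus} x L40S")
```

By this estimate, a 7B or 13B model in fp16 fits comfortably on a single L40S, while a 70B model in fp16 needs around three GPUs for the weights alone – within the four-GPU-per-instance limit.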

Data sovereignty for AI
Run your workloads entirely in our datacenters, so your data remains in Switzerland – delivering the privacy, legal certainty and low‑latency performance you expect.

Even in heavily regulated industries?
At cloudscale, we support our customers in meeting the demands placed on them. In addition to our ISO/IEC 27001 certification, an ISAE 3000 report is available so that processes outsourced to us can be covered in your own reporting.

Learn more about security and compliance

FAQs

Can I run LLM training or fine-tuning jobs?
Absolutely. The NVIDIA L40S is well-suited for LLM workloads, offering 48 GB of VRAM and strong multi-GPU scaling – ideal for training and fine-tuning large models efficiently.

Is there a minimum commitment?
No. Billing is per second. Create and remove GPU servers on demand. You only pay for what you use, just as with all our other cloud services.

Can I automate my GPU server?
Yes, this service is also available via our API and through tools like our Terraform provider. Whether you are working in DevOps or MLOps, you can seamlessly integrate GPU provisioning into your workflows. There is nothing stopping you from fully automating the lifecycle of your GPU servers – from deployment to teardown.
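As a minimal sketch of API-based provisioning, the snippet below assembles the JSON body for a server-create request. The endpoint and general shape follow the public cloudscale API, but the GPU flavor slug, zone and SSH key shown here are placeholders – look up the real values in the API documentation or via the flavors endpoint before sending anything.

```python
import json

# Sketch: payload for creating a GPU server via the cloudscale API.
# Placeholder values below (flavor slug, zone, SSH key) must be
# replaced with real ones from the API docs / your account.
API_URL = "https://api.cloudscale.ch/v1/servers"

payload = {
    "name": "gpu-worker-1",
    "flavor": "<gpu-flavor-slug>",   # hypothetical placeholder
    "image": "ubuntu-24.04",
    "zone": "lpg1",                  # example zone; verify against the docs
    "ssh_keys": ["ssh-ed25519 AAAA... user@host"],
}

body = json.dumps(payload)
# Send with an authenticated POST, e.g.:
#   requests.post(API_URL, json=payload,
#                 headers={"Authorization": f"Bearer {API_TOKEN}"})
```

The same request maps one-to-one onto a Terraform resource, which is usually the better fit when the server should be part of a reproducible, version-controlled setup.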

What about NVIDIA A100, H100 or H200 cards?
Currently, we only offer NVIDIA L40S cards because of their great cost-to-performance ratio. For many workloads, two NVIDIA L40S GPUs can deliver performance comparable to those cards at a lower cost.

Do you support larger setups or GPU clusters?
Yes. With Floating IPs (even across our geographically separate cloud locations), private networking and our Load Balancer service, you can seamlessly distribute, migrate and scale your GPU workloads across multiple servers – including setups that go beyond four L40S GPUs.

For which use cases are your GPUs recommended?
Our GPUs suit everything from startup AI experiments to enterprise-scale LLM inference based in Switzerland. For a hands-on example, see our blog post on how we set up DeepSeek in our cloud with Ollama.

Try it yourself
Launch a dedicated GPU server