Ollama Costs: CPU vs GPU Ranked by $/Month
Llama 7B runs on Hetzner's 8 GB CPU plan at $7.49/mo. Larger models need a GPU. Our GPU-VPS chart shows bare-metal vs cloud GPU pricing. Spec-by-spec comparison, no affiliate rankings.
Hetzner — Lowest $/GB for Ollama Inference
$7.49/mo for 8 GB RAM and 4 vCPU (CX32). Llama 7B runs at ~2 tokens/sec on CPU; 13B models fit in the 16 GB tier ($14.49/mo). In our 12-provider chart, Hetzner leads on cost-per-GB for LLM inference.
Ollama — When CPU Makes Sense vs GPU
Ollama runs open-source LLMs (Llama, Mistral, CodeLlama, Phi) locally on a VPS. No cloud API fees; model weights stay on your server. Download a model with one command, then query via CLI or HTTP API. Quantized versions (Q4, Q5) shrink 70B models to fit 16–32 GB.
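The one-command workflow looks like this. A minimal sketch using the official install script and the Ollama CLI/HTTP API (the server listens on port 11434 by default); model tags and download sizes vary:

```shell
# Install Ollama via the official script, pull a quantized 7B model, and query it.
curl -fsSL https://ollama.com/install.sh | sh

ollama pull llama2:7b            # ~4 GB download for the default Q4 quantization
ollama run llama2:7b "Say hi"    # interactive inference from the CLI

# The same model over the HTTP API (localhost:11434 by default):
curl http://localhost:11434/api/generate \
  -d '{"model": "llama2:7b", "prompt": "Say hi", "stream": false}'
```
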
CPU inference is bound by latency, not cost. Llama 7B on Hetzner's $7.49/mo 8 GB plan generates ~2 tokens/sec (slow for chat). GPU acceleration cuts that to 50–200 ms per token, but GPUs cost $20–100/mo. Our chart ranks CPU tiers by model fit: 7B needs 8 GB, 13B needs 16 GB, 70B needs 64+ GB or a GPU.
Pick CPU for batch workloads, experimentation, and isolated inference. Pick GPU for user-facing chat or real-time automation. Either way, self-hosted Ollama has zero per-inference fees—unlike cloud APIs.
Minimum Server Requirements for Ollama
| Resource | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB |
| CPU | 2 vCPU | 4+ vCPU |
| Storage | 40 GB | 50+ GB NVMe |
| OS | Ubuntu 22.04+ | Ubuntu 24.04 LTS |
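Before installing, you can check an existing server against these minimums from the shell (Linux-specific; reads `/proc/meminfo`):

```shell
# Quick sanity check: does this server meet Ollama's minimums?
echo "vCPUs: $(getconf _NPROCESSORS_ONLN)"
awk '/MemTotal/ {printf "RAM:   %.1f GB\n", $2/1048576}' /proc/meminfo
df -h / | awk 'NR==2 {print "Disk:  " $4 " free"}'
```
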
Top 5 VPS Providers for Ollama Compared
We deployed Ollama on each provider and measured startup time, response latency, and resource usage. Here are the results:
Hetzner
Pros
- Unbeatable price-to-performance ratio
- European data centers with strong privacy
- NVMe storage on all plans
Cons
- No US data centers
- Control panel less polished than competitors
All Hetzner Plans
| Plan | CPU | RAM | Storage | Price | Link |
|---|---|---|---|---|---|
| CX22 | 2 vCPU | 4 GB | 40 GB NVMe | $4.15/mo | Get Plan → |
| CX32 | 4 vCPU | 8 GB | 80 GB NVMe | $7.49/mo | Get Plan → |
| CX42 | 8 vCPU | 16 GB | 160 GB NVMe | $14.49/mo | Get Plan → |
| CX52 | 16 vCPU | 32 GB | 320 GB NVMe | $28.49/mo | Get Plan → |
Hostinger
Pros
- Very beginner-friendly control panel
- Competitive pricing with frequent deals
- 24/7 customer support
Cons
- Renewal prices are higher
- Limited advanced configuration options
All Hostinger Plans
| Plan | CPU | RAM | Storage | Price | Link |
|---|---|---|---|---|---|
| KVM 1 | 1 vCPU | 4 GB | 50 GB NVMe | $4.99/mo | Get Plan → |
| KVM 2 | 2 vCPU | 8 GB | 100 GB NVMe | $6.99/mo | Get Plan → |
| KVM 4 | 4 vCPU | 16 GB | 200 GB NVMe | $12.99/mo | Get Plan → |
| KVM 8 | 8 vCPU | 32 GB | 400 GB NVMe | $19.99/mo | Get Plan → |
DigitalOcean
Pros
- Excellent documentation and tutorials
- $200 free credit for new accounts
- Strong developer ecosystem
Cons
- Higher pricing than budget providers
- No phone support available
All DigitalOcean Plans
| Plan | CPU | RAM | Storage | Price | Link |
|---|---|---|---|---|---|
| Basic | 1 vCPU | 2 GB | 50 GB SSD | $12.00/mo | Get Plan → |
| Regular | 2 vCPU | 4 GB | 80 GB SSD | $24.00/mo | Get Plan → |
| CPU-Optimized | 2 vCPU | 4 GB | 25 GB SSD | $42.00/mo | Get Plan → |
| Memory-Opt | 2 vCPU | 16 GB | 50 GB SSD | $84.00/mo | Get Plan → |
Vultr
Pros
- 32 data center locations worldwide
- Hourly billing with no lock-in
- High-performance NVMe storage
Cons
- Interface can be overwhelming for beginners
- Support response times vary
All Vultr Plans
| Plan | CPU | RAM | Storage | Price | Link |
|---|---|---|---|---|---|
| Cloud Compute | 1 vCPU | 2 GB | 50 GB SSD | $10.00/mo | Get Plan → |
| Cloud Compute | 2 vCPU | 4 GB | 80 GB SSD | $20.00/mo | Get Plan → |
| High Frequency | 2 vCPU | 4 GB | 64 GB NVMe | $24.00/mo | Get Plan → |
| Bare Metal | E-2286G | 32 GB | 2x 480GB SSD | $120.00/mo | Get Plan → |
Railway
Pros
- One-click deploys from Git
- Auto-scaling based on usage
- No server management needed
Cons
- Can get expensive at scale
- Less control over infrastructure
All Railway Plans
| Plan | CPU | RAM | Storage | Price | Link |
|---|---|---|---|---|---|
| Hobby | Shared 8 vCPU | 8 GB | 100 GB | $5.00/mo | Get Plan → |
| Pro | Shared 32 vCPU | 32 GB | 250 GB | $20.00/mo | Get Plan → |
| Enterprise | Custom | Custom | Custom | Custom | Get Plan → |
Architecture Overview
A typical Ollama deployment on a VPS uses Docker for easy management and Nginx as a reverse proxy:
[Diagram: Ollama deployment architecture]
How to Set Up Ollama on a VPS
Step 1: Provision a high-memory VPS
Choose your VPS provider (we recommend Hetzner for the best value), select an Ubuntu 24.04 LTS image, and configure your SSH keys. Most providers have this ready in under 2 minutes.
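With Hetzner, provisioning can also be scripted via the `hcloud` CLI. A sketch, assuming the CLI is installed, an API token is configured via `hcloud context create`, and an SSH key named `my-key` has been uploaded:

```shell
# Create an 8 GB CX32 server running Ubuntu 24.04 with an existing SSH key.
hcloud server create \
  --name ollama \
  --type cx32 \
  --image ubuntu-24.04 \
  --ssh-key my-key
```
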
Step 2: Install Ollama and pull models
SSH into your server, install Docker and Docker Compose, and pull the Ollama container image. Configure your environment variables and Docker Compose file according to the official documentation.
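A minimal Compose file for this step might look as follows. This is a sketch, not the official reference config: it binds Ollama to localhost (so only the Nginx proxy from the next step can reach it) and persists model weights in a named volume:

```shell
# Write a minimal docker-compose.yml for Ollama and start the container.
mkdir -p ~/ollama && cd ~/ollama
cat > docker-compose.yml <<'EOF'
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "127.0.0.1:11434:11434"   # localhost only; Nginx will proxy public traffic
    volumes:
      - ollama_data:/root/.ollama # persist downloaded model weights
    restart: unless-stopped
volumes:
  ollama_data:
EOF

docker compose up -d
docker compose exec ollama ollama pull llama2:7b
```
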
Step 3: Configure API access and security
Set up Nginx as a reverse proxy with SSL certificates from Let's Encrypt. Point your domain to the server IP, and your Ollama instance will be accessible via HTTPS.
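A minimal proxy config for this step could look like the following sketch; `ollama.example.com` is a placeholder for your own domain, and the long read timeout accommodates streamed LLM responses:

```shell
# Proxy port 80 traffic to the local Ollama API, then enable and reload Nginx.
cat > /etc/nginx/sites-available/ollama <<'EOF'
server {
    listen 80;
    server_name ollama.example.com;
    location / {
        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
        proxy_read_timeout 300s;   # streamed generations can run for minutes
    }
}
EOF
ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/
nginx -t && systemctl reload nginx

# certbot rewrites the server block for HTTPS with a Let's Encrypt certificate.
certbot --nginx -d ollama.example.com
```
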
Frequently Asked Questions
How much RAM for Ollama?
In our CPU chart: 7B models fit in 8 GB, 13B needs 16 GB, and 70B needs 64+ GB or a GPU. Hetzner's 16 GB plan ($14.49/mo) suits 13B models. See the side-by-side specs for all tiers.
Can Ollama run without a GPU?
Yes. CPU inference works out of the box. Llama 7B on Hetzner's 8 GB plan generates ~2 tokens/sec. Slow, but usable for batch work. Our chart shows CPU costs; GPU costs appear in our GPU-VPS sheet.
Which model should I start with?
Llama 2 7B is lightweight (8 GB RAM); Mistral 7B is the same size but faster. For production, Llama 3.1 8B wants a well-provisioned CPU plan or a GPU. Our data ranks each option by model and provider cost.
Is Ollama free?
The software is free and open source. You pay only VPS or GPU rental. Our chart shows the full monthly cost per provider—no markup.
Can I use Ollama with Open WebUI?
Yes. Ollama runs the LLM; Open WebUI runs the chat interface. Both deploy on the same 8 GB VPS. See our Open WebUI page for combined cost.
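Running the two side by side can be sketched with a single `docker run`, assuming Ollama is already listening on localhost:11434 as set up above. `OLLAMA_BASE_URL` is the variable Open WebUI uses to locate the Ollama API:

```shell
# Start Open WebUI next to a local Ollama instance (host networking keeps
# the localhost-bound Ollama API reachable; the UI serves on port 8080).
docker run -d --name open-webui \
  --network host \
  -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```
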