Serverless API (LoRA) is billed based on GPU instance uptime (per-second billing). This is different from the Model API, which is billed per request.
Pricing Overview
Serverless API (LoRA) supports two billing plans:
- Pay as You Go: standard hourly rates by machine tier.
- Pro (subscription): 20%–30% discount on Pay as You Go rates.
Prices and machine availability may change. Refer to RunComfy Pricing for the latest machine rates, plan benefits, and extras.
| Machine Type | GPU Options | VRAM | RAM | vCPUs | Pay as You Go Price | Pro Price |
|---|---|---|---|---|---|---|
| Medium | T4, A4000 | 16GB | 16GB | 8 | $0.99/hour | $0.79/hour |
| Large | A10G, A5000 | 24GB | 32GB | 8 | $1.75/hour | $1.39/hour |
| X-Large | A6000 | 48GB | 48GB | 28 | $2.50/hour | $1.99/hour |
| X-Large Plus | L40S, L40 | 48GB | 64GB | 28 | $2.99/hour | $2.15/hour |
| 2X-Large | A100 | 80GB | 96GB | 28 | $4.99/hour | $3.99/hour |
| 2X-Large Plus | H100 | 80GB | 180GB | 28 | $7.49/hour | $5.99/hour |
| 3X-Large | H200 | 141GB | 240GB | 24 | $8.75/hour | $6.99/hour |
How Billing Works
Billing is usage-based and calculated per second:
- Billing starts when an instance is signaled to wake up.
- Billing stops when the instance is fully shut down.
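Per-second billing means the hourly machine rate is prorated by actual uptime. A minimal sketch of that arithmetic (the helper function is illustrative, not part of any RunComfy SDK):

```python
def billed_cost(uptime_seconds: float, hourly_rate: float) -> float:
    """Prorate an hourly machine rate by per-second uptime."""
    return uptime_seconds * hourly_rate / 3600

# e.g. a Large machine at the Pay as You Go rate ($1.75/hour),
# up for 90 seconds from wake-up signal to full shutdown:
cost = billed_cost(90, 1.75)  # $0.04375
```

A full hour of uptime reproduces the listed hourly rate exactly: `billed_cost(3600, 1.75)` is `1.75`.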
Your deployment can run a mix of persistent and on-demand instances, controlled by minimum_instances and maximum_instances.
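The two parameters named above might be set like this (only `minimum_instances` and `maximum_instances` appear on this page; the surrounding structure is an illustrative assumption, not the actual deployment schema):

```python
# Illustrative scaling settings for a deployment (structure is an
# assumption; only the two field names come from this page).
deployment_scaling = {
    "minimum_instances": 1,  # 1 persistent instance, billed for all uptime
    "maximum_instances": 5,  # up to 4 more on-demand instances under load
}

# On-demand headroom above the persistent floor:
on_demand_capacity = (
    deployment_scaling["maximum_instances"]
    - deployment_scaling["minimum_instances"]
)  # 4
```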
Persistent Instances
- Set minimum_instances > 0 to keep that many instances running.
- You are billed for the full uptime (including idle time) until you scale down.
On-Demand Instances
- Additional instances spin up to handle demand above minimum_instances (or all demand when minimum_instances = 0).
- They scale down after the keep-warm period.
- You are billed for cold start, execution, and keep-warm time.
Instance Cost Breakdown
- Cold start / warm-up time: instance boots and loads models/assets. Duration depends on machine tier, workflow complexity, and model size.
- Execution time: workflows run. This is the main compute time.
- Keep-warm time: idle time before scale-down. This time is billed.
Note: You may also see Queue Time (waiting for resources or concurrency). Queue time is not billed.
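Putting the phases together: an instance's billed time is the sum of cold start, execution, and keep-warm; queue time is excluded. A small sketch of that accounting (the phase names and durations are illustrative):

```python
def billed_seconds(phases: dict) -> float:
    """Sum the billed phases of an instance's lifecycle.
    Queue time is deliberately excluded: it is not billed."""
    return phases["cold_start"] + phases["execution"] + phases["keep_warm"]

# Hypothetical on-demand run, durations in seconds:
run = {"queue": 12, "cold_start": 45, "execution": 120, "keep_warm": 60}
billed = billed_seconds(run)  # 225 seconds billed; the 12s in queue is free
```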
Controlling Cost
- Scale to zero: set minimum_instances = 0 to avoid idle cost (the first request may be slower due to cold start).
- Cap concurrency: keep maximum_instances conservative to limit parallel capacity and spend.
- Tune keep-warm: a shorter keep-warm period lowers idle cost; a longer one reduces cold starts during bursty traffic.
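To see why scale-to-zero matters, compare an always-on instance against pure on-demand usage. A rough back-of-the-envelope sketch (the 730 hours/month figure and the 40-hour usage level are illustrative assumptions; the $0.79/hour Medium Pro rate is from the table above):

```python
HOURS_PER_MONTH = 730  # approximate average month length

def monthly_persistent_cost(hourly_rate: float, instances: int = 1) -> float:
    """Cost of keeping `instances` running 24/7 (minimum_instances > 0)."""
    return hourly_rate * HOURS_PER_MONTH * instances

def monthly_on_demand_cost(hourly_rate: float, billed_hours: float) -> float:
    """Scale-to-zero: pay only for cold-start + execution + keep-warm hours."""
    return hourly_rate * billed_hours

# Medium tier at the Pro rate ($0.79/hour), 40 billed hours of real usage:
always_on = monthly_persistent_cost(0.79)     # $576.70/month
on_demand = monthly_on_demand_cost(0.79, 40)  # $31.60/month
```

The trade-off is latency: the on-demand deployment pays for that saving with a cold start on the first request after each idle period.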
Support
If you believe you’ve been incorrectly billed, contact [email protected] with your deployment ID, request ID (if applicable), and the approximate time of the issue.