Serverless API (LoRA) is billed based on GPU instance uptime (per-second billing). This is different from the Model API, which is billed per request.

Pricing Overview

Serverless API (LoRA) supports two billing plans:
  • Pay as You Go: standard hourly rates by machine tier.
  • Pro (subscription): 20%–30% discount on Pay as You Go rates.
Prices and machine availability may change. Refer to RunComfy Pricing for the latest machine rates, plan benefits, and extras.
Machine Type  | GPU Options | VRAM  | RAM   | vCPUs | Pay as You Go Price | Pro Price
Medium        | T4, A4000   | 16GB  | 16GB  | 8     | $0.99/hour          | $0.79/hour
Large         | A10G, A5000 | 24GB  | 32GB  | 8     | $1.75/hour          | $1.39/hour
X-Large       | A6000       | 48GB  | 48GB  | 28    | $2.50/hour          | $1.99/hour
X-Large Plus  | L40S, L40   | 48GB  | 64GB  | 28    | $2.99/hour          | $2.15/hour
2X-Large      | A100        | 80GB  | 96GB  | 28    | $4.99/hour          | $3.99/hour
2X-Large Plus | H100        | 80GB  | 180GB | 28    | $7.49/hour          | $5.99/hour
3X-Large      | H200        | 141GB | 240GB | 24    | $8.75/hour          | $6.99/hour
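
Because billing is per second, you can estimate the cost of a run by prorating the hourly rate. A minimal sketch, assuming the per-second rate is a straight division of the hourly rate by 3,600 (the rates and tier names come from the table above):

```python
# Estimate the cost of billed uptime by prorating hourly rates per second.
# Rates are the Pay as You Go prices from the table above.
HOURLY_RATES = {
    "Medium": 0.99,
    "Large": 1.75,
    "X-Large": 2.50,
    "X-Large Plus": 2.99,
    "2X-Large": 4.99,
    "2X-Large Plus": 7.49,
    "3X-Large": 8.75,
}

def estimate_cost(machine_type: str, billed_seconds: float) -> float:
    """Estimated USD cost for the given billed uptime on one instance."""
    return HOURLY_RATES[machine_type] / 3600 * billed_seconds

# Example: 90 seconds of billed uptime on a Large machine.
print(f"${estimate_cost('Large', 90):.4f}")  # roughly $0.044
```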

How Billing Works

Billing is usage-based and calculated per second:
  • Billing starts when an instance is signaled to wake up.
  • Billing stops when the instance is fully shut down.
Your deployment can run a mix of persistent and on-demand instances, controlled by minimum_instances and maximum_instances.
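
For illustration, the two knobs might sit in your deployment's autoscaling settings like this. Only the field names minimum_instances and maximum_instances come from this page; the surrounding structure is a hypothetical sketch, not RunComfy's exact config schema:

```python
# Hypothetical autoscaling settings for one deployment (structure is
# illustrative; only the two field names are documented on this page).
autoscaling = {
    "minimum_instances": 1,  # 1 persistent instance, billed even while idle
    "maximum_instances": 4,  # up to 3 additional on-demand instances under load
}
```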

Persistent Instances

  • Set minimum_instances > 0 to keep that many instances running.
  • You are billed for the full uptime (including idle time) until you scale down.
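
A persistent instance therefore accrues cost around the clock. A quick sketch of the math, using the Pay as You Go rate for the Large tier from the table above:

```python
# Monthly cost of keeping persistent instances running around the clock.
hourly_rate = 1.75        # Large tier, Pay as You Go ($/hour)
minimum_instances = 1
hours_per_month = 730     # ~24 * 365 / 12

monthly_cost = hourly_rate * minimum_instances * hours_per_month
print(f"${monthly_cost:,.2f}/month")  # $1,277.50/month
```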

On-Demand Instances

  • Additional instances spin up to handle demand above minimum_instances (or all demand when minimum_instances = 0).
  • They scale down after the keep-warm period.
  • You are billed for cold start, execution, and keep-warm time.

Instance Cost Breakdown

  • Cold start / warm-up time: instance boots and loads models/assets. Duration depends on machine tier, workflow complexity, and model size.
  • Execution time: workflows run. This is the main compute time.
  • Keep-warm time: idle time before scale-down. This time is billed.
Note: You may also see Queue Time (waiting for resources or concurrency). Queue time is not billed.
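
Putting the phases together: for one on-demand instance cycle, billed uptime is the sum of cold start, execution, and keep-warm time, and queue time adds nothing. A minimal sketch (the 40s/75s/60s durations are illustrative assumptions, not measured values):

```python
def billed_seconds(cold_start: float, execution: float, keep_warm: float) -> float:
    """Billed uptime for one on-demand instance cycle.

    Queue time is deliberately absent: waiting for resources or
    concurrency is not billed.
    """
    return cold_start + execution + keep_warm

# Example: 40s cold start + 75s execution + 60s keep-warm on a Large machine.
seconds = billed_seconds(cold_start=40, execution=75, keep_warm=60)
print(f"{seconds:.0f}s billed, about ${seconds * 1.75 / 3600:.4f}")  # 175s, ~$0.0851
```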

Controlling Cost

  • Scale to zero: set minimum_instances = 0 to avoid idle cost (the first request may be slower due to cold start).
  • Cap concurrency: keep maximum_instances conservative to limit parallel capacity and spend.
  • Tune keep-warm: a shorter keep-warm lowers idle cost; a longer keep-warm reduces cold starts during bursty traffic (see the sketch below).
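
To make the keep-warm trade-off concrete, here is a rough back-of-the-envelope comparison. The durations are illustrative assumptions; plug in your own tier rate and measured cold-start time:

```python
# Rough keep-warm trade-off on a Large machine (Pay as You Go).
rate_per_second = 1.75 / 3600   # $/second
keep_warm_s = 300               # assumed idle window before scale-down
cold_start_s = 40               # assumed warm-up time for this workflow

idle_cost = keep_warm_s * rate_per_second         # billed per scale-down
cold_start_cost = cold_start_s * rate_per_second  # billed per cold start

print(f"idle window: ${idle_cost:.3f}, one cold start: ${cold_start_cost:.3f}")
# A longer window bills more idle time, but every request that lands inside it
# skips both the cold-start charge and, more importantly, the cold-start latency.
```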

Support

If you believe you’ve been incorrectly billed, contact [email protected] with your deployment ID, request ID (if applicable), and the approximate time of the issue.