Serverless API (LoRA) is billed based on GPU instance uptime (per-second billing). This is different from the Model API, which is billed per request.
Pricing Overview
Serverless API (LoRA) supports two billing plans:
- Pay as You Go: standard hourly rates by machine tier.
- Pro (subscription): 20%–30% discount on Pay as You Go rates.
Prices and machine availability may change. Refer to RunComfy Pricing for the latest machine rates, plan benefits, and extras.
| Machine Type | GPU Options | VRAM | RAM | vCPUs | Pay as You Go Price | Pro Price |
|---|---|---|---|---|---|---|
| Medium | T4, A4000 | 16GB | 16GB | 8 | $0.99/hour | $0.79/hour |
| Large | A10G, A5000 | 24GB | 32GB | 8 | $1.75/hour | $1.39/hour |
| X-Large | A6000 | 48GB | 48GB | 28 | $2.50/hour | $1.99/hour |
| X-Large Plus | L40S, L40 | 48GB | 64GB | 28 | $2.99/hour | $2.15/hour |
| 2X-Large | A100 | 80GB | 96GB | 28 | $4.99/hour | $3.99/hour |
| 2X-Large Plus | H100 | 80GB | 180GB | 28 | $7.49/hour | $5.99/hour |
| 3X-Large | H200 | 141GB | 240GB | 24 | $8.75/hour | $6.99/hour |
How Billing Works
Billing is usage-based and calculated per second:
- Billing starts when an instance is signaled to wake up.
- Billing stops when the instance is fully shut down.
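Per-second billing means the hourly machine rate is prorated by actual uptime. A minimal sketch of that arithmetic (the helper function is illustrative, not part of any RunComfy SDK):

```python
def billed_cost(uptime_seconds: float, hourly_rate: float) -> float:
    """Prorate an hourly machine rate by per-second uptime."""
    return uptime_seconds * hourly_rate / 3600

# e.g. a Large machine at the Pay as You Go rate ($1.75/hour),
# up for 90 seconds from wake-up signal to full shutdown:
cost = billed_cost(90, 1.75)  # $0.04375
```

A full hour of uptime reproduces the listed hourly rate exactly: `billed_cost(3600, 1.75)` is `1.75`.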
Your deployment can run a mix of persistent and on-demand instances, controlled by minimum_instances and maximum_instances.
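The two parameters named above might be set like this (only `minimum_instances` and `maximum_instances` appear on this page; the surrounding structure is an illustrative assumption, not the actual deployment schema):

```python
# Illustrative scaling settings for a deployment (structure is an
# assumption; only the two field names come from this page).
deployment_scaling = {
    "minimum_instances": 1,  # 1 persistent instance, billed for all uptime
    "maximum_instances": 5,  # up to 4 more on-demand instances under load
}

# On-demand headroom above the persistent floor:
on_demand_capacity = (
    deployment_scaling["maximum_instances"]
    - deployment_scaling["minimum_instances"]
)  # 4
```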
Persistent Instances
- Set minimum_instances > 0 to keep that many instances running.
- You are billed for the full uptime (including idle time) until you scale down.
On-Demand Instances
- Additional instances spin up to handle demand above minimum_instances (or all demand when minimum_instances = 0).
- They scale down after the keep-warm period.
- You are billed for cold start, execution, and keep-warm time.
Instance Cost Breakdown
- Cold start / warm-up time: instance boots and loads models/assets. Duration depends on machine tier, workflow complexity, and model size.
- Execution time: workflows run. This is the main compute time.
- Keep-warm time: idle time before scale-down. This time is billed.
Note: You may also see Queue Time (waiting for resources or concurrency). Queue time is not billed.
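Putting the phases together: an instance's billed time is the sum of cold start, execution, and keep-warm; queue time is excluded. A small sketch of that accounting (the phase names and durations are illustrative):

```python
def billed_seconds(phases: dict) -> float:
    """Sum the billed phases of an instance's lifecycle.
    Queue time is deliberately excluded: it is not billed."""
    return phases["cold_start"] + phases["execution"] + phases["keep_warm"]

# Hypothetical on-demand run, durations in seconds:
run = {"queue": 12, "cold_start": 45, "execution": 120, "keep_warm": 60}
billed = billed_seconds(run)  # 225 seconds billed; the 12s in queue is free
```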
Controlling Cost
- Scale to zero: set minimum_instances = 0 to avoid idle cost (the first request may be slower due to cold start).
- Cap concurrency: keep maximum_instances conservative to limit parallel capacity and spend.
- Tune keep-warm: a shorter keep-warm period lowers idle cost; a longer one reduces cold starts during bursty traffic.
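To see why scale-to-zero matters, compare an always-on instance against pure on-demand usage. A rough back-of-the-envelope sketch (the 730 hours/month figure and the 40-hour usage level are illustrative assumptions; the $0.79/hour Medium Pro rate is from the table above):

```python
HOURS_PER_MONTH = 730  # approximate average month length

def monthly_persistent_cost(hourly_rate: float, instances: int = 1) -> float:
    """Cost of keeping `instances` running 24/7 (minimum_instances > 0)."""
    return hourly_rate * HOURS_PER_MONTH * instances

def monthly_on_demand_cost(hourly_rate: float, billed_hours: float) -> float:
    """Scale-to-zero: pay only for cold-start + execution + keep-warm hours."""
    return hourly_rate * billed_hours

# Medium tier at the Pro rate ($0.79/hour), 40 billed hours of real usage:
always_on = monthly_persistent_cost(0.79)     # $576.70/month
on_demand = monthly_on_demand_cost(0.79, 40)  # $31.60/month
```

The trade-off is latency: the on-demand deployment pays for that saving with a cold start on the first request after each idle period.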
Support
If you believe you’ve been incorrectly billed, contact [email protected] with your deployment ID, request ID (if applicable), and the approximate time of the issue.