Serverless API (ComfyUI) offers flexible, pay-per-use pricing with no upfront costs. Unlike the Model API (per-request pricing), Serverless API pricing is based on GPU instance uptime for your deployments (billed per second).

Pricing overview

Serverless API (ComfyUI) supports two billing plans:
  • Pay as You Go: standard hourly rates by machine tier
  • Pro (subscription): 20%–30% discount on Pay as You Go rates
Prices and machine availability may change. Refer to RunComfy Pricing for the latest machine rates, plan benefits, and extras.
Machine Type   | GPU Options | VRAM   | RAM    | vCPUs | Pay as You Go Price | Pro Price
-------------- | ----------- | ------ | ------ | ----- | ------------------- | ----------
Medium         | T4, A4000   | 16GB   | 16GB   | 8     | $0.99/hour          | $0.79/hour
Large          | A10G, A5000 | 24GB   | 32GB   | 8     | $1.75/hour          | $1.39/hour
X-Large        | A6000       | 48GB   | 48GB   | 28    | $2.50/hour          | $1.99/hour
X-Large Plus   | L40S, L40   | 48GB   | 64GB   | 28    | $2.99/hour          | $2.15/hour
2X-Large       | A100        | 80GB   | 96GB   | 28    | $4.99/hour          | $3.99/hour
2X-Large Plus  | H100        | 80GB   | 180GB  | 28    | $7.49/hour          | $5.99/hour
3X-Large       | H200        | 141GB  | 240GB  | 24    | $8.75/hour          | $6.99/hour

How billing works

Billing is usage-based and calculated per second:
  • Billing starts when an instance is signaled to wake up (cold start + initialization).
  • Billing stops when the instance is fully shut down.
Your deployment can run a mix of persistent and on-demand instances, controlled by:
  • min_instances / max_instances (autoscaling bounds)
  • keep_warm_duration_in_seconds (how long to keep idle instances warm)
See also: Creating a Deployment
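
To make these settings concrete, here is a minimal sketch of how the three autoscaling fields fit together. The surrounding structure and the specific values are illustrative assumptions, not the exact request format; see Creating a Deployment for the real payload.

```python
# Hypothetical autoscaling settings for a deployment. The field names
# (min_instances, max_instances, keep_warm_duration_in_seconds) are the ones
# described above; the machine_type value and overall structure are
# illustrative only.
deployment_config = {
    "machine_type": "Large",               # billed at the Large hourly rate while running
    "min_instances": 1,                    # 1 persistent instance, billed even when idle
    "max_instances": 5,                    # up to 4 extra on-demand instances under load
    "keep_warm_duration_in_seconds": 300,  # idle on-demand instances stay warm (and billed) for 5 min
}

print(deployment_config)
```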

Persistent instances

  • Set min_instances > 0 to keep that many instances running.
  • You are billed for the full uptime (including idle time) until you scale down.
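
As a rough illustration, assuming the Large Pay as You Go rate from the table above, one persistent instance kept running around the clock costs roughly:

```python
# Rough daily cost of one persistent instance (min_instances = 1) on the
# Large tier, using the Pay as You Go rate from the table above.
# Illustrative arithmetic only; actual rates may change.
hourly_rate = 1.75       # $/hour, Large (Pay as You Go)
hours_per_day = 24

daily_cost = hourly_rate * hours_per_day
print(f"~${daily_cost:.2f} per day")   # ~$42.00 per day, including idle time
```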

On-demand instances

  • Additional instances spin up to handle demand above min_instances (or all demand when min_instances = 0).
  • Instances scale down after sitting idle for the keep-warm period.
  • You are billed for cold start, execution, and keep-warm time.

Instance cost breakdown

  • Cold start: instance boots and loads models/assets. Duration depends on machine tier, workflow complexity, and model size.
  • Execution time: the time spent actually running your workflows. This is typically the bulk of billed compute.
  • Keep-warm time: idle time before scale-down. This time is billed.
Note: You may also see Queue Time (time a request waits for an available instance or concurrency slot). Queue time is not billed.
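
For a sense of scale, the sketch below prices a single cold request on the Large tier under per-second billing. The phase durations are illustrative assumptions; only cold start, execution, and keep-warm time are billed.

```python
# Per-second billing for one on-demand request on the Large tier
# ($1.75/hour, Pay as You Go). The phase durations below are assumptions
# chosen for illustration.
hourly_rate = 1.75                    # $/hour
per_second_rate = hourly_rate / 3600  # billed per second

cold_start_s = 60    # boot + loading models/assets (varies with tier, workflow, model size)
execution_s  = 120   # workflow runtime
keep_warm_s  = 300   # idle time before scale-down (keep_warm_duration_in_seconds)
queue_s      = 45    # waiting for capacity -- NOT billed

billed_seconds = cold_start_s + execution_s + keep_warm_s
cost = billed_seconds * per_second_rate
print(f"Billed {billed_seconds}s, ~${cost:.2f}")   # Billed 480s, ~$0.23
```

In practice, a request that arrives while an instance is still in its keep-warm window skips the cold start, so the keep-warm setting trades some idle cost for lower latency on follow-up requests.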

Support

If you believe you’ve been incorrectly billed, contact us at [email protected] with your deployment_id, the request_id (if applicable), and the approximate time of the issue.