deployment_id using an async queue API (submit → get request_id → poll status/result, or receive updates via webhooks).
What you get
With the Serverless API (ComfyUI) you can:
- Deploy a workflow as an API, no infra to manage (RunComfy handles containerization + GPU orchestration)
- Choose hardware per deployment (GPU/VRAM tier) and change it later if requirements evolve
- Autoscale with explicit knobs (min/max instances, queue threshold, keep-warm duration); see the sketch after this list
- Version workflows safely (deployments are pinned to a workflow version; upgrades are explicit and reversible)
- Integrate in production with webhooks and the instance proxy for advanced operations
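To make the hardware and autoscaling knobs concrete, here is a rough sketch of what a deployment's settings might look like; every field name and value below is an illustrative assumption, not the exact schema used when creating a deployment.

```python
# Illustrative only: these field names mirror the knobs described above,
# but they are assumptions rather than the actual deployment API schema.
deployment_config = {
    "name": "sdxl-portrait-api",           # hypothetical deployment name
    "workflow_version_id": "wfv_abc123",   # the Cloud Save version the deployment is pinned to
    "hardware": "gpu-24gb",                # hypothetical GPU/VRAM tier label
    "autoscaling": {
        "min_instances": 0,                # can scale to zero when idle
        "max_instances": 3,                # upper bound on concurrent instances
        "queue_threshold": 2,              # queued requests per instance before scaling up
        "keep_warm_seconds": 300,          # how long an idle instance stays warm
    },
}
```

Keeping an instance warm (or setting min_instances above zero) trades idle cost for lower cold-start latency on the first request after a quiet period.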
Key objects
- Workflow (cloud-saved): a ComfyUI workflow packaged together with its runtime (nodes, models, dependencies).
- Workflow version: each Cloud Save creates an immutable version (like a container image snapshot).
- Deployment: the serverless endpoint you call (identified by deployment_id), pinned to a workflow version.
- Request: a single async inference job against a deployment (identified by request_id).
- Instance: a running container for a deployment that actually executes requests; instances scale up/down based on your autoscaling settings.
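If it helps to see how these objects reference each other, the following is a loose sketch; the class and field names are assumptions for illustration, not the API's actual schema.

```python
# Loose mental model of the key objects; field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class WorkflowVersion:
    workflow_id: str          # the cloud-saved workflow this snapshot belongs to
    version_id: str           # immutable version created by each Cloud Save

@dataclass
class Deployment:
    deployment_id: str        # used in the inference URL
    workflow_version_id: str  # the version this deployment is pinned to

@dataclass
class Request:
    request_id: str           # returned when you submit an inference job
    deployment_id: str        # the deployment the job runs against

@dataclass
class Instance:
    instance_id: str          # a running container that executes requests
    deployment_id: str        # instances belong to, and scale with, a deployment
```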
Typical workflow
- Build or customize a workflow in RunComfy’s ComfyUI Cloud.
- Cloud Save the workflow (creates a version).
- Create a Deployment (choose hardware + autoscaling).
- Submit inference: POST /prod/v1/deployments/{deployment_id}/inference
- Poll status/result (or use webhooks).
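Putting the last two steps together, a minimal Python sketch of the submit-then-poll loop might look like the following. Only the inference path comes from this page; the API host, authentication header, request payload shape, response fields, and status URL are assumptions to verify against the API reference.

```python
# Minimal submit-then-poll sketch. Only the inference path is taken from this page;
# the host, auth header, payload/response field names, and status URL are assumptions.
import time
import requests

API_BASE = "https://api.runcomfy.net/prod/v1"   # assumed API host
API_KEY = "YOUR_API_KEY"                         # placeholder
DEPLOYMENT_ID = "your-deployment-id"             # from your deployment

headers = {"Authorization": f"Bearer {API_KEY}"}  # assumed auth scheme

# 1. Submit an async inference job to the deployment.
submit = requests.post(
    f"{API_BASE}/deployments/{DEPLOYMENT_ID}/inference",
    headers=headers,
    json={"overrides": {"prompt": "a portrait photo, studio lighting"}},  # hypothetical input shape
)
submit.raise_for_status()
request_id = submit.json()["request_id"]          # assumed response field

# 2. Poll until the request finishes (or use webhooks instead of polling).
while True:
    status = requests.get(
        f"{API_BASE}/deployments/{DEPLOYMENT_ID}/requests/{request_id}",  # assumed status path
        headers=headers,
    )
    status.raise_for_status()
    body = status.json()
    if body.get("status") in ("completed", "failed"):
        print(body)
        break
    time.sleep(2)
```

If you register a webhook instead, you can drop the polling loop. A minimal receiver could look like this; the payload fields shown are illustrative assumptions, since the actual webhook body is defined by the API reference.

```python
# Minimal webhook receiver sketch. The payload fields ("request_id", "status",
# "outputs") are illustrative assumptions, not the documented webhook schema.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/runcomfy/webhook", methods=["POST"])
def handle_webhook():
    event = request.get_json(force=True)
    request_id = event.get("request_id")   # hypothetical field
    status = event.get("status")           # hypothetical field
    if status == "completed":
        outputs = event.get("outputs", [])  # hypothetical field
        print(f"Request {request_id} finished with {len(outputs)} output(s)")
    else:
        print(f"Request {request_id} reported status: {status}")
    return jsonify({"received": True}), 200

if __name__ == "__main__":
    app.run(port=8000)
```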
How this relates to the other RunComfy APIs
- Model API: on-demand inference for hosted models/pipelines, no deployment, per-request billing, call by model_id.
- Serverless API (LoRA): built on the same serverless deployment system, but what you deploy is a Trainer LoRA (instead of a workflow).
