RunComfy lets you take any ComfyUI workflow and instantly turn it into a serverless API, giving you a direct path from prototype to production without the operational headaches. Your generative AI pipelines become scalable, production-ready endpoints: no servers to maintain, no GPUs to provision, and no dependency conflicts to chase down. Behind the scenes, RunComfy packages your entire workflow (nodes, models, dependencies, and hardware settings) into a fully reproducible cloud environment. Containerization ensures that what you deploy today will run exactly the same tomorrow, while cloud orchestration scales capacity on demand. You get to focus on building and iterating, while RunComfy handles everything else.

Key Features

No-Hassle Deployment: Turn any ComfyUI workflow into a production API in just a few clicks. When you save a workflow to the cloud, RunComfy captures the entire runtime, including nodes, models, drivers, and libraries, and turns it into a self-contained, reproducible environment. That saved environment becomes the container your API runs on. With deployment handled automatically, there are no servers to configure, no dependencies to troubleshoot, and no extra infrastructure to maintain (a request sketch follows this list).

High-Performance GPUs: Choose hardware that matches your model's memory and performance needs. Options range from 16 GB GPUs such as the T4 or A4000, through 24 GB and 48 GB tiers, up to 80 GB A100 and H100 cards, and all the way to the 141 GB H200 for the heaviest workloads. This flexibility lets you right-size cost and throughput without changing your workflow code.

Scale On-Demand: Autoscaling keeps latency low during bursts and costs low when traffic is quiet. You control the minimum and maximum instance counts, the queue size threshold that triggers new instances, and how long to keep an instance warm before it shuts down. Set minimum instances to zero for scale-to-idle or keep one warm to avoid cold starts, then tune based on real usage.

Workflow Versioning: Iterate safely with versioned workflows. Each cloud save produces a fully captured environment and a new version you can test in isolation. Deployments are pinned to a specific version, so you can roll forward or roll back with minimal disruption and without affecting live traffic until you switch versions.

Real-Time Monitoring: Track each request directly in the RunComfy dashboard with real-time visibility into queue wait, cold start time, execution time, and total duration, alongside billing details. With this level of insight, you can tune autoscaling, queue limits, and keep-warm settings, or switch to a different GPU tier to meet your latency targets while keeping costs under control.

200+ Ready-to-Deploy Templates: Start fast by picking from a large library of pre-saved community workflows that already run in the cloud. You can explore, launch, customize, save your own version, and deploy it as an API in minutes, which is ideal for prototyping new features or standing up a service without starting from scratch.

Pay-Per-Use Pricing: Only pay for the GPU time you actually consume. Whether you see steady usage or spiky demand, transparent pay-per-use billing works hand in hand with autoscaling to keep costs predictable while maintaining performance.
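To make the request flow concrete, here is a minimal Python sketch of submitting a job to a deployed endpoint and polling for the result. The endpoint path, header, payload fields, and response keys are illustrative assumptions rather than RunComfy's documented schema; consult the API reference for the exact contract of your deployment.

```python
# Sketch: call a deployed workflow endpoint and poll until it finishes.
# The URL, routes, payload fields, and response keys below are assumptions
# for illustration, not RunComfy's documented API schema.
import time
import requests

ENDPOINT = "https://<your-deployment-endpoint>"   # placeholder: copy from your dashboard
API_KEY = "<your-api-key>"                        # placeholder credential
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Inputs exposed by your saved workflow (field names are hypothetical).
payload = {"inputs": {"prompt": "a watercolor fox in a misty forest", "seed": 42}}

resp = requests.post(f"{ENDPOINT}/inference", json=payload, headers=HEADERS, timeout=30)
resp.raise_for_status()
request_id = resp.json()["request_id"]            # assumed response field

# Poll until the request has left the queue and finished executing.
while True:
    status = requests.get(
        f"{ENDPOINT}/inference/{request_id}", headers=HEADERS, timeout=30
    ).json()
    if status.get("state") in ("completed", "failed"):   # assumed state values
        break
    time.sleep(2)

print(status)   # on success, typically references the generated outputs
```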
Alternative Option: RunComfy Server API with ComfyUI Backend API
To accommodate diverse integration needs, RunComfy offers not only the Serverless API but also the Server API paired with the ComfyUI Backend API, which gives you complete control over the ComfyUI backend. These APIs let you spin up and fully manage a dedicated ComfyUI backend instance, making them ideal when you need to embed ComfyUI in tools such as Krita, Photoshop, Blender, iClone, or other software. For more details, refer to the documentation here.
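As a rough illustration of that integration path, the sketch below drives a dedicated ComfyUI backend over ComfyUI's standard /prompt and /history HTTP routes. The instance URL is a placeholder for the address of your dedicated backend, and any authentication layer RunComfy places in front of the instance is not shown.

```python
# Sketch: queue a workflow on a dedicated ComfyUI backend via ComfyUI's native HTTP API.
# BASE_URL is a placeholder for your instance address; /prompt and /history are
# ComfyUI's standard backend routes.
import json
import time
import uuid
import requests

BASE_URL = "https://<your-comfyui-instance>"   # placeholder: dedicated instance address
client_id = str(uuid.uuid4())

# Load a workflow exported from ComfyUI in API format ("Save (API Format)").
with open("workflow_api.json") as f:
    workflow = json.load(f)

# Queue the workflow for execution.
resp = requests.post(f"{BASE_URL}/prompt", json={"prompt": workflow, "client_id": client_id})
resp.raise_for_status()
prompt_id = resp.json()["prompt_id"]

# Poll the history endpoint until the prompt has finished and outputs are recorded.
while True:
    history = requests.get(f"{BASE_URL}/history/{prompt_id}").json()
    if prompt_id in history:
        outputs = history[prompt_id]["outputs"]
        break
    time.sleep(2)

print(outputs)   # node outputs, e.g. filenames of saved images
```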