RunComfy Serverless API offers flexible, pay-per-use pricing with no upfront costs. This guide explains how pricing works and how you can manage your costs effectively.

Pricing Overview

You can use the API under two plans: Hobby (pay-as-you-go) or Pro (subscription). Under the Hobby plan, you pay standard hourly rates for different machine tiers. Under the Pro plan, you receive a 20%–30% discount on those rates. For complete details on machine rates, plan benefits, and extras like increased storage or priority support, see the RunComfy Pricing page.
| Machine Type | GPU Options | VRAM | RAM | vCPUs | Hobby Price per Hour | Pro Price per Hour |
|---|---|---|---|---|---|---|
| Medium | T4, A4000 | 16GB | 16GB | 8 | $0.99 | $0.79 |
| Large | A10G, A5000 | 24GB | 32GB | 8 | $1.75 | $1.39 |
| X-Large | A6000 | 48GB | 48GB | 28 | $2.50 | $1.99 |
| X-Large Plus | L40S, L40 | 48GB | 64GB | 28 | $2.99 | $2.15 |
| 2X-Large | A100 | 80GB | 96GB | 28 | $4.99 | $3.99 |
| 2X-Large Plus | H100 | 80GB | 180GB | 28 | $7.49 | $5.99 |
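
Rates are quoted per hour, but usage is billed per second (see How Billing Works below). The following minimal sketch (illustrative only, not part of any official SDK) converts an hourly rate into the cost of a given amount of active time:

```python
# Illustrative only: hourly rates from the table above, billed per second.
HOURLY_RATES = {
    "Medium": {"hobby": 0.99, "pro": 0.79},
    "Large": {"hobby": 1.75, "pro": 1.39},
    "X-Large": {"hobby": 2.50, "pro": 1.99},
}

def active_time_cost(active_seconds: float, machine_type: str, plan: str = "hobby") -> float:
    """Cost of one instance's active time (wake-up signal to full shutdown)."""
    return active_seconds * HOURLY_RATES[machine_type][plan] / 3600

# A Large instance active for 10 minutes on the Hobby plan:
print(f"${active_time_cost(600, 'Large'):.4f}")  # ~$0.2917
```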

How Billing Works

Billing for the Serverless API is usage-based and calculated per second. Charges begin when an instance is signaled to wake up and end when the instance is fully shut down, so you pay only for the exact compute time you use: you can scale up for heavy workloads and scale down to zero when idle, without incurring costs for unused capacity. Your deployment can include persistent instances, on-demand instances, or a combination of both, depending on the minimum_instances and maximum_instances settings:

Persistent Instances

When you set minimum_instances > 0 in your deployment settings, that number of instances is created and stays active even when no requests arrive. Billing for these persistent instances starts shortly after you save the changes and continues until you reduce minimum_instances, at which point the excess instances scale down. The full duration, including idle time, is billed. This setup enables immediate responses by avoiding cold starts for your baseline capacity, but weigh that performance gain against the cost for your specific workload.
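
As a rough illustration (assumed values, not an official calculator), the cost of a persistent baseline is simply the instance count multiplied by the full elapsed time at the hourly rate:

```python
# Illustrative estimate: persistent instances bill for the full duration, idle or not.
def persistent_baseline_cost(minimum_instances: int, hours: float, hourly_rate: float) -> float:
    return minimum_instances * hours * hourly_rate

# Two Medium instances (Hobby plan, $0.99/hr) kept up for a full day:
print(f"${persistent_baseline_cost(2, 24, 0.99):.2f}")  # $47.52
```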

On-Demand Instances

On-demand instances spin up automatically to handle demand beyond your minimum_instances (or all demand if minimum_instances = 0) and scale down completely once idle past the keep-warm period. You’re billed only for their active time during requests, which covers cold start, execution, and keep-warm durations. Because cost tracks demand directly, this model suits variable or sporadic workloads and non-time-sensitive apps. If maximum_instances > minimum_instances, these on-demand instances provide elastic scaling on top of any persistent baseline, as in the sketch below.
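
A hypothetical deployment configuration combining both models might look like the following; only minimum_instances and maximum_instances appear in this guide, so the other field names are illustrative assumptions:

```python
# Hypothetical deployment settings. Only minimum_instances and maximum_instances
# are documented above; the other fields are illustrative assumptions.
deployment_settings = {
    "machine_type": "Large",     # assumed field name
    "minimum_instances": 1,      # one persistent instance, billed continuously
    "maximum_instances": 5,      # up to four on-demand instances under load
    "keep_warm_seconds": 120,    # assumed name for the keep-warm timeout
}
```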

Instance Cost Breakdown

Cold Start Time: The time taken for an instance to launch from a fully scaled-down state. This includes container startup and loading workflow models and assets into GPU memory. Duration varies based on machine tier, workflow complexity, and model size, and can take several minutes. Cold start time, when it occurs, is billed as part of active instance usage.

Execution Time: The period when the instance is actively running workflows. This is the main compute time, affected by workflow complexity, input size, and GPU performance. Execution time is fully billed.

Keep Warm Time: The period an instance remains active after completing its last task, based on the keep-warm timeout configured in Deployment. This helps handle bursts of requests without additional cold starts. Keep-warm time is also billed.
Note: On your Request page, you may also see Queue Time, the period an instance waits before it starts work. This can happen while GPU resources are being allocated, when concurrency limits are reached, or while earlier tasks are still running. Queue time is not billed, as the instance has not yet begun active use.
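
As a worked example with assumed durations, consider a Large instance on the Hobby plan ($1.75/hr) whose request spends 30 seconds in queue, 120 seconds cold-starting, 45 seconds executing, and 60 seconds keeping warm; only the last three phases are billed:

```python
# Worked example with assumed durations; only active phases are billed.
HOURLY_RATE = 1.75  # Large tier, Hobby plan (from the table above)

phases = {
    "queue": 30,        # seconds; not billed
    "cold_start": 120,  # billed
    "execution": 45,    # billed
    "keep_warm": 60,    # billed
}

billed_seconds = sum(v for k, v in phases.items() if k != "queue")
print(f"Billed: {billed_seconds}s -> ${billed_seconds / 3600 * HOURLY_RATE:.4f}")
# Billed: 225s -> $0.1094
```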

Support

If you believe you’ve been incorrectly billed, please contact us at hi@runcomfy.com with your deployment ID, the request ID (if applicable), and the approximate time of the issue.