- Both options keep training and inference parity (same base model, same setup, same defaults).
- What changes is how you run it (on‑demand vs. your own dedicated endpoint) and how you’re billed.
Model API
On‑demand, no deployment: call a model_id and pass your LoRA in the request body.
Billing: per request

Serverless API (LoRA)
Deploy your LoRA as a dedicated endpoint (a Deployment) and call it with a deployment_id.
Billing: GPU uptime; you can autoscale and scale down when idle
How to decide (10 seconds)
If your goal is simply:
- "I trained/imported a LoRA and I want to generate with it on top of the base model"
then pick the Model API (Option A).

Choose the Serverless API (LoRA) (Option B) if any of the following matter:
- you want to pick a GPU tier for your workload
- you need to control how many runs can happen in parallel (autoscaling / concurrency)
- you want to keep capacity warm so the first request isn't slow (reduce cold starts)
- you need more predictable response times for production traffic
Tip: If you’re unsure, start with the Model API. You can always deploy a dedicated endpoint later without changing your prompts or workflow logic.
Quick comparison
| What you care about | Model API (on‑demand) | Serverless API (LoRA) (dedicated endpoint) |
|---|---|---|
| Do I need to deploy anything first? | No | Yes, create a Deployment from your LoRA Asset |
| What ID do I call? | model_id | deployment_id |
| Where do I find that ID? | In Trainer > Run LoRA, select your LoRA’s base model, then open that base model page and copy its model_id. | In Trainer > Deployments, open your Deployment and copy the deployment_id from Deployment details. |
| Where does the LoRA go? | You pass the LoRA in the request body (e.g. lora.path) | The LoRA is already attached to the Deployment (you don’t pass lora.path) |
| Submit endpoint | POST https://model-api.runcomfy.net/v1/models/{model_id} | POST https://api.runcomfy.net/prod/v1/deployments/{deployment_id}/inference |
| Job flow | Async: submit > request_id > poll status/result | Async: submit > request_id > poll status/result |
| Billing | Per request | GPU uptime (per‑second; can scale down to zero when idle) |
Same async pattern, different IDs
Both options are asynchronous: you submit a job, get back a request_id, then poll for status and fetch the result. What changes is the ID you submit to (model_id vs. deployment_id) and where the LoRA comes from (passed in the request body vs. already attached to the Deployment).
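The sketch below illustrates that shared submit-then-poll loop in Python. Treat it as an assumption-laden outline: the auth header, the response field names (request_id, status), and the status/result URL pattern are placeholders, not the documented contract; see Async Queue Endpoints for the real request lifecycle.

```python
import time
import requests

API_KEY = "YOUR_API_KEY"  # placeholder: check the Quickstart for the actual auth scheme
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


def submit_and_poll(submit_url: str, payload: dict, poll_url_template: str) -> dict:
    """Submit an async job, then poll until it reaches a terminal state.

    poll_url_template is a hypothetical URL pattern containing a
    {request_id} placeholder; the real status/result endpoints are
    documented under Async Queue Endpoints.
    """
    resp = requests.post(submit_url, json=payload, headers=HEADERS)
    resp.raise_for_status()
    request_id = resp.json()["request_id"]  # assumed response field name

    while True:
        status = requests.get(
            poll_url_template.format(request_id=request_id), headers=HEADERS
        ).json()
        if status.get("status") in ("completed", "failed"):  # assumed status values
            return status
        time.sleep(2)  # simple fixed-interval polling; add a timeout in real code
```

The option-specific examples below reuse this helper; only the submit URL and payload differ.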
Option A — Model API (on‑demand, no deployment)
Use this when you want the fastest path from LoRA to generation.
What you do
- In Trainer > Run LoRA, select your LoRA's base model, then open that base model page and copy its model_id.
  - This model_id represents the inference setup you'll run (the model's pipeline/workflow).
- Call the Model API with that model_id and include your LoRA as an input parameter (for example lora.path).
- Poll status and fetch results using the returned request_id.
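Putting those steps together, here is a hedged sketch of a Model API call that reuses the submit_and_poll helper above. The payload field names (prompt, lora.path, lora.strength) and the polling URL are illustrative assumptions; LoRA Inputs (Trainer) documents the actual request fields.

```python
MODEL_ID = "your-base-model-id"  # copied from the base model page in Trainer > Run LoRA
SUBMIT_URL = f"https://model-api.runcomfy.net/v1/models/{MODEL_ID}"

payload = {
    # Inputs accepted by the base model's inference setup; names here are placeholders
    "prompt": "a living room in sks_style",
    "lora": {
        "path": "your-lora-asset-path",  # your trained/imported LoRA (see LoRA Inputs for the exact field)
        "strength": 0.8,                 # assumed parameter name
    },
}

# Hypothetical status URL pattern; see Async Queue Endpoints for the documented one
result = submit_and_poll(
    SUBMIT_URL, payload, "https://model-api.runcomfy.net/v1/requests/{request_id}"
)
print(result)
```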
Key thing to remember
The Model API will run whatever inference setup the model_id points to.
For your trained LoRA inference:
- model_id = the base model's inference setup (from the model page)
- LoRA = an input you provide on each request
Where to go next
- Start here: Model APIs Quickstart
- LoRA request fields + examples: LoRA Inputs (Trainer)
Option B — Serverless API (LoRA) (dedicated endpoint / Deployment)
Use this when you need more control over runtime behavior (GPU choice, autoscaling, warm instances) and want to call your LoRA through a stable dedicated endpoint.
What you do
- In Trainer, turn your LoRA Asset into a Deployment (this creates the dedicated endpoint and pins the LoRA + base model + default settings).
- Copy the deployment_id from the Deployment details page.
- Submit inference to POST /prod/v1/deployments/{deployment_id}/inference.
- Poll status and fetch results (same async pattern, but scoped under the deployment).
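As with Option A, here is a hedged sketch that reuses the same submit_and_poll helper; note that the payload carries no lora field. The prompt field is a placeholder for whatever inputs your deployment's schema defines, and the polling URL is an assumption; Async Queue Endpoints has the real deployment-scoped paths.

```python
DEPLOYMENT_ID = "your-deployment-id"  # copied from Deployment details
SUBMIT_URL = f"https://api.runcomfy.net/prod/v1/deployments/{DEPLOYMENT_ID}/inference"

# No lora.path here: the LoRA is already attached to the Deployment.
# Accepted inputs come from the deployment's schema; "prompt" is a placeholder.
payload = {"prompt": "a living room in sks_style"}

# Hypothetical deployment-scoped status URL; see Async Queue Endpoints for the real one
POLL_URL = (
    "https://api.runcomfy.net/prod/v1/deployments/"
    + DEPLOYMENT_ID
    + "/requests/{request_id}"
)

result = submit_and_poll(SUBMIT_URL, payload, POLL_URL)
print(result)
```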
Key thing to remember
With a Deployment, the LoRA is already part of the endpoint. That means:
- deployment_id selects the endpoint
- Your request body only includes the inputs defined by the deployment's schema (prompt, images, params, etc.)
- You typically do not pass lora.path because the LoRA is already attached
Where to go next
- Overview: Serverless API (LoRA) Introduction
- Full walkthrough: Serverless API (LoRA) Quickstart
- Request lifecycle details: Async Queue Endpoints
