- Both options keep training and inference parity (same base model, same setup, same defaults).
- What changes is how you run it (on‑demand vs. your own dedicated endpoint) and how you’re billed.
Model API
On‑demand, no deployment: call a model_id and pass your LoRA in the request body.
Billing: per request

Serverless API (LoRA)
Deploy your LoRA as a dedicated endpoint (a Deployment) and call it with a deployment_id.
Billing: GPU uptime; you can autoscale and scale down when idle
How to decide (10 seconds)
If your goal is simply:
- "I trained/imported a LoRA and I want to generate with it on top of the base model"
then pick the Model API (Option A).

Choose the Serverless API (LoRA) (Option B) if any of the following matter:
- you want to pick a GPU tier for your workload
- you need to control how many runs can happen in parallel (autoscaling / concurrency)
- you want to keep capacity warm so the first request isn't slow (reduce cold starts)
- you need more predictable response times for production traffic
Tip: If you’re unsure, start with the Model API. You can always deploy a dedicated endpoint later without changing your prompts or workflow logic.
Quick comparison
| What you care about | Model API (on‑demand) | Serverless API (LoRA) (dedicated endpoint) |
|---|---|---|
| Do I need to deploy anything first? | No | Yes, create a Deployment from your LoRA Asset |
| What ID do I call? | model_id | deployment_id |
| Where do I find that ID? | In Trainer > Run LoRA, select your LoRA’s base model, then open that base model page and copy its model_id. | In Trainer > Deployments, open your Deployment and copy the deployment_id from Deployment details. |
| Where does the LoRA go? | You pass the LoRA in the request body (e.g. lora.path) | The LoRA is already attached to the Deployment (you don’t pass lora.path) |
| Submit endpoint | POST https://model-api.runcomfy.net/v1/models/{model_id} | POST https://api.runcomfy.net/prod/v1/deployments/{deployment_id}/inference |
| Job flow | Async: submit > request_id > poll status/result | Async: submit > request_id > poll status/result |
| Billing | Per request | GPU uptime (per‑second; can scale down to zero when idle) |
Same async pattern, different IDs
Both options are asynchronous: you submit a job, get back a request_id, then poll for status and fetch the result. What changes is the ID you submit to (model_id vs. deployment_id) and where the LoRA comes from (passed in the request body vs. already attached to the Deployment).
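The sketch below illustrates that shared submit-then-poll loop in Python. Treat it as an assumption-laden outline: the auth header, the response field names (request_id, status), and the status/result URL pattern are placeholders, not the documented contract; see Async Queue Endpoints for the real request lifecycle.

```python
import time
import requests

API_KEY = "YOUR_API_KEY"  # placeholder: check the Quickstart for the actual auth scheme
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


def submit_and_poll(submit_url: str, payload: dict, poll_url_template: str) -> dict:
    """Submit an async job, then poll until it reaches a terminal state.

    poll_url_template is a hypothetical URL pattern containing a
    {request_id} placeholder; the real status/result endpoints are
    documented under Async Queue Endpoints.
    """
    resp = requests.post(submit_url, json=payload, headers=HEADERS)
    resp.raise_for_status()
    request_id = resp.json()["request_id"]  # assumed response field name

    while True:
        status = requests.get(
            poll_url_template.format(request_id=request_id), headers=HEADERS
        ).json()
        if status.get("status") in ("completed", "failed"):  # assumed status values
            return status
        time.sleep(2)  # simple fixed-interval polling; add a timeout in real code
```

The option-specific examples below reuse this helper; only the submit URL and payload differ.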
Option A — Model API (on‑demand, no deployment)
Use this when you want the fastest path from LoRA to generation.
What you do
- In Trainer > Run LoRA, select your LoRA's base model, then open that base model page and copy its model_id.
  - This model_id represents the inference setup you'll run (the model's pipeline/workflow).
- Call the Model API with that model_id and include your LoRA as an input parameter (for example lora.path).
- Poll status and fetch results using the returned request_id.
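Putting those steps together, here is a hedged sketch of a Model API call that reuses the submit_and_poll helper above. The payload field names (prompt, lora.path, lora.strength) and the polling URL are illustrative assumptions; LoRA Inputs (Trainer) documents the actual request fields.

```python
MODEL_ID = "your-base-model-id"  # copied from the base model page in Trainer > Run LoRA
SUBMIT_URL = f"https://model-api.runcomfy.net/v1/models/{MODEL_ID}"

payload = {
    # Inputs accepted by the base model's inference setup; names here are placeholders
    "prompt": "a living room in sks_style",
    "lora": {
        "path": "your-lora-asset-path",  # your trained/imported LoRA (see LoRA Inputs for the exact field)
        "strength": 0.8,                 # assumed parameter name
    },
}

# Hypothetical status URL pattern; see Async Queue Endpoints for the documented one
result = submit_and_poll(
    SUBMIT_URL, payload, "https://model-api.runcomfy.net/v1/requests/{request_id}"
)
print(result)
```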
Key thing to remember
The Model API will run whatever inference setup the model_id points to.
For your trained LoRA inference:
- model_id = the base model's inference setup (from the model page)
- LoRA = an input you provide on each request
Where to go next
- Start here: Model APIs Quickstart
- LoRA request fields + examples: LoRA Inputs (Trainer)
Option B — Serverless API (LoRA) (dedicated endpoint / Deployment)
Use this when you need more control over runtime behavior (GPU choice, autoscaling, warm instances) and want to call your LoRA through a stable dedicated endpoint.
What you do
- In Trainer, turn your LoRA Asset into a Deployment (this creates the dedicated endpoint and pins the LoRA + base model + default settings).
- Copy the deployment_id from the Deployment details page.
- Submit inference to POST /prod/v1/deployments/{deployment_id}/inference.
- Poll status and fetch results (same async pattern, but scoped under the deployment).
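As with Option A, here is a hedged sketch that reuses the same submit_and_poll helper; note that the payload carries no lora field. The prompt field is a placeholder for whatever inputs your deployment's schema defines, and the polling URL is an assumption; Async Queue Endpoints has the real deployment-scoped paths.

```python
DEPLOYMENT_ID = "your-deployment-id"  # copied from Deployment details
SUBMIT_URL = f"https://api.runcomfy.net/prod/v1/deployments/{DEPLOYMENT_ID}/inference"

# No lora.path here: the LoRA is already attached to the Deployment.
# Accepted inputs come from the deployment's schema; "prompt" is a placeholder.
payload = {"prompt": "a living room in sks_style"}

# Hypothetical deployment-scoped status URL; see Async Queue Endpoints for the real one
POLL_URL = (
    "https://api.runcomfy.net/prod/v1/deployments/"
    + DEPLOYMENT_ID
    + "/requests/{request_id}"
)

result = submit_and_poll(SUBMIT_URL, payload, POLL_URL)
print(result)
```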
Key thing to remember
With a Deployment, the LoRA is already part of the endpoint. That means:
- deployment_id selects the endpoint
- Your request body only includes the inputs defined by the deployment's schema (prompt, images, params, etc.)
- You typically do not pass lora.path because the LoRA is already attached
Where to go next
- Overview: Serverless API (LoRA) Introduction
- Full walkthrough: Serverless API (LoRA) Quickstart
- Request lifecycle details: Async Queue Endpoints
