RunComfy Trainer lets you run the same LoRA + base model inference setup in two different ways.
  • Both options keep training and inference parity (same base model, same setup, same defaults).
  • What changes is how you run it (on‑demand vs. your own dedicated endpoint) and how you’re billed.

How to decide (10 seconds)

If your goal is simply:
  • I trained/imported a LoRA and I want to generate with it on top of the base model
Start with the Model API. Only choose the Serverless API (LoRA) when you specifically need a dedicated endpoint, for example:
  • you want to pick a GPU tier for your workload
  • you need to control how many runs can happen in parallel (autoscaling / concurrency)
  • you want to keep capacity warm so the first request isn’t slow (reduce cold starts)
  • you need more predictable response time for production traffic
Tip: If you’re unsure, start with the Model API. You can always deploy a dedicated endpoint later without changing your prompts or workflow logic.

Quick comparison

| What you care about | Model API (on‑demand) | Serverless API (LoRA) (dedicated endpoint) |
| --- | --- | --- |
| Do I need to deploy anything first? | No | Yes, create a Deployment from your LoRA Asset |
| What ID do I call? | model_id | deployment_id |
| Where do I find that ID? | In Trainer > Run LoRA, select your LoRA’s base model, then open that base model page and copy its model_id. | In Trainer > Deployments, open your Deployment and copy the deployment_id from Deployment details. |
| Where does the LoRA go? | You pass the LoRA in the request body (e.g. lora.path) | The LoRA is already attached to the Deployment (you don’t pass lora.path) |
| Submit endpoint | POST https://model-api.runcomfy.net/v1/models/{model_id} | POST https://api.runcomfy.net/prod/v1/deployments/{deployment_id}/inference |
| Job flow | Async: submit > request_id > poll status/result | Async: submit > request_id > poll status/result |
| Billing | Per request | GPU uptime (per‑second; can scale down to zero when idle) |

Same async pattern, different IDs

Both options are asynchronous: you submit a job, get back a request_id, then poll for status and fetch the result. Only the endpoints and the ID you pass differ.

Model API
POST  https://model-api.runcomfy.net/v1/models/{model_id}
  -> returns request_id
GET   https://model-api.runcomfy.net/v1/requests/{request_id}/status
GET   https://model-api.runcomfy.net/v1/requests/{request_id}/result
POST  https://model-api.runcomfy.net/v1/requests/{request_id}/cancel
Serverless API (LoRA) (Deployment)
POST  https://api.runcomfy.net/prod/v1/deployments/{deployment_id}/inference
  -> returns request_id
GET   https://api.runcomfy.net/prod/v1/deployments/{deployment_id}/requests/{request_id}/status
GET   https://api.runcomfy.net/prod/v1/deployments/{deployment_id}/requests/{request_id}/result
POST  https://api.runcomfy.net/prod/v1/deployments/{deployment_id}/requests/{request_id}/cancel
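
The client-side loop is identical in both cases; only the URLs change. Below is a minimal Python sketch of that submit > poll > result pattern. It assumes bearer-token authentication and that the response JSON exposes request_id and status fields with a terminal state such as completed; check the API reference for the exact authentication scheme and response schema.

```python
import time

import requests

API_KEY = "YOUR_API_KEY"  # assumption: bearer-token auth; see the API reference
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}


def run_async_job(submit_url: str, payload: dict, requests_base_url: str) -> dict:
    """Submit a job, poll until it finishes, then return the result JSON.

    submit_url        -> the POST endpoint (Model API model or Deployment inference)
    requests_base_url -> the URL that .../{request_id}/status and /result hang off
    """
    # 1) Submit: both APIs return a request_id for the queued job.
    submitted = requests.post(submit_url, json=payload, headers=HEADERS)
    submitted.raise_for_status()
    request_id = submitted.json()["request_id"]

    # 2) Poll status until the job reaches a terminal state (status values assumed).
    while True:
        status = requests.get(f"{requests_base_url}/{request_id}/status", headers=HEADERS)
        status.raise_for_status()
        state = status.json().get("status")
        if state in ("completed", "failed", "cancelled"):
            break
        time.sleep(2)

    # 3) Fetch the result payload (generated outputs, etc.).
    result = requests.get(f"{requests_base_url}/{request_id}/result", headers=HEADERS)
    result.raise_for_status()
    return result.json()
```

With the Model API, requests_base_url would be https://model-api.runcomfy.net/v1/requests; with a Deployment, it would be https://api.runcomfy.net/prod/v1/deployments/{deployment_id}/requests.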

Option A — Model API (on‑demand, no deployment)

Use this when you want the fastest path from LoRA to generation.

What you do

  1. In Trainer > Run LoRA, select your LoRA’s base model, then open that base model page and copy its model_id.
    • This model_id represents the inference setup you’ll run (the model’s pipeline/workflow).
  2. Call the Model API with that model_id and include your LoRA as an input parameter (for example lora.path).
  3. Poll status and fetch results using the returned request_id.

Key thing to remember

The Model API will run whatever inference setup the model_id points to. For your trained LoRA inference:
  • model_id = the base model’s inference setup (from the model page)
  • LoRA = an input you provide on each request
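
For illustration, here is a minimal Python sketch of a Model API submission. The input field names (prompt, lora.path, lora.scale) and the bearer-token header are assumptions; the base model page defines the actual request schema.

```python
import requests

MODEL_ID = "your_model_id"  # copied from the base model page in Trainer > Run LoRA
SUBMIT_URL = f"https://model-api.runcomfy.net/v1/models/{MODEL_ID}"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # assumption: bearer-token auth

# The request body carries the generation inputs, including your LoRA.
# Field names here are illustrative; use the schema from the model page.
payload = {
    "prompt": "a portrait of my trained subject, studio lighting",
    "lora": {
        "path": "your-lora-asset-path",  # the trained/imported LoRA to apply
        "scale": 1.0,
    },
}

response = requests.post(SUBMIT_URL, json=payload, headers=HEADERS)
response.raise_for_status()
request_id = response.json()["request_id"]  # poll status/result with this ID
```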

Where to go next


Option B — Serverless API (LoRA) (dedicated endpoint / Deployment)

Use this when you need more control over runtime behavior (GPU choice, autoscaling, warm instances) and want to call your LoRA through a stable dedicated endpoint.

What you do

  1. In Trainer, turn your LoRA Asset into a Deployment (this creates the dedicated endpoint and pins the LoRA + base model + default settings).
  2. Copy the deployment_id from the Deployment details page.
  3. Submit inference to POST /prod/v1/deployments/{deployment_id}/inference.
  4. Poll status and fetch results (same async pattern, but scoped under the deployment).

Key thing to remember

With a Deployment, the LoRA is already part of the endpoint. That means:
  • deployment_id selects the endpoint
  • Your request body only includes the inputs defined by the deployment’s schema (prompt, images, params, etc.)
  • You typically do not pass lora.path because the LoRA is already attached
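
By contrast, a Deployment call looks like this minimal Python sketch: the deployment_id selects the endpoint and the body carries only the deployment’s own inputs. The prompt field name and the bearer-token header are assumptions; your Deployment’s schema defines the actual inputs.

```python
import requests

DEPLOYMENT_ID = "your_deployment_id"  # copied from Deployment details in Trainer > Deployments
SUBMIT_URL = f"https://api.runcomfy.net/prod/v1/deployments/{DEPLOYMENT_ID}/inference"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # assumption: bearer-token auth

# No lora.path here: the LoRA is already attached to the Deployment.
# The body only carries the inputs defined by the deployment's schema (names illustrative).
payload = {
    "prompt": "a portrait of my trained subject, studio lighting",
}

response = requests.post(SUBMIT_URL, json=payload, headers=HEADERS)
response.raise_for_status()
request_id = response.json()["request_id"]  # poll status/result under this deployment
```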

Where to go next