Serverless API (LoRA) lets you deploy a LoRA as a dedicated, scalable endpoint (a Deployment) and run inference through a standard async queue API. It is built on the same serverless system as Serverless API (ComfyUI) — the difference is simply what you deploy:
  • Serverless API (ComfyUI): you deploy a ComfyUI workflow
  • Serverless API (LoRA): you deploy your trained LoRA (pinned to its base model + default inference config)
If you only want to run LoRA inference without creating a deployment, use the Model API instead.
Start here: Choose a LoRA inference API

Key concepts

Serverless API (LoRA) revolves around three objects:

LoRA Asset

A LoRA Asset is the output of training or importing a LoRA in RunComfy Trainer. It includes (see the sketch after this list):
  • LoRA adapter weights (.safetensors)
  • training metadata (for example the base model reference)
  • the defaults Trainer uses for inference
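The list above can be pictured as a small record. The sketch below is purely illustrative and assumes nothing about RunComfy's actual schema; every field name and value is a made-up example.

```python
# Hypothetical picture of what a LoRA Asset bundles together.
# Field names and values are illustrative, NOT the actual RunComfy schema.
lora_asset = {
    # LoRA adapter weights produced by training or import
    "weights": "my_lora.safetensors",
    # training metadata, e.g. a reference to the base model it was trained on
    "base_model": "example/base-model-checkpoint",
    # the defaults Trainer uses for inference (example values)
    "inference_defaults": {
        "steps": 30,
        "guidance_scale": 7.5,
    },
}
```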

Deployment

A Deployment is the serverless endpoint you call from your app. When you create a Deployment from a LoRA Asset, RunComfy:
  • pins the base model checkpoint the LoRA was trained on
  • attaches the LoRA weights
  • loads the same default inference setup you used in Trainer
This is what gives you “training and inference parity”: the deployed endpoint starts from the same setup that produced your training samples.

Request

A request is a single inference job against a Deployment. You submit a request, get back a request_id, then poll status/results (or use webhooks).
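A minimal submission sketch using Python's requests library. The endpoint base URL, the bearer-token header, and the "prompt" payload field are assumptions for illustration; only the …/inference path and the request_id field are documented on this page.

```python
import requests

# Placeholders: substitute your Deployment's endpoint and API key.
DEPLOYMENT_URL = "https://<your-deployment-endpoint>"  # assumed placeholder
HEADERS = {"Authorization": "Bearer <API_KEY>"}        # auth scheme assumed

# Submit a single inference job (a "request") to the Deployment.
# The payload fields depend on your Deployment's inference config;
# "prompt" is just an illustrative input.
resp = requests.post(
    f"{DEPLOYMENT_URL}/inference",
    json={"prompt": "a portrait of my trained subject"},
    headers=HEADERS,
)
resp.raise_for_status()
request_id = resp.json()["request_id"]  # keep this to poll status/results
```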

Typical workflow

  1. Train or import a LoRA in Trainer → you get a LoRA Asset
  2. Create a Deployment (choose hardware + autoscaling)
  3. Submit inference to the Deployment endpoint (POST …/inference)
  4. Poll status (GET …/status) and fetch outputs (GET …/result), as sketched below
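Continuing the submission sketch above, here is a minimal polling loop for steps 3 and 4. As before, the endpoint placeholders are assumptions, and so are the status field, its "completed"/"failed" values, and passing request_id as a path segment; see the Quickstart for the actual request shapes.

```python
import time
import requests

DEPLOYMENT_URL = "https://<your-deployment-endpoint>"  # assumed placeholder
HEADERS = {"Authorization": "Bearer <API_KEY>"}        # auth scheme assumed
request_id = "<request_id>"  # returned by POST .../inference (see above)

# Poll status (GET .../status) until the job reaches a terminal state,
# then fetch outputs (GET .../result). The state names and the
# request_id-in-path convention are assumptions for illustration.
while True:
    status = requests.get(
        f"{DEPLOYMENT_URL}/status/{request_id}", headers=HEADERS
    ).json()
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(2)  # simple fixed-interval polling

if status.get("status") == "completed":
    result = requests.get(
        f"{DEPLOYMENT_URL}/result/{request_id}", headers=HEADERS
    ).json()
    print(result)  # outputs, e.g. generated image URLs
```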
Next step: Quickstart