Serverless API (LoRA) lets you deploy a LoRA as a dedicated, scalable endpoint (a Deployment) and run inference through a standard async queue API.
It is built on the same serverless system as Serverless API (ComfyUI) — the difference is simply what you deploy:
- Serverless API (ComfyUI): you deploy a ComfyUI workflow
- Serverless API (LoRA): you deploy your trained LoRA (pinned to its base model + default inference config)
If you only want to run LoRA inference without creating a deployment, use the Model API instead.
Start here: Choose a LoRA inference API
Key concepts
Serverless API (LoRA) revolves around three objects:
LoRA Asset
A LoRA Asset is the output of training or importing a LoRA in RunComfy Trainer. It includes:
- LoRA adapter weights (`.safetensors`)
- training metadata (for example the base model reference)
- the defaults Trainer uses for inference
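Purely as an illustration of how these pieces fit together (this is not the real asset schema; every key and value below is hypothetical), you can picture a LoRA Asset as:

```python
# Hypothetical sketch of what a LoRA Asset bundles -- NOT the actual schema.
lora_asset = {
    "weights": "my_lora.safetensors",             # LoRA adapter weights
    "training_metadata": {
        "base_model": "<base-model-checkpoint>",  # base model reference from training
    },
    "default_inference_config": {                 # defaults Trainer uses for inference
        "steps": 30,                              # illustrative values only
        "guidance_scale": 7.5,
    },
}
```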
Deployment
A Deployment is the serverless endpoint you call from your app.
When you create a Deployment from a LoRA Asset, RunComfy:
- pins the base model checkpoint the LoRA was trained on
- attaches the LoRA weights
- loads the same default inference setup you used in Trainer
This is what gives you “training and inference parity”: the deployed endpoint starts from the same setup that produced your training samples.
Request
A request is a single inference job against a Deployment.
You submit a request, get back a `request_id`, then poll for status/results (or use webhooks).
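As a minimal sketch of the submit step, assuming a placeholder base URL, auth header, and payload (none of these are the documented API; only the `request_id` field comes from the description above — see the Quickstart for the real contract):

```python
import requests

# Placeholders -- substitute your real Deployment endpoint and API key.
BASE_URL = "https://<your-deployment-endpoint>"  # hypothetical, not a documented URL
API_KEY = "<your-api-key>"                       # hypothetical

# Submit one inference job to the Deployment's async queue.
resp = requests.post(
    f"{BASE_URL}/inference",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"prompt": "a portrait in my LoRA's style"},  # payload schema is illustrative
)
resp.raise_for_status()
request_id = resp.json()["request_id"]  # the docs above say a request_id is returned
print("queued:", request_id)
```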
Where to find things in the UI
Typical workflow
- Train or import a LoRA in Trainer → you get a LoRA Asset
- Create a Deployment (choose hardware + autoscaling)
- Submit inference to the Deployment endpoint (`POST …/inference`)
- Poll status (`GET …/status`) and fetch outputs (`GET …/result`) — see the sketch below
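A minimal sketch of the last step in Python, assuming placeholder URLs (the exact paths are elided above as `…/status` and `…/result`) and assumed status values; the Quickstart documents the real endpoints:

```python
import time
import requests

# Placeholders -- every URL, header, and field name below is an assumption.
BASE_URL = "https://<your-deployment-endpoint>"
HEADERS = {"Authorization": "Bearer <your-api-key>"}
request_id = "<request_id from the submit step>"

# Poll status until the job reaches a terminal state (state names assumed).
while True:
    status = requests.get(f"{BASE_URL}/status/{request_id}", headers=HEADERS).json()
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(2)  # simple fixed-interval polling; webhooks avoid this loop

# Fetch the outputs once the job is done.
result = requests.get(f"{BASE_URL}/result/{request_id}", headers=HEADERS).json()
print(result)
```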
Next step: Quickstart