Edit a deployment to update what the endpoint serves (for example, switching to a newer LoRA revision) or to change how it runs (hardware and scaling) without changing the deployment’s identity.

Pinned base model

A deployment always pins the base model checkpoint used during training and treats it as part of the contract. You can edit the deployment’s LoRA, hardware tier, autoscaling behavior, and whether it is enabled or disabled, but you cannot change the base model. If you need to serve a LoRA trained on a different base model, create a new deployment for that LoRA instead of editing the existing one.
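The rule above can be pictured as a set of editable fields with the base model left out. The following is a minimal sketch, not the platform's API: the `Deployment` class, its field names, and `apply_edit` are hypothetical and exist only to illustrate the constraint.

```python
from dataclasses import dataclass, replace

# Hypothetical model of a deployment; field names are illustrative only.
@dataclass(frozen=True)
class Deployment:
    base_model: str      # pinned at creation, never editable
    lora: str
    gpu_tier: str
    min_instances: int
    max_instances: int
    enabled: bool

# Fields the text lists as editable; base_model is deliberately absent.
EDITABLE_FIELDS = {"lora", "gpu_tier", "min_instances", "max_instances", "enabled"}

def apply_edit(deployment: Deployment, **changes) -> Deployment:
    """Return an updated copy, rejecting any attempt to change a pinned field."""
    rejected = set(changes) - EDITABLE_FIELDS
    if rejected:
        raise ValueError(f"cannot edit pinned or unknown fields: {sorted(rejected)}")
    return replace(deployment, **changes)
```

In this sketch, serving a LoRA trained on a different base model means constructing a new `Deployment` rather than calling `apply_edit`.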

Change LoRA on the same base model

Use this when you trained a new revision or imported updated .safetensors weights and want the same endpoint to start serving the new adapter. First confirm the new LoRA asset exists in LoRA Assets. Then open the deployment, choose Edit, select the new LoRA in the LoRA picker, and save. Because the base model is locked, the LoRA picker only shows LoRAs that are compatible with the deployment’s base checkpoint, so you cannot accidentally select an incompatible adapter.
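The picker's filtering can be sketched as follows. This assumes compatibility means an exact match on the base checkpoint identifier; the real criteria may differ, and `LoraAsset` and `compatible_loras` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Hypothetical record for an entry in LoRA Assets.
@dataclass(frozen=True)
class LoraAsset:
    name: str
    base_model: str  # checkpoint the adapter was trained on

def compatible_loras(assets, deployment_base_model):
    """Return only the assets trained on the deployment's pinned base model.

    Assumption: compatibility is modeled as an exact identifier match.
    """
    return [a for a in assets if a.base_model == deployment_base_model]
```

Given assets trained on two different checkpoints, only those matching the deployment's checkpoint would be offered, which is why an incompatible adapter cannot be selected.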

Tune hardware and scaling

You can switch GPU tiers to match your VRAM and throughput needs, and tune autoscaling settings such as minimum instances, maximum instances, queue size, and keep warm to balance cost, latency, and cold starts.
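As a rough illustration of how these settings relate to one another, here is a sketch of a consistency check. The field names, the seconds unit for keep warm, and every validation rule are assumptions made for the example; the platform's actual limits and units are not stated here.

```python
from dataclasses import dataclass

# Hypothetical container for the autoscaling settings named in the text.
@dataclass(frozen=True)
class AutoscalingSettings:
    min_instances: int
    max_instances: int
    queue_size: int
    keep_warm_seconds: int  # assumed unit; the source does not specify one

def validate(settings: AutoscalingSettings) -> list:
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if settings.min_instances < 0:
        problems.append("min_instances must be >= 0")
    if settings.max_instances < 1:
        problems.append("max_instances must be >= 1")
    if settings.min_instances > settings.max_instances:
        problems.append("min_instances must not exceed max_instances")
    if settings.queue_size < 0:
        problems.append("queue_size must be >= 0")
    if settings.keep_warm_seconds < 0:
        problems.append("keep_warm_seconds must be >= 0")
    return problems
```

A minimum of 0 keeps idle cost down but exposes the first request to a cold start, while a higher minimum trades cost for lower latency.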

Rollout behavior

Changing the served LoRA or hardware tier may trigger a brief warm-up. If minimum instances is 0, the first request may incur a cold start. Autoscaling-only changes take effect immediately for future scaling decisions, though brief queuing may occur while updated instances come online.
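The rules in this paragraph can be summarized as a small lookup. The sketch below is illustrative only: the field names and `rollout_effects` are hypothetical, and the returned strings simply restate the behavior described above.

```python
# Hypothetical field groupings; names are illustrative only.
WARMUP_FIELDS = {"lora", "gpu_tier"}
AUTOSCALING_FIELDS = {"min_instances", "max_instances", "queue_size", "keep_warm"}

def rollout_effects(changed_fields, min_instances):
    """Describe what to expect after saving an edit, per the rules in the text."""
    changed = set(changed_fields)
    effects = []
    if changed & WARMUP_FIELDS:
        effects.append("brief warm-up possible")
    if changed and changed <= AUTOSCALING_FIELDS:
        effects.append("applies immediately to future scaling decisions")
    if min_instances == 0:
        effects.append("first request may incur a cold start")
    return effects
```

For example, changing only the maximum instance count on a deployment with a minimum of 1 would report just the scaling note, while swapping the LoRA on a deployment with a minimum of 0 would report both a warm-up and a possible cold start.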

Enable, disable, delete

  • Disable a deployment: Disabling is an immediate “off switch”. It stops serving requests and shuts down capacity to halt runtime cost. New requests fail immediately, and in-flight requests may be interrupted depending on execution state.
  • Re-enable a deployment: Re-enabling makes the deployment accept requests again and applies your autoscaling rules. If minimum instances is 0, the first request after enabling may incur a cold start.
  • Delete a deployment: Delete a deployment only when you no longer need the endpoint itself. Deleting removes the deployment configuration and endpoint, but it does not delete your underlying LoRA assets, which remain available for future deployments.
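The three operations above can be modeled with a toy in-memory structure. This is a sketch under stated assumptions, not the platform's implementation: `Workspace` and its methods are hypothetical, and the point is only that deleting a deployment leaves the LoRA assets in place.

```python
class Workspace:
    """Hypothetical in-memory stand-in for LoRA assets and deployments."""

    def __init__(self):
        self.lora_assets = set()
        self.deployments = {}  # name -> {"lora": str, "enabled": bool}

    def create_deployment(self, name, lora):
        if lora not in self.lora_assets:
            raise KeyError(f"unknown LoRA asset: {lora}")
        self.deployments[name] = {"lora": lora, "enabled": True}

    def set_enabled(self, name, enabled):
        # Disabling is the "off switch"; re-enabling reverses it.
        self.deployments[name]["enabled"] = enabled

    def accepts_requests(self, name):
        # A disabled or deleted deployment rejects new requests.
        dep = self.deployments.get(name)
        return dep is not None and dep["enabled"]

    def delete_deployment(self, name):
        # Removes only the deployment entry; lora_assets is left untouched.
        del self.deployments[name]
```

After `delete_deployment`, the same LoRA can be passed to `create_deployment` again, mirroring the statement that assets remain available for future deployments.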

Save changes

Edits are not applied until you save. After updating the LoRA selection, hardware, scaling settings, or enabled state, save the edit to apply it.