# Async Queue Endpoints - Datasets

A **dataset** is the collection of training data you use when training a LoRA. Before you start a training job, you must create a dataset and upload files to it. Only datasets in `READY` status can be mounted and used by a training job.

## Quickstart (minimum working flow)

1. `POST /prod/v1/trainers/datasets` → create a dataset (the response's `id` is your `dataset_id` and its `name` is your `dataset_name`)
2. Upload files
   * **≤150MB per file**: `POST /prod/v1/trainers/datasets/{dataset_id}/upload`
   * **>150MB per file**: `POST /prod/v1/trainers/datasets/{dataset_id}/get-upload-endpoint` → `PUT` each file to the returned `upload_url`
3. `GET /prod/v1/trainers/datasets/{dataset_id}/status` → poll until `READY`
4. Use `dataset_name` in training job requests
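Step 2 is a per-file decision that depends only on file size. The following shell sketch makes that choice using the 150,000,000-byte limit of the direct upload endpoint. The function names and `MAX_DIRECT_UPLOAD_BYTES` are hypothetical helpers, not part of the RunComfy API.

```bash
# Hypothetical helpers (not part of the RunComfy API).
# Direct upload accepts files up to 150MB (150,000,000 bytes) each.
MAX_DIRECT_UPLOAD_BYTES=150000000

# Prints "direct" or "signed" for a size given in bytes.
upload_path_for_size() {
  if [ "$1" -le "$MAX_DIRECT_UPLOAD_BYTES" ]; then
    echo "direct"   # POST /prod/v1/trainers/datasets/{dataset_id}/upload
  else
    echo "signed"   # POST .../get-upload-endpoint, then PUT to each upload_url
  fi
}

# Same decision for a local file, using its exact byte count.
upload_path_for_file() {
  [ -f "$1" ] || return 1
  upload_path_for_size "$(wc -c < "$1" | tr -d '[:space:]')"
}
```

For example, `upload_path_for_file ./clip_0001.mp4` prints `direct` or `signed`.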

## Dataset status lifecycle

Datasets move through these statuses:

* **`DRAFT`**: dataset resource created, but it contains **no uploaded files yet**
* **`UPLOADING`**: dataset is currently receiving files (either direct upload or signed URL uploads)
* **`READY`**: all uploads have completed and validation has passed, so the dataset can be mounted by a training job. How long it takes to reach `READY` depends on the number of files, their size, and whether every upload completes successfully.
* **`FAILED`**: upload/validation failed; `error` field is present
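The lifecycle can be read as a small state machine. This sketch maps each status to the step a client would typically take next; `next_step_for_status` and its messages are illustrative assumptions, not values returned by the API.

```bash
# Hypothetical helper: maps a dataset status to a typical next client action.
next_step_for_status() {
  case "$1" in
    DRAFT)     echo "upload files" ;;          # created, but no files yet
    UPLOADING) echo "wait and poll status" ;;  # still receiving files
    READY)     echo "start training" ;;        # can be mounted by a training job
    FAILED)    echo "inspect error" ;;         # read the error field, then re-upload
    *)         echo "unknown status: $1" >&2; return 1 ;;
  esac
}
```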

***

## Endpoints

**Base URL**: `https://trainer-api.runcomfy.net`

| Endpoint                                                      | Method   | Description                                                |
| ------------------------------------------------------------- | -------- | ---------------------------------------------------------- |
| `/prod/v1/trainers/datasets`                                  | `POST`   | Create a dataset resource (metadata only)                  |
| `/prod/v1/trainers/datasets/{dataset_id}/upload`              | `POST`   | Upload a dataset file (**≤150MB**)                         |
| `/prod/v1/trainers/datasets/{dataset_id}/get-upload-endpoint` | `POST`   | Get **signed upload URLs** (for larger/multi-file uploads) |
| `/prod/v1/trainers/datasets/{dataset_id}/status`              | `GET`    | Get a dataset status                                       |
| `/prod/v1/trainers/datasets`                                  | `GET`    | List datasets                                              |
| `/prod/v1/trainers/datasets/{dataset_id}`                     | `DELETE` | Delete a dataset                                           |

***

## Common Parameters

| Field        | Type   | Description                                                                                                       |
| ------------ | ------ | ----------------------------------------------------------------------------------------------------------------- |
| `id`         | string | Stable identifier for this dataset (used as `dataset_id` in API paths for upload/status/delete)                   |
| `name`       | string | Human-readable dataset name (used as `dataset_name` in training job requests; must be unique within your account) |
| `status`     | string | One of: `DRAFT`, `UPLOADING`, `READY`, `FAILED`                                                                   |
| `created_at` | string | ISO 8601 timestamp (microsecond precision, e.g. `2025-07-22T13:05:16.143086`)                                     |
| `updated_at` | string | ISO 8601 timestamp (microsecond precision, e.g. `2025-07-22T13:05:16.143086`)                                     |
| `error`      | object | Present when `status = FAILED`                                                                                    |

***

## Create a dataset

Create a new dataset resource (metadata only) that you will upload training files into. Right after creation, the dataset is empty (no files uploaded yet) and its `status` is `DRAFT`.

```
POST /prod/v1/trainers/datasets
```

### Request body

| Field  | Type   | Required | Description                                                                                                                                                                                   |
| ------ | ------ | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name` | string | no       | Human-readable dataset name. Must be unique within your account. This value is used as `dataset_name` in training job requests. If omitted, RunComfy generates one (e.g. `ds_...`). |

### Request example

```bash theme={null}
curl --request POST \
  --url "https://trainer-api.runcomfy.net/prod/v1/trainers/datasets" \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer <token>" \
  --data '{
    "name": "<YOUR_DATASET_NAME>"
  }'
```

### Response example

```json theme={null}
{
  "id": "{dataset_id}",
  "name": "{dataset_name}",
  "status": "DRAFT",
  "created_at": "2026-01-31T10:20:30.143086",
  "updated_at": "2026-01-31T10:20:30.143086"
}
```
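To script the remaining steps you need the `id` from this response. A minimal sketch, assuming `RUNCOMFY_TOKEN` holds your API token (a hypothetical variable name) and that responses are flat JSON objects like the example above. `json_string_field` does naive text matching with `sed`, not real JSON parsing, and the dataset name is inserted into the body without escaping; prefer a proper JSON tool for anything beyond simple names.

```bash
# Hypothetical helpers; RUNCOMFY_TOKEN is an assumed variable holding your token.
RUNCOMFY_TRAINER_API="https://trainer-api.runcomfy.net"

# Prints the value of the first "FIELD": "string" pair read from stdin.
# Naive text matching, not a JSON parser.
json_string_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Creates a dataset named $1 and prints its id (the dataset_id for API paths).
create_dataset() {
  local body
  body=$(curl --silent --show-error --fail \
    --request POST \
    --url "$RUNCOMFY_TRAINER_API/prod/v1/trainers/datasets" \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer $RUNCOMFY_TOKEN" \
    --data "{\"name\": \"$1\"}") || return 1
  printf '%s\n' "$body" | json_string_field id
}
```

For example, `dataset_id=$(create_dataset my_dogs)` captures the new dataset's `id`.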

***

## Upload a dataset file (≤150MB)

Use this endpoint for files up to 150MB. For larger files or multi-file batches, use **Get signed upload URLs**.

**Rules (important):**

* **Size limit**: **≤150MB per file** (150,000,000 bytes). For larger files, use signed URLs.

* **Supported file types**: images, videos, and caption `.txt` files.

* **Caption naming rule (critical for LoRA / AI Toolkit)**: each image/video must have a caption file with the **same base filename**.
  * Example: `img_0001.jpg` ↔ `img_0001.txt`
  * Example: `clip_0001.mp4` ↔ `clip_0001.txt`

* **Track upload success per file**: check the response for each upload request. If an upload fails, the response returns an error and the file is **not** added to the dataset.

* If the **same filename** is uploaded multiple times within the same `dataset_id`, the **latest upload overwrites** the previous one.

* In `curl --form "file=@./path/to/file"`, the `@./path/to/file` is a local path on the machine running `curl` (relative to your current directory or an absolute path).
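A missing caption breaks the naming rule without any upload error, so it can help to check a folder before uploading. This sketch prints every image or video in a directory that has no `.txt` file with the same base filename. The helper name is hypothetical, and the extension list is an assumption; the docs only say images and videos are supported.

```bash
# Hypothetical helper: prints media files in DIR that lack a same-named .txt caption.
missing_captions() {
  local dir="$1" f ext base
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    ext=$(printf '%s' "${f##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
      jpg|jpeg|png|webp|mp4|mov|webm) ;;   # assumed media extensions
      *) continue ;;                        # captions and anything else
    esac
    base="${f%.*}"                          # img_0001.jpg -> img_0001
    [ -f "$base.txt" ] || printf '%s\n' "${f##*/}"
  done
}
```

Empty output from `missing_captions ./dataset` means every image and video has a caption.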

```
POST /prod/v1/trainers/datasets/{dataset_id}/upload
```

### Request

* `file` (required): the file to upload, sent as a `multipart/form-data` field

### Request example

```bash theme={null}
curl --request POST \
  --url "https://trainer-api.runcomfy.net/prod/v1/trainers/datasets/{dataset_id}/upload" \
  --header "Authorization: Bearer <token>" \
  --form "file=@./dog_01.jpg"
```

### Response example

```json theme={null}
{
  "id": "{dataset_id}",
  "name": "{dataset_name}",
  "object": "file",
  "bytes": 2134567,
  "created_at": "2026-01-31T10:21:05.143086",
  "filename": "dog_01.jpg"
}
```
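To follow the per-file tracking rule when uploading many small files, check curl's exit status for each request. In this sketch `upload_one` and `upload_files` are hypothetical helpers, `RUNCOMFY_TOKEN` is an assumed variable holding your token, and `--fail` makes curl exit non-zero on an HTTP error response so failed files can be listed and retried.

```bash
# Hypothetical helpers; RUNCOMFY_TOKEN is an assumed variable holding your token.
RUNCOMFY_TRAINER_API="https://trainer-api.runcomfy.net"

# Uploads one file; curl --fail returns non-zero on an HTTP error response.
upload_one() {
  curl --silent --show-error --fail --output /dev/null \
    --request POST \
    --url "$RUNCOMFY_TRAINER_API/prod/v1/trainers/datasets/$1/upload" \
    --header "Authorization: Bearer $RUNCOMFY_TOKEN" \
    --form "file=@$2"
}

# Uploads each file, reporting failures on stderr; non-zero if any failed.
upload_files() {
  local dataset_id="$1" file failed=0
  shift
  for file in "$@"; do
    if upload_one "$dataset_id" "$file"; then
      printf 'ok %s\n' "$file"
    else
      printf 'FAILED %s\n' "$file" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

For example, `upload_files "$dataset_id" ./dataset/*` prints `ok <file>` for each success and returns non-zero if any upload failed.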

***

## Get signed upload URLs (file size > 150MB)

This endpoint returns short-lived signed URLs (typically pointing to object storage) that you upload file bytes to directly. Use it when a file is larger than 150MB.

```
POST /prod/v1/trainers/datasets/{dataset_id}/get-upload-endpoint
```

### Request body

Provide `filenameToByteSize`, a map of `filename -> size_in_bytes` with one entry per file you want a signed URL for.

**Note:** each `size_in_bytes` value must exactly match the file's actual size in bytes.

RunComfy generates each signed upload URL for the byte size you provide. If a size is wrong (larger or smaller than the real file), the storage service may reject the upload.

```json theme={null}
{
  "filenameToByteSize": {
    "img_0001.jpg": 2000000,
    "img_0001.txt": 12000,
    "img_0002.jpg": 3100000,
    "img_0002.txt": 14000
  }
}
```
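Because sizes must match exactly, it is safer to compute them than to type them. This sketch builds the request body from local files using `wc -c`, which counts bytes. The helper name is hypothetical, keys are the files' base names (so two files with the same name in different folders would collide), and filenames containing double quotes or backslashes are not JSON-escaped.

```bash
# Hypothetical helper: prints {"filenameToByteSize": {...}} for the given files.
filename_to_byte_size_json() {
  local f size entries=""
  for f in "$@"; do
    [ -f "$f" ] || { echo "not a file: $f" >&2; return 1; }
    size=$(wc -c < "$f" | tr -d '[:space:]')    # exact size in bytes
    entries="$entries${entries:+, }\"${f##*/}\": $size"
  done
  printf '{"filenameToByteSize": {%s}}\n' "$entries"
}
```

For example, pass `--data "$(filename_to_byte_size_json ./dataset/*)"` to the `get-upload-endpoint` request.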

### Request example

```bash theme={null}
curl --request POST \
  --url "https://trainer-api.runcomfy.net/prod/v1/trainers/datasets/{dataset_id}/get-upload-endpoint" \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer <token>" \
  --data '{
    "filenameToByteSize": {
      "img_0001.jpg": 2000000,
      "img_0001.txt": 12000,
      "img_0002.jpg": 3100000,
      "img_0002.txt": 14000
    }
  }'
```

### Response example

```json theme={null}
{
  "uploads": {
    "img_0001.jpg": {
      "upload_url": "https://storage.example.com/presigned/datasets/ds_123/img_0001.jpg?X-Amz-Signature=...",
      "method": "PUT",
      "headers": {
        "Content-Type": "image/jpeg"
      },
      "expires_at": "2026-01-31T10:40:30Z"
    },
    "img_0001.txt": {
      "upload_url": "https://storage.example.com/presigned/datasets/ds_123/img_0001.txt?X-Amz-Signature=...",
      "method": "PUT",
      "headers": {
        "Content-Type": "text/plain"
      },
      "expires_at": "2026-01-31T10:40:30Z"
    },
    "img_0002.jpg": {
      "upload_url": "https://storage.example.com/presigned/datasets/ds_123/img_0002.jpg?X-Amz-Signature=...",
      "method": "PUT",
      "headers": {
        "Content-Type": "image/jpeg"
      },
      "expires_at": "2026-01-31T10:40:30Z"
    },
    "img_0002.txt": {
      "upload_url": "https://storage.example.com/presigned/datasets/ds_123/img_0002.txt?X-Amz-Signature=...",
      "method": "PUT",
      "headers": {
        "Content-Type": "text/plain"
      },
      "expires_at": "2026-01-31T10:40:30Z"
    }
  }
}
```

### Upload bytes to the signed URL

Send each file with the `method` and `headers` returned for that file in the response.

```bash theme={null}
curl -X PUT \
  --upload-file "./img_0001.jpg" \
  -H "Content-Type: image/jpeg" \
  "<upload_url>"
```

#### Notes

* If a signed URL expires, call `get-upload-endpoint` again to get a fresh URL.
* **Track upload success per file**: your client should record whether each `PUT` succeeded. A successful `PUT` typically returns HTTP **200** or **204**. If a `PUT` fails, the response returns an error and the file is **not** added to the dataset.
* After all files have uploaded successfully, poll `GET /prod/v1/trainers/datasets/{dataset_id}/status` until `READY`.
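Tracking each `PUT` mostly means checking its HTTP status code. This sketch sends one file to a signed URL and reports the outcome, assuming you have already extracted that file's `upload_url` and `Content-Type` from the response. Treating any 2xx code as success is an assumption that covers the 200 and 204 responses mentioned above; the helper names are hypothetical.

```bash
# Returns success for any 2xx HTTP status code (assumption: 200/204 or other 2xx).
is_put_success() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# PUTs FILE to UPLOAD_URL with CONTENT_TYPE and reports the result.
# curl writes 000 as the status code when no HTTP response was received.
put_signed() {
  local file="$1" upload_url="$2" content_type="$3" code
  code=$(curl --silent --output /dev/null --write-out '%{http_code}' \
    --request PUT --upload-file "$file" \
    --header "Content-Type: $content_type" \
    "$upload_url")
  if is_put_success "$code"; then
    printf 'ok %s\n' "$file"
  else
    printf 'FAILED %s (HTTP %s)\n' "$file" "$code" >&2
    return 1
  fi
}
```

For example, `put_signed ./img_0001.jpg "$upload_url" image/jpeg` returns non-zero if that file needs to be retried.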

***

## Get a dataset status

After you finish uploading your dataset (direct upload or signed URLs), poll this endpoint until the dataset becomes `READY`. If it becomes `FAILED`, check the `error` field, fix the issue, and re-upload (or create a new dataset).

The response includes a `files` array so you can see which files are currently available in the dataset. **Only successfully uploaded files appear in `files`**; files that are still uploading or that failed to upload are not listed.

```
GET /prod/v1/trainers/datasets/{dataset_id}/status
```

### Request example

```bash theme={null}
curl --request GET \
  --url "https://trainer-api.runcomfy.net/prod/v1/trainers/datasets/{dataset_id}/status" \
  --header "Authorization: Bearer <token>"
```

### Response example

```json theme={null}
{
  "id": "{dataset_id}",
  "name": "{dataset_name}",
  "status": "READY",
  "files": [
    {
      "filename": "img_0001.png",
      "size_bytes": 215290
    },
    {
      "filename": "img_0001.txt",
      "size_bytes": 24
    }
  ],
  "created_at": "2026-01-31T10:20:30.143086",
  "updated_at": "2026-01-31T10:41:02.143086"
}
```
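The polling loop described above can be sketched as follows. `fetch_dataset_status`, `wait_for_dataset_ready`, and `RUNCOMFY_TOKEN` are hypothetical names, the default of 60 attempts 10 seconds apart is an arbitrary choice, and the `status` value is extracted with naive text matching rather than a JSON parser.

```bash
# Hypothetical helpers; RUNCOMFY_TOKEN is an assumed variable holding your token.
RUNCOMFY_TRAINER_API="https://trainer-api.runcomfy.net"

# Fetches the raw status JSON for dataset $1.
fetch_dataset_status() {
  curl --silent --show-error --fail \
    --url "$RUNCOMFY_TRAINER_API/prod/v1/trainers/datasets/$1/status" \
    --header "Authorization: Bearer $RUNCOMFY_TOKEN"
}

# Returns 0 once READY; returns 1 on FAILED (response printed to stderr),
# on a request error, or after MAX_ATTEMPTS polls.
wait_for_dataset_ready() {
  local dataset_id="$1" max_attempts="${2:-60}" interval="${3:-10}"
  local attempt=1 body status
  while [ "$attempt" -le "$max_attempts" ]; do
    body=$(fetch_dataset_status "$dataset_id") || return 1
    # Naive extraction of the "status" value (not a JSON parser).
    status=$(printf '%s\n' "$body" \
      | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([A-Z]*\)".*/\1/p' \
      | head -n 1)
    case "$status" in
      READY) return 0 ;;
      FAILED) printf '%s\n' "$body" >&2; return 1 ;;
    esac
    # DRAFT or UPLOADING: wait and poll again.
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "timed out waiting for dataset $dataset_id" >&2
  return 1
}
```

For example, `wait_for_dataset_ready "$dataset_id" && echo ready` waits up to about ten minutes with the defaults; on `FAILED`, the full response, including `error`, is printed to stderr.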

***

## List datasets

List all datasets in your account, including their current `status`. Use it to look up a dataset's `name` (used as `dataset_name` in training job requests) and `id` (used as `dataset_id` in API paths).

```
GET /prod/v1/trainers/datasets
```

### Request example

```bash theme={null}
curl --request GET \
  --url "https://trainer-api.runcomfy.net/prod/v1/trainers/datasets" \
  --header "Authorization: Bearer <token>"
```

### Response example

```json theme={null}
{
  "datasets": [
    {
      "id": "{dataset_id}",
      "name": "{dataset_name}",
      "status": "DRAFT",
      "created_at": "2026-01-31T10:20:30.143086",
      "updated_at": "2026-01-31T10:20:30.143086"
    },
    {
      "id": "{dataset_id}",
      "name": "{dataset_name}",
      "status": "READY",
      "created_at": "2026-01-31T10:20:30.143086",
      "updated_at": "2026-01-31T10:20:30.143086"
    }
  ]
}
```

***

## Delete a dataset

Permanently delete a dataset by `dataset_id`. Deletion is irreversible, so only delete datasets you no longer need for training.

```
DELETE /prod/v1/trainers/datasets/{dataset_id}
```

### Request example

```bash theme={null}
curl --request DELETE \
  --url "https://trainer-api.runcomfy.net/prod/v1/trainers/datasets/{dataset_id}" \
  --header "Authorization: Bearer <token>"
```

### Response example

```json theme={null}
{
  "id": "{dataset_id}",
  "name": "{dataset_name}",
  "deleted": true
}
```
