---
title: Upload from folder
description: Walk a local folder and upload every image to a dataset, parallel and idempotent.
section: Workflows
order: 2
---
`upload_dataset_from_folder()` walks a local directory of images, creates the destination dataset if needed, and uploads every supported image file through a thread pool. First-level subdirectories become virtual folders on the dataset by default. Re-runs are idempotent: duplicate filenames are skipped, not failed.

```python
from pictograph import Client
from pictograph.workflows import upload_dataset_from_folder

client = Client()

report = upload_dataset_from_folder(
    client,
    dataset_name="road-signs",
    folder="./road_signs",
)
print(f"{report.images_uploaded} uploaded, {report.images_skipped} skipped")
```

## Signature

```python
upload_dataset_from_folder(
    client: Client,
    dataset_name: str,
    folder: str | Path,
    *,
    organize_by_class: bool = True,
    parallel: bool = True,
    max_workers: int = 8,
    skip_existing: bool = True,
    create_if_missing: bool = True,
    progress: Callable[[int, int, str | None], None] | None = None,
) -> UploadReport
```

| Argument | Default | Purpose |
| --- | --- | --- |
| `dataset_name` | required | Destination dataset |
| `folder` | required | Local directory (walked recursively) |
| `organize_by_class` | `True` | First-level subdirectories become virtual folders |
| `parallel` | `True` | Use a thread pool |
| `max_workers` | `8` | Pool size — higher values risk hitting the rate limit |
| `skip_existing` | `True` | Treat duplicate-filename conflicts as skips, not failures |
| `create_if_missing` | `True` | Create the dataset if it doesn't exist (else `NotFoundError`) |
| `progress` | `None` | `(completed, total, filename)` callback fired after each file |
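
To illustrate how `parallel`, `max_workers`, and `progress` might interact, here is a minimal sketch built on the standard library's `concurrent.futures`. It is not the library's implementation: `run_uploads` and `upload_one` are hypothetical names, and `upload_one` stands in for whatever performs a single upload.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional, Sequence

ProgressFn = Callable[[int, int, Optional[str]], None]


def run_uploads(
    filenames: Sequence[str],
    upload_one: Callable[[str], None],  # hypothetical per-file upload function
    *,
    parallel: bool = True,
    max_workers: int = 8,
    progress: Optional[ProgressFn] = None,
) -> list:
    """Run upload_one over filenames; return (filename, error) pairs for failures."""
    total = len(filenames)
    failures = []
    done = 0
    # A pool of size 1 behaves like a sequential loop, which models parallel=False.
    workers = max_workers if parallel else 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(upload_one, name): name for name in filenames}
        # as_completed is consumed in the calling thread, so `done` needs no lock.
        for future in as_completed(futures):
            name = futures[future]
            error = future.exception()
            if error is not None:
                failures.append((name, error))
            done += 1
            if progress is not None:
                # Fired once per file, whether it succeeded or failed.
                progress(done, total, name)
    return failures
```

In this sketch a larger `max_workers` only raises the number of uploads in flight at once, which is why a high value can run into a server-side rate limit.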

## Folder layout convention

With `organize_by_class=True` (the default), the **first-level subdirectory** becomes the virtual folder:

```
./road_signs/
├── stop/         → /stop on the dataset
│   ├── 001.jpg
│   └── 002.jpg
├── yield/        → /yield
│   └── 003.jpg
└── 004.jpg       → / (root)
```

Nested subdirectories collapse — `./road_signs/stop/night/005.jpg` still lands in `/stop`. Pass `organize_by_class=False` to put every file at the root.
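
The mapping rule above can be sketched with `pathlib`. The helper name `virtual_folder_for` is hypothetical and only mirrors the convention described here; the library may compute the folder differently.

```python
from pathlib import Path


def virtual_folder_for(root: Path, file: Path, organize_by_class: bool = True) -> str:
    """Return the dataset folder for `file`, following the convention above."""
    if not organize_by_class:
        return "/"
    parts = file.relative_to(root).parts  # e.g. ("stop", "night", "005.jpg")
    # A file directly under root has a single part (its own name) and goes to "/".
    if len(parts) == 1:
        return "/"
    # Only the first-level subdirectory counts; deeper levels collapse into it.
    return "/" + parts[0]


root = Path("./road_signs")
print(virtual_folder_for(root, root / "stop" / "night" / "005.jpg"))  # /stop
print(virtual_folder_for(root, root / "004.jpg"))                     # /
```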

Supported extensions: `.jpg`, `.jpeg`, `.png`, `.webp`, `.bmp`, `.tif`, `.tiff`, `.gif`, `.heic`.
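
To preview which files a run would pick up, a recursive walk filtered by this extension list might look like the sketch below. `iter_image_files` is a hypothetical helper, and the case-insensitive suffix match is an assumption; this page does not say how the library treats names such as `004.PNG`.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tif", ".tiff", ".gif", ".heic",
}


def iter_image_files(folder: Path) -> list:
    """Return every supported image under `folder`, sorted, searching recursively."""
    if not folder.is_dir():
        raise FileNotFoundError(f"{folder} does not exist or is not a directory")
    return sorted(
        p for p in folder.rglob("*")
        # Assumption: extensions are compared case-insensitively.
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```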

## Idempotency

Re-running the same call on a dataset that already has matching filenames is safe: those files are counted in `images_skipped` rather than uploaded again. With `skip_existing=False`, duplicate-filename conflicts are recorded in `report.failures` instead of being counted as skips.

```python
# First run: uploads everything (the asserts assume the folder holds exactly 100 images).
report = upload_dataset_from_folder(client, "road-signs", "./road_signs")
assert report.images_uploaded == 100 and report.images_skipped == 0

# Second run — skips everything that's already there.
report = upload_dataset_from_folder(client, "road-signs", "./road_signs")
assert report.images_uploaded == 0 and report.images_skipped == 100
```
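
The skip-versus-failure behavior can be pictured with a toy in-memory model. This is only an illustration of the counting rules described above, not the library's code; `simulate_run` and its `existing` set are hypothetical.

```python
def simulate_run(existing: set, filenames: list, skip_existing: bool = True) -> dict:
    """Toy model: `existing` is the set of filenames already on the dataset."""
    counts = {"uploaded": 0, "skipped": 0, "failed": 0}
    for name in filenames:
        if name in existing:
            # Duplicate filename: a skip by default, a recorded failure otherwise.
            counts["skipped" if skip_existing else "failed"] += 1
        else:
            existing.add(name)
            counts["uploaded"] += 1
    return counts


dataset = set()
files = ["001.jpg", "002.jpg", "003.jpg"]
print(simulate_run(dataset, files))  # first run: 3 uploaded
print(simulate_run(dataset, files))  # second run: 3 skipped
print(simulate_run(dataset, files, skip_existing=False))  # 3 failed
```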

## Progress callback

```python
def on_progress(done: int, total: int, filename: str | None) -> None:
    print(f"[{done}/{total}] {filename}")

upload_dataset_from_folder(
    client, "road-signs", "./road_signs", progress=on_progress,
)
```

The callback fires once per file, regardless of success or failure.
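
A callback that keeps state can be written as a callable object. The sketch below guards its state with a lock, on the assumption that the callback may be invoked from worker threads; this page does not state which thread calls it. `ProgressTracker` is a hypothetical name.

```python
import threading
from typing import Optional


class ProgressTracker:
    """Callable matching the (completed, total, filename) callback shape."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.seen = []       # filenames reported so far
        self.percent = 0.0   # highest completion percentage observed

    def __call__(self, done: int, total: int, filename: Optional[str]) -> None:
        with self._lock:
            self.seen.append(filename)
            current = 100.0 * done / total if total else 100.0
            # Keep the maximum so out-of-order calls never move progress backwards.
            self.percent = max(self.percent, current)
```

An instance can then be passed as `progress=ProgressTracker()` and inspected after the call returns.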

## Inspecting the report

```python
@dataclass
class UploadReport:
    dataset_name: str
    images_attempted: int
    images_uploaded: int
    images_skipped: int
    failures: list[UploadFailure]  # each carries .path and .reason

    @property
    def success(self) -> bool: ...
```

`success` is `True` only when there are zero failures **and** at least one file uploaded. An empty folder returns a report with `success=False`.
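
The rule for `success` can be written out as a small stand-in class. `ReportSketch` is hypothetical and only encodes the sentence above; it is not the library's `UploadReport`.

```python
from dataclasses import dataclass, field


@dataclass
class ReportSketch:  # hypothetical stand-in, not the library's UploadReport
    images_uploaded: int = 0
    images_skipped: int = 0
    failures: list = field(default_factory=list)

    @property
    def success(self) -> bool:
        # Zero failures AND at least one uploaded file, as described above.
        return not self.failures and self.images_uploaded > 0


print(ReportSketch().success)                   # empty folder
print(ReportSketch(images_uploaded=3).success)  # clean upload
```

Under this literal reading, a re-run in which every file is skipped would also give `success=False`, so a caller that treats skips as acceptable may prefer to check `report.failures` directly. That reading is worth confirming against the library's behavior.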

## Errors

| Status | Exception | Cause |
| --- | --- | --- |
| n/a | `FileNotFoundError` | `folder` doesn't exist or isn't a directory |
| 404 | `NotFoundError` | `dataset_name` missing and `create_if_missing=False` |

Per-file errors (network, validation, conflict) are recorded in `report.failures`, not raised.
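
Because per-file errors are collected rather than raised, a caller usually inspects `report.failures` after the run. The sketch below groups failures by reason; `group_failures` is a hypothetical helper, the sample objects are stand-ins with the same `.path` and `.reason` attributes, and the reason strings are invented for illustration.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_failures(failures) -> dict:
    """Map each failure reason to the list of paths that failed with it."""
    grouped = defaultdict(list)
    for failure in failures:
        # str() keeps this working whether reason/path are strings or richer objects.
        grouped[str(failure.reason)].append(str(failure.path))
    return dict(grouped)


# Stand-ins for UploadFailure; the real reason values may differ.
sample = [
    SimpleNamespace(path="stop/001.jpg", reason="conflict"),
    SimpleNamespace(path="yield/003.jpg", reason="network"),
    SimpleNamespace(path="stop/002.jpg", reason="conflict"),
]
for reason, paths in group_failures(sample).items():
    print(f"{reason}: {len(paths)} file(s)")
```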

## See also

- [Full pipeline](/docs/workflows/full-pipeline.md) — chains upload with annotate + train
- [Images](/docs/api-reference/images.md) — the underlying `client.images.upload()` method