Upload from folder

Walk a local folder and upload every image to a dataset, parallel and idempotent.

upload_dataset_from_folder() walks a local directory of images recursively, creates the destination dataset if needed, and uploads every supported image through a thread pool. By default, first-level subdirectories become virtual folders on the dataset. Re-runs are idempotent: duplicate filenames are counted as skipped rather than recorded as failures.

from pictograph import Client
from pictograph.workflows import upload_dataset_from_folder

client = Client()

report = upload_dataset_from_folder(
    client,
    dataset_name="road-signs",
    folder="./road_signs",
)
print(f"{report.images_uploaded} uploaded, {report.images_skipped} skipped")

Signature

upload_dataset_from_folder(
    client: Client,
    dataset_name: str,
    folder: str | Path,
    *,
    organize_by_class: bool = True,
    parallel: bool = True,
    max_workers: int = 8,
    skip_existing: bool = True,
    create_if_missing: bool = True,
    progress: Callable[[int, int, str | None], None] | None = None,
) -> UploadReport

Argument	Default	Purpose
`dataset_name`	required	Destination dataset
`folder`	required	Local directory (walked recursively)
`organize_by_class`	`True`	First-level subdirectories become virtual folders
`parallel`	`True`	Use a thread pool
`max_workers`	`8`	Pool size — higher values risk hitting the rate limit
`skip_existing`	`True`	Treat duplicate-filename conflicts as skips, not failures
`create_if_missing`	`True`	Create the dataset if it doesn’t exist (else `NotFoundError`)
`progress`	`None`	`(completed, total, filename)` callback fired after each file

Folder layout convention

With organize_by_class=True (the default), the first-level subdirectory becomes the virtual folder:

./road_signs/
├── stop/         → /stop on the dataset
│   ├── 001.jpg
│   └── 002.jpg
├── yield/        → /yield
│   └── 003.jpg
└── 004.jpg       → / (root)

Nested subdirectories collapse — ./road_signs/stop/night/005.jpg still lands in /stop. Pass organize_by_class=False to put every file at the root.
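
The mapping rule above can be sketched in a few lines of pathlib. This is only an illustration of the documented convention, not the library's implementation; `virtual_folder_for` is a hypothetical helper name.

```python
from __future__ import annotations

from pathlib import Path


def virtual_folder_for(
    root: str | Path,
    file_path: str | Path,
    organize_by_class: bool = True,
) -> str:
    """Return the virtual folder a file would land in (hypothetical helper)."""
    if not organize_by_class:
        return "/"  # everything goes to the dataset root
    relative = Path(file_path).relative_to(root)
    # The last part is the filename; anything before it is a subdirectory.
    if len(relative.parts) < 2:
        return "/"  # file sits directly in the top-level folder
    # Only the first-level subdirectory counts; deeper levels collapse into it.
    return "/" + relative.parts[0]


print(virtual_folder_for("./road_signs", "./road_signs/stop/night/005.jpg"))  # /stop
print(virtual_folder_for("./road_signs", "./road_signs/004.jpg"))             # /
```

Running a helper like this over a folder before uploading is a cheap way to preview which virtual folders a run would create.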

Supported extensions: .jpg, .jpeg, .png, .webp, .bmp, .tif, .tiff, .gif, .heic.
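
To see ahead of time which files match the extension list, a pre-flight scan along these lines may help. It is a sketch: `partition_by_extension` is a hypothetical helper, and treating extensions case-insensitively (so `.JPG` counts as `.jpg`) is an assumption, not documented behavior.

```python
from __future__ import annotations

from pathlib import Path

# Extension list copied from the documentation above.
SUPPORTED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tif", ".tiff", ".gif", ".heic",
}


def partition_by_extension(folder: str | Path) -> tuple[list[Path], list[Path]]:
    """Split every file under `folder` into (supported, other) lists."""
    supported: list[Path] = []
    other: list[Path] = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue  # skip directories
        # Assumption: extension matching is case-insensitive.
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            other.append(path)
    return supported, other
```

The second list is useful for spotting stray files (notes, archives, sidecar metadata) that sit next to the images.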

Idempotency

Re-running the same call on a dataset that already has matching filenames is safe: those uploads are counted in images_skipped. With skip_existing=False, the same duplicate-filename conflicts are recorded in report.failures instead of being counted as skips.

# First run: uploads everything (assumes ./road_signs holds 100 images and the dataset starts empty).
report = upload_dataset_from_folder(client, "road-signs", "./road_signs")
assert report.images_uploaded == 100 and report.images_skipped == 0

# Second run — skips everything that's already there.
report = upload_dataset_from_folder(client, "road-signs", "./road_signs")
assert report.images_uploaded == 0 and report.images_skipped == 100
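
The bookkeeping behind those two runs can be modeled without a server. The toy below stands in for the dataset with an in-memory set of filenames; `ToyReport` and `toy_upload` are made-up names, and the model only mirrors the counting rules described above, not how the library detects conflicts.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyReport:
    images_uploaded: int = 0
    images_skipped: int = 0
    failures: list[str] = field(default_factory=list)


def toy_upload(
    dataset: set[str], filenames: list[str], skip_existing: bool = True
) -> ToyReport:
    """Count uploads, skips, and failures against an in-memory 'dataset'."""
    report = ToyReport()
    for name in filenames:
        if name not in dataset:
            dataset.add(name)            # new file: upload succeeds
            report.images_uploaded += 1
        elif skip_existing:
            report.images_skipped += 1   # duplicate: counted as a skip
        else:
            report.failures.append(name)  # duplicate: recorded as a failure
    return report


files = [f"{i:03d}.jpg" for i in range(1, 101)]
dataset: set[str] = set()
first = toy_upload(dataset, files)
second = toy_upload(dataset, files)
strict = toy_upload(dataset, files, skip_existing=False)
print(first.images_uploaded, second.images_skipped, len(strict.failures))
```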

Progress callback

def on_progress(done: int, total: int, filename: str | None) -> None:
    print(f"[{done}/{total}] {filename}")

upload_dataset_from_folder(
    client, "road-signs", "./road_signs", progress=on_progress,
)

The callback fires once per file, regardless of success or failure.
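
Because uploads run through a thread pool, the callback may be invoked from worker threads; that is an assumption here, not something the documentation states. A callback that mutates shared state can guard it with a lock, as in this sketch. `ProgressTracker` is a hypothetical class, and the thread pool below only simulates the calls the uploader would make.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor


class ProgressTracker:
    """Callable matching the (completed, total, filename) callback shape."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.calls = 0
        self.last_done = 0
        self.total = 0

    def __call__(self, done: int, total: int, filename: str | None) -> None:
        with self._lock:
            self.calls += 1
            # Calls may arrive out of order, so keep the highest count seen.
            self.last_done = max(self.last_done, done)
            self.total = total

    @property
    def fraction(self) -> float:
        with self._lock:
            return self.last_done / self.total if self.total else 0.0


# Simulated callbacks from 8 worker threads (stand-in for the real uploader).
tracker = ProgressTracker()
with ThreadPoolExecutor(max_workers=8) as pool:
    for i in range(1, 51):
        pool.submit(tracker, i, 50, f"{i:03d}.jpg")
print(f"{tracker.calls} callbacks, {tracker.fraction:.0%} complete")
```

An instance would be passed as `progress=tracker` in place of a plain function.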

Inspecting the report

@dataclass
class UploadReport:
    dataset_name: str
    images_attempted: int
    images_uploaded: int
    images_skipped: int
    failures: list[UploadFailure]  # each carries .path and .reason

    @property
    def success(self) -> bool: ...

success is True only when there are zero failures and at least one file uploaded. An empty folder returns a report with success=False.
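
A short summary helper can make a report easier to scan after a large run. The sketch below defines stand-in `UploadFailure` and `UploadReport` classes that mirror the fields listed above so it runs on its own; the `success` body follows the rule as described here, and `summarize` is a hypothetical helper, not part of the library.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UploadFailure:  # stand-in mirroring the documented .path and .reason
    path: str
    reason: str


@dataclass
class UploadReport:  # stand-in mirroring the documented fields
    dataset_name: str
    images_attempted: int
    images_uploaded: int
    images_skipped: int
    failures: list[UploadFailure] = field(default_factory=list)

    @property
    def success(self) -> bool:
        # Rule as described above: no failures and at least one upload.
        return not self.failures and self.images_uploaded > 0


def summarize(report: UploadReport) -> str:
    """Build a one-line headline plus a per-reason failure breakdown."""
    lines = [
        f"{report.dataset_name}: {report.images_uploaded}/"
        f"{report.images_attempted} uploaded, {report.images_skipped} skipped, "
        f"{len(report.failures)} failed"
    ]
    by_reason = Counter(f.reason for f in report.failures)
    for reason, count in by_reason.most_common():
        lines.append(f"  {count} x {reason}")
    return "\n".join(lines)
```

With the real library, `summarize(report)` would take the object returned by upload_dataset_from_folder(), assuming it exposes the same attributes.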

Errors

Status	Exception	Cause
n/a	`FileNotFoundError`	`folder` doesn’t exist or isn’t a directory
404	`NotFoundError`	`dataset_name` missing and `create_if_missing=False`

Per-file errors (network, validation, conflict) are recorded in report.failures, not raised.