Upload from folder

Walk a local folder and upload every image to a dataset, parallel and idempotent.

upload_dataset_from_folder() walks a local directory of images recursively, creates the destination dataset if needed, and uploads everything through a thread pool. By default, first-level subdirectories become virtual folders on the dataset. Re-runs are idempotent: duplicate filenames are counted as skips rather than recorded as failures.

from pictograph import Client
from pictograph.workflows import upload_dataset_from_folder

client = Client()

report = upload_dataset_from_folder(
    client,
    dataset_name="road-signs",
    folder="./road_signs",
)
print(f"{report.images_uploaded} uploaded, {report.images_skipped} skipped")

Signature

upload_dataset_from_folder(
    client: Client,
    dataset_name: str,
    folder: str | Path,
    *,
    organize_by_class: bool = True,
    parallel: bool = True,
    max_workers: int = 8,
    skip_existing: bool = True,
    create_if_missing: bool = True,
    progress: Callable[[int, int, str | None], None] | None = None,
) -> UploadReport

| Argument | Default | Purpose |
|---|---|---|
| dataset_name | required | Destination dataset |
| folder | required | Local directory (walked recursively) |
| organize_by_class | True | First-level subdirectories become virtual folders |
| parallel | True | Use a thread pool |
| max_workers | 8 | Pool size; higher values risk hitting the rate limit |
| skip_existing | True | Treat duplicate-filename conflicts as skips, not failures |
| create_if_missing | True | Create the dataset if it doesn't exist (else NotFoundError) |
| progress | None | (completed, total, filename) callback fired after each file |

Folder layout convention

With organize_by_class=True (the default), the first-level subdirectory becomes the virtual folder:

./road_signs/
├── stop/         → /stop on the dataset
│   ├── 001.jpg
│   └── 002.jpg
├── yield/        → /yield
│   └── 003.jpg
└── 004.jpg       → / (root)

Nested subdirectories collapse — ./road_signs/stop/night/005.jpg still lands in /stop. Pass organize_by_class=False to put every file at the root.

Supported extensions: .jpg, .jpeg, .png, .webp, .bmp, .tif, .tiff, .gif, .heic.
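
The layout rules above can be mirrored locally to preview where each file would land before uploading anything. The sketch below is only an illustration of the documented convention: virtual_folder and is_supported are hypothetical helpers, not part of pictograph, and case-insensitive extension matching is an assumption.

```python
from pathlib import Path

# Extensions listed on this page; case-insensitive matching is an assumption.
SUPPORTED = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tif", ".tiff", ".gif", ".heic"}


def virtual_folder(root: Path, file: Path, organize_by_class: bool = True) -> str:
    """Return the dataset folder a file would land in under the rules above."""
    parts = file.relative_to(root).parts  # e.g. ("stop", "night", "005.jpg")
    if organize_by_class and len(parts) > 1:
        return "/" + parts[0]  # only the first-level directory is kept
    return "/"  # file sits directly under root, or organize_by_class=False


def is_supported(file: Path) -> bool:
    """True when the file's extension is in the supported list."""
    return file.suffix.lower() in SUPPORTED


root = Path("road_signs")
print(virtual_folder(root, root / "stop" / "night" / "005.jpg"))  # /stop
print(virtual_folder(root, root / "004.jpg"))                     # /
```

Running a helper like this over the folder first is a cheap way to spot files that would collapse into an unexpected folder or be ignored because of their extension.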

Idempotency

Re-running the same call on a dataset that already has matching filenames is safe: those files are counted in images_skipped. With skip_existing=False, the same duplicate-filename conflicts are recorded in report.failures instead of being counted as skips.

# First run: uploads everything (this example assumes ./road_signs holds 100 supported images).
report = upload_dataset_from_folder(client, "road-signs", "./road_signs")
assert report.images_uploaded == 100 and report.images_skipped == 0

# Second run: skips everything that's already there.
report = upload_dataset_from_folder(client, "road-signs", "./road_signs")
assert report.images_uploaded == 0 and report.images_skipped == 100
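
To make the skip-versus-failure accounting concrete, here is a minimal sketch of how a thread pool could tally per-file outcomes. It is not the library's implementation: upload_all, fake_upload, and the DuplicateFilename exception are hypothetical stand-ins, and the real client may signal conflicts differently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class DuplicateFilename(Exception):
    """Hypothetical stand-in for a duplicate-filename conflict."""


def upload_all(paths, upload_one, *, skip_existing=True, max_workers=8):
    """Run upload_one over paths in a pool and tally the outcomes."""
    uploaded, skipped, failures = 0, 0, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upload_one, p): p for p in paths}
        # Tallies are updated only in this thread, so no lock is needed.
        for future in as_completed(futures):
            path = futures[future]
            try:
                future.result()
                uploaded += 1
            except DuplicateFilename as exc:
                if skip_existing:
                    skipped += 1
                else:
                    failures.append((path, str(exc)))
            except Exception as exc:  # other per-file errors are recorded, not raised
                failures.append((path, str(exc)))
    return uploaded, skipped, failures


# Fake server state: names already present in the dataset.
existing = {"001.jpg"}


def fake_upload(name):
    if name in existing:
        raise DuplicateFilename(f"{name} already exists")


print(upload_all(["001.jpg", "002.jpg"], fake_upload))
# (1, 1, [])
print(upload_all(["001.jpg", "002.jpg"], fake_upload, skip_existing=False))
# (1, 0, [('001.jpg', '001.jpg already exists')])
```

In this sketch the only difference between the two modes is which bucket a conflict lands in; nothing is overwritten in either case.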

Progress callback

def on_progress(done: int, total: int, filename: str | None) -> None:
    print(f"[{done}/{total}] {filename}")

upload_dataset_from_folder(
    client, "road-signs", "./road_signs", progress=on_progress,
)

The callback fires once per file, regardless of success or failure.
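
This page does not say which thread invokes the callback. If it can be called from pool workers (an assumption), a callback that mutates shared state should guard it. The ProgressTracker class below is a hypothetical example of such a callable; it is exercised here with simulated calls rather than a real upload.

```python
import threading
from typing import List, Optional


class ProgressTracker:
    """Callable matching the (completed, total, filename) callback shape."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.seen: List[Optional[str]] = []
        self.last_percent = 0.0

    def __call__(self, done: int, total: int, filename: Optional[str]) -> None:
        with self._lock:  # guard shared state in case calls come from worker threads
            self.seen.append(filename)
            if total:
                # max() keeps the figure monotonic if calls arrive out of order.
                self.last_percent = max(self.last_percent, 100.0 * done / total)


tracker = ProgressTracker()
# Simulated calls; with the real function this would be passed as progress=tracker.
for i, name in enumerate(["001.jpg", "002.jpg", "003.jpg", "004.jpg"], start=1):
    tracker(i, 4, name)
print(f"{tracker.last_percent:.0f}% after {len(tracker.seen)} files")  # 100% after 4 files
```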

Inspecting the report

@dataclass
class UploadReport:
    dataset_name: str
    images_attempted: int
    images_uploaded: int
    images_skipped: int
    failures: list[UploadFailure]  # each carries .path and .reason

    @property
    def success(self) -> bool: ...

success is True only when there are zero failures and at least one file uploaded. An empty folder returns a report with success=False.
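
Read literally, the rule above also means a re-run in which every file is skipped reports success=False, because nothing new was uploaded. The sketch below restates the rule on a stand-in class to make that visible; ReportSketch is illustrative only and is not the library's UploadReport.

```python
from dataclasses import dataclass, field


@dataclass
class ReportSketch:
    """Illustrative mirror of a few UploadReport fields; not the library's class."""
    images_uploaded: int = 0
    images_skipped: int = 0
    failures: list = field(default_factory=list)

    @property
    def success(self) -> bool:
        # Rule as stated above: zero failures AND at least one upload.
        return not self.failures and self.images_uploaded > 0


print(ReportSketch(images_uploaded=100).success)  # True
print(ReportSketch().success)                     # False (empty folder)
print(ReportSketch(images_skipped=100).success)   # False (nothing new uploaded)
```

If that reading holds, scripts that re-run uploads routinely may prefer to check that report.failures is empty rather than rely on success alone.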

Errors

| Status | Exception | Cause |
|---|---|---|
| - | FileNotFoundError | folder doesn't exist or isn't a directory |
| 404 | NotFoundError | dataset_name missing and create_if_missing=False |

Per-file errors (network, validation, conflict) are recorded in report.failures, not raised.
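
Because per-file errors are collected rather than raised, a caller has to inspect report.failures explicitly. Here is a small sketch of one way to summarize them, relying only on the documented .failures, .path, and .reason attributes. summarize_failures is a hypothetical helper, the stand-in report is built with SimpleNamespace, and the reason strings are made up (the sketch also assumes reasons are hashable values such as strings).

```python
from collections import Counter
from types import SimpleNamespace


def summarize_failures(report) -> Counter:
    """Count failures by reason, relying only on .failures and .reason."""
    return Counter(f.reason for f in report.failures)


# Stand-in report with made-up reasons; real reason strings may differ.
report = SimpleNamespace(failures=[
    SimpleNamespace(path="stop/001.jpg", reason="timeout"),
    SimpleNamespace(path="stop/002.jpg", reason="timeout"),
    SimpleNamespace(path="yield/003.jpg", reason="invalid image"),
])

for reason, count in summarize_failures(report).most_common():
    print(f"{count} x {reason}")
```

Grouping by reason makes it easier to tell transient problems, which may be worth a re-run, from files that will fail every time.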

See also

  • Full pipeline — chains upload with annotate + train
  • Images — the underlying client.images.upload() method