Auto-annotate a dataset

Run SAM3 across every unlabelled image in a dataset and save the results.

auto_annotate_dataset() runs SAM3 over a dataset and saves the resulting annotations. By default it runs in batch mode (one async job over many images), which is the right call for anything above ~10 images. Text mode (one synchronous prompt per image, per class) is available for debugging and for very small datasets.

from pictograph import Client
from pictograph.workflows import auto_annotate_dataset

client = Client()

report = auto_annotate_dataset(
    client,
    dataset_name="road-signs",
    classes=[("stop_sign", "bbox"), ("yield", "polygon")],
)
print(f"{report.annotations_added} annotations across {report.images_processed} images")

Signature

auto_annotate_dataset(
    client: Client,
    dataset_name: str,
    classes: Sequence[BatchClass | tuple[str, str] | dict[str, str]],
    *,
    mode: AnnotateMode = "batch",
    confidence_threshold: float = 0.5,
    overwrite: bool = False,
    max_images: int | None = None,
    poll_interval: float = 5.0,
    timeout: float = 1800.0,
) -> AnnotateReport

| Argument | Default | Purpose |
| --- | --- | --- |
| dataset_name | required | Project name |
| classes | required | What to detect; see “Class specs” below |
| mode | "batch" | batch (async multi-image) or text (synchronous per-image) |
| confidence_threshold | 0.5 | SAM3 score cutoff (0–1) |
| overwrite | False | When False, skip images that already have annotations |
| max_images | None | Cap on the number of images (useful for dry-runs) |
| poll_interval | 5.0 | Batch mode only: seconds between status polls |
| timeout | 1800.0 | Batch mode only: max seconds to wait |

Class specs

classes accepts three shapes — pick whichever is shortest:

# 1. Tuples — name + output_type
classes=[("stop_sign", "bbox"), ("yield", "polygon")]

# 2. Dicts
classes=[
    {"name": "stop_sign", "output_type": "bbox"},
    {"name": "yield", "output_type": "polygon"},
]

# 3. BatchClass (canonical)
from pictograph.models.auto_annotate import BatchClass
classes=[BatchClass(name="stop_sign", output_type="bbox")]

Valid output_type values: "bbox", "polygon", "polyline", "keypoint".
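
All three shapes carry the same two fields, a name and an output_type. As a rough illustration of how they could be reduced to one form, here is a minimal sketch. It makes no claim about the library's internals: ClassSpec and normalize_class are hypothetical stand-ins defined here, not pictograph's BatchClass or any pictograph function, and "lane" is just an example name.

```python
from dataclasses import dataclass

# The four output types listed above.
VALID_OUTPUT_TYPES = {"bbox", "polygon", "polyline", "keypoint"}


@dataclass(frozen=True)
class ClassSpec:
    """Hypothetical stand-in for BatchClass; not the library's type."""
    name: str
    output_type: str


def normalize_class(spec) -> ClassSpec:
    """Turn a tuple, dict, or ClassSpec into a ClassSpec (illustrative only)."""
    if isinstance(spec, ClassSpec):
        result = spec
    elif isinstance(spec, dict):
        result = ClassSpec(spec["name"], spec["output_type"])
    else:
        name, output_type = spec  # expects a (name, output_type) pair
        result = ClassSpec(name, output_type)
    if result.output_type not in VALID_OUTPUT_TYPES:
        raise ValueError(f"unknown output_type: {result.output_type!r}")
    return result


# Mixed input shapes end up as the same kind of object.
specs = [
    ("stop_sign", "bbox"),
    {"name": "yield", "output_type": "polygon"},
    ClassSpec("lane", "polyline"),
]
normalized = [normalize_class(s) for s in specs]
```

Validating output_type up front in your own code is one way to catch a typo before the request is sent, rather than waiting for a 422 from the server.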

Batch vs text mode

mode="batch" (default) sends every image and every class to one async SAM3 job. The job is polled until it terminates; you get one report at the end. This is what you want for >10 images — it’s faster and cheaper per image.
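
To make the roles of poll_interval and timeout concrete, the following is a generic poll-until-terminal loop. It is a sketch of the general technique, not the workflow's actual code: wait_for_job, get_status, and the status names in TERMINAL are all assumptions made for illustration.

```python
import time
from typing import Callable

# Assumed terminal status names; the real job states may differ.
TERMINAL = {"completed", "failed", "cancelled"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns a terminal status or time runs out."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(poll_interval)


# Demo with canned statuses and a no-op sleep so it finishes instantly.
demo_statuses = iter(["queued", "running", "completed"])
final_status = wait_for_job(lambda: next(demo_statuses), sleep=lambda _s: None)
```

Passing sleep and clock as parameters is only there to keep the loop testable without real waiting.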

mode="text" runs one synchronous SAM3 text-prompt per image, per class. It’s slower (no batching) and saves annotations as they come back. Use it when you need to debug a single image or when the dataset is small enough that the batch warmup overhead isn’t worth it.
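
If you pick the mode programmatically, the ~10 image rule of thumb can be written down in a few lines. choose_mode and its batch_threshold parameter are hypothetical helpers for illustration; the library does not document an automatic switch.

```python
def choose_mode(image_count: int, batch_threshold: int = 10) -> str:
    """Return "batch" above the threshold and "text" otherwise (rule of thumb only)."""
    return "batch" if image_count > batch_threshold else "text"


small = choose_mode(3)     # small dataset or a single-image debug run
large = choose_mode(500)   # anything sizeable goes through one async job
```

The returned string could then be passed as the mode argument.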

Skip vs overwrite

By default the workflow skips images that already have at least one annotation. Set overwrite=True to re-annotate everything:

# Annotate only the unlabelled subset.
auto_annotate_dataset(client, "road-signs", classes=[...])

# Re-annotate every image (overwrites existing).
auto_annotate_dataset(client, "road-signs", classes=[...], overwrite=True)
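
The selection rule can be pictured as a filter over per-image annotation counts. This is a sketch under assumptions: select_images and the annotation_counts mapping are hypothetical, and applying max_images after the skip filter is a guess, since the order is not documented.

```python
from typing import Dict, List, Optional


def select_images(
    annotation_counts: Dict[str, int],
    overwrite: bool = False,
    max_images: Optional[int] = None,
) -> List[str]:
    """Pick the images to annotate from a {image_name: annotation_count} mapping."""
    # With overwrite=False, keep only images that have no annotations yet.
    selected = [
        name for name, count in annotation_counts.items()
        if overwrite or count == 0
    ]
    # Assumption: the cap is applied after skipping already-labelled images.
    return selected if max_images is None else selected[:max_images]


counts = {"a.jpg": 0, "b.jpg": 3, "c.jpg": 0}
unlabelled_only = select_images(counts)                 # ["a.jpg", "c.jpg"]
everything = select_images(counts, overwrite=True)      # all three images
dry_run = select_images(counts, max_images=1)           # ["a.jpg"]
```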

Inspecting the report

@dataclass
class AnnotateReport:
    dataset_name: str
    images_attempted: int
    images_processed: int
    images_skipped: int
    annotations_added: int
    failures: list[AnnotationFailure]
    job_id: str | None         # set only when mode="batch"

    @property
    def success(self) -> bool: ...

In batch mode, job_id lets you fetch the job later via client.auto_annotate.get_batch(job_id) (e.g. to surface progress in a UI) or cancel it.
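
A small helper can turn the report fields into a log line. The sketch below reads only the fields listed above; summarize is a hypothetical helper, the SimpleNamespace stands in for a real AnnotateReport, and its numbers, job id, and string failure entries are made-up placeholders (the fields of AnnotationFailure are not documented here).

```python
from types import SimpleNamespace


def summarize(report) -> str:
    """Build a short, human-readable summary from an AnnotateReport-like object."""
    lines = [
        f"{report.dataset_name}: "
        f"{report.images_processed}/{report.images_attempted} images processed, "
        f"{report.images_skipped} skipped, "
        f"{report.annotations_added} annotations added"
    ]
    if report.failures:
        lines.append(f"{len(report.failures)} image(s) failed")
    if report.job_id is not None:  # only set in batch mode
        lines.append(f"batch job: {report.job_id}")
    return "\n".join(lines)


# Stand-in object with invented values, used only to exercise the helper.
fake_report = SimpleNamespace(
    dataset_name="road-signs",
    images_attempted=12,
    images_processed=10,
    images_skipped=3,
    annotations_added=27,
    failures=["img_07.jpg", "img_09.jpg"],
    job_id="job_123",
)
summary = summarize(fake_report)
```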

Errors

| Status | Exception | Cause |
| --- | --- | --- |
| 404 | NotFoundError | Dataset doesn’t exist |
| 402 | PaymentRequiredError | Insufficient credits |
| 422 | ValidationError | Class name invalid or output_type not recognised |

Per-image failures are recorded in report.failures — they don’t raise.
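
In practice this means two checks: catch the request-level exceptions, then look at report.failures. The sketch below shows that pattern only. The import path for the real exception classes is not given on this page, so the three classes here are local stand-ins with the same names, and describe_outcome is a hypothetical helper.

```python
from types import SimpleNamespace


# Local stand-ins so the sketch runs on its own; in real code these
# would be imported from the library (import path not shown in this doc).
class NotFoundError(Exception):
    """Stand-in for the 404 exception."""


class PaymentRequiredError(Exception):
    """Stand-in for the 402 exception."""


class ValidationError(Exception):
    """Stand-in for the 422 exception."""


def describe_outcome(run) -> str:
    """Call run() and map raised errors or per-image failures to a message."""
    try:
        report = run()
    except NotFoundError:
        return "dataset not found"
    except PaymentRequiredError:
        return "insufficient credits"
    except ValidationError as exc:
        return f"invalid class spec: {exc}"
    # No exception: per-image problems still need a separate check.
    if report.failures:
        return f"finished with {len(report.failures)} per-image failure(s)"
    return "ok"


def _missing_dataset():
    raise NotFoundError("road-signs")


outcome_missing = describe_outcome(_missing_dataset)
outcome_partial = describe_outcome(lambda: SimpleNamespace(failures=["img_07.jpg"]))
outcome_clean = describe_outcome(lambda: SimpleNamespace(failures=[]))
```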
