Auto-annotate a dataset

Run SAM3 across every unlabelled image in a dataset and save the results.

auto_annotate_dataset() runs in batch mode by default: one async job covers many images, which is the right call for anything above ~10 images. Text mode (one synchronous prompt per image, per class) is available for debugging.

from pictograph import Client
from pictograph.workflows import auto_annotate_dataset

client = Client()

report = auto_annotate_dataset(
    client,
    dataset_name="road-signs",
    classes=[("stop_sign", "bbox"), ("yield", "polygon")],
)
print(f"{report.annotations_added} annotations across {report.images_processed} images")

Signature

auto_annotate_dataset(
    client: Client,
    dataset_name: str,
    classes: Sequence[BatchClass | tuple[str, str] | dict[str, str]],
    *,
    mode: AnnotateMode = "batch",
    confidence_threshold: float = 0.5,
    overwrite: bool = False,
    max_images: int | None = None,
    poll_interval: float = 5.0,
    timeout: float = 1800.0,
) -> AnnotateReport

Argument	Default	Purpose
`client`	required	`Client` instance used for the API calls
`dataset_name`	required	Name of the dataset to annotate
`classes`	required	What to detect — see “Class specs” below
`mode`	`"batch"`	`batch` (async multi-image) or `text` (synchronous per-image)
`confidence_threshold`	`0.5`	SAM3 score cutoff (0–1)
`overwrite`	`False`	When `False`, skip images that already have annotations
`max_images`	`None`	Maximum number of images to process (useful for dry runs)
`poll_interval`	`5.0`	`batch` mode only: seconds between status polls
`timeout`	`1800.0`	`batch` mode only: maximum seconds to wait for the job
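As a rough sketch of how the keyword-only arguments combine, the hypothetical helper below runs a small trial before a full run. The name dry_run, its sample_size parameter, and the specific values are assumptions for illustration, not part of the SDK. The workflow function is passed in as annotate so the snippet runs without the SDK installed.

```python
from typing import Any, Callable, Sequence


def dry_run(
    annotate: Callable[..., Any],
    client: Any,
    dataset_name: str,
    classes: Sequence[Any],
    sample_size: int = 5,
) -> Any:
    """Annotate a handful of images with a stricter cutoff before a full run.

    `annotate` is expected to be auto_annotate_dataset. It is a parameter
    here only so the sketch is self-contained. The values below are
    illustrative choices, not recommendations from the SDK.
    """
    return annotate(
        client,
        dataset_name,
        classes,
        max_images=sample_size,     # cap the run at a few images
        confidence_threshold=0.7,   # stricter than the 0.5 default
        poll_interval=2.0,          # poll more often for a short job
        timeout=300.0,              # stop waiting after 5 minutes
    )
```

With the SDK imported, the call would look like dry_run(auto_annotate_dataset, client, "road-signs", [("stop_sign", "bbox")]).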

Class specs

classes accepts three shapes — pick whichever is shortest:

# 1. Tuples — name + output_type
classes=[("stop_sign", "bbox"), ("yield", "polygon")]

# 2. Dicts
classes=[
    {"name": "stop_sign", "output_type": "bbox"},
    {"name": "yield", "output_type": "polygon"},
]

# 3. BatchClass (canonical)
from pictograph.models.auto_annotate import BatchClass
classes=[BatchClass(name="stop_sign", output_type="bbox")]

Valid output_type values: "bbox", "polygon", "polyline", "keypoint".
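To make the equivalence of the three shapes concrete, here is a minimal sketch of how they could be normalised into one form and checked against the valid output_type values. ClassSpec is a local stand-in for BatchClass and normalize is a hypothetical helper. This is not the SDK's implementation, and the real validation may differ.

```python
from dataclasses import dataclass

VALID_OUTPUT_TYPES = {"bbox", "polygon", "polyline", "keypoint"}


@dataclass(frozen=True)
class ClassSpec:
    """Local stand-in for BatchClass (assumption, not the SDK's class)."""
    name: str
    output_type: str


def normalize(spec) -> ClassSpec:
    """Turn a tuple, dict, or ClassSpec into a ClassSpec."""
    if isinstance(spec, ClassSpec):
        result = spec
    elif isinstance(spec, dict):
        result = ClassSpec(spec["name"], spec["output_type"])
    else:
        name, output_type = spec  # expects a (name, output_type) pair
        result = ClassSpec(name, output_type)
    if result.output_type not in VALID_OUTPUT_TYPES:
        raise ValueError(f"unknown output_type: {result.output_type!r}")
    return result


specs = [
    ("stop_sign", "bbox"),
    {"name": "stop_sign", "output_type": "bbox"},
    ClassSpec("stop_sign", "bbox"),
]
# All three shapes describe the same class once normalised.
assert len({normalize(s) for s in specs}) == 1
```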

Batch vs text mode

mode="batch" (default) sends every image and every class to one async SAM3 job. The workflow polls the job every poll_interval seconds until it terminates or timeout seconds have passed; you get one report at the end. This is what you want for more than ~10 images: it’s faster and cheaper per image.

mode="text" runs one synchronous SAM3 text-prompt per image, per class. It’s slower (no batching) and saves annotations as they come back. Use it when you need to debug a single image or when the dataset is small enough that the batch warmup overhead isn’t worth it.
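If you choose the mode programmatically, the rule of thumb above fits in a one-line helper. pick_mode and its batch_threshold parameter are hypothetical; the threshold of 10 comes from the guidance in this section and is not a hard limit.

```python
def pick_mode(image_count: int, batch_threshold: int = 10) -> str:
    """Return "batch" for larger runs and "text" for small or debugging runs."""
    return "batch" if image_count > batch_threshold else "text"


print(pick_mode(3))    # text
print(pick_mode(500))  # batch
```

The result can be passed straight through as mode=pick_mode(n).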

Skip vs overwrite

By default the workflow skips images that already have at least one annotation. Set overwrite=True to re-annotate everything:

# Annotate only the unlabelled subset.
auto_annotate_dataset(client, "road-signs", classes=[...])

# Re-annotate every image (overwrites existing).
auto_annotate_dataset(client, "road-signs", classes=[...], overwrite=True)
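The skip rule can be pictured as a filter over per-image annotation counts. The sketch below only illustrates the documented behaviour. images_to_annotate and its input format are assumptions, and the SDK performs this selection internally.

```python
from typing import Dict, List


def images_to_annotate(annotation_counts: Dict[str, int], overwrite: bool = False) -> List[str]:
    """Return the image ids that would be processed under the documented rule."""
    if overwrite:
        # Every image is re-annotated, labelled or not.
        return list(annotation_counts)
    # Default: only images with no existing annotations.
    return [image_id for image_id, count in annotation_counts.items() if count == 0]


counts = {"a.jpg": 0, "b.jpg": 3, "c.jpg": 0}
print(images_to_annotate(counts))                  # ['a.jpg', 'c.jpg']
print(images_to_annotate(counts, overwrite=True))  # ['a.jpg', 'b.jpg', 'c.jpg']
```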

Inspecting the report

@dataclass
class AnnotateReport:
    dataset_name: str
    images_attempted: int
    images_processed: int
    images_skipped: int
    annotations_added: int
    failures: list[AnnotationFailure]
    job_id: str | None         # set only when mode="batch"

    @property
    def success(self) -> bool: ...

In batch mode, job_id lets you fetch the job later via client.auto_annotate.get_batch(job_id) (e.g. to surface progress in a UI) or cancel it.
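A short sketch of turning a report into a log-friendly summary follows. FakeReport is a stand-in that mirrors the fields listed above so the snippet runs on its own, and summarize is a hypothetical helper. The fields of AnnotationFailure are not described in this section, so failures are printed with repr.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class FakeReport:
    """Stand-in with the same fields as AnnotateReport, for illustration only."""
    dataset_name: str
    images_attempted: int
    images_processed: int
    images_skipped: int
    annotations_added: int
    failures: List[Any] = field(default_factory=list)
    job_id: Optional[str] = None


def summarize(report: Any) -> str:
    """Build a multi-line summary from the report's documented fields."""
    lines = [
        f"{report.dataset_name}: "
        f"{report.images_processed}/{report.images_attempted} images processed, "
        f"{report.images_skipped} skipped, "
        f"{report.annotations_added} annotations added"
    ]
    if report.job_id is not None:
        lines.append(f"batch job: {report.job_id}")
    for failure in report.failures:
        lines.append(f"failed: {failure!r}")
    return "\n".join(lines)


print(summarize(FakeReport("road-signs", 10, 9, 2, 31, job_id="job-1")))
```

The same function works on a real AnnotateReport, since it only reads attributes that the dataclass above declares.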

Errors

Status	Exception	Cause
404	`NotFoundError`	Dataset doesn’t exist
402	`PaymentRequiredError`	Insufficient credits
422	`ValidationError`	Class name invalid or `output_type` not recognised

Per-image failures are recorded in report.failures — they don’t raise.
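The split between request-level exceptions and per-image failures suggests a pattern like the one below. It is only a sketch: the three exception classes are local stand-ins (this section does not state which module the SDK exports them from), and run_annotation is a hypothetical wrapper.

```python
from typing import Any, Callable, Optional, Tuple


# Stand-ins so the sketch runs on its own. In real code, import the SDK's
# exception classes instead; their module path is not stated in this section.
class NotFoundError(Exception):
    pass


class PaymentRequiredError(Exception):
    pass


class ValidationError(Exception):
    pass


def run_annotation(run: Callable[[], Any]) -> Tuple[Optional[Any], Optional[str]]:
    """Call `run` and map request-level errors to a short message.

    Returns (report, None) on success or (None, message) on failure.
    """
    try:
        return run(), None
    except NotFoundError:
        return None, "dataset not found: check dataset_name"
    except PaymentRequiredError:
        return None, "insufficient credits"
    except ValidationError as exc:
        return None, f"invalid class spec: {exc}"
```

A call might look like report, error = run_annotation(lambda: auto_annotate_dataset(client, "road-signs", classes)). When error is None, check report.failures separately, since per-image failures never reach the except clauses.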