Auto-annotate a dataset
Run SAM3 across every unlabelled image in a dataset and save the results.
auto_annotate_dataset() runs SAM3 over a dataset's unlabelled images and saves the resulting annotations. By default it runs in batch mode (one async job over many images), which is the right call for anything above ~10 images. Text mode (one synchronous prompt per image, per class) is available for debugging and for very small datasets.
from pictograph import Client
from pictograph.workflows import auto_annotate_dataset
client = Client()
report = auto_annotate_dataset(
    client,
    dataset_name="road-signs",
    classes=[("stop_sign", "bbox"), ("yield", "polygon")],
)
print(f"{report.annotations_added} annotations across {report.images_processed} images")
Signature
auto_annotate_dataset(
    client: Client,
    dataset_name: str,
    classes: Sequence[BatchClass | tuple[str, str] | dict[str, str]],
    *,
    mode: AnnotateMode = "batch",
    confidence_threshold: float = 0.5,
    overwrite: bool = False,
    max_images: int | None = None,
    poll_interval: float = 5.0,
    timeout: float = 1800.0,
) -> AnnotateReport
| Argument | Default | Purpose |
|---|---|---|
| dataset_name | required | Project name |
| classes | required | What to detect; see “Class specs” below |
| mode | "batch" | batch (async multi-image) or text (synchronous per-image) |
| confidence_threshold | 0.5 | SAM3 score cutoff (0–1) |
| overwrite | False | When False, skip images that already have annotations |
| max_images | None | Cap on the number of images (useful for dry runs) |
| poll_interval | 5.0 | Batch mode only: seconds between status polls |
| timeout | 1800.0 | Batch mode only: max seconds to wait |
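As a rough illustration of what a score cutoff such as confidence_threshold does, here is a minimal sketch. The Detection class and apply_threshold helper are hypothetical stand-ins written for this example, not pictograph types, and treating the cutoff as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical stand-in for one SAM3 result; not a pictograph type."""
    label: str
    score: float  # model confidence in [0, 1]


def apply_threshold(detections, confidence_threshold=0.5):
    """Keep detections scoring at or above the cutoff (inclusive by assumption)."""
    if not 0.0 <= confidence_threshold <= 1.0:
        raise ValueError("confidence_threshold must be between 0 and 1")
    return [d for d in detections if d.score >= confidence_threshold]


detections = [
    Detection("stop_sign", 0.91),
    Detection("stop_sign", 0.42),
    Detection("yield", 0.50),
]
kept = apply_threshold(detections)
print([(d.label, d.score) for d in kept])
```

Raising the threshold trades recall for precision: fewer annotations are saved, but those that remain carry higher scores.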
Class specs
classes accepts three shapes — pick whichever is shortest:
# 1. Tuples — name + output_type
classes=[("stop_sign", "bbox"), ("yield", "polygon")]
# 2. Dicts
classes=[
    {"name": "stop_sign", "output_type": "bbox"},
    {"name": "yield", "output_type": "polygon"},
]
# 3. BatchClass (canonical)
from pictograph.models.auto_annotate import BatchClass
classes=[BatchClass(name="stop_sign", output_type="bbox")]
Valid output_type values: "bbox", "polygon", "polyline", "keypoint".
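To make the three accepted shapes concrete, the sketch below coerces each one into a single canonical form and validates output_type. ClassSpec and normalize_class are hypothetical names defined locally for illustration; this is not how the library itself is known to normalise its input.

```python
from dataclasses import dataclass

# The four values listed above.
VALID_OUTPUT_TYPES = {"bbox", "polygon", "polyline", "keypoint"}


@dataclass(frozen=True)
class ClassSpec:
    """Hypothetical local stand-in for BatchClass (same two fields)."""
    name: str
    output_type: str


def normalize_class(spec):
    """Coerce a tuple, dict, or ClassSpec into a validated ClassSpec."""
    if isinstance(spec, ClassSpec):
        result = spec
    elif isinstance(spec, dict):
        result = ClassSpec(name=spec["name"], output_type=spec["output_type"])
    elif isinstance(spec, tuple) and len(spec) == 2:
        result = ClassSpec(name=spec[0], output_type=spec[1])
    else:
        raise TypeError(f"unsupported class spec: {spec!r}")
    if result.output_type not in VALID_OUTPUT_TYPES:
        raise ValueError(f"unknown output_type: {result.output_type!r}")
    return result


specs = [
    ("stop_sign", "bbox"),
    {"name": "yield", "output_type": "polygon"},
    ClassSpec(name="lane", output_type="polyline"),
]
normalized = [normalize_class(s) for s in specs]
print(normalized)
```

Validating locally like this surfaces a misspelled output_type before any request is sent, rather than as a 422 from the server.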
Batch vs text mode
mode="batch" (default) sends every image and every class to one async SAM3 job. The job is polled until it terminates; you get one report at the end. This is what you want for >10 images — it’s faster and cheaper per image.
mode="text" runs one synchronous SAM3 text-prompt per image, per class. It’s slower (no batching) and saves annotations as they come back. Use it when you need to debug a single image or when the dataset is small enough that the batch warmup overhead isn’t worth it.
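The polling behaviour described for batch mode (poll_interval between checks, giving up after timeout) can be sketched as a generic loop. The wait_for_job helper, the get_status callable, and the status names are all assumptions made for this example; the real job states may differ.

```python
import time


def wait_for_job(get_status, poll_interval=5.0, timeout=1800.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal state or time runs out.

    get_status is a hypothetical zero-argument callable returning a status
    string. The terminal state names below are assumptions.
    """
    terminal = {"completed", "failed", "cancelled"}
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(poll_interval)


# Simulated job that finishes on the third poll.
_statuses = iter(["queued", "running", "completed"])
final = wait_for_job(lambda: next(_statuses), poll_interval=0.01, timeout=1.0)
print(final)
```

A shorter poll_interval notices completion sooner at the cost of more status requests; timeout only bounds how long the caller waits.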
Skip vs overwrite
By default the workflow skips images that already have at least one annotation. Set overwrite=True to re-annotate everything:
# Annotate only the unlabelled subset.
auto_annotate_dataset(client, "road-signs", classes=[...])
# Re-annotate every image (overwrites existing).
auto_annotate_dataset(client, "road-signs", classes=[...], overwrite=True)
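The skip rule can be pictured as a filter over the dataset's images. The sketch below is illustrative only: select_images is a hypothetical helper, and applying the max_images cap after the skip filter is an assumption, not documented behaviour.

```python
def select_images(annotation_counts, overwrite=False, max_images=None):
    """Return (image ids that would be processed, number skipped).

    annotation_counts maps image id -> number of existing annotations.
    Applying the max_images cap after the skip filter is an assumption.
    """
    selected = [
        image_id
        for image_id, count in annotation_counts.items()
        if overwrite or count == 0
    ]
    skipped = len(annotation_counts) - len(selected)
    if max_images is not None:
        selected = selected[:max_images]
    return selected, skipped


# Made-up dataset: two unlabelled images, two already annotated.
counts = {"img_001": 0, "img_002": 3, "img_003": 0, "img_004": 1}
print(select_images(counts))
print(select_images(counts, overwrite=True))
print(select_images(counts, max_images=1))
```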
Inspecting the report
@dataclass
class AnnotateReport:
    dataset_name: str
    images_attempted: int
    images_processed: int
    images_skipped: int
    annotations_added: int
    failures: list[AnnotationFailure]
    job_id: str | None  # set only when mode="batch"

    @property
    def success(self) -> bool: ...
In batch mode, job_id lets you fetch the job later via client.auto_annotate.get_batch(job_id) (e.g. to surface progress in a UI) or cancel it.
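A report with this shape might be summarised as follows. The classes here are local stand-ins so the example runs on its own; the AnnotationFailure field names (image_id, error), the body of success, and all sample values are assumptions, not the library's definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AnnotationFailure:
    # Field names are hypothetical; the real class may differ.
    image_id: str
    error: str


@dataclass
class AnnotateReport:
    # Local stand-in mirroring the fields listed above.
    dataset_name: str
    images_attempted: int
    images_processed: int
    images_skipped: int
    annotations_added: int
    failures: list[AnnotationFailure] = field(default_factory=list)
    job_id: str | None = None

    @property
    def success(self) -> bool:
        # Assumption: a run counts as successful when no image failed.
        return not self.failures


def summarize(report):
    """Build a short multi-line summary of a report."""
    lines = [
        f"{report.dataset_name}: {report.annotations_added} annotations "
        f"across {report.images_processed}/{report.images_attempted} images "
        f"({report.images_skipped} skipped)"
    ]
    for failure in report.failures:
        lines.append(f"  failed {failure.image_id}: {failure.error}")
    return "\n".join(lines)


# Made-up values for illustration.
report = AnnotateReport(
    dataset_name="road-signs",
    images_attempted=10,
    images_processed=9,
    images_skipped=4,
    annotations_added=27,
    failures=[AnnotationFailure("img_007", "timeout")],
    job_id="job_123",
)
print(summarize(report))
print("success:", report.success)
```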
Errors
| Status | Exception | Cause |
|---|---|---|
| 404 | NotFoundError | Dataset doesn’t exist |
| 402 | PaymentRequiredError | Insufficient credits |
| 422 | ValidationError | Class name invalid or output_type not recognised |
Per-image failures are recorded in report.failures — they don’t raise.
See also
- Full pipeline — chains annotate with upload + train
- Auto-annotate — point / box / text / batch primitives
- Credits — cost estimation per image