---
title: Auto-annotate a dataset
description: Run SAM3 across every unlabelled image in a dataset and save the results.
section: Workflows
order: 3
---
`auto_annotate_dataset()` runs SAM3 over a dataset and saves the resulting annotations. By default it runs in **batch mode** (one async job covering many images), which is the right choice for anything above roughly 10 images. Text mode (one synchronous prompt per image, per class) is available for debugging and very small datasets.

```python
from pictograph import Client
from pictograph.workflows import auto_annotate_dataset

client = Client()

report = auto_annotate_dataset(
    client,
    dataset_name="road-signs",
    classes=[("stop_sign", "bbox"), ("yield", "polygon")],
)
print(f"{report.annotations_added} annotations across {report.images_processed} images")
```

## Signature

```python
auto_annotate_dataset(
    client: Client,
    dataset_name: str,
    classes: Sequence[BatchClass | tuple[str, str] | dict[str, str]],
    *,
    mode: AnnotateMode = "batch",
    confidence_threshold: float = 0.5,
    overwrite: bool = False,
    max_images: int | None = None,
    poll_interval: float = 5.0,
    timeout: float = 1800.0,
) -> AnnotateReport
```

| Argument | Default | Purpose |
| --- | --- | --- |
| `dataset_name` | required | Name of the dataset to annotate |
| `classes` | required | What to detect (see "Class specs" below) |
| `mode` | `"batch"` | `batch` (async multi-image) or `text` (synchronous per-image) |
| `confidence_threshold` | `0.5` | SAM3 score cutoff (0–1) |
| `overwrite` | `False` | When `False`, skip images that already have annotations |
| `max_images` | `None` | Maximum number of images to process (useful for dry runs) |
| `poll_interval` | `5.0` | `batch` mode only: seconds between status polls |
| `timeout` | `1800.0` | `batch` mode only: maximum seconds to wait for the job |

## Class specs

`classes` accepts three shapes — pick whichever is shortest:

```python
# 1. Tuples — name + output_type
classes=[("stop_sign", "bbox"), ("yield", "polygon")]

# 2. Dicts
classes=[
    {"name": "stop_sign", "output_type": "bbox"},
    {"name": "yield", "output_type": "polygon"},
]

# 3. BatchClass (canonical)
from pictograph.models.auto_annotate import BatchClass
classes=[BatchClass(name="stop_sign", output_type="bbox")]
```

Valid `output_type` values: `"bbox"`, `"polygon"`, `"polyline"`, `"keypoint"`.
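
To make the three shapes concrete, here is a minimal sketch of how they could be coerced into one canonical form. `ClassSpec`, `normalize_class`, and `normalize_classes` are hypothetical stand-ins written for this page; the SDK's own `BatchClass` and its validation may behave differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence, Union

# The output types listed above.
VALID_OUTPUT_TYPES = {"bbox", "polygon", "polyline", "keypoint"}


@dataclass(frozen=True)
class ClassSpec:
    """Hypothetical stand-in for BatchClass (not the SDK's own type)."""

    name: str
    output_type: str


SpecLike = Union[ClassSpec, tuple, dict]


def normalize_class(spec: SpecLike) -> ClassSpec:
    """Coerce a tuple, dict, or ClassSpec into a validated ClassSpec."""
    if isinstance(spec, ClassSpec):
        result = spec
    elif isinstance(spec, tuple):
        name, output_type = spec  # ValueError unless exactly two items
        result = ClassSpec(name, output_type)
    elif isinstance(spec, dict):
        result = ClassSpec(spec["name"], spec["output_type"])
    else:
        raise TypeError(f"unsupported class spec: {spec!r}")
    if result.output_type not in VALID_OUTPUT_TYPES:
        raise ValueError(f"unknown output_type: {result.output_type!r}")
    return result


def normalize_classes(specs: Sequence[SpecLike]) -> list[ClassSpec]:
    """Normalize a mixed list of class specs, preserving order."""
    return [normalize_class(spec) for spec in specs]
```

Checking specs locally in this way catches a misspelled `output_type` before any request is sent, rather than waiting for a 422 from the API.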

## Batch vs text mode

`mode="batch"` (default) sends every image and every class to one async SAM3 job. The job is polled every `poll_interval` seconds until it reaches a terminal state or `timeout` elapses, and you get one report at the end. This is what you want for more than about 10 images: it's faster and cheaper per image.
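
The polling behaviour governed by `poll_interval` and `timeout` can be sketched roughly as follows. This is an illustration only: `wait_for_job`, the status strings in `TERMINAL_STATES`, and the choice to raise `TimeoutError` are assumptions, not the workflow's actual internals.

```python
import time
from typing import Callable

# Assumed status names; the real job states may differ.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns a terminal state or time runs out."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        sleep(poll_interval)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting. A shorter `poll_interval` surfaces completion sooner at the cost of more status requests.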

`mode="text"` runs one synchronous SAM3 text-prompt per image, per class. It's slower (no batching) and saves annotations as they come back. Use it when you need to debug a single image or when the dataset is small enough that the batch warmup overhead isn't worth it.
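
The trade-off between the two modes comes down to request count. The sketch below works through that arithmetic and applies the rough 10-image rule of thumb; `request_count` and `choose_mode` are hypothetical helpers, and the threshold is a guideline from this page rather than a limit enforced by the SDK.

```python
def request_count(mode: str, image_count: int, class_count: int) -> int:
    """Number of SAM3 submissions implied by each mode."""
    if mode == "batch":
        return 1  # one async job covers every image and class
    if mode == "text":
        return image_count * class_count  # one prompt per image, per class
    raise ValueError(f"unknown mode: {mode!r}")


def choose_mode(image_count: int, batch_threshold: int = 10) -> str:
    """Pick batch mode above the threshold, text mode otherwise."""
    return "batch" if image_count > batch_threshold else "text"
```

For example, 200 images with 2 classes means 400 synchronous prompts in text mode against a single job in batch mode.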

## Skip vs overwrite

By default the workflow skips images that already have at least one annotation. Set `overwrite=True` to re-annotate everything:

```python
# Annotate only the unlabelled subset.
auto_annotate_dataset(client, "road-signs", classes=[...])

# Re-annotate every image (overwrites existing).
auto_annotate_dataset(client, "road-signs", classes=[...], overwrite=True)
```
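
The selection rule can be pictured as a filter over per-image annotation counts. This is a sketch under stated assumptions: `select_images` is a hypothetical helper, and applying `max_images` after the skip filter is a guess about ordering that the workflow may not share.

```python
from __future__ import annotations


def select_images(
    annotation_counts: dict[str, int],
    overwrite: bool = False,
    max_images: int | None = None,
) -> list[str]:
    """Return the image names that would be sent for annotation."""
    selected = [
        name
        for name, count in annotation_counts.items()
        if overwrite or count == 0  # skip already-annotated images by default
    ]
    # Assumption: the cap applies to the images that survive the skip filter.
    return selected if max_images is None else selected[:max_images]
```

With counts of `{"a.jpg": 0, "b.jpg": 3, "c.jpg": 0}`, the default selects `a.jpg` and `c.jpg`, while `overwrite=True` selects all three.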

## Inspecting the report

```python
@dataclass
class AnnotateReport:
    dataset_name: str
    images_attempted: int
    images_processed: int
    images_skipped: int
    annotations_added: int
    failures: list[AnnotationFailure]
    job_id: str | None         # set only when mode="batch"

    @property
    def success(self) -> bool: ...
```

In batch mode, `job_id` lets you fetch the job later via `client.auto_annotate.get_batch(job_id)` (e.g. to surface progress in a UI) or cancel it.
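
A report is easiest to act on when condensed into a few log lines. The sketch below uses a stand-in `ReportStub` with the same field names as `AnnotateReport` so that it runs without the SDK; `summarize` is a hypothetical helper, and failures are printed with `repr()` because the fields of `AnnotationFailure` are not described here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReportStub:
    """Stand-in mirroring the AnnotateReport fields shown above."""

    dataset_name: str
    images_attempted: int
    images_processed: int
    images_skipped: int
    annotations_added: int
    failures: list = field(default_factory=list)
    job_id: str | None = None


def summarize(report) -> str:
    """Build a short multi-line summary from a report-like object."""
    lines = [
        f"{report.dataset_name}: "
        f"{report.images_processed}/{report.images_attempted} images processed, "
        f"{report.images_skipped} skipped, "
        f"{report.annotations_added} annotations added"
    ]
    if report.job_id is not None:  # only set in batch mode
        lines.append(f"batch job: {report.job_id}")
    for failure in report.failures:
        lines.append(f"failed: {failure!r}")
    return "\n".join(lines)
```

Because `summarize` only reads attributes, it should accept a real `AnnotateReport` as well as the stub, provided the field names match those listed above.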

## Errors

| Status | Exception | Cause |
| --- | --- | --- |
| 402 | `PaymentRequiredError` | Insufficient credits |
| 404 | `NotFoundError` | Dataset doesn't exist |
| 422 | `ValidationError` | Class name invalid or `output_type` not recognised |

Per-image failures are recorded in `report.failures` — they don't raise.

## See also

- [Full pipeline](/docs/workflows/full-pipeline.md) — chains annotate with upload + train
- [Auto-annotate](/docs/api-reference/auto-annotate.md) — point / box / text / batch primitives
- [Credits](/docs/api-reference/credits.md) — cost estimation per image