---
title: Training
description: Low-level primitives for spawning, polling, and cancelling training runs.
section: API Reference
order: 8
---
The training resource manages the lifecycle of a single run against a pre-built export. For the end-to-end "give me an ONNX model from this dataset" call, use [`train_pipeline`](/docs/workflows/train.md) instead.

```python
from pictograph import Client
client = Client()
```

## Pipelines

| `pipeline_type` | Output |
| --- | --- |
| `yolox` | Object detection (boxes) |
| `detectron2` | Instance segmentation (polygons + masks) |
| `sm_pytorch` | Semantic segmentation |
| `classification` | Image classification |
| `rfdetr_detection` | Object detection (RF-DETR) |
| `rfdetr_segmentation` | Instance segmentation (RF-DETR) |

## GPU tiers

| `gpu_type` | Approx. cost | Pick for |
| --- | --- | --- |
| `a10g` (default) | ~$0.30/hr | YOLOX, classification, RF-DETR detection |
| `a100` | ~$2/hr | Detectron2, large RF-DETR, big batches |
| `h100` | ~$4/hr | Last resort — only when A100 OOMs |
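
To get a feel for what a tier choice costs, the approximate rates above can be turned into a rough estimate. This is a minimal sketch: `APPROX_USD_PER_HOUR` and `approx_run_cost` are illustrative names, not part of the SDK, and the rates are the table's approximations rather than billing figures.

```python
# Approximate hourly rates copied from the table above (not billing data).
APPROX_USD_PER_HOUR = {"a10g": 0.30, "a100": 2.00, "h100": 4.00}


def approx_run_cost(gpu_type: str, minutes: float) -> float:
    """Rough dollar cost of `minutes` of training on `gpu_type`."""
    if gpu_type not in APPROX_USD_PER_HOUR:
        raise ValueError(f"unknown gpu_type: {gpu_type!r}")
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return round(APPROX_USD_PER_HOUR[gpu_type] * minutes / 60.0, 2)


# Compare a 90-minute run across the three tiers.
for gpu in APPROX_USD_PER_HOUR:
    print(gpu, approx_run_cost(gpu, 90))
```

For an authoritative figure in credits, use `client.credits.estimate` as described under Cost estimation.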

## create

Spawn a run against an existing export.

```python
run = client.training.create(
    dataset_name="road-signs",
    export_name="road-signs-20260512-120000",
    pipeline_type="yolox",
    name="yolox-run-1",
    config={"epochs": 50},
    gpu_type="a10g",
    wait=True,
    poll_interval=5.0,
    timeout=7200.0,
)
```

| Arg | Type | Default | Notes |
| --- | --- | --- | --- |
| `dataset_name` | `str` | required | Source project |
| `export_name` | `str` | required | Pre-built export |
| `pipeline_type` | `PipelineType` | required | See the Pipelines table |
| `name` | `str \| None` | auto | Defaults to `<pipeline>-run-<ts>` |
| `config` | `dict` | `{}` | Hyperparameter overrides: `epochs`, `batch_size`, `learning_rate`, `image_size` |
| `gpu_type` | `GpuType` | `"a10g"` | See the GPU tiers table |
| `wait` | `bool` | `True` | When `False`, returns immediately with `status="queued"` |
| `poll_interval` | `float` | `5.0` | Seconds between polls |
| `timeout` | `float` | `7200.0` | Max poll seconds (2 hours) |

Returns `TrainingRun`.
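
A typo in a `config` key is easy to make and may not surface until the run is validated. One way to catch it early is a small client-side check, sketched below. `KNOWN_CONFIG_KEYS` and `build_config` are hypothetical helpers, and treating the four keys from the table as the complete set is an assumption: the API may accept others.

```python
# Keys named in the `config` row above. Treating this set as exhaustive is an
# assumption; the API may accept more.
KNOWN_CONFIG_KEYS = {"epochs", "batch_size", "learning_rate", "image_size"}


def build_config(**overrides):
    """Return a config dict, rejecting keys the table does not mention."""
    unknown = sorted(set(overrides) - KNOWN_CONFIG_KEYS)
    if unknown:
        raise ValueError(f"unrecognised config keys: {unknown}")
    return dict(overrides)


config = build_config(epochs=50, batch_size=16)
print(config)
```

The result can be passed straight through as `config=build_config(epochs=50)` in the `create` call.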

## list / iter

```python
runs = client.training.list(limit=20, status="running")
for run in client.training.iter(page_size=50):
    print(run.id, run.status, run.progress)
```
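
Judging by the signatures, `list` returns a single page capped at `limit`, while `iter` walks every page. A quick summary of all runs by status could then look like the sketch below. `count_runs_by_status` is a hypothetical helper; it relies only on `training.iter` and `run.status` as shown above.

```python
from collections import Counter


def count_runs_by_status(client, page_size=50):
    """Tally runs per status by walking every page via `training.iter`."""
    return Counter(run.status for run in client.training.iter(page_size=page_size))
```

Calling `count_runs_by_status(client)` returns a `Counter`, so statuses with no runs read as `0` rather than raising `KeyError`.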

## get

```python
run = client.training.get("run-uuid")
print(run.status, run.progress, run.current_epoch, "/", run.total_epochs)
```

`status` is one of `{"pending", "queued", "running", "completed", "failed", "cancelled"}`.
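
If you want your own polling loop instead of `wait=True`, for example to log progress between polls, a minimal sketch follows. It assumes that `completed`, `failed`, and `cancelled` are the only final statuses and that the other three mean the run is still in flight. `TERMINAL_STATUSES` and `poll_until_done` are illustrative names, not SDK members.

```python
import time

# Assumption: these three statuses are final; the rest mean "still in flight".
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})


def poll_until_done(client, run_id, poll_interval=5.0, timeout=7200.0):
    """Call `training.get` until the run reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        run = client.training.get(run_id)
        if run.status in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} still {run.status!r} after {timeout}s")
        time.sleep(poll_interval)
```

The defaults mirror the `poll_interval` and `timeout` defaults of `create`. The sketch raises the built-in `TimeoutError` rather than the SDK's `PollTimeoutError`, because this page does not show where that class is imported from.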

## cancel

```python
client.training.cancel("run-uuid")  # stops the worker, refunds remaining minutes
```

## wait_for_completion

If you created the run with `wait=False`, you can block on it later:

```python
run = client.training.wait_for_completion("run-uuid", timeout=3600.0)
if run.status == "completed":
    model = client.models.get(run.model_id)
```

## Minimum dataset size

Training requires **at least 5 images** matching the export's `status_filter`, so that the worker can split them into train / val / test sets. With fewer images, the run fails with a `ValidationError` (see Errors).

```python
ds = client.datasets.get("my-dataset")
assert ds.completed_image_count >= 5
```
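
Python skips `assert` statements when run with `-O`, so an explicit check is sturdier in scripts and services. The sketch below is one way to write it. `MIN_TRAINING_IMAGES` and `require_min_images` are hypothetical names, and the threshold is the figure quoted above.

```python
MIN_TRAINING_IMAGES = 5  # threshold stated in this section


def require_min_images(image_count: int, minimum: int = MIN_TRAINING_IMAGES) -> None:
    """Raise ValueError when there are too few images to train on."""
    if image_count < minimum:
        raise ValueError(f"need at least {minimum} images, found {image_count}")
```

Call it as `require_min_images(ds.completed_image_count)` before `create`. This assumes the export's `status_filter` selects completed images; if it selects something else, pass the matching count instead.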

## Cost estimation

```python
# Price 30 minutes of A10G training before committing to a run.
estimate = client.credits.estimate("training_a10g_per_minute", quantity=30)
if not estimate.sufficient:
    raise RuntimeError(f"Need {estimate.total_credits}, have {estimate.credits_remaining}")
```

Refunds for cancelled or under-budget runs appear automatically as positive ledger entries (`training_refund_<gpu>`).
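
Because the operation is priced per minute, an expected duration in seconds has to be converted before calling `estimate`. A minimal sketch, assuming partial minutes should be rounded up and that operation keys follow the `training_<gpu>_per_minute` pattern; `estimate_for_duration` is a hypothetical helper, not an SDK method.

```python
import math


def estimate_for_duration(client, gpu_type="a10g", duration_seconds=1800.0):
    """Ask the credits API to price a run of roughly `duration_seconds`."""
    minutes = math.ceil(duration_seconds / 60.0)  # round partial minutes up
    operation = f"training_{gpu_type}_per_minute"
    return client.credits.estimate(operation, quantity=minutes)
```

The return value is whatever `client.credits.estimate` returns, so the `sufficient` check shown above applies unchanged.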

## Errors

| Status | Exception | Cause |
| --- | --- | --- |
| 402 | `PaymentRequiredError` | Insufficient credits |
| 404 | `NotFoundError` | Dataset or export missing |
| 422 | `ValidationError` | Pipeline / GPU invalid, dataset too small |
| 408 | `PollTimeoutError` | `wait=True` exceeded `timeout` (run keeps going) |
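
Since a poll timeout leaves the run going, one reasonable response is to re-attach rather than start over. The sketch below retries `wait_for_completion` a few times and then falls back to a plain `get`. `wait_with_retries` is a hypothetical helper; the exception class is passed in as `timeout_error` because this page does not show the import path for `PollTimeoutError`.

```python
def wait_with_retries(client, run_id, timeout_error, attempts=3, timeout=3600.0):
    """Re-attach to a run each time the client-side wait times out."""
    for _ in range(attempts):
        try:
            return client.training.wait_for_completion(run_id, timeout=timeout)
        except timeout_error:
            continue  # the run is still going; wait again
    # Out of attempts: return the latest snapshot instead of raising.
    return client.training.get(run_id)
```

Callers should check `run.status` on the result, since the fallback snapshot may still be `running`. To stop paying for a run that has overstayed its budget, call `client.training.cancel(run_id)` instead of waiting again.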

## See also

- [`train_pipeline`](/docs/workflows/train.md) — end-to-end workflow (recommended starting point)
- [Models](/docs/api-reference/models.md) — download trained ONNX weights
- [Credits](/docs/api-reference/credits.md) — `estimate("training_<gpu>_per_minute")`