---
title: Search
description: Find images by SigLIP2 cosine similarity to a reference, or by automatic content tags (objects / scenes / attributes).
section: API Reference
order: 11
---
Two search modes:

1. **Visual similarity** — `by_similarity()` — SigLIP2 (1152-dim)
   embeddings + pgvector HNSW index.
2. **Tag-based** — `by_tag()` — JSONB containment over the
   auto-classified `image_auto_tags` field (objects / scenes / attributes).

The auto-tag and embedding pipelines both run automatically on every
upload (on a T4 GPU, at zero API cost), so no setup is required.

```python
from pictograph import Client
client = Client()
```

## by_similarity

Find images visually similar to a reference image. By default the search
is scoped to the reference image's dataset and folder; pass `folder_path`
to override the folder scope.

```python
results = client.search.by_similarity(
    image_id="img-uuid-1",
    threshold=0.6,                   # cosine similarity floor (0–1)
    limit=50,
    folder_path=None,                # None = inherit; "/" = whole dataset
)
for r in results:
    print(r.image_id, r.filename, f"{r.similarity:.3f}")
```

| Arg | Type | Default | Notes |
|---|---|---|---|
| `image_id` | `str` | required | UUID of the reference image |
| `threshold` | `float` | `0.6` | Minimum cosine similarity (`0.6` ≈ "visually related") |
| `limit` | `int` | `50` | Backend cap: 500 |
| `folder_path` | `str \| None` | `None` | Override folder scope |

Returns `list[SimilarImage]`, sorted by descending similarity. The
source image is excluded from results.
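
To make `threshold` concrete, the sketch below computes cosine similarity in plain Python and applies a similar floor, descending sort, and limit on the client side. It is an illustration on toy 3-dimensional vectors only: the real embeddings are 1152-dimensional and ranking happens server-side, and `cosine_similarity` and `filter_by_threshold` are hypothetical helpers, not part of the `pictograph` SDK.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def filter_by_threshold(reference, candidates, threshold=0.6, limit=50):
    """Keep candidates scoring at or above `threshold`, best match first."""
    scored = [(name, cosine_similarity(reference, vec))
              for name, vec in candidates.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]


# Toy vectors standing in for image embeddings (assumed values).
reference = [1.0, 0.0, 0.0]
candidates = {
    "near": [0.9, 0.1, 0.0],
    "related": [0.7, 0.7, 0.0],
    "unrelated": [0.0, 0.0, 1.0],
}
matches = filter_by_threshold(reference, candidates)
for name, score in matches:
    print(name, f"{score:.3f}")
```

With the default floor of `0.6`, `near` and `related` are kept and `unrelated` (similarity 0) is dropped; raising `threshold` to `0.8` would keep only `near`.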

## by_tag

Find images with auto-tags matching the given filters. Pass at least
one of `objects`, `scenes`, or `attributes`. An empty filter returns
nothing rather than everything, which is semantically clearer for agents.

```python
results = client.search.by_tag(
    objects=["car", "truck"],            # match ANY object tag
    scenes=["outdoor"],                  # match ANY scene tag
    attributes=["blurry"],               # match ANY attribute tag
    dataset_name="my-dataset",           # restrict scope; None = whole org
    limit=100,
)
for r in results:
    print(r.image_id, r.tags["objects"])
```

| Arg | Type | Default | Notes |
|---|---|---|---|
| `objects` | `Sequence[str] \| None` | `None` | At least one of objects/scenes/attributes required |
| `scenes` | `Sequence[str] \| None` | `None` | |
| `attributes` | `Sequence[str] \| None` | `None` | |
| `dataset_name` | `str \| None` | `None` | Org-wide search if `None` |
| `limit` | `int` | `50` | Backend cap: 500 |

Returns `list[TaggedImage]`. Within a category, tags are OR'd; across
categories they are AND'd:

- `objects=["car","truck"]` → "car OR truck"
- `objects=["car"], scenes=["outdoor"]` → "car AND outdoor"
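
The matching rule can be sketched as a local predicate over one image's tags. This is a minimal illustration of the OR-within, AND-across semantics, not the backend's query: `matches_filters` is a hypothetical helper, and the tag dictionary shape (category name mapped to a list of strings) is assumed from the `r.tags["objects"]` access shown earlier.

```python
def matches_filters(tags, objects=None, scenes=None, attributes=None):
    """Return True if `tags` satisfies OR-within, AND-across semantics."""
    filters = {"objects": objects, "scenes": scenes, "attributes": attributes}
    # Only categories with at least one requested tag take part in the match.
    active = {cat: set(wanted) for cat, wanted in filters.items() if wanted}
    if not active:
        return False  # an empty filter matches nothing, not everything
    # Every active category must share at least one tag with the image.
    return all(active[cat] & set(tags.get(cat, [])) for cat in active)


# Assumed example tags for a single image.
image_tags = {"objects": ["truck", "person"], "scenes": ["outdoor"], "attributes": []}

any_vehicle = matches_filters(image_tags, objects=["car", "truck"])          # car OR truck
car_outdoor = matches_filters(image_tags, objects=["car"], scenes=["outdoor"])    # car AND outdoor
truck_outdoor = matches_filters(image_tags, objects=["truck"], scenes=["outdoor"])
```

Here `any_vehicle` and `truck_outdoor` are `True`, while `car_outdoor` is `False` because the image has no `car` tag even though its scene matches.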

## Auto-tag taxonomy

The SigLIP2 classifier picks from ~200 curated labels per category.
Common ones:

- **objects**: car, truck, person, bicycle, dog, sign, building, etc.
- **scenes**: outdoor, indoor, urban, rural, daytime, nighttime, etc.
- **attributes**: blurry, dark, bright, high-contrast, low-light, etc.

The full taxonomy ships with the SigLIP2 service prompts; tags not in
the curated list won't be assigned.

## Cost

Search is **free**. Embeddings + auto-tags are computed once per image
on upload (T4 GPU, zero API cost) and cached.

## Common errors

| Status | Exception | Cause |
|---|---|---|
| 404 | `NotFoundError` | No image matches `image_id` (similarity) or no dataset matches `dataset_name` (tag) |
| 422 | `ValidationError` | `by_tag` called with all three filters set to `None` |
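
One way to avoid the 422 case is to check the filters before making the request. The sketch below is a hypothetical client-side guard, not part of the SDK: `build_tag_filters` is an invented helper name, and treating empty lists the same as `None` is an assumption of this example.

```python
def build_tag_filters(objects=None, scenes=None, attributes=None):
    """Drop empty filters and raise ValueError if none remain."""
    candidates = {"objects": objects, "scenes": scenes, "attributes": attributes}
    # Keep only categories that actually name at least one tag.
    filters = {name: list(values) for name, values in candidates.items() if values}
    if not filters:
        raise ValueError("pass at least one of objects, scenes, or attributes")
    return filters


filters = build_tag_filters(objects=["car"], attributes=[])
print(filters)
```

The resulting dictionary contains only the `objects` key, so it could be passed on as `client.search.by_tag(**filters)`; calling `build_tag_filters()` with no arguments raises `ValueError` locally instead of sending a request that would be rejected.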