Search — StaticOwl

What ships today

BM25 fulltext index

The engine builds a BM25 fulltext index over every text field on every indexed content type. Ranking is standard term frequency / inverse document frequency weighting under the Robertson-Spärck Jones probabilistic relevance model. The index is stopword-aware, stem-aware, and case-folded, and tokenization is Unicode-correct.
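To make the ranking concrete, here is a minimal sketch of Okapi BM25 scoring for a single document. The function name, the shape of the stats object, and the k1 = 1.2 / b = 0.75 values are assumptions for illustration (they are conventional defaults, not settings confirmed for this engine). It expects terms that are already tokenized, stemmed, and case-folded.

```javascript
// Minimal BM25 sketch. k1 and b are conventional defaults, assumed here;
// the engine's actual values are not stated on this page.
// stats: { docCount, avgDocLength, docFreq: Map<term, docs containing term> }
function bm25Score(queryTerms, docTerms, stats, k1 = 1.2, b = 0.75) {
  // Count how often each term occurs in this document.
  const termFreq = new Map();
  for (const t of docTerms) termFreq.set(t, (termFreq.get(t) || 0) + 1);

  let score = 0;
  for (const term of new Set(queryTerms)) {
    const tf = termFreq.get(term) || 0;
    if (tf === 0) continue;
    const df = stats.docFreq.get(term) || 0;
    // Robertson-Spärck Jones style IDF; the +1 inside the log keeps it non-negative.
    const idf = Math.log(1 + (stats.docCount - df + 0.5) / (df + 0.5));
    // Length normalization: longer documents need more hits to score the same.
    const norm = 1 - b + b * (docTerms.length / stats.avgDocLength);
    score += idf * (tf * (k1 + 1)) / (tf + k1 * norm);
  }
  return score;
}
```

Rare terms contribute more than common ones through the IDF factor, and repeated occurrences of a term saturate rather than growing linearly.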

Vector search — flat + HNSW

Two vector search modes, both shipped:

Flat vector for index sizes up to ~50k vectors: exact nearest-neighbor over a contiguous Float32 array. Predictable, audit-friendly, no approximation.
HNSW (Hierarchical Navigable Small World) for index sizes 50k+: approximate nearest-neighbor with tunable ef and M parameters. Sublinear query time. Build cost is amortized across compactions.
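The flat mode described above amounts to a full scan. A minimal sketch, assuming vectors are packed row by row into one Float32Array and ranked by cosine similarity (the function name, the memory layout, and the choice of metric are illustrative assumptions, not the engine's confirmed internals):

```javascript
// Exact nearest-neighbor scan over a contiguous Float32Array.
// Layout assumption: vector i occupies vectors[i*dims .. (i+1)*dims).
function flatSearch(vectors, dims, query, k) {
  const count = vectors.length / dims;
  const hits = [];
  for (let i = 0; i < count; i++) {
    let dot = 0, normV = 0, normQ = 0;
    for (let d = 0; d < dims; d++) {
      const v = vectors[i * dims + d];
      dot += v * query[d];
      normV += v * v;
      normQ += query[d] * query[d];
    }
    const denom = Math.sqrt(normV) * Math.sqrt(normQ);
    // Cosine similarity; zero-length vectors score 0 instead of NaN.
    hits.push({ index: i, score: denom === 0 ? 0 : dot / denom });
  }
  // Every vector is scored, so the top k are exact, not approximate.
  hits.sort((a, b) => b.score - a.score);
  return hits.slice(0, k);
}
```

Query time grows linearly with the number of vectors, which is why an approximate structure such as HNSW becomes attractive for larger indexes.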

Auto-rebuild on compaction with lineage stamping

Every time the engine compacts (publishes a new snapshot), affected search indexes are rebuilt automatically. Each rebuild stamps the SHA-256 of its inputs into derived/_lineage.json, so you can prove the index is coherent with the snapshot it was built from. If the lineage doesn't match, queries refuse to execute against a stale index — fail loud, not silently wrong.
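The lineage check can be pictured roughly as follows. This is a sketch under stated assumptions: the real layout of derived/_lineage.json is not documented here, so the inputsSha256 field, the function names, and the length-prefixed hashing scheme are all hypothetical.

```javascript
const { createHash } = require('node:crypto');

// Hash the snapshot inputs in a fixed order so the digest is reproducible.
// Each chunk is length-prefixed so ['ab', 'c'] and ['a', 'bc'] hash differently.
function hashInputs(inputs) {
  const hash = createHash('sha256');
  for (const chunk of inputs) {
    hash.update(`${chunk.length}:`);
    hash.update(chunk);
  }
  return hash.digest('hex');
}

// Assumed lineage shape: { inputsSha256: "<hex digest>" } (hypothetical).
// Throws instead of letting a query run against a stale index.
function assertIndexFresh(lineage, inputs) {
  const actual = hashInputs(inputs);
  if (lineage.inputsSha256 !== actual) {
    throw new Error(`Stale index: lineage ${lineage.inputsSha256} != snapshot ${actual}`);
  }
}
```

Throwing on mismatch mirrors the fail-loud behavior described above: a caller has to handle the error rather than receive results from an out-of-date index.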

Three patterns for getting search onto your static site

Pattern A — Public search route (lowest effort)

Hit POST https://app.staticowl.com/api/public/search with { site, q, type, limit } from your static page. CORS-friendly, rate-limited per IP, scoped to publicly-readable content types you've explicitly marked as searchable. Returns ranked results with snippet highlighting in single-digit milliseconds for typical queries.

// In your static page's search.js
const r = await fetch('https://app.staticowl.com/api/public/search', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    site: 'site:your-site',
    q: 'how to deploy lambda functions',
    type: 'lesson',
    limit: 20,
  }),
});
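// Surface HTTP errors (such as rate limiting) instead of parsing an error body.
if (!r.ok) throw new Error(`Search request failed: ${r.status}`);
const { results } = await r.json();
// results: [{ id, title, snippet, score, url }]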

Pattern B — Authenticated query API (full control)

Use an API key from a server-side process and call the engine's search procedures directly via Cypher:

CALL db.search.fulltext('lesson', 'how to deploy lambda functions')
YIELD node, score
RETURN node.title AS title, node.slug AS slug, score
ORDER BY score DESC
LIMIT 20;

Full Cypher means you can join search results to related entities, filter by arbitrary properties, and shape the response however you need. Use this pattern when search is part of a server-side render or a custom API.

Pattern C — Build-time JSON export (zero runtime)

If you want client-side search with no runtime call to us at all, the build pipeline can emit a search-index.json sidecar that ships to your bucket alongside the HTML. Load it with Fuse.js or Lunr.js or any client-side fuzzy-search library. Index size scales with content; consider this pattern for sites under ~5k pages.
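As a rough picture of what the client does with the sidecar, here is a dependency-free sketch. It is a stand-in for Fuse.js or Lunr.js, not a replacement for them, and the record shape ({ id, title, body, url }) is an assumption; the actual schema of search-index.json may differ.

```javascript
// Assumed sidecar shape: [{ id, title, body, url }]. The real schema may differ.
// Plain substring matching, used here only to illustrate the flow.
function searchLocalIndex(records, query, limit = 20) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const scored = [];
  for (const rec of records) {
    const title = rec.title.toLowerCase();
    const body = rec.body.toLowerCase();
    let score = 0;
    for (const term of terms) {
      if (title.includes(term)) score += 2; // title matches weigh more
      if (body.includes(term)) score += 1;
    }
    if (score > 0) scored.push({ ...rec, score });
  }
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, limit);
}

// In the browser, usage might look like:
//   const records = await (await fetch('/search-index.json')).json();
//   const hits = searchLocalIndex(records, 'deploy lambda');
```

Because the whole index is downloaded before the first query, payload size is the practical limit on this pattern.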

Search at the scale you're worried about

If you have 22k records, this is the part that matters:

Fulltext BM25 over 500k nodes — single-digit milliseconds per query on the production benchmark. Index size ~50–100MB on disk.
Flat vector over 50k vectors — single-digit milliseconds. RAM usage roughly 50k × dims × 4 bytes.
HNSW vector over 500k+ vectors — tens of milliseconds at ef=64. Recall stays above 95% at default parameters.
Index rebuild — amortized into compaction. Adding or editing one content record doesn't trigger a full rebuild; sparse updates patch the index in place.
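The flat-index RAM figure above is simple arithmetic and can be checked directly. The 768-dimension value below is an assumed embedding size for illustration; substitute the dimensionality of your own model.

```javascript
// Raw storage for a flat Float32 index: count × dims × 4 bytes.
function flatIndexBytes(count, dims) {
  return count * dims * Float32Array.BYTES_PER_ELEMENT;
}

const bytes = flatIndexBytes(50_000, 768); // 768 dims is an assumed embedding size
console.log(bytes);                          // 153600000
console.log((bytes / 1024 ** 2).toFixed(1)); // 146.5 (MiB)
```

This counts only the vector data itself; any per-index metadata would come on top.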

22,567 content records is roughly the small end of where these numbers were measured. We'd happily run a benchmark on your exact corpus before you commit. Email founders@staticowl.com with a sample.

The honest qualifier

Pattern A (the public search route) is the path most people want and the path the marketing front page promises. It ships as part of the engineering work that accompanies this page. If it's not live yet when you're reading this, Patterns B and C are working today. Email us if you need Pattern A under an SLA; we'll bump the schedule.

Vector search is shipped end-to-end (Sprint 2026-05, 58 regression tests in test_search_v1.js). HNSW parameters (ef, M, distance metric) are configurable per index, but the defaults are tuned for marketing/blog/docs corpora. If you need cosine rather than Euclidean distance, or have a specific recall target, that's a per-index configuration, not a custom build.
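Why the metric choice matters: on vectors that are not length-normalized, cosine and Euclidean distance can rank the same candidates differently. The sketch below is a generic illustration with made-up vectors and hypothetical function names, not the engine's configuration API.

```javascript
// Cosine distance compares direction only; magnitude is ignored.
function cosineDistance(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Euclidean distance is sensitive to both direction and magnitude.
function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

const query = [1, 0];
const longDoc = [10, 0];  // same direction as the query, much larger magnitude
const nearDoc = [1, 0.5]; // slightly different direction, similar magnitude

// Cosine prefers longDoc; Euclidean prefers nearDoc.
console.log(cosineDistance(query, longDoc) < cosineDistance(query, nearDoc));
console.log(euclideanDistance(query, nearDoc) < euclideanDistance(query, longDoc));
```

If your embeddings are normalized to unit length, the two metrics produce the same ordering, so the choice mainly matters for unnormalized vectors.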

Search is a query, not a CMS feature