What ships today
BM25 fulltext index
The engine builds a BM25 fulltext index over every text field on every indexed content type. Ranking uses standard term frequency / inverse document frequency weighting, derived from the Robertson-Spärck Jones probabilistic relevance model. The index is stopword-aware, stem-aware, and case-folded, and tokenization is Unicode-correct.
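To make the ranking concrete, here is a minimal sketch of BM25 scoring in plain JavaScript. It is an illustration of the general formula, not the engine's code: the function names are hypothetical, the tokenizer is deliberately naive (no stemming or stopword removal), and `k1 = 1.2` and `b = 0.75` are conventional defaults that are assumed here rather than confirmed engine settings.

```javascript
// Naive tokenizer for illustration only: lowercases and splits on anything
// that is not a Unicode letter or digit. No stemming, no stopword list.
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Score every document against the query with the common BM25 formula.
// k1 and b are conventional defaults (assumed, not taken from the engine).
function bm25Scores(docs, query, k1 = 1.2, b = 0.75) {
  const tokenized = docs.map(tokenize);
  const N = tokenized.length;
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / N;

  // Document frequency: how many documents contain each term.
  const df = new Map();
  for (const doc of tokenized) {
    for (const term of new Set(doc)) df.set(term, (df.get(term) || 0) + 1);
  }

  const queryTerms = [...new Set(tokenize(query))];
  return tokenized.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      const n = df.get(term);
      // Rare terms get a higher inverse document frequency.
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      // Term frequency saturates via k1 and is normalized by document length via b.
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgLen));
    }
    return score;
  });
}

const docs = [
  'Deploy lambda functions with one command',
  'Writing lessons for your static site',
  'Lambda cold starts explained',
];
const scores = bm25Scores(docs, 'deploy lambda');
```

The first document matches both query terms and scores highest, the third matches only `lambda`, and the second scores zero because it shares no terms with the query.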
Vector search — flat + HNSW
Two vector search modes, both shipped:
- Flat vector for index sizes up to ~50k vectors: exact nearest-neighbor over a contiguous Float32 array. Predictable, audit-friendly, no approximation.
- HNSW (Hierarchical Navigable Small World) for index sizes 50k+: approximate nearest-neighbor with tunable ef + M parameters. Sublinear query time. Build cost amortized across compactions.
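The flat mode described above can be sketched in a few lines. This is an illustrative brute-force search over a contiguous Float32Array, assuming row-by-row storage and cosine similarity; the function name, the layout, and the metric are assumptions for the example, not the engine's documented internals.

```javascript
// Exact nearest-neighbor search over vectors packed row by row into one
// Float32Array. Cosine similarity is an assumed metric for this sketch.
function flatSearch(vectors, dims, query, k) {
  const count = vectors.length / dims;
  let queryNorm = 0;
  for (let j = 0; j < dims; j++) queryNorm += query[j] * query[j];
  queryNorm = Math.sqrt(queryNorm);

  const hits = [];
  for (let i = 0; i < count; i++) {
    let dot = 0;
    let norm = 0;
    const base = i * dims;
    for (let j = 0; j < dims; j++) {
      const v = vectors[base + j];
      dot += v * query[j];
      norm += v * v;
    }
    const denom = Math.sqrt(norm) * queryNorm;
    hits.push({ index: i, score: denom === 0 ? 0 : dot / denom });
  }
  // Every vector is scored, so the top k are exact rather than approximate.
  return hits.sort((a, b) => b.score - a.score).slice(0, k);
}

// Three 2-dimensional vectors stored contiguously.
const vectors = new Float32Array([1, 0, 0, 1, 0.9, 0.1]);
const top = flatSearch(vectors, 2, new Float32Array([1, 0]), 2);
```

Because the loop touches every stored vector, query cost grows linearly with index size, which is why an approximate structure such as HNSW becomes attractive for larger indexes.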
Auto-rebuild on compaction with lineage stamping
Every time the engine compacts (publishes a new snapshot), affected search indexes are rebuilt automatically. Each rebuild stamps the SHA-256 of its inputs into derived/_lineage.json, so you can prove the index is coherent with the snapshot it was built from. If the lineage doesn't match, queries refuse to execute against a stale index — fail loud, not silently wrong.
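The fail-loud check can be pictured roughly as follows. This is a sketch assuming Node.js; the `inputsSha256` field name and both helper functions are hypothetical, since the actual layout of derived/_lineage.json is not described on this page.

```javascript
const { createHash } = require('node:crypto');

// Hash the snapshot inputs in a fixed order so the digest is reproducible.
// Each chunk is length-prefixed so that chunk boundaries affect the digest.
function hashInputs(inputs) {
  const hash = createHash('sha256');
  for (const chunk of inputs) {
    hash.update(`${Buffer.byteLength(chunk)}:`);
    hash.update(chunk);
  }
  return hash.digest('hex');
}

// Throw instead of serving results when the recorded digest does not match.
// `inputsSha256` is a hypothetical field name, not the documented schema.
function assertFreshIndex(lineage, inputs) {
  const actual = hashInputs(inputs);
  if (lineage.inputsSha256 !== actual) {
    throw new Error(`stale index: lineage ${lineage.inputsSha256} != snapshot ${actual}`);
  }
  return true;
}

const snapshot = ['lesson:1 Deploying lambda functions', 'lesson:2 Cold starts'];
const lineage = { inputsSha256: hashInputs(snapshot) };
```

Calling `assertFreshIndex(lineage, snapshot)` passes, while calling it with any changed or added input throws, which mirrors the refuse-to-execute behavior described above.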
Three patterns for getting search onto your static site
Pattern A — Public search route (lowest effort)
Hit POST https://app.staticowl.com/api/public/search with { site, q, type, limit } from your static page. CORS-friendly, rate-limited per IP, scoped to publicly-readable content types you've explicitly marked as searchable. Returns ranked results with snippet highlighting in single-digit milliseconds for typical queries.
// In your static page's search.js
const r = await fetch('https://app.staticowl.com/api/public/search', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    site: 'site:your-site',
    q: 'how to deploy lambda functions',
    type: 'lesson',
    limit: 20,
  }),
});
const { results } = await r.json();
// results: [{ id, title, snippet, score, url }]
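Since the route is rate-limited per IP, a page may want to handle non-success responses explicitly. The wrapper below is one possible shape: `buildSearchRequest` and `searchSite` are hypothetical helper names, and because the route's error status codes and error body are not documented here, the sketch only checks `response.ok`.

```javascript
const SEARCH_URL = 'https://app.staticowl.com/api/public/search';

// Build the fetch arguments separately so they can be inspected offline.
function buildSearchRequest({ site, q, type, limit = 20 }) {
  return {
    url: SEARCH_URL,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ site, q, type, limit }),
    },
  };
}

// Run a search and surface HTTP failures (for example a rate-limit response)
// instead of failing later while reading the body.
async function searchSite(params, fetchImpl = fetch) {
  const { url, init } = buildSearchRequest(params);
  const response = await fetchImpl(url, init);
  if (!response.ok) throw new Error(`search failed with HTTP ${response.status}`);
  const { results } = await response.json();
  return results;
}

const request = buildSearchRequest({ site: 'site:your-site', q: 'deploy', type: 'lesson' });
```

In a page script this would be called as `const results = await searchSite({ site, q, type })`, with a `try`/`catch` around it to show a friendly message when the request is rejected.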
Pattern B — Authenticated query API (full control)
Use an API key from a server-side process and call the engine's search procedures directly via Cypher:
CALL db.search.fulltext('lesson', 'how to deploy lambda functions')
YIELD node, score
RETURN node.title AS title, node.slug AS slug, score
ORDER BY score DESC
LIMIT 20;
Full Cypher means you can join search results to related entities, filter by arbitrary properties, and shape the response however you need. Use this pattern when search is part of a server-side render or a custom API.
Pattern C — Build-time JSON export (zero runtime)
If you want client-side search with no runtime call to us at all, the build pipeline can emit a search-index.json sidecar that ships to your bucket alongside the HTML. Load it with Fuse.js or Lunr.js or any client-side fuzzy-search library. Index size scales with content; consider this pattern for sites under ~5k pages.
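As a rough picture of what client-side search over the exported file involves, here is a dependency-free sketch. It is a plain substring matcher standing in for Fuse.js or Lunr.js, not a fuzzy search, and the entry shape (`title`, `url`, `body`) is an assumption, since the schema of search-index.json is not specified on this page.

```javascript
// Rank entries by how many query terms appear in the title or body.
// Title matches count double. Plain substring matching, not fuzzy search.
function searchIndex(entries, query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return entries
    .map((entry) => {
      const title = entry.title.toLowerCase();
      const body = entry.body.toLowerCase();
      let score = 0;
      for (const term of terms) {
        if (title.includes(term)) score += 2;
        if (body.includes(term)) score += 1;
      }
      return { entry, score };
    })
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((hit) => hit.entry);
}

// In the browser the entries would come from the exported file, e.g.
//   const entries = await (await fetch('/search-index.json')).json();
// The entry fields below are assumed for illustration.
const entries = [
  { title: 'Deploying lambda functions', url: '/lessons/deploy', body: 'Step by step deploy guide.' },
  { title: 'Cold starts', url: '/lessons/cold-starts', body: 'Why lambda functions start slowly.' },
  { title: 'Styling your site', url: '/lessons/css', body: 'Colors and fonts.' },
];
const found = searchIndex(entries, 'deploy lambda');
```

The whole index is downloaded and scanned in the visitor's browser, which is why this pattern suits smaller sites; a dedicated library adds typo tolerance and better ranking on top of the same idea.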
Search at the scale you're worried about
If you have 22k records, this is the part that matters:
- Fulltext BM25 over 500k nodes: single-digit milliseconds per query on the production benchmark. Index size ~50–100MB on disk.
- Flat vector over 50k vectors: single-digit milliseconds. RAM usage roughly 50k × dims × 4 bytes.
- HNSW vector over 500k+ vectors: tens of milliseconds at ef=64. Recall stays above 95% at default parameters.
- Index rebuild: amortized into compaction. Adding or editing one content record doesn't trigger a full rebuild; sparse updates patch the index in place.
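The flat-index RAM estimate above is simple arithmetic, worked through here for one case. The 768-dimension figure is purely illustrative and not a documented default; substitute the dimensionality of your own embeddings.

```javascript
// Bytes needed for a flat index: one 4-byte Float32 per dimension per vector.
function flatIndexBytes(vectorCount, dims) {
  return vectorCount * dims * Float32Array.BYTES_PER_ELEMENT;
}

// 768 dimensions is an illustrative choice, not a documented default.
const bytes = flatIndexBytes(50000, 768);
const mebibytes = bytes / (1024 * 1024);
console.log(`${bytes} bytes, about ${mebibytes.toFixed(1)} MiB`);
// prints "153600000 bytes, about 146.5 MiB"
```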
22,567 content records is roughly the small end of where these numbers were measured. We'd happily run a benchmark on your exact corpus before you commit: email founders@staticowl.com with a sample.
The honest qualifier
Pattern A (the public search route) is the path most people want and the path the marketing front page promises. It is shipping as part of the engineering work that accompanies this page. If it isn't live yet when you read this, Patterns B and C work today. Email us if you need Pattern A under an SLA; we'll bump the schedule.
Vector search is shipped end-to-end (Sprint 2026-05, 58 regression tests in test_search_v1.js). HNSW parameters (ef, M, distance metric) are configurable per-index, but the defaults are tuned for marketing/blog/docs corpora. If you need to choose between cosine and Euclidean distance or have a specific recall target, that's a per-index configuration, not a custom build.