HNSW Tuning
Added in vectlite 0.10.0.
vectlite stores its approximate nearest-neighbour graph using HNSW (Hierarchical Navigable Small World). The defaults are good for most workloads, but four knobs let you trade recall, latency, and ingest throughput.
| Knob | What it controls | When to raise it | When to lower it |
|---|---|---|---|
| m | Out-degree per node. | Higher recall on hard datasets. | Smaller graph / lower memory. |
| ef_construction | Build-time search width. | Higher recall (slower ingest). | Faster ingest (lower recall). |
| ef_search | Query-time search width. | Higher recall (slower query). | Faster query (lower recall). |
| parallel_insert_threshold | Dataset size at which Rayon-backed parallel HNSW insertion kicks in during bulk_ingest. Default 256. | Raise it if you only ingest small batches and want determinism. | Lower it to parallelise smaller batches. |
ef_search is the first knob to try. Changing it is free: it only affects the query path, so no rebuild is needed.
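Because changing ef_search needs no rebuild, it is cheap to sweep several values and keep the smallest one that meets your recall target. The following is a minimal sketch, not part of vectlite: `smallest_ef_meeting_target` and `measure_recall` are hypothetical names, and `measure_recall` stands for a function you supply yourself (for example, one that calls `db.set_ef_search(ef)` and then runs your own query set). The recall figures are stand-ins for illustration only.

```python
from typing import Callable, Iterable, Optional


def smallest_ef_meeting_target(
    candidates: Iterable[int],
    measure_recall: Callable[[int], float],
    target: float,
) -> Optional[int]:
    """Return the smallest candidate ef_search whose measured recall reaches target.

    measure_recall is a hypothetical, caller-supplied function; it is not part
    of vectlite. Returns None if no candidate reaches the target.
    """
    for ef in sorted(candidates):
        if measure_recall(ef) >= target:
            return ef
    return None


# Stand-in measurements for illustration only; real numbers come from your data.
fake_recall = {32: 0.81, 64: 0.90, 128: 0.96, 256: 0.98}
chosen = smallest_ef_meeting_target(fake_recall, fake_recall.__getitem__, target=0.95)
print(chosen)  # 128
```

Trying candidates in ascending order means the sweep stops at the cheapest query-time setting that is good enough, rather than the highest recall available.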
Tuning an existing database
```python
import vectlite

with vectlite.open("knowledge.vdb") as db:
    db.set_ef_search(128)  # higher recall, slightly slower query

    # or set multiple knobs at once
    db.set_index_config(m=32, ef_construction=200, ef_search=128)

    print(db.index_config())
    # {"m": 32, "ef_construction": 200, "ef_search": 128,
    #  "parallel_insert_threshold": 256, "tombstone_rebuild_pct": 30}
```

```javascript
const { open } = require('vectlite')

const db = open('knowledge.vdb')
db.setEfSearch(128)
db.setIndexConfig({ m: 32, efConstruction: 200, efSearch: 128 })

console.log(db.indexConfig())
// { m: 32, efConstruction: 200, efSearch: 128,
//   parallelInsertThreshold: 256, tombstoneRebuildPct: 30 }
```
Changing m or ef_construction triggers a full HNSW rebuild on the next operation. Changing only ef_search (or parallel_insert_threshold / tombstone_rebuild_pct) is free.
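If you apply configuration changes programmatically, it can help to know in advance whether a change will cost a rebuild. The sketch below encodes only the rule stated above; `needs_rebuild` and `REBUILD_KNOBS` are hypothetical names, not vectlite functions, and `current` is assumed to be a dict shaped like the `index_config()` output shown earlier.

```python
# Knobs that, per the rule above, force a full HNSW rebuild when changed.
REBUILD_KNOBS = ("m", "ef_construction")


def needs_rebuild(current: dict, proposed: dict) -> bool:
    """True if proposed changes m or ef_construction relative to current."""
    return any(
        knob in proposed and proposed[knob] != current.get(knob)
        for knob in REBUILD_KNOBS
    )


current = {"m": 32, "ef_construction": 200, "ef_search": 128}
print(needs_rebuild(current, {"ef_search": 256}))  # False
print(needs_rebuild(current, {"m": 48}))           # True
```

Re-submitting an unchanged value of m or ef_construction is treated as no change here; whether vectlite itself skips the rebuild in that case is not stated on this page.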
Tuning at ingest time
bulk_ingest accepts the same knobs, applied for the duration of the call.
```python
db.bulk_ingest(
    records,
    batch_size=10_000,
    m=32,
    ef_construction=200,
    ef_search=128,
    parallel_insert_threshold=512,
)
```

```javascript
db.bulkIngest(records, {
  batchSize: 10000,
  m: 32,
  efConstruction: 200,
  efSearch: 128,
  parallelInsertThreshold: 512,
})
```
Async variants (bulkIngestAsync in Node) accept the same options.
Benchmark reference
Synthetic 5,000 × 384 cosine benchmark on M-class macOS (from the upstream CHANGELOG):
| Config | Ingest throughput |
|---|---|
| Pre-0.10 baseline | ~47 vec/s |
| 0.10.0 defaults | ~917 vec/s |
| 0.10.0 fast preset | ~1782 vec/s |
The default settings already capture most of the gain. The fast preset trades a few recall points for the highest throughput; the high_recall preset goes the other way.
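To put the table in relative terms, the short calculation below derives the speedup over the pre-0.10 baseline and a rough wall-clock time for the 5,000-vector benchmark. It uses only the figures from the table; the time estimate assumes the quoted throughput is an average over the whole ingest, which the source does not state.

```python
# Ingest throughput from the table above, in vectors per second (approximate).
throughput = {
    "pre-0.10 baseline": 47,
    "0.10.0 defaults": 917,
    "0.10.0 fast preset": 1782,
}
n_vectors = 5_000  # size of the synthetic benchmark

baseline = throughput["pre-0.10 baseline"]
speedups = {name: rate / baseline for name, rate in throughput.items()}

for name, rate in throughput.items():
    seconds = n_vectors / rate  # assumes throughput is an average over the run
    print(f"{name}: {speedups[name]:.1f}x baseline, ~{seconds:.1f} s for {n_vectors} vectors")
```

On these figures the defaults are roughly 19.5x the baseline and the fast preset roughly 37.9x, so moving from defaults to the fast preset is a further factor of about 1.9.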
What the presets do
The Rust core exposes two named presets that the bindings mirror through set_index_config:
| Preset | m | ef_construction | ef_search | Use when |
|---|---|---|---|---|
| fast | small | small | small | Ingest-heavy workloads, recall ≥ 0.9 is acceptable. |
| high_recall | large | large | large | Recall-critical workloads, query latency budget is generous. |
How to choose
- Start at defaults. Measure recall against an exact (brute-force) baseline on a held-out query set.
- If recall is too low, raise ef_search first.
- If recall is still too low at high ef_search, raise m and/or ef_construction; these trigger a rebuild but offer the largest gains.
- If ingest is too slow, lower ef_construction, lower parallel_insert_threshold if you want the parallel path to kick in earlier, and look at the throughput tuning guide for the WAL-side knobs.
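The first step, measuring recall against an exact baseline, can be sketched with the standard library alone. Everything below is illustrative rather than vectlite API: `exact_top_k` and `recall_at_k` are hypothetical helpers, cosine similarity is assumed as the metric, and the `approx` list stands in for the ids your own vectlite query would return (the query call is not covered on this page, so it is not shown).

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query: Sequence[float], vectors: dict, k: int) -> list:
    """Brute-force baseline: ids of the k vectors most similar to query."""
    ranked = sorted(
        vectors,
        key=lambda vid: cosine_similarity(query, vectors[vid]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids) -> float:
    """Fraction of the exact top-k that the approximate result also returned."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact) if exact else 1.0


# Toy data for illustration only.
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
exact = exact_top_k([1.0, 0.0], vectors, k=2)
approx = ["a", "c"]  # stand-in for ids returned by the approximate index
print(exact, recall_at_k(approx, exact))  # ['a', 'b'] 0.5
```

In practice you would average `recall_at_k` over the whole held-out query set and repeat the measurement after each knob change.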
See also
- Throughput tuning — WAL sync mode, tombstoning, vector arena.
- Diagnostics — recall and latency measurement helpers.
- Distance metrics — pick the right metric before tuning recall.