Throughput Tuning
Added in vectlite 0.11.0.
vectlite 0.11.0 rewrites the single-record ingestion hot path. With the default settings you should see 10–50× higher throughput than 0.10 on a typical SSD, and another 5–10× on top of that when you relax the WAL sync_mode. The new knobs are exposed in Python and Node.
What changed in 0.11.0
- Incremental HNSW insertion: insert/upsert no longer rebuild the full HNSW graph(s) per call. Per-record cost drops from O(N log N) to O(log N).
- Lazy ANN persistence: the HNSW sidecar files are dumped at flush/compact/close, not on every insert. WAL still gives per-record durability.
- Lazy quantized index rebuild: the in-memory PQ / scalar / binary codebook is dropped on the first post-build insert and rebuilt at the next flush. Searches in between transparently fall back to HNSW.
- Cached WAL writer: a single BufWriter<File> is kept open for the session, eliminating the per-record open(2) syscall.
- HNSW tombstoning: delete marks the record as dead instead of rebuilding the graph. The graph is rebuilt at compact() or when the dead/live ratio crosses tombstone_rebuild_pct.
- Contiguous dense-vector arena: vectors are mirrored into a single Vec<f32> arena for cache-friendly brute-force / rescoring scans.
The ANN manifest format bumped from ANN1 to ANN2 to persist per-graph insertion-order keys. Old ANN1 databases are still readable; the format upgrades on next write.
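To put the complexity claim from the first bullet in perspective, the following sketch sums the asymptotic per-record costs over N inserts. It is plain arithmetic with constant factors ignored, not a benchmark of vectlite, and the function names are made up for illustration. Measured throughput gains will be far smaller than these ratios because real inserts are also bounded by I/O.

```python
import math

def total_rebuild_cost(n: int) -> float:
    """Sum of k*log2(k) for k = 1..n: a full rebuild after every insert."""
    return sum(k * math.log2(k) for k in range(1, n + 1))

def total_incremental_cost(n: int) -> float:
    """Sum of log2(k) for k = 2..n: one incremental insertion per record."""
    return sum(math.log2(k) for k in range(2, n + 1))

for n in (1_000, 10_000, 100_000):
    ratio = total_rebuild_cost(n) / total_incremental_cost(n)
    print(f"N={n:>7}: rebuild-per-insert does ~{ratio:,.0f}x more abstract work")
```

The ratio grows roughly linearly with N, which is why the rebuild-per-call design degrades as a database fills up.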
WAL sync mode
This is the highest-impact knob on macOS APFS and any other filesystem where fsync is expensive. It trades durability for throughput: after a crash, a bounded number of recently acknowledged records may be lost.
| Mode | Behaviour | Durability | Throughput |
|---|---|---|---|
| per_op (default) | fsync after every insert. | Strongest. | Lowest. |
| every_n | fsync every N inserts. | Lose at most N records on crash. | 3–8× higher than per_op. |
| on_flush | fsync only at flush / compact / close. | Lose all in-flight records on crash. | 5–10× higher than per_op. |
import vectlite
with vectlite.open("stream.vdb", dimension=384) as db:
    db.set_wal_sync_mode("every_n", n=64)
    for record in stream:
        db.insert(record["id"], record["vector"], record["metadata"])
    db.flush()
    print(db.wal_sync_mode())
    # {"mode": "every_n", "n": 64}
const { open } = require('vectlite')
const db = open('stream.vdb', { dimension: 384 })
db.setWalSyncMode('every_n', 64)
for (const record of stream) {
  db.insert(record.id, record.vector, record.metadata)
}
db.flush()
console.log(db.walSyncMode())
// { mode: 'every_n', n: 64 }
Rule of thumb
- Long-running daemon: every_n with n between 32 and 256.
- One-shot batch job: on_flush, then call db.flush() (or close the database) at the end.
- Anything financial / regulatory: stay on per_op.
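The three modes differ only in when fsync is issued. The toy append-only log below is a minimal sketch of that idea, not vectlite's WAL: the class SketchWal, the helper run, and the record format are all hypothetical. It counts fsync calls and tracks how many acknowledged records were waiting for an fsync at the worst moment.

```python
import os
import tempfile

class SketchWal:
    """Toy append-only log; NOT vectlite's implementation."""

    def __init__(self, path, mode="per_op", n=1):
        if mode not in ("per_op", "every_n", "on_flush"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode, self.n = mode, n
        self.file = open(path, "ab")   # kept open for the whole session
        self.unsynced = 0              # records written since the last fsync
        self.fsync_calls = 0

    def _sync(self):
        self.file.flush()              # push Python's buffer to the OS
        os.fsync(self.file.fileno())   # ask the OS to persist to disk
        self.fsync_calls += 1
        self.unsynced = 0

    def append(self, record: bytes):
        self.file.write(record + b"\n")
        self.unsynced += 1
        due = self.mode == "every_n" and self.unsynced >= self.n
        if self.mode == "per_op" or due:
            self._sync()

    def flush(self):
        if self.unsynced:
            self._sync()

    def close(self):
        self.flush()
        self.file.close()


def run(mode, n=64, records=1000):
    """Write `records` entries; return (fsync_calls, worst_unsynced)."""
    with tempfile.TemporaryDirectory() as tmp:
        wal = SketchWal(os.path.join(tmp, "sketch.wal"), mode, n)
        worst = 0
        for i in range(records):
            wal.append(f"record-{i}".encode())
            worst = max(worst, wal.unsynced)  # acknowledged but not yet fsynced
        wal.close()
        return wal.fsync_calls, worst


for mode in ("per_op", "every_n", "on_flush"):
    calls, worst = run(mode)
    print(f"{mode:>8}: {calls} fsync calls, up to {worst} unsynced records")
```

In this toy, 1000 records cost 1000, 16, and 1 fsync calls respectively, while the worst-case number of unsynced records grows from 0 to 63 to 1000. That is the same durability-for-throughput trade the table describes.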
HNSW tombstoning
delete no longer triggers a full HNSW rebuild. Each deleted record's origin_id is marked in a per-index tombstone set and silently skipped during search. The graph is rebuilt automatically:
- At compact() time, or
- Whenever the tombstone ratio crosses tombstone_rebuild_pct (default 30).
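The mechanism can be sketched in a few lines of plain Python. This is an illustration only: a dict stands in for the HNSW graph, search is brute force, and the class TombstoneSketch is hypothetical. It also assumes the ratio is dead / (live + dead) and that reaching the threshold exactly counts as crossing it; the library may define either differently.

```python
class TombstoneSketch:
    """Toy brute-force index with tombstones; NOT vectlite's HNSW code."""

    def __init__(self, rebuild_pct=30):
        self.rebuild_pct = rebuild_pct
        self.graph = {}        # id -> vector; stands in for the HNSW graph
        self.dead = set()      # tombstoned ids still present in the graph
        self.rebuilds = 0

    def insert(self, key, vector):
        self.dead.discard(key)
        self.graph[key] = vector

    def delete(self, key):
        if key in self.graph and key not in self.dead:
            self.dead.add(key)                 # O(1): mark, do not rebuild
            if self.dead_pct() >= self.rebuild_pct:
                self.compact()

    def dead_pct(self):
        # Assumption: ratio = dead / (live + dead), i.e. dead / graph size.
        return 100 * len(self.dead) / len(self.graph) if self.graph else 0.0

    def compact(self):
        # The "rebuild": keep only live entries and clear the tombstones.
        self.graph = {k: v for k, v in self.graph.items() if k not in self.dead}
        self.dead.clear()
        self.rebuilds += 1

    def search(self, query, k=3):
        def dist(v):
            return sum((a - b) ** 2 for a, b in zip(query, v))
        live = ((key, v) for key, v in self.graph.items() if key not in self.dead)
        return [key for key, _ in sorted(live, key=lambda kv: dist(kv[1]))[:k]]

    def stats(self):
        return len(self.graph) - len(self.dead), len(self.dead)  # (live, dead)


index = TombstoneSketch(rebuild_pct=30)
for i in range(10):
    index.insert(i, [float(i), 0.0])

index.delete(0)
index.delete(1)
print(index.search([0.0, 0.0], k=2), index.stats(), index.rebuilds)
index.delete(2)   # 3 of 10 dead reaches the 30% threshold and triggers a rebuild
print(index.search([0.0, 0.0], k=2), index.stats(), index.rebuilds)
```

Deleted ids never appear in results even before the rebuild, and the rebuild cost is paid once per threshold crossing rather than once per delete.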
You can tune the threshold:
db.set_index_config(tombstone_rebuild_pct=50) # rebuild less aggressively
db.setIndexConfig({ tombstoneRebuildPct: 50 })
Inspect the current state:
live, dead = db.tombstone_stats()
print(f"{dead}/{live + dead} dead ({100 * dead / (live + dead):.1f}%)")
const { live, dead } = db.tombstoneStats()
console.log(`${dead}/${live + dead} dead (${(100 * dead / (live + dead)).toFixed(1)}%)`)
For delete-heavy workloads, lower tombstone_rebuild_pct (e.g. 15) so search latency stays steady. For mostly-append workloads, raise it (e.g. 50) so you pay the rebuild less often.
Vector arena
The contiguous dense-vector arena is built lazily on first use. For a heavy brute-force or rescoring workload (large fetch_k, MMR, reranking), materialise it up front to avoid the first-call latency spike.
db.prepare_for_scan()
print(db.vector_arena_len()) # number of vectors in the arena, or None
db.prepareForScan()
console.log(db.vectorArenaLen())
The arena is rebuilt lazily after a delete, because deletes can't compact it in place. Search-path integration is incremental: the arena is currently exposed to callers and used as the cache-friendly storage layer, and wiring it into the default collect_results scan is planned for a follow-up release.
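For readers unfamiliar with the layout, here is a minimal sketch of a contiguous vector arena: every vector is appended to one flat float32 buffer and addressed by slot offset. The class ArenaSketch and its methods are hypothetical and say nothing about vectlite's internal representation. Pure Python will not show the cache benefit either, so treat this as a picture of the memory layout rather than a performance demo.

```python
from array import array

class ArenaSketch:
    """Toy contiguous float32 arena; NOT vectlite's internal layout."""

    def __init__(self, dimension):
        self.dimension = dimension
        self.data = array("f")   # one flat float32 buffer for all vectors
        self.ids = []            # ids[slot] owns data[slot*dim:(slot+1)*dim]

    def push(self, key, vector):
        if len(vector) != self.dimension:
            raise ValueError("dimension mismatch")
        self.ids.append(key)
        self.data.extend(vector)

    def __len__(self):
        return len(self.ids)

    def top_k(self, query, k):
        """Brute-force dot-product scan over the flat buffer."""
        dim = self.dimension
        scored = []
        for slot, key in enumerate(self.ids):
            base = slot * dim
            score = sum(self.data[base + j] * query[j] for j in range(dim))
            scored.append((score, key))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [key for _, key in scored[:k]]


arena = ArenaSketch(dimension=2)
arena.push("a", [1.0, 0.0])
arena.push("b", [0.0, 1.0])
arena.push("c", [0.5, 0.5])
print(len(arena), arena.top_k([1.0, 0.25], k=2))
```

Because slots are positional, removing a vector from the middle would shift every later offset, which is one plausible reason an arena like this is rebuilt after deletes instead of being patched in place.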
Putting it together — streaming workload
import vectlite
with vectlite.open("stream.vdb", dimension=384) as db:
    db.set_wal_sync_mode("every_n", n=128)
    db.set_index_config(tombstone_rebuild_pct=20)
    for record in incoming_records():
        db.upsert(record["id"], record["vector"], record["metadata"])
    db.flush()    # forces an fsync and persists the ANN sidecar
    db.compact()  # garbage-collects tombstoned records
What didn't change
- Crash durability: the WAL is still per-record durable on per_op (the default). Relax the sync mode only if your workload can tolerate a bounded data-loss window.
- API surface: every existing method behaves the same. The new knobs are additive.
- Disk format: the .vdb itself didn't change. Only the ANN sidecar manifest bumped to ANN2, and old ANN1 files are still readable.
See also
- HNSW tuning: m, ef_construction, ef_search, parallel_insert_threshold.
- Diagnostics: measuring recall after relaxing sync mode or raising tombstone thresholds.
- Changelog 0.11.0: full release notes.