Throughput Tuning

Added in vectlite 0.11.0.

vectlite 0.11.0 rewrites the single-record ingestion hot path. With the default settings you should see 10–50× higher throughput than 0.10 on a typical SSD, and another 5–10× on top of that when you relax the WAL sync_mode. The new knobs are exposed in Python and Node.

What changed in 0.11.0

  • Incremental HNSW insertion — insert / upsert no longer rebuild the full HNSW graph(s) per call. Per-record cost drops from O(N log N) to O(log N).
  • Lazy ANN persistence — the HNSW sidecar files are dumped at flush / compact / close, not on every insert. WAL still gives per-record durability.
  • Lazy quantized index rebuild — the in-memory PQ / scalar / binary codebook is dropped on the first post-build insert and rebuilt at the next flush. Searches in between transparently fall back to HNSW.
  • Cached WAL writer — a single BufWriter<File> is kept open for the session, eliminating the per-record open(2) syscall.
  • HNSW tombstoning — delete marks the record as dead instead of rebuilding the graph. The graph is rebuilt at compact() or when the dead/live ratio crosses tombstone_rebuild_pct.
  • Contiguous dense-vector arena — vectors are mirrored into a single Vec<f32> arena for cache-friendly brute-force / rescoring scans.
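
To get a feel for why incremental insertion matters, the per-record costs quoted above can be summed over a whole bulk load. The sketch below is only a back-of-the-envelope cost model in abstract work units (constant factors set to 1, no I/O), not a benchmark of vectlite, and the function names are illustrative.

```python
import math

def rebuild_per_insert_cost(n: int) -> float:
    """Total work if the k-th insert rebuilds a k-node graph: sum of k * log2(k)."""
    return sum(k * math.log2(k) for k in range(1, n + 1))

def incremental_cost(n: int) -> float:
    """Total work if the k-th insert only costs about log2(k)."""
    return sum(math.log2(k) for k in range(1, n + 1))

if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        ratio = rebuild_per_insert_cost(n) / incremental_cost(n)
        print(f"N={n:>7}: rebuild-per-insert does ~{ratio:,.0f}x the modelled work")
```

In this model the gap grows roughly linearly with N. Measured gains depend on I/O and constant factors that the model ignores.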

The ANN manifest format bumped from ANN1 to ANN2 to persist per-graph insertion-order keys. Old ANN1 databases are still readable; the format upgrades on next write.

WAL sync mode

This is the highest-impact knob on macOS APFS and any filesystem where fsync is expensive. Relaxing it buys higher throughput at the risk of losing a bounded amount of recently acknowledged data in a crash.

| Mode | Behaviour | Durability | Throughput |
| --- | --- | --- | --- |
| per_op (default) | fsync after every insert. | Strongest. | Lowest. |
| every_n | fsync every N inserts. | Lose at most N records on crash. | 3–8× higher than per_op. |
| on_flush | fsync only at flush / compact / close. | Lose all in-flight records on crash. | 5–10× higher than per_op. |

    # Python
    import vectlite

    with vectlite.open("stream.vdb", dimension=384) as db:
        db.set_wal_sync_mode("every_n", n=64)
        for record in stream:  # `stream` is any iterable of record dicts
            db.insert(record["id"], record["vector"], record["metadata"])
        db.flush()

        print(db.wal_sync_mode())
        # {"mode": "every_n", "n": 64}

    // Node
    const { open } = require('vectlite')

    const db = open('stream.vdb', { dimension: 384 })

    db.setWalSyncMode('every_n', 64)

    for (const record of stream) {
      db.insert(record.id, record.vector, record.metadata)
    }
    db.flush()

    console.log(db.walSyncMode())
    // { mode: 'every_n', n: 64 }
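
To make the three modes concrete, here is a minimal, self-contained sketch of an append-only log that keeps one file handle open and decides when to call fsync. It is a toy model written for this page, not vectlite's WAL implementation: ToyWal, count_fsyncs, and the record framing are all hypothetical.

```python
import os
import tempfile

class ToyWal:
    """Append-only log with one cached file handle and a configurable sync mode."""

    def __init__(self, path: str, mode: str = "per_op", n: int = 1):
        if mode not in ("per_op", "every_n", "on_flush"):
            raise ValueError(f"unknown sync mode: {mode}")
        if mode == "every_n" and n < 1:
            raise ValueError("n must be >= 1")
        self.mode, self.n = mode, n
        self._file = open(path, "ab")   # opened once, reused for every append
        self._unsynced = 0              # records written since the last fsync
        self.fsync_calls = 0

    def _sync(self) -> None:
        self._file.flush()              # user-space buffer -> OS page cache
        os.fsync(self._file.fileno())   # OS page cache -> stable storage
        self.fsync_calls += 1
        self._unsynced = 0

    def append(self, payload: bytes) -> None:
        # Length-prefixed framing, chosen only for this illustration.
        self._file.write(len(payload).to_bytes(4, "little") + payload)
        self._unsynced += 1
        due = self.mode == "every_n" and self._unsynced >= self.n
        if self.mode == "per_op" or due:
            self._sync()

    def flush(self) -> None:
        if self._unsynced:
            self._sync()

    def close(self) -> None:
        self.flush()
        self._file.close()

def count_fsyncs(mode: str, n: int = 1, records: int = 1000) -> int:
    """Write `records` small entries and report how many fsync calls were issued."""
    with tempfile.TemporaryDirectory() as tmp:
        wal = ToyWal(os.path.join(tmp, "toy.wal"), mode, n)
        for i in range(records):
            wal.append(f"record-{i}".encode())
        wal.close()
        return wal.fsync_calls
```

Comparing count_fsyncs("per_op"), count_fsyncs("every_n", n=64) and count_fsyncs("on_flush") shows where the savings come from: the number of fsync calls drops from one per record to one per batch to one per session, while the number of records that are written but not yet synced grows accordingly.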

Rule of thumb

  • Long-running daemon: every_n with n between 32 and 256.
  • One-shot batch job: on_flush, then call db.flush() (or close the database) at the end.
  • Anything financial / regulatory: stay on per_op.
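
When choosing n, it can help to translate the durability column of the table into a time window. The helper below is a rough sketch under two stated assumptions: per_op is treated as losing no acknowledged records (which presumes the storage honours fsync), and ingestion runs at a steady rate. Both function names are made up for this example.

```python
def records_at_risk(mode: str, n: int = 1, records_since_flush: int = 0) -> int:
    """Upper bound on acknowledged records a crash could lose in this simple model."""
    if mode == "per_op":
        return 0                      # assumption: every acked insert was fsynced
    if mode == "every_n":
        return n                      # "lose at most N records on crash"
    if mode == "on_flush":
        return records_since_flush    # everything since the last flush
    raise ValueError(f"unknown sync mode: {mode}")

def loss_window_seconds(records: int, records_per_second: float) -> float:
    """Convert a record count into seconds of ingestion at a steady rate."""
    if records_per_second <= 0:
        raise ValueError("records_per_second must be positive")
    return records / records_per_second

if __name__ == "__main__":
    at_risk = records_at_risk("every_n", n=128)
    window_ms = 1000 * loss_window_seconds(at_risk, records_per_second=2000)
    print(f"every_n, n=128 at 2000 rec/s: up to {at_risk} records, ~{window_ms:.0f} ms")
```

For example, at a steady 2,000 records per second, n = 128 corresponds to a window of roughly 64 ms of ingested data.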

HNSW tombstoning

delete no longer triggers a full HNSW rebuild. Each deleted record's origin_id is marked in a per-index tombstone set and silently skipped during search. The graph is rebuilt automatically:

  • At compact() time, or
  • Whenever the tombstone ratio crosses tombstone_rebuild_pct (default 30).

You can tune the threshold:

    db.set_index_config(tombstone_rebuild_pct=50)    # Python: rebuild less aggressively
    db.setIndexConfig({ tombstoneRebuildPct: 50 })   // Node

Inspect the current state:

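    # Python
    live, dead = db.tombstone_stats()
    total = live + dead
    pct = 100 * dead / total if total else 0.0   # avoid dividing by zero on an empty index
    print(f"{dead}/{total} dead ({pct:.1f}%)")

    // Node
    const { live, dead } = db.tombstoneStats()
    const total = live + dead
    const pct = total ? (100 * dead / total) : 0  // avoid NaN on an empty index
    console.log(`${dead}/${total} dead (${pct.toFixed(1)}%)`)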

For delete-heavy workloads, lower tombstone_rebuild_pct (e.g. 15) so search latency stays steady. For mostly-append workloads, raise it (e.g. 50) so you pay the rebuild less often.
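
The mechanism can be illustrated with a small, self-contained sketch. It uses a brute-force scan in place of an HNSW graph, and it assumes the threshold is compared against dead / (live + dead) with a "greater than or equal" test; neither detail is confirmed by vectlite, and ToyTombstoneIndex is a hypothetical class written only for this page.

```python
class ToyTombstoneIndex:
    """Brute-force stand-in for an ANN graph that tombstones deletes."""

    def __init__(self, tombstone_rebuild_pct: float = 30.0):
        self.rebuild_pct = tombstone_rebuild_pct
        self._keys = []        # one slot per inserted node, dead ones included
        self._vectors = []
        self._slot_of = {}     # live key -> slot index
        self._dead = set()     # tombstoned slot indices
        self.rebuilds = 0

    def insert(self, key, vector):
        if key in self._slot_of:
            raise KeyError(f"duplicate key: {key}")
        self._slot_of[key] = len(self._keys)
        self._keys.append(key)
        self._vectors.append(list(vector))

    def delete(self, key):
        slot = self._slot_of.pop(key)   # raises KeyError if the key is not live
        self._dead.add(slot)            # mark dead; the "graph" is left untouched
        if self.dead_pct() >= self.rebuild_pct:
            self.compact()

    def tombstone_stats(self):
        return len(self._slot_of), len(self._dead)

    def dead_pct(self):
        live, dead = self.tombstone_stats()
        total = live + dead
        return 100.0 * dead / total if total else 0.0

    def compact(self):
        """Rebuild the storage without the tombstoned slots."""
        keep = [i for i in range(len(self._keys)) if i not in self._dead]
        self._keys = [self._keys[i] for i in keep]
        self._vectors = [self._vectors[i] for i in keep]
        self._slot_of = {key: i for i, key in enumerate(self._keys)}
        self._dead.clear()
        self.rebuilds += 1

    def search(self, query, k=1):
        """Return the k nearest live keys by squared Euclidean distance."""
        scored = [
            (sum((a - b) ** 2 for a, b in zip(vec, query)), self._keys[i])
            for i, vec in enumerate(self._vectors)
            if i not in self._dead      # tombstoned slots are skipped
        ]
        return [key for _, key in sorted(scored)[:k]]
```

Deletes stay cheap until the threshold is reached, at which point one delete pays for the rebuild. A lower threshold means more frequent but smaller rebuilds and fewer dead slots to skip during search.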

Vector arena

The contiguous dense-vector arena is built lazily on first use. For a heavy brute-force or rescoring workload (large fetch_k, MMR, reranking), materialise it up front to avoid the first-call latency spike.

    # Python
    db.prepare_for_scan()
    print(db.vector_arena_len())  # number of vectors in the arena, or None

    // Node
    db.prepareForScan()
    console.log(db.vectorArenaLen())

The arena is rebuilt lazily after a delete, because deletes can't compact it in place. Search-path integration is incremental: the arena is currently exposed to callers and serves as the cache-friendly storage layer, while wiring it into the default collect_results scan is still in progress and slated for a follow-up release.
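
The layout idea behind the arena can be sketched in a few lines. The example below stores every vector back to back in one flat buffer of 32-bit floats, using Python's array module as a rough analogue of a Vec<f32>, and scans it with a dot product. ToyArena is a hypothetical illustration of the layout only; a pure-Python loop will not show the cache benefits a native implementation gets.

```python
from array import array

class ToyArena:
    """All vectors in one flat float32 buffer; vector i occupies [i*dim, (i+1)*dim)."""

    def __init__(self, dim: int):
        self.dim = dim
        self.data = array("f")   # contiguous 32-bit floats
        self.keys = []           # keys[i] names the i-th vector in the buffer

    def push(self, key, vector) -> None:
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} components, got {len(vector)}")
        self.data.extend(vector)
        self.keys.append(key)

    def __len__(self) -> int:
        return len(self.keys)

    def top_k_dot(self, query, k: int):
        """Brute-force scan: score every vector by dot product, return the best k keys."""
        dim, data = self.dim, self.data
        scores = []
        for i, key in enumerate(self.keys):
            base = i * dim       # offset of vector i inside the flat buffer
            score = sum(data[base + j] * query[j] for j in range(dim))
            scores.append((score, key))
        scores.sort(key=lambda pair: pair[0], reverse=True)
        return [key for _, key in scores[:k]]
```

Because there is one offset computation per vector and no per-vector allocation, a scan walks memory sequentially. Removing a vector from the middle would leave a hole in the buffer, so rebuilding after a delete is the simplest way to keep it dense.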

Putting it together — streaming workload

    import vectlite

    with vectlite.open("stream.vdb", dimension=384) as db:
        db.set_wal_sync_mode("every_n", n=128)
        db.set_index_config(tombstone_rebuild_pct=20)

        for record in incoming_records():
            db.upsert(record["id"], record["vector"], record["metadata"])

        db.flush()    # forces an fsync and persists the ANN sidecar
        db.compact()  # garbage-collects tombstoned records
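
For very long streams, one possible refinement is to flush periodically rather than only at the end, so that the amount of unflushed work stays bounded. The helper below is a sketch of that pattern, not part of vectlite. It relies only on the upsert and flush methods shown above, the flush_every parameter is an assumption of this example, and RecordingDb is a stand-in that lets the helper run without a database.

```python
def stream_ingest(db, records, flush_every: int = 10_000) -> int:
    """Upsert every record into `db`, flushing after each `flush_every` records."""
    if flush_every < 1:
        raise ValueError("flush_every must be >= 1")
    count = 0
    for count, record in enumerate(records, start=1):
        db.upsert(record["id"], record["vector"], record["metadata"])
        if count % flush_every == 0:
            db.flush()
    if count % flush_every != 0:
        db.flush()               # flush the final partial batch
    return count

class RecordingDb:
    """Hypothetical stand-in that records calls; used only to exercise the helper."""

    def __init__(self):
        self.calls = []

    def upsert(self, record_id, vector, metadata):
        self.calls.append("upsert")

    def flush(self):
        self.calls.append("flush")

if __name__ == "__main__":
    fake = RecordingDb()
    sample = [{"id": str(i), "vector": [0.0, 1.0], "metadata": {}} for i in range(5)]
    print(stream_ingest(fake, sample, flush_every=2), fake.calls)
```

With a real database handle, the same function could be called inside the with block in place of the plain loop.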

What didn't change

  • Crash durability — the WAL is still per-record durable on per_op (the default). Relax the sync mode only if your workload can tolerate a bounded data-loss window.
  • API surface — every existing method behaves the same. The new knobs are additive.
  • Disk format — the .vdb itself didn't change. Only the ANN sidecar manifest bumped to ANN2, and old ANN1 files are still readable.

See also

  • HNSW tuning — m, ef_construction, ef_search, parallel_insert_threshold.
  • Diagnostics — measuring recall after relaxing sync mode or raising tombstone thresholds.
  • Changelog 0.11.0 — full release notes.