Throughput Tuning

Added in vectlite 0.11.0.

vectlite 0.11.0 rewrites the single-record ingestion hot path. With the default settings you should see 10–50× higher throughput than 0.10 on a typical SSD, and another 5–10× on top of that when you relax the WAL sync_mode. The new knobs are exposed in Python and Node.

What changed in 0.11.0

Incremental HNSW insertion — insert / upsert no longer rebuild the full HNSW graph(s) per call. Per-record cost drops from O(N log N) to O(log N).
Lazy ANN persistence — the HNSW sidecar files are dumped at flush / compact / close, not on every insert. WAL still gives per-record durability.
Lazy quantized index rebuild — the in-memory PQ / scalar / binary codebook is dropped on the first post-build insert and rebuilt at the next flush. Searches in between transparently fall back to HNSW.
Cached WAL writer — a single BufWriter<File> is kept open for the session, eliminating the per-record open(2) syscall.
HNSW tombstoning — delete marks the record as dead instead of rebuilding the graph. The graph is rebuilt at compact() or when the dead/live ratio crosses tombstone_rebuild_pct.
Contiguous dense-vector arena — vectors are mirrored into a single Vec<f32> arena for cache-friendly brute-force / rescoring scans.

The ANN manifest format was bumped from ANN1 to ANN2 to persist per-graph insertion-order keys. Old ANN1 databases are still readable; the format upgrades on the next write.

WAL sync mode

This is the highest-impact knob on macOS APFS and any filesystem where fsync is expensive. It trades durability for throughput: after a crash you can lose a bounded amount of recently acknowledged data.

Mode	Behaviour	Durability	Throughput
`per_op` (default)	`fsync` after every insert.	Strongest.	Lowest.
`every_n`	`fsync` every N inserts.	Lose at most N records on crash.	3–8× higher than `per_op`.
`on_flush`	`fsync` only at `flush` / `compact` / `close`.	Lose every record written since the last `flush` on crash.	5–10× higher than `per_op`.

import vectlite

with vectlite.open("stream.vdb", dimension=384) as db:
    db.set_wal_sync_mode("every_n", n=64)
    for record in stream:
        db.insert(record["id"], record["vector"], record["metadata"])
    db.flush()

    print(db.wal_sync_mode())
    # {"mode": "every_n", "n": 64}

const { open } = require('vectlite')

const db = open('stream.vdb', { dimension: 384 })

db.setWalSyncMode('every_n', 64)

for (const record of stream) {
  db.insert(record.id, record.vector, record.metadata)
}
db.flush()

console.log(db.walSyncMode())
// { mode: 'every_n', n: 64 }

Rule of thumb

Long-running daemon: every_n with n between 32 and 256.
One-shot batch job: on_flush, then call db.flush() (or close the database) at the end.
Anything financial / regulatory: stay on per_op.
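
The three modes differ only in when fsync is issued. The sketch below is a toy append-only log, not vectlite's implementation: the `SketchWal` class, its length-prefixed record framing, and the `count_fsyncs` helper are all hypothetical names introduced for illustration. It counts fsync calls per mode so the throughput/durability trade-off is visible.

```python
import os
import tempfile


class SketchWal:
    """Toy append-only log illustrating the three sync policies (hypothetical, not vectlite code)."""

    def __init__(self, path, mode="per_op", n=1):
        if mode not in ("per_op", "every_n", "on_flush"):
            raise ValueError(f"unknown sync mode: {mode}")
        self._file = open(path, "ab")  # one handle kept open for the whole session
        self.mode = mode
        self.n = n
        self.unsynced = 0      # records written but not yet fsynced
        self.fsync_calls = 0

    def append(self, payload: bytes):
        # Assumed framing: 4-byte little-endian length prefix, then the payload.
        self._file.write(len(payload).to_bytes(4, "little") + payload)
        self.unsynced += 1
        if self.mode == "per_op":
            self.flush()
        elif self.mode == "every_n" and self.unsynced >= self.n:
            self.flush()
        # on_flush: nothing here; durability waits for flush() or close()

    def flush(self):
        if self.unsynced == 0:
            return
        self._file.flush()              # push Python's buffer to the OS
        os.fsync(self._file.fileno())   # ask the OS to push it to the device
        self.fsync_calls += 1
        self.unsynced = 0

    def close(self):
        self.flush()
        self._file.close()


def count_fsyncs(mode, n=1, records=256):
    """Write `records` entries under `mode` and return how many fsyncs were issued."""
    with tempfile.TemporaryDirectory() as tmp:
        wal = SketchWal(os.path.join(tmp, "toy.wal"), mode, n)
        for i in range(records):
            wal.append(f"record-{i}".encode())
        wal.close()
        return wal.fsync_calls


if __name__ == "__main__":
    for mode, n in (("per_op", 1), ("every_n", 64), ("on_flush", 1)):
        print(mode, count_fsyncs(mode, n))
```

In this toy model, the `unsynced` counter is exactly the number of records a crash could lose, which is why a larger `n` buys throughput at the cost of a wider loss window.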

HNSW tombstoning

delete no longer triggers a full HNSW rebuild. Each deleted record's origin_id is marked in a per-index tombstone set and silently skipped during search. The graph is rebuilt automatically:

At compact() time, or
Whenever the tombstone ratio crosses tombstone_rebuild_pct (default 30).

You can tune the threshold:

db.set_index_config(tombstone_rebuild_pct=50)   # rebuild less aggressively

db.setIndexConfig({ tombstoneRebuildPct: 50 })

Inspect the current state:

live, dead = db.tombstone_stats()
total = live + dead
pct = 100 * dead / total if total else 0.0   # guard against an empty index
print(f"{dead}/{total} dead ({pct:.1f}%)")

const { live, dead } = db.tombstoneStats()
const total = live + dead
const pct = total ? (100 * dead) / total : 0   // guard against an empty index
console.log(`${dead}/${total} dead (${pct.toFixed(1)}%)`)

For delete-heavy workloads, lower tombstone_rebuild_pct (e.g. 15) so search latency stays steady. For mostly-append workloads, raise it (e.g. 50) so you pay the rebuild less often.
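
To make the mechanics concrete, here is a minimal sketch of tombstone-then-rebuild bookkeeping. It is an illustration under stated assumptions, not vectlite's code: a plain dict stands in for the HNSW graph, the `TombstonedIndex` class and its methods are hypothetical, and the threshold is assumed to compare dead / (live + dead) with `>=`. The real index may define the ratio or the comparison differently.

```python
class TombstonedIndex:
    """Toy index showing lazy deletes with a rebuild threshold (hypothetical names)."""

    def __init__(self, rebuild_pct=30):
        self.rebuild_pct = rebuild_pct
        self.nodes = {}           # id -> vector; stands in for the HNSW graph
        self.tombstones = set()   # ids that are deleted but still in the graph
        self.rebuilds = 0

    def insert(self, record_id, vector):
        self.nodes[record_id] = vector
        self.tombstones.discard(record_id)   # re-inserting revives the id

    def delete(self, record_id):
        if record_id not in self.nodes or record_id in self.tombstones:
            return
        self.tombstones.add(record_id)       # cheap: no graph surgery
        # Assumption: "crosses" means dead share >= threshold.
        if self.dead_pct() >= self.rebuild_pct:
            self.rebuild()

    def stats(self):
        """Return (live, dead), mirroring the shape of tombstone_stats()."""
        dead = len(self.tombstones)
        return len(self.nodes) - dead, dead

    def dead_pct(self):
        live, dead = self.stats()
        total = live + dead
        return 100.0 * dead / total if total else 0.0

    def search_ids(self):
        # Search walks the graph but silently skips tombstoned ids.
        return [rid for rid in self.nodes if rid not in self.tombstones]

    def rebuild(self):
        # Expensive step, paid only occasionally: drop dead nodes for real.
        self.nodes = {rid: vec for rid, vec in self.nodes.items()
                      if rid not in self.tombstones}
        self.tombstones.clear()
        self.rebuilds += 1


if __name__ == "__main__":
    index = TombstonedIndex(rebuild_pct=30)
    for i in range(10):
        index.insert(i, [float(i)])
    for i in range(3):
        index.delete(i)
        print(index.stats(), "rebuilds:", index.rebuilds)
```

A lower threshold means fewer dead nodes for each search to skip but more frequent rebuilds; a higher one reverses that trade.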

Vector arena

The contiguous dense-vector arena is built lazily on first use. For a heavy brute-force or rescoring workload (large fetch_k, MMR, reranking), materialise it up front to avoid the first-call latency spike.

db.prepare_for_scan()
print(db.vector_arena_len())   # number of vectors in the arena, or None

db.prepareForScan()
console.log(db.vectorArenaLen())

The arena is rebuilt lazily after a delete (deletes can't compact in place). Search-path integration is incremental: the arena is currently exposed to callers and used as the cache-friendly storage layer; wiring it into the default collect_results scan is planned for a follow-up release.
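
For readers unfamiliar with the layout, the sketch below shows what a contiguous arena looks like: every vector is appended to one flat float32 buffer, and row i occupies the slice [i * dim, (i + 1) * dim). The `VectorArena` class and its methods are hypothetical and use Python's standard `array` module rather than vectlite's internal `Vec<f32>`. Pure Python will not show the cache benefit, so treat this as an illustration of the layout and the brute-force scan only.

```python
from array import array


class VectorArena:
    """Row-major float32 arena: vector i lives at data[i*dim : (i+1)*dim] (hypothetical)."""

    def __init__(self, dim):
        self.dim = dim
        self.data = array("f")   # single contiguous float32 buffer
        self.ids = []            # ids[i] is the record stored in row i

    def push(self, record_id, vector):
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.ids.append(record_id)
        self.data.extend(vector)

    def __len__(self):
        return len(self.ids)

    def top_k_dot(self, query, k):
        """Brute-force scan: score every row by dot product, return the best k ids."""
        scored = []
        for row, record_id in enumerate(self.ids):
            start = row * self.dim
            row_values = self.data[start:start + self.dim]
            score = sum(q * x for q, x in zip(query, row_values))
            scored.append((score, record_id))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [record_id for _, record_id in scored[:k]]


if __name__ == "__main__":
    arena = VectorArena(dim=2)
    arena.push("a", [1.0, 0.0])
    arena.push("b", [0.0, 1.0])
    arena.push("c", [0.5, 0.5])
    print(len(arena), arena.top_k_dot([1.0, 0.0], k=2))
```

The layout also shows why a delete forces a rebuild: removing a row from the middle would shift every later row, so it is simpler to regenerate the buffer the next time it is needed.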

Putting it together — streaming workload

import vectlite

with vectlite.open("stream.vdb", dimension=384) as db:
    db.set_wal_sync_mode("every_n", n=128)
    db.set_index_config(tombstone_rebuild_pct=20)

    for record in incoming_records():
        db.upsert(record["id"], record["vector"], record["metadata"])

    db.flush()        # forces an fsync and persists the ANN sidecar
    db.compact()      # garbage-collects tombstoned records

What didn't change

Crash durability — the WAL is still per-record durable on per_op (the default). Relax the sync mode only if your workload can tolerate a bounded data-loss window.
API surface — every existing method behaves the same. The new knobs are additive.
Disk format — the .vdb itself didn't change. Only the ANN sidecar manifest bumped to ANN2, and old ANN1 files are still readable.
