Storage Format

vectlite stores all data in a small set of files on disk. There is no external server or daemon -- the database is just files.

File Overview

For a database at knowledge.vdb, the following files may exist:

File                  Purpose
knowledge.vdb         Main database file. Contains all committed records, metadata, and the sparse inverted index.
knowledge.vdb.wal     Write-ahead log. Buffers recent writes that have not yet been compacted.
knowledge.vdb.ann     HNSW index sidecar. Contains the approximate nearest-neighbor graph for dense search.
knowledge.vdb.lock    Lock file. Used for advisory file locking to coordinate concurrent access.
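
As a minimal sketch of the naming convention above, the following Python helper derives the possible sidecar paths from a database path and reports which of them exist. The name vectlite_files is hypothetical and not part of vectlite; the helper relies only on the file suffixes listed above.

```python
from pathlib import Path

# Suffixes appended to the main database path, as listed above.
SIDECAR_SUFFIXES = (".wal", ".ann", ".lock")


def vectlite_files(db_path):
    """Return {path: exists} for the main file and each possible sidecar.

    Hypothetical helper for illustration; not part of the vectlite API.
    """
    main = Path(db_path)
    candidates = [main] + [Path(str(main) + suffix) for suffix in SIDECAR_SUFFIXES]
    return {path: path.exists() for path in candidates}
```

For a small database that has never reached the ANN threshold, the .ann entry would be expected to report False.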

Main Database File (.vdb)

The .vdb file is a binary format designed for fast sequential and random reads. It contains:

  • Header: Magic bytes, format version, vector dimension, and record count.
  • Record table: A contiguous array of fixed-size record entries containing the ID hash, vector data offset, and metadata offset.
  • Vector data: Dense vectors stored in contiguous blocks, aligned for SIMD reads.
  • Metadata store: CBOR-encoded metadata blobs, referenced by offset from the record table.
  • Sparse index: An inverted index mapping terms to posting lists (record IDs and term frequencies), used for BM25 scoring.
  • Footer: Checksums for integrity verification.
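
To make the header fields concrete, here is a sketch of packing and parsing a header with Python's struct module. The byte layout, field widths, and magic value below are assumptions chosen for illustration; this page does not specify the real encoding, and the actual vectlite header may differ.

```python
import struct

# ASSUMED layout for illustration only; the real vectlite header may differ.
# Little-endian: 4-byte magic, u16 format version, u32 dimension, u64 record count.
HEADER_FORMAT = "<4sHIQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
ASSUMED_MAGIC = b"VDB0"  # placeholder value, not the real magic bytes


def pack_header(version, dimension, record_count):
    """Serialize the four header fields into bytes."""
    return struct.pack(HEADER_FORMAT, ASSUMED_MAGIC, version, dimension, record_count)


def parse_header(data):
    """Validate the magic bytes and return the remaining fields as a dict."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file too short to contain a header")
    magic, version, dimension, record_count = struct.unpack_from(HEADER_FORMAT, data)
    if magic != ASSUMED_MAGIC:
        raise ValueError("bad magic bytes")
    return {"version": version, "dimension": dimension, "record_count": record_count}
```

Checking the magic bytes and format version first lets a reader reject a foreign or incompatible file before interpreting any offsets.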

The format is append-friendly. Each call to compact() writes a fresh, defragmented file rather than modifying the existing one in place.
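
The sparse index described above maps terms to posting lists of record IDs and term frequencies. The sketch below shows how such posting lists can drive BM25 scoring. It uses the standard Okapi BM25 formula with a smoothed IDF and the common defaults k1=1.2 and b=0.75; the whitespace tokenizer, the parameter values, and the function names are assumptions, not vectlite's actual implementation.

```python
import math
from collections import defaultdict


def build_index(docs):
    """Map each term to a posting list of (record_id, term_frequency)."""
    index = defaultdict(list)
    for rid, text in docs.items():
        counts = defaultdict(int)
        for term in text.lower().split():  # naive whitespace tokenizer (assumption)
            counts[term] += 1
        for term, tf in counts.items():
            index[term].append((rid, tf))
    return index


def bm25(query, index, doc_lengths, k1=1.2, b=0.75):
    """Return {record_id: score} using the Okapi BM25 formula."""
    n_docs = len(doc_lengths)
    avg_len = sum(doc_lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, [])
        # Smoothed IDF; the +1 inside the log keeps it non-negative.
        idf = math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for rid, tf in postings:
            # Length normalization: longer-than-average records are penalized.
            norm = tf + k1 * (1 - b + b * doc_lengths[rid] / avg_len)
            scores[rid] += idf * tf * (k1 + 1) / norm
    return dict(scores)
```

Only records that appear in at least one posting list for a query term receive a score, which is why an inverted index avoids scanning every record.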

Write-Ahead Log (.wal)

All write operations (insert, upsert, delete) are first appended to the WAL before being acknowledged. This ensures durability even if the process crashes before compaction.

The WAL is a sequential log of operations:

  • Each entry contains an operation type, record ID, vector data, and metadata.
  • Entries are length-prefixed and checksummed.
  • On startup, vectlite replays any WAL entries that were not yet compacted into the main file.
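
The framing described above can be sketched as follows. The exact layout (a 4-byte length and a 4-byte CRC32 ahead of each payload) and the policy of stopping replay at the first damaged entry are assumptions for illustration; this page does not specify vectlite's real WAL encoding or checksum algorithm.

```python
import struct
import zlib

# ASSUMED framing for illustration: u32 payload length, u32 CRC32, then payload.
_FRAME = struct.Struct("<II")


def encode_entry(payload):
    """Prefix a payload with its length and checksum."""
    return _FRAME.pack(len(payload), zlib.crc32(payload)) + payload


def read_entries(log):
    """Yield payloads in order, stopping at the first torn or corrupt entry."""
    offset = 0
    while offset + _FRAME.size <= len(log):
        length, checksum = _FRAME.unpack_from(log, offset)
        start = offset + _FRAME.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != checksum:
            break  # incomplete or damaged tail, e.g. a crash mid-write
        yield payload
        offset = start + length
```

The length prefix tells the reader where each entry ends, and the checksum lets replay distinguish a fully written entry from one that was cut short.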

Calling flush() ensures all buffered writes are persisted to the WAL on disk. Calling compact() merges the WAL into the main file and truncates the log.

HNSW Index Sidecar (.ann)

The ANN sidecar stores the HNSW graph structure:

  • Graph layers: Each layer contains a set of nodes with neighbor lists. Layer 0 is the densest, containing all records.
  • Entry point: The top-level entry node for graph traversal.
  • Build parameters: M (max connections per node), ef_construction (search width during build).

The sidecar is rebuilt during compact() when:

  • The database reaches the 128-record auto-build threshold for the first time.
  • A significant fraction of records have been added or deleted since the last build.

Between compactions, newly inserted records are added to the graph incrementally.

For databases with fewer than 128 records, no .ann file is created. Search uses exact brute-force comparison, which is faster at small scale.
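
For reference, exact brute-force search simply scores every record against the query. The sketch below assumes cosine similarity as the metric (this page does not state which metric vectlite uses) and uses hypothetical function names; only the 128-record threshold comes from the text above.

```python
import math

AUTO_BUILD_THRESHOLD = 128  # value stated above


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_search(query, records, k=3):
    """Score every record against the query and return the top-k (id, score) pairs."""
    scored = [(rid, cosine(query, vec)) for rid, vec in records.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def uses_ann(record_count):
    """True once the database is large enough for an .ann sidecar to exist."""
    return record_count >= AUTO_BUILD_THRESHOLD
```

The cost grows linearly with the number of records, which is negligible for a few dozen vectors and is the reason a graph index only pays off at larger sizes.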

Lock File (.lock)

The lock file coordinates concurrent access across processes:

  • Exclusive lock: Acquired by a read-write Database instance. Only one writer is allowed at a time.
  • Shared lock: Acquired by read_only instances. Multiple readers can hold shared locks simultaneously.
  • Advisory: The lock is advisory (using flock on Unix, LockFileEx on Windows). It does not prevent other programs from modifying the file directly.

The lock file is created on first open and left on disk after close (it is harmless and avoids race conditions on re-open).
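
The shared and exclusive modes above correspond directly to flock's LOCK_SH and LOCK_EX. The following Unix-only sketch uses Python's fcntl module to show the behavior. The helper name acquire_lock and the non-blocking policy (failing immediately instead of waiting) are assumptions, not a description of how vectlite itself handles contention.

```python
import fcntl
import os


def acquire_lock(lock_path, exclusive):
    """Open (creating if needed) the lock file and take an advisory flock.

    Returns the file descriptor; closing it releases the lock.
    Raises BlockingIOError if a conflicting lock is already held.
    Unix only: the fcntl module is not available on Windows.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        # LOCK_NB makes the call fail at once instead of waiting (assumed policy).
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    return fd
```

Two readers can each call acquire_lock(path, exclusive=False) successfully, while a writer calling acquire_lock(path, exclusive=True) fails until both readers have closed their descriptors. The lock file itself stays on disk throughout.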

How compact() Works

Compaction merges the WAL into the main file and optimizes the database:

  1. Replay WAL: All pending WAL operations are applied in order, producing an in-memory snapshot of the current state.
  2. Write new .vdb: A new main file is written atomically (write to a temp file, then rename). This defragments storage and removes deleted records.
  3. Rebuild ANN index: If the record count is at or above the auto-build threshold, the HNSW graph is rebuilt from scratch and written to the .ann sidecar.
  4. Update sparse index: The inverted index is rebuilt and included in the new .vdb file.
  5. Truncate WAL: The WAL file is truncated to zero once the new main file is in place.

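Because the main file is replaced atomically and the WAL is truncated only after the new file is in place, a crash during compaction leaves the old main file and the WAL intact. The pending operations are replayed from the WAL on the next startup, so no data is lost.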
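
The write-then-rename pattern from steps 2 and 5 can be sketched in Python as below. This is a simplified illustration, not vectlite's actual code: the function name atomic_compact is hypothetical, and new_contents stands in for the serialized snapshot produced in step 1.

```python
import os
import tempfile


def atomic_compact(db_path, new_contents):
    """Replace db_path with new_contents, then empty the WAL.

    Simplified illustration of steps 2 and 5; not vectlite's actual code.
    """
    directory = os.path.dirname(os.path.abspath(db_path))
    # The temp file lives in the same directory so the rename stays on one
    # filesystem, which is what allows os.replace to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_contents)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before renaming
        os.replace(tmp_path, db_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Only after the new main file is in place is it safe to discard the log.
    with open(db_path + ".wal", "wb"):
        pass  # opening in "wb" mode truncates the file to zero length
```

If the process dies before os.replace, readers still see the old main file plus the full WAL; if it dies after the rename but before truncation, the WAL is still present alongside the new main file.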

Store Layout

A Store (collection manager) creates a directory structure where each collection is an independent database:

my_collections/
    products.vdb
    products.vdb.wal
    products.vdb.ann
    products.vdb.lock
    logs.vdb
    logs.vdb.wal
    logs.vdb.lock

Each collection has its own dimension and can be managed independently.
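
Given this layout, the collections in a Store directory can be discovered from the file names alone. The helper below is a hypothetical sketch using pathlib, not part of the vectlite API; it assumes every collection has a main file named <name>.vdb directly inside the directory.

```python
from pathlib import Path


def list_collections(store_dir):
    """Return the collection names found in a Store directory, sorted.

    A collection is identified by its main file, <name>.vdb. Sidecars such as
    <name>.vdb.wal do not match the pattern. Hypothetical helper.
    """
    return sorted(p.stem for p in Path(store_dir).glob("*.vdb") if p.is_file())
```

For the directory shown above, this would return the names logs and products.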

Snapshots vs. Backups

  • snapshot(dest) creates a single .vdb file containing all committed data. It does not include the WAL or ANN sidecar. This is a compact, portable copy.
  • backup(dest) copies the .vdb file and all sidecar files to a directory. This preserves the ANN index, avoiding a rebuild on restore.

For a fully up-to-date snapshot, call compact() before snapshot() so that all pending WAL entries are merged into the main file first.