Multi-Vector / ColBERT Search

Store multiple token-level embeddings per record and search with MaxSim late interaction scoring. This is how ColBERT, ColBERTv2, and ColPali achieve their retrieval quality — they capture token-level semantics that a single pooled vector throws away.

vectlite supports any number of named multi-vector spaces per record (e.g. "colbert", "colpali"), with optional 2-bit quantization for ~16x memory compression.

How MaxSim works

For each query token, find the maximum cosine similarity against all document tokens, then sum across query tokens:

score(q, d) = Σ_qi max_dj cos(qi, dj)

MaxSim is more expressive than a single pooled dot product because it never collapses the document into one vector, but it is also more expensive to compute. Use 2-bit quantization to keep it tractable on large corpora.
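To make the scoring rule concrete, here is a minimal NumPy sketch of MaxSim over raw token matrices. This is an illustration of the formula, not vectlite's internal implementation:

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """For each query token, take the max cosine similarity over all
    document tokens, then sum across query tokens."""
    # Normalize rows so the dot product equals cosine similarity.
    q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
    d = doc_tokens / np.linalg.norm(doc_tokens, axis=1, keepdims=True)
    sim = q @ d.T  # (num_query_tokens, num_doc_tokens) similarity matrix
    return float(sim.max(axis=1).sum())

q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.7, 0.7]])
score = maxsim(q, d)  # each query token matches its best doc token
```

Note that the second query token has no exact match in the document, so it contributes its best partial match (cos ≈ 0.707) rather than being lost in a pooled average.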

Inserting per-token embeddings

Python

# token_vec_1, token_vec_2, ... are float vectors (one per token)
db.upsert_multi_vectors(
    "doc1",
    dense_vector,                                  # also store a pooled vector
    {"colbert": [token_vec_1, token_vec_2, ...]},  # per-token embeddings
    metadata={"source": "paper"},
)

Node.js

db.upsertMultiVectors(
  'doc1',
  denseVector,
  { colbert: [tokenVec1, tokenVec2 /* ... */] },
  { source: 'paper' }
)

The pooled dense_vector lets you mix multi-vector search with regular dense / hybrid search on the same record.
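If your embedding model does not emit a pooled vector of its own, a common choice is to mean-pool the token embeddings before upserting. A sketch (the `mean_pool` helper is illustrative, not part of the vectlite API):

```python
import numpy as np

def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    """Average per-token embeddings into one pooled vector, normalized
    to unit length so it works with cosine-based dense search."""
    pooled = np.mean(np.asarray(token_vectors, dtype=np.float32), axis=0)
    return (pooled / np.linalg.norm(pooled)).tolist()

token_vectors = [[1.0, 0.0], [0.0, 1.0]]
dense_vector = mean_pool(token_vectors)  # unit-length pooled vector
```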

Searching with MaxSim

Python

results = db.search_multi_vector(
    "colbert",            # space name
    query_token_vectors,  # list of float vectors
    k=10,
    filter={"source": "paper"},
)

Node.js

const results = db.searchMultiVector('colbert', queryTokenVectors, {
  k: 10,
  filter: { source: 'paper' },
})

2-bit quantization (~16x compression)

ColBERTv2-style 2-bit quantization compresses each dimension to 2 bits using per-dimension quartile boundaries. Quantized search uses the standard 2-stage pipeline: fast approximate MaxSim picks candidates, then exact float32 rescores the top hits.

Parameters persist in a .vdb.mvquant.<space> sidecar file per space.

Python

db.enable_multi_vector_quantization("colbert")
print(db.is_multi_vector_quantized("colbert"))  # True
db.disable_multi_vector_quantization("colbert")

Node.js

db.enableMultiVectorQuantization('colbert')
console.log(db.isMultiVectorQuantized('colbert'))
db.disableMultiVectorQuantization('colbert')
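To illustrate the quartile idea, here is a sketch of the encoding step only: fit per-dimension quartile boundaries, then map each float to one of four buckets (2 bits). This shows the scheme's shape, not vectlite's actual codec:

```python
import numpy as np

def fit_quartiles(vectors: np.ndarray) -> np.ndarray:
    """Per-dimension 25th/50th/75th percentiles; each dimension is
    split into four buckets, addressable with 2 bits."""
    return np.percentile(vectors, [25, 50, 75], axis=0)  # shape (3, dim)

def quantize_2bit(vectors: np.ndarray, bounds: np.ndarray) -> np.ndarray:
    """Map each float to its bucket index in {0, 1, 2, 3} by counting
    how many quartile boundaries it exceeds."""
    return (vectors[None, :, :] > bounds[:, None, :]).sum(axis=0).astype(np.uint8)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((200, 128)).astype(np.float32)
bounds = fit_quartiles(tokens)
codes = quantize_2bit(tokens, bounds)
# Packed at 2 bits/dim, 128 dims take 32 bytes/token vs 512 bytes in float32.
```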

Storage cost rule of thumb

A ColBERT document with 200 tokens × 128 dim × float32 = 102 KB per doc. With 2-bit quantization: ~6.4 KB per doc.

For 1M documents: 102 GB → 6.4 GB. Quantization is essentially mandatory above the 100k-doc range.
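The arithmetic behind those numbers, in a few lines:

```python
# Back-of-envelope storage for one ColBERT document, matching the
# 200-token x 128-dim example above.
tokens, dim = 200, 128
float32_bytes = tokens * dim * 4     # 4 bytes per float32 -> 102,400 B (~102 KB)
quant_bytes = tokens * dim * 2 // 8  # 2 bits per dimension -> 6,400 B (~6.4 KB)
ratio = float32_bytes // quant_bytes # 16x compression
```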

Combining with other features

  • Filters are applied before MaxSim scoring (use payload indexes to keep the candidate set small).
  • Namespaces scope multi-vector search like any other search.
  • Multiple spaces per record — store both ColBERT and ColPali on the same record and query whichever fits the modality.