Multi-Vector / ColBERT Search
Store multiple token-level embeddings per record and search with MaxSim late interaction scoring. This is how ColBERT, ColBERTv2, and ColPali achieve their retrieval quality — they capture token-level semantics that a single pooled vector throws away.
vectlite supports any number of named multi-vector spaces per record (e.g. "colbert", "colpali"), with optional 2-bit quantization for ~16x memory compression.
How MaxSim works
For each query token, find the maximum cosine similarity against all document tokens, then sum across query tokens:
score(q, d) = Σ_qi max_dj cos(qi, dj)
It is more expressive than a pooled dot product (it doesn't collapse the sentence into one vector) but also more expensive: the cost scales with query tokens × document tokens. Use 2-bit quantization to keep it tractable on large corpora.
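The scoring rule above is small enough to state directly in code. A minimal NumPy reference implementation (illustrative only, not vectlite's internal kernel):

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """MaxSim late interaction: for each query token, take the best
    cosine similarity over all document tokens, then sum."""
    q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
    d = doc_tokens / np.linalg.norm(doc_tokens, axis=1, keepdims=True)
    sims = q @ d.T                      # (num_q_tokens, num_d_tokens) cosines
    return float(sims.max(axis=1).sum())

q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.7, 0.7]])
maxsim(q, d)  # 1.0 (exact match) + ~0.707 (best partial match) ≈ 1.7071
```

Note that because each query token picks its own best document token independently, a document can score well even when the matching tokens are scattered across it.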
Inserting per-token embeddings
Python
# token_vec_1, token_vec_2, ... are float vectors (one per token)
db.upsert_multi_vectors(
    "doc1",
    dense_vector,                                  # also store a pooled vector
    {"colbert": [token_vec_1, token_vec_2, ...]},  # per-token embeddings
    metadata={"source": "paper"},
)
Node.js
db.upsertMultiVectors(
  'doc1',
  denseVector,
  { colbert: [tokenVec1, tokenVec2 /* ... */] },
  { source: 'paper' }
)
The pooled dense_vector lets you mix multi-vector search with regular dense / hybrid search on the same record.
Searching with MaxSim
Python
results = db.search_multi_vector(
    "colbert",            # space name
    query_token_vectors,  # list of float vectors
    k=10,
    filter={"source": "paper"},
)
Node.js
const results = db.searchMultiVector('colbert', queryTokenVectors, {
  k: 10,
  filter: { source: 'paper' },
})
2-bit quantization (~16x compression)
ColBERTv2-style 2-bit quantization compresses each dimension to 2 bits using per-dimension quartile boundaries. Quantized search uses the standard 2-stage pipeline: fast approximate MaxSim picks candidates, then exact float32 rescores the top hits.
Parameters persist in a .vdb.mvquant.<space> sidecar file per space.
Python
db.enable_multi_vector_quantization("colbert")
print(db.is_multi_vector_quantized("colbert"))  # True
db.disable_multi_vector_quantization("colbert")
Node.js
db.enableMultiVectorQuantization('colbert')
console.log(db.isMultiVectorQuantized('colbert'))
db.disableMultiVectorQuantization('colbert')
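The per-dimension quartile idea can be sketched in NumPy. This is an illustration of the ColBERTv2-style scheme, not vectlite's actual code layout or sidecar format:

```python
import numpy as np

def fit_quartiles(token_vectors: np.ndarray) -> np.ndarray:
    """Per-dimension boundaries at the 25th/50th/75th percentiles -> (3, dim).
    These split each dimension's value range into four buckets (2 bits)."""
    return np.percentile(token_vectors, [25, 50, 75], axis=0)

def quantize_2bit(token_vectors: np.ndarray, bounds: np.ndarray) -> np.ndarray:
    """Assign each value a code in {0, 1, 2, 3}: the number of quartile
    boundaries it exceeds in its own dimension."""
    codes = (token_vectors[None, :, :] > bounds[:, None, :]).sum(axis=0)
    return codes.astype(np.uint8)

rng = np.random.default_rng(0)
vecs = rng.standard_normal((1000, 128)).astype(np.float32)
bounds = fit_quartiles(vecs)   # fit on the corpus, persisted per space
codes = quantize_2bit(vecs, bounds)
# Four 2-bit codes pack into one byte, hence the ~16x savings over float32.
```

Approximate MaxSim then runs on values reconstructed from per-bucket centers, with exact float32 rescoring reserved for the top candidates.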
Storage cost rule of thumb
A ColBERT document with 200 tokens × 128 dims × 4 bytes (float32) = 102,400 bytes ≈ 102 KB per doc. With 2-bit quantization: ~6.4 KB per doc.
For 1M documents: 102 GB → 6.4 GB. Quantization is essentially mandatory above the 100k-doc range.
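The arithmetic behind that rule of thumb, as a quick sanity check:

```python
tokens, dim = 200, 128

float32_bytes = tokens * dim * 4      # 4 bytes per float32 value
quant_bytes = tokens * dim * 2 // 8   # 2 bits per value, packed 4 per byte

print(float32_bytes)                  # 102400 -> ~102 KB per doc
print(quant_bytes)                    # 6400   -> 6.4 KB per doc
print(float32_bytes / quant_bytes)    # 16.0x compression
```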
Combining with other features
- Filters are applied before MaxSim scoring (use payload indexes to keep the candidate set small).
- Namespaces scope multi-vector search like any other search.
- Multiple spaces per record — store both ColBERT and ColPali on the same record and query whichever fits the modality.
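To illustrate why filter-before-scoring matters: the metadata filter narrows the candidate set before any token-level math runs, so MaxSim cost is paid only on matching records. A pure-Python sketch with a hypothetical in-memory store, not vectlite internals:

```python
import numpy as np

# Hypothetical toy store: per-record metadata plus per-token embeddings
docs = {
    "doc1": {"meta": {"source": "paper"}, "tokens": np.eye(2)},
    "doc2": {"meta": {"source": "blog"},  "tokens": np.eye(2)[::-1]},
}

def maxsim(q, d):
    qn = q / np.linalg.norm(q, axis=1, keepdims=True)
    dn = d / np.linalg.norm(d, axis=1, keepdims=True)
    return float((qn @ dn.T).max(axis=1).sum())

def search(query_tokens, flt, k=10):
    # Filter first: MaxSim only runs on records matching the metadata filter
    cands = {i: r for i, r in docs.items()
             if all(r["meta"].get(f) == v for f, v in flt.items())}
    scored = sorted(((maxsim(query_tokens, r["tokens"]), i)
                     for i, r in cands.items()), reverse=True)
    return [i for _, i in scored[:k]]

search(np.eye(2), {"source": "paper"})  # only doc1 is scored and returned
```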