Multi-Vector / ColBERT Search

Store multiple token-level embeddings per record and search with MaxSim late interaction scoring. This is how ColBERT, ColBERTv2, and ColPali achieve their retrieval quality — they capture token-level semantics that a single pooled vector throws away.

vectlite supports any number of named multi-vector spaces per record (e.g. "colbert", "colpali"), with optional 2-bit quantization for ~16x memory compression.

How MaxSim works

For each query token, find the maximum cosine similarity against all document tokens, then sum across query tokens:

score(q, d) = Σ_qi max_dj cos(qi, dj)

MaxSim is more expressive than a single pooled dot product because it never collapses the document into one vector, but it is also more expensive to compute. Use 2-bit quantization to keep it tractable on large corpora.
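To make the scoring rule concrete, here is a minimal NumPy sketch of MaxSim over raw token matrices. This is an illustration of the formula, not vectlite's internal implementation:

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """For each query token, take the max cosine similarity over all
    document tokens, then sum across query tokens."""
    # Normalize rows so the dot product equals cosine similarity.
    q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
    d = doc_tokens / np.linalg.norm(doc_tokens, axis=1, keepdims=True)
    sim = q @ d.T  # (num_query_tokens, num_doc_tokens) similarity matrix
    return float(sim.max(axis=1).sum())

q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.7, 0.7]])
score = maxsim(q, d)  # each query token matches its best doc token
```

Note that the second query token has no exact match in the document, so it contributes its best partial match (cos ≈ 0.707) rather than being lost in a pooled average.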

Inserting per-token embeddings

Python

# token_vec_1, token_vec_2, ... are float vectors (one per token)
db.upsert_multi_vectors(
    "doc1",
    dense_vector,                                  # also store a pooled vector
    {"colbert": [token_vec_1, token_vec_2, ...]},  # per-token embeddings
    metadata={"source": "paper"},
)

Node.js

db.upsertMultiVectors(
  'doc1',
  denseVector,
  { colbert: [tokenVec1, tokenVec2 /* ... */] },
  { source: 'paper' }
)

The pooled dense_vector lets you mix multi-vector search with regular dense / hybrid search on the same record.
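If your embedding model does not emit a pooled vector of its own, a common choice is to mean-pool the token embeddings before upserting. A sketch (the `mean_pool` helper is illustrative, not part of the vectlite API):

```python
import numpy as np

def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    """Average per-token embeddings into one pooled vector, normalized
    to unit length so it works with cosine-based dense search."""
    pooled = np.mean(np.asarray(token_vectors, dtype=np.float32), axis=0)
    return (pooled / np.linalg.norm(pooled)).tolist()

token_vectors = [[1.0, 0.0], [0.0, 1.0]]
dense_vector = mean_pool(token_vectors)  # unit-length pooled vector
```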

Searching with MaxSim

Python

results = db.search_multi_vector(
    "colbert",            # space name
    query_token_vectors,  # list of float vectors
    k=10,
    filter={"source": "paper"},
)

Node.js

const results = db.searchMultiVector('colbert', queryTokenVectors, {
  k: 10,
  filter: { source: 'paper' },
})

2-bit quantization (~16x compression)

ColBERTv2-style 2-bit quantization compresses each dimension to 2 bits using per-dimension quartile boundaries. Quantized search uses the standard 2-stage pipeline: fast approximate MaxSim picks candidates, then exact float32 rescores the top hits.

Parameters persist in a .vdb.mvquant.<space> sidecar file per space.

Python

db.enable_multi_vector_quantization("colbert")
print(db.is_multi_vector_quantized("colbert"))  # True
db.disable_multi_vector_quantization("colbert")

Node.js

db.enableMultiVectorQuantization('colbert')
console.log(db.isMultiVectorQuantized('colbert'))
db.disableMultiVectorQuantization('colbert')
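To illustrate the quartile idea, here is a sketch of the encoding step only: fit per-dimension quartile boundaries, then map each float to one of four buckets (2 bits). This shows the scheme's shape, not vectlite's actual codec:

```python
import numpy as np

def fit_quartiles(vectors: np.ndarray) -> np.ndarray:
    """Per-dimension 25th/50th/75th percentiles; each dimension is
    split into four buckets, addressable with 2 bits."""
    return np.percentile(vectors, [25, 50, 75], axis=0)  # shape (3, dim)

def quantize_2bit(vectors: np.ndarray, bounds: np.ndarray) -> np.ndarray:
    """Map each float to its bucket index in {0, 1, 2, 3} by counting
    how many quartile boundaries it exceeds."""
    return (vectors[None, :, :] > bounds[:, None, :]).sum(axis=0).astype(np.uint8)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((200, 128)).astype(np.float32)
bounds = fit_quartiles(tokens)
codes = quantize_2bit(tokens, bounds)
# Packed at 2 bits/dim, 128 dims take 32 bytes/token vs 512 bytes in float32.
```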

Storage cost rule of thumb

A ColBERT document with 200 tokens × 128 dim × float32 = 102 KB per doc. With 2-bit quantization: ~6.4 KB per doc.

For 1M documents: 102 GB → 6.4 GB. Quantization is essentially mandatory above the 100k-doc range.
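The arithmetic behind those numbers, in a few lines:

```python
# Back-of-envelope storage for one ColBERT document, matching the
# 200-token x 128-dim example above.
tokens, dim = 200, 128
float32_bytes = tokens * dim * 4     # 4 bytes per float32 -> 102,400 B (~102 KB)
quant_bytes = tokens * dim * 2 // 8  # 2 bits per dimension -> 6,400 B (~6.4 KB)
ratio = float32_bytes // quant_bytes # 16x compression
```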

Combining with other features

  • Filters are applied before MaxSim scoring (use payload indexes to keep the candidate set small).
  • Namespaces scope multi-vector search like any other search.
  • Multiple spaces per record — store both ColBERT and ColPali on the same record and query whichever fits the modality.