Payload Indexes

Payload indexes accelerate filtered queries 10-100x on large collections. Without an index, vectlite scans candidates and applies the filter at evaluation time. With one, the engine narrows the candidate set before full filter evaluation.

Indexes are automatically used by search(), count(), and list(). AND filters intersect index results; OR filters union when all sub-filters are indexed.
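
The combination rules above can be pictured with plain Python sets. This is a toy sketch, not vectlite's implementation: the `indexes` dict and the helpers `lookup`, `candidates_and`, and `candidates_or` are hypothetical names, and the handling of an AND filter that contains an unindexed clause is an assumption.

```python
from typing import List, Optional, Set, Tuple

# Toy model: each indexed field maps a value to the set of matching record ids.
# Illustration only; vectlite's internal layout may differ.
indexes = {
    "source": {"blog": {1, 2, 5}, "docs": {3, 4}},
    "category": {"news": {2, 3}, "howto": {1, 4, 5}},
}

Clause = Tuple[str, str]  # (field, value) equality clause


def lookup(field: str, value: str) -> Optional[Set[int]]:
    """Return candidate ids for field == value, or None if the field has no index."""
    if field not in indexes:
        return None
    return set(indexes[field].get(value, set()))


def candidates_and(clauses: List[Clause]) -> Optional[Set[int]]:
    """AND: intersect the results of the indexed clauses.

    Assumption: unindexed clauses are simply left to the full filter
    evaluation that runs on the narrowed candidate set.
    """
    sets = [s for s in (lookup(f, v) for f, v in clauses) if s is not None]
    if not sets:
        return None  # nothing indexed: fall back to scanning
    return set.intersection(*sets)


def candidates_or(clauses: List[Clause]) -> Optional[Set[int]]:
    """OR: a union is only complete if every clause is indexed."""
    sets = [lookup(f, v) for f, v in clauses]
    if any(s is None for s in sets):
        return None  # one unindexed clause could match any record
    return set().union(*sets)


both = candidates_and([("source", "blog"), ("category", "howto")])
either = candidates_or([("source", "docs"), ("category", "news")])
no_index = candidates_or([("source", "docs"), ("author", "sam")])
```

Returning `None` here stands for "the index cannot help, scan instead". The OR case shows why every sub-filter must be indexed: a single unindexed branch could match records that no index result contains.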

Index definitions persist across close/reopen in a .vdb.pidx sidecar file, which backup() includes. Only the definitions are stored; the index data itself is rebuilt from the records each time the database is opened.

Index types

Type      Optimizes                     Example filters
keyword   String equality, $in, $nin    {"source": "blog"}, {"category": {"$in": ["docs", "blog"]}}
numeric   Range queries                 {"score": {"$gte": 0.8}}, {"price": {"$lt": 100}}
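
To give a feel for why the two types suit different filters, here is a minimal sketch of the data structures commonly used for this kind of index: a value-to-ids map for keyword lookups and a sorted list searched with `bisect` for ranges. It is an assumption for illustration, not a description of vectlite's internals; `records`, `eq`, `is_in`, `gte`, and `lt` are hypothetical names.

```python
import bisect
from collections import defaultdict

# Hypothetical record metadata keyed by record id.
records = {
    1: {"source": "blog", "score": 0.91},
    2: {"source": "docs", "score": 0.42},
    3: {"source": "blog", "score": 0.80},
    4: {"source": "wiki", "score": 0.65},
}

# Keyword index: value -> set of record ids (fast equality and $in).
keyword = defaultdict(set)
for rid, meta in records.items():
    keyword[meta["source"]].add(rid)

# Numeric index: (value, id) pairs kept sorted (fast range lookups).
numeric = sorted((meta["score"], rid) for rid, meta in records.items())


def eq(value):
    """Ids where source == value."""
    return set(keyword.get(value, set()))


def is_in(values):
    """Ids where source is any of the given values ($in)."""
    return set().union(*(keyword.get(v, set()) for v in values))


def gte(threshold):
    """Ids where score >= threshold ($gte)."""
    # (threshold,) sorts before every (threshold, id) pair, so bisect_left
    # lands on the first entry whose value is >= threshold.
    start = bisect.bisect_left(numeric, (threshold,))
    return {rid for _, rid in numeric[start:]}


def lt(threshold):
    """Ids where score < threshold ($lt)."""
    end = bisect.bisect_left(numeric, (threshold,))
    return {rid for _, rid in numeric[:end]}
```

In this sketch an equality lookup is a single dictionary access, and a range lookup is a binary search followed by a slice, so neither has to touch records outside the result.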

Creating indexes

Python

db.create_index("source", "keyword")   # string equality / $in
db.create_index("score", "numeric") # range: $gt, $gte, $lt, $lte

# Filtered queries now use indexes automatically
count = db.count(filter={"source": "blog"})
results = db.search(query, k=10, filter={"score": {"$gte": 0.8}})

Node.js

db.createIndex('source', 'keyword')
db.createIndex('score', 'numeric')

const count = db.count({ filter: { source: 'blog' } })
const results = db.search(query, { k: 10, filter: { score: { $gte: 0.8 } } })

Inspecting and dropping

Python

print(db.list_indexes())     # [("source", "keyword"), ("score", "numeric")]
db.drop_index("score")

Node.js

console.log(db.listIndexes())  // [["source", "keyword"], ["score", "numeric"]]
db.dropIndex('score')

When to create an index

  • The field appears in filters on more than 5% of queries: the index amortizes its maintenance cost.
  • The collection has more than 10k records: small collections are already fast without an index.
  • High-cardinality keyword fields: keyword indexes thrive when the field has many distinct values (user IDs, tenant IDs, slugs).
  • Numeric range queries: numeric indexes are nearly always a win for range filters on large collections.

When not to bother

  • Tiny collections (<1k records)
  • Fields filtered rarely (one-off admin queries)
  • Boolean fields with only two values: a full scan costs about as much as maintaining an index
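
The rules of thumb in the two lists above can be folded into a small helper. This is only one possible reading of them: `should_index` and its parameters are hypothetical, the thresholds are the ones quoted above, and collections between 1k and 10k records are treated as "not yet worth it", which the guidance leaves open.

```python
def should_index(
    record_count: int,
    filter_share: float,
    distinct_values: int,
    is_range: bool = False,
) -> bool:
    """Rough heuristic for whether a payload index is likely to pay off.

    record_count    -- number of records in the collection
    filter_share    -- fraction of queries that filter on the field (0.0 to 1.0)
    distinct_values -- approximate number of distinct values in the field
    is_range        -- True if the field is used in numeric range filters
    """
    if record_count <= 10_000:
        return False  # small collections are already fast without an index
    if filter_share <= 0.05:
        return False  # rarely filtered: maintenance cost is not amortized
    if not is_range and distinct_values <= 2:
        return False  # boolean-like field: a scan costs about the same
    return True


tenant_id = should_index(500_000, 0.60, distinct_values=12_000)
is_public = should_index(500_000, 0.60, distinct_values=2)
admin_only = should_index(500_000, 0.01, distinct_values=12_000)
```

Treat the result as a starting point and measure on real queries before committing to an index.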

Maintenance behaviour

Indexes are incrementally maintained on upsert(), delete(), update_metadata(), and bulk_ingest(). Adding records never triggers a batch index rebuild; each write pays only a small overhead to update the affected index entries.
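
As an illustration of what incremental maintenance means, the sketch below keeps a toy keyword index in step with writes by touching only the entries for the record being changed. `ToyKeywordIndex` and its methods are hypothetical and stand in for the idea only; vectlite's actual bookkeeping is not shown here.

```python
from collections import defaultdict


class ToyKeywordIndex:
    """Toy keyword index updated one record at a time (illustration only)."""

    def __init__(self, field):
        self.field = field
        self.postings = defaultdict(set)  # value -> ids holding that value
        self.current = {}                 # id -> value currently indexed

    def upsert(self, rid, metadata):
        # Remove any stale entry for this id, then add the new one.
        self.delete(rid)
        value = metadata.get(self.field)
        if value is not None:
            self.postings[value].add(rid)
            self.current[rid] = value

    def delete(self, rid):
        old = self.current.pop(rid, None)
        if old is not None:
            self.postings[old].discard(rid)
            if not self.postings[old]:
                del self.postings[old]  # drop empty posting sets

    def lookup(self, value):
        return set(self.postings.get(value, set()))


idx = ToyKeywordIndex("source")
idx.upsert(1, {"source": "blog"})
idx.upsert(2, {"source": "blog"})
idx.upsert(1, {"source": "docs"})  # record 1 moves from "blog" to "docs"
```

Each write does a constant amount of index work regardless of collection size, which is the property the paragraph above describes.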