Distance Metrics

vectlite supports four distance metrics for dense vector search. The metric is selected at database creation time and persisted in the .vdb file. Cosine, euclidean, and dot product use SIMD acceleration via simsimd where available; manhattan falls back to a scalar implementation.

Scores are always oriented so higher is better. Distance metrics (euclidean, manhattan) are negated internally so you can sort and compare results the same way regardless of metric.
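To make the "higher is better" orientation concrete, here is a minimal pure-Python sketch of the four scores. It is an illustration only, not vectlite's implementation: the function name `score` is hypothetical, and it assumes the euclidean score is the negated plain (not squared) L2 distance.

```python
import math

def score(metric, a, b):
    """Return a score where a higher value always means a closer match."""
    if metric == "cosine":
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0
    if metric == "dotproduct":
        return sum(x * y for x, y in zip(a, b))
    if metric == "euclidean":
        # A distance, so negate it: 0.0 is a perfect match, more negative is farther.
        return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if metric == "manhattan":
        # Also a distance, negated for the same reason.
        return -sum(abs(x - y) for x, y in zip(a, b))
    raise ValueError(f"unknown metric: {metric}")

query = [1.0, 0.0]
near, far = [0.9, 0.1], [-1.0, 0.5]

# Whatever the metric, the closer vector gets the higher score,
# so results can always be sorted in descending order.
for m in ("cosine", "dotproduct", "euclidean", "manhattan"):
    assert score(m, query, near) > score(m, query, far)
```

Because of the negation, euclidean and manhattan scores are zero or negative; only their ordering is meaningful when comparing results.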

When to use which

| Metric | Best for | Notes |
| --- | --- | --- |
| cosine (default) | Most text & image embeddings | Magnitude-invariant. The safe default. |
| euclidean (l2) | Embeddings where magnitude matters | Some recommender / clustering models. |
| dotproduct (dot, ip) | Models trained with dot-product loss | OpenAI / Cohere often work well with cosine; check your model card. |
| manhattan (l1) | Sparse / count-based features | Rarely needed for modern embeddings. |
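The difference between cosine and dot product comes down to vector length. The sketch below uses made-up two-dimensional vectors and hypothetical helper names; it shows cosine ignoring magnitude, raw dot product rewarding it, and the two agreeing once the vectors are L2-normalized.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

query = [1.0, 1.0]
short_aligned = [0.5, 0.5]   # same direction as the query, small magnitude
long_offaxis = [10.0, 0.0]   # 45 degrees off the query, large magnitude

# Cosine ignores length: the aligned vector wins.
cosine_prefers_aligned = cosine(query, short_aligned) > cosine(query, long_offaxis)

# Raw dot product rewards length: the long vector wins.
dot_prefers_long = dot(query, long_offaxis) > dot(query, short_aligned)

# After L2-normalization, dot product ranks the same way as cosine.
q, s, l = (normalize(v) for v in (query, short_aligned, long_offaxis))
normalized_dot_prefers_aligned = dot(q, s) > dot(q, l)
```

If your embeddings are already unit-length, cosine and dot product produce the same ranking, which is why cosine is a safe default for most models.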

Creating a database with a specific metric

Python

import vectlite

# Default is cosine
db = vectlite.open("knowledge.vdb", dimension=384)

# Choose another metric at creation time. Each metric gets its own file here,
# because the metric argument is ignored when the file already exists.
db_l2 = vectlite.open("knowledge_l2.vdb", dimension=384, metric="euclidean")
db_dot = vectlite.open("knowledge_dot.vdb", dimension=384, metric="dotproduct")
db_l1 = vectlite.open("knowledge_l1.vdb", dimension=384, metric="manhattan")

# Aliases accepted: "l2", "dot", "ip", "inner_product", "dot_product", "l1"
print(db_l2.metric)  # "euclidean"

Node.js

const vectlite = require('vectlite')

const db = vectlite.open('knowledge.vdb', { dimension: 384, metric: 'euclidean' })
console.log(db.metric) // "euclidean"
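If you accept metric names from configuration, it can help to resolve aliases to the canonical name yourself before creating a database, so that later comparisons against db.metric are straightforward. The table below is a hypothetical helper built from the names listed on this page; it is not vectlite's internal representation, and it assumes names are matched case-sensitively.

```python
# Hypothetical lookup built from the metric names and aliases documented above.
CANONICAL_METRIC = {
    "cosine": "cosine",
    "euclidean": "euclidean",
    "l2": "euclidean",
    "dotproduct": "dotproduct",
    "dot": "dotproduct",
    "ip": "dotproduct",
    "inner_product": "dotproduct",
    "dot_product": "dotproduct",
    "manhattan": "manhattan",
    "l1": "manhattan",
}

def canonical_metric(name):
    """Map a metric name or alias to its canonical name, or raise ValueError."""
    try:
        return CANONICAL_METRIC[name]
    except KeyError:
        raise ValueError(f"unknown metric: {name!r}") from None
```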

Inspecting the active metric

The metric property reflects what was persisted in the file. Opening an existing database ignores the metric argument.

db = vectlite.open("knowledge.vdb")  # opens existing
print(db.metric)
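Because the metric argument is ignored for existing files, code that depends on a particular metric may want to check db.metric explicitly after opening. A minimal sketch follows; `require_metric` is a hypothetical helper, not part of vectlite, and it relies only on the metric property shown above.

```python
def require_metric(db, expected):
    """Return db unchanged, or raise if it does not use the expected metric."""
    actual = db.metric
    if actual != expected:
        raise RuntimeError(
            f"database uses metric {actual!r}, expected {expected!r}; "
            "the metric of an existing file cannot be changed in place"
        )
    return db

# Intended use (requires vectlite and an existing file):
#   db = require_metric(vectlite.open("knowledge.vdb"), "cosine")
```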

What changes under the hood

  • HNSW indexes use metric-specific distance functions (DistCosine, DistL2, DistDot, DistL1) from hnsw_rs
  • MMR diversification, multi-vector MaxSim scoring, and record similarity all go through DistanceMetric::score()
  • The byte that follows the dimension in the binary format stores the chosen metric, so reopening a file always restores the same metric
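Routing everything through one score function is what lets a feature like MMR work unchanged across metrics. The following is a textbook greedy MMR sketch in Python, not vectlite's implementation: the names `mmr`, `neg_l2`, and `lam` are hypothetical, and the only assumption taken from this page is that the score is oriented so higher means more similar.

```python
import math

def neg_l2(a, b):
    """Negated euclidean distance, so a higher value means closer."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mmr(query, candidates, k, score, lam=0.5):
    """Greedy MMR: return indices of k candidates, trading relevance to the
    query against similarity to the candidates already selected."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def gain(i):
            relevance = score(query, candidates[i])
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (score(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1.0 - lam) * redundancy
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected

query = [0.0, 0.0]
candidates = [[1.0, 0.0], [1.1, 0.0], [0.0, 2.0]]

diverse = mmr(query, candidates, 2, neg_l2, lam=0.5)   # skips the near-duplicate
greedy = mmr(query, candidates, 2, neg_l2, lam=1.0)    # pure relevance ranking
```

Swapping `neg_l2` for a cosine or dot-product function requires no change to `mmr`, because every score shares the same orientation.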

Migration

You cannot change the metric of an existing database in place. To switch:

  1. Export the records, either with db.dump() (CLI) or by iterating via list_cursor()
  2. Create a new database with the new metric
  3. bulk_ingest() the records back
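For large databases you may prefer to copy records in batches rather than hold everything in memory. The helper below is a generic sketch that does not call vectlite itself: `copy_in_batches` is a hypothetical name, `records` is any iterable of exported records, and `ingest` is any callable that accepts a list. Passing the new database's bulk_ingest as `ingest` assumes it takes a list of records in the shape you exported, which this page does not confirm.

```python
from itertools import islice

def copy_in_batches(records, ingest, batch_size=1000):
    """Feed records to ingest() in fixed-size batches; return the total copied."""
    it = iter(records)
    total = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return total
        ingest(batch)
        total += len(batch)

# Intended use (assumption: bulk_ingest accepts a list of exported records):
#   copy_in_batches(exported_records, new_db.bulk_ingest)
```

After the copy, checking the new database's metric property confirms the switch took effect.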