
Schema Validation

vectlite.schema adds optional typed schemas on top of metadata. A schema catches malformed writes before they reach the database, and it is persisted alongside the .vdb file so the contract travels with the data.

from vectlite import schema

Defining a schema

s = schema.Schema({
    "price": "number",
    "title": "string",
    "tags": "array<string>",
    "author": {  # nested object
        "name": "string",
        "age": "number",
    },
}, strict=True)  # strict=True rejects unknown top-level fields

Supported types

Type        Accepts
string      str
number      int or float
integer     int
boolean     bool
null        None
any         any value
array       any list
array<T>    list whose elements match T (e.g. array<string>)
object      any dict
{...}       nested object with typed fields
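
The type table can be read as a small recursive matching rule. The sketch below is not vectlite's implementation: matches is a hypothetical helper that mirrors the table in plain Python. Two details are assumptions of the sketch rather than documented behavior: bool is kept out of number and integer (Python treats bool as a subclass of int), and fields missing from a nested object are skipped rather than rejected.

```python
def matches(spec, value):
    """Return True if value fits spec, following the type table above.

    Hypothetical helper for illustration; not part of vectlite.
    """
    if isinstance(spec, dict):  # {...}: nested object with typed fields
        if not isinstance(value, dict):
            return False
        # Assumption: only fields that are present get checked.
        return all(
            matches(sub, value[key]) for key, sub in spec.items() if key in value
        )
    if spec == "any":
        return True
    if spec == "null":
        return value is None
    if spec == "boolean":
        return isinstance(value, bool)
    if spec == "string":
        return isinstance(value, str)
    # Assumption: bool is excluded from the numeric types.
    if spec == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if spec == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if spec == "array":
        return isinstance(value, list)
    if spec == "object":
        return isinstance(value, dict)
    if spec.startswith("array<") and spec.endswith(">"):
        inner = spec[len("array<"):-1]  # "array<string>" -> "string"
        return isinstance(value, list) and all(matches(inner, v) for v in value)
    raise ValueError(f"unknown type spec: {spec!r}")
```

Because the array<T> branch recurses on the inner spec, a nested form such as array<array<string>> falls out of the same rule in this sketch; whether vectlite accepts nested array specs is not stated here.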

Validating manually

s.validate({"price": 9.99, "title": "Hello"})  # OK
s.validate({"price": "free"}) # raises SchemaError

Auto-validating writes

Wrap a database to validate metadata on every write:

validated_db = schema.validated(db, s)
validated_db.upsert("doc1", vector, {"price": 9.99}) # OK
validated_db.upsert("doc2", vector, {"price": "free"}) # raises SchemaError

validated_db exposes the same surface as db; only metadata-bearing writes are intercepted.
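
One way to picture "same surface, writes intercepted" is a thin proxy that validates before delegating. The sketch below is an assumption about the general pattern, not vectlite's code: ValidatingProxy, InMemoryDB, and PriceSchema are all hypothetical stand-ins, and the toy schema raises ValueError instead of SchemaError.

```python
class ValidatingProxy:
    """Hypothetical stand-in for the object schema.validated(db, s) returns."""

    def __init__(self, db, schema):
        self._db = db
        self._schema = schema

    def upsert(self, key, vector, metadata=None):
        # Validate first so a bad record never reaches the database.
        if metadata is not None:
            self._schema.validate(metadata)
        return self._db.upsert(key, vector, metadata)

    def __getattr__(self, name):
        # Called only for attributes the proxy does not define itself,
        # so every other method passes straight through to the database.
        return getattr(self._db, name)


class InMemoryDB:
    """Toy database used only to exercise the proxy."""

    def __init__(self):
        self.rows = {}

    def upsert(self, key, vector, metadata=None):
        self.rows[key] = (vector, metadata)

    def count(self):
        return len(self.rows)


class PriceSchema:
    """Toy schema: 'price', if present, must be an int or float."""

    def validate(self, metadata):
        if "price" in metadata and not isinstance(metadata["price"], (int, float)):
            raise ValueError("price must be a number")


db = InMemoryDB()
validated_db = ValidatingProxy(db, PriceSchema())
validated_db.upsert("doc1", [0.1, 0.2], {"price": 9.99})  # stored
try:
    validated_db.upsert("doc2", [0.3, 0.4], {"price": "free"})
except ValueError:
    pass  # rejected before the write
print(validated_db.count())  # delegated to InMemoryDB.count; prints 1
```

The design choice worth noting is that validation happens before delegation, so a rejected record leaves the underlying store untouched.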

Persistence

Schemas live in a .vdb.schema.json sidecar next to the database file:

s.save(db)                  # write the sidecar
loaded = schema.load(db) # read it back

backup() includes the sidecar in its dump; restore() brings it back.
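
To make the sidecar idea concrete, here is a minimal sketch of a JSON round-trip next to a database file. Everything in it is an assumption for illustration: the helper names, the choice to take a file path rather than a db object, the naming rule (database file name plus ".schema.json"), and the JSON layout. The actual file vectlite writes may differ.

```python
import json
import tempfile
from pathlib import Path


def sidecar_path(db_path):
    """Append '.schema.json' to the database file name (assumed convention)."""
    db_path = Path(db_path)
    return db_path.with_name(db_path.name + ".schema.json")


def save_schema(db_path, fields, strict=False):
    """Write the schema definition next to the database file."""
    payload = {"fields": fields, "strict": strict}  # assumed layout
    sidecar_path(db_path).write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_schema(db_path):
    """Read the definition back as a plain dict."""
    return json.loads(sidecar_path(db_path).read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    db_file = Path(tmp) / "products.vdb"
    save_schema(db_file, {"title": "string", "tags": "array<string>"}, strict=True)
    sidecar_name = sidecar_path(db_file).name  # "products.vdb.schema.json"
    round_tripped = load_schema(db_file)
```

Keeping the definition in a plain file beside the data is what makes the contract portable: copying both files moves the schema with the database.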

Strict mode

strict=True rejects any top-level field not declared in the schema:

s = schema.Schema({"title": "string"}, strict=True)
s.validate({"title": "ok", "extra": True}) # raises SchemaError: unknown field 'extra'

Without strict, extra fields pass through and the database stores them as-is.
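
The strict check amounts to a set difference over top-level keys. The sketch below uses a hypothetical helper, unknown_fields, to show which keys such a check would flag; it is not vectlite's implementation. It follows the wording above in looking only at the top level, so an undeclared key inside a nested object is not reported here.

```python
def unknown_fields(declared, metadata):
    """Return the top-level keys of metadata that the schema does not declare."""
    return sorted(set(metadata) - set(declared))


declared = {"title": "string", "author": {"name": "string"}}
record = {
    "title": "ok",
    "extra": True,
    "author": {"name": "Ada", "nickname": "A"},  # nested extra key, not inspected
}
unknown = unknown_fields(declared, record)  # ["extra"]
```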

Errors

SchemaError is a subclass of VectLiteError, so catching VectLiteError covers schema failures too.

from vectlite import schema, VectLiteError

try:
    validated_db.upsert("doc1", vector, {"price": "free"})
except schema.SchemaError as e:
    # Most specific handler first.
    print(f"Schema violation: {e}")
except VectLiteError as e:
    # Any other vectlite failure.
    print(f"vectlite error: {e}")
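
The order of the except clauses matters because of the subclass relationship. The sketch below uses stand-in classes with the same names (defined locally, not imported from vectlite) to show that Python picks the first matching clause, so putting the base class first hides the more specific handler.

```python
class VectLiteError(Exception):
    """Stand-in for vectlite's base exception."""


class SchemaError(VectLiteError):
    """Stand-in for schema.SchemaError."""


def classify(exc):
    """Specific handler first: schema failures get their own branch."""
    try:
        raise exc
    except SchemaError:
        return "schema"
    except VectLiteError:
        return "vectlite"


def classify_broad_first(exc):
    """Base class first: the SchemaError clause can never run."""
    try:
        raise exc
    except VectLiteError:
        return "vectlite"
    except SchemaError:  # unreachable, the clause above already matched
        return "schema"
```

With these stand-ins, classify(SchemaError("bad")) returns "schema", while classify_broad_first(SchemaError("bad")) returns "vectlite".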

When to use schemas

  • Multi-writer setups — different services writing to the same database. The schema is the shared contract.
  • Long-lived datasets — catches drift introduced by code changes months later.
  • User-facing APIs — surface a clean error before the bad data lands on disk.

If you control all writes in one process and trust your own code, schemas are optional.