Schema Validation
`vectlite.schema` adds optional typed schemas on top of metadata. A schema catches malformed metadata before it is written to the database. The schema is persisted alongside the `.vdb` file, so the contract travels with the data.
```python
from vectlite import schema
```
Defining a schema
```python
s = schema.Schema({
    "price": "number",
    "title": "string",
    "tags": "array<string>",
    "author": {  # nested object
        "name": "string",
        "age": "number",
    },
}, strict=True)  # strict=True rejects unknown top-level fields
```
Supported types
| Type | Accepts |
|---|---|
| `string` | `str` |
| `number` | `int` or `float` |
| `integer` | `int` |
| `boolean` | `bool` |
| `null` | `None` |
| `any` | any value |
| `array` | any list |
| `array<T>` | list whose elements all match `T` (e.g. `array<string>`) |
| `object` | any dict |
| `{...}` | nested object with typed fields |
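To make the table concrete, here is a minimal pure-Python sketch of how these type strings could be checked. It is not vectlite's implementation, and `matches` and `SIMPLE_CHECKS` are hypothetical names. The sketch makes two assumptions. First, declared fields that are missing from a value are treated as optional, inferred from the `validate` example below, where a document without `tags` or `author` passes. Second, `bool` is rejected as a `number` or `integer`, even though Python makes `bool` a subclass of `int`; whether vectlite does the same is not stated.

```python
from typing import Any, Callable, Dict

# Hypothetical checks for the scalar and untyped container types in the table.
# bool is excluded from number/integer by assumption (bool subclasses int in Python).
SIMPLE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
    "any": lambda v: True,
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}


def matches(value: Any, spec: Any) -> bool:
    """Return True if value conforms to spec (a type string or a dict of fields)."""
    if isinstance(spec, dict):
        # {...}: nested object; each declared field that is present must match.
        return isinstance(value, dict) and all(
            matches(value[name], sub) for name, sub in spec.items() if name in value
        )
    if spec.startswith("array<") and spec.endswith(">"):
        # array<T>: a list whose elements all match the inner type T.
        inner = spec[len("array<"):-1]
        return isinstance(value, list) and all(matches(item, inner) for item in value)
    try:
        check = SIMPLE_CHECKS[spec]
    except KeyError:
        raise ValueError(f"unknown type: {spec!r}") from None
    return check(value)


author = {"name": "string", "age": "number"}
print(matches(["a", "b"], "array<string>"))         # True
print(matches(["a", 1], "array<string>"))           # False
print(matches({"name": "Ada", "age": 36}, author))  # True
print(matches(True, "number"))                      # False in this sketch
```

The recursion handles both `array<T>` and `{...}` the same way the schema definition nests them. Nested forms such as `array<array<string>>` are therefore covered with no extra code.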
Validating manually
```python
s.validate({"price": 9.99, "title": "Hello"})  # OK
s.validate({"price": "free"})  # raises SchemaError
```
Auto-validating writes
Wrap a database to validate metadata on every write:
```python
# db is an open vectlite database; vector is the embedding to store
validated_db = schema.validated(db, s)
validated_db.upsert("doc1", vector, {"price": 9.99})  # OK
validated_db.upsert("doc2", vector, {"price": "free"})  # raises SchemaError
```
`validated_db` exposes the same methods as `db`. Only writes that carry metadata are validated; every other call passes straight through to `db`.
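The wrapper pattern can be sketched as a small proxy. Everything here is hypothetical:

- `InMemoryDB` stands in for a vectlite database.
- `ValidatedDB` stands in for whatever `schema.validated` returns.
- `price_is_number` stands in for a real schema's validator.

Only `upsert` is intercepted. Other attribute lookups fall through `__getattr__` to the wrapped object, which is one way to expose the same surface as `db`.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class SchemaError(Exception):
    """Hypothetical stand-in for vectlite's SchemaError."""


class InMemoryDB:
    """Hypothetical stand-in for an open vectlite database."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[list, Optional[dict]]] = {}

    def upsert(self, key: str, vector: list, metadata: Optional[dict] = None) -> None:
        self.rows[key] = (vector, metadata)

    def count(self) -> int:
        return len(self.rows)


class ValidatedDB:
    """Validates metadata on upsert; every other attribute is forwarded to the wrapped db."""

    def __init__(self, db: Any, validate: Callable[[dict], None]) -> None:
        self._db = db
        self._validate = validate

    def upsert(self, key: str, vector: list, metadata: Optional[dict] = None) -> None:
        if metadata is not None:
            self._validate(metadata)  # raises before anything reaches the database
        self._db.upsert(key, vector, metadata)

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails, i.e. for methods the proxy does not define.
        return getattr(self._db, name)


def price_is_number(metadata: dict) -> None:
    """Toy validator: 'price', if present, must be an int or float (bool excluded)."""
    if "price" in metadata and type(metadata["price"]) not in (int, float):
        raise SchemaError(f"field 'price': expected number, got {type(metadata['price']).__name__}")


db = InMemoryDB()
validated_db = ValidatedDB(db, price_is_number)
validated_db.upsert("doc1", [0.1, 0.2], {"price": 9.99})
try:
    validated_db.upsert("doc2", [0.3, 0.4], {"price": "free"})
except SchemaError as e:
    print(e)  # field 'price': expected number, got str
print(validated_db.count())  # 1 (forwarded to db.count(); doc2 was never written)
```

A real wrapper would need to intercept every metadata-bearing write method, not just `upsert`. Which methods those are in vectlite is not covered here.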
Persistence
Schemas live in a .vdb.schema.json sidecar next to the database file:
```python
s.save(db)  # write the sidecar
loaded = schema.load(db)  # read it back
```
`backup()` includes the sidecar in its dump, and `restore()` restores it along with the database.
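A schema definition is just strings and nested dicts, so it serializes to JSON without any custom encoding. The sketch below shows one way to derive a sidecar path and round-trip a schema through it. The function names and the `{"fields": ..., "strict": ...}` layout are assumptions for illustration, not vectlite's on-disk format.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def sidecar_path(db_path: str) -> Path:
    """Append '.schema.json' to the database file name: docs.vdb -> docs.vdb.schema.json."""
    p = Path(db_path)
    return p.with_name(p.name + ".schema.json")


def save_schema(db_path: str, fields: Dict[str, Any], strict: bool) -> Path:
    """Write the schema next to the database file (hypothetical layout) and return the path."""
    path = sidecar_path(db_path)
    path.write_text(json.dumps({"fields": fields, "strict": strict}, indent=2))
    return path


def load_schema(db_path: str) -> Dict[str, Any]:
    """Read the sidecar back into plain dicts."""
    return json.loads(sidecar_path(db_path).read_text())


with tempfile.TemporaryDirectory() as tmp:
    db_path = str(Path(tmp) / "docs.vdb")
    written = save_schema(db_path, {"price": "number", "tags": "array<string>"}, strict=True)
    print(written.name)  # docs.vdb.schema.json
    print(load_schema(db_path)["fields"]["tags"])  # array<string>
```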
Strict mode
strict=True rejects any top-level field not declared in the schema:
```python
s = schema.Schema({"title": "string"}, strict=True)
s.validate({"title": "ok", "extra": True})  # raises SchemaError: unknown field 'extra'
```
Without `strict=True`, undeclared fields pass validation and the database stores them as-is.
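Strict mode reduces to a set difference between the document's top-level keys and the declared ones. This sketch uses a hypothetical `check_strict` and a local `SchemaError` stand-in, and it checks only the top level, as the text above describes. Whether vectlite also enforces strictness inside nested objects is not stated, so the sketch leaves nested keys alone.

```python
from typing import Any, Dict, List


class SchemaError(Exception):
    """Hypothetical stand-in for vectlite's SchemaError."""


def unknown_fields(doc: Dict[str, Any], declared: Dict[str, Any]) -> List[str]:
    # Only top-level keys are compared; nested objects are not inspected.
    return sorted(key for key in doc if key not in declared)


def check_strict(doc: Dict[str, Any], declared: Dict[str, Any], strict: bool) -> Dict[str, Any]:
    extras = unknown_fields(doc, declared)
    if strict and extras:
        raise SchemaError("unknown field " + ", ".join(repr(k) for k in extras))
    return doc  # without strict, undeclared fields pass through unchanged


declared = {"title": "string", "author": {"name": "string"}}
print(check_strict({"title": "ok", "extra": True}, declared, strict=False))
# {'title': 'ok', 'extra': True}
try:
    check_strict({"title": "ok", "extra": True}, declared, strict=True)
except SchemaError as e:
    print(e)  # unknown field 'extra'
# Nested keys are not checked here, so this passes even with strict=True.
check_strict({"title": "ok", "author": {"nickname": "x"}}, declared, strict=True)
```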
Errors
SchemaError is a subclass of VectLiteError, so catching VectLiteError covers schema failures too.
```python
from vectlite import schema, VectLiteError

try:
    validated_db.upsert("doc1", vector, {"price": "free"})
except schema.SchemaError as e:
    # SchemaError is the more specific class, so it is caught first
    print(f"Schema violation: {e}")
except VectLiteError as e:
    print(f"vectlite error: {e}")
```
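Handler order matters because `except` clauses are tried top to bottom, and a base-class handler also matches its subclasses. The sketch uses local stand-in classes with the same relationship the docs describe (`SchemaError` subclassing `VectLiteError`). They are hypothetical definitions, not imports from vectlite.

```python
class VectLiteError(Exception):
    """Stand-in for vectlite's base error (hypothetical definition)."""


class SchemaError(VectLiteError):
    """Stand-in: schema failures subclass the base error, as the docs state."""


def classify(exc: Exception) -> str:
    try:
        raise exc
    except SchemaError:
        return "schema"    # most specific handler first
    except VectLiteError:
        return "vectlite"  # any other library error


def classify_wrong_order(exc: Exception) -> str:
    try:
        raise exc
    except VectLiteError:
        return "vectlite"  # also matches SchemaError...
    except SchemaError:
        return "schema"    # ...so this branch is never reached


print(classify(SchemaError("bad price")))              # schema
print(classify(VectLiteError("other failure")))        # vectlite
print(classify_wrong_order(SchemaError("bad price")))  # vectlite
```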
When to use schemas
- Multi-writer setups: different services write to the same database, and the schema is their shared contract.
- Long-lived datasets: the schema catches drift introduced by code changes months later.
- User-facing APIs: callers get a clean error before bad data lands on disk.
If you control all writes in one process and trust your own code, schemas are optional.