Accessor Syntax
How VectorDBSchema maps fields from your database
Every vector-database connector in Vecta requires a VectorDBSchema that describes how to extract fields from the raw records your database returns. This page covers the accessor path syntax.
VectorDBSchema Fields
from vecta import VectorDBSchema

schema = VectorDBSchema(
    id_accessor="id",                              # required
    content_accessor="content",                    # required
    metadata_accessor="metadata",                  # optional
    source_path_accessor="metadata.source_path",   # required
    page_nums_accessor="metadata.page_nums",       # optional
    source_path_default="unknown",                 # fallback value
    page_nums_default=None,                        # fallback value
)
| Field | Required | Description |
|---|---|---|
| id_accessor | ✅ | Path to the unique chunk identifier |
| content_accessor | ✅ | Path to the text content |
| metadata_accessor | ❌ | Path to a metadata object |
| source_path_accessor | ✅ | Path to the document/file name |
| page_nums_accessor | ❌ | Path to page numbers |
| source_path_default | ❌ | Fallback when source path extraction fails (default: "unknown") |
| page_nums_default | ❌ | Fallback page numbers (default: None) |
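The fallback behavior in the table can be pictured with a small standalone sketch. It is not Vecta's implementation: `extract_with_default` is a hypothetical helper, and it assumes that "extraction fails" means a key or index along the path is missing.

```python
def extract_with_default(record, keys, default):
    """Hypothetical helper: walk `keys` through nested containers,
    returning `default` as soon as a lookup fails."""
    current = record
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# A record whose metadata has neither a source path nor page numbers.
record = {"id": "chunk-1", "content": "some text", "metadata": {}}

source_path = extract_with_default(record, ["metadata", "source_path"], "unknown")
page_nums = extract_with_default(record, ["metadata", "page_nums"], None)

print(source_path)  # unknown
print(page_nums)    # None
```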
Path Syntax
Direct field access
id_accessor="id"
Extracts data["id"] from a dict, or data.id from an object.
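As an illustration of the dict-or-object behavior described above, here is a minimal sketch. `get_field` is a hypothetical helper written for this example, not a Vecta function.

```python
from types import SimpleNamespace


def get_field(record, name):
    """Hypothetical helper: read `name` as a dict key or an object attribute."""
    if isinstance(record, dict):
        return record[name]
    return getattr(record, name)


dict_record = {"id": "chunk-1", "content": "hello"}
object_record = SimpleNamespace(id="chunk-2", content="world")

print(get_field(dict_record, "id"))    # chunk-1
print(get_field(object_record, "id"))  # chunk-2
```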
Dot-prefixed property access
id_accessor=".id"
Same as direct access but explicitly uses property/key lookup. Useful for connectors (like Pinecone or Weaviate) where the ID is an attribute rather than a dict key.
Index access
id_accessor="[0]"
content_accessor="[1]"
metadata_accessor="[2]"
For databases that return tuples or lists (e.g., Databricks).
Nested field access
source_path_accessor="metadata.source_path"
page_nums_accessor="properties.metadata.page_nums"
Traverses nested dicts or objects using dot notation.
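Dot-notation traversal can be thought of as one lookup per path segment. The sketch below assumes that model. `get_nested` and the sample record are hypothetical and only illustrate the idea.

```python
from functools import reduce


def get_nested(record, path):
    """Hypothetical helper: follow a dot-separated path through dicts or objects."""
    def step(current, name):
        if isinstance(current, dict):
            return current[name]
        return getattr(current, name)

    return reduce(step, path.split("."), record)


record = {
    "properties": {
        "metadata": {"source_path": "guide.pdf", "page_nums": [1, 2]},
    }
}

print(get_nested(record, "properties.metadata.source_path"))  # guide.pdf
print(get_nested(record, "properties.metadata.page_nums"))    # [1, 2]
```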
Mixed access
source_path_accessor="[2].source_path"
content_accessor="properties.content"
Combine index and field access in a single path.
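To make the mixed form concrete, the following sketch splits a path into index and field segments and applies them in order. The tokenizer, the `resolve` function, and the sample row are assumptions made for illustration. Vecta's actual parser may differ.

```python
import re

# A segment is either "[<int>]" or a bare field name; dots only separate segments.
_SEGMENT = re.compile(r"\[(\d+)\]|([^.\[\]]+)")


def resolve(record, path):
    """Hypothetical resolver for paths such as "[2].source_path"."""
    current = record
    for index, name in _SEGMENT.findall(path):
        if index:
            current = current[int(index)]       # index access on a tuple or list
        elif isinstance(current, dict):
            current = current[name]             # key access on a dict
        else:
            current = getattr(current, name)    # attribute access on an object
    return current


# A tuple-style row: (id, content, metadata).
row = ("chunk-7", "Some text", {"source_path": "report.pdf", "page_nums": [3]})

print(resolve(row, "[0]"))              # chunk-7
print(resolve(row, "[2].source_path"))  # report.pdf
print(resolve(row, "[2].page_nums"))    # [3]
```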
JSON parsing
When a field contains a JSON string that needs to be parsed before further access:
source_path_accessor="json(metadata.provenance_json).doc_name"
This first extracts metadata.provenance_json, parses the string as JSON, then accesses .doc_name on the parsed result.
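Written out by hand with Python's standard `json` module, the three steps look roughly like this. The sample record is made up for the example.

```python
import json

# metadata.provenance_json holds a JSON *string*, not a dict.
record = {
    "metadata": {
        "provenance_json": '{"doc_name": "handbook.pdf", "version": 2}',
    }
}

raw = record["metadata"]["provenance_json"]  # step 1: extract the string
parsed = json.loads(raw)                     # step 2: parse it as JSON
doc_name = parsed["doc_name"]                # step 3: access .doc_name

print(doc_name)  # handbook.pdf
```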
Nested JSON parsing
For doubly-serialized JSON:
source_path_accessor="json(json(metadata.provenance_json).inner_json).doc_name"
This parses metadata.provenance_json as JSON, extracts .inner_json, parses that as JSON, then accesses .doc_name.
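Doubly-serialized data usually arises when an already-serialized string is embedded in another object that is serialized again. The sketch below builds such a record with `json.dumps` and then unwraps it in the same order as the accessor. The field values are invented for illustration.

```python
import json

# Build a doubly-serialized value: the inner JSON string is embedded
# in an outer object, which is then serialized a second time.
inner = json.dumps({"doc_name": "handbook.pdf"})
outer = json.dumps({"inner_json": inner})
record = {"metadata": {"provenance_json": outer}}

# Unwrap in the order the accessor describes.
first = json.loads(record["metadata"]["provenance_json"])  # outer json(...)
second = json.loads(first["inner_json"])                   # inner json(...)
doc_name = second["doc_name"]

print(doc_name)  # handbook.pdf
```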
Default Schemas Per Database
Below are the default schemas Vecta uses for each supported database. You can customize any of these.
ChromaDB
VectorDBSchema(
    id_accessor="id",
    content_accessor="document",
    metadata_accessor="metadata",
    source_path_accessor="metadata.source_path",
    page_nums_accessor="metadata.page_nums",
)
Pinecone
VectorDBSchema(
    id_accessor=".id",
    content_accessor="metadata.content",
    metadata_accessor="metadata",
    source_path_accessor="metadata.source_path",
    page_nums_accessor="metadata.page_nums",
)
pgvector
VectorDBSchema(
    id_accessor="id",
    content_accessor="content",
    metadata_accessor="metadata",
    source_path_accessor="metadata.source_path",
    page_nums_accessor="metadata.page_nums",
)
Weaviate
VectorDBSchema(
    id_accessor=".uuid",
    content_accessor="properties.content",
    metadata_accessor="properties.metadata",
    source_path_accessor="properties.metadata.source_path",
    page_nums_accessor="properties.metadata.page_nums",
)
Databricks
VectorDBSchema(
    id_accessor="[0]",
    content_accessor="[1]",
    metadata_accessor="[2]",
    source_path_accessor="[2].source_path",
    page_nums_accessor="[2].page_nums",
)
Azure Cosmos DB
VectorDBSchema(
    id_accessor="id",
    content_accessor="content",
    metadata_accessor="metadata",
    source_path_accessor="metadata.source_path",
    page_nums_accessor="metadata.page_nums",
)
Tips
- Inspect your data first. Log or print one raw record from your database to see the exact field names and structure before writing your schema.
- Test incrementally. Start with just id_accessor and content_accessor, verify they work, then add source_path_accessor.
- Use json() sparingly. Only use JSON parsing when a field is actually stored as a serialized JSON string. If it's already a dict/object, regular dot notation works.
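One way to act on the first and last tips is to print the type of each field in a sample record and check which string values actually contain serialized JSON. The sketch below is a standalone aid, not part of Vecta. `needs_json_parsing` and the sample record are hypothetical.

```python
import json


def needs_json_parsing(value):
    """Return True if `value` is a string holding a JSON object or array."""
    if not isinstance(value, str):
        return False
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, (dict, list))


# Stand-in for one raw record fetched from your database.
sample = {
    "id": "chunk-1",
    "content": "some text",
    "metadata": {
        "source_path": "a.pdf",
        "provenance_json": '{"doc_name": "a.pdf"}',
    },
}

for key, value in sample["metadata"].items():
    print(key, type(value).__name__, needs_json_parsing(value))
```

Fields reported as `True` are candidates for a json(...) accessor. Plain strings and values that are already dicts are not.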