Accessor Syntax
How to map your database schema
Accessor Syntax Reference
Vecta uses a compact accessor language to translate arbitrary database responses into the normalized structure that powers evaluation, sampling, and analytics. This guide explains how the accessor system works, what syntaxes are supported, and how to validate your configuration with concrete examples.
How the accessor system works
When a connector returns raw records, Vecta passes the payload to your
configured vector database schema and extracts
the fields it needs to construct ChunkData objects. The logic lives in
DataAccessor.extract, which evaluates the accessor string you provide against
any combination of Python dictionaries, objects, lists, or JSON strings.
from vecta.core.schemas import DataAccessor, VectorDBSchema
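# This is the standard chunk structure Pinecone returns. Each result is an
# object, so its fields are read as attributes. The leading dots below are
# notation for attribute access, not valid Python:
# {
#   .id: "chunk_123",
#   .metadata: {
#     "content": "Text here...",
#     "source_path": "doc.pdf"
#   }
# }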
# Schema
schema = VectorDBSchema(
id_accessor=".id",
content_accessor=".metadata.content",
source_path_accessor=".metadata.source_path"
)
# If your vector database returns chunks as dictionaries like this:
{
"id": "chunk_123",
"content": "This is the text content of the chunk.",
"metadata": {
"source_path": "/documents/doc1.pdf",
"page_nums": [1, 2, 3],
"author": "Jane Doe"
}
}
# Define a schema with accessors that match that structure
schema = VectorDBSchema(
id_accessor="id",
content_accessor="content",
source_path_accessor="metadata.source_path",
page_nums_accessor="metadata.page_nums",
)
# Some vector databases, like Chroma, do not allow list types in metadata.
# Vecta accessors are flexible enough to parse JSON before accessing fields.
{
"id": "chunk_123",
"content": "This is the text content of the chunk.",
"metadata": {
"source_path": "/documents/doc1.pdf",
"page_nums": "[1, 2, 3]", # JSON encoded list
"author": "Jane Doe"
}
}
# Define a schema with accessors that match that structure
schema = VectorDBSchema(
id_accessor="id",
content_accessor="content",
source_path_accessor="metadata.source_path",
page_nums_accessor="json(metadata.page_nums)", # Parse JSON
)
After querying your vector database, Vecta applies your schema to each result with DataAccessor.extract.
You can call the same function directly to test your accessors, where result is a single raw record returned by your database:
chunk_id = DataAccessor.extract(result, schema.id_accessor)
content = DataAccessor.extract(result, schema.content_accessor)
metadata = DataAccessor.extract(result, schema.metadata_accessor)
source_path = DataAccessor.extract(result, schema.source_path_accessor)
page_nums = DataAccessor.extract(result, schema.page_nums_accessor)
Accessors run from left to right. Each segment tells Vecta how to traverse the
structure—moving through dictionary keys, object attributes, array indices, or
nested JSON blobs. If any hop fails, extraction returns None so that
fallbacks (like source_path_default or page_nums_default) can be applied.
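The left-to-right traversal described above can be illustrated with a small stand-in. This is a minimal sketch, not Vecta's DataAccessor.extract: the names tokenize and extract_sketch are hypothetical, json(...) segments are left out, and the sketch picks between key and attribute lookup from the value's type rather than from a leading dot.

```python
import re
from types import SimpleNamespace
from typing import Any, List, Union

# Hypothetical tokenizer: "metadata.pages[0].label" -> ["metadata", "pages", 0, "label"].
_TOKEN = re.compile(r"\[(-?\d+)\]|([^.\[\]]+)")


def tokenize(accessor: str) -> List[Union[int, str]]:
    """Split an accessor into names (str) and list positions (int)."""
    return [int(index) if index else name for index, name in _TOKEN.findall(accessor)]


def extract_sketch(record: Any, accessor: str) -> Any:
    """Walk the record one hop at a time; return None as soon as a hop fails."""
    current = record
    for token in tokenize(accessor):
        if isinstance(token, int):
            # Index hop: only valid on lists and tuples, and only in range.
            if not isinstance(current, (list, tuple)):
                return None
            if not -len(current) <= token < len(current):
                return None
            current = current[token]
        elif isinstance(current, dict):
            # Name hop on a dictionary: key lookup.
            if token not in current:
                return None
            current = current[token]
        else:
            # Name hop on anything else: attribute lookup.
            current = getattr(current, token, None)
        if current is None:
            return None
    return current


# Sample record (made up for illustration): an object with a dict in .metadata.
record = SimpleNamespace(
    id="chunk_123",
    metadata={"content": "Text here...", "pages": [{"label": "Cover"}]},
)

chunk_id = extract_sketch(record, ".id")
label = extract_sketch(record, ".metadata.pages[0].label")
missing = extract_sketch(record, ".metadata.author.name")

# A failed hop yields None, so a default can take over.
source_path = extract_sketch(record, ".metadata.source_path") or "unknown.pdf"
```

In this sketch the last line plays the role that source_path_default plays in a real schema: the accessor resolves to nothing, so the fallback value is used.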
Supported syntax building blocks
| Pattern | Description | Example | Notes |
|---|---|---|---|
| field | Access a top-level dictionary key | "document" | Use for direct column names or JSON keys |
| .property | Access an attribute on an object | ".id" | Useful when the raw result exposes objects with attributes |
| field.nested | Chain through nested keys | "metadata.source_path" | Each . after the first is treated as a dictionary lookup |
| [index] | Select a position within a list/tuple | "[0]" | Indices are zero-based |
| [index].field | Combine list indexing with further access | "[2].source_path" | Works with tuples returned by SQL drivers |
| json(field) | Parse a field as JSON before continuing | "json(metadata_blob).doc_name" | Handles JSON stored as strings |
| json(json(field).nested).final | Parse nested JSON multiple times | "json(json(metadata.provenance).inner).page" | Use when JSON fields contain additional encoded JSON |
All patterns can be composed. For example,
"json(metadata_blob).document.pages[0].label" will parse the metadata_blob
field as JSON, then drill into the resulting structure.
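To make the composed accessor concrete, the following spells out each hop by hand with Python's standard json module. The sample records are invented for illustration, and the code only mirrors what the accessor strings describe; it does not call Vecta.

```python
import json

# Invented record: metadata_blob holds a JSON document stored as a string.
record = {
    "metadata_blob": json.dumps(
        {"document": {"pages": [{"label": "Cover"}, {"label": "Contents"}]}}
    )
}

# "json(metadata_blob).document.pages[0].label", one hop at a time:
parsed = json.loads(record["metadata_blob"])  # json(metadata_blob)
pages = parsed["document"]["pages"]           # .document.pages
label = pages[0]["label"]                     # [0].label

# Invented record with JSON encoded twice, matching
# "json(json(metadata.provenance).inner).page".
inner = json.dumps({"page": 7})
nested_record = {"metadata": {"provenance": json.dumps({"inner": inner})}}

provenance = json.loads(nested_record["metadata"]["provenance"])  # json(metadata.provenance)
page = json.loads(provenance["inner"])["page"]                    # json(... .inner).page

print(label)  # Cover
print(page)   # 7
```

Working through a record by hand like this is a quick way to confirm which fields need a json(...) wrapper before you write the accessor.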
Core schema fields
Every schema must define the following accessors:
- id_accessor – Produces a unique identifier for each chunk. Strings are preferred, but any value will be stringified automatically.
- content_accessor – Points to the main text content that Vecta evaluates.
- source_path_accessor – Resolves to the document or file name associated with the chunk.
Optional fields unlock richer reporting:
- metadata_accessor – Extracts a dictionary of additional metadata. If you return a string, Vecta will attempt to JSON decode it before storing.
- page_nums_accessor – Returns one or more page numbers. Accepts either a single integer, a string that can be cast to an integer, or a list of either.
- source_path_default – Fallback string used when source_path_accessor does not resolve.
- page_nums_default – Provide a comma-separated list in the UI to force fallback page numbers when no accessor-based value exists.
- additional_accessors – Advanced feature for exposing extra derived fields to custom pipelines.
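The value shapes accepted by the optional fields can be pictured with a short normalization sketch. The helpers normalize_page_nums and normalize_metadata are hypothetical names, not part of Vecta, and returning None for undecodable metadata is an assumption, since the behavior on failure is not specified here.

```python
import json
from typing import Any, List, Optional


def normalize_page_nums(value: Any, default: Optional[str] = None) -> Optional[List[int]]:
    """Coerce an int, an int-like string, or a list of either into a list of ints.

    Falls back to a comma-separated default (for example "1, 2") when the
    extracted value is missing or cannot be converted.
    """
    if isinstance(value, (int, str)) and not isinstance(value, bool):
        value = [value]
    if isinstance(value, (list, tuple)):
        try:
            pages = [int(item) for item in value]
        except (TypeError, ValueError):
            pages = []
        if pages:
            return pages
    if default:
        return [int(part) for part in default.split(",") if part.strip()]
    return None


def normalize_metadata(value: Any) -> Optional[dict]:
    """Return a dict, decoding JSON strings first.

    Assumption: anything that is not a dict after decoding becomes None.
    """
    if isinstance(value, str):
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            return None
    return value if isinstance(value, dict) else None


print(normalize_page_nums("4"))                   # [4]
print(normalize_page_nums([1, "2"]))              # [1, 2]
print(normalize_page_nums(None, default="1, 2"))  # [1, 2]
print(normalize_metadata('{"author": "Jane Doe"}'))  # {'author': 'Jane Doe'}
```

A JSON-encoded list such as "[1, 2, 3]" is not an int-like string, so in this sketch it falls through to the default; that case is what the json(...) accessor is for.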
Next Steps
- Data Sources: learn about connectors
- Vector Databases: see database-specific examples