Accessor Syntax

How to map your database schema

Last updated: August 19, 2025
Category: getting-started

Accessor Syntax Reference

Vecta uses a compact accessor language to translate arbitrary database responses into the normalized structure that powers evaluation, sampling, and analytics. This guide explains how the accessor system works, what syntaxes are supported, and how to validate your configuration with concrete examples.

How the accessor system works

When a connector returns raw records, Vecta passes the payload to your configured vector database schema and extracts the fields it needs to construct ChunkData objects. The logic lives in DataAccessor.extract, which evaluates the accessor string you provide against any combination of Python dictionaries, objects, lists, or JSON strings.

from vecta.core.schemas import DataAccessor, VectorDBSchema

# This is the standard chunk structure Pinecone returns: an object whose
# fields are read as attributes (shown here as comments, not runnable code).
#   .id       -> "chunk_123"
#   .metadata -> {"content": "Text here...", "source_path": "doc.pdf"}

# Schema
schema = VectorDBSchema(
    id_accessor=".id",
    content_accessor=".metadata.content",
    source_path_accessor=".metadata.source_path"
)

# If your vector database returns chunks as dictionaries like this:
{
  "id": "chunk_123",
  "content": "This is the text content of the chunk.",
  "metadata": {
    "source_path": "/documents/doc1.pdf",
    "page_nums": [1, 2, 3],
    "author": "Jane Doe"
  }
}

# Define a schema with accessors that match that structure
schema = VectorDBSchema(
  id_accessor="id",
  content_accessor="content",
  source_path_accessor="metadata.source_path",
  page_nums_accessor="metadata.page_nums",
)

# Some vector databases, like Chroma, do not allow list types in metadata.
# Vecta accessors are flexible enough to parse JSON before accessing fields.
{
  "id": "chunk_123",
  "content": "This is the text content of the chunk.",
  "metadata": {
    "source_path": "/documents/doc1.pdf",
    "page_nums": "[1, 2, 3]",  # JSON encoded list
    "author": "Jane Doe"
  }
}

# Define a schema with accessors that match that structure
schema = VectorDBSchema(
  id_accessor="id",
  content_accessor="content",
  source_path_accessor="metadata.source_path",
  page_nums_accessor="json(metadata.page_nums)",  # Parse JSON
)

After querying your vector database, Vecta uses your schema to extract fields through DataAccessor.extract. You can call this function directly to test your accessors, for example:

# `result` is a single raw record returned by your vector database query.
chunk_id = DataAccessor.extract(result, schema.id_accessor)
content = DataAccessor.extract(result, schema.content_accessor)
# metadata_accessor is optional; this line assumes your schema sets it.
metadata = DataAccessor.extract(result, schema.metadata_accessor)
source_path = DataAccessor.extract(result, schema.source_path_accessor)
page_nums = DataAccessor.extract(result, schema.page_nums_accessor)

Accessors run from left to right. Each segment tells Vecta how to traverse the structure: through a dictionary key, an object attribute, an array index, or a nested JSON blob. If any hop fails, extraction returns None so that fallbacks (like source_path_default or page_nums_default) can be applied.
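To make the left-to-right traversal concrete, here is a minimal sketch of how such a walk could work. It is not Vecta's implementation: the names resolve and _SEGMENT are hypothetical, the sketch ignores json(...) segments, and it only illustrates the "return None when a hop fails" behavior described above.

```python
import re
from types import SimpleNamespace
from typing import Any

# Hypothetical tokenizer: "metadata.pages[0]" yields the hops "metadata", "pages", 0.
# Dots are separators, so a leading "." (as in ".id") is simply skipped.
_SEGMENT = re.compile(r"\[(-?\d+)\]|([^.\[\]]+)")


def resolve(record: Any, accessor: str) -> Any:
    """Walk `record` one segment at a time; return None as soon as a hop fails."""
    current = record
    for index, name in _SEGMENT.findall(accessor):
        try:
            if index:
                current = current[int(index)]    # list or tuple position
            elif isinstance(current, dict):
                current = current[name]          # dictionary key
            else:
                current = getattr(current, name) # object attribute
        except (KeyError, IndexError, AttributeError, TypeError):
            return None
    return current


record = {"id": "chunk_123", "metadata": {"source_path": "/documents/doc1.pdf"}}
match = SimpleNamespace(id="chunk_123", metadata={"content": "Text here..."})

print(resolve(record, "metadata.source_path"))  # dictionary hops
print(resolve(match, ".metadata.content"))      # attribute hop, then dictionary hop
print(resolve(record, "metadata.missing.key"))  # failed hop resolves to None
```

Returning None instead of raising keeps the caller simple: it can check for None once and then apply a default.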

Supported syntax building blocks

| Pattern | Description | Example | Notes |
| --- | --- | --- | --- |
| field | Access a top-level dictionary key | "document" | Use for direct column names or JSON keys |
| .property | Access an attribute on an object | ".id" | Useful when the raw result exposes objects with attributes |
| field.nested | Chain through nested keys | "metadata.source_path" | Each . after the first is treated as a dictionary lookup |
| [index] | Select a position within a list/tuple | "[0]" | Indices are zero-based |
| [index].field | Combine list indexing with further access | "[2].source_path" | Works with tuples returned by SQL drivers |
| json(field) | Parse a field as JSON before continuing | "json(metadata_blob).doc_name" | Handles JSON stored as strings |
| json(json(field).nested).final | Parse nested JSON multiple times | "json(json(metadata.provenance).inner).page" | Use when JSON fields contain additional encoded JSON |

All patterns can be composed. For example, "json(metadata_blob).document.pages[0].label" will parse the metadata_blob field as JSON, then drill into the resulting structure.
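As a rough illustration of what these composed accessors mean, the sketch below spells out the equivalent steps by hand using only Python's standard json module. The sample records are invented for this example, and the code shows the intended result of each accessor rather than how Vecta evaluates it internally.

```python
import json

# Invented sample record: metadata_blob holds a JSON-encoded string.
record = {
    "id": "chunk_123",
    "metadata_blob": json.dumps(
        {"document": {"pages": [{"label": "Cover"}, {"label": "Contents"}]}}
    ),
}

# "json(metadata_blob).document.pages[0].label"
parsed = json.loads(record["metadata_blob"])     # json(metadata_blob)
label = parsed["document"]["pages"][0]["label"]  # .document.pages[0].label

# Invented doubly encoded record for "json(json(metadata.provenance).inner).page"
double = {
    "metadata": {
        "provenance": json.dumps({"inner": json.dumps({"page": 7})}),
    }
}
outer = json.loads(double["metadata"]["provenance"])  # json(metadata.provenance)
page = json.loads(outer["inner"])["page"]             # json(...inner).page

print(label, page)
```

Writing the steps out this way is a quick check that an accessor string matches the shape of your data before you put it in a schema.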

Core schema fields

Every schema must define the following accessors:

  • id_accessor – Produces a unique identifier for each chunk. Strings are preferred, but any value will be stringified automatically.
  • content_accessor – Points to the main text content that Vecta evaluates.
  • source_path_accessor – Resolves to the document or file name associated with the chunk.

Optional fields unlock richer reporting:

  • metadata_accessor – Extracts a dictionary of additional metadata. If the accessor resolves to a string, Vecta will attempt to JSON decode it before storing.
  • page_nums_accessor – Returns one or more page numbers. Accepts either a single integer, a string that can be cast to an integer, or a list of either.
  • source_path_default – Fallback string used when source_path_accessor does not resolve.
  • page_nums_default – Fallback page numbers used when page_nums_accessor does not resolve. In the UI, enter them as a comma-separated list.
  • additional_accessors – Advanced feature for exposing extra derived fields to custom pipelines.
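
The page number rules above (a single integer, a castable string, or a list of either, with a comma-separated fallback) can be sketched as a small normalizer. This is an illustration under assumptions, not Vecta's code: normalize_page_nums is a hypothetical helper, and skipping values that cannot be cast is a choice made for this sketch.

```python
from typing import Any, List, Optional


def normalize_page_nums(value: Any, default: Optional[str] = None) -> List[int]:
    """Coerce an int, a numeric string, or a list of either into a list of ints."""
    if value is None:
        # Nothing was extracted: fall back to a comma-separated default like "1, 2".
        if not default:
            return []
        return [int(part) for part in default.split(",") if part.strip()]

    items = value if isinstance(value, (list, tuple)) else [value]
    pages: List[int] = []
    for item in items:
        try:
            pages.append(int(item))
        except (TypeError, ValueError):
            continue  # assumption of this sketch: ignore entries that cannot be cast
    return pages


print(normalize_page_nums(3))
print(normalize_page_nums([1, "2", 3]))
print(normalize_page_nums(None, default="1, 2"))
```

Always returning a list, even for a single page, means downstream code handles one shape only.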
