Accessor Syntax

How VectorDBSchema maps fields from your database

Last updated: August 20, 2025
Category: getting-started

Every vector-database connector in Vecta requires a VectorDBSchema that describes how to extract fields from the raw records your database returns. This page covers the accessor path syntax.

VectorDBSchema Fields

from vecta import VectorDBSchema

schema = VectorDBSchema(
    id_accessor="id",                              # required
    content_accessor="content",                    # required
    metadata_accessor="metadata",                  # optional
    source_path_accessor="metadata.source_path",   # required
    page_nums_accessor="metadata.page_nums",       # optional
    source_path_default="unknown",                 # fallback value
    page_nums_default=None,                        # fallback value
)

Field                  Required   Description
id_accessor            Yes        Path to the unique chunk identifier
content_accessor       Yes        Path to the text content
metadata_accessor      No         Path to a metadata object
source_path_accessor   Yes        Path to the document/file name
page_nums_accessor     No         Path to page numbers
source_path_default    No         Fallback when source path extraction fails (default: "unknown")
page_nums_default      No         Fallback page numbers (default: None)
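
The fallback fields can be pictured as follows. This is a minimal sketch, not Vecta's implementation: extract_or_default is a hypothetical helper, and treating a missing key or index as a failed extraction is an assumption about when the fallback applies.

```python
def extract_or_default(record, keys, default):
    """Hypothetical helper: walk `keys` through `record`, or return `default`."""
    current = record
    try:
        for key in keys:
            current = current[key]
    except (KeyError, IndexError, TypeError):
        # Assumption: a missing key or index counts as a failed extraction.
        return default
    return current


complete = {"metadata": {"source_path": "report.pdf", "page_nums": [1, 2]}}
sparse = {"metadata": {}}

found_path = extract_or_default(complete, ["metadata", "source_path"], "unknown")
missing_path = extract_or_default(sparse, ["metadata", "source_path"], "unknown")
missing_pages = extract_or_default(sparse, ["metadata", "page_nums"], None)
```

Here found_path is the stored file name, while the sparse record falls back to "unknown" and None.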

Path Syntax

Direct field access

id_accessor="id"

Extracts data["id"] from a dict, or data.id from an object.
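
As an illustration of the dict-or-object behavior described above, here is a small sketch. get_field is a hypothetical helper written for this page, not a Vecta function, and the sample records are made up.

```python
from types import SimpleNamespace


def get_field(record, name):
    """Hypothetical helper: read `name` as a dict key, else as an attribute."""
    if isinstance(record, dict):
        return record[name]
    return getattr(record, name)


dict_record = {"id": "chunk-1", "content": "hello"}
object_record = SimpleNamespace(id="chunk-2", content="world")

dict_id = get_field(dict_record, "id")      # same as dict_record["id"]
object_id = get_field(object_record, "id")  # same as object_record.id
```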

Dot-prefixed property access

id_accessor=".id"

Behaves like direct access, but the leading dot makes the property/key lookup explicit. This is useful for connectors (such as Pinecone or Weaviate) where the ID is an attribute of the returned object rather than a dict key.

Index access

id_accessor="[0]"
content_accessor="[1]"
metadata_accessor="[2]"

Use this form for databases that return records as tuples or lists (e.g., Databricks).
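
To make the bracket form concrete, the sketch below reads positions out of a tuple-shaped record. get_index is a hypothetical helper and the row layout (id, content, metadata) is assumed to match the accessors shown above.

```python
import re


def get_index(record, accessor):
    """Hypothetical helper: resolve an accessor of the form "[N]"."""
    match = re.fullmatch(r"\[(\d+)\]", accessor)
    if match is None:
        raise ValueError(f"not an index accessor: {accessor!r}")
    return record[int(match.group(1))]


# Assumed row layout: (id, content, metadata)
row = ("chunk-1", "Some text", {"source_path": "report.pdf"})

row_id = get_index(row, "[0]")
row_content = get_index(row, "[1]")
row_metadata = get_index(row, "[2]")
```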

Nested field access

source_path_accessor="metadata.source_path"
page_nums_accessor="properties.metadata.page_nums"

Traverses nested dicts or objects using dot notation.
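
Dot notation can be read as one lookup per segment, applied left to right. The sketch below shows that idea with a hypothetical resolve_dotted helper; it is not Vecta's resolver, and the sample records are invented.

```python
from functools import reduce
from types import SimpleNamespace


def get_step(current, name):
    """Take one step: dict key if `current` is a dict, else attribute."""
    if isinstance(current, dict):
        return current[name]
    return getattr(current, name)


def resolve_dotted(record, path):
    """Hypothetical helper: apply each dot-separated segment in order."""
    return reduce(get_step, path.split("."), record)


record = {"metadata": {"source_path": "report.pdf", "page_nums": [1, 2]}}
obj = SimpleNamespace(properties={"metadata": {"page_nums": [4]}})

source_path = resolve_dotted(record, "metadata.source_path")
page_nums = resolve_dotted(obj, "properties.metadata.page_nums")
```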

Mixed access

source_path_accessor="[2].source_path"
content_accessor="properties.content"

Combine index and field access in a single path.
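
One way to picture a mixed path is as a list of steps, where each step is either an integer index or a field name. The following sketch assumes that model; parse_path and resolve are hypothetical names, and Vecta's actual parser may differ.

```python
import re

# Either "[N]" (captured as an index) or a run of characters that is a field name.
_STEP = re.compile(r"\[(\d+)\]|([^.\[\]]+)")


def parse_path(path):
    """Split an accessor path into int (index) and str (field) steps."""
    steps = []
    for match in _STEP.finditer(path):
        index, name = match.groups()
        steps.append(int(index) if index is not None else name)
    return steps


def resolve(record, path):
    """Hypothetical helper: walk the parsed steps through the record."""
    current = record
    for step in parse_path(path):
        if isinstance(step, int) or isinstance(current, dict):
            current = current[step]
        else:
            current = getattr(current, step)
    return current


row = ("chunk-1", "Some text", {"source_path": "report.pdf", "page_nums": [3]})

steps = parse_path("[2].source_path")
source_path = resolve(row, "[2].source_path")
```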

JSON parsing

When a field contains a JSON string that needs to be parsed before further access:

source_path_accessor="json(metadata.provenance_json).doc_name"

This first extracts metadata.provenance_json, parses the string as JSON, then accesses .doc_name on the parsed result.
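
Written out by hand with Python's standard json module, those three steps look roughly like this. The record is a made-up example of a field stored as a serialized JSON string.

```python
import json

# Made-up record: provenance_json holds a JSON *string*, not a dict.
record = {
    "id": "chunk-1",
    "metadata": {
        "provenance_json": json.dumps({"doc_name": "report.pdf", "page": 3}),
    },
}

raw = record["metadata"]["provenance_json"]  # step 1: extract the string
parsed = json.loads(raw)                     # step 2: parse it as JSON
doc_name = parsed["doc_name"]                # step 3: access .doc_name
```

Before step 2, raw is a str, so dot notation alone could not reach doc_name.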

Nested JSON parsing

For doubly-serialized JSON:

source_path_accessor="json(json(metadata.provenance_json).inner_json).doc_name"

This parses metadata.provenance_json as JSON, extracts .inner_json, parses that as JSON, then accesses .doc_name.
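
The sketch below builds a doubly-serialized value and unwraps it manually, to show why two json() calls are needed. The data is invented for illustration.

```python
import json

# Made-up data: the inner object is serialized, then embedded and serialized again.
inner = json.dumps({"doc_name": "report.pdf"})
provenance = json.dumps({"inner_json": inner})
record = {"metadata": {"provenance_json": provenance}}

outer = json.loads(record["metadata"]["provenance_json"])  # first parse
inner_parsed = json.loads(outer["inner_json"])             # second parse
doc_name = inner_parsed["doc_name"]
```

After the first parse, outer["inner_json"] is still a string, which is what the inner-to-outer nesting of json() calls accounts for.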

Default Schemas Per Database

Below are the default schemas Vecta uses for each supported database. You can customize any of these.

ChromaDB

VectorDBSchema(
    id_accessor="id",
    content_accessor="document",
    metadata_accessor="metadata",
    source_path_accessor="metadata.source_path",
    page_nums_accessor="metadata.page_nums",
)

Pinecone

VectorDBSchema(
    id_accessor=".id",
    content_accessor="metadata.content",
    metadata_accessor="metadata",
    source_path_accessor="metadata.source_path",
    page_nums_accessor="metadata.page_nums",
)

pgvector

VectorDBSchema(
    id_accessor="id",
    content_accessor="content",
    metadata_accessor="metadata",
    source_path_accessor="metadata.source_path",
    page_nums_accessor="metadata.page_nums",
)

Weaviate

VectorDBSchema(
    id_accessor=".uuid",
    content_accessor="properties.content",
    metadata_accessor="properties.metadata",
    source_path_accessor="properties.metadata.source_path",
    page_nums_accessor="properties.metadata.page_nums",
)

Databricks

VectorDBSchema(
    id_accessor="[0]",
    content_accessor="[1]",
    metadata_accessor="[2]",
    source_path_accessor="[2].source_path",
    page_nums_accessor="[2].page_nums",
)

Azure Cosmos DB

VectorDBSchema(
    id_accessor="id",
    content_accessor="content",
    metadata_accessor="metadata",
    source_path_accessor="metadata.source_path",
    page_nums_accessor="metadata.page_nums",
)

Tips

  • Inspect your data first. Log or print one raw record from your database to see the exact field names and structure before writing your schema.
  • Test incrementally. Start with just id_accessor and content_accessor, verify they work, then add source_path_accessor.
  • Use json() sparingly. Only use JSON parsing when a field is actually stored as a serialized JSON string. If it's already a dict/object, regular dot notation works.
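
For the first tip, a small helper can list candidate accessor paths from one sample record. This is a hypothetical utility sketched for this page, not part of Vecta; it descends into dicts and into a top-level tuple or list, and treats everything else as a leaf.

```python
def list_paths(value, prefix=""):
    """Return accessor-style paths for the leaves of one sample record."""
    if isinstance(value, dict) and value:
        paths = []
        for key, child in value.items():
            paths.extend(list_paths(child, f"{prefix}.{key}" if prefix else key))
        return paths
    if isinstance(value, (list, tuple)) and prefix == "":
        # Only a top-level row is indexed; nested lists are treated as leaves.
        paths = []
        for i, child in enumerate(value):
            paths.extend(list_paths(child, f"[{i}]"))
        return paths
    return [prefix]


dict_record = {
    "id": "chunk-1",
    "content": "Some text",
    "metadata": {"source_path": "report.pdf", "page_nums": [1, 2]},
}
tuple_record = ("chunk-1", "Some text", {"source_path": "report.pdf"})

dict_paths = list_paths(dict_record)
tuple_paths = list_paths(tuple_record)
```

Printing dict_paths or tuple_paths gives a quick list of strings to try as accessors before writing the full schema.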
