Introduction
What is Vecta, core concepts, and two client modes
Vecta is an open-source SDK and hosted platform for benchmarking and evaluating Retrieval-Augmented Generation (RAG) systems. It measures retrieval and generation performance across multiple semantic granularities — chunk-level, page-level, and document-level — so you can pinpoint exactly where your pipeline succeeds or fails.
Core Concepts
Data Sources
A data source is a connection to the knowledge base your RAG system retrieves from. Vecta supports two categories:
- Vector databases — ChromaDB (local & cloud), Pinecone, Weaviate, pgvector, Azure Cosmos DB, Databricks, plus LangChain and LlamaIndex wrappers.
- File stores — Local files (PDF, DOCX, PPTX, XLSX, TXT, and more) that Vecta ingests with markitdown and chunks automatically.
Every vector-database connector requires a VectorDBSchema that tells Vecta how to extract id, content, source_path, and page_nums from the raw records your database returns. See Accessor Syntax for details.
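Conceptually, an accessor is a path that Vecta follows into each raw record to pull out the field it needs. The sketch below illustrates the idea with a plain dotted-path lookup; Vecta's real accessor syntax is richer (see Accessor Syntax), and the record shape shown is invented for illustration.

```python
def resolve(record: dict, path: str):
    """Follow a dotted accessor path into a nested record."""
    value = record
    for key in path.split("."):
        value = value[key]
    return value

# A made-up record, roughly the shape a vector database might return.
record = {
    "id": "chunk-42",
    "document": "Vecta measures retrieval quality...",
    "metadata": {"source_path": "docs/intro.pdf", "page_nums": [3]},
}

resolve(record, "metadata.source_path")  # -> "docs/intro.pdf"
```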
Benchmarks
A benchmark is a list of question-answer pairs grounded in your knowledge base. Each entry contains:
| Field | Description |
|---|---|
| question | A natural-language query |
| answer | The expected answer |
| chunk_ids | Ground-truth chunk identifiers that answer the question |
| page_nums | Page numbers where answers reside (optional) |
| source_paths | Document/file identifiers (optional) |
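Concretely, one entry can be pictured as a plain Python dict with the fields above. The question text and identifiers here are invented for illustration:

```python
# One benchmark entry; field names match the table above.
entry = {
    "question": "What retrieval granularities does Vecta report?",
    "answer": "Chunk-level, page-level, and document-level.",
    "chunk_ids": ["chunk-17", "chunk-42"],  # ground-truth chunks
    "page_nums": [3],                       # optional
    "source_paths": ["docs/intro.pdf"],     # optional
}
```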
You can create benchmarks three ways:
- Synthetic generation — Vecta samples chunks from your data source and uses an LLM to produce questions, answers, and ground-truth citations.
- CSV upload — Import an existing Q&A dataset.
- Hugging Face import — Pull standard datasets like MS MARCO or GPQA Diamond.
Evaluations
An evaluation runs your RAG pipeline against a benchmark and computes metrics. Vecta supports three evaluation types:
| Type | You provide | Metrics computed |
|---|---|---|
| Retrieval only | query → chunk_ids | Precision, recall, F1 at chunk / page / document level |
| Generation only | query → generated_text | Accuracy, groundedness (LLM-as-a-judge) |
| Retrieval + Generation | query → (chunk_ids, generated_text) | All of the above |
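The "You provide" column describes the contract your pipeline must satisfy. For a retrieval-only evaluation, that contract is a function from a query to the chunk ids you retrieved. The toy in-memory index below stands in for a real vector-database query; how you register this function with the client is covered in the Evaluations guide.

```python
# Toy "index" standing in for a real vector-database lookup.
TOY_INDEX = {
    "what does vecta measure": ["chunk-17", "chunk-42"],
}

def retrieval_pipeline(query: str) -> list[str]:
    """query -> chunk_ids: the shape the 'Retrieval only' row describes."""
    return TOY_INDEX.get(query.lower().rstrip("?"), [])
```

A generation-only pipeline has the same shape but returns generated text, and a combined pipeline returns both.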
Experiments
An experiment groups related evaluations so you can compare configuration changes side by side — for example, sweeping top_k values or testing different embedding models. Attach arbitrary metadata to each evaluation run and visualize the results.
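A sweep is just a loop that runs one evaluation per configuration and tags each run with the metadata the experiment view will compare. In this sketch, `run_evaluation` is a stand-in for however you launch a single evaluation, not a real Vecta call:

```python
def sweep_top_k(run_evaluation, values=(1, 3, 5, 10)):
    """Run one evaluation per top_k value, tagging each with metadata."""
    runs = []
    for top_k in values:
        metrics = run_evaluation(top_k=top_k)
        runs.append({"metadata": {"top_k": top_k}, "metrics": metrics})
    return runs

# Toy stand-in: pretend recall grows with top_k.
runs = sweep_top_k(lambda top_k: {"recall": min(1.0, 0.2 * top_k)})
```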
Two Client Modes
Local Client — VectaClient
Use VectaClient when you want to run everything locally (benchmarking, evaluation, and storage). All computation happens on your machine. Ideal for development, local LLMs, and air-gapped environments.
```python
from vecta import VectaClient

client = VectaClient(
    data_source_connector=my_connector,
    openai_api_key="sk-...",  # needed for synthetic generation & generation metrics
)
```
API Client — VectaAPIClient
Use VectaAPIClient when you want the hosted platform to handle AI operations (benchmark generation, LLM-as-a-judge scoring) and store results in the Vecta dashboard.
```python
from vecta import VectaAPIClient

client = VectaAPIClient(api_key="your-vecta-api-key")
```
The API client evaluates your pipeline locally (your function runs on your machine), then uploads the results to the server for storage, visualization, and PDF export.
Supported Metrics
| Semantic Level | Retrieval | Generation |
|---|---|---|
| Chunk-level | Precision, Recall, F1 | Accuracy, Groundedness |
| Page-level | Precision, Recall, F1 | Accuracy, Groundedness |
| Document-level | Precision, Recall, F1 | Accuracy, Groundedness |
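At every semantic level the retrieval metrics reduce to set overlap between retrieved and ground-truth identifiers; page- and document-level scores apply the same formulas after mapping chunk ids to their page or document. The function below shows the standard definitions, not Vecta's internal implementation:

```python
def prf1(retrieved: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Precision, recall, and F1 from retrieved vs. ground-truth id sets."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if hits else 0.0
    return precision, recall, f1

# Retrieving 4 chunks when 2 of them are relevant (and both ground-truth
# chunks were found) gives precision 0.5, recall 1.0, F1 ~0.667.
prf1({"c1", "c2", "c3", "c4"}, {"c1", "c2"})
```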
Next Steps
- Quickstart — Get running in 5 minutes
- Data Sources — Connect your knowledge base
- Benchmarks — Create evaluation datasets
- Evaluations — Run your first evaluation