Introduction
What is Vecta, core concepts, and two client modes
Vecta is an open-source SDK and hosted platform for benchmarking and evaluating Retrieval-Augmented Generation (RAG) systems. It measures retrieval and generation performance across multiple semantic granularities — chunk-level, page-level, and document-level — so you can pinpoint exactly where your pipeline succeeds or fails.
Core Concepts
Data Sources
A data source is a connection to the knowledge base your RAG system retrieves from. Vecta supports two categories:
- Vector databases — ChromaDB (local & cloud), Pinecone, Weaviate, pgvector, Azure Cosmos DB, Databricks, plus LangChain and LlamaIndex wrappers.
- File stores — Local files (PDF, DOCX, PPTX, XLSX, TXT, and more) that Vecta ingests with markitdown and chunks automatically.
Every vector-database connector requires a VectorDBSchema that tells Vecta how to extract id, content, source_path, and page_nums from the raw records your database returns. See Accessor Syntax for details.
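Conceptually, an accessor is a path that Vecta follows into each raw record to pull out the field it needs. The sketch below illustrates the idea with a plain dotted-path lookup; Vecta's real accessor syntax is richer (see Accessor Syntax), and the record shape shown is invented for illustration.

```python
def resolve(record: dict, path: str):
    """Follow a dotted accessor path into a nested record."""
    value = record
    for key in path.split("."):
        value = value[key]
    return value

# A made-up record, roughly the shape a vector database might return.
record = {
    "id": "chunk-42",
    "document": "Vecta measures retrieval quality...",
    "metadata": {"source_path": "docs/intro.pdf", "page_nums": [3]},
}

resolve(record, "metadata.source_path")  # -> "docs/intro.pdf"
```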
Benchmarks
A benchmark is a list of question-answer pairs grounded in your knowledge base. Each entry contains:
| Field | Description |
|---|---|
| question | A natural-language query |
| answer | The expected answer |
| chunk_ids | Ground-truth chunk identifiers that answer the question |
| page_nums | Page numbers where answers reside (optional) |
| source_paths | Document/file identifiers (optional) |
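Concretely, one entry can be pictured as a plain Python dict with the fields above. The question text and identifiers here are invented for illustration:

```python
# One benchmark entry; field names match the table above.
entry = {
    "question": "What retrieval granularities does Vecta report?",
    "answer": "Chunk-level, page-level, and document-level.",
    "chunk_ids": ["chunk-17", "chunk-42"],  # ground-truth chunks
    "page_nums": [3],                       # optional
    "source_paths": ["docs/intro.pdf"],     # optional
}
```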
You can create benchmarks three ways:
- Synthetic generation — Vecta samples chunks from your data source and uses an LLM to produce questions, answers, and ground-truth citations.
- CSV upload — Import an existing Q&A dataset.
- Hugging Face import — Pull standard datasets like MS MARCO or GPQA Diamond.
Evaluations
An evaluation runs your RAG pipeline against a benchmark and computes metrics. Vecta supports three evaluation types:
| Type | You provide | Metrics computed |
|---|---|---|
| Retrieval only | query → chunk_ids | Precision, recall, F1 at chunk / page / document level |
| Generation only | query → generated_text | Accuracy, groundedness (LLM-as-a-judge) |
| Retrieval + Generation | query → (chunk_ids, generated_text) | All of the above |
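The "You provide" column describes the contract your pipeline must satisfy. For a retrieval-only evaluation, that contract is a function from a query to the chunk ids you retrieved. The toy in-memory index below stands in for a real vector-database query; how you register this function with the client is covered in the Evaluations guide.

```python
# Toy "index" standing in for a real vector-database lookup.
TOY_INDEX = {
    "what does vecta measure": ["chunk-17", "chunk-42"],
}

def retrieval_pipeline(query: str) -> list[str]:
    """query -> chunk_ids: the shape the 'Retrieval only' row describes."""
    return TOY_INDEX.get(query.lower().rstrip("?"), [])
```

A generation-only pipeline has the same shape but returns generated text, and a combined pipeline returns both.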
Experiments
An experiment groups related evaluations so you can compare configuration changes side by side — for example, sweeping top_k values or testing different embedding models. Attach arbitrary metadata to each evaluation run and visualize the results.
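A sweep is just a loop that runs one evaluation per configuration and tags each run with the metadata the experiment view will compare. In this sketch, `run_evaluation` is a stand-in for however you launch a single evaluation, not a real Vecta call:

```python
def sweep_top_k(run_evaluation, values=(1, 3, 5, 10)):
    """Run one evaluation per top_k value, tagging each with metadata."""
    runs = []
    for top_k in values:
        metrics = run_evaluation(top_k=top_k)
        runs.append({"metadata": {"top_k": top_k}, "metrics": metrics})
    return runs

# Toy stand-in: pretend recall grows with top_k.
runs = sweep_top_k(lambda top_k: {"recall": min(1.0, 0.2 * top_k)})
```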
Two Client Modes
Local Client — VectaClient
Use VectaClient when you want to run everything locally (benchmarking, evaluation, and storage). All computation happens on your machine. Ideal for development, local LLMs, and air-gapped environments.
```python
from vecta import VectaClient

client = VectaClient(
    data_source_connector=my_connector,
    openai_api_key="sk-...",  # needed for synthetic generation & generation metrics
)
```
API Client — VectaAPIClient
Use VectaAPIClient when you want the hosted platform to handle AI operations (benchmark generation, LLM-as-a-judge scoring) and store results in the Vecta dashboard.
```python
from vecta import VectaAPIClient

client = VectaAPIClient(api_key="your-vecta-api-key")
```
The API client evaluates your pipeline locally (your function runs on your machine), then uploads the results to the server for storage, visualization, and PDF export.
Supported Metrics
| Semantic Level | Retrieval | Generation |
|---|---|---|
| Chunk-level | Precision, Recall, F1 | Accuracy, Groundedness |
| Page-level | Precision, Recall, F1 | Accuracy, Groundedness |
| Document-level | Precision, Recall, F1 | Accuracy, Groundedness |
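At every semantic level the retrieval metrics reduce to set overlap between retrieved and ground-truth identifiers; page- and document-level scores apply the same formulas after mapping chunk ids to their page or document. The function below shows the standard definitions, not Vecta's internal implementation:

```python
def prf1(retrieved: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Precision, recall, and F1 from retrieved vs. ground-truth id sets."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if hits else 0.0
    return precision, recall, f1

# Retrieving 4 chunks when 2 of them are relevant (and both ground-truth
# chunks were found) gives precision 0.5, recall 1.0, F1 ~0.667.
prf1({"c1", "c2", "c3", "c4"}, {"c1", "c2"})
```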
Next Steps
- Quickstart — Get running in 5 minutes
- Data Sources — Connect your knowledge base
- Benchmarks — Create evaluation datasets
- Evaluations — Run your first evaluation