
Introduction

What is Vecta, core concepts, and two client modes

Last updated: August 20, 2025
Category: getting-started


Vecta is an open-source SDK and hosted platform for benchmarking and evaluating Retrieval-Augmented Generation (RAG) systems. It measures retrieval and generation performance across multiple semantic granularities — chunk-level, page-level, and document-level — so you can pinpoint exactly where your pipeline succeeds or fails.

Core Concepts

Data Sources

A data source is a connection to the knowledge base your RAG system retrieves from. Vecta supports two categories:

  • Vector databases — ChromaDB (local & cloud), Pinecone, Weaviate, pgvector, Azure Cosmos DB, Databricks, plus LangChain and LlamaIndex wrappers.
  • File stores — Local files (PDF, DOCX, PPTX, XLSX, TXT, and more) that Vecta ingests with markitdown and chunks automatically.

Every vector-database connector requires a VectorDBSchema that tells Vecta how to extract id, content, source_path, and page_nums from the raw records your database returns. See Accessor Syntax for details.
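The exact accessor grammar is documented under Accessor Syntax; as a rough illustration only (the record shape and dot-path notation below are hypothetical, not the real VectorDBSchema API), a schema maps paths in a raw database record onto the four required fields:

```python
# Hypothetical sketch: resolving dot-separated accessor paths against a raw
# record returned by a vector database. Field names here are invented.

def resolve(record: dict, path: str):
    """Follow a dot-separated path into a nested record."""
    value = record
    for key in path.split("."):
        value = value[key]
    return value

# An imagined raw record from a vector database:
raw = {
    "id": "chunk-42",
    "document": "The quick brown fox...",
    "metadata": {"source_path": "docs/intro.pdf", "page_nums": [3]},
}

# A schema-like mapping from Vecta's required fields to paths in the record:
schema = {
    "id": "id",
    "content": "document",
    "source_path": "metadata.source_path",
    "page_nums": "metadata.page_nums",
}

normalized = {field: resolve(raw, path) for field, path in schema.items()}
# normalized now holds id, content, source_path, and page_nums
```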

Benchmarks

A benchmark is a list of question-answer pairs grounded in your knowledge base. Each entry contains:

Field          Description
question       A natural-language query
answer         The expected answer
chunk_ids      Ground-truth chunk identifiers that answer the question
page_nums      Page numbers where answers reside (optional)
source_paths   Document/file identifiers (optional)
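Concretely, a single benchmark entry can be pictured as a record like this (the literal values are invented for illustration; the field names follow the table above):

```python
# Illustrative benchmark entry. Values are made up; fields match the table above.
entry = {
    "question": "What retrieval metrics does the system report?",
    "answer": "Precision, recall, and F1.",
    "chunk_ids": ["chunk-17", "chunk-42"],  # ground-truth chunks
    "page_nums": [3, 4],                    # optional
    "source_paths": ["docs/metrics.pdf"],   # optional
}
```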

You can create benchmarks in three ways:

  1. Synthetic generation — Vecta samples chunks from your data source and uses an LLM to produce questions, answers, and ground-truth citations.
  2. CSV upload — Import an existing Q&A dataset.
  3. Hugging Face import — Pull standard datasets like MS MARCO or GPQA Diamond.

Evaluations

An evaluation runs your RAG pipeline against a benchmark and computes metrics. Vecta supports three evaluation types:

Type                     You provide                          Metrics computed
Retrieval only           query → chunk_ids                    Precision, recall, F1 at chunk / page / document level
Generation only          query → generated_text               Accuracy, groundedness (LLM-as-a-judge)
Retrieval + Generation   query → (chunk_ids, generated_text)  All of the above
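For the retrieval metrics, chunk-level precision, recall, and F1 reduce to set overlap between the retrieved and ground-truth chunk IDs. A minimal sketch (not Vecta's implementation, just the standard definitions):

```python
def retrieval_scores(retrieved: list[str], ground_truth: list[str]) -> dict:
    """Chunk-level precision / recall / F1 via set overlap."""
    hits = len(set(retrieved) & set(ground_truth))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Retrieving 4 chunks, 2 of which are ground truth:
scores = retrieval_scores(["c1", "c2", "c3", "c4"], ["c1", "c2"])
# precision = 2/4 = 0.5, recall = 2/2 = 1.0, f1 ≈ 0.667
```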

Experiments

An experiment groups related evaluations so you can compare configuration changes side by side — for example, sweeping top_k values or testing different embedding models. Attach arbitrary metadata to each evaluation run and visualize the results.
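As a rough picture of what an experiment holds (the run and record shapes below are invented for illustration, not the Vecta API), a top_k sweep groups one evaluation per setting, each tagged with metadata:

```python
# Hypothetical sketch: one experiment grouping a top_k sweep.
# The dict shapes and metric numbers are placeholders, not real results.
experiment = {"name": "top_k-sweep", "runs": []}

for top_k in (1, 3, 5, 10):
    # ...run an evaluation with this top_k, then record its metrics...
    metrics = {"recall": min(1.0, 0.2 * top_k)}  # placeholder values
    experiment["runs"].append({"metadata": {"top_k": top_k}, "metrics": metrics})

# Runs can now be compared side by side on their shared metadata keys.
```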

Two Client Modes

Local Client — VectaClient

Use VectaClient when you want to run everything locally (benchmarking, evaluation, and storage). All computation happens on your machine. Ideal for development, local LLMs, and air-gapped environments.

from vecta import VectaClient

client = VectaClient(
    data_source_connector=my_connector,
    openai_api_key="sk-...",  # needed for synthetic generation & generation metrics
)

API Client — VectaAPIClient

Use VectaAPIClient when you want the hosted platform to handle AI operations (benchmark generation, LLM-as-a-judge scoring) and store results in the Vecta dashboard.

from vecta import VectaAPIClient

client = VectaAPIClient(api_key="your-vecta-api-key")

The API client evaluates your pipeline locally (your function runs on your machine), then uploads the results to the server for storage, visualization, and PDF export.

Supported Metrics

Semantic Level   Retrieval               Generation
Chunk-level      Precision, Recall, F1   Accuracy, Groundedness
Page-level       Precision, Recall, F1   Accuracy, Groundedness
Document-level   Precision, Recall, F1   Accuracy, Groundedness
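Page- and document-level scores can be understood as the same set-overlap computation after mapping each chunk to its page or source document. A hypothetical sketch of the document-level lift (the chunk-to-document map is invented):

```python
# Hypothetical: lift chunk IDs to document level via a chunk -> source_path map,
# then score overlap at the coarser granularity.
chunk_to_doc = {"c1": "a.pdf", "c2": "a.pdf", "c3": "b.pdf", "c4": "c.pdf"}

def to_docs(chunk_ids):
    return {chunk_to_doc[c] for c in chunk_ids}

retrieved_docs = to_docs(["c1", "c2", "c4"])  # {"a.pdf", "c.pdf"}
truth_docs = to_docs(["c1", "c3"])            # {"a.pdf", "b.pdf"}

hits = len(retrieved_docs & truth_docs)       # only "a.pdf" overlaps
precision = hits / len(retrieved_docs)        # 1/2
recall = hits / len(truth_docs)               # 1/2
```

Note how several correct chunks can collapse into one document, so document-level scores are typically higher than chunk-level ones.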

