
Overview

What benchmarks are and how the lifecycle works

Last updated: August 20, 2025
Category: benchmarks

Benchmarks

A benchmark is a dataset of question-answer pairs with ground-truth citations that you evaluate your RAG system against. Benchmarks are the foundation of every evaluation in Vecta.

Benchmark Entries

Each benchmark entry (BenchmarkEntry) contains:

from vecta import BenchmarkEntry

entry = BenchmarkEntry(
    id="unique-id",
    question="What is the maximum dosage of ibuprofen?",
    answer="The maximum daily dosage is 1200mg for OTC use.",
    chunk_ids=["chunk_42", "chunk_87"],       # ground-truth chunks
    page_nums=[3, 4],                          # optional
    source_paths=["drug_guide.pdf"],           # optional
)
Field          Required        Description
id             Auto-generated  Unique identifier
question       Yes             The query to evaluate
answer         Yes             Expected correct answer
chunk_ids      Optional        Ground-truth chunk IDs (required for retrieval evaluation)
page_nums      Optional        Page numbers for page-level metrics
source_paths   Optional        Document names for document-level metrics
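The required/optional split above can be checked before upload with a small helper. This is an illustrative standalone sketch, not part of the vecta SDK; only the field names come from the table above.

```python
# Illustrative validator mirroring the BenchmarkEntry fields above.
# Standalone sketch -- not part of the vecta SDK.

REQUIRED_FIELDS = ("question", "answer")

def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry looks usable."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not entry.get(field):
            problems.append(f"missing required field: {field}")
    # chunk_ids is optional overall, but retrieval metrics need it
    if not entry.get("chunk_ids"):
        problems.append("no chunk_ids: retrieval metrics cannot be computed")
    return problems

entry = {
    "question": "What is the maximum dosage of ibuprofen?",
    "answer": "The maximum daily dosage is 1200mg for OTC use.",
    "chunk_ids": ["chunk_42", "chunk_87"],
}
print(validate_entry(entry))  # []
```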

Benchmark Lifecycle

Benchmarks follow a simple lifecycle:

  1. Draft — Created but not yet populated with entries (e.g., waiting for synthetic generation)
  2. Active — Populated with entries and ready for evaluation
  3. Archived — No longer in active use

Only active benchmarks can be used for evaluations.
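The lifecycle can be pictured as a small state machine. The three state names come from the docs; the transition rules below are an assumption about how the platform behaves, shown here only to make the flow concrete.

```python
# Sketch of the benchmark lifecycle as a state machine.
# State names are from the docs; the transitions are an assumption.
from enum import Enum

class BenchmarkStatus(Enum):
    DRAFT = "draft"
    ACTIVE = "active"
    ARCHIVED = "archived"

# Assumed legal transitions: draft -> active (entries added),
# active -> archived (retired).
TRANSITIONS = {
    BenchmarkStatus.DRAFT: {BenchmarkStatus.ACTIVE},
    BenchmarkStatus.ACTIVE: {BenchmarkStatus.ARCHIVED},
    BenchmarkStatus.ARCHIVED: set(),
}

def can_evaluate(status: BenchmarkStatus) -> bool:
    # Only active benchmarks can be used for evaluations
    return status is BenchmarkStatus.ACTIVE
```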

Three Ways to Create Benchmarks

1. Synthetic Generation

Point Vecta at a connected data source and it will automatically generate questions, answers, and ground-truth citations using an LLM. See Synthetic Generation.
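To give a feel for what synthetic generation does per chunk, here is an illustrative prompt builder. Vecta's actual prompts and models are internal to the platform; this sketch only shows the shape of the task (passage in, question-answer pair with a citation out).

```python
# Illustrative only: roughly the kind of prompt synthetic generation sends
# to an LLM for each chunk. Vecta's real prompts are internal.

def build_generation_prompt(chunk_id: str, chunk_text: str) -> str:
    return (
        "Given the following passage, write one question that the passage "
        "answers, plus the correct answer.\n\n"
        f"Passage ({chunk_id}):\n{chunk_text}\n\n"
        "Return JSON with keys: question, answer."
    )

prompt = build_generation_prompt(
    "chunk_42",
    "Adults may take up to 1200mg of ibuprofen per day OTC.",
)
```

The chunk ID is threaded through so the generated entry can carry its ground-truth citation.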

2. CSV Upload

Import an existing Q&A dataset from a CSV file. See CSV Upload.
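A minimal sketch of what a Q&A CSV might look like before upload. The column names and the semicolon-packed `chunk_ids` cell are an assumed layout for illustration; check the CSV Upload docs for the exact headers Vecta expects.

```python
# Sketch: parse a Q&A CSV into entry dicts. Column layout is assumed.
import csv
import io

csv_data = """question,answer,chunk_ids
What is the max OTC dose of ibuprofen?,1200mg per day,chunk_42;chunk_87
"""

entries = []
for row in csv.DictReader(io.StringIO(csv_data)):
    entries.append({
        "question": row["question"],
        "answer": row["answer"],
        # multiple chunk IDs packed into one cell, semicolon-separated
        "chunk_ids": row["chunk_ids"].split(";") if row["chunk_ids"] else [],
    })
```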

3. Hugging Face Import

Pull standard research datasets like MS MARCO or GPQA Diamond. See Hugging Face.
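Conceptually, an import maps each dataset record onto the entry fields shown earlier. The record below is a simplified stand-in, not the exact MS MARCO schema; the mapping itself is what the sketch illustrates.

```python
# Sketch: map one QA-style dataset record into the entry shape used here.
# The record layout is a simplified stand-in, not a real HF schema.

record = {
    "query": "what is the capital of france",
    "answers": ["Paris"],
    "passage_ids": ["p_101"],
}

entry = {
    "question": record["query"],
    "answer": record["answers"][0],   # take the first reference answer
    "chunk_ids": record["passage_ids"],
}
```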

Managing Benchmarks

API Client

from vecta import VectaAPIClient

client = VectaAPIClient()

# List all benchmarks
benchmarks = client.list_benchmarks()

# Get a specific benchmark
bm = client.get_benchmark("benchmark-id")

# Download entries for local use
entries = client.download_benchmark("benchmark-id")

# Export as CSV
csv_data = client.export_benchmark("benchmark-id")

# Delete
client.delete_benchmark("benchmark-id")
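Downloaded entries can then be inspected locally. This sketch assumes each entry exposes the fields shown earlier (`source_paths`, `chunk_ids`); the sample data is made up.

```python
# Sketch: local analysis of downloaded entries. Sample data is made up;
# field names mirror BenchmarkEntry.
from collections import Counter

entries = [
    {"question": "q1", "source_paths": ["drug_guide.pdf"], "chunk_ids": ["c1"]},
    {"question": "q2", "source_paths": ["drug_guide.pdf"], "chunk_ids": []},
    {"question": "q3", "source_paths": ["handbook.pdf"], "chunk_ids": ["c2", "c3"]},
]

# Count questions per source document
per_source = Counter(p for e in entries for p in e["source_paths"])

# Flag entries that cannot support retrieval metrics
missing_ground_truth = [e["question"] for e in entries if not e["chunk_ids"]]

print(per_source)            # Counter({'drug_guide.pdf': 2, 'handbook.pdf': 1})
print(missing_ground_truth)  # ['q2']
```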

Platform UI

From the Benchmarks dashboard you can:

  • View all benchmarks with status, question count, and linked data source
  • Click into a benchmark to see individual entries
  • Export to CSV
  • Create evaluations directly from a benchmark
