Overview
What benchmarks are and how the lifecycle works
Last updated: August 20, 2025
Benchmarks
A benchmark is a dataset of question-answer pairs with ground-truth citations that you evaluate your RAG system against. Benchmarks are the foundation of every evaluation in Vecta.
Benchmark Entries
Each benchmark entry (`BenchmarkEntry`) contains:
```python
from vecta import BenchmarkEntry

entry = BenchmarkEntry(
    id="unique-id",
    question="What is the maximum dosage of ibuprofen?",
    answer="The maximum daily dosage is 1200mg for OTC use.",
    chunk_ids=["chunk_42", "chunk_87"],  # ground-truth chunks
    page_nums=[3, 4],                    # optional
    source_paths=["drug_guide.pdf"],     # optional
)
```
| Field | Required | Description |
|---|---|---|
| `id` | Auto-generated | Unique identifier |
| `question` | ✅ | The query to evaluate |
| `answer` | ✅ | Expected correct answer |
| `chunk_ids` | ❌ | Ground-truth chunk IDs (required for retrieval evaluation) |
| `page_nums` | ❌ | Page numbers for page-level metrics |
| `source_paths` | ❌ | Document names for document-level metrics |
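Only `question` and `answer` are required. A minimal entry that omits the citation fields is still valid, though it cannot be used for retrieval evaluation (that requires `chunk_ids`):

```python
from vecta import BenchmarkEntry

# Minimal entry: id is auto-generated, and the citation fields
# (chunk_ids, page_nums, source_paths) are omitted, so this entry
# cannot back retrieval metrics.
entry = BenchmarkEntry(
    question="Can ibuprofen be taken with food?",
    answer="Yes, taking ibuprofen with food can reduce stomach irritation.",
)
```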
Benchmark Lifecycle
Benchmarks follow a simple lifecycle:
- Draft — Created but not yet populated with entries (e.g., waiting for synthetic generation)
- Active — Populated with entries and ready for evaluation
- Archived — No longer in active use
Only active benchmarks can be used for evaluations.
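If you drive evaluations from the API, you may want to guard on this. A minimal sketch, assuming the benchmark object returned by `get_benchmark` exposes the lifecycle state as a `status` field (a hypothetical attribute name, not confirmed by this page):

```python
from vecta import VectaAPIClient

client = VectaAPIClient()
bm = client.get_benchmark("benchmark-id")

# `status` is assumed here to be one of "draft", "active", or "archived".
if bm.status != "active":
    raise RuntimeError(f"Benchmark is {bm.status!r}; only active benchmarks can be evaluated.")
```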
Three Ways to Create Benchmarks
1. Synthetic Generation
Point Vecta at a connected data source and it will automatically generate questions, answers, and ground-truth citations using an LLM. See Synthetic Generation.
2. CSV Upload
Import an existing Q&A dataset from a CSV file. See CSV Upload.
3. Hugging Face Import
Pull standard research datasets like MS MARCO or GPQA Diamond. See Hugging Face.
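All three paths are documented in detail on their own pages. Purely as an orientation sketch, creating a benchmark through the API client might look like the following; every method name below is an illustrative assumption, not the documented API, so consult the linked pages for the real calls.

```python
from vecta import VectaAPIClient

client = VectaAPIClient()

# 1. Synthetic generation from a connected data source
#    (generate_benchmark is a hypothetical method name).
client.generate_benchmark(data_source_id="source-id", name="synthetic-bm")

# 2. CSV upload of an existing Q&A dataset
#    (upload_benchmark_csv is a hypothetical method name).
client.upload_benchmark_csv("qa_dataset.csv", name="imported-bm")

# 3. Hugging Face import of a research dataset
#    (import_huggingface_benchmark is a hypothetical method name).
client.import_huggingface_benchmark("ms_marco", name="msmarco-bm")
```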
Managing Benchmarks
API Client
```python
from vecta import VectaAPIClient

client = VectaAPIClient()

# List all benchmarks
benchmarks = client.list_benchmarks()

# Get a specific benchmark
bm = client.get_benchmark("benchmark-id")

# Download entries for local use
entries = client.download_benchmark("benchmark-id")

# Export as CSV
csv_data = client.export_benchmark("benchmark-id")

# Delete
client.delete_benchmark("benchmark-id")
```
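As a usage example, downloaded entries can be inspected locally before running an evaluation; a minimal sketch, assuming `download_benchmark` returns a list of `BenchmarkEntry` objects:

```python
from vecta import VectaAPIClient

client = VectaAPIClient()
entries = client.download_benchmark("benchmark-id")

# Entries with ground-truth chunk IDs are the ones usable for
# retrieval evaluation (see the field table above).
with_citations = [e for e in entries if e.chunk_ids]
print(f"{len(with_citations)} of {len(entries)} entries have ground-truth citations")
```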
Platform UI
From the Benchmarks dashboard you can:
- View all benchmarks with status, question count, and linked data source
- Click into a benchmark to see individual entries
- Export to CSV
- Create evaluations directly from a benchmark
Next Steps
- CSV Upload — Import existing datasets
- Hugging Face — Use standard research datasets
- Synthetic Generation — Auto-generate from your data
- Evaluations — Run evaluations against your benchmarks