Last updated: August 19, 2025
Category: benchmarks
Synthetic Generation
Auto-generate benchmarks from your knowledge base. Vecta creates questions that only your data can answer.
Quick Start
from vecta import VectaAPIClient
client = VectaAPIClient(api_key="your-key")
# Generate benchmark
benchmark = client.create_benchmark(
    data_source_id="...",  # ID of your uploaded files or vector DB
    questions_count=10
)
How It Works
- Sample chunks from your knowledge base
- Generate questions using an LLM - each question is specific to its chunk
- Find related chunks using semantic search
- Validate relevance with LLM judges
- Build ground truth with all relevant chunks/pages/documents

Figure: After generating a question from a chunk, an LLM judge determines whether any similar "candidate" chunks also help answer the generated question. If they do, then they should be included as cited sources in the benchmark dataset.
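The sketch below summarizes this pipeline. It is a conceptual outline only, not Vecta's internal code; the helper names (sample_chunks, generate_question, search_similar, judge_relevance) are placeholders for the steps listed above.
# Conceptual outline of synthetic generation (helper names are placeholders)
entries = []
for chunk in sample_chunks(knowledge_base, n=questions_count, seed=random_seed):
    question = generate_question(chunk)               # LLM writes a chunk-specific question
    candidates = search_similar(question, top_k=10)   # semantic search for related chunks
    relevant = [c for c in candidates if judge_relevance(question, c)]  # LLM judge
    entries.append({"question": question, "ground_truth": [chunk, *relevant]})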
Configuration Options
benchmark = client.create_benchmark(
    data_source_id=db["id"],
    questions_count=10,       # How many questions to generate
    random_seed=42,           # For reproducibility
    wait_for_completion=True  # Wait for generation to finish
)
Local SDK
from vecta import VectaClient
vecta = VectaClient(
    vector_db_connector=connector,
    openai_api_key="your-openai-key"
)
# Load data
vecta.load_knowledge_base()
# Generate
entries = vecta.generate_benchmark(
    n_questions=100,
    similarity_top_k=10,  # Candidates to validate
    random_seed=42
)
# Save
vecta.save_benchmark("my_benchmark.csv")
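Before relying on the saved file, a quick sanity check on the returned entries (assuming generate_benchmark returns a list of benchmark rows) can catch configuration problems early:
# Quick sanity check (assumes entries is a list of benchmark rows)
print(f"Generated {len(entries)} entries")
print(entries[0])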
Quality Control
Vecta ensures high-quality benchmarks:
Question quality:
- Specific to your domain
- Requires chunk content to answer
- Not answerable from general knowledge
Ground truth accuracy:
- All relevant chunks identified via semantic search + LLM validation (sketched below)
- Multi-hop questions automatically detected
- Page and document attribution computed
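As a rough illustration of the LLM validation step, a relevance check can be as simple as the sketch below. The prompt wording and model choice are assumptions for illustration, not Vecta's actual judge.
from openai import OpenAI

llm = OpenAI()  # reads OPENAI_API_KEY from the environment

def is_relevant(question: str, chunk_text: str) -> bool:
    # Ask the model whether the chunk helps answer the question (illustrative prompt)
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Question: {question}\n\nChunk: {chunk_text}\n\n"
                       "Does this chunk help answer the question? Reply YES or NO.",
        }],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")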
Best Practices
Start with 5-10 questions:
# Quick iteration
benchmark = client.create_benchmark(
    data_source_id=db["id"],
    questions_count=5
)
Use random seeds for consistency:
# Same seed = same questions
benchmark = client.create_benchmark(
    data_source_id=db["id"],
    questions_count=10,
    random_seed=42
)
Generate multiple benchmarks:
# Different random samples
for i in range(3):
    benchmark = client.create_benchmark(
        data_source_id=db["id"],
        questions_count=50,
        random_seed=i,
        description=f"Benchmark variant {i+1}"
    )
Requirements
- Minimum chunks: 25+ for reliable benchmarks
- Metadata: Chunks must include source_path and page_nums (see the example after this list)
- OpenAI API key: Required for question generation (local SDK only)
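For the metadata requirement, each chunk's metadata needs its source file and page numbers. The values below are illustrative, and their exact shapes (a path string, a list of page numbers) are assumptions rather than a documented schema; only the source_path and page_nums keys come from the requirements above.
# Illustrative chunk metadata (values and shapes are examples)
chunk_metadata = {
    "source_path": "docs/annual_report_2024.pdf",
    "page_nums": [12, 13],
}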
Benchmark Size Guide
Questions | Use Case | Generation Time
---|---|---
5-10 | Quick testing, rapid iteration | 20 seconds
50-100 | Standard evaluation | 2-5 minutes
1000+ | SLA / Production monitoring | 2 hours
Monitoring Generation
# Cloud API - check status
benchmark = client.get_benchmark(benchmark["id"])
print(f"Status: {benchmark['status']}") # generating, active, error
# View in dashboard
print(f"Dashboard: https://runvecta.com/platform/benchmarks/{benchmark['id']}")
Next Steps
- Evaluations → Use your benchmark in evaluations
- CSV Upload → Export and modify benchmarks