
Synthetic Generation

Auto-generate benchmarks from your data

Last updated: August 19, 2025
Category: benchmarks

Auto-generate benchmarks from your knowledge base. Vecta creates questions that only your data can answer.

Quick Start

from vecta import VectaAPIClient

client = VectaAPIClient(api_key="your-key")

# Generate benchmark
benchmark = client.create_benchmark(
    data_source_id="...",  # ID of your uploaded files or vector DB
    questions_count=10
)
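
The returned benchmark is used like a dict later on this page (see Monitoring Generation below); assuming that shape, you can grab its ID straight away:

# "id" is used the same way in the Monitoring Generation section below
print(benchmark["id"])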

How It Works

  1. Sample chunks from your knowledge base
  2. Generate questions with an LLM; each question is specific to its chunk
  3. Find related chunks using semantic search
  4. Validate relevance with LLM judges
  5. Build ground truth with all relevant chunks/pages/documents

Figure: After generating a question from a chunk, an LLM judge determines whether any similar "candidate" chunks also help answer the generated question. If they do, they are included as cited sources in the benchmark dataset.
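
As a rough sketch of steps 4 and 5, given the candidate chunks retrieved in step 3, validation amounts to filtering semantically similar candidates through a judge and recording whatever survives as ground truth. The function and field names below are illustrative, not Vecta's internal API:

from typing import Callable, List

def build_ground_truth(question: str,
                       source_chunk_id: str,
                       candidate_chunk_ids: List[str],
                       is_relevant: Callable[[str, str], bool]) -> dict:
    # Keep only the candidates the LLM judge deems relevant to the question,
    # then record them alongside the chunk the question was generated from.
    validated = [cid for cid in candidate_chunk_ids if is_relevant(question, cid)]
    return {"question": question, "relevant_chunks": [source_chunk_id, *validated]}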

Configuration Options

benchmark = client.create_benchmark(
    data_source_id=db["id"],
    questions_count=10,          # How many questions to generate
    random_seed=42,               # For reproducibility
    wait_for_completion=True      # Wait for generation to finish
)

Local SDK

from vecta import VectaClient

# 'connector' is a vector database connector you have configured beforehand
vecta = VectaClient(
    vector_db_connector=connector,
    openai_api_key="your-openai-key"
)

# Load data
vecta.load_knowledge_base()

# Generate
entries = vecta.generate_benchmark(
    n_questions=100,
    similarity_top_k=10,  # Candidates to validate
    random_seed=42
)

# Save
vecta.save_benchmark("my_benchmark.csv")
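
generate_benchmark returns the generated entries; assuming they come back as a plain Python list, a quick sanity check is to count them:

# Assuming entries is a standard list of generated benchmark rows
print(f"Generated {len(entries)} benchmark entries")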

Quality Control

Vecta ensures high-quality benchmarks:

Question quality:

  • Specific to your domain
  • Requires chunk content to answer
  • Not answerable from general knowledge

Ground truth accuracy:

  • All relevant chunks identified via semantic search + LLM validation
  • Multi-hop questions automatically detected
  • Page and document attribution computed

Best Practices

Start with 5-10 questions:

# Quick iteration
benchmark = client.create_benchmark(
    data_source_id=db["id"],
    questions_count=5
)

Use random seeds for consistency:

# Same seed = same questions
benchmark = client.create_benchmark(
    data_source_id=db["id"],
    questions_count=10,
    random_seed=42
)

Generate multiple benchmarks:

# Different random samples
for i in range(3):
    benchmark = client.create_benchmark(
        data_source_id=db["id"],
        questions_count=50,
        random_seed=i,
        description=f"Benchmark variant {i+1}"
    )

Requirements

  • Minimum chunks: 25+ for reliable benchmarks
  • Metadata: Chunks must include source_path and page_nums (see the example after this list)
  • OpenAI API key: Required for question generation (local SDK only)
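
A minimal sketch of chunk metadata that satisfies the requirement above; the field names come from this list, while the values are purely illustrative:

# Illustrative values; source_path and page_nums are the required fields
chunk_metadata = {
    "source_path": "docs/handbook.pdf",
    "page_nums": [12, 13],
}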

Benchmark Size Guide

Questions    Use Case                         Generation Time
5-10         Quick testing, rapid iteration   20 seconds
50-100       Standard evaluation              2-5 minutes
1000+        SLA / Production monitoring      2 hours

Monitoring Generation

# Cloud API - check status
benchmark = client.get_benchmark(benchmark["id"])
print(f"Status: {benchmark['status']}")  # generating, active, error

# View in dashboard
print(f"Dashboard: https://runvecta.com/platform/benchmarks/{benchmark['id']}")
