
Synthetic Generation

Auto-generate benchmarks from your knowledge base

Last updated: August 20, 2025
Category: benchmarks

Synthetic Benchmark Generation

Vecta can automatically generate question-answer pairs grounded in your knowledge base. The generated benchmark includes multi-hop questions, edge cases, and ground-truth citations, providing comprehensive test coverage for your RAG system.

How It Works

  1. Sampling — Vecta randomly samples chunks from your data source (controlled by random_seed for reproducibility).
  2. Question generation — An LLM generates a question that the sampled chunk can answer, along with a canonical answer.
  3. Citation discovery — Vecta performs a similarity sweep across your knowledge base and runs parallel LLM-as-a-judge calls to find all chunks that can answer the question — not just the original chunk. This ensures ground-truth recall is comprehensive.
  4. Assembly — Each entry is assembled with question, answer, chunk_ids, page_nums, and source_paths.
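The assembled entry can be pictured as a small record. The sketch below is illustrative only: the field names mirror step 4 above (question, answer, chunk_ids, page_nums, source_paths), but the SDK's actual entry class may differ.

```python
from dataclasses import dataclass, field

@dataclass
class BenchmarkEntry:
    # Field names mirror the assembly step above; the SDK's real
    # entry type may differ in naming and structure.
    question: str
    answer: str
    chunk_ids: list[str] = field(default_factory=list)
    page_nums: list[int] = field(default_factory=list)
    source_paths: list[str] = field(default_factory=list)

entry = BenchmarkEntry(
    question="What is the refund window?",
    answer="30 days from delivery.",
    chunk_ids=["chunk-0412", "chunk-0973"],
    page_nums=[12],
    source_paths=["policies/refunds.pdf"],
)
```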

Quality check: For every synthetic Q&A pair, the SDK runs a panel of LLM-as-a-judge calls. Any chunk that the judges deem relevant is automatically merged into the benchmark's ground-truth citations, ensuring your downstream recall/precision numbers are accurate.
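The citation-discovery step can be sketched in plain Python. Everything here is a stand-in, not the SDK's internals: `cosine`, the toy knowledge base, the similarity threshold, and `judge_votes` (which replaces the parallel LLM-as-a-judge calls with a keyword check) are all illustrative assumptions.

```python
def cosine(a, b):
    # Plain cosine similarity over two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

# Toy knowledge base: chunk_id -> (embedding, text).
kb = {
    "c1": ([1.0, 0.0], "Refunds are accepted within 30 days."),
    "c2": ([0.9, 0.1], "Our refund window is 30 days from delivery."),
    "c3": ([0.0, 1.0], "Shipping is free on orders over $50."),
}

def judge_votes(question, text):
    # Stand-in for the LLM-as-a-judge panel: a trivial keyword check.
    return "refund" in text.lower()

def discover_citations(question, question_emb, original_id, threshold=0.8):
    # Similarity sweep, then judge every candidate; merge all relevant
    # chunks into the ground truth, not just the originally sampled one.
    citations = {original_id}
    for cid, (emb, text) in kb.items():
        if cid == original_id:
            continue
        if cosine(question_emb, emb) >= threshold and judge_votes(question, text):
            citations.add(cid)
    return sorted(citations)

merged = discover_citations("What is the refund window?", [1.0, 0.0], "c1")
```

Here the sweep finds that c2 also answers the question, so it is merged alongside the sampled chunk c1.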

Using the API Client

from vecta import VectaAPIClient

client = VectaAPIClient()

benchmark = client.create_benchmark(
    data_source_id="your-data-source-id",
    questions_count=100,
    random_seed=42,
    description="Q4 knowledge base eval",
)

print(f"Benchmark ID: {benchmark['id']}")
print(f"Status: {benchmark['status']}")
print(f"Questions generated: {benchmark['questions_count']}")
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| data_source_id | str | required | ID of the connected data source |
| questions_count | int | 100 | Number of Q&A pairs to generate |
| random_seed | int | None | Seed for reproducible generation |
| description | str | None | Optional description |
| wait_for_completion | bool | True | Block until generation finishes |

The create_benchmark method creates the benchmark and triggers generation in a single call. When wait_for_completion=True, it polls until the benchmark status becomes active.
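With wait_for_completion=False you poll yourself. The loop below sketches that pattern against a stub client: the status names mirror this page (draft to active), but the `get_benchmark` method name is an assumption, not a documented part of VectaAPIClient.

```python
import time

class StubClient:
    # Stand-in for VectaAPIClient; advances one status per poll.
    def __init__(self):
        self._statuses = iter(["draft", "processing", "active"])

    def get_benchmark(self, benchmark_id):  # hypothetical method name
        return {"id": benchmark_id, "status": next(self._statuses)}

def wait_until_active(client, benchmark_id, interval=0.01, timeout=5.0):
    # Poll until the benchmark reaches "active" or the deadline passes.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        benchmark = client.get_benchmark(benchmark_id)
        if benchmark["status"] == "active":
            return benchmark
        time.sleep(interval)
    raise TimeoutError(f"benchmark {benchmark_id} never became active")

result = wait_until_active(StubClient(), "bench-123")
```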

Using the Local Client

from vecta import VectaClient, ChromaLocalConnector, VectorDBSchema

schema = VectorDBSchema(
    id_accessor="id",
    content_accessor="document",
    metadata_accessor="metadata",
    source_path_accessor="metadata.source_path",
    page_nums_accessor="metadata.page_nums",
)

# Assumes an existing Chroma client, e.g.:
#   import chromadb
#   chroma_client = chromadb.PersistentClient(path="./chroma")
connector = ChromaLocalConnector(
    client=chroma_client,
    collection_name="my_docs",
    schema=schema,
)

vecta = VectaClient(
    data_source_connector=connector,
    openai_api_key="sk-...",  # required for LLM generation
)

# Load chunks from the data source
chunks = vecta.load_knowledge_base()
print(f"Loaded {len(chunks)} chunks")

# Generate synthetic benchmark
entries = vecta.generate_benchmark(
    n_questions=50,
    random_seed=42,
)

print(f"Generated {len(entries)} benchmark entries")

# Save for later use
vecta.save_benchmark("my_benchmark.csv")

Local Client Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| n_questions | int | required | Number of Q&A pairs |
| random_seed | int | None | Seed for reproducibility |

Using the Platform

  1. Navigate to Benchmarks
  2. Click Create Benchmark
  3. Select a connected data source
  4. Set the number of questions and random seed
  5. Click Generate Benchmark

The generation runs server-side. Once complete, the benchmark status changes from draft to active, and you can view individual entries.

Requirements

  • Data source must be connected and have chunks available
  • Minimum chunks — The data source must have at least as many chunks as the requested question count
  • OpenAI API key — Required for the LLM calls (configured server-side for the platform, or passed to VectaClient for local use)
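The minimum-chunks requirement is easy to verify up front, before spending on LLM calls. This sketch applies the check to the chunk list that load_knowledge_base() returns; the ValueError guard is our own, not an SDK exception.

```python
def check_benchmark_feasible(chunks, n_questions):
    # Each Q&A pair is seeded from a distinct sampled chunk, so the
    # data source must hold at least n_questions chunks.
    if len(chunks) < n_questions:
        raise ValueError(
            f"need at least {n_questions} chunks, found {len(chunks)}"
        )

chunks = [f"chunk-{i}" for i in range(40)]
check_benchmark_feasible(chunks, 25)  # 40 chunks >= 25 questions: fine
```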

Saving and Loading

# Save benchmark to CSV
vecta.save_benchmark("benchmark.csv")

# Load benchmark in another session
vecta.load_benchmark("benchmark.csv")

# Or download from the API
entries = client.download_benchmark("benchmark-id")
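The saved CSV can be inspected with the standard library. The column names below follow the entry fields described earlier (question, answer, chunk_ids, page_nums, source_paths), but the exact header layout that save_benchmark emits is an assumption; adjust to what your file contains.

```python
import csv
import io
import json

rows = [
    {
        "question": "What is the refund window?",
        "answer": "30 days from delivery.",
        "chunk_ids": json.dumps(["c1", "c2"]),
        "page_nums": json.dumps([12]),
        "source_paths": json.dumps(["policies/refunds.pdf"]),
    }
]

# Write the benchmark to CSV (in-memory here; a file path works the same).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)

# Read it back and decode the list-valued columns.
buf.seek(0)
loaded = list(csv.DictReader(buf))
chunk_ids = json.loads(loaded[0]["chunk_ids"])
```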

Next Steps

  • CSV Upload — Import existing datasets instead
  • Evaluations — Run evaluations against your benchmark

Need Help?

Can't find what you're looking for? Our team is here to help.