Overview
Group evaluations, attach metadata, and visualize comparisons
Experiments
An experiment groups related evaluations so you can compare configuration changes side by side. Typical uses include sweeping top_k values, testing different embedding models, and comparing prompt strategies.
Creating an Experiment
API Client
from vecta import VectaAPIClient
client = VectaAPIClient()
experiment = client.create_experiment(
    name="Chunk Size Sweep",
    description="Testing 256 / 512 / 1024 chunk sizes",
)
print(f"Experiment ID: {experiment['id']}")
Platform UI
Experiments are managed through the Experiments dashboard, where you can create, rename, and delete experiments and view their grouped evaluations.
Running Evaluations Within an Experiment
Pass the experiment_id and metadata to any evaluation call:
for chunk_size in [256, 512, 1024]:
    results = client.evaluate_retrieval(
        benchmark_id="your-benchmark-id",
        retrieval_function=make_retriever(chunk_size=chunk_size),
        evaluation_name=f"chunk-{chunk_size}",
        experiment_id=experiment["id"],
        metadata={
            "chunk_size": chunk_size,
            "model": "text-embedding-3-small",
            "top_k": 10,
        },
    )
    print(f"Chunk size {chunk_size}: F1 = {results.chunk_level.f1_score:.2%}")
The metadata is stored alongside each evaluation and used for comparison and plotting.
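The make_retriever helper in the loop above is your own code, not part of the Vecta client. A minimal placeholder sketch, assuming the retrieval function takes a query string and returns a ranked list of chunk identifiers (replace the body with your real chunking, indexing, and search logic):
def make_retriever(chunk_size: int):
    # Hypothetical helper: bind your retrieval pipeline to one chunk size.
    def retrieve(query: str) -> list[str]:
        # Return a ranked list of chunk identifiers for the query.
        # Placeholder result so the sketch runs end to end.
        return [f"chunk-{chunk_size}-0", f"chunk-{chunk_size}-1"]

    return retrieve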
Viewing Experiment Results
API Client
exp_detail = client.get_experiment(experiment["id"])
print(f"Evaluations: {len(exp_detail['evaluations'])}")
print(f"Metadata keys: {exp_detail['metadata_keys']}")
for ev in exp_detail["evaluations"]:
    print(f"  {ev['name']}: chunk F1 = {ev.get('chunk_level', {}).get('f1_score', 'N/A')}")
Plotting
Use the built-in plotting module to visualize results across metadata values:
from vecta import plot_experiment, get_metadata_keys
exp_detail = client.get_experiment(experiment["id"])
evaluations = exp_detail["evaluations"]
# See available metadata keys
keys = get_metadata_keys(evaluations)
print(f"Available keys: {keys}")
# e.g., ["chunk_size", "model", "top_k"]
# Plot — auto-detects value type:
# Numeric values → line chart
# String values → grouped bar chart
plot_experiment(evaluations, metadata_key="chunk_size")
plot_experiment(evaluations, metadata_key="model")
Managing Experiments
# List all experiments
experiments = client.list_experiments()
# Rename
from vecta import RenameRequest
client.rename_experiment(experiment["id"], RenameRequest(name="New Name"))
# Delete (evaluations are unlinked, not deleted)
client.delete_experiment(experiment["id"])
Note: Deleting an experiment does not delete its evaluations; they are unlinked and remain visible in the evaluations list.
Example: Comparing Embedding Models
experiment = client.create_experiment(name="Embedding Model Comparison")
for model_name in ["text-embedding-3-small", "text-embedding-3-large", "e5-large"]:
    retriever = build_retriever(embedding_model=model_name)
    results = client.evaluate_retrieval(
        benchmark_id="bm-id",
        retrieval_function=retriever,
        evaluation_name=f"embed-{model_name}",
        experiment_id=experiment["id"],
        metadata={"model": model_name},
    )
# Visualize
exp = client.get_experiment(experiment["id"])
plot_experiment(exp["evaluations"], metadata_key="model")
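Here build_retriever is your own helper, analogous to make_retriever above. Once the runs finish, you can also pull the top-scoring model straight from the experiment detail; a short sketch using the same chunk_level fields shown earlier:
# Pick the run with the highest chunk-level F1 (None-safe)
best = max(
    exp["evaluations"],
    key=lambda ev: ev.get("chunk_level", {}).get("f1_score") or 0.0,
)
print(f"Best run by chunk-level F1: {best['name']}")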
Next Steps
- CI/CD Integration — Automate experiment runs in your pipeline
- Evaluations — Learn about the evaluation types