
Hugging Face

Use GPQA Diamond, MS MARCO, and other datasets

Last updated: August 19, 2025
Category: benchmarks

Hugging Face Datasets

Import popular evaluation datasets from Hugging Face. Perfect for standardized benchmarking and research comparison.

Figure: Pull gold-standard datasets directly from the Hugging Face Hub without leaving Vecta.

Supported Datasets

GPQA Diamond

Graduate-level science questions that test reasoning and factual accuracy.

Figure: GPQA Diamond challenges your models with graduate-level science questions.

from vecta.core.dataset_importer import BenchmarkDatasetImporter

importer = BenchmarkDatasetImporter()
chunks, entries = importer.import_gpqa_diamond(
    split="train",
    max_items=50
)

Use for: Generation-only evaluation
Domains: Physics, Chemistry, Biology
Difficulty: Graduate-level
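
Before wiring the entries into an evaluation, it can help to spot-check what the importer returned. A minimal sketch, assuming the BenchmarkEntry fields shown under Dataset Details below:

from vecta.core.dataset_importer import BenchmarkDatasetImporter

importer = BenchmarkDatasetImporter()
chunks, entries = importer.import_gpqa_diamond(split="train", max_items=5)

# GPQA Diamond is generation-only, so chunks comes back empty
print(f"{len(entries)} entries, {len(chunks)} chunks")

# Inspect a few questions and their science domains
for entry in entries[:3]:
    print(entry.source_paths, entry.question[:80])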

MS MARCO

Real search queries that test retrieval and generation over web passages.

Figure: MS MARCO measures real-world retrieval and question answering performance.

chunks, entries = importer.import_msmarco(
    split="test",
    max_items=100
)

Use for: Retrieval + generation evaluation
Domains: General web content
Difficulty: Mixed

Quick Example

from vecta import VectaClient
from vecta.core.dataset_importer import BenchmarkDatasetImporter

# Import dataset
importer = BenchmarkDatasetImporter()
chunks, entries = importer.import_gpqa_diamond(max_items=25)

# Load into Vecta
vecta = VectaClient(openai_api_key="your-key")
vecta.benchmark_entries = entries

# Evaluate
def my_generator(query: str) -> str:
    # llm is your own model client; see the sketch below for a concrete example
    return llm.generate(query)

results = vecta.evaluate_generation_only(my_generator)
print(f"Accuracy: {results.generation_metrics.accuracy:.3f}")

Via Web Dashboard

Import directly from the UI:

  1. Go to Platform → Benchmarks → Create
  2. Click Import from Hugging Face
  3. Choose dataset (GPQA Diamond or MS MARCO)
  4. Set number of questions
  5. Click Import

Dataset Details

GPQA Diamond

# Returns
chunks = []  # No chunks (generation-only)
entries = [
    BenchmarkEntry(
        question="What is the role of cytochrome P450...",
        answer="Cytochrome P450 enzymes...",
        chunk_ids=None,  # No retrieval component
        source_paths=["Chemistry"]
    )
]

When to use:

  • Testing pure LLM reasoning
  • Comparing model knowledge (see the sketch below)
  • Scientific domain evaluation
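
For model comparison, run the same entries through one generator per model and compare the resulting accuracy. A minimal sketch, again assuming the OpenAI Python SDK for the generators; the model names are only examples:

from vecta import VectaClient
from vecta.core.dataset_importer import BenchmarkDatasetImporter
from openai import OpenAI

client = OpenAI()

def make_generator(model_name: str):
    # Build a str -> str generator bound to one model
    def generate(query: str) -> str:
        response = client.chat.completions.create(
            model=model_name,
            messages=[{"role": "user", "content": query}],
        )
        return response.choices[0].message.content or ""
    return generate

importer = BenchmarkDatasetImporter()
_, entries = importer.import_gpqa_diamond(max_items=50)

vecta = VectaClient(openai_api_key="your-key")
vecta.benchmark_entries = entries

# Same benchmark entries, different models
for model_name in ["gpt-4o-mini", "gpt-4o"]:  # example model names
    results = vecta.evaluate_generation_only(make_generator(model_name))
    print(f"{model_name}: accuracy={results.generation_metrics.accuracy:.3f}")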

MS MARCO

# Returns
chunks = [...]  # Web passages
entries = [
    BenchmarkEntry(
        question="how to reset windows password",
        answer="You can reset...",
        chunk_ids=["passage_1", "passage_2"],
        page_nums=None,  # No page concept
        source_paths=["https://example.com/help"]
    )
]
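
Unlike GPQA Diamond, each MS MARCO entry references the returned chunks by id, so the passages must be available to your retrieval stack before you evaluate. A minimal sketch of the wiring, assuming each chunk exposes id and text fields (those names are an assumption; adapt them to your schema and to your vector database's ingest API):

chunks, entries = importer.import_msmarco(split="test", max_items=100)

# Map passage ids to text so chunk_ids in the entries can be resolved
# (chunk.id and chunk.text are assumed field names)
passage_index = {chunk.id: chunk.text for chunk in chunks}

for entry in entries[:3]:
    gold = [passage_index[cid] for cid in entry.chunk_ids]
    print(entry.question, "->", len(gold), "gold passages")

# Ingest the passages into your vector database of choice before
# running a retrieval + generation evaluation.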

When to use:

  • Testing retrieval accuracy
  • Real-world query handling
  • Web content evaluation

Save for Reuse

# Import once
chunks, entries = importer.import_gpqa_diamond(max_items=50)

# Save to CSV
vecta.benchmark_entries = entries
vecta.save_benchmark("gpqa_50.csv")

# Load later
vecta.load_benchmark("gpqa_50.csv")
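
In a later session you can skip the Hugging Face download entirely and evaluate straight from the saved CSV. A minimal sketch, reusing the calls shown above (my_generator is the same function from the Quick Example):

from vecta import VectaClient

vecta = VectaClient(openai_api_key="your-key")
vecta.load_benchmark("gpqa_50.csv")

# Evaluate against the reloaded benchmark entries
results = vecta.evaluate_generation_only(my_generator)
print(f"Accuracy: {results.generation_metrics.accuracy:.3f}")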

Limitations

GPQA Diamond:

  • Generation-only (no retrieval testing)
  • Requires strong models (difficult questions)

MS MARCO:

  • Requires ingesting passages first
  • Web content may differ from your domain
  • No page numbers (web passages)

Best Practices

GPQA Diamond:

  • Use for baseline model comparison
  • Test with your best LLM first
  • Expect lower accuracy than domain-specific benchmarks

MS MARCO:

  • Ingest passages into your vector database first
  • Use for retrieval algorithm comparison
  • Adapt evaluation thresholds for web content

