Hugging Face

Import MS MARCO, GPQA Diamond, and other datasets

Last updated: August 20, 2025
Category: benchmarks

Hugging Face Datasets

Vecta ships with importers for popular Hugging Face evaluation datasets, so you can start benchmarking without generating your own data.

Available Datasets

MS MARCO

Real Bing search queries paired with passages and human-written answers. Ideal for retrieval + generation evaluation.

  • Source: microsoft/ms_marco
  • Default import size: 100 questions
  • Includes: Passages (as chunks), selected passages, queries, and human answers
  • Best for: Testing retrieval precision and grounded generation

GPQA Diamond

Graduate-level science questions (physics, chemistry, biology) requiring multi-step reasoning. Ideal for generation-only evaluation.

  • Source: Idavidrein/gpqa (gpqa_diamond config)
  • Default import size: 60 questions
  • Includes: Questions and correct answers (no passages)
  • Best for: Testing LLM reasoning and accuracy without retrieval

Importing from the Platform

  1. Navigate to Benchmarks
  2. Click Import from Hugging Face
  3. Select a dataset (MS MARCO or GPQA Diamond)
  4. Confirm the import

For MS MARCO, Vecta creates both a data source (containing the passages as chunks) and a benchmark (containing the Q&A pairs with ground-truth citations). For GPQA Diamond, only a benchmark is created since it's a generation-only dataset.
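
To make the data source / benchmark split concrete, here is a minimal sketch of how one MS MARCO-style row could be divided into chunks and a benchmark entry with ground-truth citations. It is not Vecta's actual importer: the function name, the chunk id scheme, and the output keys (`id`, `text`, `question`, `answer`, `chunk_ids`) are all hypothetical. The input field names (`query`, `answers`, `passages.passage_text`, `passages.is_selected`) follow the layout of the microsoft/ms_marco dataset, and the sample row is hand-written for illustration.

```python
from __future__ import annotations

from typing import Any


def split_msmarco_row(row: dict[str, Any]) -> tuple[list[dict[str, Any]], dict[str, Any]]:
    """Turn one MS MARCO-style row into (chunks, benchmark_entry)."""
    passages = row["passages"]
    chunks = []
    cited_ids = []
    for i, text in enumerate(passages["passage_text"]):
        chunk_id = f"{row['query_id']}-{i}"  # hypothetical id scheme
        chunks.append({"id": chunk_id, "text": text})
        # Passages marked as selected act as the ground-truth citations.
        if passages["is_selected"][i] == 1:
            cited_ids.append(chunk_id)
    entry = {
        "question": row["query"],
        "answer": row["answers"][0] if row["answers"] else None,
        "chunk_ids": cited_ids,
    }
    return chunks, entry


# Hand-written toy row, not real MS MARCO data.
sample_row = {
    "query_id": 1,
    "query": "what is a vector database",
    "answers": ["A database that stores and searches embeddings."],
    "passages": {
        "is_selected": [0, 1],
        "passage_text": [
            "Relational databases store rows in tables.",
            "A vector database stores embeddings and supports similarity search.",
        ],
    },
}

chunks, entry = split_msmarco_row(sample_row)
print(len(chunks), entry["chunk_ids"])  # 2 ['1-1']
```

A generation-only dataset such as GPQA Diamond has no passages, so the same idea would yield an empty chunk list and entries without citations.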

Importing from the SDK

Use the BenchmarkDatasetImporter class:

from vecta import BenchmarkDatasetImporter

importer = BenchmarkDatasetImporter()

# MS MARCO: returns (chunks, benchmark_entries)
chunks, entries = importer.import_msmarco(
    split="test",
    max_items=100,
)

print(f"Chunks: {len(chunks)}")
print(f"Benchmark entries: {len(entries)}")

# GPQA Diamond: returns (empty chunks, benchmark_entries)
chunks, entries = importer.import_gpqa_diamond(
    split="train",
    max_items=60,
)

# chunks will be empty for generation-only datasets
print(f"Benchmark entries: {len(entries)}")
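
Because both importers return the same `(chunks, entries)` pair, downstream code can branch on whether `chunks` is empty. The helper below is a small sketch of that check; `evaluation_mode` is a hypothetical name and is not part of the Vecta SDK. It only relies on the two return values being sized sequences.

```python
from typing import Sequence


def evaluation_mode(chunks: Sequence, entries: Sequence) -> str:
    """Pick an evaluation mode from the shape of an import result."""
    if not entries:
        raise ValueError("import returned no benchmark entries")
    # An empty chunk list signals a generation-only dataset.
    return "retrieval+generation" if chunks else "generation-only"


print(evaluation_mode(chunks=[], entries=["q1"]))      # generation-only
print(evaluation_mode(chunks=["c1"], entries=["q1"]))  # retrieval+generation
```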

Using Imported Data with the Local Client

from vecta import VectaClient, BenchmarkDatasetImporter

importer = BenchmarkDatasetImporter()
chunks, entries = importer.import_msmarco(max_items=50)

# Create a Vecta client and set the benchmark entries directly
vecta = VectaClient(data_source_connector=None)
vecta.benchmark_entries = entries

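# Now evaluate. my_generation_function is a placeholder for your own
# generation callable; it is not defined in this snippet.
results = vecta.evaluate_generation_only(
    my_generation_function,
    evaluation_name="msmarco-gen-test",
)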
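
The snippet above passes `my_generation_function` without defining it. As a rough sketch only, a stand-in could look like the following. The signature (question text in, answer text out) is an assumption, since this page does not show what `evaluate_generation_only` expects from the callable; check the SDK reference before relying on it.

```python
def my_generation_function(question: str) -> str:
    """Stand-in generator with an assumed signature: question in, answer out."""
    # Replace this body with a call to your own LLM or RAG pipeline.
    return f"I don't know the answer to: {question}"


print(my_generation_function("What is the boiling point of water?"))
```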

Authentication

Some Hugging Face datasets require authentication. Set the HUGGINGFACE_HUB_TOKEN environment variable:

export HUGGINGFACE_HUB_TOKEN="hf_..."

Or log in interactively:

huggingface-cli login
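
If you rely on the environment variable, a quick check before a long import run can save a failed download. This is a minimal sketch using only the standard library; `hf_token_present` is a hypothetical helper, and it does not detect a token saved by the interactive login, which is stored on disk instead of in the environment.

```python
import os
from typing import Mapping, Optional


def hf_token_present(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if HUGGINGFACE_HUB_TOKEN is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("HUGGINGFACE_HUB_TOKEN", "").strip())


if not hf_token_present():
    print("HUGGINGFACE_HUB_TOKEN is not set; gated datasets may fail to download.")
```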
