Hugging Face Datasets
Import MS MARCO, GPQA Diamond, and other datasets
Vecta ships with importers for popular Hugging Face evaluation datasets, so you can start benchmarking without generating your own data.
Available Datasets
MS MARCO
Real Bing search queries paired with passages and human-written answers. Ideal for retrieval + generation evaluation.
- Source: microsoft/ms_marco
- Default import size: 100 questions
- Includes: Passages (as chunks), selected passages, queries, and human answers
- Best for: Testing retrieval precision and grounded generation
GPQA Diamond
Graduate-level science questions (physics, chemistry, biology) requiring multi-step reasoning. Ideal for generation-only evaluation.
- Source: Idavidrein/gpqa (gpqa_diamond config)
- Default import size: 60 questions
- Includes: Questions and correct answers (no passages)
- Best for: Testing LLM reasoning and accuracy without retrieval
Importing from the Platform
1. Navigate to Benchmarks
2. Click Import from Hugging Face
3. Select a dataset (MS MARCO or GPQA Diamond)
4. Confirm the import
For MS MARCO, Vecta creates both a data source (containing the passages as chunks) and a benchmark (containing the Q&A pairs with ground-truth citations). For GPQA Diamond, only a benchmark is created since it's a generation-only dataset.
Importing from the SDK
Use the BenchmarkDatasetImporter class:
from vecta import BenchmarkDatasetImporter

importer = BenchmarkDatasetImporter()

# MS MARCO — returns (chunks, benchmark_entries)
chunks, entries = importer.import_msmarco(
    split="test",
    max_items=100,
)
print(f"Chunks: {len(chunks)}")
print(f"Benchmark entries: {len(entries)}")

# GPQA Diamond — returns (empty chunks, benchmark_entries)
chunks, entries = importer.import_gpqa_diamond(
    split="train",
    max_items=60,
)

# chunks will be empty for generation-only datasets
print(f"Benchmark entries: {len(entries)}")
Using Imported Data with the Local Client
from vecta import VectaClient, BenchmarkDatasetImporter

importer = BenchmarkDatasetImporter()
chunks, entries = importer.import_msmarco(max_items=50)

# Create a Vecta client and set the benchmark entries directly
vecta = VectaClient(data_source_connector=None)
vecta.benchmark_entries = entries

# Now evaluate
results = vecta.evaluate_generation_only(
    my_generation_function,
    evaluation_name="msmarco-gen-test",
)
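The example above assumes you already have a `my_generation_function`. The exact callback signature Vecta expects is not shown here; as a minimal sketch, assuming the function receives a question string and returns an answer string, it might look like:

```python
# Hypothetical generation function for evaluate_generation_only.
# ASSUMPTION: Vecta calls it with a question string and expects an
# answer string back — check the SDK reference for the real signature.
def my_generation_function(question: str) -> str:
    # In practice this would call your LLM; a canned reply keeps the
    # sketch self-contained and runnable.
    return f"Answer to: {question}"
```

In a real run you would replace the body with a call to your model or generation pipeline.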
Authentication
Some Hugging Face datasets require authentication. Set the HUGGINGFACE_HUB_TOKEN environment variable:
export HUGGINGFACE_HUB_TOKEN="hf_..."
Or log in interactively:
huggingface-cli login
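If you would rather configure the token from within a script than from the shell, setting the environment variable before the importer runs also works (the token value here is the same placeholder as above):

```python
import os

# Placeholder token — substitute your real Hugging Face token.
# Set this before constructing BenchmarkDatasetImporter so the
# Hugging Face client picks it up.
os.environ["HUGGINGFACE_HUB_TOKEN"] = "hf_..."
```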
Next Steps
- Synthetic Generation — Generate benchmarks from your own data
- Evaluations — Run evaluations against imported benchmarks