CSV Upload
Import existing Q&A datasets from CSV
Last updated: August 19, 2025
Category: benchmarks
Import existing Q&A datasets as benchmarks. Perfect for established test sets or human-curated questions.
CSV Format
A full benchmark CSV uses the columns below. Only question and answer are required; the rest are optional:
question,answer,chunk_ids,page_nums,source_paths
"What is the API rate limit?","1000 requests per minute","chunk_1|chunk_2","15|16","api_docs.pdf"
"How do I authenticate?","Use JWT tokens","chunk_3","20","api_docs.pdf"
Required Columns
- question: The question text
- answer: The expected answer
Optional Columns
- chunk_ids: Pipe-separated chunk IDs (e.g., "id1|id2|id3")
- page_nums: Pipe-separated page numbers (e.g., "1|2|3")
- source_paths: Pipe-separated document names (e.g., "doc1.pdf|doc2.pdf")
- id: Unique benchmark entry ID (auto-generated if missing)
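To make the column rules concrete, here is a minimal sketch that checks a CSV for the required columns and splits the pipe-separated fields into lists. It uses only the Python standard library; `parse_benchmark_csv` is a hypothetical helper written for illustration and is not part of the vecta package.

```python
import csv
import io

REQUIRED = ("question", "answer")
PIPE_COLUMNS = ("chunk_ids", "page_nums", "source_paths")


def parse_benchmark_csv(text):
    """Parse benchmark CSV text into a list of row dicts.

    Hypothetical helper for illustration; not part of the vecta package.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED if col not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    rows = []
    for row in reader:
        parsed = {"question": row["question"], "answer": row["answer"]}
        for col in PIPE_COLUMNS:
            # Absent or empty optional columns become empty lists.
            raw = row.get(col) or ""
            parsed[col] = [value for value in raw.split("|") if value]
        rows.append(parsed)
    return rows


sample = (
    "question,answer,chunk_ids,page_nums,source_paths\n"
    '"What is the API rate limit?","1000 requests per minute",'
    '"chunk_1|chunk_2","15|16","api_docs.pdf"\n'
)
rows = parse_benchmark_csv(sample)
print(rows[0]["chunk_ids"])  # ['chunk_1', 'chunk_2']
```

Running a check like this before uploading can catch a misspelled header or a stray delimiter early.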
Upload with Cloud API
from vecta import VectaAPIClient
client = VectaAPIClient(api_key="your-key")
# Upload CSV via web dashboard at:
# https://runvecta.com/platform/benchmarks
Load with Local SDK
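from vecta import VectaClient
# `connector` is your vector database connector, created beforehand
vecta = VectaClient(vector_db_connector=connector)
vecta.load_benchmark("your_benchmark.csv")
# Use in evaluations; `my_retriever` is your own retrieval function
results = vecta.evaluate_retrieval(my_retriever)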
Example CSV
id,question,answer,chunk_ids,page_nums,source_paths
q1,"What are the supported file types?","PDF, DOCX, TXT, and MD files","chunk_101|chunk_102","5|6","user_guide.pdf"
q2,"How do I reset my password?","Click 'Forgot Password' on the login page","chunk_203","12","user_guide.pdf"
q3,"What is the maximum file size?","25 MB per file","chunk_304","8","user_guide.pdf"
CSV from Existing Benchmark
Export a benchmark to see the format:
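# Export benchmark to CSV (`client` is the VectaAPIClient created above)
csv_data = client.export_benchmark("benchmark-id")
# Write as UTF-8; newline="" keeps the CSV's own line endings unchanged
with open("example.csv", "w", encoding="utf-8", newline="") as f:
    f.write(csv_data)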
Generation-Only Benchmarks
For generation-only evaluation (no retrieval), omit chunk/page/document columns:
question,answer
"What is 2+2?","4"
"Who wrote Hamlet?","William Shakespeare"
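If you already have a full benchmark CSV, one way to derive a generation-only version is to keep just the two required columns. The sketch below uses the standard library only; `to_generation_only` is a hypothetical helper, not a vecta function.

```python
import csv
import io


def to_generation_only(csv_text):
    """Return CSV text containing only the question and answer columns.

    Hypothetical helper for illustration; not part of the vecta package.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO(newline="")
    writer = csv.DictWriter(
        out,
        fieldnames=["question", "answer"],
        extrasaction="ignore",  # silently drop id, chunk_ids, page_nums, ...
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


full = (
    "id,question,answer,chunk_ids,page_nums,source_paths\n"
    'q1,"Who wrote Hamlet?","William Shakespeare","chunk_1","3","plays.pdf"\n'
)
print(to_generation_only(full))
```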
Common Issues
Delimiter errors:
- Use commas for CSV columns
- Use pipes (|) for multiple values within a column
- Quote fields containing commas
Encoding:
- Save as UTF-8
- Handle special characters properly
Empty fields:
- Use empty string for missing optional fields
- Don't use null or None
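Most of these issues can be avoided by generating the file with a CSV library instead of concatenating strings. The following is a minimal sketch using Python's standard `csv` module; `to_benchmark_csv` and the shape of the `entries` dicts are assumptions made for illustration, not part of the vecta package.

```python
import csv
import io

FIELDS = ["question", "answer", "chunk_ids", "page_nums", "source_paths"]


def to_benchmark_csv(entries):
    """Serialize benchmark entries to CSV text.

    Hypothetical helper for illustration; not part of the vecta package.
    Each entry is a dict with "question" and "answer" strings and,
    optionally, lists under "chunk_ids", "page_nums", and "source_paths".
    """
    buf = io.StringIO(newline="")
    # The default quoting mode quotes any field that contains a comma,
    # a double quote, or a line break.
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        row = {"question": entry["question"], "answer": entry["answer"]}
        for col in FIELDS[2:]:
            # Missing optional values become "", never "null" or "None".
            values = entry.get(col) or []
            row[col] = "|".join(str(value) for value in values)
        writer.writerow(row)
    return buf.getvalue()


entries = [
    {
        "question": "What are the supported file types?",
        "answer": "PDF, DOCX, TXT, and MD files",
        "chunk_ids": ["chunk_101", "chunk_102"],
        "page_nums": [5, 6],
        "source_paths": ["user_guide.pdf"],
    },
    {"question": "What is the maximum file size?", "answer": "25 MB per file"},
]
csv_text = to_benchmark_csv(entries)
print(csv_text)
```

When saving the result, open the file with `encoding="utf-8"` and `newline=""` so that special characters and line endings are written unchanged.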
Next Steps
- Synthetic Generation: Auto-generate benchmarks
- Hugging Face: Use popular datasets
- Evaluations: Run evaluations