CSV Upload

Import existing Q&A datasets as benchmarks. This suits established test sets and human-curated questions.

CSV Format

A CSV that uses both required columns and the retrieval-related optional columns looks like this:

question,answer,chunk_ids,page_nums,source_paths
"What is the API rate limit?","1000 requests per minute","chunk_1|chunk_2","15|16","api_docs.pdf"
"How do I authenticate?","Use JWT tokens","chunk_3","20","api_docs.pdf"

Required Columns

question: The question text
answer: The expected answer

Optional Columns

chunk_ids: Pipe-separated chunk IDs (e.g., "id1|id2|id3")
page_nums: Pipe-separated page numbers (e.g., "1|2|3")
source_paths: Pipe-separated document names (e.g., "doc1.pdf|doc2.pdf")
id: Unique benchmark entry ID (auto-generated if missing)
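
The column rules above can be sketched with Python's standard csv module. This is a minimal illustration, not part of the Vecta SDK: the helper name to_benchmark_csv and the in-memory entries are assumptions made for the example. It joins list values with pipes and leaves missing optional fields empty.

```python
import csv
import io

# Column order used in the examples on this page.
COLUMNS = ["question", "answer", "chunk_ids", "page_nums", "source_paths"]

def to_benchmark_csv(entries):
    """Serialize a list of dicts into benchmark CSV text (hypothetical helper).

    List values are joined with pipes; missing optional fields become empty strings.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        row = {}
        for column in COLUMNS:
            value = entry.get(column, "")
            if isinstance(value, (list, tuple)):
                value = "|".join(str(item) for item in value)
            row[column] = value
        writer.writerow(row)
    return buffer.getvalue()

entries = [
    {
        "question": "What is the API rate limit?",
        "answer": "1000 requests per minute",
        "chunk_ids": ["chunk_1", "chunk_2"],
        "page_nums": [15, 16],
        "source_paths": ["api_docs.pdf"],
    },
    # No retrieval metadata: the optional columns stay empty.
    {"question": "How do I authenticate?", "answer": "Use JWT tokens, not passwords"},
]
csv_text = to_benchmark_csv(entries)
print(csv_text)
```

The csv writer quotes a field only when it needs to, so the answer containing a comma is quoted automatically while pipe-separated values are written as-is.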

Upload with Cloud API

from vecta import VectaAPIClient

client = VectaAPIClient(api_key="your-key")

# Upload CSV via web dashboard at:
# https://runvecta.com/platform/benchmarks

Load with Local SDK

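from vecta import VectaClient

# `connector` is your existing vector database connector, defined elsewhere.
vecta = VectaClient(vector_db_connector=connector)
vecta.load_benchmark("your_benchmark.csv")

# Use in evaluations; `my_retriever` is your own retriever, defined elsewhere.
results = vecta.evaluate_retrieval(my_retriever)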

Example CSV

id,question,answer,chunk_ids,page_nums,source_paths
q1,"What are the supported file types?","PDF, DOCX, TXT, and MD files","chunk_101|chunk_102","5|6","user_guide.pdf"
q2,"How do I reset my password?","Click 'Forgot Password' on the login page","chunk_203","12","user_guide.pdf"
q3,"What is the maximum file size?","25 MB per file","chunk_304","8","user_guide.pdf"

CSV from Existing Benchmark

Export a benchmark to see the format:

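# Export benchmark to CSV (`client` is the VectaAPIClient created earlier)
csv_data = client.export_benchmark("benchmark-id")
# Write as UTF-8 and keep the line endings exactly as exported
with open("example.csv", "w", encoding="utf-8", newline="") as f:
    f.write(csv_data)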
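
To inspect a CSV in this format, the standard library is enough. The sketch below is an assumption-laden illustration rather than Vecta functionality: parse_benchmark_csv is a hypothetical helper, and the sample text is copied from the example rows on this page. It splits the pipe-separated columns into lists and treats absent or empty columns as empty lists.

```python
import csv
import io

# Columns that hold pipe-separated values.
PIPE_COLUMNS = ("chunk_ids", "page_nums", "source_paths")

def parse_benchmark_csv(csv_text):
    """Parse benchmark CSV text into dicts (hypothetical helper).

    Pipe-separated columns become lists; empty or missing ones become [].
    """
    entries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in PIPE_COLUMNS:
            raw = row.get(column) or ""
            row[column] = raw.split("|") if raw else []
        entries.append(row)
    return entries

sample = (
    "id,question,answer,chunk_ids,page_nums,source_paths\n"
    'q1,"What are the supported file types?","PDF, DOCX, TXT, and MD files",'
    '"chunk_101|chunk_102","5|6","user_guide.pdf"\n'
    'q3,"What is the maximum file size?","25 MB per file","chunk_304","8","user_guide.pdf"\n'
)
parsed = parse_benchmark_csv(sample)
print(parsed[0]["chunk_ids"])
```

Values stay as strings after parsing, so page numbers would need an explicit int() conversion if you want to compare them numerically.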

Generation-Only Benchmarks

For generation-only evaluation (no retrieval), omit the chunk_ids, page_nums, and source_paths columns:

question,answer
"What is 2+2?","4"
"Who wrote Hamlet?","William Shakespeare"

Common Issues

Delimiter errors:

Use commas for CSV columns
Use pipes (|) for multiple values within a column
Quote fields containing commas

Encoding:

Save the file as UTF-8
Escape a double quote inside a quoted field by doubling it ("")

Empty fields:

Leave missing optional fields empty
Don't write the literal text null or None
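
The issues listed above can be caught before uploading with a small pre-flight check. This is a rough sketch using only the standard library; find_csv_problems is a hypothetical helper, not a Vecta API, and the checks reflect one reading of the rules on this page. It only looks for null or None in optional columns, since those words could be a legitimate question or answer.

```python
import csv
import io

REQUIRED_COLUMNS = ("question", "answer")
NULL_LITERALS = {"null", "none"}

def find_csv_problems(raw_bytes):
    """Return a list of problems found in benchmark CSV bytes (empty if none)."""
    try:
        # utf-8-sig also accepts files that start with a byte-order mark.
        text = raw_bytes.decode("utf-8-sig")
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]

    reader = csv.DictReader(io.StringIO(text, newline=""))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return ["missing required columns: " + ", ".join(missing)]

    problems = []
    for index, row in enumerate(reader, start=1):
        # DictReader stores surplus fields under the key None.
        if None in row:
            problems.append(f"data row {index}: more fields than columns (unquoted comma?)")
        for column, value in row.items():
            if column is None or column in REQUIRED_COLUMNS or value is None:
                continue
            if value.strip().lower() in NULL_LITERALS:
                problems.append(f"data row {index}: '{column}' holds the literal {value.strip()!r}")
    return problems

bad = (
    b"question,answer,chunk_ids\n"
    b"How do I authenticate?,Use JWT tokens,None\n"
    b"What is the rate limit?,1,000 requests,chunk_1\n"
)
for problem in find_csv_problems(bad):
    print(problem)
```

The second data row shows the delimiter problem: the unquoted comma in "1,000 requests" produces one field more than the header has columns.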

Next Steps

Synthetic Generation: Auto-generate benchmarks
Hugging Face: Use popular datasets
Evaluations: Run evaluations