File Stores
Ingest PDFs, docs, and other files directly
Last updated: August 19, 2025
Category: data-sources
File Store Connectors
Upload documents directly to Vecta. We'll handle processing, chunking, and indexing.
Quick Example
from vecta import VectaAPIClient

client = VectaAPIClient(api_key="your-key")

# Upload files from your computer
data_source = client.upload_local_files(
    file_paths=[
        "docs/manual.pdf",
        "docs/guide.docx",
        "docs/faq.txt",
    ]
)

# Create a benchmark from the uploaded files
benchmark = client.create_benchmark(
    data_source_id=data_source["id"],
    questions_count=50,
)
Supported File Types
- PDF - .pdf
- Word - .doc, .docx
- Text - .txt, .md
- More formats coming soon
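Before uploading, it can help to separate files Vecta can ingest from those it cannot. The following is a minimal sketch, not part of the Vecta SDK: `split_by_support` and `SUPPORTED_EXTENSIONS` are hypothetical names, and the extension set is copied from the list above on the assumption that matching is case-insensitive.

```python
from pathlib import Path

# Extensions copied from the supported file types list (assumed complete).
SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt", ".md"}


def split_by_support(file_paths):
    """Return (supported, unsupported) lists, preserving input order."""
    supported, unsupported = [], []
    for path in file_paths:
        # Assumption: compare case-insensitively so "REPORT.PDF" is accepted.
        if Path(path).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


supported, unsupported = split_by_support(
    ["docs/manual.pdf", "docs/guide.docx", "docs/diagram.png"]
)
print("Will upload:", supported)
print("Skipping:", unsupported)
```

The `supported` list can then be passed as `file_paths` in the Quick Example.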
Local SDK
from vecta import VectaClient
from vecta.connectors.file_store_connector import FileStoreConnector

# Point the connector at local files, resolved relative to base_path
connector = FileStoreConnector(
    file_paths=[
        "manual.pdf",
        "guide.docx",
    ],
    base_path="/path/to/files",
)

vecta = VectaClient(
    vector_db_connector=connector,
    openai_api_key="your-key",
)

vecta.load_knowledge_base()
How It Works
- Upload files to Vecta
- Automatic processing - We extract text and preserve structure
- Chunking - Documents are split into page-level chunks (one chunk per page)
- Auto-indexing - Chunks stored with metadata (filename, page numbers)
Chunking Strategy
Files are chunked page-by-page using thepipe:
# Each chunk represents one page
{
    "id": "manual.pdf_0_abc123",
    "content": "Page content here...",
    "metadata": {
        "source_path": "manual.pdf",
        "page_nums": 0,  # Zero-indexed page number
        "file_path": "manual.pdf"
    }
}
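To make the page-per-chunk shape concrete, here is a rough sketch that builds chunk dictionaries mirroring the structure shown above from a list of page texts. It does not call thepipe or Vecta. `make_page_chunks` is a hypothetical helper, and the six-character hash used for the ID suffix is an assumption, since the source does not say how that suffix is generated.

```python
import hashlib


def make_page_chunks(filename, pages):
    """Build one chunk dict per page, mirroring the documented structure."""
    chunks = []
    for page_num, text in enumerate(pages):  # zero-indexed, as in the example
        # ASSUMPTION: a short content hash stands in for the real ID suffix.
        suffix = hashlib.sha1(text.encode("utf-8")).hexdigest()[:6]
        chunks.append({
            "id": f"{filename}_{page_num}_{suffix}",
            "content": text,
            "metadata": {
                "source_path": filename,
                "page_nums": page_num,
                "file_path": filename,
            },
        })
    return chunks


chunks = make_page_chunks("manual.pdf", ["Introduction...", "Installation..."])
for chunk in chunks:
    print(chunk["id"], "-> page", chunk["metadata"]["page_nums"])
```

A two-page document yields exactly two chunks, which is why chunk boundaries cannot be tuned with this connector.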
Limitations
- No semantic search - File stores don't support similarity search
- For benchmark generation only - Use a vector database connector for retrieval evaluation
- Page-level chunks - One chunk per page
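Without similarity search, chunks can still be looked up by their metadata. The sketch below assumes chunks are plain dictionaries shaped like the example in Chunking Strategy. `chunks_for_file` is a hypothetical helper, not a Vecta API, and the sample data is made up for illustration.

```python
def chunks_for_file(chunks, source_path):
    """Return the chunks of one file, ordered by zero-indexed page number."""
    matches = [c for c in chunks if c["metadata"]["source_path"] == source_path]
    return sorted(matches, key=lambda c: c["metadata"]["page_nums"])


# Illustrative sample data only, deliberately out of page order.
sample_chunks = [
    {"id": "manual.pdf_1_bbb222", "content": "Page two",
     "metadata": {"source_path": "manual.pdf", "page_nums": 1}},
    {"id": "faq.txt_0_ccc333", "content": "FAQ",
     "metadata": {"source_path": "faq.txt", "page_nums": 0}},
    {"id": "manual.pdf_0_aaa111", "content": "Page one",
     "metadata": {"source_path": "manual.pdf", "page_nums": 0}},
]

manual_pages = chunks_for_file(sample_chunks, "manual.pdf")
print([c["content"] for c in manual_pages])
```

This is exact-match filtering, not retrieval: it cannot rank chunks by relevance to a query.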
Use Cases
✅ Good for:
- Quick benchmark generation from documents
- Testing with new document sets
- Prototyping without database setup
❌ Not for:
- Maximum evaluation granularity
- Evaluating different chunking strategies
- Retrieval evaluation
Next Steps
To see chunk-level evaluation metrics, connect to a vector database (see Vector DB Connectors).
- Vector Databases - For production systems
- Benchmarks - Generate test datasets