File Stores

Ingest PDFs, docs, and other files directly

Last updated: August 19, 2025
Category: data-sources

File Store Connectors

Upload documents directly to Vecta. We'll handle processing, chunking, and indexing.

Quick Example

from vecta import VectaAPIClient

client = VectaAPIClient(api_key="your-key")

# Upload files from your computer
data_source = client.upload_local_files(
    file_paths=[
        "docs/manual.pdf",
        "docs/guide.docx",
        "docs/faq.txt"
    ]
)

# Create benchmark from uploaded files
benchmark = client.create_benchmark(
    data_source_id=data_source["id"],
    questions_count=50
)

Supported File Types

  • PDF - .pdf
  • Word - .doc, .docx
  • Text - .txt, .md
  • More formats coming soon
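
Before uploading, it can help to separate files that match the list above from those that do not. The following is a minimal sketch, not part of the Vecta SDK: `split_by_support` and `SUPPORTED_EXTENSIONS` are hypothetical names, and treating extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions copied from the supported-types list above.
SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt", ".md"}


def split_by_support(file_paths):
    """Return (supported, unsupported) lists, preserving input order."""
    supported, unsupported = [], []
    for path in file_paths:
        # Assumption: extension matching is case-insensitive.
        if Path(path).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


supported, unsupported = split_by_support(
    ["docs/manual.pdf", "docs/guide.docx", "docs/diagram.png"]
)
print("Supported:", supported)
print("Unsupported:", unsupported)
```

Only the `supported` list would then be passed as `file_paths` to the upload call.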

Local SDK

from vecta import VectaClient
from vecta.connectors.file_store_connector import FileStoreConnector

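# List the files to ingest
connector = FileStoreConnector(
    file_paths=[
        "manual.pdf",
        "guide.docx"
    ],
    base_path="/path/to/files"
)

# Pass the file store connector where a vector database connector would normally go
vecta = VectaClient(
    vector_db_connector=connector,
    openai_api_key="your-key"
)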

vecta.load_knowledge_base()
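
A quick pre-flight check can catch wrong paths before the connector is built. This sketch assumes each entry in `file_paths` is resolved relative to `base_path`, which is not confirmed here; `find_missing_files` is a hypothetical helper, not a Vecta function.

```python
from pathlib import Path


def find_missing_files(file_paths, base_path):
    """Return the entries of file_paths that are not files under base_path."""
    base = Path(base_path)
    return [p for p in file_paths if not (base / p).is_file()]


missing = find_missing_files(["manual.pdf", "guide.docx"], "/path/to/files")
if missing:
    print("Missing files:", missing)
```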

How It Works

  1. Upload files to Vecta
  2. Automatic processing - We extract text and preserve structure
  3. Chunking - Documents are split into page-level chunks (one chunk per page)
  4. Auto-indexing - Chunks stored with metadata (filename, page numbers)

Chunking Strategy

Files are chunked page-by-page using thepipe:

# Each chunk represents one page
{
    "id": "manual.pdf_0_abc123",
    "content": "Page content here...",
    "metadata": {
        "source_path": "manual.pdf",
        "page_nums": 0,  # Zero-indexed page number
        "file_path": "manual.pdf"
    }
}
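
To make the chunk layout concrete, here is a rough sketch that builds chunks of the same shape from a list of page texts. It does not call thepipe: `pages` stands in for text that has already been extracted, `build_page_chunks` is a hypothetical helper, and the ID suffix is assumed to be a short content hash because the actual scheme is not described here.

```python
import hashlib


def build_page_chunks(file_path, pages):
    """Build one chunk dict per page, mirroring the structure shown above."""
    chunks = []
    for page_num, text in enumerate(pages):  # zero-indexed, as in the example
        # Assumption: the ID suffix is a short hash of the page content.
        suffix = hashlib.sha1(text.encode("utf-8")).hexdigest()[:6]
        chunks.append({
            "id": f"{file_path}_{page_num}_{suffix}",
            "content": text,
            "metadata": {
                "source_path": file_path,
                "page_nums": page_num,
                "file_path": file_path,
            },
        })
    return chunks


chunks = build_page_chunks("manual.pdf", ["Intro text", "Setup steps"])
for chunk in chunks:
    print(chunk["id"], chunk["metadata"]["page_nums"])
```

A two-page document yields two chunks, with `page_nums` values 0 and 1.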

Limitations

  • No semantic search - File stores don't support similarity search, so retrieval can't be evaluated against them
  • For benchmark generation only - Use a vector database connector to run evaluations
  • Page-level chunks - One chunk per page

Use Cases

✅ Good for:

  • Quick benchmark generation from documents
  • Testing with new document sets
  • Prototyping without database setup

❌ Not for:

  • Maximum evaluation granularity
  • Evaluating different chunking strategies
  • Retrieval evaluation

Next Steps

To see chunk-level evaluation metrics, connect to a vector database (see Vector DB Connectors).
