File Store Connectors

Upload documents directly to Vecta. We'll handle processing, chunking, and indexing.

Quick Example

from vecta import VectaAPIClient

client = VectaAPIClient(api_key="your-key")

# Upload files from your computer
data_source = client.upload_local_files(
    file_paths=[
        "docs/manual.pdf",
        "docs/guide.docx",
        "docs/faq.txt"
    ]
)

# Create benchmark from uploaded files
benchmark = client.create_benchmark(
    data_source_id=data_source["id"],
    questions_count=50
)

Supported File Types

PDF - .pdf
Word - .doc, .docx
Text - .txt, .md
More formats coming soon
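To catch unsupported files before uploading, you can check them on the client against the extensions listed above. The sketch below uses a hypothetical helper, split_by_support, which is not part of the Vecta SDK. Its extension set only mirrors the list above, so it will need updating as new formats are added.

```python
from pathlib import Path

# Mirrors the "Supported File Types" list above.
# Hypothetical helper for illustration, not a Vecta API.
SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt", ".md"}


def split_by_support(file_paths):
    """Return (supported, unsupported) path lists, matching extensions case-insensitively."""
    supported, unsupported = [], []
    for path in file_paths:
        suffix = Path(path).suffix.lower()  # "" for files without an extension
        if suffix in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


if __name__ == "__main__":
    ok, skipped = split_by_support(
        ["docs/manual.pdf", "docs/guide.docx", "docs/faq.txt", "docs/slides.pptx"]
    )
    print("upload:", ok)
    print("skip:", skipped)
```

You could then pass the supported list as file_paths to client.upload_local_files.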

Local SDK

from vecta import VectaClient
from vecta.connectors.file_store_connector import FileStoreConnector

connector = FileStoreConnector(
    file_paths=[
        "manual.pdf",
        "guide.docx"
    ],
    base_path="/path/to/files"
)

vecta = VectaClient(
    vector_db_connector=connector,
    openai_api_key="your-key"
)

vecta.load_knowledge_base()

How It Works

Upload files to Vecta
Automatic processing - We extract text and preserve structure
Smart chunking - Documents split into semantic chunks
Auto-indexing - Chunks stored with metadata (filename, page numbers)

Chunking Strategy

Files are chunked page by page using the thepipe library:

# Each chunk represents one page
{
    "id": "manual.pdf_0_abc123",
    "content": "Page content here...",
    "metadata": {
        "source_path": "manual.pdf",
        "page_nums": 0,  # Zero-indexed page number
        "file_path": "manual.pdf"
    }
}
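If you want to inspect or mock chunks locally, a small function can reproduce the record shape above. This is a sketch that rests on two assumptions. First, page texts arrive already extracted, so the thepipe call is not shown. Second, the trailing id segment (abc123 above) is treated as a short content hash. Vecta's actual id scheme may differ, and make_page_chunks is a hypothetical name.

```python
import hashlib


def make_page_chunks(file_path, pages):
    """Build one chunk dict per page, mirroring the record shape shown above.

    Assumptions: `pages` is a list of already-extracted page texts, and the
    id suffix is a 6-character SHA-256 prefix of the page content, used here
    as an illustrative stand-in for Vecta's real suffix.
    """
    chunks = []
    for page_index, content in enumerate(pages):  # zero-indexed, like page_nums
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:6]
        chunks.append(
            {
                "id": f"{file_path}_{page_index}_{digest}",
                "content": content,
                "metadata": {
                    "source_path": file_path,
                    "page_nums": page_index,
                    "file_path": file_path,
                },
            }
        )
    return chunks


if __name__ == "__main__":
    for chunk in make_page_chunks("manual.pdf", ["Intro page", "Setup page"]):
        print(chunk["id"], chunk["metadata"]["page_nums"])
```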

Limitations

No semantic search - File stores don't support similarity search
For benchmark generation only - Use a vector database for retrieval evaluation
Page-level chunks - One chunk per page
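A file store has no embeddings, so a chunk can be reached only through its metadata and not through similarity to a query. The sketch below illustrates this with a hypothetical in-memory lookup, find_page, keyed on source_path and page_nums. It is not a Vecta API, and the chunk list is toy data in the documented shape. A whole page is the smallest unit it can return.

```python
def find_page(chunks, source_path, page_num):
    """Return the chunk for one page of one file, or None if absent.

    Exact metadata match only: there is no query embedding and no ranking,
    and the smallest unit that can be returned is a whole page.
    """
    for chunk in chunks:
        meta = chunk["metadata"]
        if meta["source_path"] == source_path and meta["page_nums"] == page_num:
            return chunk
    return None


if __name__ == "__main__":
    # Toy chunks in the documented shape (illustrative ids and content).
    chunks = [
        {"id": "manual.pdf_0_abc123", "content": "Page content here...",
         "metadata": {"source_path": "manual.pdf", "page_nums": 0, "file_path": "manual.pdf"}},
        {"id": "manual.pdf_1_def456", "content": "Second page...",
         "metadata": {"source_path": "manual.pdf", "page_nums": 1, "file_path": "manual.pdf"}},
    ]
    print(find_page(chunks, "manual.pdf", 1)["id"])
```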

Use Cases

✅ Good for:

Quick benchmark generation from documents
Testing with new document sets
Prototyping without database setup

❌ Not for:

Evaluation at finer than page-level granularity
Evaluating different chunking strategies
Retrieval evaluation

Next Steps

To see chunk-level evaluation metrics, connect to a vector database (see Vector DB Connectors).

Vector Databases - For production systems
Benchmarks - Generate test datasets
