Overview

Understanding data source types and connectors

Last updated: August 20, 2025
Category: data-sources

Data Sources

A data source is your connection to the knowledge base that your RAG system retrieves from. Vecta supports two categories: vector databases and file stores.

Vector Databases

Connect an existing vector database where your chunks are already stored and embedded. Vecta reads the chunks and metadata through a connector.

Supported databases:

| Database | SDK Connector | Server Managed |
|---|---|---|
| ChromaDB (local) | ChromaLocalConnector | ✅ (upload via API) |
| ChromaDB (cloud) | ChromaCloudConnector | ❌ (reads directly) |
| Pinecone | PineconeConnector | ❌ (reads directly) |
| pgvector | PgVectorConnector | ❌ (reads directly) |
| Weaviate | WeaviateConnector | ❌ (reads directly) |
| Azure Cosmos DB | AzureCosmosConnector | ❌ (reads directly) |
| Databricks | DatabricksConnector | ❌ (reads directly) |
| LangChain | LangChainVectorStoreConnector | ✅ (upload via API) |
| LlamaIndex | LlamaIndexConnector | ✅ (upload via API) |

Every vector-database connector requires a VectorDBSchema that tells Vecta how to extract id, content, source_path, and page_nums from each record. See Accessor Syntax.
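To make the idea of an accessor concrete, here is a minimal sketch of how a dotted path such as `metadata.source_path` could be resolved against a record. This is an illustration only, not Vecta's implementation: `resolve_accessor` and the sample record are hypothetical, and the real accessor syntax may support more than dotted dictionary keys (see Accessor Syntax).

```python
from typing import Any


def resolve_accessor(record: dict[str, Any], accessor: str) -> Any:
    """Follow a dotted path such as "metadata.source_path" through nested dicts.

    Hypothetical helper for illustration; returns None when the path is missing.
    """
    value: Any = record
    for key in accessor.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # the path does not exist in this record
        value = value[key]
    return value


# Hypothetical record, shaped like the Chroma example on this page.
record = {
    "id": "chunk-001",
    "document": "Quarterly revenue grew 12%.",
    "metadata": {"source_path": "report.pdf", "page_nums": [3, 4]},
}

# One accessor per field that Vecta needs from each record.
accessors = {
    "id": "id",
    "content": "document",
    "source_path": "metadata.source_path",
    "page_nums": "metadata.page_nums",
}

extracted = {field: resolve_accessor(record, path) for field, path in accessors.items()}
print(extracted)
```

Running a check like this against a few of your own records before connecting can reveal accessors that point at missing keys.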

File Stores

Upload files directly and Vecta will ingest them with markitdown, then chunk and embed them automatically. Supported formats include PDF, DOCX, PPTX, XLSX, TXT, HTML, and more.

| Store | SDK Connector | Notes |
|---|---|---|
| Local files | FileStoreConnector | Ingested with markitdown |
| Upload via API | VectaAPIClient.upload_local_files() | Server handles embedding |

File store connectors do not require a VectorDBSchema. Vecta handles the schema internally because it controls how the chunks are created and stored.
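Before uploading a batch, it can help to separate files whose formats are named on this page from everything else. The sketch below is a hypothetical pre-upload check, not part of the Vecta SDK. Because the page lists formats "and more", `LISTED_EXTENSIONS` is deliberately incomplete, so an "unlisted" file is not necessarily unsupported.

```python
from pathlib import Path

# Formats named on this page; the real supported set is larger ("and more").
LISTED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".xlsx", ".txt", ".html"}


def split_by_listed_format(paths):
    """Return (listed, unlisted) paths, matching extensions case-insensitively."""
    listed, unlisted = [], []
    for p in paths:
        if Path(p).suffix.lower() in LISTED_EXTENSIONS:
            listed.append(p)
        else:
            unlisted.append(p)
    return listed, unlisted


listed, unlisted = split_by_listed_format(
    ["report.PDF", "manual.docx", "notes.xyz", "README"]
)
print("listed:", listed)
print("check manually:", unlisted)
```

The `listed` paths can then be passed as `file_paths` to `upload_local_files()`, while the rest are reviewed by hand.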

Connecting via the Platform

In the Data Sources dashboard you can:

  1. Upload files — Drag and drop PDF, DOCX, and other files
  2. Connect a vector database — Provide connection credentials and configure the schema through the UI
  3. Import from Hugging Face — Pull standard datasets (see Hugging Face)

Connecting via the SDK

API Client (hosted)

```python
from vecta import VectaAPIClient

client = VectaAPIClient()

# Upload local files; the server ingests, chunks, and embeds them
ds = client.upload_local_files(file_paths=["report.pdf", "manual.docx"])

# Connect Pinecone
ds = client.connect_pinecone(
    api_key="pk-...",
    index_name="my-index",
)

# Connect Chroma Cloud
ds = client.connect_chroma_cloud(
    tenant="my-tenant",
    database_name="my-db",
    api_key="ck-...",
    collection_name="documents",
)
```
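The API keys in the calls above are placeholders. One common way to keep real keys out of source code is to read them from environment variables, as in this minimal sketch. The helper names and the variable names `PINECONE_API_KEY` and `PINECONE_INDEX_NAME` are assumptions for illustration; Vecta does not prescribe them.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise if it is unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before connecting")
    return value


def pinecone_settings() -> dict:
    """Collect keyword arguments for connect_pinecone from the environment.

    The variable names here are hypothetical; use whatever your deployment defines.
    """
    return {
        "api_key": require_env("PINECONE_API_KEY"),
        "index_name": require_env("PINECONE_INDEX_NAME"),
    }


# Intended usage with the hosted client (not executed in this sketch):
# ds = client.connect_pinecone(**pinecone_settings())
```

Failing fast with a clear message when a variable is missing tends to be easier to debug than an authentication error returned by the remote service.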

Local Client

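```python
import chromadb
from vecta import VectaClient, ChromaLocalConnector, VectorDBSchema

# A local Chroma client; the storage path is an assumption, use your own
chroma_client = chromadb.PersistentClient(path="./chroma_db")

# Tell Vecta where each field lives inside a Chroma record
schema = VectorDBSchema(
    id_accessor="id",
    content_accessor="document",
    metadata_accessor="metadata",
    source_path_accessor="metadata.source_path",
    page_nums_accessor="metadata.page_nums",
)

connector = ChromaLocalConnector(
    client=chroma_client,
    collection_name="my_docs",
    schema=schema,
)

vecta = VectaClient(data_source_connector=connector)
vecta.load_knowledge_base()
```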
