# Data Sources

Understanding data source types and connectors.

## Overview
A data source is your connection to the knowledge base that your RAG system retrieves from. Vecta supports two categories: vector databases and file stores.
## Vector Databases
Connect an existing vector database where your chunks are already stored and embedded. Vecta reads the chunks and metadata through a connector.
Supported databases:
| Database | SDK Connector | Server-Managed Ingestion |
|---|---|---|
| ChromaDB (local) | `ChromaLocalConnector` | ✅ (upload via API) |
| ChromaDB (cloud) | `ChromaCloudConnector` | ❌ (reads directly) |
| Pinecone | `PineconeConnector` | ❌ (reads directly) |
| pgvector | `PgVectorConnector` | ❌ (reads directly) |
| Weaviate | `WeaviateConnector` | ❌ (reads directly) |
| Azure Cosmos DB | `AzureCosmosConnector` | ❌ (reads directly) |
| Databricks | `DatabricksConnector` | ❌ (reads directly) |
| LangChain | `LangChainVectorStoreConnector` | ✅ (upload via API) |
| LlamaIndex | `LlamaIndexConnector` | ✅ (upload via API) |
Every vector-database connector requires a `VectorDBSchema` that tells Vecta how to extract `id`, `content`, `source_path`, and `page_nums` from each record. See Accessor Syntax.
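Accessor paths use dotted notation to walk nested fields in a record. As a rough illustration of the idea (not Vecta's actual implementation, and the record layout below is hypothetical), resolving a path such as `metadata.source_path` amounts to:

```python
def resolve_accessor(record: dict, path: str):
    """Follow a dotted accessor path (e.g. 'metadata.source_path') through nested dicts."""
    value = record
    for key in path.split("."):
        value = value[key]
    return value

# A hypothetical chunk record, as a vector database might return it
record = {
    "id": "chunk-42",
    "document": "Quarterly revenue grew 12%.",
    "metadata": {"source_path": "reports/q3.pdf", "page_nums": [4]},
}

resolve_accessor(record, "document")              # "Quarterly revenue grew 12%."
resolve_accessor(record, "metadata.source_path")  # "reports/q3.pdf"
```

This is why the schema needs one accessor per field Vecta extracts: each connector only knows where a value lives once you spell out its path.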
## File Stores

Upload files directly and Vecta will ingest, chunk, and embed them automatically using `markitdown`. Supported formats include PDF, DOCX, PPTX, XLSX, TXT, HTML, and more.
| Store | SDK Connector | Notes |
|---|---|---|
| Local files | `FileStoreConnector` | Ingested with `markitdown` |
| Upload via API | `VectaAPIClient.upload_local_files()` | Server handles embedding |
File store connectors do not require a `VectorDBSchema`; Vecta handles the schema internally because it controls how the chunks are created and stored.
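Because Vecta creates the chunks itself, it already knows their shape, so no accessor mapping is needed. For intuition, here is a minimal sketch of fixed-size chunking with overlap; this is illustrative only, not Vecta's ingestion pipeline, which converts files with `markitdown` before chunking:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap to preserve context."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Every chunk produced this way has a known, uniform structure, which is exactly why a user-supplied schema is unnecessary for file stores.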
## Connecting via the Platform
In the Data Sources dashboard you can:
- Upload files — Drag and drop PDF, DOCX, and other files
- Connect a vector database — Provide connection credentials and configure the schema through the UI
- Import from Hugging Face — Pull standard datasets (see Hugging Face)
## Connecting via the SDK

### API Client (hosted)
```python
from vecta import VectaAPIClient

client = VectaAPIClient()

# Upload local files
ds = client.upload_local_files(file_paths=["report.pdf", "manual.docx"])

# Connect Pinecone
ds = client.connect_pinecone(
    api_key="pk-...",
    index_name="my-index",
)

# Connect Chroma Cloud
ds = client.connect_chroma_cloud(
    tenant="my-tenant",
    database_name="my-db",
    api_key="ck-...",
    collection_name="documents",
)
```
### Local Client
```python
from vecta import VectaClient, ChromaLocalConnector, VectorDBSchema

schema = VectorDBSchema(
    id_accessor="id",
    content_accessor="document",
    metadata_accessor="metadata",
    source_path_accessor="metadata.source_path",
    page_nums_accessor="metadata.page_nums",
)

# chroma_client is an existing ChromaDB client instance
connector = ChromaLocalConnector(
    client=chroma_client,
    collection_name="my_docs",
    schema=schema,
)

vecta = VectaClient(data_source_connector=connector)
vecta.load_knowledge_base()
```
## Next Steps
- File Store Connectors — Details on file ingestion
- Vector DB Connectors — Connector reference for every database
- Accessor Syntax — How to write schema paths