TL;DR: How to set up private RAG with Verba + Tinfoil in 5 steps:
- Clone the Tinfoil fork of Verba from GitHub.
- Install dependencies and set up Weaviate (local or cloud).
- Configure Tinfoil API keys for both embeddings and chat models.
- Import your documents through Verba’s interface.
- Start chatting with your private knowledge base!
Introduction
Verba is an open-source Retrieval Augmented Generation (RAG) application that lets you chat with your own documents using AI. The Tinfoil fork extends Verba to use Tinfoil’s private inference API. This integration brings together Verba for document ingestion and RAG pipeline management, Weaviate as your vector database for semantic search, and Tinfoil’s confidential computing infrastructure for private embeddings and chat completions.
Prerequisites
Before you begin, make sure you have a Tinfoil API key from the Tinfoil Dashboard, Git installed to clone the repository, and Docker for running the containerized Weaviate deployment.
You’re billed for all usage of the Tinfoil Inference API. See Tinfoil pricing for current rates.
Installation and Setup
Step 1: Clone the Tinfoil Verba Fork
The Tinfoil fork includes pre-configured integrations for Tinfoil’s API endpoints.
Step 2: Set Tinfoil API key
Step 3: Bring up Docker Compose
Choosing Your Models
The compose file ships with llama3-3-70b (chat) and nomic-embed-text (embeddings) pre-wired through Tinfoil-CLI proxies. To swap models, edit OPENAI_MODEL, OPENAI_EMBED_MODEL, and the --repo and --host flags for llm-proxy and embed-proxy in docker-compose.yml, then run docker-compose up -d again.
See our chat models and embedding models pages for the available models and their capabilities.
Running Verba
docker-compose up -d brings up Verba (on port 8000), Weaviate, and the Tinfoil proxies that handle attested inference and embeddings. Open http://localhost:8000 in your browser.
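The containers can take a little while to start accepting connections after docker-compose up -d returns. As a minimal sketch (not part of Verba or the Tinfoil fork), the helper below polls a URL until it answers. The function name wait_for_http and its timeout defaults are hypothetical choices for illustration; only the address http://localhost:8000 comes from this guide.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until it returns any HTTP response or `timeout_s` elapses.

    Hypothetical helper for illustration; the defaults are assumptions.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered with an error status, so it is reachable.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the service is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_for_http("http://localhost:8000") returns True once the Verba port responds and False if nothing answers within the timeout. It only checks reachability, not that the Tinfoil proxies behind Verba are healthy.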
Using Verba with Tinfoil
Document Import and Processing
Navigate to the “Import Data” section to upload individual files, import entire directories, or pull content from URLs. Verba supports PDF, DOCX, TXT, MD, and HTML files.
Verba processes documents through an automated pipeline that extracts text, chunks content into smaller segments, generates embeddings, and stores vectors in Weaviate. The chunking step splits long documents into manageable pieces that can be retrieved without overwhelming the context window. You can customize these processing settings in the Verba UI.
Chunk size determines segment size in tokens (roughly equivalent to words). 512 tokens works well for most use cases. Use 256 tokens for more precise retrieval of dense technical content, or 1024 tokens to capture larger concepts.
Chunk overlap ensures important information isn’t lost at chunk boundaries. Setting this to around 20% of your chunk size creates a buffer where adjacent chunks share content. With 512-token chunks, a 100-token overlap means each chunk shares its last 100 tokens with the next chunk.
Verba generates embeddings using Tinfoil’s nomic-embed-text model within confidential computing enclaves. These embeddings are stored in Weaviate as a searchable vector database.
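To make the chunk size and overlap arithmetic concrete, here is a minimal sliding-window sketch. It is not Verba's actual chunker: the function name chunk_tokens is hypothetical, and it treats any list of strings as "tokens" (for example, whitespace-split words), which only approximates a real tokenizer.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 512, overlap: int = 100) -> List[List[str]]:
    """Split `tokens` into windows of `chunk_size` that share `overlap` tokens.

    Illustrative only; Verba's own chunking may differ.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # each window starts this far after the previous one
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


if __name__ == "__main__":
    words = [f"w{i}" for i in range(1000)]  # stand-in for a tokenized document
    chunks = chunk_tokens(words, chunk_size=512, overlap=100)
    print(len(chunks))                          # number of chunks produced
    print(chunks[0][-100:] == chunks[1][:100])  # adjacent chunks share 100 tokens
```

With a chunk size of 512 and an overlap of 100, each new chunk starts 412 tokens after the previous one, so a larger overlap produces more chunks (and more embedding calls) for the same document.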

