TL;DR: How to set up private RAG with Verba + Tinfoil in 5 steps:

  1. Clone the Tinfoil fork of Verba from GitHub.
  2. Install dependencies and set up Weaviate (local or cloud).
  3. Configure Tinfoil API keys for both embeddings and chat models.
  4. Import your documents through Verba’s interface.
  5. Start chatting with your private knowledge base!

Your documents and conversations never leave your secure environment or get exposed to third parties.

Introduction

Verba is an open-source Retrieval Augmented Generation (RAG) application that lets you chat with your own documents using AI. The Tinfoil fork extends Verba to work seamlessly with Tinfoil’s private inference API, ensuring your documents and conversations remain completely confidential.

This integration combines:

  • Verba: Document ingestion, chunking, and RAG pipeline management
  • Weaviate: Vector database for semantic search capabilities
  • Tinfoil: Private, confidential computing for embeddings and chat completions

The result is a fully private knowledge base where your sensitive documents never leave your controlled environment.

Prerequisites

Before starting, you’ll need:

  • Tinfoil API Key: Get yours free at tinfoil.sh
  • Git: To clone the repository
  • Docker: For containerized Weaviate deployment

You’re billed for all usage of the Tinfoil Inference API. See Tinfoil Pricing for current rates.

Security Warning: Never share your API key, avoid including it in version control, and never bundle it in client-side code.

Installation and Setup

Step 1: Clone the Tinfoil Verba Fork

The Tinfoil fork includes pre-configured integrations for Tinfoil’s API endpoints:

git clone https://github.com/tinfoilsh/Verba.git
cd Verba
git checkout tinfoil

Step 2: Set the Tinfoil API Key

export TINFOIL_API_KEY=xxx
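To avoid re-exporting the key in every new shell, you can also place it in a `.env` file next to the compose file; Docker Compose reads `.env` automatically for variable substitution. (The variable name matches the export above; keep this file out of version control.)

```shell
# .env -- add this file to .gitignore, never commit it
TINFOIL_API_KEY=xxx
```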

Step 3: Start the Stack with Docker Compose

docker-compose up -d
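The fork ships its own docker-compose.yml, but for orientation, a compose file for this kind of stack typically wires together a Weaviate service and the Verba app, passing the API key through from your shell or `.env`. The sketch below is illustrative only; image tags, ports, and environment variables are assumptions, so defer to the file in the repository:

```yaml
# Illustrative sketch -- the Tinfoil fork ships its own docker-compose.yml.
services:
  weaviate:
    image: semitechnologies/weaviate:1.25.10
    ports:
      - "8080:8080"
    environment:
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      PERSISTENCE_DATA_PATH: /var/lib/weaviate
      DEFAULT_VECTORIZER_MODULE: none   # embeddings come from Tinfoil, not Weaviate
    volumes:
      - weaviate_data:/var/lib/weaviate
  verba:
    build: .
    ports:
      - "8000:8000"
    environment:
      TINFOIL_API_KEY: ${TINFOIL_API_KEY}   # picked up from your shell or .env
    depends_on:
      - weaviate
volumes:
  weaviate_data:
```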

Configuration Options

Available Tinfoil Models

Verba can work with any of Tinfoil’s supported models:

Chat Models

  • deepseek-r1-70b: High-performance reasoning model
  • mistral-small-3-1-24b: Advanced multimodal model
  • llama3-3-70b: Multilingual dialogue model
  • qwen2-5-72b: Powerful multilingual model with superior programming and mathematical reasoning

Embedding Models

  • nomic-embed-text: High-quality text embeddings (recommended)
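Under the hood, Verba talks to these models through Tinfoil's OpenAI-compatible inference API. As a sketch of the request shapes involved (the base URL here is an assumption based on OpenAI-style APIs; check Tinfoil's docs for the exact endpoint):

```python
BASE_URL = "https://inference.tinfoil.sh/v1"  # assumed OpenAI-compatible base URL


def chat_request(model: str, question: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.2,  # low temperature suits factual RAG answers
    }


def embedding_request(model: str, texts: list[str]) -> dict:
    """Build an OpenAI-style embeddings payload."""
    return {"model": model, "input": texts}


# Payloads are POSTed as JSON with an
# "Authorization: Bearer $TINFOIL_API_KEY" header.
chat = chat_request("llama3-3-70b", "What changed in Q3?")
emb = embedding_request("nomic-embed-text", ["chunk one", "chunk two"])
```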

Running Verba

Start the Application

Launch Verba with your configuration:

verba start --port 8000 --host 0.0.0.0

Access Verba at http://localhost:8000

Using Verba with Tinfoil

Document Import and Processing

Verba processes your documents through several stages, all using Tinfoil’s private infrastructure:

1. Import Documents

Navigate to the “Import Data” section and add your files:

  • Supported Formats: PDF, DOCX, TXT, MD, HTML, and more
  • Upload Methods:
    • Single file upload
    • Bulk directory upload
    • URL import for web content
# Example: Bulk import from command line
python -m goldenverba.import --directory ./my_documents --chunk_size 512

2. Document Processing Pipeline

Verba automatically:

  1. Extracts Text: Parses document content
  2. Chunks Content: Splits into semantic segments (configurable size)
  3. Generates Embeddings: Uses Tinfoil’s Nomic model privately
  4. Stores Vectors: Saves to Weaviate for semantic search

All embedding generation happens through Tinfoil’s confidential computing enclaves.
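The chunking stage (step 2 above) can be sketched as a sliding window over tokens. Verba's actual chunkers are more sophisticated (sentence- and semantics-aware); this illustrates only the size/overlap mechanics, using whitespace "tokens" for simplicity:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap_ratio: float = 0.2) -> list[str]:
    """Split text into ~chunk_size-token chunks with fractional overlap."""
    tokens = text.split()  # stand-in for a real tokenizer
    step = max(1, int(chunk_size * (1 - overlap_ratio)))  # advance by size minus overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start : start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):  # last window reached the end
            break
    return chunks


doc = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_text(doc, chunk_size=512, overlap_ratio=0.2)
# 1000 tokens, size 512, step 409 -> windows start at tokens 0, 409, 818
```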

3. Configuration Options

Customize processing in the Verba UI:

  • Chunk Size: 256, 512, 1024 tokens (512 recommended)
  • Chunk Overlap: Overlap between chunks (20% recommended)
  • Embedding Model: Pre-configured for Tinfoil Nomic embeddings

Querying Your Knowledge Base

1. Basic Chat Interface

Ask questions about your documents:

Q: "What are the key findings in the Q3 financial report?"
A: Based on your uploaded Q3 financial report, the key findings include...

2. Advanced Query Configuration

Configure RAG parameters:

  • Retrieval Count: Number of relevant chunks to retrieve (3-5 recommended)
  • Temperature: Response creativity (0.1-0.3 for factual responses)
  • Max Tokens: Response length limit
  • Context Window: How much retrieved content to include
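The retrieval parameters above translate directly into how the final prompt is assembled before it is sent to the chat model. A sketch of that assembly (the function and variable names are illustrative, not Verba's internals):

```python
def build_rag_prompt(question: str, retrieved: list[tuple[str, float]],
                     retrieval_count: int = 4, context_window: int = 2000) -> str:
    """Assemble a prompt from the top-scoring chunks, trimmed to a character budget."""
    # Keep only the top `retrieval_count` chunks by relevance score
    top = sorted(retrieved, key=lambda c: c[1], reverse=True)[:retrieval_count]
    context, used = [], 0
    for text, _score in top:
        if used + len(text) > context_window:  # respect the context budget
            break
        context.append(text)
        used += len(text)
    return (
        "Answer using only the context below.\n\n"
        + "\n---\n".join(context)
        + f"\n\nQuestion: {question}"
    )


chunks = [
    ("Revenue grew 12% in Q3.", 0.91),
    ("Costs fell 3%.", 0.84),
    ("Unrelated memo.", 0.12),
]
prompt = build_rag_prompt("What are the Q3 findings?", chunks, retrieval_count=2)
```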

3. Source Attribution

Verba provides source citations for every response:

  • Document names and page references
  • Confidence scores for retrieved chunks
  • Direct links to source content

Scaling Options

  • Horizontal Scaling: Run multiple Verba instances behind a load balancer
  • Weaviate Clustering: Scale vector database with Weaviate clusters
  • Document Preprocessing: Separate ingestion and query workloads
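For the horizontal-scaling option, a minimal load-balancer configuration might look like the nginx sketch below. The instance names and ports are placeholders, and session affinity is not addressed here:

```nginx
upstream verba_backend {
    server verba-1:8000;
    server verba-2:8000;
}

server {
    listen 80;
    location / {
        proxy_pass http://verba_backend;
        proxy_set_header Host $host;
    }
}
```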

Comparison with Traditional RAG

  • Privacy: Traditional RAG exposes data to AI providers; Verba + Tinfoil is end-to-end confidential
  • Security: Traditional RAG is trust-based; Verba + Tinfoil is hardware-verified
  • Compliance: Traditional RAG offers limited control; Verba + Tinfoil provides full data sovereignty

FAQ