Documentation Index

Fetch the complete documentation index at: https://docs.tinfoil.sh/llms.txt

Use this file to discover all available pages before exploring further.

Vision models understand images, supporting visual tasks such as image analysis, OCR, and screenshot-to-code generation.

Google DeepMind
Gemma 4 31B
gemma4-31b
Parameters: 31B
Context: 256K tokens
Strengths: Image understanding, object detection, document parsing, OCR, chart comprehension, and pointing
Best for: Image analysis, document understanding, OCR tasks, and visual reasoning with built-in thinking mode
Configuration repo: tinfoilsh/confidential-gemma4-31b
Multimodal: Supports variable aspect ratios and configurable image token budgets for balancing speed and detail. See Image Processing Guide for usage examples.
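As a sketch of how an image might be sent to gemma4-31b, assuming Tinfoil exposes an OpenAI-compatible chat completions endpoint (the base URL and `TINFOIL_API_KEY` environment variable below are assumptions, not confirmed API details):

```python
import base64

def build_image_message(image_bytes: bytes, prompt: str) -> list:
    """Build an OpenAI-style chat message embedding an image as a base64 data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{b64}"},
                },
            ],
        }
    ]

# The request itself would then look roughly like (endpoint and key handling assumed):
#
#   from openai import OpenAI
#   client = OpenAI(base_url="https://inference.tinfoil.sh/v1/",   # assumed endpoint
#                   api_key=os.environ["TINFOIL_API_KEY"])         # assumed env var
#   resp = client.chat.completions.create(
#       model="gemma4-31b",
#       messages=build_image_message(png_bytes, "Extract all text from this image."),
#   )
```

Base64 data URLs keep the example self-contained; a hosted image URL would work the same way in the `image_url` field.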

Qwen
Qwen3-VL 30B
qwen3-vl-30b
Parameters: 30B (3B active)
Context: 256K tokens
Strengths: Vision-language understanding, GUI interaction, screenshot-to-code generation, spatial understanding, and multilingual OCR
OCR Languages: Supports 32 languages
Best for: Image analysis, screenshot-to-code generation, OCR tasks, GUI automation, and vision-text understanding
Configuration repo: tinfoilsh/confidential-qwen3-vl-30b
Multimodal: Processes images with up to 256K context for long documents. See Image Processing Guide for usage examples.
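For a screenshot-to-code workflow with qwen3-vl-30b, the request body might be built like this (a minimal sketch assuming an OpenAI-compatible chat completions format; the prompt wording is illustrative, not a documented template):

```python
def screenshot_to_code_request(screenshot_url: str) -> dict:
    """Build a chat completions request asking qwen3-vl-30b to reproduce a UI as code."""
    return {
        "model": "qwen3-vl-30b",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Reproduce this UI as a single self-contained HTML file.",
                    },
                    # Remote image by URL; a base64 data URL works here as well.
                    {"type": "image_url", "image_url": {"url": screenshot_url}},
                ],
            }
        ],
    }
```

The same payload shape covers the model's other use cases (OCR, GUI automation) by swapping the text instruction.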

Moonshot
Kimi K2.6
kimi-k2-6
Parameters: 1T total (32B activated)
Context: 256K tokens
Strengths: Image and video understanding, screenshot-to-code generation, visual reasoning, design-to-code workflows, and parallel agent orchestration
Best for: Converting designs to code, visual analysis, multimodal agentic tasks, and workflows that combine long-context reasoning with visual inputs
Configuration repo: tinfoilsh/confidential-kimi-k2-6
Vision + Language: Supports text, image, and video inputs. See Image Processing Guide for usage examples.
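The video-input wire format for kimi-k2-6 is not specified here; one common pattern is to sample frames from the clip and send them as a sequence of images in a single message. The sketch below assumes that pattern and an OpenAI-compatible message shape, neither of which is confirmed by this page:

```python
import base64

def build_frame_message(frames: list, prompt: str) -> list:
    """Sketch: pack sampled video frames (JPEG bytes) into one multi-image user message."""
    content = [{"type": "text", "text": prompt}]
    for frame in frames:
        b64 = base64.b64encode(frame).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })
    return [{"role": "user", "content": content}]
```

With the model's 256K-token context, a moderately dense frame sample of a short clip fits comfortably alongside a long text prompt, which is what the long-context-plus-visual-input workflows above rely on.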