Multimodal Models Only: Image processing requires models with vision capabilities. Currently, only Mistral Small 3.1 24B supports image inputs. Other models (DeepSeek, Llama, Qwen) are text-only and cannot process images.
See the Model Catalog for complete model specifications and multimodal capabilities.
Image processing works through the chat/completions endpoint using base64-encoded images. Images are sent as data URLs in the message content alongside your text prompt.
There are several ways to convert your images to base64 format:
Copy
Ask AI
# Convert image to base64base64 -i image.jpg -o image_base64.txt# Or use it directly in your terminalbase64 image.jpg | pbcopy # Copies to clipboard on macOS