senkani_vision
Status: Live · Replaces: API call · Savings: $0/call
Vision model on Apple Silicon via MLX (Gemma). OCR, UI analysis, screenshot reading. Zero API cost.
Signature
```
senkani_mcp.call(tool="senkani_vision", args={...})
```
Behavior
Input: image path or base64 PNG. Output: text or structured JSON. Sub-500 ms on M-series. Fully offline after model download. Same `MLXInferenceLock` + memory-pressure semantics as `senkani_embed`.
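Since the tool accepts either a file path or a base64 PNG data URI, a minimal sketch of preparing the base64 form on the client side (the `to_data_uri` helper and the hard-coded 1x1 PNG are illustrative, not part of the tool):

```python
import base64

def to_data_uri(png_bytes: bytes) -> str:
    # Encode raw PNG bytes in the data-URI form senkani_vision accepts.
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")

# A minimal 1x1 PNG, hard-coded for the demo; normally read from disk.
PNG_1x1 = base64.b64decode(
    "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJ"
    "AAAADUlEQVR42mP8z8BQDwAEhQGAhKmMIQAAAABJRU5ErkJggg=="
)

args = {"image": to_data_uri(PNG_1x1), "prompt": "Extract UI text.", "format": "text"}
```

Passing a plain file path avoids the encoding step entirely; the data-URI form is useful when the image exists only in memory.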
Inputs
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| image | string | — | File path OR `data:image/png;base64,...` URI. |
| prompt | string | `"Describe the image."` | Vision prompt. |
| format | string | `text` | `text` or `json`. |
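The defaults above can be applied client-side before dispatch. A hedged sketch (the `build_args` helper is hypothetical, not part of the tool's API):

```python
# Documented defaults for the optional inputs.
DEFAULTS = {"prompt": "Describe the image.", "format": "text"}

def build_args(image: str, **overrides) -> dict:
    """Fill in defaults for a senkani_vision call; `image` is required."""
    fmt = overrides.get("format", DEFAULTS["format"])
    if fmt not in ("text", "json"):
        raise ValueError("format must be 'text' or 'json'")
    # Later keys win, so explicit overrides replace the defaults.
    return {"image": image, **DEFAULTS, **overrides}

args = build_args("/tmp/screenshot.png", prompt="Extract UI text.")
```

Validating `format` before dispatch surfaces typos locally instead of as a server-side error.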
Output
Text or a JSON object matching your prompt's schema.
Example
```json
{"tool":"senkani_vision","args":{"image":"/tmp/screenshot.png","prompt":"Extract UI text."}}
```
Details
The Gemma model loads on the first call; when idle, it is dropped under memory pressure and reloaded on the next call.
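The load-on-first-call / drop-under-pressure lifecycle can be sketched as follows. This is an illustrative Python model of the behavior, not the actual Swift implementation; the `threading.Lock` stands in for the `MLXInferenceLock` mentioned above:

```python
import threading

class LazyModel:
    """Sketch: load on first inference, unload under memory pressure."""

    def __init__(self, loader):
        self._loader = loader      # callable that loads and returns the model
        self._model = None         # not loaded until first infer()
        self._lock = threading.Lock()  # stand-in for MLXInferenceLock

    def infer(self, image):
        with self._lock:           # serialize inference, as one lock implies
            if self._model is None:
                self._model = self._loader()   # first call pays the load cost
            return self._model(image)

    def on_memory_pressure(self):
        with self._lock:           # drop weights; the next call reloads them
            self._model = None

loads = []
def loader():
    loads.append(1)                # count how often the model is loaded
    return lambda img: f"described {img}"

m = LazyModel(loader)
r1 = m.infer("a.png")
r2 = m.infer("b.png")              # reuses the already-loaded model
m.on_memory_pressure()             # simulate a memory-pressure notification
r3 = m.infer("c.png")              # triggers a reload
```

Holding one lock for both loading and inference means callers never race a half-loaded model, at the cost of serializing concurrent requests.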
See also
senkani_embed — Local text embeddings.
Model Manager pane — Manage local models.
Source: Sources/MCPServer/Tools/VisionTool.swift