
senkani_vision

Live · Replaces: API call · Savings: $0/call

Vision model on Apple Silicon via MLX (Gemma). OCR, UI analysis, screenshot reading. Zero API cost.

Signature

senkani_mcp.call(tool="senkani_vision", args={...})

Behavior

Takes an image path or base64-encoded PNG; returns plain text or structured JSON. Latency is sub-500 ms on M-series chips, and the tool runs fully offline after the initial model download. It shares the same `MLXInferenceLock` and memory-pressure semantics as `senkani_embed`.
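Since the `image` input accepts either a file path or a `data:image/png;base64,...` URI, a caller that holds raw PNG bytes (e.g. a fresh screenshot capture) needs to wrap them in a data URI. A minimal sketch of that encoding, using only the standard library (`png_bytes_to_data_uri` is a hypothetical helper, not part of the tool's API):

```python
import base64

def png_bytes_to_data_uri(png_bytes: bytes) -> str:
    """Wrap raw PNG bytes in a data:image/png;base64 URI suitable
    for the `image` argument of senkani_vision.

    To send a file on disk instead, pass its path directly; this
    helper is only needed when the bytes never touch the filesystem.
    """
    payload = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{payload}"
```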

Inputs

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `image` | string | — | File path OR `data:image/png;base64,...` URI. |
| `prompt` | string | `"Describe the image."` | Vision prompt. |
| `format` | string | `text` | `text` or `json`. |

Output

Text or a JSON object matching your prompt's schema.

Example

{"tool":"senkani_vision","args":{"image":"/tmp/screenshot.png","prompt":"Extract UI text."}}
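The same payload can be built programmatically. A sketch, assuming the defaults from the Inputs table; `build_vision_call` is a hypothetical helper, and actually dispatching the payload would go through the `senkani_mcp.call` entry point shown in the Signature:

```python
import json

def build_vision_call(image: str,
                      prompt: str = "Describe the image.",
                      fmt: str = "text") -> str:
    """Serialize a senkani_vision tool-call payload.

    `image` is a file path or a data:image/png;base64,... URI;
    `fmt` must be "text" or "json", matching the Inputs table.
    """
    if fmt not in ("text", "json"):
        raise ValueError("format must be 'text' or 'json'")
    return json.dumps({
        "tool": "senkani_vision",
        "args": {"image": image, "prompt": prompt, "format": fmt},
    })
```

Validating `format` client-side avoids a round trip to the server for a request that would be rejected anyway.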

Details

The Gemma model loads lazily on the first call; when idle, it is unloaded under memory pressure.

See also

Source: Sources/MCPServer/Tools/VisionTool.swift