
senkani_vision

Live · Replaces: API call · Savings: $0/call

Vision model on Apple Silicon via MLX (Gemma). OCR, UI analysis, screenshot reading. Zero API cost.

Signature

senkani_mcp.call(tool="senkani_vision", args={...})

Behavior

Takes an image path or base64-encoded PNG; returns plain text or structured JSON. Latency is sub-500 ms on M-series chips, and the tool runs fully offline after the initial model download. It shares the same `MLXInferenceLock` and memory-pressure semantics as `senkani_embed`.
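Since the `image` input accepts either a file path or a `data:image/png;base64,...` URI, a caller that holds raw PNG bytes (e.g. a fresh screenshot capture) needs to wrap them in a data URI. A minimal sketch of that encoding, using only the standard library (`png_bytes_to_data_uri` is a hypothetical helper, not part of the tool's API):

```python
import base64

def png_bytes_to_data_uri(png_bytes: bytes) -> str:
    """Wrap raw PNG bytes in a data:image/png;base64 URI suitable
    for the `image` argument of senkani_vision.

    To send a file on disk instead, pass its path directly; this
    helper is only needed when the bytes never touch the filesystem.
    """
    payload = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{payload}"
```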

Inputs

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `image` | string | — | File path OR `data:image/png;base64,...` URI. |
| `prompt` | string | `"Describe the image."` | Vision prompt. |
| `format` | string | `text` | `text` or `json`. |

Output

Text or a JSON object matching your prompt's schema.

Example

{"tool":"senkani_vision","args":{"image":"/tmp/screenshot.png","prompt":"Extract UI text."}}
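The same payload can be built programmatically. A sketch, assuming the defaults from the Inputs table; `build_vision_call` is a hypothetical helper, and actually dispatching the payload would go through the `senkani_mcp.call` entry point shown in the Signature:

```python
import json

def build_vision_call(image: str,
                      prompt: str = "Describe the image.",
                      fmt: str = "text") -> str:
    """Serialize a senkani_vision tool-call payload.

    `image` is a file path or a data:image/png;base64,... URI;
    `fmt` must be "text" or "json", matching the Inputs table.
    """
    if fmt not in ("text", "json"):
        raise ValueError("format must be 'text' or 'json'")
    return json.dumps({
        "tool": "senkani_vision",
        "args": {"image": image, "prompt": prompt, "format": fmt},
    })
```

Validating `format` client-side avoids a round trip to the server for a request that would be rejected anyway.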

Details

The Gemma model loads lazily on the first call; when idle, it is unloaded under memory pressure.

See also

Source: Sources/MCPServer/Tools/VisionTool.swift