senkani_embed
Status: Live · Replaces: API call · Savings: $0/call
Text embeddings on Apple Silicon via MLX. MiniLM-L6-v2 → 384-dim Float32. Zero API cost.
Signature
senkani_mcp.call(tool="senkani_embed", args={...})
Behavior
Runs on-device through MLX (Metal on the Apple GPU). Sub-200 ms per call on M-series chips. A shared `MLXInferenceLock` serializes every MLX call in FIFO order and drops loaded model containers when macOS signals a memory-pressure warning.
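The FIFO serialization described above can be sketched in Python (the real `MLXInferenceLock` lives in the Swift sources; the class and names here are illustrative, not the actual implementation):

```python
import threading
from collections import deque

class InferenceLock:
    """FIFO-ordered lock: callers acquire in strict arrival order.

    A plain threading.Lock gives no fairness guarantee, so each
    waiter gets its own Event and is woken in queue order.
    """
    def __init__(self):
        self._mutex = threading.Lock()
        self._waiters = deque()
        self._held = False

    def acquire(self):
        with self._mutex:
            if not self._held and not self._waiters:
                self._held = True   # uncontended: take it immediately
                return
            ev = threading.Event()
            self._waiters.append(ev)
        ev.wait()                   # block until release() hands off

    def release(self):
        with self._mutex:
            if self._waiters:
                # Hand ownership directly to the oldest waiter.
                self._waiters.popleft().set()
            else:
                self._held = False
```

Each inference call wraps its MLX work in `acquire()`/`release()`, so concurrent tool calls never interleave GPU work.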
Inputs
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| texts | array<string> | — | Batch of strings to embed. |
| normalize | boolean | true | L2-normalize the output vectors. |
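With `normalize` left at its default, each output vector is scaled to unit length. A minimal sketch of what L2 normalization does (pure Python, no MLX required):

```python
import math

def l2_normalize(vec):
    """Scale a vector so its Euclidean (L2) norm is 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

v = [3.0, 4.0]
u = l2_normalize(v)          # [0.6, 0.8]
length = math.sqrt(sum(x * x for x in u))  # ≈ 1.0
```

Unit-length vectors make cosine similarity a plain dot product, which is why normalization is on by default for retrieval use cases.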
Output
Array of 384-dim Float32 vectors, one per input.
Example
{"tool":"senkani_embed","args":{"texts":["orders","payments"]}}
Details
First call loads the model (~80 MB) into unified memory; subsequent calls reuse it. Idle models drop on memory pressure.
Fully offline once the model is downloaded. No outbound traffic.
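The load-once-then-reuse behavior, with a drop hook for memory pressure, can be sketched as follows (a hypothetical Python cache; the actual Swift container management is in `Sources/MLX/`):

```python
import threading

class ModelCache:
    """Load-once model holder: the first access pays the load cost,
    later accesses reuse the cached instance, and a memory-pressure
    callback can drop it so the next call reloads from disk."""
    def __init__(self, loader):
        self._loader = loader
        self._model = None
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            if self._model is None:
                self._model = self._loader()  # e.g. read ~80 MB of weights
            return self._model

    def drop(self):
        # Invoked on a memory-pressure warning; frees the model.
        with self._lock:
            self._model = None

loads = 0
def load_model():
    global loads
    loads += 1
    return object()  # stand-in for real model weights

cache = ModelCache(load_model)
a = cache.get()
b = cache.get()   # same instance, no second load
cache.drop()      # simulate memory pressure
c = cache.get()   # reloaded on next use
```

The trade-off is latency: the first call after a drop pays the full load cost again, which is why models stay resident until the OS actually reports pressure.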
See also
senkani_vision — Local vision inference via Gemma.
Model Manager pane — Download and inspect local models.
Source:
Sources/MCPServer/Tools/EmbedTool.swift + Sources/MLX/