senkani_embed

Text embeddings on Apple Silicon via MLX. MiniLM-L6-v2 → 384-dim Float32. Zero API cost.

Signature

senkani_mcp.call(tool="senkani_embed", args={...})

Behavior

Runs on-device on the Apple Silicon GPU via MLX (Metal). Sub-200 ms per call on M-series chips. A shared `MLXInferenceLock` serializes every MLX call in FIFO order, and loaded model containers are dropped when macOS raises a memory-pressure warning.
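The FIFO serialization can be modeled with a single worker thread draining a queue. This is an illustrative Python sketch of the concept, not the actual Swift `MLXInferenceLock` implementation:

```python
import queue
import threading

class InferenceSerializer:
    """Toy model of a FIFO inference lock: one worker thread drains a
    queue, so at most one "MLX call" runs at a time, in arrival order."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            fn, args, done = self._jobs.get()
            done["result"] = fn(*args)
            done["event"].set()

    def submit(self, fn, *args):
        # Enqueue the call and block until the worker has run it.
        done = {"event": threading.Event()}
        self._jobs.put((fn, args, done))
        done["event"].wait()
        return done["result"]

order = []
ser = InferenceSerializer()

def fake_embed(tag):
    order.append(tag)      # records execution order
    return [0.0] * 384     # stand-in for a 384-dim embedding

results = [ser.submit(fake_embed, i) for i in range(5)]
print(order)  # executed strictly in submission order: [0, 1, 2, 3, 4]
```

Funneling every call through one queue is what guarantees ordering even when multiple MCP requests arrive concurrently.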

Inputs

Name       Type           Default  Description
texts      array<string>  —        Batch of strings to embed.
normalize  boolean        true     L2-normalize the output vectors.
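What `normalize` does can be shown in plain Python (illustrative only; the real computation happens inside MLX):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length, as normalize=true does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

v = l2_normalize([3.0, 4.0])
print(v)  # [0.6, 0.8] — unit length
```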

Output

Array of 384-dim Float32 vectors, one per input.
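Because normalized outputs have unit length, cosine similarity between two result vectors reduces to a plain dot product. A sketch with hypothetical 2-dim stand-ins (real outputs are 384-dim):

```python
def dot(a, b):
    """Cosine similarity for unit-length vectors is just the dot product."""
    return sum(x * y for x, y in zip(a, b))

a = [0.6, 0.8]  # stand-ins for unit-length embeddings
b = [0.8, 0.6]
print(dot(a, b))  # ≈ 0.96
```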

Example

{"tool":"senkani_embed","args":{"texts":["orders","payments"]}}

Details

First call loads the model (~80 MB) into unified memory; subsequent calls reuse it. Idle models drop on memory pressure.

Fully offline once the model is downloaded. No outbound traffic.

See also

Source: Sources/MCPServer/Tools/EmbedTool.swift + Sources/MLX/