Compression layer
Senkani compresses at three places, not one. Input compression shrinks tool outputs before the LLM sees them; redundancy elimination stops redundant tool calls from running; output compression (terse mode) shrinks what the model emits. All three run in the same binary, toggleable per pane.
The three compression surfaces
A naïve "LLM token reducer" would just filter tool outputs. Senkani does that, but it isn't enough on its own. A session that reads the same file five times costs 5× no matter how tightly each read is compressed. A session that reruns a deterministic `npm test` three times in a row wastes 3× the tokens, with or without filters.
Surface 1 — input compression (FCSIT F + I)
Every tool output that comes back to the model runs through the relevant reducer:
- `senkani_exec` applies 24+ command-specific rules. `npm install`: 428 lines → 2. `git clone`: 312 → 4. `git diff`: adaptive truncation + dedup.
- `senkani_read` returns outlines (symbols + line numbers) by default, not full content. Full mode requires `full: true`.
- `senkani_search` returns ~50 tokens from the local tree-sitter index instead of ~5,000 from grepping.
- `senkani_fetch` returns only a single symbol's line range.
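The command-specific rules can be pictured as a small registry of reducers keyed by command prefix. This is a minimal sketch under assumed names (`rule`, `reduce_output`, and the phrase matching are illustrative, not Senkani's actual internals); it shows how an `npm install` transcript collapses to its summary line plus warnings.

```python
# Hypothetical sketch of command-specific output reduction.
# Rule names and matching logic are illustrative assumptions.

RULES = {}

def rule(prefix):
    """Register a reducer for commands starting with `prefix`."""
    def register(fn):
        RULES[prefix] = fn
        return fn
    return register

@rule("npm install")
def reduce_npm_install(output: str) -> str:
    # Keep only the summary line ("added N packages ...") and warnings;
    # the per-package progress noise carries no information for the model.
    lines = output.splitlines()
    kept = [l for l in lines if l.startswith("added") or "warn" in l.lower()]
    return "\n".join(kept) or lines[-1]

def reduce_output(command: str, output: str) -> str:
    for prefix, fn in RULES.items():
        if command.startswith(prefix):
            return fn(output)
    return output  # no rule registered: pass through unchanged

raw = "npm WARN deprecated foo@1.0.0\n" + "\n".join(
    f"fetch package-{i}" for i in range(400)
) + "\nadded 428 packages in 12s"
print(reduce_output("npm install", raw))  # 402 noisy lines -> 2 kept lines
```

The pass-through default matters: an unrecognized command loses nothing, so a missing rule degrades to "no compression" rather than to dropped information.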
Surface 2 — redundancy elimination (hook Layer 3)
The hook relay keeps a short-term memory of what the agent has already asked. When the same question comes in again, the hook answers from cache — or denies with a message pointing at the cached result. See three-layer stack for the five denial patterns.
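The cache-or-deny behavior can be sketched as a fingerprint map consulted before each tool call. All names here (`Relay`, `before_call`, the allow/deny verdicts) are assumptions for illustration; the real hook relay's shape may differ.

```python
import hashlib
import json

# Illustrative sketch of a short-term call cache sitting in front of tool
# execution. Class and method names are hypothetical, not Senkani's API.

class Relay:
    def __init__(self):
        self._seen = {}  # call fingerprint -> cached result

    def _key(self, tool: str, args: dict) -> str:
        # Canonical JSON makes the fingerprint stable across key ordering.
        return hashlib.sha256(
            json.dumps([tool, args], sort_keys=True).encode()
        ).hexdigest()

    def before_call(self, tool: str, args: dict):
        key = self._key(tool, args)
        if key in self._seen:
            # Deny the duplicate and point the agent at the cached result.
            return ("deny", f"duplicate call; cached result:\n{self._seen[key]}")
        return ("allow", key)

    def after_call(self, key: str, result: str):
        self._seen[key] = result

relay = Relay()
verdict, key = relay.before_call("exec", {"cmd": "npm test"})  # "allow"
relay.after_call(key, "34 passing")
verdict2, msg = relay.before_call("exec", {"cmd": "npm test"})
print(verdict2)  # second identical call is denied, with the cached output
```

The key point is that the duplicate call never runs at all, so the saving is the full cost of the tool execution plus its output tokens, not just a compressed re-read.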
Surface 3 — output compression (FCSIT T)
Terse mode shrinks the model's output, not just its input. Two layers: a system-prompt injection that tells the model to strip filler, and a post-filter that strips filler phrases from tool outputs on the way in (so the model doesn't learn to mimic them).
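The post-filter half of this can be sketched as a list of filler-phrase patterns applied to text before the model sees it. The phrase list below is an illustrative assumption, not Senkani's actual set.

```python
import re

# Sketch of a filler-phrase post-filter. The patterns are examples only;
# the real filter's phrase list is not shown in this document.

FILLER_PATTERNS = [
    r"^(Note|Please note) that\s+",
    r"\bIt is worth noting that\s+",
    r"\bAs you can see,\s*",
]
FILLER_RE = [re.compile(p, re.IGNORECASE | re.MULTILINE) for p in FILLER_PATTERNS]

def strip_filler(text: str) -> str:
    # Remove each filler phrase; the surrounding sentence content is kept.
    for pat in FILLER_RE:
        text = pat.sub("", text)
    return text

print(strip_filler("Note that the build passed. As you can see, 3 tests ran."))
# -> "the build passed. 3 tests ran."
```

Because the same phrases are scrubbed from inputs, the model stops seeing them as house style, which is what makes the system-prompt instruction stick.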
Why all three
Each surface addresses a different waste mode:
- Surface 1 — raw tool output is noisy. Compress the noise.
- Surface 2 — agents re-ask. Don't run the call at all.
- Surface 3 — models are verbose. Shrink the response.
Skipping any one leaves measurable tokens on the table. The fixture benchmark (`senkani bench`) isolates each surface; the Savings Test pane shows live-session breakdowns by source.
What it's not
Senkani is not a lossy summarizer. It doesn't paraphrase your code; it doesn't drop information the model legitimately needs. Every elided line is either byte-level redundancy (repeated ANSI codes, blank-line runs, progress bars) or structural redundancy (outline instead of body when the body wasn't asked for). If the agent needs the full file, it passes `full: true`.
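The byte-level cases are mechanical enough to show concretely. Both filters in this sketch are lossless with respect to the words the model reads; it is an illustration of the category, not Senkani's code.

```python
import re

# Two examples of byte-level redundancy removal: ANSI styling escapes and
# runs of blank lines. Neither removes any words from the output.

ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")  # SGR color/style escape sequences
BLANKS_RE = re.compile(r"\n{3,}")        # three or more consecutive newlines

def strip_byte_redundancy(text: str) -> str:
    text = ANSI_RE.sub("", text)           # drop the styling, keep the words
    return BLANKS_RE.sub("\n\n", text)     # collapse blank-line runs

raw = "\x1b[32mPASS\x1b[0m tests/app.test.js\n\n\n\n2 passed"
print(strip_byte_redundancy(raw))
# -> "PASS tests/app.test.js" / blank line / "2 passed"
```

Structural redundancy (outlines instead of bodies) is different in kind: the bytes are gone from the default view, but they remain one `full: true` away, so nothing is irrecoverable.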