Compression layer
Senkani compresses at three places, not one. Input compression shrinks tool outputs before the LLM sees them; redundancy elimination stops redundant tool calls from running; output compression (terse mode) shrinks what the model emits. All three run in the same binary, toggleable per pane.
The three compression surfaces
A naïve "LLM token reducer" would just filter tool outputs. Senkani does that, but it isn't enough on its own. A session that reads the same file five times costs 5× no matter how tightly each read is compressed. A session that reruns a deterministic `npm test` three times in a row wastes 3× the tokens, with or without filters.
Surface 1 — input compression (FCSIT F + I)
Every tool output that comes back to the model runs through the relevant reducer:
- `senkani_exec` applies 24+ command-specific rules. `npm install`: 428 lines → 2. `git clone`: 312 → 4. `git diff`: adaptive truncation + dedup.
- `senkani_read` returns outlines (symbols + line numbers) by default, not full content. Full mode requires `full: true`.
- `senkani_search` returns ~50 tokens from the local tree-sitter index instead of ~5,000 from grepping.
- `senkani_fetch` returns only a single symbol's line range.
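The command-specific rules can be pictured as a small registry of reducers keyed by command prefix. This is a minimal sketch under assumed names (`rule`, `reduce_output`, and the phrase matching are illustrative, not Senkani's actual internals); it shows how an `npm install` transcript collapses to its summary line plus warnings.

```python
# Hypothetical sketch of command-specific output reduction.
# Rule names and matching logic are illustrative assumptions.

RULES = {}

def rule(prefix):
    """Register a reducer for commands starting with `prefix`."""
    def register(fn):
        RULES[prefix] = fn
        return fn
    return register

@rule("npm install")
def reduce_npm_install(output: str) -> str:
    # Keep only the summary line ("added N packages ...") and warnings;
    # the per-package progress noise carries no information for the model.
    lines = output.splitlines()
    kept = [l for l in lines if l.startswith("added") or "warn" in l.lower()]
    return "\n".join(kept) or lines[-1]

def reduce_output(command: str, output: str) -> str:
    for prefix, fn in RULES.items():
        if command.startswith(prefix):
            return fn(output)
    return output  # no rule registered: pass through unchanged

raw = "npm WARN deprecated foo@1.0.0\n" + "\n".join(
    f"fetch package-{i}" for i in range(400)
) + "\nadded 428 packages in 12s"
print(reduce_output("npm install", raw))  # 402 noisy lines -> 2 kept lines
```

The pass-through default matters: an unrecognized command loses nothing, so a missing rule degrades to "no compression" rather than to dropped information.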
Surface 2 — redundancy elimination (hook Layer 3)
The hook relay keeps a short-term memory of what the agent has already asked. When the same question comes in again, the hook answers from cache — or denies with a message pointing at the cached result. See three-layer stack for the five denial patterns.
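The cache-or-deny behavior can be sketched as a fingerprint map consulted before each tool call. All names here (`Relay`, `before_call`, the allow/deny verdicts) are assumptions for illustration; the real hook relay's shape may differ.

```python
import hashlib
import json

# Illustrative sketch of a short-term call cache sitting in front of tool
# execution. Class and method names are hypothetical, not Senkani's API.

class Relay:
    def __init__(self):
        self._seen = {}  # call fingerprint -> cached result

    def _key(self, tool: str, args: dict) -> str:
        # Canonical JSON makes the fingerprint stable across key ordering.
        return hashlib.sha256(
            json.dumps([tool, args], sort_keys=True).encode()
        ).hexdigest()

    def before_call(self, tool: str, args: dict):
        key = self._key(tool, args)
        if key in self._seen:
            # Deny the duplicate and point the agent at the cached result.
            return ("deny", f"duplicate call; cached result:\n{self._seen[key]}")
        return ("allow", key)

    def after_call(self, key: str, result: str):
        self._seen[key] = result

relay = Relay()
verdict, key = relay.before_call("exec", {"cmd": "npm test"})  # "allow"
relay.after_call(key, "34 passing")
verdict2, msg = relay.before_call("exec", {"cmd": "npm test"})
print(verdict2)  # second identical call is denied, with the cached output
```

The key point is that the duplicate call never runs at all, so the saving is the full cost of the tool execution plus its output tokens, not just a compressed re-read.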
Surface 3 — output compression (FCSIT T)
Terse mode shrinks the model's output, not just its input. Two layers: a system-prompt injection that tells the model to strip filler, and a post-filter that strips filler phrases from tool outputs on the way in (so the model doesn't learn to mimic them).
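The post-filter half of this can be sketched as a list of filler-phrase patterns applied to text before the model sees it. The phrase list below is an illustrative assumption, not Senkani's actual set.

```python
import re

# Sketch of a filler-phrase post-filter. The patterns are examples only;
# the real filter's phrase list is not shown in this document.

FILLER_PATTERNS = [
    r"^(Note|Please note) that\s+",
    r"\bIt is worth noting that\s+",
    r"\bAs you can see,\s*",
]
FILLER_RE = [re.compile(p, re.IGNORECASE | re.MULTILINE) for p in FILLER_PATTERNS]

def strip_filler(text: str) -> str:
    # Remove each filler phrase; the surrounding sentence content is kept.
    for pat in FILLER_RE:
        text = pat.sub("", text)
    return text

print(strip_filler("Note that the build passed. As you can see, 3 tests ran."))
# -> "the build passed. 3 tests ran."
```

Because the same phrases are scrubbed from inputs, the model stops seeing them as house style, which is what makes the system-prompt instruction stick.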
Why all three
Each surface addresses a different waste mode:
- Surface 1 — raw tool output is noisy. Compress the noise.
- Surface 2 — agents re-ask. Don't run the call at all.
- Surface 3 — models are verbose. Shrink the response.
Skipping any one leaves measurable tokens on the table. The fixture benchmark (`senkani bench`) isolates each surface; the Savings Test pane shows live-session breakdowns by source.
What it's not
Senkani is not a lossy summarizer. It doesn't paraphrase your code; it doesn't drop information the model legitimately needs. Every elided line is either byte-level redundancy (repeated ANSI codes, blank-line runs, progress bars) or structural redundancy (outline instead of body when the body wasn't asked for). If the agent needs the full file, it passes `full: true`.
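The byte-level cases are mechanical enough to show concretely. Both filters in this sketch are lossless with respect to the words the model reads; it is an illustration of the category, not Senkani's code.

```python
import re

# Two examples of byte-level redundancy removal: ANSI styling escapes and
# runs of blank lines. Neither removes any words from the output.

ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")  # SGR color/style escape sequences
BLANKS_RE = re.compile(r"\n{3,}")        # three or more consecutive newlines

def strip_byte_redundancy(text: str) -> str:
    text = ANSI_RE.sub("", text)           # drop the styling, keep the words
    return BLANKS_RE.sub("\n\n", text)     # collapse blank-line runs

raw = "\x1b[32mPASS\x1b[0m tests/app.test.js\n\n\n\n2 passed"
print(strip_byte_redundancy(raw))
# -> "PASS tests/app.test.js" / blank line / "2 passed"
```

Structural redundancy (outlines instead of bodies) is different in kind: the bytes are gone from the default view, but they remain one `full: true` away, so nothing is irrecoverable.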