Detection Tiers
Ki!'s privacy engine runs entirely on your device. Detection happens across eight tiers in sequence, from Tier 0.1 through Tier 2. Each tier catches what the previous one missed, with increasing sophistication and cost. Every tier completes before the prompt is sent.
Pipeline Overview
    Your prompt
         │
         ▼
    ┌─────────────────────────────────────────────────────┐
    │ Tier 0.1 — Base64 / JWT Evasion Decoder             │
    │ Tier 0   — Dictionary (10,000+ names)               │
    │ Tier 0.5 — Entropy Scanner (API keys, secrets)      │
    │ Tier 1   — Rule Engine (50+ structured formats)     │
    │ Tier 1.1 — Phone Shield (libphonenumber)            │
    │ Tier 1.2 — Address Shield (EN/FR/ES/DE)             │
    │ Tier 1.5 — Greeting / Signature Context             │
    │ Tier 2   — Local SLM NER (Sovereign)                │
    └─────────────────────────────────────────────────────┘
         │
         ▼
    Masked prompt → Cloud LLM

Tier 0.1 — Base64 & JWT Evasion
Decodes Base64-encoded strings and JWT payloads before scanning. This closes the most common evasion vector: PII hidden inside encoded blobs (e.g., a Base64-encoded CSV or a JWT containing an email claim) passes through naive regex scanners undetected. Ki! decodes, scans, and re-encodes before the downstream tiers run.
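As a rough illustration of the decode step, the sketch below pulls Base64-looking runs and JWT payloads out of a prompt so that later tiers could scan the decoded text. It is not Ki!'s actual decoder: the 16-character candidate pattern and the helper names `b64_decode_text`, `jwt_claims`, and `decoded_views` are assumptions made for this example, and the re-encoding step is left out.

```python
import base64
import json
import re
from typing import List, Optional

# Assumed heuristic: runs of 16+ Base64 / Base64url characters are candidates.
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/_-]{16,}={0,2}")


def b64_decode_text(blob: str) -> Optional[str]:
    """Decode standard or URL-safe Base64 to UTF-8 text, or return None."""
    padded = blob + "=" * (-len(blob) % 4)  # JWT segments omit padding
    try:
        return base64.urlsafe_b64decode(padded).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None


def jwt_claims(token: str) -> Optional[dict]:
    """Return the claims of a header.payload.signature token, or None."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    text = b64_decode_text(parts[1])
    if text is None:
        return None
    try:
        claims = json.loads(text)
    except json.JSONDecodeError:
        return None
    return claims if isinstance(claims, dict) else None


def decoded_views(prompt: str) -> List[str]:
    """Collect the decoded text of every Base64-looking run in the prompt."""
    views = []
    for match in B64_CANDIDATE.finditer(prompt):
        text = b64_decode_text(match.group(0))
        if text:
            views.append(text)
    return views
```

Each string returned by `decoded_views` would then be handed to the same scanners that run on the plain prompt, so an email address hidden in an encoded CSV is found like any other.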
Tier 0 — Dictionary (Instant)
A high-speed dictionary of 10,000+ common names using greedy longest-match. This is the fastest tier: a pure in-memory lookup with no regex or AI overhead. It fires on full names, surnames, and common given names across EN/FR/ES/DE name corpora.
Any term you add to your custom dictionary in Settings fires here, before any network call.
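Greedy longest-match can be sketched as follows: at each word position, try the longest dictionary entry first and fall back to shorter ones. The five-entry `NAMES` set and the `find_names` helper are stand-ins invented for this example. The sketch also uses a regex for word splitting only, which the real tier may avoid.

```python
import re
from typing import List, Set, Tuple

# Hypothetical mini-dictionary; the real corpus and custom terms are not shown here.
NAMES: Set[Tuple[str, ...]] = {
    ("jean",),
    ("jean", "baptiste"),
    ("jean", "baptiste", "dubois"),
    ("sophie",),
    ("müller",),
}
MAX_LEN = max(len(entry) for entry in NAMES)
WORD = re.compile(r"\w+")


def find_names(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched_text) spans, longest entry first."""
    tokens = [(m.group(0).lower(), m.start(), m.end()) for m in WORD.finditer(text)]
    spans = []
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the widest window first so a full name beats its first word.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            if key in NAMES:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                spans.append((start, end, text[start:end]))
                matched = n
                break
        i += matched or 1  # skip past a match, otherwise advance one word
    return spans
```

In this sketch, a custom term would simply be one more tuple added to `NAMES`.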
Tier 0.5 — Entropy Scanner
Shannon entropy scoring on every token in the prompt. Strings above the entropy threshold (e.g., ghp_aBcD1234XyZ..., sk-live-..., base64-encoded keys) are flagged as high-confidence credentials even if they match no known pattern.
Catches: API keys, bearer tokens, private keys, long random identifiers.
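For a token with character frequencies $p_i$, Shannon entropy is $H = -\sum_i p_i \log_2 p_i$ bits per character. The sketch below applies that formula per whitespace-separated token. The threshold of 3.5 bits and the minimum length of 20 are assumed values picked for illustration, since the real settings are not documented here, and the sample token is made up.

```python
import math
from collections import Counter
from typing import List

# Assumed values; the actual threshold and minimum length are not documented.
ENTROPY_THRESHOLD = 3.5
MIN_LENGTH = 20


def shannon_entropy(token: str) -> float:
    """Entropy in bits per character of the token's character distribution."""
    if not token:
        return 0.0
    total = len(token)
    counts = Counter(token)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def flag_credentials(prompt: str) -> List[str]:
    """Return long tokens whose entropy exceeds the threshold."""
    return [
        token
        for token in prompt.split()
        if len(token) >= MIN_LENGTH and shannon_entropy(token) > ENTROPY_THRESHOLD
    ]
```

A long natural-language word such as "internationalization" scores roughly 2.9 bits per character and stays below the assumed threshold, while a random 28-character string with no repeated characters scores about 4.8.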
Tier 1 — Rule Engine
Deterministic regex rules with mathematical validation:
| Type | Validation method |
|---|---|
| Email | RFC 5321 syntax |
| IBAN | MOD-97 checksum |
| Credit card | Luhn algorithm |
| SSN (US) | Format + range check |
| PESEL, NIN, BSN, DNI, CPF… | Country-specific checksum |
| Phone | Passed to Tier 1.1 |
| Street address | Passed to Tier 1.2 |
50+ national ID formats are covered across EU, US, UK, Brazil, and APAC jurisdictions.
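Two of the checks in the table are standard algorithms and can be shown directly. The sketch below implements the Luhn check for card numbers and the MOD-97 check for IBANs. The function names are chosen for this example, and the per-country IBAN length rules that a full validator would also apply are omitted.

```python
def luhn_valid(number: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """MOD-97 check: move the first four characters to the end, map letters
    to numbers (A=10 ... Z=35), and require the result mod 97 to equal 1."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 15 or not (compact.isascii() and compact.isalnum()):
        return False
    rearranged = compact[4:] + compact[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

Running the checksum after the regex match is what keeps a random 16-digit order number from being masked as a credit card.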
Tier 1.1 — Phone Shield
Phone numbers are validated using libphonenumber rather than regex alone. This eliminates false positives on digit sequences (order numbers, product codes, timestamps) that match naive phone patterns but fail international validation. Only numbers that parse as real phone numbers in any supported country code are masked.
Tier 1.2 — Address Shield
A heuristic street address parser recognising EN, FR, ES, and DE address formats. Catches “23 rue de Rivoli, Paris 75001” and “4200 Wilson Blvd, Arlington VA” using structural grammar, not just keyword matching.
Tier 1.5 — Greeting & Signature Context
Detects names in salutations, sign-offs, and introductory phrases that the dictionary might miss:
- “My name is Jean-Baptiste Dubois”
- “Kind regards, Sophie Müller”
- “Hi Dr. Chen, as discussed…”
Trigger phrases activate a context window that treats the following proper noun as a probable name regardless of dictionary coverage.
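One way to picture the context window: after a trigger phrase, skip an optional title and collect up to a few capitalised words. The trigger list, the title list, the three-word window, and the `names_after_triggers` helper are all assumptions for this sketch, not the tier's real configuration.

```python
import re
from typing import List

# Hypothetical trigger phrases and titles; the real lists are not documented here.
TRIGGER = re.compile(r"\b(?:my name is|kind regards,?|hi|hello|dear)\s+", re.IGNORECASE)
TITLE = re.compile(r"(?:Dr|Mr|Mrs|Ms|Prof)\.?\s+")
WORD = re.compile(r"[^\W\d_][\w'-]*")  # starts with a letter, may contain hyphens


def names_after_triggers(text: str, window: int = 3) -> List[str]:
    """Return probable names found directly after a trigger phrase."""
    found = []
    for trigger in TRIGGER.finditer(text):
        pos = trigger.end()
        title = TITLE.match(text, pos)
        if title:
            pos = title.end()
        words = []
        while len(words) < window:
            match = WORD.match(text, pos)
            # Stop at the first word that is not capitalised.
            if not match or not match.group(0)[0].isupper():
                break
            words.append(match.group(0))
            pos = match.end()
            # Only a single space may join the words of one name.
            if text[pos:pos + 1] != " ":
                break
            pos += 1
        if words:
            found.append(" ".join(words))
    return found
```

Because the match depends on position rather than on a name list, "Chen" is picked up after "Hi Dr." even if no dictionary contains it.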
Tier 2 — Local SLM NER (Sovereign)
Available on the Sovereign tier. A small language model (SLM) runs locally, with no network call, and performs semantic Named Entity Recognition over the full prompt. This catches entities that all previous tiers missed: unusual names, contextual PII, ambiguous references.
The SLM runs inside a sandboxed sidecar process. If it fails or times out (> 5 seconds), Ki! blocks the prompt (fail-closed) rather than sending unmasked text.
Token Format
Every detected value is replaced with a deterministic token:

    [TYPE_xxxxxxxx]

- TYPE is the PII category (PERSON, EMAIL, IBAN, PHONE, SSN, ADDRESS, CREDENTIAL, CUSTOM, …)
- xxxxxxxx is an 8-character hex hash derived from the original value. It is consistent within a session, so the LLM can refer back to the same entity.
The mapping between token and original value is stored in your local Vault (vault.db). It never leaves your machine.
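The token scheme can be approximated with a keyed hash truncated to eight hex characters. The sketch below is one possible construction and not the documented one: the use of HMAC-SHA256 with a random per-session key is an assumption, and the `SessionTokenizer` class with its in-memory `vault` dict is a hypothetical stand-in for vault.db.

```python
import hashlib
import hmac
import secrets
from typing import Dict


class SessionTokenizer:
    """Maps PII values to [TYPE_xxxxxxxx] tokens and back within one session."""

    def __init__(self) -> None:
        # Assumption: a fresh random key per session, so the same value yields
        # the same token inside a session but not across sessions.
        self._key = secrets.token_bytes(32)
        self.vault: Dict[str, str] = {}  # in-memory stand-in for vault.db

    def tokenize(self, pii_type: str, value: str) -> str:
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = f"[{pii_type}_{digest[:8]}]"
        self.vault[token] = value
        return token

    def restore(self, text: str) -> str:
        """Replace every known token in the text with its original value."""
        for token, value in self.vault.items():
            text = text.replace(token, value)
        return text
```

Since the same value always maps to the same token in a session, the model's reply can mention the token and `restore` can swap the original back in locally.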
Fail-Closed Guarantee
If any tier of the pipeline encounters an error, Ki! blocks the prompt entirely. There is no fallback to sending the unmasked original. The sidecar health check must pass before any prompt is processed.
This behaviour is testable: Ki! ships with a fail-closed verification test in the egress log — kill the sidecar process and observe that the next prompt is blocked, not sent.
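The fail-closed rule amounts to a simple control-flow pattern: any failure raises, and no code path returns the original prompt. The sketch below shows that pattern under assumed names (`PromptBlocked`, `run_pipeline`, and a `sidecar_healthy` callback). It does not reflect Ki!'s internal API.

```python
from typing import Callable, List

Tier = Callable[[str], str]


class PromptBlocked(Exception):
    """Raised instead of ever returning the unmasked prompt."""


def run_pipeline(prompt: str, tiers: List[Tier], sidecar_healthy: Callable[[], bool]) -> str:
    """Run every tier in order; block on a failed health check or any tier error."""
    if not sidecar_healthy():
        raise PromptBlocked("sidecar health check failed")
    masked = prompt
    for tier in tiers:
        try:
            masked = tier(masked)
        except Exception as exc:
            # No fallback: the caller gets an exception, never the original text.
            name = getattr(tier, "__name__", "tier")
            raise PromptBlocked(f"{name} failed") from exc
    return masked
```

A caller would send the return value to the cloud model only when `run_pipeline` completes. Killing the sidecar corresponds to `sidecar_healthy` returning False, which blocks the prompt before any tier runs.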