Detection Tiers

Ki!'s privacy engine runs entirely on your device. Detection runs through eight tiers in sequence: each tier catches what the earlier ones missed, with increasing sophistication and cost. Every tier completes before the prompt is sent.

Pipeline Overview

Your prompt
    │
    ▼
┌─────────────────────────────────────────────────────┐
│  Tier 0.1 — Base64 / JWT Evasion Decoder            │
│  Tier 0   — Dictionary (10,000+ names)              │
│  Tier 0.5 — Entropy Scanner (API keys, secrets)     │
│  Tier 1   — Rule Engine (50+ structured formats)    │
│  Tier 1.1 — Phone Shield (libphonenumber)           │
│  Tier 1.2 — Address Shield (EN/FR/ES/DE)            │
│  Tier 1.5 — Greeting / Signature Context            │
│  Tier 2   — Local SLM NER (Sovereign)               │
└─────────────────────────────────────────────────────┘
    │
    ▼
Masked prompt → Cloud LLM

Tier 0.1 — Base64 & JWT Evasion

Decodes Base64-encoded strings and JWT payloads before scanning. This closes the most common evasion vector: PII hidden inside encoded blobs (e.g., a Base64-encoded CSV or a JWT containing an email claim) passes through naive regex scanners undetected. Ki! decodes, scans, and re-encodes before the downstream tiers run.
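The decode step can be sketched as follows. This is a minimal illustration, not Ki!'s implementation: the function names, the 16-character minimum run length, and the choice to accept both the standard and URL-safe alphabets are assumptions. It covers only the decoding; scanning and re-encoding are left out.

```python
import base64
import binascii
import json
import re
from typing import List, Optional

# Candidate Base64 / Base64url runs. The 16-character minimum is an assumed cut-off.
B64_RUN = re.compile(r"[A-Za-z0-9+/_-]{16,}={0,2}")

def b64_decode_text(blob: str) -> Optional[str]:
    """Return the decoded UTF-8 text, or None if blob is not Base64-encoded text."""
    blob = blob.rstrip("=")
    padded = blob + "=" * (-len(blob) % 4)  # JWT segments omit padding, so restore it
    try:
        return base64.urlsafe_b64decode(padded).decode("utf-8")
    except (binascii.Error, ValueError):
        return None

def jwt_claims(token: str) -> Optional[dict]:
    """Decode the payload (second segment) of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    payload = b64_decode_text(parts[1])
    if payload is None:
        return None
    try:
        claims = json.loads(payload)
    except json.JSONDecodeError:
        return None
    return claims if isinstance(claims, dict) else None

def decoded_views(prompt: str) -> List[str]:
    """Return the decoded text of every Base64-looking run, ready for the later tiers to scan."""
    views = []
    for match in B64_RUN.finditer(prompt):
        text = b64_decode_text(match.group())
        if text is not None:
            views.append(text)
    return views
```

Because JWT segments are separated by dots, `decoded_views` also picks up a JWT's header and payload as individual runs; `jwt_claims` is only needed when the claims are wanted as a dictionary.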

Tier 0 — Dictionary (Instant)

A high-speed dictionary of 10,000+ common names using greedy longest-match. This is the fastest tier — pure in-memory lookup with no regex or AI overhead. It fires on full names, surnames, and common given names across EN/FR/ES/DE name corpora.

Any term you add to your custom dictionary in Settings fires here, before any network call.
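Greedy longest-match can be illustrated with a short sketch. The sample entries, the tokenisation rule, and the function name below are hypothetical; the real dictionary and its data structure are not described here.

```python
import re

# Hypothetical sample entries, stored as tuples of lowercase words.
NAMES = {("sophie",), ("sophie", "müller"), ("jean-baptiste", "dubois"), ("chen",)}
MAX_WORDS = max(len(entry) for entry in NAMES)

# A word is a run of letters, optionally hyphenated (e.g. "Jean-Baptiste").
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")

def find_names(text):
    """Return (start, end) character spans, preferring the longest entry at each position."""
    tokens = [(m.group().lower(), m.start(), m.end()) for m in WORD.finditer(text)]
    spans = []
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate first, then shrink.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            if tuple(t[0] for t in tokens[i:i + n]) in NAMES:
                matched = n
                break
        if matched:
            spans.append((tokens[i][1], tokens[i + matched - 1][2]))
            i += matched  # skip past the match so spans never overlap
        else:
            i += 1
    return spans
```

With these entries, "Kind regards, Sophie Müller" yields one span covering "Sophie Müller" rather than a shorter match on "Sophie" alone.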

Tier 0.5 — Entropy Scanner

Shannon entropy scoring on every token in the prompt. Strings above the entropy threshold (e.g., ghp_aBcD1234XyZ..., sk-live-..., Base64-encoded keys) are flagged as high-confidence credentials even if they match no known pattern.

Catches: API keys, bearer tokens, private keys, long random identifiers.
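As a rough sketch of the scoring, Shannon entropy in bits per character can be computed from the character frequencies of a token. The threshold and minimum length below are assumed values for illustration only; Ki!'s actual settings are not stated here.

```python
import math
from collections import Counter

def shannon_entropy(token: str) -> float:
    """Entropy, in bits per character, of the token's character distribution."""
    if not token:
        return 0.0
    total = len(token)
    counts = Counter(token)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Assumed values, chosen only to make the example concrete.
MIN_LENGTH = 20
THRESHOLD_BITS = 3.5

def looks_like_secret(token: str) -> bool:
    """Flag long tokens whose characters are close to uniformly distributed."""
    return len(token) >= MIN_LENGTH and shannon_entropy(token) >= THRESHOLD_BITS
```

A token made of one repeated character scores 0 bits, while a token of four distinct characters scores exactly 2 bits per character. The length floor matters because short strings cannot reach a high score regardless of how random they are.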

Tier 1 — Rule Engine

Deterministic regex rules with mathematical validation:

Type	Validation method
Email	RFC 5321 syntax
IBAN	MOD-97 checksum
Credit card	Luhn algorithm
SSN (US)	Format + range check
PESEL, NIN, BSN, DNI, CPF…	Country-specific checksum
Phone	Passed to Tier 1.1
Street address	Passed to Tier 1.2

50+ national ID formats are covered across EU, US, UK, Brazil, and APAC jurisdictions.
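Two of the checks named in the table, Luhn and MOD-97, are standard algorithms and can be sketched as below. The function names and the pre-filtering (stripping spaces, the 15 to 34 character length range) are assumptions for the example, and a production rule would also check country-specific IBAN lengths.

```python
def luhn_valid(number: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0

def iban_valid(iban: str) -> bool:
    """MOD-97 check: move the first four characters to the end, map letters to numbers, test mod 97."""
    s = iban.replace(" ", "").upper()
    if not (15 <= len(s) <= 34) or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    # int(ch, 36) maps 0-9 to 0-9 and A-Z to 10-35, as the IBAN scheme requires.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

The point of the validation step is that a regex alone would accept any 16-digit run as a card number; the checksum rejects most digit sequences that merely have the right shape.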

Tier 1.1 — Phone Shield

Phone numbers are validated using libphonenumber rather than regex alone. This eliminates false positives on digit sequences (order numbers, product codes, timestamps) that match naive phone patterns but fail international validation. Only numbers that parse as valid phone numbers for at least one supported country code are masked.

Tier 1.2 — Address Shield

A heuristic street address parser recognising EN, FR, ES, and DE address formats. Catches “23 rue de Rivoli, Paris 75001” and “4200 Wilson Blvd, Arlington VA” using structural grammar, not just keyword matching.

Tier 1.5 — Greeting & Signature Context

Detects names in salutations, sign-offs, and introductory phrases that the dictionary might miss:

“My name is Jean-Baptiste Dubois”
“Kind regards, Sophie Müller”
“Hi Dr. Chen, as discussed…”

Trigger phrases activate a context window that treats the following proper noun as a probable name regardless of dictionary coverage.
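One way to picture the trigger-plus-context-window idea is a regex with a case-insensitive trigger followed by an optional title and up to three capitalised words. The trigger list, the title list, and the three-word window are all assumptions made for this sketch, not Ki!'s actual rules.

```python
import re

# Hypothetical trigger phrases and titles.
TRIGGER = r"(?:my name is|kind regards,|best regards,|hi|hello|dear)"
TITLE = r"(?:(?:Dr|Mr|Mrs|Ms|Prof)\.?\s+)?"
# One to three capitalised words; hyphens and apostrophes are allowed inside a word.
NAME = r"[A-ZÀ-ÖØ-Þ][\w'-]+(?:\s+[A-ZÀ-ÖØ-Þ][\w'-]+){0,2}"

# Only the trigger is case-insensitive; the name must start with a capital letter.
PATTERN = re.compile(rf"\b(?i:{TRIGGER})\s+{TITLE}({NAME})")

def context_names(text: str):
    """Return the proper nouns that directly follow a trigger phrase."""
    return [m.group(1) for m in PATTERN.finditer(text)]
```

Applied to the three examples above, this sketch returns "Jean-Baptiste Dubois", "Sophie Müller", and "Chen" respectively, without consulting any dictionary.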

Tier 2 — Local SLM NER (Sovereign)

Available on the Sovereign tier. A small language model (SLM) runs locally — no network call — and performs semantic Named Entity Recognition over the full prompt. This catches entities that all previous tiers missed: unusual names, contextual PII, ambiguous references.

The SLM runs inside a sandboxed sidecar process. If it fails or times out (> 5 seconds), Ki! blocks the prompt (fail-closed) rather than sending unmasked text.
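The fail-closed timeout can be sketched in a few lines. This illustration runs the detector in a worker thread rather than a sandboxed sidecar process, and `PromptBlocked`, `run_fail_closed`, and the `detector` callable are hypothetical names; only the 5-second limit comes from the text above.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

class PromptBlocked(Exception):
    """Raised instead of ever returning unmasked text."""

def run_fail_closed(detector, prompt, timeout_s=5.0):
    """Run detector(prompt); any failure or timeout blocks the prompt."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(detector, prompt)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise PromptBlocked("detector timed out") from None
    except Exception as exc:
        raise PromptBlocked(f"detector failed: {exc}") from exc
    finally:
        # Do not wait for a hung worker; the caller has already been blocked.
        pool.shutdown(wait=False)
```

The key design point is that there is no code path that returns the original prompt: the caller gets either the detector's result or an exception.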

Token Format

Every detected value is replaced with a deterministic token:

[TYPE_xxxxxxxx]

TYPE is the PII category (PERSON, EMAIL, IBAN, PHONE, SSN, ADDRESS, CREDENTIAL, CUSTOM, …)
xxxxxxxx is an 8-character hex hash derived from the original value — consistent within a session so the LLM can refer back to the same entity

The mapping between token and original value is stored in your local Vault (vault.db). It never leaves your machine.
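A token of this shape could be produced as sketched below. The text above only says the hash is derived from the original value and is consistent within a session; using HMAC-SHA256 with a random per-session key, and the `Tokenizer` class with its in-memory `vault` dictionary standing in for vault.db, are assumptions of this example.

```python
import hashlib
import hmac
import secrets

class Tokenizer:
    """Maps PII values to [TYPE_xxxxxxxx] tokens, stable within one session."""

    def __init__(self):
        # Assumed: a fresh random key per session, so tokens differ between sessions.
        self._session_key = secrets.token_bytes(32)
        self.vault = {}  # in-memory stand-in for the local vault.db

    def token_for(self, pii_type: str, value: str) -> str:
        digest = hmac.new(self._session_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = f"[{pii_type}_{digest[:8]}]"  # keep the first 8 hex characters
        self.vault[token] = value             # remember the mapping for unmasking
        return token
```

Calling `token_for` twice with the same type and value on one `Tokenizer` returns the same token, which is what lets the LLM refer back to the same entity across a conversation.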

Fail-Closed Guarantee

If any tier of the pipeline encounters an error, Ki! blocks the prompt entirely. There is no fallback to sending the unmasked original. The sidecar health check must pass before any prompt is processed.

This behaviour is testable. Ki! ships with a fail-closed verification test: kill the sidecar process, then check the egress log to confirm that the next prompt is blocked, not sent.