CloakPipe — Platform: Proxy, Vault, Policy, Audit

Surface 01 · Proxy

The Rust proxy.
Open source core.

A Rust-native reverse proxy that intercepts AI API calls, pseudonymizes outbound data, and rehydrates inbound responses in real-time — under 50 ms p95, including the full detection pipeline. Eight Rust crates compiled to a single native binary. Your application changes exactly one line: the base URL.

OpenAI-compatible API

Change your base URL from api.openai.com to your CloakPipe endpoint. No other code changes. Works with LangChain, LlamaIndex, CrewAI, curl, and any OpenAI-compatible client.

Streaming SSE rehydration

LLMs stream token-by-token. CloakPipe maintains a sliding window, detects pseudonymized tokens mid-stream, looks up real values from the vault, and splices them in — without buffering or breaking the SSE contract.

Multi-provider routing

Route to OpenAI, Anthropic, Google, AWS Bedrock, Azure OpenAI, or any self-hosted model via vLLM or Ollama. Apply different masking policies per provider — strict for closed models, lighter or bypassed for self-hosted.

Native MCP server

A Model Context Protocol server exposing mask_text, mask_file, unmask_in_context, and scan_directory tools. AI agents can call CloakPipe directly as a tool from Claude Desktop, Cursor, or any MCP client.

Pluggable, tiered detection pipeline

Detection is deliberately commoditized — CloakPipe does not lock you into a single model. The proxy runs a tiered pipeline where each tier escalates only as needed: what regex catches deterministically, the neural models never see; what one model misses, the ensemble catches. OpenAI Privacy Filter scores 96% on synthetic benchmarks but Tonic.ai measured 18–65% on real-world EHR notes, call transcripts, and loan contracts. No single model catches everything — which is why the pipeline tiers.

DETECTION PIPELINE · TIERED · CUMULATIVE < 50 MS P95

Catch by escalation, not by one model

Each request flows through the tiers in order. Cheap, deterministic checks fire first; neural detection escalates only for unstructured text and custom entities.

Tier 1 → Tier 4 cumulative · < 50 ms p95

T1

Regex + checksum

Cards (Luhn) · IBAN (mod-97) · SSN range · ABA routing · email · URL · IP. Rust-native.

< 1 ms

T2

Privacy Filter

OpenAI 1.5B (50M active) on ONNX. Apache 2.0, 128K ctx. Names, addresses, dates, secrets.

30–50 ms

T3

GLiNER2-PII

300M params, multilingual, zero-shot. Fires for custom entity types — MRNs, case IDs, policy numbers.

40–80 ms

T4

Ensemble

Run multiple backends and merge for maximum recall. For when missing one entity is unacceptable.

opt-in

Surface 02 · Vault

Reversible by design.
You hold the keys.

Why reversibility matters →

An encrypted tokenization vault that stores the mapping between real sensitive data and pseudonymized tokens. The vault is the source of truth — and the reason pseudonymization is reversible under policy, not a one-way redaction. Without a vault, data goes in and nothing useful comes out.

AES-256-GCM at rest

Every real value stored in the vault is encrypted with AES-256-GCM. Nothing real exists outside the vault — the clean prompt that reaches the provider contains zero original sensitive data.

Customer-managed keys

Bring your own keys via AWS KMS, GCP Cloud KMS, Azure Key Vault, or HashiCorp Vault Transit. Envelope encryption with customer-managed root keys — you never hand CloakPipe the keys to your customer's data.

Deterministic + format-preserving

The same input always produces the same token within a tenant's scope, preserving entity consistency across conversations and batches. Format-preserving FF1 (NIST SP 800-38G) keeps cards Luhn-valid and emails email-shaped.

Per-tenant isolation

Each customer gets their own vault namespace with their own encryption keys. No cross-tenant data access is possible at the cryptographic level.

TTL & session scoping

Tokens can expire after a single conversation, after a configurable TTL, or persist indefinitely for ongoing workflows. Scope the lifetime to the sensitivity of the workload.

Automatic key rotation

Encryption keys rotate on a configurable schedule without disrupting active tokens. Old tokens remain decryptable; new tokens use the latest key. No downtime, no migration.

Surface 03 · Policy

Code, not config.
Versioned in Git.

Every decision is logged →

A policy engine that defines what gets masked, for which models, for which teams, and who can unmask. Policies are code, versioned in Git, and enforced automatically on every request — backed by OPA (Open Policy Agent) or Cedar for sub-millisecond authorization decisions.

Entity policies

"Mask all patient names when routing to external models." "Block financial amounts from reaching any provider." "Allow internal model calls unmasked." Per entity type, per action.

Provider policies

Maximum masking for OpenAI, Anthropic, and Google. Bypass for self-hosted vLLM. Mask only financial data for Venice TEE models. Per provider, per masking level.

Team / role policies

"Legal: mask everything, no exceptions. Engineering: mask PII, allow code. Data science: unmasked access to internal models only." Per RBAC role, mapped from SAML / OIDC.

Context-based access (CBAC)

Unmasking decided at runtime from who is asking, their role, the data's sensitivity, and the workflow context. A sales agent cannot unmask medical records; a supervising physician can — but only during an active case review.

Custom entity definitions

Define domain-specific entity types via regex, keyword lists, or NER labels: medical record numbers, case docket IDs, insurance policy numbers, internal employee IDs — whatever your domain requires.

Policy-as-code

Defined in YAML, backed by OPA or Cedar. Every change is a versioned commit; every evaluation is an audit event. Policies are testable, reviewable, and deployable through standard CI/CD.

Example policy — patient data for external models

# rules — evaluated top to bottom on every request - name: mask_patient_data_for_external_models when: provider: [openai, anthropic, google] entity_type: [PERSON, DIAGNOSIS, MEDICATION, MRN] action: pseudonymize # self-hosted models stay inside the perimeter — pass through - name: allow_internal_models_unmasked when: provider: [internal-vllm, self-hosted] action: passthrough # only clinicians may reverse clinical entities — CBAC enforced - name: restrict_unmask_to_physicians when: action: unmask entity_type: [DIAGNOSIS, MEDICATION] require_role: [physician, clinical_admin]

Surface 04 · Audit

Every decision
logged. Never raw.

See the compliance posture →

A compliance and observability layer that records every privacy-relevant event, generates compliance evidence, and integrates with your existing monitoring stack. Audit logs never contain raw sensitive data — they record what types of data were processed and what actions were taken, not the values. The audit trail is itself privacy-safe.

What gets logged

Every request (timestamp, caller, source IP, destination provider), every detection event (entity types, confidence, model used), every masking action, every unmask request, and every policy evaluation — plus full latency metrics.

Never the raw values

The trail records that a DIAGNOSIS was masked for request req_8af9c2 — never the diagnosis itself. Evidence of control without becoming a second copy of the data you are protecting.

Compliance evidence on demand

Exportable evidence for HIPAA (PHI masked before processors), GDPR/DPDP (data minimization), SOC 2 Type II (access & encryption controls), EU AI Act (de-identification), and PCI-DSS (tokenized cardholder data).

OpenTelemetry-native

Structured traces, metrics, and logs from day one. Export to Datadog, Grafana, Splunk, Honeycomb, Prometheus, or any OTEL collector. Pre-built dashboards for detection rates, entity distribution, latency percentiles, and unmask patterns.

A privacy-safe trail, in practice

14:32:08 mask 7 entities · req_8af9c2 → gpt-5 healthcare-v3

14:32:09 detect T1 regex + T2 privacy-filter · F1 0.96 otel·trace

14:32:11 rehydrate 11 chunks · vault lookup 3.8 ms vault·prod

14:32:14 unmask deny role: sales · entity: MRN cbac-v1

14:32:18 unmask allow role: physician · DIAGNOSIS · case review cbac-v1

Architecture · The hot path

One request.
Under 50 ms.

Read the architecture docs →

Every prompt, response, and tool call passes the same per-request hot path: authenticate, evaluate policy, detect, pseudonymize and write to the vault, forward a clean prompt, stream back, rehydrate, and emit an audit event. All of it in Rust, transparently, with a sub-50 ms p95 target.

REQUEST FLOW · PER-REQUEST · < 50 MS TARGET

Auth → Policy → Detect → Vault → Forward → Rehydrate → Audit

Built on Rust / Axum / Tower with Tokio streams. Detection escalates through tiers; the vault makes the round-trip reversible; the audit log closes every request.

1 · Auth — JWT / mTLS verification at the proxy edge
2 · Policy — OPA evaluation: allow / deny plus per-entity rules
3 · Detection — T1 regex + checksum (<1ms) → T2 Privacy Filter (ONNX, 30–50ms) → T3 GLiNER2 for custom entities (optional)
4 · Pseudonymize + vault write — deterministic tokens encrypted at rest
5 · Forward — clean prompt to the LLM provider; zero real data leaves
6 · Stream + rehydrate — detect tokens in the SSE stream, vault lookup, authorize, splice real values back
7 · Audit — emit a privacy-safe event via OpenTelemetry, then return the response with real values restored

Core technology choices

Component	Technology	Why
Proxy runtime	Rust / Axum / Tower	Sub-millisecond overhead per request. Zero-cost abstractions. Memory safety without GC pauses.
ML inference	ONNX Runtime (ort)	Run Privacy Filter and GLiNER2 locally on CPU or GPU. No Python dependency in the hot path.
Tokenization	HF tokenizers (Rust)	Fast model-input preparation, shared across the detection pipeline.
Vault encryption	AES-256-GCM · AES-SIV · FF1	GCM for general encryption, SIV for deterministic tokens, FF1 (NIST SP 800-38G) for format-preserving values.
Key management	Vault Transit / cloud KMS	Envelope encryption. Customer-managed root keys. Automatic rotation.
Token registry	PostgreSQL (sqlx)	Deterministic token lookup with per-tenant isolation. Proven at scale.
Policy engine	OPA · Cedar	OPA: industry standard, Rego DSL, sub-millisecond decisions. Cedar: typed alternative for compile-time guarantees.
Observability	OpenTelemetry-rust	Traces, metrics, and logs over Tokio streams, exportable to any OTEL collector.

Deployment · Five topologies

Laptop to air-gapped.

See what each tier includes →

Same Rust binary. Same detection pipeline. Same vault encryption. Pick the topology that matches your security posture — from fully managed cloud to a fully offline air-gapped install with no network calls and no telemetry.

01

Managed cloud

Hosted at cloakpipe.co. Endpoint, dashboard, vault, audit logs all managed. Data processed in-transit only, never stored outside the vault. SOC 2 infrastructure.

SAAS

02

Docker

Single container or docker-compose on any Linux host. Customer controls all infrastructure, keys, and data. Open-source proxy plus a commercial license for platform features.

SINGLE BINARY

03

Kubernetes

Production Helm chart for K8s clusters. Horizontal autoscaling, rolling deploys, health checks — for teams running AI workloads on Kubernetes.

HELM · HPA

04

VPC / private

Deployed inside your AWS VPC, GCP VPC, or Azure VNet. No internet egress. CloakPipe engineering assists with deployment and configuration.

CUSTOMER CLOUD

05

Air-gapped

Fully offline. All detection models run via ONNX locally. No network calls. No telemetry. For defense, intelligence, and maximum-security healthcare environments.

OFFLINE · ONNX

Compliance · Posture

Built to be
deployable in audits.

Request compliance docs →

CloakPipe helps your AI application meet regulatory requirements — and the product itself meets the standards needed to be deployable in regulated environments. Each framework maps to what it unlocks for your customers.

Framework	CloakPipe status	What it enables for customers
SOC 2 Type II	In progress · Vanta	Cite CloakPipe's report in your own audits. Required for enterprise procurement in healthcare, finance, and legal.
HIPAA	BAA available	Demonstrate PHI is masked before reaching model providers. Fulfills the HIPAA de-identification safe harbor.
GDPR / DPDP	DPA template · Art. 25/32	Proof of data minimization. Answer data subject access and deletion requests from the audit trail. Data residency controls.
EU AI Act	High-risk · Aug 2026	Demonstrate personal data is de-identified before high-risk AI processing, with a human-oversight audit trail.
PCI-DSS	FPE tokenization	Process payment-related queries without exposing card numbers. No cardholder data stored in plaintext.
ISO 27001	Planned	Required for European and APAC enterprise procurement. ~70% control overlap with SOC 2.

Four surfaces.
One open core.

The Rust proxy.
Open source core.

OpenAI-compatible API

Streaming SSE rehydration

Multi-provider routing

Native MCP server

Pluggable, tiered detection pipeline

Reversible by design.
You hold the keys.

AES-256-GCM at rest

Customer-managed keys

Deterministic + format-preserving

Per-tenant isolation

TTL & session scoping

Automatic key rotation

Code, not config.
Versioned in Git.

Entity policies

Provider policies

Team / role policies

Context-based access (CBAC)

Custom entity definitions

Policy-as-code

Example policy — patient data for external models

Every decision
logged. Never raw.

What gets logged

Never the raw values

Compliance evidence on demand

OpenTelemetry-native

A privacy-safe trail, in practice

One request.
Under 50 ms.

Core technology choices

Laptop to air-gapped.

Built to be
deployable in audits.

Four surfaces.
One open core.

Four surfaces. One open core.

The Rust proxy.Open source core.

OpenAI-compatible API

Streaming SSE rehydration

Multi-provider routing

Native MCP server

Pluggable, tiered detection pipeline

Reversible by design.You hold the keys.

AES-256-GCM at rest

Customer-managed keys

Deterministic + format-preserving

Per-tenant isolation

TTL & session scoping

Automatic key rotation

Code, not config.Versioned in Git.

Entity policies

Provider policies

Team / role policies

Context-based access (CBAC)

Custom entity definitions

Policy-as-code

Example policy — patient data for external models

Every decisionlogged. Never raw.

What gets logged

Never the raw values

Compliance evidence on demand

OpenTelemetry-native

A privacy-safe trail, in practice

One request.Under 50 ms.

Core technology choices

Laptop to air-gapped.

Built to bedeployable in audits.

Four surfaces. One open core.

Four surfaces.
One open core.

The Rust proxy.
Open source core.

Reversible by design.
You hold the keys.

Code, not config.
Versioned in Git.

Every decision
logged. Never raw.

One request.
Under 50 ms.

Built to be
deployable in audits.

Four surfaces.
One open core.