Intelligence is hot-swappable.
Why settle for one-size-fits-all security? The same proprietary eBPF engine that runs the Exfiltration Gate now hot-swaps its on-host SLM at the ioctl boundary: HIPAA, LATAM Health, DoD, Sovereign AI, Robotics, PII, Code Sentinel, or your own LoRA. Local SLM Governance with sub-2 ms context-aware latency. Zero downtime. Air-gap delivered.
The rack-mount.
Eight cartridges. One active at a time. Tap a bay to hot-swap the persona that scores every suspect ioctl on the host.
Most AI security tools ship a single, frozen detection model. Arca.Vision treats the SLM as a cartridge: the kernel boundary stays put, the on-host inference gate stays put, and the persona — the thing that knows what your sector counts as a leak — slides in or out. PHI signals are not PCI signals are not weapons-systems CUI. The persona changes; the engine does not.
// interactive · tap a bay to degauss-swap the active cartridge · LED indicates load state
// all 8 cartridges plotted against the 2.0 ms context-aware budget · live cursor sweeps the gauge
Context-aware latency.
Even with complex domain personas, the gate decision stays under 2 ms at p99.
Every cartridge ships as a Q4_K_M GGUF (or a LoRA over a shared base model) sized to fit a single sys_enter_ioctl budget. Stage 1 — the kernel-side heuristic — runs in sub-microsecond time and only forwards qualifying ioctls to the persona. Stage 2 — the SLM verdict — is greedy-decoded on host, with the result back in the gate’s decision loop before the next ring-buffer slot lands.
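The two-stage flow above can be sketched in Rust. This is an illustrative model only, not the shipping engine: `IoctlEvent`, `SUSPECT_CMD`, and the closure standing in for the SLM verdict are all hypothetical names, and in the real product Stage 1 runs inside eBPF, not in userland.

```rust
#[derive(Debug, Clone, Copy)]
struct IoctlEvent {
    cmd: u32,           // ioctl request number
    payload_len: usize, // bytes crossing the boundary
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Allow,
    Block,
}

/// Stage 1: the cheap kernel-side heuristic. Only qualifying ioctls
/// (here: a watched cmd moving a large payload) reach the persona at all.
fn stage1_suspect(ev: &IoctlEvent) -> bool {
    const SUSPECT_CMD: u32 = 0xc020_4621; // hypothetical cmd of interest
    ev.cmd == SUSPECT_CMD && ev.payload_len > 4096
}

/// Stage 2: the persona's verdict. The closure stands in for a
/// greedy-decoded SLM call returning a leak score in [0, 1].
fn gate(ev: &IoctlEvent, persona: impl Fn(&IoctlEvent) -> f32) -> Verdict {
    if !stage1_suspect(ev) {
        return Verdict::Allow; // benign traffic never pays the SLM cost
    }
    if persona(ev) > 0.5 {
        Verdict::Block
    } else {
        Verdict::Allow
    }
}

fn main() {
    // Toy persona: flag any forwarded copy larger than 1 MiB.
    let persona = |ev: &IoctlEvent| if ev.payload_len > (1 << 20) { 0.9 } else { 0.1 };
    let small = IoctlEvent { cmd: 0, payload_len: 64 };          // fails Stage 1
    let big = IoctlEvent { cmd: 0xc020_4621, payload_len: 2 << 20 }; // reaches Stage 2
    assert_eq!(gate(&small, persona), Verdict::Allow);
    assert_eq!(gate(&big, persona), Verdict::Block);
}
```

The point of the split is that the per-event cost is dominated by Stage 1; Stage 2's latency budget only applies to the small fraction of ioctls the heuristic forwards.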
Three things that make
hot-swap actually safe.
The Persona Switchboard isn't model-routing in userspace. It's a kernel-adjacent inference gate orchestrated by our Rust control plane, with strict memory isolation between observer and observed.
Aya-orchestrated load · zero downtime
SLM weights are orchestrated by our Rust-based control plane and served to the local inference gate over a loopback channel. Hot-swap is a pointer flip, not a process restart — the eBPF probes never detach. We've validated swap latency below the 2 ms context-aware budget across all eight cartridges.
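"A pointer flip, not a process restart" can be sketched with a single atomic store. A minimal sketch under stated assumptions: `Switchboard` and its eight pre-staged cartridge handles are hypothetical stand-ins, not the Aya control plane's actual types.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Eight pre-staged cartridges; `active` indexes the live one.
struct Switchboard {
    cartridges: [&'static str; 8], // stand-ins for loaded persona handles
    active: AtomicUsize,
}

impl Switchboard {
    /// Hot-swap is one atomic store. Probes never detach; in-flight
    /// verdicts finish on whichever index they already read.
    fn swap(&self, bay: usize) {
        self.active.store(bay % 8, Ordering::Release);
    }

    fn active(&self) -> &'static str {
        self.cartridges[self.active.load(Ordering::Acquire)]
    }
}

fn main() {
    let sb = Switchboard {
        cartridges: ["HIPAA", "LATAM Health", "DoD", "Sovereign AI",
                     "Robotics", "PII", "Code Sentinel", "Custom"],
        active: AtomicUsize::new(0),
    };
    assert_eq!(sb.active(), "HIPAA");
    sb.swap(5); // pointer flip: no restart, no detach
    assert_eq!(sb.active(), "PII");
}
```

Because the swap is a store on an already-loaded slot, its cost is independent of model size; the expensive part (staging weights into the bay) happens before the flip.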
Strict memory isolation · observer ≠ observed
Each persona runs in a strictly isolated memory space, separated from the workload the Sentry is governing. The observer cannot be compromised by the observed: the SLM lives behind the same kernel boundary as the rest of the gate, and a compromised CUDA process has no read/write path to the persona weights.
Model agnostic · GGUF, ExLlamaV2, LoRA
Cartridges ship as signed bundles. Out of the box we support llama-cpp-2 (GGUF, the default), ExLlamaV2 for higher-throughput personas, and LoRA adapters that reuse a base model already loaded on the host. All inference is local — no cloud round-trip, air-gap deployments are first-class.
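The three supported cartridge formats map naturally onto an enum with one loader path per backend. A sketch only: `CartridgeFormat` and `loader_for` are hypothetical names, and the real bundles also carry signature metadata not modeled here.

```rust
/// The three backends a signed cartridge bundle can declare.
enum CartridgeFormat {
    Gguf { quant: String },      // llama-cpp-2 default, e.g. Q4_K_M
    ExLlamaV2,                   // higher-throughput personas
    Lora { base_model: String }, // adapter over an already-loaded base
}

/// Dispatch a cartridge to its inference backend (names illustrative).
fn loader_for(fmt: &CartridgeFormat) -> String {
    match fmt {
        CartridgeFormat::Gguf { quant } => format!("llama-cpp-2 ({quant})"),
        CartridgeFormat::ExLlamaV2 => "exllamav2".to_string(),
        CartridgeFormat::Lora { base_model } => format!("lora over {base_model}"),
    }
}

fn main() {
    let hipaa = CartridgeFormat::Gguf { quant: "Q4_K_M".to_string() };
    assert_eq!(loader_for(&hipaa), "llama-cpp-2 (Q4_K_M)");
    let custom = CartridgeFormat::Lora { base_model: "shared-base".to_string() };
    assert_eq!(loader_for(&custom), "lora over shared-base");
}
```

The LoRA variant is the cheap path: it reuses a base model already resident on the host, so only the adapter weights cross the air gap.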
Eight cartridges.
One eBPF kernel.
One cartridge per regulated sector — plus an empty slot you can fill with your own LoRA. Click a cartridge to read the use case.
Tuned for PHI patterns, ICD-10 codes, NPI numbers, and EHR exfil signatures. Catches PHI smuggling at the cudaMemcpyDeviceToHost boundary before the bytes leave VRAM.
Bilingual PHI scoring with Mexican CURP, Brazilian CPF, and LATAM clinical record templates baked into the prompt. Air-gap-deployed across MX, BR, AR, and CO health networks.
Air-gapped LoRA fine-tuned on classified-handling guidance. Recognizes CUI markings, weapons-system telemetry, and cleared-contractor egress patterns. Signed by Arca engineering.
Sovereign-cloud aware. Watches citizen PII patterns across SSA, IRS, DMV, and EU eIDAS attestation chains. Designed for federal civilian and StateRAMP deployments.
Functional-safety persona. Flags actuator-bound ioctls, watchdog skips, and CAN-bus exfil patterns. Sub-millisecond decision budget — the safety loop cannot wait.
Card numbers, SSN, IBAN, BIK, account routing, and global financial PII patterns. The default cartridge for trading desks, fintech, and any cluster touching customer data.
Embedding-aware persona. Detects model-weight dumps, proprietary algorithm leakage, and source-code exfil — the failure mode that loses you the company, not the lawsuit.
Empty cartridge slot. Co-design with our engineers using your audit logs, your regulator language, your domain. Signed bundle, air-gap delivered.
Two personas.
One multi-tenant cluster.
A scenario you can run today: a single GPU fleet serving regulated tenants under different rule sets, with a different persona enforced per namespace.
Dynamic Compliance & Shielding
A multi-tenant H100 cluster serves Healthcare workloads in Namespace-A and Finance workloads in Namespace-B from the same physical fleet.
The Sentry loads HIPAA Guardian on Namespace-A and PII Redactor on Namespace-B from the same kernel attach. Each tenant gets policy enforcement that understands intent — not just regex. Hallucinated data leaks die at the cudaMemcpyDeviceToHost boundary, before bytes ever leave VRAM.
Namespace-A · HIPAA Guardian
PHI exfil scoring · HIPAA §164.312 mapping
Namespace-B · PII Redactor
PCI-DSS / GDPR / CCPA pattern scoring
Shared kernel · isolated personas
Two SLMs · one Sentry · zero cross-tenant leakage
Hot-swap on policy change
No agent restart · no Helm rollover · no downtime
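The per-namespace routing in this scenario reduces to a lookup from tenant namespace to persona at verdict time. A minimal sketch, assuming each forwarded ioctl carries its tenant's namespace; the map, the default fallback, and the function names are illustrative, not the Sentry's API.

```rust
use std::collections::HashMap;

fn main() {
    // One kernel attach, one map: namespace -> enforced persona.
    let mut by_namespace: HashMap<&str, &str> = HashMap::new();
    by_namespace.insert("Namespace-A", "HIPAA Guardian");
    by_namespace.insert("Namespace-B", "PII Redactor");

    // Every suspect ioctl is tagged with its namespace; the gate
    // looks up which persona scores it. Unmapped tenants fall back
    // to the default cartridge (PII Redactor in this sketch).
    let persona = |ns: &str| by_namespace.get(ns).copied().unwrap_or("PII Redactor");

    assert_eq!(persona("Namespace-A"), "HIPAA Guardian");
    assert_eq!(persona("Namespace-B"), "PII Redactor");
    assert_eq!(persona("Namespace-C"), "PII Redactor"); // fallback
}
```

A policy change is then an entry update in this map plus a hot-swap of the staged cartridge, which is why no agent restart or Helm rollover is needed.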
Pick a cartridge.
Or co-design a custom LoRA with us.
Every persona is signed and air-gap delivered. Custom cartridges are built with our engineering team using your audit logs and your regulator language — usually 4–6 weeks from kickoff to a signed bundle on the host.