Attestable, per-tenant-isolated agent memory over MCP. Isolation is a sealed compartment, not a WHERE-filter. Inspect the wall, then check the countersignature.
mem0, Zep and Supermemory scope tenants by metadata — a tenant_id / user_id / group_id filter over one shared store. There is no bulkhead: both tenants sit in the same hull. When a scope is misconfigured or forgotten, the other tenant’s data is still physically present in the same index. It doesn’t fail loudly. It floods.
One gateway, one shared userId, a healthcare-clinic-SOP agent and a consumer personal-assistant agent behind it. Ask the personal assistant “who do I contact after hours?” and the clinic’s marker — ‘Dr. Alvarez, 555-0142’ — floods back into the wrong compartment. On a shared store, CTRR = 100.0%. FLOODED.
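The flood described above can be reproduced with a toy model. This is a minimal sketch, not the code of mem0, Zep or Supermemory: `SharedStore`, `CompartmentStore`, the `gateway-user` id, the `tid-*` names and the personal-assistant note are hypothetical, and substring matching stands in for vector search.

```python
class SharedStore:
    """Toy shared store: every row lives in one list, scoped only by a filter."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, str]] = []  # (user_id, text)

    def remember(self, user_id: str, text: str) -> None:
        self.rows.append((user_id, text))

    def recall(self, query: str, user_id: str) -> list[str]:
        # Isolation here is only a comparison over rows that share one list.
        q = query.lower()
        return [t for uid, t in self.rows if uid == user_id and q in t.lower()]


class CompartmentStore:
    """Toy isolated store: one separate list per tenant, no shared index."""

    def __init__(self) -> None:
        self.compartments: dict[str, list[str]] = {}

    def remember(self, tid: str, text: str) -> None:
        self.compartments.setdefault(tid, []).append(text)

    def recall(self, query: str, tid: str) -> list[str]:
        # Only this tenant's list is ever searched.
        q = query.lower()
        return [t for t in self.compartments.get(tid, []) if q in t.lower()]


MARKER = "After hours, contact Dr. Alvarez, 555-0142"
PERSONAL = "After hours, contact your sister"  # made-up personal-assistant note

# Misconfiguration from the text: both agents sit behind one gateway userId.
GATEWAY_USER = "gateway-user"
shared = SharedStore()
shared.remember(GATEWAY_USER, MARKER)    # written by the clinic agent
shared.remember(GATEWAY_USER, PERSONAL)  # written by the personal assistant
flooded = shared.recall("after hours", GATEWAY_USER)

# Same writes, but each agent's compartment is selected by its own tid.
sealed = CompartmentStore()
sealed.remember("tid-clinic", MARKER)
sealed.remember("tid-assistant", PERSONAL)
isolated = sealed.recall("after hours", "tid-assistant")
```

In this model `flooded` contains the clinic marker even though the filter was applied exactly as configured, while `isolated` holds only the assistant's own note.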
Tenant identity is never a request parameter. It is a signed tid claim inside the OAuth 2.1 access token — no MCP tool accepts a tenant argument, so a caller can only ever touch its own compartment. Any body or header tenant field is ignored and logged. Under that identity, three independent enforced layers form the hull:
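The rule that tenant identity comes only from the token can be sketched as a small resolver. This is an illustration under assumptions: `resolve_tenant`, the list of tenant-like field names and the header names are hypothetical, and the claims are assumed to come from a token that was already fully verified.

```python
import logging
from typing import Any, Mapping

log = logging.getLogger("tenant")

# Field names a caller might try to use to pick a tenant (illustrative list).
TENANT_LIKE_FIELDS = ("tenant", "tenant_id", "tid", "user_id", "group_id")
TENANT_LIKE_HEADERS = ("x-tenant", "x-tenant-id")


def resolve_tenant(verified_claims: Mapping[str, Any],
                   body: Mapping[str, Any],
                   headers: Mapping[str, str]) -> str:
    """Return the tenant from the verified token claims only.

    `verified_claims` must come from a token whose signature, audience,
    issuer and expiry were checked elsewhere; this function does not verify.
    """
    tid = verified_claims.get("tid")
    if not isinstance(tid, str) or not tid:
        raise PermissionError("token carries no tid claim")

    # Tenant-like fields in the body or headers never influence the result;
    # they are only logged as a possible spoofing attempt.
    for name in TENANT_LIKE_FIELDS:
        if name in body:
            log.warning("ignored body field %s=%r (tid=%s)", name, body[name], tid)
    for name in headers:
        if name.lower() in TENANT_LIKE_HEADERS:
            log.warning("ignored header %s (tid=%s)", name, tid)
    return tid
```

For example, `resolve_tenant({"tid": "B"}, {"user_id": "A"}, {})` returns `"B"` and logs the ignored `user_id`.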
Each tenant gets its own Postgres schema (t_<sha256[:16]>) owned by a per-tenant NOLOGIN role, with its own tables, own HNSW index and own tsvector. Every op runs under SET ROLE with GRANTs scoped to that schema only. A cross-schema read raises InsufficientPrivilege at the database — there is no shared index to filter.
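The `t_<sha256[:16]>` naming can be sketched as follows. The text does not say exactly what is hashed, so this assumes SHA-256 over the UTF-8 tenant id, hex-encoded and truncated; `tenant_schema` and `scoped_statements` are hypothetical helpers, and the role is assumed to share the schema's name.

```python
import hashlib
import re


def tenant_schema(tid: str) -> str:
    """Derive a schema name of the form t_<sha256[:16]> from a tenant id.

    Assumption: SHA-256 over the UTF-8 tenant id, hex digest, first 16 chars.
    """
    digest = hashlib.sha256(tid.encode("utf-8")).hexdigest()
    return f"t_{digest[:16]}"


def scoped_statements(tid: str) -> list[str]:
    """SQL a session might issue before touching tenant data (illustrative)."""
    schema = tenant_schema(tid)
    # The name is built only from [0-9a-f], so it is safe to interpolate as an
    # identifier; this check guards that invariant.
    if not re.fullmatch(r"t_[0-9a-f]{16}", schema):
        raise ValueError("unexpected schema name")
    return [
        f'SET ROLE "{schema}"',            # assumes the role shares the schema's name
        f'SET search_path TO "{schema}"',  # unqualified names resolve here only
    ]
```

Because the name is a pure function of the tenant id, the same token always maps to the same compartment, and no caller-supplied string ever reaches the SQL.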
A FORCE RLS policy binds even the table owner to tenant = current_setting('app.tenant'). Defence-in-depth backstop — never the boundary, always the second wall behind the first.
Memory content is AES-256-GCM ciphertext under a per-tenant DEK with the tenant id as AAD, wrapped by a per-tenant KEK in a self-hosted KMS (Vault Transit / OpenBao). Even a forced logical bypass returns bytes that decrypt to garbage without the peer tenant’s key — provable by a fault-injection probe. Embeddings stay plaintext inside the tenant’s own schema so ANN + BM25 still work.
↤ NO SHARED HULL · NOTHING CROSSES THE AIR-GAP ↦
verify_isolation() runs a live adversarial breach test between two compartments — it writes a secret into COMPARTMENT_A, then as tenant B attempts recall plus a forged body {user_id:A}, and asserts cross_read_results == 0. It returns a compact Ed25519-signed detached-JWS manifest binding {tid → store_id → kms_key_id → schema → policy_hash → probe_result_hash → git_sha → ts}, with a conclusive flag so no probe can false-green on 0/0.
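The role of the conclusive flag can be illustrated with a small classifier. This is a sketch, not the verify_isolation() implementation: `ProbeResult`, its fields other than `cross_read_results`, and `evaluate` are hypothetical names, and the positive-control read is an assumption about how a 0/0 false-green might be ruled out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    secret_written: bool      # the secret was stored in compartment A
    control_read_hits: int    # tenant A recalling its own secret (positive control)
    cross_read_attempts: int  # recall attempts made as tenant B
    cross_read_results: int   # attempts that returned A's secret


def evaluate(probe: ProbeResult) -> dict:
    """Classify a probe run as isolated, breached, or inconclusive."""
    # A run proves something only if the secret existed, was findable by its
    # owner, and tenant B actually tried to read it. Otherwise "0 results"
    # is the vacuous 0/0 case and must not be reported as a pass.
    conclusive = (
        probe.secret_written
        and probe.control_read_hits > 0
        and probe.cross_read_attempts > 0
    )
    isolated = conclusive and probe.cross_read_results == 0
    return {
        "conclusive": conclusive,
        "isolated": isolated,
        "cross_read_results": probe.cross_read_results,
    }
```

A run in which nothing was written and nothing was attempted yields zero cross-reads but is reported as inconclusive rather than green.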
The stamp IS the signature — the seal’s serial (register) is derived deterministically from the manifest.
UNCERTIFIED — signature not checked. Press Countersign to run a real WebCrypto Ed25519 verification of the manifest.
Or edit the tid value in the manifest directly, then re-Countersign — the serial voids.
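One way a serial could be derived deterministically from the manifest, so that editing tid changes it, is sketched here. The canonicalization (sorted-key compact JSON), the truncated SHA-256 and every field value are assumptions for illustration; the actual derivation and the Ed25519 signature check are not reproduced.

```python
import hashlib
import json


def canonical_bytes(manifest: dict) -> bytes:
    """Serialize with sorted keys and no whitespace so equal manifests hash equally."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal_serial(manifest: dict) -> str:
    """Hypothetical serial: first 12 hex chars of SHA-256 over the canonical bytes."""
    return hashlib.sha256(canonical_bytes(manifest)).hexdigest()[:12].upper()


# Placeholder values; field names follow the manifest description in the text.
manifest = {
    "tid": "tenant-b",
    "store_id": "store-1",
    "kms_key_id": "kek-1",
    "schema": "t_0000000000000000",
    "policy_hash": "p0",
    "probe_result_hash": "r0",
    "git_sha": "abc1234",
    "ts": "2025-01-01T00:00:00Z",
    "conclusive": True,
}

original = seal_serial(manifest)
# Editing a single field, here tid, yields a different serial.
tampered = seal_serial({**manifest, "tid": "tenant-a"})
```

Key order does not affect the result, since the bytes are canonicalized before hashing.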
REPRODUCIBILITY — The strength is reproducibility, not our word: run verify_isolation against your OWN two compartments, re-check the signature against our published JWKS, and hand the signed manifest to your auditor.
HONESTY — This control performs a real WebCrypto Ed25519 sign-and-verify over the manifest bytes and fetches the real production JWKS live (real kid/x shown left, imported via crypto.subtle). It is labelled a demonstration of the verify_isolation() flow: the runnable signature is computed with an in-page keypair, not the production private key — so it never implies a signature it did not compute. Where the network is unavailable, the JWKS panel is clearly marked offline.
CrossTalk is a mechanism-agnostic, MCP-native benchmark harness where ONE parametrized test file turns RED on the shared store and GREEN on hunta-isolated from the same driver. Hermetic CI — pinned mem0 with infer=False, in-memory Qdrant, one deterministic offline embedder for every backend — so the only variable is isolation architecture. No network, no keys, no LLM.
METHOD: Results are reported in two separate columns, correct usage and the #3998 misconfiguration, so there is no strawmanning. Membership-inference AUC (0.5 = no leak, 1.0 = full leak) tracks the #5439 entity-merge ranking side-channel. This section describes method and reproduction; production results have not yet been measured.
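For readers unfamiliar with the AUC scale, a minimal pairwise computation is sketched below. It assumes the attacker assigns each probed item a score and that higher scores mean "more likely present in the other tenant's memory"; `membership_auc` is a hypothetical helper, not the CrossTalk implementation.

```python
def membership_auc(member_scores: list[float], non_member_scores: list[float]) -> float:
    """AUC as the probability that a member outranks a non-member.

    Computed pairwise (Mann-Whitney U divided by n_pos * n_neg); ties count one half.
    """
    if not member_scores or not non_member_scores:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))
```

If the scores carry no information about membership, members and non-members are ranked interchangeably and the value sits at 0.5; if every member outranks every non-member, it reaches 1.0.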
FAIRNESS — Fairness is the moat: pre-registered methodology, dual judges (Claude + GPT), published raw transcripts, correct-usage vs misconfig columns, steelmanned incumbents, only synthetic PHI. We invite their PRs. Self-monitoring CI: the mem0 leak case is xfail(strict=True) — if mem0 ever stops leaking, our build turns red and the benchmark is flagged for re-basing.
Two columns only — the honest head-to-head is against the funded incumbent whose #3998 we reproduce.
| Dimension | hunta.ai SEALED | mem0 FILTERED |
|---|---|---|
| Tenant isolation mechanism | Schema-per-tenant + role GRANTs + FORCE RLS + per-tenant AES-256-GCM | Metadata scope (user_id filter, one shared store) |
| Tenant identity source | Signed tid claim in OAuth 2.1 token — no tenant argument | Request parameter (spoofable) |
| Structural vs filter | Structural — no shared hull to filter | Filter |
| Machine-checkable isolation proof | Ed25519 signed attestation, re-runnable vs public JWKS | None (attest() == None) |
| Documented cross-tenant leak | 0.0% CTRR in CrossTalk | #3998 (100% CTRR reproduced), #5439 open |
| Recall engine | Forked Graphiti — parity target, no superiority claim | Own retrieval |
We claim recall PARITY with the un-isolated fork (|ΔJ| ≤ 1.0 LLM-judge), not recall superiority. mem0’s mechanism is quoted from its own public docs and reproduced against real mem0 in the CrossTalk harness.
Identity is a signed tid inside an audience-bound OAuth 2.1 access token — EdDSA verified for sig + aud + iss + exp. alg:none and HS/RS confusion are rejected; a missing or forged token returns 401 with RFC 9728 discovery. There is no WHERE tenant_id to spoof or forget.
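The header and claim checks listed above can be sketched with the standard library alone. This is deliberately incomplete: it pins the algorithm and checks aud, iss and exp, but it does NOT verify the Ed25519 signature, which a real server must do with the issuer's public key before trusting any claim. `precheck_token` and its parameters are hypothetical names.

```python
import base64
import json
import time
from typing import Optional

ALLOWED_ALGS = frozenset({"EdDSA"})  # allow-list; "none", HS* and RS* never match


def _decode_segment(segment: str) -> dict:
    """Decode one base64url JSON segment of a compact JWT."""
    try:
        padded = segment + "=" * (-len(segment) % 4)
        value = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError as exc:  # bad base64, bad UTF-8 or bad JSON
        raise PermissionError("malformed token") from exc
    if not isinstance(value, dict):
        raise PermissionError("malformed token")
    return value


def precheck_token(token: str, *, audience: str, issuer: str,
                   now: Optional[float] = None) -> dict:
    """Header and claim checks only. Signature verification is NOT done here."""
    parts = token.split(".")
    if len(parts) != 3:
        raise PermissionError("malformed token")
    header, claims = _decode_segment(parts[0]), _decode_segment(parts[1])

    # Pin the algorithm before any key is selected, so the token cannot
    # choose "none" or switch to a symmetric or RSA algorithm.
    if header.get("alg") not in ALLOWED_ALGS:
        raise PermissionError("algorithm not allowed")

    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        raise PermissionError("wrong audience")
    if claims.get("iss") != issuer:
        raise PermissionError("wrong issuer")
    current = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= current:
        raise PermissionError("token expired")
    return claims
```

In an HTTP handler, each `PermissionError` would map to the 401 response described above.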
Per-tenant Postgres schema, own tables, own HNSW index, own tsvector, own NOLOGIN role. Cross-tenant retrieval cannot even be addressed — it raises at the database, it does not return the wrong rows.
Content is AES-256-GCM ciphertext with the tenant id as AAD, keyed per-tenant. Blob-swap is blocked; a cross-tenant Decrypt returns AccessDenied and lands in the KMS audit log.
verify_isolation() returns a live, signed, conclusive fault-injection proof. Re-run it against your own two compartments; re-check the signature against our published JWKS. Reproducibility over trust.
Recall is a thin fork of Graphiti (Apache-2.0, BM25 + vector + graph + RRF, no query-time LLM). We target parity with the un-isolated fork (|ΔJ| ≤ 1.0 LLM-judge) — we fork recall, we do not claim to have solved it.
The server refuses to bind without env-provided auth / KEK / attest keys unless CROSSTALK_DEV=1 is set explicitly. Every read and write is logged — know what, who, and when.
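A fail-closed startup check of this kind might look like the sketch below. Only CROSSTALK_DEV appears in the text; the three secret variable names and `check_startup_env` are hypothetical.

```python
import os
from typing import Mapping

# Hypothetical variable names; only CROSSTALK_DEV appears in the text.
REQUIRED_SECRETS = ("AUTH_PUBLIC_KEY", "KEK_ID", "ATTEST_PRIVATE_KEY")


def check_startup_env(env: Mapping[str, str] = os.environ) -> str:
    """Return "production" or "dev"; raise if secrets are missing outside dev."""
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if not missing:
        return "production"
    # Dev mode must be requested with the exact value "1"; anything else,
    # including an empty or unset variable, fails closed.
    if env.get("CROSSTALK_DEV") == "1":
        return "dev"
    raise RuntimeError("refusing to bind; missing: " + ", ".join(missing))
```

Calling this before opening the listening socket means a deployment with a forgotten secret never starts serving, instead of silently running with defaults.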
curl https://mcp.hunta.ai/.well-known/oauth-protected-resource

curl -H "Authorization: Bearer $TOKEN" \
  -d '{"tool":"recall","query":"after-hours contact"}' \
  https://mcp.hunta.ai/mcp

curl -H "Authorization: Bearer $TOKEN" \
  -d '{"tool":"verify_isolation"}' https://mcp.hunta.ai/mcp
# → { attempted_cross_read:true, results:0, jws:"eyJ…", conclusive:true }

pip install crosstalk-bench && crosstalk run --your-config
# RED on shared store, GREEN on isolated, same driver

Four MCP tools: remember, recall, verify_isolation, whoami. None takes a tenant argument. Apache-2.0. Live at mcp.hunta.ai. Public verification key at /.well-known/jwks.json (kid crosstalk-auth-1).
Metering is best-effort and never on the critical path — Lago downtime never delays or fails remember/recall. Counts use tiktoken cl100k_base so you can reproduce your bill; verify_isolation and whoami are unmetered. The embedder is a deterministic hashed bag-of-words, so ‘tokens’ means content tokens processed — a pure, reproducible pricing unit.
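A deterministic hashed bag-of-words embedder can be sketched with the hashing trick. The tokenization, the 256-dimension default, the SHA-256 bucket hash, the signed buckets and the L2 normalization are all assumptions; the text states only that the embedder is deterministic and hashed bag-of-words.

```python
import hashlib
import math
import re


def hashed_bow_embed(text: str, dim: int = 256) -> list[float]:
    """Deterministic hashed bag-of-words embedding (the hashing trick).

    Assumptions: lowercase alphanumeric tokens, SHA-256 bucket hashing,
    signed counts, L2 normalization.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0  # signed hashing reduces collision bias
        vec[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity for the unit-length vectors above."""
    return sum(x * y for x, y in zip(a, b))
```

Because nothing is learned or fetched, the same text yields the same vector on every machine, and word order does not affect the result.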
If the flood doesn’t stop at the wall and the countersignature doesn’t check against our JWKS, don’t believe us.