Memory model - UrBrain

Scopes

UrBrain is multi-tenant on purpose. Memory belongs to an account, lives inside an organization, and is partitioned across workspaces that map to a product, a team, a customer engagement, or any other boundary the operator chooses to draw.

Workspace scope is part of the data model, not a filter applied at the edge. Object paths in storage are references, never authorization boundaries. When an agent calls search_memory, the service resolves the caller's account, workspace, and any agent or actor scoping before it reaches the index. A query that names a document the workspace cannot see returns nothing — there is no path that walks past that check.
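The ordering described above (resolve scope first, match second) can be illustrated with a small in-memory sketch. Everything here is hypothetical: the Scope and Memory shapes, the field names, and the substring match stand in for whatever the service actually does, and only the name search_memory comes from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scope:
    # Hypothetical caller scope, resolved by the service before any lookup.
    account_id: str
    workspace_id: str


@dataclass(frozen=True)
class Memory:
    account_id: str
    workspace_id: str
    text: str


def search_memory(scope: Scope, query: str, index: List[Memory]) -> List[Memory]:
    # Step 1: narrow the candidate set to the caller's account and workspace.
    # Nothing outside this set is ever compared against the query.
    visible = [
        m for m in index
        if m.account_id == scope.account_id and m.workspace_id == scope.workspace_id
    ]
    # Step 2: match only within the visible set (substring match is a
    # placeholder for the real retrieval step).
    needle = query.lower()
    return [m for m in visible if needle in m.text.lower()]
```

Because the match runs only over the scoped candidates, a query from another workspace gets an empty list rather than a filtered-out hit.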

Source identity is first class

Memories carry three columns that travel with them through every ingest, embed, retrieve, and tombstone path:

source_system — the system that produced the memory (for example, structuredmerge, a connector name, or an agent runtime).
source_ref — the stable external ID for the chunk, document, transcript event, or artifact.
source_hash — the content hash that lets the importer skip unchanged work and detect supersession without reprocessing an entire corpus.

That triplet makes ingestion idempotent, makes deletes propagate, and makes audit, replay, and export tractable instead of forensic.
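A minimal sketch of how the triplet could drive an idempotent import, assuming a dict keyed by (source_system, source_ref) as the store and SHA-256 as the content hash. The store layout, the function names, and the hash algorithm are illustrative assumptions, not UrBrain's actual importer.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Record:
    source_system: str
    source_ref: str
    source_hash: str
    text: str
    state: str = "active"


# Hypothetical store: every version ever seen for a given source identity.
Store = Dict[Tuple[str, str], List[Record]]


def content_hash(text: str) -> str:
    # SHA-256 is an assumption; the text only says "content hash".
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def ingest(store: Store, source_system: str, source_ref: str, text: str) -> str:
    versions = store.setdefault((source_system, source_ref), [])
    current = versions[-1] if versions else None
    new_hash = content_hash(text)
    if current and current.state == "active":
        if current.source_hash == new_hash:
            return "skipped"  # unchanged content: no work to do
        current.state = "superseded"  # keep the old version for audit
        outcome = "superseded"
    else:
        outcome = "created"
    versions.append(Record(source_system, source_ref, new_hash, text))
    return outcome


def tombstone(store: Store, source_system: str, source_ref: str) -> bool:
    # Propagate a delete: mark the active version, if any, as tombstoned.
    for record in store.get((source_system, source_ref), []):
        if record.state == "active":
            record.state = "tombstoned"
            return True
    return False
```

Re-running the same import is a no-op, a changed chunk supersedes its predecessor, and a delete needs only the source identity to find its target.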

Lifecycle states

A memory progresses through explicit states rather than disappearing silently into a vector store. The states are part of the public contract for retrieval and audit:

Active — currently retrievable. The default state for newly imported chunks and authored notes.
Superseded — replaced by a newer version of the same source. Kept for audit and replay; excluded from default retrieval.
Archived — intentionally retired by a workspace operator. Recoverable, but not surfaced.
Tombstoned — explicitly removed because the underlying source chunk no longer exists, propagated from a StructuredMerge delete artifact or an authored deletion.
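The four states and the "excluded from default retrieval" rule can be sketched as an enum plus a filter. The enum values, the dict-shaped memories, and the include parameter are assumptions made for illustration; only the state names come from the list above.

```python
from enum import Enum
from typing import FrozenSet, Iterable, List


class LifecycleState(Enum):
    ACTIVE = "active"
    SUPERSEDED = "superseded"
    ARCHIVED = "archived"
    TOMBSTONED = "tombstoned"


# Only Active memories are retrievable by default.
DEFAULT_RETRIEVABLE: FrozenSet[LifecycleState] = frozenset({LifecycleState.ACTIVE})


def filter_by_state(
    memories: Iterable[dict],
    include: FrozenSet[LifecycleState] = DEFAULT_RETRIEVABLE,
) -> List[dict]:
    # An audit or replay caller would pass a wider `include` set explicitly.
    return [m for m in memories if m["state"] in include]
```

Keeping the default set explicit means an audit path has to opt in to superseded or archived material rather than receive it by accident.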

Retrieval

Retrieval combines vector similarity, lexical matching, and structural filters. Embeddings live in pgvector with cosine distance and HNSW indexing. Lexical recall comes from PostgreSQL trigram lookup over the same memories. The two ranked lists are fused with reciprocal rank fusion, then filtered by workspace, memory type, lifecycle state, and any caller-supplied scopes before results are returned.
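Reciprocal rank fusion itself is small enough to show in full. In this sketch each ranked list contributes 1 / (k + rank) per item, and items are sorted by their summed score. The constant k = 60 is the value commonly used in the literature and is an assumption here, as are the function name and the tie-break on ID; the text does not say what UrBrain uses.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    scores: Dict[str, float] = {}
    for ranking in rankings:
        # Ranks are 1-based: the top item of each list contributes 1 / (k + 1).
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] = scores.get(memory_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda memory_id: (-scores[memory_id], memory_id))


vector_hits = ["a", "b", "c"]   # hypothetical IDs ranked by cosine distance
lexical_hits = ["b", "d", "a"]  # hypothetical IDs ranked by trigram similarity
fused = reciprocal_rank_fusion([vector_hits, lexical_hits])
# fused == ["b", "a", "d", "c"]
```

Items that appear in both lists ("a" and "b") outrank items that appear in only one, which is the point of fusing ranks rather than raw scores: cosine distances and trigram similarities are not on a comparable scale, but ranks are.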

Because the index lives next to the policy in the same database, a retrieval query can join across both without crossing a service boundary. That avoids an extra network hop per query and lets permission checks run inside the retrieval query itself rather than being bolted on after the fact.

Memory types

Memory in UrBrain is heterogeneous. The schema separates explicit authored notes from chunks promoted out of source documents and from events captured by runtime adapters, so retrieval can ask for "what a teammate wrote about onboarding" without competing with every automated ingest in the workspace.

Explicit memory written by a user or agent through MCP remember.
Artifact chunks promoted from source documents through the StructuredMerge ingest path.
Observed events captured by runtime adapters as durable provenance for what an agent did and saw.
Authored overlays for human-written context that should rank above derived material.

Memory with provenance, scope, and a lifecycle.
