LOD Memory: Why Agent Memory Is Broken

There is a problem every serious agent builder hits, usually around week three of running a long-lived agent in production.

The agent starts forgetting things. Not because the memory system failed. Because every memory system we have today forces you to make permanent decisions about what matters. And those decisions are almost always wrong.

This is a writeup of a concept I arrived at sideways. I have been building Tropic, a governance and security control plane for enterprise AI agents. The problem I kept running into was not security or policy enforcement. It was memory. When you govern multiple agents centrally, you quickly realise there is no good answer for how agents store and learn from their experiences in a shared way. Writing interactions into a RAG store feels like the obvious starting point. It works at small scale. But as the number of agents grows and the volume of shared context accumulates, RAG breaks down. You are rediscovering knowledge from scratch on every query. There is no accumulation, no provenance, no shared understanding that persists and sharpens over time across agents.

I am not shipping this today. This is a concept piece, not a product announcement. But I have a background in game development, and when I sat with this problem long enough, I made a connection. The answer might already exist in a completely different domain: terrain generation.

The problem with how agents remember things today

The OpenClaw ecosystem has produced some genuinely good memory tooling in the last few months. QMD gives you hybrid search across flat memory files. Lossless-claw ensures nothing gets thrown away during context compression, storing conversation history in a directed acyclic graph. Dreaming runs a three-phase background process every night to decide what gets promoted to long-term memory.

Each of these is solving a real problem. But they share a fundamental assumption that I think is wrong.

They are all flat memory architectures. Every memory exists at a single, fixed resolution. When you retrieve from any of them, every result comes back at the same level of detail: full text, or a summary, or a keyword match. There is no sense of distance from the current query, no gradation of how much detail a given memory deserves right now. QMD is a flat list with better search. Lossless-claw is a flat timeline with better compression. The LLM Wiki, which I will come to shortly, is a flat set of compiled pages. Dreaming decides which memories get promoted to the flat long-term store. All of them are flat.

And because the architecture is flat, you eventually have to make permanent decisions about what stays and what gets removed. Dreaming makes that decision at 3am based on frequency and relevance scores. Lossless-claw compresses chronologically. QMD helps you search whatever you kept. The LLM Wiki compiles once and keeps the result until you recompile.

The problem is that importance is not fixed. A memory that seemed trivial six months ago becomes critical today when a new query makes it relevant. Any flat architecture that makes permanent decisions about what to keep is making those decisions with strictly less information than will eventually be available.

What if instead of deciding what to keep, you kept everything but returned it at different levels of detail depending on how relevant it is right now?

Karpathy's LLM Wiki gets close, but not close enough

Andrej Karpathy published a related idea recently that is worth understanding before explaining LOD memory, because it highlights exactly where the gap still is.

His LLM Wiki concept treats raw documents as source code and an LLM as a compiler. Rather than running RAG retrieval on every query and rediscovering the same knowledge repeatedly, you compile your source material into a structured wiki once. The LLM reads incoming documents, extracts what matters, updates existing pages, adds new ones, and notes contradictions. Knowledge accumulates. It does not have to be re-derived on every question.

This is a significant improvement over standard RAG. As one description of the pattern puts it, most retrieval systems are like a research assistant who reads every book in the library and then forgets everything the moment the conversation ends. The LLM Wiki approach is closer to building a real knowledge base that persists and grows.

But it still has the same structural limitation as the other tools: it is a flat memory architecture.

Every compiled wiki page exists at the same resolution. When an agent retrieves from it, it gets back a chunk of text with no sense of how relevant that chunk is relative to the current task, how recently it was accessed, or whether something in it should now be foregrounded because of a new query. The compilation step is a one-time write. The retrieval step is still just flat search at uniform resolution.

For a single person building a personal knowledge base, this is fine. For agents that accumulate context continuously, coordinate with other agents, and run tasks that shift focus constantly, it is not enough.

What terrain generation taught me

In game engine development, rendering a landscape at full detail everywhere simultaneously is prohibitively expensive. The solution is Level of Detail, or LOD. Terrain close to the camera renders at full resolution. Terrain further away renders at progressively lower resolution. The camera moves and the resolution adjusts in real time. Nothing is discarded. Everything is always there. You just see less detail at distance.

The key insight is that the camera position determines the resolution. Not a fixed schedule, not a permanent decision made at ingestion. The viewpoint in the present moment.

Agent memory has the same structure. You have a vast store of accumulated context. At any given moment, a specific task or query defines a viewpoint. Some memories are close to that viewpoint: highly relevant, recently accessed. Others are far away: old, unrelated to the current task. You do not need the same level of detail from both.

The query is the camera.

How LOD memory works

Figure: raw input passes through two LLM calls to produce full content, a summary, and a headline, all stored with an embedding in a single record. At ingestion, each memory is stored at three levels of detail; the two LLM calls are paid once at write time.

At ingestion, every piece of information gets stored at three levels of detail simultaneously. A conversation turn, a document, a tool result, a user preference: all of it.

Full content is the raw information exactly as received. A summary is an LLM-compressed version at roughly twenty percent of the original length. A headline is a single sentence capturing the core idea. An embedding is generated from the summary and stored alongside all three representations.

This compression happens once, at write time. The cost is two LLM calls per ingestion. It is paid once and never again.
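The write path described above can be sketched as follows. This is a minimal illustration rather than a reference implementation: `MemoryRecord`, `ingest`, and the three callables (`summarise`, `make_headline`, `embed`) are hypothetical names, and deriving the headline from the summary rather than from the full text is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class MemoryRecord:
    """One memory stored at three levels of detail (hypothetical schema)."""
    full: str                    # raw information exactly as received
    summary: str                 # roughly 20% of the original length
    headline: str                # single sentence capturing the core idea
    embedding: List[float]       # generated from the summary
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    last_retrieved_at: Optional[datetime] = None
    retrieval_count: int = 0


def ingest(
    text: str,
    summarise: Callable[[str], str],
    make_headline: Callable[[str], str],
    embed: Callable[[str], List[float]],
) -> MemoryRecord:
    """Build all three representations once, at write time."""
    summary = summarise(text)          # LLM call 1
    headline = make_headline(summary)  # LLM call 2 (assumption: derived from the summary)
    return MemoryRecord(text, summary, headline, embed(summary))
```

In practice the three callables would wrap an LLM and an embedding model. Passing them in as plain functions keeps the record shape independent of any particular provider.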

On retrieval, the query generates an embedding and pulls a wide set of candidate memories. Not the typical five or ten from a standard RAG setup, but something closer to a thousand. This is important. Standard retrieval optimises for precision, returning only the most similar results. LOD retrieval optimises for coverage, returning a broad landscape at mixed resolution.

For each candidate memory, two signals combine to determine which level of detail gets returned.

Semantic distance measures how closely the memory relates to the current query. Temporal distance measures how recently the memory was created or last accessed. Together they produce a tier assignment. A recent and highly relevant memory returns full content. An old but relevant memory returns a summary. A distant and unrelated memory returns only a headline.

Figure: terrain rendered at three LOD tiers, mapped to headline (far, k≈750), summary (mid-range, k≈200), and full content (near, k≈50). The query is the viewpoint, and memories surface at the resolution their distance from the query warrants.

The result that arrives in the agent's context window looks something like this: fifty full-content memories, two hundred summaries, seven hundred and fifty headlines. Total coverage is one thousand memories. Token cost is manageable because headlines are tiny. The agent sees the entire landscape at appropriate resolution.
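To make "manageable" concrete, here is a back-of-envelope calculation. The average token sizes are assumptions chosen for illustration (a 400-token full memory, a summary at twenty percent of that, a 15-token headline); only the tier counts come from the example above.

```python
# Assumed average sizes in tokens; illustrative, not measured.
AVG_TOKENS = {"full": 400, "summary": 80, "headline": 15}
# Tier counts from the example above.
COUNTS = {"full": 50, "summary": 200, "headline": 750}


def context_cost(counts: dict, avg_tokens: dict) -> int:
    """Total tokens placed in the context window for a tiered result set."""
    return sum(counts[tier] * avg_tokens[tier] for tier in counts)


mixed = context_cost(COUNTS, AVG_TOKENS)          # tiered retrieval
flat = sum(COUNTS.values()) * AVG_TOKENS["full"]  # same 1000 memories, all at full detail
print(mixed, flat, round(mixed / flat, 3))        # 47250 400000 0.118
```

Under these assumptions the tiered result costs about an eighth of what the same thousand memories would cost at full resolution. The real ratio depends on actual memory lengths and on how aggressively the summaries and headlines compress.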

The maths behind the resolution decision

The tier assignment for each memory is not arbitrary. It is computed from two signals combined into a single LOD score.

Semantic similarity uses cosine similarity between the query embedding and the memory embedding:

$$S(q, m) = \frac{q \cdot m}{\lVert q \rVert \, \lVert m \rVert}$$

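In general cosine similarity ranges from -1 to 1. For typical text embeddings, scores are rarely negative in practice, and any negative value can be clamped to 0 so that the signal stays between 0 and 1. A score of 1 means the memory embedding points in the same direction as the query embedding. A score near 0 means the memory is unrelated.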
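A minimal sketch of this signal in plain Python. The function names are illustrative, and clamping negative cosine values to 0 so that the score stays within the range from 0 to 1 is an assumption, since raw cosine similarity can be negative.

```python
import math
from typing import Sequence


def cosine_similarity(q: Sequence[float], m: Sequence[float]) -> float:
    """Dot product of q and m divided by the product of their magnitudes."""
    dot = sum(a * b for a, b in zip(q, m))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_m = math.sqrt(sum(b * b for b in m))
    if norm_q == 0.0 or norm_m == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_q * norm_m)


def semantic_score(q: Sequence[float], m: Sequence[float]) -> float:
    """S(q, m), clamped to [0, 1] (assumption: negatives count as unrelated)."""
    return max(0.0, cosine_similarity(q, m))
```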

Temporal decay captures how recently a memory was last accessed, weighted by how often it has been retrieved:

$$D(t, r) = \exp\left(-\frac{\lambda t}{1 + \log(1 + r)}\right)$$

Where t is days since last retrieval (or since creation, for a memory that has never been retrieved), r is total retrieval count, and lambda is a decay rate constant. The log term sits in the denominator of the exponent, so retrieval frequency slows decay, but with diminishing returns. A memory retrieved fifty times decays much slower than one retrieved once, but not fifty times slower. With a suitably chosen lambda, a memory that has never been retrieved and was written six months ago will be close to zero, while a memory retrieved yesterday will be close to one regardless of how long ago it was written.
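The decay term can be sketched in a few lines. The default `lam=0.05` is an assumed value chosen only so that the behaviour described above is visible, and the natural logarithm is assumed for the log term.

```python
import math


def temporal_decay(days_since_access: float, retrieval_count: int, lam: float = 0.05) -> float:
    """D(t, r) = exp(-lam * t / (1 + ln(1 + r))).

    days_since_access: t, days since last retrieval (or creation if never retrieved)
    retrieval_count:   r, total number of retrievals
    lam:               decay rate constant (assumed default)
    """
    if days_since_access < 0 or retrieval_count < 0:
        raise ValueError("t and r must be non-negative")
    return math.exp(-lam * days_since_access / (1.0 + math.log1p(retrieval_count)))


# Six months untouched vs. touched yesterday, both never retrieved before.
print(temporal_decay(180, 0), temporal_decay(1, 0))
# Same six-month gap, but with 50 prior retrievals vs. one.
print(temporal_decay(180, 50), temporal_decay(180, 1))
```

With the natural log, the denominator is about 4.93 for fifty retrievals and about 1.69 for one, so the heavily used memory decays roughly three times slower, not fifty.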

The combined LOD score weights both signals:

$$L(q, m) = \alpha \, S(q, m) + (1 - \alpha) \, D(t, r)$$

Alpha is the most important tunable parameter in the whole system. It controls the balance between semantic relevance and recency.
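As a small worked example, the weighting can be written directly. The function name and the sample values (a memory with similarity 0.8 and decay 0.1, that is, relevant but stale) are illustrative assumptions.

```python
def lod_score(similarity: float, decay: float, alpha: float) -> float:
    """L = alpha * S + (1 - alpha) * D, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * similarity + (1.0 - alpha) * decay


# The same relevant-but-stale memory under two weightings.
relevance_heavy = lod_score(0.8, 0.1, alpha=0.9)  # roughly 0.73
recency_heavy = lod_score(0.8, 0.1, alpha=0.1)    # roughly 0.17
print(relevance_heavy, recency_heavy)
```

Because both inputs lie between 0 and 1 and the weights sum to 1, the score also lies between 0 and 1, which is what lets fixed thresholds be applied to it.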

Tier assignment then maps the LOD score to a resolution level:

$$\mathrm{tier}(m) = \begin{cases} \text{full} & \text{if } L \ge \theta_1 \\ \text{summary} & \text{if } \theta_2 \le L < \theta_1 \\ \text{headline} & \text{if } \theta_3 \le L < \theta_2 \\ \text{excluded} & \text{if } L < \theta_3 \end{cases}$$

Where $\theta_1 > \theta_2 > \theta_3$ are configurable thresholds.
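The piecewise mapping translates almost line for line into code. The default thresholds of 0.7, 0.4, and 0.1 are placeholder assumptions, not recommended values.

```python
from typing import Tuple


def assign_tier(score: float, thresholds: Tuple[float, float, float] = (0.7, 0.4, 0.1)) -> str:
    """Map an LOD score to a resolution tier using three descending thresholds."""
    theta1, theta2, theta3 = thresholds
    if not theta1 > theta2 > theta3:
        raise ValueError("thresholds must be strictly descending")
    if score >= theta1:
        return "full"
    if score >= theta2:
        return "summary"
    if score >= theta3:
        return "headline"
    return "excluded"
```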

Alpha is the parameter that changes everything

Alpha deserves its own discussion because it is not just a tuning knob. It is a statement about what kind of agent you are running and what kind of memory behaviour you need.

Consider three very different deployments:

A medical research agent summarising clinical trial literature across thousands of papers. A finding from a 2019 study on drug interactions is just as relevant today as when it was published. A memory should surface if it is semantically related to the query, regardless of when it was written. Here you want alpha close to 1. Semantic relevance dominates. Age is almost irrelevant.

An executive daily digest agent that monitors news, market signals, and competitor activity across Slack, WhatsApp, and email. Yesterday's news is relevant. Last month's news usually is not. Here you want alpha close to 0. Recency dominates. A highly semantically similar memory from three months ago is probably stale and should come back at headline resolution at most.

A legal document agent reviewing contracts and precedents for a law firm. Some documents are foundational and should always surface at full resolution regardless of age. Others are routine and can fade. Here you want alpha around 0.5, with the thresholds tuned to keep key precedents permanently in the high tier regardless of retrieval frequency.

The point is that alpha is not a default you set once. It is a configuration that reflects the nature of the work. Alpha is a per-agent governance parameter, set at deployment time and adjustable without changing anything else about the memory architecture. The same underlying store serves all three agents above. What changes is how each agent weighs the signals when it retrieves.
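To illustrate the same store serving different agents, the sketch below scores one memory (highly similar, similarity 0.9, but stale, decay 0.1) under three hypothetical per-agent alpha values. The agent names, alpha values, and thresholds are all assumptions chosen to mirror the three deployments above.

```python
# Hypothetical per-agent configuration; only alpha differs between agents.
AGENT_ALPHA = {
    "medical_research": 0.95,   # semantic relevance dominates
    "executive_digest": 0.05,   # recency dominates
    "legal_review": 0.5,        # balanced
}
THRESHOLDS = (0.7, 0.4, 0.1)    # assumed theta-1, theta-2, theta-3


def tier_for(similarity: float, decay: float, alpha: float) -> str:
    """Score a memory with the combined LOD formula and map it to a tier."""
    score = alpha * similarity + (1.0 - alpha) * decay
    theta1, theta2, theta3 = THRESHOLDS
    if score >= theta1:
        return "full"
    if score >= theta2:
        return "summary"
    if score >= theta3:
        return "headline"
    return "excluded"


# One stored memory, three different views of it.
tiers = {agent: tier_for(0.9, 0.1, alpha) for agent, alpha in AGENT_ALPHA.items()}
print(tiers)
```

Under these assumed numbers the research agent sees the memory at full content, the legal agent sees a summary, and the digest agent sees only a headline, without anything in the store changing.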

This is also why the formula matters beyond the theory. Without making the scoring function explicit, alpha is just a vague idea. With it, you can reason precisely about what your agent will and will not surface, and why.

The promotion mechanic

Here is where it diverges most sharply from existing approaches.

When a retrieval pulls a memory that is old but semantically close to the query, that memory gets promoted. Its last-retrieved timestamp updates. Its tier moves up. On the next retrieval it will come back at higher resolution.

This means memories do not decay irreversibly. A memory from two years ago that keeps getting retrieved stays sharp. A memory from last week that never gets touched fades to a headline. The system learns from usage, not from a fixed schedule.
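The promotion mechanic can be sketched as a side effect of retrieval. This is an illustration under assumptions: `MemoryStats`, `decay`, and `touch` are hypothetical names, time is simplified to day numbers, and `lam=0.05` is an arbitrary decay rate.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryStats:
    last_access_day: float    # day of creation, or of the most recent retrieval
    retrieval_count: int = 0


def decay(mem: MemoryStats, today: float, lam: float = 0.05) -> float:
    """Temporal decay D(t, r) for a memory as seen on a given day."""
    t = max(0.0, today - mem.last_access_day)
    return math.exp(-lam * t / (1.0 + math.log1p(mem.retrieval_count)))


def touch(mem: MemoryStats, today: float) -> None:
    """Record a retrieval: refresh the timestamp and bump the count."""
    mem.last_access_day = today
    mem.retrieval_count += 1


old = MemoryStats(last_access_day=0)   # written two years ago, never retrieved
before = decay(old, today=730)         # effectively zero
touch(old, today=730)                  # a semantically close query pulls it in
after = decay(old, today=730)          # back to 1.0
a_week_later = decay(old, today=737)   # still high, and decaying more slowly than before
print(before, after, a_week_later)
```

Nothing here sets a tier explicitly. The tier rises on the next retrieval because the decay term feeding the LOD score has been reset by the retrieval itself.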

No background process runs at 3am to decide what matters. No threshold gates determine permanent promotion. The usage pattern of the agent itself drives the resolution landscape, continuously, in real time.

Existing solutions compile knowledge once and keep it flat. LOD memory keeps knowledge at dynamic resolution that adjusts with every retrieval. No snapshots, no batch summarisation, and no DAG recalculation.

The tool interface for agents

Because this is a platform-level memory store, it exposes three clean tools that any agent can call regardless of what framework it runs on.

memory_search(query) returns tiered results: a mix of full content, summaries, and headlines based on the LOD calculation for that query.

memory_expand(memory_id) retrieves full content for a specific memory identified from a headline. The agent scans headlines, finds something relevant, and drills in.

memory_promote(memory_id) manually forces a memory up a tier when the agent determines something is more important than the automatic scoring suggests.
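A toy in-memory version of the three tools might look like the following. Everything beyond the three tool names is an assumption: the `Memory` and `LODStore` classes, the result dictionary shape, the injected `score_fn` standing in for the LOD calculation, and modelling manual promotion as a per-memory tier boost.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

TIERS = ["headline", "summary", "full"]  # ordered from lowest to highest resolution


@dataclass
class Memory:
    memory_id: str
    full: str
    summary: str
    headline: str
    boost: int = 0  # number of manual promotions applied


class LODStore:
    def __init__(self, score_fn: Callable[[str, Memory], float], thresholds=(0.7, 0.4, 0.1)):
        self.score_fn = score_fn          # stand-in for the LOD score L(q, m)
        self.thresholds = thresholds
        self.memories: Dict[str, Memory] = {}

    def add(self, memory: Memory) -> None:
        self.memories[memory.memory_id] = memory

    def memory_search(self, query: str) -> List[dict]:
        """Return every non-excluded memory at the resolution its score warrants."""
        theta1, theta2, theta3 = self.thresholds
        results = []
        for mem in self.memories.values():
            score = self.score_fn(query, mem)
            if score < theta3:
                continue  # below the lowest threshold: excluded
            level = 2 if score >= theta1 else 1 if score >= theta2 else 0
            tier = TIERS[min(level + mem.boost, 2)]
            results.append({"memory_id": mem.memory_id, "tier": tier,
                            "content": getattr(mem, tier), "score": score})
        return sorted(results, key=lambda r: r["score"], reverse=True)

    def memory_expand(self, memory_id: str) -> str:
        """Drill into a memory found via its headline or summary."""
        return self.memories[memory_id].full

    def memory_promote(self, memory_id: str) -> None:
        """Manually force a memory up one tier on subsequent searches."""
        self.memories[memory_id].boost += 1
```

An agent would typically call `memory_search` first, scan the headlines, and follow up with `memory_expand` on the few that look relevant.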

This is how human experts actually work. You do not load everything you know into working memory before starting a task. You work from a high-level map, identify what looks relevant, and retrieve detail on demand. LOD memory gives agents the same capability.

Because the interface is three MCP-compatible tools, any agent can use it without knowing anything about the underlying architecture. OpenClaw, Claude Code, LangGraph, anything. The LOD model is invisible infrastructure.

What this replaces

A few things follow from this that are worth naming directly.

You no longer need to decide what is long-term versus short-term memory. Everything is stored. Resolution is dynamic. There is no permanent discard.

You no longer need a consolidation process to decide what to promote. Promotion is a function of retrieval. The agent's own usage pattern determines what stays sharp.

You no longer need to choose between breadth and depth on retrieval. With a k of one thousand at mixed resolution, you get both. The wide retrieval ensures you do not miss something important. The resolution tiers ensure you do not blow your context budget on irrelevant detail.

Seeing your memory landscape

There is an aspect of the terrain generation analogy that has not been mentioned yet, and it is important.

In a game engine, the LOD system is not invisible to the developer. You can fly through the scene and watch resolution shift in real time. You can see exactly what is rendering at full detail and what is a distant low-polygon mesh. That visibility is part of what makes the system trustworthy. You can inspect it, tune it, and understand why it looks the way it does.

Agent memory should work the same way.

Karpathy's LLM Wiki makes a similar point. One of its strengths is that the compiled wiki is human-readable. You can open the markdown files and see what the agent knows. That transparency is not incidental. It is part of what makes the system auditable and trustworthy for real work.

LOD memory should be equally inspectable, but the right interface is not a flat list of markdown files. It is a three-dimensional plane.

Imagine viewing your agent's memory store as a spatial map. The horizontal axes represent semantic clustering: memories that are conceptually related sit near each other. The vertical axis represents the LOD tier: full-content memories float at the top, summaries in the middle, headlines at the bottom. The query drops into this space as a viewpoint, and you can watch the resolution landscape shift around it in real time. Memories that are semantically close and recently accessed rise toward full resolution. Memories at the periphery sink toward headlines.

Figure: a 3D point-cloud visualisation of embeddings from Arize Phoenix, with clusters ranked by query relevance in a side panel. LOD memory extends this idea: position encodes semantic similarity, height encodes resolution tier, and the query is the camera. (Image: Arize Phoenix)

This is not just a visualisation for its own sake. It serves three practical purposes.

First, it lets you audit what your agent actually knows. If a critical piece of context has decayed to headline resolution when it should not have, you can see it immediately and promote it manually.

Second, it lets you tune alpha and the decay function with intuition rather than guesswork. You can watch how changing alpha shifts the resolution landscape around a sample query and understand immediately whether the behaviour matches what you want.

Third, it builds trust. One of the core reasons enterprises hesitate to deploy agents on sensitive workloads is that they cannot see inside the agent's memory. A visual memory plane changes that. It makes the agent's knowledge state an auditable artifact, not a black box.

The right interface for LOD memory is not a list of log entries. It is a spatial view of what every agent knows, at what resolution, and why.

How this relates to prior work

Before claiming novelty it is worth being precise about what already exists, because there is relevant prior art that deserves acknowledgment.

MemPalace, released in April 2026 by Milla Jovovich and Ben Sigman, takes a spatial approach to the same problem. It organises memories into wings, rooms, closets, and drawers: a hierarchical structure inspired by the ancient memory palace technique. The instinct is similar to LOD memory in that navigating a structured space is better than searching a flat list. Their knowledge graph with temporal validity windows for tracking entity relationships over time is genuinely novel and something LOD memory does not address.

The key architectural difference is in how retrieval works. MemPalace stores memories verbatim in ChromaDB and uses wing and room tags to narrow the search scope before running semantic similarity. The palace structure tells the retrieval system where to look, and that scoping is genuinely useful. But everything within that scope returns at the same resolution: full verbatim text, ranked by cosine similarity. There is no dynamic tiering, no decay, no promotion based on usage. The resolution is uniform at retrieval time. LOD memory's resolution is query-driven and continuously adapts based on both semantic relevance and recency.

RAPTOR, a 2024 paper from Stanford, introduced recursive abstractive processing for tree-organised retrieval. It recursively clusters and summarises chunks of text, building a tree with differing levels of summarisation from the bottom up, then retrieves across the tree at different levels of abstraction at query time. If you have read this far and thought “this sounds like RAPTOR”, you are not wrong to see the connection.

But there is one weakness the two approaches share, and three precise differences that matter.

Both LOD memory and RAPTOR pay an LLM cost at write time. RAPTOR runs summarisation at each level of its tree during construction. LOD memory runs two LLM calls per memory to generate the summary and headline. This is a shared weakness and worth naming honestly rather than glossing over it. Mitigations exist: batching summarisation during idle time, using a smaller model for headline and summary generation, skipping summarisation for short memories below a token threshold. But the fundamental cost is real in both cases.

The first two differences are structural.

RAPTOR's tree is relational. The summarisation at each level depends on what sits below it, which means adding new content can invalidate parent summaries up the chain. Incorporating new documents cleanly requires re-clustering affected nodes and regenerating summaries above them. The paper benchmarks on static corpora: books, NLP papers, fixed passages. This is the right design for that problem.

LOD memory has no relational structure between memories. Each memory is independent. Ingesting a new memory generates its three representations and stores them without touching anything else in the store. The write cost is constant regardless of how large the store has grown. This matters for agent memory that accumulates continuously: a new conversation turn can be ingested immediately without any coordination with existing memories.

The second structural difference is at retrieval. RAPTOR's best-performing retrieval mode, collapsed tree, flattens the entire hierarchy into a single layer and runs cosine similarity across all nodes, returning whichever score highest as full text. The hierarchy informs what exists in the store but does not appear in the retrieval output. LOD memory returns explicitly tiered content in a single pass: full content, summaries, and headlines simultaneously across a wide k. The agent sees the resolution landscape directly and can drill into any headline it chooses. The hierarchy is in the output, not just the index.

The third difference is temporal. RAPTOR has no time dimension. There is no recency weighting, no decay, no retrieval-frequency promotion. Two memories ingested at different times are treated identically. For document retrieval over fixed corpora this is fine. For agent memory that needs to weight recent experience differently from distant history, it is a gap.

RAPTOR is the closest prior work. It is rigorous, well-benchmarked, and genuinely useful. LOD memory addresses a different problem: continuously accumulating experience in a governed multi-agent environment, where temporal dynamics and resolution-aware retrieval matter as much as hierarchical summarisation.

Open questions

This is a proposal, not a finished system. Several things need working out.

Non-semantic retrieval. Agents frequently need memory retrieval that has nothing to do with semantics. “Find my memories from last week” is a pure timestamp range scan. “Give me a random memory” is weighted sampling. “Show me everything I have never looked at” is a filter on retrieval count zero. None of these can be expressed as a vector similarity query, and RAG stores struggle with all of them because they typically only store embeddings and raw content.

LOD memory is better positioned here because the schema stores timestamp, retrieval count, tier, and agent id on every record as first-class fields. Non-semantic retrieval becomes a straightforward SQL-style query on the same store rather than a hack around a vector index.

Temporal retrieval returns everything in the time window, tiered by a simplified score based only on retrieval frequency. Frequently accessed memories within that window come back at full resolution. Untouched ones come back as headlines. The LOD output shape is preserved even without a semantic query.
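A sketch of temporal retrieval as a plain SQL query, using Python's built-in `sqlite3` as a stand-in for whatever relational store is chosen. The table layout, column names, sample rows, and the retrieval-count cutoffs (`full_at=5`, `summary_at=1`) are all assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE memories (
    id INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL,                   -- ISO 8601 date, sorts lexicographically
    retrieval_count INTEGER NOT NULL DEFAULT 0,
    full_text TEXT NOT NULL,
    summary TEXT NOT NULL,
    headline TEXT NOT NULL
)
"""

# Tier by retrieval frequency only; no embedding is involved.
TEMPORAL_QUERY = """
SELECT id,
       CASE WHEN retrieval_count >= :full_at THEN 'full'
            WHEN retrieval_count >= :summary_at THEN 'summary'
            ELSE 'headline' END AS tier,
       CASE WHEN retrieval_count >= :full_at THEN full_text
            WHEN retrieval_count >= :summary_at THEN summary
            ELSE headline END AS content
FROM memories
WHERE created_at >= :start AND created_at < :end
ORDER BY retrieval_count DESC, id
"""


def temporal_search(conn, start, end, full_at=5, summary_at=1):
    """Return (id, tier, content) for every memory created in [start, end)."""
    params = {"start": start, "end": end, "full_at": full_at, "summary_at": summary_at}
    return conn.execute(TEMPORAL_QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO memories (created_at, retrieval_count, full_text, summary, headline) "
    "VALUES (?, ?, ?, ?, ?)",
    [   # made-up sample rows
        ("2025-01-06", 7, "full A", "summary A", "headline A"),
        ("2025-01-07", 2, "full B", "summary B", "headline B"),
        ("2025-01-08", 0, "full C", "summary C", "headline C"),
        ("2024-11-01", 9, "full D", "summary D", "headline D"),  # outside the window
    ],
)
rows = temporal_search(conn, "2025-01-06", "2025-01-13")
print(rows)
```

The "never looked at" case is the same query shape with `WHERE retrieval_count = 0` in place of the date range.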

Random retrieval is more interesting. Rather than uniform random sampling, the most useful mechanic is inverse-decay-weighted sampling: memories with the lowest D(t, r) values are most likely to surface. The system deliberately resurfaces what it has been ignoring.

$$P(m) \propto 1 - D(t, r)$$
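Weighted sampling of this kind maps onto `random.choices` from the Python standard library. The function name and the input shape (a dictionary of precomputed decay values keyed by memory id) are assumptions, as is the fallback to uniform sampling when every memory is fully fresh.

```python
import random
from typing import Dict, List, Optional


def sample_neglected(
    decay_by_id: Dict[str, float],
    k: int = 1,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick k memory ids with probability proportional to 1 - D(t, r).

    Sampling is with replacement, so the same id can appear more than once.
    """
    if not decay_by_id:
        return []
    rng = rng or random.Random()
    ids = list(decay_by_id)
    weights = [max(0.0, 1.0 - decay_by_id[i]) for i in ids]
    if sum(weights) == 0:
        weights = [1.0] * len(ids)  # everything is fresh: fall back to uniform
    return rng.choices(ids, weights=weights, k=k)
```

For example, `sample_neglected({"fresh": 0.95, "stale": 0.001}, k=3)` will return `"stale"` for most draws, because its weight is close to 1 while the fresh memory's weight is 0.05.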

All of these non-semantic modes require the memory store to support both vector search and SQL-style filtering on the same records. A store that offers only a vector index cannot do this cleanly. This argues for pgvector or a similar hybrid store with relational indexes on timestamp, retrieval count, and tier alongside the vector index. The choice of storage backend is not cosmetic: it determines which retrieval modes are possible.

Negative retrieval. This is one of the most interesting unsolved problems in agent memory, and RAG has no good answer for it. Standard retrieval answers the question: find me memories related to X. But agents frequently need the inverse: find me memories that are not related to X, or explicitly exclude a topic from what surfaces. A research agent told to analyse a company might need to exclude everything it knows about a competitor to avoid contaminating the analysis. An agent drafting a response might need to surface context explicitly outside a particular domain. You cannot express this as a vector similarity query, because negative semantic space is not well defined in standard embedding models. This probably requires an explicit exclusion parameter in the retrieval API, something like memory_search(query, exclude=["cats"]), with a separate scoring pass that penalises memories semantically close to the exclusion terms. It remains open and is worth naming directly.
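One possible shape for that separate scoring pass is sketched below. It is a guess at an approach, not a solution to the problem: the function names, the subtractive penalty, and the `penalty` weight are all assumptions, and the exclusion terms are assumed to have been embedded with the same model as the memories.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_with_exclusions(
    query_vec: Sequence[float],
    memory_vec: Sequence[float],
    exclude_vecs: Sequence[Sequence[float]],
    penalty: float = 1.0,
) -> float:
    """Relevance to the query minus a penalty for closeness to any excluded topic."""
    relevance = max(0.0, cosine(query_vec, memory_vec))
    # Closeness to the nearest exclusion term; 0.0 when nothing is excluded.
    closeness = max((max(0.0, cosine(e, memory_vec)) for e in exclude_vecs), default=0.0)
    return max(0.0, relevance - penalty * closeness)
```

The penalised value would replace S(q, m) in the LOD score, so a memory that sits close to an excluded topic drops to a lower tier or out of the result set entirely. Whether a linear penalty is the right functional form is an open question.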

Conflicting memories. If contradictory information has been stored about the same entity, what does retrieval return? Both at full resolution? The more recent one? This is genuinely unsolved and probably requires a conflict detection layer at ingestion time, not retrieval time.

Embedding strategy. Embedding the full content captures more signal but introduces noise. Embedding the summary is cleaner but loses detail that might matter for edge-case queries. This is worth testing empirically.

The right k. A thousand candidate memories is a starting point. The right number depends on context window size, average memory length, and how aggressively summaries and headlines are compressed.

Where this goes next

This is a concept, not a shipped system. I hit the memory problem while building Tropic and started thinking about it from a game development angle. Whether LOD memory gets built as part of Tropic, as a standalone layer, or by someone else entirely is an open question.

What I am reasonably confident about: the existing approaches are patches on a broken foundation. Flat storage with batch consolidation was the best available answer for a while. It is not the right answer for agents that accumulate experience continuously, coordinate with other agents, and need their memory to get smarter over time rather than just bigger.

A proof of concept script is in progress. I will publish it when it is ready to validate or challenge the ideas here.

If you are working on agent memory architectures, have run into the same walls, or think the LOD framing is wrong in a specific way, I would genuinely like to hear it.

The goal is not to replace QMD, lossless-claw, the LLM Wiki pattern, or MemPalace. All of them solve real problems well. The goal is to ask whether the foundation they share, flat memory architectures that store and retrieve at uniform resolution, is the right foundation at all.

I think there is a better one.

Michael is the founder of Tropic, a governance and security control plane for enterprise AI agents. Early access at tropic.bot.
