Tool Memory Redesign Implementation Plan¶

Goal¶

Replace the current <keen_memory>...</keen_memory> in-band assistant markup with a structured per-turn tool memory model stored in AppState.messages, then render that memory into assistant message content only when building provider requests.

Agreed Requirements¶

After every assistant turn, create a TurnMemory object for that turn.
TurnMemory stores:
FilesChanged: all files written or edited during the turn, deduplicated and kept in stable first-seen order
FailedBash: bash commands from the turn with exit_code != 0, including the command string and exit code
Store TurnMemory in AppState.messages alongside the assistant message.
While converting internal llm.Message objects to OpenAI or Genkit messages, render TurnMemory as part of the assistant message content.
Do not store raw stdout/stderr in turn memory.

Proposed Data Model¶

type Message struct {
    Role       Role
    Content    string
    TurnMemory *TurnMemory
}

type TurnMemory struct {
    FilesChanged []string
    FailedBash   []FailedBashCommand
}

type FailedBashCommand struct {
    Command  string
    ExitCode int
}

Rendering Contract¶

Assistant message content remains the source of user-visible text. TurnMemory is rendered into provider-facing assistant content only when non-empty.

Proposed format:

<assistant content>

Tool memory:
- Files changed: path/a.go, path/b.go
- Failed bash: go test ./... (exit 1)

Constraints:

Render Tool memory: only if there is at least one memory entry.
Omit empty sections.
Keep rendering deterministic.
Never reintroduce <keen_memory> tags.
Use the relative path for files changed

Implementation Plan¶

1. Introduce structured tool memory types¶

Update internal/llm/message.go to add:

TurnMemory *TurnMemory on Message
TurnMemory
FailedBashCommand

This is the core schema change. It should be minimal and backward-compatible for existing messages that have no memory.

2. Add turn-level accumulation in the REPL flow¶

Track tool-memory candidates during each assistant turn in the REPL/model layer where tool events are already observed.

Capture:

file paths from write_file
file paths from edit_file
failed bash commands from bash tool results where exit_code != 0

Do not capture:

read_file
grep
glob
successful bash commands
raw stdout/stderr

Use a deduplicating accumulator for changed files and an append-only list for failed bash commands.

3. Finalize `TurnMemory` at assistant turn completion¶

When the assistant turn ends successfully:

build a TurnMemory object from the accumulated turn state
attach it to the stored assistant llm.Message

When the turn has no retained memory:

store TurnMemory as nil

Decide whether interrupted/error turns should store partial memory. Recommended first version:

store memory only on completed assistant turns
skip storing partial memory on interrupt/error for now

4. Update `AppState.messages` usage¶

Ensure all AppState flows preserve TurnMemory:

AddMessage
GetMessages
ReplaceMessages
compaction request building
any session persistence or replay path that serializes/deserializes messages

Anything that copies llm.Message values must continue to preserve the new field.

5. Render `TurnMemory` only at provider boundary¶

Update provider conversion paths so structured memory becomes part of assistant content only when sending messages to the model:

internal/llm/genkit.go
internal/llm/openai.go
internal/llm/openai_responses.go

Introduce one shared formatter helper so the rendering logic is defined once and reused consistently.

Recommended helper shape:

func FormatMessageForProvider(msg Message) string

Behavior:

for non-assistant messages, return Content
for assistant messages without TurnMemory, return Content
for assistant messages with TurnMemory, append the rendered Tool memory: block

6. Remove `<keen_memory>` prompt instructions and parsing dependency¶

Delete the old model instruction that asks the LLM to emit <keen_memory>...</keen_memory>.

Update:

internal/llm/systemprompt.go

Then remove the legacy parsing/extraction path once the new flow is wired:

streaming parser logic for hidden memory tags
extraction helpers
related logging around extracted keen memory

If safer, do this in two steps:

first stop producing <keen_memory>
then remove parsing code after tests are updated

7. Update compaction behavior¶

Compaction should rely on structured TurnMemory, not hidden tags in assistant content.

Update the compaction prompt and request-building assumptions so:

hidden tag language is removed
provider-formatted assistant messages already contain rendered tool memory where applicable

This keeps compaction behavior working without a separate memory channel.

8. Update tests¶

Add and update tests for:

Message formatting with and without TurnMemory
changed files deduplication
failed bash capture with exit_code != 0
successful bash exclusion
provider conversion for OpenAI, Genkit, and OpenAI Responses
AppState copy/preserve behavior
session persistence/replay if messages are serialized
removal of <keen_memory> parsing behavior

Delete or rewrite tests that assert tag-based hidden memory.

Granular Todo List¶

[ ] Add TurnMemory and FailedBashCommand types to internal/llm/message.go
[ ] Add TurnMemory *TurnMemory to llm.Message
[ ] Find every place llm.Message is constructed and verify the new field is preserved
[ ] Add a REPL turn-memory accumulator type
[ ] Reset the accumulator at the start of each assistant turn
[ ] Record file paths for write_file tool completions
[ ] Record file paths for edit_file tool completions
[ ] Record failed bash commands for bash tool completions with exit_code != 0
[ ] Deduplicate FilesChanged while preserving first-seen order
[ ] Ignore successful bash executions
[ ] Ignore read-only tools entirely
[ ] Build TurnMemory from the accumulator in the assistant turn completion path
[ ] Attach TurnMemory to the stored assistant message in AppState.messages
[ ] Decide and implement behavior for interrupted/error turns; first pass should skip partial memory
[ ] Add a shared formatter helper for provider-facing message text
[ ] Render FilesChanged in a deterministic single line
[ ] Render each failed bash entry as command (exit N)
[ ] Update Genkit message conversion to use the shared formatter
[ ] Update OpenAI-compatible chat conversion to use the shared formatter
[ ] Update OpenAI Responses conversion to use the shared formatter
[ ] Remove <keen_memory> instructions from the system prompt
[ ] Remove compaction prompt references to hidden <keen_memory> blocks
[ ] Remove <keen_memory> extraction from completed assistant message handling
[ ] Remove <keen_memory>-specific streaming/parser logic
[ ] Remove obsolete keen-memory logging hooks
[ ] Rewrite streaming tests that currently depend on hidden tag parsing
[ ] Add unit tests for TurnMemory rendering
[ ] Add unit tests for failed bash detection from exit_code
[ ] Add unit tests for changed-file deduplication and stable ordering
[ ] Add unit tests to verify provider-boundary formatting includes rendered tool memory
[ ] Add/update session persistence tests if llm.Message is serialized anywhere
[ ] Run go mod tidy
[ ] Run go test ./...

Suggested Delivery Order¶

Add the new message schema and provider formatter.
Add turn-memory accumulation and assistant-turn storage.
Update provider conversions to include rendered memory.
Remove <keen_memory> prompt instructions.
Remove legacy tag parsing/extraction.
Update compaction and session-related paths.
Finish with tests, go mod tidy, and go test ./....

Open Decisions¶

Whether interrupted assistant turns should retain partial memory. Recommended now: no.
Whether write_file should be rendered as “created” and edit_file as “updated”, or both should simply feed FilesChanged. Current agreed requirement points to a single FilesChanged list.
Whether failed bash commands should also be deduplicated if the same failing command appears multiple times in one turn. Recommended now: keep all failures in occurrence order unless repeated duplicates become noisy.