Implementation Plan: Tool Memory

Purpose

This document defines the implementation plan for the tool memory feature based on the approved PRD and the selected minimal-change architecture.

The chosen design is:

tool memory is emitted by the main assistant turn itself
the output contract is enforced through the system prompt
the tool memory is wrapped in a <keen_memory>...</keen_memory> block
the REPL filters that block out of visible rendering only
the raw assistant message, including the block, remains intact in normal conversation history

This keeps the implementation small while ensuring tool memory naturally stays in later LLM context.

Goals

Generate tool memory as part of the main assistant turn
Keep tool memory out of the visible REPL transcript
Let tool memory naturally remain in later LLM context through normal conversation history
Preserve tool memory across session resume
Minimize code changes

Non-Goals

Showing tool memory in the REPL UI
Storing raw tool input/output in long-term conversation history
Adding a second LLM call for summarization
Introducing a separate hidden memory model
Redesigning the session model

Design Summary

The system prompt will instruct the main model:

if the turn used one or more tools, append exactly one <keen_memory>...</keen_memory> block at the very end
if the turn used no tools, emit no such block
the block must summarize outcomes, not raw tool I/O
the block must stay short

Keen will then:

Stream the assistant response as usual.
Detect the XML tool-memory block and exclude it from visible REPL rendering.
Keep the raw assistant message unchanged in normal conversation history.
Persist the raw assistant message unchanged so resumed sessions keep the tool memory in context.

Because the raw assistant message is already part of normal conversation history, the tool memory naturally remains in the context window for later turns.

Core Decisions

1. Keep Tool Memory Inside The Normal Assistant Message

Do not introduce:

a new llm.Message kind
a dedicated hidden-memory store
a new tool-memory field in session events

Instead:

the assistant message saved in AppState.messages remains the raw model output, including the XML tool-memory block
the REPL strips the block only for display purposes

This is the main simplification.

2. Use XML Tags As The Output Contract

The main system prompt in internal/llm/systemprompt.go should define the fixed tool-memory delimiters:

<keen_memory>...</keen_memory>

The XML block must appear only at the very end of the assistant turn.

3. Rely On Prompting And Best-Effort Parsing

The chosen solution is prompt-based rather than protocol-enforced.

That means:

Keen will instruct the model to emit the block
Keen will filter it out when present
Keen will not fail the turn if the block is missing or malformed

This keeps v1 simple. Reliability is best-effort.

Minimal Code Changes

1. `internal/llm/systemprompt.go`

Ensure the system prompt clearly requires:

emit one <keen_memory> block at end of turn if tools were used
emit no block if no tools were used
summarize durable outcomes only
do not include raw tool input/output
keep it short
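
As a rough illustration of the requirements above, the prompt fragment could be kept as a single Go constant and appended to the base prompt. The constant name, the helper `withToolMemory`, and the exact wording are assumptions made for this sketch; they are not the text in `internal/llm/systemprompt.go`.

```go
package main

// toolMemoryInstructions is a hypothetical prompt fragment. The wording is
// illustrative only and mirrors the five requirements listed in this plan.
const toolMemoryInstructions = `Tool memory rules:
- If this turn used one or more tools, append exactly one <keen_memory>...</keen_memory> block at the very end of your reply.
- If this turn used no tools, do not emit a <keen_memory> block.
- Summarize durable outcomes only.
- Never include raw tool input or output.
- Keep the block short.`

// withToolMemory appends the fragment to an existing base prompt.
// The helper name is an assumption made for this example.
func withToolMemory(base string) string {
	return base + "\n\n" + toolMemoryInstructions
}
```

Keeping the tag names in one place makes it easier to keep the prompt and the REPL filter in agreement about the delimiters.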

2. `internal/cli/repl/streaming.go`

Add streaming-aware parsing so assistant rendering excludes the XML block while the raw response is still preserved.

The parser should:

accumulate the raw assistant response
detect the opening <keen_memory> tag, including across chunk boundaries
stop rendering the XML block into visible assistant content
keep buffering until </keen_memory> appears

This is the main implementation work.
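
The parser behavior listed above can be sketched as a small stateful filter. This is a minimal sketch, not the actual `StreamHandler` code: the type `memoryFilter` and its methods `Write`, `Flush`, and `Raw` are hypothetical names. It holds back any trailing text that could be the start of the opening tag, so a tag split across chunks never reaches the screen.

```go
package main

import "strings"

const (
	openTag  = "<keen_memory>"
	closeTag = "</keen_memory>"
)

// memoryFilter is a hypothetical helper: it records the raw stream and
// returns only the text that is safe to render.
type memoryFilter struct {
	raw     strings.Builder // every chunk, unmodified
	buf     string          // text not yet classified as visible or hidden
	inBlock bool            // true between the opening and closing tags
}

// Write consumes one streamed chunk and returns the visible portion.
func (f *memoryFilter) Write(chunk string) string {
	f.raw.WriteString(chunk)
	f.buf += chunk
	var out strings.Builder
	for {
		if f.inBlock {
			i := strings.Index(f.buf, closeTag)
			if i < 0 {
				return out.String() // keep buffering hidden text
			}
			f.buf = f.buf[i+len(closeTag):]
			f.inBlock = false
			continue
		}
		if i := strings.Index(f.buf, openTag); i >= 0 {
			out.WriteString(f.buf[:i])
			f.buf = f.buf[i+len(openTag):]
			f.inBlock = true
			continue
		}
		// Hold back a suffix that might be the start of the opening tag.
		hold := partialTagSuffix(f.buf)
		out.WriteString(f.buf[:len(f.buf)-hold])
		f.buf = f.buf[len(f.buf)-hold:]
		return out.String()
	}
}

// Flush is called at end of stream. Held-back text that never became a tag
// is released; an unclosed block stays hidden.
func (f *memoryFilter) Flush() string {
	rest := f.buf
	f.buf = ""
	if f.inBlock {
		return ""
	}
	return rest
}

// Raw returns the unmodified response, including any memory block.
func (f *memoryFilter) Raw() string { return f.raw.String() }

// partialTagSuffix returns the length of the longest suffix of s that is a
// proper prefix of openTag.
func partialTagSuffix(s string) int {
	limit := len(openTag) - 1
	if len(s) < limit {
		limit = len(s)
	}
	for n := limit; n > 0; n-- {
		if strings.HasPrefix(openTag, s[len(s)-n:]) {
			return n
		}
	}
	return 0
}
```

One trade-off of this approach is that a literal `<` at the end of a chunk is delayed until the next chunk or `Flush` shows it is not a tag, which may add a small rendering lag for such text.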

3. `internal/cli/repl/streaming.go`

Update HandleDone() so it returns both:

visible assistant text for transcript rendering
raw assistant text for conversation history

A shape like this is sufficient:

type doneResult struct {
    Lines          []string
    VisibleMessage string
    RawMessage     string
}

The important part is the split between what gets rendered and what gets stored.
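
A minimal sketch of how such a result might be built from the complete raw response follows. The constructor `newDoneResult` is a hypothetical name, and treating `Lines` as the visible text split on newlines is an assumption; the real `HandleDone()` may populate it differently.

```go
package main

import "strings"

type doneResult struct {
	Lines          []string
	VisibleMessage string
	RawMessage     string
}

// newDoneResult is a hypothetical constructor. It removes the last
// <keen_memory> block from the visible text and leaves the raw text alone.
func newDoneResult(raw string) doneResult {
	const openTag, closeTag = "<keen_memory>", "</keen_memory>"

	visible := raw
	if start := strings.LastIndex(raw, openTag); start >= 0 {
		rest := raw[start:]
		tail := "" // text after the closing tag, if the block is closed
		if end := strings.Index(rest, closeTag); end >= 0 {
			tail = rest[end+len(closeTag):]
		}
		// An unclosed block hides everything from the opening tag onward.
		visible = raw[:start] + tail
	}
	visible = strings.TrimRight(visible, " \t\r\n")

	// Assumption: Lines is the visible text split on newlines.
	var lines []string
	if visible != "" {
		lines = strings.Split(visible, "\n")
	}
	return doneResult{Lines: lines, VisibleMessage: visible, RawMessage: raw}
}
```

Because `RawMessage` is never modified, a missing or malformed block only changes what is rendered, not what is stored.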

4. `internal/cli/repl/handlers.go`

Update turn finalization so:

AppState stores the raw assistant message
the REPL transcript uses the cleaned visible assistant message

This is the other key change.
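
The split at finalization can be pictured with the following sketch. The types `message`, `appState`, and `transcript`, and the function `finishTurn`, are simplified stand-ins invented for this example; they are not Keen's real types or the real `handleLLMDone()` signature.

```go
package main

// message is a simplified stand-in for an LLM conversation message.
type message struct {
	Role    string
	Content string
}

// appState is a simplified stand-in for the state that feeds later LLM turns.
type appState struct {
	messages []message
}

// transcript is a simplified stand-in for what the REPL shows the user.
type transcript struct {
	entries []string
}

// finishTurn is a hypothetical helper that records one completed assistant
// turn: raw text goes to conversation history, visible text to the transcript.
func finishTurn(state *appState, view *transcript, visible, raw string) {
	state.messages = append(state.messages, message{Role: "assistant", Content: raw})
	if visible != "" {
		view.entries = append(view.entries, visible)
	}
}
```

The design point is that the two destinations never share a string: history always receives the raw text, and the transcript always receives the cleaned text.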

5. `internal/cli/repl/session_state.go`

Persist the raw assistant message, not the cleaned visible one.

This is necessary so resumed sessions keep the tool-memory block in the rebuilt conversation state.

No new session-event field is required if the existing assistant message field stores the raw message.
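
If session events happen to be serialized as JSON (an assumption; this plan does not state the storage format), the raw message should survive a round trip unchanged. The sketch below uses a hypothetical `assistantEvent` struct to show this. Go's `encoding/json` escapes `<` and `>` as `\u003c` and `\u003e` when marshaling by default, and decoding restores the original characters.

```go
package main

import "encoding/json"

// assistantEvent is a hypothetical session event. The field names are
// assumptions for this example, not Keen's persisted schema.
type assistantEvent struct {
	Type    string `json:"type"`
	Message string `json:"message"`
}

// roundTrip encodes the raw assistant message into an event and decodes it
// again, returning the message as it would be read back on resume.
func roundTrip(raw string) (string, error) {
	data, err := json.Marshal(assistantEvent{Type: "assistant", Message: raw})
	if err != nil {
		return "", err
	}
	var ev assistantEvent
	if err := json.Unmarshal(data, &ev); err != nil {
		return "", err
	}
	return ev.Message, nil
}
```

Anyone inspecting the stored file by hand would see the escaped form, so a plain-text search for `<keen_memory>` on disk may not match even though the decoded message is intact.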

6. `internal/session/projection.go`

No special tool-memory logic should be needed.

Projection can continue rebuilding conversation state from the persisted assistant message. Because that message is raw, the tool memory naturally survives resume.

Runtime Flow

Normal Turn Completion

At the end of a turn:

StreamHandler has the raw assistant response.
The visible transcript is built from the cleaned response with the XML block removed.
AppState stores the raw assistant response.
Session persistence stores the raw assistant response.

Resume

On session resume:

Transcript replay uses rendered transcript content, so the XML block stays hidden.
Conversation projection uses the raw assistant message, so the tool memory is present in later LLM context.

Compaction Caveat

This design keeps tool memory embedded in normal assistant messages, so compaction is the main caveat.

AppState.Compact() currently summarizes prior conversation history. If left unchanged, compaction may drop tool-memory blocks or fold them into the summary, so their content is not guaranteed to survive.

Minimal-change v1 approach:

accept the current compaction behavior for now
optionally strengthen the compaction prompt later if preserving tool-memory content becomes important

Compaction-specific tool-memory preservation is out of scope for the minimal implementation.

Parsing And Failure Handling

Parser behavior must be forgiving.

Rules:

if no XML block is found, render and store the response normally
if the opening tag is found but the closing tag never arrives, hide the partial block from the UI where possible instead of rendering it
malformed or missing tool memory must not fail the assistant turn

A repair path is not required for v1.

Suggested Implementation Sequence

Phase 1: Prompt And Stream Filtering

Finalize the XML block contract in internal/llm/systemprompt.go
Add XML parsing and filtering logic to StreamHandler
Ensure visible assistant rendering excludes the XML block

Phase 2: Turn Finalization And Persistence

Update HandleDone() to split visible vs raw assistant text
Update handleLLMDone() to render the visible text but store the raw text
Ensure session persistence stores the raw assistant message

Phase 3: Resume Verification

Verify resumed sessions rebuild conversation state with raw assistant messages intact
Verify replayed transcript stays filtered

Tests

`internal/cli/repl/streaming_test.go`

XML tool-memory blocks are stripped from visible assistant output
opening and closing tags split across chunks are handled correctly
malformed or partial XML blocks do not leak into the visible transcript

`internal/cli/repl/handlers_test.go`

completed turn renders cleaned visible text and stores raw assistant text
turns without XML blocks behave normally
malformed XML does not break normal turn completion

`internal/session/projection_test.go`

resumed sessions rebuild raw assistant messages unchanged

`internal/session/store_test.go`

assistant turn events persist the raw assistant message including the XML block

Acceptance Criteria

The feature is complete when all of the following are true:

a turn that emits a valid XML tool-memory block stores that block in the raw assistant message
the XML block is not rendered in the REPL UI
subsequent LLM turns naturally receive tool memory through the existing conversation history
resumed sessions preserve tool memory because the raw assistant message is restored unchanged
malformed or missing XML does not break normal turn completion