Tool Memory

Ideation

At this stage, tool calls are retained only within the agent loop of a single turn. After the turn ends, the tool calls and their outputs are discarded. We want to avoid keeping tool calls and their outputs in the conversation history, but past tool calls and their outputs can help the model perform better in subsequent turns.

How can we improve this without retaining tool input/outputs?
Can we keep only the key points after each agent turn? Could that be an assistant segment? This is like a summary, but only for the tool calls and their outputs.
What if we ask the LLM to give a tool memory but we don't show it in the REPL UI? Just store it in the conversation history.
Should we not have one memory after each turn?
What about storing only the latest 10 tool memories? We discard the older ones and don't send them to the LLM.
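The "latest 10" idea above could look roughly like the sketch below. The message shape (`HistoryMessage`, `kind`, `text`) and the function name `pruneToolMemories` are illustrative assumptions, not names from the actual codebase.

```typescript
// Hypothetical message shape; `kind` and `text` are illustrative names.
interface HistoryMessage {
  kind: "user" | "assistant" | "tool_memory";
  text: string;
}

// Keep every non-memory message, but only the newest `limit` tool memories.
// Assumes `history` is ordered oldest first.
function pruneToolMemories(
  history: HistoryMessage[],
  limit: number = 10
): HistoryMessage[] {
  const total = history.filter((m) => m.kind === "tool_memory").length;
  let toDrop = Math.max(0, total - limit);
  return history.filter((m) => {
    if (m.kind !== "tool_memory" || toDrop === 0) return true;
    toDrop -= 1; // oldest memories come first, so drop from the front
    return false;
  });
}
```

Pruning would run just before the history is sent to the LLM, so older memories could still stay on disk if that turns out to be useful.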

PRD

Based on our discussion, this is the PRD for the tool memory feature:

After each agent turn is finished, the LLM will write a tool memory block at the very end of the turn, based on the tool usage in that turn.
It will be instructed through the system prompt to write the tool memory block with the following fixed delimiters: ...
Tool memory is a summary of the most important signals from the tool calls and their outputs.
If no tool calls were made in a turn, the LLM will write no tool memory.
Tool memory won't be shown in the REPL UI, but it will be stored in the conversation history so that it stays in the LLM's context window in subsequent turns.
We have to distinguish tool memory from other assistant messages so that we can hide it in the REPL UI.
Tool memory should summarize outcomes, not raw tool I/O.
Tool memory should be short, for example a few bullets or a small paragraph.
Session resume and compaction should preserve the retained tool memories.
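One way to meet the "hidden in the REPL, visible to the model" requirement is to mark tool memory messages and derive two views from the same history. This is only a sketch: `ChatMessage`, the `isToolMemory` flag, `replView`, and `modelContext` are assumed names, and the real message type may differ.

```typescript
// Hypothetical message shape; `isToolMemory` is an assumed marker field.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
  isToolMemory?: boolean;
}

// What the REPL renders: everything except tool memory.
function replView(messages: ChatMessage[]): ChatMessage[] {
  return messages.filter((m) => m.isToolMemory !== true);
}

// What the LLM receives: every message, with the internal marker dropped.
function modelContext(
  messages: ChatMessage[]
): { role: string; content: string }[] {
  return messages.map((m) => ({ role: m.role, content: m.content }));
}
```

Because both views read from one stored history, session resume and compaction only need to preserve that history, marker included.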

Let's create an implementation plan for this feature based on the PRD. Save it in a file called output-3_tool-memory.md in @.ai-interactions/outputs/phase-5 directory.

Follow-Ups

We can simplify the design even further. Just emit the blocks and filter them out in the REPL. That's it. Since they are part of the conversation, they naturally get into the context.
Update the plan to reflect the simplified design.
We have a bug. If the tag appears somewhere in the agent's message but is not intended as tool memory, the REPL still strips it.
Right now, we are using a tag to emit tool memory from the LLM at the end of its turn, covering the whole turn, so that the memory can be retained for later turns. The problem with this approach is that it behaves very badly whenever any part of the LLM's messages contains the tag.
Where does it happen: Persistence treats it as memory only when it is a dedicated trailing block
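The mismatch described above (the REPL strips the tag anywhere, while persistence only honors a dedicated trailing block) could be removed by having both share one parser. The sketch below treats text as memory only when it is the final block and starts on its own line. The function name `splitTrailingMemory` is an assumption, and the delimiters are passed in as parameters because the actual ones are not given here.

```typescript
interface SplitResult {
  visible: string;       // text to show in the REPL
  memory: string | null; // extracted tool memory, or null if none
}

// Treat text as tool memory only when it is a dedicated trailing block:
// the message must end with `close`, and `open` must start on its own line.
function splitTrailingMemory(
  text: string,
  open: string,
  close: string
): SplitResult {
  const untouched: SplitResult = { visible: text, memory: null };
  const trimmed = text.trimEnd();
  if (!trimmed.endsWith(close)) return untouched;

  const start = trimmed.lastIndexOf(open);
  if (start === -1) return untouched;

  const before = trimmed.slice(0, start);
  // An inline mention (text on the same line before `open`) is not memory.
  if (before.length > 0 && !before.endsWith("\n")) return untouched;

  const memory = trimmed
    .slice(start + open.length, trimmed.length - close.length)
    .trim();
  return { visible: before.trimEnd(), memory };
}
```

With this rule, a message that merely mentions the delimiters mid-sentence is shown unchanged, at the cost of ignoring a memory block the model places anywhere but the end.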
We should go for the long-term reliable solution. The one you proposed, explain more.
How can we deterministically create a summary of tool usage, given that we need to retain the most important details concisely?
This is too complex.
There can be many many tool calls, no?
Should we instead retain memory only for the write_file, edit_file, and bash tools? The other tools are read-only, so the model can use them again if needed without side effects.
Figuring out which bash facts to store will be complex. There can be so many different bash commands. I am thinking of leveraging the `isDangerous` argument. If the LLM marks a command as `isDangerous`, we can store it in the TurnMemory.
Okay so based on our discussion, this is what we want to retain:
- After every turn, we will create a tool memory object for the turn
- The object will have all the written or edited files, deduplicated in the turn
- It will also have the bash commands that failed with exit_code != 0
- Bash commands will also have the associated exit_code retained, but not the output
- We will store the TurnMemory in the session history in AppState.messages
- While converting internal Message objects to OpenAI or Genkit messages, we will put it as a part of the assistant message content

What do you think of these requirements?
Ok now let's create an implementation plan and save it in .ai-interactions/outputs/phase-5 as output-4_tool-memory-redesign.md
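
The requirements listed above can be sketched as a small deterministic builder. Everything here is an assumption made for illustration: the `ToolCall` record shape, the `TurnMemory` fields, and the `buildTurnMemory` and `renderTurnMemory` helpers. Only the tool names and the retention rules come from the list.

```typescript
// Hypothetical record of one tool call made during a turn.
interface ToolCall {
  tool: string;      // e.g. "write_file", "edit_file", "bash"
  path?: string;     // set for file tools
  command?: string;  // set for bash
  exitCode?: number; // set for bash
}

interface FailedCommand {
  command: string;
  exitCode: number; // the output is deliberately not retained
}

interface TurnMemory {
  files: string[];                 // written or edited, deduplicated
  failedCommands: FailedCommand[]; // bash commands with exit_code != 0
}

function buildTurnMemory(calls: ToolCall[]): TurnMemory {
  const files = new Set<string>(); // a Set deduplicates, keeps first-seen order
  const failedCommands: FailedCommand[] = [];
  for (const call of calls) {
    const isFileTool = call.tool === "write_file" || call.tool === "edit_file";
    if (isFileTool && call.path !== undefined) {
      files.add(call.path);
    } else if (
      call.tool === "bash" &&
      call.command !== undefined &&
      call.exitCode !== undefined &&
      call.exitCode !== 0
    ) {
      failedCommands.push({ command: call.command, exitCode: call.exitCode });
    }
    // Read-only tools are skipped: the model can simply call them again.
  }
  return { files: Array.from(files), failedCommands };
}

// Text form that could be appended to the assistant message content
// when converting to OpenAI or Genkit messages.
function renderTurnMemory(memory: TurnMemory): string {
  const lines: string[] = [];
  if (memory.files.length > 0) {
    lines.push(`Files written or edited: ${memory.files.join(", ")}`);
  }
  for (const f of memory.failedCommands) {
    lines.push(`Failed command (exit ${f.exitCode}): ${f.command}`);
  }
  return lines.join("\n");
}
```

Because the memory is computed from recorded tool calls rather than written by the LLM, there is no tag to parse and nothing to strip in the REPL.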