OpenAI-Compatible Client Plan For DeepSeek
Goal
Replace the Genkit-backed path for DeepSeek with a first-party OpenAICompatibleClient that implements LLMClient. Use that client for:
- DeepSeek chat
- DeepSeek reasoner
The key requirement is support for deepseek-reasoner tool loops by preserving and replaying reasoning_content on assistant tool-call messages.
Why This Change Is Needed
The current DeepSeek integration goes through Genkit's compat_oai adapter. That adapter reduces assistant history to plain text plus tool calls and does not preserve reasoning_content.
Relevant code:
- internal/llm/genkit.go
- internal/llm/message.go
- Genkit compat adapter: go/pkg/mod/github.com/firebase/genkit/go@v1.4.0/plugins/compat_oai/generate.go
DeepSeek reasoner requires prior assistant tool-call messages to include reasoning_content, so the second request in a tool loop fails if that field is missing.
High-Level Approach
Introduce a new OpenAICompatibleClient in internal/llm and use it only for DeepSeek for now. Keep the existing Genkit path for all other providers.
Target split:
- GenkitClient
  - Anthropic
  - Google AI
  - OpenAI
  - Moonshot AI
- OpenAICompatibleClient
  - DeepSeek
This keeps the first pass narrow while still establishing the right client abstraction for DeepSeek.
Design Overview
1. Add a dedicated provider-scoped OpenAI-compatible client
Create:
internal/llm/openai.go
Add a new client type:
type OpenAICompatibleClient struct {
	provider Provider
	model    string
	apiKey   string
	baseURL  string
	client   *openai.Client
}
Use openai-go for transport with the DeepSeek base URL:
- DeepSeek: https://api.deepseek.com/
The client will own:
- streaming request execution
- tool loop execution
- assistant/tool history replay
- provider-specific request shaping
2. Refactor client construction
Current construction is centered around internal/llm/genkit.go and the function name NewGenkitClient.
Refactor to:
- keep NewGenkitClient for Genkit-backed providers
- add NewOpenAICompatibleClient for OpenAICompatibleClient
- select the concrete implementation by provider at the call site or in a small provider-selection helper
Recommended routing:
- anthropic -> NewGenkitClient(...)
- googleai -> NewGenkitClient(...)
- openai -> NewGenkitClient(...)
- moonshotai -> NewGenkitClient(...)
- deepseek -> NewOpenAICompatibleClient(...)
This keeps the public abstraction unchanged while limiting the new client to DeepSeek.
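The routing above could be captured in a small selection helper. The following is a minimal sketch, not the project's actual code: the LLMClient interface is reduced to a single hypothetical Name method, the two client types are empty stand-ins, and newClientForProvider and ErrUnknownProvider are illustrative names that do not appear in the plan.

```go
package main

import (
	"errors"
	"fmt"
)

// Provider identifies an LLM backend. The string values follow the routing list.
type Provider string

const (
	ProviderAnthropic Provider = "anthropic"
	ProviderGoogleAI  Provider = "googleai"
	ProviderOpenAI    Provider = "openai"
	ProviderMoonshot  Provider = "moonshotai"
	ProviderDeepSeek  Provider = "deepseek"
)

// LLMClient is a stand-in for the real interface (assumption: the real one
// exposes streaming methods; Name exists only to make the sketch observable).
type LLMClient interface {
	Name() string
}

// genkitClient and openAICompatibleClient are empty placeholders for the
// two concrete clients described in the plan.
type genkitClient struct{ provider Provider }

func (c *genkitClient) Name() string { return "genkit:" + string(c.provider) }

type openAICompatibleClient struct{ provider Provider }

func (c *openAICompatibleClient) Name() string {
	return "openai-compatible:" + string(c.provider)
}

// ErrUnknownProvider is a hypothetical sentinel error for unrouted providers.
var ErrUnknownProvider = errors.New("unknown provider")

// newClientForProvider is a hypothetical helper: only deepseek is routed to
// the OpenAI-compatible client, every other known provider stays on Genkit.
func newClientForProvider(p Provider) (LLMClient, error) {
	switch p {
	case ProviderDeepSeek:
		return &openAICompatibleClient{provider: p}, nil
	case ProviderAnthropic, ProviderGoogleAI, ProviderOpenAI, ProviderMoonshot:
		return &genkitClient{provider: p}, nil
	default:
		return nil, fmt.Errorf("%w: %q", ErrUnknownProvider, p)
	}
}

func main() {
	for _, p := range []Provider{ProviderDeepSeek, ProviderOpenAI} {
		c, err := newClientForProvider(p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(c.Name())
	}
}
```

Keeping the switch in one place means a later provider migration is a one-line change rather than an edit at every call site.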
3. Keep the shared message model unchanged for now
The current message type in internal/llm/message.go only stores plain text:
type Message struct {
	Role    Role
	Content string
}
For the current DeepSeek bug, that is acceptable because the failure happens inside the active tool loop within one StreamChat call.
Plan:
- keep llm.Message unchanged for now
- keep AppState.messages unchanged for now
- store provider-specific structured state only inside OpenAICompatibleClient during the active request
The in-memory SDK request history inside OpenAICompatibleClient should preserve:
- assistant text content
- assistant reasoning_content
- assistant tool calls
- tool response messages
This is enough to continue the active DeepSeek tool loop without expanding persisted app history.
4. Keep app state persistence unchanged for now
Current REPL state persistence only stores final assistant text:
- internal/cli/repl/state.go
- internal/cli/repl/handlers.go
That still loses intermediate assistant/tool state after the request finishes, but that is acceptable for this phase because we are only fixing the active DeepSeek tool loop.
Plan:
- keep AddMessage(role, content) unchanged
- keep REPL persistence behavior unchanged
- revisit structured persisted history only if future-turn replay becomes a requirement
5. Use SDK types first inside OpenAICompatibleClient
Do not scatter provider-specific behavior through the REPL or shared Genkit path. Keep the request shaping inside OpenAICompatibleClient.
Recommended helpers:
- buildRequestMessages([]Message) ([]openai.ChatCompletionMessageParamUnion, error)
- buildTools(*tools.Registry) ([]any, error)
- providerBaseURLOptions() []option.RequestOption
- accumulateStream(...)
- extractReasoningContent(...)
- injectReasoningContent(...)
Use typed openai-go request/response models wherever possible. Only use SDK escape hatches for the DeepSeek-specific field that the SDK does not model directly.
The shared flow should stay focused on DeepSeek chat and DeepSeek reasoner. The only non-standard part is reasoning_content.
6. Add provider-specific assistant serialization
For DeepSeek:
- deepseek-chat should behave like a standard chat model over the same client
- deepseek-reasoner must include reasoning_content when replaying assistant tool-call turns
Required replay shape for reasoner:
{
  "role": "assistant",
  "content": "...",
  "reasoning_content": "...",
  "tool_calls": [...]
}
Because openai-go@v1.8.2 does not model reasoning_content as a first-class field, use the SDK's undocumented-field support rather than replacing the message model.
Recommended request approach:
- build assistant/tool messages with normal openai-go types
- inject reasoning_content with SetExtraFields(...) on assistant message params
- use option.WithJSONSet(...) only as a fallback if nested extra fields become awkward
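To make the replay shape concrete, here is a standard-library sketch of the wire-level effect the request approach is after. It deliberately does not call openai-go: withExtraFields is a hypothetical stand-in for what SetExtraFields(...) is expected to achieve, and the struct types and encodeAssistantReplay are illustrative names, not the SDK's or the project's.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// These types mirror the standard OpenAI-style assistant message shape.
// They are local stand-ins, not the openai-go types.
type toolCallFunction struct {
	Name      string `json:"name"`
	Arguments string `json:"arguments"`
}

type toolCall struct {
	ID       string           `json:"id"`
	Type     string           `json:"type"`
	Function toolCallFunction `json:"function"`
}

type assistantMessage struct {
	Role      string     `json:"role"`
	Content   string     `json:"content"`
	ToolCalls []toolCall `json:"tool_calls,omitempty"`
}

// withExtraFields marshals v, then merges extra top-level keys into the
// resulting JSON object (hypothetical helper approximating extra-field support).
func withExtraFields(v any, extra map[string]any) ([]byte, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	obj := map[string]any{}
	if err := json.Unmarshal(raw, &obj); err != nil {
		return nil, err
	}
	for k, val := range extra {
		obj[k] = val
	}
	return json.Marshal(obj) // map keys are emitted in sorted order
}

// encodeAssistantReplay adds reasoning_content only for deepseek-reasoner
// tool-call turns; every other case uses the standard shape unchanged.
func encodeAssistantReplay(model string, msg assistantMessage, reasoning string) ([]byte, error) {
	if model == "deepseek-reasoner" && len(msg.ToolCalls) > 0 {
		return withExtraFields(msg, map[string]any{"reasoning_content": reasoning})
	}
	return json.Marshal(msg)
}

func main() {
	msg := assistantMessage{
		Role: "assistant",
		ToolCalls: []toolCall{{
			ID:       "call_1",
			Type:     "function",
			Function: toolCallFunction{Name: "ls", Arguments: "{}"},
		}},
	}
	out, err := encodeAssistantReplay("deepseek-reasoner", msg, "thinking")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

The point of the sketch is that the typed message stays the source of truth and the provider-specific field is layered on at serialization time, which is the same division of labor the plan asks of the SDK types.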
Recommended response approach:
- use normal streamed chunk types from openai-go
- read DeepSeek-specific fields from chunk.Choices[0].Delta.JSON.ExtraFields
- read final-message DeepSeek-specific fields from choice.Message.JSON.ExtraFields when available
Important caveat:
- openai-go's ChatCompletionAccumulator does not accumulate JSON metadata
- so reasoning_content must be captured while processing stream chunks, not only from the final accumulated response
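The caveat above amounts to keeping a second accumulator next to the normal one. The sketch below illustrates the idea with the standard library only: it decodes raw chunk JSON directly instead of reading Delta.JSON.ExtraFields, and it assumes that DeepSeek streams reasoning_content as a string field on each delta. The names streamChunk, turnState, and addChunk are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// streamChunk models only the parts of a streamed chunk this sketch needs.
// Assumption: reasoning_content arrives on the delta next to content.
type streamChunk struct {
	Choices []struct {
		Delta struct {
			Content          string `json:"content"`
			ReasoningContent string `json:"reasoning_content"`
		} `json:"delta"`
	} `json:"choices"`
}

// turnState accumulates one assistant turn while chunks are processed.
type turnState struct {
	content   strings.Builder
	reasoning strings.Builder
}

// addChunk decodes one raw chunk and appends both visible text and
// reasoning text. JSON null values decode to empty strings and add nothing.
func (s *turnState) addChunk(raw []byte) error {
	var c streamChunk
	if err := json.Unmarshal(raw, &c); err != nil {
		return fmt.Errorf("decode chunk: %w", err)
	}
	if len(c.Choices) == 0 {
		return nil
	}
	d := c.Choices[0].Delta
	s.reasoning.WriteString(d.ReasoningContent)
	s.content.WriteString(d.Content)
	return nil
}

func main() {
	chunks := []string{
		`{"choices":[{"delta":{"content":null,"reasoning_content":"Need the file list."}}]}`,
		`{"choices":[{"delta":{"content":"Checking.","reasoning_content":null}}]}`,
	}
	s := &turnState{}
	for _, c := range chunks {
		if err := s.addChunk([]byte(c)); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Printf("reasoning=%q content=%q\n", s.reasoning.String(), s.content.String())
}
```

In the real client the same per-chunk hook would sit beside the SDK accumulator call, so the reasoning text is already in hand when the assistant tool-call message is appended for replay.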
7. Implement the provider stream loop in OpenAICompatibleClient
The new client should mirror the current GenkitClient.StreamChat behavior:
- build wire messages from internal history
- send streaming request
- emit text chunks as StreamEventTypeChunk
- accumulate final assistant state during streaming:
  - content
  - reasoning content, if present
  - tool calls
- if no tool calls, emit done
- if tool calls exist:
  - emit tool_start
  - execute tools via the shared registry
  - emit tool_end
  - append assistant tool-call message to in-memory request history
  - append tool result messages to in-memory request history
  - repeat
This loop only needs to support DeepSeek right now. The important provider-specific behavior is how messages are encoded and how reasoning_content is parsed and replayed.
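The control flow of that loop can be sketched independently of the SDK. In the sketch below the streaming request is replaced by a plain function, the event names are passed as strings, and maxToolTurns is given an assumed value of 4 because the plan names the limit without stating it. All type and function names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

type toolCall struct {
	ID   string
	Name string
	Args string
}

// message is a simplified in-memory history entry (not the SDK type).
type message struct {
	Role             string
	Content          string
	ReasoningContent string
	ToolCalls        []toolCall
	ToolCallID       string
}

// modelFunc stands in for one streaming request that returns the
// accumulated assistant turn; toolFunc stands in for the tool registry.
type modelFunc func(history []message) (message, error)
type toolFunc func(call toolCall) (string, error)
type emitFunc func(event, detail string)

const maxToolTurns = 4 // assumed value for illustration

var errTooManyToolTurns = errors.New("tool loop exceeded maxToolTurns")

// runToolLoop sends at most maxToolTurns requests. Each tool-call turn is
// appended to history with its reasoning content so the next request can
// replay it, followed by one tool message per call.
func runToolLoop(history []message, model modelFunc, runTool toolFunc, emit emitFunc) (string, []message, error) {
	for turn := 0; turn < maxToolTurns; turn++ {
		assistant, err := model(history)
		if err != nil {
			return "", history, err
		}
		if len(assistant.ToolCalls) == 0 {
			emit("done", assistant.Content)
			return assistant.Content, history, nil
		}
		history = append(history, assistant)
		for _, call := range assistant.ToolCalls {
			emit("tool_start", call.Name)
			out, err := runTool(call)
			if err != nil {
				out = "tool error: " + err.Error()
			}
			emit("tool_end", call.Name)
			history = append(history, message{Role: "tool", Content: out, ToolCallID: call.ID})
		}
	}
	return "", history, errTooManyToolTurns
}

func main() {
	requests := 0
	model := func(history []message) (message, error) {
		requests++
		if requests == 1 {
			return message{
				Role:             "assistant",
				ReasoningContent: "I should list files.",
				ToolCalls:        []toolCall{{ID: "call_1", Name: "ls", Args: "{}"}},
			}, nil
		}
		return message{Role: "assistant", Content: "Found main.go."}, nil
	}
	runTool := func(call toolCall) (string, error) { return "main.go", nil }
	emit := func(event, detail string) { fmt.Println(event, detail) }
	text, history, err := runToolLoop([]message{{Role: "user", Content: "list files"}}, model, runTool, emit)
	fmt.Println(text, len(history), err)
}
```

The sketch appends the assistant message before running its tools rather than after; the resulting history order is the same as in the steps listed, since the tool messages still follow the assistant turn that requested them.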
8. Keep StreamEvent and REPL behavior unchanged for now
Current StreamEvent is sufficient for this phase:
- text chunks
- errors
- UI-facing tool events
Do not expand StreamEvent yet. The client can keep the richer DeepSeek state internally and continue emitting the same external events.
9. Keep the REPL UI unchanged
No immediate UI work is required. The REPL should continue to show:
- streamed text output
- tool start/end lines
Persistence changes are not required in this phase.
Files impacted:
- none required for the first pass outside existing client construction call sites
Concrete Task Breakdown
Phase 1: Client split
- Keep NewGenkitClient for Genkit-backed providers.
- Add NewOpenAICompatibleClient returning *OpenAICompatibleClient.
- Ensure both concrete clients implement LLMClient.
- Route only deepseek to NewOpenAICompatibleClient.
- Keep anthropic, googleai, openai, and moonshotai on NewGenkitClient.
Phase 2: SDK-based request history
- Build request history as []openai.ChatCompletionMessageParamUnion.
- Build assistant and tool messages with normal SDK types.
- Keep all assistant tool-call and tool-response state in that in-memory request history during the active request.
Phase 3: DeepSeek request building
- Build DeepSeek request messages from plain []llm.Message.
- Add model-specific assistant serialization hooks for:
  - deepseek-chat
  - deepseek-reasoner
- Ensure DeepSeek reasoner assistant replay includes reasoning_content via SetExtraFields(...).
Phase 4: DeepSeek stream loop
- Implement streaming request handling in OpenAICompatibleClient.
- Accumulate:
  - visible text
  - tool calls
  - provider-specific extra fields such as reasoning_content
- Capture reasoning_content while processing chunks because the SDK accumulator drops JSON metadata.
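Tool calls also arrive in fragments when streaming, so the accumulation step has to stitch them together. The sketch below assumes the usual OpenAI-style streaming behavior, where each delta carries an index, the id and function name appear once, and the arguments string is split across chunks. The types and the add method are hypothetical; in the real client the SDK accumulator may already cover this part.

```go
package main

import "fmt"

// toolCallDelta is a flattened view of one streamed tool-call fragment.
type toolCallDelta struct {
	Index     int
	ID        string
	Name      string
	Arguments string
}

type toolCall struct {
	ID        string
	Name      string
	Arguments string
}

// toolCallAccumulator rebuilds complete tool calls from fragments.
type toolCallAccumulator struct {
	calls []toolCall
}

// add merges one fragment into the call at d.Index. A new index must be the
// next free slot; gaps and negative indexes are rejected.
func (a *toolCallAccumulator) add(d toolCallDelta) error {
	if d.Index < 0 || d.Index > len(a.calls) {
		return fmt.Errorf("unexpected tool call index %d (have %d calls)", d.Index, len(a.calls))
	}
	if d.Index == len(a.calls) {
		a.calls = append(a.calls, toolCall{})
	}
	c := &a.calls[d.Index]
	if d.ID != "" {
		c.ID = d.ID
	}
	if d.Name != "" {
		c.Name = d.Name
	}
	c.Arguments += d.Arguments // argument JSON is concatenated in arrival order
	return nil
}

func main() {
	acc := &toolCallAccumulator{}
	deltas := []toolCallDelta{
		{Index: 0, ID: "call_1", Name: "read_file"},
		{Index: 0, Arguments: `{"path":`},
		{Index: 0, Arguments: `"main.go"}`},
	}
	for _, d := range deltas {
		if err := acc.add(d); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Printf("%+v\n", acc.calls)
}
```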
Phase 5: Tool loop replay
- Reuse the existing tool execution behavior conceptually.
- Append assistant tool-call messages with all fields needed for replay into the in-memory SDK request history.
- Append tool result messages with correct tool_call_id into the in-memory SDK request history.
- Continue looping until the model stops or maxToolTurns is reached.
Phase 6: Keep REPL persistence unchanged
- Leave AppState.messages unchanged.
- Leave handleLLMDone unchanged.
- Verify the UI output still works across:
  - plain response
  - tool call
  - multiple tool rounds
  - interrupted stream
Phase 7: Cleanup
- Remove DeepSeek initialization from internal/llm/genkit.go.
- Keep Genkit handling Anthropic, Google AI, OpenAI, and Moonshot for now.
- Keep provider wire logic isolated to openai.go.
Testing Plan
Add unit tests for the new client first, then run the existing LLM and REPL suites.
New tests
Create:
internal/llm/openai_test.go
Test cases:
- DeepSeek chat request builder
  - standard assistant/tool history serializes correctly
- DeepSeek reasoner request builder
  - assistant tool-call replay includes reasoning_content
- response accumulation
  - streamed text chunks are emitted
  - reasoning_content is captured from streamed extra fields before accumulator metadata is lost
- single tool call loop
  - assistant tool-call message is appended to in-memory SDK request history with IDs and arguments
  - tool response includes tool_call_id
- multi-turn tool loop
  - second request replays the first assistant tool-call message exactly
- tool error handling
  - tool errors become valid tool-response messages
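For the last test case, one possible shape of the behavior under test is sketched here: a failed tool still yields a tool message that carries the tool_call_id, with the error text as its content, so the follow-up request stays well-formed. The toolMessage type, toolResultMessage helper, errNotFound value, and the "tool error: " prefix are all hypothetical choices for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// toolMessage mirrors the OpenAI-style tool response wire shape.
type toolMessage struct {
	Role       string `json:"role"`
	ToolCallID string `json:"tool_call_id"`
	Content    string `json:"content"`
}

// errNotFound is a sample tool failure used by the demo below.
var errNotFound = errors.New("file not found")

// toolResultMessage always returns a valid tool message. On failure the
// error text replaces the output so the model can see what went wrong.
func toolResultMessage(callID, output string, err error) toolMessage {
	content := output
	if err != nil {
		content = "tool error: " + err.Error()
	}
	return toolMessage{Role: "tool", ToolCallID: callID, Content: content}
}

func main() {
	for _, m := range []toolMessage{
		toolResultMessage("call_1", "main.go", nil),
		toolResultMessage("call_2", "", errNotFound),
	} {
		raw, err := json.Marshal(m)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(string(raw))
	}
}
```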
Updated tests
Update:
internal/llm/genkit_test.go
Focus on preserving current behavior for non-DeepSeek providers.
Verification commands
Run after each stage:
go test ./internal/llm/...
go test ./internal/cli/repl/...
go test ./...
Risks and Mitigations
Risk: openai-go does not expose reasoning_content
Mitigation:
- use SetExtraFields(...) to send reasoning_content on assistant messages
- use JSON.ExtraFields to read DeepSeek response fields
- use option.WithJSONSet(...) only as a fallback
- capture streamed reasoning_content before the SDK accumulator drops JSON metadata
Risk: in-memory SDK request history diverges from persisted app history
Mitigation:
- accept that limitation for this phase
- keep the scope focused on the active tool loop only
- revisit persisted structured history only if future-turn replay becomes necessary
Risk: duplicated logic between GenkitClient and OpenAICompatibleClient
Mitigation:
- accept some duplication at first
- only extract shared helpers after the new client is stable
Recommended Implementation Order
- Keep NewGenkitClient for existing Genkit-backed providers.
- Add NewOpenAICompatibleClient for DeepSeek.
- Implement SDK-based request history inside OpenAICompatibleClient.
- Implement OpenAICompatibleClient request building for DeepSeek chat and DeepSeek reasoner.
- Add DeepSeek chat support on the new client.
- Add DeepSeek reasoner parsing and reasoning_content replay.
- Keep OpenAI and Moonshot on Genkit unchanged.
- Run full tests and do manual smoke tests for:
  - DeepSeek chat
  - DeepSeek reasoner
Definition of Done
The work is complete when:
- DeepSeek chat works through OpenAICompatibleClient
- DeepSeek reasoner can complete at least one tool-call round trip without the reasoning_content 400 error
- OpenAI and Moonshot continue to behave as they do today
- Anthropic and Google AI continue to pass through GenkitClient
- the active DeepSeek tool loop retains enough in-memory SDK request history to replay assistant tool-call messages correctly