Plan: Add web_fetch Tool¶
Context¶
Keen's LLM agent needs the ability to retrieve web content during task execution — for reading documentation, fetching API specs, or checking URLs referenced in code. HTML responses are converted to Markdown using html-to-markdown for cleaner LLM consumption. The tool follows the same Tool interface pattern as all existing tools.
New Dependency¶
Add github.com/JohannesKaufmann/html-to-markdown for HTML-to-Markdown conversion. Run go get github.com/JohannesKaufmann/html-to-markdown and go mod tidy after implementation.
Implementation¶
New file: internal/tools/web_fetch.go¶
Implement WebFetchTool struct — no guard or permissionRequester needed (read-only network operation, no filesystem involvement):
type WebFetchTool struct{}
func NewWebFetchTool() *WebFetchTool
func (t *WebFetchTool) Name() string // "web_fetch"
func (t *WebFetchTool) Description() string
func (t *WebFetchTool) InputSchema() map[string]any
func (t *WebFetchTool) Execute(ctx context.Context, input any) (any, error)
Input schema — single required parameter:
- url (string, required): The URL to fetch
Execute logic:
1. Assert input.(map[string]any), extract and validate url is a non-empty string
2. http.NewRequestWithContext(ctx, "GET", url, nil) with a User-Agent header
3. Use http.Client{Timeout: 30 * time.Second}
4. Read body, truncate at 512KB with ... (content truncated) suffix
5. If Content-Type contains text/html, convert body to Markdown using html-to-markdown; otherwise use body as-is
6. Return map[string]any{"url": url, "status_code": resp.StatusCode, "content": content}
7. Non-2xx responses: return body + status code (not an error — LLM should see the status)
8. Network errors: return error
Limitations to document in Description(): JavaScript-rendered pages (SPAs) will return the pre-JS skeleton, not the full content.
New file: internal/tools/web_fetch_test.go¶
Tests using httptest.NewServer to avoid real network calls:
- Name() returns "web_fetch"
- Description() is non-empty
- InputSchema() has correct shape (type: object, url property, required: ["url"], additionalProperties: false)
- Execute() with missing url returns error
- Execute() with invalid URL type returns error
- Execute() with HTML response: assert status_code == 200 and content is non-empty Markdown (not raw HTML)
- Execute() with plain text / JSON response: returns body as-is
- Execute() with non-2xx response: returns status_code + body without error
- Execute() truncates large responses at 512KB
Wiring: internal/cli/repl/tooling/tool_registry.go¶
Add after bashTool registration:
webFetchTool := tools.NewWebFetchTool()
appState.RegisterTool(webFetchTool)
Critical Files¶
| File | Change |
|---|---|
internal/tools/web_fetch.go |
New — tool implementation |
internal/tools/web_fetch_test.go |
New — unit tests |
internal/cli/repl/tooling/tool_registry.go |
Register the new tool |
go.mod / go.sum |
Add html-to-markdown dependency |
Verification¶
go get github.com/JohannesKaufmann/html-to-markdown
go mod tidy
go test ./internal/tools/... -run TestWebFetch -v
go test ./internal/tools/...
go test ./...
gofmt -w internal/tools/web_fetch.go