Phase-3: GrepTool Design
This design adds a permission-gated GrepTool that allows LLMs to search for text patterns within files while enforcing filesystem guard boundaries and maintaining the consistent REPL approval UX.
1) Requirements Mapping
| Prompt Requirement | Design Decision |
|---|---|
| Same permission mechanism as ReadFile and Glob | Use `Guard.CheckPath(path, "read")`: Granted = no prompt (working dir), Denied = reject, Pending = REPL Allow/Deny prompt. |
| Recursive search from working directory | Walk the directory tree using `filepath.WalkDir`, skipping blocked/gitignored paths via `Guard.IsBlocked`. |
| `pattern` parameter (required) | Regex pattern string compiled via `regexp.Compile`. |
| `path` parameter (optional, defaults to working dir) | Resolved via `guard.ResolvePath`; defaults to `""`, which resolves to the working dir. |
| `include` parameter (optional glob filter) | When provided, only files matching the glob (via `doublestar.Match`) are searched. When empty, all non-blocked text files are included. |
| `output_mode` parameter (`file` or `content`) | `file`: return list of matching file paths. `content`: return list of `{file, line_number, line}` entries. |
| Errors sent back to LLM | `Execute(...)` returns `error`; existing tool loop wraps as `{"error": "...message..."}`. |
2) Current Architecture Fit
Existing hooks
- Tool contract: `internal/tools/tool.go` (`Tool` + `Registry`).
- Tool execution loop: `internal/llm/genkit.go` (`executeTools(...)`).
- REPL permission requester: `internal/cli/repl/permission_requester.go`.
- Path boundary policy: `internal/filesystem/guard.go`.
- Git awareness filtering: `internal/filesystem/gitawareness.go`.
- Existing tool patterns: `internal/tools/read_file.go`, `internal/tools/glob.go`.
- Glob library already in use: `github.com/bmatcuk/doublestar/v4`.
New pieces
- `internal/tools/grep.go`
  - Implements `tools.Tool` as `grep`.
  - Contains pattern compilation, guard checks, recursive file walking, line-level matching, result formatting.
- Permission mediation (reuse existing)
  - Use `PermissionRequester` interface for pending paths.
- No new dependencies
  - Uses stdlib `regexp` for pattern matching.
  - Reuses `doublestar/v4` (already a dependency) for `include` glob filtering.
  - Reuses `readFileContent` helper from `read_file.go` for text file validation (or similar logic for binary/null-byte checks).
3) Tool Contract
Tool name
`grep`
Description
Search for text patterns in files recursively after filesystem policy + user permission checks.
Input schema
    {
      "type": "object",
      "properties": {
        "pattern": {
          "type": "string",
          "description": "Regular expression pattern to search for in file contents"
        },
        "path": {
          "type": "string",
          "description": "Optional base directory for the search (defaults to working directory)"
        },
        "include": {
          "type": "string",
          "description": "Optional glob pattern to filter which files to search (e.g., '*.go', '**/*.md')"
        },
        "output_mode": {
          "type": "string",
          "enum": ["file", "content"],
          "description": "Output mode: 'file' returns matching file paths, 'content' returns matching lines with file and line number (defaults to 'content')"
        }
      },
      "required": ["pattern"],
      "additionalProperties": false
    }
Success output: file mode

    {
      "pattern": "func.*Handler",
      "base_path": "/resolved/search/path",
      "output_mode": "file",
      "files": [
        "/resolved/search/path/handler.go",
        "/resolved/search/path/internal/server.go"
      ],
      "count": 2
    }
Success output: content mode

    {
      "pattern": "func.*Handler",
      "base_path": "/resolved/search/path",
      "output_mode": "content",
      "matches": [
        {
          "file": "/resolved/search/path/handler.go",
          "line_number": 15,
          "line": "func NewHandler(cfg Config) *Handler {"
        },
        {
          "file": "/resolved/search/path/internal/server.go",
          "line_number": 42,
          "line": "func RegisterHandler(mux *http.ServeMux) {"
        }
      ],
      "count": 2
    }
Error output behavior
`Execute(...)` returns `error`; existing tool loop wraps as:

    { "error": "...message..." }
4) Permission and Interaction Flow
Guard-first decision flow
- Parse `pattern`, optional `path`, optional `include`, optional `output_mode` from tool input.
- Determine base path for search:
  - If `path` provided: `basePath := guard.ResolvePath(path)`
  - Else: resolve `""` (working directory).
- `perm := guard.CheckPath(basePath, "read")`.
- Branch:
  - `PermissionDenied` -> fail immediately (blocked path).
  - `PermissionGranted` -> continue search.
  - `PermissionPending` -> request REPL approval.
REPL approval flow (for `PermissionPending`)
Reuse existing mechanism from `read_file` and `glob`:
1. Emit permission request event with:
   - tool name (`grep`),
   - requested base path,
   - resolved base path,
   - operation (`read`).
2. REPL shows permission selector (Allow/Deny/Allow for this session).
3. User navigates and confirms.
4. Choice sent back to tool execution.
5. Tool execution resumes based on response.
5) Search Algorithm
File discovery
- Walk the directory tree from the resolved base path using `filepath.WalkDir`.
- At each directory entry:
  - If directory and `guard.IsBlocked(path)` -> `fs.SkipDir`.
  - If file and `guard.IsBlocked(path)` -> skip file.
- If `include` glob is provided, check each file against the glob pattern using `doublestar.Match` (applied to the relative path from base).
- For each candidate file, attempt to read contents:
  - Skip binary files (contains null bytes).
  - Skip non-UTF-8 files.
  - Skip files exceeding 1MB (reuse `maxFileSize` constant from `read_file.go`).
Pattern matching
- Compile `pattern` using `regexp.Compile` once before traversal.
- For each valid text file, scan line by line.
- For `file` mode: on first match in a file, record the file path and move to the next file.
- For `content` mode: record every matching line with file path and line number.
Result limiting
- Cap total matches at 1000 (reuse `maxFileLimit` concept from `glob.go`).
- In `file` mode: cap at 1000 matching files.
- In `content` mode: cap at 1000 matching lines.
- On overflow, return error: "search too broad: found more than 1000 matches".
6) Error Taxonomy
Standardized error categories/messages:
1. invalid input: missing/empty/non-string pattern, invalid output_mode value.
2. invalid pattern: regex compilation failure (include the regex error message).
3. permission denied by policy: blocked/sensitive path (PermissionDenied).
4. permission denied by user: user selected Deny for pending path.
5. path resolution failed: cannot resolve the base path.
6. search too broad: more than 1000 matches found.
7. search failed: IO error during filesystem traversal.
Error text should include pattern and path context when safe and useful.
7) Key Design Decisions
- Regex over literal strings: Use `regexp` for flexibility. LLMs can use `regexp.QuoteMeta`-style exact strings if needed. The regex gives more power with minimal complexity.
- Shared text file validation: Reuse the binary/null-byte/UTF-8/size checks from `read_file.go`. Extract or duplicate the helper (`isTextFile` logic) to avoid coupling. Since `readFileContent` is unexported and tightly coupled, duplicate the lightweight check inline.
- Line-by-line scanning: Use `bufio.Scanner` for memory efficiency: never load the entire file into memory for grep (unlike `read_file`, which reads the whole file). This allows searching files close to the 1MB limit without excessive memory use.
- Default output mode: Default to `content` since it's the most useful for LLMs (they get context immediately without a follow-up `read_file` call).
- Include glob uses doublestar: Reuse the existing `doublestar` dependency for consistency with `glob` tool behavior.
- No `exclude` parameter: Keep the interface simple. The `include` glob combined with guard/gitignore filtering covers the common cases. Can be added later if needed.
8) Granular Implementation Todo List
- Create `internal/tools/grep.go` with `GrepTool` struct.
- Implement constructor `NewGrepTool(guard, permissionRequester)`.
- Implement `Name()` returning `"grep"`.
- Implement `Description()` returning search description.
- Implement `InputSchema()` with `pattern` (required), `path` (optional), `include` (optional), `output_mode` (optional enum).
- Implement `Execute()`:
  a. Parse and validate input map (`pattern` required, non-empty string).
  b. Parse optional `path` (default `""`).
  c. Parse optional `include` glob; validate with `doublestar.ValidatePattern` if provided.
  d. Parse optional `output_mode` (default `"content"`); reject invalid values.
  e. Compile regex pattern via `regexp.Compile`.
  f. Resolve base path via `guard.ResolvePath`.
  g. Check permission via `guard.CheckPath`; branch on `Denied`/`Granted`/`Pending`.
  h. For `Pending`, call `permissionRequester.RequestPermission`.
- Implement `searchFiles()` private method:
  a. Walk directory tree with `filepath.WalkDir`.
  b. Skip blocked directories/files via `guard.IsBlocked`.
  c. If `include` glob set, filter files via `doublestar.Match` on relative path.
  d. For each candidate file, call `searchInFile()`.
  e. Enforce 1000 match limit; return error on overflow.
- Implement `searchInFile()` private method:
  a. Open file, check size <= 1MB.
  b. Use `bufio.Scanner` to read line by line.
  c. Check first bytes for null byte (binary detection): read a small buffer first, or check as lines are scanned.
  d. Match each line against compiled regex.
  e. For `file` mode: return on first match.
  f. For `content` mode: collect all matching lines with line numbers.
- Assemble return payload:
  a. `file` mode: `{pattern, base_path, output_mode, files, count}`.
  b. `content` mode: `{pattern, base_path, output_mode, matches, count}`.
- Register `GrepTool` in `internal/cli/repl/repl.go` in `initialModel()` alongside `ReadFileTool` and `GlobTool`.
- Create `internal/tools/grep_test.go` with tests:
  a. `TestGrepTool_Name`: returns `"grep"`.
  b. `TestGrepTool_Description`: non-empty.
  c. `TestGrepTool_InputSchema`: validates schema structure.
  d. `TestGrepTool_Execute_InvalidInput`: nil, wrong type, missing pattern, empty pattern.
  e. `TestGrepTool_Execute_InvalidPattern`: bad regex syntax.
  f. `TestGrepTool_Execute_InvalidOutputMode`: unsupported `output_mode` value.
  g. `TestGrepTool_Execute_FileMode`: matches correct files, returns paths only.
  h. `TestGrepTool_Execute_ContentMode`: matches correct lines with `file`/`line_number`/`line`.
  i. `TestGrepTool_Execute_DefaultContentMode`: omitting `output_mode` defaults to `content`.
  j. `TestGrepTool_Execute_IncludeFilter`: only searches files matching `include` glob.
  k. `TestGrepTool_Execute_RecursiveSearch`: finds matches in nested directories.
  l. `TestGrepTool_Execute_NoMatches`: returns empty results, no error.
  m. `TestGrepTool_Execute_BinaryFileSkipped`: binary files silently skipped.
  n. `TestGrepTool_Execute_PendingSearch_Allow`: permission granted by user.
  o. `TestGrepTool_Execute_PendingSearch_Deny`: permission denied by user.
  p. `TestGrepTool_Execute_BlockedPath`: guard-blocked path returns error.
  q. `TestGrepTool_Execute_MatchLimit`: >1000 matches returns error.
  r. `TestGrepTool_Execute_RelativePath`: relative base path resolves correctly.
- Run `go test ./internal/tools/...` and verify all tests pass.
- Run `go build ./...` and verify compilation.
9) Definition of Done
- `grep` tool can search for regex patterns in files recursively.
- Supports both `file` and `content` output modes.
- Supports optional `include` glob filter for file selection.
- Supports both relative and absolute base paths.
- All searches respect filesystem guard boundaries and gitignore rules.
- Binary and non-UTF-8 files are silently skipped.
- Pending paths always trigger REPL Allow/Deny prompt.
- Denied policy or denied user choice never performs search.
- Patterns are validated (regex compilation) and rejected if invalid.
- Search limited to 1000 matches with clear error on overflow.
- Tests cover critical success/error paths and permission interaction behavior.
- Tool is registered and available in the REPL.