Browser Agent Commands
CLI and socket API reference for controlling the embedded browser from AI agents.
AI agents running in Codemux terminals can control the embedded browser programmatically using CLI commands or the socket API.
Detect Codemux
Check if you're inside Codemux before using browser commands:
if [ -n "$CODEMUX_WORKSPACE_ID" ]; then
  # Inside Codemux: browser commands are available
  echo "Running inside Codemux"
fi
Environment variables set by Codemux:
- CODEMUX_WORKSPACE_ID: Current workspace ID
- CODEMUX_SURFACE_ID: Current terminal surface ID
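The check above can be wrapped in a small function so that scripts fail early outside Codemux. This is a minimal sketch; `in_codemux` is a hypothetical helper name, not part of the Codemux CLI.

```shell
# Hypothetical helper (not part of the Codemux CLI): succeeds only when
# CODEMUX_WORKSPACE_ID is set to a non-empty value.
in_codemux() {
  [ -n "${CODEMUX_WORKSPACE_ID:-}" ]
}

# Example use at the top of a script:
#   in_codemux || { echo "not running inside Codemux" >&2; exit 1; }
```

The `${VAR:-}` form keeps the check safe in scripts that run with `set -u`.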
Setup
Create a browser pane first (only needed once per workspace):
codemux browser create
CLI Commands
Navigate
codemux browser open <url>
Opens a URL in the browser pane. Always use this instead of xdg-open or open.
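One way to follow this rule in scripts that also run outside Codemux is a small wrapper that picks the opener based on the environment. This is a sketch only; `open_in_browser` is a hypothetical name, and the fallback to xdg-open assumes a Linux desktop.

```shell
# Hypothetical wrapper: use the Codemux browser pane when available,
# otherwise fall back to the desktop's default opener.
open_in_browser() {
  if [ -n "${CODEMUX_WORKSPACE_ID:-}" ]; then
    codemux browser open "$1"
  else
    xdg-open "$1"
  fi
}

# Example use:
#   open_in_browser http://localhost:3000
```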
Get Accessibility Snapshot
codemux browser snapshot [browser_id]
Returns the page's accessibility tree. Use this to discover elements before interacting.
Click
codemux browser click <selector> [browser_id]
Clicks an element matching the CSS selector.
Fill Input
codemux browser fill <selector> <text> [browser_id]
Fills an input field with text.
Screenshot
codemux browser screenshot [browser_id]
Takes a screenshot and returns it as base64-encoded PNG.
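To get a viewable image, the base64 output has to be decoded. The sketch below assumes the command prints only the base64 payload on stdout, with no JSON wrapper or prefix; that is an assumption, so check the actual output first. `save_png` is a hypothetical helper name.

```shell
# Hypothetical helper: decode base64 data from stdin into a file, then
# confirm the file starts with the 8-byte PNG signature.
# Assumes stdin carries only the base64 payload.
save_png() {
  out=$1
  base64 --decode > "$out" || return 1
  sig=$(head -c 8 "$out" | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "89504e470d0a1a0a" ]
}

# Example use:
#   codemux browser screenshot | save_png page.png
```

The signature check catches the case where the output was not a bare PNG payload, so a script does not silently pass a corrupt file along.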
Console Logs
codemux browser console-logs [browser_id]
Returns captured console output from the page.
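When a page is noisy, it can help to filter the captured output down to error lines. The sketch below assumes the logs arrive as plain text with one entry per line, which this reference does not confirm; `console_errors` is a hypothetical helper name.

```shell
# Hypothetical filter: keep only lines that mention "error", ignoring case.
# Assumes one log entry per line of plain text.
console_errors() {
  # "|| true" keeps the pipeline successful when nothing matches.
  grep -i 'error' || true
}

# Example use:
#   codemux browser console-logs | console_errors
```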
Coordinate-Based Commands (Tier 2)
These commands use pixel coordinates instead of CSS selectors. Useful for canvas elements, iframes, shadow DOM, or when selectors aren't available. Agents typically get coordinates from a screenshot.
Click at Coordinates
# MCP tool: browser_click_at
{"x": 150, "y": 300}
Moves the mouse along a human-like Bezier curve to (x, y), then clicks.
Type at Coordinates
# MCP tool: browser_type_at
{"x": 150, "y": 300, "text": "hello world"}
Clicks at the coordinates, then types the text with per-character delays.
Scroll at Coordinates
# MCP tool: browser_scroll_at
{"x": 400, "y": 300, "deltaX": 0, "deltaY": -200}
Scrolls at the specified position. Negative deltaY scrolls down.
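Because the sign of deltaY is easy to get backwards, a script can build the arguments from a named direction instead. This is a sketch that follows the convention stated above (negative deltaY scrolls down); `scroll_args` is a hypothetical helper, and how the arguments reach the MCP tool depends on your agent.

```shell
# Hypothetical helper: build browser_scroll_at arguments from a direction.
# Uses the convention documented above: negative deltaY scrolls down.
scroll_args() {
  x=$1
  y=$2
  direction=$3
  amount=$4
  case $direction in
    down) dy=$(( -amount )) ;;
    up)   dy=$amount ;;
    *)    echo "direction must be 'up' or 'down'" >&2; return 1 ;;
  esac
  printf '{"x": %d, "y": %d, "deltaX": 0, "deltaY": %d}\n' "$x" "$y" "$dy"
}

# Example use:
#   scroll_args 400 300 down 200
```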
Key Press
# MCP tool: browser_key_press
{"key": "Enter"}
Sends a keyboard event. Supports keys like Enter, Tab, Escape, ArrowDown, etc.
Drag
# MCP tool: browser_drag
{"startX": 100, "startY": 200, "endX": 300, "endY": 200}
Drags from start to end coordinates with human-like mouse movement.
OS-Level Commands (Tier 3 — Stealth)
These use ydotool to generate kernel-level input events that are indistinguishable from human interaction. Requires ydotool + ydotoold, headed browser mode, and Hyprland.
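A quick preflight check can report which of these requirements are missing before an agent tries a Tier 3 command. This is a sketch under stated assumptions: it detects Hyprland through the `HYPRLAND_INSTANCE_SIGNATURE` variable that Hyprland exports to its sessions, it uses `pgrep` to look for the daemon, and it cannot verify headed browser mode. `tier3_missing` is a hypothetical helper name.

```shell
# Hypothetical preflight: print the Tier 3 requirements that appear to be
# missing, separated by spaces. Prints an empty line when all checks pass.
# Headed browser mode is not checked here.
tier3_missing() {
  missing=""
  command -v ydotool >/dev/null 2>&1 || missing="$missing ydotool"
  pgrep -x ydotoold >/dev/null 2>&1 || missing="$missing ydotoold"
  [ -n "${HYPRLAND_INSTANCE_SIGNATURE:-}" ] || missing="$missing hyprland"
  echo "${missing# }"
}

# Example use:
#   m=$(tier3_missing); [ -z "$m" ] || echo "Tier 3 unavailable, missing: $m" >&2
```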
OS Click
# MCP tool: browser_click_os
{"x": 150, "y": 300}
OS Type
# MCP tool: browser_type_os
{"text": "hello world"}
Socket API
For programmatic control, send JSON commands over the Unix socket at $XDG_RUNTIME_DIR/codemux.sock:
echo '{"command":"browser_automation","params":{"browser_id":"default","action":{"kind":"open_url","url":"https://example.com"}}}' | nc -U $XDG_RUNTIME_DIR/codemux.sock
Available Actions
| Action | Description |
|---|---|
| open_url | Navigate to a URL |
| screenshot | Capture screenshot |
| snapshot | Get accessibility tree |
| click | Click an element by selector |
| fill | Fill an input field |
| type_text | Type text (character by character) |
| evaluate | Run JavaScript in the page |
| back | Go back in history |
| forward | Go forward in history |
| reload | Reload the page |
| viewport | Set viewport dimensions |
| console | Get console logs |
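The open_url request from the Socket API example can be generated by a function rather than typed by hand, which avoids quoting mistakes in the JSON. This is a minimal sketch: `open_url_request` is a hypothetical helper, the JSON shape is copied from that example, and the parameter fields of the other actions are not documented here, so it covers open_url only.

```shell
# Hypothetical helper: print the open_url request for a given URL.
# browser_id defaults to "default", as in the example request, and is
# assumed to be a plain identifier that needs no escaping.
open_url_request() {
  url=$1
  browser_id=${2:-default}
  # Escape backslashes and double quotes so the URL stays valid JSON.
  esc_url=$(printf '%s' "$url" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"command":"browser_automation","params":{"browser_id":"%s","action":{"kind":"open_url","url":"%s"}}}\n' \
    "$browser_id" "$esc_url"
}

# Example use:
#   open_url_request "https://example.com" | nc -U "$XDG_RUNTIME_DIR/codemux.sock"
```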
Common Workflows
Testing a Web App
npm run dev &
codemux browser open http://localhost:3000
codemux browser snapshot
codemux browser fill "#search" "test query"
codemux browser click "#submit"
codemux browser snapshot
Debugging JavaScript Errors
codemux browser console-logs
codemux browser snapshot
Tips
- Always get a snapshot before interacting so you know which elements exist
- Prefer explicit CSS selectors over guessing
- Check console logs when behavior is unexpected
- The browser_id parameter is optional and defaults to the active browser
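In the Testing a Web App workflow, `npm run dev &` returns immediately, so the first `codemux browser open` can race the dev server's startup. One way to avoid that is to poll until the server answers. The sketch below uses a hypothetical `wait_until` helper; the curl probe in the usage comment assumes curl is installed and that the app listens on port 3000, as in that workflow.

```shell
# Hypothetical helper: retry a command about once per second until it
# succeeds or the timeout (in seconds) runs out.
wait_until() {
  timeout=$1
  shift
  elapsed=0
  until "$@"; do
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Example use:
#   npm run dev &
#   wait_until 30 curl -fsS -o /dev/null http://localhost:3000 &&
#     codemux browser open http://localhost:3000
```

Taking the probe as arguments keeps the helper independent of curl, so the same loop can wait on any readiness check.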