Codemux

Browser Agent Commands

CLI and socket API reference for controlling the embedded browser from AI agents.

AI agents running in Codemux terminals can control the embedded browser programmatically using CLI commands or the socket API.

Detect Codemux

Check if you're inside Codemux before using browser commands:

if [ -n "$CODEMUX_WORKSPACE_ID" ]; then
  # Inside Codemux: browser commands are available
  echo "Running inside Codemux"
fi

Environment variables set by Codemux:

  • CODEMUX_WORKSPACE_ID — Current workspace ID
  • CODEMUX_SURFACE_ID — Current terminal surface ID
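If a script needs this check in several places, it can be wrapped in a small helper. The sketch below is one way to do that; the function names `in_codemux` and `describe_codemux_env` are illustrative and not part of Codemux.

```shell
# Returns 0 when running inside a Codemux terminal, 1 otherwise.
in_codemux() {
  [ -n "${CODEMUX_WORKSPACE_ID:-}" ]
}

# Prints the workspace and surface IDs, or a notice when outside Codemux.
describe_codemux_env() {
  if in_codemux; then
    printf 'workspace=%s surface=%s\n' \
      "$CODEMUX_WORKSPACE_ID" "${CODEMUX_SURFACE_ID:-unknown}"
  else
    printf 'not running inside Codemux\n'
  fi
}
```

Calling `in_codemux || exit 0` at the top of an agent script skips the browser steps cleanly when the script runs in an ordinary terminal.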

Setup

Create a browser pane first (only needed once per workspace):

codemux browser create

CLI Commands

Open URL

codemux browser open <url>

Opens a URL in the browser pane. Always use this instead of xdg-open or open.
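For scripts that may also run outside Codemux, a wrapper can choose the opener at run time. This is a minimal sketch: `open_in_browser` is a hypothetical helper name, and the fallback order (xdg-open, then open) is an assumption about the host system.

```shell
# Opens a URL in the Codemux browser pane when inside Codemux.
# Outside Codemux, falls back to the desktop opener.
open_in_browser() {
  url=$1
  if [ -n "${CODEMUX_WORKSPACE_ID:-}" ]; then
    codemux browser open "$url"
  elif command -v xdg-open >/dev/null 2>&1; then
    xdg-open "$url"
  else
    open "$url"
  fi
}
```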

Get Accessibility Snapshot

codemux browser snapshot [browser_id]

Returns the page's accessibility tree. Use this to discover elements before interacting.
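One simple way to use the snapshot in a script is to search it for text you expect to see. The sketch below assumes the snapshot is printed to stdout as text; `snapshot_contains` is an illustrative helper, not a Codemux command.

```shell
# Succeeds if the current snapshot mentions the given text (case-insensitive).
# usage: snapshot_contains <text> [browser_id]
snapshot_contains() {
  needle=$1
  shift
  codemux browser snapshot "$@" | grep -qiF -- "$needle"
}
```

For example, `snapshot_contains "Submit" || echo "no submit control found"` gives a quick check before attempting a click.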

Click

codemux browser click <selector> [browser_id]

Clicks an element matching the CSS selector.

Fill Input

codemux browser fill <selector> <text> [browser_id]

Fills an input field with text.

Screenshot

codemux browser screenshot [browser_id]

Takes a screenshot and returns it as base64-encoded PNG.
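To keep the image on disk, the base64 output can be decoded into a file. This sketch assumes the command prints only the base64 payload to stdout (no JSON wrapper) and that a `base64` tool accepting `-d` is installed; `save_screenshot` is a hypothetical helper name.

```shell
# Saves the current screenshot to a PNG file.
# usage: save_screenshot <output.png> [browser_id]
save_screenshot() {
  out=$1
  shift
  codemux browser screenshot "$@" | base64 -d > "$out"
}
```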

Console Logs

codemux browser console-logs [browser_id]

Returns captured console output from the page.
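When only failures matter, the output can be filtered. The sketch below assumes one log entry per line with the level or message text included; the actual output format may differ, so adjust the pattern as needed. `console_errors` is an illustrative name.

```shell
# Prints console lines that look like errors.
# Exits non-zero when no such line is found.
# usage: console_errors [browser_id]
console_errors() {
  codemux browser console-logs "$@" | grep -iE 'error|exception'
}
```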

Coordinate-Based Commands (Tier 2)

These commands use pixel coordinates instead of CSS selectors. Useful for canvas elements, iframes, shadow DOM, or when selectors aren't available. Agents typically get coordinates from a screenshot.

Click at Coordinates

# MCP tool: browser_click_at
{"x": 150, "y": 300}

Moves the mouse along a human-like Bezier curve to (x, y), then clicks.
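If coordinates come from another step in a script, it can help to validate them before building the tool arguments. This is only a sketch of assembling the JSON shown above; `click_at_args` is a hypothetical helper, and how the arguments reach the MCP tool depends on your agent setup.

```shell
# Emits the JSON arguments for browser_click_at.
# Rejects anything that is not a non-negative integer.
click_at_args() {
  for v in "$1" "$2"; do
    case "$v" in
      ''|*[!0-9]*)
        echo "coordinates must be non-negative integers" >&2
        return 1
        ;;
    esac
  done
  printf '{"x": %s, "y": %s}\n' "$1" "$2"
}
```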

Type at Coordinates

# MCP tool: browser_type_at
{"x": 150, "y": 300, "text": "hello world"}

Clicks at the coordinates, then types the text with per-character delays.

Scroll at Coordinates

# MCP tool: browser_scroll_at
{"x": 400, "y": 300, "deltaX": 0, "deltaY": -200}

Scrolls at the specified position. Negative deltaY scrolls down.

Key Press

# MCP tool: browser_key_press
{"key": "Enter"}

Sends a keyboard event. Supports keys like Enter, Tab, Escape, ArrowDown, etc.

Drag

# MCP tool: browser_drag
{"startX": 100, "startY": 200, "endX": 300, "endY": 200}

Drags from start to end coordinates with human-like mouse movement.

OS-Level Commands (Tier 3 — Stealth)

These use ydotool to generate kernel-level input events that are indistinguishable from human interaction. Requires ydotool with its ydotoold daemon running, headed browser mode, and Hyprland.

OS Click

# MCP tool: browser_click_os
{"x": 150, "y": 300}

OS Type

# MCP tool: browser_type_os
{"text": "hello world"}

Socket API

For programmatic control, send JSON commands over the Unix socket at $XDG_RUNTIME_DIR/codemux.sock:

echo '{"command":"browser_automation","params":{"browser_id":"default","action":{"kind":"open_url","url":"https://example.com"}}}' | nc -U "$XDG_RUNTIME_DIR/codemux.sock"
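To avoid retyping the envelope, the command can be split into a builder and a sender. This sketch mirrors the JSON shape of the example above; `browser_command` and `send_browser_action` are illustrative names, and the builder does no JSON escaping, so the browser ID must not contain quotes.

```shell
# Wraps an action object in the browser_automation envelope.
# usage: browser_command <action_json> [browser_id]
browser_command() {
  printf '{"command":"browser_automation","params":{"browser_id":"%s","action":%s}}\n' \
    "${2:-default}" "$1"
}

# Sends one action over the Codemux socket and prints the reply.
# usage: send_browser_action <action_json> [browser_id]
send_browser_action() {
  browser_command "$@" | nc -U "$XDG_RUNTIME_DIR/codemux.sock"
}
```

With these defined, `send_browser_action '{"kind":"open_url","url":"https://example.com"}'` sends the same request as the one-liner.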

Available Actions

Action      Description
open_url    Navigate to a URL
screenshot  Capture screenshot
snapshot    Get accessibility tree
click       Click an element by selector
fill        Fill an input field
type_text   Type text (character by character)
evaluate    Run JavaScript in the page
back        Go back in history
forward     Go forward in history
reload      Reload the page
viewport    Set viewport dimensions
console     Get console logs
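Note that the socket action names differ from the CLI subcommands in places (for example, console here versus console-logs on the CLI). A script can guard against typos by checking the kind against the list above before sending anything; `is_known_action` is a hypothetical helper.

```shell
# Succeeds only for action kinds listed in the Available Actions table.
is_known_action() {
  case "$1" in
    open_url|screenshot|snapshot|click|fill|type_text) return 0 ;;
    evaluate|back|forward|reload|viewport|console) return 0 ;;
    *) return 1 ;;
  esac
}
```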

Common Workflows

Testing a Web App

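npm run dev &                                # start the dev server in the background
# Make sure the server is accepting connections before opening the page
codemux browser open http://localhost:3000
codemux browser snapshot                     # discover which elements exist
codemux browser fill "#search" "test query"
codemux browser click "#submit"
codemux browser snapshot                     # confirm the page changed as expected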
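Because the dev server starts in the background, the page may be opened before the server is ready. A generic retry helper is one way to handle that. The sketch below is not part of Codemux; `retry` is an illustrative function, and the curl probe in the usage comment assumes curl is installed and the app answers on port 3000.

```shell
# Retries a command until it succeeds or the attempt budget is used up.
# usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example use in the workflow above:
#   npm run dev &
#   retry 30 1 curl -sf -o /dev/null http://localhost:3000 \
#     && codemux browser open http://localhost:3000
```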

Debugging JavaScript Errors

codemux browser console-logs
codemux browser snapshot

Tips

  1. Always get a snapshot before interacting — know what elements exist
  2. Prefer explicit CSS selectors over guessing
  3. Check console logs when behavior is unexpected
  4. The browser_id parameter is optional — defaults to the active browser