Browser
Embedded browser pane with screenshot-based rendering, agent automation, and viewport sync.
Codemux includes an embedded browser pane for viewing web apps, running tests, and letting AI agents interact with pages programmatically.
How It Works
The browser uses screenshot-based rendering with a 1-second refresh cycle. A Chromium instance runs in the background, captures screenshots, and streams them to the pane via WebSocket. User clicks are mapped from display coordinates to the actual viewport.
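The click mapping described above can be sketched as a simple scale between two rectangles. This is an illustration, not Codemux's actual code: it assumes the screenshot is stretched to fill the pane with no letterboxing, and the names `Size` and `display_to_viewport` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Size:
    """Width and height in pixels (hypothetical helper type)."""
    width: float
    height: float


def display_to_viewport(x: float, y: float, display: Size, viewport: Size) -> Tuple[int, int]:
    """Scale a click from pane (display) pixels to browser viewport pixels.

    Assumption: the screenshot fills the whole pane, so each axis scales
    independently by viewport size / display size.
    """
    if display.width <= 0 or display.height <= 0:
        raise ValueError("display size must be positive")
    scale_x = viewport.width / display.width
    scale_y = viewport.height / display.height
    # Clamp so a click on the far edge still lands inside the viewport.
    vx = min(max(int(x * scale_x), 0), int(viewport.width) - 1)
    vy = min(max(int(y * scale_y), 0), int(viewport.height) - 1)
    return vx, vy
```

For example, with a 640x360 pane showing a 1280x720 viewport, a click at (320, 180) in the pane would map to (640, 360) in the page.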
Stealth Mode
The browser launches with anti-detection flags to avoid triggering bot protection on sites like Cloudflare:
- Removes navigator.webdriver detection
- Spoofs user-agent to match the installed Chrome version
- Disables automation-related Blink features and infobars
- Disables background throttling for consistent behavior
Browser Toolbar
The toolbar at the top of every browser pane includes:
- URL bar: Type a URL and press Enter to navigate. Bare domains are automatically prefixed with https://.
- Back / Forward: Standard navigation history
- Refresh: Reload the current page
- Home: Reset to about:blank
- External link: Open the current URL in your system browser
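As a rough illustration of the URL bar's auto-prefix behavior, the sketch below adds https:// to input that has no scheme. The function name `normalize_url` and the handling of empty input and of `about:` / `data:` URLs are assumptions; the actual rules Codemux applies may differ.

```python
def normalize_url(text: str) -> str:
    """Return a navigable URL for text typed into the URL bar (hypothetical helper)."""
    url = text.strip()
    if not url:
        # Assumption: empty input falls back to the Home target.
        return "about:blank"
    # Leave input alone if it already carries a scheme.
    if "://" in url or url.startswith(("about:", "data:")):
        return url
    # Bare domains (and host:port pairs) get an https:// prefix.
    return "https://" + url
```

Under these assumptions, `example.com` becomes `https://example.com`, while `http://localhost:3000` passes through unchanged.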
Creating a Browser Pane
- Press Ctrl+Shift+B to create a new browser tab
- Or click the + dropdown in the tab bar and select "Browser"
- Or use the Shell + Browser workspace preset
Agent Automation
AI agents running in Codemux terminals can control the browser programmatically. See Browser Agent Commands for the full reference.
codemux browser open http://localhost:3000
codemux browser snapshot # Get accessibility tree
codemux browser click "#submit" # Click an element
codemux browser fill "#email" "test@example.com"
codemux browser screenshot # Capture as base64 PNG
codemux browser console-logs # Get console output
Viewport Sync
When the browser pane resizes, the viewport dimensions sync automatically. The rendered page adapts to the pane size.
Hybrid Input System
Browser automation uses a three-tier input architecture. Agents choose the appropriate tier based on the target:
Tier 1: Selector-based (fastest)
Standard CSS selector interaction via CDP. Best for regular web apps.
- browser_click: Click element by CSS selector
- browser_fill: Fill input by CSS selector
Tier 2: Coordinate-based CDP (vision-capable)
Pixel-coordinate interaction via Chrome DevTools Protocol. Works on iframes, shadow DOM, canvas, and protected form fields.
- browser_click_at: Click at (x, y) coordinates
- browser_type_at: Click at coordinates, then type text
- browser_scroll_at: Scroll at coordinates
- browser_key_press: Send keyboard events
- browser_drag: Drag from one point to another
Mouse movements use Bezier curve interpolation with randomized control points for human-like motion.
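One way to picture Bezier interpolation with randomized control points is the sketch below. It is an assumption-laden illustration: it uses a cubic curve, places the two control points one third and two thirds along the straight line, and offsets each perpendicular to that line by a random fraction of the travel distance. The function name `bezier_path` and the `steps` and `jitter` parameters are hypothetical, not Codemux settings.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def bezier_path(start: Point, end: Point, steps: int = 20, jitter: float = 0.25,
                rng: Optional[random.Random] = None) -> List[Point]:
    """Return steps + 1 points along a cubic Bezier curve from start to end."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    rng = rng or random.Random()
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0

    def control(t: float) -> Point:
        # Start from the point a fraction t along the straight line, then push it
        # sideways along the perpendicular (-dy, dx) by a random amount.
        k = rng.uniform(-jitter, jitter)
        return (x0 + t * dx - k * dy, y0 + t * dy + k * dx)

    (x1, y1), (x2, y2) = control(1 / 3), control(2 / 3)

    path: List[Point] = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        # Cubic Bezier: B(t) = u^3 P0 + 3u^2 t P1 + 3u t^2 P2 + t^3 P3
        x = u**3 * x0 + 3 * u * u * t * x1 + 3 * u * t * t * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u * u * t * y1 + 3 * u * t * t * y2 + t**3 * y3
        path.append((x, y))
    return path
```

The curve always begins at `start` and finishes at `end`; only the route between them varies from call to call. Passing a seeded `random.Random` makes a path reproducible for testing.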
Tier 3: OS-level input (stealth)
Sends kernel-level input events via ydotool. The page receives them like input from a physical mouse and keyboard, which lets them get past anti-bot detection such as Cloudflare Turnstile.
- browser_click_os: OS-level click at coordinates
- browser_type_os: OS-level keyboard typing
Requires ydotool with the ydotoold daemon running, headed browser mode, and the Hyprland window manager (used to read window geometry).
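The tier choice described in this section could be sketched as a small decision function. The tool names come from the lists above; the `Target` fields, the `os_input_available` flag, and the order of the checks are assumptions made for illustration, not the logic an agent is required to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Target:
    """What the agent knows about the element it wants to click (hypothetical)."""
    selector: Optional[str] = None
    # False for content a CSS selector cannot reach: iframes, shadow DOM,
    # canvas, or protected form fields.
    selector_reachable: bool = True
    # True when the page is known to sit behind bot protection.
    bot_protected: bool = False


def choose_click_tool(target: Target, os_input_available: bool = False) -> str:
    """Pick a click tool, preferring the lowest tier that fits the target."""
    # Tier 3 only makes sense when ydotool, headed mode, and Hyprland are present.
    if target.bot_protected and os_input_available:
        return "browser_click_os"
    # Tier 1: a usable CSS selector is the fastest path.
    if target.selector and target.selector_reachable:
        return "browser_click"
    # Tier 2: fall back to pixel coordinates.
    return "browser_click_at"
```

For instance, `choose_click_tool(Target(selector="#submit"))` selects `browser_click`, while a canvas target with no usable selector falls through to `browser_click_at`.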
Limitations
- Screenshot-driven rendering (not a native embedded webview)
- Console logs are captured, but not as a full live stream
- OS-level input (Tier 3) requires ydotool and headed mode
- Tier 3 input requires Hyprland for window geometry detection