GitHub - Operative-Sh/web-eval-agent: An MCP server that autonomously evaluates web applications.

🚀 operative.sh web-eval-agent MCP Server

Let the coding agent debug itself, you've got better things to do.

🔥 Supercharge Your Debugging

operative.sh's MCP Server launches a browser-use powered agent to autonomously execute and debug web apps directly in your code editor.

⚑ Features

  • 🌐 Navigate your webapp using BrowserUse (2x faster with operative backend)
  • πŸ“Š Capture network traffic - requests are intelligently filtered and returned into the context window
  • 🚨 Collect console errors - captures logs & errors
  • πŸ€– Autonomous debugging - the Cursor agent calls the web QA agent mcp server to test if the code it wrote works as epected end-to-end.

🧰 MCP Tool Reference

Tool Purpose
web_eval_agent πŸ€– Automated UX evaluator that drives the browser, captures screenshots, console & network logs, and returns a rich UX report.
setup_browser_state πŸ”’ Opens an interactive (non-headless) browser so you can sign in once; the saved cookies/local-storage are reused by subsequent web_eval_agent runs.

Key arguments

  • web_eval_agent

    • url (required) – address of the running app (e.g. http://localhost:3000)
    • task (required) – natural-language description of what to test ("run through the signup flow and note any UX issues")
    • headless_browser (optional, default false) – set to true to hide the browser window
  • setup_browser_state

    • url (optional) – page to open first (handy to land directly on a login screen)

You can trigger these tools straight from your IDE chat, for example:

Evaluate my app at http://localhost:3000 – run web_eval_agent with the task "Try the full signup flow and report UX issues".
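
The arguments listed above can be pictured as a small payload. The sketch below is illustrative only: `build_web_eval_args` is a hypothetical helper, not part of web-eval-agent, and the validation rules are assumptions. The argument names and the `headless_browser` default come from the list above.

```python
from urllib.parse import urlparse


def build_web_eval_args(url: str, task: str, headless_browser: bool = False) -> dict:
    """Assemble an argument dict for a web_eval_agent call (hypothetical helper)."""
    parsed = urlparse(url)
    # Assumption: the app is reachable over http(s), e.g. http://localhost:3000
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an http(s) address, got {url!r}")
    if not task.strip():
        raise ValueError("task must be a non-empty description")
    return {
        "url": url,
        "task": task.strip(),
        # Documented default is false, i.e. the browser window stays visible
        "headless_browser": headless_browser,
    }


args = build_web_eval_args(
    "http://localhost:3000",
    "Run through the signup flow and note any UX issues",
)
print(args)
```

In practice the IDE agent fills these arguments in from your chat prompt, so you rarely construct them by hand.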

🏁 Quick Start

Easy Setup with One-Click Integration

  1. Get your API key (free) - when you create your API key, you'll see:
    • "Add to Cursor" button with a deeplink for instant Cursor installation
    • Prefilled Claude Code command with your API key automatically included

Manual Setup (macOS/Linux)

  1. Pre-requisites (typically not needed):
    • brew: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    • npm: brew install npm
    • jq: brew install jq
  2. Run the installer after getting an API key (free):
     curl -LSf https://operative.sh/install.sh -o install.sh && bash install.sh && rm install.sh
  3. Open your favorite IDE and restart it to apply the changes
  4. Send a prompt in chat mode to call the web eval agent tool, e.g.:
     Test my app on http://localhost:3000. Use web-eval-agent.

🛠️ Manual Installation

  1. Get your API key at operative.sh/mcp
  2. Install uv:
     curl -LsSf https://astral.sh/uv/install.sh | sh
  3. Source your shell profile after installing uv:
     Mac: source ~/.zshrc
     Linux: source ~/.bashrc
  4. Install playwright:
     npm install -g chromium playwright && uvx --with playwright playwright install --with-deps
  5. Add the JSON below to your code editor's MCP configuration, with your API key
  6. Restart your code editor
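
Before restarting the editor, it can help to confirm that the command-line tools used in these steps are actually on your PATH. The following is a minimal sketch, not part of the project: the `REQUIRED_TOOLS` tuple and `missing_tools` helper are hypothetical names, and the tool list is inferred from the commands above (`uv`/`uvx` from the uv installer, `npm` for the playwright step).

```python
import shutil

# Assumption: these are the executables the installation steps rely on
REQUIRED_TOOLS = ("uv", "uvx", "npm")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools are on PATH.")
```

If `uv` is reported missing right after installing it, re-sourcing your shell profile (or opening a new terminal) is usually the first thing to try.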

🔃 Updating

  • Run uv cache clean
  • Refresh the MCP server using the following configuration entry (replace <YOUR_KEY> with your API key):
    "web-eval-agent": {
      "command": "uvx",
      "args": [
        "--refresh-package",
        "webEvalAgent",
        "--from",
        "git+https://github.com/Operative-Sh/web-eval-agent.git",
        "webEvalAgent"
      ],
      "env": {
        "OPERATIVE_API_KEY": "<YOUR_KEY>"
      }
    }
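
If you prefer to script this step, the entry above can be merged into an editor's MCP config file. This is a rough sketch under stated assumptions: it assumes the config is a JSON file with a top-level `mcpServers` object (as Cursor's `mcp.json` uses), and `add_server` and `web_eval_agent_entry` are hypothetical helper names. The demo writes to a temporary directory so it does not touch any real configuration.

```python
import json
import tempfile
from pathlib import Path


def web_eval_agent_entry(api_key: str) -> dict:
    """Build the server entry shown above, with the given API key."""
    return {
        "command": "uvx",
        "args": [
            "--refresh-package", "webEvalAgent",
            "--from", "git+https://github.com/Operative-Sh/web-eval-agent.git",
            "webEvalAgent",
        ],
        "env": {"OPERATIVE_API_KEY": api_key},
    }


def add_server(config_path: Path, api_key: str) -> dict:
    """Insert or replace the web-eval-agent entry, keeping other servers intact."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    # Assumption: servers live under a top-level "mcpServers" key
    config.setdefault("mcpServers", {})["web-eval-agent"] = web_eval_agent_entry(api_key)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file rather than a real editor config
with tempfile.TemporaryDirectory() as tmp:
    demo_config = add_server(Path(tmp) / "mcp.json", "<YOUR_KEY>")
    print(json.dumps(demo_config, indent=2))
```

Check your editor's documentation for the actual config location and schema before pointing `add_server` at a real file.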

🛠️ Manual Installation (Mac + Cursor/Cline/Windsurf)

  1. Get your API key at operative.sh/mcp
  2. Install uv:
     curl -LsSf https://astral.sh/uv/install.sh | sh
  3. Install playwright:
     npm install -g chromium playwright && uvx --with playwright playwright install --with-deps
  4. Add the web-eval-agent JSON entry from the Updating section to your code editor's MCP configuration, with your API key
  5. Restart your code editor

Manual Installation (Windows + Cursor/Cline/Windsurf)

We're still refining this; please open an issue if you run into any problems!

  1. Do all this in your code editor terminal
  2. curl -LSf https://operative.sh/install.sh -o install.sh && bash install.sh && rm install.sh
  3. Get your API key at operative.sh/mcp
  4. Install uv (curl -LsSf https://astral.sh/uv/install.sh | sh)
  5. uvx --from git+https://github.com/Operative-Sh/web-eval-agent.git playwright install
  6. Restart code editor

🚨 Issues

  • Updates aren't being received in code editors, update or reinstall for latest version: Run uv cache clean for latest
  • Any issues feel free to open an Issue on this repo or in the discord!
  • 5/5 - static apps without changes weren't screencasting, fixed! uv clean + restart to get fix

Changelog

  • 4/29 - Agent overlay update - pause/play/stop agent run in the browser

📋 Example MCP Server Output Report

📊 Web Evaluation Report for http://localhost:5173 complete!
📝 Task: Test the API-key deletion flow by navigating to the API Keys section, deleting a key, and judging the UX.

🔍 Agent Steps
  📍 1. Navigate → http://localhost:5173
  📍 2. Click     "Login"        (button index 2)
  📍 3. Click     "API Keys"     (button index 4)
  📍 4. Click     "Create Key"   (button index 9)
  📍 5. Type      "Test API Key" (input index 2)
  📍 6. Click     "Done"         (button index 3)
  📍 7. Click     "Delete"       (button index 10)
  📍 8. Click     "Delete"       (confirm index 3)
🏁 Flow tested successfully – UX felt smooth and intuitive.

🖥️ Console Logs (10)
  1. [debug] [vite] connecting…
  2. [debug] [vite] connected.
  3. [info]  Download the React DevTools …
     …

🌐 Network Requests (10)
  1. GET /src/pages/SleepingMasks.tsx                   304
  2. GET /src/pages/MCPRegistryRegistry.tsx             304
     …

⏱️ Chronological Timeline
  01:16:23.293 πŸ–₯️ Console [debug] [vite] connecting…
  01:16:23.303 πŸ–₯️ Console [debug] [vite] connected.
  01:16:23.312 ➑️ GET /src/pages/SleepingMasks.tsx
  01:16:23.318 ⬅️ 304 /src/pages/SleepingMasks.tsx
     …
  01:17:45.038 πŸ€– 🏁 Flow finished – deletion verified
  01:17:47.038 πŸ€– πŸ“‹ Conclusion repeated above
πŸ‘οΈ  See the "Operative Control Center" dashboard for live logs.
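
If you want to post-process a report like the one above, for instance to flag failed requests, the network section can be picked apart with a small parser. This is only a sketch: the line layout (index, method, path, status) is inferred from the sample above rather than from any documented format, and `parse_network_requests` is a hypothetical helper.

```python
import re

# Assumed layout, taken from the sample report: "  1. GET /path   304"
REQUEST_LINE = re.compile(
    r"^\s*\d+\.\s+(?P<method>[A-Z]+)\s+(?P<path>\S+)\s+(?P<status>\d{3})\s*$"
)


def parse_network_requests(report: str) -> list[dict]:
    """Extract method, path and status from lines that look like request rows."""
    rows = []
    for line in report.splitlines():
        match = REQUEST_LINE.match(line)
        if match:
            rows.append({
                "method": match["method"],
                "path": match["path"],
                "status": int(match["status"]),
            })
    return rows


sample = """\
🌐 Network Requests (10)
  1. GET /src/pages/SleepingMasks.tsx                   304
  2. GET /src/pages/MCPRegistryRegistry.tsx             304
     …
"""

requests = parse_network_requests(sample)
failures = [row for row in requests if row["status"] >= 400]
print(f"{len(requests)} requests parsed, {len(failures)} with status >= 400")
```

Console-log and agent-step lines do not match the pattern, so they are skipped rather than misread as requests.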


Built with <3 @ operative.sh