Chrome DevTools MCP: Now Your Agent Can Actually Run the Trace

Every time I've tried to do performance work with an AI agent, the session follows the same pattern. I ask it to improve LCP. It suggests maybe the images are large, maybe there's render-blocking JavaScript, maybe I should preload the font. All reasonable. None of it grounded in what's actually happening on my page.

Then I go to DevTools, run a trace, see the real bottleneck, describe it back to the agent in plain text. The agent fixes that thing. I run the trace again, describe the new state. Repeat until done.

The agent is useful, but the loop has a human in the middle. You are the relay between the agent and the browser. That's the gap Chrome DevTools MCP closes.

What the Agent Couldn't See Before

An AI coding agent has no native view into the browser. It can read your source files, suggest changes, reason about patterns. But it can't run your page and observe what happens. It can't see that your LCP element is a 2.4MB unoptimized image. It can't see that a third-party script is blocking the main thread for 800ms before your app even starts. It can only reason about those possibilities.

This isn't a model capability problem. It's an access problem. The data exists in Chrome DevTools, the same tooling you'd open yourself. The agent just had no path to it.

Chrome DevTools MCP is a Model Context Protocol server, released by the Chrome DevTools team in September 2025, that gives your agent a direct path to a live Chrome instance. It connects AI coding assistants like Claude to Chrome via the Chrome DevTools Protocol — the same protocol underlying DevTools itself.

The architecture is: AI agent → MCP tools → chrome-devtools-mcp server → Puppeteer → CDP → Chromium. The MCP layer abstracts CDP's low-level complexity into usable high-level tools.

The Performance Toolset

The MCP server exposes 29 tools across eight categories: input automation, navigation, emulation, performance, network, debugging, extensions, and memory. The performance-specific ones are what matter here:

performance_start_trace — starts a Chrome performance trace on a target URL
performance_stop_trace — ends the trace and collects the data
performance_analyze_insight — extracts actionable metrics from the trace: LCP, Total Blocking Time, TTFB, render delays
lighthouse_audit — runs a full Lighthouse audit and returns scores across performance, accessibility, best practices, and SEO

These map directly to what you'd do manually in the Performance panel. Start recording, interact with the page or wait for load, stop recording, read the flame chart. The agent does all of this in sequence, without you opening a single tab. The Lighthouse tool is a bonus: you get structured quality scores alongside the trace analysis.
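To make that sequence concrete, here is a minimal sketch of what the three calls could look like on the wire. MCP clients invoke tools with JSON-RPC 2.0 `tools/call` requests, and the tool names are the ones listed above. The empty `arguments` objects are placeholders I've assumed for illustration; the real tools may require parameters, and the authoritative schema is whatever the server reports through `tools/list`.

```python
import json
from itertools import count

# Tool names come from the list above. The empty argument dicts are
# placeholders (an assumption): check the server's tool schema for
# the parameters each tool actually expects.
TRACE_SEQUENCE = [
    ("performance_start_trace", {}),
    ("performance_stop_trace", {}),
    ("performance_analyze_insight", {}),
]


def build_tool_calls(sequence, first_id=1):
    """Return one JSON-RPC 2.0 `tools/call` request per (tool, arguments) pair."""
    ids = count(first_id)
    return [
        {
            "jsonrpc": "2.0",
            "id": next(ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
        for name, arguments in sequence
    ]


if __name__ == "__main__":
    # Print the requests in the order the agent would send them.
    for request in build_tool_calls(TRACE_SEQUENCE):
        print(json.dumps(request))
```

In practice your MCP client builds and sends these messages for you. The sketch only shows that "the agent runs a trace" reduces to three ordinary tool calls.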

The Part That Surprised Me

Chrome performance traces are enormous. A trace of a moderately complex page easily generates 20–30MB of JSON. The concern you'd have is reasonable: can an agent actually make sense of that?

The answer is that it doesn't receive the raw file. The MCP server processes the trace through Chrome's own PerformanceTraceFormatter and generates a compact summary. A 29.8MB trace from Substack came through as under 4KB of text: 48 lines with the metrics, breakdowns, and actionable insights pre-extracted. That's a compression ratio of more than 7,000:1.

So the agent isn't reading a giant JSON dump. It's receiving something closer to a structured diagnostic report: LCP is 4.2 seconds, the LCP element is this image, load delay accounts for 3.1 seconds of that, here are the contributing factors. The model reads that report and reasons about specific fixes.
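As a quick sanity check on the compression figure, the arithmetic can be sketched in a few lines. The only inputs are the 29.8MB trace size and the 4KB summary size quoted above. Because the summary was under 4KB, the result is a lower bound. Whether those sizes were measured in decimal or binary units is my assumption either way, so the sketch computes both.

```python
def compression_ratio(raw_bytes: float, summary_bytes: float) -> float:
    """Return how many times smaller the summary is than the raw trace."""
    if summary_bytes <= 0:
        raise ValueError("summary_bytes must be positive")
    return raw_bytes / summary_bytes


# Decimal units: 1MB = 1,000,000 bytes, 1KB = 1,000 bytes.
decimal_ratio = compression_ratio(29_800_000, 4_000)

# Binary units: 1MB = 1024 * 1024 bytes, 1KB = 1024 bytes.
binary_ratio = compression_ratio(29.8 * 1024 * 1024, 4 * 1024)

if __name__ == "__main__":
    print(f"decimal units: at least {decimal_ratio:,.0f}:1")
    print(f"binary units:  at least {binary_ratio:,.0f}:1")
```

Both conventions land in the mid-7,000s, so the roughly 7,000:1 figure holds regardless of which one the measurement used.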

A Real Workflow

Here's what the loop looks like in practice, with and without Chrome DevTools MCP.

Before: Agent writes code with no browser access.

You:  My page has a high LCP. Can you help?
Agent: Possible causes: large images, render-blocking CSS,
       no resource hints, slow TTFB...
You:  [opens DevTools, runs trace, reads flame chart]
You:  It's a 2.4MB hero image. LCP is 4.2s.
Agent: Add fetchpriority="high", convert to WebP, add width/height.
You:  [makes changes, runs trace again]
You:  LCP is now 1.9s. Still slow. What else?
Agent: [guesses again]

After: Agent runs the trace directly.

You:  Check the LCP of my local dev server at localhost:3000
Agent: [runs performance_start_trace on localhost:3000]
       [runs performance_stop_trace]
       [runs performance_analyze_insight]
       LCP: 4.2s. LCP element: <img src="/hero.jpg"> (2.4MB, no fetchpriority).
       Load delay: 3.1s. Render delay: 0.4s.
       Fixing: converting to WebP, adding fetchpriority="high",
       explicit width/height to prevent layout shift.
       [makes the changes]
       Re-checking...
       LCP: 1.1s. Improvement: 74%.

The agent ran the trace, read the data, made the changes, and re-profiled to confirm. You reviewed the diff. That's it.

The DebugBear team documented a similar result: a simple HTML page had an LCP of 1,383ms under Slow 4G throttling. The agent identified a synchronously-loaded jQuery script, deferred it, and brought LCP to 628ms — a 55% drop. One trace, one fix, one confirmation trace.
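The percentages in both examples are plain relative reductions, which is worth spelling out if you plan to track these numbers yourself. A minimal sketch, where `percent_improvement` is a helper name I've made up for illustration:

```python
def percent_improvement(before_ms: float, after_ms: float) -> float:
    """Return the relative reduction from before_ms to after_ms, in percent."""
    if before_ms <= 0:
        raise ValueError("before_ms must be positive")
    return (before_ms - after_ms) / before_ms * 100


# LCP values from the localhost workflow above: 4.2s down to 1.1s.
workflow_gain = percent_improvement(4200, 1100)

# LCP values from the DebugBear example: 1,383ms down to 628ms.
debugbear_gain = percent_improvement(1383, 628)

if __name__ == "__main__":
    print(f"workflow:  {workflow_gain:.1f}%")
    print(f"DebugBear: {debugbear_gain:.1f}%")
```

Rounded to whole numbers, these come out to the 74% and 55% quoted above.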

Setup

Add this to your MCP client config, wherever your editor or Claude Code reads MCP server definitions:

{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["-y", "chrome-devtools-mcp@latest"]
    }
  }
}

If token budget is a concern — the 29 tools consume roughly 18K tokens upfront when loaded — pass --slim to reduce context overhead:

{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["-y", "chrome-devtools-mcp@latest", "--slim"]
    }
  }
}
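If you already have other MCP servers configured, you'll want to merge this entry in rather than overwrite the file. Here is a minimal sketch of that merge, assuming your client uses the same top-level `mcpServers` object shown above. The `other-server` entry is a made-up stand-in for whatever you already have, and reading and writing the actual config file is left out because its location varies by client.

```python
import json


def add_chrome_devtools_server(config: dict, slim: bool = False) -> dict:
    """Return a copy of an MCP client config with the chrome-devtools entry added."""
    args = ["-y", "chrome-devtools-mcp@latest"]
    if slim:
        args.append("--slim")
    # Copy the existing server map so the caller's config is not mutated.
    servers = dict(config.get("mcpServers", {}))
    servers["chrome-devtools"] = {"command": "npx", "args": args}
    return {**config, "mcpServers": servers}


if __name__ == "__main__":
    # Hypothetical existing config with one unrelated server.
    existing = {
        "mcpServers": {
            "other-server": {"command": "node", "args": ["server.js"]}
        }
    }
    print(json.dumps(add_chrome_devtools_server(existing, slim=True), indent=2))
```

The function returns a new dictionary, so you can diff it against the original before writing anything to disk.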

Restart your client. Then try: "Check the performance of localhost:3000". The agent will launch Chrome, visit the URL, run a trace, and return a structured analysis.

It works with Claude Code, Claude Desktop, Cursor, Cline, VS Code Copilot, Windsurf, and Gemini CLI — anything that supports MCP servers.

What It Doesn't Replace

A few honest caveats.

Authenticated pages are tricky. When the MCP server launches Chrome, it spawns a sandboxed browser instance — no extensions, no existing cookies, no saved sessions. If the page you're profiling requires login, the agent can't access it without additional setup. This is the biggest practical limitation for most real-world apps.

Raw flame charts still need you. The performance summary is useful, but it is a condensed view that leaves detail out. For deep analysis, such as tracing a specific React component's render time or understanding a complex scheduling issue, you'll still want to open the Performance panel yourself and read the raw data.

Architectural issues need your context. The agent can't fix problems that require context it doesn't have. If your LCP is slow because of a server-side rendering decision made three months ago, the trace will surface the symptom but the agent will need your help understanding the cause.

Local dev ≠ production. Throttling helps, but your local environment is missing CDN caching, real user network conditions, and server latency. Always verify improvements against production metrics, not just localhost traces.

Watch what goes into the context window. The agent sees DOM content, network payloads, and console logs from the page it's analyzing. Don't point it at pages with PII, session tokens, or sensitive admin data — that content is passed to your model provider.

One thing worth knowing: as of Chrome M144 (April 2026), there's an auto-connect feature that lets the agent "teleport" into your existing browser window and work side-by-side with you, rather than always opening a new isolated instance. That partially addresses the authentication problem — if you're already logged in, the agent can inherit your session.

The improvement here isn't that AI got better at performance optimization. It's that the feedback loop changed. Before, the loop required you to run the measurement, interpret the data, and hand it back to the agent as prose. That's a lossy translation step. You describe what you saw; the agent works from your description.

Now the agent reads the same data you'd read. It works from the trace, not from your summary of the trace. That's the difference between an agent that can optimize and one that can optimize predictably.

The closed loop is the feature.