OpenClaw Browser Deep Dive: Giving AI Agents Eyes and Hands

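1. In One Sentence

OpenClaw ships with a complete built-in browser automation system that lets an AI Agent open web pages, read content, click buttons, fill in forms, and take screenshots for verification, much as a person would. Underneath it is Playwright plus the Chrome DevTools Protocol (CDP); on top, it is wrapped as tools the Agent can call directly.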

2. Architecture Overview


┌─────────────────────────────────────┐
│           AI Agent (LLM)            │
│ "Open GitHub and show the PR list"  │
└──────────────┬──────────────────────┘
               │ tool call
               ▼
┌─────────────────────────────────────┐
│   browser-tool.ts (780+ lines)      │
│   Parse params → route to action    │
│   13 actions: open/snapshot/act...  │
└──────────────┬──────────────────────┘
               │ HTTP
               ▼
┌─────────────────────────────────────┐
│   Browser Control Server            │
│   server.ts + server-context.ts     │
│   Port 18791                        │
│   Manages profiles, tabs, sessions  │
└──────────────┬──────────────────────┘
               │
       ┌───────┴────────┐
       ▼                ▼
┌──────────────┐  ┌──────────────────┐
│  Playwright  │  │  Chrome MCP      │
│  pw-*.ts     │  │  chrome-mcp.ts   │
│  (default    │  │ (existing-session│
│   driver)    │  │  driver)         │
└──────┬───────┘  └────────┬─────────┘
       │                   │
       ▼                   ▼
┌─────────────────────────────────────┐
│        Chrome / Chromium            │
│   CDP ports: 18800-18899            │
│   One port per profile              │
└─────────────────────────────────────┘

Source Size

Directory	Files	Total lines	Notes
`src/browser/`	130+	22,183	Browser core
`src/agents/tools/browser-tool.ts`	1	780+	Agent tool interface
`src/agents/tools/browser-tool.actions.ts`	1	-	Action handling
Total	~135	~23,000	One of OpenClaw's largest subsystems

3. Dual-Driver Architecture: Playwright vs Chrome MCP

OpenClaw supports two browser drivers, each aimed at a different scenario:

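3.1 Playwright Driver (default)


OpenClaw launches its own Chrome → Playwright controls it over CDP

OpenClaw launches a separate headless Chrome instance
It has its own user-data directory (~/.openclaw/browser/openclaw/user-data)
Clean environment with no logged-in sessions
Best for: web scraping, automated testing, screenshot verification

Launch flags (from chrome.ts):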


--headless=new
--no-sandbox
--disable-gpu
--disable-dev-shm-usage
--remote-debugging-port=18800
--ozone-platform=headless
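
The flag list above can be assembled per profile. The following is a minimal Python sketch, not the code in chrome.ts (which is TypeScript): the function name `build_chrome_args` is hypothetical, and adding `--user-data-dir` is an assumption based on the per-profile user-data directory described above.

```python
def build_chrome_args(cdp_port: int, user_data_dir: str) -> list[str]:
    """Assemble the headless Chrome flags listed above for one profile."""
    # The article places CDP ports in the 18800-18899 pool.
    if not 18800 <= cdp_port <= 18899:
        raise ValueError(f"CDP port {cdp_port} is outside the 18800-18899 pool")
    return [
        "--headless=new",
        "--no-sandbox",
        "--disable-gpu",
        "--disable-dev-shm-usage",
        f"--remote-debugging-port={cdp_port}",
        "--ozone-platform=headless",
        # Assumption: not in the flag list above, added for profile isolation.
        f"--user-data-dir={user_data_dir}",
    ]


if __name__ == "__main__":
    print(" ".join(build_chrome_args(18800, "/tmp/openclaw-demo/user-data")))
```

Only the port and the user-data directory vary between profiles in this sketch; the remaining flags are copied verbatim from the list.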

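3.2 Chrome MCP Driver (existing-session)


User's already-open Chrome → Browser Relay extension → Chrome MCP → OpenClaw

Does not launch a new browser; takes over tabs in the user's existing Chrome
Requires the OpenClaw Browser Relay browser extension
Can reuse the user's logged-in sessions (GitHub, Gmail, admin consoles, and so on)
Implemented via the chrome-devtools-mcp npm package

Launch command found in the source: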


npx -y chrome-devtools-mcp@latest \
  --autoConnect \
  --experimentalStructuredContent \
  --experimental-page-id-routing

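3.3 Comparison

Dimension	Playwright (openclaw)	Chrome MCP (existing-session)
Startup	Launched automatically by OpenClaw	User clicks the extension manually
Login state	None (clean environment)	Present (the user's sessions)
Memory footprint	~350MB resident	No extra browser process (reuses the existing browser)
Use cases	Scraping, screenshots, automation	Operating on sites the user is logged in to
Works on servers	✅	❌ (needs a desktop environment)
Stability	High (fully controlled)	Medium (depends on the state of the user's browser)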

4. The 13 Actions in Detail

The complete action list, extracted from the browser-tool.ts source:

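4.1 Page Navigation

Action	Function	Typical use
`open`	Open a new tab and navigate to a URL	Visiting a new site
`navigate`	Navigate the current tab to a new URL	Jumping within the current tab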

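4.2 Page Perception (the Agent's "eyes")

Action	Function	Return format	Token cost
`snapshot`	Get the page's accessibility tree	Plain text	Low
`screenshot`	Capture a page screenshot	PNG image	High (needs a vision model)
`console`	Get browser console logs	Text	Low
`pdf`	Export the page as PDF	File	-

Snapshot: the core innovation

Snapshot is the smartest design decision in OpenClaw Browser. Instead of taking a screenshot, it returns the page's accessibility tree, a structured text representation: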


- heading "Google" [level=1]
- search textbox "Search"              [ref=e12]
- button "Google Search"               [ref=e13]
- button "I'm Feeling Lucky"           [ref=e14]
- navigation "More options"
  - link "Gmail"                       [ref=e15]
  - link "Images"                      [ref=e16]

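Why is this better than a screenshot?

1. Fewer tokens: a few hundred characters of plain text versus tens of thousands of tokens for an image

2. Actionable: every interactive element carries a ref, so the Agent can click it directly with act(kind=click, ref=e12)

3. No vision model required: a text-only model can understand it

4. Faster: no image rendering or transfer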
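
To make the "actionable" point concrete, here is a small Python sketch that indexes the refs in the example snapshot above. It is an illustration only: the helper name `index_refs` and the regular expressions are assumptions about the text layout shown in this article, not OpenClaw's own parser.

```python
import re

# The example snapshot from above, copied verbatim.
SNAPSHOT = '''\
- heading "Google" [level=1]
- search textbox "Search"              [ref=e12]
- button "Google Search"               [ref=e13]
- button "I'm Feeling Lucky"           [ref=e14]
- navigation "More options"
  - link "Gmail"                       [ref=e15]
  - link "Images"                      [ref=e16]
'''

# Assumed line layout: optional indent, "- ", role words, then a quoted name.
NODE_RE = re.compile(r'^\s*-\s+(?P<role>.+?)\s+"(?P<name>[^"]*)"')
REF_RE = re.compile(r'\[ref=(?P<ref>\w+)\]')


def index_refs(snapshot: str) -> dict[str, tuple[str, str]]:
    """Map each ref to its (role, name); nodes without a ref are skipped."""
    refs: dict[str, tuple[str, str]] = {}
    for line in snapshot.splitlines():
        node = NODE_RE.match(line)
        ref = REF_RE.search(line)
        if node and ref:
            refs[ref.group("ref")] = (node.group("role"), node.group("name"))
    return refs


if __name__ == "__main__":
    for ref, (role, name) in index_refs(SNAPSHOT).items():
        print(ref, role, name)
```

With such an index, a text-only model (or a test harness) can look up a target such as the "Google Search" button and pass its ref to act.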

Three classes of ARIA roles (from snapshot-roles.ts):

Class	Roles	Handling
Interactive	button, checkbox, link, textbox, slider...	Always assigned a ref
Content	heading, article, cell, listitem...	Assigned a ref when named
Structural	group, list, table, grid, menu...	Skipped in compact mode
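
The handling rules in the table can be sketched as two small predicates. This is a hedged Python illustration, not the logic in snapshot-roles.ts: the role sets contain only the examples listed above (the real lists are longer), the function names are hypothetical, and returning no ref for structural or unknown roles is an assumption.

```python
# Only the example roles from the table; the source lists are longer.
INTERACTIVE = {"button", "checkbox", "link", "textbox", "slider"}
CONTENT = {"heading", "article", "cell", "listitem"}
STRUCTURAL = {"group", "list", "table", "grid", "menu"}


def should_assign_ref(role: str, name: str) -> bool:
    """Interactive roles always get a ref; content roles only when named."""
    if role in INTERACTIVE:
        return True
    if role in CONTENT:
        return bool(name)
    return False  # assumption: structural and unknown roles get no ref


def keep_in_snapshot(role: str, compact: bool) -> bool:
    """Structural roles are dropped when compact mode is on."""
    return not (compact and role in STRUCTURAL)


if __name__ == "__main__":
    print(should_assign_ref("button", ""))
    print(should_assign_ref("heading", "Google"))
    print(keep_in_snapshot("group", compact=True))
```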

Two ref modes:

refs="role" (default): generated from role + name, e.g. button "Submit"
refs="aria": Playwright's native aria-ref IDs, stable across calls

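4.3 Page Interaction (the Agent's "hands")

All interactions go through the act action; the kind parameter selects the operation:

kind	Function	Parameters	Example
`click`	Click an element	ref, doubleClick, button	Clicking a button
`type`	Type character by character	ref, text, slowly	Typing into a search box
`fill`	Fill a value directly	ref, text	Form fields
`press`	Press a key	key, modifiers	Enter, Ctrl+A
`hover`	Hover over an element	ref	Opening a dropdown menu
`select`	Choose dropdown options	ref, values	`<select>` elements
`drag`	Drag and drop	startRef, endRef	Drag-and-drop operations
`resize`	Resize the window	width, height	Responsive testing
`wait`	Wait for a condition	text, textGone, timeMs	Waiting for a page to finish loading
`evaluate`	Run JavaScript	fn	Custom logic
`close`	Close the tab	-	Cleanup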
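
A caller could check an act request against this table before sending it. The Python sketch below is illustrative only: the table lists parameter names but does not say which are mandatory, so the required/optional split here (treating doubleClick, button, slowly, and modifiers as optional, and requiring at least one wait condition) is an assumption, and `missing_params` is a hypothetical helper.

```python
# Assumed required parameters per kind, derived from the table above.
REQUIRED = {
    "click": ("ref",),
    "type": ("ref", "text"),
    "fill": ("ref", "text"),
    "press": ("key",),
    "hover": ("ref",),
    "select": ("ref", "values"),
    "drag": ("startRef", "endRef"),
    "resize": ("width", "height"),
    "evaluate": ("fn",),
    "close": (),
}
# Assumption: wait needs at least one of these conditions.
WAIT_CONDITIONS = ("text", "textGone", "timeMs")


def missing_params(kind: str, params: dict) -> list[str]:
    """Return the names of parameters the request still lacks."""
    if kind == "wait":
        if any(key in params for key in WAIT_CONDITIONS):
            return []
        return ["text|textGone|timeMs"]
    if kind not in REQUIRED:
        raise ValueError(f"unknown act kind: {kind}")
    return [name for name in REQUIRED[kind] if name not in params]


if __name__ == "__main__":
    print(missing_params("fill", {"ref": "e5"}))
    print(missing_params("click", {"ref": "e12"}))
```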

4.4 Tab Management

Action	Function
`tabs`	List all tabs
`focus`	Switch to a given tab
`close`	Close a tab

4.5 Other

Action	Function
`upload`	Upload a file to an `<input type="file">` element
`dialog`	Handle alert/confirm/prompt dialogs
`status`	Show browser status
`start/stop`	Start or stop the browser
`profiles`	Manage browser profiles

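5. The Profile System

5.1 Design

Each profile is an isolated browser environment:

Dedicated CDP port: drawn from the 18800-18899 range, so at most 100 profiles
Dedicated user-data directory: cookies, cache, and login state are isolated
Dedicated color tag: distinguishes profiles in the UI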


Port allocation (from profiles.ts):
18789  → Gateway WebSocket
18790  → Bridge
18791  → Browser Control Server
18792  → (reserved)
18793  → Canvas
18794-18799 → reserved
18800-18899 → CDP profile port pool
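
Allocating from the CDP pool can be sketched as a first-free scan. This is a minimal Python illustration under stated assumptions: the function name `allocate_cdp_port` is hypothetical, and picking the lowest free port is one plausible strategy, not necessarily what profiles.ts does.

```python
CDP_PORT_START = 18800
CDP_PORT_END = 18899  # inclusive, so the pool holds 100 ports


def allocate_cdp_port(used: set[int]) -> int:
    """Return the lowest port in the pool that no profile is using."""
    for port in range(CDP_PORT_START, CDP_PORT_END + 1):
        if port not in used:
            return port
    raise RuntimeError("CDP port pool exhausted (100 profiles in use)")


if __name__ == "__main__":
    print(allocate_cdp_port(set()))
    print(allocate_cdp_port({18800, 18801}))
```

The 100-profile ceiling follows directly from the pool size: once every port from 18800 to 18899 is taken, no further profile can be given a CDP port.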

5.2 Default Profiles

Profile	Driver	Notes
`openclaw`	Playwright	Self-managed headless Chrome
`chrome`	existing-session	Takes over the user's Chrome (extension required)

5.3 Custom Profiles


browser:
  profiles:
    work:
      cdpPort: 18801
      userDataDir: "~/.openclaw/browser/work/user-data"
    personal:
      cdpPort: 18802
      cdpUrl: "ws://localhost:9222"  # connect to an external Chrome
      attachOnly: true

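6. Security Mechanisms

6.1 SSRF Protection

Because the browser can reach the network, it carries an SSRF (server-side request forgery) risk. OpenClaw layers several defenses: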


Agent requests navigate("http://169.254.169.254/metadata")
           │
           ▼
  Navigation Guard (navigation-guard.ts)
  Checks that the URL is valid
           │
           ▼
  SSRF Policy (request-policy.ts)
  Blocks private-network IPs, metadata services, and the like
           │
           ▼
  Actual navigation (or rejection)
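
The two checks in the flow above can be approximated with Python's standard library. This is a sketch under assumptions, not the contents of navigation-guard.ts or request-policy.ts: the function name `is_blocked_target` is hypothetical, restricting schemes to http/https is an assumed guard rule, and hostnames are not resolved here (a production guard would also have to check the resolved addresses).

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_target(url: str) -> bool:
    """Return True when navigation to this URL should be refused."""
    parts = urlsplit(url)
    # Guard stage (assumed rule): only plain web URLs are navigable.
    if parts.scheme not in ("http", "https"):
        return True
    host = parts.hostname
    if host is None or host == "localhost":
        return True
    # Policy stage: refuse literal private, loopback, and link-local IPs.
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A hostname, not an IP literal; this sketch does not resolve DNS.
        return False
    return addr.is_private or addr.is_loopback or addr.is_link_local


if __name__ == "__main__":
    print(is_blocked_target("http://169.254.169.254/metadata"))
    print(is_blocked_target("https://example.com/"))
```

The metadata address from the diagram, 169.254.169.254, falls in the link-local range, which is why it is refused.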

6.2 The Evaluate Switch

JavaScript execution is a double-edged sword. The browser.evaluateEnabled setting controls whether it is allowed:


browser:
  evaluateEnabled: false  # forbid the Agent from running arbitrary JS
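
How such a switch might gate requests can be sketched in a few lines of Python. The function `check_action_allowed` is hypothetical, the config is passed as an already-parsed dict, and defaulting to enabled when the key is absent is an assumption (it mirrors the value shown in the configuration reference, not a confirmed default).

```python
def check_action_allowed(kind: str, config: dict) -> None:
    """Raise PermissionError when evaluate is requested but switched off."""
    browser_cfg = config.get("browser", {})
    # Assumption: evaluate is allowed unless explicitly disabled.
    enabled = browser_cfg.get("evaluateEnabled", True)
    if kind == "evaluate" and not enabled:
        raise PermissionError("evaluate is disabled by browser.evaluateEnabled")


if __name__ == "__main__":
    cfg = {"browser": {"evaluateEnabled": False}}
    check_action_allowed("click", cfg)  # other kinds are unaffected
    try:
        check_action_allowed("evaluate", cfg)
    except PermissionError as err:
        print(err)
```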

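6.3 Upload Restrictions

File uploads are confined to the /tmp/openclaw/uploads/ directory, and symlinks are not allowed:


upload → proxy-files.ts → saveMediaBuffer()
  └→ only specific directories are allowed
  └→ symlinks are rejected (prevents path traversal)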
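
The two rules above (stay inside the upload directory, reject symlinks) can be sketched with `os.path`. This is a Python illustration for POSIX paths, not the code in proxy-files.ts: the helper name `is_allowed_upload` is hypothetical, and resolving paths with `realpath` before comparing is one common way to defeat `..` traversal.

```python
import os

UPLOAD_ROOT = "/tmp/openclaw/uploads"


def is_allowed_upload(path: str, root: str = UPLOAD_ROOT) -> bool:
    """Accept only non-symlink paths that resolve to a location under root."""
    # Rule 1: the path itself must not be a symlink.
    if os.path.islink(path):
        return False
    # Rule 2: after resolving "..", and symlinked parents, stay inside root.
    real_root = os.path.realpath(root)
    real_path = os.path.realpath(path)
    if real_path == real_root:
        return False  # the directory itself is not an uploadable file
    return os.path.commonpath([real_root, real_path]) == real_root


if __name__ == "__main__":
    print(is_allowed_upload("/tmp/openclaw/uploads/report.pdf"))
    print(is_allowed_upload("/tmp/openclaw/uploads/../../etc/passwd"))
```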

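7. Memory Footprint Analysis

7.1 Chrome Process Tree


Chrome main process (chrome)         → ~80MB
├── GPU process                      → ~20MB
├── Network service (NetworkService) → ~42MB
├── Crashpad handler                 → ~5MB
└── Renderer process (per tab)       → ~186MB
                                     ────────
                                Total  ~350MB (single tab)

Each additional tab adds roughly 100-200MB of renderer memory.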
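
The figures above support a rough back-of-the-envelope estimate. The Python sketch below simply adds them up; the function name `estimate_memory_mb` is hypothetical, and the result is only as good as the approximate per-process numbers. Note that the listed components sum to 333MB for a single tab, which the article rounds up to ~350MB.

```python
# Per-process figures from the process tree above, in MB.
BASE_MB = 80 + 20 + 42 + 5      # main + GPU + network service + Crashpad
FIRST_TAB_MB = 186              # renderer for the first tab
EXTRA_TAB_LOW_MB = 100          # each additional tab, lower bound
EXTRA_TAB_HIGH_MB = 200         # each additional tab, upper bound


def estimate_memory_mb(tabs: int) -> tuple[int, int]:
    """Return a (low, high) estimate in MB for a browser with `tabs` tabs."""
    if tabs < 1:
        raise ValueError("a running browser has at least one tab")
    extra = tabs - 1
    fixed = BASE_MB + FIRST_TAB_MB
    return fixed + extra * EXTRA_TAB_LOW_MB, fixed + extra * EXTRA_TAB_HIGH_MB


if __name__ == "__main__":
    print(estimate_memory_mb(1))
    print(estimate_memory_mb(3))
```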

7.2 How Much Disabling the Browser Saves


browser:
  enabled: false

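This frees ~350MB immediately. The browser is OpenClaw's largest single memory consumer (not counting the gateway itself).

7.3 Reducing Memory Without Disabling the Browser

Use snapshot instead of screenshot (less image-processing memory)
Close tabs promptly once they are no longer needed
Avoid running several profiles at the same time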

8. Practical Usage Examples

8.1 Basic Web Scraping


1. browser(action=open, url="https://example.com")
2. browser(action=snapshot)  → returns the page structure
3. Extract the information from the snapshot

8.2 Acting While Logged In (Chrome Relay)


1. The user opens GitHub in their own Chrome
2. The user clicks the Browser Relay extension icon
3. browser(action=snapshot, profile="chrome")
4. browser(action=act, kind=click, ref=e12, profile="chrome")

8.3 Filling In a Form


1. browser(action=open, url="https://form.example.com")
2. browser(action=snapshot)  → find the refs of the form fields
3. browser(action=act, kind=fill, ref=e5, text="Zhang San")
4. browser(action=act, kind=fill, ref=e6, text="user@example.com")
5. browser(action=act, kind=click, ref=e10)  → the submit button

8.4 Verifying a Deployment with Screenshots


1. browser(action=open, url="https://my-site.com")
2. browser(action=screenshot, fullPage=true)
3. Analyze the screenshot to check that the page renders correctly

8.5 Getting Past Cloudflare

When web_fetch is blocked by Cloudflare:


1. browser(action=open, url="https://protected-site.com")
2. (Chrome handles the JS challenge automatically)
3. browser(action=snapshot)  → content is retrieved normally

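9. Comparison with Alternatives

Dimension	OpenClaw Browser	Browser Use	Playwright MCP	Stagehand
Integration	Built into the framework	Standalone library	MCP server	Standalone library
Snapshot	✅ accessibility tree	❌ mostly screenshots	✅	✅
Profile management	✅ multiple profiles	❌	❌	❌
Chrome Relay	✅ takes over an existing browser	❌	✅	❌
SSRF protection	✅ built in	❌	❌	❌
Memory management	✅ can be disabled entirely	None	None	None
Code size	~23K lines	~15K lines	~5K lines	~8K lines

OpenClaw's advantage is deep integration: the browser is not an external tool but part of the framework. Security policy, profile management, and memory control are all native.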

10. Configuration Reference

Full configuration options (extracted from the source)


browser:
  # Master switch
  enabled: true

  # Path to the Chrome executable (usually auto-detected)
  executablePath: "/opt/google/chrome/chrome"

  # Force headless mode (must be true on servers)
  headless: true

  # Disable the Chrome sandbox (needed inside containers)
  noSandbox: true

  # CDP connection URL (for connecting to a remote browser)
  cdpUrl: "ws://remote-host:9222"

  # Attach only, never launch (used together with cdpUrl)
  attachOnly: false

  # Start of the CDP port range
  cdpPortRangeStart: 18800

  # Default profile
  defaultProfile: "openclaw"

  # Allow the Agent to run evaluate (arbitrary JS)
  evaluateEnabled: true

  # SSRF protection policy
  ssrfPolicy: "block-private"

  # Snapshot defaults
  snapshotDefaults:
    mode: "efficient"

  # Custom profiles
  profiles:
    my-profile:
      cdpPort: 18801
      cdpUrl: null
      userDataDir: null
      driver: "openclaw"  # or "existing-session"
      attachOnly: false
      color: "#4a9eff"
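
A configuration like the one above can be sanity-checked before the browser starts. The Python sketch below is an assumption-laden illustration rather than OpenClaw's own validation: `validate_profiles` is a hypothetical helper, it takes the `browser` section as an already-parsed dict, and the rules it enforces (ports inside a 100-port pool, no duplicate ports, driver limited to the two values shown) are inferred from this article.

```python
KNOWN_DRIVERS = ("openclaw", "existing-session")
POOL_SIZE = 100  # the article describes a 100-port pool (18800-18899)


def validate_profiles(browser_cfg: dict) -> list[str]:
    """Return human-readable problems found in browser.profiles."""
    errors: list[str] = []
    start = browser_cfg.get("cdpPortRangeStart", 18800)
    end = start + POOL_SIZE - 1
    seen: dict[int, str] = {}
    for name, profile in browser_cfg.get("profiles", {}).items():
        port = profile.get("cdpPort")
        if port is not None:
            if not start <= port <= end:
                errors.append(f"{name}: cdpPort {port} outside {start}-{end}")
            elif port in seen:
                errors.append(f"{name}: cdpPort {port} already used by {seen[port]}")
            else:
                seen[port] = name
        driver = profile.get("driver", "openclaw")
        if driver not in KNOWN_DRIVERS:
            errors.append(f"{name}: unknown driver {driver!r}")
    return errors


if __name__ == "__main__":
    cfg = {"profiles": {"my-profile": {"cdpPort": 18801, "driver": "openclaw"}}}
    print(validate_profiles(cfg))
```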

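11. Summary

OpenClaw Browser is one of the most complete browser automation solutions among current AI Agent frameworks:

Dual drivers: Playwright (self-managed) + Chrome MCP (takes over an existing browser)
Snapshot first: replaces screenshots with the accessibility tree, cutting token use roughly 10x
Built-in security: SSRF protection, the evaluate switch, upload directory restrictions
Profile isolation: up to 100 independent browser environments
Can be disabled entirely: one config line saves ~350MB of memory

It takes AI Agents from "can only read text" to "can see web pages and operate interfaces", the equivalent of giving the Agent eyes and hands.

Based on an analysis of the OpenClaw source (2026-03-26); the browser subsystem is ~23,000 lines of TypeScript

Source: github.com/openclaw/openclaw, under src/browser/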
