agent-browser

Name: agent-browser
Author: skillssh

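Safe ⚙️ External commands 🌐 Network access 📁 Filesystem access

Automate web browsing with AI agents

Also available from: inference-sh-8, inference-shell, inference-sh-skills, inf-sh, inference-sh-0, inference-sh-9, inferencesh, inferen-sh, inference-skills, vercel-labs, qu-skills, infsh-skills, toolshell, tul-sh, supercent-io

AI agents need to interact with websites but lack browser capabilities. This skill provides headless browser automation through inference.sh, letting Claude, Codex, and Claude Code navigate pages, fill in forms, take screenshots, and record sessions.

Supports: Claude, Codex, Claude Code (CC)

🥉 72 Bronze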

Download the skill ZIP

Upload it in Claude

Go to Settings → Capabilities → Skills → Upload skill

Turn it on and start using it

Test it

Using "agent-browser". Open https://example.com and identify the login form elements

Expected result:

Page loaded successfully. Found 3 interactive elements:
@e1 [input type='text'] placeholder='Username'
@e2 [input type='password'] placeholder='Password'
@e3 [button] 'Sign In'

Using "agent-browser". Fill in and submit the login form with test credentials

Expected result:

Form submitted. Page redirected to the dashboard.
@e1 [h1] 'Welcome, Test User'
@e2 [nav] 'Dashboard | Settings | Logout'
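Screenshot captured.

Using "agent-browser". Take a screenshot of the dashboard

Expected result:

Screenshot saved to dashboard-20240101.png
Page title: Dashboard | Size: 1280x720
Dashboard contains: navigation menu, user profile card, data table, action buttons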

Security audit

Safe

v1 • 4/22/2026

All static findings are false positives. The skill uses the inference.sh CLI (infsh) to control a headless browser via documented command invocations. External command detections are hardcoded API calls to a legitimate service. Network detections are target URLs for browsing, not exfiltration. Filesystem detections are documentation navigation (../) and standard device paths. Password/crypto detections are documentation showing credential input handling, not cryptography.

Files scanned

2,313

Lines analyzed

Findings

Total audits

Risk factors

⚙️ External commands (4)

SKILL.md:21-22 references/authentication.md:24-26 references/commands.md:10-11 templates/authenticated-session.sh:40-43

🌐 Network access (4)

SKILL.md:9 SKILL.md:37 references/authentication.md:25 references/commands.md:25

📁 Filesystem access (2)

SKILL.md:195-200 references/authentication.md:5

Audited by: claude

Quality score

Architecture

100

Maintainability

Content

Community

100

Security

Spec compliance

What you can build

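Research and data extraction

AI agents browse websites to gather information, extract structured data, and compile research reports without manual browsing.

Automated form submission

AI agents fill in and submit web forms for tasks such as booking appointments, registering accounts, or completing bulk data entry.

Browser-based testing

QA engineers use AI agents to browse sites, take screenshots, and record test sessions to verify UI functionality.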

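Try these prompts

Basic page navigation

Use the agent-browser skill to open https://example.com and show all clickable elements on the page.

Form-filling workflow

Open the contact form at https://example.com/contact. Enter 'John Doe' in the name field and 'john@example.com' in the email field, then submit the form. Take a screenshot of the result.

Authenticated session with data extraction

Log in to https://app.example.com using credentials from environment variables. Navigate to the dashboard, extract all table data, and save a screenshot of the final page.

Multi-page research session

Record video while browsing example.com/products. Click through 5 products, fill in the inquiry form for the last one, then close the session to save the recording.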

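Best practices

Always take a new snapshot after navigation or DOM changes; element references become invalid once the page reloads
Store credentials in environment variables; never hardcode passwords in scripts
Close the session when you are done; the video recording is only retrieved when close is called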

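Avoid

Don't cache element references across pages; always take a snapshot after navigating
Don't hardcode credentials; use environment variables such as $APP_USERNAME and $APP_PASSWORD
Don't skip the wait after an action; let the page load fully before interacting with it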
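
The credential advice above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill: the helper name load_credentials is hypothetical, and only the variable names APP_USERNAME and APP_PASSWORD come from the listing.

```python
import os


def load_credentials(env=os.environ):
    """Read login credentials from the environment instead of hardcoding them.

    `env` defaults to the process environment; passing a plain dict makes the
    helper easy to test. The function name is hypothetical.
    """
    names = ("APP_USERNAME", "APP_PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        # Fail early with a clear message rather than sending empty values.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["APP_USERNAME"], env["APP_PASSWORD"]
```

Failing fast when a variable is missing keeps an empty password from being typed into a login form by mistake.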

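FAQ

What is inference.sh, and do I need to install it?

Yes, inference.sh is required. It provides the CLI (infsh) that runs the browser automation. Install it by following raw.githubusercontent.com/inference-sh/skills/main/cli-install.md.

Why do element references like @e1 become invalid?

Element references become invalid after page navigation, DOM changes, or dynamic content loads. Always call the snapshot function after these events to get fresh references.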
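
One way to enforce this rule in calling code is to track which references came from the latest snapshot and to refuse anything older. The sketch below is an assumption-laden illustration: RefTracker and StaleRefError are hypothetical names, and the class does not talk to the browser at all; it only does bookkeeping on the caller's side.

```python
class StaleRefError(Exception):
    """Raised when a ref is used that did not come from the latest snapshot."""


class RefTracker:
    """Hypothetical client-side bookkeeping for refs such as '@e1'."""

    def __init__(self):
        self._valid_refs = set()

    def record_snapshot(self, refs):
        # Replace (not extend) the known refs: a new snapshot supersedes the old one.
        self._valid_refs = set(refs)

    def invalidate(self):
        # Call after navigation, a DOM change, or a dynamic content load.
        self._valid_refs = set()

    def require(self, ref):
        if ref not in self._valid_refs:
            raise StaleRefError(f"{ref} is stale or unknown; take a new snapshot first")
        return ref
```

A caller would run record_snapshot with the refs from each snapshot, call invalidate after every click or navigation, and wrap each ref in require before using it.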

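How do I handle logins for protected sites?

Use the agent-browser skill to automate the login flow once, then reuse the session ID for subsequent authenticated requests. The authentication.md reference explains this pattern.

Can I record a browser session as video?

Yes, enable record_video: true in the open function. Call close to retrieve the video file. Set show_cursor: true to display the cursor for clearer demos.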
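
As a rough sketch, the input for such an open call could be assembled like this. Only the field names record_video and show_cursor come from the answer above; the "action" and "url" keys and the helper name build_open_request are assumptions, so check commands.md and video-recording.md for the actual input shape.

```python
def build_open_request(url, record_video=False, show_cursor=False):
    """Assemble a hypothetical input payload for the open function.

    'action' and 'url' are assumed key names; 'record_video' and
    'show_cursor' are the options named in the listing.
    """
    request = {"action": "open", "url": url}
    if record_video:
        request["record_video"] = True
    if show_cursor:
        request["show_cursor"] = True
    return request
```

Leaving the optional flags out unless they are requested keeps the payload minimal for ordinary, unrecorded sessions.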

How do I upload files through the browser?

Use the upload action with a file_paths array. The reference must point to a file input element. Example: {action: upload, ref: @e5, file_paths: ['/path/to/file.pdf']}
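
The example payload above can be built and sanity-checked before it is sent. This is a minimal sketch: the helper name build_upload_action is hypothetical, and the assumption that refs always look like "@e" followed by digits is inferred from the examples on this page rather than from the skill's documentation.

```python
import re

# Assumed ref format, inferred from examples such as '@e1' and '@e5'.
REF_PATTERN = re.compile(r"^@e\d+$")


def build_upload_action(ref, file_paths):
    """Return an upload payload shaped like the example in the listing."""
    if not REF_PATTERN.match(ref):
        raise ValueError(f"unexpected ref format: {ref!r}")
    paths = list(file_paths)
    if not paths:
        raise ValueError("file_paths must contain at least one path")
    return {"action": "upload", "ref": ref, "file_paths": paths}
```

The helper cannot verify that the ref points to a file input element; that still has to be confirmed from a fresh snapshot.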

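What happens if the browser session times out?

Sessions do not persist across server restarts. Always handle errors gracefully and restart the workflow when necessary. If close is not called before the timeout, the video recording is lost.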
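
Because a missed close loses the recording, it can help to guarantee that close runs even when a step fails. The sketch below uses a Python context manager for that; open_session and close_session are hypothetical callables supplied by the caller (for example, thin wrappers around the infsh CLI), not functions provided by the skill.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(open_session, close_session):
    """Open a session and always close it, even if the workflow raises.

    open_session() must return a session ID; close_session(session_id)
    must end the session. Both are caller-supplied, hypothetical callables.
    """
    session_id = open_session()
    try:
        yield session_id
    finally:
        # Runs on success and on error, so the recording is not left behind.
        close_session(session_id)
```

A workflow would then run inside "with browser_session(open_fn, close_fn) as session_id:", and any exception raised in the body still triggers the close call before it propagates.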

Developer details

Author

skillssh

License

MIT

Repository

https://github.com/skillssh/skills/tree/main/tools/utilities/agent-browser/

Ref

main

File structure

📁 references/

📄 authentication.md

📄 commands.md

📄 proxy-support.md

📄 session-management.md

📄 snapshot-refs.md

📄 video-recording.md

📁 templates/

📄 authenticated-session.sh

📄 capture-workflow.sh

📄 form-automation.sh

📄 SKILL.md