BrowserGym: One Gym Interface for WebArena, VisualWebArena, MiniWoB++
The standardisation move that made browser-agent results comparable across benchmarks.
The API
BrowserGym mirrors the OpenAI Gym API (now maintained as Gymnasium) that became standard in reinforcement learning. An environment exposes reset() to start a new task and step(action) to perform an action and receive the next observation, plus a small inventory of higher-level helpers. Observations include the task instruction, the current URL, the accessibility tree (AXTree), a DOM snapshot, and an optional rendered screenshot. Actions include click, fill, hover, scroll, navigate, select_option, press, and a few higher-level macros.
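The reset/step contract described above can be sketched with a small stand-in environment. This is a minimal illustration, not BrowserGym's own code: `StubBrowserEnv`, `submit_policy`, `run_episode`, the observation keys, and the element ids are all hypothetical names chosen for the example. It assumes the Gymnasium convention in which reset() returns an observation and an info dict, and step() returns observation, reward, terminated, truncated, and info. The stub lets the loop run without a browser.

```python
from typing import Any, Callable

Observation = dict[str, Any]


class StubBrowserEnv:
    """Hypothetical stand-in for a browser environment (not the real BrowserGym class)."""

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps = max_steps
        self._steps = 0

    def _observe(self) -> Observation:
        # Assumed observation layout: task instruction, URL, a toy
        # accessibility tree of (element id, role, name), optional screenshot.
        return {
            "goal": "Click the Submit button",
            "url": "http://localhost/form",
            "axtree": [("a1", "button", "Cancel"), ("a2", "button", "Submit")],
            "screenshot": None,
        }

    def reset(self) -> tuple[Observation, dict]:
        """Start a new task and return the first observation."""
        self._steps = 0
        return self._observe(), {}

    def step(self, action: str) -> tuple[Observation, float, bool, bool, dict]:
        """Apply an action string; the task succeeds when Submit is clicked."""
        self._steps += 1
        terminated = action == 'click("a2")'
        reward = 1.0 if terminated else 0.0
        # Truncate the episode once the step budget is spent without success.
        truncated = (not terminated) and self._steps >= self.max_steps
        return self._observe(), reward, terminated, truncated, {}


def submit_policy(obs: Observation) -> str:
    """Toy agent: click the first button whose name appears in the goal."""
    for element_id, role, name in obs["axtree"]:
        if role == "button" and name.lower() in obs["goal"].lower():
            return f'click("{element_id}")'
    return "noop()"


def run_episode(env: StubBrowserEnv, policy: Callable[[Observation], str]) -> tuple[float, int]:
    """Run one episode; the same loop works for any env with this interface."""
    obs, _info = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = policy(obs)
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += reward
        steps += 1
        done = terminated or truncated
    return total_reward, steps


if __name__ == "__main__":
    print(run_episode(StubBrowserEnv(), submit_policy))
```

Because `run_episode` depends only on the reset/step interface, swapping the stub for a different environment with the same contract would leave the agent loop unchanged, which is the property that makes cross-benchmark comparison possible.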
The Wrapped Benchmarks
When To Use BrowserGym
If you are publishing or comparing browser-agent results, use BrowserGym to remove the per-benchmark adapter as a confound. If your agent does something exotic that the BrowserGym action space cannot represent, extend the action space (the upstream maintainers accept PRs) rather than fall back to custom adapters that re-introduce the noise BrowserGym exists to eliminate.
Q.01 What is BrowserGym?
Q.02 Why does that matter?
Q.03 Which benchmarks does BrowserGym wrap?
Q.04 Does using BrowserGym constrain my agent?
Q.05 How does it compare to AgentBench's harness?
Sources
- [1] BrowserGym repository: github.com/ServiceNow/BrowserGym
- [2] WorkArena paper (Drouin et al. 2024): arxiv.org/abs/2403.07718
- [3] VisualWebArena paper (Koh et al. 2024): arxiv.org/abs/2401.13649