Abstract

What: Unified Gym-style API for browser-agent benchmarks; wraps WebArena, VisualWebArena, MiniWoB++, WorkArena, AssistantBench Live.

Who: ServiceNow Research (Drouin, Gasse, Lacoste et al.), 2024.

2026 use: Cross-benchmark browser-agent evaluation; reduces per-benchmark adapter noise.

Repository: github.com/ServiceNow/BrowserGym

Section V.vii: Tools | Last verified April 2026

BrowserGym: One Gym Interface for WebArena, VisualWebArena, MiniWoB++

The standardisation move that made browser-agent results comparable across benchmarks.

The API

BrowserGym mirrors the OpenAI Gym API that became standard in reinforcement learning (it is built on Gymnasium, the maintained successor to Gym). An environment exposes reset() to start a new task, step(action) to perform an action and receive the next observation, and a small set of higher-level helpers. Observations include the DOM snapshot, the accessibility tree (AXTree), an optional rendered screenshot, the current URL, and the task instruction. Actions include click, fill, hover, scroll, goto (navigation), select_option, press, and a few higher-level macros.
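The reset/step contract can be sketched with a toy environment. Everything below is a hypothetical stand-in written for illustration: the class name ToyBrowserEnv, the observation keys, and the single winning action are assumptions, and no browser is involved. The sketch follows the Gymnasium convention of returning (obs, info) from reset() and a five-tuple from step(), and it takes actions as strings in the style of a fixed action vocabulary; the real library's observation fields and action syntax should be checked in the repository.

```python
from __future__ import annotations

from typing import Any


class ToyBrowserEnv:
    """Hypothetical Gym-style environment: no real browser, one toy form."""

    def __init__(self, goal: str = "Click the Submit button", max_steps: int = 5):
        self.goal = goal
        self.max_steps = max_steps
        self._steps = 0

    def _obs(self) -> dict[str, Any]:
        # Illustrative observation keys only; real observations carry more fields.
        return {
            "goal": self.goal,
            "url": "http://localhost/form",
            "axtree_txt": "[11] textbox 'Name'\n[12] button 'Submit'",
        }

    def reset(self, seed: int | None = None) -> tuple[dict[str, Any], dict[str, Any]]:
        # Start a new episode; the seed is accepted for API parity but unused here.
        self._steps = 0
        return self._obs(), {}

    def step(self, action: str) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]:
        # Apply one action string; return (obs, reward, terminated, truncated, info).
        self._steps += 1
        success = action == 'click("12")'  # clicking the Submit button solves the task
        truncated = not success and self._steps >= self.max_steps
        return self._obs(), float(success), success, truncated, {"last_action": action}


env = ToyBrowserEnv()
obs, info = env.reset(seed=0)
for action in ['fill("11", "Ada")', 'click("12")']:
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
print(reward, terminated)  # → 1.0 True
```

Because the agent only ever sees the dictionaries returned by reset() and step(), the same loop would run unchanged against any environment that honours this contract.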

The Wrapped Benchmarks

Benchmark | What it adds
--- | ---
MiniWoB++ | Classic short-horizon tasks; useful as a regression smoke test.
WebArena | Self-hosted multi-app environment; end-to-end task completion.
VisualWebArena | Image-aware extension of WebArena; multimodal grounding required.
WorkArena | ServiceNow's own enterprise-app benchmark.
AssistantBench Live | Live public-web research tasks.


When To Use BrowserGym

If you are publishing or comparing browser-agent results, use BrowserGym to remove the per-benchmark adapter as a confound. If your agent does something exotic that the BrowserGym action space cannot represent, extend the action space (the upstream maintainers accept PRs) rather than fall back to custom adapters that re-introduce the noise BrowserGym exists to eliminate.
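One way to picture "extend the action space" is a single shared registry of action primitives that parses the agent's action strings. This is a hypothetical sketch, not BrowserGym's internals: ACTIONS, register, parse_action, dispatch, and the drag_slider primitive are illustrative names, the handlers return strings instead of driving a browser, and only positional literal arguments are supported.

```python
from __future__ import annotations

import ast
from typing import Any, Callable

# Hypothetical shared vocabulary: action name -> handler.
ACTIONS: dict[str, Callable[..., str]] = {}


def register(name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
    """Decorator that adds a handler to the shared vocabulary."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        ACTIONS[name] = fn
        return fn
    return decorator


@register("click")
def click(bid: str) -> str:
    return f"clicked {bid}"


@register("fill")
def fill(bid: str, value: str) -> str:
    return f"filled {bid} with {value!r}"


def parse_action(action: str) -> tuple[str, list[Any]]:
    """Parse a call such as 'fill("11", "Ada")' into ('fill', ['11', 'Ada'])."""
    node = ast.parse(action, mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a single function call: {action!r}")
    # literal_eval accepts only literals, so the agent's text is never executed.
    return node.func.id, [ast.literal_eval(arg) for arg in node.args]


def dispatch(action: str) -> str:
    name, args = parse_action(action)
    if name not in ACTIONS:
        raise ValueError(f"unknown action {name!r}; extend the shared vocabulary")
    return ACTIONS[name](*args)


# Extending the vocabulary with a hypothetical new primitive.
@register("drag_slider")
def drag_slider(bid: str, fraction: float) -> str:
    return f"dragged {bid} to {fraction:.0%}"


print(dispatch('drag_slider("30", 0.25)'))  # → dragged 30 to 25%
```

An unknown action name fails loudly instead of being silently translated, which is the point at which the new primitive would be added to the shared vocabulary rather than patched into a private adapter.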


Reader Questions

Q.01 What is BrowserGym?

BrowserGym is a unified Gym-style API for browser-agent benchmarks, released by ServiceNow Research in 2024. The framework wraps several otherwise-incompatible benchmarks (WebArena, VisualWebArena, MiniWoB++, WorkArena, AssistantBench Live) behind a single observation and action interface. An agent written once runs against all of them with no per-benchmark adapter code.

Q.02 Why does that matter?

Before BrowserGym, comparing a browser agent across WebArena and MiniWoB++ required writing two adapters, one per benchmark, each with subtly different observation formats and action vocabularies. Comparisons were error-prone, and results depended on whoever wrote the adapter. BrowserGym standardises the API, so two published BrowserGym numbers on the same benchmark no longer differ because of adapter code, and a single agent implementation can be evaluated across all of the wrapped benchmarks. The framework removed a class of methodology noise.
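The adapter argument can be made concrete with a sketch of an evaluation loop in which one agent function runs against several benchmark suites through the same reset/step contract. The suites here are hypothetical one-step toy tasks, labelled after real benchmarks only for readability; Env, OneClickTask, run_episode, and evaluate are illustrative names, not BrowserGym functions.

```python
from __future__ import annotations

from typing import Any, Callable, Protocol


class Env(Protocol):
    """The shared reset/step contract assumed for every wrapped benchmark."""

    def reset(self, seed: int | None = None) -> tuple[dict[str, Any], dict[str, Any]]: ...
    def step(self, action: str) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]: ...


Agent = Callable[[dict[str, Any]], str]  # observation in, action string out


class OneClickTask:
    """Hypothetical one-step task: success iff the agent clicks the target bid."""

    def __init__(self, target: str):
        self.target = target

    def reset(self, seed: int | None = None) -> tuple[dict[str, Any], dict[str, Any]]:
        return {"goal": f"click {self.target}"}, {}

    def step(self, action: str) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]:
        solved = action == f'click("{self.target}")'
        return {}, float(solved), True, False, {}  # every episode ends after one step


def run_episode(env: Env, agent: Agent, max_steps: int = 10) -> float:
    obs, _ = env.reset(seed=0)
    for _ in range(max_steps):
        obs, reward, terminated, truncated, _ = env.step(agent(obs))
        if terminated or truncated:
            return reward
    return 0.0  # ran out of steps without the episode ending


def evaluate(agent: Agent, suites: dict[str, list[Callable[[], Env]]]) -> dict[str, float]:
    # One agent, one loop: no benchmark-specific adapter code anywhere.
    return {
        name: sum(run_episode(make(), agent) for make in makers) / len(makers)
        for name, makers in suites.items()
    }


def always_five(obs: dict[str, Any]) -> str:
    return 'click("5")'


suites = {
    "toy-miniwob": [lambda: OneClickTask("5"), lambda: OneClickTask("7")],
    "toy-webarena": [lambda: OneClickTask("a1")],
}
print(evaluate(always_five, suites))  # → {'toy-miniwob': 0.5, 'toy-webarena': 0.0}
```

The only thing that varies between suites is the environment factory; the agent and the scoring loop are identical, which is what makes per-suite numbers free of adapter differences.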

Q.03 Which benchmarks does BrowserGym wrap?

As of April 2026: WebArena (a self-hosted e-commerce store and its admin panel, a Reddit-style forum, GitLab, Wikipedia, and a map), VisualWebArena (image-aware WebArena), MiniWoB++ (the classic short-horizon web tasks), WorkArena (ServiceNow's own enterprise-app benchmark), AssistantBench Live (live web research), and several smaller research benchmarks. ServiceNow accepts upstream contributions for new wrappers, and the list continues to grow.

Q.04 Does using BrowserGym constrain my agent?

Slightly. The agent must read observations in the BrowserGym format (DOM snapshot, accessibility tree, optional screenshot) and emit actions in the BrowserGym vocabulary (click, fill, scroll, goto, plus a small set of higher-level macros). If your agent does something exotic that this vocabulary does not cover, you may need to extend the action space; check the existing vocabulary first, because coordinate-based mouse actions and basic tab management are available as optional action subsets. For most agents the standard API is sufficient.
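Reading an observation and emitting an in-vocabulary action can be sketched as a tiny rule-based policy. The text line format "[bid] role 'name'" is an assumption modelled loosely on a flattened accessibility tree, find_bid and act are hypothetical helpers, and the emitted strings such as click("12") and noop() are assumed action syntax rather than a documented contract.

```python
from __future__ import annotations

import re

# Assumed text format for one accessibility-tree node: "[bid] role 'name'".
AXTREE_LINE = re.compile(r"^\s*\[(?P<bid>[^\]]+)\]\s+(?P<role>\w+)\s+'(?P<name>[^']*)'")


def find_bid(axtree_txt: str, role: str, name: str) -> str | None:
    """Return the element id of the first node with this role and name, if any."""
    for line in axtree_txt.splitlines():
        m = AXTREE_LINE.match(line)
        if m and m["role"] == role and m["name"] == name:
            return m["bid"]
    return None


def act(obs: dict[str, str]) -> str:
    """Hypothetical policy: click the Submit button if present, else do nothing."""
    bid = find_bid(obs["axtree_txt"], role="button", name="Submit")
    return f'click("{bid}")' if bid is not None else "noop()"


tree = "RootWebArea 'Form'\n\t[11] textbox 'Name'\n\t[12] button 'Submit'"
print(act({"axtree_txt": tree}))  # → click("12")
```

Lines without a bracketed id (such as the root node here) are skipped, so the policy only ever emits ids that actually appear in the observation.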

Q.05 How does it compare to AgentBench's harness?

AgentBench has its own harness for its 8 environments but does not unify with WebArena or MiniWoB++. BrowserGym is browser-only and unifies the browser benchmark family. The two are orthogonal: an agent might run under BrowserGym for browser evaluation and under AgentBench's own harness for its OS-shell and DB environments.
