Inspect by UK AISI: The Eval Framework Behind Sonnet 4.7 Safety Tests
The framework that became the de facto standard for AISI-grade safety evaluation within 18 months.
Architecture
Inspect organises an evaluation around three abstractions. A dataset is a collection of samples (input plus expected target). A solver is the strategy the model uses to produce an answer (one-shot, multi-shot, chain-of-thought, tool-use agent). A scorer evaluates the produced answer against the target (exact match, model-graded, custom function). All three are pluggable Python objects.
The API surface is deliberately small. A simple multiple-choice eval fits in a Python file of about 20 lines. A complex agentic eval with sandboxed code execution and tool calls runs to a few hundred lines but uses the same three abstractions. UK AISI chose this design because it had to support evaluations across capability, safety, and agentic dimensions without forking the framework per category.
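To make the dataset, solver, and scorer split concrete, here is a minimal sketch in plain Python. It is not Inspect's API: every name in it (Sample, stub_model, one_shot, chain_of_thought, exact_match, run_eval) is hypothetical, and the model is a canned stub so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Sample:
    """One dataset item: an input prompt plus the expected target."""
    input: str
    target: str


# Type aliases for the three pluggable pieces (hypothetical, for illustration).
Model = Callable[[str], str]
Solver = Callable[[Model, Sample], str]
Scorer = Callable[[str, Sample], float]


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call; returns canned answers."""
    canned = {"2 + 2": "4", "capital of France": "Paris"}
    for key, answer in canned.items():
        if key in prompt:
            return answer
    return "unknown"


def one_shot(model: Model, sample: Sample) -> str:
    """Solver: send the input to the model once, unchanged."""
    return model(sample.input)


def chain_of_thought(model: Model, sample: Sample) -> str:
    """Solver: prepend a reasoning instruction before the input."""
    return model(f"Think step by step, then answer.\n{sample.input}")


def exact_match(answer: str, sample: Sample) -> float:
    """Scorer: 1.0 if the answer equals the target, else 0.0."""
    return 1.0 if answer.strip() == sample.target.strip() else 0.0


def run_eval(dataset: Sequence[Sample], solver: Solver,
             scorer: Scorer, model: Model) -> float:
    """Apply the solver to every sample, score it, and return mean accuracy."""
    scores = [scorer(solver(model, s), s) for s in dataset]
    return sum(scores) / len(scores) if scores else 0.0


DATASET = [
    Sample("What is 2 + 2?", "4"),
    Sample("What is the capital of France?", "Paris"),
    Sample("What is the capital of Peru?", "Lima"),  # the stub does not know this one
]

accuracy = run_eval(DATASET, one_shot, exact_match, stub_model)
```

Swapping one_shot for chain_of_thought, or exact_match for a different scorer, changes one argument to run_eval and nothing else, which is the property the small surface is meant to provide.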
Why It Spread
Inspect spread quickly across the frontier-lab ecosystem because it solved three pain points. First, audit-trail logging: every eval run produces a structured log that captures the prompt, the model response, the score, and the run metadata. Second, provider portability: the same eval runs against a different model by changing the single line that names the provider. Third, sandboxing: Inspect ships a Docker-based sandbox for agentic evaluations that need to run untrusted code without compromising the host.
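The first two points can be illustrated with a short sketch, again in plain Python and not with Inspect's own log format or provider mechanism. The PROVIDERS registry, the LogRecord fields, and run_and_log are hypothetical names chosen for the example; the two "providers" are trivial stub functions.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict

# Hypothetical provider registry: a model name maps to a callable.
# Real providers would wrap API clients; these stubs keep the sketch offline.
PROVIDERS: Dict[str, Callable[[str], str]] = {
    "stub-a": lambda prompt: prompt.upper(),
    "stub-b": lambda prompt: prompt[::-1],
}


@dataclass
class LogRecord:
    """One structured log entry: prompt, response, score, metadata."""
    prompt: str
    response: str
    score: float
    metadata: dict


def run_and_log(model_name: str, prompt: str, target: str) -> str:
    """Run one sample against the named provider and return a JSON log line."""
    model = PROVIDERS[model_name]
    response = model(prompt)
    record = LogRecord(
        prompt=prompt,
        response=response,
        score=1.0 if response == target else 0.0,
        metadata={"model": model_name},
    )
    return json.dumps(asdict(record), sort_keys=True)


# Switching provider is a change to one argument; the eval itself is untouched.
line_a = run_and_log("stub-a", "abc", "ABC")
line_b = run_and_log("stub-b", "abc", "ABC")
```

Because each record is self-describing JSON, two runs against different providers can be compared after the fact without rerunning either one.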
Limitations
Inspect is Python-first and assumes the team running it can read and write Python. Teams that prefer a no-code or YAML-only eval definition will find OpenAI Evals or Promptfoo more accessible. Inspect's sandboxing is also Docker-dependent, which makes it harder to run on machines without Docker (some macOS setups, some restricted CI environments).
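For teams unsure whether a given machine can run the sandboxed evals, a preflight check along these lines may help. This is a sketch, not part of Inspect: docker_available and choose_mode are hypothetical helpers, and finding the docker executable on PATH only shows that the CLI is installed, not that the daemon is running.

```python
import shutil
from typing import Callable, Optional

# Signature of shutil.which, so a fake lookup can be injected in tests.
Which = Callable[[str], Optional[str]]


def docker_available(which: Which = shutil.which) -> bool:
    """Return True if a `docker` executable is found on PATH.

    Assumption: CLI presence is used as a proxy; the daemon is not probed.
    """
    return which("docker") is not None


def choose_mode(which: Which = shutil.which) -> str:
    """Pick a run mode: sandboxed agentic evals, or skip them when Docker is absent."""
    return "sandboxed" if docker_available(which) else "skip-agentic"


mode = choose_mode()
```

A CI job could call choose_mode() first and run only the non-agentic evals when it returns "skip-agentic".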
Q.01 What is Inspect?
Q.02 How is Inspect different from OpenAI Evals?
Q.03 Is Inspect tied to UK AISI models or evaluations?
Q.04 When should I pick Inspect over LMEval or OpenAI Evals?
Q.05 Who else uses Inspect besides UK AISI?
Sources
- [1] Inspect documentation: inspect.ai-safety-institute.org.uk
- [2] Inspect repository: github.com/UKGovernmentBEIS/inspect_ai
- [3] UK AISI launch post: aisi.gov.uk/work/inspect