Safety Eval Suites 2026: HarmBench, JailbreakBench, AILuminate Compared
Three suites, three audiences, three failure-mode focuses.
Side-by-Side
Reading ASR
Attack Success Rate (ASR) is the standard metric for HarmBench and JailbreakBench: the fraction of attack attempts that elicit the targeted harmful behavior. Lower ASR means the model resisted more attacks. ASR is sensitive to the attack panel: a defended model can show low ASR against the published attack set and high ASR against a newer attack outside that panel. Always specify the attack panel when reporting ASR.
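To make the panel sensitivity concrete, here is a minimal Python sketch that computes ASR over a chosen attack panel. It is an illustration only, not the scoring code of HarmBench or JailbreakBench; the attack names, the `panel_asr` helper, and the judged outcomes are all hypothetical.

```python
def panel_asr(results, panel):
    """Return ASR over the attacks in `panel`.

    `results` is a list of (attack_name, succeeded) pairs, where `succeeded`
    is True when a judge marked the model output as harmful.
    """
    panel = set(panel)
    outcomes = [succeeded for attack, succeeded in results if attack in panel]
    if not outcomes:
        raise ValueError("no results for the requested attack panel")
    return sum(outcomes) / len(outcomes)


# Hypothetical judged outcomes: 10 attempts per attack (made-up data).
results = (
    [("published_attack_a", True)] * 1
    + [("published_attack_a", False)] * 9
    + [("newer_attack_b", True)] * 6
    + [("newer_attack_b", False)] * 4
)

# The same model yields very different ASR depending on the panel.
for panel in (["published_attack_a"],
              ["newer_attack_b"],
              ["published_attack_a", "newer_attack_b"]):
    print(panel, f"{panel_asr(results, panel):.0%}")
```

Because the number changes with the panel, an ASR figure is best reported alongside the list of attacks it was computed over.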
AILuminate's Five-Grade Approach
MLCommons designed AILuminate to bridge the gap between safety research and procurement decisions. A single number, such as an ASR of 12%, is hard for a procurement reviewer to interpret. A five-grade label per category (Excellent in Hate Speech, Good in Physical Harm, Fair in CSAM-adjacent) is easier to consume, easier to compare across vendors, and harder to game by tuning on a specific test set. The trade-off is loss of resolution: the grade alone cannot distinguish between two models that both rate Excellent.
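The loss of resolution can be illustrated with a small Python sketch that buckets a per-category violation rate into five grades. This is not AILuminate's actual grading procedure: the cut-off values, the `grade` helper, and the example rates are assumptions chosen for illustration, and the grade labels beyond the three named above are likewise illustrative.

```python
# Hypothetical cut-offs: (maximum violation rate, grade label).
# These thresholds are illustrative assumptions, not MLCommons values.
GRADE_BOUNDS = [
    (0.001, "Excellent"),
    (0.01, "Very Good"),
    (0.05, "Good"),
    (0.15, "Fair"),
]


def grade(violation_rate):
    """Map a violation rate in [0, 1] to one of five coarse grades."""
    if not 0.0 <= violation_rate <= 1.0:
        raise ValueError("violation_rate must be between 0 and 1")
    for bound, label in GRADE_BOUNDS:
        if violation_rate <= bound:
            return label
    return "Poor"


# Two hypothetical models whose rates differ by more than 2x
# still receive the same label.
model_a_rate = 0.0004
model_b_rate = 0.0009
print(grade(model_a_rate), grade(model_b_rate))
```

Any bucketing scheme of this kind trades fine-grained comparison for readability, which is the trade-off described above.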
Q.01 What is HarmBench?
Q.02 What is JailbreakBench?
Q.03 What is AILuminate?
Q.04 Which should I use?
Q.05 Are safety eval scores trustworthy?
Sources
- [1] HarmBench (Mazeika et al. 2024): arxiv.org/abs/2402.04249
- [2] JailbreakBench (Chao et al. 2024): arxiv.org/abs/2404.01318
- [3] AILuminate (MLCommons 2024): mlcommons.org/benchmarks/ai-safety