DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
The research aims to comprehensively evaluate the capabilities of Deep Research Agents.
Code | Website | Paper | Eval Dataset | Total models: 16 | Last Update: 28 May 2025
model | overall | comp. | insight | inst. | read. | c.acc. | eff.c. | category
🚀 gemini-2.5-pro-deepresearch | 48.88 | 48.53 | 45.25 | 49.18 | 49.44 | 81.44 | 111.21 | Deep Research Agent

📊 Column Descriptions

  • Rank: Model ranking based on overall score
  • model: Model name (🚀 = Deep Research Agent)
  • overall: Overall Score (weighted average of all metrics)
  • comp.: Comprehensiveness - How thorough and complete the research is
  • insight: Insight Quality - Depth and value of analysis
  • inst.: Instruction Following - Adherence to user instructions
  • read.: Readability - Clarity and organization of content
  • c.acc.: Citation Accuracy - Correctness of references
  • eff.c.: Effective Citations - Relevance and quality of sources
  • category: Model category
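
The page describes the overall score as a weighted average but does not list the weights. The sketch below is only a sanity check. It assumes the seven numbers listed for gemini-2.5-pro-deepresearch follow the column order given above (overall, comp., insight, inst., read., c.acc., eff.c.). The helper `weighted_average` and the equal weights are hypothetical, not the benchmark's actual scoring code.

```python
# Scores for gemini-2.5-pro-deepresearch, assuming the values appear
# in the same order as the column descriptions above.
row = {
    "overall": 48.88,
    "comp.": 48.53,
    "insight": 45.25,
    "inst.": 49.18,
    "read.": 49.44,
    "c.acc.": 81.44,
    "eff.c.": 111.21,
}

# The four report-quality dimensions (the two citation columns are left out).
QUALITY = ["comp.", "insight", "inst.", "read."]


def weighted_average(scores, weights):
    """Weighted mean of `scores`, using the matching entries in `weights`."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


quality = {name: row[name] for name in QUALITY}

# ASSUMPTION: equal weights, used only as a baseline for comparison.
equal_weights = {name: 1.0 for name in QUALITY}
unweighted = weighted_average(quality, equal_weights)

# A weighted average with non-negative weights always lies between the
# smallest and largest input, so this is a necessary (not sufficient) check.
in_range = min(quality.values()) <= row["overall"] <= max(quality.values())

print(f"equal-weight mean of quality scores: {unweighted:.2f}")
print(f"reported overall: {row['overall']:.2f}, within quality range: {in_range}")
```

Under these assumptions the reported overall score lies inside the range of the four quality scores but differs from their equal-weight mean. That is consistent with unequal weights, but it does not reveal what the weights are.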