Sentient’s Arena Platform Secures Support from Pantera Capital and Franklin Templeton for AI Testing
Pantera Capital and Franklin Templeton support Sentient's Arena platform, designed to benchmark AI agent performance on complex enterprise document tasks.
GPT-5.4 seems to blend these lineages. Early benchmarks suggest it maintains Codex-level coding reliability while incorporating stronger planning capabilities. For OpenClaw agents that need to both ...
Office Productivity: The Apex Agents benchmark, which evaluates productivity in office-like environments, saw Gemini 3.1 Pro score 33.5, nearly doubling the performance of its predecessor. This ...
A team of Stanford University researchers developed benchmarks for measuring the accuracy and effectiveness of AI agents in assisting physicians and published their findings in the New England Journal of ...
Researchers are racing to develop more challenging, interpretable, and fair assessments of AI models that reflect real-world use cases. The stakes are high. Benchmarks are often reduced to leaderboard ...