From 74cd9dbf7d94079564f3df521853801a11bf150e Mon Sep 17 00:00:00 2001
From: Maxime MORGE <maxime.morge@univ-lille.fr>
Date: Sun, 16 Feb 2025 12:35:22 +0100
Subject: [PATCH] LLM4AAMAS: Add AGENTBENCH

---
 README.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/README.md b/README.md
index dd60ead..93585ae 100644
--- a/README.md
+++ b/README.md
@@ -178,6 +178,15 @@ Many models are available at the following URLs:
   homo silicus?](https://www.nber.org/papers/w31122)** Horton, J. J. (2023).
   National Bureau of Economic Research.
 
+- **[AgentBench: Evaluating LLMs as
+  Agents](https://openreview.net/forum?id=zAdUB0aCTQ)**. Xiao Liu et al. Poster.
+  Proc. of 12th International Conference on Learning Representations (ICLR),
+  Vienna, Austria, May 7-11, 2024.
+
+  AGENTBENCH is a systematically designed, multi-dimensional, evolving
+  benchmark for evaluating LLMs as agents. It measures a significant performance
+  gap between top-tier commercial LLMs and their OSS competitors.
+
 ### Generative Autonomous Agents on the shelf
-- 
GitLab