diff --git a/README.md b/README.md
index dd60ead052fead00524bf2994f37d97b53427e1c..93585aeec8a52ca855c9f4a42d290fc46571bd04 100644
--- a/README.md
+++ b/README.md
@@ -178,6 +178,15 @@ Many models are available at the following URLs:
   homo silicus?](https://www.nber.org/papers/w31122)** Horton, J. J. (2023).
   National Bureau of Economic Research.
 
+- **[AgentBench: Evaluating LLMs as
+  Agents](https://openreview.net/forum?id=zAdUB0aCTQ)**. Xiao Liu et al. Poster.
+  Proc. of the 12th International Conference on Learning Representations (ICLR),
+  Vienna, Austria, May 7-11, 2024.
+
+  AgentBench is a systematically designed, multi-dimensional, evolving
+  benchmark for evaluating LLMs as agents. It reveals a significant performance
+  gap between top-tier commercial models and their open-source competitors.
+
 ### Generative Autonomous Agents on the shelf