Commit 74cd9dbf authored by Maxime MORGE

LLM4AAMAS: Add AGENTBENCH

parent 07633ce3
......@@ -178,6 +178,15 @@ Many models are available at the following URLs:
homo silicus?](https://www.nber.org/papers/w31122)** Horton, J. J. (2023).
National Bureau of Economic Research.
- **[AgentBench: Evaluating LLMs as
Agents](https://openreview.net/forum?id=zAdUB0aCTQ)**. Xiao Liu et al. Poster.
Proc. of the 12th International Conference on Learning Representations (ICLR),
Vienna, Austria, May 7-11, 2024.
AGENTBENCH is a systematically designed, multi-dimensional, evolving benchmark
for evaluating LLMs as agents. It reveals a significant performance gap
between top-tier commercial models and their open-source (OSS) competitors.
### Generative Autonomous Agents on the shelf
......