LLM4AAMAS
Generative Autonomous Agents and Multi-Agent Systems (AAMAS) offer promising opportunities for solving problems in open environments and simulating complex social dynamics.
This repository contains a collection of papers and resources related to generative AAMAS. This list is a work in progress and will be regularly updated with new resources.
Artificial Intelligence
-
Artificial Intelligence (AI) involves the analysis, design, implementation, and optimization of methods to enable machines to reproduce or simulate human intelligence.
Intelligence artificielle : une approche moderne (4e édition) Stuart Russell, Peter Norvig, Fabrice Popineau, Laurent Miclet, Claire Cadet (2021) Publisher: Pearson France
-
Machine learning aims to give machines the ability to improve their performance in solving tasks.
Apprentissage artificiel - 3e édition : Deep learning, concepts et algorithmes Antoine Cornuéjols, Laurent Miclet, Vincent Barra (2018) Publisher: Eyrolles
Neural networks (RNN, Transformers)
-
The back-propagation method adjusts the connection weights by propagating errors backward from the output layer to the input layer, aiming to minimize errors and achieve a classification as close as possible to the optimum.
Learning representations by back-propagating errors David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams (1986) Published in Nature
-
Deep convolutional neural networks nearly halved the image classification error rate on the ImageNet dataset.
ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton (2012) Presented at NeurIPS
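The back-propagation update described at the start of this section can be sketched on a toy network. This is a minimal illustration only: the 2-2-1 architecture without biases, the sigmoid activations, the squared-error loss, the learning rate, and all function names are assumptions chosen for brevity, not details taken from the cited papers.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(params, x):
    """Forward pass: inputs -> sigmoid hidden layer -> one sigmoid output."""
    w_hidden, w_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def loss(params, x, target):
    """Squared error 0.5 * (output - target)^2 for one example."""
    _, output = forward(params, x)
    return 0.5 * (output - target) ** 2


def backprop_step(params, x, target, lr=0.5):
    """One gradient-descent update computed by propagating the error backward."""
    w_hidden, w_out = params
    hidden, output = forward(params, x)
    # Error signal at the output unit (derivative of the loss w.r.t. its net input).
    delta_out = (output - target) * output * (1.0 - output)
    # Error signal at each hidden unit, obtained from the output error signal.
    delta_hidden = [
        delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
        for j in range(len(hidden))
    ]
    new_w_out = [w_out[j] - lr * delta_out * hidden[j] for j in range(len(hidden))]
    new_w_hidden = [
        [w_hidden[j][i] - lr * delta_hidden[j] * x[i] for i in range(len(x))]
        for j in range(len(hidden))
    ]
    return new_w_hidden, new_w_out


# Toy weights and a single training example (illustrative values).
params = ([[0.1, -0.2], [0.3, 0.4]], [0.5, -0.5])
x, target = [1.0, 0.5], 1.0
initial_loss = loss(params, x, target)
for _ in range(200):
    params = backprop_step(params, x, target)
final_loss = loss(params, x, target)
print(initial_loss, final_loss)
```

Repeating the update on the same example drives the loss down, which is the behaviour the method relies on; real training iterates over many examples and usually adds bias terms.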
Large Language Models
-
This literature review of recent advances in LLMs shows that scaling can substantially improve model capacity.
A Survey of Large Language Models Wayne Xin Zhao, Kun Zhou, Junyi Li, et al. (2024) Published on arXiv
-
AI-generated content typically follows two steps: extracting intent from human instructions and generating content accordingly. Unimodal models process instructions and output in the same modality, while multimodal models handle cross-modal inputs and produce outputs in different modalities.
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT Yihan Cao, Siyu Li, Yixin Liu, Zhiling Yan, Yutong Dai, Philip S. Yu, Lichao Sun (2023) Published on arXiv
-
A framework for achieving strong natural language understanding with a single task-agnostic model through generative pre-training and discriminative fine-tuning.
Improving language understanding by generative pre-training Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever (2018) Published by OpenAI
-
A language model pre-trained on large unlabeled corpora.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova (2019) Presented at NAACL-HLT
-
Recurrent Neural Networks (RNNs), specifically designed to process sequential data, can capture contextual relationships between elements of a text, known as tokens.
Sequence to Sequence Learning with Neural Networks Ilya Sutskever, Oriol Vinyals, Quoc V. Le (2014) Published on arXiv
-
The flexibility of RNNs allows for the alignment of contextual representations, thus overcoming the limitations of word-for-word translation.
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, et al. (2014) Published on arXiv
Tuning
Instruction tuning
-
The fine-tuning of a pre-trained language model requires significantly fewer data and computational resources, especially when parameter-efficient approaches such as Low-Rank Adaptation (LoRA) are used.
LoRA: Low-Rank Adaptation of Large Language Models Edward J. Hu, Yelong Shen, Phillip Wallis, et al. (2021) Published on arXiv
-
The apparent mastery of textual understanding by LLMs closely resembles human performance.
Language Models are Few-Shot Learners Tom Brown, Benjamin Mann, Nick Ryder, et al. (2020) Presented at NeurIPS
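To give a rough sense of why the Low-Rank Adaptation mentioned above needs fewer trainable parameters, the sketch below assumes the usual LoRA formulation in which a frozen weight matrix W is augmented with a product B·A of small rank. The function names, the toy matrices, and the 4096 x 4096 layer size are illustrative assumptions.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def matmul(left, right):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(left[i][k] * right[k][j] for k in range(len(right)))
         for j in range(len(right[0]))]
        for i in range(len(left))
    ]


def lora_effective_weight(w, b, a, scale=1.0):
    """Return W + scale * (B @ A): the frozen weight plus the low-rank update."""
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Illustrative layer size: compare full fine-tuning with a rank-8 update.
full_params = 4096 * 4096
low_rank_params = lora_trainable_params(4096, 4096, rank=8)
print(full_params, low_rank_params, full_params // low_rank_params)
```

For this assumed layer, the two factors hold 65,536 trainable values against 16,777,216 for the full matrix, that is 256 times fewer. When B is initialised to zeros, the effective weight equals the frozen W, so adaptation starts from the pre-trained behaviour.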
Alignment tuning
-
Instruction tuning aims to bridge the gap between the model’s original objective — generating text — and user expectations, where users want the model to follow their instructions and perform specific tasks.
Training language models to follow instructions with human feedback Long Ouyang, Jeffrey Wu, Xu Jiang, et al. (2022) Presented at NeurIPS
-
Strong alignment requires cognitive abilities such as understanding and reasoning about agents’ intentions and their ability to causally produce desired effects.
Strong and weak alignment of large language models with human values Khamassi, M., Nahon, M. & Chatila, R. Sci Rep 14, 19399 (2024).
Existing LLMs
Many models are available at the following URLs: https://ollama.com, https://www.nomic.ai/gpt4all and https://huggingface.co/models.
Unimodal models
-
The Llama 3 Herd of Models Meta Team (2024) Published on arXiv
-
Stanford Alpaca: An Instruction-Following LLaMa Model Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, et al. (2023) Published on GitHub
-
Gemma 2: Improving Open Language Models at a Practical Size Google AI Team (2024) Published on arXiv
-
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning DeepSeek-AI (2025) Published on arXiv
-
Mixtral of Experts Mistral AI team (2024) Published on arXiv
-
Mistral 7B Mistral AI team (2023) Published on arXiv
-
Mistral Small 3 Mistral AI team (2025)
-
The Lucie-7B LLM and the Lucie Training Dataset: Open Resources for Multilingual Language Generation Olivier Gouvert, Julie Hunter, Jérôme Louradour, Evan Dufraisse, Yaya Sy, Pierre-Carl Langlais, Anastasia Stasenko, Laura Rivière, Christophe Cerisara, Jean-Pierre Lorré (2025)
Multimodal models
-
GPT-4 Technical Report OpenAI Team (2024) Published on arXiv
-
LLaVA: Large Language and Vision Assistant. Visual Instruction Tuning Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee (2023) Published in Advances in Neural Information Processing Systems (NeurIPS 2023)
Prompt engineering
ICL
In-context learning involves providing the model with specific information without requiring additional training.
- A Survey on In-context Learning Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, Zhifang Sui (2024) Presented at the Conference on Empirical Methods in Natural Language Processing (EMNLP) Location: Miami, Florida, USA Published by: Association for Computational Linguistics
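As a minimal sketch of in-context learning, the helper below places a few labelled demonstrations in the prompt so that the model can infer the task without any weight update. The `Input:`/`Output:` layout, the function name, and the sentiment examples are assumptions made for illustration; any consistent format can play the same role.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, labelled demonstrations, and the new query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # left open for the model to complete
    return "\n".join(lines)


# Hypothetical demonstrations for a sentiment classification task.
demonstrations = [
    ("The plot was gripping from start to finish.", "positive"),
    ("I want my two hours back.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each review.",
    demonstrations,
    "A charming film with a weak ending.",
)
print(prompt)
```

The resulting string would be sent as-is to any LLM endpoint; the model itself is deliberately left out of the sketch.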
CoT
Chain-of-thought is a prompting strategy that, instead of being limited to input-output pairs, incorporates intermediate reasoning steps that serve as a link between the inputs and the output.
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, et al. (2022) Presented at NeurIPS
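A chain-of-thought prompt differs from a plain few-shot prompt in that each demonstration spells out its intermediate steps before the answer. The sketch below is one possible layout, not the exact format of the cited paper: the `Q:`/`A:` markers, the "The answer is" convention, the function names, and the exemplar are illustrative assumptions.

```python
import re


def build_cot_prompt(exemplars, question):
    """Each exemplar is (question, reasoning steps, final answer)."""
    parts = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in exemplars
    ]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def extract_answer(completion):
    """Pull the final answer out of a completion that follows the convention."""
    match = re.search(r"The answer is\s+(.+?)\.?\s*$", completion.strip())
    return match.group(1) if match else None


# Hypothetical exemplar with explicit intermediate reasoning.
exemplars = [(
    "A shelf holds 3 boxes of 4 books each. 2 books are removed. How many remain?",
    "3 boxes of 4 books is 12 books. 12 - 2 = 10.",
    "10",
)]
cot_prompt = build_cot_prompt(
    exemplars, "A tray has 5 rows of 6 eggs. 4 eggs break. How many are intact?"
)
print(cot_prompt)
```

Parsing the final answer separately from the reasoning, as `extract_answer` does, makes it possible to score the answer while still logging the steps that led to it.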
Although CoT prompting has been shown to improve the performance of LLMs on various reasoning tasks, it remains unclear to what extent LLMs are capable of true reasoning or whether they simply rely on memorized patterns and heuristics to solve problems.
- Towards Reasoning in Large Language Models: A Survey Jie Huang and Kevin Chen-Chuan Chang (2023) Published on arXiv
RAG
Retrieval-Augmented Generation (RAG) is a prompting strategy that involves integrating relevant information from external data sources into the instructions to enhance the model’s responses using specific and/or recent knowledge.
- Retrieval-Augmented Generation for Large Language Models: A Survey Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, Haofen Wang (2024) Published on arXiv
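The retrieve-then-prompt idea can be sketched without any external service. The example below is a toy under stated assumptions: relevance is a simple word-overlap count rather than the dense embeddings typically used in practice, and the function names, prompt wording, and three-sentence corpus are illustrative.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, document):
    """Toy relevance score: size of the multiset overlap between token bags."""
    q = Counter(tokenize(query))
    d = Counter(tokenize(document))
    return sum(min(q[t], d[t]) for t in q)


def retrieve(query, documents, k=2):
    """Return the k documents with the highest overlap score."""
    ranked = sorted(documents, key=lambda doc: overlap_score(query, doc), reverse=True)
    return ranked[:k]


def build_rag_prompt(query, documents, k=2):
    """Insert the retrieved passages into the instructions given to the model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


DOCUMENTS = [
    "LangChain is an open-source framework for designing prompts for LLMs.",
    "MetaGPT is a framework for creating generative MAS dedicated to software development.",
    "AutoGPT is a platform for the creation, deployment, and management of generative agents.",
]
QUERY = "Which framework targets software development?"
print(build_rag_prompt(QUERY, DOCUMENTS, k=1))
```

Swapping `overlap_score` for an embedding similarity changes the quality of retrieval but not the overall shape of the pipeline.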
Generative Autonomous Agents
Leveraging the commonsense knowledge integrated into LLMs represents a promising solution to equip autonomous agents with the capabilities necessary to adapt to new tasks, while reducing reliance on knowledge engineering or trial-and-error learning.
- A Survey on Large Language Model Based Autonomous Agents Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, Jirong Wen (2024) Published in Frontiers of Computer Science (Volume 18, Issue 6, Pages 186345) Publisher: Springer
Multiple works aim to equip LLMs with the ability to use external tools, such as a calculator, a calendar, a DBMS, a code interpreter, a search engine, a machine translation tool, a question-answering system, or an AI tool.
-
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang (2023) Presented at Advances in Neural Information Processing Systems (NeurIPS) Pages: 38154–38180 Publisher: Curran Associates, Inc. Volume: 36
-
Toolformer: Language Models Can Teach Themselves to Use Tools Timo Schick, Jane Dwivedi-Yu, Roberto Dessi, Roberta Raileanu, et al. (2023) Presented at NeurIPS
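On the application side, tool use usually comes down to parsing a call emitted by the model and dispatching it to ordinary code. The sketch below shows that dispatch step for a calculator tool only. The dictionary call format, the `TOOLS` registry, and the function names are hypothetical and do not reproduce the interfaces of HuggingGPT or Toolformer.

```python
import ast
import operator

# Binary operators the toy calculator accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression):
    """Evaluate binary +, -, *, / over numeric literals without calling eval()."""
    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")
    return evaluate(ast.parse(expression, mode="eval").body)


# Registry mapping tool names to Python callables (hypothetical).
TOOLS = {"calculator": calculator}


def run_tool_call(call):
    """Dispatch a call of the assumed form {'tool': name, 'input': text}."""
    tool = TOOLS.get(call["tool"])
    if tool is None:
        return f"error: unknown tool {call['tool']!r}"
    return str(tool(call["input"]))


print(run_tool_call({"tool": "calculator", "input": "2 * (3 + 4)"}))
```

Returning the result as a string keeps the interface uniform, since the observation is normally appended to the prompt for the next model call.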
To react autonomously in an environment, a generative agent must interpret its perceptions (e.g., a user request) based on the knowledge stored in its memory, reason, and plan actions. It must execute the plan step by step with the help of tools and refine the plan based on feedback from the environment.
- Cognitive Architectures for Language Agents Theodore R. Sumers, Shunyu Yao, Karthik Narasimhan, Thomas L. Griffiths (2024) Published on arXiv
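The perceive, plan, act, and refine cycle described above can be sketched as a short control loop. Everything here is a simplifying assumption: `stub_planner` stands in for an LLM call, memory is a plain list, and feedback is limited to reporting a missing tool. It is not the architecture proposed in the cited paper.

```python
def stub_planner(request, memory, feedback=None):
    """Stand-in for an LLM: returns a plan as a list of (tool, argument) steps."""
    if feedback is not None:
        # Refinement: after a failure, fall back to a different tool.
        return [("echo", request)]
    return [("search", request)]


def run_agent(request, tools, planner=stub_planner, max_rounds=3):
    """Plan, execute step by step with tools, and replan on negative feedback."""
    memory = []      # (step, observation) pairs accumulated during the episode
    feedback = None
    for _ in range(max_rounds):
        plan = planner(request, memory, feedback)
        feedback = None
        for tool_name, argument in plan:
            tool = tools.get(tool_name)
            if tool is None:
                feedback = f"tool {tool_name!r} is unavailable"
                memory.append(((tool_name, argument), feedback))
                break
            memory.append(((tool_name, argument), tool(argument)))
        if feedback is None:
            return memory  # the whole plan ran without a failure
    return memory


# Hypothetical environment exposing a single tool.
trace = run_agent("hello", {"echo": lambda text: text.upper()})
print(trace)
```

In this run the first plan fails because no `search` tool exists, the failure is stored in memory and fed back to the planner, and the revised plan succeeds on the second round.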
Cognitive abilities once considered unique to humans—such as reasoning, planning, and reflection, along with a degree of self-control, self-awareness, and self-improvement—can now be achieved by leveraging well-crafted prompts in LLMs combined with embedded cognitive intelligence.
- A Survey on Large Language Model-Based Game Agents Sihao Hu, Tiansheng Huang, Fatih Ilhan, Selim Tekin, Gaowen Liu, Ramana Kompella, Ling Liu (2024) Published on arXiv
An LLM can operate within a game as a player, a non-player character, a game master, a designer or an analyst.
- Large Language Models and Games: A Survey and Roadmap Roberto Gallotta, Graham Todd, Marvin Zammit, Sam Earle, Antonios Liapis, Julian Togelius, Georgios N. Yannakakis (2024) Published in IEEE Transactions on Games
LLMs have the ability to emulate a real human in certain experiments in experimental economics or social psychology.
- Large language models as simulated economic agents: What can we learn from homo silicus? Horton, J. J. (2023). National Bureau of Economic Research.
LLMs, notably GPT-4 with a ToT prompt, can simulate simple auction experiments in line with theoretical expectations.
- The nuances of large-language-model-agent performance in simple English auctions Lamichhane, B., Palardy, J., & Singh, A. K. (2023). Empirical Economics Letters, 2(1).
Generative consultants as economic agents with limited agency.
- Generative AI as Economic Agents Immorlica, N., Lucier, B., & Slivkins, A. (2024). SIGecom Exch., 22(1), 93–109. ACM, New York, NY, USA.
AGENTBENCH is a systematically designed, multi-dimensional, evolving benchmark for evaluating LLMs as agents. It reveals a significant performance gap between top-tier commercial models and their open-source (OSS) competitors.
- AgentBench: Evaluating LLMs as Agents Xiao Liu et al. (2024) Poster presented at the 12th International Conference on Learning Representations (ICLR), Vienna, Austria, May 7-11, 2024
Generative Autonomous Agents off the shelf
-
LangChain is an open-source framework for designing prompts for LLMs. It can be used to define high-level reasoning sequences, conversational agents, Retrieval-Augmented Generation (RAG) pipelines, document summaries, or even the generation of synthetic data.
-
LangGraph is a low-level library for designing the cognitive architecture of autonomous agents whose reasoning engine is an LLM.
-
AutoGPT is a platform for the creation, deployment, and management of generative agents.
-
WorkGPT is similar to AutoGPT.
Generative MAS
Building on the planning and reasoning abilities of LLMs, this survey considers LLM-based multi-agent systems for complex problem-solving and world simulation.
- Large Language Model based Multi-Agents: A Survey of Progress and Challenges Taicheng Guo et al. (2024) Published on arXiv arXiv:2402.01680 [cs.CL]
LLMs can simulate realistic perceptions, reasoning, and decision-making, react adaptively to environments without predefined explicit instructions by adjusting their responses through contextual learning mechanisms, autonomously generate objectives, and interact and communicate in natural language.
- Large language models empowered agent-based modeling and simulation: A survey and perspectives Chen Gao, Xiaochong Lan, Nian Li, Yuan Yuan, Jingtao Ding, Zhilun Zhou, Fengli Xu, Yong Li (2024) Published in Humanities and Social Sciences Communications, Volume 11, Issue 1, Pages 1–24. The repository.
Simulacra studies the emergent social behaviors of a generative multi-agent simulation in an environment inspired by The Sims.
-
Social Simulacra: Creating Populated Prototypes for Social Computing Systems Joon Sung Park, Lindsay Popowski, Carrie Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein (2022) Published in Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology Articleno: 74, Pages: 18, Location: Bend, OR, USA
-
Generative Agents: Interactive Simulacra of Human Behavior Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein (2023) Published in Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology Articleno: 2, Pages: 22, Location: San Francisco, CA, USA, Series: UIST '23
AGENTVERSE is a general multi-agent framework that simulates problem-solving procedures of human groups.
- Agentverse: Facilitating multi-agent collaboration and exploring emergent behaviors Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Heyang Yu, Yaxi Lu, Yi-Hsin Hung, Chen Qian, et al. (2023) Published in The Twelfth International Conference on Learning Representations (ICLR 2024)
An open-source platform to simulate a human society.
- Training socially aligned language models on simulated social interactions Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Denny Zhou, Andrew M. Dai, Diyi Yang, Soroush Vosoughi (2023) Published on arXiv arXiv:2305.16960
A simulation of the propagation processes in a social network.
- S3: Social-network Simulation System with Large Language Model-Empowered Agents Chen Gao, Xiaochong Lan, Zhihong Lu, Jinzhu Mao, Jinghua Piao, Huandong Wang, Depeng Jin, Yong Li (2023) Published on arXiv arXiv:2307.14984
When LLM-based agents participate in various games designed to assess different traits—such as the dictator game (altruism), the ultimatum game (fairness), the trust game (trust, fairness, altruism, and reciprocity), the bomb risk game (risk aversion), the public goods game (free-riding, altruism, and cooperation), and the iterated prisoner’s dilemma (cooperation, reciprocity, and strategic reasoning)—their behaviors generally resemble those of humans. When deviations occur, chatbots tend to be more cooperative and altruistic, displaying higher levels of trust, generosity, and reciprocity. They behave as if they prioritize maximizing the total payoff of both players rather than solely their own gain.
- A Turing test of whether AI chatbots are behaviorally similar to humans Qiaozhu Mei, Yutong Xie, Walter Yuan, Matthew O. Jackson (2024) Published in Proceedings of the National Academy of Sciences, 121(9).
A study of LLMs as artificial social agents playing the iterated prisoner's dilemma, which shows that prompt comprehension, memory representation, and the duration of the simulation play crucial roles. LLMs are characterized by an initial trust in the opponent’s cooperation and a propensity towards cooperation.
- Nicer Than Humans: How do Large Language Models Behave in the Prisoner's Dilemma? Nicoló Fontana, Francesco Pierri, Luca Maria Aiello (2024) Published on arXiv
A study of how different prompting techniques for strategy creation by LLMs affect the emergent collective cooperative behaviours in the iterated prisoner's dilemma, where aggressive strategies can persist or even dominate.
- Will Systems of LLM Agents Cooperate: An Investigation into a Social Dilemma Richard Willis, Yali Du, Joel Z. Leibo, Michael Luck (2025) Published on arXiv
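For readers unfamiliar with the iterated prisoner's dilemma used in the studies above, the sketch below plays fixed strategies against each other. The payoff values (5, 3, 1, 0) are the conventional textbook ones and are assumed here, as are the strategy and function names; none of them is taken from the cited papers, whose agents are LLM-driven rather than hand-coded.

```python
# (my move, opponent move) -> my payoff; "C" = cooperate, "D" = defect.
PAYOFFS = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def tit_for_tat(own_history, opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(own_history, opponent_history):
    """An aggressive strategy that never cooperates."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Return the cumulative payoffs of both players after the given rounds."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        score_a += PAYOFFS[(move_a, move_b)]
        score_b += PAYOFFS[(move_b, move_a)]
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat))      # (30, 30)
print(play(tit_for_tat, always_defect))    # (9, 14)
print(play(always_defect, always_defect))  # (10, 10)
```

Under these assumed payoffs, mutual cooperation earns the most collectively, yet the defector still beats a trusting opponent head to head, which is the tension these studies probe.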
Generative MAS off the shelf
-
MetaGPT is a framework for creating generative MAS dedicated to software development.
-
ChatDev is a framework for creating multi-agent collaboration networks for software development.
-
CAMEL proposes a generative multi-agent framework for accomplishing complex tasks.
-
Swarm is a framework for building asynchronous generative multi-agent systems.
-
AutoGen is a versatile open-source framework for creating generative multi-agent systems.
-
Magentic-One is a multi-agent architecture built on AutoGen where a lead Orchestrator agent is responsible for high-level planning, directing other agents and tracking task progress.
-
CrewAI combines LLM-based agents with precise control flow.
-
Agno is a lightweight framework for building generative multi-agent systems with workflows.
-
Bee Agent Framework is a framework for building and deploying generative multi-agent workflows that manage and execute structured sequences of tasks.
Authors
Maxime MORGE
License
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.