Commit 9c17189d authored by Maxime Morge

PyGAAMAS: Add Anonymous repositery

parent baa10a40
...@@ -57,9 +57,8 @@ analyzing the frequency of the opponent’s past moves and selecting the less ...@@ -57,9 +57,8 @@ analyzing the frequency of the opponent’s past moves and selecting the less
common one. \texttt{Qwen3}, by contrast, relies on randomness, choosing moves common one. \texttt{Qwen3}, by contrast, relies on randomness, choosing moves
unpredictably while presuming the opponent will mirror its choice. unpredictably while presuming the opponent will mirror its choice.
\texttt{LLama3} does not implement a functioning strategy. Overall, these \texttt{LLama3} does not implement a functioning strategy. Overall, these
model-generated strategies are simplistic and heuristic-based, often lacking the model-generated strategies are simplistic and heuristic-driven, often lacking
credibility and adaptability needed for effective play in adversarial settings the credibility and adaptability required to simulate human behavior.
like MP.
Fig.~\ref{fig:mp_prediction_constant} (resp. Fig.~\ref{fig:mp_payoff_constant}) Fig.~\ref{fig:mp_prediction_constant} (resp. Fig.~\ref{fig:mp_payoff_constant})
illustrates the average prediction accuracy (resp. the number of points earned) illustrates the average prediction accuracy (resp. the number of points earned)
...@@ -73,14 +72,14 @@ information into their action selection to choose the winning move. ...@@ -73,14 +72,14 @@ information into their action selection to choose the winning move.
\begin{figure}[htbp] \begin{figure}[htbp]
\centering \centering
\includegraphics[width=\columnwidth]{figures/figures/mp/mp_prediction_constant.pdf} \includegraphics[width=\columnwidth]{figures/figures/mp/mp_prediction_constant.pdf}
\caption{Prediction accuracy per round against a constant opponent strategy.} \caption{Prediction accuracy per round against a constant strategy.}
\label{fig:mp_prediction_constant} \label{fig:mp_prediction_constant}
\end{figure} \end{figure}
\begin{figure}[htbp] \begin{figure}[htbp]
\centering \centering
\includegraphics[width=\columnwidth]{figures/figures/mp/mp_payoff_constant.pdf} \includegraphics[width=\columnwidth]{figures/figures/mp/mp_payoff_constant.pdf}
\caption{Average points per round against a constant opponent strategy.} \caption{Average points per round against a constant strategy.}
\label{fig:mp_payoff_constant} \label{fig:mp_payoff_constant}
\end{figure} \end{figure}
...@@ -93,13 +92,13 @@ alternating strategy, is barely better than a random strategy. ...@@ -93,13 +92,13 @@ alternating strategy, is barely better than a random strategy.
\begin{figure}[htbp] \begin{figure}[htbp]
\centering \centering
\includegraphics[width=\columnwidth]{figures/figures/mp/mp_prediction_altern.pdf} \includegraphics[width=\columnwidth]{figures/figures/mp/mp_prediction_altern.pdf}
\caption{Prediction accuracy per round against an alternating opponent strategy.} \caption{Prediction accuracy per round against an alternating strategy.}
\label{fig:mp_prediction_altern} \label{fig:mp_prediction_altern}
\end{figure} \end{figure}
\begin{figure}[htbp] \begin{figure}[htbp]
\centering \centering
\includegraphics[width=\columnwidth]{figures/figures/mp/mp_payoff_altern.pdf} \includegraphics[width=\columnwidth]{figures/figures/mp/mp_payoff_altern.pdf}
\caption{Average points per round against an alternating opponent strategy.} \caption{Average points per round against an alternating strategy.}
\label{fig:mp_payoff_altern} \label{fig:mp_payoff_altern}
\end{figure} \end{figure}
\section{Conclusion} \section{Conclusion}
\label{sec:conclusion} \label{sec:conclusion}
In this paper, we evaluate the ability of generative agents to exhibit credible In this paper, we evaluated whether GAs can act in socially plausible ways,
behavior in social situations, adapt to their interlocutor, and coordinate with align their strategies with others, and adapt dynamically to their environment.
them. \texttt{GPT-4.5} and \texttt{Mistral-Small} demonstrate human-likeness, \texttt{GPT-4.5} and \texttt{Mistral-Small} demonstrate human-likeness, but only
but only \texttt{Mistral-Small} consistently exhibits sensitivity across \texttt{Mistral-Small} consistently exhibits sensitivity across incentive
incentive environments. In contrast, \texttt{LLama3} and \texttt{Qwen3} display environments. In contrast, \texttt{LLama3} and \texttt{Qwen3} display rigid
rigid rationality, unconditional cooperation, or inconsistent and unstable rationality, unconditional cooperation, or inconsistent and unstable behavior.
behavior. Unlike human, all models, regardless of size or architecture, struggle Unlike human, all GAs, regardless of size or architecture, struggle to exploit
to exploit perceived regularities in the interlocutor’s behavior. Although some perceived regularities in the interlocutor’s behavior. Although some models are
models are able to detect patterns, most fail to translate these beliefs into able to detect patterns, most fail to translate these beliefs into their own
their own decisions. When it comes to coordination, most generative agents decisions. When it comes to coordination, most generative agents struggle to
struggle to align their actions in games with multiple equilibria. This failure align their actions in games with multiple equilibria. This failure stems from a
stems from a limited ability to model the opponent’s behavior accurately and limited ability to model the opponent’s behavior accurately and incorporate
incorporate theses beliefs in their practical reasoning. Although communication theses beliefs in their practical reasoning. Although communication is expected
is expected to improve coordination, it often introduces ambiguity instead: to improve coordination, it often introduces ambiguity instead: models generate
models generate cooperative messages that are not followed by consistent cooperative messages that are not followed by consistent actions, leading to
actions, leading to misaligned expectations and degraded coordination. Only misaligned expectations and degraded coordination. Only \texttt{Qwen3} shows
\texttt{Qwen3} shows reliable coordination behavior, swiftly incorporating reliable coordination behavior, swiftly incorporating beliefs about the
beliefs about the opponent’s strategy even without communication. opponent’s strategy even without communication.
The key challenge for GAs lies in refining their beliefs and integrating them The key challenge for GAs lies in refining their beliefs and integrating them
effectively into decision-making to better adapt to their environment and effectively into decision-making to better adapt to their environment and
......
...@@ -46,12 +46,12 @@ outcome. This reflects bounded rationality, focal point reasoning, and a natural ...@@ -46,12 +46,12 @@ outcome. This reflects bounded rationality, focal point reasoning, and a natural
bias toward coordination, even in the absence of explicit bias toward coordination, even in the absence of explicit
signaling~\cite{cooper89rje}. signaling~\cite{cooper89rje}.
%To evaluate whether generative agents can coordinate effectively, %To evaluate whether generative agents can coordinate effectively,
We use a repeated version of the BoS game. Each experiment consists of 10 We also use a repeated version of the BoS game. Each experiment consists of 10
rounds. In each round, the agent is required to predict the opponent’s next rounds. In each round, The GA must predict the opponent’s next move, earning $1$
move, earning $1$ point for a correct prediction and $0$ otherwise. This point for a correct prediction and $0$ otherwise. This prediction is then
prediction is then integrated into the agent’s decision-making process. To avoid integrated into the agent’s decision-making process. To avoid gender biais, we
gender biais, we replace descriptive player labels and action labels with replace descriptive player labels and action labels with letters. No model
letters. No model successfully produced a valid strategy. successfully produced a valid strategy.
% We consider two coordination contexts: in the first, agents interact % We consider two coordination contexts: in the first, agents interact
% with a simulated human strategy that follows a fixed behavioral pattern; in the % with a simulated human strategy that follows a fixed behavioral pattern; in the
...@@ -66,9 +66,9 @@ hidden alternating strategy. ...@@ -66,9 +66,9 @@ hidden alternating strategy.
% simplified strategy: alternating between the two options. % simplified strategy: alternating between the two options.
Fig.~\ref{fig:bos_prediction} (resp. Fig.~\ref{fig:bos_payoff}) illustrates the Fig.~\ref{fig:bos_prediction} (resp. Fig.~\ref{fig:bos_payoff}) illustrates the
average prediction accuracy (resp. number of points earned) per round for each average prediction accuracy (resp. number of points earned) per round for each
model. Models consistently failed to predict the opponent’s next move and model. GAs consistently failed to predict the opponent’s next move and
coordinate effectively. This is mainly because they did not recognize the coordinate effectively. This is mainly because they did not recognize the
opponent’s looping behavior. Instead, they assume the opponent is reactive, opponent’s alternating behavior. Instead, they assume the opponent is reactive,
random, or goal-directed, overcomplicating a simple repeating strategy. As a random, or goal-directed, overcomplicating a simple repeating strategy. As a
result, they tried to predict rational behavior instead of adapting to the result, they tried to predict rational behavior instead of adapting to the
actual pattern. actual pattern.
...@@ -76,14 +76,14 @@ actual pattern. ...@@ -76,14 +76,14 @@ actual pattern.
\begin{figure}[htbp] \begin{figure}[htbp]
\centering \centering
\includegraphics[width=\columnwidth]{figures/figures/bos/bos_prediction.pdf} \includegraphics[width=\columnwidth]{figures/figures/bos/bos_prediction.pdf}
\caption{Prediction accuracy with a fixed strategy.} \caption{Prediction accuracy against a fixed strategy.}
\label{fig:bos_prediction} \label{fig:bos_prediction}
\end{figure} \end{figure}
\begin{figure}[htbp] \begin{figure}[htbp]
\centering \centering
\includegraphics[width=\columnwidth]{figures/figures/bos/bos_payoff.pdf} \includegraphics[width=\columnwidth]{figures/figures/bos/bos_payoff.pdf}
\caption{Average points in the coordination task with a fixed strategy.} \caption{Average points per round against a fixed strategy.}
\label{fig:bos_payoff} \label{fig:bos_payoff}
\end{figure} \end{figure}
...@@ -116,14 +116,13 @@ making it harder to coordinate. ...@@ -116,14 +116,13 @@ making it harder to coordinate.
\begin{figure}[htbp] \begin{figure}[htbp]
\centering \centering
\includegraphics[width=\columnwidth]{figures/figures/nbos/nbos_prediction.pdf} \includegraphics[width=\columnwidth]{figures/figures/nbos/nbos_prediction.pdf}
\caption{Prediction accuracy in the Agent-Agent \caption{Prediction accuracy against against a GA.}
coordination task.}
\label{fig:nbos_prediction} \label{fig:nbos_prediction}
\end{figure} \end{figure}
\begin{figure}[htbp] \begin{figure}[htbp]
\centering \centering
\includegraphics[width=\columnwidth]{figures/figures/nbos/nbos_payoff.pdf} \includegraphics[width=\columnwidth]{figures/figures/nbos/nbos_payoff.pdf}
\caption{Average points in the Agent-Agent coordination task.} \caption{Average points per rounds against a GA.}
\label{fig:nbos_payoff} \label{fig:nbos_payoff}
\end{figure} \end{figure}
...@@ -82,7 +82,7 @@ action generation by various models\footnote{N/A indicates that the model failed ...@@ -82,7 +82,7 @@ action generation by various models\footnote{N/A indicates that the model failed
\textbf{\texttt{D}} & (5, 0) & (1, 1) & (10, 1) & (2, 2) & (3, 1) & (2, 2) & (8, -3) & (2, 2) \\ \textbf{\texttt{D}} & (5, 0) & (1, 1) & (10, 1) & (2, 2) & (3, 1) & (2, 2) & (8, -3) & (2, 2) \\
\bottomrule \bottomrule
\end{tabular} \end{tabular}
\caption{Payoff matrices for different versions of the PD.} \caption{Payoff matrices for different variants of the PD.}
\label{tab:pd_payoffs} \label{tab:pd_payoffs}
\end{table*} \end{table*}
...@@ -114,7 +114,7 @@ action generation by various models\footnote{N/A indicates that the model failed ...@@ -114,7 +114,7 @@ action generation by various models\footnote{N/A indicates that the model failed
& $\top$ & 0.10 & 0.13 & 0.10 & 0.00 & 0.03 & 0.10 & 0.03 & 0.11 & 0.10 & 0.00 & 0.07 & 0.03 \\ & $\top$ & 0.10 & 0.13 & 0.10 & 0.00 & 0.03 & 0.10 & 0.03 & 0.11 & 0.10 & 0.00 & 0.07 & 0.03 \\
\bottomrule \bottomrule
\end{tabular} \end{tabular}
\caption{Cooperation rates across different settings and versions of the PD.} \caption{Cooperation rates across different settings and variants of the PD.}
\label{tab:model_pd_behavior} \label{tab:model_pd_behavior}
\end{table*} \end{table*}
...@@ -132,15 +132,15 @@ show the payoff sensitivity expected from human-like reasoning. ...@@ -132,15 +132,15 @@ show the payoff sensitivity expected from human-like reasoning.
it defects under the Rational prompt in high-risk or high-reward variants and it defects under the Rational prompt in high-risk or high-reward variants and
cooperates more under the Human prompt, it also modulates cooperation rates in cooperates more under the Human prompt, it also modulates cooperation rates in
response to payoffs, especially under the Human role. For example, cooperation response to payoffs, especially under the Human role. For example, cooperation
decreases slightly in the Cooperation Loss condition, suggesting some slightly decreases under the Cooperation Loss scenario, suggesting some
recognition of the increased risk of being exploited. Additionally, it is mostly recognition of the increased risk of being exploited. Additionally, it is
robust to anonymization. largely unaffected by anonymization.
In contrast, \texttt{Llama3} cooperates across all conditions and prompts, In contrast, \texttt{Llama3} cooperates across all conditions and prompts,
indicating a failure to internalize role differences or payoff structures. This indicating a failure to internalize role differences or payoff structures. The
model appears biased toward cooperation, likely due to training data priors, model exhibits a strong predisposition to cooperate, regardless of context,
rather than engaging in context-sensitive reasoning. Conversely, \texttt{Qwen3} likely due to training data priors. Conversely, \texttt{Qwen3} exhibits the
exhibits the opposite failure mode: it is overly rigid, rarely cooperating even opposite failure mode: it is overly rigid, rarely cooperating even under Human
under Human prompts, and shows erratic drops in cooperation under anonymization, prompts, and shows erratic drops in cooperation under anonymization, suggesting
suggesting semantic overreliance and poor role alignment. semantic overreliance and poor role alignment.
...@@ -30,7 +30,7 @@ this study assesses the capabilities of models such as ...@@ -30,7 +30,7 @@ this study assesses the capabilities of models such as
We focus on their ability to We focus on their ability to
make credible one-shot decisions, generate human-like strategies, adapt to their make credible one-shot decisions, generate human-like strategies, adapt to their
environment, and coordinate in social interactions\footnote{All code, prompts, environment, and coordinate in social interactions\footnote{All code, prompts,
and data traces will be available in a public repository.}. and data traces are available in a public repository~\cite{pygaamas}.}.
%All code, prompts, %All code, prompts,
% and data traces are available in a public repository~\cite{pygaamas}. % and data traces are available in a public repository~\cite{pygaamas}.
%These capabilities are evaluated through a series of %These capabilities are evaluated through a series of
...@@ -55,7 +55,7 @@ credible behavior simulating human-like decision-making in Sec.~\ref{sec:human}. ...@@ -55,7 +55,7 @@ credible behavior simulating human-like decision-making in Sec.~\ref{sec:human}.
Sec.~\ref{sec:belief} examine the ability of GAs to refine their beliefs about Sec.~\ref{sec:belief} examine the ability of GAs to refine their beliefs about
an opponent's next move and to integrate these predictions into their an opponent's next move and to integrate these predictions into their
decision-making, while Sec.~\ref{sec:coordination} investigates how they decision-making, while Sec.~\ref{sec:coordination} investigates how they
coordinate with other agents. The paper concludes in Sec.~\ref{sec:conclusion}. coordinate. The paper concludes in Sec.~\ref{sec:conclusion}.
%where we summarize the main contributions and %where we summarize the main contributions and
%propose directions for future research. %propose directions for future research.
@Misc{pygaamas, @Misc{pygaamas,
author = {St\'ephane Bonnevay and Maxime Morge}, author = {Anonymous},
title = {Python Generative Autonomous Agents and Multi-Agent Systems}, title = {Python Generative Autonomous Agents and Multi-Agent Systems},
howpublished = {https://gitlab.liris.cnrs.fr/mmorge/pygaamas}, howpublished = {https://zenodo.org/records/15608944},
year = {2025} year = {2025}
} }
...@@ -302,7 +302,7 @@ doi = {10.1177/1043463195007001004} ...@@ -302,7 +302,7 @@ doi = {10.1177/1043463195007001004}
} }
@misc{hua24arxiv, @misc{hua24arxiv,
title={Game-theoretic LLM: Agent Workflow for Negotiation Games}, title={{Game-theoretic LLM: Agent Workflow for Negotiation Games}},
author={Wenyue Hua and Ollie Liu and Lingyao Li and Alfonso Amayuelas and author={Wenyue Hua and Ollie Liu and Lingyao Li and Alfonso Amayuelas and
Julie Chen and Lucas Jiang and Mingyu Jin and Lizhou Fan and Julie Chen and Lucas Jiang and Mingyu Jin and Lizhou Fan and
Fei Sun and William Wang and Xintong Wang and Yongfeng Zhang}, Fei Sun and William Wang and Xintong Wang and Yongfeng Zhang},
......
No preview for this file type
...@@ -37,11 +37,11 @@ Mail} ...@@ -37,11 +37,11 @@ Mail}
% in the abstract % in the abstract
\begin{abstract} \begin{abstract}
Recent advances in Large Language Models (LLMs) have enabled the creation of Recent advances in Large Language Models (LLMs) have enabled the creation of
Generative Agents (GAs) capable of autonomous decision-making in interactive Generative Agents (GAs) capable of autonomous decision-making in interaction.
settings. This paper investigates whether GAs can exhibit socially credible This paper investigates whether GAs can exhibit socially credible
behavior. %, with a particular focus on their ability to coordinate. behavior. %, with a particular focus on their ability to coordinate.
Drawing from behavioral game theory, we evaluate five state-of-the-art models Drawing from behavioral game theory, we evaluate five state-of-the-art models
across three canonical game-theoretic environments. Our results show that across three canonical game-theoretic environments. Our results show that,
while some GAs can accurately predict their opponent’s behavior, few are able while some GAs can accurately predict their opponent’s behavior, few are able
to incorporate those predictions into decision-making. These behavioral flaws to incorporate those predictions into decision-making. These behavioral flaws
help explain why coordination remains especially challenging: most models help explain why coordination remains especially challenging: most models
......
...@@ -52,8 +52,12 @@ lacking humans’ sensitivity to incentives. ...@@ -52,8 +52,12 @@ lacking humans’ sensitivity to incentives.
% behavior, thereby lacking the sensitivity to incentives that is characteristic % behavior, thereby lacking the sensitivity to incentives that is characteristic
% of human-like reasoning. % of human-like reasoning.
While Morge~\cite{morge25paams} evaluates GAs on economic rationality and
strategic reasoning, we focus on their ability to make credible one-shot
decisions, generate human-like strategies, adapt to their environment, and
coordinate in social interactions.
Fontana \textit{et al.}~\cite{fontana24arxiv} assess whether agents understand Fontana \textit{et al.}~\cite{fontana24arxiv} assess whether GAs understand
game rules and history ex post, but not whether this informs their decisions. We game rules and history ex post, but not whether this informs their decisions. We
instead evaluate if agents explicitly incorporate beliefs and opponent modeling instead evaluate if agents explicitly incorporate beliefs and opponent modeling
into their strategies. into their strategies.
...@@ -71,7 +75,7 @@ failing, for instance, to adopt basic conventions such as alternation in the ...@@ -71,7 +75,7 @@ failing, for instance, to adopt basic conventions such as alternation in the
Battle of the Sexes game. To address this, they propose prompting agents to Battle of the Sexes game. To address this, they propose prompting agents to
imagine possible actions and their consequences before deciding. However, this imagine possible actions and their consequences before deciding. However, this
conditional reasoning proves effective mainly for smaller models and may degrade conditional reasoning proves effective mainly for smaller models and may degrade
performance in larger ones due to added complexity. %~\cite{pygaamas} performance in larger ones due to added complexity~\cite{pygaamas}.
While Akata \textit{et al.} attribute these failures to limited predictive While Akata \textit{et al.} attribute these failures to limited predictive
ability and a tendency to rigidly favor preferred options, we argue that the ability and a tendency to rigidly favor preferred options, we argue that the
most fundamental cause is GAs' inability to incorporate their beliefs into the most fundamental cause is GAs' inability to incorporate their beliefs into the
...@@ -112,10 +116,6 @@ models that can run on standard hardware. %~\cite{pygaamas} ...@@ -112,10 +116,6 @@ models that can run on standard hardware. %~\cite{pygaamas}
% prompting LLMs to generate algorithmic strategies, as in~\cite{willis25arxiv}, % prompting LLMs to generate algorithmic strategies, as in~\cite{willis25arxiv},
% rather than issuing multiple one-shot queries. % rather than issuing multiple one-shot queries.
While Morge~\cite{morge25paams} evaluates GAs
on economic rationality and strategic reasoning, we focus on their ability to
make credible one-shot decisions, generate human-like strategies, adapt to their
environment, and coordinate in social interactions.
Hua \textit{et al.}~\cite{hua24arxiv} show that GAs deviate from rationality as Hua \textit{et al.}~\cite{hua24arxiv} show that GAs deviate from rationality as
game complexity increases, and highlight the role of communication in fostering game complexity increases, and highlight the role of communication in fostering
......