When the models generate strategies, GPT-4.5 performs perfectly in the setups (a), (c) and (b) but
fails in setup (b) to differentiate the optimal strategy from a near-optimal one.
Llama3 adopts a random approach to decision-making rather than relying on a structured understanding of rationality.
Mistral-Small consistently achieves a 100% success rate across all setups, demonstrating robust reasoning abilities.
DeepSeek-R1 does not produce valid responses, further reinforcing that it may not be a viable candidate
for generating rational strategies.
GPT-4.5 achieves perfect performance in the standard (a) setup but struggles significantly with implicit belief
When the models generate individual actions, GPT-4.5 achieves perfect performance in the standard (a) setup but struggles significantly with implicit belief
when the payoff structure changes (b, c, d). This suggests that while it excels when conditions are straightforward,
it is confused by the altered payoffs.
Llama3 demonstrates the most consistent and robust performance, capable of adapting to various belief types
...
...
@@ -189,10 +208,10 @@ Mistral-Small, while performing well with given and explicit beliefs, faces chal
DeepSeek-R1 appears to be the least capable, suggesting it may not be an ideal candidate for modeling second-order rationality.
## Guess the Next Move
## Belief
In order to evaluate the ability of LLMs to predict the opponent’s next move, we consider a
simplified version of the Rock-Paper-Scissors game.
In order to evaluate the ability of LLMs to refine belief by predicting the opponent’s next move,
we consider a simplified version of the Rock-Paper-Scissors game.
Rules:
1. The opponent follows a hidden strategy (repeating pattern).
...
...
@@ -221,7 +240,7 @@ adopts a more complex pattern. Neither Llama3 nor DeepSeek-R1 were able to gener

## Rock-Paper-Scissors
## From belief to action
To evaluate the ability of LLMs to predict not only the opponent’s next move but also to act rationally
based on their prediction, we consider the Rock-Paper-Scissors (RPS) game.
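
As a point of comparison for what "acting rationally on a prediction" means here, the sketch below shows a simple non-LLM baseline: it predicts the opponent's next move and then plays the move that beats it. It is an illustration only, not the evaluation code of this project. The names `BEATS`, `predict_next` and `best_response` are hypothetical, and the assumption that the opponent repeats a fixed pattern is borrowed from the simplified game described earlier.

```python
from collections import Counter

# For each move, the move that beats it in standard Rock-Paper-Scissors.
BEATS = {"Rock": "Paper", "Paper": "Scissors", "Scissors": "Rock"}


def predict_next(history):
    """Predict the opponent's next move from their past moves.

    Assumption (hypothetical baseline): the opponent repeats a fixed pattern.
    We look for the shortest period consistent with the whole history and
    extend it by one step.
    """
    n = len(history)
    for period in range(1, n):
        if all(history[i] == history[i - period] for i in range(period, n)):
            return history[n - period]
    # No repeating pattern found: fall back to the most frequent move so far,
    # or an arbitrary default when there is no history at all.
    if history:
        return Counter(history).most_common(1)[0][0]
    return "Rock"


def best_response(history):
    """Play the move that beats the predicted opponent move."""
    return BEATS[predict_next(history)]


if __name__ == "__main__":
    observed = ["Rock", "Paper", "Scissors", "Rock", "Paper"]
    print(predict_next(observed), best_response(observed))
```

A model that predicts the opponent's move correctly but does not choose the corresponding entry of `BEATS` has formed the right belief without acting rationally on it, which is the distinction this section evaluates.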