We define four preferences for the dictator, each corresponding to a distinct form of social welfare:
1. **Egoism** maximizes the dictator’s income.
2. **Altruism** maximizes the recipient’s income.
3. **Utilitarianism** maximizes total income.
4. **Egalitarianism** maximizes the minimum income between the players.
We consider four allocation options where part of the money is lost in the division process,
each corresponding to one of the four preferences (see the sketch after the list):
- The dictator keeps **$500**, the recipient receives **$100**, and a total of **$400** is lost (**egoistic**).
- The dictator keeps **$100**, the recipient receives **$500**, and **$400** is lost (**altruistic**).
- The dictator keeps **$400**, the recipient receives **$300**, resulting in a loss of **$300** (**utilitarian**).
- The dictator keeps **$325**, the other player receives **$325**, and **$350** is lost (**egalitarian**).
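To make the correspondence concrete, here is a minimal Python sketch (our own illustration, not the code used in the experiments; the option names and data layout are assumptions) that computes which of the four options each social-welfare function selects:

```python
# Minimal sketch (ours, not the experiments' code): which allocation each
# social-welfare preference selects among the four options described above.
OPTIONS = {
    "egoistic option":    (500, 100),  # (dictator, recipient); $400 lost
    "altruistic option":  (100, 500),  # $400 lost
    "utilitarian option": (400, 300),  # $300 lost
    "egalitarian option": (325, 325),  # $350 lost
}

WELFARE = {
    "egoism":         lambda d, r: d,          # dictator's income
    "altruism":       lambda d, r: r,          # recipient's income
    "utilitarianism": lambda d, r: d + r,      # total income
    "egalitarianism": lambda d, r: min(d, r),  # minimum income
}

for preference, welfare in WELFARE.items():
    best = max(OPTIONS, key=lambda name: welfare(*OPTIONS[name]))
    print(f"{preference:>15}: {best} {OPTIONS[best]}")
```

Each preference selects its namesake option; in particular, the utilitarian choice follows from 400 + 300 = 700 exceeding every other total.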
The table below evaluates the ability of the models to align with these preferences.
- When generating **strategies**, the models align perfectly with the predefined preferences, except for **`DeepSeek-R1`**, which does not generate valid code.
- When generating **actions**, **`GPT-4.5`** aligns well with the other preferences but struggles with **utilitarianism**.
- **`Llama3`** aligns well with **egoistic** and **altruistic** preferences but shows lower adherence to **utilitarian** and **egalitarian** choices.
- **`Mistral-Small`** aligns best with **altruistic** preferences and performs moderately on **utilitarianism**, but struggles with **egoistic** and **egalitarian** preferences.
- **`DeepSeek-R1`** primarily aligns with **utilitarianism** but has low accuracy for the other preferences.

Bad action selections can be explained either by arithmetic errors (e.g., failing to notice that 500 + 100 > 400 + 300 does not hold) or by misinterpretations of the preferences (e.g., ‘I’m choosing to prioritize the common interest by keeping a relatively equal split with the other player’).
When the models generate strategies, GPT-4.5 performs perfectly in setups (a), (c), and (d) but
fails in setup (b), where it does not differentiate the optimal strategy from a near-optimal one.
Llama3 adopts a random approach to decision-making rather than demonstrating a structured understanding of rationality.
Mistral-Small consistently achieves a 100% success rate across all setups, demonstrating robust reasoning abilities.
DeepSeek-R1 does not produce valid responses, further reinforcing that it may not be a viable candidate
for generating rational strategies.
When generating individual actions, GPT-4.5 achieves perfect performance in the standard setup (a) but struggles significantly with implicit beliefs
when the payoff structure changes (b, c, d). This suggests that while it excels when conditions are straightforward,
it is confused by the altered payoffs.
Llama3 demonstrates the most consistent and robust performance, capable of adapting to various belief types
and adjusted payoff matrices.
Mistral-Small, while performing well with given and explicit beliefs, faces challenges with implicit beliefs, particularly in version (d).
DeepSeek-R1 appears to be the least capable, suggesting it may not be an ideal candidate for modeling second-order rationality.
## Belief
In order to evaluate the ability of LLMs to refine their beliefs by predicting the opponent’s next move,
we consider a simplified version of the Rock-Paper-Scissors game.
Rules:
1. The opponent follows a hidden strategy (repeating pattern).
2. The player must predict the opponent’s next move (Rock, Paper, or Scissors).
3. A correct guess earns 1 point, and an incorrect guess earns 0 points.
4. The game can run for N rounds, and the player’s accuracy is evaluated at each round.
We evaluate the performance of the models (GPT-4.5, Llama3, Mistral-Small, and DeepSeek-R1)
in identifying these patterns by calculating the average points earned per round.
The temperature is fixed at 0.7, and each game of 10 rounds is played 30 times.
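As a point of reference, this protocol can be simulated as in the sketch below; the repeating pattern and the uniform-random baseline predictor are illustrative assumptions, not the opponents or models actually evaluated:

```python
import random

MOVES = ["Rock", "Paper", "Scissors"]

def play_game(predict, pattern, n_rounds=10):
    """Play one game: `predict` receives the opponent's past moves and must
    guess the next one; a correct guess earns 1 point."""
    history, points = [], 0
    for t in range(n_rounds):
        opponent_move = pattern[t % len(pattern)]  # hidden repeating pattern
        points += int(predict(history) == opponent_move)
        history.append(opponent_move)
    return points

def random_predictor(history):
    """Baseline: guess uniformly at random."""
    return random.choice(MOVES)

# 30 games of 10 rounds against an assumed Rock-Paper-Scissors cycle,
# reported as average points per round.
pattern = ["Rock", "Paper", "Scissors"]
scores = [play_game(random_predictor, pattern) for _ in range(30)]
print(sum(scores) / (30 * 10))  # close to 1/3 in expectation
```

A uniformly random predictor earns about a third of a point per round, which serves as the chance-level baseline for the results that follow.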
The figures below present the average points earned per round for each model against
the three opponent patterns, both when the models were prompted to generate
a strategy and when they were prompted to generate specific actions. The 95% confidence intervals are also shown.
We find that the action generation performance of LLMs, whether proprietary or open-weight, is
only marginally better than a random strategy, which earns one third of a point per round in expectation.
The strategies generated by GPT-4.5 and Mistral-Small predict the opponent’s next
move by identifying the opponent’s most frequent move in past rounds. While this strategy
is effective against constant behavior, it fails to predict the opponent’s next move when the opponent
adopts a more complex pattern. Neither Llama3 nor DeepSeek-R1 was able to generate a valid strategy.
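That frequency-counting strategy can be paraphrased in a few lines of Python; this is our reconstruction of the idea, not the models' generated code:

```python
from collections import Counter

def frequency_strategy(history):
    """Predict the opponent's most frequent past move; fall back to 'Rock'
    when there is no history yet. A paraphrase of the strategy described
    above, not the models' verbatim output."""
    if not history:
        return "Rock"
    return Counter(history).most_common(1)[0][0]

print(frequency_strategy(["Paper", "Paper", "Paper"]))
# -> Paper: converges immediately against a constant opponent.
print(frequency_strategy(["Rock", "Paper", "Scissors", "Rock", "Paper"]))
# -> Rock: it ignores the cyclic structure, so against such a pattern it
#    scores close to chance.
```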
If Player 2 is rational, they must choose A because B is strictly dominated. If
Player 1 is rational, they may choose either X or Y: X is the best response if
Player 1 believes that Player 2 will choose A, while Y is the best response if
Player 1 believes that Player 2 will choose B. If Player 1 satisfies
second-order rationality, they must play X. To neutralize biases in large
language models (LLMs) related to the naming of actions, we reverse the action
names in half of the experiments.
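To make the reasoning chain explicit, the following sketch encodes a hypothetical payoff matrix (the numbers are ours, chosen only to satisfy the structure just described, not the values used in the experiments) and checks dominance and best responses:

```python
# Hypothetical payoff matrix (illustrative numbers only): each entry is
# (Player 1 payoff, Player 2 payoff).
PAYOFFS = {
    ("X", "A"): (4, 3), ("X", "B"): (0, 1),
    ("Y", "A"): (3, 2), ("Y", "B"): (2, 0),
}

def strictly_dominates(action, other):
    """True if `action` gives Player 2 a strictly higher payoff than `other`
    against every action of Player 1."""
    return all(PAYOFFS[(a1, action)][1] > PAYOFFS[(a1, other)][1]
               for a1 in ("X", "Y"))

def best_response_p1(p2_action):
    """Player 1's best response to a conjectured action of Player 2."""
    return max(("X", "Y"), key=lambda a1: PAYOFFS[(a1, p2_action)][0])

assert strictly_dominates("A", "B")   # a rational Player 2 must choose A
assert best_response_p1("A") == "X"   # so a second-order rational Player 1 plays X
assert best_response_p1("B") == "Y"   # Y is justified only by a belief that Player 2 plays B
```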
We consider three types of beliefs (sketched below):
- an *implicit belief*, where the optimal action must be deduced from
the natural language description of the payoff matrix;
- an *explicit belief*, based on the analysis of Player 2's actions, meaning that
the fact that B is strictly dominated by A is provided in the prompt;
- a *given belief*, where the optimal action for Player 1 is explicitly given in the prompt.
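The three conditions differ only in what the prompt reveals about Player 2; the fragments below are hypothetical illustrations of that difference, not the actual prompts used in the experiments:

```python
# Hypothetical prompt fragments for the three belief conditions; the actual
# wording used in the experiments may differ.
matrix_description = "You are Player 1. <natural-language description of the payoff matrix>"

belief_prompts = {
    "implicit": matrix_description,
    "explicit": matrix_description
                + " Note that action B is strictly dominated by action A for Player 2.",
    "given":    matrix_description
                + " Your optimal action is X.",
}
```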
We first evaluate the rationality of the agents and then their second-order rationality.
### First Order Rationality
The table below evaluates the models’ ability to generate rational