Commit 44dceb1a authored by Maxime Morge

PyGAAMAS: minor corrections of README.md

In this game, an investor allocates a basket $x_t=(x^A_t, x^B_t)$ of $100$ points between
two assets: Asset A and Asset B. The value of these points depends on random prices $p_t=(p_{t}^A, p_t^B)$,
which determine the monetary return per allocated point. For example, if $p_t^A = 0.8$ and $p_t^B = 0.5$,
each point assigned to Asset A is worth $\$0.8$, while each point allocated to Asset B yields $\$0.5$.
The game is played $25$ times to assess the consistency of the investor’s decisions.
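
As a point of reference, a minimal sketch of the game loop in Python is given below; the uniform price range and the greedy `investor` callback are illustrative assumptions, not the actual protocol, which queries a generative agent for each allocation.

```python
import random

def play_investment_game(investor, rounds=25, points=100):
    """Run the allocation game: each round, draw random prices p_t and
    pay the investor the monetary return p_t . x_t of its allocation."""
    returns = []
    for _ in range(rounds):
        # Random prices (p^A_t, p^B_t): the dollar value of one point.
        p_a = round(random.uniform(0.1, 1.0), 1)
        p_b = round(random.uniform(0.1, 1.0), 1)
        x_a = investor(p_a, p_b, points)       # points put on Asset A
        x_b = points - x_a                     # the rest goes to Asset B
        returns.append(p_a * x_a + p_b * x_b)  # monetary return p_t . x_t
    return returns

# A return-maximising investor puts all 100 points on the pricier asset.
greedy = lambda p_a, p_b, n: n if p_a >= p_b else 0
print(sum(play_investment_game(greedy)))       # total return over 25 rounds
```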
To evaluate the rationality of these decisions, we use Afriat's
critical cost efficiency index (CCEI), a widely used measure in revealed preference analysis.
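
Concretely, the CCEI is the largest deflation factor $e \le 1$ such that the observed choices still satisfy the Generalised Axiom of Revealed Preference (GARP) once every budget $p_t \cdot x_t$ is scaled by $e$. The sketch below follows that textbook definition with a binary search; it is an illustration, not necessarily the implementation used in PyGAAMAS.

```python
import numpy as np

def satisfies_garp(prices, bundles, e):
    """GARP at efficiency e: x_s is directly revealed preferred to x_t
    when e * p_s.x_s >= p_s.x_t; after taking the transitive closure,
    no strict revealed-preference cycle may remain."""
    spend = prices @ bundles.T                 # spend[s, t] = p_s . x_t
    own = np.diag(spend)                       # own[s] = p_s . x_s
    r = e * own[:, None] >= spend              # direct relation R0
    for k in range(len(bundles)):              # Floyd-Warshall closure
        r = r | (r[:, k:k+1] & r[k:k+1, :])
    strict = e * own[:, None] > spend          # strict relation P0
    return not np.any(r & strict.T)            # no cycle -> GARP holds

def ccei(prices, bundles, tol=1e-4):
    """Afriat's index: the largest e in [0, 1] at which GARP still holds."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if satisfies_garp(prices, bundles, mid):
            lo = mid                           # consistent: try a larger e
        else:
            hi = mid                           # violated: deflate further
    return lo

# Hypothetical data: prices p_t and the allocations x_t chosen each round.
p = np.array([[0.8, 0.5], [0.5, 0.8]])
x = np.array([[40.0, 60.0], [60.0, 40.0]])
print(f"CCEI = {ccei(p, x):.2f}")              # 1.00 -> fully consistent
```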
The table below evaluates the models' ability to generate second-order rational behaviour for player 1. The configurations
where CR improves second-order rationality are shown in bold, and those where CR degrades it are in italics.
When generating strategies, <tt>GPT-4.5</tt> consistently exhibits second-order rational behaviour in all configurations
except (b), where it fails to distinguish the optimal action from a nearly optimal one. <tt>Llama3</tt> makes decisions randomly,
showing no strong pattern of rational behaviour. In contrast, <tt>Mistral-Small</tt> and <tt>Mixtral-8x7B</tt>
demonstrate strong capabilities across all conditions, consistently generating second-order rational strategies.
<tt>Llama3.3:latest</tt> performs well with given and explicit beliefs but struggles with implicit beliefs.
<tt>Qwen3</tt> generates irrational strategies. <tt>DeepSeek-R1</tt> does not produce valid responses in strategy generation.
When generating actions, <tt>Llama3.3:latest</tt> adapts well to different types of beliefs and adjustments in the payoff matrix
but struggles with implicit beliefs, particularly in configuration (d). <tt>GPT-4.5</tt> performs well in the initial
configuration (a) but encounters significant difficulties when the payoff structure changes in (b), (c), and (d),
especially with implicit beliefs. <tt>Mixtral-8x7B</tt> generally performs well but shows reduced accuracy for implicit beliefs,
particularly in less confident or under-specified contexts.
| | actions + CR | *0.90* | *0.90* | *0.86* | *0.50* | *0.50* | *0.50* | *0.76* | 0.96 | *0.70* | *0.67* | *0.83* | 0.67 |
| **Mixtral:8x7b** | actions | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.50 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.73 |
| | actions + CR | 1.00 | *0.96* | 1.00 | 1.00 | 1.00 | **1.00** | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | *0.28* |
| **Mistral-Small** | actions | 0.93 | 0.97 | 1.00 | 0.87 | 0.77 | 0.60 | 0.77 | 0.60 | 0.70 | 0.73 | 0.57 | 0.37 |
| | actions + CR | **1.00** | *0.93* | 1.00 | **0.95** | **0.96** | **0.90** | **0.90** | **0.76** | *0.43* | *0.67* | *0.40* | 0.37 |
| **Deepseek-R1:7b** | actions | 1.00 | 0.96 | 1.00 | 1.00 | 1.00 | 0.93 | 0.96 | 1.00 | 0.92 | 0.96 | 1.00 | 0.79 |
| | actions + CR | 1.00 | **1.00** | 1.00 | 1.00 | 1.00 | **1.00** | *0.90* | 1.00 | **1.00** | **1.00** | 1.00 | **1.00** |
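
For concreteness, the benchmark behind these accuracies can be sketched as follows: player 1 is second-order rational when it best-responds to the belief that player 2 plays its own rational (here, dominant) action. The $2 \times 2$ payoff matrix below is a hypothetical placeholder, not one of the actual configurations (a)-(d).

```python
# PAYOFFS[(a1, a2)] = (payoff to player 1, payoff to player 2);
# hypothetical numbers, not one of the configurations (a)-(d).
PAYOFFS = {
    ("A", "X"): (15, 10), ("A", "Y"): (5, 5),
    ("B", "X"): (0, 10),  ("B", "Y"): (10, 5),
}
ACTIONS_1, ACTIONS_2 = ("A", "B"), ("X", "Y")

def rational_action_2():
    """A rational player 2 plays its dominant action: the one that is a
    best reply to every possible action of player 1 (here, X).
    Returns None if no dominant action exists."""
    for a2 in ACTIONS_2:
        if all(PAYOFFS[(a1, a2)][1] >= PAYOFFS[(a1, b2)][1]
               for a1 in ACTIONS_1 for b2 in ACTIONS_2):
            return a2
    return None

def second_order_rational_action_1():
    """Player 1 is second-order rational when it best-responds to the
    belief that player 2 is rational."""
    believed_a2 = rational_action_2()
    return max(ACTIONS_1, key=lambda a1: PAYOFFS[(a1, believed_a2)][0])

print(second_order_rational_action_1())  # -> "A" for this matrix
```

A model's generated action or strategy is then scored against this benchmark move; accuracies like those in the table can be obtained as the proportion of matches over repeated trials.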
To assess whether the models integrate beliefs about the opponent's move into their decision-making, we analyse
the performance of each generative agent in the RPS game. In this setup, a victory awards 2 points, a draw 1 point,
and a loss 0 points.
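
In code, this scoring rule is straightforward (a minimal sketch):

```python
BEATS = {"Rock": "Scissors", "Paper": "Rock", "Scissors": "Paper"}

def score(move: str, opponent: str) -> int:
    """RPS scoring used here: 2 points for a win, 1 for a draw, 0 for a loss."""
    if move == opponent:
        return 1
    return 2 if BEATS[move] == opponent else 0
```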
The figure below illustrates the average points earned per round, along with
the 95% confidence interval, for each LLM facing constant strategies
when the model generates one-shot actions.
Even if <tt>Mixtral:8x7b</tt>, <tt>Mistral-Small</tt>, and <tt>Qwen3</tt> accurately predict their
opponent’s move, they fail to integrate this belief into
their decision-making process. Only <tt>Llama3.3:latest</tt> is capable of inferring
the opponent’s behaviour to choose the winning move.
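
That winning behaviour amounts to countering the predicted move rather than merely predicting it. A minimal sketch against a constant opponent, with a hypothetical `best_reply` helper that is not the project's code:

```python
from collections import Counter

# The move that beats each move, e.g. "Paper" beats "Rock".
COUNTER = {"Rock": "Paper", "Paper": "Scissors", "Scissors": "Rock"}

def best_reply(opponent_history):
    """Predict a constant opponent by majority vote over its past moves,
    then integrate that belief by playing the counter-move."""
    if not opponent_history:
        return "Rock"  # arbitrary opening move with no history yet
    predicted, _ = Counter(opponent_history).most_common(1)[0]
    return COUNTER[predicted]

print(best_reply(["Scissors"] * 5))  # -> "Rock": worth 2 points per round
```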