Commit 44dceb1a, authored 1 week ago by Maxime Morge

PyGAAMAS: minor corrections of README.md
parent 6013a082

Showing 1 changed file: README.md (+7 additions, −23 deletions)
@@ -26,8 +26,8 @@ erratically to changes in the game’s parameters.
In this game, an investor allocates a basket $x_t=(x^A_t, x^B_t)$ of $100$ points between
two assets: Asset A and Asset B. The value of these points depends on random prices $p_t=(p_{t}^A, p_t^B)$,
which determine the monetary return per allocated point. For example, if $p_t^A = 0.8$ and $p_t^B = 0.5$,
each point assigned to Asset A is worth $\$0.8$, while each point allocated to Asset B yields $\$0.5$.
The game is played $25$ times to assess the consistency of the investor’s decisions.
To evaluate the rationality of these decisions, we use Afriat's
critical cost efficiency index (CCEI), a widely used measure in
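Concretely, the CCEI is the largest factor $e \in [0,1]$ by which each observed budget $p_t \cdot x_t$ can be deflated while the observed choices still satisfy the Generalized Axiom of Revealed Preference (GARP). Below is a minimal NumPy sketch of that computation; `garp_satisfied` and `ccei` are illustrative names, not functions from this repository:

```python
import numpy as np

def garp_satisfied(prices, bundles, e=1.0):
    """Check GARP when every budget is deflated by the efficiency factor e."""
    cost = prices @ bundles.T            # cost[t, s] = p_t . x_s
    own = np.diag(cost)                  # own[t]     = p_t . x_t
    # Direct revealed preference at level e: x_t R x_s iff e * p_t.x_t >= p_t.x_s
    R = (e * own[:, None]) >= cost - 1e-9
    # Transitive closure (Floyd-Warshall over the boolean relation)
    for k in range(len(bundles)):
        R = R | (R[:, k:k+1] & R[k:k+1, :])
    # Violation: x_t is revealed preferred to x_s, yet x_s is strictly
    # directly revealed preferred to x_t at level e
    P = (e * own[:, None]) > cost + 1e-9
    return not np.any(R & P.T)

def ccei(prices, bundles, tol=1e-4):
    """Largest e in [0, 1] at which the choices still satisfy GARP."""
    if garp_satisfied(prices, bundles, 1.0):
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if garp_satisfied(prices, bundles, mid) else (lo, mid)
    return lo

# Hypothetical usage: 25 rounds of the investment game described above.
rng = np.random.default_rng(0)
prices = rng.uniform(0.1, 1.0, size=(25, 2))        # random (p_t^A, p_t^B)
bundles = rng.dirichlet([1.0, 1.0], size=25) * 100  # allocations of 100 points
print(ccei(prices, bundles))
```

A CCEI of 1 means the 25 decisions are consistent with maximising some utility function; values below 1 quantify how much budget waste must be tolerated to rationalise them.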
@@ -274,22 +274,6 @@ informed decision-making.
The table below evaluates the models' ability to generate second-order rational behaviour for player 1. The configurations
where CR improves second-order rationality are in bold, and those where CR degrades this rationality are in italics.
When the models generate strategies, <tt>GPT-4.5</tt> exhibits second-order
rational behaviour in configurations (a), (c), and (d), but fails in
configuration (b) to distinguish the optimal action from a nearly optimal one.
Llama3 makes its decision randomly. Mistral-Small shows strong
capabilities in generating second-order rational behaviour. DeepSeek-R1
does not produce valid responses.
When generating actions, <tt>Llama3</tt> adapts to different types of beliefs
and adjustments in the payoff matrix. <tt>GPT-4.5</tt> performs well in the
initial configuration (a), but encounters significant difficulties when the
payoff structure changes (b, c, d), particularly with implicit beliefs. Although
Mistral-Small works well with given or explicit beliefs, it faces
difficulties with implicit beliefs, especially in variant (d). <tt>DeepSeek-R1</tt>
does not appear to be a good candidate for simulating second-order rationality.
When generating strategies, <tt>GPT-4.5</tt> consistently exhibits second-order rational behavior in all configurations
except (b), where it fails to distinguish the optimal action from a nearly optimal one. Llama3 makes decisions randomly,
showing no strong pattern of rational behavior. In contrast, <tt>Mistral-Small</tt> and <tt>Mixtral-8x7B</tt>
@@ -297,7 +281,7 @@ demonstrate strong capabilities across all conditions, consistently generating
<tt>Llama3.3:latest</tt> performs well with given and explicit beliefs but struggles with implicit beliefs.
<tt>Qwen3</tt> generates irrational strategies. <tt>DeepSeek-R1</tt> does not produce valid responses in strategy generation.
When generating actions, Llama3.3:latest adapts well to different types of beliefs and adjustments in the payoff matrix
When generating actions, <tt>Llama3.3:latest</tt> adapts well to different types of beliefs and adjustments in the payoff matrix
but struggles with implicit beliefs, particularly in configuration (d). <tt>GPT-4.5</tt> performs well in the initial
configuration (a) but encounters significant difficulties when the payoff structure changes in (b), (c), and (d),
especially with implicit beliefs. <tt>Mixtral-8x7B</tt> generally performs well but shows reduced accuracy for implicit beliefs
@@ -336,7 +320,7 @@ particularly in less confident or under-specified contexts.
| | actions + CR | *0.90* | *0.90* | *0.86* | *0.50* | *0.50* | *0.50* | *0.76* | 0.96 | *0.70* | *0.67* | *0.83* | 0.67 |
| **Mixtral:8x7b** | actions | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.50 | 1.0 | 1.0 | 1.0 | 1.00 | 1.00 | 0.73 |
| | actions + CR | 1.00 | *0.96* | 1.00 | 1.00 | 1.00 | **1.0** | 1.0 | 1.0 | 1.0 | 1.00 | 1.00 | *0.28* |
| **Listral-Small** | actions | 0.93 | 0.97 | 1.00 | 0.87 | 0.77 | 0.60 | 0.77 | 0.60 | 0.70 | 0.73 | 0.57 | 0.37 |
| **Mistral-Small** | actions | 0.93 | 0.97 | 1.00 | 0.87 | 0.77 | 0.60 | 0.77 | 0.60 | 0.70 | 0.73 | 0.57 | 0.37 |
| | actions + CR | **1.00** | *0.93* | 1.00 | **0.95** | **0.96** | **0.90** | **0.90** | **0.76** | *0.43* | *0.67* | *0.40* | 0.37 |
| **Deepseek-R1:7b** | actions | 1.00 | 0.96 | 1.00 | 1.00 | 1.00 | 0.93 | 0.96 | 1.00 | 0.92 | 0.96 | 1.00 | 0.79 |
| | actions + CR | 1.00 | **1.00** | 1.00 | 1.00 | 1.00 | **1.00** | *0.90* | 1.00 | **1.00** | **1.00** | 1.00 | **1.00** |
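For reference, the property the table scores can be made concrete: a second-order rational player 1 best-responds to the belief that player 2 is rational, i.e. that player 2 never plays a strictly dominated action. Below is a minimal Python sketch, assuming a uniform belief over the opponent's undominated actions; the payoff matrices are hypothetical placeholders, not the paper's configurations (a)-(d):

```python
import numpy as np

def undominated(payoffs_col):
    """Column-player actions that are not strictly dominated,
    i.e. the actions a (first-order) rational player 2 might play."""
    n = payoffs_col.shape[1]
    return [j for j in range(n)
            if not any(np.all(payoffs_col[:, k] > payoffs_col[:, j])
                       for k in range(n) if k != j)]

def second_order_best_responses(payoff_1, payoff_2):
    """Player 1's best responses to the belief that player 2 is rational
    (modelled here as a uniform belief over player 2's undominated actions)."""
    support = undominated(payoff_2)
    expected = payoff_1[:, support].mean(axis=1)
    return np.flatnonzero(np.isclose(expected, expected.max()))

# Hypothetical 2x2 game: player 2's second action is strictly dominated,
# so a second-order rational player 1 best-responds to the first column only.
payoff_1 = np.array([[2, 0], [1, 3]])
payoff_2 = np.array([[2, 1], [2, 0]])
print(second_order_best_responses(payoff_1, payoff_2))  # -> [0]
```

An agent that fails this test, typically under implicit beliefs, is one that cannot recover the opponent's rational support from the game description before best-responding.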
@@ -422,11 +406,11 @@ move into their decision-making, we analyse the performance of each generative
agent in the RPS game. In this setup, a victory awards 2 points, a draw 1 point,
and a loss 0 points.
Figures below illustrates the average points earned per round along with
The figure below illustrates the average points earned per round along with
the 95% confidence interval for each LLM when facing constant strategies,
when the model generates one-shot actions.
Even if <tt>Mixtral:8x7b</tt>, <tt>Mistral-Small</tt>, and <tt><Qwen3/tt> accurately predict its
opponent’s move, they fails to integrate this belief into
Even if <tt>Mixtral:8x7b</tt>, <tt>Mistral-Small</tt>, and <tt>Qwen3</tt> accurately predict their
opponent’s move, they fail to integrate this belief into
their decision-making process. Only <tt>Llama3.3:latest</tt> is capable of inferring
the opponent’s behavior to choose the winning move.
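The gap described above is mechanical rather than conceptual: once the opponent's constant move is predicted, winning only requires mapping the prediction to the move that beats it. A self-contained sketch with illustrative names:

```python
BEATS = {"Rock": "Scissors", "Paper": "Rock", "Scissors": "Paper"}  # key beats value

def counter(predicted: str) -> str:
    """Best response to a predicted opponent move: the move that beats it."""
    return next(move for move, loser in BEATS.items() if loser == predicted)

def points(mine: str, theirs: str) -> int:
    """Scoring used here: victory 2 points, draw 1 point, loss 0 points."""
    if mine == theirs:
        return 1
    return 2 if BEATS[mine] == theirs else 0

# Against a constant "Rock" strategy, integrating the belief yields 2 points per round.
assert counter("Rock") == "Paper" and points(counter("Rock"), "Rock") == 2
```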