Commit ed1b0be9 authored 2 weeks ago by Maxime Morge

Improve economic rationality description

parent 8cee9ecc

Showing 1 changed file: README.md (+56 additions, −59 deletions)
@@ -3,65 +3,62 @@
Python Generative Autonomous Agents and Multi-Agent Systems aims to evaluate
the social behaviors of LLM-based agents.
This prototype allows us to analyse the potential of Large Language Models (LLMs) for
social simulation by assessing their ability to: (a) make decisions aligned
with explicit preferences; (b) adhere to principles of rationality; and (c)
refine their beliefs to anticipate the actions of other agents. Through
game-theoretic experiments, we show that certain models, such as
\texttt{GPT-4.5} and \texttt{Mistral-Small}, exhibit consistent behaviours in
simple contexts but struggle with more complex scenarios requiring
anticipation of other agents' behaviour. Our study outlines research
directions to overcome the current limitations of LLMs.
## Consistency

To evaluate the decision-making consistency of various LLMs, we introduce an investment
game designed to test whether these models follow stable decision-making patterns or
react erratically to changes in the game’s parameters.
In the game, an investor allocates a basket \((p_t^A, p_t^B)\) of 100 points between two assets:
Asset A and Asset B. The value of these points depends on two random parameters \((a_t, b_t)\),
which determine the monetary return per allocated point.
For example, if \(a_t = 0.8\) and \(b_t = 0.5\), each point assigned to Asset A is worth \$0.8,
while each point allocated to Asset B yields \$0.5. The game is played 25 times to assess
the consistency of the investor’s decisions.
To evaluate the rationality of the decisions, we use the
**Critical Cost Efficiency Index (CCEI)**, a widely used measure in experimental
economics and behavioral sciences. The CCEI assesses whether choices adhere to the
**Generalized Axiom of Revealed Preference (GARP)**, a fundamental principle of
rational decision-making. If an individual violates rational choice consistency,
the CCEI determines the minimal budget adjustment required to make their
decisions align with rationality. Mathematically, the budget for each basket is calculated as:
\[
I_t = p_t^A \times a_t + p_t^B \times b_t
\]
The CCEI is derived from observed decisions by solving a linear optimization
problem that finds the largest \(\lambda\) (where \(0 \leq \lambda \leq 1\))
such that for every observation, the adjusted decisions satisfy the rationality constraint:
\[
p_t \cdot x_s \leq \lambda I_t
\]
This means that if we slightly reduce the budget (multiplying it by \(\lambda\)),
the choices become consistent with rational decision-making.
A CCEI close to 1 indicates high rationality and consistency with economic theory.
A low CCEI suggests irrational or inconsistent decision-making.
To ensure response consistency, each model undergoes 30 iterations of the game
with a fixed temperature of 0.0.
The results indicate significant differences in decision-making consistency among the evaluated models.
Mistral-Small demonstrates the highest level of rationality, with CCEI values consistently above 0.75.
Llama 3 performs moderately well, with CCEI values ranging between 0.2 and 0.74.
DeepSeek R1 exhibits inconsistent behavior, with CCEI scores varying widely between 0.15 and 0.83.

This prototype explores the potential of *homo silicus* for social
simulation. We examine the behaviour exhibited by intelligent
machines, particularly how generative agents deviate from
the principles of rationality. To assess their responses to simple human-like
strategies, we employ a series of tightly controlled and theoretically
well-understood games. Through behavioral game theory, we evaluate the ability
of <tt>GPT-4.5</tt>, <tt>Llama3</tt>, <tt>Mistral-Small</tt>, and
<tt>DeepSeek-R1</tt> to make coherent one-shot
decisions, generate algorithmic strategies based on explicit preferences, adhere
to first- and second-order rationality principles, and refine their beliefs in
response to other agents’ behaviours.
## Evaluating Economic Rationality in LLMs
To evaluate the economic rationality of various LLMs, we introduce an investment game
designed to test whether these models follow stable decision-making patterns or react
erratically to changes in the game’s parameters.
In this game, an investor allocates a basket $x_t=(x^A_t, x^B_t)$ of $100$ points between
two assets: Asset A and Asset B. The value of these points depends on random prices $p_t=(p_t^A, p_t^B)$,
which determine the monetary return per allocated point. For example, if $p_t^A = 0.8$ and $p_t^B = 0.5$,
each point assigned to Asset A is worth $\$0.8$, while each point allocated to Asset B yields $\$0.5$.
The game is played $25$ times to assess the consistency of the investor’s decisions.
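
The mechanics of a single round can be illustrated with a minimal Python sketch; the price range and the `allocate` callback here are assumptions for the example, not the project's actual protocol:

```python
import random

def play_round(allocate):
    """One round: draw random prices, ask the investor for a basket
    of 100 points, and pay out the corresponding monetary return."""
    p_a = round(random.uniform(0.1, 1.0), 1)  # illustrative price range
    p_b = round(random.uniform(0.1, 1.0), 1)
    x_a, x_b = allocate(p_a, p_b)             # investor's basket (x_a, x_b)
    assert x_a + x_b == 100, "all 100 points must be allocated"
    return x_a * p_a + x_b * p_b
```

A pure return-maximiser would place all 100 points on the asset with the higher price per point; GARP does not require this, only that the choices across rounds reveal consistent preferences.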
To evaluate the rationality of the decisions, we use Afriat's
critical cost efficiency index (CCEI), a widely used measure in
experimental economics. The CCEI assesses whether choices adhere to the
generalized axiom of revealed preference (GARP), a fundamental principle of
rational decision-making. If an individual violates rational choice consistency,
the CCEI determines the minimal budget adjustment required to make their
decisions align with rationality. Mathematically, the budget for each basket is
calculated as $I_t = p_t^A \times x^A_t + p_t^B \times x^B_t$. The CCEI is
derived from observed decisions by solving a linear optimization problem that
finds the largest $\lambda$, where $0 \leq \lambda \leq 1$, such that for every
pair of observations in which $x_t$ is revealed preferred to $x_s$, the adjusted
decisions satisfy the rationality constraint $p_t \cdot x_s \leq \lambda I_t$.
This means that if we slightly reduce the budget, multiplying it by $\lambda$,
the choices become consistent with rational decision-making. A CCEI close to 1
indicates high rationality and consistency with economic theory. A low CCEI
suggests irrational or inconsistent decision-making.
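
As an illustration of how the CCEI can be computed from the observed prices and baskets, here is a hedged sketch using a binary search over $\lambda$ and a transitive-closure GARP test with NumPy; it is not the repository's actual implementation:

```python
import numpy as np

def satisfies_garp(p, x, lam, eps=1e-9):
    """Check GARP when every budget I_t is scaled by lam.

    p, x: arrays of shape (T, 2) holding prices p_t and baskets x_t.
    """
    cost = p @ x.T                           # cost[t, s] = p_t . x_s
    budget = np.diag(cost)                   # budget[t] = I_t = p_t . x_t
    # x_t is directly revealed preferred to x_s iff lam * I_t >= p_t . x_s
    R = lam * budget[:, None] >= cost - eps
    P = lam * budget[:, None] > cost + eps   # strict counterpart
    # Transitive closure of R (Warshall's algorithm)
    for k in range(len(p)):
        R |= R[:, [k]] & R[[k], :]
    # GARP is violated if x_t is (indirectly) preferred to x_s while
    # x_s is strictly directly preferred to x_t.
    return not (R & P.T).any()

def ccei(p, x, tol=1e-4):
    """Binary search for the largest lam in [0, 1] satisfying GARP."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if satisfies_garp(p, x, mid):
            lo = mid
        else:
            hi = mid
    return lo
```

Binary search is valid here because lowering $\lambda$ removes revealed-preference relations, so GARP can only become easier to satisfy.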
To ensure response consistency, each model undergoes $30$ iterations of the game
with a fixed temperature of $0.0$. The results shown in the figure below
highlight significant differences in decision-making consistency among the
evaluated models. <tt>GPT-4.5</tt>, <tt>Llama3.3:latest</tt>, and
<tt>DeepSeek-R1:7b</tt> stand out with a perfect CCEI score of 1.0, indicating
flawless rationality in decision-making. <tt>Mistral-Small</tt> and
<tt>Mixtral:8x7b</tt> demonstrate the next highest level of rationality.
<tt>Llama3</tt> performs moderately well, with CCEI values ranging between 0.2
and 0.74. <tt>DeepSeek-R1</tt> exhibits inconsistent behavior, with CCEI scores
varying widely between 0.15 and 0.83.
