Maxime Morge / PyGAAMAS / Commits / ed1b0be9

Commit ed1b0be9, authored 4 weeks ago by Maxime Morge

Improve economic rationality description

parent 8cee9ecc
Showing 1 changed file: README.md, with 56 additions and 59 deletions
@@ -3,65 +3,62 @@
 Python Generative Autonomous Agents and Multi-Agent Systems aims to evaluate
 the social behaviors of LLM-based agents.
-This prototype allows to analyse the potential of Large Language Models (LLMs) for
-social simulation by assessing their ability to: (a) make decisions aligned
-with explicit preferences; (b) adhere to principles of rationality; and (c)
-refine their beliefs to anticipate the actions of other agents. Through
-game-theoretic experiments, we show that certain models, such as
-\texttt{GPT-4.5} and \texttt{Mistral-Small}, exhibit consistent behaviours in
-simple contexts but struggle with more complex scenarios requiring
-anticipation of other agents' behaviour. Our study outlines research
-directions to overcome the current limitations of LLMs.
-
-## Consistency
-To evaluate the decision-making consistency of various LLMs, we introduce an investment
-game designed to test whether these models follow stable decision-making patterns or
-react erratically to changes in the game’s parameters.
-
-In the game, an investor allocates a basket \((p_t^A, p_t^B)\) of 100 points between two assets:
-Asset A and Asset B. The value of these points depends on two random parameters \((a_t, b_t)\),
-which determine the monetary return per allocated point.
-
-For example, if \(a_t = 0.8\) and \(b_t = 0.5\), each point assigned to Asset A is worth $0.8,
-while each point allocated to Asset B yields $0.5. The game is played 25 times to assess
-the consistency of the investor’s decisions.
-
-To evaluate the rationality of the decisions, we use the **Critical Cost Efficiency Index (CCEI)**,
-a widely used measure in experimental economics and behavioral sciences. The CCEI assesses
-whether choices adhere to the **Generalized Axiom of Revealed Preference (GARP)**,
-a fundamental principle of rational decision-making.
-
-If an individual violates rational choice consistency,
-the CCEI determines the minimal budget adjustment required to make their
-decisions align with rationality. Mathematically, the budget for each basket is calculated as:
-
-\[
-I_t = p_t^A \times a_t + p_t^B \times b_t
-\]
-
-The CCEI is derived from observed decisions by solving a linear optimization
-problem that finds the largest \(\lambda\) (where \(0 \leq \lambda \leq 1\))
-such that for every observation, the adjusted decisions satisfy the rationality constraint:
-
-\[
-p^_t \cdot x_s \leq \lambda I_t
-\]
-
-This means that if we slightly reduce the budget (multiplying it by \(\lambda\)),
-the choices will become consistent with rational decision-making.
-A CCEI close to 1 indicates high rationality and consistency with economic theory.
-A low CCEEI** suggests irrational or inconsistent decision-making.
-
-To ensure response consistency, each model undergoes 30 iterations of the game
-with a fixed temperature of 0.0.
-
-The results indicate significant differences in decision-making consistency among the evaluated models.
-Mistral-Small demonstrates the highest level of rationality, with CCEI values consistently above 0.75.
-Llama 3 performs moderately well, with CCEI values ranging between 0.2 and 0.74.
-
-DeepSeek R1 exhibits inconsistent behavior, with CCEI scores varying widely between 0.15 and 0.83
+This prototype explores the potential of *homo silicus* for social
+simulation. We examine the behaviour exhibited by intelligent
+machines, particularly how generative agents deviate from
+the principles of rationality. To assess their responses to simple human-like
+strategies, we employ a series of tightly controlled and theoretically
+well-understood games. Through behavioral game theory, we evaluate the ability
+of <tt>GPT-4.5</tt>, <tt>Llama3</tt>, <tt>Mistral-Small</tt>}, and
+<tt>DeepSeek-R1</tt> to make coherent one-shot
+decisions, generate algorithmic strategies based on explicit preferences, adhere
+to first- and second-order rationality principles, and refine their beliefs in
+response to other agents’ behaviours.
+
+## Economic Rationality
+
+## Evaluating Economic Rationality in LLMs
+
+To evaluate the economic rationality of various LLMs, we introduce an investment game
+designed to test whether these models follow stable decision-making patterns or react
+erratically to changes in the game’s parameters.
+
+In this game, an investor allocates a basket $x_t=(x^A_t, x^B_t)$ of $100$ points between
+two assets: Asset A and Asset B. The value of these points depends on random prices $p_t=(p_{t}^A, p_t^B)$,
+which determine the monetary return per allocated point. For example, if $p_t^A= 0.8$ and $p_t^B = 0.8$,
+each point assigned to Asset A is worth $\$0.8$, while each point allocated to Asset B yields $\$0.5$. T
+he game is played $25$ times to assess the consistency of the investor’s decisions.
+
+To evaluate the rationality of the decisions, we use Afriat's
+critical cost efficiency index (CCEI), i.e. a widely used measure in
+experimental economics. The CCEI assesses whether choices adhere to the
+generalized axiom of revealed preference (GARP), a fundamental principle of
+rational decision-making. If an individual violates rational choice consistency,
+the CCEI determines the minimal budget adjustment required to make their
+decisions align with rationality. Mathematically, the budget for each basket is
+calculated as: $ I_t = p_t^A \times x^A_t + p_t^B \times x^B_t$. The CCEI is
+derived from observed decisions by solving a linear optimization problem that
+finds the largest $\lambda$, where $0 \leq \lambda \leq 1$, such that for every
+observation, the adjusted decisions satisfy the rationality constraint: $p_t
+\cdot x_t \leq \lambda I_t$. This means that if we slightly reduce the budget,
+multiplying it by $\lambda$, the choices will become consistent with rational
+decision-making. A CCEI close to 1 indicates high rationality and consistency
+with economic theory. A low CCEEI suggests irrational or inconsistent
+decision-making.
+
+To ensure response consistency, each model undergoes $30$ iterations of the game
+with a fixed temperature of $0.0$. The results shown in
+Figure below highlight significant differences in decision-making
+consistency among the evaluated models. <tt>GPT-4.5</tt>, <tt>LLama3.3:latest</tt>
+and <tt>DeepSeek-R1:7b</tt> stand out with a
+perfect CCEI score of 1.0, indicating flawless rationality in decision-making.
+<tt>Mistral-Small</tt> and <tt>Mixtral:8x7b</tt> demonstrate the next highest level of rationality.
+<tt>Llama3</tt> performs moderately well, with CCEI values ranging between 0.2 and 0.74.
+<tt>DeepSeek-R1</tt> exhibits
+inconsistent behavior, with CCEI scores varying widely between 0.15 and 0.83.
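
The CCEI procedure described in the README text above can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names (`expenditure`, `satisfies_garp`, `ccei`) and both two-observation data sets are hypothetical, chosen only for illustration. Instead of the linear optimization mentioned in the README, the sketch uses a bisection over λ combined with a direct GARP check, treating $p_t$ as prices and $p_t \cdot x_t$ as the budget $I_t$.

```python
from itertools import product


def expenditure(prices, basket):
    """Budget of a basket at given prices: I = p^A * x^A + p^B * x^B."""
    return sum(p * x for p, x in zip(prices, basket))


def satisfies_garp(prices, baskets, efficiency=1.0, tol=1e-9):
    """Check GARP after scaling every budget by `efficiency` (the lambda)."""
    n = len(baskets)
    # cost[t][s] is the cost of basket s at the prices of observation t.
    cost = [[expenditure(prices[t], baskets[s]) for s in range(n)]
            for t in range(n)]
    # x_t is directly revealed preferred to x_s if x_s was affordable
    # within the scaled budget of observation t.
    pref = [[efficiency * cost[t][t] >= cost[t][s] - tol for s in range(n)]
            for t in range(n)]
    # Warshall transitive closure (k is the outermost loop index).
    for k, t, s in product(range(n), repeat=3):
        if pref[t][k] and pref[k][s]:
            pref[t][s] = True
    # Violation: x_t revealed preferred to x_s, while x_t was strictly
    # cheaper than the scaled budget of observation s.
    return not any(
        pref[t][s] and efficiency * cost[s][s] > cost[s][t] + tol
        for t in range(n) for s in range(n)
    )


def ccei(prices, baskets, iterations=40):
    """Approximate the largest lambda in [0, 1] for which GARP holds."""
    if satisfies_garp(prices, baskets, 1.0):
        return 1.0
    low, high = 0.0, 1.0
    for _ in range(iterations):
        mid = (low + high) / 2
        if satisfies_garp(prices, baskets, mid):
            low = mid
        else:
            high = mid
    return low


# Made-up observations: each basket allocates 100 points to (A, B).
consistent_prices = [(1, 2), (2, 1)]
consistent_baskets = [(80, 20), (20, 80)]
cyclic_prices = [(1, 2), (2, 1)]
cyclic_baskets = [(20, 80), (80, 20)]

print(ccei(consistent_prices, consistent_baskets))
print(ccei(cyclic_prices, cyclic_baskets))
```

With these made-up observations, the first data set satisfies GARP and scores 1.0. The second contains a two-observation preference cycle and scores about 0.667, because each budget of 180 must shrink to 120 before the cycle disappears.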

...