Commit ed1b0be9 authored by Maxime Morge

Improve economic rationality description

parent 8cee9ecc
@@ -3,65 +3,62 @@
Python Generative Autonomous Agents and Multi-Agent Systems aims to evaluate
the social behaviors of LLM-based agents.
This prototype makes it possible to analyse the potential of Large Language Models (LLMs) for
social simulation by assessing their ability to: (a) make decisions aligned
with explicit preferences; (b) adhere to principles of rationality; and (c)
refine their beliefs to anticipate the actions of other agents. Through
game-theoretic experiments, we show that certain models, such as
\texttt{GPT-4.5} and \texttt{Mistral-Small}, exhibit consistent behaviours in
simple contexts but struggle with more complex scenarios requiring
anticipation of other agents' behaviour. Our study outlines research
directions to overcome the current limitations of LLMs.
## Consistency
To evaluate the decision-making consistency of various LLMs, we introduce an investment
game designed to test whether these models follow stable decision-making patterns or
react erratically to changes in the game’s parameters.
In the game, an investor allocates a basket \((p_t^A, p_t^B)\) of 100 points between two assets:
Asset A and Asset B. The value of these points depends on two random parameters \((a_t, b_t)\),
which determine the monetary return per allocated point.
For example, if \(a_t = 0.8\) and \(b_t = 0.5\), each point assigned to Asset A is worth $0.8,
while each point allocated to Asset B yields $0.5. The game is played 25 times to assess
the consistency of the investor’s decisions.
To evaluate the rationality of the decisions, we use the **Critical Cost Efficiency Index (CCEI)**,
a widely used measure in experimental economics and behavioral sciences. The CCEI assesses
whether choices adhere to the **Generalized Axiom of Revealed Preference (GARP)**,
a fundamental principle of rational decision-making.
If an individual violates rational choice consistency,
the CCEI determines the minimal budget adjustment required to make their
decisions align with rationality. Mathematically, the budget for each basket is calculated as:
\[
I_t = p_t^A \times a_t + p_t^B \times b_t
\]
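For example, if the investor allocates 60 points to Asset A and 40 points to Asset B (an illustrative split) while \(a_t = 0.8\) and \(b_t = 0.5\), the budget is
\[
I_t = 60 \times 0.8 + 40 \times 0.5 = 68.
\]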
The CCEI is derived from observed decisions by solving a linear optimization
problem that finds the largest \(\lambda\) (where \(0 \leq \lambda \leq 1\))
such that for every pair of observations, the adjusted decisions satisfy the rationality constraint:
\[
p_s^A \times a_t + p_s^B \times b_t \leq \lambda I_t
\]
where \((p_s^A, p_s^B)\) is the basket chosen in another round \(s\).
This means that if we slightly reduce the budget (multiplying it by \(\lambda\)),
the choices will become consistent with rational decision-making.
A CCEI close to 1 indicates high rationality and consistency with economic theory.
A low CCEI suggests irrational or inconsistent decision-making.
To ensure response consistency, each model undergoes 30 iterations of the game
with a fixed temperature of 0.0.
The results indicate significant differences in decision-making consistency among the evaluated models.
Mistral-Small demonstrates the highest level of rationality, with CCEI values consistently above 0.75.
Llama 3 performs moderately well, with CCEI values ranging between 0.2 and 0.74.
DeepSeek R1 exhibits inconsistent behavior, with CCEI scores varying widely between 0.15 and 0.83.
![CCEI Distribution per model](figures/investment/investment_boxplot.svg)
This prototype explores the potential of *homo silicus* for social
simulation. We examine the behaviour exhibited by intelligent
machines, particularly how generative agents deviate from
the principles of rationality. To assess their responses to simple human-like
strategies, we employ a series of tightly controlled and theoretically
well-understood games. Through behavioral game theory, we evaluate the ability
of <tt>GPT-4.5</tt>, <tt>Llama3</tt>, <tt>Mistral-Small</tt>, and
<tt>DeepSeek-R1</tt> to make coherent one-shot
decisions, generate algorithmic strategies based on explicit preferences, adhere
to first- and second-order rationality principles, and refine their beliefs in
response to other agents’ behaviours.
## Economic Rationality
To evaluate the economic rationality of various LLMs, we introduce an investment game
designed to test whether these models follow stable decision-making patterns or react
erratically to changes in the game’s parameters.
In this game, an investor allocates a basket $x_t=(x^A_t, x^B_t)$ of $100$ points between
two assets: Asset A and Asset B. The value of these points depends on random prices $p_t=(p_{t}^A, p_t^B)$,
which determine the monetary return per allocated point. For example, if $p_t^A = 0.8$ and $p_t^B = 0.5$,
each point assigned to Asset A is worth $\$0.8$, while each point allocated to Asset B yields $\$0.5$.
The game is played $25$ times to assess the consistency of the investor’s decisions.
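A minimal sketch of this payoff rule in Python (the function name and the 70/30 split below are illustrative, not taken from the prototype's code):

```python
def basket_value(x_a: float, x_b: float, p_a: float, p_b: float) -> float:
    """Monetary return of an allocation (x_a, x_b) under prices (p_a, p_b)."""
    assert x_a + x_b == 100, "the investor must allocate exactly 100 points"
    return p_a * x_a + p_b * x_b

# With p_A = 0.8 and p_B = 0.5, a 70/30 split is worth 0.8*70 + 0.5*30 = $71.
print(basket_value(70, 30, 0.8, 0.5))  # 71.0
```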
To evaluate the rationality of the decisions, we use Afriat's
critical cost efficiency index (CCEI), a widely used measure in
experimental economics. The CCEI assesses whether choices adhere to the
generalized axiom of revealed preference (GARP), a fundamental principle of
rational decision-making. If an individual violates rational choice consistency,
the CCEI determines the minimal budget adjustment required to make their
decisions align with rationality. Mathematically, the budget for each basket is
calculated as: $ I_t = p_t^A \times x^A_t + p_t^B \times x^B_t$. The CCEI is
derived from observed decisions by solving a linear optimization problem that
finds the largest $\lambda$, where $0 \leq \lambda \leq 1$, such that for every pair of
observations, the adjusted decisions satisfy the rationality constraint: $p_t
\cdot x_s \leq \lambda I_t$, where $x_s$ denotes the basket chosen in another
round $s$. This means that if we slightly reduce the budget,
multiplying it by $\lambda$, the choices will become consistent with rational
decision-making. A CCEI close to 1 indicates high rationality and consistency
with economic theory. A low CCEI suggests irrational or inconsistent
decision-making.
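As a rough, self-contained sketch of this computation (not necessarily how the prototype implements it), the CCEI can be obtained by a binary search over $\lambda$ combined with a GARP check on the revealed-preference relations; the helper names `satisfies_garp` and `ccei` below are illustrative, and prices and baskets are assumed to be NumPy arrays with one row per round.

```python
import numpy as np

def satisfies_garp(prices: np.ndarray, baskets: np.ndarray, lam: float) -> bool:
    """Check GARP when every budget I_t is deflated by lam (arrays of shape (T, 2))."""
    budgets = np.einsum("ti,ti->t", prices, baskets)   # I_t = p_t . x_t
    cost = prices @ baskets.T                          # cost[t, s] = p_t . x_s
    eps = 1e-9
    # x_t is directly revealed preferred to x_s if x_s was affordable
    # under the deflated budget when x_t was chosen.
    direct = cost <= lam * budgets[:, None] + eps
    # Transitive closure of the revealed-preference relation (Warshall).
    reach = direct.copy()
    for k in range(len(budgets)):
        reach |= reach[:, [k]] & reach[[k], :]
    # Violation: x_t is revealed preferred to x_s, yet x_s is strictly
    # directly revealed preferred to x_t under the deflated budget.
    strict = cost.T < lam * budgets[None, :] - eps     # [t, s]: x_s strictly preferred to x_t
    return not np.any(reach & strict)

def ccei(prices: np.ndarray, baskets: np.ndarray, tol: float = 1e-6) -> float:
    """Largest lambda in [0, 1] such that the deflated data satisfy GARP."""
    if satisfies_garp(prices, baskets, 1.0):
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if satisfies_garp(prices, baskets, mid):
            lo = mid
        else:
            hi = mid
    return lo

# Illustrative data: three rounds of prices and a constant 50/50 allocation,
# which is trivially consistent with GARP, so the CCEI equals 1.0.
prices = np.array([[0.8, 0.5], [0.4, 0.9], [0.6, 0.6]])
baskets = np.array([[50.0, 50.0]] * 3)
print(ccei(prices, baskets))  # 1.0
```

The binary search is valid because increasing $\lambda$ can only add revealed-preference relations, so GARP satisfaction is monotone in $\lambda$.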
To ensure response consistency, each model undergoes $30$ iterations of the game
with a fixed temperature of $0.0$. The results, shown in the
figure below, highlight significant differences in decision-making
consistency among the evaluated models. <tt>GPT-4.5</tt>, <tt>LLama3.3:latest</tt>
and <tt>DeepSeek-R1:7b</tt> stand out with a
perfect CCEI score of 1.0, indicating flawless rationality in decision-making.
<tt>Mistral-Small</tt> and <tt>Mixtral:8x7b</tt> demonstrate the next highest level of rationality.
<tt>Llama3</tt> performs moderately well, with CCEI values ranging between 0.2 and 0.74.
<tt>DeepSeek-R1</tt> exhibits
inconsistent behavior, with CCEI scores varying widely between 0.15 and 0.83.
![CCEI Distribution per model](figures/investment/investment_violin.svg)
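As a hedged illustration of this protocol (the prototype's actual prompts, client code, and reply parsing may differ), a single round could be posed to a chat model at temperature $0.0$ through an OpenAI-compatible client; the prompt wording and the model name are placeholders.

```python
import json
from openai import OpenAI

client = OpenAI()  # any OpenAI-compatible endpoint also works for local models

PROMPT = (
    "You have 100 points to split between Asset A and Asset B. "
    "Each point in A is worth ${a:.2f}; each point in B is worth ${b:.2f}. "
    'Reply with JSON only, e.g. {{"assetA": 60, "assetB": 40}}.'
)

def play_round(a: float, b: float, model: str = "gpt-4.5-preview") -> dict:
    """Ask the model for one allocation at temperature 0.0 and parse its reply."""
    response = client.chat.completions.create(
        model=model,
        temperature=0.0,  # deterministic (greedy) decoding to probe consistency
        messages=[{"role": "user", "content": PROMPT.format(a=a, b=b)}],
    )
    # A real harness would need more robust parsing of the model's reply.
    return json.loads(response.choices[0].message.content)

# One of the 30 iterations would replay the 25 price draws through play_round.
print(play_round(0.8, 0.5))
```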