Commit ed1b0be9 authored by Maxime Morge

Improve economic rationality description

parent 8cee9ecc
@@ -3,65 +3,62 @@
 Python Generative Autonomous Agents and Multi-Agent Systems aims to evaluate
 the social behaviors of LLM-based agents.
-This prototype makes it possible to analyse the potential of Large Language Models (LLMs) for
-social simulation by assessing their ability to: (a) make decisions aligned
-with explicit preferences; (b) adhere to principles of rationality; and (c)
-refine their beliefs to anticipate the actions of other agents. Through
-game-theoretic experiments, we show that certain models, such as
-\texttt{GPT-4.5} and \texttt{Mistral-Small}, exhibit consistent behaviours in
-simple contexts but struggle with more complex scenarios requiring
-anticipation of other agents' behaviour. Our study outlines research
-directions to overcome the current limitations of LLMs.
+This prototype explores the potential of *homo silicus* for social
+simulation. We examine the behaviour exhibited by intelligent
+machines, particularly how generative agents deviate from
+the principles of rationality. To assess their responses to simple human-like
+strategies, we employ a series of tightly controlled and theoretically
+well-understood games. Through behavioral game theory, we evaluate the ability
+of <tt>GPT-4.5</tt>, <tt>Llama3</tt>, <tt>Mistral-Small</tt>, and
+<tt>DeepSeek-R1</tt> to make coherent one-shot
+decisions, generate algorithmic strategies based on explicit preferences, adhere
+to first- and second-order rationality principles, and refine their beliefs in
+response to other agents’ behaviours.

-## Consistency
-To evaluate the decision-making consistency of various LLMs, we introduce an investment
-game designed to test whether these models follow stable decision-making patterns or
-react erratically to changes in the game’s parameters.
+## Economic Rationality
+
+## Evaluating Economic Rationality in LLMs
+
+To evaluate the economic rationality of various LLMs, we introduce an investment game
+designed to test whether these models follow stable decision-making patterns or react
+erratically to changes in the game’s parameters.

-In the game, an investor allocates a basket \((p_t^A, p_t^B)\) of 100 points between two assets:
-Asset A and Asset B. The value of these points depends on two random parameters \((a_t, b_t)\),
-which determine the monetary return per allocated point.
-
-For example, if \(a_t = 0.8\) and \(b_t = 0.5\), each point assigned to Asset A is worth $0.8,
-while each point allocated to Asset B yields $0.5. The game is played 25 times to assess
-the consistency of the investor’s decisions.
+In this game, an investor allocates a basket $x_t=(x^A_t, x^B_t)$ of $100$ points between
+two assets: Asset A and Asset B. The value of these points depends on random prices $p_t=(p_{t}^A, p_t^B)$,
+which determine the monetary return per allocated point. For example, if $p_t^A = 0.8$ and $p_t^B = 0.5$,
+each point assigned to Asset A is worth $\$0.8$, while each point allocated to Asset B yields $\$0.5$.
+The game is played $25$ times to assess the consistency of the investor’s decisions.

-To evaluate the rationality of the decisions, we use the **Critical Cost Efficiency Index (CCEI)**,
-a widely used measure in experimental economics and behavioral sciences. The CCEI assesses
-whether choices adhere to the **Generalized Axiom of Revealed Preference (GARP)**,
-a fundamental principle of rational decision-making.
-
-If an individual violates rational choice consistency,
-the CCEI determines the minimal budget adjustment required to make their
-decisions align with rationality. Mathematically, the budget for each basket is calculated as:
-
-\[
-I_t = p_t^A \times a_t + p_t^B \times b_t
-\]
-
-The CCEI is derived from observed decisions by solving a linear optimization
-problem that finds the largest \(\lambda\) (where \(0 \leq \lambda \leq 1\))
-such that for every observation, the adjusted decisions satisfy the rationality constraint:
-
-\[
-p_t \cdot x_s \leq \lambda I_t
-\]
-
-This means that if we slightly reduce the budget (multiplying it by \(\lambda\)),
-the choices will become consistent with rational decision-making.
-A CCEI close to 1 indicates high rationality and consistency with economic theory.
-A low CCEI suggests irrational or inconsistent decision-making.
+To evaluate the rationality of the decisions, we use Afriat's
+critical cost efficiency index (CCEI), a widely used measure in
+experimental economics. The CCEI assesses whether choices adhere to the
+generalized axiom of revealed preference (GARP), a fundamental principle of
+rational decision-making. If an individual violates rational choice consistency,
+the CCEI determines the minimal budget adjustment required to make their
+decisions align with rationality. Mathematically, the budget for each basket is
+calculated as: $I_t = p_t^A \times x^A_t + p_t^B \times x^B_t$. The CCEI is
+derived from observed decisions by solving a linear optimization problem that
+finds the largest $\lambda$, where $0 \leq \lambda \leq 1$, such that for every
+observation, the adjusted decisions satisfy the rationality constraint:
+$p_t \cdot x_s \leq \lambda I_t$. This means that if we slightly reduce the budget,
+multiplying it by $\lambda$, the choices will become consistent with rational
+decision-making. A CCEI close to 1 indicates high rationality and consistency
+with economic theory. A low CCEI suggests irrational or inconsistent
+decision-making.

-To ensure response consistency, each model undergoes 30 iterations of the game
-with a fixed temperature of 0.0.
-
-The results indicate significant differences in decision-making consistency among the evaluated models.
-Mistral-Small demonstrates the highest level of rationality, with CCEI values consistently above 0.75.
-Llama 3 performs moderately well, with CCEI values ranging between 0.2 and 0.74.
-DeepSeek R1 exhibits inconsistent behavior, with CCEI scores varying widely between 0.15 and 0.83.
-![CCEI Distribution per model](figures/investment/investment_boxplot.svg)
+To ensure response consistency, each model undergoes $30$ iterations of the game
+with a fixed temperature of $0.0$. The results shown in
+the figure below highlight significant differences in decision-making
+consistency among the evaluated models. <tt>GPT-4.5</tt>, <tt>Llama3.3:latest</tt>
+and <tt>DeepSeek-R1:7b</tt> stand out with a
+perfect CCEI score of 1.0, indicating flawless rationality in decision-making.
+<tt>Mistral-Small</tt> and <tt>Mixtral:8x7b</tt> demonstrate the next highest level of rationality.
+<tt>Llama3</tt> performs moderately well, with CCEI values ranging between 0.2 and 0.74.
+<tt>DeepSeek-R1</tt> exhibits
+inconsistent behavior, with CCEI scores varying widely between 0.15 and 0.83.
+
+![CCEI Distribution per model](figures/investment/investment_violin.svg)
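
For readers who want to see how a CCEI value can be obtained from a series of prices and baskets, here is a minimal sketch. It is not the repository's implementation: the function names `dot`, `satisfies_garp` and `ccei` are hypothetical, and the sketch replaces the linear optimization mentioned in the README text with a simpler bisection over $\lambda$. It assumes the usual reading of the constraint, namely that basket $x_t$ is revealed preferred to $x_s$ at efficiency $\lambda$ when $p_t \cdot x_s \leq \lambda I_t$, and that GARP holds when no basket is revealed preferred (possibly through a chain) to another that is strictly revealed preferred to it.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(p: Vector, x: Vector) -> float:
    """Cost of basket x at prices p."""
    return sum(pi * xi for pi, xi in zip(p, x))


def satisfies_garp(prices: Sequence[Vector], baskets: Sequence[Vector],
                   efficiency: float = 1.0, tol: float = 1e-9) -> bool:
    """Check GARP when every budget I_t is deflated by `efficiency`."""
    n = len(baskets)
    # cost[t][s] = p_t . x_s; the diagonal holds the budgets I_t = p_t . x_t.
    cost = [[dot(prices[t], baskets[s]) for s in range(n)] for t in range(n)]
    # x_t is directly revealed preferred to x_s if x_s was affordable
    # under the deflated budget efficiency * I_t.
    weak = [[t == s or efficiency * cost[t][t] >= cost[t][s] - tol
             for s in range(n)] for t in range(n)]
    strict = [[efficiency * cost[t][t] > cost[t][s] + tol
               for s in range(n)] for t in range(n)]
    # Transitive closure of the weak relation (Warshall's algorithm).
    for k in range(n):
        for t in range(n):
            if weak[t][k]:
                for s in range(n):
                    if weak[k][s]:
                        weak[t][s] = True
    # Violation: x_t revealed preferred to x_s while x_s is strictly
    # directly revealed preferred to x_t.
    return not any(weak[t][s] and strict[s][t]
                   for t in range(n) for s in range(n))


def ccei(prices: Sequence[Vector], baskets: Sequence[Vector],
         iterations: int = 40) -> float:
    """Approximate the largest efficiency in [0, 1] at which GARP holds."""
    if satisfies_garp(prices, baskets, 1.0):
        return 1.0
    low, high = 0.0, 1.0  # GARP holds at `low` and fails at `high`
    for _ in range(iterations):
        mid = (low + high) / 2
        if satisfies_garp(prices, baskets, mid):
            low = mid
        else:
            high = mid
    return low


if __name__ == "__main__":
    # Two rounds with mirrored prices (illustrative values only).
    prices = [(0.8, 0.5), (0.5, 0.8)]
    print(ccei(prices, [(20, 80), (80, 20)]))  # consistent choices
    print(ccei(prices, [(80, 20), (20, 80)]))  # choices that violate GARP
```

In the second call each chosen basket costs 74 while the other basket would have cost only 56 at the same prices, so each basket is strictly revealed preferred to the other and GARP fails; the bisection converges to about 56/74 ≈ 0.757, the point at which the cheaper alternative stops being affordable under the deflated budget. Bisection is valid here because lowering the efficiency can only remove revealed-preference relations, so consistency is monotone in $\lambda$.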
......