Commit b00de087 authored by Maxime MORGE

Investment game results

parent 559b042e
......@@ -15,47 +15,55 @@ directions to overcome the current limitations of LLMs.
## Consistency
To assess the decision-making consistency of various LLMs, we introduce an
investment game that allows us to observe whether LLMs follow stable
decision-making patterns or react erratically to variations in the game’s
parameters.
An investor must allocate a basket (p_t^A, p_t^B) of 100 points between Asset A
and Asset B. The value of these points depends on two random parameters (a_t,
b_t), which determine how much money is received per allocated point.
For example, if a = 0.8 and b = 0.5, this means that each point allocated to
Asset A is worth $0.8, while each point allocated to Asset B yields $0.5. The
game is played 25 times to analyze the consistency of the investor’s
decision-making.
The investor’s decisions are then evaluated using an index called the Critical
Cost Efficiency Index (CCEI), which measures whether decisions adhere to a
certain level of economic rationality. This index is commonly used in
experimental economics and behavioral sciences to determine whether an
individual’s choices comply with the Generalized Axiom of Revealed Preference
(GARP). If an individual violates rational choice consistency, it is possible to
make a minimal budget adjustment (reducing resources) to align their decisions
with rationality. The CCEI quantifies the smallest budget reduction necessary to
make the choices consistent with rational principles.
Each basket’s budget is calculated as follows:
To evaluate the decision-making consistency of various LLMs, we introduce an investment
game designed to test whether these models follow stable decision-making patterns or
react erratically to changes in the game’s parameters.
In the game, an investor allocates a basket \((p_t^A, p_t^B)\) of 100 points between two assets:
Asset A and Asset B. The value of these points depends on two random parameters \((a_t, b_t)\),
which determine the monetary return per allocated point.
For example, if \(a_t = 0.8\) and \(b_t = 0.5\), each point assigned to Asset A is worth $0.8,
while each point allocated to Asset B yields $0.5. The game is played 25 times to assess
the consistency of the investor’s decisions.
To evaluate the rationality of the decisions, we use the **Critical Cost Efficiency Index (CCEI)**,
a widely used measure in experimental economics and behavioral sciences. The CCEI assesses
whether choices adhere to the **Generalized Axiom of Revealed Preference (GARP)**,
a fundamental principle of rational decision-making.
If an individual violates rational choice consistency,
the CCEI determines the minimal budget adjustment required to make their
decisions align with rationality. Mathematically, the budget for each basket is calculated as:
\[
I_t = p_t^A \times a_t + p_t^B \times b_t
\]
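As a quick check of the budget formula, the sketch below computes \(I_t\) for a single round. The function name `basket_budget` and the 60/40 split of the 100 points are illustrative assumptions; only the rates \(a_t = 0.8\) and \(b_t = 0.5\) come from the example in the text.

```python
def basket_budget(points_a: float, points_b: float, a_t: float, b_t: float) -> float:
    """Return I_t = p_t^A * a_t + p_t^B * b_t for a single round."""
    return points_a * a_t + points_b * b_t


# Hypothetical allocation of the 100 points: 60 to Asset A, 40 to Asset B.
budget = basket_budget(60, 40, a_t=0.8, b_t=0.5)
print(f"I_t = {budget:.2f}")
```

With this allocation, Asset A contributes $48 and Asset B contributes $20 to the budget.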
The CCEI index is derived from observed decisions. It is based on a linear
optimization problem that seeks the smallest necessary adjustment to budget
constraints to make the decisions conform to a rational model.
The CCEI is derived from observed decisions by solving a linear optimization
problem that finds the largest \(\lambda\) (where \(0 \leq \lambda \leq 1\))
such that for every observation, the adjusted decisions satisfy the rationality constraint:
\[
p_t \cdot x_s \leq \lambda I_t
\]
This means that if we slightly reduce the budget (multiplying it by \(\lambda\)),
the choices will become consistent with rational decision-making.
A CCEI close to 1 indicates high rationality and consistency with economic theory.
A low CCEI suggests irrational or inconsistent decision-making.
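One common way to compute such an index is sketched below. It is not the code from this commit: it assumes the textbook construction in which the revealed-preference relation is relaxed by an efficiency level \(e\) (written \(\lambda\) above), GARP is checked on the transitive closure of that relation, and the largest admissible level is found by bisection. The function names `satisfies_garp` and `ccei`, the tolerance, and the iteration count are all illustrative choices.

```python
import numpy as np


def satisfies_garp(prices, choices, e=1.0, tol=1e-9):
    """Check GARP at efficiency level e (e = 1 is the unrelaxed axiom)."""
    prices = np.asarray(prices, dtype=float)
    choices = np.asarray(choices, dtype=float)
    cost = prices @ choices.T      # cost[t, s] = p_t . x_s
    budget = np.diag(cost)         # budget[t] = p_t . x_t = I_t
    # Direct revealed preference: x_t R x_s iff e * I_t >= p_t . x_s
    R = e * budget[:, None] >= cost - tol
    # Transitive closure of R (Warshall's algorithm)
    for k in range(len(budget)):
        R = R | (R[:, [k]] & R[[k], :])
    # Strict direct preference: x_t P x_s iff e * I_t > p_t . x_s
    P = e * budget[:, None] > cost + tol
    # Violation: x_t is revealed preferred to x_s while x_s P x_t
    return not np.any(R & P.T)


def ccei(prices, choices, iters=40):
    """Approximate the largest e in [0, 1] at which the data satisfy GARP."""
    if satisfies_garp(prices, choices, 1.0):
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if satisfies_garp(prices, choices, mid):
            lo = mid
        else:
            hi = mid
    return lo


# Two hypothetical observations that violate GARP: each bundle was
# affordable (cost 4) when the other was chosen (budget 5).
prices = [(1.0, 2.0), (2.0, 1.0)]
choices = [(1.0, 2.0), (2.0, 1.0)]
print(round(ccei(prices, choices), 4))
```

In this example both budgets must shrink to 4/5 of their value before the violation disappears, so the index is approximately 0.8. Bisection works because relaxing the budgets can only remove revealed-preference relations, never add them.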
To ensure response consistency, each model undergoes 30 iterations of the game
with a fixed temperature of 0.0.
The results indicate significant differences in decision-making consistency among the evaluated models.
Mistral-Small demonstrates the highest level of rationality, with most CCEI values above 0.75.
Llama 3 performs moderately well, with CCEI values ranging between 0.2 and 0.74.
DeepSeek R1 exhibits inconsistent behavior, with CCEI scores varying widely between 0.15 and 0.83.
![CCEI Distribution per model](figures/investment/investment.svg)
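The ranges quoted for each model can be checked directly against the results file. The sketch below assumes the CSV layout used in this commit (`iteration,model,temperature,ccei`); the helper name `ccei_range_by_model` is illustrative, and the inline sample contains only four rows copied from the data, not the full 30 iterations per model.

```python
import csv
import io
from collections import defaultdict

# Four rows copied from the results data; the real file has 30 rows per model.
SAMPLE = """iteration,model,temperature,ccei
7,llama3,0.0,0.2
27,llama3,0.0,0.742
12,deepseek-r1,0.0,0.8315217391304347
30,deepseek-r1,0.0,0.1514285714285714
"""


def ccei_range_by_model(csv_text: str) -> dict[str, tuple[float, float]]:
    """Return {model: (min CCEI, max CCEI)} for CSV text in the layout above."""
    values = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        values[row["model"].strip()].append(float(row["ccei"]))
    return {model: (min(v), max(v)) for model, v in values.items()}


for model, (low, high) in ccei_range_by_model(SAMPLE).items():
    print(f"{model}: min={low:.2f}, max={high:.2f}")
```

To summarize a full run, pass the contents of the results file instead of `SAMPLE`.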
We aim to find the largest \lambda (0 ≤ λ ≤ 1) such that, for every observation,
the adjusted decisions satisfy the rationality constraint: p_t \cdot x_s \leq
\lambda I_t. This means that if we slightly reduce the individual’s budget (by
multiplying it by \lambda), their choices will become consistent with
rationality.
A CCEI close to 1 indicates that the choices are rational and consistent with
rational choice theory, whereas a low CCEI suggests irrational behavior. For
each model, 30 iterations of the game are conducted with a fixed temperature of
0.0 to maximize response consistency.
## Preferences
......
iteration,model,temperature,ccei
1,optimal,0.0,1.0
2,optimal,0.0,1.0
3,optimal,0.0,1.0
4,optimal,0.0,1.0
5,optimal,0.0,1.0
6,optimal,0.0,1.0
7,optimal,0.0,1.0
8,optimal,0.0,1.0
9,optimal,0.0,1.0
10,optimal,0.0,1.0
1,random,0.0,1.0
2,random,0.0,1.0
3,random,0.0,1.0
4,random,0.0,1.0
5,random,0.0,1.0
6,random,0.0,1.0
7,random,0.0,1.0
8,random,0.0,1.0
9,random,0.0,1.0
10,random,0.0,1.0
1,random,0.0,0.30081300813008127
2,random,0.0,0.29500000000000004
3,random,0.0,0.24739884393063583
4,random,0.0,0.2987012987012987
5,random,0.0,0.3548387096774194
6,random,0.0,0.3986486486486487
7,random,0.0,0.2810650887573965
8,random,0.0,0.35494880546075086
9,random,0.0,0.272
10,random,0.0,0.22171945701357465
11,random,0.0,0.3150105708245243
12,random,0.0,0.25641025641025644
13,random,0.0,0.1569506726457399
14,random,0.0,0.247787610619469
15,random,0.0,0.5156950672645739
16,random,0.0,0.2645739910313901
17,random,0.0,0.40444444444444444
18,random,0.0,0.27455919395465994
19,random,0.0,0.20800000000000002
20,random,0.0,0.20930232558139536
21,random,0.0,0.3823529411764706
22,random,0.0,0.49244712990936557
23,random,0.0,0.2
24,random,0.0,0.46881287726358145
25,random,0.0,0.30916414904330314
26,random,0.0,0.21908602150537634
27,random,0.0,0.21929824561403508
28,random,0.0,0.23555555555555555
29,random,0.0,0.4501510574018127
30,random,0.0,0.2896174863387978
1,llama3,0.0,0.5533333333333333
2,llama3,0.0,0.23636363636363636
3,llama3,0.0,0.38461538461538464
4,llama3,0.0,0.391304347826087
5,llama3,0.0,0.379746835443038
6,llama3,0.0,0.4473684210526316
7,llama3,0.0,0.2
8,llama3,0.0,0.45454545454545453
9,llama3,0.0,0.6133333333333333
10,llama3,0.0,0.4314720812182741
11,llama3,0.0,0.665
12,llama3,0.0,0.43533930857874514
13,llama3,0.0,0.5
14,llama3,0.0,0.519277108433735
15,llama3,0.0,0.27586206896551724
16,llama3,0.0,0.5555555555555556
17,llama3,0.0,0.5
18,llama3,0.0,0.35714285714285715
19,llama3,0.0,0.4246575342465753
20,llama3,0.0,0.25
21,llama3,0.0,0.23809523809523808
22,llama3,0.0,0.3333333333333333
23,llama3,0.0,0.475
24,llama3,0.0,0.6666666666666666
25,llama3,0.0,0.3151079136690647
26,llama3,0.0,0.3368421052631579
27,llama3,0.0,0.742
28,llama3,0.0,0.5333333333333333
29,llama3,0.0,0.5172413793103449
30,llama3,0.0,0.5
1,mistral-small,0.0,1.0
2,mistral-small,0.0,0.6222222222222222
3,mistral-small,0.0,0.8866666666666667
4,mistral-small,0.0,0.90625
5,mistral-small,0.0,0.6785714285714286
6,mistral-small,0.0,0.90625
7,mistral-small,0.0,0.5555555555555556
8,mistral-small,0.0,0.8844444444444444
9,mistral-small,0.0,0.8266666666666667
10,mistral-small,0.0,0.3333333333333333
11,mistral-small,0.0,0.8333333333333334
12,mistral-small,0.0,0.8888888888888888
13,mistral-small,0.0,0.925
14,mistral-small,0.0,0.78
15,mistral-small,0.0,0.804
16,mistral-small,0.0,0.75
17,mistral-small,0.0,0.84
18,mistral-small,0.0,0.625
19,mistral-small,0.0,0.90625
20,mistral-small,0.0,0.8166666666666667
21,mistral-small,0.0,0.875
22,mistral-small,0.0,0.9866666666666667
23,mistral-small,0.0,0.7733333333333333
24,mistral-small,0.0,0.92
25,mistral-small,0.0,0.8571428571428571
26,mistral-small,0.0,0.9375
27,mistral-small,0.0,0.7150000000000001
28,mistral-small,0.0,0.9625
29,mistral-small,0.0,0.8928571428571429
30,mistral-small,0.0,0.802
1,deepseek-r1,0.0,0.25
2,deepseek-r1,0.0,0.5882352941176471
3,deepseek-r1,0.0,0.42207792207792205
4,deepseek-r1,0.0,0.335
5,deepseek-r1,0.0,0.22105263157894736
6,deepseek-r1,0.0,0.22999999999999998
7,deepseek-r1,0.0,0.3058510638297872
8,deepseek-r1,0.0,0.32142857142857145
9,deepseek-r1,0.0,0.6711409395973154
10,deepseek-r1,0.0,0.25
11,deepseek-r1,0.0,0.75
12,deepseek-r1,0.0,0.8315217391304347
13,deepseek-r1,0.0,0.4597701149425288
14,deepseek-r1,0.0,0.32142857142857145
15,deepseek-r1,0.0,0.16875
16,deepseek-r1,0.0,0.25157232704402516
17,deepseek-r1,0.0,0.574712643678161
18,deepseek-r1,0.0,0.28776978417266186
19,deepseek-r1,0.0,0.32142857142857145
20,deepseek-r1,0.0,0.7304347826086957
21,deepseek-r1,0.0,0.17329910141206675
22,deepseek-r1,0.0,0.175
23,deepseek-r1,0.0,0.45454545454545453
24,deepseek-r1,0.0,0.2992700729927008
25,deepseek-r1,0.0,0.34
26,deepseek-r1,0.0,0.2
27,deepseek-r1,0.0,0.3333333333333333
28,deepseek-r1,0.0,0.1870967741935484
29,deepseek-r1,0.0,0.2575
30,deepseek-r1,0.0,0.1514285714285714
iteration,model,temperature,ccei
1,random,0.0,0.1993
2,random,0.0,0.2841
3,random,0.0,0.2584
4,random,0.0,0.3276
5,random,0.0,0.3676
6,random,0.0,0.1505
7,random,0.0,0.1047
8,random,0.0,0.1385
9,random,0.0,0.1866
10,random,0.0,0.3135
11,random,0.0,0.1835
12,random,0.0,0.29
13,random,0.0,0.2523
14,random,0.0,0.2014
15,random,0.0,0.1676
16,random,0.0,0.2469
17,random,0.0,0.2429
18,random,0.0,0.1676
19,random,0.0,0.1818
20,random,0.0,0.3804
21,random,0.0,0.2718
22,random,0.0,0.336
23,random,0.0,0.2473
24,random,0.0,0.1942
25,random,0.0,0.2857
26,random,0.0,0.2163
27,random,0.0,0.4202
28,random,0.0,0.1175
29,random,0.0,0.2494
30,random,0.0,0.2693
1,llama3,0.0,0.3636
2,llama3,0.0,0.1515
3,llama3,0.0,0.4684
4,llama3,0.0,0.1515
5,llama3,0.0,0.3636
6,llama3,0.0,0.2222
7,llama3,0.0,0.2222
8,llama3,0.0,0.5242
9,llama3,0.0,0.2632
10,llama3,0.0,0.3
11,llama3,0.0,0.1429
12,llama3,0.0,0.1466
13,llama3,0.0,0.3226
14,llama3,0.0,0.1695
15,llama3,0.0,0.25
16,llama3,0.0,0.2381
17,llama3,0.0,0.3391
18,llama3,0.0,0.4386
19,llama3,0.0,0.1639
20,llama3,0.0,0.2909
21,llama3,0.0,0.3077
22,llama3,0.0,0.3906
23,llama3,0.0,0.4
24,llama3,0.0,0.1563
25,llama3,0.0,0.2727
26,llama3,0.0,0.2381
27,llama3,0.0,0.2759
28,llama3,0.0,0.1818
29,llama3,0.0,0.2899
30,llama3,0.0,0.2857
1,mistral-small,0.0,0.1429
2,mistral-small,0.0,0.1111
3,mistral-small,0.0,0.1
4,mistral-small,0.0,0.1111
5,mistral-small,0.0,0.1111
6,mistral-small,0.0,0.1
7,mistral-small,0.0,0.1667
8,mistral-small,0.0,0.1429
9,mistral-small,0.0,0.1309
10,mistral-small,0.0,0.2222
11,mistral-small,0.0,0.1667
12,mistral-small,0.0,0.125
13,mistral-small,0.0,0.1
14,mistral-small,0.0,0.1111
15,mistral-small,0.0,0.2
16,mistral-small,0.0,0.1111
17,mistral-small,0.0,0.1111
18,mistral-small,0.0,0.1111
19,mistral-small,0.0,0.1493
20,mistral-small,0.0,0.1429
21,mistral-small,0.0,0.125
22,mistral-small,0.0,0.1111
23,mistral-small,0.0,0.125
24,mistral-small,0.0,0.2
25,mistral-small,0.0,0.1429
26,mistral-small,0.0,0.1
27,mistral-small,0.0,0.1111
28,mistral-small,0.0,0.1111
29,mistral-small,0.0,0.1111
30,mistral-small,0.0,0.1111
1,deepseek-r1,0.0,0.1667
2,deepseek-r1,0.0,0.2097
3,deepseek-r1,0.0,0.3333
4,deepseek-r1,0.0,0.1905
5,deepseek-r1,0.0,0.2754
6,deepseek-r1,0.0,0.1111
\ No newline at end of file
......@@ -24,7 +24,7 @@ class AgentResponse(BaseModel):
 # The investment game simulation class
 class Investment:
     def __init__(self, model: str, temperature: float, max_retries: int = 3):
-        self.debug = True
+        self.debug = False
         self.model = model
         self.temperature = temperature
         self.strategy = random
......@@ -159,39 +159,33 @@ class Investment:
         """
         Computes the Critical Cost Efficiency Index (CCEI).
         """
-        n = len(prices) # Number of observations
-        c = [1] # Minimize theta
-        A_ub = [] # Constraint matrix
-        b_ub = [] # Right-hand side values
-        for t in range(n):
-            for s in range(n):
-                lhs = np.dot(prices[t], choices[s]) # p_t * x_s
-                rhs = np.dot(prices[t], choices[t]) # p_t * x_t
-                if lhs > rhs: # Only add constraints where direct revealed preference exists
-                    A_ub.append([-budgets[t]]) # -theta * I_t
-                    b_ub.append(lhs - rhs)
-        # Ensure A_ub is a valid 2D array
-        if A_ub:
-            A_ub = np.array(A_ub, dtype=float)
-            if A_ub.ndim == 1: # Ensure it's 2D
-                A_ub = A_ub.reshape(-1, 1)
-        else:
-            A_ub = np.zeros((1, 1), dtype=float) # If no constraints, use a trivial 1x1 matrix
-        # Ensure b_ub is also valid
-        b_ub = np.array(b_ub, dtype=float) if b_ub else np.zeros(1, dtype=float)
-        bounds = [(0, 1)] # Theta bounds
-        # Solve the linear program
-        result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
-        if result.success:
-            return round(result.x[0], 4) # Return optimized theta value
-        else:
-            print("CCEI computation failed. Check constraints.")
-            return 0 # Return 0 if LP fails
+        num_rounds = len(prices)
+        lambdas = []
+        for i in range(num_rounds):
+            # Objective: Maximize lambda (minimize -lambda)
+            c = [-1]
+            # Constraints: p_t * x_s <= lambda * I_t for all rounds s
+            A = []
+            b = []
+            for j in range(num_rounds):
+                p_t = prices[i]
+                x_s = choices[j]
+                I_t = budgets[i]
+                A.append([p_t[0] * x_s[0] + p_t[1] * x_s[1]])
+                b.append(I_t)
+            # Add constraint that lambda is between 0 and 1
+            bounds = [(0, 1)]
+            # Solve the linear programming problem
+            res = linprog(c, A_ub=A, b_ub=b, bounds=bounds, method='highs')
+            if res.success:
+                lambdas.append(res.x[0])
+            else:
+                lambdas.append(0)
+        return min(lambdas) # The CCEI is the minimum lambda across all rounds
 # Run the async function and return the response
 if __name__ == "__main__":
     game_agent = Investment(model="mistral-small", temperature=0.0) # Toggle strategy here
-    response = asyncio.run(game_agent.run_rounds(10))
+    response = asyncio.run(game_agent.run_rounds(30))
     print(response)
......@@ -4,7 +4,7 @@ import matplotlib.pyplot as plt
 # Custom color palette
 color_palette = {
-    'random': '#333333', #
+    'random' : '#333333', # Black
     'gpt-4.5-preview-2025-02-27': '#7abaff', # Blue
     'llama3': '#32a68c', # Green
     'mistral-small': '#ff6941', # Orange
......
......@@ -3,9 +3,9 @@ import csv
 from investment import Investment # Assuming this is in a separate file
 # Define models, temperature, and iterations
-models = ["optimal", "random"] # "gpt-4.5-preview-2025-02-27" "random", "llama3", "mistral-small", "deepseek-r1"
+models = ["optimal", "random", "llama3", "mistral-small", "deepseek-r1"] # "gpt-4.5-preview-2025-02-27", "optimal", "random", "llama3", "mistral-small", "deepseek-r1"
 temperature = 0.0
-iterations = 10
+iterations = 30
 output_file = "../../data/investment/investment.csv"
 async def run_experiment():
......@@ -22,7 +22,7 @@ async def run_experiment():
             # Run DictatorConsistency experiment
             game_agent = Investment(model=model, temperature=temperature)
-            ccei_value = await game_agent.run_rounds(10) # Run 25 rounds
+            ccei_value = await game_agent.run_rounds(25) # Run 25 rounds
             # Write results to CSV
             writer.writerow([iteration, model, temperature, ccei_value])
......