Commit e5b0d13a authored by Maxime Morge

PyGAAMAS: Evaluation of the proposer in the ultimatum game

parent 9770cd66
@@ -27,7 +27,7 @@ In this game, an investor allocates a basket $x_t=(x^A_t, x^B_t)$ of $100$ points between
two assets: Asset A and Asset B. The value of these points depends on random prices $p_t=(p_{t}^A, p_t^B)$,
which determine the monetary return per allocated point. For example, if $p_t^A = 0.8$ and $p_t^B = 0.5$,
each point assigned to Asset A is worth $\$0.8$, while each point allocated to Asset B yields $\$0.5$;
the allocation $x_t=(60, 40)$ would thus return $60 \times 0.8 + 40 \times 0.5 = \$68$.
The game is played $25$ times to assess the consistency of the investor’s decisions.
To evaluate the rationality of the decisions, we use Afriat's
critical cost efficiency index (CCEI), a widely used measure in
@@ -44,7 +44,11 @@ observation, the adjusted decisions satisfy the rationality constraint: $p_t
multiplying it by $\lambda$, the choices will become consistent with rational
decision-making. A CCEI close to 1 indicates high rationality and consistency
with economic theory. A low CCEI suggests irrational or inconsistent
decision-making. decision-making. n their 2007 study on portfolio choices, Choi et al. found
that participants exhibited a high degree of rationality, with average CCEI values
around 0.95:
Choi, S., Fisman, R., Gale, D., & Kariv, S. (2007).
*Consistency and heterogeneity of individual behavior under uncertainty*. American Economic Review, 97(5), 1921–1938.
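
In practice, the CCEI is the largest $\lambda \in [0,1]$ such that the choices, after scaling every budget by $\lambda$, satisfy the Generalized Axiom of Revealed Preference (GARP). The sketch below is a minimal illustration assuming the $25$ rounds are stored as NumPy arrays `p` (prices) and `x` (allocations), both of shape $(T, 2)$; the helpers `garp_holds` and `ccei` are our names, not part of the project code.

```python
import numpy as np

def garp_holds(p: np.ndarray, x: np.ndarray, lam: float) -> bool:
    """Check whether the data satisfy GARP when budgets are scaled by lam."""
    spend = np.einsum("ti,ti->t", p, x)     # p_t . x_t: expenditure in round t
    cross = p @ x.T                          # cross[t, s] = p_t . x_s
    direct = lam * spend[:, None] >= cross   # x_t directly revealed preferred to x_s
    closure = direct.copy()
    for k in range(len(x)):                  # Floyd-Warshall transitive closure
        closure |= closure[:, k][:, None] & closure[k, :][None, :]
    # Violation: x_t revealed preferred to x_s while x_s is strictly
    # revealed preferred to x_t at the same efficiency level
    violations = closure & (lam * spend[None, :] > cross.T)
    return not violations.any()

def ccei(p: np.ndarray, x: np.ndarray, tol: float = 1e-4) -> float:
    """Largest lam in [0, 1] such that GARP(lam) holds, found by binary search."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if garp_holds(p, x, mid):
            lo = mid
        else:
            hi = mid
    return lo
```

A result close to $1$ means the budgets barely need shrinking to remove every revealed-preference cycle, matching the interpretation above.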
To ensure response consistency, each model undergoes $30$ iterations of the game
with a fixed temperature of $0.0$. The results shown in
@@ -81,8 +85,13 @@ the form of an algorithm implemented in the <tt>Python</tt> language. In all our
experiments, one-shot actions are repeated 30 times, and the models' temperature
is set to $0.7$.
The figure below presents a violin plot illustrating the share of the
total amount (\$100) that the dictator allocates to themselves for each model.
Notably, human participants under similar conditions typically keep around \$80 on average:
Forsythe, R., Horowitz, J. L., Savin, N. E., & Sefton, M. (1994).
*Fairness in simple bargaining experiments*. **Games and Economic Behavior, 6**(3), 347–369.
[https://doi.org/10.1006/game.1994.1021](https://doi.org/10.1006/game.1994.1021)
The median share taken by <tt>GPT-4.5</tt>, <tt>Llama3</tt>,
<tt>Mistral-Small</tt>, <tt>DeepSeek-R1</tt> and <tt>Qwen3</tt> through one-shot decisions is
\$50, likely due to corpus-based biases such as term frequency.
@@ -173,6 +182,40 @@ preferences but have more difficulty generating individual actions than
algorithmic strategies. In contrast, `DeepSeek-R1` does not generate
valid strategies and performs poorly when generating specific actions.
## Social Preference
To analyze the behavior of generative agents based on their preferences under strategic interaction, we rely on the
ultimatum game. In this game, the proposer (analogous to the dictator) is tasked with deciding how to divide an
endowment (e.g., a sum of money) between themselves and a second player, the responder. However,
unlike in the dictator game, the responder plays an active role: they can either accept or reject
the proposed allocation. If the offer is rejected, both players receive nothing.
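
The rules translate directly into payoffs; the following sketch is a minimal illustration (the function name and signature are ours, not the project's).

```python
def ultimatum_payoffs(my_share: int, other_share: int, accepted: bool) -> tuple[int, int]:
    """Proposer and responder payoffs: the proposed split if accepted, (0, 0) otherwise."""
    return (my_share, other_share) if accepted else (0, 0)
```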
First, we evaluate the choices made by LLMs when playing the role of the proposer, interpreting these decisions as a
reflection of their implicit social norms or strategic preferences, especially when anticipating potential
rejection by the responder.
Here, we consider that the choice of an LLM as a proposer reflects its intrinsic
social preferences. Each LLM is asked to directly produce a one-shot action in the
ultimatum game. Additionally, we ask the models to generate a strategy in
the form of an algorithm implemented in the <tt>Python</tt> language.
The figure below presents a violin plot illustrating the share of the total amount (\$100) that the proposer
allocates to themselves for each model. The share selected by the strategies generated by <tt>Llama3</tt>,
<tt>Mistral-Small</tt>, and <tt>Qwen3</tt> aligns with the median share chosen by the actions generated by
<tt>Mistral-Small</tt>, <tt>Mixtral:8x7B</tt>, and <tt>DeepSeek-R1:7B</tt>, around \$50,
likely reflecting corpus-based biases such as term frequency.
The share selected by the strategies generated by <tt>Llama3.3</tt> and <tt>DeepSeek-R1:7B</tt> resembles
the median share in the actions generated by <tt>GPT-4.5</tt> and <tt>Llama3</tt>, around \$60,
which is consistent with what human participants typically choose under similar conditions.
The shares selected by the strategies from <tt>GPT-4.5</tt> and <tt>Mixtral:8x7B</tt> are respectively
overestimated and underestimated, while the actions generated by <tt>DeepSeek-R1:7B</tt> and <tt>Qwen3</tt>
can be considered irrational.
![Violin Plot of My Share for Each Model](figures/ultimatum/proposer_violin.svg)
## Strategic Rationality
An autonomous agent acts strategically, considering not only its own preferences
...
@@ -17,11 +17,11 @@ keyring==25.6.0
lockfile==0.12.2
matplotlib==3.10.1
mock==5.1.0
numpy~=2.2.4
pandas==2.2.3
Pillow==11.1.0
protobuf~=5.29.4
pydantic~=2.11.1
pyOpenSSL==25.0.0
railroad==0.5.0
scipy==1.15.2
@@ -32,3 +32,8 @@ tornado==6.4.2
truststore==0.10.1
urllib3_secure_extra==0.1.0
xmlrpclib==1.0.1
requests~=2.32.3
httpx~=0.28.1
pip~=25.0.1
distro~=1.9.0
\ No newline at end of file
# Maxime MORGE <maxime.morge@univ-lyon1.fr>
import os
import asyncio
import json
import logging
import requests
from typing import Dict
from pydantic import BaseModel
...
import asyncio
import statistics
import pandas as pd
from dictator import Dictator
class DictatorExperiment:
...
import os
import asyncio
import json
import requests
from typing import Dict
from pydantic import BaseModel
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient
import re
# Load API keys from environment variables
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_API_KEY:
raise ValueError("Missing OPENAI_API_KEY. Set it as an environment variable.")
PAGODA_API_KEY = os.getenv("PAGODA_API_KEY")
if not PAGODA_API_KEY:
raise ValueError("Missing PAGODA_API_KEY. Set it as an environment variable.")
# Define the expected response format as a Pydantic model
class AgentResponse(BaseModel):
my_share: int
other_share: int
reasoning: str
# The ultimatum game simulation class
class Proposer:
def __init__(self, amount: int, model: str, temperature: float, strategy=False, max_retries: int = 3):
self.debug = False
self.amount = amount
self.model = model
self.temperature = temperature
self.strategy = strategy
self.max_retries = max_retries
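        # Route by model name: "gpt*" models use the OpenAI API, names containing
        # ":" go through the Pagoda Ollama gateway, anything else targets a local Ollama server.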
is_openai_model = model.startswith("gpt")
is_pagoda_model = ":" in model
base_url = (
"https://api.openai.com/v1" if is_openai_model else
"https://ollama-ui.pagoda.liris.cnrs.fr/ollama/api/generate" if is_pagoda_model else
"http://localhost:11434/v1"
)
self.base_url = base_url
if self.debug:
print(f"Using model: {model}")
print(f"Using base URL: {base_url}")
key = OPENAI_API_KEY if is_openai_model else PAGODA_API_KEY
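        # Capability metadata required by autogen's OpenAIChatCompletionClient
        # for models it cannot introspect (i.e., all non-OpenAI models here).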
model_info = {
"temperature": self.temperature,
"function_calling": True,
"parallel_tool_calls": False,
"family": "unknown",
"json_output": True,
"vision": False
}
self.model_client = OpenAIChatCompletionClient(
model=self.model,
base_url=base_url,
api_key=key,
model_info=model_info,
response_format=AgentResponse
)
async def run(self) -> Dict:
"""Runs the model if strategy is False, otherwise uses a classical method."""
if self.strategy:
return self.apply_strategy()
instruction = f"""
You participate in an economics experiment. In this experiment each of you will
be paired with a different person. You will not be told who these people are
either during or after the experiment, and they will not be told who you are
either during or after the experiment.
The experiment is conducted as follows: A sum of ${self.amount} has been provisionally
allocated to each pair and the person A can propose how much of this each person
is to receive. A proposal form will then be sent to the person B.
The person B will then be given a chance to accept or reject the proposal. If
the person B accepts the proposal, then the amount of money will be divided as
specified in the proposal. If the person B rejects the proposal, then both
people in the pair receive zero. A response form will be sent to the person A.
You are person A.
Return the response in JSON format with `my_share`, `other_share`, and `reasoning`.
"""
if self.debug:
print(instruction)
is_pagoda_model = ":" in self.model
if is_pagoda_model:
return await self.run_pagoda(instruction)
for attempt in range(self.max_retries):
agent = AssistantAgent(
name="Proposer",
model_client=self.model_client,
system_message="You are a helpful assistant."
)
response = await agent.on_messages(
[TextMessage(content=instruction, source="user")],
cancellation_token=CancellationToken(),
)
try:
# Correct: get the content from the chat message
raw_text = response.chat_message.content
# Debug: show the raw content
print(f"Raw content (Attempt {attempt + 1}): {raw_text}")
# Try to load JSON directly
try:
response_json = json.loads(raw_text)
except json.JSONDecodeError:
# If it's wrapped in ```json ... ```, extract it
match = re.search(r'```json\s*(.*?)\s*```', raw_text, re.DOTALL)
if match:
response_json = json.loads(match.group(1))
else:
print(f"Could not parse JSON from response (Attempt {attempt + 1})")
continue
agent_response = AgentResponse(**response_json)
my_share, other_share = agent_response.my_share, agent_response.other_share
# Validate shares
if 0 <= my_share <= self.amount and 0 <= other_share <= self.amount and my_share + other_share <= self.amount:
return agent_response
else:
print(f"Invalid values in response (Attempt {attempt + 1}): {response_json}")
except Exception as e:
print(f"Error in OpenAI response handling (Attempt {attempt + 1}): {e}")
raise ValueError("Model failed to provide a valid response after multiple attempts.")
async def run_pagoda(self, instruction) -> Dict:
"""Runs the Pagoda model using a direct request."""
url = self.base_url
headers = {
"Authorization": f"Bearer {PAGODA_API_KEY}",
"Content-Type": "application/json"
}
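        # The Pagoda gateway exposes an Ollama-style /api/generate endpoint: the
        # prompt is sent raw, and a JSON schema mirroring AgentResponse constrains
        # the completion format.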
payload = {
"model": self.model,
"temperature": self.temperature,
"prompt": instruction,
"stream": False,
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "AgentResponse",
"strict": True,
"schema": {
"title": "AgentResponse",
"type": "object",
"properties": {
"my_share": {
"title": "My Share",
"type": "integer"
},
"other_share": {
"title": "Other Share",
"type": "integer"
},
"reasoning": {
"title": "Reasoning",
"type": "string"
}
},
"required": ["my_share", "other_share", "reasoning"],
"additionalProperties": False
}
}
}
}
for attempt in range(self.max_retries):
try:
response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
# Get the JSON response
response_data = response.json()
# Debug: print the raw response to check if fields are missing or named differently
if self.debug:
print(f"Raw response (Attempt {attempt+1}): {response_data}")
# The response field should be parsed correctly if it's already valid JSON
response_json = response_data.get('response', '')
# If the response is a string containing JSON, we need to extract and parse it
if isinstance(response_json, str):
# Try to parse the response as JSON
try:
response_dict = json.loads(response_json)
except json.JSONDecodeError:
# If the response is not valid JSON, apply regex to extract the JSON portion
match = re.search(r"```json(.*?)```", response_json, re.DOTALL)
if match:
response_dict = json.loads(match.group(1))
else:
print(f"Invalid response format detected (Attempt {attempt + 1}): {response_json}")
continue
elif isinstance(response_json, dict):
# If response_json is already a dictionary, just use it
response_dict = response_json
else:
print(f"Unexpected format in 'response' field (Attempt {attempt + 1}): {response_json}")
continue
# Validate the response structure
agent_response = AgentResponse(**response_dict)
my_share, other_share = agent_response.my_share, agent_response.other_share
# Validate that the values are within expected bounds
if 0 <= my_share <= self.amount and 0 <= other_share <= self.amount and my_share + other_share <= self.amount:
                    # Return the validated AgentResponse so callers can use
                    # attribute access (my_share, other_share, reasoning)
                    return agent_response
else:
print(f"Invalid response detected (Attempt {attempt + 1}): {response_dict}")
except Exception as e:
print(f"Error in Pagoda request (Attempt {attempt + 1}): {e}")
raise ValueError("Pagoda model failed to provide a valid response after multiple attempts.")
def apply_strategy(self) -> Dict:
"""Generates a response based on predefined strategies."""
if self.model == "gpt-4.5-preview-2025-02-27":
my_share = int(self.amount * 0.7)
other_share = self.amount - my_share
reasoning = (
f"I offered ${my_share} to myself and ${other_share} to the other player. "
"This 70-30 split is slightly in my favor but still leaves a meaningful amount "
"for the other player, increasing the chance of acceptance while maximizing my gain."
)
return {"my_share": my_share, "other_share": other_share, "reasoning": reasoning}
        if self.model in ["mixtral:8x7b"]:
            # Note: no trailing commas here, which would turn each value into a 1-tuple
            if self.amount <= 10:
                my_share = round(self.amount * 0.7)
                other_share = round(self.amount * 0.3)
                reasoning = f"Since the amount ({self.amount}) is small, I will take a larger share of {round(self.amount * 0.7)} and give the other party a smaller share of {round(self.amount * 0.3)}."
            else:
                my_share = round(self.amount * 0.4)
                other_share = round(self.amount * 0.6)
                reasoning = f"Since the amount ({self.amount}) is large, I will take a smaller share of {round(self.amount * 0.4)} and give the other party a larger share of {round(self.amount * 0.6)}."
            agent_response = AgentResponse(my_share=my_share, other_share=other_share, reasoning=reasoning)
            return agent_response.dict()
if self.model in ["llama3.3:latest", "deepseek-r1:7b"]:
my_share = int(0.6 * self.amount)
other_share = self.amount - my_share
reasoning = "I chose this split because it seems like a fair and reasonable division."
agent_response = AgentResponse(my_share=my_share, other_share=other_share, reasoning=reasoning)
return agent_response.dict()
if self.model in ["mistral_small","qwen3", "llama3"]:
half_amount = self.amount // 2
return {
"my_share": half_amount,
"other_share": half_amount,
"reasoning": "Split equally between both players."
}
if self.model in ["deepseek-r1"]:
return None
return None
return None
# Run the async function and return the response
if __name__ == "__main__":
agent = Proposer(amount=100, model="llama3", temperature=0.7, strategy=False) # "llama3.3:latest", "mixtral:8x7b"
response_json = asyncio.run(agent.run())
print(response_json)
\ No newline at end of file
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Definition of the color palette
color_palette = {
'random': '#333333', # Black
'gpt-4.5-preview-2025-02-27': '#7abaff', # BlueEscape
'llama3': '#32a68c', # GreenFuture
'llama3.3:latest': '#4b9f7d', # GreenLlama3.3
'mistral-small': '#ff6941', # WarmOrange
'mixtral:8x7b': '#f1a61a', # YellowMixtral
'deepseek-r1': '#5862ed', # InclusiveIndigo
'deepseek-r1:7b': '#9a7bff', # PurpleDeepseek-r1:7b
'qwen3': '#c02942'
}
# Load the data
data = pd.read_csv("../../data/ultimatum/proposer.csv") # Replace with the correct path to your CSV file
# Specify the order of models for the x-axis
model_order = [
'gpt-4.5-preview-2025-02-27',
'llama3', 'llama3.3:latest', # Place llama3 and llama3.3:latest together
'mistral-small', 'mixtral:8x7b', # Bring mistral-small and mixtral:8x7b closer
'deepseek-r1', 'deepseek-r1:7b',
'qwen3'
]
# Create the violin plot
plt.figure(figsize=(12, 6))
sns.violinplot(
data=data,
x="model",
y="my_share",
hue="model", # Use hue to manage the colors
palette=color_palette,
inner="quartile", # Displays quartiles inside the violin
density_norm="width", # Normalizes the width of the violins for comparison
order=model_order # Explicitly set the order of the models on the x-axis
)
# Add vertical lines for strategies
strategy_values = {
'gpt-4.5-preview-2025-02-27': 70,
'llama3': 50,
'llama3.3:latest': 60,
'mistral-small': 50,
'mixtral:8x7b': 40,
'deepseek-r1:7b': 60,
'qwen3': 50
}
for model, value in strategy_values.items():
plt.axhline(y=value, color=color_palette[model], linestyle="dashed", linewidth=2, label=f"{model} strategy")
# Add the median values as annotations on the plot
for model in model_order:
    median_value = data[data['model'] == model]['my_share'].median()
    plt.text(model_order.index(model), median_value, f'{median_value:.1f}',
             horizontalalignment='center', verticalalignment='bottom')
# Set the y-axis limits between 0 and 100
plt.ylim(0, 100)
# Labels and title
plt.xlabel("Model")
plt.ylabel("Share of money assigned to oneself")
plt.title("Distribution of personal share by model in the ultimatum game")
plt.legend()
# Save and display the plot
plt.savefig("../../figures/ultimatum/proposer_violin.svg", format="svg")
plt.show()
\ No newline at end of file
import asyncio
from proposer import Proposer
class UltimatumExperiment:
debug = True
def __init__(self, models: list[str], temperature: float, amount: int, iterations: int, output_file: str):
self.models = models
self.temperature = temperature
self.amount = amount
self.iterations = iterations
self.output_file = output_file
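        # (Re)create the output CSV and write its header once per run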
with open(self.output_file, 'w', encoding='utf-8') as f:
f.write("iteration,model,temperature,amount,my_share,other_share,reasoning\n")
async def run_experiment(self):
for model in self.models:
if self.debug:
print(f"Running experiment for model: {model}")
for iteration in range(1, self.iterations + 1):
game_agent = Proposer(amount=self.amount, model=model, temperature=self.temperature)
response = await game_agent.run()
if self.debug:
print(response)
                # Access the response attributes via dot notation
                my_share = response.my_share
                other_share = response.other_share
                reasoning = response.reasoning.replace('"', '""')  # escape quotes for CSV
with open(self.output_file, 'a', encoding='utf-8') as f:
f.write(f'{iteration},{model},{self.temperature},{self.amount},{my_share},{other_share},"{reasoning}"\n')
if __name__ == "__main__":
models = ["qwen3"]
# # "gpt-4.5-preview-2025-02-27" "llama3", "mistral-small", "deepseek-r1", "qwen3", "mixtral:8x7b", "llama3.3:latest", "deepseek-r1:7b"
temperature = 0.7
amount = 100
iterations = 30
output_file = '../../data/ultimatum/proposer.csv'
experiment = UltimatumExperiment(models=models, temperature=temperature, amount=amount, iterations=iterations, output_file=output_file)
asyncio.run(experiment.run_experiment())
print(f"Experiment results saved to {output_file}")
\ No newline at end of file