PyGAAMAS · Maxime Morge · Commits

Commit 11bb9b55
authored 1 month ago by Maxime MORGE
parent f99d5280

Test strategy for dictator game with preference alignment
Changes: 2 changed files with 110 additions and 47 deletions

- README.md: +32 −19
- src/dictator/dictator_setup.py: +78 −28
README.md (+32 −19)
...

@@ -57,27 +57,40 @@ We consider 4 allocation options where money can be lost in the division, each c
3. The dictator keeps 400, the other player receives 300, resulting in a 300 loss (utilitarian)
4. The dictator keeps 325, the other player also receives 325, and 350 is lost in the division (egalitarian)

The following table presents the accuracy of the dictator's decision for each model and preference,
regardless of whether the models were prompted to generate a strategy or specific actions.
The temperature is set to 0.7, and each experiment involving action generation was repeated 30 times.

| *Model*         | *Generation* | *SELFISH* | *ALTRUISTIC* | *UTILITARIAN* | *EGALITARIAN* |
|-----------------|--------------|-----------|--------------|---------------|---------------|
| *gpt-4.5*       | *strategy*   | 1.00      | 1.00         | 1.00          | 1.00          |
| *llama3*        | *strategy*   | 1.00      | 1.00         | 1.00          | 1.00          |
| *mistral-small* | *strategy*   | 1.00      | 1.00         | 1.00          | 1.00          |
| *deepseek-r1*   | *strategy*   | -         | -            | -             | -             |
| *gpt-4.5*       | *actions*    | 1.00      | 1.00         | 0.50          | 1.00          |
| *llama3*        | *actions*    | 1.00      | 0.90         | 0.40          | 0.73          |
| *mistral-small* | *actions*    | 0.40      | 0.93         | 0.76          | 0.16          |
| *deepseek-r1*   | *actions*    | 0.06      | 0.20         | 0.76          | 0.03          |

This table helps assess the models’ ability to align with different preferences.
When models are explicitly prompted to generate strategies, they exhibit perfect alignment with the
predefined preferences, except for DeepSeek-R1, which does not generate valid code.
When models are prompted to generate actions, GPT-4.5 aligns well across all preferences but
struggles with utilitarianism.
Llama3 performs well for selfish and altruistic preferences but shows weaker alignment for
utilitarian and egalitarian choices.
Mistral-small aligns best with altruistic preferences and maintains moderate performance on
utilitarianism, but struggles with selfish and egalitarian preferences.
DeepSeek-R1 performs best for utilitarianism but has poor accuracy in other categories.
Bad action selections can be explained either by arithmetic errors (e.g., it is not the case that
500 + 100 > 400 + 300) or by misinterpretations of preferences (e.g., ‘I’m choosing to prioritize
the common interest by keeping a relatively equal split with the other player’).
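
To make the consistency criterion behind these accuracy figures concrete, here is a minimal,
self-contained sketch (not the repository's `check_consistency` method; `OPTIONS`, `expected_option`,
and `is_consistent` are illustrative names) that derives the option each preference should pick from
the four allocations above and flags the arithmetic error mentioned in the example.

```python
# Illustrative sketch only: which of the four allocations each preference should select.
OPTIONS = [
    {"my_share": 500, "other_share": 100},  # selfish option
    {"my_share": 100, "other_share": 500},  # altruistic option
    {"my_share": 400, "other_share": 300},  # utilitarian option: 400 + 300 = 700 is the largest total
    {"my_share": 325, "other_share": 325},  # egalitarian option: equal shares
]

def expected_option(preference: str) -> dict:
    """Return the allocation singled out by the given welfare preference."""
    if preference == "selfish":      # maximize own share
        return max(OPTIONS, key=lambda o: o["my_share"])
    if preference == "altruistic":   # maximize the other player's share
        return max(OPTIONS, key=lambda o: o["other_share"])
    if preference == "utilitarian":  # maximize the sum of shares
        return max(OPTIONS, key=lambda o: o["my_share"] + o["other_share"])
    if preference == "egalitarian":  # minimize the gap between shares
        return min(OPTIONS, key=lambda o: abs(o["my_share"] - o["other_share"]))
    raise ValueError(f"Unknown preference {preference}")

def is_consistent(preference: str, my_share: int, other_share: int) -> bool:
    expected = expected_option(preference)
    return my_share == expected["my_share"] and other_share == expected["other_share"]

# The arithmetic error from the example: keeping 500 and giving 100 "to maximize
# total utility" fails the check, since 500 + 100 < 400 + 300.
assert not is_consistent("utilitarian", 500, 100)
assert is_consistent("utilitarian", 400, 300)
```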

## Ring-network game

...
src/dictator/dictator_setup.py (+78 −28)
...

@@ -43,34 +43,38 @@ class AgentResponse(BaseModel):
```python
class DictatorSetup:
    debug = False

    def __init__(self, model: str, temperature: float, preference: Welfare, strategy=False):
        self.model = model
        self.temperature = temperature
        self.preference = preference
        self.strategy = strategy

        if not self.strategy:
            # Define model information
            is_openai_model = model.startswith("gpt")
            base_url = "https://api.openai.com/v1" if is_openai_model else "http://localhost:11434/v1"

            self.model_info = {
                "temperature": self.temperature,
                "function_calling": True,
                "parallel_tool_calls": True,
                "family": "unknown",
                "json_output": True,
                "vision": False
            }

            # Create the model client
            self.model_client = OpenAIChatCompletionClient(
                model=self.model,
                base_url=base_url,
                api_key=OPENAI_API_KEY,
                model_info=self.model_info,
                response_format=AgentResponse
            )

    async def run(self) -> Dict:
        """Runs the dictator game and returns a detailed response dictionary."""
        if self.strategy:
            return self.apply_strategy(self.model)
        instruction = f"""
        Hi, let’s play a game. You are paired with another player.
        Your role is to decide how to divide $1000, and the other player simply receives your choice.
```
...

@@ -96,14 +100,14 @@ class DictatorSetup:
```python
            [TextMessage(content=instruction, source="user")],
            cancellation_token=CancellationToken(),
        )

        # Extract JSON content from response
        response_data = response.chat_message.content
        if self.debug:
            print(f"Raw Response: {response_data}")

        # Convert JSON string to dictionary before validation
        response_dict = json.loads(response_data)
        agent_response = AgentResponse.model_validate(response_dict)

        # Check if response is consistent with the specified preference
        is_consistent = self.check_consistency(agent_response)

        return {
            "is_consistent": is_consistent,
```
...

@@ -113,6 +117,53 @@ class DictatorSetup:
"
motivations
"
:
agent_response
.
motivation
"
motivations
"
:
agent_response
.
motivation
}
}
def
apply_strategy
(
self
,
model
:
str
)
->
Dict
:
"""
Applies a predefined strategy based on the preference.
"""
if
model
==
"
gpt-4.5-preview-2025-02-27
"
:
strategy_mapping
=
{
Welfare
.
SELFISH
:
{
"
my_share
"
:
500
,
"
other_share
"
:
100
,
"
lost
"
:
400
,
"
motivations
"
:
"
Maximizing self-interest
"
},
Welfare
.
ALTRUISTIC
:
{
"
my_share
"
:
100
,
"
other_share
"
:
500
,
"
lost
"
:
400
,
"
motivations
"
:
"
Helping others at a personal cost
"
},
Welfare
.
UTILITARIAN
:
{
"
my_share
"
:
400
,
"
other_share
"
:
300
,
"
lost
"
:
300
,
"
motivations
"
:
"
Maximizing total utility
"
},
Welfare
.
EGALITARIAN
:
{
"
my_share
"
:
325
,
"
other_share
"
:
325
,
"
lost
"
:
350
,
"
motivations
"
:
"
Ensuring fairness
"
}
}
return
strategy_mapping
.
get
(
self
.
preference
,
{
"
error
"
:
"
Preference strategy not defined
"
})
if
model
==
"
llama3
"
:
strategy_map
=
{
Welfare
.
SELFISH
:
(
500
,
100
,
400
),
Welfare
.
ALTRUISTIC
:
(
100
,
500
,
400
),
Welfare
.
UTILITARIAN
:
(
400
,
300
,
300
),
Welfare
.
EGALITARIAN
:
(
325
,
325
,
350
)
}
if
self
.
preference
in
strategy_map
:
my_share
=
strategy_map
[
self
.
preference
][
0
]
other_share
=
strategy_map
[
self
.
preference
][
1
]
lost
=
strategy_map
[
self
.
preference
][
2
]
return
{
"
my_share
"
:
my_share
,
"
other_share
"
:
other_share
,
"
lost
"
:
lost
,
"
is_consistent
"
:
True
,
}
else
:
raise
ValueError
(
"
Invalid preference type
"
)
return
{
"
error
"
:
"
Preference strategy not defined
"
}
if
model
==
"
mistral-small
"
:
valid_choices
=
{
Welfare
.
SELFISH
:
{
"
my_share
"
:
500
,
"
other_share
"
:
100
,
"
lost
"
:
400
},
Welfare
.
ALTRUISTIC
:
{
"
my_share
"
:
100
,
"
other_share
"
:
500
,
"
lost
"
:
400
},
Welfare
.
UTILITARIAN
:
{
"
my_share
"
:
400
,
"
other_share
"
:
300
,
"
lost
"
:
300
},
Welfare
.
EGALITARIAN
:
{
"
my_share
"
:
325
,
"
other_share
"
:
325
,
"
lost
"
:
350
},
}
strategy
=
valid_choices
.
get
(
self
.
preference
)
if
not
strategy
:
raise
ValueError
(
f
"
Unknown preference type
{
self
.
preference
}
"
)
return
{
"
is_consistent
"
:
True
,
**
strategy
}
if
model
==
"
deepseek-r1
"
:
return
{
"
error
"
:
"
Preference strategy not defined
"
}
def
check_consistency
(
self
,
agent_response
:
AgentResponse
)
->
bool
:
def
check_consistency
(
self
,
agent_response
:
AgentResponse
)
->
bool
:
"""
Check if the response aligns with the given preference.
"""
"""
Check if the response aligns with the given preference.
"""
...

@@ -135,8 +186,7 @@ class DictatorSetup:
```python
# Run the async function and return the response
if __name__ == "__main__":
    # Example usage: setting preference to 'Egalitarian'
    preference = Welfare.EGALITARIAN  # Can be Selfish, Altruistic, etc.
    game_agent = DictatorSetup(model="llama3", temperature=0.7, preference=preference, strategy=True)
    response = asyncio.run(game_agent.run())
    print(response)  # Prints detailed response with is_consistent, my_share, other_share, lost, motivations
```
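
For context, a hypothetical driver (not part of this commit; `accuracy` is an illustrative name)
sketches how the per-preference accuracy in the README table could be estimated with this class:
each action-generation experiment is repeated 30 times at temperature 0.7 and the fraction of
consistent responses is reported. It assumes `DictatorSetup` and `Welfare` are importable from this
module and that the corresponding model endpoint is reachable.

```python
# Hypothetical driver (not in the repository): estimate action-generation accuracy
# as the fraction of consistent responses over repeated runs.
import asyncio

async def accuracy(model: str, preference: Welfare, runs: int = 30) -> float:
    consistent = 0
    for _ in range(runs):
        agent = DictatorSetup(model=model, temperature=0.7, preference=preference, strategy=False)
        result = await agent.run()
        consistent += bool(result.get("is_consistent"))
    return consistent / runs

if __name__ == "__main__":
    for pref in (Welfare.SELFISH, Welfare.ALTRUISTIC, Welfare.UTILITARIAN, Welfare.EGALITARIAN):
        print(pref, asyncio.run(accuracy("llama3", pref)))
```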