PyGAAMAS, commit 7e502dab, authored 1 month ago by Maxime Morge
Improve Preference Elicitation
Parent commit: ed1b0be9
Showing 1 changed file: README.md (+19 additions, −33 deletions)
...
...
@@ -60,10 +60,7 @@ inconsistent behavior, with CCEI scores varying widely between 0.15 and 0.83.

## Preferences
To analyse the behaviour of generative agents based on their preferences, we
rely on the dictator game. This variant of the ultimatum game features a single
player, the dictator, who decides how to distribute an endowment (e.g., a sum of
...
...
@@ -79,47 +76,36 @@ preferences to assess their ability to consider them in their decisions.
### Preference Elicitation
Here, we consider that the choice of an LLM as a dictator reflects its intrinsic
preferences. Each LLM is asked to directly produce a one-shot action in the
dictator game. Additionally, we ask the models to generate a strategy in the
form of an algorithm implemented in the <tt>Python</tt> language. In all our
experiments, one-shot actions are repeated 30 times, and the models'
temperature is set to $0.7$.
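For illustration, a minimal sketch of such a one-shot elicitation loop might look as
follows. It assumes an OpenAI-compatible chat client, a JSON-formatted answer, and
placeholder model identifiers; none of this is taken from the repository's actual code.

```python
import json
import statistics

from openai import OpenAI

# Assumption: the models are served behind an OpenAI-compatible chat endpoint
# (for local models, e.g. Ollama's /v1 endpoint).
client = OpenAI()  # e.g. OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

N_RUNS = 30        # one-shot actions are repeated 30 times
TEMPERATURE = 0.7  # temperature used in all experiments

PROMPT = (
    "You are the dictator in a dictator game. Split $100 between yourself and "
    'an anonymous recipient. Answer only with JSON such as {"my_share": 60}.'
)

def elicit_one_shot(model: str, temperature: float = TEMPERATURE) -> list[int]:
    """Collect N_RUNS one-shot dictator decisions (the share kept) for one model."""
    shares = []
    for _ in range(N_RUNS):
        reply = client.chat.completions.create(
            model=model,
            temperature=temperature,
            messages=[{"role": "user", "content": PROMPT}],
        )
        shares.append(int(json.loads(reply.choices[0].message.content)["my_share"]))
    return shares

if __name__ == "__main__":
    for name in ("llama3", "mistral-small", "deepseek-r1"):  # placeholder identifiers
        print(name, "median self-share:", statistics.median(elicit_one_shot(name)))
```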
The figure below presents a violin plot illustrating the share of the
total amount (\$100) that the dictator allocates to themselves for each model.
The median share taken by <tt>GPT-4.5</tt>, <tt>Llama3</tt>, <tt>Mistral-Small</tt>,
and <tt>DeepSeek-R1</tt> through one-shot decisions is \$50, likely due to
corpus-based biases such as term frequency. When we ask the models to generate a
strategy rather than a one-shot action, all models distribute the amount equally,
except <tt>GPT-4.5</tt>, which retains about $70\%$ of the total amount.
Interestingly, under these standard conditions, humans typically keep \$80 on average
(*[Fairness in Simple Bargaining Experiments](https://doi.org/10.1006/game.1994.1021)*,
Forsythe, R., Horowitz, J. L., Savin, N. E., & Sefton, M., Games and Economic
Behavior, 6(3), 347-369, 1994). When the role assigned to the model is that of a
human rather than an assistant agent, only <tt>Llama3</tt> deviates, with a median
share of \$60. Unlike the deterministic strategies generated by LLMs, the
intra-model variability in the generated actions can be used to simulate the
diversity of human behaviours based on their experiences, preferences, or contexts.
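For illustration, the deterministic strategies the models generate typically reduce to
an equal split of the endowment, along the lines of the sketch below; this is an
illustrative reconstruction, not any model's verbatim output.

```python
def dictator_strategy(endowment: int = 100) -> dict[str, int]:
    """Deterministic equal-split strategy, typical of the algorithms the models
    generate; the strategy generated by GPT-4.5 instead keeps roughly 70%."""
    my_share = endowment // 2
    return {"dictator": my_share, "recipient": endowment - my_share}
```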
The figure below illustrates the evolution of the dictator's share as a function of
temperature, with a 95% confidence interval, when we ask each model to generate
decisions.

Our sensitivity analysis of the temperature parameter reveals that the portion
retained by the dictator remains stable. However, the decisions become more
deterministic at low temperatures, whereas allocation diversity increases at
high temperatures, reflecting a more random exploration of available options.
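
One possible way to compute the points behind such a plot is sketched below, reusing
the hypothetical `elicit_one_shot` helper from the earlier sketch; the
normal-approximation interval and the temperature grid are assumptions, not the
repository's actual analysis code.

```python
import statistics

def mean_with_ci(shares: list[int]) -> tuple[float, float, float]:
    """Mean dictator share with a normal-approximation 95% confidence interval."""
    mean = statistics.mean(shares)
    sem = statistics.stdev(shares) / len(shares) ** 0.5
    return mean, mean - 1.96 * sem, mean + 1.96 * sem

# Assumed temperature grid; elicit_one_shot is the hypothetical helper sketched above.
for temperature in (0.0, 0.3, 0.7, 1.0, 1.5):
    mean, low, high = mean_with_ci(elicit_one_shot("llama3", temperature))
    print(f"T={temperature}: dictator keeps {mean:.1f} in [{low:.1f}, {high:.1f}]")
```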

### Preference Alignment
...
...