diff --git a/README.md b/README.md
index 14ea787609fb9b90c0c8ed58e9a0717bbc06f51c..fc2ce228d5db7784da84299ce19cbacc14171d76 100644
--- a/README.md
+++ b/README.md
@@ -60,10 +60,7 @@ inconsistent behavior, with CCEI scores varying widely between 0.15 and 0.83.
 
 
 
-
-
 ## Preferences
-
 To analyse the behaviour of generative agents based on their preferences, we
 rely on the dictator game. This variant of the ultimatum game features a single
 player, the dictator, who decides how to distribute an endowment (e.g., a sum of
@@ -79,47 +76,36 @@ preferences to assess their ability to consider them in their decisions.
 
 ### Preference Elicitation
 
 Here, we consider that the choice of an LLM as a dictator reflects its intrinsic
-preferences. Each LLM was asked to directly produce a one-shot action in the
+preferences. Each LLM is asked to directly produce a one-shot action in the
 dictator game. Additionally, we also asked the models to generate a strategy in
-the form of an algorithm implemented in the Python language. In all our
+the form of an algorithm implemented in the <tt>Python</tt> language. In all our
 experiments, one-shot actions are repeated 30 times, and the models' temperature
-is set to 0.7
-
-Figure below presents a violin plot illustrating the share of the
-total amount (100) that the dictator allocates to themselves for each model.
-The median share taken by GPT-4.5, Llama3, Mistral-Small, and DeepSeek-R1
-through one-shot decisions is 50.
+is set to $0.7$.
+
+The figure below presents a violin plot illustrating the share of the
+total amount (\$100) that the dictator allocates to themselves for each model.
+The median share taken by <tt>GPT-4.5</tt>, <tt>Llama3</tt>,
+<tt>Mistral-Small</tt>, and <tt>DeepSeek-R1</tt> through one-shot decisions is
+\$50, likely due to corpus-based biases such as term frequency. When we ask the
+models to generate a strategy rather than a one-shot action, all models
+distribute the amount equally, except <tt>GPT-4.5</tt>, which retains about
+$70\%$ of the total amount. Interestingly, under these standard conditions,
+humans typically keep \$80 on average. When the role
+assigned to the model is that of a human rather than an assistant agent, only
+<tt>Llama3</tt> deviates with a median share of \$60. Unlike the deterministic strategies
+generated by LLMs, the intra-model variability in generated actions can be used
+to simulate the diversity of human behaviours based on their experiences,
+preferences, or contexts.
 
-When we ask the models to generate a strategy rather than a one-shot action, all
-models distribute the amount equally, except GPT-4.5, which retains
-about 70 % of the total amount. Interestingly, under these standard
-conditions, humans typically keep 80 on average.
-
-*[Fairness in Simple Bargaining Experiments](https://doi.org/10.1006/game.1994.1021)*
-Forsythe, R., Horowitz, J. L., Savin, N. E., & Sefton, M.
-Games and Economic Behavior, 6(3), 347-369. 1994.
-
-When the role assigned to the model is that of a human rather than an assistant agent,
-only Llama3 deviates with a median share of $60.
-
-Unlike the deterministic strategies generated by LLMs, the intra-model variability in
-generated actions can be used to simulate the diversity of human behaviours based
-on their experiences, preferences, or contexts.
-
-Figure below illustrates the evolution of the dictator's share
-as a function of temperature with a 95 % confidence interval when we ask each
-models to generate decisions.
-
-
-
 Our sensitivity analysis of the temperature parameter reveals that the portion
 retained by the dictator remains stable. However, the decisions become more
 deterministic at low temperatures, whereas allocation diversity increases at
 high temperatures, reflecting a more random exploration of available options.
+
 ### Preference alignment
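As an aside on the strategy-elicitation part of the protocol described in the updated text above: the deterministic strategies the models return when asked for a Python algorithm generally reduce to an equal split of the endowment. A minimal sketch of what such a generated strategy could look like is given below; the function name and signature are illustrative assumptions, not code taken from the repository or from any model's actual output.

```python
# Illustrative sketch of the kind of deterministic dictator-game strategy most
# models generate when asked for a Python algorithm: an equal split.
# The name and signature are assumptions for illustration only.
def dictator_strategy(endowment: float = 100.0) -> tuple[float, float]:
    """Return (dictator_share, recipient_share) for a given endowment."""
    dictator_share = endowment / 2  # equal split, as most models propose
    recipient_share = endowment - dictator_share
    return dictator_share, recipient_share


print(dictator_strategy(100.0))  # (50.0, 50.0)
```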
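Similarly, the one-shot elicitation and temperature sweep can be pictured with the self-contained sketch below. `query_dictator_share` is a hypothetical stub standing in for the real model call, and the values it returns are simulated rather than experimental results; only the structure (30 repetitions per setting, the median share, and a 95% confidence interval) mirrors the protocol described above.

```python
import random
import statistics


def query_dictator_share(model: str, temperature: float) -> float:
    """Hypothetical stand-in for one one-shot dictator decision by an LLM.

    The real experiments query the model's API; this stub returns a simulated
    value so the sketch stays self-contained and runnable.
    """
    return random.gauss(50, 5 * temperature)  # simulated response, not real data


def elicit(model: str, temperature: float = 0.7, repetitions: int = 30):
    """Repeat the one-shot elicitation and summarise the dictator's share."""
    shares = [query_dictator_share(model, temperature) for _ in range(repetitions)]
    median_share = statistics.median(shares)
    mean_share = statistics.mean(shares)
    # 95% confidence interval under a normal approximation.
    half_width = 1.96 * statistics.stdev(shares) / len(shares) ** 0.5
    return median_share, (mean_share - half_width, mean_share + half_width)


# Sensitivity analysis: sweep the temperature and report the median share and CI.
for temperature in (0.0, 0.3, 0.7, 1.0, 1.5):
    median_share, ci = elicit("some-model", temperature)
    print(f"T={temperature}: median={median_share:.1f}, 95% CI=({ci[0]:.1f}, {ci[1]:.1f})")
```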