From 6013a082918b618f38b959a016efd8ddbae38206 Mon Sep 17 00:00:00 2001
From: mmorge <maxime.morge@univ-lyon1.fr>
Date: Sat, 3 May 2025 17:15:14 +0200
Subject: [PATCH] PyGAAMAS: Update synthesis with Qwen3 outcome

---
 README.md | 36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index 5c47ce9..2db9bd7 100644
--- a/README.md
+++ b/README.md
@@ -438,26 +438,30 @@ incorporate other agents’ actions into their decision-making.
 
 ## Synthesis
 
-Our findings reveal notable differences in the cognitive capabilities of LLMs 
-across multiple dimensions of decision-making.
-<tt>Mistral-Small</tt> demonstrates the highest level of consistency in economic decision-making, 
-with <tt>Llama3</tt> showing moderate adherence and </tt>DeepSeek-R1</tt> displaying considerable inconsistency.
+Our findings reveal notable differences in the cognitive capabilities of LLMs across multiple dimensions of 
+decision-making. <tt>Mistral-Small</tt> demonstrates the highest level of consistency in economic decision-making, 
+with <tt>Llama3</tt> showing moderate adherence and <tt>DeepSeek-R1</tt> displaying considerable inconsistency. 
+<tt>Qwen3</tt> performs moderately well, showing rational behavior but struggling with implicit reasoning.
 
 <tt>GPT-4.5</tt>, <tt>Llama3</tt>, and <tt>Mistral-Small</tt> generally align well with declared preferences, 
-particularly when generating algorithmic strategies rather than isolated one-shot actions. 
-These models tend to struggle more with one-shot decision-making, where responses are less structured and 
-more prone to inconsistency. In contrast, <tt>DeepSeek-R1</tt> fails to generate valid strategies and 
-performs poorly in aligning actions with specified preferences.
-<tt>GPT-4.5</tt> and <tt>Mistral-Small</tt> consistently display rational behavior at both first- and second-order levels.
-<tt>Llama3</tt>, although prone to random behavior when generating strategies, adapts more effectively in one-shot 
-decision-making tasks. <tt>DeepSeek-R1</tt> underperforms significantly in both strategic and one-shot formats, rarely
-exhibiting  coherent rationality.
+particularly when generating algorithmic strategies rather than isolated one-shot actions. These models tend to 
+struggle more with one-shot decision-making, where responses are less structured and more prone to inconsistency.
+In contrast, <tt>DeepSeek-R1</tt> fails to generate valid strategies and performs poorly in aligning actions with 
+specified preferences. <tt>Qwen3</tt> aligns well with utilitarian preferences and moderately with altruistic 
+ones but struggles with egoistic and egalitarian preferences.
+
+<tt>GPT-4.5</tt> and <tt>Mistral-Small</tt> consistently display rational behavior at both 
+first- and second-order levels. <tt>Llama3</tt>, although prone to random behavior when generating strategies, 
+adapts more effectively in one-shot decision-making tasks. <tt>DeepSeek-R1</tt> underperforms significantly 
+in both strategic and one-shot formats, rarely exhibiting coherent rationality. <tt>Qwen3</tt> shows strong 
+first-order rationality when producing actions, especially under explicit or guided conditions, 
+but struggles with deeper inferential reasoning.
 
 All models—regardless of size or architecture—struggle to anticipate or incorporate the behaviors of other agents 
-into their own decisions. Despite some being able to identify patterns, 
-most fail to translate these beliefs into optimal responses. Only <tt>Llama3.3:latest</tt> shows any reliable ability to 
-infer and act on opponents’ simple behaviour
-
+into their own decisions. Although some models can identify patterns, most fail to translate these beliefs 
+into optimal responses. Only <tt>Llama3.3:latest</tt> shows any reliable ability to infer and act on 
+opponents’ simple behavior.
+
 ## Authors
 
 Maxime MORGE
-- 
GitLab