PyGAAMAS: Update synthesis with Qwen3 outcome

6013a082 · Maxime Morge · d4ecd14a · 6013a082
Commit 6013a082 authored 1 month ago by Maxime Morge
--- a/README.md
+++ b/README.md
@@ -438,26 +438,29 @@ incorporate other agents’ actions into their decision-making.
 ## Synthesis
-Our findings reveal notable differences in the cognitive capabilities of LLMs 
+Our findings reveal notable differences in the cognitive capabilities of LLMs across multiple dimensions of 
-across multiple dimensions of decision-making.
+decision-making. <tt>Mistral-Small</tt> demonstrates the highest level of consistency in economic decision-making, 
-<tt>Mistral-Small</tt> demonstrates the highest level of consistency in economic decision-making, 
+with <tt>Llama3</tt> showing moderate adherence and <tt>DeepSeek-R1</tt> displaying considerable inconsistency. 
-with <tt>Llama3</tt> showing moderate adherence and </tt>DeepSeek-R1</tt> displaying considerable inconsistency.
+<tt>Qwen3</tt> performs moderately well, showing rational behavior but struggling with implicit reasoning.
 <tt>GPT-4.5</tt>, <tt>Llama3</tt>, and <tt>Mistral-Small</tt> generally align well with declared preferences, 
-particularly when generating algorithmic strategies rather than isolated one-shot actions. 
+particularly when generating algorithmic strategies rather than isolated one-shot actions. These models tend to 
-These models tend to struggle more with one-shot decision-making, where responses are less structured and 
+struggle more with one-shot decision-making, where responses are less structured and more prone to inconsistency.
-more prone to inconsistency. In contrast, <tt>DeepSeek-R1</tt> fails to generate valid strategies and 
+In contrast, <tt>DeepSeek-R1</tt> fails to generate valid strategies and performs poorly in aligning actions with 
-performs poorly in aligning actions with specified preferences.
+specified preferences. <tt>Qwen3</tt> aligns well with utilitarian preferences and moderately with altruistic 
-<tt>GPT-4.5</tt> and <tt>Mistral-Small</tt> consistently display rational behavior at both first- and second-order levels.
+ones but struggles with egoistic and egalitarian preferences.
-<tt>Llama3</tt>, although prone to random behavior when generating strategies, adapts more effectively in one-shot 
-decision-making tasks. <tt>DeepSeek-R1</tt> underperforms significantly in both strategic and one-shot formats, rarely
+<tt>GPT-4.5</tt> and <tt>Mistral-Small</tt> consistently display rational behavior at both 
-exhibiting  coherent rationality.
+first- and second-order levels. <tt>Llama3</tt>, although prone to random behavior when generating strategies, 
+adapts more effectively in one-shot decision-making tasks. <tt>DeepSeek-R1</tt> underperforms significantly 
+in both strategic and one-shot formats, rarely exhibiting coherent rationality. <tt>Qwen3</tt> shows strong 
+first-order rationality when producing actions, especially under explicit or guided conditions, 
+but struggles with deeper inferential reasoning.
 All models—regardless of size or architecture—struggle to anticipate or incorporate the behaviors of other agents 
-into their own decisions. Despite some being able to identify patterns, 
+into their own decisions. Despite some being able to identify patterns, most fail to translate these beliefs 
-most fail to translate these beliefs into optimal responses. Only <tt>Llama3.3:latest</tt> shows any reliable ability to 
+into optimal responses. Only <tt>Llama3.3:latest</tt> shows any reliable ability to infer and act on 
-infer and act on opponents’ simple behaviour
+opponents’ simple behavior.
 ## Authors
 Maxime MORGE