Commit 1aa713d7 authored by Maxime Morge

Improve synthesis

parent d84341aa
......@@ -430,26 +430,6 @@ into their own decisions. Despite some being able to identify patterns,
most fail to translate these beliefs into optimal responses. Only <tt>Llama3.3:latest</tt> shows any reliable ability to
infer and act on opponents’ simple behaviour.
Our findings reveal notable differences in the cognitive capabilities of LLMs
across multiple dimensions of decision-making.
<tt>Mistral-Small</tt> demonstrates the highest level of consistency in economic decision-making,
with <tt>Llama3</tt> showing moderate adherence and <tt>DeepSeek-R1</tt> displaying considerable inconsistency.
<tt>GPT-4.5</tt>, <tt>Llama3</tt>, and <tt>Mistral-Small</tt> generally align well with declared preferences,
particularly when generating algorithmic strategies rather than isolated one-shot actions.
These models tend to struggle more with one-shot decision-making, where responses are less structured and
more prone to inconsistency. In contrast, <tt>DeepSeek-R1</tt> fails to generate valid strategies and
performs poorly in aligning actions with specified preferences.
<tt>GPT-4.5</tt> and <tt>Mistral-Small</tt> consistently display rational behavior at both first- and second-order levels.
<tt>Llama3</tt>, although prone to random behavior when generating strategies, adapts more effectively in one-shot
decision-making tasks. <tt>DeepSeek-R1</tt> underperforms significantly in both strategic and one-shot formats, rarely
exhibiting coherent rationality.
All models—regardless of size or architecture—struggle to anticipate or incorporate the behaviors of other agents
into their own decisions. Despite some being able to identify patterns,
most fail to translate these beliefs into optimal responses. Only <tt>Llama3.3:latest</tt> shows any reliable ability to
infer and act on opponents’ simple behaviour.
## Authors
Maxime MORGE
......