diff --git a/README.md b/README.md
index 165e933fcdfe9afcfeab57fa92843673d0f7e272..aa1e4896d51d4b38f6480220da1bbea8a9415fae 100644
--- a/README.md
+++ b/README.md
@@ -430,26 +430,6 @@
 into their own decisions. Despite some being able to identify patterns,
 most fail to translate these beliefs into optimal responses.
 Only <tt>Llama3.3:latest</tt> shows any reliable ability to
 infer and act on opponents’ simple behaviour
-Our findings reveal notable differences in the cognitive capabilities of LLMs
-across multiple dimensions of decision-making.
-<tt>Mistral-Small</tt> demonstrates the highest level of consistency in economic decision-making,
-with <tt>Llama3</tt> showing moderate adherence and </tt>DeepSeek-R1</tt> displaying considerable inconsistency.
-
-<tt>GPT-4.5</tt>, <tt>Llama3</tt>, and <tt>Mistral-Small</tt> generally align well with declared preferences,
-particularly when generating algorithmic strategies rather than isolated one-shot actions.
-These models tend to struggle more with one-shot decision-making, where responses are less structured and
-more prone to inconsistency. In contrast, <tt>DeepSeek-R1</tt> fails to generate valid strategies and
-performs poorly in aligning actions with specified preferences.
-<tt>GPT-4.5</tt> and <tt>Mistral-Small</tt> consistently display rational behavior at both first- and second-order levels.
-<tt>Llama3</tt>, although prone to random behavior when generating strategies, adapts more effectively in one-shot
-decision-making tasks. <tt>DeepSeek-R1</tt> underperforms significantly in both strategic and one-shot formats, rarely
-exhibiting coherent rationality.
-
-All models—regardless of size or architecture—struggle to anticipate or incorporate the behaviors of other agents
-into their own decisions. Despite some being able to identify patterns,
-most fail to translate these beliefs into optimal responses.
-Only <tt>Llama3.3:latest</tt> shows any reliable ability to
-infer and act on opponents’ simple behaviour
 
 ## Authors
 
 Maxime MORGE