diff --git a/README.md b/README.md
index aa1e4896d51d4b38f6480220da1bbea8a9415fae..165e933fcdfe9afcfeab57fa92843673d0f7e272 100644
--- a/README.md
+++ b/README.md
@@ -430,6 +430,21 @@ into their own decisions. Despite some being able to identify patterns,
 most fail to translate these beliefs into optimal responses. Only <tt>Llama3.3:latest</tt> shows any reliable ability to
 infer and act on opponents’ simple behaviour
 
+Our findings reveal notable differences in the cognitive capabilities of LLMs
+across multiple dimensions of decision-making.
+<tt>Mistral-Small</tt> demonstrates the highest level of consistency in economic decision-making,
+with <tt>Llama3</tt> showing moderate consistency and <tt>DeepSeek-R1</tt> displaying considerable inconsistency.
+
+<tt>GPT-4.5</tt>, <tt>Llama3</tt>, and <tt>Mistral-Small</tt> generally align well with declared preferences,
+particularly when generating algorithmic strategies rather than isolated one-shot actions.
+These models tend to struggle more with one-shot decision-making, where responses are less structured and
+more prone to inconsistency. In contrast, <tt>DeepSeek-R1</tt> fails to generate valid strategies and
+performs poorly in aligning actions with specified preferences.
+<tt>GPT-4.5</tt> and <tt>Mistral-Small</tt> consistently display rational behaviour at both first- and second-order levels.
+<tt>Llama3</tt>, although prone to random behaviour when generating strategies, adapts more effectively in one-shot
+decision-making tasks. <tt>DeepSeek-R1</tt> underperforms significantly in both strategic and one-shot formats, rarely
+exhibiting coherent rationality.
+
 ## Authors
 
 Maxime MORGE