@@ -429,9 +429,10 @@ For our experiments, we consider two simple models for the opponent where:
We evaluate the models' ability to identify these behavioral patterns by computing the average number of points earned per round.
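The sketch below shows how the average-points-per-round metric could be computed against the two opponent models. It rests on several assumptions that are not specified in this section. The evaluated player is treated as the "matcher" and earns 1 point when the moves match and 0 otherwise. The hypothetical `alternating_opponent` starts with "Head". The two players, `predict_next` (a simple pattern detector that plays its prediction) and `fixed_alternator` (which ignores the opponent), are illustrative stand-ins and are not the LLM prompts or strategies used in the experiments.

```python
from typing import Callable, List

HEAD, TAIL = "Head", "Tail"

def constant_opponent(round_idx: int) -> str:
    """Hypothetical 'constant' model: plays the same move every round."""
    return HEAD

def alternating_opponent(round_idx: int) -> str:
    """Hypothetical 'alternate' model: switches move each round, starting with Head."""
    return HEAD if round_idx % 2 == 0 else TAIL

def payoff(player_move: str, opponent_move: str) -> int:
    """Assumed scoring: the player is the matcher, 1 point on a match, else 0."""
    return 1 if player_move == opponent_move else 0

def predict_next(history: List[str]) -> str:
    """Predict the opponent's next move from its past moves and play that move."""
    if not history:
        return HEAD  # arbitrary opening guess
    if len(history) >= 2 and history[-1] != history[-2]:
        # The last two moves differ: assume the opponent keeps alternating.
        return TAIL if history[-1] == HEAD else HEAD
    return history[-1]  # Otherwise assume the opponent repeats its last move.

def fixed_alternator(history: List[str]) -> str:
    """Ignore the opponent and alternate Head/Tail, starting with Head."""
    return HEAD if len(history) % 2 == 0 else TAIL

def average_points(player: Callable[[List[str]], str],
                   opponent: Callable[[int], str],
                   n_rounds: int) -> float:
    """Average points per round earned by `player` against `opponent`."""
    history: List[str] = []  # opponent moves observed so far
    total = 0
    for t in range(n_rounds):
        move = player(history)
        opp_move = opponent(t)
        total += payoff(move, opp_move)
        history.append(opp_move)
    return total / n_rounds

if __name__ == "__main__":
    for name, opp in [("constant", constant_opponent),
                      ("alternate", alternating_opponent)]:
        print(name,
              average_points(predict_next, opp, 10),
              average_points(fixed_alternator, opp, 10))
    # prints:
    # constant 1.0 0.5
    # alternate 0.9 1.0
```

Under these assumptions, a player that detects the pattern loses at most its first guesses. A fixed alternating strategy scores perfectly against an alternating opponent but only half the time against a constant one.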
The figures present the average points earned and the predictions per round (with 95% confidence intervals) for each LLM against the two opponent behavior models (constant and alternate) in the matching pennies game.
Against the constant behavior, <tt>GPT-4.5</tt> and <tt>Qwen3</tt> were able to generate a valid strategy. The charts show that both models correctly predict their opponent's strategy after only a few rounds: they identify that the opponent always plays the same move.
The predictions made by <tt>Mistral-Small</tt>, <tt>LLaMA3</tt>, and <tt>DeepSeek-R1</tt> are essentially correct, but the moves these models play are inconsistent with their own predictions, which leads to a fairly low expected gain.
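The gap between correct predictions and low gains can be made explicit by scoring prediction accuracy and action consistency separately. The following is a minimal sketch that assumes each round records the model's predicted opponent move and its played move. The helper names are hypothetical. The payoff convention (the matcher wins on equal moves, the mismatcher on different moves) is also an assumption, because this section does not state which role the LLM plays.

```python
from typing import List, Sequence

HEAD, TAIL = "Head", "Tail"

def best_response(predicted: str, role: str = "matcher") -> str:
    """Move maximizing the payoff given a predicted opponent move (assumed roles)."""
    if role == "matcher":
        return predicted
    if role == "mismatcher":
        return TAIL if predicted == HEAD else HEAD
    raise ValueError(f"unknown role: {role}")

def _rate(flags: List[bool]) -> float:
    """Fraction of True values; rejects empty input."""
    if not flags:
        raise ValueError("empty sequence")
    return sum(flags) / len(flags)

def prediction_accuracy(predictions: Sequence[str],
                        opponent_moves: Sequence[str]) -> float:
    """Fraction of rounds where the predicted opponent move was correct."""
    return _rate([p == o for p, o in zip(predictions, opponent_moves)])

def consistency_rate(predictions: Sequence[str], moves: Sequence[str],
                     role: str = "matcher") -> float:
    """Fraction of rounds where the played move best-responds to the model's own prediction."""
    return _rate([m == best_response(p, role) for p, m in zip(predictions, moves)])

def points_per_round(moves: Sequence[str], opponent_moves: Sequence[str],
                     role: str = "matcher") -> float:
    """Average points per round: 1 for a win, 0 for a loss (assumed scoring)."""
    if role == "matcher":
        return _rate([m == o for m, o in zip(moves, opponent_moves)])
    return _rate([m != o for m, o in zip(moves, opponent_moves)])

if __name__ == "__main__":
    opponent = [HEAD] * 6              # constant opponent
    predictions = [HEAD] * 6           # every prediction is correct
    moves = [HEAD, TAIL] * 3           # moves ignore the predictions half the time
    print(prediction_accuracy(predictions, opponent))  # 1.0
    print(consistency_rate(predictions, moves))        # 0.5
    print(points_per_round(moves, opponent))           # 0.5
```

In this toy case, perfect prediction accuracy combined with 50% consistency yields only 0.5 points per round. This is the kind of mismatch that the text describes qualitatively.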
The models exhibit varied approaches to decision-making in the MP game.
<tt>GPT-4.5</tt> follows a fixed alternating pattern, switching between "Head" and "Tail" each turn, assuming the opponent behaves similarly.