Commit 7b8f9f33 authored by stephanebonnevay

Readme

parent 53a2ff9a
@@ -429,9 +429,10 @@ For our experiments, we consider two simple models for the opponent where:
We evaluate the models' ability to identify these behavioural patterns by calculating the average number of points earned per round.
Figures present the average points earned and prediction per round (95% confidence interval) for each LLM against the two opponent behavior (constant and alternate) models in the matching pennies game.
Figures present the average points earned and prediction per round (95% confidence interval) for each LLM against the two opponent behavior models (constant and alternate) in the matching pennies game.
Against Constant behavior, <tt>GPT-4.5</tt> and <tt>Qwen3</tt> ...
Against Constant behavior, <tt>GPT-4.5</tt> and <tt>Qwen3</tt> were able to generate a valid strategy. The charts show that they are able to correctly predict their opponent's strategy after just a few rounds. They perfectly identify the fact that their opponent always plays the same move.
The predictions made by <tt>Mistral-Small</tt>, <tt>LLaMA3</tt>, and <tt>DeepSeek-R1</tt> are not incorrect, but the moves played are not in line with these predictions, which leads to a fairly low expected gain.
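
The gap between a correct prediction and a poorly chosen move can be illustrated with a minimal sketch. It is not the code used for the experiments: the payoff rule (1 point when the player's move matches the opponent's, 0 otherwise), the `predict_next` heuristic, the function names, and the normal-approximation confidence interval are all assumptions made for illustration.

```python
import math
import random
from statistics import mean, stdev

MOVES = ("Head", "Tail")


def other(move):
    """Return the opposite move."""
    return "Tail" if move == "Head" else "Head"


def constant_opponent(round_idx, move="Head"):
    """Opponent that always plays the same move."""
    return move


def alternate_opponent(round_idx, first="Head"):
    """Opponent that switches move every round, starting with `first`."""
    return first if round_idx % 2 == 0 else other(first)


def predict_next(history):
    """Hypothetical heuristic: guess the opponent's next move from its past moves."""
    if not history:
        return "Head"
    if len(history) >= 2 and history[-1] != history[-2]:
        return other(history[-1])  # the last two moves differ: assume alternation
    return history[-1]  # otherwise assume the move is repeated


def play(opponent, n_rounds=10, follow_prediction=True, rng=None):
    """Return per-round points and per-round prediction hits (lists of 0/1)."""
    rng = rng or random.Random(0)
    history, points, hits = [], [], []
    for r in range(n_rounds):
        prediction = predict_next(history)
        # A player may predict correctly and still not act on its prediction.
        move = prediction if follow_prediction else rng.choice(MOVES)
        actual = opponent(r)
        points.append(1 if move == actual else 0)  # assumed payoff rule
        hits.append(1 if prediction == actual else 0)
        history.append(actual)
    return points, hits


def mean_ci95(values):
    """Mean and half-width of a normal-approximation 95% confidence interval."""
    m = mean(values)
    if len(values) < 2:
        return m, 0.0
    return m, 1.96 * stdev(values) / math.sqrt(len(values))


if __name__ == "__main__":
    for name, opp in (("constant", constant_opponent), ("alternate", alternate_opponent)):
        for follow in (True, False):
            points, hits = play(opp, n_rounds=10, follow_prediction=follow)
            print(name, follow, mean_ci95(points), mean_ci95(hits))
```

With `follow_prediction=False` the prediction accuracy is unchanged while the points earned depend on random moves, which mirrors the situation described above where predictions are right but the moves do not follow them.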
The models exhibit varied approaches to decision-making in the MP game.
<tt>GPT-4.5</tt> follows a fixed alternating pattern, switching between "Head" and "Tail" each turn, assuming the opponent behaves similarly.