diff --git a/README.md b/README.md
index 825f4bd1d0e2b6d17757353e3161a4524c5ebc2c..dfac7907a50d4bbde3dae3b3ffcdbe0a2a059d0c 100644
--- a/README.md
+++ b/README.md
@@ -429,10 +429,9 @@ For our experiments, we consider two simple models for the opponent where:
 We evaluate the models' ability to identify these behavioural patterns by calculating the average number of points earned per round.
-Figures present the average points earned per round and the 95% confidence interval for each LLM against the two opponent behavior
-models in the matching pennies game, whether the LLM generates a strategy or one-shot actions.
+Figures present the average points earned and prediction per round (95% confidence interval) for each LLM against the two opponent behavior (constant and alternate) models in the matching pennies game.
-...
+Against Constant behavior, <tt>GPT-4.5</tt> and <tt>Qwen3</tt> ...
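
Reviewer note: the metric described in the changed README text (average points earned per round with a 95% confidence interval, against a constant and an alternating opponent) could be computed roughly as in the sketch below. This is not the repository's code. The function names are hypothetical, the payoff of 1 point per correct prediction and 0 otherwise is an assumption, and the interval uses a normal approximation that may differ from what the authors used.

```python
import math
import statistics


def constant_opponent(rounds, move="H"):
    """Hypothetical opponent that plays the same move every round."""
    return [move] * rounds


def alternating_opponent(rounds, first="H"):
    """Hypothetical opponent that alternates between the two moves."""
    other = "T" if first == "H" else "H"
    return [first if i % 2 == 0 else other for i in range(rounds)]


def score(predictions, opponent_moves):
    """Assumed payoff: 1 point when the prediction matches the opponent, else 0."""
    return [1 if p == o else 0 for p, o in zip(predictions, opponent_moves)]


def mean_with_ci95(points):
    """Return (mean, half_width) using a normal-approximation 95% interval."""
    n = len(points)
    mean = statistics.fmean(points)
    if n < 2:
        return mean, 0.0  # no spread estimate is possible from a single round
    half_width = 1.96 * statistics.stdev(points) / math.sqrt(n)
    return mean, half_width


if __name__ == "__main__":
    # A player that always predicts "H", scored against both opponent models.
    always_h = ["H"] * 10
    for name, moves in [
        ("constant", constant_opponent(10)),
        ("alternate", alternating_opponent(10)),
    ]:
        mean, hw = mean_with_ci95(score(always_h, moves))
        print(f"{name}: {mean:.2f} ± {hw:.2f}")
```

With few rounds, a Student's t quantile or a bootstrap interval would be a more careful choice than the fixed 1.96 multiplier.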