diff --git a/README.md b/README.md
index 5ca7a283b1299ac70a25e0b27d6a48403a845bec..811716a5b7c6422b6e5dee427ae3d2453c376960 100644
--- a/README.md
+++ b/README.md
@@ -48,14 +48,14 @@ to generative AAMAS. This list is a work in progress and will be regularly updat
   learning models in resource-constrained environments by making these models
   more lightweight without compromising too much on performance.
 
-   **[A survey of quantization methods for efficient neural 
-   network inference](https://www.crcpress.com/Low-Power-Computer-Vision/Gholami-Kim-Dong-Yao-Mahoney-Keutzer/p/book/9780367707095)**  
+   **[A survey of quantization methods for efficient neural
+   network inference](https://www.crcpress.com/Low-Power-Computer-Vision/Gholami-Kim-Dong-Yao-Mahoney-Keutzer/p/book/9780367707095)**
    Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer (2022)
    Published in *Low-Power Computer Vision*, Chapman and Hall/CRC, pp. 291–326.
 
-   **[Knowledge Distillation: A Survey](https://doi.org/10.1007/s11263-021-01453-z)**  
-   Jianping Gou, Baosheng Yu, Stephen J. Maybank, Dacheng Tao (2021) 
-   Published in *International Journal of Computer Vision*, Volume 129, pp. 1789–1819.  
+   **[Knowledge Distillation: A Survey](https://doi.org/10.1007/s11263-021-01453-z)**
+   Jianping Gou, Baosheng Yu, Stephen J. Maybank, Dacheng Tao (2021)
+   Published in *International Journal of Computer Vision*, Volume 129, pp. 1789–1819.
 
 ## Large Language Models
 
@@ -479,6 +479,65 @@ dilemma where aggressive strategies can persist or even dominate.
   Leibo, Michael Luck (2025) Published on arXiv
 
 
+The authors consider LLMs that play finitely repeated games with full
+information and analyze their behavior both against other LLMs and against
+simple, human-like strategies. They find that GPT-4 acts particularly
+unforgivingly in the iterated Prisoner’s Dilemma, defecting for the rest of
+the game once the other agent has defected a single time, and that it fails to
+follow the simple convention of alternating between options in the Battle of
+the Sexes. Both behaviors persist across multiple robustness checks and
+variations of the payoff matrices, and neither is explained by an inability to
+predict the other player’s actions: rather than adjusting to the other player,
+GPT-4 consistently selects its own preferred option, and so fails to
+coordinate with even a simple, human-like agent. These behaviors can be
+modified, however. GPT-4 becomes more forgiving when explicitly reminded that
+the other player might make mistakes, and its coordination improves when it is
+first prompted to predict the other player’s actions before selecting its own.
+Prompting the model to imagine possible actions and their outcomes before
+deciding (sketched below) leads it to alternate far more reliably.
+
+- **[Playing Repeated Games with Large Language Models](https://arxiv.org/abs/2305.16867)**
+  Elif Akata, Lion Schulz, Julian Coda-Forno, Seong Joon Oh, Matthias Bethge, Eric Schulz (2023) Published on arXiv
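+
+A minimal sketch of the predict-then-act prompting described above, assuming a
+generic `llm` callable that maps a prompt string to a completion. The payoff
+values and prompt wording are illustrative placeholders, not the paper’s exact
+protocol.
+
+```python
+# Hypothetical predict-then-act prompt for the iterated Prisoner's Dilemma.
+# `llm` is an assumed stand-in for any prompt -> completion function.
+
+PAYOFFS = {  # (my points, their points), indexed by (my action, their action)
+    ("C", "C"): (8, 8),
+    ("C", "D"): (0, 10),
+    ("D", "C"): (10, 0),
+    ("D", "D"): (5, 5),
+}
+
+def predict_then_act_prompt(history):
+    """Ask for a prediction of the opponent's next move before choosing."""
+    rounds = "\n".join(
+        f"Round {i}: you played {mine}, the other player played {theirs}"
+        for i, (mine, theirs) in enumerate(history, start=1)
+    ) or "No rounds played yet."
+    return (
+        "You are playing a repeated game. Payoffs (you, other): "
+        "C/C -> 8/8, C/D -> 0/10, D/C -> 10/0, D/D -> 5/5.\n"
+        f"{rounds}\n"
+        "First, state what you think the other player will do next and why. "
+        "Then answer on a final line with a single action: C or D."
+    )
+
+def play_round(llm, history):
+    reply = llm(predict_then_act_prompt(history))
+    # Take the last C/D mentioned in the reply as the chosen action.
+    return next((ch for ch in reversed(reply) if ch in "CD"), "C")
+```
+
+Swapping in a Battle of the Sexes payoff matrix turns the same sketch into the
+coordination setting discussed above.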
+
 
 ### Generative MAS on the shelf