From 904889971324113a0c5e69c00ead7122406dabd1 Mon Sep 17 00:00:00 2001
From: Maxime MORGE <maxime.morge@univ-lille.fr>
Date: Fri, 28 Mar 2025 12:59:14 +0100
Subject: [PATCH] LLM4AAMAS: Add kumar25arxiv

---
 README.md | 102 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 61 insertions(+), 41 deletions(-)

diff --git a/README.md b/README.md
index 983dc2a..eaf8ab7 100644
--- a/README.md
+++ b/README.md
@@ -106,43 +106,6 @@ to generative AAMAS. This list is a work in progress and will be regularly updat
     Machine Translation](https://arxiv.org/abs/1406.1078)** *Kyunghyun Cho,
     Bartvan Merrienboer, Caglar Gulcehre, et al. (2014)* Published on *arXiv*
 
-## Tuning
-
-### Instruction tuning
-
-- The fine-tuning of a pre-trained language model requires significantly fewer
-  data and computational resources, especially when parameter-efficient
-  approaches such as Low-Rank Adaptation (LoRA) are used.
-
-    **[LoRA: Low-Rank Adaptation of Large Language
-    Models](https://arxiv.org/abs/2106.09685)** Edward J. Hu, Yelong Shen,
-    Phillip Wallis, et al. (2021)* Published on *arXiv*
-
-- The apparent mastery of textual understanding by LLMs closely resembles human
-  performance.
-
-    **[Language Models are Few-Shot
-    Learners](https://papers.nips.cc/paper/2020/file/fc2c7f9a3f3f86cde5d8ad2c7f7e57b2-Paper.pdf)**
-    Tom Brown, Benjamin Mann, Nick Ryder, et al. (2020)* Presented at *NeurIPS*
-
-### Alignement tuning
-
-- Instruction tuning aims to bridge the gap between the model’s original
-  objective — generating text — and user expectations, where users want the
-  model to follow their instructions and perform specific tasks.
-
-    **[Training language models to follow instructions with human
-   feedback](https://papers.nips.cc/paper/2022/hash/17f4c5f98073d1fb95f7e53f5c7fdb64-Abstract.html)**
-   *Long Ouyang, Jeffrey Wu, Xu Jiang, et al. (2022)* Presented at *NeurIPS*
-
-- Strong alignment requires cognitive abilities such as understanding and
-  reasoning about agents’ intentions and their ability to causally produce
-  desired effects.
-
-    **[Strong and weak alignment of large language models with human
-    value](https://doi.org/10.1038/s41598-024-70031-3)** Khamassi, M., Nahon, M.
-    & Chatila, R. *Sci Rep** **14**, 19399 (2024).
-
 ## Existing LLMs
 
 Many models are available at the following URLs:
@@ -196,9 +159,66 @@ Many models are available at the following URLs:
   Published in *Advances in Neural Information Processing Systems (NeurIPS
   2023)*
 
-## Prompt engineering
+## Post-training
+
+The survey below explores post-training methodologies for LLMs, including
+fine-tuning, reinforcement learning, and test-time scaling, as well as
+techniques such as LoRA and RAG. Fine-tuning improves task-specific
+performance but risks overfitting and incurs high computational costs.
+Test-Time Scaling (TTS) instead optimizes inference dynamically, without
+updating the model weights, making it suitable for tasks with flexible
+computational budgets. Pretraining and TTS serve different purposes:
+pretraining builds fundamental capabilities through extensive training, while
+TTS improves performance at inference time. Pretraining is crucial for novel
+tasks requiring new skills, whereas TTS is effective when the base model
+already performs reasonably well. A small sketch of one TTS strategy follows
+the reference below.
+
+- **[LLM Post-Training: A Deep Dive into Reasoning Large Language Models](https://arxiv.org/abs/2502.21321)**
+  *Komal Kumar, Tajamul Ashraf, Omkar Thawakar, et al. (2025)* Published on *arXiv*
+
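+To make the contrast with pretraining concrete, the snippet below sketches one
+common test-time scaling strategy, best-of-N sampling: a frozen model generates
+several candidate answers and a scorer keeps the best one. It is only an
+illustrative sketch; `generate` and `score` stand for a hypothetical decoding
+call and a hypothetical verifier or reward model, not any specific API.
+
+```python
+import random
+
+def generate(prompt: str) -> str:
+    """Hypothetical stochastic decoding call to a frozen LLM."""
+    return f"candidate answer {random.randint(0, 999)} for: {prompt}"
+
+def score(prompt: str, answer: str) -> float:
+    """Hypothetical verifier or reward model; higher is better."""
+    return random.random()
+
+def best_of_n(prompt: str, n: int = 8) -> str:
+    # Extra compute is spent at inference time; the model weights never change.
+    candidates = [generate(prompt) for _ in range(n)]
+    return max(candidates, key=lambda a: score(prompt, a))
+
+print(best_of_n("What is 17 * 24?"))
+```
+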
+### Tuning
+
+#### Instruction tuning
+
+- Fine-tuning a pre-trained language model requires significantly less data
+  and compute, especially when parameter-efficient approaches such as Low-Rank
+  Adaptation (LoRA) are used (see the sketch after this list).
+
+    **[LoRA: Low-Rank Adaptation of Large Language
+    Models](https://arxiv.org/abs/2106.09685)** *Edward J. Hu, Yelong Shen,
+    Phillip Wallis, et al. (2021)* Published on *arXiv*
+
+- LLMs exhibit an apparent mastery of textual understanding that closely
+  resembles human performance, learning new tasks from only a few examples.
+
+    **[Language Models are Few-Shot
+    Learners](https://papers.nips.cc/paper/2020/file/fc2c7f9a3f3f86cde5d8ad2c7f7e57b2-Paper.pdf)**
+    *Tom Brown, Benjamin Mann, Nick Ryder, et al. (2020)* Presented at *NeurIPS*
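+
+As referenced above, here is a minimal LoRA sketch, assuming the Hugging Face
+`transformers` and `peft` libraries and `gpt2` as an arbitrary base model; the
+rank, scaling factor, and target module are illustrative choices rather than
+recommendations.
+
+```python
+from transformers import AutoModelForCausalLM
+from peft import LoraConfig, get_peft_model
+
+# Load a base model whose original weights stay frozen during fine-tuning.
+base = AutoModelForCausalLM.from_pretrained("gpt2")
+
+config = LoraConfig(
+    r=8,                        # rank of the low-rank update matrices
+    lora_alpha=16,              # scaling applied to the low-rank update
+    lora_dropout=0.05,
+    target_modules=["c_attn"],  # attention projection module in GPT-2
+    task_type="CAUSAL_LM",
+)
+
+# Only the small adapter matrices are trainable, which is why LoRA needs far
+# less data and compute than full fine-tuning.
+model = get_peft_model(base, config)
+model.print_trainable_parameters()
+```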
+
+#### Alignment tuning
+
+- Instruction tuning aims to bridge the gap between the model’s original
+  objective of generating text and user expectations: users want the model to
+  follow their instructions and perform specific tasks. A toy example of such
+  training data is sketched after this list.
+
+    **[Training language models to follow instructions with human
+    feedback](https://papers.nips.cc/paper/2022/hash/17f4c5f98073d1fb95f7e53f5c7fdb64-Abstract.html)**
+    *Long Ouyang, Jeffrey Wu, Xu Jiang, et al. (2022)* Presented at *NeurIPS*
+
+- Strong alignment requires cognitive abilities such as understanding and
+  reasoning about agents’ intentions and their ability to causally produce
+  desired effects.
+
+    **[Strong and weak alignment of large language models with human
+    values](https://doi.org/10.1038/s41598-024-70031-3)** *Khamassi, M., Nahon,
+    M., & Chatila, R. (2024)* Published in *Scientific Reports*, 14, 19399
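+
+As referenced above, a toy sketch of the two kinds of training records behind
+these papers: supervised instruction-response pairs for instruction tuning,
+and human preference pairs for RLHF-style alignment. The records and field
+names below are invented for illustration; real datasets differ.
+
+```python
+# Supervised instruction tuning: a demonstration of the desired behaviour.
+sft_example = {
+    "instruction": "Summarize the following sentence in five words.",
+    "input": "Large language models can follow natural-language instructions.",
+    "output": "Language models follow natural instructions.",
+}
+
+# RLHF-style alignment: a human preference between two candidate answers,
+# used to train a reward model that then guides policy optimization.
+preference_example = {
+    "prompt": "Explain photosynthesis to a child.",
+    "chosen": "Plants use sunlight to turn air and water into food.",
+    "rejected": "Photosynthesis converts photons into ATP and NADPH.",
+}
+```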
+
+### Prompt engineering
 
-### ICL
+#### ICL
 
 In-context learning involves providing the model with specific information
 without requiring additional training.
@@ -209,7 +229,7 @@ without requiring additional training.
   Methods in Natural Language Processing (EMNLP)* Location: Miami, Florida, USA
   Published by: Association for Computational Linguistics
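+
+A minimal illustration of in-context learning as a few-shot prompt: the task
+is specified entirely through examples placed in the context window, and no
+parameters are updated. The reviews below are invented for illustration; any
+completion endpoint could consume such a prompt.
+
+```python
+# Few-shot prompt: the "training data" lives entirely in the context window.
+few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.
+
+Review: The plot was predictable and the acting was wooden.
+Sentiment: Negative
+
+Review: A moving story with a stunning final act.
+Sentiment: Positive
+
+Review: I checked my watch every five minutes.
+Sentiment:"""
+
+# A frozen model is expected to continue with "Negative" from context alone.
+print(few_shot_prompt)
+```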
 
-### CoT
+#### CoT
 
 Chain-of-thought is a prompting strategy that, instead of being limited to
 input-output pairs, incorporates intermediate reasoning steps that serve as a
@@ -229,7 +249,7 @@ solve problems.
   Survey](https://arxiv.org/abs/2212.10403)** Jie Huang and Kevin Chen-Chuan
   Chang (2023)* Published on *arXiv*
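+
+To make the chain-of-thought idea concrete, here is a small sketch of a prompt
+whose in-context example exposes intermediate reasoning steps before the final
+answer. The arithmetic is invented for illustration; only the shape of the
+prompt matters.
+
+```python
+# Chain-of-thought prompt: the worked example includes intermediate steps,
+# nudging the model to reason step by step on the new question.
+cot_prompt = """Q: A box holds 3 red pens and 5 blue pens. How many pens are in 4 boxes?
+A: One box holds 3 + 5 = 8 pens. Four boxes hold 4 * 8 = 32 pens. The answer is 32.
+
+Q: A shelf has 6 novels and 2 comics. How many books are on 3 shelves?
+A:"""
+
+# Expected continuation: "One shelf holds 6 + 2 = 8 books. Three shelves hold
+# 3 * 8 = 24 books. The answer is 24."
+print(cot_prompt)
+```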
 
-### RAG
+#### RAG
 
 Retrieval-Augmented Generation (RAG) is a prompting strategy that involves
 integrating relevant information from external data sources into the
-- 
GitLab