Commit 90488997 authored by Maxime MORGE
LLM4AAMAS: Add kumar25arxiv

parent 7216871c
- **[Learning Phrase Representations using RNN Encoder-Decoder for Statistical
Machine Translation](https://arxiv.org/abs/1406.1078)** *Kyunghyun Cho,
Bart van Merrienboer, Caglar Gulcehre, et al. (2014)* Published on *arXiv*
## Existing LLMs
Many models are available at the following URLs:
Published in *Advances in Neural Information Processing Systems (NeurIPS
2023)*
## Post-training
This survey explores post-training methodologies for LLMs, including
fine-tuning, reinforcement learning, and test-time scaling, together with
efficiency techniques such as LoRA and RAG. Fine-tuning improves task-specific
performance but risks overfitting and incurs high computational costs.
Test-Time Scaling (TTS) dynamically allocates additional computation at
inference, without updating the model, making it suitable for tasks with
flexible computational budgets. Pretraining and TTS serve different purposes:
pretraining builds fundamental capabilities through extensive training, while
TTS improves performance at inference time. Pretraining is crucial for novel
tasks requiring new skills, whereas TTS is effective when base models already
perform reasonably well (a minimal best-of-N sampling sketch follows the
reference below).
- **[LLM Post-Training: A Deep Dive into Reasoning Large Language Models](https://arxiv.org/abs/2502.21321)**
*Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham
Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Philip H. S. Torr, Salman Khan,
Fahad Shahbaz Khan (2025)* Published on *arXiv*
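As a rough illustration of the test-time scaling idea summarized above, the
following sketch implements best-of-N sampling in Python. The `generate` and
`score` callables are placeholders for a model's sampling function and a
verifier or reward score; they are illustrative assumptions, not part of any
specific library.

```python
# Minimal sketch of best-of-N sampling, one simple form of test-time scaling.
# `generate` and `score` are placeholder callables (assumptions), not a real API.
from typing import Callable, List, Tuple

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 8) -> Tuple[str, float]:
    """Sample n candidate answers and return the highest-scoring one.

    The base model is never updated; extra compute is spent only at
    inference time, which is the core idea behind test-time scaling.
    """
    candidates: List[Tuple[str, float]] = []
    for _ in range(n):
        answer = generate(prompt)  # one stochastic sample from the model
        candidates.append((answer, score(prompt, answer)))
    return max(candidates, key=lambda pair: pair[1])
```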
### Tuning
#### Instruction tuning
- The fine-tuning of a pre-trained language model requires significantly less
data and fewer computational resources, especially when parameter-efficient
approaches such as Low-Rank Adaptation (LoRA) are used (see the sketch after
this list).
**[LoRA: Low-Rank Adaptation of Large Language
Models](https://arxiv.org/abs/2106.09685)** *Edward J. Hu, Yelong Shen,
Phillip Wallis, et al. (2021)* Published on *arXiv*
- The apparent mastery of textual understanding by LLMs closely resembles human
performance.
**[Language Models are Few-Shot
Learners](https://papers.nips.cc/paper/2020/file/fc2c7f9a3f3f86cde5d8ad2c7f7e57b2-Paper.pdf)**
*Tom Brown, Benjamin Mann, Nick Ryder, et al. (2020)* Presented at *NeurIPS*
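The sketch below illustrates the LoRA idea cited in the first entry of this
list: a frozen pre-trained linear layer is augmented with a trainable low-rank
update `B @ A`, so only a small fraction of the parameters are trained. It is a
minimal PyTorch illustration under simplifying assumptions, not the reference
implementation from the paper.

```python
# Minimal LoRA-style adapter: freeze the base weights, learn a low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # the pre-trained weights stay frozen
            p.requires_grad = False
        # A starts as small Gaussian noise and B as zeros, so the adapted
        # layer is initially identical to the base layer.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = base(x) + scaling * x (BA)^T
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

# Usage: wrap an existing projection layer of a pre-trained model.
layer = LoRALinear(nn.Linear(768, 768), r=8)
out = layer(torch.randn(2, 768))
```

Only `A` and `B` receive gradients, which is why fine-tuning with LoRA needs
far less memory and compute than updating the full weight matrix.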
#### Alignment tuning
- Instruction tuning aims to bridge the gap between the model’s original
objective of generating text and users’ expectations that the model follow
their instructions and perform specific tasks (a reward-model sketch follows
this list).
**[Training language models to follow instructions with human
feedback](https://papers.nips.cc/paper/2022/hash/17f4c5f98073d1fb95f7e53f5c7fdb64-Abstract.html)**
*Long Ouyang, Jeffrey Wu, Xu Jiang, et al. (2022)* Presented at *NeurIPS*
- Strong alignment requires cognitive abilities such as understanding and
reasoning about agents’ intentions and their ability to causally produce
desired effects.
**[Strong and weak alignment of large language models with human
values](https://doi.org/10.1038/s41598-024-70031-3)**
*Khamassi, M., Nahon, M. & Chatila, R. (2024)* Published in *Scientific
Reports*, 14, 19399.
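One concrete ingredient of the human-feedback pipeline cited above (Ouyang et
al., 2022) is a reward model trained on human preference pairs. The sketch
below shows the standard pairwise loss `-log sigmoid(r(chosen) - r(rejected))`
in PyTorch; the tiny `RewardModel` and the random encodings are illustrative
assumptions, not the architecture used in the paper.

```python
# Minimal reward-model training step for RLHF-style alignment tuning.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps an (already encoded) response representation to a scalar reward."""
    def __init__(self, hidden: int = 768):
        super().__init__()
        self.head = nn.Linear(hidden, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.head(h).squeeze(-1)

def preference_loss(rm: RewardModel,
                    h_chosen: torch.Tensor,
                    h_rejected: torch.Tensor) -> torch.Tensor:
    # Encourage a higher reward for the human-preferred response.
    return -F.logsigmoid(rm(h_chosen) - rm(h_rejected)).mean()

# Toy usage with random encodings standing in for real hidden states.
rm = RewardModel()
loss = preference_loss(rm, torch.randn(4, 768), torch.randn(4, 768))
loss.backward()
```

The trained reward model then scores candidate responses so that the language
model can be optimized (e.g., with PPO) to follow instructions as humans
prefer.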
### Prompt engineering
#### ICL
In-context learning (ICL) involves providing the model with task
demonstrations or relevant information directly in the prompt, without
requiring any additional training (a minimal sketch follows the reference
below).
Methods in Natural Language Processing (EMNLP)* Location: Miami, Florida, USA
Published by: Association for Computational Linguistics
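A minimal sketch of in-context learning as described above: task
demonstrations are embedded directly in the prompt and no parameters are
updated. The `call_llm` function mentioned in the comments is a placeholder
for any completion API, not a specific library call.

```python
# Few-shot prompt construction: the "learning" happens entirely in context.
FEW_SHOT_EXAMPLES = [
    ("I loved this movie, it was brilliant.", "positive"),
    ("The plot was dull and the acting was worse.", "negative"),
]

def build_icl_prompt(new_input: str) -> str:
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {new_input}\nSentiment:")
    return "\n".join(lines)

prompt = build_icl_prompt("An instant classic, I would watch it again.")
# answer = call_llm(prompt)  # placeholder: the model should complete "positive"
print(prompt)
```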
#### CoT
Chain-of-thought is a prompting strategy that, instead of being limited to
input-output pairs, incorporates intermediate reasoning steps that serve as a
guide for how to solve problems (a minimal sketch follows the reference
below).
- **[Towards Reasoning in Large Language Models: A
Survey](https://arxiv.org/abs/2212.10403)** *Jie Huang and Kevin Chen-Chuan
Chang (2023)* Published on *arXiv*
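A minimal sketch of chain-of-thought prompting as described above: the
in-context example contains intermediate reasoning steps rather than a bare
input-output pair. As before, `call_llm` is a placeholder assumption for any
completion API.

```python
# Chain-of-thought prompt: the demonstration spells out the reasoning steps.
COT_EXAMPLE = (
    "Q: A farmer has 3 pens with 4 sheep each and buys 5 more sheep. "
    "How many sheep does he have?\n"
    "A: Let's think step by step. 3 pens with 4 sheep each is 3 * 4 = 12 sheep. "
    "Buying 5 more gives 12 + 5 = 17. The answer is 17.\n"
)

def build_cot_prompt(question: str) -> str:
    return COT_EXAMPLE + f"Q: {question}\nA: Let's think step by step."

prompt = build_cot_prompt(
    "A library has 7 shelves with 30 books each and receives 40 new books. "
    "How many books does it have?"
)
# answer = call_llm(prompt)  # placeholder: the reasoning trace precedes the answer
print(prompt)
```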
#### RAG
Retrieval-Augmented Generation (RAG) is a prompting strategy that involves
integrating relevant information retrieved from external data sources into the
prompt at inference time.
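A minimal sketch of the RAG pattern described above: relevant passages are
retrieved from an external store and prepended to the prompt. The
keyword-overlap retriever and the `call_llm` placeholder are simplifying
assumptions; practical systems typically rely on dense vector search.

```python
# Toy RAG pipeline: retrieve supporting passages, then condition generation on them.
from typing import List

DOCUMENTS = [
    "LoRA adds trainable low-rank matrices to frozen pre-trained weights.",
    "Chain-of-thought prompting elicits intermediate reasoning steps.",
    "Retrieval-Augmented Generation grounds answers in external documents.",
]

def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Rank documents by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]

def build_rag_prompt(question: str) -> str:
    context = "\n".join(f"- {d}" for d in retrieve(question, DOCUMENTS))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_rag_prompt("What does Retrieval-Augmented Generation do?")
# answer = call_llm(prompt)  # placeholder: generation is grounded in retrieved text
print(prompt)
```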