From 904889971324113a0c5e69c00ead7122406dabd1 Mon Sep 17 00:00:00 2001
From: Maxime MORGE <maxime.morge@univ-lille.fr>
Date: Fri, 28 Mar 2025 12:59:14 +0100
Subject: [PATCH] LLM4AAMAS: Add kumar25arxiv

---
 README.md | 102 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 61 insertions(+), 41 deletions(-)

diff --git a/README.md b/README.md
index 983dc2a..eaf8ab7 100644
--- a/README.md
+++ b/README.md
@@ -106,43 +106,6 @@ to generative AAMAS. This list is a work in progress and will be regularly updat
   Machine Translation](https://arxiv.org/abs/1406.1078)** *Kyunghyun Cho, Bart
   van Merrienboer, Caglar Gulcehre, et al. (2014)* Published on *arXiv*
 
-## Tuning
-
-### Instruction tuning
-
-- The fine-tuning of a pre-trained language model requires significantly fewer
-  data and computational resources, especially when parameter-efficient
-  approaches such as Low-Rank Adaptation (LoRA) are used.
-
-  **[LoRA: Low-Rank Adaptation of Large Language
-  Models](https://arxiv.org/abs/2106.09685)** Edward J. Hu, Yelong Shen,
-  Phillip Wallis, et al. (2021)* Published on *arXiv*
-
-- The apparent mastery of textual understanding by LLMs closely resembles human
-  performance.
-
-  **[Language Models are Few-Shot
-  Learners](https://papers.nips.cc/paper/2020/file/fc2c7f9a3f3f86cde5d8ad2c7f7e57b2-Paper.pdf)**
-  Tom Brown, Benjamin Mann, Nick Ryder, et al. (2020)* Presented at *NeurIPS*
-
-### Alignement tuning
-
-- Instruction tuning aims to bridge the gap between the model’s original
-  objective — generating text — and user expectations, where users want the
-  model to follow their instructions and perform specific tasks.
-
-  **[Training language models to follow instructions with human
-  feedback](https://papers.nips.cc/paper/2022/hash/17f4c5f98073d1fb95f7e53f5c7fdb64-Abstract.html)**
-  *Long Ouyang, Jeffrey Wu, Xu Jiang, et al. (2022)* Presented at *NeurIPS*
-
-- Strong alignment requires cognitive abilities such as understanding and
-  reasoning about agents’ intentions and their ability to causally produce
-  desired effects.
-
-  **[Strong and weak alignment of large language models with human
-  value](https://doi.org/10.1038/s41598-024-70031-3)** Khamassi, M., Nahon, M.
-  & Chatila, R. *Sci Rep** **14**, 19399 (2024).
-
 ## Existing LLMs
 
 Many models are available at the following URLs:
@@ -196,9 +159,66 @@ Many models are available at the following URLs:
   Published in *Advances in Neural Information Processing Systems (NeurIPS 2023)*
 
-## Prompt engineering
+## Post-training
+
+
+The survey below explores post-training methodologies for LLMs, including
+fine-tuning, reinforcement learning, and scaling techniques like LoRA and RAG.
+Fine-tuning improves task-specific performance but risks overfitting and high
+computational costs. Test-Time Scaling (TTS) optimizes inference dynamically
+without updating the model, making it suitable for tasks with flexible
+computational budgets. Pretraining and TTS serve different purposes—pretraining
+enhances fundamental capabilities through extensive training, while TTS improves
+performance at inference time. Pretraining is crucial for novel tasks requiring
+new skills, whereas TTS is effective when base models already perform reasonably
+well. A minimal LoRA sketch is given after the reference below.
+
+- **[LLM Post-Training: A Deep Dive into Reasoning Large Language Models](https://arxiv.org/abs/2502.21321)**
+  *Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham
+  Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H. S. Torr, Salman Khan,
+  Fahad Shahbaz Khan (2025)* Published on *arXiv* (cs.CL).
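+
+As a rough illustration of the parameter-efficient fine-tuning discussed above,
+the sketch below shows the core mechanism of LoRA in plain PyTorch: the
+pre-trained weights are frozen and only a small low-rank correction is trained.
+It is a minimal, hypothetical example (module name, rank `r`, and scaling are
+illustrative choices), not code taken from the survey or from the LoRA paper.
+
+```python
+import torch
+import torch.nn as nn
+
+class LoRALinear(nn.Module):
+    """Frozen linear layer plus a trainable low-rank update: W_eff = W + (alpha / r) * (B @ A)."""
+
+    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
+        super().__init__()
+        self.base = base
+        for p in self.base.parameters():  # freeze the pre-trained weights
+            p.requires_grad = False
+        # LoRA factors: A starts with small noise, B starts at zero, so the
+        # adapted layer is initially identical to the base layer.
+        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
+        self.B = nn.Parameter(torch.zeros(base.out_features, r))
+        self.scaling = alpha / r
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        # Base projection plus the scaled low-rank correction.
+        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling
+
+layer = LoRALinear(nn.Linear(768, 768))
+trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
+print(trainable)  # 12288 trainable parameters instead of 590592 for the full layer
+```
+
+Only the two small matrices A and B are updated during fine-tuning, which is
+why such adapters need far fewer data and compute than full fine-tuning.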
+
+### Tuning
+
+#### Instruction tuning
+
+- The fine-tuning of a pre-trained language model requires significantly fewer
+  data and computational resources, especially when parameter-efficient
+  approaches such as Low-Rank Adaptation (LoRA) are used.
+
+  **[LoRA: Low-Rank Adaptation of Large Language
+  Models](https://arxiv.org/abs/2106.09685)** *Edward J. Hu, Yelong Shen,
+  Phillip Wallis, et al. (2021)* Published on *arXiv*
+
+- The apparent mastery of textual understanding by LLMs closely resembles human
+  performance.
+
+  **[Language Models are Few-Shot
+  Learners](https://papers.nips.cc/paper/2020/file/fc2c7f9a3f3f86cde5d8ad2c7f7e57b2-Paper.pdf)**
+  *Tom Brown, Benjamin Mann, Nick Ryder, et al. (2020)* Presented at *NeurIPS*
+
+#### Alignment tuning
+
+- Instruction tuning aims to bridge the gap between the model’s original
+  objective — generating text — and user expectations, where users want the
+  model to follow their instructions and perform specific tasks.
+
+  **[Training language models to follow instructions with human
+  feedback](https://papers.nips.cc/paper/2022/hash/17f4c5f98073d1fb95f7e53f5c7fdb64-Abstract.html)**
+  *Long Ouyang, Jeffrey Wu, Xu Jiang, et al. (2022)* Presented at *NeurIPS*
+
+- Strong alignment requires cognitive abilities such as understanding and
+  reasoning about agents’ intentions and their ability to causally produce
+  desired effects.
+
+  **[Strong and weak alignment of large language models with human
+  values](https://doi.org/10.1038/s41598-024-70031-3)** *Khamassi, M., Nahon,
+  M. & Chatila, R. (2024)* Published in *Scientific Reports*, **14**, 19399.
+
+
+### Prompt engineering
 
-### ICL
+#### ICL
 
 In-context learning involves providing the model with specific information
 without requiring additional training.
 
@@ -209,7 +229,7 @@ without requiring additional training.
   Methods in Natural Language Processing (EMNLP)* Location: Miami, Florida, USA
   Published by: Association for Computational Linguistics
 
-### CoT
+#### CoT
 
 Chain-of-thought is a prompting strategy that, instead of being limited to
 input-output pairs, incorporates intermediate reasoning steps that serve as a
@@ -229,7 +249,7 @@ solve problems.
   Survey](https://arxiv.org/abs/2212.10403)** Jie Huang and Kevin Chen-Chuan
   Chang (2023)* Published on *arXiv*
 
-### RAG
+#### RAG
 
 Retrieval-Augmented Generation (RAG) is a prompting strategy that involves
 integrating relevant information from external data sources into the
-- 
GitLab