Commit 3f4e0626 authored by Maxime Morge

Correct results

parent ea89e1bf
@@ -132,23 +132,27 @@ Table below evaluates the ability of the models to align with different preferen
- <tt>Llama3</tt> aligns well with **egoistic** and **altruistic** preferences but shows lower adherence to **utilitarian** and **egalitarian** choices.
- <tt>Mistral-Small</tt> aligns better with **altruistic** preferences and performs moderately on **utilitarianism** but struggles with **egoistic** and **egalitarian** preferences.
- <tt>DeepSeek-R1</tt> primarily aligns with **utilitarianism** but has low accuracy in other preferences.
It is surprising that larger versions of these LLMs do not improve the results and may even degrade them.
While a larger LLM typically aligns better with preferences, a model such as Mixtral-8x7B can
underperform its smaller counterpart, Mistral-Small, because of its architectural complexity.
Mixture-of-Experts (MoE) models like Mixtral dynamically activate only a subset of their parameters;
if the routing mechanism is not well tuned, it may select suboptimal experts, leading to degraded performance.
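To make the routing point concrete, here is a toy sketch of top-k MoE gating (not Mixtral's actual router; the logits and `k=2` are invented for illustration). When gate logits are nearly uniform, the choice of experts becomes effectively arbitrary, which is one way a poorly calibrated router degrades output quality.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_logits, k=2):
    """Toy top-k MoE gating: keep the k experts with the highest gate
    probability and renormalise their weights to sum to 1."""
    probs = softmax(gate_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top = ranked[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# A confident router: expert 0 clearly dominates.
print(route([2.0, 0.1, -1.0, 0.3]))
# A poorly calibrated router: near-uniform logits, so which experts are
# selected (and hence the output) is close to arbitrary.
print(route([0.11, 0.10, 0.09, 0.12]))
```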
| **Model** | **Generation** | **Egoistic** | **Altruistic** | **Utilitarian** | **Egalitarian** |
|------------------------------|----------------|--------------|----------------|-----------------|-----------------|
| **<tt>GPT-4.5</tt>** | **Strategy** | 1.00 | 1.00 | 1.00 | 1.00 |
| **<tt>Llama3.2:latest</tt>** | **Strategy** | 1.00 | 1.00 | 1.00 | 1.00 |
| **<tt>Llama3.3:latest</tt>** | **Strategy** | 1.00 | 1.00 | 1.00 | 1.00 |
| **<tt>Llama3</tt>** | **Strategy** | 1.00 | 1.00 | 1.00 | 1.00 |
| **<tt>Mixtral:8x7b</tt>** | **Strategy** | - | - | - | - |
| **<tt>Mistral-Small</tt>** | **Strategy** | 1.00 | 1.00 | 1.00 | 1.00 |
| **<tt>DeepSeek-R1:7b</tt>** | **Strategy** | 1.00 | 1.00 | 1.00 | 1.00 |
| **<tt>DeepSeek-R1</tt>** | **Strategy** | - | - | - | - |
| **<tt>GPT-4.5</tt>** | **Actions** | 1.00 | 1.00 | 0.50 | 1.00 |
-| **<tt>Llama3.3:latest</tt>** | **Actions** | 0.50 | 0.50 | 0.21 | 0.48 |
+| **<tt>Llama3.3:latest</tt>** | **Actions** | 1.00 | 1.00 | 0.43 | 0.96 |
| **<tt>Llama3</tt>** | **Actions** | 1.00 | 0.90 | 0.40 | 0.73 |
-| **<tt>Mixtral:8x7b</tt>** | **Actions** | 0.00 | 0.00 | 0.00 | 0.50 |
-| **<tt>Mistral-Small</tt>** | **Actions** | 0.40 | 0.93 | 0.76 | 0.16 |
-| **<tt>DeepSeek-R1:7b</tt>** | **Actions** | 0.23 | 0.28 | 0.33 | 0.45 |
+| **<tt>Mixtral:8x7b</tt>** | **Actions** | 0.00 | 0.00 | 0.30 | 1.00 |
+| **<tt>Mistral-Small</tt>** | **Actions** | 0.40 | 0.94 | 0.76 | 0.16 |
+| **<tt>DeepSeek-R1:7b</tt>** | **Actions** | 0.46 | 0.56 | 0.66 | 0.90 |
| **<tt>DeepSeek-R1</tt>** | **Actions** | 0.06 | 0.20 | 0.76 | 0.03 |
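To make the four preference orderings concrete, the sketch below scores candidate allocations `(my_share, other_share)`; the allocation values are invented for the example and are not taken from the experiment.

```python
# Hypothetical dictator-game allocations (my_share, other_share);
# the numbers are made up for illustration.
options = [(500, 100), (100, 500), (400, 300), (325, 325)]

def egoistic(a):    return a[0]               # maximise own payoff
def altruistic(a):  return a[1]               # maximise the other's payoff
def utilitarian(a): return a[0] + a[1]        # maximise the total payoff
def egalitarian(a): return -abs(a[0] - a[1])  # minimise payoff inequality

for name, score in [("egoistic", egoistic), ("altruistic", altruistic),
                    ("utilitarian", utilitarian), ("egalitarian", egalitarian)]:
    print(name, "->", max(options, key=score))
```

A model whose chosen action differs from the argmax of its assigned preference counts as a misalignment in the Actions rows above.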
Errors in action selection may stem from either arithmetic miscalculations
......
Model,ALTRUISTIC,EGALITARIAN,SELFISH,UTILITARIAN
deepseek-r1,0.2,0.03333333333333333,0.06666666666666667,0.7666666666666667
-deepseek-r1:7b,0.2833333333333333,0.45,0.23333333333333334,0.3333333333333333
+deepseek-r1:7b,0.5666666666666667,0.9,0.4666666666666667,0.6666666666666666
gpt-4.5-preview-2025-02-27,1.0,1.0,1.0,0.5
llama3,0.9,0.7333333333333333,1.0,0.4
-llama3.3:latest,0.5,0.48333333333333334,0.5,0.21666666666666667
-mistral-small,0.9333333333333333,0.16666666666666666,0.4,0.7666666666666667
-mixtral:8x7b,0.0,0.5,0.0,0.15
+llama3.3:latest,1.0,0.9666666666666667,1.0,0.43333333333333335
+mistral-small,0.9411764705882353,0.16666666666666666,0.4,0.7666666666666667
+mixtral:8x7b,0.0,1.0,0.0,0.3
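A minimal sketch of how rows in this format (`Model,ALTRUISTIC,EGALITARIAN,SELFISH,UTILITARIAN`) can be summarised: for each model, report the preference it aligns with best. The rows below are a truncated subset of the corrected values; the column names follow the file's header.

```python
import csv
import io

# Subset of the corrected results, rounded for readability.
data = """Model,ALTRUISTIC,EGALITARIAN,SELFISH,UTILITARIAN
deepseek-r1,0.2,0.033,0.067,0.767
llama3,0.9,0.733,1.0,0.4
mistral-small,0.941,0.167,0.4,0.767
"""

best = {}
for row in csv.DictReader(io.StringIO(data)):
    model = row.pop("Model")          # remaining keys are the preferences
    best[model] = max(row, key=lambda k: float(row[k]))

print(best)
```

This reproduces the bullet points above: DeepSeek-R1 aligns mostly with utilitarianism, Llama3 with egoistic (SELFISH) choices, and Mistral-Small with altruism.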
@@ -74,7 +74,7 @@ class DictatorSetup:
self.base_url = base_url
self.api_key = api_key
-if not self.strategy and is_openai_model:
+if not self.strategy and is_openai_model or not is_pagoda_model:
self.model_client = OpenAIChatCompletionClient(
model=self.model,
base_url=base_url,
......
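One thing worth checking in the changed guard: Python binds `and` more tightly than `or`, so the new condition parses as `(not self.strategy and is_openai_model) or (not is_pagoda_model)`. The sketch below is a toy reimplementation (the flag names mirror the diff, the truth values are arbitrary) that confirms the grouping; if the intended reading were `not self.strategy and (is_openai_model or not is_pagoda_model)`, explicit parentheses would be needed.

```python
from itertools import product

# Toy reimplementation of the guard from the diff, flags as plain booleans.
def guard(strategy, is_openai_model, is_pagoda_model):
    return not strategy and is_openai_model or not is_pagoda_model

# The same guard with the grouping that Python's precedence rules imply.
def guard_explicit(strategy, is_openai_model, is_pagoda_model):
    return (not strategy and is_openai_model) or (not is_pagoda_model)

# The two forms agree on every combination of flag values.
assert all(
    guard(*flags) == guard_explicit(*flags)
    for flags in product([True, False], repeat=3)
)
print("guard parses as (not strategy and openai) or (not pagoda)")
```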
@@ -65,7 +65,7 @@ class DictatorSetupExperiment:
# Running the experiment
if __name__ == "__main__":
-models = ["gpt-4.5-preview-2025-02-27", "llama3", "mistral-small", "deepseek-r1", "mixtral:8x7b", "llama3.3:latest", "deepseek-r1:7b"]
+models = [ "llama3", "mistral-small", "deepseek-r1"] # "gpt-4.5-preview-2025-02-27", "mixtral:8x7b", "llama3.3:latest", "deepseek-r1:7b",
temperature = 0.7
iterations = 30
output_file = '../../data/dictator/dictator_setup.csv'
......