Commit 3f4e0626 authored by Maxime Morge

Correct results

parent ea89e1bf
@@ -132,23 +132,27 @@ Table below evaluates the ability of the models to align with different preferen
- <tt>Llama3</tt> aligns well with **egoistic** and **altruistic** preferences but shows lower adherence to **utilitarian** and **egalitarian** choices.
- <tt>Mistral-Small</tt> aligns better with **altruistic** preferences and performs moderately on **utilitarianism** but struggles with **egoistic** and **egalitarian** preferences.
- <tt>DeepSeek-R1</tt> primarily aligns with **utilitarianism** but has low accuracy in other preferences.
It is surprising that larger versions of these LLMs do not improve the results and may even degrade them.
While a larger LLM typically aligns better with preferences, a model such as Mixtral-8x7B can
underperform its smaller counterpart, Mistral-Small, because of its architectural complexity.
Mixture-of-Experts (MoE) models like Mixtral dynamically activate only a subset of their parameters;
if the routing mechanism is not well tuned, it may select suboptimal experts, leading to degraded performance.
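To make the routing point concrete, here is a toy sketch of top-k MoE gating (not Mixtral's actual router; the logits and `k=2` are invented for illustration). When gate logits are nearly uniform, the choice of experts becomes effectively arbitrary, which is one way a poorly calibrated router degrades output quality.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_logits, k=2):
    """Toy top-k MoE gating: keep the k experts with the highest gate
    probability and renormalise their weights to sum to 1."""
    probs = softmax(gate_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top = ranked[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# A confident router: expert 0 clearly dominates.
print(route([2.0, 0.1, -1.0, 0.3]))
# A poorly calibrated router: near-uniform logits, so which experts are
# selected (and hence the output) is close to arbitrary.
print(route([0.11, 0.10, 0.09, 0.12]))
```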
| **Model** | **Generation** | **Egoistic** | **Altruistic** | **Utilitarian** | **Egalitarian** |
|------------------------------|----------------|--------------|----------------|-----------------|-----------------|
| **<tt>GPT-4.5</tt>** | **Strategy** | 1.00 | 1.00 | 1.00 | 1.00 |
| **<tt>Llama3.2:latest</tt>** | **Strategy** | 1.00 | 1.00 | 1.00 | 1.00 |
| **<tt>Llama3.3:latest</tt>** | **Strategy** | 1.00 | 1.00 | 1.00 | 1.00 |
| **<tt>Llama3</tt>** | **Strategy** | 1.00 | 1.00 | 1.00 | 1.00 |
| **<tt>Mixtral:8x7b</tt>** | **Strategy** | - | - | - | - |
| **<tt>Mistral-Small</tt>** | **Strategy** | 1.00 | 1.00 | 1.00 | 1.00 |
| **<tt>DeepSeek-R1:7b</tt>** | **Strategy** | 1.00 | 1.00 | 1.00 | 1.00 |
| **<tt>DeepSeek-R1</tt>** | **Strategy** | - | - | - | - |
| **<tt>GPT-4.5</tt>** | **Actions** | 1.00 | 1.00 | 0.50 | 1.00 |
-| **<tt>Llama3.3:latest</tt>** | **Actions** | 0.50 | 0.50 | 0.21 | 0.48 |
+| **<tt>Llama3.3:latest</tt>** | **Actions** | 1.00 | 1.00 | 0.43 | 0.96 |
| **<tt>Llama3</tt>** | **Actions** | 1.00 | 0.90 | 0.40 | 0.73 |
-| **<tt>Mixtral:8x7b</tt>** | **Actions** | 0.00 | 0.00 | 0.00 | 0.50 |
-| **<tt>Mistral-Small</tt>** | **Actions** | 0.40 | 0.93 | 0.76 | 0.16 |
-| **<tt>DeepSeek-R1:7b</tt>** | **Actions** | 0.23 | 0.28 | 0.33 | 0.45 |
+| **<tt>Mixtral:8x7b</tt>** | **Actions** | 0.00 | 0.00 | 0.30 | 1.00 |
+| **<tt>Mistral-Small</tt>** | **Actions** | 0.40 | 0.94 | 0.76 | 0.16 |
+| **<tt>DeepSeek-R1:7b</tt>** | **Actions** | 0.46 | 0.56 | 0.66 | 0.90 |
| **<tt>DeepSeek-R1</tt>** | **Actions** | 0.06 | 0.20 | 0.76 | 0.03 |
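To make the four preference orderings concrete, the sketch below scores candidate allocations `(my_share, other_share)`; the allocation values are invented for the example and are not taken from the experiment.

```python
# Hypothetical dictator-game allocations (my_share, other_share);
# the numbers are made up for illustration.
options = [(500, 100), (100, 500), (400, 300), (325, 325)]

def egoistic(a):    return a[0]               # maximise own payoff
def altruistic(a):  return a[1]               # maximise the other's payoff
def utilitarian(a): return a[0] + a[1]        # maximise the total payoff
def egalitarian(a): return -abs(a[0] - a[1])  # minimise payoff inequality

for name, score in [("egoistic", egoistic), ("altruistic", altruistic),
                    ("utilitarian", utilitarian), ("egalitarian", egalitarian)]:
    print(name, "->", max(options, key=score))
```

A model whose chosen action differs from the argmax of its assigned preference counts as a misalignment in the Actions rows above.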
Errors in action selection may stem from either arithmetic miscalculations
......
Model,ALTRUISTIC,EGALITARIAN,SELFISH,UTILITARIAN
deepseek-r1,0.2,0.03333333333333333,0.06666666666666667,0.7666666666666667
-deepseek-r1:7b,0.2833333333333333,0.45,0.23333333333333334,0.3333333333333333
+deepseek-r1:7b,0.5666666666666667,0.9,0.4666666666666667,0.6666666666666666
gpt-4.5-preview-2025-02-27,1.0,1.0,1.0,0.5
llama3,0.9,0.7333333333333333,1.0,0.4
-llama3.3:latest,0.5,0.48333333333333334,0.5,0.21666666666666667
-mistral-small,0.9333333333333333,0.16666666666666666,0.4,0.7666666666666667
-mixtral:8x7b,0.0,0.5,0.0,0.15
+llama3.3:latest,1.0,0.9666666666666667,1.0,0.43333333333333335
+mistral-small,0.9411764705882353,0.16666666666666666,0.4,0.7666666666666667
+mixtral:8x7b,0.0,1.0,0.0,0.3
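A minimal sketch of how rows in this format (`Model,ALTRUISTIC,EGALITARIAN,SELFISH,UTILITARIAN`) can be summarised: for each model, report the preference it aligns with best. The rows below are a truncated subset of the corrected values; the column names follow the file's header.

```python
import csv
import io

# Subset of the corrected results, rounded for readability.
data = """Model,ALTRUISTIC,EGALITARIAN,SELFISH,UTILITARIAN
deepseek-r1,0.2,0.033,0.067,0.767
llama3,0.9,0.733,1.0,0.4
mistral-small,0.941,0.167,0.4,0.767
"""

best = {}
for row in csv.DictReader(io.StringIO(data)):
    model = row.pop("Model")          # remaining keys are the preferences
    best[model] = max(row, key=lambda k: float(row[k]))

print(best)
```

This reproduces the bullet points above: DeepSeek-R1 aligns mostly with utilitarianism, Llama3 with egoistic (SELFISH) choices, and Mistral-Small with altruism.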
@@ -74,7 +74,7 @@ class DictatorSetup:
self.base_url = base_url
self.api_key = api_key
-if not self.strategy and is_openai_model:
+if not self.strategy and is_openai_model or not is_pagoda_model:
self.model_client = OpenAIChatCompletionClient(
model=self.model,
base_url=base_url,
......
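One thing worth checking in the changed guard: Python binds `and` more tightly than `or`, so the new condition parses as `(not self.strategy and is_openai_model) or (not is_pagoda_model)`. The sketch below is a toy reimplementation (the flag names mirror the diff, the truth values are arbitrary) that confirms the grouping; if the intended reading were `not self.strategy and (is_openai_model or not is_pagoda_model)`, explicit parentheses would be needed.

```python
from itertools import product

# Toy reimplementation of the guard from the diff, flags as plain booleans.
def guard(strategy, is_openai_model, is_pagoda_model):
    return not strategy and is_openai_model or not is_pagoda_model

# The same guard with the grouping that Python's precedence rules imply.
def guard_explicit(strategy, is_openai_model, is_pagoda_model):
    return (not strategy and is_openai_model) or (not is_pagoda_model)

# The two forms agree on every combination of flag values.
assert all(
    guard(*flags) == guard_explicit(*flags)
    for flags in product([True, False], repeat=3)
)
print("guard parses as (not strategy and openai) or (not pagoda)")
```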
@@ -65,7 +65,7 @@ class DictatorSetupExperiment:
# Running the experiment
if __name__ == "__main__":
-models = ["gpt-4.5-preview-2025-02-27", "llama3", "mistral-small", "deepseek-r1", "mixtral:8x7b", "llama3.3:latest", "deepseek-r1:7b"]
+models = [ "llama3", "mistral-small", "deepseek-r1"] # "gpt-4.5-preview-2025-02-27", "mixtral:8x7b", "llama3.3:latest", "deepseek-r1:7b",
temperature = 0.7
iterations = 30
output_file = '../../data/dictator/dictator_setup.csv'
......