Commit cdb127b7 authored by Maxime Morge

Evaluate first order rationality with Pagoda

parent 3f4e0626
@@ -120,19 +120,19 @@ We define four preferences for the dictator, each corresponding to a distinct fo
We consider four allocation options where part of the money is lost in the division process,
each corresponding to one of the four preferences:
- The dictator keeps **$500**, the recipient receives **$100**, and a total of **$400** is lost (**egoistic**).
- The dictator keeps **$100**, the recipient receives **$500**, and **$400** is lost (**altruistic**).
- The dictator keeps **$400**, the recipient receives **$300**, resulting in a loss of **$300** (**utilitarian**).
- The dictator keeps **$325**, the other player receives **$325**, and **$350** is lost (**egalitarian**).
The table below evaluates the ability of the models to align with different preferences.
- When generating **strategies**, the models align perfectly with preferences, except for <tt>DeepSeek-R1</tt> and <tt>Mixtral:8x7b</tt>, which do not generate valid code.
- When generating **actions**,
  - <tt>GPT-4.5</tt> aligns well with preferences but struggles with **utilitarianism**.
  - <tt>Llama3</tt> aligns well with **egoistic** and **altruistic** preferences but shows lower adherence to **utilitarian** and **egalitarian** choices.
  - <tt>Mistral-Small</tt> aligns better with **altruistic** preferences and performs moderately on **utilitarianism** but struggles with **egoistic** and **egalitarian** preferences.
  - <tt>DeepSeek-R1</tt> primarily aligns with **utilitarianism** but has low accuracy in other preferences.
While a larger LLM typically aligns better with preferences, a model like <tt>Mixtral-8x7B</tt> may occasionally
underperform compared to its smaller counterpart, <tt>Mistral-Small</tt>, due to its architectural complexity.
Mixture-of-Experts (MoE) models, like Mixtral, dynamically activate only a subset of their parameters.
If the routing mechanism is not well-tuned, it may select suboptimal experts, leading to degraded performance.
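As an illustration of that routing step, the sketch below shows hypothetical top-k expert selection in a MoE layer; the function and parameter names are invented for this example and do not reflect Mixtral's actual implementation. A learned gate scores every expert for the current token, only the k best-scoring experts are executed, and a poorly calibrated gate can therefore route tokens to ill-suited experts.

```python
import numpy as np

def moe_layer(x, experts, gate_weights, k=2):
    """Mix the outputs of the top-k experts selected by a learned gate (toy example)."""
    scores = gate_weights @ x                      # one routing score per expert
    top_k = np.argsort(scores)[-k:]                # indices of the k best-scoring experts
    probs = np.exp(scores[top_k] - scores[top_k].max())
    probs /= probs.sum()                           # softmax over the selected experts only
    # A badly tuned gate can place ill-suited experts in top_k, degrading this output.
    return sum(p * experts[i](x) for p, i in zip(probs, top_k))

# Toy usage: four random linear "experts" on an 8-dimensional token embedding.
rng = np.random.default_rng(0)
experts = [lambda v, W=rng.normal(size=(8, 8)): W @ v for _ in range(4)]
gate = rng.normal(size=(4, 8))
print(moe_layer(rng.normal(size=8), experts, gate, k=2))
```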
@@ -213,26 +213,35 @@ We first evaluate the rationality of the agents and then their second-order rati
The table below evaluates the models’ ability to generate rational
behaviour for Player 2.

| **Model** | **Generation** | **Given** | **Explicit** | **Implicit** |
|--------------------------|----------------|-----------|--------------|--------------|
| <tt>gpt-4.5</tt> | strategy | 1.00 | 1.00 | 1.00 |
| <tt>mixtral:8x7b</tt> | strategy | 1.00 | 1.00 | 1.00 |
| <tt>mistral-small</tt> | strategy | 1.00 | 1.00 | 1.00 |
| <tt>llama3.3:latest</tt> | strategy | 1.00 | 1.00 | 0.50 |
| <tt>llama3</tt> | strategy | 0.50 | 0.50 | 0.50 |
| <tt>deepseek-r1:7b</tt> | strategy | - | - | - |
| <tt>deepseek-r1</tt> | strategy | - | - | - |
| **—** | **—** | **—** | **—** | **—** |
| <tt>gpt-4.5</tt> | actions | 1.00 | 1.00 | 1.00 |
| <tt>mixtral:8x7b</tt> | actions | 1.00 | 1.00 | 1.00 |
| <tt>mistral-small</tt> | actions | 1.00 | 1.00 | 0.87 |
| <tt>llama3.3:latest</tt> | actions | 1.00 | 1.00 | 1.00 |
| <tt>llama3</tt> | actions | 1.00 | 0.90 | 0.17 |
| <tt>deepseek-r1:7b</tt> | actions | 1.00 | 1.00 | 1.00 |
| <tt>deepseek-r1</tt> | actions | 0.83 | 0.57 | 0.60 |

When generating strategies, <tt>GPT-4.5</tt>, <tt>Mixtral-8x7B</tt>, and <tt>Mistral-Small</tt>
exhibit rational behaviour, whereas <tt>Llama3</tt> adopts a random strategy;
<tt>Llama3.3:latest</tt> behaves the same way when beliefs are implicit.
<tt>DeepSeek-R1:7b</tt> and <tt>DeepSeek-R1</tt> fail to generate valid strategies.
When generating actions, <tt>GPT-4.5</tt>, <tt>Mixtral-8x7B</tt>, <tt>DeepSeek-R1:7b</tt>,
and <tt>Llama3.3:latest</tt> demonstrate strong rational decision-making, even with implicit beliefs.
<tt>Mistral-Small</tt> performs well but lags slightly in handling implicit reasoning.
<tt>Llama3</tt> struggles with implicit reasoning, while <tt>DeepSeek-R1</tt>
shows inconsistent performance.
Overall, <tt>GPT-4.5</tt> and <tt>Mixtral-8x7B</tt> are the most reliable models for generating rational behaviour.
### Second-Order Rationality

@@ -269,17 +278,23 @@ difficulties with implicit beliefs, especially in variant (d).
DeepSeek-R1 does not appear to be a good candidate for simulating
second-order rationality.

| **Version** |  | **a** |  |  | **b** |  |  | **c** |  |  | **d** |  |  |
|---------------------|----------------|-----------|--------------|--------------|-----------|--------------|--------------|-----------|--------------|--------------|-----------|--------------|--------------|
| **Model** | **Generation** | **Given** | **Explicit** | **Implicit** | **Given** | **Explicit** | **Implicit** | **Given** | **Explicit** | **Implicit** | **Given** | **Explicit** | **Implicit** |
| **gpt-4.5** | strategy | 1.00 | 1.00 | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| **llama3.3:latest** | strategy | 1.00 | 1.00 | 0.50 | 1.00 | 1.00 | 0.50 | 1.00 | 1.00 | 0.50 | 1.00 | 1.00 | 0.50 |
| **llama3** | strategy | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
| **mixtral:8x7b** | strategy | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| **mistral-small** | strategy | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| **deepseek-r1:7b** | strategy | - | - | - | - | - | - | - | - | - | - | - | - |
| **deepseek-r1** | strategy | - | - | - | - | - | - | - | - | - | - | - | - |
| **gpt-4.5** | actions | 1.00 | 1.00 | 1.00 | 1.00 | 0.67 | 0.00 | 0.86 | 0.83 | 0.00 | 0.50 | 0.90 | 0.00 |
| **llama3.3:latest** | actions | 0.97TODO | 1.00TODO | 1.00TODO | 0.77TODO | 0.80TODO | 0.60TODO | 0.97TODO | 0.90TODO | 0.93TODO | 0.83TODO | 0.90TODO | 0.60TODO |
| **llama3** | actions | 0.97 | 1.00 | 1.00 | 0.77 | 0.80 | 0.60 | 0.97 | 0.90 | 0.93 | 0.83 | 0.90 | 0.60 |
| **mixtral:8x7b** | actions | 0.93TODO | 0.97TODO | 1.00TODO | 0.87TODO | 0.77TODO | 0.60TODO | 0.77TODO | 0.60TODO | 0.70TODO | 0.73TODO | 0.57TODO | 0.37TODO |
| **mistral-small** | actions | 0.93 | 0.97 | 1.00 | 0.87 | 0.77 | 0.60 | 0.77 | 0.60 | 0.70 | 0.73 | 0.57 | 0.37 |
| **deepseek-r1:7b** | actions | 0.80TODO | 0.53TODO | 0.57TODO | 0.67TODO | 0.60TODO | 0.53TODO | 0.67TODO | 0.63TODO | 0.47TODO | 0.70TODO | 0.50TODO | 0.57TODO |
| **deepseek-r1** | actions | 0.80 | 0.53 | 0.57 | 0.67 | 0.60 | 0.53 | 0.67 | 0.63 | 0.47 | 0.70 | 0.50 | 0.57 |
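For reference, here is a minimal sketch of the payoff parameters that distinguish the four variants, mirroring the values hard-coded as <tt>XknowingA</tt>, <tt>XknowingB</tt>, <tt>YknowingA</tt>, <tt>YknowingB</tt> in <tt>apply_strategy</tt> later in this diff; the variable names suggest the payoff of playing X or Y given Player 2's choice of A or B, and the dictionary form below is ours, for illustration only.

```python
# Ring-game payoff parameters per variant, as encoded in the agent code:
# (X knowing A, X knowing B, Y knowing A, Y knowing B)
PAYOFFS = {
    "a": (15, 5, 0, 10),
    "b": (8, 7, 7, 8),
    "c": (6, 5, 0, 10),
    "d": (15, 5, 0, 40),
}
```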
Irrational decisions are explained by inference errors based on the natural
language description of the payoff matrix. For example, in variant (d), the
......
Model,Given,Explicit,Implicit
deepseek-r1,0.8333333333333334,0.5666666666666667,0.6
deepseek-r1:7b,1.0,1.0,1.0
gpt-4.5-preview-2025-02-27,1.0,1.0,1.0
llama3,1.0,0.9,0.16666666666666666
llama3.3:latest,1.0,1.0,1.0
mistral-small,1.0,1.0,0.8666666666666667
mixtral:8x7b,1.0,1.0,0.5
import os
import asyncio
from typing import Dict, Literal
import json
import random
import re
import logging
import requests
from pydantic import BaseModel
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient
from belief import Belief

logger = logging.getLogger(__name__)

# Load API keys from environment variables
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
PAGODA_API_KEY = os.getenv("PAGODA_API_KEY")
if not OPENAI_API_KEY:
    raise ValueError("Missing OPENAI_API_KEY. Set it as an environment variable.")
if not PAGODA_API_KEY:
    raise ValueError("Missing PAGODA_API_KEY. Set it as an environment variable.")
# Define the expected response format as a Pydantic model
class AgentResponse(BaseModel):
@@ -42,7 +44,15 @@ class Ring:
        self.max_retries = max_retries  # Maximum retry attempts in case of hallucinations

        is_openai_model = model.startswith("gpt")
        is_pagoda_model = ":" in model
        self.base_url = (
            "https://api.openai.com/v1" if is_openai_model else
            "https://ollama-ui.pagoda.liris.cnrs.fr/ollama/api/generate" if is_pagoda_model else
            "http://localhost:11434/v1"
        )
        key = OPENAI_API_KEY if is_openai_model else PAGODA_API_KEY

        model_info = {
            "temperature": self.temperature,
@@ -55,7 +65,7 @@ class Ring:
        self.model_client = OpenAIChatCompletionClient(
            model=self.model,
            base_url=self.base_url,
            api_key=key,
            model_info=model_info,
            response_format=AgentResponse
@@ -116,6 +126,10 @@ class Ring:
        if self.debug:
            print(instruction)

        is_pagoda_model = ":" in self.model
        if is_pagoda_model:
            return await self.run_pagoda(instruction)

        for attempt in range(self.max_retries):
            agent = AssistantAgent(
                name="Player",
@@ -155,6 +169,9 @@ class Ring:
    def apply_strategy(self) -> Dict[str, str]:
        """Applies a heuristic-based strategy instead of relying on the model if strategy is enabled."""
        # Set default values to avoid unbound variable errors
        action = "X"  # Default action (can be changed based on conditions)
        reasoning = "Default reasoning. No specific model-based rule applied."
        if self.model == "gpt-4.5-preview-2025-02-27":
            if self.strategy:
                if self.player_id == 2:
@@ -163,6 +180,34 @@ class Ring:
                else:
                    action = self.X if self.version in ["a", "c", "d"] else self.Y
                    reasoning = f"Choosing {action} based on the given game structure and expected rational behavior from Player 2."
        if self.model == "llama3.3:latest":
            XknowingA, XknowingB, YknowingA, YknowingB = (
                (15, 5, 0, 10) if self.version == "a" else
                (8, 7, 7, 8) if self.version == "b" else
                (6, 5, 0, 10) if self.version == "c" else
                (15, 5, 0, 40)
            )
            if self.belief == Belief.IMPLICIT:
                if self.player_id == 1:
                    action = self.X if random.random() < 0.5 else self.Y
                    reasoning = "Choosing randomly between X and Y since it's an implicit game."
                elif self.player_id == 2:
                    action = self.A if random.random() < 0.5 else self.B
                    reasoning = "Choosing randomly between A and B since it's an implicit game."
            elif self.belief == Belief.EXPLICIT:
                if self.player_id == 1:
                    action = self.X if XknowingA > YknowingA else self.Y
                    reasoning = f"Choosing {action} since it has a higher payoff ({XknowingA} vs {YknowingA})."
                elif self.player_id == 2:
                    action = self.A if XknowingA + YknowingB > XknowingB + YknowingA else self.B
                    reasoning = f"Choosing {action} since it has a higher total payoff ({XknowingA + YknowingB} vs {XknowingB + YknowingA})."
            if self.belief == Belief.GIVEN:
                if self.player_id == 1:
                    action = self.X
                    reasoning = "Choosing X since Player 2 must choose A if she is rational."
                elif self.player_id == 2:
                    action = self.A
                    reasoning = "Choosing A since I am rational and it's the dominant strategy."
        if self.model == "llama3":
            if self.player_id == 1:
                action = self.X if random.random() < 0.5 else self.Y
@@ -170,14 +215,16 @@ class Ring:
            elif self.player_id == 2:
                action = self.B if random.random() < 0.5 else self.A
                reasoning = "The reasoning behind this choice is..."
        if self.model == "mistral-small" or self.model == "mixtral:8x7b":
            # Always choose 'A' or 'X' based on player_id
            if self.player_id == 1:
                action = self.X
                reasoning = f"Player {self.player_id} always chooses X as per the predefined strategy."
            elif self.player_id == 2:
                action = self.A
                reasoning = f"Player {self.player_id} always chooses A as per the predefined strategy."
        if self.model == "deepseek-r1:7b" or self.model == "deepseek-r1":
            raise ValueError("Invalid strategy for deepseek-r1.")
        # Validate the rationality of the chosen action
        rational = 1.0 if self.check_rationality(AgentResponse(action=action, reasoning=reasoning)) else 0.0
        return {
@@ -186,9 +233,100 @@ class Ring:
            "reasoning": reasoning
        }
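    # Illustrative sketch (hypothetical, not from the original file): apply_strategy
    # returns a plain dict. For instance, a call for Player 2 with a GIVEN belief and
    # model "llama3.3:latest" would yield roughly:
    #   {"action": "A", "rationality": 1.0,
    #    "reasoning": "Choosing A since I am rational and it's the dominant strategy."}
    # assuming check_rationality accepts action A for Player 2 in that variant.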
    async def run_pagoda(self, instruction) -> Dict:
        url = self.base_url
        headers = {"Authorization": f"Bearer {PAGODA_API_KEY}", "Content-Type": "application/json"}
        payload = {
            "model": self.model,
            "temperature": self.temperature,
            "prompt": instruction,
            "stream": False
        }

        for attempt in range(self.max_retries):
            try:
                response = requests.post(url, headers=headers, json=payload)
                response.raise_for_status()
                response_data = response.json()

                if self.debug:
                    print(f"Raw response (Attempt {attempt + 1}): {response_data}")

                # Extract JSON response field
                response_json = response_data.get('response', '')
                parsed_response = self.extract_json_from_response(response_json)
                if not parsed_response:
                    print(f"Failed to extract JSON from response (Attempt {attempt + 1}): {response_json}")
                    continue

                # Validate extracted response
                required_keys = {'action', 'reasoning'}
                if not required_keys.issubset(parsed_response.keys()):
                    print(f"Missing required keys in response (Attempt {attempt + 1}): {parsed_response}")
                    continue

                action, reasoning = (
                    parsed_response["action"],
                    parsed_response["reasoning"]
                )
                rational = 1.0 if self.check_rationality(AgentResponse(action=action, reasoning=reasoning)) else 0.0
                return {
                    "action": action,
                    "rationality": rational,
                    "reasoning": reasoning
                }

            except requests.RequestException as e:
                print(f"Request error (Attempt {attempt + 1}): {e}")
            except json.JSONDecodeError as e:
                print(f"JSON decoding error (Attempt {attempt + 1}): {e}")
            except Exception as e:
                print(f"Unexpected error (Attempt {attempt + 1}): {e}")

        raise ValueError("Pagoda model failed to provide a valid response after multiple attempts.")
    def extract_json_from_response(self, response_text: str) -> dict:
        """Extracts and parses JSON from a model response, handling escaping issues."""
        try:
            # Normalize escaped underscores
            cleaned_text = response_text.strip().replace('\\_', '_')

            # Direct JSON parsing if response is already valid JSON
            if cleaned_text.startswith("{") and cleaned_text.endswith("}"):
                return json.loads(cleaned_text)

            # Try extracting JSON from Markdown-style code blocks
            json_match = re.search(r"```json\s*([\s\S]*?)\s*```", cleaned_text)
            if json_match:
                json_str = json_match.group(1).strip()
            else:
                # Try extracting any JSON-like substring
                json_match = re.search(r"\{[\s\S]*?\}", cleaned_text)
                if json_match:
                    json_str = json_match.group(0).strip()
                else:
                    logger.warning("No JSON found in response: %s", response_text)
                    return {}

            # Parse the extracted JSON
            parsed_json = json.loads(json_str)

            # Validate expected keys
            expected_keys = {"action", "reasoning"}
            if not expected_keys.issubset(parsed_json.keys()):
                logger.warning("Missing required keys in parsed JSON: %s", parsed_json)
                return {}

            return parsed_json

        except json.JSONDecodeError as e:
            logger.error("Failed to parse extracted JSON: %s | Error: %s", response_text, e)
            return {}
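    # Usage sketch (hypothetical): how extract_json_from_response behaves on a typical
    # reply that wraps its JSON in a Markdown code block. The sample text is invented
    # for illustration only.
    #
    #   agent = Ring(2, Belief.GIVEN, swap=False, version="a", model="mixtral:8x7b",
    #               temperature=0.7, strategy=False)
    #   sample = 'Here is my move:\n```json\n{"action": "A", "reasoning": "A dominates B."}\n```'
    #   agent.extract_json_from_response(sample)
    #   # -> {'action': 'A', 'reasoning': 'A dominates B.'}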
# Run the async function and return the response
if __name__ == "__main__":
    game_agent = Ring(1, Belief.EXPLICIT, swap=False, version="d", model="llama3.3:latest", temperature=0.7, strategy=True)  # "llama3.3:latest", "mixtral:8x7b", "deepseek-r1:7b"
    response_json = asyncio.run(game_agent.run())
    print(response_json)
@@ -77,11 +77,11 @@ class RingExperiment:
# Running the experiment
if __name__ == "__main__":
    models = ["llama3.3:latest", "deepseek-r1:7b", "mixtral:8x7b"]  # "gpt-4.5-preview-2025-02-27", "llama3", "mistral-small", "deepseek-r1"
    temperature = 0.7
    iterations = 30
    player_id = 1
    version = "a"
    output_file = f"../../data/ring/ring.{player_id}.{version}.csv"
    experiment = RingExperiment(models=models, player_id=player_id, version=version, temperature=temperature, iterations=iterations, output_file=output_file)
    asyncio.run(experiment.run_experiment())
......