Commit 0ddc3cf6 authored by Maxime MORGE

Add the ring-network game

parent 8cb44156
Showing with 1799 additions and 3 deletions
.idea CSV-plugin settings (@@ -10,21 +10,63 @@): comma-separator entries are registered for the new data and figure files:

- $PROJECT_DIR$/data/dictator/dictator_setup.csv
- $PROJECT_DIR$/data/ring/ring.1.a.csv
- $PROJECT_DIR$/data/ring/ring.1.d.csv
- $PROJECT_DIR$/data/ring/ring.2.csv
- $PROJECT_DIR$/figures/ring/ring_accuracy.1.a.csv
- $PROJECT_DIR$/figures/ring/ring_accuracy.1.b.csv
- $PROJECT_DIR$/figures/ring/ring_accuracy.1.c.csv
- $PROJECT_DIR$/figures/ring/ring_accuracy.1.d.csv
- $PROJECT_DIR$/figures/ring/ring_accuracy.2.csv

Each entry has the form:

<entry key="...">
  <value>
    <Attribute>
      <option name="separator" value="," />
    </Attribute>
  </value>
</entry>
.idea project settings (@@ -3,4 +3,5 @@): the Python 3.12 SDK is registered as the project interpreter:

<component name="Black">
  <option name="sdkName" value="Python 3.12" />
</component>
<component name="ProjectRootManager" version="2" project-jdk-name="Python 3.12" project-jdk-type="Python SDK" />
</project>
@@ -136,6 +136,98 @@ We observe that the performance of LLMs is barely better than that of a random s...

![Average Points Earned per Round Against 3-Loop Behaviour (with 95% Confidence Interval)](figures/rps/rps_3loop.svg)
## Ring-network game
A player is rational if she plays a best response to her beliefs.
She satisfies second-order rationality if she is rational and also believes that others are rational.
In other words, a second-order rational agent not only considers the best course of action for herself
but also anticipates how others make their decisions.
The experiments conducted by Kneeland (2015) demonstrate that 93% of the subjects are rational,
while 71% exhibit second-order rationality.
**[Identifying Higher-Order Rationality](https://doi.org/10.3982/ECTA11983)**
Terri Kneeland (2015), *Econometrica*, Volume 83, Issue 5, pp. 2065–2079.
DOI: [10.3982/ECTA11983](https://doi.org/10.3982/ECTA11983)
Ring games are designed to isolate the behavioral implications of different levels of rationality.
To assess players’ first- and second-order rationality, we consider a simplified version of the ring-network game.
This game features two players, each with two available strategies, where both players aim to maximize their own payoff.
The corresponding payoff matrix is shown below:
| Player 1 \ Player 2 | Strategy A | Strategy B |
|---------------------|------------|-----------|
| **Strategy X** | (15,10) | (5,5) |
| **Strategy Y** | (0,5) | (10,0) |
If Player 2 is rational, she must choose A, as B is strictly dominated (i.e., B is never a best response to any beliefs Player 2 may hold).
If Player 1 is rational, she can choose either X or Y since X is the best response if she believes Player 2 will play A and
Y is the best response if she believes Player 2 will play B.
If Player 1 satisfies second-order rationality (i.e., she is rational and believes Player 2 is rational), then she must play Strategy X.
This is because Player 1, believing that Player 2 is rational, must also believe Player 2 will play A and
since X is the best response to A, Player 1 will choose X.
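These claims are easy to verify mechanically. The following sketch (illustrative only, not part of the repository code) checks the dominance and best-response structure of the matrix above:

```python
# (Player 1 payoff, Player 2 payoff) indexed by (Player 1 action, Player 2 action)
payoffs = {
    ("X", "A"): (15, 10), ("X", "B"): (5, 5),
    ("Y", "A"): (0, 5),   ("Y", "B"): (10, 0),
}

# B is strictly dominated for Player 2: A pays her more whatever Player 1 does
assert all(payoffs[(p1, "A")][1] > payoffs[(p1, "B")][1] for p1 in ("X", "Y"))

# X is Player 1's best response to A, and Y her best response to B
assert payoffs[("X", "A")][0] > payoffs[("Y", "A")][0]
assert payoffs[("Y", "B")][0] > payoffs[("X", "B")][0]
```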
We consider three forms of belief:
- *implicit* belief: the optimal action must be deduced from the natural-language description of the payoff matrix;
- *explicit* belief: the prompt additionally states that Strategy B is strictly dominated by Strategy A;
- *given* belief: the prompt additionally states that Player 2 must choose Strategy A if she is rational.
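The three prompts are nested: each belief level appends one more hint to the previous one. A minimal sketch of this construction (illustrative; the actual prompt strings, including a label swap, are built in the `Ring.run` method shown below):

```python
from belief import Belief  # enum defined in belief.py

def build_rules(belief: Belief) -> str:
    # Natural-language payoff description (abridged here)
    implicit = "If Player 1 chooses X and Player 2 chooses A, Player 1 receives 15 points..."
    explicit = implicit + "\nB is strictly dominated by A."
    given = explicit + "\nPlayer 2 must choose A if she is rational."
    return {Belief.IMPLICIT: implicit, Belief.EXPLICIT: explicit, Belief.GIVEN: given}[belief]
```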
### Player 2
The models evaluated are Mistral-Small, Llama3, and DeepSeek-R1.
The table below reports, for each belief type, the proportion of runs (out of 30 per condition) in which the model selects the rational action A.
| Model | Given | Explicit | Implicit |
|----------------|---------|-----------|----------|
| mistral-small | 1.00 | 1.00 | 0.87 |
| llama3 | 1.00 | 0.90 | 0.17 |
| deepseek-r1 | 0.83 | 0.57 | 0.60 |
Mistral-Small consistently outperforms the other models across all belief types.
Its strong performance with implicit belief indicates that it can effectively
deduce the optimal action from the payoff matrix description.
Llama3 performs well with a given belief, but significantly underperforms with an implicit belief,
suggesting it may struggle to infer optimal actions solely from natural language descriptions.
DeepSeek-R1 shows the weakest performance overall, particularly with explicit beliefs,
indicating it is a weaker candidate than the other models for simulating rationality.
### Player 1
To adjust the difficulty of identifying the optimal action, we consider four versions of Player 1's payoff matrix:
- (a) the original setup;
- (b) the difference in payoffs is reduced;
- (c) the payoff of the correct choice X is decreased;
- (d) the payoff of the incorrect choice Y is increased.
| **Action \ Opponent Action (version)** | **A(a)** | **B(a)** | | **A(b)** | **B(b)** | | **A(c)** | **B(c)** | | **A(d)** | **B(d)** |
|----------------------------------------|----------|----------|-|----------|----------|-|----------|----------|-|----------|----------|
| **X** | 15 | 5 | | 8 | 7 | | 6 | 5 | | 15 | 5 |
| **Y** | 0 | 10 | | 7 | 8 | | 0 | 10 | | 0 | 40 |
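In every version, X remains the unique best response to A, so Strategy X stays the second-order rational choice throughout. A quick sanity check (illustrative only, not repository code):

```python
# Player 1's payoffs (XA, XB, YA, YB) for each version, mirroring the table above
versions = {
    "a": (15, 5, 0, 10),
    "b": (8, 7, 7, 8),
    "c": (6, 5, 0, 10),
    "d": (15, 5, 0, 40),
}
for v, (xa, xb, ya, yb) in versions.items():
    # Against a rational Player 2 (who plays A), X must beat Y
    assert xa > ya, f"X is not the best response to A in version {v}"
```

The table below reports the rationality rates for each version and belief type.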
| Model | | Given (a) | Explicit (a) | Implicit (a) | | Given (b) | Explicit (b) | Implicit (b) | | Given (c) | Explicit (c) | Implicit (c) | | Given (d) | Explicit (d) | Implicit (d) |
|---------------|-|-----------|--------------|--------------|-|-----------|--------------|--------------|--|-----------|--------------|--------------|--|-----------|--------------|--------------|
| llama3 | | 0.97 | 1.00 | 1.00 | | 0.77 | 0.80 | 0.60 | | 0.97 | 0.90 | 0.93 | | 0.83 | 0.90 | 0.60 |
| mistral-small | | 0.93 | 0.97 | 1.00 | | 0.87 | 0.77 | 0.60 | | 0.77 | 0.60 | 0.70 | | 0.73 | 0.57 | 0.37 |
| deepseek-r1 | | 0.80 | 0.53 | 0.57 | | 0.67 | 0.60 | 0.53 | | 0.67 | 0.63 | 0.47 | | 0.70 | 0.50 | 0.57 |
Llama3 demonstrates the most consistent and robust performance, adapting to the different belief types
and adjusted payoff matrices.
Mistral-Small performs well with given and explicit beliefs but struggles with implicit beliefs, particularly in version (d).
DeepSeek-R1 again shows the weakest performance, suggesting it is not an ideal candidate for modeling second-order rationality.
## Authors

Maxime MORGE
...
figures/ring/ring_accuracy.1.a.csv:

Model,Given,Explicit,Implicit
deepseek-r1,0.8,0.5333333333333333,0.5666666666666667
llama3,0.9666666666666667,1.0,1.0
mistral-small,0.9333333333333333,0.9666666666666667,1.0

figures/ring/ring_accuracy.1.b.csv:

Model,Given,Explicit,Implicit
deepseek-r1,0.6666666666666666,0.6,0.5333333333333333
llama3,0.7666666666666667,0.8,0.6
mistral-small,0.8666666666666667,0.7666666666666667,0.6

figures/ring/ring_accuracy.1.c.csv:

Model,Given,Explicit,Implicit
deepseek-r1,0.6666666666666666,0.6333333333333333,0.4666666666666667
llama3,0.9666666666666667,0.9,0.9333333333333333
mistral-small,0.7666666666666667,0.6,0.7

figures/ring/ring_accuracy.1.d.csv:

Model,Given,Explicit,Implicit
deepseek-r1,0.7,0.5,0.5666666666666667
llama3,0.8333333333333334,0.9,0.6
mistral-small,0.7333333333333333,0.5666666666666667,0.36666666666666664

figures/ring/ring_accuracy.2.csv:

Model,Given,Explicit,Implicit
deepseek-r1,0.8333333333333334,0.5666666666666667,0.6
llama3,1.0,0.9,0.16666666666666666
mistral-small,1.0,1.0,0.8666666666666667
from enum import Enum

class Belief(Enum):
    IMPLICIT = ("Implicit", "A belief that is assumed or inferred")
    EXPLICIT = ("Explicit", "A belief that is clearly stated or expressed")
    GIVEN = ("Given", "A belief that is directly provided as a fact")

    def __init__(self, label, description):
        self.label = label
        self.description = description
\ No newline at end of file
import os
import asyncio
from typing import Dict, Literal

from pydantic import BaseModel
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient

from belief import Belief

# Load API key from environment variable
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    raise ValueError("Missing OPENAI_API_KEY. Set it as an environment variable.")

# Define the expected response format as a Pydantic model
class AgentResponse(BaseModel):
    action: Literal["A", "B", "X", "Y"]
    reasoning: str

# The ring game simulation class
class Ring:
    debug = False

    def __init__(self, player_id: int, belief: Belief, swap: bool, version: str, model: str, temperature: float, max_retries: int = 3):
        self.player_id = player_id
        self.belief = belief
        self.swap = swap
        # Optionally swap the action labels to control for positional bias
        self.A, self.B, self.X, self.Y = ("B", "A", "Y", "X") if swap else ("A", "B", "X", "Y")
        self.version = version
        self.model = model
        self.temperature = temperature
        self.max_retries = max_retries  # Maximum retry attempts in case of hallucinations
        is_openai_model = model.startswith("gpt")
        base_url = "https://api.openai.com/v1" if is_openai_model else "http://localhost:11434/v1"
        model_info = {
            "temperature": self.temperature,
            "function_calling": True,
            "parallel_tool_calls": True,
            "family": "unknown",
            "json_output": True,
            "vision": False
        }
        self.model_client = OpenAIChatCompletionClient(
            model=self.model,
            base_url=base_url,
            api_key=OPENAI_API_KEY,
            model_info=model_info,
            response_format=AgentResponse
        )

    async def run(self) -> Dict:
        """Runs the model and ensures a valid response."""
        action_description = (
            ' - `"action"`: Your move ("A" or "B")' if self.player_id == 2
            else ' - `"action"`: Your move ("X" or "Y")'
        )
        # Player 1's payoffs (X|A, X|B, Y|A, Y|B) for each version of the game
        XknowingA, XknowingB, YknowingA, YknowingB = (
            (15, 5, 0, 10) if self.version == "a" else
            (8, 7, 7, 8) if self.version == "b" else
            (6, 5, 0, 10) if self.version == "c" else
            (15, 5, 0, 40)
        )
        implicit_belief = f"""
- If Player 1 chooses {self.X} and Player 2 chooses {self.A}, Player 1 receives {XknowingA} points and Player 2 receives 10 points.
- If Player 1 chooses {self.X} and Player 2 chooses {self.B}, Player 1 receives {XknowingB} points and Player 2 receives 5 points.
- If Player 1 chooses {self.Y} and Player 2 chooses {self.A}, Player 1 receives {YknowingA} points and Player 2 receives 5 points.
- If Player 1 chooses {self.Y} and Player 2 chooses {self.B}, Player 1 receives {YknowingB} points and Player 2 receives 0 points.
"""
        explicit_belief = f"""
{implicit_belief}
{self.B} is strictly dominated by {self.A}.
"""
        given_belief = f"""
{explicit_belief}
Player 2 must choose {self.A} if she is rational.
"""
        rules = (
            given_belief if self.belief == Belief.GIVEN
            else explicit_belief if self.belief == Belief.EXPLICIT
            else implicit_belief
        )
        instruction = f"""
We are playing a simple game designed to evaluate your ability to play rationally.
Your aim is to maximize your own payoff.
You are Player {self.player_id}.

### **Game Rules:**
{rules}

### **Your Task:**
Return your response in JSON format with two keys:
{action_description}
- `"reasoning"`: A brief explanation of how you made your choice.
"""
        if self.debug:
            print(instruction)
        for attempt in range(self.max_retries):
            agent = AssistantAgent(
                name="Player",
                model_client=self.model_client,
                system_message="You are a helpful assistant."
            )
            response = await agent.on_messages(
                [TextMessage(content=instruction, source="user")],
                cancellation_token=CancellationToken(),
            )
            try:
                response_data = response.chat_message.content
                agent_response = AgentResponse.model_validate_json(response_data)  # Parse JSON
                action = agent_response.action
                # Validate that the action is legal for this player
                if (self.player_id == 2 and action in (self.A, self.B)) or \
                        (self.player_id == 1 and action in (self.X, self.Y)):
                    rational = 1.0 if self.check_rationality(agent_response) else 0.0
                    return {
                        "action": agent_response.action,
                        "rationality": rational,
                        "reasoning": agent_response.reasoning
                    }
                else:
                    print(f"Invalid response detected (Attempt {attempt+1}): {response_data}")
            except Exception as e:
                print(f"Error parsing response (Attempt {attempt+1}): {e}")
        raise ValueError("Model failed to provide a valid response after multiple attempts.")

    def check_rationality(self, agent_response: AgentResponse) -> bool:
        """Check if the response is rational."""
        if self.player_id == 2:
            # A rational Player 2 plays A (B is strictly dominated)
            return agent_response.action == self.A
        # A second-order rational Player 1 plays X (best response to A)
        return agent_response.action == self.X

# Run the async function and print the response
if __name__ == "__main__":
    game_agent = Ring(1, Belief.IMPLICIT, swap=True, version="a", model="llama3", temperature=0.7)
    response_json = asyncio.run(game_agent.run())
    print(response_json)
\ No newline at end of file
import asyncio
import os

import pandas as pd

from belief import Belief
from ring import Ring

class RingExperiment:
    debug = True

    def __init__(self, models: list[str], player_id: int, version: str, temperature: float, iterations: int, output_file: str):
        self.models = models
        self.player_id = player_id
        self.version = version
        self.temperature = temperature
        self.iterations = iterations
        self.output_file = output_file  # Path to the CSV output file

    # Helper function to escape double quotes in the reasoning string
    def protect_reasoning(self, reasoning):
        if reasoning:
            # Escape double quotes by doubling them (CSV convention)
            escaped = reasoning.replace('"', '""')
            return f'"{escaped}"'
        return reasoning

    async def run_experiment(self):
        beliefs = [Belief.GIVEN, Belief.EXPLICIT, Belief.IMPLICIT]
        file_exists = os.path.isfile(self.output_file)  # Check if file already exists
        # Run the ring game for each model and belief
        for model in self.models:
            if self.debug:
                print(f"Running experiment for model: {model}")
            for belief in beliefs:
                print(f"Running with belief: {belief.name}")
                for iteration in range(1, self.iterations + 1):
                    print(f"Iteration: {iteration}")
                    # Initialize the Ring player, alternating the label swap
                    # between iterations to control for positional bias
                    game_agent = Ring(
                        player_id=self.player_id,
                        belief=belief,
                        swap=(iteration % 2 == 0),
                        version=self.version,
                        model=model,
                        temperature=self.temperature
                    )
                    try:
                        agent_response = await game_agent.run()
                        action = agent_response['action']
                        rationality = agent_response['rationality']
                        reasoning = agent_response['reasoning']
                        # Protect the reasoning string by escaping double quotes
                        reasoning = self.protect_reasoning(reasoning)
                    except Exception as e:
                        print(f"Error in iteration {iteration} for model {model}: {e}")
                        action, reasoning, rationality = None, None, None
                    # Create a single-row DataFrame for the current result
                    df = pd.DataFrame([{
                        'Iteration': iteration,
                        'Model': model,
                        'Temperature': self.temperature,
                        'Belief': belief.label,
                        'action': action,
                        'rationality': rationality,
                        'reasoning': reasoning
                    }])
                    # Append results to the CSV file
                    df.to_csv(self.output_file, mode='a', header=not file_exists, index=False)
                    file_exists = True  # Ensure the header is only written once

# Running the experiment
if __name__ == "__main__":
    models = ["llama3", "mistral-small", "deepseek-r1"]  # or gpt-4.5-preview-2025-02-27
    temperature = 0.7
    iterations = 30
    player_id = 1
    version = "a"
    output_file = f"../../data/ring/ring.{player_id}.{version}.csv"
    experiment = RingExperiment(models=models, player_id=player_id, version=version, temperature=temperature, iterations=iterations, output_file=output_file)
    asyncio.run(experiment.run_experiment())
    print(f"Experiment results saved to {output_file}")
\ No newline at end of file
import pandas as pd

def process_experiment_results(version: str):
    """Loads experiment results, calculates accuracy, reorders columns, and saves to CSV."""
    # Load the experiment results
    df = pd.read_csv(f"../../data/ring/ring.1.{version}.csv")
    # Calculate the accuracy by model and belief
    accuracy_table = df.groupby(["Model", "Belief"])["rationality"].mean().unstack()
    # Reorder the columns in the desired order
    desired_order = ["Given", "Explicit", "Implicit"]
    accuracy_table = accuracy_table.reindex(columns=desired_order)
    # Display the table
    print(f"Accuracy table for version {version}\n")
    print(accuracy_table)
    # Save the table as a CSV file for future use
    accuracy_table.to_csv(f"../../figures/ring/ring_accuracy.1.{version}.csv")

# Process all versions
for version in ["a", "b", "c", "d"]:
    process_experiment_results(version)
\ No newline at end of file
import pandas as pd
# Load the experiment results
df = pd.read_csv("../../data/ring/ring.2.csv")
# Calculate the accuracy by model and belief
accuracy_table = df.groupby(["Model", "Belief"])["rationality"].mean().unstack()
desired_order = ["Given", "Explicit", "Implicit"]
accuracy_table = accuracy_table.reindex(columns=desired_order)
# Display the table
print(accuracy_table)
# Save the table as a CSV file for future use
accuracy_table.to_csv("../../figures/ring/ring_accuracy.2.csv")
\ No newline at end of file