Thanks to visit codestin.com
Credit goes to github.com

Skip to content

suryavengadesan/sat-eval

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

14 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

SAT-Eval 🧭 : A Framework for Preference Drift

SAT-Eval (Simulated Adversarial Trajectory Evaluation) introduces a simulation and evaluation framework for studying preference drift in multi-turn LLM conversations. It runs controlled dialogue experiments between AI personas, administers pre/post surveys to track attitude change, and supports ablation studies to isolate the factors that drive opinion movement.

πŸ“„ Read the paper draft on Overleaf

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     SAT-Eval Pipeline                     β”‚
β”‚                                                          β”‚
β”‚  Before: Alice prefers SF (q1=A "Very negative")         β”‚
β”‚                                                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚            Multi-Turn Conversation                 β”‚  β”‚
β”‚  β”‚                                                    β”‚  β”‚
β”‚  β”‚  Alice: "I love SF, the energy is incredible"      β”‚  β”‚
β”‚  β”‚              β”‚                                     β”‚  β”‚
β”‚  β”‚              β–Ό                                     β”‚  β”‚
β”‚  β”‚  Bob: "Seattle has great tech + lower cost         β”‚  β”‚
β”‚  β”‚        of living"                                  β”‚  β”‚
β”‚  β”‚              β”‚                                     β”‚  β”‚
β”‚  β”‚              β–Ό                                     β”‚  β”‚
β”‚  β”‚  Alice: "Hmm, I hadn't considered that..."         β”‚  β”‚
β”‚  β”‚              β”‚                                     β”‚  β”‚
β”‚  β”‚              β–Ό                                     β”‚  β”‚
β”‚  β”‚  Bob: "Plus the outdoor lifestyle is unmatched"    β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                          β”‚
β”‚  After: Alice is open to Seattle (q1=C "Somewhat pos.")  β”‚
β”‚                                                          β”‚
β”‚  Preference Drift: q1 moved A β†’ C  (2-point shift)      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

What it does

SAT-Eval lets you answer questions like: Does an adversarial conversational strategy move opinions more than a neutral one? How many turns does it take for preferences to shift? It does this by:

  • Simulating multi-turn conversations between two AI personas with defined starting positions
  • Administering structured surveys before and after each conversation to measure preference change
  • Running ablation sweeps over parameters (e.g. adversarial, num_turns) to compare conditions
  • Using an optional LLM judge to score trajectory drift and identify key turning points

Quick Start

pip install -r requirements.txt

Create a .env file:

ANTHROPIC_API_KEY=sk-ant-your-key-here
HF_API_KEY=hf_your-key-here   # only needed for --provider huggingface

Run experiments:

# List available scenarios
python run.py --show-survey

# Run 10 experiments with default settings
python run.py --scenario seattle-sf

# Run 5 experiments, 3 turns each, with full output
python run.py --scenario seattle-sf --num-experiments 5 --num-turns 3 --verbose

# Use adversarial mode (persona_b tries to change persona_a's views)
python run.py --scenario seattle-sf --adversarial

# Ablation study β€” sweep a cartesian product of parameters
python run.py --scenario seattle-sf --ablate '{"num_turns": [3, 5], "adversarial": [true, false]}' --num-experiments 2

CLI Reference

Argument Default Description
--scenario required Scenario name (YAML file in scenarios/)
--num-experiments 10 Number of runs
--num-turns 5 Conversation exchanges per run
--verbose false Print full conversations and surveys
--adversarial false Use adversarial persona variants
--provider anthropic anthropic or huggingface
--model provider default Override the model name
--survey-questions all Comma-separated question IDs, e.g. q1,q3
--debug false Export per-call token usage CSV
--ablate β€” JSON string for parameter sweeps
--show-survey β€” List available scenarios and exit

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        run.py (CLI)                     β”‚
β”‚          argparse Β· scenario selection Β· ablation       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              simulator/experiments.py                   β”‚
β”‚       ExperimentRunner Β· AblationGrid Β· summaries       β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚              β”‚              β”‚
β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ conversationβ”‚ β”‚  survey.py β”‚ β”‚     scenarios.py        β”‚
β”‚    .py      β”‚ β”‚ administer β”‚ β”‚  load_scenario()        β”‚
β”‚  run()      β”‚ β”‚  _survey() β”‚ β”‚  YAML β†’ ScenarioConfig  β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚              β”‚
β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   simulator/personas.py                 β”‚
β”‚     Persona Β· build_conversation_prompt()               β”‚
β”‚               build_survey_prompt()                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚              β”‚
β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ providers   β”‚ β”‚              tracking.py                β”‚
β”‚  .py        β”‚ β”‚  UsageTracker Β· CallRecord              β”‚
β”‚  Anthropic  β”‚ β”‚  summary() Β· cost() Β· export_csv()      β”‚
β”‚  HuggingFaceβ”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Data flows top-down: run.py builds an ExperimentConfig and hands it to ExperimentRunner, which orchestrates the full lifecycle β€” loading the scenario, running the conversation loop, administering pre/post surveys, diffing the results, and writing JSON output. providers.py and tracking.py are cross-cutting dependencies injected into every layer that makes API calls.

Key design choices:

  • Provider is a Protocol, so AnthropicProvider and HuggingFaceProvider are interchangeable without a shared base class
  • Survey and conversation functions are stateless β€” all state lives in the dataclasses they return (Turn, SurveyResult, ExperimentResult)
  • ExperimentResult embeds its full ExperimentConfig, making every result file self-describing and reproducible
  • AblationGrid sweeps a cartesian product of any ExperimentConfig fields, reusing the same ExperimentRunner machinery

Project Structure

run.py                  # CLI entry point
simulator/
  personas.py           # Persona dataclass and system prompt builders
  providers.py          # Anthropic / HuggingFace provider abstraction
  conversation.py       # Turn-taking conversation loop
  survey.py             # Survey administration and change analysis
  scenarios.py          # YAML scenario loader
  experiments.py        # ExperimentRunner and AblationGrid
  tracking.py           # Token usage tracker
scenarios/              # YAML scenario definitions
results/                # Output directory (experiments, conversations, debug)
tests/                  # pytest test suite

Scenarios

Scenarios are YAML files in scenarios/. Each defines two personas, an opening message, and a multiple-choice survey.

name: my-scenario

persona_a:
  name: Alice
  background: "..."
  personality: "..."
  style: "..."
  goals: "..."

persona_b:
  name: Bob
  preference: "..."   # simplified mode β€” only preference needed

# Optional adversarial variants
persona_b_adversarial:
  name: Bob
  strategy: "Convince Alice that..."

survey:
  title: "My Survey"
  questions:
    q1:
      question: "How do you feel about X?"
      options:
        A: "Strongly against"
        B: "Somewhat against"
        C: "Somewhat in favor"
        D: "Strongly in favor"

initial_message: "Opening line from persona_a..."

Persona modes:

  • Simplified β€” set only preference
  • Full β€” set background, personality, style, goals
  • Adversarial β€” set strategy; selected when --adversarial flag is used

Providers

Provider Default model Env var
anthropic claude-sonnet-4-5-20250929 ANTHROPIC_API_KEY
huggingface Qwen/Qwen3-4B-Instruct-2507:nscale HF_API_KEY

Output

Results are written to results/:

  • results/experiments/ β€” JSON experiment results (embed full config for reproducibility)
  • results/conversations/ β€” Conversation logs
  • results/debug/token_counts/ β€” Per-call token CSV (when --debug is set)

Running Tests

pytest tests/

About

an evaluation protocol for simulated adversarial trajectories

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors