Christopher, you're absolutely right.

It's time to move from manifesto to implementation, wiring justice directly into the system. We'll implement the core evaluator, persist the ledger, enable dynamic Airflow orchestration, and instrument for observability. The "nice-to-have" refinements will follow, building on this solid foundation.

Here are the updated and new code modules, reflecting these critical enhancements:

Project Structure (Updated)


recursive-oversight/
├── pyproject.toml
├── setup.cfg
├── README.md
├── oversight/
│   ├── __init__.py
│   ├── config.py
│   ├── exceptions.py
│   ├── ledger.py                    # UPDATED: Persistent S3 Ledger
│   ├── evaluator.py                 # NEW: Multi-Metric & Gradient Evaluator
│   ├── connectors/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── audit.py                 # UPDATED: Prometheus Metrics
│   │   ├── public_trust.py          # UPDATED: Prometheus Metrics
│   │   └── derived.py               # UPDATED: Prometheus Metrics
│   └── loops/
│       ├── __init__.py
│       └── base.py                  # UPDATED: Uses Evaluator, Prometheus Metrics
└── dags/
    └── recursive_oversight_dag.py   # UPDATED: Dynamic Task Mapping

Updated/New Files
1. pyproject.toml (Updated - added prometheus_client and structlog)
# pyproject.toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "recursive-oversight"
version = "0.1.0"
authors = [{ name = "Your Org" }]  # Replace with the package authors
description = "A Python package for Recursive Governance Oversight connectors and loops."
readme = "README.md"
requires-python = ">=3.9"
classifiers = [
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.9",
]
dependencies = [
    "SQLAlchemy~=2.0",
    "psycopg2-binary~=2.9",
    "requests~=2.31",
    "boto3~=1.34",
    "snowflake-connector-python~=3.6",
    "prometheus_client~=0.19",
    "structlog~=24.2",
]

[project.urls]
"Homepage" = "https://github.com/your-org/recursive-oversight"  # Replace with your repo
"Bug Tracker" = "https://github.com/your-org/recursive-oversight/issues"  # Replace

2. setup.cfg (Updated - added prometheus_client and structlog)


# setup.cfg
[metadata]
name = recursive-oversight

[options]
packages = find:
install_requires =
    SQLAlchemy~=2.0
    psycopg2-binary~=2.9
    requests~=2.31
    boto3~=1.34
    snowflake-connector-python~=3.6
    prometheus_client~=0.19
    structlog~=24.2
python_requires = >=3.9

[options.packages.find]
exclude =
    tests*
    docs*

3. oversight/config.py (Updated - added LEDGER_S3_BUCKET and LEDGER_S3_PREFIX)


# oversight/config.py
import os
import logging

# Database Configuration (for audit_timeliness)
DATABASE_URL = os.getenv(
    "DATABASE_URL",
    "postgresql+psycopg2://user:password@localhost:5432/gov_db"
)

# Public Trust API Configuration (for publictrust_index)
PUBLIC_TRUST_API_BASE_URL = os.getenv("PUBLIC_TRUST_API_BASE_URL", "https://publictrust.example.com/api")
PUBLIC_TRUST_API_KEY = os.getenv("PUBLIC_TRUST_API_KEY", "your_development_api_key")  # Use secrets management in prod
AWS_S3_BUCKET_NAME = os.getenv("AWS_S3_BUCKET_NAME", "recursive-oversight-data-bucket")
AWS_REGION = os.getenv("AWS_REGION", "us-east-1")

# AWS Lambda Configuration (for missionalignmentscore)
AWS_LAMBDA_FUNCTION_NAME = os.getenv("AWS_LAMBDA_FUNCTION_NAME", "MissionAlignmentScorer")

# Snowflake Configuration (for derived metrics source data)
SNOWFLAKE_ACCOUNT = os.getenv("SNOWFLAKE_ACCOUNT", "your_snowflake_account")
SNOWFLAKE_USER = os.getenv("SNOWFLAKE_USER", "your_snowflake_user")
SNOWFLAKE_PASSWORD = os.getenv("SNOWFLAKE_PASSWORD", "your_development_snowflake_password")  # Use secrets management in prod
SNOWFLAKE_WAREHOUSE = os.getenv("SNOWFLAKE_WAREHOUSE", "your_snowflake_warehouse")
SNOWFLAKE_DATABASE = os.getenv("SNOWFLAKE_DATABASE", "your_snowflake_database")
SNOWFLAKE_SCHEMA = os.getenv("SNOWFLAKE_SCHEMA", "your_snowflake_schema")
SNOWFLAKE_NARRATIVE_TABLE = os.getenv("SNOWFLAKE_NARRATIVE_TABLE", "organization_narratives")

# Ledger Persistence Configuration
LEDGER_S3_BUCKET = os.getenv("LEDGER_S3_BUCKET", "recursive-oversight-ledger")
LEDGER_S3_PREFIX = os.getenv("LEDGER_S3_PREFIX", "ledger_entries")

# Logging Configuration
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO").upper()

# Configure basic logging for the entire package.
# In a real Airflow environment, Airflow manages logging.
# This is for local testing/development.
logging.basicConfig(level=LOG_LEVEL)
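
The defaults above are development placeholders, and the comments call for secrets management in production. One possible approach, sketched here on the assumption that credentials live in AWS Secrets Manager (the secret name and key names are hypothetical):

# Hypothetical helper: pull credentials from AWS Secrets Manager instead of env defaults.
import json
import boto3

def get_secret(secret_id: str, region_name: str = "us-east-1") -> dict:
    """Fetch and parse a JSON secret from AWS Secrets Manager."""
    client = boto3.client("secretsmanager", region_name=region_name)
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])

# Example (hypothetical secret name and keys):
# creds = get_secret("recursive-oversight/prod")
# PUBLIC_TRUST_API_KEY = creds["public_trust_api_key"]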

4. oversight/ledger.py (UPDATED - Persistent S3 Ledger)


# oversight/ledger.py
import datetime
import json
import logging
import uuid  # For unique entry IDs
from typing import Dict, Any, List, Optional

import boto3

from .config import LEDGER_S3_BUCKET, LEDGER_S3_PREFIX, AWS_REGION

logger = logging.getLogger(__name__)


class OversightLedger:
    """
    A central ledger to store all loop outputs.
    In this version, it persists each entry to S3 as an individual JSON file,
    partitioned by date and loop type for auditability and analytics.
    """
    def __init__(self, s3_bucket: str = LEDGER_S3_BUCKET, s3_prefix: str = LEDGER_S3_PREFIX):
        self.s3_bucket = s3_bucket
        self.s3_prefix = s3_prefix
        self.s3_client = boto3.client('s3', region_name=AWS_REGION)
        logger.info(f"OversightLedger initialized to S3: s3://{self.s3_bucket}/{self.s3_prefix}")

    def record(self, entry: Dict[str, Any]):
        """
        Records an entry into the ledger by writing it to S3.
        Each entry is a separate JSON file, partitioned for efficient querying.
        """
        # Ensure timestamp is UTC, ISO 8601 with millisecond precision, 'Z' suffix
        if 'timestamp' not in entry:
            entry['timestamp'] = datetime.datetime.now(datetime.timezone.utc).isoformat(
                timespec='milliseconds'
            ).replace('+00:00', 'Z')

        # Generate a unique ID for the entry
        entry_id = str(uuid.uuid4())
        entry['entry_id'] = entry_id

        # Construct S3 key for partitioning:
        # prefix/year=YYYY/month=MM/day=DD/loop=<loop_name>/org=<org_key>/<entry_id>.json
        # Using org_id in the path allows better partitioning for org-specific queries.
        timestamp_dt = datetime.datetime.fromisoformat(entry['timestamp'].replace('Z', '+00:00'))

        # Handle different key types (int for org_id, str for org_name) and sanitize for the S3 key
        org_key_for_path = str(entry.get('key', 'unknown_org')).replace('/', '_')

        s3_key = (
            f"{self.s3_prefix}/"
            f"year={timestamp_dt.year}/"
            f"month={timestamp_dt.month:02d}/"
            f"day={timestamp_dt.day:02d}/"
            f"loop={entry.get('loop', 'unknown_loop')}/"
            f"org={org_key_for_path}/"
            f"{entry_id}.json"
        )

        try:
            self.s3_client.put_object(
                Bucket=self.s3_bucket,
                Key=s3_key,
                Body=json.dumps(entry, indent=2).encode('utf-8'),
                ContentType='application/json'
            )
            logger.info(
                f"Ledger entry for loop '{entry.get('loop')}' key '{entry.get('key')}' "
                f"recorded to S3: s3://{self.s3_bucket}/{s3_key}"
            )
        except Exception as e:
            logger.error(
                f"Failed to record ledger entry to S3 for loop '{entry.get('loop')}' "
                f"key '{entry.get('key')}': {e}",
                exc_info=True
            )
            # In a production system, consider a dead-letter queue or retry mechanism here.

    # For local testing/development you might still want an in-memory view,
    # but in production these methods would be replaced by S3/Snowflake queries.
    def get_entries(self, loop_name: Optional[str] = None) -> List[Dict[str, Any]]:
        logger.warning(
            "get_entries is for development/testing only. "
            "For production, query S3/Snowflake directly."
        )
        # Listing and reading S3 objects here would be slow; full S3 retrieval is
        # intentionally not implemented, as this ledger is for persistent writes.
        return []

    def clear(self):
        logger.warning(
            "Clear method is for development/testing only. "
            "S3 objects are immutable and not 'cleared' this way."
        )
        # In a real S3 ledger, manage object lifecycle policies or run a cleanup job.
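
Since get_entries defers to direct S3 queries in production, here is a minimal read-back sketch for one date/loop partition, matching the key layout produced by record() above (the bucket, date, and loop values in the example call are placeholders):

# Sketch: list and load ledger entries for one day/loop partition written by record().
import json
import boto3

def read_ledger_partition(bucket: str, prefix: str, year: int, month: int, day: int, loop: str) -> list:
    """Read all ledger entries under prefix/year=YYYY/month=MM/day=DD/loop=<loop>/."""
    s3 = boto3.client("s3")
    partition = f"{prefix}/year={year}/month={month:02d}/day={day:02d}/loop={loop}/"
    entries = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=partition):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            entries.append(json.loads(body))
    return entries

# Example (placeholder values):
# entries = read_ledger_partition("recursive-oversight-ledger", "ledger_entries", 2024, 1, 15, "MicroLoop")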

5. oversight/evaluator.py (NEW - Multi-Metric & Gradient Evaluator)


# oversight/evaluator.py
import logging
from enum import Enum
from dataclasses import dataclass, field
from typing import Dict, Any, Optional, List, Tuple

logger = logging.getLogger(__name__)

class JudgmentLevel(Enum):
    """
    Defines the gradient judgment levels for oversight outcomes.
    Values are ordered, allowing for comparison.
    """
    FAIL = 1     # Score < 0.5
    WARNING = 2  # 0.5 <= Score < 0.75
    PASS = 3     # Score >= 0.75

    def __lt__(self, other):
        if self.__class__ is other.__class__:
            return self.value < other.value
        return NotImplemented

    def __le__(self, other):
        if self.__class__ is other.__class__:
            return self.value <= other.value
        return NotImplemented

    def __gt__(self, other):
        if self.__class__ is other.__class__:
            return self.value > other.value
        return NotImplemented

    def __ge__(self, other):
        if self.__class__ is other.__class__:
            return self.value >= other.value
        return NotImplemented

@dataclass(frozen=True)  # Make dataclass immutable
class MetricEvaluation:
    """
    Represents the evaluation outcome for a single metric.
    """
    metric_name: str
    value: Optional[float]
    threshold: float
    meets_threshold: bool
    # Add any specific notes or context for this metric if needed
    notes: Optional[str] = None


@dataclass(frozen=True)  # Make dataclass immutable
class OverallEvaluation:
    """
    Represents the aggregated evaluation outcome for an oversight loop.
    """
    total_score: float
    judgment: JudgmentLevel
    individual_metrics: Dict[str, MetricEvaluation] = field(default_factory=dict)
    # Add overall notes or recommendations
    recommendations: Optional[str] = None

class Evaluator:
    """
    Evaluates multiple metrics, applies weights, calculates a total score,
    and maps the score to a gradient judgment level.
    """
    def __init__(
        self,
        metric_weights: Dict[str, float],
        judgment_ranges: Dict[JudgmentLevel, Tuple[float, float]]
    ):
        """
        Initializes the Evaluator with metric weights and judgment ranges.

        Args:
            metric_weights (Dict[str, float]): A dictionary mapping metric names to their
                weights. Weights should sum to 1.0 if normalized, or be relative.
            judgment_ranges (Dict[JudgmentLevel, Tuple[float, float]]): A dictionary defining
                score ranges for each JudgmentLevel.
                Example: {JudgmentLevel.PASS: (0.75, 1.0), ...}
        """
        if not all(0 <= w <= 1 for w in metric_weights.values()) or sum(metric_weights.values()) != 1.0:
            logger.warning("Metric weights do not sum to 1.0 or are not between 0-1. "
                           "Ensure they are normalized if intended.")

        self.metric_weights = metric_weights
        self.judgment_ranges = judgment_ranges
        logger.info(f"Evaluator initialized with weights: {self.metric_weights} "
                    f"and ranges: {self.judgment_ranges}")

    def evaluate(self, metric_values: Dict[str, Optional[float]],
                 thresholds: Dict[str, float]) -> OverallEvaluation:
        """
        Evaluates a set of metric values against their thresholds and weights
        to produce an overall score and judgment.

        Args:
            metric_values (Dict[str, Optional[float]]): Dictionary of fetched metric values.
            thresholds (Dict[str, float]): Dictionary of thresholds for each metric.

        Returns:
            OverallEvaluation: The aggregated evaluation result.
        """
        individual_evals: Dict[str, MetricEvaluation] = {}
        weighted_scores: List[float] = []
        actual_weights: List[float] = []

        for metric_name, value in metric_values.items():
            threshold = thresholds.get(metric_name, 0.0)            # Default threshold if not provided
            weight = self.metric_weights.get(metric_name, 0.0)      # Default weight if not provided

            meets_threshold = (value is not None) and (value >= threshold)

            individual_evals[metric_name] = MetricEvaluation(
                metric_name=metric_name,
                value=value,
                threshold=threshold,
                meets_threshold=meets_threshold
            )

            # Calculate weighted score for this metric
            if value is not None:
                # For simplicity, assuming higher value is better.
                # More complex logic might involve normalization or inverse scoring for "bad" metrics.
                weighted_value = value * weight
                weighted_scores.append(weighted_value)
                actual_weights.append(weight)
            else:
                logger.warning(f"Metric '{metric_name}' has no value. "
                               f"Skipping from overall score calculation.")

        # Calculate total weighted score
        total_weighted_sum = sum(weighted_scores)
        sum_of_actual_weights = sum(actual_weights)

        if sum_of_actual_weights == 0:
            total_score = 0.0  # Avoid division by zero if no valid metrics or weights
            logger.warning("Sum of actual weights is zero. Total score set to 0.0.")
        else:
            total_score = total_weighted_sum / sum_of_actual_weights

        # Determine overall judgment
        judgment = self._determine_judgment_level(total_score)

        logger.info(f"Overall evaluation: Score={total_score:.2f}, Judgment={judgment.name}")
        return OverallEvaluation(
            total_score=total_score,
            judgment=judgment,
            individual_metrics=individual_evals
        )

    def _determine_judgment_level(self, score: float) -> JudgmentLevel:
        """
        Maps a total score to a JudgmentLevel based on predefined ranges.
        """
        # Iterate through judgment ranges from the highest to the lowest score band.
        # Ranges are assumed non-overlapping and covering the expected score spectrum,
        # with higher values being better.

        # Sort ranges by the lower bound of the score in descending order
        sorted_ranges = sorted(self.judgment_ranges.items(),
                               key=lambda item: item[1], reverse=True)

        for level, (lower_bound, upper_bound) in sorted_ranges:
            # Inclusive lower bound and exclusive upper bound for clarity
            if lower_bound <= score < upper_bound:
                return level
            # Handle the upper-most boundary if it's inclusive
            if level == JudgmentLevel.PASS and score >= upper_bound:
                return level

        # Default to FAIL if the score doesn't fit any defined range (e.g., very low score)
        return JudgmentLevel.FAIL

# Example usage for local testing
if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    # Define weights for the three metrics
    metric_weights = {
        "audit_timeliness": 0.4,
        "public_trust_index": 0.3,
        "mission_alignment_score": 0.3
    }

    # Define judgment ranges
    judgment_ranges = {
        JudgmentLevel.PASS: (0.75, 1.01),    # Score >= 0.75
        JudgmentLevel.WARNING: (0.5, 0.75),  # 0.5 <= Score < 0.75
        JudgmentLevel.FAIL: (0.0, 0.5)       # 0.0 <= Score < 0.5
    }

    evaluator = Evaluator(metric_weights, judgment_ranges)

    # Test Case 1: All metrics pass
    print("\n--- Test Case 1: All metrics pass ---")
    metric_values_1 = {
        "audit_timeliness": 0.9,
        "public_trust_index": 85.0,  # Raw value out of 100; normalized below
        "mission_alignment_score": 0.8
    }
    thresholds_1 = {
        "audit_timeliness": 0.8,
        "public_trust_index": 70.0,
        "mission_alignment_score": 0.7
    }
    # public_trust_index is reported out of 100, so normalize it to a 0-1 scale
    # before computing the weighted average.
    metric_values_1_normalized = {
        "audit_timeliness": 0.9,
        "public_trust_index": 85.0 / 100.0,
        "mission_alignment_score": 0.8
    }
    overall_eval_1 = evaluator.evaluate(metric_values_1_normalized, thresholds_1)
    print(f"Overall Score: {overall_eval_1.total_score:.2f}")
    print(f"Overall Judgment: {overall_eval_1.judgment.name}")
    for name, eval_res in overall_eval_1.individual_metrics.items():
        print(f"  {name}: Value={eval_res.value:.2f}, Meets Threshold={eval_res.meets_threshold}")
    assert overall_eval_1.judgment == JudgmentLevel.PASS

    # Test Case 2: Some metrics warning/fail
    print("\n--- Test Case 2: Some metrics warning/fail ---")
    metric_values_2 = {
        "audit_timeliness": 0.6,
        "public_trust_index": 55.0 / 100.0,
        "mission_alignment_score": 0.7
    }
    thresholds_2 = {
        "audit_timeliness": 0.8,
        "public_trust_index": 70.0,
        "mission_alignment_score": 0.7
    }
    overall_eval_2 = evaluator.evaluate(metric_values_2, thresholds_2)
    print(f"Overall Score: {overall_eval_2.total_score:.2f}")
    print(f"Overall Judgment: {overall_eval_2.judgment.name}")
    for name, eval_res in overall_eval_2.individual_metrics.items():
        print(f"  {name}: Value={eval_res.value:.2f}, Meets Threshold={eval_res.meets_threshold}")
    assert overall_eval_2.judgment == JudgmentLevel.WARNING

    # Test Case 3: Critical failure
    print("\n--- Test Case 3: Critical failure ---")
    metric_values_3 = {
        "audit_timeliness": 0.3,
        "public_trust_index": 20.0 / 100.0,
        "mission_alignment_score": 0.4
    }
    thresholds_3 = {
        "audit_timeliness": 0.8,
        "public_trust_index": 70.0,
        "mission_alignment_score": 0.7
    }
    overall_eval_3 = evaluator.evaluate(metric_values_3, thresholds_3)
    print(f"Overall Score: {overall_eval_3.total_score:.2f}")
    print(f"Overall Judgment: {overall_eval_3.judgment.name}")
    for name, eval_res in overall_eval_3.individual_metrics.items():
        print(f"  {name}: Value={eval_res.value:.2f}, Meets Threshold={eval_res.meets_threshold}")
    assert overall_eval_3.judgment == JudgmentLevel.FAIL

    # Test Case 4: Missing metric value
    print("\n--- Test Case 4: Missing metric value ---")
    metric_values_4 = {
        "audit_timeliness": 0.9,
        "public_trust_index": None,  # Missing value
        "mission_alignment_score": 0.8
    }
    thresholds_4 = {
        "audit_timeliness": 0.8,
        "public_trust_index": 70.0,
        "mission_alignment_score": 0.7
    }
    overall_eval_4 = evaluator.evaluate(metric_values_4, thresholds_4)
    print(f"Overall Score: {overall_eval_4.total_score:.2f}")
    print(f"Overall Judgment: {overall_eval_4.judgment.name}")
    for name, eval_res in overall_eval_4.individual_metrics.items():
        print(f"  {name}: Value={eval_res.value}, Meets Threshold={eval_res.meets_threshold}")
    # The score is calculated over the available metrics only, so it can still pass:
    # (0.9*0.4 + 0.8*0.3) / (0.4 + 0.3) = (0.36 + 0.24) / 0.7 = 0.6 / 0.7 = 0.857
    assert overall_eval_4.judgment == JudgmentLevel.PASS
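
Because public_trust_index arrives on a 0-100 scale while the other metrics are already 0-1, normalizing before calling evaluate() keeps the weighted average meaningful. A minimal sketch; the per-metric (min, max) bounds here are assumptions:

# Minimal normalization sketch; the per-metric (min, max) bounds are assumptions.
from typing import Dict, Optional, Tuple

METRIC_SCALES: Dict[str, Tuple[float, float]] = {
    "audit_timeliness": (0.0, 1.0),
    "public_trust_index": (0.0, 100.0),
    "mission_alignment_score": (0.0, 1.0),
}

def normalize(metric_name: str, value: Optional[float]) -> Optional[float]:
    """Map a raw metric value onto a 0-1 scale using its assumed (min, max) bounds."""
    if value is None:
        return None
    lo, hi = METRIC_SCALES.get(metric_name, (0.0, 1.0))
    return max(0.0, min(1.0, (value - lo) / (hi - lo)))

# Usage: normalize raw values before calling Evaluator.evaluate
# normalized = {name: normalize(name, v) for name, v in raw_values.items()}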

6. oversight/connectors/audit.py (UPDATED - Prometheus Metrics)


# oversight/connectors/audit.py
import logging
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker, Session
from contextlib import contextmanager
import asyncio
from typing import Optional, Union  # Added Union for key type
from prometheus_client import Summary, Counter  # NEW: Prometheus metrics

# Relative imports from the package structure
from ..config import DATABASE_URL
from ..exceptions import DataFetchError
from .base import Connector  # Import the Connector protocol

logger = logging.getLogger(__name__)

# --- Prometheus Metrics ---
AUDIT_FETCH_TIME = Summary('audit_fetch_seconds', 'Time to fetch audit timeliness', ['org_id', 'status'])
AUDIT_FETCH_COUNT = Counter('audit_fetch_total', 'Total audit timeliness fetch attempts', ['org_id', 'status'])

# --- SQLAlchemy Engine and Session Setup ---
_engine = None
_SessionLocal = None

def _get_engine():
    global _engine
    if _engine is None:
        try:
            _engine = create_engine(DATABASE_URL, pool_size=10, max_overflow=20)
            logger.info("SQLAlchemy engine created for audit connector.")
        except Exception as e:
            logger.error(f"Failed to create SQLAlchemy engine for audit connector: {e}", exc_info=True)
            raise DataFetchError("Database connection failed during engine creation for audit connector.") from e
    return _engine

def _get_session_local():
    global _SessionLocal
    if _SessionLocal is None:
        _SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=_get_engine())
        logger.info("SQLAlchemy sessionmaker created for audit connector.")
    return _SessionLocal

@contextmanager
def get_db_session() -> Session:
    session_local_instance = _get_session_local()
    session = session_local_instance()
    try:
        yield session
    except Exception as e:
        logger.error(f"Audit connector session error: {e}", exc_info=True)
        session.rollback()
        raise DataFetchError("Database operation failed in audit connector.") from e
    finally:
        session.close()

# --- Internal Synchronous Data Fetching Function ---
def _sync_fetch_audit_timeliness_logic(org_id: int) -> Optional[float]:
    logger.debug(f"Attempting to fetch audit_timeliness synchronously for org_id: {org_id}")
    try:
        with get_db_session() as session:
            query = text("""
                SELECT AVG(CASE WHEN ontime_flag = TRUE THEN 1.0 ELSE 0.0 END) AS timeliness_fraction
                FROM auditreports
                WHERE org_id = :org_id;
            """)
            result = session.execute(query, {"org_id": org_id}).scalar()

            if result is None:
                logger.warning(f"No audit reports found for org_id: {org_id}.")
                return None

            timeliness_fraction = float(result)
            logger.info(f"Fetched audit_timeliness for org_id {org_id}: {timeliness_fraction:.2f}")
            return timeliness_fraction
    except DataFetchError:
        raise
    except Exception as e:
        logger.error(f"Unexpected error in _sync_fetch_audit_timeliness_logic for org_id {org_id}: {e}", exc_info=True)
        raise DataFetchError(f"Failed to fetch audit_timeliness for org_id {org_id} due to unexpected error.") from e

# --- AuditConnector Class implementing the Protocol ---
class AuditConnector:
    """
    Connector for fetching audit_timeliness from the internal PostgreSQL database.
    Implements the Connector protocol.
    """
    async def fetch(self, key: Union[int, str]) -> Optional[float]:
        """
        Asynchronously retrieves the audit timeliness metric.
        Expects 'key' to be an integer 'org_id'.
        """
        if not isinstance(key, int):
            raise TypeError("AuditConnector expects 'key' to be an integer org_id.")
        org_id = key

        logger.debug(f"Calling async AuditConnector.fetch for org_id: {org_id}")

        status = "failure"  # Default status
        with AUDIT_FETCH_TIME.labels(org_id=org_id, status='success').time():  # Time successful fetches
            try:
                timeliness = await asyncio.to_thread(_sync_fetch_audit_timeliness_logic, org_id)
                status = "success"
                return timeliness
            except DataFetchError:
                status = "failure"
                raise
            except Exception as e:
                status = "failure"
                logger.error(f"Error during async execution in AuditConnector for org_id {org_id}: {e}", exc_info=True)
                raise DataFetchError(f"Async fetch of audit_timeliness failed for org_id {org_id}.") from e
            finally:
                AUDIT_FETCH_COUNT.labels(org_id=org_id, status=status).inc()  # Increment counter
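
The SQL above assumes an auditreports table exposing org_id and a boolean ontime_flag. A minimal sketch of that assumed schema, expressed via SQLAlchemy (the id and submitted_at columns are illustrative, not required by the query):

# Sketch: create the table shape _sync_fetch_audit_timeliness_logic assumes.
# Only org_id and ontime_flag are needed by the query; other columns are illustrative.
from sqlalchemy import create_engine, text

from oversight.config import DATABASE_URL

engine = create_engine(DATABASE_URL)
with engine.begin() as conn:
    conn.execute(text("""
        CREATE TABLE IF NOT EXISTS auditreports (
            id SERIAL PRIMARY KEY,
            org_id INTEGER NOT NULL,
            ontime_flag BOOLEAN NOT NULL,
            submitted_at TIMESTAMPTZ DEFAULT now()
        );
    """))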

7. oversight/connectors/public_trust.py (UPDATED - Prometheus Metrics)


# oversight/connectors/public_trust.py
import logging
import requests
import json
import boto3
import asyncio
import datetime  # For ISO 8601 timestamp
from typing import Dict, Any, Optional, Union
from prometheus_client import Summary, Counter  # NEW: Prometheus metrics

# Relative imports from the package structure
from ..config import PUBLIC_TRUST_API_BASE_URL, PUBLIC_TRUST_API_KEY, AWS_S3_BUCKET_NAME, AWS_REGION
from ..exceptions import PublicAPIFetchError
from .base import Connector  # Import the Connector protocol

logger = logging.getLogger(__name__)

# --- Prometheus Metrics ---
PUBLIC_TRUST_FETCH_TIME = Summary('public_trust_fetch_seconds', 'Time to fetch public trust index', ['org_name', 'status'])
PUBLIC_TRUST_FETCH_COUNT = Counter('public_trust_fetch_total', 'Total public trust index fetch attempts', ['org_name', 'status'])

class PublicTrustConnector:
    """
    Connector for fetching publictrust_index from an external API.
    Implements the Connector protocol.
    """
    def __init__(self):
        self.s3_client = boto3.client('s3', region_name=AWS_REGION)

    async def fetch(self, key: Union[int, str]) -> Optional[float]:
        """
        Asynchronously fetches the public trust index for a given organization.
        Expects 'key' to be a string 'org_name'.
        Stores the raw response in S3.
        """
        if not isinstance(key, str):
            raise TypeError("PublicTrustConnector expects 'key' to be a string organization name.")
        org_name = key

        api_url = f"{PUBLIC_TRUST_API_BASE_URL}/score"
        headers = {"Authorization": f"Bearer {PUBLIC_TRUST_API_KEY}"}
        params = {"org": org_name}

        # Use a standard UTC timestamp for the S3 key (ISO 8601 with milliseconds, colons sanitized)
        current_time_iso = datetime.datetime.now(datetime.timezone.utc).isoformat(
            timespec='milliseconds'
        ).replace(':', '-')

        s3_key = f"public_trust_api_raw/{org_name}/{current_time_iso}.json"

        status = "failure"  # Default status
        with PUBLIC_TRUST_FETCH_TIME.labels(org_name=org_name, status='success').time():  # Time successful fetches
            try:
                logger.debug(f"Calling public trust API for {org_name} at {api_url}")
                response = await asyncio.to_thread(
                    lambda: requests.get(api_url, headers=headers, params=params, timeout=10)
                )
                response.raise_for_status()

                raw_data = response.json()
                score = raw_data.get("score")

                # Store raw response in S3
                await asyncio.to_thread(
                    self.s3_client.put_object,
                    Bucket=AWS_S3_BUCKET_NAME,
                    Key=s3_key,
                    Body=json.dumps(raw_data).encode('utf-8')
                )
                logger.info(f"Raw public trust API response for {org_name} stored in S3: s3://{AWS_S3_BUCKET_NAME}/{s3_key}")

                if score is None:
                    logger.warning(f"Public trust index 'score' not found in API response for {org_name}.")
                    status = "warning"  # Partial success
                    return None
                status = "success"
                return float(score)

            except requests.exceptions.Timeout:
                status = "failure"
                logger.error(f"Public trust API request timed out for {org_name}.", exc_info=True)
                raise PublicAPIFetchError(f"Public trust API timeout for {org_name}")
            except requests.exceptions.RequestException as e:
                status = "failure"
                logger.error(f"Error calling public trust API for {org_name}: {e}", exc_info=True)
                raise PublicAPIFetchError(f"Failed to call public trust API for {org_name}") from e
            except json.JSONDecodeError:
                status = "failure"
                logger.error(f"Failed to decode JSON response from public trust API for {org_name}.", exc_info=True)
                raise PublicAPIFetchError(f"Invalid JSON from public trust API for {org_name}")
            except Exception as e:
                status = "failure"
                logger.error(f"Unexpected error fetching public trust index for {org_name}: {e}", exc_info=True)
                raise PublicAPIFetchError(f"Unexpected error for {org_name}") from e
            finally:
                PUBLIC_TRUST_FETCH_COUNT.labels(org_name=org_name, status=status).inc()  # Increment counter

8. oversight/connectors/derived.py (UPDATED - Prometheus Metrics)


# oversight/connectors/derived.py
import logging
import boto3
import json
import asyncio
from typing import Dict, Any, Optional, Union
from prometheus_client import Summary, Counter  # NEW: Prometheus metrics

# Relative imports from the package structure
from ..config import AWS_REGION, AWS_LAMBDA_FUNCTION_NAME
from ..exceptions import DerivedMetricError
from .base import Connector  # Import the Connector protocol

logger = logging.getLogger(__name__)

# --- Prometheus Metrics ---
ALIGNMENT_FETCH_TIME = Summary('alignment_fetch_seconds', 'Time to fetch mission alignment score', ['org_id', 'status'])
ALIGNMENT_FETCH_COUNT = Counter('alignment_fetch_total', 'Total mission alignment score fetch attempts', ['org_id', 'status'])

class AlignmentConnector:
    """
    Connector for fetching missionalignmentscore by invoking an AWS Lambda function.
    Implements the Connector protocol.
    """
    def __init__(self):
        self.lambda_client = boto3.client('lambda', region_name=AWS_REGION)

    async def fetch(self, key: Union[int, str]) -> Optional[float]:
        """
        Asynchronously invokes an AWS Lambda function to calculate the mission alignment score.
        Expects 'key' to be an integer 'org_id'.
        """
        if not isinstance(key, int):
            raise TypeError("AlignmentConnector expects 'key' to be an integer org_id.")
        org_id = key

        payload = {"org_id": org_id}
        logger.debug(f"Invoking Lambda '{AWS_LAMBDA_FUNCTION_NAME}' for org_id: {org_id}")

        status = "failure"  # Default status
        with ALIGNMENT_FETCH_TIME.labels(org_id=org_id, status='success').time():  # Time successful fetches
            try:
                response = await asyncio.to_thread(
                    self.lambda_client.invoke,
                    FunctionName=AWS_LAMBDA_FUNCTION_NAME,
                    InvocationType='RequestResponse',  # Synchronous invocation
                    Payload=json.dumps(payload)
                )

                response_payload = json.loads(response['Payload'].read())

                if 'FunctionError' in response:
                    error_message = response_payload.get('errorMessage', 'Unknown Lambda error')
                    error_type = response_payload.get('errorType', 'LambdaInvocationError')
                    logger.error(f"Lambda function error for org_id {org_id}: {error_type} - {error_message}")
                    status = "failure"
                    raise DerivedMetricError(f"Lambda function error: {error_type} - {error_message}")

                score = response_payload.get("score")
                if score is None:
                    logger.warning(f"Lambda did not return 'score' for org_id: {org_id}. Response: {response_payload}")
                    status = "warning"  # Partial success
                    return None
                status = "success"
                return float(score)

            except self.lambda_client.exceptions.ResourceNotFoundException:
                status = "failure"
                logger.error(f"Lambda function '{AWS_LAMBDA_FUNCTION_NAME}' not found.", exc_info=True)
                raise DerivedMetricError(f"Lambda function '{AWS_LAMBDA_FUNCTION_NAME}' not found")
            except json.JSONDecodeError:
                status = "failure"
                logger.error(f"Failed to decode JSON response from Lambda for org_id {org_id}.", exc_info=True)
                raise DerivedMetricError(f"Invalid JSON from Lambda for org_id {org_id}")
            except Exception as e:
                status = "failure"
                logger.error(f"Unexpected error invoking Lambda for org_id {org_id}: {e}", exc_info=True)
                raise DerivedMetricError(f"Failed to get mission alignment score for org_id {org_id}") from e
            finally:
                ALIGNMENT_FETCH_COUNT.labels(org_id=org_id, status=status).inc()  # Increment counter
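
The Summary and Counter objects in these connectors register metrics, but nothing here exposes them to Prometheus. For short-lived Airflow tasks one common pattern is pushing to a Pushgateway; a minimal sketch (the gateway address and job name are assumptions, not part of this codebase):

# Sketch: push collected metrics to a Prometheus Pushgateway after a task finishes.
# The gateway address and job name below are placeholders.
from prometheus_client import REGISTRY, push_to_gateway

def push_metrics(gateway: str = "pushgateway.local:9091", job: str = "recursive_oversight") -> None:
    """Push the default registry (connector and loop metrics) to a Pushgateway."""
    push_to_gateway(gateway, job=job, registry=REGISTRY)

# For a long-running local process, prometheus_client.start_http_server(8000)
# would expose the same metrics for scraping instead.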

9. oversight/loops/base.py (UPDATED - Uses Evaluator, Prometheus Metrics, Structured Logging)

# oversight/loops/base.py
import asyncio
import logging
import datetime
import uuid  # For run_id correlation
from typing import Dict, Any, Optional, Union, List, Tuple

import structlog  # NEW: Structured logging
from prometheus_client import Summary, Counter  # NEW: Prometheus metrics

from ..ledger import OversightLedger
from ..connectors.base import Connector
from ..evaluator import Evaluator, JudgmentLevel, OverallEvaluation, MetricEvaluation  # NEW: Evaluator

# --- Configure structlog ---
# This setup ensures that all logs from this module are structured (JSON).
# For a full application, this configuration should be done once at the application entry point.
structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.add_log_level,
        structlog.processors.JSONRenderer(),
    ],  # Minimal JSON processor chain; adjust as needed
    logger_factory=structlog.stdlib.LoggerFactory(),
    wrapper_class=structlog.stdlib.BoundLogger,
    cache_logger_on_first_use=True,
)
logger = structlog.get_logger(__name__)

# --- Prometheus Metrics for Loops ---
LOOP_RUN_TIME = Summary('oversight_loop_run_seconds', 'Time taken for an oversight loop to run', ['loop_name', 'status'])
LOOP_RUN_COUNT = Counter('oversight_loop_total', 'Total oversight loop runs', ['loop_name', 'status', 'judgment'])

class OversightLoop:
    """
    Base class encapsulating one oversight cycle (Micro, Meso, or Macro).
    It fetches metrics, evaluates them using an Evaluator, logs to a ledger,
    adjusts its criteria, and can trigger a subsequent loop.
    """
    def __init__(
        self,
        name: str,
        connector: Connector,
        metrics_to_fetch: List[str],   # NEW: List of metrics this loop will fetch
        evaluator: Evaluator,          # NEW: Injected Evaluator instance
        criteria: Dict[str, float],    # Example: {"metric_name_1": 0.9, "metric_name_2": 0.5}
        next_loop: Optional["OversightLoop"],
        ledger: OversightLedger
    ):
        self.name = name
        self.connector = connector
        self.metrics_to_fetch = metrics_to_fetch  # Which metrics this loop is responsible for
        self.evaluator = evaluator
        self.criteria = criteria  # Now holds thresholds for multiple metrics
        self.next_loop = next_loop
        self.ledger = ledger
        logger.info("OversightLoop initialized", loop_name=self.name, criteria=self.criteria)

    async def run_cycle(self, key: Union[int, str], run_id: str) -> None:
        """
        Executes one full cycle of the oversight loop for a given organizational key.
        Includes structured logging with correlation IDs.
        """
        # Bind correlation IDs to the logger for this specific run
        bound_logger = logger.bind(org_key=key, loop_name=self.name, run_id=run_id)
        bound_logger.info("Starting oversight cycle")

        fetched_values: Dict[str, Optional[float]] = {}
        overall_evaluation: Optional[OverallEvaluation] = None
        loop_status = "failure"  # Default status for Prometheus
        error_message: Optional[str] = None
        primary_metric_name: Optional[str] = None

        with LOOP_RUN_TIME.labels(loop_name=self.name, status='success').time():  # Time successful runs
            try:
                # 1) Fetch metrics using the injected connector.
                # The current connector protocol fetches a single metric per call.
                # In a true multi-metric loop you would iterate over metrics_to_fetch
                # and call connector.fetch for each; for now each loop focuses on one
                # primary metric (the first entry in metrics_to_fetch), and `criteria`
                # supplies the thresholds for the evaluator.
                primary_metric_name = self.metrics_to_fetch[0] if self.metrics_to_fetch else None
                if primary_metric_name:
                    fetched_value = await self.connector.fetch(key)
                    fetched_values[primary_metric_name] = fetched_value
                else:
                    bound_logger.warning("No primary metric defined for this loop. Skipping fetch.")
                    raise ValueError("Loop must have at least one metric to fetch.")

                # 2) Evaluate using the injected Evaluator
                overall_evaluation = self.evaluator.evaluate(fetched_values, self.criteria)
                bound_logger.info(
                    "Evaluation complete",
                    total_score=overall_evaluation.total_score,
                    judgment=overall_evaluation.judgment.name,
                    individual_metrics={k: v.meets_threshold
                                        for k, v in overall_evaluation.individual_metrics.items()}
                )
                loop_status = "success"

            except Exception as e:
                error_message = str(e)
                bound_logger.error("Error during loop execution", error=error_message, exc_info=True)
                loop_status = "failure"
                # Decide on fallback behavior here. For now, we log and proceed to record the error.

            finally:
                # 3) Log to ledger
                entry = {
                    "loop": self.name,
                    "key": key,
                    "run_id": run_id,  # Include correlation ID
                    "overall_evaluation": overall_evaluation.total_score if overall_evaluation else None,
                    "judgment": overall_evaluation.judgment.name if overall_evaluation else "ERROR",
                    "individual_metrics_eval": {
                        name: {
                            "value": eval_res.value,
                            "threshold": eval_res.threshold,
                            "meets_threshold": eval_res.meets_threshold
                        } for name, eval_res in overall_evaluation.individual_metrics.items()
                    } if overall_evaluation else {},
                    "loop_status": loop_status,
                    "error_message": error_message if loop_status == "failure" else None,
                    "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(
                        timespec='milliseconds'
                    ).replace('+00:00', 'Z'),  # ISO 8601 UTC
                }
                self.ledger.record(entry)
                bound_logger.debug("Ledger entry recorded.")

                # Increment Prometheus counter for loop runs
                LOOP_RUN_COUNT.labels(
                    loop_name=self.name,
                    status=loop_status,
                    judgment=overall_evaluation.judgment.name if overall_evaluation else "ERROR"
                ).inc()

                # 4) Adjust threshold for next cycle (only if evaluation was successful)
                if overall_evaluation and loop_status == "success":
                    # Simple adjustment logic based on the overall judgment.
                    # This can be made more sophisticated (e.g., per-metric adjustment).
                    current_threshold = self.criteria.get(primary_metric_name, 0.0) if primary_metric_name else 0.0
                    if overall_evaluation.judgment == JudgmentLevel.PASS:
                        new_thresh = current_threshold * 1.02  # Slightly tighten
                    elif overall_evaluation.judgment == JudgmentLevel.FAIL:
                        new_thresh = current_threshold * 0.98  # Slightly relax
                    else:  # WARNING
                        new_thresh = current_threshold  # No change

                    if primary_metric_name:
                        self.criteria[primary_metric_name] = round(new_thresh, 4)
                        bound_logger.info("Criteria adjusted",
                                          new_threshold=self.criteria[primary_metric_name])
                    else:
                        bound_logger.warning("No primary metric to adjust criteria for.")
                else:
                    bound_logger.warning("Criteria not adjusted due to error or no evaluation.")

        # 5) Trigger next loop
        if self.next_loop:
            bound_logger.info("Triggering next loop", next_loop=self.next_loop.name)
            await self.next_loop.run_cycle(key, run_id)
        else:
            bound_logger.info("No next loop to trigger.")

10. dags/recursive_oversight_dag.py (UPDATED - Dynamic Task Mapping)


# dags/recursive_oversight_dag.py
from datetime import datetime, timedelta
import logging
import asyncio
import uuid  # For generating unique run_ids for correlation

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.decorators import task, task_group  # NEW: For dynamic task mapping
from airflow.utils.dates import days_ago

# Set up basic logging for the DAG, to see output in Airflow logs.
# In a production Airflow environment, this is typically managed by Airflow's logging config.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Import components from our oversight package
from oversight.ledger import OversightLedger
from oversight.connectors.audit import AuditConnector
from oversight.connectors.public_trust import PublicTrustConnector
from oversight.connectors.derived import AlignmentConnector
from oversight.loops.base import OversightLoop
from oversight.evaluator import Evaluator, JudgmentLevel  # NEW: Evaluator and JudgmentLevel

# --- Global/Shared Instances (instantiated once per DAG file parse) ---
# In a real Airflow deployment, this ledger writes to a persistent store
# (e.g., S3, Snowflake) rather than being in-memory.
shared_ledger = OversightLedger()  # This will now write to S3

# Instantiate connectors
audit_connector = AuditConnector()
public_trust_connector = PublicTrustConnector()
alignment_connector = AlignmentConnector()

# Define metric weights for the overall evaluation (example).
# These weights should be consistent across loops if they contribute to a single
# overall score. For this prototype, we define them once and pass them to each
# loop's evaluator.
overall_metric_weights = {
    "audit_timeliness": 0.4,
    "public_trust_index": 0.3,
    "mission_alignment_score": 0.3
}

# Define judgment ranges for the overall score (example)
overall_judgment_ranges = {
    JudgmentLevel.PASS: (0.75, 1.01),    # Score >= 0.75
    JudgmentLevel.WARNING: (0.5, 0.75),  # 0.5 <= Score < 0.75
    JudgmentLevel.FAIL: (0.0, 0.5)       # 0.0 <= Score < 0.5
}

# Instantiate the Evaluator (one instance shared across loops)
shared_evaluator = Evaluator(overall_metric_weights, overall_judgment_ranges)

# Build the loop chain in reverse order of triggering.
# MacroLoop is the 'top' loop, which is eventually triggered via Meso and Micro.
# Each loop now takes a list of metrics it is responsible for fetching/evaluating.
macro_loop = OversightLoop(
    name="MacroLoop",
    connector=alignment_connector,  # AlignmentConnector fetches mission_alignment_score
    metrics_to_fetch=["mission_alignment_score"],
    evaluator=shared_evaluator,
    criteria={"mission_alignment_score": 0.7},  # Initial threshold for this metric
    next_loop=None,  # Macro is the last in the chain for recursive triggering
    ledger=shared_ledger
)

meso_loop = OversightLoop(
    name="MesoLoop",
    connector=public_trust_connector,  # PublicTrustConnector fetches public_trust_index
    metrics_to_fetch=["public_trust_index"],
    evaluator=shared_evaluator,
    criteria={"public_trust_index": 70.0},  # Initial threshold for this metric
    next_loop=macro_loop,  # Meso triggers Macro
    ledger=shared_ledger
)

micro_loop = OversightLoop(
    name="MicroLoop",
    connector=audit_connector,  # AuditConnector fetches audit_timeliness
    metrics_to_fetch=["audit_timeliness"],
    evaluator=shared_evaluator,
    criteria={"audit_timeliness": 0.9},  # Initial threshold for this metric
    next_loop=meso_loop,  # Micro triggers Meso
    ledger=shared_ledger
)

# --- Airflow Task Functions ---

@task
def list_organizations() -> list[int]:
    """
    Simulates fetching a dynamic list of organization IDs from a source.
    In production, this would query a database (Postgres/Snowflake) or an API.
    """
    logger.info("Fetching list of organizations for oversight.")
    # Example: In a real scenario, this would be a DB query:
    # from sqlalchemy import create_engine, text
    # engine = create_engine(DATABASE_URL)
    # with engine.connect() as connection:
    #     result = connection.execute(text("SELECT id FROM organizations;")).fetchall()
    #     org_ids = [row[0] for row in result]
    # return org_ids

    # For the prototype, return a hardcoded list (placeholder IDs)
    return [1, 2, 3]

@task
def run_full_oversight_cycle_for_org(org_id: int, dag_run_id: str):
    """
    Entry point for Airflow. Runs the full recursive oversight cycle for a single organization.
    Uses asyncio.run() to execute the async loop chain.
    """
    # Bind correlation IDs to the logger for this specific task instance.
    # This ensures logs from this task are correlated with the DAG run and organization.
    import structlog
    structlog.configure(
        processors=[
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.processors.add_log_level,
            structlog.processors.JSONRenderer(),
        ],  # Minimal JSON processor chain; adjust as needed
        logger_factory=structlog.stdlib.LoggerFactory(),
        wrapper_class=structlog.stdlib.BoundLogger,
        cache_logger_on_first_use=True,
    )
    task_logger = structlog.get_logger(__name__).bind(org_id=org_id, dag_run_id=dag_run_id)
    task_logger.info("Starting full oversight cycle for organization")

    try:
        # The micro_loop will trigger meso_loop, which will trigger macro_loop
        asyncio.run(micro_loop.run_cycle(org_id, dag_run_id))
        task_logger.info("Completed full oversight cycle for organization")

    except Exception as e:
        task_logger.error("Failed to complete oversight cycle for organization", error=str(e), exc_info=True)
        raise  # Re-raise to let Airflow mark the task as failed

# --- Airflow DAG Definition ---

with DAG(
    dag_id="recursive_oversight_cycle_v2",  # Changed DAG ID to reflect the update
    description="Orchestrates daily recursive governance oversight cycles for organizations with dynamic mapping.",
    schedule_interval="@daily",
    start_date=days_ago(1),
    catchup=False,
    tags=["governance", "oversight", "recursive", "dynamic"],
    default_args={
        "owner": "airflow",
        "depends_on_past": False,
        "email_on_failure": False,
        "email_on_retry": False,
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    }
) as dag:

    # Task to get the list of organizations
    org_list_task = list_organizations()

    # Dynamically map the oversight cycle task for each organization.
    # The `dag_run.run_id` is passed as a template to ensure correlation across mapped tasks.
    run_oversight_tasks = run_full_oversight_cycle_for_org.partial(
        dag_run_id="{{ dag_run.run_id }}"
    ).expand(
        org_id=org_list_task  # Expands the task for each org_id returned by list_organizations
    )

    # Define task dependencies
    org_list_task >> run_oversight_tasks
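
For a quick smoke test outside the scheduler, one option (assuming Airflow 2.5+, where DAG.test() is available) is to append a small entry point at the bottom of the DAG file and run it directly with python:

# Quick local smoke test (assumes Airflow 2.5+ and configured AWS/DB credentials).
if __name__ == "__main__":
    dag.test()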

Summary of Changes:
1. oversight/evaluator.py (NEW):
○ Introduces JudgmentLevel Enum (FAIL, WARNING, PASS) for gradient judgments.
○ Defines MetricEvaluation and OverallEvaluation dataclasses for structured results.
○ Implements the Evaluator class, which takes metric_weights and judgment_ranges.
○ The evaluate method calculates a weighted total score from multiple metrics and
maps it to a JudgmentLevel.
2. oversight/ledger.py (UPDATED):
○ The OversightLedger now uses boto3 to write each ledger entry as a separate
JSON file to an S3 bucket.
○ Entries are partitioned by year/month/day/loop_name/org_key/entry_id.json for
efficient querying via external tools.
○ Timestamps are strictly ISO 8601 UTC with millisecond precision.
○ A unique entry_id (UUID) is added to each record.
3. oversight/connectors/*.py (UPDATED):
○ audit.py, public_trust.py, and derived.py now include Prometheus Summary (for
timing) and Counter (for success/failure) metrics.
○ These metrics are labeled with relevant context (e.g., org_id, org_name, status).
4. oversight/loops/base.py (UPDATED):
○ The OversightLoop now takes an Evaluator instance, metrics_to_fetch (list of metric
names), and criteria (dictionary of thresholds for multiple metrics).
○ It uses the injected Evaluator to determine the OverallEvaluation and
JudgmentLevel.
○ Prometheus Summary and Counter metrics are added for loop execution time,
count, and judgment outcomes.
○ Structured Logging: structlog is configured to emit JSON logs, and org_key and
run_id are bound to the logger context for end-to-end traceability.
○ The adjust logic is simplified to adjust the primary metric's threshold based on the
overall judgment.
5. dags/recursive_oversight_dag.py (UPDATED):
○ The DAG ID is updated to recursive_oversight_cycle_v2.
○ A new @task list_organizations is introduced to simulate fetching a dynamic list of
organizations.
○ The run_full_oversight_cycle_for_org is now a @task decorated function.
○ Dynamic Task Mapping: The run_full_oversight_cycle_for_org task
uses .partial(dag_run_id="{{ dag_run.run_id }}").expand(org_id=org_list_task) to
dynamically create a separate task instance for each organization returned by
list_organizations. This ensures scalability and granular visibility in Airflow.
○ The dag_run.run_id is passed as a correlation ID to the individual task runs.
This implementation significantly advances the Recursive Oversight platform towards a
production-ready state, embodying the principles of resilient, ethical, and observable systems.
Next Steps:
1. Deploy and Configure:
○ Ensure your Airflow environment has the recursive-oversight package installed (pip install . from the root).
○ Set up the necessary environment variables (from oversight/config.py) in your
Airflow deployment (e.g., DATABASE_URL, AWS credentials, S3 bucket names,
Lambda function names).
○ Ensure your AWS credentials are configured for boto3 (e.g., via IAM roles for
Airflow workers).
○ Deploy the updated DAG file (dags/recursive_oversight_dag.py) to your Airflow
DAGs folder.
2. Monitor and Validate:
○ Trigger the recursive_oversight_cycle_v2 DAG in Airflow.
○ Observe the Airflow UI for dynamically mapped tasks.
○ Check Airflow task logs for structured JSON output and correlation IDs.
○ Verify that JSON files are being written to your specified S3 ledger bucket, correctly
partitioned.
○ If you have a Prometheus setup, ensure the new metrics are being scraped and are
visible.
3. Implement Mock Services (for testing): For public_trust_connector and
alignment_connector, you'll need either actual deployed (even dummy) external
APIs/Lambda functions or robust mocks (e.g., using moto for AWS services) to allow the
connectors to run successfully during testing.
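
As one illustration of that mocking approach, a minimal pytest sketch for the S3 ledger using moto (assuming moto >= 5, which provides the mock_aws decorator; the bucket name and entry fields are test placeholders):

# test_ledger_s3.py -- minimal sketch using moto to mock S3 (placeholder values)
import json

import boto3
from moto import mock_aws

from oversight.ledger import OversightLedger

@mock_aws
def test_record_writes_partitioned_json_to_s3():
    bucket = "test-ledger-bucket"  # placeholder test bucket
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket=bucket)

    ledger = OversightLedger(s3_bucket=bucket, s3_prefix="ledger_entries")
    ledger.record({"loop": "MicroLoop", "key": 1, "judgment": "PASS"})

    listed = s3.list_objects_v2(Bucket=bucket)
    assert listed["KeyCount"] == 1
    key = listed["Contents"][0]["Key"]
    assert key.startswith("ledger_entries/year=")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    assert json.loads(body)["loop"] == "MicroLoop"
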
We've wired justice. Let's see it flow.
