Thanks to visit codestin.com
Credit goes to github.com

Skip to content

mabualzait/CIRA

Repository files navigation

CIRA — Compliance Intelligence & Risk Agent

An open-source AI agent for enterprise compliance monitoring, regulatory intelligence, and GRC workflow automation across the GCC and MENA region.

Python LangGraph FastAPI Azure Ollama License Status


Table of Contents


What is CIRA?

CIRA (Compliance Intelligence & Risk Agent) is a production-grade, domain-specialized AI agent built for enterprise Governance, Risk & Compliance (GRC) functions. It combines a large language model reasoning core with a retrieval-augmented regulatory knowledge base to deliver continuous compliance monitoring, automated gap analysis, and audit-ready reporting — at a speed and scale that traditional manual processes cannot match.

CIRA is designed for the GCC and MENA regulatory environment but is architecturally generic enough to be adapted to any jurisdiction. It runs locally via Ollama or connects to cloud LLM providers (Azure OpenAI, OpenAI, Anthropic) with a single environment variable change.

Why CIRA?

  • Domain-specialized — purpose-built prompts and tools for GRC workflows, not a generic chatbot wrapper.
  • Source-cited — every finding is backed by a specific regulatory clause or document reference.
  • Audit-ready — immutable, append-only audit logs for every agent decision with full provenance.
  • Provider-agnostic — swap between Ollama (local/private) and cloud LLMs (Azure, OpenAI, Anthropic) without code changes.
  • Extensible — add new compliance frameworks by dropping documents into a folder and running the ingestion script.

Key Capabilities

Domain What CIRA Does
Regulatory Compliance Monitors regulatory obligations, detects changes, flags gaps, generates compliance briefs
Third-Party Risk Scores vendor risk across configurable domains, tracks contract compliance, alerts on changes
ESG Reporting Maps disclosures to GRI, TCFD, and SASB frameworks; identifies gaps pre-publication
HSE & Business Continuity Monitors HSE policy adherence, tracks BCP test records, flags overdue reviews
Technical & Cyber Risk IT risk register management, control gap analysis, security policy compliance
Finance & Project Compliance Financial control monitoring, project budget compliance, procurement governance
Licensing & Software Governance Software asset tracking, license compliance, renewal risk flagging
Audit Trail Generation Immutable, source-cited audit logs for every agent decision and recommendation

Architecture Overview

System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        CIRA Agent Loop                          │
│                        (LangGraph ReAct)                        │
│                                                                 │
│  User / System Input                                            │
│         │                                                       │
│         ▼                                                       │
│  ┌─────────────────┐    ┌──────────────────────────────────┐   │
│  │  Ingestion &    │───▶│   Regulatory Knowledge Base       │   │
│  │  Context Engine │    │  (RAG · ChromaDB / pgvector)      │   │
│  │  (PDF/DOCX/TXT/ │    │  Chunked embeddings with source   │   │
│  │   CSV/JSON/MD)  │    │  metadata and page references     │   │
│  └─────────────────┘    └──────────────────────────────────┘   │
│         │                            │                          │
│         ▼                            ▼                          │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │           Compliance Reasoning Engine                     │  │
│  │    LLM Core (Ollama / Azure OpenAI / OpenAI / Anthropic) │  │
│  │    Chain-of-thought · Source citation · Confidence        │  │
│  │    Tool-calling · Structured output                       │  │
│  └──────────────────────────────────────────────────────────┘  │
│         │                                                       │
│         ▼                                                       │
│  ┌─────────────────┐    ┌─────────────────────────────────┐    │
│  │  Validation &   │    │   Audit Trail Module             │    │
│  │  Fact-Check     │    │   (JSONL · Immutable · Per-day)  │    │
│  └─────────────────┘    └─────────────────────────────────┘    │
│         │                                                       │
│         ▼                                                       │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │           Delivery & Integration Layer                    │  │
│  │   REST API · Reports (JSON/TXT/DOCX) · Webhooks          │  │
│  └──────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Core Components

Component Module Description
Ingestion Engine cira/knowledge/ingestor.py Parses policy documents, regulatory circulars, audit reports, contracts, and structured data (PDF, DOCX, TXT, CSV, JSON, Markdown). Uses RecursiveCharacterTextSplitter with configurable chunk size (default 1000) and overlap (default 200).
Regulatory Knowledge Base cira/knowledge/vector_store.py A vector store of compliance frameworks and regulatory documents. Supports ChromaDB (local development) and pgvector (production). Documents are embedded with source metadata for citation.
Reasoning Engine cira/graph.py A LangGraph-powered agentic loop using the ReAct pattern. The agent decomposes compliance obligations, invokes tools to retrieve evidence, runs gap analysis, assigns risk ratings, and generates source-cited findings.
Validation Module Embedded in prompts Cross-references all factual and regulatory claims against the knowledge base before output. Unverifiable claims are flagged with LOW confidence — never stated as fact.
Audit Trail Module cira/tools/audit_logger.py Every agent action is logged with full provenance: timestamp (UTC), action type, detail, confidence score, input summary, regulatory reference, and metadata. Stored as append-only JSONL files (one per day).
Delivery Layer cira/api/routes.py, cira/tools/report_gen.py Outputs findings as structured JSON (API), narrative text reports, DOCX documents, or real-time webhook alerts. REST API with OpenAPI/Swagger documentation.

Agent Reasoning Loop

CIRA uses a ReAct (Reasoning + Acting) loop implemented with LangGraph:

┌──────────┐
│          │
│  START   │
│          │
└────┬─────┘
     │
     ▼
┌──────────┐     Has tool calls?      ┌──────────┐
│          │─────── YES ──────────────▶│          │
│  Agent   │                           │  Tools   │
│  (LLM)   │◀──────────────────────────│  Execute │
│          │                           │          │
└────┬─────┘                           └──────────┘
     │
     │ No tool calls (final answer)
     ▼
┌──────────┐
│          │
│   END    │
│          │
└──────────┘
  1. The Agent node receives the user query + system prompt and decides whether to invoke a tool or produce a final answer.
  2. If tool calls are present, the Tools node executes them (gap analysis search, document parsing, risk scoring, etc.) and returns results.
  3. The loop continues — the agent reasons over tool outputs, possibly calling additional tools — until it produces a final response with no tool calls.
  4. Every tool invocation is logged to the audit trail with provenance metadata.

Tech Stack

Layer Technology Purpose
Agent Framework LangGraph 0.2+ Stateful agent graph with ReAct loop
LLM Orchestration LangChain 0.3+ Tool binding, message handling, provider abstractions
LLM (Local) Ollama llama3.1, qwen2.5, mistral — runs fully offline
LLM (Cloud) Azure OpenAI · OpenAI · Anthropic Claude Configurable via single env var
Embeddings Ollama (nomic-embed-text) · Azure OpenAI · OpenAI Document embedding for RAG
Vector Store ChromaDB 0.5+ Local development vector store
Vector Store (Prod) pgvector PostgreSQL-based production vector store
Document Parsing PyMuPDF · python-docx · LangChain loaders PDF, DOCX, TXT, CSV, JSON, Markdown
API Layer FastAPI 0.115+ REST API with OpenAPI docs
Validation Pydantic 2.0+ Request/response schemas and config
Configuration Pydantic Settings Type-safe env var management
Logging structlog JSON-structured logging
HTTP httpx Webhook delivery
Auth API Key header (X-API-Key) Development authentication
Containerization Docker + Docker Compose Production deployment

Getting Started

Prerequisites

Requirement Version Notes
Python 3.11+ Required
Ollama Latest Required for local LLM mode
Git Any Required
Docker Latest Optional — for containerized deployment

Installation

1. Clone the Repository

git clone https://github.com/mabualzait/cira-agent.git
cd cira-agent

2. Create a Virtual Environment

python -m venv .venv
source .venv/bin/activate  # macOS/Linux
# .venv\Scripts\activate   # Windows

3. Install Dependencies

pip install -r requirements.txt

Or using the Makefile:

make install

4. Pull a Local Model via Ollama

# Recommended for compliance reasoning tasks
ollama pull llama3.1

# Recommended for embeddings
ollama pull nomic-embed-text

# Alternative: stronger tool-calling model
ollama pull qwen2.5:14b

5. Configure Environment

cp .env.example .env

Edit .env with your preferred settings. At minimum, the defaults work with Ollama running locally. See Configuration Reference for all options.

Configuration Reference

The .env file controls all application behavior. The minimal configuration for local development:

LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1
OLLAMA_BASE_URL=http://localhost:11434
VECTOR_STORE=chroma

For Azure OpenAI:

LLM_PROVIDER=azure_openai
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=your-key-here
AZURE_OPENAI_DEPLOYMENT=gpt-4o
AZURE_OPENAI_API_VERSION=2024-02-01

For OpenAI:

LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o

For Anthropic:

LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022

Seeding the Knowledge Base

Drop your regulatory documents (PDF, DOCX, TXT, CSV, JSON, Markdown) into ./data/regulatory_docs/, then ingest:

python scripts/ingest.py

With custom options:

python scripts/ingest.py \
  --dir ./path/to/custom/docs \
  --collection my_collection \
  --chunk-size 1500 \
  --chunk-overlap 300

The ingestion script will:

  1. Recursively scan the directory for supported file types.
  2. Parse each file and extract text content.
  3. Split content into chunks using RecursiveCharacterTextSplitter.
  4. Embed chunks and store them in the configured vector store.
  5. Enrich each chunk with source metadata (filename, page number, file type).

Running the Application

# Start the FastAPI server (with auto-reload)
python main.py

# Or run the interactive CLI
python cli.py

# Or run a single query
python cli.py --query "Identify gaps in our third-party risk policy against ISO 31000"

Using the Makefile:

make run     # Start API server
make cli     # Interactive CLI
make ingest  # Run ingestion pipeline

Usage Guide

Interactive CLI

The CLI provides a REPL interface for compliance queries:

python cli.py
   _____ _____ _____            
  / ____|_   _|  __ \     /\    
 | |      | | | |__) |   /  \   
 | |      | | |  _  /   / /\ \  
 | |____ _| |_| | \ \  / ____ \ 
  \_____|_____|_|  \_\/_/    \_\
                                 
  Compliance Intelligence & Risk Agent
  Type 'help' for commands, 'exit' to quit.

cira> Identify gaps in our information security policy against ISO 27001 Annex A

CLI commands:

Command Description
help Show available commands
exit / quit Exit the CLI
clear Clear the screen
Any text Submit as a compliance query

Single query mode:

python cli.py --query "What are the key data privacy requirements under UAE PDPL?"

Python SDK

The CIRAAgent class provides a programmatic interface:

from cira import CIRAAgent

agent = CIRAAgent()

Gap Analysis

result = agent.run(
    task="gap_analysis",
    input={
        "document": "path/to/policy.pdf",
        "framework": "ISO_27001",
        "scope": "information_security"
    }
)

print(result.findings)       # Detailed gap analysis with citations
print(result.success)        # True if findings were generated
print(result.report)         # Narrative report text
print(result.raw_messages)   # Full LangGraph message history

Vendor Risk Scoring

result = agent.run(
    task="vendor_risk_score",
    input={
        "vendor_name": "Acme Cloud Services",
        "contract_path": "contracts/acme_2024.pdf",
        "risk_domains": ["data_privacy", "financial_stability", "operational", "cyber"]
    }
)

print(result.findings)  # Risk assessment with per-domain analysis

Simple Query

answer = agent.query("What are the key requirements of ISO 27001 Section A.8?")
print(answer)

CIRAResult Object

All agent.run() calls return a CIRAResult with the following fields:

Field Type Description
findings str The agent's analysis output with citations
risk_ratings dict[str, str] Risk ratings per finding (CRITICAL/HIGH/MEDIUM/LOW)
audit_trail list[dict] Audit events generated during this run
report str Narrative summary (same as findings by default)
raw_messages list Full LangGraph message history for debugging
success bool (property) True if findings are non-empty

REST API Reference

Start the API server:

python main.py

The server starts at http://localhost:8000 with auto-generated docs at:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

All endpoints except /health require the X-API-Key header.

GET /health — Health Check

curl http://localhost:8000/health

Response:

{
  "status": "healthy",
  "version": "0.1.0",
  "llm_provider": "ollama",
  "vector_store": "chroma",
  "timestamp": "2026-03-05T06:36:00Z"
}

POST /api/v1/analyze — Submit Analysis Task

curl -X POST http://localhost:8000/api/v1/analyze \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "task": "regulatory_gap_analysis",
    "document_path": "data/uploads/policy.pdf",
    "framework": "NESA_IA",
    "scope": "information_security",
    "output_format": "json"
  }'

Request body (AnalyzeRequest):

Field Type Required Default Description
task string Task type: regulatory_gap_analysis, vendor_risk_score, query
document_path string null Path to document to analyze
framework string ISO_27001 Target compliance framework
scope string general Compliance domain scope
query string null Natural-language query (for task=query)
vendor_name string null Vendor name (for risk scoring tasks)
risk_domains string[] null Risk domains to evaluate
output_format string json Output: json, text, or docx

Response body (AnalyzeResponse):

{
  "task": "regulatory_gap_analysis",
  "status": "completed",
  "findings": "## Gap Analysis Results\n...",
  "report_path": null,
  "audit_trail_count": 0,
  "timestamp": "2026-03-05T06:36:00Z"
}

POST /api/v1/ingest — Ingest Documents

curl -X POST http://localhost:8000/api/v1/ingest \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "directory": "./data/regulatory_docs",
    "collection_name": "regulatory_docs",
    "chunk_size": 1000,
    "chunk_overlap": 200
  }'

Response:

{
  "status": "completed",
  "chunks_ingested": 342,
  "collection_name": "regulatory_docs",
  "timestamp": "2026-03-05T06:36:00Z"
}

GET /api/v1/audit-trail — Query Audit Events

curl "http://localhost:8000/api/v1/audit-trail?date=2026-03-05&limit=50" \
  -H "X-API-Key: your-api-key"

Query parameters:

Param Type Default Description
date string Today (UTC) Date in YYYY-MM-DD format
action string null Filter by action type
limit int 100 Maximum events to return

Response:

{
  "date": "2026-03-05",
  "events": [
    {
      "timestamp": "2026-03-05T06:30:00+00:00",
      "action": "gap_analysis",
      "detail": "Retrieved 8 passages for ISO_27001/information_security",
      "confidence": "HIGH",
      "input_summary": null,
      "regulatory_ref": null,
      "metadata": {}
    }
  ],
  "total": 1
}

Project Structure

Directory Layout

cira-agent/
├── main.py                        # FastAPI application entry point
├── cli.py                         # Interactive CLI with REPL and --query mode
├── cira/                          # Core Python package
│   ├── __init__.py                # Package root — exports CIRAAgent, __version__
│   ├── config.py                  # Pydantic Settings — all configuration
│   ├── agent.py                   # CIRAAgent class — public SDK interface
│   ├── graph.py                   # LangGraph ReAct agent graph
│   ├── logging.py                 # Structured logging (structlog) setup
│   ├── llm/                       # LLM provider abstraction
│   │   ├── __init__.py
│   │   ├── provider.py            # Factory: get_chat_model(), get_embeddings()
│   │   └── prompts.py             # All prompt templates (system, gap, risk, etc.)
│   ├── knowledge/                 # Knowledge base & vector store
│   │   ├── __init__.py
│   │   ├── vector_store.py        # ChromaDB / pgvector abstraction
│   │   ├── ingestor.py            # Document ingestion pipeline
│   │   └── frameworks/            # Built-in framework definition files
│   │       └── .gitkeep
│   ├── tools/                     # LangGraph tools (agent capabilities)
│   │   ├── __init__.py            # Exports ALL_TOOLS list
│   │   ├── gap_analysis.py        # @tool — compliance gap analysis
│   │   ├── risk_scorer.py         # @tool — vendor risk scoring
│   │   ├── doc_parser.py          # @tool — document parsing
│   │   ├── audit_logger.py        # Audit trail (not a @tool — called internally)
│   │   ├── alert_engine.py        # @tool — compliance alerting + webhooks
│   │   └── report_gen.py          # @tool — report generation (JSON/TXT/DOCX)
│   └── api/                       # REST API layer
│       ├── __init__.py
│       ├── routes.py              # FastAPI route definitions + auth
│       └── schemas.py             # Pydantic request/response schemas
├── data/                          # Runtime data directories
│   ├── regulatory_docs/           # Drop your documents here for ingestion
│   │   └── .gitkeep
│   ├── chroma/                    # ChromaDB persistent store (git-ignored)
│   └── uploads/                   # Temporary upload directory
│       └── .gitkeep
├── scripts/                       # Utility scripts
│   ├── ingest.py                  # Knowledge base ingestion CLI
│   └── benchmark.py               # Performance benchmarking
├── tests/                         # Test suite
│   ├── __init__.py
│   ├── test_agent.py              # Agent + CIRAResult unit tests
│   ├── test_tools.py              # Tool unit tests (mocked vector store)
│   └── test_api.py                # API integration tests (FastAPI TestClient)
├── .env.example                   # Environment variable template (documented)
├── .gitignore                     # Git ignore rules
├── requirements.txt               # Python dependencies (pinned ranges)
├── pyproject.toml                 # Project metadata, pytest/ruff/mypy config
├── Dockerfile                     # Production container image
├── docker-compose.yml             # Multi-service orchestration (app + Ollama)
├── Makefile                       # Developer commands (install/run/test/lint)
├── LICENSE                        # MIT License
└── README.md                      # This file

Module Reference

cira/config.py — Configuration

The single source of truth for all application settings. Uses Pydantic Settings to load and validate environment variables with type safety.

from cira.config import settings

# Access any setting
print(settings.llm_provider)       # LLMProvider.OLLAMA
print(settings.ollama_model)       # "llama3.1"
print(settings.vector_store)       # VectorStoreBackend.CHROMA
print(settings.api_port)           # 8000

# Ensure runtime directories exist
settings.ensure_directories()

Key design decisions:

  • Enum types (LLMProvider, EmbeddingProvider, VectorStoreBackend) prevent typos and enable IDE autocomplete.
  • SettingsConfigDict with case_sensitive=False and extra="ignore" for flexible env var handling.
  • A module-level settings singleton is instantiated on import — shared across the entire app.

cira/llm/provider.py — LLM Factory

Two factory functions with lazy imports — only the selected provider's SDK is loaded at runtime:

from cira.llm.provider import get_chat_model, get_embeddings

llm = get_chat_model()        # Returns BaseChatModel
embeddings = get_embeddings()  # Returns Embeddings

Provider selection is automatic based on LLM_PROVIDER and EMBEDDING_PROVIDER env vars.

cira/graph.py — Agent Graph

Builds and compiles the LangGraph agent with:

  • A system prompt (cira/llm/prompts.py) that defines CIRA's compliance reasoning rules.
  • All tools from cira/tools/ bound to the LLM via llm.bind_tools().
  • Conditional routing: tool calls → tool execution → back to agent, or final answer → end.

cira/agent.py — Public Interface

The CIRAAgent class wraps the graph and provides:

  • run(task, input)CIRAResult — structured task execution
  • query(question)str — simple question answering
  • Automatic task-to-message translation for different task types
  • Audit trail logging for task start/completion

Configuration Deep Dive

LLM Providers

CIRA supports four LLM providers. Set LLM_PROVIDER in your .env:

Provider Value Requires Best For
Ollama ollama Ollama running locally Privacy, offline use, development
Azure OpenAI azure_openai Azure subscription + deployment Enterprise with Azure
OpenAI openai OpenAI API key Quick start, strong reasoning
Anthropic anthropic Anthropic API key Long-context analysis

Embedding Providers

Set EMBEDDING_PROVIDER independently of the LLM provider:

Provider Value Default Model
Ollama ollama nomic-embed-text
Azure OpenAI azure_openai (set via AZURE_OPENAI_EMBEDDING_DEPLOYMENT)
OpenAI openai text-embedding-3-small

Vector Store Backends

Backend Value Use Case
ChromaDB chroma Local development, small-medium datasets
pgvector pgvector Production, large datasets, concurrent access

Environment Variables Reference

Variable Type Default Description
LLM_PROVIDER enum ollama LLM backend: ollama, azure_openai, openai, anthropic
OLLAMA_MODEL string llama3.1 Ollama model name
OLLAMA_BASE_URL string http://localhost:11434 Ollama API endpoint
AZURE_OPENAI_ENDPOINT string Azure OpenAI resource URL
AZURE_OPENAI_API_KEY string Azure OpenAI API key
AZURE_OPENAI_DEPLOYMENT string Azure OpenAI deployment name
AZURE_OPENAI_API_VERSION string 2024-02-01 Azure API version
OPENAI_API_KEY string OpenAI API key
OPENAI_MODEL string gpt-4o OpenAI model name
ANTHROPIC_API_KEY string Anthropic API key
ANTHROPIC_MODEL string claude-3-5-sonnet-20241022 Anthropic model name
EMBEDDING_PROVIDER enum ollama Embedding backend: ollama, azure_openai, openai
OLLAMA_EMBEDDING_MODEL string nomic-embed-text Ollama embedding model
OPENAI_EMBEDDING_MODEL string text-embedding-3-small OpenAI embedding model
VECTOR_STORE enum chroma Vector store: chroma, pgvector
CHROMA_PERSIST_DIR path ./data/chroma ChromaDB storage directory
PGVECTOR_CONNECTION_STRING string PostgreSQL connection string for pgvector
KNOWLEDGE_BASE_DIR path ./data/regulatory_docs Directory for document ingestion
API_HOST string 0.0.0.0 API server bind address
API_PORT int 8000 API server port
API_KEY string changeme-in-production API authentication key
LOG_LEVEL enum INFO Logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL
LOG_DIR path ./logs Log file directory
AUDIT_TRAIL_DIR path ./data/audit_trail Audit trail storage directory

Tools Reference

CIRA's agent has access to the following tools, each implemented as a LangChain @tool:

gap_analysis_tool

Module: cira/tools/gap_analysis.py

Searches the regulatory knowledge base for compliance framework requirements relevant to a query.

Parameter Type Default Description
query str (required) The compliance question or policy description
framework str ISO_27001 Target compliance framework
scope str general Compliance domain scope

Returns formatted regulatory excerpts with source citations (filename, page number).

vendor_risk_tool

Module: cira/tools/risk_scorer.py

Retrieves compliance information for multi-domain vendor risk assessment.

Parameter Type Default Description
vendor_name str (required) Name of the vendor
risk_domains str data_privacy,operational,cyber Comma-separated risk domains
context str "" Additional vendor context

Default risk domain weights:

Domain Weight
data_privacy 25%
cyber 25%
operational 20%
financial_stability 15%
contractual 15%

parse_document_tool

Module: cira/tools/doc_parser.py

Parses a document file and returns extracted text. Supported formats: PDF, DOCX, TXT, CSV, JSON, Markdown. Content is truncated at 50,000 characters for LLM context window safety.

send_alert_tool

Module: cira/tools/alert_engine.py

Sends a compliance alert with optional webhook delivery.

Parameter Type Default Description
title str (required) Alert title
severity str (required) CRITICAL / HIGH / MEDIUM / LOW
description str (required) Alert details
framework str "" Related framework
webhook_url str "" URL for HTTP POST delivery

generate_report_tool

Module: cira/tools/report_gen.py

Generates exportable compliance reports in JSON, plain text, or DOCX format. Reports are saved to ./data/reports/ with timestamped filenames.


Audit Trail System

CIRA maintains an immutable, append-only audit trail for every agent action. This is critical for GRC compliance — every recommendation, finding, and decision has full provenance.

Storage Format

Audit events are stored as JSONL (one JSON object per line) in daily files:

data/audit_trail/
├── audit_2026-03-04.jsonl
├── audit_2026-03-05.jsonl
└── ...

Event Schema

{
  "timestamp": "2026-03-05T06:30:00+00:00",
  "action": "gap_analysis",
  "detail": "Retrieved 8 passages for ISO_27001/information_security",
  "confidence": "HIGH",
  "input_summary": "Perform a compliance gap analysis...",
  "regulatory_ref": "ISO 27001:2022 Annex A.8",
  "metadata": {}
}
Field Description
timestamp UTC ISO 8601 timestamp
action Action type (e.g., gap_analysis, vendor_risk_assessment, alert_sent, report_generated, task_started:*, task_completed:*)
detail Human-readable description of what happened
confidence Agent confidence level: HIGH, MEDIUM, LOW
input_summary Summary of the input that triggered this action
regulatory_ref Specific regulatory clause or framework reference
metadata Additional structured data

Export

Audit trails can be exported as formatted JSON:

from cira.tools.audit_logger import export_audit_trail

export_audit_trail(date="2026-03-05", output_path="./exports/audit.json")

Or retrieved programmatically:

from cira.tools.audit_logger import get_audit_trail

events = get_audit_trail(date="2026-03-05", action_filter="gap_analysis", limit=50)

Compliance Frameworks

CIRA supports any compliance framework through its RAG knowledge base. The following are reference frameworks the system is optimized for:

Category Frameworks
Information Security ISO/IEC 27001:2022, NESA UAE Information Assurance Standards
Risk Management ISO 31000:2018, COBIT 2019
Business Continuity ISO 22301:2019
ESG & Sustainability GRI Standards 2021, TCFD, SASB
Data Privacy UAE PDPL, Saudi PDPL, GDPR (reference)
Quality Management ISO 9001:2015
HSE ISO 45001:2018

Adding New Frameworks

  1. Place the framework documents (PDF, DOCX, TXT) in ./data/regulatory_docs/
  2. Run the ingestion pipeline:
    python scripts/ingest.py
  3. The documents are automatically chunked, embedded, and indexed.
  4. The agent will now include them in its knowledge retrieval.

You can organize documents into subdirectories — the ingestion script scans recursively:

data/regulatory_docs/
├── iso_27001/
│   ├── ISO_27001_2022_full.pdf
│   └── annex_a_controls.pdf
├── uae_pdpl/
│   └── UAE_PDPL_2021.pdf
└── internal/
    ├── company_security_policy.docx
    └── vendor_risk_procedures.pdf

Deployment

Docker

Build and run a standalone container:

docker build -t cira-agent .
docker run -p 8000:8000 --env-file .env cira-agent

The Dockerfile uses python:3.11-slim and includes system dependencies for document parsing (poppler, tesseract).

Docker Compose

The included docker-compose.yml runs CIRA with Ollama:

# Start all services
docker compose up -d

# View logs
docker compose logs -f cira

# Stop
docker compose down

Services:

Service Description Port
cira CIRA API server 8000
ollama Local LLM server 11434

For production with pgvector, uncomment the postgres service in docker-compose.yml and set:

VECTOR_STORE=pgvector
PGVECTOR_CONNECTION_STRING=postgresql://cira:changeme@postgres:5432/cira

Production Considerations

  • API Key: Change API_KEY from the default to a secure random string.
  • CORS: Restrict allow_origins in main.py from ["*"] to your specific domains.
  • HTTPS: Deploy behind a reverse proxy (nginx, Caddy) with TLS termination.
  • Vector Store: Switch from ChromaDB to pgvector for concurrent access and durability.
  • Logging: Set LOG_LEVEL=WARNING in production to reduce noise.
  • Resources: LLM inference (especially local Ollama) is memory-intensive. Allocate at least 8GB RAM for 7B parameter models, 16GB+ for 14B models.

Testing

Run the full test suite:

pytest tests/ -v

Or using the Makefile:

make test       # Run tests
make test-cov   # Run tests with coverage report

Test Architecture

File What It Tests Strategy
tests/test_agent.py CIRAAgent, CIRAResult Unit tests with mocked graph
tests/test_tools.py Audit logger, gap analysis tool Unit tests with mocked vector store
tests/test_api.py FastAPI endpoints Integration tests with TestClient

Tests use unittest.mock.patch to isolate dependencies (LLM, vector store) so they run without an Ollama instance or real API keys.


Development

Developer Commands (Makefile)

make install     # Install dependencies
make dev         # Install deps + dev tools (ruff, mypy)
make run         # Start API server
make cli         # Interactive CLI
make ingest      # Run document ingestion
make test        # Run test suite
make test-cov    # Tests with HTML coverage report
make lint        # Lint with ruff
make format      # Auto-format with ruff
make typecheck   # Type-check with mypy
make clean       # Remove __pycache__, .pytest_cache, etc.
make docker-build # Build Docker image
make docker-up    # Start Docker Compose stack
make docker-down  # Stop Docker Compose stack

Code Quality

The project is configured with:

  • Ruff — linting and formatting (configured in pyproject.toml)
  • mypy — static type checking with disallow_untyped_defs
  • pytest — testing with async support

Roadmap

  • Core LangGraph agent loop (ReAct pattern)
  • Ollama + Azure OpenAI + OpenAI + Anthropic provider support
  • ChromaDB vector store with document ingestion pipeline
  • Compliance gap analysis tool
  • Third-party risk scoring tool
  • Immutable audit trail module (JSONL)
  • FastAPI REST API with OpenAPI docs
  • Report generation (JSON, TXT, DOCX)
  • Real-time alerting with webhook delivery
  • Docker + Docker Compose deployment
  • Structured logging with structlog
  • Pydantic Settings-based configuration
  • pgvector production backend
  • ServiceNow GRC integration
  • Power BI connector
  • Web UI dashboard (React)
  • Automated regulatory change monitoring (scraper + alerting)
  • Multi-tenant support with RBAC
  • Kubernetes deployment manifests
  • Arabic language regulatory document support
  • File upload API endpoint
  • Streaming responses (SSE)
  • Rate limiting and usage tracking

Contributing

Contributions are welcome. Please open an issue to discuss what you'd like to change before submitting a pull request.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/your-feature)
  3. Commit your changes (git commit -m 'Add your feature')
  4. Push to the branch (git push origin feature/your-feature)
  5. Open a Pull Request

Guidelines

  • Follow the existing code style (enforced by ruff).
  • Add type hints to all function signatures.
  • Include tests for new functionality.
  • Update the README if you add new tools, endpoints, or configuration options.
  • Keep prompts in cira/llm/prompts.py — do not hardcode prompts in tool or agent code.

License

MIT License — see LICENSE for details.


Disclaimer

CIRA is a strategic advisory and monitoring tool. All outputs are informational in nature and do not constitute legal, regulatory, or professional compliance advice. Always engage qualified legal or compliance professionals for formal regulatory obligations. The confidence scores and risk ratings provided are AI-generated assessments and should be validated by domain experts before being used in formal compliance processes.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors