CIRA — Compliance Intelligence & Risk Agent

An open-source AI agent for enterprise compliance monitoring, regulatory intelligence, and GRC workflow automation across the GCC and MENA region.

What is CIRA?

CIRA (Compliance Intelligence & Risk Agent) is a production-grade, domain-specialized AI agent built for enterprise Governance, Risk & Compliance (GRC) functions. It combines a large language model reasoning core with a retrieval-augmented regulatory knowledge base to deliver continuous compliance monitoring, automated gap analysis, and audit-ready reporting — at a speed and scale that traditional manual processes cannot match.

CIRA is designed for the GCC and MENA regulatory environment but is architecturally generic enough to be adapted to any jurisdiction. It runs locally via Ollama or connects to cloud LLM providers (Azure OpenAI, OpenAI, Anthropic) with a single environment variable change.

Why CIRA?

Domain-specialized — purpose-built prompts and tools for GRC workflows, not a generic chatbot wrapper.
Source-cited — every finding is backed by a specific regulatory clause or document reference.
Audit-ready — immutable, append-only audit logs for every agent decision with full provenance.
Provider-agnostic — swap between Ollama (local/private) and cloud LLMs (Azure, OpenAI, Anthropic) without code changes.
Extensible — add new compliance frameworks by dropping documents into a folder and running the ingestion script.

Key Capabilities

Domain	What CIRA Does
Regulatory Compliance	Monitors regulatory obligations, detects changes, flags gaps, generates compliance briefs
Third-Party Risk	Scores vendor risk across configurable domains, tracks contract compliance, alerts on changes
ESG Reporting	Maps disclosures to GRI, TCFD, and SASB frameworks; identifies gaps pre-publication
HSE & Business Continuity	Monitors HSE policy adherence, tracks BCP test records, flags overdue reviews
Technical & Cyber Risk	IT risk register management, control gap analysis, security policy compliance
Finance & Project Compliance	Financial control monitoring, project budget compliance, procurement governance
Licensing & Software Governance	Software asset tracking, license compliance, renewal risk flagging
Audit Trail Generation	Immutable, source-cited audit logs for every agent decision and recommendation

Architecture Overview

System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        CIRA Agent Loop                          │
│                        (LangGraph ReAct)                        │
│                                                                 │
│  User / System Input                                            │
│         │                                                       │
│         ▼                                                       │
│  ┌─────────────────┐    ┌──────────────────────────────────┐   │
│  │  Ingestion &    │───▶│   Regulatory Knowledge Base       │   │
│  │  Context Engine │    │  (RAG · ChromaDB / pgvector)      │   │
│  │  (PDF/DOCX/TXT/ │    │  Chunked embeddings with source   │   │
│  │   CSV/JSON/MD)  │    │  metadata and page references     │   │
│  └─────────────────┘    └──────────────────────────────────┘   │
│         │                            │                          │
│         ▼                            ▼                          │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │           Compliance Reasoning Engine                     │  │
│  │    LLM Core (Ollama / Azure OpenAI / OpenAI / Anthropic) │  │
│  │    Chain-of-thought · Source citation · Confidence        │  │
│  │    Tool-calling · Structured output                       │  │
│  └──────────────────────────────────────────────────────────┘  │
│         │                                                       │
│         ▼                                                       │
│  ┌─────────────────┐    ┌─────────────────────────────────┐    │
│  │  Validation &   │    │   Audit Trail Module             │    │
│  │  Fact-Check     │    │   (JSONL · Immutable · Per-day)  │    │
│  └─────────────────┘    └─────────────────────────────────┘    │
│         │                                                       │
│         ▼                                                       │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │           Delivery & Integration Layer                    │  │
│  │   REST API · Reports (JSON/TXT/DOCX) · Webhooks          │  │
│  └──────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Core Components

Component	Module	Description
Ingestion Engine	`cira/knowledge/ingestor.py`	Parses policy documents, regulatory circulars, audit reports, contracts, and structured data (PDF, DOCX, TXT, CSV, JSON, Markdown). Uses `RecursiveCharacterTextSplitter` with configurable chunk size (default 1000) and overlap (default 200).
Regulatory Knowledge Base	`cira/knowledge/vector_store.py`	A vector store of compliance frameworks and regulatory documents. Supports ChromaDB (local development) and pgvector (production). Documents are embedded with source metadata for citation.
Reasoning Engine	`cira/graph.py`	A LangGraph-powered agentic loop using the ReAct pattern. The agent decomposes compliance obligations, invokes tools to retrieve evidence, runs gap analysis, assigns risk ratings, and generates source-cited findings.
Validation Module	Embedded in prompts	Cross-references all factual and regulatory claims against the knowledge base before output. Unverifiable claims are flagged with `LOW` confidence — never stated as fact.
Audit Trail Module	`cira/tools/audit_logger.py`	Every agent action is logged with full provenance: timestamp (UTC), action type, detail, confidence score, input summary, regulatory reference, and metadata. Stored as append-only JSONL files (one per day).
Delivery Layer	`cira/api/routes.py`, `cira/tools/report_gen.py`	Outputs findings as structured JSON (API), narrative text reports, DOCX documents, or real-time webhook alerts. REST API with OpenAPI/Swagger documentation.

Agent Reasoning Loop

CIRA uses a ReAct (Reasoning + Acting) loop implemented with LangGraph:

┌──────────┐
│          │
│  START   │
│          │
└────┬─────┘
     │
     ▼
┌──────────┐     Has tool calls?      ┌──────────┐
│          │─────── YES ──────────────▶│          │
│  Agent   │                           │  Tools   │
│  (LLM)   │◀──────────────────────────│  Execute │
│          │                           │          │
└────┬─────┘                           └──────────┘
     │
     │ No tool calls (final answer)
     ▼
┌──────────┐
│          │
│   END    │
│          │
└──────────┘

The Agent node receives the user query + system prompt and decides whether to invoke a tool or produce a final answer.
If tool calls are present, the Tools node executes them (gap analysis search, document parsing, risk scoring, etc.) and returns results.
The loop continues — the agent reasons over tool outputs, possibly calling additional tools — until it produces a final response with no tool calls.
Every tool invocation is logged to the audit trail with provenance metadata.

Tech Stack

Layer	Technology	Purpose
Agent Framework	LangGraph 0.2+	Stateful agent graph with ReAct loop
LLM Orchestration	LangChain 0.3+	Tool binding, message handling, provider abstractions
LLM (Local)	Ollama	`llama3.1`, `qwen2.5`, `mistral` — runs fully offline
LLM (Cloud)	Azure OpenAI · OpenAI · Anthropic Claude	Configurable via single env var
Embeddings	Ollama (`nomic-embed-text`) · Azure OpenAI · OpenAI	Document embedding for RAG
Vector Store	ChromaDB 0.5+	Local development vector store
Vector Store (Prod)	pgvector	PostgreSQL-based production vector store
Document Parsing	PyMuPDF · python-docx · LangChain loaders	PDF, DOCX, TXT, CSV, JSON, Markdown
API Layer	FastAPI 0.115+	REST API with OpenAPI docs
Validation	Pydantic 2.0+	Request/response schemas and config
Configuration	Pydantic Settings	Type-safe env var management
Logging	structlog	JSON-structured logging
HTTP	httpx	Webhook delivery
Auth	API Key header (`X-API-Key`)	Development authentication
Containerization	Docker + Docker Compose	Production deployment

Getting Started

Prerequisites

Requirement	Version	Notes
Python	3.11+	Required
Ollama	Latest	Required for local LLM mode
Git	Any	Required
Docker	Latest	Optional — for containerized deployment

Installation

1. Clone the Repository

git clone https://github.com/mabualzait/cira-agent.git
cd cira-agent

2. Create a Virtual Environment

python -m venv .venv
source .venv/bin/activate  # macOS/Linux
# .venv\Scripts\activate   # Windows

3. Install Dependencies

pip install -r requirements.txt

Or using the Makefile:

make install

4. Pull a Local Model via Ollama

# Recommended for compliance reasoning tasks
ollama pull llama3.1

# Recommended for embeddings
ollama pull nomic-embed-text

# Alternative: stronger tool-calling model
ollama pull qwen2.5:14b

5. Configure Environment

cp .env.example .env

Edit .env with your preferred settings. At minimum, the defaults work with Ollama running locally. See Configuration Reference for all options.

Configuration Reference

The .env file controls all application behavior. The minimal configuration for local development:

LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1
OLLAMA_BASE_URL=http://localhost:11434
VECTOR_STORE=chroma

For Azure OpenAI:

LLM_PROVIDER=azure_openai
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=your-key-here
AZURE_OPENAI_DEPLOYMENT=gpt-4o
AZURE_OPENAI_API_VERSION=2024-02-01

For OpenAI:

LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o

For Anthropic:

LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022

Seeding the Knowledge Base

Drop your regulatory documents (PDF, DOCX, TXT, CSV, JSON, Markdown) into ./data/regulatory_docs/, then ingest:

python scripts/ingest.py

With custom options:

python scripts/ingest.py \
  --dir ./path/to/custom/docs \
  --collection my_collection \
  --chunk-size 1500 \
  --chunk-overlap 300

The ingestion script will:

Recursively scan the directory for supported file types.
Parse each file and extract text content.
Split content into chunks using RecursiveCharacterTextSplitter.
Embed chunks and store them in the configured vector store.
Enrich each chunk with source metadata (filename, page number, file type).

Running the Application

# Start the FastAPI server (with auto-reload)
python main.py

# Or run the interactive CLI
python cli.py

# Or run a single query
python cli.py --query "Identify gaps in our third-party risk policy against ISO 31000"

Using the Makefile:

make run     # Start API server
make cli     # Interactive CLI
make ingest  # Run ingestion pipeline

Usage Guide

Interactive CLI

The CLI provides a REPL interface for compliance queries:

python cli.py

   _____ _____ _____            
  / ____|_   _|  __ \     /\    
 | |      | | | |__) |   /  \   
 | |      | | |  _  /   / /\ \  
 | |____ _| |_| | \ \  / ____ \ 
  \_____|_____|_|  \_\/_/    \_\
                                 
  Compliance Intelligence & Risk Agent
  Type 'help' for commands, 'exit' to quit.

cira> Identify gaps in our information security policy against ISO 27001 Annex A

CLI commands:

Command	Description
`help`	Show available commands
`exit` / `quit`	Exit the CLI
`clear`	Clear the screen
Any text	Submit as a compliance query

Single query mode:

python cli.py --query "What are the key data privacy requirements under UAE PDPL?"

Python SDK

The CIRAAgent class provides a programmatic interface:

from cira import CIRAAgent

agent = CIRAAgent()

Gap Analysis

result = agent.run(
    task="gap_analysis",
    input={
        "document": "path/to/policy.pdf",
        "framework": "ISO_27001",
        "scope": "information_security"
    }
)

print(result.findings)       # Detailed gap analysis with citations
print(result.success)        # True if findings were generated
print(result.report)         # Narrative report text
print(result.raw_messages)   # Full LangGraph message history

Vendor Risk Scoring

result = agent.run(
    task="vendor_risk_score",
    input={
        "vendor_name": "Acme Cloud Services",
        "contract_path": "contracts/acme_2024.pdf",
        "risk_domains": ["data_privacy", "financial_stability", "operational", "cyber"]
    }
)

print(result.findings)  # Risk assessment with per-domain analysis

Simple Query

answer = agent.query("What are the key requirements of ISO 27001 Section A.8?")
print(answer)

CIRAResult Object

All agent.run() calls return a CIRAResult with the following fields:

Field	Type	Description
`findings`	`str`	The agent's analysis output with citations
`risk_ratings`	`dict[str, str]`	Risk ratings per finding (CRITICAL/HIGH/MEDIUM/LOW)
`audit_trail`	`list[dict]`	Audit events generated during this run
`report`	`str`	Narrative summary (same as findings by default)
`raw_messages`	`list`	Full LangGraph message history for debugging
`success`	`bool` (property)	`True` if findings are non-empty

REST API Reference

Start the API server:

python main.py

The server starts at http://localhost:8000 with auto-generated docs at:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

All endpoints except /health require the X-API-Key header.

`GET /health` — Health Check

curl http://localhost:8000/health

Response:

{
  "status": "healthy",
  "version": "0.1.0",
  "llm_provider": "ollama",
  "vector_store": "chroma",
  "timestamp": "2026-03-05T06:36:00Z"
}

`POST /api/v1/analyze` — Submit Analysis Task

curl -X POST http://localhost:8000/api/v1/analyze \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "task": "regulatory_gap_analysis",
    "document_path": "data/uploads/policy.pdf",
    "framework": "NESA_IA",
    "scope": "information_security",
    "output_format": "json"
  }'

Request body (AnalyzeRequest):

Field	Type	Required	Default	Description
`task`	`string`	✅	—	Task type: `regulatory_gap_analysis`, `vendor_risk_score`, `query`
`document_path`	`string`	❌	`null`	Path to document to analyze
`framework`	`string`	❌	`ISO_27001`	Target compliance framework
`scope`	`string`	❌	`general`	Compliance domain scope
`query`	`string`	❌	`null`	Natural-language query (for `task=query`)
`vendor_name`	`string`	❌	`null`	Vendor name (for risk scoring tasks)
`risk_domains`	`string[]`	❌	`null`	Risk domains to evaluate
`output_format`	`string`	❌	`json`	Output: `json`, `text`, or `docx`

Response body (AnalyzeResponse):

{
  "task": "regulatory_gap_analysis",
  "status": "completed",
  "findings": "## Gap Analysis Results\n...",
  "report_path": null,
  "audit_trail_count": 0,
  "timestamp": "2026-03-05T06:36:00Z"
}

`POST /api/v1/ingest` — Ingest Documents

curl -X POST http://localhost:8000/api/v1/ingest \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "directory": "./data/regulatory_docs",
    "collection_name": "regulatory_docs",
    "chunk_size": 1000,
    "chunk_overlap": 200
  }'

Response:

{
  "status": "completed",
  "chunks_ingested": 342,
  "collection_name": "regulatory_docs",
  "timestamp": "2026-03-05T06:36:00Z"
}

`GET /api/v1/audit-trail` — Query Audit Events

curl "http://localhost:8000/api/v1/audit-trail?date=2026-03-05&limit=50" \
  -H "X-API-Key: your-api-key"

Query parameters:

Param	Type	Default	Description
`date`	`string`	Today (UTC)	Date in `YYYY-MM-DD` format
`action`	`string`	`null`	Filter by action type
`limit`	`int`	`100`	Maximum events to return

Response:

{
  "date": "2026-03-05",
  "events": [
    {
      "timestamp": "2026-03-05T06:30:00+00:00",
      "action": "gap_analysis",
      "detail": "Retrieved 8 passages for ISO_27001/information_security",
      "confidence": "HIGH",
      "input_summary": null,
      "regulatory_ref": null,
      "metadata": {}
    }
  ],
  "total": 1
}

Project Structure

Directory Layout

cira-agent/
├── main.py                        # FastAPI application entry point
├── cli.py                         # Interactive CLI with REPL and --query mode
├── cira/                          # Core Python package
│   ├── __init__.py                # Package root — exports CIRAAgent, __version__
│   ├── config.py                  # Pydantic Settings — all configuration
│   ├── agent.py                   # CIRAAgent class — public SDK interface
│   ├── graph.py                   # LangGraph ReAct agent graph
│   ├── logging.py                 # Structured logging (structlog) setup
│   ├── llm/                       # LLM provider abstraction
│   │   ├── __init__.py
│   │   ├── provider.py            # Factory: get_chat_model(), get_embeddings()
│   │   └── prompts.py             # All prompt templates (system, gap, risk, etc.)
│   ├── knowledge/                 # Knowledge base & vector store
│   │   ├── __init__.py
│   │   ├── vector_store.py        # ChromaDB / pgvector abstraction
│   │   ├── ingestor.py            # Document ingestion pipeline
│   │   └── frameworks/            # Built-in framework definition files
│   │       └── .gitkeep
│   ├── tools/                     # LangGraph tools (agent capabilities)
│   │   ├── __init__.py            # Exports ALL_TOOLS list
│   │   ├── gap_analysis.py        # @tool — compliance gap analysis
│   │   ├── risk_scorer.py         # @tool — vendor risk scoring
│   │   ├── doc_parser.py          # @tool — document parsing
│   │   ├── audit_logger.py        # Audit trail (not a @tool — called internally)
│   │   ├── alert_engine.py        # @tool — compliance alerting + webhooks
│   │   └── report_gen.py          # @tool — report generation (JSON/TXT/DOCX)
│   └── api/                       # REST API layer
│       ├── __init__.py
│       ├── routes.py              # FastAPI route definitions + auth
│       └── schemas.py             # Pydantic request/response schemas
├── data/                          # Runtime data directories
│   ├── regulatory_docs/           # Drop your documents here for ingestion
│   │   └── .gitkeep
│   ├── chroma/                    # ChromaDB persistent store (git-ignored)
│   └── uploads/                   # Temporary upload directory
│       └── .gitkeep
├── scripts/                       # Utility scripts
│   ├── ingest.py                  # Knowledge base ingestion CLI
│   └── benchmark.py               # Performance benchmarking
├── tests/                         # Test suite
│   ├── __init__.py
│   ├── test_agent.py              # Agent + CIRAResult unit tests
│   ├── test_tools.py              # Tool unit tests (mocked vector store)
│   └── test_api.py                # API integration tests (FastAPI TestClient)
├── .env.example                   # Environment variable template (documented)
├── .gitignore                     # Git ignore rules
├── requirements.txt               # Python dependencies (pinned ranges)
├── pyproject.toml                 # Project metadata, pytest/ruff/mypy config
├── Dockerfile                     # Production container image
├── docker-compose.yml             # Multi-service orchestration (app + Ollama)
├── Makefile                       # Developer commands (install/run/test/lint)
├── LICENSE                        # MIT License
└── README.md                      # This file

Module Reference

`cira/config.py` — Configuration

The single source of truth for all application settings. Uses Pydantic Settings to load and validate environment variables with type safety.

from cira.config import settings

# Access any setting
print(settings.llm_provider)       # LLMProvider.OLLAMA
print(settings.ollama_model)       # "llama3.1"
print(settings.vector_store)       # VectorStoreBackend.CHROMA
print(settings.api_port)           # 8000

# Ensure runtime directories exist
settings.ensure_directories()

Key design decisions:

Enum types (LLMProvider, EmbeddingProvider, VectorStoreBackend) prevent typos and enable IDE autocomplete.
SettingsConfigDict with case_sensitive=False and extra="ignore" for flexible env var handling.
A module-level settings singleton is instantiated on import — shared across the entire app.

`cira/llm/provider.py` — LLM Factory

Two factory functions with lazy imports — only the selected provider's SDK is loaded at runtime:

from cira.llm.provider import get_chat_model, get_embeddings

llm = get_chat_model()        # Returns BaseChatModel
embeddings = get_embeddings()  # Returns Embeddings

Provider selection is automatic based on LLM_PROVIDER and EMBEDDING_PROVIDER env vars.

`cira/graph.py` — Agent Graph

Builds and compiles the LangGraph agent with:

A system prompt (cira/llm/prompts.py) that defines CIRA's compliance reasoning rules.
All tools from cira/tools/ bound to the LLM via llm.bind_tools().
Conditional routing: tool calls → tool execution → back to agent, or final answer → end.

`cira/agent.py` — Public Interface

The CIRAAgent class wraps the graph and provides:

run(task, input) → CIRAResult — structured task execution
query(question) → str — simple question answering
Automatic task-to-message translation for different task types
Audit trail logging for task start/completion

Configuration Deep Dive

LLM Providers

CIRA supports four LLM providers. Set LLM_PROVIDER in your .env:

Provider	Value	Requires	Best For
Ollama	`ollama`	Ollama running locally	Privacy, offline use, development
Azure OpenAI	`azure_openai`	Azure subscription + deployment	Enterprise with Azure
OpenAI	`openai`	OpenAI API key	Quick start, strong reasoning
Anthropic	`anthropic`	Anthropic API key	Long-context analysis

Embedding Providers

Set EMBEDDING_PROVIDER independently of the LLM provider:

Provider	Value	Default Model
Ollama	`ollama`	`nomic-embed-text`
Azure OpenAI	`azure_openai`	(set via `AZURE_OPENAI_EMBEDDING_DEPLOYMENT`)
OpenAI	`openai`	`text-embedding-3-small`

Vector Store Backends

Backend	Value	Use Case
ChromaDB	`chroma`	Local development, small-medium datasets
pgvector	`pgvector`	Production, large datasets, concurrent access

Environment Variables Reference

Variable	Type	Default	Description
`LLM_PROVIDER`	enum	`ollama`	LLM backend: `ollama`, `azure_openai`, `openai`, `anthropic`
`OLLAMA_MODEL`	string	`llama3.1`	Ollama model name
`OLLAMA_BASE_URL`	string	`http://localhost:11434`	Ollama API endpoint
`AZURE_OPENAI_ENDPOINT`	string	—	Azure OpenAI resource URL
`AZURE_OPENAI_API_KEY`	string	—	Azure OpenAI API key
`AZURE_OPENAI_DEPLOYMENT`	string	—	Azure OpenAI deployment name
`AZURE_OPENAI_API_VERSION`	string	`2024-02-01`	Azure API version
`OPENAI_API_KEY`	string	—	OpenAI API key
`OPENAI_MODEL`	string	`gpt-4o`	OpenAI model name
`ANTHROPIC_API_KEY`	string	—	Anthropic API key
`ANTHROPIC_MODEL`	string	`claude-3-5-sonnet-20241022`	Anthropic model name
`EMBEDDING_PROVIDER`	enum	`ollama`	Embedding backend: `ollama`, `azure_openai`, `openai`
`OLLAMA_EMBEDDING_MODEL`	string	`nomic-embed-text`	Ollama embedding model
`OPENAI_EMBEDDING_MODEL`	string	`text-embedding-3-small`	OpenAI embedding model
`VECTOR_STORE`	enum	`chroma`	Vector store: `chroma`, `pgvector`
`CHROMA_PERSIST_DIR`	path	`./data/chroma`	ChromaDB storage directory
`PGVECTOR_CONNECTION_STRING`	string	—	PostgreSQL connection string for pgvector
`KNOWLEDGE_BASE_DIR`	path	`./data/regulatory_docs`	Directory for document ingestion
`API_HOST`	string	`0.0.0.0`	API server bind address
`API_PORT`	int	`8000`	API server port
`API_KEY`	string	`changeme-in-production`	API authentication key
`LOG_LEVEL`	enum	`INFO`	Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`
`LOG_DIR`	path	`./logs`	Log file directory
`AUDIT_TRAIL_DIR`	path	`./data/audit_trail`	Audit trail storage directory

Tools Reference

CIRA's agent has access to the following tools, each implemented as a LangChain @tool:

`gap_analysis_tool`

Module: cira/tools/gap_analysis.py

Searches the regulatory knowledge base for compliance framework requirements relevant to a query.

Parameter	Type	Default	Description
`query`	`str`	(required)	The compliance question or policy description
`framework`	`str`	`ISO_27001`	Target compliance framework
`scope`	`str`	`general`	Compliance domain scope

Returns formatted regulatory excerpts with source citations (filename, page number).

`vendor_risk_tool`

Module: cira/tools/risk_scorer.py

Retrieves compliance information for multi-domain vendor risk assessment.

Parameter	Type	Default	Description
`vendor_name`	`str`	(required)	Name of the vendor
`risk_domains`	`str`	`data_privacy,operational,cyber`	Comma-separated risk domains
`context`	`str`	`""`	Additional vendor context

Default risk domain weights:

Domain	Weight
`data_privacy`	25%
`cyber`	25%
`operational`	20%
`financial_stability`	15%
`contractual`	15%

`parse_document_tool`

Module: cira/tools/doc_parser.py

Parses a document file and returns extracted text. Supported formats: PDF, DOCX, TXT, CSV, JSON, Markdown. Content is truncated at 50,000 characters for LLM context window safety.

`send_alert_tool`

Module: cira/tools/alert_engine.py

Sends a compliance alert with optional webhook delivery.

Parameter	Type	Default	Description
`title`	`str`	(required)	Alert title
`severity`	`str`	(required)	`CRITICAL` / `HIGH` / `MEDIUM` / `LOW`
`description`	`str`	(required)	Alert details
`framework`	`str`	`""`	Related framework
`webhook_url`	`str`	`""`	URL for HTTP POST delivery

`generate_report_tool`

Module: cira/tools/report_gen.py

Generates exportable compliance reports in JSON, plain text, or DOCX format. Reports are saved to ./data/reports/ with timestamped filenames.

Audit Trail System

CIRA maintains an immutable, append-only audit trail for every agent action. This is critical for GRC compliance — every recommendation, finding, and decision has full provenance.

Storage Format

Audit events are stored as JSONL (one JSON object per line) in daily files:

data/audit_trail/
├── audit_2026-03-04.jsonl
├── audit_2026-03-05.jsonl
└── ...

Event Schema

{
  "timestamp": "2026-03-05T06:30:00+00:00",
  "action": "gap_analysis",
  "detail": "Retrieved 8 passages for ISO_27001/information_security",
  "confidence": "HIGH",
  "input_summary": "Perform a compliance gap analysis...",
  "regulatory_ref": "ISO 27001:2022 Annex A.8",
  "metadata": {}
}

Field	Description
`timestamp`	UTC ISO 8601 timestamp
`action`	Action type (e.g., `gap_analysis`, `vendor_risk_assessment`, `alert_sent`, `report_generated`, `task_started:`, `task_completed:`)
`detail`	Human-readable description of what happened
`confidence`	Agent confidence level: `HIGH`, `MEDIUM`, `LOW`
`input_summary`	Summary of the input that triggered this action
`regulatory_ref`	Specific regulatory clause or framework reference
`metadata`	Additional structured data

Export

Audit trails can be exported as formatted JSON:

from cira.tools.audit_logger import export_audit_trail

export_audit_trail(date="2026-03-05", output_path="./exports/audit.json")

Or retrieved programmatically:

from cira.tools.audit_logger import get_audit_trail

events = get_audit_trail(date="2026-03-05", action_filter="gap_analysis", limit=50)

Compliance Frameworks

CIRA supports any compliance framework through its RAG knowledge base. The following are reference frameworks the system is optimized for:

Category	Frameworks
Information Security	ISO/IEC 27001:2022, NESA UAE Information Assurance Standards
Risk Management	ISO 31000:2018, COBIT 2019
Business Continuity	ISO 22301:2019
ESG & Sustainability	GRI Standards 2021, TCFD, SASB
Data Privacy	UAE PDPL, Saudi PDPL, GDPR (reference)
Quality Management	ISO 9001:2015
HSE	ISO 45001:2018

Adding New Frameworks

Place the framework documents (PDF, DOCX, TXT) in ./data/regulatory_docs/
Run the ingestion pipeline:
```
python scripts/ingest.py
```
The documents are automatically chunked, embedded, and indexed.
The agent will now include them in its knowledge retrieval.

You can organize documents into subdirectories — the ingestion script scans recursively:

data/regulatory_docs/
├── iso_27001/
│   ├── ISO_27001_2022_full.pdf
│   └── annex_a_controls.pdf
├── uae_pdpl/
│   └── UAE_PDPL_2021.pdf
└── internal/
    ├── company_security_policy.docx
    └── vendor_risk_procedures.pdf

Deployment

Docker

Build and run a standalone container:

docker build -t cira-agent .
docker run -p 8000:8000 --env-file .env cira-agent

The Dockerfile uses python:3.11-slim and includes system dependencies for document parsing (poppler, tesseract).

Docker Compose

The included docker-compose.yml runs CIRA with Ollama:

# Start all services
docker compose up -d

# View logs
docker compose logs -f cira

# Stop
docker compose down

Services:

Service	Description	Port
`cira`	CIRA API server	`8000`
`ollama`	Local LLM server	`11434`

For production with pgvector, uncomment the postgres service in docker-compose.yml and set:

VECTOR_STORE=pgvector
PGVECTOR_CONNECTION_STRING=postgresql://cira:changeme@postgres:5432/cira

Production Considerations

API Key: Change API_KEY from the default to a secure random string.
CORS: Restrict allow_origins in main.py from ["*"] to your specific domains.
HTTPS: Deploy behind a reverse proxy (nginx, Caddy) with TLS termination.
Vector Store: Switch from ChromaDB to pgvector for concurrent access and durability.
Logging: Set LOG_LEVEL=WARNING in production to reduce noise.
Resources: LLM inference (especially local Ollama) is memory-intensive. Allocate at least 8GB RAM for 7B parameter models, 16GB+ for 14B models.

Testing

Run the full test suite:

pytest tests/ -v

Or using the Makefile:

make test       # Run tests
make test-cov   # Run tests with coverage report

Test Architecture

File	What It Tests	Strategy
`tests/test_agent.py`	`CIRAAgent`, `CIRAResult`	Unit tests with mocked graph
`tests/test_tools.py`	Audit logger, gap analysis tool	Unit tests with mocked vector store
`tests/test_api.py`	FastAPI endpoints	Integration tests with `TestClient`

Tests use unittest.mock.patch to isolate dependencies (LLM, vector store) so they run without an Ollama instance or real API keys.

Development

Developer Commands (Makefile)

make install     # Install dependencies
make dev         # Install deps + dev tools (ruff, mypy)
make run         # Start API server
make cli         # Interactive CLI
make ingest      # Run document ingestion
make test        # Run test suite
make test-cov    # Tests with HTML coverage report
make lint        # Lint with ruff
make format      # Auto-format with ruff
make typecheck   # Type-check with mypy
make clean       # Remove __pycache__, .pytest_cache, etc.
make docker-build # Build Docker image
make docker-up    # Start Docker Compose stack
make docker-down  # Stop Docker Compose stack

Code Quality

The project is configured with:

Ruff — linting and formatting (configured in pyproject.toml)
mypy — static type checking with disallow_untyped_defs
pytest — testing with async support

Roadmap

Contributing

Contributions are welcome. Please open an issue to discuss what you'd like to change before submitting a pull request.

Fork the repository
Create a feature branch (git checkout -b feature/your-feature)
Commit your changes (git commit -m 'Add your feature')
Push to the branch (git push origin feature/your-feature)
Open a Pull Request

Guidelines

Follow the existing code style (enforced by ruff).
Add type hints to all function signatures.
Include tests for new functionality.
Update the README if you add new tools, endpoints, or configuration options.
Keep prompts in cira/llm/prompts.py — do not hardcode prompts in tool or agent code.

License

MIT License — see LICENSE for details.

Disclaimer

CIRA is a strategic advisory and monitoring tool. All outputs are informational in nature and do not constitute legal, regulatory, or professional compliance advice. Always engage qualified legal or compliance professionals for formal regulatory obligations. The confidence scores and risk ratings provided are AI-generated assessments and should be validated by domain experts before being used in formal compliance processes.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
cira		cira
data		data
scripts		scripts
tests		tests
.env.example		.env.example
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
cli.py		cli.py
docker-compose.yml		docker-compose.yml
main.py		main.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

CIRA — Compliance Intelligence & Risk Agent

Table of Contents

What is CIRA?

Why CIRA?

Key Capabilities

Architecture Overview

System Architecture

Core Components

Agent Reasoning Loop

Tech Stack

Getting Started

Prerequisites

Installation

1. Clone the Repository

2. Create a Virtual Environment

3. Install Dependencies

4. Pull a Local Model via Ollama

5. Configure Environment

Configuration Reference

Seeding the Knowledge Base

Running the Application

Usage Guide

Interactive CLI

Python SDK

Gap Analysis

Vendor Risk Scoring

Simple Query

CIRAResult Object

REST API Reference

GET /health — Health Check

POST /api/v1/analyze — Submit Analysis Task

POST /api/v1/ingest — Ingest Documents

GET /api/v1/audit-trail — Query Audit Events

Project Structure

Directory Layout

Module Reference

cira/config.py — Configuration

cira/llm/provider.py — LLM Factory

cira/graph.py — Agent Graph

cira/agent.py — Public Interface

Configuration Deep Dive

LLM Providers

Embedding Providers

Vector Store Backends

Environment Variables Reference

Tools Reference

gap_analysis_tool

vendor_risk_tool

parse_document_tool

send_alert_tool

generate_report_tool

Audit Trail System

Storage Format

Event Schema

Export

Compliance Frameworks

Adding New Frameworks

Deployment

Docker

Docker Compose

Production Considerations

Testing

Test Architecture

Development

Developer Commands (Makefile)

Code Quality

Roadmap

Contributing

Guidelines

License

Disclaimer

About

Resources

License

Uh oh!

Stars

`GET /health` — Health Check

`POST /api/v1/analyze` — Submit Analysis Task

`POST /api/v1/ingest` — Ingest Documents

`GET /api/v1/audit-trail` — Query Audit Events

`cira/config.py` — Configuration

`cira/llm/provider.py` — LLM Factory

`cira/graph.py` — Agent Graph

`cira/agent.py` — Public Interface

`gap_analysis_tool`

`vendor_risk_tool`

`parse_document_tool`

`send_alert_tool`

`generate_report_tool`

Packages