
[Feature Request] Audit Logging #396

@rkochhar9

Feature Request: Audit Logging Support

Purpose

An optional feature to enable audit logging for DML, DDL, and user/role management events, similar to MongoDB Auditing.

Many users require audit logging to meet compliance and security requirements (SOC2, HIPAA, PCI-DSS, GDPR). This feature would allow administrators to track database activity for security monitoring, forensic analysis, and regulatory compliance.

Desired Functionality

  • Configurable audit events (read, write, ddl, role)
  • Filtering by namespace, user, role, or command type
  • MongoDB-style audit output with database/collection context

POC Research: pgAudit Integration

I conducted initial research using pgAudit to understand what audit logging we get out of the box and its limitations.

pgAudit Configuration Used

shared_preload_libraries = 'pgaudit, pg_cron, pg_documentdb_core, pg_documentdb'
pgaudit.log = 'read, write, ddl, role'
pgaudit.log_catalog = off
pgaudit.log_parameter = on
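Before running any workload, the setup can be sanity-checked from a client session. A minimal sketch, assuming psycopg2 and default local connection settings (adjust for your deployment):

import psycopg2

# Sketch: confirm pgAudit is preloaded and show the active audit classes.
# Connection parameters are assumptions, not part of the POC setup.
conn = psycopg2.connect("dbname=postgres user=postgres host=localhost")
with conn.cursor() as cur:
    cur.execute("SHOW shared_preload_libraries")
    libs = cur.fetchone()[0]
    assert "pgaudit" in libs, f"pgAudit not preloaded: {libs}"
    cur.execute("SHOW pgaudit.log")
    print("pgaudit.log =", cur.fetchone()[0])
conn.close()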

pgAudit Filtering Capabilities

pgAudit provides some filtering options (a configuration sketch follows this list):

  • By event class: pgaudit.log can be set to any combination of read, write, ddl, role, function, misc
  • By role: pgaudit.role enables auditing only for statements executed by a specific role
  • Object-level: Can grant audit privileges on specific tables/schemas for fine-grained control
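Role-based and object-level filtering combine: create a dedicated audit role, point pgaudit.role at it, and grant that role privileges only on the tables whose access should be audited. A sketch using psycopg2 (the role name doc_auditor is illustrative):

import psycopg2

conn = psycopg2.connect("dbname=postgres user=postgres host=localhost")
conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction block
with conn.cursor() as cur:
    # Marker role: nobody logs in as it; pgAudit only checks its grants.
    cur.execute("CREATE ROLE doc_auditor")
    cur.execute("ALTER SYSTEM SET pgaudit.role = 'doc_auditor'")
    cur.execute("SELECT pg_reload_conf()")
    # Object-level auditing: only statements touching tables this role
    # has privileges on produce OBJECT audit entries.
    cur.execute("GRANT SELECT, INSERT, UPDATE, DELETE "
                "ON documentdb_data.documents_66 TO doc_auditor")
conn.close()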

However, pgAudit cannot filter by:

  • MongoDB database or collection name (only sees PostgreSQL tables like documents_66)
  • MongoDB operation type (e.g., insertOne vs insertMany)
  • Specific document fields or values

Test Results Summary

Operations That Work

Operation             Audit Class/Command
insertOne/insertMany  WRITE/INSERT
updateOne             WRITE/UPDATE
deleteOne/deleteMany  WRITE/DELETE
find                  READ/SELECT
aggregate             READ/SELECT
$merge (insert)       WRITE/INSERT
createIndex           DDL/CREATE INDEX
dropIndex             DDL/DROP INDEX
createCollection      DDL/CREATE TABLE
dropCollection        DDL/DROP TABLE

Operations With Limitations

Operation         Expected         Actual        Issue
updateMany        WRITE/UPDATE     READ/SELECT   CTE pattern misclassified
$merge (update)   WRITE/UPDATE     No entry      Not audited at all
renameCollection  DDL/ALTER TABLE  WRITE/UPDATE  Catalog update, not DDL

Sample Audit Log Output

insertOne

Document data is hex-encoded BSON in the audit entry.

Mongo: db.demo.insertOne({_id: 1, name: "test_user", email: "[email protected]"})

[POSTGRES] 2025-12-18 21:10:44.314 UTC [4816] LOG:  AUDIT: SESSION,89,36,WRITE,INSERT,,,,"\x0b00000010000100000000,\x3e000000105f69640001000000026e616d65000a000000746573745f757365720002656d61696c001100000074657374406578616d706c652e636f6d0000"

insertMany

Multiple documents in a single INSERT.

Mongo: db.demo.insertMany([{_id: 2, name: "user2"}, {_id: 3, name: "user3"}])

[POSTGRES] 2025-12-18 21:10:44.320 UTC [4818] LOG:  AUDIT: SESSION,81,3,WRITE,INSERT,,,,"\x0b00000010000200000000,\x1e000000105f69640002000000026e616d65000600000075736572320000,\x0b00000010000300000000,\x1e000000105f69640003000000026e616d65000600000075736572330000"

updateOne

Shows UPDATE with full document in hex-encoded BSON.

Mongo: db.demo.updateOne({_id: 1}, {$set: {status: "active"}})

[POSTGRES] 2025-12-18 21:10:44.326 UTC [4816] LOG:  AUDIT: SESSION,90,4,WRITE,UPDATE,,,UPDATE documentdb_data.documents_66 SET document = $3::documentdb_core.bson WHERE ctid = $2 AND shard_key_value = $1,"66,""(0,1)"",\x51000000105f69640001000000026e616d65000a000000746573745f757365720002656d61696c001100000074657374406578616d706c652e636f6d000273746174757300070000006163746976650000"

updateMany (Known Limitation)

Uses a CTE pattern, so it appears as READ/SELECT instead of WRITE/UPDATE.

Mongo: db.demo.updateMany({}, {$set: {bulk: true}})

[POSTGRES] 2025-12-18 21:10:44.334 UTC [4818] LOG:  AUDIT: SESSION,82,2,READ,SELECT,,,"WITH u AS (UPDATE documentdb_data.documents_66 SET document = (SELECT COALESCE(newDocument, document) FROM documentdb_api_internal.bson_update_document(document, $2::documentdb_core.bson, $1::documentdb_core.bson, $3::documentdb_core.bson, false)) WHERE document OPERATOR(documentdb_api_catalog.@@) $1::documentdb_core.bson AND shard_key_value = $4  RETURNING documentdb_api_internal.bson_update_returned_value(shard_key_value) as updated) SELECT COUNT(*), SUM(updated) FROM u","\x0500000000,\x1e0000000300170000000324736574000c0000000862756c6b0001000000,,66"

deleteOne

Direct DELETE statement captured.

Mongo: db.demo.deleteOne({_id: 3})

[POSTGRES] 2025-12-18 21:10:44.338 UTC [4816] LOG:  AUDIT: SESSION,91,2,WRITE,DELETE,,,WITH s AS MATERIALIZED (SELECT ctid FROM  documentdb_data.documents_66 WHERE shard_key_value = $1 AND document OPERATOR(documentdb_api_catalog.@@) $2::documentdb_core.bson AND object_id OPERATOR(documentdb_core.=) $3::documentdb_core.bson LIMIT 1 FOR UPDATE) DELETE FROM documentdb_data.documents_66 d USING s WHERE d.ctid = s.ctid AND shard_key_value = $1 RETURNING object_id,"66,BSONHEX0e000000105f6964000300000000,\x0b00000010000300000000,"

find (Read)

Mongo: db.demo.find({})

[POSTGRES] 2025-12-18 21:10:44.342 UTC [4818] LOG:  AUDIT: SESSION,83,2,READ,SELECT,,,,BSONHEX6600000004636f6e74696e756174696f6e00050000000010676574706167655f6261746368436f756e7400ca00000010676574706167655f626174636853697a6548696e74000000000110676574706167655f626174636853697a6541747472000100000000

aggregate

Mongo: db.demo.aggregate([{$match: {status: "active"}}])

[POSTGRES] 2025-12-18 21:10:44.344 UTC [4816] LOG:  AUDIT: SESSION,92,2,READ,SELECT,,,,BSONHEX6600000004636f6e74696e756174696f6e00050000000010676574706167655f6261746368436f756e7400ca00000010676574706167655f626174636853697a6548696e74000000000110676574706167655f626174636853697a6541747472000100000000

createIndex

Mongo: db.demo.createIndex({name: 1})

[POSTGRES] 2025-12-18 21:10:44.726 UTC [5437] LOG:  AUDIT: SESSION,1,1,DDL,CREATE INDEX,INDEX,documentdb_data.documents_rum_index_79,"CREATE INDEX CONCURRENTLY documents_rum_index_79 ON documentdb_data.documents_66 USING documentdb_rum ( document documentdb_api_catalog.bson_rum_single_path_ops(path='name',tl=2699))",<none>

dropIndex

Mongo: db.demo.dropIndex("name_1")

[POSTGRES] 2025-12-18 21:10:45.363 UTC [4816] LOG:  AUDIT: SESSION,94,6,DDL,DROP INDEX,INDEX,documentdb_data.documents_rum_index_79,DROP INDEX  IF EXISTS documentdb_data.documents_rum_index_79,<none>

dropCollection

Mongo: db.demo.drop()

[POSTGRES] 2025-12-18 21:10:45.370 UTC [4816] LOG:  AUDIT: SESSION,95,3,DDL,DROP TABLE,TABLE,documentdb_data.documents_66,DROP TABLE IF EXISTS documentdb_data.documents_66,<none>

Known Limitations with pgAudit

1. updateMany appears as READ/SELECT

The updateMany operation uses a CTE pattern (WITH u AS (UPDATE...) SELECT FROM u), which pgAudit classifies as READ/SELECT rather than WRITE/UPDATE.
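One possible mitigation is a log post-processor that reclassifies entries whose statement text starts with a data-modifying CTE. A heuristic sketch (the function and regex are hypothetical, not part of pgAudit):

import re

# Matches a leading CTE whose body is UPDATE/DELETE/INSERT, e.g.
# "WITH u AS (UPDATE documentdb_data.documents_66 SET ...) SELECT ...".
CTE_DML = re.compile(
    r"\bWITH\b.*?\bAS\s*(?:MATERIALIZED\s*)?\(\s*(UPDATE|DELETE|INSERT)\b",
    re.IGNORECASE | re.DOTALL,
)

def effective_command(audit_class, command, statement):
    """Return a corrected (class, command) pair for one pgAudit entry."""
    match = CTE_DML.search(statement or "")
    if audit_class == "READ" and match:
        return "WRITE", match.group(1).upper()
    return audit_class, command

# The updateMany entry above would come back as ("WRITE", "UPDATE"):
# effective_command("READ", "SELECT", "WITH u AS (UPDATE documentdb_data...")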

2. $merge (whenMatched: "merge") generates no UPDATE audit

When $merge updates existing documents, the UPDATE happens internally within the aggregate_cursor_first_page function and does NOT trigger pgAudit hooks. Only the aggregate call is logged as READ/SELECT.

3. renameCollection logged as WRITE/UPDATE

renameCollection is implemented as an UPDATE on the documentdb_api_catalog.collections table rather than as DDL/ALTER TABLE.

4. Missing MongoDB context

Audit entries show PostgreSQL table names (e.g., documentdb_data.documents_66) instead of MongoDB database/collection names. Clients familiar with MongoDB may prefer MongoDB-style context.

5. Document data is hex-encoded BSON

All document content appears as hex-encoded BSON, requiring decoding to be human-readable.
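For instance, the document from the insertOne entry above decodes with pymongo's bson package (a sketch; the hex string is the second parameter of that entry with the leading \x stripped):

import bson  # ships with the pymongo distribution

hex_doc = (
    "3e000000105f69640001000000026e616d65000a000000746573745f75736572"
    "0002656d61696c001100000074657374406578616d706c652e636f6d0000"
)
print(bson.decode(bytes.fromhex(hex_doc)))
# {'_id': 1, 'name': 'test_user', 'email': '[email protected]'}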


Discussion Points

  1. Log format: Should we support MongoDB-style audit format, SQL format, or both?
  2. Filtering: How should namespace/user/role filtering be implemented?
  3. Performance: pgAudit is synchronous, which adds latency to every audited operation. Should we support async auditing for high-throughput workloads? What is acceptable overhead?
  4. Edge cases: How should we handle disk-full scenarios or log rotation under heavy load?
  5. Packaging: Should this be part of the core engine, a separate extension, or a fork of pgAudit?
  6. Log destination: By default pgAudit logs to PostgreSQL server log. Should we support separate audit log files?
  7. Session correlation: Audit entries currently show PostgreSQL backend PIDs but not MongoDB session IDs or authenticated users. Correlating audit events to original clients may depend on the RBAC implementation.
  8. Log integrity: Audit logs are self-managed by the user. Worth considering how to ensure logs cannot be tampered with or deleted by an attacker (e.g., streaming to external SIEM, write-once storage).

Test Script

A test script is available at audit_test_workload.py; it exercises various MongoDB operations, verifies the corresponding pgAudit entries, and includes installation instructions for pgAudit in the Docker container.
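For reference, a minimal sketch of the kind of workload the script runs (assuming pymongo; the connection string is illustrative and should be adjusted for your deployment):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:10260/"
                     "?tls=true&tlsAllowInvalidCertificates=true")
col = client["audit_db"]["demo"]

# Each call should yield an "AUDIT:" line in the PostgreSQL log; expected
# class/command pairs are listed in the tables above.
col.insert_one({"_id": 1, "name": "test_user"})             # WRITE/INSERT
col.update_one({"_id": 1}, {"$set": {"status": "active"}})  # WRITE/UPDATE
col.update_many({}, {"$set": {"bulk": True}})               # READ/SELECT (limitation 1)
col.delete_one({"_id": 1})                                  # WRITE/DELETE
col.create_index("name")                                    # DDL/CREATE INDEX
col.drop()                                                  # DDL/DROP TABLE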


Possible Architecture Approaches

Several approaches could address the limitations found in the POC:

  1. pgAudit with enhancements - Use pgAudit as the foundation but add a translation layer to map PostgreSQL table names back to MongoDB database/collection names and decode BSON for readability (see the sketch after this list).

  2. DocumentDB API layer hooks - Emit audit events at the DocumentDB API layer before SQL translation. This would capture MongoDB-native context (db, collection, operation) but would not audit direct PostgreSQL access.

  3. Gateway-level auditing - Add audit logging in the gateway component, capturing the wire protocol commands. Provides full MongoDB context but misses any operations that bypass the gateway.

  4. Hybrid approach - Combine gateway-level logging for MongoDB context with pgAudit for SQL-level completeness, using correlation IDs to link the two.
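For approach 1, the table-to-namespace mapping could come from the catalog DocumentDB already maintains, the same documentdb_api_catalog.collections table noted under renameCollection. A sketch, assuming that catalog exposes collection_id, database_name, and collection_name columns:

import re
import psycopg2

TABLE_RE = re.compile(r"documentdb_data\.documents_(\d+)")

def mongo_namespace(conn, collection_id):
    """Map a documents_<id> table back to its MongoDB db.collection name."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT database_name, collection_name "
            "FROM documentdb_api_catalog.collections WHERE collection_id = %s",
            (collection_id,),
        )
        row = cur.fetchone()
    return f"{row[0]}.{row[1]}" if row else f"<unknown:{collection_id}>"

def translate(conn, audit_statement):
    """Rewrite PostgreSQL table references in an audit entry to Mongo namespaces."""
    return TABLE_RE.sub(
        lambda m: mongo_namespace(conn, int(m.group(1))), audit_statement
    )

# conn = psycopg2.connect("dbname=postgres user=postgres")
# translate(conn, "UPDATE documentdb_data.documents_66 SET ...")
# would yield e.g. "UPDATE demo_db.demo SET ..." if collection 66 is demo_db.demo.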
