Feature Request: Audit Logging Support
Purpose
An optional feature to enable audit logging for DML, DDL, and user/role management events, similar to MongoDB Auditing.
Many users require audit logging to meet compliance and security requirements (SOC2, HIPAA, PCI-DSS, GDPR). This feature would allow administrators to track database activity for security monitoring, forensic analysis, and regulatory compliance.
Desired Functionality
- Configurable audit events (read, write, ddl, role)
- Filtering by namespace, user, role, or command type
- MongoDB-style audit output with database/collection context (see the sample below)
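For reference, MongoDB's audit format records events roughly like this (field set abbreviated; all values here are illustrative):

```json
{
  "atype": "authCheck",
  "ts": { "$date": "2025-12-18T21:10:44.314Z" },
  "users": [{ "user": "appUser", "db": "admin" }],
  "param": {
    "command": "insert",
    "ns": "test.demo",
    "args": { "insert": "demo", "documents": [{ "_id": 1 }] }
  },
  "result": 0
}
```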
POC Research: pgAudit Integration
I conducted initial research using pgAudit to understand what audit logging we get out of the box and its limitations.
pgAudit Configuration Used
```
shared_preload_libraries = 'pgaudit, pg_cron, pg_documentdb_core, pg_documentdb'
pgaudit.log = 'read, write, ddl, role'
pgaudit.log_catalog = off
pgaudit.log_parameter = on
```
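These are standard pgAudit settings and can be verified at runtime from a psql session:

```sql
SHOW shared_preload_libraries;
SHOW pgaudit.log;   -- read, write, ddl, role
```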
pgAudit Filtering Capabilities
pgAudit provides some filtering options:
- By event class: `pgaudit.log` can be set to any combination of `read`, `write`, `ddl`, `role`, `function`, and `misc`
- By role: `pgaudit.role` enables auditing only for statements executed by a specific role
- Object-level: audit privileges can be granted on specific tables/schemas for fine-grained control (see the sketch after this list)
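Object-level auditing works by designating a master audit role and granting it privileges on the objects to watch; pgAudit then logs any statement that uses those privileges on those objects. A minimal sketch (the role name is illustrative):

```sql
-- Designate the audit role (and set pgaudit.role = 'auditor' in postgresql.conf)
CREATE ROLE auditor;

-- Statements touching documents_66 via these privileges now produce
-- OBJECT audit entries, regardless of which user runs them
GRANT SELECT, INSERT, UPDATE, DELETE
    ON documentdb_data.documents_66 TO auditor;
```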
However, pgAudit cannot filter by:
- MongoDB database or collection name (it only sees PostgreSQL tables like `documents_66`)
- MongoDB operation type (e.g., `insertOne` vs `insertMany`)
- Specific document fields or values
Test Results Summary
Operations That Work
| Operation | Audit Class/Command |
|---|---|
| insertOne/insertMany | WRITE/INSERT |
| updateOne | WRITE/UPDATE |
| deleteOne/deleteMany | WRITE/DELETE |
| find | READ/SELECT |
| aggregate | READ/SELECT |
| $merge (insert) | WRITE/INSERT |
| createIndex | DDL/CREATE INDEX |
| dropIndex | DDL/DROP INDEX |
| createCollection | DDL/CREATE TABLE |
| dropCollection | DDL/DROP TABLE |
Operations With Limitations
| Operation | Expected | Actual | Issue |
|---|---|---|---|
| updateMany | WRITE/UPDATE | READ/SELECT | CTE pattern misclassified |
| $merge (update) | WRITE/UPDATE | No entry | Not audited at all |
| renameCollection | DDL/ALTER TABLE | WRITE/UPDATE | Catalog update, not DDL |
Sample Audit Log Output
insertOne
Document data is hex-encoded BSON in the audit entry.
Mongo: db.demo.insertOne({_id: 1, name: "test_user", email: "test@example.com"})
[POSTGRES] 2025-12-18 21:10:44.314 UTC [4816] LOG: AUDIT: SESSION,89,36,WRITE,INSERT,,,,"\x0b00000010000100000000,\x3e000000105f69640001000000026e616d65000a000000746573745f757365720002656d61696c001100000074657374406578616d706c652e636f6d0000"
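For reference, the hex payloads decode with PyMongo's `bson` module; a minimal sketch using the document parameter from the entry above:

```python
import bson  # provided by the pymongo package

# Second parameter from the insertOne audit entry above (the BSON document)
hex_doc = (
    "3e000000105f69640001000000026e616d65000a000000746573745f75736572"
    "0002656d61696c001100000074657374406578616d706c652e636f6d0000"
)
print(bson.decode(bytes.fromhex(hex_doc)))
# -> {'_id': 1, 'name': 'test_user', 'email': 'test@example.com'}
```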
insertMany
Multiple documents in single INSERT.
Mongo: db.demo.insertMany([{_id: 2, name: "user2"}, {_id: 3, name: "user3"}])
[POSTGRES] 2025-12-18 21:10:44.320 UTC [4818] LOG: AUDIT: SESSION,81,3,WRITE,INSERT,,,,"\x0b00000010000200000000,\x1e000000105f69640002000000026e616d65000600000075736572320000,\x0b00000010000300000000,\x1e000000105f69640003000000026e616d65000600000075736572330000"
updateOne
Shows UPDATE with full document in hex-encoded BSON.
Mongo: db.demo.updateOne({_id: 1}, {$set: {status: "active"}})
[POSTGRES] 2025-12-18 21:10:44.326 UTC [4816] LOG: AUDIT: SESSION,90,4,WRITE,UPDATE,,,UPDATE documentdb_data.documents_66 SET document = $3::documentdb_core.bson WHERE ctid = $2 AND shard_key_value = $1,"66,""(0,1)"",\x51000000105f69640001000000026e616d65000a000000746573745f757365720002656d61696c001100000074657374406578616d706c652e636f6d000273746174757300070000006163746976650000"
updateMany (Known Limitation)
Uses CTE pattern - appears as READ/SELECT instead of WRITE/UPDATE.
Mongo: db.demo.updateMany({}, {$set: {bulk: true}})
[POSTGRES] 2025-12-18 21:10:44.334 UTC [4818] LOG: AUDIT: SESSION,82,2,READ,SELECT,,,"WITH u AS (UPDATE documentdb_data.documents_66 SET document = (SELECT COALESCE(newDocument, document) FROM documentdb_api_internal.bson_update_document(document, $2::documentdb_core.bson, $1::documentdb_core.bson, $3::documentdb_core.bson, false)) WHERE document OPERATOR(documentdb_api_catalog.@@) $1::documentdb_core.bson AND shard_key_value = $4 RETURNING documentdb_api_internal.bson_update_returned_value(shard_key_value) as updated) SELECT COUNT(*), SUM(updated) FROM u","\x0500000000,\x1e0000000300170000000324736574000c0000000862756c6b0001000000,,66"
deleteOne
Direct DELETE statement captured.
Mongo: db.demo.deleteOne({_id: 3})
[POSTGRES] 2025-12-18 21:10:44.338 UTC [4816] LOG: AUDIT: SESSION,91,2,WRITE,DELETE,,,WITH s AS MATERIALIZED (SELECT ctid FROM documentdb_data.documents_66 WHERE shard_key_value = $1 AND document OPERATOR(documentdb_api_catalog.@@) $2::documentdb_core.bson AND object_id OPERATOR(documentdb_core.=) $3::documentdb_core.bson LIMIT 1 FOR UPDATE) DELETE FROM documentdb_data.documents_66 d USING s WHERE d.ctid = s.ctid AND shard_key_value = $1 RETURNING object_id,"66,BSONHEX0e000000105f6964000300000000,\x0b00000010000300000000,"
find (Read)
Mongo: db.demo.find({})
[POSTGRES] 2025-12-18 21:10:44.342 UTC [4818] LOG: AUDIT: SESSION,83,2,READ,SELECT,,,,BSONHEX6600000004636f6e74696e756174696f6e00050000000010676574706167655f6261746368436f756e7400ca00000010676574706167655f626174636853697a6548696e74000000000110676574706167655f626174636853697a6541747472000100000000
aggregate
Mongo: db.demo.aggregate([{$match: {status: "active"}}])
[POSTGRES] 2025-12-18 21:10:44.344 UTC [4816] LOG: AUDIT: SESSION,92,2,READ,SELECT,,,,BSONHEX6600000004636f6e74696e756174696f6e00050000000010676574706167655f6261746368436f756e7400ca00000010676574706167655f626174636853697a6548696e74000000000110676574706167655f626174636853697a6541747472000100000000
createIndex
Mongo: db.demo.createIndex({name: 1})
[POSTGRES] 2025-12-18 21:10:44.726 UTC [5437] LOG: AUDIT: SESSION,1,1,DDL,CREATE INDEX,INDEX,documentdb_data.documents_rum_index_79,"CREATE INDEX CONCURRENTLY documents_rum_index_79 ON documentdb_data.documents_66 USING documentdb_rum ( document documentdb_api_catalog.bson_rum_single_path_ops(path='name',tl=2699))",<none>
dropIndex
Mongo: db.demo.dropIndex("name_1")
[POSTGRES] 2025-12-18 21:10:45.363 UTC [4816] LOG: AUDIT: SESSION,94,6,DDL,DROP INDEX,INDEX,documentdb_data.documents_rum_index_79,DROP INDEX IF EXISTS documentdb_data.documents_rum_index_79,<none>
dropCollection
Mongo: db.demo.drop()
[POSTGRES] 2025-12-18 21:10:45.370 UTC [4816] LOG: AUDIT: SESSION,95,3,DDL,DROP TABLE,TABLE,documentdb_data.documents_66,DROP TABLE IF EXISTS documentdb_data.documents_66,<none>
Known Limitations with pgAudit
1. updateMany appears as READ/SELECT
The updateMany operation uses a CTE pattern (WITH u AS (UPDATE...) SELECT FROM u), which pgAudit classifies as READ/SELECT rather than WRITE/UPDATE.
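In simplified form (adapted from the raw entry above), the top-level command is a SELECT, so that is the class pgAudit assigns:

```sql
WITH u AS (
    UPDATE documentdb_data.documents_66
       SET document = document   -- stand-in for bson_update_document(...)
     WHERE shard_key_value = 66
 RETURNING 1 AS updated
)
SELECT COUNT(*), SUM(updated) FROM u;
```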
2. $merge (whenMatched: "merge") generates no UPDATE audit
When $merge updates existing documents, the UPDATE happens internally within the aggregate_cursor_first_page function and does NOT trigger pgAudit hooks. Only the aggregate call is logged as READ/SELECT.
3. renameCollection logged as WRITE/UPDATE
renameCollection is implemented as an UPDATE on documentdb_api_catalog.collections table rather than DDL/ALTER TABLE.
4. Missing MongoDB context
Audit entries show PostgreSQL table names (e.g., documentdb_data.documents_66) instead of MongoDB database/collection names. Clients familiar with MongoDB may prefer MongoDB-style context.
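Since the numeric suffix of the table name is the collection id, a translation layer could recover the namespace from DocumentDB's catalog. A sketch, with column names assumed from `documentdb_api_catalog.collections`:

```sql
-- Map documentdb_data.documents_66 back to its MongoDB namespace
-- (column names are assumptions, not verified against the schema)
SELECT database_name, collection_name
  FROM documentdb_api_catalog.collections
 WHERE collection_id = 66;
```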
5. Document data is hex-encoded BSON
All document content appears as hex-encoded BSON, requiring decoding to be human-readable.
Discussion Points
- Log format: Should we support MongoDB-style audit format, SQL format, or both?
- Filtering: How should namespace/user/role filtering be implemented?
- Performance: pgAudit is synchronous, which adds latency to every audited operation. Should we support async auditing for high-throughput workloads? What is acceptable overhead?
- Edge cases: How to handle disk full scenarios or log rotation under heavy load?
- Packaging: Should this be part of the core engine, a separate extension, or a fork of pgAudit?
- Log destination: By default pgAudit logs to the PostgreSQL server log. Should we support separate audit log files? (See the config sketch after this list.)
- Session correlation: Audit entries currently show PostgreSQL backend PIDs but not MongoDB session IDs or authenticated users. Correlating audit events to original clients may depend on the RBAC implementation.
- Log integrity: Audit logs are self-managed by the user. Worth considering how to ensure logs cannot be tampered with or deleted by an attacker (e.g., streaming to external SIEM, write-once storage).
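On the log-destination point: pgAudit emits through the standard PostgreSQL logger, so the closest thing to a separate audit file today is routing at that level, e.g. via syslog (a sketch; the facility choice is arbitrary):

```
# postgresql.conf
log_destination = 'syslog'
syslog_facility = 'LOCAL0'   # rsyslog/syslog-ng can then split AUDIT lines
                             # into a dedicated, access-controlled file
```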
Test Script
A test script is available at audit_test_workload.py that exercises various MongoDB operations and verifies corresponding pgAudit entries. It includes installation instructions for pgAudit in the Docker container.
Possible Architecture Approaches
Several approaches could address the limitations found in the POC:
1. **pgAudit with enhancements** - Use pgAudit as the foundation but add a translation layer to map PostgreSQL table names back to MongoDB database/collection names and decode BSON for readability.
2. **DocumentDB API layer hooks** - Emit audit events at the DocumentDB API layer before SQL translation. This would capture MongoDB-native context (db, collection, operation) but would not audit direct PostgreSQL access.
3. **Gateway-level auditing** - Add audit logging in the gateway component, capturing the wire protocol commands. Provides full MongoDB context but misses any operations that bypass the gateway.
4. **Hybrid approach** - Combine gateway-level logging for MongoDB context with pgAudit for SQL-level completeness, using correlation IDs to link the two (see the sketch after this list).
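A rough sketch of the correlation mechanism (everything here is hypothetical: the gateway hook, the audit sink, and the SQL-comment convention are assumptions, not existing DocumentDB APIs):

```python
import json
import time
import uuid

def emit_gateway_audit(op: str, db: str, coll: str, user: str) -> str:
    """Hypothetical gateway-side hook: record a MongoDB-context audit
    event and return a correlation id to attach to the generated SQL."""
    corr_id = uuid.uuid4().hex
    event = {
        "ts": time.time(),
        "op": op,                  # e.g. "insertOne"
        "ns": f"{db}.{coll}",      # MongoDB namespace
        "user": user,
        "corr_id": corr_id,
    }
    print(json.dumps(event))       # stand-in for the real audit sink
    return corr_id

# The SQL layer could prepend the id as a comment; pgAudit logs the full
# statement text, so the matching SESSION entry would carry it verbatim:
corr = emit_gateway_audit("insertOne", "demo", "demo", "appUser")
sql = f"/* audit:{corr} */ INSERT INTO documentdb_data.documents_66 ..."
```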