feat(ts): MeshClient — relay transport with plaintext peers, KNOCK queue, wsFactory #1301
High-level mesh client for the TS SDK, addressing three AzureClaw compatibility requirements:

- plaintextPeers: bypass E2E encryption for legacy peers (the Rust controller uses base64(JSON), not Signal). API: addPlaintextPeer / removePlaintextPeer / isPlaintextPeer.
- wsFactory: custom WebSocket constructor hook for HTTPS_PROXY CONNECT tunneling (Node 22 global fetch/undici quirk).
- KNOCK pending queue: when a message arrives for a peer with an in-flight KNOCK, await resolution instead of rejecting. Fixes the race condition documented in vendored patch #5.

Also handles:

- Session reuse (returns the existing session instead of crashing; patch #10)
- Buffer-based base64 (avoids a stack overflow on payloads >100KB; patch #9)
- Heartbeat sending

Clean-room: implements against Wire Protocol spec Sections 9, 10, 12.

Co-authored-by: Copilot <[email protected]>
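The Buffer-based base64 change (patch #9) is easiest to see next to the pattern it likely replaces. A common cause of this kind of stack overflow is encoding via `btoa(String.fromCharCode(...bytes))`, which passes one function argument per byte and can exceed the engine's argument or stack limits on large payloads; `Buffer` encodes natively instead. A minimal sketch, assuming a Node.js runtime where `Buffer` is global; `toBase64` and `fromBase64` are hypothetical helper names, not the PR's code.

```typescript
// Hypothetical helpers; assumes a Node.js runtime where `Buffer` is a global.
// Avoids the `String.fromCharCode(...bytes)` spread, which passes one argument
// per byte and can overflow the stack for large payloads.

// Encode raw bytes to base64 without copying or per-byte function calls.
function toBase64(bytes: Uint8Array): string {
  // Respect byteOffset/byteLength in case `bytes` is a view into a larger buffer.
  return Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength).toString("base64");
}

// Decode base64 back into a plain Uint8Array view over the decoded bytes.
function fromBase64(b64: string): Uint8Array {
  const buf = Buffer.from(b64, "base64");
  return new Uint8Array(buf.buffer, buf.byteOffset, buf.byteLength);
}
```

Because `Buffer` does the conversion in native code, payload size only affects memory, not call depth.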
🤖 AI Agent: code-reviewer
Code Review for MeshClient Implementation
🔴 CRITICAL: Security Issues
- Plaintext Peer Support (`plaintextPeers`)
  - Allowing plaintext communication for specific peers (`plaintextPeers`) introduces a significant security risk. It bypasses end-to-end encryption (E2EE) for those peers, so sensitive data could be transmitted in plaintext. Avoid this feature unless it is strictly necessary. If it must be included, disable it by default and log and monitor its use. The documentation should also carry a clear warning about the risks of enabling it.
- Lack of Authentication for `plaintextPeers`
  - The `plaintextPeers` feature does not include any mechanism to authenticate the identity of the peer. This opens the door to impersonation attacks, where an attacker could spoof a plaintext peer's identity and intercept sensitive communications. If plaintext communication is unavoidable, implement a robust authentication mechanism to verify the identity of the peer.
- Potential Replay Attacks
  - The `handleMessage` function does not include any mechanism to prevent replay attacks. For encrypted messages, the `messageNumber` in the header could be used to detect and reject replayed messages. For plaintext messages there is no equivalent mechanism, which makes them particularly vulnerable to replay.
- KNOCK Race Condition
  - While the `knockPending` map and `knockTimeout` attempt to handle the race between a KNOCK and the first message, there is no guarantee that the `knockPending` entry is cleared in every path. For example, if `knockTimeout` expires while the message is still being processed, the entry might never be removed, leading to memory leaks or incorrect behavior. Ensure that `knockPending` is always cleared, even in edge cases.
- Lack of Rate Limiting for KNOCK Messages
  - The `handleKnock` function does not rate-limit incoming KNOCK messages. An attacker could flood the system with KNOCK requests, causing a denial-of-service (DoS). Implement rate limiting to mitigate this risk.
- Lack of Validation for Incoming Frames
  - The `handleFrame` and `handleMessage` functions do not sufficiently validate incoming frames. For example, the `frame` object is accessed directly without verifying the presence or type of required fields such as `type`, `from`, and `ciphertext`. Maliciously crafted frames could therefore cause runtime errors or worse. Use a schema validation library to validate incoming frames.
- Potential WebSocket Hijacking
  - The `wsFactory` option allows a custom WebSocket implementation to be injected. While this is useful for scenarios like HTTPS_PROXY, it also makes it possible to inject a malicious WebSocket implementation. Consider validating the `wsFactory` option or restricting it to trusted implementations.
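One way to guarantee that `knockPending` entries are always cleared, as the KNOCK race-condition item asks, is to keep a single promise per peer and do the cleanup in `finally`, so it runs whether the KNOCK is accepted, rejected, or times out. Messages for that peer await the same promise instead of being rejected, which matches what the PR description calls the KNOCK pending queue. This is a hypothetical sketch: `KnockQueue`, `beginKnock`, `resolveKnock`, and `awaitKnock` are illustrative names, the 10-second default is an assumed value, and treating a timeout as a rejection is a design assumption.

```typescript
// Hypothetical sketch: one in-flight KNOCK per peer; messages for that peer
// await its resolution instead of being rejected.
type KnockOutcome = "accepted" | "rejected";

interface PendingKnock {
  promise: Promise<KnockOutcome>;
  settle: (outcome: KnockOutcome) => void;
}

class KnockQueue {
  private readonly pending = new Map<string, PendingKnock>();

  // Default timeout is an assumed value, not one taken from the PR.
  constructor(private readonly knockTimeoutMs = 10_000) {}

  // Register an in-flight KNOCK; returns the existing promise if one is pending.
  beginKnock(peerId: string): Promise<KnockOutcome> {
    const existing = this.pending.get(peerId);
    if (existing) return existing.promise;

    let settle!: (outcome: KnockOutcome) => void;
    const raw = new Promise<KnockOutcome>((resolve) => {
      settle = resolve;
    });
    // A timeout counts as a rejection; later settle calls are no-ops.
    const timer = setTimeout(() => settle("rejected"), this.knockTimeoutMs);
    // Cleanup runs exactly once, however the KNOCK ends.
    const promise = raw.finally(() => {
      clearTimeout(timer);
      this.pending.delete(peerId);
    });
    this.pending.set(peerId, { promise, settle });
    return promise;
  }

  // Called when the KNOCK response for this peer arrives.
  resolveKnock(peerId: string, outcome: KnockOutcome): void {
    this.pending.get(peerId)?.settle(outcome);
  }

  // Message path: wait for an in-flight KNOCK if there is one, else undefined.
  async awaitKnock(peerId: string): Promise<KnockOutcome | undefined> {
    const entry = this.pending.get(peerId);
    return entry ? entry.promise : undefined;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

Because the map entry is removed inside `finally`, there is no code path where a settled KNOCK leaves a stale entry behind.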
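For the rate-limiting item, a per-peer token bucket is one conventional option: each peer gets a small burst allowance that refills at a fixed rate, and KNOCKs beyond it are dropped. A sketch under assumptions: `KnockRateLimiter` is a hypothetical class, the capacity and refill rate are placeholder values, and the clock is injectable so the behavior can be tested deterministically. Because it is keyed by the sender ID in the frame, an attacker who varies that ID gets a fresh bucket each time, so it complements rather than replaces a per-connection limit; a production version would also evict idle buckets to avoid unbounded growth.

```typescript
// Hypothetical per-peer token bucket for incoming KNOCK frames.
class KnockRateLimiter {
  private readonly buckets = new Map<string, { tokens: number; updatedMs: number }>();

  constructor(
    private readonly capacity = 5, // burst size (placeholder value)
    private readonly refillPerSecond = 1, // sustained rate (placeholder value)
    private readonly now: () => number = Date.now, // injectable clock
  ) {}

  // Returns true if the KNOCK may be processed, false if it should be dropped.
  allow(peerId: string): boolean {
    const t = this.now();
    const bucket = this.buckets.get(peerId) ?? { tokens: this.capacity, updatedMs: t };
    // Refill proportionally to elapsed time, never above capacity.
    const elapsedSec = (t - bucket.updatedMs) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSecond);
    bucket.updatedMs = t;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(peerId, bucket);
    return allowed;
  }
}
```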
🟡 WARNING: Potential Breaking Changes
- Public API Changes
  - The addition of `MeshClient`, `MeshClientOptions`, `MeshSession`, and `WebSocketFactory` to the public API could break existing code if there are naming conflicts or if users rely on the previous API structure. Ensure that these changes are well documented, and version the SDK appropriately.
💡 Suggestions for Improvement
- Type Safety
  - The `handleFrame` function uses `Record<string, unknown>` for the `frame` parameter, which is too generic. Define a TypeScript interface for the frame structure to ensure type safety and improve readability.
- Error Handling
  - The `connect` method does not handle WebSocket connection errors robustly. For example, if the `onerror` event fires, the error is passed to the `reject` function, but there is no retry mechanism or detailed logging. Consider retrying connection attempts with exponential backoff.
- Logging
  - Add logging for critical events such as connection establishment, disconnection, session creation, and errors. This will aid debugging and monitoring in production.
- Documentation
  - The `MeshClient` class and its methods are only partially documented, and some critical details are missing. For example:
    - Explain the security implications of using `plaintextPeers`.
    - Document the expected structure of the `frame` object in `handleFrame`.
    - Provide examples of how to use the `MeshClient` class, including setting up encrypted and plaintext sessions.
- Thread Safety
  - The `knockPending` and `knockAccepted` maps are read and modified from multiple asynchronous contexts. JavaScript runs each event loop on a single thread, and Node.js worker threads do not share ordinary objects such as `Map` instances, so locks or atomic operations are not needed here. The real risk is that state can change across an `await`: re-check `knockPending` and `knockAccepted` after each `await` instead of relying on a value read before it.
- Backward Compatibility
  - If this feature is intended to replace existing functionality, make the transition smooth for existing users: provide migration guides and deprecate old APIs gracefully.
- Unit Tests
  - The PR does not include any unit tests for the `MeshClient` class. Given the complexity and security implications of this feature, comprehensive tests are essential. Focus on:
    - Validating the behavior of `plaintextPeers`.
    - Testing the `connect`, `disconnect`, and `send` methods under various scenarios, including error conditions.
    - Ensuring the `handleFrame` and `handleMessage` methods correctly process valid frames and reject invalid ones.
    - Verifying the behavior of the KNOCK mechanism, including timeouts and race conditions.
- Use of `crypto.randomUUID`
  - While `crypto.randomUUID` is a secure and convenient way to generate UUIDs, ensure that it is available in every target environment of the SDK (for example, browsers only expose it in secure contexts). If it is not, consider a polyfill or an alternative library such as `uuid`.
- WebSocket Lifecycle Management
  - The `disconnect` method does not handle cases where the WebSocket is already closed or in the process of closing. Add checks to handle these scenarios gracefully.
- Memory Management
  - Ensure that all resources, such as WebSocket connections and timers, are cleaned up when the `MeshClient` instance is destroyed or a session is closed. This prevents memory leaks.
- Default Values
  - Provide sensible defaults for optional `MeshClientOptions` properties such as `knockTimeout` and `plaintextPeers`. This makes the API easier to use and reduces the likelihood of errors.
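For the Type Safety and frame-validation points, a discriminated union plus a runtime type guard lets `handleFrame` work with typed frames instead of `Record<string, unknown>`. The field names `type`, `from`, and `ciphertext` come from the review; the frame type strings (`"message"`, `"knock"`, `"heartbeat"`) and the exact field set are assumptions for illustration, not taken from the Wire Protocol spec. A hand-written guard is shown to stay dependency-free; a schema validation library would serve the same purpose.

```typescript
// Hypothetical frame shapes; the real set is defined by the Wire Protocol spec.
interface MessageFrame {
  type: "message";
  from: string;
  ciphertext: string; // base64-encoded payload
}

interface KnockFrame {
  type: "knock";
  from: string;
}

interface HeartbeatFrame {
  type: "heartbeat";
}

type Frame = MessageFrame | KnockFrame | HeartbeatFrame;

function isNonEmptyString(v: unknown): v is string {
  return typeof v === "string" && v.length > 0;
}

// Runtime guard: narrows parsed JSON to Frame only if required fields are present.
function isFrame(value: unknown): value is Frame {
  if (typeof value !== "object" || value === null) return false;
  const f = value as Record<string, unknown>;
  switch (f.type) {
    case "message":
      return isNonEmptyString(f.from) && isNonEmptyString(f.ciphertext);
    case "knock":
      return isNonEmptyString(f.from);
    case "heartbeat":
      return true;
    default:
      return false; // unknown frame types are rejected, not passed through
  }
}

// Parse a raw WebSocket payload; malformed JSON or invalid frames yield null.
function parseFrame(raw: string): Frame | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  return isFrame(parsed) ? parsed : null;
}
```

After `parseFrame`, a `switch (frame.type)` in the handler is exhaustively typed, so accessing `frame.ciphertext` outside the `"message"` case becomes a compile error.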
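The Error Handling suggestion (retry with exponential backoff) could look roughly like the wrapper below. It is a sketch under assumptions: `connectWithRetry` and `backoffDelay` are hypothetical helpers, `connect` stands in for whatever function opens the WebSocket and resolves on `open`, and the "full jitter" variant (a random delay between zero and the exponential cap) is one common choice among several. The `onError` hook is where the logging the review asks for would go.

```typescript
// Hypothetical retry options; values are caller-supplied placeholders.
interface RetryOptions {
  maxAttempts: number; // total tries, including the first
  baseDelayMs: number;
  maxDelayMs: number;
  random?: () => number; // injectable for tests; defaults to Math.random
}

// Full-jitter exponential backoff: random delay in [0, min(max, base * 2^attempt)).
function backoffDelay(attempt: number, opts: RetryOptions): number {
  const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
  return Math.floor((opts.random ?? Math.random)() * cap);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries `connect` until it succeeds or attempts run out; rethrows the last error.
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  opts: RetryOptions,
  onError: (err: unknown, attempt: number) => void = () => {},
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      onError(err, attempt); // hook for logging each failed attempt
      if (attempt + 1 < opts.maxAttempts) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  throw lastError;
}
```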
Summary of Recommendations
- 🔴 Address the critical security issues, especially around plaintext communication, authentication, and replay attacks.
- 🟡 Ensure that the introduction of the `MeshClient` class does not break existing functionality or APIs.
- 💡 Improve type safety, error handling, logging, documentation, and test coverage.
- 💡 Consider thread safety and memory management for asynchronous operations and resource cleanup.
Let me know if you need further clarification or assistance!
🤖 AI Agent: security-scanner — Security Review of `MeshClient` Implementation
| ID | Issue | Severity | Recommendation |
|---|---|---|---|
| 1 | Prompt injection defense bypass | 🔴 CRITICAL | Avoid plaintext communication or enforce strong authentication for plaintext peers. |
| 2 | Policy engine circumvention | 🔴 CRITICAL | Enforce quorum-based KNOCK evaluation and validate intent fields. |
| 3 | Trust chain weaknesses | 🔴 CRITICAL | Validate wsFactory implementations and enforce secure TLS connections. |
| 4 | Credential exposure | 🔴 CRITICAL | Encrypt sensitive data in plaintext messages or avoid plaintext communication. |
| 5 | Sandbox escape | 🟠 HIGH | Validate wsFactory to prevent arbitrary code execution. |
| 6 | Deserialization attacks | 🟠 HIGH | Validate and sanitize JSON payloads against a strict schema. |
| 7 | Race conditions in KNOCK handling | 🟠 HIGH | Use a robust mechanism for managing KNOCK requests and ensure unique handling. |
| 8 | Supply chain risks | 🟡 MEDIUM | Pin dependencies and audit for vulnerabilities. |
| 9 | Lack of logging for security events | 🟡 MEDIUM | Implement structured logging for critical security events. |
General Recommendations
- Security Review: Conduct a thorough security review of the `plaintextPeers` feature and its implications for the overall security model.
- Testing: Implement unit and integration tests to validate the security of the `MeshClient` implementation, particularly around KNOCK handling and plaintext communication.
- Documentation: Clearly document the risks of using plaintext peers and custom WebSocket factories, and provide guidance on secure usage.
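Both reviews ask that plaintext mode be off by default and that its use be logged. A hypothetical sketch of how the `addPlaintextPeer` / `removePlaintextPeer` / `isPlaintextPeer` API from the PR description could be backed to meet that: an allowlist that starts empty and a logger hook called on every change and on every decision to skip encryption. `PlaintextPeerRegistry`, `shouldEncrypt`, and `SecurityLogger` are illustrative names, not the PR's implementation, and the sketch does not address peer authentication, which needs a separate mechanism.

```typescript
// Hypothetical logger hook; a real SDK would plug in its own structured logger.
type SecurityLogger = (event: string, detail: Record<string, string>) => void;

class PlaintextPeerRegistry {
  // Empty by default: every peer is encrypted unless explicitly opted out.
  private readonly peers = new Set<string>();

  constructor(private readonly log: SecurityLogger = () => {}) {}

  addPlaintextPeer(peerId: string): void {
    if (!this.peers.has(peerId)) {
      this.peers.add(peerId);
      this.log("plaintext_peer_added", { peerId });
    }
  }

  removePlaintextPeer(peerId: string): void {
    if (this.peers.delete(peerId)) {
      this.log("plaintext_peer_removed", { peerId });
    }
  }

  isPlaintextPeer(peerId: string): boolean {
    return this.peers.has(peerId);
  }

  // Send path: true means the payload must go through E2E encryption;
  // every decision to bypass encryption is logged.
  shouldEncrypt(peerId: string): boolean {
    if (this.peers.has(peerId)) {
      this.log("plaintext_send", { peerId });
      return false;
    }
    return true;
  }
}
```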
This PR introduces critical security risks that must be addressed before merging.
Adds MeshClient to TS SDK: WebSocket relay transport, plaintext peer compat, KNOCK pending queue (race fix), wsFactory hook for HTTPS_PROXY. Addresses patches #5, #9, #10, #11 from AzureClaw vendor.