feat(ts): MeshClient — relay transport with plaintext peers, KNOCK queue, wsFactory #1301

Merged
imran-siddique merged 1 commit into microsoft:main from imran-siddique:feat/ts-mesh-client on Apr 21, 2026

Conversation

@imran-siddique

Collaborator

Adds MeshClient to the TS SDK: WebSocket relay transport, plaintext peer compatibility, a KNOCK pending queue (race fix), and a wsFactory hook for HTTPS_PROXY. Addresses patches #5, #9, #10, #11 from AzureClaw vendor.

…eue, wsFactory

High-level mesh client for the TS SDK, addressing three AzureClaw
compatibility requirements:

- plaintextPeers: bypass E2E encryption for legacy peers (Rust
  controller uses base64(JSON), not Signal). addPlaintextPeer/
  removePlaintextPeer/isPlaintextPeer API.
- wsFactory: custom WebSocket constructor hook for HTTPS_PROXY
  CONNECT tunneling (Node 22 global fetch/undici quirk).
- KNOCK pending queue: when a message arrives for a peer with an
  in-flight KNOCK, await resolution instead of rejecting. Fixes
  the race condition documented in vendored patch #5.
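
The KNOCK pending queue described above could look roughly like the sketch below. This is an illustration only, not the code merged in this PR: `PendingKnocks`, `begin`, `settle`, and `waitFor` are hypothetical names, and `timeoutMs` stands in for a timeout such as the `knockTimeout` option. A message that arrives while a KNOCK is in flight awaits the outcome instead of being rejected, and the entry is removed whether the KNOCK settles or times out.

```typescript
// Hypothetical sketch: these names are illustrative, not the SDK's actual API.
type KnockOutcome = "accepted" | "rejected" | "timeout";

class PendingKnocks {
  private readonly pending = new Map<string, Promise<KnockOutcome>>();
  private readonly resolvers = new Map<string, (outcome: KnockOutcome) => void>();
  private readonly timeoutMs: number;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  /** Record an in-flight KNOCK for a peer; resolves to "timeout" if never settled. */
  begin(peerId: string): Promise<KnockOutcome> {
    const existing = this.pending.get(peerId);
    if (existing) return existing; // reuse the entry instead of overwriting its resolver

    let timer: ReturnType<typeof setTimeout> | undefined;
    const settled = new Promise<KnockOutcome>((resolve) => {
      this.resolvers.set(peerId, resolve);
      timer = setTimeout(() => resolve("timeout"), this.timeoutMs);
    }).finally(() => {
      // Runs on every outcome, so the maps cannot keep a stale entry.
      clearTimeout(timer);
      this.pending.delete(peerId);
      this.resolvers.delete(peerId);
    });
    this.pending.set(peerId, settled);
    return settled;
  }

  /** Called when the KNOCK response arrives. */
  settle(peerId: string, outcome: KnockOutcome): void {
    this.resolvers.get(peerId)?.(outcome);
  }

  /** A message for a peer with an in-flight KNOCK waits here; "none" means no KNOCK is pending. */
  async waitFor(peerId: string): Promise<KnockOutcome | "none"> {
    return this.pending.get(peerId) ?? "none";
  }

  get size(): number {
    return this.pending.size;
  }
}
```

A message handler would call `waitFor(peerId)` first and only process the message when the result is `"accepted"` or `"none"`.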

Also handles:
- Session reuse (returns existing session, no crash — patch #10)
- Buffer-based base64 (avoids stack overflow on >100KB — patch #9)
- Heartbeat sending
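
For context on the base64 item, a common cause of stack overflows on large payloads is spreading a byte array into a call such as `String.fromCharCode(...bytes)`; the source does not say which pattern the vendored code used. A minimal sketch of a Buffer-based alternative follows, with `toBase64` and `fromBase64` as hypothetical helper names rather than the SDK's API.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helpers: the names are illustrative, not taken from the PR.

/** Encode bytes as base64 without spreading them into function arguments. */
function toBase64(bytes: Uint8Array): string {
  // Buffer.from(bytes) copies the data; payload size does not affect stack depth.
  return Buffer.from(bytes).toString("base64");
}

/** Decode base64 text back into a standalone Uint8Array. */
function fromBase64(text: string): Uint8Array {
  return new Uint8Array(Buffer.from(text, "base64"));
}

// Usage: a 200 KB payload, larger than the >100KB case mentioned above.
const payload = new Uint8Array(200 * 1024).map((_, i) => i % 256);
const roundTripped = fromBase64(toBase64(payload));
console.log(roundTripped.length === payload.length);
```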

Clean-room: implements against Wire Protocol spec Sections 9, 10, 12.

Co-authored-by: Copilot <[email protected]>
@github-actions (bot) added the agent-mesh (agent-mesh package) and size/L (Large PR, < 500 lines) labels on Apr 21, 2026
@imran-siddique merged commit 5615412 into microsoft:main on Apr 21, 2026
78 of 80 checks passed
@imran-siddique deleted the feat/ts-mesh-client branch on April 21, 2026 at 23:53

@github-actions (bot) left a comment

🤖 AI Agent: code-reviewer

Code Review for MeshClient Implementation

🔴 CRITICAL: Security Issues

  1. Plaintext Peer Support (plaintextPeers)

    • Allowing plaintext communication for specific peers (plaintextPeers) introduces a significant security risk. This bypasses end-to-end encryption (E2EE) for certain peers, which could lead to sensitive data being transmitted in plaintext. This feature should be avoided unless absolutely necessary. If it must be included, it should be disabled by default, and its usage should be logged and monitored. Additionally, there should be a clear warning in the documentation about the risks of enabling this feature.
  2. Lack of Authentication for plaintextPeers

    • The plaintextPeers feature does not include any mechanism to authenticate the identity of the peer. This opens the door to impersonation attacks, where an attacker could spoof a plaintext peer's identity and intercept sensitive communications. If plaintext communication is unavoidable, implement a robust authentication mechanism to verify the identity of the peer.
  3. Potential Replay Attacks

    • The handleMessage function does not include any mechanism to prevent replay attacks. For encrypted messages, the messageNumber in the header could be used to detect and reject replayed messages. For plaintext messages, there is no equivalent mechanism, which makes them particularly vulnerable to replay attacks.
  4. KNOCK Race Condition

    • While the knockPending map and knockTimeout attempt to handle the race condition between KNOCK and the first message, there is no guarantee that the knockPending map will be cleared in all scenarios. For example, if the knockTimeout expires but the message is still processed, the knockPending entry might not be cleared, leading to memory leaks or incorrect behavior. Ensure that knockPending is always cleared, even in edge cases.
  5. Lack of Rate Limiting for KNOCK Messages

    • The handleKnock function does not implement any rate-limiting mechanism for incoming KNOCK messages. This could allow an attacker to flood the system with KNOCK requests, leading to a denial-of-service (DoS) attack. Implement rate limiting to mitigate this risk.
  6. Lack of Validation for Incoming Frames

    • The handleFrame and handleMessage functions do not perform sufficient validation on incoming frames. For example, the frame object is accessed directly without verifying the presence or type of required fields like type, from, ciphertext, etc. This could lead to runtime errors or even security vulnerabilities if maliciously crafted frames are received. Use a schema validation library to validate incoming frames.
  7. Potential WebSocket Hijacking

    • The wsFactory option allows the injection of a custom WebSocket implementation. While this is useful for scenarios like HTTPS_PROXY, it also opens up the possibility of a malicious WebSocket implementation being injected. Consider adding a mechanism to validate or restrict the wsFactory option to trusted implementations.
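
As one way to approach the rate-limiting point (item 5), the sketch below keeps a sliding window of KNOCK timestamps per peer. `KnockRateLimiter` and its parameters are hypothetical and not part of the PR; the clock is injectable so the behavior can be tested deterministically.

```typescript
// Hypothetical sketch of per-peer KNOCK rate limiting; not the SDK's actual API.
class KnockRateLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly maxPerWindow: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(maxPerWindow: number, windowMs: number, now: () => number = Date.now) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
    this.now = now;
  }

  /** Returns true if this KNOCK is within the peer's budget, false if it should be dropped. */
  allow(peerId: string): boolean {
    const t = this.now();
    // Keep only timestamps that are still inside the window.
    const recent = (this.hits.get(peerId) ?? []).filter((ts) => t - ts < this.windowMs);
    const allowed = recent.length < this.maxPerWindow;
    if (allowed) recent.push(t);
    this.hits.set(peerId, recent);
    return allowed;
  }
}
```

A production version would also need to evict idle peers from the map, otherwise the limiter itself becomes a memory-growth vector.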

🟡 WARNING: Potential Breaking Changes

  1. Public API Changes
    • The addition of MeshClient, MeshClientOptions, MeshSession, and WebSocketFactory to the public API could potentially break existing code if there are naming conflicts or if users are relying on the previous API structure. Ensure that these changes are well-documented and consider versioning the SDK appropriately.

💡 Suggestions for Improvement

  1. Type Safety

    • The handleFrame function uses Record<string, unknown> for the frame parameter, which is too generic. Define a TypeScript interface for the frame structure to ensure type safety and improve code readability.
  2. Error Handling

    • The connect method does not handle WebSocket connection errors robustly. For example, if the onerror event is triggered, the error is passed to the reject function, but there is no retry mechanism or detailed logging. Consider implementing a retry mechanism with exponential backoff for connection attempts.
  3. Logging

    • Add logging for critical events such as connection establishment, disconnection, session creation, and errors. This will aid in debugging and monitoring the system in production.
  4. Documentation

    • The MeshClient class and its methods are partially documented, but some critical details are missing. For example:
      • Explain the security implications of using plaintextPeers.
      • Document the expected structure of the frame object in handleFrame.
      • Provide examples of how to use the MeshClient class, including setting up encrypted and plaintext sessions.
  5. Thread Safety

    • The knockPending and knockAccepted maps are accessed and modified in multiple asynchronous contexts. JavaScript's single-threaded event loop prevents simultaneous access, but operations can still interleave across await points, so consider serializing updates per peer (for example with a promise-based lock). Note that Node.js worker threads do not share ordinary Map instances, so each worker would hold its own copy of this state.
  6. Backward Compatibility

    • If this feature is intended to replace existing functionality, ensure that the transition is smooth for existing users. Provide migration guides and deprecate old APIs gracefully.
  7. Unit Tests

    • The PR does not include any unit tests for the MeshClient class. Given the complexity and security implications of this feature, comprehensive tests are essential. Focus on:
      • Validating the behavior of plaintextPeers.
      • Testing the connect, disconnect, and send methods under various scenarios, including error conditions.
      • Ensuring the handleFrame and handleMessage methods correctly process valid frames and reject invalid ones.
      • Verifying the behavior of the KNOCK mechanism, including timeouts and race conditions.
  8. Use of crypto.randomUUID

    • While crypto.randomUUID is a secure and convenient way to generate UUIDs, ensure that it is supported in all target environments of the SDK. If not, consider using a polyfill or an alternative library like uuid.
  9. WebSocket Lifecycle Management

    • The disconnect method does not handle cases where the WebSocket is already closed or in the process of closing. Add checks to handle these scenarios gracefully.
  10. Memory Management

    • Ensure that all resources, such as WebSocket connections and timers, are properly cleaned up when the MeshClient instance is destroyed or when a session is closed. This will prevent memory leaks.
  11. Default Values

    • Provide sensible default values for optional MeshClientOptions properties like knockTimeout and plaintextPeers. This will make the API easier to use and reduce the likelihood of errors.
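
The retry suggestion (item 2) could be sketched as below. `backoffDelayMs` and `connectWithRetry` are hypothetical helpers, and the default base delay, cap, and attempt count are arbitrary assumptions rather than values from the PR.

```typescript
// Hypothetical sketch of retry with exponential backoff; defaults are assumptions.

/** Delay before the next attempt: base * 2^attempt, capped at capMs. */
function backoffDelayMs(attempt: number, baseMs = 250, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

/** Calls connectOnce until it succeeds or maxAttempts is reached, sleeping between failures. */
async function connectWithRetry<T>(
  connectOnce: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connectOnce();
    } catch (err) {
      lastError = err;
      // No sleep after the final failed attempt.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

Adding random jitter to the delay is a common refinement when many clients may reconnect at once.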

Summary of Recommendations

  • 🔴 Address the critical security issues, especially around plaintext communication, authentication, and replay attacks.
  • 🟡 Ensure that the introduction of the MeshClient class does not break existing functionality or APIs.
  • 💡 Improve type safety, error handling, logging, documentation, and test coverage.
  • 💡 Consider thread safety and memory management for asynchronous operations and resource cleanup.

Let me know if you need further clarification or assistance!

@github-actions

🤖 AI Agent: security-scanner

Security Review of MeshClient Implementation

The MeshClient implementation introduces several new features, including plaintext peer compatibility, a WebSocket factory for proxy support, and a KNOCK pending queue. While these features are useful, they also introduce potential security risks. Below is a detailed analysis of the code with identified vulnerabilities, their severity, attack vectors, and recommended fixes.


🔴 1. Prompt Injection Defense Bypass

Issue: The plaintextPeers feature allows certain peers to bypass end-to-end encryption (E2EE) and communicate in plaintext. This creates a significant attack surface for malicious actors who can masquerade as a "plaintext peer" to intercept or inject malicious payloads.

Attack Vector:

  • An attacker could spoof a peerId to match one in the plaintextPeers list, allowing them to send or receive unencrypted messages.
  • The plaintextPeers feature undermines the security guarantees of E2EE, making the system vulnerable to man-in-the-middle (MITM) attacks.

Recommendation:

  • Avoid plaintext communication entirely. If legacy support is necessary, ensure that plaintext peers are authenticated using a secure mechanism (e.g., cryptographic signatures or certificates).
  • Log and monitor all plaintext communication for potential misuse.
  • Add strict validation for peerId to ensure it cannot be spoofed.
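
One possible shape for authenticating plaintext peers is a per-peer pre-shared key with an HMAC tag over each payload. This is only a sketch under that assumption: the PR does not describe any such mechanism, and `signPayload` and `verifyPayload` are hypothetical names. It authenticates the sender and the payload but provides no confidentiality.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical sketch: assumes each plaintext peer has a pre-shared key.

/** HMAC-SHA256 over the peer ID and payload, returned as base64. */
function signPayload(sharedKey: Uint8Array, peerId: string, payload: string): string {
  // Assumes peerId never contains "\n", so the separator is unambiguous.
  return createHmac("sha256", sharedKey)
    .update(peerId)
    .update("\n")
    .update(payload)
    .digest("base64");
}

/** Constant-time check that the tag matches the claimed peer ID and payload. */
function verifyPayload(sharedKey: Uint8Array, peerId: string, payload: string, tag: string): boolean {
  const expected = Buffer.from(signPayload(sharedKey, peerId, payload), "base64");
  const actual = Buffer.from(tag, "base64");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

Binding the peer ID into the tag means a message signed for one peer cannot be replayed under another peer's ID, although replay under the same ID would still need a counter or nonce.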

🔴 2. Policy Engine Circumvention

Issue: The knockHandlers array allows dynamic registration of handlers to evaluate KNOCK requests. However, the logic for evaluating KNOCK requests (handleKnock) does not enforce a minimum number of approvals or a quorum. A single handler rejecting a request will deny it, but there is no guarantee that all handlers are executed securely.

Attack Vector:

  • A malicious or misconfigured handler could approve unauthorized KNOCK requests, allowing unauthorized sessions to be established.
  • An attacker could exploit this to bypass access control policies.

Recommendation:

  • Enforce a quorum-based policy for KNOCK evaluation (e.g., require a majority or all handlers to approve).
  • Log all KNOCK evaluations and their outcomes for auditing.
  • Validate intent fields in KNOCK requests to ensure they conform to expected schemas.
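
The "all handlers must approve" variant of this recommendation might look like the following sketch. `KnockRequest`, `KnockHandler`, and `evaluateKnock` are hypothetical types and names, not the actual `handleKnock` implementation; the sketch denies by default and treats a throwing handler as a rejection.

```typescript
// Hypothetical sketch of unanimous KNOCK evaluation; not the SDK's handleKnock.
type KnockRequest = { from: string; intent?: string };
type KnockHandler = (request: KnockRequest) => boolean | Promise<boolean>;

/** Accepts only if at least one handler is registered and every handler approves. */
async function evaluateKnock(
  handlers: readonly KnockHandler[],
  request: KnockRequest,
): Promise<boolean> {
  if (handlers.length === 0) return false; // deny by default
  for (const handler of handlers) {
    try {
      if (!(await handler(request))) return false;
    } catch {
      return false; // a failing handler must not be read as approval
    }
  }
  return true;
}
```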

🔴 3. Trust Chain Weaknesses

Issue: The wsFactory option allows users to provide a custom WebSocket implementation. While this is useful for proxy support, it introduces the risk of using an insecure WebSocket implementation that does not validate TLS certificates or is vulnerable to man-in-the-middle attacks.

Attack Vector:

  • A malicious actor could provide a compromised wsFactory that disables TLS verification or introduces vulnerabilities, allowing them to intercept or manipulate WebSocket traffic.

Recommendation:

  • Enforce strict validation of the wsFactory implementation to ensure it uses secure TLS connections.
  • Provide a secure default implementation and warn users about the risks of using custom WebSocket factories.
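
A wrapper can enforce part of this recommendation by refusing non-TLS relay URLs before delegating to a custom factory. In the sketch below, the `WebSocketFactory` signature and the `WebSocketLike` interface are assumptions (the PR exports a `WebSocketFactory` type, but its shape is not shown here), and `requireSecureUrl` is a hypothetical helper. It cannot verify that the wrapped implementation actually validates certificates.

```typescript
// Hypothetical sketch: the factory signature below is an assumption, not the SDK's type.
interface WebSocketLike {
  send(data: string): void;
  close(): void;
}
type WebSocketFactory = (url: string) => WebSocketLike;

/** Wraps a factory so that only wss:// URLs are ever passed to it. */
function requireSecureUrl(inner: WebSocketFactory): WebSocketFactory {
  return (url) => {
    const protocol = new URL(url).protocol;
    if (protocol !== "wss:") {
      throw new Error(`refusing non-TLS relay URL (protocol ${protocol})`);
    }
    return inner(url);
  };
}
```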

🔴 4. Credential Exposure

Issue: The plaintextPeers feature uses base64 encoding for JSON payloads in plaintext communication. Base64 encoding is not encryption and does not provide any confidentiality. Sensitive data in plaintext messages could be exposed to eavesdroppers.

Attack Vector:

  • An attacker intercepting plaintext messages could easily decode the base64-encoded payload and access sensitive information.

Recommendation:

  • Do not use plaintext communication for sensitive data. If plaintext communication is unavoidable, ensure that sensitive data is encrypted at the application layer before transmission.

🟠 5. Sandbox Escape

Issue: The wsFactory option allows arbitrary code execution by accepting a user-provided function to create WebSocket connections. This could be exploited to execute malicious code within the application.

Attack Vector:

  • An attacker with access to the wsFactory option could inject malicious code that compromises the host environment.

Recommendation:

  • Validate the wsFactory function to ensure it adheres to a strict interface and does not execute arbitrary code.
  • Consider restricting the use of wsFactory to trusted contexts or environments.

🟠 6. Deserialization Attacks

Issue: The handleMessage method deserializes JSON payloads from plaintext messages without validating their structure or content.

Attack Vector:

  • An attacker could craft a malicious JSON payload that exploits vulnerabilities in the JSON parser or causes unexpected behavior in the application.

Recommendation:

  • Validate and sanitize all deserialized JSON payloads against a strict schema before processing.
  • Use a secure JSON parsing library that mitigates common deserialization vulnerabilities.
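
Schema validation for the base64(JSON) plaintext format could be sketched as follows. The `PlaintextMessage` shape with `type` and `body` fields is invented for illustration, since the real payload structure is not shown in this PR, and `parsePlaintextPayload` is a hypothetical helper. Anything that fails to decode or does not match the expected shape is rejected as `null`.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical message shape: the real plaintext payload fields are not shown in the PR.
interface PlaintextMessage {
  type: string;
  body: string;
}

/** Decodes base64(JSON) and returns the message only if it matches the expected shape. */
function parsePlaintextPayload(encoded: string): PlaintextMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(Buffer.from(encoded, "base64").toString("utf8"));
  } catch {
    return null; // not valid JSON after decoding
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) return null;
  const record = value as Record<string, unknown>;
  if (typeof record.type !== "string" || typeof record.body !== "string") return null;
  // Copy only the known fields so unexpected properties are dropped.
  return { type: record.type, body: record.body };
}
```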

🟠 7. Race Conditions in KNOCK Handling

Issue: The knockPending map and its associated KnockResolver objects are vulnerable to race conditions. Multiple KNOCK requests from the same peer could overwrite each other's resolve functions, leading to inconsistent behavior.

Attack Vector:

  • An attacker could send multiple KNOCK requests in rapid succession, potentially causing the knockPending map to behave unpredictably or drop valid requests.

Recommendation:

  • Use a more robust mechanism for managing pending KNOCK requests, such as a queue or a more sophisticated state machine.
  • Ensure that each KNOCK request is uniquely identified and handled independently.

🟡 8. Supply Chain Risks

Issue: The implementation relies on external dependencies (e.g., WebSocket, crypto, and potentially user-provided wsFactory implementations). If these dependencies are compromised, the security of the entire system could be at risk.

Attack Vector:

  • An attacker could exploit vulnerabilities in third-party dependencies or introduce malicious code via dependency confusion or typosquatting.

Recommendation:

  • Use dependency pinning to lock versions of third-party libraries.
  • Regularly audit dependencies for known vulnerabilities using tools like npm audit or Snyk.
  • Consider implementing runtime integrity checks for critical dependencies.

🟡 9. Lack of Logging for Security Events

Issue: The implementation lacks comprehensive logging for critical security events, such as KNOCK evaluations, session establishment, and plaintext message transmissions.

Attack Vector:

  • Without proper logging, it would be difficult to detect or investigate security incidents, such as unauthorized access or data breaches.

Recommendation:

  • Implement structured logging for all security-sensitive events, including KNOCK evaluations, session establishment, and plaintext message handling.
  • Ensure logs are securely stored and monitored for suspicious activity.

Summary of Findings

| ID | Issue | Severity | Recommendation |
|----|-------|----------|----------------|
| 1 | Prompt injection defense bypass | 🔴 CRITICAL | Avoid plaintext communication or enforce strong authentication for plaintext peers. |
| 2 | Policy engine circumvention | 🔴 CRITICAL | Enforce quorum-based KNOCK evaluation and validate intent fields. |
| 3 | Trust chain weaknesses | 🔴 CRITICAL | Validate wsFactory implementations and enforce secure TLS connections. |
| 4 | Credential exposure | 🔴 CRITICAL | Encrypt sensitive data in plaintext messages or avoid plaintext communication. |
| 5 | Sandbox escape | 🟠 HIGH | Validate wsFactory to prevent arbitrary code execution. |
| 6 | Deserialization attacks | 🟠 HIGH | Validate and sanitize JSON payloads against a strict schema. |
| 7 | Race conditions in KNOCK handling | 🟠 HIGH | Use a robust mechanism for managing KNOCK requests and ensure unique handling. |
| 8 | Supply chain risks | 🟡 MEDIUM | Pin dependencies and audit for vulnerabilities. |
| 9 | Lack of logging for security events | 🟡 MEDIUM | Implement structured logging for critical security events. |

General Recommendations

  1. Security Review: Conduct a thorough security review of the plaintextPeers feature and its implications for the overall security model.
  2. Testing: Implement unit and integration tests to validate the security of the MeshClient implementation, particularly around KNOCK handling and plaintext communication.
  3. Documentation: Clearly document the risks associated with using plaintext peers and custom WebSocket factories, and provide guidance on secure usage.

This PR introduces critical security risks that must be addressed before merging.

MohammadHaroonAbuomar pushed a commit to MohammadHaroonAbuomar/agt-acs that referenced this pull request Jun 1, 2026
…eue, wsFactory (microsoft#1301)
