Conversation

@terwey
Collaborator

@terwey terwey commented Nov 10, 2025

Reviewer

#105 is a known problem; no need to tell me about it in a review.
Location: engine/engine.go:117-124
ProduceActiveDeals pessimistically reserves the entire rate limit quota before knowing how many bots need to be polled. This causes the periodic bot polling to stall when any other workflow has active reservations, and permanently skips bots when the bot count exceeds the rate limit.

claude and others added 30 commits November 10, 2025 08:41
Fixes the error that occurs during vault unseal when a user-defined venue
with the same (type, wallet) already exists:

  UNIQUE constraint failed: venues.type, venues.wallet (2067)

Root cause:
- The venues table has a UNIQUE constraint on (type, wallet)
- During unseal, EnsureDefaultVenueWallet tries to upsert a venue with
  id="hyperliquid:default", type="hyperliquid", and the runtime wallet
- If a user-defined venue (e.g., id="hyperliquid:primary") with the same
  type and wallet already exists, the INSERT fails because the UpsertVenue
  query only handles conflicts on the id primary key, not on (type, wallet)

Solution:
- Add GetVenueByTypeAndWallet SQL query to check for existing venues by
  (type, wallet) combination
- Update upsertDefaultVenueLocked to check if a venue with the same
  (type, wallet) already exists before attempting the upsert
- If a venue with the same (type, wallet) exists but has a different ID,
  skip creating the default venue and use the existing one instead

This prevents duplicate venues with the same type and wallet, which would
violate the unique constraint and cause the application to crash on unseal.
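
A minimal sketch of that guard, using placeholder types for the sqlc-generated helpers (the real GetVenueByTypeAndWallet and UpsertVenue signatures in the PR will differ):

```go
package venue

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

// Venue and venueQueries are placeholders for the sqlc-generated types
// referenced in the commit message; the real signatures may differ.
type Venue struct {
	ID, Type, Wallet string
}

type venueQueries interface {
	GetVenueByTypeAndWallet(ctx context.Context, venueType, wallet string) (Venue, error)
	UpsertVenue(ctx context.Context, v Venue) error
}

// ensureDefaultVenue mirrors the guard described above: if some venue
// already owns this (type, wallet) pair under a different ID, skip the
// default upsert instead of tripping the UNIQUE constraint.
func ensureDefaultVenue(ctx context.Context, q venueQueries, wallet string) error {
	existing, err := q.GetVenueByTypeAndWallet(ctx, "hyperliquid", wallet)
	switch {
	case err == nil && existing.ID != "hyperliquid:default":
		// A user-defined venue already covers this wallet; reuse it.
		return nil
	case err != nil && !errors.Is(err, sql.ErrNoRows):
		return fmt.Errorf("lookup venue by (type, wallet): %w", err)
	}
	// No conflicting row exists (or the default row is the match), so
	// the upsert on the id primary key is safe.
	return q.UpsertVenue(ctx, Venue{ID: "hyperliquid:default", Type: "hyperliquid", Wallet: wallet})
}
```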
Implements both server-side ERROR logging and UI notifications for
3commas API errors that occur during runtime (e.g., subscription errors).

Changes:
1. Backend (Go):
   - Add SystemError event type to OrderLogEntryType
   - Add ErrorMessage field to StreamEvent for system errors
   - Update writeSSEFrame to handle system_error events specially
   - Change ProduceActiveDeals error log from DEBUG to ERROR level
   - Publish errors through StreamController for UI consumption

2. Frontend (TypeScript/React):
   - Add 'system_error' to ORDER_EVENT_TYPES in orderStream.ts
   - Create useSystemErrors hook to listen for system_error SSE events
   - Display errors as toast notifications using sonner
   - Integrate hook into App component

Example error now shows:
- Server console: ERROR level log with full error details
- Web UI: Toast notification with message like "The request type 'read'
  is not available with your current subscription plan"

Fixes the issue where 3commas API errors were only visible in DEBUG
logs and users saw no indication of problems in the UI.
Added two new endpoints:
- GET /system/status: Polling endpoint for system health (3commas API, venues, engine)
- GET /stream/system: SSE stream for system-level events (errors, warnings, info, logs)

Schemas added:
- SystemStatus: Complete system health snapshot
- SystemEvent: Individual system event with level, source, message, details

These endpoints properly separate system-level concerns from order-level
concerns, fixing the design issue where system errors were being pushed
through the order stream.

Next step: Regenerate Go and TypeScript types from this spec.
…tream/system

This refactor properly separates system-level events from order-level events,
fixing the design issue where system errors were incorrectly pushed through
the order stream.

Backend Changes:
- Added SystemStreamController for managing system event subscribers
- Added SystemStatusTracker for tracking system health state
- Implemented GET /system/status (polling endpoint for health checks)
- Implemented GET /stream/system (SSE endpoint for system events)
- Added RECOMMA_SYSTEM_STREAM_MIN_LEVEL config (default: info)
- System events have levels: debug, info, warn, error
- System events have source field (e.g., "3commas", "hyperliquid", "engine")
- 3commas API errors now properly logged as ERROR and published to system stream

Frontend Changes:
- Updated useSystemErrors hook to listen on /stream/system instead of /stream/orders
- Hook now handles system_error, system_warn, and system_info events
- Toast notifications display with appropriate severity and duration

Cleanup:
- Removed SystemError from OrderLogEntryType
- Removed ErrorMessage field from StreamEvent
- Removed system_error from ORDER_EVENT_TYPES in orderStream.ts
- Removed system_error handling logic from writeSSEFrame

The system now properly separates concerns:
- /sse/orders: Order-specific events only
- /stream/system: System-level events only
- /system/status: Point-in-time system health for polling/health checks

This architecture allows future extensions like log streaming, metrics,
and health monitoring without polluting the order stream.
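
For illustration, a level-filtering fan-out controller along these lines would satisfy the description above; the struct, field names, and buffer-full handling are assumptions, not the PR's actual SystemStreamController:

```go
package systemstream

import (
	"log/slog"
	"sync"
	"time"
)

// Event mirrors the fields listed above; the exact struct in the PR may
// differ.
type Event struct {
	Level   slog.Level
	Source  string // e.g. "3commas", "hyperliquid", "engine"
	Message string
	At      time.Time
}

// Controller is a minimal fan-out: events below the configured minimum
// level are dropped, and a full subscriber buffer skips that subscriber
// rather than blocking the publisher.
type Controller struct {
	mu       sync.Mutex
	minLevel slog.Level // from RECOMMA_SYSTEM_STREAM_MIN_LEVEL
	subs     map[string]chan Event
}

func (c *Controller) Publish(ev Event) {
	if ev.Level < c.minLevel {
		return
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	for id, ch := range c.subs {
		select {
		case ch <- ev:
		default:
			slog.Warn("system stream subscriber buffer full", slog.String("subscriber", id))
		}
	}
}
```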
Add missing 500 response to /stream/system endpoint in OpenAPI spec
The oapi-codegen generated inline anonymous structs rather than named
types for SystemStatus fields. Updated system_status.go to match the
generated types exactly.
System errors can occur before vault unseal (e.g., 3commas API errors
during startup). The /stream/system endpoint must be unauthenticated
so the frontend can receive these errors even when the vault is sealed.

Added security: [] to override the global SessionCookie requirement.
Added console logging to help diagnose SSE connection issues:
- Log when connecting to /stream/system
- Log when connection opens successfully
- Log connection errors
- Log when disconnecting

This will help identify if the SSE connection is failing silently.
The error occurs AFTER vault unseal when the user is authenticated,
not before. Making the endpoint unauthenticated was unnecessary and
could leak sensitive system information.

The real issue must be elsewhere - need to investigate why the SSE
events aren't reaching the frontend despite authentication being valid.
The useSystemErrors hook was being called unconditionally when App
mounted, causing it to try connecting to /stream/system before the
user was authenticated. Since the endpoint requires authentication,
the connection failed with 401/403.

Now the hook is only called after vault state is confirmed to be
unsealed, ensuring the session cookie is present and valid.
Fixed Rules of Hooks violation by:
- Calling useSystemErrors() unconditionally at top level
- Passing 'enabled' parameter based on vault unsealed state
- Hook internally skips connection when disabled

This maintains the behavior (only connect when unsealed) while
following React's rules that hooks must be called unconditionally.
Added detailed logging to StreamSystemEvents handler to diagnose
connection failures:
- Log when handler is called
- Log if systemStream is nil
- Log subscription attempts
- Log each event written to SSE
- Log context cancellation and channel closure

This will show exactly what's happening server-side.
The /stream/system endpoint was defined in OpenAPI and implemented in
the handler, but the /stream/ prefix was never registered in the HTTP
router, causing client connections to fail immediately.

Added /stream/ and /stream route registration to rootMux.
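
The missing wiring would look roughly like the following; the mux and handler names here are hypothetical:

```go
package server

import "net/http"

// newRootMux sketches the registration this commit adds. Without the
// two /stream entries the /stream/system handler existed but was
// unreachable, which produced the immediate connection failures
// described above.
func newRootMux(apiHandler http.Handler) *http.ServeMux {
	rootMux := http.NewServeMux()
	rootMux.Handle("/stream/", apiHandler)
	rootMux.Handle("/stream", apiHandler)
	return rootMux
}
```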
System events can fire before clients connect (e.g., 3commas error at
startup). Added message history buffer to SystemStreamController:

Backend:
- Keep last 50 events or 5 minutes of history (whichever is smaller)
- When new client connects, immediately flush history to them
- Continue streaming new events normally
- Prune history by both size and age

Frontend:
- Add console.error/warn logging with full event details
- Errors persist in console even after toast dismisses
- Log includes source, timestamp, and details for debugging

Now late-joining clients receive recent errors immediately instead of
waiting for the next occurrence.
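
One way to implement the size-and-age pruning described above, as a sketch; the names and payload shape are placeholders rather than the PR's actual buffer:

```go
package systemstream

import (
	"sync"
	"time"
)

type bufferedEvent struct {
	at time.Time
	// payload fields omitted
}

// history keeps the policy described above: at most maxLen events and
// nothing older than maxAge, pruned on every append.
type history struct {
	mu     sync.Mutex
	maxLen int           // e.g. 50
	maxAge time.Duration // e.g. 5 * time.Minute
	events []bufferedEvent
}

func (h *history) Append(ev bufferedEvent) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.events = append(h.events, ev)
	// Drop events older than the age window.
	cutoff := time.Now().Add(-h.maxAge)
	i := 0
	for i < len(h.events) && h.events[i].at.Before(cutoff) {
		i++
	}
	h.events = h.events[i:]
	// Then enforce the size cap, keeping the newest entries.
	if len(h.events) > h.maxLen {
		h.events = h.events[len(h.events)-h.maxLen:]
	}
}

// Snapshot returns what a newly connected subscriber is replayed before
// live events resume.
func (h *history) Snapshot() []bufferedEvent {
	h.mu.Lock()
	defer h.mu.Unlock()
	out := make([]bufferedEvent, len(h.events))
	copy(out, h.events)
	return out
}
```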
Add comprehensive logging to diagnose SSE event delivery:
- Log when subscribers connect and how many history events they receive
- Log each history event being sent to new subscribers
- Log when events are published (level, source, message, subscriber count)
- Log when events are filtered by level
- Log when events are successfully sent to subscribers

This will help identify:
1. Whether errors are being published to the stream
2. Whether the SSE connection is being established
3. Whether event history is being delivered to late-joining clients
4. Whether events are reaching subscribers
Add logging to diagnose why toast notifications don't appear for
early system errors. This will help identify if:
1. toast.error is being called
2. toast.error returns successfully or throws errors
3. The Sonner toast system is properly initialized

Based on browser console logs, we know:
- SSE connection works
- console.error for system errors works
- But toast.error doesn't show UI for first error (5s after unseal)
- toast.error DOES show UI for second error (5min later)

This suggests a timing/initialization issue with the toast system.
The issue: toast notifications weren't showing for errors that occurred
shortly after unseal because toast.error() was being called before the
Sonner Toaster component was fully mounted and ready.

Solution: Track when Toaster is ready and only connect to SSE after both:
1. Vault is unsealed
2. Toaster component is mounted (detected via DOM element)

Changes:
- Added useToasterReady hook to detect Sonner toast container in DOM
- Updated useSystemErrors to accept toasterReady parameter
- Only connect SSE when both isUnsealed AND toasterReady are true
- Server-side event history buffer (5min) ensures no events are lost

This leverages the existing server-side buffering instead of adding
client-side event queuing. Events that occur before SSE connects are
automatically delivered from the server's history buffer once the
connection is established.
The proper solution to ensure Sonner is ready:
1. Call toast.info("System monitoring active") when enabled becomes true
2. This forces Sonner to create its DOM portal
3. Wait for [data-sonner-toaster] element to exist in DOM (polls every 50ms)
4. Once detected, connect to SSE
5. Server's 5-minute history buffer delivers any missed events

This ensures:
- Toast system is fully initialized before SSE connection
- No events are lost (server buffers them)
- User gets feedback that system monitoring is active
- Error toasts will display immediately when they arrive

Removed useToasterReady hook - initialization is now handled inside
useSystemErrors via the toast.info() call followed by DOM polling.
The issue: EventSource was created inside an async function that
returned a cleanup callback, but React registered the effect's cleanup
before the async operation completed. This caused the EventSource to
be orphaned - created but never accessible for cleanup.

Result: SSE connection never opened (onopen never fired).

Fix:
- Move eventSource variable to effect scope
- Async initializeSSE() assigns to the outer eventSource variable
- Cleanup function accesses eventSource via closure
- This works even if cleanup runs before async completes

Also fixed timeout clearing:
- Store both interval and timeout handles
- Clear BOTH when element is found (prevents duplicate log)
- Clear interval when timeout fires

Now the EventSource lifecycle is properly managed by React's effect
cleanup, regardless of async timing.
Add detailed logging for every step of SSE lifecycle:
- EventSource creation with readyState
- All event listener registrations
- Generic onmessage handler to catch ALL messages
- Detailed onopen handler with readyState and URL
- Detailed onerror handler with readyState
- Log raw event data for system_error, system_warn, system_info
- Log parsed event data
- Log cleanup with readyState

This will show exactly:
1. If EventSource is created (readyState: 0=CONNECTING, 1=OPEN, 2=CLOSED)
2. If connection opens (onopen fires)
3. If ANY messages arrive (onmessage)
4. If specific events arrive (system_error, etc)
5. If errors occur (onerror with readyState)
6. Event data content for debugging
Add try/catch and logging around toast.error() and toast.warning() calls
to diagnose why the first error doesn't show a toast but subsequent ones do.

Logs will show:
- When toast.error() is called
- What it returns (toast ID or undefined)
- If it throws an error

This will reveal if:
1. toast.error() is being called
2. It executes successfully
3. It returns a value (meaning toast was created)
4. Or if it silently fails
The StreamSystemEvents handler was creating a pipe and goroutine but not
writing anything until an event arrived. This prevented the HTTP response
headers from being sent, causing the browser's EventSource to stay stuck
in readyState 0 (CONNECTING) indefinitely.

The fix writes an SSE comment (`: connected\n\n`) immediately after
creating the pipe, which:
1. Forces Go's HTTP server to flush response headers
2. Completes the SSE handshake
3. Triggers browser's onopen callback
4. Transitions EventSource to readyState 1 (OPEN)

This is a standard SSE pattern for immediate connection establishment.
The previous fix wrote an initial SSE comment, but the HTTP response
was never flushed to the client because io.Copy() doesn't flush
automatically.

This commit replaces the io.Pipe approach with a custom response type
that:
1. Directly implements VisitStreamSystemEventsResponse
2. Writes initial SSE comment (`: connected\n\n`)
3. **Explicitly calls http.Flusher.Flush()** to send headers to client
4. Flushes after every event write

This ensures the browser's EventSource receives the response headers
immediately, completing the SSE handshake and triggering the onopen
callback.

Root cause: io.Copy buffers data and only flushes when the buffer is
full or the stream closes. For SSE, we need immediate flushing after
each message to maintain real-time communication.
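
A condensed sketch of the flush-early, flush-often pattern this commit describes, using only the standard library; the real handler goes through oapi-codegen's generated response types, and the fixed event name below is a simplification:

```go
package api

import (
	"fmt"
	"net/http"
)

// streamSystemEvents flushes the `: connected` comment immediately so
// the browser's EventSource fires onopen, then flushes after every
// frame so delivery stays real-time.
func streamSystemEvents(w http.ResponseWriter, r *http.Request, events <-chan string) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	fmt.Fprint(w, ": connected\n\n") // completes the SSE handshake
	flusher.Flush()                  // pushes headers + comment to the client now

	for {
		select {
		case <-r.Context().Done():
			return
		case ev, open := <-events:
			if !open {
				return
			}
			fmt.Fprintf(w, "event: system_error\ndata: %s\n\n", ev)
			flusher.Flush() // no buffering until the stream closes
		}
	}
}
```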
Remove verbose debug logging that was added during SSE handshake troubleshooting:

Frontend (useSystemErrors.ts):
- Removed toast system initialization checks
- Removed waitForToaster function
- Removed debug console.log statements
- Kept only essential error logging

Backend (handler.go):
- Removed per-request info logs ("StreamSystemEvents called", etc.)
- Removed per-event verbose logging
- Kept error and warning logs for actual issues

Backend (system_stream.go):
- Removed per-event debug logging in history sending
- Removed per-subscriber per-event debug logs
- Removed verbose publish info logs
- Kept operational logs (subscriber registration) and warnings

The SSE connection now works correctly with minimal, focused logging.
claude and others added 15 commits November 12, 2025 01:07
- Remove tc.Bot.Name (field not used in tests)
- Change tc.Deal.Id from int64 to int to match SDK types
- Align with type usage patterns from other tests in codebase
Previously, when the rate limit window reset (every 60 seconds), waiting
workflows in the queue were not notified. This caused them to remain stuck
in the queue indefinitely, even though capacity was now available.

The bug manifested as:
- Deal workflows would reserve and get queued
- produce:all-bots would consume all 5 slots and release
- Window would reset after 60 seconds
- produce:all-bots would reserve again immediately
- Deal workflows remained stuck in queue forever

Fix: Call tryGrantWaiting() after resetting the window in resetWindowIfNeeded()
to wake up and grant reservations to queued workflows that now have capacity.

This ensures fair FIFO processing and prevents workflow starvation.
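
A simplified sketch of the reset path after this fix; field and method names follow the commit message, but the real limiter is more involved:

```go
package ratelimit

import (
	"sync"
	"time"
)

// limiter is a pared-down stand-in; only what is needed to show the
// reset-then-grant ordering is included.
type limiter struct {
	mu          sync.Mutex
	limit       int
	window      time.Duration
	windowStart time.Time
	consumed    int
	waiting     []chan struct{} // FIFO queue of blocked Reserve calls
}

// resetWindowIfNeeded applies the fix: after zeroing the window
// counters it immediately re-evaluates the wait queue instead of
// returning, so queued workflows pick up the freed capacity. Callers
// hold l.mu.
func (l *limiter) resetWindowIfNeeded(now time.Time) {
	if now.Sub(l.windowStart) < l.window {
		return
	}
	l.windowStart = now
	l.consumed = 0
	l.tryGrantWaiting()
}

// tryGrantWaiting wakes queued workflows in FIFO order while capacity
// remains (simplified to one slot per waiter).
func (l *limiter) tryGrantWaiting() {
	for len(l.waiting) > 0 && l.consumed < l.limit {
		close(l.waiting[0])
		l.waiting = l.waiting[1:]
		l.consumed++
	}
}
```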
fix: hyperliquid wire format expects 8 decimals for price and size
This refactor addresses the critical design flaw where waiting workflows
could not use freed capacity from AdjustDown/SignalComplete calls.

**Problem:**
Previously, tryGrantWaiting() would return immediately if activeReservation
was non-nil, defeating the entire "early release" pattern:
- produce:all-bots reserves 5 slots
- produce:all-bots calls AdjustDown(2), freeing 3 slots
- But tryGrantWaiting() couldn't grant to waiting deal workflows
- All workflows were serialized, no concurrency

**Solution:**
- Changed from single `activeReservation *reservation` to multiple
  `activeReservations map[string]*reservation`
- tryGrantWaiting() now grants to waiting workflows whenever there's
  capacity, regardless of existing active reservations
- Added calculateTotalReserved() to sum slots across all reservations
- Updated Reserve/Consume/AdjustDown/Extend/SignalComplete/Release
  to work with the map

**Result:**
When produce:all-bots adjusts down from 5→2 slots, the freed 3 slots
are immediately available for deal workflows. Multiple workflows can
now run concurrently, enabling true "early release" behavior per spec.

Closes the issue raised in code review regarding workflow serialization.
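
The shape of that change, stripped down to the map and the helpers named above (the real Reserve/Consume/Extend plumbing is omitted and names are approximations):

```go
package ratelimit

// reservation is a stand-in for the package's reservation record; only
// the slot counters matter here.
type reservation struct {
	slotsReserved int
	slotsConsumed int
}

// The refactor replaces a single *reservation field with a map keyed by
// workflow ID, so several workflows can hold capacity at once.
type limiterState struct {
	limit              int
	consumed           int
	activeReservations map[string]*reservation
}

// calculateTotalReserved sums outstanding slots across all active
// reservations; the capacity checks consult this instead of a single
// reservation's count.
func (l *limiterState) calculateTotalReserved() int {
	total := 0
	for _, res := range l.activeReservations {
		total += res.slotsReserved
	}
	return total
}

// adjustDown shrinks a workflow's reservation, e.g. produce:all-bots
// going from 5 to 2 slots; the real method then calls tryGrantWaiting()
// so waiting deal workflows receive the freed slots immediately.
func (l *limiterState) adjustDown(workflowID string, newSlots int) {
	if res, ok := l.activeReservations[workflowID]; ok && newSlots < res.slotsReserved {
		res.slotsReserved = newSlots
	}
}
```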
Line 337 was trying to redeclare totalReserved with := but it was already
declared at line 280 in the function scope. Changed to = for reassignment.
The Extend method was incorrectly using the Reserve capacity formula,
which prevented reservations from growing beyond the per-window limit.

**The Contract:**
Reservations can span multiple windows. The rate limiter enforces that
*consumption per window* doesn't exceed the limit, not that *total
reservations* can't exceed the limit.

**Old (incorrect) formula:**
  if l.consumed + totalReserved - res.slotsReserved + newReservation <= l.limit

This prevented extending a reservation beyond the window limit, even
when the additional consumption would span multiple windows.

**New (correct) formula:**
  if l.consumed + newReservation - res.slotsConsumed <= l.limit

This checks: "Can the additional slots I need (beyond what I've already
consumed) fit in the current window's remaining capacity?"

**Example (limit=10):**
- Reserve 8 slots, consume all 8 in window 1
- Window resets: consumed=0, slotsConsumed=8 (persists with reservation)
- Extend by 5 (total 13): Check 0 + 13 - 8 = 5 <= 10 ✓
- The 5 additional slots fit in the new window

This allows workflows to have large total reservations that span
multiple windows, while still enforcing per-window rate limits.

Fixes TestLimiter_ExtendRequiresWindowReset
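
The corrected check, reduced to its essentials (names approximate the commit's description):

```go
package ratelimit

import "errors"

type reservationExt struct {
	slotsReserved int
	slotsConsumed int // persists across window resets
}

type limiterExt struct {
	limit    int
	consumed int // consumption in the current window; zeroed on reset
}

// extend applies the corrected check: only the slots this reservation
// has not yet consumed must fit in the current window, so the total
// reservation may legitimately grow past the per-window limit.
func (l *limiterExt) extend(res *reservationExt, additional int) error {
	newReservation := res.slotsReserved + additional
	if l.consumed+newReservation-res.slotsConsumed > l.limit {
		return errors.New("extension does not fit in current window")
	}
	res.slotsReserved = newReservation
	return nil
}
```

Running the commit's own example through this check: limit=10, slotsConsumed=8, window reset so consumed=0; extending to a total of 13 gives 0 + 13 - 8 = 5 <= 10, so the extension is granted.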
Updated rate_limit.adoc to accurately describe the implementation:

**Core Principle Changes:**
- Changed from "single active reservation" to "multiple concurrent reservations"
- Clarified goal: guarantee sequential execution per workflow, not global serialization
- Added "Sequential Execution Guarantee" mechanism

**Key Updates:**
1. Principle (line 92): Now describes coordination vs serialization
2. Key Mechanisms (lines 110-126):
   - Multiple Concurrent Reservations (replaces Single Active Reservation)
   - Added Sequential Execution Guarantee
   - Added Cross-Window Reservations mechanism
3. Core State (lines 139-145): activeReservation → activeReservations map
4. Reserve operation (lines 157-162): Updated capacity check formula and behavior
5. AdjustDown (lines 180-182): Clarifies enables early release for concurrent workflows
6. Extend (lines 188-192): Documents cross-window formula and rationale
7. Release (lines 208-210): Multiple workflows may be granted
8. Window Reset (lines 218-221): slotsConsumed persists, immediate re-evaluation

**Open Questions Answered:**
- Question 3: Documented capacity formula with rationale
- Question 4: Documented Extend cross-window formula with example

**Example Updates:**
- Line 496: Clarified both workflows have concurrent reservations

The spec now accurately reflects that workflows get sequential execution
guarantees (preventing thundering herd), while multiple workflows can
run concurrently when capacity allows (via early release pattern).
…1CV2b6CpvF4vQWJtSJiSwW

feat: implement ThreeCommas API rate limiting with workflow reservation system
@terwey terwey marked this pull request as ready for review November 13, 2025 18:40

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


 - internal/api/system_stream.go (lines 36-174) adds per-subscriber locking plus trySend/close helpers so channel closes can’t race with Publish/history sends; cancellation now removes a subscriber under the controller lock and closes it safely afterward.
 - internal/api/system_stream.go (lines 188-254) reworks Publish/Flush to fan out via the guarded send path, skipping already-closed subscribers and logging only when buffers are full.

Co-authored-by: Codex <[email protected]>
@terwey
Collaborator Author

terwey commented Nov 13, 2025

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +150 to +168
// Wait for our turn or context cancellation
startWait := time.Now()
select {
case <-ready:
	waitDuration := time.Since(startWait)
	if waitDuration > 30*time.Second {
		l.logger.Warn("rate limit queue wait exceeded threshold",
			slog.String("workflow_id", workflowID),
			slog.Duration("wait_duration", waitDuration),
			slog.Int("queue_position", queuePos),
		)
	}
	return nil
case <-ctx.Done():
	// Remove from queue
	l.mu.Lock()
	l.removeFromQueue(workflowID)
	l.mu.Unlock()
	return ctx.Err()


P1: Waiting reservation never wakes when window resets

In Limiter.Reserve the goroutine enqueued for lack of capacity simply blocks on ready after releasing the mutex. The only paths that close this channel come from tryGrantWaiting, which is invoked only when another limiter operation runs (e.g. Release, AdjustDown, Extend, or another Reserve). If the minute quota is exhausted and there are no active reservations left, a single caller that queues here will never be woken when the window resets because no other method re-enters the limiter to trigger resetWindowIfNeeded and grant the wait queue. The workflow stalls indefinitely until its context cancels, effectively deadlocking all 3Commas polling after a burst hits the rate limit. The limiter needs a background tick or self-scheduling wake-up to retry queued reservations when the window rolls over even if no other operations occur.


claude and others added 6 commits November 13, 2025 21:25
This commit documents the investigation of the deadlock bug in the rate
limiter where queued reservations never wake when the time window resets
if no other limiter operations occur.

Key findings:
- Queued workflows block on a channel that's only closed by
  tryGrantWaiting(), which is only called during other limiter operations
- If no active reservations exist and no new operations occur, queued
  workflows hang indefinitely even after the window resets
- The existing TestLimiter_WindowReset actually demonstrates the bug by
  requiring a third workflow to trigger reset detection

Changes:
- Corrected TestLimiter_WindowReset to test expected behavior (auto-wake)
- Added TestLimiter_QueuedReservationDeadlock to explicitly reproduce bug
- Created BUG_ANALYSIS.md with detailed root cause analysis and solutions

Both corrected tests will FAIL with current implementation, proving the
bug exists. They define the expected contract that queued workflows
should automatically wake when the window resets.

Recommended solution: Add background ticker or scheduled wake-up to
detect window resets even when no other operations occur.
Fixes the deadlock bug where queued reservations never wake when the
rate limit window resets if no other limiter operations occur.

Root Cause:
Queued workflows blocked on a channel that was only closed by
tryGrantWaiting(), which was only called during other limiter operations
(Reserve, Release, Consume, etc.). When no operations occurred, the
window could reset but nothing detected it, causing indefinite hangs.

Solution:
Implemented a background ticker that periodically checks for window
resets every window/10 duration (min 100ms). The ticker only runs
resetWindowIfNeeded() when there are queued workflows, ensuring
minimal overhead.

Changes:
1. Updated all godoc comments to explicitly document auto-wake behavior
   - Reserve(): Now clearly states queued workflows auto-wake on reset
   - Other operations: Document their role in waking queued workflows

2. Added background ticker infrastructure:
   - ticker: Runs every window/10 to detect resets
   - done: Channel for graceful shutdown
   - windowResetWatcher(): Background goroutine
   - Stop(): Cleanup method for tests

3. Updated NewLimiter() to start background watcher

The fix ensures the limiter matches its specification, which already
documented that "waiting workflows are immediately re-evaluated and
granted if capacity now available" when windows reset.

Tests:
- TestLimiter_WindowReset: Now correctly tests auto-wake (previously
  required 3rd workflow to trigger reset detection)
- TestLimiter_QueuedReservationDeadlock: Explicitly tests the bug
  scenario (burst exhausts quota, single caller queues, no other ops)

Both tests will now PASS with the background ticker implementation.
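
A sketch of just the watcher described here, with the rest of the limiter assumed as in the earlier sketches; names follow the commit message:

```go
package ratelimit

import (
	"sync"
	"time"
)

// tickLimiter sketches only the auto-wake machinery; the wait queue and
// resetWindowIfNeeded are assumed to exist as in the earlier sketches.
type tickLimiter struct {
	mu      sync.Mutex
	waiting []chan struct{} // queued Reserve calls
	ticker  *time.Ticker
	done    chan struct{}
}

// newTickLimiter starts the background watcher at window/10, clamped to
// a 100ms minimum, matching the interval described in the commit.
func newTickLimiter(window time.Duration) *tickLimiter {
	interval := window / 10
	if interval < 100*time.Millisecond {
		interval = 100 * time.Millisecond
	}
	l := &tickLimiter{
		ticker: time.NewTicker(interval),
		done:   make(chan struct{}),
	}
	go l.windowResetWatcher()
	return l
}

// windowResetWatcher wakes queued workflows even when no other limiter
// operation runs, which is exactly the gap behind the deadlock.
func (l *tickLimiter) windowResetWatcher() {
	for {
		select {
		case <-l.done:
			return
		case <-l.ticker.C:
			l.mu.Lock()
			if len(l.waiting) > 0 {
				l.resetWindowIfNeeded(time.Now())
			}
			l.mu.Unlock()
		}
	}
}

// Stop ends the watcher goroutine; tests defer this so synctest does
// not report a leak.
func (l *tickLimiter) Stop() {
	l.ticker.Stop()
	close(l.done)
}

// resetWindowIfNeeded is stubbed here; the reset-then-grant ordering is
// shown in the earlier sketch.
func (l *tickLimiter) resetWindowIfNeeded(now time.Time) {}
```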
The background ticker goroutine needs to be stopped after each test
to prevent synctest from detecting goroutine leaks. This adds
defer l.Stop() after every NewLimiter() call in tests.

This fixes TestLimiter_FIFOQueue and ensures all other tests properly
clean up resources.
Removing investigation notes as they're not needed in the PR.
The fix is well-documented in commit messages and godoc comments.
Documents the lessons learned from the rate limiter deadlock bug:

1. Tests must validate contracts (spec + godoc), not implementation
2. Godoc is the bridge between spec and implementation
3. Vague godoc leads to tests that validate bugs instead of exposing them

Includes case study from the ratelimit package showing how the bug
went undetected because tests followed the implementation instead of
the documented contract.

Provides concrete examples of vague vs clear godoc with guidelines
for writing explicit documentation that makes it impossible to write
tests that ignore expected behavior.
…01To3YBZrRDQmvwaAev7cKWZ

Rate Limiter Deadlock Bug
@terwey
Collaborator Author

terwey commented Nov 13, 2025

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +270 to +276
// Get tier-specific rate limit configuration
rateLimitCfg := planTier.RateLimitConfig()

// Override configuration with tier-specific values (unless explicitly set by user)
cfg.DealWorkers = rateLimitCfg.DealWorkers
cfg.ResyncInterval = rateLimitCfg.ResyncInterval


P1: Respect user-specified deal worker and resync settings

After determining the plan tier, the code unconditionally assigns cfg.DealWorkers and cfg.ResyncInterval from the tier defaults. This overwrites any values explicitly provided by flags or environment variables, despite the comment stating the override should only happen when the user hasn’t set a value. Operators who tune these settings will now have their configuration silently ignored every boot. Consider only applying the tier defaults when the current value still equals the compiled default.
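
One way to apply the reviewer's suggestion, sketched with hypothetical compiled defaults and type names:

```go
package config

import "time"

// Compiled defaults here are hypothetical; the point is the comparison,
// not the values.
const (
	defaultDealWorkers    = 2
	defaultResyncInterval = 5 * time.Minute
)

type Config struct {
	DealWorkers    int
	ResyncInterval time.Duration
}

type TierLimits struct {
	DealWorkers    int
	ResyncInterval time.Duration
}

// ApplyTierDefaults only overrides values the operator left at the
// compiled default, so explicit flags or environment settings win over
// the plan tier's defaults.
func ApplyTierDefaults(cfg *Config, tier TierLimits) {
	if cfg.DealWorkers == defaultDealWorkers {
		cfg.DealWorkers = tier.DealWorkers
	}
	if cfg.ResyncInterval == defaultResyncInterval {
		cfg.ResyncInterval = tier.ResyncInterval
	}
}
```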


Comment on lines +179 to +191
// Check if a venue with this (type, wallet) already exists (possibly with a different ID)
existingVenue, err := s.queries.GetVenueByTypeAndWallet(ctx, sqlcgen.GetVenueByTypeAndWalletParams{
	Type:   defaultHyperliquidVenueType,
	Wallet: wallet,
})
if err == nil {
	// A venue with this (type, wallet) already exists.
	// If it has a different ID than the default, we should use it instead of trying to create a duplicate.
	if existingVenue.ID != string(defaultHyperliquidVenueID) {
		// There's already a user-defined venue with this type and wallet.
		// No need to create a separate default venue - just return success.
		return nil
	}


P1: Default Hyperliquid venue ID may never be created

When an existing venue with the same (type, wallet) is found, the new logic returns early without inserting or updating the default row (hyperliquid:default). Downstream code (e.g. EnsureDefaultVenueWallet, order identifier construction, and default venue lookups) expects this well-known ID to exist. If an operator already has a venue for the primary wallet under a different ID, startup will skip the upsert and later lookups by hyperliquid:default will fail with sql.ErrNoRows. Instead of returning immediately, the routine should reconcile the existing venue into the default ID or update it in place so the canonical identifier always exists.


@terwey terwey merged commit 05dae77 into codex/investigate-multi-wallet-support-for-hyperliquid Nov 14, 2025
3 checks passed
@terwey terwey deleted the claude/fix-venue-wallet-unique-constraint-011CUyu1iovAvRm9q9iEP9cq branch November 14, 2025 13:23