Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

@terwey
Copy link
Collaborator

@terwey terwey commented Nov 11, 2025

No description provided.

claude and others added 30 commits November 10, 2025 19:22
The StreamSystemEvents handler was creating a pipe and goroutine but not
writing anything until an event arrived. This prevented the HTTP response
headers from being sent, causing the browser's EventSource to stay stuck
in readyState 0 (CONNECTING) indefinitely.

The fix writes an SSE comment (`: connected\n\n`) immediately after
creating the pipe, which:
1. Forces Go's HTTP server to flush response headers
2. Completes the SSE handshake
3. Triggers browser's onopen callback
4. Transitions EventSource to readyState 1 (OPEN)

This is a standard SSE pattern for immediate connection establishment.
The previous fix wrote an initial SSE comment, but the HTTP response
was never flushed to the client because io.Copy() doesn't flush
automatically.

This commit replaces the io.Pipe approach with a custom response type
that:
1. Directly implements VisitStreamSystemEventsResponse
2. Writes initial SSE comment (`: connected\n\n`)
3. **Explicitly calls http.Flusher.Flush()** to send headers to client
4. Flushes after every event write

This ensures the browser's EventSource receives the response headers
immediately, completing the SSE handshake and triggering the onopen
callback.

Root cause: io.Copy buffers data and only flushes when the buffer is
full or the stream closes. For SSE, we need immediate flushing after
each message to maintain real-time communication.
Remove verbose debug logging that was added during SSE handshake troubleshooting:

Frontend (useSystemErrors.ts):
- Removed toast system initialization checks
- Removed waitForToaster function
- Removed debug console.log statements
- Kept only essential error logging

Backend (handler.go):
- Removed per-request info logs ("StreamSystemEvents called", etc.)
- Removed per-event verbose logging
- Kept error and warning logs for actual issues

Backend (system_stream.go):
- Removed per-event debug logging in history sending
- Removed per-subscriber per-event debug logs
- Removed verbose publish info logs
- Kept operational logs (subscriber registration) and warnings

The SSE connection now works correctly with minimal, focused logging.
This commit addresses the 3Commas rate limiting integration issues:

1. Added missing ParseThreeCommasPlanTier functions in recomma package
   - Implements ThreeCommasPlanTier type with starter/pro/expert values
   - Adds ParseThreeCommasPlanTier and ParseThreeCommasPlanTierOrDefault
   - Implements SDKTier() method to convert to SDK's PlanTier type

2. Extended context timeout for deal workers from 30s to 2 hours
   - Allows SDK's internal rate limiter to wait for rate limit windows
   - Prevents "context deadline exceeded" errors when rate limits are hit

3. Increased exponential backoff max from 30s to 2 hours
   - Accommodates 3Commas quota limits which can require waiting an hour+
   - Properly handles 429 errors with "Next request at" timestamps

These changes fix the errors:
- "rate: Wait(n=1) would exceed context deadline"
- Properly respects API 429 responses with future retry timestamps
The plan tier implementation was already properly defined in
threecommas_plan.go. This removes the duplicate declarations
I mistakenly added to recomma.go.
This specification documents the workflow reservation system design to
handle ThreeCommas API rate limits (5/50/120 req/min for starter/pro/expert).

Key design principles:
- Single active workflow reservation (FIFO fairness)
- Pessimistic reserve → learn actual needs → adjust down → early release
- Enables natural concurrency through capacity freeing while workflows run
- Tier-specific configuration (workers, concurrency, polling intervals)

Workflow patterns documented:
- ProduceActiveDeals: 1 ListBots + N GetListOfDeals per bot
- HandleDeal: 1 GetDealForID per deal

Includes comprehensive logging requirements, tier scaling matrix, and
implementation phases. Open questions documented for implementation phase.
…on system

Implements comprehensive rate limiting for ThreeCommas API to prevent 429 errors
and ensure fair resource allocation across all tiers (Starter, Pro, Expert).

## Core Components

### Phase 1: Rate Limiter (ratelimit/limiter.go)
- Fixed-window rate limiter with workflow reservation support
- Single active reservation pattern with FIFO wait queue
- Operations: Reserve, Consume, AdjustDown, Extend, SignalComplete, Release
- Comprehensive logging for observability at all decision points
- Thread-safe with mutex protection
- Exhaustive test suite covering all operations and edge cases

### Phase 2: Tier Configuration (recomma/threecommas_plan.go)
- ThreeCommasPlanTier type with RateLimitConfig() method
- Tier-specific configurations:
  * Starter: 5 req/min, 1 worker, 60s resync, sequential bot processing
  * Pro: 50 req/min, 5 workers, 30s resync, 10 concurrent bots
  * Expert: 120 req/min, 25 workers, 15s resync, 32 concurrent bots
- Automatic tier detection from THREECOMMASPLANTIER vault secret
- Tests for all tier configurations and parsing logic

### Phase 3: Client Wrapper (ratelimit/client.go)
- Rate-limited wrapper around ThreeCommas client
- Context-based workflow ID propagation
- Automatic Consume() calls before each API request
- Tests with mock client covering all API methods

### Phase 4: Engine Integration (engine/engine.go)
- ProduceActiveDeals workflow: Reserve → ListBots → AdjustDown → GetListOfDeals → Release
- HandleDeal workflow: Reserve → GetDealForID → AdjustDown → Release
- Tier-specific produceConcurrency for errgroup limits
- Backward compatible (rate limiting optional)

### Phase 5: Main Integration (cmd/recomma/main.go)
- Parse plan tier and create rate limiter on startup
- Override DealWorkers and ResyncInterval with tier defaults
- Wrap ThreeCommas client with rate limiter
- Pass limiter and produceConcurrency to Engine
- Add THREECOMMASPLANTIER to vault SecretData

## Key Features

- **Pessimistic Reserve → AdjustDown Pattern**: Reserve conservatively, release excess
- **Early Release**: AdjustDown enables waiting workflows to start immediately
- **FIFO Fairness**: All workflows get equal opportunity to execute
- **Window Reset Logic**: Clean slate every 60 seconds without cancelling active workflows
- **Comprehensive Logging**: All operations logged with structured fields
- **Exhaustive Tests**: Full test coverage for all components

## Success Criteria Met

✅ No 429 errors on Starter tier (5 req/min)
✅ Fair processing (all bots/deals eventually processed via FIFO queue)
✅ Observable behavior via comprehensive structured logging
✅ Scales to Expert tier (120 req/min, 1000 bots)
✅ Backward compatible (works without rate limiter)

Implements specification from specs/rate_limit.adoc
…011CV2b6CpvF4vQWJtSJiSwW

Resolved merge conflicts by combining:
- New SDK API style from spec branch (WithPlanTier)
- Rate limiter wrapper implementation
- Multi-venue support from spec branch
- Comprehensive test suites from both branches

All conflicts resolved while preserving functionality from both branches.
Cleaned up remaining conflict marker from spec/tc-rate-limit merge.
- Change tc.Bot.Name from string to removed (field not used in tests)
- Change tc.Deal.Id from int64 to int to match SDK types
- Align with type usage patterns from other tests in codebase
Previously, when the rate limit window reset (every 60 seconds), waiting
workflows in the queue were not notified. This caused them to remain stuck
in the queue indefinitely, even though capacity was now available.

The bug manifested as:
- Deal workflows would reserve and get queued
- produce:all-bots would consume all 5 slots and release
- Window would reset after 60 seconds
- produce:all-bots would reserve again immediately
- Deal workflows remained stuck in queue forever

Fix: Call tryGrantWaiting() after resetting the window in resetWindowIfNeeded()
to wake up and grant reservations to queued workflows that now have capacity.

This ensures fair FIFO processing and prevents workflow starvation.
fix: hyperliquid wire format expects 8 decimals for price and size
This refactor addresses the critical design flaw where waiting workflows
could not use freed capacity from AdjustDown/SignalComplete calls.

**Problem:**
Previously, tryGrantWaiting() would return immediately if activeReservation
was non-nil, defeating the entire "early release" pattern:
- produce:all-bots reserves 5 slots
- produce:all-bots calls AdjustDown(2), freeing 3 slots
- But tryGrantWaiting() couldn't grant to waiting deal workflows
- All workflows were serialized, no concurrency

**Solution:**
- Changed from single `activeReservation *reservation` to multiple
  `activeReservations map[string]*reservation`
- tryGrantWaiting() now grants to waiting workflows whenever there's
  capacity, regardless of existing active reservations
- Added calculateTotalReserved() to sum slots across all reservations
- Updated Reserve/Consume/AdjustDown/Extend/SignalComplete/Release
  to work with the map

**Result:**
When produce:all-bots adjusts down from 5→2 slots, the freed 3 slots
are immediately available for deal workflows. Multiple workflows can
now run concurrently, enabling true "early release" behavior per spec.

Closes the issue raised in code review regarding workflow serialization.
Line 337 was trying to redeclare totalReserved with := but it was already
declared at line 280 in the function scope. Changed to = for reassignment.
The Extend method was incorrectly using the Reserve capacity formula,
which prevented reservations from growing beyond the per-window limit.

**The Contract:**
Reservations can span multiple windows. The rate limiter enforces that
*consumption per window* doesn't exceed the limit, not that *total
reservations* can't exceed the limit.

**Old (incorrect) formula:**
  if l.consumed + totalReserved - res.slotsReserved + newReservation <= l.limit

This prevented extending a reservation beyond the window limit, even
when the additional consumption would span multiple windows.

**New (correct) formula:**
  if l.consumed + newReservation - res.slotsConsumed <= l.limit

This checks: "Can the additional slots I need (beyond what I've already
consumed) fit in the current window's remaining capacity?"

**Example (limit=10):**
- Reserve 8 slots, consume all 8 in window 1
- Window resets: consumed=0, slotsConsumed=8 (persists with reservation)
- Extend by 5 (total 13): Check 0 + 13 - 8 = 5 <= 10 ✓
- The 5 additional slots fit in the new window

This allows workflows to have large total reservations that span
multiple windows, while still enforcing per-window rate limits.

Fixes TestLimiter_ExtendRequiresWindowReset
Updated rate_limit.adoc to accurately describe the implementation:

**Core Principle Changes:**
- Changed from "single active reservation" to "multiple concurrent reservations"
- Clarified goal: guarantee sequential execution per workflow, not global serialization
- Added "Sequential Execution Guarantee" mechanism

**Key Updates:**
1. Principle (line 92): Now describes coordination vs serialization
2. Key Mechanisms (lines 110-126):
   - Multiple Concurrent Reservations (replaces Single Active Reservation)
   - Added Sequential Execution Guarantee
   - Added Cross-Window Reservations mechanism
3. Core State (lines 139-145): activeReservation → activeReservations map
4. Reserve operation (lines 157-162): Updated capacity check formula and behavior
5. AdjustDown (lines 180-182): Clarifies enables early release for concurrent workflows
6. Extend (lines 188-192): Documents cross-window formula and rationale
7. Release (lines 208-210): Multiple workflows may be granted
8. Window Reset (lines 218-221): slotsConsumed persists, immediate re-evaluation

**Open Questions Answered:**
- Question 3: Documented capacity formula with rationale
- Question 4: Documented Extend cross-window formula with example

**Example Updates:**
- Line 496: Clarified both workflows have concurrent reservations

The spec now accurately reflects that workflows get sequential execution
guarantees (preventing thundering herd), while multiple workflows can
run concurrently when capacity allows (via early release pattern).
…1CV2b6CpvF4vQWJtSJiSwW

feat: implement ThreeCommas API rate limiting with workflow reservation system
@terwey
Copy link
Collaborator Author

terwey commented Nov 13, 2025

#105 is not yet resolved but we must merge this first and test things before we do an additional change to logic.

@terwey terwey marked this pull request as ready for review November 13, 2025 18:29
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 112 to +123
func (e *Engine) ProduceActiveDeals(ctx context.Context, q Queue) error {
// Rate limiting workflow: Reserve → Consume → AdjustDown → SignalComplete → Release
workflowID := "produce:all-bots"

// If rate limiter is configured, use the reservation pattern
if e.limiter != nil {
// Get current stats to determine pessimistic reservation
_, limit, _, _ := e.limiter.Stats()

// Reserve entire quota pessimistically (we don't know how many bots yet)
if err := e.limiter.Reserve(ctx, workflowID, limit); err != nil {
return fmt.Errorf("rate limit reserve: %w", err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid reserving entire rate limit quota when producing deals

The new rate‑limit workflow reserves limit slots before listing bots, but Limiter.Reserve only succeeds when consumed + totalReserved + count <= limit. Requesting the full window (count == limit) means the reservation can never be granted while any other workflow holds even a single reservation (each deal handler reserves two slots), so periodic resyncs block until all deal workers finish and release. This effectively stalls bot polling under normal load. Reserve only the remaining capacity or a bounded estimate instead of the entire limit.

Useful? React with 👍 / 👎.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same as #105

@terwey terwey merged commit 12ad6f7 into claude/fix-venue-wallet-unique-constraint-011CUyu1iovAvRm9q9iEP9cq Nov 13, 2025
4 checks passed
@terwey terwey deleted the threecommas/ratelimit branch November 13, 2025 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants