
@louisinger
Collaborator

@louisinger louisinger commented Nov 19, 2025

@altafan please review

Summary by CodeRabbit

  • Refactor

    • Event storage migrated from in-memory caching to persistent database-backed storage; dispatch now reads persisted history
    • Repository factory updated to accept a database handle when created
    • Save/publish flow adjusted to publish then dispatch from DB; logging improved to include commitment transaction details
  • Bug Fixes

    • Removed duplicate identifier assignments in event fixtures
    • Dropped an unused exported identifier from an off-chain event type
  • Tests

    • End-to-end tests adjusted with short pauses for improved synchronization


@louisinger louisinger requested a review from altafan November 19, 2025 23:46
@coderabbitai
Contributor

coderabbitai bot commented Nov 19, 2025

Walkthrough

Event storage moved from an in-memory cache to SQL persistence: the Watermill event repository now accepts a DB handle and reads historical events from the database. The exported Id field was removed from OffchainTxAccepted; tests and call sites were updated. Minor test timing sleeps and a logging formatting tweak were added.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Domain model update | internal/core/domain/offchain_tx_event.go | Removed the exported Id field from OffchainTxAccepted. |
| Watermill event repository (DB-backed) | internal/infrastructure/db/watermill/event_repo.go | Added db *sql.DB field; constructor changed to NewWatermillEventRepository(publisher, db); removed in-memory cache; added getAllEvents() to query watermill_<topic> and deserializeEvent() to build concrete events; publish/dispatch now reads history from DB and logs dispatch errors. |
| Removed caching layer | internal/infrastructure/db/watermill/cache.go | Deleted file; removed eventCache type and its methods (newEventCache, add, get, remove). |
| Postgres caller update | internal/infrastructure/db/postgres/event_repo.go | Updated call site to pass DB handle: NewWatermillEventRepository(publisher, db). |
| Tests: fixtures | internal/infrastructure/db/service_test.go | Removed duplicated Id assignments in OffchainTx event fixtures to match the updated OffchainTxAccepted shape. |
| Tests: e2e synchronization | internal/test/e2e/e2e_test.go | Added short sleeps (1s and 5s) after certain waits to allow async processing in fraud and batch-session tests. |
| Logging tweak | internal/infrastructure/db/service.go | Changed round-sweep log to use structured logging with WithField("commitment_txid", ...). |

Sequence Diagram(s)

sequenceDiagram
    participant Pub as Publisher
    participant Repo as WatermillRepo
    participant DB as SQLDB
    participant Sub as Subscriber

    rect rgb(220,240,255)
    Note over Pub,Repo: Previous (in-memory cache)
    Pub->>Repo: Publish(event)
    Repo->>Repo: store in eventCache
    Sub->>Repo: Subscribe(topic)
    Repo->>Repo: read from eventCache
    Repo->>Sub: deliver cached events
    end

    rect rgb(240,255,240)
    Note over Pub,Repo: New (DB-backed)
    Pub->>Repo: Publish(event)
    Repo->>DB: insert into watermill_<topic>
    Sub->>Repo: Subscribe(topic)
    Repo->>DB: getAllEvents(topic, id)
    DB->>Repo: return payloads
    Repo->>Repo: deserializeEvent(payload)
    Repo->>Sub: deliver deserialized events
    end
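
The DB-backed half of the diagram can be sketched in Go roughly as follows. This is a minimal illustration, not the PR's code: `event`, `historyStore`, `memStore`, and `repo` are hypothetical names, and the in-memory store only stands in for the `watermill_<topic>` table so the sketch runs without a database.

```go
package main

import "sync"

// event is a hypothetical stand-in for domain.Event.
type event struct {
	ID   string
	Type string
}

// historyStore abstracts the persisted per-topic table (assumed shape).
type historyStore interface {
	Append(topic string, ev event) error
	History(topic, id string) ([]event, error)
}

// memStore stands in for SQL persistence in this sketch only.
type memStore struct {
	mu   sync.Mutex
	rows map[string][]event
}

func newMemStore() *memStore { return &memStore{rows: make(map[string][]event)} }

func (s *memStore) Append(topic string, ev event) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.rows[topic] = append(s.rows[topic], ev)
	return nil
}

// History returns every stored event of the topic with the given id, in insertion order.
func (s *memStore) History(topic, id string) ([]event, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out []event
	for _, ev := range s.rows[topic] {
		if ev.ID == id {
			out = append(out, ev)
		}
	}
	return out, nil
}

// repo persists events first, then dispatches the full persisted history.
type repo struct {
	store    historyStore
	handlers map[string][]func([]event)
}

func newRepo(store historyStore) *repo {
	return &repo{store: store, handlers: make(map[string][]func([]event))}
}

func (r *repo) Subscribe(topic string, h func([]event)) {
	r.handlers[topic] = append(r.handlers[topic], h)
}

// Save stores the new events, reads the history back from the store,
// and hands that history to every handler registered for the topic.
func (r *repo) Save(topic, id string, evs ...event) error {
	for _, ev := range evs {
		ev.ID = id
		if err := r.store.Append(topic, ev); err != nil {
			return err
		}
	}
	history, err := r.store.History(topic, id)
	if err != nil {
		return err
	}
	for _, h := range r.handlers[topic] {
		h(history)
	}
	return nil
}
```

Because handlers always receive what was read back from the store, a second `Save` for the same id delivers both the earlier and the new event, which is the behavior the in-memory cache used to provide.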

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Focus areas:
    • SQL query correctness, parameterization, and performance in getAllEvents()
    • deserializeEvent() correctness and handling unknown/changed Type values
    • All call sites updated to new NewWatermillEventRepository(publisher, db) signature
    • Impact of removing OffchainTxAccepted.Id across codebase and tests
    • Added sleeps in e2e tests — potential flakiness and timing sensitivity

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately describes the main change: removing in-memory cache from the Watermill event repository. The cache.go file was deleted and event_repo.go was refactored to use database-backed persistence instead. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
internal/infrastructure/db/watermill/event_repo.go (2)

189-240: Optional: Consider reducing repetition in event deserialization.

The switch statement has significant code duplication across all event type cases. While the current explicit approach is type-safe and clear, you could reduce repetition using a map-based approach.

Here's an example refactor using a registry pattern:

var eventTypeRegistry = map[domain.EventType]func() domain.Event{
	domain.EventTypeRoundStarted:              func() domain.Event { return &domain.RoundStarted{} },
	domain.EventTypeRoundFinalizationStarted:  func() domain.Event { return &domain.RoundFinalizationStarted{} },
	domain.EventTypeRoundFinalized:            func() domain.Event { return &domain.RoundFinalized{} },
	domain.EventTypeRoundFailed:               func() domain.Event { return &domain.RoundFailed{} },
	domain.EventTypeBatchSwept:                func() domain.Event { return &domain.BatchSwept{} },
	domain.EventTypeIntentsRegistered:         func() domain.Event { return &domain.IntentsRegistered{} },
	domain.EventTypeOffchainTxRequested:       func() domain.Event { return &domain.OffchainTxRequested{} },
	domain.EventTypeOffchainTxAccepted:        func() domain.Event { return &domain.OffchainTxAccepted{} },
	domain.EventTypeOffchainTxFinalized:       func() domain.Event { return &domain.OffchainTxFinalized{} },
	domain.EventTypeOffchainTxFailed:          func() domain.Event { return &domain.OffchainTxFailed{} },
}

func deserializeEvent(buf []byte) (domain.Event, error) {
	var eventType struct {
		Type domain.EventType
	}

	if err := json.Unmarshal(buf, &eventType); err != nil {
		return nil, fmt.Errorf("failed to extract event type: %w", err)
	}

	factory, ok := eventTypeRegistry[eventType.Type]
	if !ok {
		return nil, fmt.Errorf("unknown event type: %s", eventType.Type)
	}

	event := factory()
	if err := json.Unmarshal(buf, event); err != nil {
		return nil, fmt.Errorf("failed to unmarshal %s: %w", eventType.Type, err)
	}

	return event, nil
}

This reduces code duplication and makes it easier to add new event types, though it does introduce a bit of indirection.


116-119: SQL injection risk in dynamic table name is LOW in current codebase.

Verification confirms that the topic parameter is never user-controlled. All production calls to Save() use hardcoded domain constants (RoundTopic="round", OffchainTxTopic="offchain_tx"). The call chain is Save(topic) → dispatch(topic) → getAllEvents(topic), with topic originating exclusively from internal constants.

However, the string interpolation pattern (line 117) does violate secure coding practices. While currently safe, consider applying defensive validation to prevent future vulnerabilities if this code path is ever modified to accept external input:

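// Add this validation helper. The pattern is compiled once at package level
// instead of on every call, and a compile error can no longer be swallowed.
var validTopicName = regexp.MustCompile(`^[a-zA-Z0-9_]+$`)

// isValidTopicName reports whether topic contains only letters, digits, and underscores.
func isValidTopicName(topic string) bool {
	return validTopicName.MatchString(topic)
}

// Add this check at the top of getAllEvents, before the table name is interpolated
if !isValidTopicName(topic) {
	return nil, fmt.Errorf("invalid topic name: %q", topic)
}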
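
To show where such a check would sit relative to the interpolated table name, here is a small self-contained sketch. The query text is an assumption pieced together from the `watermill_<topic>` table name and the `payload->>'Id'` filter mentioned in this review; `historyQuery` and `topicPattern` are hypothetical names, not functions from the PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// topicPattern allows only characters that are safe inside an identifier.
var topicPattern = regexp.MustCompile(`^[a-zA-Z0-9_]+$`)

// historyQuery builds the per-topic query. Table names cannot be bound as
// SQL parameters, so the topic is validated before interpolation, while the
// event id stays a bound placeholder ($1).
func historyQuery(topic string) (string, error) {
	if !topicPattern.MatchString(topic) {
		return "", fmt.Errorf("invalid topic name: %q", topic)
	}
	// Assumed query shape; the real statement in event_repo.go may differ.
	return fmt.Sprintf(
		"SELECT payload FROM watermill_%s WHERE payload->>'Id' = $1", topic,
	), nil
}
```

With this split, only the identifier is built by string formatting; the id value would still be passed separately to `db.QueryContext`.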
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 38b7bf6 and 46f23c7.

📒 Files selected for processing (5)
  • internal/core/domain/offchain_tx_event.go (0 hunks)
  • internal/infrastructure/db/postgres/event_repo.go (1 hunks)
  • internal/infrastructure/db/service_test.go (0 hunks)
  • internal/infrastructure/db/watermill/cache.go (0 hunks)
  • internal/infrastructure/db/watermill/event_repo.go (4 hunks)
💤 Files with no reviewable changes (3)
  • internal/core/domain/offchain_tx_event.go
  • internal/infrastructure/db/service_test.go
  • internal/infrastructure/db/watermill/cache.go
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: louisinger
Repo: arkade-os/arkd PR: 693
File: internal/core/application/service.go:2379-2388
Timestamp: 2025-08-20T06:26:18.377Z
Learning: In the arkd codebase, transaction events should never be dropped when propagating to channels. The interface layer is designed to read events immediately, so buffering channels is preferred over non-blocking sends with drop semantics to maintain delivery guarantees.
🧬 Code graph analysis (2)
internal/infrastructure/db/postgres/event_repo.go (1)
internal/infrastructure/db/watermill/event_repo.go (1)
  • NewWatermillEventRepository (29-36)
internal/infrastructure/db/watermill/event_repo.go (2)
internal/core/domain/round_event.go (4)
  • RoundStarted (17-20)
  • RoundFinalizationStarted (22-30)
  • BatchSwept (50-57)
  • IntentsRegistered (45-48)
internal/core/domain/offchain_tx_event.go (4)
  • OffchainTxRequested (13-18)
  • OffchainTxAccepted (20-27)
  • OffchainTxFinalized (29-33)
  • OffchainTxFailed (35-39)
🔇 Additional comments (4)
internal/infrastructure/db/postgres/event_repo.go (1)

34-34: LGTM: Constructor call updated correctly.

The additional db parameter is passed to align with the new database-backed event storage approach.

internal/infrastructure/db/watermill/event_repo.go (3)

5-5: LGTM: Necessary imports and struct field added.

The new imports (database/sql, fmt, logrus) and db field support the migration from in-memory caching to database-backed event storage.

Also applies to: 7-7, 13-13, 23-23


29-36: LGTM: Constructor signature updated correctly.

The constructor now accepts the database handle and properly initializes all fields including the new db field.


117-117: The review comment is based on an incorrect assumption.

OffchainTxAccepted does have an Id field—it's inherited from the embedded OffchainTxEvent struct (line 21 of offchain_tx_event.go). Through Go's embedding mechanism, all fields from OffchainTxEvent, including Id, are automatically promoted to OffchainTxAccepted. When events are serialized via json.Marshal() (line 166 of event_repo.go), all exported fields from embedded structs are included in the JSON payload. Therefore, the query at line 117 will correctly filter events by payload->>'Id' for all event types, including OffchainTxAccepted.

Likely an incorrect or invalid review comment.
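
The embedding argument above can be checked with a small standalone sketch. The two struct types below are simplified stand-ins with assumed fields, not the real definitions from offchain_tx_event.go, and `payloadFields` is a hypothetical helper.

```go
package main

import "encoding/json"

// OffchainTxEvent is a simplified stand-in for the embedded base event.
type OffchainTxEvent struct {
	Id   string
	Type string
}

// OffchainTxAccepted embeds the base event, so Id and Type are promoted.
type OffchainTxAccepted struct {
	OffchainTxEvent
	FinalArkTxid string // assumed field, for illustration only
}

// payloadFields marshals v and decodes the result into a map, which makes
// it easy to see which top-level keys end up in the JSON payload.
func payloadFields(v any) (map[string]any, error) {
	buf, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	var fields map[string]any
	if err := json.Unmarshal(buf, &fields); err != nil {
		return nil, err
	}
	return fields, nil
}
```

Marshaling an `OffchainTxAccepted` value yields a top-level `Id` key rather than a nested `OffchainTxEvent` object, which is why a filter on `payload->>'Id'` still matches.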

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
internal/test/e2e/e2e_test.go (2)

1627-1629: Consider replacing fixed delays with polling for server state.

The addition of fixed 5-second delays (and similar delays throughout the file) to wait for server-side processing introduces timing dependencies that could make tests flaky in slower environments or under load.

Consider implementing a polling mechanism that checks for the expected server state with a timeout:

// Example helper function to add to the test file
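// waitForCondition polls condition every 100ms until it returns true,
// and fails the test with description once timeout has elapsed.
func waitForCondition(t *testing.T, timeout time.Duration, condition func() bool, description string) {
    t.Helper() // report failures at the caller's line
    deadline := time.Now().Add(timeout)
    for time.Now().Before(deadline) {
        if condition() {
            return
        }
        time.Sleep(100 * time.Millisecond)
    }
    require.Failf(t, "timeout", "timed out waiting for: %s", description)
}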

Then replace fixed sleeps with:

// Give time for the server to detect and process the fraud
waitForCondition(t, 10*time.Second, func() bool {
    spentStatus, err := expl.GetTxOutspends(vtxo.Txid)
    if err != nil || len(spentStatus) <= int(vtxo.VOut) {
        return false
    }
    return spentStatus[vtxo.VOut].Spent
}, "vtxo to be spent by forfeit tx")

1458-1467: The error capture pattern is correct but could benefit from documentation.

Throughout the file, incomingErr (and similar variables) are written in goroutines and read in the main test thread after wg.Wait(). While this is safe due to the happens-before relationship established by WaitGroup, the pattern relies on understanding these semantics.

Consider adding a comment where the pattern is first introduced to document the synchronization approach:

// incomingErr is safely shared between goroutines due to WaitGroup synchronization.
// The goroutine writes the error, wg.Done() creates a happens-before relationship,
// and the main thread reads after wg.Wait(), ensuring no data race.
var incomingErr error

This helps future maintainers understand the synchronization guarantees.
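
A stripped-down version of the pattern, outside any test, might look like the sketch below. `runAndCollect` and `errIncoming` are hypothetical names chosen for illustration; the e2e tests use their own variables.

```go
package main

import (
	"errors"
	"sync"
)

// errIncoming is a hypothetical error used to exercise the pattern.
var errIncoming = errors.New("incoming funds notification failed")

// runAndCollect runs task in a goroutine and returns its error.
// taskErr is written only inside the goroutine and read only after
// wg.Wait() returns, so the write happens before the read and no
// mutex is needed.
func runAndCollect(task func() error) error {
	var (
		wg      sync.WaitGroup
		taskErr error
	)
	wg.Add(1)
	go func() {
		defer wg.Done()
		taskErr = task()
	}()
	wg.Wait()
	return taskErr
}
```

The guarantee holds only while every read of the shared variable comes after `wg.Wait()`; reading it earlier from the main goroutine would be a data race.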

Also applies to: 1555-1565, 1679-1688

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 46f23c7 and ff3e7f4.

📒 Files selected for processing (1)
  • internal/test/e2e/e2e_test.go (8 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.

Applied to files:

  • internal/test/e2e/e2e_test.go
📚 Learning: 2025-08-19T10:58:41.042Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 691
File: internal/core/application/service.go:557-562
Timestamp: 2025-08-19T10:58:41.042Z
Learning: In the arkd SubmitOffchainTx method, using the checkpoint PSBT input's tapscript (forfeit path) for the VtxoInput.Tapscript field is the correct behavior, not a bug as initially thought. The system correctly handles the relationship between checkpoint inputs and Ark transaction inputs.

Applied to files:

  • internal/test/e2e/e2e_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration tests
  • GitHub Check: Build and Scan
  • GitHub Check: unit tests
🔇 Additional comments (1)
internal/test/e2e/e2e_test.go (1)

102-126: ****

The concern about loss of test coverage for incomingFunds validation is not substantiated. Investigation reveals:

  1. incomingFunds IS validated elsewhere: Multiple other tests in the same file validate the return value (lines 498, 515, 532, 549, 783, 2037, 2061 use require.NotNil() or require.NotEmpty()).

  2. No evidence of removed assertions: No assertions on incomingFunds exist in the current test file. The discarding pattern at lines 104, 108 may have been the original design for this test.

  3. Test scope is settlement flow, not vtxo validation: TestBatchSession (lines 102-126) tests batch settlement between parties—it validates error handling and commitment transaction equality, not the content of incoming vtxos. Other tests specifically designed for vtxo validation capture and assert on incomingFunds.

The refactored code does not reduce test coverage; it maintains appropriate separation of concerns across tests.

Likely an incorrect or invalid review comment.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (3)
internal/infrastructure/db/watermill/event_repo.go (3)

88-93: Critical: Context propagation ignored in dispatch.

The dispatch method uses context.Background() (line 90) instead of accepting and propagating the caller's context. This prevents timeout/cancellation propagation and was previously flagged in review comments.

Update the signature to accept context:

-func (e *eventRepository) dispatch(topic string, id string) error {
+func (e *eventRepository) dispatch(ctx context.Context, topic string, id string) error {
 	// get all events for the topic from the database
-	events, err := e.getAllEvents(context.Background(), topic, id)
+	events, err := e.getAllEvents(ctx, topic, id)
 	if err != nil {
 		return err
 	}

And update the call site in Save:

-	if err := e.dispatch(topic, id); err != nil {
+	if err := e.dispatch(ctx, topic, id); err != nil {
 		log.WithError(err).Error("failed to dispatch saved events")
 	}

148-156: Major: Deserialization failures silently drop events.

When deserializeEvent fails (lines 150-154), the error is logged but the event is silently dropped from the result set. This violates event delivery guarantees and was previously flagged in review comments.

Consider failing fast to preserve data integrity:

 	events := make([]domain.Event, 0, len(records))
 	for _, record := range records {
 		event, err := deserializeEvent(record)
 		if err != nil {
-			log.WithError(err).Warnf("failed to deserialize event: %s", string(record))
-			continue
+			return nil, fmt.Errorf("failed to deserialize event: %w", err)
 		}
 		events = append(events, event)
 	}

183-246: Major: Weak error handling masks unmarshal failures.

The deserializeEvent function silently ignores JSON unmarshal errors. The pattern if err := json.Unmarshal(buf, &event); err == nil { return event, nil } means if unmarshal fails, execution continues to check other event types, eventually returning a generic "unknown event" error that doesn't distinguish between unrecognized types and deserialization failures. This was previously flagged in review comments.

Make errors explicit:

 	switch eventType.Type {
 	case domain.EventTypeRoundStarted:
 		var event = domain.RoundStarted{}
-		if err := json.Unmarshal(buf, &event); err == nil {
-			return event, nil
+		if err := json.Unmarshal(buf, &event); err != nil {
+			return nil, fmt.Errorf("failed to unmarshal RoundStarted: %w", err)
 		}
+		return event, nil
 	case domain.EventTypeRoundFinalizationStarted:
 		var event = domain.RoundFinalizationStarted{}
-		if err := json.Unmarshal(buf, &event); err == nil {
-			return event, nil
+		if err := json.Unmarshal(buf, &event); err != nil {
+			return nil, fmt.Errorf("failed to unmarshal RoundFinalizationStarted: %w", err)
 		}
+		return event, nil
 	// ... apply same pattern to all other cases ...
+	default:
+		return nil, fmt.Errorf("unknown event type: %s", eventType.Type)
 	}
-
-	return nil, fmt.Errorf("unknown event")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between afd23fb and b8167c0.

📒 Files selected for processing (2)
  • internal/infrastructure/db/service.go (1 hunks)
  • internal/infrastructure/db/watermill/event_repo.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/infrastructure/db/service.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.

Applied to files:

  • internal/infrastructure/db/watermill/event_repo.go
🧬 Code graph analysis (1)
internal/infrastructure/db/watermill/event_repo.go (2)
internal/core/domain/round_event.go (4)
  • RoundStarted (17-20)
  • RoundFinalizationStarted (22-30)
  • BatchSwept (50-57)
  • IntentsRegistered (45-48)
internal/core/domain/offchain_tx_event.go (4)
  • OffchainTxRequested (13-18)
  • OffchainTxAccepted (20-27)
  • OffchainTxFinalized (29-33)
  • OffchainTxFailed (35-39)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: unit tests
  • GitHub Check: integration tests
  • GitHub Check: Build and Scan
🔇 Additional comments (1)
internal/infrastructure/db/watermill/event_repo.go (1)

29-36: LGTM! Constructor properly wired for database integration.

The constructor correctly accepts and stores the database handle. The defensive nil check in getAllEvents (line 115) ensures safe handling if db is not initialized.

@altafan altafan merged commit 508b845 into master Nov 20, 2025
5 checks passed