fix(trace): use sync.Once to prevent multiple trace initialization #5244

kevwan · 2025-10-13T23:31:17Z

Description

This PR resolves an issue where the trace agent would reinitialize when running multiple servers (REST + RPC) with different trace endpoints, causing the global TracerProvider to be overwritten.

Problem

When running multiple servers with tracing enabled:

RestConf.SetUp()  // Calls trace.StartAgent with endpoint A
ZRpcConf.SetUp()  // Calls trace.StartAgent with endpoint B

The trace agent would reinitialize for each server with different endpoints, overwriting the global TracerProvider. Only the last configuration would take effect.

Root Cause

The StartAgent function only prevented re-initialization when the exact same endpoint was used:

lock.Lock()
if _, ok := agents[c.Endpoint]; !ok {
    agents[c.Endpoint] = lang.Placeholder
    err = startAgent(c)
}
lock.Unlock()

When different endpoints were configured (common in multi-server setups), startAgent() would be called again, creating a new TracerProvider and overwriting the global one via otel.SetTracerProvider().
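The endpoint-keyed guard can be reproduced in isolation. The following is a minimal sketch using only the standard library; `provider`, `registry`, and its `global` field are hypothetical stand-ins for the TracerProvider and the slot set by otel.SetTracerProvider(), not go-zero code.

```go
package main

import (
	"fmt"
	"sync"
)

// provider is a hypothetical stand-in for an OpenTelemetry TracerProvider.
type provider struct{ endpoint string }

// registry mimics the pre-fix state: a per-endpoint guard plus one global slot.
type registry struct {
	mu     sync.Mutex
	agents map[string]struct{}
	global *provider // stands in for the provider set via otel.SetTracerProvider
	inits  int       // how many times initialization actually ran
}

func newRegistry() *registry {
	return &registry{agents: make(map[string]struct{})}
}

// startAgent initializes only when this exact endpoint has not been seen before.
func (r *registry) startAgent(endpoint string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.agents[endpoint]; !ok {
		r.agents[endpoint] = struct{}{}
		r.global = &provider{endpoint: endpoint} // overwrites any earlier provider
		r.inits++
	}
}

func main() {
	r := newRegistry()
	r.startAgent("endpoint-a") // e.g. the REST server
	r.startAgent("endpoint-b") // e.g. the RPC server: new key, so it initializes again
	r.startAgent("endpoint-a") // already seen: skipped by the guard
	fmt.Println(r.inits, r.global.endpoint) // prints "2 endpoint-b"
}
```

The guard deduplicates per endpoint, but the resource it protects is a single global, so the second distinct endpoint replaces the first provider.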

Solution

Adopted the sync.Once pattern used by prometheus.StartAgent and logx.SetUp to ensure trace initialization happens exactly once:

var (
    once sync.Once
    tp   *sdktrace.TracerProvider
)

func StartAgent(c Config) {
    if c.Disabled {
        return
    }
    once.Do(func() {
        if err := startAgent(c); err != nil {
            logx.Error(err)
        }
    })
}
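The first-call-wins behavior can be checked with a small standalone sketch. The `tracerProvider` and `agent` types below are hypothetical and only illustrate the sync.Once pattern; they are not the actual core/trace implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// tracerProvider is a hypothetical stand-in for *sdktrace.TracerProvider.
type tracerProvider struct{ endpoint string }

// agent groups the once guard with the provider it protects.
type agent struct {
	once sync.Once
	tp   *tracerProvider
}

// start initializes the provider on the first call only; later calls
// return the provider created by the first call, whatever endpoint they pass.
func (a *agent) start(endpoint string) *tracerProvider {
	a.once.Do(func() {
		a.tp = &tracerProvider{endpoint: endpoint}
	})
	return a.tp
}

func main() {
	var a agent
	first := a.start("endpoint-a")
	second := a.start("endpoint-b") // ignored: once.Do has already run
	fmt.Println(first == second, second.endpoint) // prints "true endpoint-a"
}
```

One property of sync.Once worth keeping in mind: the call counts as done even if the wrapped function fails, so with this pattern a failed first initialization is not retried by later calls.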

Changes

✅ Replaced agents map and lock mutex with sync.Once
✅ Updated StartAgent() to wrap initialization in once.Do()
✅ Simplified StopAgent() by removing unnecessary lock
✅ Added documentation explaining the behavior
✅ Updated TestStartAgent to verify new sync.Once behavior

Benefits

Consistent with go-zero patterns: Both prometheus.StartAgent and logx.SetUp use sync.Once
Predictable behavior: The first configuration wins; subsequent calls are safely ignored
Safe multi-server setup: Works correctly with REST + RPC servers
Prevents silent configuration loss: No more overwriting
Simpler code: Fewer lines, clearer intent

Testing

✅ All existing tests pass (core/trace, core/service, rest, zrpc)
✅ Created verification tests demonstrating the fix
✅ No breaking changes, fully backward compatible

Verification

Before fix - TracerProvider being overwritten:

First StartAgent: tp=0x1400023a870
Second StartAgent: tp=0x1400023a990  // Different pointer!

After fix - TracerProvider remains consistent:

First StartAgent: tp=0x140001c6870
Second StartAgent: tp=0x140001c6870  // Same pointer

codecov · 2025-10-13T23:33:13Z

Codecov Report

❌ Patch coverage is 71.42857% with 2 lines in your changes missing coverage. Please review.

Files with missing lines   Patch %   Lines
core/trace/agent.go        71.42%    1 missing and 1 partial

core/trace/agent.go

Fixes zeromicro#5242

Problem: When running multiple servers (REST + RPC) with tracing enabled, the trace agent would reinitialize for each server with different endpoints, causing the global TracerProvider to be overwritten. Only the last configuration would take effect, and the first configuration was silently lost.

Root Cause: The StartAgent function only prevented re-initialization when the exact same endpoint was used. If different endpoints were configured (common in multi-server setups), startAgent() would be called again, creating a new TracerProvider and overwriting the global one via otel.SetTracerProvider().

Solution: Adopted the sync.Once pattern used by prometheus.StartAgent and logx.SetUp to ensure trace initialization happens only once. The first configuration wins, and subsequent calls are safely ignored.

Changes:
- Replaced agents map and lock mutex with sync.Once
- Updated StartAgent() to wrap initialization in once.Do()
- Simplified StopAgent() by removing unnecessary lock
- Added documentation explaining the behavior
- Updated TestStartAgent to verify new sync.Once behavior

Benefits:
- Consistent with go-zero patterns (Prometheus, logx)
- Predictable behavior (first config wins)
- Safe multi-server setup
- Prevents silent configuration loss
- Simpler code

Testing:
- All existing tests pass (core/trace, core/service, rest, zrpc)
- Created verification tests demonstrating the fix
- No breaking changes, fully backward compatible

Signed-off-by: kevin <[email protected]>

zcong1993 reviewed Oct 15, 2025

core/trace/agent.go (review thread resolved)

kevwan mentioned this pull request Oct 24, 2025

fix(trace): use sync.Once to prevent multiple trace initialization #5257

Closed

kevwan force-pushed the chore/undefined branch 2 times, most recently from e41e015 to ec65a86 Compare October 25, 2025 11:47

chore: use sync.OnceFunc to guarantee shutdown once

4bdaf55

Signed-off-by: kevin <[email protected]>
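
The diff for this commit is not shown on this page. As a rough illustration of what sync.OnceFunc (available since Go 1.21) provides, here is a minimal sketch; `newShutdown` and its counter are hypothetical and stand in for whatever cleanup the real StopAgent performs.

```go
package main

import (
	"fmt"
	"sync"
)

// newShutdown wraps a cleanup step in sync.OnceFunc so that repeated
// calls to the returned function run the cleanup at most once.
// The counter stands in for flushing and shutting down a tracer provider.
func newShutdown(calls *int) func() {
	return sync.OnceFunc(func() {
		*calls++
	})
}

func main() {
	calls := 0
	shutdown := newShutdown(&calls)
	shutdown()
	shutdown() // no effect: the wrapped function has already run
	fmt.Println(calls) // prints "1"
}
```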

kevwan force-pushed the chore/undefined branch from ec65a86 to 4bdaf55 Compare October 25, 2025 11:52

kevwan merged commit 4e52d77 into zeromicro:master Oct 25, 2025
5 of 6 checks passed

kevwan deleted the chore/undefined branch October 25, 2025 12:10
