feat: parallelize NGAP processing in AMF #194
base: main
Conversation
Pull request overview
This PR implements parallel processing of NGAP messages in the AMF component by introducing a worker pool architecture with hash-based task distribution. The goal is to eliminate performance bottlenecks from the previous sequential processing model while maintaining per-UE message ordering guarantees.
Changes:
- Introduced a UE scheduler with configurable worker pool for concurrent NGAP message processing
- Added lightweight UE ID extraction logic to route messages to appropriate workers without full ASN.1 unmarshalling
- Refactored connection handling to dispatch messages asynchronously through the worker pool
- Added configuration options for worker pool size and task buffer size with graceful defaults
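To make the routing model concrete, here is a small self-contained Go sketch of the principle only (it does not use this PR's actual `UEScheduler` types): tasks are hashed by UE ID onto a fixed per-worker queue, so messages of one UE stay ordered while different UEs proceed in parallel.

```go
package main

import (
	"fmt"
	"sync"
)

// task mirrors the idea of the PR's Task (UE ID + raw NGAP message); it is a
// local stand-in, not the PR's type.
type task struct {
	ueID uint64
	msg  []byte
}

func main() {
	const numWorkers = 4
	queues := make([]chan task, numWorkers)
	var wg sync.WaitGroup

	for i := 0; i < numWorkers; i++ {
		queues[i] = make(chan task, 16)
		wg.Add(1)
		go func(id int, q <-chan task) {
			defer wg.Done()
			// One goroutine per queue: messages of a given UE are handled
			// strictly in the order they were enqueued.
			for t := range q {
				fmt.Printf("worker %d: UE %d, %d bytes\n", id, t.ueID, len(t.msg))
			}
		}(i, queues[i])
	}

	// hash(UE_ID) % N routing: the same UE always lands on the same queue.
	dispatch := func(t task) {
		queues[t.ueID%numWorkers] <- t
	}

	for i := 0; i < 3; i++ {
		dispatch(task{ueID: 12345, msg: []byte{byte(i)}}) // same UE -> same worker, ordered
		dispatch(task{ueID: 67890, msg: []byte{byte(i)}}) // different UE -> possibly another worker
	}

	for _, q := range queues {
		close(q)
	}
	wg.Wait()
}
```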
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 19 comments.
| File | Description |
|---|---|
| pkg/factory/config.go | Added configuration fields and getters for NGAP worker pool size and task buffer size |
| pkg/service/init.go | Added scheduler initialization on startup and graceful shutdown on termination |
| internal/ngap/scheduler.go | Implemented worker pool architecture with hash-based UE-to-worker routing |
| internal/ngap/ue_id_extractor.go | Implemented lightweight UE ID extraction from NGAP messages covering all major message types |
| internal/ngap/service/service.go | Modified connection handler to dispatch messages through worker pool with fallback |
| internal/ngap/scheduler_test.go | Added comprehensive tests for hash distribution, concurrency, sequentiality, and shutdown |
| internal/ngap/ue_id_extractor_test.go | Added tests for UE ID extraction across 9 different NGAP message types |
internal/ngap/scheduler_test.go (Outdated)

```go
	time.Sleep(3 * time.Second)

	expectedTotal := numGoroutines * tasksPerGoroutine
	actualProcessed := atomic.LoadInt32(&processedCount)

	t.Logf("Expected %d tasks, processed %d tasks", expectedTotal, actualProcessed)
	assert.Equal(t, int32(expectedTotal), actualProcessed,
		"All tasks should be processed")

	// Verify distribution
	t.Log("Tasks processed per worker:")
	for i := 0; i < numWorkers; i++ {
		count := processedByWorker[i]
		t.Logf("  Worker %d: %d tasks", i, count)
	}
}

func TestScheduler_PerUESequentiality(t *testing.T) {
	// Test that messages for the same UE are processed in order
	numWorkers := 4
	ueID := uint64(12345)
	numMessages := 100

	var processedOrder []int
	var mu sync.Mutex

	handler := func(conn net.Conn, msg []byte) {
		// Extract message sequence number from message
		seqNum := int(msg[0])
		mu.Lock()
		processedOrder = append(processedOrder, seqNum)
		mu.Unlock()
		// Small delay to test ordering
		time.Sleep(1 * time.Millisecond)
	}

	scheduler := NewUEScheduler(numWorkers, 1000, handler)
	defer scheduler.Shutdown()

	// Submit messages for the same UE in order
	for i := 0; i < numMessages; i++ {
		task := Task{
			UEID:    ueID,
			Conn:    &mockConn{},
			Message: []byte{byte(i)},
		}
		scheduler.DispatchTask(task)
	}

	// Wait for all messages to be processed
	time.Sleep(2 * time.Second)

	// Verify messages were processed in order
	require.Equal(t, numMessages, len(processedOrder),
		"All messages should be processed")

	for i := 0; i < numMessages; i++ {
		assert.Equal(t, i, processedOrder[i],
			"Message %d should be processed in order", i)
	}
}

func TestScheduler_MultipleUEsConcurrent(t *testing.T) {
	// Test multiple UEs being processed concurrently
	numWorkers := 8
	numUEs := 20
	messagesPerUE := 50

	processedByUE := make(map[uint64][]int)
	var mu sync.Mutex

	handler := func(conn net.Conn, msg []byte) {
		ueID := uint64(msg[0])
		seqNum := int(msg[1])

		mu.Lock()
		processedByUE[ueID] = append(processedByUE[ueID], seqNum)
		mu.Unlock()

		time.Sleep(1 * time.Millisecond)
	}

	scheduler := NewUEScheduler(numWorkers, 1000, handler)
	defer scheduler.Shutdown()

	var wg sync.WaitGroup
	wg.Add(numUEs)

	// Each UE submits messages in its own goroutine
	for ueIdx := 0; ueIdx < numUEs; ueIdx++ {
		go func(ueID uint64) {
			defer wg.Done()

			for msgIdx := 0; msgIdx < messagesPerUE; msgIdx++ {
				task := Task{
					UEID:    ueID,
					Conn:    &mockConn{},
					Message: []byte{byte(ueID), byte(msgIdx)},
				}
				scheduler.DispatchTask(task)
				// Small random delay between messages
				time.Sleep(100 * time.Microsecond)
			}
		}(uint64(ueIdx))
	}

	wg.Wait()

	// Give workers time to process
	time.Sleep(3 * time.Second)

	// Verify each UE's messages were processed in order
	for ueID := uint64(0); ueID < uint64(numUEs); ueID++ {
		messages := processedByUE[ueID]
		require.Equal(t, messagesPerUE, len(messages),
			"UE %d should have all messages processed", ueID)

		for i := 0; i < messagesPerUE; i++ {
			assert.Equal(t, i, messages[i],
				"UE %d message %d should be in order", ueID, i)
		}
	}
}

func TestScheduler_GracefulShutdown(t *testing.T) {
	// Test graceful shutdown of scheduler
	numWorkers := 4

	var processedCount int32
	handler := func(conn net.Conn, msg []byte) {
		atomic.AddInt32(&processedCount, 1)
		time.Sleep(10 * time.Millisecond)
	}

	scheduler := NewUEScheduler(numWorkers, 100, handler)

	// Submit some tasks
	for i := 0; i < 50; i++ {
		task := Task{
			UEID:    uint64(i),
			Conn:    &mockConn{},
			Message: []byte{0x00},
		}
		scheduler.DispatchTask(task)
	}

	// Give some time for processing to start
	time.Sleep(100 * time.Millisecond)

	// Shutdown
	scheduler.Shutdown()

	// Verify some tasks were processed (not all, due to shutdown)
	processed := atomic.LoadInt32(&processedCount)
	t.Logf("Processed %d tasks before shutdown", processed)
	assert.Greater(t, processed, int32(0),
		"Some tasks should be processed before shutdown")
}

func TestScheduler_WorkerCount(t *testing.T) {
	testCases := []struct {
		name          string
		numWorkers    int
		expectedCount int
	}{
		{"Single worker", 1, 1},
		{"Four workers", 4, 4},
		{"Eight workers", 8, 8},
		{"Auto-detect (0)", 0, -1}, // -1 means check > 0
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			scheduler := NewUEScheduler(tc.numWorkers, 100,
				func(conn net.Conn, msg []byte) {})
			defer scheduler.Shutdown()

			actualCount := len(scheduler.workers)
			if tc.expectedCount == -1 {
				assert.Greater(t, actualCount, 0,
					"Auto-detected worker count should be > 0")
			} else {
				assert.Equal(t, tc.expectedCount, actualCount,
					"Worker count should match expected")
			}
		})
	}
}

func TestScheduler_NonUEMessage(t *testing.T) {
	// Test handling of non-UE messages (UE ID = 0)
	numWorkers := 4

	var processedCount int32

	handler := func(conn net.Conn, msg []byte) {
		atomic.AddInt32(&processedCount, 1)
	}

	scheduler := NewUEScheduler(numWorkers, 100, handler)
	defer scheduler.Shutdown()

	// Submit non-UE messages (UE ID = 0)
	// All should go to the same worker (determined by hash)
	expectedWorkerIndex := scheduler.hashUEID(0)

	for i := 0; i < 20; i++ {
		task := Task{
			UEID:    0, // Non-UE message
			Conn:    &mockConn{},
			Message: []byte{0x00},
		}

		// Verify they all go to the same worker
		workerIndex := scheduler.hashUEID(0)
		assert.Equal(t, expectedWorkerIndex, workerIndex,
			"All non-UE messages should route to the same worker")

		scheduler.DispatchTask(task)
	}

	time.Sleep(500 * time.Millisecond)
```
Copilot AI · Jan 12, 2026
The test relies on time.Sleep() calls to wait for message processing to complete (lines 155, 205, 264, 376). This makes tests non-deterministic and unnecessarily slow. Consider using synchronization primitives like channels or WaitGroups in the mock handler to signal when processing is complete, rather than arbitrary sleep durations.
Change: Refactor scheduler_test.go to use synchronization primitives instead of time.Sleep
- Eliminated time.Sleep: Removed all arbitrary sleep calls (time.Sleep) used to wait for async processing.
- Implemented sync.WaitGroup: Introduced sync.WaitGroup in the test functions and mock handlers to track the exact completion of task processing.
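For illustration, a minimal sketch of that pattern (assuming it sits in the same test package as `scheduler_test.go`, reusing its `Task`, `NewUEScheduler`, `mockConn`, and testify imports): the handler signals a `sync.WaitGroup`, so the test waits exactly as long as processing actually takes.

```go
// Sketch only: relies on the Task/NewUEScheduler/mockConn definitions already
// present in this package's tests.
func TestScheduler_AllTasksProcessed_NoSleep(t *testing.T) {
	const numTasks = 100

	var done sync.WaitGroup
	done.Add(numTasks)

	var processedCount int32
	handler := func(conn net.Conn, msg []byte) {
		atomic.AddInt32(&processedCount, 1)
		done.Done() // signal completion instead of sleeping in the test body
	}

	scheduler := NewUEScheduler(4, 1000, handler)
	defer scheduler.Shutdown()

	for i := 0; i < numTasks; i++ {
		scheduler.DispatchTask(Task{
			UEID:    uint64(i),
			Conn:    &mockConn{},
			Message: []byte{byte(i)},
		})
	}

	done.Wait() // returns as soon as the final task has been handled
	assert.Equal(t, int32(numTasks), atomic.LoadInt32(&processedCount))
}
```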
```go
// Shutdown gracefully shuts down all workers.
func (s *UEScheduler) Shutdown() {
	s.workerMutex.Lock()
	defer s.workerMutex.Unlock()

	logger.NgapLog.Info("Shutting down UE Scheduler and all workers...")

	var wg sync.WaitGroup
	for i, worker := range s.workers {
		wg.Add(1)
		go func(idx int, w *Worker) {
			defer wg.Done()
			logger.NgapLog.Infof("Closing task channel for Worker %d", idx)
			close(w.taskChan)
		}(i, worker)
	}
	wg.Wait()

	logger.NgapLog.Info("All workers shut down successfully")
}
```
Copilot AI · Jan 12, 2026
The Worker's wg WaitGroup is added to when workers start but never waited upon during shutdown. Each worker calls wg.Done() when it exits, but there's no corresponding wg.Wait() in the shutdown sequence. This means the Shutdown() method can return before all workers have finished processing their remaining tasks and exited cleanly. The UEScheduler should store a reference to this WaitGroup and wait for all workers to complete during shutdown.
Change: Integrated sync.WaitGroup into the UEScheduler struct and ensured Shutdown() waits for all workers. Removed the unused stopChan and Stop() method.
Rationale: This ensures the AMF process does not exit until all workers have finished processing their remaining tasks.
```go
// internal/ngap/scheduler.go
type UEScheduler struct {
	// ...
	wg sync.WaitGroup // <--- Stored in struct
}

func NewUEScheduler(...) *UEScheduler {
	// ...
	for i := 0; i < numWorkers; i++ {
		// Pass the scheduler's WG reference to workers
		scheduler.workers[i] = NewWorker(i, taskBufferSize, handler, &scheduler.wg)
	}
	return scheduler
}

func (s *UEScheduler) Shutdown() {
	s.workerMutex.Lock()
	// ... close channels ...
	s.workerMutex.Unlock()

	s.wg.Wait() // <--- Explicit wait prevents data loss
	logger.NgapLog.Info("All workers shut down successfully")
}
```
I’d like to double-check whether the current implementation can run into a send on closed channel issue.
Shutdown() closes w.taskChan, while UEScheduler.DispatchTask() may still be running in a different goroutine and calling worker.Submit(task). If these overlap, it seems possible for a send to occur after the channel has been closed.
Could you help confirm whether this race is already prevented by the current design, or if additional coordination (e.g. a scheduler-level stop signal or submit-side guarding) is needed?
After consideration, I have decided to revert to the stopChan + select design.
Implementation Details:
- Mechanism: `taskChan` is never explicitly closed (it is left to the GC). `Submit` uses a `select` to manage both shutdown behavior and normal traffic flow, ensuring backpressure is maintained (see the sketch after the performance numbers below).
- Draining: A `drainAndExit` function is added to process residual packets during shutdown.
- Safety: This approach resolves both the "send on closed channel" panic and the "shutdown deadlock" issue.
- Lock-Free: All mutexes have been removed. The worker structure is effectively read-only after initialization, and the shutdown process no longer depends on the specific closing order of `taskChan`.
- Error Handling: `defer recover` is retained in the `run` loop strictly to handle unexpected runtime errors (e.g., nil pointer dereferences) and prevent the worker from crashing.
Performance Impact (Hot Path Analysis):
I acknowledge that using select in the Submit hot path introduces overhead. I conducted a simple test using UERANSIM with 100 concurrent UEs (-n 100) to measure the impact on registration time (calculated via log timestamps):
- The initial PR version (which had no extra overhead in Submit): 10.292 seconds
- Current Version (Select-based): 11.829 seconds
Result: The current safety-focused design incurs an approximate 14% performance cost.
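As a reference, here is a sketch of the described `Submit` under this design (the field names `taskChan`, `stopChan`, and `ID` are assumed from this thread; this is not the exact committed code): the send blocks while the buffer is full, preserving backpressure, and only gives up if the stop signal fires first.

```go
// Sketch of the select-based Submit described above (field names assumed).
func (w *Worker) Submit(task Task) {
	select {
	case w.taskChan <- task:
		// Normal path: blocks while the worker's buffer is full, so SCTP reading
		// naturally slows down (backpressure) instead of dropping signalling.
	case <-w.stopChan:
		// Shutdown in progress: return immediately rather than blocking forever;
		// already-buffered tasks are handled by drainAndExit.
		logger.NgapLog.Warnf("Worker %d is stopping; dropping task for UE ID %d", w.ID, task.UEID)
	}
}
```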
internal/ngap/scheduler.go (Outdated)

```go
// Submit submits a task to this worker's queue.
func (w *Worker) Submit(task Task) {
	w.taskChan <- task
```
Copilot AI · Jan 12, 2026
The Submit method performs a blocking send on the task channel. If the worker's buffer is full and no workers are consuming (e.g., during shutdown or if a worker panics), this will block indefinitely, potentially deadlocking the system. Consider using a select statement with a timeout or a default case to handle the scenario where the buffer is full, allowing the caller to handle backpressure appropriately.
Suggested change:

```diff
-	w.taskChan <- task
+	select {
+	case w.taskChan <- task:
+		// Task successfully enqueued.
+	case <-w.stopChan:
+		// Worker is stopping; avoid blocking and drop the task.
+		logger.NgapLog.Warnf("Worker %d is stopping; dropping task for UE ID %d", w.ID, task.UEID)
+	default:
+		// Channel buffer is full; avoid blocking and drop the task to prevent deadlock.
+		logger.NgapLog.Warnf("Worker %d task channel full; dropping task for UE ID %d", w.ID, task.UEID)
+	}
```
Change: Moved the blocking Submit call outside the workerMutex lock scope.
Rationale: This prevents a deadlock scenario where a full channel blocks Submit (holding the RLock), preventing Shutdown from acquiring the Lock. Note that I retained the blocking behavior (instead of dropping tasks) to maintain backpressure for flow control in the 5G control plane.
```go
// internal/ngap/scheduler.go
func (s *UEScheduler) DispatchTask(task Task) {
	s.workerMutex.RLock()
	// Hash the UE ID to determine which worker should handle it
	workerIndex := s.hashUEID(task.UEID)
	worker := s.workers[workerIndex]
	s.workerMutex.RUnlock() // <--- Unlock BEFORE submitting

	logger.NgapLog.Debugf("Dispatching UE ID %d to Worker %d", task.UEID, workerIndex)
	worker.Submit(task) // <--- Safe blocking submission
}
```
My understanding is that changing only DispatchTask does not fully address the issue. If w.taskChan is full, UEScheduler can still be blocked on worker.Submit(task).
I would suggest that the select-based handling inside Submit is still necessary, so that cases like a full buffer or stopChan being closed are explicitly handled. This also ensures proper behavior during worker shutdown.
I have re-implemented the select-based logic within Submit. Currently, when the worker is full, the operation blocks to ensure backpressure, which effectively reduces retransmissions and prevents signaling storms. If a shutdown signal is received while the worker is full, Submit will exit immediately, leaving drainAndExit to handle the remaining packets.
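A sketch of the worker loop this implies (again with assumed field names `taskChan`, `stopChan`, `handler`, and `ID`; the committed code may differ in detail): the `run` loop recovers from unexpected panics, and `drainAndExit` empties whatever is already buffered before the goroutine exits.

```go
// Sketch only; mirrors the run/drainAndExit behavior described in this thread.
func (w *Worker) run() {
	defer func() {
		if r := recover(); r != nil {
			logger.NgapLog.Errorf("Worker %d recovered from panic: %v", w.ID, r)
		}
	}()

	for {
		select {
		case task := <-w.taskChan:
			w.handler(task.Conn, task.Message)
		case <-w.stopChan:
			w.drainAndExit()
			return
		}
	}
}

func (w *Worker) drainAndExit() {
	for {
		select {
		case task := <-w.taskChan:
			// Process residual packets that were enqueued before shutdown.
			w.handler(task.Conn, task.Message)
		default:
			logger.NgapLog.Infof("Worker %d drained its queue and is exiting", w.ID)
			return
		}
	}
}
```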
```go
scheduler, err := ngap_internal.GetScheduler()
if err != nil {
	// Fallback to direct handling if scheduler is not initialized
	logger.NgapLog.Warnf("Scheduler not initialized, falling back to sequential processing: %v", err)
```
Copilot AI · Jan 12, 2026
When the scheduler is not initialized, the code falls back to sequential processing with a warning log. However, there's no mechanism to prevent continued operation in this degraded state. If scheduler initialization fails during startup, the system silently continues without parallelization. Consider making scheduler initialization a hard requirement, or at least log this as an error rather than a warning to make the degraded state more visible.
Suggested change:

```diff
-	logger.NgapLog.Warnf("Scheduler not initialized, falling back to sequential processing: %v", err)
+	logger.NgapLog.Errorf("Scheduler not initialized, falling back to sequential processing: %v", err)
```
internal/ngap/scheduler.go (Outdated)

```go
// Convert uint64 to bytes for hashing
b := make([]byte, 8)
for i := 0; i < 8; i++ {
	b[i] = byte(ueID >> (i * 8))
}
h.Write(b)
```
Copilot AI · Jan 12, 2026
The byte conversion for hashing uses little-endian order (shifting right and taking lower bytes first). While this is not incorrect, using a standard library function like binary.LittleEndian.PutUint64() would be clearer and more maintainable. This makes the byte order explicit and follows Go conventions.
Change: Replaced fnv hashing with simple modulo arithmetic.
Rationale: fnv required memory allocation (make([]byte)) on every packet. Using modulo is zero-allocation and sufficient for load distribution.
```go
// internal/ngap/scheduler.go
func (s *UEScheduler) hashUEID(ueID uint64) int {
	return int(ueID % uint64(s.numWorkers))
}
```
internal/ngap/scheduler.go (Outdated)

```go
// Stop gracefully stops the worker.
func (w *Worker) Stop() {
	close(w.stopChan)
}
```
Copilot AI · Jan 12, 2026
The stopChan is closed in the Stop() method but never actually used in the Shutdown() function. The Shutdown method only closes the taskChan, but doesn't call Stop() on individual workers. This means the stopChan case in the worker's run() loop is dead code. Either remove the stopChan or properly utilize it in the shutdown sequence.
#194 (comment)
I removed the unused stopChan and Stop() method. Worker termination is now handled purely by closing the taskChan, which is a cleaner and more idiomatic Go pattern.
pkg/factory/config.go (Outdated)

```go
if c.Configuration != nil && c.Configuration.NgapWorkerPoolSize > 0 {
	return c.Configuration.NgapWorkerPoolSize
}
return 0 // 0 means auto-detect based on CPU cores
```
Copilot AI · Jan 12, 2026
The configuration getters return 0 for auto-detect when NgapWorkerPoolSize is not set or is 0. However, the InitScheduler function is called with this 0 value, and within InitScheduler, it checks if numWorkers <= 0 and defaults to runtime.NumCPU(). This logic is duplicated - InitScheduler already handles the auto-detect case, so the comment on line 1043 is misleading as the function doesn't directly perform auto-detection based on the returned 0.
Suggested change:

```diff
-	return 0 // 0 means auto-detect based on CPU cores
+	return 0 // 0 indicates that the caller (e.g., InitScheduler) should auto-detect based on CPU cores
```
Updated the comment in config.go to clearly explain that a return value of 0 signals the caller to perform auto-detection.
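For reference, the getter would then read roughly as follows (a sketch; the getter name `GetNgapWorkerPoolSize` and the `Config` receiver are assumptions, since only the body is shown in the excerpt above):

```go
// Sketch only: identifiers outside the excerpt above are assumptions.
func (c *Config) GetNgapWorkerPoolSize() int {
	if c.Configuration != nil && c.Configuration.NgapWorkerPoolSize > 0 {
		return c.Configuration.NgapWorkerPoolSize
	}
	// 0 indicates that the caller (e.g., InitScheduler) should auto-detect
	// based on CPU cores.
	return 0
}
```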
pkg/service/init.go (Outdated)

```go
if err := ngap.InitScheduler(workerPoolSize, taskBufferSize, ngap.Dispatch); err != nil {
	logger.InitLog.Fatalf("Failed to initialize NGAP scheduler: %v", err)
}
```
Copilot AI · Jan 12, 2026
If InitScheduler returns an error (from the initErr variable), the code calls logger.InitLog.Fatalf which terminates the program. However, InitScheduler never actually sets initErr to a non-nil value - the function always returns nil. Either remove the error return type from InitScheduler or properly implement error handling for initialization failures (e.g., if worker creation fails).
Suggested change:

```diff
-	if err := ngap.InitScheduler(workerPoolSize, taskBufferSize, ngap.Dispatch); err != nil {
-		logger.InitLog.Fatalf("Failed to initialize NGAP scheduler: %v", err)
-	}
+	ngap.InitScheduler(workerPoolSize, taskBufferSize, ngap.Dispatch)
```
Change: Removed the error return from InitScheduler.
Rationale: Since invalid inputs (<=0) are now handled by applying safe defaults (auto-detect or 4096 buffer), the function cannot fail. Removing the return value clarifies the API and satisfies linter checks.
```go
// internal/ngap/scheduler.go
func InitScheduler(numWorkers int, taskBufferSize int, handler func(conn net.Conn, msg []byte)) {
	globalSchedulerOnce.Do(func() {
		if numWorkers <= 0 {
			numWorkers = runtime.NumCPU()
		}
		if taskBufferSize <= 0 {
			taskBufferSize = 4096
		}
		// ... init ...
	})
	// No return error needed
}
```
internal/ngap/service/service.go (Outdated)

```go
// dispatchToWorkerPool extracts the UE ID and dispatches the task to the appropriate worker.
// For non-UE messages (e.g., NGSetupRequest), it dispatches to a default worker (worker 0).
func dispatchToWorkerPool(conn net.Conn, msg []byte, handler NGAPHandler) {
	// Import the ngap package to access the scheduler
```
Copilot AI · Jan 12, 2026
The comment "Import the ngap package to access the scheduler" is misleading. The import statement is at the top of the file (line 12), not at this location. This comment should be removed or clarified to describe what this function does, not how imports work.
Suggested change:

```diff
-	// Import the ngap package to access the scheduler
+	// Get the NGAP scheduler used for dispatching tasks based on UE ID
```
Removed the confusing comment in service.go
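Putting the pieces of this thread together, the dispatch path with fallback looks roughly like the sketch below (the `ExtractUEID` signature and the `HandleMessage` field on `NGAPHandler` are assumptions based on the descriptions in this PR, not verified against the final code):

```go
// Sketch of dispatchToWorkerPool with the sequential fallback discussed above.
func dispatchToWorkerPool(conn net.Conn, msg []byte, handler NGAPHandler) {
	scheduler, err := ngap_internal.GetScheduler()
	if err != nil {
		// Degraded mode: no scheduler, so process the message inline (sequentially).
		logger.NgapLog.Errorf("Scheduler not initialized, falling back to sequential processing: %v", err)
		handler.HandleMessage(conn, msg)
		return
	}

	// Lightweight UE ID extraction; non-UE messages yield 0 and are routed to
	// a fixed default worker by the hash.
	ueID := ngap_internal.ExtractUEID(msg)

	scheduler.DispatchTask(ngap_internal.Task{
		UEID:    ueID,
		Conn:    conn,
		Message: msg,
	})
}
```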
```go
func (s *UEScheduler) hashUEID(ueID uint64) int {
	h := fnv.New64a()
	// Convert uint64 to bytes for hashing
	b := make([]byte, 8)
	for i := 0; i < 8; i++ {
		b[i] = byte(ueID >> (i * 8))
	}
	h.Write(b)
	return int(h.Sum64() % uint64(s.numWorkers))
}
```
Copilot AI · Jan 12, 2026
The hash function creates a new FNV hash instance and allocates a new byte slice on every call. Since this is called for every message dispatch (potentially thousands of times per second), this creates unnecessary allocations and garbage collection pressure. Consider creating a pool of hash instances or using a simpler modulo operation directly on the uint64 value if perfect distribution isn't critical: return int(ueID % uint64(s.numWorkers)).
#194 (comment)
Switched to modulo arithmetic to avoid memory allocation on every packet dispatch.
Subject: Fix deadlock, improve shutdown safety, and optimize performance.
internal/ngap/ue_id_extractor.go (Outdated)

```go
case ngapType.ProcedureCodePathSwitchRequest:
	if msg.Value.PathSwitchRequest != nil {
		for _, ie := range msg.Value.PathSwitchRequest.ProtocolIEs.List {
			if ie.Id.Value == ngapType.ProtocolIEIDRANUENGAPID && ie.Value.RANUENGAPID != nil {
```
Should this be SourceAMFUENGAPID? Otherwise, it will use a worker different from that of existing RanUe.
Yes, I noted this possible issue in the Notes above. Under the current global AMF Context architecture, a UE message triggers a worker switch only once. While this results in a cache miss, it does not lead to any functional errors. However, if the global Context is removed in the future, the allocation mechanisms for both amf-ue-id and ran-ue-id must be modified accordingly.
Architecture & Safety Considerations:
To minimize changes to the core architecture, this implementation continues to utilize the global AMFContext (via sync.Map) for storing UE contexts. Consequently, a UE's InitialUEMessage (keyed by RAN ID) may be processed by Worker A, while subsequent NAS messages (keyed by AMF ID) may be processed by Worker B.
- Safety: This approach is safe because the global context uses a thread-safe `sync.Map` for access. Furthermore, the 5G Request-Response architecture prevents race conditions where a UE sends messages with a RAN ID and an AMF ID simultaneously.
- Future Work: Achieving a strictly local "Per-UE Connection" model (removing global lock contention entirely) would require removing the global `UePool` and modifying the `AMF-UE-ID` allocation mechanism to bind specific IDs to specific workers.
My understanding (please correct me if I’m mistaken) is that while InitialUEMessage only triggers a single worker switch and is functionally safe under the current global AMFContext + sync.Map design, PathSwitchRequest is different.
At this stage, the UE may already have ongoing traffic (e.g. previous uplink NGAP messages) keyed by AMF UE NGAP ID. If PathSwitchRequest is dispatched using the new RAN UE NGAP ID instead of SourceAMFUENGAPID, it could introduce an additional worker switch during an active UE lifecycle.
A possible scenario is:
Initial UE Message → Worker A (RAN ID)
Uplink NAS Transport → Worker B (AMF UE ID)
Path Switch Request → Worker C (new RAN UE ID)
This could lead to multiple workers accessing the same UE context and extra worker hopping.
If my understanding differs from the actual implementation, I’m happy to discuss and clarify.
Sorry, I didn't fully grasp your question at first. You are absolutely right—we should keep the same Source AMF UE NGAP ID. I've updated the code. Thanks!
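For completeness, the corrected case would look roughly like this (assuming the generated `ngapType` package exposes the IE as `ProtocolIEIDSourceAMFUENGAPID` / `ie.Value.SourceAMFUENGAPID`, mirroring the `RANUENGAPID` handling shown earlier; the return shape is shown as `uint64` for illustration and should follow whatever the surrounding extractor uses):

```go
// Sketch of the SourceAMFUENGAPID-based keying for PathSwitchRequest.
case ngapType.ProcedureCodePathSwitchRequest:
	if msg.Value.PathSwitchRequest != nil {
		for _, ie := range msg.Value.PathSwitchRequest.ProtocolIEs.List {
			// Key on the AMF UE NGAP ID the UE already holds, so this message
			// lands on the same worker as the UE's earlier NGAP traffic.
			if ie.Id.Value == ngapType.ProtocolIEIDSourceAMFUENGAPID && ie.Value.SourceAMFUENGAPID != nil {
				return uint64(ie.Value.SourceAMFUENGAPID.Value)
			}
		}
	}
```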
1. What changes were made
- `UEScheduler` (`scheduler.go`): Introduced a worker pool architecture with hash-based task distribution to manage concurrent NGAP message processing.
- `ExtractUEID` (`ue_id_extractor.go`): Created a lightweight decoder to extract UE identifiers (AMF-UE-NGAP-ID or RAN-UE-NGAP-ID) from incoming packets without full ASN.1 unmarshalling.
- `service.go`: Modified `handleConnection` to decouple SCTP reading from message processing. It now dispatches tasks to the worker pool instead of blocking on `handler.HandleMessage`.
- `init.go`: Added initialization and graceful shutdown routines for the scheduler within the AMF service startup/teardown sequence.
- Added `NgapWorkerPoolSize` and `NgapTaskBufferSize` options to `amfcfg.yaml` to allow performance tuning.
- `ue_id_extractor_test.go`: Validates ID extraction across 9 scenarios, including `InitialUEMessage`, `UplinkNASTransport`, `HandoverRequired`, `PDUSessionResourceSetupResponse`, and invalid message handling.
- `scheduler_test.go` (distribution): Verifies hash consistency and uniform distribution across workers (tested with 10,000 UEs).
- `scheduler_test.go` (concurrency): Validates system stability under high load (50 concurrent goroutines submitting 5,000 tasks), ensures strict per-UE message sequentiality, and verifies graceful shutdown procedures.

2. How it works
- Routing: `hash(UE_ID) % N` is used to route all messages belonging to the same UE to the specific worker goroutine, preserving per-UE ordering.
- Non-UE messages (e.g., `NGSetup`) are routed to a default worker to maintain global order where necessary.

3. Why this change is necessary
The previous sequential processing model was a performance bottleneck in the AMF; the worker pool removes it while preserving per-UE message ordering guarantees.

Notes
To minimize changes to the core architecture, this implementation continues to utilize the global `AMFContext` (via `sync.Map`) for storing UE contexts. Consequently, a UE's `InitialUEMessage` (keyed by RAN ID) may be processed by Worker A, while subsequent NAS messages (keyed by AMF ID) may be processed by Worker B.
- Safety: This is safe because the global context uses a thread-safe `sync.Map` for access. Furthermore, the 5G Request-Response architecture prevents race conditions where a UE sends messages with a RAN ID and an AMF ID simultaneously.
- Future Work: A strictly local "Per-UE Connection" model would require removing the global `UePool` and modifying the `AMF-UE-ID` allocation mechanism to bind specific IDs to specific workers.
- Testing: UE load was generated with `sudo ./nr-ue -c ../config/free5gc-ue.yaml -n 30` (supports up to ~512 UEs, though ~100 is recommended to avoid gNB instability).

Architecture & Simple testing result