@Copilot Copilot AI commented Sep 28, 2025

Fixes a critical panic that occurs during high-concurrency RADIUS authentication failures: panic: sync: RUnlock of unlocked RWMutex.

Problem

Under heavy load with frequent authentication failures, the RADIUS service crashes with:

fatal error: sync: RUnlock of unlocked RWMutex

goroutine 77 [running]:
sync.(*RWMutex).RUnlock(...)
github.com/talkincode/toughradius/v8/toughradius.(*RejectItem).IsOver
    toughradius/radius_reject_delay.go:32

The issue occurs in radius_reject_delay.go where read-to-write lock upgrades cause double unlocking:

func (ri *RejectItem) IsOver(max int64) bool {
    ri.Lock.RLock()
    defer ri.Lock.RUnlock()          // Deferred unlock #1
    if time.Since(ri.LastReject).Seconds() > 10 {
        ri.Lock.RUnlock()            // Manual unlock #2 - PANIC!
        ri.Lock.Lock()
        // ...
    }
}

The same pattern existed in RejectCache.GetItem().
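For reference, the failure mode is easy to reproduce in isolation; releasing a read lock that is no longer held is an unrecoverable fatal error in Go's sync package, not an ordinary recoverable panic:

package main

import "sync"

func main() {
    var mu sync.RWMutex
    mu.RLock()
    mu.RUnlock()
    mu.RUnlock() // fatal error: sync: RUnlock of unlocked RWMutex
}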

Solution

Removed the problematic defer statements and switched to explicit manual lock management:

Before:

ri.Lock.RLock()
defer ri.Lock.RUnlock()    // Causes double unlock
if condition {
    ri.Lock.RUnlock()      // Manual unlock before upgrade
    ri.Lock.Lock()
    defer ri.Lock.Unlock()
}
return value

After:

ri.Lock.RLock()
if condition {
    ri.Lock.RUnlock()      // Explicit unlock before upgrade
    ri.Lock.Lock()
    // ... operations ...
    ri.Lock.Unlock()       // Explicit unlock after operations
    return value
}
over := value            // Store result before unlock
ri.Lock.RUnlock()        // Single unlock for normal path
return over
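Putting the pieces together, the fixed IsOver reads roughly as follows (a sketch assembled from the snippets above; the exact field layout of RejectItem is assumed from how the fields are used here):

package toughradius

import (
    "sync"
    "sync/atomic"
    "time"
)

type RejectItem struct {
    Rejects    int64     // consecutive reject count (assumed field)
    LastReject time.Time // time of the most recent reject (assumed field)
    Lock       sync.RWMutex
}

// IsOver reports whether the reject count has exceeded max.
// Entries older than 10 seconds are reset under the write lock.
func (ri *RejectItem) IsOver(max int64) bool {
    ri.Lock.RLock()
    if time.Since(ri.LastReject).Seconds() > 10 {
        ri.Lock.RUnlock() // release the read lock before upgrading
        ri.Lock.Lock()
        // Re-check after acquiring the write lock: another goroutine
        // may have reset the counter in the meantime.
        if time.Since(ri.LastReject).Seconds() > 10 {
            atomic.StoreInt64(&ri.Rejects, 0)
        }
        ri.Lock.Unlock()
        return false
    }
    over := atomic.LoadInt64(&ri.Rejects) > max
    ri.Lock.RUnlock() // single unlock on the normal path
    return over
}

Note that sync.RWMutex does not support atomic lock upgrades, so the re-check after taking the write lock is what keeps the reset correct when two goroutines race through the same branch.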

Testing

Added a comprehensive test suite (radius_reject_delay_test.go) covering:

  • Concurrent access testing (100+ goroutines)
  • Race condition detection (go test -race)
  • Normal operation verification
  • Reset behavior validation

All tests pass with no panics or race conditions detected.
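As an illustration, the concurrent part of such a test can look like the sketch below (built on the RejectItem sketch above; the test name and iteration counts are illustrative, not necessarily what the PR's radius_reject_delay_test.go contains). It is meant to be run with go test -race:

package toughradius

import (
    "sync"
    "sync/atomic"
    "testing"
    "time"
)

func TestRejectItemIsOverConcurrent(t *testing.T) {
    ri := &RejectItem{LastReject: time.Now()}

    var wg sync.WaitGroup
    for g := 0; g < 100; g++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for i := 0; i < 1000; i++ {
                atomic.AddInt64(&ri.Rejects, 1)
                ri.IsOver(7) // must never panic or double-unlock
            }
        }()
    }
    wg.Wait()
}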

Impact

This fix prevents process crashes and systemd restart loops in production deployments experiencing high authentication failure rates. The change is minimal and preserves all existing functionality while ensuring thread safety under concurrent access.

Resolves the panic described in the linked issue, where production systems crash during high-concurrency authentication scenarios.

Original prompt

This section details on the original issue you should resolve

<issue_title>[BUG] panic: sync: RUnlock of unlocked RWMutex: process crashes under high-concurrency RADIUS authentication</issue_title>
<issue_description>While using the project I ran into a high-concurrency bug and patched it with the help of AI. The text below was written by AI:

🐞 Bug Report

panic: sync: RUnlock of unlocked RWMutex: process crashes under high-concurrency RADIUS authentication


1. Environment

  • ToughRADIUS: v8 (commit 9a9edd1 and later)
  • Go: 1.20.x
  • OS: CentOS 7 / Rocky 9 (several identically configured hosts; only the high-concurrency nodes reproduce the issue)
  • Database: PostgreSQL 15

2. Steps to reproduce

  1. Start ToughRADIUS with the default configuration (radiusd.enabled=true, everything else at defaults).

  2. Continuously send Access-Request packets with a wrong password to 1812/UDP to trigger the reject logic (either a script or a real NAS works; see the load-generator sketch after this list).

  3. After a few minutes the process crashes and is restarted by systemd; the first line in journalctl reads:

    fatal error: sync: RUnlock of unlocked RWMutex
    
  4. The top of the stack points to toughradius/radius_reject_delay.go:32, in RejectItem.IsOver().
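A minimal load generator along these lines can drive the reject path for step 2 (a sketch using the layeh.com/radius client library; the target address, shared secret, and user names are placeholders to adjust for the server under test):

package main

import (
    "context"
    "fmt"

    "layeh.com/radius"
    "layeh.com/radius/rfc2865"
)

func main() {
    // 50 workers continuously send Access-Requests with a wrong password,
    // so every request takes the server's reject path.
    for w := 0; w < 50; w++ {
        go func() {
            for i := 0; ; i++ {
                packet := radius.New(radius.CodeAccessRequest, []byte("secret"))
                rfc2865.UserName_SetString(packet, fmt.Sprintf("user%d", i%10))
                rfc2865.UserPassword_SetString(packet, "wrong-password")
                radius.Exchange(context.Background(), packet, "127.0.0.1:1812")
            }
        }()
    }
    select {} // run until interrupted
}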


3. Actual log (excerpt)

fatal error: sync: RUnlock of unlocked RWMutex

goroutine 77 [running]:
sync.fatal(...)
sync.(*RWMutex).rUnlockSlow(...)
sync.(*RWMutex).RUnlock(...)
github.com/talkincode/toughradius/v8/toughradius.(*RejectItem).IsOver
    toughradius/radius_reject_delay.go:32
github.com/talkincode/toughradius/v8/toughradius.(*RadiusService).CheckRadAuthError
    toughradius/errors.go:28
...

4. Expected behavior

Under high-concurrency authentication failures, the process should keep running and the reject rate-limiting logic should work as intended, instead of panicking.


5. Preliminary root cause analysis

  • Introduced by commit 9a9edd1 "Refactor RejectCache to use a read-write mutex for concurrent access"
  • That commit changed the Mutex to an RWMutex but kept the old defer RUnlock(), and in one branch it also calls RUnlock() manually before upgrading to the write lock, so the same read lock is released twice:
ri.Lock.RLock()
defer ri.Lock.RUnlock()          // first unlock
...
ri.Lock.RUnlock()                // second unlock → panic
ri.Lock.Lock()                   // upgrade to write lock

RejectCache.GetItem() has the same double unlock ("read lock + defer + manual RUnlock").


6. Suggested fix (verified)

func (ri *RejectItem) IsOver(max int64) bool {
-   ri.Lock.RLock()
-   defer ri.Lock.RUnlock()
+   ri.Lock.RLock()

    if time.Since(ri.LastReject).Seconds() > 10 {
-       ri.Lock.RUnlock()
+       ri.Lock.RUnlock()              // release explicitly before upgrading
        ri.Lock.Lock()
-       defer ri.Lock.Unlock()
        if time.Since(ri.LastReject).Seconds() > 10 {
            atomic.StoreInt64(&ri.Rejects, 0)
        }
+       ri.Lock.Unlock()
        return false
    }
-   return atomic.LoadInt64(&ri.Rejects) > max
+   over := atomic.LoadInt64(&ri.Rejects) > max
+   ri.Lock.RUnlock()
+   return over
}

Apply the same change to RejectCache.GetItem(): drop the defer RUnlock(), release the read lock manually as needed, and call Unlock() only after the write-lock section.
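A hypothetical sketch of the corrected GetItem, assuming a get-or-create cache keyed by username (the real struct fields and creation logic in the repository may differ), follows the same discipline: take the read lock for the fast path, release it explicitly, and only then take the write lock:

type RejectCache struct {
    Items map[string]*RejectItem // assumed shape
    Lock  sync.RWMutex
}

func (rc *RejectCache) GetItem(username string) *RejectItem {
    rc.Lock.RLock()
    if item, ok := rc.Items[username]; ok {
        rc.Lock.RUnlock()
        return item
    }
    rc.Lock.RUnlock() // release the read lock before taking the write lock

    rc.Lock.Lock()
    // Re-check: another goroutine may have created the entry in the meantime.
    if item, ok := rc.Items[username]; ok {
        rc.Lock.Unlock()
        return item
    }
    item := &RejectItem{LastReject: time.Now()}
    rc.Items[username] = item
    rc.Lock.Unlock()
    return item
}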


7. Impact

Any deployment with high concurrency and frequent authentication failures will trigger this, causing the process to crash repeatedly and be restarted by systemd/NSSM.


8. Attachments

  • 完整 panic stack trace
  • Minimal reproduction script (can be provided on request)

Full change (for reference only): falseen@def3c8d

Thanks to the author: if you need more information or want the patch verified, please @ me 😊</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #184


@Copilot Copilot AI changed the title [WIP] [BUG] panic: sync: RUnlock of unlocked RWMutex: process crashes under high-concurrency RADIUS authentication Fix RWMutex double unlock panic in RADIUS reject delay under high concurrency Sep 28, 2025
@Copilot Copilot AI requested a review from jamiesun September 28, 2025 12:46
Copilot finished work on behalf of jamiesun September 28, 2025 12:46