
<atomic>, <memory>, <execution>: make sure acquire and release are safe to use and start using them #1133

@AlexGuteniev

Description

Currently memory_order_acquire and memory_order_release are considered unsafe.
The problem is that critical sections can overlap in the following situation with mutexes or other synchronization objects:

T1: a.acquire(); a.release(); b.acquire(); b.release();
T2: b.acquire(); b.release(); a.acquire(); a.release();

A release can reorder past a subsequent unrelated acquire, so the critical sections overlap and a deadlock can occur.
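
As an illustration (not STL code), here is a minimal sketch of that scenario with a toy spinlock built on acquire/release; the Spinlock type and the thread functions are hypothetical:

    #include <atomic>
    #include <thread>

    // Toy spinlock built on acquire/release (illustrative only, not STL code).
    struct Spinlock {
        std::atomic<bool> _Locked{false};

        void acquire() noexcept {
            // acquire ordering: operations after the lock may not move above it
            while (_Locked.exchange(true, std::memory_order_acquire)) {
            }
        }

        void release() noexcept {
            // release ordering: operations before the unlock may not move below it
            _Locked.store(false, std::memory_order_release);
        }
    };

    Spinlock a, b;

    void thread1() { // T1
        a.acquire();
        a.release(); // if this release store sinks below the following acquire loop,
        b.acquire(); // T1 effectively still holds a while spinning on b
        b.release();
    }

    void thread2() { // T2: same pattern with the locks in the opposite order
        b.acquire();
        b.release();
        a.acquire();
        a.release();
    }

    int main() {
        std::thread t1(thread1);
        std::thread t2(thread2);
        t1.join();
        t2.join();
    }

Whether such a reordering can actually happen in practice is exactly what the reasoning below tries to rule out.
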
The current resolution is believed to be the following:

  1. An acquire should observe the result of a release in finite time, so release operations cannot be reordered past an infinite number of acquire attempts
  2. In hardware, memory changes take time to propagate, but only a relatively small time, definitely not an infinite one
  3. In software, the compiler either does not reorder operations at all, or does not reorder them past a potentially infinite number of other operations

Unfortunately, 1 is not what the Standard currently says, and 2 and 3 have to be confirmed with compiler vendors.


Until the status of acquire / release is clarified, seq_cst is used in some places, specifically:

  1. atomic_shared_ptr internal spinlock:
    if (!_Repptr.compare_exchange_weak(_Rep, (_Rep & _Ptr_value_mask) | _Locked_notify_needed)) {

    uintptr_t _Rep = _Repptr.exchange(reinterpret_cast<uintptr_t>(_Value));
  2. Non-lock-free atomic (an acquire/release variant of this spinlock is sketched after the list)

    STL/stl/inc/atomic

    Lines 394 to 407 in 12c684b

    inline void _Atomic_lock_spinlock(long& _Spinlock) noexcept {
        while (_InterlockedExchange(&_Spinlock, 1)) {
            _YIELD_PROCESSOR();
        }
    }
    inline void _Atomic_unlock_spinlock(long& _Spinlock) noexcept {
    #if defined(_M_ARM) || defined(_M_ARM64)
        _Memory_barrier();
        __iso_volatile_store32(reinterpret_cast<int*>(&_Spinlock), 0);
        _Memory_barrier();
    #else // ^^^ ARM32/ARM64 hardware / x86/x64 hardware vvv
        _InterlockedExchange(&_Spinlock, 0);
    #endif // hardware
    }
  3. Parallel algorithms in <execution> (more than just this occurrence):
    _State.store(_New_state);
  4. memory_resource.cpp
    memory_resource* const _Temp = __crt_interlocked_read_pointer(&_Default_resource);

    memory_resource* const _Temp = __crt_interlocked_read_pointer(&_Default_resource);

    memory_resource* const _Temp = __crt_interlocked_exchange_pointer(&_Default_resource, _Resource);

    memory_resource* const _Temp = __crt_interlocked_exchange_pointer(&_Default_resource, _Resource);
  5. filesystem.cpp
    auto _Result = __crt_interlocked_read_pointer(_Cache);
    if (_Result) {
        return _Result;
    }
    const HMODULE _HMod = GetModuleHandleW(_Module);
    if (_HMod) {
        _Result = reinterpret_cast<_Fn_ptr>(GetProcAddress(_HMod, _Fn_name));
    }
    if (!_Result) {
        _Result = _Fallback;
    }
    __crt_interlocked_exchange_pointer(_Cache, _Result);
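
For reference, here is a hedged sketch (not a proposed change) of how the spinlock in item 2 could look with explicit acquire/release on std::atomic, assuming the reordering concern above is resolved in favor of acquire/release:

    #include <atomic>

    // Hypothetical acquire/release variant of _Atomic_lock_spinlock /
    // _Atomic_unlock_spinlock, expressed with std::atomic instead of the
    // Interlocked intrinsics; illustrative only.
    inline void _Atomic_lock_spinlock(std::atomic<long>& _Spinlock) noexcept {
        // acquire on the successful exchange keeps the critical section
        // from moving above the lock
        while (_Spinlock.exchange(1, std::memory_order_acquire) != 0) {
            // the real code calls _YIELD_PROCESSOR() here
        }
    }

    inline void _Atomic_unlock_spinlock(std::atomic<long>& _Spinlock) noexcept {
        // release keeps the critical section from moving below the unlock
        _Spinlock.store(0, std::memory_order_release);
    }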

Some places believed to be not affected by the issue still use acquire / release (their common publish pattern is sketched after the list), specifically:

  1. shared_ptr external spinlock:

    STL/stl/src/atomic.cpp

    Lines 16 to 34 in 12c684b

    _CRTIMP2_PURE void __cdecl _Lock_shared_ptr_spin_lock() { // spin until _Shared_ptr_flag successfully set
    #ifdef _M_ARM
        while (_InterlockedExchange_acq(&_Shared_ptr_flag, 1)) {
            __yield();
        }
    #else // _M_ARM
        while (_interlockedbittestandset(&_Shared_ptr_flag, 0)) { // set bit 0
        }
    #endif // _M_ARM
    }
    _CRTIMP2_PURE void __cdecl _Unlock_shared_ptr_spin_lock() { // release previously obtained lock
    #ifdef _M_ARM
        __dmb(_ARM_BARRIER_ISH);
        __iso_volatile_store32(reinterpret_cast<volatile int*>(&_Shared_ptr_flag), 0);
    #else // _M_ARM
        _interlockedbittestandreset(&_Shared_ptr_flag, 0); // reset bit 0
    #endif // _M_ARM
    }
  2. <system_error>

    STL/stl/inc/system_error

    Lines 590 to 597 in 12c684b

    if (_Storage[0].load(memory_order_acquire) != 0) {
        return reinterpret_cast<_Ty&>(_Storage);
    }
    const _Ty _Target;
    const auto _Target_iter = reinterpret_cast<const uintptr_t*>(_STD addressof(_Target));
    _CSTD memcpy(_Storage + 1, _Target_iter + 1, sizeof(_Ty) - sizeof(uintptr_t));
    _Storage[0].store(_Target_iter[0], memory_order_release);
  3. atomic_wait.cpp

    STL/stl/src/atomic_wait.cpp

    Lines 147 to 152 in 12c684b

        _Wait_functions._Api_level.store(_Level, _STD memory_order_release);
        return _Level;
    }

    [[nodiscard]] __std_atomic_api_level _Acquire_wait_functions() noexcept {
        auto _Level = _Wait_functions._Api_level.load(_STD memory_order_acquire);
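
For context, the sites in items 2 and 3 above follow a one-way publish pattern: a value is computed once and published with a release store, and readers pick it up with an acquire load. There is no second lock to overlap with, which is why these sites are believed to be unaffected. A minimal sketch of the pattern (hypothetical names, not the STL's code):

    #include <atomic>

    // Hypothetical stand-in for the _Api_level cache in atomic_wait.cpp:
    // compute a value once, publish it with release, read it back with acquire.
    // No lock ordering is involved, so the overlap concern does not apply.
    enum class _Api_level : int { _Uninitialized = 0, _Ready = 1 };

    std::atomic<_Api_level> _Cached_level{_Api_level::_Uninitialized};

    _Api_level _Init_level() noexcept {
        // stands in for the detection logic in the real code
        return _Api_level::_Ready;
    }

    _Api_level _Acquire_level() noexcept {
        _Api_level _Level = _Cached_level.load(std::memory_order_acquire);
        if (_Level != _Api_level::_Uninitialized) {
            return _Level; // fast path: already published by some thread
        }
        _Level = _Init_level(); // idempotent, so racing initializers agree
        _Cached_level.store(_Level, std::memory_order_release); // publish
        return _Level;
    }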

The task is to confirm the situation with the compiler team and to decide on using memory_order_acquire / memory_order_release in the mentioned (and possibly unmentioned) preexisting code as well as in new code.

Note also that the memory model implementation on ARM may change in the future; see #83, and see also #488, #775, #1082.
