Let's write async-compatible read/write lock

Today, we're going to discuss .NET locks API, how are they (un)fit for the async workflows and thread-pool backed runtimes and what can we do about it. We'll also challenge some of the decades old design decisions and propose a new ones. Finally we're going to implement a working prototype of a read/write lock API that's aligned with modern day .NET coding practices.

What's the problem?

Async workflows in form of C# async/await or F# async/task expressions are pretty much everpresent in any modern .NET codebase that's supposed to do any sort of I/O work. From C# main method down to file API and HTTP handlers, due to their epidemic nature found their place in most of the libraries in current ecosystem and apparently they'll stay with us.

However there's one last bastion, the winds of change seems to avoid - a System.Threading locks API. You may have noticed, that locks as a synchronization primitives are almost never mentioned in .NET concurrency tutorials nowadays. One of the reasons is that more and more projects started to move away to new approaches like actors, channels and lock-free data structures, as new frameworks emerged and made these solutions easier and more applicable. However this is not the only reason. As concurrency model in .NET space moved away from managed threads (backed by actual OS threads) over to user-space threads (eg. Task<'t>/Async<'t>), it came into a direct conflict with an OS-backed locks.

So let's illustrate an issue using following example:

open System.Threading
open FSharp.Control.Tasks

let run (lock: ReaderWriterLockSlim) state = task {
  lock.EnterReadLock() // we can block underlying pooled thread here
  try
    let mutable value = getValue state
    if not (condition value) then
      // we can block underlying pooled thread here
      let lc = lock.UpgradeToWriterLock() 
      try
        do! updateStateAsync state // possible thread switch
        value <- getValue state
      finally
        lock.DowngradeFromWriterLock &lc // fail point
    return value
  finally
    lock.ExitReadLock() // fail point
}

The code above is almost guaranteed to fail, for a simple reason: locks available in System.Threading namespace operate mostly at the level of OS threads and (except Semaphore) don't expose task-aware API! It's actually dangerous to try to use them across async bindings (like let! in F# or await in C#) for two reasons:

  1. Since it's not always possible for a lock to be acquired immediately - because it's being held by another thread - it may suspend current thread until it's available. Notice, that we've said thread, not task. And since tasks are usually executed over a fixed number of CPU-aligned threads (thread pool), this may effectively suspend entire CPU core from picking up any work.
  2. In order to manage enter/exit lock scopes, locks maintain information about holder thread ID in an ambient context of code execution, pined to a particular thread. With async code however (by default) there's no guarantee that the entire code path will be executed on the same single thread. In fact it's quite possible that thread of execution will switch between let!/await bindings. This means, that lock exits will fail as they will be called from threads which didn't originally acquired the lock.

All of that because .NET locks operate on the threads, which are below the level at which tasks themselves operate. They are simply unaware of existence of task scheduler infrastructure.

API design

We'll not going to write down something that just works as a drop-in replacement of ReaderWriterLock. Why? Simply speaking: .NET lock API design ergonomics is bad nonetheless of used language (C# or F#). We have 2021 now and we can do better that that. So to begin with, let's start with a simple draft and use case of an API:

[<Sealed>]
type RWLock<'state>(initialValue: 'state) =
  member Read: CancellationToken -> ValueTask<ReadLockAcquisition<'state>>
  member Write: CancellationToken -> ValueTask<WriteLockAcquisition<'state>>
  interface IDisposable
  
[<Struct>]
type ReadLockAcquisition<'state> =
  member Value: 'state
  member Upgrade: CancellationToken -> ValueTask<WriteLockAcquisition<'state>>
  interface IDisposable
  
[<Struct>]
type WriteLockAcquisition<'state> =
  member Value: 'state with get, set
  interface IDisposable

// sample use
let run (lock: RWLock<int>) (cancel: CancellationToken) = task {
  use! reader = lock.Read cancel
  let mutable value = getValue reader.Value
  if not (condition value) then
    use! writer = reader.Upgrade cancel
    let! state' = updateStateAsync writer.Value
    writer.Value <- state'
    value <- getValue state
  return value
}

Now let's motivate these choices:

  • Unlike .NET locks, this lock has a generic 'state type parameter. It's because this lock isn't used only for synchronization, but also as a container holding the synchronized state. This way we introduce a bit more of compile time safety into our code - you cannot by accident alter a state that you want to protect, you must acquire a read/write lock first.
  • Acquiring read/write lock returns a ValueTask with an acquisition struct - this struct is used as a proxy when accessing protected value. We use ValueTask as in non-contended scenarios, lock acquisition is instantaneous and doesn't require falling back to capabilities of async code. In contended case, we can always return non-finished ValueTask that can be awaited for completion, so that lock.Read/lock.Write won't block underlying thread execution.
  • ReadLockAcquisition has a Value getter, while WriteLockAcquisition offers Value get/set - this is to mark read-only/read-write capabilities of these structs. Unfortunately (unlike ie. in Rust) F# has no way to prevent potential mutable fields of protected state from being changed in read-only case, but this is the second best thing we can do.
  • While we won't cover it here we could make Value property return an inref/byref parameter - this way we could ensure that state protected by the lock doesn't leak outside of synchronized region.
  • ReadLockAcquisition offers an Upgrade method, which promotes it into a write lock. We aim to support reentrancy, which is quite challenging goal with its own set of issues - we'll talk about them later.
  • All types support IDisposable interface, which means that they work well with use! syntax. We need that, since locks will eventually have to be released (also when exceptions are being thrown) - .NET syntax for exiting locks is really cumbersome in this regard.

These decisions were clearly inspired by Rust's tokio RWLock API. We simply adapted it to .NET TPL capabilities even thou we cannot get all benefits due to non-existence of lifecycles in F# type system.

Making operations thread safe with atomics and immutable data

In order to manage all of the reads/writes and operations awaiting for completion, our lock will need multiple different fields that we are about to modify using all-or-nothing semantics. All of that while they're about to be modified in concurrent environment. How are we going to ensure safety of our state update operations - things that are usually protected by locks - when what we are building is the lock itself?

Fortunately OS-level locks are not the only primitives capable of ensuring thread-safe data access. Here we'll use atomic operations - these are simple hardware-level instructions that can be executed on register-sized data. We're interested with reads/writes that will ensure invalidation of CPU's L-cache (a.k.a. volatile fields or operations), but most importantly Compare-And-Swap (CAS) operation, realized in .NET API by Interlocked.CompareExchange method.

Since these are low-level operations and we're going to use them a lot, we'll abstract them using an equivalent of F# ref cell with atomic/thread-safe operations:

open System.Threading

type AtomicRef<'t>(initialValue: 't) when 'a: not struct =
  let mutable value = initialValue
  member this.Value = Volatile.Read(&value)
  member this.Swap(newValue: 't) = Interlocked.Exchange(&value, newValue)
  member this.CompareAndSwap(comparand: 't, newValue: 't) : bool =
    let updated = Interlocked.CompareExchange(&value, newValue, comparand)
    obj.ReferenceEquals(comparand, updated)
    
module Atomic =
  let inline cas (expected: 't) (newValue: 't) (atom: AtomicRef<'t>) =
    atom.CompareAndSwap(expected, newValue)
    
  let inline update (modify: 't -> 't) (atom: AtomicRef<'t>): 't =
    let mutable old = atom.Value
    let mutable newValue = modify old
    while not (atom.CompareAndSwap(old, newValue)) do
      // another thread updated entry in the meantime - just retry
      old <- atom.Value
      newValue <- modify old
    nval
    
let inline (!) (atom: AtomicRef<'t>): 't = atom.Value
let inline (:=) (atom: AtomicRef<'t>) (value: 't): 't = atom.Swap value

With such data type we'll be able to perform thread-safe updates of more complex data types (read: our lock internal state) as long as we'll satisfy two requirements:

  1. Hardware instructions cannot operate on data bigger than register size, which means that our AtomicRef<'t> is limited to work with reference types (as .NET class object references fit into register).
  2. For the same reason, we can only replace entire object references. It means that our objects needs to be immutable - every state change will produce a new object with new reference.

If we'll fail to update state using Atomic.cas function - which is possible if another thread managed to update it before we did - we can simply retry. This is a viable solution in case when update call executes fairly fast, at low cost and (most importantly) idempotently. We can guarantee last property by using immutable data type. For this reasons we also cannot use this approach as a full replacement of the RWLock itself - which is capable of securing heavy and non-idempotent operations (like I/O calls).

Implementation

Here, we're going to cover only the crucial bits of an implementation that you can find on github. It shouldn't be a surprise that it will perform best in low-contention scenarios or situations when critical sections guarded by the lock are expensive enough to justify using it in the first place.

To begin with, we need some way to address a traces of code execution across async/await bindings. As we already mentioned, we cannot rely on tasks/threads for this as they likely will change whenever binding (like F# let! or C# await) will happen. While there's no easy identifier ready out of the box, .NET TPL offers ExecutionContext, which can be used to propagate ambient context, which is exactly what we want.

open System.Threading

type LockId = int

module LockUtils =
  let mutable counter = 0
  let current: AsyncLocal<LockId> = AsyncLocal()
    
  let inline lockId () =
    if current.Value = 0 then
      current.Value <- Interlocked.Increment &counter
    current.Value

With this we'll be able to acquire multiple independent locks from the same code execution path - after our sample code had both read and write locks hold at the same time on the same execution trace. It's one of the safety hatches we aim for. Keep in mind that it doesn't go well with nesting, therefore:

let run () = task {
  let spawn () = task {
    printfn "%i" (LockUtilis.lockId ())
    do! Task.Delay 100
  }
  
  do! Task.WhenAll [| spawn(); spawn(); spawn() |]
}

Will print 1,2,3. However:

let run () = task {
  let spawn () = task {
    printfn "%i" (LockUtilis.lockId ())
    do! Task.Delay 100
  }
  
  printfn "%i" (LockUtilis.lockId ()) // set lock id early
  do! Task.WhenAll [| spawn(); spawn(); spawn() |]
}

Will print 1,1,1,1 - which is bad for our use case - because TPL execution context from parent task will be shared by all of its children and is not reset on parallel calls. You may consider setting it explicitly. Unfortunately I still haven't found a right way to make it work, besides going full berserk and implementing custom synchronization context, which would blow up the scope of this blog post even more.

Now we can go over the shape of a state machine we're going to operate on. We'll split it into two: actual state of a lock and awaiter machinery required to prevent reader/writer acquisition from blocking the current thread. Let's start from describing the lock state machine.

Definition of a read/write lock state machine covers two cases:

  • A shared read state where multiple code execution paths can access a synchronized state. Doing so is safe as long as there are no writers around.
  • An exclusive write state which can be only hold by a single execution path. Notice that it's also safe for the same path to contain read locks as a single path here means there's no risk of concurrent access.

We'll model these cases pretty literally using F# discriminated union:

type LockState =
  | Write of writeLocks:int * readLocks:int * ownerId:LockId
  | Read of readers:Map<LockId, int>
  
  member this.IsOnlyHolder(id: LockId) =
    match this with
    | Read map ->
      Map.isEmpty map || (map.Count = 1 && Map.containsKey id map)
    | Write(_,_, ownerId) when ownerId = id -> true
    | _ -> false
    
  member this.IsReadLocked =
    match this with
    | Read map -> not (Map.isEmpty map)
    | Write(_, reads, _) when reads <> 0 -> false
    | _ -> true
    
  member this.IsWriteLocked =
    match this with
    | Write _ -> true
    | _ -> false

There's another piece of the state, which is responsible for managing the async/await mechanism. As already mentioned, this state is going to be hidden behind AtomicRef we introduced above, therefore it should remain immutable:

type Pending<'t> =
  { id: LockId
    awaiter: TaskCompletionSource<'t>
    cancel: CancellationTokenRegistration option }
        
type State<'t> =
  { value: 't
    stm: LockState
    readerQ: ImmutableQueue<Pending<ReadLockAcquisition<'t>>>
    writerQ: ImmutableQueue<Pending<WriteLockAcquisition<'t>>> }
      
type RWLock<'t>(initialValue: 't) =
  // atomic ref guarding thread-safe lock state updates
  let state = AtomicRef(State.Create initialValue)
  
  member this.Read(?cancellationToken: CancellationToken): ValueTask<ReadLockAcquisition<'a>> =
    let cancellationToken = cancellationToken |> Option.defaultValue CancellationToken.None
    let id = LockUtils.lockId ()
    match this.AcquireRead(id, cancellationToken) with
    // we successfully acquired the read lock
    | ValueNone -> ValueTask<ReadLockAcquisition<'a>>(new ReadLockAcquisition<'a>(this, id))
    // we need to wait for write lock to be released
    | ValueSome awaiter -> ValueTask<ReadLockAcquisition<'a>>(awaiter.Task)
      
  member internal this.AcquireRead(id: LockId, cancel: CancellationToken) = failwith "todo"
  member internal this.ReleaseRead(id: LockId) = failwith "todo"
  member internal this.AcquireWrite(id: LockId, cancel: CancellationToken) = failwith "todo"
  member internal this.ReleaseWrite(id: LockId) = failwith "todo"

So far we presented only a skeleton of an actual RW lock. The only method implemented in the snippet above is Read which we've shown only to present a pattern (used also by Write method): the output of our acquisition methods is optional TaskCompletionSource object, which can be used to await for a lock acquisition to complete in case when it's not possible to be acquired right away.

As you'll see read/write locking paths have a lot of common parts, but they differ in subtle ways. Now let's deep dive into details of read lock acquisition.

Read lock acquisition

We can represent a read lock acquisition by a following structure:

[<Struct>]
type ReadLockAcquisition<'t>(lock: RWLock<'t>, id: LockId) =
  member this.Value with get () = lock.GetValue()
  
  member this.Upgrade() = this.Upgrade(Unchecked.defaultof<_>)
  
  member this.Upgrade(cancellationToken: CancellationToken) : ValueTask<WriteLockAcquisition<'t>> =
    let cancellationToken = cancellationToken |> Option.defaultValue CancellationToken.None
    match lock.AcquireWrite(lockId, cancellationToken) with
    | ValueNone -> ValueTask<_>(new WriteLockAcquisition<'a>(lock, lockId))
    | ValueSome awaiter -> ValueTask<WriteLockAcquisition<'a>>(awaiter.Task)
    
  member this.Dispose() = lock.ReleaseRead(id)
  interface IDisposable with member this.Dispose () = this.Dispose()

The core idea is that it allows us to read synchronized state in read only mode - which is bit of a stretch, as F# language has no features to protect us from altering mutable fields or storing value aside and accessing it after lock has been released. It also allows us to make an upgrade into read-write lock state. Finally, it's disposable which gives us more elegant way of releasing the lock once it's no longer needed.

We already presented a snipped of rwLock.Read method before, but we left a crucial bits till now.

type RWLock<'t>(initialValue: 't) =
  static let cancelRead: Action<obj, CancellationToken> = Action<_,_>(fun o ct ->
    let promise = o :?> TaskCompletionSource<ReadLockAcquisition<'value>>
    promise.TrySetCanceled(ct) |> ignore) 
    
  member this.AcquireRead(id: LockId, cancel: CancellationToken) =
    let rec attempt (state: AtomicRef<_>) (id: LockId) (ct: CancellationToken) =
      if ct.IsCancellationRequested then raise (OperationCanceledException(ct))
      else
        let old = !state
        let updated = old.stm.AdjustRead(id, 1)
        // if read adjust was not possible we return the same input object
        if obj.ReferenceEquals(old.stm, updated) then
          // another task holds a write lock
          let awaiter = TaskCompletionSource<_>()
          let reg =
            if cancellationToken.CanBeCanceled then
              // register awaiter cancellation
              Some (cancellationToken.Register(cancelRead, box awaiter))
            else None
          // enqueue read lock aquisition awaiter
          let readerQ = old.readerQ.Enqueue { id = id; awaiter = awaiter; cancel = reg }
          // try to update the state
          if Atomic.cas old { old with readerQ = readerQ } state then 
            ValueSome awaiter
          else
            // another thread already altered lock state
            // dispose cancellation registration and retry
            reg |> Option.iter (fun c -> c.Dispose())
            attempt state id cancellationToken
        elif Atomic.cas old { old with stm = updated } state then 
          // we successfully acquired a lock
          ValueNone
        else 
          // another thread already altered lock state - retry
          attempt state id ct
    attempt state id cancellationToken

As you may see, there's a lot of things going on here, but the most complex part lies around situation, when lock is already being held in a write mode and cannot be acquired right away. In such case we try to create a TaskCompletionSource<'t>, and queue it on top of readers queue. This queue is used whenever a write lock is being released and read acquisitions are finally allowed to complete.

Additionally we're presenting support for cancellation - after all if we're need to wait for lock for too long, we may no longer be interested in obtaining it at all. It's a tricky piece, since updating AtomicRef state may fail and need to be retried - it's a very nature of compare-and-swap update pattern. However since we're using CancellationTokenSource callback registration, we should remember to dispose it before we want to retry. Always remember to keep some extra precautions when trying to cope immutable lock state with mutable Task Parallel Library classes.

While we already discussed async machinery part of lock acquisition, we still didn't mention what happen in AdjustRead method responsible for lock state machine changes:

type LockState =
  | Write of writeLocks:int * readLocks:int * ownerId:LockId
  | Read of Map<LockId, int>
  member this.AdjustRead(id: LockId, delta: int) =
    match this with
    | Read map ->
      let total =
        match map.TryGetValue(id) with
        | true, total -> total
        | false, _    -> 0
      let modified = total + delta
      if modified > 0 then Read (Map.add id modified map)
      else Read (Map.remove id map)
    | Write(writes, reads, ownerId) when ownerId = id ->
      let modified = reads + delta
      Write(writes, modified, ownerId)
    | _ -> this // another task is holding a write lock

Compared to what we read a while ago, this piece is trivial - we simply increment/decrement a number of read locks acquired by an execution path identified using provided LockId. If a state change was not possible - because another task holds a write lock - we simply return current state object and go to an awaiter creation code we presented before.

The last missing bit (and arguably the most complex one) is lock release process.

type RWLock<'a> =
  member internal this.ReleaseRead(id: LockId) =
    let rec attempt (state: AtomicRef<_>) (id: LockId) =
      let old = !state
      // decrement read lock counter for current LockId
      let mutable updated = old.stm.AdjustRead(id, -1)
      if updated.IsReadLocked then
        if not (Atomic.cas old { old with stm = updated } state) then
          // another thread altered state already - retry
          attempt state id
      else
        let mutable writers = old.writerQ
        let mutable pending = Unchecked.defaultof<_>
        // read and write locks are released, we can try to acquire write now
        if pollNonCancelled(&writers, &pending) then
          // we have awaiting writer
          let stm = Write(1, 0, pending.id)
          if Atomic.cas old { old with stm = stm; writerQ = writers } state then
            if pending.awaiter.TrySetResult(new WriteLockAcquisition<_>(this, pending.id)) then
              // task was successfully completed
              pending.cancel |> Option.iter (fun c -> c.Dispose())
            else
              // task was cancelled before we managed to complete it
              // we need to revert write acquisition
              this.ReleaseWrite(pending.id)
          else attempt state id
        else
          // failsafe - try complete potential readers if writers were cancelled
          this.CompleteAllReaders()
    attempt state id            

Here we try to decrement a read lock count and - if there are no other locks being held - checkout a write lock awaiters queue for non-cancelled awaiter and complete while acquiring write lock. We need to be extra secure here. Even though we might have successfully updated a lock state using atomic operations, we still might run into situation when the whole process fails because task completion source we want to complete has been cancelled in the meantime. If we wouldn't revert acquisition (namely call this.ReleaseWrite), we would permanently deadlock the entire thing. Such is the nature of synchronizing two actions, which might be thread-safe individually, but not when combined together.

Write lock acquisition

A time has come to dig into write lock acquisition. We'll represent it using following structure:

[<Struct>]
type WriteLockAcquisition<'t>(lock: ReentrantLock<'t>, id: LockId) =
    member this.Value
        with get () = lock.GetValue()
        and set value = lock.SetValue(value)
    member this.Dispose() = lock.ReleaseWrite(id)
    interface IDisposable with member this.Dispose () = this.Dispose()

The guts are very similar to ReadLockAcquisition we've written about already - the major difference is that here Value property has both getter and setter defined to reflect its write capabilities.

So, how does the write lock acquisition process look like?

type RWLock<'a> =
  member internal this.AcquireWrite(id: LockId, cancellationToken: CancellationToken) =
    let rec attempt (state: AtomicRef<_>) (id: LockId) (ct: CancellationToken) =
      if ct.IsCancellationRequested then raise (OperationCanceledException(ct))
      else
        let old = !state
        if old.stm.IsOnlyHolder id then
          // we're safe to acquire write lock
          let updated =
            match old.stm with
            | Read map ->
              let reads =
                let (ok, r) = map.TryGetValue(id)
                if ok then r else 0
              Write(1, reads, id)
            | Write(w, r, id) -> Write(w+1, r, id)
          if Atomic.cas old { old with stm = updated } state then
            // succesffully acquire write lock
            ValueNone
          else
            // another thread altered lock state - retry
            attempt state id ct
        else 
          // another task holds a lock
          let awaiter = TaskCompletionSource<_>()
          let reg =
            if cancellationToken.CanBeCanceled then
              Some (cancellationToken.Register(cancelWrite, box awaiter))
            else None
          let writerQ = old.writerQ.Enqueue { id = id; awaiter = awaiter; cancel = reg }
          if Atomic.cas old { old with writerQ = writerQ } state then 
            // successfully updated lock state
            ValueSome awaiter
          else
            // another thread altered lock state
            // dispose cancellation registration and retry
            reg |> Option.iter LockUtils.dispose
            attempt state id cancellationToken
    attempt state id cancellationToken

Logic here is very similar to what we've seen in AcquireRead. The major difference is acquisition assertion - read locks require to check if any other code execution paths holds a write lock, while write locks require check for any kind of lock (both reads and writes). Once it's proven possible, we're transitioning our state lock machine into Write state - possibly copying other read locks held by the same lock ID along the way. If we didn't do so, read→write lock upgrades wouldn't be possible.

During write lock release we do two kinds of checks:

  • If we released all write lock acquisitions, we'll prioritize obtaining possible write lock awaiter and complete it. It goes with the same risk of trying complete TaskCompletionSource that was cancelled while lock state was updated, just like in the read lock release case.
  • If there were no awaiting writers, we're now free to complete awaiting readers - unlike write acquisitions (which are exclusive and therefore need to be completed only one at the time), read acquisitions can be released all at once.
type RWLock<'a> =
  member internal this.ReleaseWrite(id: LockId) =
    let rec attempt (state: AtomicRef<_>) (id: LockId) =
      let old = !state
      let mutable updated =
        match old.stm with
        | Write(writes, reads, ownerId) when ownerId = id ->
          let writes = writes - 1
          if writes = 0 then
            // all write locks have been released
            Read (if reads = 0 then Map.empty else Map.add id reads Map.empty)
          else 
            // current execution path still has other write locks acquired
            Write(writes, r, ownerId)
        | other -> other // this should never happen
      // read lock is not held, so try to find awaiting writer than can be completed
      let mutable writers = old.writerQ
      let mutable pending = Unchecked.defaultof<_>
      if not updated.IsReadLocked && pollNonCancelled(&writers, &pending) then
        // all locks have been released, try complete next awaiting writer
        updated <- Write(1, 0, pending.id)
        if Atomic.cas old { old with stm = updated; writerQ = writers } state then
          if pending.awaiter.TrySetResult(new WriteLockAcquisition<_>(this, pending.id)) then
            // task was successfully completed
            pending.cancel |> Option.iter (fun c -> c.Dispose())
          else
            // task was cancelled before we managed to complete it
            // we need to revert write acquisition
            attempt state pending.id       
        else attempt state id
      elif Atomic.cas old { old with stm = updated } state then
        // write lock has been released and there are no awaiting writers
        // we can complete all read lock acquisitions
        this.CompleteAllReaders()
      else
        attempt state id
    attempt state id

Again you can see there are some similarities to between read/write acquire/release methods. Completing all read awaiters is ultimately again a repetition of the same pattern boiling down to lock state update + task completion with conditional rollback (if second operations fails), this time looped in a recursive function.

type RWLock<'a> =
  member internal this.CompleteAllReaders () =
    let rec loop (state: AtomicRef<_>) =
      let old = !state
      if not old.readerQ.IsEmpty && not old.stm.IsWriteLocked then
        let mutable pending = Unchecked.defaultof<_>
        let readerQ = old.readerQ.Dequeue &pending
        let updated = old.stm.AdjustRead(pending.id, 1)
        if not (obj.ReferenceEquals(old.stm, updated)) then
          // we successfully acquired read
          if Atomic.cas old { old with readerQ = readerQ; stm = updated } state then
            if pending.awaiter.TrySetResult(new ReadLockAcquisition<'a>(this, pending.id)) then
              // successfully completed reader lock task acquisition, try to complete others
              pending.cancel |> Option.iter LockUtils.dispose
              loop state
            else
              // reader lock acquisition task failed (probably task was already cancelled)
              // revert read acquisition
              state 
              |> Atomic.update (fun s -> { s with stm = s.stm.AdjustRead(pending.id, -1) }) 
              |> ignore
              loop state
          else
            // we failed to update - retry
            loop state
    loop state

With all of these in hand, we've got a fully functional read/write lock in user-space. Now, let's talk about hard part.

Problems with lock upgrades

In the implementation discussed above we've already covered a situation a read lock is getting upgraded into a write lock. It uses a pretty naive approach, but since our code enables multiple locks taken by the same execution path, all should go smooth. Or should it? Let's consider following scenario:

let getOrInit cancel key (lock: RWLock<Map<string, Resource>>) = task {
  use! reader = lock.Read(cancel)
  match Map.tryFind key reader.Values with
  | Some resource -> return resource
  | None ->
    use! writer = reader.Update(cancel)
    let! resource = obtainResourceAsync key cancel
    writer.Value <- Map.add key resource writer.Value
    return resource
}

While not immediately obvious, this code may lead to a deadlock. In case when two parallel tasks will acquire read locks (which is allowed, as read locks are shared), they both may end up in case when lock upgrade is required. Each of them will try to gain write (exclusive) lock without releasing their own reads, ending up in waiting until cancellation for each other to release read acquisitions.

An obvious question would be: while upgrading, could we simply release a read lock and then reacquire write? This should solve the problem. The answer is no, because in-between read-release/write-acquire it would be possible that another task could jump in and alter the state, invalidating all of the assertions made over it prior to call for Upgrade method.

The example above is a warning to be considered before going all into upgrade semantics - it's tricky and may end up leading developers to a field of of wrong expectations, hence some libraries (like Rust tokio) do not expose such API at all.

Summary

That was quite a ride. In this blog post we've implemented a read/write lock that works in a user-space, supports reentrancy/upgrades and cancellation. All of that wrapped within strongly typed and ergonomic API.

You may have noticed that approach we taken here is inherently allocating at least a little bit of memory on a managed heap, which is inevitable due to combination of atomic updated and immutable data types in use. It's another reason why this approach is less valid in a face of small thigh regions and hot paths, and more as a supplement for longer and slower sync operations. In case if less expensive approach is what you're after, I recommend you to take a look at DotNext implementation of AsyncReaderWriterLock.