State-based CRDTs: Maps

In this blog post, we'll cover the idea of CRDT maps, and how we could create them and utilize them in common scenarios. A prerequisite for this talk is some general knowledge of CRDTs, especially observed-remove sets, which were already covered on this blog in the past.

Other blog posts from this series:

The code presented here is available at this repository on Github. Feel free to play with it.

Our map will work in a very similar manner to the Observed-Remove Set, that we implemented earlier:

  • We're free to add and remove entries to the map any number of times we wish to.
  • In case of concurrent conflicting updates (one replica adds the entry, while other removes it), we'll prefer Add-Wins approach.

One extra requirement for us is a need of addressing another type of conflict - when two replicas add two different values under the same key without knowing about others update. Here we're come after the most open approach: we'll require, that the values of our map must be CRDTs themselves. This way we can achieve variety of behaviors by further composition of different CRDTs.

For now let's define a basic frame of our map type, playing along with shape of other data types defined in previous posts:

type AWORMap<'k, 'v, 'm when 'k: comparison and 'm: struct and 'm :> IConvergent<'v>> = 
    AWORMap of keys:AWORSet<'k> * entries:Map<'k,'v>

[<RequireQualifiedAccess>]
module AWORMap =
	
	let zero () = AWORMAP(AWORSet.zero, Map.empty)
    
    let value (AWORMap(_,map)): Map<'k,'v> = map
    
    let add replica key value (map) = ??? // to be defined
    
    let rem replica key map = ??? // to be defined
    
    let merge m1 m2 = ??? // to be defined

As you may already expect, the implementation here is pretty simple - we won't try to implement everything from scratch, because we already did it. We'll make use of AWORSet to keep CRDT properties over our keys, and we'll use a separate map to attach values to them.

This way our AWORMap.value function doesn't require any extra computations - simply return map component of OR-Map - while making our add and remove functions very straightforward:

let add r (key: 'k) (value: 'v) (m: AWORMap<'k,'v,'m>): AWORMap<'k,'v,'m> =
    let (AWORMap(keys, map)) = m
    let keys' = keys |> AWORSet.add r key
    let map' = map |> Map.add key value
    AWORMap(keys', map')
        
let rem r key (m: AWORMap<'k,'v,'m>): AWORMap<'k,'v,'m> =
    let (AWORMap(keys, map)) = m
    let keys' = keys |> AWORSet.rem r key
    let map' = map |> Map.remove key
    AWORMap(keys', map')

All we need is simply to add/remove elements to both OR-set and map building our OR-Map. I guess the confusing part for you might be the m type parameter. This is a longer story, quite specific to .NET, so if you're interested, read the appendix below, or just skip it otherwise.

The Trait pattern

Unlike some programming languages (eg. Scala, Rust or Haskell), there's no way to easy extend any arbitrary type with any abstract feature in F#. The keyword here is abstract - sure you can "add" new methods to existing types (even the ones defined in other assemblies) using extension methods, however they don't let you abstract over the generic features provided by these methods.

C# and F# compilers understand extension methods alligned with some features (like LINQ, awaiter pattern or computation expressions), but only if they were defined by compiler creators.

Example of such generic feature could be a merge function working over any type implementing our IConvergent<T> interface. For exercise imagine that other CRDTs we defined in the past didn't implement that interface (because we defined it just a while ago), and there's no way to modify them now (maybe they were defined in assembly of someone's else library).

In object oriented paradigm we'd probably use an adapter pattern, but functional languages have solved this problem long ago by introduction of type classes a.k.a. traits. We could define such trait together with function using it as follows:

type IConvergent<'v> =
  interface
    abstract merge: 'v * 'v -> 'v
  end

module Helpers =
  let inline merge<'m, 'v when 'm: struct and 'm :> IConvergent<'v>> a b = 
    Unchecked.defaultof<'m>.merge a b

Now, what's so special about it? It turns out, that .NET JIT does pretty good job on specializing the code execution over struct types - in that case we simply produce a zero-size structure implementing given generic functionality, which then is going to be devirtualized:

module LWWReg =
  /// Merger for Last-Write-Wins register defined previously.
  [<Struct;NoEquality;NoComparison>]
  type Merge<'a> = 
    interface IConvergent<LWWReg<'a>> with
      member __.merge a b = merge a b 
    
// use the API
let a = LWWReg.update DateTime.UtcNow "hello" LWWReg.zero
let b = LWWReg.update DateTime.UtcNow "world" LWWReg.zero
let c = Helpers.merge<LWWReg.Merge<_>, _> a b

Now, if you'd take a look into assembly code produced this way, you'd find that our generic parse function is equivalent to calling the underlying implementation directly - our custom generic parser evades all cost related to doing virtual method dispatch and boxing, that would be necessary in other case when interfaces are in use. This way we could use this pattern to create a code, that doesn't impose runtime cost usually related to using abstractions.

In case of our AWORMap implementation there's also another improvement from using generic merge resolution, i.e. we could use different struct implementing our IConvergent<LWWReg<'a>> interface to revert its behavior (as example using first-write-wins instead of last-write-wins) without any additional cost but also without risking that two of our data types would behave differently. Counter-example of such case would be union of two C# sets using different comparers, eg:

var a = ImmutableHashSet.Create(StringComparer.CurrentCulture, new [] { "A", "a", "b"});
var b = ImmutableHashSet.Create(StringComparer.InvariantCultureIgnoreCase, new [] { "A", "a", "C"});

// let's compare the contents
var ab = a.Union(b);
var ba = b.Union(a);
ab.SetEquals(ba); // false - turns out set union is not always commutative

By promoting this "implementation detail" into generic type parameter using trait pattern we not only open our data type for further extensions, but also make it safer from weird unexpected behaviors that sometimes could happen at runtime.

Currently the biggest issue of that approach is that neither C# nor even F# compilers are able to infer our generic "trait" struct on their own - something that Rust, Scala or Haskell have no problems with. For this reason we need to specify these type info by hand. Fortunately, there's a hope that this issue will be solved in the future - see Classes for the Masses.

Merge

Now we need to implement our merge function. We could describe it with following steps:

  1. From each OR-Map take their OR-Sets (used for keys) and merge them together. This will handle key addition/removal conflicts for us.
  2. Now for each key in newly created OR-Set, try to find a corresponding value in each of the map-components of our OR-Maps:
    1. If both maps have a value related to that key, we merge them (using our generic m type).
    2. In other case (when only one map has a value), we'll take it as it is.

Step 2 defined here is pretty much a definition of recursive merge function for an Option<'a> type. Once you'll start to see how nice these properties fit with each other, there's no way to unsee it.

let merge (a: AWORMap<'k,'v,'m>) (b: AWORMap<'k,'v,'m>): AWORMap<'k,'v,'m> =
    let (AWORMap(keys1, m1)) = a
    let (AWORMap(keys2, m2)) = b
    let keys3 = AWORSet.merge keys1 keys2
    let m3 =
        keys3
        |> AWORSet.value
        |> Set.fold (fun acc key ->
            match Map.tryFind key m1, Map.tryFind key m2 with
            // this could be defines as Option.merge
            | Some v1, Some v2 -> 
                let v3 = Helpers.merge<'m,'v> v1 v2
                Map.add key v3 acc
            | Some v, None -> Map.add key v acc
            | None, Some v -> Map.add key v acc
            | None, None -> acc 
        ) Map.empty
    AWORMap(keys3, m3)

We could end here, as right now, our OR-Map is already done. However in practice, we tend to use it for further composition - while already pretty powerful, it restricts us to use other CRDT-compliant type instances as only possible values.

CRDT map with last write wins updates

In practice we often would like to use our CRDTs over ordinary types - usually it's not possible right away, as we need some sort of metadata to hint us through reliable conflict resolution. However we could take some compromises i.e. let us to operate on any value type using last-write-wins semantics.

In this context, LWW applies only to a situation when two different replicas updated the same key with different values without knowing of each other updates. Other conflicts - like add/remove entry - are still solved via add-wins semantics.

Given the types we already had in our disposition (we already defined Last-Write-Wins register here), definition of such map should be simple enough:

type LWWMap<'k,'v> = AWORMap<'k, 'v, LWWReg.Merge<'v>>

module LWWMap =

	let zero: LWWMap<'k,'v> = AWORMap (AWORSet.zero, Map.empty)
	
	let value (map: LWWMap<'k,'v>): Map<'k,'v> =
		AWORMap.value map
		|> Map.map (fun k reg -> LWWReg.value reg)
		
	let add replica key value (map: LWWMap<'k,'v>) =
		let reg = LWWReg.zero |> LWWReg.update DateTime.UtcNow value
		AWORMap.add replica key reg map
	
	let rem replica key (map: LWWMap<'k,'v>) = AWORMap.rem replica key reg map
	
	let merge (a: LWWMap<'k,'v>) (b: LWWMap<'k,'v>) = AWORMap.merge a b	

Summary

We already defined a handful of state-based CRDT data types. My hope is to introduce more of them in the future, so that we could have full understanding on to implement any kind of capabilities we need and move on to more advanced topics.