Function Repository Resource:

MergeByKey

Merge a list of associations using different merge functions for different keys

Contributed by: Sjoerd Smit

ResourceFunction["MergeByKey"][{assoc1, assoc2, …}, {key1 -> f1, key2 -> f2, …}]

merges the associations assoc_i, using the function f_i to combine the values of key key_i.

ResourceFunction["MergeByKey"][{assoc1, assoc2, …}, {key1 -> f1, key2 -> f2, …}, fdefault]

uses fdefault as the merging function for any key not specified.

ResourceFunction["MergeByKey"][{assoc1, assoc2, …}, {…, {key_i1, key_i2, …} -> f_i, …}, …]

uses f_i for merging each of the keys key_i1, key_i2, ….

ResourceFunction["MergeByKey"][{key1 -> f1, key2 -> f2, …}]

represents an operator form of ResourceFunction["MergeByKey"] that can be applied to an expression.

ResourceFunction["MergeByKey"][{key1 -> f1, key2 -> f2, …}, fdefault]

gives an operator with a default merging function.

Details and Options

If no default merging function is specified, any key without a merging function will be assigned a list of all values found for that key. This is equivalent to merging with the function Identity.
The Key wrapper can be used to disambiguate between multiple keys that use the same merging function and a single key that is a List.
Merging functions that are specified for keys that do not exist in the data are ignored.
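
The first and last points above can be sketched as follows. This is a minimal illustration rather than one of the original examples: the list sample and the key z are hypothetical names chosen here. The commented result is for the built-in Merge only; the two MergeByKey calls are expected to behave as the notes above describe and are shown without output.

```wolfram
(* hypothetical sample data, used for illustration only *)
sample = {<|a -> 1, b -> 2|>, <|a -> 5, b -> 10|>};

(* the built-in Merge with Identity collects all values found for each key *)
Merge[sample, Identity]
(* <|a -> {1, 5}, b -> {2, 10}|> *)

(* no function is given for b, so its values should be collected in the same way *)
ResourceFunction["MergeByKey"][sample, {a -> Total}]

(* z does not occur in sample, so the rule z -> Mean should simply be ignored *)
ResourceFunction["MergeByKey"][sample, {a -> Total, z -> Mean}]
```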

Examples

Basic Examples (6) 

Merge the values of keys a and b with different functions to combine the values:

In[1]:=
ResourceFunction[
 "MergeByKey"][{<|a -> 1, b -> 2|>, <|a -> 5, b -> 10|>}, {a -> Total,
   b -> RootMeanSquare}]
Out[1]=
Image

Not all associations need to have the same keys:

In[2]:=
ResourceFunction[
 "MergeByKey"][{<|a -> 1, b -> 2|>, <|a -> 5, b -> 10, c -> Pi|>, <|
   c -> E|>}, {a -> Total, b -> RootMeanSquare, c -> Mean}]
Out[2]=
Image

Specify multiple keys that use the same merging function:

In[3]:=
ResourceFunction[
 "MergeByKey"][{<|a -> 1, b -> 2|>, <|a -> 5, b -> 10, c -> Pi|>, <|
   c -> E|>}, {{a, b} -> Total, c -> Mean}]
Out[3]=
Image

If no function is specified for a given key, all values are returned as a List:

In[4]:=
ResourceFunction[
 "MergeByKey"][{<|a -> 1, b -> 2|>, <|a -> 5, b -> 10|>}, {a -> Total}]
Out[4]=
Image

Specify a default function for merging unspecified keys:

In[5]:=
ResourceFunction[
 "MergeByKey"][{<|a -> 1, b -> 2|>, <|a -> 5, b -> 10|>}, {a -> Total}, RootMeanSquare]
Out[5]=
Image

Define two operator forms and data:

In[6]:=
merge1 = ResourceFunction["MergeByKey"][{a -> Total}];
merge2 = ResourceFunction["MergeByKey"][{a -> Total}, RootMeanSquare];
data = {<|a -> 1, b -> 2|>, <|a -> 5, b -> 10|>};

Apply them to the data:

In[7]:=
merge1@data
Out[7]=
Image
In[8]:=
merge2@data
Out[8]=
Image

Applications (3) 

Quickly summarize categorical and numerical columns of a dataset:

In[9]:=
data = Normal@ExampleData[{"Dataset", "Titanic"}];
In[10]:=
ResourceFunction["MergeByKey"][data,
 {"age" -> Histogram},
 BarChart[Counts[#], ChartLabels -> Automatic] &
 ]
Out[10]=
Image

Turn the columns into distributions:

In[11]:=
ResourceFunction["MergeByKey"][data,
 {"age" -> DeleteMissing/*EmpiricalDistribution},
 CategoricalDistribution
 ]
Out[11]=
Image

Draw samples from these distributions. Note that this is different from drawing a random row from the original dataset because it doesn't account for correlations between the columns:

In[12]:=
RandomVariate /@ %
Out[12]=
Image

Properties and Relations (2) 

Only specifying a default merging function is equivalent to using Merge:

In[13]:=
ResourceFunction[
 "MergeByKey"][{<|a -> 1, b -> 2|>, <|a -> 5, b -> 10|>}, {}, Total]
Out[13]=
Image
In[14]:=
Merge[{<|a -> 1, b -> 2|>, <|a -> 5, b -> 10|>}, Total]
Out[14]=
Image

For large datasets, MergeByKey with only a default merging function can be a faster alternative to Merge:

In[15]:=
data = Join @@ ConstantArray[Normal @ ExampleData[{"Dataset", "Titanic"}], 10];
In[16]:=
merge1 = Merge[data, Counts]; // RepeatedTiming
Out[16]=
Image
In[17]:=
merge2 = ResourceFunction["MergeByKey"][data, {}, Counts]; // RepeatedTiming
Out[17]=
Image
In[18]:=
merge1 === merge2
Out[18]=
Image

Merging an empty list returns an empty Association:

In[19]:=
ResourceFunction["MergeByKey"][{}, {a -> 1}]
Out[19]=
Image
In[20]:=
ResourceFunction["MergeByKey"][{}, {a -> 1}, f]
Out[20]=
Image

Similarly for a list of empty associations:

In[21]:=
ResourceFunction["MergeByKey"][{<||>, <||>, <||>}, {a -> 1}]
Out[21]=
Image
In[22]:=
ResourceFunction["MergeByKey"][{<||>, <||>, <||>}, {a -> 1}, f]
Out[22]=
Image

Possible Issues (2) 

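If the associations have keys that are themselves lists, the Key wrapper is needed to indicate this. The following does not work as intended, because {a} -> Total is read as a list of keys sharing one merging function rather than as the single key {a}: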

In[23]:=
ResourceFunction[
 "MergeByKey"][{<|{a} -> 1, b -> 2|>, <|{a} -> 5, b -> 10|>}, {{a} -> Total}]
Out[23]=
Image

Use Key to specify a List as a key:

In[24]:=
ResourceFunction[
 "MergeByKey"][{<|{a} -> 1, b -> 2|>, <|{a} -> 5, b -> 10|>}, {Key[{a}] -> Total}]
Out[24]=
Image

Publisher

Sjoerd Smit

Version History

  • 2.0.0 – 23 July 2020
  • 1.0.0 – 10 July 2020
