In this post, I talk about the `std::multiset`

and about some design decisions I needed when trying to implement it. The full implementation is on GitHub.

# Multisets and multimaps

There are a few data structures in the standard library that I never use. In some cases, it’s because I never had the right use case no my hands, in others (embarrassingly enough) because I am too lazy to read up about one that I’m not already familiar with. But lately I’ve found myself needing something to solve a problem, and I discovered that I could get some great help by just using a data structure I’d never heard of: the `std::multiset`

(or its cousin, the `std::multimap`

).

I had read about it before. On the surface, it is just like a `std::set`

, but it let you store the keys multiple times. Why on Earth would you need that, and in particular, which cases are not served as well by just the `std::set`

?

There’s something interesting, however, about this structure: It keeps its keys sorted. Consider the following situation: You want to store the nodes of a binary tree by level (root being at level 1, leaves at `level O(log(N)) + 1`

if we have `N`

nodes and a balanced tree). Then you want to print the sum of the values of the nodes at each level, sorted by (ascending) level.

You could scan the tree and use a map level-to-nodes to accomplish this. However, you know that a `std::map`

would not keep the order of the keys.

A better way to do it is by using a multimap! You’d do something like:

1 | typedef int level_t; |

# Implementing a multiset

The requirements on the basic operations of a multiset, expressed in terms of time complexity, are the following:

1 | // O(log(size)): |

We see that the insertion and search are not constant like they would if using a hash, but logarithmic. The `count`

operation reveals how many elements having the same key are in the multiset.

One way of implementing this is by using a binary search tree, where each node contains a `vector<Key>`

. We would then have something like:

1 | template <typename Key> |

## Back-end: the `node`

The strategy I used was to just implement a BST with the right data and then forwarding the operations to the node. You all know what a node/BST looks like, so I’ll just write a few words about bizarre operations I implemented. Here’s the complete interface of the node/BST structure (I decided to use the node as a recursive definition of a BST):

1 | // A node is a tree. |

`vectorize`

: this method transforms the multiset into a vector of the keys, without repeating them. It is useful for the implementation of the`iterator`

. I’ll talk about the iterators in a separate section.`nodesize`

: The number of items associated with a single`Key`

.`size`

just returns the number of stored keys.`findParent`

: I don’t store the node’s parent in my implementation. It would have simplified things, but it’s a trade-off between space and complexity, or space and time (I need to find the parent when I need it, which takes`O(log(N))`

time, so I can spare a pointer in each node, that would allow me to find the parent in`O(1)`

if I know the address of the node in question. Oh well.)

### Iterators

The iterator on a multiset should offer what you usually expect from such a class: a `begin()`

and an `end()`

method, plus some pre- and post-increment or another equivalent of a `next()`

method to go to the next node. To do it, I’ve used a `vector`

of Key pointers and an index into it. When I build an iterator from a node, I can process the sub-tree starting at that node. Since the multiset contains only Key values, my iterator needs to hide the fact that I’m using a tree in the back-end, which is why the de-referencing operator returns a Key itself, and not a node.

1 | class iterator { |

# Conclusions

A few considerations:

Needless to say, there’s a lot of room for improvement. It took me a few days to implement this basic version, and I realize now that it might not necessarily do what I wanted to do (for instance, the iterator doesn’t allow you to iterate over all the keys, like the `std::multiset`

does, but only on the non-repeating ones).

Once more, I see that implementing on paper to have a full vision of what you’re going to do is of paramount importance: The farther down the way to the full implementation you spot errors, the longer the corrections take, and the more they can impact other areas which you have lost track of in the meanwhile. With more experience, you can usually keep a bigger plan in your head.