I noticed that we use std::map
quite a bit throughout the CMake source code. No doubt this stems from before the C++11 days, but now that we require C++11, I’m wondering if at least some of these uses might be more efficient if replaced with std::unordered_map
? If we have containers of a reasonable number of values, they may be good candidates for replacement. Could be some easy wins lurking in there, if someone is interested and knows where a few hot code paths are.
As with anything to do with performance, results from measured projects are the only thing that actually matter here. Speculation can guide what to test first, but these kinds of things are sensitive to the weirdest things that are hard to predict and hand-waving just doesn’t save time over actual measurements.
That said, there are even low-hanging fruits over std::unordered_map
if we’re going down that route. Of note is that, for some reason, the standard requires a size_t n_buckets;
behavior of the hashing containers. This leads to 64-bit integer modulus operations which are vastly slower than the 32-bit counterpart:
From c - Why does using mod with int64_t operand makes this function 150% slower? - Stack Overflow
IDIV r64 Throughput 85-100 cycles per instruction IDIV r32 Throughput 20-26 cycles per instruction
A hashing container which just “limits” its number of buckets to 32-bit should get a nice perf boost. I don’t know if we can measure how “hot” each of our map instances are or not, but that might be the first thing to do here.
As for actual measuring, callgrind with KCacheGrind as a viewer has worked well for me. perf
can also find interesting things, but I find it harder to use (though usually because I get what I need from callgrind and haven’t used perf
that much).
Several historical uses of std::map
have been replaced with std::unordered_map
already. Such updates are fine with me. However, there may be some cases where we depend on the ordering guarantees of std::map
to produce deterministic results. That needs to be considered on a case-by-case basis.
The one that got my attention which led to this post was in cmGeneratorExpressionNode::GetNode()
. There’s a static std::map
object that seems like it could be a std::unordered_map
and I am curious if it is a factor in generation time (I hindsight, it probably isn’t a big enough map to lead to an appreciable difference). I don’t have any test cases at hand with which I can readily measure that though.